[00:10:18] --- steven.jenkins has left
[00:11:09] --- steven.jenkins has become available
[01:58:39] --- abo has become available
[02:22:57] Is OpenID supposed to work on the new OpenAFS Wiki?
[03:59:04] --- Simon Wilkinson has left
[04:57:33] > I bet I can only list them one at a time amd64_*
[05:02:37] openid: it worked for russ.
[05:22:38] --- JSund has become available
[05:45:20] --- matt has become available
[06:13:09] --- sxw has become available
[06:14:27] not for me, neither with the Google idp nor with our local one.
[06:14:41] Does it not support https?
[06:34:34] --- kaj has become available
[06:35:33] --- sxw has left
[06:39:37] --- sxw has become available
[06:39:53] --- meffie has left
[06:40:24] --- meffie has become available
[07:05:22] --- jaltman has left: Disconnected
[07:05:31] --- jaltman has become available
[07:05:48] --- kaj has left
[07:07:07] i haven't used the openid support at all, so i can tell you nothing. i'd bet the openid google support needs help
[07:07:11] is http://openafs-wiki.stanford.edu/AFSLore/ supposed to be a directory listing?
[07:08:58] well, there's no index.mdwn in it
[07:09:04] so, it's "supposed" to be that
[07:09:30] note that that's not the link that was in the email. i suppose the navbar could be pointed at the same link for now
[07:09:54] sure, but I'm using the link from the navbar
[07:10:50] --- abo has left
[07:11:06] --- abo has become available
[07:12:26] the site definitely needs work, but at least it's in a place where the work can be done
[07:23:32] --- deason has become available
[07:24:42] --- reuteras has left
[07:27:05] navbar fixed.
main page cleaned up slightly
[07:30:20] --- sxw has left
[07:38:32] --- sxw has become available
[07:40:46] --- sxw has left
[07:53:30] --- kaj has become available
[07:55:50] --- mho has left
[07:57:36] --- mho has become available
[08:00:58] --- Russ has become available
[08:19:55] --- kaj has left
[08:27:15] --- sxw has become available
[08:29:41] I'd actually rather most changes to the wiki go through Git and Gerrit, since I think that's going to be our fallback once people start spamming it. OpenID did work for me, though.
[08:29:54] However, Stanford is having various network problems this morning, which may be part of it.
[08:30:11] Also, no, SSL probably doesn't work since I didn't get a cert for the current name since I was expecting it to change.
[08:31:53] if you configure the name to be wiki.openafs.org jhutz is (or maybe has) set it up
[08:32:19] ah, not yet
[08:32:41] --- sxw has left
[08:32:53] It will respond to wiki.openafs.org now, I believe, once DNS is set up.
[08:33:46] ok
[09:07:42] src/rxkad/v5gen.c indicates it was generated from something and I should not edit it... should I be listening to it?
[09:07:59] ydrgen, i think
[09:08:09] no wait, the asn1 tool. anyway, edit away
[09:10:19] Good morning. Thanks for the heads-up on config, Derrick; have a new patch.
[09:10:28] excellent
[09:10:38] i tried to reply last night and my cable modem fell over
[09:10:43] so i gave up and slept
[09:10:48] Sleep is good.
[09:12:28] I wonder if the tool still generates code like that? the way that code is written, it really looks like there are serious memory management problems with it
[09:13:45] (brief example... decode_Ticket calls decode_PrincipalName...
if decode_PrincipalName fails, it frees the principal data itself before returning; decode_Ticket frees the ticket if decode_PrincipalName fails, which involves freeing the sname principal name again)
[09:15:20] and there's also that many of the decode_* functions (if not all of them) free their working data on about half of the errors they encounter, but not the other half
[09:16:00] so... if there's a tool that's generating code like this, I'd kinda want to tell them about it, too
[09:16:15] it's part of heimdal. i suspect it's been modified
[09:25:55] damn phantom leg vibrations. it's not even the phone
[09:25:57] er, mix
[09:29:47] --- kaj has become available
[09:29:58] --- kaj has left
[09:35:48] asn1_gen
[09:36:47] asn1_gen appears to be something else... I think I'm looking for asn1_compile
[09:37:02] sorry. yeah.
[09:37:19] the thing is, all this source is, is copied and pasted from what's in a lib/asn1 build of heimdal
[09:38:01] could just generate it again from a modern heimdal and pull it in; but I assume people'd just want individual things fixed
[09:38:02] so the question is whether the right answer is to regenerate it or fix what we have
[09:38:29] i have no strong opinion
[09:41:33] actually, i have a heimdal tree right here
[09:42:17] --- matt has left
[09:46:03] but this is a moot point, since modern heimdal still seems to have at least part of this problem
[09:46:14] well, that was what i wanted to find out :)
[09:46:21] 1.3.3 does, anyway
[09:46:31] ideally heimdal would stop having the problem
[09:46:34] at least it solves the problem of functions freeing the data half of the time
[09:47:03] now it never frees it :)
[09:52:21] I think ours is "either never free or free twice", whereas new heimdal is "either free once or twice"
[09:52:26] it's an improvement :)
[10:06:41] if I wanted to talk to heimdal about this, do I want to use heimdal-discuss, or the bug tracker or something?
(not used to not having -devel)
[10:07:29] heimdal-discuss is probably fine to talk about it. patches to heimdal-bugs@h5l.org
[10:10:17] Love seems to like me sending patches to heimdal-discuss, actually. It's not clear to me what the bug list is really used for.
[10:23:59] or maybe it's not a problem with modern heimdal... is free(NULL) always actually okay?
[10:24:10] --- kaj has become available
[10:24:10] not on sunos4, but otherwise, yes
[10:25:06] --- kaj has left
[10:25:21] what I actually mean is, do we care if we free(NULL)?
[10:25:30] only on sunos4
[10:25:38] (i.e. no)
[10:41:24] I tend to add checks for freeing NULL pointers because in most cases an attempt to do so is a logic error
[10:44:02] regarding the asn1 from Heimdal, Love has completely re-written the asn1 engine in Heimdal within the last year.
[10:44:27] i regenerated our v5gen.c and am making it pretty again
[12:26:00] --- kaj has become available
[12:35:49] --- kaj has left
[12:41:38] delayed, but the updated code from current heimdal is pushed to gerrit
[12:42:12] except i think i pushed not all of it. sigh
[12:43:39] --- sxw has become available
[12:49:50] wrong sha1. oops.
[12:52:44] 127544, btw; I'm assuming you can see that
[12:53:33] let's find out
[12:55:36] at the moment i can't, but i think i'm gonna fix that. stupid mysql transaction lock bullshit
[12:57:14] Subject: Why go to Dartmouth? Beat the system for a good life. actinobacillosis acting acescence yeah, i think no
[12:57:49] 127554
[13:14:35] --- tkeiser has become available
[13:14:57] --- sxw has left
[13:16:22] --- Simon Wilkinson has become available
[13:21:19] --- Simon Wilkinson has left
[13:21:53] --- Simon Wilkinson has become available
[13:25:20] Given to sxw. Gee. Thanks :)
[13:25:38] >I'd actually rather most changes to the wiki go through Git and Gerrit
[13:26:07] I really don't think that's workable for a wiki. It's certainly over my tolerance factor if I want to quickly edit a page.
I'll just end up not writing pages.
[13:31:04] My personal experience is that I'm way more likely to do quick edits of pages that way than over the web editor.
[13:31:17] It depends a lot on who we're expecting most of the contributors to be.
[13:31:25] If we're hoping for random drive-by Internet users, I agree with you.
[13:31:28] I think we need to be able to support both.
[13:31:51] If it's mostly the same thirty of us who have local checkouts, editing a text file in an editor and then doing git push (I assume Gerrit would be set up to auto-approve) is way easier than touching a web browser, IMO.
[13:31:57] I don't think we're looking for random drive-by users, but I'd really hope that the wiki is, and can be, of interest and use to more than just the developers.
[13:32:19] And I don't want the barrier to contributing text to the wiki being "Learn about git, learn about gerrit"
[13:32:28] I would ideally like to support both, but if the wiki starts getting spammed to death, I'm not going to spend a bunch of time filtering out spam and reverting changes. If we have volunteers to do that, that's great -- if not, I think we're going to need to keep raising the bar of entry until the spammers give up.
[13:32:50] Web edits should work fine now.
[13:32:55] The OpenID thing works for me.
[13:33:14] I don't know what's not working for other people -- unfortunately, I'm probably not a good person to try to debug. I'm happy to try applying configuration changes that people think will help.
[13:33:32] Are your IDPs using https?
[13:33:45] I can also re-enable simple password authentication, although I think that's going to get us spammed pretty fast.
[13:33:53] No.
[13:34:33] So it may just be the missing cert, which we can fix when we have wiki.openafs.org DNS, I think, and I think someone (Derrick?) was already going to ask jhutz about that.
[13:35:25] The web server certificate (or lack of it) doesn't come into it.
[13:35:43] Does the machine that you're using have the necessary perl modules installed for LWP to do https?
[13:36:02] (it's an outgoing client https connection, rather than an incoming one, that will be the problem)
[13:36:04] Oh, on the other side. that's a good question. I bet not.
[13:36:06] What do I need?
[13:36:10] Do you know off-hand?
[13:36:12] Crypt::SSLeay
[13:36:23] or something else. The something else is better, but I can never remember what it is.
[13:36:28] Have that, yes.
[13:36:34] (It's the same system Gerrit and whatnot is running on, btw.)
[13:36:42] --- tkeiser has left
[13:37:16] I'll go poke it with a stick.
[13:37:47] --- tkeiser has become available
[13:38:37] I suspect you're right that I'm missing some Perl modules.
[13:38:47] It's got the modules it needs to do https.
[13:39:05] That is, perl -MLWP::Simple -e 'getprint "https://www.google.com/"' works.
[13:39:28] It looks like it just uses Net::OpenID::Consumer for the OpenID work.
[13:39:54] And LWPx::ParanoidAgent (which is also installed).
[13:41:20] Net::OpenID::Consumer is certainly supposed to work with Google's OpenID implementation.
[13:45:14] I wonder if LWPx::ParanoidAgent is being too, err, paranoid. I'm half-tempted to remove it, which makes ikiwiki fall back on LWP.
[13:46:05] This is debian stable, right?
[13:46:14] yes.
[13:46:54] Yeah. The version of Net::OpenID::Consumer on it is prehistoric, and known not to work properly.
[13:47:03] Oh, sigh.
[13:47:18] Oh, but hey, I can fix that.
[13:47:48] If you perl -MNet::OpenID::Consumer, it also complains about a missing Math::BigInt library.
[13:48:00] Okay, upgraded to 1.03.
[13:48:23] Yeah. Looks like Crypt::DH wants Math::BigInt::GMP or Math::BigInt::Pari
[13:48:39] Installed the first.
[13:49:38] Yay! OpenID editing now works.
[13:49:44] --- sxw has become available
[13:49:44] (using a Google Account, that is)
[13:50:06] Okay, cool.
[13:50:08] Thanks Russ!
[13:50:15] Here's hoping Derrick's right that spammers aren't using OpenID yet.
[13:50:33] I don't think they've got that sophisticated yet.
[13:51:48] --- tkeiser has left
[13:52:13] --- tkeiser has become available
[14:22:03] the hard work is done, we did patches, right? ;)
[14:24:53] --- sxw has left
[14:26:22] btw, all these if test "x$GCC" = "xyes" tests are broken--plenty of platforms test positive for gcc, but we still don't use it...
[14:26:56] Are we doing that, or is configure doing that for us?
[14:27:22] at this point we shouldn't be overriding cc nearly so often
[14:28:48] dux, hpux 10 and 11, aix 4-6, irix.
[14:29:14] so, .... of those, the modern one is aix.
[14:29:26] also broken on solaris
[14:30:04] depends what SOLARISCC ends up set to
[14:30:53] i suppose it defaults to cc. gcc is a valid choice.
[14:33:02] I suspect that we should start building userland with whatever configure selects as CC, and only do our special stuff in the kernel module
[14:33:28] which I hope is not gcc on some of these platforms. their optimizer sucks goats on a lot of these archs
[14:33:37] well, other than on solaris, the issue is the platforms that need testing are the same ones we use cc on
[14:34:22] and solaris we could fix easily. if SOLARISCC is not set to something which is gcc, set $GCC to no
[14:34:28] autoconf will always pick gcc in preference to the machine's bundled compiler.
[14:34:28] (in acinclude.m4)
[14:34:38] --- abo has left
[14:34:39] unless you tell it not to
[14:34:54] --- abo has become available
[14:34:55] My point is that we're an autoconf userland application. We should stop trying to be smarter than the user that's driving the build.
[14:35:03] on aix i build: CC=cc KRB5LIBS="-L/usr/krb5/lib -lkrb5 -lksvc" ./configure
[14:35:06] If the user sets CC to something, we should use it.
[14:35:26] Otherwise we should do what every other autoconf application does, and use gcc without whimpering.
[14:35:34] well, then we need a KCC
[14:35:44] which is fine, but it is what it is
[14:36:12] actually, i could fix solaris easily. hang on
[14:36:20] disagree. it not only violates the principle of least surprise, it also is doing something which is deliberately braindead. we know better
[14:36:45] I run ./configure, I expect what happens every time I run ./configure
[14:36:46] "it"
[14:36:54] tom, what violates ...
[14:37:32] I don't expect some half arsed decisions that were made 10 years ago to come back and bite me. Witness the Solaris problem, where no modern Solaris box had a compiler where we would look for one.
[14:38:03] existing behavior on the proprietary platforms is to use the native cc. I suppose if we put it in the release notes, and do it at a major version change then it's ok. I still very much dislike preferring gcc, but whatever
[14:38:19] I'll just have to run around telling everyone to override CC
[14:38:32] for binary builds, i certainly do
[14:38:53] (override)
[14:40:29] solaris patch to honor whatever CC is selected: 2307
[14:40:36] (except for the kernel)
[14:45:22] --- mdionne has become available
[14:47:15] I agree with Simon that we should do what Autoconf does for userspace builds rather than trying to fiddle with it. Fiddling with it creates various annoying problems; either we override a user who wanted to set CC, or we lose on various Autoconf-generated tests because they're done assuming GCC and then we switch compilers.
[14:47:24] We need to separate the kernel compiler out to a separate setting.
[14:47:26] the real problem with knowing better, is when do we really know better? i'd like to just benchmark every compiler and do something sensible. of course, for the average user, it's the kernel which will do most of the AFS work anyway; only for admins will vos otherwise even be relevant, and generally, badly optimized? eh.
[14:47:52] In my experience, 95% of users either have gcc installed and want to use it or don't have gcc installed.
[14:48:04] The 5% case where they installed gcc but want to use something else can set CC.
[14:48:08] badly optimized (for things not vos) that is
[14:50:23] I actually want developers to use gcc by default because that's what all of our diagnostic infrastructure is built around. (Other compilers are capable, but I doubt anyone's going to volunteer to do the work Simon did for gcc with, say, IRIX cc.)
[14:51:00] Although if we were warning-free with IRIX cc, we could eat off of the code.... :)
[14:51:58] well, we'd have other issues. irix cc wants things in some cases that we can't really have
[14:53:01] never mind clang-analyzer. irixcc-analyzer.
[14:53:08] Yeah...
[14:53:34] Although, it wouldn't surprise me if XCode ships with clang by default, rather than gcc, before too much longer.
[14:56:10] That 95/5 split does not meet with my experience in big commercial unix installations...
[14:56:13] given gplv3...
[14:58:10] I was actually going to look at getting warnings/checking to work with sunwspro once I get a chance...
[15:04:19] * Russ is very dubious of the claim that gcc isn't optimized enough when applied to anything that isn't performance-critical scientific calculations or similar tight-loop, highly-optimized performance-critical code.
[15:04:46] I really doubt that compiler optimization is a meaningful bottleneck for the fileserver compared to all the other crap that slows it down.
[15:05:29] For simulation code and number crunching, absolutely, gcc can suck.
[15:05:53] Crypto is, I suppose, a marginal case.
[15:05:59] --- sxw has become available
[15:06:08] Although the best way to speed up fcrypt is not to pick a better compiler, but to replace it with AES. :)
[15:10:04] I can definitely say I've seen the cpu be a bottleneck in our code...
but yeah, crypto
[15:10:08] --- sxw has left
[15:10:28] but also things like lock contention and memory allocation, though I'm less sure how much our stuff would play into that
[15:13:07] --- sxw has become available
[15:14:35] --- sxw has left
[15:16:20] I'm dubious compiler optimization would help with that, too.
[15:16:25] Crypto, maybe.
[15:16:30] I guess I can kind of see it for fcrypt.
[15:32:41] --- kaj has become available
[15:32:51] --- kaj has left
[15:56:34] --- tkeiser has left
[17:43:06] --- Brandon Allbery has become available
[17:45:51] --- deason has left
[18:00:40] --- deason has become available
[18:52:43] re the gcc discussion, arla already has (and has solved) this problem
[18:53:08] might look at how they handle cc for nnpfs vs. everything else
[19:25:12] --- matt has become available
[19:29:17] --- matt has left
[19:44:19] --- mdionne has left
[19:50:45] --- geekosaur has become available
[19:50:51] --- Brandon Allbery has left
[19:52:08] huh. bitlbee is weird
[20:35:35] --- jaltman has left: Replaced by new connection
[20:35:36] --- jaltman has become available
[20:51:53] --- Brandon Allbery has become available
[20:52:09] --- Brandon Allbery has left
[21:07:34] I'm kind of tempted to try adding WITNESS support to struct afs_lock (which would give file/line information about when something was locked). Someone tell me this is a bad idea.
[21:08:18] i think it's a fine idea
[21:08:32] it's on my list but i have a large list
[21:10:24] --- jaltman has left: Disconnected
[21:10:31] --- jaltman has become available
[21:12:18] --- cclausen has become available
[21:12:18] In other news, cp seems to be deadlocking against itself (or maybe the per-minute daemon thread) on afs_xvcache; cp wants a shared lock at afs_vcache.c:1551 (GetVCache) but appears to already have a write lock.
[21:13:10] UpgradeSToWLock(&afs_xvcache, 21);
[21:13:16] is "have shared, want write"
[21:13:37] what is the actual cmdebug output?
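For readers following the lock discussion, here is a toy model of the two situations just described: an upgrade ("have shared, want write") and a holder of the write lock asking for a shared lock again. This is only a sketch under stated assumptions. struct toy_lock, its fields, and the toy_* functions are hypothetical and are not the OpenAFS struct afs_lock or its macros; the model also ignores plain readers and real blocking.

```c
#include <assert.h>

/* Toy model only: NOT the OpenAFS struct afs_lock. All names are hypothetical. */
enum toy_mode { TOY_NONE = 0, TOY_SHARED, TOY_WRITE };

struct toy_lock {
    enum toy_mode excl;   /* exclusive-side state: none, shared, or write */
    int owner;            /* hypothetical id of the holding thread */
    int src_indicator;    /* caller-chosen number recorded at acquire time */
};

/* Try to take the lock shared. Returns 1 on success, 0 if the caller would wait. */
static int toy_try_shared(struct toy_lock *l, int who, int src)
{
    if (l->excl != TOY_NONE)
        return 0;                     /* someone already holds shared or write */
    l->excl = TOY_SHARED;
    l->owner = who;
    l->src_indicator = src;
    return 1;
}

/* "have shared, want write": only the current shared holder may upgrade. */
static int toy_upgrade_s_to_w(struct toy_lock *l, int who, int src)
{
    if (l->excl != TOY_SHARED || l->owner != who)
        return 0;
    l->excl = TOY_WRITE;
    l->src_indicator = src;
    return 1;
}

/* A request that would wait on a lock the requester itself holds can never finish. */
static int toy_would_self_deadlock(const struct toy_lock *l, int who)
{
    return l->excl != TOY_NONE && l->owner == who;
}
```

In this model, a thread that upgraded to write and then asks for shared again is refused, and toy_would_self_deadlock() reports that the holder and the waiter are the same thread, which is the pattern suspected in the core dump.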
[21:13:45] Right, so it's probably deadlocking against itself.
[21:13:56] maybe. or maybe you're misreading it :)
[21:14:01] What, you want cmdebug output? :p (I'm just looking at a coredump)
[21:14:17] Also a distinct possibility.
[21:14:31] well, what's in the core? got backtrace?
[21:15:04] Lots of backtrace; you want it here?
[21:15:13] pastebin or afs
[21:17:59] /afs/sipb.mit.edu/user/kaduk/freebsd/openafs/deadlock-2010-07-01/trace.txt
[21:21:28] Also daemon.txt in that directory may be relevant.
[21:25:51] hang on
[21:27:14] Hm, looks like afs_vnop_create.c:415 grabbed a write lock.
[21:27:38] Which matches the src_indicator
[21:29:09] 138?
[21:29:48] oh god. vgone(l) calling back into vop_close?
[21:32:04] i think you need to drop xvcache around in osi_TryEvictVCache before GUNLOCK, then reacquire after... at least as the code is *now* structured. but that structure is probably wrong. hang on
[21:32:11] Hm, that's odd. vnode_if.h:225 is in VOP_CLOSE
[21:33:37] but vgonel() definitely has a check to see if the vnode is in use, which calls VOP_CLOSE and _INACTIVE
[21:33:53] optimizer ftw.
[21:35:04] this is why inactive can end up with different locks held depending on code path, incidentally.
[21:35:13] but close i hadn't noticed before
[21:36:39] Cute.
[21:38:09] so yeah. TryEvict needs to drop xvcache before GUNLOCK; reacquire after GLOCK
[21:38:24] it has one caller. that caller always has it, W
[21:38:30] Okies.
[21:39:32] Do we have a list of the various src_indicator values in use?
[21:39:40] and... if you GUNLOCK, *slept should be set to 1
[21:39:56] nope. grep
[21:40:18] (I assume I want to put something unique there)
[21:40:23] yup
[21:41:59] use "2"
[21:42:07] oddly, nothing is.
[21:42:16] no, i lie. never mind
[21:42:36] 12 is safe
[21:43:06] I was going to use 340, but 12 works, too.
[21:43:58] use 340, then
[21:44:11] other callers are "nearby"
[21:44:36] at some point i want to make a debug tool which does something smart with lock location allocation and lookup. we'll have to renumber all locks above 1000
[21:46:14] It would be nice to have infinite time, yes.
[21:46:47] well, periodically i find time for things i shouldn't because (thing) pisses me off a lot "now"
[21:46:57] like, the darwin panic decoder
[21:47:31] --- Kevin Sumner has left
[21:47:34] --- meffie has left
[21:47:37] "here are all my kernel debug kit dmgs" "here is the tree of afs installer dmgs" "here is a panic log" (wait) get back a decoded panic
[21:47:56] Sure. (I do this, too, from time to time.)
[21:48:08] --- Kevin Sumner has become available
[21:48:12] --- meffie has become available
[21:49:16] Hm, apparently 'slept' was NULL ...
[21:49:29] uh. &fv_slept is passed in?
[21:49:50] and is clearly not null: int fv_slept;
[21:49:52] ... maybe I should have pulled a crash dump. "Let's try that again"
[21:50:20] and what i mean is if you are on a path where you drop glock, fv_slept is supposed to get set to 1.
[21:50:25] ... and it wasn't actually NULL, anyway; rather 0x10
[21:50:34] Right.
[21:51:19] e.g. the vlru could have been re-sorted
[21:57:14] Dumping ... and now I relocate to home.
[21:57:37] --- deason has left
[22:13:57] --- reuteras has become available
[23:18:29] grr. silly optimizer, osi_TryEvictVCache is not in vnode_if.h
[23:31:18] --- Russ has left: Disconnected
[23:46:15] Hm. Looks like AFSTOV(avc) is null for the VOP_UNLOCK. If I declare a local struct vnode *vp = AFSTOV(avc) at the top of osi_TryEvictVCache, I no longer page fault, but I do get to deal with reclaim ...
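The ordering suggested earlier for osi_TryEvictVCache (drop xvcache before GUNLOCK, reacquire after GLOCK, and set *slept to 1 on any path that drops the global lock) might look roughly like the sketch below. It is a single-threaded toy under stated assumptions, not the OpenAFS code: toy_lock, glock, xvcache, os_recycle_vnode, and try_evict are hypothetical stand-ins, and the callback taking both locks is assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock that only records whether it is held; acquiring a held lock
 * trips an assert, standing in for a self-deadlock. Hypothetical names. */
struct toy_lock { int held; };

static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_release(struct toy_lock *l) { assert(l->held);  l->held = 0; }

static struct toy_lock glock;    /* stands in for the AFS global lock */
static struct toy_lock xvcache;  /* stands in for afs_xvcache, held for write */

/* Stand-in for an OS routine that may call back into the filesystem
 * (as vgone was seen reaching VOP_CLOSE) and take both locks itself. */
static int os_recycle_vnode(void)
{
    toy_acquire(&glock);
    toy_acquire(&xvcache);
    toy_release(&xvcache);
    toy_release(&glock);
    return 1;                    /* pretend the vnode was recycled */
}

/* Caller holds glock and xvcache on entry and again on return. */
static int try_evict(int *slept)
{
    int evicted;

    assert(slept != NULL);
    toy_release(&xvcache);       /* drop xvcache first ... */
    toy_release(&glock);         /* ... then the global lock (GUNLOCK) */
    *slept = 1;                  /* caller's view of the list may now be stale */

    evicted = os_recycle_vnode();

    toy_acquire(&glock);         /* GLOCK */
    toy_acquire(&xvcache);       /* reacquire xvcache after the global lock */
    return evicted;
}
```

A caller would take both locks, pass the address of a local flag such as fv_slept, and restart its scan when the flag comes back set, since another thread could have re-sorted the list while the locks were dropped. If try_evict skipped the two releases, the toy_acquire calls inside os_recycle_vnode would trip their asserts, which is the toy analogue of the deadlock in the trace.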