[00:01:54] --- Russ has left: Disconnected
[00:11:23] --- shadow@gmail.com/owl8E1009BB has left
[00:24:54] --- summatusmentis has become available
[00:34:07] --- dev-zero@jabber.org has left
[01:14:20] --- dev-zero@jabber.org has become available
[01:31:40] --- Simon Wilkinson has become available
[01:51:59] --- jaltman has become available
[02:41:33] --- haba has left
[03:35:10] --- haba has become available
[04:02:19] --- meffie has left
[04:53:39] --- shadow@gmail.com/owl25200966 has become available
[05:02:44] --- kula has become available
[05:17:49] --- pod has become available
[05:50:24] --- jaltman has left: Disconnected
[06:06:53] --- meffie has become available
[06:22:12] --- meffie has left
[06:54:38] --- deason has become available
[06:55:29] I'm sorely tempted to clone the OpenAFS tree into github, if only to get their code browsing tools ...
[07:03:23] --- meffie has become available
[07:04:41] i considered it (cloning to github)
[07:05:10] the risk is someone forks that and does development on something stale
[07:08:20] --- Simon Wilkinson has left
[07:08:26] --- stevenjenkins has left
[07:11:17] --- stevenjenkins has become available
[07:14:37] --- Simon Wilkinson has become available
[07:14:53] Indeed.
[07:15:23] I wonder if we could make gerrit push to github at the same time as it pushes everywhere else.
[07:15:58] (reason for lag: corrupted backup volume + time machine = mac go boom )
[07:16:26] assuming it will 1) use an ssh key and 2) push more than one place at once, sure
[07:16:52] It will definitely do both
[07:17:09] It will even do both in different threads, so that if one is slow, the other is still up to date.
[07:17:44] well, then it seems simple
[07:18:11] i did some brief github playing and shoved the bochs iphone port into it
[07:18:39] if i had free time i would port the work i did to make mplayer do iphone audio to bochs. yeah right. time!
[07:18:50] --- stevenjenkins has left
[07:18:58] the branch visualization is really nice
[07:19:24] --- stevenjenkins has become available
[07:22:58] BTW: what did you mean in your gerrit comment about max(5%, bulkstat size)
[07:23:20] My thought was that we should only try and flush vcaches when we're over the cacheStat limit that the user gave on the command line.
[07:55:50] i figured we should have a point below that where we started looking around for some to reclaim
[08:01:31] Where does bulkstat size come into it?
[08:04:37] when you bulkstat you're gonna want to populate that many vcaches. not needing to hunt for one vcache each time would be nice
[08:05:40] Ah. Okay. Sure.
[08:06:01] Is VCACHE_FREE set to that value currently?
[08:07:00] nope. it's set to something silly like 5
[08:07:08] 5 in fact
[08:09:03] The trade-off in upping that is going to be in the amount of time you hold xvcache for.
[08:09:46] you can either hold it longer once or have lock contention. everything has tradeoffs
[08:12:18] I'm just thinking of this now, because there's just been a case in the Linux kernel where particularly large NFS directories would result in dcache_lock being held for 10secs or longer whilst the dcache is pruned.
[08:39:51] --- jaltman has become available
[08:55:13] --- reuteras has left
[09:24:43] --- Simon Wilkinson has left
[09:35:56] --- Russ has become available
[10:10:54] --- Russ has left: Disconnected
[10:20:55] --- reuteras has become available
[10:21:07] --- reuteras has left
[10:31:19] --- haba has left
[10:45:37] --- meffie has left
[10:49:32] --- Russ has become available
[11:30:05] --- dev-zero@jabber.org has left
[12:15:12] --- meffie has become available
[12:21:17] --- dev-zero@jabber.org has become available
[12:36:12] --- haba has become available
[12:48:50] --- dev-zero@jabber.org has left
[12:55:55] --- Simon Wilkinson has become available
[12:56:08] meffie: You around?
[12:56:49] yes
[12:57:22] So, 125596, looks like you've got either memory scribbling, or duff memory.
[12:58:12] yes, i have three cores, from different hosts, all similar.
[12:59:54] something triggered this bug it seems
[13:00:23] Okay. So not duff memory, then.
[13:00:58] How does the rest of the vcache look? Is it just callback that's corrupt?
[13:02:53] This is 1.4.11, right? Is it 1.4.11 or 1.4.11-snowleopard?
[13:03:08] the vcache looks reasonable i think. just the contents of the memory callback points to is a text string of an afs path. "/afs/cellname..."
[13:03:34] 1.4.11 on solaris sparc
[13:04:12] It's interesting that it's a path. The cache manager doesn't really get to see paths that much.
[13:05:19] Except as symlink data
[13:05:28] maybe it's symlink data? i don't know. deason has been looking at the cores too.
[13:05:31] Yup. Which should be in linkdata
[13:06:00] So, what's the linkData on that vnode?
[13:06:19] linkData is NULL, it was nulled right before queueVCB
[13:06:34] What's the FID of the vnode, and what does that FID point at on the disk?
[13:06:58] oh wait, what the hell
[13:07:07] I thought I put in an analysis of a possible locking problem hours ago to that ticket
[13:07:43] I suspect RT may have bounced it. Marc got caught by that a few days ago.
[13:07:55] --- stevenjenkins has left
[13:08:06] Do you have an RT login?
[13:08:35] --- stevenjenkins has become available
[13:08:48] no
[13:09:03] oh, wow, yeah, none of my responses have showed up there
[13:09:09] I suspect me adding you as a CC on the ticket won't have allowed you to comment on it then. Sorry.
[13:09:12] I somehow haven't received any bounce messages either
[13:09:20] If you forward me what you said, I'll add it to the ticket.
[13:10:12] --- abo has left
[13:10:24] And I'll add this fuel to "got to sort this out" fire
[13:10:27] --- abo has become available
[13:10:36] @inf.ed.ac.uk, right?
[13:10:55] Yeh, or simon AT sxw.org.uk
[13:11:03] (That's sxw AT inf.ed.ac.uk )
[13:11:10] oh, I see, I think we have to be requestors for it to accept email from us
[13:11:31] because I know I've responded to other people's tickets before; I must have been set as a requestor when I requested to be added
[13:11:45] Being CC on a ticket should allow you to reply to it (not comment), if you use the correct email address. RT does not send bounces for mail it gets that fails access control. That's deliberate; otherwise it would generate lots of spam blowback.
[13:11:57] Let me try making you a requestor, rather than a CC.
[13:12:19] yeah, I should have tried looking at the ticket itself when I noticed mike was repeating some of the things I said :)
[13:12:22] The problem is that as a requestor, you won't see comments.
[13:12:41] I wouldn't see them anyway, even if I created the ticket
[13:13:08] Yeh. We should fix that too.
[13:13:36] "comments" and "correspondence" are not the same thing. If you want the requestor (or CC's) to see things, don't use comment.
[13:13:38] well, I thought that was kinda the idea of comments
[13:13:43] yeah, that
[13:14:03] What ticket?
[13:14:16] I think the real problem is that RT (the way we've got it configured) is more of an end-user help desk system - not an open source development thing.
[13:14:52] So people who come from other open source projects bug trackers find that things don't quite work the same way. And sometimes they feel that they're being pushed away by that.
[13:15:20] 125596 1.4.11 cache manager panic on solaris
[13:15:25] Oh, I see. "CC" and "AdminCC" are not the same thing. And in our current configuration, adminCC can comment but cannot reply, which is arguably a bug.
[13:15:38] What can 'CC' do?
[13:15:56] reply, I assume
[13:16:07] --- stevenjenkins has left
[13:16:14] also, Simon: messages forwarded to you
[13:16:22] Cool. I'll add them to the bug.
[13:16:26] Yes, CC can reply, just like a requestor.
[13:16:29] --- abo has left
[13:16:32] --- dev-zero@jabber.org has become available
[13:16:43] So CC is more powerful than Admin CC, in effect?
[13:16:48] --- stevenjenkins has become available
[13:16:53] it's supposed to be the other way around
[13:17:01] (Does the requestor see the replies from other requestors? )
[13:17:07] 'comments' are more privileged than 'replies'
[13:17:16] Ah, OK. The problem here is there's a group, openafs-bugs-view, which has the ability to see comments on anything in the queue and to add themselves as admincc to arbitrary tickets. Under the model at the time, we didn't want to give those people the ability to reply to tickets.
[13:17:22] --- abo has become available
[13:17:25] adminCC can send and view comments.
[13:18:32] ah, so they can add stuff to tickets, but it won't appear on the web interface, so you stop e.g. spambots from creating visible spam on existing tickets
[13:18:34] replies are public; comments are "internal". so, to avoid giving -view only people the ability to make a public reply, admincc does not have reply
[13:19:47] Only the requestor, CC's, owner, and people in certain groups can reply to an existing ticket. spambots are not relevant; they can't gain any role with respect to an existing ticket.
[13:20:21] Simon: I think the main difference is just that we don't force you to sign up in order to file bugs; so those without an account have less access
[13:20:44] Yes.
[13:20:49] iirc, most web-based systems make you create some kind of account or something, so you can prevent things just automating form submission
[13:20:49] basically, we need to revamp some things so that RT can behave more like other ticketing systems, with a sane way for people to be able to sign up and then make public replies to anything.
[13:20:52] But we have no way for those who have signed up to have more access.
[13:21:07] And I don't think that we should have a way to have 'private' comments.
[13:21:29] Too late; that exists, and it would be bad to make previously private comments be public.
[13:21:40] There's probably a reason for having things that aren't sent to the requestor - but I don't think we should be in the business of hiding things in the bug tracking system.
[13:22:05] the security queue should, I believe
[13:22:18] The security queue is not public at all.
[13:22:23] Yes. And I think that's right.
[13:22:33] and you could keep existing comments private, but somehow make it more difficult to make new ones
[13:22:34] But I think the person who reports the bug should get to see the discussion around it.
[13:22:40] I mean, now it's just two links next to each other
[13:22:55] (assuming it's like most RTs I've seen)
[13:23:04] Yeh. It is.
[13:23:21] With regard to the public queue, we can either keep existing comments private and discontinue the ability to add new ones, or remove existing comments and open up the ability to send and view comments to everyone (with the effect that it becomes possible to add something that's not emailed to the requestor).
[13:24:06] I think I'd go for the latter, but we'd want to consider what information we might lose by doing so.
[13:24:09] Though now that I think about it, once we upgrade, it'll be possible via the web interface to enter a "reply" but control who gets email copies.
[13:24:24] which effectively means we get both features.
[13:24:27] You can do that now, I think, through the check boxes at the bottom of the page.
[13:24:55] Oh, is openafs's new enough? I use three or four RT's and can't keep straight which have which features.
[13:25:18] --- stevenjenkins has left
[13:25:23] --- abo has left
[13:25:39] --- abo has become available
[13:25:40] In any case, openafs-bugs-view currently has no members, so it is probably reasonable to simply allow admincc's to reply.
[13:26:02] --- stevenjenkins has become available
[13:26:36] Could you do that now, and we can thrash out the rest later?
[13:26:55] I figured I'd wait a few minutes for objections.
[13:27:06] But otherwise, yes.
[13:27:37] deason: I think that's all of your comments added to that ticket.
[13:27:46] yep, that's them
[13:27:54] Going to read through the locking thing you pointed out.
[13:27:54] thank you
[13:28:05] --- haba has left
[13:28:14] In general, locking issues are only a problem if there's a GLOCK drop in the middle of when we should have the locks.
[13:28:37] there is; we can drop it to obtain afs_xvcb write lock, yes?
[13:28:58] Yes, if we can't get the lock immediately, we'll drop it.
[13:29:04] Or if there's a potential deadlock. Remember, acquiring any lock could involve a GLOCK drop.
[13:29:24] Well, except when we know we already had the lock since the last GLOCK drop.
[13:30:12] I've generally been fixing things even when they theoretically aren't a problem, on the basis that it would be really nice to kill the GLOCK at some point.
[13:30:49] --- haba has become available
[13:31:51] It would, but there are some cases where we depend in somewhat convoluted ways on it being safe to release a particular lock, or assume there won't be a GLOCK drop, to avoid an otherwise difficult-to-resolve deadlock. And those are a PITA to fix.
[13:32:18] Yeh.
[13:33:19] --- stevenjenkins has left
[13:33:41] --- stevenjenkins has become available
[13:37:28] deason: The question is going to be whether the xvcache is protecting the stuff we care about.
[13:37:53] s/xvcache/xvcache lock/
[13:38:36] yes, basically, if we see an error we won't hit, fixing it to make things hopefully sane later is the smart way to proceed
[13:38:54] sometimes this means "document that the locking model is entirely broken and move on"
[13:39:37] Derrick, do you know what locks struct server records?
[13:41:56] I can trace from afs_FlushServerCBs up to afs_GetServer not locking xvcache so far
[13:42:39] I thought it would be xserver or xsrvAddr, since that's what flushservercbs specifically holds, but that's just kinda intuition
[13:42:51] and assuming the comments about various locks are correct
[13:43:03] I think it's xvcb
[13:43:21] what locks them as in callers?
[13:43:34] What lock should you hold before modifying them?
[13:43:45] In particular, before modifying the hash chain
[13:44:11] should be xserver
[13:44:45] note i say *should*
[13:45:07] But should we also hold xvcb before playing with server->cbrs ?
[13:45:17] yes. but xvcb must be acquired first
[13:45:41] in fact... hang on
[13:45:46] wtf is that file called
[13:45:53] The Lock Hierarchy
[13:45:55] aha
[13:45:55] DOCS/locks
[13:45:59] src/afs/DOC/afs_rwlocks
[13:46:01] or something like that
[13:46:29] fs-cm-spec.h?
[13:47:51] So, according to the comments, we should hold xvcb before calling AllocCBR
[13:48:01] hm, so do you think getting avc->callback should happen after grabbing xvcb?
[13:48:33] probably. looking...
[13:48:50] Not sure what you mean by 'getting' - server structures aren't reference counted, are they?
[13:49:12] i assumed populating?
[13:49:28] I just mean, before we tsp = avc->callback
[13:49:42] yeah
[13:49:45] not 'get' and in 'get/put'
[13:50:08] your english good is.
[13:50:16] (is ok, i am not going so well myself)
[13:50:16] unpossible
[13:50:18] "as in"
[13:50:27] "not 'get as in 'get/put'"
[13:50:36] on the 3rd parse i got it
[13:50:38] I think we should get it before we call QueueVCB
[13:50:44] well, yes
[13:50:54] but it means
[13:51:00] yeah, that'd work, too, since we check for avc->callback non-null-ness
[13:51:16] we could also acquire lock, do check inside queueVCB, and return if it's null
[13:51:33] Actually, that should be get xserver, I think.
[13:51:56] We already get xvcb inside queueVCB - I think where that is is fine.
[13:52:45] sorry, was trying to ^C. anyway, there's just one caller. that caller can easily be modified but given that's true there's no pressing need to modify the caller rather than QueueVCB itself
[13:53:39] --- abo has left
[13:53:43] Simon: yes on xserver; my limited understanding says that's the right lock, but I wasn't sure
[13:53:59] it should just be that tsp moves inside a xserver lock, inside the xvcb lock. the xvcache lock is already "above" these
[13:54:08] I think we're good for the lock hierarchy
[13:54:12] --- abo has become available
[13:54:28] I mean, the question is just what lock do we need so tsp = avc->callback; gunlock/glock /* tsp doesn't point at garbage */
[13:54:29] Any patch which fixes this should document the locking assumptions it's based on :)
[13:55:14] /* go read src/afs/DOC/afs-rwlocks */
[13:55:22] Not the hierarchy
[13:55:32] It should say that the value of avc->callback is protected by xserver
[13:55:56] yeah, rwlocks isn't helpful, it just tells me '11. afs_xserver -- locked before afs_xconn in afs_ResetUserConns.'
[13:55:58] that file *should* be fleshed out further to explain what protects what. it does, in limited circumstances
[13:56:14] it's helpful. just not *for this*
[13:56:40] possibly more helpful for this is the comment for xserver's decl, which is afs_rwlock_t afs_xserver; /* allocation lock for servers */
[13:56:50] My preference would be that the structure definitions are annotated with which elements are protected by which locks
[13:57:19] which means while the lock information is with what it locks, it doesn't appear all in any one place
[13:57:33] so really i want a good source browser
[13:57:36] You can't win them all :)
[13:57:48] i can win them all. it just requires you to lose.
[13:57:50] I really must get that opengrok instance I ran a while back up and running again.
[13:58:21] disagree. we should see if debian has it and get russ to have (redacted) set it up
[13:58:36] Well that's a better plan.
[13:58:42] Why (redacted)?
[13:59:03] it's not mine to decide whether (redacted)'s identity should be known
[13:59:14] i know no reason not to share it, i'm just not going to be first
[14:00:06] Debian doesn't have opengrok so far as I can tell.
[14:00:13] there's an lxr instance set up on some place I never heard of before
[14:00:20] where?
[14:00:25] http://jjfiles.com/lxr/source , old though
[14:00:26] Debian does have lxr.
[14:00:47] John Livingston, Springfield, VA
[14:00:51] huh
[14:01:04] oh, that guy?
[14:01:05] I know him
[14:01:10] where is that on that page?
[14:01:13] --- stevenjenkins has left
[14:01:28] it's not. i learned how to use whois in kindergarten
[14:01:38] he's got internet stalking down to a fine art ...
[14:01:39] haha, right
[14:01:44] --- stevenjenkins has become available
[14:01:46] (i lie. in kindergarten they only let me have punch cards)
[14:02:25] pretty sure troy benjegerdes also had a lxr for a while
[14:02:43] i wonder if i spelled that right
[14:03:00] wow, google says yes
[14:03:45] Actually, FlushServerCBs just tramples over our entire locking model.
[14:03:50] woot!
[14:04:14] It modifies callback, dchint and f.states
[14:07:49] deason: I think there's a question here about what _should_ protect avc->callback
[14:08:23] Because I don't think having to acquire a global lock every time we want to look up its contents is going to be pretty.
[14:08:59] I don't think we need something protecting that member; we can get that fine; what we need is something to protect against a host we have a pointer to from being freed
[14:09:13] which is what afs_xserver looks like to me
[14:10:02] Yeh. It's just it stops _anything_ from happening to _any_ server structure.
[14:10:11] Holding it isn't exactly low cost.
[14:11:33] well, we could reasonably add a per-server-object lock
[14:11:44] Or reference count the server object. Or both.
[14:12:04] if one, a lock would be my preference. i want to be able to lock out anyone else
[14:12:13] I still have a concern about what protects avc->callback, too.
[14:12:29] But I suspect we're not going to fix the cache manager's locking model this evening.
[14:16:00] if we assume we have glock, I'm not sure I see the problem; it's always either NULL or points to something valid
[14:16:28] or maybe I'm just forgetting what I understood about locking models; it happens sometimes
[14:16:30] The problem is if you start an operation assuming it's valid, and then have it removed from under you.
[14:16:56] you can put the pointer in a local var, and then you just need to lock/refcount/whatever the server object itself
[14:17:02] Something like if (avc->callback) { ObtainWriteLock(blah); avc->callback->foo = blob } isn't safe.
[14:17:14] yes, we don't do that anyway
[14:17:19] Really?
[14:17:31] at least, not in this specific specific instance :)
[14:17:36] Yeh, not here.
[14:17:36] tsp = avc->callback; and all that
[14:17:46] well, I suppose we do, actually
[14:17:47] I think just getting xvcache is fine for this specific specific instance
[14:17:59] since we check avc->callback, and then use avc->callback inside afs_queueVCBs
[14:17:59] xserver, even, sorry.
[14:18:25] but we don't drop glock between the two lines that reference it, so effectively we don't :)
[14:18:26] If you can guarantee that every path that might free a server record holds xserver
[14:20:37] I was hoping someone else might know where else they could be
[14:20:57] grepping for 'free' and 'server' doesn't show anything else, but that's not exactly authoritative
[14:21:09] find . | xargs grep Free | grep 'struct server'
[14:21:17] is probably pretty conclusive.
[14:22:53] I'd be worried about something like afs_osi_Free(ts, sizeof(*ts)), but not so much to kill myself over
[14:23:10] we have a xserver lock in afs_FlushServer, where they are freed, but what about afs_QueueVCB?
[14:23:20] We don't have one there.
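Editor's sketch of the "put the pointer in a local var, then lock/refcount the server object" idea from the exchange above. This is a single-threaded toy model under stated assumptions, not OpenAFS code: every `toy_*` name is hypothetical, the `toy_xserver_held` flag only models holding xserver (a real build would use a real rwlock), and OpenAFS server structures were not reference counted at the time of this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the OpenAFS definitions. */
struct toy_server {
    int refcount;   /* number of holders; freed when it drops to zero */
    int cbr_count;  /* stands in for the server's queued callback returns */
};

struct toy_vcache {
    struct toy_server *callback;  /* NULL, or a server we hold a reference on */
};

/* Models "xserver is held". A real build would use a real rwlock. */
static int toy_xserver_held;

static void toy_lock_xserver(void)
{
    assert(!toy_xserver_held);
    toy_xserver_held = 1;
}

static void toy_unlock_xserver(void)
{
    assert(toy_xserver_held);
    toy_xserver_held = 0;
}

/* Create a server attached to avc; the vcache owns one reference. */
struct toy_server *toy_attach_server(struct toy_vcache *avc)
{
    struct toy_server *ts = calloc(1, sizeof(*ts));

    if (ts == NULL)
        return NULL;
    ts->refcount = 1;
    avc->callback = ts;
    return ts;
}

/* Snapshot avc->callback into a local and pin it, all under the lock. */
struct toy_server *toy_hold_callback(struct toy_vcache *avc)
{
    struct toy_server *tsp;

    toy_lock_xserver();
    tsp = avc->callback;
    if (tsp != NULL)
        tsp->refcount++;
    toy_unlock_xserver();
    return tsp;
}

/* Drop one reference. Returns 1 if that freed the object, else 0. */
int toy_put_server(struct toy_server *ts)
{
    int freed = 0;

    toy_lock_xserver();
    if (--ts->refcount == 0) {
        free(ts);
        freed = 1;
    }
    toy_unlock_xserver();
    return freed;
}

/* Flush path: detach the server and drop the vcache's own reference. */
int toy_flush_callback(struct toy_vcache *avc)
{
    struct toy_server *ts;

    toy_lock_xserver();
    ts = avc->callback;
    avc->callback = NULL;
    toy_unlock_xserver();
    return ts != NULL ? toy_put_server(ts) : 0;
}
```

In this model a caller that took `tsp` with `toy_hold_callback()` can keep using it after a concurrent-style flush has cleared `avc->callback`, because the object is only freed when the last holder calls `toy_put_server()`. Whether a refcount, a per-server lock, or both is the better fit is exactly what the log leaves open.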
[14:25:10] there is one other place they are freed, but it's just during shutdown; should I assume that's okay?
[14:25:12] nor in afs_FlushVCache i think?
[14:26:38] FlushVCache doesn't free server objects, does it?
[14:27:43] no it doesn't. i mean i'm just double checking, we are not holding xserver there, right?
[14:28:06] no
[14:28:15] --- Simon Wilkinson has left
[14:29:25] --- Simon Wilkinson has become available
[14:29:57] Sorry, this mac seems to have decided not to tell me when I'm running out of juice.
[14:30:53] Nothing that calls FlushVCache can safely hold anything below the individual vcache lock in the lock hierarchy
[14:30:59] it's telling you it's time for scotch, not code
[14:31:44] Oh, it probably is.
[14:32:02] do we need a write lock on xserver, because we modify it? or is a write lock just for "I'm freeing a server"?
[14:33:17] If you believe that tsp->cbrs is protected by xvcb, then you just need a read lock
[15:21:29] --- jaltman has left: Disconnected
[15:22:01] --- deason has left
[16:12:10] --- dev-zero@jabber.org has left: Replaced by new connection
[16:12:11] --- dev-zero@jabber.org has become available
[17:41:15] --- Russ has left: Disconnected
[17:48:48] --- dev-zero@jabber.org has left
[17:49:03] --- dev-zero@jabber.org has become available
[18:04:28] --- Russ has become available
[19:27:16] --- deason has become available
[20:15:13] --- deason has left
[20:22:12] --- meffie has left
[23:56:51] --- Russ has left: Disconnected
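Editor's sketch of the acquisition order the log settles on for the QueueVCB path: xvcache is "above", xvcb is taken first, and xserver is taken inside it. This is a toy order checker under stated assumptions, not OpenAFS code: the `toy_*` names and the three-entry ranking are hypothetical, the read-versus-write distinction raised at 14:32 is not modeled, and the authoritative hierarchy is whatever src/afs/DOC/afs_rwlocks says.

```c
#include <assert.h>

/* Hypothetical ranks for illustration; a lower rank must be taken first. */
enum toy_lock { TOY_XVCACHE, TOY_XVCB, TOY_XSERVER, TOY_NLOCKS };

static int toy_held[TOY_NLOCKS];

/*
 * Try to take lock l. Returns 1 on success, or 0 if l itself or any lock
 * that should come after l is already held (an ordering violation).
 */
int toy_acquire(enum toy_lock l)
{
    int i;

    for (i = (int)l; i < TOY_NLOCKS; i++) {
        if (toy_held[i])
            return 0;
    }
    toy_held[l] = 1;
    return 1;
}

void toy_release(enum toy_lock l)
{
    assert(toy_held[l]);
    toy_held[l] = 0;
}

/*
 * The order discussed for queueing a callback return: xvcb, then xserver.
 * Returns 1 if both acquisitions respected the order, else 0.
 */
int toy_queue_vcb_order_ok(void)
{
    if (!toy_acquire(TOY_XVCB))
        return 0;
    if (!toy_acquire(TOY_XSERVER)) {
        toy_release(TOY_XVCB);
        return 0;
    }
    /* tsp = avc->callback and the cbrs update would happen here. */
    toy_release(TOY_XSERVER);
    toy_release(TOY_XVCB);
    return 1;
}
```

A caller that already holds xvcache passes the check, while a caller that holds xserver and then asks for xvcb is rejected, which is the inversion the "xvcb must be acquired first" remark warns about.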