[00:06:09] --- stevenjenkins has left [00:06:40] --- stevenjenkins has become available [00:06:51] --- stevenjenkins has left [00:33:35] --- dev-zero@jabber.org has left [01:23:29] --- dev-zero@jabber.org has become available [01:41:10] --- Russ has left: Disconnected [05:27:36] --- jaltman has left: Disconnected [05:42:02] --- mmeffie has become available [05:59:23] --- Jeffrey Altman has left: Replaced by new connection [06:06:08] --- stevenjenkins has become available [06:38:04] --- deason has become available [06:53:55] --- Jeffrey Altman has become available [07:06:35] --- abo has left [07:13:15] --- reuteras has left [07:29:15] --- abo has become available [07:35:06] --- dev-zero@jabber.org has left [07:35:39] --- dev-zero@jabber.org has become available [07:37:16] --- jaltman has become available [08:35:30] --- mmeffie has left [08:39:19] --- dev-zero@jabber.org has left [09:03:42] --- manfred has become available [09:38:15] --- mmeffie has become available [09:57:47] --- jaltman has left: Disconnected [10:04:36] --- jaltman has become available [10:08:18] --- manfred has left [10:36:16] --- jaltman has left: Replaced by new connection [10:36:17] --- jaltman has become available [11:21:26] --- Russ has become available [12:09:18] --- dev-zero@jabber.org has become available [12:25:30] --- dev-zero@jabber.org has left [12:25:42] --- dev-zero@jabber.org has become available [12:37:39] --- manfred has become available [13:49:50] Simon, are you here? i think you may be right, 800 could be introducing a deadlock. [13:50:00] Hi [13:50:21] i think it does. [13:51:34] yeah, I thought it may not be an issue because we already possibly were... but we're introducing a new case where it can happen, so that's definitely not okay [13:52:04] More specifically, you'll deadlock with the place that you were racing with before :) [13:52:10] yeah :) [13:52:35] I'm fairly certain the only violator of the documented xserver/xvcb hierarchy is FlushServer; I'm trying to see if it's easier to fix it, or to reverse their roles [13:52:50] The other option is to change the lock hierarchy. [13:53:16] err, s/reverse their roles/reverse their positions in the hierarchy/, yeah [13:53:19] i'm thinking FlushServer is right, actually [13:53:20] I haven't looked at the impact of that, but there aren't many places that take xvcb, so it might be relatively easy to validate. [13:53:47] There isn't really a right or wrong - it's a case of "pick and order, and stick to it" [13:54:32] well, xservers [13:54:51] yeah, I understand that FlushServer's order makes more sense; it's kinda hard to change there [13:55:03] but if any one is different from the others, they're all arguably "wrong" [13:55:29] Indeed. [13:55:40] --- stevenjenkins has left [13:55:45] I think that here holding xservers is really just a sticking plaster for a bigger problem. [13:55:48] --- abo has left [13:56:03] okay, it looks like all of the code paths that lock both xvcb and xservers are trivial (one right after the other), except for flushservers [13:56:27] If you could do this in two patches, that would be great. [13:56:31] so, putting xvcb below xservers looks right to me [13:56:41] One to swap the lock hierarchy, and one to add locking to stop this race. [13:56:43] right; 1 reverse, 2 add xserver lock? [13:56:45] right [13:56:46] Yeh. [13:56:46] --- abo has become available [13:57:19] I'd imagine we'll want to pull them both up to 1.4 eventually too. [13:57:29] that's our intention, yes [13:57:36] I _really_ don't want master and branches to have different lock orderings. That would make my head explode. [13:58:52] agreed. [13:59:22] --- stevenjenkins has become available [14:02:53] --- dev-zero@jabber.org has left [14:05:12] I think there's a bigger issue here, in that I don't think taking xserver is sufficient in cases where we aren't protected by the GLOCK. [14:05:40] But I'm increasingly feeling that, if we do want to kill the GLOCK, we're going to have to radically overhaul the way we currently lock things anyway. [14:06:08] I _really_ don't want large changes on 1.4. Changing the locking order is a large change. You may think you have thought through all the implications, but (no offense) you are probably wrong. [14:06:24] Then review the change when it comes in. [14:06:53] But if 1.4 and 1.5 have different locking orders, then that hugely increases the chance that future pullups to 1.4 will break it. [14:07:25] True. So think about whether you can fix this without changing the locking order, and hold off the change until we are fairly sure we're abandoning 1.4 [14:07:41] i wonder if holding the xserver lock is really a red herring. it seems to me the issue is avc->callback is not ref counted. [14:08:08] There's a load of issues. [14:08:29] [14:08:32] The lack of reference counting of servers is what means that you have to hold xserver [14:08:49] My recollection is that there is a subtlely here, but I don't recall what it is. Which sucks, and indicates we really need to be better about documenting (in comments) sublte behavior -- and in such a way that it is reasonably easy to tell if the code has changed and no longer agrees with the comment. [14:08:56] But, there's also the issue that avc->callback is modified without holding avc->lock [14:09:32] xserver protects you from that too, because the only sites that modify avc->callback do so whilst holding avc->lock. [14:09:38] > avc->callback is modified without holding avc->lock IIRC, we depend on the GLOCK to help us make that OK. [14:10:07] Sadly, that doesn't work when you do tsp = avc->callback; ObtainWriteLock(); tsp->blob = foo [14:10:17] The ObtainWriteLock drops the GLOCK, and boom. [14:10:29] Yeah; that's unsafe. [14:10:38] --- deason has left [14:10:47] --- abo has left [14:10:50] Unless you _know_ the ObtainWriteLock is going to succeed without blocking, but such cases are very rare. [14:10:52] --- stevenjenkins has left [14:11:06] --- stevenjenkins has become available [14:11:06] --- deason has become available [14:11:12] --- deason has left [14:11:21] The only case I can think of where that's true is when you do ObtainWriteLock; ReleaseWriteLock; ObtainWriteLock [14:11:37] with nothing that drops the GLOCK in the middle. And that's fragile to the point of being broken already. [14:11:39] --- deason has become available [14:11:39] --- abo has become available [14:12:17] on changing the locking order... I believe that fixing this without changing the locking order requires a much more invasive change [14:13:10] That's my belief, too. [14:13:25] > Sadly, that doesn't work when you do tsp = avc->callback; ObtainWriteLock(); tsp->blob = foo 800 will change that assignment to be after acquiring the lock [14:13:39] (or well, it already does, but 800 will change; I believe the new 800 will also be that way) [14:13:51] I wonder if _just_ making that change will be sufficient to solve your problem. [14:14:35] (that is, move the lock Obtain before the assignment to tsp) [14:15:01] just moving it after xvcb and not acquiring xserver? yes, I'm fairly certain that will solve it, but I was wary as it's not the "right" solution [14:15:01] well, wait a sec. the locking order in already inconsistent. ah. [14:15:59] it just happens to work because the code that frees the struct servers also holds xvcb [14:16:25] I think it works because the GLOCK prevents concurrency. [14:17:00] It doesn't matter that the freeing code holds xcb - the freeing code can't run whilst you're manipulating tsp, because the GLOCK is held. [14:17:29] right; nevermind, I'm backwards [14:17:37] I was thinking even if we didn't have glock, but that's wrong [14:18:09] If we don't have the GLOCK, we're doomed in so many ways it's just not funny. [14:18:13] since xvcb is only held for a small part of freeing the struct server [14:18:22] I know, I was just trying to not make it worse [14:18:33] That's my general intention too. [14:19:17] But Mike's right, to fix this case we'd want reference counts. We'd probably want per server locks, and we'd almost certainly want to define which lock protects avc->callback. [14:19:36] so we could keep the existing 800 change exactly the same, except get rid of the added xserver locks [14:20:00] I think that's probably the change for 1.4 [14:20:05] and we can create a separate change to reorder xserver/xvcb, which I still think we should do [14:20:27] Yes. I think we should do it. Even if that change sits in gerrit for a while for consideration. [14:21:18] as we're already risking deadlock from the order in afs_FlushServer being different than afs_FlushVBCs [14:21:25] err, VCBs [14:21:45] I suspect the fact that we're not seeing that deadlock in wild means that something is currently saving us from t. [14:22:04] But yes, we should fix that. [14:23:15] Hmmm, disconnected gets this wrong, too. bah. [14:24:35] > the GLOCK prevents concurrency. Yes. [14:24:36] well, there is also that flushServer happens very rarely (I think?), and in FlushVCBs, the locks are right next to each other [14:24:52] --- manfred has left [14:24:55] and FlushVCBs is called in the afs_Daemon loop [14:25:12] every... 20 seconds, or something like that? [14:25:52] The fact that the server lock is a read lock means its much harder to deadlock on. You need a 3rd player to get involved. [14:25:57] so we are talking something like? http://pastebin.com/m3729346a [14:26:45] or maybe it's just because we don't ever block on acuiring some other lock between the xvcb/xserver locks in FlushServer &caller [14:27:03] --- abo has left [14:27:47] No. processA: ObtainWriteLock(&a); ObtainWriteLock(&b); processB: ObtainWriteLock(&b); ObtainWriteLock(&a) [14:27:48] oh wait, that doesn't work [14:27:49] --- abo has become available [14:27:52] will deadlock in our locking model. [14:29:24] FlushVCBs only locks xvcb conditionally; when it's called from AllocCBR, xvc is already held [14:29:29] xvcb [14:30:03] Simon: with glock, and those are the only two instances of those locks? [14:30:31] However, if A and B both take read locks on a, rather than write locks, we'll only deadlock if there's a 3rd player which takes a write lock on a first, and process A is woken before process B. [14:30:37] deason: Yes. [14:31:52] Actually, maybe not. processA will only sleep if there's another holder of a, so B won't get a look in. [14:32:15] So you need a process C that causes process A to block whilst obtaining lock a. [14:32:32] yes [14:32:57] there's a few other locks held between the two getting grabbed in FlushServer, but maybe something prevents them from being contested [14:34:39] I think we must just be lucky. Even the 'it's a read lock' theory doesn't hold water - afs_GetServer takes a Write lock on xserver [14:35:31] oh argh, right, we can't hold xserver at all before calling AllocCBR in QueueVCB [14:35:51] since AllocCBR->FlushVCBs, which holds xserver [14:36:06] arg. [14:37:29] that's not a show-stopper, you just have to make the lockit conditional apply to both locks, I think [14:37:56] You could. You'll need to check that all of the callers where lockit is 0 are getting both locks correctly. [14:38:20] there's only 2 callers in 1.4 [14:38:34] And you'll need to check that all of the callers of AllocCBR also get xserver [14:39:15] 3 callers of FlushVCBs in 1.5 (disconnected) [14:39:19] and only 1 caller of AllocCBR [14:40:09] disconnected is broken actually - I just spotted that. It should call FlushVCBs(1) [14:40:25] But I'll fix that shortly - it's a different issue. [14:41:06] If we want to fix this correctly, though, I think we're looking at too small a picture. [14:41:40] I'm thinking of 1.4; I'm not sure leaving things as they are is an option [14:41:58] but changing to e.g. a struct server refcount would be a 1.5 thing, right? [14:41:58] Oh, the 1.4 fix is the one we discussed earlier. Shift the lock. Move on with life. [14:42:06] But I think we want to fix this properly in 1.5 [14:42:19] okay, and assume since we haven't seen the existing deadlock, we're okay? [14:42:33] Yes. If someone reports that deadlock, deal with it then. [14:42:53] But I would be interesting in seeing patches to fix these problems properly for 1.4 [14:42:59] sorry, 1.5 - finger slipped. [14:43:25] (I doubt sticking plaster patches for 1.4 will make it in without corresponding real commits for 1.5) [14:44:20] how about a hold bitmap like the old viced host package? ;) [14:44:25] everyone loved that! [14:44:40] Yeh. NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO [14:45:09] Lock reordering seems like a good idea, if we can do it without too much pain. [14:45:28] I'd like to come up with some idea of which lock actually protects avc->callback, and have that be consistent throughout the code [14:45:49] --- abo has left [14:45:58] really, lock reordering? I thought you'd want something more general, like locking the individual struct hosts, or refcounts on them [14:46:17] I'd be happy to solve the ordering so it's consistent. [14:46:46] --- abo has become available [14:46:47] shorter term I suppose, but I thought the idea was to get rid of the xserver lock [14:47:14] or... that's probably used for some global hash thing, too [14:47:14] Well, reducing its scope would be good, I think. [14:47:25] Yeh. You'd still need it to protect the hash tables. [14:47:34] Individual locks are not necessarily the right answer. [14:47:54] Well, they can just increase lock contention. [14:47:59] Right. [14:48:35] I think the question here is whether we're trying to prepare the code for a glorious GLOCK free future, or whether we just want it to work right in the presence of GLOCKs. [14:49:09] I don't think there's much benefit in major lock surgery until we're actually looking to remove the GLOCK, and then I think our locking model is going to have to change beyond all recognition, anyway. [14:50:41] By looking at the bigger picture, what I meant was resolving the question of what actually protects avc->callback. Ultimately, this problem is due to that being changed behind our backs. I think we should decide which lock protects that field, document that somewhere, and make sure that we hold it correctly. [14:52:30] This is an issue, because we're queuing a callback break here. If the callback attribute on the vcache changes, then that means that someone's raced us - and that shouldn't happen. Sure, in this case taking xserver prevents both the race, and its symptoms, but that's symptomatic of this particular case - it's possible other races may occur where the other process doesn't hold xserver. [14:52:47] maybe I just haven't though enough about it yet, but I'm unsure if you mean protecting what avc->callback points to, or the contents of what it points to; that is, a per-vcache lock, or a per-host lock (assuming we went with individual locks) [14:52:56] for Windows, the equivalent of the callback and expiration field is protected by a rwlock on the vcache object equivalent. [14:53:32] I'm talking about protecting the avc->callback, not *(avc->callback) [14:53:41] okay [14:53:48] the object that callback points to is protected by a refCount that is in turn protected by a global lock [14:54:15] Hark at Windows and its grown up locking model :) :) [14:54:26] > and that shouldn't happen well, right now it's racing against when we determine a server's address changed (I think), which is when we null/free all of the associated avc->callback's [14:55:49] Yes. That's the case here. It doesn't matter that it's racing, as long as that race is resolved properly. That is, we should either queue a callback break that gets handled with all of the other queued callback breaks. Or, callback is NULL, and we shouldn't do anything. [14:56:06] What worries me is that I'm not convinced that the currently locking model guarantees either. [14:58:09] I really hate that we don't have a documented locking model and well-documented reasoning about why things are OK under that model and what assumptions were made. [14:58:21] I strongly advise that someone implement a lock order validation mode similar to what I did for Windows before you start playing with lock ordering. It just isn't possible to visually validate lock order hierarchies [14:58:34] We know that our lock hierarchy is broken :) [14:59:14] jhutz: I'm pretty sure that things aren't OK under our current model. As machines (and networks) get faster, we're gradually shaking out more and more bugs. [14:59:17] but what you don't know is how its broken and what the contention paths will be in the function call hierarchy [14:59:42] Unfortunately, it's not that simple. We have lots of cases where things _appear_ to violate locking order but are acceptable due to some assumption we make that would not be visible to such a tool, like "we can't block in this place" [15:00:24] so build that assumption into the validator [15:00:29] As long as you're not releasing the code, it isn't a huge amount of work to run OpenAFS against the Linux kernel lock validator. [15:01:58] Simon: I agree. The problem is that every time we find such a problem, several people spend several hours or days reasoning about the locking model, coming to a conclusion about what is wrong, how to fix it, and why their fix doesn't introduce a new problem. Then they make the change, fail to document any of that reasoning, and the next time someone has to do it all over again and comes to a different conclusion. The result is that the reasoning is frequently flawed and we introduce as many new bugs as we fix. [15:02:22] The Linux kernel lock validator doesn't know about our locking primitives. [15:03:05] So it doesn't actually tell us whether we're complying with our own locking model. And of course, it would also be nice to validate that things we think are protected by a lock really are. [15:03:15] No, but we can implement our locking primitives in terms of Linux kernel locks. [15:03:34] (doing shared locks hurts, and the validator won't catch those properly, but it is doable) [15:05:59] Oh, I see. [15:08:26] --- mdionne has become available [15:13:21] marc: Newest version of the creds thing looks good. [15:18:51] Simon: ok, thanks. the session keyring handling can be cleaned up later. handling it in crset/dup() doesn't give us much [15:19:33] I think I'd like to see, eventually, us treat afs_ucred_t as 'const', except where its gone through crdup. [15:19:52] But that's going to be a big change, and I don't think it's worth looking for that now. [15:21:33] that would help catch cases where we try to modify the creds, yes. but that might touch a lot more code [15:21:45] Yeh. Definitely not for now ... [15:22:15] the next patch i'm preparing deals with PagInCred - make it look at the passed credentials for the keyring instead of the current process [15:22:49] Which will allow us to do away with override_creds() :? [15:23:23] Yes, and do away with needing to look at groups [15:23:57] Very nice ... [15:24:40] at least for writeback by the flushing thread and pre-fetching this is working as expected [15:25:20] Great. [15:25:34] I've got a couple of changes I'd like you to take a look at, if you've got a moment. [15:25:45] I haven't had time to test them well enough to push them to gerrit. [15:26:22] The first sets the page writeback bits and drops page locks much earlier in writeout. Providing I've read the kernel docs properly, it should hugely reduce the chances of us deadlocking. [15:26:27] sure, send them over, have some time tonight. would be nice to have a gerrit or a git tree for experiment code [15:26:28] --- deason has left [15:26:42] experiment -> experimental [15:26:43] --- deason has become available [15:26:55] The second sets the page flags in prepare_write() / write_begin, so we don't read pages we've just written. [15:27:13] Okay. I'll email them through. [15:27:41] It would be nice to have an experimental gerrit. Which reminds me, I should have a conversation with Russ about hosting 'developer' trees on o.s.e [15:36:03] --- jaltman has left: Disconnected [15:48:27] Gerrit makes that relatively straightforward if all the patches are pushed through Gerrit. It's a bit harder if ssh accounts are needed, although with git-shell it's probably not too bad. [15:51:53] put it in the real gerrit for now, and mark it -1 verified? [15:59:13] I worry about the clutter from that - but it's definitely a short term solution. [15:59:33] Gerrit can let you push stuff through it without code review. In effect, it can work just like gitosis. [16:00:07] What I was thinking about is allowing developers to have their own OpenAFS trees hosted on o.s.e, in a similar way to the way that kernel.org functions. [16:00:27] But, it's possible that somewhere like github could do a much better job of hosting trees than we can ... [16:02:38] if you mark it -1 verified it will block [16:02:47] or, do you mean just pushing to not an incident? [16:03:40] I mean, pushing like you do to a 'normal' git repository. gerrit can be configured to just do access control, and not do code review at all. You can manage that configuration on a per-repository basis. [16:03:52] ah, ok [16:04:13] So, I could have a personal tree, you could have one, marc could have one, etc. We could all push whatever we wanted into those trees, and point people at them as 'works in progress' [16:04:46] sure [16:05:01] if there were something open source like github i would have suggested it. i did look [16:05:13] --- deason has left [16:06:02] --- deason has become available [16:07:54] --- deason has left [16:47:14] --- mmeffie has left [16:54:23] --- jaltman has become available [17:35:04] --- jaltman has left: Disconnected [17:46:31] --- deason has become available [18:01:33] --- Russ has left: Disconnected [18:17:48] --- Russ has become available [18:49:33] --- jaltman has become available [19:20:23] --- mdionne has left [20:20:39] --- jaltman has left: Disconnected [20:25:27] --- jaltman has become available [20:50:12] --- jaltman has left: Disconnected [21:03:43] --- dev-zero@jabber.org has become available [21:04:02] --- dev-zero@jabber.org has left [21:04:15] --- dev-zero@jabber.org has become available [22:06:07] --- reuteras has become available [22:26:17] --- deason has left [22:31:00] --- dev-zero@jabber.org has left [23:09:05] --- dev-zero@jabber.org has become available [23:25:32] --- dev-zero@jabber.org has left [23:25:45] --- dev-zero@jabber.org has become available [23:47:18] --- manfred has become available