[00:08:24] --- Russ has left: Disconnected
[00:11:31] --- kaj has become available
[00:17:32] --- kaj has left
[00:27:08] --- kaj has become available
[00:50:21] --- Simon Wilkinson has left
[01:09:27] --- rod has left
[01:40:16] --- Simon Wilkinson has become available
[01:51:21] --- Simon Wilkinson has left
[02:17:07] --- Simon Wilkinson has become available
[03:19:22] --- rod has become available
[03:28:32] This keyring thing is rather nasty. I'm also looking at a situation where the keyring is destroyed (by, for example, pam_keyinit), and yet the groups remain.
[05:29:56] --- Simon Wilkinson has left
[05:39:44] --- Simon Wilkinson has become available
[05:41:56] --- meffie has left
[05:43:58] --- Simon Wilkinson has left
[05:44:08] --- Simon Wilkinson has become available
[05:47:34] --- meffie has become available
[05:59:20] --- Simon Wilkinson has left
[06:03:09] --- jaltman has left: Disconnected
[06:03:38] --- jaltman has become available
[06:17:51] --- jaltman has left: Disconnected
[06:38:33] --- jaltman has become available
[06:55:28] --- jaltman has left: Replaced by new connection
[06:55:28] --- jaltman has become available
[07:04:10] --- jaltman has left: Disconnected
[07:04:20] --- jaltman has become available
[07:06:37] --- kaj has left
[07:08:29] 1. cablevision just finished re-wiring the house from the pole. hopefully the cable modem will be able to hold the link signal now. 2. time warner will be at the nyc apartment to re-wire that modem this afternoon. 3. I'm not going to be able to pick up the Xserve today. I'm going to have to wait for it to be delivered on Thursday. I'm simply out of time. Which is fine since I will need to get the Cisco routers set up before I install the Xserve
[07:11:35] mix?
[07:11:38] oops
[07:23:42] --- deason has become available
[07:47:35] --- Simon Wilkinson has become available
[08:03:50] > but it's a little pathological.
not hypothetical, though :)
[08:08:07] There's actually an entire set of brokenness that we can encounter, sadly.
[08:08:29] I've got a fix for when keyring and groups don't match. But what do you do when the keyring has been destroyed entirely?
[08:17:30] --- meffie has left
[08:43:29] --- Simon Wilkinson has left
[08:48:13] --- jaltman has left: Disconnected
[08:50:48] --- reuteras has left
[08:51:15] I suppose it would be a bad idea (or impossible) to try to remove the pag gids from the group list when a keyring pag gets destroyed?
[08:51:45] well, i'd want to also gc the tokens if that wasn't done. but that would be my theory too
[08:53:39] err, not necessarily when it gets destroyed, but whenever you put back a ref or something
[08:54:27] i am now confused what you're proposing
[08:57:23] well, the way I came about saying that was.... if you have a pag key that gets destroyed, you could have several processes that reference that pag in their grouplist, right?
[08:57:57] yes.
[08:58:04] so you'd either have to walk the proc table to find and delete any gid pags (ew).... or, what you should do is just remove the pag from the group list whenever that process deletes its ref to the key
[08:58:25] at least...
I think; it's been a long time since I looked at that kind of thing
[09:02:39] oh, yeah, probably
[09:30:47] --- Simon Wilkinson has become available
[10:03:45] --- Simon Wilkinson has left
[10:13:57] --- jaltman has become available
[10:17:01] --- Kevin Sumner has become available
[10:18:57] --- jaltman has left: Replaced by new connection
[10:18:57] --- jaltman has become available
[10:21:43] --- rod has left
[11:33:05] --- rra has become available
[11:46:13] --- jaltman has left: Disconnected
[12:18:28] --- Jeffrey Altman has become available
[12:25:33] --- Jeffrey Altman has left: Replaced by new connection
[12:25:33] --- Jeffrey Altman has become available
[12:27:58] --- Jeffrey Altman has left: Disconnected
[12:51:20] --- jaltman has become available
[13:02:34] --- Jeffrey Altman has become available
[13:02:40] --- Jeffrey Altman has left
[13:06:28] --- Jeffrey Altman has become available
[13:06:32] --- Jeffrey Altman has left
[13:49:41] --- dwbotsch has left
[13:50:01] --- phalenor has left
[13:50:37] --- dwbotsch has become available
[13:59:54] --- phalenor has become available
[14:06:42] --- Simon Wilkinson has become available
[14:07:06] --- jaltman has left: Disconnected
[14:07:26] We don't have that ability, sadly.
[14:08:22] There's no guarantee that the keyring destruction callback is called by the process which destroyed the keyring. Even then, that's just one process - many processes may have had a reference to that key over time.
[14:08:35] Basically, the only thing you can do in that situation is walk the proc table.
[14:09:15] The other option, and the one that I'm leaning towards, is having PagInCred remove the special group if it discovers that there isn't a keyring any more.
[14:09:27] That allows users to escape from a PAG, but at least makes it clear what's happened.
[14:09:46] The other thing we need to do is to stop actually using credentials which are scheduled for destruction.
[14:14:00] --- jaltman has become available
[14:15:56] We _already_ walk the proc table periodically to GC PAGs
[14:16:11] Actually, we don't.
[14:16:17] Well, we used to
[14:16:23] Once keyrings are disabled, we no longer walk the proc table.
[14:16:50] We can't safely walk the proc table in recent (most 2.6.x) Linux, because to do so safely you need to hold the rcu_lock, and we can't, because we're not GPL.
[14:17:10] s/keyrings are disabled/keyrings are enabled/
[14:17:13] Oh, right.
[14:20:04] blargh. so, what is the original problem here? if we use keyrings, then we get notified when a key is destroyed, and we should be able to keep a list of active PAGs with refcounts. If we don't use keyrings, we can walk the proc table to find groups. Either way, we can tell which PAGs are referenced, and GC connections and credentials belonging to PAGs which are not in use, which we should already do. so, somewhere along the way I must have missed there being a new problem.
[14:20:40] if someone clobbers our keyring, we lose the keyring PAG, but we also have the PAG id recorded in the group list
[14:20:56] The problem is that in 1.4.x, we track PAGs in two places - the group and the keyring.
[14:20:57] so when we go to look up the PAG, we see the PAG id in the group list
[14:21:14] but we destroyed that PAG
[14:23:46] Yes, so if you destroy your keyring, then you risk your PAG not being referenced in a way we can see, which means your credentials may disappear. Of course, if you do something that triggers PAG recovery before the credentials have gone away, it's all good. If not, you lose. In cases where we intend to disassociate a process from a PAG, we should clear the group as well, so that PAG recovery can't reconnect the process to that PAG.
[14:24:01] Is there actually a situation where gid-based PAG recovery still helps us?
[14:24:11] None at all.
[14:24:24] But it was felt that moving to just using keyrings was too big a behaviour change for 1.4.x
[14:24:47] In which case, we should get rid of it, stop putting PAGs in GIDs, and get on with life. Well, in 1.6. Certainly not in 1.4.
[14:24:56] Is there a new bug in 1.4?
[14:24:59] Yes.
[14:25:33] er, what if we don't have keyrings? or is the suggestion to just use one or the other, not both?
[14:25:37] (I'm sorry to ask stupid questions, but I somehow have no context on this)
[14:25:53] So, 1.6 has the better answer.
[14:25:54] If we don't have keyrings, we're using groups, as always.
[14:26:32] Or else we're losing, if you have neither keyrings nor the things we need to do group-based PAGs. For example, on a new kernel with keyrings configured off, you lose.
[14:26:32] Where we have keyrings we use them, and they're authoritative. We populated the group list for the folk who check the group list to see if they're in a PAG, but the kernel module ignores the group list entirely.
[14:26:46] When we have groups, and we can hook the syscall table, we use groups.
[14:27:06] When we can't hook the syscall table, we stagger on as best we can, but you'll drop your PAG whenever you setgroup
[14:27:08] s/have groups/don't have keyrings/ Yes, that's the best option from 1.6 forward
[14:27:53] The problem is 1.4.x
[14:27:54] FWIW, I think if things like PAM had been around when this stuff was originally written, we might not have ever trapped setgroups(), and just required that things which called it also know enough to preserve PAGs.
[14:27:55] yeah, but I thought the trouble was determining what to do in 1.4
[14:28:08] OK, so let's talk about 1.4. What's the new bug?
[14:28:42] There are two different versions, both of which have the same cause
[14:29:06] The first is that, in some situations, you can end up with a keyring containing one PAG, and a group list containing the other.
[14:29:33] The PAG contained in the group list is garbage collected, and you mysteriously lose tokens.
[14:30:18] (forgot to mention earlier, the "pathological" case I mentioned before was also caused by a different openafs bug; so even if we don't solve the keyrings thing, the problem has already gone away)
[14:30:21] The second is that, in some situations, your keyring PAG gets destroyed (generally when a PAM module comes along and tries to give you a new session keyring). This garbage collects the PAG, despite the fact that it is still referenced by the group list.
[14:30:40] for that _specific_ case, that is; there are of course others
[14:30:53] We saw the second in testing for Fedora 13.
[14:31:20] We're seeing the first in production use on SL5 on heavily loaded servers - we've yet to identify the root cause, beyond the fact that the group and keyring PAGs are becoming disjoint.
[14:31:57] OK. So, the second situation has been true all along, and the solution is to configure your PAM stack not to do that. I believe I recall a past discussion about adding a pioctl and PAM module parameters to allow adding a PAG to an existing session keyring, instead of cutting a new one, in part to help deal with this sanely.
[14:32:39] and, as simon noted earlier, it would help these cases if we stop using "destroyed" pags immediately, instead of waiting for the GC to come around
[14:33:40] If the group and keyring PAGs are becoming disjoint, then obviously something is wrong. At present, the confusing part is that only the keyring reference "counts", but the group PAG is the one we actually use (groups are authoritative; keyrings are backup). In 1.6, the behavior might be that you'd end up using the wrong PAG, or none at all
[14:34:10] Typically, it is the group based PAG that is "wrong".
[14:34:25] (what tends to happen is a variation of getgroups(); setpag(); setgroups() )
[14:34:57] well, that might be because we wouldn't notice if the keyring pag was wrong
[14:35:02] I think we should be concentrating on the first thing, and on fixing it. The second thing (someone replaces your session keyring) is IMHO not really an AFS bug; it's a configuration problem.
[14:35:16] That's my feeling too.
[14:35:24] Although, the second is a useful way of testing the first.
[14:36:10] I have a patch for 1.4 that makes PagInCred check that the group PAG matches the keyring one, and if it doesn't, calls __setpag() with the keyring version.
[14:36:19] OK, so, if it's the group-based PAG that's wrong, then 1.6 won't have this, because it ignores the group-based PAG. And if it's wrong because a user-mode process is making it wrong, rather than because of an AFS bug, then groups-based systems won't have it, because in such systems we trap setgroups and don't allow it to change the PAG.
[14:36:39] Yeah. I don't think there is a problem with 1.6
[14:36:45] So, effectively, you switch to treating keyrings as authoritative.
[14:36:47] It's just that I can't deploy 1.6 here just yet.
[14:37:11] I am all for continuing to make 1.4 stable and usable.
[14:37:32] The question I'm wondering about is what the correct fix for 1.4 is.
[14:38:02] Backporting the 1.6 fix is challenging, as it's a big feature change, and because Marc pretty much rewrote the bulk of the PAG code to make it happen.
[14:38:26] So I think my 'fix up' solution is probably a good compromise for 1.4
[14:38:29] Good question. Switching to treating keyrings as authoritative seems like a big change, but maybe it isn't.
[14:39:04] Basically, you say that if both are present, and they don't agree, we make them agree, right?
[14:39:23] if only the groups are wrong, it's not really a noticeable change at all (except for those deliberately changing their pag with setgroups)
[14:39:23] and if only the keyring is present, we already copy it into groups, yes?
[14:39:36] Indeed.
[14:40:48] I don't care about people deliberately changing their PAG with setgroups. That's never been supported, and can only possibly work on 1.4.x on Linux new enough that we can't trap setgroups. If we can trap setgroups, we prohibit that.
[14:40:56] --- Kevin Sumner has left
[14:41:14] --- Kevin Sumner has become available
[14:41:28] So, except for something nonportable, unsupported, and which I doubt anyone is actually counting on, there isn't actually a behavior change. Well, except for fixing the bug.
[14:41:51] So, the fix I'm proposing sounds fine?
[14:41:51] In fact, come to think of it, the fact that setgroups() can change your PAG _is_ the bug.
[14:42:08] Yes, I think it does, if the code changes are not excessively complex.
[14:42:22] It's about 5 lines.
[14:42:40] I just need to get it onto one of our servers which are showing the bug to verify it solves our problem.
[14:42:53] Sadly they're multi-user machines, and scheduling downtime is like pulling teeth.
[14:43:37] just say someone tripped over the power cord ;)
[14:44:41] > If we can trap setgroups, we prohibit that ? I don't think so.... or are you just proposing that's what we change it to?
[14:44:59] I'm kind of surprised that my coworkers didn't hound me more than they did when I rebooted our machine (twice, actually) by calling a function in gdb (as a mortal user).
[14:45:22] "the PAGs tripped over the power cord"
[14:46:59] our users probably hang our machines and force a reboot more often than we have to reboot them for upgrades and the like
[14:47:34] "oh, this machine has 64GB of memory? I think I'll load up a data structure in R that uses 96GB!"
[14:48:23] No, that's what we _do_ do.
If we can trap setgroups, then if you are in a PAG before setgroups, you are in the same PAG after setgroups. Hm, except that's not what the code on Linux does.
[14:48:24] The Linux OOM killer is evil.
[14:48:57] We had someone running an rsync on a fileserver (don't ask) that used up all of the memory. Rather than take out the rsync, the oom killer hit the fileserver.
[14:49:00] It would be less evil if it didn't make totally wrong decisions.
[14:49:03] Cue day long AFS outage.
[14:49:24] where's that link for the OOM airplane analogy...
[14:49:55] http://lwn.net/Articles/104185/
[14:51:18] Yes, but see 104145, which contains a patch to allow certain programs to be exempted. Not exactly how I'd configure it, but...
[14:51:39] Simon Wilkinson: what exactly is going on with your machines that you're seeing pags disappearing? we've never seen anything like that, unless it's just going unreported...
[14:54:12] jhutz: there is a way to do basically that on modern linux, but I forget if that's the patch that they actually used
[14:54:23] (and I forget what the knob is called)
[14:54:47] So, what happens is that at some points users lose their tokens. The symptom is that they no longer have any tokens. Nothing is logged in dmesg, although they still have a PAG-style group in the group list, and they still have a valid afs keyring entry.
[14:55:38] Further prodding reveals that the PAG in the keyring differs from that in the group list. And that the PAG in the keyring has been destroyed.
[14:56:41] The PAG in the _keyring_ has been destroyed? That's odd.
[14:57:02] Sorry, the PAG in the group list has been destroyed. It's late here ...
[14:57:34] OK, that makes more sense.
[14:58:57] The thing that I don't know is when the group PAG became wrong.
[14:59:37] Because things keep working until they don't, which means that the group PAG must, for most of the life of the session, have valid credentials.
[15:00:08] Yeah. So... 1) use kprobes to track group-PAG changes?
2) use ksplice to install your fix?
[15:00:27] Yeah. Both of those are good ideas.
[15:00:46] Oh, but... 0) sleep
[15:00:54] That sounds like a better idea :)
[15:14:16] --- Simon Wilkinson has left
[15:19:56] --- jaltman has left: Disconnected
[15:20:06] --- jaltman has become available
[15:35:07] --- deason has left
[17:26:53] --- rra has left: Disconnected
[17:43:52] --- Kevin Sumner has left: Lost connection
[20:36:11] --- Born Fool has become available
[20:37:11] --- jaltman has left: Replaced by new connection
[20:37:12] --- jaltman has become available
[21:05:05] --- deason has become available
[21:47:26] --- Kevin Sumner has become available
[22:43:06] --- deason has left
[22:50:13] --- Russ has become available
[23:05:10] --- reuteras has become available
[23:50:37] --- Simon Wilkinson has become available
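
Editor's note: the 1.4 fix discussed between 14:34 and 14:39 (if the group PAG and the keyring PAG disagree, make the group agree with the keyring) can be illustrated with a small toy model. This is not OpenAFS code. `Process`, `set_pag`, `clobbering_setgroups_sequence`, and `pag_in_cred` below are hypothetical Python stand-ins invented for this sketch; the real PagInCred and __setpag() are kernel functions whose signatures do not appear in this log. The sketch covers only the mismatch case, and leaves the "keyring destroyed entirely" case as unchanged fall-back to the group list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Process:
    """Toy stand-in for the two places a 1.4.x client records a PAG."""
    group_pag: Optional[int] = None    # PAG encoded in the group list
    keyring_pag: Optional[int] = None  # PAG held in the session keyring


def set_pag(proc: Process, pag: int) -> None:
    """Hypothetical setpag(): record the new PAG in both places."""
    proc.group_pag = pag
    proc.keyring_pag = pag


def clobbering_setgroups_sequence(proc: Process, new_pag: int) -> None:
    """Model the getgroups(); setpag(); setgroups() pattern from the log.

    The saved group list still carries the old PAG, so writing it back
    leaves the group list and the keyring pointing at different PAGs.
    """
    saved_group_pag = proc.group_pag   # getgroups()
    set_pag(proc, new_pag)             # setpag()
    proc.group_pag = saved_group_pag   # setgroups() restores the stale list


def pag_in_cred(proc: Process) -> Optional[int]:
    """Return the PAG to use, repairing the group list on a mismatch.

    Assumption for this sketch: when a keyring PAG exists it wins, and
    the group list is rewritten to match it.
    """
    if proc.keyring_pag is not None:
        if proc.group_pag != proc.keyring_pag:
            proc.group_pag = proc.keyring_pag  # "make them agree"
        return proc.keyring_pag
    # No keyring PAG: this sketch falls back to the group list unchanged.
    return proc.group_pag


def demo() -> Process:
    """Reproduce the disjoint state, then let the lookup repair it."""
    proc = Process()
    set_pag(proc, 1)
    clobbering_setgroups_sequence(proc, 2)  # group says 1, keyring says 2
    pag_in_cred(proc)                       # lookup repairs the group list
    return proc
```

In this model, a lookup made after the clobbering sequence returns the keyring PAG and rewrites the group entry, so the stale group PAG can be garbage collected without the session losing tokens. That matches the "keyrings are effectively authoritative" reading at 14:36:45, under the assumptions stated above.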