[00:29:02] --- kaj has become available
[01:17:30] --- Simon Wilkinson has become available
[01:22:23] Gerrit should now be sending emails using TLS
[01:25:58] --- Simon Wilkinson has left
[01:35:53] --- steven.jenkins has left
[01:38:50] --- steven.jenkins has become available
[01:55:25] --- rod has become available
[02:52:37] --- jaltman has left: Replaced by new connection
[02:52:37] --- jaltman has become available
[03:43:16] --- Simon Wilkinson has become available
[03:44:29] Except that smtp.stanford.edu is returning a 421 error when it tries. So I've reverted back to sending email unencrypted
[03:47:17] --- Simon Wilkinson has left
[04:21:55] --- Simon Wilkinson has become available
[04:22:04] --- Simon Wilkinson has left
[04:40:23] --- Simon Wilkinson has become available
[04:51:57] --- Simon Wilkinson has left
[05:05:16] --- jaltman has left: Replaced by new connection
[05:05:16] --- jaltman has become available
[05:23:00] Simon: do you know which SMTP command the 421 error is in response to?
[05:25:04] --- Simon Wilkinson has become available
[05:49:29] --- Simon Wilkinson has left
[05:52:24] --- jaltman has left: Replaced by new connection
[05:52:24] --- jaltman has become available
[05:55:36] --- jaltman has left: Replaced by new connection
[05:55:36] --- jaltman has become available
[06:01:20] --- jaltman has left: Replaced by new connection
[06:01:21] --- jaltman has become available
[06:59:50] --- jaltman has left: Disconnected
[06:59:59] --- jaltman has become available
[07:11:56] --- reuteras has left
[07:15:26] --- jts has become available
[07:30:40] --- Simon Wilkinson has become available
[07:31:51] Lack of root access stops me from answering that question.
[07:32:07] All I know is the exception that Java throws - I can't get a packet dump to actually work out what's going on.
[07:32:35] It might be possible to work it out from the exception. Hang on.
[07:43:35] s_client -starttls smtp ?
[07:48:25] Well, the error is coming in response to us trying to start TLS - it's in the connection establishment code, rather than anywhere else. I've already turned off certificate validation.
[09:16:15] --- rra has become available
[09:34:38] --- kaj has left
[10:08:42] --- Simon Wilkinson has left
[10:08:42] --- Simon Wilkinson has become available
[10:09:41] --- Simon Wilkinson has left
[10:09:48] --- Simon Wilkinson has become available
[10:10:45] So, we're now seeing a variation of the issue that Andrew described in #125198
[10:11:17] Essentially, what's happening is that the PAG number stored in the keyring, and the PAG number in the group set are becoming distinct.
[10:11:40] inkling which is correct?
[10:12:06] and you are 1.4 or 1.5 on these clients?
[10:12:23] Well, the problem is that the garbage collection is done based on the keyrings being destroyed.
[10:12:33] So what we end up with is the keyring PAG being garbage collected whilst it's still in use by the group PAG.
[10:12:44] This is 1.4 - in 1.5 we fixed it by just ignoring the group PAGs entirely.
[10:12:56] syscall hooked?
[10:13:06] No syscalls hooked.
[10:13:09] if no, how this happened is obvious
[10:13:12] yeah, ok
[10:13:25] Yeah - how this happened is obvious. What I'm wondering is
[10:13:29] so.. backport the 1.5 change?
[10:13:32] 1) Will we be doing another 1.4.x release?
[10:13:37] we will
[10:13:48] 2) What kind of fix would be appropriate for that release.
[10:13:48] regardless of what else happens, we will
[10:14:35] Backporting the 1.5 change involves a big rewrite to the way that we handle PAGs. If that's acceptable, then I'll do that.
[10:14:36] I was wondering if something less invasive might be preferred.
[10:14:46] disable keyring pag gc from groups when not hooked
[10:14:58] less invasive would be ideal, yes
[10:16:21] With this kernel, disabling keyring pag gc means completely disabling gc.
[10:16:44] uh. can we gc pags based on unused keyring (only)
[10:17:36] I'm wondering if the solution is to have PagInCred check the PAG in the keyring, and if it doesn't match, update the user's groups so that they're in the correct PAG.
[10:18:06] so, here's an interesting question: does it make sense to call a volume which will later trigger VOFFLINE online in attaching if we return a volume pointer but it's to a non-V_inUse() volume? if not, VAttachVolumesByPartition should change, or VAttachVolumeByName needs to put back vp if V_inUse isn't set at the end of the function.
[10:18:44] > I'm wondering if the solution is to have PagInCred check the PAG
[10:18:45] it is.
[10:18:52] that was the original intent
[10:19:13] i'm not sure it's enough but it's probably good enough
[10:19:45] The other option is to do what 1.5 does and always use the keyring PAG, and just set the PAG in the group if there isn't one there already.
[10:19:45] What would you prefer?
[10:19:57] Okay. I'll take a look at doing that for 1.4
[10:20:22] behavior change for 1.4 is probably bad. even if almost certainly safe. just leaving it be is better
[10:20:46] since it won't be exactly what 1.5 is, we'd not even have operational "it's correct" experience. so the closer to untouched it can be, the better
[10:21:58] The only truly safe option would be to just disable garbage collection. But given the O(n^2) performance of the afs_user stuff that MS reported, that isn't a viable option.
[10:22:26] yeah
[10:25:30] --- Simon Wilkinson has left
[10:25:31] --- Simon Wilkinson has become available
[10:26:29] So the options boil down to - doing what 1.5 does (PAG rules them all, groups are informational), or doing a new thing (PAG updates group list on lookup, group list is authoritative otherwise)
[10:26:40] Risk wise, I wonder if the 1.5 approach is safer, though it is a behaviour change.
[10:26:56] s/PAG/keyring/
[10:27:14] Sorry yes.
[10:27:39] my take is risk wise 1.5 way is safer but i'd be worried about missing an edge case and being wrong about that
[10:28:44] Yes. Although we're finding the current 1.4 client unusable on Fedora 13 due to PAGs being incorrectly discarded, so all things are relative :)
[10:29:10] well, that could argue for change only where needed, but...
[10:29:23] It's occasionally broken on SL5, too.
[10:29:51] It's a race, so it depends on exactly how token acquisition is configured, and how speedy/loaded the machine in question is.
[10:30:06] do you have a reasonable test case?
[10:30:12] even vaguely?
[10:30:13] On Fedora 13, yes.
[10:30:29] then maybe the behavior change is less bad
[10:30:55] Andrew has a test case that will break everywhere, but it's a little pathological.
[10:31:13] --- jaltman has left: Replaced by new connection
[10:31:14] (Andrew's test case is that you do setpag(); getgroups(); setpag(); setgroups())
[10:31:14] --- jaltman has become available
[10:31:34] well, that's the simplest test, yes
[10:31:48] and well-known, but not...
[10:31:53] not uh... likely
[10:32:16] Yeah. We can trigger this with pam_afs_session in a bog standard Fedora 13 PAM stack.
[10:32:35] Pretty reliably throws your tokens away after 10->20 minutes (depends on when the GC run is scheduled)
[10:36:03] --- rod has left
[10:37:12] --- kaj has become available
[10:53:04] Well, crap, I wonder what's wrong with TLS negotiation.
[10:53:13] Simon, could you just change the outgoing smtp server to smtp-unencrypted.stanford.edu for right now?
[10:53:19] Until I get a chance to figure out what's going on?
[10:56:23] --- Simon Wilkinson has left
[10:57:49] --- jaltman has left: Replaced by new connection
[10:57:50] --- jaltman has become available
[10:58:33] --- sxw has become available
[11:19:28] --- Simon Wilkinson has become available
[11:21:56] Anyone mind if gerrit gets restarted? Shout now, or ...
[11:22:09] kick it
[11:23:46] Okay, we're now sending via smtp-unencrypted.stanford.edu
[11:37:20] And, actually, we'll still have the keyring vs groups issue on 1.5.x, if we're using a kernel that doesn't have creds structures attached to processes. Bah.
[11:58:00] --- jaltman has left: Disconnected
[12:04:04] --- sxw has left
[12:07:29] I'd forgotten just how messy 1.4.x builds are ...
[12:09:49] --- jaltman has become available
[12:32:29] --- sxw has become available
[12:49:41] --- sxw has left
[13:06:16] --- sxw has become available
[13:09:16] --- kaj has left
[13:09:20] --- kaj has become available
[13:15:45] --- geekosaur has left
[13:15:51] --- geekosaur has become available
[13:20:47] --- sxw has left
[13:30:47] Have a test for the 1.4.x PAG problem, which folk will be testing for me tomorrow. If all goes well, will push to gerrit.
[13:33:30] ok
[13:40:34] --- JSund_ has become available
[13:40:34] --- JSund_ is now known as JSund__
[13:40:34] --- JSund has left
[13:40:34] --- geekosaur has left
[13:40:34] --- JSund__ is now known as JSund
[13:40:34] --- JSund is now known as JSund_
[13:40:34] --- JSund_ is now known as JSund__
[13:40:37] --- geekosaur has become available
[13:44:43] --- JSund__ is now known as JSund
[13:44:43] --- JSund is now known as JSund_
[13:49:35] --- JSund_ is now known as JSund__
[13:49:35] --- JSund__ is now known as JSund
[13:50:22] --- sxw has become available
[13:52:42] --- sxw has left
[13:54:35] --- JSund is now known as JSund__
[13:54:35] --- JSund__ is now known as JSund_
[13:59:34] --- JSund_ is now known as JSund
[13:59:34] --- JSund is now known as JSund_
[13:59:34] --- JSund_ is now known as JSund__
[14:04:34] --- JSund__ is now known as JSund
[14:04:34] --- JSund is now known as JSund__
[14:04:34] --- JSund__ is now known as JSund_
[14:09:34] --- JSund_ is now known as JSund
[14:09:34] --- JSund is now known as JSund__
[14:14:34] --- JSund__ is now known as JSund_
[14:14:34] --- JSund_ is now known as JSund
[14:14:34] --- JSund is now known as JSund__
[14:14:52] --- geekosaur has left
[14:14:57] --- geekosaur has become available
[14:19:34] --- JSund__ is now known as JSund
[14:19:34] --- JSund is now known as JSund_
[14:19:34] --- JSund_ is now known as JSund__
[14:21:37] --- JSund__ is now known as JSund_
[14:21:37] --- JSund_ is now known as JSund
[14:22:07] --- JSund is now known as JSund_
[14:22:07] --- JSund_ is now known as JSund__
[14:22:07] --- JSund__ is now known as JSund
[14:22:11] --- JSund has left
[14:22:11] --- JSund_ has become available
[14:22:11] --- JSund_ is now known as JSund__
[14:22:16] --- JSund__ has left: leaving
[14:23:02] --- JSund has become available
[15:23:41] was the consensus that tviced*.[ch] should move to viced ?
[15:25:20] i believe it was, but i didn't want to do it as part of the other mess
[15:25:38] ok. I will make windows build and then move them
[15:36:54] --- shadow@gmail.com/owl56357369 has left
[15:38:13] --- shadow@gmail.com/owlECA78C6F has become available
[17:19:14] --- jts has left
[17:44:15] --- kula has left
[19:02:03] --- kula has become available
[19:18:48] --- phalenor has left
[19:28:41] --- phalenor has become available
[19:51:55] --- steven.jenkins has left
[19:51:57] --- steven.jenkins has become available
[19:52:17] --- jts has become available
[20:00:08] --- steven.jenkins has left
[20:00:38] --- steven.jenkins has become available
[20:11:01] --- phalenor has left
[20:21:02] --- phalenor has become available
[20:29:30] --- dwbotsch has left
[20:30:26] --- dwbotsch has become available
[20:39:57] --- Born Fool has become available
[20:47:34] --- phalenor has left
[20:57:35] --- phalenor has become available
[22:20:03] --- rra has left: Disconnected
[22:33:04] --- Russ has become available
[22:57:17] --- kula has left
[23:18:45] --- reuteras has become available
[23:24:42] --- Born Fool has left
[23:33:29] --- kaj has left
[23:49:42] --- rod has become available
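
The setpag(); getgroups(); setpag(); setgroups() sequence mentioned at 10:31 can be illustrated with a toy model. This is only a sketch and not OpenAFS code: ToyKernel, ToyProcess, run_scenario and the "group"/"keyring" policy names are all hypothetical, and it assumes (as a simplification of what the log describes) that a 1.4-style lookup trusts the PAG in the group list, a 1.5-style lookup trusts the PAG in the keyring, and GC drops tokens for any PAG whose keyring has been destroyed.

```python
import itertools


class ToyKernel:
    """Hypothetical stand-in for the kernel-side PAG bookkeeping."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.live_keyrings = set()  # PAG numbers that still have a keyring
        self.tokens = {}            # PAG number -> token

    def new_pag(self):
        pag = next(self._ids)
        self.live_keyrings.add(pag)
        return pag

    def gc(self):
        # Assumption: GC is driven purely by keyring destruction.
        for pag in list(self.tokens):
            if pag not in self.live_keyrings:
                del self.tokens[pag]


class ToyProcess:
    def __init__(self, kernel, policy):
        self.kernel = kernel
        self.policy = policy   # "group" (1.4-like) or "keyring" (1.5-like)
        self.keyring_pag = None
        self.groups = [100]    # plain gid; PAG entries are ("pag", n) tuples

    def setpag(self):
        if self.keyring_pag is not None:
            # Replacing the keyring destroys the old one.
            self.kernel.live_keyrings.discard(self.keyring_pag)
        self.keyring_pag = self.kernel.new_pag()
        plain = [g for g in self.groups if not isinstance(g, tuple)]
        self.groups = plain + [("pag", self.keyring_pag)]

    def getgroups(self):
        return list(self.groups)

    def setgroups(self, groups):
        # Restores a saved group list, including any stale PAG entry in it.
        self.groups = list(groups)

    def current_pag(self):
        if self.policy == "keyring":
            return self.keyring_pag
        for g in self.groups:
            if isinstance(g, tuple):
                return g[1]
        return self.keyring_pag


def run_scenario(policy):
    """Run setpag/getgroups/setpag/setgroups, get a token, GC, look it up."""
    kernel = ToyKernel()
    proc = ToyProcess(kernel, policy)
    proc.setpag()
    saved = proc.getgroups()
    proc.setpag()
    proc.setgroups(saved)  # group PAG and keyring PAG now differ
    kernel.tokens[proc.current_pag()] = "token"
    kernel.gc()
    return kernel.tokens.get(proc.current_pag())


if __name__ == "__main__":
    for policy in ("group", "keyring"):
        print(policy, run_scenario(policy))
```

In this model the "group" policy loses its token after the GC pass, because the group list still points at the first PAG, whose keyring is gone. The "keyring" policy keeps it. The real race also depends on timing and on how token acquisition is configured, which the model does not attempt to capture.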