[00:27:27] --- jaltman/FrogsLeap has left: Disconnected
[00:27:35] --- jaltman/FrogsLeap has become available
[05:29:06] --- haba has left
[05:44:36] --- meffie has become available
[06:22:51] --- jaltman/FrogsLeap has left: Disconnected
[06:38:38] --- haba has become available
[07:12:55] > build the kernel
           I was using a 'buildworld' (that is, full base system userland build) as a test for a while, as well.
[07:44:32] --- deason/gmail has become available
[08:01:48] --- phalenor has left
[08:01:53] --- phalenor has become available
[08:16:19] --- phalenor has left
[08:16:25] --- phalenor has become available
[08:18:40] thanks all
[08:20:26] building openafs in afs iteratively with md5 sums of sun4x_510/dest/* logged after every run, then distclean, then zero-ing cache and re-setting its size
[08:20:30] sound reasonable?
[08:21:27] is there something in place nowadays that would catch all of the Linux cache corruption issues that were present a few years ago with releases?
[08:27:58] --- Russ has become available
[08:28:44] --- ktdreyer has become available
[08:36:13] --- jaltman/FrogsLeap has become available
[08:42:27] jblaine: what you are looking for is a regression test suite that contains reproducible test cases for each previously identified bug. such a beast does not exist.
[08:42:36] stress testing is not the same as a regression test suite
[08:44:04] fair enough, but stress testing is better than no testing, no?
[08:46:25] no one is arguing that you shouldn't test
[08:47:18] Even very simple sanity checking tests would catch cases where libafs.ko doesn't load
[08:47:37] however, do not confuse testing with a specific usage pattern as providing broad test coverage for all combinations of operations and success/failure cases
[08:48:10] jblaine asked about cache corruption
[08:49:34] --- jaltman/FrogsLeap has left: Disconnected
[08:58:02] --- jaltman/FrogsLeap has become available
[09:02:31] --- phalenor has left
[09:02:36] --- phalenor has become available
[09:12:01] --- haba has left
[09:23:36] --- jaltman/FrogsLeap has left: Disconnected
[09:46:52] hmm, md5 summing dest doesn't work as a test
[09:47:02] not comparing across builds at least
[09:53:14] kaduk@mit.edu/barnowl: Here now -- what's up?
[09:55:41] Was going back to my schroot/pam_afs_session woes that we discussed here:
           http://jabber.openafs.org/openafs@conference.openafs.org/2011-06-05.txt
           I got a chance to instrument the working login on a lucid machine, which had
           D(2): pam_putenv: set KRB5CCNAME=FILE:/tmp/krb5cc_20922_cLcfLE
           The failing session on a natty machine did not have a pam_putenv line for KRB5CCNAME.
           So, two questions: (1) Is this likely to be the "root source" of my problem, and (2) how do I tell which pam module is doing that putenv?
[10:30:25] --- Russ has left
[10:30:29] --- Russ has become available
[10:31:52] Yeah, probably, if that file doesn't exist in the chroot. And maybe ltrace?
[10:32:17] Or just running it inside a debugger.
[10:32:33] The file should exist, though, and schroot sets it for the environment of the chrooted process -- it just doesn't seem to be available for pam modules.
[10:32:48] > schroot sets it
           KRB5CCNAME
[10:34:23] Oh, wait, I see -- the failing session doesn't have it.
[10:34:42] Well, it's probably set by pam-krb5.
[10:34:50] and yeah, not having that set will definitely cause aklog to not be run.
[10:35:07] pam-krb5 will set KRB5CCNAME in the environment if one authenticates with a Kerberos password.
[10:35:16] Are you running the 2.4 release of pam-afs-session, btw?
[10:35:27] Oh, natty, so probably not.
[10:35:49] pam-afs-session 2.4 will fall back on KRB5CCNAME set in the general environment if it's not set in the PAM environment, which may just make your problem go away, regardless of what's causing it.
[10:36:04] libpam-afs-session 1.7-2
[10:37:45] Okay, good to know that I'm going in the right direction, and I even have a possible workaround to test!
[10:38:06] Thank you for the help.
[10:44:06] --- jblaine has left
[10:44:06] --- meffie has left
[10:46:41] --- meffie has become available
[10:56:54] --- jblaine has become available
[12:26:16] --- jaltman/FrogsLeap has become available
[12:34:31] deason/gmail: you there?
[12:34:59] ya?
[12:35:43] your dtrace for the sol 10 deadlock doesn't work for me with 1.4 patched
[12:35:45] should it?
[12:35:46] dtrace: invalid probe specifier fbt::osi_VM_MultiPageConflict:return { @["conflict"] = quantize(arg1);}: probe description fbt::osi_VM_MultiPageConflict:return does not match any probes
[12:36:09] 1.4 patched how?
[12:36:17] this is after the module is loaded?
[12:36:36] yes
[12:36:49] shadow: patched for the solaris 10 cache deadlock issue
[12:36:54] ah, i see the patch now that gerrit refreshed
[12:37:13] can you put 'dtrace -l | grep afs' in a pastebin or something?
[12:37:23] er, 'dtrace -l | grep fbt:afs'
[12:40:11] dtrace -l | grep fbt:afs --> nothing
[12:40:25] fbt::afs ?
[12:40:28] 2 colons?
[12:41:42] output is ID PROVIDER MODULE FUNCTION NAME
[12:42:14] fbt is showing in PROVIDER and afs in MODULE with functions like local_osi_Time
[12:42:22] there is no fbt:afs or fbt::afs
[12:43:26] sorry, let me get my solaris box up
[12:45:18] dtrace -l | grep -i multipageconflict shows nothing
[12:45:21] I meant, 'dtrace -l -m fbt:afs'
[12:45:21] fwtf
[12:47:28] http://pastebin.com/eWXNgx4Z
[12:47:59] or 'nm /kernel/fs/amd64/afs | grep osi_VM_MultiPageConflict' (or sparcv9, or wherever the module is)
[12:48:07] if that grep doesn't show anything, you're not running with the patch
[12:49:57] hm, I think I see what happened. I rebooted and we selectively pull AFS client stuff from NFS before our AFS init script runs. I got bit by that and reverted. Will retreat and try again.
[12:50:16] thanks
[12:51:33] the dtrace stuff also suggests that you're not running it of course... or dtrace screwed up or something, which is why I'd check with nm
[12:51:34] but yeah
[12:52:56] also, I get the feeling recently that the i386 buildbot is pretty borked
[12:57:33] for debian?
[12:58:36] yes
[12:59:06] or _something_ is; I don't know if it's the slave or what that's the actual cause, but it sure does seem to be failing on git or internal stuff a lot
[13:01:00] deason/gmail: sweet, works fine
[13:02:55] cool
[13:06:30] yeah, noticed that :-/
[15:36:16] --- deason/gmail has left
[16:58:08] --- summatusmentis has left
[17:25:45] --- summatusmentis has become available
[19:15:13] --- steven.jenkins has left
[19:15:28] --- steven.jenkins has become available
[19:17:02] Russ: I grabbed libpam-afs-session 2.4-1 from unstable and it seems to be a fix, or at least a workaround. So, "correct diagnosis", and thanks again.
[19:17:02] --- shadow@gmail.com/barnowlA109197F has left: Lost connection
[19:18:14] Sure thing. I'm guessing that for some reason some PAM module was exporting your regular KRB5CCNAME into the PAM environment in earlier versions and stopped.
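Note on the dtrace exchange above: 'dtrace -l' prints the provider and module in separate whitespace-separated columns (ID PROVIDER MODULE FUNCTION NAME), so a literal string such as fbt:afs never appears in the listing and grep for it finds nothing. A minimal Python sketch of filtering the listing by column instead follows. The function name find_probes and the sample rows are assumptions made for illustration; the probe IDs are invented and this is not captured output.

```python
def find_probes(listing, provider, module, function=None):
    """Return the rows of a `dtrace -l` style listing that match the filters.

    Each returned row is a tuple (id, provider, module, function, name).
    Rows that do not have exactly five columns are skipped; that is fine
    for fbt probes, which always carry a function name.
    """
    matches = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, short rows, and the header row.
        if len(fields) != 5 or fields[0] == "ID":
            continue
        _probe_id, prov, mod, func, _name = fields
        if prov != provider or mod != module:
            continue
        if function is not None and func != function:
            continue
        matches.append(tuple(fields))
    return matches


# Illustrative sample only: IDs are made up, layout follows the header
# quoted in the log (ID PROVIDER MODULE FUNCTION NAME).
SAMPLE = """\
   ID   PROVIDER            MODULE                          FUNCTION NAME
12345        fbt               afs                    local_osi_Time entry
12346        fbt               afs                    local_osi_Time return
12347        fbt               afs          osi_VM_MultiPageConflict return
"""

if __name__ == "__main__":
    for row in find_probes(SAMPLE, "fbt", "afs", "osi_VM_MultiPageConflict"):
        print(row)
```

If the filtered list is empty for osi_VM_MultiPageConflict on a real listing, that is consistent with the advice in the log: the running module probably does not contain the patch, and checking the module file with nm is the next step.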
[19:20:13] It sure looks like it. Darned if I know what, though -- the pam_krb5 version on the two machines claims to be the same.
[19:20:46] --- shadow@gmail.com/barnowlA109197F has become available
[19:20:56] What really is bugging me is that I want to say I read something early on in my diagnosis which said KRB5CCNAME was no longer passed in, like in a changelog or something. But I can't find it again.
[19:20:59] Yeah, pam-krb5 never would have done that. It only sets environment variables for its own stuff.
[19:21:21] It wouldn't have exported environment settings for pre-existing tickets, I'm fairly sure.
[19:26:56] (Do you want me to ponder possibilities loudly here, or should I shut up?)
[19:27:41] Oh, I certainly don't mind!
[19:36:01] Well, the pam.d directory is quite similar between the two machines; the newer one does not have pam_ecryptfs in a few files including common-*. pam.d/schroot is just
           auth sufficient pam_rootok.so
           @include common-auth
           @include common-account
           @include common-session
[19:36:53] I suspect a different version of a common PAM module rather than a different module entirely.
[19:37:01] Me, too.
[19:37:34] But would things from common-auth be able to affect the pam environment for the session stack? (I'm not really familiar with how pam works, alas.)
[19:38:51] Yeah, as soon as it goes into the PAM environment, it's there for the whole PAM session, unless the application closes out the PAM session and starts a new one.
[19:46:27] Well, it looks like we've got pam_unix, pam_winbind, pam_krb5, pam_echo, pam_deny, pam_permit, and pam_ecryptfs before pam_afs_session runs.
[19:47:47] Hm. Those all seem relatively unlikely.
[19:47:57] I wonder if libpam itself could have been lifting environment variables?
[19:56:02] grepping for KRB5CC in /lib/x86_64-linux-gnu/security/pam*so finds pam_krb5, pam_afs_session, and pam_winbind on the old (working) machine. Hmm, it finds nothing on the new (natty) machine ...
which is understandable since none of those three are present in that directory.
[19:57:03] Ah, they're in /lib/security and all three still match the grep.
[20:08:31] --- ktdreyer has left
[20:09:03] --- ktdreyer has become available
[20:14:53] (pam_winbind is from the samba source package) There certainly seems to be a function here _pam_setup_krb5_env that calls pam_putenv(ctx->pamh, "KRB5CCNAME=stuff") ... but this is in the source package I got from the new (broken) machine. Hmmm.
[20:34:43] Ah, huh, I wonder if that may be it.
[20:35:08] Maybe it changed its logic about when it does that.
[20:35:15] --- shadow@gmail.com/barnowlA109197F has left: Lost connection
[20:35:25] I am suspecting it is. (Currently pulling source packages for diff-ing.) Though the configuration in pam.d is the same on both systems, so that.
[20:36:07] --- shadow@gmail.com/barnowlA109197F has become available
[21:13:37] --- mfelliott83429 has become available
[21:13:37] --- mfelliott83429 has left
[21:13:37] --- mfelliott37436 has become available
[21:13:37] --- mfelliott37436 is now known as mfelliott91173
[21:13:37] --- mfelliott91173 has left
[21:13:37] --- mfelliott has left
[21:13:37] --- mfelliott37436 has become available
[21:13:55] --- jaltman/FrogsLeap has left: Disconnected
[21:14:07] --- jaltman/FrogsLeap has become available
[21:22:23] --- jakllsch has left
[21:30:56] --- mfelliott has become available
[21:30:56] --- mfelliott37436 has left
[21:34:17] --- jakllsch has become available
[21:50:22] --- kula has left
[23:40:44] --- Russ has left: Disconnected
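
Note on the KRB5CCNAME discussion above: the log describes two places the variable can live, the PAM environment (populated with pam_putenv by some module) and the general process environment, and says that pam-afs-session 2.4 falls back on the latter when the former lacks it. A rough Python model of that lookup order follows. It is only a sketch of the behaviour as described in the chat, not the module's actual code; the function name find_ccache_name and the fallback parameter are hypothetical.

```python
def find_ccache_name(pam_env, process_env, fallback=True):
    """Model the KRB5CCNAME lookup order described in the log.

    pam_env     -- dict standing in for the PAM environment (pam_putenv data)
    process_env -- dict standing in for the general process environment
    fallback    -- True models the 2.4 behaviour described in the chat;
                   False models a version that only consults the PAM
                   environment (an assumption about the older release)
    Returns the ticket cache name, or None if it cannot be found.
    """
    value = pam_env.get("KRB5CCNAME")
    if value is None and fallback:
        value = process_env.get("KRB5CCNAME")
    return value


if __name__ == "__main__":
    # Cache name taken from the debug line quoted in the log.
    cache = "FILE:/tmp/krb5cc_20922_cLcfLE"

    # Working (lucid) case: some module put the variable in the PAM environment.
    print(find_ccache_name({"KRB5CCNAME": cache}, {}))

    # Failing (natty) case: only the process environment has it.
    print(find_ccache_name({}, {"KRB5CCNAME": cache}, fallback=False))
    print(find_ccache_name({}, {"KRB5CCNAME": cache}, fallback=True))
```

In this model a None result corresponds to the situation in the log where aklog is not run because no ticket cache is visible to the module.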