[00:37:28] --- Russ has left: Disconnected
[00:45:14] --- reuteras has left
[00:45:54] --- reuteras has become available
[05:13:44] --- reuteras has left
[05:13:44] --- reuteras has become available
[05:25:04] --- reuteras has left
[05:25:04] --- abo has left
[05:25:04] --- reuteras has become available
[05:25:34] --- abo has become available
[05:29:34] --- reuteras has left
[05:31:14] --- jaltman/FrogsLeap has left: Disconnected
[05:38:41] --- reuteras has become available
[06:36:14] --- jaltman/FrogsLeap has become available
[07:40:26] --- mho has left
[07:41:13] --- deason has become available
[07:43:00] >Program terminated with signal 3, Quit.
that just happens if you stop the fileserver before the QUIT handler is installed
[07:44:32] I'm not sure I follow.
[07:44:59] bosserver kills viced with a QUIT signal; we install a signal handler to shut down gracefully on QUIT
[07:45:09] the default action of the QUIT signal without a signal handler is to dump core
[07:45:24] Okay. But we should do that installation quite early, right?
[07:45:56] there's some patch somewhere to do it earlier, but there will always be at least a small window like that
[07:46:29] --- haba has become available
[07:46:57] oh, it's already in 1.6, heh
[07:47:36] it used to be after volume attachment, now it's just right before it, I think
[07:48:20] the reasoning that it's not installed until then is that installing it earlier makes it a bit more complex, since it uses mutexes and stuff that aren't initialized until later
[07:48:41] but I just realized now that we could just install one right away that just quit without giving a core, and then put in the 'real' one later
[07:48:59] My thoughts exactly
[07:50:55] So if that only happened when I shut the fs instance down, then my earlier troubles were for other reasons.
[07:53:07] --- reuteras has left
[08:14:24] --- meffie has become available
[08:18:26] * haba back in business.
Hi everyone
[08:19:57] I have this sshd, OpenSSH_5.3p1 Debian-3ubuntu4, configured with GSSAPICleanupCredentials no, but it still seems that my tokens go away at logout (so a job started with nohup will not be able to write output anyway).
[08:21:52] I know this is an ssh question, but you folks here usually know ;-)
[08:28:44] --- reuteras has become available
[08:29:31] Or is this because the PAG is now a keyring and all the keyrings are cleaned up?
[08:35:48] --- reuteras has left
[08:53:36] --- mho has become available
[09:01:31] I have of course a workaround for the nohup problem, but I don't want to give it to the user in question, well, because, see for yourself:
kpagsh bash -c 'cp -p `echo $0 | sed s/FILE://1` `echo $KRB5CCNAME | sed s/FILE://1` ; afslog ; nohup ./testrun & ' $KRB5CCNAME
[09:03:00] didn't someone do patches or scripts for screen which basically do that?
[09:04:07] Some researchers in physics do want to learn to navigate around in screen, some just don't. ;-)
[09:04:40] i'm not suggesting navigating. i'm suggesting using it as nohup
[09:05:58] Ah, as in 'screen command' instead of 'nohup command'
[09:13:28] So in that case one needs to find or recreate these patches or scripts that probably should go into .screenrc.
[09:32:15] --- jaltman/FrogsLeap has left: Replaced by new connection
[09:32:16] --- jaltman/FrogsLeap has become available
[09:50:54] --- rra has become available
[10:03:53] --- jaltman/FrogsLeap has left: Disconnected
[10:04:01] --- jaltman/FrogsLeap has become available
[10:40:08] --- jakllsch has left
[10:42:37] --- mfelliott64428 has become available
[10:42:37] --- mfelliott64428 has left
[10:42:37] --- mfelliott6797 has become available
[10:42:37] --- mfelliott6797 has left
[10:42:37] --- mfelliott81763 has become available
[10:42:37] --- mfelliott has left
[11:02:06] --- haba has left
[11:02:36] I have some hackery that does that, but it depends on the fact that my .zshenv does it too, and is far from friendly for random users
[11:52:22] Only half paying attention, but it looks like we still need to define AFS_64BIT_ENV for the fbsd7{1,2,3,4} and 82 param.h files on 1_6_x.
[11:52:34] i believe you are correct
[12:47:04] --- jakllsch has become available
[12:59:23] --- shadow@gmail.com/owlCA75C6BE has left
[13:02:07] --- shadow@gmail.com/owlFB78C669 has become available
[13:13:11] Derrick, did you see my note about the panic starting up with 1.6.x on freebsd 7.3?
[13:14:01] last i knew you needed a console or something to get a backtrace?
[13:14:20] Nah, this is a different machine.
[13:14:36] (I am squatting a sipb office head for testing fbsd 7.)
[13:15:19] Though apparently I didn't actually send the backtrace here.
[13:20:42] Here it is:
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xffffffff8052d6c3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2 0xffffffff8052db4c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3 0xffffffff805ac57b in namei (ndp=0xffffff803fe4f8a0) at /usr/src/sys/kern/vfs_lookup.c:129
#4 0xffffff800086bb91 in osi_lookupname (aname=0xffffff000b345800 "/CellItems", seg=UIO_SYSSPACE, followlink=0, vpp=0xffffff803fe4f9a8) at /usr/ports/net/openafs/work/openafs-1.6.0pre1/src/afs/FBSD/osi_misc.c:44
#5 0xffffff800081a81a in afs_LookupInodeByPath (filename=Variable "filename" is not available.
) at /usr/ports/net/openafs/work/openafs-1.6.0pre1/src/afs/afs_init.c:262
#6 0xffffff800081ac3f in afs_InitCellInfo (afile=Variable "afile" is not available.
) at /usr/ports/net/openafs/work/openafs-1.6.0pre1/src/afs/afs_init.c:286
#7 0xffffff80008711fd in afs_syscall_call (parm=34, parm2=5422912, parm3=Variable "parm3" is not available.
) at /usr/ports/net/openafs/work/openafs-1.6.0pre1/src/afs/afs_call.c:914
#8 0xffffff80008270dd in afs3_syscall (p=0xffffff000b027ae0, args=Variable "args" is not available.
) at /usr/ports/net/openafs/work/openafs-1.6.0pre1/src/afs/afs_syscall.c:672
#9 0xffffffff807e8b7e in syscall (frame=0xffffff803fe4fc80) at /usr/src/sys/amd64/amd64/trap.c:920
#10 0xffffffff807d1beb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:339
[13:21:32] From a:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x7cf00000000
[13:22:30] I didn't look too closely at what's going on in the source, but cn_nameptr looks fishy in this printout:
(kgdb) down
#4 0xffffff800086bb91 in osi_lookupname (aname=0xffffff000b345800 "/CellItems", seg=UIO_SYSSPACE, followlink=0, vpp=0xffffff803fe4f9a8) at /usr/ports/net/openafs/work/openafs-1.6.0pre1/src/afs/FBSD/osi_misc.c:44
44 if ((error = namei(&n)) != 0) {
(kgdb) p n
$1 = {ni_dirp = 0xffffff000b345800 "/CellItems", ni_segflg = UIO_SYSSPACE, ni_startdir = 0x0, ni_rootdir = 0xffffff007fe402b0, ni_topdir = 0xffffff803fe4f8f0, ni_vp = 0x6cd8051fc60, ni_dvp = 0x0, ni_pathlen = 18446742974385903616, ni_next = 0xffffff007fe402a0
"\001\177", ni_loopcnt = 7557007084064, ni_cnd = {cn_nameiop = 0, cn_flags = 4, cn_thread = 0xffffff000b027ae0, cn_cred = 0xffffffff80b90a20, cn_lkflags = 10013824, cn_pnbuf = 0x0, cn_nameptr = 0x6cd
, cn_namelen = -2135356896, cn_consume = -549745800064}}
[13:23:02] yeah, well, what was passed in via the AFSOP_CELLINFO afs syscall? was it that?
[13:25:16] Perhaps I should have a closer look at what the afs configuration files look like on that machine.
[13:25:32] well, afsd should make it so you don't screw up like that
[13:26:05] oh. huh. no, i see an issue
[13:27:20] no, i'm wrong. mismatched braces mentally. the afsd code should work
[13:29:30] > what was passed in via the AFSOP_CELLINFO afs syscall?
Er, where?
[13:29:53] in afsd.c there is afsd_call_syscall(AFSOP_CELLINFO, fullpn_CellInfoFile);
[13:30:51] Oh. That would be a bit of effort to detangle, I think.
[13:31:14] well, it's just the argument to afs_InitCellInfo, basically
[13:31:22] --- deason has left
[13:31:23] --- deason has become available
[13:32:24] It seems to have been stored in a register.
[13:33:05] Though.
(kgdb) down
#4 0xffffff800086bb91 in osi_lookupname (aname=0xffffff000b345800 "/CellItems", seg=UIO_SYSSPACE, followlink=0, vpp=0xffffff803fe4f9a8) at /usr/ports/net/openafs/work/openafs-1.6.0pre1/src/afs/FBSD/osi_misc.c:44
44 if ((error = namei(&n)) != 0) {
(kgdb) p aname
$7 = 0xffffff000b345800 "/CellItems"
[13:33:53] sure. but that's after it passed through some hands.
[13:34:07] It may be all I can get from this dump.
[13:34:19] see if it's reproducible when you can
[13:34:34] Unless I can pull it from the syscall args directly.
[13:34:44] parm2
[13:35:00] (kgdb) p/x uap->parm2
$10 = 0x52bf40
[13:36:23] which ends up being copyinstr'd
[13:36:31] but i assume copyinstr is not broken
[13:37:23] One would like to hope so.
[13:39:05] Changed ThisCell to be sipb.mit.edu and it looks to still be reproduced. (We'll see what happens if the machine comes back up.)
[13:40:51] is (was?) there a CellItems file in the cache?
[13:41:14] It should be a memcache, unless I'm completely braindead.
[13:41:41] thanks. that's the detail i needed.
[13:45:16] yeah, i have a fix for you.
[13:45:35] Cool.
[13:47:42] the master version, gerrit 3651, should cherrypick clean onto 1.6.
[13:48:48] Hmm, that might even explain the crash that swills saw way back when.
[13:49:39] a 1.6 (only) delta that adds AFS_64BIT_ENV defines would be welcome, if you have time
[13:50:41] Probably late tonight.
[13:51:17] i may beat you to it then
[13:56:23] in fact i will
[14:52:50] Well, the patch lets me start now. I think the panic that wedged mega-man was something else, though.
[14:53:26] it looked like it
[14:53:37] from what joshua said anyway
[14:57:12] Also, the umount process hangs after shutdown, even to the point of preventing a clean reboot.
[14:57:47] I will see if (1) aklog works and (2) vop*pages are still broken before I run off to a meeting.
[15:01:03] unmount before shutdown and get a backtrace :)
[15:04:18] Amusingly enough, I could still break to the debugger after all vnodes were synced while reboot was hanging:
_sleep kern_synch.c:230
afs_osi_Sleep FBSD/osi_sleep.c:142
osi_StopListener FBSD/rx_knet.c:116
afs_shutdown afs_call.c:1284
[15:06:23] aklog works (or at least claims to), but
$ fs la .
panic: NOT MPSAFE and Giant not held
cpuid = 1
Uptime: 5m53s
Physical memory: 2033 MB
Dumping 1380MB: 1365 1349 1333 1317kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x0
instruction pointer = 0x8:0x0
and I can't get to the debugger.
[15:06:44] And I'm gone.
[16:02:46] --- deason has left
[16:34:16] --- Russ has become available
[17:31:25] --- meffie has left
[17:31:25] --- meffie has become available
[19:29:18] --- jaltman/FrogsLeap has left: Disconnected
[19:51:12] --- jaltman/FrogsLeap has become available
[20:55:16] --- deason has become available
[22:36:32] --- deason has left
[23:21:42] --- reuteras has become available