[00:22:54] --- abo has become available
[06:08:37] --- deason has become available
[06:27:54] --- Roman Mitz has become available
[07:16:10] --- natefoo has become available
[07:18:26] hey all, i encountered a kernel panic from afs_background with 1.6.0(-3-debian) on kernel 3.1.0 on x86_64. is there a known issue/something fixed in dev or should i report it?
[07:21:34] report it. hopefully with a backtrace.
[07:22:10] or means to reproduce or pretty much anything other than "it panicked"
[07:23:24] yeah, i have a backtrace.
[07:23:41] all i was doing was copying data, i'll try doing that again to see if it happens again.
[07:23:48] i had cache bypass enabled but i assume that won't matter.
[07:23:56] it might.
[07:24:01] oh wait, i was copying out.
[07:24:09] sorry, this happened before break, everything's a bit fuzzy. ;)
[07:24:24] i *thought* i pushed everything i had for cache bypass fixes in, tho, and i pushed it fairly hard.
[07:24:52] they'd have to have been integrated downstream in the debian packages though.
[07:25:36] i don't know what russ has pulled from 1.6.1pre1 into 1.6.0-3
[07:28:46] reporting method still rt?
[07:28:57] mail openafs-bugs@openafs.org, which is rt
[07:29:11] --- haba has become available
[07:29:12] okay, thanks.
[07:30:27] Happy new year (and our central sysadmin group fixed that the jabber server talks to the rest of the world again as you see)
[07:30:33] pow! happened again.
[07:30:42] * natefoo submits.
[07:30:55] if reproducible, can you disable cache bypass and try? e.g. let's isolate
[07:31:01] yup.
[07:33:31] Is there a bug report in RT on that the AFS startup never terminates when network is not available at startup time, even if network appears later? Or even a fix?
[07:34:38] (kernel module loaded, afsd running but /afs not mounted)
[07:36:12] That then Ubuntu is stupid and tries to start openafs-client without network is an Ubuntu problem.
[07:38:44] harald, there is a report but i don't know the number offhand.
we've also taken a few passes at fixing it. of course the easy fix is dynroot
[07:41:37] Ok. The more annoying thing is that if you happen to be in that state, when doing an openafs-client stop wedges the box somehow.
[07:42:04] if you had a way to get a backtrace i suspect 1) you'd see a panic and 2) we could fix it
[07:42:41] bug report sent.
[07:44:00] There is nothing in the logs, but I could try sysrq and see if I get something out of that.
[07:44:11] sure.
[07:54:13] --- Russ has become available
[07:57:10] Hi Russ!
[07:57:47] My console shows a warning in bdi_forker_task
[07:58:35] wild guess: i wonder if we are doing bdi deallocation when we didn't get far enough to allocate it at the start
[07:58:43] er s/allocation/initialization
[07:58:48] And if I google on bdi_forker_task I get this one http://us.generation-nt.com/answer/bug-608173-similar-problem-using-newer-version-openafs-modules-dkms-help-204735731.html (as said, hi Russ)
[08:02:54] Yes, I think I have a duplicate of http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=608173
[08:04:49] However, this is 1.6
[08:06:44] haba: do you get COLD shutdown?
[08:06:51] Yes
[08:07:31] Yeah, that's probably that same thing, then.
[08:07:55] I don't know what's different between a COLD and a WARM shutdown, but based on the bug reports I've gotten, COLD shutdowns are doomed and pretty much always result in a kernel panic.
[08:09:18] There's a separate bug where non-dynroot AFS startup fails without a network. They're related only insofar the latter seems to lead to a COLD shutdown when you try to restart the client once the network is up.
[08:09:47] Actually, on Ubuntu, AFS should not be started from a sysv5 script in /etc/init.d but from a "new religion" conf file in /etc/init. The reason is that the sysv5 init stuff is started when IFACE=lo is available (but eth0 may not be ready yet).
[08:09:52] The openafs-client init script now depends on $network to try to make this less likely.
[08:10:28] haba: I'm not willing to try to support upstart until Debian has policy for how to do so properly, so I'm afraid Ubuntu users are on their own with respect to upstart for the time being (since I don't use Ubuntu myself).
[08:10:51] in /etc/init/rc-sysinit.conf (which starts the old scripts) it says: start on filesystem and net-device-up IFACE=lo
[08:11:27] Yeah, but openafs-client depends on $network, so insserv that Ubuntu uses to run those scripts should still do the right thing. But maybe there's something else broken.
[08:12:06] The script is run even if my network cable is OUT.
[08:12:19] So yes, BROKEN in Ubuntu.
[08:12:30] Well, the network cable being out doesn't necessarily prevent the network from being started; those are two unrelated things.
[08:12:46] If you have a static IP address, then even Debian will happily consider the network started once it's run ifconfig, even if the network cable is unplugged.
[08:13:49] yes. The actual event (default route supplied by DHCP) which you want is not there.
[08:14:58] But afs should cope starting up without net even in the non-dynroot case. At least continue starting when the net then appears.
[08:15:58] Currently I fix the problem on the workstations by
    root@tern:/etc/init# ed rc-sysinit.conf
    1513
    /lo
    start on filesystem and net-device-up IFACE=lo
    s/lo/eth0
    start on filesystem and net-device-up IFACE=eth0
    w
    1515
    q
[08:18:47] Yeah, if that fixes it, a real upstart rule for OpenAFS would as well.
[08:19:05] But yes, there's a deeper bug that the client croaks on startup if there's no network and it's not dynroot.
[08:19:29] BTW, why don't you use dynroot? That's why few people see this, I think. We switched to dynroot on all of our systems years and years ago.
[08:19:51] COLD is "mount failed and we didn't actually start being AFS"
[08:20:01] I ask because one easy way to fix this would be to just rip out all the non-dynroot code....
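The 08:13:49 remark is that the event a client really needs is a usable default route, not merely an interface (even lo or eth0) coming up. As a minimal sketch of that check, assuming a Linux host: the helper name `has_default_route` is hypothetical and is not part of OpenAFS, upstart or any init system. It scans the text of `/proc/net/route` for a route whose Destination and Mask are both `00000000` and whose flags include `RTF_UP` (0x0001).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* RTF_UP as defined in Linux <linux/route.h>; redefined here so the sketch
 * has no Linux-only header dependency. */
#define SKETCH_RTF_UP 0x0001UL

/* Hypothetical helper, not part of OpenAFS or any init system: returns 1 if
 * route_text (the contents of Linux /proc/net/route) lists a default route
 * that is up, i.e. Destination 00000000 and Mask 00000000 with RTF_UP set. */
static int has_default_route(const char *route_text)
{
    const char *p = route_text;
    int header = 1;

    assert(route_text != NULL);
    while (*p != '\0') {
        char line[256];
        size_t n = strcspn(p, "\n");
        size_t len = n < sizeof(line) - 1 ? n : sizeof(line) - 1;

        /* Copy one line, without its newline, into a bounded buffer. */
        memcpy(line, p, len);
        line[len] = '\0';
        p += n;
        if (*p == '\n')
            p++;

        if (header) {           /* first line is "Iface Destination Gateway ..." */
            header = 0;
            continue;
        }

        char iface[64];
        unsigned long dest, gateway, flags, mask;
        int refcnt, use, metric;

        /* Columns: Iface Destination(hex) Gateway(hex) Flags(hex)
         * RefCnt Use Metric Mask(hex) MTU Window IRTT */
        if (sscanf(line, "%63s %lx %lx %lx %d %d %d %lx",
                   iface, &dest, &gateway, &flags,
                   &refcnt, &use, &metric, &mask) == 8 &&
            dest == 0 && mask == 0 && (flags & SKETCH_RTF_UP))
            return 1;
    }
    return 0;
}
```

A pre-start step could read `/proc/net/route` into a buffer and poll this check before launching the client. That would only paper over the timing problem; it does not touch the deeper bug noted at 08:19:05, where a non-dynroot client fails outright when started without a network.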
[08:21:01] shadow: Ah, so I suspect somewhere in there we're freeing things that were never allocated on some failure modes that lead to a COLD shutdown, which is basically what you said above.
[08:21:03] making mount succeed if network eventually appears would involve vnode monte, but that hasn't worked since the derek atkins code and isn't trivial to just do again
[08:21:07] Why? The system group that runs the student ws wants to be able to change the view on /afs/ by changing contents in the volume methinks.
[08:28:05] Ah, you have stuff in /afs other than mount points?
[08:28:18] Er, cell mount points, I should say.
[08:28:44] No, but it looks like:
    # ls /afs
    BROKEN-CELLS  e.kth.se   kth.se     nada.kth.se  stacken.kth.se
    @cell         it.kth.se  md.kth.se  pdc.kth.se   su.se
[08:29:36] @cell is doable with dynroot. i dunno what BROKEN-CELLS is
[08:29:37] the most often case I see for non-dynroot is when the org wants to restrict access to only the cells they expose in root.afs
[08:30:07] which is doable with dynroot. disable afsdb and list only some cells in CellServDB. done.
[08:30:15] In /afs there is the collection of cells most folks need and which answer quickly.
[08:30:27] dynroot is dynroot. has nothing to do with what is visible in the client overall.
[08:30:40] I know. It's security via obsurity
[08:30:51] (here have a missing 'c')
[08:31:04] security through absurdity would be funnier
[08:31:08] then in BROKEN-CELLS are the slow ones that make ls --color whine.
[08:31:45] so those could be in a volume and it's all doable with dynroot. just involves configuring it
[08:31:54] non-dynroot is an easy way to trim the view of /afs without logging in to all computers.
[08:32:21] How do you do that with dynroot without logging in?
[08:32:57] without logging in wasn't on the table before i said that, so, changing your scope? meh
[08:33:40] Just believe me that there is a reason that dynroot does not work in all circumstances.
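The 08:30:07 reply says that with dynroot and AFSDB lookups disabled, the cells a client offers come from the cells listed in its CellServDB. As a minimal sketch of checking which cells a given CellServDB would expose, assuming the usual file format (a cell entry is a line starting with `>` followed by the cell name and an optional `#comment`; the database server lines below it read `address  #hostname`): the helper `list_cells` and the `SKETCH_CELL_NAME_MAX` limit are hypothetical, and this is not the cache manager's own parser.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define SKETCH_CELL_NAME_MAX 64

/* Hypothetical helper, not the cache manager's own parser: copies up to
 * max_cells cell names from CellServDB-format text into cells[] and returns
 * how many it found.  A cell entry is a line starting with '>' followed by
 * the cell name and an optional "#comment"; database server lines
 * ("address  #hostname") are skipped. */
static int list_cells(const char *db_text,
                      char cells[][SKETCH_CELL_NAME_MAX], int max_cells)
{
    const char *p = db_text;
    int count = 0;

    assert(db_text != NULL && max_cells >= 0);
    while (*p != '\0' && count < max_cells) {
        if (*p == '>') {
            size_t i = 0;

            p++;                /* skip '>' */
            /* The name ends at whitespace (including the newline), '#' or
             * end of text; longer names are truncated to fit the buffer. */
            while (*p != '\0' && *p != '#' && !isspace((unsigned char)*p)) {
                if (i < (size_t)SKETCH_CELL_NAME_MAX - 1)
                    cells[count][i++] = *p;
                p++;
            }
            cells[count][i] = '\0';
            if (i > 0)
                count++;
        }
        /* Move to the start of the next line. */
        p += strcspn(p, "\n");
        if (*p == '\n')
            p++;
    }
    return count;
}
```

Comparing this list between the centrally distributed CellServDB and a workstation's copy gives a rough picture of which cells a dynroot /afs would list with AFSDB off. It does not model other sources of dynroot entries.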
[08:33:46] i'm quite sure
[08:34:04] but most of the people using it have reasons which are untrue. so, yes, i will nitpick
[08:34:17] Besides there is a religious reason: It is not default.
[08:34:32] that I believe should change with 2.0
[08:34:54] I agree that that should change
[08:35:11] I assume there is no windows in the picture
[08:35:23] i'd like to fix the no network failure, but it's not really at the top of the pile of things to do
[08:35:40] no windows in the current picture
[08:36:44] It would be nice if either the crash issue or the not-start issue would be fixed.
[08:36:53] Whatever is easier.
[08:37:05] if you have a patch, it will get included
[08:37:32] because I can tell people: And if AFS does not come up, you can repair it by openafs-client stop; openafs-client start.
[08:37:38] I'm hopeful that the Debian process for figuring out how to cope with upstart is nearing a conclusion.
[08:37:49] At which point I can start shipping an upstart script, which will solve your immediate problem.
[08:39:06] Btw, the exact shutdown message of AFS is: afs: COLD shutting down of: vcaches... CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener... ALL allocated tables... done
[08:39:51] is that unexpected to you?
[08:40:11] stupid idea to be discarded. can we avoid the vnode swap issue by allocating a special vnode value for "root.afs" and ignore the real AFS FID in that case?
[08:40:11] If that does deallocate stuff which was not allocated as Derrick thinks, it's no wonder system wedges.
[08:41:18] harald, given that doesn't really tell us what is allocated OR deallocated, just what is printed, ....
[08:41:24] .0.1.1 to mean root.afs
[08:41:29] it goes through all of those steps regardless; the specific steps are not "supposed to" do anything if they weren't brought up
[08:41:40] jeff, possibly, the heavy lifting then becomes updating the contents of vnode 1
[08:42:01] sure but we do that all of the time with dv changes
[08:42:16] Ah, you mean instead of .536871362.1.1 in this case.
[08:43:02] the problem we have at startup without network is we don't know what root.afs resolves to
[08:43:19] Another cheap way to bail out would be to detect that there is no network and then start dynroot (with a small warning in the log)
[08:43:43] still better than crash
[08:44:03] swapping to dynroot would be actively harmful
[08:44:20] you think you have afs but instead you have something which doesn't match what you expect
[08:44:22] I have this problem on Windows. The kernel module has no idea whether dynroot is in use nor does it care. all it wants is the root.afs volume so it asks for 0.01.1 and it gets it no matter what the actual volume is
[08:44:34] if effort is to be spent it should be spent fixing the problem
[08:45:06] the value zero is reserved so that name space is available for queries like this.
[08:46:56] it requires someone with free time to work on it.
[08:49:55] Which reminds me that my workday is at its end. Partly success for this issue at least for my part as now I know the reason of the crash.
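The 08:41:29 reply explains that a COLD shutdown walks every step regardless, and that each step is only "supposed to" do nothing when its subsystem was never brought up; the 08:21:01 suspicion is that some step on this failure path frees something that was never allocated. A minimal sketch of the pattern that makes that contract hold, with hypothetical subsystem names loosely modelled on the shutdown message quoted at 08:39:06 (this is not the OpenAFS shutdown code): each init step sets a flag only on success, and each teardown step returns early unless its flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subsystems standing in for the steps a client shutdown walks
 * through (vcaches, CB, BkG, RxListener, ...). */
enum subsystem { SUB_VCACHES, SUB_CALLBACKS, SUB_BKG, SUB_RX, SUB_COUNT };

static bool initialized[SUB_COUNT];
static int teardowns_run;       /* counts teardown steps that actually did work */

/* Bring one subsystem up; will_fail simulates e.g. a failed mount with no network. */
static bool init_subsystem(enum subsystem s, bool will_fail)
{
    assert(s < SUB_COUNT);
    if (will_fail)
        return false;
    initialized[s] = true;      /* recorded only after a successful init */
    return true;
}

static void shutdown_subsystem(enum subsystem s)
{
    assert(s < SUB_COUNT);
    if (!initialized[s])
        return;                 /* never brought up: nothing to free */
    /* ... real teardown would release this subsystem's resources here ... */
    initialized[s] = false;     /* a second shutdown becomes a no-op */
    teardowns_run++;
}

/* Bring subsystems up in order, stopping at the first failure
 * (fail_at == SUB_COUNT means nothing fails). */
static bool start_all(enum subsystem fail_at)
{
    for (int s = 0; s < SUB_COUNT; s++)
        if (!init_subsystem((enum subsystem)s, s == (int)fail_at))
            return false;
    return true;
}

/* Walk every step in reverse order, as a COLD shutdown does, relying on
 * each step to skip work for subsystems that never came up. */
static void shutdown_all(void)
{
    for (int s = SUB_COUNT - 1; s >= 0; s--)
        shutdown_subsystem((enum subsystem)s);
}
```

Clearing each flag after teardown also makes a repeated `openafs-client stop` after a failed start harmless rather than a double free.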
[09:58:03] --- Jeffrey Altman has left: Replaced by new connection
[09:58:04] --- Jeffrey Altman has become available
[09:58:11] --- jaltman/FrogsLeap has left: Replaced by new connection
[09:58:12] --- jaltman/FrogsLeap has become available
[10:00:07] --- haba has left
[10:00:54] --- jaltman/FrogsLeap has left: Disconnected
[10:14:01] --- jaltman/FrogsLeap has become available
[10:19:47] --- jaltman/FrogsLeap has left: Disconnected
[10:27:10] --- jaltman/FrogsLeap has become available
[11:34:45] --- natefoo has left
[11:35:06] --- natefoo has become available
[14:04:57] --- Roman Mitz has left
[14:14:36] --- sxw has become available
[14:33:49] --- deason has left
[15:00:30] --- sxw has left
[15:10:44] --- jaltman/FrogsLeap has left: Replaced by new connection
[15:10:45] --- jaltman/FrogsLeap has become available
[15:25:23] --- Roman Mitz has become available
[18:03:18] --- meffie has left
[18:03:33] --- meffie has become available
[18:04:07] --- meffie has left
[18:44:03] Could those with rx experience please review http://gerrit.openafs.org/#change,6443 ?
[19:52:29] --- Roman Mitz has left
[23:35:47] --- Russ has left: Disconnected