[00:19:27] --- reuteras has become available
[00:25:18] --- Russ has left: Disconnected
[02:41:30] --- Simon Wilkinson has become available
[02:44:00] --- Simon Wilkinson has left
[03:00:36] --- kaj@kth.se has left
[03:03:39] --- kaj@kth.se has become available
[04:00:59] --- kaj@kth.se has left
[05:00:08] --- kaj has become available
[06:04:37] --- shadow@gmail.com/owlF2E50C3B has left
[06:05:09] --- shadow@gmail.com/owlF2E50C3B has become available
[06:31:17] --- Simon Wilkinson has become available
[06:38:44] --- Simon Wilkinson has left
[07:26:32] --- deason has become available
[07:53:00] --- reuteras has left
[08:06:37] --- jaltman has left: Disconnected
[08:27:59] --- kaj has left
[08:32:23] --- Simon Wilkinson has become available
[08:58:09] --- Simon Wilkinson has left
[08:58:12] --- Simon Wilkinson has become available
[09:42:05] --- rra has become available
[10:06:17] --- meffie has left
[10:20:54] --- meffie has become available
[11:36:22] --- jaltman has become available
[13:03:31] --- kaj has become available
[13:15:53] --- jakllsch has become available
[13:16:30] --- jakllsch has left
[13:16:34] --- jakllsch has become available
[13:18:21] is this thing on?
[13:18:52] nope :)
[13:19:01] k
[13:19:33] does anyone have any idea how "locking" is supposed to work where it doesn't already?
[13:19:48] i keep trying things i don't know what they do and not getting very good results :-)
[13:20:28] be a little more specific? "locking"... what?
[13:20:40] vnodes?
[13:21:06] fileserver-side or client-side?
[13:21:06] or cache manager "object"-thingies, i'm not sure what i'm doing :-)
[13:21:09] client
[13:22:13] are you asking about how the locking mechanism works, or where/why you want to lock them?
[13:22:23] the latter
[13:22:59] i'm diving head first into the NetBSD CM port
[13:23:01] what do you mean by "doesn't already"? a new platform?
[13:23:03] oh, okay
[13:23:46] for locking 'struct vcache's on the afs cm-side, I would think just looking at what other platforms do is what you want to follow....
[13:24:19] if you mean locking stuff that the netbsd kernel expects you to have locked, or some kind of mutual exclusion, well... if there are no netbsd people around, I don't think you'll get much information here :)
[13:24:50] although as a person who knows nothing about netbsd internals, I would naively guess that it works at least a little bit similarly to the other BSDs
[13:24:53] vc->rwlock is the particular one in question
[13:24:58] i'm not sure what it actually locks :-)
[13:25:16] --- kaj has left
[13:25:22] ah, like what structure it's protecting, etc?
[13:25:46] yeah
[13:28:58] I'm not actually sure... that lock only exists on certain platforms...
[13:29:19] if shadow@gmail.com/owlF2E50C3B or kaduk@mit.edu/barnowl are around, they might be able to say more, as it exists on darwin and other BSDs
[13:36:05] or actually... for solaris and darwin it looks to just be used as a "lock" vnode operation, whatever that means for the platform
[13:36:27] Yeah. I think it's because on those platforms we're using our own vcaches, rather than piggybacking on the kernel's.
[13:36:42] So we need to provide a way for the kernel to lock and unlock the whole data structure.
[13:37:03] The AFS locks that ObtainWriteLock et al. use don't provide that, because they aren't real locks.
[13:37:56] --- jaltman has left: Replaced by new connection
[13:38:00] --- jaltman has become available
[14:03:17] --- jaltman has left: Disconnected
[14:03:57] --- jaltman has become available
[14:08:28] --- jaltman has left: Disconnected
[14:08:51] --- jaltman has become available
[14:28:33] --- meffie has left
[14:44:02] I actually don't remember running into (struct vcache)->rwlock directly, but I'm kind of distracted at the moment.
[14:45:24] kern/vnode_if.src describes much of the vnode locking schema, which you will probably want to look at. An NFS client implementation is a great place to go to get examples.
[14:46:56] sys/vnode.h describes what locking is needed around accesses to struct vnode elements (we usually get at these through macros); you will want to check that locking eventually as well.
[14:57:15] --- meffie has become available
[15:37:45] --- deason has left
[15:42:32] yeah, that apparently might not be my problem
[15:42:34] or maybe it is
[15:42:52] What kind of kernel panics are you seeing?
[15:47:34] Or is it failing in other ways?
[16:10:42] --- matt has become available
[16:15:03] --- matt has left
[16:22:59] --- jaltman has left: Disconnected
[16:39:56] mostly 'options LOCKDEBUG'-detected locking errors
[16:40:06] at least after i made readdir work
[16:54:55] Ah, okay. Does it give function names?
[16:55:31] maybe
[16:57:15] well, backtraces anyway
[17:01:30] (I am assuming that is analogous to FreeBSD's WITNESS option, which adds assertions for vnode locking on entry and exit of most vnops, as well as general lock order checking and other lock assertions scattered throughout the kernel. At present, I can't tell if your errors are due to vnode locking or other locking.)
[17:02:58] yeah
[17:07:53] You don't happen to have a generated vnode_if.h somewhere you could send a link to, do you?
[17:27:39] http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/vnode_if.h?rev=1.72&content-type=text/x-cvsweb-markup
[17:28:44] the issue appears to be i'm unlocking an already-unlocked /afs (directory vnode)
[17:32:10] Okay. Is it in something annoying like vop_lookup?
[17:33:14] i think so
[17:34:10] well, i think i'm falling over in nbsd_unlock, but lookup() appears to be involved nearby
[17:34:58] Sure. Lookup is always annoying; I don't think FBSD has it quite right, yet.
[17:35:17] There is a useful-looking long comment in vnode_if.src about lookup.
[17:35:57] Finding a way to get line numbers associated with your trace will probably make you less sad.
[17:36:27] --- rra has left: Disconnected
[17:43:34] (I usually end up taking a kernel crash dump and loading it into kgdb for post-mortem, to do that.)
[17:45:04] heh, maybe i should be doing this on a box with significantly less than 8G of RAM
[17:45:35] Possibly.
[17:47:00] My machine has 4G, but I think I ran into some issue where setting maxmem to something lower caused the kernel dump process to be very slow, so I'm still using it all.
[17:49:56] Going back a bit, now that I'm on a machine with a local source tree, it looks like (struct vcache)->rwlock is unused on FBSD at the moment.
[17:52:23] --- Russ has become available
[17:55:23] I snagged rxdebug output during the 20(-ish)-second window when my 'make buildworld' was in afs_rx_cv_wait; it's at /afs/athena.mit.edu/user/k/a/kaduk/Public/d/rxdebug.log. Anyone care to help interpret it?
[18:03:50] --- Simon Wilkinson has left
[18:05:10] I guess it (rwlock) is a relic of the FBSD 4.X days.
[18:12:14] --- meffie has left
[18:32:32] --- jaltman has become available
[18:54:27] ugh. vfs doesn't make any sense.
[18:54:28] --- deason has become available
[18:55:58] It is something of a tricksy creature, yes. I don't fully understand it, myself.
[19:04:38] > rxdebug.log: you're waiting for a response from 18.181.0.22, and there are packets in the xmit queue... not sure if you're looking for more than that
[19:06:22] I'm looking for an explanation of "why is this operation sitting for thirty seconds in this state?". But I suppose if it is trying to do something silly like rm -rf a big directory, it might actually be waiting on data from the server for that whole time.
[19:08:04] It shouldn't be -- no single operation it can ask the server to do should take very long. Certainly rm -rf involves lots of user/kernel switches, since every file is a separate unlink(2) call.
[19:09:39] I don't have full trace data of what it was doing the whole time; just:
rm -rf /afs/sipb.mit.edu/project/freebsd/build/afs/sipb.mit.edu/project/freebsd/build/src/tmp
load: 0.00 cmd: rm 7588 [afs_rx_cv_wait] 0.22r 0.00u 0.00s 0% 1020k
load: 0.00 cmd: rm 7588 [afs_rx_cv_wait] 13.03r 0.03u 0.86s 5% 1144k
[19:10:47] is that time inside a single afs_rx_cv_wait call, or collectively?
[19:11:20] Collectively for the rm process.
[19:11:33] I guess I'll buy that. The hard work is all being done remotely.
[19:11:57] (and if you want to know "why", you could rxdebug the server and see what it says for the conn/call to that client)
[19:12:19] but yeah, if it's aggregate over the whole rm execution, that doesn't seem unbelievable
[19:12:26] so sure, your rm process is going to spend most of its time waiting on the CV that synchronizes the front and back halves of rx_Read and rx_Write
[19:13:26] er, why are there two lines for the time spent, if it's collected over the whole run?
[19:13:55] That's the output from when I hit ^T during execution -- I hit it twice.
[19:16:23] how big is "big", btw? you could just try to delete the dir from a non-fbsd client and see if the times are comparable
[19:19:30] 50k files/directories, judging by the local build directory.
[19:20:07] (deleting it right now would be a bad idea, as I'm trying to see if I can reproduce this lockup I saw a few days ago.)
[19:39:46] aix build fails because the configure test for 'struct winsize' existence includes <termios.h>, but aix has no <termios.h> (but it does have a <sys/termios.h>), causing the test to fail even though there is a struct winsize
[19:40:01] is one of those more right, or do we need it in a #ifdef HAVE_SYS_TERMIOS_H ?
[20:05:34] --- Jeffrey Altman has left
[20:09:13] --- Russ has left: Disconnected
[20:09:41] --- Russ has become available
[22:52:07] --- deason has left
[23:56:33] --- Simon Wilkinson has become available