[00:23:55] --- Russ has left: Disconnected
[00:44:59] --- dev-zero@jabber.org has left: Replaced by new connection
[00:45:00] --- dev-zero@jabber.org has become available
[01:45:43] --- dev-zero@jabber.org has left
[05:08:11] --- Derrick Brashear has become available
[06:54:55] --- dragos.tatulea has become available
[08:50:17] --- dragos.tatulea has left
[09:02:40] --- dragos.tatulea has become available
[09:07:30] Derrick Brashear: Hi. Can you please remind me when the disconnected deadlock occurred and the panic as well?
[09:07:42] I can't find the logs and need to test my changes.
[09:23:23] --- summatusmentis has become available
[10:00:28] --- summatusmentis has left
[10:02:31] the deadlock occurred when i was doing an ls which was taking a while and disconnected while it was hung unknowingly. the panic occurred while i was disconnected and on a plane and had been for hours
[10:02:42] and i suspect was "the gui did something"
[10:15:27] --- summatusmentis has become available
[10:26:25] --- summatusmentis has left
[11:36:10] --- Simon Wilkinson has become available
[11:36:43] dragos.tatulea: Backtrace from the panic suggests that an rmdir was in progress.
[11:37:25] It looks like a reference counting issue. It could be due to the issues in afs_remove.
[11:37:46] Deadlock is probably because we're failing to hold the discon lock upon entry to a VFS operation. I'm not sure where, though.
[11:37:58] i never found where
[11:39:19] We still shouldn't really deadlock on a GetVCache though. I wonder if there's another piece of code where we're doing a FindVCache for a vcache we already hold.
[11:40:56] i didn't find that either
[11:58:57] I think I have decided that a proper marriage of ubik and sqlite would actually be cleaner and easier than trying to make sqlite use ubik as a storage backend.
[12:12:43] back
[12:13:12] I fixed the afs_remove issue. And I want to test it now.
[12:15:15] Simon Wilkinson: Oh, another piece of code...hmm.
[12:15:22] I must look around.
[12:19:49] If you've sorted the overall reference counting behaviour, that's likely to make quite a few changes to the existence or not of other bugs.
[12:20:30] Bear in mind that you can also deadlock by violating the locking hierarchy - and possibly do so just by locking one vnode whilst holding a lock on another.
[12:21:16] True.
[12:23:08] Derrick's cmdebug output said where the lock was being held - unfortunately, you can't work out where things are blocked on that lock without a backtrace, which I don't think he's got.
[12:29:30] So, then I'll just hit it with all kinds of operations, until it breaks.
[12:29:36] There's no choice.
[12:29:49] It _might_ be Mac specific.
[12:29:58] I'd concentrate on looking at rmdirs
[12:30:18] Oh, and that's the panic.
[12:30:42] The deadlock, you should try setting up something that does loads of file system operations, and constantly tries to disconnect and reconnect during them.
[12:31:04] In Derrick's case, he was just doing a big ls when it deadlocked, so you might need to test no more than that.
[12:31:44] Oh, big ls. Thanks for the tip.
[12:32:41] I want to see if I'm putting the vnodes properly. That's my first concern.
[12:33:36] But I can verify that by checking out the warnings during afs shutdown.
[12:40:51] You should catch some errors that way.
[12:41:24] You might want to check the undercount path. There's a vnode operation on Linux that the kernel will call when it wants to remove a vnode - I'm not sure what that function does if the vnode is still in the vhash or LRUQ.
[12:44:16] In Linux osi code? Or in linux kernel?
[12:44:46] The Linux osi code does it.
[12:45:29] When the kernel sees that an inode's reference count has hit zero, it will try and free that inode. It calls an operation in the VFS module which 'owns' the inode in order to do so.
[12:49:38] What does big dir mean? How many entries? 100? 1000?
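The 12:20:30 point about the locking hierarchy is the classic lock-ordering deadlock: one thread locks vnode A and then B while another locks B and then A, and each waits for the other forever. The usual cure is to acquire locks in one fixed global order. The sketch below illustrates that rule with POSIX threads, assuming a hypothetical unique numeric `id` per vnode as the ordering key; it does not reproduce OpenAFS's actual hierarchy, and `struct vnode_stub`, `lock_vnode_pair` and `run_opposite_order_threads` are illustrative names only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical vnode stand-in: a unique id (the ordering key) and a lock. */
struct vnode_stub {
    unsigned long id;
    pthread_mutex_t lock;
};

/* Lock two vnodes in ascending id order, whatever order the caller names
 * them in, so no two threads can each hold one and wait for the other. */
static void lock_vnode_pair(struct vnode_stub *a, struct vnode_stub *b)
{
    assert(a != NULL && b != NULL);
    if (a == b) {                 /* same vnode: lock it only once */
        pthread_mutex_lock(&a->lock);
        return;
    }
    if (b->id < a->id) {          /* swap so that a has the lower id */
        struct vnode_stub *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

static void unlock_vnode_pair(struct vnode_stub *a, struct vnode_stub *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}

struct pair_job {
    struct vnode_stub *first, *second;
    int iterations;
    long *counter;                /* shared; only touched with both locks held */
};

static void *pair_worker(void *arg)
{
    struct pair_job *job = arg;

    for (int i = 0; i < job->iterations; i++) {
        lock_vnode_pair(job->first, job->second);
        (*job->counter)++;
        unlock_vnode_pair(job->first, job->second);
    }
    return NULL;
}

/* Two threads name the same two vnodes in opposite orders. Naive locking
 * could hang here; ordered locking always finishes. Returns the number of
 * critical sections executed. */
static long run_opposite_order_threads(int iterations)
{
    struct vnode_stub x = { 1, PTHREAD_MUTEX_INITIALIZER };
    struct vnode_stub y = { 2, PTHREAD_MUTEX_INITIALIZER };
    long counter = 0;
    struct pair_job j1 = { &x, &y, iterations, &counter };
    struct pair_job j2 = { &y, &x, iterations, &counter };
    pthread_t t1, t2;

    pthread_create(&t1, NULL, pair_worker, &j1);
    pthread_create(&t2, NULL, pair_worker, &j2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

With ordered acquisition `run_opposite_order_threads(n)` returns twice `n` rather than occasionally hanging; the `a == b` branch covers callers that pass the same vnode twice (for example a rename within one directory).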
[12:50:31] it was probably /afs/andrew.cmu.edu/usr/ or /afs/andrew.cmu.edu/usr/shadow
[12:51:09] thanks
[12:51:56] I think it was something like ls /afs/andrew.cmu.edu/usr/shadow | xargs cat >/dev/null
[13:11:35] --- Russ has become available
[13:48:07] How can I set up my linux router so that I can select to which computer/IP I am connecting to when I'm away? I am currently using some iptables rules to forward a port to a specified IP. But it's inefficient for the same port on multiple machines.
[13:48:13] Any hints?
[14:16:18] I did ls -l twice in /afs/andrew.cmu.edu/usr/ and it deadlocks the second time without ever going into disconnected mode.
[14:16:56] Hmmm. I bet it doesn't do that without --enable-disconnected
[14:17:13] Can you find the deadlock?
[14:17:29] alt-sysrq-t should give you a backtrace of the kernel threads, and at least give you an idea of where to start.
[14:21:17] It's in afs_cv_wait for the ls process.
[14:21:35] It seems to be reading stuff from rx.
[14:22:27] Which lock is it deadlocking on? Is it really deadlocking?
[14:23:45] Well...it doesn't seem like it's deadlocking.
[14:24:35] http://pastebin.com/m1b8c0da2
[14:26:55] So, what's it doing? Is it just taking a long time to complete?
[14:30:17] Well, it completed right the first time.
[14:30:24] s/right/fast
[14:30:52] oh wait. there's something else: I did only a ls the first time. Now it's ls -l.
[14:32:04] Yeah. I can still see afs traffic on the wire. So it's not deadlocking.
[14:33:12] Yup. I'd expect ls -l to take longer.
[14:33:42] (for some versions of ls - others stat everything when you just do a 'ls' so they can get the colouring right.)
[14:37:54] Yeah, well, 20 minutes or so of getting stats from that dir is still a long time...
[14:38:11] Is it using BulkStat?
[14:40:20] inline bulk stat
[14:41:23] And it's nothing local to you taking an age? Like looking up non-existent UIDs in nss, for example?
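As the 14:33:42 message notes, `ls -l` (and some builds of plain `ls`) stat every entry, so a directory the size of /afs/andrew.cmu.edu/usr/ costs one status lookup per entry on top of the directory read. A minimal C sketch of that access pattern can drive the same load without `ls` itself, which also keeps local nss UID lookups out of the picture. `stat_every_entry` is a hypothetical helper and the 4096-byte path buffer is an arbitrary assumption.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: read a directory and lstat() every entry, the access
 * pattern of `ls -l` minus the user/group name lookups. On AFS each lstat()
 * needs the entry's status, which the cache manager has to obtain from the
 * fileserver unless it already has it cached.
 * Returns the number of entries stat'ed, or -1 if the directory can't be opened. */
static long stat_every_entry(const char *dir)
{
    char path[4096];              /* arbitrary size; longer paths are skipped */
    struct stat st;
    struct dirent *de;
    long count = 0;
    DIR *d;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;             /* skip self and parent links */
        if (snprintf(path, sizeof path, "%s/%s", dir, de->d_name) >= (int)sizeof path)
            continue;             /* path would be truncated: skip it */
        if (lstat(path, &st) == 0)
            count++;
    }
    closedir(d);
    return count;
}
```

Timing `stat_every_entry("/afs/andrew.cmu.edu/usr")` against a plain directory read would separate the cost of per-entry status fetches from anything `ls` does locally.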
[14:43:02] nope, no message
[14:43:14] I canceled the operation, was getting bored.
[14:51:29] --- summatusmentis has become available
[15:56:57] --- dragos.tatulea has left
[16:42:23] --- summatusmentis has left
[16:56:33] --- summatusmentis has become available
[17:00:39] --- summatusmentis has left
[18:52:36] > than trying to make sqlite use ubik as a storage backend.
[18:52:41] i wouldn't have tried that anyway
[18:53:15] > without a backtrace, which I don't think he's got.
[18:53:17] alas no.
[18:53:29] i may be able to reproduce it. immediately before i take my next update i will try
[18:53:49] --- Derrick Brashear has left
[21:23:31] --- Russ has left: Disconnected
[22:00:43] --- Russ has become available
[23:10:05] --- reuteras has become available
[23:18:25] --- summatusmentis has become available
[23:45:58] --- Simon Wilkinson has left