[00:11:26] --- Russ has left: Disconnected
[00:18:52] --- pod has left
[00:20:05] --- Simon Wilkinson has left
[00:39:39] --- reuteras has left
[00:45:34] --- reuteras has become available
[01:00:33] --- dev-zero@jabber.org has left
[01:02:47] --- reuteras has left
[01:06:13] --- haba has left
[01:34:44] --- reuteras has become available
[01:37:57] --- sxw mobile has become available
[01:44:23] --- dev-zero@jabber.org has become available
[01:52:25] --- dev-zero@jabber.org has left
[01:52:38] --- dev-zero@jabber.org has become available
[02:24:48] --- kula has left
[02:25:25] --- sxw mobile has left
[02:26:36] --- sxw mobile has become available
[02:33:27] --- haba has become available
[02:34:11] --- reuteras has left
[02:35:02] --- sxw mobile has left
[02:43:19] --- sxw mobile has become available
[02:46:01] --- sxw mobile has left
[03:04:33] --- sxw mobile has become available
[03:09:54] --- sxw mobile has left
[03:24:59] --- sxw mobile has become available
[04:03:52] --- Simon Wilkinson has become available
[04:04:55] --- Simon Wilkinson has left
[04:07:22] --- Simon Wilkinson has become available
[04:10:00] --- Simon Wilkinson has left
[04:29:59] --- Simon Wilkinson has become available
[04:31:23] Given that Russ blew away the DUX client files in 2006, is there any reason to keep code which is AFS_OSF_ENV or AFS_DUX*_ENV in src/afs?
[04:31:57] --- sxw mobile has left
[04:34:55] --- pod has become available
[04:42:23] --- kula has become available
[04:49:18] > reason to keep code which is AFS_OSF_ENV or AFS_DUX*_ENV in src/afs
probably not
[04:50:43] and ah yes, the cherry-picked-from note. sorry :(
[04:57:02] --- Simon Wilkinson has left
[05:02:28] --- reuteras has become available
[05:31:18] --- haba has left
[05:35:28] --- haba has become available
[05:58:39] --- Jeffrey Altman has left: Replaced by new connection
[06:08:02] --- meffie has become available
[06:59:49] --- deason has become available
[07:46:25] --- stevenjenkins has left
[07:50:44] --- sxw mobile has become available
[08:12:06] --- sxw mobile has left
[08:17:50] --- haba has left
[08:18:54] --- sxw mobile has become available
[08:20:10] --- reuteras has left
[08:23:24] --- sxw mobile has left
[08:31:20] --- sxw mobile has become available
[08:33:55] --- sxw mobile has left
[08:44:10] --- sxw mobile has become available
[08:47:57] --- sxw mobile has left
[08:52:51] --- sxw mobile has become available
[08:56:23] --- sxw mobile has left
[09:53:43] --- stevenjenkins has become available
[09:54:54] hm. oops in iput in FlushVCache. i wonder if this machine had a stale sandbox
[10:01:19] --- dev-zero@jabber.org has left
[10:19:22] --- Russ has become available
[10:22:23] Simon: Did I miss any of them? I was trying to be careful about that and I thought I got them all.
[10:37:10] --- haba has become available
[11:00:46] --- dev-zero@jabber.org has become available
[13:38:37] --- Simon Wilkinson has become available
[13:39:01] Russ: Lots of stuff in the generic code. I have a patch...
[13:39:55] shadow: Marc's newest Ubuntu bug is an oops in iput in FlushVCache. Are you seeing it on Darwin?
[13:40:31] --- mdionne has become available
[13:42:54] --- mdionne has left
[13:44:34] that was linux
[13:44:35] --- mdionne has become available
[13:45:02] Simon: Oh, sorry, I meant on missing the cherry-pick comments, but I see now what you were seeing.
[13:45:07] (And it wasn't mine, I think.)
[13:45:17] New openafs packages uploaded to unstable last night.
[13:45:28] a peek at kerneloops.org also shows 20 cases of the same oops - clear_inode from iput in FlushVCache
[13:45:34] Ah. Sorry, cross purposes.
[13:45:43] (I thought you were talking about DUX removal)
[13:48:45] Marc: Was there any more information in the Ubuntu bugs about what the system was doing at the time?
[13:49:37] --- mdionne has left
[13:49:45] --- mdionne has become available
[13:50:30] Russ: I think you also want change 767 - there are several Ubuntu bugs related to that ("blocks not freed" warning)
[13:51:07] Yeah, I'll pick that one up in the next update -- thank you very much for fixing that!
[13:51:22] It should resolve a couple of Debian bugs too.
[13:53:02] So, looking through the backtraces on kerneloops.org, the clear_inode bug is almost certainly dynamic_vcaches' fault.
[13:53:04] Simon: no, didn't see much more information. what I did figure from the kernel source is that i_sb is not null when we enter iput(), otherwise we wouldn't get to clear_inode
[13:53:48] Did we ever verify that it was the lookup of i_sb that was going bang?
[13:54:26] no, not definitely. pretty sure it is though
[13:54:59] You know what. I bet we don't have the BKL when we're called from the Daemon process
[13:56:14] iput_final says it should be called with the inode lock held, too.
[13:57:33] will have to look again, but there's a decrement and lock operation that should lock the inode
[13:57:58] but later, there's a period where the inode has a ref count of 0 and is unlocked before clear_inode is called
[13:58:26] The drop function releases the inode_lock spinlock
[13:58:35] I doubt we ever acquire it.
[13:59:05] Ah, no, iput does. Not that then.
[13:59:14] iput does, yes
[13:59:24] atomic_dec_and_lock
[14:00:18] --- abo has left
[14:00:53] --- abo has become available
[14:08:55] One possibility is that we're just being much more aggressive now about shaking loose vcaches. Historically, we'd only try when we approached the limit - now we try to invalidate the dentry cache every 5 minutes. It's possible that that process is tripping up some bugs.
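For reference on the atomic_dec_and_lock call mentioned above: in the Linux kernel it decrements a reference count and, only if the count reaches zero, returns true with the given spinlock held; otherwise it returns false without the lock. The sketch below is a userspace analogue, assuming C11 atomics and a pthread mutex stand in for the kernel's atomic_t and spinlock. dec_and_lock is a hypothetical name, not a kernel or OpenAFS function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analogue of the kernel's atomic_dec_and_lock().
 * Decrements *count. If the result is zero, returns true with *lock held;
 * otherwise returns false and *lock is not held.
 */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
    int old = atomic_load(count);
    assert(old > 0 && "dropping a reference that was never taken");

    /* Fast path: while the count is not 1, decrementing cannot reach
     * zero, so drop the reference without touching the lock. */
    while (old != 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
        /* The failed CAS reloaded `old`; loop and re-check it. */
    }

    /* Slow path: this may be the last reference, so take the lock
     * first and do the final decrement while holding it. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;            /* reached zero: caller now owns the lock */
    pthread_mutex_unlock(lock); /* someone else took a reference meanwhile */
    return false;
}
```

The point relevant to the discussion is the window it leaves: the lock is held only by the caller that drops the last reference, so any code that touches the object after the count hits zero without going through this path is racing.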
[14:09:28] I have a suspicion that ShakeLooseVCaches is too aggressive anyway - the kernel has a dentry cache for a reason, and we should let it decide when it's time to invalidate that, rather than forcing it into it.
[14:15:32] --- mdionne has left
[14:22:05] Okay, so I've looked at some disassembly for this. We're going boom because inode->i_sb->s_op == NULL
[14:22:43] (that's the operations field in the superblock) I'm now very, very puzzled. As Marc notes, this must have had a value earlier, otherwise we wouldn't have got this far.
[14:38:35] Would folks here expect find -fstype afs -prune to work correctly?
[14:38:51] (It didn't on Debian lenny, and I'm trying to figure out the appropriate place to send a bug report.)
[14:39:26] We've noticed that it doesn't here, too.
[14:39:31] Ah, the source code makes this all clear.
[14:39:41] Answer: Yes, if you build find with AFS support, which I bet no distribution does.
[14:40:09] If you build it with AFS support, it actually does the right thing (a pioctl call).
[14:40:17] But of course that requires building it with the AFS libraries, so no one does.
[14:40:43] James Youngman is the person to speak to, IIRC.
[14:42:55] Hm, but it does seem to work in sid.
[14:45:44] And I have no idea why, since MOUNT_AFS isn't defined so the gnulib code isn't doing it.
[14:46:17] that requires a pioctl? it can't look at, say, mnttab or whatever?
[14:46:22] Okay. So I know why we go boom.
[14:46:28] or wait, no, hah
[14:46:59] I'm amazed we haven't died more spectacularly. We're walking the VLRUQ without holding any locks.
[14:47:27] not mnttab, I mean, a statvfs/statfs struct member or something
[14:53:14] --- dev-zero@jabber.org has left: Replaced by new connection
[14:53:15] --- dev-zero@jabber.org has become available
[14:53:29] --- dev-zero@jabber.org has left
[14:53:42] --- dev-zero@jabber.org has become available
[14:53:51] Ah, I see. getmntent returns an mnt_type field.
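To illustrate the point about walking the VLRUQ without holding any locks: the structures and functions below (struct ventry, struct vlru, vlru_shake) are hypothetical simplifications, not OpenAFS's real vcache code or locking. They only show the general pattern of taking the queue lock before following any list pointers, and saving the previous pointer before unlinking an entry during a walk from the least recently used end.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for vcache entries and the VLRU queue. */
struct ventry {
    struct ventry *prev, *next;
    int refcount;
};

struct vlru {
    struct ventry head;     /* sentinel: head.next is most recently used */
    pthread_mutex_t lock;   /* protects every prev/next pointer below */
};

static void vlru_init(struct vlru *q)
{
    q->head.prev = q->head.next = &q->head;
    pthread_mutex_init(&q->lock, NULL);
}

/* Caller must hold q->lock. */
static void vlru_unlink(struct ventry *e)
{
    assert(e->prev != NULL && e->next != NULL); /* must be on the queue */
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = NULL;
}

static void vlru_push_front(struct vlru *q, struct ventry *e)
{
    pthread_mutex_lock(&q->lock);
    e->next = q->head.next;
    e->prev = &q->head;
    q->head.next->prev = e;
    q->head.next = e;
    pthread_mutex_unlock(&q->lock);
}

/* Walk from the LRU end and unlink up to `max` idle entries. The whole
 * walk runs under q->lock, so no other thread can unlink or reorder the
 * entries whose pointers we are following. */
static int vlru_shake(struct vlru *q, int max)
{
    int freed = 0;
    pthread_mutex_lock(&q->lock);
    struct ventry *e = q->head.prev;
    while (e != &q->head && freed < max) {
        struct ventry *prev = e->prev;  /* save before e is unlinked */
        if (e->refcount == 0) {
            vlru_unlink(e);
            freed++;
        }
        e = prev;
    }
    pthread_mutex_unlock(&q->lock);
    return freed;
}
```

Without the lock in vlru_shake, another thread could unlink `e` (setting its pointers to NULL or reusing the entry) between reading `e->prev` and dereferencing it, which is the kind of failure that only shows up intermittently.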
[14:54:01] On current kernels, that must actually work right and end up being "afs".
[14:54:10] Since the AFS-specific code isn't running.
[14:54:18] But it must not work with 2.6.26 lenny kernels.
[14:56:03] Do you have an 'afs' entry in /etc/fstab?
[14:56:37] --- deason has left
[14:59:52] No.
[15:00:16] Weird. I thought getmntent just returned results from those files, rather than querying the kernel directly.
[15:00:25] Hm.
[15:00:31] Good point.
[15:00:59] Ah, I *do* have an 'afs' entry in /etc/mtab.
[15:01:04] I suspect that's what it's reading.
[15:01:10] But why is that not working on lenny?
[15:01:12] I wonder what's putting that there ...
[15:01:50] Ah, will you look at that.
[15:02:00] The /etc/mtab entry for afs on lenny is bogus.
[15:02:45] Ah.
[15:03:04] Because we screwed up the server configuration.
[15:03:07] Okay, that explains it.
[15:03:27] Cool!
[15:03:31] Yes, listing a symlink in your cacheinfo file will actually lead everything to still work except that your kernel will think that AFS is mounted on /afs when in fact it's actually mounted on the target of the symlink.
[15:03:46] Nice ...
[15:03:54] Yeah, I think something about that is wrong.
[15:03:55] :)
[15:05:24] --- abo has left
[15:06:11] --- abo has become available
[15:07:57] Until we can get a new client out there, the solution to the Linux clear_inode panics is to disable dynamic vcaches.
[15:32:32] --- haba has left
[15:40:03] --- mdionne has become available
[16:07:53] sorry, went away, my sister had a piece in a gallery opening. anyway, the s_op thing sounds familiar
[16:09:57] oh. fun.
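As worked out above, getmntent only reports what is written in the file it is handed (here /etc/mtab), so a bogus mtab entry misleads anything that trusts it. A minimal sketch using glibc's setmntent/getmntent/endmntent, assuming a Linux system; mtab_says_afs is a hypothetical helper for illustration, not how find itself is implemented.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return true if the fstab-format file `mtab_path`
 * (e.g. /etc/mtab) lists a filesystem of type "afs" mounted on `dir`.
 * It trusts the file's contents and never asks the kernel. */
static bool mtab_says_afs(const char *mtab_path, const char *dir)
{
    assert(mtab_path != NULL && dir != NULL);

    FILE *fp = setmntent(mtab_path, "r");
    if (fp == NULL)
        return false;           /* unreadable or missing mount table */

    bool found = false;
    struct mntent *ent;
    while ((ent = getmntent(fp)) != NULL) {
        /* mnt_type is the third field of each line, e.g. "afs". */
        if (strcmp(ent->mnt_type, "afs") == 0 &&
            strcmp(ent->mnt_dir, dir) == 0) {
            found = true;
            break;
        }
    }
    endmntent(fp);
    return found;
}
```

For example, mtab_says_afs("/etc/mtab", "/afs") would return true whenever /etc/mtab carries an afs entry for /afs, even in the symlinked-cacheinfo case above where AFS is really attached at the symlink's target.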
[16:12:37] --- meffie has left
[16:13:43] --- dev-zero@jabber.org has left: Replaced by new connection
[16:13:44] --- dev-zero@jabber.org has become available
[16:16:21] --- mdionne has left
[16:16:37] --- mdionne has become available
[16:18:31] --- dev-zero@jabber.org has left: Lost connection
[16:23:55] --- deason has become available
[16:32:27] --- dev-zero@jabber.org has become available
[16:39:48] --- mdionne has left
[17:46:00] --- andersk@mit.edu/dr-wily has left
[17:46:00] --- andersk@mit.edu/dr-wily has become available
[17:46:06] --- andersk@mit.edu/dr-wily has left
[21:52:45] --- deason has left
[22:00:38] --- reuteras has become available
[23:41:46] --- Russ has left: Disconnected