[06:28:09] --- kula has left: Lost connection
[07:53:07] --- kula has become available
[10:05:41] --- mvitale has left
[12:40:04] --- andersk has left
[12:40:06] --- andersk has become available
[12:40:20] Occasionally OpenAFS on scripts.mit.edu servers has been getting into a state where getcwd() in certain directories returns ENOENT. Any idea what might cause this?
[12:44:39] And the /proc/self/cwd link is marked as (deleted).
[12:49:40] that sounds like still more linux dcache wtfery
[12:52:14] Any useful debugging info I can recover?
[12:53:54] hrm. I don't know the guts well enough to know. (deason, you around?)
[12:54:29] also this conference is looking kinda empty, did we lose everyone who was using gtalk? I know they were talking about shutting off federation…
[12:54:53] I'm not sure who was using gtalk.
[12:55:45] andersk: and you know that the directory is still there and fine and everything?
[12:56:02] --- deason/gmail has become available
[12:56:13] huh
[12:56:34] Yes. In this case it’s /afs/athena.mit.edu/contrib/scripts. It seems to operate normally other than showing up as deleted and confusing getcwd().
[12:57:04] coming in here via google works, but it shows a different list of people in the room, and gives older scrollback
[12:57:19] but I still see everything sent
[12:57:20] --- deason/gmail has left
[12:57:31] Um. That seems poor. (the ~netsplit)
[12:58:42] andersk: do you know if there are any other mountpoints to that volume, or the volumes in the path ancestors?
[12:58:50] (that is, that could/would be accessed by that client)
[12:58:57] and what version of the client?
[12:59:07] short name vs. full name of the cell?
[13:00:21] The client (this is bees-knees.mit.edu) is 1.6.2.1 on kernel 3.8.13.
[13:00:50] general debugging info, though, would be either fstrace capture, or looking at internals with 'crash' or capturing a vmcore by crashing the kernel
[13:01:49] in crash, you'd just look at the dentry for the final directory, and work your way back up, just printing each dentry
[13:02:29] I don’t know about other mountpoints. It’s a webserver serving content from thousands of users out of AFS, so in principle it could be accessing just about anything.
[13:06:16] crash can examine a running kernel, right?
[13:08:46] yes
[13:09:21] 'files' can give you the list of files open for a process and give you the dentry/inodes for it
[13:09:59] specifically check if the vfsmount structure looks to be pointing to the right thing
[13:10:06] Hmm, this is an interesting distinction:
    $ ls -l /proc/self/fd/9 9< .
    lr-x------ 1 scripts scripts 64 Jun 11 16:09 /proc/self/fd/9 -> /afs/athena.mit.edu/contrib/scripts (deleted)
    $ ls -l /proc/self/fd/9 9< /afs/athena.mit.edu/contrib/scripts
    lr-x------ 1 scripts scripts 64 Jun 11 16:09 /proc/self/fd/9 -> /afs/athena.mit.edu/contrib/scripts
[13:10:06] and follow the parents, etc
[13:11:11] Oh never mind, that’s because it fixed itself. :-(
[13:11:13] ah, so yeah, what you could do is open . as fd 8 and the full path as 9, and see how the dentries differ
[13:11:17] oh, or that
[13:12:58] Well, it happens every few days. I’ll check these things next time. Thanks!
[13:13:48] if you wanna know what's going on in general, it's probably triggered by someone else accessing that dir via a different mountpoint at the same time, which means we have multiple dentries that sorta resolve to the same mountpoint
[13:14:19] we either screwed something up with trying to get it to work that kernel, or what we want to do is impossible to do "correctly"
[13:14:48] obviously the newer kernels haven't been tested as long... and the relevant vfs stuff has indeed been changing
[13:15:08] (and I must go pay attention to something else now, good luck :)
[13:15:17] Alright.
[13:16:16] I have crash working now, so hopefully I’ll be able to get more information.
[16:21:52] --- deason has left
[22:39:15] --- jaltman/FrogsLeap has left: Disconnected
[22:49:38] --- jaltman/FrogsLeap has become available
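(Editor's sketch, not part of the log.) The userspace half of the check discussed above, opening "." and the full path and comparing how /proc describes each descriptor, might be scripted roughly as follows. This assumes Linux with /proc mounted; the function names (`link_looks_deleted`, `compare_cwd_with_path`) are hypothetical and exist only for this illustration. It does not inspect dentries or the vfsmount structure, which still needs `crash` as described in the conversation.

```python
import os

# Suffix the Linux kernel appends to /proc symlink targets whose dentry
# is unhashed (normally because the file or directory was removed).
DELETED_SUFFIX = " (deleted)"


def link_looks_deleted(target: str) -> bool:
    """Return True if a /proc symlink target carries the ' (deleted)' marker."""
    return target.endswith(DELETED_SUFFIX)


def describe_fd(fd: int) -> dict:
    """Collect what /proc and fstat() report for an open descriptor."""
    link = os.readlink(f"/proc/self/fd/{fd}")
    st = os.fstat(fd)
    return {
        "link": link,
        "deleted": link_looks_deleted(link),
        "dev": st.st_dev,
        "ino": st.st_ino,
    }


def compare_cwd_with_path(path: str) -> dict:
    """Open '.' and `path` separately and report how each one looks.

    In the state described in the log, 'dot' would be expected to show
    deleted=True while 'full' shows deleted=False for the same directory.
    """
    report = {}
    for label, target in (("dot", "."), ("full", path)):
        fd = os.open(target, os.O_RDONLY | os.O_DIRECTORY)
        try:
            report[label] = describe_fd(fd)
        finally:
            os.close(fd)
    try:
        report["getcwd"] = os.getcwd()
    except FileNotFoundError:
        # getcwd() failed with ENOENT, the symptom reported in the log.
        report["getcwd"] = None
    return report
```

To use it, change into the affected directory and call `compare_cwd_with_path()` with that directory's full path (in the log, /afs/athena.mit.edu/contrib/scripts). If the two entries disagree on `deleted` while `dev` and `ino` match, that would be consistent with the symptom shown in the `ls -l /proc/self/fd/9` output.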