[06:06:54] we should do cppi at the same time we fix whitespace. probably "immediately before 1.6"
[06:07:52] Sounds good
[07:22:41] Yeah, that sounds good. It'll be annoying to have another "reformat the world" commit, but in the long run, probably less annoying than not dealing.
[07:22:54] so don't do that
[07:22:59] mix
[07:23:03] pmix, in fact
[07:44:35] I have a rather badly hung AFS situation:
    ekman# jobs
    [1]   Running   ls /afs/nada.kth.se/ &
    [2]   Running   /usr/openafs/bin/fs where /afs/nada.kth.se/ &
    [3]   Running   /usr/openafs/bin/fs flushv /afs/nada.kth.se/ &
    [4]-  Running   /usr/openafs/bin/fs flushv /afs/nada.kth.se &
    [5]+  Running   /usr/openafs/bin/fs flushv /afs &
[07:45:01] cmdebug
[07:46:18] ... says a lot about Lock ... (none_waiting)
[07:46:33] Did something deadlock?
[07:46:33] if there's nothing up, it should say nothing. pastebin
[07:46:42] beats me
[07:47:06] What info do we want to collect before we try to restart?
[07:47:11] This is 1.4.11
[07:47:26] the cmdebug information?
[07:47:55] Does -long give me everything?
[07:48:12] i don't want everything.
[07:48:16] (it does say "all" in the help text ;-)
[07:48:18] run cmdebug. pastebin the output
[07:48:30] like, ideally, *before* you go off rebooting shit
[07:48:42] sure, now (= before)
[07:51:35] cmdebug -long output now at /afs/stacken.kth.se/home/haba/Public/cmdebug-ekman-20100304-long.log
[07:52:18] It's only 1372 lines.
[07:52:41] great. can i have it without -long?
[07:53:01] you keep trying to give me a firehose, i'd like a faucet
[07:53:14] Not that interesting without -long:
    ekman# cat /scratch/cmdebug.log
    ** Cache entry @ 0xe4d2a540 for 8.536870913.38.10404 [pdc.kth.se]
        locks: (writer_waiting, write_locked(pid:17143 at:133), 2 waiters)
        63142 bytes  DV 0  refcnt 4
        callback 41f970c0  expires 1267550028
        0 opens  0 writers
        mount point  states (0x4), read-only
[07:53:30] on the contrary, that's very interesting
[07:53:42] Ok!!
[07:53:48] it tells us the problem instead of a bunch of bullshit we don't care about
[07:53:56] *bows*
[07:54:21] that lock is "EvalMountPoint"
[07:54:37] 536870913 is root.afs
[07:56:20] on another machine, where you have access to root.afs and not dynroot: ls -lid /afs | grep 65574
[07:58:13] no match. But on the other machine, fs getfid /afs/all-mountp-I-tried hangs, too. :-(
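An aside on where that 65574 comes from: the cache manager has to fabricate Linux inode numbers for AFS files, and in the 1.4-era code the number ls -li reports is derived from the FID, roughly vnode + (volume << 16), truncated. A minimal sketch of the arithmetic (a simplification for illustration, not the actual source):

    #include <stdio.h>
    #include <stdint.h>

    /* Sketch: roughly how a 1.4-era cache manager turns an AFS FID into
     * the inode number that ls -li reports.  This is how the FID
     * 8.536870913.38.10404 from the cmdebug output above maps to the
     * 65574 being grepped for. */
    int main(void)
    {
        uint32_t volume = 536870913;    /* root.afs, per the cmdebug output */
        uint32_t vnode  = 38;
        uint32_t inum   = (vnode + (volume << 16)) & 0x7fffffff;
        printf("%u\n", (unsigned)inum); /* prints 65574 */
        return 0;
    }

Which is why grepping the ls -li output for 65574 should locate the directory entry for the mount point in question.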
[07:58:28] Something is unhappy
[07:58:43] or very very slow
[07:59:21] very slow, probably. afsdb?
[08:00:48] You mean very slow DB servers?
[08:01:02] afsdb records failing to resolve, or being very slow to
[08:01:07] no. i meant what i asked.
[08:01:19] Ah, DNS?!
[08:01:20] Hmm
[08:01:25] or rather "are your clients using AFSDB records"
[08:01:36] is afsd invoked with -afsdb?
[08:01:42] No, CellServDB for almost all of them
[08:02:02] you say "no" and proceed to not answer my question
[08:02:20] i don't care if the cells are all in CellServDB: they'll still be looked up in DNS
[08:02:29] (if afsd is run with -afsdb; is it?)
[08:02:35] yes
[08:02:42] invoked with -afsdb
[08:02:56] is this linux? wouldn't 'echo t > /proc/sysrq-trigger' say specifically what it's hanging on?
[08:03:20] ok. so by "no" you mean yes. on that client (fs getfid), does cmdebug show it waiting on the afsdb_req lock?
[08:03:36] and yeah, what deason said for the one which is hanging in EvalMountPoint
[08:06:13] oh. sorry. the command i gave earlier, i said "ls -lid", right? i meant ls -li
[08:06:21] try that and grep for (whatever that number was)
[08:06:29] * haba took a look at the console. There was a LOT.
[08:06:51] yes, there is a lot. you need to look for pid 17143
[08:07:17] (which according to the cmdebug you gave us is the process blocked with lock 133 held)
[08:08:59] This looks like a match:
    bash          S ffff810001014f00     0 17143      1         17606  3460 (NOTLB)
     ffff81019e869748 0000000000000046 ffffffff886729e0 ffff810124bc8800
     ffff81043f1a6800 000000000000000a ffff8101231267b0 ffff8102471d07f0
     000a810a1740f73b 00000000000094ce ffff810123126998 00000004906d1610
    Call Trace:
     [] :libafs:afs_cv_wait+0x143/0x257
     [] default_wake_function+0x0/0xe
     [] :libafs:rxi_ReadProc+0x29b/0x3ff
     [] :libafs:rxi_AllocPacketNoLock+0x92/0x141
     [] :libafs:rxi_AllocSendPacket+0x61/0x135
     [] :libafs:rx_ReadProc32+0x9e/0xb1
     [] :libafs:afs_xdr_char+0x0/0x54
     [] :libafs:xdrrx_getint32+0x16/0x2e
     [] :libafs:xdrrx_putbytes+0xc/0x16
     [] :libafs:afs_xdr_char+0x3d/0x54
     [] :libafs:afs_xdr_vector+0x2f/0x51
     [] :libafs:xdr_uvldbentry+0x22/0x1f0
     [] :libafs:VL_GetEntryByNameU+0x6b/0x15d
     [] :libafs:afs_ConnByMHosts+0x45/0x5f
     [] :libafs:afs_NewVolumeByName+0x2ba/0x5bd
     [] :libafs:EvalMountPoint+0x1b4/0x3cf
     [] :libafs:afs_lookup+0xccd/0x112e
     [] :libafs:afs_linux_lookup+0x8c/0x1d7
     [] :libafs:afs_access+0x375/0x398
     [] do_lookup+0xe5/0x1e6
     [] __link_path_walk+0xa01/0xf42
     [] link_path_walk+0x5c/0xe5
     [] do_path_lookup+0x270/0x2e8
     [] getname+0x15b/0x1c1
     [] __user_walk_fd+0x37/0x4c
     [] vfs_stat_fd+0x1b/0x4a
     [] sys_newstat+0x19/0x31
     [] system_call+0x7e/0x83
[08:09:22] so it's waiting in futility for the other side to answer
[08:09:31] it *should* time out
[08:10:35] you're waiting for a vldb query, though; vlservers okay?
[08:12:10] whose vldb? mine? someone else's? udebug says that mine is fine.
[08:12:26] well, if you run that ls -li i told you, we can tell you whose
[08:13:28] I don't have any ls -li that has returned yet.
[08:13:45] run it on a client which you haven't already screwed?
[08:13:58] same same
[08:14:04] oh. i suppose if all your clients are the same, you'll hang it too
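For anyone reading along: the top of that trace (afs_cv_wait under rxi_ReadProc) is rx's read path parked on a per-call condition variable, waiting for a reply packet. A self-contained model of that wait (names and types here are illustrative, not the OpenAFS source):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Illustrative sketch only, not the OpenAFS source.  The wait at
     * the top of the trace has this shape: the reader sleeps on a
     * per-call condition variable until a reply packet is queued or
     * the call errors out.  With no timeout armed, a silent server
     * means the loop in main() never exits. */
    struct call_sketch {
        pthread_mutex_t lock;
        pthread_cond_t  readable;
        int             have_data;   /* a reply packet has arrived */
        int             error;       /* the call was aborted */
    };

    static struct call_sketch call = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0
    };

    static void *server_replies(void *arg)
    {
        (void)arg;
        sleep(1);                    /* the server finally answers */
        pthread_mutex_lock(&call.lock);
        call.have_data = 1;
        pthread_cond_signal(&call.readable);
        pthread_mutex_unlock(&call.lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, server_replies, NULL);
        pthread_mutex_lock(&call.lock);
        while (!call.have_data && !call.error)   /* where pid 17143 sat */
            pthread_cond_wait(&call.readable, &call.lock);
        pthread_mutex_unlock(&call.lock);
        pthread_join(t, NULL);
        puts("reply received");
        return 0;
    }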
[08:15:48] i suppose you can try seeing if rxdebug helps: see what the client has an open connection to port 7003 on and is waiting for packets from
[08:25:40] This is in "wait":
    Connection from host 130.237.234.253, port 7003, Cuid ab53ca29/2db7b928
      serial 42, natMTU 1444, security index 0, client conn
      call 0: # 11, state active, mode: receiving, flags: reader_wait, has_output_packets
      call 1: # 0, state not initialized
[08:26:01] that seems plausible
[08:26:15] The strange thing with this conn is that 130.237.234.253 is not any DB server of ours or anything like that.
[08:26:27] it has a vlserver on it
[08:27:06] The merrier!
[08:27:29] besserwisser.org
[08:28:09] Ok, _that_ cell.
[08:28:11] Hmm
[08:29:12] Should I go and shoot that server?
[08:29:27] probably. rxdebug 7003 looks kinda sickly
[08:30:13] But in that case we have a 1.4.8 vldb that can hang a 1.4.11 client?
[08:30:39] seems it. oddly, it suggests idledeadtime doesn't apply to vl connections.
[08:31:03] because that's the point of idledeadtime. before that, any server could hang any client
[08:35:32] why would a hanging vl query in another cell stop other accesses, though? it's not all in one place like afsdb
[08:35:55] oh, that lock is global
[08:36:09] Yuck
[08:36:18] no, it's on the tvc
[08:36:51] I would have thought it would only cause other ops on that mount point to block, not everything (going by the 133 lock, up there)
[08:37:17] I guess it depends on what other locks we hold at that point.
[08:37:31] talked to Måns on a bus. he used his 3G to log in to the server. vldb restarted, lock goes away
[08:37:44] And the BKL won't be helping.
[08:38:29] So yes, it was that vldb locking up the whole shit.
[08:38:58] This sounds like something we should fix, though. A single vldb hanging shouldn't take down the entire client.
[08:39:04] If it does, can you open something in RT?
[08:39:41] cmdebug only showed that vcache lock held, though, which is why I say that
[08:39:47] BKL or something similar makes sense, though
[08:40:14] Yes, I can open something in RT
[08:40:17] I need to check what the semantics for the BKL are again. I know that it's re-entrant within a core.
[08:40:39] I can't remember if it's restricted to just the process which got the lock, or if it's any process on that core.
[08:42:45] * haba should dinner.
[08:46:33] Thanks folks, I will summarize something for RT after dinner.
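The idledeadtime point deserves a note: rx connections can carry an idle-dead timeout that aborts a call which has gone too long without hearing from the peer, and the complaint above is that the client's VL connections effectively had none, so one wedged vlserver could hold a reader forever. A toy model of the check (field names are approximations, not the exact rx structures):

    #include <stdio.h>
    #include <time.h>

    /* Illustrative sketch, not the OpenAFS source: the kind of periodic
     * check an rx keepalive event can make.  The point is that with
     * idleDeadTime == 0, effectively the case for this client's VL
     * connections, the check never fires and a reader can wait on a
     * dead vlserver indefinitely. */
    struct conn_sketch { int idleDeadTime; };       /* seconds; 0 = disabled */
    struct call_sketch { time_t lastReceiveTime; int error; };

    static void check_idle(const struct conn_sketch *conn,
                           struct call_sketch *call, time_t now)
    {
        if (conn->idleDeadTime != 0
            && now - call->lastReceiveTime > conn->idleDeadTime)
            call->error = 1;     /* abort the call, waking any reader */
    }

    int main(void)
    {
        struct conn_sketch conn = { 60 };           /* 60s idle-dead */
        struct call_sketch call = { 0, 0 };         /* last heard at t=0 */
        check_idle(&conn, &call, 3600);             /* an hour later... */
        printf("call %s\n", call.error ? "aborted" : "still waiting");
        return 0;
    }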
[10:42:41] haba: if you still have that 'echo t' output, can you see what the other processes are blocked on?
[10:43:03] does lock_kernel appear in there, for instance?
[10:45:37] I have a theory that, given that the operations we call on the underlying fs no longer require lock_kernel, we shouldn't either, so we should be able to get rid of lock_kernel entirely.
[10:45:55] But I'd need a big SMP machine, and a lot of test cycles, before I was prepared to ship on that hunch.
[10:46:13] they just don't use the bkl, or they replace it with more logical locks?
[10:46:32] They use better locking internally.
[10:47:14] what I'm not clear on is whether it makes sense to only lock/unlock GLOCK without the bkl
[10:47:18] Basically, when Linux went SMP, the solution to the contention problem was to just use the BKL everywhere. Since then, it's gradually been purged, and at some point all of the VFS operations we call into when using a disk cache became BKL-safe.
[10:47:38] a couple of places do both at the same time, but not most of them (since lock_kernel is going to be linux-specific, obviously)
[10:47:41] I'm not sure what you mean. They're two different locks, protecting two different things.
[10:48:15] do we use the bkl just for the requirements of kernel callers?
[10:48:43] I believe we use it to serialise the kernel functions that we call.
[10:49:30] So, for example, it prevents two cores from simultaneously modifying the super block for the cache partition. (Or it used to, before that super block got its own lock.)
[10:49:32] I'm just thinking, in this specific case of vl lookups, it doesn't seem very useful (at least on SMP) to drop GLOCK but keep the bkl (I think)
[10:50:01] Hang on, I just need to check the ways in which the BKL is re-entrant, before I misspeak here.
[10:54:27] Okay, so a given process can acquire the kernel lock as many times as it likes.
[11:00:26] However, the BKL is also dropped when a task is rescheduled. I think, although I'll need to read more code to be sure, that this means that if our process sleeps (rather than blocking on a spinlock), it will give up its kernel lock.
[11:02:10] I'm not sure, but from looking around, it looks like "schedule() drops the BKL" may have been changed... I see references to fixing code that assumed that
[11:02:33] The code for 2.6.33 suggests that it does...
[11:03:13] I might just be looking at one of those "kill the bkl" trees or something strange
[11:03:55] The kernel lock is a spinlock, though, so if other processes were waiting on it, the load should have gone through the roof.
[11:10:01] I thought that depends on the version; between 2.6.7 and 2.6.26 it may have been something else
[11:10:13] at least, that's what I'm inferring from the beginning of http://kerneltrap.org/mailarchive/linux-kernel/2008/5/14/1821774
[11:12:05] There's certainly code that's dependent on CONFIG_PREEMPT in 2.6.33, so I suspect that post is just a snapshot in time, too.
[11:13:07] If Ingo's stuff ever makes it to mainline, we're going to have to do some major surgery on the cache manager. We're hugely reliant on the fact that the BKL goes away on schedule().
[11:13:15] (We make no attempt to release it when we sleep.)
[11:15:41] 2.6.20 seems to use a spinlock too.
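On the re-entrancy question above: the lock depth is recorded per task (lock_depth in task_struct), so re-entry is restricted to the process which took the lock, not to everything on that core. A userspace model of the two properties discussed, recursion and the classic release-on-schedule() behaviour (a sketch, not kernel code):

    #include <stdio.h>

    /* Illustrative model of the BKL semantics discussed above, not
     * kernel code.  (1) lock_kernel() is recursive: the owning task
     * just bumps a depth counter (spinning only if another task holds
     * the lock).  (2) Classically, schedule() released the lock for a
     * sleeping task and reacquired it on wakeup, which is the behaviour
     * the cache manager relies on instead of dropping the BKL itself
     * before sleeping. */
    static int lock_depth;        /* per-task in the real kernel */

    static void lock_kernel_model(void)   { lock_depth++; }
    static void unlock_kernel_model(void) { lock_depth--; }

    static void schedule_model(void)
    {
        int saved = lock_depth;   /* release whatever depth we hold... */
        lock_depth = 0;
        /* ...another task may run here... */
        lock_depth = saved;       /* ...and reacquire it on wakeup */
    }

    int main(void)
    {
        lock_kernel_model();
        lock_kernel_model();      /* re-entry by the same task is fine */
        schedule_model();         /* sleeping does not deadlock on the BKL */
        unlock_kernel_model();
        unlock_kernel_model();
        printf("depth %d\n", lock_depth);   /* prints: depth 0 */
        return 0;
    }

If schedule() ever stops doing that release-and-reacquire dance, as with the "kill the bkl" work mentioned above, code that sleeps while implicitly holding the lock will deadlock; that is the surgery the cache manager would need.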