[06:50:48] I suppose it might be good to try to audit which structure variables are allocated on the stack, as they are sometimes large. (I sort of remember having to remove some, possibly just for FreeBSD.)
[06:54:51] Yeah, that's my point.
[06:55:34] We shouldn't be running out of stack just because we're calling pretty deep (unless there's actual recursion going on).
Much more likely is that we're allocating something on the stack that we shouldn't be.
[06:55:53] I am trying to remember why my memory tells me that someone had a script to generate a table of the stack variable types.
[06:56:07] But maybe that was just array types.
[06:58:28] Is there a way for me to get email whenever something new hits the openafs-bugs queue?
[06:58:48] The reason Linus reduced the kernel stack size was to penalise people who did silly allocations. I know we were one of those people: we used to see regular crashes when we were accessing the disk cache using filesystems with heavy stack use.
[06:59:00] kaduk: I think you can nag Daria.
[06:59:15] It's not _our_ allocations; writing into ext3 calls a bunch of things for journal stuff, etc.
[06:59:30] And we're many, many layers deep in the path resolution; recursive symlinks, etc.
[07:00:13] It is a bit like drinking from the firehose, though. You get to see all of the spam.
[07:00:29] Well, yes.
[07:00:47] But I seem to be bad at remembering to periodically check RT for new things. Sometimes there are potentially interesting things.
[07:00:52] The stack trace I just saw suggests at least some of it is our allocations. It might not all be ours, but we're a significant amount of the trace.
[07:01:18] Some is ours, but even if we reduced our stack usage to literally zero, you could still always hit the issue by going a few layers deeper.
[07:01:20] I think there was a substantial chunk of not-us in that trace as well. It will be good to see it in RT.
[07:02:49] Yeah, not-us is definitely more than 50% by frames. Once we have the trace, we can figure out frame sizes.
[07:04:12] Ticket 131831 created in queue 'openafs-bugs'
[07:04:32] Symlinks use ludicrous amounts of stack space, because resolution allocates at least one full POSIX maximum pathname length string (1k) for each level.
[07:04:54] Yes, but see the first one.
[07:05:33] Simon Wilkinson and kaduk, 131831 has the stack traces we just talked about.
[07:06:37] 1424 bytes of AFS stack in the first trace.
[07:07:28] 1824 bytes in the second.
[07:08:01] Apparently awk is not liking me. (I'm trying to avoid doing too much math in my head.)
[07:08:37] Yeah, I haven't split it down per function yet. But I think those numbers are large enough as a proportion of an 8k stack that it's worth digging deeper.
[07:10:56] The second stack is actually larger - I forgot osi_rdwr is one of ours :) - make that 1520 and 1920.
[07:11:30] The second stack also suffers from the recursion of the symlink resolution.
[07:11:45] But that's not OpenAFS.
[07:11:50] Mike: In both stacks, you've only given us the end. Is there other AFS stuff in earlier frames?
[07:12:19] simon: this is the actual stack; the beginning was repeated oopses, etc.
[07:13:38] So no useful info there. As I understand it, the corruption happened just a little bit before, but was detected after thread_return().
[07:14:09] Can you reproduce this? If so, it might be worth doing it with stack protection enabled so we can look at a clean stack.
[07:15:13] I don't think we could deploy that at large scale into production.
[07:15:29] Yeah, probably not. Just wondering if you had a simple test case that would generate this.
[07:16:35] Nope. We currently have a couple of these per week, with thousands of hosts susceptible.
[07:17:29] BTW, offloading afs_SetupVolume to another thread was an idea of my colleague who's generally responsible for the Linux kernel stuff.
[07:18:15] I think it's okay as a specific solution. What worries me is if we have a more general problem with overly liberal stack usage, of which this is just a single instance.
[07:18:41] The good news is that we haven't yet seen a stack overflow where afs_SetupVolume was not involved.
[07:19:23] Yeah, but it may be that all it takes is upstream adding 4 bytes more stack usage, and we blow up somewhere else. There are lots of places where we end up recursively calling into our cache filesystem.
[07:19:30] Yes, I share Simon Wilkinson's concern. Actually, I'd want to avoid a whack-a-mole approach, but I haven't seen another way yet.
[07:21:52] Well, if you can shave at least half a KB off the OpenAFS stack usage in these paths, that would certainly be useful.
[07:22:42] ext3 has some sizable frames.
[07:22:44] * meffie laughing at kaduk's commit message in gerrit 10950
[07:23:32] We could do some more analysis of how far the stack has been overflowed from the existing vmcores, if that would be useful.
[07:23:48] It would be interesting to know, if you can get that information.
[07:24:22] Is this a 1.4-derived kernel module, btw?
[07:25:52] No, 1.6.2.
[07:26:04] With some patches.
[07:26:48] Hmm, I'm not sure I believe the numbers I just posted to RT. How is afs_GetVolume using 288 bytes, for example?
[07:27:07] That was the thing I was trying to figure out.
[07:27:33] Unless there's some inlining going on or something.
[07:27:37] There might be inlined stuff.
[07:29:24] static struct volume *afs_NewDynrootVolume
[07:29:36] Could be getting inlined if the compiler is happy.
[07:29:48] That's only 8 octets, though, surely?
[07:30:08] Four pointers plus a char[12] on the stack in that.
[07:30:12] Ah, function, not variable.
[07:32:21] afs_NewVolumeByName is also static, and a candidate for inlining.
[07:32:41] So if both are being inlined, we get the stack hit from both of them.
[07:33:07] And NewVolumeByName has a lot more pointers on the stack. Oh, and a non-pointer struct vrequest.
[07:33:41] struct vrequest is huge.
[07:33:47] We shouldn't be allocating that on the stack.
[07:34:57] 88 octets, if we're packing as tightly as we can.
[07:34:58] Yeah, I see it.
[07:35:41] The first thing to try would be to stop the compiler from inlining those functions.
[07:36:49] *nods*
[07:37:51] The thing before the first thing would be to look at the assembly and see if those functions are inlined. I suspect they are, though: static functions with a single caller.
[07:38:17] Yup.
[07:38:43] (BTW, I gave up on awk and used python instead.)
[07:39:55] afs_linux_readpages has a struct vrequest as well.
[07:41:32] Typing here instead of talking over Mike: removing the 'static' would have a decent chance of forcing the compiler not to inline it, but it's not guaranteed.
[07:42:19] Linux does have a "don't inline this ever" attribute.
[07:42:29] Yeah, that would be more reliable.
[07:42:32] If you grep the kernel source you should find it pretty quickly. I wonder if I have that source here.
[07:42:44] I don't have a local linux-2.6.git.
[07:42:53] But I think /afs/sipb.mit.edu/contrib/linux does.
[07:44:08] "static noinline"
[07:44:48] afs_vtoi is static, and just before the get-vol-slot call...
[07:45:43] No real stack use in that, though. Two ints, so 8 octets plus overhead.
[07:49:35] 304 for osi_rdwr seems very high.
[07:50:17] Yeah, but I remember osi_rdwr being complicated, so I didn't look at it first.
[07:51:46] The kernel guy says he remembers somewhere between "just a little" and "below 200 bytes", but only from a limited set of vmcores.
[07:51:46] How big is an mm_segment_t?
[07:51:59] I mean for the overflow size.
[07:52:00] Standby...
[07:53:20] 4 octets.
[07:53:43] It's just a wrapped long, presumably so you don't accidentally do maths on it when you shouldn't.
[07:53:51] Sure.
[07:53:52] mm_segment_t is a long, so 8 octets.
[07:54:05] On x86_64.
[07:54:54] Bah, yeah. I keep forgetting that.
[07:55:08] LP64, not LLP64.
[07:55:23] But it's still the same width as a primitive type, so probably uninteresting for stack consumption.
[07:56:48] struct vattr is a good size, in afs_linux_dentry_revalidate.
[07:57:43] I can't see anything in osi_rdwr that would explain its stack size. get_fs and set_fs are actually pretty boring.
[07:59:29] Any chance we could see the assembly for this kernel module?
[07:59:46] Or, Mike could look at the assembly and see what's going on?
[08:00:33] I don't have it at the moment, but yeah.
[08:04:57] Isn't osi_rdwr only 96 bytes, and do_sync_read the one using 304? Which makes sense, looking at do_sync_read.
[08:05:30] That would make more sense. Let me read the raw stack trace again.
[08:06:04] Hmm, because the stack address is at entry, yes, that sounds right.
[08:06:39] Yes, I think you need to look at the difference with the following function.
[08:07:39] So afs_linux_readpages takes 0x140?
[08:08:07] It has a vrequest.
[08:08:44] Yeah, that makes sense.
[08:08:50] And pagevec is a good size.
[08:09:09] 16 longs and pointers.
[09:05:32] Moved the client-changes-required/not-required bullets onto their correct slides and re-uploaded. Sorry :\
[09:06:41] Hilariously, I even summarized the point of achernya's document, which mentions that rxkad-k5 requires no client changes, immediately after contradicting myself...
[09:41:30] I declined to ask verbally since we're ~10 minutes behind schedule, but with the mention of the "rx listener thread", does the balance change if one is using upcalls?
[09:42:20] We've only got upcall support in the kernel, and I haven't really done much benchmarking there, as I know that the GLOCK is going to dominate everything.
[09:42:38] Sure.
[09:42:42] Can you do upcalls (downcalls?) for incoming UDP packets in user space?
[09:42:52] I don't know, offhand.
[09:43:18] What is very promising is getting the whole incoming UDP queue out of the kernel in a single system call.
[09:43:32] That could be handy, yes.
[09:44:25] Scheduler priorities look handy too, but you need to be root (or have specific bits set on the binary) in order to be able to use them. We've moved away from running servers as root.
[09:44:30] But adding platform-specific code can lead to a maintenance headache.
[09:45:27] http://bugs.centos.org/view.php?id=6949 (the CPU-scheduler bug in RHEL-derived kernels). Credit for finding this goes to the folks at the HPC centre in Linköping (NSC), not PDC.
[09:45:29] True, it's a trade-off.
[10:00:37] I feel like if I have an out-of-band question I should be signalling it with Morse code or semaphore or something.
[10:00:53] Or asking it here. Andrew needs a Google Glass onto the chatroom...
[10:02:05] heh - out of band
[10:07:17] "Disable hot threads" - is there a way to do that other than modifying the source code?
[10:07:36] Don't think so.
[10:07:40] I see I'm not the only one keeping a 'todo' list during these talks ;)
[10:09:45] Some things have it as a command-line switch. But I think the fileserver may not be one of them.
[10:09:58] "That can be fixed"
[10:10:03] Actually, you want to just rip out the code. Removing it makes the rx scheduler much simpler.
[10:10:50] Just double-checked: it's a hardcoded call to rx_EnableHotThread() in the fileserver - no option to turn it off.