[00:04:32] --- Russ has left: Disconnected
[00:32:44] --- Simon Wilkinson has left
[01:32:40] --- reuteras has left
[04:36:03] --- Simon Wilkinson has become available
[04:51:08] --- Simon Wilkinson has left
[04:58:37] --- Simon Wilkinson has become available
[05:22:30] --- jaltman/FrogsLeap has left: Disconnected
[06:03:05] --- jaltman/FrogsLeap has become available
[06:54:01] --- reuteras has become available
[07:03:22] --- mfelliott has become available
[07:20:14] --- reuteras has left
[07:34:26] --- deason has become available
[07:40:05] --- mfelliott has left
[07:44:51] --- mfelliott has become available
[07:57:21] --- Simon Wilkinson has left
[07:59:36] --- Simon Wilkinson has become available
[08:02:31] --- haba has become available
[08:03:16] --- meffie has left
[08:05:59] Does "1.6.0" afsd crashing some of its forked children when it finds a cache made by "1.4.x", and then not mounting /afs, ring a bell?
[08:06:13] Nope
[08:06:17] nope
[08:06:20] Or at least that's what I think happened.
[08:06:26] afsd works with a 1.4 cache. I wrote the code to make that happen.
[08:07:01] A 1.6 afsd crashing with a 1.4 kernel module, or a 1.6 kernel module crashing with a 1.4 afsd, is certainly possible. I don't think that has been well tested.
[08:07:08] Unfortunately I did an rm -rf on the cache partition, so I cannot reproduce it easily, and now it works. Stupid me.
[08:07:25] yeah, if you run some afsd that didn't come with the kmod, i don't care
[08:07:55] Stack trace / core / kernel oops message?
[08:07:59] no, the afsd and the kernel module fit together. There were just OLD contents in the cache partition.
[08:08:17] Stupid CentOS has coredumpsize=0 as default for root as well.
[08:08:35] otherwise I would probably have a core
[08:08:39] Pretty sure old cache contents, in general, aren't a problem
[08:09:05] Apart from the cache content, nothing changed between the two reboots.
[08:09:07] yeah. in general that's fine.
[08:09:08] --- mfelliott has left
[08:15:59] Hm. Found a core, but that points to a problem in the malloc(sizeof(*args)) that was called from afsd.c:3148. And sizeof(*args) is 64. That would be a crash on a malloc(64). Strange.
[08:17:09] You on RedHat?
[08:17:21] Yeah, kind of (CentOS6)
[08:18:05] Install the glibc-debuginfo RPM, and that should let you know where in malloc you are dying. Death In Malloc usually points towards heap corruption, though.
[08:23:23] http://debuginfo.centos.org/6/ is empty. Thanks for that....
[08:24:37] be thankful debuginfo.centos.org is responding at all :) it was down for weeks not long ago
[08:24:50] --- Simon Wilkinson has left
[08:25:17] --- Simon Wilkinson has become available
[08:25:58] * haba thankful. Now, can I get some content? ;-)
[08:27:42] http://ftp.redhat.com/pub/redhat/linux/enterprise/6Server/en/os/x86_64/Debuginfo/ contains 2 debuginfo packages for ruby.
[08:29:38] You're using CentOS, so you want their debuginfo packages
[08:30:10] just for comparison
[08:30:52] I don't think debuginfo.centos.org is where you want to look.
[08:31:43] If you look in your yum.conf you should see a disabled debuginfo line. If you enable that (either in that file, or on the yum command line), you should just be able to yum install the debuginfo rpm.
[08:31:46] debuginfo.centos.org came from a file in /etc/yum.repos.d/
[08:32:01] Hmmm. There's someplace else that they lurk as well, then.
[08:35:39] I think you're assuming that the debuginfo packages for the corresponding packages must be easily available; I wouldn't make that assumption :)
[08:36:23] I do not fancy building the libc from the SRPM
[08:37:24] Actually, I think I'm confusing Scientific Linux with CentOS, sadly. SL hide their debuginfo RPMs very well; it looks like CentOS just don't bother with them at all. Sorry.
[08:38:31] Looks like they are out of resources.... In August, they needed a "couple more days" according to https://www.centos.org/modules/newbb/viewtopic.php?topic_id=32248
[08:39:54] http://bugs.centos.org/view.php?id=5009
[08:39:56] Well well.
[08:40:23] In any case, my suspicion is that the debuginfo packages won't tell us that much. It sounds like we're corrupting the malloc heap somehow - running with valgrind or malloc debugging would be the best way to track that down
[08:41:02] I will think about that if I can make it reproducible
[08:41:43] http://bugs.centos.org/view.php?id=4948 seems to be the latest - "catastrophic failure of the debuginfo server"
[08:44:35] * haba will get cappuccino and banana from kitchen instead.
[08:56:40] --- Simon Wilkinson has left
[08:58:04] --- Simon Wilkinson has become available
[09:08:05] --- Simon Wilkinson has left
[09:08:47] --- Simon Wilkinson has become available
[09:12:53] --- Simon Wilkinson has left
[09:23:21] --- Russ has become available
[09:45:16] --- haba has left
[11:38:13] --- mfelliott has become available
[11:58:35] --- meffie has become available
[12:03:42] --- mfelliott has left
[12:08:12] --- mfelliott has become available
[12:21:43] --- haba has become available
[13:10:42] --- haba has left
[13:49:11] --- Simon Wilkinson has become available
[15:10:47] There's a big difference in the directory buffers code between 1.6 and master - master has a load of stuff to make the way buffer allocation works more sane - hopefully as a stepping stone to doing away with a separate buffer cache entirely, and just using the page cache.
[15:11:55] Bah. The (outdated) history in this room is getting really annoying
[15:17:40] it does at least give you (or me, at least) timestamps, so it's possible to tell if something's a week old :)
[15:18:19] But while I'm continuing the thread necromancy: I think the "making -p better" thing was a) Andrei's results from last year, where he found that the fileserver performed better for his workload with fewer threads, and b) Garrett Wollman's results, which suggested that on FreeBSD -L made things worse.
[15:19:07] deason: Sadly, Adium doesn't grace me with time stamps. It thinks the most recent thing said in the room was by you at 16:53. Only by checking the web logs do I realise that the discussion about -L -p is yonks old.
[15:37:20] --- mfelliott has left
[16:25:38] --- deason has left
[17:24:14] "what about the discussion about 'all buffers locked'?"
[17:59:35] --- Russ has left: Replaced by new connection
[17:59:35] --- Russ has become available
[22:47:26] --- reuteras has become available
[22:50:39] --- Russ has left: Disconnected
[22:52:35] --- Russ has become available
[23:00:18] --- Russ has left: Replaced by new connection
[23:00:19] --- Russ has become available
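
On the coredumpsize=0 remark at 08:08:17: a minimal sketch, assuming Python is available on the client, of checking and raising the core-file size limit so that a later crash can leave a core. The function name raise_core_limit is hypothetical and is not part of OpenAFS; only the standard resource module is used. Limits are inherited by child processes, so this would only help if it ran in the process that goes on to start the daemon.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit.

    Hypothetical helper for illustration. Returns the (soft, hard)
    pair in effect for this process afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        # A process may always raise its soft limit up to its hard limit.
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    print("core limit: soft=%s hard=%s" % (describe(soft), describe(hard)))
```

If the hard limit is itself 0, the soft limit stays at 0 and only a privileged process can lift it. In an interactive bash shell the equivalent step would be `ulimit -c unlimited` before starting the daemon.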