[00:07:09] --- mfelliott has left
[00:07:20] --- mfelliott has become available
[01:06:42] I zapped the message to openafs-announce when it became obvious that it was our stupidity, not Debian's, that was at play here.
[01:07:55] --- cudave has left: Replaced by new connection
[01:07:56] --- cudave has become available
[01:07:58] --- Jeffrey Altman has become available
[01:15:27] --- haba has become available
[03:31:13] --- Simon Wilkinson has left
[05:12:29] --- summatusmentis has left
[05:12:34] --- summatusmentis has become available
[05:13:28] ok
[05:32:42] --- summatusmentis has left
[05:32:48] --- summatusmentis has become available
[05:45:55] --- sxw has become available
[05:50:43] Huh, the 1.6.0pre2 build is bailing in my amd64_fbsd_82 chroot, but did fine in the _81 chroot. (In JUAFS, it's complaining about conflicting types for SRXAFSCB_{CallBack,...})
[05:51:19] weird
[05:51:26] 64bit_env not defined?
[05:51:50] --- sxw has left
[05:53:11] --- sxw has become available
[05:53:53] Or something has a different, incompatible size in a system header that's included in one build type, but not in the other.
[05:56:02] could be.
[06:17:09] --- sxw has left
[06:43:25] --- haba has left
[07:23:10] --- deason has become available
[07:34:52] > I zapped the message to openafs-announce
wait... so we're not sending an announcement still?
[07:35:11] not that one...
[08:07:25] > 64bit_env
could be, though I thought we fixed that. Maybe it didn't make it to 1_6_x
[08:10:46] it looks like you only have it for !defined(UKERNEL)
[08:11:33] --- reuteras has left
[08:26:14] --- haba has become available
[08:42:04] Hmm, do I want to just disable UKERNEL builds ...
[08:44:40] not really?
[08:51:07] --- kaduk@mit.edu/barnowl has left
[08:52:11] --- kaduk@mit.edu/barnowl has become available
[08:53:08] Incidentally, geofft ran into issues with afsd.fuse not working at all, recently. This caused me to actually look -- is it expected to be fairly functional at this point?
[08:55:13] it worked once, a few months ago; I haven't really looked at it since
[08:55:58] So, bug reports to -devel@, then?
[08:56:06] but it's still labeled as an "experimental" part of the tree
[08:56:24] Sure, but as I recall his words were "something with half a chance of working"
[08:56:36] (being his expectation of "experimental")
[08:57:06] if you want to send something with more info, sure, but if it just doesn't work at all due to recent changes, I can just try to run it from 1.6 today and see what's up
[08:57:19] (I'm assuming this is linux, btw? I don't expect anything else to work)
[08:57:25] I'm not sure which version he was running. Yeah, linux.
[08:59:49] geofft / openafs / geofft 11:59 (Geoffrey Thomas)
    1.6.0~rc1 from the PPA
[09:00:43] actually, I do have a 1.6.0pre1 tree lying around.... afsd.fuse doesn't fail immediately for me
[09:01:17] Hmm. He says:
    geofft / openafs / geofft 12:00 (Geoffrey Thomas)
    I may have not been doing something right, but neither afsd.fuse -mountdir foo nor sudo afsd.fuse -mountdir foo seemed to do anything
    geofft / openafs / geofft 12:00 (Geoffrey Thomas)
    (The former failed, and the latter worked but all accesses hung)
[09:03:09] you don't need to be root to use it; if there's any part of a real client on that machine I would suggest -cachedir /foo/bar as well
[09:03:29] for me, 'afsd.fuse -cachedir /tmp/foo.cache -mountdir /tmp/foo.afs' seems to work as much as I expect
[09:03:44] (after creating those dirs)
[09:04:10] if it's hanging, you can just attach with gdb and give me a backtrace to see what it's trying to do
[09:04:12] do you have the "don't verify /afs" patch? i think it needs to hit 1.6
[09:06:16] --- geoffreyerffoeg@gmail.com/barnowlBEF62565 has become available
[09:06:17] we should check for the existence of cacheMountDir, though, and we're not using -nomount
[09:06:21] I don't have a tree handy on this machine. geofft is probably too hosed to check right now.
[09:06:47] in fact, nor do I really have a machine to play with at the moment (I was doing this on a live CD)
[09:08:04] --- pod has left
[09:10:11] oh, and if you pass -d to afsd.fuse, it'll run in the foreground and give you some more output, possibly complaining about what's wrong
[09:12:21] --- haba has left
[09:27:10] --- pod has become available
[10:02:34] --- rra has become available
[12:05:29] With 1.6.0pre2, I get no output other than
    $ sudo afsd.fuse -d -cachedir /tmp/foo.cache -mountdir /tmp/foo.afs
    FUSE library version: 2.8.4
    nullpath_ok: 0
    unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
    INIT: 7.16
    flags=0x0000007b
    max_readahead=0x00020000
until I background it and then foreground it, at which point
    ^Z
    [1]+ Stopped sudo afsd.fuse -d -cachedir /tmp/foo.cache -mountdir /tmp/foo.afs
    anders@fixed-disk:~$ fg
    sudo afsd.fuse -d -cachedir /tmp/foo.cache -mountdir /tmp/foo.afs
    afsd: Mountpoint /tmp/foo.afs missing.
[12:06:06] (All accesses to /tmp/foo.afs hang in the meantime.)
[12:44:40] --- sxw has become available
[12:45:22] --- sxw has left
[12:53:53] andersk: hmm, okay, I see... pre1 works, though. it's obviously the new /afs check; I didn't think fuse mounted the dir so early, but apparently it does
[13:04:56] --- haba has become available
[13:30:17] --- haba has left
[13:51:48] --- deason has left
[13:58:22] --- deason has become available
[14:08:37] --- Simon Wilkinson has become available
[14:22:26] Huh, looks like the fbsd81 builder is not automatically getting feeds from the master?
[14:27:06] Does it maybe need the magic touch from Jason, yet?
[14:28:28] have you tested the builder to ensure it works by manually issuing a branch and reference to build?
[14:28:47] Yeah. http://buildbot.openafs.org:8010/buildslaves says "Not used by any builders", which means that there is some configuration missing to link it in with the change notifications.
[14:29:10] jason doesn't configure a slave for auto-builds until he has verification that it is working
[14:29:28] you then need to tell him whether you want daily builds or build every patchset
[14:29:29] jaltman: no. (But it's Garrett's machine, not mine.)
[14:29:34] http://buildbot.openafs.org:8010/buildslaves/freebsd81-amd64 suggests that it is
[14:29:50] Actually, that looks like it's just doing "build tip every 12 hours"
[14:30:06] there you go
[14:30:11] Does anyone actually look at those results?
[14:30:16] I do
[14:31:15] Is there a specific waterfall that only shows tip builds? So you can tell the ones where failure is a problem with the committed tree?
[14:32:56] no. I click through the grid display to check the last build for each of the machines that is obviously not building every patchset
[14:33:22] I don't check daily. usually once a week unless I'm looking for something in particular.
[14:34:07] > Does anyone actually look at those results?
just speaking for myself, I wouldn't mind an email or some kind of notification mechanism when those things fail
[14:34:50] Yeah. I'm just looking, and there's oh-so-much breakage in master.
[14:34:56] Much of it mine, I fear.
[14:35:05] Yeah, mail would be good for these things.
[14:35:21] I haven't touched master in a month or two, focusing on testing 1.6
[14:35:29] For example, hcrypto is broken on mac os x86_64. And that must have been the case for months.
[14:40:12] Further investigation reveals that it's that build machine that's bust, not the build.
[14:41:13] In fact, a number of the builders on that page seem to not be properly cleaning their trees between builds.
[14:41:51] The windows slaves "git clean" as part of the build script.
[14:41:58] I suspect these are not
[14:42:28] The ARM builder is legitimately broken - it doesn't have a GSSAPI library installed, which master now requires.
[14:44:01] RHEL5-x86_64 is broken due to the new rfc3961 library code.
It's pulling in the keyring includes, and the definition there of key_type is conflicting with Heimdal's
[14:45:26] AIX6 doesn't have a working C compiler :)
[14:45:29] I've been meaning to ask about gss... we don't want to make it possible to build (just without some components) if a gssapi lib isn't available?
[14:47:22] I think we probably do. It should just mean not building rxgk, and a load of #ifdefs in auth/ and aklog/
[14:47:40] my opinion is that it should be possible to deploy in an rxkad only mode but that building should require rxgk on master (but perhaps have an off switch in 1.10).
[14:48:09] I need to figure out how to get Russ's RRA_LIB_GSSAPI to not fail if it can't find a GSSAPI library.
[14:51:13] According to http://www.openafs.org/no-more-des.html, the 1.6 release will optionally not build kaserver. After 1.6 kaserver will not be built by default. 2.0 will be the release with rxgk and one year after that rxkad will be deprecated. First disabling on clients, then on servers, and then removing it from the source tree.
[14:52:08] > 1.6 release will optionally not build kaserver
[14:52:12] So we should do that, then?
[14:52:17] we should
[14:52:28] and we should apply the post-1.6 change to master
[14:53:18] Assuming that the next release from master will be the one with rxgk in it.
[14:54:09] The post-1.6 (do not build kaserver by default) change is regardless of rxgk. That is you really must use a krb5 kdc
[14:54:38] That is different from the "do not build rxkad" by default which will occur one year after the release of 2.0
[14:55:49] Ah, okay.
[15:17:36] --- Simon Wilkinson has left
[15:17:37] --- Simon Wilkinson has become available
[16:09:38] --- phalenor has left
[16:09:44] --- phalenor has become available
[16:14:27] --- deason has left
[16:38:04] > AIX6 doesn't have a working C compiler :)
yes it does.
the wrapper script sucks
[16:39:48] it's the same system i do the aix6 builds on by hand, so, yeah, pretty sure it works
[16:40:33] So it's just buildbot hating us?
[16:41:10] yeah
[16:44:00] the script being executed by the slave
[16:47:21] --- phalenor has left
[16:47:26] --- phalenor has become available
[16:47:56] starking, freebsd has automated tinderboxen that fetch HEAD and try to build it, in a tight loop; the (shortened) build log is sent to freebsd-current@freebsd.org in the failure case. If the error goes unfixed, it keeps sending mail.
[16:50:27] That might get tired in our case.
[16:51:02] Personally, I think a waterfall display that shows only machines that should be working would go a long way. I think that's fairly easy to configure in buildbot, providing you know which machines should be working.
[17:22:09] --- shadow@gmail.com/owlAA60DA2E has left
[17:30:35] --- shadow@gmail.com/owl81329388 has become available
[17:42:52] --- rra has left: Disconnected
[18:10:44] --- deason has become available
[18:11:52] > yes it does. the wrapper script sucks
well, that error can mean a lot of things (like the disk being full); it'd help to know what the config log says
[18:20:30] --- phalenor has left
[18:20:36] --- phalenor has become available
[19:10:33] --- deason has left
[19:14:43] --- deason has become available
[19:27:18] Hmm, with a fresh 1.6.0pre2 client, trying to copy a 20M file from local disk into AFS, cp hangs in afs_osi_sleep(&afs_WaitForCacheDrain), afs_dcache.c:3243. afs_vcount is 10, and I have hardly done anything on the machine; the memcache is some 450M.
[19:30:30] Well, that's not really hanging; it's blocking on a CV
[19:30:47] *rolls eyes* the copy is going nowhere, and no data has been written.
[19:30:54] The process is unkillable.
[19:32:23] OK, fine. From user mode, it looks hung. In the kernel, you know it's waiting on another thread.
[19:34:11] This is about dcaches, not vcaches. What is your chunk size?
[19:35:23] I appear to be running with "-stat 2800 -dcache 2400 -daemons 6 -volumes 128" and autotuning for ~everything else.
[19:36:07] cmdebug -cache
I don't remember what the chunk size heuristic is for memcache
[19:37:37] That sleep should only happen when a substantial fraction of the cache blocks are in use. What you describe oughtn't trigger that, unless of course the cache size is not what you think it is. But even then, it ought to drain, even if that means pushing stuff to the fileserver early.
[19:37:44] the CacheTruncateDaemon is sleeping at line 448:
    446 if ((afs_termState != AFSOP_STOP_TRUNCDAEMON) && afs_CacheTooFull
    447     && (!afs_blocksDiscarded || afs_WaitForCacheDrain)) {
    448     afs_osi_Wait(100, 0, 0); /* 100 milliseconds */
[19:38:12] OH, hm; my sandbox is old. One moment...
[19:39:04]
    freebuild# cmdebug localhost -cache 7001
    Chunk files: 2400
    Stat caches: 2800
    Data caches: 2400
    Volume caches: 128
    Chunk size: 8192
    Cache size: 19200 kB
    Set time: no
    Cache type: memory
[19:39:46] OK, so the cache is not as big as you think it is.
[19:39:58] Hmm ...
[19:41:51] But where would it be getting the other size from?
[19:42:41] What other size?
[19:43:22] I have
    freebuild# cat /usr/local/etc/openafs/cacheinfo
    /afs:/usr/vice/cache:425900
So where is this 19200 coming from?
[19:45:42] It's computed.
[19:46:40] Clearly I should RTFM. Which one?
[19:47:30] Memcache imposes some constraints, since everything is in memory. One of those is the one I just mentioned. Another is that the number of chunks and the number of dcache entries is exactly the same. By specifying that number, you caused the cache size to be computed based on that and the chunk size (which, in that case, is inferred if you don't specify it).
[19:49:25] If you'd given both -blocks and -dcache on the command line, it would have refused to start. If you'd given -files, it would have printed a warning, but continued; that parameter is not used in memcache.
Since you gave -dcache but not -blocks, it ignored the size it read from the cacheinfo, in favor of computing one from what you gave on the command line. Arguably this is a silly way to behave.
[19:49:52] I don't know that there is much documentation on memcache.
[19:50:26] Okay. Do I want to just not give -dcache?
[19:52:34] Yeah; for memcache I'd specify the cache size and maybe the chunk size (it will pick something pseudo-reasonable if you don't). It's really silly for -dcache to be the thing that's used here, since what that's supposed to do is specify the size of an in-memory cache of a larger on-disk data structure, but for memcache, the whole data structure is in memory anyway.
[19:52:50] FWIW, I'm reading src/afsd/afsd.c
[20:50:26] (For reference, make -j6 with 1.6.0pre2 dies horribly. But I didn't keep a full build log with which to point fingers.)
[21:59:00] --- haba has become available
[22:11:35] --- deason has left
[22:26:30] --- haba has left
[22:30:17] --- reuteras has become available
[22:50:21] --- Russ has become available
[23:36:28] --- Russ has left: Disconnected
[23:47:23] --- reuteras has left
[23:50:05] kaduk: master seems to build pretty reliably with -j. I wonder if we are missing a pullup somewhere.
[23:56:48] --- haba has become available
[23:58:18] --- haba has left
[23:58:21] --- haba has become available
[23:58:53] --- Simon Wilkinson has left
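
Note on the memcache sizing discussed between 19:39 and 19:52: the rule described there (one chunk per dcache entry, so cache size = dcache entries x chunk size, with -dcache taking precedence over the cacheinfo value) can be sketched roughly as below. This is only a model of the behaviour as described in the conversation, not code from src/afsd/afsd.c. The function names are hypothetical, the 8192-byte default chunk size is simply the value cmdebug reported above, and treating -blocks as a size in kB is an assumption.

```python
def memcache_size_kb(dcache_entries, chunk_size_bytes):
    """Cache size in kB when memcache holds exactly one chunk per dcache entry."""
    return dcache_entries * chunk_size_bytes // 1024


def resolve_memcache_size(cacheinfo_kb, blocks=None, dcache=None, files=None,
                          chunk_size_bytes=8192):
    """Hypothetical model of the option handling described in the log.

    Returns (cache_size_kb, warnings). The 8192 default is the chunk size
    seen in the cmdebug output, not a documented afsd default.
    """
    warnings = []
    if files is not None:
        # Described in the log: -files only produces a warning under memcache.
        warnings.append("-files is not used with memcache")
    if blocks is not None and dcache is not None:
        # Described in the log: giving both makes afsd refuse to start.
        raise ValueError("-blocks and -dcache cannot both be given with memcache")
    if dcache is not None:
        # -dcache wins over the size read from cacheinfo.
        return memcache_size_kb(dcache, chunk_size_bytes), warnings
    if blocks is not None:
        return blocks, warnings
    return cacheinfo_kb, warnings


if __name__ == "__main__":
    # Values from the log: cacheinfo says 425900, the command line has -dcache 2400.
    print(resolve_memcache_size(425900, dcache=2400))  # (19200, [])
```

With the values from the log, 2400 entries x 8192 bytes gives 19200 kB, which matches the "Cache size: 19200 kB" line from cmdebug rather than the 425900 in cacheinfo.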