[00:26:38] --- kaj has become available
[01:16:28] --- abo has become available
[01:24:33] --- Simon Wilkinson has become available
[02:01:13] --- Simon Wilkinson has left
[04:10:00] --- matt has left
[04:24:09] --- jaltman has left: Disconnected
[04:51:31] --- dwbotsch has left
[04:53:04] --- dwbotsch has become available
[05:39:18] --- jaltman has become available
[07:07:06] --- deason has become available
[08:02:36] --- meffie has become available
[08:07:45] --- steven.jenkins has become available
[08:40:57] --- matt has become available
[09:17:10] --- reuteras has left
[09:19:13] --- kaj has left
[10:28:34] --- kaj has become available
[10:42:10] --- kaj has left
[11:45:14] --- Russ has become available
[12:32:01] --- Simon Wilkinson has become available
[12:33:47] Gah. I think I've found the reason why our performance falls off a cliff when the fileserver is running on ext3.
[12:34:00] do tell
[12:34:15] We call sync() every 10 seconds.
[13:22:20] --- Simon Wilkinson has left
[13:23:33] looks like -fakestat fakes the stat info for local cellular mounts (e.g. "localcell:vol.foo"), but the docs imply otherwise IMO: bug in the docs or in the code?
[13:24:43] hrm. fakestat0-all is supposed to fake everything, fakestat only foreign
[13:24:49] without the typo of course
[13:26:01] code looks like it's "fakestat only cellular" not "only foreign", which is an admittedly small but nonzero difference
[13:28:35] we actively discourage local-cell cellular mounts and always have
[13:28:39] so "meh"
[13:28:43] wow, this net sucks
[13:29:39] > we actively discourage local-cell cellular mounts and always have
           Uh, I'm pretty sure I haven't always. And, there are some cells that have been around a bit longer than openafs
[13:30:09] it was discouraged when we (cmu) used ibm afs.
[13:30:50] i remember talking with dan lovinger about it
[13:31:06] I presume because it resets the search-for-RO, and in general makes it look foreign so it's weird?
[13:32:12] the RO behavior was the obvious one. 14 years out i no longer remember the rest of the discussion
[13:34:45] so at least, it's probably not worth a code change; I'll drop a note in -fakestat
[13:57:21] Except that it _doesn't_ reset the RO behavior, which is based entirely on whether the mount point and target volume are actually in the same cell. Though it did at the time -- that behavior changed in 3.5
[13:59:02] 'fs mkmount's manpage heavily implies that it does, though of course I haven't checked
[14:00:36] That was probably written decades ago
[14:01:03] but I checked -- the behavior changed between 3.4ap13 and 3.5.
[14:03:31] I tend to believe you, but do you have a function name or something handy so I don't need to believe you?
[14:04:06] or wtf am I talking about, I can just make a mount
[14:04:38] it's easy to test. if your network is happier than mine
[14:05:03] You're interested in EvalMountPoint(), and particularly in code that refers to 'cpos' in the old code and 'samecell' in the new. That's in src/afs/afs_vnodeops.c in old code, and in src/afs/VNOPS/afs_vnop_lookup.c in new code
[14:05:30] > src/afs/afs_vnodeops.c
[14:05:35] that's old alright
[14:05:49] but yeah; you only have to believe me or read code if you want to know _when_ it changed.
[14:06:03] Not that old. just before the zumach reorg
[14:06:19] that's been a while
[14:06:30] feels like yesterday, sometimes :-)
[14:07:36] $ fs mkm foo.root localcell:root.cell
           $ fs lq
           Volume Name                    Quota      Used %Used   Partition
           root.cell                   no limit      1111    0%         51%
           $ cd foo.root
           $ fs lq
           Volume Name                    Quota      Used %Used   Partition
           root.cell.readonly          no limit      1106    0%         51%
[14:08:44] this is some random 1.5 client though, so a million things could be wrong with it
[14:16:34] The code on current master is a bit more complex than in 3.5 ... 1.4.x, but still looks correct in that regard. It shouldn't push you onto an RO volume unless you're crossing from an RO or crossing into another cell.
[14:18:19] I really should port /afs/cs.cmu.edu/misc/openafs/src/Patches/rxgen-cppcmd-on-unix-and-less-buffer-overruns.diff forward and submit it.
[14:18:41] I agree (at least as far as samecell being correct)... I'll maybe test more later tonight
[14:21:18] --- meffie has left
[14:32:03] WTF does the search field in the git web UI not let me just type in an object ID???
[15:13:08] is "automated hook processor" the new git enhancement?
[15:13:17] or, gerrit, or...
[15:15:30] > is "automated hook processor" the new git enhancement?
           gerrit. but yes
[15:15:49] that's how we will eventually do auto-build-verification on submit
[15:16:43] derrick: good news is the issue from this morning is somehow introduced by me, will track it back
[15:16:55] ah cool
[15:32:55] local gsoc student = i provide free transportation, she comes to workshop. nice.
[15:33:08] yeah
[15:33:29] (since i'm driving anyway)
[15:33:44] Actually, that'd work for non-local students, but be more expensive.
[15:35:13] i'm not driving to (greece, india, pakistan, scotland) sorry
[15:37:56] I'd be impressed if you could do any of those.
[15:47:35] --- deason has left
[16:20:06] go fast enough, and you can just hydroplane over the oceans in question
[17:03:13] Huh, a git fetch and rebase claims that param.i386_fbsd_42.h->param.nbsd40.h in HEAD and deleted in Kill FBSD4X with fire, so I get conflicts.
[17:42:00] --- mdionne has become available
[17:53:24] --- kaduk@mit.edu/barnowl has left
[17:54:02] --- kaduk@mit.edu/barnowl has become available
[18:13:38] --- deason has become available
[18:41:28] --- Russ has left: Disconnected
[18:57:32] --- Russ has become available
[19:19:29] --- matt has left
[19:53:43] --- mdionne has left
[21:09:33] --- Born Fool has become available
[21:34:18] --- Born Fool has left
[21:59:53] Grumble, I configured with --enable-debug, but kgdb doesn't seem to be finding the symbols in the module.
[22:10:45] --- reuteras has become available
[22:25:07] there's --enable-debug-kernel and --disable-optimize-kernel
[22:25:20] dunno if debug options work differently on fbsd or something, though
[22:27:47] I feel like this used to work, though. Clearly I should have taken better notes.
[22:29:35] Say, while you're here, any thoughts on why I would be getting spew of _end() at [address] on the console? I note that there is an _end symbol in libafs.ko but not libafs.kld, but the 'ld -Bshareable -d -warn-common -o libafs.ko libafs.kld' incantation that produces libafs.ko doesn't seem to obviously cause that.
[22:30:38] There are a few rx_freePacketQueue() at rx_freePacketQueue sprinkled in as well.
[22:35:19] no; I remember you mentioning that, and I was just as confused as to what those messages are
[22:35:59] I presume that it is some fbsd-specific thing, but failed to get much of a response from #bsddev. Maybe I will break down and send mail.
[22:38:03] yeah, the messages appear neither informative nor very greppable
[22:38:51] It's kind of annoying because I have a vague recollection of reading about some message of that sort getting printed when control flow reaches a non-existent function ... but that doesn't make any sense.
[22:39:30] Anyway, on to things that you are more likely to know about.
[22:41:12] In that after a while of high load, I get the 'tokens for user ... are discarded' on the console, and permission denied for my filesystem accesses.
[22:41:43] But `tokens` still claims they're there, and I can 'pts mem' myself.
[22:43:32] so the GetTokens pioctl is still working, but for some reason the pag tracking in the regular code says you have bad tokens....
[22:44:15] Hrrm. Maybe I should break down and rehash the pag code, then.
[22:45:07] rx_freePacketQueue is a variable not a function. that is odd.
[22:45:40] debugging what the unixuser structure looks like that PGetTokens is giving you may be enlightening....
[22:46:03] like, if some flag is set but not another one, or something odd like that
[22:46:53] PGetTokens also appears to have some funky "get the Nth token" interface I'm not entirely familiar with
[22:46:58] you might want to record the token that is being stored and see if after the problem occurs you are still getting the same token back
[22:47:46] what is the specific error that is being generated?
[22:48:19] there typically would be an rxkad error reported before the discard
[22:48:43] or maybe you have an Nth token that's good, but on FS accesses you're getting an invalidated one? just wild guesses, though
[22:48:58] afs: Tokens for user of AFS id 24729 for cell sipb.mit.edu: rxkad error=19270410
[22:49:21] sealed data inconsistent
[22:49:34] "bug" :)
[22:50:25] the tokens are being discarded because the file server has been given a partially damaged token
[22:50:47] at the start of a new connection
[22:51:05] --- abo has left
[22:51:15] --- abo has become available
[22:52:29] or the checksum on an rx packet doesn't match.
[22:55:55] there is a code path where the error RXKADSEALEDINCON could be generated locally
[22:56:23] it would be helpful if you knew whether an abort response with that error was received from the server.
[22:58:13] This only seems to occur under fairly heavy load, so I'm not entirely sure that I want to tcpdump. I suppose I could spin up a high-debug-level fileserver in the zone.mit.edu cell at some point, though. I should probably be getting to sleep fairly soon tonight ...
[23:00:36] you can filter out just rx aborts, though if tcpdump can't keep up, not seeing an abort may just mean you missed it...
[23:01:16] I'd have to read up to figure out how to make tcpdump filter out just rx aborts.
[23:02:25] or rather, I assume it can; wireshark sure can, and tcpdump decodes them to at least that degree, I thought
[23:07:38] --- deason has left
[23:31:32] --- Russ has left: Disconnected