[00:12:39] --- cudave has left: Disconnected
[00:19:21] --- abo has become available
[00:54:01] --- haba has become available
[00:56:18] --- haba has left
[00:56:28] --- haba has become available
[00:57:43] A full deck of cards is not as useful as it seems when the rules are changed in the middle of the game.
[00:58:07] A bit like Flux, then.
[00:59:59] At least one person in the arla devel team got very tired of that and did not do much arla kernel devel after that experience, because it took the fun out of it and it just felt like work. Yes, Flux is fun when you play it for the sake of playing Flux :)
[01:16:40] --- kaj has become available
[01:18:04] --- cudave has become available
[01:51:26] --- jaltman has left: Disconnected
[02:14:11] --- haba has left
[02:53:12] --- haba has become available
[04:02:55] --- tharidufernando has become available
[05:56:17] --- reuteras has left
[07:24:17] --- deason has become available
[07:34:43] --- haba has left
[07:37:30] --- haba has become available
[07:46:43] --- tharidufernando has left
[07:47:27] --- tharidufernando has become available
[08:44:19] --- Russ has become available
[08:45:05] Yeah; fluxx is fun as a game, but real life shouldn't be like that.
[08:46:12] jhutz: Seeing as you're here
[08:46:59] Have you had a chance to read the stuff about the unfortunate trade off between locking and write-on-close on openafs-info?
[08:50:11] probably not
[08:50:29] My openafs-info folder has 8593 unseen messages
[08:51:56] Ah. Well, there's an interesting behaviour conflict between flock() and the Unix client's determination to not flush dirty pages when a file changes on the server.
[09:44:57] --- meffie has become available
[10:32:23] --- haba has left
[10:32:37] --- tharidufernando has left
[10:54:24] afs_StoreMini appears to be giving me some odd results...
I'm not sure who's at fault here:
[10:55:07] StartRXAFS_StoreData64 returns 0; EndRXAFS_StoreData64 returns -451 (marshalling error?), and rx_EndCall returns VBUSY, which is what the fileserver says it actually returned
[10:55:40] storemini only uses the error code from rx_EndCall if we haven't already encountered an error... other code in the tree suggests that's what's normally done
[10:55:59] but since the -451 is what's given to afs_Analyze, we don't retry as we would on a VBUSY, so an error is returned to userspace
[10:56:11] Derrick and I looked at this a while back.
[10:56:34] I suspect that might even be the point that StoreMini gained this behaviour. What does git blame tell you?
[10:59:34] the "use the EndRXAFS_StoreData error code over rx_EndCall" came from you, b1eb6a7a3f80500f0187cc6a1dd2013e1a5e154a, and yeah, I remember seeing that now
[10:59:44] but actually, I see it done both ways in the code, I don't know which one is right
[10:59:58] that is, some places do 'code = rx_EndCall(call, code);'
[11:00:14] and some do 'code2 = rx_EndCall(call, code); if (!code && code2) code = code2;'
[11:00:24] The convention in the code is to use the most specific error. Otherwise you miss things like errors due to being over quota.
[11:01:36] It looks like, in this case, that convention is breaking though.
[11:01:42] So, if you already got an error on rx_Read/rx_Write, you shouldn't be calling the EndRXAFS_* function, which is responsible for unmarshalling the OUT arguments _after_ you've successfully done everything in between. Skip directly to rx_EndCall.
[11:02:07] there is no rx_Write here, this is a truncation call
[11:02:09] ... which will tell you the abort error, if the problem is that the call was aborted.
[11:02:47] it just does StartRXAFS_StoreData, EndRXAFS_StoreData (if code == 0), and then rx_EndCall
[11:04:43] --- kaj has left
[11:04:44] --- kaj has become available
[11:06:13] What's the server actually sending to the client?
[11:06:23] it looks to me like the rx_EndCall error should just get precedence, since the only error we can get from EndRXAFS_* is RXGEN_CC_UNMARSHAL, isn't it?
[11:06:31] it sends a VBUSY abort
[11:08:03] Certainly seems that way.
[11:08:19] An error from an abort packet generally should get precedence over something else.
[11:09:17] I guess the question is how do you tell?
[11:09:55] Unless you always prioritise rx_EndCall errors, and then you're back into the problem that the Windows client had, where it would miss interesting error cases like over quota.
[11:10:10] In this case, I think the first chunk of b1eb6a7a3f80500f0187cc6a1dd2013e1a5e154a is wrong, and should probably be reverted.
[11:10:30] Perhaps with a comment about why preferring EndCall is the correct thing to do.
[11:10:32] that's what I don't understand, what call was actually reporting the over quota error? I don't think that would be EndRXAFS_StoreData
[11:11:07] So, I think the answer here is that you should normally call rx_EndCall(call, code-you-already-have), and rx_EndCall should decide whether to return 0 or code-you-already-have or something else, and you should ~always trust its judgement. If rx_EndCall's current behavior doesn't make that reasonable, we should fix it.
[11:11:34] No, it was a general thing. Jeffrey's assertion was that we should always prefer the result of the RPC, to the result of calling EndCall.
[11:11:35] okay, so 'code = rx_EndCall(call, code);' is what I should see?
[11:11:51] (according to jhutz, that is, heh)
[11:12:33] According to jhutz, that's what we should have everywhere. And that makes a lot of sense. I don't think it would work everywhere currently, which leads to the "we should fix it" part of his comment.
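The two call-completion conventions quoted above can be compared side by side with a small stand-alone mock. This is only a sketch: `mock_call`, `mock_end_call`, `finish_prefer_first` and `finish_trust_end_call` are hypothetical names that do not exist in OpenAFS, the constants are illustrative stand-ins for VBUSY and the -451 unmarshalling error, and the mock assumes (as the log reports for the current rx_EndCall) that a call-level error always wins over the code passed in.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the codes mentioned in the log. */
#define MOCK_VBUSY          110     /* abort code sent by the fileserver */
#define MOCK_CC_UNMARSHAL   (-451)  /* error seen from EndRXAFS_StoreData64 */

/* Hypothetical, minimal model of an Rx call: only the abort code matters. */
struct mock_call {
    int abort_code;                 /* 0 when the call was not aborted */
};

/*
 * Mock of the end-of-call step, under the assumption stated in the log:
 * if the call itself carries an error, return it; otherwise hand back
 * whatever code the caller already had.
 */
static int mock_end_call(struct mock_call *call, int caller_code)
{
    assert(call != NULL);
    return call->abort_code ? call->abort_code : caller_code;
}

/* Convention A: keep the first error seen, use the end-call result only
 * when nothing failed earlier. */
static int finish_prefer_first(struct mock_call *call, int code)
{
    int code2 = mock_end_call(call, code);

    if (!code && code2)
        code = code2;
    return code;
}

/* Convention B: pass the earlier code in and trust whatever comes back. */
static int finish_trust_end_call(struct mock_call *call, int code)
{
    return mock_end_call(call, code);
}
```

With an aborted call (`abort_code` set to `MOCK_VBUSY`) and an earlier `MOCK_CC_UNMARSHAL`, convention A hands back the unmarshalling error, so a retry-on-busy path would never trigger, while convention B surfaces the busy code. When the call was not aborted, both return the earlier code unchanged.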
[11:13:15] yeah, I agree it should be what's everywhere, but I'm pretty sure it will need work
[11:14:57] (currently rx_EndCall appears to just always use the call error if we have one, fwiw)
[11:23:25] --- kaj has left
[11:26:24] --- kaj has become available
[11:39:39] > and should probably be reverted 1808
[11:45:37] I’m seeing strange problems with 1.5.74. Both yesterday and today (so across a reboot), /afs/athena.mit.edu/user/a/n/andersk/Scripts/ showed up as empty. I ran ‘fs flushv’ and now andersk/Scripts/ is better but andersk/Public/ is empty. Also andersk/Scripts/git is a symlink to the empty string.
[11:49:22] Another ‘fs flushv’ fixed Public but I can’t fix the empty-string symlink.
[12:04:04] Others have seen these issues with 1.5
[12:04:19] In the past, it seems to have gone away if you stop the client, and flush the disk cache.
[12:04:44] I haven't heard from anyone who has had problems after doing that.
[12:05:01] (That is, the problem only appears with an upgrade from 1.4)
[12:06:12] Okay. Should I do that, or is there any debugging information that might be useful to extract?
[12:06:45] er, I missed something. but if it's "wipe the cache", just move it aside, and restart. if the cache is screwy you still have it
[12:06:56] Yeah. What he said.
[12:07:02] Before doing that, though.
[12:07:12] Do you get the same results if you stop the client and then restart it?
[12:07:18] If you don't, I don't care about the disk cache.
[12:07:50] After stopping the client and starting it, andersk/Scripts is empty again.
[12:08:14] Okay, stop the client and move the disk cache out of the way.
[12:08:27] At some point, I may care about the cache contents.
[12:09:04] Done. (Yes, that fixed it.)
[12:09:17] --- abo has left
[12:09:21] Did you just upgrade from 1.4?
[12:09:33] --- abo has become available
[12:10:21] Two weeks ago.
[12:10:34] But without deleting the disk cache?
[12:10:36] Yes.
[12:10:45] Okay. Let me know if it goes boom again.
[12:12:28] Okay.
It's all my fault.
[12:12:42] ?
[12:12:55] The size of struct fcache has changed between 1.4 and 1.5
[12:13:49] It will change again depending on whether LINUX_USE_FH is defined or not.
[12:15:15] And between kernel versions if the size of "struct fid" changes. Bah. This needs cleaned up.
[12:17:36] --- haba has become available
[12:18:37] er, if struct fcache goes directly on disk, does that mean we're storing a char* pointer on disk if AFS_CACHE_VNODE_PATH?
[12:20:20] --- jaltman has become available
[12:20:50] In the future, we should store the kernel version and the openafs kernel module version in the disk cache and if it is not identical just wipe the cache. Or does someone care about preserving the cache contents between versions?
[12:21:19] There's already a version number. We should probably bump the version number, and add an 'entrySize' field.
[12:21:29] But deason's point is an interesting one.
[12:21:43] On another machine that I had upgraded from 1.4 to 1.5 without clearing the cache, I got
# /etc/init.d/openafs-client stop
Stopping AFS services:Segmentation fault afsdERROR: Module openafs is in use openafs.
and an oops in shutdown_AFS: http://pastebin.com/AqWrnGfw
[12:24:10] deason: It does, but it isn't a problem.
[12:24:43] Ermmm, I think, anyway. Still reading.
[12:26:48] Yeah. Basically, InitCacheFile will cycle through all of the files in the cache, and replace whatever was saved off to disk with the current pointer, file handle, inode number, etc.
[12:27:00] So, on start up, everything gets fixed.
[12:28:24] okay; I wouldn't anticipate a problem with it, since I assume we would've seen a panic from that by now, but you never know...
[12:29:25] I'm going to look at a patch to up the cache version number, add an entry size field, and blow away old caches.
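The idea floated above (bump the cache version, record the entry size, discard caches that do not match) might look roughly like the following. This is a sketch under stated assumptions, not the patch that was actually written: `sketch_header`, `sketch_entry`, `sketch_check_header` and the magic/version values are all hypothetical and do not reflect the real OpenAFS on-disk layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the real cache magic and version differ. */
#define SKETCH_CACHE_MAGIC    0x12345678u
#define SKETCH_CACHE_VERSION  2u

/* Stand-in for the per-file record written to disk. Its size is what
 * changes between builds and invalidates an old cache. */
struct sketch_entry {
    uint32_t fid[4];
    uint32_t data_version;
    uint32_t chunk;
};

/* Hypothetical cache header carrying the new entry-size field. */
struct sketch_header {
    uint32_t magic;
    uint32_t version;
    uint32_t entry_size;    /* sizeof(struct sketch_entry) at write time */
};

enum sketch_verdict {
    SKETCH_CACHE_OK,        /* header matches this build, cache is usable */
    SKETCH_CACHE_WIPE       /* mismatch, discard and rebuild the cache */
};

/* Decide whether an on-disk header was written by a compatible build. */
static enum sketch_verdict sketch_check_header(const struct sketch_header *hdr)
{
    assert(hdr != NULL);

    if (hdr->magic != SKETCH_CACHE_MAGIC)
        return SKETCH_CACHE_WIPE;
    if (hdr->version != SKETCH_CACHE_VERSION)
        return SKETCH_CACHE_WIPE;
    if (hdr->entry_size != sizeof(struct sketch_entry))
        return SKETCH_CACHE_WIPE;
    return SKETCH_CACHE_OK;
}
```

Checking the recorded entry size separately from the version number covers the case raised in the log where the structure size depends on build options or kernel version even though the cache format version has not been bumped.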
[12:52:39] --- jaltman has left: Replaced by new connection
[12:52:41] --- jaltman has become available
[13:07:37] --- mdionne has become available
[13:14:42] in the FH case the size of our structure should depend on MAX_FH_LEN, not sizeof(struct fid) which is small (4 IIRC). this is currently set to accommodate the largest known handles
[13:15:28] Given we need to bump the cache header version anyway, I'm just going to add a structure size block to it.
[13:16:51] that sounds like a good idea
[14:16:06] --- mdionne has left
[14:34:30] http://gerrit.openafs.org/1811
[14:43:38] --- mdionne has become available
[14:45:27] found the issue with rh6 and 1.5. turns out it's a general problem with CREDENTIALS_DEBUG. will try a fix shortly
[14:47:23] --- haba has left
[14:51:00] --- mdionne has left
[15:45:07] --- mdionne has become available
[15:49:06] --- deason has left
[15:49:20] --- meffie has left
[16:40:54] --- jaltman has left: Disconnected
[16:41:33] --- jaltman has become available
[16:42:10] --- deason has become available
[16:59:28] --- jaltman has left: Replaced by new connection
[17:00:24] --- jaltman has become available
[17:03:12] --- jaltman has left: Lost connection
[19:21:25] --- Russ has left: Disconnected
[19:38:01] --- mdionne has left
[19:52:26] --- Russ has become available
[20:40:09] --- Born Fool has become available
[21:27:52] --- Born Fool has left
[21:43:29] --- Russ has left: Disconnected
[22:20:01] --- reuteras has become available
[22:21:58] --- reuteras has left
[22:24:14] --- jaltman has become available
[22:29:19] --- kaj has left
[22:52:03] --- deason has left