[00:07:27] --- Russ has left: Disconnected
[00:21:41] --- dev-zero@jabber.org has become available
[00:21:48] --- dev-zero@jabber.org has left: offline
[01:49:23] Well, I seem to have managed to make the Unix client 50% or so faster for reads which are satisfied by the disk cache.
[01:49:52] how so?
[01:50:25] Doing our IO correctly, permitting readahead, and populating readahead pages in a background thread.
[01:50:53] Also by cutting out all of the OSI layer, and using native calls when we know that the disk cache is up to date.
[01:52:23] Re: know disk cache is up to date, doesn't the cache know if it is up to date? It sounds to me like you have another cache that says if the cache is populated or not
[01:52:55] Not at all.
[01:53:23] When the disk cache is up to date, the problem simply becomes getting the data out of the disk cache, and into the application's memory. That doesn't really involve AFS at all.
[01:53:35] oh, I see
[01:53:56] --- stevenjenkins has left
[01:54:00] It means you can implement it in a very naive manner, that doesn't involve GLOCKs and all sorts of other AFS nastiness.
[01:54:11] "native" as in native to platform, not native to openafs
[01:54:26] Yes - direct Linux kernel calls.
[01:54:34] so is this just on Linux?
[01:54:37] Yup.
[01:54:40] or applicable on say, Solaris?
[01:54:43] ah, ok
[01:55:09] I don't know how Solaris's VM works. It might be that the same work is applicable there.
[01:56:23] (Essentially on Linux, all filesystem access goes through the in-memory page cache. As a filesystem, all that you're required to do is to read and write items from this page cache. It means, for example, that every read you get will be for a page, which means that lots of the CM's logic for dealing with reads that cross chunk boundaries is never executed on Linux)
[01:56:53] It's also why, in some cases, accessing data from a fileserver is quicker than getting it from the local disk cache.
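The remark above about page-sized reads never crossing chunk boundaries can be checked with a little arithmetic. This is only a sketch: the page size, the chunk size, and both function names are assumptions chosen for illustration, and the result relies on the chunk size being a multiple of the page size.

```python
PAGE_SIZE = 4096          # assumed page size
CHUNK_SIZE = 256 * 1024   # assumed cache chunk size (a multiple of PAGE_SIZE)


def chunks_touched(offset, length, chunk_size=CHUNK_SIZE):
    """Return the range of chunk indices covered by a read of `length` bytes at `offset`."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


def crosses_chunk_boundary(offset, length, chunk_size=CHUNK_SIZE):
    """True if the read spans more than one chunk."""
    return len(chunks_touched(offset, length, chunk_size)) > 1


# An arbitrary byte-range read can straddle two chunks...
unaligned = crosses_chunk_boundary(CHUNK_SIZE - 100, 200)

# ...but a page-aligned, page-sized read never does, as long as the chunk
# size is a multiple of the page size.
page_reads_cross = any(
    crosses_chunk_boundary(page * PAGE_SIZE, PAGE_SIZE)
    for page in range(4 * CHUNK_SIZE // PAGE_SIZE)
)
```

With these assumed sizes, `unaligned` is true and `page_reads_cross` is false, which is the property that lets the per-page read path skip the boundary-crossing logic.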
[01:56:59] --- stevenjenkins has become available
[02:03:21] I suspect that further speed ups may also be possible by revisiting inode hinting.
[02:04:15] (keep the disk cache filehandle open, on the assumption that if you've read a few blocks from a dcache, you'll probably end up reading more. Maintain a lruq to expire handles once you've crossed a certain limit)
[05:30:49] --- Jeffrey Altman has left: Replaced by new connection
[05:54:57] --- mmeffie has become available
[06:00:52] > I can use the wdelta browser from my mobile phone. because I hate broken things you can't use from your phone :-)
[06:10:35] --- Jeffrey Altman has become available
[07:01:12] --- deason has become available
[08:17:55] --- Russ has become available
[09:37:38] I think we've got a race condition in the 1.4.10 kernel module. Which is rather dull.
[09:38:17] Try using the 'blogbench' benchmark against a sufficiently fast fileserver, and reads fail with EIO. The faster the fileserver and client (or the more cores on the client) the more errors you'll see.
[09:46:41] --- dev-zero@jabber.org has become available
[09:50:30] Hmmm. Not the first one to find this - http://archive.netbsd.se/?ml=openafs-info&a=2009-02&t=9953242 No solution there, though, sadly.
[09:53:21] that problem went back to 1.4.5 as well
[09:53:25] so it's not a new issue
[09:54:59] It's possible that it's an 'always been there' issue that's only been exposed as machines have become faster. We used blogbench to benchmark the 1.3.x development line when we made the decision to move to OpenAFS and didn't see any problems.
[09:55:14] hmm
[09:55:22] I wonder if 1.4.0 has these problems
[09:55:28] and actually, I can test with 1.4.2
[09:55:34] it wouldn't be the first and won't be the last
[09:55:42] I guess I should look into doing that
[09:55:59] Would be good if you could. I'd have to downgrade kernels here to run 1.4.2.
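The handle-caching idea floated at 02:04:15 (keep cache file handles open, expire them through an LRU queue once a limit is crossed) might look roughly like the following. This is a user-space sketch under stated assumptions, not the cache manager's code: `HandleCache`, `FakeHandle`, `open_fake`, and the limit of 2 are all hypothetical names and values used only to show the eviction behaviour.

```python
from collections import OrderedDict


class HandleCache:
    """Keep up to `limit` open handles, closing the least recently used one."""

    def __init__(self, opener, limit=4):
        self._opener = opener          # hypothetical: opens a handle for a cache file key
        self._limit = limit
        self._handles = OrderedDict()  # key -> handle, oldest first

    def get(self, key):
        handle = self._handles.get(key)
        if handle is not None:
            self._handles.move_to_end(key)   # mark as most recently used
            return handle
        handle = self._opener(key)
        self._handles[key] = handle
        while len(self._handles) > self._limit:
            _, victim = self._handles.popitem(last=False)  # evict LRU entry
            victim.close()
        return handle

    def close_all(self):
        while self._handles:
            _, handle = self._handles.popitem()
            handle.close()


class FakeHandle:
    """Stand-in for an open cache file, used only to show the behaviour."""

    def __init__(self, key):
        self.key = key
        self.closed = False

    def close(self):
        self.closed = True


opened = []


def open_fake(key):
    handle = FakeHandle(key)
    opened.append(handle)
    return handle


cache = HandleCache(open_fake, limit=2)
a = cache.get("chunk-1")
b = cache.get("chunk-2")
again = cache.get("chunk-1")   # hit: no new open, chunk-1 becomes most recent
c = cache.get("chunk-3")       # over the limit: chunk-2 is the LRU and is closed
```

The second request for `chunk-1` reuses the open handle, which is the saving being suggested; the limit bounds how many descriptors stay open at once.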
[09:56:38] Problem is that fstrace serialises/slows the kernel sufficiently that the problem goes away, so I think I'm going to have a boring evening with the 'printk' function.
[09:57:14] gah! I can't log into anything as both of my KDCs are down due to cooling failure...
[09:57:28] That's ... very dull.
[09:57:36] might have to wait until next week when it gets fixed
[09:58:03] I guess I'll have to walk over there and login on the console
[09:58:19] Okay. This isn't that high priority for me. I was just testing my faster reads code when I found this problem, and spent a couple of hours trying to find the issue in my code. Turns out it's not there at all ...
[09:58:24] do we need to drop-ship you an air conditioner?
[09:59:59] it's a fairly large building
[10:00:14] and the room in question has no windows, so a window unit won't work
[10:00:26] it's basically the attic of an old building
[10:00:43] a decent power saw should solve the lack of a window problem....
[10:01:12] Anyone know how I can get RT's fscking awful web interface to let me do a query like (Contents = blogbench) && (status = open || status = resolved)
[10:03:04] Query Builder?
[10:03:22] Tried that. It would only let me build a query that was all AND or all OR.
[10:03:41] you indent the lines you want to combine with different operations
[10:03:52] and you can toggle AND to OR and vice versa
[10:03:53] I know shiny modern RTs will do it. But the OpenAFS RT is to modern RTs what a dung bucket is to a flush lavatory.
[10:04:13] I just did it so I know it is supported.
[10:05:05] "Subject LIKE 'blogbench' AND (Status = 'new' OR Status = 'open')"
[10:05:23] err, not subject
[10:05:26] * cclausen fails
[10:05:32] Content LIKE ...
[10:05:39] unfortunately, I'm not finding anything
[10:05:45] Is there somewhere you can actually type that?
[10:06:15] in the query builder "Subject" is an item in a drop down list.
Toggle it to "Content"
[10:06:24] http://rt.central.org/rt/Search/Edit.html
[10:06:54] I'm not seeing any results though
[10:07:16] I don't think there are any. I gave up when we started this discussion and just made it search through everything, spam and all.
[10:07:49] But now I know how to do it properly. Well, for the next few days at least, until my sieve-like brain discards the information.
[10:14:34] > Is there somewhere you can actually type that?
Yes; there's an "Advanced" tab
[10:15:30] RT #124973 - so this doesn't get forgotten about again.
[10:27:27] --- dev-zero@jabber.org has left: offline
[10:29:01] Just read through https://lists.openafs.org/pipermail/openafs-info/2009-February/030928.html
[10:29:34] hmm... do we think this is limited to just Linux?
[10:30:16] Are you seeing the same issue as the Robbert where the EIO errors are occurring after the "RXAFS_Rename" operation completes and the FID of the file has changed out from underneath the cache manager?
[10:30:55] Is the EIO error by chance the result of the VNOVNODE error coming back from the file server on the old FID?
[10:32:04] As in - "Is this RT #124393"
[10:33:54] that is what I'm asking
[10:34:08] Just gathering the necessary information ... hang on.
[10:34:35] The reason I created 124393 was because of that openafs-info thread
[10:34:38] (What I don't understand, if this is the issue, is why it's so timing dependent)
[10:35:02] the directory info is invalidated as part of the rename
[10:35:53] if you read the old fid out just before the rename invalidates it, that FID will be used for the read and by the time the read hits the file server the FID is no longer present
[10:36:59] if true, there should be a callback in your tcpdump
[10:37:10] the client could protect itself by invalidating the directory contents prior to issuing the RXAFS_Rename to the file server
[10:37:54] the Windows client locks the directory object during a rename to prevent the contents from being used until the revised status info is received.
[10:38:53] a printk() in afs_Analyze for the VNOVNODE processing would also be a telltale sign
[10:39:45] I'll have a packet trace shortly. Just copying it back to my desktop.
[10:47:45] Yup, VNOVNODE in response to a FetchData64
[10:52:36] No renames issued against that FID, though :(
[10:53:31] what fid is being fetched?
[10:53:51] 536870924.304640.455003
[10:54:10] Providing I'm matching the abort up correctly with the call.
[10:54:30] derrick/simon, I'm having issues with panic'y unix cm code, but I'm not sure what I should be asking, do you or others mind providing insight?
[10:54:38] Not at all.
[10:54:53] First question - panicky or ooopsy?
[10:55:37] upon kextload'ing the kernel module, OS X panic'd
[10:55:54] src/packaging/MacOS/decode-panic output is http://pastebin.ca/1470098
[10:56:13] Personally, I wouldn't try and do kernel development on OS X initially
[10:56:33] my src/afs/afs_server.c patch is at /afs/tproa.net/users/s/u/summatusmentis/current_afs_server.diff
[10:56:44] --- deason has left
[10:56:57] well, after the first crash, I've been reconsidering my workflow so my testing is in a VM, but I don't have that figured out quite yet
[10:57:26] If you use a Fusion VM, then there are a lot more tools available to you, including live kernel debugging.
[10:57:40] --- deason has become available
[10:57:53] I do have fusion, I don't have a VM setup yet
[10:57:59] or rather, set up*
[10:58:14] Can you give me a few moments to stare at this packet trace some more, and then I'll take a look at your patch. But from the looks of things it's trying to do something it shouldn't with a socket.
[10:58:40] sure
[10:58:43] RX wise, it is the call ID that ties the abort in with the corresponding RPC, isn't it?
[10:59:12] --- dev-zero@jabber.org has become available
[10:59:15] --- dev-zero@jabber.org has left: offline
[11:00:02] pardon? I'm not sure what you mean
[11:00:25] That was for Derrick/Jeff/anyone who's stared at too many RX packet traces in their lives ...
[11:00:31] oh, got it
[11:01:30] Simon, can you put the packet trace somewhere I can stare at it?
[11:02:22] Yup. /afs/inf.ed.ac.uk/user/s/sxw/Public/packetrace.pcap.gz
[11:02:31] Oh, with one more t, sorry.
[11:02:50] If you could let me know what you're looking at, that would be good, so I know if there's anything I've missed ...
[11:06:02] --- mmeffie has left
[11:06:12] > RX wise, it is the call ID that ties the abort in with the corresponding
> RPC, isn't it?
yes
[11:06:36] jake, you're passing something bogus to soclose
[11:07:11] e.g. rxk_FreeSocket
[11:07:12] soclose is the same thing as rxk_FreeSocket?
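The question answered at 11:06:12 (the call ID is what ties an abort to its RPC) can be illustrated by grouping packets from a trace by call. This is a sketch only: the `Packet` tuple, its field names, and the sample packets are invented for illustration and are not taken from the trace discussed here, and keying on epoch, connection ID and call number together is an assumption about what keeps calls on different connections apart.

```python
from collections import namedtuple

# Only the header fields needed for matching; field names here are illustrative.
Packet = namedtuple("Packet", "frame epoch cid call_number kind info")


def match_aborts(packets):
    """Pair each abort with the first data packet seen for the same call."""
    first_data = {}
    matches = []
    for pkt in packets:
        key = (pkt.epoch, pkt.cid, pkt.call_number)
        if pkt.kind == "data":
            first_data.setdefault(key, pkt)
        elif pkt.kind == "abort":
            matches.append((first_data.get(key), pkt))
    return matches


# Made-up sample packets, not real capture data.
trace = [
    Packet(1, 0xABC, 0x10, 7, "data", "FetchStatus"),
    Packet(2, 0xABC, 0x10, 8, "data", "FetchData64"),
    Packet(3, 0xABC, 0x14, 8, "data", "Rename"),
    Packet(4, 0xABC, 0x10, 8, "abort", "VNOVNODE"),
]
pairs = match_aborts(trace)
request, abort = pairs[0]
```

In the sample, the Rename shares call number 8 with the FetchData64 but sits on a different connection, so the abort is attributed to the FetchData64 only.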
[11:07:15] yeah, ok
[11:07:44] which means, I'm not actually creating a socket
[11:07:58] or, at least not a socket in the way rxk_FreeSocket wants it
[11:08:17] i would.... 1) create a global for the socket 2) never Free it until shutdown. 3) put a lock around it so only one caller does the GetServerPeers at a time
[11:08:40] also, check for s not NULL in the return of rxk_NewSocket
[11:08:58] the socket changes per server address though right?
[11:09:18] oh. you're doing it wrong.
[11:09:31] doing the socket wrong? or the whole thing?
[11:09:44] i see. NewSocket's port is the port on your end, not what you want to talk to.
[11:10:05] so fix that (in fact, you can use a random port) and then do what i said
[11:10:43] NewSocket takes "the port of my end of the connection" and NewSocketHost takes "the ip and port of my end of the connection". neither takes "my peer"
[11:11:06] so you create one socket, once, and use it until the cache manager shuts down
[11:11:19] as long as you ensure only one thread at a time ever runs this code, you don't need a new lock
[11:11:54] i suggest making sure e.g. the xserver or xsrvAddr lock is held when this code is run, and not introducing a new lock
[11:12:01] --- stevenjenkins has left
[11:12:13] looking at code in original openafs to confirm which
[11:12:24] I'm rather confused about this packet trace. As far as I can see (using wireshark's search functions), we have vnodes upon which fetchstatus gets successfully performed, and then don't feature again in the packet trace until FetchData64 starts aborting for VNOVNODE.
[11:13:38] which implies they went away via elsewhere?
[11:13:45] can you influence a fileserver to log?
[11:13:49] There should be no elsewhere.
[11:14:16] One option is that wireshark's decoding of Rename is broken. Let me just check that first.
[11:14:36] jake, xserver and xsrvAddr are both held write when afs_SetServerPrefs is called.
as long as you hold both when you create the socket, you're fine
[11:14:57] Rename will not include the FID of the object being renamed. It contains the FID of the parent directory and the name of the object being renamed
[11:15:35] --- stevenjenkins has become available
[11:15:42] the Old and New FIDs are both directories.
[11:15:49] and as far as kernel *debugging*, osx is a crappy platform for it unless you run osx server in a vm and do remote kernel debugging of the vm
[11:15:51] Bah.
[11:16:00] do you know the name of the object that 536870924.304640.455003 refers to?
[11:16:06] Sadly not.
[11:16:16] Not unless you can tell me a magic way of determining it :)
[11:16:27] then we need to parse out the directory data from the packet trace
[11:16:31] derrick, is there an easy way (or a way at all) to build for linux on OS X?
[11:17:32] just set up a linux vm and build in the vm
[11:18:01] > Not unless you can tell me a magic way of determining it
dump the volume, look for that vnode, and walk upward? In fact, if you dump the volume, afsdump_scan has a mode that will print the (relative) paths of all of the vnodes
[11:18:23] I thought the problem at this point was that the vnode no longer exists
[11:18:24] Sadly, it probably doesn't exist anymore. Hence the VNOVNODE errors.
[11:19:01] does blogbench keep any logs of what failed?
[11:19:27] Maybe it exists in a backup volume?
[11:22:14] Sadly not. Won't be in a backup volume because blogbench has just made it.
[11:22:27] Actually, if I start from scratch, I really should see creates.
[11:22:39] I'm going to guess that the file name is article-29.xml
[11:22:50] .tmp
[11:22:56] And that it is a rename race?
[11:26:36] possibly
[11:28:16] the Unix CM appears to have the problem that a VNOVNODE does not mark the status object as bad and so subsequent requests send additional FetchData RPCs to the file server.
[11:29:25] I have a packet trace with more useful information...
[11:29:55] I don't believe the unix CM has the concept of a vnode that exists but is "bad" and so can never be accessed. We could evict the "bad" entry from the vcache, but that would make things worse, not better.
[11:30:54] The problem is that if FIDs are changing on renames, what happens to all of the things that have that file open? They're not going to do a lookup again (which is what they have to do to pick up the new FID)
[11:31:21] localhero should be able to help if the issue is renumbering. if it's not, we need to force revalidation
[11:31:39] Nobody tells you what the FID has changed to, is the problem.
[11:31:46] What is this about fids changing on renames?
[11:31:53] --- cclausen has left
[11:31:55] Jeffrey's original theory.
[11:31:57] --- cclausen has become available
[11:32:07] did we ever verify the fid changed?
[11:32:50] Not yet ...
[11:32:51] I was able to show it from the Windows client. Jeff H. said I was wrong and that what I was seeing was impossible.
[11:33:42] I modified the Windows client to not be affected by a FID change and went on with my life.
[11:34:27] Simon, you have a new trace that includes the CreateFile calls ?
[11:34:39] Indeed. Same place, with a 2 in its name.
[11:35:04] You're looking for comment-22.xml.tmp ( 242822.4554509 )
[11:35:04] Well, I've read the code, and it doesn't change the fid of a file that's being renamed. If you think otherwise, please read SAFSS_Rename() and tell me where the change is happening.
[11:35:27] And it does appear to be there, and then gone, following a rename.
[11:37:57] you are seeing exactly what I was. If you now examine the directory you should find the file name with a new fid
[11:39:50] The thing is, it's not reproducible behaviour. It doesn't happen every time you do a rename.
[11:41:05] a new vnode will be created as part of the rename if the file server believes the old and new directories are different
[11:41:13] SHOW ME WHERE THAT HAPPENS!
[11:43:18] In my code, only SAFSS_CreateFile, SAFSS_Symlink, and SAFSS_MakeDir ever call Alloc_NewVnode
[11:43:23] I can't reproduce that from the command line either. There's got to be more to it than that.
[11:46:06] What would happen if we had a racing create and rename?
[11:46:08] Note that if you move a file from one directory to another, its parent vnode pointer changes, but that is not the same as its fid changing.
[11:46:26] Shouldn't happen; they should both have the directory locked.
[11:48:13] does the fid it changed to look familiar, by chance?
[11:48:56] something like, if you renamed over a file, it didn't somehow get the fid of the clobbered file, did it?
[11:48:59] I haven't tracked down the FID it's changed to, yet. Sadly, this benchmark creates many many directories with all the same file names.
[11:51:07] So, let's be clear. The FID of a vnode cannot change. No way, no how. If it doesn't have the same FID, it _is not the same vnode_. Period. If there is a bug, you could end up with a directory entry pointing to the wrong fid, or you could end up with a vnode index entry with the wrong data, but the fid of a vnode _cannot change_.
[11:51:37] Okay, so here are the observed items from the packet trace.
[11:51:48] Think for a moment about what a FID is, and how it is used, and especially about how the namei on-disk structure works, and you'll see that I am right.
[11:52:49] The file 'comment-22.xml.tmp' is created, and returns a FID. FetchStatus on that FID returns successfully. 'comment-22.xml.tmp' is renamed to 'comment-22.xml'. FetchData on our FID returns VNOVNODE.
[11:53:20] The operations that would result in the mapping from a client OS inode to an AFS FID changing, like moving a file from one volume to another or splitting a file that had multiple links in a directory because you moved one of them to another directory, are _not_ handled by the fileserver. The fileserver returns errors in those cases.
[11:53:50] We're not looking at what's happening in the client, we're looking at what's happening on the wire.
[11:54:28] OK, so that doesn't mean the vnode's fid changed; it means the vnode ceased to exist. That suggests that the link count was reduced too far.
[11:55:04] could the program be creating hard-links to files and then attempting to move them to different directories?
[11:56:08] this is not a rename across directories.
[11:56:16] It could be, but that doesn't result in the fileserver deleting vnodes; it results in the fileserver returning EXDEV
[11:56:29] ok, nevermind
[11:56:45] Oh, and yes, sorry ==Jeffrey - the source and target directories for the rename are the same.
[11:57:12] I guess what I need to do is see if the directory which the rename takes place in is still on disk, and see what the directory data looks like.
[12:04:07] Okay. So I know what's going on now.
[12:04:16] Those with the packet trace can follow along at home...
[12:04:36] In frame 37285 comment-22.xml.tmp is created
[12:05:14] In frame 37394 it is renamed to comment-22.xml
[12:05:18] And where is this mythical packet trace?
[12:05:40] In frame 50984 comment-22.xml.tmp is created (let's call this FID B, and the first one FID A)
[12:06:12] In frame 51206 comment-22.xml.tmp (FID B) is renamed as comment-22.xml
[12:06:30] OK, so FID A becomes invalid.
[12:06:34] In frame 51217 something tries to access FID A and gets VNOVNODE (because we just stamped all over it)
[12:07:16] So, we're back to this being something up with the cache manager. Either it's not detecting open files correctly, or it's not invalidating dentries properly, is my guess.
[12:10:39] the "mythical packet trace"'s path was specified earlier. i assume the path is still the same one as before, and i *know* this muc is logged.
[12:11:45] simon, in between those two instances of comment-22.xml.tmp being created and renamed, there are several instances of the same behavior.
I'm looking for the RemoveFiles and whether the parent directories are the same
[12:12:55] i looked; I couldn't find it. Maybe I didn't go back far enough in time.
[12:13:15] /afs/inf.ed.ac.uk/user/s/sxw/Public/packettrace2.pcap.gz I think
[12:13:29] What RemoveFile?
[12:14:15] i assume there's no RemoveFile, and the "/* If the new name exists already, delete it and the file it points to */" code applies
[12:14:38] There is no removefile
[12:14:48] Removing something when you rename over top of it _is_ handled by the fileserver; it couldn't be atomic otherwise (actually, it's still not truly atomic, because it's possible for the remove to succeed and then the subsequent add to fail, but if both operations succeed, then no client will ever see the intermediate state)
[12:14:50] And there are no other rename operations operating on that filename, in that directory.
[12:15:02] there should, however, be a callback broken on the deleted file.
[12:15:13] I believe that depends on what fileserver version you have.
[12:15:29] We're 1.4.8, IIRC (and yes, I'm sad about that)
[12:15:47] 1.39 (shadow 30-Oct-02): if (newfileptr && doDelete) {
1.39 (shadow 30-Oct-02):     DeleteFileCallBacks(&newFileFid); /* no other references */
1.39 (shadow 30-Oct-02): }
[12:15:52] and i bet 1.39 is "retabify"
[12:16:09] yeah. that code was there in 1.1
[12:16:50] so, anyway, there should be a callback being broken if the removefile is implicitly done in the fileserver.
[12:16:56] and your choices are
[12:17:01] we get it and mishandle it
[12:17:04] we don't get it
[12:17:10] Right; in sufficiently old code, you get a callback break on the deleted file only if its link count did _not_ drop to 0.
[12:17:25] I can't see a callback break in that packet trace.
[12:17:36] --- stevenjenkins has left
[12:18:02] --- dev-zero@jabber.org has become available
[12:18:11] the doDelete behavior hasn't changed.
if you didn't get it before, you don't get it now
[12:18:22] in fact:
        doDelete = 1;
    } else {
        /* Link count did not drop to zero.
         * Mark NewName vnode as changed - updates stime.
         */
        newfileptr->changed_newTime = 1;
[12:19:03] Note that DeleteFileCallbacks() does not break callbacks; it discards them
[12:19:03] oh. nm, doDelete *deletes* the callbacks, it doesn't break them, anyway
[12:19:10] --- stevenjenkins has become available
[12:19:26] so fine. when the parent's callback is broken, we need to refetch it.
[12:19:43] i bet we screw up in localhero
[12:20:00] But, as far as I can see, we're not actually getting _any_ callback breaks as a side-effect of the rename.
[12:20:53] there are potentially four objects involved. The source directory, the target directory, the object being renamed and the object whose name it might take.
[12:20:59] i don't see how we can bomb from rename before " /* break call back on NewDirFid, OldDirFid, NewDirFid and newFileFid */ "
[12:21:10] --- dev-zero@jabber.org has left: offline
[12:21:31] NewDirFid always gets a cb broken. if old and new dir are the same, that's sufficient and we're done
[12:21:45] > we're not actually getting _any_ callback breaks
Not surprising.
[12:22:04] is this a volume root that's the parent?
[12:22:19] No. It's a directory below the root.
[12:22:23] ok
[12:22:31] And the renames concerned are happening further below that.
[12:22:38] the CM is being given status info for source and target directories so it has an updated status. the vnode that is being renamed should not be having the status for that object change as a result of the move.
[12:23:01] wait. BreakCallBack is given the caller. so we don't break to it.
[12:23:05] But the vnode that's being stomped on is surely having its status changed?
[12:23:22] You only get a callback break on the object being renamed if it is a directory. You only get a callback break on the object being overwritten if its link count did _not_ go to zero.
And you don't get _any_ of these callback breaks if you are the client that did the rename, since you are presumed to be able to do the updates yourself.
[12:23:39] > You only get a callback break on the object being renamed if it is a directory.
[12:23:48] Ah. Okay. So as the client, I'm supposed to deal...
[12:24:03] untrue if *you* are doing the renaming, apparently
[12:24:06] --- deason has left
[12:24:08] since flag is not true, and
[12:24:15] * Break all call backs for fid, except for the specified host (unless flag
[12:24:37] Derrick: Yes, we break the object being renamed if it is a directory, because we changed its contents (we changed the .. link). But, if you read to the end of my zgram, you see I say you get nada if you are the requesting client.
[12:24:43] so this is a client issue
[12:24:56] --- deason has become available
[12:25:03] not sending a callback for the object being deleted seems wrong as there is no method for the CM to know otherwise that the object was involved at all.
[12:25:09] Oh, actually, if we break the object being stomped on, you should get that, because flag is true in that case.
[12:25:19] Well, you don't :)
[12:25:25] if we break it.
[12:25:26] we are not seeing that callback in the packettrace
[12:25:30] which we don't, so you don't
[12:25:38] Of course there is such a method. The CM knows the contents of the target directory, since it is updating it, and so knows what fid was there.
[12:25:51] --- stevenjenkins has left
[12:26:21] if doDelete is true, we delete. if there are other links, we break.
[12:26:22] why would the client know the contents of the target directory?
There is no requirement that the client know that and even if it thought it did, there would be a race condition
[12:26:36] the client knows the contents because it has a copy of the directory object
[12:26:44] IIRC, we discussed this before, and determined that it was OK not to break the object being deleted if its link count went to 0, because you'd get a break on the directory (or else you had no CB on the directory), and trying to access the object would get you VNOVNODE. And besides, it's not like you can fetchstatus it.
[12:26:48] as long as it had the most recent version cached.
[12:27:00] which is not guaranteed
[12:27:23] yes it is
[12:27:30] the client can check to see if that is true by checking to see if the DV of the target directory incremented by only one
[12:27:30] It is on Unix.
[12:27:33] Ah, but if it doesn't, then it has to fetch it to do a lookup. And in _this_ case, the source and target dirs are the same, so we know the client has a copy.
[12:27:57] it is on unix. or you better get a breakdelayedcallback when you try to rename. since you need to have a stat'd parent when you start
[12:28:09] you know the client had a copy of the directory. you do not know that the content of the directory is current.
[12:28:18] why don't i?
[12:28:39] well, to the extent that i didn't *just* get a break delayed callback on the parent, why don't i?
[12:29:00] --- stevenjenkins has become available
[12:29:25] even if there's a race, i don't care: when the breakdelayedcallback *does* come, i eject my localhero patched copy and refetch
[12:29:38] And local hero won't patch if the DV has changed by more than 1.
[12:29:43] er, when the "breakcallback" ...
[12:31:26] As far as I can see, we don't handle the case at all when you rename on top of a file which is open.
[12:31:46] we are making an assumption that the client is able to determine what the FID of the squashed object is prior to issuing the RenameFile RPC. That requires that the directory be up to date.
It is possible that it won't be since callbacks could be in flight.
[12:31:50] wasn't there a bug open for that, that we should sillyrename then?
[12:32:05] I suspect that the lack of sillyrename may be our problem.
[12:32:24] I agree that localhero will not perform the local directory update if the DV change is not 1
[12:32:27] our problem, my problem, whatever.
[12:33:14] we don't need to know the fid of the squashed object. we need to know that if that fid appears in the directory for the newname, we need to act. otherwise, the action will happen anyway if a racer's callback comes in
[12:33:58] If we're going to sillyrename, don't we need to know the name so we can do the rename before the server stamps on it?
[12:34:15] s/name/FID/
[12:34:27] if we have it open, we know the fid.
[12:35:05] The thing is, we shouldn't _need_ a CB break when an object's link count goes to zero. RemoveFile doesn't do it either.
[12:35:19] True, but in order to know the name, and thus whether there is a collision, we need an up to date directory.
[12:36:56] 1. CM A creates "foo1"
2. CM A starts to process a rename of "foo1" to "foo2", and doesn't find "foo2"
3. CM B creates "foo2"
[12:37:21] 4. as callback is in transit, CM A issues RenameFile(foo1, foo2) to file server
[12:37:50] Yes, and? CM B gets a callback break on the target directory
[12:38:04] 5. CM A receives callback break on the directory
6. CM A receives response to RenameFile
[12:38:05] Your scenario is no different from CM A issuing a RemoveFile(foo2)
[12:38:35] I really think the core issue here is sillyrename support. Which has only ever worked when the CM removing the file is the one that has it open.
[12:38:42] If (5,6) actually happen in that order, then by the time CM A is processing the response to the RenameFile, the vcache in question is no longer stat'd
[12:38:46] this is no different than me deleting your file.
the only issue that can be dealt with from the same CM that initiates
[12:39:05] Yes, I'm fairly certain your issue is lack of sillyrename support.
[12:39:11] as am i
[12:40:10] The rest of this is about convincing jaltman that we do _not_ need to break callbacks on objects whose link count goes to 0, and that there is nothing special about an object whose link count goes to 0 because it was renamed over vs being removed, and that yes, we do have a cache consistency model that addresses this sort of race.
[12:40:10] The question is whether I can implement sillyrename without it being racy. I suspect not.
[12:40:40] is "sillyrename" a technical term?
[12:41:07] it's a term which came from nfs, istr
[12:41:42] search http://nfs.sourceforge.net/ for "silly rename"
[12:41:56] sillyrename is inherently racy. but you can fix the part that matters. particularly, you can notice that the target is open, sillyrename it, then go back to the start to try again (which I think we might even do for unlink). And, if you find the target is not open or does not exist, you can apply locking to prevent such a file from becoming open on your client (possibly appearing first) during the rename.
[12:42:43] You cannot prevent someone else from renaming one of your open files into the target name between when you decide the rename is safe and when it actually happens.
[12:43:02] But then, you can't prevent someone else from removing or renaming over one of your open files, so...
[12:43:17] (there is no way to win here)
[12:43:41] (at least, not without propagating open-file state to the server, and that way lies madness)
[12:50:50] hence, it being silly
[12:54:42] Ironically, I suspect that this is actually a bug in blogbench. I don't think it intends to still be writing to a file that's been renamed out of existence.
[13:01:09] I'll take a look at coding this, unless anyone else wants to...
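The sequence read out of the packet trace (create the .tmp file, rename it over the existing name, then an access to the old FID fails with VNOVNODE) and the sillyrename workaround discussed above can be modelled with a toy, single-client fileserver. Everything here is a hypothetical sketch: `ToyServer`, `publish`, the integer FIDs and the `.__afs` name pattern are illustrative, there is no concurrency, and it does not address the cross-client races the log says cannot be won.

```python
class VNoVnode(Exception):
    """Raised when a FID no longer refers to an existing vnode."""


class ToyServer:
    """A single directory mapping names to FIDs, plus per-vnode link counts."""

    def __init__(self):
        self.names = {}     # name -> fid
        self.links = {}     # fid -> link count
        self._next = 1

    def create(self, name):
        fid = self._next
        self._next += 1
        self.names[name] = fid
        self.links[fid] = 1
        return fid

    def _unlink(self, fid):
        self.links[fid] -= 1
        if self.links[fid] == 0:
            del self.links[fid]      # vnode is gone; its FID is now invalid

    def rename(self, old, new):
        fid = self.names.pop(old)
        victim = self.names.get(new)
        if victim is not None:
            self._unlink(victim)     # renaming over a name removes the old target
        self.names[new] = fid

    def remove(self, name):
        self._unlink(self.names.pop(name))

    def fetch_data(self, fid):
        if fid not in self.links:
            raise VNoVnode(fid)
        return "data for fid %d" % fid


def publish(server, open_fids=(), silly=False):
    """Create comment.tmp and rename it over comment.xml; return the new FID."""
    fid = server.create("comment.tmp")
    target = server.names.get("comment.xml")
    if silly and target in open_fids:
        # Move the open file aside first so the rename does not destroy it.
        server.rename("comment.xml", ".__afs%d" % target)
    server.rename("comment.tmp", "comment.xml")
    return fid


# Without sillyrename: FID A is destroyed by the second publish.
plain = ToyServer()
fid_a = publish(plain)
fid_b = publish(plain)
try:
    plain.fetch_data(fid_a)
    plain_failed = False
except VNoVnode:
    plain_failed = True

# With sillyrename: the still-open FID A survives under a temporary name.
careful = ToyServer()
open_fid = publish(careful)
publish(careful, open_fids={open_fid}, silly=True)
still_readable = careful.fetch_data(open_fid)
```

In a real client the temporary name would be removed once the last opener closes the file, at which point the FID becomes invalid just as in the first case.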
[13:01:50] I'm going to have to go back and restore the Windows CM code from February and recreate the problem I was seeing there.
[13:02:03] That would be interesting to see...
[13:02:36] It won't be today though
[13:02:46] it's already 4pm
[13:03:17] derrick, shadow_gmailcom> 2) never Free it until shutdown.
[13:03:33] how does shutdown of the cache manager happen?
[13:03:37] or where should I be looking?
[13:05:04] You unmount /afs
[13:05:45] you can look around the code where afsd handles the -shutdown parameter
[13:05:56] Really, you should be looking at 2-3 things:
- what happens when the filesystem is unmounted
- what happens when someone calls afs_syscall(AFSCALL_CALL, AFSOP_SHUTDOWN)
- what happens when someone tries to unload the module
[13:05:57] it makes some syscall, I forget the name
[13:06:10] yeah, that's what I was thinking of
[13:06:49] --- Jeffrey Altman has left
[13:07:55] ok, thanks
[13:09:26] jake, shutdown starts when the parm == AFSOP_SHUTDOWN case is called in afs_call.c
[13:09:41] yeah, just got there, thanks
[14:07:48] --- asedeno has left
[14:44:29] --- asedeno has become available
[15:47:54] --- Jeffrey Altman has become available
[15:49:06] --- deason has left
[15:58:51] --- Jeffrey Altman has left
[16:02:48] --- asedeno has left
[16:02:55] --- asedeno has become available
[16:07:30] --- Jeffrey Altman has become available
[16:21:57] --- Jeffrey Altman has left
[16:31:58] --- Jeffrey Altman has become available
[16:32:50] --- Jeffrey Altman has left
[16:39:50] --- shadow@gmail.com/owlFA95C1D6 has left
[16:47:32] --- Jeffrey Altman has become available
[17:10:59] --- Jeffrey Altman has left
[17:57:12] --- Jeffrey Altman has become available
[19:34:31] --- deason has become available
[21:09:16] --- shadow@gmail.com/owl6EE6455B has become available
[22:11:20] bleah. not fun to find out something has been broken since November.
[22:11:39] --- Jeffrey Altman has left: Replaced by new connection
[22:54:11] --- dev-zero@jabber.org has become available
[22:54:14] --- dev-zero@jabber.org has left: offline
[23:20:45] --- cclausen has left
[23:22:40] --- deason has left