[00:32:18] --- reuteras has become available
[00:56:23] --- haba has become available
[01:41:35] --- Russ has left: Disconnected
[03:14:30] --- Simon Wilkinson has become available
[03:22:59] --- Simon Wilkinson has left
[03:49:09] --- jaltman has left: Replaced by new connection
[03:49:11] --- jaltman has become available
[04:45:36] --- reuteras has left
[05:02:47] --- jaltman has left: Disconnected
[05:12:13] --- jaltman has become available
[05:23:15] --- abo has left
[05:23:45] --- abo has become available
[05:46:31] --- Simon Wilkinson has become available
[05:50:27] --- Simon Wilkinson has left
[05:53:17] --- meffie has left
[05:56:03] --- Jeffrey Altman has become available
[05:58:32] --- Simon Wilkinson has become available
[05:59:30] --- Jeffrey Altman has left
[06:21:38] --- jaltman has left: Disconnected
[06:55:13] --- jaltman has become available
[06:55:29] --- deason has become available
[07:51:55] --- jaltman has left: Disconnected
[08:03:06] --- Jeffrey Altman has become available
[08:05:02] --- kaj has left
[08:05:08] --- kaj has become available
[08:07:20] I think daniel richard's problem would be lessened somewhat if we were a bit smarter about doing InlineBulkStat's
[08:07:45] Derrick and I have just been discussing this.
[08:08:27] If we know that we don't have 'r' on a directory, we don't
[08:08:32] I was thinking the same thing; in fact, I'm surprised that doesn't already save him.
[08:08:42] need to stat every file within that directory to confirm that's the case.
[08:09:13] Careful. That's true on today's OpenAFS servers, but not on "foreign" servers or if we introduce per-file ACL's
[08:09:24] My suspicion is that the error return from FetchStatus means that we don't set CStatd, which means that we always keep trying.
[08:09:24] the issue is GetVCache calls FetchStatus directly. it never bulks
[08:09:52] Yeah. Marc's already got capabilities for per file ACLs. Foreign servers have CForeign.
[08:10:03] no, it's that gnu find closes the dir before doing the stat's
[08:10:04] Oh; we can bulkstat all we want as part of the readdir, and the things we don't have access to are still not going to become cstatd. OK.
[08:10:06] or at least, partway through
[08:10:20] if you hold open an opendir() on the directory while find is executing, it goes much much faster
[08:10:30] * haba did the OpenAFS Workshop registration and paypal dance (and they can't handle phone numbers in the format "+46.....")
[08:10:46] * Simon Wilkinson wishes he could make the workshop.
[08:11:02] I'm still on the fence. I should figure it out today.
[08:11:15] That and do my taxes.
[08:11:22] Now I only have to get a flight and the other stuff.
[08:11:36] a workaround for find could be to hold open the dir.... but it seems like we should just do the bulkstats if we notice the objects are being accessed in order or something
[08:11:48] --- abo has left
[08:11:51] * haba has to do taxes until 3rd of May here, too.
[08:12:09] --- abo has become available
[08:12:41] But now I have to go visit my parents. Have a nice day!
[08:12:57] deason: Is the problem not that the fileserver throttles us, rather than the way we are stat'ing? We shouldn't bother stat'ing once we've discovered that the directory isn't readable (modulo the conditions mentioned above)
[08:13:21] we wouldn't get throttled if we inlinebulk'd
[08:13:35] which is a better way to avoid the issue, since lots of deployed fileservers will not change
[08:13:55] i mean, i suppose we can also realize we don't have r.
[08:14:01] Yeah.
[08:14:01] and cope
[08:14:19] Inline bulk would help, but means that we suddenly have to maintain state.
[08:14:24] but in general inlinebulk would be my first preference regardless, since it buys us fewer round trips
[08:14:42] So, how do you decide that inlinebulk is worth doing?
[08:15:02] technically we could replace fetchstatus with inlinebulk of one entry on any server that supported it and never get an error. which is wrong, but...
[08:15:03] do we know if we don't have r on something just from the readdir? I mean, symlinks and mountpoints don't throw a wrench into that?
[08:15:41] i don't think we do know
[08:15:48] Oh gah. They will.
[08:16:20] i've been shying from trying to solve it the "do we have r" route for that reason
[08:16:25] for inlinebulkstat, we could maintain some stats like the index of the last accessed object, and how many sequential objs we've accessed in a row
[08:16:38] Define sequential?
[08:16:40] what's "sequential"?
[08:16:48] the order we return readdir results in
[08:17:00] So, someone who does a sort first loses?
[08:17:23] I don't see how else to make a determination on order for what FIDs to inlinebulkstatus
[08:17:40] --- jaltman has become available
[08:17:40] --- haba has left
[08:17:41] I mean, if we're just guessing at the fids to fetch it's not very worthwhile
[08:17:54] Fetch them all?
[08:18:12] --- abo has left
[08:18:21] --- abo has become available
[08:18:28] --- jaltman has left: Replaced by new connection
[08:18:29] actually, it depends if the cost is dominated by the io or the net. if it's the net, doing a bulk when we use but 2 of the results is still a win
[08:18:29] --- jaltman has become available
[08:18:53] that wouldn't seem good if we're looking at a 30k+ dir and the caller only stats like 1
[08:19:17] not even from the standpoint of the server response, but also callbacks
[08:19:49] I assume readdir-order is the current order we choose for inlinebulkstatus, right? or something like that
[08:19:58] (and we only do 30 at a time now, iirc)
[08:20:36] I think you'd need to base it on the caller having looked up a certain percentage of the directory.
[08:21:24] bulkstat is 30 entries. when you get a call for one you don't have, bulk the next 30
[08:21:36] well, in e.g. this specific case, we need to do it before we start getting throttled by the fileserver
[08:21:56] we can't wait for e.g. 10% of the dir entries if we start getting throttled at 2%, because it will still take forever
[08:22:41] I guess this depends on whether we are fixing this purely for the 'find' case, or whether we're trying to do something more general in terms of working out when to switch to bulkstat.
[08:22:47] we could just immediately go to bulk
[08:23:11] So getattr() always results in a bulk stat?
[08:23:52] not always. not if it's cached already
[08:24:01] does inlinebulkstat never get throttled like fetchstatus does? doing that just seems like doing '-abortthreshold 0' then
[08:24:18] it gets throttled if there's an error. on the RPC, not on the stat
[08:24:23] --- abo has left
[08:24:27] right
[08:24:40] the difference is the RPC gets an error if the RPC fails. the permissions conveyed by stat are orthogonal
[08:24:46] --- abo has become available
[08:25:00] so it's not -abort 0
[08:25:07] it affects nothing else.
[08:25:31] and I suppose we're fine with that... it would seem less likely for something to get caught in an inlinebulkstatus loop if it returns success...
[08:26:11] the throttling behavior is sort of sketchy anyway
[09:08:00] if you are willing to maintain state, record on the parent directory the last user to stat an object in the directory, a counter of how many were performed, and a timestamp for the last stat. If the count is greater than N and time since last < S, attempt an inlinebulkstat for the next M entries for which there is not a valid callback.
[09:08:37] state maintenance is icky. linux does it for the translator. it's... not nice
[09:08:55] Wait, what state do we maintain for the translator?
[09:09:12] the readdir pid thing?
[09:09:27] the inreaddir state, so we don't deadlock on ourselves
[09:09:30] Oh, that.
[09:09:30] yeah
[09:09:35] dude, you wrote it...
[09:09:44] I know.
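The per-directory state proposed at 09:08:00 could be sketched roughly as follows. This is only an illustration under stated assumptions: `struct dir_stat_hint`, `dir_stat_should_bulk()` and the three tunables are hypothetical names, not OpenAFS structures or functions, and the chat leaves N, S and M unspecified (only the batch size of 30 is mentioned).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunables; the chat only calls them N, S and M. */
enum {
    STAT_RUN_MIN = 8,     /* N: stats needed before switching to bulk (assumed value) */
    STAT_GAP_MAX_SEC = 2, /* S: maximum gap between stats, seconds (assumed value) */
    BULK_BATCH = 30       /* M: entries per bulk call, 30 per the chat */
};

/* Hypothetical state kept on the parent directory; not an OpenAFS struct. */
struct dir_stat_hint {
    uint32_t last_uid;  /* last user to stat an object in this directory */
    uint32_t count;     /* consecutive stats performed by that user */
    int64_t last_stat;  /* time of the last stat, in seconds */
};

/*
 * Record one stat by `uid` at time `now`. Returns true when the run of
 * recent stats is long enough that the caller should try a bulk stat of
 * the next BULK_BATCH entries that lack a valid callback.
 */
static bool
dir_stat_should_bulk(struct dir_stat_hint *h, uint32_t uid, int64_t now)
{
    bool same_run;

    assert(h != NULL);

    /* A run continues only for the same user and within the time window. */
    same_run = (h->count > 0 && h->last_uid == uid &&
                now - h->last_stat < STAT_GAP_MAX_SEC);

    h->count = same_run ? h->count + 1 : 1;
    h->last_uid = uid;
    h->last_stat = now;

    return h->count > STAT_RUN_MIN;
}
```

A caller would invoke this on each uncached stat and, on a true result, issue the bulk call instead of a single FetchStatus. Note that this keys only on user and timing, so it does not depend on readdir order and a caller who sorts first is not penalized.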
[09:10:33] Sorry; currently working on a nasty 19000-way race
[09:11:22] is fine. this isn't really urgent
[09:14:04] looking at this i noticed we didn't quite dtrt on macos and *bsd. 1753
[09:20:01] --- jaltman has left: Disconnected
[09:20:45] --- kaj has left
[09:24:57] --- jaltman has become available
[09:33:56] --- Simon Wilkinson has left
[09:50:39] --- jaltman has left: Disconnected
[10:33:22] --- Simon Wilkinson has become available
[10:51:25] --- Simon Wilkinson has left
[10:59:42] --- Simon Wilkinson has become available
[11:33:45] --- Simon Wilkinson has left
[11:47:01] --- Russ has become available
[11:57:19] --- steven.jenkins has left
[12:05:21] --- haba has become available
[12:14:39] anyone know what this is complaining about? RXAFS_GetCapabilities failed with code 105
[12:15:15] it's the only problem i see in 1725; and it just creates a delay; accesses all still seem to work
[12:15:28] VNOSERVICE is an idledeadtime timeout i bet
[12:17:04] what version is the server?
[12:18:26] I get it for a 1.5.73 and a 1.4.11, I think
[12:18:48] and the delay is only like 2 seconds; wouldn't idledeadtime timeout be much longer?
[12:19:35] the 1.4.11 will have a bug which can cause it for delays opening the window. but yeah, a real timeout should be longer than that
[12:19:49] the window issue i have seen exactly once ever
[12:20:04] well, reproducible in that case, but...
[12:20:32] er, bad estimation, more like 10 seconds
[12:21:10] and it's the only call that seems to be failing.... I don't see anything immediately wrong with the caller though
[12:22:02] I'm not too concerned with fixing it immediately; I just didn't know if it's something with the fs caps code or the uafs code
[12:49:00] okay, I see what it is; afs_GetCapabilities holds afs_xserver; when we call GetCapabilities, the server responds with an InitCallBackState3 if it needs to, and our handler for that needs afs_xserver as well
[13:10:00] --- haba has left
[14:06:08] --- jaltman has become available
[14:46:37] --- mdionne has become available
[15:36:30] > okay, I see what it is; afs_GetCapabilities holds afs_xserver gerrit 1754
[15:59:10] That works. I had noticed a delay (using caps for a while), but didn't make the connection with this code.
[16:26:17] --- deason has left
[16:26:20] --- deason has become available
[17:02:53] --- jaltman has left: Replaced by new connection
[17:02:53] --- jaltman has become available
[17:41:59] --- jaltman has left: Disconnected
[18:15:32] --- mdionne has left
[19:08:56] --- jaltman has become available
[19:29:33] --- cudave has left: Disconnected
[19:30:02] --- cudave has become available
[20:01:35] --- Born Fool has become available
[20:27:22] --- haba has become available
[21:13:23] --- haba has left
[21:16:52] --- Born Fool has left
[21:34:07] --- haba has become available
[21:58:26] --- reuteras has become available
[22:10:24] --- kaj has become available
[22:36:15] --- deason has left
[22:56:00] --- kaj has left
[23:44:30] --- kaj has become available
[23:50:38] --- kaj has left
[23:56:46] --- kaj has become available
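The lock problem described at 12:49:00 (a lock held across an RPC whose callback handler needs the same lock) can be illustrated with a toy model. Everything here is a stand-in: `struct toy_lock` is not the OpenAFS lock API, the functions only mimic the roles of afs_GetCapabilities and the InitCallBackState3 handler, and the "drop the lock around the RPC" variant is one common way to avoid the pattern, not necessarily what gerrit 1754 does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock standing in for afs_xserver. */
struct toy_lock {
    bool held;
};

/* Returns false where a real lock would block (deadlock or time out). */
static bool
toy_try_lock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void
toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

static struct toy_lock xserver; /* stand-in for afs_xserver */

/* Stand-in for the InitCallBackState3 handler, which needs the lock. */
static bool
handle_init_callback_state(void)
{
    if (!toy_try_lock(&xserver))
        return false;
    toy_unlock(&xserver);
    return true;
}

/* Stand-in for the RPC: the server calls back into us before replying. */
static bool
fake_get_capabilities_rpc(void)
{
    return handle_init_callback_state();
}

/* Problem pattern: the lock stays held for the duration of the RPC. */
static bool
get_caps_lock_held(void)
{
    bool ok;

    if (!toy_try_lock(&xserver))
        return false;
    ok = fake_get_capabilities_rpc(); /* handler cannot get the lock */
    toy_unlock(&xserver);
    return ok;
}

/* Possible fix: release the lock before the RPC and retake it after. */
static bool
get_caps_lock_dropped(void)
{
    bool ok;

    if (!toy_try_lock(&xserver))
        return false;
    /* ... choose the server and connection under the lock ... */
    toy_unlock(&xserver);

    ok = fake_get_capabilities_rpc(); /* handler can take the lock now */

    if (!toy_try_lock(&xserver))
        return false;
    /* ... record the returned capabilities under the lock ... */
    toy_unlock(&xserver);
    return ok;
}
```

In the toy model the first variant reports failure immediately; with a real blocking lock the handler would instead wait until the call times out, which would match the multi-second delay and the failed RXAFS_GetCapabilities reported earlier in the log.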