[01:32:00] --- Russ has left: Disconnected
[05:28:56] --- meffie has become available
[07:34:39] --- deason has become available
[07:50:31] --- reuteras has left
[09:13:54] --- mfelliott has become available
[09:50:33] --- matt has become available
[09:52:05] question for the author(s): it looks like it might be nice to support, at worst, internal as well as external ordering in the interface of opr_rbtree?
[09:52:05] --- steven.jenkins has left: Lost connection
[09:56:55] --- steven.jenkins has become available
[10:19:17] --- rra has become available
[14:05:22] --- meffie has left
[14:16:17] Matt: I'd be happy to, but I'm not sure what benefit an API consumer would get from the different ordering?
[14:30:31] Er. What I meant to say is, the current opr_rbtree_insert is really the insert-at operation. And we don't have an internal comparison function, so no rbtree_insert and rbtree_lookup. I have coded those, and added a size metric -- the code works exactly the same unless you use those two functions ("internal ordering") -- they assume a non-null comparison function is passed to opr_rbtree_init.
[14:31:08] That sounds fine.
[14:31:28] Ok. Thanks for doing this. It's great to have.
[14:31:39] My intention in having an external comparison function was to avoid the speed impact of calling out through a function pointer when searching the tree.
[14:31:54] If the search routine is locally written, then the compiler can inline the comparison, and you'll go much faster.
[14:32:23] It makes sense; I just want to use it with a less inside-out interface from outside rx
[14:32:31] Sure.
[14:33:37] I'd like to see us support both mechanisms - so I do view the ABI required to handle locally written search functions as being "stable", but I'm also happy to handle the case where people don't want to go to that effort.
[14:34:20] Yes, I'd just like to change the signature of opr_rbtree_init, and rename opr_rbtree_insert to opr_rbtree_insert_at (or something else, if you prefer).
[14:34:36] I thought _parent, but _at is shorter.
[14:35:58] (And add an opr_rbtree_lookup)
[14:36:00] Either would be fine by me.
[14:36:16] And changing opr_rbtree_init is fine too.
[14:36:20] Ok, I'll send it this weekend; I need to unit test.
[14:37:09] Tests would be great!
[14:37:24] rx seems to test stuff quite thoroughly ;)
[14:40:21] Only in ways that come back and bite you in 4 years' time (see idle dead for details)
[14:42:12] Yeh, been reading that. Sorry.
[14:43:16] The thing is, the problem with idle dead led us down a huge garden path with busy calls. It turns out the reason we were suddenly having a problem with servers having a huge number of busy call channels is client-side idle dead support.
[14:43:57] And the question now is, do you just rip it all out, go back to what we had in 2008, and start again from there...
[14:44:37] I think so
[14:45:25] I've proven, to myself at least, that idle dead timeouts do leave the CM in a state that is out of sync with the server
[14:45:38] and with no method for it to determine that it is
[14:49:32] (The reasoning you sent sounds convincing.)
[14:55:55] I'm certain that there is a problem. I'm just a bit worried about throwing out the baby with the bath water. I'm not clear on how many people rely on the current idle dead support to keep their cells operational.
[14:56:36] I don't think it matters.
[14:56:51] the cell is not reliable the way it is
[14:57:39] cancelling the call must be done on the file server.
[14:58:20] Cancelling a call which can have side effects must be done on the fileserver.
[14:58:45] I think clients can still abandon calls which are without side effects, although doing so will make unhappy fileservers unhappier.
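[A sketch of the interface shape being proposed above. The names (tree_init with a comparison function, insert, lookup, a size field) follow the conversation, but the signatures are assumptions, not the actual patch; and to stay short this uses a plain unbalanced BST rather than a real red/black tree -- it only illustrates the "internal ordering" calls.]

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *left, *right;
};

typedef int (*cmp_fn)(struct node *a, struct node *b);

struct tree {
    struct node *root;
    cmp_fn cmp;      /* non-NULL enables internal ordering */
    size_t size;     /* the "size metric" mentioned above */
};

/* cf. the proposed opr_rbtree_init taking a comparison function */
static void
tree_init(struct tree *t, cmp_fn cmp)
{
    t->root = NULL;
    t->cmp = cmp;
    t->size = 0;
}

/* cf. the proposed internally-ordered opr_rbtree_insert */
static void
tree_insert(struct tree *t, struct node *n)
{
    struct node **link = &t->root;

    n->left = n->right = NULL;
    while (*link != NULL)
        link = (t->cmp(n, *link) < 0) ? &(*link)->left : &(*link)->right;
    *link = n;
    t->size++;
}

/* cf. the proposed opr_rbtree_lookup */
static struct node *
tree_lookup(struct tree *t, struct node *key)
{
    struct node *n = t->root;

    while (n != NULL) {
        int r = t->cmp(key, n);
        if (r == 0)
            return n;
        n = (r < 0) ? n->left : n->right;
    }
    return NULL;
}

/* Example payload embedding the node, with a caller-supplied comparator. */
struct item {
    struct node entry;   /* must be the first member for the cast below */
    int value;
};

#define node_to_item(n) ((struct item *)(n))

static int
item_cmp(struct node *a, struct node *b)
{
    return node_to_item(a)->value - node_to_item(b)->value;
}
```

Note the trade-off raised at 14:31:39 above: every step of tree_insert and tree_lookup here pays an indirect call through t->cmp, whereas the existing insert-at/external-search style lets the compiler inline the comparison.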
[15:00:43] I disagree, because it not only makes the file servers sadder but also eventually prevents the client from issuing calls to the file server when all call channels are busy
[15:01:54] Well, I think part of it is that the way we handle busy call channels needs to be more intelligent. If we have an incoming packet with a higher call number on a channel that is in use, then it can only mean that the client has chosen to abandon the call on that channel, and the server should do the same.
[15:02:35] but that is a server-side fix, and if a server-side fix is going to be deployed, it is the server cancelling the call
[15:02:44] But implementing that requires significant changes, both to the way RX handles its call allocation and reference counting, and to the way the fileserver is structured (so that RX can terminate fileserver threads if they are handling calls which have been abandoned)
[15:02:47] once that is deployed, the other fix becomes unnecessary
[15:03:17] No, the server-side fix allows a client to cancel a call without repercussions.
[15:03:35] (_providing_ that call doesn't modify server state)
[15:06:29] if a packet is received on a call channel with a higher call number, properly processing the call cancellation requires the same ability to terminate a thread
[15:08:08] Yes, it becomes a question of where you put the watchdog - on the client, or on the file server.
[15:08:37] In this case, the client has the ability to make better decisions. It knows whether there are other replicas it should try, and whether it is worth hanging around waiting for the server to recover.
[15:09:19] (although I guess the flip side is that the server probably has better knowledge about what storage is attached, and what kind of response time is "reasonable")
[15:09:28] the client can still fail over if the server cancels with an RX_CALL_TIMEOUT
[15:09:49] the server knows whether or not there is a real problem
[15:10:14] the client-side timer is based on the time of call issuance, not on when the call actually begins processing
[15:10:31] let alone when the call is able to perform I/O
[15:36:02] hum.
[15:37:27] It seems like the client -ought- to be able to make a safe decision. Are there call/op-specific coherency steps it could take that would remove the peril?
[15:38:56] In the case where the RPC modifies server state, I don't think so. The client has no way of knowing whether the operation it is choosing to abandon would have succeeded on the server or not. And once the call has been turned around, the server has no way of knowing that the client has abandoned it.
[15:43:53] --- deason has left
[16:00:56] Well, the client knows whether its operation is idempotent.
[16:01:53] If it is not, failing such a call is serious. What if it assumes it has no registration?
[16:02:50] Yes. Those are, I believe, the only operations for which it is safe to use idle dead.
[16:08:39] So, that's saying 'non-idempotent calls must not fail'?
[16:09:01] I don't see how to assert that. We can only hope for that.
[16:09:44] it's not a question of a call failing. it's a question of a call being cancelled
[16:10:08] the problem is that there is no mechanism for a client to cancel a call and tell the server to rewind the transaction
[16:10:20] I don't understand the distinction. And... AFS isn't transactional.
[16:10:28] bingo!!!
[16:10:57] That's not a bingo. You can't make it a protocol violation for the client to abandon an operation.
[16:11:29] I absolutely can. The protocol doesn't support a client abandoning a call.
[16:11:36] that was added in 2008
[16:11:43] doing so was a mistake
[16:12:15] Ok, so you can make the client keep trying. But it seems to violate systemic reality/constraints.
[16:12:41] this is not a question of the client retrying.
[16:13:26] It isn't?
[16:14:20] I'm not trying to be argumentative, just trying to understand the consistency angle.
[16:14:55] if I ask the file server to create a file and then walk away from the call, the client doesn't know the outcome of the call. it may have succeeded or failed. if it succeeded, the DV on the directory has changed and the server thinks the client has a callback registration. if it failed, the client has no callback registration. the client walking away from the call thinks it has a callback registration and believes the DV is the original value.
[16:15:38] Why should the client walking away from the call believe it has a registration? Removing that doesn't fix the consistency problem (!consistency).
[16:15:42] the client has no reason to believe that it needs to ask the server for new status info
[16:16:22] because the failure code of walking away is the same as if the server cancelled the call due to a timeout: RX_CALL_TIMEOUT
[16:16:23] i mean, doesn't removing that fix the consistency problem?
[16:17:18] If the server cancels the call due to timeout, it's promising that the client's operation had no effect?
[16:17:20] Yes. It's a rather complex fix though, because it makes the retry loop much larger.
[16:17:44] if the server cancels the call, it has not been processed
[16:17:49] there is no state change
[16:18:23] Simon, can you describe the application failures that Edinburgh has been seeing due to creating a file failing with EEXIST?
[16:18:50] here goes, roughly.
[16:18:53] Ok, thanks. I'm just going to say, my intuition is the longer loop may be more in line with the real semantics, but I've learned what I can.
[16:21:07] Client sends a create file request to the fileserver.
Fileserver receives that request, and starts breaking a callback to satisfy it. Whilst the callback break is occurring, idle dead on the client times out the call. In the meantime, the fileserver breaks the callback, creates the file, and attempts to return success to the client. The client throws the packet away. We then enter the retry loop in afs_Analyze, and the client retries the create file request. This attempt fails with EEXIST, and the client returns that error to the application.
[16:21:49] I meant, why does the EEXIST break the application?
[16:21:51] Critically, the client believes that its current directory state (without the file) is valid. Any further attempts to create the file get an EEXIST from the fileserver, despite the file not being visible on the client.
[16:22:28] Because the application attempts to stat the file, discovers that it doesn't exist, so attempts to create it, and gets told that it does. It attempts to open it, and gets told that it doesn't exist.
[16:22:53] Application tells the filesystem to please make up its mind, and gives up in disgust.
[16:24:41] The Windows CM handles that by saying that if the file didn't exist prior to the CreateFile and EEXIST is returned, the local data must be wrong, and it forces a rebuild of the directory. However, the EEXIST still gets returned to the application, because the application's attempt to create the file failed.
[16:25:36] This breaks the Windows Stress Test, which assumes it has complete control over the directory namespace it is working in and that there must not be any other apps modifying the contents. So the EEXIST means the client or the file server is bust.
[16:58:14] Anyone care to talk more about ia_net{,mask}? It seems this is a historical artifact from when class A/B/C networks were actually all that was used and there was not necessarily a distinct subnet concept.
[17:06:27] it is historical, that is true
[17:07:33] server selection on unix still relies on it, and there may be something in ubik. I haven't looked in a while.
[17:07:42] I patched it in the FreeBSD ports collection by just changing the check for (ip & mask) == net to always be true. I'm probably going to submit a patch to gerrit to just remove the check entirely, unless convinced otherwise.
[17:09:05] (The context being that FreeBSD removed those elements from struct in_ifaddr)
[17:09:26] I understand. you explained that a few weeks ago
[17:09:38] or what feels like months ago to me
[17:12:20] Fair enough. So in my mind it is a question of adding more OS conditionals versus removing a check of a historical field.
[17:12:56] what I need to understand is what the implication of that removal is for current deployments
[17:13:53] if there are none, then the answer is easy. if there are implications, does that mean we can't deploy fbsd ubik servers mixed with others?
[17:16:04] I feel like there would only be implications for people with very weird network configurations, but I have no way of surveying what configurations are actually in use.
[17:16:52] surveys are not important. we just need to walk through the possibilities.
[17:17:02] if it is possible, some site is doing it
[17:17:57] we don't support mixed deployments of db servers, so we can make such a change at a major version number change
[17:19:18] Right. So such a patch could go on master, and the question is whether it could be merged to 1.6
[17:28:54] a patch could go on master, with documentation of the impact
[17:39:19] --- rra has left: Disconnected
[18:01:12] --- Russ has become available
[18:08:01] Hmm, though IN_CLASSA() is still around. So maybe I can be lazy and just swap FBSD over to a different case of the afsi_SetServerIPRank() implementation.
[18:09:42] does IN_CLASSA() actually work?
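[For readers following along, the (ip & mask) == net test being discussed boils down to something like the sketch below. This is a hedged illustration, not OpenAFS's actual afsi_SetServerIPRank code: the classful constants match the historical BSD <netinet/in.h>, but the helper names are made up here.]

```c
#include <assert.h>
#include <stdint.h>

/* Classful helpers (host byte order), as in the old BSD <netinet/in.h>. */
#define IN_CLASSA(i)   (((uint32_t)(i) & 0x80000000u) == 0)
#define IN_CLASSA_NET  0xff000000u
#define IN_CLASSB(i)   (((uint32_t)(i) & 0xc0000000u) == 0x80000000u)
#define IN_CLASSB_NET  0xffff0000u
#define IN_CLASSC_NET  0xffffff00u

/* Derive the classful network mask for an address (host byte order),
 * the sort of value ia_netmask historically carried. */
static uint32_t
classful_mask(uint32_t addr)
{
    if (IN_CLASSA(addr))
        return IN_CLASSA_NET;
    if (IN_CLASSB(addr))
        return IN_CLASSB_NET;
    return IN_CLASSC_NET;
}

/* The historical same-network check used for server ranking.  The FreeBSD
 * ports patch mentioned above effectively makes this return 1 always. */
static int
same_network(uint32_t addr, uint32_t net, uint32_t mask)
{
    return (addr & mask) == net;
}
```

With CIDR, the classful mask can be wider or narrower than the real subnet, which is why the check only penalizes genuinely odd configurations when it is removed.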
[18:09:59] #define IN_CLASSA(i) (((u_int32_t)(i) & 0x80000000) == 0)
[18:11:15] ok
[18:13:49] Ouch, netbsd has:

     * By byte-swapping the constants, we avoid ever having to byte-swap IP
     * addresses inside the kernel.  Unfortunately, user-level programs rely
     * on these macros not doing byte-swapping.
     */
    #ifdef _KERNEL
    #define __IPADDR(x)     ((uint32_t) htonl((uint32_t)(x)))
    #else
    #define __IPADDR(x)     ((uint32_t)(x))
    #endif

    #define IN_CLASSA(i)    (((uint32_t)(i) & __IPADDR(0x80000000)) == \
                             __IPADDR(0x00000000))

[20:26:41] --- jaltman/FrogsLeap has left: Disconnected
[20:29:57] --- jaltman/FrogsLeap has become available
[21:56:02] --- jaltman/FrogsLeap has left: Replaced by new connection
[21:56:03] --- jaltman/FrogsLeap has become available
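[The point of NetBSD's _KERNEL variant quoted above is that the kernel keeps IP addresses in network byte order, so the macro byte-swaps its constants instead of the address; feeding a network-order address to the plain host-order macro misclassifies on little-endian machines. A small demonstration of the two variants side by side; the macro names with suffixes are made up here to let both coexist:]

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Host-order test, as in the classic userland macro. */
#define IN_CLASSA_HOST(i) (((uint32_t)(i) & 0x80000000u) == 0)

/* Network-order test, mirroring NetBSD's _KERNEL variant: byte-swap the
 * constants (via htonl) rather than the address being tested. */
#define IN_CLASSA_NET_ORDER(i) \
    (((uint32_t)(i) & htonl(0x80000000u)) == htonl(0x00000000u))
```

Each macro gives the right answer only for addresses in its own byte order, so a caller such as afsi_SetServerIPRank() has to know which order its addresses are in before picking one.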