[00:54:38] --- steven.jenkins has left
[00:56:08] --- steven.jenkins has become available
[01:02:38] --- sxw has left
[01:16:21] --- Russ has left: Disconnected
[03:26:58] --- jaltman has left: Disconnected
[03:27:00] --- jaltman has become available
[03:44:57] --- jaltman has left: Disconnected
[04:52:28] > Well, I am kind of suspicious of the locking in GetVCache
that looks right to me.
[05:16:12] > We do a force unmount in all cases on some platforms, right?
on macos this is true because of how finder unmount works. i need to see what i can figure out about it again
[05:27:26] --- shadow has left
[06:19:23] --- Simon Wilkinson has become available
[06:24:52] --- mmeffie has left
[06:25:07] --- mmeffie has become available
[07:09:36] --- Simon Wilkinson has left
[07:09:37] --- Simon Wilkinson has become available
[07:15:53] --- jaltman has become available
[07:23:42] --- deason has become available
[08:03:23] --- reuteras has left
[08:07:19] --- matt has become available
[08:09:24] well good heavens--who is planning to rewrite SUPERGROUPS?
[08:12:46] > > locking in GetVCache
> looks right to me
I just can't help but feel that using VOP_ISLOCKED() is something of an abomination, mostly.
[08:17:45] I have a hatred for code paths that do that - it always seems like a bodge to me.
[08:19:21] we don't control the vfs, so we have to cope with multiple codepaths that treat locking inconsistently
[08:21:10] I presume VOP_ISLOCKED() tells you if the current thread holds the lock? not that it's just locked by *someone*?
[08:22:40] in theory it's any thread (process) has it locked, not just current
[08:22:52] Surely that's racy?
[08:23:08] so if someone else has it locked....
[08:24:00] we *should* go back and figure out which paths can drop us here with a locked vnode, but it would be easier imo if we were otherwise further along
[08:28:24] --- jaltman has left: Replaced by new connection
[08:28:26] --- jaltman has become available
[08:31:29] --- jaltman has left: Replaced by new connection
[08:31:38] --- jaltman has become available
[08:33:22] not racy, is what it is; derrick++
[08:34:03] the thing is, some of the things we work around like that eventually get fixed and we fail to notice and keep the poor behavior
[08:34:16] i ejected a lot of macos "bad behavior" code frex recently
[08:34:22] But presumably you can't tell the difference if you've arrived through a code path in which you already hold the lock, and when someone else holds the lock, but just happens to be executing simultaneously.
[08:34:36] --- jaltman has left: Replaced by new connection
[08:34:37] --- jaltman has become available
[08:34:51] In the first case, you don't want to get the lock (you'd deadlock), but in the second you do want to (so you wait until the process you're racing against completes its work)
[08:34:58] oh, we shouldn't be trying to use an idiom like that--I hope we aren't
[08:35:02] probably. and it's not necessarily possible to tell with some locking primitives
[08:35:14] that's not a supported use of vfs locking
[08:35:29] "that"
[08:37:41] there are just wrong behaviors in VOP_RECLAIM, and maybe related handling
[08:37:50] elsewhere
[08:38:07] we're out of sync with what vfs is doing now
[08:38:17] yes, i figured we were.
[08:38:31] i wanted to, like, get past other problems before revisiting
[08:38:40] when ben swots that, life will get much better :)
[08:39:01] things started getting wonky in 7.1 or so
[08:39:41] but there are inconsistencies (that don't deadlock as far as I know) in 7.0 too ...
[08:40:24] on smp, definitely could get incorrect results in iozone testing, for example
[08:41:49] > and it's not necessarily possible to tell with some locking primitives
fbsd definitely appears to keep track of the holder; but I dunno how possible it is to get it to tell you
[08:42:34] we're supposed to use the kpi, and then...
[08:42:59] or rather, we're supposed to use vfs correctly
[08:47:49] derrick: unable to connect to www.dementia.org?
[08:48:36] I think one of the reasons Linux hated us so completely a few years ago was our refusal to use the "proper" interfaces to the VFS.
[08:49:38] "interfaces"
[08:49:48] --- mmeffie has left
[08:49:59] (unless those ones don't change as much)
[08:52:10] --- mmeffie has become available
[08:58:31] --- jaltman has left: Disconnected
[09:15:17] --- jaltman has become available
[09:16:53] yup, www.dementia.org is a vm on a server that's being rebooted with updates
[09:18:15] > fbsd ... appears ... track of the holder
It tracks the writer on read-write locks; exclusive on s/x, etc.. This is not always sufficient for what one wants, though it can help a lot.
[09:19:21] > ben swots that
The plan is to read up on the VFS and audit our locking, yes. I think I may go through and see what (else) matt left unimplemented for 80 when he added 70 support first, though.
[09:36:21] --- Simon Wilkinson has left
[09:37:01] --- rra has become available
[09:43:08] --- jaltman has left: Disconnected
[09:48:12] I made the cm work better on 70 than it did before, which was not at all. I did this -without- a perfect knowledge of fbsd vfs at either version, there is undoubtedly lots left to find and fix.
[09:49:22] I didn't even do that much, in fact, overall.
[09:54:38] the reclaim path is definitely broken in 8, and probably isn't in 7. HOWEVER, 7 does some -actually- unholy vop_lock stuff that does look in the hidden state of vnode lock objects, or at least it did.
I don't think deeply inspecting that is the fastest clue as to how to fix the vnode locking in 8, clearly we should be looking at the bundled filesystems over and over ...
[09:56:09] --- Simon Wilkinson has become available
[09:58:32] Okay. But I don't really want to stumble upon more
#if defined(AFS_FBSD80_ENV)
/* nothing yet */
#elif defined(AFS_FBSD70_ENV)
...
[09:58:46] > HOWEVER, 7 does some -actually- unholy vop_lock stuff
i am so glad apple fixed most of this with 10.4
[09:59:33] "But I don't really want...": dude. It's incomplete.
[09:59:43] I left markers to help myself and others.
[09:59:46] --- jaltman has become available
[10:00:06] i don't want a pony.
[10:00:08] i want 2.
[10:00:27] ;)
[10:00:45] I might be able to get you a cat...
[10:01:02] stringy or meaty?
[10:01:08] meaty, actually
[10:01:19] and very cuddly--but with some bad habits
[10:27:31] So, uh, with Charlie Root's commit ... does git commit --amend give me enough rope to fix that in my working copy, or do I need to play more clever games?
[10:27:32] --- jaltman has left: Disconnected
[10:27:56] git commit --amend --author=foo
[10:29:21] --amend --author (whatever)
[10:29:25] yeah
[10:57:14] Hm, looks like that messed up the changeID. Do I want to push to refs/changes/14/2214 or refs/changes/14/2214/2 ?
[10:58:07] neither
[10:58:12] refs/changes/2214
[10:58:47] Okay, let's see what happens.
[11:00:23] --- jaltman has become available
[13:01:55] if VOP_ISLOCKED() is not "I have the lock", then the code in question is just wrongwrongwrong
[13:06:00] Looks like we use lockstatus()
The lockstatus() function returns the status of the lock in relation to the current thread.
[13:07:32] OK, so that code is right, if unfortunate.
[13:09:42] yeah, but if I'm reading FXR right, it still returns nonzero if someone else holds the lock
[13:10:06] it just returns a different code, LK_EXCLOTHER vs LK_EXCLUSIVE
[13:11:17] Yuck.
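To make the point at 13:09:42 concrete, here is a minimal user-space sketch in C. It is not FreeBSD or OpenAFS code: `toy_lock`, `toy_lockstatus`, the `TOY_LK_*` values, and both `*_should_acquire` helpers are hypothetical stand-ins that only mimic the behaviour described above, namely a distinct nonzero code when some other thread holds the lock exclusively. It shows why a plain "nonzero means locked" test cannot tell "I hold it" from "someone else holds it".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes, modelled on the LK_EXCLUSIVE / LK_EXCLOTHER
 * distinction discussed above. The numeric values are arbitrary. */
enum toy_status {
    TOY_LK_UNLOCKED = 0,  /* nobody holds the lock */
    TOY_LK_EXCLUSIVE,     /* the calling thread holds it exclusively */
    TOY_LK_EXCLOTHER      /* some other thread holds it exclusively */
};

/* Toy lock: records the owning thread id, 0 meaning "unowned". */
struct toy_lock {
    int owner;
};

/* Report the lock state relative to the caller `self` (a nonzero id). */
enum toy_status toy_lockstatus(const struct toy_lock *lk, int self)
{
    if (lk->owner == 0)
        return TOY_LK_UNLOCKED;
    return lk->owner == self ? TOY_LK_EXCLUSIVE : TOY_LK_EXCLOTHER;
}

/* Naive test: any nonzero status is read as "already locked, skip". */
bool naive_should_acquire(const struct toy_lock *lk, int self)
{
    return toy_lockstatus(lk, self) == TOY_LK_UNLOCKED;
}

/* Stricter test: skip acquisition only when the caller is the holder. */
bool strict_should_acquire(const struct toy_lock *lk, int self)
{
    return toy_lockstatus(lk, self) != TOY_LK_EXCLUSIVE;
}

/* Walk through the three cases from the point of view of thread 1. */
void toy_demo(void)
{
    struct toy_lock unowned = { 0 }, mine = { 1 }, theirs = { 2 };

    assert(naive_should_acquire(&unowned, 1) && strict_should_acquire(&unowned, 1));
    assert(!naive_should_acquire(&mine, 1) && !strict_should_acquire(&mine, 1));
    /* The two tests disagree only when another thread holds the lock. */
    assert(!naive_should_acquire(&theirs, 1));
    assert(strict_should_acquire(&theirs, 1));
}
```

In the `theirs` case the naive test carries on without the lock while thread 2 is still working, which is the "not even acknowledge the other guy exists" outcome. Whether the real code path behaves this way depends on how the status value is actually tested there.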
[13:13:45] Though I still don't have a good sense of whether I can race myself with a tight shell loop in this case.
[13:17:12] Oh, if that's the case then the test is wrong. Where "wrong" means you don't so much race as not even acknowledge the other guy exists.
[13:20:15] Er, "could this cause my tokens to get discarded?"
[13:24:07] Really, I have no clue what's causing that problem. It doesn't really make much sense. Someone should teach ethereal to decrypt rxkad
[13:25:27] --- jaltman has left: Disconnected
[13:28:23] Actually, I do have a guess. My guess is that you are somehow using the wrong key on a connection, either because some pointer points at the wrong copy of a data structure or because something has trampled on the copy of the key that actually matters, which is the key schedule in struct rxkad_cprivate and not the ClearToken HandShakeKey (which is used only when creating a new connection)
[13:28:51] Or rather, the ClearToken copy is only referenced when setting up a new connection.
[13:38:45] Unfortunately, the rx call has finished by the time of my crash dump.
[13:41:44] yes, but the connection still exists.
[13:42:04] Oh?
[13:42:07] and the interesting data structure is actually per-secobj
[13:43:00] yeah; the "tokens discarded" message actually means it's setting the flag that will force new connections. The connection data structure should still exist.
[13:48:36] Pointers to which data structure(s) would be useful; I haven't looked at this stuff very much.
[13:49:25] --- tkeiser has become available
[13:53:26] oh, I have to remember how this works. a connection is struct rx_connection in src/rx/rx.h that has a struct rx_securityClass *securityObject which is the secobj, and is filled with method pointers and a void *privateData.
the connection also has a void *securityData
securityObject->privateData points at a struct rxkad_cprivate
securityData points at a struct rxkad_cconn
both of which are defined in src/rxkad/private_data.h
[13:54:03] the secobj private data contains its own copy of the ticket, and an fcrypt key schedule for the session key
[13:54:07] Ah, thanks.
[14:10:27] It doesn't look like securityData has much of interest?
[14:11:31] nope
[14:12:10] not for rxkad. that's one of its big shortcomings; it uses the same keying material on every connection that shares the same secobj
[14:13:29] Hm, and rxkad_cprivate's ticket is longer than what aconn->user->stp printed. I guess I'll have to go check the rest of it.
[14:13:38] (But the first part matches up)
[14:30:16] back from the dead, it comes
[14:32:44] --- kula has left
[14:43:27] --- tkeiser has left
[14:49:29] --- sxw has become available
[14:49:44] Okay, aconn->id->securityObject->privateData->ticket and aconn->user->stp appear to be identical within my visual resolution.
[15:26:36] --- deason has left
[16:23:19] --- tkeiser has become available
[16:32:19] --- tkeiser has left
[16:38:21] --- sxw has left
[16:48:51] --- rra has left: Disconnected
[16:51:52] --- matt has left
[16:55:16] --- sxw has become available
[17:06:01] --- Russ has become available
[17:22:16] --- jaltman has become available
[17:46:08] --- kula has become available
[17:46:40] --- kula has left
[17:47:58] --- kula has become available
[17:52:09] --- jaltman has left: Replaced by new connection
[17:52:10] --- jaltman has become available
[18:09:25] --- jaltman has left: Replaced by new connection
[18:09:25] --- jaltman has become available
[18:11:52] --- tkeiser has become available
[19:34:32] --- tkeiser has left
[20:33:47] --- Jeffrey Altman has left: Replaced by new connection
[20:33:48] --- Jeffrey Altman has become available
[23:44:13] I'm going to go ahead and finish the gerrit.openafs.org upgrade now.
That will also kick everyone off of the chatserver while the system is rebooted to pick up the new kernel.
[23:44:18] Everything should be back shortly.
[23:48:34] --- LOGGING STARTED
[23:48:35] --- JSund has become available
[23:48:43] --- jaltman has become available
[23:48:51] --- Russ has become available
[23:48:54] --- steven.jenkins has become available
[23:49:08] --- kula has become available
[23:52:11] --- phalenor has become available
[23:52:45] --- shadow@gmail.com/owl1EA1D463 has become available
[23:53:04] --- dwbotsch has become available
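For reference, the rxkad client-side pointer layout described at 13:53:26 can be sketched roughly as follows. This is only an illustration under stated assumptions: every `sketch_*` type and function is a hypothetical, heavily simplified stand-in, the buffer sizes are arbitrary placeholders, and the method pointers are left out. The real definitions are in src/rx/rx.h and src/rxkad/private_data.h and contain many more fields.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for struct rxkad_cprivate: per-secobj private data. */
struct sketch_cprivate {
    unsigned char keysched[16];  /* placeholder for the fcrypt key schedule */
    char ticket[64];             /* the secobj's own copy of the ticket */
};

/* Stand-in for struct rxkad_cconn: per-connection data. */
struct sketch_cconn {
    int placeholder;
};

/* Stand-in for struct rx_securityClass (the "secobj"). */
struct sketch_security_class {
    void *privateData;           /* points at a sketch_cprivate */
};

/* Stand-in for struct rx_connection. */
struct sketch_connection {
    struct sketch_security_class *securityObject;
    void *securityData;          /* points at a sketch_cconn */
};

/* Follow connection -> secobj -> private data to reach the ticket. */
const char *sketch_conn_ticket(const struct sketch_connection *conn)
{
    const struct sketch_cprivate *priv = conn->securityObject->privateData;
    return priv->ticket;
}

/* Nonzero when two connections share one secobj's private data,
 * and therefore the same ticket and key schedule. */
int sketch_same_keying(const struct sketch_connection *a,
                       const struct sketch_connection *b)
{
    return a->securityObject->privateData == b->securityObject->privateData;
}

/* Build two connections on one secobj and check what they share. */
void sketch_demo(void)
{
    struct sketch_cprivate priv;
    memset(&priv, 0, sizeof priv);
    strcpy(priv.ticket, "example-ticket");

    struct sketch_security_class secobj = { &priv };
    struct sketch_cconn c1 = { 0 }, c2 = { 0 };
    struct sketch_connection a = { &secobj, &c1 };
    struct sketch_connection b = { &secobj, &c2 };

    assert(sketch_same_keying(&a, &b));          /* shared key material */
    assert(a.securityData != b.securityData);    /* separate per-conn data */
    assert(strcmp(sketch_conn_ticket(&b), "example-ticket") == 0);
}
```

The pointer chain in `sketch_conn_ticket` corresponds to the securityObject->privateData->ticket comparison made at 14:49:44, and `sketch_same_keying` illustrates the shortcoming noted at 14:12:10: connections that share a secobj share its keying material.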