[00:06:45] --- abo has become available
[00:08:41] --- dev-zero@jabber.org has left
[00:22:55] --- kaj has become available
[01:04:37] --- dev-zero@jabber.org has become available
[01:21:12] --- dev-zero@jabber.org has left
[01:21:19] --- dev-zero@jabber.org has become available
[01:43:13] --- Simon Wilkinson has become available
[02:44:52] --- jaltman has left: Replaced by new connection
[02:44:52] --- jaltman has become available
[02:45:06] --- Jeffrey Altman has left: Replaced by new connection
[04:00:57] --- Simon Wilkinson has left
[04:01:02] --- Simon Wilkinson has become available
[05:53:13] --- meffie has become available
[06:09:10] --- kaj has left
[06:09:11] --- kaj has become available
[06:33:37] gerrit appears to be temporarily out of service.
[06:34:47] I'll look.
[06:40:01] --- reuteras has left
[06:44:24] Should be back. Sorry, the changes for github caused it to fall over shortly after starting up.
[06:52:37] --- dev-zero@jabber.org has left: Lost connection
[06:57:40] --- dev-zero@jabber.org has become available
[07:03:18] --- abo has left
[07:03:47] --- abo has become available
[07:17:12] --- deason has become available
[08:47:43] --- kaj has left
[09:33:45] --- Kevin Sumner has become available
[09:35:03] kevin, what client are you using?
[09:35:13] jabber client, that is
[09:35:21] Empathy on Ubuntu 9.10
[09:35:23] Problems?
[09:35:37] it's sending us status notifications when you type.
[09:35:44] i'm just curious why, and where the bug is
[09:36:00] I'll switch over to something else. Recommendations for a linux jabber client?
[09:37:34] it's of slightly more than academic interest as i've done at least a little work on both jabber clients and jabber servers. not a lot
[09:38:14] Is it showing up in the chat logs or just on your client? And which client are you using?
[09:38:34] my client. barnowl.
[09:39:13] no clue if adium shows it.
i could look at that easily enough i suppose
[09:39:47] --- abo has left
[09:40:03] --- Derrick Brashear has become available
[09:40:27] --- abo has become available
[09:41:48] no indication they'd show in adium
[09:42:09] yeah, i see nothing here. in barnowl i still see them
[09:42:19] Interesting.
[09:44:09] But you still haven't answered the question.
[09:44:29] indeed.
[09:45:41] --- Kevin Sumner has left
[09:45:59] --- Kevin Sumner has become available
[09:46:27] There we go, I've switched to Pidgin on my Ubuntu machine. Empathy's minor annoyances have been bothering me for some time.
[09:46:57] empathy
[09:46:58] ok
[09:49:27] Anyway, what I originally was typing up: This is offtopic, but I figured this is as good a place as any to ask. I'm examining using Heimdal krb tickets to authenticate to Windows AD services and I'm getting enctype mismatches. Anybody have any guidance?
[09:49:32] --- meffie has left
[09:50:03] you are… overloading one realm name? doing crossrealm authentication? something else?
[09:51:35] --- Derrick Brashear has left
[09:53:40] We aren't overloading the realm name, but we have a one way trust from Heimdal to AD. We have single sign-on through pam_krb5, pam_afs_session for AFS tickets. Most of our services are authenticated through the Heimdal KDCs, but we have a couple of AD-authenticated services that we'd like for our *nix users to be able to get at with krb5 tickets. Make sense?
[09:53:54] AFS tokens*
[09:56:18] OK, so by "getting enctype mismatches", what exactly do you mean? What error message(s) do you get, where, and at what stage?
[09:56:33] (if there's an appropriate forum for this, other than the heimdal-discuss mailing list, I don't know what it is)
[09:59:25] I'm using some CIFS shares to test with and I'm getting the following from smbclient when using our Heimdal tickets from ISIS.UNC.EDU -- AD tickets from DEPTS.UNC.EDU work just fine with smbclient.
ads_krb5_mk_req: krb5_get_credentials failed for webna1$@DEPTS.UNC.EDU (KDC has no support for encryption type)
cli_session_setup_kerberos: spnego_gen_negTokenTarg failed: KDC has no support for encryption type
session setup failed: SUCCESS - 0
[09:59:59] command I'm using for both AD and Heimdal tickets is: smbclient -k -L storage.depts.unc.edu
[10:01:15] --- mho has become available
[10:01:24] And after you do that, do you have krbtgt/DEPTS.UNC.EDU tickets?
[10:03:14] I have these two: krbtgt/ISIS.UNC.EDU@ISIS.UNC.EDU and krbtgt/DEPTS.UNC.EDU@ISIS.UNC.EDU
[10:03:29] 1) What tickets do you have? Use klist -v, which includes enctypes
2) Does your krb5.conf set default_etypes?
3) What enctypes does your cross-realm principal have? Use kadmin get krbtgt/DEPTS.UNC.EDU
[10:04:20] OK, so it is actually your request to AD that is failing.
[10:05:45] esdgecombe owes derrick an apostrophe key
[10:05:58] edgecombe, that is; I can't type
[10:15:17] I don't have a default_etypes entry in krb5.conf -- which is the same conf that our identity management team distributes. I'll add that and see how things go. I can't view our cross-realm princ as I haven't been given admin on our Kerberos realm. They only trust us AFS guys with AFS admin here. :)
[10:17:58] > esdgecombe owes derrick an apostrophe key
[10:18:05] what did he do?
[10:21:50] --- jaltman has left: Disconnected
[10:22:02] --- jaltman has become available
[10:23:37] Adding "default_etypes = des3-hmac-sha1 des-cbc-crc rc4-hmac des-cbc-md5 arcfour-hmac-md5" didn't help. I've asked our id. mgmt. to let me know what our enctypes both for the realm and for the cross-realm princ are.
[10:28:49] > I'll add that and see how things go.
Why would you do that?
[10:29:33] Did you do klist -v, like I suggested, so you can see what enctypes are in the tickets you actually have?
[11:21:02] --- jaltman has left: Disconnected
[11:21:10] --- jaltman has become available
[11:28:18] Kevin: the only enctype that should be configured for the cross realm TGT between AD and Heimdal is rc4-hmac
[11:31:04] I see now why I was a bit confused reading BreakVolumeCallBacks and BreakLaterCallBacks.... have we always sent volume callback breaks to every host in hostList?
[11:34:44] we're not supposed to?
[11:37:51] I don't know if that's a question or a surprised/confused statement; sending them to everyone seems silly if we know what hosts actually do and don't have a callback for the volume
[11:38:16] it's done in the background, sure, but it seems like that could really slow down FsyncCheckLWP
[11:38:47] i was unaware we sent them to anyone other than we know wants them
[11:39:26] looking at the code makes it very much look like that's exactly what it does; I tried a trivial test of master with tcpdump that seems to agree with it
[11:40:02] we just call h_Enumerate(MultiBreakVolumeLaterCallBack, foo)
[11:40:33] which effectively does MultiBreakCallBack_r in batches of MAX_CB_HOSTS
[11:41:32] which is why I was so confused trying to add the HOSTDELETED check to BreakLaterCallBacks and BreakVolumeCallBacks; they hold the host ptr but never seem to do anything with it
[11:42:29] --- abo has left
[11:42:39] (the test was just: have 2 clients, acquire a callback on some RO, release, see the breaks. release again, see no breaks. acquire a callback on _one_ of the clients and release again, see the breaks on both clients)
[11:43:24] --- abo has become available
[11:43:34] i need to finish looking at what i am, then i'll look
[11:44:41] it's not particularly important to anything at the moment; I just thought I was missing something big, but I haven't found it yet and I'm closer now to thinking it's just weird/broken
[11:47:12] of note, if true it's been true since afs 3.6 or earlier.
and it may well be true
[11:57:55] --- meffie has become available
[12:08:36] it looks like breaking volume callbacks on all undeleted hosts dates to at least 3.1
[12:09:15] the h_Enum call isn't wrong. they just shouldn't be chained into the break list if they aren't eligible
[12:11:07] the h_Enum call isn't necessarily wrong, but I'm not sure how you do it efficiently that way; to avoid hosts that don't need the break, you'd need to traverse the callbacks list to see if it has one or something
[12:11:27] I thought it would make more sense to keep an array of hosts as we go through
[12:11:54] quite possibly. istr the other half of later callbacks already does that, for instance
[12:12:45] but it's been a while
[12:16:58] It never rains but it pours. The report on openafs-info@openafs.org from Jack Neely looks now
[12:17:30] new?
[12:17:39] new, sorry. yes.
[12:17:48] As in "A backtrace that I haven't seen before"
[12:18:08] Actually as in "A backtrace that looks completely mangled"
[12:18:30] remove_wait_queue from BackgroundDaemon?
[12:18:49] That's the one.
[12:24:18] ... asking if that backtrace is from our kernel modules. If so, I can at least work out what afs_GetDCache+0x1c0a is (other than a clue that that function is way, way too long)
[12:41:34] > h_Enumerate(MultiBreakVolumeLaterCallBack, foo)
... which calls MultiBreakVolumeCallBack_r, which does nothing if !isheld, which is set to 0 by h_Enumerate for hosts which were not already held by this thread on entry to h_Enumerate. Conveniently, due to the calling conditions for BreakVolumeCallBacks et al, the only hosts held by this thread on entry to h_Enumerate are those which were identified as needing a callback break by the loop before the call to h_Enumerate.
[12:42:55] hm. good point.
[12:43:50] which sounds like it may be completely broken by the move away from the hold bitmap
[12:44:04] Yay for side-effects!
[12:44:56] ("completely broken" is a bit harsh...
"completely broken" == "we send callbacks to all hosts")
[12:46:19] well, there's a reason this code wasn't 1.4.x ready
[12:46:30] so yay. ok. well, now we need to fix it
[12:47:17] deason: Even if you intend on fixing this rapidly, could you open an RT ticket so it doesn't get lost?
[12:48:25] sure; I don't intend on fixing it rapidly, since there are other things I intend to fix sooner
[12:51:10] > move away from the hold bitmap
It doesn't matter how you represent holds, as long as it is possible for a thread to tell whether it already has a host held. Several things depend on that, and while some of them can be converted to just use recursive holds, some users of h_Enumerate depend on this behavior.
[12:54:40] --- dev-zero@jabber.org has left
[12:54:54] > as long as it is possible for
> a thread to tell whether it already has a host held
you can't, as far as I know
[12:54:58] what else depends on it?
[12:56:41] (GetSomeSpace is one guess; I'm already looking into fixing that a bit for other reasons)
[12:56:56] I don't know, offhand. I'd have to go look. But instead, I'll place that burden on anyone proposing to remove the interface.
[12:57:12] the host hold interface? it was already removed; it was removed months ago
[12:57:23] good thing you comment on things in gerrit. anyway, it will be fixed before this is stable. so you can just refrain from whining
[12:57:26] or rather, changed to use reference counts instead
[12:57:37] (tho you probably won't)
[12:58:19] If you removed it, how did you not find things calling the h_Held interface?
[12:58:34] I'd comment on things in gerrit, if I had any time.
[12:59:00] you can use all the time you spend whining on it. we'll be ok.
[12:59:20] everything used h_Held already, to determine if it needed to release afterwards or not; I would guess some things may have been confused whether or not h_Held was used for that purpose, or for functional reasons
[13:00:03] h_Held was used much like ISGLOCK is in several places.
[13:00:07] (not to mention, some uses of h_Held were getting to become hard to follow)
[13:00:32] agreed.
[13:01:04] > everything used h_Held already
Yeah. And you changed h_Held to do what? Always return false? Always return true? Both are obviously wrong, since one causes things not to be held that should be, and the other causes things not to be released that should be. So you needed to find all of those call sites to find out what they were using it for and fix them to do something else instead.
[13:01:38] i could spend time reading code for you, or do something useful. i'll do something useful.
[13:01:43] In the case of h_Enumerate, since you couldn't propagate the isheld state to the enumeration callback, you needed to find all the enumeration callbacks, figure out what they did with that bit, and make them do something else instead.
[13:01:45] the references to h_Held were removed, and the callers adjusted; don't ask me if it was done correctly, as I didn't do it
[13:02:09] it obviously needs to be re-reviewed. got it. if you have more data, share. if you don't, i'll go back to what i was doing
[13:02:14] > some uses of h_Held were getting to become hard to follow
Yeah, I can see that. Some of them are complicated.
[13:02:42] Sorry; when I said "you", I didn't necessarily mean you personally. But someone, as part of preparing a patch.
[13:03:46] In any case, someone needs to go back and find all the h_Enumerate users and see what they were actually using that for. Sadly, I don't have a good suggestion for how to fix the volume callback thing. Since you no longer have a per-thread bit on every host, you will need to invent some other method of collecting up all the affected hosts.
[13:04:32] I suppose you could use thread-local storage, but, what a waste.
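The isheld filtering described at 12:41:34 can be modelled in a few lines of C. This is a minimal sketch under stated assumptions, not the OpenAFS code: struct enum_host, toy_enumerate, toy_break_if_held and the pass_hold_state switch are all hypothetical names invented for illustration. It only shows why an enumeration callback that relies on isheld ends up breaking callbacks on every host once the enumerator can no longer report the caller's real hold state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host entry; not the real fileserver struct. */
struct enum_host {
    int held_by_caller;   /* models "this thread held the host before enumerating" */
    int breaks_sent;      /* counts simulated callback breaks */
};

typedef void (*enum_fn)(struct enum_host *h, int isheld, void *arg);

/* Walk every host. When pass_hold_state is nonzero the callback learns whether
 * the caller already held the host; when zero, every host is reported as held,
 * which models losing that per-thread information (an assumption about how the
 * filter could be lost, not a description of the actual change). */
void toy_enumerate(struct enum_host *hosts, size_t n, enum_fn fn, void *arg,
                   int pass_hold_state)
{
    assert(fn != NULL);
    for (size_t i = 0; i < n; i++) {
        int isheld = pass_hold_state ? hosts[i].held_by_caller : 1;
        fn(&hosts[i], isheld, arg);
    }
}

/* Enumeration callback: skip hosts the caller had not marked by holding them. */
void toy_break_if_held(struct enum_host *h, int isheld, void *arg)
{
    int *total = arg;
    if (!isheld)
        return;
    h->breaks_sent++;
    (*total)++;
}
```

With pass_hold_state set, only the hosts marked beforehand are touched; with it cleared, the same callback reaches every host in the table.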
[13:04:43] I don't think that's hard; you collect an array of them and send the callbacks in batches MAX_CB_HOSTS large
[13:05:04] since that's effectively what we do anyway
[13:05:09] a reference held by count, an array of pointers, passed into h_Enum* as a param, the question is only how you make the array not suck
[13:05:13] Oh, but it is hard, because you're trying not to hold the large locks that prevent things in your array from disappearing.
[13:05:37] you don't hold locks. they're referenced. they better not disappear or you have a problem
[13:05:47] what's the point of a reference count if you ignore it?
[13:05:52] > how you make the array not suck
usually by doing one pass to count, allocating the array, and then doing another pass. But again, large locks.
[13:07:24] yes, I was trying to suggest having an array just constantly of size N, look through until you find N hosts, remember where you were, break callbacks, and continue on; details obviously make that take a bit more thought, but that's just in general...
[13:07:44] "remember where you were" is the part that requires holding large locks.
[13:07:56] i.e. locks that prevent changing the set of what hosts there are.
[13:08:41] Anyway, you'll have to read the code. The current mechanism effectively does that, by taking advantage of the h_Enumerate interface.
[13:09:08] yeah, I know; it's easier to debate these things if you're actually looking at a patch
[13:09:40] what we're actually iterating over is FEs (or the CBs in each FE); I was hoping for a way to keep them around outside of H_LOCK, but perhaps that doesn't exist right now
[13:09:41] we'll figure it out.
[13:10:21] The problem is that you want to first enumerate all the callbacks on the volume and somehow record which hosts are affected. The current code does that by marking hosts in a way that allows it to mark _all_ the hosts quickly, holding the H_LOCK, then release it before breaking the callbacks.
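The "array of referenced pointers, broken in batches" idea from this exchange can be sketched as three phases. This is a hedged illustration only: struct batch_host, toy_h_lock, TOY_MAX_CB_HOSTS and the toy_* functions are hypothetical stand-ins, the host table is a fixed array so the upper bound on the list size is known before the lock is taken, and the "break" is just a counter increment rather than an RPC. It does not address the objection raised above about walking a host set that can change between passes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_CB_HOSTS 10   /* batch size; stands in for MAX_CB_HOSTS */

/* Hypothetical host entry; not the real fileserver struct. */
struct batch_host {
    int refcount;      /* references keep the entry alive outside the lock */
    int wants_break;   /* models "has a callback on the volume" */
    int breaks_sent;   /* counts simulated callback breaks */
};

pthread_mutex_t toy_h_lock = PTHREAD_MUTEX_INITIALIZER;  /* models H_LOCK */

/* Phase 1: under the lock, reference every affected host and record it.
 * Returns a malloc'd array (NULL on allocation failure); *count gets its length. */
struct batch_host **toy_collect(struct batch_host *hosts, size_t n, size_t *count)
{
    struct batch_host **list = malloc((n ? n : 1) * sizeof(*list));
    *count = 0;
    if (list == NULL)
        return NULL;
    pthread_mutex_lock(&toy_h_lock);
    for (size_t i = 0; i < n; i++) {
        if (hosts[i].wants_break) {
            hosts[i].refcount++;
            list[(*count)++] = &hosts[i];
        }
    }
    pthread_mutex_unlock(&toy_h_lock);
    return list;
}

/* Phase 2: with the lock dropped, break callbacks one batch at a time.
 * Returns the number of batches processed. */
size_t toy_break_batches(struct batch_host **list, size_t count)
{
    size_t batches = 0;
    for (size_t start = 0; start < count; start += TOY_MAX_CB_HOSTS) {
        size_t end = start + TOY_MAX_CB_HOSTS;
        if (end > count)
            end = count;
        for (size_t i = start; i < end; i++)
            list[i]->breaks_sent++;   /* stands in for one batched send */
        batches++;
    }
    return batches;
}

/* Phase 3: drop the references taken in phase 1 and free the array. */
void toy_release(struct batch_host **list, size_t count)
{
    pthread_mutex_lock(&toy_h_lock);
    for (size_t i = 0; i < count; i++) {
        assert(list[i]->refcount > 0);
        list[i]->refcount--;
    }
    pthread_mutex_unlock(&toy_h_lock);
    free(list);
}
```

The lock is held only while marking and while releasing; the references taken in phase 1 are what keep the recorded entries valid during phase 2, which is the point made at 13:05:37.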
[13:10:37] perhaps it could still be done in chunks of N... just repeat enumerating over the FEs from the start each time
[13:11:05] If you do that, you are not guaranteed to ever finish.
[13:11:13] that is true
[13:11:39] I like algorithms. Our code should use algorithms.
[13:12:25] --- abo has left
[13:12:48] --- abo has become available
[13:14:14] actually, if we can assume just one thread is processing vol callbacks at once, we could just set a flag in the host struct
[13:14:26] ...but I should stop trying to think of ways to do this when I know I'm not doing this today
[13:17:28] I don't think we can make that assumption.
[13:17:44] I'm not 100% sure, though.
[13:18:41] well, we can make it assumable if we need to :) mutual exclusion is possible
[13:19:00] If we could, then yes, we could do that. Or if we could assume that multiple threads trying is unlikely enough that we can safely serialize them, we could invent a lock for the operation. Either of those assumptions would let us do something very similar to what the current code does, but without relying on h_Held working.
[13:19:51] multiple threads trying for the same one, or for any one?
[13:19:57] For any one.
[13:20:01] I would be concerned about ensuring that we do not break callbacks for a volume on a host more than once per triggering event
[13:20:28] more than one breakvolumecallbacks at once? probably yes
[13:20:52] more than one breakvolumecallbacks per event, seems easy to avoid. what happens if you have 2 events on the same volume in rapid succession?
[13:21:05] Traditionally, I don't think we can have more than one at once, because the only thing that does that is fssync, and the fssync server was one thread. That may not be true in the long term.
[13:21:23] they should all be processed by FsyncCheckLWP; I don't see how we'd have more than one at once
[13:21:44] The same thing that happens today -- two breaks.
Note that "rapid succession" isn't that rapid; it takes some time to do even a no-op volume release
[13:22:26] > the only thing that does that is fssync
fssync now passes them to another thread so it can still service other fssync requests; but it passes them to one specific thread (not e.g. spawning a thread to take care of it) so they should be serialized
[13:22:47] Er, yeah; that's been true for a while
[13:22:48] --- abo has left
[13:23:07] ah, okay
[13:23:21] if you assume there is one thread you don't even need a lock. make a flag. that thread owns it
[13:23:22] --- abo has become available
[13:23:29] hold a reference for the flag
[13:23:50] hold an *extra* reference for the flag that is
[13:24:11] no thread that is not servicing those callbacks gets to touch it.
[13:24:25] i guess i had code to parallelize this that i dropped. it probably is no longer applicable
[13:25:11] exactly nothing calls BreakVolumeCallBacks (as opposed to BreakVolumeCallBacksLater), perhaps we should kill some code
[13:25:37] you could still do the same basic idea for a small number of threads; just have a small array/queue of thread identifiers that are doing the callback breaks
[13:25:57] yes, that was confusing, too
[14:42:25] Bah. I think we may have a memory corruption problem on 1.4.11
[16:14:01] --- dev-zero@jabber.org has become available
[16:17:33] Today's favourite git incantation: git rebase -f --whitespace=fix origin/master
[16:17:54] (which will fix all of the whitespace errors in your tree between your HEAD, and the point you diverged with master)
[16:18:47] --- deason has left
[16:22:32] ♥
[16:23:07] --- mdionne has become available
[16:49:52] --- dev-zero@jabber.org has left
[17:01:42] --- meffie has left
[19:30:08] --- mdionne has left
[20:13:25] --- Rrrrrred has become available
[20:19:34] --- Rrrrrred has left
[20:40:01] --- deason has become available
[21:50:01] --- deason has left
[22:17:36] --- reuteras has become available