[00:22:15] --- Simon Wilkinson has become available
[00:58:18] --- Simon Wilkinson has left
[03:08:04] --- simonxwilkinson has become available
[03:19:17] --- simonxwilkinson has left
[04:15:33] --- Simon Wilkinson has become available
[05:04:30] --- Simon Wilkinson has left
[05:45:00] --- paul.smeddle has become available
[06:01:08] --- Simon Wilkinson has become available
[06:04:46] --- Simon Wilkinson has left
[06:44:51] --- meffie has become available
[06:55:22] hey Mike
[06:55:41] morning
[06:55:52] or afternoon, as the case may be
[06:56:09] thankee
[06:56:49] Good evening ;-)
[06:57:10] (sunset here)
[06:59:36] yeah, -2 and frosty in Oxfordshire. Not quite sunset ...
[07:00:19] Got up early to make it in to work before we started, here in Cambridge.
[07:00:40] not quite sunrise here
[07:01:45] Thanks a lot everyone for testing the current "pre1" tarballs!
[07:02:51] Stephen Quinney started building as well. He reports a problem with the Red Hat spec.
[07:02:51] --- Simon Wilkinson has become available
[07:03:22] Hi Simon, just right ;-)
[07:03:36] Hi
[07:04:39] In the RedHat/openafs.spec.in, there seems to be an unconditional build dependency that can't be satisfied on EL5 (perl-devel).
[07:04:51] There was also that email about poll() in fs-sync
[07:05:03] Hmmm. Stephen reports that things build fine on CentOS 5
[07:05:49] I guess he fixed it manually. Looks like this was done before.
[07:06:03] I don't think so. He's building the standard RPMS, I think.
[07:06:21] Probably best to ask him - I haven't tried building on EL5 in a while.
[07:06:27] I see this dependency in the openafs.spec.in in the 1.6.0 and 1.6.1 tarballs.
[07:06:41] Or I'm getting something wrong.
[07:07:16] --- deason has become available
[07:08:15] I feel this shouldn't block pre1 as is. We can look at it for pre2. Ok?
[07:08:32] you mean the poll()? I agree.
[07:08:52] Hmm, I was still talking about the Red Hat spec.
[07:09:21] Ok.
[07:09:30] Not so sure about the poll() issue.
[07:09:54] Perhaps the poll() issue should be resummarized so we're all talking about the same thing?
[07:10:31] I'll quote Christof:
[07:10:41] In 1.6.2 the poll() is activated. In fssync-server.c the poll is given the timeout of "-1". This has led to high CPU times on a test machine. Giving it a timeout of 5 secs or so brings the CPU time down again.
[07:11:42] My problem is that this doesn't make any sense.
[07:11:53] A timeout of −1 is infinite.
[07:12:26] Has anyone else seen the problem he reports?
[07:12:30] The poll stuff is a bit of a scary change - I said so when it went in. It does completely change the way that we handle fsync connections.
[07:12:45] Isn't poll() spinning in a tight loop by definition?
[07:13:08] Yeah, it's in a loop. In theory a timeout of 5 secs, rather than −1, should actually increase the load.
[07:13:47] Unless it actually times out and you check the load after that.
[07:14:31] I haven't seen that issue... I've got it running right now
[07:14:40] does he say what platform?
[07:14:48] I would imagine SuSe, but no.
[07:15:30] oh, I have that email
[07:15:35] well, in the first one he says suse, yeah
[07:16:18] They're a suse shop. That's what he'd test first.
[07:16:35] The only thing I can think of is that there's a problem with someone's poll implementation where −1 means 0, which means poll returns immediately
[07:16:58] But I would have thought that would be noticed, and fixed, pretty sharpish.
[07:17:20] Right.
[07:18:46] Andrew, what platform is your server? (the one without the issue)
[07:19:22] solaris and linux; I'm compiling on a suse12 box now
[07:20:10] I guess you're running DAFS? Can that make a difference?
[07:23:27] the code paths are the same; I had one on sol10 that was non-dafs
[07:23:55] or rather, the poll() call is the same; if something else is actually causing that issue, the other stuff interacting with fssync can be different...
[07:26:48] What was the reason for the poll change? Is it just the problem with fd limits > 1024?
[07:26:56] yes
[07:28:02] I wrote it ages ago, but somehow the configure test that enabled it got mangled.
[07:28:11] So it's only actually been turned on very recently
[07:28:36] Can it be turned off at configure time?
[07:28:52] Not easily, no. It's a feature test.
[07:29:15] turning it off means you likely get memory corruption instead
[07:29:21] But we've only had one report so far.
[07:29:24] and if it is off in the wrong place, it will cause crashes
[07:29:24] And if we wanted to disable it for 1.6.2, we'd need to replace it with something that limits you to 1024 fds.
[07:29:38] Anyone running a busy server on RHEL6 will get memory corruption.
[07:30:30] there have been multiple reports of crashes do to the fd issue
[07:30:50] due*
[07:32:20] The workaround is to simply run ulimit in the init script?
[07:32:45] i'd be curious to see if his poll() is actually raising but the appropriate fd is never being serviced
[07:33:15] Changing the timeout value shouldn't alter that behaviour
[07:33:21] yeah, I've got a suse12 server now, and I'm not seeing this
[07:34:19] Let's leave it in for the time being. If it does keep a core busy on a current fileserver, no catastrophe.
[07:35:59] I agree.. stable is better
[07:37:36] Objections?
[07:37:41] no
[07:38:01] that is, i agree it should be in.
[07:39:13] Anything else that would keep us from tagging pre1?
[07:40:25] We already said we weren't waiting for all the Windows stuff? (Sorry to have a parallel email thread going at the same time as this chat...)
[07:40:26] I have no objections to tagging
[07:40:55] tag at 58c2a08b73e832330b9ff606bbcf3a30b454454f?
[07:41:29] Yes, please.
[07:42:33] Awesome.
[07:43:08] done
[07:44:07] Thanks.
[07:44:51] Still, any news/input on 8464?
[07:45:54] has anyone tried 8464?
[07:47:12] deason: have you spoken to Markus Koeberl about 8464?
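The poll() and fd-limit points from earlier in the meeting (a timeout of -1 waits indefinitely, 0 returns at once, and select() cannot describe descriptors at or above FD_SETSIZE) can be illustrated with a small C sketch. It is not the fssync-server.c code: the helpers wait_readable, measure_infinite_wait, fd_fits_select and cap_nofile_to_fd_setsize are hypothetical, and the rlimit helper only approximates the "ulimit in the init script" workaround that was asked about, not confirmed, in the chat.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <sys/resource.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Wait until fd is readable. timeout_ms < 0 blocks indefinitely,
 * 0 returns immediately, > 0 waits at most that many milliseconds.
 * Returns what poll() returns: 1 if ready, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    return poll(&pfd, 1, timeout_ms);
}

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Measure how long poll(..., -1) waits for a byte that a child process
 * writes after delay_ms. Returns -1 if poll() did not report data. */
static long measure_infinite_wait(long delay_ms)
{
    assert(delay_ms >= 0);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: sleep, then write */
        struct timespec d = { .tv_sec = delay_ms / 1000,
                              .tv_nsec = (delay_ms % 1000) * 1000000L };
        nanosleep(&d, NULL);
        ssize_t n = write(fds[1], "x", 1);
        _exit(n == 1 ? 0 : 1);
    }
    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = wait_readable(fds[0], -1);   /* parent: block until data */
    long waited = elapsed_ms(&start);
    waitpid(pid, NULL, 0);
    close(fds[0]);
    close(fds[1]);
    return rc == 1 ? waited : -1;
}

/* select()'s fd_set only has room for descriptors below FD_SETSIZE
 * (1024 on common Linux builds); FD_SET() on a larger fd writes past
 * the end of the set, i.e. the memory corruption poll() avoids. */
static int fd_fits_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Rough in-process equivalent of "ulimit -n" in an init script: lower
 * the soft RLIMIT_NOFILE so no descriptor can reach FD_SETSIZE. */
static int cap_nofile_to_fd_setsize(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)FD_SETSIZE) {
        rl.rlim_cur = (rlim_t)FD_SETSIZE;
        return setrlimit(RLIMIT_NOFILE, &rl);
    }
    return 0;
}
```

With a correct poll(), measure_infinite_wait(200) reports a wait of roughly 200 ms. An implementation that mistakenly treated -1 as 0 would return 0 at once, so the function would report -1, and a server loop built on such a call would spin, which is the kind of behaviour suspected in the chat. Capping the descriptor limit trades the corruption risk for a hard ceiling of FD_SETSIZE open files, which is why it was only floated as a workaround.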
[07:50:28] to answer what I think the question should be, no, I haven't reviewed or verified what you have said
[07:52:54] I'm confused about this change. It rips out server side idle dead processing?
[07:55:19] I thought the problem with that is in the client side?
[07:55:42] well, the problem was definitely in the server side, no matter which change you're talking about
[07:56:14] (to be clear, the original submission was from me, the current one is from jeff and something I haven't looked at)
[07:56:27] from me actually
[07:57:33] going back to the IBM AFS days. The server had idle dead detection for incoming calls which never finished sending all of the data. It did not terminate a call by sending VNOSERVICE aborts when the server stopped sending data for a period of time after the call changed directions.
[07:59:05] I have a feeling that I personally might love to see 8464 go in. But not enough insight to be sure.
[08:00:47] current behavior is also not new, going back to 1.4.7 or 1.4.12 or whenever it was
[08:00:50] Due to problems that a large site was seeing with clients getting stuck in calls on a file server that had disk i/o that never completed, Derrick proposed and implemented the VNOSERVICE response. At the time no one considered the possibility of simply disabling keep alive packets across disk i/o and pts calls. Now that we have the latter, there is no benefit to the former. 8464 removes the VNOSERVICE mechanism and restores the server behavior to the way it was
[08:01:44] I would like to see 8464 go into pre2 but not until after it has been tested.
[08:02:53] Andrew, seems it went in between 1.4.7 and 1.4.8. Something went into 1.4.8 that shouldn't have, and it *may* be what 8464 now would back out.
[08:03:11] --- Derrick Brashear has left
[08:04:10] c26dc0e6aaefedc55ed5c35a5744b5c01ba39ea1 was committed 9 May 2008.
[08:04:46] and was pulled onto 1.4.x on 20 June 2008
[08:05:36] and it was shipped in 1.4.8
[08:05:49] --- Simon Wilkinson has left
[08:07:23] We're not going to resolve it today, but it would be nice if it could be looked at by the experts.
[08:07:54] I've looked at the patch. I believe it is correct. It would be nice if someone would put it on a server
[08:09:16] yes, that's what I'll be doing
[08:09:32] if someone gives me a target deadline, I can get it done by then
[08:09:41] next wed
[08:10:13] shall we target pre2 for the following friday?
[08:11:19] Ambitious.
[08:11:35] I would rather like to hear from stephan/paul on whether this needs to hit this release
[08:11:39] --- simonxwilkinson has become available
[08:11:43] (or defer the conversation)
[08:12:31] It rips out bits of idle dead.
[08:13:03] Does some site have an urgent need for this patch? (Sorry if this has been mentioned)
[08:13:45] nobody I've talked to
[08:14:02] well not specifically; people have complained about idledead stuff since it went in
[08:14:17] I think it only hits if your disks are crawling
[08:14:36] to test this, you need to somehow simulate bad disks, right?
[08:15:04] For this particular case, yes
[08:16:14] My feeling is it can wait til 1.6.3, but shouldn't be forgotten.
[08:16:25] Doesn't sound super-urgent. Might be nice for 1.6.2 if tested.
[08:16:27] yes, agree
[08:17:33] well, if it's not in a pre-release, your testing will be even more limited.
[08:18:02] Well, if we can get it into pre2 for testing so be it. If we can't we should wait for 1.6.3pre1
[08:18:14] +1
[08:18:23] okay, but 'desirable' for 1.6.2?
[08:18:34] of course. the question is "who on this list besides Andrew is going to test it?"
[08:19:10] is there a site that has "issues" with i/o we could ask?
[08:19:24] I'll try once I have 1.6.2pre servers. But as is, it doesn't even apply.
[08:19:24] I'm happy to do so but given that I'm on the road for the next week and have all of the windows testing to do it's going to be hard for me to look at this
[08:19:29] you don't need a real reproduction case, mike
[08:19:47] I can try and test this too
[08:19:56] it's a sleep() happening at the right point; that is nearly identical to where the initial issue was seen
[08:20:03] yes, i understand, i was just wondering if there is someone to ask.
[08:20:26] I know I have seen kernel patches to allow the introduction of failures into disk i/o on-demand, but I don't seem to be able to find them right now...
[08:20:39] My fear isn't whether it fixes the issue, but what else it might break
[08:20:44] Andrew, 'desirable' really depends on your input.
[08:21:01] simon++
[08:21:25] yes, what was said is what's needed to test the reported issue; you need to test other aspects of idle dead to have any kind of confidence in this
[08:21:50] it's pretty obvious that it'll fix that issue if the patch does what I think it does, since the problematic functionality is simply not there
[08:23:23] it's clear that we aren't going to make further progress on 8464 today. if next Friday (the last work day before the holiday break) is too ambitious for pre2, do we have a proposal for when pre2 should be?
[08:24:05] Stephan Wiesand: I can form an opinion by next week if you want; or whenever
[08:24:26] I would've thought that question would be answered after pre1 was released, since it depends on what happens with it
[08:24:26] We can still target it, but what's the point unless we had sufficient feedback on pre1 before?
[08:24:46] so the windows code gets tested?
[08:25:12] Right...
[08:25:39] pre2 may even be Windows-only then.
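For testing 8464, the chat notes that no real bad disk is needed: a sleep() at the right point imitates disk I/O that stalls. A minimal C sketch of that kind of fault injection, assuming you control a read path you can wrap: slow_pread() and the TEST_IO_DELAY_MS environment variable are hypothetical test scaffolding, not OpenAFS code or options, and where the delay belongs in the fileserver is not stated in the meeting.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical test knob (not an OpenAFS option): milliseconds to stall
 * before each read, taken from the TEST_IO_DELAY_MS environment variable.
 * Unset, empty, non-numeric or negative values mean no delay. */
static long injected_delay_ms(void)
{
    const char *s = getenv("TEST_IO_DELAY_MS");
    if (s == NULL)
        return 0;
    long ms = strtol(s, NULL, 10);
    return ms > 0 ? ms : 0;
}

/* pread() that first sleeps for the injected delay, imitating a disk
 * whose I/O takes far longer than normal before it completes. */
static ssize_t slow_pread(int fd, void *buf, size_t len, off_t off)
{
    long ms = injected_delay_ms();
    if (ms > 0) {
        struct timespec req = { .tv_sec = ms / 1000,
                                .tv_nsec = (ms % 1000) * 1000000L };
        struct timespec rem;
        /* Keep sleeping for the remainder if a signal interrupts us. */
        while (nanosleep(&req, &rem) != 0 && errno == EINTR)
            req = rem;
    }
    return pread(fd, buf, len, off);
}
```

As raised in the chat, a delay like this only reproduces the reported hang; confidence in the change also needs the other idle-dead paths exercised, which this sketch does not cover.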
[08:25:46] if you tell me no pre2 before Jan 4 then I will plan my next week schedule very differently
[08:26:45] if pre2 were to be windows-only, then it seems like we could accelerate that; just do it as soon as the patches get in
[08:27:20] It wasn't quite clear to me that more Windows patches are required.
[08:27:25] and then pre3 with fixes for any new problems found, and maybe 8684 and whatever else
[08:27:43] oh, sorry, if windows stuff is already in then never mind
[08:27:54] if I have to alter the auth package to add krb524 support to openafs to prevent breakage with mit kfw 4.0 there should be a pre2 for all platforms
[08:29:13] so did you mean to write conv_creds instead of conv_principal in that last email to ben?
[08:29:33] I copied what ben wrote
[08:30:15] as my final e-mail said, whatever functionality was removed will need to be imported into openafs
[08:30:29] I am basically just confused how krb4 is still required, since MIT krb5 on unix dropped krb4 many years ago, and openafs still works on unix just fine.
[08:31:20] ben wrote that conv_creds doesn't work, and conv_principal does; you responded by saying that if conv_principal is broken, then we need to change things
[08:31:28] it's just a little confusing; if it was a simple typo then okay
[08:31:41] if conv_creds is missing, then it needs to be replaced
[08:31:46] ok
[08:32:05] It's not missing, it is a stub that returns failure.
[08:32:28] the functionality is missing, I believe is the point
[08:33:03] How is windows different from unix in this regard?
[08:33:29] there is still a very large percentage of the openafs community that uses kaserver and uses krb524 to convert from krb5 to krb4. Some of the sites that use krb524 do so because it performs name translation. Others because they still have IBM 3.6 AFS servers or pre OpenAFS 1.2.11 servers in their environment because they rely upon certain behavior of those servers.
[08:36:19] I'm sorry if I'm being obtuse, but I don't see how a recent Ubuntu machine would function as an openafs client in such an environment, either. Are these large corporate environments where all software is pinned to old versions?
[08:37:30] They build their own...
[08:38:43] Paul. How would you feel if WinAFS in your company prevented Security from being able to roll out KFW 4.0 (your firm is a Kerberos consortium member) because portions of the Windows environment still rely on krb524 for name translation to access AFS?
[08:40:14] Ah, um.
[08:41:02] I will point out that this is not a hypothetical question. The XP/2003 plant still relies on the SMB releases (1.6.x) and KFW is used for authentication from all of the systems
[08:41:08] Wait, I must admit to being rather ignorant of the windows client setup. But that's not ideal.
[08:41:28] You don't actually need to answer the question. I'm going to answer that it is unacceptable
[08:42:38] It would certainly result in howls of anger and derision from the Windows engineering people and their management.
[08:42:45] which, yes, is unacceptable.
[08:42:53] that's not answering ben's question; what I assume based on that text is that there are some aspects of windows-only environments that depend on 524 conversion, but the unix clients don't need it
[08:43:30] So, what I am hearing is that we need compatibility layers to allow client machines running modern software to cope with ancient server environments that cannot be upgraded, is that correct?
[08:43:32] or rather, the windows-only portion of an environment
[08:44:05] The server environment is fine, it's the client behaviour which is impacted.
[08:44:46] On Unix, this compatibility layer is provided by in-house builds of the kerberos libraries, and we are currently discussing where the compatibility layer should be for windows clients.
[08:45:12] it's not windows specific. this problem impacts unix clients as well. however, for windows sites do not have an easy option of build their own or control what KFW is installed on client systems since the client systems are not managed
[08:46:36] Jeff: I do accept tickets from more than one realm on the servers though. It was my understanding that that change removed the need for 5to4, but I defer to your greater knowledge of the Windows client setup.
[08:46:39] yeah, I didn't mean it's windows-specific in terms of a platform-specific issue; but it so happens that the environment that needs it happens to be windows; the unix stuff either doesn't need it, or it's easier to change the software installed
[08:46:42] Sorry, Jeffrey, I don't think I'm correctly parsing "however, for windows sites do not have an easy option of build their own or control what KFW is installed on client systems since the client systems are not managed"
[08:47:23] sorry, tokens I should say
[08:47:44] at MIT, how many of the systems that access the campus network are managed?
[08:48:03] Is it already clear that kauth has to be touched?
[08:48:11] managed == systems where installed software is controlled by central IT
[08:48:51] I dunno, maybe a thousand managed machines at MIT? (I'm not in ops, obviously.)
[08:49:03] and how many which are not?
[08:49:15] 10,000? 20,000?
[08:49:32] O(10k), sure.
[08:50:04] and how many of those users if they see "shiny new KFW" will download it?
[08:50:31] An interesting question. IS&T is going to try and push it, I think.
[08:50:49] now replace MIT with a site such as UCSC which still has an IBM AFS file server for homework submissions
[08:51:28] IBM AFS servers can't use krb5-based tokens. They have to use krb524 for token acquisition
[08:51:55] How many help desk requests will be triggered as a result of students being unable to submit homework?
[08:51:56] And you get user outrage when it breaks, yes. I would love it if the result of such outrage was that the IBM server was replaced with something modern, but that's unlikely to be what happens; I see your point.
[08:52:27] Thank you for humoring me with the more detailed explanation.
[08:52:38] The IBM AFS server is used specifically because OpenAFS 1.1 fixed a security hole that the homework submission process relies upon
[08:53:47] but that is just one example. OpenAFS has maintained backward compatibility across releases. We could blame this on MIT but I'm going to take the position that we should fix the user experience in OpenAFS so end users do not notice it
[08:53:56] So we are back to the question of where this compatibility layer should live.
[08:54:36] for OpenAFS we have a krb4 implementation and we already import krb5 ticket processing. Adding the missing functionality should not be hard
[08:55:52] And, now that I understand the situation more clearly, I agree that OpenAFS is the least-bad place to put it.
[08:55:59] way back when, mit krb5 1.1?, even mit didn't have krb524 conversion support. it was an external library with two functions in it. I propose that library be added to the openafs code base
[08:56:53] so, targeting pre2 for that, if I remember correctly?
[08:56:57] --- simonxwilkinson has left
[08:57:04] my time availability for today ran out 30 minutes ago. I really must leave
[08:57:07] No objection in principle (I haven't looked at that code).
[08:57:24] Okay, we'll take anything else to email, then.
[08:57:29] Thanks for staying this long.
[08:57:54] btw, feel free to implement it and send a patch. I'm not going to look at it until at least next Tuesday
[08:58:23] I have some other pressing issues this week, but will keep that in mind.
[08:59:47] Okay, so we're looking at whichever pre we're at in early 2013, realistically.
[09:00:02] (Thanks for your input Jeff)
[09:04:24] So, how to continue? Keep testing pre1 by the release team, and wait for pre2 before going public?
[09:05:39] Why would we not make pre1 public?
[09:06:08] Because we would go public with pre2 just a few days later?
[09:07:08] Oh, if pre2 is just pre1+the 524 lib?
[09:09:32] That's how I read Jeff. Pre2 should be tested on Unix. Asking folks to test something new every couple of days is not going to work well, I guess.
[09:10:14] Yeah, if just pre1+lib524, then I agree. I was still thinking about what we had discussed for pre2 previously.
[09:10:51] Hrm. What's the point of tagging a pre without public testing?
[09:11:16] Morale. "We have actually started the release process!" :)
[09:11:27] I'll buy that ;)
[09:12:48] since the tag is public, we should announce that at least
[09:13:09] When we tagged pre1, I hadn't understood that we'd have a pre2 with a change needed for windows affecting unix, which is needed a.s.a.p. so windows gets tested at all.
[09:13:54] Yeah, fair enough. As Ken says, though, we should make an announcement.
[09:14:52] Ok.
[09:15:04] With the caveat that it looks substantially like what we're aiming for, and interested parties can continue to test the tarballs, but an official package will wait for pre2.
[09:15:11] What do you think?
[09:16:23] Sounds reasonable..
[09:16:36] I'm happy to try and sketch out an appropriate email for feedback from this list...
[09:17:00] Thanks.
[09:17:52] looks like Russ used to do the NEWS entries. I'll submit a draft to Gerrit later today
[09:18:21] Thanks.
[09:18:58] NB Stephen wrote he'll look into the redhat spec issue tomorrow.
[09:20:06] Anything else for this meeting?
[09:20:41] No, I think that's it.
[09:20:56] We're going to need release notes. Unless Paul can simply write them, we'll need input.
[09:22:41] ok
[09:23:40] Thanks everyone for staying around that long.
[09:26:01] Time to go home. See you.
[09:27:17] bye Stephan, and thanks
[09:27:30] --- Stephan Wiesand has left
[09:33:21] --- meffie has left
[09:33:22] --- mmeffie has become available
[09:36:00] --- mmeffie is now known as meffie
[09:50:37] --- Derrick Brashear has become available
[10:45:00] --- Simon Wilkinson has become available
[10:47:08] Arriving very late to this party.
[10:47:38] I think it's worth announcing pre1 now. pre2 isn't going to happen at this point until after the holiday period, realistically
[10:48:30] To quote Journey, "The party's over, I have gone away"
[10:49:31] Simon: sure, but we won't have official builds before pre2
[10:50:24] So I'm looking at something like announcing the tag, and inviting interested parties to build it, but stating that the first release with prebuilt packages will be pre2
[10:50:59] sounds like a good plan
[10:51:54] Excellent, will send a short note to release-team, and if I get a few positive responses, I can mail the announce list. I assume you're a moderator?
[10:52:43] i have approve power. i guess that makes me a moderator
[11:03:20] --- stephan.wiesand has become available
[11:17:07] Bah, I forgot that I don't get to see the most recent history in scrollback
[11:17:24] why not?
[11:18:13] Server bug, I think. If you leave the room and then rejoin, you seem to miss the last bit of history. So, when I rejoined the most recent message I was shown was "Hrm. What's the point of tagging a pre without public testing?"
[11:18:46] Hmm, I don't think barnowl tries to get me history, so I don't notice.
[11:18:57] It's also not clear to me that the 524 change need affect Unix at all.
[11:19:11] We've been living with the gradual death of Kerberos v4 on Unix for years.
[11:20:29] Simon, it's not clear to me either, but Jeff seems to assume he has to "modify the auth package".
[11:21:19] He needs to modify his aklog. I'm not sure what has to change beyond that.
[11:22:26] Looks like no one's sure, and we won't know before tuesday.
[11:22:54] find . -name '*.c' | xargs grep 524_
[11:23:10] suggests that the only calls to 524 utilities are in aklog
[11:24:01] Now it may also be that there are issues with things using kerberos v4 functions directly, which will impact on kauth
[11:25:25] Not trying to kill the 524 discussion: We will have a couple of Linux packages for pre1, I think. Stephen is building for EL/Fedora, Christof submitted to opensuse build service.
[11:25:56] Yeah, I think there's likely to be a number of things available for people to test. I don't see any reason not to announce it.
[11:27:40] Oh, okay, my draft announcement says no packages! :)
[11:28:00] I guess we can wait a day to see what gets built, and then modify it.
[11:28:20] announce it and say some binary builds will follow
[11:28:26] --- Derrick Brashear has left
[11:30:07] Sounds like a good plan. I'll release the volume now...
[11:34:59] /afs/grand.central.org (without dot) should work now.
[11:36:57] Paul, you sure we want to encourage sites to widely deploy 1.6.2pre2? ;-)
[11:39:14] Ken, thanks for the NEWS in gerrit. It wasn't clear to me that NEWS = Release Notes.
[11:44:14] --- Derrick Brashear has become available
[11:48:47] --- meffie has left
[12:48:55] --- Derrick Brashear has left
[12:55:28] --- deason has left
[12:55:29] --- deason has become available
[13:20:00] --- deason has left
[13:20:00] --- deason has become available
[13:25:38] --- paul.smeddle has left
[13:36:01] --- deason has left
[13:36:53] --- deason has become available
[14:09:37] --- stephan.wiesand has left
[14:22:48] --- Simon Wilkinson has left
[14:52:50] --- Simon Wilkinson has become available
[15:40:12] --- deason has left
[20:30:52] --- Derrick Brashear has become available
[20:43:46] --- stephan.wiesand has become available
[20:44:09] --- stephan.wiesand has left
[21:52:05] --- Stephan Wiesand has become available
[21:52:20] --- Stephan Wiesand has left
[21:52:23] --- Stephan Wiesand has become available
[23:53:27] --- Simon Wilkinson has left