release-team@conference.openafs.org
Wednesday, April 30, 2014

GMT+0
[00:20:38] Jeffrey Altman joins the room
[08:19:19] wiesand joins the room
[11:31:20] <wiesand> test
[12:15:16] <Jeffrey Altman> passed
[13:15:32] meffie joins the room
[14:01:31] deason joins the room
[14:02:00] <wiesand> Hello. I see Mike, Jeff, Daria, Andrew... correct?
[14:02:07] <meffie> hello
[14:02:08] kaduk joins the room
[14:02:17] <deason> yes, hi
[14:02:22] <kaduk> Sorry I'm late.
[14:03:00] <wiesand> Does the new server write the log only once per day?
[14:03:26] <kaduk> I don't think so ... did you check that you do not have cached DNS or anything?
[14:04:06] <kaduk> Though, the old machine looks to be down down down
[14:04:50] <wiesand> http://conference.openafs.org/release-team@conference.openafs.org/2014-04-30.txt gives me "not found"
[14:05:16] <kaduk> Huh, it loads for me.
[14:05:47] <wiesand> Funny.
[14:05:47] <meffie> i get 404 as weel
[14:05:47] <kaduk> Er.
[14:05:48] <deason> that link gives me a 404
[14:05:49] <meffie> well
[14:05:52] <kaduk> Try .html, not .txt
[14:06:11] <meffie> .html works
[14:06:13] <kaduk> This is what happens when my browser is on a different monitor.
[14:06:17] <deason> yes, that loads
[14:06:26] <wiesand> Ah, great. Thanks. These logs are really useful.
[14:06:41] <wiesand> Let's start.
[14:07:11] <wiesand> I put the "next stable branch(es)" topic first today.
[14:07:18] <wiesand> Any news? Thoughts?
[14:08:57] <wiesand> Crickets. Seems it's not the late time causing the silence on this topic. Move on?
[14:08:59] <kaduk> Well, the tip of the 'rxgk' patchset in gerrit allows using rxgk for the DISK_ RPCs of the intra-{vl,pt}server ubik connections, which is just barely edging into "people might want to test this" territory.
[14:09:46] <wiesand> Which means that it really better happened reasonably soon now?
[14:09:55] <kaduk> Since a big driver for cutting new branches is supposed to be pressure to get the rxgk bits merged.
[14:10:03] <kaduk> Well, for some value of "reasonably soon".
[14:10:26] <kaduk> I don't think it's a huge burden to maintain the current patchsets, but the next steps might be more invasive.
[14:11:42] <wiesand> Got it. Alas, can't help it.
[14:12:37] <wiesand> Is anyone able to write down a complete proposal?
[14:13:16] <deason> a complete proposal for... managing the branches?
[14:13:31] <kaduk> creating and numbering them, too, I think.
[14:13:38] <wiesand> What the new branches will look like, how they're named.
[14:15:03] <wiesand> I think this is not making progress because it has to be discussed elsewhere. Let's remove the topic from future agendas, and move on now.
[14:15:38] <wiesand> Let's defer Linux, hoping that Marc will join later.
[14:15:54] <wiesand> So we're at "problem reports".
[14:16:08] <meffie> you could not worry about the number for now, and just call it "openafs-next" ?
[14:16:34] <deason> the number isn't important; there are other issues, mike
[14:16:55] <meffie> yeah.
[14:17:08] <wiesand> I think we need the actual gerrit branches eventually to allow Ben to make progress.
[14:17:54] <meffie> ok
[14:18:04] <meffie> (sorry, moving on)
[14:18:17] <wiesand> Ok. RT #131855 sounds like a showstopper.
[14:18:27] <wiesand> Any news on that?
[14:18:56] <Jeffrey Altman> sorry I am late.
[14:19:46] <deason> it'd be nice to have any kind of information on it...
[14:19:55] <Jeffrey Altman> Wish I had some to give you
[14:20:26] <Jeffrey Altman> We can't reproduce it outside of the submitter's production environment
[14:20:54] <wiesand> That may explain why I failed.
[14:20:55] <Jeffrey Altman> test machines in the same cell with the same OS, kernel, and openafs versions do not repro
[14:21:44] <Jeffrey Altman> removing the referenced commit results in proper behavior but getcwd() is then broken again
[14:22:08] <wiesand> It's actually 10984 causing this?
[14:22:11] <deason> that's still missing information; callbacks just get effectively ignored... always?
[14:22:24] <deason> something else doesn't see a dentry? or it sees a dentry that was removed?
[14:22:37] <deason> or file metadata is wrong? or file data is wrong (somehow)?
[14:22:43] <Jeffrey Altman> I can't answer that.  I'm not working on the issue.
[14:23:32] <kaduk> I don't see how 10984 would be relevant. (typo?)
[14:23:49] <wiesand> Yes, typo. 10948.
[14:24:07] <deason> 10948
[14:24:16] <wiesand> The second one, meant to correct side effects of 10804
[14:24:32] <deason> if we have to choose between them... the getcwd() issue existed for a while, didn't it?
[14:24:40] <Jeffrey Altman> it did
[14:25:08] <kaduk> It got worse during the 1.6 series, though, IIRC.  (Or maybe that was just a linux change that prompted more occurrences.)
[14:25:24] <deason> and (vfs-level) cache inconsistency seems "worse", but without any way to judge impact it's hard to make a decision about it
[14:26:25] <shadow@gmail.com/barnowl7628413F> it appears to be that the directory dentry is not updated to reflect changes in what its children are
[14:27:00] <wiesand> Could the reproducer be added to the bug?
[14:27:00] <shadow@gmail.com/barnowl7628413F> the callbacks are delivered and acked; the contents are just masked
[14:27:30] <wiesand> What's the OS?
[14:27:42] <shadow@gmail.com/barnowl7628413F> rhel6
[14:28:23] <wiesand> You're still working on it?
[14:28:24] <deason> "it appears to be that the directory dentry is not updated to reflect changes in what its children are" this is still vague or I don't understand
[14:28:31] <deason> what is not getting updated?
[14:28:55] <deason> metadata like mtime or nlinks? or the presence of the child dentry?
[14:29:34] <shadow@gmail.com/barnowl7628413F> presence (absence, well, change in which) a child is
[14:30:19] <shadow@gmail.com/barnowl7628413F> foo is moved to foo~, and a new foo is created. the dentry reflects only foo, the old foo.
[14:30:59] <deason> 'mv foo foo2', and foo and foo2 both seem to exist?
[14:31:10] <deason> (the 'mv' on a remote client)
[14:31:39] <shadow@gmail.com/barnowl7628413F> mv foo foo2 && touch foo, remotely, and locally foo exists but is what should now be foo2
[14:32:12] <deason> oh okay, the contents are those of foo2 when you open it
[14:32:16] <shadow@gmail.com/barnowl7628413F> confounder: not consistently reproducible even on what should be identical machines
[14:32:33] <shadow@gmail.com/barnowl7628413F> yes, contents of foo2 as foo
[14:34:12] <wiesand> can't reproduce on my EL6 clients.
[14:34:27] <deason> I'm sure it's not as simple as just "it always happens"
[14:34:35] <shadow@gmail.com/barnowl7628413F> yup
[14:34:51] <kaduk> On the machine where it's observed, is it always reproducible?
[14:35:01] <shadow@gmail.com/barnowl7628413F> still trying to collect enough data to understand circumstances
[14:35:35] <shadow@gmail.com/barnowl7628413F> we don't know if it always is but there is a way to always reproduce it.
[14:35:58] <kaduk> Well, that's better than the alternative, I suppose.
[14:36:25] <deason> is that something that could be put in the ticket? (I know it doesn't seem to work on other systems, but just to know what in general is happening)
[14:36:41] <shadow@gmail.com/barnowl7628413F> i'll get more details in the ticket
[14:38:16] <wiesand> I uploaded the last Red Hat binaries today. We could announce those, mention the problem, but encourage folks to test pre2 all the same because it's rare?
[14:38:38] <wiesand> So we'd get some value out of waiting for another week?
[14:38:57] <shadow@gmail.com/barnowl7628413F> well, might as well let people test pre2
[14:39:14] <shadow@gmail.com/barnowl7628413F> so yeah
[14:39:57] <wiesand> Any objections to doing that and waiting another week?
[14:40:18] <shadow@gmail.com/barnowl7628413F> none here
[14:40:32] <deason> it would be good to have an idea of what we're going to do if nothing changes
[14:40:46] <deason> such as, pulling the getcwd fixes out
[14:41:06] <wiesand> Probably.
[14:41:14] <deason> well, 'good' but not 'necessary', if you really want to just defer it to later
[14:42:11] <wiesand> With Daria and Marc working on it, I'm fairly optimistic that it will be solved eventually.
[14:42:38] <Jeffrey Altman> Given the lack of current information I am in favor of pulling the getcwd patches from 1.6.8 final but that opinion is subject to change based upon what is discovered regarding the trigger conditions.
[14:42:57] <wiesand> Maybe not within a week though. Reverting those two changes is an option if we feel we can't delay 1.6.8 any further.
[14:43:02] <Jeffrey Altman> Obviously, I would prefer a fix
[14:43:12] <Jeffrey Altman> agreed
[14:43:42] <wiesand> Ok. Good luck, Daria!
[14:43:43] <shadow@gmail.com/barnowl7628413F> yeah
[14:44:05] <wiesand> Andrew brought up RT #131852
[14:44:40] <Jeffrey Altman> The underlying problem in Heimdal is being worked on by Nico Williams
[14:45:19] <kaduk> I don't think there is harm from taking the patch in 11075.
[14:46:06] <deason> I think it's more "correct" to avoid any libkrb5 calls at all without rxkad.keytab, but I don't know if it will ever matter
[14:46:41] <wiesand> Will 11075 help the known cases?
[14:46:53] <shadow@gmail.com/barnowl7628413F> it will
[14:46:59] <deason> yeah
[14:47:18] <wiesand> So, ok for pre3 or 1.6.9.
[14:47:36] <wiesand> This wasn't a recent regression, right?
[14:47:39] <deason> or the other potential approach was to detect when libkrb5 isn't returning errors properly, and disable libkrb5 stuff
[14:48:00] <kaduk> Hasn't changed since rxkad.keytab was introduced.
[14:48:02] <deason> it was a regression in 1.6.5; if you mean for heimdal, it's not a regression
[14:48:09] <shadow@gmail.com/barnowl7628413F> no. it was that way when committed for 1.6.5
[14:48:41] <wiesand> And there is a simple workaround I seem to remember.
[14:49:04] <kaduk> Yeah, just touch rxkad.keytab
[14:49:24] <deason> there is, but I had a concern that when you're actually using rxkad.keytab, there are a large number of other potential issues, with no way to know
[14:49:37] <deason> er, know that there's anything wrong
[14:49:51] <kaduk> because error reporting is neutered, right.
[14:50:18] <deason> but I wouldn't expect this to be too widespread; at least with the 1.5 versions, compiling heimdal on the relevant platforms didn't really work "out of the box"
[14:50:38] <kaduk> Speaking of heimdal, any guesses for a 1.6 release date?
[14:50:56] <deason> so it takes a little effort; it doesn't "just happen" on a widespread basis
[14:51:28] <deason> so taking that into account, just warning people may be enough, since anyone using heimdal on aix/solaris should know, and would probably notice the warning
[14:52:06] <wiesand> Sounds like the right approach for the time being.
[14:52:51] <wiesand> And 11075 is still better than nothing, but I don't see an urgent need to rush it out.
[14:52:59] <deason> well, "for the time being", are you hoping for anything else?
[14:53:33] <wiesand> We have 11075, which is an alternative to "just warn".
[14:53:44] <deason> I didn't think there are any plans to do anything else unless something comes up here; what we do now is all that's planned on getting done
[14:53:50] <deason> no no
[14:54:11] <deason> hmm, let me try to explain briefly what I'm thinking
[14:54:30] <wiesand> And it's being worked on in Heimdal. Eventually, the solution will be to upgrade that.
[14:56:35] <deason> a few options, not all mutually exclusive:
1. workaround the known crash, which is 11075
2. workaround any potential libkrb5 problems when rxkad.keytab is not in play (like 11075, but bigger)
3. warn about how using rxkad.keytab with heimdal on platform X could cause problems (with or without 11075)
4. test for the heimdal error-reporting problem at runtime, and disable libkrb5 calls if the problem is present
[14:57:32] <deason> so, with 11075 in place, a warning is still warranted; 11075 does not solve all of the relevant issues (but maybe solving all of them is not appropriate)
[14:57:35] <wiesand> 3, with 11075 eventually, unless Heimdal 1.6 beats us.
[14:57:42] <kaduk> (4) feels like a big patch that would be hard to review
[14:58:18] <deason> it would be about as big as the test program I put in the ticket; I don't think it's that big, but doing that test feels pretty ridiculous
[14:58:58] <deason> okay, but warning goes out ~now? since this is applicable to everything 1.6.5+, not just a new release
[14:59:38] <kaduk> Yes.
[14:59:45] <wiesand> We can add it to the "known problems" section when announcing the additional 1.6.8pre2 binaries.
[14:59:46] <kaduk> Are you volunteering to compose mail to -announce?
[14:59:57] Marc Dionne joins the room
[15:00:49] <deason> kaduk: yes, I can come up with something
[15:01:18] <deason> but I feel it's not enough to just put it in 1.6.8pre* stuff, since it's a case where a security release (1.6.5) can cause potentially worse security or stability issues
[15:01:28] <wiesand> Fine.
[15:01:57] <wiesand> Hello Marc.
[15:02:03] <Marc Dionne> hi Stephan
[15:02:12] <deason> wiesand: but that's just my opinion; if you want it in 1.6.8pre* notes, I'll support you in that and everything
[15:02:52] <wiesand> No, you're probably right.
[15:03:01] <deason> I didn't mean to be adamant about it; those thoughts just came up as we were discussing it
[15:03:56] <wiesand> I'm quite fine with a separate announcement. To -announce. In that case I'd send the pre2 mail to -info again anyway.
[15:04:35] <wiesand> Marc, could you test Linux mainline recently?
[15:05:07] <Marc Dionne> Yes, we're still ok as of some time yesterday, so rc3+
[15:05:13] <wiesand> [I think we're done with "problem reports"]
[15:05:21] <wiesand> Marc: Thanks. Good news.
[15:05:41] <Marc Dionne> looks like there will definitely be patches needed for 3.16 though
[15:05:44] <Jeffrey Altman> I'm reluctant to send an e-mail to announce saying there is a problem with Heimdal until such time as patches are available for Heimdal 1.5.x and included in a 1.6 pre-release.
[15:06:46] <kaduk> Just because there is not a fix for a problem does not mean that there is no problem; would you prefer just sending to -info until there is a full "official" fix?
[15:07:45] <deason> more importantly I'd say it's a problem that people can do something about, even without a fix for heimdal
[15:07:56] <wiesand> Marc: what's the 3.16 ETA?
[15:08:34] <Marc Dionne> hmm, well figure another 5-6 weeks for 3.15, plus another 3 months or so, as an estimate
[15:09:16] <Jeffrey Altman> What is such an announcement going to say "there is a known problem if you built any version of Heimdal on Solaris and AIX (which doesn't build on those platforms out of the box anyway) and use of any multi-threaded application.   At the present time there is no fix for the problem but we advise that if you are running OpenAFS 1.6.5 or later on said platforms and are using non-DES keys that you restore use of DES keys"?
[15:09:25] <wiesand> (Just trying to figure out whether we have a chance to release 1.6.9 in time, or will need 1.6.8.1)
[15:10:04] <deason> my advice would be either to delay upgrades, or "don't use heimdal on those platforms"
[15:10:54] <deason> or you can run it anyway; just be aware that issues exist
[15:12:01] <Jeffrey Altman> whoever decides to use Heimdal on those platforms must have a pretty good reason for doing so because Heimdal doesn't provide binaries and it doesn't build on those platforms without a lot of hand holding.    Heimdal 1.6 is intended to build on Solaris which is why this problem needs to be fixed there but the initial report comes from a site running 1.5.2
[15:13:39] <deason> the fact that heimdal doesn't build easily on those platforms is the very reason I think a warning is "enough"; if that were not the case, I would be less sure even a warning would be sufficient, and a workaround in the code more appropriate
[15:14:01] <Jeffrey Altman> I have no issue with starting a discussion topic on -info whose thread is appended to as more details become available.
[15:15:48] <wiesand> Andrew, is that acceptable?
[15:16:00] <deason> good enough that I don't wanna spend more time on it
[15:16:06] <deason> to -info it is
[15:16:30] <wiesand> Good.
[15:17:01] <wiesand> Regarding 1.6.8(pre3?), I think the schedule will be driven by the getcwd issue.
[15:17:17] <wiesand> Let's see what the state will be next week.
[15:17:30] <wiesand> So, on to "testing".
[15:17:45] <wiesand> Is it working out?
[15:18:10] <deason> I haven't been pressing the testers, since I wasn't sure what was happening with the 'known issues' and a possible pre3
[15:18:24] <deason> but based on this meeting, I shouldn't pay attention to that and should just get them to do something anyway
[15:18:59] <wiesand> It seems most sites are very unlikely to hit a known issue.
[15:19:24] <wiesand> It's fair to tell them, but testing pre2 as it is makes sense IMO.
[15:19:24] <deason> well, the concern is more "okay now that you've tested pre2 here's a pre3"
[15:19:35] <deason> even though we knew a pre3 was coming out
[15:19:53] <deason> and I know the reasons for wanting them to do that, but sometimes that doesn't sound good to a user
[15:20:10] <wiesand> I'm not even sure we need a pre3 if we just revert the two changes.
[15:20:23] <deason> and some sites I feel are going to take longer than a week to do anything....
[15:20:39] <deason> well yeah yeah, I mean
[15:21:06] <deason> I wasn't sure what was happening with the other issues; I wasn't sure if this meeting was going to say "there's definitely a new fix going in that's going to yield a pre3"
[15:21:17] <deason> but things are a little more clear now
[15:21:40] <wiesand> If you feel uneasy about it, don't do it.
[15:21:47] <deason> no no, I'm fine
[15:22:02] <deason> I'll poke them about it
[15:22:39] <wiesand> They can make their own decision whether they want to do anything, based on the available facts.
[15:23:09] <wiesand> Anything else to discuss today?
[15:24:11] <wiesand> I guess that means "no".
[15:24:14] <Jeffrey Altman> I just want to say thank you to Ben, Simon and Daria for the migration effort
[15:24:35] <wiesand> Right. And congratulations on how smoothly it went.
[15:24:48] <kaduk> It went mostly smoothly, I think.  The wiki is not fully functional yet, though; I need to scrounge up some perl modules that aren't in EPEL6.
[15:24:51] <Jeffrey Altman> As smoothly as it appeared to the rest of us to go
[15:24:56] <Marc Dionne> are the jabber logs for the new server available via web?
[15:25:07] <kaduk> Marc: yes, but / gives a 403 for some reason.
[15:25:15] <kaduk> Files end in .html, not .txt like they used to.
[15:25:25] <Marc Dionne> the new server is better at giving the recent messages when you come in, which is great
[15:25:43] <deason> there's still a directory listing at least at http://conference.openafs.org/release-team@conference.openafs.org/
[15:25:50] <kaduk> The new server is ejabberd
[15:26:12] <kaduk> deason: yes, but the page that would give a list of meetings is not viewable
[15:26:23] <deason> yeah I know what you mean
[15:26:25] <meffie> yes, the new jabber server is nice.
[15:26:38] <wiesand> Agreed.
[15:27:08] <wiesand> Thanks a lot everyone!
[15:27:09] <meffie> kaduk: let me know when the wiki is ready. i'll fix the pages the spammers broke.
[15:27:26] <kaduk> meffie: pushes through gerrit should work fine already.
[15:27:34] <kaduk> The issues are more with editing from the web and such.
[15:27:37] <meffie> ok. thanks.
[15:28:05] <wiesand> Feel free to continue, but I have to leave :-)
[15:28:22] wiesand leaves the room
[15:30:21] meffie leaves the room
[15:36:29] deason leaves the room
[15:47:35] kaduk leaves the room
[16:19:44] Marc Dionne leaves the room