Wednesday, December 17, 2014

release-team@conference.openafs.org

GMT+0
[00:21:46] shadow@gmail.com/barnowlE5B64A04 leaves the room
[00:21:57] shadow@gmail.com/barnowlE5B64A04 joins the room
[13:53:43] meffie joins the room
[14:02:03] Marc Dionne joins the room
[14:48:16] kaduk joins the room
[14:50:01] <meffie> good morning Dr. Kaduk
[14:52:49] <kaduk> Mornin'
[14:53:17] <Marc Dionne> morning mike, ben
[14:57:12] wiesand joins the room
[14:57:54] <Jeffrey Altman> appropriate local time zone specific greeting
[14:58:33] <wiesand> Good evening
[14:59:33] <wiesand> Thanks for joining today
[14:59:43] <wiesand> I guess it will be the last meeting for 3 weeks..
[14:59:59] <wiesand> Marc, welcome :)
[15:00:06] <Marc Dionne> hi Stephan
[15:00:25] <wiesand> I think we’re a bit desperate w.r.t. linux clients. At least I am.
[15:00:29] <wiesand> Any news?
[15:00:53] <Marc Dionne> did you get the email I sent you?
[15:01:00] <kaduk> I think even a patch for the "inode freed while on LRU" that only worked on affected systems and broke unaffected systems would still be welcome, for folks here.
[15:01:23] <Marc Dionne> such a patch would be fairly simple
[15:01:38] <kaduk> I was thinking in the shower this morning that probably the difficulty there was writing a configure test.
[15:01:47] <Marc Dionne> the problem is there's no way to check if you have a kernel of the leaking or the non leaking variety
[15:02:40] <Marc Dionne> you could read the refcount before/after, but you don't have a lock so someone else can play with the refcount during that time
[15:02:49] <kaduk> Maybe we should summon a tactical strike to the LKML then.
[15:03:02] <wiesand> Marc, I did receive your mail. Thanks. Let me quote it for the others: “So given all the recent problems with d_splice_alias, the current tack that I'm testing in yfs is just to get rid of using d_splice_alias altogether, since there's really no point without the nfs exporting facility which doesn't compile on recent kernels (GPL problems, and probably also bitrot at this point). I've been thinking this for a while, but this problem forces the issue.”
[15:03:26] <Marc Dionne> lkml tends to care very little about issues from out of tree code
[15:03:40] <kaduk> True.
[15:05:12] <wiesand> I think releasing 1.6.11 w/o a fix for this issue makes no sense. Objections?
[15:05:21] <Marc Dionne> but maybe moving away from using d_splice_alias is a more substantial change for 1.6 - maybe there it would be better to assume a non leaking kernel
[15:06:04] <wiesand> Wouldn’t this break the majority of production kernels?
[15:06:16] <Marc Dionne> besides, i think that erring with an additional reference will cause less problems for the faulty kernels than the lower refcount did
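Marc's point about which mistake is cheaper can be illustrated with a toy user-space model. This is a hedged sketch, not kernel or OpenAFS code: an inode is reduced to a bare reference count, and `toy_inode`, `toy_splice_alias_error`, `toy_caller_cleanup` and `toy_failed_lookup` are hypothetical names. It assumes, as described in the discussion, that "leaking" kernels keep the caller's inode reference when `d_splice_alias` fails and "non-leaking" kernels drop it.

```c
#include <stdbool.h>

/* Toy user-space model, NOT kernel or OpenAFS code; all names are
 * hypothetical. An "inode" here is nothing but a reference count. */
struct toy_inode {
    int refcount;
};

/* Error path of a splice-alias style call. A "non-leaking" kernel drops
 * the caller's reference on failure; a "leaking" kernel keeps it. */
static void toy_splice_alias_error(struct toy_inode *ip, bool kernel_drops_ref)
{
    if (kernel_drops_ref)
        ip->refcount--;
}

/* Caller-side cleanup, chosen by what the module was built to assume. */
static void toy_caller_cleanup(struct toy_inode *ip, bool built_for_leaking_kernel)
{
    if (built_for_leaking_kernel)
        ip->refcount--;   /* the kernel kept our reference, so drop it here */
}

/* One failed lookup starting from a single reference. Final count:
 * 0 = correct, 1 = leaked reference, -1 = over-release (double put). */
static int toy_failed_lookup(bool kernel_drops_ref, bool built_for_leaking_kernel)
{
    struct toy_inode ino = { .refcount = 1 };
    toy_splice_alias_error(&ino, kernel_drops_ref);
    toy_caller_cleanup(&ino, built_for_leaking_kernel);
    return ino.refcount;
}
```

Building for a non-leaking kernel but running on a leaking one ends at a count of 1 (a leaked reference), while the opposite mismatch ends at -1 (an over-release that can free an inode still in use). That asymmetry is why an extra reference is the safer error.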
[15:06:20] <kaduk> It's hard to say for certain without looking at a patch, but it does sound like conceptually a big change.
[15:07:18] <Jeffrey Altman> We could force the builder to explicitly state whether the kernel that is being built for is leaking or non-leaking.
[15:07:25] <kaduk> While we're talking about ugly not-quite-solutions, we could add a configure argument --with-leaking-refcounts or something
[15:07:35] <Jeffrey Altman> Then it is up to the distributions to package the correct thing for their kernels
[15:08:19] <wiesand> What’s the correct thing for RHEL5/6/7, today and next week?
[15:08:20] <kaduk> That does seem like the fastest path forward here.
[15:08:31] <Marc Dionne> distros tend to rev quickly with stable updates, so not sure that many would still have the faulty kernels as current in their repos
[15:09:10] <Marc Dionne> i think rh5 and rh6 are unaffected, because they don't have that error path in d_splice_alias
[15:09:28] <wiesand> Is the bug easy to trigger and detect?
[15:09:44] <Jeffrey Altman> if it were we could write a configure test
[15:09:55] <Jeffrey Altman> or even a run time test
[15:10:55] <Jeffrey Altman> the right thing for the kernel folks to have done would have been to change symbol names in conjunction with a change in refcnt semantics but they didn't
[15:11:28] <Marc Dionne> recall that the only code that exists for them is in the tree...
[15:12:01] <kaduk> Right; breaking userspace is a cardinal sin, but in-kernel interfaces can break on a whim
[15:12:06] <Jeffrey Altman> exactly.   even in tree though best coding practice would have been to change the symbol name to ensure that all  callers were updated
[15:12:13] <wiesand> Any objections to the configure switch? Marc, could you whip that up in the near future?
[15:12:20] <Marc Dionne> a configure test could identify kernels where this is potentially a problem.  i wonder if looking at kernel versions would be workable here
[15:13:29] <wiesand> We could maintain a list of kernels where we know...
[15:13:42] <Marc Dionne> Stephan yes, so the proposal is: 1) make the code work for the non leaking kernels 2) add a configure switch to choose the leaking kernel behaviour
[15:13:44] <wiesand> But that would never be complete and rot quickly
[15:13:45] <kaduk> Looking at versions would be better than always assuming one thing, but I thought distros would frequently pull lots of patches in without bumping the version
[15:14:19] <Marc Dionne> esp. for Redhat, looking at the version doesn't tell you much
[15:14:35] <kaduk> Well, Jeffrey was talking about requiring the configure switch to be given, not just one default behavior with an override.
[15:15:42] <wiesand> “Find your kernel source, and check whether or not this patch has been applied”?
[15:16:15] <kaduk> Pretty much.
[15:17:03] <wiesand> It’s better than not having a release working with recent kernels for much longer.
[15:17:51] <kaduk> Pretty much, yup.
[15:18:10] <wiesand> I don’t have a strong opinion on whether the switch should be required or have a default (and which one).
[15:18:31] <wiesand> So: whatever is easier/faster to implement ;-)
[15:18:33] <Jeffrey Altman> my objection to a default is that it is a game of russian roulette
[15:18:37] <Marc Dionne> i'll try to push something in the next few days, we can discuss further there
[15:18:53] <wiesand> Marc: Thanks!
[15:19:13] <Jeffrey Altman> downstream distributions should know what their kernels do
[15:19:19] <wiesand> Jeffrey: If the person building the client doesn’t know, it’s russian roulette either way.
[15:19:19] <Marc Dionne> if I can think of a way to sanely test for the bug at runtime i will look at that
[15:19:47] <Jeffrey Altman> The point is to force the person building for the kernel to find out
[15:20:23] <wiesand> I’m fine with both options. Both as the release manager and the SL packager.
[15:20:29] <Marc Dionne> so you'd prefer a mandatory configure option with no default?
[15:21:50] <kaduk> I think that's the direction we're leaning in.
[15:21:55] <Marc Dionne> that's where removing d_splice_alias is more interesting, it makes the guess work moot
[15:22:42] <kaduk> Right.  But maybe we don't need to block 1.6.11 on a patch for that.
[15:23:21] <Jeffrey Altman> My preference would be a run time test.   I don't think that is possible at this time.
My next preference is a configure test based upon functionality.  I don't think that is possible at this time.
My next preference is a configure test based upon knowledge.   Where we do not have absolute knowledge of the correct behavior there should be no default.
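Jeffrey's last preference, a configure switch with no default where the behaviour is not known, could surface in the source as a preprocessor guard that refuses to compile until the builder has chosen. A minimal sketch: the `TOY_DSA_*` macro names are hypothetical, and the `#define` at the top only stands in for what configure would pass on the compiler command line.

```c
/* Hypothetical macro names. configure would pass exactly one of them,
 * e.g. -DTOY_DSA_KERNEL_DROPS_REF; it is defined here only so that this
 * standalone sketch compiles. */
#define TOY_DSA_KERNEL_DROPS_REF 1

#if defined(TOY_DSA_KERNEL_LEAKS_REF) && defined(TOY_DSA_KERNEL_DROPS_REF)
# error "specify only one d_splice_alias error-path behaviour"
#elif defined(TOY_DSA_KERNEL_LEAKS_REF)
# define TOY_CALLER_PUTS_ON_ERROR 1   /* kernel keeps the ref: caller drops it */
#elif defined(TOY_DSA_KERNEL_DROPS_REF)
# define TOY_CALLER_PUTS_ON_ERROR 0   /* kernel drops the ref: caller must not */
#else
/* No default: refuse to guess on the builder's behalf. */
# error "d_splice_alias error-path behaviour not specified"
#endif

static int toy_caller_puts_on_error(void)
{
    return TOY_CALLER_PUTS_ON_ERROR;
}
```

Deleting that `#define` makes the build stop at the final `#error`, which is the "force the person building to find out" behaviour rather than a silent guess.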
[15:23:54] <wiesand> Sounds perfectly reasonable to me.
[15:24:09] <kaduk> Same here.
[15:25:06] <wiesand> So, agreed?
[15:25:14] <kaduk> Is Marc okay implementing this plan, or should someone else look at it?
[15:26:47] <Jeffrey Altman> The YFS position on the NFS exporter is that it cannot work and we will not include such functionality in our cache manager distributions. We believe that userland NFS servers are the correct implementation for that functionality. For YFS the d_splice_alias removal approach is appropriate. Given my conflicts of interest I will not be involved in deciding whether OpenAFS removes the NFS exporter from its code base.
[15:26:54] <Marc Dionne> the minimum needed here is a patch that deals with both types of kernels correctly in the code around d_splice_alias, i will push something for that; we might argue more about the configure needs
[15:27:21] <kaduk> That sounds good to me, thanks.
[15:27:29] <Jeffrey Altman> Marc, please provide the kernel patches.   Someone else can do the configure logic for OpenAFS
[15:27:48] <Marc Dionne> ok, that sounds reasonable
[15:28:29] <wiesand> Thanks.
[15:28:30] <Jeffrey Altman> Marc, you might as well discuss what you have learned about the 3.19 kernel even if you do not have patches ready yet.
[15:28:56] <Marc Dionne> well i currently have a 5 patch set for 3.19, which is still not at rc1 yet
[15:29:19] <Marc Dionne> i will wait until sometime post rc1 before pushing them
[15:29:20] <wiesand> “Exciting” changes?
[15:29:47] <Marc Dionne> well changes to many basic things, luckily the patches so far are not too bad
[15:30:21] <wiesand> Anyway, I think we should release 1.6.11 before 3.19 is out, and probably w/o 3.19 related changes, and do a 1.6.11.1 for that.
[15:30:36] <kaduk> Sounds good to me.
[15:31:12] <wiesand> But thanks a lot for the heads up.
[15:31:44] <wiesand> Any new input regarding RT #131967?
[15:31:46] <Marc Dionne> for the bits, d_alias becomes d_u.d_alias, f_dentry becomes f_path.dentry, struct nameidata becomes opaque, msg.msg_iov is replaced
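Renames like these are usually absorbed by small compatibility macros selected by configure tests. The sketch below models two of them (`d_alias` moving into `d_u`, and reaching a file's dentry through `f_path.dentry`) with mock user-space structs. The structs are simplified stand-ins for the kernel's, and `HAVE_TOY_DENTRY_D_U`, `compat_d_alias` and `compat_file_dentry` are hypothetical names, not OpenAFS's actual macros.

```c
/* Mock user-space structs, simplified stand-ins for the kernel's; only
 * the members needed here are modelled. HAVE_TOY_DENTRY_D_U is a
 * hypothetical macro a configure test would set for 3.19-style kernels. */
#ifndef HAVE_TOY_DENTRY_D_U
#define HAVE_TOY_DENTRY_D_U 1
#endif

struct list_head { struct list_head *next, *prev; };

struct dentry {
#if HAVE_TOY_DENTRY_D_U
    union { struct list_head d_alias; } d_u;   /* 3.19-style layout */
#else
    struct list_head d_alias;                  /* older layout */
#endif
};

struct path { struct dentry *dentry; };
struct file { struct path f_path; };

/* The only place that knows where the alias list lives. */
#if HAVE_TOY_DENTRY_D_U
# define compat_d_alias(dp) (&(dp)->d_u.d_alias)
#else
# define compat_d_alias(dp) (&(dp)->d_alias)
#endif

/* Reach the dentry via f_path instead of the removed f_dentry name. */
#define compat_file_dentry(fp) ((fp)->f_path.dentry)
```

Callers use `compat_d_alias(dentry)` and `compat_file_dentry(file)` everywhere, so only the macro definitions change between kernel layouts.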
[15:32:07] <kaduk> "Fun"
[15:32:31] <wiesand> Lots... I’m not going to bring up the fuse discussion again, but...
[15:32:38] <Jeffrey Altman> finally Marc, can you summarize what we know about RT 131976, Christof's mmap bug
[15:33:05] <Marc Dionne> yeah i haven't looked at 131967 much.  but i have reproduced Christof's mmap corruption
[15:34:27] <Marc Dionne> not down to the root cause yet, but seems to occur when a partial write needs to be done because the cache is too full, and there is ongoing writing
[15:35:01] <Marc Dionne> this is an area that has a history of problems
[15:35:46] <wiesand> Do we know whether it’s a recent regression? Or what are the chances?
[15:36:36] <Marc Dionne> i'm pretty sure that this has worked correctly in the past, but it's possible that the issue has been there a while
[15:37:17] <Marc Dionne> since i can reproduce I should be able to narrow it down in the next few days
[15:38:15] <Marc Dionne> we get blocks of zeros in the file, always at page (4K) boundaries, and covering usually 1 but sometimes more complete pages
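Given this symptom, a reproducer's checker can look specifically for whole pages of zeros. A minimal sketch, assuming the writer fills the file with a pattern that contains no zero bytes (so any all-zero page indicates corruption) and that the file has already been read into memory. `find_zero_pages` is a hypothetical helper, and the page size is a parameter so 4096 can be passed to match the boundaries observed here.

```c
#include <stddef.h>

/* Returns 1 if all n bytes at p are zero. */
static int page_is_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical checker: assumes the test data never contains a zero
 * byte. Records the index of each all-zero page in out[] (up to max_out)
 * and returns the total found. A trailing partial page is checked too. */
static size_t find_zero_pages(const unsigned char *buf, size_t len, size_t page,
                              size_t *out, size_t max_out)
{
    size_t found = 0;
    if (page == 0)
        return 0;                          /* avoid an endless loop */
    for (size_t off = 0; off < len; off += page) {
        size_t n = (len - off < page) ? len - off : page;  /* last page may be short */
        if (page_is_zero(buf + off, n)) {
            if (found < max_out)
                out[found] = off / page;   /* record the page index */
            found++;
        }
    }
    return found;
}
```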
[15:38:47] <wiesand> Rings a bell...
[15:39:51] <wiesand> We should ask Christof to patch out client side idledead...
[15:40:03] <Jeffrey Altman> it has nothing to do with idledead
[15:40:18] <wiesand> That rings a bell too ;-)
[15:40:44] <wiesand> But I admit I have no clue.
[15:43:20] <wiesand> I’ll run some tests with the clients I have available. Let’s see.
[15:44:02] <Jeffrey Altman> I say it has nothing to do with idle dead because the problem is reproducible when there is only one client involved and an AuriStor file server.   There are no failed RPCs.   The problem occurs strictly when an immediate StoreData must be performed in order to make room in the local cache.   In this situation write-on-close is not used.
[15:44:37] <wiesand> If it wasn’t introduced recently, we probably shouldn’t block 1.6.11 on this. If it was, well...
[15:44:51] <Jeffrey Altman> I don't think this should block 1.6.11
[15:45:10] <wiesand> Ok.
[15:46:01] <wiesand> Fine. So let’s move on to “1.8”.
[15:46:01] <Marc Dionne> my guess would be that it's not very recent
[15:46:03] <Jeffrey Altman> We have no knowledge of which OpenAFS versions can reproduce the issue and it might be a combination of OpenAFS kernel module and a specific Linux kernel version that triggers it
[15:46:45] <Jeffrey Altman> fyi, I have 15 minutes left before I must go.
[15:46:48] <kaduk> I didn't send out a list of 1.8 topics for this meeting, as you probably noticed.
[15:47:05] <wiesand> Let’s use those for “1.8” then.
[15:47:13] <wiesand> I did ;-)
[15:47:14] <kaduk> There's a decent number of verified +2 changes in gerrit many of which do not depend on changes lacking +2
[15:48:04] <Jeffrey Altman> My time for reviewing / merging this past week has been lost to hardware recovery.
[15:48:22] <kaduk> There's a number of things where we don't necessarily need discussion, just someone to do the work -- things like Andrew's stack dealing with rx idleness and such, the stack I adopted with packet sizes and MTU updates, hash table sizes, log rotation, error messages to stderr, encrypting traffic, etc.
[15:48:42] <kaduk> There are a few things that we could talk about, though.
[15:49:07] <kaduk> Jeff: understood; hardware recovery can pop up at any time, and usually can't wait.
[15:49:10] <shadow@gmail.com/barnowlE5B64A04> when i finish fighting with the 3 other things i am, i will review and merge some stuff
[15:49:15] <kaduk> Thanks.
[15:49:30] <kaduk> Anders has had a patch sitting around to avoid using tee in the linux kernel module build.
[15:50:13] <kaduk> (let me find it...)
[15:50:17] <wiesand> [my time is currently being eaten by CVE-2014-9322]
[15:51:01] <Jeffrey Altman> I would like to discuss 9588 (ptserver: limit length of namelist, idlist)
[15:51:18] <kaduk> 10513 is Anders' thing.
[15:51:24] <kaduk> I'm mostly curious how Marc feels about it.
[15:51:45] <kaduk> For 9588, I pushed an update which just used a larger constant limit, as you saw.
[15:52:35] <kaduk> Jeff, what about 9588 in particular?
[15:52:35] <Marc Dionne> i will have another look at 10513 - it's been a while
[15:53:00] <kaduk> Thanks.
[15:53:14] <Jeffrey Altman> My concern about 9588 is whether we can safely impose the limit in that manner.   There used to be a limit check in the Transarc days and it was simply a test in the server side RPC function to see whether the client was requesting a sane length value and rejecting it if it was not.
[15:53:37] <kaduk> There was also a limit in the RPC-L, though -- I'm not sure the server-side RPC function check did anything.
[15:53:53] <kaduk> Unless that was an athena-local patch or something.
[15:54:53] <kaduk> (When we finish this,) I'd also like to talk about 11629, which isn't verified since the BSD ffs() family of functions isn't portable and I haven't rewritten it to use log() yet.
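One portable replacement for the bit-scan helpers is a plain shift loop that finds the highest set bit, i.e. floor(log2(x)). This swaps in an integer technique rather than the floating-point `log()` mentioned above, which sidesteps rounding questions near exact powers of two. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: index of the highest set bit, floor(log2(x)),
 * using only portable C (integer shifts swapped in for log()).
 * Returns -1 for x == 0. */
static int floor_log2_u32(uint32_t x)
{
    int r = -1;
    while (x != 0) {
        x >>= 1;
        r++;
    }
    return r;
}

/* Largest power of two not exceeding x (0 for x == 0), e.g. for sizing
 * a hash table whose bucket count must be a power of two. */
static uint32_t pow2_floor_u32(uint32_t x)
{
    int b = floor_log2_u32(x);
    return b < 0 ? 0 : (uint32_t)1 << b;
}
```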
[15:57:34] <kaduk> I guess it's not actually effective unless it's enforced in the RPC stubs, anyway, since the server proc isn't called until after the allocation is complete, and the goal is to avoid letting the peer force ridiculously large allocations.
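Why the check must live in the stub can be shown with a small stream decoder for an XDR-style variable-length array (a 4-byte big-endian count followed by that many 4-byte elements). This is a hedged sketch, not rxgen output; `toy_stream` and `decode_u32_array` are hypothetical. Storage is allocated as soon as the count is read, before the elements arrive, so a limit checked later in the server procedure cannot stop a peer from forcing a huge allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory stand-in for an Rx call's receive stream. */
struct toy_stream {
    const unsigned char *p;   /* next unread byte */
    size_t left;              /* bytes remaining */
};

/* Reads one 4-byte big-endian word; -1 if the stream is exhausted. */
static int toy_read_be32(struct toy_stream *s, uint32_t *v)
{
    if (s->left < 4)
        return -1;
    *v = ((uint32_t)s->p[0] << 24) | ((uint32_t)s->p[1] << 16) |
         ((uint32_t)s->p[2] << 8) | (uint32_t)s->p[3];
    s->p += 4;
    s->left -= 4;
    return 0;
}

/* Returns 0 on success (caller frees *out), -1 on error. */
static int decode_u32_array(struct toy_stream *s, uint32_t max_count,
                            uint32_t **out, uint32_t *out_count)
{
    uint32_t count, *items;

    if (toy_read_be32(s, &count) != 0)
        return -1;
    /* Enforce the limit here, before malloc(); a check in the server
     * procedure would only run after this allocation had happened. */
    if (count > max_count || count > SIZE_MAX / sizeof *items)
        return -1;
    items = malloc(count ? count * sizeof *items : 1);
    if (items == NULL)
        return -1;
    for (uint32_t i = 0; i < count; i++) {
        if (toy_read_be32(s, &items[i]) != 0) {   /* peer sent fewer than promised */
            free(items);
            return -1;
        }
    }
    *out = items;
    *out_count = count;
    return 0;
}
```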
[15:58:32] <Jeffrey Altman> yeah but then my concern is we are making a protocol change
[15:59:07] <Jeffrey Altman> unless we extend rxgen to support generating code to enforce an implementation maximum
[15:59:55] <kaduk> Well, the concern about a protocol change is why I added a new constant with OPENAFS_ prefix, to try to indicate that it is in fact an implementation specific constant and not part of the protocol.
[16:00:40] <Jeffrey Altman> that doesn't help because it imposes a restriction on the client
[16:01:29] <Jeffrey Altman> what we want is an RPC stub that permits any length to be requested and for the server to reject requests for a length that is considered too large for that deployment
[16:01:47] <kaduk> I thought you also had concerns about the values sent back to the client.
[16:02:54] <kaduk> And it seems like maybe there is a "perfect is the enemy of the good" thing going on here -- I think I would prefer the current proposal in gerrit to nothing, even if the rxgen extension is what we really want to end up with.
[16:04:49] <Jeffrey Altman> I'm not finding any concerns I raised about the values sent back to the client on 9588 or in a quick search of my logs of this channel
[16:05:48] <kaduk> I see "prlist<> and prentries<> should also be limited.  Outputs from the server are inputs to the clients." in the fifth comment on 9588
[16:06:03] <kaduk> which I think is what I'm thinking of.
[16:12:17] <wiesand> Looks like Jeffrey left?
[16:12:19] <Jeffrey Altman> slightly different but similar issue as Andrew pointed out.   The ptserver cannot make trust assumptions about the client issuing the RPC but the client is making a trust assumption about the ptserver.   The prlist<> and prentries<> can be limited by a cap on the number of responses that a server is willing to send.  I am nervous about imposing protocol limits on pre-existing RPCs.
[16:12:27] <wiesand> [ok]
[16:12:57] <Jeffrey Altman> let's discuss 11629 quickly because I'm 12 minutes late
[16:13:11] <kaduk> Okay, we'll postpone 9588
[16:14:21] <kaduk> The current code is a bit ugly, but I wanted to ask about the general approach, of checking whether to just use an existing preset, then tacking off bits of the memory for the small things, and splitting between vhash and callbacks at the end
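The general shape of that approach (reserve fixed amounts for the small tables, then divide the remainder between the volume hash and the callback table) might look roughly like the sketch below. Every constant is invented for illustration and is not taken from the change under review, all names are hypothetical, and the preset lookup step is omitted.

```c
#include <stdint.h>

/* Hypothetical output of the split. */
struct toy_tuning {
    uint64_t vhash_entries;
    uint64_t callback_entries;
};

/* All values below are made up for illustration only. */
#define TOY_SMALL_TABLES_BYTES  (4u << 20)  /* fixed reserve for small tables */
#define TOY_VHASH_ENTRY_BYTES   64u
#define TOY_CB_ENTRY_BYTES      128u
#define TOY_VHASH_SHARE_PCT     25u         /* share of the remainder for vhash */

/* Returns 0 on success, -1 if the budget cannot cover the fixed reserve. */
static int toy_split_budget(uint64_t budget_bytes, struct toy_tuning *out)
{
    if (budget_bytes < TOY_SMALL_TABLES_BYTES)
        return -1;
    uint64_t rest = budget_bytes - TOY_SMALL_TABLES_BYTES;
    /* Divide first so the multiplication cannot overflow; this rounds down. */
    uint64_t vhash_bytes = rest / 100 * TOY_VHASH_SHARE_PCT;
    uint64_t cb_bytes = rest - vhash_bytes;   /* callbacks get the remainder */
    out->vhash_entries = vhash_bytes / TOY_VHASH_ENTRY_BYTES;
    out->callback_entries = cb_bytes / TOY_CB_ENTRY_BYTES;
    return 0;
}
```

With a fixed percentage split, rebalancing in a stable release (say, giving vhash more) would come down to changing `TOY_VHASH_SHARE_PCT`, which is the kind of behavioural change raised a few lines further on.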
[16:14:43] <Jeffrey Altman> one final thought on 9588, would a single limit on the size of memory to be allocated by rxgen generated code across all RPC be sufficient?
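A single memory limit for all generated decoders could replace per-array element counts with a byte-budget check. A minimal sketch: `toy_alloc_allowed` is hypothetical, and passing the budget per call is an assumption made so that data-transfer RPCs could be given a larger budget. Dividing the budget avoids overflowing `count * elem_size` before the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check a generated decoder could make before allocating
 * count elements of elem_size bytes. Returns 1 if the allocation fits
 * within budget bytes, 0 otherwise. */
static int toy_alloc_allowed(uint32_t count, size_t elem_size, size_t budget)
{
    if (elem_size == 0)
        return 1;
    /* count * elem_size <= budget  <=>  count <= budget / elem_size
     * (integer division), and this form cannot overflow. */
    return count <= budget / elem_size;
}
```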
[16:15:18] <kaduk> A single rxgen-wide limit might be sufficient, but I'd want to check whether we have any RPCs in data-transfer paths that might reasonably send very large chunks.
[16:16:13] <kaduk> Another question about 11629 is whether we would be locking ourselves in to the particular breakdown, or if we would be okay rebalancing things during a stable release, e.g., if we wanted to give more to volcache and less to vhash or something.
[16:17:51] <kaduk> Oh, and when listing out the 1.8-related things that are ongoing/need work, we have an email thread or two that should get continued.
[16:17:55] <Jeffrey Altman> tuning a file server and a vol server for optimal performance is very dependent upon the number of volumes, type of volumes, and number and type of vnodes stored on the server in addition to the usage pattern.
[16:19:54] <kaduk> That doesn't mean we should require everyone to do a lot of tuning just to get something with decent performance.
[16:21:11] <wiesand> Is there a way to query the actual values from a running server?
[16:21:24] <kaduk> query which values?
[16:21:35] <wiesand> All tunables.
[16:22:03] <Jeffrey Altman> I think you should focus improvements on -udpsize, -rxpck, -p, -b, -cb
[16:23:58] <kaduk> Okay.
Does continuing along this route seem reasonable, though?
[16:24:19] <Jeffrey Altman> wiesand: what would be necessary is to either scan the partitions as part of startup to count volumes by type and vnodes by type, or to maintain a count of these values that are updated during runtime and saved at shutdown.   When YFS does evaluations of cells as part of our support contracts one of the most important functions we do is to analyze usage patterns and the sizes and distributions of data in volumes on servers to determine how the server should be tuned.
[16:26:07] <Jeffrey Altman> what we see at many sites is that the administrators simply allocate large amounts of memory for every possible setting and that can hurt performance
[16:26:11] <wiesand> Makes sense. But what I was after is a way for the admin to find out what the runtime value of, say, numberofcbs is after autotuning.
[16:26:34] <meffie> ah.
[16:26:45] <wiesand> If not, the results should at least be logged...
[16:28:18] <Jeffrey Altman> Callbacks are an interesting example.   Increasing the callback table size well beyond the actual need is a performance hit because it increases the time necessary to perform garbage collection operations.   These operations are done on a periodic basis and hold a global lock.  While that lock is held no RPCs can be initiated or completed.
[16:29:44] <Jeffrey Altman> If the size of the callback table is too small, then there is callback table thrashing.
[16:29:48] <kaduk> So, we can document that "setting -auto N to very large values of N can reduce performance if the fileserver does not receive high levels of client traffic; experiment with different settings of N to find what is best for your site".
[16:31:01] <Jeffrey Altman> wiesand: you can use rxstats to obtain info about the callback allocations at runtime
[16:31:28] <kaduk> Anyway, we should probably let Jeff get to his other thing, since he's now 30 minutes late...
[16:31:34] <wiesand> Ah, ok.
[16:31:34] <Jeffrey Altman> thank you
[16:31:40] <wiesand> Thanks Jeff!
[16:31:41] <kaduk> Thanks for staying this long.
[16:31:46] <Jeffrey Altman> Everyone have a wonderful holiday
[16:31:54] <kaduk> You, too
[16:31:57] <meffie> happy holidays!
[16:32:30] <Jeffrey Altman> I will be offline from Xmas to New Years for the most part.
[16:32:58] <wiesand> Good for you :-)
[16:33:23] <Jeffrey Altman> Ben, I merged what I could that I didn't want to review further.  If you can rebase the conflicts I or Daria will press submit later.
[16:33:35] <Jeffrey Altman> bye
[16:33:35] <kaduk> *nods*
[16:33:36] <kaduk> thanks
[16:34:05] <wiesand> Shall we call it a day then? Anything else to discuss?
[16:34:15] <wiesand> (it’s getting late for me too)
[16:34:55] <kaduk> Let's call it a day.
[16:34:58] <kaduk> Happy holidays!
[16:35:11] <wiesand> Fine. Happy holidays everyone!
[16:35:34] <Marc Dionne> cheers everyone, happy holidays
[16:35:41] <wiesand> I’ll be online regularly, most of the time.
[16:36:02] <wiesand> So if there’s something to look at, I will ;-)
[16:36:45] <wiesand> Thanks everyone. Bye, see you next year ;-)
[16:36:51] wiesand leaves the room
[16:37:02] Marc Dionne leaves the room
[16:54:21] meffie leaves the room
[17:03:13] <jhutz@jis.mit.edu/owl> > we are making a protocol change
... or you're imposing an implementation limit. The fact that you're doing so by changing the .xg file does not change whether it's a protocol change. I'm pretty sure I commented on that change to this effect, but reality is that implementations need to be able to impose limits, and there is a limit on the name list length _whether or not you write one in the .xg file_, because you do not have an infinite amount of memory or address space.
[17:03:44] <jhutz@jis.mit.edu/owl> A number in <> in a .xg file causes rxgen to emit code to enforce a maximum. It does not change the encoding.
[17:47:06] kaduk leaves the room