release-team@conference.openafs.org
Wednesday, November 1, 2017

GMT+0
[13:07:04] meffie joins the room
[14:07:39] <meffie> oh drat, the linux builder is failing with
    --> 'kernel-ppa install'
    dpkg: error: cannot access archive 'linux-*.deb': No such file or directory
[14:08:07] <meffie> i'll have to take a look.
[14:09:11] mvita leaves the room
[14:28:50] mvita joins the room
[15:01:27] <kadukoafs@gmail.com/barnowl0AF0C398> Greetings
[15:01:35] <mvita> hello.  Are we meeting now, or in an hour (Europe is now standard time, US is still on daylight time)
[15:01:58] <kadukoafs@gmail.com/barnowl0AF0C398> A fascinating question.
[15:02:07] <kadukoafs@gmail.com/barnowl0AF0C398> I'm happy to defer to Stephan.
[15:02:36] <kadukoafs@gmail.com/barnowl0AF0C398> mvita: sorry about the multiple rounds of updates on the rx-event
stuff, by the way
[15:02:45] <mvita> oh, no problem.
[15:02:53] <mvita> it's a big patch
[15:03:04] wiesand joins the room
[15:03:04] <kadukoafs@gmail.com/barnowl0AF0C398> Yup, it is.
[15:03:20] <kadukoafs@gmail.com/barnowl0AF0C398> I briefly considered splitting it up by event (so, into 9 patches),
but that seemed a little silly.
[15:03:41] <mvita> Yes, all those in one patch is better.
[15:03:49] <wiesand> Hello
[15:03:56] <mvita> Hi Stephan.
[15:04:13] <wiesand> Not much to discuss today from my point of view
[15:04:19] <wiesand> except getcwd()
[15:04:21] <mvita> This is an hour earlier for you, is it not?
[15:04:40] <wiesand> Mark, you said you can now reproduce it at will
[15:04:55] <wiesand> Yes, we're back to standard time already
[15:05:00] <mvita> Yes, and I have also reproduced it without shakeloose running.
[15:05:24] <wiesand> Would you share your reproducer?
[15:06:17] <mvita> it's essentially the same as yours, but with some code and config changes to make it happen sooner.
[15:06:47] <mvita> I changed the interval for shakeloose, and I run with a small stat cache:  -stat 2000
[15:07:31] <mvita> Also, I'm doing it in the volume root, not in a subdir.
[15:07:47] <wiesand> But then you also "reproduced it w/o shakeloose"?
[15:08:00] <kadukoafs@gmail.com/barnowl0AF0C398> I assume that means "without try-harder"
[15:08:19] <mvita> I mean it happened before shakeloose ran
[15:08:40] <mvita> I have not reverted anything yet for bisection of the problem
[15:09:49] <wiesand> I guess you haven't checked whether the EL7.4 actually makes it worse?
[15:09:54] <mvita> o far I've done my debugging by syslog msgs, but that is cumbersome and sometimes swamps the log
[15:09:57] <mvita> So
[15:10:00] <mvita> gah
[15:10:18] <mvita> No, I have not checked that yet.
[15:10:55] <mvita> I have identified a few candidate Linux changes at 7.4 that could be affecting this.
[15:11:03] <mvita> but haven't gone further with that yet
[15:11:15] <mvita> Yesterday I installed systemtap and got that working.
[15:11:53] <mvita> I think things will go faster once I'm able to catch the cwd dentry being unhashed
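The symptom under discussion can be probed from user space. The following is a minimal sketch, not the reproducer mentioned above: it assumes the failure surfaces as ENOENT from getcwd(), which Python reports as FileNotFoundError. The function name cwd_is_resolvable and its getcwd parameter are hypothetical, introduced only for illustration.

```python
import os


def cwd_is_resolvable(getcwd=os.getcwd):
    """Return True if the current working directory can still be resolved.

    Assumption: a cwd whose dentry has been unhashed makes getcwd() fail
    with ENOENT, which Python raises as FileNotFoundError.
    """
    try:
        getcwd()
    except FileNotFoundError:
        return False
    return True


if __name__ == "__main__":
    # Run from inside the directory under test, e.g. between git commands.
    print("cwd ok" if cwd_is_resolvable() else "getcwd() failed with ENOENT")
```

The getcwd parameter exists only so the check can be exercised without having to trigger the real failure.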
[15:12:25] <wiesand> Thanks for looking into this.
[15:12:37] <mvita> Notre Dame reported that the problem does not appear on older kernels
[15:12:41] <kadukoafs@gmail.com/barnowl0AF0C398> seems likely to go faster then, yes, and thank you for looking into it
[15:13:10] <wiesand> I'm not sure that nd confirmed they ran 1.6.21 on older kernels
[15:14:17] <mvita> The Linux bisection is very manual because RH doesn't publish the commits that go into a kernel release.
[15:14:48] <wiesand> Not anymore, sadly.
[15:14:48] <mvita> Oracle has their "RedPatch" project - but they haven't got anything for this RH release yet.
[15:15:22] <mvita> https://oss.oracle.com/git/gitweb.cgi?p=redpatch.git;a=summary
[15:16:30] <wiesand> Last commit in January - looks dead :-(
[15:16:44] <mvita> so I've been doing side by side comparisons of linux-stable logs and code with CentOS 7.4 source
[15:16:58] <mvita> just for the critical dcache bits
[15:17:16] <wiesand> sounds like a lot of work
[15:17:31] <mvita> anyway, as I said, I'm hoping systemtap will be more efficient once I get over the learning curve.
[15:18:06] <mvita> More efficient means help me find the problem more quickly
[15:19:19] <wiesand> Let's see what you'll have found next week. I think 1.6.21.2 is not super urgent even if they release Linux 4.14 this weekend.
[15:20:19] <wiesand> I'll try the stat cache trick.
[15:20:43] <wiesand> Anything else to discuss regarding 1.6?
[15:20:54] <mvita> first do 'cmdebug <client> -cache' to see what current stat value is
[15:21:39] <mvita> then 'cmdebug <client> -long' and count how many vcaches you have in use
[15:22:13] <mvita> you want to make sure that any test you run (like the git commands you are using) actually creates more vcaches than you've got configured for -stat
[15:22:37] <wiesand> ok, got it
[15:22:48] <mvita> I call that 'vcache pressure'
[15:23:09] <mvita> it's what shakeloose is designed to deflate
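The check described above (compare the number of vcaches in use against the configured -stat value) could be sketched roughly as follows. This is an illustration only: the helper names are hypothetical, and the "** Cache entry" prefix used to count entries is an assumption about what 'cmdebug <client> -long' prints, so adjust it to match the actual output.

```python
def count_lines_with_prefix(text, prefix):
    """Count the lines of text that start with prefix."""
    return sum(1 for line in text.splitlines() if line.startswith(prefix))


def has_vcache_pressure(vcaches_in_use, stat_limit):
    """True when a workload has created more vcaches than -stat allows for."""
    return vcaches_in_use > stat_limit


if __name__ == "__main__":
    # Made-up sample standing in for saved 'cmdebug <client> -long' output;
    # the line format here is assumed, not copied from a real client.
    sample = "\n".join([
        "** Cache entry (sample 1)",
        "    detail line",
        "** Cache entry (sample 2)",
        "** Cache entry (sample 3)",
    ])
    in_use = count_lines_with_prefix(sample, "** Cache entry")
    print(in_use, has_vcache_pressure(in_use, stat_limit=2))
```

With -stat 2000, as used in the discussion, a test would need to touch more than 2000 distinct files or directories before this check reports pressure.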
[15:23:42] <wiesand> NB is -disable-dynamic-vcaches likely to help?
[15:24:02] <kadukoafs@gmail.com/barnowl0AF0C398> I thought dynamic vcaches was mandatory on linux
[15:24:09] <kadukoafs@gmail.com/barnowl0AF0C398> or was that OS X?
[15:24:41] <mvita> well, that's a good question, but I haven't tried it myself.
[15:24:59] <mvita> I don't remember if you can still do that on recent versions of Linux.
[15:25:51] <mvita> I'm taking a note to research -disable-dynamic-vcaches
[15:26:30] <wiesand> If it's not available on linux, we should fix the manpage
[15:27:28] <kadukoafs@gmail.com/barnowl0AF0C398>     if (cmd_OptionPresent(as, OPT_nodynvcache)) {
#ifdef AFS_MAXVCOUNT_ENV
       afsd_dynamic_vcaches = 0;
#else
       printf("afsd: Error toggling flag, dynamically allocated vcaches not supported on your platform\n");
       exit(1);
#endif
[15:28:19] <kadukoafs@gmail.com/barnowl0AF0C398> So I guess what I'm remembering is that dynamic vcaches in general are
a linux-only thing.
[15:28:40] <wiesand> Yes, that's what I thought
[15:28:48] <kadukoafs@gmail.com/barnowl0AF0C398> Sorry for the confusion.
[15:29:16] <wiesand> np
[15:29:31] <wiesand> So, on/back to 1.8/master?
[15:30:23] <kadukoafs@gmail.com/barnowl0AF0C398> My long-promised patch for rx event handling is finally in gerrit, and
Mark took a (quick?) look at it.
[15:30:30] <kadukoafs@gmail.com/barnowl0AF0C398> But I don't know how close to happy he is with it.
[15:31:02] <mvita> I haven't had a chance to look at the new one(s) yet
[15:32:01] <mvita> And the first look was _not_ "quick".
[15:32:11] <kadukoafs@gmail.com/barnowl0AF0C398> I split out the MUTEX_ASSERT()s into a separate commit and reverted
the putConnection-->rx_DestroyConnection changes, since we have a
housekeeping thread to clean up connections with refcount 0.
[15:32:26] <kadukoafs@gmail.com/barnowl0AF0C398> > the first look was _not_ "quick".
that's reassuring to know.
[15:32:27] <meffie> (oops, sorry lost track of time!)
[15:32:55] <kadukoafs@gmail.com/barnowl0AF0C398> There was also a bug that only showed up on the server side, since the
Challenge event handler can be called with NULL event, and would then
try to put a nonexistent event reference.
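The class of bug described here can be illustrated with a small sketch. It is not the rx code: the Event class and handle_challenge_event function are hypothetical stand-ins that only show the shape of the fix, which is to release the event reference only when the handler was actually invoked with an event.

```python
class Event:
    """Hypothetical refcounted event, standing in for an rx event."""

    def __init__(self):
        self.refcount = 1

    def put(self):
        # Dropping a reference that does not exist is the bug to avoid.
        if self.refcount <= 0:
            raise RuntimeError("put on an event with no references")
        self.refcount -= 1


def handle_challenge_event(event):
    """Handler that may be run from the event queue or called directly.

    When called directly there is no event (None), so there is no
    reference to release.
    """
    if event is not None:
        event.put()
    return "challenge handled"


if __name__ == "__main__":
    queued = Event()
    handle_challenge_event(queued)  # releases the queue's reference
    handle_challenge_event(None)    # direct call: nothing to release
    print(queued.refcount)
```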
[15:36:29] <meffie> > Anything else to discuss regarding 1.6?
wiesand: any news on macos users/testers?
[15:37:33] <wiesand> no :-(
[15:37:48] <wiesand> but still hope
[15:37:59] <kadukoafs@gmail.com/barnowl0AF0C398> (contributing to conversation fork)
Since the rx event stuff is believed to be the only 1.8 blocker, of
course additional reviewers than Mark are most welcome.
[15:38:14] <meffie> ok, we have another person in SNA that just upgraded to high sierra and will be running the client.
[15:39:44] <meffie> > course additional reviewers than Mark are most welcome.
ok, i'll review too.
[15:39:52] <kadukoafs@gmail.com/barnowl0AF0C398> Thanks, Mike!
[15:40:12] <meffie> and Andrew told me he has started reading the rxgk patches in gerrit.
[15:41:10] <kadukoafs@gmail.com/barnowl0AF0C398> That was going to be my next item :)
[15:41:33] <kadukoafs@gmail.com/barnowl0AF0C398> No comments in gerrit yet, so hopefully that means things are not
completely crazy.
[15:41:34] <meffie> heh, sorry to get ahead.
[15:42:11] <kadukoafs@gmail.com/barnowl0AF0C398> No worries, I think it was time to move to master anyway
[15:42:28] <kadukoafs@gmail.com/barnowl0AF0C398> Well, I guess I can note the level of testing I gave the rx-event
patch.
[15:42:36] <meffie> ok
[15:43:11] <kadukoafs@gmail.com/barnowl0AF0C398> In that I have a single local VM that is both server and client for my
test cell.  Which is more of a "smoke test" than a real stress test,
and I haven't added tracing to check if (e.g.) the nat ping events are
firing at all.
[15:43:41] <kadukoafs@gmail.com/barnowl0AF0C398> back to master: any other topics?  I haven't looked at the high sierra
or pthread-conversion changes yet.
[15:44:44] <meffie> i pushed changes to convert upserver and upclient to pthreads.
[15:45:02] <kadukoafs@gmail.com/barnowl0AF0C398> exciting
[15:45:11] <meffie> they were already single threaded at the application layer, so it was just a matter of makefile foo.
[15:45:26] <kadukoafs@gmail.com/barnowl0AF0C398> *nods*
[15:45:32] <meffie> try to contain your excitement ;)
[15:46:03] <meffie> so, that leaves just kaserver (and WINNT bit) left using src/lwp
[15:46:11] <meffie> i think.
[15:46:57] <meffie> should we convert kaserver or just move lwp under it?
[15:47:42] <meffie> i feel we should convert it and kill off src/lwp
[15:47:57] <mvita> ooh
[15:48:03] <mvita> no, move lwp under it
[15:48:15] <kadukoafs@gmail.com/barnowl0AF0C398> "whichever is easier", which kind of feels like move lwp into
src/kauth, but I haven't looked at the other option.
[15:48:17] <mvita> no one can test a pthreaded conversion adequately
[15:48:31] <mvita> s/can/will/
[15:48:47] <meffie> yes, but that would teach you for using kaserver anyway. let that be a lesson :)
[15:50:02] <meffie> if we just move it, we still have to make sure it builds, and no one will be using it anyway.
[15:50:21] <meffie> so, "easier" long term may be to kill it.
[15:50:28] <mvita> "no functional change incurred by this commit"
[15:50:39] <kadukoafs@gmail.com/barnowl0AF0C398> > kill it
LWP, or kaserver?
[15:50:44] <meffie> LWP
[15:51:01] <meffie> sorry for the pronoun.
[15:51:01] <mvita> we have better things to do.
[15:51:27] <wiesand> I just managed to reproduce getcwd, 2 out of 2 times
[15:51:35] <meffie> yay.
[15:51:42] <kadukoafs@gmail.com/barnowl0AF0C398> Yeah, we have so many important things to do and kauth and LWP are so
unimportant that we should try to minimize the time we spend on them.
[15:51:51] <mvita> are you running with afsd -fakestat?
[15:52:47] <mvita> Stephan^
[15:52:55] <wiesand> yes
[15:52:59] <mvita> ND is using -fakestat, and so I have been as well
[15:53:05] <mvita> I need to try without it.
[15:53:13] <mvita> now that I can reproduce it reliably.
[15:53:29] <mvita> thank you Stephan.
[15:53:34] <wiesand> I'm just trying w/ -disable-dynamic-vcaches
[15:53:40] <mvita> what's your kernel?
[15:53:59] <mvita> (and I presume you are OpenAFS 1.6.21.1)
[15:54:06] <wiesand> 3.10.0-693.5.2.el7
[15:54:16] <mvita> okay.
[15:54:29] <wiesand> yes, 1.6.21.1
[15:54:40] <mvita> 3.10.0-693.2.2.el7.x86_64 here
[15:54:44] <wiesand> but note i'm not working in the volume root
[15:54:57] <mvita> ah, thank you for that data point as well.
[15:55:19] <mvita> did you reduce -stat?
[15:55:33] <wiesand> talking of data points, i had tried fs flushall and echo 3 >>/proc/sys/vm/drop_caches, to no avail
[15:55:40] <wiesand> yes, to 2000
[15:55:51] <mvita> right, I wouldn't expect those to help
[15:56:01] <mvita> But they are always worth trying.
[15:56:51] <wiesand> -disable-dynamic-vcaches doesn't help either
[15:57:06] <mvita> OK, thank you for trying that.
[15:58:20] <mvita> Oh, sorry, I must drop out of the mtg.
[15:59:03] <kadukoafs@gmail.com/barnowl0AF0C398> Okay, thanks for being here
[15:59:18] <kadukoafs@gmail.com/barnowl0AF0C398> And I guess we are probably set to adjourn anyway?
[15:59:18] <wiesand> I have to run too..
[15:59:23] <wiesand> Thanks everyone!
[15:59:33] <kadukoafs@gmail.com/barnowl0AF0C398> Thanks everyone!
[15:59:40] wiesand leaves the room
[16:01:27] meffie leaves the room
[17:02:59] meffie joins the room
[17:33:02] meffie leaves the room