release-team@conference.openafs.org
Friday, March 8, 2019

GMT+0
[13:00:01] wiesand joins the room
[13:43:56] kaduk@jabber.openafs.org/barnowl leaves the room
[13:44:00] kaduk@jabber.openafs.org/barnowl joins the room
[13:58:55] meffie joins the room
[14:00:16] <wiesand> Good morning
[14:00:25] <meffie> good day
[14:00:45] <kaduk@jabber.openafs.org/barnowl> greetings
[14:01:12] <mvita> hi
[14:01:30] <kaduk@jabber.openafs.org/barnowl> Should I be tagging pre1 yet?
[14:01:40] <wiesand> I'm in the process of uploading the 1.8.3pre1 candidate sources to g.c.o.
[14:02:02] <meffie> yay
[14:02:08] <wiesand> Mike promised to smoke test - maybe wait with the tag for that
[14:02:16] <kaduk@jabber.openafs.org/barnowl> Sure
[14:02:26] <mvita> sorry I did not get the Solaris CTF verified yet - doing it now
[14:02:55] <mvita> (13487)
[14:10:07] <wiesand> upload running
[14:10:44] <wiesand> And that's what I have on the stable series for today
[14:11:22] <kaduk@jabber.openafs.org/barnowl> master also had a slow week with me pushing to prep for yesterday's
IESG telechat.
[14:11:42] <kaduk@jabber.openafs.org/barnowl> I did get a few rxgk patches merged, though -- HEPIX is close enough
that the pressure is building
[14:12:49] <wiesand> looks like CERN is getting serious about moving home directories from AFS :-(
[14:12:59] <meffie> still waiting to hear from hepix whether the openafs release team report has been accepted. the call for papers ends on the 17th.
[14:13:06] <kaduk@jabber.openafs.org/barnowl> The IBM folks reported some "exciting" issues where an expired token
could show up as RXGEN_CC_MARSHALL (-450) that Yadav and Andrew seem
to have tracked down
[14:13:37] <kaduk@jabber.openafs.org/barnowl> So that's in 13510, but I held off on merging for what is basically a
style question.
[14:14:46] <kaduk@jabber.openafs.org/barnowl> We should probably get back into the habit of looking at RT in these
meetings -- https://rt.central.org/rt/Ticket/Display.html?id=134904 is
getting pretty frustrating for the folks at MIT
[14:15:55] <kaduk@jabber.openafs.org/barnowl> The symptoms look like we're giving the kernel a negative dentry for a
volume mountpoint and the kernel is caching that and not calling back
into us, but it's hard to be sure with what is known so far.
[14:16:35] <kaduk@jabber.openafs.org/barnowl> This showed up when they moved from 1.6 to 1.8 on the "public" dialups
[14:17:00] <mvita> ah, ok
[14:17:26] <wiesand> I was about to say that I don't understand #134900
[14:18:22] <kaduk@jabber.openafs.org/barnowl> (I don't either, but that's unsurprising)
[14:19:01] <wiesand> I agree that #134904 sounds bad
[14:19:57] <wiesand> Could it be related to the getcwd() issue?
[14:20:31] <kaduk@jabber.openafs.org/barnowl> I can't rule that out yet
[14:20:45] <wiesand> Is there a reproducer?
[14:21:07] <kaduk@jabber.openafs.org/barnowl> No reproducer :(
[14:22:23] <wiesand> Is it always the same volume?
[14:22:41] <kaduk@jabber.openafs.org/barnowl> no; we've seen it on lots of volumes
[14:22:57] <kaduk@jabber.openafs.org/barnowl> And I am only mostly sure that it's only been volume roots and not
regular directories
[14:24:27] <wiesand> just like getcwd()
[14:24:42] <kaduk@jabber.openafs.org/barnowl> I guess maybe we could have them try a patch to
afs_linux_dentry_revalidate() that always forces revalidation of
cached negative dentries for mountpoints (or volume roots?) and see if
that changes anything.
[14:25:09] <kaduk@jabber.openafs.org/barnowl> But I think maybe Andrew has the best understanding of this part of
the linux VFS, among us
[14:25:28] <meffie> yes, i'll see if he has time to look at this one.
[14:26:01] <kaduk@jabber.openafs.org/barnowl> thanks!
[14:26:46] <wiesand> upload complete, verification (on a different, 1.8.3.pre1) client successful - happy testing :)
[14:26:54] <kaduk@jabber.openafs.org/barnowl> yay
[14:27:00] <meffie> thank you.
[14:29:12] <kaduk@jabber.openafs.org/barnowl> Anything in gerrit that I should prioritize looking at?
[14:30:47] <meffie> nothing other that the marshal thing you already mentioned.
[14:30:57] <meffie> than the
[14:31:11] <kaduk@jabber.openafs.org/barnowl> I'll probably go ahead and merge that on monday even with no further
comments, if I remember.
[14:31:22] <meffie> ok thank you.
[14:31:46] <kaduk@jabber.openafs.org/barnowl> Though I guess we should check whether Stephan read the commit message
and/or comments there, that point out another patch not yet on the
stable branches that would mitigate some of the effects
[14:31:55] meffie leaves the room
[14:32:04] meffie joins the room
[14:33:52] <wiesand> reading…
[14:35:22] meffie downloading 1.8.3pre1 from g.c.o.
[14:39:38] <wiesand> I guess I'll pull up 13288 then...
[14:40:05] <kaduk@jabber.openafs.org/barnowl> Seems worthwhile; thank you
[14:41:12] <wiesand> How serious/frequent is this? Does it warrant a 1.8.3pre2, or can it wait for 1.8.4?
[14:42:10] <kaduk@jabber.openafs.org/barnowl> Mike?
I don't remember it being serious, but I also don't remember it much
at all.
[14:42:56] <meffie> i think it is moderately serious.
[14:43:26] <wiesand> 13288 pullup is 13515
[14:43:27] <meffie> the frequency depends on how many times the error cases are hit
[14:43:38] <meffie> (e.g. token expiry)
[14:44:14] <wiesand> it's basically a wrong error message?
[14:44:52] <meffie> are we talking about 13510?
[14:45:06] <wiesand> that and 13288
[14:45:46] <meffie> one sec, let me remind myself about 13288
[14:47:14] <wiesand> ok, 13288 fixes a connection leak
[14:47:58] <meffie> yes i think so. it was originally done for 13290 (avoid stalled fileservers)
[14:48:14] <meffie> it is a prereq
[14:49:24] <meffie> it fixes the connection leak and also marks servers down in these cases.
[14:49:42] <wiesand> ouch, idledead…
[14:50:26] <meffie> no, not idle dead :) one thing users would see is long delays as we retry servers that are down.
[14:51:15] <meffie> there were cases in which we did not properly call afs_Analyze()
[14:51:23] <wiesand> "When this happens, rx calls will fail with idle-dead…"
[14:51:33] <wiesand> (13290)
[14:52:15] <meffie> oh, i thought you meant 13288.
[14:54:04] <wiesand> ah, I already commented on 13290 with my idledead foo ;-)
[14:54:07] <meffie> 13290 is for a situation where the fileserver is stalling on disk i/o
[14:55:13] <meffie> for just read-only data that is.
[14:56:18] <wiesand> yes, and client side idledead was introduced in 1.4.8 to handle such situations, and it took 8 months to fix the breakage
[14:56:18] <meffie> the idea is if the server hardware is stalling, then try to detect it, and try a different server.
[14:56:37] <wiesand> so, all changes in this area make me nervous - can't help it, sorry
[14:56:59] <meffie> i thought idle dead was to work around fileserver "meltdowns" or some such.
[14:58:03] <wiesand> what's the difference between a server meltdown and a server stall (from the client's point of view)?
[14:59:21] <meffie> hmm, good point. i thought it was a different situation.
[15:00:14] <meffie> i can remove the idle dead comment in the commit message if that helps :)
[15:00:28] <mvita> heh
[15:00:41] <wiesand> now I remember that the "for just read-only data" calmed me down last summer
[15:00:46] <mvita> don't agitate, meffie
[15:02:13] <wiesand> meffie: s/ comment.*// and you'll make me very very happy ;-)
[15:02:26] <mvita> meffie and I are now in 2 mtgs
[15:02:28] <meffie> yes, this is just for read-only data. if we see a server can't serve read requests, we just try to pick a different one, and if we can find a different one, we retry the one that was "stalled". so we are just more robust in selecting servers.
[15:03:26] <wiesand> I think we're mostly finished anyway… Anything else, anyone?
[15:03:41] <kaduk@jabber.openafs.org/barnowl> Not from here
[15:04:19] <meffie> i have the tar files, i'll build and smoke test them and report back to release-team mail list?
[15:04:45] <meffie> also, did you see that some of our buildbots will be offline on tuesday?
[15:04:49] <wiesand> Sure. Thanks!
[15:05:19] <wiesand> er, no I hadn't noticed the buildbot outage
[15:05:49] <meffie> sorry for the outage. did you need me to find a temporary home for tuesday?
[15:06:51] <wiesand> I'm trying to find out what I missed...
[15:07:17] <meffie> ah, i posted some messages to release-team@openafs.org
[15:07:31] <wiesand> ah, found it...
[15:08:11] <wiesand> I'm fine with that, thank you.
[15:08:53] <kaduk@jabber.openafs.org/barnowl> fine with outage or temporary home? ;)
[15:09:05] <wiesand> Means I can spend Tuesday evening on something else than openafs, in good conscience ;-)
[15:09:16] <wiesand> fine with the outage
[15:10:34] <wiesand> Let's adjourn then. Thanks a lot everyone!
[15:11:07] <kaduk@jabber.openafs.org/barnowl> Yes, thanks everyone!
[15:11:43] wiesand leaves the room
[15:13:00] <meffie> thanks have a good weekend
[22:11:28] meffie leaves the room
[23:37:57] mvita leaves the room