- Wednesday, October 11, 2017

release-team@conference.openafs.org

GMT+0
[02:13:17] mvita leaves the room
[02:20:13] mvita joins the room
[13:01:10] mvita leaves the room
[13:10:00] mvita joins the room
[13:16:00] mvita leaves the room
[13:57:34] mvita joins the room
[14:02:12] mvita leaves the room
[15:02:19] wiesand joins the room
[15:02:47] <wiesand> Good morning USA
[15:02:59] <kadukoafs@gmail.com/barnowl772461F2> Greetings
[15:03:37] <wiesand> SNA seem to be notably absent...
[15:04:41] <kadukoafs@gmail.com/barnowl772461F2> Perhaps they have another meeting that is going long
[15:04:51] <wiesand> Anyway, I think re 1.6 we're heading for a 1.6.21.2
[15:04:52] <kadukoafs@gmail.com/barnowl772461F2> I will take advantage of this gap between meetings to go make another
cup of tea
[15:05:45] <wiesand> Sounds like a good plan, please go ahead ;-)
[15:06:32] <kadukoafs@gmail.com/barnowl772461F2> 1.6.21.2 sounds reasonable.  (Or is it sad, that we have to keep
adapting to changing Linux?)
[15:07:11] <wiesand> Meanwhile: I just pulled up the Linux 4.14 changes: gerrit  12734..5
[15:08:29] <wiesand> It is sad, but that we are keeping up is much more than I'd have hoped for 12 to 18 months ago!
[15:09:33] <wiesand> The buildbot verdict on those pullups is overdue. I haven't found the culprit yet.
[15:09:48] <kadukoafs@gmail.com/barnowl772461F2> Could you make pull-ups to 1.8 first/too?
[15:10:22] <wiesand> Ouch, sorry for that. I'll correct that tomorrow.
[15:10:27] <kadukoafs@gmail.com/barnowl772461F2> rhel5_x86_64-builder (offline, plus 296)
opensuse13-arm-builder (offline, plus 44)
ouch
[15:10:34] <kadukoafs@gmail.com/barnowl772461F2> No worries, and thanks.
[15:11:22] <kadukoafs@gmail.com/barnowl772461F2> Ah, but you're waiting for the fedora20-x86_64 builder, I think, which
is just (offline, plus 2)
[15:11:41] <kadukoafs@gmail.com/barnowl772461F2> Maybe that one is Derek Atkins; I'm not sure.
(Also, isn't fedora 20 ancient history?)
[15:12:13] <wiesand> probably… anyway, we should really do master -> 1.8.x -> 1.6.x
[15:12:35] <wiesand> It's been an unusually long day here...
[15:13:01] <wiesand> Fedora 24 was just retired...
[15:13:40] <kadukoafs@gmail.com/barnowl772461F2> Time for you to go home?
[15:13:55] <wiesand> I am at home ;-)
[15:14:07] <kadukoafs@gmail.com/barnowl772461F2> Ah, good.
[15:14:37] mvita joins the room
[15:15:15] <mvita> oh, sorry, yes Mike and I are in offsite meetings all week
[15:15:19] <wiesand> Actually, there's a law that would disallow that I'm still working
[15:15:45] <kadukoafs@gmail.com/barnowl772461F2> Anyway, just to keep you informed, I did get to look at the "rx event"
issue(s) we talked about the last couple of weeks, and have a quite
solid understanding of why the test fails.  Determining whether the
actual rx.c code suffers from similar issues will require going
through and auditing all the events/rxevent_Cancel() callers, though.
[15:15:47] <mvita> this is the first I've been allowed to open my laptop for about an hour
[15:15:54] <wiesand> Mark: still glad you're here
[15:16:06] <kadukoafs@gmail.com/barnowl772461F2> > law that would disallow that I'm still working
Ah, sounds like Europe :)
[15:17:59] <wiesand> Ben: it's not all bad… the makers of that law didn't exactly have me in mind I guess
[15:18:27] <kadukoafs@gmail.com/barnowl772461F2> Oh, I wasn't trying to say it's bad; it's just different.
[15:19:56] <wiesand> re rx: good news
[15:20:57] <mvita> Ben, I just had an occasion to look through all the rx events for another reason last week
[15:21:01] <mvita> an
[15:21:22] <kadukoafs@gmail.com/barnowl772461F2> The core issue is a race between an event firing and rxevent_Cancel()
running (and also that we never check the return value of
rxevent_Cancel()).  We generally call rxevent_Cancel() with a call
lock or connection data lock held, to protect the call/connection data
field holding a struct rxevent*.  rxevent_Cancel() has to grab the
eventTree lock to check if the event's in the tree.  If an event is
due to fire at the same time, the eventTree lock is held to find the
expired/ing event and remove it from the red/black tree, keeping a
local pointer to it.  (This is fine since the event structure is
refcounted.)  Then rxevent_RaiseEvents() drops the eventTree lock and
calls the event's handler function, which (usually) wants to take the
same call/connection data lock.
[15:21:22] <mvita> is there something in particular you need to check?
[15:21:55] <mvita> oh, eeg.
[15:22:04] <mvita> toctou?
[15:22:23] <kadukoafs@gmail.com/barnowl772461F2> So the event's handler function blocks until after the
rxevent_Cancel() caller is done, by which point the call/connection's
pointer to the event has been cleared, so then the event handler might
try to rxevent_Put() a NULL pointer.
[15:22:23] <wiesand> ibegyourpardon?
[15:22:32] <kadukoafs@gmail.com/barnowl772461F2> time-of-check-to-time-of-use
[15:23:01] <kadukoafs@gmail.com/barnowl772461F2> The rxevent_Put() crash is easy to fix -- just check if the 'event'
argument to the handler matches what we're considering
rxevent_Put()-ing.
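
The interleaving and the guard described above can be replayed in a single thread. The following is only a minimal sketch: every `toy_` name is hypothetical and stands in for the real rx structures and functions, locking is indicated by comments only, and the reference-counting details are an assumption made for illustration rather than a copy of rx.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins (hypothetical); not the real rx types or functions. */
struct toy_event {
    int refcnt;
};

struct toy_call {
    struct toy_event *pending;  /* guarded by the call lock in real code */
    int handler_ran;
};

static struct toy_event *toy_event_new(void)
{
    struct toy_event *ev = calloc(1, sizeof(*ev));
    if (ev != NULL)
        ev->refcnt = 1;
    return ev;
}

static void toy_event_get(struct toy_event *ev)
{
    ev->refcnt++;
}

/* Drop one reference and clear the caller's pointer.
 * Returns the number of references left; frees the event at zero.
 * Dereferences *evp, so calling it on a NULL pointer would crash. */
static int toy_event_put(struct toy_event **evp)
{
    struct toy_event *ev = *evp;
    int left;

    *evp = NULL;
    left = --ev->refcnt;
    if (left == 0)
        free(ev);
    return left;
}

/* Handler with the guard: only put the call's reference if the call
 * still points at the event that actually fired. */
static void toy_handler(struct toy_event *fired, struct toy_call *call)
{
    /* real code would take the call lock here */
    if (call->pending != NULL && call->pending == fired)
        toy_event_put(&call->pending);
    call->handler_ran = 1;
    /* real code would drop the call lock here */
}

/* Replay: the event loop grabs the event, the canceller clears the
 * call's pointer first, and only then does the handler get to run. */
static int toy_replay_race(void)
{
    struct toy_call call = { NULL, 0 };
    struct toy_event *fired;
    struct toy_event *ev = toy_event_new();

    if (ev == NULL)
        return -1;
    call.pending = ev;            /* the call's reference */

    toy_event_get(ev);            /* event loop: removed from tree, */
    fired = ev;                   /* keeps its own local reference  */

    toy_event_put(&call.pending); /* canceller: drops the call's ref */

    toy_handler(fired, &call);    /* handler finally gets the lock */

    return toy_event_put(&fired); /* event loop drops its reference */
}
```

Without the pointer comparison in `toy_handler()`, the handler would call `toy_event_put()` on a pointer that the canceller has already cleared. The guard does nothing about the larger risk raised in the next messages, namely that the canceller frees the containing structure.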
[15:23:11] <mvita> yes
[15:23:31] <kadukoafs@gmail.com/barnowl772461F2> But the more dangerous thing is the risk that (under the same instance
of taking the lock), the caller of rxevent_Cancel() might also go and,
say, free out the entire containing structure.
[15:23:45] <mvita> yes
[15:24:13] <kadukoafs@gmail.com/barnowl772461F2> In some places we take a reference on the call to correspond to the
pending event, but not universally so.
[15:24:40] <mvita> some of them are scheduled ad-hoc, and some reschedule themselves periodically
[15:25:02] <kadukoafs@gmail.com/barnowl772461F2> In some cases it will be appropriate for the caller of
rxevent_Cancel() to want to wait until the event handler has finished
running, which requires transferring the struct rxevent* to a local
variable and dropping the lock around the rxevent_Cancel() call.
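
The "move the pointer to a local variable, drop the lock, then cancel and wait" pattern can be sketched with POSIX threads. This is an illustration under assumptions, not the rx implementation: all `toy_` names are hypothetical, and the `running` flag with its condition variable is just one possible way to make a cancel operation wait for a handler that is already in flight.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); not the real rx structures or API. */
struct toy_event {
    pthread_mutex_t lock;   /* protects 'running' */
    pthread_cond_t idle;    /* signalled when the handler has finished */
    int running;
};

struct toy_call {
    pthread_mutex_t lock;   /* plays the role of the call lock */
    struct toy_event *pending;
    int handler_runs;
};

struct toy_fire_args {
    struct toy_event *ev;
    struct toy_call *call;
};

/* Event-loop side: run the handler, which needs the call lock,
 * then mark the event idle and wake any waiting canceller. */
static void *toy_fire(void *arg)
{
    struct toy_fire_args *a = arg;

    pthread_mutex_lock(&a->call->lock);
    a->call->handler_runs++;
    pthread_mutex_unlock(&a->call->lock);

    pthread_mutex_lock(&a->ev->lock);
    a->ev->running = 0;
    pthread_cond_broadcast(&a->ev->idle);
    pthread_mutex_unlock(&a->ev->lock);
    return NULL;
}

/* Wait until no handler is running. Must be called WITHOUT the call
 * lock held, otherwise the handler can never finish (deadlock). */
static void toy_cancel_wait(struct toy_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (ev->running)
        pthread_cond_wait(&ev->idle, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

/* Move the pointer to a local, drop the call lock, then wait. */
static void toy_call_teardown(struct toy_call *call)
{
    struct toy_event *ev;

    pthread_mutex_lock(&call->lock);
    ev = call->pending;
    call->pending = NULL;
    pthread_mutex_unlock(&call->lock);

    if (ev != NULL)
        toy_cancel_wait(ev);
    /* only now would it be safe to free the containing structure */
}

/* Returns how many times the handler had run when teardown returned. */
static int toy_teardown_demo(void)
{
    struct toy_event ev;
    struct toy_call call;
    struct toy_fire_args args = { &ev, &call };
    pthread_t tid;
    int runs;

    pthread_mutex_init(&ev.lock, NULL);
    pthread_cond_init(&ev.idle, NULL);
    ev.running = 1;  /* the event loop has already picked the event up */
    pthread_mutex_init(&call.lock, NULL);
    call.pending = &ev;
    call.handler_runs = 0;

    if (pthread_create(&tid, NULL, toy_fire, &args) != 0)
        return -1;
    toy_call_teardown(&call);
    runs = call.handler_runs;
    pthread_join(tid, NULL);

    pthread_cond_destroy(&ev.idle);
    pthread_mutex_destroy(&ev.lock);
    pthread_mutex_destroy(&call.lock);
    return runs;
}
```

Because `running` is set before the handler thread starts, `toy_call_teardown()` always has to wait in this demo, so the handler is guaranteed to have completed by the time it returns. On older toolchains the sketch may need to be built with `-pthread`.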
[15:25:05] <mvita> a lot of messy inconsistency
[15:25:13] <wiesand> BTW, we have an instance of the getcwd problem on an EL7 system running 1.6.21
[15:25:20] <kadukoafs@gmail.com/barnowl772461F2> (It's easy to make rxevent_Cancel() wait like this.)
[15:25:28] <wiesand> And we have very few of those systems yet.
[15:25:58] <kadukoafs@gmail.com/barnowl772461F2> > getcwd
It never ends!
[15:26:46] <mvita> getcwd is no fun
[15:26:53] <kadukoafs@gmail.com/barnowl772461F2> But, having started looking through rx.c, I don't think we can easily
make all events into ones that hold references on the containing
structure or make all rxevent_Cancel() calls wait until any running
handler has completed.  So it will be messy and probably tedious.
[15:26:59] <wiesand> "shake harder" may still be unmasking some other bug
[15:27:40] <kadukoafs@gmail.com/barnowl772461F2> I suppose I could push my WIP to my github if people want to look, but
I don't think it's ready for gerrit yet.
(I'm not even sure how many commits I want to split things into.)
[15:27:41] meffie joins the room
[15:28:07] <mvita> Stephan, is there an RT ticket for the getcwd report?
[15:28:18] <mvita> I would need kernel level to try to reproduce it
[15:28:21] <wiesand> Ultimately, presenting mount points as mounts to the Linux kernel rather than faking a directory is probably the only reasonable solution.
[15:29:05] <mvita> and any instructions you have for reproducing the issue reliably
[15:29:10] <wiesand> Mark: no ticket, sorry.
[15:30:22] <kadukoafs@gmail.com/barnowl772461F2> Speaking of tickets, we did get a helpful reminder that the security@
PGP key was expiring.  It's on an older machine of mine, so I'm
considering just making a new one instead of bumping the expiration.
[15:30:30] <wiesand> And no reproducer. A user on one of our work group servers lost her home directory to it. We're not going to find out how.
[15:30:51] <mvita> okay
[15:31:01] <mvita> then kernel level, at least
[15:31:10] <wiesand> Ben: whatever you see fit
[15:31:21] <mvita> Ben, a new key seems best
[15:31:43] <wiesand> Mark: 3.10.0-693.1.1.el7.x86_64
[15:32:06] <mvita> thank you
[15:32:48] <wiesand> I tried "fs flushall" and "echo 3 >> /proc/sys/vm/drop_caches", to no avail.
[15:33:33] <mvita> we are tied up in meetings this week, but I'll try to get to the getcwd thing
[15:34:06] <mvita> sorry, have to step away for a bit…
[15:34:13] <kadukoafs@gmail.com/barnowl772461F2> thanks
[15:35:26] <wiesand> Thanks Mark!
[15:35:43] <wiesand> Is there more to discuss today?
[15:35:56] <meffie> fyi, marcio has made good progress on high sierra. client is running on apfs, but having some issues with finder
[15:36:17] <wiesand> Mike: good news indeed!
[15:36:20] <meffie> he will be sending patches to gerrit soonish
[15:37:08] <wiesand> Good. We're not dead yet...
[15:37:50] <kadukoafs@gmail.com/barnowl772461F2> I don't have anything else.
[15:38:14] <wiesand> Adjourn?
[15:38:18] <meffie> thanks!
[15:38:29] <wiesand> Thanks a lot everyone!
[15:38:33] <kadukoafs@gmail.com/barnowl772461F2> Thanks everyone!
[15:38:46] <meffie> thanks have a good day/evening.
[15:38:49] <wiesand> CU next week.
[15:38:51] wiesand leaves the room
[16:01:55] meffie leaves the room
[17:39:46] meffie joins the room
[17:47:46] meffie leaves the room
[21:06:24] mvita leaves the room
[21:14:21] meffie joins the room
[21:17:25] mvita joins the room
[21:24:21] meffie leaves the room
[21:26:02] mvita leaves the room