[00:14:14] --- kaj has become available
[00:35:13] --- haba has left
[01:30:02] --- haba has become available
[03:14:31] --- kaj has left
[03:33:40] --- kaj has become available
[04:53:54] --- Jeffrey Altman has left: Replaced by new connection
[04:53:55] --- Jeffrey Altman has become available
[04:54:09] --- jaltman has left: Replaced by new connection
[04:54:10] --- jaltman has become available
[05:28:26] --- sxw has become available
[05:29:02] --- matt has become available
[05:35:31] --- sxw has left
[05:45:06] --- sxw has become available
[05:48:22] --- steven.jenkins has become available
[06:11:46] --- kaj has left
[06:12:52] --- reuteras has left
[06:12:56] --- clc31 has become available
[06:13:44] --- reuteras has become available
[06:25:21] <matt> re steven's comment--I knew DAFS salvaged on demand, but won't it/salvageserver eventually get around to salvaging all volumes, workload permitting?
[07:07:58] --- deason has become available
[07:27:33] <deason> only if the volumes are accessed
[07:29:21] <deason> and only "all volumes" as in "all volumes that were attached when we crashed"
[07:34:11] --- sxw has left
[07:39:21] --- sxw has become available
[07:49:25] <steven.jenkins> deason - are there other optimizations in DAFS that I missed? I have a vague recollection of an effort to do some background salvaging, but I don't know if that ever got beyond the discussion phase.
[07:51:19] <deason> if we don't attach the volume, we never look at the header, so we'd never examine inUse to even see if we should salvage it, so.... I would guess that they did not
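[editor's note] The behaviour deason describes (the header, and so inUse, is only examined once something tries to use the volume) can be pictured with a toy model. This is a minimal sketch, not OpenAFS code: `toy_volume`, `toy_access`, the state names, and the `in_use_on_disk` field are all hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; the real fileserver has many more. */
enum vol_state { VOL_PREATTACHED, VOL_ATTACHED, VOL_SALVAGING };

/* Hypothetical toy volume, not the OpenAFS Volume structure. */
struct toy_volume {
    enum vol_state state;
    int in_use_on_disk;   /* stands in for the on-disk inUse flag */
    int header_reads;     /* counts how often the header was examined */
};

/*
 * Called when something tries to use the volume. Only at this point is
 * the header read and the in-use flag consulted; a volume that is never
 * accessed stays preattached and is never considered for salvage.
 */
static enum vol_state
toy_access(struct toy_volume *vp)
{
    assert(vp != NULL);
    if (vp->state == VOL_PREATTACHED) {
        vp->header_reads++;
        vp->state = vp->in_use_on_disk ? VOL_SALVAGING : VOL_ATTACHED;
    }
    return vp->state;
}
```

In this model a volume that was in use at crash time but is never touched afterwards keeps `header_reads` at 0, which is the situation matt asks about later in the log.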
[08:08:07] <phalenor> I think hartmut was talking about the case where nearly all the volumes on the fileserver would be in use before a crash, and without fast-restart, would have to salvage all of those volumes after coming back up
[08:09:24] <shadow@gmail.com/owl1EA1D463> who'd want to salvage? i mean, why would i want to fix my data?
[08:10:38] <deason> we have more conditions where we can salvage now, though
[08:10:38] <sxw> Yeh. as far as I see, fast-restart is saying that you don't care if it's right, as long as it is quick.
[08:10:43] <phalenor> right, that's the counterargument. nothing can make that case faster, unless you don't care about your data
[08:11:12] <deason> with fast-restart, we skipped the only automatic salvaging test we have... with dafs, we could still skip the inUse check, but be able to salvage automatically if we detect some internal inconsistency
[08:21:05] --- reuteras has left
[08:36:57] <deason> > would have to salvage all of those volumes after coming back up
to be clear, as soon as volume X is salvaged, you can access volume X; you don't have to wait for everything to salvage before the fileserver "comes up"
[08:37:29] <_cclausen> what does the bitmap-later do?
[08:42:11] --- Kevin Sumner has left
[08:42:23] <deason> it avoids loading the vnode bitmap when a volume is attached
[08:54:55] --- sxw has left
[09:02:27] <matt> deason, steven: sorry, I'm still confused. what if you never access vol x (for some extended period, or never), we really don't eventually schedule a check-for-salvage?
[09:03:10] <deason> no, not currently
[09:03:47] <matt> ok, thanks for the clarification
[09:04:57] --- sxw has become available
[09:08:12] <shadow@gmail.com/owl1EA1D463> isn't every dirty volume at fileserver start salvaged once, though?
[09:09:37] <deason> what is "dirty"? preattached volumes don't have any part of their header looked at iirc
[09:10:11] <deason> you could 'touch /vicepa/V01231234.vol' and we'd have a vp for it until something tries to access it
[09:10:15] <shadow@gmail.com/owl1EA1D463> i thought the deal was it didn't get to preattached without having
something check. i should look at the code
[09:13:38] <deason> the only real check is that the filename is of the form V<numbers>.vol
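[editor's note] A filename test of the kind deason mentions ("V<numbers>.vol") might look roughly like the following. This is a sketch under assumptions: `looks_like_vol_header` is a hypothetical helper written for illustration, not the function the fileserver actually uses, and the real check may differ in detail (for example in how many digits it accepts).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the OpenAFS source.
 * Returns 1 if name is 'V', then one or more decimal digits,
 * then exactly ".vol"; returns 0 otherwise.
 */
static int
looks_like_vol_header(const char *name)
{
    const char *p;

    assert(name != NULL);
    if (name[0] != 'V')
        return 0;
    p = name + 1;
    if (!isdigit((unsigned char)*p))
        return 0;               /* need at least one digit */
    while (isdigit((unsigned char)*p))
        p++;
    return strcmp(p, ".vol") == 0;
}
```

Note that a test like this looks only at the name, so an empty file created with touch would pass it, which matches the 'touch /vicepa/V01231234.vol' example above.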
[09:14:08] <shadow@gmail.com/owl1EA1D463> yeah, i'm reading that now
[09:18:11] --- abo has left
[09:18:55] --- abo has become available
[09:56:34] --- sxw has left
[09:57:41] --- Russ has become available
[10:06:35] --- matt has left
[10:32:57] --- mattjsm has become available
[10:56:35] --- abo has left: Lost connection
[10:56:35] --- haba has left: Lost connection
[11:45:52] <Russ> The system hosting this chatroom, Gerrit, and Git needs a reboot to pick up a new kernel (and new Apache server and PostgreSQL server).  When would be a good time?
[11:48:06] --- Kevin Sumner has become available
[12:33:38] --- jaltman has left: Replaced by new connection
[12:35:42] --- jaltman has become available
[12:36:51] --- matt has become available
[12:46:09] <Simon Wilkinson> Does the new postgres require a database reload?
[12:46:17] <Simon Wilkinson> Russ^
[12:50:38] <Russ> No, it's just a security update.
[12:50:45] <Simon Wilkinson> Cool.
[15:24:53] --- deason has left
[16:00:57] --- dwbotsch has left
[16:01:36] --- dwbotsch has become available
[16:22:34] --- matt has left
[16:29:27] --- mattjsm has left
[19:27:55] --- Jeffrey Altman has left
[19:33:57] --- jaltman has left: Disconnected
[19:59:58] --- Born Fool has become available
[20:05:08] --- jaltman has become available
[20:33:00] --- jaltman has left: Replaced by new connection
[20:33:02] --- jaltman has become available
[20:35:07] --- jaltman has left: Replaced by new connection
[20:35:10] --- jaltman has become available
[20:48:01] --- jaltman has left: Replaced by new connection
[20:48:02] --- jaltman has become available
[20:55:33] --- clc31 has left
[20:58:47] --- jaltman has left: Replaced by new connection
[20:58:48] --- jaltman has become available
[21:02:14] --- jaltman has left: Replaced by new connection
[21:02:15] --- jaltman has become available
[21:16:56] --- Born Fool has left
[21:33:29] <kaduk@mit.edu/barnowl> I was very confused when my git-bisect pointed to f9799b8561830 as the
cause of my aklog segfaults.
It turns out that it was in fact non-deterministic (not that I should
be surprised).
(See gerrit/2213 for the fix)
[21:39:59] --- kaj has become available
[21:41:23] <kaduk@mit.edu/barnowl> jaltman: thanks for the catch
[21:45:08] <kaduk@mit.edu/barnowl> aaaand I fail today.
[21:49:09] <kaduk@mit.edu/barnowl> Let me check that it actually builds ...
[21:49:25] <jaltman> btw, does the caller to that function have to do something special if linkedCell is passed in as NULL?
[21:50:09] <kaduk@mit.edu/barnowl> (it doesn't build, apparently)
[21:50:24] <kaduk@mit.edu/barnowl> Er, the caller of which function?
[21:52:17] <jaltman> auth_to_cell()
[21:53:07] <kaduk@mit.edu/barnowl> (Fifth time's the charm!)
[21:53:56] <kaduk@mit.edu/barnowl> It doesn't look like it ...
[21:54:54] <kaduk@mit.edu/barnowl> Fourth time.  Right.
[21:55:09] <kaduk@mit.edu/barnowl> Sorry for all the mail.
[21:55:12] <jaltman> np
[21:55:31] <jaltman> thanks for the fix
[21:56:58] <kaduk@mit.edu/barnowl> My pleasure; it's been pretty annoying getting the crash for the past
few months.
Actually, now I'm wondering what bad things happened when it *didn't*
crash ...
[22:03:55] <Russ> It probably didn't crash when the variable happened to be NULL, which will be a fair bit of the time.
[22:05:00] <kaduk@mit.edu/barnowl> Oh, so it is.  I somehow expected free(NULL) to crash.
[22:06:12] <Russ> It used to vary by platform, but I think C99 says that free(NULL) is a no-op these days.  At least most platforms treat it that way.
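[editor's note] The point about free(NULL) can be illustrated with a generic pattern: initialise a pointer to NULL so that a later unconditional free() is safe on every path. This is a minimal sketch and is not the change in gerrit/2213; `measure_and_cleanup` is a hypothetical function invented for the example. Standard C specifies that free() does nothing when passed a null pointer, whereas freeing an uninitialised pointer is undefined behaviour and may crash only some of the time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical example: returns the length of src (0 if src is NULL)
 * after making and discarding a temporary copy.
 */
static size_t
measure_and_cleanup(const char *src)
{
    char *buf = NULL;   /* initialised, so the free() below is always safe */
    size_t len = 0;

    if (src != NULL) {
        len = strlen(src);
        buf = malloc(len + 1);
        if (buf != NULL)
            memcpy(buf, src, len + 1);   /* copy including the terminator */
    }

    /* No-op when buf is NULL (src was NULL, or malloc failed). */
    free(buf);
    return len;
}
```

Had `buf` been declared without an initialiser, the call with a NULL `src` would pass an indeterminate value to free(), which would fit the non-deterministic crashes described above.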
[22:11:01] --- kaj has left
[22:15:28] --- reuteras has become available
[22:16:49] --- jaltman has left: Disconnected
[22:16:59] --- jaltman has become available
[22:25:09] --- Russ has left: Disconnected
[23:11:06] --- kaj has become available
[23:34:34] --- Simon Wilkinson has left