[00:14:14] --- kaj has become available
[00:35:13] --- haba has left
[01:30:02] --- haba has become available
[03:14:31] --- kaj has left
[03:33:40] --- kaj has become available
[04:53:54] --- Jeffrey Altman has left: Replaced by new connection
[04:53:55] --- Jeffrey Altman has become available
[04:54:09] --- jaltman has left: Replaced by new connection
[04:54:10] --- jaltman has become available
[05:28:26] --- sxw has become available
[05:29:02] --- matt has become available
[05:35:31] --- sxw has left
[05:45:06] --- sxw has become available
[05:48:22] --- steven.jenkins has become available
[06:11:46] --- kaj has left
[06:12:52] --- reuteras has left
[06:12:56] --- clc31 has become available
[06:13:44] --- reuteras has become available
[06:25:21] re steven's comment--I knew DAFS salvaged on demand, but won't it/salvageserver eventually get around to salvaging all volumes, workload permitting?
[07:07:58] --- deason has become available
[07:27:33] only if the volumes are accessed
[07:29:21] and only "all volumes" as in "all volumes that were attached when we crashed"
[07:34:11] --- sxw has left
[07:39:21] --- sxw has become available
[07:49:25] deason - are there other optimizations in DAFS that I missed? I have a vague recollection of an effort to do some background salvaging, but I don't know if that ever got beyond the discussion phase.
[07:51:19] if we don't attach the volume, we never look at the header, so we'd never examine inUse to even see if we should salvage it, so.... I would guess that they did not
[08:08:07] I think hartmut was talking about the case where nearly all the volumes on the fileserver would be in use before a crash, and without fast-restart, would have to salvage all of those volumes after coming back up
[08:09:24] who'd want to salvage? i mean, why would i want to fix my data?
[08:10:38] we have more conditions where we can salvage now, though
[08:10:38] Yeh.
as far as I see, fast-restart is saying that you don't care if it's right, as long as it is quick.
[08:10:43] right, that's the counterargument. nothing can make that case faster, unless you don't care about your data
[08:11:12] with fast-restart, we skipped the only automatic salvaging test we have... with dafs, we could still skip the inUse check, but be able to salvage automatically if we detect some internal inconsistency
[08:21:05] --- reuteras has left
[08:36:57] > would have to salvage all of those volumes after coming back up
to be clear, as soon as volume X is salvaged, you can access volume X; you don't have to wait for everything to salvage before the fileserver "comes up"
[08:37:29] <_cclausen> what does the bitmap-later do?
[08:42:11] --- Kevin Sumner has left
[08:42:23] it avoids loading the vnode bitmap when a volume is attached
[08:54:55] --- sxw has left
[09:02:27] deason, steven: sorry, I'm still confused. what if you never access vol x (for some extended period, or never), we really don't eventually schedule a check-for-salvage?
[09:03:10] no, not currently
[09:03:47] ok, thanks for the clarification
[09:04:57] --- sxw has become available
[09:08:12] isn't every dirty volume at fileserver start salvaged once, though?
[09:09:37] what is "dirty"? preattached volumes don't have any part of their header looked at iirc
[09:10:11] you could 'touch /vicepa/V01231234.vol' and we'd have a vp for it until something tries to access it
[09:10:15] i thought the deal was it didn't get to preattached without having something check.
i should look at the code
[09:13:38] the only real check is that the filename is of the form V.vol
[09:14:08] yeah, i'm reading that now
[09:18:11] --- abo has left
[09:18:55] --- abo has become available
[09:56:34] --- sxw has left
[09:57:41] --- Russ has become available
[10:06:35] --- matt has left
[10:32:57] --- mattjsm has become available
[10:56:35] --- abo has left: Lost connection
[10:56:35] --- haba has left: Lost connection
[11:45:52] The system hosting this chatroom, Gerrit, and Git needs a reboot to pick up a new kernel (and new Apache server and PostgreSQL server). When would be a good time?
[11:48:06] --- Kevin Sumner has become available
[12:33:38] --- jaltman has left: Replaced by new connection
[12:35:42] --- jaltman has become available
[12:36:51] --- matt has become available
[12:46:09] Does the new postgres require a database reload?
[12:46:17] Russ^
[12:50:38] No, it's just a security update.
[12:50:45] Cool.
[15:24:53] --- deason has left
[16:00:57] --- dwbotsch has left
[16:01:36] --- dwbotsch has become available
[16:22:34] --- matt has left
[16:29:27] --- mattjsm has left
[19:27:55] --- Jeffrey Altman has left
[19:33:57] --- jaltman has left: Disconnected
[19:59:58] --- Born Fool has become available
[20:05:08] --- jaltman has become available
[20:33:00] --- jaltman has left: Replaced by new connection
[20:33:02] --- jaltman has become available
[20:35:07] --- jaltman has left: Replaced by new connection
[20:35:10] --- jaltman has become available
[20:48:01] --- jaltman has left: Replaced by new connection
[20:48:02] --- jaltman has become available
[20:55:33] --- clc31 has left
[20:58:47] --- jaltman has left: Replaced by new connection
[20:58:48] --- jaltman has become available
[21:02:14] --- jaltman has left: Replaced by new connection
[21:02:15] --- jaltman has become available
[21:16:56] --- Born Fool has left
[21:33:29] I was very confused when my git-bisect pointed to f9799b8561830 as the cause of my aklog segfaults.
It turns out that it was in fact non-deterministic (not that I should be surprised). (See gerrit/2213 for the fix)
[21:39:59] --- kaj has become available
[21:41:23] jaltman: thanks for the catch
[21:45:08] aaaand I fail today.
[21:49:09] Let me check that it actually builds ...
[21:49:25] btw, does the caller to that function have to do something special if linkedCell is passed in as NULL?
[21:50:09] (it doesn't build, apparently)
[21:50:24] Er, the caller of which function?
[21:52:17] auth_to_cell()
[21:53:07] (Fifth time's the charm!)
[21:53:56] It doesn't look like it ...
[21:54:54] Fourth time. Right.
[21:55:09] Sorry for all the mail.
[21:55:12] np
[21:55:31] thanks for the fix
[21:56:58] My pleasure; it's been pretty annoying getting the crash for the past few months. Actually, now I'm wondering what bad things happened when it *didn't* crash ...
[22:03:55] It probably didn't crash when the variable happened to be NULL, which will be a fair bit of the time.
[22:05:00] Oh, so it is. I somehow expected free(NULL) to crash.
[22:06:12] It used to vary by platform, but I think C99 says that free(NULL) is a no-op these days. At least most platforms treat it that way.
[22:11:01] --- kaj has left
[22:15:28] --- reuteras has become available
[22:16:49] --- jaltman has left: Disconnected
[22:16:59] --- jaltman has become available
[22:25:09] --- Russ has left: Disconnected
[23:11:06] --- kaj has become available
[23:34:34] --- Simon Wilkinson has left
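
Note on the 22:03 to 22:06 exchange: the discussion suggests the aklog crash came from calling free() on a pointer that was only sometimes set, so it crashed or not depending on what garbage the variable held. The sketch below is not the aklog code or the gerrit/2213 change. It is a minimal illustration, assuming a hypothetical helper find_linked_cell() and a hypothetical caller has_linked_cell(), of why initializing the pointer to NULL makes the cleanup path safe.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not an OpenAFS function: returns a heap-allocated
 * name for the "linked" cell, or NULL when there is none. For the sake of
 * the example, only names containing a dot have a linked cell. */
char *find_linked_cell(const char *cell)
{
    static const char suffix[] = ".linked";
    char *out;

    if (cell == NULL || strchr(cell, '.') == NULL)
        return NULL;

    /* sizeof(suffix) already counts the terminating '\0'. */
    out = malloc(strlen(cell) + sizeof(suffix));
    if (out != NULL) {
        strcpy(out, cell);
        strcat(out, suffix);
    }
    return out;
}

/* Hypothetical caller: returns 1 if a linked cell was found, 0 otherwise.
 * The lookup only happens when do_lookup is nonzero, so on the other path
 * the pointer keeps whatever value it was given at declaration. */
int has_linked_cell(const char *cell, int do_lookup)
{
    /* Without "= NULL" this pointer would be indeterminate whenever
     * do_lookup is 0, and the free() below would be undefined behavior. */
    char *linked = NULL;
    int found;

    if (do_lookup)
        linked = find_linked_cell(cell);

    found = (linked != NULL);

    /* The C standard defines free() on a null pointer as doing nothing,
     * so no NULL check is needed before this call. */
    free(linked);
    return found;
}
```

With the initializer in place, has_linked_cell("example.org", 0) reaches free() with a null pointer and returns 0. Remove the initializer and the same call frees whatever happened to be on the stack, which matches the "non-deterministic" behavior described in the log.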