[06:14:23] --- Stephan Wiesand has become available
[06:37:15] --- shadow@gmail.com/barnowlED751AFB has become available
[06:37:40] --- Stephan Wiesand19977 has become available
[06:37:40] --- Stephan Wiesand19977 has left
[06:37:40] --- Stephan Wiesand76208 has become available
[06:37:40] --- Stephan Wiesand76208 has left
[06:37:40] --- Stephan Wiesand89776 has become available
[06:37:40] --- Stephan Wiesand89776 has left
[06:37:40] --- Stephan Wiesand46502 has become available
[06:37:40] --- Stephan Wiesand46502 is now known as Stephan Wiesand18446
[06:37:40] --- Stephan Wiesand18446 has left
[06:37:40] --- Stephan Wiesand18446 has become available
[06:37:41] --- Stephan Wiesand has left
[06:57:34] --- Stephan Wiesand18446 has left
[06:57:40] --- Stephan Wiesand has become available
[07:03:37] --- deason has become available
[07:04:22] Hello
[07:04:40] hi
[07:04:40] hi
[07:05:03] mike will not be here today, I think
[07:05:38] Ok. I didn't expect overwhelming participation today ;-)
[07:05:57] I may be in and out ;)
[07:06:22] Fine.
[07:06:42] i've been in and out for years
[07:07:07] I guess we all are.
[07:07:20] Let's start with "pre3 feedback".
[07:07:31] Nothing on the lists.
[07:07:45] Nothing in RT (as of 30m ago).
[07:07:55] Anything else?
[07:08:49] I take it the answer is no.
[07:08:50] none here
[07:09:57] So let's go through http://gerrit.openafs.org/#q,branch:openafs-stable-1_6_x+starredby:1000008,n,z
[07:10:15] Andrew: Thanks for preparing this.
[07:11:38] fs flushall: I'd like to see this feature in 1.6.2. Will mark it as pre-approved in the minutes.
[07:11:59] 1.6.3, you mean
[07:12:11] Yeah, I do :-/
[07:13:16] next one: viced: Restrict RXAFS_FlushCPS to administrators
[07:13:18] no general objections here; I just wanna look at the code closely again
[07:13:41] Fine, please do. There's ample time ;-)
[07:13:43] I glanced at that list last night, and most of them (not all) were "sounds good, but should look at the code".
[07:14:00] Same remark ;-)
[07:14:36] I'll respond to chaskiel in gerrit; it doesn't seem as useful here since he's not here
[07:15:08] I think the objection is invalid - just check the dependencies, right?
[07:15:12] he brings up an identity being able to refresh their own cps, which may be a good idea, but I don't think that's a reason to _not_ do this now
[07:15:54] That's what I think.
[07:16:10] well, he is correct in that most other operations use VanillaUser instead of viced_SuperUser, which check different things
[07:16:15] If it's an improvement, do it now instead of waiting for something perfect.
[07:16:22] but I think this is different because it involves server operation, and not stuff in afs-space
[07:16:57] well, for some things involving this, it could be a problem if they are wrong, because you have to "deal" with older servers doing something bad/weird or whatever
[07:17:21] but this is turning an operation you cannot do (fully refresh cps) into something you can only do as an administrator; so I think it's an "improvement"
[07:17:30] or that is, it's going in a consistent direction
[07:17:35] --- Marc Dionne has become available
[07:17:41] Just to make sure: this won't prevent vanilla users from running "fs flush" and "fs flushv", will it?
[07:18:01] no no, it has nothing to do with that
[07:18:18] That was my guess, thanks.
[07:19:19] moving on?
[07:19:39] Yes. NAT pings: could someone spell out what's better than what we already have?
[07:20:50] the NAT ping functionality currently only helps if a client is new enough to have NAT pings, and if the NAT in question keeps the connection alive if it sees traffic only in one direction
[07:21:06] so right now we have pings from client -> fileserver
[07:21:20] this makes the fileserver do pings from fileserver -> client, so it should help even when clients are older than the nat ping stuff
[07:22:03] and the purpose for the nat ping stuff in general is to keep nat port mappings alive, so we can send callback breaks to a client
[07:22:16] Ok. Let's include it.
[07:22:51] If it causes trouble during testing, we can still back it out.
[07:23:01] the reason this is "questionable" is that it could add a noticeable amount of new traffic from fileserver->client
[07:23:43] but it should be the same as the new client->fileserver traffic, so it should only be significantly "more" if you have old clients
[07:23:43] It is supposed to send more than one ping per 20s to the same client?
[07:23:54] Well, are we still sending 100s of pings per second in some cases? :)
[07:24:24] No, I think that was fixed.
[07:24:26] not that I know of, and the client->fileserver nat ping stuff is still on, so...
[07:24:35] (though I still wish that was per-peer and not per-conn)
[07:24:36] Or at least worked around.
[07:25:13] this should be less likely to have issues like that had, since we have exactly _one_ connection per client for callback breaks
[07:25:18] But yes, that's the reason why I'm a bit suspicious about this feature.
[07:25:39] (I think it was fixed, too.)
[07:25:59] if it wasn't fixed, I think we'd still be hearing about it in 1.6 releases :)
[07:26:09] we certainly heard about it when it was an issue
[07:26:28] Guilty ;-)
[07:27:22] Ok, let's move on to volser: preserve stats...
[07:27:24] moving on? (we don't need to come to an actual conclusion about each one today, I don't think)
[07:27:43] Why would I want it?
[07:28:24] the volume statistics are pretty useless for RO volumes that are released either frequently, or at unpredictable intervals
[07:28:27] er, currently
[07:28:40] because they reset when you do a release
[07:29:03] Ok, I want it.
[07:29:18] er, and this allows you to "save" the stats across a release
[07:30:11] Hmm, 2+ minutes for my messages to get back to me. This seems poor.
[07:30:18] and it's been in production use at a site that uses these stats to track volume usage and if a volume is unused etc
[07:30:52] I didn't think this one was very controversial, and the code is simple; it's just here because it's a "feature", though a small one
[07:31:36] Nice feature. More +1s would be nice
[07:31:51] But at first glance it looks straightforward to me.
[07:32:46] moving on?
[07:32:53] Yes.
[07:33:02] NeverAttach: why not.
[07:33:22] you want to refresh your cps, push new tokens. aklog -force. whatever
[07:33:51] NeverAttach sounds good.
[07:34:16] --- kaduk@mit.edu/barnowl has left: Lost connection
[07:34:25] Derrick: I'm not quite getting your comment...
[07:34:28] (sorry, i am trying to run out the door for a dentist appt so i am spotty and behind)
[07:34:37] he's referring to the first change we discussed :)
[07:34:42] the flushcps admin only thing. it's not a problem: they have a workaround
[07:34:44] Ah...
[07:34:53] Fine. Thanks.
[07:35:20] neverattach: no objections here, and it's been waiting for a while now, I think
[07:35:43] oh, just 2012, maybe not that long
[07:36:27] It was pulled up in March...
[07:36:55] Let's take it, plus the fix.
[07:37:12] Moving on to bozo: retry start...
[07:37:18] --- kaduk@mit.edu/barnowl has become available
[07:37:48] Speling error in the commit message ;-)
[07:38:17] ah, "two", I didn't see that before, hah
[07:38:32] I found two more.
[07:38:41] my only concern with this one is that the relevant bozo code can be... hard to follow
[07:38:54] just means I need to spend more time in reviewing it, though
[07:39:42] My concern is that we get close to no testing of servers.
[07:40:46] Which is why complex changes in this area make me nervous
[07:41:26] What's the case for this feature? Why would it help me run my site?
[07:41:41] well, we have some servers that are just running, but this code path isn't really exercised unless you have a failing process
[07:42:09] so, the motivation for this is that sometimes a process specified for bosserver fails to start due to some transient error
[07:42:38] I don't remember what the specific cause for the motivating issue report was (mike probably knows), but it was only an issue for like, a few milliseconds
[07:43:08] and because bosserver tried to start it immediately like 5 times or whatever the limit is, without waiting for anything, it failed the process forever and never started it, requiring admin intervention
[07:43:23] this change would just try to restart the process again after a bit of a delay
[07:43:38] er well, a few times, doubling the delay each time
[07:44:14] If an admin attempts to restart manually during that waiting time, what happens?
[07:45:11] oh, manually stopping or restarting it should disable this stuff entirely until the next error start; it takes it out of the 'automatic' restarting thing
[07:45:29] should or does?
[07:45:34] I don't think that part of the code path is different here, though I haven't reviewed it enough to know yet
[07:45:53] you're asking me if the change contains any bugs? :)
[07:46:07] sorry :-)
[07:47:02] I mean, yes, that's what it is supposed to do
[07:47:08] I'm not sure about this one. Fixes a rare problem, not easy to review, hard to test.
[07:47:36] it only engages this code on an error stop, and a brief glance shows that it does reset the relevant counters on a manual restart, and some other manual operations
[07:48:17] come to think of it, I'm a bit concerned about the retry delay/count... it looks like the longest it waits is almost a day
[07:49:11] 16x, start with 30s, doubles every time?
[07:49:47] it starts with 1s
[07:50:12] doubled 16 times gives you 18 hours, if I did that right
[07:50:57] O(10h) ;-)
[07:51:09] Still doesn't look too reasonable.
[07:51:25] in any case, long enough for an operator to not be paying attention to it and possibly be asleep :)
[07:51:38] We should discuss this in gerrit.
[07:51:42] okay, but we can discuss more in gerrit or just later
[07:51:49] and mike may have more to say about it
[07:52:45] moving on?
[07:53:01] ok. remove ih_sync_thread ?!
[07:53:23] Isn't that what we just made configurable in 1.6.3?
[07:54:00] I just wasn't sure what we were doing with this change
[07:54:12] I thought you said it was possible that we'd remove it from 1.6 at some point in the future
[07:54:25] I'm not pushing for it; I just wasn't sure what to do with it
[07:55:07] Let's keep it around, but it's not for the next stable release.
[07:55:52] I guess it will become something like GUACB
[07:56:24] okay, but keep in mind that code will have to change, since the gerrit submission is from before we had the runtime stuff
[07:56:41] I'll... -2 it for now, I guess?
[07:56:45] if that's fine, then moving on....
[07:56:49] Fine.
[07:57:53] The interrupt RX calls thing...
[07:58:04] Has been around for a while.
[07:58:38] I would still vote for next bugfix release at the earliest
[07:59:07] er, which I guess is the release we're talking about, duh
[07:59:38] the site running this may be running this on 1.6 by then, which would be nice to see
[07:59:54] I'd still put it low on the list of things to add to a release beyond bug fixes.
[08:00:07] --- shadow@gmail.com/barnowlED751AFB has left
[08:00:21] sure, it can wait
[08:00:35] So, don't abandon, but not clearly a candidate for the next stable.
[08:00:49] okay
[08:01:17] Next one: libafscp NULL ...
[08:01:29] This one seemed straightforward to me.
[08:01:50] That is, once the change required to make it apply was merged.
[08:02:11] okay, I wasn't sure if it was still "todo", or if it was one of those things obsoleted by other libafscp-related changes
[08:02:28] so we just need the requisite change in
[08:02:28] I may be wrong :-)
[08:02:58] derrick is the one I was looking to provide an answer about this, if he's online enough to see what I'm saying and respond....
[08:03:14] that is, if various software lets him respond
[08:03:53] Once the bug was merged, this fix clearly should be applied. Let's move on.
[08:04:23] GUACB...
[08:04:51] IIRC, we came to the conclusion that this should be configurable?
[08:05:42] in my opinion, configurable, default to off, and change the default "sometime"
[08:05:58] +1
[08:06:14] and nothing's changed wrt it, so just move on, I guess
[08:06:23] Ok.
[08:06:33] I just worry (in general) that adding more configurable knobs means code that is not getting used/tested.
[08:07:22] Ben: agreed.
[08:07:25] But:
[08:07:32] the alternative was turning this on in the middle of a stable release, which is not acceptable
[08:07:43] or, said to not be acceptable, or whatever
[08:08:01] because it will break existing sites
[08:08:07] At the rate we have major releases, this is the only way to get such changes in at all IMHO.
[08:08:09] so, at least give the person a choice
[08:08:20] oh yeah, or just wait for the next major release
[08:08:31] Good one...
[08:09:05] but if it still hasn't happened by the next major release, I think just turning it on is more reasonable; but that's a discussion for another day
[08:10:03] Ok. Last one: Do not reset copyDate...
[08:10:35] Not sure I understand what it's about.
[08:11:51] one of the volume header fields is different for an RO volume depending on if it was created via a clone or if it's a remote site
[08:12:04] that is, it means something different / gets reset at a different time
[08:12:22] and it doesn't need to be; it's a useful value for determining when a volume was first created
[08:13:03] Sounds reasonable. Why would anyone object?
[08:14:31] well, it changes when the relevant date gets changed; it's a "behavior change"
[08:15:35] I think it's simple enough.
[08:16:09] A site doing tricky stuff with these fields should be able to cope. Or object.
[08:16:24] well, the fields are used for things
[08:17:05] the specific report motivating this change has something to do with converting ros... I'd need to look it up
[08:17:33] but since I'm just changing that copydate to be the same on a clone as it is everywhere else, it seems fine
[08:17:53] since if you remove the RW, the local RO and a remote RO should be indistinguishable; right now they're not
[08:18:37] This one needs more review by others anyway. Unless someone objects, I'm inclined to accept this one.
[08:20:04] yes; they all need review, this was more for nontechnical, conceptual/strategic stuff
[08:21:17] It still was a reasonable step. With the minutes now going to -devel as well, we may have the required discussions well in time.
[08:21:43] Anything else for today?
[08:21:51] we could discuss the 1.6.4/1.6.5 branching thing
[08:21:54] or leave that to the list
[08:22:01] no we can't
[08:23:12] I think it's general enough to apply to any release process, but okay, fine
[08:23:17] nothing further from me, then :)
[08:23:36] Strict order from the gatekeepers: All releases are done from one 1_6_x branch.
[08:24:01] Ok, thanks a lot everyone!
[08:24:08] Bye.
[08:24:14] --- Stephan Wiesand has left
[08:29:54] --- Marc Dionne has left
[15:10:55] --- deason has left
[18:02:20] --- Jeffrey Altman has left: Replaced by new connection
[18:02:21] --- Jeffrey Altman has become available
[19:16:39] --- Jeffrey Altman has left: Replaced by new connection
[19:16:40] --- Jeffrey Altman has become available
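
Note on the retry-delay arithmetic from the "bozo: retry start" discussion (07:48 to 07:51): the figures quoted there (start at 1s, doubled 16 times, roughly 18 hours) can be checked with a short sketch. This only reproduces the arithmetic as stated in the chat; the function name and parameters are hypothetical, and it is not taken from the bosserver code or the gerrit change.

```python
def backoff_delays(initial_seconds=1, doublings=16):
    """Delays for a doubling backoff: the initial delay, then doubled each time.

    Hypothetical helper for illustration; the values mirror the numbers
    mentioned in the chat, not the actual bosserver implementation.
    """
    return [initial_seconds * 2 ** i for i in range(doublings + 1)]


delays = backoff_delays()

# Longest single wait: 1s doubled 16 times.
longest_seconds = delays[-1]
longest_hours = longest_seconds / 3600

# Sum of all waits, assuming every retry in the sequence is actually used.
total_seconds = sum(delays)
total_hours = total_seconds / 3600

print(f"longest single delay: {longest_seconds}s (~{longest_hours:.1f}h)")
print(f"sum of all delays:    {total_seconds}s (~{total_hours:.1f}h)")
```

Under these assumptions the longest single delay is 65536s, which matches the "18 hours" estimate; if every delay in the sequence were waited out in turn, the waits would add up to about twice that.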