[04:09:36] --- Stephan Wiesand has become available [05:48:02] Hello. Review of http://gerrit.openafs.org/9380 would be much appreciated... [06:00:52] --- meffie has become available [06:30:14] --- squinney has become available [06:52:37] --- ktdreyer has become available [06:52:55] Hi all [06:53:13] Hi Ken. Finished moving? [06:54:22] still need to get the bed and sofa; I've been sleeping on an air mattress all week as I take car loads over :) [06:54:39] morning/afternoon (as the case may be) [06:54:44] --- Marc Dionne has become available [06:55:08] --- deason has become available [06:56:47] Hello Mike [06:57:47] sorry for the new gerrit on 1.6.x, just a minor annoying bug (can wait for 1.6.4 if you like) [06:58:23] Ken, last time I moved I slept on my therm-a-rest outdoor mat for weeks. I also had to shave sitting on the floor for a while :-) [06:59:23] Stephan: haha! that is roughing it [07:00:42] Mike, 9701 looks straightforward. [07:03:02] I'm fine with that, but I'm not sure if that commit message should look like a normal cherry pick [07:03:29] I've been using that "cherry picked from" line to indicate that the master commit has been pulled to 1.6, so it's, like, "done" and we don't ever need to look at it [07:04:09] but the rest of it could still be pulled up; I dunno, others may disagree [07:05:04] I'd prefer a more quiet time. There are enough files touched by half a dozen changes in the queue already. [07:05:53] a more quiet time for....? [07:06:41] For pulling up the rest of the change on master 9701 is coming from. [07:07:26] i wasnt sure if i should have included simon's commit message or not, just let me know what to do :) [07:07:53] I'll leave that to the git/gerrit gurus :-) [07:08:00] (or lawyers) [07:08:06] I wasn't suggesting pulling the whole thing, just changing the commit message [07:08:38] to just say "hey this is part of commit/change xxx" instead of the normal cherry pick with the whole commit message in there etc [07:09:07] Fine. Just get it ready and reviewed - this one is hardly worth discussing otherwise. [07:09:19] yeah, it's not clear on what to do when just cherry picking only one hunk. [07:09:44] ok. [07:10:39] Let's briefly talk about 1.6.2.1. [07:10:57] I think everything is on it's way now for getting that out. [07:11:21] Stephen started building. Not sure why for EL5/6, though? [07:11:43] aside from releasing the web site i don't think you need anything from me for 1.6.2.1, right? [07:12:06] Derrick, I don't think so. Thanks for pushing the tag. [07:12:22] (within minutes after I asked you to do it...) [07:13:22] And Christof will supply tumbleweed rpms he said. [07:14:05] I think it's just "get it out and forget it" now. [07:14:43] Stephan Wiesand: I just fired off the build, it's just doing the default build order [07:15:07] > get it out and forget it Something like that. [07:15:47] Stephen, thanks. I think I'll upload the rpms for Fedora releases using 3.8 kenrels though. [07:16:06] Ok, let's get to 1.6.3, which is a bit more exciting. [07:16:14] yep, they'll be done sometime tomorrow [07:16:24] maybe even tonight [07:16:38] Great. [07:16:39] I think it goes through Fedora releases in reverse order [07:17:10] So, we should decide what to do about ih_sync_all. [07:18:16] Choices are 9407 or the "make everyone happy at the price of more options and bitrot" patch Andrew created (Thanks!). [07:19:09] Votes? Vetos? [07:20:09] Chaskiel did not respond on -devel, or did I miss it? [07:20:14] I'm fine with either, but I don't know if the arguing is going to go away anytime soon [07:20:27] so as I mentioned, if it's a runtime option at least we can argue by changing options instead of applying patches [07:23:26] Why are osi_Asserts() added in ih_reallyclose() in 969{4,5}? [07:27:40] Hmm... not much input here... [07:28:00] the case they are added in should never be called in the cases they're added for [07:28:40] Question: If we simply merge 9407, thus kill ih_sync_all/thread for good, who would patch their servers or recommend their clients to do so? [07:28:43] we had better get that right, since in those cases the user explicitly asked us to make sure we synced the data [07:28:45] (i had the same question when i reviewed, and had to reread the code once to get it) [07:29:04] oh, if that's just unclear, I could add a comment or something [07:29:05] yes, those are valid asserts. (could be debug_asserts if we had such a thing) [07:29:35] that code block is just when ih_synced is set, so they should only be hit if our syncs are delayed or "onclose" [07:30:03] or perhaps more generally, "between the immediately-sync and never-sync cases" [07:31:30] > who would patch their servers "cmu people"/chaskiel, and I could imagine maybe cern if there are internal debates about this I don't know about [07:32:11] But even chaskiel wouldn't patch it back to the current state. [07:33:17] And CERN questioned the syncs already when they submitted the patch putting them in the background. [07:33:32] And the persons in charge there +1ed 9407. [07:34:12] yeah I know; I was theorizing "worse"-case if there were others that disagree with that that would come later and maybe want to patch it or something [07:34:28] that is, others in cern that are not part of this conversation [07:34:42] I just mean, those are the most likely places I see that happening [07:35:33] Putting it differently: If we applied 9465, would you recommend sync=none to all your customers? And to me? [07:35:38] chaskiel seemed to be indicating he'd just put the synchronous syncing back in, but I didn't think that decision was final yet for him :) [07:36:09] I don't think that number is correct, but I assume you mean the -sync option [07:36:34] I think I would just tell everyone to not do anything with it and not bother [07:36:41] yes, 9695, and -sync=never [07:37:12] if I found that there were some disk performance issues or something, I'd suggest "-sync never", or "always" if they ask or seem concerned about it or something [07:37:25] i'd not tell people to override it unless they had an issue which seemed to be related to it [07:37:43] or actually, I'd recommend "never" for zfs, at least for the person that was complaining about this behavior :) [07:38:36] Just a second, what's the default? [07:38:46] and for bitrot, well, the code differences here are very small (except perhaps for 'delayed') [07:39:04] the default as submitted now should be DELAYED for 1.6, and ONCLOSE for master [07:39:19] (if it's not actually that in gerrit, that's a mistake) [07:39:24] I thought that's what we said last week [07:39:50] Sorry, I hadn't looked at the details. [07:40:58] Derrick, will they know before their data gets eaten? [07:42:10] > I think I would just tell everyone to not do anything with it and not bother [07:42:28] if the same type of bug exists somehow and is triggered, changes 9506-9508 would avoid data loss [07:42:41] er, 9505-9508 [07:42:43] So you'd tell everyone to continue running their servers with the 1.6.2 behaviour? [07:42:44] yeah, what andrew said [07:42:49] yes [07:42:55] there are two safeguards there, either one would do it [07:43:20] yeah; as I said, I didn't mean that the ih_sync_thread needed to be removed _immediately_ [07:43:48] so I wouldn't push for customers to get away from it urgently [07:46:12] When those safeguards trigger, what happens? [07:46:37] it takes the volume offline; I think for dafs it may trigger a salvage [07:47:07] And that avoids data loss? [07:47:12] at that point there should be no damage to the volume, but taking the volume offline and bringing it back online ensures we don't corrupt anything [07:47:30] well actually [07:47:35] (avoids loss of data, including that in flight?) [07:47:37] 9507 should catch it before any data loss happens [07:47:50] er, I mean 9508 [07:48:20] 9505 just avoids the data loss from getting propagated to RO volumes, which is the thing that made this such a huge outage for both relevant incidents [07:48:43] so, you'd lose the data in the RW, but the release would just fail so ROs would still look "okay" (but old) [07:49:26] the data "in-flight" gets an error returned for it (for 9508), so the application should get an error saying that something went wrong [07:50:06] (or if dafs triggers a demand-salvage, it should go through successfully, since we're retry the request and presumably it'll go through fine on the retry) [07:50:44] I caught Garry, who is also "fond of my data" as Chaskiel is. Which is I guess a vote against 9407. [07:51:17] yeah, so, I don't see the -sync option as something I'd recommend people to use in general, but [07:52:05] I consider it one of those options where I get some people asking "why doesn't openafs do A, anything else is insane/stupid" and others asking "why doesn't openafs do B, [...]" and A and B directly contradict each other [07:52:20] so, here the behavior they want may not be the default, but at least they can do it [07:52:39] (here A and B are "doing synchronous" syncs and "doing no syncs at all", probably) [07:53:07] Ok, we have to decide on something, and there's no obvious best decision. [07:53:19] > Which is I guess a vote against 9407. not really. the fact that he needs to set a switch to get the behavior he wants doesn't mean his opinion is a vote against [07:53:26] (nor that it's one for) [07:53:58] I say we make everyone happy. If you have a problem with it, please speak up now. [07:54:11] 9407 is the option to remove ih_sync_thread, not the -sync option one [07:55:07] ==deason [07:55:17] er. yeah. then it's a vote against that. sorry. i can't keep track of all the damn numbers :\ [07:55:18] The option should be okay for MIT. [07:55:53] I'm just trying to weigh the badness of rushing new code straight into a release versus the history of data corruption. [07:56:14] by the way, are we okay with the 'master' submission saying what 1.6.3 does? [07:56:41] chas raised something about me saying that 1.6.3 would default to 'delayed' behavior, but 1.6.3 hasn't been released yet [07:57:01] I figured if something changed, another commit can just change it later [07:57:33] It will be correct once 1.6.3 was released, so what's wrong? [07:59:20] I thought he was just being picky since 1.6.3 doesn't technically exist yet :) [07:59:24] I did notice that, but it did not seem like a big deal. [07:59:30] but not worth a conversation here, then [07:59:33] move on [07:59:57] Ok, let's move on. [08:00:47] Getting the change polished may take a while. That's no problem. [08:01:30] Merging the other changes, even simple ones, is a pretty slow process now. [08:03:23] So, what else to add to 1.6.3? Those safeguards seem to have good support? [08:05:20] yes, i think so. [08:06:01] the 'salvager-fixes' topic (9457-9462) maybe should just be deferred; they would need more review, and they aren't that simple [08:06:03] Did anyone other than Andrew have a list? [08:06:19] Andrew, ok. [08:06:24] 9460 could go in, though, it just removes a function [08:06:33] (er, an unused function) [08:06:55] 9460 is on my list of things to merge. [08:07:33] But it's number six in a row of changes touching the same file. [08:08:49] 9460 is just code cleanup, so it doent need to be merged now. [08:08:55] (although three of them we just deferred) [08:09:42] it doesn't need to be merged now, but I thought it'd just be easier to get it out of the way so we don't need to look at it again later [08:10:34] sure, whatever is easier for Stephan. [08:11:01] changes 7,8,9 touching the same file are 9461, 9462, 9480 - can all those wait? [08:11:30] At this point, pretty much everything has to be rebased before it can be merged. [08:13:09] Merging too many completely out of order, especially with such "popular" files, makes things more messy. [08:14:57] you can ask the submitter to rebase? [08:15:13] Stephan Wiesand: 9461, 9462, yes [08:15:26] 9480 _can_ wait, but I wasn't including that with the "just defer them" comment [08:15:41] 9480 is much simpler/clearer, and it has 3 +1s [08:17:01] meffie: that's not always feasible; to some degree you need all 1.6 changes on the same branch, ordered one after another [08:17:02] It's the very last out of 9 remaining changes to vol-salvage.c. [08:17:30] it should not be impacted by any of the others; it touches a completely different part of the file unless I'm remembering something wrong [08:17:51] I will eventually start asking other to do the rebases simply because they become too complicated for me. [08:18:21] ok [08:19:10] But there's no point in doing it while it's already clear that it would have to be rebased again. [08:19:34] I would also raise the possibility for deferring 9571 [08:19:42] I looked at it, but it's too complex to review quickly [08:19:47] Fine, I'll merge 9480. I haven't looked at all 9 changes side by side. And I probably won't. [08:20:39] No reviews yet. It's not ready. [08:20:45] Deferring 9571 seems fine... [08:21:17] Stephan Wiesand: if you wanted, I could possibly handle rebasing/submitting after you +2 something [08:22:15] Thanks. Rest assured I'll start begging eventually :-) [08:22:38] have we ever discussed the 'flushcps' stuff in here yet? (9484-9487, topic 'flushcps') [08:22:48] I'm not sure I understand what's going on in 9597. [08:23:01] that is a change in behavior for anyone relying on 'cacheout', but I don't think that's many people [08:24:35] I feel like I looked at the flushcps stuff before, maybe for the master submission. [08:24:42] cacheout is not packaged, so anyone using it would be doing their own builds. [08:25:27] 9597: well, regarding what I said, the change in there is "better", but there is more that should be done [08:25:43] Again, two of those are sitting in the end of lists of changes touching the same files. [08:25:55] 9486 is no problem. [08:25:59] so if we release with that, it will be an improvement over 1.6.2, but I think we're still a regression from 1.6.1 [08:26:59] I could submit something very targeted to try and fix the worst of the cases I mentioned for 9597, but it's really a bandaid [08:27:16] (that is, for 1.6.3) [08:27:36] My impression was it introduces different regressions? [08:29:49] yeah, I suppose; I think it's still better than 1.6.2; I didn't think releasing 1.6.3 with the 1.6.2 behavior in that file was really an option [08:31:13] The 1.6.2 behaviour is "error messages without actual problem"? [08:32:05] And after 9597 that becomes "no error message in case of some real problems"? [08:33:12] you still get an error message, just not saying what the specific error code was [08:33:30] Stephen, I think the question is whether 9597 is acceptable or whether 1.6.3 should block on a more comprehensive fix. [08:33:39] and those code paths I believe are very rare [08:34:40] Is a solution being worked on on master? [08:35:21] no [08:35:34] but the type of change I'm talking about is adding a if (foo) printf(bar): [08:35:56] (and I was asking here first, if we'd want something for 1.6.3) [08:37:22] Stephan, this is the type of change which is unlikely to get developer cycles unless one of two things happens. One, a customer demands under a support contract. Two, it is a blocking issue for a release. [08:38:53] The problem was merged on master late in 2011, and on 1.6.x a year ago, and it wasn't noticed until March? [08:39:17] fixing the error messages? I am pretty darn motivated to fix it on my own, because otherwise I get vos output for issues that's impossible to debug [08:40:13] Andrew, "fit it" = "band aid" ? [08:40:23] (and the 'printing error messages on success' problem already has support-contract-impetus) [08:40:43] Stephan Wiesand: it was reverted for 1.6.1 [08:41:24] Can we revert it again? ;-) [08:42:13] I advocated for reverting it from 1.6 entirely, but it was only reverted from 1.6.1 specifically back then [08:42:30] (to be clear, this would mean reverting the "stay up during releases" vos functionality) [08:42:55] which was requested by one of deason's customers. [08:43:28] I didn't mean the revert was requested, just that "hey this prints error messages what's going on" has a customer actually saying something [08:43:56] ETA for the band aid? [08:44:01] requesting the revert is me speaking for myself, due to the complexity and feature-y-ness of the change [08:44:09] today; 24 hours [08:44:53] Any objections to 9597 + band aid for 1.6.3? [08:45:20] the functionality addresses significant problems caused when the only copy of a .readonly volume that is valid is on the file server partition that has the RW instance. If you have geographically distributed servers across expensive/slow pipes it is very sad when all of your clients fail over to the RW file server which at the same time is attempting to distribute the updates to the volume. [08:46:19] Thanks for the explanation, Jeff. Sounds like we don't want to remove that. [08:46:45] it's a problem that has existed since forever [08:47:21] we don't need to have this argument every single release, but I may make a snide remark whenever a bug with it is encountered, and I apologize in advance [08:48:06] don't apologize. snide remarks are what make openafs great. [08:48:28] no objections for that, though, I guess, so we can move on [08:48:37] that and retaining problems which have existed forever continuing on forever. [08:49:04] Yeah, it just wouldn't be OpenAFS without problems that have existed forever. [08:49:05] (well, to be clear, I still kinda object to the general change, but I feel we've already made that decision, so continuing to argue about it is not fruitful) [08:49:55] Ok, 9597 + band aid it is. [08:50:33] So, what else absolutely must go into 1.6.3? [08:50:34] in 9604 I'm still not clear on what this is actually for in 1.6? meffie? [08:51:24] Says "64-bit solaris" [08:52:10] enabling 64-bit solaris builds is a different change, which I don't think is currently submitted for 1.6 [08:53:13] oh, that was just for build errors hit when building 64 bit solaris bins. [08:53:33] yeah, but we don't do 64-bit solaris bins on 1.6, yes/no? [08:53:55] there are osconf.m4 changes needed to actually do 64 bit on sunos [08:54:41] so does 9604 help? can it be deferred? [08:55:24] Sounds like it's safe to defer, then. [08:55:54] well, it can be deferred, since the osconf.m4 changes werent finished. [08:56:07] okay [08:56:13] Deferred :) [08:56:28] one more thing; I believe we said last time that we'd include the changes so we could use a non-standard heimdal install [08:56:47] (the måns nillson heimdal-on-solaris-10 thing) [08:57:09] that's 9692, and 9693, but 9693 is just master and hasn't been merged yet [08:57:30] but it's just build system logic [08:59:41] There's 9686/7 too. [09:00:39] I thought we said those were okay, last week. [09:00:39] Which haven't had much review either. [09:01:26] we said they were okay strategically, but people still need to +1 to indicate they've checked it and it looks okay [09:01:37] Ah. Okay, will do. [09:02:08] (and to explicitly indicate that they are actually okay with it going in) [09:02:22] The changes became available after the meeting. [09:03:29] Even if the way to a solution was discussed, I'd still like to have +1's from others than the author of a change. Especially one that hasn't even spent a week on master. [09:03:38] --- squinney has left [09:05:08] But I'm ok with merging those once reviewed that way. [09:05:29] 9693 is really really late. [09:06:26] And the last time we accepted a last minute "build system only" change, in hindsight we shouldn't have. [09:07:10] which one was that? [09:07:25] get asn1 libs for aklog [09:07:47] Er, that's 9693 [09:07:56] yeah I meant, what's the "last time" [09:07:58] Not sure what the last build-system-only change was. [09:08:25] Something to help perl-AFS. Only. [09:09:21] that was a mistake? [09:09:33] i dont recall any fallout. [09:09:43] but yes, this isn't critical; I'm not pushing too hard [09:10:02] seems 9693 can be deferred [09:10:05] I just thought risk was low because it was just build (and a simple one) [09:12:21] but yes, this isn't terribly critical and the people affected by it (or... "person" :) can work around it by changing stuff at build time [09:13:46] Get it merged on master and reviewed on 1.6.x in time, and we'll see. Right now, it's just late. [09:15:03] okay, moving on? [09:15:52] Ok, let's move on. But we'll have to stop eventually. [09:18:40] are there more? [09:18:40] oh well, I didn't mean I had anything else [09:20:39] Stephan Wiesand: anything else? [09:20:55] While there are quite a few changes left that make sense to me, I think we have enough changes for the next release. [09:23:27] Ok. Polishing and merging what's agreed will take a couple of days I guess. [09:23:41] And then we should have a pre1. [09:23:47] yay [09:24:36] AOB? [09:25:19] I was wondering whether it would make more sense to maintain the release notes on the 1.6.x branch? [09:25:57] aob == all on board ? [09:26:30] "Any Other Business". [09:26:44] ah, :) [09:27:08] But "all on board" is a question that makes sense too, of course. [09:29:06] Any volunteer for writing minutes? [09:30:03] okok, I'll do it... [09:30:55] Thanks everyone. [09:31:19] thank you Stephan. [09:31:23] --- meffie has left [09:31:29] Thanks, Stephan. [09:31:54] Back to work. Bye. [09:31:57] --- Stephan Wiesand has left [10:42:27] --- Marc Dionne has left [10:44:08] --- stephan.wiesand has become available [11:35:48] --- meffie has become available [14:03:33] --- stephan.wiesand has left [14:17:12] --- meffie has left [15:01:44] --- deason has left [19:48:04] --- ktdreyer has left [19:48:38] --- ktdreyer has become available [20:50:30] --- Jeffrey Altman has left: Disconnected [20:54:57] --- Jeffrey Altman has become available