[04:59:36] --- shadow@gmail.com/barnowl20770CFD has become available
[05:28:58] --- wiesand44275 has become available
[05:28:58] --- wiesand44275 is now known as wiesand41455
[05:28:58] --- wiesand41455 has left
[05:28:58] --- wiesand44275 has become available
[05:43:00] --- wiesand44275 is now known as wiesand62993
[05:43:00] --- wiesand62993 has left
[05:43:00] --- wiesand44275 has become available
[05:43:00] --- wiesand44275 is now known as wiesand62993
[05:43:00] --- wiesand62993 is now known as wiesand91343
[05:43:00] --- wiesand91343 has left
[05:43:00] --- wiesand91343 has become available
[05:43:00] --- wiesand91343 is now known as wiesand44275
[05:43:07] --- wiesand44275 is now known as wiesand91343
[05:43:07] --- wiesand91343 is now known as wiesand13335
[05:43:07] --- wiesand13335 has left
[05:43:07] --- wiesand13335 has become available
[05:43:07] --- wiesand13335 is now known as wiesand44275
[05:43:07] --- wiesand44275 is now known as wiesand91343
[05:43:07] --- wiesand91343 has left
[05:43:07] --- wiesand44275 has become available
[05:43:07] --- wiesand44275 is now known as wiesand91343
[05:43:07] --- wiesand91343 is now known as wiesand56226
[05:43:16] --- wiesand56226 is now known as wiesand44275
[05:43:16] --- wiesand44275 is now known as wiesand91343
[05:48:18] --- wiesand91343 is now known as wiesand44275
[05:48:18] --- wiesand44275 is now known as wiesand56226
[05:48:18] --- wiesand56226 is now known as wiesand91343
[05:48:19] test
[05:48:26] --- wiesand91343 is now known as wiesand44275
[05:48:26] --- wiesand44275 has left
[05:48:26] --- wiesand56226 has become available
[05:48:26] --- wiesand56226 is now known as wiesand916
[05:48:27] --- wiesand916 is now known as wiesand56226
[05:48:27] --- wiesand56226 is now known as wiesand91343
[05:48:27] --- wiesand91343 is now known as wiesand44275
[05:48:27] --- wiesand44275 is now known as wiesand92108
[05:48:27] --- wiesand92108 has left
[05:48:27] --- wiesand92108 has become available
[05:48:27] --- wiesand92108 is now known as wiesand56226
[05:48:27] --- wiesand56226 is now known as wiesand44275
[05:48:27] --- wiesand44275 is now known as wiesand91343
[05:49:42] nope
[05:53:21] --- wiesand91343 is now known as wiesand56226
[05:53:21] --- wiesand56226 is now known as wiesand44275
[05:53:21] --- wiesand44275 is now known as wiesand92108
[05:53:21] --- wiesand92108 is now known as wiesand91343
[05:54:53] --- wiesand91343 is now known as wiesand56226
[05:54:53] --- wiesand56226 is now known as wiesand91343
[05:54:53] --- wiesand91343 is now known as wiesand44275
[05:54:53] --- wiesand44275 is now known as wiesand92108
[05:58:05] --- wiesand92108 is now known as wiesand56226
[05:58:05] --- wiesand56226 is now known as wiesand91343
[05:58:05] --- wiesand91343 is now known as wiesand44275
[05:58:05] --- wiesand44275 is now known as wiesand92108
[05:58:05] --- wiesand92108 has left
[05:58:05] --- wiesand38670 has become available
[05:58:05] --- wiesand38670 is now known as wiesand56226
[05:58:05] --- wiesand56226 is now known as wiesand91343
[05:58:05] --- wiesand91343 is now known as wiesand44275
[05:58:05] --- wiesand44275 is now known as wiesand92108
[05:58:05] --- wiesand92108 has left
[05:58:05] --- wiesand94137 has become available
[05:58:05] --- wiesand94137 is now known as wiesand56226
[05:58:05] --- wiesand56226 is now known as wiesand94137
[05:58:05] --- wiesand94137 is now known as wiesand92108
[05:58:05] --- wiesand92108 is now known as wiesand44275
[05:58:05] --- wiesand44275 is now known as wiesand91343
[05:58:15] --- wiesand91343 has left
[06:04:28] --- wiesand42838 has become available
[06:04:29] --- wiesand42838 has left
[06:04:29] --- wiesand42838 has become available
[06:04:29] --- wiesand42838 has left
[06:04:29] --- wiesand98706 has become available
[06:04:41] whoami
[06:05:22] wiesand98706
[06:05:36] --- stephan.wiesand@googlemail.com has become available
[06:06:02] --- Marc Dionne has become available
[06:06:16] Messages behaves weird.
With Adium, I can't join this room at all.
[06:06:40] some ssl thing. with adium i can't currently log in to my openafs.org jabber account at all.
[06:07:18] Joining openafs with Adium works.
[06:08:24] and whoami
[06:09:22] Ok, so now it works with Adium too.
[06:11:11] All messages I send from Messages appear duplicated in Adium. All your messages appear duplicated in Messages. Weird.
[06:12:41] all messages period are appearing duplicated for me.
[06:14:02] At least I'm no longer kicked out atm.
[06:28:28] Pidgin 2.10.9 can be used but not 2.10.8
[06:29:44] Hey, that one was duplicated in Adium but not in Messages
[07:01:45] --- deason has become available
[07:02:08] Ok, let's try... Hello all, who's able to participate?
[07:02:43] hi Stephan, i'm here
[07:02:50] hi
[07:04:53] I sent a question to the list a short while ago. No answers, so let's skip the first item?
[07:05:40] Marc, what's Linux up to?
[07:05:58] we're at 3.14-rc1, and still ok
[07:06:18] Good news. Thanks.
[07:06:49] good thing we removed use of getname(), since it is no longer exported
[07:07:00] sorry, i got distracted reviewing code.
[07:07:13] I noticed ;-)
[07:07:43] I saw you merged 10774.
[07:08:06] Is that likely to help with the issue reported on openafs-info lately?
[07:08:19] don't think so
[07:08:54] But it's still something for 1.6.7?
[07:09:15] yes
[07:09:37] it helps with the getwd() getting ENOENT, which has been a real problem for some users, so I think we want it for 1.6.7
[07:09:47] Ah, it's already there.
[07:09:56] Yes, it's a problem here too.
[07:10:03] yeah Anders already pushed it to 1_6
[07:10:49] anders wants it and is apparently on top of things. so yay anders.
[07:11:11] It's the next one I'll test. I'll likely roll it out here before 1.6.7 if it does well.
[07:12:00] wifi-controllable implanted cybernetic hair follicles. that's the only answer i have.
[07:12:09] uh. sorry. replied to wrong message
[07:12:32] Glad you said that ;-)
[07:12:46] Let's talk about the more general questions (3.4 on the agenda) next?
[07:13:33] GUACB: Do we want to wait for the perfect solution?
[07:13:54] perfect being per-something as opposed to on/off?
[07:13:59] Or could we go with the simple global switch for the time being?
[07:14:03] Right.
[07:14:47] i'm fine with global switch. people who are worried about crashing servers either 1) are admins who are already vulnerable and should fix it or 2) are clients who can just turn it off
[07:14:55] global switch is better than nothing; we could add more granular stuff later
[07:15:24] I'm a bit selfish here. I want it for my clients :-)
[07:15:24] and per (something) handling code will probably have other uses (or even be completed because something else needs it)
[07:15:37] stephan, i just patch it into my clients directly.
[07:15:44] per-cell
[07:16:16] I really try to run what we release...
[07:16:42] Or I wouldn't be whining about some features ;-)
[07:16:58] if you released what i wanted to be running i'd be running it :) (well, for production things. for machines where i mean to test, who knows what i am running)
[07:17:05] and we already have 'fs setcell'.... a global switch would require a new command or something, but a per-cell one has more existing infrastructure to work with, really
[07:18:15] Ah, I understood the global one would be really easy, and per-cell much more work.
[07:18:17] I collect statistics of deployed file server versions. A per cell switch wouldn't help most sites.
[07:18:49] I don't see what those two sentences have to do with each other
[07:19:03] yeah, i'm confused by that statement
[07:19:54] Jeff, could you elaborate?
[07:20:04] I'm assuming he's typing a long response :)
[07:20:13] a majority of publicly accessible cells have a mixture of file servers that range across versions that would end up with corrupted memory and those that do not.
[07:20:31] yeah, so you wouldn't set it for one of those cells
[07:21:01] so per-cell is fine. those cells would want "off"
[07:21:08] end users do not know what their file servers are running.
[07:21:22] or which volumes are on which file servers
[07:21:53] so... you're selling per-cell, yes?
[07:21:55] I thought the way it would usually work is that you'd set it for your local cell, or an admin with deployed clients would set it for the cell(s) they control
[07:22:39] just glancing at the code, it does seem easier to do it that way (and yeah, previously I had thought the global switch would be more work, but I don't think so anymore)
[07:22:48] That's what I would do.
[07:23:10] I don't think there is much more safety in a per cell switch than a global switch.
[07:24:30] If either one was doable for 1.6.7, I'd like to have it.
[07:25:19] Are there any objections to either version?
[07:26:19] Taking this as a "no".
[07:26:54] Next one: interrupt rx calls accessing offlining volumes?
[07:26:58] not done yet
[07:27:19] gerrit 6266 and 10799
[07:28:06] The behavior on master for GUACB is on all the time? Does the configuration permit turning it off there? (global or by cell or never?)
[07:28:36] no, no config was added yet
[07:28:36] the configuration of the unix client on master? it's always on
[07:29:00] I know that no configuration has been added. What is the final behavior going to be?
[07:29:12] default to on, is what I assumed
[07:29:20] (but still configurable, yes)
[07:29:21] all the time, no configuration
[07:31:37] Could you (Derrick, Andrew, Jeff) summarize your personal conclusion?
[07:31:45] I don't have a conclusion yet
[07:33:05] A global switch is easy to specify. It's a command line option for 'afsd' and it is on/off. Once you say per-cell, now all sorts of questions come up as to how configuration should be provided and what the long term user experience should be. It isn't just a new option for 1.6 that becomes a no-op on master.
[07:33:47] conclusion? my conclusion was upgrade your damn fileservers and it was that long ago. how i think this should go? add config, defaults off in software but packaging if you use openafs.org packaging turns it on in the default config files we ship
[07:33:49] Instead it becomes a new bit of configuration data that should probably be set as part of the proposed CellServDB replacement
[07:34:24] for this, that is my reaction as well. It's a bug. It was fixed years ago.
[07:34:34] my thoughts: one series with per-cell config defaulting to 'off' (1.6), one with per-cell defaulting to 'on' (1.10), and onwards just 'always on'
[07:34:48] but if someone does a global switch I won't try to block it
[07:35:14] the problem is that we know that large numbers of sites are still running 1.2.x and 1.4.x servers that have this bug and for whom new clients are deployed more frequently than file servers are upgraded.
[07:35:44] to what end?
[07:37:26] I really wouldn't want to block this on a CellServDB replacement.
[07:37:52] Would "fs setcell" be acceptable?
[07:38:12] user visible changes that must be supported forever are what I want to avoid
[07:38:55] this is a bug and to tiptoe around the bug we are going to create a large documented and supported user visible configuration infrastructure?
[07:41:41] there could conceivably be an 'fs' subcommand or something that is explicitly documented to have changing parameters over release
[07:41:43] So you're in favor of merging it into 1.6.7, on unconditionally?
[07:41:44] over releases
[07:41:57] (jeff)
[07:42:10] This bug was fixed in 2007. It's been nearly seven years.
[07:43:27] I'm feeling like bringing this up on -info...
[07:44:22] i won't tell you not to, only that how you explain it will calibrate the degree of informed that the answers you get are.
[07:45:33] Here is what I would like to do. SNA and YFSI will perform a review of all of our customer systems to confirm that none of them are still vulnerable to this issue. I would hope not because we have all migrated our customers to rxkad-k5. We will send an announcement mail to openafs-info and openafs-announce reminding folks of this issue and recommending that file servers be upgraded. We will specifically announce this change at EAKC. And we will unconditionally turn on this feature in 1.6.7 to ship after EAKC
[07:46:27] I just don't understand how this is even a question
[07:47:05] if it's even 1% possible that this bug could cause 1 incident of 1 fileserver hitting memory corruption, turning this into an option is worth the maintenance effort
[07:47:34] that would only be true if there were no clients in the wild issuing the calls.
[07:47:40] Arla always issued the calls
[07:47:48] The Windows cache manager always issues the calls
[07:48:14] there are plenty of sites with only unix cache managers, and a mix of old and new
[07:48:42] and there are plenty of publicly-available fileservers that apparently haven't been hit by this enough to notice
[07:48:57] (er, I assume; there were at some point, I haven't checked recently)
[07:49:01] and there are plenty of sites that are never going to upgrade file servers and will upgrade clients to a version that has this on by default. It won't matter if we wait another five years.
[07:49:10] Running such old servers publicly, you put yourself at risk in more severe ways already.
[07:49:32] yes, but if a client wants to not impact the fileservers of other cells, without an option they have no choice
[07:50:00] there is a cost associated with adding more knobs
[07:50:09] there's no way for a user to try to get a file at cell.example.com without possibly causing the administrator a headache
[07:50:28] yes, and it's so much smaller than the benefit here
[07:50:36] agreed. there is no way for an end user to know if it is ever safe or not
[07:50:49] the suid/nosuid obviously was a huge benefit, and the cost of maintaining those options is near zero
[07:50:52] therefore it is never 100% safe to be on.
[07:51:02] well, there is. it just means they have to pull the plug on their network before shutting down afs. so. not "no way" just "not a particularly useful way"
[07:51:13] haha, okay shadow :)
[07:52:22] what is the difference between turning it on by default today and doing so in six months or a year?
[07:53:09] at minimum, which release it is in. possibly which series.
[07:53:16] it gives a time period where people can see if things seem to start breaking after they either (1) turn the switch on, or (2) upgrade to a version where it's on by default
[07:53:51] if they upgrade and see something starts breaking, with it always-on they can downgrade and stay downgraded; or with an option they can just turn off the option
[07:54:09] that is true for a global option.
[07:54:44] how will an end user that deploys a new version on their mac know that the new version had an impact on file server stability?
[07:55:04] wiesand98706: I'm not sure if we're going to get much agreement in here, and this is taking like the whole meeting
[07:55:13] will the end user know that there is a core file? that the file server restarted? that a thread deadlocked?
[07:55:33] defer?
[07:55:52] I'm now rather sure we'll never ever get agreement on this.
[07:55:59] the admin could be monitoring client versions; but I had thought this would be most useful for clients under some kind of control, not arbitrary over-the-internet clients
[07:56:17] (I'm talking about "any kind of option" here, if that's not clear)
[07:56:26] admins with control would have upgraded or patched their file servers
[07:56:27] I rule: this is not a release issue. It's AFS politics. Not even limited to OpenAFS.
[07:56:44] I'll leave the decision to the gatekeepers.
[07:56:45] that's not true at all
[07:56:53] (er, admins would have upgraded...)
[07:57:12] well, it could be mentioned on -info or somewhere....
[07:57:36] I vote for unconditional on with no configuration switch
[07:58:14] Jeffrey's plan looks fine to me (announce, EAKC,...).
[07:59:02] I also think turning it on unconditionally in the middle of a stable series is pretty rude
[08:00:22] We'd have the very same discussion when the next stable series becomes ready.
[08:00:52] introducing something like this in 1.10.0 I think is less "rude"
[08:00:59] not if it's forked from master.
[08:01:07] it's a bigger jump, so it's less unexpected that it causes stuff to break
[08:01:52] but that would also mean it's a long way off
[08:02:13] the runtime option I thought would be a less "surprising" way to get it in the middle of this series
[08:02:29] but sorry sorry, I don't mean to be dragging it out
[08:02:40] are we seeing a decision here now, or are we deferring?
[08:02:50] (and thus talking about something else?)
[08:03:37] I'll leave it to the gatekeepers. If they unequivocally want it in 1.6, I'll merge it. I like the plan to announce it and talk about it at EAKC.
[08:03:49] The file server versions that are at risk are 1.3.50->1.4.5 and 1.5.0->1.5.27.
[08:04:08] this is SA-2007-003
[08:04:43] are we mentioning/asking -info?
[08:05:40] Announcing on info is part of Jeff's proposal.
[08:06:11] On to "vol: Interrupt RX calls accessing offlining vols"?
[08:07:05] 6266 + 10799
[08:08:08] Is it a feature for 1.6.7?
[08:08:34] i'm fine with it
[08:08:48] it was discussed as desirable before; it's been running in production with dafs for a little while now, this time
[08:09:05] Objections?
[08:10:19] I'm having trouble loading the changeset in gerrit. gerrit is being very slow.
[08:11:02] Ok, while waiting for that: Remove the ih_sync thread?
[08:11:37] Alternatively, set the default to -sync=none in 1.6.7?
[08:11:48] 6266/10799 is the -offline-timeout thing, if that helps at all waiting for gerrit
[08:12:46] defaulting to -sync=none seems "safer" if people really do want the old behavior, but I don't feel so strongly about it
[08:13:13] i'm down with at least sync=none
[08:13:41] Sorry: "I'm down with" means?
[08:13:55] he's okay with it
[08:14:10] Andrew, have you received feedback from end user sites that have deployed file server versions with the -sync option? What are they selecting or what are you recommending?
[08:15:00] "it's a good plan"
[08:15:02] IIRC, we recommended sync=none in the release notes and/or announcement.
[08:15:04] either 'none' or 'always', since those are the options that make sense to me; but that usually turns into just 'none' because people don't like the perf hit
[08:15:56] there's no real "feedback", the only expected change is that volumes don't destroy themselves randomly anymore, and that seems to be the case :)
[08:16:04] Or at least "one of the others".
[08:18:42] Ok, sync=none by default, and leave the switch and the thread in. Announce loudly.
[08:18:54] I feel like if there are objections to this, they'd come from people that are not in this room
[08:19:17] I'm fine with 10779
[08:19:27] okay, I'll submit a change to default to sync=none, then
[08:19:41] Thanks. Thanks.
[08:19:45] I think you mean 10799
[08:20:03] I hope you're fine with 10779; you merged it :)
[08:20:29] yes, I +1 10799 in gerrit
[08:22:26] Ok. I think this has taken long enough. I propose skipping going over individual changes on 1.6.x now.
[08:22:50] I am in agreement with changing the -sync default
[08:23:14] there are 2 specific things I want to bring up, but this will seriously be very quick
[08:23:20] We have agreement on something? ;-)
[08:23:29] Andrew: ok
[08:23:52] derrick, can you please abandon 9370, and 6342?
[08:24:18] although, I wonder if we should add a warning to the log if -sync is not explicitly specified stating that it is now off
[08:25:35] hmm, maybe, but then the only way to make the warning go away is to provide an explicit setting for it
[08:25:55] I worry slightly about whining about too many things in the log so that people don't pay attention to them
[08:26:22] I think such a discussion can be in the gerrit for the change
[08:27:02] I assume the meeting is coming to a close, but we didn't discuss the thing wiesand asked about on release-team; did I interpret it correctly that we're skipping that?
[08:27:18] One message per fileserver start. Doesn't have to stay forever. Why not?
[08:27:41] shadow: thanks
[08:27:49] i think we should wait til next week to figure out if we should discuss that here, since we don't have a better answer
[08:27:57] okay
[08:28:34] Good. I think this concludes today's meeting.
[08:28:50] Thanks a lot everyone for the constructive discussions.
[08:29:47] The minutes will give anyone reading -devel a first opportunity to complain about the plans.
[08:30:47] Will write them later. Bye now.
[08:30:52] --- wiesand98706 has left
[08:31:00] --- stephan.wiesand@googlemail.com has left
[08:31:09] --- deason has left
[08:31:16] bye
[08:31:16] --- wiesand has left: Lost connection
[08:33:36] --- Marc Dionne has left
[09:12:39] --- deason has become available
[14:36:44] --- deason has left
[14:48:47] --- deason has become available
[17:15:08] --- Jeffrey Altman has left: Disconnected
[17:15:21] --- Jeffrey Altman has become available