[00:04:05] --- Simon Wilkinson has become available
[00:56:18] --- Stephan Wiesand has become available
[01:06:48] --- Simon Wilkinson has left
[05:18:02] --- Simon Wilkinson has become available
[05:58:57] --- meffie has become available
[06:13:09] --- Simon Wilkinson has left
[06:16:13] --- Simon Wilkinson has become available
[06:31:04] --- Simon Wilkinson has left
[06:56:41] --- paul.smeddle has become available
[07:00:10] Hello
[07:00:23] Hi
[07:00:35] Hi All
[07:00:48] --- deason has become available
[07:01:13] just looking at https://rt.central.org/rt/Ticket/Display.html?id=131587
[07:02:48] Something to worry about? Any chance this is already fixed in 1.6.2?
[07:02:50] Not sure what inherently makes that interesting.
[07:03:22] the other bug he reported is something we have a fix for, but we deferred it until after 1.6.2
[07:03:44] that one could be the same thing
[07:04:27] that is, 131277 and 8846
[07:04:33] Yes, looks similar.
[07:05:55] Does 131587 want a note that we suspect memory corruption and not an actual lack of memory?
[07:07:45] --- Marc Dionne has become available
[07:08:11] I'll handle that (give him some gdb stuff to run); it doesn't seem terribly relevant here... unless we're blocking something on it
[07:08:40] We probably shouldn't, IMO.
[07:12:50] I have an agenda item to add whenever we have time.
[07:13:09] I guess we have time ;-)
[07:13:18] (Not blocking on that ticket seems fine to me.)
[07:14:14] Ok, what's your item?
[07:14:30] We had a couple of patches go in to fix warnings on freebsd 9.1 that, if I remember correctly, only cause our build to fail on master and not 1.6, because the freebsd module build process is different.
[07:15:38] The warnings do indicate real (potential) issues, namely 64-bit values being collapsed to a 32-bit type. I think the only one that would potentially affect us is the mount flags field, and I'm not sure that the fix really needs to go into 1.6.2.
[07:15:51] But I figured I would ask the team.
[07:16:13] they're all isolated enough that they affect only freebsd, right?
[07:16:23] Should be.
[07:16:56] i always took such things, but, not my circus, not my monkeys
[07:17:22] You'd want to push them to 1.6.x directly?
[07:17:41] The patches are already in master, just haven't been pulled up.
[07:18:18] I suppose I might as well try to get them in.
[07:18:20] --- Marc Dionne has left
[07:18:40] --- Marc Dionne has become available
[07:19:02] Maybe I will think harder about reworking the freebsd module build on 1.6.
[07:19:08] (But not for 1.6.2.)
[07:20:06] It's really late for additions to 1.6.2. But pushing them to gerrit for review on the 1_6_x branch won't hurt from my point of view.
[07:21:31] It seems we're blocked on the availability of major players for a few days.
[07:21:47] kaduk, could you defer to a 1.6.3, assuming it's not too far out?
[07:22:21] I think that would be okay.
[07:23:43] are we deferring the linux 3.8 stuff? that's the only other thing I see submitted, I think
[07:24:02] +1 to deferring that
[07:24:19] I would wait. Still in rc
[07:24:25] Having all changes potentially wanted in 1.6.3 visible in gerrit for the 1_6_x branch would be helpful to me.
[07:25:19] Yes, the linux 3.8 changes should wait for 1.6.3, and we should make that happen in a more timely fashion than 1.6.2.
[07:26:43] There are two older changes sitting in gerrit: 6266 and 6272. Should they go into 1.6.3?
[07:27:07] it's a bit late...
[07:27:12] oh, you said 1.6.3 nor 1.6.2
[07:27:15] er, not 1.6.2
[07:27:39] Yes, I'm talking about 1.6.3 :-)
[07:27:39] I think they could
[07:27:58] are we talking about 1.6.3 for the rest? is there nothing further for 1.6.2?
[07:28:31] i suppose it is not a big deal to share that a security patch will be added before release.
[07:30:01] er, my earlier comment was for 6266; I didn't notice 6272 was something else
[07:30:15] 6266 I believe needs other patches to go along with it
[07:30:30] 6272 I feel really uneasy about without a mechanism to disable it...
[07:31:44] What harm can it do?
[07:33:01] for any fileserver older than a certain point, it will very likely crash it or cause memory corruption on it when the new code path is hit (when the client is shut down)
[07:33:20] the only reason to not just take this is that servers older than 1.4.5 can end up with memory corruption.
[07:34:13] which is unavoidable, but without a switch to flip it off it's a problem for environments that know they have older fileservers and are willing to work around it
[07:34:29] Hmm, probably ought to migrate away from that 1.2.11 machine I've got...
[07:34:37] i hope all of you with servers on the public internet already upgraded: the client on my laptop is otherwise happily corrupting away. anyway, mech to disable: new pioctls for this stuff are lame. sysctls are not global. other suggestions?
[07:34:37] I'm not sure "older than 1.4.5" is completely correct... there's a minimum version, too, but I forget what it is
[07:35:14] gerrit actually claims 1.4.5 and older; my memory is faulty
[07:35:31] command line switch for afsd?
[07:35:33] guacb wasn't in ibm 1.0, was it?
[07:36:19] i don't think so.
[07:36:21] afsd switch still needs a syscall or pioctl or something
[07:36:37] was not, no
[07:36:51] for some other rarely-used options we have sysctl in linux and /etc/system on solaris; I assume we don't have anything similar in place elsewhere
[07:37:06] > afsd switch still needs a syscall or pioctl or something
no afsd switch uses a pioctl that i know of.
all afs3 syscall options
[07:37:17] macos has sysctl stuff also
[07:37:32] but it's different
[07:37:47] those platforms may be enough, but a syscall with afsd switch may be easier
[07:38:05] I was hoping for cell- or fileserver-granularity with disabling before... but a global switch is probably good enough
[07:38:35] i think a syscall with afsd switch is better, simply because it doesn't pollute the pioctl namespace, but an O pioctl is probably also fine.
[07:39:12] making it runtime configurable may make more sense, since otherwise changing it would require a shutdown, which... "oops"
[07:39:32] heh
[07:39:33] or that is, making it possibly change at runtime
[07:39:43] @nelson_muntz(haw haw!)
[07:39:46] or maybe we don't care; I'm not expecting it to get used a lot
[07:40:14] i am willing to add an O pioctl for it, if that will make it palatable.
[07:40:42] Andrew, I'd turn it on on my clients if it were available.
[07:40:48] --- Simon Wilkinson has become available
[07:41:12] turn on... the disabling? so, turning it off? :)
[07:41:24] or you mean, you want to have the GiveUpAllCallBacks functionality turned on?
[07:41:33] Hi Simon
[07:41:38] Turn on the feature.
[07:42:16] yes, I mean, I don't expect the "turn it off" functionality to be used very often
[07:42:35] afsd -giveupyourcallbackswhenyoushutdown
[07:43:04] I thought we were talking about enabling GUACB by default, and having an option to turn it off
[07:43:19] Well, there are two ways to do it.
[07:43:42] afsd. as opposed to afsd -takealongtimetoshutdown
[07:44:23] how long?
[07:45:36] I will point out that the windows client has been giving up all callbacks by default since january 2011.
[07:46:28] it was originally disabled when the code was added in sept 2007.
[07:46:30] And it caused quite a bit of trouble at TU Chemnitz ;-)
[07:48:34] I do like the feature!
[07:48:34] Stephan Wiesand: which caused the trouble? "it" ?
[07:49:39] They had a windows classroom or something which booted every night.
And their servers started crashing every night at that time.
[07:50:37] failure to upgrade your servers leaves you open to a denial of service attack. it also means that your servers are old enough to be sending bogus status info to clients that can corrupt data in the client cache. since it is impossible to control what the clients do, the servers must be upgraded. Arla calls GiveUpAllCallbacks as well and has since the RPC was added.
[07:51:20] I'm not sure turning it on is the kind of change we should be making in a minor release
[07:51:42] I thought the windows client had a registry option to disable it
[07:51:45] The TU Chemnitz problems were in 2007 when the Windows client first added a call to GiveUpAllCallbacks. As a result of that experience the functionality was disabled for more than 3 years to give sites an opportunity to upgrade
[07:52:15] and yeah, what simon said, but it seems to be highly desired...
[07:52:22] The Windows client does have an option to disable it, but what good does that do a server on the public internet?
[07:52:58] People expect to be able to deploy minor releases trivially, without necessarily reading release notes
[07:53:19] I wasn't talking about the public internet
[07:53:35] I hope the afs3.6 base configuration servers aren't on there either
[07:54:12] I am aware of at least two AFS 3.6 servers on the public internet
[07:54:59] um, okay, that's nice
[07:55:31] Maybe it's something to think over for the next meeting?
[07:56:09] simon, are you suggesting it shouldn't be added in 1.6, or just not defaulting to "on"?
[07:56:11] If you're going to turn it on, then you're really thinking about "should the next release be 1.6.3 or 1.8.0"
[07:56:31] 1.8 is reserved for windows. it would be 1.10
[07:56:40] deason: Not defaulting to on. I don't particularly care if it's added, defaulting to off.
[07:56:52] Maybe we should start a list of "next major version" features.
[07:57:53] i think people could live with not defaulting to on.
then your 1.6.3 would be better than 1.6.2, and still be safe.
[07:58:04] --- Simon Wilkinson has left
[07:59:51] Mike, yes.
[08:00:25] I'd be fine with that, just not sure about the others :)
[08:00:46] (sadly, the number of options grows..., that is a big downside, but it is safer)
[08:01:13] --- simonxwilkinson has become available
[08:01:29] Does the client know each server's version?
[08:02:09] Sadly not.
[08:02:17] in theory, no, since the version packet could contain anything
[08:02:40] Sites also customise their version strings, so even rxdebug won't help
[08:02:44] in practice, you can make a sort of guess based on what's in it; usually it's correct enough, but...
[08:02:49] I originally proposed us
[08:03:31] you can kind of think of it like guessing based on http user-agent strings, except there are fewer standards and conventions, etc
[08:03:36] using some kind of OpenSSH style features list to turn on and off behaviour based on that string.
[08:04:05] ... which would work, in this case, for whitelisting
[08:04:35] Or we could use a new code point, as we're changing the RPC's behaviour
[08:04:53] or, have a new version of the rpc that is known safe...
[08:05:05] (er, what simon just said)
[08:05:30] well, we're not changing the protocol behavior; it does the same thing
[08:05:37] "guacb without the bugs"
[08:05:47] heh. yes.
[08:06:33] issuing new RPCs every time a bug is discovered in an implementation may scale when there is only one vendor implementation, but it certainly is not going to work when there are multiple vendors. Nor does it address the fact that GiveUpAllCallbacks is already being called by clients and those clients will not stop issuing those calls against servers that are buggy.
[08:07:01] Regardless of what OpenAFS decides to do, Arla has been calling that RPC since the day it was assigned.
[08:07:39] considering this is the only time I've ever heard of this conversation being brought up in the past 13 years, it doesn't seem like this is going to make new assignments happen every day
[08:08:01] this conversation has been held before.
[08:08:06] about what rpc?
[08:08:15] GiveUpAllCallbacks
[08:08:29] yes, I mean, this rpc is the only time it's been brought up
[08:08:39] it's not like we do this for everything
[08:08:44] InlineBulkStatus
[08:08:58] and BulkStatus for that matter
[08:09:09] for the corrupted fetchstatus thing?
[08:09:21] failure to assign VolSync
[08:09:32] and the corrupted FetchStatus
[08:10:21] FetchData[64] for giving back bogus data in the first packet if the requested offset is beyond the EOF
[08:10:28] where was this serious discussion? I wouldn't ever consider it for those
[08:10:47] in the old days, on Zephyr
[08:10:58] and those are all for corrupting client things, not crashing the server
[08:11:22] I'm not saying doing this for guacb is a good idea... I've never liked that idea because it just changes an rpc without changing any spec
[08:11:25] corrupting data is corrupting data
[08:11:33] The idea that there is more than one vendor using RXAFS is just a fiction.
[08:12:13] the data corruption cases are not all the same; which component is at fault changes how upgrades are managed and what workarounds etc you can use
[08:13:01] imo, for guacb this doesn't seem worth it to do such an ugh thing
[08:14:14] The only easy options are new RPC, regexp match on rxdebug version or capability bit.
[08:14:17] I see no difference. RPCs that corrupt data such that clients cannot trust the data and can't distinguish between valid and corrupt data are many times worse than a server that needs to be restarted periodically because of memory corruption
[08:14:30] to me, it sounds like the idea of putting it in 1.6.3 defaulted to 'off' got the least amount of objection...
[08:14:47] --- simonxwilkinson has left
[08:15:02] yes, that was my understanding.
[08:15:29] --- Marc Dionne has left
[08:15:56] --- Marc Dionne has become available
[08:15:57] that is the consensus for 6272
[08:16:11] Yes.
[08:16:25] with the current idea that the default changes with the next major version; but in theory that could still be argued about
[08:17:07] We can change it anytime. Having the feature available allows sites to test.
[08:18:29] There isn't a viable test that sites can perform. Issuing the RPC does not result in an immediate crash. A crash, if it occurs, will happen at some point later.
[08:19:49] it's not a unit test... "see if this makes your site miserable" seems pretty useful
[08:20:08] Ok. Still, saying "this is going to be on by default eventually, please try" makes sense.
[08:23:08] okay, is there anything else?
[08:23:09] one client issuing the RPC will most likely not result in something noticeable. if it did, the feature would never have been turned on by default in the Windows client in 2007. None of the sites that were testing the client before release noticed the issue on their servers. TU Chemnitz noticed the problem because they had somewhere between 60 and 120 clients issuing the RPC on a regular basis.
[08:23:43] ...and a site may have several clients doing just that if we ask them to turn it on
[08:24:21] to what end?
[08:24:32] Stephan Wiesand: is there going to be a pre4 just for smoke testing et al before the final release? are we waiting for anything else, code-wise?
[08:25:06] you are going to ask sites "please start to issue these RPCs we know will corrupt memory on your servers because your servers have not been upgraded beyond 1.4.5"?
[08:25:06] Nothing except the little patch Derrick mentioned.
[08:25:28] can you remind me, which patch?
[08:25:32] --- Simon Wilkinson has become available
[08:25:47] security fixes
[08:25:59] ah, thanks.
[08:26:12] How we'll handle this exactly has yet to be discussed.
[08:27:04] We'll take a local git clone of the current openafs 1.6 tip, apply the security fixes, and apply a signature to that tree.
[08:27:09] in the past the security fixes have been slipped into tarballs distributed to release builders. the changes do not go into public repositories until after the release is announced.
[08:27:21] Tarballs will then be generated from that tree.
[08:27:37] When the release is announced, the local tree will be pushed (behind gerrit's back) into the repository.
[08:27:46] I can ask them to turn on a feature that may crash fileservers, yes; I don't know whether every fileserver they contact is susceptible or not
[08:28:05] they either do it soon, while they know about it and can turn the thing back off, or they do it later when they have no idea what's going on
[08:28:35] my point is why bother asking them to test? just ask them to obtain version strings from their servers
[08:29:35] Simon, Jeff: Thanks. I guess this means we can review and merge "make 1.6.2" now?
[08:30:01] Yes, as long as the tag isn't applied to what's on master.
[08:30:02] because relying on that alone is not a real test; though trying to get them to do that first is obviously what you want to do
[08:30:36] I am sorry to interrupt. I'm heading to another meeting... I just wanted to mention that I'd like to get the NEWS updates (master is Gerrit 8750) merged to 1_6_x before we do the final 1.6.2 tag. I'm wondering what date we're looking at for the final release?
[08:30:41] the security thing should go in NEWS, too, right?
[08:30:49] I mean, at some point
[08:31:15] Yeah, probably
[08:31:29] commit the NEWS patchset and the security fixes that are slipped in can update NEWS
[08:31:57] that works for me. I'll merge 8750 and backport to 1.6.x later today
[08:32:12] Ok, thanks Ken.
[08:33:07] Tarballs will then be made available to those building binaries, and we announce/release when they are ready?
[08:33:33] sorry if this is dumb, but...
I actually didn't notice until now that that NEWS submission is for master; is that how we normally do it?
[08:33:46] not just e.g. submitted directly to 1.6.x
[08:34:31] the NEWS update should go to 1.6 directly
[08:36:05] Cells running public file servers that are susceptible to the GiveUpAllCallbacks bug include: andrew.cmu.edu, ciemat.es, club.cc.cmu.edu, gorlaeus.net, info.infn.it, msc.cornell.edu, oc7.org, postech.ac.kr, sanchin.se, tproa.net, umbc.edu, uni-hohenheim.de
[08:37:50] NEWS on master has a 1.6.0 entry.
[08:37:55] NEWS should go to master, until such time as 1.6 and master have diverged.
[08:38:20] --- Marc Dionne has left
[08:38:26] they haven't diverged?
[08:39:14] Not until we do another NEWS-worthy release from master.
[08:39:32] --- Marc Dionne has become available
[08:40:33] That is, a NEWS file for 1.8.0 should include all of the 1.6.0 changes that preceded it
[08:41:18] Let's keep it on master. Pruning is trivial if ever desired.
[08:42:09] So Ken would just proceed as he proposed.
[08:43:25] There's still Andrew's question of when/what should be smoke tested before the release.
[08:44:40] I assumed we had everything before the security patch as an unannounced pre4, and we hand that around
[08:45:09] assuming the security patch is small enough to not require a whole lot of testing
[08:45:19] It is.
[08:46:38] We can have an unannounced pre4. We can just as well merge "make 1.6.2" and cut unofficial, unannounced tarballs from that for release-team testing.
[08:47:00] It's also not so critical that I would be unduly bothered if it leaked. But given we're doing this whole thing for the first time, I thought it would be worth doing it the way we should do for anything more major
[08:49:03] Simon, I offered a semi-secure way to get the final tarballs to the builders. Interested?
[08:49:12] okay; I didn't mean it needed to be called pre4 or anything; I'm just saying that to distinguish between that and the actual final release :)
[08:49:39] --- Marc Dionne has left
[08:49:50] --- Marc Dionne has become available
[08:51:31] Guess that's a "no" ;-)
[08:53:55] is there anything else for this meeting?
[08:54:06] I will leave it up to Paul and Stephan. They can push a pre4 version and the final make 1.6.2 can be slipped into the repo with the security fixes.
[08:54:45] slipping the final version patchset in with the security fixes will prevent anyone from simply grabbing the current head of 1.6.x and thinking they are building 1.6.2 final
[08:54:58] Jeff, good plan.
[08:55:46] Paul, agreed?
[08:55:58] Sure -- does that mean we're going to tag a pre4?
[08:56:17] Or are we just going to call what we have pre4? I guess it's a question of timeframes
[08:56:28] I don't think we need a tag.
[08:56:43] --- Marc Dionne has left
[08:57:36] Okay, cool. Then I guess we're pretty much there.
[08:57:45] --- Marc Dionne has become available
[08:57:51] Let's merge NEWS and a "make pre4" I'll push later. Then I'll cut tarballs, upload to grand.central.org, send mail to release-team. Ok?
[08:58:06] sounds good. Let's do that.
[08:58:09] --- Derrick Brashear has left
[08:58:40] Great. Thanks everyone (NB also for working on NEWS!).
[08:59:00] additional cells with file servers susceptible to GiveUpAllCallbacks: asu.edu, in2p3.fr, nada.kth.se, adrake.org, hephy.at, nikhef.nl, physik.uni-freiburg.de, integra-ev.de, tu-bs.de, engin.umich.edu, rl.ac.uk, lngs.infn.it, enea.it
[09:00:08] Good reasons to be careful. Some of those may have much more business with Linux clients than with Windows ones.
[09:00:30] It's a shame though.
[09:02:00] Signing off for an hour. Bye.
[09:02:36] --- Stephan Wiesand has left
[09:05:23] --- paul.smeddle has left
[09:38:26] --- stephan.wiesand has become available
[09:56:23] --- shadow@gmail.com/barnowlCA9FC336 has left
[09:58:46] --- shadow@gmail.com/barnowlCA9FC336 has become available
[10:29:16] --- Marc Dionne has left
[10:52:24] --- Derrick Brashear has become available
[11:10:16] --- meffie has left
[11:17:32] --- stephan.wiesand has left
[13:07:46] --- Derrick Brashear has left
[13:42:09] --- Derrick Brashear has become available
[14:16:14] --- Simon Wilkinson has left
[14:32:32] --- Simon Wilkinson has become available
[14:48:42] --- Derrick Brashear has left
[14:49:51] --- Derrick Brashear has become available
[14:50:40] --- Derrick Brashear has left
[14:51:02] --- Derrick Brashear has become available
[15:24:45] --- deason has left
[15:30:35] --- Simon Wilkinson has left
[15:32:44] --- Derrick Brashear has left
[16:31:39] --- ktdreyer has left
[19:32:35] --- Derrick Brashear has become available