[00:48:56] --- wiesand has become available
[00:49:43] test
[06:14:48] --- Marc Dionne has become available
[06:51:27] --- meffie has become available
[06:59:49] --- shadow@gmail.com/barnowl20770CFD has left
[07:00:05] --- marc has become available
[07:00:16] --- shadow@gmail.com/barnowl95925616 has become available
[07:00:55] Hi All
[07:00:55] --- Marc Dionne has left
[07:01:06] mornin'.
[07:01:16] hi
[07:01:39] terrible day here...
[07:01:51] sorry
[07:02:01] --- Marc Dionne has become available
[07:03:03] Ok, let's postpone the first item until andrew shows up
[07:03:17] So Marc, any Linux news?
[07:03:27] (My net appears to be flaky; I may be slow at things.)
[07:03:44] Nothing new, still work ok at the current mainline level
[07:03:47] (not as slow as me today)
[07:03:55] Marc: thanks.
[07:04:38] Next topic: announcing those dreadful changes planned for 1.6.7
[07:04:54] (wondering whether Jeffrey is actually here)
[07:05:14] i don't think he is
[07:05:14] I think they are so dreadful my brain has tried to remove all references to them :) Care to remind us?
[07:05:26] give up callbacks
[07:05:35] --- deason has become available
[07:06:03] and some will dislike changing the ihandle sync default, probably
[07:06:49] For the former, Jeffrey proposed to announce it on -announce and -info.
[07:06:55] --- deason has left
[07:06:58] The latter could simply be included.
[07:07:10] --- deason/gco has become available
[07:07:27] I think it's a good idea. We just didn't say who'd do it and when. Opinions?
[07:09:05] do you mean adding the new fs setcell option?
[07:09:17] for guac on shutdown?
[07:09:18] someone who isn't here now to disclaim it, and as soon as they get here?
[07:09:34] Result of last week was there'll be no new fs setcell option.
[07:09:49] oh, sorry i missed that. ok.
[07:09:59] we decided one big switch, yes?
[07:10:09] We decided no switch.
[07:10:26] ok. sorry.
a lot has happened in a week
[07:10:41] email to lists: either around the time of pre1, or "now", or right before eakc....?
[07:10:54] At least that's my perception (and what I had hoped to make clear in the minutes)
[07:11:45] Yes, that's the question. When? And then: who?
[07:12:00] yes, that's what we ended up at, even though the whole reason we had delayed guacb in previous releases was waiting for an option to be implemented
[07:12:22] jeff or you, I assumed
[07:13:11] Probably. Let's postpone this.
[07:13:44] Let's try to work on some changes queued up for 1.6.7?
[07:14:06] i could certainly do it. i have no idea when i'd have time to write it, in the immediate term, but i am sure i could
[07:15:02] Thanks. Still hoping Jeffrey wants to do it himself ;-)
[07:15:16] it being?
[07:15:30] Writing the announcements.
[07:15:32] send the mail to the community
[07:15:33] ah.
[07:15:37] I pushed the afs_fetchstore follow-up for 1.6.x, I think that means Andrew can remove his -1 from the first one.
[07:15:49] telling everyone that guacb is going on unconditionally
[07:16:20] Ben: that's 10742, right?
[07:16:23] i assume the reason andrew did not +1 10759 is that he wants other reviews for it, rather than stephan assuming his +1 is good enough, because it's not the most trivial of changes?
[07:17:35] 10742, yes, and the follow-up is 10835.
[07:17:39] Yes, 10759 is really one I'd like to have more opinions on.
[07:18:24] Andrew: objections to 10742+10835?
[07:18:33] I haven't +1'd it because I haven't reviewed the 1.6 submission yet
[07:18:41] the other ones I +1'd I've reviewed
[07:18:52] Ah, ok.
[07:19:41] 10744 is one I really wonder whether it should block on a fix on master.
[07:19:55] I haven't looked at... 10835 yet; I don't have a lot of strong feelings in that issue, though, so it's probably fine
[07:20:46] I thought I had looked at 10759 when it was on master (that is, at 9711), but apparently I didn't finish.
(I have a draft inline comment on host.h noting that the 'XXX' comment visible in the context is probably stale.)
[07:20:57] Marc, I was really hoping for your verdict on 10804 and 10598
[07:22:09] (gah, net hang)
[07:22:27] I +1'd them
[07:22:41] Thanks for all the +1s trickling in.
[07:22:51] I think for 10744 we want someone to look at the code and understand what (if any) actual issues there are. ISTR that Andrew has volunteered to do so eventually :)
[07:24:11] yeah, I was planning on doing that
[07:24:34] I understand 10744 doesn't fix the whole problem. Does it make anything worse?
[07:24:46] it doesn't make it worse. it does hide it
[07:24:58] basically, it provides a potential path to forgetting to deal
[07:25:11] Well, it's on master.
[07:25:42] and look how good we are dealing there ;)
[07:26:00] well, it's a potential bug we need to look into
[07:26:09] okok
[07:26:21] the code doesn't make it better or worse, but it's an easy way to remember it hasn't been addressed yet
[07:26:43] I don't mean to block it forever when we're getting right to releasing 1.6.7 or whatever, but for now...
[07:27:23] (unrelated) I was planning to submit 10796 in a moment, I assume nobody would have strong objections to it
[07:28:16] more path conflicts...
[07:28:20] I am fine with 10796 in principle.
[07:29:07] 10745 and 10812 already touch afs.h
[07:29:29] 10745 depends on 10744
[07:29:40] so, no point in pulling that up yet
[07:31:03] How about the new 10746?
[07:31:48] Positive review for that would allow to merge two changes, for a change.
[07:32:05] And then the conflicting ones could be rebased...
[07:33:05] The 10746 is intended to fix the build for os x without regressing in functionality.
[07:33:48] looks fine, have a +1
[07:33:56] "you already can't use sbrk, so don't use sbrk"
[07:34:13] But then, the situation on master is worse, so maybe we want to wait for the perfect solution there before we merge 10746? Just as a reminder?
[07:34:46] Anything new going into master counts as a feature, not a bugfix, to me.
[07:35:03] the situation on master is people are free to test the sbrk replacement there and submit reports if they have a problem.
[07:35:27] and if they don't care (especially given it's advisory anyway) me either.
[07:35:58] if you're thinking of the 'read length' thing from before to keep it open to fix something... this is much much less important and if we forget to "fix" things further, we don't care
[07:36:01] Well, 1.6 review revealed linux will always see 0.
[07:36:55] (I may be wrong in thinking this, but it's the general sentiment I have.)
[07:36:57] It may be of different importance, but it's just the same thing.
[07:37:00] the sbrk calculation before was wrong; frankly, seeing 0 would be an improvement over seeing a lie, imo
[07:37:14] like, you can tell 0 is wrong.
[07:37:15] [did I mention it has been a terrible day here ;-)]
[07:38:11] So go back to version 2 of 10746?
[07:38:35] I don't see what's wrong with version 3?
[07:38:51] version 3 maintains the status quo in a stable series. which is also fine
[07:38:54] Derrick just says it's worse than version 2.
[07:39:10] Derrick was talking about master, I thought.
[07:39:12] i was speaking of master.
[07:39:25] you asked if we should deal with it again on master.
[07:39:38] for the stable series, I think being 'consistent' is more important, while we don't have a 'real' fix
[07:40:00] (this is also such a small already-not-working feature, spending this much time talking about it is silly)
[07:40:28] master is fine. version 3 is fine for 1.6. all done ;)
[07:40:38] perfect ;-/
[07:41:59] merged.
[07:43:56] path conflict for 10747... day's not getting better...
[07:47:18] But a few more seem to be ready now. Thanks all.
[07:47:28] That may be my fault for branching the history graph in that string of gerrit changes.
[07:47:58] well, if we had a designated gerrit change to push on top of...
[07:48:04] You're not the only one ;-)
[07:48:35] The problem is we already have several independent dependency chains with various touching points.
[07:48:46] Well, normally I would rebase and re-push the whole chain of patches, but for this chain I figured that they were only connected by the osx buildslave and were otherwise independent, so maybe it would be okay.
[07:48:53] Once it starts, no way back.
[07:49:23] src/viced/afsfileprocs.c 10746 10747 10756 10757 10799
[07:50:20] We should merge what's possible now, and then probably rebase *everything* left.
[07:50:49] (I just pushed a rebased 10747)
[07:52:07] --- edgester has become available
[07:52:44] --- ktdreyer has become available
[07:52:52] Ah, it was due to the new 10746...
[07:54:07] Hi Ken and Jason
[07:54:23] Ken: objections to 10807?
[07:54:28] This is what we get for cherry-picking old changes instead of doing a single cleanup commit per file, I guess.
[07:54:32] hi
[07:55:10] * ktdreyer looks at 10807
[07:55:13] the mac OS 10.9 buildslave is back up. thanks go to dave botsch
[07:56:43] yep, I'm quite fine with 10807. +1'd.
[07:56:43] --- Marc Dionne has left: Lost connection
[07:56:48] Thanks.
[07:57:45] The problem I see with all the rebases is that I believe there is a real risk of patches being misapplied, even if gerrit detects a "trivial" rebase.
[07:58:57] I am in the habit of doing something like 'diff <(git show {master-rev}) <(git show {1.6 changeset})' or similar.
[07:59:26] I thought that should be safe; a 'trivial' rebase means the git patch-id is identical
[07:59:32] so the contents of the change are the same
[07:59:46] (of course, sans gerrit bugs or whatever...)
[08:00:03] Gerrit is set to not actually merge if there's a path conflict for a reason.
[08:01:43] Remember what happened to one of the more recent security patches...
[08:02:28] The fedora 20 build slave (gerrit-triggered) is down. I've notified Derek Atkins.
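The "touching points" between change chains mentioned at 07:48:35, and the per-file listing at 07:49:23, can be computed mechanically. The following is only a rough sketch: the function name files_with_overlap is made up for illustration, the file lists are deliberately incomplete (just the afs.h pair from 07:29:07 and the afsfileprocs.c row from 07:49:23), and change 99999 with its path is a pure placeholder. It does not talk to gerrit.

```python
from collections import defaultdict


def files_with_overlap(changes):
    """Return {path: sorted change numbers} for paths touched by 2+ changes."""
    touched = defaultdict(set)
    for change, paths in changes.items():
        for path in paths:
            touched[path].add(change)
    return {path: sorted(ids) for path, ids in touched.items() if len(ids) > 1}


# Assumed input: change number -> files it touches. Only the pairings that
# appear in the log are real; 99999 and its path are placeholders.
open_changes = {
    10745: ["afs.h"],
    10812: ["afs.h"],
    10746: ["src/viced/afsfileprocs.c"],
    10747: ["src/viced/afsfileprocs.c"],
    10756: ["src/viced/afsfileprocs.c"],
    10757: ["src/viced/afsfileprocs.c"],
    10799: ["src/viced/afsfileprocs.c"],
    99999: ["hypothetical/other.c"],
}

overlaps = files_with_overlap(open_changes)
for path, ids in sorted(overlaps.items()):
    # One row per contested file, in the same shape as the 07:49:23 message.
    print(path, *ids)
```

Merging any one change in a row means the others in that row may need a rebase, which is why the suggestion at 07:50:20 is to merge what is ready first and rebase the remainder afterwards.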
[08:03:01] I know, but that's not using the same logic; that was actually performing the merges itself
[08:03:34] the patch-ids in those cases were different, I thought
[08:04:27] What do you do when you rebase?
[08:05:18] You merge and submit with the same patchid, right?
[08:07:26] a patch-id is a git concept, a hash of the contents of a commit (git help patch-id)
[08:07:43] when you submit a rebase of a gerrit change, and it detects that the patch-id is identical, it does that "trivial rebase" stuff
[08:08:12] before we had the path-conflict thing on, gerrit would try to apply the change itself if there was a conflict, and sometimes it didn't do it correctly
[08:08:16] There will typically be an offset.
[08:08:44] so I just mean, those two things are different; one being broken doesn't mean the other is
[08:08:47] And that sufficed for that security patch to misapply...
[08:08:52] patch-id ignores whitespace and line numbers, according to the man page
[08:10:04] ok I confused patch-id and change-id.
[08:10:30] Let me know if I need to demote the fedora20 slave, but I'll need a short outage and any currently-building patch sets will get an error.
[08:13:18] is there anything wrong with the fedora20 slave?
[08:13:47] jason said it needed to be restarted
[08:14:08] wiesand: are we going to talk about linux dentry stuff at all?
[08:14:10] (oh, sorry)
[08:14:58] Why not. It will give me a break...
[08:15:39] if fedora20 can't wait an hour, I can just mark the changes as verified if everything else succeeds; it's not a big deal
[08:16:28] Derek replied. The fedora20 slave is back up. It had a 30 minute network outage. It's working now.
[08:17:13] for linux dentry mtpts... are we proceeding by posting the different "solutions" to -devel or -info to get feedback?
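To make the patch-id point from 08:07:26 to 08:08:52 concrete, here is a toy model in Python of a hash over a diff that ignores line numbers and whitespace. It is a sketch of the idea only, not git's actual patch-id algorithm; the function name simplified_patch_id and the three sample diffs are invented for illustration.

```python
import hashlib


def simplified_patch_id(diff_text):
    """Hash a unified diff, ignoring line numbers and whitespace.

    Toy model of the idea behind `git patch-id`; NOT git's real algorithm.
    """
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("index "):
            continue  # blob ids differ from branch to branch
        if line.startswith("@@"):
            continue  # hunk headers carry the line numbers (the "offset")
        squeezed = "".join(line.split())  # drop all whitespace
        if squeezed:
            digest.update(squeezed.encode("utf-8"))
    return digest.hexdigest()


# Invented sample: the same change as it might look on two branches.
ON_MASTER = """\
diff --git a/example.c b/example.c
index 1111111..2222222 100644
--- a/example.c
+++ b/example.c
@@ -40,3 +40,3 @@
 int a;
-int b;
+int b = 0;
 int c;
"""

# Same edit, applied 12 lines further down, with a whitespace difference.
ON_STABLE = ON_MASTER.replace("1111111..2222222", "3333333..4444444")
ON_STABLE = ON_STABLE.replace("@@ -40,3 +40,3 @@", "@@ -52,3 +52,3 @@")
ON_STABLE = ON_STABLE.replace("+int b = 0;", "+int  b = 0;")

# A genuinely different edit.
DIFFERENT = ON_MASTER.replace("+int b = 0;", "+int b = 1;")

same = simplified_patch_id(ON_MASTER) == simplified_patch_id(ON_STABLE)
differs = simplified_patch_id(ON_MASTER) != simplified_patch_id(DIFFERENT)
print(same, differs)
```

In this model an offset alone does not change the hash, which matches the man page statement quoted at 08:08:52. It says nothing about whether the context the hunk lands in is still the right one, which is the separate concern raised at 07:57:45.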
[08:17:32] or I could try implementing one to see if it's actually feasible before we discuss a bunch of stuff that may turn out to not matter later
[08:17:41] (namely, I'd be implementing the bind-mount option)
[08:18:27] I figure there would be objections to ever pulling that up to 1.5.
[08:19:17] not fixing it lets you trivially crash the box...
[08:19:17] Is the dentry thing what causes a Linux kernel crash when a volume is mounted in multiple places?
[08:19:42] sort-of, yes
[08:20:10] in this recent RH kernel it's just a WARN, right?
[08:20:11] the client shouldn't crash by that just existing, but it permits a code path that makes it so you can crash the box
[08:20:17] no
[08:20:27] ok, that's biting me at least once a month near the end of the semester
[08:20:34] while the warning turning into a BUG is a concern (that would mean we'd crash on every access)
[08:20:44] even with just the warning, there's a behavior change in there, too
[08:21:15] where when we d_splice_alias, it always gives us back the existing dentry (for a dir), where before we could actually add an alias
[08:21:25] the bind mount option is kinda ugly but may be the best we can do
[08:21:36] so we're back in the same situation from before we added all of the canonical_dentry stuff
[08:22:18] er, I should clarify that "no" above
[08:22:29] while it is just a WARN right now in RH, I believe, yes
[08:22:59] the (more critical) problem is the behavior change in d_splice_alias, as that's what makes the crash possible
[08:23:22] unless something else is crashing? edgester? does the box panic for you just on accessing stuff?
[08:24:08] the kernel panics when rm'ing certain files.
[08:24:32] or maybe there are more problematic code paths from last time, since the vfs obviously is always changing; before, the known code path was (summarizing) an rmdir that crossed a mountpoint for the same volume twice
[08:24:52] hmm
[08:25:14] I might still have logs of crash dump analyses if that would help
[08:25:45] I'm seeing frequent panics on rm (not rmdir) triggered by a small group of users. But that's still 1.4.
[08:26:04] it won't help in solving it, but it may provide more information about under what circumstances we are hit by it
[08:26:22] My problems are on 1.6.5 clients and servers
[08:26:45] wiesand: what platform? can you post the stack trace or something somewhere?
[08:27:21] It's 1.4, so I assumed nobody cares.
[08:28:19] it only takes a minute to look at the stack to see if it's relevant to this or not (if it's convenient for you to get)
[08:28:34] And what I'd have is just a screenshot of the panic...
[08:28:44] But I'll send that next time it happens.
[08:28:46] a lot of commits submitted are a result of bugs found in 1.4-based trees :)
[08:30:33] any opinions on "steps forward" with this, though?
[08:30:37] Someone proposed restricting fs mkm - won't help, since you can do that with ln -s .
[08:31:41] I'm still afraid it's a 1.10 topic.
[08:31:59] how can I post my crash log privately? I'm not sure how much it will help since it's a custom build from 1.6.5
[08:32:48] i can give you a place to copy it.
[08:34:23] I don't see how you could fix it by restricting making mountpoints; you'd have to search the entire afs-space to see if another one exists
[08:34:45] you could make accesses fail when you hit another mountpoint, but that's not going to be "popular"
[08:37:02] is there someone here who can give me access to the RT bug (130273)
[08:38:54] I assume you need shadow for that, of the people that are here
[08:41:31] kernel backtrace: http://pastebin.com/VcrjnUAn
[08:41:52] marc, i can do that.
[08:43:09] thanks
[08:43:45] meffie: please provide a place for me to upload
[08:44:17] edgester: quick q: do you have fakestat on?
[08:44:19] I have 12 log files from crash sessions where I ran the commands "log kmem -i bt ps ps -a bt -a foreach bt exit"
[08:44:44] which specific kernel is this?
[08:45:16] yeah that's another good question; and if this is easily reproducible by just 'rm'ing a file in a multiple-mounted volume, we probably don't need info from cores...
[08:45:19] I even have one of the original 64GB kernel core dumps that I can still analyze
[08:45:24] since we can just have it happen here
[08:45:30] deason/gco: 2.6.18-371.1.2.el5 (RHEL5)
[08:46:04] We've tried reproducing this and failed
[08:46:17] the problem is intermittent
[08:47:05] yeah, it's not so simple as every unlink through a multiply-mounted multiply-accessed path
[08:47:22] I may still be able to trigger it; is fakestat on? (fakestat or fakestat-all)
[08:48:12] yes, fakestat is on
[08:48:51] fakestat or fakestat-all?
[08:49:06] AFS_ARGS="-fakestat -dynroot -afsdb -chunksize 22 -stat 500000 -daemons 16 -volumes 2000 -blocks 119125339"
[08:51:19] okay, well, going further into this in this space is probably not best
[08:51:49] for "plans" forward, I think I/we just need to look a little more about what's going on, and discuss next week (or earlier, on lists, if we can), unless someone has something else to say
[08:52:50] and yeah, 2.6.18-371* seems to have that same WARN code, if anyone was wondering
[08:53:28] Ok. Could you send me a summary for that last topic for the minutes?
[08:53:43] Please contact me and I can provide crash log files. I would prefer a private place for uploading them.
[08:54:12] and now I'm wondering if we should announce something recommending not using the client with the relevant RHEL versions (the newest 5 update and the newest 6 update, I assume)
[08:54:45] or at least saying that some problems in this area exist for those versions
[08:54:48] wiesand: yes
[08:55:00] meffie: I'm assuming you're taking care of an upload space
[08:55:16] lovely!
[08:55:18] yeah, i'll send the info to jason directly
[08:55:46] does this affect RHEL7/Fedora19 as well?
[08:57:45] meffie: thanks!
[08:57:46] I assume so; I'm still waiting for my rhel7 machine, but if I can find the source I can verify
[08:59:38] Just ran the procedure from 130273 on RHEL7 beta.
[08:59:52] deason/gco: I'm just starting to play with RHEL7 beta. do you expect to get it in the next month?
[09:00:09] rmdir: failed to remove ...: Resource deadlock avoided
[09:00:47] can I get RT access to 130273? username: jason
[09:01:04] or we could just take that ticket out of the security queue
[09:01:13] 3.10.0-54.0.1.el7.x86_64
[09:01:16] wiesand: that's probably for an "immediately-recursive" mountpoint
[09:01:26] which we just disallowed completely
[09:01:27] Right.
[09:01:37] you need an intermediate directory between them for a 'real' test
[09:01:56] like, fs mkm vol mtpt1/somedir/mtpt2
[09:02:39] RHEL7 actually appears to not have the problematic code
[09:03:07] but it seems entirely possible it will get it in an update or they'll add it before GA or whatever
[09:04:35] no crash, no error
[09:04:54] What's the upstream commit id?
[09:05:16] 7732a557b1342c6e6966efb5f07effcf99f56167
[09:05:17] I can ask RedHat to consider excluding or postponing that commit.
[09:10:01] Unless there's anything else to discuss today, I'd like to call it a day.
[09:10:13] +1
[09:10:14] And I'm really glad this one's nearly over.
[09:10:42] Minutes tomorrow, hopefully (andres: thanks).
[09:10:52] thanks everybody
[09:10:58] further discussion on this stuff can probably go in 130273; IMO it should be taken out of the security queue
[09:11:28] No objection.
[09:12:12] I'm not sure who has the "authority" to do that, but I'm assuming shadow will take care of it or figure it out
[09:12:40] Thanks a lot everyone.
[09:13:30] Path conflict on 10598, and again more new ones than changes made ready for merging... what a day.
[09:13:32] Bye.
[09:13:39] --- wiesand has left
[09:13:43] --- deason/gco has left
[09:25:47] --- marc has left
[09:26:27] shadow: would you mind giving my account (ktdreyer) read access to 130273?
[09:31:15] --- Marc Dionne has become available
[09:37:45] --- edgester has left
[10:09:20] --- deason/gco has become available
[10:43:56] --- meffie has left
[16:05:31] --- deason/gco has left: Replaced by new connection
[16:05:32] --- deason/gco has become available
[19:56:28] --- ktdreyer has left