[06:18:05] --- Derrick Brashear has become available
[06:21:01] --- Stephan Wiesand has become available
[06:22:33] --- meffie has become available
[06:33:57] Hello.
[06:34:03] hi
[06:34:54] Could we briefly chat about Change-Id uniqueness?
[06:36:50] Yes ...
[06:37:08] stephan accidentally discovered that change ids no longer need to be globally unique in a gerrit instance
[06:37:19] When I pulled up two changes from master, I made the mistake of not deleting the Change-Id line before pushing them to refs/for/openafs-stable-1_6_x.
[06:37:23] hello
[06:37:29] that they did the thing with branches that i (or you, simon, but one of us) suggested
[06:37:54] I wonder if they did, or if it just doesn't look up Change-Ids that are merged.
[06:38:13] i think stephan had a reference the other day that said they changed it
[06:38:50] I noticed that a few months ago but concluded that what we are doing is good policy since it makes it easier to search the git logs for a particular change
[06:39:13] it was gerrit issue #117
[06:39:28] It's only good policy if something is enforcing it, otherwise it's unusable.
[06:39:45] http://code.google.com/p/gerrit/issues/detail?id=117
[06:39:57] And actually, as long as we're not going to merge between branches, you can always search for a particular Change-Id.
[06:40:15] Yes, comment 14
[06:40:22] No, 12
[06:41:01] Although comment 14 seems doubtful
[06:41:08] yes, but comment 14 suggests what simon said, namely, it's sort of luck
[06:41:30] regardless, using a Change-Id per change, where a cherry-picked change is not the same change, seems wise
[06:41:41] I made simple test installations of 2.2.1 and 2.5.1. Both seem to behave as comment 12 says.
[06:41:58] especially since in some cases, while we use the cherry picked from text, the change is not literally the result of a cherry pick, even with conflict resolution
[06:43:08] All of my tooling uses the commit's SHA1, and the cherry-picked from lines, rather than the Change-Id
[06:43:25] My major question at this point is what should be done if it happens accidentally that the same Id gets into a change posted for review on 1_6_x.
[06:43:39] once it's merged, we live with it
[06:43:50] if any tool ever cares we will have to do a change-id quirks list
[06:44:02] But before merging?
[06:44:21] change the change-id but resubmit to the same change # in gerrit
[06:44:38] which would be e.g. refs/changes/(number) instead of refs/for/branch
[06:45:12] Does that work?
[06:45:25] at some point it did. i suppose i should try it
[06:45:45] i will change the id of my keepalive test case
[06:45:52] One of the macos changes sitting there has the issue too ;-)
[06:46:11] Guess you did it on purpose.
[06:46:23] i didn't. comment on it
[06:46:40] of course i fucked up and didn't push to refs/changes/(whatever) because i'm an idiot
[06:46:55] I did. It's 8778.
[06:47:35] yes, the change id in 8782 just changed. it was Change-Id: I3262b5d363b51db817d78e5ffa1a9aaae5bcd53f
[06:47:39] it is now not
[06:47:43] so, yes, that still works.
[06:48:04] (i pushed to e.g. HEAD:refs/changes/8782)
[06:48:17] Cool
[06:49:53] hm. jeff points out that gerrit's title/header doesn't change. which means you can't search gerrit by the new change id.
[06:50:14] so the question is what tools we think might use change-id. if they use gerrit directly, we need a new changeset (but you lose any history)
[06:50:37] otherwise, we can do this (changing the commit message) and tools which use the git repo will dtrt
[06:51:04] if it's a pullup and there are no comments, just abandon and push a new changeset
[06:51:06] Wouldn't it be safer and cleaner to abandon such a change and resubmit it with a fresh ID?
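A minimal sketch of the pull-up rule being discussed: when a master commit is cherry-picked to a stable branch, drop its Change-Id trailer so the new change does not reuse the master ID, and keep a "(cherry picked from commit ...)" line that SHA1-based tooling can follow. The helper name is hypothetical and this is not part of any OpenAFS or Gerrit tooling; it assumes the rewritten message is then committed with `git commit --amend`, which runs Gerrit's commit-msg hook so it can add a fresh Change-Id. `git cherry-pick -x` already appends the cherry-picked-from line on its own.

```python
import re

# A Gerrit Change-Id trailer: "Change-Id: I" followed by 40 hex digits.
CHANGE_ID_LINE = re.compile(r"Change-Id: I[0-9a-f]{40}\s*$")


def prepare_pullup_message(message: str, source_sha: str) -> str:
    """Rewrite a master commit message for a stable-branch pull-up (hypothetical helper).

    - drops any Change-Id trailer copied from master, so the pull-up does not
      reuse the master change's ID;
    - makes sure a "(cherry picked from commit <sha>)" line names the source.
    """
    kept = [line for line in message.splitlines() if not CHANGE_ID_LINE.match(line)]
    # Trim blank lines left at the end after removing the trailer.
    while kept and not kept[-1].strip():
        kept.pop()
    marker = f"(cherry picked from commit {source_sha})"
    if marker not in kept:
        kept += ["", marker]
    return "\n".join(kept) + "\n"
```

The function is idempotent, so running it over a message that was already prepared leaves it unchanged.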
[06:51:27] if you don't care about any of the comment history before it was noticed, yes
[06:51:50] i suppose, worst case, you can refer to the old (wrong) change id's gerrit # in the new one
[06:52:23] And whoever commented will receive mail when the change is abandoned.
[06:52:49] In the three actual cases, I think we won't lose anything invaluable.
[06:53:07] --- paul.smeddle has become available
[06:53:10] what are the other 2 cases?
[06:53:29] 8753 8757
[06:53:50] ah, the ones you did.
[06:54:21] so, do you have a working tree with those changes in it already? you know about using git rebase -i with edit instead of pick?
[06:54:56] the sad thing is, 8785 in place of 8772 is effectively a trivial rebase, but nothing will know.
[06:54:57] No and no.
[06:55:14] do you want me to pull them and push them back?
[06:55:25] (since i do have a working tree)
[06:55:31] I wouldn't mind.
[06:56:09] doing,
[06:56:42] Thanks. It will happen again eventually, and then I'll have my go :-)
[06:57:11] have you used git rebase -i?
[06:57:22] Maybe the hook could be hacked to prevent it?
[06:57:47] No, I haven't.
[06:58:01] the hook can't. gerrit would have to do it. the hook can't know what all change ids exist, as your working tree can be stale
[06:58:25] It's not supposed to be stale when I push, is it?
[06:59:28] if i have a tree where the branch i am working on is up to date but the master is not, and the master is where the change id i reused is (which is possible but not overly likely) i am pushing to a current branch head but my tree is nonetheless stale
[07:00:03] in any case, you now have a small stack of patches for 1.6 all of which gerrit is now happily chewing away at
[07:00:08] er, buildbot is now...
[07:00:43] Thanks. That's good because there are a few more waiting to be pulled up that touch acinclude.m4.
[07:01:14] what else needs to be pulled up?
[07:01:17] At least I assume that's what we'll decide today.
[07:01:37] Ok, we can just as well start.
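A sketch of why a client-side check cannot enforce uniqueness, as argued above: it can only compare against the Change-Ids the local clone happens to know about. The helper names and the `known_ids` input (for example, IDs collected from `git log` output of other branches) are hypothetical; this is not how Gerrit's commit-msg hook works.

```python
import re

# Captures the ID from a "Change-Id: I<40 hex digits>" trailer line.
CHANGE_ID = re.compile(r"^Change-Id: (I[0-9a-f]{40})[ \t]*$", re.MULTILINE)


def change_ids(message: str) -> list[str]:
    """Return every Change-Id trailer value found in a commit message."""
    return CHANGE_ID.findall(message)


def reused_change_ids(outgoing: list[str], known_ids: set[str]) -> list[str]:
    """Report Change-Ids in outgoing commit messages that are already in use.

    known_ids is only what the local clone can see. If that clone is stale
    (say, master was not fetched), IDs of changes it never saw are missing,
    so an empty result proves nothing: only the server has the full list.
    """
    return [cid for msg in outgoing for cid in change_ids(msg) if cid in known_ids]
```

Passing an empty or outdated `known_ids` makes the check silently pass, which is exactly the stale-tree case described in the discussion.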
[07:01:48] hello
[07:02:00] well. if they are likely, tell me and i'll push cherry-picks now, since if buildbot catches up (hahahahahaha) they can at least keep building, and if you don't want them after all you can simply not act on them
[07:02:25] just looking over the history -- takeaway is always delete change-id lines in commit messages?
[07:02:35] I really think we want the ones I mailed yesterday:
[07:02:39] f87d49c autoconf: add AC_CHECK_LINUX_TYPE macro
           74c1881 LINUX: Use struct vfs_path on RHEL5
           5daa08e LINUX: make d_automount work properly on rhel5 kernels
[07:03:10] Paul, only when you cherry-pick from master.
[07:03:37] When a change is updated/rebased, the Change-Id should stay the same.
[07:04:04] gotcha
[07:04:28] hm. wonder if andrew is coming
[07:04:49] anyway, have 8788 through 8790, which are the pullups of those patchsets
[07:05:25] --- deason has become available
[07:05:39] Thanks. But they will have to be rebased anyway, won't they?
[07:05:45] why?
[07:06:00] they're based on top of the other stack of patches you are presumably merging
[07:06:21] if you are taking all those, you can just merge right up the stack
[07:06:24] f87d49c is definitely okay - it's just another configure test
[07:06:36] Once we merge 8785, won't there be a path conflict?
[07:06:53] why would there be?
[07:07:08] 74c1881 should be fine. If it isn't, it will be caught at build time.
[07:07:08] this change is (now) based on top of 8785
[07:07:42] 5daa08e I don't know much about, but Marc reviewed it on master, so I'm happy.
[07:07:50] 5daa08e would also be caught at build time. either it will build or not.
[07:08:12] (as long as you believe the macros mean equivalent things)
[07:08:16] But the path will be different after the change is cherry-picked from refs/, no?
[07:08:38] no. because these patchsets are now based on top of what the result from 8785 will be
[07:09:09] i pushed a stack of patches which build on the previous one, not several patches either individually or against the current head-of-branch
[07:09:17] Ok, all the better. I thought it's about the parent's change#.
[07:10:10] so whatever comes after 8785, since it applies to 8785, will apply to the tree that results from merging 8785.
[07:10:13] ok.
[07:11:30] of the mountain lion fixes i pulled up, none touch non-macos code.
[07:11:48] 2 touch only gui helper tools.
[07:12:22] Paul, I think we should just merge the macos ones once verified.
[07:12:40] so do you wish to vet the current stack of things with folks, or talk about the fact that we are presumably not going to deal with 8464, or something else?
[07:13:14] I will test the rhel5 build fixes again, and then I think they should be merged.
[07:13:28] They're all fine by me.
[07:13:46] On 8464, there seems to be consensus it's not for 1.6.2pre2 at least.
[07:14:13] I think it's not for 1.6.2. There's too much confusion around it for it to be included at this stage in the game.
[07:14:51] There's also the question of whether Derrick's KeepAliveOn / KeepAliveOff change should be in 1.6 - as it appears that it doesn't work as intended.
[07:15:05] simon, sure looks like it works as intended.
[07:15:08] look at 8782
[07:15:10] stephan: agreed
[07:15:11] But if that was merged before 1.6.1, I think we can ignore it for now, as it's not a regression
[07:15:44] when keepalives are turned off by the server, the hard dead time is applied and the connection dies.
[07:16:09] I thought the problem was that a hard dead time wasn't being set for those calls.
[07:16:54] looking.
[07:16:54] I don't think that's a discussion that needs to happen now for 1.6.2
[07:17:14] possibly. hang on.
[07:17:15] the keepaliveoff/etc thing at the very least isn't hurting anything, afaik
[07:17:23] i don't remember if the keepaliveoff thing is post 1.6.1
[07:17:33] Okay, if Andrew's happy that it isn't hurting, then I'm happy to leave that for now.
[07:17:38] ok.
[07:17:47] me too
[07:18:01] Could someone summarize?
[07:18:23] My brief summary would be that nobody understands how all of the various RX timeouts interact
[07:18:32] that's an apt summary
[07:18:38] And so, when we try to fix one thing, something else gets broken
[07:18:56] haha, yeah :)
[07:19:02] Ah now I git it ;-)
[07:19:17] but anyway.... for what we just talked about:
[07:19:43] I think we're agreed that Jeffrey's 8464 isn't appropriate for 1.6. The question is whether Andrew's less invasive change is.
[07:20:05] Simon Wilkinson: no, I don't really even know if that's right, either
[07:20:15] Can we just leave it alone for now?
[07:20:35] i think we should just leave it. the problem is andrew's change doesn't look correct.
[07:20:55] we should especially leave it for 1.6.2; what happens on master is a different question, but not for here
[07:21:23] [summary] the original 8464 submission modified the 'rx_KeepAliveOff' stuff (merged before 1.6.1) which was supposed to improve behavior when disks are really slow for fetch/store operations; simon wondered if the original rx_KeepAliveOff stuff should be pulled out due to discussion in there, but as I mentioned, even if the rx_KeepAliveOff stuff isn't correct, it doesn't seem to make anything _worse_
[07:21:57] So, nothing to do for now.
[07:22:05] yeah.
[07:22:06] yeah, consensus is to leave it alone for now
[07:22:24] Good.
[07:22:45] I guess there are no objections to 8786 and 8787?
[07:23:22] i have none. one changes documentation. one makes the spec file actually work.
[07:24:11] what about 8775 and 8776?
[07:24:44] (kerberos changes for windows)
[07:25:05] I'm happy with 8787 - I'd like Stephen to give it a spin on the RPM builder before we actually release with it, just to catch any oddities.
[07:25:07] on 8787: I really think perl-devel exists on other things besides just rhel6 (some fedora, and probably other stuff) but I don't know of a better way for it
[07:25:43] We catch it for Fedora >15
[07:25:58] on 8786: verifying that has been on my todo pile since arne's email to the list, but not critical enough to block imo...
[07:26:02] I don't think we guarantee that that spec file works elsewhere. (What does SuSE use?)
[07:26:06] fedora > 15 sets rhel == 6?
[07:26:29] oh, they're separate blocks
[07:26:40] Yeah, hidden after systemd-unit
[07:26:47] I only found it with search and replace :)
[07:27:15] if we had an activist specfile maintainer for suse, and could ensure that both suse and rhel/fedora worked from one spec file, that'd be fine. we don't.
[07:27:53] I'm not sure it would help a lot. Many more %ifs aren't making things easier.
[07:28:12] i wonder if spec supports including file fragments
[07:28:16] but that is not a discussion for here
[07:28:47] Back to 8775 / 8776 ?
[07:29:49] Jeff, do these fix the breakage with the latest KfW release?
[07:29:52] they fix an issue raised on openafs-info last week. they are not the fix for the krb524 functionality removal from kfw 4.01.
[07:30:15] Pity.
[07:30:25] 8775 is pretty boring
[07:30:27] I wonder if the sleep should be bounded, so there's no way we can end up waiting forever for a duff KDC
[07:31:01] I have not had time to work on importing ticket conversion code into kauth.
[07:31:34] Any ETA?
[07:31:47] as I indicated last week I wasn't going to have time to even start to look at it until yesterday and I have not as yet.
[07:31:57] anyone else that wants to volunteer is welcome to do so
[07:32:01] What do we think the timescale for 1.6.3 will be?
[07:32:03] simon, you want to only sleep a finite number of times before bailing?
[07:32:20] Derrick: Yes
[07:35:39] can't the user bail in this case?
[07:36:13] the user cannot bail but I'm not aware of any KDCs that broken.
[07:36:32] I do not object to making a change that limits the number of retries to one
[07:36:50] I just get jumpy with anything that has infinite retries ...
[07:37:00] understood.
[07:37:13] i will push a patchset that limits to 30 retries in a moment, to the stop of the stack
[07:37:15] "top"
[07:38:05] seriously, limit it to one. If it fails with a retry error more than once the kdc is already far too out of compliance to be useful
[07:38:38] Meanwhile: I think we want to pull up 8761 and 8759 ?
[07:38:39] WRT KfW 4.0 compatibility, I'm not sure we should delay the release to get it in.
[07:39:05] Can we just fail to start if KfW 4.0 is installed?
[07:39:26] fail to start what?
[07:39:59] Something that the user can get an error back from. I dunno - it's Windows - all a dark art to me...
[07:41:29] Simon, I think we shouldn't wait for it with anything we want to do soon (like pre2).
[07:41:46] Stephan Wiesand: yes (for 8761 and 8759)
[07:41:51] aklog functionality is baked into: 1. windows logon, 2. network identity manager afs credential provider, 3. afs credentials tool, 4. aklog, 5. anything else linked to afskfw.lib
[07:43:54] what will happen to users that require krb524 for access to their cell is that kfw will be installed, the krb5 logic will be triggered instead of the kauth code path and they won't get a token that is usable. The user will get an error instead. It simply won't be clear as to why the error is generated.
[07:44:43] That's not a regression with OpenAFS, though, so I'm not sure whether it's a reason to delay a release.
[07:46:35] Let's go ahead with getting 1.6.2 ready. Once there's an ETA for the 524 changes, let's see. If 1.6.2 is ready and no ETA, release 1.6.2 and face 1.6.2a for Windows only?
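The bounded-retry idea (sleep a finite number of times before bailing, and per Jeff's suggestion give up after a single retry) could be sketched as follows. This is a Python illustration, not the afskfw code; `RetryableError`, the function name, and the one-second default delay are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a 'KDC says try again' failure."""


def call_with_bounded_retries(
    op: Callable[[], T],
    max_retries: int = 1,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op, retrying at most max_retries times after a RetryableError.

    With max_retries=1 the operation runs at most twice. Any other exception
    propagates immediately, and once the retry budget is spent the last
    RetryableError is re-raised instead of looping forever against a broken KDC.
    """
    retries = 0
    while True:
        try:
            return op()
        except RetryableError:
            if retries >= max_retries:
                raise
            retries += 1
            sleep(delay)
```

The 30-retry limit first proposed in the discussion and the single retry Jeff argues for differ only in the value of `max_retries`; injecting `sleep` keeps the sketch testable without real delays.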
[07:46:47] sounds like a good plan
[07:47:09] i have one more thing when there is nothing else to talk about
[07:47:36] I also have a couple of things
[07:47:38] we can document the issue as known and move one
[07:47:44] s/one/on
[07:47:57] What's the bottom line for 8775 / 8776 ?
[07:48:09] i am pushing new ones with a timeout
[07:48:35] Ok. Will they have to be in pre2?
[07:48:49] won't much matter. no reason not to
[07:48:54] 8775 is a pullup from 1_7_x. A fix should be submitted to 1_7_x and pulled
[07:48:56] you can have them in a few more seconds
[07:49:10] fine. then just merge 8775 and 8776 as-is
[07:49:22] and when we have something to pull up, we can
[07:49:55] Fine by me
[07:50:36] Ok. Derrick, "one more thing"?
[07:50:51] i'd like to pull up 8777
[07:51:14] fixes a leaked vnode iocount ref on macos. can cause unmount to block forever. macos only.
[07:51:58] I have no objection
[07:52:06] Fine by me - the imbalance is pretty obvious, and the fix localised.
[07:52:41] Fine if no developer objects.
[07:53:13] I figure Paul won't either.
[07:53:53] Nope
[07:54:03] Andrew, "a couple of things"?
[07:54:44] 8751 I feel is pretty significant
[07:55:18] as far as I know the current fix in there is fine, but afaik only chas and derrick have looked at it
[07:55:20] i'd be happy to merge and pull up 8751
[07:55:47] I would like Marc to look at it before we merge it
[07:56:09] has Marc's locking comment been addressed / researched?
[07:56:15] it's not merged on master...
[07:56:41] it's been addressed / is no longer applicable
[07:56:54] and yeah, it needs to be merged on master first
[07:56:55] marc's comment is not applicable due to the code not looking like patchset 1
[07:57:03] I would also rather not change LINUX24, unless we're absolutely certain that it's okay there too.
[07:57:03] ok
[07:57:59] I think you either need to do this, or rip out the other mtpt/dentry stuff for 2.4
[07:58:36] it's possible to submit something different for 2.4 (just limit the loop iterations), but I really thought this was... well, fairly normal
[07:58:41] It's just that ~nobody seems to be using or testing 2.4, so I'd rather just let it stagnate, rather than try to copy fixes from 2.6 into it.
[07:58:50] I agree that the change does look fairly normal.
[07:59:15] Is anywhere running with this code already?
[07:59:35] I was just changing 2.4 stuff for the mtpt/dentry issues, not everything
[08:00:08] no, the place that reported this needed something fast, so they have something that just limits the loop iterations
[08:00:31] Can we take that change for 1.6, and fix it properly on master and give it some testing there?
[08:00:37] Can we put that into pre2?
[08:00:45] "that" change?
[08:00:52] the limit loop change
[08:00:58] Yes, the limit loop change.
[08:01:01] Jeff, yes.
[08:01:31] 'ugh', okay :)
[08:02:10] Does it have side effects?
[08:02:55] doing the loop limiting thing vs the 'real' fix?
[08:03:01] it's just less efficient, I think
[08:03:36] How likely is this problem to strike?
[08:05:38] it takes a lot of load, but given a busy enough machine, it will happen sooner or later
[08:06:26] Paul, would you be ok with taking the "quick fix" for the time being?
[08:06:46] Well, I have a vested interest in this work ...
[08:07:02] well, kinda
[08:07:46] If 8751 is no good, and we use it in pre2, what will happen?
[08:08:00] paul, do you have a dev environment system with sufficient workload to exercise this code path sufficiently to make people feel more comfortable with the "fix"?
[08:08:09] Either kernel deadlocks, or BUG()s
[08:08:36] Easily attributable to this change?
[08:08:38] the most likely case for it breaking would be something involving the nfs translator
[08:08:50] since we only change behavior around 'disconnected' dentries
[08:09:05] I should say, I have "observer status" interest in this issue
[08:09:16] but I could potentially try and test the code
[08:10:34] the nfs translator is effectively already broken on linux
[08:10:37] I'm fine with the temp fix, though; it's not terribly different for me
[08:10:40] i am… not so much caring
[08:10:53] yeah I know, I mentioned that somewhere in 8751
[08:11:10] If we get the temp fix in for 1.6.2, push the real fix to master and make sure that lands on 1.6.3, after it has (hopefully) had some exposure on master.
[08:11:15] I reckon we should take the loop limiting fix
[08:11:50] We may have a pre3 and more on 8751 then.
[08:12:39] how about, we consider the real fix when there's more feedback/info on it; if that's in time for a possible pre3, then okay; if it's not until 1.6.3, then okay
[08:12:46] Sure
[08:12:57] Yes.
[08:13:09] But I think it would be worth getting the loop limiting fix into pre2
[08:13:17] okay, I'll submit the 1.6.2-only fix
[08:13:25] Ok.
[08:13:27] er, to the 1.6.x branch
[08:14:08] moving on?
[08:14:22] yeah
[08:14:32] Ok.
[08:14:41] the last thing I have is RT 131530
[08:14:59] which has no submitted fix right now, but the first 'solution' proposed there is obviously easy to implement
[08:15:12] you mean just kill ih_sync_all
[08:15:31] so people don't have to read that whole thing... there's a data loss/corruption issue with ih_sync_all, similar to the last corruption issue with ih_sync_all
[08:15:34] derrick, yeah
[08:15:39] which i think is simple, easily understood, and the safest option esp for 1.6
[08:16:14] if people are worried about their data being synced, a thread which syncs later still leaves a window for data loss, so it's of marginal value anyway
[08:16:31] there are 4 points mentioned at the bottom of the original report; point 1, removing ih_sync_all, is all I'm really advocating for 1.6.2
[08:16:39] I think killing ih_sync_all is fine, as would be having the sync thread never put fds back into the cache
[08:16:56] Does removing ih_sync_all just mean that stuff isn't sync'd, or does it move the syncs into a different thread?
[08:17:24] (the other things are safeguards I will be adding, but aren't really important to hold up the release)
[08:17:43] Simon Wilkinson: stuff isn't synced; just remove the call to ih_sync_all/ih_sync_thread
[08:17:50] it's not forcibly synced. you get to rely on your storage to actually write the data out that you sent it.
[08:18:04] I don't have a problem with that. It probably makes things like ZFS and ext4 much happier with us, too.
[08:18:17] and I should note, many filesystem backends already sync every N seconds configurably
[08:18:43] that too
[08:19:24] so we need an ih_sync_all killer in master and we can then pull it up
[08:19:28] removing ih_sync_all; +1
[08:19:53] well, I'd like to know if stephan/paul have any opinions....
[08:20:05] or if we want to give CERN a chance to yell about this
[08:20:10] Still trying to make up mine.
[08:20:23] the person who will most likely object is Hartmut
[08:20:36] can we done something like the "fast restart", make it an option for the foolish/brave?
[08:20:47] do*
[08:20:55] Well, at our site we certainly prefer having the ZFS backend doing the sync, so I'm in favour of it if it's just an easy removal.
[08:21:04] I don't think having a "--enable-data-corruption" option is the best way forwards ...
[08:21:11] we already have one
[08:21:19] heh.
[08:21:22] Yeah, so let's not have another :)
[08:22:07] Proposal: We go ahead killing it. I contact CERN and Hartmut, soliciting their input.
[08:22:15] i like that plan
[08:22:19] yes.
[08:22:47] If they convince me you're all wrong,...
[08:22:52] modifying ih_sync_all thread to not safe the FDs it uses will be sufficient to avoid the race. That would limit the ih_sync_all thread to a max count of one FD as well. I think that the patchset killing ih_sync_all should be submitted to master and that Hartmut and Dan should be invited as reviewers
[08:23:19] s/safe/save
[08:24:48] Stephan Wiesand: okay, will submit
[08:25:12] And Jeff's proposal looks good too.
[08:26:17] oh yes, I'll add them to it
[08:26:23] Jeff, the modification of ih_sync_all would be 1.6 only?
[08:26:50] yes, it could be a 1.6 option
[08:27:49] Sounds most attractive to me. Stopgap for the next stable, final solution discussed on master.
[08:28:38] I don't know if I agree that that's the stopgap.... if we really wanted to keep ih_sync_thread around, I would've advocated temporarily removing it as the safer temporary fix
[08:29:43] but I won't push it... as long as it's gone longer term
[08:30:44] But Jeff's proposal will eliminate the risk of data corruption?
[08:31:40] Does 1.4.14 have that, BTW?
[08:31:40] as far as I know, yes (that's mentioned in 131530), but it's a more complex change
[08:31:42] if this thread's functionality is something we want (as opposed to this implementation of it) i can conceive of a safer way of doing it, but we will cross that bridge if people care
[08:32:05] note that 'more complex' is still not very complex
[08:33:29] I would've hoped that getting rid of the background thing would increase motivation in using sync()s correctly...
[08:34:38] sync() is system-wide, high collateral damage. correctly is "don't
[08:34:43] " imo
[08:34:50] sorry, fsync
[08:35:06] Yeah, having a background sync thread at all is a workaround architecturally, no?
[08:35:15] yes. lousy
[08:35:36] I think it's rather late for changing behavior like this, if it can be fixed safely and relatively easily for the time being.
[08:36:15] the objection to removing it from Hartmut is likely to be that he doesn't trust that data is fsync'd properly elsewhere
[08:36:29] It isn't.
[08:37:21] The conservative approach would be "fix the use of fsync and when that is done remove the ih_sync_all thread"
[08:37:35] which delays 1.6.2
[08:38:09] that would not be a change for 1.6.2. hence ih_sync_all should not be removed for 1.6.2.
[08:38:23] I may be mistaken, but I thought what we had at the moment was that FDH_SYNC just sets a flag on the inode, and ih_sync_all comes along and actually does the sync?
[08:38:25] changing it adds additional risk of screwing up the runtime for a normal fileserver
[08:38:51] removing the sync thread will definitely not break anything for a normally-running server
[08:39:27] Simon Wilkinson: it's not the only thing that necessarily does the sync, but yes, that's right
[08:39:36] It's a significant change in between pre1 and pre2...
[08:40:07] In an ideal world, pre2 would be purely fixing issues found with pre1.
[08:40:11] and the data is not fsync()'d correctly elsewhere; it's not fsync()'d correctly now either
[08:41:47] hm. we will not have 1.6.2 final next week. can we leave this in pre2 and examine it before the next meeting?
[08:42:03] as in, leave things the way they are?
[08:42:09] for pre2, yes
[08:42:40] it is exactly as broken as 1.6.1
[08:43:02] This is happening in the wild?
[08:43:07] yes
[08:43:08] yes
[08:43:10] yes
[08:43:17] but I mean, it took several years for anyone to notice
[08:43:21] in surround sound.
[08:43:46] rare, but very very serious when it hits.
[08:43:47] the people who noticed are unlikely to roll out pre2, i assume
[08:43:48] this is a real problem that does result in data corruption.
[08:43:53] but also note that the rather large site that noticed the last bug we had similar to this runs without ih_sync_thread
[08:44:38] so if they ran with the background syncs, maybe it would've been hit earlier
[08:45:02] How much work is it to make it safe for now the way Jeff suggests?
[08:45:37] not much; I assume ih_ihopen is usable there
[08:45:46] er, ih_iopen, whatever it's called
[08:46:00] I haven't tried it, but it's not a lot of work, submittable today, certainly
[08:46:29] Paul, should we ask Andrew to please do it?
[08:46:56] Well, I guess if we're agreed that the short term fix is too drastic for pre2, then sure
[08:47:31] Long term?
[08:48:22] Long long term, I'd like to kill the ihandle package ;), and just let the OS do the caching for us
[08:48:32] +1
[08:48:48] heh
[08:48:51] Can you submit that today too?
[08:49:03] well, we need some kind of abstraction if we're to sync data to ensure consistency
[08:50:15] I'm much in favour of having a pre2 with background syncing but without known data corruption due to it.
[08:50:27] wasn't commit 12e85227 meant to avoid "poisoning" the fd cache?
[08:50:50] it was.
[08:51:00] and that did fix issues, but not all of them
[08:51:12] ah, ok.
[08:51:13] well. i wonder if the race you describe fixing as 4 would fix the rest
[08:51:30] Stephan Wiesand: I can submit that today; whether or not it goes in pre2 can maybe be discussed in gerrit?
[08:51:35] and yes, i thought point 4 was the real issue?
[08:51:48] Andrew, thanks!
[08:51:50] Derrick Brashear: the reallyclose/open race? no, even without that, this can happen
[08:52:01] ok.
[08:52:12] well, let's carry on in gerrit once we have some code to carry on with
[08:52:36] (once we check ih_synced, and then drop IH_LOCK, anything could happen by the time we IH_OPEN...)
[08:53:38] Any more changes to discuss today?
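A rough model of the 1.6 stopgap Jeff proposed ("do not deal with the cache but still sync"): the sync pass opens its own descriptor for each flagged file, fsyncs it, and closes it, so it holds at most one descriptor at a time and never hands one back to a shared descriptor cache. This is Python for illustration only; `Handle` and `sync_pass` are hypothetical stand-ins rather than the OpenAFS ihandle API, and the model deliberately ignores the ih_synced / IH_LOCK race raised at 08:52:36.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Handle:
    """Hypothetical stand-in for an inode handle: a file path plus a sync flag."""
    path: str
    needs_sync: bool = False


def sync_pass(handles: list[Handle]) -> int:
    """Fsync every flagged handle through a private, short-lived descriptor.

    The descriptor is opened only for this pass and closed right away; it is
    never taken from or returned to any shared descriptor cache, so at most
    one descriptor is held at a time. Returns the number of files synced.
    """
    synced = 0
    for handle in handles:
        if not handle.needs_sync:
            continue
        fd = os.open(handle.path, os.O_RDWR)
        try:
            os.fsync(fd)  # push this file's dirty data to stable storage
        finally:
            os.close(fd)  # close instead of caching the descriptor
        handle.needs_sync = False
        synced += 1
    return synced


if __name__ == "__main__":
    # Small demo on a throwaway file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vnode-data")
        with open(path, "w") as f:
            f.write("payload")
        handles = [Handle(path, needs_sync=True)]
        print(sync_pass(handles), handles[0].needs_sync)
```

The trade-off is an extra open/close per synced file in exchange for never sharing a descriptor with the sync pass.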
[08:53:43] but okay, I think we've run this into the ground enough; remove ih_sync_thread in master, and submit 'do not deal with the cache but still sync' for 1.6.x
[08:53:44] i got none
[08:53:53] nothing else from me
[08:53:58] 5157, possibly
[08:54:22] does that fix the perl thing?
[08:54:26] 5157: trivial. don't care if it goes
[08:55:04] It means that you can build perl-AFS using just libafsauthent_pic and libafsrpc_pic, and no longer mix all manner of other nonsense together.
[08:55:07] I have no objections
[08:55:33] We'll need 5156 too, which is just a build system dependency change
[08:55:53] aiui you can't usefully use the perl-afs stuff with 1.6.1; this makes it so you can with 1.6.2 yes/no?
[08:55:54] if this fixed the perl issue, excellent.
[08:56:44] Yes, this should mean (in conjunction with some patches that Norbert already has) that you can use perl-AFS with 1.6.2
[08:57:00] 5156/7 are ok for me. Paul?
[08:57:06] yep
[08:57:06] okay then yes, +1
[08:57:27] is simon submitting?
[08:57:49] --- Derrick Brashear has left
[08:58:05] I'm turning into a pumpkin. If I don't speak with folks, merry xmas and such.
[08:58:22] I'll submit
[08:58:30] I think we'll leave the cache bypass fixes I found on the list for now.
[08:58:47] Pre2 schedule?
[08:59:09]