[05:59:49] --- mmeffie has become available
[06:50:28] --- matt has become available
[06:50:34] hi mike?
[06:56:42] --- matt has left
[07:11:26] --- matt has become available
[07:16:35] (I will be in at 11:00 am EDT, but had prior commitment at 11:30)
[07:17:03] --- shadow@gmail.com/owl912329A2 has become available
[07:40:55] hi.
[07:54:59] --- Derrick Brashear has become available
[07:55:55] with any luck we can be fast. the two vaguely controversial things are the "what pag am i in" pioctl, which is mundane in implementation and provides missing functionality for aix and portable functionality for linux;
[07:56:10] and "what if anything can we do about mmaping files >> cache"
[08:01:48] after 11. are we it?
[08:03:03] yes. so far.
[08:03:11] i meant total
[08:03:27] (in general, from owl, i have no idea who's here, but i opened in adium also)
[08:03:42] hi ho
[08:03:47] haba mumbled he'd try
[08:03:52] I have 30m, as noted
[08:04:13] 27m now
[08:04:16] anyway.
[08:04:30] release notes are up and the list of changes was summarized to release team
[08:05:12] I assume solookup fix can go in, if it works?
[08:05:16] the "what pag am i" pioctl is pretty mundane, e.g. adds only a simple "return 32 bit int" wrapper around a function which is not new for this, so should be relatively noncontroversial
[08:05:28] and if a configure test can be made to not screw people for whom it won't work
[08:05:39] --- cg2v has become available
[08:05:49] i have the code in front of me, but no relevant systems on which to try a configure test.
[08:05:58] (no osol systems at all, at the moment)
[08:06:15] chaskiel, do you get scrollback or should i copy out bits for you?
[08:06:30] I see back to 9:50. Alternatively, there's the web
[08:07:22] 9:50 is all there is
[08:08:07] i guess questions would be.
[08:08:37] do we need to discuss more than the what pag pioctl, the osol socket change and the file >> cache mmap issue
[08:09:09] and then, "what issues if any with the pag pioctl", as described a moment ago.
[08:09:58] also "i can integrate the osol code change behind an ifdef precluding it from touching other than 'solaris 11' with an additional conditional we can define if there's a configure test possible; is that reasonable?"
[08:10:59] and finally, "given lack of firm data about exactly how much larger file need be than cache, are the choices 'warn people and live with it' and 'go back to deadlocking in every case,any case'"
[08:11:10] er, every case should only have been any case.
[08:11:50] --- stevenjenkins has become available
[08:12:41] bleah. I should really find time to play with that issue. I still think that "DoPartialWrite doesn't call VM_StoreAllSegments" would be a sufficient solution to this issue.
[08:13:13] you might ask felix if he tried that; i think he did
[08:13:22] I think he did
[08:14:01] he was unable to create a case which neither deadlocked nor failed to write files sufficiently larger than cache, though what sufficiently was changed over time
[08:18:05] aside from file >> cache on linux, any comments whatsoever?
[08:21:44] at this point, i am encouraging people to run the prerelease. slac is running testing.
[08:22:01] ok. well... seems like this will be a fairly boring release.
[08:22:46] boring (stable) releases are good. have you made any progress on the host list race?
[08:23:17] what's "fileserver address/uuid tracking unhashing fix"? I can't map that to a delta
[08:23:26] uh, hang on
[08:24:29] that'd be "sir not appearing in this film"
[08:24:39] it wasn't supposed to be on this list; it's only in 1.5.x and head
[08:24:55] viced-host-uuid-and-addr-hashing-corrections-20090530, 124634
[08:25:38] (which, to be fair, is only not included because of the size of doing so this late; it was tested heavily including on 1.4.x)
[08:26:23] Seems excessive for 1.4 absent compelling bad behavior
[08:26:31] But we'll have that argument in august
[08:27:40] --- jhutz@jis.mit.edu/owl has become available
[08:27:45] one hopes by august there will be operational support for "it works", but we'll see
[08:28:29] "it works" is not a compelling argument for changing the stable release
[08:29:05] yes. the point is, the opposite is an argument against inclusion.
[08:29:39] --- Simon Wilkinson has become available
[08:29:48] i guess matt needs to leave now
[08:30:13] --- stevenjenkins has left
[08:32:52] -
[08:33:48] --- stevenjenkins has become available
[08:34:13] --- Jeffrey Altman has become available
[08:35:06] Not sure why matt would need to leave.
[08:35:06] --- matt has left: Lost connection
[08:35:15] he claimed he could only be here til 11:30
[08:35:20] Oh, nevermind.
[08:35:28] not that we were forcing him to go
[08:35:48] --- matt has become available
[08:36:08] client died
[08:36:35] (but my meeting starts asap)
[08:36:46] On "what pag am I", I say go for it. In fact, IIRC the only thing on that list even vaguely controversial to me is the mmap thing, which I beg you to get right this time :-)
[08:36:48] I wanted to ack derrick suggestion for osol
[08:37:19] I can test latest 11 some time next week
[08:37:33] matt: thanks. have "fun" with your meeting
[08:37:47] jhutz: there's no right, yet, but it is better
[08:38:15] --- cg2v has left
[08:38:34] I'm not sure we have enough information to truly get the mmap thing right yet.
[08:38:59] --- cg2v has become available
[08:39:01] As far as I know nobody but Felix has reproduced the problem he's seeing. And he's been unclear about the circumstances he's seeing it in.
[08:39:21] I guess I will settle for something that's no worse than anything we've had to date. Of course there's a tradeoff to be made between deadlock and panic...
[08:39:23] his test client is in the ticket, is it not?
[08:39:34] Yeh. I have, so far, been unable to reproduce.
[08:39:51] (the still existing file >> cache size problem, at least)
[08:40:09] It's not clear to me whether it's file >> cache size, or mmap'd region >> cache size that is the problem.
[08:40:44] I assume it's "you dirty enough pages -> enough cache chunks to force DoPartialWrite".
[08:40:49] sadly felix likely asleep.
[08:41:12] I don't understand how on earth file size can be important, is the thing.
[08:41:24] pages, sure, but the overall size of the file should be neither here nor there.
[08:41:40] And I find it very hard to think what the right thing to do is if we have more dirty pages than space in the cache.
[08:41:43] But I haven't looked at this in any detail since Derrick's original patch which IIRC I decided was clearly wrong but didn't spend any effort to figure out how to do it right.
[08:41:57] Especially given that the option to decline to flush pages is no longer there in recent kernels.
[08:42:45] I imagine the right thing to do is flush dirty pages to the cache, and when the number of dirty cache chunks gets to be too many, start flushing dirty cache chunks to the server, with the former waiting on the latter. We just don't get it right yet.
[08:43:09] Do we have the option of taking arbitrarily long to flush a page?
[08:43:19] --- stevenjenkins has left
[08:43:23] I think we could, but it would be pretty anti-social.
[08:43:39] it's not worse than losing changes, imo
[08:43:45] It's entirely possible that the machine would deadlock, too.
[08:43:48] that said, i think it's late to ship that for 1.4.11
[08:43:58] Anyway, we should take discussing the details back to the other room. Right now I think the main question is, is what's in 1.4.10 bad enough and what we have now better enough to justify pushing it into 1.4.11?
[08:43:59] even assuming it's doable
[08:44:01] Oh, completely.
[08:44:11] jhutz: Definitely.
[08:44:20] We have more data corruption in 1.4.10 than in 1.4.11.
[08:44:34] Yeah, I think panic and deadlock are both better than data corruption.
[08:45:29] because neither panic nor deadlock ever corrupt data...
[08:45:33] anyway, ...
[08:47:04] --- stevenjenkins has become available
[08:48:47] Deadlock doesn't do _anything_, more or less by definition.
[08:49:01] I think the question is whether the data corruption occurs with notice to the user or not.
[08:49:12] --- Simon Wilkinson has left
[08:49:18] BTW, what sort of timeline are you looking at? We need to move penn sometime next week.
[08:49:40] rc fired yesterday. no sooner than a week from yesterday.
[08:53:07] anything else?
[08:53:16] OK, then I'll deal early next week.
[08:53:37] thank you
[08:54:13] Nothing else. If you have zephyr servers, don't forget arilinn renumbering early tomorrow morning, and it's the last athena server to be renumbered
[08:55:53] yeah, i need to knock over the dementia zephyrds
[08:56:23] ok kids, i'm gonna hope for the best and keep my ear to the ground. try 1.4.11 if you get time
[08:56:39] Just stay out of the street while you do that.
[09:30:58] --- mmeffie has left
[10:16:29] --- matt has left
[10:57:27] --- cg2v has left
[11:05:25] --- Derrick Brashear has left
[11:20:43] --- Derrick Brashear has become available
[12:36:03] --- Simon Wilkinson has become available
[12:50:28] --- Simon Wilkinson has left
[13:00:32] --- Derrick Brashear has left
[13:20:48] --- Derrick Brashear has become available
[13:51:56] --- stevenjenkins has left
[13:55:29] --- stevenjenkins has become available