[06:35:40] --- shadow@gmail.com/owl74289755 has become available
[07:04:40] --- stevenjenkins has become available
[07:29:14] --- tkeiser@sinenomine.net/owl has become available
[07:55:44] --- SecureEndpoints has become available
[07:59:02] --- cg2v has become available
[08:00:33] --- Derrick Brashear has become available
[08:00:59] good morning (since all of you appear to be in my time zone)
[08:01:12] good morning
[08:02:33] i'll give people another minute or two, and then we can go
[08:03:18] if you've not checked out http://rt.central.org/rt/index.html?q=124514
[08:03:19] please do so
[08:04:27] So, why are the two features (dynamic vcaches & vos split) going into stable?
[08:05:36] dynamic vcaches: so people stop crashing their linux systems by exhausting vcaches by pinning all of them
[08:05:53] can't see any other way to fix the bug without using GPLONLY interfaces
[08:05:55] linux systems don't crash anymore.
[08:06:03] Or I didn't think they did
[08:06:23] there was at least one, but it may have been old.
[08:06:39] not being able to use afs because everything is pinned isn't exactly a useful way to deal, though
[08:07:01] as to vos split, that's the one i expected we were going to end up heavily discussing. the code's not committed yet, as such
[08:07:11] 1.4 ones don't crash..even w/o dynamic vcaches.
[08:08:24] one other issue i know of, not currently on the bug list: figuring out how to get aix 5's krb5 to be correctly detected so aklog builds
[08:08:26] I haven't been following the list. Is the context for the signed/unsigned thing there? I'm not convinced we aren't treating a cosmetic symptom here
[08:08:29] I've got some patches for vos split, but I haven't touched them in a month or so, so I'm not sure how ready-to-go they are. they actually depend on some other patches. let me take a quick look...
[08:08:48] patches above the original code, you mean?
[08:09:08] cg2v: with the patch the volumes move; without the patch the volumes do not.
[08:09:14] yes, the actual vos split code is in a sequence of patches..iirc, there are 3-4 I need to get done first.
[08:09:20] the code proposed for 1.4.9 is only the volser SplitVolume RPC, and the fileserver InverseLookup RPC, none of the command line tool work
[08:09:30] ok, good.
[08:09:51] getting the RPCs in there is great, but if we need the patches for the commands, I probably can't get to it in the short term.
[08:10:19] the commands wouldn't ship in 1.4.9. the RPC code is succinct, mature and auditable
[08:10:27] sounds good, then.
[08:10:28] The thing is, if we're treating passing an afs_int32 through a long somewhere, that's a bug in and of itself
[08:10:28] the command line code isn't ready let alone tested
[08:10:33] correct.
[08:10:44] let's talk about the volume ids first
[08:10:45] s/treating//
[08:11:16] btw, my conf call has started now, so if you need me, and I'm not responding, ping me directly.
[08:11:21] --- haba has become available
[08:12:03] good not morning
[08:12:57] reference for volume id issue on list: https://lists.openafs.org/pipermail/openafs-info/2009-March/031022.html
[08:16:57] (apologies for the silence; i am tracking through code for a moment to see if the long issue is obvious)
[08:17:39] the vos message is entirely about %lu vs %u. The actual types are irrelevant. I suppose it's also relevant that we might be casting a signed type to a larger unsigned type.
[08:18:55] i'm less worried about the messages than the apparent issue during cloning (see the original submitter's answer to Hartmut's question)
[08:19:23] https://lists.openafs.org/pipermail/openafs-info/2009-March/031025.html
[08:20:48] I was just getting to that question myself.
[08:20:55] *also* a format string.
[08:25:31] Since I presume we can't rely on inttypes.h/PRIu32, we either need to change the types (in the volser/vol interface, not really on the wire), or use an appropriate cast.
[08:25:48] That is, something like (void)afs_snprintf(name, sizeof name, VFORMAT, (unsigned long)(afs_uint32)volumeId);
[08:26:21] Some sort of cast should really be added anyway, as I'm surprised this works at all on x86_64
[08:26:57] istr 1.5.x (now) has a cast but i'll need to do a fair bit of abstracting to pull from the delta
[08:26:59] lemme look
[08:27:24] nope
[08:28:35] stick a note in 124510, if you don't mind, and we'll move on
[08:30:25] 3rd works better for me (since I'd only miss one class)
[08:30:41] er, mix
[08:34:47] well, when my 64 bit test host starts behaving again, i will look at that issue.
[08:35:33] let's see if we can blow through a couple of the (i believe) non-controversial things
[08:35:39] cache files by path on solaris 10, 11
[08:35:58] does this need to go backward to solaris 9?
[08:37:04] (i'd rather not simply for testing purposes)
[08:38:10] (ticket number? it's not listed in 124514)
[08:38:21] it's old. um.
[08:38:47] 123677
[08:39:23] What about sol9 with cache that has logging on?
[08:39:33] this doesn't fix that
[08:39:42] this is just about how you find the file to open
[08:39:51] ah
[08:40:02] if there's potential deadlocking behavior vm system versus afs cache, this does not modify it
[08:40:03] Then why backport at all?
[08:40:22] Is there a 10 patch that's going to break our current stuff?
[08:40:31] "backport" consists of adding a #define to a file
[08:40:43] --- stevenjenkins has left
[08:40:48] oh, you mean 1.5->1.4?
[08:41:12] I don't know, is any of this in 1.4 now?
[08:41:34] not in 1.4.8
[08:41:58] IIRC, solaris 10 update 6 ships with zfs root by default. It would be nice to have this patch in stable.
[08:42:39] yeah, my understanding was zfs as root would become default; i wasn't sure if it had, but forcing people to convert to ufs probably won't go well
[08:42:43] Creating a zvol is not that big of a deal.
[08:42:51] But I don't really care.
[08:42:56] Don't change solaris 9
[08:43:31] I'm wrong.
In U6 ZFS is a menu option; ufs is still default.
[08:43:38] linux vm system hang (124456)
[08:43:39] How well does this work with zfs root? does zfs not have the deadlocking problem?
[08:43:40] --- stevenjenkins has become available
[08:44:04] i've been previously unable to reproduce the zfs deadlock
[08:44:08] but i don't promise it's not there
[08:44:19] er, the deadlock with zfs
[08:44:22] you get the idea
[08:46:23] my expectation is still zfs root as default before the next stable, whatever it is, comes out.
[08:46:58] (unless there's more comment, let's talk about the linux vm system hang)
[08:47:10] But there is a workaround, and it deals with potential deadlocks too.
[08:47:15] i assume there's no objection to switching to vmtruncate and letting it deal (correctly) with locking
[08:47:24] the workaround being "create another volume"?
[08:47:58] Create a zfs "volume" and put a ufs on it. It doesn't even require repartitioning or planning in advance.
[08:48:18] then leave sol9 as is and change to new in sol10 and 11 as Derrick proposed make most sense.
[08:48:24] if we expect people to do it, arguably we should create a script.
[08:49:01] we also have requests to support zfs for at least solaris 10
[08:49:04] --- mmeffie has become available
[08:51:31] vmtruncate is not for FlushPages
[08:51:47] use invalidate_remote_inode there
[08:53:06] fine
[08:53:07]
- vmtruncate(AFSTOV(avc), 0);
+ invalidate_remote_inode(AFSTOV(avc));
[08:54:37] appearance is that we don't need any other lock/unlock to do that
[08:56:44] if RT ever deigns to respond to me again i will update the ticket
[08:57:54] 124455 CheckCall is something i just need to verify. i have a machine set up to do so
[08:58:10] so let's come back to vos split
[08:59:03] feeling is maturity isn't there, don't ship?
[08:59:32] who is going to make use of it at this point? what is the benefit to shipping now?
[09:01:32] quite possibly nobody. i expected 3rd party afs management tools would want it.
[09:02:09] given that the command line tool is not ready, perhaps deferring this is prudent
[09:02:13] I suspect they will but I think we can wait
[09:02:44] removed from dependency list
[09:02:47] A possible benefit to shipping now is that server upgrade cycles are slower than client upgrade cycles.
[09:02:58] well, that was my thought
[09:03:13] but if it's going to make people nervous, there's also benefit in not shipping it
[09:03:17] But my actual opinion is "this isn't a 1.4.x feature"
[09:03:30] fine. so be it.
[09:03:43] shall we go back to the dynamic vcache patch.
[09:03:55] sure.
[09:04:00] is "you can pin your entire vcache pool and make your machine unusable" a bug?
[09:04:04] my position is yes
[09:04:19] I agree
[09:04:26] so you think we need a high-water mark for vcaches?
[09:04:44] orthogonal to my point
[09:05:03] ok..
[09:05:07] (re 124456; I don't know why osi_VM_FlushPages uses truncate rather than invalidate. If there's a good reason for that, then using invalidate_remote_inode is not enough. Using vmtruncate isn't right in any case as it does i_size_write, etc)
[09:05:14] the first question is whether or not the problem is something that needs to be fixed in 1.4.9
[09:05:20] then we can discuss possible solutions
[09:06:03] what i think is the behavior we have now, through 1.4.8, is "you can pin your entire cache with something like famd"
[09:06:28] and in whatever version was the last one to panic, you'd panic.
[09:06:37] People have lived with this problem since the transarc days. OTOH, I've informally proposed "-stat is a target, not a limit" for linux ever since the inodes became separately allocated.
[09:07:01] well, -stat becomes a target with this patch
[09:07:16] in any case, in the transarc days linux didn't have inotify
[09:07:38] and we can't do anything to manage inotify unless we become GPL
[09:07:58] and i'm not going to do that for 1.4.9 ;)
[09:09:14] The patch looks ok in principle, but why is NewVCache overhauled like that?
[09:09:38] Oh. it's split up. I see now.
[09:09:59] because some parts of it needed to move out into the piece which keeps you near your target
[09:10:52] I think the delta's misnamed (it _depends_ on dynamic vcache allocation. It doesn't provide it), but other than that, throw it in.
[09:11:27] so the only thing left to discuss is aix 5 krb5 configure detection
[09:11:42] --- matt has become available
[09:12:00] e.g. #error "Must have either keyblock or session member of krb5_creds"
[09:12:04] Which I have no opinion on, and it's time for lunch
[09:12:27] I'll propose something simple: if --with-krb5 is asserted, and neither KRB5_CFLAGS nor KRB5_LIBS are set, how about we test for -L/usr/krb5/lib -lkrb?
[09:12:43] enjoy lunch
[09:13:24] something simple won't work there are 2 issues
[09:14:16] 1) aix_aklog.c builds always, and shouldn't. 2) we need to know whether to use heimdal or mit and even with configure being told the right thing we don't figure it out
[09:14:26] 1) is probably an easy fix
[09:15:45] 2) i need to get configure output
[09:16:18] my proposed solution is to open a ticket for this now, not block pre1 and see if we can fix it for pre2 since it's build system only, the code does build and work if you tell it the right thing
[09:17:09] fixing the string format issues in the volume paths is also an action item, but that should ideally be done before pre1
[09:17:32] so. what else do we need to discuss (or can i also go get lunch)
[09:17:50] I agree that the build changes can wait for pre2
[09:18:22] I guess I'd add a (3) in that case, which is I'd like to see NAS autodetected as a fallback for when neither heimdal nor mit can be found.
[09:18:34] do we have a target for pre1, pre2, ...? how often will the pre releases be issued?
[09:18:36] if --with-krb5, you mean
[09:18:46] yes, --with-krb5
[09:19:00] i assume as soon as a pre has no reported issues we tag and ship
[09:19:31] my expectation is there will be a pre2 one week after a pre1, and we give a week per pre
[09:19:45] there needs to be a minimum time period before that determination is made. 7 days with no blocking issues?
[09:20:02] that would be my suggestion
[09:20:06] --- cg2v has left
[09:20:10] --- stevenjenkins has left
[09:20:14] that is fine. we need to communicate it. that is all.
[09:20:17] --- cg2v has become available
[09:20:29] ok
[09:22:21] then i assume that's it.
[09:23:14] I think you can get Nikke in Umeå to fix the AIX stuff if you give him a pre1
[09:23:17] --- stevenjenkins has become available
[09:23:25] i can test it myself, but he has aix6
[09:23:39] ah, damn
[09:24:40] You may convince me ( oslevel 5.2.0.0) but that is our backup box.
[09:25:32] this is not a showstopper for pre1 anyway.
[09:38:14] in that vein, aix5.1 won't build at present because sys/pag.h doesn't exist. Someday we should get around to writing a configure test
[09:51:07] I did not notice because we kind of jumped over 5.1 or so.
[10:33:50] Ok, time over, have a nice weekend.
[10:33:57] --- haba has left
[10:36:16] --- Derrick Brashear has left
[10:49:34] --- cg2v has left
[10:57:14] --- stevenjenkins has left
[13:13:15] --- SecureEndpoints has left: Disconnected
[13:57:47] --- matt has left
[14:11:38] --- Derrick Brashear has become available
[14:58:27] --- mmeffie has left
[16:00:01] --- Simon Wilkinson has become available
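Note on the volume id format-string item (08:17:39 to 08:26:21): the cast proposed at 08:25:48 can be sketched roughly as below. This is only an illustration under assumptions, not the OpenAFS code: int32_t and uint32_t stand in for afs_int32 and afs_uint32, plain snprintf stands in for afs_snprintf, and VOL_NAME_FORMAT and format_volume_name are hypothetical names (the real VFORMAT definition may differ).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the VFORMAT macro mentioned in the log;
 * the real definition in the OpenAFS tree may differ. */
#define VOL_NAME_FORMAT "V%010lu.vol"

/* Format a 32-bit volume id into buf and return snprintf's result.
 * The id is first converted to an unsigned 32-bit value and only then
 * widened to unsigned long, so the argument always matches %lu and an
 * id held in a signed type is not sign-extended where long is 64 bits. */
int
format_volume_name(char *buf, size_t len, int32_t volume_id)
{
    assert(buf != NULL);
    return snprintf(buf, len, VOL_NAME_FORMAT,
                    (unsigned long)(uint32_t)volume_id);
}
```

Without the inner cast, an id with the top bit set that is held in a signed 32-bit variable would be sign-extended on a platform with a 64-bit long and print as a very large number, which is the sort of symptom the %lu vs %u discussion is about.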