[00:13:04] --- kaj has become available
[01:09:27] --- Simon Wilkinson has left
[01:47:40] --- Russ has left: Disconnected
[01:54:50] --- Simon Wilkinson has become available
[01:57:08] Did my "don't hardcode flex" patch break on some platforms?
[03:27:59] --- abo has left
[03:51:17] --- abo has become available
[05:34:51] --- Simon Wilkinson has left
[05:37:54] --- Simon Wilkinson has become available
[05:39:59] --- meffie has become available
[05:45:40] --- pod has left
[05:51:38] --- jaltman has become available
[05:52:45] > Did my "don't hardcode flex" patch break on some platforms?
for reasons which are not entirely clear to me.
[05:54:07] I wonder if it's because autoconf isn't setting the portability flag when it gives us 'flex' on some platforms.
[05:54:29] (-l)
[05:54:31] well, that'd have to be it. but i don't understand its picks.
[05:54:40] (since -l implies %option yylineno)
[05:54:57] and honestly, i think one place with brokenness, did have -l...
[05:56:32] ah. hm. i lie. nope. in fact it does not have -l
[05:58:28] I wonder what the correct (autoconf) way to deal with this is
[05:58:36] still looking
[06:05:59] "by parsing the output of `$LEX --version' than by simply relying on test "x$LEX" = xflex"
[06:06:08] yeah, so, i think we get to solve it ourselves.
[06:06:57] Yuck.
[06:07:06] Feel free to revert that patch for now
[06:08:14] well, i think we should just fix it
[06:08:35] since hardcoding a lexer has caused its own issues
[06:08:53] and honestly, i'd be tempted to just stop needing yylineno, as i said
[06:09:21] well.
i suppose we probably should just detect flex and deal
[06:10:12] --- abo has left
[06:10:34] --- Kevin Sumner has left
[06:10:51] --- abo has become available
[06:11:28] --- Kevin Sumner has become available
[07:22:55] --- deason has become available
[08:14:27] --- meffie has left
[08:15:03] --- meffie has become available
[08:38:35] --- reuteras has left
[09:02:54] --- pod has become available
[09:18:08] --- kaj has left
[10:02:23] I understand the desire to prevent the use of any loopback address, but when testing there are times when the ability to configure clients and servers on loopback addresses is desirable.
[10:03:32] see my comments in the gerrit issue
[10:05:12] --- Simon Wilkinson has left
[10:13:04] consolidating the checks allows this to be easily configurable, at least
[10:14:19] and at least for the 'viced client alt addresses thing', it doesn't prevent loopback communication; it just prevents loopback addresses in the alternate IP list
[10:14:54] the file server has no method of knowing what a loopback address is on the client.
[10:15:01] so if you are connecting from/to a lo ip, it'll still get in the interface list since we add the connecting one
[10:15:42] yeah, and that's probably reasonable
[10:15:51] the 127.0.0.x addresses are non-routable but they are not the only addresses that are used on loopbacks.
[10:16:19] All of the Windows clients for example have a 10.x.x.x address on the loopback.
[10:16:40] but we always know 127./8 is loopback
[10:17:10] but you don't know that they are not valid alternate addresses for the connection that is being received.
[10:17:43] a connection from the local machine could be received from a non-127 address and the 127 address is perfectly valid.
[10:17:50] i don't think we always know 127/8 is a loopback
[10:18:08] i'm actually fairly certain i have evidence it's not
[10:18:36] I thought 127/8 as lo is part of ipv4
[10:18:58] I agree that a client should avoid putting its loopback addresses in the published interface list. However, that check should be made by examining the interface to see if it is a loopback interface, not based upon the IP address that is assigned.
[10:19:22] we can't assume existing clients are reasonable
[10:19:36] also, do all platforms give us a means to determine that?
[10:20:01] of course not. the file server must be able to handle the same address being reported by multiple clients and must be able to deal with addresses which are not reachable.
[10:20:16] --- Kevin Sumner has left
[10:20:32] --- Kevin Sumner has become available
[10:20:46] this problem is no different than clients from N different NATs reporting the same 192.168.1.100 address to the file server
[10:21:13] as far as 127/8, i think it's specified by RFC but i mean empirically i think (mumble) is abusing it.
[10:21:45] rfc 3330.
[10:21:56] i guess if (mumble) is being special, (mumble) can patch
[10:22:07] --- meffie has left
[10:22:58] --- meffie has become available
[10:23:44] --- rra has become available
[10:24:05] as far as I'm concerned the reported addresses from a client are untrustworthy. the file server should only use them as an emergency fallback if the primary callback connection is useless and the address/port can be verified to reach a client with the same UUID.
[10:24:59] yes, I know that; this is just intended as an optimization
[10:26:00] (related: gerrit 2367 and 2368, for those that don't know what this is about)
[10:27:51] I don't see how a 127/8 address would be required to make things work, _except_ in testing circumstances where we can make special accommodations
[10:29:31] not all "testing" is performed by developers.
it is very common for someone evaluating a technology that is new to them to deploy it on a single machine with loopback interfaces
[10:30:18] the inability to do this with MIT Kerberos has resulted in a lot of very negative feelings in a subset of the first time sysadmins
[10:30:59] I know, I've hit that exact thing with mit when trying to do just that; my opinion would be to just specify a flag to allow it or something
[10:32:03] except that first time users are not going to know to use the flag. the only time it is necessary to filter is when a site knows they are doing something special. For example (mumble) already has a special filter in place because of the custom loopback address they have assigned to the windows clients.
[10:32:55] if i gotta specify a flag and it's not obvious, i hate you.
[10:33:14] The windows client already has code to detect loopback interfaces but because (mumble) is using their own custom driver to provide the loopback interface, the operating system standard code does not detect it.
[10:34:30] if you want to make something friendly, issue a FileLog message when a loopback interface is identified in the TellMeAboutYourself response and suggest turning on a flag to filter them if it's a problem
[10:34:46] er, wait, let me go back....
> it is very common for someone evaluating a technology that is new to them to deploy it on a single machine with loopback interfaces
this doesn't prevent you from using loopback interfaces
[10:36:21] --- kaj has become available
[10:36:24] it just removes it from the alternate list... so this would 'fail' only if you were using a normal address, you have another loopback address, and the original normal address went away...
[10:36:34] but even then, whatever address you use to contact viced from then on would be usable
[10:37:21] yeah, this doesn't break the...
basically, this would still work there
[10:37:29] so I'm not worried as much for testing purposes/initial impressions, but mumble is another case...
[10:38:54] but also, removing the addresses from the list doesn't fix any actual _problem_ I see (just that we contact an extra, arguably known-bad address), so I'm fine with just not doing it
[10:43:07] I don't see the benefit from it. The much bigger problem is the unreachable addresses that the file server finds out about. I seriously question whether it is worthwhile for the file server to be responsible for attempting to deliver callbacks to alternate addresses. It made more sense before we had delayed callback breaks and before we had clients that perform regular pings (and now the NAT pings)
[10:44:03] getting rid of all of the alternate address processing would significantly simplify the file server
[10:45:15] until we have a secure callback channel it should just be left alone. when we do, things can change
[10:45:20] but only for those hosts
[10:45:24] (imo)
[10:45:26] if a client stops responding on the IP you know about... you just wait for it to contact you again?
[10:45:53] if you don't call us, we won't call you :)
[10:46:05] the reality is that it would only be safe to contact alternate addresses with a secure channel. what we have now is completely unsafe.
[10:46:42] I thought the argument is that non-XCB callback breaks aren't so bad with the current 'unsafe' way
[10:47:23] the behavior for Unix clients when receiving a callback break is to break that AFSFid in all cells.
[10:47:34] > if you don't call us, we won't call you
yeah, but if you can say "you can also contact me at addr X", it provides better chances at better cache coherence
[10:47:38] or to break all callbacks on the volume FID in all cells
[10:48:24] i'll side with jhutz: changing this seems like a recipe for destabilizing things.
[10:48:33] (not the loopback change, i mean)
[10:48:43] it is only better cache coherence if the time it takes to find the alternate address, validate its uuid, and patch up the connection table is faster than the ping time.
[10:48:58] the pings are not guaranteed.
[10:49:11] not all clients do them, they can be disabled, etc...
[10:49:21] --- meffie has left
[10:49:42] in the vast majority of cases the client went away because (a) it was being shut down; (b) it is behind a NAT/firewall with a short port mapping timeout; or (c) it is mobile and is going to come up on an address it didn't know about before
[10:50:26] I certainly wouldn't make such a change for 1.6 or 1.8.
[10:50:41] 2.0 might be possible with serious real world testing
[10:50:59] good, then i can go back to ignoring this for a while and worry about whatever is on fire under my chair.
[10:51:04] --- meffie has become available
[10:51:41] I'd agree those are the vast majority.... but you're not leaving an option for the admittedly rare case of a multihomed host with n < N_IFACE unreliable connections
[10:52:08] (or not unreliable but you want "better" or something)
[10:52:28] if you have a secure method of establishing connections from all of those interfaces, the file server can construct a list and maintain it.
[10:53:15] the problem is that the client view of the world is not trustworthy because (a) the client doesn't know the server's view of the world and (b) clients often lie
[10:54:37] ah, yes, that would be nice; the client trying to initiate a connection from each local addr it knows about
[11:09:27] --- mattjsm has become available
[11:11:40] --- mattjsm has left
[11:13:36] my preference would be for you to modify 2367 to only consolidate the test. The windows code to identify the interface type should be folded into the rx_IsLoopbackAddress() function. The change to filter all 127/8 should be its own ticket which can be further debated. 2368 I would like to see abandoned.
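For readers following along without the gerrit changes open: the "filter 127/8 from the alternate list" idea debated above amounts to something like the sketch below. This is an illustration only, not the code from 2367/2368 and not the real rx_IsLoopbackAddress(); the names sketch_in_127_slash_8 and sketch_filter_alt_addrs are hypothetical, and addresses are assumed to be IPv4 in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not OpenAFS code): nonzero if addr, an IPv4
 * address in host byte order, falls inside 127.0.0.0/8. */
int sketch_in_127_slash_8(uint32_t addr)
{
    return (addr & 0xff000000u) == 0x7f000000u;
}

/* Hypothetical filter: drop 127/8 entries from a client's alternate
 * address list in place, preserving order. Returns the new count. */
size_t sketch_filter_alt_addrs(uint32_t *addrs, size_t n)
{
    size_t i, kept = 0;

    assert(addrs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!sketch_in_127_slash_8(addrs[i]))
            addrs[kept++] = addrs[i];   /* keep non-127/8 address */
    }
    return kept;
}
```

Note that an address-based test like this cannot catch the 10.x.x.x loopback adapters mentioned earlier; that case needs the interface-type check instead.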
[11:14:22] > 127/8 should be its own ticket
that was my suggestion. and i think mod what i said about mumble i am comfortable filtering 127/8
[11:15:07] --- mattjsm has become available
[11:23:13] > we always know 127./8 is loopback
No. We always know 127.0.0.1/32 is loopback.
[11:24:45] I don't think I've ever seen something specify 127.0.0.1 as more of a standard or more guaranteed than the 127/8 block
[11:24:59] RFC 5735
[11:25:22] --- Simon Wilkinson has become available
[11:25:28] yes, which says it's "ordinarily" 127.0.0.1, but the whole block must be treated that way
[11:25:29] However, it also says "addresses within the entire 127.0.0.0/8 block do not legitimately appear on any network anywhere," which means that for our purposes we still shouldn't register them.
[11:25:54] But we don't know that they're loopback, since only 127.0.0.1 is guaranteed to be that.
[11:25:58] That's all jhutz is saying, I think.
[11:30:03] > i'll side with jhutz
did I actually say that yet about this change, or was I only thinking it? :-)
[11:30:48] you didn't say it. you've certainly said it before regarding this code
[11:31:58] in general i think you may be excessively and sometimes annoyingly cautious about changes here, but i think it's possible to change and tread lightly. i don't think changing things severely here wrt alt address tracking is particularly lightly treading.
[11:34:44] > for our purposes we still shouldn't register them.
Not entirely true. The right thing to do here is to register everything, and make the vlserver apply some scoping rules. That's hard, but I think it's less hard and more likely to do the right thing than introducing arbitrary assumptions about network architecture into the fileserver.
[11:35:13] client addr advertisement; vlserver doesn't come into this
[11:50:29] --- Kevin Sumner has left
[11:50:44] --- Kevin Sumner has become available
[12:33:38] --- mattjsm has left
[12:35:47] --- mattjsm has become available
[12:44:20] --- kaj has left
[12:51:38] --- kaj has become available
[12:55:28] --- phalenor has left
[12:59:46] --- phalenor has become available
[13:01:39] --- kaj has left
[13:03:16] --- mattjsm has left
[13:11:45] --- mattjsm has become available
[13:12:47] shadow - do you know which files the relevant initStates are in?
[13:12:59] probably all src/afs/afs_call.c
[13:13:15] i bet grep knows. conveniently, i can grep. hang on
[13:13:48] i doubt you care about any in afs_daemons.c. you don't care about any in afs_pioctl.c
[13:13:48] ok. just double checking. i know there's also different permutations of afssyscall that may or may not be relevant
[13:14:06] they're all the same. just the wrapping is different.
[13:14:14] afs_syscall_call is probably what macos calls it.
[13:14:22] since i have macos on the brain
[13:14:37] yeah. that's what i've got tracing the syscalls now. it's following them from that is the issue atm
[13:15:09] well, if you track the while (afs_initState < (whichever)) with printfs as in the email i bet you can make progress
[13:16:11] k
[13:16:11] --- mattjsm has left
[13:37:02] Speaking of afs_fooStates, I got another panic on afs shutdown yesterday. (On my laptop while I was in X, oops.) This being in addition to the one where rx_socket was not in the file descriptor table for soclose(). osi_StopListener() should only get called once, right?
[13:38:16] osi_StopListener *should* be called once. it may not be
[13:39:49] like, it's worth checking, not that i expect it is called twice
[14:04:34] Well, it looks like if it is happening, it only happens sometimes.
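One low-tech way to do the "worth checking" above is to count calls to the stop routine and log any repeat. The sketch below shows the idea only; struct sketch_listener and sketch_stop_listener are hypothetical stand-ins, not the actual osi_StopListener() code, and a real kernel version would need whatever locking the shutdown path uses.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical listener state, for illustration only. */
struct sketch_listener {
    int running;     /* nonzero while the listener is up */
    int stop_calls;  /* how many times stop has been requested */
};

/* Hypothetical stop routine: tears down on the first call, logs and
 * skips the teardown on any later call. Returns 1 if it actually
 * stopped the listener, 0 if it was a repeat call. */
int sketch_stop_listener(struct sketch_listener *l)
{
    assert(l != NULL);
    l->stop_calls++;
    if (!l->running) {
        fprintf(stderr, "stop requested %d times; skipping teardown\n",
                l->stop_calls);
        return 0;
    }
    /* the real teardown (closing the socket, etc.) would go here */
    l->running = 0;
    return 1;
}
```

If the log message ever shows up during shutdown, the double call is confirmed and the second soclose() is the likely source of the panic; if it never does, the cause is elsewhere.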
[14:28:16] --- kaj has become available
[15:15:30] --- kaj has left
[16:02:16] --- deason has left
[19:01:20] --- Simon Wilkinson has left
[19:12:19] --- rra has left: Disconnected
[19:32:14] --- Russ has become available
[19:49:35] --- jaltman has left: Replaced by new connection
[19:49:36] --- jaltman has become available
[19:52:01] --- jaltman has left: Disconnected
[19:52:11] --- jaltman has become available
[20:21:23] --- kaduk@mit.edu/barnowl has left
[20:35:36] --- Jeffrey Altman has left: Replaced by new connection
[20:36:14] --- Jeffrey Altman has become available
[21:17:48] --- Born Fool has become available
[22:20:54] --- reuteras has become available
[22:28:22] --- Born Fool has left
[22:43:45] --- kaj has become available
[23:28:52] --- kaj has left
[23:32:01] --- steven.jenkins has left
[23:34:02] --- steven.jenkins has become available