[00:09:11] --- Derrick Brashear has left
[01:27:54] --- Derrick Brashear has become available
[01:27:58] first session
[01:28:03] generic quota mechanism
[01:28:05] --- Simon Wilkinson has become available
[01:28:17] christof: at the moment we have only minquota/maxquota
[01:28:49] perhaps we should create an array of tag-value pairs of elements
[01:28:54] derrick: why fixed array?
[01:29:06] christof: could be not fixed array
[01:29:21] christof: want to track boundary and actual state
[01:29:31] simon: want to generalize for more than osd
[01:29:42] christof: can bind the tags to anything
[01:29:57] simon: current scheme can be implemented in this
[01:30:20] do tags need to be shipped with afs?
[01:30:26] jeff: you need an enumeration rpc
[01:30:59] simon: does it just return a human readable string?
[01:31:08] jeff: localization desires
[01:31:37] simon: would just list supported tags for that server or that volume?
[01:31:39] jeff: yes
[01:32:22] aside: http://www.ietf.org/id/draft-benjamin-extendedcallbackinfo-00.txt
[01:32:39] jeff: this scheme gives clients unaware of tags the ability to dump raw data
[01:32:47] simon->christof: is this an action item for you?
[01:32:51] christof: ok.
[01:32:59] simon: (summarizes the path forward)
[01:35:10] simon: gut feeling to ignore the possibility that we need to move to 128bit ints when we do this
[01:37:42] christof: partition usage also?
[01:37:51] simon: i think this is more quota-focus
[01:38:09] (zfs discussion)
[01:40:14] simon: feeling as to whether this goes into FetchStatus?
[01:40:31] derrick: if it can represent a potentially-boundless set of values, it's a new rpc
[01:41:48] hartmut: question of whether a versioned union could be used in new rpcs
[01:42:05] simon corrects: FetchVolumeStatus
[01:42:19] so we should just rev that
[01:42:28] and the volserver-similar struct
[01:43:25] simon: VolGetStatus has to change.
(ali may also care)
[01:44:27] hartmut: volintinfo needs to then be able to encapsulate this data so it can move with them
[01:45:00] then dumps need to have a new tag which encapsulates the array
[01:45:20] tom: listmultivolumes rpc will be done for dafs; would be good to get everything into a new rpc
[01:45:39] simon: old client and old rpc needs the information it can have, loses what it can't
[01:46:08] ultimately informational, not used by cache manager
[01:46:21] simon: tom, suggesting just rev'ing existing rpcs?
[01:46:27] tom: yes. avoid more round trips
[01:46:54] simon: should we discard the existing fields in the new rpc?
[01:47:05] derrick: new rpc users know how to decode; yes
[01:48:10] simon: do you even need enumerate tags?
[01:48:18] or does the fileserver just not let you set things?
[01:49:14] jeff: what happens with a single unknown tag in an array?
[01:49:20] simon: reject whole rpc
[01:49:26] jeff: how do you know which failed?
[01:49:33] simon: maybe we do need to enumerate
[01:50:20] question of max tag issue won't fix
[01:50:24] we need enumerate tags
[01:50:30] (maybe?)
[01:51:12] simon: to provide meaningful error messages, need enumerate tags rpc
[01:51:27] tom: looking for 3 tags
[01:52:36] implementation type (for whole server)
[01:53:27] current volume state (raw, mapped (online, offline, preonline, busy, salvaging))
[01:54:00] simon: are the values int32?
[01:57:06] tom: ideally this belongs in the volint interface
simon: seems reasonable except implementation type
[01:57:34] simon: implement something in addition to simple capabilities bitmap?
[01:58:08] simon: use a capability for now, and if we need to revisit, we can
[01:58:58] tom: that's all we need
[02:00:46] simon: tom, will you write that?
[02:00:48] tom: yes
[02:01:16] simon: do it as part of the rpc refresh or re-rev rpcs after?
[02:01:47] derrick: can we do this first?
[02:01:53] simon: just don't want to block rpc refresh on this
[02:02:17] tom: i-d form?
[02:02:19] jeff: yes
[02:04:58] metadiscussion: new drafts to ietf should be draft-(individual)-afs3-(whatever's a draft)
[02:06:35] moving on
[02:06:40] rtt calculations
[02:09:22] jeff: currently rx does rtt calculation using van jacobson and phil karn's work
[02:09:30] currently the values generated are inaccurate
[02:09:32] 2 issues
[02:09:39] 1) selection of packets as input is flawed
[02:09:51] should exclude retransmits
[02:10:10] (as documented by phil karn in 1987; we don't know which of the packets resulted in the reply)
[02:10:47] currently we assume from the first packet, so the rtt could potentially be the real rtt+the retransmit interval
[02:11:13] 2) not actually including all packets we could gather an rtt over, we filter out a large number for various reasons, so our measurement set is poor
[02:11:25] we end up having skewed unrepresentative changes
[02:11:44] side effect: very few retransmits, fewer than desirable to maintain performance
[02:12:04] we hand out data, then block waiting far longer than we should rather than filling the pipe
[02:12:47] operational experience shows more retransmits, a larger sample set and better performance
[02:13:00] jake's work was 2 patches
[02:13:19] 1) a way to export rx peer structures via api, previously only via rxdebug
[02:13:26] which is integrated
[02:13:46] 2) he has a later patch which uses the rtt values as an input to compute server rankings
[02:13:56] this cannot be usefully used without the rtt fix
[02:14:52] (jeff gives background of how rankings were previously calculated)
[02:15:48] jake's cm implementation uses a background thread to periodically recompute server rankings
[02:22:36] jeff: algorithms there are rough, they could use performance analysis and we should look at other ways to process the data
[02:22:48] jake is trying to get an undergrad research grant to continue this
[02:25:24] Derrick: Window size is easy.
Nothing to negotiate
[02:25:35] Hartmut, Jeff and Derrick discussed earlier
[02:25:48] If you support a larger window size, and the other side supports it, it just works.
[02:25:58] If the other side doesn't support it, then it will just throttle you.
[02:26:17] hartmut: Window size is only one byte. Can't go any larger than that.
[02:26:28] derrick: Is this a problem?
[02:26:34] tom: 1 gigabit will make you sad
[02:26:44] hartmut: Problem is WAN with high RTT
[02:27:01] (actually tom said transcontinental 100gig would make you sad)
[02:29:05] hartmut: On a WAN, it would be nice to be able to have higher window sizes
[02:29:24] derrick: Is it controversial to say we should push the window size as high as we can with our current implementation
[02:29:43] derrick: We probably need to think a lot before we start considering revving the RX header
[02:29:53] jeff: agrees, but uses more words to say so
[02:30:25] derrick: OpenAFS should push it to at least 128, or to whatever the actual limit ends up being - 254, providing there's not an issue with it being a power of 2
[02:30:37] jeff: we need to test in an environment with delays in it
[02:30:53] derrick: Will take as an action item that he will test this out, and move it forwards
[02:31:04] Moving on ...
[02:31:46] derrick: RX already has negotiation by using the ping/ack payloads
[02:32:04] size is used to determine how to decode it
[02:32:14] can add more options by changing the packet size
[02:33:06] What things do we want to negotiate in RX, and what is the fallback procedure so that we don't throw away data
[02:33:40] Providing we keep the original start to the payload, old clients will keep working, even with an extended packet size
[02:34:07] If we want to do more calls, we also need to rev rxkad
[02:34:51] New challenge has to be existing challenge with more stuff at the end
[02:35:44] Derrick: Not sure what the path forwards is, because this isn't obviously AFS3 protocol, as it's RX.
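The size-based decoding described at 02:31:46 to 02:33:40 can be sketched as follows. This is only an illustration of the idea, not the rx wire format: the trailer is assumed to be a run of 32-bit big-endian integers, and the field names (`if_mtu`, `max_mtu`, `rwind`, `max_jumbo`) are hypothetical labels chosen for the example. A decoder reads as many fields as it knows about and as the payload length allows, and ignores anything after that, so an older peer keeps working when a newer peer appends options.

```python
import struct

# Hypothetical trailer layout: consecutive 32-bit big-endian integers.
# These names are illustrative assumptions, not taken from the rx sources.
KNOWN_FIELDS = ["if_mtu", "max_mtu", "rwind", "max_jumbo"]

def decode_trailer(payload, known=KNOWN_FIELDS):
    """Decode the fields this peer knows, using the payload size to decide
    how many are present. Extra trailing bytes from a newer peer are ignored."""
    out = {}
    for i, name in enumerate(known):
        start = i * 4
        if start + 4 > len(payload):
            break  # shorter payload from an older peer: stop here
        (out[name],) = struct.unpack_from("!I", payload, start)
    return out

# An older sender emits two fields; a newer one appends more at the end.
old_payload = struct.pack("!II", 1444, 1444)
new_payload = struct.pack("!IIIII", 1444, 1444, 64, 4, 7)
```

The same function handles both directions of mismatch: `decode_trailer(old_payload)` yields only the two fields that were sent, and a decoder with a shorter `known` list simply stops early on `new_payload`.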
[02:36:21] Simon: Thinks we should do this on the afs3-stds list
[02:36:48] Jeff: Thinks we should directly approach people at Universities with a history of RX use/development to review
[02:37:10] Derrick: Wonders if we will find people
[02:37:57] Matt: Should talk about this on afs3-stds, because this is where we should talk about stds things
[02:38:06] (discussion about where we discuss RX)
[02:40:06] simon: afs3-stds is the best place for common discussion; private discussion may also take place or those others may choose to comment to the list
[02:40:37] Derrick: Should there be a draft fall out of this?
[02:40:42] Jeffrey: Yes
[02:41:05] e.g. draft-brashear-rx-call-option-negotiation
[02:41:42] Derrick will go away and write this up
[02:41:55] derrick: Are there other things that we know that should be being negotiated?
[02:42:07] matt: packet size?
[02:43:07] (discussion of mtus)
[02:43:37] Derrick: going to do path mtu discovery, but that doesn't cause protocol changes
[02:45:04] Marcus worried about implications of path mtu discovery.
[02:46:36] (discussion about avoiding it in cases where client is just doing short lived connections)
[02:47:13] Goal is to get the most out of rx/udp for long running services. It's a trade off. Don't make short lived connections worse, but not focussing on making them better.
[02:47:37] Derrick: Another thing is delivering large payload rx packets, that aren't jumbograms
[02:47:53] Need to test and confirm that implications do the right thing, but no protocol implications.
[02:47:59] s/implications/implementations/
[02:48:20] Current rx library should support it, but not necessarily well.
[02:48:44] Derrick is going to go off and do this.
[02:49:06] ... but no protocol/documentation issues?
[02:49:17] Room seems to think we just want to see the code.
[02:49:48] rx/udp discussion is pretty much done. We came out ahead
[02:56:56] Hartmut & Christof leave for plane ...
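The rtt point raised at 02:09:22 to 02:12:47 (exclude retransmits from the sample set, per Karn) can be illustrated with a small estimator. This is a minimal sketch of the textbook Jacobson smoothing with Karn's rule, using the conventional gains of 1/8 and 1/4 and a timeout of srtt + 4*rttvar; it is not the rx implementation, and the class and method names are made up for the example.

```python
class RttEstimator:
    """Smoothed RTT with Karn's rule: samples from retransmitted
    packets are ambiguous, so they are never fed to the estimator."""

    ALPHA = 1 / 8   # gain for the smoothed rtt
    BETA = 1 / 4    # gain for the rtt variance

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def sample(self, rtt, retransmitted):
        """Feed one measurement in seconds; return True if it was used."""
        if retransmitted:
            # We cannot tell which transmission the reply answers.
            return False
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return True

    def rto(self):
        """Retransmit timeout derived from the current estimate."""
        return self.srtt + 4 * self.rttvar


est = RttEstimator()
est.sample(0.100, retransmitted=False)
est.sample(1.100, retransmitted=True)   # ignored: would add the retransmit interval
est.sample(0.200, retransmitted=False)
```

Feeding the 1.1 s sample would have inflated the estimate by roughly the retransmit interval, which is the first of the two issues described in the log.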
[03:14:24] next up: rxgk
[03:15:26] simon: splits into 2 parts
[03:15:33] negotiation/key establishment
[03:15:36] data encryption
[03:15:40] there's a key negotiation rpc
[03:15:48] --- tkeiser has become available
[03:16:04] via an unencrypted rx connection
[03:16:35] in the rx specific part, it's defined as application-dependent who you negotiate with
[03:16:45] in afs it's proposed to be against the vlserver
[03:16:52] avoids upcalls, gss in the kernel
[03:17:13] GssNegotiate RPC, this is slightly stateful as multiple round trips are possible
[03:17:21] the opaque in/out tokens allow state
[03:17:47] for security reasons they need to prevent hijacking another in-process connection, however the opaques are implementation defined
[03:18:16] the client sends startparams: enctype, levels, lifetime, bytelife, nametag and a nonce
[03:18:44] levels include the obvious ones, plus bind, which allows e.g. transport-specific security to be bound; not proposed to be implemented yet
[03:18:58] s/transport/network protocol/ (e.g.
ipsec)
[03:19:36] after you've done enough round trips and you've finished negotiating, you get a clientinfo blob
[03:19:57] errorcode, flags, enctype, level, lifetime, bytelife, expiration, a gss mic, ticket and the server nonce
[03:20:23] the ticket is an opaque identifier
[03:20:37] (previously was defined in the draft but it should be implementation-dependent)
[03:20:52] an afs proposed ticket will be made but not part of this
[03:21:05] by the time you get this info block, there's a gss context established
[03:21:16] the mic provided is calculated over the provided start params
[03:21:23] so no downgrade attack is possible
[03:21:43] gsswrap encrypts this block that comes back, tying in the server nonce
[03:22:01] server never gives the client a readable key
[03:22:34] the prf negotiated from the sec context, uses the 2 nonces from each side and embeds the calculations for the prf into the ticket
[03:22:53] uses gss pseudo random with the client nonce||server nonce
[03:23:14] the K output length is the key gen seed length specified in 3961
[03:23:28] the gssapi layer is not tied to kerberos in spite of using 3961 for this table
[03:23:44] in the simplest form you've negotiated a key to use for the rest of the session
[03:24:06] ticket is used as part of security class and is used for establishment of the security class
[03:24:17] tk, the transport key, is derived from the overall
[03:24:27] key, using the prf+ operation
[03:24:39] and the random-to-key operation
[03:24:56] the client asserts a timestamp as the input to this key
[03:25:04] the challenge is simply a version and a nonce
[03:25:10] xdr-encoded, sent to client
[03:25:28] response: version, start time, token, authenticator
[03:26:03] authenticator: limited by rx max calls.
[03:26:10] needs to be variable length
[03:26:31] encrypted with transport key
[03:26:39] every enc operation uses key derivation
[03:26:48] one set of derivation values for every kind of operation we do
[03:27:02] authenticator encrypted in the tk; the
[03:27:17] nonce, epoch, cid, call numbers decrypted and if it matches a sec context is established
[03:27:25] 3 security layers:
[03:27:40] encryption. has pseudoheader: call, seq, data len, service id
[03:27:48] encrypted with 3961-style enc function
[03:27:55] 2 derivation values per direction
[03:28:45] integrity adds the header to the mic generation, then ships without the header
[03:29:17] (fire alarm)
[03:29:49] auth-only sticks payload straight onto the wire
[03:29:57] that's the simplest way to run it
[03:30:06] more complex mechanism avoids cache poisoning
[03:30:13] can also solve migration problems
[03:30:19] in this mode: we have a token
[03:30:34] the traffic should be authenticated as coming from both user and cache manager
[03:31:11] the attack this precludes is a user spoofing the server to the cache manager in order to inject data into the cache of a client, which will then be executed by the system or another user
[03:32:20] combinetokens allows a keyed cache manager to have input to the token such that the user and the cache manager have both proven they are involved
[03:32:35] combinetokens takes 2 tokens, combines, gives you one.
implementation-defined
[03:32:50] actual openafs implementation will be more complicated
[03:33:22] token0 and token1 are used to get key0 and key1 and then uses key combination to create keyN to return in tokenN
[03:33:34] in the afs case it includes the user and the cm identity in the token
[03:33:55] key combination algorithm is from the kerberos working group
[03:34:17] KRB-FX-CF2, includes 2 pepper strings, used similarly but not quite like salts
[03:34:22] not quite hmacs
[03:34:43] outsource security implications to krb wg
[03:35:09] some are keen to keep rx part of the description away from "how it works with afs"; this document splits into those parts
[03:35:21] with afs, combinetokens will be more complex
[03:35:31] we want to allow for "departmental fileservers" and mixed cells
[03:35:46] (kad, rxk5 and rxgk servers deployed)
[03:36:36] for afs, the extended CombineTokens RPC includes a target host and service: it will give you the right kind of token for the desired service, maybe includes the afs global key, maybe server specific, and maybe none, in which case you use another mechanism
[03:36:41] questions?
[03:37:01] elizabeth: is this at the volume level?
[03:37:13] jeff: at the server (connection) level; not specific to afs
[03:37:31] simon: i will implement the non-afs-specific version of combinetokens, but afs will not use it
[03:38:22] jeff: you get the same crap as yesterday
[03:38:34] how do you solve the first packet problem? use the solution from yesterday
[03:38:42] same for binding the client uuid to the authenticator
[03:38:48] simon: can't be a uuid
[03:38:58] application-opaque
[03:39:07] there may be api issues with that
[03:39:39] jeff: implementation note for afs: we will need to extend vl_registerrpc so types of authentication can be registered in order for the afs combinetokens to be useful
[03:40:07] simon: we also need for dept fileservers a repository of private keys
[03:40:12] derrick: so vlserver is a kdc?
[03:40:22] marcus: it's the logical place to put it
[03:40:32] simon: separate data store
[03:40:53] marcus: you could use the ubik "table-like" feature
[03:41:11] simon: we'll come back to departmental fileservers as 2nd phase of implementation
[03:41:41] derrick: use a second vlserver "rx service"?
[03:41:57] marcus: reuse vlserver connections?
[03:42:11] simon: combinetokens probably needs an rxgk-protected connection
[03:42:20] protected with the single token as a result of negotiate
[03:43:12] side discussion of connection overhead
[03:43:35] simon: server combinetokens can be overloaded/DoS'd by combinetokens in the clear
[03:43:42] rxgk protection removes that
[03:43:55] marcus: what if combinetokens was done on the client?
[03:44:02] simon: client can't decrypt
[03:44:07] it's opaque
[03:44:16] --- tkeiser has left
[03:44:20] new token is not simply result of smashing 2 tokens together
[03:44:39] --- tkeiser has become available
[03:46:21] side discussion of what is known about the token by whom
[03:47:12] jeff: said in talk, missing in document
[03:47:32] simon: the thing i said for the afs draft will be in the afs draft
[03:47:35] not this
[03:47:49] jeff: bytelife
[03:48:12] it's advisory which is the right way to go in jeff's opinion
[03:48:19] server can issue a challenge whenever it wishes
[03:48:24] rxkad dtrt already
[03:48:34] need to define a mech to request a challenge from a client
[03:49:06] jeff: derrick, can it be done in a ping payload?
[03:49:24] derrick: i think that's difficult as far as protecting payload
[03:49:33] simon: should be application-specific
[03:49:43] for example can be done now by establishing a new connection
[03:50:03] jeff: the client as part of a response could say "send me a challenge every (10mb? whatever)"
[03:50:18] simon: rxk5 wouldn't need this but in general it's more global in scope than rxgk
[03:50:36] jeff: could include this bytelife in the security layer data header?
[03:50:42] simon: bytelife negotiated
[03:51:11] derrick: just issue a challenge that often?
[03:51:24] simon: use the shortest byte life
[03:51:34] marcus: disagree
[03:51:44] jeff: even if the fallback is breaking connections?
[03:51:55] marcus: what happens if 2 calls happen at the same time?
[03:52:08] and the server bytelife is exceeded
[03:52:13] and there are outstanding calls
[03:52:34] simon: bytelife is soft, won't interrupt in-flight calls
[03:53:01] marcus: you'll have to come up with an exact number
[03:53:02] --- tkeiser has left
[03:53:18] --- tkeiser has become available
[03:53:45] simon: it's an advisory request
[03:54:01] matt: server should enforce bytelife
[03:54:18] simon: next new call would be on a new connection to solve this
[03:55:44] marcus: does a challenge cause a new transport key to be selected?
[03:55:46] simon: yes
[03:55:56] marcus: how do you handle in-flight data?
[03:56:02] simon: track which key per call
[03:56:30] tom: what if it's an enormous call
[03:56:47] simon: no good answer
[03:56:55] tom: time isn't the best thing either
[03:57:03] simon: no. it comes from the gss layer though
[03:58:00] marcus: skew time is important to calculate the start time
[03:58:10] a one second in the future ticket was a problem
[03:58:20] simon: gss layer handles time, should we honour expiration or not?
[03:58:22] bytelife is harder
[03:58:55] only choice for bytelife is to make it optional and make it the client's problem
[03:59:21] when the client starts a new call it could start a new connection
[03:59:58] tom: version ordinal key operations for connection?
[04:00:28] simon: if we add a crypto header, part could be a key index
[04:01:04] tom: rx could be used for an async rep mechanism, not efficient to shut it down
[04:01:15] marcus: bytelife has 3 components
[04:01:22] when client should think about getting a new key
[04:01:26] when server should issue a challenge
[04:01:38] when server should stop allowing valid key
[04:02:00] jeff: what if the challenge could simply be replied to again?
[04:02:03] --- matt has become available
[04:02:12] derrick: add an epoch to the challenge response?
[04:02:17] simon: you'd not get a new key
[04:02:41] jeff: we don't need to; just say the next call to pseudo random is the next key
[04:02:57] when a new packet arrives with a new key id, you call that and you're using a new key
[04:03:06] simon: server calls and gets new key
[04:03:11] jeff: client mech is trivial
[04:03:28] simon: packet with key 2 arriving before a packet with key1, hold on to it
[04:03:32] jeff: for a window size
[04:03:42] simon: we already need to hold packets for decrypt-in-order
[04:03:48] --- tkeiser has left
[04:04:07] --- tkeiser has become available
[04:04:14] jeff: in the rxgk response, make it a 64 bit time with 100ns granularity
[04:04:17] same as yesterday
[04:04:25] (rxgk response)
[04:04:41] probably the same things for lifetime and bytelife and expiration time
[04:05:16] simon: bytelife becomes a 32 bit log2 of the number of octets
[04:06:35] jeff: inconsistent descriptions of security levels in the document between 7.6 and the place where 4 are listed
[04:08:06] (jeff: explains bind layer; if we don't do crypto over tls we are vulnerable to mitm)
[04:08:14] jeff: ivecs?
[04:08:16] simon: no
[04:09:20] simon: jeff suggested making ivec be pseudo header; i prefer to just have them be decrypted in order
[04:09:26] marcus: is sequence number per call?
[04:09:28] jeff: yes
[04:09:49] marcus: how are the channels ordered? how do you know what order?
[04:09:56] simon: maybe we need an ivec
[04:10:10] jeff: with window size growing, is queueing packets going to eat memory?
[04:10:15] simon: maybe we need an ivec
[04:10:21] marcus: chaining them together?
[04:11:20] discussion of whether we need per-channel keys
[04:12:07] simon: we won't do chaining, either use ivec of 0 with a safe crypto system or use pseudoheader as ivec
[04:12:11] jeff: the latter is safer
[04:12:19] kerberos likes ivec 0 but why?
[04:12:27] marcus: key as ivec burned kerberos
[04:12:56] jeff: peer sent mic, needs to mention padding and length of the output
[04:13:06] simon: you already know it, it's a property of the encryption type
[04:13:07] jeff: ok
[04:13:14] jeff: acknowledgements are wrong
[04:13:18] simon: i know i need to fix it
[04:13:43] marcus: link up usage and
[04:13:54] jeff: wait, can you add marcus' diagrams?
[04:14:00] matt: can add pdfs
[04:14:05] simon: text is canonical
[04:14:15] marcus: i admire your ascii format but...
[04:14:19] jeff: i have this tool jave
[04:14:24] simon: lunch!
[04:14:34] marcus: you define seclevels, etc
[04:14:44] the numbers you pick are not rxkad
[04:14:59] i discovered i needed to map rxkad and my own levels. fix yours too
[04:15:01] simon: ok
[04:15:10] (make them match rxkad to start with)
[04:15:19] marcus: byte limit questions already addressed
[04:15:28] will verify revised wording
[04:15:40] reservations about combine tokens
[04:15:51] derrick: generic, afs or both?
[04:15:52] marcus: yes
[04:15:58] want a local-only version
[04:16:01] even if server only
[04:16:05] describe in more detail
[04:16:11] simon: what do you need?
[04:16:16] marcus: what's in the token?
[04:16:27] simon: implementation-dependent, will go in the afs document
[04:16:37] needs to be written up in more detail
[04:16:41] extension also need to be defined
[04:17:07] marcus: start time could be chosen randomly
[04:17:15] how is it different from client nonce?
[04:17:23] simon: it's just another nonce, used elsewhere
[04:17:40] marcus: your authenticator defn came from the afs3.0 spec?
[04:17:43] simon: not sure
[04:17:49] marcus: read the citi paper about this
[04:18:00] simon: i suspect it came from arla
[04:18:16] will look at it
[04:18:44] marcus: define usage; you use key to check validity of response. say what key,response
[04:19:04] pseudo-header. is this part of the payload?
[04:19:23] simon: only when encrypting; not when only integrity
[04:19:34] the only point at which it's shipped, it's shipped encrypted
[04:19:44] jeff: other payload was outside security classes
[04:19:55] marcus: just make sure the first packet coverage is mentioned here
[04:20:16] the encrypted header call/seq/etc matches data you can calculate other
[04:20:31] you don't need this to prove that data
[04:20:37] simon: you could just use a checksum
[04:20:57] marcus: you don't need this so it's not really a pseudoheader
[04:21:28] simon: 3961 says it's hard to include something in the checksum that's not then in the encrypted payload
[04:21:47] jeff: can't do iv-based enc/dec where the pseudoheader and the buffer are provided
[04:21:57] simon: other option: include just a cksum of the pseudoheader
[04:22:06] most checksums are larger than this 96 bytes
[04:22:28] marcus: i don't like it anyway; i think maybe avoid using the 3961 routine directly?
[04:22:32] --- tkeiser has left
[04:22:41] simon: prefer using 3961 for the standards benefits
[04:23:00] --- tkeiser has become available
[04:23:09] marcus: love suggested i use as crypto in rxk5 in smaller chunks, and that api would lend itself to this
[04:23:22] simon: esp with integrity we'd get benefits with in-place work
[04:23:30] but it's not limited to this
[04:24:14] simon: love's general point was don't use 3961 at all because he dislikes the crypto implementations but it's a huge win to not do it yourself
[04:24:24] we don't need to do work; we take the ietf's work
[04:24:55] jeff: can you (marcus) give this feedback to ietf that their framework caused you (these) problems when it was used outside kerberos?
[04:25:19] marcus: i haven't done so yet
[04:26:57] marcus: is the pseudoheader being used to generate the ivec now?
[04:27:06] simon: previously no. now, not sure, need to revisit
[04:27:51] simon: use rxk5 pseudoheader?
[04:27:54] marcus: sure
[04:28:56] simon: clear is a bad idea
[04:29:01] marcus: i compile with it in rxk5
[04:30:07] jeff: nrl wants auth clear
[04:30:32] i think rxk5 and rxgk should default to minlevel to auth and let you hurt yourself if you want less
[04:32:20] simon: crypto will bite on poor-performance hardware
[04:34:26] marcus: an afs integration paper?
[04:34:37] simon: next month
[04:34:52] marcus: how long until there's sample code?
[04:35:02] esp with combine tokens working
[04:35:17] simon: running code within 6 months
[04:35:40] complete implementation not integrated within 12
[04:35:59]
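The two encoding suggestions made at 04:04:14 and 04:05:16 (a 64 bit time with 100ns granularity, and bytelife carried as the log2 of the number of octets) can be sketched as below. This is a rough illustration only. The function names are invented for the example, the time is assumed to count from the Unix epoch, and rounding the byte count down to a power of two is an assumption; the log does not settle any of these.

```python
TICKS_PER_SECOND = 10_000_000  # 100ns units per second

def to_100ns(seconds, nanoseconds=0):
    """Convert seconds + nanoseconds (assumed Unix epoch) to 100ns ticks."""
    ticks = seconds * TICKS_PER_SECOND + nanoseconds // 100
    if not 0 <= ticks < 2**64:
        raise ValueError("does not fit in an unsigned 64 bit field")
    return ticks

def octets_to_bytelife(octets):
    """Encode a byte budget as log2, rounding down (assumed policy)."""
    if octets < 1:
        raise ValueError("octets must be positive")
    return octets.bit_length() - 1

def bytelife_to_octets(log2_bytelife):
    """Decode the log2 form back to a byte count."""
    return 1 << log2_bytelife

# A requested budget of 10 MiB is carried as an exponent and decodes
# to the nearest power of two at or below the request.
exponent = octets_to_bytelife(10 * 1024 * 1024)
budget = bytelife_to_octets(exponent)
```

Carrying only the exponent keeps the field small and makes the value inherently approximate, which fits the description of bytelife as advisory.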