[05:39:49] --- mdionne has become available
[05:47:51] --- mdionne has left
[06:10:51] --- Derrick Brashear has become available
[06:28:34] --- Roman Mitz has become available
[06:28:45] --- jaltman has become available
[06:28:46] --- Roman Mitz has left
[06:32:20] --- Marshall Vale has become available
[06:34:59] --- mdionne has become available
[06:38:18] !Agenda Bashing
[06:44:13] --- steven.jenkins has become available
[06:53:47] discussion of what happens at xdr level or not
[06:55:38] question of whether xdr (not rx) extensions could be done without ietf being upset if we use the rfc process given other xdr users
[06:56:14] simon suggests using rxgen to implement given that rxgen is the compiler for rx's variant of xdr
[06:57:02] matt argues question of what other people would do.
[06:57:18] simon suggests tom's document is ready, question is how we make it so others can refer to it.
[06:57:25] matt: implement and move on
[06:57:39] treat as afs/rx extension; implement and use
[06:58:28] (no need to summarize on my account. I'm working today and can't participate in discussions here)
[06:58:48] digression on standardization: simon suggests that since iesg seems to be pushing us to informational, perhaps iesg is not friendly to us and we should publish our own series
[06:58:59] steven: doing this so it's logged, did it in edinburgh also
[06:59:09] tx.
[06:59:25] rehash of discussion from new jersey about keeping standards separate from openafs
[06:59:53] basically we can maintain that separation without using the rfc series (independent submissions) if the ietf will not be friendly to a non-working-group creating documents
[07:01:10] discussion of how ietf would handle afs if a working group were formed. rubber-stamp of existing protocol would not fly.
[07:01:55] tom asks if a document on the afs directory format would be historical or informational. jeff: only working groups can be historical.
we are all informational
[07:02:42] simon: to some extent formality may not be best use of time
[07:05:46] discussion of standards process, what issues exist as far as doing things without ietf involvement
[07:06:41] simon suggests the documents not be published by ietf but simply the last draft as approved gets published by AFS.
[07:06:51] marcus: publish documents as whitepapers alongside source code?
[07:07:09] simon: archival series outside openafs can be done, the documents are available
[07:08:12] matt: whither arla? (yes)
[07:08:22] simon: also david howells (kafs)
[07:09:04] simon: let afs3 process publish agreed-upon documents and consider it done
[07:11:12] --- tkeiser has become available
[07:11:30] simon: 2 repos, 1 which is "consensus" and one which is "implemented"
[07:12:09] derrick: reuse infrastructure for openafs to allow a new web site for this
[07:13:03] discussion of the correct spelling of standardi(sz)ation
[07:13:59] back to topic: enumerating extended data types for discussion
[07:14:13] xdr: extended union
[07:14:17] afs3:
[07:14:27] time, timestamp, reltime
[07:15:16] derrick: "rx-xdr" extended union?
[07:16:21] wordsmithing for pc-ness not necessary if this is afs-only
[07:16:26] tom: new uuid type?
[07:16:48] efficient encoding for uuid
[07:17:24] afsfetchstatus type
[07:19:24] no new data types needed for fetchstatus, can wait
[07:20:42] afsuuid (next gen)
[07:21:06] simon: how to deal with rxgen
[07:21:22] use common header? need to solve multiple inclusions
[07:21:59] http://datatracker.ietf.org/doc/draft-keiser-afs3-xdr-union/history/
[07:22:20] tom: trying to define directory and on-wire acl formats
[07:23:43] --- matt has become available
[07:24:26] simon: do we need the time variant with resolution?
[07:24:45] how do you convert time 0 from one representation to another? epoch changes
[07:25:24] jeff a: how to move from the unix epoch (32 bit) to a new special case
[07:26:21] discussion of pre-1970.
simon: use negative time
[07:27:00] simon: time 0 can be "no time", "no expiry" etc and we need a special case, suggests keeping it same
[07:27:17] epoch then is representable in 64 but not 32 bit time
[07:28:03] andrew suggested previously documenting the 0 translation in each use. simon would prefer to do it once, in one place
[07:28:31] --- Simon Wilkinson has become available
[07:29:36] jeff h: old representation won't matter for new uses.
[07:29:41] discussion of time formats
[07:30:14] jeff h: instead of specifying translation, we need to specify what sentinel values mean
[07:30:27] simon: minimize hidden gotchas, define sentinels once
[07:30:44] jeff a: define the values in afs time, not their meaning for rpc. be done
[07:31:01] then define conversions for sentinels
[07:31:28] simon: when do we use resolution? well, we will come back to it in refresh
[07:33:04] discussion of afs timestamp name
[07:33:09] better name needed.
[07:33:44] moving on, new uuid type
[07:34:09] simon: new uuid is 16 byte vector. just a blob
[07:34:13] jeff h: good and bad
[07:34:37] must specify structure and how it's constructed so they can actually be universally unique
[07:34:49] simon: use one of the 2 existing protocols. be done.
[07:35:14] never decompose existing uuids
[07:35:30] tom: currently 44 octets encoded, want it to be 16
[07:35:48] simon: vlserver code can be simplified to avoid byteswapping.
[07:36:43] ACTION ITEM: matt volunteers to write down and implement new uuid data type
[07:36:54] tom: should existing data type doc be merged in?
[07:37:27] jeff h: avoid history being commingled with consensus-based documents
[07:38:34] matt: should be an easy document
[07:38:55] simon: use rfc which describes uuid generation
[07:38:58] (rfc 4122)
[07:39:03] no conversion process needed.
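(editor's sketch) The 44-versus-16 octet figures quoted at 07:35:30 can be reproduced with a few lines of Python. This assumes the legacy UUID goes on the wire as eleven separately encoded XDR fields (five header fields plus six node bytes), each widened to a 4-byte unit, and that the new type is a fixed-length 16-byte opaque. Both function names are hypothetical and are not taken from any rxgen output; signedness of the one-byte fields is ignored here.

```python
import struct
import uuid


def xdr_legacy_uuid(u: uuid.UUID) -> bytes:
    """Assumed legacy layout: every field becomes its own 4-byte XDR unit."""
    fields = [
        u.time_low,              # 32-bit field
        u.time_mid,              # 16-bit field, widened to 4 bytes
        u.time_hi_version,       # 16-bit field, widened to 4 bytes
        u.clock_seq_hi_variant,  # 8-bit field, widened to 4 bytes
        u.clock_seq_low,         # 8-bit field, widened to 4 bytes
    ]
    # Six node bytes, each encoded as a separate 4-byte unit.
    fields += list(u.node.to_bytes(6, "big"))
    return b"".join(struct.pack(">I", f) for f in fields)


def xdr_opaque_uuid(u: uuid.UUID) -> bytes:
    """Assumed new layout: a 16-byte blob, already 4-byte aligned, no padding."""
    return u.bytes


u = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
legacy = xdr_legacy_uuid(u)
compact = xdr_opaque_uuid(u)
print(len(legacy), len(compact))
```

Treating the UUID as an undecomposed blob also matches the "never decompose existing uuids" remark: the receiver copies 16 bytes and never byteswaps individual fields.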
[07:39:23] matt: will describe rfc refresh types and send patches
[07:40:00] jeff a: wants i-d to describe new data type
[07:40:41] jeff a: have the ietf independent submission process discussion now
[07:40:50] does not meet our needs
[07:42:38] jeff h: haven't written real afs3-stds charter yet
[07:43:48] discussion of whether doing "process" is a use of time that will happen with volunteers
[07:48:25] --- deason has become available
[07:48:39] discussion of what things might look like with ietf process, jeff h argues we have time to wait and see if ietf will stop pushing back
[07:49:25] simon: if each new type is a document it's easy to move forward with each; we may move them all into one document later. we should keep new data types split for now, and if we have "the world's shortest document" so be it
[07:50:44] the ietf document process tooling is sufficiently of value that no reason to stop using it for the process
[07:51:07] seems we are done with new data types and probably standardization
[07:51:25] security
[07:51:30] simon talking about rxgk
[07:51:31] 2 drafts
[07:51:46] one with rx, one with afs3
[07:51:54] entirety of rx implementation is done
[07:52:04] will need to be revisited to use the xdr ext union
[07:52:16] needs to be changed to do afs time using new type also
[07:52:26] otherwise done
[07:52:32] the afs3 side:
[07:52:47] more work to be done.
big thing missing is the callback channel security
[07:52:59] a bit more cm integration for combine-tokens
[07:53:16] derrick has done some work on callback channel security, needs to be extended to work with an rxgk token
[07:53:30] then combine-tokens can be easily used to preclude cache poisoning
[07:54:13] also needs vldb flags, but the existing rpc stub in the server breaks the flags implementation, simon proposes using a bitwise flags value on one of the spares
[07:54:23] wants to store whether a fileserver is rxgk-enabled
[07:54:46] big issue from deployment point of view, requires pthreaded ubik
[07:55:20] jeff h: spares where?
[07:55:26] simon: in the database object for the spare
[07:56:50] only issue is whether there are lurking spares uses in the world
[07:57:00] jeff h: mechanism sounds very non-generic
[07:57:19] simon: special-casing that there is a security capability for the fileserver
[07:57:33] jeff h: not that…. will you allocate flag for each security mech?
[07:57:36] simon: potentially
[07:57:50] jeff h: combine-tokens ?
[07:57:58] simon: it's an rxgk rpc
[07:58:06] jeff h: it's an app-specific rpc
[07:58:26] simon: 2 versions: one is an rx version of it, the rpc which takes a uuid is afs-specific
[07:58:33] also rx-afs specific
[07:58:36] fileserver only
[07:59:11] simon: pthreaded ubik. stable version required to deploy this
[07:59:15] with lwp it just won't work
[07:59:35] simon: gss can do network ops, which won't use IOMGR_select which will block the whole process lwp-wise
[08:00:06] jeff h: true in theory. in practice the only gss network op is gss init sec context, then only for the first call
[08:00:12] simon: we shouldn't be calling that
[08:00:30] simon: my interest extends beyond kerberos
[08:00:57] would be nice to get a moonshot impl out, their acceptor is heavyweight
[08:01:06] and moonshot needs pthreads
[08:04:40] simon: rxgk. idea of letting client determine security level. you agree to a minimum security level.
no notion in rxgk for a client to decide to up the sec level (e.g. if auth required, allow moving up to crypt)
[08:05:28] he proposes to allow client to ask for the same or higher level. if the client asks for higher and the server is not prepared, the server can refuse the connection
[08:06:00] jeff h: downgrade attack unless you ensure the payload used at different security levels so that nothing valid at one encryption level is valid at another
[08:06:19] simon: clear has no header. but the challenge response would have failed by then
[08:06:50] jeff h: suppose clear is minimum. client proposes a. simon: no, won't work, challenge is encrypted
[08:08:52] simon: aklog will need to be turned into a library
[08:09:06] all the various token-getting things would be refactored to use libaklog
[08:10:08] (5 different token-getters)
[08:11:58] jumping aside: will implementations of the times, ext unions be forthcoming?
[08:12:05] tom will implement ext unions.
[08:12:09] jeff a: timeframe?
[08:12:31] rxgk will need both before we can push it
[08:12:50] tom: ext union blocking tlv draft so it's top priority for him
[08:13:14] consensus we are happy with ext unions
[08:13:18] jeff a: ask for last call
[08:13:35] --- cchandler has become available
[08:15:28] suggestion that we just do it during wednesday coding session
[08:17:32] discussion of the language of decoding in tom's document.
[08:18:01] simon: an unmarshall error if you fail to decode
[08:18:26] if the type is something in a union that you decode but don't understand you simply keep going
[08:19:36] tom will change to an unmarshall error in event of corrupted wire, differing xdr implementations
[08:21:08] simon: are you looking for an analog of critical and noncritical extension?
[08:21:25] a type which i understand but can't decode is a decode error
[08:21:32] tom: fine, reduces to 2 conditions
[08:22:09] simon: just need a c union to represent what was decoded and an opaque if the type is unknown
[08:31:21] --- cchandler has left
[08:33:39] --- cchandler has become available
[08:38:30] --- Roman Mitz has become available
[08:39:59] Derrick is now talking about intermediate file servers
[08:40:13] (sharing the contents of one client's cache with other clients)
[08:41:01] avoid issues around convergent encryption, by returning a key from the fileserver with the fetchstatus object, or in a new RPC
[08:41:14] Use this key to share encrypted cache blocks with other clients
[08:41:42] Avoids pushing access control decision into clients, because access is determined by whether you can decrypt the block or not.
[08:42:00] ACL changes are done by changing the key when the data version is bumped.
[08:42:11] Jeffrey: You want to change the key if the ACL changes on the directory
[08:43:15] JeffH (sorry, same as before) doesn't want to be bumping dataversions if a directory's data hasn't changed
[08:44:17] simon: goal was to tie key to dv to avoid putting it on the wire in every fetchstatus object
[08:45:12] JeffH is arguing that encryption keys aren't actually that huge, and maybe they should just be in the fetchstatus object.
[08:45:36] Derrick: Unless there is a combined operation to let you get the key and the status, you need two RPCs.
[08:47:11] JeffH: Do we only need to change the key when the DV changes, and ignore ACL changes? If you had the key you could have read all of the files already, before the ACL change.
[08:48:00] Derrick: Encrypted data is not in the cache. The key is only used on the wire.
[08:50:18] JeffH: Clarifies that changes in ACLs _do_ have to result in a key change
[08:51:55] Now burrowing down into whether we should be storing callback state across restarts
[08:54:24] Discussing whether keys need to be persistent across fileserver restarts
[08:54:47] JeffH: what do we do about write access
[08:55:02] We need an integrity check to stop clients from corrupting each other.
[08:57:03] simon: solve by deciding who you are willing to pull from. can't just hash anyway as you don't have a standardized chunk size. need to use something like merkle trees
[08:58:34] Derrick: So, is this workable in a model where a client only polls clients that it trusts to share data?
[08:59:07] JeffH: Providing that you have an authenticated connection
[08:59:46] simon: you can use rxgk. but do you still need the encryption key?
[08:59:56] jeff h: 2 different trust domains
[08:59:59] so yes
[09:02:14] Different security models for different caches of clustore.
[09:02:25] Some clusters are completely exchangeable.
[09:04:31] Consensus: Put an encryption key version number in the FetchStatus message. Provide a separate RPC to fetch new keys that are found to have changed
[09:04:55] extended callbacks
[09:05:10] matt is polishing another draft which adjusts slightly what was discussed in 2009
[09:05:19] changes result types, unique for each
[09:05:35] supports reliable delivery of messages, needed for byte range file locking
[09:05:41] new draft later today
[09:06:08] prototype byte range locking uses the draft xcb code
[09:06:26] async lock issue implemented, xcb interface is used effectively as unicast to do so
[09:06:50] tom had previously asked, one improvement since edinburgh is sync versus async delivery of xcb
[09:07:29] for file ops, async delivery is possible using the existing consistency model depending on what the cm indicates
[09:07:37] if it's sync on close or fsync, it's a sync event
[09:08:11] other cases e.g.
cache too full or best effort storage could be async
[09:09:02] contention over whether we know (on linux at least) that we are syncing
[09:09:08] jeff a: same on windows. don't know
[09:09:50] (simon: AFS_SYNC flag)
[09:10:20] simon: deploy xcb with sync callback breaks first, then look at how we can tune async as a second deployment
[09:12:44] comes in via the Sync "inStatus" mask.
[09:13:35] jeff h: distinction is actually of durability.
[09:13:43] did change go to stable storage
[09:16:51] simon: path forward: start with sync, we can revisit async later
[09:17:00] matt: can be done by adding xcb sync barriers
[09:17:27] tom: need to ensure we don't end up with O(n^2) messages where we would have had O(n) before
[09:18:04] jeff h: a mode where ok to presume that it's ok to not break callbacks until after the RPC returns
[09:18:17] and then allow a commit changes RPC call after
[09:18:33] existing clients have to keep working the same way
[09:18:39] matt: this opens a path forward
[09:19:31] jeff h: barring explicit configuration, existing user code must see behavior expected, e.g. at close() or fsync() or dropping a lock, that changes are pushed and visible elsewhere before our call returns.
[09:20:08] (to save people who are using the "magic aside lockfile")
[09:20:37] matt: will add remaining sync barriers
[09:20:45] tom: will be wanted for cache bypass
[09:21:11] simon: the afs write on close model exists except for when it doesn't
[09:22:21] linux works differently
[09:24:05] --- mmeffie has become available
[09:24:10] fs storebehind basically bypasses the guarantee
[09:31:36] questions of how to review it
[09:31:40] can break into smaller pieces
[09:31:47] jeff a: are there implementation notes?
[09:31:55] matt describes some, probably needs to be more
[09:36:08] http://www.alibabapittsburgh.com/images/stories/alibabalunchmenutogether.png
[09:41:10] --- tkeiser has left
[10:39:38] --- tkeiser has become available
[10:54:38] ok, back from lunch
[10:54:57] advisory, mandatory locking; direct io, synchronous io; partition uuids
[10:55:19] matt: afs byte range locking has been submitted for standardization
[10:55:23] there's an implementation
[10:55:33] (derrick: a copy in gerrit)
[10:56:43] new byte range lock upgrade/downgrade ops existing
[10:56:53] share reservations provided
[10:57:12] rules defined for taking such locks at open time for e.g. atomic open and lock
[10:57:15] @derrick: irix crash dump: The command 'df' was running.
1 dumpsys[../os/vmdump.c: 528, 0xa8000000201c4fb4]
2 syncreboot[../os/printf.c: 1677, 0xa8000000201bd3e0]
3 icmn_err_tag[../os/printf.c: 593, 0xa8000000201bbeb4]
4 cmn_err[../os/printf.c: 159, 0xa8000000201bb244]
5 panicregs[../os/trap.c: 255, 0xa8000000201667b4]
6 k_trap[../os/trap.c: 561, 0xa8000000201668d4]
7 trap[../os/trap.c: 731, 0xa800000020166db0]
8 VEC_trap[../ml/LOCORE/vec_trap.s: 62, 0xa80000002000f3ec]
r0/zero:0000000000000000 r1/at:ffffffffffffffe0 r2/v0:0000000000000000 r3/v1:0000000000000008
r4/a0:ffffffffffffbe48 r5/a1:0000000000000108 r6/a2:c000000000708000 r7/a3:ffffffffffffbe40
r8/a4:0000000000000000 r9/a5:0000000000000000 r10/a6:0000000000365480 r11/a7:0003ffffffffffff
r12/t0:0003000000000000 r13/t1:0002a00003ffffff r14/t2:0000000000000001 r15/t3:fffffffffffffffd
r16/s0:000000007ffd7c80 r17/s1:0000000000000000 r18/s2:0000000000000000 r19/s3:0000000000000000
r20/s4:0000000000000004 r21/s5:0000000000000000 r22/s6:fffffffffffffffe r23/s7:00000000000000ae
r24/t8:ffffffffffffffff r25/t9:0000000000000003 r26/k0:00000000003fc876 r27/k1:0000000000000000
r28/gp:a800000020477cf0 r29/sp:ffffffffffffbd40 r30/s8:a80000007c33b000 r31/ra:a800000020192bd4
EPC:a800000020192c38 CAUSE=8, SR=ffa3, BADVADDR=68
9 cstatvfs[../os/vfs.c: 597, 0xa800000020192c38]
10 statvfsx[../os/vfs.c: 456, 0xa8000000201927cc]
11 statvfs[../os/vfs.c: 431, 0xa80000002019272c]
12 syscall[../os/trap.c: 2822, 0xa800000020169538]
13 systrap[../ml/LOCORE/systrap.s: 315, 0xa80000002000e148]
r0/zero:0000000000000000 r1/at:0000000000000000 r2/v0:0000000000000496 r3/v1:0000000000000000
r4/a0:0000000010012f08 r5/a1:000000007ffd7c80 r6/a2:000000007ffd7d50 r7/a3:0000000000000000
r8/a4:0000000000000000 r9/a5:000000000000000a r10/a6:0000000004163820 r11/a7:0000000000000001
r12/t0:0000000000000073 r13/t1:0000000000000000 r14/t2:0000000000000000 r15/t3:000000001000c328
r16/s0:0000000010012e40 r17/s1:0000000000000001 r18/s2:000000007ffd7f18 r19/s3:0000000010011ac0
r20/s4:000000001001112c r21/s5:000000000405fee0 r22/s6:0000000000000000 r23/s7:0000000000000001
r24/t8:0000000000001384 r25/t9:000000000410d4b0 r26/k0:0000000000000000 r27/k1:0000000000000000
r28/gp:0000000010019164 r29/sp:000000007ffd7c80 r30/s8:000000007ffd7f14 r31/ra:000000001000c328
EPC:000000000410d4b8 CAUSE=8, SR=ffffffffa400ffb3, BADVADDR=410d4b0
[10:58:24] ability to request lock and have it granted later (delayed)
[10:58:47] also ability to cancel lock if it cannot be granted
[10:58:54] uses extended version of xcb interface
[11:00:22] locks expire at a fixed time. clients expected to renew. deferred locks also expire. assert extend locks interface allows bulk discard or extend
[11:05:18] discussion of requirement for protected callback channel for xcb
[11:13:08] https://github.com/your-file-system/openafs-rxgk
[11:14:34] matt jumps ahead to lock release notifications.
discussion of what circumstances a lock release notification would be sent, to avoid message storms
[11:15:10] question of how you register/deregister interest in a lock
[11:15:21] matt: generic result behavior (using a result vector)
[11:15:49] --- Marshall Vale has left
[11:19:07] matt: are lock release and lock change the same message, and are the clients free to poll?
[11:19:35] jeff h: question of strategy for lock release messages (1 writer or up to all readers)
[11:20:35] jeff a: when a lock is denied return remaining time on a lock, to allow a client to know to not poll
[11:22:06] discussion of what promise is made with a failure reply
[11:24:24] --- Marshall Vale has become available
[11:25:02] jeff h: results in a type of callback, which is a lifetime of the remaining time on the lock currently
[11:25:45] really to help fileserver manage load
[11:30:01] upon receipt of callback, you can poll. fileserver should manage callback breaks to avoid being flooded
[11:31:38] jeff a: makes sense to break things out and start pushing the pieces of xcb which do not require authentication
[11:31:38] --- Marshall Vale has left
[11:42:38] messages which can be sent are cancel, storedata, lock release, (some variants which apply to cancel)
[11:43:30] --- Marshall Vale has become available
[11:46:28] direct io, sync io hints.
intended to be write-side,
[11:46:37] O_SYNC would do synchronous stores
[11:46:46] O_DIRECT would do direct io
[11:47:05] write vnop used for linux, freebsd since only those do O_DIRECT
[11:47:14] linux would ideally use a different method
[11:47:48] direct io scheduling would be moved for linux into direct io interface (.direct vnop) code now in progress
[11:47:55] would add read side which reads past the cache
[11:49:01] would add an afs_DirectRead equivalent for that
[11:49:10] simon: other issues i have already in gerrit
[11:49:32] would rather leave afs_ProcessFS signature the same and use a helper function for direct
[11:49:36] same with VerifyVCache
[11:50:02] --- Marshall Vale has left
[11:51:26] partition uuids
[11:51:35] mostly involves revving vlentrys and the RPCs for it
[11:51:56] then needs to be merged with other work for vlserver entry changes
[11:55:43] simon: aside about ubik database formats
[11:59:22] sqlite over ubik
[11:59:41] marcus: consider dealing with the offset ubik has, make things block-aligned
[12:11:38] --- Marshall Vale has become available
[12:17:39] --- Marshall Vale has left
[12:25:00] question of whether derrick's multiple files proposal should be pushed.
[12:37:36] --- tkeiser has left
[12:48:10] --- deason has left
[12:49:44] --- Marshall Vale has become available
[12:58:14] --- tkeiser has become available
[13:17:07] /afs/inf.ed.ac.uk/user/s/sxw/Public/draft-wilkinson-afs3-rpc-refresh-00.txt
[13:23:49] needs reltime some places.
[13:28:28] jhutz says get rid of VolSync, and put it into FetchStatus
[13:30:11] Make FetchStatus structure be an extensible union, with restrictions on who can define new code points there
[13:34:42] jhutz: should we pack uniquifier and vnode no together, to in effect just give a 64 bit vnode.
[13:37:42] We're now considering a FID being a 64bit volume ID, and a 128bit vnode.
[13:38:15] (we'd pack the current vnode and uniquifier into the lower 64bits of the vnode)
[13:43:10] Add an int128 base type to xdr
[13:49:02] afs_vnid
[13:49:14] sorry, did you just sneeze on your keyboard?
[13:56:04] We're not going to try and take statistics structures through the standardisation process
[14:01:10] +1
[14:01:31] --- mmeffie is now known as meffie
[14:22:01] --- Derrick Brashear has left
[14:24:44] And, finally, we're going to remove SyncCounter from FetchStatusEx
[14:26:49] --- mdionne has left
[14:27:53] --- cchandler has left
[14:30:18] --- tkeiser has left
[14:34:40] --- jaltman has left: Disconnected
[14:36:25] --- matt has left
[14:36:36] --- meffie has left
[14:36:40] --- Roman Mitz has left
[14:37:44] --- Simon Wilkinson has left
[16:30:34] --- steven.jenkins has left
[18:16:10] --- Derrick Brashear has become available
[18:18:45] --- Derrick Brashear has left
[18:33:24] --- Derrick Brashear has become available
[21:03:16] --- Derrick Brashear has left
[21:24:07] --- Derrick Brashear has become available
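(editor's sketch) The FID layout floated at 13:37:42 and 13:38:15 (64-bit volume ID, 128-bit vnode, with today's vnode and uniquifier packed into the lower 64 bits) might look roughly like the Python below. The log does not say how the two 32-bit values are ordered inside those 64 bits or how an int128 would be marshalled; placing the vnode above the uniquifier and using 16 big-endian octets are assumptions made only for illustration, and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pack_vnode128(vnode: int, uniquifier: int) -> int:
    """Pack legacy 32-bit vnode and uniquifier into the low 64 bits.

    Assumed ordering: vnode in bits 32..63, uniquifier in bits 0..31.
    The upper 64 bits stay zero.
    """
    if not (0 <= vnode <= MASK32 and 0 <= uniquifier <= MASK32):
        raise ValueError("legacy vnode and uniquifier are 32-bit values")
    return (vnode << 32) | uniquifier


def unpack_vnode128(v128: int) -> tuple:
    """Recover (vnode, uniquifier) from a value that uses only the low 64 bits."""
    if v128 >> 64:
        raise ValueError("upper 64 bits in use; not a legacy-style vnode")
    return (v128 >> 32) & MASK32, v128 & MASK32


def encode_fid(volume: int, v128: int) -> bytes:
    """Assumed wire form: 8-byte volume ID followed by a 16-byte int128."""
    return volume.to_bytes(8, "big") + v128.to_bytes(16, "big")


fid = encode_fid(536870913, pack_vnode128(42, 7))
print(len(fid), unpack_vnode128(pack_vnode128(42, 7)))
```

Under these assumptions a FID is 24 octets on the wire, all multiples of the 4-byte XDR unit, so no padding is needed.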