[04:53:47] --- haba has become available
[05:27:45] --- haba has left
[05:27:45] --- haba has become available
[06:31:08] seems like a jabber restart
[06:32:42] I have a question: do/did we have a bug which shows itself like this? During vos backup, the volume sometimes goes offline and needs salvage:
    Wed Oct 7 00:42:15 2009 1 Volser: Clone: Recloning volume 537092430 to volume 537092446
    Wed Oct 7 00:44:23 2009 VAttachVolume: volume salvage flag is ON for /vicepa/V0537092430.vol; volume needs salvage
I have a user who uses his volume a lot and seems to trigger that once a month or so. Server version is 1.4.10.
[07:13:07] > The first long of the AFSVolSync structure
You mean the one that's the volume's creationDate?
[07:14:38] --- deason has become available
[07:23:08] yes. the volume creation date
[07:23:53] haba: if I recall, yes; the fsync-background-consistency-issues or whatever it's called patch was, I think, supposed to handle something like that
[07:24:48] deason: Hm, good, did that go into .11?
[07:25:08] er, hmm, maybe not; I don't remember the exact issue, but I can't look into it right now
[07:25:54] Ok, I can look around and probably find it.
[07:26:40] I don't have full context, but the volume creation date is usually not as useful as you think, because it doesn't change at the times you think.
[07:30:22] I found background-fsync-consistency-issues-20090522 http://git.openafs.org/?p=openafs.git;a=commit;h=12e85227c5dbfdb1258718ee3360bffacc4f96ac
[07:34:30] so now, where to click in the gitweb thingie to see how that relates to releases?
[07:35:45] (I could look at ihandle.c, but somehow I expect that there should be a clickety answer :)
[07:36:23] --- mmeffie has become available
[07:42:23] you can look at the tags from the main page, and see the date of the last commit for that tag
[07:42:55] that is, http://git.openafs.org/?p=openafs.git;a=tags
[07:53:21] Yes, I have clicked down the "tag path" and ended up at an ihandle.c whose last commit in its history is background-fsync-consistency-issues-20090522, which kind of gave me the info that it should be in 1.4.11. Then, if I want to know whether it was already in 1.4.10, I can click down that tag tree (and there it is _not_), but somehow that information should be clickable from a version or something like that.
[07:53:55] So the answer is "upgrade to 1.4.11 and see if my problem goes away".
[07:54:19] Thanks.
[07:58:34] there is a git command 'git describe --all ' which is helpful for seeing what release something is in; a clickable thing for it would be nice but I don't know if it exists in gitweb
[08:00:22] jhutz: as long as the creation date changes for a replica when it is released and is consistent across all replica sites, the value can be used as an identifier to determine whether a volume callback can be applied to multiple vnodes at once.
[08:02:04] we discussed use of the creation date on zephyr, in fact, when volume fetchstatus was first discussed. i thought we determined it was fine to use at the time
[08:05:33] it seems like it could get odd if you restore to a single replica or something, but that's a bit special
[08:05:55] you can already screw yourself that way
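For reference, the AFSVolSync structure mentioned above (07:13) is, roughly, six 32-bit fields of which only the first currently carries anything meaningful: the fileserver fills it with the volume's creation date. A sketch of the declaration follows; the typedef and field names are approximated rather than copied from the OpenAFS headers.

    typedef unsigned int afs_uint32;    /* normally provided by <afs/stds.h> */

    struct AFSVolSync {
        afs_uint32 spare1;    /* today: the volume's creation date */
        afs_uint32 spare2;    /* unused */
        afs_uint32 spare3;    /* unused */
        afs_uint32 spare4;    /* unused */
        afs_uint32 spare5;    /* unused */
        afs_uint32 spare6;    /* unused */
    };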
[08:08:07] I can believe such a discussion took place, but I don't recall its outcome. The volume creation date _might_ be OK, but I'm not convinced it is guaranteed to be unique across all replicas, even in situations where not all replicas are online (or exist) at the time of release. I might become convinced of that by reading some code. It definitely is _not_ OK to use it as a basis for incremental updates, since the creation date on replicas may be considerably _after_ the last update.
[08:08:59] I remember you discussing that when I brought up using the creationDate as the time to use as the dump time for releases
[08:09:19] but when you do a release, the creationDate is sent with the dump data, and it replaces the creationDate on the receiver
[08:14:51] er, while i get why it'd be an issue, what does that have to do with restoring replicas?
[08:16:29] The point is, the creationDate is metadata about the volume, not about the contents of the volume. If what you care about is whether the contents of the volume changed, such as for processing a whole-volume callback ...
[08:17:35] ... you should use metadata that is intended to describe when the contents of the volume changed, instead of fudging it by using an unrelated value that happens to have the behavior you want in a particular implementation but certainly has no protocol reason to continue to do so
[08:18:06] which... i get, but has nothing to do with restoring replicas, either
[08:18:10] yes, but the creationDate on the replicas is the same as the creationDate on the clone; I don't know if the protocol mandates it (comments and such on creationDate are incredibly confusing and misleading), but it's what we do
[08:18:26] you're answering a question i didn't ask. if you don't have the answer to the question i did ask, say so
[08:18:29] jhutz: I agree with that, yes; we should use something specifically intended for detecting changes
[08:19:38] --- stevenjenkins has left
[08:20:17] shadow: you're not talking to me, right?
[08:20:21] no, jhutz.
[08:20:30] I'm sorry; were we discussing restoring replicas? I thought we were discussing "volume synchronization information 'allowing something akin to a 'whole-volume callback' on read-only volumes.'"
[08:20:56] well, i was asking a question about something deason said, and you gave an answer to not that.
[08:21:55] I thought that was just me saying "what if you restore to a replica?" and you saying "you're already screwed for other reasons"
[08:22:10] I didn't see a question up there... I mean, you're correct
[08:22:19] --- stevenjenkins has become available
[08:22:49] that's true. you're already screwed for other reasons. if jhutz was addressing something else, fine, sure.
[08:23:16] jeff: I am discussing the 'whole-volume callback' on read-only volumes. The docs from Transarc (section 5.1.2.2 of the FS/CM ProgRef) explicitly state that the reason for AFSVolSync and the inclusion of the volume's creation date is to support "something akin to a 'whole-volume callback'".
[08:25:50] I agree that it would be better if the last update time were provided, because that is an explicit indication of the contents of the volume. However, from the perspective of the cache manager knowing whether or not a whole-volume callback can be applied to a particular vnode, I believe it is sufficient, provided that the creation date is consistent across all up-to-date replicas of the .readonly volume instance.
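A minimal sketch of the cache-manager side of that argument; this is hypothetical illustration, not actual OpenAFS code. The first AFSVolSync field is treated as a per-volume change token: if a fetch reply carries the same token that was recorded when the whole-volume callback was granted, cached status for other vnodes in the same read-only volume can still be trusted. The second helper illustrates the point just made, that an explicit last-update time would be preferable; the use of the second field for that purpose is purely an assumption for the example.

    /* Declarations repeated from the sketch above so this stands alone. */
    typedef unsigned int afs_uint32;    /* normally provided by <afs/stds.h> */
    struct AFSVolSync {
        afs_uint32 spare1, spare2, spare3, spare4, spare5, spare6;
    };

    struct ro_volume_state {
        afs_uint32 sync_token;  /* first AFSVolSync field seen when the callback was granted */
        int have_callback;      /* nonzero while a whole-volume callback is believed valid */
    };

    /*
     * Return nonzero if cached status for other vnodes in this read-only
     * volume can still be trusted after receiving 'sync' on a fetch.  This
     * relies on the creation date changing on every release and being
     * identical across all up-to-date replica sites.
     */
    static int
    ro_status_still_valid(struct ro_volume_state *vp, const struct AFSVolSync *sync)
    {
        if (!vp->have_callback)
            return 0;
        if (sync->spare1 != vp->sync_token) {
            /* Volume was re-released (or restored): stop trusting cached status. */
            vp->have_callback = 0;
            return 0;
        }
        return 1;
    }

    /*
     * Hypothetical refinement: if a later protocol revision carried the
     * volume's last-update time in another field (zero meaning "not
     * provided"), a cache manager could prefer it and fall back to the
     * creation date otherwise.
     */
    static afs_uint32
    volsync_change_token(const struct AFSVolSync *sync)
    {
        return sync->spare2 != 0 ? sync->spare2 : sync->spare1;
    }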
[08:27:20] I think at some point we should allocate another of the fields in that structure to actually carry the update date, and make CMs use that when it is provided (and perhaps explicitly document that if the update date is not provided, the first field must have whatever properties we think are necessary for that)
[08:28:11] I agree, and if you read my follow-up to the AFS3 RPC refresh thread on afs3-stds you will see that I suggested we do that as part of the AFS3 RPC refresh
[08:29:49] Note that the docs from Transarc are ancient, and while it's reasonable to give them the benefit of the doubt wrt what a field contains, IMHO anything they say about semantics or intended use should be taken with a grain of salt if the implementation did not actually do that.
[08:31:33] well, the server side *sent* it as described, so inasmuch as anything about those fields is protocol, i think it's not an unreasonable assertion that it *is* protocol
[08:31:53] jhutz, can you create an openafs list? or, failing that, not get cranky if i do it?
[08:42:46] the zephyr discussion took place three times over 18 months. my conclusion from reading it is that lastUpdateTime is best. copyDate should not be used, and creationDate is reasonable.
[08:46:42] Probably. What list?
[08:46:55] class shadow
[08:47:03] Were those zephyr discussions about this, or about replication?
[08:47:13] port-solaris@
[08:47:29] the zephyr discussions were about this
[08:47:35] the zephyr discussions were explicitly about whole-volume callback optimization
[08:47:49] raised twice by me and once by Derrick
[08:49:17] I'll do that if you really want, but I think you should think hard before creating another port-* list. It seems like usually the traffic is better kept on -devel.
[08:50:43] -devel is not appropriate for the osol-end folks who want a better way to interface with us. they don't care about the rest of our -devel traffic
[08:54:48] --- Jeffrey Altman has become available
[08:55:53] --- dev-zero@jabber.org has become available
[08:57:48] Hrm, yeah, I suppose that's true.
[08:58:00] I wonder if I can remember how to create a mailing list.
[08:58:27] mailman on the command line takes a switch, emits crap for aliases, iirc
[08:58:54] Yeah. But it's also necessary to edit the aliases file on michigan, where the spam filtering happens.
[08:59:09] which, presumably, can be copied from existing aliases.
[08:59:17] newlist. the mailman tool is newlist
[08:59:23] not a switch. uh.
[08:59:27] "d"uh
[08:59:54] Well, fine. If you want to do it, go ahead. I'll set up the SA aliases
[09:01:54] --- abo has left
[09:02:21] --- abo has become available
[09:02:32] Hm. Seems like michigan:/var/backups/system.tar.gz ought to get copied someplace.
[09:07:21] fine, list created, overlay updated.
[09:49:00] --- haba has left
[09:59:33] looks like something still isn't there yet:
    port-solaris-request@openafs.org
    SMTP error from remote mail server after RCPT TO::
    host mx2.central.org [128.2.13.207]: 550 Unrouteable address
[10:00:19] yes, i did the grand half; i assume jhutz isn't done with the rest
[10:00:54] (i assumed you and keiser would both want to be on this list; i was going to invite you after it was up)
[10:28:05] --- jaltman has left: Replaced by new connection
[10:28:06] --- jaltman has become available
[10:52:24] --- Russ has become available
[11:27:44] --- dev-zero@jabber.org has left
[11:44:42] --- chaz has become available
[12:06:50] Hrm. I bet the MXs need configuration I forgot.
[12:08:56] Russ: Can you remember where we got to with buildbot on o.s.e?
[12:17:36] Too bad I don't remember what it's supposed to be.
[12:19:08] Right. They needed the same change I did on michigan, because they're where the work actually happens.
[12:52:07] --- chaz has left
[12:54:20] Simon: I think you asked me for a package and I installed something, but beyond that, I don't recall.
[12:54:39] Yeh. I think the Debian package is too old, sadly. It's got security holes, too.
[12:54:59] Figures.
[12:55:11] We're not running it, so nothing for us to worry about.
[12:55:31] I'm going to try and get this all going on lochranza, and then I'll take a look at getting it going on o.s.e when you get back, I think.
[12:55:40] That sounds good.
[12:55:43] At the moment, I'm trying to get gerrit to speak JSON to me. Which is painful.
[12:56:00] Did we ever sponsor you for a stanford.edu Kerberos principal?
[12:56:09] Not that I can recall, no.
[12:56:39] Okay. Probably still no need. There's now a remctl interface on openafs.s.e to restart Gerrit, so I thought I'd mention it in case we had.
[12:56:59] Ah. That's cool. At the moment, I'm just sudoing to restart it.
[12:57:05] Really must work out why that's needed.
[12:57:22] Yeah, that's cool. I just added it since it was easy and I needed to add a remctl interface to push the web site anyway.
[12:57:29] Cool.
[13:07:35] --- haba has become available
[13:17:52] --- Jeffrey Altman has left
[13:38:05] completely off topic, but star wars merchandising has gone too far. http://im.ly/674fd/
[13:39:34] * stevenjenkins coughs up his coffee.
[13:39:43] jaltman++
[13:46:49] --- jaltman has left: Disconnected
[13:58:31] --- Jeffrey Altman has become available
[14:17:19] --- jaltman has become available
[14:32:48] Well, I know have a script that can talk to gerrit through its JSON-RPC interface. Which is quite neat. I wonder what damage I can do ...
[14:32:54] s/know/now/
[14:42:42] Has anyone got any hardware they'd be interested in throwing into a buildbot pool?
[14:47:06] what does buildbot use for authentication?
[14:47:21] a password.
[14:47:41] The buildslave contacts the master with a username and password to request jobs.
[14:47:47] sent over ssh? https? ?
[14:48:16] Its own protocol, which runs in the clear, as far as I can tell.
[14:49:05] There's a MITM attack possible against the slave (by server replacement), but beyond that, the attacks seem limited to causing the master to think builds that succeeded failed, or vice versa.
[15:02:00] In any case, as a build slave owner, your worry should be the fact that the master is going to be sending arbitrary shell commands to your machine ...
[15:29:37] --- deason has left
[15:32:14] does buildbot require python 2.5 or 3.1?
[15:32:21] or does it not matter?
[16:04:10] --- dev-zero@jabber.org has become available
[16:11:13] you have to trust your master
[16:12:26] --- deason has become available
[16:16:34] looks like 2.5, which will be a problem for 64-bit windows
[16:24:52] if running buildbot slaves behind a NAT, use the --keepalive option on the "buildbot slave" command in order to keep the connection open
[17:16:09] --- asedeno has left
[17:16:09] --- asedeno has become available
[17:16:11] --- asedeno has left
[17:16:31] --- asedeno has become available
[17:32:12] --- jaltman has left: Replaced by new connection
[17:32:13] --- jaltman has become available
[17:32:41] --- Russ has left: Disconnected
[17:46:12] --- Russ has become available
[19:46:03] --- jaltman has left: Replaced by new connection
[19:46:04] --- jaltman has become available
[20:02:55] --- Jeffrey Altman has left: Disconnected
[20:03:00] --- jaltman has left: Disconnected
[20:10:37] --- Jeffrey Altman has become available
[20:26:59] --- jaltman has become available
[21:31:49] --- asedeno has left
[21:31:53] --- asedeno has become available
[22:16:05] --- deason has left
[22:18:00] --- reuteras has left
[22:19:47] --- reuteras has become available
[22:55:02] --- Jeffrey Altman has left: Replaced by new connection
[23:13:13] --- jaltman has left: Disconnected
[23:36:22] --- dev-zero@jabber.org has left
[23:36:33] --- dev-zero@jabber.org has become available