[00:06:46] --- reuteras has left
[00:39:58] --- reuteras has become available
[00:55:47] --- Russ has left: Disconnected
[01:57:19] --- sxw has become available
[01:59:14] <sxw> Master is currently broken on IRIX, which is why buildbot is failing to build everything based off it. Just before loads of other folk start trying to read the IRIX build log :)
[02:39:05] --- reuteras has left
[04:17:16] --- reuteras has become available
[04:29:18] --- lama has become available
[04:30:52] --- jaltman/FrogsLeap has left: Disconnected
[04:59:17] <sxw> I think we're now at a point where both the RHEL5 and HPUX builders complete successfully
[05:13:26] --- reuteras has left
[05:13:26] --- reuteras has become available
[05:15:25] --- reuteras has left
[05:15:26] --- reuteras has become available
[05:29:24] --- phalenor has left
[05:31:41] --- phalenor has become available
[05:53:41] --- phalenor has left
[05:53:44] --- reuteras has left
[05:53:46] --- phalenor has become available
[05:54:17] --- reuteras has become available
[06:37:12] --- jaltman has become available
[06:51:49] --- jaltman has left: Disconnected
[06:51:54] --- jaltman has become available
[07:10:44] --- jaltman has left: Disconnected
[07:32:46] --- deason has become available
[07:39:37] --- reuteras has left
[07:46:18] --- haba has become available
[07:46:41] --- jaltman has become available
[07:48:38] <haba> Since upgrading to 1.4.14, we got error messages "touch: cannot touch `/var/easy/a-pop/transient/resources/SP_Resources.new': No such file or directory" of files we KNOW should be there. It is fixed by fs flush.  Any suggestions? (This is all Centos5)
[07:49:18] <haba> (the path is a soft link to /afs/pdc.kth.se/var/common/easy/a-......
[07:52:00] <Simon Wilkinson> We've seen similar problems here, but they predate 1.4.14
[07:52:17] <Simon Wilkinson> There seems to be some cache consistency problem with directories.
[07:53:38] <haba> I wonder what old version the client on that computer was on before it got up to 1.4.14.
[07:54:26] <haba> (if it is a client issue)
[07:54:46] <Simon Wilkinson> I've only seen them extremely intermittently, but they seem to occur in situations where large numbers of file operations (creation/deletion) are occuring at the same time.
[07:55:11] <Simon Wilkinson> My suspicion is that LocalHero has a race in it somewhere, which leads to the client's cached copy being disjoint from the servers.
[07:57:54] <haba> The previous version was 1.4.6-osd. That worked fine.
[07:59:57] <haba> I can try to figure out in what exact sequence this dir's contents are updated.
[08:00:16] <haba> It's an ugly perl hack that does it.
[08:03:34] --- lama has left
[08:05:28] --- lama has become available
[08:08:34] --- lama has left
[08:09:32] <haba> Simon: The perl does the whole time: [ create(file.new) ; write contents to file.new ; rename(file.new, file); repeat]
[08:10:37] <haba> When this happens, the rename was successful, the directory does not contain file.new but you can not make a new file.new anyway. You _can_ make a file.something-else.
[08:17:43] <Simon Wilkinson> Okay, so the rename is succeeding on the fileserver, but not being reflected in the client's view of the directory?
[08:18:18] <Simon Wilkinson> What I need you to do is to better define "directory does not contain a file.new"
[08:18:26] <Simon Wilkinson> Is that as observed from the client, or from another client?
[08:18:34] <haba> When you do ls, file.new is not there, but when you try to create it you get an error.
[08:18:46] <kaduk@mit.edu/barnowl> Can you look from a different client?
[08:18:48] <Simon Wilkinson> Okay, so that's from the failing client. What does another client see?
[08:18:48] <haba> all same client
[08:19:16] <Simon Wilkinson> Yeah, don't care about that client - it's seeing what's in its cache. 
[08:19:22] <Simon Wilkinson> What I want to know is what another client sees.
[08:19:39] <haba> Don't know (at that particular point in time of the error)
[08:20:13] <Simon Wilkinson> Well, next time it happens, could you find out?
[08:20:33] <haba> Lars did fs flushv before looking and then the perl script does it thing again.
[08:21:24] <Simon Wilkinson> Is file.new still open when its renamed?
[08:21:31] <haba> But I think it will happen again. Will instruct Lars.
[08:21:46] <haba> No, I think it should be closed.
[08:21:58] <haba> I can look if it really is closed.
[08:21:58] <Simon Wilkinson> Actually, scratch that. That's only an issue if _file_ is open.
[08:22:41] <Simon Wilkinson> When you flushv, do you get a directory with file.new in it, or not?
[08:24:24] * haba noting your questions for next time it happens.
[08:25:21] <Simon Wilkinson> Could you open an RT ticket for this, with everything we've discussed so far.
[08:25:27] <Simon Wilkinson> And then update it if it happens again.
[08:25:32] <haba> Will do!
[08:25:38] <haba> check!
[08:27:55] <Simon Wilkinson> Of course, I do still need to figure out how to get RT to email me.
[08:33:29] --- jaltman has left: Replaced by new connection
[08:33:30] --- jaltman has become available
[08:35:43] <haba> Can a nice RT admin add lama@kth.se as a requestor to 129355 as well?
[08:36:24] <sxw> I should be able to.
[08:38:13] <sxw> Another question for you -
[08:38:24] <sxw> what gives the "No such file or directory" error - the create, or the rename?
[08:38:42] <haba> create
[08:39:15] <sxw> And what are the flags that are being passed to open by the script?
[08:39:26] <haba> our perl script says so and you can try with "touch" as well.
[08:39:42] <haba> I can dig....
[08:42:16] <kaduk@mit.edu/barnowl> Somewhat off-topic, but is Arla actually being developed at this
point?
[08:44:14] <shadow@gmail.com/owl1FDC9F18> another interesting question is what is in the cache. since, well,
linux dcache might also be stale?
[08:45:07] <sxw> kaduk: I don't think it's particularly active. I think there's still some platform maintenance work going on, or at least there was last summer.
[08:45:42] <sxw> shadow: Does "fs flushv" help with the dcache, though. I thought if the dcache got mangled, we had to wait until the kernel flushed our bad entries?
[08:47:21] <kaduk@mit.edu/barnowl> Yeah, other than workshop/standards announce mails, I don't see
anything real in the archives since july 2010, for platform
maintenance stuff.
"Maybe I should tell them how it is broken on recent freebsd" but I
think it may be broken for other reasons anyway.
[08:47:40] <shadow@gmail.com/owl1FDC9F18> i thought we ended up forcing a dentry revalidation
[08:47:56] <haba> Whatever perl does with "open (NEW_RES, ">$Easy::Resource_File.new")"
[08:49:10] <sxw> haba: Cool, thanks.
[08:49:11] <haba> kaduk: Send an email to tol@stacken.kth.se and ask if something is going on.
[08:50:36] <sxw> haba: If you can reproduce this quickly, or on a machine which you don't care about slowing down a bit, the AFS fstrace output from the error occuring would be very interesting.
[08:52:10] <haba> open("/var/easy/a-pop/transient/resources/SP_Resources.new", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
[08:52:25] * haba have strace
[08:52:53] <sxw> Yeah. That lets me know the options.
[08:53:24] <haba> then a stat, then a lot of write, then close.
[08:53:44] <sxw> What I'd _really_ love to see is the AFS kernel module fstrace output over the problem occurring. That will let us see what code path is taken over the rename, and then exactly where open() is bombing out.
[08:54:28] <haba> rename("/var/easy/a-pop/transient/resources/SP_Resources.new", "/var/easy/a-pop/transient/resources/SP_Resources") = 0
[08:54:40] <sxw> Yeah, that's strace, not fstrace.
[08:54:56] <haba> I don't know if I can hang in an fstrace there
[08:55:07] <sxw> http://blob.inf.ed.ac.uk/sxw/2009/01/24/using-fstrace-to-debug-the-afs-cache-manager/
[08:55:25] <haba> this is scheduling our cluster jobs, so it should rather not grind to a halt ;-)
[08:56:19] <haba> But of course, no jobs will be lost, so we can start it under supervision and then let it run.
[08:56:23] <sxw> Well, what I'd do is turn on fstrace, but not dump the logs out to disk.
[08:56:45] <sxw> Then, providing you can catch the failure immediately, dump out what's in the fstrace buffer as soon as the command fails.
[08:57:11] <sxw> That should contain all of the recent kernel debugging information, including the bits we're interested in.
[08:58:39] <haba> I'll read your text and then tomorrow talk to Lars about turning fstrace on for the scheduling computer.
[08:58:59] <sxw> Cool. Thank you.
[08:59:10] --- jaltman has left: Disconnected
[08:59:18] --- jaltman has become available
[08:59:28] <haba> Hm. Actually, the perl script could initiate the dumping when it hits the error.
[08:59:49] <sxw> Oh, one last thing. It would also be useful to know whether it's path resolution that fails
[09:00:15] <sxw> Next time it breaks, could you cd down the whole path to the directory its trying to create the file in, then try the touch using just the leafname?
[09:00:31] <haba> can do that as well.
[09:00:54] <haba> Now let my cutpaste what we just had here.
[09:04:12] * haba now heads for Sushi
[09:04:30] --- haba has left
[09:51:35] --- rra has become available
[10:46:30] --- haba has become available
[10:59:30] --- jaltman has left: Disconnected
[11:00:15] --- sxw has left
[11:59:17] --- jaltman has become available
[11:59:20] --- lars.malinowsky has become available
[12:20:33] --- asedeno has left
[12:20:50] --- asedeno has become available
[12:21:31] --- haba has left
[12:24:47] --- haba has become available
[12:39:49] --- haba has left
[13:42:57] <deason> hm, so, this O_LARGEFILE vs _FILE_OFFSET_BITS thing is even more annoying for DAFS, I just realized
[13:43:48] <deason> dafs code locks a per-partition file at a particular offset related to the volume id that certain code deals with; on unix, it's with a fcntl lock, which accepts an off_t as the offset to lock
[13:44:22] <deason> so with vol ids >2^31-1, it breaks, since it interprets it as a negative offset, and there doesn't appear to be an fcntl64 interface or something
[13:45:00] <deason> so right now dafs doesn't work with large vol ids
[13:46:07] <deason> we can either start using _FILE_OFFSET_BITS=64 everywhere, or if we want a band-aid, we could open the lock file once for vol ids <=2^31-1, and again for >2^31-1, with different file descriptors, and I think that would work
[13:48:29] <deason> or we can hold an in-memory lock while we seek on the fd, and fcntl lock relative to the file position... but I don't really like holding an in-memory lock like that over a disk operation
[13:50:26] <jaltman> or use more than one file for different portions of the volume id space.
[13:52:48] <deason> opening the same file twice I think can work, though, and in some ways makes it simpler...
[13:53:04] <deason> but I wasn't sure if I should even bother, or if we want to go to _FILE_OFFSET_BITS=64 sooner
[13:53:24] <deason> > so right now dafs doesn't work with large vol ids
er, where off_t is 32-bits, that is
[14:21:32] --- pod has left
[14:56:12] <jaltman> there are a number of places where 64-bit offsets get truncated including in that fcntl call 
[15:02:32] <deason> truncation isn't the problem; the VLockFile* functions don't accept "file offset"s, they accept "32-bit unsigned integers"
[15:03:17] <jaltman> they need to accept afs_foffset_t 
[15:04:33] <deason> no, because afs_foffset_t can represent numbers greater than 2^32; we do not support locking offsets that large right now
[15:04:36] <jaltman> the fact that a volume id is being passed in for use an as offset is independent of the fact that the function should support locking any offset
[15:05:16] <deason> we cannot do that; fcntl locks without _FILE_OFFSET_BITS set to 64 cannot lock arbitrary offsets in that range quickly
[15:05:24] <jaltman> on UNIX
[15:06:02] <jaltman> VLockFile is not a UNIX specific interface
[15:06:04] <deason> (er, again, on off_t = 32 bits platforms)
[15:06:28] <deason> yeah, but VLockFile* stuff is platform agnostic, so we shouldn't allow locking beyond 32 bits if we can't do it everywhere
[15:06:57] <deason> unless you want to return a runtime error when we receive a too-large offset.... since we don't need them anywhere, a compile-time enforcement seems more desirable
[15:13:38] <jaltman> at the moment the prototype isn't enforcing anything at compile time.  All that is happening is that a warning is generated on some platforms and then value is truncated.   When volume ids become 64-bit this interface is going to have to support it.  If off32_t platforms cannot do so with a single file, then an alternate method is going to be required.
[15:15:05] <deason> where are they getting truncated? nothing should be passing anything but afs_uint32s to the locking functions
[15:16:00] <deason> that's what I mean by compiler enforcement; if anything is giving something larger than a max uint32, it's not going to work so it should be fixed
[15:16:15] <deason> (enforcement by i.e. detecting warnings)
[15:17:01] <jaltman> the vol package is full of 64-bit to 32-bit truncation warnings.  
[15:17:13] <deason> for VLockFile &co?
[15:17:29] <deason> I'm just talking about those; I agree with the rest of the 4195 changes at a glance
[15:17:56] <deason> and yeah, when 64-bit volume ids are supported this will need to change; but by that time I would hope we can just set _FILE_OFFSET_BITS to 64, and then this will not be a problem on unix
[15:18:56] --- rra has left: Disconnected
[15:36:39] --- Russ has become available
[15:45:14] <Simon Wilkinson> How does FILE_OFFSET_BITS possibly relate to fcntl behaviour - we can't suddenly throw 64bits rather than 32 bits into the kernel just because off_t has changed size.
[15:47:11] <deason> I assume fcntl makes a different syscall depending on the size of off_t (or the presence of _F_O_B or some other define), or fills in different syscall arguments or something
[15:47:14] <Simon Wilkinson> Sure you just do F_SETLK64 where you need to.
[15:47:45] <Simon Wilkinson> s/Sure/Surely/, but you get the gist.
[15:48:02] <deason> ah, that's not in my manpage
[16:19:58] --- steven.jenkins has left
[16:26:49] --- steven.jenkins has become available
[16:43:29] --- deason has left
[17:38:44] --- summatusmentis has become available
[19:07:38] --- jaltman/FrogsLeap has become available
[20:20:07] --- deason has become available
[22:25:50] --- deason has left
[23:16:41] --- reuteras has become available
[23:25:26] --- lars.malinowsky has left