[01:54:45] --- dev-zero@jabber.org has become available
[01:56:12] --- dev-zero@jabber.org has left: offline
[05:21:56] --- SecureEndpoints has left: Replaced by new connection
[05:31:20] Anyone around who knows the difference between BulkStatus and InlineBulkStatus?
[05:55:04] yes
[05:55:24] And would they like to share?
[05:57:10] bulk does a for loop over files. if you hit an error you stop filling, and return what you have with an error.
[05:57:23] Ah. Okay.
[05:57:40] inlinebulk will not return an error unless it's an error with the fileserver. if you can't stat a file, that's returned inline for just that file
[05:57:55] Okay. Thanks!
[05:58:09] this was to avoid throttling, at the time
[05:58:27] since otherwise you could end up recalling bulkstat 30 times on a directory with 30 things.
[05:58:35] This error seems to be bulkstat related. The client goes kaboom when it receives the bulkstat results, and it looks like the dentry for part of the path is getting corrupted.
[05:59:27] bulk only?
[05:59:31] and not inline?
[05:59:48] Well, both, I think.
[06:00:03] I've only tried disabling DOBULKSTAT, not running with one and not the other.
[06:02:27] and no DOBULK = no problem?
[06:02:37] As far as I can tell, yes.
[06:05:06] Judging from the network, it seems to be entirely related to the amount of time the fileserver takes to respond to the InlineBulkStat. If the fileserver is fast, all is well. If the fileserver takes its time, then we race with something, and kaboom.
[06:05:38] well, what happens if you cheat and, say, in DoBulk either 1) ifdef out the "merge in the info" or 2) maybe just the VLRU reordering
[06:05:58] I will take a look.
[06:06:33] we may miss holding xvcache somewhere and the vlru ordering testing is getting sad
[06:06:58] Yeh. General memory scribbling could explain this bug, as well ...
[06:07:33] xvcache is one of the ones where we seem to get read locks when we really should have exclusive access, too.
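[Editor's note: a minimal sketch of the error semantics described above. This is illustrative toy code, not actual OpenAFS source; `stat_one`, the array layout, and the error value are all made up for the example. BulkStatus-style fetching stops at the first failure and returns partial results with an error, while InlineBulkStatus-style fetching always completes and records per-entry errors inline.]

```c
#include <assert.h>

#define NFILES 4
#define ERR_NOACCESS 13  /* hypothetical per-file stat error */

/* Hypothetical per-file stat: the file at index 2 fails, others succeed. */
static int stat_one(int i) {
    return (i == 2) ? ERR_NOACCESS : 0;
}

/* BulkStatus-style: loop over files, stop filling at the first error,
 * and return what we have so far together with that error. */
static int bulk_status(int results[NFILES], int *nfilled) {
    *nfilled = 0;
    for (int i = 0; i < NFILES; i++) {
        int code = stat_one(i);
        if (code)
            return code;       /* partial results + an error code */
        results[i] = 0;
        (*nfilled)++;
    }
    return 0;
}

/* InlineBulkStatus-style: always process every file; a failure to stat
 * one file is returned inline for just that entry, and the call itself
 * only fails for fileserver-level errors. */
static int inline_bulk_status(int errors[NFILES]) {
    for (int i = 0; i < NFILES; i++)
        errors[i] = stat_one(i);
    return 0;
}
```

This also shows why inline was added to avoid throttling: with plain bulk, a directory of 30 entries with one bad entry can force the client to re-issue the bulk call repeatedly, restarting after each error.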
[06:42:36] Well, with VLRU reordering disabled, I so far have been unable to make it go boom.
[07:35:32] so who's evil and also reordering (i'm betting) while not holding a lock
[07:36:00] I'm looking... not found anything as yet.
[07:36:10] did you undef the entire lock 132 section, or just the if (tvcp != lruvcp) { section?
[07:36:45] All of lock 132's scope.
[07:37:45] And, it just went boom. After nearly an hour of testing...
[07:37:55] So that, actually, isn't the problem. Bah.
[07:37:58] there's timing.
[07:39:57] Indeed. It had just been happily sitting there, ls-ing away.
[07:40:54] Needless to say, the stack trace is identical. I'm going to try and soak with bulk stat disabled (I only ran that for an hour or so), and just make sure we're not barking up the wrong tree.
[07:41:33] ok
[07:41:52] i'm going to vanish soon; almost time for church.
[07:42:10] No problem.
[07:42:27] This next test run needs to take a while, anyway.
[08:22:33] --- dev-zero@jabber.org has become available
[08:22:34] --- dev-zero@jabber.org has left: offline
[09:59:23] --- Russ has become available
[10:08:16] Here's a question. If one of the entries in an inline bulk stat results in an error, that entry's vcache never seems to be cleared up (beyond clearing the BulkFetching flag). But similarly, it's not populated with any of the stuff that afs_fill_inode would put there... Is that likely to be a problem?
[10:49:57] --- SecureEndpoints has become available
[10:52:30] last chance for 1.5.58 patches
[10:53:07] Nothing that's fit to print ...
[12:49:10] simon, how usable is dicon?
[13:16:00] Define usable?
[13:16:41] just... usable... I don't know how to describe it. Functional enough to get stuff done?
[13:17:37] It's functional enough for use. Personally, I don't yet trust it with the only copy of important data.
[13:18:04] All of the bugs I am currently aware of are in RT. The only one that will make you really sad is the hard link bug.
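[Editor's note: a toy model of the VLRU reordering discussed above (the "lock 132" / `if (tvcp != lruvcp)` section). This is not OpenAFS code; the struct, field names, and the plain pthread mutex standing in for the afs_xvcache write lock are all illustrative. The point is that move-to-front splices several pointers non-atomically, which is exactly why reordering while holding only a read lock, as suspected in the chat, can corrupt the list.]

```c
#include <pthread.h>
#include <stddef.h>

/* Toy VLRU: a doubly-linked list whose move-to-front reordering needs
 * exclusive access, mirroring the afs_xvcache write lock the chat
 * suspects is sometimes not held. */
struct vcache {
    struct vcache *lruq_next, *lruq_prev;
    int id;
};

static struct vcache *vlru_head;
static pthread_mutex_t xvcache = PTHREAD_MUTEX_INITIALIZER;

/* Insert v at the head of the VLRU. Caller must hold xvcache. */
static void vlru_push(struct vcache *v) {
    v->lruq_prev = NULL;
    v->lruq_next = vlru_head;
    if (vlru_head)
        vlru_head->lruq_prev = v;
    vlru_head = v;
}

/* Move v to the front on access. The unlink-then-reinsert touches up
 * to four pointers; with only shared (read) access, two threads doing
 * this concurrently can leave the list pointing at the wrong nodes. */
static void vlru_touch(struct vcache *v) {
    pthread_mutex_lock(&xvcache);
    if (v != vlru_head) {
        if (v->lruq_prev)
            v->lruq_prev->lruq_next = v->lruq_next;
        if (v->lruq_next)
            v->lruq_next->lruq_prev = v->lruq_prev;
        vlru_push(v);
    }
    pthread_mutex_unlock(&xvcache);
}
```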
[13:18:39] Of course, until backup storage (which is what I was working on until I started trying to track down this Linux crash) is done, you'll lose data if your machine loses power whilst disconnected.
[13:18:55] And the absence of pinning means that you have to populate your cache yourself.
[13:20:06] ok, thanks
[13:21:41] mostly just curious, but was wondering about offline access. Intriguing
[13:23:01] What I really need to do is to test it more. It passes everything I've thrown at it (mainly because I've been fixing things as I go), but I need some more comprehensive test suites.
[13:23:09] ... and some more time.
[13:24:28] hmm
[14:54:57] So, with bulkstat disabled, no kaboom. But even if all of the handling of results in bulkstat is disabled, we still go bang. My suspicion is the NewVCaches that bulkstat creates, but that's just a hunch with no real evidence.
[15:17:03] fun.
[15:18:12] I also think we're not getting the appropriate locks in the case where BulkStat works. We update the inode without holding its mutex (normally this isn't a problem, because the VFS holds the mutex before it passes the inode to us, but in this case we're handling an inode we've made ourselves).
[15:20:38] Oh, and the logic for clearing BulkFetching is wrong in the inline case. Because we use flagIndex to determine where to start clearing the CBulkFetching flag, if InlineBulkStatus has given us an error entry, we'll miss clearing that one's flags (and be out of step for the rest of the loop). But I don't think this is my problem.
[15:22:06] Any hoo, I've been staring at this for far too long, and someone stole an hour from today, so I'm off to sleep.
[23:06:14] --- reuteras has become available
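[Editor's note: a toy model of the flagIndex problem described at 15:20:38. This is not the actual afs_DoBulkStat code; the arrays, flag value, and loop shape are invented to show the class of bug. When flagIndex only advances for successfully merged entries, an inline error entry makes the index fall out of step with the entry list, so the wrong slots get cleared and one entry is left with CBulkFetching still set.]

```c
#include <assert.h>

#define NENTRIES 5
#define CBulkFetching 0x1

/* Buggy pattern: flagIndex stalls on error entries, so every clear
 * after the error lands one slot early and the final entry's
 * CBulkFetching flag is never cleared. */
static void clear_flags_buggy(int states[], const int errs[], int n) {
    int flagIndex = 0;
    for (int i = 0; i < n; i++) {
        if (errs[i])
            continue;                    /* error entry: index stalls */
        states[flagIndex] &= ~CBulkFetching;
        flagIndex++;
    }
}

/* Fixed pattern: clear the flag per entry, error entries included,
 * so the flag state can never drift out of step with the results. */
static void clear_flags_fixed(int states[], const int errs[], int n) {
    (void)errs;
    for (int i = 0; i < n; i++)
        states[i] &= ~CBulkFetching;
}
```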