[00:02:16] --- Russ has left: Disconnected
[03:22:32] --- dev-zero@jabber.org has left
[03:30:40] --- manfred furuholmen has left
[04:01:18] --- dev-zero@jabber.org has become available
[07:13:34] VIOCGETFID doxygen stuff fixed in cvs, thanks.
[07:14:43] Oh, good; thanks
[07:16:11] Now if only I had a way to get a volume name...
[07:18:06] Should RT send me email when I get given bugs? Just noticed 123797
[07:18:10] --- dmontuori has become available
[07:18:49] i thought so
[07:20:02] No. We send mail only on ticket creation and when there are comments or correspondence.
[07:20:48] We could change that easily enough.
[07:20:52] It's the same bug as 99477 - RPM builds for F9 and F10 don't work unless you explicitly give the kernel version on the command line. I've got a fix in a VM somewhere, but never got as far as properly testing it.
[07:21:21] Changing it would be good. I only intermittently check RT - notification of bug assignment would at least prod me into paying attention.
[07:22:13] i agree
[07:26:08] OK; from now on, for tickets in -bugs RT will notify the owner on owner or status change.
[07:26:53] Thank you.
[07:41:05] --- reuteras has left
[08:00:52] OK, pioctl saves the day. VIOC_FILE_CELL_NAME gives me a cell name, and VIOCGETVOLSTAT gives me a volume name.
[08:01:08] Well, it gives me what the fileserver thinks is the volume name, but that'll be good enough for this purpose.
[08:03:53] Now for my next trick, I'll turn a volume name into a list of people with 'a' access on its root. For that, I'll need 1.5.x.
[08:06:39] --- dev-zero@jabber.org has left
[08:28:07] secureendpoints: ok, will try to identify, assuming it isn't something I provoked myself (don't think so)
[08:29:20] just be warned. a lot of time can be wasted on it. I already have.
[08:30:28] Yeah, it's tough.
[08:34:16] there isn't anything wrong with the logic. the fault occurs in the lock release code. the wrong value ends up in a register.
the actual memory is correct
[08:36:07] when I first saw the problem I thought it was a hardware issue because it only occurred on one machine but then I came across it elsewhere.
[08:36:46] it's not consistent and only occurs with that one lock. happens more frequently with checked builds than release builds
[08:40:34] That sounds like either a failure to do locking correctly, or the result of an incorrect assumption that reads and writes will be done in the order you intended even in the absence of the required memory barriers.
[08:42:43] --- SecureEndpoints has left: Replaced by new connection
[08:43:04] --- SecureEndpoints has become available
[08:43:32] one would think. the code in question is src/WINNT/client_osi/osibasel.c
[08:43:59] all of the protection is provided by CRITICAL_SECTION system objects.
[08:44:29] the code that is suffering the problem is the code that implements the locks used by the cache manager
[08:45:06] if there is a problem there it dates back a long ways and is not obvious
[08:46:49] yeah. i looked and saw nothing.
[08:47:42] IIRC, CRITICAL_SECTION objects prevent interrupts, but may not provide memory barriers.
[08:48:26] But it's probably been over a year since I read any relevant documentation. Which branch are you looking at?
[08:50:17] If I can ask a dumb question, how can lock_VerifyOrderRW itself be called with no locks or critical section held, nor take any? (pardon if I misread)
[08:51:57] Because the only question is whether _this thread_ holds the lock, and that state can only be changed by this thread.
[08:52:40] Particularly, lockRefH/lockRefT are TLS
[08:53:32] Of course, lock_VerifyOrderRW doesn't appear to be using the TLS accessor on lockRefH, which is probably a bug.
[08:54:05] Anyway, if critical sections do not imply memory barriers, this code is really totally unsafe.
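
The point made at 08:51:57, that lock order verification needs no lock of its own because each thread only inspects state that it alone can change, can be sketched in portable C11. This is a minimal illustration and not the code in osibasel.c: the names `held_levels`, `order_ok`, `note_acquire` and `note_release`, the `MAX_HELD` cap, and the rule that levels must strictly increase are all assumptions made for the example. Skipping the check when the level is 0 follows the behaviour described later in this log.

```c
#include <stddef.h>

#define MAX_HELD 32  /* hypothetical cap on locks held by one thread */

/* Per-thread record of held lock levels. Only the owning thread ever
 * reads or writes these, so no mutex or barrier is needed to use them. */
static _Thread_local unsigned short held_levels[MAX_HELD];
static _Thread_local size_t held_count = 0;

/* Return 1 if taking a lock at 'level' respects the assumed hierarchy
 * (strictly increasing levels), 0 otherwise. Level 0 is never checked. */
int order_ok(unsigned short level)
{
    if (level == 0)
        return 1;
    for (size_t i = 0; i < held_count; i++) {
        if (held_levels[i] >= level)
            return 0;  /* already hold an equal or higher level */
    }
    return 1;
}

/* Record an acquisition. Returns 0 on success, -1 if the table is full. */
int note_acquire(unsigned short level)
{
    if (level == 0)
        return 0;
    if (held_count == MAX_HELD)
        return -1;
    held_levels[held_count++] = level;
    return 0;
}

/* Forget one held lock at 'level', if this thread holds one. */
void note_release(unsigned short level)
{
    if (level == 0)
        return;
    for (size_t i = 0; i < held_count; i++) {
        if (held_levels[i] == level) {
            held_levels[i] = held_levels[--held_count];
            return;
        }
    }
}
```

On Windows the per-thread storage would go through TlsGetValue/TlsSetValue rather than `_Thread_local`, but the reasoning is the same: the list is private to the thread, so the check itself cannot race with another thread.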
[08:54:15] MSDN notes "Acquiring or releasing a critical section consists of a memory barrier, an InterlockedXxx operation, and some extra checking to handle recursion and to fall back to a mutex, if necessary"
[08:54:29] http://msdn.microsoft.com/en-us/library/bb310595(VS.85).aspx
[08:54:42] OK
[08:57:18] The same page indicates that adding an explicit MemoryBarrier() would add only 20-90 cycles
[08:57:20] Does InterlockedIncrement imply a memory barrier? At least, with respect to the thing being incremented?
[08:57:49] I think the implication is no (wrt their macro) and I am rather certain the instruction(s) per se do not.
[08:58:13] lock_VerifyOrderRW is passed the value of lockRefH and it comes from lock_ObtainRead(),lock_ObtainWrite() which uses the TlsGetValue() accessor to obtain it.
[08:58:55] The initialize operations don't actually take the relevant CS. That could be a problem if there does not happen to be some other memory barrier involved, because it could mean that some threads trying to take that lock will read mp->atomicIndex before it has been initialized.
[08:59:43] Oh, I missed that the value it was using was a copy and not the TLS handle, which is actually named tls_LockRefH. OK.
[09:00:15] tls_LockRefH is simply an index that is passed to TlsGetValue/TlsSetValue
[09:00:38] I asked about InterlockedIncrement because the init code also assumes that InterlockedIncrement on atomicIndexCounter from multiple threads will be safe, which is true only if a membar is implied.
[09:01:01] Yeah, I understand that. I just missed that the access had been done elsewhere.
[09:01:59] the value which is appearing to be zero is 'level'
[09:02:50] it doesn't matter if atomicIndex is unique. This code shares a pool of CRITICAL_SECTION objects across all of the locks.
[09:03:34] I suspect that was done to support platforms which had limited numbers of kernel objects.
[09:03:43] Yeah, I guess that's true.
But it _does_ matter if every thread accessing a lock uses the same index, which requires the membar after writing mp->atomicIndex
[09:05:53] Maybe. Why are we implementing our own mutexes anyway?
[09:06:08] inherited code
[09:06:30] Fair enough. So, level should be immutable once the lock object is created, right?
[09:06:44] level and atomicIndex are immutable
[09:06:56] as is type
[09:07:03] I came into this late; are you able to reproduce the problem on demand with reasonably high probability?
[09:07:41] depends on the day of week, the phase of the moon, and the machine. for periods of time yes, then no
[09:08:12] the really interesting issue is that the problem is only ever triggered on a single lock
[09:08:38] What I'm really asking is, if we thought we had a fix, is it likely that we could gain some experimental evidence that it works? Oh? What lock?
[09:08:40] LOCK_HIERARCHY_SMB_STARTED
[09:09:13] grep sources in src/WINNT/afsd/smb*.c for the allocation and usage
[09:09:17] there's only one of those, right?
[09:09:24] yes
[09:09:45] I didn't recall windows providing rw locks--looks like SRWLock is added in Vista only?
[09:10:26] Surely it provides mutexes, though.
[09:10:30] it does
[09:10:38] But I recall critical sections to be cheap--not so heavy we need to share a pool
[09:10:42] Does the problem happen on any particular release?
[09:10:49] on my to do list is to re-implement osibasel using system mutexes
[09:10:59] it is release independent
[09:11:14] of course the lock order validation code is very new. I added it over the summer
[09:11:33] Right. Is that code always turned on?
[09:11:33] it found dozens of deadlocks
[09:11:36] no
[09:11:55] Does the problem occur even when it is not turned on (maybe you answered this already?)
[09:11:56] it is on by default in the checked builds and off by default in the release builds
[09:12:18] the problem is strictly in the lock order validation.
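
The initialization concern raised earlier (a lock picks its slot in a shared pool from a global counter, and every thread must then see the same slot index) can be sketched with C11 atomics. This is an illustration under stated assumptions and not the osibasel.c implementation: `sketch_lock`, `sketch_lock_init`, `sketch_lock_slot`, `index_counter` and `POOL_SIZE` are hypothetical names, and C11 `<stdatomic.h>` stands in for the Windows Interlocked and MemoryBarrier primitives discussed above.

```c
#include <stdatomic.h>

#define POOL_SIZE 64  /* hypothetical number of shared pool slots */

/* Global counter used to spread locks across the pool. Static storage,
 * so it starts at zero. */
static atomic_uint index_counter;

typedef struct {
    _Atomic unsigned int pool_index;  /* which pool slot guards this lock */
    unsigned short level;             /* hierarchy level, fixed after init */
} sketch_lock;

/* Initialize a lock: pick a pool slot and publish it. */
void sketch_lock_init(sketch_lock *lp, unsigned short level)
{
    /* atomic_fetch_add is sequentially consistent by default, so
     * concurrent initializers each get a distinct counter value. */
    unsigned int n = atomic_fetch_add(&index_counter, 1);

    lp->level = level;

    /* Release store: a thread whose acquire load below observes this
     * value is also guaranteed to observe the write to 'level'. */
    atomic_store_explicit(&lp->pool_index, n % POOL_SIZE,
                          memory_order_release);
}

/* Read the slot index with acquire ordering, pairing with the store above. */
unsigned int sketch_lock_slot(sketch_lock *lp)
{
    return atomic_load_explicit(&lp->pool_index, memory_order_acquire);
}
```

The release/acquire pair only orders the fields of one lock relative to its `pool_index`. How the pointer to the lock object reaches other threads still needs its own synchronization, which this sketch does not address.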
[09:12:37] if there is a symptom that shows itself outside of that code I am unaware of it
[09:12:38] So it never happens in release builds?
[09:13:41] you can turn on the lock order validation via the registry. then it might happen there.
[09:20:35] Am I understanding this right? The behavior is that in lock_ReleaseMutex on that lock, the level appears to be 0 and so the validation stuff never runs?
[09:21:55] matt, the reason that a shared pool is used is historical. there were limits on the number of critical sections available for Win95 and djgpp
[09:22:22] yes, I follow, I was wondering whether we would continue to do now
[09:23:28] jhutz: right
[09:23:35] I've got a call
[09:23:40] back later
[09:24:59] Hrm. I was about to suggest looking at the memory layout to figure out what comes before that that might have been overflowed. But IIRC you said something about the value in memory being right. Bizarre.
[09:25:26] --- Derrick Brashear has become available
[09:26:04] the scrollback the server sends ends just before that but iirc he said the register was wrong, memory correct
[09:26:09] --- Derrick Brashear has left
[09:26:50] jhutz: yes; I'm not sure if the think I saw is the same, but I have a scenario I can try to repeat
[09:26:54] thing
[09:34:49] --- dev-zero@jabber.org has become available
[10:18:08] --- RedZBear has left
[10:34:19] --- dwbotsch has become available
[13:38:20] --- Russ has become available
[14:14:19] --- dev-zero@jabber.org has left: Replaced by new connection
[14:14:20] --- dev-zero@jabber.org has become available
[14:56:55] --- dmontuori has left
[16:01:14] jhutz: do you find the XO keyboard to be annoyingly small?
[16:01:33] xo is the olpc?
[16:02:20] yeah
[16:02:54] i find the biggest annoyance is every device has a different small keyboard.
i don't find that one, particularly, to be any more annoying than any other (different) one
[16:03:42] hmm
[16:04:19] the family computers are all failing at the same time
[16:04:33] 2 laptops and a desktop, so I'm stuck researching replacements :)
[16:04:40] the xo has an annoying small keyboard. plug a usb keyboard into it
[16:04:55] of course my three year old niece thinks the keyboard is the perfect size
[16:05:45] granted
[16:07:40] I'm contemplating suggesting they replace the 2 laptops w/ netbooks/olpcs, since the laptops are almost exclusively used for siblings schoolwork
[16:10:13] oh, btw, power cord came today :)
[16:51:33] --- tkeiser@sinenomine.net/owl has left
[19:02:04] Yes, of course I do. It's designed for hands much smaller than mine. But I also found that it's possible to get used to it, and the display is absolutely fabulous in sunlight, unlike any regular laptop (by design). I spent a bunch of time last summer sitting outside writing software and documentation on it.
[19:04:24] The XO's keyboard is actually fairly standard, down to the ~ and escape being in the right place. It's just that everything is a bit smaller and spaced a bit more closely together, which results in a keyboard that should be reasonably comfortable to type on, if you're 10.
[19:05:20] what power cord?
[19:07:05] Also, what sort of schoolwork? The box comes with a reasonable variety of software, but does have some limitations. For example, I don't think there's any printing support.
[19:20:08] --- matt has left
[19:36:07] power cord is unrelated to olpc
[19:36:30] they've got thumb drives, mostly just wordprocessing and browsing
[19:36:58] wrt power cord: I left my macbook power cord at school
[19:43:49] Ah, OK. The machine has something like 3 USB ports, and an SD card slot (though the latter is a bit inconvenient to get to, being located on the bottom edge of the display). There's a word processor, and a functional browser.
[19:44:24] intriguing
[19:44:45] Oh, that's always annoying. I don't know how expensive the Mac power bricks are, but I try to keep a few for my machine - one for home, one for the office, and one to carry around in the laptop bag.
[19:45:41] this was $80
[19:47:37] Yeah, that sounds about right. I wonder if Apple has settled on a connector they're going to use long enough to make that a reasonable long-term investment. Dell has used the same connector for several years now.
[19:54:00] hopefully magsafe will stick around, I really like it
[23:03:21] --- Russ has left: Disconnected
[23:14:41] --- reuteras has become available