[00:13:10] --- abo has left
[00:42:00] --- Russ has left: Disconnected
[03:22:56] --- sxw has left
[03:50:03] --- kaj has left
[05:39:03] --- Simon Wilkinson has become available
[05:40:04] I'm trying to fit all of my TimeMachine backups onto the one TimeCapsule. Is there any reason we're not marking cache files as being excluded from the backup?
[05:41:26] mdionne: Is it completely gone from the headers, or do you have to #define one of the compatibility macros to get it.
[05:48:39] mdionne: As far as I can see from git, the prototype is still there, but the macros that turn it on have changed ...
[06:44:06] istr we should be marking them excluded but i couldn't find an easy way to inject the cache directory into the excluded list
[06:51:49] You can set an extended attribute on each file.
[06:52:57] that wasn't "documented" at the time i first looked, but yeah, i forgot.
[06:53:09] Set the attribute com_apple_backup_excludeItem = com.apple.backupd
[06:53:11] i guess that's an easy patch too. afsd gets to do it when it creates the files
[06:53:23] we should also exclude those from spotlight
[06:53:34] Haven't looked at how you do that yet, but yes.
[06:53:48] istr it had a similar problem. i should find my notes on this
[07:07:36] and it looks like megacz got a bizarre panic anyway. i asked him to decode it too, but i'm impatient.
[07:07:47] 0x4607e500 : mov 0x64(%edx),%ebx 0x29d968 : mov %edi,%esp
[07:08:38] So %edx is corrupt?
[07:09:13] well, the registers show edx is 0x00000000
[07:10:14] What's the code corresponding to afs_GetDCache+7832 ?
[07:10:22] figuring it out
[07:10:46] I'll do a pull up of the negative length issue ...
[07:10:54] ok, i was gonna ask
[07:12:45] oh, you'll laugh
[07:13:00] yeh?
[07:13:19] it looks like it's in the hunk of code that is if (length > size) { /* The fileserver told us it is going to send more data
[07:13:46] confirming
[07:14:14] (i have to disassemble; we should probably start packaging the dSYM stuff, but that's a different discussion)
[07:15:46] Is call NULL?
[07:16:28] Do you have a gdb that can say what the offset of error is in the call structure?
[07:16:38] wow, you ask a lot
[07:17:05] no stack. not a global symbol. i have... very little
[07:17:38] realize, all i can do is what's in the flat text file that is the panic log
[07:17:53] Certainly on Linux, gdb can tell you that if you just have a debugging object - print the address of the element, minus the offset of the structure.
[07:17:59] so unless i can decode which register it's in now, hah!
[07:18:10] i don't have a debugging object either.
[07:18:16] i have the kext we shipped.
[07:18:19] Bah. It's down to counting then...
[07:18:39] print the address of the element: yes, things i don't have
[07:19:05] Actually, what was his architecture? I can probably write a bit of C.
[07:19:06] it is definitely that hunk of code.
[07:19:10] i386
[07:19:35] there are 2 EndCalls. the other is the "real" one. this one is that.
[07:20:16] we're at 0x4607e500
[07:20:16] EndCall is a function, though. I'm pretty sure it's the rx_Error call.
[07:20:25] the very next instruction is 0x4607e503 : call 0x2a013e
[07:20:55] so looking at this, we're probably in RX_AFS_GUNLOCK, since after that is an assert fail, a couple more movls and lck_mtx_unlock_darwin10
[07:21:11] EndCall is a function i'm using to tell me where i am in the code
[07:21:28] like, ....e550 is a call to EndCall.
[07:21:55] but it's the first of the 2 in the disassembled function, and code flow says we're in that block, almost certainly in the GUNLOCK
[07:22:36] What does RX_AFS_GUNLOCK do?
[07:25:01] ok, so looking at how other GUNLOCKs are, no, we're in rx_Error. current_thread is the first instruction in a GUNLOCK in every other GUNLOCK i see.
[07:25:09] RX_AFS_GUNLOCK is AFS_GUNLOCK
[07:25:16] My bet is that tCall is NULL.
[07:25:42] well, that i can't tell you, but that seems likely
[07:27:49] If we're a 64bit client, and our rx_Read of the number of bytes fails, then we'll end the call twice.
[07:29:30] I think we need to check that code==0 before doing that length check
[07:29:48] um. how do you figure?
[07:30:31] where's the other EndCall, that is
[07:30:56] In the 64BIT_CLIENT code, if the rx_Read of bytes fails to read sizeof(afs_int32), then we end the call, NULL tcall, and set code to an error.
[07:31:09] which is where we crashed
[07:31:20] where did we end the call before it?
[07:31:35] Is it?
[07:31:36] the second endcall won't trigger: we set tcall to 0
[07:31:46] I thought we were crashing down where we do the length check?
[07:31:48] yes, the first of the 2 calls to EndCall.
[07:32:05] line number for where you think we are?
[07:32:15] My theory is that we NULL tcall at line 2153
[07:32:18] is there a third call to EndCall lurking i missed?
[07:32:33] ah, hang on. i read the other branch
[07:32:38] ... and then blow up at line 2217
[07:32:49] Does that match where you think the dump points at?
[07:32:54] that means i should find a 3rd EndCall in this disassembly
[07:33:09] i think the dump's at 2217. but i will know in a moment
[07:33:47] We certainly fail in that case, so we should fix it, even if it's not Adam's problem.
[07:33:50] yes, you are correct. i missed the first EndCall, at 0x4607e204
[07:34:00] well, i think you're correct, it's adam's problem
[07:34:03] Okay, patch against 1.5.x coming up.
[07:35:18] --- jaltman has become available
[07:40:05] http://gerrit.openafs.org/1112
[07:46:55] trying it
[07:48:57] The first (cast) issue doesn't need pulled up - both length and size are int32 in 1.4
[07:49:35] ok. pushed. you wanna pull it up?
[07:49:47] Yeh. Will do.
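The failure mode discussed above (an early read failure ends the call and NULLs tcall, then the later length check dereferences it) can be modelled in a few lines of C. This is only a sketch under stated assumptions: `fake_call`, `fake_end_call`, `fake_call_error`, `fetch_init` and the error values are hypothetical stand-ins, not the OpenAFS or Rx code, and the change at gerrit 1112 may differ in detail. It shows the `code == 0` guard suggested at 07:29:30.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an Rx call; the real structure is far larger. */
struct fake_call {
    int error;  /* sticky error code carried by the call */
    int ended;  /* number of times the call has been ended */
};

/* Stand-in for ending a call. Must run at most once per call. */
static int fake_end_call(struct fake_call *call, int rc)
{
    assert(call != NULL);
    call->ended++;
    return rc;
}

/* Stand-in for an rx_Error-style accessor: it reads through the pointer,
 * so passing NULL here is the crash being discussed. */
static int fake_call_error(const struct fake_call *call)
{
    return call->error;
}

/* Model of the fetch setup. read_ok says whether the initial read of the
 * length word succeeded. Returns the final error code. */
static int fetch_init(struct fake_call *call, int read_ok,
                      int32_t length, int32_t size)
{
    struct fake_call *tcall = call;
    int code = 0;

    if (!read_ok) {
        /* Early failure: end the call once and forget the pointer. */
        code = fake_end_call(tcall, -1);
        tcall = NULL;
    }

    /* The guard: skip the length check (and any use of tcall) once an
     * earlier step has already failed. */
    if (code == 0 && length > size) {
        code = fake_call_error(tcall);
        if (code == 0)
            code = -2;  /* hypothetical "server sent too much" error */
        code = fake_end_call(tcall, code);
        tcall = NULL;
    }

    if (tcall != NULL)
        code = fake_end_call(tcall, code);
    return code;
}
```

Without the `code == 0` test, the failed-read path would reach `fake_call_error(NULL)`, which matches a load from a small offset off a zero register like the `mov 0x64(%edx),%ebx` with edx = 0 in the panic.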
[07:55:45] minus changelog and version info, windows is ready for 1.5.69.
[07:55:50] so coming back to the issue of dSYM for kexts... it's extra space. maybe i shouldn't care and should just package it as part of the dest tree. but i considered a second package, and a .mpkg which let you choose. that may be overkill.
[07:56:23] It depends on what we think our debugging model should be. If we expect users to do their own debugging, then we should probably ship it.
[07:57:06] If we're going to try and get them to just report errors well, and 'developers' do the debugging work, then we should store the dSYMs for everything we ship somewhere central, but not distribute them with the installation packages.
[07:57:14] well, i think the real question is "do we intend to ship decode-panic whether for users to use or to have run for them"
[07:57:33] I think we should ship decode-panic for users to run.
[07:57:36] and i think we should do that. ideally i'd like something like the adium "it broke. report it?" screen to pop up and ask
[07:57:50] Yes, that would be very nice.
[07:57:59] Do we need to ship dSYMs to do that?
[07:58:01] for windows, the symbols are in the installer but are not installed by default.
[07:58:05] 1113, btw
[07:58:23] as decode-panic works today, no
[07:58:39] I think that a separate package containing decode-panic and the symbols is the right way to go
[07:58:56] i guess actually the right way to do this, since we lack otherwise a smart way to store, is a second .pkg, no .mpkg, and assume no one will install the debug package unless they know they care
[07:59:03] Yes.
[07:59:05] no, everyone should get decode-panic
[07:59:11] But I think the current decode-panic should go everywhere.
[07:59:16] ==shadow
[07:59:19] k
[07:59:27] And we should see if we can automate it.
[07:59:36] i don't want you to have to install it. it's like afszcm.cat. i don't want to tell you after you've tried to help me, "no, go back and get this other thing"
[08:00:04] Marc and I have fixed a number of problems on Linux that we've never seen, and that have never been officially reported, just by looking at the back traces on kerneloops.org
[08:00:07] well, we could automate it more now without cocoa programming. the rc script could run it and set the decoded log aside
[08:00:27] yeah, i wish we had something like that for macos
[08:00:34] apple won't share, so we have to be proactive
[08:00:41] Is it worth asking Claudio if he'd be interested in adding something to his stuff?
[08:01:54] actually, that's a decent idea. unsure why i didn't think of it
[08:02:11] i assume this would be something in the prefs pane which was a "select this to mail in crash reports"
[08:02:27] i assume we make an email address which for now just points to rt
[08:03:08] Yes. Or something that pops up whenever it sees that there's a new panic, and asks you if you'd like to submit it.
[08:03:18] Pretty much the same as Apple's thing actually.
[08:04:11] It might be easier to have a web application. It's easier to do a POST to a known website, than it is to navigate the problems of getting email out of arbitrary networks.
[08:04:42] well, adium's would be what we want
[08:04:55] Yeh. I think that uses an HTTP POST.
[08:05:03] it tries to slay the apple reporter, since apple won't give adium the bugs, then submits
[08:05:14] I suspect we also want the stack traces to go into a private queue initially.
[08:05:20] probably.
[08:05:31] (Depending on where we went boom, there might be key material in the registers)
[08:05:34] also, i don't want to write the cgi, which is why this fell on its face before
[08:05:45] If you can get someone to do the rest. I'll write the CGI.
[08:07:01] well. we should find someone who likes perl and have them string up a perl cgi
[08:08:40] Don't let that be the sticking point though - it's not much work, and can easily sit alongside some of the other stuff we're running on o.s.e.
[08:10:06] sure. it's not hard. i'll email claudio and see if he can help
[08:12:06] jason is comfortable with perl
[08:12:26] Before we cut 1.5.69, let me just check that it builds on Linux :)
[08:13:14] please do
[08:16:47] well, i should at least figure out how to ship decode-panic for 1.5.69, and maybe also make the dSYM package. but that should be simple
[08:16:54] and won't touch anyone else
[08:17:58] we ship /Library/OpenAFS/Tools/tools/OpenAFS.prefPane; i should just put decode-panic there
[08:18:17] (in tools; not in the prefspane)
[08:18:43] Sounds as good a place as any
[08:25:24] claudio's in.
[08:25:28] says next week
[08:27:03] Very cool.
[08:32:57] Current 1.5 head tests fine on Linux
[10:15:15] --- dev-zero@jabber.org has left
[10:15:26] --- dev-zero@jabber.org has become available
[10:25:50] --- jaltman has left: Replaced by new connection
[10:25:51] --- jaltman has become available
[10:31:18] --- Jeffrey Altman has become available
[10:33:11] --- dev-zero@jabber.org has left
[10:36:29] --- Russ has become available
[10:54:34] --- andersk@mit.edu/dr-wily has left
[11:05:07] --- andersk@mit.edu/dr-wily has become available
[13:10:40] --- jaltman has left: Disconnected
[13:14:41] --- jaltman has become available
[13:37:46] --- jaltman has left: Replaced by new connection
[13:37:47] --- jaltman has become available
[14:34:10] --- mdionne has become available
[14:44:46] a followup on sbrk() - there's already an update in the glibc git that restores more visibility for sbrk(), so this is an issue that will go away on its own
[14:45:53] Cool.
[15:49:11] --- Russ has left: Disconnected
[15:55:09] --- Russ has become available
[16:07:13] --- kaj has become available
[16:21:56] --- Russ has left: Disconnected
[16:22:45] --- Russ has become available
[16:54:25] --- deason has become available
[16:57:00] --- Russ has left: Disconnected
[16:57:02] --- Russ has become available
[17:03:57] --- Russ has left: Disconnected
[17:04:44] --- Russ has become available
[17:13:35] --- Russ has left: Disconnected
[17:15:18] --- Russ has become available
[17:15:54] augh, sorry about c7b92a30; why do we sometimes get a negative length, though?
[17:18:36] --- Russ has left: Disconnected
[17:18:48] --- Russ has become available
[17:22:03] should the 'size' parameter in rxfs_fetchInit just be signed? it looks like it's an int32 further up the call stack
[18:12:19] --- mdionne has left
[18:23:03] --- kaj has left
[18:43:58] --- dev-zero@jabber.org has become available
[20:50:21] --- dev-zero@jabber.org has left: Replaced by new connection
[20:50:34] --- dev-zero@jabber.org has become available
[22:05:55] Andrew: there was a bug in the file server code that would return a negative length if the FetchData offset was beyond the end of the file. This was fixed in the file server code shortly after the Edinburgh hackathon. There has been no 1.4 release since then so the bug is still widely deployed.
[22:06:48] The bug was present in the OpenAFS 1.0 distribution from IBM.
[23:49:05] --- Russ has left: Disconnected
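The question about making 'size' signed comes down to how C compares a negative int32 against an unsigned value. The sketch below assumes, for illustration only, that the length arrives as a signed 32-bit value and the size is unsigned; the helper names are hypothetical, and the clamp is one possible client-side guard, not a description of what c7b92a30 or the file server fix actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Mixed comparison. The explicit cast spells out what the usual arithmetic
 * conversions do implicitly on a 32-bit-int platform: -1 becomes 4294967295,
 * so a negative length looks larger than any sane size. */
static int exceeds_mixed(int32_t length, uint32_t size)
{
    return (uint32_t)length > size;
}

/* Both operands signed: a negative length is simply "not greater". */
static int exceeds_signed(int32_t length, int32_t size)
{
    return length > size;
}

/* Hypothetical guard: treat a negative length from an old file server
 * as zero bytes to fetch. */
static int32_t clamp_length(int32_t length)
{
    return length < 0 ? 0 : length;
}
```

With both operands signed, as they are said to be in 1.4, a negative length never trips the "server is sending more than we asked for" branch, which fits the remark that the cast issue does not need pulling up.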