[00:21:18] --- haba has become available [00:43:37] --- rdw has become available [00:54:22] --- haba has left [01:18:22] --- Simon Wilkinson has become available [01:21:42] --- haba has become available [02:30:07] --- Simon Wilkinson has left [02:56:39] --- abo has left [02:56:55] --- abo has become available [03:48:41] --- Jeffrey Altman has left: Replaced by new connection [05:13:57] --- Simon Wilkinson has become available [05:18:08] --- Simon Wilkinson has left [05:32:15] --- meffie has become available [05:48:08] What is the best URL to give an AFS noob which has clicked through the installer (hopefully with success) and then has to learn that his files are under \\afs\...... ? [05:48:37] (of course Windows in this case) [05:48:40] --- Simon Wilkinson has become available [06:01:32] --- Simon Wilkinson has left [06:32:20] http://www.dementia.org/twiki/bin/view/AFSLore/WindowsEndUserQuickStartGuide [06:32:52] although even that is very out of date. 1.3.70 [06:33:32] if you are looking for a basic introduction to using AFS in general I think Stanford has a pretty good guide. [06:34:32] At the end that says something about mapping drive letters. [06:35:01] my guess is that this should be done how nowadays (in windows 7)? [06:43:12] 1.3.70 was when? [07:01:02] mapping drive letters should be done via the Explorer Shell. 
[07:01:45] 1.3.70 was August 2004 [07:04:53] --- deason has become available [07:11:58] We have this one which was not so long ago written by our user support: http://www.pdc.kth.se/resources/software/file-transfer/file-transfer-with-afs/afs-in-windows/installing-afs-in-windows-xp [07:12:57] * haba has no idea what the explorer shell is or how to map drives there [08:16:12] --- Simon Wilkinson has become available [08:22:41] --- jaltman has left: Disconnected [08:22:45] --- jaltman has become available [08:28:34] --- jaltman has left: Disconnected [08:46:23] --- reuteras has left [08:49:51] --- Simon Wilkinson has left [08:51:03] --- Simon Wilkinson has become available [08:56:54] --- Simon Wilkinson has left [08:57:25] --- Simon Wilkinson has become available [09:05:00] --- haba has left [09:05:01] --- Simon Wilkinson has left [09:06:13] --- Simon Wilkinson has become available [09:12:59] --- Simon Wilkinson has left [09:16:27] --- Simon Wilkinson has become available [09:26:40] --- kaj has left [10:06:56] --- jaltman has become available [10:27:48] --- jaltman has left: Disconnected [10:29:17] --- jaltman has become available [10:39:52] --- rdw has left: Disconnected [11:03:38] --- Simon Wilkinson has left [11:15:53] --- kaj has become available [11:44:59] --- Russ has become available [12:34:43] --- Simon Wilkinson has become available [12:53:21] --- Simon Wilkinson has left [13:20:31] --- jaltman has left: Disconnected [13:22:54] --- Jeffrey Altman has become available [13:37:28] --- haba has become available [13:37:32] --- haba has left [13:38:13] --- haba has become available [13:49:01] where is bpw actually being held in UIUC? is it at the hotel / conference center, or in some campus buildings? 
[13:52:11] --- Jeffrey Altman has left [14:16:21] --- jaltman has become available [14:16:56] the BPW is being held in the CS department building in the same space as ACM Reflections [14:41:45] I remember when that building was new -- all us students were so excited about how much nicer the classrooms were than almost all of the other buildings on campus. [14:49:43] it is a really nice building. [14:53:06] Does someone want to help the gnu find developer come up with a fix to make find fast when running against /afs? [14:53:08] http://savannah.gnu.org/bugs/?24140 [14:56:34] --- mdionne has become available [14:58:52] --- haba has left [15:05:00] I've been a little confused as to why they use a pioctl to see if a file is in afs; can't you get that from statfs() or similar? [15:05:54] (I suppose I should be asking them that) [15:06:14] Apparently not when the directory only has 'l' privileges. The AFS CM does not provide sufficient details to identify the type of objects. See the discussion at the bottom of the history. [15:06:51] There was also a very active discussion in either openafs-devel or openafs-info that was initiated by Richard. [15:07:13] that also applies to statfs/statvfs, not stat/lstat/fstat ? [15:07:55] and yes, I remember seeing it [15:11:27] The slowness appears to be a combination of two factors. (a) mount points are expensive to cross and find does; and (b) when the user only has 'l' and attempts to stat an individual object the file server will respond with access denied. Do this too frequently and the file server will begin to pace the responses. When every request is going to be access denied for hundreds or thousands of files, find will take a long time to finish. [15:17:53] I know, I understand that [15:18:15] in that ticket, the general feel I get is "if we know we are in afs, we know that DT_UNKNOWN means it's not a dir" [15:19:25] correct. the CM knows what is a directory from the vnode number. If it is odd, it is a directory. 
If it is even, could be a symlink to a directory or a file or a symlink or a mount point; or a mount point; or a file or ...? [15:19:37] Windows has this problem in a big way. [15:20:33] The Explorer Shell (and in fact any directory enumeration) really requires that the type of an object be disclosed along with the name and AFS cannot provide that. As a result, the Windows client lies and says that anything that is unknown is in fact a directory until it can become known. [15:20:58] so it was mentioned to use a pioctl to see if they are in AFS when they notice changing FSs... actually, I'm not sure why they didn't do that anyway [15:21:12] but a statfs or similar would seem easier... [15:21:13] It also means that symlinks must be walked to determine the type of the object they refer to even if the object will never be opened. [15:23:17] concerns of the find author. will testing for afs be a performance hit? and, will the method of determining that we are AFS be stable and consistent across implementations? (openafs, arla, kafs, ....) [15:29:10] the ticket mentions a pioctl approach is 'efficient' [15:29:48] the latter is a good point, though... although, fixing it specifically for openafs is better than nothing [15:32:29] --- deason has left [15:34:50] testing for afs: k_hasafs() run once has never been a big problem. if configure finds libkafs, libkopenafs, link it. then try k_hasafs shrug [15:36:12] if you don't have afs at all, don't try anything further [15:37:42] Hm, does our configure not respect --mandir ? [15:37:57] unlikely? [15:38:02] well [15:38:08] the makefiles probably don't use it [15:39:07] Hrrm. [15:40:27] commit 830cb48c breaks an "enable-checking" build for me - I get a few new warnings, 64-bit only probably [15:41:06] --- deason has become available [15:41:09] weird. oh. i didn't notice because i only built with the warnings on macos. which already used it. sigh. 
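A minimal sketch of the vnode parity rule mentioned above, written in Python purely for illustration. The function name `vnode_kind` is hypothetical and not part of any AFS API; the only fact taken from the discussion is that an odd vnode number means a directory, while an even one means some non-directory object whose exact type cannot be told from the number alone.

```python
def vnode_kind(vnode: int) -> str:
    """Classify an AFS vnode number by parity, per the rule stated in the chat.

    Odd  -> directory.
    Even -> not a directory; whether it is a file, symlink, or mount point
            is not recoverable from the number itself.
    """
    if vnode % 2 == 1:
        return "directory"
    return "non-directory (file, symlink, or mount point)"


for example in (1, 2, 7, 10):
    print(example, vnode_kind(example))
```

This is why a directory listing can cheaply say "this entry is a directory" but cannot say what an even-numbered entry is without a further lookup.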
[15:41:41] one case is an easy fix with a cast, the other one is in afs_lock.s where we couldn't use uintptr_t because some platform(s) don't have it in the kernel [15:41:54] afs_lock.c that is [15:42:02] oh, uh. didn't we provide a define for that? [15:42:10] hang on [15:42:57] iparmtype [15:43:36] --- abo has left [15:43:54] i assume this is a MyPidxx? [15:44:32] --- abo has become available [15:44:34] yeah [15:45:09] is it afs_int_to_pointer, MyPidxx2Pid or MyPidxx which needs to be fixed? [15:45:34] within afs_int_to_pointer. we need to cast the return of MyPidxx2Pid [15:46:48] so the fix with uintptr_t would be: afs_int_to_pointer((uintptr_t)MyPidxx2Pid(MyPidxx)) [15:47:26] would the trick iparam32_to_iparam uses (in afs_syscall.c) work? [15:48:26] let me try that [15:52:11] that works, but it needs the definition of uinptrsz from afs_syscall.c [15:53:33] which i guess could be moved to afs.h like iparmtype is [15:53:52] as a bonus it also fixes those same warnings for the KERNEL case [15:55:04] I can whip up a patch - I'll include the other cast I was talking about in afs_usrops.c [15:58:20] excellent. thank you. [16:07:55] submitted as change 1748 [16:10:58] Derrick, do you know if anything changed recently that might explain why that guy from freebsd-afs doesn't have an afszcm.cat? [16:11:13] We started installing it properly in 1.5. [16:11:54] > changed recently That is, since April 4. [16:21:10] no? [16:22:32] Okay. What is installed is consistent with the plist for me. [18:15:48] Alas, I don't appear to have saved enough files to actually get useful debugging symbols in the kernel module from this panic I got a while ago. But from what I can see, it looks like it's trying to dereference a bogus function pointer in afs_GetDownD(). Which is weird, since I don't see any function pointer dereferences in that function. Of course, I'm not actually familiar with x86-64 assembly ... 
[18:44:17] --- Russ has left: Disconnected [19:05:45] --- Russ has become available [19:21:43] --- mdionne has left [19:34:59] (stark) > a statfs or similar would seem easier... And how do you learn from statfs/statvfs whether it is AFS? [19:36:06] Didn't we have a whole conversation about this at some point? I vaguely recall discussion of the file system ID that you get from statvfs, but don't remember the outcome. [19:37:48] Oh, right, no, it's the file system type we were talking about. [19:38:03] Which is only a statfs, not a statvfs, thing. [19:38:17] f_type? [19:38:47] In theory, the way that you know it's AFS is that f_type is 0x5346414F. [19:38:54] I have no idea how portable that number is. I bet it's just Linux. [19:39:09] I wonder if we do the right things in the kernel to set that. [19:40:45] other things appear to use MOUNT_AFS; for some reason aix sets it to 0 [19:40:46] Right, we had this discussion the *last* time we all tried to help the find maintainer. [19:40:56] That's why this all sounds very familiar. [19:40:58] --- Born Fool has become available [19:41:07] also f_fsid, but I assume we can't guarantee other filesystems stay away from our values [19:41:17] f_fsid should also change by volume. [19:41:30] I don't know if it actually does, but it's supposed to. [19:42:34] as in, a standard says something about it? I remember someone suggesting something like that at one point but that's all... [19:43:05] specifically, as a way to avoid traversing mountpoints or something, without needing afs-specific foo [19:43:08] f_fsid, inode is supposed to uniquely identify a file. [19:43:16] The way you'd do that in AFS is to make f_fsid vary by volume. [19:43:30] Essentially, f_fsid, ino is the statfs equivalent of AFS's FID. [19:43:51] It uniquely identifies a file across the whole system. 
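A small sketch, in Python for illustration, of what the f_type value quoted above looks like when its four bytes are read as ASCII. The constant name `AFS_F_TYPE_MAGIC` and the helper `magic_as_text` are made up for this example; only the number 0x5346414F comes from the discussion, and whether any platform other than Linux reports it is left open here, as it was in the chat.

```python
# Value quoted in the discussion as the statfs f_type for AFS.
# The constant name is illustrative only.
AFS_F_TYPE_MAGIC = 0x5346414F


def magic_as_text(magic: int, byteorder: str = "little") -> str:
    """Render a 32-bit magic number as the 4 ASCII characters of its bytes."""
    return magic.to_bytes(4, byteorder).decode("ascii")


little = magic_as_text(AFS_F_TYPE_MAGIC, "little")
big = magic_as_text(AFS_F_TYPE_MAGIC, "big")
print(little)  # OAFS
print(big)     # SFAO
```

The readable spelling only appears in little-endian byte order, which is one reason to be cautious about treating the number as portable.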
[19:44:39] inode numbers given to a client already depend on the volume number, so they're supposed to be unique, though they are not in reality [19:45:06] Oh, okay, I didn't know that. [19:45:20] Although that implies that f_fsid should be unique by cell. :) [19:45:29] Since then we could be *really* unique.... [19:49:51] kaduk@mit.edu/barnowl: afs_GetDownD call afs_GetDSlot, which calls a function pointer [19:49:57] "calls" [19:50:11] specifically, afs_cacheType->GetDSlot [19:51:45] Hm. Thanks for the pointer (pun intended, I guess). If I'm reading kgdb correctly, it was trying to call *0x38 . [19:53:30] That's 'OAFS' in little-endian byte order? I wouldn't expect that to be portable. [19:54:03] But I thought I already fixed the only bug that could be .... [20:05:53] Unless GetDownD somehow got called before dcacheInit had finished. [20:08:51] was this something that crashed very early? I wouldn't expect GetDownD should be called right away unless something was wrong [20:10:11] This was a general protection fault when starting afsd, yes. [20:13:03] cool. [20:13:36] I only saw it once, I think. [20:15:05] yeah, sounds like it could be the cache not init'd yet; structures not initialized so it thinks it's out of dcaches and tries to free them up [20:27:56] Other procs are in: vfs_mount rxk_Listener afs_RXCallBackServer [something with a corrupt stack-->affinity-->rxi_FineIfnet ...] afs_AFSDBHandler-->??-->afs_cachebasedir afs_Daemon-->afs_cachebasedir afs_CheckServerDaemon afs_BackgroundDaemon-->old_all_intvl.18405 (four procs) The offending proc was afs_CacheTruncateDaemon-->afs_GetDownD [20:30:35] (s/FineIfnet/FindIfnet/) [20:35:37] vfs_mount? does that perhaps mean it just finished calling our mount vfsop? [20:35:50] Possibly. [20:35:51] I assume vfs_mount is a fbsd function, which calls our mount function [20:36:05] Right, but it's not in *our* mount function at the time of the crash. 
[20:36:25] (long trace coming) [20:36:55] yes, but it may mean our mount function screwed up afs_cacheType or something else, and immediately afterwards something trying to use it blew up [20:37:18] (kgdb) bt #0 sched_switch (td=0xffffff0002b12000, newtd=0xffffff000231b390, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1865 #1 0xffffffff8059cb2d in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:449 #2 0xffffffff805d0f03 in sleepq_switch (wchan=0xffffff8028324fa0, pri=76) at /usr/src/sys/kern/subr_sleepqueue.c:530 #3 0xffffffff805d1aed in sleepq_wait (wchan=0xffffff8028324fa0, pri=76) at /usr/src/sys/kern/subr_sleepqueue.c:609 #4 0xffffffff8059d147 in _sleep (ident=Variable "ident" is not available. ) at /usr/src/sys/kern/kern_synch.c:234 #5 0xffffffff8060d924 in bwait (bp=0xffffff8028324fa0, pri=Variable "pri" is not available. ) at /usr/src/sys/kern/vfs_bio.c:3905 #6 0xffffffff8060d9a6 in bufwait (bp=0xffffff8028324fa0) at /usr/src/sys/kern/vfs_bio.c:3146 #7 0xffffffff80613a1e in bread (vp=Variable "vp" is not available. ) at /usr/src/sys/kern/vfs_bio.c:748 #8 0xffffffff807c3624 in ffs_vgetf (mp=0xffffff00029ddbe0, ino=117760, flags=94064, vpp=0xffffff803cf71500, ffs_flags=Variable "ffs_flags" is not available. ) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1520 #9 0xffffffff807ce4e9 in ufs_lookup_ (vdp=0xffffff000293ad20, vpp=0xffffff803cf71880, cnp=0xffffff803cf718a8, dd_ino=0x0) at /usr/src/sys/ufs/ufs/ufs_lookup.c:710 #10 0xffffffff806157a0 in vfs_cache_lookup (ap=Variable "ap" is not available. 
) at vnode_if.h:80 #11 0xffffffff808d3ef5 in VOP_LOOKUP_APV (vop=0xffffffff80be4a00, a=0xffffff803cf71620) at vnode_if.c:123 #12 0xffffffff8061c426 in lookup (ndp=0xffffff803cf71850) at vnode_if.h:54 #13 0xffffffff8061d3d9 in namei (ndp=0xffffff803cf71850) at /usr/src/sys/kern/vfs_lookup.c:264 #14 0xffffffff80620b3d in vfs_donmount (td=0xffffff0002b12000, fsflags=0, fsoptions=0xffffff803cf71ac0) at /usr/src/sys/kern/vfs_mount.c:873 #15 0xffffffff80621b21 in kernel_mount (ma=0xffffff005905a040, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/vfs_mount.c:2475 #16 0xffffffff8061fa04 in mount (td=Variable "td" is not available. ) at /usr/src/sys/kern/vfs_mount.c:793 #17 0xffffffff808817ee in syscall (frame=0xffffff803cf71c80) at /usr/src/sys/amd64/amd64/trap.c:990 #18 0xffffffff80867a51 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:373 [20:40:36] --- abo has left [20:41:06] Looks like that's where it is looking up the inode of the directory that will be the mountpoint, from a quick reading. [20:41:21] So, before our vfs_mount gets called. [20:41:21] --- abo has become available [20:43:02] can you see what afs_cacheType is? I assume it's null? [20:43:26] I don't have that symbol available, I think. [20:43:36] Oh, wait; typo. [20:43:56] (kgdb) p afs_cacheType $1 = 11708640 (kgdb) p *afs_cacheType Cannot access memory at address 0xb2a8e0 [20:44:48] huh; afs_MemCacheOps and afs_UfsCacheOps I assume are fine? [20:45:14] what about.... afs_cacheinit_flag, afs_cacheFiles, afs_cacheStats [20:46:14] (This is a memcache afsd.) (kgdb) p afs_MemCacheOps $2 = 10283536 (kgdb) p afs_UfsCacheOps $3 = 10583200 (kgdb) p afs_cacheinit_flag $4 = 1 (kgdb) p afs_cacheFiles $5 = 1439 (kgdb) p afs_cacheStats $6 = 2158 [20:48:35] well, afs_MemCacheOps is a struct.... does that just print out an int at that address? [20:48:43] &afs_MemCacheOps [20:49:16] but afs_cacheType is only set to one of those ops, and once... 
you have something writing garbage to that memory, I guess? [20:49:30] (kgdb) p &afs_MemCacheOps $7 = ( *) 0xffffff8000b2a8e0 (kgdb) p &afs_UfsCacheOps $8 = ( *) 0xffffff8000b2a880 I seem to have not saved all the debugging symbols I thought I did. [20:49:59] It doesn't have to be something writing garbage there; it could just be uninitialized. [20:50:14] erm [20:50:23] 11708640 is B2A8E0 [20:50:50] so, something is writing 0s to the other 4 bytes... or perhaps a casting error of some kind? [20:51:01] Though if gdb is thinking it's an int, and it's a pointer, that could be not quite right. [20:51:13] okay, or that [20:51:45] (kgdb) p *(void **)&afs_cacheType $10 = (void *) 0xffffff8000b2a8e0 [20:51:52] --- abo has left [20:52:19] "I give up" [20:52:25] --- abo has become available [20:52:28] well, check the pointers in there [20:52:31] specifically the 8th one [20:52:45] it could just be some other function pointer; GetDSlot is just one I remember being there [20:53:51] No idea if this actually corresponds to the layout of the struct, but: (kgdb) p *((void **)&afs_cacheType+1) $11 = (void *) 0x0 (kgdb) p *((void **)&afs_cacheType+2) $12 = (void *) 0x0 (kgdb) p *((void **)&afs_cacheType+3) $13 = (void *) 0xffffffff (kgdb) p *((void **)&afs_cacheType+4) $14 = (void *) 0xffffffff (kgdb) p *((void **)&afs_cacheType+5) $15 = (void *) 0xffffff800c0ee0d8 (kgdb) p *((void **)&afs_cacheType+6) $16 = (void *) 0xffffff800c0ee000 (kgdb) p *((void **)&afs_cacheType+7) $17 = (void *) 0xffffff800c0ee000 (kgdb) p *((void **)&afs_cacheType+8) $18 = (void *) 0xffffff800c0ee000 (kgdb) p *((void **)&afs_cacheType+9) $19 = (void *) 0xffffff800c0e0000 [20:53:58] or it's not a function pointer at all :) [20:56:09] those first four are.... 
interesting, assuming that pointer arithmetic is right [20:56:10] So, I guess I should not rely on my own reading: #8 0xffffffff80867773 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224 #9 0xffffff80009c2a07 in afs_GetDownD () from /boot/modules/libafs.ko #10 0xffffff80009c5be9 in afs_CacheTruncateDaemon () from /boot/modules/libafs.ko #11 0xffffff8000a1c39d in afs_syscall_call () from /boot/modules/libafs.ko (kgdb) x/i 0xffffff80009c2a07 0xffffff80009c2a07 : callq *0x38(%rax) [20:57:28] actually, maybe seeing why it's calling afs_GetDownD at all there would show something... [20:58:12] afs_CacheTooFull, afs_blocksUsed, afs_blocksDiscarded, afs_cacheBlocks, afs_freeDCCount, afs_discardDCCount, afs_cacheFiles ? [21:00:31] (kgdb) p afs_CacheTooFull $40 = 1 (kgdb) p afs_blocksUsed $41 = 0 (kgdb) p afs_blocksDiscarded $42 = 0 (kgdb) p afs_cacheBlocks $43 = 184192 (kgdb) p afs_freeDCCount $44 = 0 (kgdb) p afs_discardDCCount $45 = 0 (kgdb) p afs_cacheFiles $46 = 1439 [21:01:31] --- haba has become available [21:01:34] --- haba has left [21:02:03] --- haba has become available [21:02:39] I wonder if it's worth backing out the base address of the module and getting the offsets for those bogus "pointers", to see if they correspond to something interesting. [21:04:23] yeah, what was the bogus pointer again? [21:04:34] also, I don't see how afs_CacheTooFull could possibly be 1 there [21:05:01] it's only set to 1 if afs_CacheIsTooFull()... I'm trying to see a problem in that, but I don't see it [21:05:04] (afs.h) [21:07:08] Which bogus pointer? The 0xffffff800c0ee000 that was +{6,7,8} into afs_MemCacheOps? [21:07:27] I mean, the address we're trying to dereference [21:08:08] I don't think I ever conclusively nailed down what that was. [21:08:37] (And my gdb-fu is not that great.) [21:09:44] 'regs' or 'info regs' or something like that may give you %rax possibly? 
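A quick arithmetic check, sketched in Python, of the "11708640 is B2A8E0" observation from the earlier `p afs_cacheType` output. The variable names are illustrative. The two numbers are copied from the kgdb output in the chat; the interpretation (kgdb printing only the low 32 bits because it treats the symbol as an int without debug info) is the hypothesis raised in the discussion, not something this snippet proves.

```python
# Address printed by `p &afs_MemCacheOps` in the session.
addr_mem_cache_ops = 0xFFFFFF8000B2A8E0

# Decimal value printed by `p afs_cacheType` when kgdb had no type information.
printed_cache_type = 11708640

# Keep only the low 32 bits of the full 64-bit address.
low32 = addr_mem_cache_ops & 0xFFFFFFFF

print(hex(printed_cache_type))      # 0xb2a8e0
print(low32 == printed_cache_type)  # True
```

The match is consistent with afs_cacheType holding the full address of afs_MemCacheOps, which is also what the later `p *(void **)&afs_cacheType` output shows.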
[21:10:30] The panic message is not causing anything to jump out at me: Fatal trap 9: general protection fault while in kernel mode cpuid = 0; apic id = 00 instruction pointer = 0x20:0xffffff80009c2a07 stack pointer = 0x28:0xffffff803cfdf780 frame pointer = 0x28:0x21 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 1775 (afsd) [21:10:58] Ah, (kgdb) info reg rax 0xffffff80009c2a0a -549745579510 [...] [21:12:24] No guarantees that's the right frame, though. [21:12:53] In fact, it probably is not the right frame. [21:26:57] can you look at the instructions right before that call, though? what gets put into %rax? [21:27:03] although I think that might be as far as I go tonight [21:27:57] (kgdb) x/10i 0xffffff80009c2a07-30 0xffffff80009c29e9 : add %ecx,0xffffffffffffff83(%rax) 0xffffff80009c29ec : lds (%rcx),%eax 0xffffff80009c29ee : cmp 0x58(%rsp),%r12d 0xffffff80009c29f3 : je 0xffffff80009c2a88 0xffffff80009c29f9 : mov 0x30(%rsp),%rsi 0xffffff80009c29fe : mov 0x0(%r13),%rax 0xffffff80009c2a02 : mov (%rsi,%rbp,4),%edi 0xffffff80009c2a05 : xor %esi,%esi 0xffffff80009c2a07 : callq *0x38(%rax) 0xffffff80009c2a0a : mov %rax,%rbx [21:28:04] I should be getting home soon, too. [21:30:46] Wait, is this: 0xffffff80009c29fe : mov 0x0(%r13),%rax writing null into %rax? [21:31:25] no, storing the value of %r13 into %rax (afaik, I don't do assembly much) [21:31:49] you'd need to look further up to see what %r13 is, unless it's something special [21:32:05] that doesn't _look_ like what should be around the GetDSlot call, but I can't be too sure [21:32:08] I do assembly ~none, so I'm purely interpolating. [21:32:45] go back 30 more? or just dump the whole function into pastebin [21:33:06] neither. That is indirecting %r13; 0x0(%r13) means the value stored at offset 0 from the address in %r13. 
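To make the operand syntax concrete, here is a toy model in Python of the two instructions under discussion. In AT&T syntax, `disp(%reg)` names the memory at address `reg + disp`, and the `*` in `callq *0x38(%rax)` marks an indirect call whose target is read from that memory. Every address in the sketch is invented for illustration and none comes from the crash dump; `load64` is a hypothetical helper, not a gdb or kernel function.

```python
def load64(memory: dict, base: int, disp: int = 0) -> int:
    """Return the 8-byte value stored at address base + disp in a toy memory map."""
    return memory[base + disp]


# Made-up addresses, for illustration only.
OPS_POINTER_ADDR = 0x1000  # location holding a pointer to an ops table
OPS_TABLE_ADDR = 0x2000    # the ops table itself
FUNC_ADDR = 0x3000         # code address stored in the slot at byte offset 0x38

memory = {
    OPS_POINTER_ADDR: OPS_TABLE_ADDR,
    OPS_TABLE_ADDR + 0x38: FUNC_ADDR,
}

r13 = OPS_POINTER_ADDR
rax = load64(memory, r13, 0x0)           # mov   0x0(%r13),%rax
call_target = load64(memory, rax, 0x38)  # callq *0x38(%rax)

# rax + 0x38 is the address the target is READ FROM, not where execution continues.
slot_address = rax + 0x38

print(hex(rax), hex(slot_address), hex(call_target))
```

Under this reading, a fault on the call can come from `rax` being bogus (the slot cannot be read) or from the slot holding a bogus code address.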
[21:33:14] Have some more wall of text: (kgdb) x/30i 0xffffff80009c2a07-90 0xffffff80009c29ad : (bad) 0xffffff80009c29ae : (bad) 0xffffff80009c29af : decl 0xffffffffc758244c(%rbx) 0xffffff80009c29b5 : rexX and $0x50,%al 0xffffff80009c29b8 : add %al,(%rax) 0xffffff80009c29ba : add %al,(%rax) 0xffffff80009c29bc : test %ecx,%ecx 0xffffff80009c29be : mov %ecx,0x68(%rsp) 0xffffff80009c29c2 : je 0xffffff80009c2d3e 0xffffff80009c29c8 : mov 1518121(%rip),%r13 # 0xffffff8000b353f8 <__set_sysuninit_set_sym_M_AFS_uninit_sys_uninit+40536> 0xffffff80009c29cf : xor %r12d,%r12d 0xffffff80009c29d2 : xor %ebp,%ebp 0xffffff80009c29d4 : jmp 0xffffff80009c29f9 0xffffff80009c29d6 : mov 0x28(%rsp),%rcx 0xffffff80009c29db : cmpq $0x0,(%rcx,%rbp,8) 0xffffff80009c29e0 : je 0xffffff80009c2a6d 0xffffff80009c29e6 : add $0x1,%r12d 0xffffff80009c29ea : add $0x1,%rbp 0xffffff80009c29ee : cmp 0x58(%rsp),%r12d 0xffffff80009c29f3 : je 0xffffff80009c2a88 0xffffff80009c29f9 : mov 0x30(%rsp),%rsi 0xffffff80009c29fe : mov 0x0(%r13),%rax 0xffffff80009c2a02 : mov (%rsi,%rbp,4),%edi 0xffffff80009c2a05 : xor %esi,%esi 0xffffff80009c2a07 : callq *0x38(%rax) 0xffffff80009c2a0a : mov %rax,%rbx 0xffffff80009c2a0d : xor %eax,%eax 0xffffff80009c2a0f : mov 0x28(%rsp),%rdx 0xffffff80009c2a14 : cmpw $0x1,0xa4(%rbx) 0xffffff80009c2a1c : cmove %rbx,%rax [21:33:22] sorry, yes, the value of the memory pointed at by %r13 [21:35:09] Yeah; that is setting up for a function call through a pointer. [21:35:41] But how do I get the value of that pointer? [21:37:10] look at offset 0x38 from the address %rax; the value in there should be the address of where it's trying to call [21:37:46] After that instruction, %rax contains an address. The actual code that will start executing after the function call is 0x38 past that address. [21:37:46] give or take a level of indirection; I assume *0x38(%rax) is different than 0x38(%rax) the way I think it is [21:38:51] I think what I said is correct. 
That is, there is only one level of indirection in the call; rax contains an address to which a constant offset is added to get the place to jump to. But I could be wrong. [21:39:42] So, (kgdb) info reg rax 0xffffff80009c2a0a -549745579510 (kgdb) p/x 0xffffff80009c2a0a+0x38 $48 = 0xffffff80009c2a42 That? [21:39:53] I think so. [21:42:59] (kgdb) p &afs_MemCacheOpen $52 = ( *) 0xffffff80009cea10 (kgdb) p &afs_MemCacheTruncate $53 = ( *) 0xffffff80009cea90 (kgdb) p &afs_MemReadBlk $54 = ( *) 0xffffff80009cef50 (kgdb) p &afs_MemWriteBlk $55 = ( *) 0xffffff80009cf710 (kgdb) p &afs_MemCacheClose $56 = ( *) 0xffffff80009ce810 (kgdb) p &afs_MemRead $57 = ( *) 0xffffff80009ebf10 (kgdb) p &afs_MemWrite $58 = ( *) 0xffffff80009f2730 (kgdb) p &afs_MemGetDSlot $59 = ( *) 0xffffff80009c0af0 (kgdb) p &afs_MemGetVolSlot $60 = ( *) 0xffffff80009f34c0 (kgdb) p &afs_MemHandleLink $61 = ( *) 0xffffff80009f0f70 No match. [21:43:13] hm, you don't think 0x38 is indexing into the struct? [21:45:09] (kgdb) p *((void**)(&afs_cacheType+0x38)) $64 = (void *) 0x100000000 [21:45:16] that's also a possibility [21:45:37] Looks more general-protection-fault-worthy than the other one ... [21:46:47] what is &afs_cacheType ? [21:47:45] (kgdb) p &afs_cacheType $65 = ( *) 0xffffff8000b36ce8 [21:50:24] er, is that adding 0x38 adding 0x38 bytes, or indexing by 0x38 (void *)s? [21:50:58] (kgdb) p *((void**)&afs_cacheType+0x38) $66 = (void *) 0xc00000023 [21:52:43] The way it's written, it's indexing by 0x38 struct afs_cacheOps *'s [21:53:10] And I don't think that expression has anything to do with what the code is doing. What is in r13? rax? [21:53:43] What is the address of afs_MemCacheOps? of afs_UfsCacheOps? [21:54:11] (kgdb) p &afs_MemCacheOps $67 = ( *) 0xffffff8000b2a8e0 (kgdb) p &afs_UfsCacheOps $68 = ( *) 0xffffff8000b2a880 [21:54:35] r13 0xffffff803cfdf788 -548732536952 I'm still not sure if I trust 'info reg' to be what I want, though. 
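The question of whether `+0x38` means bytes or elements can be worked through numerically. The Python sketch below rests on two assumptions that are not confirmed in the chat: pointers are 8 bytes (amd64), and the members of the cache ops struct are laid out in the same order in which their addresses were printed above. The helper names are hypothetical.

```python
POINTER_SIZE = 8  # assumption: amd64, 8-byte pointers

# Assumption: struct member order matches the order printed in the chat.
OPS_MEMBERS = [
    "afs_MemCacheOpen", "afs_MemCacheTruncate", "afs_MemReadBlk",
    "afs_MemWriteBlk", "afs_MemCacheClose", "afs_MemRead",
    "afs_MemWrite", "afs_MemGetDSlot", "afs_MemGetVolSlot",
    "afs_MemHandleLink",
]


def slot_for_byte_offset(offset: int) -> int:
    """Map a byte displacement such as 0x38 to a pointer-slot index."""
    index, remainder = divmod(offset, POINTER_SIZE)
    if remainder:
        raise ValueError("offset is not pointer-aligned")
    return index


def scaled_address(base: int, count: int, elem_size: int = POINTER_SIZE) -> int:
    """Address reached by C/gdb pointer arithmetic: base + count * elem_size."""
    return base + count * elem_size


index = slot_for_byte_offset(0x38)
member = OPS_MEMBERS[index]
print(index, member)  # 7 afs_MemGetDSlot

addr_cache_type = 0xFFFFFF8000B36CE8  # from `p &afs_cacheType` in the session
examined = scaled_address(addr_cache_type, 0x38)
print(hex(examined - addr_cache_type))  # 0x1c0
```

So a byte displacement of 0x38 corresponds to the eighth pointer slot, whereas `(void**)&afs_cacheType+0x38` in gdb steps 0x38 elements, that is 0x1c0 bytes, and looks at unrelated memory.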
[21:55:22] oh, right, kgdb [21:55:49] --- dwbotsch has left [21:56:01] I think maybe it's not :-( [21:56:48] could you put the GetDownD disassembly on pastebin, though? I might actually be able to see around where in the code that is [21:57:10] Sadly, #bsddev doesn't seem to have anyone who wants to give me voodoo kgdb commands. [21:57:22] Do you have a preferred command that will snag the whole thing? [21:57:56] --- abo has left [21:58:30] --- dwbotsch has become available [21:58:39] --- abo has become available [21:58:40] I'm not sure; but the commands you've been running but with like -800 instead of -90 or something I imagine would contain it [22:02:47] /afs/sipb.mit.edu/user/kaduk/freebsd/openafs/afs_GetDownD.S [22:04:13] actually, try 'x/400i afs_GetDownD' [22:05:33] updated in-place [22:10:14] okay, now I think it does look like the GetDSlot call [22:10:36] GetDownD+720 looks like the if (!victimDCs[i]) check [22:13:50] but that's definitely it for me tonight, I'm done [22:14:03] okay. Thanks for puzzling through with me. [22:14:06] --- deason has left [22:16:32] --- Born Fool has left [22:29:18] --- jaltman has left: Disconnected [22:40:41] --- jaltman has become available [22:57:11] --- kaj has left [23:24:27] --- jaltman has left: Replaced by new connection [23:24:28] --- jaltman has become available [23:34:08] --- haba has left [23:43:30] --- dwbotsch has left [23:45:22] --- dwbotsch has become available [23:59:53] --- kaj has become available