[00:21:54] --- abo has become available [01:21:50] --- haba has left [02:42:15] --- haba has become available [02:42:29] --- Jeffrey Altman has left: Replaced by new connection [02:44:09] --- jaltman has left: Replaced by new connection [02:44:09] --- jaltman has become available [04:59:25] --- haba has left [05:09:35] --- haba has become available [05:22:19] --- jaltman has left: Disconnected [05:22:27] --- jaltman has become available [06:20:29] --- deason has become available [07:27:01] --- Simon Wilkinson has become available [07:43:29] Having just deployed the newest Firefox 3 (which may be unrelated), we're finding that the fileservers that host our homedirectories are melting down. [07:43:54] any data? [07:43:59] The fileserver process is blocking in I/O wait. On the clients, Firefox seems to block on unlink() of an sqllite journal file. [07:44:30] We _did_ have callback sizing issues. Having restarted the fileservers with a larger number of callbacks the problems are persisting. [07:44:53] It is possible that our disk arrays just aren't fast enough, but I'm interested in suggestions of what to look at... [07:45:02] unlink of a journal? that sounds familiar. [07:45:19] what fileserver version? [07:45:31] Hang on. I suspect we're both not going to like this answer ... [07:45:44] lsof a hung fileserver and look for file locks? [07:46:05] No, 1.4.11 [07:46:12] Fileserver isn't hung, it's just very slow ... [07:46:27] what's in the log? [07:46:42] (the issue i was thinking of you slowly used up threads until all were gone) [07:47:00] rxdebug says nothing's waiting for a thread. [07:47:38] ok. um. [07:47:55] and the log? [07:48:21] Pretty much nothing interesting in FileServer. [07:48:25] FileLog, even. [07:48:41] so no clients being told off because they're using too many threads. [07:48:57] Doesn't seem that way. [07:48:57] then 2 things: retransmits? [07:49:08] and what does strace say? [07:49:22] strace on the fileserver? Which thread? [07:49:56] oh god, you have that strace? [07:50:02] er, that linux? [07:50:12] RHEL5 - I'm pretty sure I get to strace a thread. [07:50:24] try? [07:51:22] restart_syscall(<... resuming interrupted call ...> [07:51:59] is that the parent or a thread? [07:52:04] Okay, so where on earth are retransmits in the rxdebug outpu. [07:52:09] That's the parent. [07:55:07] -rxstats flag? [07:55:28] rx stats: free packets 731, allocs 7484229, alloc-failures(rcv 0/0,send 0/0,ack 0) greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0 packets read: data 6988893 ack 370458 busy 0 abort 10 ackall 0 challenge 87 response 512 debug 1947 params 0 unused 0 unused 0 unused 0 version 0 other read counters: data 6988770, ack 370434, dup 40 spurious 24 dally 0 packets sent: data 467340 ack 3673645 busy 123 abort 3253 ackall 0 challenge 512 response 87 debug 0 params 0 unused 0 unused 0 unused 0 version 0 other send counters: ack 3673645, data 934680 (not resends), resends 90, pushed 0, acked&ignored 369125 (these should be small) sendFailed 0, fatalErrors 1 Average rtt is 0.008, with 135834 samples Minimum rtt is 0.000, maximum is 58.982 335 server connections, 339 client connections, 262 peer structs, 246 call structs, 159 free call structs [07:55:38] (still can't see where retransmits are in that lot) [07:55:45] Ah - resends? [07:55:51] so the fileserver is not resending. are there a lot of resends *to* the fileserver? [07:56:16] (you'll need to parse tcpdump or probe clients) [07:56:18] I guess I can't easily tell that without interrogating all of the clients? [07:56:26] correct [07:57:21] --- deason has left [07:57:58] No. [07:58:41] 0, 4, ... that kind of number - less than .01% of the total number of packets transmited by the client [08:01:40] i guess sample some threads with strace [08:01:43] i should change locations [08:02:37] I'll take a look at that. It looks like we're getting tied up with our disk subsystem not responding quickly enough. But i wonder what's changed about our usage pattern to cause that. [08:08:43] --- squinney has become available [08:25:18] and, one flat bike tire later... [08:25:31] Ouch. [08:26:00] bike shop in the next block; i'll get a new tube later [08:26:31] did strace show anything? [08:27:01] Nothing that interesting. I'm about to try with gcore, and see exactly what the stacks are for all of the threads. [08:30:07] --- reuteras has left [08:38:43] --- dev-zero@jabber.org has become available [08:38:48] --- dev-zero@jabber.org has left: offline [08:58:32] --- kaj has left [09:03:04] --- Russ has become available [09:07:24] --- squinney has left [09:11:30] So, further investigation suggests that it's unlink() that the fileserver is blocking in. Either in the fsync in ih_reallyclose(), or in the underlying fs's unlink() command. [09:12:04] ext3? [09:12:33] Yup. I think we're going to change that. [09:12:40] try flushing your journals? [09:13:11] this is "known" (ext3 metadata performance sucks) [09:14:08] Yeh. But it's suddenly got really sucky. My suspicion is that this is related to the firefox we rolled out last week, which uses sqlite3 which is constantly creating, and unlinking, journal files. [09:14:29] likely [09:39:54] --- haba has left [09:46:55] --- Russ has left: Disconnected [09:51:32] --- deason has become available [09:53:52] --- deason has left [10:04:33] --- Russ has become available [10:19:15] has anyone successfully used integrated login with cross-realm on Win 7? [11:11:57] I haven't tried it but if your cross-realm configuration (KFW or MSLSA) works, it should work for integrated login. [11:46:54] unfortunately, it does not [11:48:52] what is "it"? [11:49:02] cross realm integrated login [11:49:03] and what have you done to debug "it"? [11:49:17] using KFW or MSLSA? [11:49:21] as of yet, tcpdump to see where network packets were going [11:49:37] KfW is installed, as for which integrated login attempts to use, I do not know [11:49:42] I would start with turning on the logging for integrated login [11:49:43] --- Simon Wilkinson has left [11:50:35] if it is 64-bit Win7, you must be using a 64-bit KFW [11:50:38] it's the same setup we have under XP ... KfW with an appropriate krb5.ini and the ms lsa kerberos also has the KDCs configured for the realms [11:50:47] nope... decided to start w. 32-bit win7 to keep it simpler [11:51:51] I know that the integrated login is talking to the wrong kerberos realm from tcpdump (it's trying the kerberos realm of the afs cell) [11:52:22] anyway, yes... the integrated login debug on is the next step [11:52:55] have you configured integrated logon properly? registry keys that indicate which realm should be used for the domain account that is used for logon? [11:53:07] --- Simon Wilkinson has become available [12:10:47] --- jaltman has left: Disconnected [12:17:06] --- jaltman has become available [12:19:48] yes... checked that it's configured the same as xp [13:27:29] that is not a very useful statement to me. I do not know how your XP system is configured. I do not know how you are logging onto the machine. I do not know what realm you believe should be contacted and which realm is being contacted. I do not know what is in your krb5.ini or whether the krb5.ini is being loaded. I don't even know which KFW you are using. [13:30:07] can you post the details of your configuration somewhere along with the log info from the Windows Application Log when tracing for the afslogon is turned on? [13:39:57] yeah, I'll do that [13:40:05] haven't gotten that far w. stuff, yet [14:30:40] --- deason has become available [14:39:42] jaltman - in /afs/cnf.cornell.edu/shares/public/outside_users/afs_win/win7-il .. there is the exported AFS Logon application logs, the TransarcAFSDaemon registry entries exported, and a quick description of our setup (3 files) [15:31:23] you didn't include your krb5.ini [15:31:37] nor the KSETUP configuration [15:35:15] --- deason has left [15:35:52] the KFW_AFS_get_cred() call is being made with a username of "dwb7" not "dwb7@REALM" so the realm is not being determined via the registry configuration [15:37:01] the domain being used for logon is CNFAD which is not any of the domains for which you have configuration data specified in the registry. [15:46:31] --- RedBear has become available [15:46:58] hmmm... the auto logging thing when signing on is lacking in some data [15:47:14] "(01/19/2010 12:13:11 PM) shadow@gmail.com/owl81CD615B: this is "known" (ext3 metadata performance sucks)" is the last thing it shows me [15:50:52] sorry [15:51:24] eh... jaltman didn't respond to my posting the location of the win stuff, did he? (since I went home after that) [15:51:42] i dunno. hang on, lemme see what i have if anything [15:51:59] he said no krb5.ini, no ksetup? [15:52:07] http://conference.openafs.org/openafs@conference.openafs.org/2010-01-19.txt [15:53:15] awesomeness... I'll have to respond to that when my stomach is not rumbling [15:56:30] --- meffie has become available [15:58:23] --- meffie has left [16:03:35] --- dev-zero@jabber.org has become available [16:19:48] --- dev-zero@jabber.org has left [16:20:03] --- dev-zero@jabber.org has become available [16:53:02] * Russ iterates on getting the Debian package for 1.5.69 to build. [17:40:23] --- deason has become available [18:23:00] --- Russ has left: Disconnected [18:29:49] --- Russ has become available [18:31:13] Well, that didn't go well. [18:31:43] Immediate kernel panic followed by hard system hang when loading the 1.5.69 Linux client. will collect the kernel output and open a bug. [18:33:46] "Can't init cache with inode numbers when complex inodes are in use" [18:33:48] Eh? [18:33:58] I haven't enabled anything special that I know of. [18:34:10] Also, kernel-panicing and then hard-crashing the system is nasty error behavior. [18:48:19] Anyway, bug submitted. [18:48:35] The source around that error message didn't enlighten me. [18:48:47] --- mdionne has become available [18:48:50] Hm, it could be that I need to force-wipe the cache when goingn from 1.4 to 1.5? [18:50:07] russ: I would say you're running a 1.4 afsd [18:50:18] yes, that sounds like a libafs/afsd version mismatch at first glance [18:50:43] Fairly positive I was not running a 1.4 afsd. [18:50:48] the 1.4 afsd is sending inode numbers to the kernel module, shich expects file names... [18:51:48] 2010-01-19 18:18:30 status triggers-awaited openafs-client 1.5.69+dfsg-1 2010-01-19 18:18:30 trigproc man-db 2.5.6-5 2.5.6-5 2010-01-19 18:18:30 status half-configured man-db 2.5.6-5 2010-01-19 18:18:30 status installed openafs-client 1.5.69+dfsg-1 2010-01-19 18:18:32 status installed man-db 2.5.6-5 [18:51:57] which is before the kernel panic. [18:53:24] Jan 19 18:19:16 windlord kernel: [31467.911542] Starting AFS cache scan...<2>openafs: Can't init cache with inode numbers when complex inodes are in use [18:53:43] Unless dpkg somehow totally failed, I don't see any way that the /sbin/afsd on the system at that time was from 1.4. [18:54:57] 'strings' I would trust... marc, what about the call from AFSOP_GO? that passes a null afile [18:55:22] but I'm probably misinterpreting that incorrectly, I'm not used to looking at that area [18:55:26] well, I can tell you what strings says for what's in the afsd in the package that I installed with dpkg -i. [18:55:30] err, interpreting it incorrectly [18:55:42] windlord:~/dvl/debian/build-area/openafs-client/sbin> strings afsd | grep OpenAFS @(#) OpenAFS 1.5.69 built 2010-01-20 [18:55:50] I backed out of the package installation to get a running system, though. [18:56:03] I can try again with a laptop or VM. [18:56:24] can you check if you have LINUX_USE_FH = 1 in src/afsconfig.h, just in case it's maybe a configure test gone wrong [18:57:12] We don't install that header file, do we? [18:57:37] * Russ isn't sure that he has a copy of that from the build anywhere. [18:57:38] ah, no, it would be there during the build. [18:57:39] I do have the build log. [18:57:52] With configure output. [18:58:07] Problem with building in chroots -- they go away after the build. [18:58:31] Let's see -- you want the build log from the module. [18:58:40] what is LINUX_HAVE_IGET from configure? [18:59:07] Lovely -- did DKMS dump the build log as well? [18:59:26] It appears that it did. Well, crap. [18:59:39] Let me try installing the modules-source package and building with this kernel on the grounds that should produce the same result. [19:00:15] well, the kernel has LINUX_USE_FH defined; that panic only appears with it, if I'm reading correctly [19:00:42] --- abo has left [19:00:55] --- abo has become available [19:01:05] the question is afsd; strings would show "%s/D%d/V%d" if you built with LINUX_USE_FH, but I don't know if there's a string like that elsewhere anyway even without it.... [19:02:07] #define LINUX_USE_FH 1 /* #undef HAVE_IGET */ is the result of running configure on the same module source. [19:02:19] Ok, that looks right [19:02:40] This *should* be, but might not be, the same result as I'd gotten before from DKMS. [19:03:21] I don't see how you could get to that error message unless you had an afsd built without USE_FH [19:03:54] Is there a configure test for the afsd build whose output would determine that? [19:04:04] can you try starting afs with that afsd that you just built? [19:04:21] I didn't just build an afsd, but rather a new kernel module. [19:04:50] What sets USE_FH for afsd? [19:05:00] the LINUX_USE_FH question was about the afsd build... hence my earlier comments [19:05:15] USE_FH is based on HAVE_IGET [19:05:18] Right, but afsd is built with --disable-kernel-module. [19:05:38] So I'm not sure I'm following how that gets set for the afsd build. [19:05:55] ah, that might be the problem. as someone pointed out before, the afsd build shouldn't depend on kernel configure tests [19:06:28] Ack, yeah. Since the Debian package build can't really depend on the kernel headers, there isn't any way of knowing what kernel you're building with when building the user space. [19:06:46] Why isn't Simon seeing this same problem, I wonder? [19:06:57] --- abo has left [19:07:16] --- abo has become available [19:07:32] his build system probably does it differently. i think adam megacz pointed this out [19:08:09] What kernel version introduced iget, do you know? [19:08:16] I think we should make afsd always use a single method - file names, regardless of kernel [19:08:22] 2.6.25 lost iget [19:08:34] Oh, it *lost* it. [19:08:52] ...or always use numbers, and interpolate the string from the number when we don't have iget [19:08:52] Yeah, then I think we should turn LINUX_USE_FH off by default when building with --disable-kernel-module. [19:08:59] yup, that's why we had to switch to a different method [19:09:05] Since that's what the current kernels expect. [19:09:17] Although a more comprehensive solution that works either way would be great. [19:10:14] deason: yeah, that's what the VNODE_PATH code does, that would be an alternative [19:10:57] you mean have USE_FH _on_ by default? [19:11:07] and I have suggested using that for linux before, but I wouldn't be the one doing the work to make it so, so I don't want to be loud about it [19:11:28] Yes, sorry. [19:11:41] I see, USE_FH is the non-inode method. [19:11:48] So it's the inverse of HAVE_IGET. [19:11:58] right, it uses file handles instead of inode numbers [19:12:18] Is there any reason not to always use file handles? [19:12:24] Do file handles not work with pre-2.6.25 kernels? [19:12:51] the current API we use only appeared in 2.6.24, so we'd have to adjust to the older API if we want to use it with older kernels [19:13:09] wasn't the original patch for the old api? i forget [19:13:31] * Russ updates the bug for those playing along at home. [19:13:48] no, the original patch used the current API [19:14:55] so we should have a capability interface on the AFSOPs [19:15:07] Do we currently have a configure probe for that API so that we know when we can't use it, or do we currently assume that if iget() isn't available, we can use it? [19:15:37] We should definitely have a smarter afsd to kernel interface. [19:15:51] Right now USE_FH is based on HAVE_IGET, and if we don't have iget we knoe the kernel is recent enogh to use the current API [19:16:24] Okay. Hm. I think either Andrew or Derrick's solution sounds best to me. [19:16:40] Make the kernel able to cope with either form, or provide a way for afsd to ask the kernel what it wants. [19:16:47] a runtime check by querying the kernel if we can use the new thing makes sense to me [19:16:51] if that's what derrick was saying [19:17:11] How hard is it to add a capabilities interface? [19:17:38] well, the "interpolate the string from the number" isn't possible the old AFSOPs, I don't think, since we don't know the number of cache dirs etc [19:17:53] it is. afsd can be compiled to be able to do either easily enough [19:17:58] (that was responding to Russ's earlier statements, not an answer to that question) [19:18:50] I don't think any of the solutions should be that hard to implement... [19:19:00] the question is do we get to return 8 bits or 32 from the system call? [19:19:37] What's the AFS_CACHE_VNODE bits? [19:19:43] i'd just allocate a bitmask, add a new AFSOP with a callin of a bitmask and return as many bits as we're allowed. [19:19:50] the which what? [19:20:16] you mean the thing we use on macos? [19:20:17] Lots of the code in afsd that's conditional on USE_FH is also conditional on AFS_CACHE_VNODE_PATH. [19:20:26] it's what macos uses [19:20:28] Oh, okay. [19:20:35] AFS_C_V_P is the path-based code used everywhere non-linux [19:20:41] we lost our open-by-inode interface on macos [19:20:42] which right now is I believe mac and solaris [19:20:56] yeah, i think doug reused the code for solaris [19:21:20] * Russ would be willing to try to add this new afsd system call if someone is willing to give me pointers on where to start. It would be a good learning experience. [19:21:37] But first, I must go home, as I've been at work for rather too long. [19:21:41] you need to define a new AFSOP, which will be a header in src/config, probably afs_args.h [19:21:49] Back on from there. [19:21:51] --- Russ has left: Disconnected [19:22:03] i believe unlike stop codes the startup OPs are not ordered. [19:22:05] I have never understood why we have a separate interface just for linux that does basically the same thing, but I understand there are ... complexities to the linux code [19:22:14] (i'll keep talking; i assume you will read the log) [19:22:58] we could use the same interface as VNODE_PATH, can't recall exactly why I did it differently [19:23:24] it will take one parameter as an input, an afs_uint32, in call_syscall in afsd.c. it will return what it can; you'll have to confirm whether it's 8 or 32 bits. return 0xffffffff and check the code returned by call_syscall to see [19:23:48] did USE_FH exist before VNODE_PATH? [19:23:50] if USE_FH is defined, return 1. if not, return 0. (confirm that unimplemented returns -1; i believe it does) [19:24:16] call it early in afsd, before setting up cache files. if you get 1 (USE_FH), call that path. otherwise don't [19:24:46] istr CACHE_VNODE_PATH was first but i should just git annotate [19:24:47] VNODE_PATH was there before - introduced for MacOS I think [19:25:15] it was. we needed it for the kpi, and then volfs got a reprieve after the last seed [19:26:10] shadow: if we wanted a more general caps interface, couldn't we do something where we pass in a user buffer and a len, and we set as many cap bits as we know of, and have the len for? [19:26:18] if we wanted to be able to use this in a more generalized way, that is [19:27:57] could do that. it's more complicated. [19:29:13] in which case, easy enough to pattern after AFSOP_AFSDB_HANDLER, in which case we know all the bugs are solved (note that this didn't work on AIX for... a long time) [19:30:01] just a thought; dunno how many caps we'd really need in this space anyway [19:30:32] i'd hope few, but i'd hate to be wrong [19:35:28] any interest in adapting the linux code to use VNODE_PATH in afsd? I'd worry about adding yet another possible variation in the mix.. [19:37:56] it's probably doable [19:39:55] --- Russ has become available [19:40:43] having the capability exchange would help prevent possible mismatches. that's probably a fairly easy change - I'll have a look [19:41:39] lemme look at what the difference inside the kernel is. if we have the capability interface we can also change osx [19:43:51] So you have to make a probe call first that returns 0xffffffff and then use that to figure out how big of a data structure is being returned, and then do the actual capabilities call? [19:44:17] Wouldn't one return the whole bitmask and let afsd figure out what capabilities are advertised? [19:44:23] I might be misunderstanding what you're saying. [19:46:04] --- mdionne has left [19:59:05] no, the 0xffffffff is a one time test to see how many bits you get [19:59:27] like, code for you to try once, if you're implementing this. [19:59:33] Oh, I see. [19:59:39] Why do we not know how many bits we get? [20:00:46] because i don't know off the top of my head whether you actually get 32 bits back from a system call or if the trampoline stuff we have to do only passes 8 [20:02:53] Ah, okay, I see. [20:02:59] And it's a good experiment. [20:08:12] i think macos could do the linux way. in fact i am unsure why i did macos the macos way, since i'm 99% certain it's my fault [20:14:58] Do we still want the capability interface and then the Mac OS X part is separate? [20:17:49] the capability intterface is needed for linux, mac thing does nothing for use_fh versus not [20:18:55] Right. [20:18:57] Okay. [20:41:15] ah, right. because osi_UFSOpen opens a path, not e.g. an fh. well, lemme see what i can do here. [20:42:58] --- deason has left [20:52:18] --- dev-zero@jabber.org has left: Replaced by new connection [20:52:20] --- dev-zero@jabber.org has become available [21:20:42] yeah, i gotta stash a path in the "ufs" field to make this work. ick. [21:23:30] i guess it's not so bad [21:35:01] --- jaltman has left: Replaced by new connection [21:35:02] --- jaltman has become available [21:55:09] i have code for merging darwin/solaris and linux path-based cache. might work. i'll try it in the morning [21:56:17] Cool! [21:58:05] well. the afsd builds even with warnings=errors. i wonder what will happen when i hit the kernel module. [21:58:56] (basically, what this does: defines f.inode.ufs to be char*. strdups the path passed in. opens it directly in osi_UFSOpen. at shutdown, free allocated memory) [22:04:06] f18a6c3 breaks ukernel. i'll fix it in the morning if there's no patch from simon already. [22:04:47] --- reuteras has become available [22:24:59] --- RedBear has left [23:13:33] --- dev-zero@jabber.org has left