[00:03:06] --- Russ has left: Disconnected
[01:05:20] --- shadow@gmail.com/owlB65F7D75 has left
[01:43:07] --- kula has left
[02:08:54] --- sxw has become available
[02:18:41] --- dwbotsch has left
[02:19:10] --- dwbotsch has become available
[02:56:21] --- sxw has left
[02:57:39] --- sxw has become available
[03:47:39] --- Claudio Bisegni has become available
[04:20:17] --- Claudio Bisegni has left
[04:22:58] --- dwbotsch has left
[04:23:39] --- dwbotsch has become available
[04:48:55] --- kula has become available
[05:37:10] --- Jeffrey Altman has left: Replaced by new connection
[06:50:18] --- shadow@gmail.com/owl7BB801BA has become available
[07:03:13] --- dev-zero@jabber.org has become available
[07:03:18] --- dev-zero@jabber.org has left: offline
[07:32:15] --- Jeffrey Altman has become available
[10:10:19] --- haba has left
[10:12:23] --- andersk@mit.edu/dr-wily has become available
[10:16:07] Linux commit 17f98dc (v2.6.31-rc1~196) unexports find_task_by_vpid. Apparently this causes the openafs module to fail to load: https://bugs.launchpad.net/bugs/420632 although I can’t reproduce on Karmic amd64.
[10:17:23] it probably does. got suggested patch?
[10:17:58] i'm having one of those moments where i'd like to focus on bugfixes and where i don't consider "linus broke it" our bug.
[10:20:41] Hmm. find_task_by_vpid(vnr) was previously equivalent to pid_task(find_pid_ns(vnr, current->nsproxy->pid_ns), PIDTYPE_PID) but pid_task and find_pid_ns are both EXPORT_SYMBOL_GPL.
[10:22:20] what? functionality which was available today is gone tomorrow behind GPL restrictions after linus said that wouldn't happen? never!
[10:30:03] I guess I'll just open a ticket for now.
[10:32:19] i wish linux would just stop morphing, actually.
[10:32:43] if we had today's linux and had to live with it for a while, how horrible would that be, i wonder.
[10:32:45] oh well
[10:34:18] --- Russ has become available
[10:36:55] We can just kill all of that code when we drop support for syscall probing in kernels with keyrings, I think.
[10:37:57] That commit looks very familiar. I'm pretty sure we already worked around it.
[10:40:36] --- kula has left
[10:49:30] --- kula has become available
[10:50:59] --- dev-zero@jabber.org has become available
[11:21:19] --- sxw has left
[11:51:03] --- dev-zero@jabber.org has left
[11:56:50] --- brantgurga has become available
[12:11:07] --- dev-zero@jabber.org has become available
[12:18:58] --- haba has become available
[12:29:43] --- dev-zero@jabber.org has left
[12:31:37] --- dev-zero@jabber.org has become available
[12:36:41] --- dev-zero@jabber.org has left: offline
[13:26:52] --- brantgurga has left
[14:01:32] --- mdionne has become available
[14:35:56] --- mdionne has left
[14:43:43] --- haba has left
[16:10:04] --- kaduk@mit.edu/owl has left
[16:11:07] --- kaduk@mit.edu/owl has become available
[16:43:18] --- Jeffrey Altman has left
[17:15:30] --- matt has become available
[17:34:57] --- Jeffrey Altman has become available
[18:10:02] --- Russ has left: Disconnected
[18:29:31] --- Russ has become available
[19:54:12] --- deason has become available
[19:57:04] For the rxk5 work I believe that we either need to set up a new branch within the openafs git that tracks master for it and permit gerrit to manage it, or we need to set up a secondary git repository and gerrit instance that can host that work until the protocol is standardized and consensus on the implementation is agreed upon. I think that, due to the long history of rxk5 being developed within the openafs cvs repository and our relationship with Marcus and Matt, we should host the work within the openafs instances.
[19:59:15] I believe that OpenAFS "master" should only have code pushed to it that is ready for the next major release. Git makes it easy enough for us to create tracking releases that, once it is deemed ready for production-level testing, we can generate rxk5 testing distributions that differ from "master" only by rxk5.
[20:06:11] The only concern that I have with managing the branch entirely with Gerrit is that I'd like to do merges from master outside of Gerrit, since individually approving each merged commit would suck.
[20:06:32] and we really want to aggressively merge master into that branch if we hope to merge it back into master eventually.
[20:07:14] Maybe we can do some sort of hybrid thing where we use Gerrit to manage the regular patches and do the merges outside of Gerrit.
[20:07:15] We are going to have that problem with any public repository.
[20:07:18] so what is this ptclient?
[20:07:42] I'm doing tracking builds and I pull --rebase every day
[20:07:57] Yeah, I think providing the branch is a good idea. Just am not sure how to do that part of the mechanics.
[20:08:11] It's a very powerful technique but you can't use it with a public repository that you are making available for others to base their work off of
[20:08:22] --- dwbotsch has left
[20:08:55] --- RedBear has become available
[20:08:56] Maybe we generate a new rebased branch every week or something and live with merge commits in between
[20:09:56] [C:\src\openafs\openafs.git\repo\dest\amd64_w2k\free]root.server\usr\afs\bin\ptclient.exe
    Using CellServDB file in C:/PROGRA~3/OpenAFS/Client
    Making unauthenticated connection to prserver
    pr> ?
    cr name id owner - create entry with name and id.
    wh id - what is the offset into database for id?
    du offset - dump the contents of the entry at offset.
    add uid gid - add user uid to group gid.
    iton id* - translate the list of id's to names.
    ntoi name* - translate the list of names to ids.
    del id - delete the entry for id.
    dg gid - delete the entry for group gid.
    rm id gid - remove user id from group gid.
    l id - get the CPS for id.
    lh host - get the host CPS for host.
    lsg id - get the supergroups for id.
    m id - list elements for id.
    nu name - create new user with name - returns an id.
    ng name - create new group with name - returns an id.
    lm - list max user id and max (really min) group id.
    smu - set max user id.
    smg - set max group id.
    sin id - single iton.
    sni name - single ntoi.
    fih name - fix id hash for .
    fnh id - fix name hash for .
    q - quit.
    ?- this message.
    pr>
[20:10:26] Well, one of us with direct push ability could do the merge, although if there are conflicts, I wouldn't know how to resolve them.
[20:10:30] ok, so, what's the advantage of it over the normal pts command?
[20:10:51] Usage is: 'prclient [-testconfdir | server | client] [0 | 1 | 2] [-ignoreExist] [-cell ]
[20:11:52] > ok, so, what's the advantage of it over the normal pts command?
    It's not intended for general use. It's a fairly low-level tool for manipulating the PRDB.
[20:12:49] k
[20:12:52] The advantage is that it has operations like fih and fnh
[20:14:04] I'm not sure we should ship it in the general package. Possibly in a separate package of power tools
[20:15:47] are there other power tools?
[20:16:48] I'm never sure what to do with stuff like that.
[20:16:57] Debian ships pt_util since it uses it for the database bootstrap.
[20:17:13] I'm including readvol and voldump since I don't see a good reason not to.
[20:17:27] yeah, that's in the redhat packages (/usr/afs/bin)
[20:17:49] voldump is, but readvol is not
[20:18:00] pt_util is useful. I'm not sure ptclient really has much purpose, so I'm not sure I'd bother building it at all. OTOH, it's not like most windows users/afsadmins are in a position to build their own.
[20:18:11] windows ships pt_util and ptclient in the server package
[20:18:29] there are still a lot of tools in the tree that do not get built on windows
[20:18:36] There are the db_verify tools too.
[20:18:49] those could be very useful
[20:18:56] prdb_check and vldb_check.
[20:19:16] both are already there under linux
[20:19:21] prdb_check has an option that spits out a ptclient script, so it's kind of weird to include prdb_check and not ptclient.
[20:19:34] neither of those are built on windows at the moment
[20:19:38] tho, some of these are part of the openafs-server rpm
[20:20:32] actually, seems they all are...tho most, but not all, are in sbin (but that's just a packaging issue)
[20:22:32] both of those are built on windows but they aren't installed
[20:22:46] that can be fixed
[20:27:20] Jeff - had a computer which was locking up as we previously discussed, but this time approximately every 45 minutes (starting with Eudora freezing first, before the rest of the system freezes up)... got an fs minidump but not a memdump out of it
[20:27:31] tho, you were thinking minidump wouldn't help anyway...
[20:27:55] anyway, deleted the afscache file and restarted, and that seems to have helped it for the time being... I'm sure it'll lock up again sometime next week
[20:28:42] I should get to 1.5.62, tho
[20:29:52] the problem I am sure you are seeing is a deadlock in microsoft's code.
[20:30:27] what's interesting is that we never saw this before the smb hotfix (and oafs 1.5.60, since those were both done at the same time)
[20:30:49] Microsoft has a lot of issues in their code
[20:31:03] You have no idea how badly I want to be able to ship an afs redirector
[20:31:15] It was being tested this week at Microsoft
[20:31:27] how'd that go?
[20:31:37] There is still work to do
[20:31:52] at this point, you have no idea how badly I want you to be able to ship an afs redirector :P
[20:32:59] For Win7 we don't have a choice. The SMB interface will not be able to execute applications out of AFS
[20:33:27] that a security-ish thing?
[20:33:38] what is taking so long is that we literally had to throw out the design and start over again.
[20:34:27] The Win7 smb server will not permit execution of code from an untrusted server, and since \\AFS cannot be authenticated in a way that can be trusted by Windows, we lose.
[20:34:43] --- abo has left
[20:34:55] --- abo has become available
[20:35:12] did win7 make you throw out the design?
[20:35:32] I'm pretty sure I know what your deadlock is. The problem is that Microsoft doesn't want to hear from me anymore about their bugs. They want to hear them from end users.
[20:35:54] do you have any way of us trying to verify what this deadlock is?
[20:35:55] Win7 had nothing to do with us throwing out the afs redirector design.
[20:36:12] a kernel dump of the hung machine
[20:36:17] still, having to start from scratch... I appreciate that that sux
[20:36:29] it produces a much better product
[20:36:33] yeah... need to figure out how to do that since the machine is, well, hung, for the most part
[20:36:44] so, it'll be worth the wait then... no complaints about that
[20:37:41] Process Dump is a tool that can produce a dump of any process on the machine. http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx
[20:38:42] and I'm guessing I want to have it monitor the kernel process?
[20:38:59] Live Kernel Debug can be used to load a kernel debugger on the machine it is running on. http://technet.microsoft.com/en-us/sysinternals/bb897415.aspx
[20:39:42] threads deadlock within the kernel but they are still process threads. If Eudora is hung, you dump Eudora
[20:40:03] simple enough
[20:40:13] so, what do you think it is?
[20:40:16] Using livekd you can create a dump of the entire kernel using the ".dump" command from within the kernel debugger
[20:40:48] remotely via psexec (since the console of the machine is usually pretty darn hung)
[20:41:32] you don't have to wait for the machine to get to that state. Once you see one process hung like Eudora you can run livekd and take a dump
[20:42:00] If you get to the point where it is hung in shutdown, the only thing you can do is crash the machine.
[20:43:27] hmmm... seems there might be some magic key sequence to crash the machine
[20:45:15] which is only available in windows server 2003 or later... *sigh*
[20:45:17] The Microsoft SMB Redirector (mrxsmb.sys) has ten worker threads that process all of the requests. The problem it has is that some requests in thread A are processed by pushing a new request on to the stack and then blocking. The new request will be processed by one of the other nine threads. But what happens if all ten threads are busy and B through J are waiting for A to release a resource? Then there is no thread to process the request that A is waiting for. You have a deadlock.
[20:46:11] ick
[20:47:11] In the process of creating the hotfix which solves three deadlocks I learned a lot about the internals of the smb redirector. I suspect there are many more problems lurking beneath the surface.
[20:47:49] any of this fixed in xp64 or vista as far as you know?
[20:50:08] none of it
[20:50:22] xp64 is just windows 2003
[20:50:41] Vista SP1 and 2008 are the same code base
[20:50:43] yeah... so I was wondering if they had possibly reworked anything for that
[20:50:56] Win7 is the same code base as 2008 R2
[20:55:28] any way to either basically HUP mrxsmb.sys to bring the system back to life or to increase the number of threads in mrxsmb.sys so that this perhaps happens less often?
[21:21:43] --- Jeffrey Altman has left: Replaced by new connection
[21:26:25] --- Jeffrey Altman has become available
[21:26:49] the number of worker threads is fixed
[21:27:36] the real bug is that the transport layer that is used to communicate with the openafs smb server keeps dropping when it shouldn't.
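The 20:45:17 explanation describes a fixed-size worker pool starving itself. The Python sketch below is a hypothetical model of that scenario, not Microsoft's implementation: the function `thread_a_finished`, the lock standing in for "a resource", the barrier that forces the worst-case interleaving, and the one-second timeout (used so the demo reports the deadlock instead of hanging forever) are all assumptions made for illustration. With ten workers and ten outstanding requests, the sub-request A pushes has nowhere to run; with one spare worker it completes.

```python
import concurrent.futures
import threading


def thread_a_finished(workers: int, requests: int, wait_s: float = 1.0) -> bool:
    """Hypothetical model of the mrxsmb.sys deadlock described above.

    A fixed pool of `workers` threads serves `requests` requests. Request A
    holds a shared resource, pushes a new request onto the same pool, and
    blocks on it; the other requests (B, C, ...) block waiting for A's
    resource. Returns True if A's sub-request ran within `wait_s` seconds,
    False if it could not (the timeout stands in for "forever").
    """
    if not 1 <= requests <= workers:
        raise ValueError("this model needs 1 <= requests <= workers")

    resource = threading.Lock()
    # Every request occupies a worker before A pushes its sub-request.
    all_started = threading.Barrier(requests)

    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:

        def sub_request() -> None:
            pass  # the work thread A is waiting for

        def request_a() -> bool:
            with resource:  # A takes the resource before the others try
                all_started.wait(timeout=5)
                child = pool.submit(sub_request)  # push onto the same pool
                try:
                    child.result(timeout=wait_s)
                    return True
                except concurrent.futures.TimeoutError:
                    return False  # no free worker could run the child

        def request_other() -> None:
            all_started.wait(timeout=5)
            with resource:  # blocks until A releases the resource
                pass

        a = pool.submit(request_a)
        for _ in range(requests - 1):
            pool.submit(request_other)
        return a.result()


if __name__ == "__main__":
    print(thread_a_finished(workers=10, requests=10))  # all ten workers busy
    print(thread_a_finished(workers=11, requests=10))  # one spare worker
```

In the model, A gives up after the timeout and releases the lock, which unblocks B through J; in the real driver nothing times out, which is why the machine stays hung.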
[21:39:07] indeed
[21:40:38] either it's a bug in something the openafs smb server is sending back or it is a bug in their code
[21:40:48] unfortunately, it's very hard to tell which
[21:41:01] yeah
[21:41:34] 1.5.62 adds dce rpc service support, named pipe support, query info stream support, and improves dfs referral compatibility
[21:41:45] all in an effort to make the smb client happier
[21:43:25] Sometimes I think I spend more time working on smb than I do on afs
[21:43:56] certainly sounds like it
[21:44:01] you're becoming an expert
[22:07:47] the vast majority of crashes in openafs executables are caused by buggy mit kfw dlls
[22:08:06] mostly folks that are running 2.6.5 or 3.0 or 3.1 on Vista
[22:08:40] I'm really tempted to put a module check in for the krb5_32.dll version number and refuse to load it if the version is not 3.2.x
[22:51:28] --- deason has left
[23:41:31] --- Russ has left: Disconnected
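The 22:08:40 idea of refusing to load krb5_32.dll unless its version is 3.2.x reduces to a prefix comparison on the numeric version components. The Python sketch below is hypothetical: it assumes the DLL's file version has already been obtained as a dotted string (reading the Windows version resource is out of scope here), and the function name `kfw_version_acceptable` is invented for illustration. Unparseable versions are rejected, on the assumption that refusing to load is the safer failure mode.

```python
def kfw_version_acceptable(version: str, required: tuple = (3, 2)) -> bool:
    """Return True only if `version` (e.g. "3.2.2.0") is in the 3.2.x series.

    Hypothetical gate for the krb5_32.dll check discussed above; anything
    that does not parse as dotted integers is rejected rather than loaded.
    """
    parts = version.strip().split(".")
    try:
        numbers = tuple(int(p) for p in parts)
    except ValueError:
        return False
    # Compare leading components as integers, so "3.20" does not match "3.2".
    return len(numbers) >= len(required) and numbers[: len(required)] == required


if __name__ == "__main__":
    for v in ("2.6.5", "3.0", "3.1", "3.2.2.0"):
        print(v, kfw_version_acceptable(v))
```

Comparing integer tuples rather than using a string prefix match keeps a hypothetical "3.20" release from slipping through a "3.2" check.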