[00:03:06] --- Russ has left: Disconnected
[01:05:20] --- shadow@gmail.com/owlB65F7D75 has left
[01:43:07] --- kula has left
[02:08:54] --- sxw has become available
[02:18:41] --- dwbotsch has left
[02:19:10] --- dwbotsch has become available
[02:56:21] --- sxw has left
[02:57:39] --- sxw has become available
[03:47:39] --- Claudio Bisegni has become available
[04:20:17] --- Claudio Bisegni has left
[04:22:58] --- dwbotsch has left
[04:23:39] --- dwbotsch has become available
[04:48:55] --- kula has become available
[05:37:10] --- Jeffrey Altman has left: Replaced by new connection
[06:50:18] --- shadow@gmail.com/owl7BB801BA has become available
[07:03:13] --- dev-zero@jabber.org has become available
[07:03:18] --- dev-zero@jabber.org has left: offline
[07:32:15] --- Jeffrey Altman has become available
[10:10:19] --- haba has left
[10:12:23] --- andersk@mit.edu/dr-wily has become available
[10:16:07] <andersk@mit.edu/dr-wily> Linux commit 17f98dc (v2.6.31-rc1~196) unexports find_task_by_vpid.
Apparently this causes the openafs module to fail to load:
  https://bugs.launchpad.net/bugs/420632
although I can’t reproduce on Karmic amd64.
[10:17:23] <shadow@gmail.com/owl7BB801BA> it probably does. got suggested patch?
[10:17:58] <shadow@gmail.com/owl7BB801BA> i'm having one of those moments where i'd like to focus on bugfixes and
where i don't consider "linus broke it" our bug.
[10:20:41] <andersk@mit.edu/dr-wily> Hmm.  find_task_by_vpid(vnr) was previously equivalent to
  pid_task(find_pid_ns(vnr, current->nsproxy->pid_ns), PIDTYPE_PID)
but pid_task and find_pid_ns are both EXPORT_SYMBOL_GPL.
[10:22:20] <shadow@gmail.com/owl7BB801BA> what? functionality which was available today is gone tomorrow behind
GPL restrictions after linus said that wouldn't happen? never!
[10:30:03] <andersk@mit.edu/dr-wily> I guess I'll just open a ticket for now.
[10:32:19] <shadow@gmail.com/owl7BB801BA> i wish linux would just stop morphing, actually. 
[10:32:43] <shadow@gmail.com/owl7BB801BA> if we had today's linux and had to live with it for a while, how
horrible would that be, i wonder.
[10:32:45] <shadow@gmail.com/owl7BB801BA> oh well
[10:34:18] --- Russ has become available
[10:36:55] <sxw> We can just kill all of that code when we drop support for syscall probing in kernels with keyrings, I think.
[10:37:57] <Russ> That commit looks very familiar.  I'm pretty sure we already worked around it.
[10:40:36] --- kula has left
[10:49:30] --- kula has become available
[10:50:59] --- dev-zero@jabber.org has become available
[11:21:19] --- sxw has left
[11:51:03] --- dev-zero@jabber.org has left
[11:56:50] --- brantgurga has become available
[12:11:07] --- dev-zero@jabber.org has become available
[12:18:58] --- haba has become available
[12:29:43] --- dev-zero@jabber.org has left
[12:31:37] --- dev-zero@jabber.org has become available
[12:36:41] --- dev-zero@jabber.org has left: offline
[13:26:52] --- brantgurga has left
[14:01:32] --- mdionne has become available
[14:35:56] --- mdionne has left
[14:43:43] --- haba has left
[16:10:04] --- kaduk@mit.edu/owl has left
[16:11:07] --- kaduk@mit.edu/owl has become available
[16:43:18] --- Jeffrey Altman has left
[17:15:30] --- matt has become available
[17:34:57] --- Jeffrey Altman has become available
[18:10:02] --- Russ has left: Disconnected
[18:29:31] --- Russ has become available
[19:54:12] --- deason has become available
[19:57:04] <Jeffrey Altman> For the rxk5 work I believe that we either need to set up a new branch within the openafs git that tracks master for it and permit gerrit to manage it, or we need to set up a secondary git repository and gerrit instance that can host that work until the protocol is standardized and consensus on the implementation is agreed upon.

I think that due to the long history of rxk5 being developed within the openafs cvs repository and our relationship with Marcus and Matt that we should host the work within the openafs instances.
[19:59:15] <Jeffrey Altman> I believe that OpenAFS "master" should only have code pushed to it that is ready for the next major release.  Git makes it easy enough for us to create tracking releases, so that once the work is deemed ready for production-level testing we can generate rxk5 testing distributions that differ from "master" only by rxk5.
[20:06:11] <Russ> The only concern that I have with managing the branch entirely with Gerrit is that I'd like to do merges from master outside of Gerrit since individually approving each merged commit would suck.
[20:06:32] <Russ> and we really want to aggressively merge master into that branch if we hope to merge it back into master eventually.
[20:07:14] <Russ> Maybe we can do some sort of hybrid thing where we use Gerrit to manage the regular patches and do the merges outside of Gerrit.
[20:07:15] <Jeffrey Altman> We are going to have that problem with any public repository.
[20:07:18] <dwbotsch> so what is this ptclient ?
[20:07:42] <Jeffrey Altman> I'm doing tracking builds and I pull --rebase every day
[20:07:57] <Russ> Yeah, I think providing the branch is a good idea.  Just am not sure how to do that part of the mechanics.
[20:08:11] <Jeffrey Altman> It's a very powerful technique but you can't use it with a public repository that you are making available for others to base their work off of
[20:08:22] --- dwbotsch has left
[20:08:55] --- RedBear has become available
[20:08:56] <Jeffrey Altman> Maybe we generate a new rebased branch every week or something and live with merge commits in between
[20:09:56] <Jeffrey Altman> [C:\src\openafs\openafs.git\repo\dest\amd64_w2k\free]root.server\usr\afs\bin\ptclient.exe
Using CellServDB file in C:/PROGRA~3/OpenAFS/Client
Making unauthenticated connection to prserver
pr> ?
cr name id owner - create entry with name and id.
wh id  - what is the offset into database for id?
du offset - dump the contents of the entry at offset.
add uid gid - add user uid to group gid.
iton id* - translate the list of id's to names.
ntoi name* - translate the list of names to ids.
del id - delete the entry for id.
dg gid - delete the entry for group gid.
rm id gid - remove user id from group gid.
l id - get the CPS for id.
lh host - get the host CPS for host.
lsg id - get the supergroups for id.
m id - list elements for id.
nu name - create new user with name - returns an id.
ng name - create new group with name - returns an id.
lm  - list max user id and max (really min) group id.
smu - set max user id.
smg - set max group id.
sin id - single iton.
sni name - single ntoi.
fih name - fix id hash for <name>.
fnh id - fix name hash for <id>.
q - quit.
?- this message.
pr>
[20:10:26] <Russ> Well, one of us with direct push ability could do the merge, although if there are conflicts, I wouldn't know how to resolve them.
[20:10:30] <RedBear> ok, so, what's the advantage of it over the normal pts command?
[20:10:51] <Jeffrey Altman> Usage is: 'prclient [-testconfdir <dir> | server | client] [0 | 1 | 2] [-ignoreExist] [-cell <cellname>]
[20:11:52] <jhutz@jis.mit.edu/owl> >    ok, so, what's the advantage of it over the normal pts command?

It's not intended for general use.  It's a fairly low-level tool for
manipulating the PRDB.
[20:12:49] <RedBear> k
[20:12:52] <jhutz@jis.mit.edu/owl> The advantage is that it has operations like fih and fnh
[20:14:04] <Jeffrey Altman> I'm not sure we should ship it in the general package.  Possibly in a separate package of power tools
[20:15:47] <RedBear> are there other power tools?
[20:16:48] <Russ> I'm never sure what to do with stuff like that.
[20:16:57] <Russ> Debian ships pt_util since it uses it for the database bootstrap.
[20:17:13] <Russ> I'm including readvol and voldump since I don't see a good reason not to.
[20:17:27] <RedBear> yeah, that's in the redhat packages (/usr/afs/bin)
[20:17:49] <RedBear> voldump is, but readvol is not
[20:18:00] <jhutz@jis.mit.edu/owl> pt_util is useful.  I'm not sure ptclient really has much purpose, so I'm
not sure I'd bother building it at all.  OTOH, it's not like most windows
users/afsadmins are in a position to build their own.
[20:18:11] <Jeffrey Altman> windows ships pt_util and ptclient in the server package 
[20:18:29] <Jeffrey Altman> there are still a lot of tools in the tree that do not get built on windows
[20:18:36] <Russ> There are the db_verify tools too.
[20:18:49] <RedBear> those could be very useful
[20:18:56] <Russ> prdb_check and vldb_check.
[20:19:16] <RedBear> both are already there under linux
[20:19:21] <Russ> prdb_check has an option that spits out a ptclient script, so it's kind of weird to include prdb_check and not ptclient.
[20:19:34] <Jeffrey Altman> neither of those are built on windows at the moment
[20:19:38] <RedBear> tho, some of these are part of the openafs-server rpm
[20:20:32] <RedBear> actually, seems they all are...tho most, but not all are in sbin (but that's just a packaging issue)
[20:22:32] <Jeffrey Altman> both of those are built on windows but they aren't installed
[20:22:46] <Jeffrey Altman> that can be fixed
[20:27:20] <RedBear> Jeff - had a computer which was locking up as we previously discussed, but this time every 45 minutes approximately (starting with Eudora freezing first, before the rest of the system freezes up)... got an fs minidump but not a memdump out of it
[20:27:31] <RedBear> tho, you were thinking minidump wouldn't help anyway... 
[20:27:55] <RedBear> anyway, deleted the afscache file and restarted, and that seems to have helped it for the time being... I'm sure it'll lock up again sometime next week
[20:28:42] <RedBear> I should get to 1.5.62, tho
[20:29:52] <Jeffrey Altman> the problem I am sure you are seeing is a deadlock in microsoft's code.
[20:30:27] <RedBear> what's interesting is that we never saw this before the smb hotfix (and oafs 1.5.60, since those were both done at the same time)
[20:30:49] <Jeffrey Altman> Microsoft has a lot of issues in their code
[20:31:03] <Jeffrey Altman> You have no idea how badly I want to be able to ship an afs redirector
[20:31:15] <Jeffrey Altman> It was being tested this week at Microsoft 
[20:31:27] <RedBear> how'd that go?
[20:31:37] <Jeffrey Altman> There is still work to do
[20:31:52] <RedBear> at this point, you have no idea how badly I want you to be able to ship an afs redirector :P
[20:32:59] <Jeffrey Altman> For Win7 we don't have a choice.  The SMB interface will not be able to execute applications out of AFS
[20:33:27] <RedBear> that a security-ish thing?
[20:33:38] <Jeffrey Altman> what is taking so long is that we literally had to throw out the design and start over again.
[20:34:27] <Jeffrey Altman> The Win7 smb server will not permit execution of code from an untrusted server and since \\AFS cannot be authenticated in a way that can be trusted by Windows, we lose.
[20:34:43] --- abo has left
[20:34:55] --- abo has become available
[20:35:12] <RedBear> did win7 make you throw out the design?
[20:35:32] <Jeffrey Altman> I'm pretty sure I know what your deadlock is.  The problem is that Microsoft doesn't want to hear from me anymore about their bugs.   They want to hear them from end users.
[20:35:54] <RedBear> do you have any way of us trying to verify what this deadlock is?
[20:35:55] <Jeffrey Altman> Win7 had nothing to do with us throwing out the afs redirector design.  
[20:36:12] <Jeffrey Altman> a kernel dump of the hung machine
[20:36:17] <RedBear> still, having to start from scratch... I appreciate that that sux
[20:36:29] <Jeffrey Altman> it produces a much better product 
[20:36:33] <RedBear> yeah... need to figure out how to do that since the machine is, well, hung, for the most part
[20:36:44] <RedBear> so, it'll be worth the wait then... no complaints about that
[20:37:41] <Jeffrey Altman> Process Dump is a tool that can produce a dump of any process on the machine.  http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx
[20:38:42] <RedBear> and I'm guessing I want to have it monitor the kernel process?
[20:38:59] <Jeffrey Altman> Live Kernel Debug can be used to load a kernel debugger on the machine it is running on.
http://technet.microsoft.com/en-us/sysinternals/bb897415.aspx
[20:39:42] <Jeffrey Altman> threads deadlock within the kernel but they are still process threads.  If Eudora is hung, you dump Eudora
[20:40:03] <RedBear> simple enough
[20:40:13] <RedBear> so, what do you think it is?
[20:40:16] <Jeffrey Altman> Using livekd you can create a dump of the entire kernel using the ".dump" command from within the kernel debugger
[20:40:48] <RedBear> remotely via psexec (since the console of the machine is usually pretty darn hung)
[20:41:32] <Jeffrey Altman> you don't have to wait for the machine to get to that state.  Once you see one process hung like Eudora you can run livekd and take a dump
[20:42:00] <Jeffrey Altman> If you get to the point where it is hung in shutdown, the only thing you can do is crash the machine.   
[20:43:27] <RedBear> hmmm... seems there might be some magic key sequence to crash the machine
[20:45:15] <RedBear> which is only available in windows server 2003 or later... *sigh*
[20:45:17] <Jeffrey Altman> The Microsoft SMB Redirector (mrxsmb.sys) has ten worker threads that process all of the requests.  The problem it has is that some requests in thread A are processed by pushing a new request on to the stack and then blocking.  The new request will be processed by one of the other nine threads.  But what happens if all of the ten threads are busy where B through J are waiting for A to release a resource?   Then there is no thread to process the request that A is waiting for.  You have a deadlock.
[20:46:11] <RedBear> ick
[20:47:11] <Jeffrey Altman> In the process of creating the hotfix which solves three deadlocks I learned a lot about the internals of the smb redirector.  I suspect there are many more problems lurking beneath the surface.
[20:47:49] <RedBear> any of this fixed in xp64 or vista as far as you know?
[20:50:08] <Jeffrey Altman> none of it
[20:50:22] <Jeffrey Altman> xp64 is just windows 2003
[20:50:41] <Jeffrey Altman> Vista SP1 and 2008 are the same code base 
[20:50:43] <RedBear> yeah... so I was wondering if they had possibly reworked anything for that
[20:50:56] <Jeffrey Altman> Win7 is the same code base as 2008 R2
[20:55:28] <RedBear> any way to either basically HUP mrxsmb.sys to bring the system back to life or to increase the number of threads to mrxsmb.sys so that this perhaps happens less often?
[21:21:43] --- Jeffrey Altman has left: Replaced by new connection
[21:26:25] --- Jeffrey Altman has become available
[21:26:49] <Jeffrey Altman> the number of worker threads is fixed
[21:27:36] <Jeffrey Altman> the real bug is that the transport layer that is used to communicate with the openafs smb server keeps dropping when it shouldn't.
[21:39:07] <RedBear> indeed
[21:40:38] <Jeffrey Altman> either it's a bug in something the openafs smb server is sending back or it is a bug in their code
[21:40:48] <Jeffrey Altman> unfortunately, it's very hard to tell which
[21:41:01] <RedBear> yeah
[21:41:34] <Jeffrey Altman> 1.5.62 adds dce rpc service support, named pipe support, query info stream support, and improves dfs referral compatibility 
[21:41:45] <Jeffrey Altman> all in an effort to make the smb client happier
[21:43:25] <Jeffrey Altman> Sometimes I think I spend more time working on smb than I do on afs
[21:43:56] <RedBear> certainly sounds like it
[21:44:01] <RedBear> you're becoming an expert
[22:07:47] <Jeffrey Altman> the vast majority of crashing openafs executables is caused by buggy mit kfw dlls
[22:08:06] <Jeffrey Altman> mostly folks that are running 2.6.5 or 3.0 or 3.1 on Vista
[22:08:40] <Jeffrey Altman> I'm really tempted to put a module check in for the krb5_32.dll version number and refuse to load it if the version is not 3.2.x
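A version gate of the kind floated above might look roughly like the following Python sketch. The function name and the dotted-string input are assumptions made for illustration; how the version would actually be read from krb5_32.dll is not shown, and this is not OpenAFS code.

```python
def is_supported_kfw_version(version, required=(3, 2)):
    """Return True if a dotted version string starts with the required prefix.

    Hypothetical helper: `version` is assumed to be a string such as "3.2.2"
    that the caller has already obtained from the DLL by some other means.
    """
    try:
        parts = tuple(int(p) for p in version.split("."))
    except ValueError:
        # Anything that is not purely numeric is treated as unsupported.
        return False
    return parts[:len(required)] == tuple(required)


if __name__ == "__main__":
    for candidate in ("2.6.5", "3.0", "3.1", "3.2.2"):
        verdict = "load" if is_supported_kfw_version(candidate) else "refuse"
        print(candidate, "->", verdict)
```

Comparing a numeric prefix rather than the raw string keeps "3.2" and "3.2.10" acceptable while rejecting "3.20" or "13.2".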
[22:51:28] --- deason has left
[23:41:31] --- Russ has left: Disconnected