[00:31:34] --- Simon Wilkinson has become available
[01:01:04] --- Simon Wilkinson has left
[01:03:47] --- kaj has become available
[02:57:16] --- abo has left
[03:23:36] --- haba has become available
[03:36:55] --- Simon Wilkinson has become available
[03:47:07] --- jaltman has left: Replaced by new connection
[03:47:08] --- jaltman has become available
[03:52:18] --- Simon Wilkinson has left
[06:24:18] --- haba has left
[07:17:46] --- RedBear has left
[07:23:36] --- deason has become available
[07:42:13] --- meffie has become available
[08:22:31] --- reuteras has left
[09:11:17] --- jaltman has left: Disconnected
[09:11:44] --- jaltman has become available
[09:25:26] --- abo has become available
[09:44:47] --- haba has become available
[10:14:28] --- haba has left
[10:21:09] --- meffie has left
[10:21:48] --- meffie has become available
[10:22:20] --- meffie has left
[10:32:03] --- kaj has left
[10:44:51] --- RedBear has become available
[11:32:14] OpenAFS was accepted into GSoC 2010
[11:54:44] huzzah
[12:38:50] --- haba has become available
[13:23:59] --- kaj has become available
[13:24:55] --- Simon Wilkinson has become available
[13:25:36] --- Simon Wilkinson has left
[13:25:57] --- Simon Wilkinson has become available
[13:36:00] --- Russ has become available
[14:10:15] Congratz! What's the project?
[14:13:02] Hm, it appears that I was sloppy in de-validating #1340 (xdr: stop the madness); the build had previously been broken, but I had only been working on a 1.5.72 tarball previously.
[14:15:48] Do I change my review to neutral?
[14:16:13] What's the issue?
[14:16:23] That the build is currently broken, but #1340 doesn't fix it?
[14:16:28] Yes.
[14:16:42] (INT_MAX not defined)
[14:16:43] And you just need an appropriately applied limits.h?
[14:17:10] It seems so. I just threw it in sysincludes.h at random, and things compiled.
[14:17:39] Hm, or maybe not.
[14:17:55] (not "build formerly broken, and 1340 doesn't fix", that is)
[14:18:07] I should probably actually try it instead of trying to tell from reading code.
[14:18:45] If you can let me know the error you're seeing that would be great ...
[14:20:10] But, I suspect if you are broken, you were broken due to my change to add xdr_mem.c to the Unix kernel.
[14:20:17] So, I have only tried building (a patched) 1.5.72, and master+1340. I haven't tried building just master.
master+1340 fails with
xdr_mem.c:97: error: 'INT_MAX' undeclared (first use in this function)
[14:20:20] And yes, XBSD doesn't include limits.h
[14:20:26] Ah, yes, that would do it.
[14:20:53] The question is whether it's needed for XBSD, or just for FBSD
[14:21:39] I'm not sure that I can answer that
[14:22:39] sysincludes.h needs to be taken out and given a stern talking to.
[14:23:56] The whole way we deal with headers in kernel space always confuses the crap out of me, although I'm sure it was for a good reason and it's probably hard to fix.
[14:25:09] Too many people have used too many different styles in that file. It looks like we started with a single header list, and just added and removed things from that list on a per header, per OS basis. But then OBSD and LINUX came along and just added their own list of headers. And then we started using both "h/" and , and then my mind started hurting.
[14:25:25] kaduk: Do you need #included ?
[14:28:28] Let me check what I actually compiled with ...
[14:32:35] I compiled with this:
diff --git a/src/afs/sysincludes.h b/src/afs/sysincludes.h
@@ -340,6 +340,8 @@ MALLOC_DECLARE(M_AFS);
 # include "h/proc.h"
 # if !defined(AFS_FBSD_ENV)
 # include "h/ioctl.h"
+# else
+# include "limits.h"
 # endif /* AFS_FBSD_ENV */
as a crude hack
[14:33:28] I think you probably want h/limits.h
[14:34:41] I can look at it more carefully later tonight.
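A minimal userland sketch of why the build breaks: any translation unit that uses INT_MAX must see <limits.h> (or another header that defines it) before the use. The helper name `size_fits_in_int` is hypothetical and is not the code at xdr_mem.c:97; in the FreeBSD kernel build the approach discussed above is to pull the header in from sysincludes.h instead.

```c
/* Hypothetical illustration, not OpenAFS code. Deleting the #include below
 * produces the same class of error seen in the log:
 *   error: 'INT_MAX' undeclared (first use in this function) */
#include <limits.h>

/* Hypothetical helper: accept only sizes that can be stored in an int. */
int size_fits_in_int(unsigned long size)
{
    return size <= (unsigned long)INT_MAX;
}
```

The comparison is done in unsigned long so that sizes above INT_MAX are rejected rather than wrapping to a negative int.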
[14:39:06] I think we should probably try:
[14:39:08] diff --git a/src/afs/sysincludes.h b/src/afs/sysincludes.h
index b914956..54b434c 100644
--- a/src/afs/sysincludes.h
+++ b/src/afs/sysincludes.h
@@ -230,6 +230,10 @@ typedef unsigned short etap_event_t;
 # include "h/socketvar.h"
 # include "h/protosw.h"

+#if defined(AFS_FBSD_ENV)
+# include "limits.h"
+#endif
+
 # if defined(AFS_SGI_ENV) || defined(AFS_HPUX_ENV) || defined(AFS_SUN5_ENV) ||
 # include "h/dirent.h"
 # ifdef AFS_SUN5_ENV
[14:42:52] Okay; I'll confirm that that builds, after rehearsal.
[15:16:47] --- summatusmentis has become available
[15:52:07] For those that care, there's nothing in the gerrit 2.1.2.1 update that we care about, so we won't be installing it...
[15:55:16] --- deason has left
[16:05:53] --- kaj has left
[16:56:22] --- jaltman has left: Disconnected
[17:08:03] --- jaltman has become available
[17:38:49] --- Russ has left: Disconnected
[18:13:47] --- Russ has become available
[18:30:12] --- Russ has left: Replaced by new connection
[18:30:12] --- Russ has become available
[19:27:15] --- deason has become available
[19:45:40] --- Born Fool has become available
[20:47:30] --- Russ has left: Disconnected
[20:50:24] --- Russ has become available
[21:06:49] Hm, I did a git reset --hard; ./regen.sh in my checkout, and I am still getting the 'INT_MAX undeclared' build failure. I guess that means that the build was broken before the xdr patch?
[21:08:36] quite possibly.
[21:08:51] a previous patch broke it the same way on irix, and i assume here too
[21:10:05] Now trying Simon's proposed patch (with whitespace change). Oh, and the build just finished. Now to see if aklog still segfaults ...
[21:10:16] i assume it will
[21:16:04] Indeed, aklog does still dump core.
[21:16:40] Did you want anything else from that right now, or should I move on to debugging shutdown hangs?
[21:16:54] did aklog work before that patch?
[21:17:04] or is this without it?
[21:17:09] (the xdr patch, i mean)
[21:17:40] This is without the xdr patch
[21:17:47] Should be a stock master from last night.
[21:18:13] ok. then you can move on. i wonder what the aklog issue is, but we won't get it tonight
[21:18:28] Okay.
[21:19:19] Now, do I want to revert to my patched 1.5.72 before debugging in earnest...
[21:19:32] up to you
[21:20:10] i have to drive 80 miles round trip in the morning to bring someone back to town, so at some point i will fall over. but i have a couple more things to do first
[21:20:28] Fun.
[21:25:46] Looks like we have threads in:
osi_NetReceive->soreceive_dgram->_sleep
WaitV->afs_osi_TimedSleep->_sleep
afsdb_req->afs_osi_Sleep->_sleep
afs_brsDaemons->afs_osi_Sleep->_sleep
afs_CacheTruncateDaemon->afs_osi_Sleep->_sleep
I'm not seeing anything else that looks terribly interesting.
[21:27:44] > WaitV->afs_osi_TimedSleep->_sleep
it's conceivable i broke this somehow
[21:28:04] Yeah ... or maybe I got the unit conversion wrong.
[21:28:26] msec/usec?
[21:29:01] I'll check once I reboot
[21:33:31] --- Born Fool has left
[21:35:44] --- deason has left
[21:51:15] Things seem to only be sleeping for 500 ms, so I don't think I messed up.
[21:58:37] ok. then it implies one of the things which is supposed to change termState and wake up is either not changing, or not waking up
[21:58:58] (afs_termState)
[22:04:04] I guess I could get a coredump and look at the value of afs_termState ...
[22:04:59] with full thread backtraces i could probably tell you without a full dump
[22:06:02] "how full is 'full'?"
[22:06:33] um. for the threads you listed, at least the fraes back up to syscall
[22:06:36] frames
[22:07:11] Not sure if that's available in the live debugger; we'll see.
[22:18:41] I am trying to multitask between fscks, and looking at my inability to write/do fs operations troubles as well. I can create files in a system:anyuser write directory, which show up as owned by user 32766.
Is there something special about that pts ID (athena cell)? I should be either 24729 or 33554683 ...
Also, is fs setcrypt/fs getcrypt trustworthy for whether encryption is actually being used, or do I need to tcpdump to be sure?
[22:20:00] 32766 is anonymous
[22:20:13] Rather as I thought ...
[22:20:21] and fs setcrypt applies to connections made after it's set.
[22:20:39] and only works if you have keying material. 32766 probably does not
[22:21:16] making another wild guess, something in afs_user.c is not matching your userid with the one with the tokens.
[22:21:37] nah, perhaps not that
[22:22:38] I'm perfectly happy to print out the key material, but I haven't yet found where I would need to hook to do so
[22:25:39] ... WTF. I can read /afs/dev.mit.edu/system/README, but trying to read /afs/athena.mit.edu/contrib/bitbucket/unos.txt hangs
Hm, I wonder if this is related to setcrypt?
[22:28:56] Huh. And now that I have this 'less' process hung in afsslp, when I go to shutdown, all of the afsd processes have terminated.
[22:29:35] yeah, i think there's something wacky with wakeups
[22:41:55] Yeah, if I setcrypt, I can read anyuser-readable stuff, but not my own. (No hang on reading the anyuser-readable stuff being the notable part)
[22:42:28] So maybe it is afs_user.c
[22:50:20] This time on shutdown, all afsd's were done, and nothing in afsslp, but shutdown (reboot) still hung. The only thing that looked odd was that init was in state mntref.
[23:05:40] That mntwait init is:
[...]
_sleep
vfs_mount_destroy
dounmount
vfs_unmountall
boot
reboot
syscall
[23:13:13] vfs_mount_destroy msleep()s on the mount point with the mount point's mutex; it does seem consistent with a wakeup getting lost, and racing for where that wakeup gets lost.
[23:17:27] --- reuteras has become available
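The msec/usec question raised during the afs_osi_TimedSleep check comes down to converting a millisecond timeout into the clock ticks a kernel sleep primitive counts in. The sketch below assumes, hypothetically, that the timeout arrives in milliseconds and that `hz` is the tick rate; `ms_to_ticks` is an illustrative name, not the OpenAFS function. It rounds up because FreeBSD's msleep() treats a timeout of 0 ticks as "no timeout", so truncating a short timeout to 0 would turn a timed sleep into an indefinite one.

```c
#include <limits.h>

/* Hypothetical helper, not OpenAFS code: convert a positive timeout in
 * milliseconds to clock ticks at a rate of hz ticks per second.
 * Rounds up so that any positive timeout yields at least one tick;
 * a result of 0 would mean "sleep with no timeout" to msleep(). */
int ms_to_ticks(int ms, int hz)
{
    long long ticks;

    if (ms <= 0 || hz <= 0)
        return -1;              /* invalid input; caller must handle it */
    /* 64-bit arithmetic so ms * hz cannot overflow. */
    ticks = ((long long)ms * hz + 999) / 1000;
    if (ticks > INT_MAX)
        ticks = INT_MAX;        /* clamp to the int range of the timeout */
    return (int)ticks;
}
```

At hz = 100 a 1 ms timeout becomes 1 tick rather than 0. Passing microseconds where milliseconds are expected would instead make every sleep 1000 times longer than intended, which is the failure mode the msec/usec question was probing; the observed 500 ms sleeps ruled that out.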
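The lost-wakeup theory at the end of the log can be illustrated with a userland analogue built on POSIX condition variables. This is not the FreeBSD kernel code: `struct term_state` and its functions are hypothetical names loosely modeled on afs_termState. The waker changes the state and wakes sleepers while holding the same mutex the sleeper uses to test it, and the sleeper re-tests the state in a loop, so a wakeup that happens before the sleeper blocks cannot be lost. In general, msleep()/wakeup() needs the same discipline: the state change and the wakeup must be serialized against the sleeper's check by the mutex passed to msleep().

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shutdown handshake, loosely modeled on afs_termState. */
struct term_state {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             state;   /* only ever increases */
};

void term_state_init(struct term_state *ts)
{
    pthread_mutex_init(&ts->lock, NULL);
    pthread_cond_init(&ts->cv, NULL);
    ts->state = 0;
}

/* Waker: change the state and broadcast under the lock, so a sleeper can
 * never test the old value and then miss this wakeup. */
void term_state_advance(struct term_state *ts, int new_state)
{
    pthread_mutex_lock(&ts->lock);
    ts->state = new_state;
    pthread_cond_broadcast(&ts->cv);
    pthread_mutex_unlock(&ts->lock);
}

/* Sleeper: re-test the condition under the lock in a loop. If the waker
 * already ran, the loop never sleeps; spurious wakeups are also handled. */
void term_state_wait_for(struct term_state *ts, int wanted)
{
    pthread_mutex_lock(&ts->lock);
    while (ts->state < wanted)
        pthread_cond_wait(&ts->cv, &ts->lock);
    pthread_mutex_unlock(&ts->lock);
}

/* pthread start routine that drives the handshake from another thread. */
void *term_state_finish(void *arg)
{
    term_state_advance((struct term_state *)arg, 2);
    return NULL;
}
```

Dropping the lock in term_state_advance, or replacing the while loop with a single unlocked check followed by a wait, reintroduces the window in which a wakeup can arrive before the sleeper is queued and be lost, leaving the sleeper blocked indefinitely, much like the hung shutdown above.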