[00:06:18] --- Russ has left: Disconnected
[00:51:35] --- kaj has become available
[02:15:48] --- haba has left
[02:48:24] --- haba has become available
[03:46:58] --- Jeffrey Altman has left: Replaced by new connection
[03:47:02] --- Jeffrey Altman has become available
[03:47:08] --- jaltman has left: Replaced by new connection
[03:47:09] --- jaltman has become available
[06:32:23] --- jaltman has left: Replaced by new connection
[06:32:24] --- jaltman has become available
[06:38:10] --- kaj has left
[06:41:02] --- Simon Wilkinson has become available
[06:51:10] --- deason has become available
[06:52:14] --- Simon Wilkinson has left
[06:52:25] --- Simon Wilkinson has become available
[07:10:18] --- Simon Wilkinson has left
[07:38:38] --- shadow has become available
[08:29:19] --- shadow has left
[08:30:30] --- shadow has become available
[08:31:34] --- shadow has left
[08:32:01] --- shadow has become available
[10:33:53] --- shadow has left
[10:38:01] --- shadow has become available
[10:38:15] --- shadow has left
[10:45:12] --- sxw has become available
[10:57:46] --- sxw has left
[11:08:15] --- meffie has become available
[11:15:08] --- Russ has become available
[11:15:57] should this room be listed on the 'getting support' page?
[11:16:11] probably
[11:18:49] I'm not sure. Haven't we been doing that in IRC? Do we want to conflate "support" with development discussion?
[11:19:38] I think this room should be primarily developers
[11:20:53] if we want a separate room for end user discussions that is ok. support@conference.openafs.org. or something like that.
[11:21:25] can we bridge irc to it so i can turn off my irc client?
;)
[11:21:46] I would be fine with that
[11:24:08] --- meffie has left
[11:38:50] --- sxw has become available
[11:47:42] --- haba has left
[11:55:54] --- jaltman has left: Disconnected
[12:00:51] --- jaltman has become available
[12:01:15] --- Simon Wilkinson has become available
[12:21:09] --- jaltman has left: Disconnected
[12:24:43] 1.4.12 is failing to build on s390.
[12:24:57] What's the error?
[12:24:58] `savecontext' referenced in section `.text' of /build/buildd-openafs_1.4.12+dfsg-1-s390-9nblQa/openafs-1.4.12+dfsg/lib/liblwp.a(lwp.o): defined in discarded section `.note.GNU-stack' of /build/buildd-openafs_1.4.12+dfsg-1-s390-9nblQa/openafs-1.4.12+dfsg/lib/liblwp.a(process.o)
[12:25:36] Same error a little bit later for returnto.
[12:25:38] Did 1.4.11 build? I can't recall anyone playing in the savecontext maze recently ...
[12:25:44] Yes, 1.4.11 built.
[12:25:54] Complete build log at: https://buildd.debian.org/fetch.cgi?pkg=openafs&arch=s390&ver=1.4.12%2Bdfsg-1&stamp=1268107194&file=log&as=raw
[12:26:25] Hmmm. Did the executable stack stuff get pulled up to 1.4.x ?
[12:26:52] It did.
[12:26:59] If you mean 4b5e7ebf303621973fb0338cb1e2481e2c140065
[12:27:13] Yes, I think I do.
[12:27:20] Does reverting that help?
[12:27:43] That, unfortunately, is not a quick question to answer.
[12:27:55] Hm, let's see what the s390 porter system is.
[12:28:21] Because I can't help but note that the section that your error complains about is the section that file adds to the assembler
[12:28:36] Yup.
[12:28:39] I noticed that too.
[12:28:51] Is that one change standalone? Can I revert just it?
[12:29:19] I think so, yes. It was added because the SuSE build system was complaining that we were shipping libraries with executable stacks.
[12:30:26] Now we see if zelenka.debian.org has enough installed on it to do a build....
[12:30:29] --- abo has left
[12:30:53] --- abo has become available
[12:31:43] By the looks of things, the problem is that we're entering that section, but never leaving it. So everything else in the assembler file is included in that section - which it shouldn't be.
[12:32:00] I wonder why it only explodes on s390 and not on i386 or amd64...
[12:32:07] Those statements should be at the end of the source file, not at the beginning.
[12:33:07] Maybe the i386 and amd64 linkers don't discard unknown sections?
[12:33:44] --- jaltman has become available
[12:34:24] http://www.gentoo.org/proj/en/hardened/gnu-stack.xml has a good overview
[12:36:26] Reverting that change fixes the problem.
[12:36:43] Working on the correct patch now.
[12:49:07] Hm, getting an unrelated build failure on current master.
[12:49:16] ./iomgr.c: In function ‘IOMGR_Signal’:
./iomgr.c:996: error: dereferencing type-punned pointer will break strict-aliasi
[12:50:00] --- kaj has become available
[12:50:21] * Russ looks at that code and urghs.
[12:51:20] Why are we not using sigfillset?
[12:51:33] Are you building with different build options? We shouldn't be going near gcc's aliasing stuff. Our code isn't _nearly_ clean enough for that :)
[12:51:48] Nope, just --enable-checking.
[12:52:11] Which gcc version? Sounds like some more stuff has got bundled into -Wall
[12:52:18] 4.4.3
[12:52:18] --enable-checking checks gcc's aliasing stuff
[12:52:19] --- shadow has become available
[12:52:21] I remember fixing some of it
[12:52:30] (or more stuff added to the optimiser )
[12:52:49] at least, in whatever I was using to build at the time
[12:52:55] This code is not sane. I'm going to patch it to use sigset_t and sigfillset(). I wonder if we really have platforms that don't have those functions now.
[12:53:22] --- shadow has left
[12:53:31] Sounds good. If there's anything without them, it will be AIX, HPUX or IRIX.
[12:53:49] Derrick's got access to AIX. Chaz will test on IRIX. Dunno about HPUX.
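For readers following the sigfillset() proposal above: a minimal sketch of filling a signal mask through the POSIX API rather than casting a byte array to sigset_t. This is not the actual iomgr.c patch. The helper name block_all_signals is hypothetical, and the feature-test macro is an assumption about how strictly the compiler is invoked.

```c
#define _POSIX_C_SOURCE 200112L   /* assumption: ask for POSIX.1-2001 declarations */
#include <signal.h>

/* Hypothetical helper, not OpenAFS code: block every signal by letting
 * sigfillset() build the mask instead of copying 0xff bytes into it. */
static int block_all_signals(sigset_t *saved)
{
    sigset_t all;

    if (sigfillset(&all) != 0)    /* put every signal into the set */
        return -1;
    /* Add the full set to the current mask; the old mask goes to *saved. */
    return sigprocmask(SIG_BLOCK, &all, saved);
}
```

A caller would later restore the previous mask with sigprocmask(SIG_SETMASK, &saved, NULL). Because sigset_t is only ever touched through the sig*set() functions, no pointer cast is involved and there is nothing for the strict-aliasing check to complain about.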
[12:54:03] This is all POSIX.1-2001 stuff, so hopefully it's universal now.
[12:54:07] --- kaj has left
[12:54:08] --- kaj has become available
[12:54:19] --- shadow has become available
[12:54:28] I'm not sure I share your optimism :)
[12:54:28] (but let's hope ...)
[12:54:47] Well, we're using sigaction unconditionally.
[12:54:49] That's a good sign.
[12:54:56] I don't think anything ever had sigaction without sigfillset.
[12:55:32] fun
[12:55:46] --- kaj has left
[12:55:46] --- kaj has become available
[12:57:33] allOnes could be removed in that case, too, while you're at it
[12:57:44] Ah, yes, good point.
[12:58:52] ubik.c:478: error: ‘VOTE_ExecuteRequest’ undeclared (first use in this function)
ubik.c:478: error: (Each undeclared identifier is reported only once
ubik.c:478: error: for each function it appears in.)
ubik.c:488: error: ‘DISK_ExecuteRequest’ undeclared (first use in this function)
[12:59:25] Hmmm. rxgen should be prototyping those.
[12:59:39] Let me try a make clean.
[12:59:47] This could be a dependency error that's causing my source tree to not be rebuilt.
[13:00:22] So nice that parallel builds work now.
[13:00:32] At some point, our build system needs attacked with a flame thrower.
[13:00:38] Yeah.
[13:00:48] circular sys rxosd just removed recently
[13:01:14] Aie, more strict-aliasing problems.
[13:01:15] > I'm going to patch it to use sigset_t and sigfillset()
wasn't it already using sigset_t?
[13:01:22] No.
[13:01:27] what was line 996?
[13:01:31] unsigned char allOnes[100].
[13:01:39] And then doing a nasty cast.
[13:01:49] what was 996 exactly
[13:01:54] the 996 in mine has a sigset_t in it
[13:02:06] I don't know, I don't have that source tree around any more.
[13:02:27] Sounds like gcc4.4 is particularly anal about strict aliasing, and gcc4.5 is more relaxed.
[13:03:07] Well, that code was doing a structure assignment of an unsigned char[100] to a sigset_t, so there at least I have the same reaction as the compiler.
:)
[13:03:16] just trying to make sure we don't accidentally re-solve problems because old source is being looked at :)
[13:03:43] I just git rebased, so that shouldn't be a problem unless there are patches in Gerrit that aren't pushed yet.
[13:04:30] no, anything dealing with this would have been months old; I suppose we haven't tried to build with gcc 4.4 --enable-checking in a long time?
[13:04:38] Okay, the stuff in ptserver/ptutils.c is beyond simple fixing.
[13:04:42] That's actual type-punning.
[13:04:49] * Russ gives up and tests this patch by building without --enable-checking.
[13:05:17] supergroups stuff?
[13:05:32] Yes.
[13:05:36] I always build with supergroups.
[13:05:47] -fno-strict-aliasing
[13:05:51] ?
[13:06:06] That's probably a better idea.
[13:06:33] * Russ tries CC="gcc -fno-strict-aliasing" with --enable-checking.
[13:06:34] The supergroups stuff really is evil when you read the ptserver code. I was advocating enabling it by default in 1.6.
[13:06:59] or supergroups-relevant files should have a warning suppression when they are enabled
[13:07:07] But having read the code in detail, I'm really not sure it should even be in the tree, let alone enabled.
[13:07:34] Unfortunately, we can't yank it, since lots of people use it in production and there's no going back.
[13:07:38] We probably have to reimplement it.
[13:07:55] deason: I'm not sure the aliasing warnings are ones we want to suppress.
[13:08:06] Usually they do indicate real problems.
[13:08:13] Depending on the optimization level, gcc can get really confused by type punning.
[13:08:32] aren't we running with them like that anyway?
[13:08:38] the suppressions aren't meant to be permanent
[13:08:53] -fno-strict-aliasing has the advantage that it just tells the compiler to try less hard. Your code may be larger & slower, but at least it will work.
[13:10:25] Confirming my patch works on s390 now.
[13:19:32] --- sxw has left
[13:22:30] --- haba has become available
[13:24:25] deason: Change now in Gerrit as http://gerrit.openafs.org/1616 so that you can see what I'm seeing.
[13:40:48] --- Simon Wilkinson has left
[13:43:14] --- steven.jenkins has become available
[13:45:49] --- Simon Wilkinson has become available
[13:54:44] --- steven.jenkins has left
[13:56:58] Russ: Did you verify that the stack was still marked non-executable, or just that it built?
[13:59:27] --- shadow has left
[14:07:01] --- mdionne has become available
[14:07:11] I checked process.o with readelf and it had the right note.
[14:07:19] I don't recall how to check executable stack in the final executable.
[14:08:30] SuSE's rpmlint does it. I need to figure out how.
[14:08:38] BTW as I noted in my comments for 1616, i386 has issues with type punning as well with the new gcc.
[14:08:44] That web page that you showed had another utility that Debian doesn't have.
[14:09:25] on x86_64 we seem to be OK simply because we don't use -O2
[14:09:32] Pax?
[14:10:07] Yeah, pax-utils was it.
[14:11:22] Should we add -fno-strict-aliasing if compiling with gcc for the time being?
[14:11:28] At least if supergroups are enabled?
[14:11:34] I think we should. OpenSSH did so recently.
[14:12:23] Always, or just with supergroups do you think?
[14:12:30] Before adding it, it would be good if someone opened a Bugzilla bug with the files that require it. If we can add it just for those files, better still.
[14:12:33] --- sxw has become available
[14:12:37] we could just do it for ptserver stuff too, unless it's not a big enough deal to care
[14:12:42] I'd rather not have it globally, because it makes it easier for new problems to slip in.
[14:12:47] Yeah, agreed.
[14:12:47] yeah, that
[14:13:04] --- sxw has left
[14:13:28] my thought: if a patch shows up we can deal for 1.5.73, otherwise, don't care yet?
[14:13:36] If we could have something like the @CFLAGS_NO_ERROR@ that we use to add -Wno-error to files to add -fno-strict-aliasing, then that would probably be the best bet.
[14:14:01] ... and some volunteers to rewrite supergroups ?
[14:14:10] If you add -fno-strict-aliasing, then you don't get the strict-aliasing warnings with -Wall.
[14:14:25] Oh, wait, I see what you mean.
[14:18:37] Sorry, that sentence wasn't as clear as it could have been.
[14:32:36] --- shadow@gmail.com/owl06BCDF48 has left
[14:33:19] The stuff in StatsQuery in src/vol/fssync-debug.c with the union is not standard C. You can't type-pun pointers that way.
[14:33:27] --- summatusmentis has left
[14:33:48] --- andersk@mit.edu/dr-wily has become available
[14:34:00] --- shadow@gmail.com/owl6A7594E8 has become available
[14:35:45] * Russ looks deeper. Oh, ugh. :/
[14:36:05] * Russ adds an exception for that right now.
[14:37:27] I was speaking to someone who's just started looking at the OpenAFS code. I told him how much had been cleaned up over the last few years. He gave me this terrified look and said "What was it like before?"
[14:38:09] The relevant code there is:
[14:38:12] /* use a large type to get proper buffer alignment so we can safely cast the pointer */
#define SYNC_PROTO_BUF_DECL(buf) \
    afs_int64 _##buf##_l[SYNC_PROTO_MAX_LEN/sizeof(afs_int64)]; \
    char * buf = (char *)(_##buf##_l)
[14:38:22] Yeah, gcc isn't going to go for that.
[14:38:33] "What was it like before?" "Well, you know how now, the code made puppies die? Well, they used to explode."
[14:38:49] Well, that in and of itself is okay, but as soon as you cast that char * back to something real, it's going to explode.
[14:40:57] git blame points the finger quite well with that one ...
[14:41:00] I think someone really really wanted to alloca()
[14:41:42] Yeah, I suspect so.
[14:42:07] /* check if entry is free by looking at the first "afs_int32" of the structure */
if (*((afs_int32 *) & entry[0]) == 0) { /* zero is free */
[14:42:08] Aie.
[14:44:42] I *think* SYNC_PROTO_BUF_DECL is almost always or always used in pretty small/trivial functions; I don't mind looking through changing to malloc/free
[14:45:01] Probably easier than adding the logic to use alloca.
[14:46:16] There are only 13 callers in the current master, thankfully.
[14:46:27] I thought alloca wasn't very dependable, or something
[14:46:52] It's deprecated on a lot of platforms.
[14:52:55] It doesn't exist on a lot of platforms, which means that you have to fake it up with malloc (solved problem) and then periodically call alloca(0) to free the memory (big problem).
[14:54:36] ick
[15:03:40] http://gerrit.openafs.org/1617 has the remaining no-strict-aliasing fixes.
[15:07:39] hm, that SYNC_PROTO_foo aliasing warning was only in fssync-debug.c? it's used in several more places...
[15:07:51] When you said fixes, I got all excited.
[15:07:52] Only fssync-debug.c ever uses it for anything other than a char *.
[15:08:06] * Russ hehs. Well, there are some fixes. The rest are bails. :)
[15:09:30] I wonder if memcpy(,,sizeof(int)) is cheap on all compilers?
[15:09:45] Would a union be better than memcpy in some of those cases (verifyEntryChains, T_DumpHashTable)?
[15:09:47] Probably not all compilers, but I wouldn't expect it to be horribly bad.
[15:10:14] You can't read from a different element of a union than the one you wrote to.
[15:10:19] In general, unions don't fix strict aliasing problems.
[15:10:29] IIRC.
[15:10:35] I may have some of that wrong.
[15:10:38] It's been a long time.
[15:10:52] no, there's no guarantee of alignment for union members
[15:11:13] I seem to recall the last time I dealt with aliasing issues, I just sprinkled a load of -fno-strict-aliasing magic pixie dust and got on with life.
[15:11:29] I know that probably makes me a bad person.
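To make the memcpy suggestion above concrete: a minimal sketch of the "is the first word zero?" check written without a pointer cast. It is not taken from the Gerrit change. struct entry is a hypothetical stand-in for the real ptserver structure, and int32_t is assumed here in place of afs_int32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an on-disk entry; not the real ptserver layout. */
struct entry {
    unsigned char bytes[64];
};

/* Copy the first 32 bits out with memcpy rather than dereferencing a cast
 * pointer, so the object is never accessed through an incompatible type. */
static int32_t first_word(const struct entry *e)
{
    int32_t w;

    memcpy(&w, e->bytes, sizeof(w));
    return w;
}

/* Zero in the first word means the entry is free. */
static int entry_is_free(const struct entry *e)
{
    return first_word(e) == 0;
}
```

Whether the copy is cheap depends on the compiler, as noted in the discussion: many compilers turn a fixed-size memcpy like this into a single load, but that is an optimisation, not a guarantee.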
[15:11:34] Really? I thought aliasing with unions is explicitly allowed, and that unions are aligned to their most-aligned members.
[15:11:35] I thought accessing via a union specifically told the compiler something about aliasing
[15:11:43] see 549002c906795f978eebf81c706995116a04a8ff ; is that incorrect?
[15:12:31] part of that uses unions to squash warnings like this
[15:12:32] i thought that wasn't guaranteed to work
[15:12:37] (but probably does)
[15:12:44] er, probably always does
[15:12:59] I believe what it does is tell the compiler "my code is so weird that you should give up trying to diagnose strict aliasing problems here."
[15:13:27] Ah, there it is.
[15:13:34] Okay, gcc guarantees that will work as a gcc extension.
[15:13:43] I believe ISO C doesn't guarantee it works.
[15:14:08] I believe that ISO C says that it is implementation defined.
[15:14:10] See the -fstrict-aliasing section of the gcc info page.
[15:14:24] "It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another."
[15:14:31] They explicitly talk about type-punning through unions.
[15:14:58] gcc only allows it if the union stores the final type. In other words, you can't store a pointer of one type in a union and get back a pointer of a different type; gcc will still yell at you and do the wrong thing.
[15:15:04] You have to store the actual data directly in the union.
[15:15:52] the text being referenced (at least, what I was using for the basis of this): The practice of reading from a different union member than the one most recently written to (called "type-punning") is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. However, this code might not:
[15:15:57] Ah.
Derrick and I will never see this, as Apple disable the strict aliasing optimisations in their build of gcc.
[15:16:11] Yup, that's it.
[15:16:29] It's not very clear about that, but that's a gcc extension -- well, as Simon points out, not an extension, but an implementation decision.
[15:16:44] Other compilers are allowed by ISO C to emit nasal demons if you do that.
[15:16:49] Although as Derrick points out, they probably won't.
[15:16:54] Yes. And we do still support other compilers, sadly.
[15:17:11] \me has a little piece of hate reserved just for the IRIX compiler
[15:17:29] In general, though, I would say that type-punning is generally a sign of a design failure somewhere farther back.
[15:18:10] Sometimes it's a legitimate case of "shit, C doesn't have templates and the only way to do generic programming is through horrible hacks," but usually it's a design flaw.
[15:18:23] --- abo has left
[15:18:53] --- Simon Wilkinson has left
[15:19:06] --- abo has become available
[15:19:53] Aha, that's why you had something different on that line in iomgr.c.
[15:19:58] --- haba has left
[15:20:08] * Russ sees the AC_USE_SYSTEM_EXTENSIONS fix also touched that file.
[15:21:58] --- haba has become available
[15:24:34] also the aix compiler
[15:24:57] The AIX compiler is usually picky to the point of teeth-gratingness.
[15:25:31] Also, AIX's bundled Kerberos is bizarre.
[15:26:13] I'm not sure what I like best, the AIX-specific error reporting functions or the bit where they provide all of the private profile library functions but not the public API.
[15:27:01] ah right, you also had this fun
[15:27:11] anyway, back in a bit
[15:29:50] --- deason has left
[16:45:32] --- Simon Wilkinson has become available
[18:45:34] --- mdionne has left
[18:57:48] --- deason has become available
[20:02:38] --- Born Fool has become available
[20:48:43] Ah, I see you already rebased what I was rebasing for you.
[20:48:55] yup
[20:57:06] --- jaltman has left: Replaced by new connection
[20:57:07] --- jaltman has become available
[21:16:30] --- Russ has left: Disconnected
[21:30:12] --- deason has left
[21:37:35] --- Russ has become available
[22:03:53] --- Born Fool has left