[01:19:56] --- kaj has left
[01:22:23] --- Rasmus Kaj has become available
[01:30:07] --- haba has left
[01:42:52] --- Rasmus Kaj has left
[02:08:54] --- haba has become available
[02:30:24] --- reuteras has left
[02:31:20] --- reuteras has become available
[02:43:46] --- Jeffrey Altman has become available
[02:44:00] I am reading vlprocs.c. Do I see this right, that the whole multi-homed data structure will not be able to cope with two servers whose IP addrs have the same 0x00ffffff part and different 0xff000000 parts (say 10.1.2.3 and 11.1.2.3)?
[02:44:00] --- kaj has become available
[02:47:23] haba: Got a line number?
[02:48:46] VL_RegisterAddrs around line 2000, there is a lot of indexing going on like
    if ((HostAddress[srvidx] & 0xff000000) == 0xff000000) {
        /* The server is registered as a multihomed */
        base = (HostAddress[srvidx] >> 16) & 0xff;
        index = HostAddress[srvidx] & 0x0000ffff;
[02:52:02] Looks to me like the first part of the IP addr is set to 255 (ff) if the server is multihomed and the rest is used as a hash/index.
[02:53:51] I have yet to find where the addr = 0xff000000 | addr is taking place.
[02:54:00] I think that's just a marker, though.
[02:54:28] Like, you can't reconstruct the IP address from that hash, so the real IP address has to get stored elsewhere.
[02:55:02] yes, but servers 10.1.2.3 and 11.1.2.3 would share the same index but have different uuids.
[02:55:06] base and index are then pointers into that elsewhere.
[02:55:51] If they're different servers, they'll use a different srvidx, surely?
[03:00:55] That's what I'm trying to find out. The uuid is hanging at (ex_addr[base][index]).ex_hostuuid
[03:05:14] In my example, base would be 1 and index would be (2*256+3) in both cases.
[03:05:56] Are you actually seeing problems here, or is this a code review exercise?
[03:06:01] Normally organizations don't own IP addresses with the same endings from two /8s.
[03:06:38] I am just trying to understand what I should feed into VL_RegisterAddrs to get an expected result ;-)
[03:07:41] Maybe I should read the fileserver code instead.
[03:08:43] Okay. Found it.
[03:08:58] In the multihomed case, HostAddress doesn't store a real address.
[03:09:06] It's not even related to a real address.
[03:09:54] That's not intuitive but probably good.
[03:09:55] Instead, it is comprised of a marker (top octet 0xff)
[03:10:52] Followed by an extent block identifier (next octet)
[03:11:05] Followed by an offset within that extent block (bottom two octets)
[03:11:23] yes
[03:11:23] The extent block is then where the actual data is stored.
[03:11:37] See FindExtentBlock in vlprocs.c for the details.
[03:12:04] I would imagine this is necessary because the vldb format only has 4 octets allocated for the address.
[03:12:26] Have you found where these 4 octets are smacked together?
[03:12:41] Yes - FindExtentBlocks()
[03:12:52] FindExtentBlock(), even.
[03:13:20] --- reuteras has left
[03:13:26] --- abo has left
[03:13:38] --- reuteras has become available
[03:13:40] --- abo has become available
[03:20:36] I don't think any of this affects what you should be sending to VL_RegisterAddrs, though.
[03:21:07] Which is what I wanted to make sure.
[03:21:54] bulkaddrs is just a list of ints, in host byte order.
[03:22:29] The more interesting thing is going to be attaching the UUID to the connection.
[03:23:36] Oh, no, I tell a lie. That's trivial - it's an argument to the RPC.
[03:24:27] I will have to look at the printuuid routine and do that backwards or something like that.
[03:24:49] Then stuff it into the rpc
[03:25:43] Here in the "east", lunchtime is coming up now.
[03:25:54] I've got an hour or so to go ...
[03:25:58] Enjoy lunch!
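The marker layout described in the messages above (top octet 0xff, next octet the extent block identifier, bottom two octets the offset) can be sketched in C as follows. This is a minimal illustration only: the `mh_*` names and the `MH_MARKER_MASK` macro are hypothetical and are not the identifiers used in vlprocs.c, and the sketch assumes a 32-bit HostAddress slot as quoted earlier in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the bit layout comes from the discussion. */
#define MH_MARKER_MASK 0xff000000u

/* True if the slot holds a multihomed marker rather than a real address. */
static int mh_is_marker(uint32_t slot)
{
    return (slot & MH_MARKER_MASK) == MH_MARKER_MASK;
}

/* Extent block identifier: the second octet. */
static unsigned mh_base(uint32_t slot)
{
    return (slot >> 16) & 0xff;
}

/* Offset within the extent block: the bottom two octets. */
static unsigned mh_index(uint32_t slot)
{
    return slot & 0x0000ffff;
}

/* Inverse of the two accessors: build a marker from block and offset. */
static uint32_t mh_encode(unsigned base, unsigned index)
{
    assert(base <= 0xff && index <= 0xffff);
    return MH_MARKER_MASK | ((uint32_t)base << 16) | (uint32_t)index;
}
```

Under this layout nothing in the marker is derived from a server's IP address, so two servers such as 10.1.2.3 and 11.1.2.3 only collide if they are given the same block and offset.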
[03:26:16] Thanks, and thanks for unconfusing me
[03:26:58] no problem
[03:27:41] --- abo has left
[03:28:29] --- abo has become available
[03:36:18] Talk about a false sense of security: afs_GetUser and afs_PutUser have a locktype parameter, which they do precisely nothing with ...
[06:05:08] --- meffie has become available
[06:19:24] the GetUser/PutUser locktype has been discussed before. we try to at least keep what kind of lock you'd need up to date lest we someday use it
[06:20:07] I think that day might have come.
[06:21:06] I'm currently implementing a token container within the user structure. I think we're going to need to lock around it, otherwise token use will race with the pioctls and garbage collection routines
[06:21:59] I'm pretty surprised we don't already have problems with a race between connection creation and setting new tokens.
[06:31:05] hello, i heard the openafs-info daily digests have not been going out for the last two weeks. i signed up for digests (on a second email account) and i haven't gotten any digests yet.
[06:35:18] i suspect the people who know anything about it aren't online yet
[06:39:01] ok
[06:40:03] wondering, should i put that in rt? or is it enough to mention it here?
[06:40:44] You could put it in an RT, but I'm not sure if the person responsible would read it there. You could also try just emailing jhutz ...
[06:42:11] meffie: you want to send mail to openafs-listmgr@openafs.org
[06:42:35] will do, thanks.
[06:43:19] the way you find that address is by going to the appropriate mailing list page from the openafs.org web site and searching for the administrator's mail address.
[06:43:26] This is just pure, unadulterated, evil ...
[06:43:26] #define u (*(get_user_struct()))
[06:46:40] listmgr will work, i'm not sure it's right but it beats anything else.
[07:13:01] --- deason has become available
[07:29:42] > That's not intuitive but probably good.
It doesn't have to be intuitive. It's part of the VLDB's internal structure.
It's never exposed to you, unless you try to figure out what to do by reading the VLDB database code instead of just using the documented interface, which is "give me all of your IP addresses".
[07:32:41] it would be helpful for it to be intuitive just for the ease of reading vlserver code, but I'm not about to go change it
[07:32:50] > You could also try just emailing jhutz ...
You could, but that wouldn't be as effective as sending email to the list owner address, or openafs-listmgr, or postmaster
[07:34:23] deason: The danger is the assumption that 'HostAddress' might store an address. Once you get past that ...
[07:34:26] And actually, it is reasonably intuitive. If it starts with 0xff000000, the rest is the index into the expansion table. haba even quoted the code where we extract the index into the expansion table from the rest. I don't know why he thought that had anything to do with the machine's real IP address.
[07:38:14] --- jaltman has left: Disconnected
[07:39:11] --- reuteras has left
[08:03:17] Hi Jeff
[08:06:29] Do I understand correctly that VL_GetAddrs(bla bla, &m_addrs) returns a long list of all addrs, where some of them start with 0xff000000, meaning that these entries are multihomed and you have to dig further, but VL_GetAddrsU(bla bla, &m_addrs) always returns the real addresses in m_addrs?
[08:07:03] No. The 0xff000000 addresses never appear on the wire. They are part of the internal structure of the VLDB.
[08:07:45] Just got those from a VL_GetAddrs() call in vos IMHO.
[08:09:57] Did you stick them into vos?
[08:11:07] That is, have you been putting 'interesting' values into your vldb?
[08:11:35] let me try to explain what I am seeing.
[08:12:49] Oh, actually, it looks like VL_GetAddrs will return 0xffxxxxxx numbers for multi-homed entries, unlike the interfaces that CMs actually use. You can't do anything with those entries; if you want a list of all the actual addresses, use the new interface.
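To illustrate the point about VL_GetAddrs returning 0xffxxxxxx numbers alongside real addresses, here is a small sketch of how a caller might separate the two kinds of entry in a host-byte-order list. `split_addrs` is a hypothetical helper written for this example, not an OpenAFS function, and it makes no RPC calls; it only assumes the top-octet-0xff convention described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of OpenAFS.
 * Copies the real addresses from in[0..n) into real[] and counts the
 * 0xffxxxxxx marker entries in *nmarkers. Returns the number of real
 * addresses copied. real[] must have room for n entries. */
static size_t split_addrs(const uint32_t *in, size_t n,
                          uint32_t *real, size_t *nmarkers)
{
    size_t nreal = 0;

    assert(in != NULL && real != NULL && nmarkers != NULL);
    *nmarkers = 0;
    for (size_t i = 0; i < n; i++) {
        if ((in[i] & 0xff000000u) == 0xff000000u)
            (*nmarkers)++;          /* multihomed marker, not an IP */
        else
            real[nreal++] = in[i];  /* ordinary IPv4 address, host byte order */
    }
    return nreal;
}
```

A caller that only wants usable addresses would keep `real[]` and ignore the markers, which matches the advice to use the newer interface when the actual addresses are needed.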
[08:13:01] --- abo has left
[08:13:11] --- abo has become available
[08:13:18] I took the code from 1.5.71 vos.c and in the ListAddrs function on line 5335, after the call to
    ubik_VL_GetAddrs(cstruct, UBIK_CALL_NEW, 0, 0, &vlcb, &nentries, &m_addrs);
I added a
    printf("nentries=%d\n",nentries);
    print_addrs(&m_addrs, &m_uuid, nentries, 0);
[08:14:11] Have you added addresses which are 255.0.0.0 to your vldb at any point?
[08:14:31] or 255.x.y.z, more correctly?
[08:14:47] That will trigger the otherwise unused code in print_addrs() that does multihome-magic
[08:15:53] WAYRTTD
[08:16:10] I have not added anything manually to my vldb.
[08:17:48] So, you're seeing this, I think, because VL_GetAddrs is an old version of the RPC.
[08:17:49] jhutz: I am trying to understand what is in there and what I will get back. I am writing a vos setaddrs -uuid xxx -host a -host b -host c function.
[08:18:07] VL_GetAddrsU is the newer version.
[08:18:14] Simon: Check
[08:18:28] I still don't get why you care about the vlserver, though.
[08:18:45] You just need to mimic what the fileserver does when it starts up. The code there is pretty clear. I think you're just confusing yourself.
[08:19:10] I suspect that if you don't run a vldb from the stone age, you can get rid of the call to VL_GetAddrs (without the U) in ListAddrs
[08:19:59] For "vos setaddrs", you just need to call VL_RegisterAddrs with the UUID and the actual IP addresses.
[08:20:02] I wonder if it is a bug that we're actually exposing the redirection markers from the vldb on the wire. I kind of think they should just be hidden from callers to VL_GetAddrs. But, obviously, there's code that expects them to be there.
[08:20:17] ==jhutz
[08:20:30] In this case, the vlserver is a black box. You stuff information in, you get information out.
[08:21:02] VL_GetAddrs and VL_GetAddrsU exist solely to support 'vos listaddrs', which is a debugging tool.
Better for the former to actually list the multi-home indices rather than simply not include an entry for that server at all.
[08:21:21] yes, I am writing code that will use VL_RegisterAddrs() but while I am doing it I am seeing debugging messages which I want to understand.
[08:28:26] I wonder what I should think about entry 10.0.0.1 which shows up from the VL_GetAddrs output but not from the VL_GetAddrsU output. I had no idea that it was there at all.
[08:33:44] --- jaltman has become available
[08:43:03] --- jaltman has left: Disconnected
[08:48:53] What do you think about this patch (ignoring the first 2 chunks) /afs/stacken.kth.se/home/haba/Public/vos-setaddrs.patch ?
[08:49:45] * haba has not tested it on any cell yet.
[08:50:14] That should answer WAYRTTD, too.
[08:52:48] The general premise looks reasonable.
[08:52:51] You can't ViceLog() from vos.
[08:52:56] I think the first two hunks are not about adding vos setaddrs, and maybe belong in a separate patch. Other than that, I suggest pushing to gerrit.
[08:55:02] Ideally, it would also update the vos manpage to announce its existence, and have a manpage of its own.
[08:55:55] Other stuff: Is it valid to call this without supplying a uuid?
[08:56:24] probably not without an uuid, no.
[08:56:38] You should catch that case, and throw an error.
[08:57:02] Sure, that together with the manual page AFTER testing ;)
[08:57:31] He shouldn't have to catch that case; the parameter isn't marked optional
[08:57:43] That's fine. As I said, the approach seems fine.
[08:58:11] --- abo has left
[08:58:27] Now Sushi, then I will decide which cell I like the least ;)
[08:58:40] .. and test this on.
[08:59:08] --- abo has become available
[09:11:49] --- haba has left
[09:13:06] --- kaj has left
[10:18:50] --- jaltman has become available
[10:23:33] --- meffie has left
[10:25:36] --- jaltman has left: Replaced by new connection
[10:25:37] --- jaltman has become available
[10:43:34] --- kaj has become available
[10:44:37] --- haba has become available
[10:51:36] > It doesn't have to be intuitive.
[10:51:54] wasn't this in relation to openafs-listmgr?
[10:53:09] reread. guess not. ok
[10:53:25] --- kaj has left
[10:55:43] --- kaj has become available
[10:58:59] No, but openafs-listmgr doesn't have to be intuitive either, because openafs-info-admin and openafs-info-owner both work.
[10:59:31] hey, as long as the answer was "not me"
[11:15:51] --- Russ has left: Disconnected
[11:16:44] VL_GetAddrs() says that I have 127.0.0.1 amongst the IP addrs but vos changeaddr -remove does not let me remove it.
[11:17:06] is it considered in use, perhaps?
[11:17:22] That would mean a volume.... Hmmmm
[11:17:28] yes
[11:17:36] vos listvl -server 127.0.0.1
[11:18:43]
    $ ./vos listvldb --cell stacken.kth.se -server 127.0.0.1
    VLDB entries for server 127.0.0.1
    Total entries: 0
Nope
    $ ./vos changeaddr -oldaddr 127.0.0.1 -remove -cell stacken.kth.se
    Could not remove server 127.0.0.1 from the VLDB
    vlserver does not support the remove flag or
    VLDB: no such entry
(I can remove other IP addrs)
[11:19:52] use check_vldb and see if you perhaps have a damaged vldb now
[11:21:09] How do I run that>
[11:21:10] ?
[11:21:55] i bet it has a usage
[11:22:00] maybe even a man page
[11:26:39] The only binary which has the string "check_vldb" is the vlserver itself.
[11:27:04] sorry. vldb_check
[11:27:04] So what do you mean?
[11:27:32] That I have
[11:27:50] good.
[11:30:40] --- deason has left
[11:30:54] --- deason has become available
[11:33:18] --- meffie has become available
[11:34:38] Hmmmm
    # /usr/afs/etc/vldb_check vldb.DB0.safe -servers
    Header's maximum volume id is 536903089 and largest id found in VLDB is 536903086
    MH block 0, index 1: 130.237.234.48
    MH block 0, index 2: 130.237.234.101
    MH block 0, index 7: 130.237.234.46
    MH block 0, index 13: 130.237.234.47
    Server ip addr 1 = MH block 0, index 2
    warning: IP Addr for entry 3: Multihome entry has no ip addresses
    Server ip addr 3 = MH block 0, index 4
    warning: IP Addr for entry 5: Multihome entry has no ip addresses
    Server ip addr 5 = MH block 0, index 6
    Server ip addr 6 = 127.0.0.1
    Server ip addr 7 = MH block 0, index 1
    Server ip addr 11 = MH block 0, index 7
    Server ip addr 15 = MH block 0, index 13
[11:38:47] I do not have any volume entries that have "server 6" (nor server 5 nor server 3 fortunately).
[11:38:56] --- Russ has become available
[11:41:01] changeaddr -remove should be able to remove 6, though, that's a bit odd
[11:41:26] yes, it should.
[11:41:40] I don't have any plans to use that addr for any server though ;-)
[11:42:16] oh, uh, there's some special case for localhost addr
[11:42:27] in vos; it gets the host name, and resolves that to an ip
[11:42:53] good point.
[11:43:04] (to permit vos cr localhost a ...)
[11:43:06] if you can make `hostname` resolve to 127.0.0.1, that may get around that
[11:43:12] (temporarily)
[11:43:15] I am already hacking in vos...
[11:43:49] Probably the 127.0.0.1 came into the vldb before all these special cases were implemented
[11:43:59] Oh, hm. Yeah; you just need an older vos
[11:44:10] he has source. he can fix it
[11:44:20] yes
[11:44:51] or vlclient could probably do it if it's handy
[11:45:40] --- abo has left
[11:46:37] --- abo has become available
[11:49:08] I wonder if anyone has done/written up analysis about fileserver tuning
[11:49:53] transarc did some. i have it somewhere.
[11:50:06] this century
[11:50:12] there'd be a talk for the workshop if i had time to redo it.
[11:50:19] the logic was sound
[12:07:07] --- bpoliakoff has become available
[12:07:41] --- bpoliakoff has left
[13:49:10] --- manfred has become available
[13:50:00] fileserver tuning : 1) don't use ext3 2) see 1.
[14:07:39] did you ever try data=writeback with ext3, by the way?
[14:10:19] --- manfred has left
[14:18:32] --- pod has left
[14:19:43] --- jaltman has left: Disconnected
[14:19:51] --- jaltman has become available
[14:21:09] data=writeback is too dangerous. There's too large a chance of corruption resulting in one user's data being exposed to a different user on the system.
[14:39:47] --- mdionne has become available
[15:45:02] --- deason has left
[16:12:52] --- deason has become available
[16:48:07] Hmmm. The current git master cache manager seems to deadlock on the GLOCK. Off to sleep now, but just in case others are seeing issues ...
[16:48:12] (on Linux, btw)
[16:57:23] --- haba has left
[17:02:27] --- meffie has left
[17:03:49] --- kaj has left
[17:18:23] --- mdionne has left
[17:48:58] --- mdionne has become available
[17:52:53] hmm, current master wedged my system after 10-15 minutes. 1.5.71 + fix also eventually wedged.
[17:54:36] --- jaltman has left: Replaced by new connection
[17:54:37] --- jaltman has become available
[18:26:09] .. but that was with a really really small cache. with a more normal sized cache things look OK.
[18:26:13] --- mdionne has left
[19:41:32] > The current git master cache manager seems to deadlock on the GLOCK.
alt-sysrq-t or it didn't happen
[19:43:59] --- Jeffrey Altman has left
[21:11:24] --- Russ has left: Disconnected
[21:53:16] --- jaltman has left: Disconnected
[21:53:24] --- jaltman has become available
[21:54:10] --- deason has left
[22:03:19] --- kaj has become available
[22:36:40] --- kaj has left
[22:38:41] --- reuteras has become available
[23:35:33] --- kaj has become available