[00:17:25] --- cclausen has left
[01:48:45] --- dev-zero@jabber.org has become available
[02:09:00] --- dwbotsch has left
[02:11:05] --- dwbotsch has become available
[02:48:12] --- dwbotsch has left
[02:49:54] --- dwbotsch has become available
[03:11:44] --- haba has left
[05:15:50] Further to the earlier discussion about wireshark showing loads of packets as AFS (RX) FS Request: Unknown(0) (0), I've now got a full packet trace with these in.
[05:16:17] Basically, it's a properly formed RX header, with a service ID of 1, followed by a payload consisting entirely of NULL bytes.
[05:32:16] --- Russ has left: Disconnected
[05:32:21] --- Russ has become available
[05:33:23] --- Jeffrey Altman has left: Replaced by new connection
[06:00:45] --- Russ has left: Disconnected
[06:25:28] --- Jeffrey Altman has become available
[06:26:51] what port is it being sent to?
[06:27:12] FS?
[06:45:30] Yup.
[06:46:15] Seems to only happen as the number of packets being sent gets higher (I'm seeing this as a byproduct of iozone runs - they only appear as the number of store-data64 packets increases)
[06:46:33] I was running my stress tests. I'm doing so now.
[06:47:37] which means that my improved RTT calculations are likely correct.
[06:48:12] I suspect it is a side effect of retries
[06:48:51] I fear it is attempts to retry packets that have already been placed on the free list.
[06:49:04] That would make sense, I guess.
[06:49:30] It's definitely trying to send an empty structure. And the checksum and everything else are correct according to wireshark.
[06:54:02] I should rebuild with the new RTT calculations so that we actually send more retries.
[06:54:18] See if you can take out your router again? :)
[06:54:33] the existing algorithm results in the timeout value being too large and so we do not time out when we should.
[06:54:48] unfortunately taking out the router will be a side effect of that.
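The point about an oversized timeout suppressing retries can be illustrated with the standard smoothed-RTT estimator from RFC 6298. This is a generic sketch of that technique, not the rx library's code or the "improved RTT calculations" mentioned in the log. The class name `RttEstimator`, its methods, and the `min_rto`/`max_rto` bounds are all hypothetical, chosen only for the example.

```python
class RttEstimator:
    """RFC 6298-style retransmission timeout (RTO) estimator. Illustrative only."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variance
    K = 4           # variance multiplier in the RTO formula

    def __init__(self, min_rto=0.2, max_rto=60.0):
        # min_rto and max_rto are assumptions for this sketch, not rx values.
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.rto = 1.0  # initial RTO before any sample, as RFC 6298 suggests

    def sample(self, rtt):
        """Feed one measured round-trip time (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initialises both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Variance is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + self.K * self.rttvar
        self.rto = max(self.min_rto, min(rto, self.max_rto))
        return self.rto

    def backoff(self):
        """Double the RTO after a retransmission timeout, up to max_rto."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto


if __name__ == "__main__":
    est = RttEstimator()
    for rtt in (0.100, 0.120, 0.090):
        print(f"rtt={rtt:.3f}s -> rto={est.sample(rtt):.3f}s")
    print(f"after one timeout: rto={est.backoff():.3f}s")
```

If the estimator never receives a valid sample, or the backoff is applied incorrectly, the RTO stays far above the real round-trip time and lost packets are retried late or not at all. That is the failure mode described above.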
[07:21:53] just running the stress test is not sufficient to cause the problem
[07:35:26] Interesting.
[07:35:37] I can reproduce it on demand with iozone -a -i0 -i1
[07:35:58] what are the retry counts for the peer?
[07:40:00] I need to create an interface to permit rx_intentionallyDroppedOnReadPer100 and rx_intentionallyDroppedPacketsPer100 to be set at run time
[07:40:58] Retry counts for the client, or for the fileserver?
[07:41:33] I was thinking the client
[07:42:51] rxdebug -rxstats 7001 ?
[07:43:22] rxdebug 7001 -peer and then look at the entry for the file server
[07:43:59] Peer at host 129.215.33.230, port 7000
           ifMTU 1444 natMTU 1444 maxMTU 1444
           packets sent 326518978 packet resends 20
           bytes sent high 53 low -769127349
           bytes received high 0 low 0
           rtt 0 msec, rtt_dev 0 msec
           timeout 0.353 sec
[07:48:22] not many retries at all
[07:49:20] Indeed. Unless that counter has reset itself, that's far fewer retries than the number of NULL packets I was seeing.
[08:09:08] --- dev-zero@jabber.org has left
[08:17:01] which might indicate that the rx library doesn't think they are retries but new data packets
[08:18:37] It's interesting if the issue exists on both Unix and Windows.
[08:18:46] That would suggest it's something internal to the rx library.
[08:22:45] that is my assumption
[08:23:37] we have an open RT ticket for an rx packet being freed more than once.
[08:24:07] I'm hoping the issues are related
[08:28:10] which reminds me. I should rebuild with RXDEBUG_PACKET defined
[08:49:48] we need to define an EULA for inclusion in OpenAFS installers that display them. Forcing users to agree to the IPL is inappropriate as the IPL is not an EULA.
[09:13:27] --- Jeffrey Altman has left: Replaced by new connection
[09:19:55] --- Jeffrey Altman has become available
[09:20:49] afsd is attempting to send packets at a rate of 54MB/sec through my cable router. obviously it is failing
[09:22:23] --- agoode has left
[09:29:00] I'm amazed you can still get XMPP through.
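The "bytes sent high 53 low -769127349" figures in the peer output above are easier to read as a single number. The sketch below assumes the pair is the two 32-bit halves of one 64-bit counter, printed as signed integers (which would explain the negative low word). That reading is an assumption about the rxdebug output, and the helper name `combine_hyper` is hypothetical.

```python
def combine_hyper(high, low):
    """Combine two 32-bit words, possibly printed as signed, into one unsigned 64-bit value."""
    # Masking with 0xFFFFFFFF reinterprets a negative signed word as unsigned.
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


if __name__ == "__main__":
    # Values copied from the "bytes sent" line of the peer output in the log.
    total = combine_hyper(53, -769127349)
    print(f"bytes sent: {total} (about {total / 1e9:.0f} GB)")
```

Under that assumption the peer has sent roughly 231 GB, which is consistent in order of magnitude with 326518978 packets at an MTU of 1444 bytes.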
[09:29:14] I couldn't.
[09:29:29] I paused the service so I could try to analyze what is going on
[09:32:19] You've probably got a better chance of doing so on Windows. Linux's kernel debugging tools aren't great.
[09:32:51] it would be great if I can repro this with just the rx library in user land
[09:47:21] one thing I see that is wrong is that peer->timeout.sec is somehow never being initialized.
[09:51:12] or being assigned to a struct clock that is never initialized which is more likely
[09:56:57] --- edgester has become available
[09:58:41] --- dev-zero@jabber.org has become available
[10:05:47] my trigger is an error in the RTT backoff logic that I added
[10:13:00] stresstest is running again. perhaps this time I will recreate your bug
[10:16:56] I do appear to be sending a lot of store-data-64 requests for a fileId, offset and length where all bytes are 'd8'
[10:19:02] I'm out of time for today. Will come back to this later.
[10:23:11] hmm. looks like it is a wireshark display issue.
[10:23:44] by any chance is iostress writing files filled with zeros?
[10:42:57] --- dragos.tatulea has become available
[10:43:09] Hi.
[10:43:42] I'm looking for the cache truncate wiki page and I can't find it on twiki.
[10:44:19] Any ideas or suggestions as to where it might have disappeared to, pls?
[10:53:14] --- edgester has left
[11:22:12] --- dev-zero@jabber.org has left
[11:28:54] --- deason has become available
[11:45:51] --- dev-zero@jabber.org has become available
[12:52:48] --- dragos.tatulea has left
[12:53:59] --- dragos.tatulea has become available
[13:36:27] --- cclausen has become available
[13:38:58] --- dragos.tatulea has left
[15:04:21] --- dev-zero@jabber.org has left
[15:44:59] --- Russ has become available
[16:19:06] --- Russ has left: Disconnected
[17:56:59] stress test ran all day. couldn't repro the all 0s issue today.
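To tell a genuinely empty packet from one that is just carrying zero-filled file data, it helps to split each captured datagram into its RX header and payload. The sketch below is a minimal illustration. It assumes the usual 28-byte RX wire header: five 32-bit words (epoch, connection ID, call number, sequence, serial), four single-byte fields (type, flags, user status, security index), then a 16-bit checksum and a 16-bit service ID. That layout, and the names `parse_rx_header` and `payload_is_all_zero`, are assumptions for the example and should be checked against the rx sources before use.

```python
import struct

# Assumed RX wire header layout (network byte order), 28 bytes in total.
RX_HEADER_FORMAT = "!IIIIIBBBBHH"
RX_HEADER_SIZE = struct.calcsize(RX_HEADER_FORMAT)

RX_HEADER_FIELDS = (
    "epoch", "cid", "call_number", "seq", "serial",
    "type", "flags", "user_status", "security_index",
    "checksum", "service_id",
)


def parse_rx_header(datagram):
    """Return the assumed RX header fields of a UDP payload as a dict."""
    if len(datagram) < RX_HEADER_SIZE:
        raise ValueError("datagram shorter than an RX header")
    values = struct.unpack_from(RX_HEADER_FORMAT, datagram)
    return dict(zip(RX_HEADER_FIELDS, values))


def payload_is_all_zero(datagram):
    """True if there is a payload after the header and every byte of it is zero."""
    payload = datagram[RX_HEADER_SIZE:]
    return len(payload) > 0 and not any(payload)


if __name__ == "__main__":
    # Synthetic packet: made-up header values, service ID 1, 64 zero bytes of payload.
    header = struct.pack(RX_HEADER_FORMAT, 1, 2, 3, 4, 5, 1, 0, 0, 0, 0, 1)
    packet = header + bytes(64)
    print(parse_rx_header(packet)["service_id"], payload_is_all_zero(packet))
```

A check like this only flags candidates. A store-data call that writes a zero-filled file would look the same at the payload level, which is why the question about the test data matters.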
[19:36:52] --- Russ has become available
[20:39:22] --- Russ has left: Disconnected
[21:03:34] --- dev-zero@jabber.org has become available
[21:36:55] --- deason has left
[22:40:08] --- dev-zero@jabber.org has left
[22:46:16] --- cclausen has left
[22:46:17] --- reuteras has become available
[23:47:55] --- dev-zero@jabber.org has become available