[00:29:13] OK, now you have 50MB in the sandbox and 50MB in prod
[00:42:19] --- dev-zero@jabber.org has left
[01:46:41] --- dev-zero@jabber.org has become available
[01:59:28] --- Simon Wilkinson has become available
[03:03:00] --- pod has become available
[03:20:06] --- cclausen has left
[05:33:50] --- Jeffrey Altman has left: Replaced by new connection
[05:34:13] --- Jeffrey Altman has become available
[06:01:33] What do we want to do with the 1.4.x tree? Shall I pull over the last bits from CVS, or does someone else want to do it?
[06:06:37] --- abo has left
[06:14:46] if you have time to do so, I see no reason why you should not just go ahead and do it.
[06:16:11] I'm also wondering about direct pushes into 1.4.x for pullups. I know that having to push pullups through gerrit is a pain, but it does avoid accidental merge commits and other history-poisoning mess.
[06:17:21] I think being able to double-check things in gerrit is useful.
[06:18:01] Gerrit currently has no mechanism for handling tagging beyond direct pushes, but I can configure it so the only thing you can directly push is tags.
[06:18:06] I would rather have the extra safeguard.
[06:19:25] the reality is that for any major fix we usually have to produce a separate maintenance version of it anyway, and that should be reviewed.
[06:20:57] Yeh. It's more for bug fixes that I can see it being a pain. I guess we won't really know until we start doing pullups.
[06:21:37] --- cclausen has become available
[06:23:31] For the one-line changes we should cherry-pick, push to gerrit, and approve. Pushing the extra button is not all that hard.
[06:26:36] --- reuteras has left
[06:30:10] --- abo has become available
[06:32:59] if you wish to pull over the last bits from cvs, please do. otherwise, i will do it when i next have network, which will either be at lunch, late afternoon, or sunday night (i have no idea, yet)
[06:33:13] I'll do it later today.
[06:34:12] finally got the rpm dir to rsync; your cell was "down" to me from when i got the mail until i slept
[06:34:18] it's releasing now
[06:35:21] Yes. All of our external routers crashed at some point in the night.
[06:36:31] nice!
[06:38:02] We had this happen in the past. If you have a particular packet type that crashes one router, and then your routing switches to a backup route, and that packet is still being sent, you can quite easily take down all of your redundancy (if all of the routers are running the same software).
[06:38:44] the benefits of diversification among deployed implementations.
[06:38:52] Indeed.
[06:39:04] yeah, this is the same sort of thing that killed zephyr. first server dies, zhm switches... kills second server
[06:39:32] I expect that when a Windows server platform is available, many orgs will adopt it for just that reason.
[06:44:16] There's something seriously wrong with our write code on Linux.
[06:44:26] Compare http://homepages.inf.ed.ac.uk/sxw/graphs/2d-read.png with http://homepages.inf.ed.ac.uk/sxw/graphs/2d-reread.png
[06:44:59] (This is from iozone, so the first graph is for a read operation of data that was written immediately before)
[06:45:44] Until we hit the 'knee' at around 200,000 KBytes, everything should be in the page cache. For some reason, our write isn't leaving the page cache in a state that 'read' can get data back from.
[06:47:27] what does the network traffic show? the windows cm had the problem that when the dv on the object changed, all of the data that was cached from the write had to be repopulated.
[06:48:41] I'll have a look. I seem to recall that you mentioned that Windows bug a while ago, and I checked then that the Unix client didn't suffer from it, but I'll look again.
[06:49:00] the solution was to implement not a single dv value as current but a range of values as current, so that the merge status operation could be O(1) instead of O(num of buffers) or having to re-read the data from the file server.
[06:49:38] The page cache is definitely in play here, though - my 'readahead' code that shows the dramatic improvement in the early stages of the read graph only triggers if the file is Statd and has valid chunks in the cache.
[06:53:17] This may also explain something I've always wondered. I could never understand why people would run memcache on Linux. Given the existence of the page cache, there should be little or no benefit to memcache in a normal application - in fact, it should degrade performance as it reduces the size of memory available for the page cache.
[06:58:47] in afs_ProcessFS() I do not see where the segments get their data version updated when the new data version is applied to the vcache object.
[06:59:37] I thought LocalHero was responsible, but I could be mistaken.
[07:03:05] --- deason has become available
[07:04:43] it looks like afs_StoreAllSegments() does it. Assuming I am reading this correctly, it obtains a list of all dcache entries that had the current version at the start, then afterwards it walks the list and sets the new version. So the behavior is O(num of dcache objects for the file).
[07:04:46] Actually, afs_StoreAllSegments() has code to handle this, but I'm not sure if it will fire correctly in all cases.
[07:05:20] Yes. It's inefficient. But not inefficient enough to account for the scale of performance difference that we're seeing.
[07:05:56] you would be surprised. constructing that list and obtaining/releasing the locks is expensive.
[07:06:15] Not in a file that only has a single chunk.
[07:07:09] If you look at the graph, the performance difference is largest at the beginning, where the number of chunks is the smallest.
[07:07:42] And obtaining and releasing locks is essentially free on a Linux CM which only has one reader/writer active.
[07:07:57] it looks like dcList doesn't include all of the dcache entries for the file anyway.
[07:08:35] it is easy enough to verify that the data is not being re-read from the file server.
[07:09:03] Indeed.
[07:09:23] --- pod has left
[07:09:45] assuming it isn't, the next thing is to verify that the page cache does not get invalidated when the DV changes due to a locally triggered Store.
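(A minimal C sketch of the two data-version merge strategies discussed above: the per-chunk walk described for afs_StoreAllSegments(), which is O(number of dcache entries for the file), versus the "range of current DVs" idea, which makes the merge O(1). The struct and function names below are simplified stand-ins for illustration only, not the real OpenAFS vcache/dcache types or locking.)

    /* Illustrative sketch only: simplified stand-ins, not the real OpenAFS
     * vcache/dcache structures or locking. */
    #include <stdint.h>
    #include <stdio.h>

    struct dcache {
        uint64_t dv;            /* data version this chunk was stored against */
        struct dcache *next;
    };

    struct vcache {
        uint64_t dv_low;        /* oldest DV still treated as current */
        uint64_t dv_high;       /* newest DV returned by a store */
        struct dcache *chunks;  /* cached chunks belonging to this file */
    };

    /* O(number of chunks): the pattern described for afs_StoreAllSegments() -
     * collect the chunks that were current when the store began, then walk
     * them and stamp each with the DV the file server returned. */
    static void merge_dv_per_chunk(struct vcache *vc, uint64_t old_dv, uint64_t new_dv)
    {
        struct dcache *dc;
        for (dc = vc->chunks; dc; dc = dc->next)
            if (dc->dv == old_dv)
                dc->dv = new_dv;
        vc->dv_low = vc->dv_high = new_dv;
    }

    /* O(1): the alternative described above - keep a range of DVs that count
     * as current, so a locally triggered store only has to widen the range. */
    static void merge_dv_range(struct vcache *vc, uint64_t new_dv)
    {
        if (new_dv > vc->dv_high)
            vc->dv_high = new_dv;
    }

    static int chunk_is_current(const struct vcache *vc, const struct dcache *dc)
    {
        return dc->dv >= vc->dv_low && dc->dv <= vc->dv_high;
    }

    int main(void)
    {
        struct dcache chunk = { 4, NULL };
        struct vcache file = { 4, 4, &chunk };

        merge_dv_range(&file, 5);            /* store bumped DV 4 -> 5 */
        printf("chunk current after store: %d\n", chunk_is_current(&file, &chunk));

        merge_dv_per_chunk(&file, 4, 5);     /* or stamp each chunk instead */
        printf("chunk dv now: %llu\n", (unsigned long long)chunk.dv);
        return 0;
    }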
[07:10:25] --- mmeffie has become available
[07:11:17] --- phalenor has left
[08:04:44] --- phalenor has become available
[08:05:35] http://dl.openafs.org/dl seems to be no worky
[08:07:44] --- pod has become available
[08:24:31] today is a sad day. http://www.sdtimes.com/blog/post/2009/07/16/The-End-of-Sun.aspx
[08:30:07] Simon, any luck?
[08:32:10] --- mmeffie has left
[08:43:42] --- Russ has become available
[08:48:19] someone in IRC mentioned: http://dl.openafs.org/dl doesn't seem to be working
[08:48:26] is this to be expected?
[08:49:03] it was mentioned here as well. there is no one here at the moment that can do anything. send mail to webmaster.
[08:52:22] phalenor: did you email?
[08:52:33] I suppose I can
[08:52:53] please do
[08:55:11] Jeffrey: Not as yet, got distracted so haven't looked at the results from the run.
[08:56:04] email sent
[09:11:31] --- agoode has become available
[09:11:35] > http://dl.openafs.org/dl Yeah, I'm looking at that.
[09:12:22] do you want nagios checks set up for some of this stuff?
[09:21:59] I'm not sure it would make much difference; people seem to tell us quickly enough when something breaks. It's about someone being around to fix it.
[09:23:09] I would say it's about it not breaking in the first place
[09:23:29] Nagios doesn't really help with not breaking in the first place.
[09:24:36] Jeffrey: No read-datas in sight.
[09:37:14] then the data in the cache is valid and it either means that there is a lock competition problem or the page cache contents are being invalidated
[09:37:51] Yes, which we knew because of the code paths being followed as the page cache is filled.
[09:38:37] It's unlikely to be lock competition too, because we are being asked to fill the page cache. If things were working correctly, our readpages() routine would never be getting called.
[09:40:21] so why is readpages() being called? what is the trigger event? any way of identifying that?
[09:40:49] It's triggered when the kernel attempts to access a page which is not up to date.
[09:41:09] by what criteria?
[09:41:19] By the 'UpToDate' flag on the page.
[09:42:12] can we determine why the flag is being modified?
[09:43:19] btw, I love this http://blogs.sun.com/chandan/entry/copyrights_licenses_and_cddl_illustrated
[09:43:42] Not easily, but there are three major possibilities: it's not getting set when the page is filled; it is getting set, but we are clearing it; or it is getting set, but the kernel is clearing it.
[09:44:27] the first should be easy to disprove.
[09:44:54] the second should be easy to document
[09:45:23] The issue is that things are complicated, due to the way that writes actually happen in Linux.
[09:45:41] printf to the rescue
[09:45:44] We don't actually get asked to move the page from memory to backing store until pdflush decides that it's time.
[09:46:30] isn't this a case though where the page is already up to date, and after the Store we should just be leaving it alone?
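(To make the 'UpToDate' behaviour above concrete, here is a tiny self-contained C model of what the discussion describes: the generic read path only falls back to the filesystem's readpage()/readpages() when a page is not flagged up to date, so if the write path left its pages flagged, a re-read should be served from the page cache with no RPC. The struct page below is a toy stand-in for the kernel's, and the three possibilities listed above correspond to who sets or clears the flag.)

    /* Toy model of the page-cache flag logic under discussion; 'struct page'
     * and the function names imitate the kernel's, but this is an
     * illustration, not kernel code. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct page {
        int uptodate;           /* models the kernel's 'UpToDate' page flag */
        char data[PAGE_SIZE];
    };

    /* What a filesystem readpage() does: fetch the data (here, a canned
     * pattern) and mark the page up to date.  Possibility 1 above is
     * forgetting to set the flag after filling the page. */
    static void fs_readpage(struct page *pg)
    {
        memset(pg->data, 'r', sizeof(pg->data));
        pg->uptodate = 1;
    }

    /* What the generic read path does: only call readpage() when the cached
     * page is not up to date; otherwise serve it from the page cache. */
    static void generic_read(struct page *pg)
    {
        if (!pg->uptodate) {
            printf("page not up to date -> readpage() (data re-fetched)\n");
            fs_readpage(pg);
        } else {
            printf("page up to date -> served from page cache\n");
        }
    }

    int main(void)
    {
        struct page pg = { 0, {0} };

        memset(pg.data, 'w', sizeof(pg.data)); /* write path fills the page */
        pg.uptodate = 1;                       /* ...and marks it up to date */

        generic_read(&pg);      /* re-read is free, as it should be */

        pg.uptodate = 0;        /* possibilities 2/3: someone clears the flag
                                 * after the store completes */
        generic_read(&pg);      /* now the read falls back into readpage() */
        return 0;
    }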
[09:46:38] Bleah. michigan seems to be stuck in some kind of loop constantly doing vldb lookups. It doesn't seem to be totally killing it, but it is slowing things down a lot.
[09:47:33] And I don't want to reboot it, because I'm not on campus, and I really do not have time to deal with this now, or today, or this weekend.
[09:47:47] I've already wasted too much time.
[09:49:09] --- agoode has left
[09:49:18] Well, I was able to stop the web server cleanly and without much delay, so I'm going to pray it really isn't wedged and reboot it.
[09:49:22] --- agoode has become available
[10:00:49] All better.
[10:01:13] That machine had a very outdated CellServDB; the only current server it knew about was andrew.e.kth.se
[10:01:39] It probably needs an OS upgrade at some point, too.
[10:13:19] --- mmeffie has become available
[10:17:31] jhutz: /dl/openafs/ is still a bit wonky
[10:17:39] http://dl.openafs.org/dl/openafs/1.4.11/foo/bar/
[10:18:19] is it okay in AFS directly?
[10:19:37] It seems unlikely that AFS actually has an infinitely recursive directory tree.
[10:20:21] Well, you can make one by screwing up your volume mounts ...
[10:20:39] afs is fine: http://www.secure-endpoints.com/afs/grand.central.org/software/openafs/1.4.11/
[10:22:36] yeah, that was going to be my next suggestion
[10:23:12] Oops.
[10:23:27] You can also make one by screwing up your rewrites.
[10:23:35] or you can intentionally make one recurse forever: /afs/acm.uiuc.edu/.recursivefindtrap/.recursivefindtrap/.recursivefindtrap
[10:23:42] And that is probably what was actually causing the problem before.
[10:25:44] Ah, but does it appear in the right place? Ours is actually /afs/cs.cmu.edu/.rft, so I could fit it in the first directory slot. What I really need to do is compute a name that is short and hashes into the first bucket.
[10:26:05] yeah, I am not sure if it's in the first bucket or not
[10:27:07] Hmmm. Wireshark is seeing a load of "AFS (RX) FS Request: Unknown(0) (0)" RPCs. I wonder if wireshark is bust or if something more interesting is going on ...
[10:33:06] I was seeing those too the other day and thought they were the result of the modifications I had made to my local tree related to retry processing. I put back the old library and they went away. Perhaps I should have dived in deeper.
[10:33:33] they were being sent fast enough to take down my router.
[10:34:23] --- dev-zero@jabber.org has left
[10:34:58] This is a 1.4.10 client against a 1.4.10 server.
[10:36:03] then it is either a bug in wireshark or a bug we don't know about, and I should have examined it more closely.
[10:37:24] My .rft is inode 3756 and dates to Feb 1997.
[10:39:18] My suspicion is that it's an issue in the wireshark packet decoder.
[10:39:26] Just waiting to get a packet dump I can prod.
[11:00:29] iirc, mike meffie (mmeffie) and matt benjamin both have some fairly recent decoders
[11:01:14] stevenjenkins: what jabber client are you using? your messages aren't wrapping for me and seem to be cut off at the edge of the window
[11:04:07] Do mine have that same problem, OOC?
[11:05:06] no
[11:05:07] stevenjenkins: i did a patch to tcpdump, to decode some missing rpcs at the time (such as fetchstatus64)
[11:05:30] Interesting. I'm using pidgin and not adding any newlines, and was wondering. jhutz's messages all show up with hard newlines, for instance.
[11:05:57] yeah, I am seeing that too
[11:06:15] Yeah, I assume that's because he's using barnowl.
[11:17:40] hm, if I shrink my window enough, his messages appear to wrap for me
[11:20:41] oh, hmm. maybe it's fine. I thought there was more after "deco" (as that is the last part that I can see)
[11:20:48] my log shows the message ending with "decoders" though
[11:21:06] so maybe Trillian just fails at wrapping the last word
[11:27:09] cclausen: I use pidgin
[11:28:28] hmm
[11:28:31] ok
[11:28:48] must be something with formatting as well. font size is about twice the size for your messages
[11:29:12] I really need to find a text-based client for Windows...
[11:44:18] --- dev-zero@jabber.org has become available
[12:03:39] --- Rrrrred has left
[12:04:16] --- dwbotsch has become available
[13:02:51] --- phalenor has left
[13:03:46] --- phalenor has become available
[14:04:52] --- stevenjenkins has left
[14:34:29] --- cclausen has left
[14:37:46] --- cclausen has become available
[14:42:12] --- mmeffie has left
[14:56:32] --- dev-zero@jabber.org has left
[15:04:15] --- deason has left
[15:18:44] --- deason has become available
[15:41:06] --- dlc has become available
[16:33:53] --- pod has left
[17:26:59] --- edgester has become available
[17:29:23] Should I do anything to have the latest newsletter put into the new website git repo? I put it in CVS, but only after the conversion happened.
[17:29:54] Speak to Russ - I think he was the one setting that up.
[17:31:37] Simon: Could you hook openafs-web.git into Gerrit?
[17:31:43] And use a different owner group than the one for openafs.git?
[17:31:48] Then I can give Jason access that way.
[17:32:11] Certainly can. Hang on.
[17:32:20] Cool, thanks.
[17:32:38] Which branches would you like hooked up - just master?
[17:32:41] I think the theory was that Jason should be able to push changes to Gerrit for openafs-web.git and approve them directly himself.
[17:32:45] Yeah, that's the only branch there.
[17:33:19] If I'm ever not around, the instructions on how to do this are at http://gerrit.googlecode.com/svn/documentation/2.0/project-setup.html
[17:33:26] Oh, cool, thanks.
[17:34:19] oh, that would be sweet!
[17:34:52] more gerrit goodness
[17:35:43] Is the git repo already set up for git-daemon and http?
[17:36:17] Yes.
[17:36:52] --- agoode has left
[17:37:02] --- agoode has become available
[17:40:30] That should be done.
[17:40:49] New group 'Web Publishers' which currently just contains Jason.
[17:41:05] Push to gerrit.openafs.org/openafs-web.git
[17:42:11] If you want commit mail, or anything else to happen as a result of hooks (auto checkout, for example), you can do that in the srv/git/ repo - everything should go through there.
[17:43:15] thanks Simon!
[17:45:12] ok, now if I can remember how to get a diff for the past two days of my changes so I can apply them in git
[17:52:15] meh, cvs diff is more painful than just copying the three files that I changed
[17:52:43] I don't miss CVS at all.
[17:58:54] ugh, how do I abandon uncommitted changes in git?
[17:59:09] git reset --hard HEAD will throw them all away
[17:59:30] thanks
[18:05:34] Do we want the web tree to cherry-pick or merge its commits?
[18:05:54] I like the way cherry-pick has been working.
[18:06:10] Jeff, Derrick, and I should probably also be in that group, unless you also added the regular gatekeeper group separately.
[18:06:38] Gatekeepers have all the same rights, too.
[18:06:46] Oh, okay. Excellent.
[18:32:47] From https://www.ohloh.net/p/openafs/factoids/1738180: Over the past twelve months, 41 developers contributed new code to OpenAFS. This is one of the largest open-source teams in the world, and is in the top 2% of all project teams on Ohloh. For this measurement, Ohloh considered only recent changes to the code. Over the entire history of the project, 211 developers have contributed.
[19:11:39] --- Russ has left: Disconnected
[19:13:53] yay, I committed the newsletter to the openafs-web.git repo and figured out how to make gerrit merge it.
[19:20:17] --- edgester has left
[19:29:03] --- Jeffrey Altman has left
[19:29:57] --- Jeffrey Altman has become available
[19:34:16] --- Russ has become available
[19:34:36] Cool!
[19:35:53] Should now be live on sb.openafs.org.
[20:45:25] --- andersk@mit.edu/vinegar-pot has left
[20:45:26] --- andersk@mit.edu/vinegar-pot has become available
[20:49:03] --- andersk@mit.edu/vinegar-pot has left
[20:49:39] --- andersk@mit.edu/vinegar-pot has become available
[23:41:39] --- deason has left