• Category Archives: Linux
  • Speed testing of Fulong

    While reading Ryan Hill’s blog, I saw that he had timed the merges of a bunch of KDE packages (why people want KDE in the first place is beyond me, but whatever…). He apparently ran them on an SGI O2, and I was curious how they compared to the Fulong. Here are my results:

    kiona ~ # for pkg in kdeaddons kdeadmin kdeartwork kdebase kdeedu kdegames kdegraphics kdelibs kdemultimedia kdenetwork kdepim kdetoys kdeutils kdewebdev; do genlop -t $pkg; done
    * kde-base/kdeaddons

    Sat Jan 12 13:12:16 2008 >>> kde-base/kdeaddons-3.5.8
    merge time: 55 minutes and 54 seconds.

    * kde-base/kdeadmin

    Sat Jan 12 01:31:28 2008 >>> kde-base/kdeadmin-3.5.8
    merge time: 18 minutes and 51 seconds.

    * kde-base/kdeartwork

    Fri Jan 11 15:24:05 2008 >>> kde-base/kdeartwork-3.5.8
    merge time: 22 minutes and 38 seconds.

    * kde-base/kdebase

    Fri Jan 11 01:36:48 2008 >>> kde-base/kdebase-3.5.8-r5
    merge time: 4 hours, 59 minutes and 46 seconds.

    Sat Mar 1 17:54:52 2008 >>> kde-base/kdebase-3.5.8-r6
    merge time: 4 hours, 59 minutes and 12 seconds.

    * kde-base/kdeedu

    Sat Jan 12 01:12:37 2008 >>> kde-base/kdeedu-3.5.8
    merge time: 1 hour, 52 minutes and 53 seconds.

    * kde-base/kdegames

    Fri Jan 11 14:37:26 2008 >>> kde-base/kdegames-3.5.8
    merge time: 1 hour and 55 seconds.

    * kde-base/kdegraphics

    Sat Jan 12 04:00:52 2008 >>> kde-base/kdegraphics-3.5.8-r2
    merge time: 2 hours, 29 minutes and 24 seconds.

    * kde-base/kdelibs

    Mon Jan 7 21:44:56 2008 >>> kde-base/kdelibs-3.5.8-r2
    merge time: 3 hours, 56 minutes and 15 seconds.

    Tue Jan 15 12:47:09 2008 >>> kde-base/kdelibs-3.5.8-r2
    merge time: 4 hours, 22 minutes and 31 seconds.

    Fri Feb 22 14:16:42 2008 >>> kde-base/kdelibs-3.5.8-r3
    merge time: 4 hours, 17 minutes and 16 seconds.

    Sat Mar 1 04:58:21 2008 >>> kde-base/kdelibs-3.5.8-r3
    merge time: 4 hours, 17 minutes and 41 seconds.

    * kde-base/kdemultimedia

    Sat Jan 12 05:28:55 2008 >>> kde-base/kdemultimedia-3.5.8-r1
    merge time: 48 minutes and 45 seconds.

    * kde-base/kdenetwork

    Fri Jan 11 23:19:43 2008 >>> kde-base/kdenetwork-3.5.8
    merge time: 3 hours, 25 minutes and 1 second.

    * kde-base/kdepim

    Sat Jan 12 12:16:22 2008 >>> kde-base/kdepim-3.5.8
    merge time: 5 hours, 51 minutes and 46 seconds.

    * kde-base/kdetoys

    Tue Jan 8 02:13:33 2008 >>> kde-base/kdetoys-3.5.8
    merge time: 13 minutes and 19 seconds.

    * kde-base/kdeutils

    Fri Jan 11 18:14:59 2008 >>> kde-base/kdeutils-3.5.8-r1
    merge time: 1 hour, 7 minutes and 58 seconds.

    * kde-base/kdewebdev

    Thu Jan 10 18:10:38 2008 >>> kde-base/kdewebdev-3.5.8
    merge time: 1 hour, 28 minutes and 11 seconds.
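
    Comparing numbers like these against another machine is easier once genlop’s “merge time:” lines are reduced to seconds. The Python sketch below does that. It assumes the exact wording shown in the output above (“N hours, N minutes and N seconds”), and the helper names are illustrative, not part of genlop.

```python
import re

# Matches genlop's "merge time:" lines, e.g. "merge time: 4 hours, 59 minutes and 46 seconds."
_MERGE_RE = re.compile(r"merge time:\s*(.+?)\.?\s*$")
# Picks out each "<number> <unit>" part; the plural "s" is optional.
_PART_RE = re.compile(r"(\d+)\s+(hour|minute|second)s?")
_UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}


def parse_merge_time(line):
    """Return the merge time in seconds, or None if the line is not a merge-time line."""
    match = _MERGE_RE.search(line)
    if match is None:
        return None
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in _PART_RE.findall(match.group(1)))


def total_merge_seconds(text):
    """Sum every 'merge time:' entry in a block of genlop -t output."""
    times = (parse_merge_time(line) for line in text.splitlines())
    return sum(t for t in times if t is not None)


def percent_faster(other_seconds, my_seconds):
    """How much faster (in percent) my build was than another machine's build of the same package."""
    return (other_seconds - my_seconds) / my_seconds * 100.0
```

    Feed it the saved output of the loop above, keeping in mind that kdebase and kdelibs were merged more than once, so for a fair comparison you would keep only one entry per package. A percent_faster result of 100 means the other machine took twice as long.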

    I estimate that I am getting anywhere from 10% to 100% better performance out of the Fulong than he is getting out of his O2. If I get the Origin 200 back up and running, I’ll see how it does as well. I probably won’t bother with the Indy, however, as the numbers would just be stupidly long.


  • Fulong issues

    While I was compiling compiz on the Fulong, Portage wanted to build kdebase as well. Fine, no big deal, except that the kdebase build fails about halfway through; when I restart it, it completes without any issues. This is a problem that I have seen (and reported) before with the Fulong, and I think I now have the answer, graciously provided by Thiemo Seufer:

    The problem is a compound of
    1) Not enough RAM (only 512 MB) in some machines, which causes an
       increasing number of package builds to use swap, and some of them
       to eventually fail to build because of a timeout.
    2) Slow on-board PIO IDE, which is what the firmware can boot from.
    3) A kernel-imposed limit of 1 GB when PCI DMA devices (like a SATA
       disk controller) are used.
    4) A kernel bug in the cache coherency management which hits PIO IDE,
       and causes instability since kernel 2.6.18. Up to then, the problem
       was mostly papered over by an excessive amount of cache flushing in
       the kernel code. This problem went unnoticed upstream since PIO IDE
       is these days only used on very small/cheap systems, where a
       different code path is used.

    The first one is the most telling, at least in my situation. I suppose problems 2 and 4 also apply (3 doesn’t, since I don’t, and currently cannot, have SATA controllers on the Fulong), but I haven’t really noticed them one way or another.

    Thiemo notes that problem 3 has supposedly been fixed in kernel 2.6.22 and later, and since I am running 2.6.23.14 at the moment, it should be fixed for me as well, not that it matters. He also has a fix for 4 that is awaiting upstream review. 2? Well, his solution is to add SATA disks, so apparently it is not really a problem for me.

    His answer to 1 is to add memory; he recommends 1-2 GB. Seeing as I already have 1 GB in the system, this is obviously still an issue regardless of what he recommends, and hopefully a real solution will come about.


  • xorg 1.4.x on Fulong

    Once I got Gentoo installed on the Fulong, I decided to do an emerge world update on the system. Yes, there were some problems along the way, and the KDE update was a bit hairy, but it did eventually install everything on the list.

    One issue I found was that the emerge would run for a while, then die with a compile error. If I started it again, it would continue for a while and then die in a new spot on a different package. I had to repeat this more than once, even though each failed package compiled fine when restarted. Is the Fulong getting overstressed or something? The system has a full gigabyte of RAM installed, so that is probably not the problem, unless the memory stick has a bad spot that I don’t know about.
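
    Restarting by hand like this can be automated. A minimal sketch, assuming Portage’s emerge --resume (which picks the interrupted merge list back up) is what you want after each failure; the function name and retry limit are hypothetical, and a loop like this only papers over intermittent failures rather than fixing whatever causes them.

```python
import subprocess


def emerge_with_retries(max_attempts=5, run=subprocess.run):
    """Run 'emerge --update world', resuming after each failure.

    Returns the number of attempts used, or raises RuntimeError if every attempt failed.
    'run' is injectable so the loop can be exercised without a real Portage install.
    """
    command = ["emerge", "--update", "world"]
    for attempt in range(1, max_attempts + 1):
        result = run(command)
        if result.returncode == 0:
            return attempt
        # After a failure, continue the saved merge list from the package that broke.
        command = ["emerge", "--resume"]
    raise RuntimeError(f"emerge still failing after {max_attempts} attempts")
```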

    Another thing I found was that X broke. Every time I fired it up, it would die with a segfault. Eventually I found a patch by Zhang Le circulating out there that fit the bill, and once I applied it, X started working again. Apparently this is a problem with the 1.4.x series of X; in my case it is a pre-release version (1.4.0.90-r3).

    The patch I installed is now on my local web site.

    There are also still some problems with KDE, so I am now unmerging KDE completely from my system so that I can put in something I prefer. Still haven’t quite decided WHAT, of course… 🙂


  • Testing offer

    I just offered to do some stage testing for Stuart Longland on the 2008.0 build of Gentoo on various platforms. I have a Lemote Fulong, an SGI Origin 200, and an SGI Indy R5000 machine (I also have an R4600, but we won’t go into that… I’m not even sure it works.)

    This sort of testing is exactly the kind of thing I enjoy, and I have the equipment almost set up and ready to go. I have to pull out a big-ass Gateway server that is sitting improperly in the rack so that I can put in rack rails that will support it, but once that is done I can install the rest of my rails and rack up the Origin 200 and possibly the Indy, assuming the rails work for it. Otherwise I may just set it on top of the Origin 200.


  • Gentoo on Fulong

    A couple of days ago, I decided it was high time to install a different Linux distribution on my Lemote Fulong.

    First of all, the Lemote Fulong is a Chinese low-power desktop machine with a decent amount of capability. It is MIPS-based, implementing pretty much all of the MIPS architecture that is not patented by other corporations, and it works pretty well within those limits.

    In practice it runs Linux-based operating systems. Microsoft did ship Windows NT for MIPS in the 1990s, but dropped it after NT 4.0, long before the Fulong existed. So Linux it is.

    Out of the box, the Fulong comes with a modified Debian installation maintained by Lemote. Unfortunately they don’t keep everything as up-to-date as I would like, so I went looking for other distributions. I found that Stuart Longland had been working on getting Gentoo running on the Fulong, so I patiently waited for him to get something going. That day came last week.

    He had released a kernel for the machine a while ago, but the rest of the distribution was still coming along. Last week he released a liveUSB image of his distribution for use, and I quickly grabbed that and started playing with it.

    His instructions called for putting a netboot kernel and the liveUSB image onto a flash key and booting the entire system from the key, but I would rather use the keys as little as possible, so instead I put the kernel on a TFTP server I have and netbooted it. That let me bootstrap into the liveUSB image and play around.

    Of course, this wasn’t enough; I wanted the entire thing installed on the Fulong’s hard drive. My first couple of attempts didn’t go so well and ended up a mess, but I finally built a system on the hard drive with a working, bootable kernel. I now have a full Gentoo installation on the hard drive and can remove the USB key from the system for good.


  • Infiniband Network Mapping

    Recently I had to figure out a way to produce a map of our InfiniBand fabric. Working out what that meant, and how to display it, was another matter. I had a number of tools available, but the most important one turned out to be ibnetdiscover, which is packaged with the OFED (OpenFabrics Enterprise Distribution) software stack.

    This tool let me find the InfiniBand connections between the switches in our fabric and the connections to individual nodes. Of course, all it would tell me was the IB GUID associated with each switch and node, although the nodes did at least report that they were connected via “HCA-1”.

    To label the switches, I had to get each switch’s GUID manually and enter it into a lookup file that ibnetdiscover supports. That was not so bad for our 24-port Cisco edge switches, but the Silverstorm switches are unmanaged, so we cannot log in to them. Fortunately, I had the foresight to record the Silverstorm GUIDs when I installed those switches.
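
    A lookup file like this can be generated from the GUIDs collected by hand. The sketch below assumes the node-name-map format that infiniband-diags documents for ibnetdiscover’s --node-name-map option (a 0x-prefixed GUID followed by a quoted name on each line, with # for comments); check the man page for your OFED version. The function name, GUIDs, and switch names are hypothetical.

```python
def format_node_name_map(names):
    """Render a {guid: friendly_name} dict as node-name-map text, one entry per line."""
    lines = ["# GUID -> friendly name (collected by hand from each switch)"]
    for guid, name in sorted(names.items()):
        if '"' in name:
            raise ValueError(f"name for GUID {guid:#018x} must not contain a double quote")
        # 64-bit GUID written as 0x plus 16 hex digits, then the quoted name.
        lines.append(f'0x{guid:016x} "{name}"')
    return "\n".join(lines) + "\n"
```

    Writing the returned text to a file gives ibnetdiscover one place to look up every switch name, including the unmanaged Silverstorm units whose GUIDs can only be recorded at install time.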

    The Cisco 9600 series switch (also known as the TopSpin 270) was a much bigger problem, because none of the GUIDs reported by the software matched anything the switch shows when queried. Mapping it out was a pain, and the way things are apparently connected inside a 9600 series switch is interesting. There are three different kinds of blades in the 9600 series:

    • Port blades where you physically plug infiniband cables into the card
    • Management blades which plug into the back of the switch and manage the port blades
    • Super management blades which manage the management blades.

    These are all interconnected: each pair of port blades connects to a single management blade, and each management blade in turn connects to every super management blade. So when a message comes in over one of the cables, it enters through a port blade and goes to a management blade, which decides whether to send it out another port on the same port blade, out a port on the other port blade attached to that management blade, or up to one of the super management blades. All of this is worked out by the subnet manager, which in our case runs on the 9600 series switch itself. A subnet manager can also run in software on any host node, but because of the size of our InfiniBand network, it was better to offload that to an actual hardware-based subnet manager. I was able to figure out these interconnects on the 9600 switches by running ibtracert against LIDs.
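
    The hop sequence described above can be captured in a toy function. This is only an illustration of the path a message takes between blades, not how the switch actually forwards traffic (the subnet manager programs that); the blade numbering, the two-port-blades-per-management-blade grouping parameter, and the names are hypothetical.

```python
def internal_path(in_blade, out_blade, blades_per_mgmt=2):
    """Toy model of the blade-to-blade hops inside a 9600-series chassis.

    Port blades are numbered 0, 1, 2, ...; with the default grouping, blades 0-1 hang off
    management blade 0, blades 2-3 off management blade 1, and so on (hypothetical numbering).
    """
    in_mgmt = in_blade // blades_per_mgmt
    out_mgmt = out_blade // blades_per_mgmt
    if in_mgmt == out_mgmt:
        # Same management blade: it sends the message straight back out,
        # either on the same port blade or on its sibling port blade.
        return [f"port{in_blade}", f"mgmt{in_mgmt}", f"port{out_blade}"]
    # Different management blades: the message goes up through a super management blade.
    return [f"port{in_blade}", f"mgmt{in_mgmt}", "super", f"mgmt{out_mgmt}", f"port{out_blade}"]
```

    For example, internal_path(1, 3) crosses management blades and so passes through "super", which is why traffic between distant port blades takes more internal hops than traffic between sibling blades.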

    That took care of the switches, but what about the individual nodes? All I was getting from them was “HCA-1”, and trying to override that via the lookup file wasn’t working. The problem turned out to be that InfiniBand was starting up on the nodes so early in boot that the hostname had not been set yet, so all it could report was the generic “HCA-1” label. Adding a command on each node to cat the hostname into the corresponding /sys value fixed this, and the proper names were displayed.
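
    The per-node fix might look like the following sketch. It assumes the kernel exposes a writable node description at /sys/class/infiniband/<device>/node_desc (the device name, such as mthca0, varies by HCA) and that it runs as root late enough in boot that the hostname is already set; the function name is hypothetical.

```python
import glob
import socket


def set_node_descriptions(hostname=None, pattern="/sys/class/infiniband/*/node_desc"):
    """Write the hostname into every HCA's node description file; return the files written."""
    hostname = hostname or socket.gethostname()
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(hostname)
        written.append(path)
    return written
```

    The pattern parameter exists mainly so the function can be pointed at a scratch directory for testing; on a real node the default glob covers every HCA.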

    With correct output in hand, it was a simple matter to write a Perl script that converted it into input for the Graphviz program “dot”, which produced a very nice GIF of the InfiniBand network map.
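
    The original converter was a Perl script; as a stand-in, here is a Python sketch of the same idea. It assumes ibnetdiscover’s usual layout, where a node header such as Switch 24 "S-0008f105001094d2" # "edge-sw-01" is followed by port lines such as [1] "H-0008f10403960558"[1], and it stores each link once even though ibnetdiscover lists it from both ends. Output formats differ between OFED releases, so treat the regular expressions as a starting point.

```python
import re

# Node header, e.g.: Switch  24 "S-0008f105001094d2"  # "edge-sw-01" base port 0 lid 1 lmc 0
NODE_RE = re.compile(r'^(Switch|Ca)\s+\d+\s+"([^"]+)"\s*#\s*"([^"]*)"')
# Port line, e.g.: [1] "H-0008f10403960558"[1]  (CA port lines add "(portguid)" after [port])
LINK_RE = re.compile(r'^\[(\d+)\](?:\([0-9a-fA-F]+\))?\s+"([^"]+)"\[(\d+)\]')


def ibnetdiscover_to_dot(text):
    """Convert ibnetdiscover topology text into an undirected Graphviz graph."""
    labels = {}    # node id -> description from the header comment
    kinds = {}     # node id -> "Switch" or "Ca"
    links = set()  # ((node, port), (node, port)) pairs, each cable stored once
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        node = NODE_RE.match(line)
        if node:
            kind, current, desc = node.groups()
            kinds[current] = kind
            labels[current] = desc
            continue
        link = LINK_RE.match(line)
        if link and current is not None:
            port, remote, remote_port = link.groups()
            # Sort the two ends so the same cable seen from either side dedupes.
            links.add(tuple(sorted([(current, int(port)), (remote, int(remote_port))])))

    out = ["graph fabric {"]
    node_ids = set(labels) | {end[0] for pair in links for end in pair}
    for node_id in sorted(node_ids):
        shape = "box" if kinds.get(node_id) == "Switch" else "ellipse"
        out.append(f'  "{node_id}" [label="{labels.get(node_id, node_id)}", shape={shape}];')
    for (a, a_port), (b, b_port) in sorted(links):
        out.append(f'  "{a}" -- "{b}" [label="{a_port}:{b_port}"];')
    out.append("}")
    return "\n".join(out) + "\n"
```

    Saving ibnetdiscover’s output to a file, passing its contents through ibnetdiscover_to_dot, and writing the result to fabric.dot lets Graphviz render the map with dot -Tgif fabric.dot -o fabric.gif.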