Infiniband Network Mapping

So, recently I had to figure out a way to produce a map of our infiniband fabric. What such a map should contain, and how to display it, was a separate question. I had a number of tools available, but the most useful one turned out to be ibnetdiscover, which is packaged with the OFED (OpenFabrics Enterprise Distribution) software stack.

This tool allowed me to find the infiniband connections between the different switches in our fabric and the connections to individual nodes. Of course, all it would tell me was the IB GUID associated with each switch and node, although the nodes did at least report that they were connected via “HCA-1”.

To fix the switches, I had to go to each switch, get its GUID manually, and enter it into a file that ibnetdiscover supports as a lookup file. Not so bad for the 24-port edge switches that we have from Cisco, but the Silverstorm switches are unmanaged, and we cannot log in to them. Fortunately, I had been smart enough to grab the GUIDs off the Silverstorm switches when I installed them.
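As a rough illustration of that lookup file, here is a small Python sketch that writes one. It assumes the format used by the infiniband-diags versions I am aware of, where each line is a GUID followed by a quoted name and the file is passed to ibnetdiscover with --node-name-map; check the man page for your OFED release before relying on it. The GUIDs and switch names below are made up.

```python
def format_name_map(names_by_guid):
    """Return name-map text: one `0x<16 hex digits> "name"` entry per line.

    The line format is an assumption about what ibnetdiscover accepts.
    """
    lines = []
    for guid, name in sorted(names_by_guid.items()):
        lines.append('0x%016x "%s"' % (guid, name))
    return "\n".join(lines) + "\n"


# Hypothetical GUIDs and names, for illustration only.
switches = {
    0x0005ad0000041234: "edge-sw-01",
    0x00066a0000005678: "silverstorm-01",
}

print(format_name_map(switches), end="")
```

Redirecting that output to a file gives something that can be kept under version control and regenerated whenever a switch is swapped out.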

The larger Cisco 9600 series (also known as TopSpin 270) switch was a much bigger problem, because none of the GUIDs reported by the software matched anything that the 9600 series switch shows in queries sent to it. Mapping that out was a pain, and the way that things are apparently connected in a 9600 series switch is interesting. There are three different kinds of blades in the 9600 series:

  • Port blades where you physically plug infiniband cables into the card
  • Management blades which plug into the back of the switch and manage the port blades
  • Super management blades which manage the management blades

These are all interconnected: 2 port blades connect to a single management blade, which in turn is connected to each of the super management blades. So, when a message comes in across one of the cables, it enters through a port blade and goes to a management blade, which decides whether it should go out another port on that same port blade, through a port on the other port blade attached to that management blade, or up to one of the super management blades. All of this is of course figured out via the subnet manager engine, which runs on the 9600 series switch (in our case). That subnet manager can also be run in software on any host node, but because of the size of the infiniband network that we have, it was better to offload that to an actual hardware-based subnet manager. I was able to figure out these interconnects on the 9600 switches by using the ibtracert command against LIDs.

So, I had the different GUIDs allocated to different switches, but what about the individual nodes connected? All I was getting was “HCA-1” out of them, and trying to override that via the lookup file wasn’t working. The problem turned out to be that infiniband was firing up so early on the nodes that the hostname had not been set yet, so all it had to go on was that it was connecting via port 1 on the infiniband HCA. By setting up a command on each node to cat the hostname into the HCA's node description value under /sys, this was fixed and the proper names were being displayed.
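For anyone wanting to do the same, here is a minimal Python sketch of that fix. It assumes the node description lives in /sys/class/infiniband/<device>/node_desc, which is where the Linux drivers I have seen expose it; the function name and its parameters are my own invention, and the path may differ on your stack.

```python
import glob
import socket


def set_node_descriptions(sysfs_glob="/sys/class/infiniband/*/node_desc",
                          hostname=None):
    """Write the hostname into every matching node_desc file.

    Returns the list of paths that were updated. The default glob is an
    assumption about where the driver exposes the node description.
    """
    hostname = hostname or socket.gethostname()
    updated = []
    for path in sorted(glob.glob(sysfs_glob)):
        with open(path, "w") as f:
            f.write(hostname + "\n")
        updated.append(path)
    return updated
```

Calling set_node_descriptions() as root from a late boot script, after the hostname has been set, should have the same effect as the cat approach described above.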

Now that the output was correct, it was a simple matter to write a Perl script that would convert it into input for the program “dot”, which then produced a very nice GIF of the infiniband network mappings.
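The conversion step can be sketched roughly as follows. This is not the Perl script I used, just a Python illustration of the idea. It assumes the ibnetdiscover layout I remember: a "Switch" or "Ca" header line carrying the node ID and a quoted description, followed by one line per connected port. The sample text and its GUIDs are invented, and routers and other line types are simply ignored.

```python
import re

# Header line, e.g.: Switch  24 "S-<guid>"   # "description" ...
NODE_RE = re.compile(r'^(Switch|Ca)\s+\d+\s+"([^"]+)"\s+#\s+"([^"]*)"')
# Port line, e.g.: [1](portguid)  "S-<guid>"[3]   (the port GUID is optional)
LINK_RE = re.compile(r'^\[(\d+)\](?:\([0-9a-f]+\))?\s+"([^"]+)"\[(\d+)\]')


def ibnet_to_dot(text):
    """Convert ibnetdiscover-style text into an undirected Graphviz graph."""
    labels = {}    # node id -> (node type, description)
    links = set()  # each link stored once as a sorted pair of (node id, port)
    current = None
    for line in text.splitlines():
        m = NODE_RE.match(line)
        if m:
            current = m.group(2)
            labels[current] = (m.group(1), m.group(3))
            continue
        m = LINK_RE.match(line)
        if m and current is not None:
            local = (current, int(m.group(1)))
            remote = (m.group(2), int(m.group(3)))
            # Every cable is listed from both ends, so sort to deduplicate.
            links.add(tuple(sorted([local, remote])))
    out = ["graph ibfabric {"]
    for node_id in sorted(labels):
        kind, desc = labels[node_id]
        shape = "box" if kind == "Switch" else "ellipse"
        out.append('  "%s" [label="%s", shape=%s];' % (node_id, desc or node_id, shape))
    for (a, pa), (b, pb) in sorted(links):
        out.append('  "%s" -- "%s" [taillabel="%d", headlabel="%d"];' % (a, b, pa, pb))
    out.append("}")
    return "\n".join(out) + "\n"


# Invented sample in the assumed layout: one switch, one host, one cable.
SAMPLE = '''\
Switch\t24 "S-0005ad0000041234"\t\t# "edge-sw-01" base port 0 lid 2 lmc 0
[1]\t"H-0002c90200205678"[1](2c90200205679) \t\t# "node01" lid 4 4xSDR

Ca\t2 "H-0002c90200205678"\t\t# "node01"
[1](2c90200205679) \t"S-0005ad0000041234"[1]\t\t# lid 4 lmc 0 "edge-sw-01" lid 2 4xSDR
'''

print(ibnet_to_dot(SAMPLE), end="")
```

Saving the result as fabric.dot and running dot -Tgif fabric.dot -o fabric.gif renders the image. Switches come out as boxes and hosts as ellipses, with the port numbers labelled at each end of a link.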