Discussion:
IP over infiniband unable to find route to some hosts
Martyn Klassen
2011-03-25 12:32:56 UTC
I upgraded from OpenSolaris 2009.06 (snv_111b) to snv_134b and am observing some odd behaviour with IP over InfiniBand. I can communicate with most of my hosts without problems, but a couple of hosts report "no route to host" errors. These hosts are still accessible from other hosts on the network, just not from the upgraded system. The inaccessible hosts are also unable to reach the upgraded system. If I boot back into snv_111b the problem goes away.

I have used the Open Fabrics tools available on one of my Linux systems to trace the underlying IB network, and all the hosts seem to be able to communicate without issue at the IB level, just not at the IPoIB level. The inaccessible hosts are not unique, i.e., hosts with identical hardware and software are accessible.

Any suggestions on diagnosing or solving this issue would be appreciated.
--
This message posted from opensolaris.org
sowmini.varadhan-QHcLZuEGTsvQT0dZR+
2011-03-25 14:20:58 UTC
Post by Martyn Klassen
> I upgraded from OpenSolaris 2009.06 (snv_111b) to snv_134b and am
> observing some odd behaviour with IP over InfiniBand. I can communicate
> with most of my hosts without problems, but a couple of hosts report "no
> route to host" errors. These hosts are still accessible from other
> hosts on the network, just not from the upgraded system. The inaccessible
> hosts are also unable to reach the upgraded system. If I boot back
> into snv_111b the problem goes away.
>
> I have used the Open Fabrics tools available on one of my Linux
> systems to trace the underlying IB network, and all the hosts seem to be
> able to communicate without issue at the IB level, just not at the IPoIB
> level. The inaccessible hosts are not unique, i.e., hosts with identical
> hardware and software are accessible.
>
> Any suggestions on diagnosing or solving this issue would be appreciated.

What does the routing table say for these unreachable hosts (i.e., what's
the output of 'netstat -rn')? What does "route -n get <host>" return
for these hosts?
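
To capture both outputs in one pass for several hosts, something like the following might help. This is only a minimal sketch: the function name collect_route_info and the addresses in the usage comment are hypothetical, and the only commands it runs are the two mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the routing table, then the route lookup
# result for each host given on the command line.
collect_route_info() {
    echo "== routing table =="
    # '|| true' keeps going even if the command fails or is missing.
    netstat -rn 2>&1 || true
    for host in "$@"; do
        echo "== route to $host =="
        route -n get "$host" 2>&1 || true
    done
}

# Example usage (addresses are placeholders):
#   collect_route_info 192.168.10.21 192.168.10.22 > route-info.txt
```

Running it once on the upgraded system and once on a host that still works would give two files that can be compared side by side.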

Can you run the following dtrace script:

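#!/usr/sbin/dtrace -Cs -

/* Print the first argument as a string, plus the kernel stack, each time ip_drop_output() is entered. */
fbt::ip_drop_output:entry
{
	printf("%s\n", stringof(arg0));
	stack();
}

/* Print the kernel stack each time ire_reject() is entered. */
fbt::ire_reject:entry
{
	stack();
}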

--Sowmini
Martyn Klassen
2011-03-25 15:06:27 UTC
I traced the issue down to the configuration of IPoIB on the two affected hosts. They were configured to use connected mode instead of datagram mode. When I switched the hosts to using datagram mode the communication was restored.
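
For anyone checking the same thing, here is a minimal sketch that assumes the remote hosts run Linux, where the IPoIB driver exposes the mode of an interface in /sys/class/net/<interface>/mode. The function name ipoib_mode, the optional second argument, and the interface name ib0 are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report the IPoIB mode ("datagram" or "connected")
# of an interface on a Linux host, or "unknown" if it cannot be read.
# The second argument overrides the sysfs directory and defaults to
# /sys/class/net.
ipoib_mode() {
    iface="$1"
    sysfs_root="${2:-/sys/class/net}"
    mode_file="$sysfs_root/$iface/mode"
    if [ -r "$mode_file" ]; then
        cat "$mode_file"
    else
        echo "unknown"
    fi
}

# Example usage (ib0 is a placeholder interface name):
#   ipoib_mode ib0
```

On such a host, writing the word datagram to the same file as root switches the interface to datagram mode.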