Discussion:
Tuning Myri10ge driver and missing myri10ge.conf
Don
2011-04-21 21:55:05 UTC
I have several OpenIndiana b147 boxes serving as NAS heads, with a dual-port Myricom 10G NIC in each head.

The 10G network has been used for testing and we've been gathering performance numbers. We're ready to enable jumbo frames but I don't see a /kernel/drv/myri10ge.conf file in which to make the changes. Anyone know why the file might be missing? I don't see it on any of the OI boxes I've checked.

Can anyone recommend network tuning parameters we should consider for a NAS box serving as an iSCSI target, with dual 10G interfaces, to 25 ESX servers (each with 1G interfaces)?

Myricom has several recommendations:
/etc/system:
set ddi_msix_alloc_limit=8
set pcplusmp:apic_multi_msi_max=8
set pcplusmp:apic_msix_max=8
set pcplusmp:apic_intr_policy=1

/kernel/drv/myri10ge.conf:
myri10ge_bigbufs_initial=4096;
myri10ge_bigbufs_max=32768;

Might be helpful:
myri10ge_lro=1;
myri10ge_lro_max_aggr=2;

For Low Latency:
myri10ge_use_msix=0;
myri10ge_intr_coal_delay=0;

Any opinions on:
myri10ge_max_slices=1;

There are 20-odd ESX servers accessing this host at the same time. Would additional slices be useful?

Is there any consensus on whether a low-latency or high-bandwidth configuration would be more useful for this sort of environment?

Any guidance would be appreciated.

Thanks in advance,
-Don
--
This message posted from opensolaris.org
Andrew Gallatin
2011-04-22 11:37:02 UTC
I'm the original author. I just answered most of your
questions on your help ticket, but it may be a while before
our support team passes my reply on.
Post by Don
I have several OpenIndiana b147 boxes serving as NAS heads, with a dual-port Myricom 10G NIC in each head.
The 10G network has been used for testing and we've been gathering performance numbers. We're ready to enable jumbo frames but I don't see a /kernel/drv/myri10ge.conf file in which to make the changes. Anyone know why the file might be missing? I don't see it on any of the OI boxes I've checked.
The syntax is as you guessed:

myri10ge_mtu_override=9000;
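
For the record, the whole procedure is roughly this (just a sketch; I'm assuming the first port shows up as myri10ge0 on your boxes, so adjust the link name):

# Create the driver .conf with the jumbo-frame setting; the driver only reads
# it at attach time, so reboot afterwards.
echo 'myri10ge_mtu_override=9000;' >> /kernel/drv/myri10ge.conf
reboot

# After the reboot, confirm the new MTU took effect.
dladm show-linkprop -p mtu myri10ge0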
Post by Don
Can anyone recommend network tuning parameters we should consider for a NAS box serving as an iSCSI target, with dual 10G interfaces, to 25 ESX servers (each with 1G interfaces)?
set ddi_msix_alloc_limit=8
set pcplusmp:apic_multi_msi_max=8
set pcplusmp:apic_msix_max=8
set pcplusmp:apic_intr_policy=1
I believe those last 3 (pcplusmp*) are no longer required
in this version of OpenSolaris, but you should check the
web to confirm. See http://www.solarisinternals.com/wiki/index.php/Networks
Post by Don
myri10ge_bigbufs_initial=4096;
myri10ge_bigbufs_max=32768;
myri10ge_lro=1;
myri10ge_lro_max_aggr=2;
I think b147 has fixed the 2 mblk chain limit that forces packets
through a slow path in TCP, so you can probably increase
myri10ge_lro_max_aggr to 8.

Note that you can play with this at runtime via ndd.
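
For example (assuming the first port is /dev/myri10ge0; list the parameters first, since the exact ndd names may differ from the .conf names):

# Show the tunables the driver exposes through ndd.
ndd -get /dev/myri10ge0 \?
# Bump LRO aggregation at runtime and read it back.
ndd -set /dev/myri10ge0 myri10ge_lro_max_aggr 8
ndd -get /dev/myri10ge0 myri10ge_lro_max_aggr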
Post by Don
myri10ge_use_msix=0;
myri10ge_intr_coal_delay=0;
myri10ge_max_slices=1;
There are 20-odd ESX servers accessing this host at the same time. Would additional slices be useful?
Yes. The purpose of the above (ddi_msix_alloc_limit=8) is to allow the
driver to allocate up to 8 MSI-X vectors, for 8 slices (tx/rx queue
pairs). Some of these proposed settings (myri10ge_use_msix=0;
myri10ge_max_slices=1;) disable multiple slices, and negate the
ddi_msix_alloc_limit tuning. So do one or the other, but not both :)

Most of the "low latency" tuning suggestions you found are for workloads
like HFT, where every microsecond matters. For a fileserver, I'd
suggest optimizing for CPU utilization (eg, bandwidth).

Drew
Don
2011-04-22 14:52:58 UTC
Post by Andrew Gallatin
I just answered most of your questions on your help ticket, but it may be a while before our support team passes my reply on.
Not a problem. I wanted to get as much insight as possible, and to have somewhere to document my findings for anyone else who comes along.
Post by Andrew Gallatin
I believe those last 3 (pcplusmp*) are no longer required in this version of
OpenSolaris, but you should check the web to confirm. See
http://www.solarisinternals.com/wiki/index.php/Networks
I've been using the SolarisInternals wiki, but some of the information seems out of date or contradicts what Myricom recommends. For example:

Myricom recommends a tcp_recv_hiwat that is twice what the SolarisInternals wiki recommends. I'm using the Myricom recommendation.

SolarisInternals also recommends setting ip_soft_rings_cnt, but I don't see it in ndd, and its commitment level is listed as obsolete here: http://download.oracle.com/docs/cd/E19082-01/819-2724/gbsbo/index.html

Does anyone know if this tunable still exists? I've tried querying ndd -get /dev/ip ? as well as a few other places and don't see it anywhere.
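
For completeness, here's roughly what I ran (the trailing ? has to be escaped or the shell may try to glob it):

# List every IP tunable and look for soft rings.
ndd -get /dev/ip \? | grep soft_rings
# Query the parameter directly; on my boxes this just errors out.
ndd -get /dev/ip ip_soft_rings_cnt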
Post by Andrew Gallatin
I think b147 has fixed the 2 mblk chain limit that forces packets
through a slow path in TCP, so you can probably increase myri10ge_lro_max_aggr to 8.
I'll give that a try. Thanks.
Post by Andrew Gallatin
Some of these proposed settings (myri10ge_use_msix=0;
myri10ge_max_slices=1;) disable multiple slices, and negate the
ddi_msix_alloc_limit tuning. So do one or the other, but not both :)
Yes, I won't be using both; I was just trying to figure out which one made the most sense for my workload.
Post by Andrew Gallatin
Most of the "low latency" tuning suggestions you found are for workloads
like HFT, where every microsecond matters. For a fileserver, I'd
suggest optimizing for CPU utilization (eg, bandwidth).
And that is the exact bit of information I was looking for. Thanks!


-Don
--
This message posted from opensolaris.org
Don
2011-04-22 15:06:19 UTC
So with Andrew's guidance here are the config files I'll be testing:

/kernel/drv/myri10ge.conf:
#
# Myricom 10G Ethernet driver configuration
#

# MTU - Jumbo Frames
myri10ge_mtu_override=9000;

# Set RX buffer counts:
myri10ge_bigbufs_initial=4096;
myri10ge_bigbufs_max=32768;

# Large Receive Offload
# Note that increasing this to a value beyond 2 causes the Solaris kernel
# to take a slow path on receive (and not honor TCP checksum offload).
# This is fixed in recent kernels. If increasing this above 2 results in
# poor performance, your kernel does not have the fix.
myri10ge_lro=1;
myri10ge_lro_max_aggr=8;

# Interrupt Coalescence Delay
# Default is 125.
# Set to 0 for low latency (HFT) or leave at 125 for bandwidth (file servers).
myri10ge_intr_coal_delay=125;

# Running a lot of streams in parallel, so enable MSI-X and multiple slices
# (rather than the single-slice, low-latency alternative of myri10ge_use_msix=0 and myri10ge_max_slices=1).
myri10ge_use_msix=1;
myri10ge_max_slices=8;

/etc/system:
* Settings recommended by Myricom for the myri10ge cards
* See: http://www.myri.com/serve/cache/511.html#solaris
* And: http://www.solarisinternals.com/wiki/index.php/Networks
*
set ddi_msix_alloc_limit=8
* set pcplusmp:apic_multi_msi_max=8
* set pcplusmp:apic_msix_max=8
* set pcplusmp:apic_intr_policy=1

/etc/inittab:
tm1::sysinit:/usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 786432 > /dev/console
tm2::sysinit:/usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 786432 > /dev/console
tm3::sysinit:/usr/sbin/ndd -set /dev/tcp tcp_max_buf 2097152 > /dev/console
tm4::sysinit:/usr/sbin/ndd -set /dev/tcp tcp_cwnd_max 2097152 > /dev/console
tm5::sysinit:/usr/sbin/ndd -set /dev/tcp tcp_naglim_def 1 > /dev/console
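
These same values can be applied and checked on a live box without waiting for a reboot, e.g.:

/usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 786432
/usr/sbin/ndd -get /dev/tcp tcp_recv_hiwat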

Nagle has been disabled (tcp_naglim_def=1). Whether the iSCSI target stack disables it per connection anyway, I'm not certain, but I don't foresee a problem disabling it globally for our usage pattern. We'll test both ways just to be sure.
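
Once everything is in place and the boxes have been rebooted, my plan for a quick sanity check (my own idea, not something from Myricom's notes) is to confirm the driver actually got multiple MSI-X vectors per port and that the interrupt load spreads across CPUs:

# Show interrupt vector assignments for the myri10ge instances.
echo ::interrupts | mdb -k | grep myri10ge
# Watch per-device, per-CPU interrupt activity while traffic is flowing.
intrstat 5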
--
This message posted from opensolaris.org