Mellanox MT27500 ConnectX-3

Hardware

Testing was performed between two SSTAR nodes (sstar158 & sstar159) from the Green2 supercomputer. Each server was isolated from the rest of the supercomputer for the purposes of this benchmarking. The hardware configuration of each node was:

  • 2 x E5-2660 CPUs
  • 64 GB RAM
  • Mellanox MT27500 ConnectX-3 in a PCIe 3.0 slot
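
As a sanity check (not part of the original captures), the negotiated PCIe link can be confirmed with lspci; the PCI address below is a placeholder, and a ConnectX-3 in a PCIe 3.0 slot should report 8 GT/s at x8 width:

[root@sstar158 ~]# lspci -s 03:00.0 -vv | grep LnkSta
        LnkSta: Speed 8GT/s, Width x8, ...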

The ports on the Mellanox cards were connected to each other directly (no switch):

  • Port 1: QSFP+ cable, X m
  • Port 2: QSFP+ to 10GbE cable
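
The per-port link types can be verified with ibstat; the abbreviated output below is an illustrative sketch rather than a capture from these nodes. On ConnectX-3 VPI cards the port type can also be switched between 'ib' and 'eth' via the mlx4 sysfs interface (e.g. echo eth > /sys/bus/pci/devices/<pci-addr>/mlx4_port2).

[ajameson-test@sstar158 ~]$ ibstat mlx4_0
...
Port 1:
    State: Active
    Rate: 56
    Link layer: InfiniBand
Port 2:
    State: Active
    Rate: 10
    Link layer: Ethernet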

Results

The purpose of this testing was to establish whether line-rate 10GbE communication can occur on one port of the Mellanox card whilst the other port is receiving native InfiniBand data at high data rates. The summary is that this card can indeed do this.

Port 1 (FDR IB)                               Port 2 (10GbE)
Test                   Rate                   Test                          Rate
rdma_bw -n 5000        5947 MB/s (47.6 Gb/s)  -                             -
ibv_send_bw -n 50000   6041 MB/s (48.3 Gb/s)  -                             -
-                      -                      netperf TCP_STREAM, 1500 MTU  1174 MB/s (9.395 Gb/s)
-                      -                      netperf UDP_STREAM, 9000 MTU  1240 MB/s (9.926 Gb/s)
-                      -                      netperf TCP_STREAM, 9000 MTU  1238 MB/s (9.907 Gb/s)
ib_send_bw -n 50000    6040 MB/s (48.3 Gb/s)  netperf TCP_STREAM, 9000 MTU  1238 MB/s (9.908 Gb/s)

(The final row shows the two tests running concurrently, one per port.)

TEST 1: Port 1, ib_send_bw test

The first test used the ib_send_bw application from the OFED perftest package. The best average bandwidth was 6000.52 MB/s, which equates to 48 Gb/s (of a nominal maximum of 56 Gb/s; since FDR uses 64b/66b encoding, the usable ceiling is closer to 54.5 Gb/s).
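
Only the client side is shown in these captures; on the server (sstar159) the matching listener would presumably have been started first, with the same options and no hostname argument:

[ajameson-test@sstar159 src]$ ib_send_bw -i 1 -a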

[ajameson-test@sstar158 src]$ ib_send_bw -i 1 -a sstar159
------------------------------------------------------------------
                    Send BW Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 300
 CQ Moderation   : 50
 Mtu             : 4096B
 Link type       : IB
 Max inline data : 0B
 rdma_cm QPs   : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0xb9 QPN 0x00b8 PSN 000000
 remote address: LID 0xba QPN 0x00b4 PSN 000000
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
 2          1000           8.41               7.83   
 4          1000           16.69              15.00  
 8          1000           33.57              30.84  
 16         1000           67.15              60.29  
 32         1000           134.02             122.76 
 64         1000           269.12             242.15 
 128        1000           533.97             482.53 
 256        1000           1065.81            982.46 
 512        1000           2127.41            1908.28
 1024       1000           3605.17            3424.30
 2048       1000           4490.45            4383.69
 4096       1000           5076.62            5038.27
 8192       1000           5390.23            5388.25
 16384      1000           5805.28            5796.05
 32768      1000           5899.93            5899.49
 65536      1000           5950.22            5950.13
 131072     1000           5992.22            5992.10
 262144     1000           5991.90            5991.90
 524288     1000           6000.85            6000.52
 1048576    1000           5996.65            5996.65
 2097152    1000           5988.60            5988.60
 4194304    1000           5670.85            5661.53
 8388608    1000           5646.17            5643.64
------------------------------------------------------------------

TEST 2: Port 2, ib_send_bw test

The second test was the same ib_send_bw test, but this time using port 2 (10GbE). Here the maximum data rate achieved was only 1089 MB/s (8.7 Gb/s).

[ajameson-test@sstar158 src]$ ib_send_bw -i 2 -a sstar159
------------------------------------------------------------------
                    Send BW Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 300
 CQ Moderation   : 50
 Mtu             : 1024B
 Link type       : Ethernet
 Gid index       : 0
 Max inline data : 0B
 rdma_cm QPs   : ON
 Data ex. method : rdma_cm
------------------------------------------------------------------
 local address: LID 0000 QPN 0x00b5 PSN 000000
 GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:34:07:66
 remote address: LID 0000 QPN 0x00b1 PSN 000000
 GID: 254:128:00:00:00:00:00:00:02:02:201:255:254:33:253:18
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
 2          1000           11.72              11.25  
 4          1000           22.75              22.26  
 8          1000           45.49              44.50  
 16         1000           90.98              89.10  
 32         1000           181.97             178.10 
 64         1000           362.95             355.76 
 128        1000           639.49             638.64 
 256        1000           817.61             817.44 
 512        1000           979.34             979.19 
 1024       1000           1077.57            1077.11
 2048       1000           1083.00            1081.56
 4096       1000           1085.19            1085.12
 8192       1000           1088.08            1087.93
 16384      1000           1088.97            1088.91
 32768      1000           1089.39            1089.37
 65536      1000           1089.66            1089.66
 131072     1000           1089.88            1089.86
 262144     1000           1089.93            1089.92
 524288     1000           1089.94            1089.94
 1048576    1000           1089.97            1089.96
 2097152    1000           1089.97            1089.97
 4194304    1000           1089.98            1089.98
 8388608    1000           1089.98            1089.98
------------------------------------------------------------------
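
A back-of-envelope check (my own, not from the original capture) suggests this is roughly what RoCE framing overhead predicts at a 1024 byte MTU. Assuming around 94 bytes of per-packet overhead (preamble/IFG, Ethernet header, GRH, BTH, ICRC, FCS), the ceiling on a 10 Gb/s link is about 9.2 Gb/s, with CQ moderation and other per-message costs plausibly accounting for the remainder:

$ echo "scale=3; 1024 / (1024 + 94) * 10" | bc
9.150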

TEST 3: Port 2, netperf TCP_STREAM

Here, netperf was used with the default TCP_STREAM test on port 2, with the standard 1500 byte MTU. The resulting 9.4 Gb/s is not bad.

[ajameson-test@sstar158 src]$ ~/linux_64/bin/netperf -H 10.1.1.2 
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.2 (10.1.1.2) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  65536  65536    10.00    9395.10
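
(For completeness: netperf requires netserver to be running on the remote side. Presumably something like the following was started on sstar159 beforehand, though it is not in the capture; the output line is representative of netperf 2.x.)

[ajameson-test@sstar159 ~]$ ~/linux_64/bin/netserver
Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC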

Importantly, I also re-ran this transfer test whilst a bulk transfer was occurring on port 1 (at 5.5 GB/s) and the results did not change. This was particularly encouraging, implying that the comments Dan made regarding IB / GbE conflicts were perhaps not valid for this card.
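
(A sketch of how this concurrent run can be reproduced: start the bulk IB transfer in the background with a large iteration count, then run netperf against port 2. Exact flags here are illustrative.)

[ajameson-test@sstar158 src]$ ib_send_bw -i 1 -n 50000 -a sstar159 &
[ajameson-test@sstar158 src]$ ~/linux_64/bin/netperf -H 10.1.1.2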

TEST 4: Port 2, netperf TCP_STREAM with MTU 9000

Same as the test above, but this time using a 9000 byte MTU on the 10GbE interface. This time we pretty much reached line rate.
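
For reference (not in the original capture), jumbo frames would have been enabled with something like the following; the interface name eth2 is an assumption, and the MTU must match on both nodes:

[root@sstar158 ~]# ip link set dev eth2 mtu 9000
[root@sstar159 ~]# ip link set dev eth2 mtu 9000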

[ajameson-test@sstar158 src]$ ~/linux_64/bin/netperf -H 10.1.1.2 
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.2 (10.1.1.2) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  65536  65536    10.00    9907.52