Difference between revisions of "CSIT/csit tg servers"

Latest revision as of 13:50, 14 April 2021

FD.io CSIT recommended server specification for Traffic Generator (TRex) used in FD.io performance tests.

TODAY - Deployed Traffic Generator (TG) Servers

All TG/TRex instances are based on Xeon servers
- SuperMicro 2 socket servers.
- Motherboards with max PCIe I/O for NIC cards.
For Xeon SUTs, same Xeon generation for TG
- 2n-clx => CLX TG.
- 3n-skx => SKX TG.
- 3n-hsw => HSW TG.
For Arm SUT ThunderX2
- 2n-tx2 => shared SKX TG.
- The shared TG has two TRex instances running in parallel, one per NUMA node.
For AMD SUTs, same AMD generation for TG
- 2n-zn2 => ZN2 TG.
For DNV
- 2n-dnv => shared SKX TG.
- 3n-dnv => shared SKX TG.
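
The testbed-to-TG mapping above can be summarised as a small lookup table. This is only an illustrative sketch: the names `TG_FOR_TESTBED` and `tg_for` are hypothetical and do not come from the CSIT code base. The table records only what the list states, so it does not say whether different testbeds use the same physical shared TG server.

```python
# Hypothetical lookup table summarising the testbed-to-TG mapping listed above.
# Keys are testbed names from the list; values are (TG platform, is_shared).
TG_FOR_TESTBED = {
    "2n-clx": ("CLX", False),
    "3n-skx": ("SKX", False),
    "3n-hsw": ("HSW", False),
    "2n-tx2": ("SKX", True),
    "2n-zn2": ("ZN2", False),
    "2n-dnv": ("SKX", True),
    "3n-dnv": ("SKX", True),
}


def tg_for(testbed: str) -> str:
    """Return a short description of the TG used by the given testbed."""
    platform, shared = TG_FOR_TESTBED[testbed]
    return f"shared {platform} TG" if shared else f"{platform} TG"
```

For example, `tg_for("2n-tx2")` yields the "shared SKX TG" entry from the list.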

FUTURE - Recommended "standardised" approach

Standardise on a reference TG server
- Reduce the amount of calibration work involved in making TRex fit for purpose for CSIT tests.
  - Note that separate calibration is required for TRex STL (stateless) and ASTF (advanced stateful) APIs.
- Improve TG/TRex determinism of behaviour and performance.
- Get max performance from TRex.
Proposal
- Use Xeon ICX as the main TG server platform (2 socket servers)
  - Processor: ICX high-end SKU (assume 8380, or another SKU recommended by the vendor), similar to what is used for SKX (8180) and CLX (8280).
- Server/Motherboard: OEM motherboards with max PCIe Gen4 I/O (as per current practice, i.e. SuperMicro).
- NICs
  - Separate onboard management connectivity.
  - 4p10GbE, 2p25GbE Intel FVL.
  - 2p100GbE Mellanox and Intel CVL.
  - Separate 2p100GbE NIC to enable b2b calibration tests.
- Approach for lower-end SUTs
  - Use the "standardised" TG 2-socket server, allocating one NUMA node per lower-end SUT testbed.
  - Note: if the SUT vendor prefers to use lower-power Xeon builds, then the SUT vendor would need to take on the responsibility for calibration of TRex.
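
The per-NUMA allocation for lower-end SUTs can be sketched as follows. This is a minimal illustration under stated assumptions, not CSIT tooling: `TrexInstance`, `allocate`, the server name `tg-ref-1` and the testbed names are all hypothetical. It assumes one TRex instance per NUMA node on a 2-socket TG, as with the shared TG servers deployed today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrexInstance:
    """One TRex instance pinned to a NUMA node of a shared TG server."""
    tg_server: str
    numa_node: int
    testbed: str


def allocate(tg_server: str, testbeds: list[str], numa_nodes: int = 2) -> list[TrexInstance]:
    """Assign each lower-end SUT testbed its own NUMA node on the TG server.

    Raises ValueError if there are more testbeds than NUMA nodes, since
    each TRex instance is assumed to own a whole node.
    """
    if len(testbeds) > numa_nodes:
        raise ValueError(
            f"{len(testbeds)} testbeds do not fit on {numa_nodes} NUMA nodes"
        )
    return [
        TrexInstance(tg_server, node, testbed)
        for node, testbed in enumerate(testbeds)
    ]
```

Calling `allocate("tg-ref-1", ["testbed-a", "testbed-b"])` places `testbed-a` on NUMA node 0 and `testbed-b` on NUMA node 1; a third testbed would be rejected rather than silently sharing a node.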

POINTS FOR DISCUSSION

CLOSED What is the status of TRex support for Arm, and expected performance?
- CSIT project doesn't have any experience running TRex on Arm.
- TRex documentation claims generic support for Arm, but no builds are provided, so TRex needs to be compiled from source.
- No / limited documented use of TRex on Arm servers.
OPEN If CSIT recommends ICX Xeon for the new builds in 2021, the SPR Xeon that arrives later will be faster.
- This means we always end up with a mix of TG server platforms, as is the case today.
- But we should strive to minimize the number of TRex server variations, to reduce calibration/maintenance work.
OPEN What about AMD?
- CSIT should encourage AMD to provide AMD-based servers for TG.
- But AMD should share the responsibility for TRex calibration.

@@ Line 1: / Line 1: @@
-DRAFT FD.io CSIT recommended server specification for Traffic Generator (TRex) used in FD.io performance tests.
+FD.io CSIT recommended server specification for Traffic Generator (TRex) used in FD.io performance tests.
 == TODAY - Deployed Traffic Generator (TG) Servers ==
@@ Line 22: / Line 22: @@
 # Standardise on a reference TG server
-#* Reduce amount of work involved in TRex calibration for tests with STL and ASTF APIs.
+#* Reduce amount of calibration work involved in making TRex fit for purpose for CSIT tests
+#** Note that separate calibration is required for TRex STL (stateless) and ASTF (advanced stateful) APIs.
 #* Improve TG/TRex determinism of behaviour and performance.
 #* Get max performance from TRex.
 # Proposal
-#* Use Xeon ICX as the main TG server platform
+#* Use Xeon ICX as the main TG server platform (2 socket servers)
-#** Processor: ICX high end SKU (assume 8380), similar to what is used for SKX (8180) and CLX (8280).
+#** Processor: ICX high end SKU (assume 8380, other recommended by the vendor), similar to what is used for SKX (8180) and CLX (8280).
-#** Server/Motherboard: OEM motherboards with max PCIEe Gen4 I/O (as per current practice).
+#* Server/Motherboard: OEM motherboards with max PCIEe Gen4 I/O (as per current practice, i.e. SuperMicro).
-#** NIC: Separate onboard management connectivity; using 40Gb/100Gb/200Gb NIC cards (Nvidia /formerly Mellanox/, Intel) directly connected to DUT; using separate 100Gb card for b2b calibration tests.
+#* NICs
-#* Optionally consider lower power Xeon builds for lower end SUTs
+#** Separate onboard management connectivity.
-#** e.g. for new Intel Atom / Snowridge SUTs?
+#** 4p10GbE, 2p25GbE Intel FVL.
-#** e.g. for new Arm / Ampere SUTs?
+#** 2p100GbE Mellanox and Intel CVL.
-#*** The risk here is that underpowered TG may not be able to stress Ampere 80-core processor, in case we ever go that high on Arm SUTs.
+#** Separate 2p100GbE NIC to enable b2b calibration tests.
-#*** Agree this is a risk - we should look for DUT vendor contribution to mitigate, otherwise let them assume the risk.
+#* Approach for for lower end SUTs
+#** Use the "standardised" TG 2-socket server, allocating NUMA per lower-end SUT testbed
+#** Note: if SUT vendor prefers to use a lower power Xeon builds, then the SUT vendor would need to take on the responsibility for calibration of TRex.
 == POINTS FOR DISCUSSION ==
-# Is CSIT project good to continue to have TG/TRex servers shared, e.g. TRex instance per Numa/socket? If yes, need to work out what "slower" SUTs / low-speed NIC we expect in the project going forward.
+# CLOSED What is the status of TRex support for Arm, and expected performance?
-# What is the status of TRex support for Arm, and expected performance?
 #* CSIT project doesn't have any experience running TRex on Arm.
 #* TRex documentation claims generic support for Arm, but no builds provided, need to compile from source.
 #* No / limited documented use of TRex on Arm servers.
-# If for the new builds in 2021 CSIT recommend ICX Xeon, but once SPR Xeon arrives in the future it will be faster.
+# OPEN If for the new builds in 2021 CSIT recommend ICX Xeon, but once SPR Xeon arrives in the future it will be faster.
 #* Meaning we always end up with a mix of TG server platforms, as is the case today.
-#** Is this de-risked by running more TG cores compared to DUT cores? If I have 8xTG cores blasting traffic at 4xDUT cores, does it matter that the DUT cores are 10-20% faster? Inevitably the DUT core are doing more work compared to the TG cores in anycase?
+#* But we should strive to minimize the number of TRex server variations, to reduce calibration/maintenance work.
-#** Consider better utilizing PCI bus by running 2p2nic configuration.
+# OPEN What about AMD?
+#* CSIT should encourage AMD to provide AMD based servers for TG.
+#* But AMD should be co-sharing the responsibility of TRex calibration.
