Difference between revisions of "CSIT/csit tg servers"
Revisions by Mackonstan; 6 intermediate revisions by 3 users not shown.
Latest revision as of 13:50, 14 April 2021
FD.io CSIT recommended server specification for Traffic Generator (TRex) used in FD.io performance tests.
TODAY - Deployed Traffic Generator (TG) Servers
- All TG/TRex instances are based on Xeon servers
  - SuperMicro 2-socket servers.
  - Motherboards with max PCIe I/O for NIC cards.
- For Xeon SUTs, same Xeon generation for TG
- For Arm SUT ThunderX2
  - 2n-tx2 => shared SKX TG.
  - The shared TG has two TRex instances running in parallel, one per NUMA node.
- For AMD SUTs, same AMD generation for TG
  - 2n-zn2 => ZN2 TG.
- For DNV SUTs
  - 2n-dnv => shared SKX TG.
  - 3n-dnv => shared SKX TG.
FUTURE - Recommended "standardised" approach
- Standardise on a reference TG server
  - Reduce the amount of calibration work involved in making TRex fit for purpose for CSIT tests
    - Note that separate calibration is required for the TRex STL (stateless) and ASTF (advanced stateful) APIs.
  - Improve TG/TRex determinism of behaviour and performance.
  - Get max performance from TRex.
- Proposal
  - Use Xeon ICX as the main TG server platform (2-socket servers)
    - Processor: ICX high-end SKU (assume 8380, or another recommended by the vendor), similar to what is used for SKX (8180) and CLX (8280).
  - Server/Motherboard: OEM motherboards with max PCIe Gen4 I/O (as per current practice, e.g. SuperMicro).
  - NICs
    - Separate onboard management connectivity.
    - 4p10GbE, 2p25GbE Intel FVL.
    - 2p100GbE Mellanox and Intel CVL.
    - Separate 2p100GbE NIC to enable back-to-back (b2b) calibration tests.
  - Approach for lower-end SUTs
    - Use the "standardised" 2-socket TG server, allocating a NUMA node per lower-end SUT testbed
    - Note: if a SUT vendor prefers to use lower-power Xeon builds, then the SUT vendor would need to take on the responsibility for calibration of TRex.
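The per-testbed NUMA allocation proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not CSIT's actual provisioning code: the function names `parse_cpulist` and `allocate_numa`, the testbed names, and the CPU numbering are all hypothetical. The only external fact relied on is the Linux sysfs cpulist format (for example `0-3,8-11`) exposed under `/sys/devices/system/node/node<N>/cpulist`.

```python
def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8-11" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def allocate_numa(testbeds, node_cpulists):
    """Assign one NUMA node (and its CPUs) to each testbed, in list order."""
    if len(testbeds) > len(node_cpulists):
        raise ValueError("more testbeds than NUMA nodes on the TG server")
    return {
        testbed: {"numa_node": node, "cpus": parse_cpulist(node_cpulists[node])}
        for node, testbed in enumerate(testbeds)
    }


# Hypothetical cpulist strings for a 2-socket server; on a real Linux host
# they would be read from /sys/devices/system/node/node<N>/cpulist.
example = allocate_numa(["testbed-a", "testbed-b"], ["0-3,8-11", "4-7,12-15"])
print(example["testbed-b"]["numa_node"], example["testbed-b"]["cpus"])
```

Keeping each TRex instance on the CPUs of a single NUMA node, ideally the node local to its NICs, is the usual way to avoid cross-socket memory traffic; a 2-socket TG can serve at most two testbeds under this scheme, which is why the sketch rejects a third.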
POINTS FOR DISCUSSION
- CLOSED What is the status of TRex support for Arm, and expected performance?
  - The CSIT project has no experience running TRex on Arm.
  - TRex documentation claims generic support for Arm, but no builds are provided; TRex must be compiled from source.
  - There is no or only limited documented use of TRex on Arm servers.
- OPEN CSIT recommends ICX Xeon for the new builds in 2021, but SPR Xeon will be faster once it arrives.
  - This means we always end up with a mix of TG server platforms, as is the case today.
  - But we should strive to minimize the number of TRex server variations, to reduce calibration/maintenance work.
- OPEN What about AMD?
  - CSIT should encourage AMD to provide AMD-based servers for TG.
  - But AMD should share the responsibility for TRex calibration.