CSIT/csit tg servers

From fd.io
Revision as of 05:57, 1 April 2021 by Pmikus

DRAFT: FD.io CSIT recommended server specification for the Traffic Generator (TRex) servers used in FD.io performance tests.

TODAY - Deployed Traffic Generator (TG) Servers

  1. All TG/TRex instances are based on Xeon servers
    • SuperMicro 2 socket servers.
    • Motherboards with max PCIe I/O for NIC cards.
  2. For Xeon SUTs, same Xeon generation for TG
  3. For Arm SUTs (ThunderX2)
    • 2n-tx2 => shared SKX TG.
    • The shared TG runs two TRex instances in parallel, one per NUMA node.
  4. For AMD SUTs, same AMD generation for TG
  5. For DNV
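
The shared-TG arrangement above (one TRex instance per NUMA node) depends on each instance using only the NIC ports local to its node. A minimal sketch of that grouping follows. It assumes a Linux TG host, where sysfs exposes a `numa_node` file per PCI device; the PCI addresses, the port-to-node placement, and both helper names are hypothetical and are not taken from CSIT code.

```python
from collections import defaultdict
from pathlib import Path


def read_numa_node(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node Linux reports for a PCI device (-1 if unknown)."""
    return int((Path(sysfs_root) / pci_addr / "numa_node").read_text().strip())


def group_ports_by_numa(port_to_node):
    """Group NIC ports so each TG instance only gets ports local to its NUMA node."""
    groups = defaultdict(list)
    for pci_addr, node in sorted(port_to_node.items()):
        groups[node].append(pci_addr)
    return dict(groups)


# Hypothetical PCI addresses and NUMA placement, for illustration only.
# On a real host the mapping could be built with read_numa_node().
example_ports = {
    "0000:18:00.0": 0,
    "0000:18:00.1": 0,
    "0000:86:00.0": 1,
    "0000:86:00.1": 1,
}

instances = group_ports_by_numa(example_ports)
for node, ports in sorted(instances.items()):
    print(f"TRex instance on NUMA node {node}: {ports}")
```

Each resulting group would then back one TRex instance pinned to cores on the same node; how the instances are actually configured and started is outside this sketch.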

FUTURE - Recommended "standardised" approach

  1. Standardise on a reference TG server
    • Reduce amount of work involved in TRex calibration for tests with STL and ASTF APIs.
    • Improve TG/TRex determinism of behaviour and performance.
    • Get max performance from TRex.
  2. Proposal
    • Use Xeon ICX as the main TG server platform
      • Processor: ICX high-end SKU (assume 8380), similar to what is used for SKX (8180) and CLX (8280).
      • Server/Motherboard: OEM motherboards with max PCIe Gen4 I/O (as per current practice).
      • NIC: Separate onboard management connectivity; 40Gb/100Gb/200Gb NIC cards (Nvidia, formerly Mellanox; Intel) directly connected to the DUT; a separate 100Gb card for back-to-back (b2b) calibration tests.
    • Optionally consider lower-power Xeon builds for lower-end SUTs
      • e.g. for new Intel Atom / Snowridge SUTs?
      • e.g. for new Arm / Ampere SUTs?
        • The risk here is that an underpowered TG may not be able to stress an Ampere 80-core processor, in case we ever go that high on Arm SUTs.
        • Agreed, this is a risk. We should look for a DUT vendor contribution to mitigate it; otherwise the vendor assumes the risk.
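
To give a rough sense of what the 40Gb/100Gb/200Gb NIC options in the proposal demand from a TG, the sketch below computes the theoretical Ethernet line rate in packets per second for minimum-size frames. It relies only on standard Ethernet framing overhead (20 bytes per frame on the wire); the function name is illustrative, and the figures are upper bounds on what a port can carry, not measured TRex results.

```python
# Per-frame overhead on the wire: preamble (7) + start-of-frame delimiter (1)
# + inter-frame gap (12) = 20 bytes.
ETH_OVERHEAD_BYTES = 20


def line_rate_mpps(link_gbps, frame_bytes=64):
    """Theoretical maximum frame rate of one Ethernet port, in Mpps."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire / 1e6


for gbps in (40, 100, 200):
    print(f"{gbps}GbE, 64B frames: {line_rate_mpps(gbps):.2f} Mpps per port")
```

For 64B frames this gives roughly 59.52, 148.81 and 297.62 Mpps per port, which is the rate a TG would have to sustain to saturate one such link in one direction.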

POINTS FOR DISCUSSION

  1. Should the CSIT project continue to share TG/TRex servers, e.g. one TRex instance per NUMA node/socket? If yes, we need to work out which "slower" SUTs and low-speed NICs we expect in the project going forward.
  2. What is the status of TRex support for Arm, and expected performance?
    • CSIT project doesn't have any experience running TRex on Arm.
    • TRex documentation claims generic support for Arm, but no builds are provided; it needs to be compiled from source.
    • No / limited documented use of TRex on Arm servers.
  3. If CSIT recommends ICX Xeon for the new builds in 2021, SPR Xeon will be faster once it arrives. What about AMD?
    • Meaning we always end up with a mix of TG server platforms, as is the case today.
      • Is this de-risked by running more TG cores compared to DUT cores? If I have 8 TG cores blasting traffic at 4 DUT cores, does it matter that the DUT cores are 10-20% faster? Inevitably, the DUT cores are doing more work than the TG cores in any case.
      • Consider better utilizing the PCI bus by running a 2p2nic configuration.
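
The 8 TG cores versus 4 DUT cores question above can be put into rough numbers. The sketch below is a deliberately simplistic model: it assumes capacity scales linearly with core count and that a TG core and a DUT core do equal work per packet. The discussion above suggests DUT cores do more work per packet, so the model is pessimistic for the TG. The function name and the model itself are illustrative assumptions, not a CSIT methodology.

```python
def tg_headroom(tg_cores, dut_cores, dut_core_speedup):
    """Ratio of aggregate TG capacity to aggregate DUT capacity.

    Assumes linear scaling with core count and equal per-packet work on
    TG and DUT cores; dut_core_speedup is the fractional per-core speed
    advantage of the DUT (0.2 means DUT cores are 20% faster).
    """
    tg_capacity = tg_cores * 1.0
    dut_capacity = dut_cores * (1.0 + dut_core_speedup)
    return tg_capacity / dut_capacity


for speedup in (0.10, 0.20):
    ratio = tg_headroom(8, 4, speedup)
    print(f"DUT cores {speedup:.0%} faster: TG headroom {ratio:.2f}x")
```

Under these assumptions the TG keeps about 1.82x headroom when DUT cores are 10% faster and about 1.67x when they are 20% faster, so a per-core deficit of that size would not by itself make the TG the bottleneck.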