CSIT/csit2001 plan
Latest revision as of 09:33, 18 February 2020
Introduction
This page tracks release information for FD.io CSIT-2001. It is updated regularly by hand. Real-time information is available in the FD.io CSIT code repository and the auto-generated docs.
Release Milestones
Milestone | Date | Deliverables |
---|---|---|
F0 | 2020-01-08 | Test case keywords code complete. Only low-risk changes accepted. |
RC1 | 2020-01-15 (F0+7) | Code complete. Pull first release branch. Only bug fixes accepted in release branch. Date aligned with VPP RC1. Start dry-runs to identify CSIT gaps on less frequently run tests. |
RC2 | 2020-01-22 (RC1+7) | Dry-run testing of VPP RC2 begins, performance and functional. Date aligned with VPP RC2. |
CSIT Release | 2020-01-29 (RC2+7) | CSIT release complete. VPP release testing starts. Date aligned with VPP Formal Release. |
Report Publish | 2020-02-12 (Rls+14) | CSIT report published for VPP release. |
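The offsets in the Date column (F0+7, RC1+7, RC2+7, Rls+14) can be checked with a few lines of date arithmetic. The following is a minimal sketch only; the names `OFFSETS` and `milestone_dates` are hypothetical and not part of any CSIT tooling.

```python
from datetime import date, timedelta

# Days between each milestone and the one before it, as listed in the table.
OFFSETS = [("RC1", 7), ("RC2", 7), ("CSIT Release", 7), ("Report Publish", 14)]


def milestone_dates(f0):
    """Return a dict of milestone name -> date, starting from the F0 date."""
    dates = {"F0": f0}
    current = f0
    for name, days in OFFSETS:
        current = current + timedelta(days=days)
        dates[name] = current
    return dates


schedule = milestone_dates(date(2020, 1, 8))
for name, day in schedule.items():
    print(f"{name}: {day.isoformat()}")
```

With F0 set to 2020-01-08, the computed dates match the table, ending with Report Publish on 2020-02-12.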
Release Deliverables
Name | Status | Jira Category | Description |
---|---|---|---|
VPP API checks | Done | Framework | Improving the VPP API change process to make it more reliable and reduce false positives. |
VAT to PAPI | Done | Framework | Complete VAT to PAPI migration - address the API execution efficiency for scale tests. TODO: performance tests of FIB scale are still on VAT. |
Python migration | Done | Framework | Python 2.7 to 3.x migration; .md analysis and migration plan coming in Gerrit. |
Performance bisection | Done (but merge postponed) | Framework | Job for bisecting performance regressions (leveraging per patch perf test work). |
Data backend | Next Release | Framework | A standalone test data processing backend - datastore, analytics/query engine. Stop relying on Nexus as results file store. |
HDRhistogram | Done | Framework | Making use of HDRhistogram in TRex, and higher resolution of latency data for performance tests. |
Reconf tests | Done | Methodology | Reconf tests methodology - see if any methodology improvements required based on feedback, add more test cases. |
Perfmon | Next Release | Framework | Per VPP node efficiency - today we store elog output capturing thread barriers. For perfmon we are missing an API to catch two values for the run; we need to check whether this got resolved. |
Per packet path telemetry | Next Release | Tools | Start with a new telemetry approach - per packet path analysis, similar to how it is done in NFVbench. See how this could be applied to NFV density tests and to all other tests. |
Emails with regressions | Done | Presentation | Trending regressions - add announce emails to csit-report. |
Improved anomaly detection | Done (for 2001) | Methodology | Anomaly detection - still seeing some noise, more data doesn't seem to be helping, no pattern. Need more inside knowledge, white-box, need more telemetry data from tests to see if any correlation can be found. Affects trending anomaly detection, per patch perf, perf bisecting. |
IPsec in container | Done | Performance | vhost/memif - adding vpp-in-container with ipsec. |
More Arm testbeds for vpp_device | WIP | Device | Testbeds - Arm - adding more ThunderX machines for vpp_device to run csit-vpp and vpp-csit device tests. |
More vpp_device tests | WIP | Device | Add more vpp_device tests for better VPP API coverage, as those are executed per VPP patch and per CSIT patch. |
Arm vpp_device per VPP patch voting | WIP | CI process | Productize per VPP patch (with voting?) vpp-csit device tests for Arm. |
HostStack Tests | WIP | Performance | New performance tests - Iperf3+LDP with WRK, Nginx+VCL with WRK, QUIC transport. |
Jira Task Tracking
All CSIT release deliverables should be tracked in FDio CSIT Jira using one of the following Jira Epic categories:
- Framework
- CI process
- Performance
- Device
- Methodology
- Telemetry
- Tools
- Presentation
- Honeycomb
- Aarch64
Multi-Release Work Areas
Work Area | Description |
---|---|
Xeon Skx testbeds | Make Skylake performance test coverage complete: i) Boost tests in 2-Node setups, complete 3-Node setups; ii) Complete Memif/Container and Vhost-user/VM with latest QEMU; iii) Push vpp-dev to Ubuntu 18.04. |
Arm testbeds | Introduce Arm performance tests. |
Atom testbeds | Introduce Denverton and Rangeley performance tests. |
Better vhost, memif coverage | Produce more complete test data for NFV service density: i) Scaled-out Vhost-user/VM and Memif/Container tests; ii) Test the same packet paths and NF topologies: service chains, service pipelines; iii) See if we can isolate the actual cost of Vhostuser-virtio and Memif-Memif virtual interfaces based on the test and system telemetry; iv) Test with VMs and Containers running on a single Processor (single socket), both with and without core oversubscription; v) Extend the test over two Processors to quantify the impact of UPI latency (and bandwidth). |
VPP per patch performance tests | Productize per VPP patch performance tests with change detection, prepare for voting: i) Improve detection accuracy and precision; ii) Nail down current results variance; iii) Apply improvements to continuous trending and (future) git auto-bisection. |
Trending Improved Detection | Make the trending job use new Burst MRR trending tests for better anomaly detection: i) Currently postponed, as the algorithm detects performance changes not related to VPP code; ii) We need heavy workarounds or much more predictable SUT behavior; iii) Denverton is more precise, but we need to avoid TRex duration stretching first. |
More VPP telemetry reported and analysed | API based consumption of VPP telemetry including existing general counters, and future extended per node counters. |
Evolve throughput search | Build upon MLRsearch and PLRsearch experience vs. ordinary binary search: i) Compare MLRsearch with PLRsearch soak test results. |
Continue improving VPP API process | Currently missing features: i) Make sure no job gives -1 on a correct commit; ii) Enable voting on the vpp-csit-devicetest job; iii) Specify how CSIT should properly react to VPP code moving between plugins. |
General enhancements | General CSIT and VPP performance test and infrastructure enhancements: i) Productize VPP_Device container-based functional tests in 1-Node Skylake testbeds, assist with the same for Arm; ii) Add proper packet latency measurements with TRex HDRhistogram, push TRex to productize HDRhistogram; iii) Start using the new VPP stats infra for per test counters and "gauges" collection incl. "show runtime", instead of VPP show CLI; iv) Start migration from VAT to VPP Python API; v) Nail down "broken"/not-performing VPP data plane feature arcs (incl. multi-threading) indicated by CSIT-18.10 results data. |
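For readers unfamiliar with the baseline that the "Evolve throughput search" row mentions, the following is a minimal sketch of an ordinary binary search for the highest loss-free rate. It is not MLRsearch or PLRsearch. The function names, the `measure_loss` callback and the simulated SUT with its 7.3 Mpps capacity are all hypothetical, made up for illustration.

```python
def binary_search_throughput(measure_loss, min_rate, max_rate, precision):
    """Return the highest rate found with zero loss, within `precision`.

    measure_loss(rate) is a hypothetical callback that runs one trial at the
    given rate and returns the observed loss (0 means no loss).
    Assumes min_rate is loss-free and max_rate is lossy.
    """
    lower, upper = min_rate, max_rate
    while upper - lower > precision:
        middle = (lower + upper) / 2.0
        if measure_loss(middle) == 0:
            lower = middle  # no loss seen, search higher rates
        else:
            upper = middle  # loss seen, search lower rates
    return lower


def fake_sut(rate):
    """Simulated SUT (assumption): drops everything above a fixed capacity."""
    capacity = 7_300_000.0  # packets per second, invented for this example
    return 0.0 if rate <= capacity else rate - capacity


result = binary_search_throughput(fake_sut, 0.0, 10_000_000.0, 1000.0)
print(f"Estimated loss-free rate: {result:.0f} pps")
```

The sketch relies on every trial giving a repeatable answer. Real SUT behavior is noisy, which is one reason the plan compares this baseline with other search approaches.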
External Dependencies
- No known external dependencies.