Difference between revisions of "CSIT/TestFailuresTracking"

From fd.io
(Created page with "== CSIT Test Failure Clasification == All known CSIT failures grouped and listed in the following order: * Always failing followed by sometimes failing. * Always failing test...")
 
((M) 2n-zn2: All 4c RDMA tests are failing)
(33 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== CSIT Test Failure Classification ==
+
= CSIT Test Failure Classification =
  
 
All known CSIT failures are grouped and listed in the following order:
Line 6: Line 6:
 
** Most common use cases followed by less common.
 
 
* Sometimes failing tests:
 
** Most frequently failing followed by less frequently failing.
+
** Most frequently failing followed by less frequently failing.
 +
*** High frequency: 50%-100%.
 +
*** Medium frequency: 10%-50%.
 +
*** Low frequency: 0%-10%.
 
 
** Within each sub-group: most common use cases followed by less common.
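
The frequency bands above can be sketched as a small helper. This is an illustrative sketch only, not part of CSIT: the function name <code>classify_failure_frequency</code>, the <code>always</code> label for a 100% failure rate, and the decision to put boundary values (exactly 10% or 50%) into the higher band are all assumptions, since the page does not say which band a boundary value belongs to.

```python
def classify_failure_frequency(failures: int, runs: int) -> str:
    """Map a failure count over a number of runs to a frequency band.

    Bands follow the page: high 50%-100%, medium 10%-50%, low 0%-10%.
    Assumption: a 100% rate is reported as "always", and boundary
    values (exactly 10% or 50%) go to the higher band.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= failures <= runs:
        raise ValueError("failures must be between 0 and runs")
    rate = failures / runs
    if rate >= 1.0:
        return "always"
    if rate >= 0.5:
        return "high"
    if rate >= 0.1:
        return "medium"
    return "low"
```

For example, a test that failed in 2 of the last 10 runs would land in the medium band under these assumptions.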
  
== CSIT Test Fixing Priorities ==
+
= CSIT Test Fixing Priorities =
  
* Test fixing work priorities defined as follows
+
Test fixing work priorities are defined as follows:
** (H)igh priority, most common use cases and most common test code.
+
* (H)igh priority, most common use cases and most common test code.
** (M)edium priority, specific HW and pervasive test code issue.
+
* (M)edium priority, specific HW and pervasive test code issue.
** (L)ow priority, corner cases and external dependencies.
+
* (L)ow priority, corner cases and external dependencies.
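
Each failure heading on this page carries its priority letter, so the entries can be parsed and ordered mechanically. The following is a minimal sketch under assumptions: the helper name <code>parse_failure_heading</code>, the returned dictionary keys, and the regular expression are hypothetical and only handle headings of the form <code>==== (M) testbed, testbed: title ====</code>. Headings without a priority letter or without a colon are skipped.

```python
import re

# Hypothetical pattern for headings such as
# "==== (M) 2n-zn2: All 4c RDMA tests are failing ===="
HEADING_RE = re.compile(
    r"^=+\s*\((?P<priority>[HML])\)\s*(?P<scope>[^:]+):\s*(?P<title>.*?)\s*=+$"
)

# High first, then medium, then low.
PRIORITY_ORDER = {"H": 0, "M": 1, "L": 2}


def parse_failure_heading(line):
    """Return priority, testbeds and title of a heading, or None."""
    match = HEADING_RE.match(line.strip())
    if match is None:
        return None
    testbeds = [item.strip() for item in match.group("scope").split(",")]
    return {
        "priority": match.group("priority"),
        "testbeds": testbeds,
        "title": match.group("title"),
    }


def sort_by_priority(headings):
    """Parse heading lines and order the parsed entries H, M, L."""
    parsed = [parse_failure_heading(line) for line in headings]
    entries = [entry for entry in parsed if entry is not None]
    return sorted(entries, key=lambda e: PRIORITY_ORDER[e["priority"]])
```

Python's sort is stable, so entries with the same priority keep their original page order.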
  
== Always Failing Tests ==
+
= Current Failures =
 +
 
 +
== Deterministic Failures ==
  
 
=== In Trending ===
 
  
==== (H) 2n-clx, 2n-zn2: VPP RDMA tests no traffic forwarded ====
+
==== (M) 2n-zn2: All 4c RDMA tests are failing ====
  
* (H) 2n-clx, 2n-zn2: all RDMA tests failing with cli_inband clear runtime command
+
* last update: before 2023-02-22
** work-to-fix: easy
+
* work-to-fix: medium?
** rca:
+
* rca: VPP change 38242 causes a crash. The stack trace, when one exists, looks the same, and the crash does not happen with a debug VPP build. More specifics are not known yet.
** test: all RDMA with CX556A NIC
+
* test: only on RDMA, only on 2n-zn2 (not on 2n-clx), higher core count is affected more
** frequency: always
+
* frequency: 4c almost always, 2c sometimes, 1c rarely.
** testbed: 2n-clx, 2n-zn2
+
* testbed: 2n-zn2
** example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-clx/1212/log.html.gz#s1-s1-s1-s1-s1-t1 2n-clx], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-zn2/639/log.html.gz#s1-s1-s1-s1-s1-t1 2n-zn2], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-clx/167/log.html.gz#s1-s1-s1-s2-s5-t1 2n-clx]
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-zn2/683/log.html.gz#s1-s1-s1-s2-s1-t2 2n-zn2]
** ticket: [https://jira.fd.io/browse/CSIT-1882 CSIT-1882]
+
* ticket: [https://jira.fd.io/browse/VPP-2070 VPP-2070]
** note:
+
  
==== (M) 3n-snr: hwasync Wireguard failing to verify device ====
+
==== (M) 3n-snr: All hwasync wireguard tests failing when trying to verify device ====
  
* (M) 3n-snr: All hwasync wireguard tests failing when trying to verify device
+
* last update: before 2023-01-31
** work-to-fix: easy
+
* work-to-fix: hard
** rca: Failed to bind PCI device 0000:f4:00.0 to c4xxx on host 10.30.51.93
+
* rca: Missing QAT driver. Symptom: Failed to bind PCI device 0000:f4:00.0 to c4xxx on host 10.30.51.93
** test: hwasync wireguard
+
* test: hwasync wireguard
** frequency: always
+
* frequency: always
** testbed: 3n-snr
+
* testbed: 3n-snr
** example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-snr/45/log.html.gz#s1-s1-s1-s3-s1 3n-snr]
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-snr/95/log.html.gz#s1-s1-s1-s3-s1 3n-snr]
** ticket: [https://jira.fd.io/browse/CSIT-1883 CSIT-1883]
+
* ticket: [https://jira.fd.io/browse/CSIT-1883 CSIT-1883]
** note:
+
  
 
==== (M) 1n-aws: TRex mlrsearch fails to find NDR & PDR due to AWS rate limiting (5min total test duration) ====
 
  
* (M) 1n-aws: TRex NDR PDR ALL IP4 scale and L2 scale tests failing with 50% packet loss
+
* last update: 2023-02-09
** work-to-fix: hard
+
* work-to-fix: hard
** rca:
+
* rca:
** test: ip4scale2m
+
* test: ip4scale2m
** frequency: always
+
* frequency: always
** testbed: 1n-aws
+
* testbed: 1n-aws
** example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-trex-perf-ndrpdr-weekly-master-1n-aws/8/log.html.gz#s1-s1-s1-s1-s2-t1 1n-aws]
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-trex-perf-ndrpdr-weekly-master-1n-aws/18/log.html.gz#s1-s1-s1-s1-s2-t1 1n-aws]
** ticket: [https://jira.fd.io/browse/CSIT-1876 CSIT-1876]
+
* ticket: [https://jira.fd.io/browse/CSIT-1876 CSIT-1876]
** note: The root cause can be shared environment in aws cloud.
+
* note: The root cause may be the shared environment in the AWS cloud. We may need to use a smaller scale there.
  
 
==== (M) 3n-alt, 3n-snr: testpmd no traffic forwarded ====
 
  
* (M) 3n-alt, 3n-snr: testpmd tests fail with no traffic
+
* last update: 2023-02-09
** work-to-fix: hard
+
* work-to-fix: medium
** rca:
+
* rca: The DUT-DUT link takes too long to come up on some testbeds. This happens *after* a test case with a DPDK app (not with VPP, even when it uses the dpdk plugin), although multiple subsequent tests (even with VPP) may be affected. The real cause is probably in the NIC firmware or driver, but CSIT could detect port status more reliably as a workaround.
** test: testpmd
+
* test: testpmd (also l3fwd but hidden by CSIT-1896)
** frequency: always
+
* frequency: always (almost)
** testbed: 3n-alt, 3n-snr
+
* testbed: 3n-alt, 3n-snr
** example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-mrr-weekly-master-3n-alt/33/log.html.gz#s1-s1-s1-s1-t2 3n-alt], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2210-3n-snr/6/log.html.gz#s1-s1-s1-s1-t1 3n-snr], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2210-3n-snr/14/log.html.gz#s1-s1-s1-s1-t1 3n-snr]
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-mrr-weekly-master-3n-alt/42/log.html.gz#s1-s1-s1-s1-t1 3n-alt], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2210-3n-snr/6/log.html.gz#s1-s1-s1-s1-t1 3n-snr], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2210-3n-snr/14/log.html.gz#s1-s1-s1-s1-t1 3n-snr]
** ticket: [https://jira.fd.io/browse/CSIT-1848 CSIT-1848]
+
* ticket: [https://jira.fd.io/browse/CSIT-1848 CSIT-1848]
** note:
+
  
=== not in trending ===
+
==== (M) 3n-alt: Tests failing until 40Ge Interface comes up ====
  
==== (H) 3n-icx: vpp hoststack QUIC vppecho tests failing ====
+
* last update: 2023-02-09
 +
* work-to-fix: medium
 +
* rca: DUT-DUT link takes too long to come up due to CSIT-1848.
 +
* test: first tests in order
 +
* frequency: always (almost, depends on run order)
 +
* testbed: 3n-alt (3n-snr link does not take that long)
 +
* example: https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-alt/155/log.html.gz#s1-s1-s1-s1-s1-t1
 +
* ticket: [https://jira.fd.io/browse/CSIT-1890 CSIT-1890]
  
* (H) 3n-icx: QUIC vppecho BPS tests failing on timeout when checking hoststack finished
+
==== (L) 3n-icx: negative ipackets on TB38 AVF 4c l2patch ====
** work-to-fix: easy
+
** rca:
+
** test: Quic vppecho BPS
+
** frequency: always
+
** testbed: 3n-skx, 3n-icx
+
** example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2210-3n-icx/17/log.html.gz#s1-s1-s1-s1-s5-t1 3n-icx]
+
** ticket: [https://jira.fd.io/browse/CSIT-1835 CSIT-1835]
+
** note:
+
  
==== (M) all testbeds: vpp 9000B tests with vhostuser, memif, tunnels, avf ====
+
* last update: 2023-02-22
 +
* work-to-fix:
 +
* rca:
 +
* test: TB38 AVF 4c l2patch e810cq
 +
* frequency: always
 +
* testbed: 3n-icx (only TB38, never TB37)
 +
* example: https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/214/log.html.gz#s1-s1-s1-s5-s3-t3-k2-k9-k8-k13-k1-k2
 +
* ticket: [https://jira.fd.io/browse/CSIT-1901 CSIT-1901]
  
* All tests with 9000B payload frames not forwarded over vhostuser interfaces.
+
=== Not In Trending ===
** work-to-fix: hard
+
** rca: VPP code: [34839: dpdk: cleanup MTU handling](https://gerrit.fd.io/r/c/vpp/+/34839)
+
** test: 9000B - vhostuser
+
** frequency: always
+
** testbed: 2n-skx, 3n-skx, 2n-clx
+
** examples: [3n-skx vhostuser](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2202-3n-skx/67/log.html.gz#s1-s1-s1-s1-s1)
+
** ticket: [https://jira.fd.io/browse/CSIT-1809 CSIT-1809]
+
** note:
+
  
* All tests with 9000B payload frames not forwarded over memif interfaces.
+
==== (M) all testbeds: some vpp 9000B tests ====
** work-to-fix: hard
+
** rca: VPP code: [34839: dpdk: cleanup MTU handling](https://gerrit.fd.io/r/c/vpp/+/34839)
+
** test: 9000B - memif
+
** frequency: always
+
** testbed: 2n-skx, 3n-skx, 2n-clx
+
** examples: [2n-skx Memif](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2202-2n-skx/33/log.html.gz#s1-s1-s1-s1-s1)
+
** ticket: [https://jira.fd.io/browse/CSIT-1808 CSIT-1808]
+
** note:
+
  
* 9000B payload frames not forwarded over tunnels due to violating supported Max Frame Size (VxLAN, LISP, SRv6)
+
* last update: 2023-02-09
** work-to-fix: hard
+
* work-to-fix: hard
** rca: VPP code: [34839: dpdk: cleanup MTU handling](https://gerrit.fd.io/r/c/vpp/+/34839)
+
* rca: VPP code: [https://gerrit.fd.io/r/c/vpp/+/34839 34839: dpdk: cleanup MTU handling]. CSIT needs to rework how it sets MTU / max frame size (CSIT-1797). Some tests will continue failing due to missing support on the VPP side; we will open specific Jira tickets for those.
** test: 9000B - IP4 tunnels VXLAN, IP4 tunnels LISP, Srv6
+
* test: see sub-items
** frequency: always
+
* frequency: always
** testbed: 2n-icx, 3n-icx
+
* testbed: all
** examples: [2n-icx VXLAN](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-icx/10/log.html.gz), [3n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-3n-icx/22/log.html.gz#s1-s1-s1-s1-s1-t6)
+
* examples: see sub-items
** ticket:
+
* ticket: [https://jira.fd.io/browse/CSIT-1809 CSIT-1809]
** note:
+
* gerrit: https://gerrit.fd.io/r/c/csit/+/37824
  
* (M) 3n-icx: 9000b ip4 ip6 l2 NDRPDR AVF tests are failing to forward traffic
+
===== (M) tests with 9000B payload frames not forwarded over vhost interfaces =====
** work-to-fix: hard
+
** rca:
+
** test: 9000B - IP4, IP6, l2 - base and scale
+
** frequency: always
+
** testbed: 3n-icx
+
** examples: [3n-icx ip4base](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-3n-icx/13/log.html.gz#s1-s1-s1-s1-s1-t6)
+
** ticket: [https://jira.fd.io/browse/CSIT-1885 CSIT-1885]
+
** note:
+
  
* (M) 2n-clx, 2n-icx, 2n-zn2: DPDK testpmd 9000b tests on xxv710 nic are failing with no traffic
+
* last update: 2023-02-09
** work-to-fix: hard
+
* work-to-fix: hard
** rca:
+
* test: 9000B + vhostuser
** test: DPDK testpmd 9000b tests on xxv710 nic
+
* testbed: 2n-skx, 3n-skx, 2n-clx
** frequency: always
+
* examples: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2202-3n-skx/67/log.html.gz#s1-s1-s1-s1-s1 3n-skx vhostuser]
** testbed: 2n-clx, 2n-icx, 2n-zn2
+
* ticket: [https://jira.fd.io/browse/CSIT-1809 CSIT-1809]
** example: [2n-clx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2210-2n-clx/1/log.html.gz#s1-s1-s1-s3-t6), [2n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2210-2n-icx/3/log.html.gz#s1-s1-s1-s1-t6)
+
 
** ticket: [https://jira.fd.io/browse/CSIT-1870 CSIT-1870]
+
===== tests with 9000B payload frames not forwarded over memif interfaces =====
** note:
+
 
 +
* last update: 2023-02-09
 +
* work-to-fix: hard
 +
* test: 9000B + memif
 +
* testbed: 2n-skx, 3n-skx, 2n-clx
 +
* examples: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2202-2n-skx/33/log.html.gz#s1-s1-s1-s1-s1 2n-skx Memif]
 +
* ticket: [https://jira.fd.io/browse/CSIT-1808 CSIT-1808]
 +
 
 +
===== 9000B payload frames not forwarded over tunnels due to violating supported Max Frame Size (VxLAN, LISP, SRv6) =====
 +
 
 +
* last update: 2023-02-09
 +
* work-to-fix: medium
 +
* test: 9000B + (IP4 tunnels VXLAN, IP4 tunnels LISP, Srv6, IpSec)
 +
* testbed: 2n-icx, 3n-icx
 +
* examples: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-icx/10/log.html.gz 2n-icx VXLAN], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-3n-icx/22/log.html.gz#s1-s1-s1-s1-s1-t6 3n-icx]
 +
* ticket: [https://jira.fd.io/browse/CSIT-1801 CSIT-1801]
 +
 
 +
===== (M) 9000b all AVF tests are failing to forward traffic =====
 +
 
 +
* last update: 2023-02-09
 +
* work-to-fix: hard
 +
* test: 9000B + AVF
 +
* testbed: 3n-icx
 +
* examples: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-3n-icx/13/log.html.gz#s1-s1-s1-s1-s1-t6 3n-icx ip4base]
 +
* ticket: [https://jira.fd.io/browse/CSIT-1885 CSIT-1885]
 +
 
 +
==== (M) 2n-clx, 2n-icx, 2n-zn2: DPDK testpmd 9000b tests on xxv710 nic are failing with no traffic ====
 +
 
 +
* last update: 2023-02-09
 +
* work-to-fix: medium
 +
* rca: The DPDK app attempts to set the MTU only once, and that attempt fails if the interface is down (CSIT-1848). As a workaround, the MTU could be set on the Linux interface before starting the DPDK app.
 +
* test: DPDK testpmd 9000b
 +
* frequency: always
 +
* testbed: 2n-clx, 2n-icx, 2n-zn2
 +
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2210-2n-clx/1/log.html.gz#s1-s1-s1-s3-t6 2n-clx], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2210-2n-icx/3/log.html.gz#s1-s1-s1-s1-t6 2n-icx]
 +
* ticket: [https://jira.fd.io/browse/CSIT-1870 CSIT-1870]
 +
* note: Vratko will fix this, either as part of a general workaround for CSIT-1848 or in a separate change.
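
The workaround mentioned in the rca above (setting the MTU on the Linux interface before the DPDK app starts) might look roughly like the sketch below. It is not the CSIT implementation: the helper names, the interface name and the MTU value used in the usage note are hypothetical. The only external command assumed is the standard iproute2 <code>ip link set dev ... mtu ...</code>.

```python
import subprocess
from typing import List


def build_set_mtu_command(interface: str, mtu: int) -> List[str]:
    """Build the iproute2 command that sets the MTU on a Linux interface."""
    if not interface:
        raise ValueError("interface name must not be empty")
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return ["ip", "link", "set", "dev", interface, "mtu", str(mtu)]


def set_mtu(interface: str, mtu: int) -> None:
    """Run the command; raises CalledProcessError if ip reports an error.

    Needs root privileges and must run while the kernel driver still
    owns the interface, that is, before the DPDK app takes it over.
    """
    subprocess.run(build_set_mtu_command(interface, mtu), check=True)
```

A caller would invoke something like <code>set_mtu("enp1s0f0", 9000)</code> before launching testpmd, where both the interface name and the value are placeholders to be replaced with what the testbed topology actually requires.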
  
 
==== (M) 2n-clx, 2n-icx: all Geneve tests with 1024 tunnels fail ====
 
  
* (M) All Geneve L3 mode scale tests (1024 tunnels) are failing
+
* last update: before 2023-01-31
** work-to-fix: hard
+
* work-to-fix: hard
** rca: VPP crash, Failed to add IP neighbor on interface geneve_tunnel258
+
* rca: VPP crash, Failed to add IP neighbor on interface geneve_tunnel258
** test: avf-ethip4--ethip4udpgeneve-1024tun-ip4base 64B 1518B IMIX 1c 2c 4c
+
* test: avf-ethip4--ethip4udpgeneve-1024tun-ip4base 64B 1518B IMIX 1c 2c 4c
** frequency: always
+
* frequency: always
** testbed: 2n-skx, 2n-clx, 2n-icx
+
* testbed: 2n-skx, 2n-clx, 2n-icx
** example: [2n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-icx/10/log.html.gz#s1-s1-s1-s1-s1)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-icx/10/log.html.gz#s1-s1-s1-s1-s1 2n-icx]
** ticket: [https://jira.fd.io/browse/CSIT-1800 CSIT-1800]
+
* ticket: [https://jira.fd.io/browse/CSIT-1800 CSIT-1800]
** note:
+
  
 
==== (L) 2n-clx, 2n-icx: nat44ed cps 16M sessions scale fail ====
 
  
* (L) All NAT44-ED 16M sessions CPS scale tests fail while setting NAT44 address range.
+
* last update: before 2023-01-31
** work-to-fix: hard
+
* work-to-fix: hard
** rca: VPP crash, Failed to set NAT44 address range on host 10.30.51.44 (connections-per-second tests only)
+
* rca: VPP crash, Failed to set NAT44 address range on host 10.30.51.44 (connections-per-second tests only)
** test: 64B-avf-ethip4tcp-nat44ed-h262144-p63-s16515072-cps-ndrpdr 1c 2c 4c, 64B-avf-ethip4udp-nat44ed-h262144-p63-s16515072-cps-ndrpdr 1c 2c 4c
+
* test: 64B-avf-ethip4tcp-nat44ed-h262144-p63-s16515072-cps-ndrpdr 1c 2c 4c, 64B-avf-ethip4udp-nat44ed-h262144-p63-s16515072-cps-ndrpdr 1c 2c 4c
** frequency: always
+
* frequency: always
** testbeds: 2n-skx, 2n-clx, 2n-icx
+
* testbeds: 2n-skx, 2n-clx, 2n-icx
** example: [2n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-icx/18/log.html.gz#s1-s1-s1-s1-s11-t3), [2n-clx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-clx/9/log.html.gz#s1-s1-s1-s1-s11-t1)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-icx/18/log.html.gz#s1-s1-s1-s1-s11-t3 2n-icx], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-clx/9/log.html.gz#s1-s1-s1-s1-s11-t1 2n-clx]
** ticket: [https://jira.fd.io/browse/CSIT-1799 CSIT-1799]
+
* ticket: [https://jira.fd.io/browse/CSIT-1799 CSIT-1799]
** note:
+
  
 
==== (L) 2n-clx, 2n-icx: nat44det imix 1M sessions fails to create sessions ====
 
  
* (L) 2n-clx, 2n-icx: All NAT44DET NDR PDR IMIX over 1M sessions BIDIR tests failing to create enough sessions
+
* last update: before 2023-01-31
** work-to-fix: hard
+
* work-to-fix: hard
** rca:
+
* rca:
** test: IMIX over 1M sessions bidir
+
* test: IMIX over 1M sessions bidir
** frequency: always
+
* frequency: always
** testbed: 2n-skx, 2n-clx, 2n-icx
+
* testbed: 2n-skx, 2n-clx, 2n-icx
** example: [2n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-icx/18/log.html.gz#s1-s1-s1-s1-s2-t4)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-2n-icx/18/log.html.gz#s1-s1-s1-s1-s2-t4 2n-icx]
** ticket: [https://jira.fd.io/browse/CSIT-1884 CSIT-1884]
+
* ticket: [https://jira.fd.io/browse/CSIT-1884 CSIT-1884]
** note:
+
  
== Sometimes failing tests ==
+
== Occasional Failures ==
  
=== in trending - high frequency failures ===
+
=== In Trending ===
  
 
==== (H) 2n-icx: NFV density VPP does not start in container ====
 
  
* (H) 2n-icx: NFV density tests breaks VPP which fails to start (re-opened)
+
* last update: before 2023-01-31
** work-to-fix: hard
+
* work-to-fix: hard
** rca:
+
* rca:
** test: all subsequent
+
* test: all subsequent
** frequency: medium
+
* frequency: medium
** testbed: 2n-icx
+
* testbed: 2n-icx
** example: [2n-icx mrr](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-weekly-master-2n-icx/47/log.html.gz#s1-s1-s1-s1-s1-s1-s1), [2n-icx ndrpdr](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-icx/48/log.html.gz#s1-s1-s1-s5-s8-t1)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-weekly-master-2n-icx/57/log.html.gz 2n-icx mrr], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-icx/48/log.html.gz#s1-s1-s1-s5-s8-t1 2n-icx ndrpdr]
** ticket: [https://jira.fd.io/browse/CSIT-1881 CSIT-1881]
+
* ticket: [https://jira.fd.io/browse/CSIT-1881 CSIT-1881]
** note: Once VPP breaks, all subsequent tests fail. Even all subsequent builds will be failing until Peter makes TB working again. Although it's failing with medium frequency when it happens it breaks all subsequent builds on the TB therefore [H] priority.
+
* note: Once VPP breaks, all subsequent tests fail, and all subsequent builds keep failing until Peter gets the testbed working again. Although the failure occurs with only medium frequency, each occurrence breaks all subsequent builds on the testbed, hence the (H) priority.
  
 
==== (M) 2n-clx: e810 mlrsearch tests packets forwarding in one direction ====
 
  
* (M) 2n-clx: half of the packets lost on PDR tests (re-opened)
+
* last update: before 2023-01-31
** work-to-fix: hard
+
* work-to-fix: hard
** rca:
+
* rca:
** test: e810Cq ip4base, ip6base
+
* test: e810Cq ip4base, ip6base
** frequency: high
+
* frequency: high
** testbed: 2n-clx
+
* testbed: 2n-clx
** example: [2n-clx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-clx/167/log.html.gz#s1-s1-s1-s2-s8-t1)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-clx/176/log.html.gz#s1-s1-s1-s2-s8-t1 2n-clx]
** ticket: [https://jira.fd.io/browse/CSIT-1864 CSIT-1864]
+
* ticket: [https://jira.fd.io/browse/CSIT-1864 CSIT-1864]
** note:
+
  
==== (M) 3n-snr: 25GE links randomly going down between snr/sut and icx/tg-trex ====
+
==== (M) 3n-icx, 3n-snr: wireguard 100 and 1000 tunnels mlrsearch tests failing with 2c and 4c ====
  
* (M) 3n-snr: 25GE interface between SUT and TG/TRex goes down randomly
+
* last update: before 2023-01-31
** work-to-fix: hard
+
* work-to-fix: easy
** rca:
+
* rca:
** test: all subsequent
+
* test: wireguard 100 tunnels and more
** frequency: high
+
* frequency: high
** testbed: 3n-snr
+
* testbed: 3n-icx, 3n-snr
** example: [3n-snr](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-snr/45/log.html.gz#s1-s1-s1-s3-s12-t1)
+
* examples: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-3n-icx/56/log.html.gz#s1-s1-s1-s3-s8-t4 3n-icx]
** ticket: [https://jira.fd.io/browse/CSIT-1871 CSIT-1871]
+
* ticket: [https://jira.fd.io/browse/CSIT-1886 CSIT-1886]
** note: Sometimes 'TwentyFiveGigabitEthernetec/0/0' goes down and all subsequent tests fail.
+
 
+
==== (M) 3n-icx: wireguard 1k tunnels mlrsearch tests failing with 2c and 4c ====
+
 
+
* (M) 3n-icx: Wireguard tests with 100 and more tunnels are failing PDR criteria
+
** work-to-fix: easy
+
** rca:
+
** test: wireguard 100 tunnels and more
+
** frequency: high
+
** testbed: 3n-icx
+
** examples: [3n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-coverage-2210-3n-icx/23/log.html.gz#s1-s1-s1-s1-s1-t2)
+
** ticket: [https://jira.fd.io/browse/CSIT-1886 CSIT-1886]
+
** note:
+
 
    
 
    
==== (M) 3n-tsh: vpp in VM not starting ====
+
==== (M) 3n-tsh: vpp in VM starting too slowly ====
  
* (M) 3n-tsh: VM tests failing to boot VM
+
* last update: before 2023-02-22
** work-to-fix: easy
+
* work-to-fix: medium
** rca:
+
* rca: perhaps related to NUMA; investigation continues
** test: 3n-tsh: sporadic VM vhost
+
* test: 3n-tsh: sporadic VM vhost
** frequency: high
+
* frequency: high
** testbed: 3n-tsh
+
* testbed: 3n-tsh
** example: [3n-tsh](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-tsh/710/log.html.gz#s1-s1-s1-s7-s2-t1)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-tsh/738/log.html.gz#s1-s1-s1-s7-s2-t1 3n-tsh], [https://jenkins.fd.io/view/csit/job/csit-vpp-perf-verify-master-3n-tsh/123/ 3n-tsh]
** ticket: [https://jira.fd.io/browse/CSIT-1877 CSIT-1877]
+
* ticket: [https://jira.fd.io/browse/CSIT-1877 CSIT-1877]
** note: 3n-alt testbed was fixed. 3n-tsh still failing. fixed: by rebuild initrd .37 on TB, [3n-tsh test log](https://jenkins.fd.io/view/csit/job/csit-vpp-perf-verify-master-3n-tsh/123/)
+
  
=== in trending - lower frequency failures ===
+
== Rare Failures ==
 +
 
 +
=== In Trending ===
  
 
==== (M) 3n-icx, 3n-snr: 1518B IPsec packets not passing ====
 
  
* (M) 3n-icx, 3n-skx, 3n-snr: all 1518B AVF crypto tests failed with no traffic, all IMIX AVF crypto with excessive packet loss
+
* last update: before 2023-01-31
** work-to-fix: hard
+
* work-to-fix: hard
** rca:
+
* rca:
** test: all AVF crypto
+
* test: all AVF crypto
** frequency: low
+
* frequency: low
** testbed: 3n-skx, 3n-icx, 3n-snr
+
* testbed: 3n-skx, 3n-icx, 3n-snr
** example: [3n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/148/log.html.gz#s1-s1-s1-s1-s4-t1), [3n-snr](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-snr/32/log.html.gz#s1-s1-s1-s1-s4-t1), [3n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-3n-icx/43/log.html.gz#s1-s1-s1-s1-s4-t1)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/197/log.html.gz#s1-s1-s1-s1-s4-t1 3n-icx daily], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-snr/32/log.html.gz#s1-s1-s1-s1-s4-t1 3n-snr], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-3n-icx/57/log.html.gz#s1-s1-s1-s1-s4-t1 3n-icx weekly]
** ticket: [https://jira.fd.io/browse/CSIT-1827 CSIT-1827]
+
* ticket: [https://jira.fd.io/browse/CSIT-1827 CSIT-1827]
** note:
+
  
 
==== (M) all testbeds: mlrsearch fails to find NDR rate ====
 
  
* (M) 3n-tsh, 3n-alt, 2n-clx testbed (Taishan, Altra, Cascade-lake): NDR tests failing from time to time.
+
* last update: before 2023-01-31
** work-to-fix: hard
+
* work-to-fix: hard
** rca:
+
* rca:
** test: Crypto, Ip4, L2, Srv6, Vm Vhost (all packet sizes, all core configurations affected)
+
* test: Crypto, Ip4, L2, Srv6, Vm Vhost (all packet sizes, all core configurations affected)
** frequency: low
+
* frequency: low
** testbed: 3n-tsh, 3n-alt, 2n-clx
+
* testbed: 3n-tsh, 3n-alt, 2n-clx
** example: [2n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-icx/47/log.html.gz#s1-s1-s1-s2-s37-t1)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-icx/57/log.html.gz#s1-s1-s1-s2-s37-t2 2n-icx]
** ticket: [https://jira.fd.io/browse/CSIT-1804 CSIT-1804]
+
* ticket: [https://jira.fd.io/browse/CSIT-1804 CSIT-1804]
** note:
+
  
 
==== (M) all testbeds: AF_XDP mlrsearch fails to find NDR rate ====
 
  
* (M) all testbeds: AF-XDP - NDR tests failing from time to time
+
* last update: before 2023-01-31
** work-to-fix: hard
+
* work-to-fix: hard
** rca:
+
* rca:
** test: af-xdp multicore tests
+
* test: af-xdp multicore tests
** frequency: low
+
* frequency: low
** testbed: 2n-clx, 2n-skx, 2n-tx2, 2n-icx
+
* testbed: 2n-clx, 2n-skx, 2n-tx2, 2n-icx
** example: [2n-skx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-skx/202/log.html.gz#s1-s1-s1-s2-s4-t3), [2n-clx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-clx/152/log.html.gz#s1-s1-s1-s5-s12-t3)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-skx/202/log.html.gz#s1-s1-s1-s2-s4-t3 2n-skx], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-clx/152/log.html.gz#s1-s1-s1-s5-s12-t3 2n-clx]
** ticket: [https://jira.fd.io/browse/CSIT-1802 CSIT-1802]
+
* ticket: [https://jira.fd.io/browse/CSIT-1802 CSIT-1802]
** note: This is mainly observed in iterative and coverage. It's very low frequency ~ 1 out of 100
+
* note: This is mainly observed in iterative and coverage jobs. The frequency is very low, roughly 1 in 100.
  
 
==== (L) all testbeds: vpp create avf interface failure in multi-core configs ====
 
  
* (L) multicore AVF tests are failing when trying to create interface
+
* last update: 2023-02-06
** work-to-fix: hard
+
* work-to-fix: hard
** rca: issue in Intel FVL driver
+
* rca: issue in Intel FVL driver
** test: multicore AVF
+
* test: multicore AVF
** frequency: low
+
* frequency: low
** testbed: all testbeds
+
* testbed: all testbeds
** example: [2n-zn2](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-zn2/639/log.html.gz#s1-s1-s1-s2-s18-t3), [3n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/152/log.html.gz#s1-s1-s1-s5-s1-t2)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-clx/1257/log.html.gz#s1-s1-s1-s5-s24-t2 2n-clx], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-icx/197/log.html.gz#s1-s1-s1-s5-s1-t3 3n-icx]
** ticket: [https://jira.fd.io/browse/CSIT-1782 CSIT-1782]
+
* ticket: [https://jira.fd.io/browse/CSIT-1782 CSIT-1782]
** note: A long standing issue without a final permanent fix.
+
* note: A long-standing issue without a final permanent fix.
  
==== (L) 2n-dnv, 3n-dnv: x557 auto-negotiating 1ge instead of 10ge ====
+
==== (L) all testbeds: nat44det 4M and 16M scale 1 session not established ====
  
* (L) T-Rex STL runtime error
+
* last update: 2023-02-14
** work-to-fix: hard
+
* work-to-fix: hard
** rca: VPP code - X557 speed_capability set 1GE instead of 10GE
+
* rca: unknown
** test: all tests
+
* test: nat44det udp 4m and 16m (64k is ok; 1m can fail, but more rarely than bigger scales)
** frequency: high
+
* frequency: low
** testbed: 2n-dnv and 3n-dnv
+
* testbed: 2n-zn2, 2n-skx, 2n-icx, 2n-clx
** example: [2n-dnv](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-dnv/1264/log.html.gz#s1-s1-s1-s1-s3-t1), [3n-dnv](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-dnv/1274/log.html.gz#s1-s1-s1-s2-s1-t1)
+
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-zn2/672/log.html.gz#s1-s1-s1-s2-s22-t3 2n-zn2], [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-clx/1271/log.html.gz#s1-s1-s1-s2-s60-t1-k2-k11-k1-k2 2n-clx]
** ticket: [/VPP-2010](https://jira.fd.io/browse/VPP-2010)
+
* ticket: [https://jira.fd.io/browse/CSIT-1795 CSIT-1795]
** note: TODO VPP to fix speed_capability.
+
  
==== (L) all testbeds: nat44det 4M and 16M scale 1 session not established ====
+
= Past Failures =
 +
 
 +
==== (M) csit-dpdk-perf-mrr-weekly-master-3n-snr fails due to a missing symlink ====
 +
 
 +
* last update: 2023-02-14
 +
* rca: Missing file in CSIT git (probably an oversight).
 +
* test: all (robot does not even start)
 +
* testbed: 3n-snr
 +
* frequency: always
 +
* example: https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-3n-alt/155/log.html.gz#s1-s1-s1-s1-s1-t1
 +
* ticket: [https://jira.fd.io/browse/CSIT-1894 CSIT-1894]
 +
* gerrit: https://gerrit.fd.io/r/c/csit/+/38239
 +
* note: Fix verified by https://jenkins.fd.io/view/csit/job/csit-dpdk-perf-mrr-weekly-master-3n-snr/26/
 +
 
 +
==== (H) 3n-icx: vpp hoststack QUIC vppecho tests failing ====
 +
 
 +
* last update: 2023-02-14
 +
* test: Quic vppecho BPS
 +
* frequency: always
 +
* testbed: 3n-skx, 3n-icx
 +
* example: [https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2210-3n-icx/17/log.html.gz#s1-s1-s1-s1-s5-t1 3n-icx]
 +
* ticket: [https://jira.fd.io/browse/CSIT-1835 CSIT-1835]
 +
* gerrit: https://gerrit.fd.io/r/c/csit/+/38085
 +
* note: Fix verified since https://jenkins.fd.io/view/csit/job/csit-vpp-perf-hoststack-daily-master-3n-icx/2/
 +
 
 +
==== (M) wrong MAC address on lf_2n_clx_testbed27.yaml ====
  
* (L) Not all DET44 sessions have been established: 4128767 != 4128768
+
* last update: 2023-02-14
** work-to-fix: hard
+
* rca: typo in topology yaml file
** rca:
+
* test: mlx5 relying on MAC. Affected: memif, vhost, l2bd. Not affected: ip4, ip6, dot1q, other L2.
** test: nat44det udp 4m and 16m (64k and 1m are ok)
+
* testbed: 2n-clx, only the first testbed out of three in lab
** frequency: low
+
* frequency: always, unless other 2n-clx testbed is reserved
** testbed: 2n-zn2, 2n-skx, 2n-icx, 2n-clx
+
* example: https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-clx/1270/log.html.gz#s1-s1-s1-s1-s3-t1-k2-k9-k1-k1-k1-k21
** example: [2n-icx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-icx/160/log.html.gz#s1-s1-s1-s2-s35-t1), [2n-clx](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-clx/164/log.html.gz#s1-s1-s1-s2-s54-t1)
+
* ticket: [https://jira.fd.io/browse/CSIT-1893 CSIT-1893]
** ticket: [https://jira.fd.io/browse/CSIT-1795 CSIT-1795]
+
* gerrit: https://gerrit.fd.io/r/c/csit/+/38239
** note:
+
* note: Fix verified since https://jenkins.fd.io/view/csit/job/csit-vpp-perf-mrr-daily-master-2n-clx/1271/
  
==== (L) 2n-dnv: nat44ed 1518B 64k sessions not establishing all sessions ====
+
==== (M) wrong MAC address on lf_3n_icx_testbed37.yaml ====
  
* (L) 2n-dnv: sporadic 1518B tput tests failing to establish required sessions
+
* last update: 2023-02-21
** work-to-fix: hard
+
* work-to-fix: easy
** rca:
+
* rca: typo in topology yaml file
** test: 1518B tput
+
* test: tests using 100Ge2P1E810Cq on that testbed with dpdk plugin; AVF is not affected (as that has its own MAC addresses on VFs)
** frequency: low
+
* testbed: 3n-icx, only the first testbed out of three in lab
** testbeds: 2n-dnv
+
* frequency: always, unless other 3n-icx testbed is reserved
** examples: [2n-dnv](https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-dnv/1264/log.html.gz#s1-s1-s1-s1-s7-t4)
+
* example: https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2302-3n-icx/5/log.html.gz#s1-s1-s1-s5-s7-t1-k2-k5-k4
** ticket: [https://jira.fd.io/browse/CSIT-1850 CSIT-1850]
+
* ticket: [https://jira.fd.io/browse/CSIT-1898 CSIT-1898]
** note:
+
* gerrit: https://gerrit.fd.io/r/c/csit/+/38239
 +
* note: Fix verified since https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2302-3n-icx/15/log.html.gz#s1-s1-s1-s4-s2

Revision as of 14:15, 22 February 2023

CSIT Test Failure Classification

All known CSIT failures grouped and listed in the following order:

  • Always failing followed by sometimes failing.
  • Always failing tests:
    • Most common use cases followed by less common.
  • Sometimes failing tests:
    • Most frequently failing followed by less frequently failing.
      • High frequency: 50%-100%
      • Medium frequency: 10%-50%
      • Low frequency: 0%-10%
    • Within each sub-group: most common use cases followed by less common.
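
The frequency bands above can be sketched as a small helper. This is only an illustration, not part of CSIT: the function name `frequency_band` is hypothetical, and because the listed ranges share their endpoints, the sketch assumes that a rate exactly on a boundary falls into the higher band.

```python
def frequency_band(failure_rate: float) -> str:
    """Map a failure rate (a fraction between 0 and 1) to a frequency band."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    # Assumption: boundary values (0.5 and 0.1) belong to the higher band.
    if failure_rate >= 0.5:
        return "high"
    if failure_rate >= 0.1:
        return "medium"
    return "low"
```

For example, `frequency_band(0.3)` returns `"medium"`. Tests that fail on every run are tracked separately as always failing, so in practice the helper would only be applied to sometimes-failing tests.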

CSIT Test Fixing Priorities

Test fixing work priorities are defined as follows:

  • (H)igh priority, most common use cases and most common test code.
  • (M)edium priority, specific HW and pervasive test code issue.
  • (L)ow priority, corner cases and external dependencies.
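
Most entry headings on this page start with the priority letter in parentheses, for example `(M) 3n-alt, 3n-snr: testpmd no traffic forwarded`. A minimal sketch of splitting such a heading into priority and title follows. The names `parse_heading` and `PRIORITY_NAMES` are hypothetical, and headings without a marker are rejected rather than guessed at.

```python
import re

# Priority letters as defined in the list above.
PRIORITY_NAMES = {"H": "high", "M": "medium", "L": "low"}
_HEADING = re.compile(r"^\((?P<prio>[HML])\)\s+(?P<title>.+)$")


def parse_heading(heading: str) -> tuple[str, str]:
    """Split '(M) some title' into ('medium', 'some title')."""
    match = _HEADING.match(heading.strip())
    if match is None:
        raise ValueError(f"no priority marker in heading: {heading!r}")
    return PRIORITY_NAMES[match.group("prio")], match.group("title")
```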

Current Failures

Deterministic Failures

In Trending

(M) 2n-zn2: All 4c RDMA tests are failing

  • last update: before 2023-02-22
  • work-to-fix: medium?
  • rca: VPP change 38242 causes a crash, stack trace looks the same (if it exists), does not happen with debug VPP build. More specifics not known yet.
  • test: only on RDMA, only on 2n-zn2 (not on 2n-clx), higher core count is affected more
  • frequency: 4c almost always, 2c sometimes, 1c rarely.
  • testbed: 2n-zn2
  • example: 2n-zn2
  • ticket: VPP-2070

(M) 3n-snr: All hwasync wireguard tests failing when trying to verify device

  • last update: before 2023-01-31
  • work-to-fix: hard
  • rca: Missing QAT driver. Symptom: Failed to bind PCI device 0000:f4:00.0 to c4xxx on host 10.30.51.93
  • test: hwasync wireguard
  • frequency: always
  • testbed: 3n-snr
  • example: 3n-snr
  • ticket: CSIT-1883

(M) 1n-aws: TRex mlrsearch fails to find NDR & PDR due to AWS rate limiting (5min total test duration)

  • last update: 2023-02-09
  • work-to-fix: hard
  • rca:
  • test: ip4scale2m
  • frequency: always
  • testbed: 1n-aws
  • example: 1n-aws
  • ticket: CSIT-1876
  • note: The root cause may be the shared environment in the AWS cloud. We may need to use a smaller scale there.

(M) 3n-alt, 3n-snr: testpmd no traffic forwarded

  • last update: 2023-02-09
  • work-to-fix: medium
  • rca: DUT-DUT link takes too long to come up on some testbeds. This happens *after* a test case with a DPDK app (not VPP even when using dpdk plugin), although multiple subsequent tests (even with VPP) may be affected. The real cause is probably in NIC firmware or driver, but CSIT can be better at detecting port status as a workaround.
  • test: testpmd (also l3fwd but hidden by CSIT-1896)
  • frequency: always (almost)
  • testbed: 3n-alt, 3n-snr
  • example: 3n-alt, 3n-snr, 3n-snr
  • ticket: CSIT-1848
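
The rca above suggests that CSIT could detect port status more carefully. One possible shape of such a check is a bounded polling loop, sketched below. This is not CSIT code: `wait_for_link_up` and `read_operstate` are hypothetical names, the timeout values are placeholders, and reading `/sys/class/net/<ifname>/operstate` only works while the port is bound to a Linux kernel driver, so a real workaround may need a different way to query the port.

```python
import time
from pathlib import Path
from typing import Callable


def read_operstate(ifname: str) -> str:
    """Return the Linux operational state of an interface, such as 'up' or 'down'."""
    return Path(f"/sys/class/net/{ifname}/operstate").read_text().strip()


def wait_for_link_up(
    ifname: str,
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
    read_state: Callable[[str], str] = read_operstate,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the link state until it is 'up' or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if read_state(ifname) == "up":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The state reader, sleep function, and clock are parameters so that the loop can be exercised without real hardware.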

(M) 3n-alt: Tests failing until 40Ge Interface comes up

(L) 3n-icx: negative ipackets on TB38 AVF 4c l2patch

Not In Trending

(M) all testbeds: some vpp 9000B tests

  • last update: 2023-02-09
  • work-to-fix: hard
  • rca: VPP code: 34839: dpdk: cleanup MTU handling. CSIT needs to rework how it sets MTU / max frame rate (CSIT-1797). Some tests will continue failing due to missing support on the VPP side; we will open specific Jira tickets for those.
  • test: see sub-items
  • frequency: always
  • testbed: all
  • examples: see sub-items
  • ticket: CSIT-1809
  • gerrit: https://gerrit.fd.io/r/c/csit/+/37824

(M) tests with 9000B payload frames not forwarded over vhost interfaces

  • last update: 2023-02-09
  • work-to-fix: hard
  • test: 9000B + vhostuser
  • testbed: 2n-skx, 3n-skx, 2n-clx
  • examples: 3n-skx vhostuser
  • ticket: CSIT-1809

tests with 9000B payload frames not forwarded over memif interfaces

  • last update: 2023-02-09
  • work-to-fix: hard
  • test: 9000B + memif
  • testbed: 2n-skx, 3n-skx, 2n-clx
  • examples: 2n-skx Memif
  • ticket: CSIT-1808

9000B payload frames not forwarded over tunnels due to violating supported Max Frame Size (VxLAN, LISP, SRv6)

  • last update: 2023-02-09
  • work-to-fix: medium
  • test: 9000B + (IP4 tunnels VXLAN, IP4 tunnels LISP, Srv6, IpSec)
  • testbed: 2n-icx, 3n-icx
  • examples: 2n-icx VXLAN, 3n-icx
  • ticket: CSIT-1801

(M) 9000b all AVF tests are failing to forward traffic

  • last update: 2023-02-09
  • work-to-fix: hard
  • test: 9000B + AVF
  • testbed: 3n-icx
  • examples: 3n-icx ip4base
  • ticket: CSIT-1885

(M) 2n-clx, 2n-icx, 2n-zn2: DPDK testpmd 9000b tests on xxv710 nic are failing with no traffic

  • last update: 2023-02-09
  • work-to-fix: medium
  • rca: The DPDK app attempts to set the MTU only once, and that attempt fails if the interface is down (CSIT-1848). As a workaround, the MTU could be set on the Linux interface before starting the DPDK app.
  • test: DPDK testpmd 9000b
  • frequency: always
  • testbed: 2n-clx, 2n-icx, 2n-zn2
  • example: 2n-clx, 2n-icx
  • ticket: CSIT-1870
  • note: Vratko will fix this, either in a general workaround for CSIT-1848 or in a separate change.
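
The workaround mentioned in the rca, setting the MTU on the Linux interface before the DPDK app starts, could look roughly like the sketch below. It only wraps the standard `ip link set dev <ifname> mtu <value>` command. The function names, the interface name, and the MTU value in the usage note are placeholders, not values taken from CSIT.

```python
import subprocess


def mtu_command(ifname: str, mtu: int) -> list[str]:
    """Build the iproute2 command that sets the MTU of a Linux interface."""
    if mtu <= 0:
        raise ValueError("mtu must be a positive integer")
    return ["ip", "link", "set", "dev", ifname, "mtu", str(mtu)]


def set_mtu(ifname: str, mtu: int) -> None:
    """Run the command; needs root privileges and an existing interface."""
    subprocess.run(mtu_command(ifname, mtu), check=True)
```

A call such as `set_mtu("enp1s0", 9000)` would have to run while the port is still bound to its kernel driver. The MTU value that matches a 9000B test frame depends on how the driver accounts for L2 overhead, so the number here is illustrative only.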

(M) 2n-clx, 2n-icx: all Geneve tests with 1024 tunnels fail

  • last update: before 2023-01-31
  • work-to-fix: hard
  • rca: VPP crash, Failed to add IP neighbor on interface geneve_tunnel258
  • test: avf-ethip4--ethip4udpgeneve-1024tun-ip4base 64B 1518B IMIX 1c 2c 4c
  • frequency: always
  • testbed: 2n-skx, 2n-clx, 2n-icx
  • example: 2n-icx
  • ticket: CSIT-1800

(L) 2n-clx, 2n-icx: nat44ed cps 16M sessions scale fail

  • last update: before 2023-01-31
  • work-to-fix: hard
  • rca: VPP crash, Failed to set NAT44 address range on host 10.30.51.44 (connections-per-second tests only)
  • test: 64B-avf-ethip4tcp-nat44ed-h262144-p63-s16515072-cps-ndrpdr 1c 2c 4c, 64B-avf-ethip4udp-nat44ed-h262144-p63-s16515072-cps-ndrpdr 1c 2c 4c
  • frequency: always
  • testbeds: 2n-skx, 2n-clx, 2n-icx
  • example: 2n-icx, 2n-clx
  • ticket: CSIT-1799
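
The scale figures in these test names are internally consistent: 262144 × 63 = 16515072. The sketch below extracts the three numbers from a test name and checks that product. Reading `h`, `p`, and `s` as hosts, ports per host, and sessions is an assumption about the naming scheme, and `scale_from_name` is a hypothetical helper.

```python
import re

# Matches the "-h<number>-p<number>-s<number>-" part of a test name.
_SCALE = re.compile(r"-h(?P<h>\d+)-p(?P<p>\d+)-s(?P<s>\d+)-")


def scale_from_name(test_name: str) -> tuple[int, int, int]:
    """Return the (h, p, s) numbers embedded in a test name."""
    match = _SCALE.search(test_name)
    if match is None:
        raise ValueError(f"no scale fields in test name: {test_name!r}")
    return int(match["h"]), int(match["p"]), int(match["s"])


h, p, s = scale_from_name(
    "64B-avf-ethip4tcp-nat44ed-h262144-p63-s16515072-cps-ndrpdr"
)
# Under the assumed meaning of the fields, sessions = hosts * ports.
assert h * p == s
```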

(L) 2n-clx, 2n-icx: nat44det imix 1M sessions fails to create sessions

  • last update: before 2023-01-31
  • work-to-fix: hard
  • rca:
  • test: IMIX over 1M sessions bidir
  • frequency: always
  • testbed: 2n-skx, 2n-clx, 2n-icx
  • example: 2n-icx
  • ticket: CSIT-1884

Occasional Failures

In Trending

(H) 2n-icx: NFV density VPP does not start in container

  • last update: before 2023-01-31
  • work-to-fix: hard
  • rca:
  • test: all subsequent
  • frequency: medium
  • testbed: 2n-icx
  • example: 2n-icx mrr, 2n-icx ndrpdr
  • ticket: CSIT-1881
  • note: Once VPP breaks, all subsequent tests fail. All subsequent builds also keep failing until Peter gets the testbed working again. Although the failure occurs with medium frequency, once it happens it breaks all subsequent builds on the testbed, hence the (H) priority.

(M) 2n-clx: e810 mlrsearch tests packets forwarding in one direction

  • last update: before 2023-01-31
  • work-to-fix: hard
  • rca:
  • test: e810Cq ip4base, ip6base
  • frequency: high
  • testbed: 2n-clx
  • example: 2n-clx
  • ticket: CSIT-1864

(M) 3n-icx, 3n-snr: wireguard 100 and 1000 tunnels mlrsearch tests failing with 2c and 4c

  • last update: before 2023-01-31
  • work-to-fix: easy
  • rca:
  • test: wireguard 100 tunnels and more
  • frequency: high
  • testbed: 3n-icx, 3n-snr
  • examples: 3n-icx
  • ticket: CSIT-1886

(M) 3n-tsh: vpp in VM starting too slowly

  • last update: before 2023-02-22
  • work-to-fix: medium
  • rca: perhaps related to numa, investigation continues
  • test: 3n-tsh: sporadic VM vhost
  • frequency: high
  • testbed: 3n-tsh
  • example: 3n-tsh, 3n-tsh
  • ticket: CSIT-1877

Rare Failures

In Trending

(M) 3n-icx, 3n-snr: 1518B IPsec packets not passing

(M) all testbeds: mlrsearch fails to find NDR rate

  • last update: before 2023-01-31
  • work-to-fix: hard
  • rca:
  • test: Crypto, Ip4, L2, Srv6, Vm Vhost (all packet sizes, all core configurations affected)
  • frequency: low
  • testbed: 3n-tsh, 3n-alt, 2n-clx
  • example: 2n-icx
  • ticket: CSIT-1804

(M) all testbeds: AF_XDP mlrsearch fails to find NDR rate

  • last update: before 2023-01-31
  • work-to-fix: hard
  • rca:
  • test: af-xdp multicore tests
  • frequency: low
  • testbed: 2n-clx, 2n-skx, 2n-tx2, 2n-icx
  • example: 2n-skx, 2n-clx
  • ticket: CSIT-1802
  • note: This is mainly observed in iterative and coverage jobs. The frequency is very low, roughly 1 out of 100.

(L) all testbeds: vpp create avf interface failure in multi-core configs

  • last update: 2023-02-06
  • work-to-fix: hard
  • rca: issue in Intel FVL driver
  • test: multicore AVF
  • frequency: low
  • testbed: all testbeds
  • example: 2n-clx, 3n-icx
  • ticket: CSIT-1782
  • note: A long standing issue without a final permanent fix.

(L) all testbeds: nat44det 4M and 16M scale 1 session not established

  • last update: 2023-02-14
  • work-to-fix: hard
  • rca: unknown
  • test: nat44det udp 4m and 16m (64k is ok, 1m can fail but more rarely than the bigger scales)
  • frequency: low
  • testbed: 2n-zn2, 2n-skx, 2n-icx, 2n-clx
  • example: 2n-zn2, 2n-clx
  • ticket: CSIT-1795

Past Failures

(M) csit-dpdk-perf-mrr-weekly-master-3n-snr fails due to a missing symlink

(H) 3n-icx: vpp hoststack QUIC vppecho tests failing

(M) wrong MAC address on lf_2n_clx_testbed27.yaml

(M) wrong MAC address on lf_3n_icx_testbed37.yaml