VPP/AArch64

Revision as of 22:01, 7 May 2018

Meeting Details

Weekly on Tuesdays, 6AM PT / 3PM CET / 7:30PM IST / 10PM CST. FD.io Zoom Meeting room

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/5301185804 or https://zoom.us/my/fastdata

Or iPhone one-tap (US Toll):  +14086380968,,5301185804# or +16465588656,,5301185804#

Or Telephone:
    Dial: +1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll)
    +1 855 880 1246 (US Toll Free)
    +1 877 369 0926 (US Toll Free)
    Meeting ID: 530 118 5804
    International numbers available: https://zoom.us/zoomconference?m=ppBOQMQTVxGYmbxNsVemC6KNo8eX2ptF

IRC Channel

#fdio-arm on freenode.net

FD.io Lab

Platform | Role | Status | Hostname | IP | IPMI | Cores | RAM | Ethernet
SoftIron OverDrive 1000 | CI build server | Running | softiron-1 | 10.30.51.12 | - | 4 | 8GB | -
SoftIron OverDrive 1000 | CI build server | Running | softiron-2 | 10.30.51.13 | - | 4 | 8GB | -
SoftIron OverDrive 1000 | CI build server | Running | softiron-3 | 10.30.51.14 | - | 4 | 8GB | -
Cavium ThunderX | CI build server | Needs OS | cavium-1 | 10.30.51.38 | 10.30.50.38 | 96 | 128GB | 3x40GbE QSFP+ / 4x10GbE SFP+
Cavium ThunderX | CI build server | Needs OS | cavium-2 | 10.30.51.39 | 10.30.50.39 | 96 | 128GB | 3x40GbE QSFP+ / 4x10GbE SFP+
Cavium ThunderX | CI build server | Needs OS | cavium-3 | 10.30.51.40 | 10.30.50.40 | 96 | 128GB | 3x40GbE QSFP+ / 4x10GbE SFP+
Huawei TaiShan 2280 | CSIT testbed | Running | huawei-1 | 10.30.51.36 | 10.30.50.36 | 64 | 128GB | 2x10GbE SFP+ Intel 82599 / 2x25GbE SFP28 Mellanox CX-4
Huawei TaiShan 2280 | CSIT testbed | Running | huawei-2 | 10.30.51.37 | 10.30.50.37 | 64 | 128GB | 2x10GbE SFP+ Intel 82599 / 2x25GbE SFP28 Mellanox CX-4
Marvell MACCHIATObin | CSIT testbed | At VEXXHOST | mcbin-1 | 10.30.51.41 | 10.30.50.64 | 4 | 16GB | 2x10GbE SFP+
Marvell MACCHIATObin | CSIT testbed | At VEXXHOST | mcbin-2 | 10.30.51.42 | 10.30.50.65 | 4 | 16GB | 2x10GbE SFP+
Marvell MACCHIATObin | CSIT testbed | At VEXXHOST | mcbin-3 | 10.30.51.43 | 10.30.50.66 | 4 | 16GB | 2x10GbE SFP+

Build, unit test, packaging

The following is tracked manually until hardware is integrated into upstream FD.io CI
Command | Status | Timing
make bootstrap | OK | 0m45
make build | OK | 11m45
make build-release | OK | 14m56
make test | OK | 33m40
make test-all | KO (kubeproxy) | 46m30
make test-debug | OK | 22m32
make test-all-debug | KO (kubeproxy) | 33m29

Status on commit: a38783e0d1ab1d4c661570a1ec90670a1fb0598d (Thu Feb 15 07:31:01 2018 +0000)

kubeproxy tests are broken on purpose: corresponding features are not fully implemented

Timing consideration on platform: Hierofalcon with Cortex-A57 & Fedora 26

Distro | Command | Status
Fedora 27 (Server Edition) | make pkg-rpm | OK
Ubuntu 17.10 | make pkg-deb | OK
Ubuntu 16.04.3 LTS | make pkg-deb | OK

CSIT

https://wiki.fd.io/view/CSIT/AArch64

AArch64 Tuning

Areas:

  • Profiling analysis & optimization
  • Runtime selection of code using existing methods in VPP
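
As a rough illustration of runtime code selection, the sketch below picks between a portable routine and a NEON routine once, at startup, and then calls through a function pointer. It is not VPP's own mechanism: the names sum_bytes_generic, sum_bytes_neon and select_sum_bytes are hypothetical, and the detection assumes Linux on AArch64, where getauxval(AT_HWCAP) reports the Advanced SIMD capability bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <arm_neon.h>
#include <sys/auxv.h>
#ifndef HWCAP_ASIMD
#define HWCAP_ASIMD (1 << 1) /* Advanced SIMD bit in AT_HWCAP on arm64 Linux */
#endif
#endif

/* Hypothetical signature shared by all variants of the routine. */
typedef uint32_t (*sum_bytes_fn)(const uint8_t *data, size_t len);

/* Portable fallback that works on every architecture. */
static uint32_t sum_bytes_generic(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

#if defined(__aarch64__) && defined(__linux__)
/* NEON variant: add 16 bytes per iteration, then finish the tail. */
static uint32_t sum_bytes_neon(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    size_t i = 0;
    for (; i + 16 <= len; i += 16)
        sum += vaddlvq_u8(vld1q_u8(data + i));
    for (; i < len; i++)
        sum += data[i];
    return sum;
}
#endif

/* Choose the variant once; callers keep the returned pointer. */
static sum_bytes_fn select_sum_bytes(void)
{
#if defined(__aarch64__) && defined(__linux__)
    if (getauxval(AT_HWCAP) & HWCAP_ASIMD)
        return sum_bytes_neon;
#endif
    return sum_bytes_generic;
}
```

A caller would store the result of select_sum_bytes() during initialization and invoke it on the data path, so the capability check is not repeated per packet.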

AArch64 Porting

Areas:

  • Hardware topology variation, e.g. non PCIe NICs
  • VPP-1215 - TC01 : Process untagged send tagged testcase failing due to same packet received as sent

All: JIRA issues with ARM64 label

Assigned and New:

Issue | Summary | Assignee
CSIT-1019 | In some situations the receive timeout of PacketVerifier.RxQueue is not working | Khem
CSIT-1021 | Jumbo frames tests based on Scapy pcap receive functions limited to 1600 bytes | Khem
CSIT-1023 | Look to extend crypto VPP func tests to support OpenSSL based PMD in DPDK | Adarsh
CSIT-1043 | Guest OS becomes unresponsive during CSIT crypto suite execution | Khem
CSIT-990 | CSIT uses buildroot package for nested VM image which doesn't support AARCH64 | Nitin
VPP-1174 | Prefetch hotspots | Brian
VPP-1126 | Benchmark and optimize clib_memcpy64_x4() | Khem
VPP-1103 | Use correct CPU freq on ARM platforms | Sirshak
VPP-1114 | Ensure correctness of atomics and memory ordering | Sirshak
VPP-1267 | L3Fwd performance tuning on Macchiatobin | -
VPP-1268 | Support for configuring memory channels in DPDK-INPUT | -
VPP-1064 | Add top level make argument for custom cache line size | Nitin Saxena

Known Issues

Compilation may fail on systems with less than 1GB memory per core. One workaround is to search for -j in build-root/Makefile and multiply by 1 instead of 2.

GCC 5.3.x ICEs during FP register allocation. Please use GCC 5.4+.

Try disabling ASLR if experiencing random crashes: sysctl -w kernel.randomize_va_space=0
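
Before changing the ASLR setting it can help to check its current value. The following is a minimal sketch, assuming a Linux system where the setting is exposed at /proc/sys/kernel/randomize_va_space (0 = disabled, 1 = conservative, 2 = full randomization). The helper names parse_aslr_mode and read_aslr_mode are hypothetical and are not part of VPP.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the text of randomize_va_space; returns 0, 1 or 2, or -1 if invalid. */
static int parse_aslr_mode(const char *text)
{
    assert(text != NULL);
    int mode;
    if (sscanf(text, "%d", &mode) != 1 || mode < 0 || mode > 2)
        return -1;
    return mode;
}

/* Read the ASLR mode from the given path; returns -1 if it cannot be read. */
static int read_aslr_mode(const char *path)
{
    assert(path != NULL);
    char buf[16];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line != NULL ? parse_aslr_mode(line) : -1;
}
```

Calling read_aslr_mode("/proc/sys/kernel/randomize_va_space") and getting 0 back would indicate that the sysctl workaround above is already in effect.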

Recent Patches

add 'is_all_zero(x)' for NEON - fix build break Merged 2/20 Adrian Oanca
u8x16_compare_byte_mask optimization Merged 2/24 Adrian Oanca
Added u8x16,u32x4,u64x2 variants of _zero_byte_mask(x) for ARM/NEON platform Merged 2/26 VPP-1129 Adrian Oanca
add CLIB_HAVE_VEC128 with NEON intrinsics Merged 02/08 VPP-1127 Gabriel Ganne
Use neutral vector code for ethernet_frame_is_tagged Merged 2/19 Damjan Marion
vhost: Added ARMV8 NEON version of function map_guest_mem() Merged 2/7 VPP-1085 Nitin Saxena
vppinfra: use __atomic_fetch_add instead of __sync_fetch_and_add builtins VPP-1114 Kevin Wang
Arm system counter cleanup Merged 1/30 VPP-1125 Brian Brooks
svm: ... on autodetected VA space size (fixup again) Merged 01/10 Gabriel Ganne
svm: calc base address on AArch64 based on autodetected VA space size (fixup) Merged 01/10 Gabriel Ganne
svm: calc base address on AArch64 based on autodetected VA space size Merged 01/09 Damjan Marion
show cpu microarchitecture Merged 01/06 Gabriel Ganne
Fix Debian Packaging on AARCH64 Merged 01/06 Nitin Saxena
more extended tests fixes Merged 12/16 Gabriel Ganne
Use crc32 wrapper Merged 12/16 VPP-1086 Gabriel Ganne
implement clib_smp_pause() for arm and aarch64 platform Merged 12/15 VPP-1066 Kevin Wang
make "test-all" target pass again (for all platforms) Merged 12/13 Gabriel Ganne
fill "show cpu" Flag list on aarch64 platforms Merged 12/06 VPP-1065 Gabriel Ganne
remove smp dead code Merged 12/06 VPP-1066 Gabriel Ganne
net/virtio: support modern device id Merged 11/28 Gabriel Ganne
use REV on aarch64 for endianness swapping Merged 11/21 VPP-1067 Gabriel Ganne
armv8 crc32 - fix macro name Merged 11/15 Gabriel Ganne
bier - fix node table declaration Merged 11/14 Gabriel Ganne
Map SVM regions at a sane offset on arm64 Merged 11/10 Brian Brooks
bfd tests fix Merged 11/07 Gabriel Ganne
debian packaging fix Merged 11/06 Gabriel Ganne
lb test fix Merged 10/31 Gabriel Ganne
conditional x86intrin.h inclusion Merged 10/25 Gabriel Ganne
fix test_lb_ip4_gre6() cleanup Merged 10/24 Gabriel Ganne
null-terminate some formatted string Merged 10/20 Gabriel Ganne
lb plugin - fix format() type mismatches Merged 10/16 Gabriel Ganne
Use AESNI=y only on x86_64 machines Merged 10/14 Brian Brooks
Improved arm64 chip detection Merged 09/11 Brian Brooks
Native arm64 build: dpdk/Makefile change Merged 08/31 Brian Brooks

Meeting Minutes

5/1/2018

  • Action Items - Next Week
    • Sirshak: Follow up with Mohammed regarding ThunderX mgmt connectivity and mcbin - IP addresses allocated; cavium-2 has IPMI connectivity but the console still hangs. cavium-1 and cavium-3: not able to connect to IPMI.
    • Sirshak to hold a call with Khem and Adarsh to understand the vm_vhost issue caused by nested VMs - Contact established; still analyzing the setup.
    • Sirshak to create consolidated ARM ecosystem xls to reflect CSIT effort. - Not done; will do it next week.
    • Honnappa: memcpy benchmarking
    • Brian : CSIT-990(buildroot)
    • Brian to publish a pictorial representation of rx queues and tx queues in multicore case for mcbin. - Moved to next week
    • Khem to analyze make test failure in Taishan - 1804 - Next Week
    • ARM - For TG for deciding connectivity - MCBin and Taishan - Working on it.
    • CSIT-990: Brian to try - Next Week
    • Sirshak/Brian to recheck validity of ASLR issue. - Not Done. Next Week.

5/1/2018

  • New Joiners
    • Natalie and Yuval from Marvell for engineering input.
  • fd.io lab
    • Follow up on getting a mgmt IP for ThunderX.
    • Release Machine to EdK as soon as ThunderX is up.
    • Cavium has shipped more machines as well.
    • See the Taishan setup for any VM issue.
  • VPP
    • VPP-1064: Dave Barach rejected the patch, which was based on the solution Damjan and Nitin had agreed on, because the current approach breaks cross compilation.
    • One suggested solution was to create a platform-specific Makefile for ThunderX.
    • Honnappa suggested that, since the 128-byte cache line size affects Qualcomm as well as ThunderX, an ARM-specific Makefile would be better.
    • Honnappa: no update on memcpy benchmarking; will do that next week.
    • CSIT-1019: fixed locally; will upstream soon.
    • CSIT-1021: patch submitted; CentOS environment issue, to be followed up with CSIT.
    • CSIT-1023: migrated to OpenSSL following the DPDK manual, but test cases are failing.
    • CSIT-1043: no updates.
    • CSIT-990: Brian to retry on mcbin.
    • VPP-1267: l3fwd performance tuning: Marvell to upstream a patch enabling DPDK on mcbin through changes to the DPDK plugin in VPP.
    • Auto-detection of memory channels: per Andrew's comment there is no real way to do that, so the plan is to make it a runtime argument via startup conf instead of hard-coding it.
    • Sachin is facing issues building the RPM on 1801; will open a Jira ticket if the issues persist with 1804.
  • CSIT
    • Adarsh is stalled by test case failures after switching to OpenSSL.
    • Performance testing (Khem): NUMA node numbering issue.
    • The NUMA node numbering issue is not seen on ThunderX. Khem to post the details of the issue and the workaround on Taishan.
    • Khem is facing issues installing TRex on ARM, so he will try to get an x86 machine as TG.
    • Nitin: known issue with TRex on Arm with Mellanox cards.
    • Khem to try L2BD and L2XC.
    • Brian to use cache stashing and see the results.
  • Action Items - Next Week
    • Sirshak: Follow up with Mohammed regarding ThunderX mgmt connectivity and mcbin.
    • Sirshak to hold a call with Khem and Adarsh to understand the vm_vhost issue caused by nested VMs - Not done yet; will do it next week.
    • Sirshak to create consolidated ARM ecosystem xls to reflect CSIT effort. - Not done; will do it next week.
    • Honnappa: memcpy benchmarking
    • Brian : CSIT-990(buildroot)
    • Brian to publish a pictorial representation of rx queues and tx queues in multicore case for mcbin. - Moved to next week
    • Khem to analyze make test failure in Taishan - 1804 - Next Week
    • ARM - For TG for deciding connectivity - MCBin and Taishan - Working on it.
    • CSIT-990: Brian to try - Next Week
    • Sirshak/Brian to recheck validity of ASLR issue. - Not Done. Next Week.
  • Action Items - Last Week
    • Khem to ask Mohammed and Anton for power clearance for 2 new TaiShan machines. - OK for power clearance
    • Sirshak to hold a call with Khem and Adarsh to understand the vm_vhost issue caused by nested VMs - Not done yet; will do it next week.
    • Sirshak and Brian to discuss TG connectivity. - Done
    • Sirshak to create consolidated ARM ecosystem xls to reflect CSIT effort. - Not done; will do it next week.
    • Nitin: To post vlib_main 1804_rc2 issue to community. - Done
    • Sirshak: to check if vlib_main is an issue on Centriq. - Done
    • Nitin: AI for creating Jira for number of memory channel identification. - Done
    • Brian to publish a pictorial representation of rx queues and tx queues in multicore case for mcbin. - Moved to next week
    • John B - 1G to USB adapters Ship to lab. - Done
    • Khem to analyze make test failure in Taishan - 1802 rc2 - Next Week
    • ARM - For TG for deciding connectivity - MCBin and Taishan - Working on it.
    • CSIT-990: Brian to try - Next Week
    • Sirshak to take 1103 and 1114 - Done
    • Nitin to Create l3fwd tkt - Done
    • Brian to create a mcbin crash tkt. Next Week
    • Maen to provide contact for IO Stashing on mcbin. - Contacted Brian. Brian to provide further input.
    • Sirshak/Brian to recheck validity of ASLR issue. - Not Done. Next Week.

4/25/2018

  • Meeting Time
    • Proposed time 6-8am Tuesday PST.
    • Tina to update wiki with new meeting time.
  • FD.io lab
    • ThunderX
      • OS installed on ThunderX. Switch being sent.
      • 1 ThunderX booted.
      • Plan to use 1G to USB adapters.
      • Varun POC for Cavium.
    • Taishan
      • It's up and connected to the Internet.
      • Build and make test 2 TCs failing (VCL TCs failing) - 1802 rc2 used.
      • Brian no update for TG - Meeting on it next week.
      • Khem to ask mohammed, anton for power clearance for 2 new taishan.
    • MCBin
      • Maen POC - To Contact Mohammed.
      • Maen to provide engineering contact for help to Nitin.
  • VPP
    • Round Table status on Porting tkts.
    • Nitin: vlib_main taking a lot of time on both mcbin and thunderx2
    • Sirshak to take on ARM tkts.
  • CSIT
    • Adarsh looking at IPv4 failed test cases with priority.
    • Sirshak to hold a call with Khem and Adarsh to understand the vm_vhost issue caused by nested VMs
    • Cavium to publish mcbin CSIT performance numbers, but low priority. Nitin faced build-root issue with this.
    • Maciek to host a kick off call.
    • Sirshak and Brian to discuss on TG connectivity.
    • Sirshak to create consolidated ARM ecosystem xls to reflect CSIT effort.
  • Performance Benchmarking
    • Nitin: To post vlib_main 1804_rc2 issue to community.
    • Nitin: vlib_main issue in mcbin and thunderx2 at different points within the function. Not a hotspot in x86.
    • Sirshak: to check if vlib_main is an issue on Centriq.
    • Nitin: AI for creating Jira for number of memory channel identification.
    • AI for creating Jira for the crash on Mcbin – Brian
    • Khem to get started on CSIT performance suite this week and publish on shared xls.
    • Brian to publish a pictorial representation of rx queues and tx queues in multicore case for mcbin.
  • Action Items - Last Week
    • Sirshak to add link to xls to wiki page. - Done by somebody else.
    • Brian to raise LF RT ticket about MACCHIATObins - Done. Pinged Mohammed; yet to hear back from him.
    • Nitin to check 'make test' on MACCHIATObin (16GB DRAM) - Failed. Error related to Python scripts.
    • Honnappa, Khem to check Clang build on arm64. - Tried clang build on Centriq; made some changes, but it still fails. The clang build on x86 has errors but still passes; 'make test' fails on x86. Jira card to be created - AI (Sirshak). Khem to try.
  • Action Items
    • John B- 1G to USB adapters Ship to lab.
    • Khem to analyze make test failure in Taishan - 1802 rc2
    • ARM - For TG for deciding connectivity - MCBin and Taishan
    • CSIT-990: Brian to try
    • Sirshak to take 1103 and 1114
    • Nitin to Create l3fwd tkt
    • Brian to create a mcbin crash tkt.
    • Maen to provide contact for IO Stashing on mcbin.
    • Sirshak/Brian to recheck validity of ASLR issue.
    • Sirshak to track down issues.

4/18/2018

  • FD.io lab
    • Temporarily borrow 1x ThunderX to be used for ONAP demo at OpenStack Summit (end of May)? Yes.
    • OS exists on ThunderXs; Varun will keysign with EdW; need to resolve OS netdev connectivity over 10/40GbE
    • OS exists on TaiShan2280; no connectivity to the Internet
  • VPP
    • RC2
      • 'make' passes, 'make test' fails, 'make test-all' ??? - MACCHIATObin (4GB DRAM)
      • 'make' passes, 'make test' pass, 'make test-all' fails - Centriq
      • 'make' passes, 'make test' pass, 'make test-all' fails - x86
    • Build
      • Testing Verify and Merge jobs for 18.04 master on arm64 today
      • Clang build fails on arm? 'CC=clang CXX=clang make'
  • CSIT
    • Adarsh updated CSIT status in xls
    • CSIT-1023: decided to go with OpenSSL instead of ARMv8 crypto library, in DPDK, due to number of algorithms supported
      • e.g. AES-GCM not supported by ARMv8 crypto library
    • Nitin updated CSIT-990 (buildroot) with more information
  • Action Items
    • Sirshak to add link to xls to wiki page.
    • Brian to raise LF RT ticket about MACCHIATObins
    • Nitin to check 'make test' on MACCHIATObin (16GB DRAM)
    • Honnappa, Khem to check Clang build on arm64

4/11/2018

  • Proposal to keep meeting at current time with additional overflow meeting at 8AM PST
  • FD.io lab
    • MACCHIATObins just arrived at VEXXHOST
    • Nitin working on getting IPMI login credentials to provision OS on ThunderX
    • Need to connect Skylake TG machines to Arm machines
      • ETA: 1wk
    • Khem working with Aton (LF) to provision OS on TaiShan2280
      • ETA: 1wk, Ubuntu 17.10
  • VPP
    • Brian to do more benchmarking on MACCHIATObin
    • Khem working on benchmarking clib_memcpy64_x4()
  • CSIT
    • Lucian submitted patches for CSIT-1019, CSIT-1021
    • Lucian looking for contact for ARMv8 crypto driver in DPDK for CSIT-1023
      • See CSIT-1023 for details; looks like DPDK issue?
    • Nitin to add more details to CSIT-990
  • Action Items
    • Sirshak to move JIRA tickets to xls
    • Lucian to work with Nitin/Jerin on CSIT-1023

4/4/2018

  • Propose to move the meeting +2 hours?
  • RC1 cut today
  • FD.io lab
    • Allocate 3 ThunderX for EdK to integrate into CI
      • JohnB from Cavium agreed to supply 3 more ThunderX for CSIT (will pre-install FW & OS)
    • Brian working on provisioning SSDs for MACCHIATObins
    • Khem can ping IPMI interfaces on TaiShan2280s; also needs an OS to be installed
  • VPP
    • Discussed ONS slides
    • Khem has patch for clib_memcpy64_x4() and needs help benchmarking
  • CSIT
    • Lucian found and created JIRA tickets for 3 issues while running CSIT
    • Nitin created JIRA ticket for buildroot issue
    • Khem seeing issues with VM
  • Action Items
    • Nitin/Varun to help provision Ubuntu 16.04 and firmware update on ThunderX machines

3/28/2018

  • Sachin Saxena from NXP joined the call, welcome
  • FD.io lab
    • Khemendra is having issues with Rudy's emails and has therefore not been able to access the Taishan servers
    • Nitin will try to access the servers this week
    • MACCHIATObin setup under progress
    • OD1000 is added to Jenkins slave. The build is failing currently. The build can be triggered manually.
  • VPP
    • Discuss Single core, L3Fwd sample perf numbers and analysis next week
    • Sachin is working on compiling 18.01. Native compilation works fine, cross compilation is failing
    • Nitin still working on patch for cache line size
    • VPP-1126 is being used in DPDK input node. Khemendra will take a look at it this week.
    • VPP-1129 Brian/Sirshak will take a look. Looks like it can be closed.
    • VPP-1114 Patch under internal review
  • CSIT
    • Khemendra having issues with interface bring up failing intermittently. Nitin suggested to add delay.
    • Nicolas/Lucian debugging TC-07
    • Khemendra having issues with TG VM crashing randomly with Ubuntu 16.04, QEMU 2.10. Solved by moving to Ubuntu 17.10, QEMU 2.10
    • Nitin using Ubuntu 16.04 with 4.13 kernel
  • Action Items
    • Discuss Single core, L3Fwd sample perf numbers and analysis next week - Brian
    • VPP-1126 Take a look this week as it affects DPDK input node - Khemendra
    • Need more attention on solution for buildroot issue, need more information on failure CSIT-990 - Nitin
    • Create an excel sheet with the test case status - Nicolas/Lucian

3/21/2018

  • Key signing party! Thank you Ed!
  • FD.io lab
    • VEXXHOST currently working on getting another PDU because there are not enough power ports
    • Received SSDs for MACCHIATObins
  • VPP
    • Discuss high level plan for VPP on Arm
    • Nitin still working on patch for cache line size
  • CSIT
    • Need more attention on solution for buildroot issue CSIT-990
    • Nitin moving towards L2 & L3 perf test cases
    • VM crash due to buffer overflow when multiple VMs share NVRAM; resolved in Fedora27

3/14/2018

  • Key signing party! Thank you Ed!
  • FD.io lab
    • ToR switch issue resolved; confirm mgmt IP address assignment to racked Huawei/Cavium machines
    • Started provisioning MACCHIATObins; Andy ordered SSDs to go with them
  • VPP
    • No updates
  • CSIT
    • Adarsh started running CSIT on virtual topology; moved past a paramiko issue, seeing other test failures
    • Ongoing discussions on getting Adrian access to machines

3/7/2018

  • FD.io lab
    • Trishan (LF) to help follow up on progress in FD.io lab
  • VPP
    • More discussion on patch for cache line size; use MIDR register exported by proc fs
    • Decision has been made to use wrappers for atomics
    • Damjan reworked PCI handling code and added native driver for Intel AVF (XL710 i.e. Fortville)
      • Measuring 132 clocks per packet on Skylake (ip4 routing) with VLIB_FRAME_SIZE 256 (default); +1Mpps over DPDK avf/i40e PMD
    • Damjan reworked memcpy() in MEMIF; achieve 2x25GbE line rate with these changes
    • Sirshak working on getting VPP running on Qualcomm Centriq with Mellanox NIC
      • Seeing issues with external DPDK; static works but not shared; is VPP build system missing -libverbs -lmlx5 in LDFLAGS?
      • Nitin noticed DPDK 17.11 Mellanox PMD does not compile
      • Mellanox recently submitted a patch to VPP to support dynamic loading of Mellanox libraries
  • CSIT
    • Adrian does not have machines to work with in Bucharest; machine in Paris that Gabriel was using no longer available
      • AndyW to help resolve
    • Adarsh moved past VM issues; able to launch VPP in VM with virtio interface; starting to run CSIT scripts
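
The 3/7/2018 note about using the MIDR register to pick a cache line size depends on splitting the register value into its fields. The sketch below decodes a MIDR_EL1 value using the architectural layout (implementer in bits 31:24, variant in 23:20, architecture in 19:16, part number in 15:4, revision in 3:0). The type and function names are hypothetical, and how the raw value is obtained from the kernel is left to the caller; the parser simply assumes it arrives as a hexadecimal string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fields of MIDR_EL1, the AArch64 Main ID Register. */
typedef struct {
    uint8_t implementer;   /* bits 31:24, e.g. 0x41 for Arm */
    uint8_t variant;       /* bits 23:20 */
    uint8_t architecture;  /* bits 19:16 */
    uint16_t part_number;  /* bits 15:4  */
    uint8_t revision;      /* bits 3:0   */
} midr_fields_t;

/* Split a raw MIDR value into its fields. */
static midr_fields_t midr_decode(uint32_t midr)
{
    midr_fields_t f;
    f.implementer = (uint8_t)((midr >> 24) & 0xff);
    f.variant = (uint8_t)((midr >> 20) & 0xf);
    f.architecture = (uint8_t)((midr >> 16) & 0xf);
    f.part_number = (uint16_t)((midr >> 4) & 0xfff);
    f.revision = (uint8_t)(midr & 0xf);
    return f;
}

/* Parse a hex string such as "0x00000000410fd034"; returns 0 on success. */
static int midr_parse_hex(const char *text, uint32_t *out)
{
    assert(text != NULL && out != NULL);
    char *end = NULL;
    unsigned long long value = strtoull(text, &end, 16);
    if (end == text || value > 0xffffffffULL)
        return -1;
    *out = (uint32_t)value;
    return 0;
}
```

A build or startup step could map the decoded implementer and part number to a per-chip setting such as cache line size; that mapping table is not shown because this page does not list the values.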

2/28/2018

  • FD.io lab
    • Ed Kern to try containerized CI on one OD1000 in parallel with Vanessa
    • Received MACCHIATObins in Austin
  • VPP
    • Adarsh trying to run VPP in VM but getting PCI mapping issue; trying to connect to Linux bridge on host
    • Patches for build breakage were committed; arm64 build stable now
    • Brian able to reproduce low PPS numbers seen on MACCHIATObin
  • CSIT
    • Adarsh can reproduce a crash in qemu 2.10 Ubuntu 16.04; going to try Ubuntu 17.10
    • Need to partition func test cases across people

2/21/2018

  • FD.io lab
  • CSIT
    • Gabriel updated CSIT/AArch64 wiki with PASS/FAIL/OTHER list
      • OTHER - failure due to expect-like parsing of output(?)
      • FAIL - ssh timeout during PCIe rescan(?)
    • Moved past first UEFI crash; still seeing crashing on startup (Gabriel)
      • Setup new Ubuntu environment
      • Continue debugging UEFI issue on Fedora with JeremyL
    • Ubuntu is used pretty much everywhere except for additional CentOS CSIT perf
    • Nitin working on upstreaming changes to CSIT
    • Adarsh working on getting VM interfaces working
  • VPP
    • More discussion on how to handle cache line size
    • Sync'd on patches for build breakage

2/14/2018

  • FD.io lab
    • Working on getting access to LF lab in order to setup OD1000 environment
    • Check with tykeal & zxiiro on trust policy for getting others access (Brian)
    • VEXXHOST
      • Mohammed says they do not have extra rack shelf - we need to send one for 3x MACCHIATObin
      • LF RT tickets: #52434 (ThunderX), #52435 (TaiShan2280), #52436 (MACCHIATObin)
  • VPP
    • Build, unit test, deb/rpm
      • 64B/128B cache line size - working on passing this configuration to rest of build system i.e. DPDK (Nitin)
      • RPi3 32-bit
        • Some parts of patch are 32-bit related, some RPi3 related
        • If there is justification, look into maintaining a 32-bit build on ARM
    • Porting & Tuning
      • If patches need to be tested on multiple Arm chips, please use DO_NOT_MERGE and Code Review -2
      • Two NEON related patches merged, work in progress on others, Nitin testing CLASSIFY_USE_SSE
  • CSIT
    • Please open JIRA ticket with details on VM crashing on startup. DONE: CSIT-922
    • Khem working on running VPP func tests on internal setup
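
The 64B/128B cache line size item in the 2/14/2018 notes comes down to one compile-time constant that every aligned data structure has to agree on. A minimal sketch follows, assuming the value is passed on the compiler command line (for example -DEXAMPLE_LOG2_CACHE_LINE_BYTES=7 for 128-byte lines). The macro, type and function names are hypothetical and are not the ones used by VPP or DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time knob: log2 of the cache line size (6 -> 64B, 7 -> 128B). */
#ifndef EXAMPLE_LOG2_CACHE_LINE_BYTES
#define EXAMPLE_LOG2_CACHE_LINE_BYTES 6
#endif
#define EXAMPLE_CACHE_LINE_BYTES (1 << EXAMPLE_LOG2_CACHE_LINE_BYTES)

/* Per-thread counters padded to a full line so that threads never share one. */
typedef struct {
    uint64_t packets;
    uint64_t bytes;
} __attribute__((aligned(EXAMPLE_CACHE_LINE_BYTES))) per_thread_counters_t;

/* Fail the build if the padding assumption is ever broken. */
_Static_assert(sizeof(per_thread_counters_t) % EXAMPLE_CACHE_LINE_BYTES == 0,
               "per-thread counters must occupy whole cache lines");

/* Return the counters slot of one thread from a contiguous pool. */
static per_thread_counters_t *counters_for_thread(per_thread_counters_t *pool,
                                                  size_t n_threads, size_t index)
{
    assert(pool != NULL && index < n_threads);
    return &pool[index];
}
```

If a library built separately, such as DPDK, assumes a different line size than the application, structures shared between them can end up with mismatched layouts, which is why the notes mention passing the configuration to the rest of the build system.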

2/7/2018

  • LF lab
    • OD1000 - last machine was racked; Vanessa needs credentials
    • Taishan2280 - machines arrived at Vexxhost; confirm with Rudy/Mohammed
    • ThunderX - machines arrived at Vexxhost; send board details to Mohammed
    • MACCHIATObin - boards arrived in Arm SJC waiting for enclosures (Andy)
  • Build, unit test, packaging
  • VPP
    • NEON usage in vhost - sent first patch for review (Nitin)
      • Need to verify how it performs on other Arm-based machines (Brian)
      • VPP maintainers prefer to use SIMD wrappers, but it might not always be possible
        • Cavium/Arm had to rewrite algorithm for AArch64 instead of use SIMD wrappers in DPDK
    • CLIB_HAVE_VEC128 - working on it (Gabriel)
    • Discussed compiler builtins for atomics in VPP call; need to spin another patch with wrappers based on architecture (Kevin)
    • Seeing prefetch hotspots on TX2+MlnxCX4en (similar to Armada8040) (Nitin)
  • CSIT
    • libvirt crashing on VM startup (Hierofalcon) (Gabriel)
      • Need someone who can reproduce this issue (Arm TBD)
    • Huawei also seeing VM issues (Khem)
    • buildroot doesn't work on Arm (Nitin)
      • Root issue: no support in GRUB for AArch64 in buildroot (?)
        • Need someone who can reproduce this issue (Arm TBD)
      • Peter Mikus replied to Nitin on csit-dev mail list
      • Using a temporary workaround: use a different VM image (Ubuntu Cloud) instead of one produced by buildroot
        • Working on patching DPDK in VM image (Ubuntu Cloud) just like done in buildroot
  • Misc
    • OpenFlow (Nitin, Damjan)
      • Is there an OpenFlow agent for VPP, and can VPP implement OpenFlow rules/tables?
      • VPP is not flow-based like OVS is; they are different
      • Can ODL/Honeycomb be used?

1/31/2018

  • LF lab
    • OD1000 - 1 replacement being installed this week
    • Huawei & Cavium boards should arrive at colo this week; confirm with Rudy
  • Build, unit test, packaging
    • Kubeproxy/NAT failures
      • Not arch related
      • Part of extended unit tests, so does not block CI
    • `make test` passes on D03 & D05 (Ubuntu)
  • MACCHIATObin
    • Seeing hotspots in VPP graph nodes
      • L3 forwarding - ip4 rewrite node
      • L2 cross-connect
      • Try reducing quad loop to a dual loop
      • dpdk-input node highly opt for x86 (could contribute to low perf) but hotspots still in rte_mbuf_t conversion(?)
    • Some examples of runtime code selection based on uarch exist in the codebase
  • CSIT
    • Adrian Oanca joined from Enea
    • Gabriel seeing VM crashing during boot; related to # interfaces assigned (6)
    • Nitin ran into issue with buildroot on arm64; see thread on csit-dev
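
The 1/31/2018 suggestion to reduce a quad loop to a dual loop refers to how many packets a graph node handles per loop iteration. The sketch below shows the dual-loop shape on a simplified stand-in: it decrements a TTL byte for two entries per iteration and then handles any leftover entry in a single loop. The function name and the flat TTL array are hypothetical simplifications, not VPP node code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decrement every TTL in the array, two entries per iteration. */
static void decrement_ttl_dual(uint8_t *ttl, size_t n)
{
    assert(ttl != NULL || n == 0);
    size_t i = 0;

    /* Dual loop: two independent entries per pass. */
    while (n - i >= 2) {
        ttl[i] -= 1;
        ttl[i + 1] -= 1;
        i += 2;
    }

    /* Single loop: handle the remaining entry, if any. */
    while (i < n) {
        ttl[i] -= 1;
        i += 1;
    }
}
```

Handling fewer packets per iteration keeps fewer values live at once, which may reduce register pressure on some cores; whether that helps on a given chip is something only measurement can show.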

1/24/2018

  • VPP
    • DPDK issue with non-pci network cards
    • build & test status updated
    • VPP-1127 (VEC_128 enable) under discussion. Should we enable this by default ?
    • add Nitin to review Neon commits
    • VPP-1114 currently internal review
    • VPP-1064 under rework after review by Damjan
  • CSIT
    • first 3-nodes functional tests status list
    • TODO Gabriel: share CSIT VM setup env
    • nested VM: build-root package support for ARM. Create Jira ticket for Brian.

1/17/2018

  • Tina to send calendar invite for meeting
  • FD.io lab
    • Cavium shipping
  • VPP
    • Kubeproxy tests failing
    • Khem trying to find out the PCIe address for a given netdev interface
  • CSIT
    • Gabriel setting up 3 node topo with VMs
    • Gabriel working on PASS/FAIL status
  • CSIT 17.10 report

1/10/2018

  • Meeting moved 2 hours earlier - 6AM PT / 3PM CET / 7:30PM IST / 10PM CST
  • FD.io lab
    • Cavium ThunderX shipping soon
  • VPP
  • CSIT
    • Gabriel's patch for aarch64 support in CSIT merged
    • VirtualBox not supported on Arm / Vagrant unknown
      • This is OK for upstream since automation expects VMs to already exist
  • Performance
    • Need plan for 1T; use TaiShans that were sent to lab
  • AIs
    • Brian: Follow up with Vanessa and EdW regarding 'resource issue'
    • Gabriel: Update CSIT wiki page; which tests are passing/failing?
    • Brian: Check with Vanessa how to split machines between CI jobs and CSIT jobs

1/3/2018

  • FD.io lab
    • One OD1000 sent for RMA
    • Huawei PO sent out
    • Cavium PO sent out (?)
  • VPP
    • Gabriel working on patch for "show cpu" to display MIDR as human readable
    • Nitin sent preliminary patch for vhost-user NEON impl
      • Seeing perf differences on different cores; tradeoff is single-threaded perf vs. NEON
    • Kumar built and unit test successfully on D03
    • Nitin to resume patch for supporting different cache line sizes for the same arch
  • CSIT
    • Gabriel cleaned up WIP patch; ready for review
    • Kumar starting CSIT func tests with Ubuntu VMs
      • Scripts for running on dedicated hardware need to be modified, e.g. PCIe resources
    • Kumar to send doc on testing
  • Performance
    • Kumar to start thread on performance testing
  • AIs
    • Brian: Check with Tina on shipping and open LF RT ticket once they have arrived
    • Brian: Need a way to choose either SW or NEON impl based on chip
    • Gabriel: Create list of broken CSIT tests for 2-node topology

12/20/2017

No meeting next week - Dec 27

  • FD.io lab
    • OD1000s - build only
      • 1 of 3 needs to be RMAd
      • Can these be up in time to show 'make test' passes on ARM for 18.01 release report?
    • TaiShan
      • PO in progress
    • ThunderX - build only
      • PO went out
  • VPP
    • Patches / JIRAs
      • Patch for extended test failure, but still more (new) extended test failures - Gabriel
      • Nitin to post vhost-user.c changes for NEON
        • Nitin will finish Gabriel's original NEON patch to add CLIB_HAVE_VEC_128
    • Can we share code on Github e.g. NEON perf tests?
  • CSIT
    • Leading question: How many CSIT test cases are passing/failing?
    • Environment issues preventing running through all CSIT test cases; Gabriel needs dedicated machines or more RAM
    • Cavium & Huawei will join Gabriel in CSIT replication on ARM hardware next week
      • Cavium previously ran vhost test cases manually, now moving to CSIT

12/13/2017

  • VPP
    • Quick overview of work items
    • Waiting to hear back from LF about OD1000 connectivity
      • Changes needed to ci-mgmt
  • CSIT
    • Starting to reproduce CSIT on x86 and ARM (with Gabriel's WIP patch)
      • Some issues with environment variables (perf tests on 2-node)
    • Need Nexus to support aarch64 packages
      • Need a contact for Nexus
  • Share known issues on wiki!
  • Request CSIT 'deep dive'

12/06/2017

11/29/2017

  • VPP
    • vhost-user.c - SSE4.2 only. Implement range search using NEON. (nitin)
    • OD1000 status ?
      • build only
      • can we access them ?
      • what can we do to help in general ?
    • x86 intrinsic review
    • build VPP on ARM VM on x86
  • CSIT
    • what platforms will be made available

11/22/2017

  • VPP CI
    • 3 ThunderX for Christmas
  • CSIT
  • Next steps
    • VPP
    • CSIT
      • structure work & send email (Gabriel)
      • is xxhash vs crc32 finished ? (Gabriel)
      • ask Maciek & setup a presentation meeting with someone from CSIT (Tina)
      • find a time to reschedule this meeting before the CSIT weekly call (Brian)

11/15/2017

  • VPP upstream status
    • build && build-release OK
    • "make test" && "make test-debug" OK
    • packaging:
      • Ubuntu 16.04 OK
      • Ubuntu 17.10 ? (TBC)
      • fedora-26 OK
  • vpp continuous test
    • all tasks required for Jenkins' "verify" job are ready
    • TODO: request gerrit hook from Dave Barach / vpp-dev (NB & GG)
    • set up ci in fdio lab
  • CSIT
    • setting up env
    • ThunderX platforms should arrive this week
    • csit work sharing

11/8/2017

  • Unit tests
    • Tests pass except for random initialization failures
    • Need to hear back from upstream about Extended unit tests
  • Should we run plugins such as NSH SFC?
  • Hardware to lab
    • Huawei h/w stalled
    • 3x ThunderX shipping to FD.io lab
  • CSIT replication
    • Cavium replicating on ThunderX2; getting started
  • Let's track our work in Jira; Brian to migrate tasks to Jira

10/25/2017

  • Gabriel working on vpp init failure in linux_pci_init()
  • Kumar to check with GeorgeZ on Huawei boards shipped to CSIT; need to verify tests also on this environment (package versions from distro)
  • Brian to check whether anything else needs to be done besides 'make test' for upstream enablement