Difference between revisions of "VPP/AArch64"
Revision as of 23:01, 28 March 2018
Contents
- 1 Meeting Details
- 2 IRC Channel
- 3 FD.io Lab
- 4 Build, unit test, packaging
- 5 CSIT
- 6 AArch64 Tuning
- 7 AArch64 Porting
- 8 Known Issues
- 9 Recent Patches
- 10 Meeting Minutes
- 10.1 3/28/2018
- 10.2 3/21/2018
- 10.3 3/14/2018
- 10.4 3/7/2018
- 10.5 2/28/2018
- 10.6 2/21/2018
- 10.7 2/14/2018
- 10.8 2/7/2018
- 10.9 1/31/2018
- 10.10 1/24/2018
- 10.11 1/17/2018
- 10.12 1/10/2018
- 10.13 1/3/2018
- 10.14 12/20/2017
- 10.15 12/13/2017
- 10.16 12/06/2017
- 10.17 11/29/2017
- 10.18 11/22/2017
- 10.19 11/15/2017
- 10.20 11/8/2017
- 10.21 10/25/2017
Meeting Details
Weekly on Wednesdays, 6AM PT / 3PM CET / 7:30PM IST / 10PM CST. FD.io Zoom Meeting room
Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/5301185804 or https://zoom.us/my/fastdata
Or iPhone one-tap (US Toll): +14086380968,,5301185804# or +16465588656,,5301185804#
Or Telephone: Dial: +1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll) +1 855 880 1246 (US Toll Free) +1 877 369 0926 (US Toll Free)
Meeting ID: 530 118 5804
International numbers available: https://zoom.us/zoomconference?m=ppBOQMQTVxGYmbxNsVemC6KNo8eX2ptF
IRC Channel
#fdio-arm on freenode.net
FD.io Lab
Platform | Hostname | Mgmt IP | VEXXHOST | Ethernet | Build slave | CSIT slave |
---|---|---|---|---|---|---|
SoftIron OverDrive 1000 | softiron-1 | 10.30.51.12 | Running | None | Yes | No |
SoftIron OverDrive 1000 | softiron-2 | 10.30.51.13 | Running | None | Yes | No |
SoftIron OverDrive 1000 | softiron-3 | 10.30.51.14 | Running | None | Yes | No |
Huawei TaiShan 2280 | huawei-1 | IPMI: 10.30.50.36 / mgmt: 10.30.51.36 | Running | 2x10GbE SFP+ Intel 82599 / 2x25GbE SFP28 Mellanox CX-4 | ? | Yes |
Huawei TaiShan 2280 | huawei-2 | IPMI: 10.30.50.37 / mgmt: 10.30.51.37 | Running | 2x10GbE SFP+ Intel 82599 / 2x25GbE SFP28 Mellanox CX-4 | ? | Yes |
Marvell MACCHIATObin | mcbin-1 | | Updating env | 2x10GbE SFP+ | ? | Yes |
Marvell MACCHIATObin | mcbin-2 | | Updating env | 2x10GbE SFP+ | ? | Yes |
Marvell MACCHIATObin | mcbin-3 | | Updating env | 2x10GbE SFP+ | ? | Yes |
Cavium ThunderX | cavium-1 | IPMI: 10.30.50.38 / mgmt: 10.30.51.38 | Running | 3x40GbE QSFP+ / 4x10GbE SFP+ | Yes | No |
Cavium ThunderX | cavium-2 | IPMI: 10.30.50.39 / mgmt: 10.30.51.39 | Running | 3x40GbE QSFP+ / 4x10GbE SFP+ | Yes | No |
Cavium ThunderX | cavium-3 | IPMI: 10.30.50.40 / mgmt: 10.30.51.40 | Running | 3x40GbE QSFP+ / 4x10GbE SFP+ | Yes | No |
Build, unit test, packaging
The following is tracked manually until hardware is integrated into upstream FD.io CI
Cmd | Status | Timing |
---|---|---|
make bootstrap | OK | 0m45 |
make build | OK | 11m45 |
make build-release | OK | 14m56 |
make test | OK | 33m40 |
make test-all | KO (kubeproxy) | 46m30 |
make test-debug | OK | 22m32 |
make test-all-debug | KO (kubeproxy) | 33m29 |
Status on commit: a38783e0d1ab1d4c661570a1ec90670a1fb0598d (Thu Feb 15 07:31:01 2018 +0000)
kubeproxy tests are broken on purpose: corresponding features are not fully implemented
Timing consideration on platform: Hierofalcon with Cortex-A57 & Fedora 26
- Might want to have a look at this patch which adds make config: https://gerrit.fd.io/r/#/c/9200/
Distro | Cmd | Status |
---|---|---|
Fedora 27 (Server Edition) | make pkg-rpm | OK |
Ubuntu 17.10 | make pkg-deb | OK |
Ubuntu 16.04.3 LTS | make pkg-deb | OK |
CSIT
https://wiki.fd.io/view/CSIT/AArch64
AArch64 Tuning
Areas:
- Profiling analysis & optimization
- Runtime selection of code using existing methods in VPP
AArch64 Porting
Areas:
- VM creation via buildroot
- Hardware topology variation, e.g. non PCIe NICs
- VPP-1215 - TC01 : Process untagged send tagged testcase failing due to same packet received as sent
All: JIRA issues with ARM64 label
Assigned and New:
CSIT-922 aarch64 VM crash at startup | Lucian Banu |
VPP-1174 Prefetch hotspots | Brian Brooks |
VPP-1129 Investigate enabling CLASSIFY_USE_SSE on Arm | Gabriel Ganne |
VPP-1126 Benchmark and optimize clib_memcpy64_x4() | Khemendra Kumar |
VPP-1114 Ensure correctness of atomics and memory ordering | Arm |
VPP-1103 Use correct CPU freq on ARM platforms | Arm |
VPP-1130 Test with 64K pages | Brian to check with Jeremy Linton |
VPP-1166 Add ILP32 support in VPP |
Known Issues
Compilation may fail on systems with less than 1GB memory per core. One workaround is to search for -j in build-root/Makefile and multiply by 1 instead of 2.
GCC 5.3.x ICEs during FP register allocation. Please use GCC 5.4+.
Try disabling ASLR if experiencing random crashes: sysctl -w kernel.randomize_va_space=0
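For the memory-per-core issue above, a parallel job count can be derived from the machine's RAM before building. The following is a minimal sketch, not part of the VPP build system: the function names are hypothetical, and the 1 GiB-per-job budget is an assumption taken from the "less than 1GB memory per core" note.

```python
def read_mem_total_kib(meminfo_text: str) -> int:
    """Extract MemTotal (in KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:       16384000 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo text")


def safe_make_jobs(mem_total_kib: int, cpu_count: int,
                   kib_per_job: int = 1024 * 1024) -> int:
    """Return a -j value that leaves about kib_per_job of RAM per job.

    The default budget of 1 GiB per job is an assumption, not a value
    defined by VPP. The result is never below 1 and never above cpu_count.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    jobs_by_memory = mem_total_kib // kib_per_job
    return max(1, min(cpu_count, jobs_by_memory))
```

On a real machine the inputs would come from reading /proc/meminfo and from os.cpu_count(); the resulting number can then be compared against the -j value computed in build-root/Makefile.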
Recent Patches
Meeting Minutes
3/28/2018
- No key signing party :(
- Sachin Saxena from NXP joined the call, welcome
- FD.io lab
- Khemendra is having issues with Rudy's emails and has therefore not been able to access the TaiShan servers
- Nitin will try to access the servers this week
- MACCHIATObin setup in progress
- OD1000 has been added as a Jenkins slave. The build is currently failing, but it can be triggered manually.
- VPP
- Discuss Single core, L3Fwd sample perf numbers and analysis next week
- Sachin is working on compiling 18.01. Native compilation works fine, cross compilation is failing
- Nitin still working on patch for cache line size
- VPP-1126: clib_memcpy64_x4() is used in the DPDK input node. Khemendra will take a look at it this week.
- VPP-1129 Brian/Sirshak will take a look. Looks like it can be closed.
- VPP-1114 Patch under internal review
- CSIT
- Khemendra having issues with interface bring-up failing intermittently. Nitin suggested adding a delay.
- Nicolas/Lucian debugging TC-07
- Khemendra having issues with TG VM crashing randomly with Ubuntu 16.04, QEMU 2.10. Solved by moving to Ubuntu 17.10, QEMU 2.10
- Nitin using Ubuntu 16.04 with 4.13 kernel
- Action Items
- Discuss Single core, L3Fwd sample perf numbers and analysis next week - Brian
- VPP-1126 Take a look this week as it affects DPDK input node - Khemendra
- Need more attention on solution for buildroot issue, need more information on failure CSIT-990 - Nitin
- Create an excel sheet with the test case status - Nicolas/Lucian
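The suggestion in the 3/28/2018 CSIT notes to add a delay for intermittent interface bring-up failures can also be expressed as a bounded poll, so the test waits only as long as needed. This is a generic sketch, not CSIT code: `wait_until` and the `check` callable are hypothetical names, and the caller supplies whatever probe reports that the interface is up.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float = 10.0,
               interval_s: float = 0.5) -> bool:
    """Poll check() until it returns True or timeout_s elapses.

    Returns True as soon as check() succeeds, or False on timeout.
    check() is always called at least once.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Compared with a fixed sleep, this keeps the common case fast and turns a persistent failure into an explicit timeout result.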
3/21/2018
- Key signing party! Thank you Ed!
- FD.io lab
- VEXXHOST currently working on getting another PDU because there are not enough power ports
- Received SSDs for MACCHIATObins
- VPP
- Discuss high level plan for VPP on Arm
- Nitin still working on patch for cache line size
- CSIT
- Need more attention on solution for buildroot issue CSIT-990
- Nitin moving towards L2 & L3 perf test cases
- VM crash due to buffer overflow when multiple VMs share NVRAM; resolved in Fedora27
3/14/2018
- Key signing party! Thank you Ed!
- FD.io lab
- ToR switch issue resolved; confirm mgmt IP address assignment to racked Huawei/Cavium machines
- Started provisioning MACCHIATObins; Andy ordered SSDs to go with them
- VPP
- No updates
- CSIT
- Adarsh started running CSIT on virtual topology; moved past a paramiko issue, seeing other test failures
- Ongoing discussions on getting Adrian access to machines
3/7/2018
- FD.io lab
- Trishan (LF) to help follow up on progress in FD.io lab
- VPP
- More discussion on patch for cache line size; use MIDR register exported by proc fs
- Decision has been made to use wrappers for atomics
- Damjan reworked PCI handling code and added native driver for Intel AVF (XL710 i.e. Fortville)
- Measuring 132 clocks per packet on Skylake (ip4 routing) with VLIB_FRAME_SIZE 256 (default); +1Mpps over DPDK avf/i40e PMD
- Damjan reworked memcpy() in MEMIF; achieve 2x25GbE line rate with these changes
- Sirshak working on getting VPP running on Qualcomm Centriq with Mellanox NIC
- Seeing issues with external DPDK; static works but not shared; is VPP build system missing -libverbs -lmlx5 in LDFLAGS?
- Nitin noticed DPDK 17.11 Mellanox PMD does not compile
- Mellanox recently submitted a patch to VPP to support dynamic loading of Mellanox libraries
- CSIT
- Adrian does not have machines to work with in Bucharest; machine in Paris that Gabriel was using no longer available
- AndyW to help resolve
- Adarsh moved past VM issues; able to launch VPP in VM with virtio interface; starting to run CSIT scripts
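The 3/7/2018 cache line size discussion refers to the MIDR register exported through proc fs. As an illustration of what that register carries, the sketch below decodes a raw MIDR_EL1 value using the architectural field layout and pulls the implementer and part fields out of arm64 /proc/cpuinfo text. The function and class names are hypothetical, and the code does not map any CPU to a cache line size.

```python
from typing import NamedTuple, Tuple


class Midr(NamedTuple):
    implementer: int   # bits [31:24], e.g. 0x41 for Arm
    variant: int       # bits [23:20]
    architecture: int  # bits [19:16]
    part: int          # bits [15:4]
    revision: int      # bits [3:0]


def decode_midr(midr: int) -> Midr:
    """Split a raw MIDR_EL1 value into its architectural fields."""
    return Midr(
        implementer=(midr >> 24) & 0xFF,
        variant=(midr >> 20) & 0xF,
        architecture=(midr >> 16) & 0xF,
        part=(midr >> 4) & 0xFFF,
        revision=midr & 0xF,
    )


def implementer_and_part(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (implementer, part) from the first CPU in arm64 cpuinfo text.

    Relies on the "CPU implementer" and "CPU part" lines that arm64
    kernels print as hexadecimal values.
    """
    found = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("CPU implementer", "CPU part") and key not in found:
            found[key] = int(value.strip(), 16)
    if len(found) != 2:
        raise ValueError("implementer/part fields not found")
    return found["CPU implementer"], found["CPU part"]
```

For example, decoding 0x410FD070 yields implementer 0x41 and part 0xD07, which is the Cortex-A57 part number.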
2/28/2018
- FD.io lab
- Ed Kern to try containerized CI on one OD1000 in parallel with Vanessa
- Received MACCHIATObins in Austin
- VPP
- Adarsh trying to run VPP in VM but getting PCI mapping issue; trying to connect to Linux bridge on host
- Patches for build breakage were committed; arm64 build stable now
- Brian able to reproduce low PPS numbers seen on MACCHIATObin
- CSIT
- Adarsh can reproduce a crash in qemu 2.10 Ubuntu 16.04; going to try Ubuntu 17.10
- Need to partition func test cases across people
2/21/2018
- FD.io lab
- CSIT
- Gabriel updated CSIT/AArch64 wiki with PASS/FAIL/OTHER list
- OTHER - failure due to expect-like parsing of output(?)
- FAIL - ssh timeout during PCIe rescan(?)
- Moved past first UEFI crash; still seeing crashing on startup (Gabriel)
- Setup new Ubuntu environment
- Continue debugging UEFI issue on Fedora with JeremyL
- Ubuntu is used pretty much everywhere except for additional CentOS CSIT perf
- Nitin working on upstreaming changes to CSIT
- Adarsh working on getting VM interfaces working
- VPP
- More discussion on how to handle cache line size
- Sync'd on patches for build breakage
2/14/2018
- FD.io lab
- Working on getting access to LF lab in order to setup OD1000 environment
- Check with tykeal & zxiiro on trust policy for getting others access (Brian)
- VEXXHOST
- Mohammed says they do not have extra rack shelf - we need to send one for 3x MACCHIATObin
- LF RT tickets: #52434 (ThunderX), #52435 (TaiShan2280), #52436 (MACCHIATObin)
- VPP
- Build, unit test, deb/rpm
- 64B/128B cache line size - working on passing this configuration to rest of build system i.e. DPDK (Nitin)
- RPi3 32-bit
- Some parts of patch are 32-bit related, some RPi3 related
- If there is justification, look into maintaining a 32-bit build on ARM
- Porting & Tuning
- If patches need to be tested on multiple Arm chips, please use DO_NOT_MERGE and Code Review -2
- Two NEON related patches merged, work in progress on others, Nitin testing CLASSIFY_USE_SSE
- CSIT
- Please open JIRA ticket with details on VM crashing on startup. DONE: CSIT-922
- Khem working on running VPP func tests on internal setup
2/7/2018
- LF lab
- OD1000 - last machine was racked; Vanessa needs credentials
- Taishan2280 - machines arrived at Vexxhost; confirm with Rudy/Mohammed
- ThunderX - machines arrived at Vexxhost; send board details to Mohammed
- MACCHIATObin - boards arrived in Arm SJC waiting for enclosures (Andy)
- Build, unit test, packaging
- 64B/128B cache line size - working on it (Nitin)
- Interest in ILP32 from Cavium; customer coming from MIPS32
- VPP
- NEON usage in vhost - sent first patch for review (Nitin)
- Need to verify how it performs on other Arm-based machines (Brian)
- VPP maintainers prefer to use SIMD wrappers, but it might not always be possible
- Cavium/Arm had to rewrite algorithm for AArch64 instead of using SIMD wrappers in DPDK
- CLIB_HAVE_VEC128 - working on it (Gabriel)
- Discussed compiler builtins for atomics in VPP call; need to spin another patch with wrappers based on architecture (Kevin)
- Seeing prefetch hotspots on TX2+MlnxCX4en (similar to Armada8040) (Nitin)
- CSIT
- libvirt crashing on VM startup (Hierofalcon) (Gabriel)
- Need someone who can reproduce this issue (Arm TBD)
- Huawei also seeing VM issues (Khem)
- buildroot doesn't work on Arm (Nitin)
- Root issue: no support in GRUB for AArch64 in buildroot (?)
- Need someone who can reproduce this issue (Arm TBD)
- Peter Mikus replied to Nitin on csit-dev mail list
- Using a temporary workaround: use a different VM image (Ubuntu Cloud) instead of one produced by buildroot
- Working on patching DPDK in VM image (Ubuntu Cloud) just like done in buildroot
- Misc
- OpenFlow (Nitin, Damjan)
- Is there an OpenFlow agent for VPP, and can VPP implement OpenFlow rules/tables?
- VPP is not flow-based like OVS is; they are different
- Can ODL/Honeycomb be used?
1/31/2018
- LF lab
- OD1000 - 1 replacement being installed this week
- Huawei & Cavium boards should arrive at colo this week; confirm with Rudy
- Build, unit test, packaging
- Kubeproxy/NAT failures
- Not arch related
- Part of extended unit tests, so does not block CI
- `make test` passes on D03 & D05 (Ubuntu)
- MACCHIATObin
- Seeing hotspots in VPP graph nodes
- L3 forwarding - ip4 rewrite node
- L2 cross-connect
- Try reducing quad loop to a dual loop
- dpdk-input node highly optimized for x86 (could contribute to low perf) but hotspots still in rte_mbuf_t conversion(?)
- Some examples of runtime code selection based on uarch exist in the codebase
- CSIT
- Adrian Oanca joined from Enea
- Gabriel seeing VM crashing during boot; related to # interfaces assigned (6)
- Nitin ran into issue with buildroot on arm64; see thread on csit-dev
1/24/2018
- VPP
- DPDK issue with non-pci network cards
- build & test status updated
- VPP-1127 (VEC_128 enable) under discussion. Should we enable this by default ?
- add Nitin to review Neon commits
- VPP-1114 currently internal review
- VPP-1064 under rework after review by Damjan
- CSIT
- first 3-nodes functional tests status list
- TODO Gabriel: share CSIT VM setup env
- nested VM: build-root package support for ARM. Create Jira ticket for Brian.
1/17/2018
- Tina to send calendar invite for meeting
- FD.io lab
- Cavium shipping
- VPP
- Kubeproxy tests failing
- Khem trying to find out the PCIe address for a given netdev interface
- CSIT
- Gabriel setting up 3 node topo with VMs
- Gabriel working on PASS/FAIL status
- CSIT 17.10 report
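For the 1/17/2018 item on finding the PCIe address for a given netdev interface, Linux exposes the backing device through a sysfs symlink at /sys/class/net/<ifname>/device. The sketch below resolves that link; the function name and the `sysfs_root` parameter are hypothetical. For NICs that are not on PCIe the returned name is the platform device rather than a PCI address, and for virtio interfaces it is the virtio device whose parent is the PCI function.

```python
import os


def pci_address_of(ifname: str, sysfs_root: str = "/sys") -> str:
    """Return the name of the device backing a kernel network interface.

    For a PCIe NIC this is its PCI address, e.g. "0000:01:00.0".
    Raises LookupError if the interface has no backing device entry
    (for example a loopback or bridge interface).
    """
    device_link = os.path.join(sysfs_root, "class", "net", ifname, "device")
    if not os.path.exists(device_link):
        raise LookupError("no device entry for interface %r" % ifname)
    # The link points at the device directory; its last path component
    # is the bus address.
    return os.path.basename(os.path.realpath(device_link))
```

This only works while the interface is still bound to a kernel driver; once a port is handed to a userspace driver it no longer appears under /sys/class/net.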
1/10/2018
- Meeting moved 2 hours earlier - 6AM PT / 3PM CET / 7:30PM IST / 10PM CST
- FD.io lab
- Cavium ThunderX shipping soon
- VPP
- Kumar to look at VPP-1126
- Gabriel proposed https://gerrit.fd.io/r/#/c/10049/ as follow-up to Damjan's patch
- CSIT
- Gabriel's patch for aarch64 support in CSIT merged
- VirtualBox not supported on Arm / Vagrant unknown
- This is OK for upstream since automation expects VMs to already exist
- Performance
- Need plan for 1T; use TaiShans that were sent to lab
- AIs
- Brian: Follow up with Vanessa and EdW regarding 'resource issue'
- Gabriel: Update CSIT wiki page; which tests are passing/failing?
- Brian: Check with Vanessa how to split machines between CI jobs and CSIT jobs
1/3/2018
- FD.io lab
- One OD1000 sent for RMA
- Huawei PO sent out
- Cavium PO sent out (?)
- VPP
- Gabriel working on patch for "show cpu" to display MIDR as human readable
- Nitin sent preliminary patch for vhost-user NEON impl
- Seeing perf differences on different cores; tradeoff is single-threaded perf vs. NEON
- Kumar built and ran unit tests successfully on D03
- Nitin to resume patch for supporting different cache line sizes for the same arch
- CSIT
- Gabriel cleaned up WIP patch; ready for review
- Kumar starting CSIT func tests with Ubuntu VMs
- Scripts for running on dedicated hardware need to be modified, e.g. PCIe resources
- Kumar to send doc on testing
- Performance
- Kumar to start thread on performance testing
- AIs
- Brian: Check with Tina on shipping and open LF RT ticket once they have arrived
- Brian: Need a way to choose either SW or NEON impl based on chip
- Gabriel: Create list of broken CSIT tests for 2-node topology
12/20/2017
No meeting next week - Dec 27
- FD.io lab
- OD1000s - build only
- 1 of 3 needs to be RMAd
- Can these be up in time to show 'make test' passes on ARM for 18.01 release report?
- TaiShan
- PO in progress
- ThunderX - build only
- PO went out
- VPP
- Patches / JIRAs
- Patch for extended test failure, but still more (new) extended test failures - Gabriel
- Nitin to post vhost-user.c changes for NEON
- Nitin will finish Gabriel's original NEON patch to add CLIB_HAVE_VEC_128
- Can we share code on Github e.g. NEON perf tests?
- CSIT
- Leading question: How many CSIT test cases are passing/failing?
- Environment issues preventing running through all CSIT test cases; Gabriel needs dedicated machines or more RAM
- Cavium & Huawei will join Gabriel in CSIT replication on ARM hardware next week
- Cavium previously ran vhost test cases manually, now moving to CSIT
12/13/2017
- VPP
- Quick overview of work items
- Waiting to hear back from LF about OD1000 connectivity
- Changes needed to ci-mgmt
- CSIT
- Starting to reproduce CSIT on x86 and ARM (with Gabriel's WIP patch)
- Some issues with environment variables (perf tests on 2-node)
- Need Nexus to support aarch64 packages
- Need a contact for Nexus
- Share known issues on wiki!
- Request CSIT 'deep dive'
12/06/2017
- Can we access the OD1000 in csit lab ?
- currently mainly working with VMs
- added dedicated wiki page for CSIT : https://wiki.fd.io/view/CSIT/AArch64
- WIP : https://gerrit.fd.io/r/#/c/9474/
11/29/2017
- VPP
- vhost-user.c - SSE4.2 only. Implement range search using NEON. (nitin)
- OD1000 status ?
- build only
- can we access them ?
- what can we do to help in general ?
- x86 intrinsic review
- build VPP on ARM VM on x86
- CSIT
- what platforms will be made available
11/22/2017
- VPP CI
- 3 ThunderX for Christmas
- CSIT
- func on VM vs perfs on HW
- func on x86 VMs OK with 2 nodes
- DPDK integration WIP : https://gerrit.fd.io/r/#/c/9474/
- issues
- how to access the lab ?
- Next steps
- VPP
- CSIT
- structure work & send email (Gabriel)
- is xxhash vs crc32 finished ? (Gabriel)
- ask Maciek & setup a presentation meeting with someone from CSIT (Tina)
- find a time to reschedule this meeting before the CSIT weekly call (Brian)
11/15/2017
- VPP upstream status
- build && build-release OK
- "make test" && "make test-debug" OK
- packaging:
- Ubuntu 16.04 OK
- Ubuntu 17.10 ? (TBC)
- fedora-26 OK
- vpp continuous test
- all tasks required for the Jenkins "verify" job are ready
- TODO: request gerrit hook from Dave Barach / vpp-dev (NB & GG)
- set up ci in fdio lab
- CSIT
- setting up env
- ThunderX platforms should arrive this week
- csit work sharing
11/8/2017
- Unit tests
- Tests pass except for random initialization failures
- Need to hear back from upstream about Extended unit tests
- Should we run plugins such as NSH SFC?
- Hardware to lab
- Huawei h/w stalled
- 3x ThunderX shipping to FD.io lab
- CSIT replication
- Cavium replicating on ThunderX2; getting started
- Let's track our work in Jira; Brian to migrate tasks to Jira
10/25/2017
- Gabriel working on vpp init failure in linux_pci_init()
- Kumar to check with GeorgeZ on Huawei boards shipped to CSIT; need to verify tests also on this environment (package versions from distro)
- Brian to check whether anything else needs to be done besides 'make test' for upstream enablement