Difference between revisions of "VPP/AArch64"

Revision as of 21:29, 17 October 2017

Unit tests

make test-debug status: ?

make test-all-debug status: ?

make test status:

infinite loop:

TestLB.test_lb_ip4_gre4
TestLB.test_lb_ip4_gre6
TestLB.test_lb_ip6_gre4
TestLB.test_lb_ip6_gre6

fail:

TestIp4VrfMultiInst.test_ip4_vrf_02
TestIP6VrfMultiInst.test_ip6_vrf_02
TestIPv4FibCrud.test_2_del_routes
TestIPv4FibCrud.test_3_add_new_routes
TestJVpp.test_vpp_ioamtrace_future_api
TestJVpp.test_vpp_snat_future_api
TestL2bdArpTerm.test_l2bd_arp_term_02
TestL2bdArpTerm.test_l2bd_arp_term_04
TestL2bdArpTerm.test_l2bd_arp_term_10
TestL2bdArpTerm.test_l2bd_arp_term_11
TestL2bdArpTerm.test_l2bd_arp_term_13
TestL2bdArpTerm.test_l2bd_arp_term_14
TestL2bdMultiInst.test_l2bd_inst_02
TestL2bdMultiInst.test_l2bd_inst_03
TestL2fib.test_l2_fib_02
TestL2fib.test_l2_fib_04

make test-all status: ?

Misc

Support multiple cache line sizes per architecture. AArch64 is currently hard-coded to 128B. For native builds, inspect the ARMv8 Main ID Register (MIDR_EL1) in the Makefile and pass the cache line size as a compiler option, e.g. -DCACHE_LINE_SIZE=128.
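A minimal sketch of how a build-time cache line size could be consumed in C. Only CACHE_LINE_SIZE comes from the note above; the example_* names are hypothetical and are not the macros used in src/vppinfra/cache.h. The 128-byte fallback simply mirrors the current hard-coded AArch64 value.

```c
#include <stddef.h>
#include <stdint.h>

/* CACHE_LINE_SIZE is expected from the build, e.g. -DCACHE_LINE_SIZE=64.
 * Assumption: fall back to the currently hard-coded 128 bytes. */
#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE 128
#endif

_Static_assert ((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0,
                "cache line size must be a power of two");

/* Hypothetical helper macro: align a type to one cache line. */
#define EXAMPLE_CACHE_ALIGNED __attribute__ ((aligned (CACHE_LINE_SIZE)))

/* Hypothetical per-worker counters, padded to a full cache line so that
 * two workers never share a line (avoids false sharing). */
typedef struct
{
  uint64_t packets;
  uint64_t bytes;
} EXAMPLE_CACHE_ALIGNED example_counters_t;

/* Round a byte count up to a whole number of cache lines. */
static inline size_t
example_round_to_cache_line (size_t n_bytes)
{
  return (n_bytes + CACHE_LINE_SIZE - 1) & ~((size_t) CACHE_LINE_SIZE - 1);
}
```

With this shape, a 64-byte part and a 128-byte part differ only in the compiler option passed by the Makefile.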

Investigate "show cpu" output and Arm CPU feature detection (AES, SHA1, SHA2, CRC32, ATOMICS) via hwcaps in src/vppinfra/cpu.[ch].
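One possible shape for hwcaps-based detection, assuming Linux on AArch64 with glibc's getauxval() and the kernel's HWCAP_* bits from asm/hwcap.h. The example_* struct and functions are hypothetical and do not reflect the existing cpu.[ch] interface. On any other platform the sketch reports every feature as absent.

```c
#include <stddef.h>
#include <stdio.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h> /* HWCAP_AES, HWCAP_SHA1, ... */
#define EXAMPLE_HAVE_HWCAP 1
#else
#define EXAMPLE_HAVE_HWCAP 0
#endif

/* Hypothetical feature summary: each field is 0 or 1. */
typedef struct
{
  int aes, sha1, sha2, crc32, atomics;
} example_arm_features_t;

static inline example_arm_features_t
example_detect_arm_features (void)
{
  example_arm_features_t f = { 0, 0, 0, 0, 0 };
#if EXAMPLE_HAVE_HWCAP
  unsigned long hw = getauxval (AT_HWCAP);
  f.aes = (hw & HWCAP_AES) != 0;
  f.sha1 = (hw & HWCAP_SHA1) != 0;
  f.sha2 = (hw & HWCAP_SHA2) != 0;
  f.crc32 = (hw & HWCAP_CRC32) != 0;
  f.atomics = (hw & HWCAP_ATOMICS) != 0;
#endif
  return f;
}

/* Format the flags on one line, in the spirit of a "show cpu" field. */
static inline int
example_format_features (char *buf, size_t len, example_arm_features_t f)
{
  return snprintf (buf, len, "aes=%d sha1=%d sha2=%d crc32=%d atomics=%d",
                   f.aes, f.sha1, f.sha2, f.crc32, f.atomics);
}
```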

Review use of Arm architected timer in src/vppinfra/time.[ch]
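For reference while reviewing, a hedged sketch of reading the architected timer from user space: CNTVCT_EL0 gives the virtual counter and CNTFRQ_EL0 its frequency, so no frequency estimation is needed. The example_* names are hypothetical, and the non-AArch64 branch is only a stand-in (standard C clock()) so the sketch builds elsewhere.

```c
#include <stdint.h>
#include <time.h>

/* Read a free-running tick counter. */
static inline uint64_t
example_read_counter (void)
{
#if defined(__aarch64__)
  uint64_t ticks;
  /* ISB keeps the counter read from being speculated ahead of earlier
   * instructions. */
  __asm__ volatile ("isb\n\tmrs %0, cntvct_el0" : "=r" (ticks) : : "memory");
  return ticks;
#else
  /* Stand-in for other architectures: processor time from standard C. */
  return (uint64_t) clock ();
#endif
}

/* Counter frequency in ticks per second. */
static inline uint64_t
example_counter_hz (void)
{
#if defined(__aarch64__)
  uint64_t hz;
  __asm__ volatile ("mrs %0, cntfrq_el0" : "=r" (hz));
  return hz;
#else
  return (uint64_t) CLOCKS_PER_SEC;
#endif
}

/* Convert a tick delta to seconds using the reported frequency. */
static inline double
example_ticks_to_seconds (uint64_t ticks)
{
  return (double) ticks / (double) example_counter_hz ();
}
```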

Use the ISB or YIELD instruction in src/vppinfra/smp.h
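A sketch of where such an instruction would sit: inside the inner loop of a busy-wait. The spinlock and the example_* names are hypothetical and are not the smp.h code. YIELD is shown here; swapping in "isb" is a one-line change, and which of the two behaves better is the open question in this item.

```c
/* Hint to the core that we are spinning. */
static inline void
example_cpu_pause (void)
{
#if defined(__aarch64__)
  __asm__ volatile ("yield" : : : "memory"); /* alternative: "isb" */
#elif defined(__x86_64__) || defined(__i386__)
  __asm__ volatile ("pause" : : : "memory");
#else
  __asm__ volatile ("" : : : "memory"); /* compiler barrier only */
#endif
}

/* Hypothetical test-and-test-and-set spinlock. */
typedef struct
{
  int locked;
} example_spinlock_t;

static inline void
example_spin_lock (example_spinlock_t *l)
{
  /* Try to take the lock; on failure spin on a plain load and pause. */
  while (__atomic_exchange_n (&l->locked, 1, __ATOMIC_ACQUIRE))
    while (__atomic_load_n (&l->locked, __ATOMIC_RELAXED))
      example_cpu_pause ();
}

static inline void
example_spin_unlock (example_spinlock_t *l)
{
  __atomic_store_n (&l->locked, 0, __ATOMIC_RELEASE);
}
```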

Use the REV instruction in src/vppinfra/byte_order.h
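One way to get REV without inline assembly is the GCC/Clang byte-swap builtins, which those compilers typically lower to REV/REV16 on AArch64 (worth confirming in the generated code). The example_* wrappers are hypothetical and are not the byte_order.h functions.

```c
#include <stdint.h>

static inline uint16_t
example_bswap16 (uint16_t x)
{
  return __builtin_bswap16 (x);
}

static inline uint32_t
example_bswap32 (uint32_t x)
{
  return __builtin_bswap32 (x);
}

static inline uint64_t
example_bswap64 (uint64_t x)
{
  return __builtin_bswap64 (x);
}

/* Host to network (big-endian) order: swap only on little-endian hosts. */
static inline uint32_t
example_host_to_net_u32 (uint32_t x)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return example_bswap32 (x);
#else
  return x;
#endif
}
```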

Review use of __sync_xxx/__atomic_xxx builtins to ensure correct memory ordering on non-TSO machines.
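To make the ordering concern concrete, here is a hedged sketch of a single-producer/single-consumer ring written with explicit __atomic_xxx orderings. On a TSO machine the plain-store version of this code tends to work by accident; on Arm the release/acquire pairs are what guarantee the consumer sees the slot contents before it sees the updated index. The ring and all example_* names are hypothetical, not VPP code.

```c
#include <stdint.h>

#define EXAMPLE_RING_SIZE 8u /* must be a power of two */

typedef struct
{
  uint32_t head; /* next slot to read, written only by the consumer */
  uint32_t tail; /* next slot to write, written only by the producer */
  uint32_t slots[EXAMPLE_RING_SIZE];
} example_ring_t;

/* Producer side. Returns 0 on success, -1 if the ring is full. */
static inline int
example_ring_enqueue (example_ring_t *r, uint32_t value)
{
  uint32_t tail = __atomic_load_n (&r->tail, __ATOMIC_RELAXED);
  uint32_t head = __atomic_load_n (&r->head, __ATOMIC_ACQUIRE);
  if (tail - head == EXAMPLE_RING_SIZE)
    return -1;
  r->slots[tail & (EXAMPLE_RING_SIZE - 1)] = value;
  /* Release: the slot write above becomes visible before the new tail. */
  __atomic_store_n (&r->tail, tail + 1, __ATOMIC_RELEASE);
  return 0;
}

/* Consumer side. Returns 0 on success, -1 if the ring is empty. */
static inline int
example_ring_dequeue (example_ring_t *r, uint32_t *value)
{
  uint32_t head = __atomic_load_n (&r->head, __ATOMIC_RELAXED);
  /* Acquire: pairs with the producer's release store of tail. */
  uint32_t tail = __atomic_load_n (&r->tail, __ATOMIC_ACQUIRE);
  if (head == tail)
    return -1;
  *value = r->slots[head & (EXAMPLE_RING_SIZE - 1)];
  /* Release: the slot read completes before the slot is handed back. */
  __atomic_store_n (&r->head, head + 1, __ATOMIC_RELEASE);
  return 0;
}
```

By contrast, the older __sync_xxx builtins are generally full barriers, which is safe but can be stronger (and slower) than a given call site needs.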

Investigate hash performance (CRC32 vs xxhash) e.g. in src/vppinfra/bihash_8_8.h. Dependent on Arm CPU feature detection.
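A sketch of the CRC32 side of that comparison for an 8-byte key, assuming a little-endian target. When the compiler targets the Arm CRC32 extension (__ARM_FEATURE_CRC32, e.g. -march=armv8-a+crc) it uses the ACLE intrinsic __crc32cd from arm_acle.h; otherwise it falls back to a bitwise software CRC-32C. The example_* names are hypothetical, and the xxhash side is not shown. Note that this selects the path at compile time, whereas the item above implies a run-time choice based on feature detection.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h> /* __crc32cd */
#define EXAMPLE_HAVE_HW_CRC32C 1
#else
#define EXAMPLE_HAVE_HW_CRC32C 0
#endif

/* Bitwise CRC-32C (Castagnoli) update over a byte buffer, with no initial
 * or final inversion, matching the raw semantics of the CRC32C
 * instructions. 0x82F63B78 is the reflected polynomial. */
static inline uint32_t
example_crc32c_sw (uint32_t crc, const void *data, size_t len)
{
  const uint8_t *p = data;
  while (len--)
    {
      crc ^= *p++;
      for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  return crc;
}

/* Hash one 64-bit key, using the hardware instruction when available. */
static inline uint32_t
example_hash_u64 (uint64_t key)
{
#if EXAMPLE_HAVE_HW_CRC32C
  return __crc32cd (0, key);
#else
  return example_crc32c_sw (0, &key, sizeof (key));
#endif
}
```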

Investigate memcpy performance (src/vppinfra/string.h); both inlined-by-compiler and libc versions.

SIMD

Investigate uses of CLIB_HAVE_VEC128 that are not implemented on Arm (mheap_bootstrap.h, vhost-user.c, ixge.c)
Investigate uses of splat for initialization
Investigate uses of SIMD types with plain C bit-wise/arith ops (code generation) (dpdk/node.c, ...)
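The last two SIMD items can be illustrated with the GCC/Clang vector extensions: a 128-bit type initialized by a splat, then combined with ordinary C operators, leaving instruction selection to the compiler. This is a sketch for inspecting code generation (for example with objdump on an AArch64 build); the example_* names are hypothetical and are not the vppinfra vector types.

```c
#include <stdint.h>

/* Four 32-bit lanes in a 128-bit vector (GCC/Clang vector extension). */
typedef uint32_t example_u32x4 __attribute__ ((vector_size (16)));

/* Splat: replicate one scalar into every lane. */
static inline example_u32x4
example_u32x4_splat (uint32_t x)
{
  example_u32x4 v = { x, x, x, x };
  return v;
}

/* Plain C bit-wise and arithmetic operators applied lane by lane. */
static inline example_u32x4
example_mask_and_add (example_u32x4 a, uint32_t mask, uint32_t inc)
{
  return (a & example_u32x4_splat (mask)) + example_u32x4_splat (inc);
}

/* Horizontal sum via lane subscripting, handy for checking results. */
static inline uint32_t
example_u32x4_sum (example_u32x4 v)
{
  return v[0] + v[1] + v[2] + v[3];
}
```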

Investigate current tuning of dual/quad loop optimizations for hiding memory latency, e.g. l2_forward().
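For readers unfamiliar with the pattern, a simplified dual-loop sketch: handle two buffers per iteration while prefetching the two after them, then finish with a single loop. The buffer type and example_* names are hypothetical and much simpler than l2_forward(). The tuning question is how far ahead to prefetch and whether dual or quad is the better width on a given Arm core.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified buffer header. */
typedef struct
{
  uint32_t next_index;
  uint32_t len;
} example_buffer_t;

/* Sum buffer lengths with a dual loop plus prefetch of the next pair. */
static inline uint64_t
example_sum_lengths_dual (example_buffer_t **bufs, size_t n)
{
  uint64_t total = 0;
  size_t i = 0;

  /* Dual loop: needs two buffers to process and two more to prefetch. */
  while (n - i >= 4)
    {
      __builtin_prefetch (bufs[i + 2], 0 /* read */, 3 /* keep in cache */);
      __builtin_prefetch (bufs[i + 3], 0, 3);

      total += bufs[i]->len;
      total += bufs[i + 1]->len;
      i += 2;
    }

  /* Single loop: remaining buffers, no prefetch. */
  while (i < n)
    {
      total += bufs[i]->len;
      i++;
    }
  return total;
}
```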

Patches

lb plugin - fix format() type mismatches	Merged 10/16	https://gerrit.fd.io/r/#/c/8755/
Use AESNI=y only on x86_64 machines	Merged 10/14	https://gerrit.fd.io/r/#/c/8622/
Improved arm64 chip detection	Merged 9/11	https://gerrit.fd.io/r/#/c/8372/
Native arm64 build: dpdk/Makefile change	Merged 8/31	https://gerrit.fd.io/r/#/c/8228/

Known Issues

GCC 5.3.x hits an internal compiler error (ICE) during FP register allocation. Use GCC 5.4 or later.

@@ Line 33: / Line 33: @@
 <code>make test-all</code> status: ?
-== TODO ==
+== Misc ==
-* src/vppinfra/cache.h - aarch64 is always 128B cache block size
+Support multiple cache line sizes per architecture. AArch64 is currently hard coded to 128B. For native build, inspect ARMv8 Main ID Register in Makefile and pass cache line size as compiler option, e.g. <code>-DCACHE_LINE_SIZE=128</code>.
-* src/vppinfra/cpu.[ch] - need ARM feature detection & "show cpu" output
-* src/vppinfra/time.[ch] - review ARM architected timer / use cntfrq_el0 instead of estimation
+Investigate "show cpu" output and Arm CPU feature detection (AES, SHA1, SHA2, CRC32, ATOMICS) via hwcaps. <code>src/vppinfra/cpu.[ch]</code>
-* src/vppinfra/smp.h - use "isb" or "yield" inst
-* src/vppinfra/byte_order.h - use "rev" inst
+Review use of Arm architected timer in <code>src/vppinfra/time.[ch]</code>
-* review __sync use for non-x86/TSO machines; "TCP shared-memory fifos, event logger, etc."
-* bihash_8_8.h - crc32 vs xxhash
+Use ISB or YIELD in <code>src/vppinfra/smp.h</code>
-* string.h - memcpy (inlined by compiler) perf
-* SIMD
+Use REV in <code>src/vppinfra/byte_order.h</code>
-* quad loop / dual loop optimizations e.g. l2_forward() - hide memory latency
+Review use of <code>__sync_xxx</code>/<code>__atomic_xxx</code> builtins to ensure correct memory ordering on non-TSO machines.
+Investigate hash performance (CRC32 vs xxhash) e.g. in <code>src/vppinfra/bihash_8_8.h</code>. Dependent on Arm CPU feature detection.
+Investigate memcpy performance (<code>src/vppinfra/string.h</code>); both inlined-by-compiler and libc versions.
+SIMD
+* Investigate uses of CLIB_HAVE_VEC128 that are not implemented on Arm (mheap_bootstrap.h, vhost-user.c, ixge.c)
+* Investigate uses of splat for initialization
+* Investigate uses of SIMD types with plain C bit-wise/arith ops (code generation) (dpdk/node.c, ...)
+Investigate current tuning of dual/quad loop optimizations for hiding memory latency, e.g. l2_forward().
 == Patches ==
