Difference between revisions of "VPP/Missing Prefetches"
From fd.io
Revision as of 16:50, 30 November 2016
Introduction
VPP graph nodes make extensive use of explicit prefetching to cover dependent read latency. In the simplest dual-loop case, we prefetch the buffer headers and (typically) one cache line's worth of packet data for the next pair of packets. The rest of this page shows what happens if we disable the prefetch block.
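To make the dual-loop pattern concrete, here is a minimal sketch of it in plain C. It is not VPP code: toy_buffer_t, TOY_PREFETCH and toy_dual_loop are hypothetical names invented for illustration, and the prefetch itself is assumed to be the GCC/Clang builtin __builtin_prefetch rather than VPP's own macros. The loop handles two buffers per iteration and issues prefetches for the two after that, so their cache lines are (ideally) resident by the time the next iteration reads them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer; NOT the real vlib_buffer_t. */
typedef struct
{
  uint32_t flags;   /* stands in for buffer header metadata */
  uint8_t data[64]; /* stands in for the first cache line of packet data */
} toy_buffer_t;

/* Assumption: GCC/Clang builtin. rw = 0 (read), locality = 3 (keep cached). */
#define TOY_PREFETCH(addr) __builtin_prefetch ((addr), 0, 3)

/* Sum the first data byte of every buffer, two buffers per iteration. */
static uint64_t
toy_dual_loop (toy_buffer_t **bufs, size_t n_left)
{
  uint64_t sum = 0;
  toy_buffer_t **from = bufs;

  assert (bufs != NULL || n_left == 0);

  /* Dual loop: require 4 buffers so that from[2] and from[3] are valid. */
  while (n_left >= 4)
    {
      /* Prefetch next iteration. */
      TOY_PREFETCH (&from[2]->flags);
      TOY_PREFETCH (&from[3]->flags);
      TOY_PREFETCH (from[2]->data);
      TOY_PREFETCH (from[3]->data);

      /* Work on the current pair while the prefetches are in flight. */
      sum += from[0]->data[0];
      sum += from[1]->data[0];

      from += 2;
      n_left -= 2;
    }

  /* Single loop: drain the remaining buffers without prefetching. */
  while (n_left > 0)
    {
      sum += from[0]->data[0];
      from += 1;
      n_left -= 1;
    }

  return sum;
}
```

A prefetch is only a hint, so the result is the same with or without the four TOY_PREFETCH lines; only the time spent waiting on memory changes. That is why the prefetch block can be disabled without affecting correctness.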
Baseline
Single-core, 13 MPPS offered load, i40e NICs, ~13 MPPS in+out:
vpp# show run
Name                              Clocks   Vectors/Call
FortyGigabitEthernet84/0/1-out    9.08e0   50.09
FortyGigabitEthernet84/0/1-tx     3.84e1   50.09
dpdk-input                        7.45e1   50.09
interface-output                  1.08e1   50.09
ip4-input-no-checksum             3.92e1   50.09
ip4-lookup                        3.88e1   50.09
ip4-rewrite-transit               3.43e1   50.09
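As a rough way to read the baseline, the per-node figures can be added up. The sketch below assumes the Clocks column reports average clocks per packet for each node, which is an interpretation and not something this page states. The array name and helper function are hypothetical; the values are copied from the "show run" output above.

```c
#include <assert.h>
#include <stddef.h>

/* Clocks column from the baseline "show run" output, in listed order. */
static const double baseline_clocks[] = {
  9.08e0, /* FortyGigabitEthernet84/0/1-out */
  3.84e1, /* FortyGigabitEthernet84/0/1-tx */
  7.45e1, /* dpdk-input */
  1.08e1, /* interface-output */
  3.92e1, /* ip4-input-no-checksum */
  3.88e1, /* ip4-lookup */
  3.43e1, /* ip4-rewrite-transit */
};

/* Hypothetical helper: sum the per-node clock figures. */
static double
total_clocks_per_packet (const double *clocks, size_t n)
{
  double total = 0.0;

  assert (clocks != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    total += clocks[i];

  return total;
}
```

Calling total_clocks_per_packet (baseline_clocks, 7) gives about 245 clocks across the seven nodes. Under the assumption above, this total is the number to compare against a run with the prefetch block disabled.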
Baseline "perf top" function-level profile:
14.21%  libvnet.so.0.0.0  [.] ip4_input_no_checksum_avx2
14.14%  libvnet.so.0.0.0  [.] ip4_lookup_avx2
14.10%  vpp               [.] i40e_recv_scattered_pkts_vec
12.64%  libvnet.so.0.0.0  [.] ip4_rewrite_transit_avx2
10.60%  libvnet.so.0.0.0  [.] dpdk_input_avx2
 9.70%  vpp               [.] i40e_xmit_pkts_vec
 4.88%  libvnet.so.0.0.0  [.] dpdk_interface_tx_avx2
 3.67%  libvlib.so.0.0.0  [.] dispatch_node
 3.25%  libvnet.so.0.0.0  [.] vnet_per_buffer_interface_output_avx2
 2.96%  libvnet.so.0.0.0  [.] vnet_interface_output_node_no_flatten
 1.85%  libvlib.so.0.0.0  [.] vlib_put_next_frame
 1.80%  libvlib.so.0.0.0  [.] vlib_get_next_frame_internal
 1.12%  vpp               [.] rte_delay_us_block
Turn off prefetch block in ip4_input_inline(...)
  /* Prefetch next iteration. */
  if (0)
    {
      vlib_buffer_t * p2, * p3;

      p2 = vlib_get_buffer (vm, from[2]);
      p3 = vlib_get_buffer (vm, from[3]);

      vlib_prefetch_buffer_header (p2, LOAD);
      vlib_prefetch_buffer_header (p3, LOAD);

      CLIB_PREFETCH (p2->data, sizeof (ip0[0]), LOAD);
      CLIB_PREFETCH (p3->data, sizeof (ip1[0]), LOAD);
    }