Difference between revisions of "VPP/Missing Prefetches"
From fd.io
Revision as of 16:48, 30 November 2016
Introduction
VPP graph nodes make extensive use of explicit prefetching to cover dependent read latency. In the simplest dual-loop case, each iteration prefetches the buffer headers and (typically) one cache line's worth of packet data for the packets it will process next. The rest of this page shows what happens if we disable the prefetch block.
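The dual-loop pattern described above can be sketched in plain C. This is a minimal illustration and not the VPP source: `sketch_buffer_t`, `sketch_process_one`, `sketch_dual_loop`, and the `do_prefetch` switch are hypothetical names invented for this example, and the GCC/Clang builtin `__builtin_prefetch` stands in for whatever prefetch macros the real nodes use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer: a small header plus packet data.
   In this toy layout both may share a cache line; real buffers are larger. */
typedef struct {
    uint32_t next_index;   /* result written by the node */
    uint8_t  data[128];    /* start of packet data */
} sketch_buffer_t;

/* Per-packet work: reads packet data and writes the header,
   so both must be in cache before this runs at full speed. */
static inline void sketch_process_one(sketch_buffer_t *b)
{
    b->next_index = b->data[0] & 1u;
}

/* Dual loop: process two packets per iteration while prefetching the two
   packets the next iteration will touch. Returns the number processed.
   Passing do_prefetch = 0 mimics disabling the prefetch block. */
size_t sketch_dual_loop(sketch_buffer_t **bufs, size_t n, int do_prefetch)
{
    size_t i = 0;

    assert(bufs != NULL || n == 0);

    /* Need four buffers in hand: two to process, two to prefetch. */
    while (n - i >= 4) {
        if (do_prefetch) {
            /* Buffer headers: will be written (rw = 1). */
            __builtin_prefetch(bufs[i + 2], 1, 3);
            __builtin_prefetch(bufs[i + 3], 1, 3);
            /* First cache line of packet data: read only (rw = 0). */
            __builtin_prefetch(bufs[i + 2]->data, 0, 3);
            __builtin_prefetch(bufs[i + 3]->data, 0, 3);
        }
        sketch_process_one(bufs[i]);
        sketch_process_one(bufs[i + 1]);
        i += 2;
    }

    /* Single loop: drain whatever is left, one packet at a time. */
    while (i < n) {
        sketch_process_one(bufs[i]);
        i++;
    }
    return i;
}
```

Both settings of `do_prefetch` produce the same results; only the memory-stall behavior differs, which is the effect the measurements on this page are meant to expose.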
Baseline
Single-core, 13 MPPS offered load, i40e NICs, ~13 MPPS in+out:
 vpp# show run
 Name                              Clocks    Vectors/Call
 FortyGigabitEthernet84/0/1-out    9.08e0    50.09
 FortyGigabitEthernet84/0/1-tx     3.84e1    50.09
 dpdk-input                        7.45e1    50.09
 interface-output                  1.08e1    50.09
 ip4-input-no-checksum             3.92e1    50.09
 ip4-lookup                        3.88e1    50.09
 ip4-rewrite-transit               3.43e1    50.09
Baseline "perf top" function-level profile:
 14.21%  libvnet.so.0.0.0  [.] ip4_input_no_checksum_avx2
 14.14%  libvnet.so.0.0.0  [.] ip4_lookup_avx2
 14.10%  vpp               [.] i40e_recv_scattered_pkts_vec
 12.64%  libvnet.so.0.0.0  [.] ip4_rewrite_transit_avx2
 10.60%  libvnet.so.0.0.0  [.] dpdk_input_avx2
  9.70%  vpp               [.] i40e_xmit_pkts_vec
  4.88%  libvnet.so.0.0.0  [.] dpdk_interface_tx_avx2
  3.67%  libvlib.so.0.0.0  [.] dispatch_node
  3.25%  libvnet.so.0.0.0  [.] vnet_per_buffer_interface_output_avx2
  2.96%  libvnet.so.0.0.0  [.] vnet_interface_output_node_no_flatten
  1.85%  libvlib.so.0.0.0  [.] vlib_put_next_frame
  1.80%  libvlib.so.0.0.0  [.] vlib_get_next_frame_internal
  1.12%  vpp               [.] rte_delay_us_block