VPP/Missing Prefetches

From fd.io

Revision as of 16:50, 30 November 2016

Introduction

VPP graph nodes make extensive use of explicit prefetching to hide the latency of dependent reads. In the simplest dual-loop case, a node prefetches the buffer headers and (typically) one cache line of packet data for the packets it will process on the next iteration. The rest of this page shows what happens when that prefetch block is disabled.
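To make the dual-loop pattern concrete, here is a minimal sketch outside of VPP. It is not VPP code: `toy_buffer_t` and `toy_process` are hypothetical names invented for illustration, and the GCC/Clang builtin `__builtin_prefetch` is assumed as the prefetch primitive in place of `vlib_prefetch_buffer_header` and `CLIB_PREFETCH`. The `do_prefetch` flag plays the role of the `if (0)` switch used later on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer; NOT VPP's vlib_buffer_t. */
typedef struct
{
  uint32_t flags;    /* stands in for buffer header metadata */
  uint8_t data[64];  /* stands in for the first cache line of packet data */
} toy_buffer_t;

/* Sum the first data byte of every buffer. Packets are handled two at a
   time (dual loop); while handling bufs[i] and bufs[i+1], the buffers for
   the next iteration, bufs[i+2] and bufs[i+3], are prefetched. */
static uint64_t
toy_process (toy_buffer_t **bufs, size_t n, int do_prefetch)
{
  uint64_t sum = 0;
  size_t i = 0;

  assert (bufs != NULL || n == 0);

  /* Dual loop: requires bufs[i] .. bufs[i+3] to be valid. */
  while (n - i >= 4)
    {
      if (do_prefetch)
        {
          toy_buffer_t *p2 = bufs[i + 2];
          toy_buffer_t *p3 = bufs[i + 3];

          /* Header first, then the start of packet data (read access). */
          __builtin_prefetch (p2, 0);
          __builtin_prefetch (p3, 0);
          __builtin_prefetch (p2->data, 0);
          __builtin_prefetch (p3->data, 0);
        }

      /* The dependent reads that the prefetches are meant to cover. */
      sum += bufs[i]->data[0];
      sum += bufs[i + 1]->data[0];
      i += 2;
    }

  /* Single loop: handle the remaining packets without prefetching. */
  while (i < n)
    {
      sum += bufs[i]->data[0];
      i += 1;
    }

  return sum;
}
```

Calling `toy_process (bufs, n, 1)` and `toy_process (bufs, n, 0)` returns the same sum; prefetching changes only how long the dependent reads stall, not the result. That is why disabling the block in a real node shows up as a change in clocks per packet rather than as a functional difference.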

Baseline

Single core, 13 Mpps offered load, i40e NICs, ~13 Mpps in+out:

vpp# show run
             Name                 Clocks       Vectors/Call  
FortyGigabitEthernet84/0/1-out         9.08e0           50.09
FortyGigabitEthernet84/0/1-tx          3.84e1           50.09
dpdk-input                             7.45e1           50.09
interface-output                       1.08e1           50.09
ip4-input-no-checksum                  3.92e1           50.09
ip4-lookup                             3.88e1           50.09
ip4-rewrite-transit                    3.43e1           50.09

Baseline "perf top" function-level profile:

 14.21%  libvnet.so.0.0.0           [.] ip4_input_no_checksum_avx2
 14.14%  libvnet.so.0.0.0           [.] ip4_lookup_avx2
 14.10%  vpp                        [.] i40e_recv_scattered_pkts_vec
 12.64%  libvnet.so.0.0.0           [.] ip4_rewrite_transit_avx2
 10.60%  libvnet.so.0.0.0           [.] dpdk_input_avx2
  9.70%  vpp                        [.] i40e_xmit_pkts_vec
  4.88%  libvnet.so.0.0.0           [.] dpdk_interface_tx_avx2
  3.67%  libvlib.so.0.0.0           [.] dispatch_node
  3.25%  libvnet.so.0.0.0           [.] vnet_per_buffer_interface_output_avx2
  2.96%  libvnet.so.0.0.0           [.] vnet_interface_output_node_no_flatten
  1.85%  libvlib.so.0.0.0           [.] vlib_put_next_frame
  1.80%  libvlib.so.0.0.0           [.] vlib_get_next_frame_internal
  1.12%  vpp                        [.] rte_delay_us_block

Turn off prefetch block in ip4_input_inline(...)

/* Prefetch next iteration. */
if (0)
  {
    vlib_buffer_t * p2, * p3;

    p2 = vlib_get_buffer (vm, from[2]);
    p3 = vlib_get_buffer (vm, from[3]);

    vlib_prefetch_buffer_header (p2, LOAD);
    vlib_prefetch_buffer_header (p3, LOAD);

    CLIB_PREFETCH (p2->data, sizeof (ip0[0]), LOAD);
    CLIB_PREFETCH (p3->data, sizeof (ip1[0]), LOAD);
  }