VPP/What is VPP?
Last updated 8 years ago

How does the plugin come into play? At runtime, the VPP platform grabs all available packets from RX rings to form a vector of packets. A packet processing graph is applied, node by node (including plugins) to the entire packet vector. Graph nodes are small and modular. Graph nodes are loosely coupled. This makes it easy to introduce new graph nodes. It also makes it relatively easy to rewire existing graph nodes.

A plugin can introduce new graph nodes or rearrange the packet processing graph. You can also build a plugin independently of the VPP source tree - which means you can treat it as an independent component. A plugin can be installed by adding it to a plugin directory.

The VPP platform can be used to build any kind of packet processing application. It can be used as the basis for a Load Balancer, a Firewall, an IDS, or a Host Stack. You could also create a combination of applications. For example, you could add load balancing to a vSwitch.

The engine runs in pure userspace. This means that plugins do not require changing core code - you can extend the capabilities of the packet processing engine without the need to change code running at the kernel level. Through the creation of a plugin, anyone can extend functionality with:

New custom graph nodes
Rearrangement of graph nodes
New low level APIs

Feature Rich

The full suite of graph nodes allows a wide variety of network appliance workloads to be built. At a high level, the platform provides:

Fast lookup tables for routes, bridge entries
Arbitrary n-tuple classifiers
Out of the box production quality switch/router functionality

The following is a summary of the features the VPP platform provides:

IPv4/IPv6
14+ MPPS, single core
Multimillion entry FIBs
Input Checks
    Source RPF
    TTL expiration
    header checksum
    L2 length < IP length
    ARP resolution/snooping
    ARP proxy
Thousands of VRFs
    Controlled cross-VRF lookups
Multipath – ECMP and Unequal Cost
Multiple million Classifiers –
    Arbitrary N-tuple
VLAN Support – Single/Double tag

IPv4
GRE, MPLS-GRE, NSH-GRE,
VXLAN
IPSEC
DHCP client/proxy

IPv6
Neighbor discovery
Router Advertisement
DHCPv6 Proxy
L2TPv3
Segment Routing
MAP/LW46 – IPv4aas
iOAM

MPLS
MPLS-o-Ethernet –
Deep label stacks supported

L2
VLAN Support
Single/ Double tag
L2 forwarding with EFP/BridgeDomain concepts
VTR – push/pop/Translate (1:1,1:2, 2:1,2:2)
Mac Learning – default limit of 50k addresses
Bridging – Split-horizon group support/EFP Filtering
Proxy Arp
Arp termination
IRB – BVI Support with RouterMac assignment
Flooding
Input ACLs
Interface cross-connect

Why is it called vector processing?

As the name implies, VPP uses vector processing as opposed to scalar processing. Scalar packet processing refers to the processing of one packet at a time. That older, traditional approach entails processing an interrupt, and traversing the call stack (a calls b calls c... return return return from the nested calls... then return from Interrupt). That process then does one of 3 things: punt, drop, or rewrite/forward the packet.

The problem with that traditional scalar packet processing is:

thrashing occurs in the I-cache
each packet incurs an identical set of I-cache misses
no workaround to the above except to provide larger caches

By contrast, vector processing processes more than one packet at a time.

One of the benefits of the vector approach is that it fixes the I-cache thrashing problem. It also mitigates the dependent read latency problem (pre-fetching eliminates the latency).

This approach fixes the issues related to stack depth / D-cache misses on stack addresses. It improves "circuit time". The "circuit" is the cycle of grabbing all available packets from the device RX ring, forming a "frame" (vector) that consists of packet indices in RX order, running the packets through a directed graph of nodes, and returning to the RX ring. As processing of packets continues, the circuit time reaches a stable equilibrium based on the offered load.

As the vector size increases, processing cost per packet decreases because you are amortizing the I-cache misses over a larger N.

Example Use Case: VPP as a vSwitch/vRouter

One of the use cases for the VPP platform is to implement it as a virtual switch or router. The following section describes examples of possible implementations that can be created with the VPP platform. For more in depth descriptions about other possible use cases, see the list of Use Cases.

You can use the VPP platform to create out-of-the-box virtual switches (vSwitch) and virtual routers (vRouter). The VPP platform allows you to manage certain functions and configurations of these application through a command-line interface (CLI).

Some of the functionality that a switching application can create includes:

Bridge Domains
Ports (including tunnel ports)
Connect ports to bridge domains
Program ARP termination

Some of the functionality that a routing application can create includes:

Virtual Routing and Forwarding (VRF) tables (in the thousands)
Routes (in the millions)

Local Programmability

One approach is to implement a VPP application to communicate with an external application within a local environment (Linux host or container). The communication would occur through a low level API. This approach offers a complete, feature rich solution that is simple yet high performance. For example, it is reasonable to expect performance yields of 500k routes/second.

This approach takes advantage of using a shared memory/message queue. The implementation is on a local on a box or container. All CLI tasks can be done through API calls.

The current implementation of the VPP platform generates Low Level Bindings for C clients and for Java clients. It's possible for future support to be provided for bindings for other programming languages.

Remote Programmability

Another approach is to use a Data Plane Management Agent through a High Level API. As shown in the figure, a Data Plane Management Agent can speak through a low level API to the VPP App (engine). This can run locally in a box (or VM or container). The box (or container) would expose higher level APIs through some form of binding.

Figure: API Through Data Plane Management Agent

This is a particularly flexible approach because the VPP platform does not force a particular Data Plane Management Agent. Furthermore, the VPP platform does not restrict communication to only *one* high level API. Anybody can bring a Data Plane Management Agent. This allows you to match the high level API/Data Plane Management Agent and implementation to the specific needs of the VPP app.

Sample Data Plane Management Agent

One example of using a high hevel API is to implement the VPP platform as an app on a box that is running a local ODL instance (Honeycomb). You could use a low level API over generated Java Bindings to talk to the VPP App, and expose Yang Models over netconf/restconf NB.

VPP Using ODL Honeycomb as a Data Plane Management Agent

This would be one way to implement Bridge Domains.

Primary Characteristics Of VPP

Some of the primary characteristics include:

Improved fault-tolerance and ISSU when compared to running similar packet processing in the kernel:

crashes seldom require more than a process restart
software updates do not require system reboots
development environment is easier to use and perform debug than similar kernel code
user-space debug tools (gdb, valgrind, wireshark)
leverages widely-available kernel modules (uio, igb_uio): DMA-safe memory

Runs as a Linux user-space process:

same image works in a VM, in a Linux container, or over a host kernel
KVM and ESXi: NICs via PCI direct-map
Vhost-user, netmap, virtio paravirtualized NICs
Tun/tap drivers
DPDK poll-mode device drivers

Integrated with the DPDK, VPP supports existing NIC devices including:

Intel i40e, Intel ixgbe physical and virtual functions, Intel e1000, virtio, vhost-user, Linux TAP
HP rebranded Intel Niantic MAC/PHY
Cisco VIC

Security issues considered:

Extensive white-box testing by Cisco's security team
Image segment base address randomization
Shared-memory segment base address randomization
Stack bounds checking
Debug CLI "chroot"

The vector method of packet processing has been proven as the primary punt/inject path on major architectures.

Supported Architectures

The VPP platform supports:

x86/64

Supported Packaging Models

The VPP platform supports package installation on the following operating systems:

Debian
Ubuntu 16.04
Centos 7.3

Performance Expectations

One of the benefits of this implementation of VPP is its high performance on relatively low-power computing. This high level of performance is based on the following highlights:

High-performance user-space network stack for commodity hardware
The same code for host, VMs, Linux containers
Integrated vhost-user virtio backend for high speed VM-to-VM connectivity
L2 and L3 functionality, multiple encapsulations
Leverages best-of-breed open source driver technology: DPDK
Extensible by use of plugins
Control-plane / orchestration-plane via standards-based APIs

Performance Metrics

The VPP platform has been shown to provide the following approximate performance metrics:

Multiple MPPS from a single x86_64 core
>100Gbps full-duplex on a single physical host
Example of multi-core scaling benchmarks (on UCS-C240 M3, 3.5 gHz, all memory channels forwarded, simple ipv4 forwarding):
- 1 core: 9 MPPS in+out
- 2 cores: 13.4 MPPS in+out
- 4 cores: 20.0 MPPS in+out

NDR rates for 2p10GE, 1 core, L2 NIC-to_NIC

The following chart shows the NDR rates on: 2p10GE, 1 core, L2 NIC-to_NIC.

NDR rate for 2p10GE, 1 core, L2 NIC-to_NIC

NDR rates for 2p10GE, 1 core, L2 NIC-to-VM/VM-to-VM

The following chart shows the NDR rates on: 2p10GE, 1 core, L2 NIC-to-VM/VM-to-VM .

NOTE:

Virtual network infra benchmark of efficiency

All tests per connection only, single core

Potential higher performance with more connections, more cores

Latest SW: OVSDPDK 2.4.0, VPP 09/2015

NDR rates VPP versus OVSDPDK

The following chart show VPP performance compared to open-source and commercial reports.

The rates reflect VPP and OVSDPDK performance tested on Haswell x86 platform with E5-2698v3 2x16C 2.3GHz. The graphs shows NDR rates for 12 port 10GE, 16 core, IPv4.

VPP/What is VPP?
Last updated 8 years ago

Contents

Introduction

Modular, Flexible, and Extensible

Feature Rich

Why is it called vector processing?

Example Use Case: VPP as a vSwitch/vRouter

Local Programmability

Remote Programmability

Sample Data Plane Management Agent

Primary Characteristics Of VPP

Supported Architectures

Supported Packaging Models

Performance Expectations

Performance Metrics

NDR rates for 2p10GE, 1 core, L2 NIC-to_NIC

NDR rates for 2p10GE, 1 core, L2 NIC-to-VM/VM-to-VM

NDR rates VPP versus OVSDPDK

Navigation menu

Personal tools

Namespaces

Variants

Views

Actions

Search

Navigation

Tools

VPP/What is VPP?Last updated 8 years ago

Contents

Introduction

Modular, Flexible, and Extensible

Feature Rich

Why is it called vector processing?

Example Use Case: VPP as a vSwitch/vRouter

Local Programmability

Remote Programmability

Sample Data Plane Management Agent

Primary Characteristics Of VPP

Supported Architectures

Supported Packaging Models

Performance Expectations

Performance Metrics

NDR rates for 2p10GE, 1 core, L2 NIC-to_NIC

NDR rates for 2p10GE, 1 core, L2 NIC-to-VM/VM-to-VM

NDR rates VPP versus OVSDPDK

Navigation menu

Search

VPP/What is VPP?
Last updated 8 years ago