VPP/Using VPP as a VXLAN Tunnel Terminator

Revision as of 19:33, 11 January 2016

This page describes the support in the VPP platform for a Virtual eXtensible LAN (VXLAN).


Introduction

A VXLAN provides the features needed to allow L2 bridge domains (BDs) to span multiple servers. This is done by building an L2 overlay on top of an L3 network underlay using VXLAN tunnels.

This makes it possible for servers to be co-located in the same data center or be separated geographically as long as they are reachable through the underlay L3 network.

This kind of L2 overlay bridge domain can be referred to as a VXLAN (Virtual eXtensible LAN) segment.


Features

This implementation of support for VXLAN in the VPP engine includes the following features:

  • Makes use of the existing VPP L2 bridging and cross-connect functionality.
  • Allows creation of VXLAN segments as per RFC 7348 to extend an L2 network over an L3 underlay.
  • Provides unicast mode where packet replication is done at the head end toward remote VTEPs.
  • Supports Split Horizon Group (SHG) numbering in packet replication.
  • Supports interoperation with a Bridge Virtual Interface (BVI) to allow inter-VXLAN or VLAN packet forwarding via routing.
  • Supports VXLAN to VLAN gateway.
  • Supports ARP request termination.

VXLAN as defined in RFC 7348 allows VPP bridge domains on multiple servers to be interconnected via VXLAN tunnels to behave as a single bridge domain which can also be called a VXLAN segment. Thus, VMs on multiple servers connected to this bridge domain (or VXLAN segment) can communicate with each other via layer 2 networking.

The functions supported by VPP for VXLAN are described in the following sub-sections.


VXLAN Tunnel Encap and Decap

VXLAN Headers

The VXLAN tunnel encap includes IP, UDP and VXLAN headers as follows:

   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |Protocl=17(UDP)|   Header Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Outer Source IPv4 Address               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Destination IPv4 Address              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Source Port         |       Dest Port = VXLAN Port  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   VXLAN Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|R|R|R|I|R|R|R|            Reserved                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                VXLAN Network Identifier (VNI) |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Inner Destination MAC Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address | Inner Source MAC Address      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Inner Source MAC Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |OptnlEthtype = C-Tag 802.1Q    | Inner.VLAN Tag Information    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                  Original Ethernet Payload    |
   |                                                               |
   |(Note that the original Ethernet Frame's FCS is not included)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


The headers are used as follows:

  • In the IP header, DIP is the destination server VTEP and SIP is the local server VTEP. After VXLAN header encap, the DIP will be used by the encap server to forward the packet to the destination server via L3 routing, assuming routes to all destination VTEP IPs are already set up correctly.
  • At the destination, the DIP is local and the UDP destination port number lets the server identify the packet as a VXLAN packet and start VXLAN decap processing.
  • VXLAN decap processing uses the VNI from the VXLAN header to identify how to perform L2 forwarding of the inner Ethernet frame. The VNI is typically associated with a BD in the server where the inner frame should be forwarded.


UDP Port Numbering for VXLAN Header

The UDP destination port number in the VXLAN encap must be 4789, as assigned by IANA for VXLAN. The UDP source port value in the VXLAN encap should be set according to a flow hash of the payload Ethernet frame to help with ECMP load balancing as the packet is forwarded in the underlay network. The hash is a normal 5-tuple hash for IP4/IP6 payloads and a 3-tuple SMAC/DMAC/Ethertype hash for other packet types.


VTEPs and VXLAN Tunnel Creation

Create VXLAN Tunnel with VTEPs

VTEPs (VXLAN Tunnel End Points) are specified via VXLAN tunnel creation – the source and destination IP addresses of each VXLAN tunnel are the local server VTEP address and the destination server VTEP address. The VNI value used for the VXLAN tunnel is also specified on VXLAN tunnel creation. Once a VXLAN tunnel is created, it behaves like a VPP interface but is not yet associated with any BD.
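
For example, the debug CLI used in the configuration examples later on this page creates a tunnel whose local VTEP is 10.0.3.1, whose remote VTEP is 10.0.3.3 and whose VNI is 13 (encap-vrf-id selects the IP table used to route the encapsulated packets):

 create vxlan tunnel src 10.0.3.1 dst 10.0.3.3 vni 13 encap-vrf-id 7 decap-next l2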

Associate VXLAN Tunnel with BD

Once a VXLAN tunnel interface is created, it can be added to a bridge domain (BD) as a bridge port by specifying its BDID, just as a local Ethernet interface can be added to a BD. As a VXLAN tunnel is added to a BD, the VNI used for creating the VXLAN tunnel is mapped to the BDID. It is good practice to allocate the same value for both VNI and BDID for all VXLAN tunnels on the same BD or VXLAN segment across all servers to prevent confusion.
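
For example, the debug CLI used later on this page adds the tunnel created above to BD 13 (the trailing 1 is the split horizon group discussed below):

 set interface l2 bridge vxlan_tunnel0 13 1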

Connecting VXLAN Tunnels among Multiple Servers

To set up a VXLAN segment or BD over multiple servers, it is recommended that a VPP BD with the same BDID be created on each server and that a full mesh of VXLAN tunnels among all servers be created to link up this BD on each server. In other words, on each server with this BD, a VXLAN tunnel with its VNI set to the same value as the BDID should preferably be created for each of the other servers and be added to the BD. Making all BDIDs and VNIs the same value makes VXLAN segment connectivity much more apparent and less confusing.
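
As an illustrative sketch, on the server with VTEP 10.0.3.1 sharing BD 13 with two remote servers (the remote VTEP 10.0.3.3 used elsewhere on this page plus a hypothetical second remote VTEP 10.0.3.4, with the second tunnel assumed to be named vxlan_tunnel1), the debug CLI sequence would be:

 create vxlan tunnel src 10.0.3.1 dst 10.0.3.3 vni 13 encap-vrf-id 7 decap-next l2
 create vxlan tunnel src 10.0.3.1 dst 10.0.3.4 vni 13 encap-vrf-id 7 decap-next l2
 set interface l2 bridge vxlan_tunnel0 13 1
 set interface l2 bridge vxlan_tunnel1 13 1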


VXLAN Flooding

VXLAN Unicast Mode with Head-End Replication

As a VXLAN tunnel is just a bridge port, the normal packet replication to all bridge ports in a BD happens naturally. This behavior matches the VXLAN unicast mode of operation where head-end packet replication is used to flood packets to remote VTEPs. The VXLAN multicast mode, utilizing IP multicast for flooding to remote VTEPs, is NOT supported.

Split Horizon Group

As VXLAN tunnels are added to a BD, they must be configured with the same, non-zero Split Horizon Group (SHG) number. Otherwise, flooded packets may loop among servers in the same VXLAN segment because the VXLAN tunnels are fully meshed among servers.
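
The SHG is specified when the tunnel is added to the BD; in the VAT command used later on this page it is the shg argument:

 sw_interface_set_l2_bridge vxlan_tunnel0 bd_id 13 shg 1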


VXLAN over IPv6

VXLAN tunnel over an IPv6 underlay network is not supported in this implementation. Currently, only IPv4 underlay is supported.

The payload of a VXLAN tunnel consists of Ethernet frames, which may carry IPv4, IPv6 or other protocols.


VXLAN Tunnel Input and Output with Stats

The VXLAN tunneling implementation supports full interface TX/RX stats with packet and byte counters as follows:

  • Packets input from a VXLAN tunnel interface have their IP/UDP/VXLAN headers removed (decap) before being L2 forwarded, so the packet sizes shown in the RX stats are post-decap and exclude the IP/UDP/VXLAN headers.
  • Packets output to a VXLAN tunnel interface have IP/UDP/VXLAN headers added (encap) before being L3 forwarded, so the packet sizes shown in the TX stats are post-encap and include the IP/UDP/VXLAN headers.

Most features that can be applied to an L2 interface, such as VLAN tag manipulation or input ACLs, can also be applied to a VXLAN tunnel interface.

VXLAN to VLAN Gateway Support

The VXLAN to VLAN gateway function can be performed in VPP via the bridge port VLAN tag manipulation function. If an Ethernet port is connected to a VLAN and this port is in promiscuous mode, one can create a sub-interface on this port for the VLAN and add this sub-interface to the VXLAN segment or BD with the proper VLAN tag pop/push/rewrite operation to perform the VXLAN to VLAN gateway function.
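
A minimal sketch of this bridged variant, assuming the standard VPP sub-interface and L2 tag-rewrite debug CLI (the VLAN ID 100, the resulting sub-interface name and the exact command syntax are illustrative and may vary by release):

 create sub-interfaces GigabitEthernet2/2/0 100
 set interface state GigabitEthernet2/2/0.100 up
 set interface l2 bridge GigabitEthernet2/2/0.100 13 0
 set interface l2 tag-rewrite GigabitEthernet2/2/0.100 pop 1

The pop 1 rewrite removes the single VLAN tag on input to the BD and restores it on output, so frames pass between the VLAN and the VXLAN segment untagged inside the BD.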

For the case where no ports are on the VXLAN BD other than the VXLAN tunnel and the VLAN sub-interface, one can simply cross connect the VXLAN tunnel to the VLAN sub-interface with the proper VLAN tag manipulation to form an extremely efficient VXLAN to VLAN gateway.

BVI Support

A bridge domain (BD) can have an L3 Bridge Virtual Interface (BVI) (one per BD) along with multiple VXLAN tunnel and Ethernet bridge ports. All three types of bridge ports (BVI, VXLAN and Ethernet) must interoperate properly to forward traffic to each other. BVIs allow VMs on separate BDs or VXLAN segments to reach each other via IRB (Integrated Routing and Bridging).

The BVI for a BD is set up by creating a loopback interface and then adding it to the BD as its BVI interface. A BD in a VPP can only have one BVI. A VXLAN segment, however, consists of multiple BDs from multiple VPPs and may therefore have multiple BVIs, one on the BD of each VPP.
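
The full sequence appears in the BVI configuration example later on this page; the core debug CLI steps are creating the loopback with a chosen MAC address and adding it to the BD as the BVI:

 loopback create mac 1a:2b:3c:4d:5e:6f
 set interface l2 bridge loop0 13 bvi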


ARP Request Termination

With VXLAN BD MACs provisioned statically and MAC learning disabled, VPP can perform L2 unicast forwarding very efficiently since unknown unicast flooding is not necessary. With IP4 unicast traffic, however, there is still one source of broadcast traffic due to ARP requests from tenant VMs to find MACs for IP addresses.

In order to minimize flooding of ARP requests to the whole VXLAN segment, VPP allows the control plane to provision any VPP BD with IP addresses and their associated MACs. Thereafter, VPP can utilize the IP and MAC information it has for the BD to terminate ARP requests and generate appropriate ARP responses to the requesting tenant VMs. If no suitable IP/MAC info is found, the ARP request cannot be terminated and so will still be flooded to the VXLAN segment.
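
For example, using the debug CLI shown in the ARP termination configuration examples later on this page, ARP termination is enabled on BD 11 and an IP to MAC binding is provisioned as follows:

 set bridge-domain arp term 11
 set bridge-domain arp entry 11 7.0.0.11 11:12:13:14:15:16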


Cross Connect instead of Bridging

As an optimization, the VXLAN tunnel interface can also be cross connected to an L2 interface if a BD would contain only two bridge ports, the VXLAN tunnel and an Ethernet port. The cross connect optimization can improve packet forwarding performance as the bridging overhead of MAC learning and lookup is not necessary.

Instead of using cross connect, it is still possible to optimize BD forwarding with 2 bridge ports if both MAC learning and MAC lookup are disabled. The ability to disable learning is supported in VPP while MAC lookup cannot be disabled at present.

Another usage can be to cross connect two VXLAN tunnels to perform VNI stitching.
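
A minimal sketch of the cross connect described above between the Ethernet port and the VXLAN tunnel, assuming the standard VPP L2 cross connect debug CLI (set interface l2 xconnect), whose exact syntax may vary by release; the cross connect is configured in each receive direction:

 set interface l2 xconnect GigabitEthernet2/2/0 vxlan_tunnel0
 set interface l2 xconnect vxlan_tunnel0 GigabitEthernet2/2/0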


Architecture

The following sections explain the design approach behind the various aspects of the VPP VXLAN tunneling implementation.


VXLAN Tunnels

In order to support VXLAN on VPP, the design is to provide the ability to create VXLAN tunnel interfaces which can be added to a bridge domain as bridge ports to participate in L2 forwarding. Thus, the current VPP L2 bridging functionality naturally interacts with VXLAN tunnels as normal bridge ports, performing learning, flooding and unicast forwarding transparently.

It is the VXLAN tunnel code that handles the encap and decap of forwarded packets as follows:

  • OUTPUT – as a packet is L2 forwarded in a BD to a VXLAN tunnel, the L2 output for that packet will naturally send it to the VXLAN output node vxlan-encap. This node can then look up the tunnel control block of the VXLAN tunnel and find the encap string and VRF for IP forwarding. The node can then put the encap string on the packet, set up the VRF in the packet context and then send the packet to the ip4-lookup node to perform forwarding.
  • INPUT – On receiving a VXLAN encap packet with DIP being a local IP (in fact, VPP’s VTEP IP) address, the packet will get to the ip4-local node for processing. The ip4-local node will then send the packet to the vxlan-input node because the UDP destination port number is for VXLAN (4789). The vxlan-input node will use the SIP and VNI of the outer header to look up the VXLAN tunnel control block to obtain its sw_if_index, remove the VXLAN header from the packet, set up the sw_if_index of the VXLAN tunnel as the input sw_if_index, and finally pass the packet to the l2-input node for L2 forwarding.

Note that the vxlan-input node does not map the VNI to a BDID but rather to the sw_if_index of the VXLAN tunnel, which it sets as the input interface of the decapsulated Ethernet payload. Thereafter, the L2 forwarding of the Ethernet packet just proceeds according to whether it is in a BD or cross connected to another L2 interface. This flexibility even allows two VXLAN tunnels to be cross connected to each other for VNI stitching, which may be useful in the future.

The following two diagrams show the relationship of the newly added VXLAN encap node and input(decap) node with the other VPP nodes while forwarding packets, assuming the VXLAN tunnel is configured as an interface on a BD.

Diagram of VPP graph node path to VXLAN encap

VPP graph node path for VXLAN decap


ARP Request Termination

The ARP termination feature is added to the L2 feature list with a bit allocated from the feature bitmap corresponding to a new graph node, l2-arp-term. The arp-term-l2bd node is added to the ARP handling module of VPP to process ARP request packets in the L2 forwarding path. The feature bit is chosen so that ARP termination is performed just before the l2-flood node.

A new API/CLI that allows the user to provide IP and MAC address bindings for a specified BD is also added to the L2 BD handling module of VPP. Each MAC address will be stored in a hash table mac_by_ip4 for the BD using IP address as key.

The classify-and-dispatch path of the l2-input node of VPP is enhanced to decide whether the ARP termination bit should be enabled for the incoming packet or not. This check is done in the broadcast MAC classification path only, so normal unicast and multicast packet performance is not affected. If a packet is a broadcast ARP packet and ARP termination is enabled for its BD, the ARP termination bit in the feature bitmap will be set for this packet. Thereafter, the packet will go through L2 feature processing in its normal order and hit the ARP termination node. For other packets, or if ARP termination is not enabled in the BD, the ARP termination bit of the packet will be cleared so no extra processing overhead is added.

The new graph node arp-term-l2bd will look up mac_by_ip4 using the ARP request's target IP as the key. If a MAC entry is found, an ARP reply will be sent back on the input interface. If no entry is found, the ARP request packet will be flooded following the normal L2 broadcast forwarding path.


Loopback Interface MAC

The loopback interface creation CLI/API is enhanced to let the user specify a MAC address on creation. A new API is also added to allow deletion of a loopback interface. Thus, a loopback interface can be configured as the BVI for a BD with the desired MAC address. If the VPP-assigned MAC address is used for the loopback/BVI, there is a potential for MAC address conflicts when VXLAN tunnels are used to connect multiple BDs, each with its own BVI, among servers.


Software Modules

The VXLAN tunnel module follows the most up-to-date approach of providing tunnels on VPP, with a C source file for each of the encap, decap and creation/deletion functions in the VPP workspace directory.

.../vpp/vnet/vnet/vxlan

  • encap.c – implements the vxlan-encap node to process packets output to the VXLAN tunnel.
  • decap.c – implements the vxlan-input node to process packets received on the VXLAN tunnel.
  • vxlan.c – implements the VXLAN tunnel create/delete functions to create or delete VXLAN tunnel interfaces and their associated data structures. The debug CLI to create and delete VXLAN tunnels is also provided by this file.


Other header and miscellaneous files present in this directory are as follows:

  • vxlan.h – VXLAN tunnel related data structure, function prototypes and inlined body.
  • vxlan_packet.h – VXLAN tunnel header related definitions.
  • vxlan_error.def – text for VXLAN tunnel related global counters.

The files modified for ARP termination and MAC/IP binding notification are as follows:

  • .../vpp/vnet/vnet/l2/l2_input.c/h
  • .../vpp/vnet/vnet/l2/l2_bd.c/h
  • .../vpp/vnet/vnet/ethernet/arp.c/h


The files modified to allow MAC address to be specified for loopback interface are as follows:

  • .../vpp/vnet/vnet/ethernet/ethernet.c/h
  • .../vpp/vnet/vnet/ethernet/interface.c


API

A VPP API message is defined in the file .../vpp/vpp/api/vpp.api. During the VPP build process, the vppapigen tool will be built and used to generate the proper C typedef structs as shown in the following sections.

Bridge Domain Creation

The following is the VPP API message definition for creating and deleting bridge domains:

typedef VL_API_PACKED(struct _vl_api_bridge_domain_add_del {
    u16 _vl_msg_id;
    u32 client_index;
    u32 context;
    u32 bd_id;
    u8 flood;
    u8 uu_flood;
    u8 forward;
    u8 learn;
    u8 arp_term;
    u8 is_add;
}) vl_api_bridge_domain_add_del_t;

typedef VL_API_PACKED(struct _vl_api_bridge_domain_add_del_reply {
    u16 _vl_msg_id;
    u32 context;
    i32 retval;
}) vl_api_bridge_domain_add_del_reply_t;


In the bridge domain create/delete message, the fields flood, uu_flood, forward, learn and arp_term specify whether each of the relevant features is enabled, with 0 indicating disabled and non-zero enabled. The field is_add is set to 1 to add or modify a new or existing bridge domain and set to 0 to delete the bridge domain specified by the bd_id value.
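
The corresponding VAT command from the configuration examples later on this page exercises this message, creating BD 13 with ARP termination disabled:

 bridge_domain_add_del bd_id 13 learn 1 forward 1 uu-flood 1 flood 1 arp-term 0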


VXLAN Tunnel Creation

The following is the VPP API message definition for creating and deleting VXLAN tunnels:

typedef VL_API_PACKED(struct _vl_api_vxlan_add_del_tunnel {
    u16 _vl_msg_id;
    u32 client_index;
    u32 context;
    u8 is_add;
    u32 src_address;
    u32 dst_address;
    u32 encap_vrf_id;
    u32 decap_next_index;
    u32 vni;
}) vl_api_vxlan_add_del_tunnel_t;

typedef VL_API_PACKED(struct _vl_api_vxlan_add_del_tunnel_reply {
    u16 _vl_msg_id;
    u32 context;
    i32 retval;
    u32 sw_if_index;
}) vl_api_vxlan_add_del_tunnel_reply_t;


In the tunnel create/delete message, the field decap_next_index should be set to the next index for the l2-input node. It will default to the l2-input node if the value specified for decap_next_index in the message is ~0 (0xffffffff). The field decap_next_index can also be set to the ip4-input or ip6-input node if the payload is an IP4/IP6 packet without an L2 header, but this is not the normal usage of a VXLAN tunnel.

In the reply message, the sw_if_index of the created VXLAN tunnel is returned, which can then be used to add the tunnel to a BD or cross connect it to another L2 interface.
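
The corresponding VAT command from the configuration examples later on this page exercises this message:

 vxlan_add_del_tunnel src 10.0.3.1 dst 10.0.3.3 vni 13 encap-vrf-id 7 decap-next l2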


ARP Termination

The following set bridge flags API message is enhanced to allow ARP termination to be enabled or disabled on a BD, with a new bit allocated for it:

typedef VL_API_PACKED(struct _vl_api_bridge_flags {
    u16 _vl_msg_id;
    u32 client_index;
    u32 context;
    u32 bd_id;
    u8  is_set;
    u32 feature_bitmap;
}) vl_api_bridge_flags_t;

typedef VL_API_PACKED(struct _vl_api_bridge_flags_reply {
    u16 _vl_msg_id;
    u32 context;
    u32 retval;
    u32 resulting_feature_bitmap;
}) vl_api_bridge_flags_reply_t;


The bits allocated for BD features in feature_bitmap are in .../vpp/vnet/vnet/l2/l2_bd.h:

 #define L2_LEARN    (1<<0)
 #define L2_FWD      (1<<1)
 #define L2_FLOOD    (1<<2)
 #define L2_UU_FLOOD (1<<3)
 #define L2_ARP_TERM (1<<4)

The following is the API message for adding and deleting IP and MAC entries into BDs to support ARP request termination:

typedef VL_API_PACKED(struct _vl_api_bd_ip_mac_add_del {
    u16 _vl_msg_id;
    u32 client_index;
    u32 context;
    u32 bd_id;
    u8 is_add;
    u8 is_ipv6;
    u8 ip_address[16];
    u8 mac_address[6];
}) vl_api_bd_ip_mac_add_del_t;

typedef VL_API_PACKED(struct _vl_api_bd_ip_mac_add_del_reply {
    u16 _vl_msg_id;
    u32 context;
    i32 retval;
}) vl_api_bd_ip_mac_add_del_reply_t;


The API is designed to be generic to support IPv6 as well. For now, the API call will fail if is_ipv6 is set as IPv6 neighbor discovery termination is not yet supported.
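
The corresponding VAT command from the configuration examples later on this page adds an IPv4 to MAC entry to BD 11:

 bd_ip_mac_add_del bd_id 11 7.0.0.11 11:12:13:14:15:16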


Loopback Interface Creation

The following sample shows the API message for creating and deleting loopback interfaces:

typedef VL_API_PACKED(struct _vl_api_create_loopback {
    u16 _vl_msg_id;
    u32 client_index;
    u32 context;
    u8  mac_address[6];
}) vl_api_create_loopback_t;

typedef VL_API_PACKED(struct _vl_api_create_loopback_reply {
    u16 _vl_msg_id;
    u32 context;
    u32 sw_if_index;
    i32 retval;
}) vl_api_create_loopback_reply_t;

typedef VL_API_PACKED(struct _vl_api_delete_loopback {
    u16 _vl_msg_id;
    u32 client_index;
    u32 context;
    u32 sw_if_index;
}) vl_api_delete_loopback_t;

typedef VL_API_PACKED(struct _vl_api_delete_loopback_reply {
    u16 _vl_msg_id;
    u32 context;
    i32 retval;
}) vl_api_delete_loopback_reply_t;


If the value 0 is used for mac_address, then the default MAC address of dead:0000:000n will be used where n is the loopback instance number.

The sw_if_index field is used to specify the loopback interface to delete.

Note that BD membership or IP addresses of the loopback interface must be cleared before deleting a loopback interface. Otherwise, VPP may become unstable or even crash. Similarly, the sw_if_index used to delete a loopback interface must be the correct one or the resulting VPP behavior is unpredictable.
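
The corresponding VAT commands, used in the BVI configuration examples later on this page, create a loopback with an explicit MAC address and delete it by name (VAT maps the interface name to its sw_if_index):

 create_loopback mac 1a:2b:3c:4d:5e:6f
 delete_loopback loop0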


End User Interface and User Experience

The VPP platform is not expected to be used by end users directly. It is expected to be provisioned by control plane or orchestration software using its binary API.

For development or testing purposes, however, the VPP platform does provide command-line debug features that allow control of the VPP engine and gather debug information. In addition, a VPE API Test (VAT) tool with its own set of CLIs can also be used to call VPP APIs to provision the VPP engine while also testing its binary APIs.

This means that a developer who is modifying the control plane can use VAT to try out VPP API calling sequences to make sure that a configuration is provisioned properly.

Configuration Sequence

Be aware that the VPP platform expects the control plane to make API calls in the correct sequence.

The VPP platform does not perform many error checks for out-of-sequence calls. This means that making an API call in an incorrect sequence can put the VPP engine into an unpredictable state or may even cause it to crash. It is preferable to use the VPE API Test tool instead of VPP’s debug CLI for testing, especially automated tests. Because the VPE API Test tool calls the VPP API, all of the VPP API’s associated capabilities, including API trace capture, replay, and custom dump, can also be used to recreate any situation that led to VPP issues or a crash. The following subsections describe CLI/VAT commands that can be used to control the VPP functions or features, along with some relevant output helpful for diagnosing issues.


Bridge Domain Creation

The following example command shows the configuration sequence to create a bridge domain with BD ID of 13 with learning, forwarding, unknown-unicast flood, flooding enabled and ARP-termination disabled:


VAT commands

 bridge_domain_add_del bd_id 13 learn 1 forward 1 uu-flood 1 flood 1 arp-term 0

VPP commands

N/A – when an interface or a VXLAN tunnel is added to a bridge domain which does not already exist, the bridge domain will be created with learning, forwarding, unknown-unicast flooding and flooding enabled and ARP termination disabled by default.


VXLAN Tunnel Creation and Setup

Following is the configuration sequence to create a VXLAN tunnel and put it into a bridge domain with BD ID of 13:

VAT commands

vxlan_add_del_tunnel src 10.0.3.1 dst 10.0.3.3 vni 13 encap-vrf-id 7 decap-next l2
sw_interface_dump
sw_interface_set_l2_bridge vxlan_tunnel0 bd_id 13 shg 1


VPP commands

create vxlan tunnel src 10.0.3.1 dst 10.0.3.3 vni 13 encap-vrf-id 7 decap-next l2
set interface l2 bridge vxlan_tunnel0 13 1


VXLAN Tunnel Tear-Down and Deletion

Following is the configuration sequence to delete a VXLAN tunnel, which must first be removed from any BD to which it is attached:

VAT commands

sw_interface_set_l2_bridge vxlan_tunnel0 bd_id 13 disable
vxlan_add_del_tunnel src 10.0.3.1 dst 10.0.3.3 vni 13 del

VPP commands

set interface l3 vxlan_tunnel0 
create vxlan tunnel src 10.0.3.1 dst 10.0.3.3 vni 13 del

BVI Interface Creation and Setup

Following is the configuration sequence to create a loopback interface, put it into BD 13 as a BVI interface, put it into VRF 5 and assign it the IP address/subnet 6.0.0.250/16:

VAT commands

create_loopback mac 1a:2b:3c:4d:5e:6f
sw_interface_dump
sw_interface_set_flags admin-up loop0
sw_interface_set_l2_bridge loop0 bd_id 13 shg 0 bvi
sw_interface_set_table loop0 vrf 5
sw_interface_add_del_address loop0 6.0.0.250/16

VPP commands

loopback create mac 1a:2b:3c:4d:5e:6f
set interface l2 bridge loop0 13 bvi
set interface state loop0 up
set interface ip table loop0 5
set interface ip address loop0 6.0.0.250/16


BVI Interface Tear-Down and Deletion

Following is the configuration sequence to delete a loopback interface which is the BVI of a BD. Before the deletion, the loopback interface must first be removed from the BD and have its IP address/subnet removed:

VAT commands

sw_interface_add_del_address loop0 6.0.0.250/16 del
sw_interface_set_l2_bridge loop0 bd_id 13 bvi disable
delete_loopback loop0

VPP commands

set interface ip address loop0 del all
set interface l3 loop0
loopback delete loop0


Example Config of BD with BVI/VXLAN-Tunnel/Ethernet-Port

Following is the configuration sequence to create and add a loopback/BVI, a VXLAN tunnel, and an Ethernet interface into a BD:

VAT commands

create_loopback mac 1a:2b:3c:4d:5e:6f
vxlan_add_del_tunnel src 10.0.3.1 dst 10.0.3.3 vni 13 encap-vrf-id 7 decap-next l2
sw_interface_dump
sw_interface_set_flags admin-up loop0
sw_interface_set_flags admin-up GigabitEthernet2/2/0
sw_interface_set_l2_bridge GigabitEthernet2/2/0 bd_id 13 shg 0
sw_interface_set_l2_bridge vxlan_tunnel0 bd_id 13 shg 1
sw_interface_set_l2_bridge loop0 bd_id 13 shg 0 bvi
sw_interface_set_table loop0 vrf 5
sw_interface_add_del_address loop0 6.0.0.250/16

VPP commands

loopback create mac 1a:2b:3c:4d:5e:6f
create vxlan tunnel src 10.0.3.1 dst 10.0.3.3 vni 13 encap-vrf-id 7 decap-next l2
set interface state loop0 up
set interface state GigabitEthernet2/2/0 up
set interface l2 bridge GigabitEthernet2/2/0 13 0
set interface l2 bridge vxlan_tunnel0 13 1
set interface l2 bridge loop0 13 0 bvi
set interface ip table loop0 5
set interface ip address loop0 6.0.0.250/16

Enable/Disable ARP termination of a BD

Following is a configuration example to enable or disable ARP termination of a BD:

VAT commands

 bridge_flags bd_id 11 arp-term
 bridge_flags bd_id 11 arp-term clear

VPP commands

 set bridge-domain arp term 11
 set bridge-domain arp term 11 disable


Add/Delete IP to MAC Entry to a BD for ARP Termination

Following is a configuration example to add or delete an IP to MAC entry to/from a BD:

VAT commands

 bd_ip_mac_add_del bd_id 11 7.0.0.11 11:12:13:14:15:16
 bd_ip_mac_add_del bd_id 11 7.0.0.11 11:12:13:14:15:16 del 

VPP commands

 set bridge-domain arp entry 11 7.0.0.11 11:12:13:14:15:16
 set bridge-domain arp entry 11 7.0.0.11 11:12:13:14:15:16 del


Request MAC to IP Bindings from BDs with ARP Termination

Following is a configuration example to request MAC to IP binding notifications from BDs with ARP termination:

VAT commands

 want_ip4_arp_events address 0.0.0.0 

VPP commands

N/A – only applicable to API


Show Command Output for VXLAN related Information

The following samples show typical output for various VPP show commands. These commands show information relevant to BDs, VXLAN tunnels and stats.


Bridge Domain and Port Info

DBGvpd# show bridge  
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term     BVI-Intf   
  13     0        on         on         on         on         off         loop0    
  11     1        on         on         on         on         on          loop1    

DBGvpd# show bridge 11 detail  
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term     BVI-Intf   
  11     1        on         on         on         on         off         loop1    

           Interface           Index  SHG  BVI        VLAN-Tag-Rewrite       
             loop1               11    0    *               none             
     GigabitEthernet2/1/0        5     0    -               none             

DBGvpd# show bridge 13 detail  
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term     BVI-Intf   
  13     0        on         on         on         on         on          loop0    

           Interface           Index  SHG  BVI        VLAN-Tag-Rewrite       
             loop0               10    0    *               none             
     GigabitEthernet2/2/0        6     0    -               none             
         vxlan_tunnel0           9     1    -               none             

  IP4 to MAC table for ARP Termination
      7.0.0.11       =>   11:12:13:14:15:16 
      7.0.0.10       =>   10:11:12:13:14:15 

DBGvpd# show bridge 13 int  
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term     BVI-Intf   
  13     0        on         on         on         on         on          loop0    

           Interface           Index  SHG  BVI        VLAN-Tag-Rewrite       
             loop0               10    0    *               none             
     GigabitEthernet2/2/0        6     0    -               none             
         vxlan_tunnel0           9     1    -               none             

DBGvpd# show bridge 13 arp  
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term     BVI-Intf   
  13     0        on         on         on         on         on          loop0    

  IP4 to MAC table for ARP Termination
      7.0.0.11       =>   11:12:13:14:15:16 
      7.0.0.10       =>   10:11:12:13:14:15 

VXLAN Tunnel Info

DBGvpd# show vxlan tunnel  
[0] 10.0.3.1 (src) 10.0.3.3 (dst) vni 13 encap_fib_index 1 decap_next l2

Interface Address and Modes

DBGvpd# show interface address  
GigabitEthernet2/1/0 (up):
  l2 bridge bd_id 11 shg 0
GigabitEthernet2/2/0 (up):
  l2 bridge bd_id 13 shg 0
GigabitEthernet2/3/0 (up):
  10.0.3.1/24 table 7
GigabitEthernet2/4/0 (dn):
local0 (dn):
loop0 (up):
  l2 bridge bd_id 13 bvi shg 0
  6.0.0.250/16 table 5
loop1 (up):
  l2 bridge bd_id 11 bvi shg 0
  7.0.0.250/24 table 5
pg/stream-0 (dn):
pg/stream-1 (dn):
pg/stream-2 (dn):
pg/stream-3 (dn):
vxlan_tunnel0 (up):
  l2 bridge bd_id 13 shg 1

Interface Stats

DBGvpd# show interface  

              Name               Idx       State          Counter          Count     
GigabitEthernet2/1/0              5         up       rx packets                    12
                                                     rx bytes                    1138
                                                     tx packets                     9
                                                     tx bytes                     826
GigabitEthernet2/2/0              6         up       tx packets                     2
                                                     tx bytes                      84
GigabitEthernet2/3/0              7         up       rx packets                    12
                                                     rx bytes                    1612
                                                     tx packets                    12
                                                     tx bytes                    1576
                                                     drops                          1
                                                     ip4                           11
GigabitEthernet2/4/0              8        down      
local0                            0        down      
loop0                             10        up       rx packets                    11
                                                     rx bytes                     848
                                                     tx packets                    12
                                                     tx bytes                    1026
                                                     drops                          3
                                                     ip4                            9
loop1                             11        up       rx packets                    12
                                                     rx bytes                     970
                                                     tx packets                     9
                                                     tx bytes                     826
                                                     drops                          3
                                                     ip4                           11
pg/stream-0                       1        down      
pg/stream-1                       2        down      
pg/stream-2                       3        down      
pg/stream-3                       4        down      
vxlan_tunnel0                     9         up       rx packets                    11
                                                     rx bytes                    1002
                                                     tx packets                    12
                                                     tx bytes                    1458

Graph Node Global Counters

DBGvpd# show node counters  
      Count                       Node                         Reason       
               1                arp-input               ARP replies sent
               4                arp-input               ARP replies received
               2              arp-term-l2bd             ARP replies sent
               8                l2-flood                L2 flood packets
              25                l2-input                L2 input packets
              14                l2-learn                L2 learn packets
               2                l2-learn                L2 learn misses
              12                l2-learn                L2 learn hits
              13                l2-output               L2 output packets
               6               vxlan-encap              good packets encapsulated
               5               vxlan-input              good packets decapsulated
               5                 ip4-arp                ARP requests sent


Packet Trace

The following example shows typical output from a packet trace of a ping (an ICMP echo request and its reply). The ping is sent from a port with IP address 7.0.0.2 in a BD (bd_index 1) to the IP address 6.0.4.4.

A typical VPP command to capture the next 10 packets is:

 trace add dpdk-input 10

The VPP command to show the packet trace is:

 show trace

The destination IP of 6.0.4.4 resides in another BD (with bd_index 0) on another server. Thus, the packet was forwarded from the 1st BD to the 2nd BD via the BVI and then sent from the 2nd BD over the VXLAN tunnel to the other server. The ICMP echo reply was received at the local VTEP IP address and, after the VXLAN header was decapsulated, forwarded in the 2nd BD. The packet was then forwarded from the 2nd BD to the 1st BD via the BVI and finally output on the port with IP address 7.0.0.2.

Look for output from the vxlan-encap and vxlan-input nodes by searching the trace for the strings:

  • vxlan-encap
  • vxlan-input

Packet 7

17:39:13:781645: dpdk-input
  GigabitEthernet2/1/0 rx queue 0
  buffer 0x3fdb40: current data 0, length 98, free-list 0, totlen-nifb 0, trace 0x6
  PKT MBUF: port 0, nb_segs 1, pkt_len 98
    buf_len 2304, data_len 98, ol_flags 0x0
  IP4: 00:50:56:88:ca:7e -> de:ad:00:00:00:01
  ICMP: 7.0.0.2 -> 6.0.4.4
    tos 0x00, ttl 64, length 84, checksum 0x9996
    fragment id 0x900d, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xbb77
17:39:13:781653: ethernet-input
  IP4: 00:50:56:88:ca:7e -> de:ad:00:00:00:01
17:39:13:781658: l2-input
  l2-input: sw_if_index 5 dst de:ad:00:00:00:01 src 00:50:56:88:ca:7e
17:39:13:781660: l2-learn
  l2-learn: sw_if_index 5 dst de:ad:00:00:00:01 src 00:50:56:88:ca:7e bd_index 1
17:39:13:781662: l2-fwd
  l2-fwd:   sw_if_index 5 dst de:ad:00:00:00:01 src 00:50:56:88:ca:7e bd_index 1
17:39:13:781664: ip4-input
  ICMP: 7.0.0.2 -> 6.0.4.4
    tos 0x00, ttl 64, length 84, checksum 0x9996
    fragment id 0x900d, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xbb77
17:39:13:781669: ip4-rewrite-transit
  fib: 2 adjacency: loop0
                    005056887b68dead000000000800 flow hash: 0x00000000
  00000000: 005056887b68dead00000000080045000054900d40003f019a96070000020600
  00000020: 04040800bb7719b1000481e02b5500000000aaca0c00000000001011
17:39:13:781671: l2-input
  l2-input: sw_if_index 11 dst 00:50:56:88:7b:68 src de:ad:00:00:00:00
17:39:13:781672: l2-fwd
  l2-fwd:   sw_if_index 10 dst 00:50:56:88:7b:68 src de:ad:00:00:00:00 bd_index 0
17:39:13:781673: l2-output
  l2-output: sw_if_index 9 dst 00:50:56:88:7b:68 src de:ad:00:00:00:00
17:39:13:781675: vxlan-encap
  VXLAN-ENCAP: tunnel 0 vni 13
17:39:13:781678: ip4-rewrite-transit
  fib: 2 adjacency: GigabitEthernet2/3/0
                    IP4: 00:50:56:88:90:33 -> 00:50:56:88:00:ac flow hash: 0x00000000
  IP4: 00:50:56:88:90:33 -> 00:50:56:88:00:ac
  UDP: 10.0.3.1 -> 10.0.3.3
    tos 0x00, ttl 253, length 134, checksum 0xa363
    fragment id 0x0000
  UDP: 29806 -> 4789
    length 114, checksum 0x0000
17:39:13:781681: GigabitEthernet2/3/0-output
  GigabitEthernet2/3/0
  IP4: 00:50:56:88:90:33 -> 00:50:56:88:00:ac
  UDP: 10.0.3.1 -> 10.0.3.3
    tos 0x00, ttl 253, length 134, checksum 0xa363
    fragment id 0x0000
  UDP: 29806 -> 4789
    length 114, checksum 0x0000
17:39:13:781683: GigabitEthernet2/3/0-tx
  GigabitEthernet2/3/0 tx queue 0
  buffer 0x3fdb40: current data -50, length 148, free-list 0, totlen-nifb 0, trace 0x6
  IP4: 00:50:56:88:90:33 -> 00:50:56:88:00:ac
  UDP: 10.0.3.1 -> 10.0.3.3
    tos 0x00, ttl 253, length 134, checksum 0xa363
    fragment id 0x0000
  UDP: 29806 -> 4789
    length 114, checksum 0x0000

Packet 8

17:39:13:781815: dpdk-input
  GigabitEthernet2/3/0 rx queue 0
  buffer 0x18eec0: current data 0, length 148, free-list 0, totlen-nifb 0, trace 0x7
  PKT MBUF: port 2, nb_segs 1, pkt_len 148
    buf_len 2304, data_len 148, ol_flags 0x0
  IP4: 00:50:56:88:00:ac -> 00:50:56:88:90:33
  UDP: 10.0.3.3 -> 10.0.3.1
    tos 0x00, ttl 253, length 134, checksum 0xa363
    fragment id 0x0000
  UDP: 24742 -> 4789
    length 114, checksum 0x0000
17:39:13:781819: ethernet-input
  IP4: 00:50:56:88:00:ac -> 00:50:56:88:90:33
17:39:13:781823: ip4-input
  UDP: 10.0.3.3 -> 10.0.3.1
    tos 0x00, ttl 253, length 134, checksum 0xa363
    fragment id 0x0000
  UDP: 24742 -> 4789
    length 114, checksum 0x0000
17:39:13:781825: ip4-local
  fib: 1 adjacency: local 10.0.3.1/24 flow hash: 0x00000000
17:39:13:781827: ip4-udp-lookup
  UDP: src-port 24742 dst-port 4789
17:39:13:781830: vxlan-input
  VXLAN: tunnel 0 vni 13 next 1 error 0
17:39:13:781832: l2-input
  l2-input: sw_if_index 9 dst de:ad:00:00:00:00 src 00:50:56:88:7b:68
17:39:13:781833: l2-learn
  l2-learn: sw_if_index 9 dst de:ad:00:00:00:00 src 00:50:56:88:7b:68 bd_index 0
17:39:13:781835: l2-fwd
  l2-fwd:   sw_if_index 9 dst de:ad:00:00:00:00 src 00:50:56:88:7b:68 bd_index 0
17:39:13:781836: ip4-input
  ICMP: 6.0.4.4 -> 7.0.0.2
    tos 0x00, ttl 64, length 84, checksum 0x7002
    fragment id 0xf9a1
  ICMP echo_reply checksum 0xc377
17:39:13:781838: ip4-rewrite-transit
  fib: 2 adjacency: loop1
                    00505688ca7edead000000010800 flow hash: 0x00000000
  00000000: 00505688ca7edead00000001080045000054f9a100003f017102060004040700
  00000020: 00020000c37719b1000481e02b5500000000aaca0c00000000001011
17:39:13:781840: l2-input
  l2-input: sw_if_index 10 dst 00:50:56:88:ca:7e src de:ad:00:00:00:01
17:39:13:781842: l2-fwd
  l2-fwd:   sw_if_index 11 dst 00:50:56:88:ca:7e src de:ad:00:00:00:01 bd_index 1
17:39:13:781843: l2-output
  l2-output: sw_if_index 5 dst 00:50:56:88:ca:7e src de:ad:00:00:00:01
17:39:13:781845: GigabitEthernet2/1/0-output
  GigabitEthernet2/1/0
  IP4: de:ad:00:00:00:01 -> 00:50:56:88:ca:7e
  ICMP: 6.0.4.4 -> 7.0.0.2
    tos 0x00, ttl 63, length 84, checksum 0x7102
    fragment id 0xf9a1
  ICMP echo_reply checksum 0xc377
17:39:13:781846: GigabitEthernet2/1/0-tx
  GigabitEthernet2/1/0 tx queue 0
  buffer 0x18eec0: current data 50, length 98, free-list 0, totlen-nifb 0, trace 0x7
  IP4: de:ad:00:00:00:01 -> 00:50:56:88:ca:7e
  ICMP: 6.0.4.4 -> 7.0.0.2
    tos 0x00, ttl 63, length 84, checksum 0x7102
    fragment id 0xf9a1
  ICMP echo_reply checksum 0xc377


Restrictions

The following features could improve VXLAN functionality, but they are not supported in the current implementation:

  • VXLAN multicast mode, which uses IP multicast to flood VXLAN segments to other VTEPs (receiving IP multicast packets from other VTEPs can be supported).
  • MAC aging to clear learned MAC entries that may have become stale after a timeout period.