VPP/How to add a tunnel encapsulation

Introduction

The vpp engine currently supports a number of different tunnel encapsulations. They're all slightly different under the covers, which is not a Good Thing. Strategically, the ip6/l2tpv3 tunnel implementation is probably the closest thing in the source base to a "correct" tunnel implementation. This note is an attempt to codify what we should build moving forward.

Each tunnel type needs an encap path, a decap path, and [most likely] a set of L3 adjacencies to do the job.

Each tunnel instance needs an interface

I resisted this scheme up to the point where the vpp engine grew an L2 path. At L3, traffic destined for a tunnel needs a FIB entry and an appropriate adjacency rewrite, and that's sufficient. The L2 path, however, expects to cross-connect or bridge interfaces, which means that we either create an interface per tunnel or make a complete mess.

The tunnel interface should have something like the following device class and hardware interface class definitions. Aside from the tunnel name formatter, not much doing:

static u8 * format_l2tpv3_name (u8 * s, va_list * args)
{
  u32 dev_instance = va_arg (*args, u32);
  return format (s, "l2tpv3_tunnel%d", dev_instance);
}

static uword dummy_interface_tx (vlib_main_t * vm,
                                 vlib_node_runtime_t * node,
                                 vlib_frame_t * frame)
{
  clib_warning ("you shouldn't be here, leaking buffers...");
  return frame->n_vectors;
}

VNET_DEVICE_CLASS (l2tpv3_device_class,static) = {
  .name = "L2TPv3",
  .format_device_name = format_l2tpv3_name,
  .tx_function = dummy_interface_tx,
};

static u8 * format_l2tp_header_with_length (u8 * s, va_list * args)
{
  u32 dev_instance = va_arg (*args, u32);
  s = format (s, "unimplemented dev %u", dev_instance);
  return s;
}

static uword dummy_set_rewrite (vnet_main_t * vnm,
                                u32 sw_if_index,
                                u32 l3_type,
                                void * dst_address,
                                void * rewrite,
                                uword max_rewrite_bytes)
{
  return 0;
}

VNET_HW_INTERFACE_CLASS (l2tpv3_hw_class) = {
  .name = "L2TPV3",
  .format_header = format_l2tp_header_with_length,
  .set_rewrite = dummy_set_rewrite,
};

Stash the tunnel instance object index

Presumably, each tunnel has some sort of instance data. Put its pool index into (one or both of) the vnet hardware interface's dev_instance and hw_instance fields:

hw_if_index = vnet_register_interface (vnm, l2tpv3_device_class.index, s - lm->sessions,
                                       l2tpv3_hw_class.index, s - lm->sessions);

hi = vnet_get_hw_interface (vnm, hw_if_index);
hi->dev_instance = s - lm->sessions;
hi->hw_instance = s - lm->sessions;

Set up the tunnel interface output node index

Each tunnel interface needs an interface transmit routine. Set it up:

hi->output_node_index = tunnel_encap_node.index;

Construct a free tunnel index vector

Since vnet interfaces are not actually destroyed, construct a free tunnel index vector and add "deleted" tunnel hw_if_indices to it when deleting a tunnel.

Remember to shut down the tunnel interface so that any traffic being sent to it will be discarded:

vnet_sw_interface_set_flags (vnm, hi->sw_if_index, 0 /* admin down */);
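The recycling scheme can be sketched in plain C. A fixed array stands in for the vpp free-index vector (vec_add1 on delete, pop on create); the function names and the starting hw_if_index value are made up for illustration:

```c
#include <stdint.h>
#include <assert.h>

#define MAX_TUNNELS 16

static uint32_t free_hw_if_indices[MAX_TUNNELS];
static int n_free;
static uint32_t next_hw_if_index = 100; /* pretend vnet_register_interface result */

/* On create: reuse a previously "deleted" interface if one is banked,
 * since vnet interfaces are never actually destroyed */
static uint32_t tunnel_hw_if_index_alloc (void)
{
  if (n_free > 0)
    return free_hw_if_indices[--n_free];
  return next_hw_if_index++;    /* else register a brand-new interface */
}

/* On delete: bank the index for reuse. Real code would also admin-down
 * the interface: vnet_sw_interface_set_flags (vnm, hi->sw_if_index, 0) */
static void tunnel_hw_if_index_free (uint32_t hw_if_index)
{
  free_hw_if_indices[n_free++] = hw_if_index;
}
```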

Encapsulation Path

Retrieve the tunnel instance from hi->dev_instance:

sw_if_index = vnet_buffer(b)->sw_if_index[VLIB_TX];
hi = vnet_get_sup_hw_interface (rt->vnet_main, sw_if_index);
session_index = hi->dev_instance;

Consider using a 1-wide cache. The encap path processes vectors aimed at individual tunnels, so the cache should hit 99% of the time.
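Here is a plain-C sketch of that 1-wide cache. The session table, its contents, and session_lookup_slow() are illustrative stand-ins for the real pool lookup keyed by hi->dev_instance; in a real node the cached pair would live in the node's runtime data rather than file-scope statics:

```c
#include <stdint.h>
#include <assert.h>

typedef struct { uint32_t sw_if_index; int payload; } session_t;

/* stand-in session table; values are arbitrary */
static session_t sessions[4] = { {10, 100}, {11, 111}, {12, 122}, {13, 133} };
static int n_real_lookups;      /* instrumentation for the sketch only */

static session_t *session_lookup_slow (uint32_t sw_if_index)
{
  n_real_lookups++;
  for (unsigned i = 0; i < 4; i++)
    if (sessions[i].sw_if_index == sw_if_index)
      return &sessions[i];
  return 0;
}

/* 1-wide cache: remember the last (sw_if_index, session) pair. Since the
 * encap path processes vectors aimed at individual tunnels, nearly every
 * packet after the first hits the cache. */
static uint32_t cached_sw_if_index = ~0u;
static session_t *cached_session;

static session_t *session_lookup (uint32_t sw_if_index)
{
  if (sw_if_index != cached_sw_if_index)
    {
      cached_session = session_lookup_slow (sw_if_index);
      cached_sw_if_index = sw_if_index;
    }
  return cached_session;
}
```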

Using the session object, add tunnel encap prior to bN->current_data. Techniques: manually paint an (unrouted) ip4/ip6 header, or apply and fix up a vnet_rewrite_t from the session object. Don't forget to deal with checksums.

When using a rewrite, the traditional method involves precomputing the header checksum with the length field set to zero. Then, fix the checksum in the speed path. Example:

sum0 = ip0->checksum;
old_l0 = 0;  /* old_l0 always 0, see the rewrite setup... */
new_l0 = clib_host_to_net_u16 (vlib_buffer_length_in_chain (vm, b0));
 
sum0 = ip_csum_update (sum0, old_l0, new_l0, ip4_header_t,
                       length /* changed member */);
 
ip0->checksum = ip_csum_fold (sum0);
ip0->length = new_l0;
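To see why the old_l0 = 0 trick is sound, here is a self-contained plain-C model of the arithmetic. ip4_header_checksum() and csum_update() stand in for vpp's checksum helpers; the math is standard RFC 1071 / RFC 1624 one's-complement arithmetic, not anything vpp-specific:

```c
#include <stdint.h>
#include <assert.h>

/* one's-complement checksum over a header of big-endian 16-bit words */
static uint16_t ip4_header_checksum (const uint8_t * hdr, int len)
{
  uint32_t sum = 0;
  int i;
  for (i = 0; i < len; i += 2)
    sum += (uint32_t) ((hdr[i] << 8) | hdr[i + 1]);
  while (sum >> 16)             /* fold carries back in */
    sum = (sum & 0xffff) + (sum >> 16);
  return (uint16_t) ~sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') -- replace one 16-bit field
 * without re-summing the whole header */
static uint16_t csum_update (uint16_t old_csum, uint16_t old_field,
                             uint16_t new_field)
{
  uint32_t sum = (uint16_t) ~old_csum;
  sum += (uint16_t) ~old_field;
  sum += new_field;
  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);
  return (uint16_t) ~sum;
}
```

Because the rewrite was checksummed with its length field set to zero, the speed-path fixup never needs to read the old length; it just folds in the new one.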

Push newly tunnel-encapsulated, unrouted packets forward in the graph to ip4-lookup or ip6-lookup, as appropriate.

Decapsulation Path

Register the decap node with the vpp L3 local stack. If tunnel packets in need of decapsulation arrive as ip "for-us" packets with a specific IP protocol (the l2tpv3 case), register as follows:

 ip6_register_protocol (IP_PROTOCOL_L2TP, l2t_decap_node.index);

If packets arrive as udp-ip4 "for-us" packets (the vxlan case), register this way:

 udp_register_dst_port (vm, UDP_DST_PORT_vxlan, vxlan_udp_node.index, 1 /* is_ip4 */);

In the decap node itself, look up the tunnel session object. The decap path MUST NOT decapsulate packets which lack corresponding valid tunnel objects. Drop-and-count.
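The drop-and-count rule can be sketched like so; the next-node names, the session-id key, and session_find() are illustrative, not vpp's actual identifiers:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

enum { DECAP_NEXT_DROP, DECAP_NEXT_L2_INPUT };

static uint32_t no_session_drops;   /* per-node error counter in real code */

typedef struct { uint32_t session_id; } session_t;

static session_t valid_session = { 42 };

/* stand-in for looking up the tunnel session object from header fields */
static session_t *session_find (uint32_t session_id)
{
  return session_id == 42 ? &valid_session : NULL;
}

static int decap_classify (uint32_t session_id_from_header)
{
  session_t *s = session_find (session_id_from_header);
  if (s == NULL)
    {
      /* no valid tunnel object: MUST NOT decapsulate. Count and drop;
       * real code uses vlib_node_increment_counter and the drop next. */
      no_session_drops++;
      return DECAP_NEXT_DROP;
    }
  return DECAP_NEXT_L2_INPUT;   /* safe to pop the header and forward */
}
```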

Remember to actually pop the tunnel header. Something like this:

vlib_buffer_advance (b0, sizeof (popped_encap));

When cross-connecting or bridging at L2, the decap path must set vnet_buffer(b0)->l2.l2_len. We will turn the following incantation into an inline function any second now; here are the details:

vnet_buffer(b)->l2.l2_len = sizeof(ethernet_header_t);
ethertype = clib_net_to_host_u16(eth->type);
if ((ethertype == ETHERNET_TYPE_VLAN) ||
    (ethertype == ETHERNET_TYPE_DOT1AD) ||
    (ethertype == ETHERNET_TYPE_VLAN_9100) ||
    (ethertype == ETHERNET_TYPE_VLAN_9200)) {
    ethernet_vlan_header_t * vlan;
    vlan = (void *) (eth + 1);
    /* note sizeof (*vlan): add the 4-byte tag, not the size of a pointer */
    vnet_buffer(b)->l2.l2_len += sizeof (*vlan);
    ethertype = clib_net_to_host_u16 (vlan->type);
    if (ethertype == ETHERNET_TYPE_VLAN) {
        vnet_buffer(b)->l2.l2_len += sizeof (*vlan);
    }
}

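As a self-contained preview of that inline function, here is a plain-C version with minimal stand-in header types; the layouts match the real 14-byte ethernet and 4-byte vlan headers, but the type and function names are illustrative:

```c
#include <stdint.h>
#include <assert.h>

#define ETHERNET_TYPE_VLAN      0x8100
#define ETHERNET_TYPE_DOT1AD    0x88a8
#define ETHERNET_TYPE_VLAN_9100 0x9100
#define ETHERNET_TYPE_VLAN_9200 0x9200

typedef struct { uint8_t dst[6], src[6]; uint16_t type; } ethernet_header_t;
typedef struct { uint16_t priority_cfi_and_id; uint16_t type; } ethernet_vlan_header_t;

/* byte swap; stand-in for clib_net_to_host_u16 on a little-endian host */
static uint16_t net_to_host_u16 (uint16_t x)
{
  return (uint16_t) ((x >> 8) | (x << 8));
}

/* Compute the L2 header length, accounting for up to two vlan tags;
 * the result is what would be stored in vnet_buffer(b)->l2.l2_len. */
static inline uint32_t l2_header_length (const ethernet_header_t * eth)
{
  uint32_t l2_len = sizeof (ethernet_header_t);
  uint16_t ethertype = net_to_host_u16 (eth->type);

  if (ethertype == ETHERNET_TYPE_VLAN || ethertype == ETHERNET_TYPE_DOT1AD
      || ethertype == ETHERNET_TYPE_VLAN_9100
      || ethertype == ETHERNET_TYPE_VLAN_9200)
    {
      const ethernet_vlan_header_t *vlan = (const void *) (eth + 1);
      l2_len += sizeof (*vlan);                 /* outer tag */
      ethertype = net_to_host_u16 (vlan->type);
      if (ethertype == ETHERNET_TYPE_VLAN)
        l2_len += sizeof (*vlan);               /* inner tag (QinQ) */
    }
  return l2_len;
}
```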
If forwarding newly decapsulated traffic at L2, send the packet to l2-input. At L3, send it to ip4-lookup / ip6-lookup; remember to set vnet_buffer(b0)->sw_if_index[VLIB_TX] to the desired lookup FIB index, based on data in the tunnel's session object.

Adjacency and connection setup

When operating at L3, give the tunnel interface an address via the vl_api_sw_interface_add_del_address API, or via the debug CLI:

set int ip address xxx_tunnelN 192.168.1.1/24

At L2, cross-connect or bridge the tunnel interface, via the vl_api_sw_interface_set_l2_xconnect API for cross-connection or the vl_api_sw_interface_set_l2_bridge API for bridging. If bridging, place tunnel interfaces into a non-default split-horizon group (e.g. 1).
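For reference, the debug CLI equivalents look something like the following; the peer interface name, bridge-domain id, and split-horizon group are illustrative:

```
set interface l2 xconnect l2tpv3_tunnel0 GigabitEthernet2/0/0
set interface l2 bridge l2tpv3_tunnel0 1 1
```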

Dbarach (talk) 11:27, 28 March 2016 (UTC)