VPP/SecurityGroups

From fd.io
Revision as of 10:19, 7 November 2016 by Ayourtch (Talk | contribs)

VPP Security Groups

Introduction

Features are tracked in VPP-427 as they are developed.

Initial development is done on GitHub: ACL branch

ACL Node architecture

Requirements

  • Support classifiers/filters on any interface type (bridged / routed)
  • Filter on IP-addresses with address mask or prefix length (IPv4 and IPv6)
  • Filter on source and destination TCP/UDP port ranges
  • Filter on source and destination L2 MAC addresses
  • Support IPv6 with extension headers present
  • Support fragmented packets and unknown transport layer headers
  • Combinations of the above filters (e.g. MAC + IP)
  • Filters on ingress and egress interfaces
  • Stateful firewall. No application layer filtering.

Work list

Task                              | Owner  | Priority | Status | Description
API definition                    | Ole    | 0        | WIP    | VPP-513
Connection tracker                | Andrew | 0        | WIP    | VPP-514
Stateful ACLs                     |        | 0        |        | VPP-515
ACL policy matching node (MVP)    | Andrew | 0        | Done   | input output
Direct classifier policy matching |        | 0        |        |
Control Plane tests               |        | 0        |        |
Data Plane tests                  |        | 0        |        |


 1. Python tests/examples -> Ole + Pavel
 2a. IPv4 matching in all plugin -> Andrew
 2b. make it “deny by default”
 3. Performance testing -> Pavel ?
 --- MVP ---
 4. Plumbing for stateful sessions from ACL plugin (to be able to specify “match and track”, i.e. “permit and create the forward/return session”) -> Andrew
 5. L2 rules - TBD
 6. ACL/Sessions support for L3 (routed) mode - (big)!
 7. Can we implement the ACL match purely in terms of classifier tables? How expensive/(in)efficient would that be?
 8. Extension header handling during the slow path lookup - easy in ACL plugin
 9. classifier match for the sessions with extension headers - currently no extension headers supported

API

/*
 * Access List Rule entry
 *
 * Future considerations:
 * u32 proto_flags;
 * u8 traffic_class;
 * u32 flow_label;
 * u32 extension_header_present;
 * u8 port_range_operator
 */
typeonly define acl_rule
{
 u8 is_permit;
 u8 is_ipv6;
 u8 src_ip_addr[16];
 u8 src_ip_prefix_len;
 u8 dst_ip_addr[16];
 u8 dst_ip_prefix_len;
 u8 proto;
 u16 src_port;
 u16 dst_port;
};
define acl_add
{
 u32 client_index;
 u32 context;
 u32 count;
 vl_api_acl_rule_t r[count];
};
define acl_add_reply
{
 u32 context;
 u32 acl_index;
 i32 retval;
};
define acl_del
{
 u32 client_index;
 u32 context;
 u32 acl_index;
};
define acl_del_reply
{
 u32 context;
 i32 retval;
};
define acl_interface_add_del
{
 u32 client_index;
 u32 context;
 u8 is_add;
 u8 is_input;
 u32 sw_if_index;
 u32 acl_index;
};
define acl_interface_add_del_reply
{
 u32 context;
 i32 retval;
};
define acl_dump
{
 u32 client_index;
 u32 context;
 u32 sw_if_index; /* ~0 for all interfaces */
};
define acl_details
{
 u32 context;
 u32 sw_if_index;
 u32 acl_index;
 u32 count;
 vl_api_acl_rule_t r[count];
};
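
As a rough illustration of the rule layout above, the following Python sketch packs a single acl_rule into bytes. It assumes the fields are encoded in declaration order, without padding and in network byte order; the helper name pack_acl_rule is hypothetical and is not part of any VPP binding.

```python
import ipaddress
import struct

# Assumed layout of acl_rule: fields in declaration order, no padding,
# network byte order. Illustration only, not the VPP message encoder.
ACL_RULE_FMT = "!BB16sB16sBBHH"


def pack_acl_rule(is_permit, src, dst, proto, src_port=0, dst_port=0):
    """Pack one rule; src and dst are prefixes such as '192.0.2.0/24'."""
    src_net = ipaddress.ip_network(src)
    dst_net = ipaddress.ip_network(dst)
    if src_net.version != dst_net.version:
        raise ValueError("source and destination address families differ")
    return struct.pack(
        ACL_RULE_FMT,
        1 if is_permit else 0,
        1 if src_net.version == 6 else 0,
        src_net.network_address.packed,  # "16s" zero-pads IPv4 to 16 bytes
        src_net.prefixlen,
        dst_net.network_address.packed,
        dst_net.prefixlen,
        proto,
        src_port,
        dst_port,
    )


# Permit TCP (protocol 6) from anywhere to 192.0.2.0/24, destination port 22.
rule = pack_acl_rule(True, "0.0.0.0/0", "192.0.2.0/24", proto=6, dst_port=22)
print(len(rule), rule.hex())
```

Storing IPv4 addresses in the first four bytes of the 16-byte address field is also an assumption of this sketch; the definition above only says that is_ipv6 selects the address family.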

TBD: L2 API

/**
  Add or delete MAC / IP ingress filter. 
  These rules restrict the MAC addresses that can send the traffic. 
  If the ip_address is all-zero, any IP address is allowed and only
  the MAC address is used for the ingress filtering.
  There can be many MAC addresses on a given interface,
  a given MAC address may have multiple addresses associated with it 
  (by means of separate ingress rules), and different MAC addresses can also have the same addresses.  
*/
  
  
define ip_apr_macip_add_del_ingress
{
       u32 client_index;
       u32 context;
       u32 sw_if_index;
       u8 is_add;
       u8 is_ipv6;
       u8 mac_address[6];
};
  
/** 
  @param context            - sender context, to match reply w/response
  @param retval             - return code for the request
*/
  
define ip_apr_macip_add_del_ingress_reply
{
       u32 context;
       i32 retval;
};
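
The rule semantics described in the comment above could be modelled roughly as follows. This is only a sketch: the class and method names are hypothetical, and it assumes that traffic from a MAC address with no rule is dropped, which the text implies but does not state explicitly.

```python
import ipaddress


class MacIpIngressFilter:
    """Toy model of the MAC / IP ingress rules on one interface."""

    def __init__(self):
        # Maps a MAC address to the set of IP addresses allowed for it.
        # None in the set stands for the all-zero "any IP address" rule.
        self.rules = {}

    def add(self, mac, ip=None):
        addr = ipaddress.ip_address(ip) if ip is not None else None
        if addr is not None and int(addr) == 0:
            addr = None  # an all-zero address means "any IP address"
        self.rules.setdefault(mac.lower(), set()).add(addr)

    def permits(self, mac, ip):
        allowed = self.rules.get(mac.lower())
        if allowed is None:
            return False  # assumption: a MAC address without rules is dropped
        return None in allowed or ipaddress.ip_address(ip) in allowed


flt = MacIpIngressFilter()
flt.add("7a:01:9a:05:7b:b7", "192.0.2.1")   # MAC bound to one address
flt.add("ca:52:50:fb:e5:82", "0.0.0.0")     # MAC only, any IP address
print(flt.permits("7a:01:9a:05:7b:b7", "192.0.2.1"))
print(flt.permits("7a:01:9a:05:7b:b7", "192.0.2.9"))
```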

Design and prototyping

The stateful design is being prototyped in https://github.com/vpp-dev/vpp-lua-plugin/blob/master/samples/polua.lua

The goal of this prototype was to minimize the changes to the main forwarding path and to explore possible later optimizations.

Another primary design criterion is to avoid creating a separate forwarding path as much as possible.

The main idea with the stateful design is to use the L2 classifier for storing the sessions.

For this, we create two chained tables per interface per direction: TCP/UDP then ICMP, and hook them into the processing path of the packet.

If the session is not in the table, it means we need to do the policy check - thus the miss_next index of the ICMP table is set to one of the nodes taking care of the policy checks: there are four of them because of {ingress/egress, ip4/ip6}.

Each of the nodes is very simple: it checks the policy and, if the policy permits the packet, adds the session and recirculates the packet back into the lookup. That packet then hits the session and is processed by the "fast path".

For the purposes of this document we refer to the policy check path as the "slow path" and the path using the established state as the "fast path", although this is a bit of a misnomer: the "slow path" does not really pass the packets through the box, it merely sets up the fast path and recirculates the packets to hit it.

Besides adding the forward flow, if there is a policy in the reverse direction, the slow path also sets up the mirror flow in the tables of the opposite direction, so as to avoid a policy check for the return packets of the flow. The only ICMP types considered to have "return" packets are echo/echo-reply.
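
The slow path / fast path interaction described above can be sketched in a few lines of Python. This is a toy model, not the Lua prototype: the session tables are plain sets instead of classifier tables, and the names Interface, mirror and process, as well as the flow key layout, are assumptions made for illustration.

```python
ICMP_ECHO, ICMP_ECHO_REPLY = 8, 0


class Interface:
    def __init__(self, policy_in=None, policy_out=None):
        # One session table per direction; a policy is a callable or None.
        self.sessions = {"in": set(), "out": set()}
        self.policy = {"in": policy_in, "out": policy_out}


def mirror(key):
    """Return the key of the return packet, or None if there is none."""
    proto, src, dst, a, b = key
    if proto == "icmp":               # key is (proto, src, dst, type, id)
        if a != ICMP_ECHO:
            return None               # only echo/echo-reply has a return packet
        return (proto, dst, src, ICMP_ECHO_REPLY, b)
    return (proto, dst, src, b, a)    # TCP/UDP: swap addresses and ports


def process(intf, direction, key):
    """Return the path taken by the packet: 'pass', 'fast', 'slow' or 'drop'."""
    policy = intf.policy[direction]
    if policy is None:
        return "pass"                 # no policy applied in this direction
    if key in intf.sessions[direction]:
        return "fast"                 # session hit
    if not policy(key):               # session miss: slow path policy check
        return "drop"
    intf.sessions[direction].add(key)
    opposite = "out" if direction == "in" else "in"
    back = mirror(key)
    if back is not None and intf.policy[opposite] is not None:
        intf.sessions[opposite].add(back)
    # Recirculation: a second lookup would now hit the session just added.
    return "slow"


intf = Interface(policy_in=lambda k: k[0] == "icmp", policy_out=lambda k: True)
request = ("icmp", "192.0.2.1", "192.0.2.2", ICMP_ECHO, 1)
reply = ("icmp", "192.0.2.2", "192.0.2.1", ICMP_ECHO_REPLY, 1)
print(process(intf, "in", request))   # first echo request
print(process(intf, "in", request))   # second echo request
print(process(intf, "out", reply))    # echo reply in the opposite direction
```

In this model only the first echo request takes the slow path; the second request and the reply are both session hits.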

When the ingress packet processing is done, the forwarding is done as usual by VPP, and then a similar check against the flow table is done on egress in the l2-output-lookup, if there is a policy applied. Again, a missing session results in a redirect to a "slow path" node, which inserts a session and a return session, and recirculates the packet.

This highlights a particularity: if there is a policy in one direction that is other than "permit everything" and has some deny rules, then for proper functioning there needs to be a "permit everything" policy applied in the opposite direction on the same interface, so that the return packets do not hit the policy lookup. However, this can easily be hidden from the user by the implementation, so it is probably not a big problem.

However, there are some more distinct shortcomings:

1) Not very frugal with memory. With policies applied, each connection consumes 4 session slots. How bad is it? Certainly not very cache friendly. On the other hand, this kind of approach could in the future handle even features that change the packet.

An idea for a possible optimization: use the same tables for in+out on the same interface, and insert two sessions. This might be questionable from the security standpoint in some corner cases.

2) No TCP state tracking or UDP timeout tracking.

3) No cleanup at all for the classifier tables. Only additions are performed. This MUST be taken care of and is TBD. Note that it is intentionally separate from (2), because it also covers scenarios such as simple high resource utilization.

4) No support for IPv6 EH or IPv4 fragments. This is a general issue with using the "simple" bitmask/match type of classifier, and so far the solution is TBD.

5) Two policy checks as opposed to one.

An idea for a possible optimization: have the ingress path merely set the flag "need to check the policy" and perform all the policy checking on egress, including the flow check, thus combining both policy lookups. If the packets are guaranteed to never be translated, then this can be a possible strategy. However, this means that activating an ingress policy MUST trigger the activation of the check on egress on all the interfaces within the forwarding domain. And if the egress interface has a deny-something in the inbound direction, then a reverse flow check must still be done.

A possible optimized implementation that would, to some extent, take care of (1) and (5):

Perform a flow check on ingress.

If the flow exists, retrieve the egress interface from the flow record, and mark the packet "check the TX interface before output".

If the flow does not exist, mark the packet "Check the policy", and forward as usual.

On egress, if "Check the TX interface" is set, verify that the previously saved TX interface matches. If "Check the policy" is set, then check the policy: if it does not permit the traffic, drop the packet. Otherwise, create an inbound flow using mirrored information from the packet, and send the packet along.

As an example of how it is prototyped today, below is an example trace of an ICMP flow permitted by the policy, showing the first and second echo request together with their replies.
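
The ingress/egress steps of this optimized variant could look roughly like the Python sketch below. Several details are assumptions, since the text leaves them open: a single flow table shared by all interfaces, a flow record that stores only the expected egress interface, and a drop when the saved TX interface does not match. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Packet:
    key: tuple                         # (proto, src, dst, sport, dport)
    check_tx_if: Optional[int] = None  # "check the TX interface before output"
    check_policy: bool = False         # "check the policy"


@dataclass
class FlowTable:
    flows: dict = field(default_factory=dict)  # flow key -> egress interface


def mirrored(key):
    proto, src, dst, sport, dport = key
    return (proto, dst, src, dport, sport)


def ingress(table, pkt):
    tx_if = table.flows.get(pkt.key)
    if tx_if is not None:
        pkt.check_tx_if = tx_if        # flow exists: remember its egress
    else:
        pkt.check_policy = True        # no flow: defer the policy check
    return pkt


def egress(table, pkt, rx_if, tx_if, policy):
    """Return True to send the packet, False to drop it."""
    if pkt.check_tx_if is not None:
        return pkt.check_tx_if == tx_if   # assumption: a mismatch is dropped
    if pkt.check_policy:
        if not policy(pkt.key):
            return False
        # Inbound flow for the return traffic, which will leave via rx_if.
        table.flows[mirrored(pkt.key)] = rx_if
    return True


table = FlowTable()
permit_ssh = lambda key: key[0] == "tcp" and key[4] == 22
forward = ("tcp", "192.0.2.1", "192.0.2.2", 40000, 22)

first = ingress(table, Packet(forward))
print(egress(table, first, rx_if=1, tx_if=2, policy=permit_ssh))
back = ingress(table, Packet(mirrored(forward)))
print(back.check_tx_if, egress(table, back, rx_if=2, tx_if=1, policy=permit_ssh))
```

Following the text literally, only the mirrored (inbound) flow is created, so in this sketch every forward packet still goes through the policy check on egress while return packets are matched by the flow.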



------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:12:27:661397: af-packet-input
  af_packet: hw_if_index 1 next-index 1
    tpacket2_hdr:
      status 0x20000001 len 98 snaplen 98 mac 66 net 80
      sec 0x5804fd0a nsec 0x2660d6db vlan 0
00:12:27:661438: ethernet-input
  IP4: 7a:01:9a:05:7b:b7 -> ca:52:50:fb:e5:82
00:12:27:661449: l2-input
  l2-input: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7
00:12:27:661455: l2-input-classify
  l2-classify: sw_if_index 1, table 18, offset 0, next 16
00:12:27:661462: lua-polua-ip4-input
  LUA_plugin: sw_if_index 1, next index 1
00:12:27:661592: l2-input-classify
  l2-classify: sw_if_index 1, table 18, offset c0, next 11
00:12:27:661595: l2-learn
  l2-learn: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7 bd_index 1
00:12:27:661602: l2-fwd
  l2-fwd:   sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7 bd_index 1
00:12:27:661607: l2-output
  l2-output: sw_if_index 2 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7
00:12:27:661611: l2-output-classify
  l2-classify: sw_if_index 2, table 22, offset 0, next 6
00:12:27:661616: lua-polua-ip4-output
  LUA_plugin: sw_if_index 1, next index 1
00:12:27:661718: l2-output-classify
  l2-classify: sw_if_index 2, table 22, offset c0, next 9
00:12:27:661727: host-s0_s2-output
  host-s0_s2
  IP4: 7a:01:9a:05:7b:b7 -> ca:52:50:fb:e5:82
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x37d5
    fragment id 0x7ed0, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x8b86

Packet 2

00:12:27:661784: af-packet-input
  af_packet: hw_if_index 2 next-index 1
    tpacket2_hdr:
      status 0x20000001 len 98 snaplen 98 mac 66 net 80
      sec 0x5804fd0a nsec 0x2660d6db vlan 0
00:12:27:661790: ethernet-input
  IP4: ca:52:50:fb:e5:82 -> 7a:01:9a:05:7b:b7
00:12:27:661795: l2-input
  l2-input: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82
00:12:27:661798: l2-input-classify
  l2-classify: sw_if_index 2, table 26, offset c0, next 11
00:12:27:661802: l2-learn
  l2-learn: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82 bd_index 1
00:12:27:661803: l2-fwd
  l2-fwd:   sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82 bd_index 1
00:12:27:661807: l2-output
  l2-output: sw_if_index 1 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82
00:12:27:661809: host-s0_s1-output
  host-s0_s1
  IP4: ca:52:50:fb:e5:82 -> 7a:01:9a:05:7b:b7
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x9113
    fragment id 0x6592
  ICMP echo_reply checksum 0x9386

Packet 3


00:12:28:663937: af-packet-input
  af_packet: hw_if_index 1 next-index 1
    tpacket2_hdr:
      status 0x20000001 len 98 snaplen 98 mac 66 net 80
      sec 0x5804fd0b nsec 0x275a0081 vlan 0
00:12:28:664014: ethernet-input
  IP4: 7a:01:9a:05:7b:b7 -> ca:52:50:fb:e5:82
00:12:28:664030: l2-input
  l2-input: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7
00:12:28:664040: l2-input-classify
  l2-classify: sw_if_index 1, table 18, offset c0, next 11
00:12:28:664053: l2-learn
  l2-learn: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7 bd_index 1
00:12:28:664060: l2-fwd
  l2-fwd:   sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7 bd_index 1
00:12:28:664071: l2-output
  l2-output: sw_if_index 2 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7
00:12:28:664078: l2-output-classify
  l2-classify: sw_if_index 2, table 22, offset c0, next 9
00:12:28:664086: host-s0_s2-output
  host-s0_s2
  IP4: 7a:01:9a:05:7b:b7 -> ca:52:50:fb:e5:82
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x3797
    fragment id 0x7f0e, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc145

Packet 4

00:12:28:664164: af-packet-input
  af_packet: hw_if_index 2 next-index 1
    tpacket2_hdr:
      status 0x20000001 len 98 snaplen 98 mac 66 net 80
      sec 0x5804fd0b nsec 0x275a0081 vlan 0
00:12:28:664172: ethernet-input
  IP4: ca:52:50:fb:e5:82 -> 7a:01:9a:05:7b:b7
00:12:28:664179: l2-input
  l2-input: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82
00:12:28:664184: l2-input-classify
  l2-classify: sw_if_index 2, table 26, offset c0, next 11
00:12:28:664191: l2-learn
  l2-learn: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82 bd_index 1
00:12:28:664193: l2-fwd
  l2-fwd:   sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82 bd_index 1
00:12:28:664198: l2-output
  l2-output: sw_if_index 1 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82
00:12:28:664201: host-s0_s1-output
  host-s0_s1
  IP4: ca:52:50:fb:e5:82 -> 7a:01:9a:05:7b:b7
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x904b
    fragment id 0x665a
  ICMP echo_reply checksum 0xc945

CLI

set interface input acl intfc <int> [ip4-table <index>] [ip6-table <index>] [l2-table <index>] [del] 
show inacl type [ip4|ip6|l2]
classify table [miss-next|l2-miss_next|acl-miss-next <next_index>] mask <mask-value> buckets <nn> [skip <n>] [match <n>] [del]
show classify tables [index <nn>]
classify session [hit-next|l2-hit-next|acl-hit-next <next_index>|policer-hit-next <policer_name>] table-index <nn> match [hex] [l2] [l3 ip4] [opaque-index <index>]
test classify [src <ip>] [sessions <nn>] [buckets <nn>] [table <nn>] [del]
set ip classify intfc <int> table-index <index>
set interface ip6 table <intfc> <table-id>
set interface l2 input classify intfc <interface-name> [ip4-table <n>] [ip6-table <n>] [other-table <n>]
set interface l2 output classify intfc <interface-name> [ip4-table <n>] [ip6-table <n>] [other-table <n>]
set ip source-and-port-range-check
show ip source-and-port-range-check vrf <nn> <ip-addr> <port>

Examples

YANG model

Open Issues

  • Security Group use case specific API. Done in VPP or control plane plugin?

Existing functionality

The existing functionality provides matching by means of the classifier (https://wiki.fd.io/view/VPP/Introduction_To_N-tuple_Classifiers).

As the above document explains, the classifier is a series of chained tables. Each table has its own mask, and that mask is the same for all entries in the table.

This has been tested to work in the L2 bridged case (test case: http://stdio.be/vpp/t/aytest-bridge-tap-py.txt).

Suppose we have an example policy:

 nova secgroup-create test-secgroup test
 nova secgroup-add-rule test-secgroup icmp -1 -1 0.0.0.0/0
 nova secgroup-add-rule test-secgroup tcp 22 22 0.0.0.0/0

Then, assuming we match with offset 0 (from the beginning of the packet), the mask for the first rule will look like this:

 000000000000 000000000000 0000 00 00 0000 0000 0000 00 FF 0000 00000000 00000000  00 00 0000 0000 
   eth dst      eth src    et   ihl t  len id    fo ttl pr  cs   ip4src   ip4dst    t  c  cs   id
   +-------- L2 ---------------+----------- L3 IPv4 ------------------------------+--------L4 ICMP -----+

For the TCP matching on port 22 it will look as follows:

 000000000000 000000000000 0000 00 00 0000 0000 0000 00 FF 0000 00000000 00000000  0000 FFFF 00000000 00000000 0000 0000 0000 0000
   eth dst      eth src    et   ihl t  len id    fo ttl pr  cs   ip4src   ip4dst    sp  dp    seq      ack      fl  win   cs   urg
   +-------- L2 ---------------+----------- L3 IPv4 ------------------------------+--------L4 TCP ---------------------------------+


(One would need to round up the number of bytes to the nearest 16-byte boundary that makes sense)
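
The two IPv4 masks above, including the round-up to a 16-byte boundary, can be generated with a short Python sketch. The offsets assume an untagged Ethernet frame and an IPv4 header without options; the function names are made up for this example.

```python
ETH_LEN = 14                    # dst MAC, src MAC, ethertype
IP4_PROTO_OFF = ETH_LEN + 9     # protocol field of the IPv4 header
IP4_L4_OFF = ETH_LEN + 20       # start of the L4 header when there are no options


def pad16(data):
    """Pad with zero bytes up to the next multiple of 16 bytes."""
    return data + bytes(-len(data) % 16)


def ip4_proto_mask():
    mask = bytearray(IP4_L4_OFF)
    mask[IP4_PROTO_OFF] = 0xFF
    return pad16(bytes(mask))


def ip4_proto_tcp_dport_mask():
    mask = bytearray(IP4_L4_OFF + 4)            # up to and including the dst port
    mask[IP4_PROTO_OFF] = 0xFF
    mask[IP4_L4_OFF + 2:IP4_L4_OFF + 4] = b"\xff\xff"
    return pad16(bytes(mask))


for mask in (ip4_proto_mask(), ip4_proto_tcp_dport_mask()):
    # Print the mask length, the number of 16-byte vectors and the hex string.
    print(len(mask), len(mask) // 16, mask.hex())
```

Both masks round up to 48 bytes, that is three 16-byte vectors.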

For IPv6 assuming no extension headers, it will look similar, with the L3 header being the IPv6 one:


 000000000000 000000000000 0000 0 00 00000 0000 FF 00 00000000000000000000000000000000 00000000000000000000000000000000 00 00 0000 0000 
   eth dst      eth src    et   v TC  fll  len  nh hl             ipv6 src                   ipv6 dst                   t  c  cs   id
   +-------- L2 ---------------+----------- L3 IPv6 --------------------------------------------------------------------+--------L4 ICMP -----+

For the TCP matching on port 22 it will look as follows:

 000000000000 000000000000 0000 0 00 00000 0000 FF 00 00000000000000000000000000000000 00000000000000000000000000000000 0000 FFFF 00000000 00000000 0000 0000 0000 0000
   eth dst      eth src    et   v TC  fll  len  nh hl             ipv6 src                   ipv6 dst                     sp  dp    seq      ack      fl  win   cs   urg
   +-------- L2 ---------------+----------- L3 IPv6 --------------------------------------------------------------------+--------L4 TCP ---------------------------------+


Then using these masks one would create 4 tables, by using the API call:

 classify_add_del_table(is_add=1, skip_n_vectors=0, mask=<MMMM>, match_n_vectors=<NNNN>,nbuckets=32,memory_size=20000, next_table_index=-1, miss_next_index=-1)

Let's call these tables "IPv4PROTO", "IPv4PROTO_TCPDPORT", "IPv6PROTO", "IPv6PROTO_TCPDPORT".

One would specify the "IPv4PROTO" table as the "next_table_index" for the "IPv4PROTO_TCPDPORT" table, and "IPv6PROTO" as the "next_table_index" for the "IPv6PROTO_TCPDPORT" table.
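
To make the chaining concrete, here is a toy Python model of two chained IPv4 tables. It only mimics the lookup order (first table, then its next table on a miss); it does not use the VPP API, and the ClassifyTable class and frame helper are invented for this illustration.

```python
class ClassifyTable:
    """Toy model of one classifier table: one mask shared by all sessions."""

    def __init__(self, name, mask, next_table=None):
        self.name = name
        self.mask = mask
        self.next_table = next_table   # plays the role of next_table_index
        self.sessions = set()

    def _key(self, data):
        # Zero-extend short packets, then apply the table mask bytewise.
        data = data.ljust(len(self.mask), b"\x00")
        return bytes(d & m for d, m in zip(data, self.mask))

    def add_session(self, match):
        self.sessions.add(self._key(match))

    def lookup(self, packet):
        """Return the name of the first table in the chain that hits."""
        table = self
        while table is not None:
            if table._key(packet) in table.sessions:
                return table.name
            table = table.next_table
        return None                    # miss in every table


def frame(proto, dport=0):
    """Minimal 48-byte IPv4 frame with only the matched fields filled in."""
    pkt = bytearray(48)
    pkt[23] = proto                          # IPv4 protocol field
    pkt[36:38] = dport.to_bytes(2, "big")    # TCP destination port
    return bytes(pkt)


proto_mask = bytearray(48)
proto_mask[23] = 0xFF
dport_mask = bytearray(proto_mask)
dport_mask[36:38] = b"\xff\xff"

ip4_proto = ClassifyTable("IPv4PROTO", bytes(proto_mask))
ip4_dport = ClassifyTable("IPv4PROTO_TCPDPORT", bytes(dport_mask),
                          next_table=ip4_proto)
ip4_proto.add_session(frame(1))         # ICMP
ip4_dport.add_session(frame(6, 22))     # TCP, destination port 22

for pkt in (frame(6, 22), frame(1), frame(6, 80)):
    print(ip4_dport.lookup(pkt))
```

The lookup starts at the more specific table, which is why that table is the one applied to the interface and the protocol-only table is reached through next_table_index.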

Then one needs to populate the tables with the correct matches for "ICMP" and "tcp dst port 22". That can be done using the API call:

 classify_add_del_session(is_add=1, table_index=<XXXX>, match=<bytes-to-match>, hit_next_index=-1)

The "<bytes-to-match>" above would be the match of one or several vectors, corresponding to the packet contents with the desired value.

WARNING: if the "skip" is nonzero in the table configuration, the match is still the entire bitstring, without skipping any leading bytes !!!
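
The warning above can be illustrated with a small Python sketch that derives the skip and match vector counts from a full-length mask, while the session match keeps its full length. It assumes that a table created with a nonzero skip takes a mask covering only the non-skipped vectors; that detail is not stated in the text above, and the helper names are hypothetical.

```python
VEC = 16  # classifier vector size in bytes


def split_mask(full_mask):
    """Return (skip_n_vectors, match_n_vectors, trimmed_mask)."""
    assert len(full_mask) % VEC == 0 and any(full_mask)
    vectors = [full_mask[i:i + VEC] for i in range(0, len(full_mask), VEC)]
    skip = 0
    while not any(vectors[skip]):
        skip += 1                        # leading all-zero vectors are skipped
    last = max(i for i, v in enumerate(vectors) if any(v))
    return skip, last - skip + 1, b"".join(vectors[skip:last + 1])


def session_match(proto, dport=None, length=48):
    """Full-length match bytes for an IPv4 frame without IP options."""
    match = bytearray(length)
    match[23] = proto                                # IPv4 protocol field
    if dport is not None:
        match[36:38] = dport.to_bytes(2, "big")      # TCP destination port
    return bytes(match)


full_mask = bytearray(48)
full_mask[23] = 0xFF
full_mask[36:38] = b"\xff\xff"

skip, match_n, trimmed = split_mask(bytes(full_mask))
match = session_match(6, 22)                         # "tcp dst port 22"
# The match still starts at byte 0 of the packet, even though skip is nonzero.
print(skip, match_n, len(trimmed), len(match))
```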

Then one would apply the IPv4PROTO_TCPDPORT and IPv6PROTO_TCPDPORT as l2 output classify tables.

The CLI for that is set interface l2 output classify intfc <name> ip[46]-table <tableid>.

The API for this is

  classify_set_interface_l2_tables(sw_if_index=<INTFC>, ip4_table_index=<IPv4PROTO_TCPDPORT>, ip6_table_index=<IPv6PROTO_TCPDPORT>, other_table_index=-1, is_input=0)


This creates a unidirectional policy, which is fine as long as the policy in the other direction is "permit all". If it is not, mirror table entries need to be created using the same logic.

The full script showing this process in detail using the python API is at http://stdio.be/vpp/t/classifier_script_simple_policy.txt

The Java API is located in $ROOT/vpp-api/java.

References