Revision as of 09:23, 17 November 2016
VPP Security Groups
Introduction
Features are tracked in VPP-427 as they are developed.
Initial development is done on GitHub, in the ACL branch.
Requirements
- Support classifiers/filters on any interface type (bridged / routed)
- Filter on IP addresses with address mask or prefix length (IPv4 and IPv6)
- Filter on source and destination TCP/UDP port ranges
- Filter on source and destination L2 MAC addresses
- Support IPv6 with extension headers present
- Support fragmented packets and unknown transport layer headers
- Combinations of the above filters (e.g. MAC + IP)
- Filters on ingress and egress interfaces
- Stateful firewall (no application-layer filtering)
Work list
Task | Owner | Priority | Status | Description |
---|---|---|---|---|
API definition | Ole | 0 | WIP | VPP-513 |
Connection tracker | Andrew | 0 | WIP | VPP-514 |
Stateful ACLs | | 0 | | VPP-515 |
ACL policy matching node (MVP) | Andrew | 0 | Done | input output |
Direct classifier policy matching | - | | | |
Control Plane test code (new framework) | Pavel | 0 | WIP | |
Data Plane tests (performance + scale) | | 0 | | |
1. Python tests/examples -> Ole + Pavel
2a. IPv4 matching in ACL plugin -> Andrew - done.
2b. Make it “deny by default” -> Andrew - done.
3. Performance testing -> Pavel?
--- MVP ---
4a. Plumbing for stateful sessions from the ACL plugin (to be able to specify “match and track”, i.e. “permit and create the forward/return session”) -> Andrew - done.
4b. Stateful session tracking -> Andrew
5. L2 rules - TBD
6. ACL/Sessions support for L3 (routed) mode - (big)!
7. Can we implement the ACL match purely in terms of classifier tables? How expensive/(in)efficient would that be?
8. Extension header handling during the slow path lookup - easy in ACL plugin
9. Classifier match for the sessions with extension headers - currently no extension headers supported
API
API file as implemented: https://github.com/vpp-dev/vpp/blob/acl/plugins/acl-plugin/acl/acl.api
MACIP (formerly "L2") API
MACIP (renamed from "L2" to avoid confusion) is an ingress-only ACL that permits traffic based on a combination of MAC and IP address matches.
Its purpose is to prevent spoofing.
MACIP API as implemented: https://github.com/vpp-dev/vpp/blob/acl/plugins/acl-plugin/acl/acl.api#L133-L191
The API as implemented supports MAC address masks and IP prefixes. Be aware, however, that the current implementation uses chained classifier tables. Each distinct combination of MAC mask and prefix length therefore adds an extra table, with a corresponding performance cost.
These filters are evaluated on every packet, so their performance matters.
For best performance, use the exact match MAC mask (ff:ff:ff:ff:ff:ff) and the maximum prefix length (/32 for IPv4 and /128 for IPv6).
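The following minimal Python sketch makes the cost of mixing masks concrete. It counts how many chained classifier tables a set of MACIP rules would need. It assumes, as described above, that every distinct combination of MAC mask and prefix length needs its own table, and that IPv4 and IPv6 rules never share a table. `MacipRule` and `classifier_tables_needed()` are hypothetical illustrations, not the acl.api message layout.

```python
from typing import NamedTuple


class MacipRule(NamedTuple):
    """Hypothetical, simplified MACIP rule (not the acl.api message layout)."""
    is_ipv6: bool
    mac: str
    mac_mask: str
    ip_prefix: str
    prefix_len: int


def classifier_tables_needed(rules: list[MacipRule]) -> int:
    """Count distinct (address family, MAC mask, prefix length) shapes.

    Assumption: each distinct shape needs its own chained classifier table.
    """
    shapes = {(r.is_ipv6, r.mac_mask.lower(), r.prefix_len) for r in rules}
    return len(shapes)


EXACT = "ff:ff:ff:ff:ff:ff"
rules = [
    MacipRule(False, "02:00:00:00:00:01", EXACT, "192.0.2.1", 32),
    MacipRule(False, "02:00:00:00:00:02", EXACT, "192.0.2.2", 32),
    MacipRule(True, "02:00:00:00:00:01", EXACT, "2001:db8::1", 128),
    MacipRule(False, "02:00:00:00:00:00", "ff:ff:ff:00:00:00", "198.51.100.0", 24),
]
print(classifier_tables_needed(rules))  # 3
```

The first two rules use the recommended exact MAC mask and /32, so they share one table. The single OUI-masked /24 rule costs a whole extra table.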
Design and prototyping
The stateful design is being prototyped in https://github.com/vpp-dev/vpp-lua-plugin/blob/master/samples/polua.lua
The goal of this prototype was to minimize changes to the main forwarding path and to explore possible later optimizations.
Another primary design criterion is to avoid creating a separate forwarding path as far as possible.
The main idea with the stateful design is to use the L2 classifier for storing the sessions.
For this, we create two chained tables per interface per direction: TCP/UDP then ICMP, and hook them into the processing path of the packet.
If the session is not in the table, a policy check is needed. The miss_next index of the ICMP table (the last table in the chain) is therefore set to one of the policy-check nodes. There are four such nodes, one for each combination of {ingress/egress, ip4/ip6}.
Each of these nodes is very simple. It checks the policy and, if the policy permits the packet, adds the session and recirculates the packet back into the lookup. The packet then hits the session and is processed by the "fast path".
For the purposes of this document, we call the policy check path the "slow path" and the path using the established state the "fast path". This is a bit of a misnomer: the "slow path" does not actually pass packets through the box; it merely sets up the fast path and recirculates the packets so that they hit it.
Besides adding the forward flow, the slow path also sets up the mirror flow in the tables of the opposite direction if there is a policy in the reverse direction. This avoids a policy check for the return packets of the flow. The only ICMP types considered to have "return" packets are echo/echo-reply.
When ingress packet processing is done, VPP forwards the packet as usual. Then, if a policy is applied, a similar check against the flow table is done on egress in l2-output-lookup. Again, a missing session results in a redirect to a "slow path" node, which inserts a session and a return session and recirculates the packet.
This highlights a peculiarity. Suppose the policy in one direction is something other than "permit everything" and contains some deny rules. For proper functioning, a "permit everything" policy must then be applied in the opposite direction on the same interface, so that the return packets do not hit the policy lookup. However, the implementation can easily hide this from the user, so it is probably not a big problem.
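The slow-path/fast-path interplay described above can be modelled in a few lines of Python. This is a toy simulation, not the Lua prototype, and rests on these assumptions:

- one session table per (sw_if_index, direction);
- a hypothetical policy callback;
- TCP/UDP-style flow keys (the ICMP echo/echo-reply special case is not modelled);
- a mirror session that is always installed.

`SessionTables`, `check()` and `bridge()` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical flow key: (src_ip, dst_ip, proto, sport, dport)
FlowKey = tuple[str, str, int, int, int]


def reverse(key: FlowKey) -> FlowKey:
    """Key of the return packets of the same flow."""
    src, dst, proto, sport, dport = key
    return (dst, src, proto, dport, sport)


@dataclass
class SessionTables:
    """One session table per (sw_if_index, direction), as in the prototype."""
    policy: Callable[[int, str, FlowKey], bool]
    tables: dict[tuple[int, str], set[FlowKey]] = field(default_factory=dict)
    slow_path_hits: int = 0

    def check(self, sw_if_index: int, direction: str, key: FlowKey) -> bool:
        table = self.tables.setdefault((sw_if_index, direction), set())
        if key in table:                     # fast path: existing session
            return True
        self.slow_path_hits += 1             # miss -> policy-check node
        if not self.policy(sw_if_index, direction, key):
            return False                     # drop
        table.add(key)                       # forward session
        opposite = "out" if direction == "in" else "in"
        self.tables.setdefault((sw_if_index, opposite), set()).add(reverse(key))
        return key in table                  # "recirculate": now a session hit

    def bridge(self, rx_if: int, tx_if: int, key: FlowKey) -> bool:
        """Ingress check on rx_if, then (after forwarding) egress check on tx_if."""
        return self.check(rx_if, "in", key) and self.check(tx_if, "out", key)


def ssh_only(sw_if_index: int, direction: str, key: FlowKey) -> bool:
    """Hypothetical policy: permit TCP to port 22, deny everything else."""
    return key[2] == 6 and key[4] == 22


st = SessionTables(ssh_only)
req: FlowKey = ("192.0.2.1", "192.0.2.2", 6, 40000, 22)
print(st.bridge(1, 2, req))           # True (two slow-path visits)
print(st.bridge(2, 1, reverse(req)))  # True (hits the mirror sessions)
print(st.slow_path_hits)              # 2
print(sum(len(t) for t in st.tables.values()))  # 4
```

The reply is never checked against the policy, which would have denied it (its destination port is 40000). After one connection, the model holds four session entries: a forward and a mirror entry on each interface.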
There are, however, some more significant shortcomings:
1) Not very frugal with memory. With policies applied, each connection consumes 4 session slots. How bad is that? It is certainly not very cache-friendly. On the other hand, this kind of approach could in the future handle even features that change the packet.
An idea for a possible optimization: use the same tables for in and out on the same interface, and insert two sessions. This might be questionable from a security standpoint in some corner cases.
2) No TCP state tracking and no UDP timeout tracking.
3) No cleanup at all for the classifier tables; only additions are performed. This MUST be taken care of and is TBD. Note that this is intentionally separate from (2), because it also covers scenarios such as simple high resource utilization.
4) No support for IPv6 extension headers or IPv4 fragments. This is a general issue with using the "simple" bitmask/match type of classifier, and the solution is so far TBD.
5) Two policy checks instead of one.
An idea for a possible optimization: have the ingress path merely set a "need to check the policy" flag, and perform all the policy checking on egress, including the flow. This combines both policy lookups. If the packets are guaranteed never to be translated, this can be a viable strategy. However, it means that activating an ingress policy MUST trigger the egress check on all the interfaces within the forwarding domain. And if the egress interface has a deny-something rule in the inbound direction, a reverse flow check must still be done.
A possible optimized implementation that addresses (1) and (5) to some extent:
1. Perform a flow check on ingress.
2. If the flow exists, retrieve the egress interface from the flow record, and mark the packet "check the TX interface before output".
3. If the flow does not exist, mark the packet "check the policy", and forward it as usual.
4. On egress, if "check the TX interface" is set, verify that the previously saved TX interface matches. If "check the policy" is set, check the policy and drop the packet if the policy does not permit it. Otherwise, create an inbound flow using mirrored information from the packet, and send the packet along.
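The optimized sequence above can be sketched as follows. This is a hypothetical model, not VPP code. The two marks are modelled as fields on a `Packet` object, and the egress policy is a callback. Following the text literally, only the mirrored (inbound) flow is created on egress. In this sketch, further packets in the forward direction therefore go through the policy check again.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

FlowKey = tuple[str, str, int, int, int]  # hypothetical (src, dst, proto, sport, dport)


def reverse(key: FlowKey) -> FlowKey:
    src, dst, proto, sport, dport = key
    return (dst, src, proto, dport, sport)


@dataclass
class Packet:
    key: FlowKey
    rx_if: int
    check_tx_if: Optional[int] = None  # "check the TX interface before output"
    check_policy: bool = False         # "check the policy"


@dataclass
class OptimizedPath:
    # Hypothetical policy callback, evaluated once, on egress.
    policy: Callable[[int, FlowKey], bool]
    # Flow table: flow key -> TX sw_if_index saved in the flow record.
    flows: dict[FlowKey, int] = field(default_factory=dict)

    def ingress(self, pkt: Packet) -> None:
        tx_if = self.flows.get(pkt.key)
        if tx_if is not None:
            pkt.check_tx_if = tx_if    # flow exists: remember expected TX interface
        else:
            pkt.check_policy = True    # no flow: defer the policy check to egress

    def egress(self, pkt: Packet, tx_if: int) -> bool:
        """Return True if the packet may be sent out of tx_if."""
        if pkt.check_tx_if is not None:
            return pkt.check_tx_if == tx_if
        if pkt.check_policy:
            if not self.policy(tx_if, pkt.key):
                return False           # drop
            # Inbound flow from mirrored packet info: return traffic arriving
            # on tx_if is expected to leave via the original rx_if.
            self.flows[reverse(pkt.key)] = pkt.rx_if
        return True


def ssh_only(tx_if: int, key: FlowKey) -> bool:
    return key[2] == 6 and key[4] == 22


path = OptimizedPath(ssh_only)
req = Packet(("192.0.2.1", "192.0.2.2", 6, 40000, 22), rx_if=1)
path.ingress(req)
print(req.check_policy, path.egress(req, tx_if=2))  # True True
rep = Packet(reverse(req.key), rx_if=2)
path.ingress(rep)
print(rep.check_tx_if, path.egress(rep, tx_if=1))   # 1 True
```

Only one flow entry exists per connection here, and each packet sees at most one policy lookup. This is the intended effect on shortcomings (1) and (5).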
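As an example of how the prototype works today, below is a trace of an ICMP flow permitted by the policy. It covers the first echo request/reply pair (packets 1 and 2) and the second pair (packets 3 and 4). Only packet 1 visits the slow-path nodes (lua-polua-ip4-input and lua-polua-ip4-output); the other packets hit existing sessions.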
------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:12:27:661397: af-packet-input
  af_packet: hw_if_index 1 next-index 1
    tpacket2_hdr:
      status 0x20000001 len 98 snaplen 98 mac 66 net 80
      sec 0x5804fd0a nsec 0x2660d6db vlan 0
00:12:27:661438: ethernet-input
  IP4: 7a:01:9a:05:7b:b7 -> ca:52:50:fb:e5:82
00:12:27:661449: l2-input
  l2-input: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7
00:12:27:661455: l2-input-classify
  l2-classify: sw_if_index 1, table 18, offset 0, next 16
00:12:27:661462: lua-polua-ip4-input
  LUA_plugin: sw_if_index 1, next index 1
00:12:27:661592: l2-input-classify
  l2-classify: sw_if_index 1, table 18, offset c0, next 11
00:12:27:661595: l2-learn
  l2-learn: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7 bd_index 1
00:12:27:661602: l2-fwd
  l2-fwd: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7 bd_index 1
00:12:27:661607: l2-output
  l2-output: sw_if_index 2 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7
00:12:27:661611: l2-output-classify
  l2-classify: sw_if_index 2, table 22, offset 0, next 6
00:12:27:661616: lua-polua-ip4-output
  LUA_plugin: sw_if_index 1, next index 1
00:12:27:661718: l2-output-classify
  l2-classify: sw_if_index 2, table 22, offset c0, next 9
00:12:27:661727: host-s0_s2-output
  host-s0_s2
  IP4: 7a:01:9a:05:7b:b7 -> ca:52:50:fb:e5:82
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x37d5
    fragment id 0x7ed0, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x8b86

Packet 2

00:12:27:661784: af-packet-input
  af_packet: hw_if_index 2 next-index 1
    tpacket2_hdr:
      status 0x20000001 len 98 snaplen 98 mac 66 net 80
      sec 0x5804fd0a nsec 0x2660d6db vlan 0
00:12:27:661790: ethernet-input
  IP4: ca:52:50:fb:e5:82 -> 7a:01:9a:05:7b:b7
00:12:27:661795: l2-input
  l2-input: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82
00:12:27:661798: l2-input-classify
  l2-classify: sw_if_index 2, table 26, offset c0, next 11
00:12:27:661802: l2-learn
  l2-learn: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82 bd_index 1
00:12:27:661803: l2-fwd
  l2-fwd: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82 bd_index 1
00:12:27:661807: l2-output
  l2-output: sw_if_index 1 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82
00:12:27:661809: host-s0_s1-output
  host-s0_s1
  IP4: ca:52:50:fb:e5:82 -> 7a:01:9a:05:7b:b7
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x9113
    fragment id 0x6592
  ICMP echo_reply checksum 0x9386

Packet 3

00:12:28:663937: af-packet-input
  af_packet: hw_if_index 1 next-index 1
    tpacket2_hdr:
      status 0x20000001 len 98 snaplen 98 mac 66 net 80
      sec 0x5804fd0b nsec 0x275a0081 vlan 0
00:12:28:664014: ethernet-input
  IP4: 7a:01:9a:05:7b:b7 -> ca:52:50:fb:e5:82
00:12:28:664030: l2-input
  l2-input: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7
00:12:28:664040: l2-input-classify
  l2-classify: sw_if_index 1, table 18, offset c0, next 11
00:12:28:664053: l2-learn
  l2-learn: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7 bd_index 1
00:12:28:664060: l2-fwd
  l2-fwd: sw_if_index 1 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7 bd_index 1
00:12:28:664071: l2-output
  l2-output: sw_if_index 2 dst ca:52:50:fb:e5:82 src 7a:01:9a:05:7b:b7
00:12:28:664078: l2-output-classify
  l2-classify: sw_if_index 2, table 22, offset c0, next 9
00:12:28:664086: host-s0_s2-output
  host-s0_s2
  IP4: 7a:01:9a:05:7b:b7 -> ca:52:50:fb:e5:82
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x3797
    fragment id 0x7f0e, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc145

Packet 4

00:12:28:664164: af-packet-input
  af_packet: hw_if_index 2 next-index 1
    tpacket2_hdr:
      status 0x20000001 len 98 snaplen 98 mac 66 net 80
      sec 0x5804fd0b nsec 0x275a0081 vlan 0
00:12:28:664172: ethernet-input
  IP4: ca:52:50:fb:e5:82 -> 7a:01:9a:05:7b:b7
00:12:28:664179: l2-input
  l2-input: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82
00:12:28:664184: l2-input-classify
  l2-classify: sw_if_index 2, table 26, offset c0, next 11
00:12:28:664191: l2-learn
  l2-learn: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82 bd_index 1
00:12:28:664193: l2-fwd
  l2-fwd: sw_if_index 2 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82 bd_index 1
00:12:28:664198: l2-output
  l2-output: sw_if_index 1 dst 7a:01:9a:05:7b:b7 src ca:52:50:fb:e5:82
00:12:28:664201: host-s0_s1-output
  host-s0_s1
  IP4: ca:52:50:fb:e5:82 -> 7a:01:9a:05:7b:b7
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x904b
    fragment id 0x665a
  ICMP echo_reply checksum 0xc945
CLI
set interface input acl intfc <int> [ip4-table <index>] [ip6-table <index>] [l2-table <index>] [del]
show inacl type [ip4|ip6|l2]
classify table [miss-next|l2-miss_next|acl-miss-next <next_index>] mask <mask-value> buckets <nn> [skip <n>] [match <n>] [del]
show classify tables [index <nn>]
classify session [hit-next|l2-hit-next|acl-hit-next <next_index>|policer-hit-next <policer_name>] table-index <nn> match [hex] [l2] [l3 ip4] [opaque-index <index>]
test classify [src <ip>] [sessions <nn>] [buckets <nn>] [table <nn>] [del]
set ip classify intfc <int> table-index <index>
set interface ip6 table <intfc> <table-id>
set interface l2 input classify intfc <interface-name> [ip4-table <n>] [ip6-table <n>] [other-table <n>]
set interface l2 output classify intfc <interface-name> [ip4-table <n>] [ip6-table <n>] [other-table <n>]
set ip source-and-port-range-check
show ip source-and-port-range-check vrf <nn> <ip-addr> <port>
Examples
YANG model
Open Issues
- Security-group-specific API: should it be done in VPP or in a control plane plugin?
Existing functionality
The existing functionality provides classifier matching (see https://wiki.fd.io/view/VPP/Introduction_To_N-tuple_Classifiers).
As that document explains, the classifier is a series of chained tables. Each table has a single mask, which is shared by all entries in that table.
This has been tested to work in the L2 bridged case (test case: http://stdio.be/vpp/t/aytest-bridge-tap-py.txt).
Consider, for example, the following policy:
nova secgroup-create test-secgroup test
nova secgroup-add-rule test-secgroup icmp -1 -1 0.0.0.0/0
nova secgroup-add-rule test-secgroup tcp 22 22 0.0.0.0/0
Assuming we match from offset 0 (the beginning of the packet), the mask for the first rule (ICMP) looks like this:
000000000000 000000000000 0000 00  00 0000 0000 0000 00  FF 0000 00000000 00000000 00 00 0000 0000
eth dst      eth src      et   ihl t  len  id   fo   ttl pr cs   ip4src   ip4dst   t  c  cs   id
+-------- L2 -----------------+----------- L3 IPv4 -------------------------------+--- L4 ICMP ---+
For matching TCP destination port 22, it looks as follows:
000000000000 000000000000 0000 00  00 0000 0000 0000 00  FF 0000 00000000 00000000 0000 FFFF 00000000 00000000 0000 0000 0000 0000
eth dst      eth src      et   ihl t  len  id   fo   ttl pr cs   ip4src   ip4dst   sp   dp   seq      ack      fl   win  cs   urg
+-------- L2 -----------------+----------- L3 IPv4 -------------------------------+--- L4 TCP ------------------------------------+
(The number of bytes must be rounded up to the nearest 16-byte boundary that covers the masked fields, since the classifier works on 16-byte vectors.)
For IPv6, assuming no extension headers, the mask looks similar, with the IPv6 header as the L3 header:
000000000000 000000000000 0000 0 00 00000 0000 FF 00 00000000000000000000000000000000 00000000000000000000000000000000 00 00 0000 0000
eth dst      eth src      et   v TC fll   len  nh hl ipv6 src                         ipv6 dst                         t  c  cs   id
+-------- L2 -----------------+----------- L3 IPv6 -------------------------------------------------------------------+--- L4 ICMP ---+
For matching TCP destination port 22, it looks as follows:
000000000000 000000000000 0000 0 00 00000 0000 FF 00 00000000000000000000000000000000 00000000000000000000000000000000 0000 FFFF 00000000 00000000 0000 0000 0000 0000
eth dst      eth src      et   v TC fll   len  nh hl ipv6 src                         ipv6 dst                         sp   dp   seq      ack      fl   win  cs   urg
+-------- L2 -----------------+----------- L3 IPv6 -------------------------------------------------------------------+--- L4 TCP ------------------------------------+
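The mask bytes above follow directly from fixed header offsets. This minimal Python sketch builds the four masks and the corresponding match_n_vectors values. It assumes an untagged Ethernet frame, an IPv4 header without options, and an IPv6 header without extension headers. `build_mask()` is a hypothetical helper. Cutting the mask after the last masked byte, rather than after the full L4 header drawn above, is a choice of this sketch: trailing all-zero mask bytes match anything anyway.

```python
VECTOR = 16                    # classifier masks/matches are whole 16-byte vectors

ETH_LEN = 14                   # untagged Ethernet header
IP4_PROTO = ETH_LEN + 9        # IPv4 "protocol" byte
IP6_NH = ETH_LEN + 6           # IPv6 "next header" byte
IP4_L4 = ETH_LEN + 20          # L4 header start, assuming IPv4 IHL = 5
IP6_L4 = ETH_LEN + 40          # L4 header start, assuming no IPv6 extension headers
TCP_DPORT = 2                  # destination port offset inside the TCP header


def build_mask(fields: list[tuple[int, int]]) -> bytes:
    """0xFF over each (offset, size) field, zero elsewhere, cut after the last
    masked byte and padded up to a whole number of 16-byte vectors."""
    end = max(off + size for off, size in fields)
    mask = bytearray(-(-end // VECTOR) * VECTOR)   # ceil(end / 16) vectors
    for off, size in fields:
        mask[off:off + size] = b"\xff" * size
    return bytes(mask)


def match_n_vectors(mask: bytes) -> int:
    return len(mask) // VECTOR


masks = {
    "IPv4PROTO": build_mask([(IP4_PROTO, 1)]),
    "IPv4PROTO_TCPDPORT": build_mask([(IP4_PROTO, 1), (IP4_L4 + TCP_DPORT, 2)]),
    "IPv6PROTO": build_mask([(IP6_NH, 1)]),
    "IPv6PROTO_TCPDPORT": build_mask([(IP6_NH, 1), (IP6_L4 + TCP_DPORT, 2)]),
}
for name, mask in masks.items():
    print(name, match_n_vectors(mask))
# Prints:
# IPv4PROTO 2
# IPv4PROTO_TCPDPORT 3
# IPv6PROTO 2
# IPv6PROTO_TCPDPORT 4
```

`mask.hex()` gives the string form used for the mask argument in the table creation call.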
Using these masks, one would then create four tables with the API call:
classify_add_del_table(is_add=1, skip_n_vectors=0, mask=<MMMM>, match_n_vectors=<NNNN>, nbuckets=32, memory_size=20000, next_table_index=-1, miss_next_index=-1)
Let's call these tables "IPv4PROTO", "IPv4PROTO_TCPDPORT", "IPv6PROTO", "IPv6PROTO_TCPDPORT".
One would set "IPv4PROTO" as the next_table_index of "IPv4PROTO_TCPDPORT", and "IPv6PROTO" as the next_table_index of "IPv6PROTO_TCPDPORT".
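The effect of chaining can be illustrated with a toy model in Python. Each table applies its own single mask, and on a miss the lookup continues in the next table. This is a simplified sketch of the semantics described above, not the VPP classifier: it has no hashing, buckets or skip. `ClassifyTable`, `chain_lookup()` and `frame()` are hypothetical names. A miss of the whole chain simply returns None here, standing in for the configured miss action.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassifyTable:
    """Toy classifier table: one mask shared by every session in the table."""
    name: str
    mask: bytes
    next_table: Optional["ClassifyTable"] = None               # next_table_index
    sessions: dict[bytes, str] = field(default_factory=dict)   # match -> action

    def key(self, packet: bytes) -> bytes:
        """AND the packet (zero-padded to the mask length) with the mask."""
        data = packet[:len(self.mask)].ljust(len(self.mask), b"\x00")
        return bytes(p & m for p, m in zip(data, self.mask))


def chain_lookup(table: Optional[ClassifyTable], packet: bytes) -> Optional[str]:
    """Try each table in turn, following next_table on a miss."""
    while table is not None:
        action = table.sessions.get(table.key(packet))
        if action is not None:
            return f"{table.name}:{action}"
        table = table.next_table
    return None  # the whole chain missed


def frame(proto: int, dport: int = 0) -> bytes:
    """Hypothetical 64-byte Ethernet + IPv4 frame with only proto/dport set."""
    pkt = bytearray(64)
    pkt[23] = proto                          # IPv4 protocol (14 + 9)
    pkt[36:38] = dport.to_bytes(2, "big")    # L4 destination port (14 + 20 + 2)
    return bytes(pkt)


proto_mask = bytearray(32)
proto_mask[23] = 0xFF
dport_mask = bytearray(48)
dport_mask[23] = 0xFF
dport_mask[36:38] = b"\xff\xff"

ipv4proto = ClassifyTable("IPv4PROTO", bytes(proto_mask))
ipv4proto_tcpdport = ClassifyTable("IPv4PROTO_TCPDPORT", bytes(dport_mask),
                                   next_table=ipv4proto)
ipv4proto.sessions[ipv4proto.key(frame(1))] = "permit"                        # ICMP
ipv4proto_tcpdport.sessions[ipv4proto_tcpdport.key(frame(6, 22))] = "permit"  # TCP/22

print(chain_lookup(ipv4proto_tcpdport, frame(6, 22)))  # IPv4PROTO_TCPDPORT:permit
print(chain_lookup(ipv4proto_tcpdport, frame(1)))      # IPv4PROTO:permit
print(chain_lookup(ipv4proto_tcpdport, frame(6, 80)))  # None
```

The ICMP packet misses the port table because its masked key differs, then matches in IPv4PROTO. This is why the more specific table is the head of the chain.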
Then the tables need to be populated with the correct matches for "ICMP" and "tcp dst port 22". That can be done using the API call:
classify_add_del_session(is_add=1, table_index=<XXXX>, match=<bytes-to-match>, hit_next_index=-1)
Here <XXXX> is the index of the table. The <bytes-to-match> value is one or several 16-byte vectors, corresponding to the packet contents with the desired values in the masked positions.
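The following minimal sketch builds the <bytes-to-match> strings for the ICMP and "tcp dst port 22" sessions. It assumes the same header offsets as the masks above: untagged Ethernet, IPv4 without options, and IPv6 without extension headers. It also assumes masks cut after the last masked byte, which gives 2, 3, 2 and 4 vectors for the four tables. `build_match()` is hypothetical. The match always starts at packet offset 0, and IPv6 ICMP uses next header 58 (ICMPv6).

```python
VECTOR = 16


def build_match(n_vectors: int, values: dict[int, bytes]) -> bytes:
    """Match string of n_vectors * 16 bytes counted from packet offset 0,
    with each value written at its byte offset and zeros elsewhere."""
    match = bytearray(n_vectors * VECTOR)
    for off, val in values.items():
        if off + len(val) > len(match):
            raise ValueError(f"field at offset {off} does not fit in {n_vectors} vectors")
        match[off:off + len(val)] = val
    return bytes(match)


PORT_22 = (22).to_bytes(2, "big")
matches = {
    "IPv4PROTO": build_match(2, {23: bytes([1])}),                        # ICMP
    "IPv4PROTO_TCPDPORT": build_match(3, {23: bytes([6]), 36: PORT_22}),  # TCP dport 22
    "IPv6PROTO": build_match(2, {20: bytes([58])}),                       # ICMPv6
    "IPv6PROTO_TCPDPORT": build_match(4, {20: bytes([6]), 56: PORT_22}),  # TCP dport 22
}
for name, match in matches.items():
    print(name, match.hex())
```

Each match is simply the expected packet bytes ANDed with the table's mask. A value that does not fit within the table's match_n_vectors is rejected rather than silently extending the string.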
WARNING: even when "skip" is nonzero in the table configuration, the match is still the entire bitstring, without skipping any leading bytes!
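Then one would apply IPv4PROTO_TCPDPORT and IPv6PROTO_TCPDPORT as L2 classify tables on the interface. The CLI and API below apply them in the output direction; for the input direction, use "set interface l2 input classify" or is_input=1.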
The CLI for that is set interface l2 output classify intfc <name> ip[46]-table <tableid>.
The API for this is
classify_set_interface_l2_tables(sw_if_index=<INTFC>, ip4_table_index=<IPv4PROTO_TCPDPORT>, ip6_table_index=<IPv6PROTO_TCPDPORT>, other_table_index=-1, is_input=0)
This creates a unidirectional policy, which is sufficient if the policy in the other direction is "permit all".
If it is not, mirror table entries need to be created using the same logic.
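For a stateless policy like this one, a mirror entry swaps the source and destination fields. "tcp dst port 22" on the way out becomes "tcp src port 22" in the other direction. That also means a different mask, and therefore a separate table. This minimal Python sketch assumes untagged Ethernet and IPv4 without options. `mirror_fields()` and the offset constants are hypothetical.

```python
ETH_LEN = 14
IP4_SRC, IP4_DST = ETH_LEN + 12, ETH_LEN + 16          # IPv4 addresses
L4_SPORT, L4_DPORT = ETH_LEN + 20, ETH_LEN + 20 + 2    # TCP/UDP ports (IHL = 5)

SWAP = {IP4_SRC: IP4_DST, IP4_DST: IP4_SRC, L4_SPORT: L4_DPORT, L4_DPORT: L4_SPORT}


def mirror_fields(fields: dict[int, bytes]) -> dict[int, bytes]:
    """Move source fields to the destination offsets and vice versa; other
    fields (such as the protocol byte) stay where they are."""
    return {SWAP.get(off, off): value for off, value in fields.items()}


forward = {ETH_LEN + 9: bytes([6]), L4_DPORT: (22).to_bytes(2, "big")}  # tcp dport 22
mirrored = mirror_fields(forward)
print(sorted(mirrored))  # [23, 34]
```

Applying the same function to the mask fields ({23: b"\xff", 36: b"\xff\xff"}) gives the mask for the mirror table.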
The full script showing this process in detail using the python API is at http://stdio.be/vpp/t/classifier_script_simple_policy.txt
The Java API is located in $ROOT/vpp-api/java.