Restart Management (RM)
Here we describe the restart scenarios one might expect a system to support, the characteristics of VPP that need to be accounted for, and a resulting recommendation for using VPP in an HA-aware system.
We will use the term ‘state’ to refer to all or any configuration that the client programmes in VPP - this can be interfaces, routes, ACL, or anything else.
Mark-n-sweep is an algorithm where a user of a service declares to that service that all state that it has previously programmed is now ‘stale’, and the service sets a stale flag on all of that user’s state. The user then proceeds to download new state afresh, and the service clears the stale flag from any state that is updated. Once the user has completed its programming (i.e. it has converged) it issues a converged notification to the service, and the service will sweep (i.e. delete) all state that is still marked stale.
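The mark-n-sweep steps above can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the class `MarkSweepService`, its method names and the route strings are all hypothetical and are not part of any VPP API.

```python
class MarkSweepService:
    """Toy service holding one user's state as key -> [value, stale flag]."""

    def __init__(self):
        self._state = {}

    def update(self, key, value):
        # Any entry that is (re)programmed is fresh, so its stale flag is clear.
        self._state[key] = [value, False]

    def mark(self):
        # The user declares everything it programmed so far to be stale.
        for entry in self._state.values():
            entry[1] = True

    def sweep(self):
        # Converged notification: delete whatever is still marked stale.
        stale = [key for key, (_, is_stale) in self._state.items() if is_stale]
        for key in stale:
            del self._state[key]
        return sorted(stale)

    def keys(self):
        return sorted(self._state)


svc = MarkSweepService()
svc.update("route 10.0.0.1/32", "via eth0")
svc.update("route 10.0.0.2/32", "via eth0")

svc.mark()                                    # user restarts its download
svc.update("route 10.0.0.2/32", "via eth1")   # refreshed, flag cleared
svc.update("route 10.0.0.3/32", "via eth1")   # new state
removed = svc.sweep()                         # user has converged
```

After the sweep, only the route that was never refreshed (10.0.0.1/32) has been removed.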
We will use the term agent to refer to an application that programmes VPP over the binary API. The agent receives the necessary information to programme VPP from the ‘control-plane’. The agent and the control-plane are separate entities whose lifetimes are not linked.
As has been repeated often on the public mailer, the CLI is not intended nor supported for use in a production system, so we do not consider it here.
Scenario 1 - VPP restart
In the event that VPP restarts, it is necessary for the agent to download all of the state it knows about to VPP. This state comprises two types: state that was present and correct in VPP at the moment VPP restarted, and new state that the agent learned whilst VPP was down. We’ll call this action ‘replay’. Note that this replay must be in the same dependency order as the state was originally added, e.g. one cannot bind an ACL to an interface that does not exist.
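One way to obtain a valid replay order is a topological sort over the object dependencies. The sketch below uses Python's standard-library `graphlib` module (Python 3.9 or later). The object names and the `deps` mapping are made up for illustration, and a real agent would issue binary API calls rather than build a list.

```python
from graphlib import TopologicalSorter

# Hypothetical objects, each mapped to the set of objects it depends on.
deps = {
    "interface eth0": set(),
    "acl web": set(),
    "acl-binding web@eth0": {"interface eth0", "acl web"},
    "route 10.0.0.0/24 via eth0": {"interface eth0"},
}


def replay_order(dependencies):
    """Return the objects so that each one follows everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


order = replay_order(deps)
```

Whatever order the sorter picks among independent objects, the ACL binding always comes after both the interface and the ACL it refers to.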
Scenario 2 - Agent restart
If the agent restarts one should not simply restart VPP. In order to be highly available, VPP, and so the data-plane, should stay up and continue to forward traffic using the state that was present the moment the agent crashed. This is commonly referred to as non-stop forwarding (NSF). The clear benefit here is that although the system as a whole cannot accept new state whilst the agent is down, existing state (i.e. connections to other devices) continues to function. Again there are two types of state a newly restarted agent must consider: firstly, the state that exists in VPP (the old set) and secondly, the new set of state that is ‘replayed’ to it by the control-plane. The new set can differ from the old set through the addition or removal of state. The agent therefore must perform a state reconciliation. As an example, consider a /32 route that was present in the agent and VPP at the moment the agent crashed. Whilst the agent was down the control-plane removed the /32. When the agent restarts it will not relearn the /32 but must remove it from VPP.
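The reconciliation amounts to a difference between the old and new sets. The following is a minimal sketch under the assumption that both sets can be represented as dictionaries keyed by object. The `reconcile` function and the route values are hypothetical, and the "changed" case (same key, different value) is an extension beyond the addition and removal cases described above.

```python
def reconcile(old, new):
    """Compare state dumped from VPP (old) with state replayed by the control-plane (new)."""
    to_add = sorted(key for key in new if key not in old)
    to_change = sorted(key for key in new if key in old and old[key] != new[key])
    to_delete = sorted(key for key in old if key not in new)
    return to_add, to_change, to_delete


# The /32 was in VPP when the agent crashed, but the control-plane withdrew it
# whilst the agent was down, so it is absent from the replayed set.
old = {"10.0.0.1/32": "via eth0", "10.1.0.0/16": "via eth0"}
new = {"10.1.0.0/16": "via eth1", "10.2.0.0/16": "via eth1"}

to_add, to_change, to_delete = reconcile(old, new)
```

Here the /32 ends up in `to_delete`, which is the removal the restarted agent must perform in VPP.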
The VPP binary API supports multiple clients (and hence multiple agents in the terms defined here), however, no state in VPP has the concept of being owned by a particular agent. Indeed in many cases state has no sense of ownership at all, so state added via the API can be removed on the CLI (FIB routes are an exception).
VPP does not support the paradigm of mark-n-sweep.
VPP does support a programmatic interface to query all its existing state; these are commonly referred to as the dump APIs.
There is a single agent that:
1) uses the binary API to programme VPP
2) has a north-bound interface to receive updates from the control-plane
No other applications use the binary API (at least not to add state).
This agent maintains a final-state database (DB) in non-persistent memory. Under normal operations the agent’s DB will match VPP’s. This database must maintain object (or at least object-type) dependencies.
The agent polls VPP over the binary API for liveness detection.
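A simple way to turn poll results into a liveness decision is to count consecutive missed replies. The sketch below assumes a threshold of three missed polls, which is an arbitrary choice. `LivenessMonitor` is a hypothetical helper, and the actual poll over the binary API is left out: only its outcome (replied or not) is fed in.

```python
class LivenessMonitor:
    """Declare VPP dead after a run of consecutive unanswered polls."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed
        self.missed = 0

    def record(self, replied):
        """Feed the outcome of one poll; return True while VPP is considered alive."""
        self.missed = 0 if replied else self.missed + 1
        return self.missed < self.max_missed


monitor = LivenessMonitor(max_missed=3)
outcomes = [True, False, True, False, False, False]
alive = [monitor.record(outcome) for outcome in outcomes]
```

A single missed poll followed by a reply resets the counter, so only the final run of three misses marks VPP as dead.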
Supporting scenario 1
When the agent detects that VPP is dead, because it no longer responds to polls, the agent can disconnect and reconnect to the binary API. The method by which VPP is restarted is beyond the scope of this article; we’ll just assume it is automatic. Once the connection is re-established, the agent must download all the state in its database in dependency order (e.g. interfaces first, then ACLs, then tables, then routes, etc).
Whilst VPP is down the agent must continue to accept updates from the control plane and add them to its database so they will be present in the replay.
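The behaviour described for scenario 1 can be sketched as an agent that always records updates in its DB, only pushes them to VPP while connected, and replays the whole DB by object type on reconnect. Everything here is illustrative: the `Agent` class, the `TYPE_ORDER` list and the `sent` list (a stand-in for binary API calls) are assumptions, not an existing interface.

```python
# Assumed object-type dependency order, following the example in the text.
TYPE_ORDER = ["interface", "acl", "table", "route"]


class Agent:
    def __init__(self):
        self.db = {}        # (type, name) -> config; final state only
        self.vpp_up = True
        self.sent = []      # stand-in for calls made over the binary API

    def update(self, obj_type, name, config):
        # Always record the update, even when VPP is down.
        self.db[(obj_type, name)] = config
        if self.vpp_up:
            self.sent.append((obj_type, name))

    def vpp_died(self):
        self.vpp_up = False

    def vpp_reconnected(self):
        # VPP came back empty: replay the whole DB in type-dependency order.
        self.vpp_up = True
        self.sent = []
        rank = {obj_type: i for i, obj_type in enumerate(TYPE_ORDER)}
        for key in sorted(self.db, key=lambda k: (rank[k[0]], k[1])):
            self.sent.append(key)


agent = Agent()
agent.update("interface", "eth0", {"admin": "up"})
agent.vpp_died()
agent.update("route", "10.0.0.0/24", {"via": "eth0"})   # learned whilst VPP is down
agent.update("acl", "web", {"permit": "tcp/80"})        # learned whilst VPP is down
agent.vpp_reconnected()
```

The replay contains both the state VPP had before it died and the state learned in the meantime, ordered by type rather than by arrival.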
Supporting Scenario 2
When the agent restarts, its first action must be to connect to VPP and retrieve (or dump) all of VPP’s state. This state is used to populate the agent’s DB and is marked as stale. At this point the agent starts a ‘boot’ timer. From this point on the agent begins to accept messages from the control plane (i.e. the control plane performs a ‘replay’). All state that the control-plane updates has its stale flag cleared within the agent’s DB. If the agent is capable of issuing a ‘converged’ notification then the agent’s boot timer is not required, otherwise on expiry of the boot timer the agent will walk its DB and delete (from the DB and from VPP) any state that is still marked stale. You’ll note the similarity of this algorithm to the mark-n-sweep algorithm mentioned in the introduction. This scenario explains why it is recommended to have only one agent. If two agents were to perform this action in parallel, then they would each remove the other’s state and VPP would be left with nothing.
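The two-agent failure mode is easy to reproduce with plain sets. In this sketch VPP is modelled as a set of object names, and `restart_and_sweep` is a hypothetical helper that returns what one restarting agent would delete. Because the dump carries no ownership information, each agent treats the other's state as stale.

```python
def restart_and_sweep(vpp_state, learned):
    """Return the objects one restarting agent would delete from VPP."""
    dumped = set(vpp_state)          # the dump has no notion of ownership
    return dumped - set(learned)     # anything not replayed to this agent is stale


vpp = {"route A", "route B"}

# Both agents dump before either one sweeps, i.e. they run in parallel.
stale_1 = restart_and_sweep(vpp, learned={"route A"})
stale_2 = restart_and_sweep(vpp, learned={"route B"})

vpp -= stale_1   # agent 1 removes agent 2's route
vpp -= stale_2   # agent 2 removes agent 1's route
```

At the end `vpp` is empty, even though each agent only removed state it believed to be stale.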
The agent represents a single, queryable oracle of truth within the system. It can disseminate information, using your favourite message bus protocol, to other applications that constitute the larger system so that, for example, they can read stats information for VPP objects.
It is not recommended to use the agent to poll for stats, since the amount of data stats produces can severely impair the throughput of programming in the agent.
The VPP Object Model (VOM) is a shared library written in C++ that can be linked into an agent to provide the functions described above. VOM offers a desired-state model to its clients; this means that a client constructs an object to represent the desired state in VPP (e.g. an interface or bridge-domain) and then ‘writes’ this object into VOM’s DB. VOM will then perform the necessary ‘update’ (whether it is a modification or a create) to make that desired state programmed in VPP. VOM supports the notion of multiple ownership for all objects. If there are discrepancies in the desired state of multiply owned objects, a last-update-wins model applies. Only once an object no longer has any owners is it deleted. The state reconciliation scenario is supported because the state learned from the initial dump is ‘owned’ by a special owner named ‘boot’. Once the boot timer expires the boot owner is removed from all state and hence, if it was the only owner, that state is removed. VOM maintains a static graph of object-type dependencies to ensure replay occurs in the proper order.
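The ownership model can be illustrated with a small Python sketch. This is not VOM's C++ API: `OwnedDB`, its methods and the object keys are invented for illustration, and the sketch only mimics the described behaviour (multiple owners, last update wins, deletion once no owner remains, and a ‘boot’ owner for dumped state).

```python
class OwnedDB:
    """Toy desired-state DB in which every object tracks its set of owners."""

    def __init__(self):
        self.objects = {}   # key -> {"value": ..., "owners": set()}

    def write(self, owner, key, value):
        entry = self.objects.setdefault(key, {"value": value, "owners": set()})
        entry["value"] = value          # last update wins
        entry["owners"].add(owner)

    def remove_owner(self, owner):
        """Drop an owner everywhere; delete and return objects left with no owner."""
        deleted = []
        for key in list(self.objects):
            owners = self.objects[key]["owners"]
            owners.discard(owner)
            if not owners:
                del self.objects[key]
                deleted.append(key)
        return sorted(deleted)


db = OwnedDB()
# State learned from the initial dump is owned by 'boot'.
db.write("boot", "bridge-domain 1", "as dumped")
db.write("boot", "route 10.0.0.1/32", "as dumped")
# The control-plane replays only the bridge-domain.
db.write("control-plane", "bridge-domain 1", "as desired")

# Boot timer expiry: remove the 'boot' owner from all state.
deleted = db.remove_owner("boot")
```

The route that was never replayed had ‘boot’ as its only owner and is deleted, while the bridge-domain survives with the control-plane's value.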