Difference between revisions of "CSIT/Documentation"
(Continue fixing grammar and word choice.) |
(Continue updating the quality of the writing as I read down the page.) |
Revision as of 19:21, 23 December 2020
WORK IN PROGRESS
CSIT components
The FD.io CSIT system is developed using two main coding building blocks: RobotFramework and Python.
Robot Framework
Robot Framework (RF) is a test automation tool that lets you select and run a set of test suites and test cases on a selected target topology (nodes) and produces output logs in a readable and parseable form. RF uses column-based formatting for its files. For clarity and to avoid problems, the CSIT team has selected the "pipe and space separated" format (rather than the tab-separated format). RF is case insensitive, but we strive to be consistent in how we use upper and lower case in RF files. The real value of RF is its human readability: anyone who spends a little time reading RF source files can understand what is going on, and even non-programmers can grasp the essence of what a given test case is doing. There are only two types of RF files stored in CSIT: resource (a.k.a. library) files, and test suites (any RF file containing a test case). In resource files we store all Keywords (RF's name for functions/methods) that are generic and could be re-used. For example, NIC manipulation is going to be re-used in nearly all test suites, hence it is placed in a resource file for common reuse. All tests in RF format are stored in tests/suites/ sub-directories. RF interprets every directory there as a test suite, and every file containing test cases is likewise treated as a test suite.
Python
In CSIT, we use Python (the latest release of the 2.7.x series) to perform tasks that are unsuitable for the RF format (e.g. lower-level code, or anything that starts to feel more like coding, such as conditionals and loops). Since RF is written in Python, it integrates with Python extensions easily. In any RF file one can import Python scripts directly and reference classes and functions from the imported module. Here's a link to the RF documentation describing how to create and import Python libraries into an RF file.
The other use of Python code in the CSIT framework is in the traffic scripts. In a typical test, one has to verify whether the network data plane of VPP works as configured, and in CSIT we do that by sending crafted packets and validating them when they are received. We use the Scapy Python tool for this purpose. Scapy allows one to easily create, dissect, manipulate and pretty-print packets. Here is an example of a CSIT Scapy Python traffic script.
These scripts are re-used as much as possible, to minimize the duplication of code. Traffic scripts are run on the TG (traffic generator) node in the test topology using an SSH command that passes variables to the script as command line arguments. There is a utility method in the CSIT Python module that is used to generate command lines, with arguments set based on passed variables. This utility method is re-used as many of the parameters are common to all the scripts, such as the MAC and IP addresses to use for traffic as well as which interface to use for Transmit and Receive.
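As a rough illustration of the command-line generation described above, the sketch below builds an argument string from the parameters that are common to all traffic scripts. The function name and the option names (--tx_if, --rx_if and so on) are hypothetical and are not taken from the CSIT code base.

```python
import shlex


def build_traffic_script_args(tx_if, rx_if, src_mac, dst_mac, src_ip, dst_ip):
    """Build a command-line argument string for a traffic script.

    All option names used here are illustrative assumptions.
    """
    options = [
        ("--tx_if", tx_if),
        ("--rx_if", rx_if),
        ("--src_mac", src_mac),
        ("--dst_mac", dst_mac),
        ("--src_ip", src_ip),
        ("--dst_ip", dst_ip),
    ]
    # Quote every value so that it survives being passed through a remote shell.
    return " ".join(
        "{0} {1}".format(name, shlex.quote(str(value)))
        for name, value in options
    )


args = build_traffic_script_args(
    "eth1", "eth2",
    "02:00:00:00:00:01", "02:00:00:00:00:02",
    "192.0.2.1", "192.0.2.2")
print(args)
```

The resulting string would be appended to the script path in the SSH command that is executed on the TG node.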
All Python code submitted to CSIT should conform to the PEP-8 style. There is a Jenkins job (csit-validate-pylint) hooked to Gerrit that watches for CSIT project changes and runs Pylint against all resources/*/*.py files. A submitted patch may not increase the number of Pylint violations. The focus should be on lowering the number of warnings. In some cases it may be impossible to do so, and exceptions may be allowed, but in general one should work hard toward having clean Python code.
Starting with CSIT Robot Framework tests
pybot
RF provides a Python executable pybot that is currently the main entry point to all CSIT tests. The following documentation provides examples applicable to CSIT test suite execution, and as such should not be considered to completely cover pybot usage options. Complete information is beyond the scope of this document. Use pybot --help to see all available options or the Robot Framework User Guide for details.
Log levels
All CSIT Python libraries use the RF logging sub-system for logging. To specify the logging verbosity level, add -L LEVEL as a command line argument. For example, pybot -L TRACE ... logs everything. Logging information is then stored in the output.xml and log.html output files. The log levels supported by RF, as listed in the RF user guide, are:
- FAIL - Used when a keyword fails. Can be used only by the Robot Framework itself.
- WARN - Used to display warnings. Warnings are shown on the console and in the Test Execution Errors section of the log files, but they do not affect the test case status.
- INFO - The default level for normal messages. By default, messages below this level are not shown in the log file.
- DEBUG - Used for debugging purposes. Useful for logging what libraries are doing internally. When a keyword fails, a traceback showing where in the code the failure occurred is logged using this level automatically.
- TRACE - More detailed debugging level. The keyword arguments and return values are automatically logged at this level.
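The effect of the -L threshold can be pictured with the following sketch. It is a simplified model of level filtering based only on the list above, not RF's actual implementation; the function name is made up for illustration.

```python
# Levels ordered from least to most severe, following the list above.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "FAIL"]


def is_logged(message_level, threshold="INFO"):
    """Return True if a message at message_level passes the threshold."""
    message_rank = LEVELS.index(message_level.upper())
    threshold_rank = LEVELS.index(threshold.upper())
    return message_rank >= threshold_rank


# With the default INFO threshold, DEBUG messages are hidden.
print(is_logged("DEBUG"))
# With -L TRACE, every message is written to the log.
print(is_logged("DEBUG", "TRACE"))
```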
Test Suites / Test Cases
In CSIT one directory is used for all tests, conveniently named tests. All tests are grouped by suites: every directory is understood as a suite in RF, and the same applies to every file (CSIT exclusively uses the .robot file suffix). This can be leveraged by starting pybot -s suite_name to execute test cases only from that particular suite (wildcards are supported, just be aware of shell globbing). A more detailed explanation is available in the RF user guide.
Selecting a single test case is possible too; just use the -t name_of_testcase pybot option. You can combine the -s and -t parameters.
Tags
To help in the selection of test cases for per-patch runs, CSIT has developed a tagging scheme documented in the tag_documentation.rst file. At minimum, each CSIT test case must provide a tag from each of two groups:
- environment tag
- defines on which environment this test case can be run
- e.g. VM_ENV, HW_ENV
- topology tag
- defines what topology this test requires
- e.g. 3_NODE_SINGLE_LINK_TOPO
To run all of the test cases that can be run in a VM environment (VIRL, VMware, VirtualBox, KVM, etc.) and which require at least one link between the DUT nodes, one can type:
pybot --include VM_ENVAND3_NODE_SINGLE_LINK_TOPO
For usage guidelines of RF tags see the RF user guide.
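To make the AND-combined tag selection more concrete, here is a simplified model of how such a pattern could be evaluated against the tags of a test case. It is only a sketch: it assumes the pattern is split on the upper-case AND operator and that tags are compared case-insensitively, and it ignores the other operators and wildcards that RF supports.

```python
def matches_include(pattern, test_tags):
    """Return True if test_tags satisfy an AND-combined tag pattern.

    Simplified model: every tag named in the pattern must be present.
    A tag that itself contains "AND" would confuse this sketch.
    """
    required = [part.lower() for part in pattern.split("AND")]
    present = {tag.lower() for tag in test_tags}
    return all(tag in present for tag in required)


tags = ["VM_ENV", "3_NODE_SINGLE_LINK_TOPO"]
# Both required tags are present on the test case.
print(matches_include("VM_ENVAND3_NODE_SINGLE_LINK_TOPO", tags))
# HW_ENV is missing, so the test case would not be selected.
print(matches_include("HW_ENVAND3_NODE_SINGLE_LINK_TOPO", tags))
```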
Topology
Every CSIT test case runs on a network topology, either virtual or physical. A topology is a collection of nodes with settings (like IP addresses of nodes, login credentials, and most importantly NIC and link information). Currently there is only one type of topology being used in CSIT: a three-node topology with one or two links between each pair of nodes. See the topology diagram below.
+---------+                      +---------+
|         <---------------------->         |
|   DUT   |                      |   DUT   |
|         <---------------------->         |
+--^---^--+                      +--^----^-+
   |   |                            |    |
   |   |                            |    |
   |   |         +---------+        |    |
   |   +--------->         <--------+    |
   |             |   TG    |             |
   +------------->         <-------------+
                 +---------+
During tests all nodes are reachable through the MGMT network, connected to every node via dedicated NICs and links which, for clarity, are not shown in the diagram above. Each connection that is drawn on the diagram is potentially going to be used by test cases. It is critical that there is no traffic on these links other than what the traffic generator scripts or DUT VPPs send out. Whenever other protocol packets (e.g. LLDP or DHCP generated by NICs or surrounding systems) are present on these links, the tests may fail.
Topology information is stored in YAML format in topologies/* sub-directories. To help in early syntax error detection, we provide YAML schemas in the resources/topology_schemas sub-directory. The contents of your topology file must represent your physical or virtual lab setup. Pay special attention to login information, IP addresses and interface details.
Topology information is loaded from the YAML file, processed, and provided by the CSIT Python library as a global variable named nodes during CSIT start. You might have spotted occurrences of it in .robot files as ${nodes['TG']}, which is a reference to the dictionary value of the parsed node named TG from the topology file. The nodes value is then used with other CSIT Python libraries (e.g. SSH, DUT setup, and so on).
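In Python terms, nodes is a plain dictionary keyed by node name. The sketch below shows a hand-written stand-in for a parsed topology; the field names (type, host, interfaces, mac_address) and the addresses are assumptions made for illustration, and the authoritative structure is defined by the YAML schemas mentioned above.

```python
# Hypothetical result of parsing a topology file; real keys come from the YAML.
nodes = {
    "TG": {
        "type": "TG",
        "host": "192.0.2.10",
        "interfaces": {
            "port1": {"mac_address": "02:00:00:00:00:01"},
            "port2": {"mac_address": "02:00:00:00:00:02"},
        },
    },
    "DUT1": {
        "type": "DUT",
        "host": "192.0.2.11",
        "interfaces": {
            "port1": {"mac_address": "02:00:00:00:01:01"},
            "port2": {"mac_address": "02:00:00:00:01:02"},
        },
    },
}

# The Python equivalent of the RF expression ${nodes['TG']}.
tg = nodes["TG"]
print(tg["host"])
print(sorted(tg["interfaces"]))
```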
To specify your own topology file, pass -v TOPOLOGY_PATH:topologies/enabled/topology.yaml as a pybot command line argument, replacing topologies/enabled/topology.yaml with the path to your own topology definition. For example, to start the bridge-domain tests with a custom topology file in ~/my_topo.yaml use the following command:
pybot -L TRACE -v TOPOLOGY_PATH:~/my_topo.yaml --include vm_envAND3_node_single_link_topo -s "bridge domain"
Test execution walk-through
When pybot is executed, RF starts looking for test suites to execute. All test suites are stored in the tests/ sub-directory. RF looks recursively for files in this directory, and naturally it comes to tests/suites/__init__.robot. This file is loaded before any other suite and acts as the initialization file for all suites in the given directory. CSIT uses this file to initialize the run-time environment before we run any test cases. The setup currently consists of:
- Setup Framework - prepares each node in the topology for use by the framework for testing:
- executed exactly once, before any test case execution;
- uploads the whole CSIT directory (CSIT top level dir with contents) to every ${node}:/tmp/openvpp-testing directory for use during test;
- makes sure all dependencies are installed on Nodes if needed, and prepares Python for execution in Virtualenv;
- this last step is done in parallel for each topology node to save a significant amount of setup time.
- Setup All DUTs - prepares all DUTs for test execution:
- executed in most cases(??) / potentially(??) before any test case;
- restarts VPP instances, makes sure VPPs are up, performs common setup before test execution.
- what about initializing TG(??).
- Update All Interface Data On All Nodes:
- downloads VPP interfaces data from each DUT, and stores sw_if_index of interfaces into the topology information file(??) for later use (VPP API commands take sw_if_index as parameter instead of NIC names).
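The last step of the list above, storing sw_if_index values for later use, can be sketched as follows. This is not the CSIT implementation: the function name, the dictionary keys and the idea of matching interfaces by MAC address are all assumptions made for the sake of the example.

```python
def update_sw_if_indexes(node, vpp_interfaces):
    """Store each VPP sw_if_index in the node's interface data.

    Interfaces are matched by MAC address (an assumption of this sketch).
    """
    index_by_mac = {
        entry["mac_address"].lower(): entry["sw_if_index"]
        for entry in vpp_interfaces
    }
    for interface in node["interfaces"].values():
        mac = interface["mac_address"].lower()
        if mac in index_by_mac:
            interface["vpp_sw_index"] = index_by_mac[mac]
    return node


# Hypothetical topology data for one DUT.
dut1 = {"interfaces": {
    "port1": {"mac_address": "02:00:00:00:01:01"},
    "port2": {"mac_address": "02:00:00:00:01:02"},
}}
# Hypothetical interface data as downloaded from VPP.
vpp_dump = [
    {"sw_if_index": 5, "mac_address": "02:00:00:00:01:01"},
    {"sw_if_index": 6, "mac_address": "02:00:00:00:01:02"},
]
update_sw_if_indexes(dut1, vpp_dump)
print(dut1["interfaces"]["port1"]["vpp_sw_index"])
```

Once the index is stored next to the interface name, later keywords can accept a NIC name and look up the value that the VPP API expects.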
After executing the listed procedures, RF/pybot loads test suites and test cases based on the selection criteria given in the command line. From there onwards it is just the execution of RF test cases as per the RF documentation. Typically there are only a few RF keywords (pre-defined in RF and CSIT library(??)) listed in the suite setup and in test case setup that are executed before the actual test case, and from there on out it's just the execution and evaluation of the defined test cases.
Jenkins
The whole purpose of CSIT is to be continuous. To achieve that, all possible functional test cases have to be executed every time a patch is introduced to VPP. This is achieved by executing CSIT tests against every patch in the vpp project in Gerrit. If any test fails unexpectedly, the Gerrit review will receive a -Verified vote from the Jenkins user, rendering the review not submittable (by normal means). This approach helps to discover problems early, before VPP code is submitted to the VPP master branch. Such jobs (which are triggered by a change in a Gerrit review) are in general called verify jobs, and can be spotted by "verify" in their name (e.g. csit-verify, vpp-verify etc.). Similarly, there are merge jobs, which are triggered after a review in Gerrit is submitted to the master/parent branch. These jobs are typically used for artefact publishing to Nexus and such (e.g. VPP .deb packages).
CSIT has to have its own Jenkins jobs that help validate incoming tests/code in all ways: code functionality (all tests are executed) and code cleanliness.
All CSIT jobs with their descriptions are listed here.
CSIT Jobs
The role of these jobs is to verify that CSIT code is still working. For every review in CSIT Gerrit for functional tests, all tests are executed to verify that the patch didn't cause any collateral damage. Furthermore, if a new test case or suite is added, it is executed too. This way we know whether the implemented test cases work as they should. We use a "golden" VPP version from Nexus that has previously been validated by CSIT to work fine. This eliminates multiple variables in the test (i.e. only the CSIT code changes, while the VPP version stays constant between runs).
CSIT requires the PEP-8 code style, and to help code reviews a Pylint job has been created: csit-validate-pylint. This job's task is to detect formatting violations and report them.
VPP Jobs
VPP jobs created by CSIT have the sole purpose of validating VPP review submissions (patchsets, if you will). This is achieved by executing a verified version of CSIT code on top of a build of VPP (built from the git parent version plus the applied patch from the given Gerrit review). The verified CSIT code version is identified by a git tag (a branch currently, but that's not important) that points to a CSIT version known to pass 100%. Therefore, if a VPP test fails, it is due to the VPP code change rather than a potential problem in CSIT.
The VPP verification job vpp-csit-verify-virl builds VPP, checks out csit-verified and uses the built .deb packages to test VPP. That's how a VPP diff gets tested.
Starting RobotFramework
All examples below expect you to have a cloned CSIT repository, and a virtual environment. Setting up the test area can be achieved with the following commands:
username@vpp64:~/vpp/fd.io/csit$ virtualenv env
New python executable in env/bin/python
Installing setuptools, pip, wheel...done.
username@vpp64:~/vpp/fd.io/csit$ source env/bin/activate
(env)username@vpp64:~/vpp/fd.io/csit$ pip install -r requirements.txt
Collecting robotframework==2.9.2 (from -r requirements.txt (line 1))
/home/username/vpp/fd.io/csit/env/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
Collecting paramiko==1.16.0 (from -r requirements.txt (line 2))
  Using cached paramiko-1.16.0-py2.py3-none-any.whl
Collecting scp==0.10.2 (from -r requirements.txt (line 3))
  Using cached scp-0.10.2-py2.py3-none-any.whl
Collecting ipaddress==1.0.16 (from -r requirements.txt (line 4))
  Using cached ipaddress-1.0.16-py27-none-any.whl
Collecting interruptingcow==0.6 (from -r requirements.txt (line 5))
Collecting PyYAML==3.11 (from -r requirements.txt (line 6))
Collecting pykwalify==1.5.0 (from -r requirements.txt (line 7))
Collecting scapy==2.3.1 (from -r requirements.txt (line 8))
Collecting enum34==1.1.2 (from -r requirements.txt (line 9))
Collecting requests==2.9.1 (from -r requirements.txt (line 10))
  Downloading requests-2.9.1-py2.py3-none-any.whl (501kB)
    100% |████████████████████████████████| 503kB 582kB/s
Collecting ecdsa>=0.11 (from paramiko==1.16.0->-r requirements.txt (line 2))
  Using cached ecdsa-0.13-py2.py3-none-any.whl
Collecting pycrypto!=2.4,>=2.1 (from paramiko==1.16.0->-r requirements.txt (line 2))
Collecting docopt==0.6.2 (from pykwalify==1.5.0->-r requirements.txt (line 7))
Collecting python-dateutil==2.4.2 (from pykwalify==1.5.0->-r requirements.txt (line 7))
  Using cached python_dateutil-2.4.2-py2.py3-none-any.whl
Collecting six>=1.5 (from python-dateutil==2.4.2->pykwalify==1.5.0->-r requirements.txt (line 7))
  Using cached six-1.10.0-py2.py3-none-any.whl
Installing collected packages: robotframework, ecdsa, pycrypto, paramiko, scp, ipaddress, interruptingcow, PyYAML, docopt, six, python-dateutil, pykwalify, scapy, enum34, requests
Successfully installed PyYAML-3.11 docopt-0.6.2 ecdsa-0.13 enum34-1.1.2 interruptingcow-0.6 ipaddress-1.0.16 paramiko-1.16.0 pycrypto-2.6.1 pykwalify-1.5.0 python-dateutil-2.4.2 requests-2.9.1 robotframework-2.9.2 scapy-2.3.1 scp-0.10.2 six-1.10.0
/home/username/vpp/fd.io/csit/env/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
You are using pip version 7.1.2, however version 8.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
(env)username@vpp64:~/vpp/fd.io/csit$ export PYTHONPATH=.
Start Specific Test Suite
pybot -L TRACE -v TOPOLOGY_PATH:topologies/available/my_topo.yaml -s ipv4 tests
- pybot: the RobotFramework executable
- -L TRACE: set the logging level to TRACE
- -v TOPOLOGY_PATH:topologies/available/my_topo.yaml: load the topology file topologies/available/my_topo.yaml
- -s ipv4: start all test suites whose name matches ipv4
- tests: the path to start tests from. In our case tests is the directory containing sub-directories with test suites. RF reads these sub-directories recursively in search of test cases.
Start Specific Test Case
pybot -L TRACE -v TOPOLOGY_PATH:topologies/available/my_topo.yaml -t "VPP replies to ICMPv4 echo request" tests/
- pybot: the RobotFramework executable
- -L TRACE: set the logging level to TRACE
- -v TOPOLOGY_PATH:topologies/available/my_topo.yaml: load the topology file topologies/available/my_topo.yaml
- -t "VPP replies to ICMPv4 echo request": start the test case whose name matches VPP replies to ICMPv4 echo request
- tests: the path to start tests from. In our case tests is the directory containing sub-directories with test suites. RF reads these sub-directories recursively in search of test cases.
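When the same invocations are scripted, it can help to assemble the argument list programmatically. The helper below is a hypothetical convenience wrapper, not part of CSIT; it uses only the pybot options already shown in this document (-L, -v, --include, -s and -t).

```python
def build_pybot_command(topology_path, start_path="tests", log_level="TRACE",
                        suite=None, test=None, include=None):
    """Assemble a pybot argument list from the options described above."""
    command = [
        "pybot",
        "-L", log_level,
        "-v", "TOPOLOGY_PATH:{0}".format(topology_path),
    ]
    if include:
        command += ["--include", include]
    if suite:
        command += ["-s", suite]
    if test:
        command += ["-t", test]
    # The path to start looking for test suites always comes last.
    command.append(start_path)
    return command


suite_cmd = build_pybot_command(
    "topologies/available/my_topo.yaml", suite="ipv4")
test_cmd = build_pybot_command(
    "topologies/available/my_topo.yaml",
    test="VPP replies to ICMPv4 echo request")
print(" ".join(suite_cmd))
```

Because the result is a list, it can be handed to subprocess.call() without any shell quoting of the test case name.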
Typical problems
vpp-csit-verify job failed, now what?
how to read console logs
reading robotframework log
timeout problems
Writing a test case
This section covers the steps one must take to add a new, fully functional test. A new test will most probably require opening multiple files in different locations within the CSIT code base, including Python libraries, VAT templates and the RobotFramework suite/test. Usually a developer has an idea of what they want to test and may come with a bunch of scripts they used during unit testing of their feature, or they might just have a list of newly added VPP API functions. This guide will walk you through creating a fully working test case from scratch. We have chosen the bridge domain test case as our example.
Test case design
First, we have to specify what the given test is proving. In our case, we want to test whether VPP's bridge domain (with learning enabled) forwards packets as expected. From VPP's perspective, we want to:
- create a bridge domain
- add two interfaces to it
- set the interfaces' state to up
Then we will send a packet to one interface and expect it to appear unchanged on the other interface. Since the bridge domain is configured as learning and the destination MAC address is unknown to VPP, the packet is flooded and therefore has to appear on the other interface.
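The expected behaviour can be illustrated with a textbook model of a learning bridge. The class below is a teaching sketch and says nothing about how VPP implements its bridge domain: it learns the source MAC address of every frame and floods frames whose destination is not yet known.

```python
class LearningBridge(object):
    """Minimal model of an L2 bridge domain with MAC learning."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was learned on

    def forward(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn where the sender lives.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress one.
            return [port for port in self.ports if port != in_port]
        if out_port == in_port:
            # Destination is on the ingress segment, nothing to forward.
            return []
        return [out_port]


bridge = LearningBridge(["if1", "if2"])
# The destination is unknown, so the frame must appear on the other interface.
first = bridge.forward("if1", "02:00:00:00:00:01", "02:00:00:00:00:02")
# The reply goes to an address the bridge has already learned.
reply = bridge.forward("if2", "02:00:00:00:00:02", "02:00:00:00:00:01")
print(first, reply)
```

With only two interfaces in the bridge domain, flooding and learned forwarding both deliver the frame to the single other interface, which is exactly what the test checks.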
To create a VPP bridge domain, VPP API has to be used:
bridge_domain_add_del
sw_interface_set_l2_bridge
sw_interface_set_flags
Lastly, one has to realize they work in a given topology. The most used topology currently is the circular topology with one link between nodes, as it is the easiest one to set up. In this topology each node has two interfaces. TG is connected to both DUTs, and the DUTs are also interconnected. Therefore, to test a packet flowing through VPP's bridge domain, we have to apply the bridge domain configuration to both DUTs.
Test suite creation
In this example, we are going to create a new test suite, because (let us suppose) no bridge domain tests are present yet. We have to create a new directory in the tests/suites directory. Let's call it bridge_domain. RobotFramework will take the name of the directory as the test suite name. We will then create a new file in this sub-directory, which will contain the test cases for the bridge_domain suite. For now let's call it test.robot (this could be something more explanatory like "untagged" to mark tests that are not using tagged packets).
cd $CSIT_ROOT
mkdir tests/suites/bridge_domain
touch tests/suites/bridge_domain/test.robot
${EDITOR} tests/suites/bridge_domain/test.robot
Every RobotFramework test suite file will in the end have at least two sections: Settings and Test Cases.
Settings section
This section sets properties for the whole test suite, like imports of libraries, imports of other robot framework files, tags settings, and setup/teardown phase hooks settings. Add these lines to your newly created test.robot file:
*** Settings ***
| Resource | resources/libraries/robot/default.robot
We've created a new (still empty) test suite and started its Settings section. In the Settings section we added a directive that makes RobotFramework import another RobotFramework file. default.robot is a CSIT-created file with default imports that recur in multiple test suites. In the default.robot file there is a Variables directive, which causes the topology information to be loaded for later use. Therefore in our test cases we will have ${nodes} populated properly: it references the topology node information from the topology YAML configuration passed to the pybot command.
We know that we are going to send crafted packets using our traffic scripts, therefore we have to prepare the TG node for this. The reason is that functional tests could get intermixed with perf tests on the same topology, and perf tests could take NICs away from the Linux kernel and use a special driver for specialized packet generation. To get around this, one should make sure the NICs are visible in Linux by using a RobotFramework keyword from default.robot:
| Setup all TGs before traffic script
| | [Documentation] | Prepare all TGs before traffic scripts execution
| | All TGs Set Interface Default Driver | ${nodes}
We have to call this only once, because in our suite we want to use traffic scripts only. To make RobotFramework call this KW only once, we can add a Suite Setup directive to the Settings section:
*** Settings ***
| Resource | resources/libraries/robot/default.robot
| Suite Setup | Setup all TGs before traffic script
Furthermore, we want to make sure VPP is running on the DUTs, and start it up if it is not (VPP might have crashed during a previous test). To do this before every test case, we can add the following line to the Settings section:
*** Settings ***
| Resource | resources/libraries/robot/default.robot
| Suite Setup | Setup all TGs before traffic script
| Test Setup | Setup all DUTs before test
Test Cases section
Now we define the test case. A test case starts with its name in the first column. One can write whatever they want here, since RobotFramework just takes it as plain text and uses it as such. In CSIT we strive to have self-explanatory test case names. The goal is to keep the test name short, yet detailed and self-explanatory, or at least to attempt to do so. The approach that we've taken is to name test cases as if one were explaining the feature to others. We claim that VPP can forward packets through a bridge domain, and we are doing it in a circular topology. More documentation can be added later in the [Documentation] section of the test case definition, but just from the name one should be able to tell what the test case does. So here goes an attempt to name the test case:
*** Settings ***
| Resource | resources/libraries/robot/default.robot
| Suite Setup | Setup all TGs before traffic script
| Test Setup | Setup all DUTs before test
*** Test Cases ***
| Vpp forwards packets via L2 bridge domain in circular topology
By the time RobotFramework executes the instructions in the test case we are writing, the Test Setup phase has already been executed for us. Therefore the VPPs on the DUTs are up, and we can fire off the VPP commands we want to test. The first thing to realize is that we need a reference to the DUT we want to execute these commands on. For this purpose the utility library NodePath has been created. It allows one to create a virtual "packet" (data) path through the topology nodes and their NICs. To use the library (any library really) in this test suite, we have to import it in the Settings section:
*** Settings ***
| Resource | resources/libraries/robot/default.robot
| Library | resources.libraries.python.NodePath
| Suite Setup | Setup all TGs before traffic script
| Test Setup | Setup all DUTs before test
*** Test Cases ***
| Vpp forwards packets via L2 bridge domain in circular topology
We know we plan to go through the topology in a circular manner, so our packets are going to go out of TG to DUT1, then to DUT2, and back to TG. We already have the ${nodes} object available to us in RobotFramework files through the default.robot import. Now we have to select the NICs on these nodes. To refrain from using concrete port names/numbers in RobotFramework, we can do the following:
*** Settings ***
| Resource | resources/libraries/robot/default.robot
| Library | resources.libraries.python.NodePath
| Suite Setup | Setup all TGs before traffic script
| Test Setup | Setup all DUTs before test
*** Test Cases ***
| Vpp forwards packets via L2 bridge domain in circular topology
| | Append Nodes | ${nodes['TG']} | ${nodes['DUT1']} | ${nodes['DUT2']}
| | ... | ${nodes['TG']}
| | Compute Path
Note that we didn't have to write NodePath.Append Nodes. That's because RobotFramework resolves non-conflicting names automatically. One can confirm that in log.html output, where RobotFramework will prepend name of library before the function name.
This will create node traversal path using the NodePath library.
*** Settings ***
| Resource | resources/libraries/robot/default.robot
| Library | resources.libraries.python.NodePath
| Suite Setup | Setup all TGs before traffic script
| Test Setup | Setup all DUTs before test
*** Test Cases ***
| Vpp forwards packets via L2 bridge domain in circular topology
| | Append Nodes | ${nodes['TG']} | ${nodes['DUT1']} | ${nodes['DUT2']}
| | ... | ${nodes['TG']}
| | Compute Path
| | ${tg_if1} | ${tg}= | Next Interface
| | ${dut1_if1} | ${dut1}= | Next Interface
| | ${dut1_if2} | ${dut1}= | Next Interface
| | ${dut2_if1} | ${dut2}= | Next Interface
| | ${dut2_if2} | ${dut2}= | Next Interface
| | ${tg_if2} | ${tg}= | Next Interface
Now we have variables filled with interface data. For example tg_if1 contains interface information about that interface on that given node (TG). We will use these variables later in the test, and pass them to other functions and RobotFramework keywords as arguments.
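To show the idea behind the path computation, here is a simplified stand-in written for this guide. It is not the CSIT NodePath code: the class name, the node dictionaries and the "link" key are assumptions. For each pair of consecutive nodes it picks the two interfaces that share a link, and next_interface() then hands them out in traversal order as (interface, node) pairs.

```python
class SimplePath(object):
    """Simplified stand-in for a node-path helper (not the CSIT NodePath)."""

    def __init__(self):
        self._nodes = []
        self._hops = []

    def append_nodes(self, *nodes):
        self._nodes.extend(nodes)

    def compute_path(self):
        """For each pair of consecutive nodes, pick interfaces sharing a link."""
        self._hops = []
        for node_a, node_b in zip(self._nodes, self._nodes[1:]):
            links_b = {data["link"]: name
                       for name, data in node_b["interfaces"].items()}
            for name_a, data_a in sorted(node_a["interfaces"].items()):
                if data_a["link"] in links_b:
                    self._hops.append((name_a, node_a))
                    self._hops.append((links_b[data_a["link"]], node_b))
                    break
            else:
                raise RuntimeError("No link between consecutive nodes")

    def next_interface(self):
        """Return the next (interface name, node) pair along the path."""
        return self._hops.pop(0)


# Hypothetical circular topology with one link between each pair of nodes.
tg = {"interfaces": {"port1": {"link": "link1"}, "port2": {"link": "link3"}}}
dut1 = {"interfaces": {"port1": {"link": "link1"}, "port2": {"link": "link2"}}}
dut2 = {"interfaces": {"port1": {"link": "link2"}, "port2": {"link": "link3"}}}

path = SimplePath()
path.append_nodes(tg, dut1, dut2, tg)
path.compute_path()
hops = [path.next_interface() for _ in range(6)]
print([name for name, _ in hops])
```

The six pairs correspond to the six Next Interface calls in the test case: TG egress, DUT1 ingress and egress, DUT2 ingress and egress, and TG ingress.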
Now we have to create the bridge domain, add interfaces to it, set them up, and do this for both DUTs. So we know we are going to repeat some work. Why not create a RobotFramework function, also known as a Keyword, for it? This operation is going to be reused a number of times, so let's create a common RobotFramework resource file for it. The common RobotFramework libraries sit in the resources/libraries/robot directory. Let's add a bridge_domain.robot file there:
$ ${EDITOR} $CSIT_ROOT/resources/libraries/robot/bridge_domain.robot
A RobotFramework resource file differs from a RobotFramework suite file in that it does not contain a Test Cases section. Instead, it has a Keywords section.
Let's add the KW in the newly opened resource file:
*** Keywords ***
| Vpp l2bd forwarding setup
| | [Documentation] | Setup bridge domain between 2 interfaces on VPP node
| | [Arguments] | ${node} | ${if1} | ${if2} | ${learn}=${TRUE}
| | Set Interface State | ${node} | ${if1} | up
| | Set Interface State | ${node} | ${if2} | up
| | Vpp Add L2 Bridge Domain | ${node} | ${1} | ${if1} | ${if2} | ${learn}
| | All Vpp Interfaces Ready Wait | ${nodes}
We've named the KW as Vpp l2bd forwarding setup, and it takes 4 arguments: node to create the bridge domain on, two interfaces to add to that bridge domain, and learning option setting (which we set to default to TRUE). In the new Keyword code we actually do what we've written above: set interfaces up, add them to a bridge domain, and then wait for those interfaces to come up. Let's go over them one by one.
- Set Interface State
Set Interface State is located in InterfaceUtil.py:
<snip>
    @staticmethod
    def set_interface_state(node, interface, state):
        """Set interface state on a node.

        Function can be used for DUTs as well as for TGs.

        :param node: node where the interface is
        :param interface: interface name or sw_if_index
        :param state: one of 'up' or 'down'
        :type node: dict
        :type interface: str or int
        :type state: str
        :return: nothing
        """
<snip>
RobotFramework allows functions implemented in Python to be called directly; you just have to know how the name of the function is translated to RobotFramework formatting. Every Python underscore is translated to a space in RobotFramework, and case doesn't matter. Hence set_interface_state is called Set Interface State in RobotFramework. Parameters are delimited by pipes.
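The name translation can be mimicked in a few lines of Python. The following is a sketch of the matching rule only (ignore case, spaces and underscores); the helper names and the demo class are made up for this example and are not how RF is implemented internally.

```python
def normalize(name):
    """Normalize a keyword or function name for comparison."""
    return name.replace("_", "").replace(" ", "").lower()


def find_keyword(keyword, library):
    """Return the public attribute of library that matches the keyword name."""
    for attr in dir(library):
        if not attr.startswith("_") and normalize(attr) == normalize(keyword):
            return getattr(library, attr)
    raise LookupError("No keyword named '{0}'".format(keyword))


class InterfaceUtilDemo(object):
    """Hypothetical library class used only for this demonstration."""

    @staticmethod
    def set_interface_state(node, interface, state):
        return "{0}:{1}:{2}".format(node, interface, state)


keyword = find_keyword("Set Interface State", InterfaceUtilDemo)
print(keyword("DUT1", "port1", "up"))
```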
- Vpp Add L2 Bridge Domain | ${node} | ${1} | ${if1} | ${if2} | ${learn}
This KW still has to be implemented, but for now let's have a look at what parameters we are going to pass to it. We know that we are going to create a BD, and we know that it will exist on a VPP, which runs on a DUT, which is represented by ${node}. The other parameters are needed to pass values to the VPP API that creates the BD:
bridge_domain_add_del bd_id {bd_id} flood 1 uu-flood 1 forward 1 learn {learn} arp-term 0
We do not care about the flood, uu-flood, forward and arp-term parameters, because those are going to be mostly static in our tests. So for now we only need to be able to change the bridge domain ID and the learn setting. Once the BD is created, the two interfaces ${if1} and ${if2} have to be added to it. For that we use the following VPP API:
sw_interface_set_l2_bridge sw_if_index {sw_if_id} bd_id {bd_id} shg 0 enable
We use a template approach to execute VAT scripts with parameters on a DUT. For each VPP API (or series of APIs, if they are logically bound together) we have a VAT (VPP API Test) template file. These are stored in ${CSIT}/resources/templates/vat/. For our purposes we are going to create a file called l2_bridge_domain.vat at that location, ${CSIT}/resources/templates/vat/l2_bridge_domain.vat, and enter:
bridge_domain_add_del bd_id {bd_id} flood 1 uu-flood 1 forward 1 learn {learn} arp-term 0
sw_interface_set_l2_bridge sw_if_index {sw_if_id1} bd_id {bd_id} shg 0 enable
sw_interface_set_l2_bridge sw_if_index {sw_if_id2} bd_id {bd_id} shg 0 enable
One can't really "call" a VAT template file as such. First, Python code has to be written to take that template file, replace the placeholders with parameter values, and only then send it to the given DUT/VPP. The proper location for this particular function is ${CSIT}/resources/libraries/python/L2Util.py, because it is related to L2 utilities. In that file there is a Python class with the same name as the module, L2Util, and we should add the function there. But how should we name that function so that it matches our RobotFramework reference above? As mentioned previously, when RF looks for a function, it searches in its imports, and when it is looking for the function in Python, it replaces all spaces with underscores "_". It is also important to note that although RF looks for the function in a case-insensitive manner, we've settled on using lower-case function names with underscore delimiters.
Our new function in the L2Util class of the L2Util.py module will look like this (don't forget to always write your documentation, we like it):
@staticmethod
def vpp_add_l2_bridge_domain(node, bd_id, port_1, port_2, learn=True):
    """Add L2 bridge domain with 2 interfaces to the VPP node.

    :param node: Node to add L2BD on.
    :param bd_id: Bridge domain ID.
    :param port_1: First interface name added to L2BD.
    :param port_2: Second interface name added to L2BD.
    :param learn: Enable/disable MAC learn.
    :type node: dict
    :type bd_id: int
    :type port_1: str
    :type port_2: str
    :type learn: bool
    """
    sw_if_index1 = Topology.get_interface_sw_index(node, port_1)
    sw_if_index2 = Topology.get_interface_sw_index(node, port_2)
    VatExecutor.cmd_from_template(node,
                                  'l2_bridge_domain.vat',
                                  sw_if_id1=sw_if_index1,
                                  sw_if_id2=sw_if_index2,
                                  bd_id=bd_id,
                                  learn=int(learn))
Topology.get_interface_sw_index(node, port_1) is a function in the Topology Python module that gets VPP's sw_if_index for an interface. We need it for the VPP API command above. VatExecutor is a module that facilitates the execution of VPP API calls using the vpp_api_test binary. cmd_from_template connects to the node, reads the contents of resources/templates/vat/l2_bridge_domain.vat, replaces occurrences of {sw_if_id1} with the value of the local variable sw_if_index1 (the same applies to sw_if_id2, bd_id and learn), and sends the result to vpp_api_test for execution.
In effect, this code means: take the template, here are the values to use in the template, and execute on node (DUT).
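The substitution step alone can be tried out locally with a minimal sketch like the one below. It assumes the placeholders behave like Python str.format fields, which is how they are written in the template; the real cmd_from_template additionally connects to the node and runs vpp_api_test, and that part is not shown. The function name render_template and the interface indices 5 and 6 are made up for illustration.

```python
# Contents of l2_bridge_domain.vat, inlined so the sketch is self-contained.
VAT_TEMPLATE = (
    "bridge_domain_add_del bd_id {bd_id} flood 1 uu-flood 1 forward 1 "
    "learn {learn} arp-term 0\n"
    "sw_interface_set_l2_bridge sw_if_index {sw_if_id1} bd_id {bd_id} "
    "shg 0 enable\n"
    "sw_interface_set_l2_bridge sw_if_index {sw_if_id2} bd_id {bd_id} "
    "shg 0 enable\n"
)


def render_template(template, **values):
    """Replace every {placeholder} in the template with its value."""
    return template.format(**values)


# Hypothetical values: sw_if_index 5 and 6, bridge domain 1, learning on.
commands = render_template(
    VAT_TEMPLATE, sw_if_id1=5, sw_if_id2=6, bd_id=1, learn=int(True))
print(commands)
```

Note that learn is converted with int() because the VAT command expects 0 or 1 rather than the Python literals True or False.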
At this time, we have on DUT1:
- VPP started
- Interfaces state set to up
- created a bridge domain
- added two interfaces to the bridge domain
- All Vpp Interfaces Ready Wait | ${nodes}
What is still left to do is to wait for the interfaces to come up. That is done in the Vpp l2bd forwarding setup keyword by calling All Vpp Interfaces Ready Wait | ${nodes}. This is a very versatile keyword/function that polls VPP for interface statistics and loops until every interface's operational status matches its admin status. Using this function is far better than having a simple "sleep" there.
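To illustrate why polling beats a fixed sleep, here is a small generic sketch of a poll-with-timeout loop. It is not the CSIT implementation: the names wait_until and all_interfaces_ready, and the dictionary keys admin_state and oper_state, are assumptions made for this example only.

```python
import time


def wait_until(condition, timeout=30.0, interval=1.0,
               clock=time.time, sleep=time.sleep):
    """Poll condition() until it returns True or the timeout expires.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def all_interfaces_ready(interfaces):
    """True when every operational state equals the admin state."""
    return all(i['oper_state'] == i['admin_state'] for i in interfaces)


# Hypothetical interface data, as it might look after the link came up.
interfaces = [
    {'admin_state': 'up', 'oper_state': 'up'},
    {'admin_state': 'up', 'oper_state': 'up'},
]
print(wait_until(lambda: all_interfaces_ready(interfaces), timeout=5))
```

The loop returns as soon as the interfaces are ready, so the test does not waste time when the links come up quickly, and it fails with a clear timeout when they never do.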
- Set up DUTs using the new keyword
*** Settings ***
| Resource | resources/libraries/robot/default.robot
| Library | resources.libraries.python.NodePath
| Suite Setup | Setup all TGs before traffic script
| Test Setup | Setup all DUTs before test

*** Test Cases ***
| Vpp forwards packets via L2 bridge domain in circular topology
| | Append Nodes | ${nodes['TG']} | ${nodes['DUT1']} | ${nodes['DUT2']}
| | ... | ${nodes['TG']}
| | Compute Path
| | ${tg_if1} | ${tg}= | Next Interface
| | ${dut1_if1} | ${dut1}= | Next Interface
| | ${dut1_if2} | ${dut1}= | Next Interface
| | ${dut2_if1} | ${dut2}= | Next Interface
| | ${dut2_if2} | ${dut2}= | Next Interface
| | ${tg_if2} | ${tg}= | Next Interface
| | Vpp l2bd forwarding setup | ${nodes['DUT1']} | ${dut1_if1} | ${dut1_if2}
| | Vpp l2bd forwarding setup | ${nodes['DUT2']} | ${dut2_if1} | ${dut2_if2}
- Validate VPP configuration by sending traffic
Now that the VPPs are configured and all interfaces are in the expected operational state, we should send packets out of one TG interface and expect them to appear on the second TG interface. Let's create a Robot Framework keyword to send L2 traffic and expect it to arrive on the other interface.
*** Settings ***
| Resource | resources/libraries/robot/default.robot
| Library | resources.libraries.python.NodePath
| Suite Setup | Setup all TGs before traffic script
| Test Setup | Setup all DUTs before test

*** Test Cases ***
| Vpp forwards packets via L2 bridge domain in circular topology
| | Append Nodes | ${nodes['TG']} | ${nodes['DUT1']} | ${nodes['DUT2']}
| | ... | ${nodes['TG']}
| | Compute Path
| | ${tg_if1} | ${tg}= | Next Interface
| | ${dut1_if1} | ${dut1}= | Next Interface
| | ${dut1_if2} | ${dut1}= | Next Interface
| | ${dut2_if1} | ${dut2}= | Next Interface
| | ${dut2_if2} | ${dut2}= | Next Interface
| | ${tg_if2} | ${tg}= | Next Interface
| | Vpp l2bd forwarding setup | ${nodes['DUT1']} | ${dut1_if1} | ${dut1_if2}
| | Vpp l2bd forwarding setup | ${nodes['DUT2']} | ${dut2_if1} | ${dut2_if2}
| | Send and receive ICMPv4 bidirectionally | ${nodes['TG']} | ${tg_if1} | ${tg_if2}
The last keyword is implemented in resources/libraries/robot/l2_traffic.robot. It wraps two calls to Send and receive ICMP Packet to send a packet through the topology in both directions. Send and receive ICMP Packet takes the TG node, two TG interfaces and the source/destination IP addresses as parameters. These are then used in the traffic script, which is executed by a call to Run Traffic Script On Node:
| | ${src_mac}= | Get Interface Mac | ${tg_node} | ${src_int}
| | ${dst_mac}= | Get Interface Mac | ${tg_node} | ${dst_int}
| | ${args}= | Traffic Script Gen Arg | ${dst_int} | ${src_int} | ${src_mac}
| | | ... | ${dst_mac} | ${src_ip} | ${dst_ip}
| | Run Traffic Script On Node | send_ip_icmp.py | ${tg_node} | ${args}
As one can notice, the last line above refers to the send_ip_icmp.py script. We call that a traffic script: its purpose is to be executed on the TG node and to validate a certain VPP network configuration. In the case of send_ip_icmp.py, the script is expected to send a packet out of one interface and receive it back unchanged on the other. The script is located in resources/traffic_scripts/send_ip_icmp.py. One can do whatever one wants in these traffic scripts, but usually you will end up using CSIT's tools to help you parse arguments, generate a packet, send it out, wait for it and validate the result.
args = TrafficScriptArg(['src_mac', 'dst_mac', 'src_ip', 'dst_ip'])

src_mac = args.get_arg('src_mac')
dst_mac = args.get_arg('dst_mac')
src_ip = args.get_arg('src_ip')
dst_ip = args.get_arg('dst_ip')
tx_if = args.get_arg('tx_if')
rx_if = args.get_arg('rx_if')

rxq = RxQueue(rx_if)
txq = TxQueue(tx_if)
At this point we have two "queues" (just a naming convention) that allow reading and writing packets on the given interfaces. In our case those two interfaces are src_int and dst_int from the keyword above. Now, let's create the raw packet that we will send out:
pkt_raw = (Ether(src=src_mac, dst=dst_mac) / IP(src=src_ip, dst=dst_ip) / ICMP())
The above code uses scapy to generate an Ethernet frame with an IP header and an ICMP payload. See http://www.secdev.org/projects/scapy/doc/usage.html for more information on how to use scapy to generate the packets you need.
Once we have the packet generated, we have to send it out:
txq.send(pkt_raw)
The next step is to wait for the packet to arrive on the other interface:
ether = rxq.recv(2)

# Check whether received packet contains layers Ether, IP and ICMP
if ether is None:
    raise RuntimeError('ICMP echo Rx timeout')

if not ether.haslayer('IP'):
    raise RuntimeError('Not an IP packet received {0}'
                       .format(ether.__repr__()))

if not ether.haslayer('ICMP'):
    raise RuntimeError('Not an ICMP packet received {0}'
                       .format(ether.__repr__()))
And that's it. We have created a RobotFramework suite, test case, keyword, Python code and traffic script, and we have tied them all together. The last part is how to execute the test. To run a particular test case, pass the -t parameter to pybot:
pybot ... -t "Vpp forwards packets via L2 bridge domain in circular topology" tests
Or you can use wildcards:
pybot ... -t "Vpp forwards packets via L2 bridge *" tests
CSIT Code Structure
The CSIT project consists of the following:
- RobotFramework tests, resources, and libraries.
- bash scripts – tools, and anything system-related (copying files, installing SW on nodes, ...).
- Python libraries
- the brains of the execution.
- for different functionality there is a different module, e.g.
- vpp
- ipv4 utils.
- ipv6 utils.
- xconnect.
- bdomain.
- VAT (vpp_api_test) helpers.
- Config generator.
- ssh.
- topology.
- packet verifier – packet generator and validator.
- v4/v6 ip network and host address generator.
- vpp
- vpp_api_test templates.
Each RF test suite/case has TAGs associated with it that describe the environment it can run on (HW/VM) or the topology it requires. RobotFramework is executed with a parameter that points to a topology description file; we call it the topology for simplicity. This file is parsed into the variable “nodes”, which is later used in test cases and libraries.
In general, test cases are written in readable English, so that even non-coders can understand them. These top-level test cases should stay the same; in other words, the test case text should not represent “how” the test is done, but “what” the test case does.
Libraries to handle VPP functionality are written in Python and are separated on a per-feature basis: v4, v6, interface (admin up, state status and so on), xconnect and bdomain. More modules will be implemented when needed.
Performance tests are executed using packet traffic generators external to the servers running VPP code. Python APIs are used to control the traffic generators. The Linux Foundation hosts physical infrastructure dedicated to FD.io, consisting of three 3-compute-node performance testbeds (compute node = x86_64 multi-core server). Two of the compute nodes run VPP code, one runs a software traffic generator. Currently CSIT performance tests are executed using T-rex.
CSIT Test Code Guidelines
WORK IN PROGRESS
Here are some guidelines for writing reliable, maintainable, reusable and readable Robot Framework (RF) test code. CSIT uses Robot Framework version 2.9.2 (user guide).
RobotFramework test case files and resource files
- General
- RobotFramework test case files and resource files use special extension .robot
- Use of the pipe and space separated file format is strongly recommended. Tabs are invisible characters, which makes them error prone.
- Files should be encoded in ASCII. Non-ASCII characters are allowed but they must be encoded in UTF8 (the default Robot source file encoding).
- Line length is limited to 80 characters.
- The licence (/csit/docs/licence.rst) must be included at the beginning of each file.
- Copy-pasting code is an unwanted practice; any code that could be re-used has to be put into an RF keyword (KW) or a Python library instead of being copy-pasted.
- Test cases
- Test cases are written in behavior-driven style, i.e. in readable English so that even non-technical project stakeholders can understand them:
*** Test Cases ***
| VPP can encapsulate L2 in VXLAN over IPv4 over Dot1Q
| | Given Path for VXLAN testing is set
| | ... | ${nodes['TG']} | ${nodes['DUT1']} | ${nodes['DUT2']}
| | And Interfaces in path are up
| | And Vlan interfaces for VXLAN are created | ${VLAN}
| | ... | ${dut1} | ${dut1s_to_dut2}
| | ... | ${dut2} | ${dut2s_to_dut1}
| | And IP addresses are set on interfaces
| | ... | ${dut1} | ${dut1s_vlan_name} | ${dut1s_vlan_index}
| | ... | ${dut2} | ${dut2s_vlan_name} | ${dut2s_vlan_index}
| | ${dut1s_vxlan}= | When Create VXLAN interface | ${dut1} | ${VNI}
| | | ... | ${dut1s_ip_address} | ${dut2s_ip_address}
| | And Interfaces are added to BD | ${dut1} | ${BID}
| | ... | ${dut1s_to_tg} | ${dut1s_vxlan}
| | ${dut2s_vxlan}= | And Create VXLAN interface | ${dut2} | ${VNI}
| | | ... | ${dut2s_ip_address} | ${dut1s_ip_address}
| | And Interfaces are added to BD | ${dut2} | ${BID}
| | ... | ${dut2s_to_tg} | ${dut2s_vxlan}
| | Then Send and receive ICMPv4 bidirectionally
| | ... | ${tg} | ${tgs_to_dut1} | ${tgs_to_dut2}
- Every test case should contain short documentation (an example will be added). This documentation will be used by the testdoc tool, Robot Framework's built-in tool for generating high-level documentation based on test cases.
- Do not use hard-coded constants. It is recommended to use the variable table (***Variables***) to define test case specific values. Use the assignment sign = after the variable name to make assigning variables slightly more explicit:
*** Variables ***
| ${VNI}= | 23
- Common test case specific settings of the test environment should be done in the Test Setup part of the Setting table (***Settings***).
- Post-test cleaning and processing actions should be done in Test Teardown part of the Setting table (e.g. download statistics from VPP nodes). This part is executed even if the test case has failed. On the other hand it is possible to disable the tear-down from command line, thus leaving the system in “broken” state for investigation.
- Every TC must be correctly tagged. The list of defined tags is in the /csit/docs/tag_documentation.rst file.
- User high-level keywords specific for the particular test case can be implemented in the keyword table of the test case to enable readability and code-reuse.
- Resource files
- Used to implement higher-level keywords that are used in test cases or other higher-level keywords.
- Every keyword must contain Documentation where the purpose and arguments of the KW are described.
- The best practice is to include a KW usage example in the Documentation. It is recommended to use the pipe and space separated format for the example.
- Keyword name should describe what the keyword does, specifically and in a reasonable length (“short sentence”).
Python library files
- General
- Used to implement low-level keywords that are used in resource files (to create higher-level keywords) or in test cases.
- Higher-level keywords can be implemented in python library file too, especially in the case that their implementation in resource file would be too difficult or impossible, e.g. nested FOR loops or branching.
- Every keyword, Python module, class, method and enum has to contain a documentation string with a short description, the input parameters used and the possible return value(s).
- The best practice is to include a KW usage example in the Documentation. It should contain two parts: a RobotFramework example and a Python example. It is recommended to use the pipe and space separated format for the RobotFramework example.
- KW usage examples can be grouped and used in the class documentation string to provide better overview of the usage and relationships between KWs.
- Keyword name should describe what the keyword does, specifically and in a reasonable length (“short sentence”).
- The licence (/csit/docs/licence.rst) must be included at the beginning of each file.
- Coding
- It is recommended to use some standard development tool (e.g. PyCharm Community Edition) and follow PEP-8 recommendations.
- All Python code (not only RF libraries) must adhere to the PEP-8 standard. This is enforced by the CSIT Jenkins verify job.
- Indentation – do not use tab for indents! Indent is defined as four spaces.
- Line length – limited to 80 characters.
- Imports - use the full pathname location of the module, e.g. from resources.libraries.python.topology import Topology. Imports should be grouped in the following order: 1. standard library imports, 2. related third party imports, 3. local application/library specific imports. You should put a blank line between each group of imports.
- Blank lines - Two blank lines between top-level definitions, one blank line between method definitions.
- Do not use global variables inside library files.
- Constants – avoid using hard-coded constants (e.g. numbers, paths without any description). Use configuration file(s), like /csit/resources/libraries/python/constants.py, with appropriate comments.
- Logging – log at the lowest possible level of implementation (debugging purposes). Use the same style for similar events. Keep logging as verbose as necessary.
- Exceptions – use the most appropriate exception rather than the general one ("Exception") if possible. Create your own exception if necessary and implement logging in it at the debug level.
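The coding rules above can be seen together in a short module sketch. Everything in it is made up for illustration: the constant MAX_BRIDGE_DOMAIN_ID and its value, the exception InvalidBridgeDomainError and the function validate_bd_id are hypothetical and do not exist in CSIT. The third-party and local import groups are shown only as comments so that the example runs on its own.

```python
"""Example library module following the coding guidelines above."""

# 1. Standard library imports.
import logging

# 2. Related third-party imports would go here.

# 3. Local application/library imports would go here, using the full
#    path, e.g. from resources.libraries.python.topology import Topology

# Named constant with a description instead of a bare number in the code.
# Hypothetical value, for illustration only.
MAX_BRIDGE_DOMAIN_ID = 16777215


class InvalidBridgeDomainError(ValueError):
    """Raised when a bridge domain ID is outside the accepted range."""


def validate_bd_id(bd_id):
    """Check that the bridge domain ID is usable.

    :param bd_id: Bridge domain ID.
    :type bd_id: int
    :returns: The validated bridge domain ID.
    :rtype: int
    :raises InvalidBridgeDomainError: If the ID is out of range.
    """
    if not 0 < bd_id <= MAX_BRIDGE_DOMAIN_ID:
        # Log at debug level, then raise a specific exception.
        logging.getLogger(__name__).debug(
            'Rejected bridge domain ID %s', bd_id)
        raise InvalidBridgeDomainError(
            'Bridge domain ID {0} out of range'.format(bd_id))
    return bd_id


print(validate_bd_id(1))
```

Raising a dedicated exception class lets callers catch exactly this failure without hiding unrelated errors behind a bare "Exception".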
Performance testing
CSIT performance testing follows the same approach and uses the same code structure as CSIT functional testing. The main difference is that performance testing is currently executed on dedicated, bare-metal, physical compute test beds.
There are three physical testbeds available for CSIT performance testing. A reservation system script is used to prevent the execution of more than one running instance of CSIT test suite per testbed. The only testbed topology available today is the 3-node-single-link-topo.
Traffic generator
The core of performance testing requires the initialization of the traffic generator. The traffic generator script is optimized for high performance testing, to control multiple streams with high throughput. All current performance test cases create two symmetric packet streams for bi-directional throughput testing. The tests may be extended in future to also test for latency. A Python API is used to control the traffic generator, including: starting the generator, creating the necessary streams based on input parameters, starting the traffic and reading the measured output. Low-level traffic generator scripts are encapsulated using appropriate Robot Framework (RF) keywords. All the performance-related RF keywords are located in a dedicated robot library and can be used for creating new test suites/cases by adding the library to the settings section:
| Resource | resources/libraries/robot/performance.robot
Keywords used to initialize the traffic generator are located in the performance.robot library:
| Initialize traffic generator | ${tg} | ${tg_if1} | ${tg_if2}
| ... | ${dut1} | ${dut1_if1} | ${dut1_if2}
| ... | ${dut2} | ${dut2_if1} | ${dut2_if2}
| ... | ${topology_type}
which is also part of suite setup:
| 3-node Performance Suite Setup
To set up a test with NIC topology filtering:
| 3-node Performance Suite Setup with DUT's NIC model
(Note: Currently T-rex is the only traffic generator used in CSIT performance testing. More traffic generators will be available soon. DropRateSearch.py provides a TG independent implementation of throughput search algorithms for finding the Non Drop Rate (NDR) and Partial Drop Rate (PDR).)
Performance testing DUTs
Each DUT in the test is initialized on a per test case basis with a startup and a running configuration. The test case name should be defined in behavioral style. The following is an example of a test case defined in the long_bridge_domain.robot suite:
| Find NDR by using RFC2544 linear search and 64B frames through bridge domain in 3-node topology
| | [Documentation]
| | ... | Find throughput with non drop rate for 64B frames by using
| | ... | linear search starting at 4.1Mpps, stepping down with step of 0.1Mpps
| | [Tags] | 1_THREAD_NOHTT_RSS_1 | SINGLE_THREAD
| | ${framesize}= | Set Variable | 64
| | ${start_rate}= | Set Variable | 4100000
| | ${step_rate}= | Set Variable | 100000
| | ${min_rate}= | Set Variable | 100000
| | ${max_rate}= | Set Variable | 14880952
| | Given Setup '1' worker threads and rss '1' without HTT on all DUTs
| | And L2 bridge domain initialized in a 3-node circular topology
| | Then Find NDR using linear search and pps | ${framesize} | ${start_rate}
| | ... | ${step_rate} | 3-node-bridge
| | ... | ${min_rate} | ${max_rate}
In this example the name indicates that test case will search for the NDR (Non Drop Rate) throughput for 64B frames following RFC2544 by using a linear search algorithm in a 3-node topology. As we are testing the performance of DUT, it is crucial to set up the startup configuration to best suit our needs. We achieve this by using the proper keyword from the default.robot library:
| | Given Setup '1' worker threads and rss '1' without HTT on all DUTs
This will set up the DUTs and apply a VPP startup configuration specific to every DUT in the traffic path. The startup configuration includes PCI interface information that is read from the topology file. Note that applying the startup configuration may require a restart of the DUT, not the SUT.
After starting VPP with the startup configuration we need to initialize the running configuration by calling the appropriate keywords. In some cases, custom keywords may need to be added or used.
| | And L2 bridge domain initialized in a 3-node circular topology
The currently available keywords in performance.robot library are 'L2 bridge domain', 'IPv4 forwarding', 'IPv6 forwarding', 'L2 xconnect' and 'VLAN dot1q'.
Throughput measurement initialization is implemented as a keyword:
| | Then Find NDR using linear search and pps | ${framesize} | ${start_rate}
| | ... | ${step_rate} | 3-node-bridge
| | ... | ${min_rate} | ${max_rate}
So far we have implemented Linear Search, Binary Search and Combined Search (Linear followed by Binary for refined search). Both NDR (Non Drop Rate) and PDR (Partial Drop Rate) rates can be defined as search criteria. Reporting is implemented as part of the search keywords definition. All of the keywords are defined in the performance.robot library.
All of the keywords are parametrized and it is easy to control the tests by setting the variables accordingly.
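To make the search strategies more concrete, here is a minimal Python sketch of a linear search, a binary search and their combination. It only illustrates the general technique and is not the code from DropRateSearch.py. The callable measure_loss (which runs one trial at a given rate in pps and returns the number of lost frames) and the fake DUT that starts dropping frames above 3.65 Mpps are assumptions made for this example.

```python
def linear_search_ndr(measure_loss, start_rate, step_rate, min_rate):
    """Step down from start_rate until a trial shows zero loss.

    Returns the first loss-free rate, or None if none is found.
    """
    rate = start_rate
    while rate >= min_rate:
        if measure_loss(rate) == 0:
            return rate
        rate -= step_rate
    return None


def binary_search_ndr(measure_loss, low_rate, high_rate, precision):
    """Narrow the interval [low_rate, high_rate] down to precision pps.

    Assumes low_rate is loss-free and high_rate is lossy.
    """
    while high_rate - low_rate > precision:
        middle = (low_rate + high_rate) // 2
        if measure_loss(middle) == 0:
            low_rate = middle
        else:
            high_rate = middle
    return low_rate


def fake_measure_loss(rate):
    """Pretend DUT that starts dropping frames above 3.65 Mpps."""
    return 0 if rate <= 3650000 else rate - 3650000


# Combined search: coarse linear pass, then binary refinement.
coarse = linear_search_ndr(fake_measure_loss, 4100000, 100000, 100000)
fine = binary_search_ndr(fake_measure_loss, coarse, coarse + 100000, 1000)
print(coarse, fine)
```

The start, step and minimum rates are the ones from the bridge domain test case above. The linear pass needs one trial per step, while the binary refinement halves the remaining interval with every trial, which matters when each trial runs for a long time.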
Performance testing description
There are two main types of performance tests.
- Short – run traffic based on topology and setup and FAIL if there was packet loss. The traffic duration is set to 10 seconds. Each test case generates a single burst of traffic. The tag for the short test is: PERFTEST_SHORT.
- Long – use one of the available algorithms to search for PDR/NDR based on RFC2544. Each trial is currently set to run for 60 seconds. The tag for this type of test is: PERFTEST_LONG.
Each of the long tests will have its own version of PDR (Partial Drop Rate) or NDR (Non Drop Rate). The level of acceptable PDR loss is set as a parameter for each suite. The loss acceptance type should be either 'Frames' lost or 'Percentage' of interface line rate.
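The two loss acceptance types can be illustrated with a little arithmetic. The sketch below first computes the theoretical Ethernet line rate in frames per second, counting 20 bytes of per-frame overhead on the wire (preamble, start-of-frame delimiter and inter-frame gap). For a 10GbE link and 64B frames this gives 14880952 pps, the same number used as ${max_rate} in the bridge domain example. The function names, and the choice to express both the loss and the acceptance level as rates in pps, are assumptions made for this example; how CSIT interprets the two types internally is not shown here.

```python
# Per-frame overhead on the wire in bytes:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
ETHERNET_L1_OVERHEAD = 20


def line_rate_pps(link_bps, frame_size):
    """Theoretical maximum frames per second for an Ethernet link."""
    return link_bps // ((frame_size + ETHERNET_L1_OVERHEAD) * 8)


def loss_is_acceptable(lost_pps, acceptance, acceptance_type, line_rate):
    """Decide whether the measured loss is within the PDR allowance.

    acceptance_type is 'frames' (absolute frames per second) or
    'percentage' (share of the interface line rate).
    """
    if acceptance_type == 'frames':
        allowed = acceptance
    elif acceptance_type == 'percentage':
        allowed = line_rate * acceptance / 100.0
    else:
        raise ValueError('Unknown acceptance type: ' + acceptance_type)
    return lost_pps <= allowed


# Assumption: a 10GbE link and 64B frames.
rate = line_rate_pps(10 ** 10, 64)
print(rate)
print(loss_is_acceptable(100, 0.5, 'percentage', rate))
```

With an acceptance level of 0.5 percent of line rate, roughly 74 thousand lost frames per second would still count as a pass, whereas NDR tolerates no loss at all.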
Performance testing Test Teardown
The test statistics are shown during the test teardown phase.
Performance testing Suite Teardown
The traffic generator is stopped as part of the test suite teardown. (Note: Actual teardown phase depends and varies based on which traffic generator is used.)
Performance testing tags
We designed a custom TAG scheme to describe performance testing and setup. Full documentation can be found in the tag_documentation.rst file. Each performance test or suite must contain a PERFTEST tag. It should also contain tags specific to the startup configuration (e.g. SINGLE_THREAD) or testing methodology (e.g. NDR, PERFTEST_LONG).
Performance testing name conventions
All performance suites must conform with the following naming convention and must be inside of a performance suite:
[Short|Long]_[Test_Type]_[NIC_Type]
Example use:
performance/Short_Xconnect_Intel-X520-DA2.robot
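A naming convention like this is easy to check automatically. The sketch below parses a suite file name with a regular expression. It rests on assumptions that the convention itself does not spell out: that Test_Type contains no underscores, that NIC_Type is everything after the second underscore, and that the file ends in .robot. The function name parse_suite_name is made up for this example.

```python
import re

# Assumption: Test_Type has no underscores and NIC_Type is everything
# after the second underscore.
SUITE_NAME = re.compile(
    r'^(?P<length>Short|Long)_(?P<test_type>[^_]+)_(?P<nic_type>.+)\.robot$')


def parse_suite_name(file_name):
    """Split a performance suite file name into its three parts.

    Returns a dict, or None if the name does not follow the convention.
    """
    match = SUITE_NAME.match(file_name)
    return match.groupdict() if match else None


print(parse_suite_name('Short_Xconnect_Intel-X520-DA2.robot'))
print(parse_suite_name('Medium_Xconnect_Intel-X520-DA2.robot'))
```

A check of this kind could run before a suite is merged, so that misnamed files are caught early rather than silently skipped by tooling that relies on the name.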
Performance testing report readability
We report the total packet bandwidth [Gbps] and throughput [pps] per stream, and also the total aggregate from the traffic generator's perspective. If any packet loss is observed, we report that as well. We are seeking feedback on the performance test reporting for use in updating the system.
The full 'pybot' log can be downloaded and examined for more details. The log contains all the information available (which depends on the verbose level parameter -L). Each keyword has its own section with all the output, such as the startup configuration, running telemetry and variables.