CSIT/CSIT LF VIRL testbed

From fd.io
Revision as of 11:48, 17 May 2017 by Mackonstan (Talk | contribs)

VIRL infrastructure open tasks

This is the current working list of identified tasks for CSIT VIRL testbeds. It is updated periodically. All listed tasks and sub-tasks are tracked in CSIT Jira:

  • High Priority Tasks grouped by Epic: VIRL-GetWellPlan.
    • CSIT-581 Address all known issues impacting CSIT VIRL testbeds stability and operation.
  • Other Priority Tasks grouped by Epic: VIRL-Optimizations.
    • CSIT-606 Address all known issues to optimize CSIT VIRL testbeds usability and operation.

High Priority Tasks

  1. [WIP] Detecting and clearing stuck VIRL simulations. CSIT-582.
    1. Description: VIRL simulations continue to get stuck due to either LF network connectivity interruptions or failing CSIT bootstrap teardown.
    2. Solution: automate clearing old (garbage) simulations, detect non-successful simulation teardowns.
    3. Tasks:
      • [IN-REVIEW] Use built-in VIRL simulation expire timer set to 2hrs.
        • coded default simulation expiry to 120min, semi-weekly to 500min, weekly to 120min. CSIT-579. gr6656.
      • [OPEN] Extend CSIT bootstrap teardown stop-simulation API call to verify if it was SUCCESS/FAIL. CSIT-583.
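As a rough illustration of the "clear old simulations" idea, the sketch below picks out simulations that have outlived the 120min default expiry. The `find_expired` helper, the session names, and the dictionary of launch times are all hypothetical; how launch times would actually be obtained from VIRL is not covered here.

```python
from datetime import datetime, timedelta, timezone

# Default expiry taken from the task above (120 min). Everything else in this
# sketch (function name, data layout, session names) is a made-up illustration.
DEFAULT_EXPIRY = timedelta(minutes=120)


def find_expired(launch_times, now, expiry=DEFAULT_EXPIRY):
    """Return the IDs of simulations that have been running longer than `expiry`."""
    return sorted(
        sim_id
        for sim_id, launched in launch_times.items()
        if now - launched > expiry
    )


now = datetime(2017, 5, 17, 12, 0, tzinfo=timezone.utc)
launch_times = {
    "session-a": now - timedelta(minutes=30),
    "session-b": now - timedelta(minutes=121),
    "session-c": now - timedelta(minutes=500),
}
print(find_expired(launch_times, now))
```

A cleanup job could feed the returned IDs to whatever stop-simulation call the bootstrap already uses, and then check that each stop actually succeeded.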
  2. [WIP] Add VIRL server healthchecks in CSIT. CSIT-584.
    1. Description: no regular automated healthchecks executed against VIRL servers.
    2. Solution: introduce a CSIT health-check monitoring job for VIRL servers' health.
    3. Tasks:
      • [OPEN] Create a new job, executed periodically (6hrs?) for healthchecking all VIRL servers. CSIT-585.
      • [OPEN] VIRL health-check APIs: health status, VIRL API tests, simulation tests. CSIT-586.
      • [OPEN] VIRL capacity check, report number of simulations per virl server. CSIT-587.
      • [IN-REVIEW] Add a pre-check to every start-testcase run to better handle exceptions and print errors. CSIT-579. gr6656.
  3. [OPEN] Address VIRL simulation mgmt IPv4 address depletion. CSIT-588.
    1. Description: Today one /24 subnet is allocated for all VIRL simulations, split equally across 3 servers, giving 84 /32 addresses per server. Each CSIT simulation takes 4 addresses (mgmt, tg, sut1, sut2), and each csit-vpp and vpp-csit verify job uses 3 simulations to parallelize tests and reduce execution time. Each server therefore has capacity to run up to 7 verify jobs concurrently (3*4*7 = 84). Once Centos7 tests are productized, where two jobs are always executed in parallel, this drops to 3 concurrent jobs. This is effectively a show stopper for productizing Centos7 into the vpp-csit-verify per-patch jobs.
    2. Solution: The IPv4 address space given to VIRL hosts needs to increase. Dedicating a /24 subnet per VIRL server gives address capacity for 60 concurrent simulations. Based on previous memory calculations, each VIRL host is capable of running 30 simulations (30*3 VMs); this still needs to be tested and verified.
    3. Tasks:
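The capacity arithmetic in the description can be checked with a short sketch. The `capacity` helper and the `192.0.2.0/24` prefix (a documentation-only range) are placeholders chosen for illustration and are not the real testbed addressing.

```python
import ipaddress

ADDRESSES_PER_SIMULATION = 4  # mgmt, tg, sut1, sut2 (from the description)
SIMULATIONS_PER_JOB = 3       # each verify job runs 3 simulations in parallel


def capacity(addresses, jobs_in_parallel=1):
    """Return (max simulations, max concurrent verify runs) for an address pool."""
    simulations = addresses // ADDRESSES_PER_SIMULATION
    verify_runs = simulations // (SIMULATIONS_PER_JOB * jobs_in_parallel)
    return simulations, verify_runs


# Current state: 84 addresses per server.
print(capacity(84))                      # Ubuntu-only verify jobs
print(capacity(84, jobs_in_parallel=2))  # Ubuntu + Centos7 always run together

# Proposed state: a dedicated /24 per server (placeholder prefix).
subnet = ipaddress.ip_network("192.0.2.0/24")
usable = subnet.num_addresses - 2        # minus network and broadcast addresses
print(capacity(usable))
```

With 84 addresses this gives 21 simulations and 7 verify runs, or 3 when two jobs always run together. A full /24 gives 63 simulations by raw address count, in line with the figure of 60 once a few addresses are held back for other uses.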
  4. [WIP] Script expecting VIRL sim nodes to be active within ca. 120sec after launch request - this is too tight. []
    1. Description: Intermittent test job failures due to 'ERROR: Simulation started OK but devices never changed to ACTIVE state'. A number of these can be avoided by increasing the script timeout to 240sec or so.
    2. Solution: Increase the script timeout to 240sec or so, but do not wait the full 4min every time before checking, as this would add to the overall execution time.
    3. Tasks:
      • [WIP] Increase test script timeout to 240sec. CSIT-593
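One way to raise the limit without always paying the full 4 minutes is to poll at a short interval and return as soon as the nodes are ACTIVE. The sketch below assumes a hypothetical `all_nodes_active` callable that wraps whatever status query the launch script already performs; the function name, the 5sec interval, and the injectable `clock`/`sleep` parameters are illustrative choices, not the existing script's interface.

```python
import time


def wait_for_active(all_nodes_active, timeout=240.0, interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until all_nodes_active() returns True or `timeout` seconds pass.

    Returns True as soon as the check succeeds, False once the deadline is hit.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if all_nodes_active():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage with a stand-in check that succeeds immediately (no real VIRL call).
print(wait_for_active(lambda: True))
```

A simulation that comes up after 30sec costs about 30sec here, while a slow one still gets the full 240sec before the job is failed.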
  5. [WIP] tb4-virl servers upgrade to ubuntu16.04, VIRL-core ver. 10.32.8, OpenStack Mitaka. CSIT-594
    1. Description: VIRL upgrade to address issues with Centos7 test instabilities related to QEMU, and to improve general VIRL system robustness.
    2. Solution: Upgrade the tb4-virl1 server to ubuntu16.04, VIRL-core ver. 10.32.8, OpenStack Mitaka, and verify stability. Follow gradually with the tb4-virl2 and then tb4-virl3 upgrades.
    3. Tasks:
      • [WIP] VIRL1 server 10.30.51.28 - currently in STAGING, resolving issues. testing ongoing. CSIT-595.
      • [OPEN] VIRL1 server 10.30.51.28 - move to PRODUCTION once determined stable. Monitor PRODUCTION performance. CSIT-596.
      • [OPEN] VIRL1 server 10.30.51.28 - complete upgrade process documentation and ansible scripts. CSIT-597.
      • [OPEN] VIRL2 server 10.30.51.29 - upgrade based on documentation and ansible scripts from VIRL1 upgrade process, verify stability. CSIT-598.
      • [OPEN] VIRL2 server 10.30.51.29 - once stable, move to PRODUCTION. [1].
      • [OPEN] VIRL3 server 10.30.51.30 - upgrade based on documentation and ansible scripts from VIRL1 upgrade process, verify stability. CSIT-600.
      • [OPEN] VIRL3 server 10.30.51.30 - once stable, move to PRODUCTION. CSIT-601.
  6. [DONE] Need to periodically delete old files in /tmp directory. CSIT-578.
    1. Tasks:
      • [DONE] Cron job to delete old (more than 2 weeks?) files in /tmp directory on every VIRL server. CSIT-578.
        crontab -e
        # m h dom mon dow command (five time fields; each entry runs daily at 00:00)
        0 0 * * * find /var/log/libvirt/qemu -type f -mtime +14 -name "instance*.log" -delete
        0 0 * * * find /tmp -type f -atime +14 -name "*.deb" -delete
        0 0 * * * find /tmp -type f -atime +14 -name "*.rpm" -delete
        # no -delete or -exec action here: this entry only lists matching session directories
        0 0 * * * find /nfs/scratch/ -type d -mtime +14 -name "session-*"
  7. [WIP] VIRL Centos7 tests productization into vpp-csit-verify. CSIT-602.
    1. Description: Following upgrade of tb4-virl1, Centos7 tests should be ready for productization.
    2. Solution: Proposal to run Centos7 tests periodically (daily) instead of per patch, to avoid VIRL simulations overload.
    3. Tasks:
      • [OPEN] Verify stability of csit-vpp-verify-Centos7 jobs. CSIT-603.
      • [OPEN] Create a daily vpp-csit-verify-Centos7 job. CSIT-604.

Other Priority Tasks

  1. [OPEN] CSIT-90: Nested-VM boot-up failed.
  2. [DELETED?] CSIT-210: Nested VM to include l3fwd startup script.
  3. [OPEN] CSIT-161: Update nested VM qemu library to use 3rd serial console.
  4. [OPEN] CSIT-356: Update VIRL testbed creation to allow specification of centos image.
  5. [OPEN] CSIT-605: Parameterize selection of VIRL nested VM image.
    1. Description: Currently VIRL is using only the latest nested Ubuntu or Centos VM image for all VM tests. Current inventory of VIRL nested Ubuntu VM images is tracked in https://git.fd.io/csit/tree/resources/tools/disk-image-builder/nested/CHANGELOG.
    2. Solution: Parameterize selection of VIRL nested VM image to allow tests to use specific VM image version - start with Ubuntu.
    3. Tasks: to be identified.
  6. [OPEN] CSIT-116: Modify VIRL and nested-VM username/password.
  7. [OPEN] CSIT-159: Nested VM: Replace cisco/cisco credentials with csit/csit.
  8. [OPEN] CSIT-160: Ubuntu VM: Replace cisco login with csit.
  9. [OPEN] CSIT-145: Out-of-band access to SUTs.
  10. [OPEN] CSIT-151: Do not destroy VM in case of test failure due to infrastructure issue.
  11. [OPEN] CSIT-150: Health-check to capture TG/SUT environment after failed test case.
  12. [OPEN] CSIT-202: Execute start/stop-testcase scripts from git repository.
  13. [OPEN] CSIT-115: Usage and status monitoring of VIRL hosts.
  14. [OPEN] CSIT-112: VIRL infrastructure periodic creation and distribution of images.