Difference between revisions of "CSIT/CSIT LF VIRL testbed"

Revision as of 11:03, 17 May 2017

VIRL infrastructure open tasks

This is the current working list of identified tasks for CSIT VIRL testbeds. It is updated periodically. All listed tasks and sub-tasks are grouped in CSIT Jira under CSIT-581:

  • Summary: Address all known issues impacting CSIT VIRL testbeds stability and operation.
  • Epic Name: VIRL-GetWellPlan.

High Priority Tasks

  1. [WIP] Detecting and clearing stuck VIRL simulations. CSIT-582.
    1. Description: VIRL simulations continue to get stuck due to either LF network connectivity interruptions or a failing CSIT bootstrap teardown.
    2. Solution: automate clearing old (garbage) simulations, detect non-successful simulation teardowns.
    3. Tasks:
      • [IN-REVIEW] Use built-in VIRL simulation expire timer set to 2hrs.
        • coded default simulation expiry to 120min, semi-weekly to 500min, weekly to 120min. CSIT-579. gr6656.
      • [OPEN] Extend CSIT bootstrap teardown stop-simulation API call to verify if it was SUCCESS/FAIL. CSIT-583.
  2. [WIP] Add VIRL server healthchecks in CSIT. CSIT-584.
    1. Description: No regular automated healthchecks are executed against VIRL servers.
    2. Solution: introduce a CSIT health-check monitoring job for VIRL servers' health.
    3. Tasks:
      • [OPEN] Create a new job, executed periodically (6hrs?) for healthchecking all VIRL servers. CSIT-585.
      • [OPEN] VIRL health-check APIs: health status, VIRL API tests, simulation tests. CSIT-586.
      • [OPEN] VIRL capacity check, report number of simulations per VIRL server. CSIT-587.
      • [IN-REVIEW] Add a pre-check to every start-testcase to better handle exceptions and print errors. CSIT-579. gr6656.
  3. [OPEN] Address VIRL simulation mgmt IPv4 address depletion. CSIT-588.
    1. Description: Today there is one /24 subnet allocated for all VIRL simulations, split equally across 3 servers, giving 84 /32 addresses per server. Each CSIT simulation takes 4 addresses (mgmt, tg, sut1, sut2), and each csit-vpp and vpp-csit verify job uses 3 simulations to parallelize tests for reduced execution time. This means each server has capacity to run up to 7 verify jobs concurrently (3*4*7 = 84). Once Centos7 tests are productized, where two jobs are always executed in parallel, this will drop to 3 concurrent jobs. Not good: it is basically a show stopper for productizing Centos7 into vpp-csit-verify per patch jobs.
    2. Solution: Increase the IPv4 address space given to VIRL hosts. Dedicating a /24 subnet per VIRL server will give address capacity for 60 concurrent simulations. Based on previous memory calcs each VIRL host is capable of running 30 simulations (30*3 VMs); this needs to be tested and verified.
    3. Tasks:
  4. [WIP] Script expecting VIRL sim nodes to be active within ca. 120sec after launch request - this is too tight. []
    1. Description: Intermittent test job failures due to 'ERROR: Simulation started OK but devices never changed to ACTIVE state'. A number of these can be avoided by increasing the script timeout to 240sec or so.
    2. Solution: Increase the script timeout to 240sec or so, but do not wait the full 4min every time before proceeding, as this would add to the overall execution time.
    3. Tasks:
      • [WIP] Increase test script timeout to 240sec. CSIT-593
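The solution above (a longer timeout without a fixed 4min wait) can be met by polling: check the node states at a short interval and return as soon as all are ACTIVE. The following is a minimal sketch only, not the actual CSIT start-testcase code. `get_states`, the 5sec interval and the state dictionary layout are assumptions made for illustration.

```python
import time


def wait_until_active(get_states, timeout=240.0, interval=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until every node reports ACTIVE or the timeout elapses.

    get_states is a hypothetical callable returning {node_name: state};
    it stands in for whatever query the real script performs.
    Returns True as soon as all nodes are ACTIVE, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        states = get_states()
        # Succeed immediately once every reported node is ACTIVE.
        if states and all(state == "ACTIVE" for state in states.values()):
            return True
        # Give up only after the full timeout has been used.
        if clock() >= deadline:
            return False
        sleep(interval)
```

With this shape a simulation that becomes ACTIVE after 30sec costs about 30sec, while a slow one still gets the full 240sec before the job fails.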
  5. [WIP] tb4-virl servers upgrade to ubuntu16.04, VIRL-core ver. 10.32.8, OpenStack Mitaka. CSIT-594
    1. Description: Upgrade VIRL to address Centos7 test instabilities related to QEMU, and to improve general VIRL system robustness.
    2. Solution: Upgrade tb4-virl1 server to ubuntu16.04, VIRL-core ver. 10.32.8, OpenStack Mitaka, and verify stability. Follow gradually with tb4-virl2 and then tb4-virl3 upgrades.
    3. Tasks:
      • [WIP] VIRL1 server 10.30.51.28 - currently in STAGING, resolving issues. testing ongoing. CSIT-595.
      • [OPEN] VIRL1 server 10.30.51.28 - move to PRODUCTION once determined stable. Monitor PRODUCTION performance. CSIT-596.
      • [OPEN] VIRL1 server 10.30.51.28 - complete upgrade process documentation and ansible scripts. CSIT-597.
      • [OPEN] VIRL2 server 10.30.51.29 - upgrade based on documentation and ansible scripts from VIRL1 upgrade process, verify stability. CSIT-598.
      • [OPEN] VIRL2 server 10.30.51.29 - once stable, move to PRODUCTION. [1].
      • [OPEN] VIRL3 server 10.30.51.30 - upgrade based on documentation and ansible scripts from VIRL1 upgrade process, verify stability. CSIT-600.
      • [OPEN] VIRL3 server 10.30.51.30 - once stable, move to PRODUCTION. CSIT-601.
  6. [DONE] Need to periodically delete old files in /tmp directory. CSIT-578.
    1. Tasks:
      • [DONE] Cron job to delete old (more than 2 weeks?) files in /tmp directory on every VIRL server. CSIT-578.
        crontab -e
        0 0 * * * find /var/log/libvirt/qemu -type f -mtime +14 -name "instance*.log" -delete
        0 0 * * * find /tmp -type f -atime +14 -name "*.deb" -delete
        0 0 * * * find /tmp -type f -atime +14 -name "*.rpm" -delete
        0 0 * * * find /nfs/scratch/ -type d -mtime +14 -name "session-*"
  7. [WIP] VIRL Centos7 tests productization into vpp-csit-verify. CSIT-602.
    1. Description: Following upgrade of tb4-virl1, Centos7 tests should be ready for productization.
    2. Solution: Proposal to run Centos7 tests periodically (daily) instead of per patch, to avoid VIRL simulations overload.
    3. Tasks:
      • [OPEN] Verify stability of csit-vpp-verify-Centos7 jobs. CSIT-603.
      • [OPEN] Create a daily vpp-csit-verify-Centos7 job. CSIT-604.
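
The address capacity figures quoted in task 3 (CSIT-588) can be re-derived with a few lines of arithmetic. This is a sketch for checking the numbers only; the function names are made up for illustration, and the 254 usable addresses per /24 (256 minus network and broadcast) is an assumption about how a dedicated subnet would be carved up.

```python
# Figures taken from the CSIT-588 description.
ADDRS_PER_SIMULATION = 4   # mgmt, tg, sut1, sut2
SIMS_PER_VERIFY_JOB = 3    # each verify job runs 3 simulations in parallel


def concurrent_jobs(addrs_per_server, parallel_jobs_per_patch=1):
    """Concurrent verify runs one server can host, limited by addresses."""
    addrs_per_job = ADDRS_PER_SIMULATION * SIMS_PER_VERIFY_JOB
    jobs = addrs_per_server // addrs_per_job
    # With Centos7 productized, every patch triggers two jobs at once.
    return jobs // parallel_jobs_per_patch


def concurrent_simulations(addrs_per_server):
    """Concurrent simulations one server can host, limited by addresses."""
    return addrs_per_server // ADDRS_PER_SIMULATION


print(concurrent_jobs(84))          # today: one /24 split across 3 servers
print(concurrent_jobs(84, 2))       # with Centos7 jobs running in parallel
print(concurrent_simulations(254))  # assumed usable addresses in a /24
```

The first two values match the 7 and 3 concurrent jobs stated above. The last comes out slightly above the 60 simulations quoted, which is consistent if a few addresses per subnet are held back for other uses (an assumption).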

Other Tasks

  1. CSIT-116 [2]: Modify VIRL and nested-VM username/password
  2. CSIT-159 [3]: Nested VM: Replace cisco/cisco credentials with csit/csit
  3. CSIT-160 [4]: Ubuntu VM: Replace cisco login with csit
  4. CSIT-145 [5]: Out-of-band access to SUTs
  5. CSIT-151 [6]: Do not destroy VM in case of test failure due to infrastructure issue
  6. CSIT-150 [7]: Health-check to capture TG/SUT environment after failed test case
  7. CSIT-202 [8]: Execute start/stop-testcase scripts from git repository
  8. CSIT-115 [9]: Usage and status monitoring of VIRL hosts
  9. CSIT-112 [10]: VIRL infrastructure periodic creation and distribution of images
  10. CSIT-90 [11]: Nested-VM boot-up failed
  11. CSIT-210 [12]: Nested VM to include l3fwd startup script
  12. CSIT-161 [13]: Update nested VM qemu library to use 3rd serial console
  13. CSIT-356 [14]: Update VIRL testbed creation to allow specification of centos image
  14. [OPEN] - Currently the latest nested VM image is used for all Ubuntu/Centos images
    1. Description: A solution is needed to link different nested VM images to different ubuntu/centos images.