Nomad Operations and Planning

Nomad clusters are hosted on dedicated servers in the lab and manage the Docker-container-based executors that run project CI jobs.

Physical Lab Infrastructure

Nomad Operational Status

TBD - add description or link to Nomad architecture / configuration

  • TBD - Add links to Nomad monitoring status / data

Nomad Operations Tasks

This is the current list of high-priority Nomad tasks.

Task Description | Owner | % Complete | ETA
Move Nomad operational Docker images from the snergster Docker Hub account into the fdiotools Docker Hub account. | Dave W. | 50% | June 1, 2020
Update Ubuntu1804 & Centos7 Nomad Docker images to include clang-9 toolchain packages required by VPP 'make install-deps' and the lf-infra-publish macro. | Dave W. | 25% | June 2, 2020
New Jenkins/Nomad labels for production, verify, & sandbox. | Dave W. | | June 1, 2020
Nomad server OS upgrades/normalization. Use Ansible to create a uniform bare-metal OS environment across all Nomad servers. | Peter M. | 99.9% | May 29, 2020
Build & test Ubuntu 20.04 and CentOS 8 Docker images for CI executors to run respective OS jobs. | Dave W. | |
Fix server-type-c4-3 (SSD with an HDD), reinstall Ubuntu 18.04, and restore to Nomad cluster. | Dave W. | Vexxhost Ticket Created | TBD
Update VPP ci-management configurations to use global jjb macros (lf-publisher & build-discarder). | Vanessa V. | |
Export Gerrit & Jenkins logs and other operational data to Nomad servers. | Dave W. & Vanessa V. | LF Ticket Created | TBD

Nomad Planning Wish List

This is the list of long-term Nomad tasks. When a task is being actively worked on, please move it to Nomad Operations Tasks and provide owner/ETA information.

  • Clean up old/unused Jenkins jobs.
  • Use Configuration as Code Jenkins Plugin to manage Jenkins configuration (including Nomad Plugin) via YAML configuration files.
  • Nomad cluster resiliency testing/hardening improvements.
  • Nomad docker image CI/CD pipeline.
  • Convert task list to use JIRA tickets/epics for tracking ongoing Nomad work.
  • Create ci-management jobs to do automated build/test/verify for CI process and weekly upgrade of docker images.
  • Add Nomad nodes to LF DNS & make the names the same as the hostname.
  • Add automated Nomad server quorum loss tests.
  • Add VPP "test crash" testcase to 'make test'.
  • Add VPP 'make test-debug w/ ASAN enabled' verify job.
  • Investigate Jenkins Nomad-plugin security issues.
  • Convert Nomad/Jenkins/Gerrit monitoring/screen-scraping hacks into an operational monitoring system using exported Gerrit & Jenkins logs and Nomad CLI output.
  • Add a mechanism to measure/track the memory consumed by the CI jobs inside Docker images. pmikus_comment: Depends on whether we want live monitoring or stored logs (for how long?). I can make Prometheus work for us with a very simple change in config.
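
For the quorum loss tests item in the list above, a test plan needs to know how many Nomad servers can be stopped before the cluster loses quorum. Nomad servers use Raft consensus, which requires a strict majority of servers to be available. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and it does not talk to a Nomad cluster.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict (Raft) majority."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """Number of servers that can fail while the rest still hold quorum."""
    return servers - quorum_size(servers)


def has_quorum(servers: int, alive: int) -> bool:
    """True if `alive` out of `servers` configured servers is a majority."""
    return alive >= quorum_size(servers)


if __name__ == "__main__":
    # Print the quorum size and failure tolerance for common cluster sizes.
    for n in (1, 3, 5, 7):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
```

Under these assumptions, a quorum loss test on a three-server cluster would stop two servers, and a recovery test would bring one back to confirm that quorum is regained.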

Completed Nomad Tasks

Task Description | Owner | % Complete | Finish Date
Move Nomad build executor Dockerfiles from* into the ci-management project. | Dave W. | 100% | April 29, 2020
Add a sudoer/admin account to all Nomad servers. | Dave W. | 100% | May 18, 2020
Move server-type-c4-2 from Class 's5ci' to Class 'builder' to cover t4-virl* Nomad clients during upgrade. | Dave W. | 100% | May 18, 2020
Perform fresh installation of Ubuntu 18.04 Server on t4-virl1, t4-virl2, & t4-virl3. | Peter M. | 100% | May 25, 2020
Restore Nomad configuration on t4-virl1, t4-virl2, & t4-virl3 and rejoin the VPP cluster. | Peter M. | 100% | May 26, 2020