Difference between revisions of "Nomad Operations and Planning"

From fd.io

Revision as of 16:04, 29 May 2020

Nomad clusters are hosted on dedicated servers in the FD.io lab and used to manage Docker container based executors for FD.io project CI jobs.

Physical Lab Infrastructure

Nomad Operational Status

  • TBD - Add description of or link to Nomad architecture / configuration
  • TBD - Add links to Nomad monitoring status / data

Jenkins Nomad Plugin Configuration

Proposed Jenkins Nomad Plugin Label to Docker Image Mapping

Jenkins Nomad Plugin labels can only be created via LF Service Desk requests. To test CI jobs in the Jenkins Sandbox, a separate label must exist that points to a sandbox Dockerhub repository, so that Docker image testing does not disrupt operational jobs. This section proposes the nomenclature for mapping Jenkins Nomad Plugin labels to their corresponding Dockerhub repositories (production and sandbox variants).

All new Docker images will first be pushed to the Dockerhub 'sandbox repo' associated with the label. All CI jobs that use the Jenkins Nomad Plugin label will then be verified in the Jenkins Sandbox by modifying the appropriate JJB YAML files to point to the sandbox repo. Once all CI jobs have been verified, the image will be pushed to the operational Dockerhub repository associated with the label.

Label Nomenclature

fdiotools-prod-{arch}-{os}[-{job}]
fdiotools-verify-{arch}-{os}[-{job}]
fdiotools-sandbox-{arch}-{os}[-{job}]

where

  • {arch} = x86_64, arm64, ...
  • {os} = ubuntu1804, ubuntu2004, centos7, centos8, ...
  • {job} = project/job abbreviation (tbd)

NOTE: The 'sandbox' and 'verify' labels should NEVER be merged into the ci-management repository and will be added only to the Jenkins sandbox instance. The 'prod' labels will be added to both the sandbox and production Jenkins instances.
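
As an illustration of the nomenclature, the following is a minimal sketch of a label parser. The names `LABEL_RE`, `Label`, `parse_label`, and `allowed_in_ci_management` are hypothetical and not part of any FD.io tooling. The character classes used for {arch}, {os}, and {job} are assumptions inferred from the listed values, since the {job} format is still tbd.

```python
import re
from typing import NamedTuple, Optional

# Pattern for fdiotools-{stage}-{arch}-{os}[-{job}].
# Assumption: arch and job contain only lowercase letters, digits, and
# underscores; os is letters followed by digits (e.g. ubuntu1804, centos7).
LABEL_RE = re.compile(
    r"^fdiotools-(?P<stage>prod|verify|sandbox)"
    r"-(?P<arch>[a-z0-9_]+)"
    r"-(?P<os>[a-z]+[0-9]+)"
    r"(?:-(?P<job>[a-z0-9_]+))?$"
)


class Label(NamedTuple):
    stage: str
    arch: str
    os: str
    job: Optional[str]


def parse_label(label: str) -> Optional[Label]:
    """Split a label into its parts, or return None if it does not match."""
    m = LABEL_RE.match(label)
    if m is None:
        return None
    return Label(m["stage"], m["arch"], m["os"], m["job"])


def allowed_in_ci_management(label: Label) -> bool:
    """Per the NOTE: only 'prod' labels may be merged into ci-management."""
    return label.stage == "prod"
```

A check like `allowed_in_ci_management` could, for example, be run against labels referenced in a proposed ci-management change to catch a 'sandbox' or 'verify' label before it is merged.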

Examples

Label | Dockerhub Repo
fdiotools-prod-x86_64-ubuntu1804 | https://hub.docker.com/repository/docker/fdiotools/prod_x86_64_ubuntu1804
fdiotools-prod-arm64-ubuntu1804 | https://hub.docker.com/repository/docker/fdiotools/prod_arm64_ubuntu1804
fdiotools-verify-x86_64-ubuntu1804 | https://hub.docker.com/repository/docker/fdiotools/verify_x86_64_ubuntu1804
fdiotools-verify-arm64-ubuntu1804 | https://hub.docker.com/repository/docker/fdiotools/verify_arm64_ubuntu1804
fdiotools-sandbox-x86_64-ubuntu1804 | https://hub.docker.com/repository/docker/fdiotools/sandbox_x86_64_ubuntu1804
fdiotools-sandbox-arm64-ubuntu1804 | https://hub.docker.com/repository/docker/fdiotools/sandbox_arm64_ubuntu1804
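
The examples suggest a mechanical rule: drop the `fdiotools-` prefix from the label and replace the remaining hyphens with underscores to get the repository name. The sketch below encodes that rule. The function name `dockerhub_repo_url` is hypothetical, and applying the same rule to labels with a {job} suffix is an assumption, since no such example is listed.

```python
DOCKERHUB_BASE = "https://hub.docker.com/repository/docker/fdiotools/"
LABEL_PREFIX = "fdiotools-"


def dockerhub_repo_url(label: str) -> str:
    """Derive the Dockerhub repo URL from a Jenkins Nomad Plugin label.

    Rule inferred from the examples: strip the 'fdiotools-' prefix, then
    turn every remaining '-' into '_'. Assumption: labels with a {job}
    suffix follow the same rule.
    """
    if not label.startswith(LABEL_PREFIX):
        raise ValueError(f"not an fdiotools label: {label!r}")
    repo_name = label[len(LABEL_PREFIX):].replace("-", "_")
    return DOCKERHUB_BASE + repo_name


print(dockerhub_repo_url("fdiotools-verify-arm64-ubuntu1804"))
```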

Q & A

QUESTION 1: Is there a need for project-specific images?

ANSWER 1: Current project jobs are generally tightly bound to VPP Infra dependencies, with some additional requirements. For most executors it makes sense to combine dependencies into the VPP Docker images rather than have per-project images. Project jobs that are independent will have separate images (e.g. csit 'vpp-device' test).

QUESTION 2: Is there a need for different size labels (small, medium, large) with different resource allocations (e.g. CPU, memory, etc.)?

ANSWER 2: Currently most of the 'size' flavors are unused. The initial plan is to not have size-based labels. If specific jobs have different size requirements, labels will be added with a job-specific postfix.

Legacy Jenkins Label to Docker Image Mapping

These labels are currently defined (circa VPP 20.05) and will be removed once there are no references to them in the ci-management repo. Some of them are not being used today.

Legacy Label(s) | Dockerhub Repo
ubuntu1604-us, ubuntu1604-s, ubuntu1604-m, ubuntu1604-l, ubuntu1604-hub-us, ubuntu1604-nus | https://hub.docker.com/repository/docker/snergster/vpp-ubuntu16
ubuntu1604arm-hub-us | https://hub.docker.com/repository/docker/snergster/vpp-arm-ubuntu16
ubuntu1804-us, ubuntu1804-s, ubuntu1804-m, ubuntu1804-l, ubuntu1804-hub-us, vpp-csit-device, vpp-csit-ubuntu18 | https://hub.docker.com/repository/docker/snergster/vpp-ubuntu18
ubuntu1804arm-hub-us, ubuntu1804arm-s, ubuntu1804arm-m, vpp-csit-arm-ubuntu18 | https://hub.docker.com/repository/docker/snergster/vpp-arm-ubuntu18
centos7-us, centos7-s, centos7-m, centos7-l, centos7-hub-us | https://hub.docker.com/repository/docker/snergster/vpp-centos
ubuntu2004-us | https://hub.docker.com/repository/docker/snergster/vpp-ubuntu20
centos8-us | https://hub.docker.com/repository/docker/snergster/vpp-centos8
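
Since the legacy labels can only be removed once nothing in ci-management references them, one way to check is to scan the repository's YAML files for label names. The following is a rough sketch under stated assumptions: the helper names `find_legacy_labels` and `scan_tree` are hypothetical, and it assumes labels appear verbatim in `.yaml` / `.yml` files.

```python
import re
from pathlib import Path
from typing import Dict, List, Sequence

# Legacy labels copied from the table above.
LEGACY_LABELS = [
    "ubuntu1604-us", "ubuntu1604-s", "ubuntu1604-m", "ubuntu1604-l",
    "ubuntu1604-hub-us", "ubuntu1604-nus", "ubuntu1604arm-hub-us",
    "ubuntu1804-us", "ubuntu1804-s", "ubuntu1804-m", "ubuntu1804-l",
    "ubuntu1804-hub-us", "vpp-csit-device", "vpp-csit-ubuntu18",
    "ubuntu1804arm-hub-us", "ubuntu1804arm-s", "ubuntu1804arm-m",
    "vpp-csit-arm-ubuntu18",
    "centos7-us", "centos7-s", "centos7-m", "centos7-l", "centos7-hub-us",
    "ubuntu2004-us", "centos8-us",
]


def find_legacy_labels(text: str,
                       labels: Sequence[str] = LEGACY_LABELS) -> List[str]:
    """Return the legacy labels that occur in text as whole tokens."""
    found = []
    for label in labels:
        # Lookarounds reject matches embedded in a longer identifier,
        # e.g. 'ubuntu1804-s' inside 'ubuntu1804-sandbox'.
        pattern = r"(?<![\w-])" + re.escape(label) + r"(?![\w-])"
        if re.search(pattern, text):
            found.append(label)
    return found


def scan_tree(root: str,
              suffixes: Sequence[str] = (".yaml", ".yml")) -> Dict[str, List[str]]:
    """Map each YAML file under root to the legacy labels it references."""
    hits: Dict[str, List[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            found = find_legacy_labels(path.read_text(errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

Running `scan_tree` on a local ci-management checkout would list the files that still need updating. An empty result would suggest, under the assumptions above, that the legacy labels are no longer referenced.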

Nomad Operations Tasks

This is the current list of high priority Nomad tasks.

Task Description | Owner | % Complete | ETA
Move Nomad Docker images from https://hub.docker.com/search?q=snergster&type=image into the fdiotools Dockerhub account. | Dave W. | 50% | June 1, 2020
Update Ubuntu1804 & Centos7 Nomad Docker images to include clang-9 toolchain packages required by VPP 'make install-deps' and the lf-infra-publish macro. | Dave W. | 25% | June 2, 2020
New Jenkins/Nomad labels for production, verify, & sandbox. | Dave W. | | June 1, 2020
Nomad server OS upgrades/normalization. Utilize ansible to create a uniform bare-metal OS environment across all Nomad servers. | Peter M. | 99.9% | May 29, 2020
Build & test Ubuntu 20.04 and CentOS 8 Docker images for CI executors to run respective OS jobs. | Dave W. | |
Replace the server-type-c4-3 (10.32.8.16) SSD with an HDD, reinstall Ubuntu 18.04, and restore the server to the Nomad cluster. | Dave W. | Vexxhost Ticket Created | TBD
Update VPP ci-management configurations to use global JJB macros (lf-publisher & build-discarder). | Vanessa V. | |
Export Gerrit & Jenkins logs and other operational data to Nomad servers. | Dave W. & Vanessa V. | LF Ticket Created | TBD

Nomad Planning Wish List

This is the list of long-term Nomad tasks. When a task is being actively worked on, please move it to the Nomad Operations Tasks list and provide owner/ETA information.

  • Convert task list to use JIRA tickets/epics for tracking ongoing Nomad work.
  • Create ci-management jobs to do automated build/test/verify for CI process and weekly upgrade of docker images.
  • Add Nomad nodes to LF DNS & make the names the same as the hostname.
  • Add VPP "test crash" testcase to 'make test'
  • Add VPP 'make test-debug w/ ASAN enabled' verify job
  • Convert Jenkins Nomad-plugin configuration spreadsheet to JJB managed YAML configuration files.
  • Investigate Jenkins Nomad-plugin security issues.
  • Convert Nomad/Jenkins/Gerrit monitoring/screen-scraping hacks into an operational monitoring system using exported gerrit & jenkins logs & nomad cli output.
  • Add a mechanism to measure/track the memory consumed by the CI jobs inside Docker images. Comment (pmikus): This depends on whether we want live monitoring or stored logs (and if so, for how long). Prometheus could be made to work for this with a very simple config change.

Completed Nomad Tasks

Task Description | Owner | % Complete | Finish Date
Move Nomad build executor Dockerfiles from https://github.com/snergfdio/* into the ci-management project. | Dave W. | 100% | April 29, 2020
Add a sudoer/admin account to all Nomad servers. | Dave W. | 100% | May 18, 2020
Move server-type-c4-2 from Class 's5ci' to Class 'builder' to cover t4-virl* Nomad clients during upgrade. | Dave W. | 100% | May 18, 2020
Perform fresh installation of Ubuntu 18.04 Server on t4-virl1, t4-virl2, & t4-virl3. | Peter M. | 100% | May 25, 2020
Restore Nomad configuration on t4-virl1, t4-virl2, & t4-virl3 and rejoin the VPP cluster. | Peter M. | 100% | May 26, 2020