CSIT/csit-perf-env-tuning-ubuntu1604-obsolete

(This page is obsolete and no longer maintained.)

We have upgraded the CSIT performance testbeds and used this opportunity to apply kernel configuration changes that should address some of the issues observed during performance tests in CSIT rls1609, mainly related to interactions with QEMU in vhost tests.

Kernel boot parameters (grub)

The following kernel boot parameters are used in CSIT performance testbeds

  • isolcpus=<cpu number>-<cpu number> - used for all CPU cores that run VPP worker threads. [KNL,SMP] Isolates CPUs from the general scheduler; can be used to specify one or more CPUs to isolate from the general SMP balancing and scheduling algorithms. [KNL - a kernel start-up parameter, SMP - the kernel is an SMP kernel]
    • https://www.kernel.org/doc/Documentation/kernel-parameters.txt
  • intel_pstate=disable - [X86] Do not enable intel_pstate as the default scaling driver for the supported processors. The Intel P-State driver decides which P-state (CPU core power state) to use based on the policy requested from the cpufreq core. [X86 - either 32-bit or 64-bit x86]
    • https://www.kernel.org/doc/Documentation/kernel-parameters.txt
    • https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt
  • nohz_full=<cpu number>-<cpu number> - [KNL,BOOT] In kernels built with CONFIG_NO_HZ_FULL=y, set the specified list of CPUs whose tick will be stopped whenever possible. The boot CPU will be forced outside the range to maintain timekeeping. The CPUs in this range must also be included in the rcu_nocbs= set. This specifies the adaptive-ticks CPU cores, causing the kernel to avoid sending scheduling-clock interrupts to the listed cores as long as they have a single runnable task.
    • https://www.kernel.org/doc/Documentation/kernel-parameters.txt
    • https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
  • rcu_nocbs=<cpu number>-<cpu number> - [KNL] In kernels built with CONFIG_RCU_NOCB_CPU=y, set the specified list of CPUs to be no-callback CPUs that never queue RCU (read-copy update) callbacks.
    • https://www.kernel.org/doc/Documentation/kernel-parameters.txt
    • https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
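
A quick way to confirm these parameters took effect after a reboot is to inspect the kernel command line and the active cpufreq scaling driver. This is a minimal check sketch; with intel_pstate disabled the kernel typically falls back to the acpi-cpufreq driver:

 $ # Verify the running kernel was booted with the expected parameters:
 $ cat /proc/cmdline
 $ # With intel_pstate=disable, the fallback driver (typically acpi-cpufreq) should be active:
 $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver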

All grub command line parameters are applied during installation using CSIT ansible scripts

 $ cd resources/tools/testbed-setup/playbooks/
 $ more 01-host-setup.yaml
 
 - name: isolcpus and pstate parameter
   lineinfile: dest=/etc/default/grub regexp=^GRUB_CMDLINE_LINUX= line=GRUB_CMDLINE_LINUX="\"isolcpus={{ isolcpus }} nohz_full={{ isolcpus }} rcu_nocbs={{ isolcpus }} intel_pstate=disable\""
 
 $ # Sample of generated grub config line:
 $ # GRUB_CMDLINE_LINUX="isolcpus=1-17,19-35 intel_pstate=disable nohz_full=1-17,19-35 rcu_nocbs=1-17,19-35"
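
The lineinfile task above only edits /etc/default/grub; for the new command line to take effect, the grub configuration has to be regenerated and the host rebooted. These follow-up steps are not shown in the playbook excerpt, so the commands below are an assumed manual equivalent:

 $ # Regenerate /boot/grub/grub.cfg from /etc/default/grub and reboot (assumed follow-up):
 $ sudo update-grub
 $ sudo reboot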

Changes applied during upgrade from Ubuntu 14.04.3 to Ubuntu 16.04.1

  • Ubuntu 14.04.3
    • sample of generated grub config line: GRUB_CMDLINE_LINUX="isolcpus=1-17,19-35 intel_pstate=disable"
  • Ubuntu 16.04.1
    • sample of generated grub config line: GRUB_CMDLINE_LINUX="isolcpus=1-17,19-35 intel_pstate=disable nohz_full=1-17,19-35 rcu_nocbs=1-17,19-35"

The CSIT ansible scripts also install default configuration files for cpufrequtils and irqbalance from resources/tools/testbed-setup/playbooks/files:

 $ cd resources/tools/testbed-setup/playbooks/files
 $ more cpufrequtils
 GOVERNOR="performance"
 - name: Set cpufrequtils defaults
   copy: src=files/cpufrequtils dest=/etc/default/cpufrequtils owner=root group=root mode=0644
 $ more irqbalance
 #Configuration for the irqbalance daemon
 #Should irqbalance be enabled?
 ENABLED="0"
 #Balance the IRQs only once?
 ONESHOT="0"
 - name: Disable IRQ load balancing
   copy: src=files/irqbalance dest=/etc/default/irqbalance owner=root group=root mode=0644
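
To confirm both defaults are in effect on a running host, check the active scaling governor on every core and the irqbalance service state. A minimal sketch, assuming the stock Ubuntu service name irqbalance:

 $ # Every core should report the "performance" governor:
 $ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
 $ # irqbalance should not be running (ENABLED="0"):
 $ service irqbalance status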

Sysctl settings

  • Hugepages are not configured via GRUB_CMDLINE_LINUX (e.g. GRUB_CMDLINE_LINUX="default_hugepagesz=1GB hugepagesz=1G hugepages=64").
  • sysctl is used instead to set hugepages and additional related parameters.
  • With Ubuntu 14.04.3:
    • Hugepages were applied by VPP via 80-vpp.conf (1024 of 2MB hugepages). For the vhost measurements, additional hugepages were allocated dynamically during the tests. This approach led to heavy fragmentation of the memory space and caused issues on the testbeds.
  • With Ubuntu 16.04.1, the hugepage and related settings are applied via a dedicated sysctl file, as shown below:
   $ cd resources/tools/testbed-setup/playbooks/
   $ more 01-host-setup.yaml
 
   - name: copy sysctl file
     template: src=files/90-csit dest=/etc/sysctl.d/90-csit.conf owner=root group=root mode=644
   
   $ more resources/tools/testbed-setup/playbooks/files/90-csit
   
   # change the minimum size of the hugepage pool.
   vm.nr_hugepages=4096
   
   # maximum number of memory map areas a process
   # may have. memory map areas are used as a side-effect of calling
   # malloc, directly by mmap and mprotect, and also when loading shared
   # libraries.
   # while most applications need less than a thousand maps, certain
   # programs, particularly malloc debuggers, may consume lots of them,
   # e.g., up to one or two maps per allocation.
   # must be greater than or equal to (2 * vm.nr_hugepages).
   vm.max_map_count=200000
   
   # hugetlb_shm_group contains group id that is allowed to create sysv
   # shared memory segment using hugetlb page.
   vm.hugetlb_shm_group=0
   
   # this control is used to define how aggressive the kernel will swap
   # memory pages.  higher values will increase aggressiveness, lower values
   # decrease the amount of swap.  a value of 0 instructs the kernel not to
   # initiate swap until the amount of free and file-backed pages is less
   # than the high water mark in a zone.
   vm.swappiness=0
   
   # shared memory max must be greater than or equal to the total size of hugepages.
   # for 2mb pages, totalhugepagesize = vm.nr_hugepages * 2 * 1024 * 1024
   # if the existing kernel.shmmax setting  (cat /sys/proc/kernel/shmmax)
   # is greater than the calculated totalhugepagesize then set this parameter
   # to current shmmax value.
   kernel.shmmax=8589934592
   
   # this option can be used to select the type of process address
   # space randomization that is used in the system, for architectures
   # that support this feature.
   # 0 - turn the process address space randomization off.  this is the
   #     default for architectures that do not support this feature anyways,
   #     and kernels that are booted with the "norandmaps" parameter.
   kernel.randomize_va_space=0
   
   # this value can be used to control on which cpus the watchdog may run.
   # the default cpumask is all possible cores, but if no_hz_full is
   # enabled in the kernel config, and cores are specified with the
   # nohz_full= boot argument, those cores are excluded by default.
   # offline cores can be included in this mask, and if the core is later
   # brought online, the watchdog will be started based on the mask value.
   #
   # typically this value would only be touched in the nohz_full case
   # to re-enable cores that by default were not running the watchdog,
   # if a kernel lockup was suspected on those cores.
   kernel.watchdog_cpumask=0,18
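
Once the file is in place, the settings can be loaded without a reboot and the resulting hugepage pool inspected. A minimal sketch using the file path from the playbook above:

 $ # Load the CSIT sysctl settings and check the hugepage pool:
 $ sudo sysctl -p /etc/sysctl.d/90-csit.conf
 $ grep -i huge /proc/meminfo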

Host CFS optimizations (QEMU+VPP)

CFS scheduler tuning is applied to all QEMU vCPU worker threads (these run the testpmd PMD threads) and to the VPP PMD worker threads. The list of VPP worker threads (names prefixed with vpp_wk) can be obtained either from /proc or via the VPP CLI command "show threads"; each worker is then moved to the real-time round-robin scheduling class (SCHED_RR) with priority 1:

 $ cat /proc/`pidof vpp`/task/*/stat | awk '{print $1" "$2" "$39}'
 $ chrt -r -p 1 <worker_pid>
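
For convenience, the two steps can be combined into one loop. This is a sketch, assuming the worker threads are named vpp_wk_<n> as described above:

 $ # Apply SCHED_RR priority 1 to every VPP worker thread (comm field matching vpp_wk):
 $ for tid in $(awk '$2 ~ /vpp_wk/ {print $1}' /proc/$(pidof vpp)/task/*/stat); do sudo chrt -r -p 1 $tid; done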

Host IRQ affinity

Change the default pinning of every IRQ to core 0 (the same applies to both guest and host OS):

 $ for l in `ls /proc/irq`; do echo 1 | sudo tee /proc/irq/$l/smp_affinity; done
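
smp_affinity is a hexadecimal CPU bitmask, so writing 1 restricts an IRQ to core 0; the value can be read back to confirm:

 $ # Read back the affinity mask of a given IRQ (e.g. IRQ 0):
 $ cat /proc/irq/0/smp_affinity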

Host RCU affinity

Change the default pinning of the RCU kthreads to core 0 (the same applies to both guest and host OS):

 $ for i in `pgrep rcu[^c]` ; do sudo taskset -pc 0 $i ; done
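
The pgrep pattern rcu[^c] matches kthreads such as rcu_sched, rcu_bh and the rcuo* callback-offload threads while skipping the per-CPU rcuc threads, which remain bound to their own cores. The resulting affinity can be verified with taskset:

 $ # Verify the affinity of the RCU kthreads after the change:
 $ for i in `pgrep rcu[^c]` ; do taskset -pc $i ; done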

Host Writeback affinity

Change the default pinning of writeback workqueue workers to core 0 (the same applies to both guest and host OS):

 $ echo 1 | sudo tee /sys/bus/workqueue/devices/writeback/cpumask
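
The writeback workqueue cpumask is also a hexadecimal bitmask; reading it back confirms the change:

 $ # Confirm writeback work is now restricted to core 0:
 $ cat /sys/bus/workqueue/devices/writeback/cpumask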

For more information, see: https://www.kernel.org/doc/Documentation/kernel-per-CPU-kthreads.txt

--- END