Difference between revisions of "VPP/BugReports"

From fd.io
Revision as of 20:14, 20 June 2018

Introduction

Although every situation is different, this page describes data which will help make efficient use of everyone's time when dealing with vpp bugs.

Before you press the Jira button to create a bug report - or email vpp-dev@lists.fd.io - please ask yourself whether there's enough information for someone else to understand / reproduce the issue given a reasonable amount of effort. Unicast emails to maintainers, committers, and the project PTL are strongly discouraged.

A good strategy for clear-cut bugs: file a detailed Jira ticket, and then send a short description of the issue to vpp-dev@lists.fd.io, perhaps from the Jira ticket description. It's fine to send email to vpp-dev@lists.fd.io to ask a few questions before filing Jira tickets.

Image version and operating environment

Please make sure to include the vpp image version:

sudo vppctl show version verbose
vpp v1.0.0-188~geef4d99 built by vagrant on localhost at Wed Feb 24 08:52:13 PST 2016
Built in /home/vagrant/git/vpp
Compiled with GCC 4.8.4
DPDK version is RTE 2.2.0
DPDK EAL init arguments: -c 1 -n 4 --socket-mem 1024 --huge-dir /run/vpp/hugepages --file-prefix vpp 
                         -b 0000:02:00.0 -b 0000:02:01.0 --master-lcore 0 

With respect to the operating environment: if the misbehavior involves a specific VM / container / bare-metal environment, please describe that environment in detail:

  • Linux Distro (e.g. Ubuntu 14.04.3 LTS, CentOS-7, etc.)
  • NIC type(s) (ixgbe, i40e, enic, etc.), vhost-user, tuntap
  • NUMA configuration if applicable

Please note the CPU architecture (x86_64, aarch64), and hardware platform.

Please attempt to reproduce issues using unmodified vpp engine images.
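The environment details requested above can be gathered in one pass. The following is a minimal sketch, not an official vpp tool: `collect_env` is a hypothetical helper name, and it assumes a Linux host where `lscpu` (util-linux) and `lspci` (pciutils) may or may not be installed. Missing tools are skipped quietly.

```shell
#!/bin/sh
# collect_env: hypothetical helper that prints basic environment facts
# suitable for pasting into a bug report.
collect_env() {
    echo "arch: $(uname -m)"
    echo "kernel: $(uname -r)"
    # /etc/os-release exists on most modern distros; skip it if absent.
    if [ -r /etc/os-release ]; then
        grep '^PRETTY_NAME=' /etc/os-release || true
    fi
    # lscpu reports the NUMA node layout when the tool is available.
    if command -v lscpu >/dev/null 2>&1; then
        lscpu | grep -i 'numa' || true
    fi
    # lspci -nnk lists PCI devices with IDs and bound kernel drivers.
    if command -v lspci >/dev/null 2>&1; then
        lspci -nnk | grep -i -A 3 'ethernet' || true
    fi
    return 0
}

# Usage: collect_env > vpp-env.txt
```

Attach the resulting file to the Jira ticket alongside the "show version verbose" output.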

"Show" Command Output

Every situation is different. If the issue involves a sequence of debug CLI commands, please enable CLI command logging, and send the sequence involved. Note that the debug CLI is a developer's tool - no warranty express or implied - and that we may choose not to fix debug CLI bugs.

Please include "show error" [error counter] output. It's often helpful to "clear error", send a bit of traffic, then "show error", particularly when running vpp on a noisy network.

Please include ip4 / ip6 / mpls FIB contents ("show ip fib", "show ip6 fib", "show mpls fib", "show mpls tunnel").

Please include "show hardware", "show interface", and "show interface address" output.

Here is a consolidated set of commands that are generally useful before/after sending traffic. Before sending traffic:

sudo vppctl clear hardware
sudo vppctl clear interface
sudo vppctl clear error
sudo vppctl clear run

Send some traffic and then issue the following commands:

sudo vppctl show version verbose
sudo vppctl show hardware
sudo vppctl show interface address
sudo vppctl show interface
sudo vppctl show run
sudo vppctl show error

Here are some protocol-specific show commands that may also make sense. Only include output for features which have been configured:

sudo vppctl show l2fib
sudo vppctl show bridge-domain
sudo vppctl show ip fib
sudo vppctl show ip arp
sudo vppctl show ip6 fib
sudo vppctl show ip6 neighbors
sudo vppctl show mpls fib
sudo vppctl show mpls tunnel
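To keep the before/after sequence repeatable, the commands can be wrapped in a small script. This is a sketch only: `VPPCTL`, `run_show`, `clear_counters`, and `collect_shows` are hypothetical names introduced here, and the command list follows the "show hardware", "show interface", and "show interface address" output requested earlier in this section.

```shell
#!/bin/sh
# VPPCTL is a hypothetical override hook; the default matches the
# "sudo vppctl" invocations used on this page.
VPPCTL=${VPPCTL:-"sudo vppctl"}

# Run one vppctl command, preceded by a banner naming the command.
run_show() {
    echo "===== $* ====="
    $VPPCTL "$@" 2>&1
}

# Clear counters before sending traffic.
clear_counters() {
    for what in hardware interface error run; do
        $VPPCTL clear "$what"
    done
}

# Collect the generally useful show output after traffic has been sent.
collect_shows() {
    run_show show version verbose
    run_show show hardware
    run_show show interface
    run_show show interface address
    run_show show run
    run_show show error
}

# Usage:
#   clear_counters
#   <send-traffic>
#   collect_shows > vpp-show.txt
```

The banners make it easy to tell which command produced which output once everything lands in a single file.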

Network Topology

Please include a crisp description of the network topology, including L2 / IP / MPLS / segment-routing addressing details. If you expect folks to reproduce and debug issues, this is a must.

At or above a certain level of topological complexity, it becomes problematic to reproduce the original setup.

Packet Tracer Output

If you capture packet tracer output which seems relevant, please include it:

sudo vppctl trace add dpdk-input 100  # or similar

<send-traffic>

sudo vppctl show trace
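If the trace needs to go into a file for the ticket, something like the following may help. It is a sketch under assumptions: `arm_trace` and `save_trace` are hypothetical helper names, `VPPCTL` is a hypothetical override hook, and the input node and packet count are whatever suits the setup (dpdk-input and 100 are simply the values from the example above).

```shell
#!/bin/sh
# VPPCTL is a hypothetical override hook; default matches this page.
VPPCTL=${VPPCTL:-"sudo vppctl"}

# arm_trace NODE COUNT: discard stale trace entries, then arm the tracer.
arm_trace() {
    $VPPCTL clear trace
    $VPPCTL trace add "$1" "$2"
}

# save_trace OUTFILE: write the captured trace to a file.
save_trace() {
    $VPPCTL show trace > "$1"
}

# Usage:
#   arm_trace dpdk-input 100
#   <send-traffic>
#   save_trace vpp-trace.txt
```

Clearing first keeps packets from an earlier experiment out of the attached trace.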

Binary API Trace

If the issue involves a sequence of control-plane API messages - even a very long sequence - please enable control-plane API tracing. Control-plane API post-mortem traces end up in /tmp/api_post_mortem.<pid>. Please provide a pointer [accessible to the general public!] to the binary API trace. These API traces are especially helpful in cases where the vpp engine is throwing traffic on the floor, e.g. for want of a default route or similar.
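Once post-mortem traces exist, they can be bundled into one archive before uploading. A minimal sketch: `bundle_api_traces` is a hypothetical helper name, and the optional second argument exists only so a directory other than /tmp can be searched.

```shell
#!/bin/sh
# bundle_api_traces OUT.tar.gz [DIR]
# Archive every api_post_mortem.<pid> file found in DIR (default /tmp).
bundle_api_traces() {
    out=$1
    dir=${2:-/tmp}
    # An unmatched glob stays literal, so check that the first
    # expansion really exists before calling tar.
    set -- "$dir"/api_post_mortem.*
    if [ ! -e "$1" ]; then
        echo "no api_post_mortem.* files in $dir" >&2
        return 1
    fi
    tar -czf "$out" "$@"
}

# Usage: bundle_api_traces vpp-api-traces.tar.gz
```

Put the resulting archive somewhere publicly accessible and link it from the ticket.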

Core Files

Production systems, as well as long-running pre-production soak-test systems, must arrange to collect core images. The Ubuntu "corekeeper" package works well.

Vpp core files often appear enormous. Gzip typically compresses them to very manageable sizes. Please put core files in public places.
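Compressing a core file before upload can be as simple as the sketch below. `compress_core` is a hypothetical helper name; it keeps the original file and prints both sizes so the difference is visible.

```shell
#!/bin/sh
# compress_core COREFILE
# Writes COREFILE.gz next to the original and reports both sizes.
compress_core() {
    core=$1
    if [ ! -f "$core" ]; then
        echo "no such core file: $core" >&2
        return 1
    fi
    # gzip -c writes to stdout, leaving the original core untouched.
    gzip -c "$core" > "$core.gz" || return 1
    echo "original:   $(wc -c < "$core") bytes"
    echo "compressed: $(wc -c < "$core.gz") bytes"
}

# Usage: compress_core /path/to/core
```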

Core files from private, modified images are discouraged. If it's necessary to go that route, please copy the exact Debian packages (or RPMs) corresponding to the core file to the same public place as the core file. In particular:

  • vpp_<version>_<arch>.deb
  • vpp-dbg_<version>_<arch>.deb
  • vpp-dev_<version>_<arch>.deb
  • vpp-lib_<version>_<arch>.deb
  • vpp-plugins_<version>_<arch>.deb

Include the full commit ID in the Jira ticket.

If we go through the setup process only to discover that the image and core files don't match, it will simply delay resolution of the issue. And it will annoy the heck out of the engineer who just wasted their time. Exact means exact, not "oh, gee, I added a few lines of debug scaffolding since then..."
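One way to find the commit ID requested above is to start from the version string. The sketch below assumes the version string looks like the sample "show version verbose" output on this page, where the characters after "~g" are an abbreviated git hash; `abbrev_commit` is a hypothetical helper name.

```shell
#!/bin/sh
# abbrev_commit: read "show version" output on stdin and print the
# abbreviated git hash that follows "~g" in the version string.
# Prints nothing if no such version line is found.
abbrev_commit() {
    sed -n 's/^vpp v[^ ]*~g\([0-9a-f]\{1,\}\).*/\1/p' | head -n 1
}

# Usage, from the workspace used to build the image:
#   hash=$(sudo vppctl show version verbose | abbrev_commit)
#   git rev-parse "$hash"    # expands to the full commit ID
```

`git rev-parse` only succeeds if the abbreviated hash is present and unambiguous in that workspace, which is itself a quick check that the workspace and the image belong together.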