Revision as of 11:49, 12 July 2017
GoVPP Performance
GoVPP performance has been measured using the perf-bench example binary on an Ubuntu 16.04 LTS virtual machine running on a laptop with Intel® Core™ i7-6820HQ CPU @ 2.70GHz and 16 GB of RAM. The virtual machine has been assigned all CPU cores and 8 GB of RAM.
The benchmark application can measure the performance of both the synchronous and asynchronous GoVPP APIs. The results are compared with those of a benchmark application written in C that uses the same vppapiclient calls as GoVPP. As the benchmark results show, the asynchronous API is recommended when speed is a concern:
(rr/second = requests + replies per second)
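The numbers in the tables below come from timing a tight request/reply loop. A minimal stdlib-only sketch of that kind of measurement is shown here; `fakeRequestReply` is a placeholder for one round trip against VPP, not a GoVPP call:

```go
package main

import (
	"fmt"
	"time"
)

// fakeRequestReply stands in for one request + reply round trip against
// VPP. It is a placeholder for illustration, not part of the GoVPP API.
func fakeRequestReply() {}

// measureRate issues count round trips and returns the observed
// throughput in round trips per second.
func measureRate(count int) float64 {
	start := time.Now()
	for i := 0; i < count; i++ {
		fakeRequestReply()
	}
	return float64(count) / time.Since(start).Seconds()
}

func main() {
	fmt.Printf("%.0f rr/second\n", measureRate(1000000))
}
```

The real perf-bench binary does the same timing around actual GoVPP requests.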
Asynchronous API:

Test | Scenario | Performance |
1 | C application (vppapiclient) - control ping | 762 000 rr/second |
2 | GoVPP - control ping | 251 878 rr/second |
3 | GoVPP - l2fib add | 245 560 rr/second |
4 | GoVPP - interface dump | 210 305 rr/second |

Synchronous API:

Test | Scenario | Performance |
5 | C application - control ping | 2 340 rr/second |
6 | GoVPP - control ping | 107 rr/second |
As the results show, there is quite a big difference between the performance of the C application and GoVPP. There is also a big difference between the performance of the synchronous and asynchronous APIs. To identify the sources of these differences and the bottlenecks, we did the following:
- profiled the execution of tests 2 (GoVPP async API) and 6 (GoVPP sync API);
- performed measurements from a bare Go application that called the vppapiclient library directly, without any further processing in Go, with the following results:
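The gap between the synchronous and asynchronous API comes down to how many requests are in flight at once. The following stdlib-only sketch (toy channels, not the GoVPP transport) contrasts the two styles: the synchronous client blocks for each reply before sending the next request, while the asynchronous client keeps many requests pipelined:

```go
package main

import "fmt"

// server is a toy request/reply peer standing in for VPP: one reply
// per request. This illustrates the two API styles, not GoVPP itself.
func server(req <-chan int, rep chan<- int) {
	for r := range req {
		rep <- r * 2
	}
	close(rep)
}

// syncClient sends one request and blocks for its reply before sending
// the next one -- the synchronous API style.
func syncClient(n int) []int {
	req := make(chan int)
	rep := make(chan int)
	go server(req, rep)
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		req <- i
		out = append(out, <-rep)
	}
	close(req)
	return out
}

// asyncClient keeps many requests in flight and collects the replies
// separately -- the asynchronous API style, which avoids paying one
// full blocking round trip per request.
func asyncClient(n int) []int {
	req := make(chan int, n)
	rep := make(chan int, n)
	go server(req, rep)
	go func() {
		for i := 0; i < n; i++ {
			req <- i
		}
		close(req)
	}()
	out := make([]int, 0, n)
	for r := range rep {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(len(syncClient(5)), len(asyncClient(5)))
}
```

Both clients receive the same replies; only the amount of waiting per request differs, which is what the sync/async throughput gap in the tables reflects.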
Test | Scenario | Performance |
7 | Go async - no encode & decode, callback once per 100 replies | 761 836 rr/second |
8 | Go async - no encode & decode | 554 215 rr/second |
9 | Go async - with encode & decode, no message passing | 284 283 rr/second |
10 | Go async - with encode & decode, with message passing | 250 861 rr/second |
Discussion
- By comparing tests 1 and 8, we can see that we cannot reach the performance of the C application from Go. The profiling results (GoVPP async API) show that most of the time is spent in the Go runtime when the message reply callback is called from the C world, so we performed test 7, which calls the reply callback only once per 100 replies (buffering the replies). The results confirmed this theory and show that with buffered replies we could reach the performance of the C application.
- Since the encoder & decoder that converts binary API messages between their binary form and the Go bindings uses reflection, we knew that this process would consume quite a lot of resources. To find the exact numbers, we performed tests 8 (no encoding & decoding of the messages) and 9 (with encoding and decoding of the messages). The results show that encoding & decoding slows down the performance by about 50%.
- As the results of tests 2, 3 and 4 show, the more complex the reply message, the more time is required to decode it. The same does not fully hold for requests, since request encoding is much faster than reply decoding (this can also be seen in the profiling results).
- There is a big performance drop between the C application and GoVPP with the synchronous API (tests 5 and 6). The profiling results (GoVPP sync API) suggest that this is caused by the Go runtime, probably by continuous sleeps and wake-ups around the async thread created by the vppapiclient library that receives the replies from VPP.
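The buffering used in test 7 (one callback per 100 replies instead of one per reply) can be sketched with a small accumulator; the types here are illustrative, not the vppapiclient interface:

```go
package main

import "fmt"

// replyBuffer collects replies and invokes the callback once per
// batchSize replies instead of once per reply, mimicking the buffering
// used in test 7 to cut the per-callback C-to-Go transition cost.
type replyBuffer struct {
	batchSize int
	buf       [][]byte
	callback  func(batch [][]byte)
}

// onReply queues one reply and fires the callback when a full batch
// has accumulated.
func (b *replyBuffer) onReply(msg []byte) {
	b.buf = append(b.buf, msg)
	if len(b.buf) >= b.batchSize {
		b.flush()
	}
}

// flush delivers any pending partial batch.
func (b *replyBuffer) flush() {
	if len(b.buf) > 0 {
		b.callback(b.buf)
		b.buf = nil
	}
}

func main() {
	calls := 0
	rb := &replyBuffer{batchSize: 100, callback: func(batch [][]byte) { calls++ }}
	for i := 0; i < 1000; i++ {
		rb.onReply([]byte{byte(i)})
	}
	rb.flush() // deliver any trailing partial batch
	fmt.Println("callback invocations:", calls)
	// prints "callback invocations: 10"
}
```

With 1000 replies and a batch size of 100, the boundary is crossed 10 times instead of 1000, which is the effect test 7 measured.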
Possible Performance Improvements
- Get rid of reflection in message decoding: either generate custom encoder & decoder code for each binary API message, or allow the user to provide custom encoder & decoder functions. This could be combined with the current reflection-based approach for less frequently used binary API messages.
- Buffer replies from VPP when multiple replies are expected.
- Get rid of the async thread created by the vppapiclient library that receives the replies from VPP, since it probably does not fit the goroutine model very well. A blocking read from a goroutine may perform much better.