Difference between revisions of "VPP/Bihash"
m |
(→Bounded-index Extensible Hashing) |
||
Line 3: | Line 3: | ||
= Bounded-index Extensible Hashing = | = Bounded-index Extensible Hashing = | ||
− | + | Writeup moved to the [//fd.io/docs/vpp/master/gettingstarted/developers/bihash.html fd.io vpp documentation] document set. | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
= Bihash Cookbook = | = Bihash Cookbook = |
Revision as of 12:42, 26 November 2020
Bounded-index Extensible Hashing
Writeup moved to the fd.io vpp documentation document set.
Bihash Cookbook
Using current (key,value) template instance types
It's quite easy to use one of the template instance types. As of this writing, .../src/vppinfra provides pre-built templates for 8, 16, 20, 24, 40, and 48 byte keys, u8 * vector keys, and 8 byte values.
See .../src/vppinfra/{bihash_<key-size>_8}.h
To define the data types, #include a specific template instance, most often in a subsystem header file:
#include <vppinfra/bihash_8_8.h>
If you're building a standalone application, you'll need to define the various functions by #including the method implementation file in a C source file.
The core vpp engine currently uses most if not all of the known bihash types, so you probably won't need to #include the method implementation file.
#include <vppinfra/bihash_template.c>
Add an instance of the selected bihash data structure to e.g. a "main_t" structure:
typedef struct { ... BVT (clib_bihash) hash; or clib_bihash_8_8_t hash; ... } my_main_t;
The BV macro concatenate its argument with the value of the preprocessor symbol BIHASH_TYPE. The BVT macro concatenates its argument with the value of BIHASH_TYPE and the fixed-string "_t". So in the above example, BVT (clib_bihash) generates "clib_bihash_8_8_t".
If you're sure you won't decide to change the template / type name later, it's perfectly OK to code "clib_bihash_8_8_t" and so forth.
In fact, if you #include multiple template instances in a single source file, you must use fully-enumerated type names. The macros stand no chance of working.
Initializing a bihash table
Call the init function as shown. As a rough guide, pick a number of buckets which is approximately number_of_expected_records/BIHASH_KVP_PER_PAGE from the relevant template instance header-file. See previous discussion.
The amount of memory selected should easily contain all of the records, with a generous allowance for hash collisions. Bihash memory is allocated separately from the main heap, and won't cost anything except kernel PTE's until touched, so it's OK to be reasonably generous.
For example:
my_main_t *mm = &my_main; clib_bihash_8_8_t *h; h = &mm->hash_table; clib_bihash_init_8_8 (h, "test", (u32) number_of_buckets, (uword) memory_size);
Add or delete a key/value pair
Use BV(clib_bihash_add_del), or the explicit type variant:
clib_bihash_kv_8_8_t kv; clib_bihash_8_8_t * h; my_main_t *mm = &my_main; clib_bihash_8_8_t *h; h = &mm->hash_table; kv.key = key_to_add_or_delete; kv.value = value_to_add_or_delete; clib_bihash_add_del_8_8 (h, &kv, is_add /* 1=add, 0=delete */);
In the delete case, kv.value is irrelevant. To change the value associated with an existing (key,value) pair, simply re-add the [new] pair.
Simple search
The simplest possible (key, value) search goes like so:
clib_bihash_kv_8_8_t search_kv, return_kv; clib_bihash_8_8_t * h; my_main_t *mm = &my_main; clib_bihash_8_8_t *h; h = &mm->hash_table; search_kv.key = key_to_add_or_delete; if (clib_bihash_search_8_8 (h, &search_kv, &return_kv) < 0) key_not_found(); else key_found();
Note that it's perfectly fine to collect the lookup result
if (clib_bihash_search_8_8 (h, &search_kv, &search_kv)) key_not_found(); etc.
Bihash vector processing
When processing a vector of packets which need a certain lookup performed, it's worth the trouble to compute the key hash, and prefetch the correct bucket ahead of time.
Here's a sketch of one way to write the required code:
Dual-loop: * 6 packets ahead, prefetch 2x vlib_buffer_t's and 2x packet data required to form the record keys * 4 packets ahead, form 2x record keys and call BV(clib_bihash_hash) or the explicit hash function to calculate the record hashes. Call 2x BV(clib_bihash_prefetch_bucket) to prefetch the buckets * 2 packets ahead, call 2x BV(clib_bihash_prefetch_data) to prefetch 2x (key,value) data pages. * In the processing section, call 2x BV(clib_bihash_search_inline_with_hash) to perform the search
Programmer's choice whether to stash the hash code somewhere in vnet_buffer(b) metadata, or to use local variables.
Single-loop: * Use simple search as shown above.
Walking a bihash table
A fairly common scenario to build "show" commands involves walking a bihash table. It's simple enough:
my_main_t *mm = &my_main; clib_bihash_8_8_t *h; void callback_fn (clib_bihash_kv_8_8_t *, void *); h = &mm->hash_table; BV(clib_bihash_foreach_key_value_pair) (h, callback_fn, (void *) arg);
To nobody's great surprise: clib_bihash_foreach_key_value_pair iterates across the entire table, calling callback_fn with active entries.
Creating a new template instance
Creating a new template is easy. Use one of the existing templates as a model, and make the obvious changes. The hash and key_compare methods are performance-critical in multiple senses.
If the key compare method is slow, every lookup will be slow. If the hash function is slow, same story. If the hash function has poor statistical properties, space efficiency will suffer. In the limit, a bad enough hash function will cause large portions of the table to revert to linear search.
Use of the best available vector unit is well worth the trouble in the hash and key_compare functions.