Difference between revisions of "VPP/Bihash"

From fd.io
< VPP
Jump to: navigation, search
(Bounded-index Extensible Hashing)
(Replaced content with "== Page moved == This page has [https://fd.io/docs/vpp/master/developer/corearchitecture/bihash.html moved to the versioned documentation]")
 
Line 1: Line 1:
__NOTOC__
+
== Page moved ==
  
= Bounded-index Extensible Hashing =
+
This page has [https://fd.io/docs/vpp/master/developer/corearchitecture/bihash.html moved to the versioned documentation]
 
+
Writeup moved to the [//fd.io/docs/vpp/master/gettingstarted/developers/bihash.html fd.io vpp documentation] document set.
+
 
+
= Bihash Cookbook =
+
 
+
== Using current (key,value) template instance types ==
+
 
+
It's quite easy to use one of the template instance types. As of this writing, .../src/vppinfra provides pre-built templates for 8, 16, 20, 24, 40, and 48 byte keys, u8 * vector keys, and 8 byte values.
+
 
+
See .../src/vppinfra/{bihash_<key-size>_8}.h
+
 
+
To define the data types, #include a specific template instance, most often in a subsystem header file:
+
 
+
<source lang="c">    #include <vppinfra/bihash_8_8.h></source>
+
If you're building a standalone application, you'll need to define the various functions by #including the method implementation file in a C source file.
+
 
+
The core vpp engine currently uses most if not all of the known bihash types, so you probably won't need to #include the method implementation file.
+
 
+
<source lang="c">    #include <vppinfra/bihash_template.c></source>
+
Add an instance of the selected bihash data structure to e.g. a &quot;main_t&quot; structure:
+
 
+
<source lang="c">    typedef struct
+
    {
+
      ...
+
      BVT (clib_bihash) hash;
+
      or
+
      clib_bihash_8_8_t hash;
+
      ...
+
    } my_main_t;</source>
+
The BV macro concatenate its argument with the value of the preprocessor symbol BIHASH_TYPE. The BVT macro concatenates its argument with the value of BIHASH_TYPE and the fixed-string &quot;_t&quot;. So in the above example, BVT (clib_bihash) generates &quot;clib_bihash_8_8_t&quot;.
+
 
+
If you're sure you won't decide to change the template / type name later, it's perfectly OK to code &quot;clib_bihash_8_8_t&quot; and so forth.
+
 
+
In fact, if you #include multiple template instances in a single source file, you '''must''' use fully-enumerated type names. The macros stand no chance of working.
+
 
+
== Initializing a bihash table ==
+
 
+
Call the init function as shown. As a rough guide, pick a number of buckets which is approximately number_of_expected_records/BIHASH_KVP_PER_PAGE from the relevant template instance header-file. See previous discussion.
+
 
+
The amount of memory selected should easily contain all of the records, with a generous allowance for hash collisions. Bihash memory is allocated separately from the main heap, and won't cost anything except kernel PTE's until touched, so it's OK to be reasonably generous.
+
 
+
For example:
+
 
+
<source lang="c">    my_main_t *mm = &my_main;
+
    clib_bihash_8_8_t *h;
+
       
+
    h = &mm->hash_table;
+
 
+
    clib_bihash_init_8_8 (h, "test", (u32) number_of_buckets,
+
                          (uword) memory_size);</source>
+
== Add or delete a key/value pair ==
+
 
+
Use BV(clib_bihash_add_del), or the explicit type variant:
+
 
+
<source lang="c">  clib_bihash_kv_8_8_t kv;
+
  clib_bihash_8_8_t * h;
+
  my_main_t *mm = &my_main;
+
  clib_bihash_8_8_t *h;
+
       
+
  h = &mm->hash_table;
+
  kv.key = key_to_add_or_delete;
+
  kv.value = value_to_add_or_delete;
+
 
+
  clib_bihash_add_del_8_8 (h, &kv, is_add /* 1=add, 0=delete */);</source>
+
In the delete case, kv.value is irrelevant. To change the value associated with an existing (key,value) pair, simply re-add the [new] pair.
+
 
+
== Simple search ==
+
 
+
The simplest possible (key, value) search goes like so:
+
 
+
<source lang="c">  clib_bihash_kv_8_8_t search_kv, return_kv;
+
  clib_bihash_8_8_t * h;
+
  my_main_t *mm = &my_main;
+
  clib_bihash_8_8_t *h;
+
       
+
  h = &mm->hash_table;
+
  search_kv.key = key_to_add_or_delete;
+
 
+
  if (clib_bihash_search_8_8 (h, &search_kv, &return_kv) < 0)
+
    key_not_found();
+
  else
+
    key_found();</source>
+
Note that it's perfectly fine to collect the lookup result
+
 
+
<source lang="c">  if (clib_bihash_search_8_8 (h, &search_kv, &search_kv))
+
    key_not_found();
+
  etc.</source>
+
== Bihash vector processing ==
+
 
+
When processing a vector of packets which need a certain lookup performed, it's worth the trouble to compute the key hash, and prefetch the correct bucket ahead of time.
+
 
+
Here's a sketch of one way to write the required code:
+
 
+
Dual-loop: * 6 packets ahead, prefetch 2x vlib_buffer_t's and 2x packet data required to form the record keys * 4 packets ahead, form 2x record keys and call BV(clib_bihash_hash) or the explicit hash function to calculate the record hashes. Call 2x BV(clib_bihash_prefetch_bucket) to prefetch the buckets * 2 packets ahead, call 2x BV(clib_bihash_prefetch_data) to prefetch 2x (key,value) data pages. * In the processing section, call 2x BV(clib_bihash_search_inline_with_hash) to perform the search
+
 
+
Programmer's choice whether to stash the hash code somewhere in vnet_buffer(b) metadata, or to use local variables.
+
 
+
Single-loop: * Use simple search as shown above.
+
 
+
== Walking a bihash table ==
+
 
+
A fairly common scenario to build &quot;show&quot; commands involves walking a bihash table. It's simple enough:
+
 
+
<source lang="c">  my_main_t *mm = &my_main;
+
  clib_bihash_8_8_t *h;
+
  void callback_fn (clib_bihash_kv_8_8_t *, void *);
+
 
+
  h = &mm->hash_table;
+
 
+
  BV(clib_bihash_foreach_key_value_pair) (h, callback_fn, (void *) arg);</source>
+
To nobody's great surprise: clib_bihash_foreach_key_value_pair iterates across the entire table, calling callback_fn with active entries.
+
 
+
= Creating a new template instance =
+
 
+
Creating a new template is easy. Use one of the existing templates as a model, and make the obvious changes. The hash and key_compare methods are performance-critical in multiple senses.
+
 
+
If the key compare method is slow, every lookup will be slow. If the hash function is slow, same story. If the hash function has poor statistical properties, space efficiency will suffer. In the limit, a bad enough hash function will cause large portions of the table to revert to linear search.
+
 
+
Use of the best available vector unit is well worth the trouble in the hash and key_compare functions.
+

Latest revision as of 09:57, 9 November 2021

Page moved

This page has moved to the versioned documentation