lsd0004

LSD0004: R5N Distributed Hash Table
Log | Files | Refs

commit 3f70faf531ebd101004f5b93e1c80ee1ec5bcdd1
parent 564e5fa775b0c0fdf947e857727412dfca4f5ac8
Author: Martin Schanzenbach <schanzen@gnunet.org>
Date:   Fri, 23 Dec 2022 20:36:03 +0900

Try to disentangle Bloom filters

Diffstat:
Mdraft-schanzen-r5n.xml | 194+++++++++++++++++++++++++++++++++++++++++++++++--------------------------------
1 file changed, 115 insertions(+), 79 deletions(-)

diff --git a/draft-schanzen-r5n.xml b/draft-schanzen-r5n.xml @@ -349,7 +349,7 @@ Connectivity | |Underlay| |Underlay| </figure> </section> <section anchor="overlay" numbered="true" toc="default"> - <name>Overlay operations</name> + <name>Overlay Operations</name> <t> An implementation of this specification commonly exposes the two overlay operations "GET" and "PUT". @@ -358,9 +358,9 @@ Connectivity | |Underlay| |Underlay| picture of the protocol. </t> <section> - <name>The GET procedure</name> + <name>GET operation</name> <t> - A basic GET procedure may be exposed as: + A basic GET operation interface may be exposed as: </t> <t> <tt>GET(Query-Key, Block-Type) -> Results as List</tt> @@ -462,9 +462,9 @@ Connectivity | |Underlay| |Underlay| </dl> </section> <section> - <name>The PUT procedure</name> + <name>PUT operation</name> <t> - A PUT procedure may be exposed as: + A PUT operation interface may be exposed as: </t> <t> <tt>PUT(Key, Block-Type, Block-Expiration, Block-Data)</tt> @@ -773,47 +773,6 @@ bchar = *(ALPHA / DIGIT) </t> </section> </section> - <section anchor="bloom_filters" numbered="true" toc="default"> - <name>Bloom Filters</name> - <t> - R<sup>5</sup>N uses Bloom filters in several places. This section - gives some general background on Bloom filters and defines functions - on this data structure shared by the various use-cases in R<sup>5</sup>N. - </t> - <t> - A Bloom filter (BF) is a space-efficient probabilistic datastructure - to test if an element is part of a set of elements. - Elements are identified by an element ID. - Since a BF is a probabilistic datastructure, it is possible to have - false-positives: when asked if an element is in the set, the answer from - a BF is either "no" or "maybe". - </t> - <t> - Bloom filters are defined as a string of <tt>L</tt> bits always - initially empty, consisting only of zeroes. - There are two functions which can be invoked on the Bloom filter: - BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element which is to - be added to the Bloom filter or queried against the set. - The mapping function <tt>M</tt> is defined as follows: - </t> - <t> - The element <tt>e</tt> is hashed using SHA-512. - The resulting byte string is interpreted as a string of 16 - 32-bit integers in network byte order. - </t> - <t> - When adding an element to the Bloom filter <tt>bf</tt> using - <tt>BF-SET(bf,e)</tt>, each integer <tt>n</tt> of the mapping - <tt>M(e)</tt> is interpreted as a bit offset <tt>n mod L</tt> within - <tt>bf</tt> and set to 1. - </t> - <t> - When testing if an element may be in the Bloom filter <tt>bf</tt> using - <tt>BF-TEST(bf,e)</tt>, each bit offset <tt>n mod L</tt> within - <tt>bf</tt> <bcp14>MUST</bcp14> have been set to 1. - Otherwise, the element is not considered to be in the Bloom filter. - </t> - </section> <section anchor="routing" numbered="true" toc="default"> <name>Routing</name> <t> @@ -920,9 +879,19 @@ bchar = *(ALPHA / DIGIT) traversed peers when searching for the next hops in the routing table. </t> <t> - The peer Bloom filter is always a 128 byte field. The elements - hashed into the Bloom filter are the 32 byte peer IDs. We note - that the peer Bloom filter may exclude peers due to false-postive + The peer Bloom filter (see <xref target="bloom_filters"/>) consists of + L=1024 buckets. + The mapping function <tt>M</tt> is defined as follows: + </t> + <t> + The element <tt>e</tt> is a 32 byte peer ID and is hashed using + SHA-512. + The resulting byte string is interpreted as a string of k=16 + 32-bit integers in network byte order which are used to set the bucket bits + accordingly. + </t> + <t> + We note that the peer Bloom filter may exclude peers due to false-postive matches. This is acceptable as routing should nevertheless terminate (with high probability) in close vicinity of the key. </t> @@ -2395,10 +2364,33 @@ gnunet+tcp://12.3.4.5/ \ <dd> <t> The RESULT_FILTER for HELLO blocks is implemented using a - Bloom filter. - </t> - <figure anchor="hello_rf" title="The HELLO Block Result Filter."> - <artwork name="" type="" align="left" alt=""><![CDATA[ + Bloom filter (see <xref target="bloom_filters"/>). + The size <tt>S = L/8</tt> of the Bloom filter in bytes depends on + the number of elements <tt>F</tt> known to be filtered at the + initiator. If <tt>F</tt> is zero, the size <tt>S</tt> is just 8 (bytes). + Otherwise, <tt>S</tt> is set to the minimum of + 2<sup>15</sup> and the lowest power + of 2 that is strictly larger than <tt>K*F/4</tt> + (in bytes). + </t> + <t> + The <tt>k</tt>-value for the HELLO_BF + Bloom filter is always 16. + <tt>k</tt> is never transmitted. + The elements used in the Bloom filter + consist of an XOR between the <tt>H_ADDRS</tt> field (as computed using + SHA-512 over the <tt>ADDRESSES</tt>) and the SHA-512 + hash of the <tt>MUTATOR</tt> field from a given HELLO block. + The mapping function M(<tt>H_ADDRS XOR MUTATOR</tt>) is an identity + function and returns the 512-bit XOR result unmodified. + This resulting byte string is interpreted as k=16 32-bit + integers in network byte order and used to set or check the bucket bits + accordingly. + This function returns an empty Bloom filter of length <tt>S= L/8</tt> as + part of RF: + </t> + <figure anchor="hello_rf" title="The HELLO Block Result Filter."> + <artwork name="" type="" align="left" alt=""><![CDATA[ 0 8 16 24 32 40 48 56 +-----+-----+-----+-----+-----+-----+-----+-----+ | MUTATOR | HELLO_BF / @@ -2407,7 +2399,6 @@ gnunet+tcp://12.3.4.5/ \ +-----+-----+-----+-----+-----+-----+-----+-----+ ]]></artwork> </figure> - <t>where:</t> <dl> <dt>MUTATOR</dt> @@ -2416,24 +2407,16 @@ gnunet+tcp://12.3.4.5/ \ </dd> <dt>HELLO_BF</dt> <dd> - The <tt>K</tt>-value for the HELLO_BF Bloom filter - is always 16. The size <tt>S</tt> of the Bloom filter in bytes depends on - the number of elements <tt>F</tt> known to be filtered at the - initiator. If <tt>F</tt> is zero, the size <tt>S</tt> is just 8 (bytes). - Otherwise, <tt>S</tt> is set to the minimum of - 2<sup>15</sup> and the lowest power - of 2 that is strictly larger than <tt>K*F/4</tt> - (in bytes). The wire format of HELLO_BF is the resulting byte - array. In particular, <tt>K</tt> is never transmitted. + The Bloom filter byte array. </dd> </dl> -<t> - R<sup>5</sup>N HELLO blocks use a <tt>MUTATOR</tt> value - to additionally "randomize" the computation of the bloom filter while remaining - deterministic across peers. The 32-bit <tt>MUTATOR</tt> - value is set by the peer initiating the GET request, and changed - every time the GET request is repeated by the initiator. Peers - forwarding GET requests <bcp14>MUST</bcp14> not change the + <t> + The <tt>MUTATOR</tt> value is used + to additionally "randomize" the computation of the Bloom filter while + remaining deterministic across peers. + It is only ever set by the peer initiating the GET + request, and changed every time the GET request is repeated. + Peers forwarding GET requests <bcp14>MUST</bcp14> not change the mutator value included in the <tt>RESULT_FILTER</tt> as they might not be able to recalculate the result filter with a different <tt>MUTATOR</tt> value. @@ -2457,12 +2440,9 @@ gnunet+tcp://12.3.4.5/ \ </dd> <dt>FilterResult(Block, Key, RF, XQuery) -&gt; (FilterEvaluationResult, RF')</dt> <dd> - To filter results of HELLO blocks using the Bloom filter, the - <tt>H_ADDRS</tt> field (as computed using SHA-512 over - the <tt>ADDRESSES</tt>) and XORed with the SHA-512 - hash of the <tt>MUTATOR</tt> (in network byte order). - The resulting value is then used when hashing into the - Bloom filter as described in <xref target="bloom_filters" />. + The <tt>H_ADDRS</tt> field is XORed with the SHA-512 + hash of the <tt>MUTATOR</tt> field from the HELLO block and the resulting + value is checked against the Bloom filter in RF. Consequently, HELLOs with completely identical sets of addresses will be filtered and FILTER_DUPLICATE is returned. Any small variation in the set of addresses will cause the block @@ -2833,8 +2813,64 @@ Type | Name | References | Description </front> </reference> </references> - <!-- Change Log - v00 2017-07-23 MS Initial version - --> + <section anchor="bloom_filters" numbered="true" toc="default"> + <name>Bloom filters in R<sup>5</sup>N</name> + <t> + R<sup>5</sup>N uses Bloom filters in several places. This section + gives some general background on Bloom filters and defines functions + on this data structure shared by the various use-cases in R<sup>5</sup>N. + </t> + <t> + A Bloom filter (BF) is a space-efficient probabilistic datastructure + to test if an element is part of a set of elements. + Elements are identified by an element ID. + Since a BF is a probabilistic datastructure, it is possible to have + false-positives: when asked if an element is in the set, the answer from + a BF is either "no" or "maybe". + </t> + <t> + Bloom filters are defined as a string of <tt>L</tt> bits called "buckets". + The buckets are initially always empty, meaning that the bits are set to + zero. + There are two functions which can be invoked on the Bloom filter "bf": + BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element that is to + be added to the Bloom filter or queried against the set. + </t> + <t> + A mapping function M is used to map each ID of each element from the set to a + subset of k buckets. + In the original proposal by Bloom, M is non-injective and can thus map the same + element multiple times to the same bucket. + The type of the mapping function can thus be described by the following + mathematical notation: + </t> + <figure anchor="figure_bf_func" title="Bloom filter mapping function."> + <artwork><![CDATA[ + ------------------------------------ + # M: E->B^k + ------------------------------------ + # L = Number of buckets + # B = 0,1,2,3,4,...L-1 (the buckets) + # k = Number of buckets per element + # E = Set of elements + ------------------------------------ + Example: L=256, k=3 + M('element-data') = {4,6,255} +]]> + </artwork> + </figure> + <t> + When adding an element to the Bloom filter <tt>bf</tt> using + <tt>BF-SET(bf,e)</tt>, each integer <tt>n</tt> of the mapping + <tt>M(e)</tt> is interpreted as a bit offset <tt>n mod L</tt> within + <tt>bf</tt> and set to 1. + </t> + <t> + When testing if an element may be in the Bloom filter <tt>bf</tt> using + <tt>BF-TEST(bf,e)</tt>, each bit offset <tt>n mod L</tt> within + <tt>bf</tt> <bcp14>MUST</bcp14> have been set to 1. + Otherwise, the element is not considered to be in the Bloom filter. + </t> + </section> </back> </rfc>