lsd0004

LSD0004: R5N Distributed Hash Table
Log | Files | Refs

commit d1a71233594235411e1a4da75caa71c2bb31f902
parent a8d0df2b972d5a6d8938286e3758aecf874580ac
Author: Christian Grothoff <grothoff@gnunet.org>
Date:   Fri, 11 Mar 2022 05:12:48 +0100

more on Bloom filters

Diffstat:
Mdraft-schanzen-r5n.xml | 41++++++++++++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/draft-schanzen-r5n.xml b/draft-schanzen-r5n.xml @@ -864,7 +864,10 @@ bchar = *(ALPHA / DIGIT) </t> <t> The peer Bloom filter is always a 128-bit field. The elements - hashed into the Bloom filter are the 32 byte peer IDs. + hashed into the Bloom filter are the 32 byte peer IDs. We note + that the peer Bloom filter may exclude peers due to false-postive + matches. This is acceptable as routing should nevertheless + terminate (with high probability) in close vicinity of the key. </t> </section> <section anchor="routing_functions"> @@ -970,16 +973,36 @@ bchar = *(ALPHA / DIGIT) Any peer which is processing GET messages and has a result which matches the query key <bcp14>MUST</bcp14> check the result Bloom filter and only send a reply message if the block key is not in the - Bloom filter set. + Bloom filter set. <!-- FIXME: is this strictly always the block key? + I think this is block-type dependent! --> </t> <t> - The Bloom filter is a 128-bit field. - It is initially empty, consisting only of zeroes. - There are two functions which can be invoked on the Bloom filter: - BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element which is to - be added to the Bloom filter or queried against the set. - Any Bloom filter uses k=16 different hash functions each of which is - defined as follows: + FIXME: say something about the size of the result Bloom filter here! + </t> + <t> + Bloom filters are probabilistic data structures. Thus, especially + given the small size of the result Bloom filter, it is always possible + that a desireable result is filtered by the Bloom filter because of + a false-positive match created by a collision in the hash values + between the desireable result and filtered items. + </t> + <t> + To address this problem, R<sup>5</sup>N uses a mutator value + to additionally randomize the process + when hashing results into the result Bloom filter. The mutator + value is set by the peer initiating the GET request, and changed + every time the GET request is repeated by the initiator. Peers + forwarding GET requests <bcp14>MUST</bcp14> not change the + mutator value included in the GET message as they could not + recalculate the Bloom filter with the new mutator value. + </t> + <t> + By including the mutator value in the hashing process, repeated + requests have statistically independent probabilities of creating + collisions in the Bloom filter. Thus, even if for one request + a Bloom filter collision may exclude a result as a false-positive + match, subsequent requests are likely to not have the same + false-positives. </t> </section> <section anchor="p2p_xq" numbered="true" toc="default">