lsd0004

LSD0004: R5N Distributed Hash Table
Log | Files | Refs

commit 4418802490a141a64dcea94b6bb0e12090f84d98
parent 3a4a6dc27419141f2d59a00c2566e93976488d54
Author: Christian Grothoff <grothoff@gnunet.org>
Date:   Sun, 20 Aug 2023 16:52:17 +0200

use bit, not bucket for Bloom filter bits

Diffstat:
Mdraft-schanzen-r5n.xml | 71++++++++++++++++++++++++++++++++++++-----------------------------------
1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/draft-schanzen-r5n.xml b/draft-schanzen-r5n.xml @@ -630,7 +630,7 @@ Connectivity | |Underlay| |Underlay| | Underlay | ... <em>underlay interface</em> through a <tt>PEER_CONNECTED</tt> signal, information on the new neighbor <bcp14>MUST</bcp14> be added to the routing table, except if the - respective bucket in the routing table is full or if meta-data + respective <tt>k</tt>-bucket in the routing table is full or if meta-data is present that indicates that the peer does not wish to participate in routing. Peers added to the routing table <tt>SHOULD</tt> be signalled to the @@ -696,7 +696,7 @@ Connectivity | |Underlay| |Underlay| | Underlay | ... to drop the shortest-lived connection per <tt>k</tt>-bucket first. </t> <t> - The implementation <bcp14>MAY</bcp14> cache valid <em>addresses</em> of disconnected + Implementations <bcp14>MAY</bcp14> cache valid <em>addresses</em> of disconnected <em>peers</em> outside of the routing table and sporadically or periodically try to (re-)establish connection to the <em>peer</em> by making <tt>TRY_CONNECT</tt> calls to the <em>underlay interface</em> if the respective <tt>k</tt>-bucket has empty slots. @@ -705,53 +705,54 @@ Connectivity | |Underlay| |Underlay| | Underlay | ... <section anchor="find_peer"> <name>Peer Discovery</name> <t> - Initially, the implementation depends upon either the Underlay providing at - least one initial connection to a peer (signalled through - <tt>PEER_CONNECTED</tt>), or the application/end-user providing at + Initially, implementations depend upon either the underlay providing at + least one initial connection to a <em>neighbor</em> (signalled through + <tt>PEER_CONNECTED</tt>), or the <em>application</em> or even end-user providing at least one working <tt>HELLO</tt> which is then in turn used to call <tt>TRY_CONNECT</tt> - on the Underlay in order to trigger a subsequent <tt>PEER_CONNECTED</tt> signal - from the Underlay. + on the underlay in order to trigger a subsequent <tt>PEER_CONNECTED</tt> signal + from the <em>underlay interface</em>. This is commonly achieved through the configuration of hardcoded bootstrap peers - or bootstrap servers either for the Underlay or the R<sup>5</sup>N implementation. + or bootstrap servers either for the underlay or the R<sup>5</sup>N implementation. While details on how the first connection is established <bcp14>MAY</bcp14> depend on the specific implementation, this <bcp14>SHOULD</bcp14> usually be done by an out-of-band exchange of the information from a <tt>HELLO</tt> block. Section <xref target="hello_url"/> specifies a URL format for encoding HELLO blocks as text strings which allow portable, human-readable, text-based serialization - format that can, for example, be encoded into a QR for dissemination. + format that can, for example, be encoded into a QR code for dissemination. HELLO URLs <bcp14>SHOULD</bcp14> be supported by implementations for both import and export of <tt>HELLO</tt>s. </t> <t> To discover peers for its routing table, a peer will initiate <tt>GetMessage</tt> requests - <xref target="p2p_get"/> asking for blocks of type <tt>HELLO</tt> using its own peer address as - <tt>QUERY_HASH</tt>. + <xref target="p2p_get"/> asking for blocks of type <tt>HELLO</tt> using its own peer address as + the <em>key</em> provided in the <tt>QUERY_HASH</tt> field of the message. The <tt>PEER_BF</tt> is initialized and set using the peers own peer address as well as the addresses - of all currently connected peers. + of all currently connected <em>neighbors</em>. <!-- note: we mean the identities here! FIX with terminology... --> These requests <bcp14>MUST</bcp14> use the <tt>FindApproximate</tt> and <tt>DemultiplexEverywhere</tt> flags. <tt>FindApproximate</tt> will ensure that other peers will reply with keys they merely consider close-enough, while <tt>DemultiplexEverywhere</tt> will cause each peer on the path to respond, which is likely to yield - <tt>HELLO</tt> s of peers that are useful somewhere in the routing table. - The <tt>RECOMMENDED</tt> replication level set in the <tt>REPL_LVL</tt> field is 4. + <tt>HELLO</tt>s of peers that are useful somewhere in the routing table. + The <tt>RECOMMENDED</tt> replication level to be set in the <tt>REPL_LVL</tt> field is 4. The size and format of the result filter is specified in <xref target="hello_block"/>. - The <tt>XQUERY</tt> is empty. + The <tt>XQUERY</tt> <bcp14>MUST</bcp14> be empty. </t> <t> In order to facilitate the above, - the Underlay is expected to provide the implementation with one or more + the underlay is expected to provide the implementation with one or more addresses signalled through <tt>ADDRESS_ADDED</tt>. Zero addresses <bcp14>MAY</bcp14> be provided if a peer can only establish outgoing connections and is otherwise unreachable. - An implementation <bcp14>MUST</bcp14> advertise its addresses periodically to its neighbors through <tt>HelloMessage</tt>s. - The advertisement interval and expiration should be configurable or chosen at the discretion of the implementation based - on external factors such as DHCP leases. + An implementation <bcp14>MUST</bcp14> advertise its addresses periodically to its + <em>neighbors</em> through <tt>HelloMessage</tt>s. + The advertisement interval and expiration should be configurable or chosen at the discretion + of the implementation based on external factors such as DHCP leases. The specific frequency of advertisements <bcp14>MAY</bcp14> depend on available bandwidth, the set of already connected neighbors, the workload of the system and other factors which are at the discretion of the developer, but <bcp14>SHOULD</bcp14> be a fraction of the expiration period. Whenever a peer receives such a <tt>HELLO</tt> message from another peer that is already in the routing table, it must cache it as long as that peer is in its routing table (or until the <tt>HELLO</tt> expires) and serve it in response to - GET requests for <tt>HELLO</tt> blocks (see <xref target="p2p_get_processing"/>). + <tt>GET</tt> requests for <tt>HELLO</tt> blocks (see <xref target="p2p_get_processing"/>). This behaviour makes it unnecessary to initiate dedicated <tt>PutMessages</tt> containing <tt>HELLO</tt> blocks by the implementation. </t> @@ -760,9 +761,9 @@ Connectivity | |Underlay| |Underlay| | Underlay | ... <name>Peer Bloom Filter</name> <t> As DHT <tt>GetMessage</tt>s and <tt>PutMessage</tt>s traverse a random path through the network for the - first N hops, it is essential that routing loops are avoided. - This peer Bloom filter is constant in size at <tt>L=1024</tt> buckets (128 bytes) and - <tt>k=16</tt> buckets per element. + first <tt>L2NSE</tt> hops, it is essential that routing loops are avoided. + This peer Bloom filter is constant in size at <tt>L=1024</tt> bits (128 bytes) and + <tt>k=16</tt> bits set per element. The peer Bloom filter is part of the routing metadata in messages in order to prevent circular routes and is updated at each hop with the hops peer identity. @@ -788,7 +789,7 @@ Connectivity | |Underlay| |Underlay| | Underlay | ... <t> The element <tt>e</tt> is hashed using SHA-512. The resulting byte string is interpreted as a string of k=16 - 32-bit integers in network byte order which are used to set and check the bucket bits + 32-bit integers in network byte order which are used to set and check the bits in <tt>B</tt> using <tt>BF-SET</tt> and <tt>BF-TEST</tt>. </t> <t> @@ -2386,7 +2387,7 @@ BEGIN <t> The RESULT_FILTER for <tt>HELLO</tt> blocks is implemented using a Bloom filter following the definition from <xref target="bloom_filters"/> - and consists of a variable number of buckets <tt>L</tt>. + and consists of a variable number of bits <tt>L</tt>. <tt>L</tt> depends on the number of connected peers <tt>|E|</tt> known to the peer creating a <tt>HELLO</tt> block from its own addresses: <tt>L</tt> is set to the minimum of @@ -2407,9 +2408,9 @@ BEGIN <t> <tt>M</tt> is an identity function and returns the 512-bit XOR result unmodified. This resulting byte string is interpreted as k=16 32-bit - integers in network byte order which are used to set and check the bucket bits in + integers in network byte order which are used to set and check the bits in <tt>B</tt> using <tt>BF-SET</tt> and <tt>BF-TEST</tt>. - The 32-bit Mutator is prepended to the L-bit Bloom filter bucket field <tt>HELLO_BF</tt> containing <tt>B</tt> + The 32-bit Mutator is prepended to the L-bit Bloom filter field <tt>HELLO_BF</tt> containing <tt>B</tt> to create the result filter for a <tt>HELLO</tt> block: </t> <figure anchor="hello_rf" title="The HELLO Block Result Filter."> @@ -2430,7 +2431,7 @@ BEGIN </dd> <dt>HELLO_BF</dt> <dd> - The L-bit Bloom filter buckets byte array. + The L-bit Bloom filter array. </dd> </dl> <t> @@ -2856,8 +2857,8 @@ Type | Name | References | Description a BF is either "no" or "maybe". </t> <t> - Bloom filters are defined as a string of <tt>L</tt> bits called "buckets". - The buckets are initially always empty, meaning that the bits are set to + Bloom filters are defined as a string of <tt>L</tt> bits. + The bits are initially always empty, meaning that the bits are set to zero. There are two functions which can be invoked on the Bloom filter "bf": BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element that is to @@ -2865,9 +2866,9 @@ Type | Name | References | Description </t> <t> A mapping function M is used to map each ID of each element from the set to a - subset of k buckets. + subset of k bits. In the original proposal by Bloom, M is non-injective and can thus map the same - element multiple times to the same bucket. + element multiple times to the same bit. The type of the mapping function can thus be described by the following mathematical notation: </t> @@ -2876,9 +2877,9 @@ Type | Name | References | Description ------------------------------------ # M: E->B^k ------------------------------------ - # L = Number of buckets - # B = 0,1,2,3,4,...L-1 (the buckets) - # k = Number of buckets per element + # L = Number of bits + # B = 0,1,2,3,4,...L-1 (the bits) + # k = Number of bits per element # E = Set of elements ------------------------------------ Example: L=256, k=3