The R5N Distributed Hash Table

Introduction

Distributed Hash Tables (DHTs) are a key data structure for the construction of completely decentralized applications. DHTs are important because they generally provide a robust and efficient means to distribute the storage and retrieval of key-value pairs. While an existing standard already provides a peer-to-peer (P2P) signaling protocol with extensible routing and topology mechanisms, it also relies on strict admission control through the use of either centralized enrollment servers or pre-shared keys. Modern decentralized applications require a more open system that enables ad-hoc participation and other means to prevent common attacks on P2P overlays. This document contains the technical specification of the R5N DHT, a secure DHT routing algorithm and data structure for decentralized applications. R5N is an open P2P overlay routing mechanism which supports ad-hoc participation and security properties including support for topologies in restricted-route environments and path signatures. This document defines the normative wire format of peer-to-peer messages, routing algorithms, cryptographic routines and security considerations for use by implementors.

Requirements Notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Architecture

R5N is an overlay network with a pluggable transport layer. The following figure shows the R5N architecture.

Applications: Applications are components which directly use the DHT overlay interfaces. Possible applications include the GNU Name System or the CADET transport system.
Overlay Interface: The Overlay Interface exposes the core operations of the DHT overlay to applications. This includes querying and retrieving data from the DHT.
Block Storage: The Block Storage component is used by peers to persist and manage data. It includes logic for quotas, caching strategies and data validation.
Message Processing: The Message Processing component processes requests from and responses to applications as well as messages from the underlay network.
Routing: The Routing component includes the routing table as well as routing and peer selection logic. It implements the R5N routing algorithm together with its required data structures.
Underlay Interface: The DHT Underlay Interface is an abstraction layer on top of the supported links of a peer. Peers may be linked by a variety of different transports, including "classical" protocols such as TCP, UDP and TLS or advanced protocols such as GNUnet, L2P or Tor.

Overlay

In the DHT overlay, a peer is addressable by its Peer ID. The Peer ID is the 256-bit hash of the peer public key. The peer public key is the public key of the corresponding Ed25519 peer private key. Any implementation of this specification MUST expose the two API procedures "GET" and "PUT".

The GET procedure

The GET procedure is defined as follows:

   GET(key[, options]) -> RESULTS as List
   or
   GET(key[, options], callbackFunction)

The procedure takes two arguments. The first argument is the query key and is mandatory. The GET procedure may also allow the caller to specify RouteOptions in order to indicate certain processing requirements for messages. Any combination of options may be specified.

DemultiplexEverywhere: indicates that each peer along the way should process the request.
RecordRoute: indicates to keep track of the route that the message takes in the P2P network.
FindPeer: indicates that this is a request used to find additional peers. This is a special flag which modifies the message processing to allow approximate results.

The procedure either returns a list of results or allows the caller to provide a callback function which is called for any result received from the DHT until the procedure is cancelled.
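As an illustration of how RouteOptions might be combined and carried in the 16-bit OPTIONS field of the messages defined later, the following Python sketch models them as bit flags. The bit values (1, 2, 4) and the helper names encode_options and decode_options are assumptions for illustration only; this section does not assign normative values.

```python
from enum import IntFlag


class RouteOptions(IntFlag):
    # Bit assignments are illustrative assumptions, not normative values.
    DEMULTIPLEX_EVERYWHERE = 1  # every peer along the path processes the request
    RECORD_ROUTE = 2            # track the route taken through the P2P network
    FIND_PEER = 4               # peer discovery; approximate results allowed


def encode_options(opts: RouteOptions) -> bytes:
    """Encode options as a 16-bit OPTIONS field in network byte order."""
    return int(opts).to_bytes(2, "big")


def decode_options(field: bytes) -> RouteOptions:
    """Decode a 16-bit OPTIONS field, ignoring bits not modeled here."""
    return RouteOptions(int.from_bytes(field, "big") & 0x7)
```

Any combination is simply a bitwise OR, for example `RouteOptions.RECORD_ROUTE | RouteOptions.FIND_PEER`.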

The PUT procedure

The PUT procedure takes three arguments. The first argument is the key under which the data is to be stored and is mandatory. The PUT procedure may also allow the caller to specify put options as the second argument. The third argument is the payload data which is to be stored under the provided key.
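The two result-delivery styles of GET (a returned list versus a callback invoked until cancellation) can be illustrated with a single-node, in-memory stand-in for the Overlay Interface. The class LocalDht and its put, get and cancel methods are hypothetical; the sketch performs no routing, replication or expiration, and the options argument is accepted but ignored.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Callback = Callable[[bytes], None]


class LocalDht:
    """Hypothetical single-node stand-in for the GET/PUT overlay interface."""

    def __init__(self) -> None:
        self._store: Dict[bytes, List[bytes]] = defaultdict(list)
        self._watchers: Dict[bytes, List[Callback]] = defaultdict(list)

    def put(self, key: bytes, options: int, data: bytes) -> None:
        """Store data under key and notify callback-style GETs on that key."""
        self._store[key].append(data)
        for callback in list(self._watchers[key]):
            callback(data)

    def get(self, key: bytes, options: int = 0,
            callback: Optional[Callback] = None) -> Optional[List[bytes]]:
        """Without a callback, return the results known now; with a callback,
        deliver current and future results until cancel() is called."""
        if callback is None:
            return list(self._store[key])
        for data in self._store[key]:
            callback(data)
        self._watchers[key].append(callback)
        return None

    def cancel(self, key: bytes, callback: Callback) -> None:
        """Stop delivering results for a callback-style GET."""
        self._watchers[key].remove(callback)
```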

Underlay

In the network underlay, a peer is addressable by traditional means out of scope of this document. For example, the peer may have a TCP/IP address or an HTTPS endpoint. While the specific addressing options and mechanisms are out of scope for this document, it is necessary to define a universal addressing format in order to facilitate the distribution of connectivity information to other peers in the DHT overlay. This format is the "HELLO" message. A "HELLO" is a human-readable UTF-8 string consisting of the peer public key and the HELLO URI.

   peer-public-key := [A-HJ-NP-Z1-9]+
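The base-32 "StringEncode" mapping described in the following paragraph can be sketched in Python. This is a minimal sketch assuming that bits are consumed most-significant first and that the final 5-bit group is zero-padded; the function names string_encode and string_decode are hypothetical. Decoding is case-insensitive and applies Crockford's decoding aliases (O as 0, I and L as 1) plus the R5N-specific rule that "U" decodes like "V" (27).

```python
# Crockford alphabet: digits and upper-case letters without I, L, O, U.
_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
_DECODE = {c: i for i, c in enumerate(_ALPHABET)}
# Crockford's decoding aliases, plus the R5N rule that U decodes like V (27).
_DECODE.update({"O": 0, "I": 1, "L": 1, "U": 27})


def string_encode(data: bytes) -> str:
    """Encode bytes MSB-first in 5-bit groups, zero-padding the last group."""
    out = []
    acc = 0    # bit accumulator
    nbits = 0  # number of pending bits in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5:
            nbits -= 5
            out.append(_ALPHABET[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1  # keep only the pending bits
    if nbits:
        out.append(_ALPHABET[(acc << (5 - nbits)) & 0x1F])  # zero-pad
    return "".join(out)


def string_decode(text: str) -> bytes:
    """Decode a string produced by string_encode (case-insensitive)."""
    out = bytearray()
    acc = 0
    nbits = 0
    for ch in text.upper():
        if ch not in _DECODE:
            raise ValueError(f"invalid base-32 character: {ch!r}")
        acc = (acc << 5) | _DECODE[ch]
        nbits += 5
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)  # leftover padding bits are dropped
```

A 32-byte public key encodes to 52 characters, since 256 bits need ceil(256 / 5) = 52 groups.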

For the string representation of the peer public key, the base-32 encoding "StringEncode" is used. However, instead of the standard base-32 alphabet, the character map is based on the optical character recognition friendly proposal of Crockford. The only difference from Crockford's proposal is that the letter "U" decodes to the same base-32 value as the letter "V" (27). The "scheme" part of the HELLO URI defines the addressing scheme which is used. An example of an addressing scheme used throughout this document is "ip+tcp", which refers to a standard TCP/IP socket connection. The "hier"-part of the URI must provide a suitable address for the given addressing scheme. The following is a non-normative example of a HELLO containing three HELLO URIs:

It is expected that there are basic mechanisms available to manage peer connectivity and addressing. The required functionality is abstracted through the following procedures and events:

PEER_CONNECTED(phash,address): is a signal that allows the DHT to react to peers which connect. Such an event triggers, for example, updates in the routing table.
PEER_DISCONNECTED(phash,address): is a signal that allows the DHT to react to peers which disconnect. Such an event triggers, for example, updates in the routing table.
TRY_CONNECT(pid, address): A function which allows a peer to attempt the establishment of a connection to another peer using an address.
HOLD(phash): A function which tells the underlay to keep a hold on the connection to another peer.
DROP(phash): A function which tells the underlay to drop the connection to another peer.
RECEIVE(source, message): A function or event that allows the peer to receive protocol messages as defined in this document from a connected peer.
SEND(target, message): A function that allows a peer to send protocol messages as defined in this document to a connected peer. If a call to SEND fails, the message has not been sent.
NETWORK_SIZE_ESTIMATE(N): A function or event that provides estimates on the network size for use in the DHT routing algorithms.
ADDRESS_ADD(pk, address): The underlay signals that an address was added. This information is used, for example, to publish connectivity as part of the bootstrapping and overlay creation.
ADDRESS_DELETE(pk, address): The underlay signals that an address was removed. This information is used, for example, to publish connectivity as part of the bootstrapping and overlay creation.
VERIFY(blob): Signature verification performed by the underlay.
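To make the two directions explicit, the procedures and events listed above can be written as two abstract Python interfaces: functions the DHT calls on the underlay, and signals the underlay delivers to the DHT. The class names, method names and parameter types (bytes for peer identifiers, str for addresses) are assumptions, as is the convention that send returns False when the message was not sent.

```python
from abc import ABC, abstractmethod


class Underlay(ABC):
    """Functions the DHT calls on the underlay (hypothetical binding)."""

    @abstractmethod
    def try_connect(self, pid: bytes, address: str) -> None:
        """Attempt to connect to peer pid at address."""

    @abstractmethod
    def hold(self, phash: bytes) -> None:
        """Ask the underlay to keep the connection to phash."""

    @abstractmethod
    def drop(self, phash: bytes) -> None:
        """Ask the underlay to drop the connection to phash."""

    @abstractmethod
    def send(self, target: bytes, message: bytes) -> bool:
        """Send message; False means it was NOT sent (assumed convention)."""

    @abstractmethod
    def verify(self, blob: bytes) -> bool:
        """Signature verification performed by the underlay."""


class UnderlayEvents(ABC):
    """Signals the underlay delivers to the DHT."""

    @abstractmethod
    def peer_connected(self, phash: bytes, address: str) -> None: ...

    @abstractmethod
    def peer_disconnected(self, phash: bytes, address: str) -> None: ...

    @abstractmethod
    def receive(self, source: bytes, message: bytes) -> None: ...

    @abstractmethod
    def network_size_estimate(self, n: float) -> None: ...

    @abstractmethod
    def address_add(self, pk: bytes, address: str) -> None: ...

    @abstractmethod
    def address_delete(self, pk: bytes, address: str) -> None: ...
```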

Routing

Peer selection

In order to select peers from the routing table which are suitable destinations for sending messages, R5N uses a hybrid approach: Given an estimated network size N, the peer selection for the first N hops is random. After the initial N hops, peer selection follows an XOR-based peer distance calculation. As the message traverses a random path through the network for the first N hops, it is essential that routing loops are avoided. In R5N, a bloomfilter is used as part of the routing metadata in messages. The bloomfilter is updated at each hop with the hop's peer identity. For the next hop selection in both the random and the deterministic case, any peer which is in the bloomfilter for the respective message is excluded from the peer selection process. R5N stores the information of all connected peers in a set of lists similar to the well-known k-buckets data structure. The index which determines in which of the k lists to add a given peer is calculated using the FIND-BUCKET procedure. The buckets serve implicitly as a routing table for messages: In order to select a peer for a given message key and bloomfilter, the PEER-SELECT procedure is used.

   IF hops >= N
     dist := MAX_VALUE
     FOR EACH p IN peers
       IF XOR(p, key) < dist
         dist := XOR(p, key)
         target := p
       END
     END
   ELSE
     r := rand()
     target := peers[r]
   END
   END

The procedure to determine if we are the closest known peer for a given message key and bloomfilter is defined as follows:

The FIND-BUCKET Procedure.
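The FIND-BUCKET procedure itself is not reproduced in this text. The sketch below assumes the common Kademlia-style convention in which the bucket index is the length of the common bit prefix of the local peer ID and the other peer's ID, i.e. the number of leading zero bits of their XOR; with 256-bit peer IDs this yields indices 0 to 255. The function name find_bucket is hypothetical.

```python
def find_bucket(own_id: bytes, peer_id: bytes) -> int:
    """Bucket index for peer_id: number of leading bits shared with own_id
    (assumed Kademlia-style convention)."""
    if len(own_id) != len(peer_id):
        raise ValueError("identifiers must have equal length")
    dist = int.from_bytes(own_id, "big") ^ int.from_bytes(peer_id, "big")
    if dist == 0:
        raise ValueError("a peer is never placed in its own routing table")
    # bit_length() is the position of the highest differing bit, so the
    # number of leading zero bits is total_bits - bit_length().
    return len(own_id) * 8 - dist.bit_length()
```

Under this convention, bucket 0 holds the half of the ID space farthest from the local peer and higher indices hold ever closer peers.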

The AM-CLOSEST-PEER Procedure.
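The PEER-SELECT pseudocode and the AM-CLOSEST-PEER check can be sketched together in Python, since both rely on the XOR metric. This sketch assumes that peer IDs and keys are equal-length byte strings compared as big-endian integers, models the bloomfilter as a membership predicate, and treats the local peer as closest if no connected peer outside the bloomfilter is strictly closer to the key. These details and all function names are assumptions, not the normative procedures.

```python
import random
from typing import Callable, Optional, Sequence

InFilter = Callable[[bytes], bool]  # bloomfilter membership test


def xor_distance(a: bytes, b: bytes) -> int:
    """XOR metric between two equal-length identifiers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def peer_select(peers: Sequence[bytes], key: bytes, hops: int, n: int,
                in_bloomfilter: InFilter,
                rng: Optional[random.Random] = None) -> Optional[bytes]:
    """Random next hop for the first n hops, XOR-closest peer afterwards.
    Peers already in the message bloomfilter are never selected."""
    candidates = [p for p in peers if not in_bloomfilter(p)]
    if not candidates:
        return None
    if hops >= n:
        return min(candidates, key=lambda p: xor_distance(p, key))
    return (rng or random).choice(candidates)


def am_closest_peer(own_id: bytes, key: bytes, peers: Sequence[bytes],
                    in_bloomfilter: InFilter) -> bool:
    """True if no connected peer outside the bloomfilter is closer to key."""
    own = xor_distance(own_id, key)
    return all(xor_distance(p, key) >= own
               for p in peers if not in_bloomfilter(p))
```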

Message Processing

Bloomfilter

In order to prevent circular routes, GET and PUT messages contain a 128-bit Bloom filter (m=128). The Bloom filter is used to detect duplicate peer IDs along the route. A Bloom filter "bf" is initially empty, consisting only of zeroes. There are two functions which can be invoked on the Bloom filter: BF-SET(bf, e) and BF-TEST(bf, e), where "e" is an element which is to be added to the Bloom filter or queried against the set. Any Bloom filter uses k=16 different hash functions, each of which is defined as follows:
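The normative definitions of the k=16 hash functions are not reproduced in this text, so the following Python sketch of BF-SET and BF-TEST substitutes an assumed derivation: the 64-byte SHA-512 digest of the element is split into sixteen 32-bit big-endian words, and each word modulo m=128 selects one bit. The bit order within each byte (least-significant bit first) and the helper names are also assumptions; only m and k are taken from the text.

```python
import hashlib
from typing import List

M_BITS = 128   # filter size m, from the text
K_HASHES = 16  # number of hash functions k, from the text


def _bit_indices(element: bytes) -> List[int]:
    """Assumed hash functions: 16 x 32-bit words of SHA-512, each mod m."""
    digest = hashlib.sha512(element).digest()  # 64 bytes = 16 words
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M_BITS
            for i in range(K_HASHES)]


def bf_empty() -> bytearray:
    """A new, all-zero filter."""
    return bytearray(M_BITS // 8)


def bf_set(bf: bytearray, element: bytes) -> None:
    """BF-SET: set the k bits selected by element."""
    for idx in _bit_indices(element):
        bf[idx // 8] |= 1 << (idx % 8)


def bf_test(bf: bytes, element: bytes) -> bool:
    """BF-TEST: True if all k bits selected by element are set."""
    return all(bf[idx // 8] & (1 << (idx % 8)) for idx in _bit_indices(element))
```

BF-TEST can return false positives but no false negatives: a peer that has already handled the message is always recognized, while an unrelated peer may occasionally be excluded from selection.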

Extended query

TODO: What is this for? Not documented anywhere

PUT message

Wire Format

where:

MSIZE: denotes the size of this message in network byte order.
MTYPE: is the 16-bit message type. It is one of the DHT message types; for PUT messages it must be set to the value 146 in network byte order.
BTYPE: is a 32-bit block type field. The block type indicates the content type of the payload. In network byte order.
OPTIONS: is a 16-bit options field (see below).
HOPCOUNT: is a 16-bit number indicating how many hops this message has traversed so far. In network byte order.
REPL_LVL: is a 16-bit number indicating the desired replication level of the data. In network byte order.
PATH_LEN: is a 16-bit number indicating the length of the PUT path recorded in PUTPATH. As PUTPATH is optional, this value may be zero. In network byte order.
EXPIRATION: denotes the absolute 64-bit expiration date of the content. In microseconds since midnight (0 hour), January 1, 1970 in network byte order.
BLOOMFILTER: A bloomfilter (for peer identities) to stop circular routes.
KEY: The key under which the PUT request stores the content.
PUTPATH: the variable-length PUT path. The path consists of a list of PATH_LEN peer IDs.
BLOCK: the variable-length block payload. The contents are determined by the BTYPE field.
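The field list above can be turned into a serializer. The following Python sketch assumes that the fields appear on the wire in the listed order, that MSIZE is a 16-bit total message length including the header, that KEY is 64 bytes (512 bits), and that each PUTPATH entry is a 32-byte (256-bit) peer ID; the 128-bit BLOOMFILTER size and the stated field widths are used as given. The names pack_put and unpack_put are hypothetical.

```python
import struct
from typing import Dict, List

# MSIZE, MTYPE, BTYPE, OPTIONS, HOPCOUNT, REPL_LVL, PATH_LEN, EXPIRATION
PUT_HEADER = struct.Struct(">HHIHHHHQ")
MTYPE_PUT = 146
BLOOMFILTER_SIZE = 16  # 128-bit peer bloomfilter
KEY_SIZE = 64          # assumption: 512-bit keys
PEER_ID_SIZE = 32      # 256-bit peer IDs


def pack_put(btype: int, options: int, hopcount: int, repl_lvl: int,
             expiration_us: int, bloomfilter: bytes, key: bytes,
             putpath: List[bytes], block: bytes) -> bytes:
    """Serialize a PUT message in network byte order."""
    if len(bloomfilter) != BLOOMFILTER_SIZE or len(key) != KEY_SIZE:
        raise ValueError("unexpected BLOOMFILTER or KEY length")
    if any(len(p) != PEER_ID_SIZE for p in putpath):
        raise ValueError("unexpected peer ID length in PUTPATH")
    body = bloomfilter + key + b"".join(putpath) + block
    msize = PUT_HEADER.size + len(body)
    if msize > 0xFFFF:
        raise ValueError("message too large for a 16-bit MSIZE")
    header = PUT_HEADER.pack(msize, MTYPE_PUT, btype, options, hopcount,
                             repl_lvl, len(putpath), expiration_us)
    return header + body


def unpack_put(msg: bytes) -> Dict[str, object]:
    """Parse a PUT message produced by pack_put, checking lengths."""
    if len(msg) < PUT_HEADER.size:
        raise ValueError("truncated PUT header")
    (msize, mtype, btype, options, hopcount, repl_lvl, path_len,
     expiration_us) = PUT_HEADER.unpack_from(msg)
    if mtype != MTYPE_PUT or msize != len(msg):
        raise ValueError("not a well-formed PUT message")
    off = PUT_HEADER.size
    if off + BLOOMFILTER_SIZE + KEY_SIZE + path_len * PEER_ID_SIZE > msize:
        raise ValueError("PATH_LEN exceeds message size")
    bloomfilter = msg[off:off + BLOOMFILTER_SIZE]
    off += BLOOMFILTER_SIZE
    key = msg[off:off + KEY_SIZE]
    off += KEY_SIZE
    putpath = [msg[off + i * PEER_ID_SIZE:off + (i + 1) * PEER_ID_SIZE]
               for i in range(path_len)]
    off += path_len * PEER_ID_SIZE
    return {"btype": btype, "options": options, "hopcount": hopcount,
            "repl_lvl": repl_lvl, "expiration_us": expiration_us,
            "bloomfilter": bloomfilter, "key": key, "putpath": putpath,
            "block": msg[off:]}
```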

Processing

Upon receiving a PutMessage from a connected peer, an implementation MUST process it step by step as follows:

1. The EXPIRATION field is evaluated. If the message is expired, it MUST be discarded.
2. If the BTYPE is not supported by the implementation, no validation of the block payload is performed and processing continues at (4). Else, the block MUST be validated as defined in (3).
3. The block payload of the message is evaluated according to the BTYPE using the respective ValidateBlockStoreRequest procedure. If the block payload is invalid or does not match the key, it MUST be discarded.
4. The sender peer ID SHOULD be in the BLOOMFILTER. If not, the implementation MAY log an error, but MUST continue.
5. If the RecordRoute flag is set in OPTIONS, the local peer ID MUST be appended to the PUTPATH of the message.
6. If the local peer is the closest peer (AM-CLOSEST-PEER is true) or the DemultiplexEverywhere options flag is set, the message MUST be stored locally in the block storage.
7. Given the value in REPL_LVL, the number of peers to forward to MUST be calculated (NUM-FORWARD-PEERS). If there is at least one peer to forward to, the implementation SHOULD select up to this number of peers to forward the message to. The implementation MAY forward to fewer or no peers in order to handle resource constraints such as bandwidth. The message BLOOMFILTER MUST be updated with the local peer ID.
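The PUT processing steps can be sketched as a single Python function. To keep the sketch self-contained, the bloomfilter is modeled as a plain set of peer IDs; block validation, AM-CLOSEST-PEER, NUM-FORWARD-PEERS and local storage are passed in as callables; and the option bit values are assumed. None of these names (process_put, PutMessage and so on) are defined by this document. The function returns the number of peers to forward to, or None when the message is discarded; actual peer selection and sending are left out.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

DEMULTIPLEX_EVERYWHERE = 1  # assumed option bit values
RECORD_ROUTE = 2

log = logging.getLogger("r5n")


@dataclass
class PutMessage:
    btype: int
    options: int
    repl_lvl: int
    expiration_us: int
    bloomfilter: Set[bytes]  # stand-in for the 128-bit Bloom filter
    key: bytes
    putpath: List[bytes]
    block: bytes


def process_put(msg: PutMessage, sender: bytes, local_id: bytes, now_us: int,
                validators: Dict[int, Callable[[bytes, bytes], bool]],
                am_closest: Callable[[bytes, Set[bytes]], bool],
                num_forward_peers: Callable[[int], int],
                store: Callable[[PutMessage], None]) -> Optional[int]:
    # 1. Discard expired messages.
    if msg.expiration_us < now_us:
        return None
    # 2./3. Validate the block only if this BTYPE is supported.
    validate = validators.get(msg.btype)
    if validate is not None and not validate(msg.block, msg.key):
        return None
    # 4. The sender should already be in the bloomfilter; log and continue.
    if sender not in msg.bloomfilter:
        log.error("PUT sender missing from bloomfilter")
    # 5. Record the route if requested.
    if msg.options & RECORD_ROUTE:
        msg.putpath.append(local_id)
    # 6. Store locally if closest or if every peer should process it.
    if am_closest(msg.key, msg.bloomfilter) or msg.options & DEMULTIPLEX_EVERYWHERE:
        store(msg)
    # 7. Add ourselves to the bloomfilter and compute the forwarding fan-out.
    msg.bloomfilter.add(local_id)
    return num_forward_peers(msg.repl_lvl)
```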

GET Message

Wire Format

where:

MSIZE: denotes the size of this message in network byte order.
MTYPE: is the 16-bit message type. It is one of the DHT message types; for GET messages it must be set to the value 147 in network byte order.
BTYPE: is a 32-bit block type field. The block type indicates the content type of the payload. In network byte order.
OPTIONS: is a 16-bit options field (see below).
HOPCOUNT: is a 16-bit number indicating how many hops this message has traversed so far. In network byte order.
REPL_LVL: is a 16-bit number indicating the desired replication level of the data. In network byte order.
XQ_SIZE: is a 32-bit number indicating the length of the optional extended query XQUERY. In network byte order.
BLOOMFILTER: A bloomfilter (for peer identities) to stop circular routes.
KEY: The key under which the requested content is stored.
XQUERY: the variable-length extended query. Optional.
BF_MUTATOR: The 32-bit bloomfilter mutator for the result bloomfilter.
RESULT_BF: the variable-length result bloomfilter.
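Because RESULT_BF has no explicit length field, its size has to be derived from MSIZE after all other fields are consumed. The following Python sketch of a GET parser shows this under stated assumptions: fields appear on the wire in the listed order, MSIZE is a 16-bit length covering the whole message, the BLOOMFILTER is 128 bits, and KEY is 64 bytes (the key size is an assumption). The name parse_get is hypothetical.

```python
import struct
from typing import Dict

# MSIZE, MTYPE, BTYPE, OPTIONS, HOPCOUNT, REPL_LVL, XQ_SIZE
GET_HEADER = struct.Struct(">HHIHHHI")
MTYPE_GET = 147
BLOOMFILTER_SIZE = 16  # 128-bit peer bloomfilter
KEY_SIZE = 64          # assumption: 512-bit keys
MUTATOR_SIZE = 4       # 32-bit BF_MUTATOR


def parse_get(msg: bytes) -> Dict[str, object]:
    """Split a GET message into its fields; RESULT_BF is whatever remains."""
    if len(msg) < GET_HEADER.size:
        raise ValueError("truncated GET header")
    msize, mtype, btype, options, hopcount, repl_lvl, xq_size = \
        GET_HEADER.unpack_from(msg)
    if mtype != MTYPE_GET or msize != len(msg):
        raise ValueError("not a well-formed GET message")
    off = GET_HEADER.size
    if off + BLOOMFILTER_SIZE + KEY_SIZE + xq_size + MUTATOR_SIZE > msize:
        raise ValueError("XQ_SIZE exceeds message size")
    bloomfilter = msg[off:off + BLOOMFILTER_SIZE]
    off += BLOOMFILTER_SIZE
    key = msg[off:off + KEY_SIZE]
    off += KEY_SIZE
    xquery = msg[off:off + xq_size]
    off += xq_size
    (bf_mutator,) = struct.unpack_from(">I", msg, off)
    off += MUTATOR_SIZE
    return {"btype": btype, "options": options, "hopcount": hopcount,
            "repl_lvl": repl_lvl, "bloomfilter": bloomfilter, "key": key,
            "xquery": xquery, "bf_mutator": bf_mutator,
            "result_bf": msg[off:]}
```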

Processing

Upon receiving a GetMessage from a connected peer, an implementation MUST process it step by step as follows:

The KEY and XQUERY are validated against the requested BTYPE as defined by its respective ValidateBlockQuery procedure. If the BTYPE is not supported, or if the block key does not match or if the XQUERY is malformed, the message MUST be discarded.
The sender peer ID SHOULD be in the BLOOMFILTER. If not, the implementation MAY log an error, but MUST continue.
If the local peer is the closest peer (AM-CLOSEST-PEER) or the DemultiplexEverywhere options flag is set, a reply MUST be produced:
1. If OPTIONS indicate a FindPeer request, FIXME the peer selection foo from buckets that probably needs fixing. Take into account REPLY_BF
2. Else, if there is a BLOCK in the local Block Storage which is not already in the RESULT_BF, a RESULT message MUST be sent. FIXME link to how the result is sent?
FIXME: We only handle if not GNUNET_BLOCK_EVALUATION_OK_LAST. This means that we must evaluate the Reply produced in the previous step using ValidateBlockReply for this BTYPE
Given the value in REPL_LVL, the number of peers to forward to MUST be calculated (NUM-FORWARD-PEERS). If there is at least one peer to forward to, the implementation SHOULD select up to this number of peers to forward the message to. The implementation MAY forward to fewer or no peers in order to handle resource constraints such as bandwidth. The message BLOOMFILTER MUST be updated with the local peer ID.

RESULT message

Wire Format

where:

MSIZE: denotes the size of this message in network byte order.
MTYPE: is the 16-bit message type. It is one of the DHT message types; for RESULT messages it must be set to the value 148 in network byte order.
OPTIONS: is a 16-bit options field (see below).
BTYPE: is a 32-bit block type field. The block type indicates the content type of the payload. In network byte order.
PUTPATH_L: is a 16-bit number indicating the length of the PUT path recorded in PUTPATH. As PUTPATH is optional, this value may be zero. In network byte order.
GET_PATH_LEN: is a 16-bit number indicating the length of the GET path recorded in GETPATH. As GETPATH is optional, this value may be zero. In network byte order.
EXPIRATION: denotes the absolute 64-bit expiration date of the content. In microseconds since midnight (0 hour), January 1, 1970 in network byte order.
KEY: The key under which the returned block is stored.
PUTPATH: the variable-length PUT path. The path consists of a list of PUTPATH_L peer IDs.
GETPATH: the variable-length GET path. The path consists of a list of GET_PATH_LEN peer IDs.
BLOCK: the variable-length block payload. The contents are determined by the BTYPE field.

Processing

Upon receiving a RESULT message from a connected peer, an implementation MUST process it step by step as follows:

The EXPIRATION field is evaluated. If the message is expired, it MUST be discarded.
If the BTYPE of the message indicates a HELLO block, the payload MUST be considered for the local routing table. FIXME: Considered how?
If the sender peer (FIXME which peer?) is already found in the GETPATH, the path MUST be truncated.
If the KEY of this RESULT message is found in the list of pending queries, the KEY and XQUERY are validated against the requested BTYPE. If the BTYPE is not supported, or if the block key does not match the BTYPE or if the XQUERY is malformed, the message MUST be discarded. (FIXME: It is not clear the key validation is happening. However, block validation is.)
The implementation MAY cache RESULT messages.
If no requests for this KEY or BTYPE are known, result processing is completed.
If the request is of type "Find Peer" and the message BTYPE is of type HELLO the block key is extracted from BLOCK, and if the block key does not match KEY or cannot be extracted because the BLOCK is malformed, the message MUST be discarded. Otherwise, the block is evaluated against the message KEY. FIXME: If OK_MORE or OK_LAST the RESULT is routed. One (!) peer is selected from the connected peers (!). If none is found the message is discarded.

Block Storage

Block Processing

RequestEvaluationResult

REQUEST_VALID: Query is valid, no reply given.
REQUEST_INVALID: Query format does not match block type. For example, the XQuery is not given or its size is not appropriate for the block type.

ReplyEvaluationResult

OK_MORE: Valid result, and there may be more.
OK_LAST: Last possible valid result.
OK_DUPLICATE: Valid result, but duplicate.
RESULT_INVALID: Invalid result. Block does not match query. Value = 4.
RESULT_IRRELEVANT: Block does not match xquery. Valid result, but not relevant for the request.

Block Functions

Any block type implementation MUST implement the following functions.

ValidateBlockQuery(Key, XQuery) -> RequestEvaluationResult: is used to evaluate the request for a block. It is used as part of GetMessage processing, where the block payload is still unknown, but the block XQuery (FIXME: Undefined here) and Key can and MUST be verified, if possible.
ValidateBlockStoreRequest(Block, Key) -> RequestEvaluationResult: is used to evaluate a block including its key and payload. It is used as part of PutMessage processing. The validation MUST include a check of the block payload against the Key under which it is requested to be stored.
ValidateBlockReply(Block, XQuery, Key) -> ReplyEvaluationResult: is used to evaluate a block including its Key and payload. It is used as part of ResultMessage processing. The validation of the respective Block requires a pending local query or a previously routed request of another peer and its associated XQuery data and Key. The validation MUST include a check of the block payload against the key under which it is requested to be stored.
DeriveBlockKey(Block) -> Key: is used to synthesize the block key from the block payload and metadata. It is used as part of FIND-PEER message processing.
FilterResult(Block, XQuery, Key) -> ReplyEvaluationResult: is used to filter results stored in the local block storage for local queries. Locally stored blocks from previously observed ResultMessages and PutMessages MAY use this function instead of ValidateBlockReply in order to avoid revalidation of the block and only perform filtering based on request parameters.
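The five block functions and the two evaluation result sets can be expressed as a Python abstract base class. In the sketch below, only RESULT_INVALID = 4 is taken from the text; every other numeric value is a placeholder. ContentHashBlock is a hypothetical example block type whose key is the SHA-512 hash of its payload and which accepts no XQuery; it is not a registered block type.

```python
import hashlib
from abc import ABC, abstractmethod
from enum import Enum


class RequestEvaluationResult(Enum):
    REQUEST_VALID = 0    # placeholder value
    REQUEST_INVALID = 1  # placeholder value


class ReplyEvaluationResult(Enum):
    OK_MORE = 0            # placeholder value
    OK_LAST = 1            # placeholder value
    OK_DUPLICATE = 2       # placeholder value
    RESULT_INVALID = 4     # value given in the text
    RESULT_IRRELEVANT = 5  # placeholder value


class BlockType(ABC):
    """Functions every block type implementation provides."""

    @abstractmethod
    def validate_block_query(self, key: bytes, xquery: bytes) -> RequestEvaluationResult: ...

    @abstractmethod
    def validate_block_store_request(self, block: bytes, key: bytes) -> RequestEvaluationResult: ...

    @abstractmethod
    def validate_block_reply(self, block: bytes, xquery: bytes, key: bytes) -> ReplyEvaluationResult: ...

    @abstractmethod
    def derive_block_key(self, block: bytes) -> bytes: ...

    @abstractmethod
    def filter_result(self, block: bytes, xquery: bytes, key: bytes) -> ReplyEvaluationResult: ...


class ContentHashBlock(BlockType):
    """Hypothetical block type: key = SHA-512(payload), no XQuery allowed."""

    def derive_block_key(self, block):
        return hashlib.sha512(block).digest()

    def validate_block_query(self, key, xquery):
        if len(key) == 64 and not xquery:
            return RequestEvaluationResult.REQUEST_VALID
        return RequestEvaluationResult.REQUEST_INVALID

    def validate_block_store_request(self, block, key):
        if self.derive_block_key(block) == key:
            return RequestEvaluationResult.REQUEST_VALID
        return RequestEvaluationResult.REQUEST_INVALID

    def validate_block_reply(self, block, xquery, key):
        if xquery or self.derive_block_key(block) != key:
            return ReplyEvaluationResult.RESULT_INVALID
        # A content hash matches exactly one payload, so no more results.
        return ReplyEvaluationResult.OK_LAST

    def filter_result(self, block, xquery, key):
        # Locally stored blocks were validated on arrival; only filter here.
        if xquery:
            return ReplyEvaluationResult.RESULT_IRRELEVANT
        return ReplyEvaluationResult.OK_LAST
```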

Block Types

Applications can and should define their own block types. The block type determines the format and handling of the block payload by peers in PUT and RESULT messages. Block types MUST be registered with GANA. For bootstrapping and peer discovery, the DHT implementation uses its own block type called "HELLO". A block with this block type contains the peer ID of the peer initiating the GET request.

HELLO

The HELLO block type wire format is illustrated in the corresponding figure. A block of type HELLO MUST NOT include extended query data (xquery). Any implementation encountering a HELLO block with xquery data MUST consider the block invalid and ignore it. A HELLO reply block MAY be empty. Otherwise, it contains the HELLO URI of a peer.
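The xquery rule for HELLO blocks is simple enough to sketch directly. The Python helper below is hypothetical: it rejects any HELLO with xquery data, accepts an empty reply block, and otherwise only checks that the payload is UTF-8 text carrying a URI scheme such as "ip+tcp". A complete implementation would validate the full HELLO format, which this sketch does not attempt.

```python
from urllib.parse import urlsplit


def hello_block_is_valid(block: bytes, xquery: bytes) -> bool:
    """Minimal HELLO check: no xquery, empty block allowed, else a URI."""
    if xquery:
        return False  # HELLO blocks MUST NOT carry extended query data
    if not block:
        return True   # a HELLO reply block MAY be empty
    try:
        text = block.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return urlsplit(text).scheme != ""  # e.g. "ip+tcp"
```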

Bootstrapping

It is assumed that the peer is already connected to at least one other peer. First, those initial peers are sorted into their respective buckets. In order to find the closest peers in the network to itself, an implementation MUST now periodically send HELLO GET queries for its own peer ID. Both the "record route" and "find peer" message options are set in the GET queries in order to learn peers and network topology from the message route and in order to receive approximate replies to the query key (the peer ID). FIXME: Periodically -> more specific? No. Frequency may be adapted depending on network conditions, known peers, busy/idle etc. Any implementation encountering a HELLO GET request initially sends its own peer ID if it.
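The periodic self-lookup can be sketched as a loop that issues a HELLO GET for the local peer ID with both RecordRoute and FindPeer set. The option bit values, the GetRequest record, and the delay_for callable (standing in for whatever adaptive frequency policy an implementation chooses) are assumptions; the text leaves the frequency deliberately unspecified.

```python
import time
from dataclasses import dataclass
from typing import Callable

RECORD_ROUTE = 2  # assumed option bit values
FIND_PEER = 4


@dataclass(frozen=True)
class GetRequest:
    key: bytes
    btype: str
    options: int


def bootstrap_query(own_peer_id: bytes) -> GetRequest:
    """HELLO GET for our own peer ID, learning the route and approximate matches."""
    return GetRequest(key=own_peer_id, btype="HELLO",
                      options=RECORD_ROUTE | FIND_PEER)


def run_bootstrap(own_peer_id: bytes, send_get: Callable[[GetRequest], None],
                  delay_for: Callable[[], float], rounds: int,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Issue `rounds` self-lookups, waiting delay_for() seconds after each."""
    for _ in range(rounds):
        send_get(bootstrap_query(own_peer_id))
        sleep(delay_for())
```

Passing the delay policy in as a callable lets an implementation shorten the interval while few peers are known and lengthen it once the routing table is well populated, without changing the loop.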

Security Considerations

GANA Considerations

GANA is requested to create a "DHT Block Types" registry. The registry shall record for each entry:

Name: The name of the block type (case-insensitive ASCII string, restricted to alphanumeric characters)
Number: 32-bit
Comment: Optionally, a brief English text describing the purpose of the block type (in UTF-8)
Contact: Optionally, the contact information of a person to contact for further information
References: Optionally, references describing the block type (such as an RFC)

The registration policy for this sub-registry is "First Come First Served". GANA is requested to populate this registry as follows:

GANA is requested to amend the "GNUnet Signature Purpose" registry as follows:

Test Vectors