dht.rst - gnunet-handbook - The GNUnet Handbook

dht.rst (16408B)
      1 .. _Distributed-Hash-Table-Dev:
      2 
      3 .. index::
      4    double: Distributed hash table; subsystem
      5    see: DHT; Distributed hash table
      6 
      7 DHT
      8 ===
      9 
     10 The API of libgnunetblock
     11 ^^^^^^^^^^^^^^^^^^^^^^^^^
     12 
     13 The block library requires for each (family of) block type(s) a block
     14 plugin (implementing ``gnunet_block_plugin.h``) that provides basic
     15 functions that are needed by the DHT (and possibly other subsystems) to
     16 manage the block. These block plugins are typically implemented within
     17 their respective subsystems. The main block library is then used to
     18 locate, load and query the appropriate block plugin. Which plugin is
     19 appropriate is determined by the block type (which is just a 32-bit
     20 integer). Block plugins contain code that specifies which block types
     21 are supported by a given plugin. The block library loads all block
     22 plugins that are installed at the local peer and forwards the
     23 application request to the respective plugin.
     24 
     25 The central functions of the block APIs (plugin and main library) are to
     26 allow the mapping of blocks to their respective key (if possible) and
     27 the ability to check that a block is well-formed and matches a given
     28 request (again, if possible). This way, GNUnet can avoid storing invalid
     29 blocks, storing blocks under the wrong key and forwarding blocks in
     30 response to a query that they do not answer.
     31 
     32 One key function of block plugins is that it allows GNUnet to detect
     33 duplicate replies (via the Bloom filter). All plugins MUST support
     34 detecting duplicate replies (by adding the current response to the Bloom
     35 filter and rejecting it if it is encountered again). If a plugin fails
     36 to do this, responses may loop in the network.
     37 
     38 .. _Queries:
     39 
     40 Queries
     41 ^^^^^^^
     42 
     43 The query format for any block in GNUnet consists of four main
     44 components. First, the type of the desired block must be specified.
     45 Second, the query must contain a hash code. The hash code is used for
     46 lookups in hash tables and databases and must not be unique for the
     47 block (however, if possible a unique hash should be used as this would
     48 be best for performance). Third, an optional Bloom filter can be
     49 specified to exclude known results; replies that hash to the bits set in
     50 the Bloom filter are considered invalid. False-positives can be
     51 eliminated by sending the same query again with a different Bloom filter
     52 mutator value, which parametrizes the hash function that is used.
     53 Finally, an optional application-specific \"eXtended query\" (xquery)
     54 can be specified to further constrain the results. It is entirely up to
     55 the type-specific plugin to determine whether or not a given block
     56 matches a query (type, hash, Bloom filter, and xquery). Naturally, not
     57 all xquery's are valid and some types of blocks may not support Bloom
     58 filters either, so the plugin also needs to check if the query is valid
     59 in the first place.
     60 
     61 Depending on the results from the plugin, the DHT will then discard the
     62 (invalid) query, forward the query, discard the (invalid) reply, cache
     63 the (valid) reply, and/or forward the (valid and non-duplicate) reply.
     64 
     65 .. _Sample-Code:
     66 
     67 Sample Code
     68 ^^^^^^^^^^^
     69 
     70 The source code in **plugin_block_test.c** is a good starting point for
     71 new block plugins --- it does the minimal work by implementing a plugin
     72 that performs no validation at all. The respective **Makefile.am** shows
     73 how to build and install a block plugin.
     74 
     75 .. _Conclusion2:
     76 
     77 Conclusion2
     78 ^^^^^^^^^^^
     79 
     80 In conclusion, GNUnet subsystems that want to use the DHT need to define
     81 a block format and write a plugin to match queries and replies. For
     82 testing, the ``GNUNET_BLOCK_TYPE_TEST`` block type can be used; it
     83 accepts any query as valid and any reply as matching any query. This
     84 type is also used for the DHT command line tools. However, it should NOT
     85 be used for normal applications due to the lack of error checking that
     86 results from this primitive implementation.
     87 
     88 libgnunetdht
     89 :index:`libgnunetdht <single: libgnunet; dht>`
     90 libgnunetdht
     91 ------------
     92 
     93 The DHT API itself is pretty simple and offers the usual GET and PUT
     94 functions that work as expected. The specified block type refers to the
     95 block library which allows the DHT to run application-specific logic for
     96 data stored in the network.
     97 
     98 .. _GET:
     99 
    100 GET
    101 ^^^
    102 
    103 When using GET, the main consideration for developers (other than the
    104 block library) should be that after issuing a GET, the DHT will
    105 continuously cause (small amounts of) network traffic until the
    106 operation is explicitly canceled. So GET does not simply send out a
    107 single network request once; instead, the DHT will continue to search
    108 for data. This is needed to achieve good success rates and also handles
    109 the case where the respective PUT operation happens after the GET
    110 operation was started. Developers should not cancel an existing GET
    111 operation and then explicitly re-start it to trigger a new round of
    112 network requests; this is simply inefficient, especially as the internal
    113 automated version can be more efficient, for example by filtering
    114 results in the network that have already been returned.
    115 
    116 If an application that performs a GET request has a set of replies that
    117 it already knows and would like to filter, it can
    118 call ``GNUNET_DHT_get_filter_known_results`` with an array of hashes
    119 over the respective blocks to tell the DHT that these results are not
    120 desired (any more). This way, the DHT will filter the respective blocks
    121 using the block library in the network, which may result in a
    122 significant reduction in bandwidth consumption.
    123 
    124 .. _PUT:
    125 
    126 PUT
    127 ^^^
    128 
    129 .. todo:: inconsistent use of "must" above it's written "MUST"
    130 
    131 In contrast to GET operations, developers **must** manually re-run PUT
    132 operations periodically (if they intend the content to continue to be
    133 available). Content stored in the DHT expires or might be lost due to
    134 churn. Furthermore, GNUnet's DHT typically requires multiple rounds of
    135 PUT operations before a key-value pair is consistently available to all
    136 peers (the DHT randomizes paths and thus storage locations, and only
    137 after multiple rounds of PUTs there will be a sufficient number of
    138 replicas in large DHTs). An explicit PUT operation using the DHT API
    139 will only cause network traffic once, so in order to ensure basic
    140 availability and resistance to churn (and adversaries), PUTs must be
    141 repeated. While the exact frequency depends on the application, a rule
    142 of thumb is that there should be at least a dozen PUT operations within
    143 the content lifetime. Content in the DHT typically expires after one
    144 day, so DHT PUT operations should be repeated at least every 1-2 hours.
    145 
    146 .. _MONITOR:
    147 
    148 MONITOR
    149 ^^^^^^^
    150 
    151 The DHT API also allows applications to monitor messages crossing the
    152 local DHT service. The types of messages used by the DHT are GET, PUT
    153 and RESULT messages. Using the monitoring API, applications can choose
    154 to monitor these requests, possibly limiting themselves to requests for
    155 a particular block type.
    156 
    157 The monitoring API is not only useful for diagnostics, it can also be
    158 used to trigger application operations based on PUT operations. For
    159 example, an application may use PUTs to distribute work requests to
    160 other peers. The workers would then monitor for PUTs that give them
    161 work, instead of looking for work using GET operations. This can be
    162 beneficial, especially if the workers have no good way to guess the keys
    163 under which work would be stored. Naturally, additional protocols might
    164 be needed to ensure that the desired number of workers will process the
    165 distributed workload.
    166 
    167 .. _DHT-Routing-Options:
    168 
    169 DHT Routing Options
    170 ^^^^^^^^^^^^^^^^^^^
    171 
    172 There are two important options for GET and PUT requests:
    173 
    174 GNUNET_DHT_RO_DEMULITPLEX_EVERYWHERE This option means that all
    175    peers should process the request, even if their peer ID is not
    176    closest to the key. For a PUT request, this means that all peers that
    177    a request traverses may make a copy of the data. Similarly for a GET
    178    request, all peers will check their local database for a result.
    179    Setting this option can thus significantly improve caching and reduce
    180    bandwidth consumption --- at the expense of a larger DHT database. If
    181    in doubt, we recommend that this option should be used.
    182 
    183 GNUNET_DHT_RO_RECORD_ROUTE This option instructs the DHT to record
    184    the path that a GET or a PUT request is taking through the overlay
    185    network. The resulting paths are then returned to the application
    186    with the respective result. This allows the receiver of a result to
    187    construct a path to the originator of the data, which might then be
    188    used for routing. Naturally, setting this option requires additional
    189    bandwidth and disk space, so applications should only set this if the
    190    paths are needed by the application logic.
    191 
    192 GNUNET_DHT_RO_FIND_PEER This option is an internal option used by
    193    the DHT's peer discovery mechanism and should not be used by
    194    applications.
    195 
    196 GNUNET_DHT_RO_BART This option is currently not implemented. It may
    197    in the future offer performance improvements for clique topologies.
    198 
    199 .. _The-DHT-Client_002dService-Protocol:
    200 
    201 The DHT Client-Service Protocol
    202 -------------------------------
    203 
    204 .. _PUTting-data-into-the-DHT:
    205 
    206 PUTting data into the DHT
    207 ^^^^^^^^^^^^^^^^^^^^^^^^^
    208 
    209 To store (PUT) data into the DHT, the client sends a
    210 ``struct GNUNET_DHT_ClientPutMessage`` to the service. This message
    211 specifies the block type, routing options, the desired replication
    212 level, the expiration time, key, value and a 64-bit unique ID for the
    213 operation. The service responds with a
    214 ``struct GNUNET_DHT_ClientPutConfirmationMessage`` with the same 64-bit
    215 unique ID. Note that the service sends the confirmation as soon as it
    216 has locally processed the PUT request. The PUT may still be propagating
    217 through the network at this time.
    218 
    219 In the future, we may want to change this to provide (limited) feedback
    220 to the client, for example if we detect that the PUT operation had no
    221 effect because the same key-value pair was already stored in the DHT.
    222 However, changing this would also require additional state and messages
    223 in the P2P interaction.
    224 
    225 .. _GETting-data-from-the-DHT:
    226 
    227 GETting data from the DHT
    228 ^^^^^^^^^^^^^^^^^^^^^^^^^
    229 
    230 To retrieve (GET) data from the DHT, the client sends a
    231 ``struct GNUNET_DHT_ClientGetMessage`` to the service. The message
    232 specifies routing options, a replication level (for replicating the GET,
    233 not the content), the desired block type, the key, the (optional)
    234 extended query and unique 64-bit request ID.
    235 
    236 Additionally, the client may send any number of
    237 ``struct GNUNET_DHT_ClientGetResultSeenMessage``\ s to notify the
    238 service about results that the client is already aware of. These
    239 messages consist of the key, the unique 64-bit ID of the request, and an
    240 arbitrary number of hash codes over the blocks that the client is
    241 already aware of. As messages are restricted to 64k, a client that
    242 already knows more than about a thousand blocks may need to send several
    243 of these messages. Naturally, the client should transmit these messages
    244 as quickly as possible after the original GET request such that the DHT
    245 can filter those results in the network early on. Naturally, as these
    246 messages are sent after the original request, it is conceivable that the
    247 DHT service may return blocks that match those already known to the
    248 client anyway.
    249 
    250 In response to a GET request, the service will send ``struct
    251 GNUNET_DHT_ClientResultMessage``\ s to the client. These messages
    252 contain the block type, expiration, key, unique ID of the request and of
    253 course the value (a block). Depending on the options set for the
    254 respective operations, the replies may also contain the path the GET
    255 and/or the PUT took through the network.
    256 
    257 A client can stop receiving replies either by disconnecting or by
    258 sending a ``struct GNUNET_DHT_ClientGetStopMessage`` which must contain
    259 the key and the 64-bit unique ID of the original request. Using an
    260 explicit \"stop\" message is more common as this allows a client to run
    261 many concurrent GET operations over the same connection with the DHT
    262 service --- and to stop them individually.
    263 
    264 .. _Monitoring-DHT:
    265 
    266 Monitoring the DHT
    267 ^^^^^^^^^^^^^^^^^^
    268 
    269 To begin monitoring, the client sends a
    270 ``struct GNUNET_DHT_MonitorStartStop`` message to the DHT service. In
    271 this message, flags can be set to enable (or disable) monitoring of GET,
    272 PUT and RESULT messages that pass through a peer. The message can also
    273 restrict monitoring to a particular block type or a particular key. Once
    274 monitoring is enabled, the DHT service will notify the client about any
    275 matching event using ``struct GNUNET_DHT_MonitorGetMessage``\ s for GET
    276 events, ``struct GNUNET_DHT_MonitorPutMessage`` for PUT events and
    277 ``struct GNUNET_DHT_MonitorGetRespMessage`` for RESULTs. Each of these
    278 messages contains all of the information about the event.
    279 
    280 .. _The-DHT-Peer_002dto_002dPeer-Protocol:
    281 
    282 The DHT Peer-to-Peer Protocol
    283 -----------------------------
    284 
    285 .. _Routing-GETs-or-PUTs:
    286 
    287 Routing GETs or PUTs
    288 ^^^^^^^^^^^^^^^^^^^^
    289 
    290 When routing GETs or PUTs, the DHT service selects a suitable subset of
    291 neighbours for forwarding. The exact number of neighbours can be zero or
    292 more and depends on the hop counter of the query (initially zero) in
    293 relation to the (log of) the network size estimate, the desired
    294 replication level and the peer's connectivity. Depending on the hop
    295 counter and our network size estimate, the selection of the peers maybe
    296 randomized or by proximity to the key. Furthermore, requests include a
    297 set of peers that a request has already traversed; those peers are also
    298 excluded from the selection.
    299 
    300 .. _PUTting-data-into-the-DHT2:
    301 
    302 PUTting data into the DHT
    303 ^^^^^^^^^^^^^^^^^^^^^^^^^
    304 
    305 To PUT data into the DHT, the service sends a ``struct PeerPutMessage``
    306 of type ``GNUNET_MESSAGE_TYPE_DHT_P2P_PUT`` to the respective neighbour.
    307 In addition to the usual information about the content (type, routing
    308 options, desired replication level for the content, expiration time, key
    309 and value), the message contains a fixed-size Bloom filter with
    310 information about which peers (may) have already seen this request. This
    311 Bloom filter is used to ensure that DHT messages never loop back to a
    312 peer that has already processed the request. Additionally, the message
    313 includes the current hop counter and, depending on the routing options,
    314 the message may include the full path that the message has taken so far.
    315 The Bloom filter should already contain the identity of the previous
    316 hop; however, the path should not include the identity of the previous
    317 hop and the receiver should append the identity of the sender to the
    318 path, not its own identity (this is done to reduce bandwidth).
    319 
    320 .. _GETting-data-from-the-DHT2:
    321 
    322 GETting data from the DHT
    323 ^^^^^^^^^^^^^^^^^^^^^^^^^
    324 
    325 A peer can search the DHT by sending ``struct PeerGetMessage``\ s of
    326 type ``GNUNET_MESSAGE_TYPE_DHT_P2P_GET`` to other peers. In addition to
    327 the usual information about the request (type, routing options, desired
    328 replication level for the request, the key and the extended query), a
    329 GET request also contains a hop counter, a Bloom filter over the peers
    330 that have processed the request already and depending on the routing
    331 options the full path traversed by the GET. Finally, a GET request
    332 includes a variable-size second Bloom filter and a so-called Bloom
    333 filter mutator value which together indicate which replies the sender
    334 has already seen. During the lookup, each block that matches they block
    335 type, key and extended query is additionally subjected to a test against
    336 this Bloom filter. The block plugin is expected to take the hash of the
    337 block and combine it with the mutator value and check if the result is
    338 not yet in the Bloom filter. The originator of the query will from time
    339 to time modify the mutator to (eventually) allow false-positives
    340 filtered by the Bloom filter to be returned.
    341 
    342 Peers that receive a GET request perform a local lookup (depending on
    343 their proximity to the key and the query options) and forward the
    344 request to other peers. They then remember the request (including the
    345 Bloom filter for blocking duplicate results) and when they obtain a
    346 matching, non-filtered response a ``struct PeerResultMessage`` of type
    347 ``GNUNET_MESSAGE_TYPE_DHT_P2P_RESULT`` is forwarded to the previous hop.
    348 Whenever a result is forwarded, the block plugin is used to update the
    349 Bloom filter accordingly, to ensure that the same result is never
    350 forwarded more than once. The DHT service may also cache forwarded
    351 results locally if the \"CACHE_RESULTS\" option is set to \"YES\" in the
    352 configuration.