dht.rst (16408B)
1 .. _Distributed-Hash-Table-Dev: 2 3 .. index:: 4 double: Distributed hash table; subsystem 5 see: DHT; Distributed hash table 6 7 DHT 8 === 9 10 The API of libgnunetblock 11 ^^^^^^^^^^^^^^^^^^^^^^^^^ 12 13 The block library requires for each (family of) block type(s) a block 14 plugin (implementing ``gnunet_block_plugin.h``) that provides basic 15 functions that are needed by the DHT (and possibly other subsystems) to 16 manage the block. These block plugins are typically implemented within 17 their respective subsystems. The main block library is then used to 18 locate, load and query the appropriate block plugin. Which plugin is 19 appropriate is determined by the block type (which is just a 32-bit 20 integer). Block plugins contain code that specifies which block types 21 are supported by a given plugin. The block library loads all block 22 plugins that are installed at the local peer and forwards the 23 application request to the respective plugin. 24 25 The central functions of the block APIs (plugin and main library) are to 26 allow the mapping of blocks to their respective key (if possible) and 27 the ability to check that a block is well-formed and matches a given 28 request (again, if possible). This way, GNUnet can avoid storing invalid 29 blocks, storing blocks under the wrong key and forwarding blocks in 30 response to a query that they do not answer. 31 32 One key function of block plugins is that it allows GNUnet to detect 33 duplicate replies (via the Bloom filter). All plugins MUST support 34 detecting duplicate replies (by adding the current response to the Bloom 35 filter and rejecting it if it is encountered again). If a plugin fails 36 to do this, responses may loop in the network. 37 38 .. _Queries: 39 40 Queries 41 ^^^^^^^ 42 43 The query format for any block in GNUnet consists of four main 44 components. First, the type of the desired block must be specified. 45 Second, the query must contain a hash code. The hash code is used for 46 lookups in hash tables and databases and must not be unique for the 47 block (however, if possible a unique hash should be used as this would 48 be best for performance). Third, an optional Bloom filter can be 49 specified to exclude known results; replies that hash to the bits set in 50 the Bloom filter are considered invalid. False-positives can be 51 eliminated by sending the same query again with a different Bloom filter 52 mutator value, which parametrizes the hash function that is used. 53 Finally, an optional application-specific \"eXtended query\" (xquery) 54 can be specified to further constrain the results. It is entirely up to 55 the type-specific plugin to determine whether or not a given block 56 matches a query (type, hash, Bloom filter, and xquery). Naturally, not 57 all xquery's are valid and some types of blocks may not support Bloom 58 filters either, so the plugin also needs to check if the query is valid 59 in the first place. 60 61 Depending on the results from the plugin, the DHT will then discard the 62 (invalid) query, forward the query, discard the (invalid) reply, cache 63 the (valid) reply, and/or forward the (valid and non-duplicate) reply. 64 65 .. _Sample-Code: 66 67 Sample Code 68 ^^^^^^^^^^^ 69 70 The source code in **plugin_block_test.c** is a good starting point for 71 new block plugins --- it does the minimal work by implementing a plugin 72 that performs no validation at all. The respective **Makefile.am** shows 73 how to build and install a block plugin. 74 75 .. _Conclusion2: 76 77 Conclusion2 78 ^^^^^^^^^^^ 79 80 In conclusion, GNUnet subsystems that want to use the DHT need to define 81 a block format and write a plugin to match queries and replies. For 82 testing, the ``GNUNET_BLOCK_TYPE_TEST`` block type can be used; it 83 accepts any query as valid and any reply as matching any query. This 84 type is also used for the DHT command line tools. However, it should NOT 85 be used for normal applications due to the lack of error checking that 86 results from this primitive implementation. 87 88 libgnunetdht 89 :index:`libgnunetdht <single: libgnunet; dht>` 90 libgnunetdht 91 ------------ 92 93 The DHT API itself is pretty simple and offers the usual GET and PUT 94 functions that work as expected. The specified block type refers to the 95 block library which allows the DHT to run application-specific logic for 96 data stored in the network. 97 98 .. _GET: 99 100 GET 101 ^^^ 102 103 When using GET, the main consideration for developers (other than the 104 block library) should be that after issuing a GET, the DHT will 105 continuously cause (small amounts of) network traffic until the 106 operation is explicitly canceled. So GET does not simply send out a 107 single network request once; instead, the DHT will continue to search 108 for data. This is needed to achieve good success rates and also handles 109 the case where the respective PUT operation happens after the GET 110 operation was started. Developers should not cancel an existing GET 111 operation and then explicitly re-start it to trigger a new round of 112 network requests; this is simply inefficient, especially as the internal 113 automated version can be more efficient, for example by filtering 114 results in the network that have already been returned. 115 116 If an application that performs a GET request has a set of replies that 117 it already knows and would like to filter, it can 118 callĀ ``GNUNET_DHT_get_filter_known_results`` with an array of hashes 119 over the respective blocks to tell the DHT that these results are not 120 desired (any more). This way, the DHT will filter the respective blocks 121 using the block library in the network, which may result in a 122 significant reduction in bandwidth consumption. 123 124 .. _PUT: 125 126 PUT 127 ^^^ 128 129 .. todo:: inconsistent use of "must" above it's written "MUST" 130 131 In contrast to GET operations, developers **must** manually re-run PUT 132 operations periodically (if they intend the content to continue to be 133 available). Content stored in the DHT expires or might be lost due to 134 churn. Furthermore, GNUnet's DHT typically requires multiple rounds of 135 PUT operations before a key-value pair is consistently available to all 136 peers (the DHT randomizes paths and thus storage locations, and only 137 after multiple rounds of PUTs there will be a sufficient number of 138 replicas in large DHTs). An explicit PUT operation using the DHT API 139 will only cause network traffic once, so in order to ensure basic 140 availability and resistance to churn (and adversaries), PUTs must be 141 repeated. While the exact frequency depends on the application, a rule 142 of thumb is that there should be at least a dozen PUT operations within 143 the content lifetime. Content in the DHT typically expires after one 144 day, so DHT PUT operations should be repeated at least every 1-2 hours. 145 146 .. _MONITOR: 147 148 MONITOR 149 ^^^^^^^ 150 151 The DHT API also allows applications to monitor messages crossing the 152 local DHT service. The types of messages used by the DHT are GET, PUT 153 and RESULT messages. Using the monitoring API, applications can choose 154 to monitor these requests, possibly limiting themselves to requests for 155 a particular block type. 156 157 The monitoring API is not only useful for diagnostics, it can also be 158 used to trigger application operations based on PUT operations. For 159 example, an application may use PUTs to distribute work requests to 160 other peers. The workers would then monitor for PUTs that give them 161 work, instead of looking for work using GET operations. This can be 162 beneficial, especially if the workers have no good way to guess the keys 163 under which work would be stored. Naturally, additional protocols might 164 be needed to ensure that the desired number of workers will process the 165 distributed workload. 166 167 .. _DHT-Routing-Options: 168 169 DHT Routing Options 170 ^^^^^^^^^^^^^^^^^^^ 171 172 There are two important options for GET and PUT requests: 173 174 GNUNET_DHT_RO_DEMULITPLEX_EVERYWHERE This option means that all 175 peers should process the request, even if their peer ID is not 176 closest to the key. For a PUT request, this means that all peers that 177 a request traverses may make a copy of the data. Similarly for a GET 178 request, all peers will check their local database for a result. 179 Setting this option can thus significantly improve caching and reduce 180 bandwidth consumption --- at the expense of a larger DHT database. If 181 in doubt, we recommend that this option should be used. 182 183 GNUNET_DHT_RO_RECORD_ROUTE This option instructs the DHT to record 184 the path that a GET or a PUT request is taking through the overlay 185 network. The resulting paths are then returned to the application 186 with the respective result. This allows the receiver of a result to 187 construct a path to the originator of the data, which might then be 188 used for routing. Naturally, setting this option requires additional 189 bandwidth and disk space, so applications should only set this if the 190 paths are needed by the application logic. 191 192 GNUNET_DHT_RO_FIND_PEER This option is an internal option used by 193 the DHT's peer discovery mechanism and should not be used by 194 applications. 195 196 GNUNET_DHT_RO_BART This option is currently not implemented. It may 197 in the future offer performance improvements for clique topologies. 198 199 .. _The-DHT-Client_002dService-Protocol: 200 201 The DHT Client-Service Protocol 202 ------------------------------- 203 204 .. _PUTting-data-into-the-DHT: 205 206 PUTting data into the DHT 207 ^^^^^^^^^^^^^^^^^^^^^^^^^ 208 209 To store (PUT) data into the DHT, the client sends a 210 ``struct GNUNET_DHT_ClientPutMessage`` to the service. This message 211 specifies the block type, routing options, the desired replication 212 level, the expiration time, key, value and a 64-bit unique ID for the 213 operation. The service responds with a 214 ``struct GNUNET_DHT_ClientPutConfirmationMessage`` with the same 64-bit 215 unique ID. Note that the service sends the confirmation as soon as it 216 has locally processed the PUT request. The PUT may still be propagating 217 through the network at this time. 218 219 In the future, we may want to change this to provide (limited) feedback 220 to the client, for example if we detect that the PUT operation had no 221 effect because the same key-value pair was already stored in the DHT. 222 However, changing this would also require additional state and messages 223 in the P2P interaction. 224 225 .. _GETting-data-from-the-DHT: 226 227 GETting data from the DHT 228 ^^^^^^^^^^^^^^^^^^^^^^^^^ 229 230 To retrieve (GET) data from the DHT, the client sends a 231 ``struct GNUNET_DHT_ClientGetMessage`` to the service. The message 232 specifies routing options, a replication level (for replicating the GET, 233 not the content), the desired block type, the key, the (optional) 234 extended query and unique 64-bit request ID. 235 236 Additionally, the client may send any number of 237 ``struct GNUNET_DHT_ClientGetResultSeenMessage``\ s to notify the 238 service about results that the client is already aware of. These 239 messages consist of the key, the unique 64-bit ID of the request, and an 240 arbitrary number of hash codes over the blocks that the client is 241 already aware of. As messages are restricted to 64k, a client that 242 already knows more than about a thousand blocks may need to send several 243 of these messages. Naturally, the client should transmit these messages 244 as quickly as possible after the original GET request such that the DHT 245 can filter those results in the network early on. Naturally, as these 246 messages are sent after the original request, it is conceivable that the 247 DHT service may return blocks that match those already known to the 248 client anyway. 249 250 In response to a GET request, the service will send ``struct 251 GNUNET_DHT_ClientResultMessage``\ s to the client. These messages 252 contain the block type, expiration, key, unique ID of the request and of 253 course the value (a block). Depending on the options set for the 254 respective operations, the replies may also contain the path the GET 255 and/or the PUT took through the network. 256 257 A client can stop receiving replies either by disconnecting or by 258 sending a ``struct GNUNET_DHT_ClientGetStopMessage`` which must contain 259 the key and the 64-bit unique ID of the original request. Using an 260 explicit \"stop\" message is more common as this allows a client to run 261 many concurrent GET operations over the same connection with the DHT 262 service --- and to stop them individually. 263 264 .. _Monitoring-DHT: 265 266 Monitoring the DHT 267 ^^^^^^^^^^^^^^^^^^ 268 269 To begin monitoring, the client sends a 270 ``struct GNUNET_DHT_MonitorStartStop`` message to the DHT service. In 271 this message, flags can be set to enable (or disable) monitoring of GET, 272 PUT and RESULT messages that pass through a peer. The message can also 273 restrict monitoring to a particular block type or a particular key. Once 274 monitoring is enabled, the DHT service will notify the client about any 275 matching event using ``struct GNUNET_DHT_MonitorGetMessage``\ s for GET 276 events, ``struct GNUNET_DHT_MonitorPutMessage`` for PUT events and 277 ``struct GNUNET_DHT_MonitorGetRespMessage`` for RESULTs. Each of these 278 messages contains all of the information about the event. 279 280 .. _The-DHT-Peer_002dto_002dPeer-Protocol: 281 282 The DHT Peer-to-Peer Protocol 283 ----------------------------- 284 285 .. _Routing-GETs-or-PUTs: 286 287 Routing GETs or PUTs 288 ^^^^^^^^^^^^^^^^^^^^ 289 290 When routing GETs or PUTs, the DHT service selects a suitable subset of 291 neighbours for forwarding. The exact number of neighbours can be zero or 292 more and depends on the hop counter of the query (initially zero) in 293 relation to the (log of) the network size estimate, the desired 294 replication level and the peer's connectivity. Depending on the hop 295 counter and our network size estimate, the selection of the peers maybe 296 randomized or by proximity to the key. Furthermore, requests include a 297 set of peers that a request has already traversed; those peers are also 298 excluded from the selection. 299 300 .. _PUTting-data-into-the-DHT2: 301 302 PUTting data into the DHT 303 ^^^^^^^^^^^^^^^^^^^^^^^^^ 304 305 To PUT data into the DHT, the service sends a ``struct PeerPutMessage`` 306 of type ``GNUNET_MESSAGE_TYPE_DHT_P2P_PUT`` to the respective neighbour. 307 In addition to the usual information about the content (type, routing 308 options, desired replication level for the content, expiration time, key 309 and value), the message contains a fixed-size Bloom filter with 310 information about which peers (may) have already seen this request. This 311 Bloom filter is used to ensure that DHT messages never loop back to a 312 peer that has already processed the request. Additionally, the message 313 includes the current hop counter and, depending on the routing options, 314 the message may include the full path that the message has taken so far. 315 The Bloom filter should already contain the identity of the previous 316 hop; however, the path should not include the identity of the previous 317 hop and the receiver should append the identity of the sender to the 318 path, not its own identity (this is done to reduce bandwidth). 319 320 .. _GETting-data-from-the-DHT2: 321 322 GETting data from the DHT 323 ^^^^^^^^^^^^^^^^^^^^^^^^^ 324 325 A peer can search the DHT by sending ``struct PeerGetMessage``\ s of 326 type ``GNUNET_MESSAGE_TYPE_DHT_P2P_GET`` to other peers. In addition to 327 the usual information about the request (type, routing options, desired 328 replication level for the request, the key and the extended query), a 329 GET request also contains a hop counter, a Bloom filter over the peers 330 that have processed the request already and depending on the routing 331 options the full path traversed by the GET. Finally, a GET request 332 includes a variable-size second Bloom filter and a so-called Bloom 333 filter mutator value which together indicate which replies the sender 334 has already seen. During the lookup, each block that matches they block 335 type, key and extended query is additionally subjected to a test against 336 this Bloom filter. The block plugin is expected to take the hash of the 337 block and combine it with the mutator value and check if the result is 338 not yet in the Bloom filter. The originator of the query will from time 339 to time modify the mutator to (eventually) allow false-positives 340 filtered by the Bloom filter to be returned. 341 342 Peers that receive a GET request perform a local lookup (depending on 343 their proximity to the key and the query options) and forward the 344 request to other peers. They then remember the request (including the 345 Bloom filter for blocking duplicate results) and when they obtain a 346 matching, non-filtered response a ``struct PeerResultMessage`` of type 347 ``GNUNET_MESSAGE_TYPE_DHT_P2P_RESULT`` is forwarded to the previous hop. 348 Whenever a result is forwarded, the block plugin is used to update the 349 Bloom filter accordingly, to ensure that the same result is never 350 forwarded more than once. The DHT service may also cache forwarded 351 results locally if the \"CACHE_RESULTS\" option is set to \"YES\" in the 352 configuration.