Subsystems
**********

This section consists of brief descriptions of the subsystems that make
up GNUnet. The image below gives an overview of the dependencies and
interactions between subsystems.

.. image:: /images/gnunet-arch-full.svg

CADET - Decentralized End-to-end Transport
==========================================

The Confidential Ad-hoc Decentralized End-to-end Transport (CADET)
subsystem in GNUnet is responsible for secure end-to-end communication
between nodes in the GNUnet overlay network. CADET builds on the CORE
subsystem, which provides link-layer communication, by adding routing,
forwarding, and additional security to the connections. CADET offers the
same cryptographic services as CORE, but on an end-to-end level. This is
done so that peers retransmitting traffic on behalf of other peers
cannot access the payload data.

- CADET provides confidentiality with so-called perfect forward
  secrecy; we use ECDHE powered by Curve25519 for the key exchange and
  then use symmetric encryption, encrypting with both AES-256 and
  Twofish

- authentication is achieved by signing the ephemeral keys using
  Ed25519, a deterministic variant of ECDSA

- integrity protection (using SHA-512 to do encrypt-then-MAC, although
  only 256 bits are sent to reduce overhead)

- replay protection (using nonces, timestamps, challenge-response,
  message counters and ephemeral keys)

- liveness (keep-alive messages, timeout)

In addition to the CORE-like security benefits, CADET offers other
properties that make it a more universal service than CORE.

- CADET can establish channels to arbitrary peers in GNUnet. If a peer
  is not immediately reachable, CADET will find a path through the
  network and ask other peers to retransmit the traffic on its behalf.

- CADET offers (optional) reliability mechanisms.
  In a reliable channel, traffic is guaranteed to arrive complete,
  unchanged and in-order.

- CADET takes care of flow and congestion control, not allowing the
  sender to send more traffic than the receiver or the network are able
  to process.

.. _CORE-Subsystem:

.. index::
   double: CORE; subsystem

CORE - GNUnet link layer
========================

The CORE subsystem in GNUnet is responsible for securing link-layer
communications between nodes in the GNUnet overlay network. CORE builds
on the TRANSPORT subsystem, which provides the actual, insecure,
unreliable link-layer communication (for example, via UDP or WLAN), and
then adds fundamental security to the connections:

- confidentiality with so-called perfect forward secrecy; we use
  `ECDHE <http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman>`__
  powered by `Curve25519 <http://cr.yp.to/ecdh.html>`__ for the key
  exchange and then use symmetric encryption, encrypting with both
  `AES-256 <http://en.wikipedia.org/wiki/Rijndael>`__ and
  `Twofish <http://en.wikipedia.org/wiki/Twofish>`__

- `authentication <http://en.wikipedia.org/wiki/Authentication>`__ is
  achieved by signing the ephemeral keys using
  `Ed25519 <http://ed25519.cr.yp.to/>`__, a deterministic variant of
  `ECDSA <http://en.wikipedia.org/wiki/ECDSA>`__

- integrity protection (using
  `SHA-512 <http://en.wikipedia.org/wiki/SHA-2>`__ to do
  `encrypt-then-MAC <http://en.wikipedia.org/wiki/Authenticated_encryption>`__)

- `replay <http://en.wikipedia.org/wiki/Replay_attack>`__ protection
  (using nonces, timestamps, challenge-response, message counters and
  ephemeral keys)

- liveness (keep-alive messages, timeout)

.. _Limitations:

:index:`Limitations <CORE; limitations>`
Limitations
-----------

CORE does not perform
`routing <http://en.wikipedia.org/wiki/Routing>`__; using CORE it is
only possible to communicate with peers that happen to already be
"directly" connected with each other. CORE also does not have an API
to allow applications to establish such "direct" connections --- for
this, applications can ask TRANSPORT, but TRANSPORT might not be able to
establish a "direct" connection. The TOPOLOGY subsystem is responsible
for trying to keep a few "direct" connections open at all times.
Applications that need to talk to particular peers should use the CADET
subsystem, as it can establish arbitrary "indirect" connections.

Because CORE does not perform routing, CORE must only be used directly
by applications that either perform their own routing logic (such as
anonymous file-sharing) or that do not require routing, for example
because they are based on flooding the network. CORE communication is
unreliable and delivery is possibly out-of-order. Applications that
require reliable communication should use the CADET service. Each
application can only queue one message per target peer with the CORE
service at any time; messages cannot be larger than approximately 63
kilobytes. If messages are small, CORE may group multiple messages
(possibly from different applications) prior to encryption. If permitted
by the application (using the `cork <http://baus.net/on-tcp_cork/>`__
option), CORE may delay transmissions to facilitate grouping of multiple
small messages. If cork is not enabled, CORE will transmit the message
as soon as TRANSPORT allows it (TRANSPORT is responsible for limiting
bandwidth and congestion control). CORE does not provide flow control;
applications are expected to process messages at line-speed.
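The queueing constraints above (one pending message per target peer,
roughly 63 kilobytes maximum) can be sketched as follows. This is an
illustrative model with hypothetical names, not the actual CORE
implementation:

```c
#include <stddef.h>

/* Illustrative sketch only -- not the real CORE API.  CORE accepts at
 * most one queued message per target peer, and messages must stay
 * below roughly 63 kilobytes. */
#define CORE_MAX_MESSAGE_SIZE (63 * 1024)

struct PeerQueue
{
  int has_pending;      /* is a message already queued for this peer? */
  size_t pending_size;  /* size of the queued message */
};

/* Try to queue a message for one peer; returns 1 on success, 0 if the
 * slot is already taken or the message is too large. */
int
core_try_queue (struct PeerQueue *q, size_t msg_size)
{
  if (msg_size > CORE_MAX_MESSAGE_SIZE)
    return 0;           /* larger than ~63k: rejected */
  if (q->has_pending)
    return 0;           /* only one message per peer at a time */
  q->has_pending = 1;
  q->pending_size = msg_size;
  return 1;
}

/* Called once TRANSPORT has taken the message, freeing the slot. */
void
core_queue_drained (struct PeerQueue *q)
{
  q->has_pending = 0;
  q->pending_size = 0;
}
```

In practice this means an application must wait for a
transmission-ready notification before handing CORE the next message
for the same peer.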
If flow control is needed, applications should use the CADET service.

.. when is a peer connected

.. _When-is-a-peer-_0022connected_0022_003f:

When is a peer "connected"?
---------------------------

In addition to the security features mentioned above, CORE also provides
one additional key feature to applications using it: a limited form of
protocol-compatibility checking. CORE distinguishes between
TRANSPORT-level connections (which enable communication with other
peers) and application-level connections. Applications using the CORE
API will (typically) learn about application-level connections from
CORE, and not about TRANSPORT-level connections. When a typical
application uses CORE, it specifies a set of message types (from
``gnunet_protocols.h``) that it understands. CORE will then notify the
application about connections it has with other peers if and only if
those applications registered an intersecting set of message types with
their CORE service. Thus, it is quite possible that CORE only exposes a
subset of the established direct connections to a particular application
--- and different applications running above CORE might see different
sets of connections at the same time.

A special case are applications that do not register a handler for any
message type. CORE assumes that these applications merely want to
monitor connections (or "all" messages via other callbacks) and will
notify them about all connections. This is used, for example, by the
``gnunet-core`` command-line tool to display the active connections.
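The message-type matching described above boils down to a set
intersection, with an empty local set treated as "monitor everything".
A minimal sketch with a hypothetical function name, not the real CORE
code:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch (not the real CORE logic): CORE reports a connection to an
 * application only if the remote peer registered at least one common
 * message type -- or if the local application registered none at all,
 * in which case it is treated as a monitor and sees every connection. */
int
core_should_notify (const uint16_t *local_types, size_t n_local,
                    const uint16_t *remote_types, size_t n_remote)
{
  if (0 == n_local)
    return 1;                    /* no handlers: monitoring application */
  for (size_t i = 0; i < n_local; i++)
    for (size_t j = 0; j < n_remote; j++)
      if (local_types[i] == remote_types[j])
        return 1;                /* intersecting message type found */
  return 0;                      /* disjoint sets: connection hidden */
}
```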
Note that it is also possible that the TRANSPORT service has more active
connections than the CORE service, as the CORE service first has to
perform a key exchange with connecting peers before exchanging
information about supported message types and notifying applications
about the new connection.

.. _Distributed-Hash-Table-_0028DHT_0029:

.. index::
   double: Distributed hash table; subsystem
   see: DHT; Distributed hash table

DHT - Distributed Hash Table
============================

GNUnet includes a generic distributed hash table that can be used by
developers building P2P applications in the framework. This section
documents the high-level features and how developers are expected to use
the DHT. We have a research paper detailing how the DHT works, and
Nate's thesis includes a detailed description and performance analysis
(in chapter 6). [R5N2011]_ [EVANS2011]_

Key features of GNUnet's DHT include:

- stores key-value pairs with values up to (approximately) 63k in size

- works with many underlay network topologies (small-world, random
  graph); the underlay does not need to be a full mesh / clique

- support for extended queries (more than just a simple 'key'),
  filtering of duplicate replies within the network (bloomfilter) and
  content validation (for details, please read the subsection on the
  block library)

- can (optionally) return the paths taken by PUT and GET operations to
  the application

- provides content replication to handle churn

GNUnet's DHT is randomized and unreliable. Unreliable means that there
is no strict guarantee that a value stored in the DHT is always found
--- values are only found with high probability. While this is somewhat
true in all P2P DHTs, GNUnet developers should be particularly wary of
this fact (this will help you write secure, fault-tolerant code).
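One way to internalize this: treat every DHT lookup as an operation
that may fail or stall, and bound your retries before falling back.
A schematic sketch with hypothetical names, not the actual DHT API:

```c
#include <stddef.h>

/* Schematic sketch (hypothetical names, not the real DHT API): wrap a
 * lookup that may fail, retry a bounded number of times, and force the
 * caller to handle the "not found" case explicitly. */
typedef int (*lookup_fn) (const char *key, void *result);

int
dht_get_with_retries (lookup_fn lookup, const char *key,
                      void *result, unsigned int max_attempts)
{
  for (unsigned int attempt = 0; attempt < max_attempts; attempt++)
    if (lookup (key, result))
      return 1;                  /* value found */
  return 0;                      /* caller must tolerate "not found" */
}

/* Demo lookup for illustration only: fails twice, then "finds" the
 * value, mimicking a value that is only found with high probability. */
int demo_calls;

int
demo_flaky_lookup (const char *key, void *result)
{
  (void) key;
  (void) result;
  return ++demo_calls >= 3;
}
```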
Thus, when writing any application using the DHT, you should always
consider the possibility that a value stored in the DHT by you or some
other peer might simply not be returned, or only be returned with a
significant delay. Your application logic must be written to tolerate
this (naturally, some loss of performance or quality of service is
expected in this case).

.. _Block-library-and-plugins:

Block library and plugins
-------------------------

.. _What-is-a-Block_003f:

What is a Block?
^^^^^^^^^^^^^^^^

Blocks are small (< 63k) pieces of data stored under a key (a ``struct
GNUNET_HashCode``). Blocks have a type (``enum GNUNET_BlockType``) which
defines their data format. Blocks are used in GNUnet as units of static
data exchanged between peers and stored (or cached) locally. Uses of
blocks include file-sharing (the files are broken up into blocks), the
VPN (DNS information is stored in blocks) and the DHT (all information
in the DHT and meta-information for the maintenance of the DHT are both
stored using blocks). The block subsystem provides a few common
functions that must be available for any type of block.


.. [R5N2011] https://bib.gnunet.org/date.html#R5N
.. [EVANS2011] https://d-nb.info/1015129951

.. index::
   double: File sharing; subsystem
   see: FS; File sharing

.. _File_002dsharing-_0028FS_0029-Subsystem:

FS — File sharing over GNUnet
=============================

This chapter describes the details of how the file-sharing service
works. As with all services, it is split into an API (libgnunetfs), the
service process (gnunet-service-fs) and user interface(s). The
file-sharing service uses the datastore service to store blocks and the
DHT (and indirectly the datacache) for lookups for non-anonymous
file-sharing.
Furthermore, the file-sharing service uses the block library (and the
block FS plugin) for validation of DHT operations.

In contrast to many other services, libgnunetfs is rather complex since
the client library includes a large number of high-level abstractions;
this is necessary since the FS service itself largely only operates on
the block level. The FS library is responsible for providing a
file-based abstraction to applications, including directories, meta
data, keyword search, verification, and so on.

The method used by GNUnet to break large files into blocks and to use
keyword search is called the "Encoding for Censorship-Resistant
Sharing" (ECRS). ECRS is largely implemented in the FS library; block
validation is also reflected in the block FS plugin and the FS service.
ECRS on-demand encoding is implemented in the FS service.

.. note:: The documentation in this chapter is quite incomplete.

.. _Encoding-for-Censorship_002dResistant-Sharing-_0028ECRS_0029:

.. index::
   see: Encoding for Censorship-Resistant Sharing; ECRS

:index:`ECRS — Encoding for Censorship-Resistant Sharing <single: ECRS>`
ECRS — Encoding for Censorship-Resistant Sharing
------------------------------------------------

When GNUnet shares files, it uses a content encoding called ECRS, the
Encoding for Censorship-Resistant Sharing. Most of ECRS is described in
the (so far unpublished) research paper attached to this page. ECRS
obsoletes the previous ESED and ESED II encodings, which were used in
GNUnet before version 0.7.0. The rest of this page assumes that the
reader is familiar with the attached paper. What follows is a
description of some minor extensions that GNUnet makes over what is
described in the paper.
The reason why these extensions are not in the paper is that we felt
they were obvious or trivial extensions to the original scheme and thus
did not warrant space in the research report.

.. todo:: Find missing link to file system paper.

.. index::
   double: GNU Name System; subsystem
   see: GNS; GNU Name System

.. _GNU-Name-System-_0028GNS_0029:

GNS - the GNU Name System
-------------------------

The GNU Name System (GNS) is a decentralized database that enables users
to securely resolve names to values. Names can be used to identify other
users (for example, in social networking) or network services (for
example, VPN services running at a peer in GNUnet, or purely IP-based
services on the Internet). Users interact with GNS by typing in a
hostname that ends in a top-level domain that is configured in the
"GNS" section, matches an identity of the user, or ends in a
Base32-encoded public key.

Videos giving an overview of most of GNS and the motivations behind it
are available here and here. The remainder of this chapter targets
developers who are familiar with the high-level concepts of GNS as
presented in these talks.

.. todo:: Link to videos and GNS talks?

GNS-aware applications should use the GNS resolver to obtain the
records that are stored under a name in GNS. Each record consists of a
type, value, expiration time and flags.

The type specifies the format of the value. Types below 65536 correspond
to DNS record types; larger values are used for GNS-specific records.
Applications can define new GNS record types by reserving a number and
implementing a plugin (which mostly needs to convert the binary value
representation to a human-readable text format and vice versa). The
expiration time specifies how long the record is to be valid.
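The record fields described above could be modeled roughly as follows.
The struct and the helper are illustrative, not the actual
libgnunetgns definitions:

```c
#include <stdint.h>
#include <stddef.h>

/* Rough model of a GNS record for illustration -- not the actual
 * libgnunetgns definitions.  Types below 65536 map to DNS record
 * types; larger values are GNS-specific. */
struct GnsRecordSketch
{
  uint32_t record_type;     /* format of the value */
  uint64_t expiration_time; /* absolute expiration timestamp */
  const void *value;
  size_t value_size;
};

/* Keep only non-expired records, the way a resolver client might.
 * Returns the new count; surviving records are compacted to the
 * front of the array. */
size_t
filter_expired (struct GnsRecordSketch *records, size_t n, uint64_t now)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (records[i].expiration_time > now)
      records[kept++] = records[i];
  return kept;
}
```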
The GNS API ensures that applications are only given non-expired values.
The flags are typically irrelevant for applications, as GNS uses them
internally to control visibility and validity of records.

Records are stored along with a signature. The signature is generated
using the private key of the authoritative zone. This allows any GNS
resolver to verify the correctness of a name-value mapping.

Internally, GNS uses the NAMECACHE to cache information obtained from
other users, the NAMESTORE to store information specific to the local
users, and the DHT to exchange data between users. A plugin API is used
to enable applications to define new GNS record types.

.. index::
   single: GNS; name cache
   double: subsystem; NAMECACHE

.. _GNS-Namecache:

NAMECACHE — DHT caching of GNS results
======================================

The NAMECACHE subsystem is responsible for caching (encrypted)
resolution results of the GNU Name System (GNS). GNS makes zone
information available to other users via the DHT. However, as accessing
the DHT for every lookup is expensive (and as the DHT's local cache is
lost whenever the peer is restarted), GNS uses the NAMECACHE as a more
persistent cache for DHT lookups. Thus, instead of always looking up
every name in the DHT, GNS first checks if the result is already
available locally in the NAMECACHE. Only if there is no result in the
NAMECACHE does GNS query the DHT. The NAMECACHE stores data in the same
(encrypted) format as the DHT. It thus makes no sense to iterate over
all items in the NAMECACHE --- the NAMECACHE does not have a way to
provide the keys required to decrypt the entries.

Blocks in the NAMECACHE share the same expiration mechanism as blocks in
the DHT --- a block expires whenever any of the records in the
(encrypted) block expires.
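In other words, the block expiration is the minimum of the record
expirations. A sketch with hypothetical types, not the real block
layout:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: a block expires as soon as any record in it expires, so the
 * block expiration is the minimum of the record expirations
 * (hypothetical representation, not the real block layout). */
uint64_t
block_expiration (const uint64_t *record_expirations, size_t n)
{
  uint64_t earliest = UINT64_MAX;
  for (size_t i = 0; i < n; i++)
    if (record_expirations[i] < earliest)
      earliest = record_expirations[i];
  return earliest;   /* UINT64_MAX if the block holds no records */
}
```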
The expiration time of the block is the only information stored in
plaintext. The NAMECACHE service internally performs all of the work
required to expire blocks; clients do not have to worry about this.
Also, given that the NAMECACHE stores only GNS blocks that local users
requested, there is no configuration option to limit the size of the
NAMECACHE. It is assumed to always be small enough (a few MB) to fit on
the drive.

The NAMECACHE supports the use of different database backends via a
plugin API.

.. index::
   double: subsystem; NAMESTORE

.. _NAMESTORE-Subsystem:

NAMESTORE — Storage of local GNS zones
======================================

The NAMESTORE subsystem provides persistent storage for local GNS zone
information. All local GNS zone information is managed by NAMESTORE. It
provides the functionality both to administer local GNS information
(e.g. delete and add records) and to retrieve GNS information (e.g. to
list name information in a client). NAMESTORE only manages the
persistent storage of zone information belonging to the user running
the service: GNS information from other users obtained from the DHT is
stored by the NAMECACHE subsystem.

NAMESTORE uses a plugin-based database backend to store GNS information
with good performance; sqlite and PostgreSQL are the supported database
backends. NAMESTORE clients interact with the IDENTITY subsystem to
obtain cryptographic information about zones based on egos, as
described with the IDENTITY subsystem, but internally NAMESTORE refers
to zones using the respective private key.

NAMESTORE is queried and monitored by the ZONEMASTER service, which
periodically publishes public records of GNS zones.
ZONEMASTER also collaborates with the NAMECACHE subsystem and stores
zone information in the NAMECACHE cache when local information is
modified, to increase look-up performance for local information and to
enable local access to private records in zones through GNS.

NAMESTORE provides functionality to look up and store records, to
iterate over a specific zone or all zones, and to monitor zones for
changes. NAMESTORE functionality can be accessed using the NAMESTORE C
API, the NAMESTORE REST API, or the NAMESTORE command line tool.

.. index::
   double: HOSTLIST; subsystem

.. _HOSTLIST-Subsystem:

HOSTLIST — HELLO bootstrapping and gossip
=========================================

Peers in the GNUnet overlay network need address information so that
they can connect with other peers. GNUnet uses so-called HELLO messages
to store and exchange peer addresses. GNUnet provides several methods
for peers to obtain this information:

- out-of-band exchange of HELLO messages (manually, using for example
  gnunet-core)

- HELLO messages shipped with GNUnet (automatic with distribution)

- UDP neighbor discovery in the LAN (IPv4 broadcast, IPv6 multicast)

- topology gossiping (learning from other peers we are already
  connected to), and

- the HOSTLIST daemon covered in this section, which is particularly
  relevant for bootstrapping new peers.

New peers have no existing connections (and thus cannot learn from
gossip among peers), may not have other peers in their LAN and might be
started with an outdated set of HELLO messages from the distribution. In
this case, getting new peers to connect to the network requires either
manual effort or the use of a HOSTLIST to obtain HELLOs.

.. _HELLOs:

HELLOs
------

The basic information peers require to connect to other peers is
contained in so-called HELLO messages, which you can think of as
business cards. Besides the identity of the peer (based on the
cryptographic public key), a HELLO message may contain address
information that specifies ways to contact the peer. By obtaining HELLO
messages, a peer can learn how to contact other peers.

.. _Overview-for-the-HOSTLIST-subsystem:

Overview for the HOSTLIST subsystem
-----------------------------------

The HOSTLIST subsystem provides a way to distribute and obtain contact
information needed to connect to other peers using a simple HTTP GET
request. Its implementation is split into three parts: the main file
for the daemon itself (``gnunet-daemon-hostlist.c``), the HTTP client
used to download peer information (``hostlist-client.c``) and the
server component used to provide this information to other peers
(``hostlist-server.c``). The server is basically a small HTTP web
server (based on GNU libmicrohttpd) which provides a list of HELLOs
known to the local peer for download. The client component is basically
an HTTP client (based on libcurl) which can download hostlists from one
or more websites. The hostlist format is a binary blob containing a
sequence of HELLO messages. Note that while any HTTP server can
theoretically serve a hostlist, the built-in hostlist server simply
makes it convenient to offer this service.

.. _Features:

Features
^^^^^^^^

The HOSTLIST daemon can:

- provide HELLO messages with validated addresses obtained from
  PEERINFO for download by other peers

- download HELLO messages and forward these messages to the TRANSPORT
  subsystem for validation

- advertise the URL of this peer's hostlist to other peers via gossip

- automatically learn about hostlist servers from the gossip of other
  peers

.. _HOSTLIST-_002d-Limitations:

HOSTLIST - Limitations
^^^^^^^^^^^^^^^^^^^^^^

The HOSTLIST daemon does not:

- verify the cryptographic information in the HELLO messages

- verify the address information in the HELLO messages

.. _Interacting-with-the-HOSTLIST-daemon:

Interacting with the HOSTLIST daemon
------------------------------------

The HOSTLIST subsystem is currently implemented as a daemon, so there is
no need for the user to interact with it; therefore there is no command
line tool and no API to communicate with the daemon. In the future, we
can envision changing this to allow users to manually trigger the
download of a hostlist.

Since there is no command line interface to interact with HOSTLIST, the
only way to interact with the hostlist is to use STATISTICS to obtain or
modify information about the status of HOSTLIST:

::

   $ gnunet-statistics -s hostlist

In particular, HOSTLIST includes a **persistent** value in statistics
that specifies when the hostlist server might be queried next. As this
value increases exponentially during runtime, developers may want to
reset or manually adjust it. Note that HOSTLIST (but not STATISTICS)
needs to be shut down if changes to this value are to have any effect on
the daemon (as HOSTLIST does not monitor STATISTICS for changes to the
download frequency).

.. _Hostlist-security-address-validation:

Hostlist security: address validation
-------------------------------------

Since information obtained from other parties cannot be trusted without
validation, we have to distinguish between *validated* and *not
validated* addresses. Before using (and thus trusting) information from
other parties, this information has to be double-checked (validated).
Address validation is not done by HOSTLIST but by the TRANSPORT service.

The HOSTLIST component is functionally located between the PEERINFO and
the TRANSPORT subsystems. When acting as a server, the daemon obtains
valid (*validated*) peer information (HELLO messages) from the PEERINFO
service and provides it to other peers. When acting as a client, it
contacts the HOSTLIST servers specified in the configuration, downloads
the (unvalidated) list of HELLO messages and forwards this information
to the TRANSPORT service to validate the addresses.

.. _The-HOSTLIST-daemon:

:index:`The HOSTLIST daemon <double: daemon; HOSTLIST>`
The HOSTLIST daemon
-------------------

The hostlist daemon is the main component of the HOSTLIST subsystem. It
is started by the ARM service and (if configured) starts the HOSTLIST
client and server components.

``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT``
   If the daemon provides a hostlist itself, it can advertise its own
   hostlist to other peers. To do so, it sends a
   ``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT`` message to other
   peers when they connect to this peer on the CORE level. This
   hostlist advertisement message contains the URL to access the
   HOSTLIST HTTP server of the sender. The daemon may also subscribe to
   this type of message from the CORE service, and then forward these
   messages to the HOSTLIST client. The client then uses all available
   URLs to download peer information when necessary.
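The advertisement message essentially wraps a 0-terminated URL behind
the usual 4-byte GNUnet message header (size and type, in network byte
order). A parsing sketch that mimics this layout with an illustrative
type value; this is not the real HOSTLIST parser:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Illustrative message type value -- check gnunet_protocols.h for the
 * real GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT constant. */
#define SKETCH_MSG_TYPE_HOSTLIST_ADVERTISEMENT 160

struct MsgHeaderSketch
{
  uint16_t size;   /* total message size, big-endian on the wire */
  uint16_t type;   /* message type, big-endian on the wire */
};

/* Returns a pointer to the 0-terminated URL inside the message, or
 * NULL if the message is malformed. */
const char *
extract_hostlist_url (const uint8_t *msg, size_t len)
{
  if (len < sizeof (struct MsgHeaderSketch) + 1)
    return NULL;                 /* too short for header plus URL */
  uint16_t size = (uint16_t) ((msg[0] << 8) | msg[1]);
  uint16_t type = (uint16_t) ((msg[2] << 8) | msg[3]);
  if (size != len || type != SKETCH_MSG_TYPE_HOSTLIST_ADVERTISEMENT)
    return NULL;                 /* size mismatch or wrong type */
  if ('\0' != msg[len - 1])
    return NULL;                 /* URL must be 0-terminated */
  return (const char *) msg + sizeof (struct MsgHeaderSketch);
}
```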
When starting, the HOSTLIST daemon first connects to the CORE subsystem
and, if hostlist learning is enabled, registers a CORE handler to
receive this kind of message. Next it starts (if configured) the client
and server. It passes pointers to CORE connect, disconnect and receive
handlers where the client and server store their functions, so the
daemon can notify them about CORE events.

To clean up on shutdown, the daemon has a cleaning task that shuts down
all subsystems and disconnects from CORE.

.. _The-HOSTLIST-server:

:index:`The HOSTLIST server <single: HOSTLIST; server>`
The HOSTLIST server
-------------------

The server provides a way for other peers to obtain HELLOs. Basically it
is a small web server other peers can connect to and download a list of
HELLOs using standard HTTP; it may also advertise the URL of the
hostlist to other peers connecting on the CORE level.

.. _The-HTTP-Server:

The HTTP Server
^^^^^^^^^^^^^^^

During startup, the server starts a web server listening on the port
specified with the HTTPPORT value (default 8080). In addition it
connects to the PEERINFO service to obtain peer information. The
HOSTLIST server uses the GNUNET_PEERINFO_iterate function to request
HELLO information for all peers and adds their information to a new
hostlist if they are suitable (expired addresses and HELLOs without
addresses are both not suitable) and the maximum size for a hostlist is
not exceeded (MAX_BYTES_PER_HOSTLISTS = 500000). When PEERINFO finishes
(with a last NULL callback), the server destroys the previous hostlist
response available for download on the web server and replaces it with
the updated hostlist. The hostlist format is basically a sequence of
HELLO messages (as obtained from PEERINFO) without any special
tokenization.
Since each HELLO message contains a size field, the response can easily
be split into separate HELLO messages by the client.

A HOSTLIST client connecting to the HOSTLIST server will receive the
hostlist as an HTTP response and the server will terminate the
connection with the result code ``HTTP 200 OK``. The connection will be
closed immediately if no hostlist is available.

.. _Advertising-the-URL:

Advertising the URL
^^^^^^^^^^^^^^^^^^^

The server also advertises the URL to download the hostlist to other
peers if hostlist advertisement is enabled. When a new peer connects and
has hostlist learning enabled, the server sends a
``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT`` message to this peer
using the CORE service.

.. _The-HOSTLIST-client:

The HOSTLIST client
-------------------

The client provides the functionality to download the list of HELLOs
from a set of URLs. It performs a standard HTTP request to the URLs
configured and learned from advertisement messages received from other
peers. When a HELLO is downloaded, the HOSTLIST client forwards it to
the TRANSPORT service for validation.

The client supports two modes of operation:

- download of HELLOs (bootstrapping)

- learning of URLs

.. _Bootstrapping:

Bootstrapping
^^^^^^^^^^^^^

For bootstrapping, the client schedules a task to download the hostlist
from the set of known URLs. The downloads are only performed if the
number of current connections is smaller than a minimum number of
connections (at the moment 4). The interval between downloads increases
exponentially; however, the exponential growth is limited once the
interval becomes longer than an hour. At that point, it is capped at
(number of connections \* 1h).
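The interval logic above can be sketched as follows; this is a
hypothetical helper capturing just the doubling and the cap, not the
actual scheduler code:

```c
#include <stdint.h>

/* Sketch of the bootstrapping retry interval: it doubles after every
 * download, but once it grows beyond one hour it is capped at
 * (number of current connections) * 1 hour.  Times are in seconds;
 * hypothetical helper, not the actual HOSTLIST scheduler. */
#define HOUR_S 3600

uint64_t
next_download_interval (uint64_t current_interval,
                        unsigned int connections)
{
  uint64_t next = current_interval * 2;           /* exponential growth */
  uint64_t cap = (uint64_t) connections * HOUR_S; /* connections * 1h */
  if (next > HOUR_S && next > cap)
    return cap;                                   /* growth limited */
  return next;
}
```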
Once the decision has been taken to download HELLOs, the daemon chooses
a random URL from the list of known URLs. URLs can be configured in the
configuration or learned from advertisement messages. The client uses
an HTTP client library (libcurl) to initiate the download using the
libcurl multi interface. Libcurl passes the data to the
callback_download function, which stores the data in a buffer if space
is available and the maximum size for a hostlist download is not
exceeded (MAX_BYTES_PER_HOSTLISTS = 500000). When a full HELLO has been
downloaded, the HOSTLIST client offers this HELLO message to the
TRANSPORT service for validation. When the download finishes or fails,
statistical information about the quality of this URL is updated.

.. _Learning:

:index:`Learning <single: HOSTLIST; learning>`
Learning
^^^^^^^^

The client also manages hostlist advertisements from other peers. The
HOSTLIST daemon forwards ``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT``
messages to the client subsystem, which extracts the URL from the
message. Next, the newly obtained URL is tested by triggering a
download from it. If the URL works correctly, it is added to the list
of working URLs.

The size of the list of URLs is restricted, so if an additional server
is added and the list is full, the URL with the worst quality ranking
(determined, e.g., through successful downloads and the number of
HELLOs) is discarded. During shutdown the list of URLs is saved to a
file for persistence and loaded on startup. URLs from the configuration
file are never discarded.

.. _Usage:

Usage
-----

To start HOSTLIST by default, it has to be added to the DEFAULTSERVICES
section for the ARM services. This is done in the default configuration.
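A configuration sketch for enabling the HOSTLIST daemon follows. The
option names reflect typical GNUnet configurations but should be
verified against the installation handbook, and the server URL is a
placeholder:

```ini
[hostlist]
# -b: download hostlists at startup (bootstrapping)
# -e: enable learning of hostlist URLs from advertisements
OPTIONS = -b -e
# Placeholder URL; use the hostlist server(s) for your deployment.
SERVERS = https://example.org/hostlist
```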

For more information on how to configure the HOSTLIST subsystem, see the
installation handbook sections on configuring the hostlist to bootstrap
and configuring your peer to provide a hostlist.

.. index::
   double: IDENTITY; subsystem

.. _IDENTITY-Subsystem:

IDENTITY — Ego management
=========================

Identities of \"users\" in GNUnet are called egos. Egos can be used as
pseudonyms (\"fake names\") or be tied to an organization (for example,
\"GNU\") or even the actual identity of a human. GNUnet users are
expected to have many egos. They might have one tied to their real
identity, some for organizations they manage, and more for different
domains where they want to operate under a pseudonym.

The IDENTITY service allows users to manage their egos. The identity
service manages the private keys of the local user's egos; it does not
manage identities of other users (public keys). Public keys for other
users need names to become manageable. GNUnet uses the GNU Name System
(GNS) to give names to other users and manage their public keys
securely. This chapter is about the IDENTITY service, which is about the
management of private keys.

On the network, an ego corresponds to an ECDSA key (over Curve25519,
using RFC 6979, as required by GNS). Thus, users can perform actions
under a particular ego by using (signing with) a particular private key.
Other users can then confirm that the action was really performed by
that ego by checking the signature against the respective public key.

The IDENTITY service allows users to associate a human-readable name
with each ego. This way, users can use names that will remind them of
the purpose of a particular ego. The IDENTITY service will store the
respective private keys and allows applications to access key
information by name. Users can change the name that is locally (!)
associated with an ego. Egos can also be deleted, which means that the
private key will be removed and it thus will not be possible to perform
actions with that ego in the future.

Additionally, the IDENTITY subsystem can associate service functions
with egos. For example, GNS requires the ego that should be used for the
shorten zone. GNS will ask IDENTITY for an ego for the \"gns-short\"
service. The IDENTITY service has a mapping of such service strings to
the name of the ego that the user wants to use for this service, for
example \"my-short-zone-ego\".

Finally, the IDENTITY API provides access to a special ego, the
anonymous ego. The anonymous ego is special in that its private key is
not really private, but fixed and known to everyone. Thus, anyone can
perform actions as anonymous. This can be useful as, with this trick,
code does not have to contain a special case to distinguish between
anonymous and pseudonymous egos.

.. index::
   double: subsystem; MESSENGER

.. _MESSENGER-Subsystem:

MESSENGER — Room-based end-to-end messaging
===========================================

The MESSENGER subsystem is responsible for secure end-to-end
communication in groups of nodes in the GNUnet overlay network.
MESSENGER builds on the CADET subsystem, which provides reliable and
secure end-to-end communication between the nodes inside of these
groups.

In addition to the CADET security benefits, MESSENGER provides the
following properties designed for application-level usage:

- MESSENGER provides integrity by signing the messages with the user's
  provided ego

- MESSENGER adds (optional) forward secrecy by replacing the key pair of
  the used ego and signing the propagation of the new one with the old
  one (chaining egos)

- MESSENGER provides verification of the original sender by checking
  against all used egos of a member which are currently in active use
  (active use depends on the state of a member session)

- MESSENGER offers (optional) decentralized message forwarding between
  all nodes in a group to improve availability and prevent MITM attacks

- MESSENGER handles new connections and disconnections of nodes in the
  group by reconnecting them, preserving an efficient structure for
  message distribution (ensuring availability and accountability)

- MESSENGER provides replay protection (messages can be uniquely
  identified via SHA-512 and include a timestamp and the hash of the
  last message)

- MESSENGER allows detection of dropped messages by chaining them
  (messages refer to the last message by its hash), improving
  accountability

- MESSENGER allows requesting messages from other peers explicitly to
  ensure availability

- MESSENGER provides confidentiality by padding messages to a few fixed
  sizes (512 bytes, 4096 bytes, 32768 bytes and the maximum message size
  of CADET)

- MESSENGER adds (optional) confidentiality with ECDHE to exchange and
  use symmetric encryption, encrypting with both AES-256 and Twofish but
  allowing only selected members to decrypt (using the receiver's ego
  for ECDHE)

MESSENGER also provides multiple features with privacy in mind:

- MESSENGER allows deleting messages from all peers in the group by the
  original sender (uses the
  MESSENGER-provided verification)

- MESSENGER allows using the publicly known anonymous ego instead of any
  uniquely identifying ego

- MESSENGER allows your node to decide between acting as host of the
  used messaging room (sharing your peer's identity with all nodes in
  the group) or acting as guest (sharing your peer's identity only with
  the nodes you explicitly open a connection to)

- MESSENGER handles members independently of the peer's identity, making
  forwarded messages indistinguishable from directly received ones
  (complicating the tracking of messages and the identification of their
  origin)

- MESSENGER allows names of members to be non-unique (names are also
  optional)

- MESSENGER does not include information about the selected receiver of
  an explicitly encrypted message in its header, making it harder for
  other members to draw conclusions about communication partners

.. index::
   single: subsystem; Network size estimation
   see: NSE; Network size estimation

.. _NSE-Subsystem:

NSE — Network size estimation
=============================

NSE stands for Network Size Estimation. The NSE subsystem provides other
subsystems and users with a rough estimate of the number of peers
currently participating in the GNUnet overlay. The computed value is not
a precise number, as producing a precise number in a decentralized,
efficient and secure way is impossible. While NSE's estimate is
inherently imprecise, NSE also gives the expected range. For a peer that
has been running in a stable network for a while, the real network size
will typically (99.7% of the time) be in the range of [2/3 estimate, 3/2
estimate]. We will now give an overview of the algorithm used to
calculate the estimate; all of the details can be found in this
technical report.

.. todo:: link to the report.

.. _Motivation:

Motivation
----------

Some subsystems, like the DHT, need to know the size of the GNUnet
network to optimize some parameters of their own protocol. The
decentralized nature of GNUnet makes efficiently and securely counting
the exact number of peers infeasible. Although there are several
decentralized algorithms to count the number of peers in a system, so
far there is none that does so securely. Other protocols may allow any
malicious peer to manipulate the final result or to take advantage of
the system to perform Denial of Service (DoS) attacks against the
network. GNUnet's NSE protocol avoids these drawbacks.

.. index::
   single: NSE; security

.. _Security:

Security
^^^^^^^^

The NSE subsystem is designed to be resilient against these attacks. It
uses `proofs of
work <http://en.wikipedia.org/wiki/Proof-of-work_system>`__ to prevent
one peer from impersonating a large number of participants, which would
otherwise allow an adversary to artificially inflate the estimate. The
DoS protection comes from the time-based nature of the protocol: the
estimates are calculated periodically and out-of-time traffic is either
ignored or stored for later retransmission by benign peers. In
particular, peers cannot trigger global network communication at will.

.. index::
   single: NSE; principle of operation

.. _Principle:

Principle
---------

The algorithm calculates the estimate by finding the globally closest
peer ID to a random, time-based value.

The idea is that the closer the ID is to the random value, the more
\"densely packed\" the ID space is, and therefore the more peers are in
the network.

.. _Example:

Example
^^^^^^^

Suppose all peers have IDs between 0 and 100 (our ID space), and the
random value is 42.
If the closest peer has the ID 70, we can imagine
that the average \"distance\" between peers is around 30 and that
therefore there are around 3 peers in the whole ID space. On the other
hand, if the closest peer has the ID 44, we can imagine that the space
is rather packed with peers, maybe as many as 50 of them. Naturally, we
could have been rather unlucky, and there is only one peer, which
happens to have the ID 44. Thus, the current estimate is calculated as
the average over multiple rounds, and not just a single sample.

.. _Algorithm:

Algorithm
^^^^^^^^^

Given that example, one can imagine that the job of the subsystem is to
efficiently communicate the ID of the closest peer to the target value
to all the other peers, who will calculate the estimate from it.

.. _Target-value:

Target value
^^^^^^^^^^^^

The target value itself is generated by hashing the current time,
rounded down to an agreed-upon amount. If the rounding amount is 1h (the
default) and the time is 12:34:56, the time to hash would be 12:00:00.
The process is repeated every rounding amount (in this example, every
hour). Every repetition is called a round.

.. _Timing:

Timing
^^^^^^

The NSE subsystem has some timing control to avoid everybody
broadcasting its ID all at once. Once each peer has the target random
value, it compares its own ID to the target and calculates the
hypothetical size of the network if that peer were to be the closest.
Then it compares the hypothetical size with the estimate from the
previous rounds. For each value there is an associated point in the
period; let's call it the \"broadcast time\". If its own hypothetical
estimate is the same as the previous global estimate, its \"broadcast
time\" will be in the middle of the round. If it is bigger, it will be
earlier, and if it is smaller (the most likely case), it will be later.
This ensures that the peers closest to the target value start
broadcasting their ID first.

.. _Controlled-Flooding:

Controlled Flooding
^^^^^^^^^^^^^^^^^^^

When a peer receives a value, it first verifies that it is closer than
the closest value it had so far; otherwise, it answers the incoming
message with a message containing the better value. Then it checks a
proof of work that must be included in the incoming message, to ensure
that the other peer's ID is not made up (otherwise a malicious peer
could claim to have an ID of exactly the target value every round). Once
the value is validated, the peer compares the broadcast time of the
received value with the current time and, if it is not too early, sends
the received value to its neighbors. Otherwise, it stores the value
until the correct broadcast time comes. This prevents unnecessary
traffic of sub-optimal values, since a better value can arrive before
the broadcast time, rendering the previous one obsolete and saving the
traffic that would have been used to broadcast it to the neighbors.

.. _Calculating-the-estimate:

Calculating the estimate
^^^^^^^^^^^^^^^^^^^^^^^^

Once the closest ID has been spread across the network, each peer
computes the exact distance between this ID and the target value of the
round and calculates the estimate with a mathematical formula described
in the tech report. The estimate generated with this method for a single
round is not very precise. Remember the case of the example, where the
only peer has the ID 44 and we happen to generate the target value 42,
leading us to think there are 50 peers in the network. Therefore, the
NSE subsystem remembers the last 64 estimates and calculates an average
over them, giving a result which usually has one bit of uncertainty (the
real size could be half of the estimate or twice as much).
Note that the actual network size is calculated in powers of two of the
raw input; thus, one bit of uncertainty means a factor of two in the
size estimate.

.. index::
   double: subsystem; PEERINFO

.. _PEERINFO-Subsystem:

PEERINFO — Persistent HELLO storage
===================================

The PEERINFO subsystem is used to store verified (validated) information
about known peers in a persistent way. It obtains these addresses, for
example, from the TRANSPORT service, which is in charge of address
validation. Validation means that the information in the HELLO message
is checked by connecting to the addresses and performing a cryptographic
handshake to authenticate the peer instance claiming to be reachable
with these addresses. PEERINFO does not validate the HELLO messages
itself but only stores them and gives them to interested clients.

As future work, we think about moving from storing just HELLO messages
to providing a generic persistent per-peer information store. More and
more subsystems tend to need to store per-peer information in a
persistent way. To avoid duplicating this functionality, we plan to
provide a PEERSTORE service offering it.

.. _PEERINFO-_002d-Features:

PEERINFO - Features
-------------------

- Persistent storage

- Client notification mechanism on update

- Periodic clean-up of expired information

- Differentiation between public and friend-only HELLOs

.. _PEERINFO-_002d-Limitations:

PEERINFO - Limitations
----------------------

- Does not perform HELLO validation

.. _DeveloperPeer-Information:

Peer Information
----------------

The PEERINFO subsystem stores this information in the form of HELLO
messages, which you can think of as business cards.
These HELLO messages contain the public key of a peer and the addresses
the peer can be reached under. The addresses include an expiration date
describing how long they are valid. This information is updated
regularly by the TRANSPORT service by revalidating the addresses. If an
address has expired and is not renewed, it can be removed from the HELLO
message.

Some peers do not want to have their HELLO messages distributed to other
peers, especially when GNUnet's friend-to-friend mode is enabled. To
prevent this undesired distribution, PEERINFO distinguishes between
*public* and *friend-only* HELLO messages. Public HELLO messages can be
freely distributed to other (possibly unknown) peers (for example using
the hostlist, gossiping, or broadcasting), whereas friend-only HELLO
messages may not be distributed to other peers. Friend-only HELLO
messages have an additional flag ``friend_only`` set internally; for
public HELLO messages this flag is not set. PEERINFO does not and cannot
check whether a client is allowed to obtain a specific HELLO type.

The HELLO messages can be managed using the GNUnet HELLO library. Other
GNUnet subsystems can obtain this information from PEERINFO and use it
for their purposes. Clients include, for example, the HOSTLIST component
providing this information to other peers in the form of a hostlist, and
the TRANSPORT subsystem using this information to maintain connections
to other peers.

.. _Startup:

Startup
-------

During startup the PEERINFO service loads persistent HELLOs from disk.
First, PEERINFO parses the directory configured in the HOSTS value of
the ``PEERINFO`` configuration section, which is used to store PEERINFO
information. Valid HELLO messages are extracted from all files found in
this directory. In addition, it loads HELLO messages shipped with the
GNUnet distribution.
These HELLOs are used to simplify network bootstrapping by providing
valid peer information with the distribution. The use of these HELLOs
can be prevented by setting ``USE_INCLUDED_HELLOS`` in the ``PEERINFO``
configuration section to ``NO``. Files containing invalid information
are removed.

.. _Managing-Information:

Managing Information
--------------------

The PEERINFO service stores information about known peers and a single
HELLO message for every peer. A peer does not need to have a HELLO if no
information is available. HELLO information from different sources, for
example a HELLO obtained from a remote HOSTLIST and a second HELLO
stored on disk, is combined and merged into one single HELLO message per
peer, which will be given to clients. During this merge process the
HELLO is immediately written to disk to ensure persistence.

In addition, PEERINFO periodically scans the directory where the
information is stored for HELLO messages with expired TRANSPORT
addresses. This periodic task scans all files in the directory and
recreates the HELLO messages it finds. Expired TRANSPORT addresses are
removed from a HELLO, and if the HELLO does not contain any valid
addresses, it is discarded and removed from disk.

.. _Obtaining-Information:

Obtaining Information
---------------------

When a client requests information from PEERINFO, PEERINFO performs a
lookup for the respective peer, or for all peers if desired, and
transmits this information to the client. The client can specify whether
friend-only HELLOs are to be included, and PEERINFO filters the
respective HELLO messages before transmitting the information.

To notify clients about changes to PEERINFO information, PEERINFO
maintains a list of clients interested in such notifications.
Such a notification occurs if a HELLO for a peer was updated (due to a
merge, for example) or a new peer was added.

.. index::
   double: subsystem; PEERSTORE

.. _PEERSTORE-Subsystem:

PEERSTORE — Extensible local persistent data storage
====================================================

GNUnet's PEERSTORE subsystem offers persistent per-peer storage for
other GNUnet subsystems. GNUnet subsystems can use PEERSTORE to
persistently store and retrieve arbitrary data. Each data record stored
with PEERSTORE contains the following fields:

- subsystem: Name of the subsystem responsible for the record.

- peerid: Identity of the peer this record is related to.

- key: A key string identifying the record.

- value: Binary record value.

- expiry: Record expiry date.

.. _Functionality:

Functionality
-------------

Subsystems can store any type of value under a (subsystem, peerid, key)
combination. A \"replace\" flag set during store operations forces the
PEERSTORE to replace any old values stored under the same (subsystem,
peerid, key) combination with the new value. Additionally, an expiry
date is set after which the record is \*possibly\* deleted by PEERSTORE.

Subsystems can iterate over all values stored under any of the following
combinations of fields:

- (subsystem)

- (subsystem, peerid)

- (subsystem, key)

- (subsystem, peerid, key)

Subsystems can also request to be notified about any new values stored
under a (subsystem, peerid, key) combination by sending a \"watch\"
request to PEERSTORE.

.. _Architecture:

Architecture
------------

PEERSTORE implements the following components:

- PEERSTORE service: Handles store, iterate and watch operations.

- PEERSTORE API: API to be used by other subsystems to communicate and
  issue commands to the PEERSTORE service.

- PEERSTORE plugins: Handle the persistent storage. At the moment, only
  an \"sqlite\" plugin is implemented.

.. index::
   double: subsystem; REGEX

.. _REGEX-Subsystem:

REGEX — Service discovery using regular expressions
===================================================

Using the REGEX subsystem, you can discover peers that offer a
particular service using regular expressions. The peers that offer a
service specify it using a regular expression. Peers that want to
patronize a service search using a string. The REGEX subsystem will then
use the DHT to return a set of matching offerers to the patrons.

For the technical details, we have Max's defense talk and Max's Master's
thesis.

.. note:: An additional publication is under preparation and available
   to team members (in Git).

.. todo:: Missing links to Max's talk and Master's thesis

.. _How-to-run-the-regex-profiler:

How to run the regex profiler
-----------------------------

The gnunet-regex-profiler can be used to profile the usage of mesh/regex
for a given set of regular expressions and strings. Mesh/regex allows
you to announce your peer ID under a certain regex and search for peers
matching a particular regex using a string. See
`szengel2012ms <https://bib.gnunet.org/full/date.html#2012_5f2>`__ for a
full introduction.

First of all, the regex profiler uses the GNUnet testbed; thus, all the
implications for the testbed also apply to the regex profiler (for
example, you need password-less ssh login to the machines listed in your
hosts file).

**Configuration**

Moreover, an appropriate configuration file is needed.
In the following paragraphs the important details are highlighted.

Announcing the regular expressions is done by the
gnunet-daemon-regexprofiler, therefore you have to make sure it is
started, by adding it to the START_ON_DEMAND set of ARM:

::

   [regexprofiler]
   START_ON_DEMAND = YES

Furthermore you have to specify the location of the binary:

::

   [regexprofiler]
   # Location of the gnunet-daemon-regexprofiler binary.
   BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
   # Regex prefix that will be applied to all regular expressions and
   # search strings.
   REGEX_PREFIX = "GNVPN-0001-PAD"

When running the profiler with a large-scale deployment, you probably
want to reduce the workload of each peer. Use the following options to
do this.

::

   [dht]
   # Force network size estimation
   FORCE_NSE = 1

   [dhtcache]
   DATABASE = heap
   # Disable RC-file for Bloom filter? (for benchmarking with limited IO
   # availability)
   DISABLE_BF_RC = YES
   # Disable Bloom filter entirely
   DISABLE_BF = YES

   [nse]
   # Minimize proof-of-work CPU consumption by NSE
   WORKBITS = 1

**Options**

To finally run the profiler, some options and the input data need to be
specified on the command line.

::

   gnunet-regex-profiler -c config-file -d log-file -n num-links \
     -p path-compression-length -s search-delay -t matching-timeout \
     -a num-search-strings hosts-file policy-dir search-strings-file

Where\...

- \... ``config-file`` means the configuration file created earlier.

- \... ``log-file`` is the file where to write statistics output.

- \... ``num-links`` indicates the number of random links between
  started peers.

- \...
  ``path-compression-length`` is the maximum path compression
  length in the DFA.

- \... ``search-delay`` is the time to wait between peers finishing
  linking and starting to match strings.

- \... ``matching-timeout`` is the timeout after which to cancel the
  search.

- \... ``num-search-strings`` is the number of strings in the
  search-strings-file.

- \... the ``hosts-file`` should contain a list of hosts for the
  testbed, one per line in the following format:

  - ``user@host_ip:port``

- \... the ``policy-dir`` is a folder containing text files, each with
  one or more regular expressions. A peer is started for each file in
  that folder and the regular expressions in the corresponding file are
  announced by this peer.

- \... the ``search-strings-file`` is a text file containing search
  strings, one per line.

You can create regular expressions and search strings for every AS in
the Internet using the attached scripts. You need one of the `CAIDA
routeviews
prefix2as <http://data.caida.org/datasets/routing/routeviews-prefix2as/>`__
data files for this. Run

::

   create_regex.py <filename> <output path>

to create the regular expressions and

::

   create_strings.py <input path> <outfile>

to create a search-strings file from the previously created regular
expressions.

.. index::
   double: subsystem; REST

.. _REST-Subsystem:

REST — RESTful GNUnet Web APIs
==============================

.. todo:: Define REST

Using the REST subsystem, you can expose REST-based APIs or services.
The REST service is designed as a pluggable architecture.

**Configuration**

The REST service can be configured in various ways.
The reference config file can be found in ``src/rest/rest.conf``:

::

   [rest]
   REST_PORT=7776
   REST_ALLOW_HEADERS=Authorization,Accept,Content-Type
   REST_ALLOW_ORIGIN=*
   REST_ALLOW_CREDENTIALS=true

The port as well as the CORS (cross-origin resource sharing) headers
that are supposed to be advertised by the REST service are configurable.

.. index::
   double: subsystem; REVOCATION

.. _REVOCATION-Subsystem:

REVOCATION — Ego key revocation
===============================

The REVOCATION subsystem is responsible for key revocation of egos. If a
user learns that their private key has been compromised or if they have
lost it, they can use the REVOCATION system to inform all of the other
users that their private key is no longer valid. The subsystem thus
includes ways to query for the validity of keys and to propagate
revocation messages.

.. _Dissemination:

Dissemination
-------------

When a revocation is performed, the revocation is first of all
disseminated by flooding the overlay network. The goal is to reach every
peer, so that when a peer needs to check if a key has been revoked, this
will be purely a local operation where the peer looks at its local
revocation list. Flooding the network is also the most robust form of
key revocation --- an adversary would have to control a separator of the
overlay graph to restrict the propagation of the revocation message.
Flooding is also very easy to implement --- peers that receive a
revocation message for a key that they have never seen before simply
pass the message to all of their neighbours.

Flooding can only distribute the revocation message to peers that are
online.
In order to notify peers that join the network later, the
revocation service performs efficient set reconciliation over the sets
of known revocation messages whenever two peers (that both support
REVOCATION dissemination) connect. The SET service is used to perform
this operation efficiently.

.. _Revocation-Message-Design-Requirements:

Revocation Message Design Requirements
--------------------------------------

However, flooding is also quite costly, creating O(\|E\|) messages on a
network with \|E\| edges. Thus, revocation messages are required to
contain a proof-of-work, the result of an expensive computation (which,
however, is cheap to verify). Only peers that have expended the CPU time
necessary to provide this proof will be able to flood the network with
the revocation message. This ensures that an attacker cannot simply
flood the network with millions of revocation messages. The
proof-of-work required by GNUnet is set to take days on a typical PC to
compute; if the ability to quickly revoke a key is needed, users have
the option to pre-compute revocation messages to store off-line and use
instantly after their key has been compromised.

Revocation messages must also be signed by the private key that is being
revoked. Thus, they can only be created while the private key is in the
possession of the respective user. This is another reason to create a
revocation message ahead of time and store it in a secure location.

.. index::
   double: subsystems; Random peer sampling
   see: RPS; Random peer sampling

.. _RPS-Subsystem:

RPS — Random peer sampling
==========================

In literature, Random Peer Sampling (RPS) refers to the problem of
reliably [1]_ drawing random samples from an unstructured p2p network.

Doing so in a reliable manner is not only hard because of inherent
problems but also because of possible malicious peers that could try to
bias the selection.

Random peer sampling is useful for all kinds of gossip protocols that
require the selection of random peers in the whole network, such as
gathering statistics, spreading and aggregating information in the
network, load balancing and overlay topology management.

The approach chosen in the RPS service implementation in GNUnet follows
the `Brahms <https://bib.gnunet.org/full/date.html\#2009_5f0>`__ design.

The current state is \"work in progress\". There are a lot of things
that need to be done, primarily finishing the experimental evaluation
and a re-design of the API.

The abstract idea is to connect to (or start) the RPS service and
request random peers, which will be returned when they represent a
random selection from the whole network with high probability.

An additional feature over the original Brahms design is the selection
of sub-groups: the GNUnet implementation of RPS enables clients to ask
for random peers from a group that is defined by a common shared secret.
(The secret could of course also be public, depending on the use-case.)

Another addition to the original protocol was made: the sampler
mechanism that was introduced in Brahms was slightly adapted and is used
to actually sample the peers that are returned to the client. This is
necessary as the original design only keeps peers connected to random
other peers in the network. In order to return independently random
peers to client requests, they cannot simply be drawn from the connected
peers. The adapted sampler makes sure that each request for random peers
is independent of the others.
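
The sampler just mentioned is easiest to see in code. The following is a
minimal illustrative Python sketch of a single sampler element (class
and method names are made up; the real implementation is part of the RPS
service and written in C):

```python
# Sketch of a min-wise sampler element: it keeps a random HMAC key and
# retains whichever input peer ID has the smallest HMAC under that key,
# independently of how often that ID is fed in.
import hashlib
import hmac
import os
from typing import Optional

class SamplerElement:
    def __init__(self) -> None:
        self._key = os.urandom(32)             # random key per element
        self._min_mac: Optional[bytes] = None
        self.sample: Optional[bytes] = None    # current output

    def put(self, peer_id: bytes) -> None:
        mac = hmac.new(self._key, peer_id, hashlib.sha512).digest()
        if self._min_mac is None or mac < self._min_mac:
            self._min_mac = mac
            self.sample = peer_id
```

Feeding the same peer ID once or a million times leads to the same
output, which is what removes the bias introduced by push-pull gossip.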

.. _Brahms:

Brahms
------

The high-level concept of Brahms is two-fold: combining push-pull gossip
with locally fixing an assumed bias using cryptographic min-wise
permutations. The central data structure is the *view*, a peer's current
local sample. This view is used to select peers to push to and pull
from. This simple mechanism is easily biased. For this reason, Brahms
'fixes' the bias using the so-called sampler: a data structure that
takes a stream of elements as input and outputs one of them, chosen
independently of each element's frequency in the input. Both an element
that was put into the sampler a single time and an element that was put
into it a million times have the same probability of being the output.
This is achieved by exploiting min-wise independent permutations. In the
RPS service we use HMACs: when a sampler element is initialised, a key
is chosen at random. For each input, the HMAC under that random key is
computed. The sampler element keeps the element with the minimal HMAC.

In order to fix the bias in the view, a fraction of the elements in the
view are sampled through the sampler from the random stream of peer IDs.

According to the theoretical analysis of Bortnikov et al., this suffices
to keep the network connected and to have random peers in the view.

.. [1]
   "Reliable" in this context means having no bias, neither spatial,
   nor temporal, nor through malicious activity.

.. index::
   double: STATISTICS; subsystem

.. _STATISTICS-Subsystem:

STATISTICS — Runtime statistics publication
===========================================

In GNUnet, the STATISTICS subsystem offers a central place for all
subsystems to publish unsigned 64-bit integer run-time statistics.
Keeping this information centrally means that there is a unified way for
the user to obtain data on all subsystems, and individual subsystems do
not each have to include a custom data-export method for performance
metrics and other statistics. For example, the TRANSPORT subsystem uses
STATISTICS to update information about the number of directly connected
peers and the bandwidth consumed by the various plugins. This
information is valuable for diagnosing connectivity and performance
issues.

Following the GNUnet service architecture, the STATISTICS subsystem is
divided into an API, exposed through the header
**gnunet_statistics_service.h**, and the STATISTICS service
**gnunet-service-statistics**. The **gnunet-statistics** command-line
tool can be used to obtain (and change) information about the values
stored by the STATISTICS service. The STATISTICS service does not
communicate with other peers.

Data is stored in the STATISTICS service in the form of tuples
**(subsystem, name, value, persistence)**. The subsystem determines to
which GNUnet subsystem the data belongs. The name is the key under which
the value is stored; it uniquely identifies the record among the records
belonging to the same subsystem. In some parts of the code, the pair
**(subsystem, name)** is called a **statistic**, as it identifies the
values stored in the STATISTICS service. The persistence flag determines
whether the record is preserved across service restarts. A record is
said to be persistent if this flag is set for it; if not, the record is
treated as non-persistent and is lost after a service restart.
Persistent records are written to and read from the file
**statistics.data** before shutdown and upon startup. The file is
located in the HOME directory of the peer.
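
The storage model described above can be sketched as a small in-memory
map keyed by (subsystem, name), where only records flagged persistent
survive a restart. This is an illustrative Python sketch, not the actual
C implementation; the ``StatisticsStore`` class, its method names, and
the JSON on-disk format are assumptions made purely for illustration.

```python
import json
import os
import tempfile

class StatisticsStore:
    """Stores tuples (subsystem, name, value, persistence).
    Illustrative sketch of the documented behaviour only."""

    def __init__(self, path):
        self.path = path          # stand-in for the peer's statistics.data
        self.records = {}         # (subsystem, name) -> (value, persistent)
        if os.path.exists(path):  # read persistent records upon startup
            with open(path) as f:
                for sub, name, value in json.load(f):
                    self.records[(sub, name)] = (value, True)

    def set(self, subsystem, name, value, persistent=False):
        # Values are unsigned 64-bit integers.
        self.records[(subsystem, name)] = (value & 0xFFFFFFFFFFFFFFFF,
                                           persistent)

    def get(self, subsystem, name):
        return self.records.get((subsystem, name), (0, False))[0]

    def shutdown(self):
        # Only persistent records are written out before shutdown.
        data = [[sub, name, value]
                for (sub, name), (value, persistent) in self.records.items()
                if persistent]
        with open(self.path, "w") as f:
            json.dump(data, f)

path = os.path.join(tempfile.mkdtemp(), "statistics.data")
s = StatisticsStore(path)
s.set("transport", "# peers connected", 7, persistent=True)
s.set("transport", "# bytes received", 123456)     # non-persistent
s.shutdown()

s2 = StatisticsStore(path)                         # simulated restart
assert s2.get("transport", "# peers connected") == 7
assert s2.get("transport", "# bytes received") == 0  # lost, as documented
```
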

An anomaly of the STATISTICS service is that it does not terminate
immediately upon receiving a shutdown signal if it has any clients
connected to it. It waits for all the clients that are not monitors to
close their connections before terminating itself. This is to prevent
the loss of data during peer shutdown: delaying the STATISTICS service
shutdown helps other services to store important data in STATISTICS
during shutdown.

.. index::
   double: TRANSPORT Next Generation; subsystem

.. _TRANSPORT_002dNG-Subsystem:

TRANSPORT-NG — Next-generation transport management
===================================================

The current GNUnet TRANSPORT architecture is rooted in the GNUnet 0.4
design of using plugins for the actual transmission operations and the
ATS subsystem to select a plugin and allocate bandwidth. The following
key issues have been identified with this design:

- Bugs in one plugin can affect the TRANSPORT service and other
  plugins. There is at least one open bug that affects sockets, whose
  origin is difficult to pinpoint due to the large code base.

- Common operating system default configurations impose a limit of 1024
  file descriptors per process. Thus, one plugin may impact other
  plugins' connectivity choices.

- Plugins are required to offer bi-directional connectivity. However,
  firewalls (including NAT boxes) and physical environments sometimes
  only allow uni-directional connectivity, which currently cannot be
  utilized at all.

- Distance-vector routing was implemented in 2009 but broken shortly
  afterwards; due to the complexity of implementing it as a plugin and
  dealing with the resource-allocation consequences, it was never
  useful.

- Most existing plugins communicate entirely in cleartext, exposing
  metadata (such as message sizes) and making it easy to fingerprint
  and possibly block GNUnet traffic.

- Various NAT traversal methods are not supported.

- The service logic is cluttered with "manipulation" support code for
  TESTBED, used to fake network characteristics like lossy connections
  or firewalls.

- Bandwidth allocation is done in ATS, requiring the duplication of
  state and resulting in much-delayed allocation decisions. As a
  result, available bandwidth often goes unused. Users are expected to
  manually configure bandwidth limits, instead of TRANSPORT using
  congestion control to adapt automatically.

- TRANSPORT is difficult to test and has bad test coverage.

- HELLOs include an absolute expiration time. Nodes with unsynchronized
  clocks cannot connect.

- Displaying the contents of a HELLO requires the respective plugin, as
  the plugin-specific data is encoded in binary form. This also
  complicates logging.

.. _Design-goals-of-TNG:

Design goals of TNG
-------------------

In order to address the above issues, we want to:

- Move plugins into separate processes which we shall call
  *communicators*. Communicators connect as clients to the transport
  service.

- TRANSPORT should be able to utilize any number of communicators to
  the same peer at the same time.

- TRANSPORT should be responsible for fragmentation, retransmission,
  and flow and congestion control. Users should no longer have to
  configure bandwidth limits: TRANSPORT should detect what is available
  and use it.

- Communicators should be allowed to be uni-directional and
  unreliable. TRANSPORT shall create bi-directional channels from this
  whenever possible.

- DV should no longer be a plugin, but part of TRANSPORT.

- TRANSPORT should help communicators communicate, for example in the
  case of uni-directional communicators or the need for out-of-band
  signalling for NAT traversal. We call this functionality
  *backchannels*.

- Transport manipulation should be signalled to CORE on a per-message
  basis instead of as an approximate bandwidth.

- CORE should signal performance requirements (reliability, latency,
  etc.) on a per-message basis to TRANSPORT. If possible, TRANSPORT
  should consider those options when scheduling messages for
  transmission.

- HELLOs should be in a human-readable format with monotonic-time
  expirations.

The new architecture is planned as follows:

.. image:: /images/tng.png

TRANSPORT's main objective is to establish bi-directional virtual links
using a variety of possibly uni-directional communicators. Links undergo
the following steps:

1. A communicator informs TRANSPORT A that a queue (direct neighbour) is
   available, or, equivalently, TRANSPORT A discovers a (DV) path to a
   target B.

2. TRANSPORT A sends a challenge to the target peer, trying to confirm
   that the peer can receive. FIXME: This is not implemented properly
   for DV. Here we should really take a validated DVH and send a
   challenge exactly down that path!

3. The other TRANSPORT, TRANSPORT B, receives the challenge and sends
   back a response, possibly using a different path. If TRANSPORT B does
   not yet have a virtual link to A, it must try to establish a virtual
   link.

4. Upon receiving the response, TRANSPORT A creates the virtual link. If
   the response included a challenge, TRANSPORT A must respond to this
   challenge as well, effectively re-creating the TCP 3-way handshake
   (just with longer challenge values).

.. _HELLO_002dNG:

HELLO-NG
--------

HELLOs change in three ways. First, communicators encode their
respective addresses in a human-readable, URL-like string. This way, we
no longer require the communicator in order to print the contents of a
HELLO. Second, HELLOs no longer contain an expiration time, only a
creation time. The receiver must only compare the respective absolute
values: given a HELLO from the same sender with a larger creation time,
the older one is no longer valid. This also obsoletes the need for the
gnunet-hello binary to set HELLO expiration times to "never". Third, a
peer no longer generates one big HELLO that always contains all of its
addresses. Instead, each address is signed individually and shared only
in the address scopes where sharing it makes sense. In particular, care
should be taken not to share MAC addresses across the Internet and to
confine their use to the LAN. As each address is signed separately,
having multiple addresses valid at the same time (given the new
creation-time expiration logic) requires that those addresses have
exactly the same creation time. Whenever that monotonic time is
increased, all addresses must be re-signed and re-distributed.

.. _Priorities-and-preferences:

Priorities and preferences
--------------------------

In the new design, TRANSPORT adopts a feature of the MQ API (previously
already available in CORE) that allows applications to specify
priorities and preferences per message (or rather, per MQ envelope). The
(updated) MQ API allows applications to specify one of four priority
levels as well as desired preferences for transmission by setting
options on an envelope. These preferences currently are:

- GNUNET_MQ_PREF_UNRELIABLE: Disables TRANSPORT waiting for ACKs on
  unreliable channels like UDP.
  Transmission is then fire-and-forget; these messages also cannot be
  used for RTT estimates.

- GNUNET_MQ_PREF_LOW_LATENCY: Directs TRANSPORT to select the
  lowest-latency transmission choices possible.

- GNUNET_MQ_PREF_CORK_ALLOWED: Allows TRANSPORT to delay transmission
  in order to group the message with other messages into a larger
  batch, reducing the number of packets sent.

- GNUNET_MQ_PREF_GOODPUT: Directs TRANSPORT to select the
  highest-goodput channel available.

- GNUNET_MQ_PREF_OUT_OF_ORDER: Allows TRANSPORT to reorder the messages
  as it sees fit; otherwise, TRANSPORT should attempt to preserve
  transmission order.

Each MQ envelope is always able to store these options (and the
priority), and in the future this uniform API will be used by TRANSPORT,
CORE, CADET and possibly other subsystems that send messages (like
LAKE). When CORE sets preferences and priorities, it is supposed to
respect the preferences and priorities given to it by higher layers.
Similarly, CADET simply passes on the preferences and priorities of the
layer above CADET. When a layer combines multiple smaller messages into
one larger transmission, ``GNUNET_MQ_env_combine_options()`` should be
used to calculate the options for the combined message. We note that the
exact semantics of the options may differ by layer. For example, CADET
will always strictly implement reliable and in-order delivery of
messages, while the same options are only advisory for TRANSPORT and
CORE: they should try (using ACKs on unreliable communicators, and not
changing the message order themselves), but if messages are lost anyway
(e.g. because a TCP connection is dropped in the middle), or if messages
are reordered (e.g. because they took different paths over the network
and arrived in a different order), TRANSPORT and CORE do not have to
correct this.
Whether a preference is strict or loose is thus defined by the
respective layer.

.. _Communicators:

Communicators
-------------

The API for communicators is defined in
``gnunet_transport_communication_service.h``. Each communicator must
specify its (global) communication characteristics, which for now only
say whether the communication is reliable (e.g. TCP, HTTPS) or
unreliable (e.g. UDP, WLAN). Each communicator must specify a unique
address prefix, or NULL if the communicator cannot establish outgoing
connections (for example because it is only acting as a TCP server). A
communicator must tell TRANSPORT which addresses it is reachable under.
Addresses may be added or removed at any time. A communicator may have
zero addresses (transmission only). Addresses do not have to match the
address prefix.

TRANSPORT may ask a communicator to try to connect to another address.
TRANSPORT will only ask for connections where the address matches the
communicator's address prefix. Communicators should then attempt to
establish a connection. It is at the discretion of the communicator
whether to honor this request; reasons for not honoring it may be an
already-existing connection or resource limitations. No response is
provided to the TRANSPORT service on failure; the TRANSPORT service has
to explicitly ask the communicator to retry.

If a communicator succeeds in establishing an outgoing connection for
transmission, or if a communicator receives an incoming bi-directional
connection, the communicator must inform the TRANSPORT service that a
message queue (MQ) for transmission is now available. For that MQ, the
communicator must provide the peer identity claimed by the other end.
It must also provide a human-readable address (for debugging) and a
maximum transfer unit (MTU). An MTU of zero means that sending is not
supported; SIZE_MAX should be used to indicate that there is no MTU. The
communicator should also tell TRANSPORT which network type is used for
the queue. The communicator may tell TRANSPORT at any time that the
queue was deleted and is no longer available.

The communicator API also provides for flow control. First,
communicators exert back-pressure on TRANSPORT: the number of messages
TRANSPORT may add to a queue for transmission is limited, so by not
draining the transmission queue, back-pressure is applied to TRANSPORT.
In the other direction, communicators may allow TRANSPORT to apply
back-pressure towards the communicator by providing a non-NULL
``GNUNET_TRANSPORT_MessageCompletedCallback`` argument to the
``GNUNET_TRANSPORT_communicator_receive`` function. In this case,
TRANSPORT will only invoke this function once it has processed the
message and is ready to receive more. Communicators should then limit
how much traffic they receive based on this back-pressure. Note that
communicators do not have to provide a
``GNUNET_TRANSPORT_MessageCompletedCallback``; for example, UDP cannot
support back-pressure due to the nature of the UDP protocol. In this
case, TRANSPORT will implement its own TRANSPORT-to-TRANSPORT flow
control to reduce the sender's data rate to acceptable levels.

TRANSPORT may notify a communicator about backchannel messages that
TRANSPORT received from other peers for this communicator. Similarly,
communicators can ask TRANSPORT to try to send a backchannel message to
other communicators of other peers. The semantics of a backchannel
message are up to the communicators that use it. TRANSPORT may fail to
transmit backchannel messages, and TRANSPORT will not attempt to
retransmit them.
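
The receive-side flow control described above can be sketched as
follows: the communicator hands each message to TRANSPORT together with
a completion callback, and stops reading from the network while too many
messages are still unacknowledged. This is an illustrative Python sketch
of the pattern only; the class names, the ``deliver`` method, and the
window size of 2 are assumptions, and the real interface is the C
function ``GNUNET_TRANSPORT_communicator_receive``.

```python
class Transport:
    """Stand-in for the TRANSPORT service: queues messages and invokes
    the completion callback once a message has been processed."""

    def __init__(self):
        self.pending = []

    def receive(self, message, completed_cb):
        # In the real service, processing happens asynchronously; the
        # callback fires only once TRANSPORT is ready for more.
        self.pending.append((message, completed_cb))

    def process_one(self):
        message, completed_cb = self.pending.pop(0)
        completed_cb()          # releases back-pressure on the communicator

class Communicator:
    """Reads from the network only while few messages are in flight."""

    WINDOW = 2                  # illustrative back-pressure window

    def __init__(self, transport):
        self.transport = transport
        self.in_flight = 0

    def can_read(self):
        return self.in_flight < self.WINDOW

    def deliver(self, message):
        assert self.can_read()
        self.in_flight += 1
        self.transport.receive(message, self._completed)

    def _completed(self):
        self.in_flight -= 1     # TRANSPORT is ready to receive more

t = Transport()
c = Communicator(t)
c.deliver(b"m1")
c.deliver(b"m2")
assert not c.can_read()         # window full: stop reading from the socket
t.process_one()
assert c.can_read()             # callback fired: resume reading
```

A communicator that cannot observe such a window (like UDP, as noted
above) simply omits the callback, leaving flow control to TRANSPORT.
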

UDP communicator
^^^^^^^^^^^^^^^^

The UDP communicator implements a basic encryption layer to protect
against metadata leakage. The layer tries to establish a shared secret
using an Elliptic-Curve Diffie-Hellman key exchange, in which the
initiator of a packet creates an ephemeral key pair to encrypt a message
for the target peer identity. The communicator always offers this kind
of transmission queue to a (reachable) peer, in which messages are
encrypted with dedicated keys. The performance of this queue is not
suitable for high-volume data transfer.

If the UDP connection is bi-directional, or TRANSPORT is able to offer a
backchannel connection, the resulting key can be re-used if the
receiving peer is able to acknowledge (ACK) the reception. This causes
the communicator to offer a new queue (with a higher priority than the
default queue) to TRANSPORT with a limited capacity. The capacity is
increased whenever the communicator receives an ACK for a transmission.
This queue is suitable for high-volume data transfer, and TRANSPORT will
likely prioritize it (if available).

Communicators that try to establish a connection to a target peer
authenticate their peer ID (public key) in the first packets by signing
a monotonic time stamp, their own peer ID, and the target peer ID, and
sending this data along with the signature in one of the first packets.
Receivers should keep track of (persist) the monotonic time stamps for
each peer ID to reject possible replay attacks.

FIXME: Handshake wire format? KX, Flow.

TCP communicator
^^^^^^^^^^^^^^^^

FIXME: Handshake wire format? KX, Flow.

QUIC communicator
^^^^^^^^^^^^^^^^^

The QUIC communicator runs over a bi-directional UDP connection.
TLS layer with self-signed certificates (binding/signed with peer ID?).
Single, bi-directional stream?
FIXME: Handshake wire format? KX, Flow.

.. index::
   double: TRANSPORT; subsystem

.. _TRANSPORT-Subsystem:

TRANSPORT — Overlay transport management
========================================

This chapter documents how the GNUnet transport subsystem works. The
GNUnet transport subsystem consists of three main components: the
transport API (the interface used by the rest of the system to access
the transport service), the transport service itself (most of the
interesting functions, such as choosing transports, happen here) and
the transport plugins. A transport plugin is a concrete implementation
of how two GNUnet peers communicate; many plugins exist, for example
for communication via TCP, UDP, HTTP, HTTPS and others. Finally, the
transport subsystem uses supporting code, especially the NAT/UPnP
library, to help with tasks such as NAT traversal.

Key tasks of the transport service include:

- Create our HELLO message and notify clients and neighbours if our
  HELLO changes (using the NAT library as necessary)

- Validate HELLOs from other peers (send PING) and allow other peers to
  validate our HELLO's addresses (send PONG)

- Upon request, establish connections to other peers (using address
  selection from the ATS subsystem) and maintain them (again using
  PINGs and PONGs) as long as desired

- Accept incoming connections, giving the ATS service the opportunity
  to switch communication channels

- Notify clients about peers that have connected to us or that have
  been disconnected from us

- If a (stateful) connection goes down unexpectedly (without an
  explicit DISCONNECT), quickly attempt to recover (without notifying
  clients), but do notify clients quickly if reconnecting fails

- Send (payload) messages arriving from clients to other peers via
  transport plugins, and receive messages from other peers, forwarding
  those to clients

- Enforce inbound traffic limits (using flow control where applicable);
  outbound traffic limits are enforced by CORE, not by us (!)

- Enforce restrictions on P2P connections as specified by the blacklist
  configuration and by blacklisting clients

Note that the term "clients" in the list above really refers to the
GNUnet CORE service, as CORE is typically the only client of the
transport service.

.. index::
   double: subsystem; SET

.. _SET-Subsystem:

SET — Peer to peer set operations (Deprecated)
==============================================

.. note::

   The SET subsystem is in the process of being replaced by the SETU
   and SETI subsystems, which together provide basically the same
   functionality, just split across two different subsystems. SETI and
   SETU should be used for new code.

The SET service implements efficient set operations between two peers
over a CADET tunnel. Currently, set union and set intersection are the
only supported operations. Elements of a set consist of an *element
type* and arbitrary binary *data*. The size of an element's data is
limited to around 62 KB.

.. _Local-Sets:

Local Sets
----------

Sets created by a local client can be modified and reused for multiple
operations. As each set operation requires potentially expensive
auxiliary data to be computed for each element of a set, a set can only
participate in one type of set operation (either union or intersection).
The type of a set is determined upon its creation. If the elements of a
set are needed for an operation of a different type, all of the set's
elements must be copied to a new set of the appropriate type.

.. _Set-Modifications:

Set Modifications
-----------------

Even while set operations are active, one can add elements to and
remove elements from a set. However, these changes will only be visible
to operations created after the changes have taken place. That is,
every set operation only sees a snapshot of the set from the time the
operation was started. This mechanism is *not* implemented by copying
the whole set, but by attaching *generation information* to each
element and operation.

.. _Set-Operations:

Set Operations
--------------

Set operations can be started in two ways: either by accepting an
operation request from a remote peer, or by requesting a set operation
from a remote peer. Set operations are uniquely identified by the
involved *peers*, an *application id* and the *operation type*.

The client is notified of incoming set operations by *set listeners*. A
set listener listens for incoming operations of a specific operation
type and application id. Once notified of an incoming set request, the
client can accept the set request (providing a local set for the
operation) or reject it.

.. _Result-Elements:

Result Elements
---------------

The SET service has three *result modes* that determine how an
operation's result set is delivered to the client:

- **Full Result Set.** All elements of the set resulting from the set
  operation are returned to the client.

- **Added Elements.** Only elements that result from the operation and
  are not already in the local peer's set are returned. Note that for
  some operations (like set intersection) this result mode will never
  return any elements. This can be useful if only the remote peer is
  actually interested in the result of the set operation.

- **Removed Elements.** Only elements that are in the local peer's
  initial set but not in the operation's result set are returned. Note
  that for some operations (like set union) this result mode will never
  return any elements. This can be useful if only the remote peer is
  actually interested in the result of the set operation.

.. index::
   double: subsystem; SETI

.. _SETI-Subsystem:

SETI — Peer to peer set intersections
=====================================

The SETI service implements efficient set intersection between two peers
over a CADET tunnel. Elements of a set consist of an *element type* and
arbitrary binary *data*. The size of an element's data is limited to
around 62 KB.

.. _Intersection-Sets:

Intersection Sets
-----------------

Sets created by a local client can be modified (by adding additional
elements) and reused for multiple operations. If elements are to be
removed, a fresh set must be created by the client.

.. _Set-Intersection-Modifications:

Set Intersection Modifications
------------------------------

Even while set operations are active, one can add elements to a set.
However, these changes will only be visible to operations created after
the changes have taken place. That is, every set operation only sees a
snapshot of the set from the time the operation was started. This
mechanism is *not* implemented by copying the whole set, but by
attaching *generation information* to each element and operation.

.. _Set-Intersection-Operations:

Set Intersection Operations
---------------------------

Set operations can be started in two ways: either by accepting an
operation request from a remote peer, or by requesting a set operation
from a remote peer.
Set operations are uniquely identified by the involved *peers*, an
*application id* and the *operation type*.

The client is notified of incoming set operations by *set listeners*. A
set listener listens for incoming operations of a specific operation
type and application id. Once notified of an incoming set request, the
client can accept the set request (providing a local set for the
operation) or reject it.

.. _Intersection-Result-Elements:

Intersection Result Elements
----------------------------

The SETI service has two *result modes* that determine how an
operation's result set is delivered to the client:

- **Return intersection.** All elements of the resulting intersection
  are returned to the client.

- **Removed Elements.** Only elements that are in the local peer's
  initial set but not in the intersection are returned.

.. index::
   double: SETU; subsystem

.. _SETU-Subsystem:

SETU — Peer to peer set unions
==============================

The SETU service implements efficient set union operations between two
peers over a CADET tunnel. Elements of a set consist of an *element
type* and arbitrary binary *data*. The size of an element's data is
limited to around 62 KB.

.. _Union-Sets:

Union Sets
----------

Sets created by a local client can be modified (by adding additional
elements) and reused for multiple operations. If elements are to be
removed, a fresh set must be created by the client.

.. _Set-Union-Modifications:

Set Union Modifications
-----------------------

Even while set operations are active, one can add elements to a set.
However, these changes will only be visible to operations created after
the changes have taken place.
That is, every set operation only sees a snapshot of the set from the
time the operation was started. This mechanism is *not* implemented by
copying the whole set, but by attaching *generation information* to each
element and operation.

.. _Set-Union-Operations:

Set Union Operations
--------------------

Set operations can be started in two ways: either by accepting an
operation request from a remote peer, or by requesting a set operation
from a remote peer. Set operations are uniquely identified by the
involved *peers*, an *application id* and the *operation type*.

The client is notified of incoming set operations by *set listeners*. A
set listener listens for incoming operations of a specific operation
type and application id. Once notified of an incoming set request, the
client can accept the set request (providing a local set for the
operation) or reject it.

.. _Union-Result-Elements:

Union Result Elements
---------------------

The SETU service has three *result modes* that determine how an
operation's result set is delivered to the client:

- **Locally added Elements.** Elements that are in the union but not
  already in the local peer's set are returned.

- **Remote added Elements.** Additionally, notify the client if the
  remote peer lacked some elements, and thus also return to the local
  client those elements that we are sending to the remote peer to be
  added to its union. Obtaining these elements requires setting the
  ``GNUNET_SETU_OPTION_SYMMETRIC`` option.
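
The *generation information* snapshot mechanism mentioned above for SET,
SETI and SETU alike can be modelled as follows: each element records the
generation at which it was added (and, where removal is allowed, the
generation at which it was removed), and an operation simply remembers
the generation at which it started. This is an illustrative Python
model under those assumptions, not the actual implementation.

```python
class GenerationalSet:
    """A set whose operations see a snapshot without copying the set:
    elements carry add/remove generations, operations carry the
    generation at which they started. Illustrative sketch only."""

    def __init__(self):
        self.generation = 0
        self.added = {}      # element -> generation at which it was added
        self.removed = {}    # element -> generation at which it was removed

    def add(self, element):
        self.generation += 1
        self.added[element] = self.generation
        self.removed.pop(element, None)

    def remove(self, element):
        self.generation += 1
        self.removed[element] = self.generation

    def start_operation(self):
        # An operation only records the current generation.
        return self.generation

    def snapshot(self, op_generation):
        # Elements added at or before the operation's generation and
        # not removed before or at it are visible to the operation.
        return {e for e, g in self.added.items()
                if g <= op_generation
                and self.removed.get(e, float("inf")) > op_generation}

s = GenerationalSet()
s.add("a")
op = s.start_operation()     # operation started now ...
s.add("b")                   # ... later changes are invisible to it
s.remove("a")
assert s.snapshot(op) == {"a"}
assert s.snapshot(s.generation) == {"b"}
```

For SETI and SETU, where elements can only be added, the ``removed`` map
simply stays empty; the snapshot rule is otherwise identical.
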