
The GNUnet Handbook


      1 Subsystems
      2 **********
      3 
This section consists of brief descriptions of the subsystems that make up
GNUnet.
The following image gives an overview of system dependencies and interactions.
      7 
      8 .. image:: /images/gnunet-arch-full.svg
      9 
     10 CADET - Decentralized End-to-end Transport
     11 ==========================================
     12 
     13 The Confidential Ad-hoc Decentralized End-to-end Transport (CADET) subsystem
     14 in GNUnet is responsible for secure end-to-end
     15 communications between nodes in the GNUnet overlay network. CADET builds
     16 on the CORE subsystem, which provides for the link-layer communication,
     17 by adding routing, forwarding, and additional security to the
     18 connections. CADET offers the same cryptographic services as CORE, but
     19 on an end-to-end level. This is done so peers retransmitting traffic on
     20 behalf of other peers cannot access the payload data.
     21 
     22 -  CADET provides confidentiality with so-called perfect forward
     23    secrecy; we use ECDHE powered by Curve25519 for the key exchange and
     24    then use symmetric encryption, encrypting with both AES-256 and
     25    Twofish
     26 
     27 -  authentication is achieved by signing the ephemeral keys using
     28    Ed25519, a deterministic variant of ECDSA
     29 
     30 -  integrity protection (using SHA-512 to do encrypt-then-MAC, although
     31    only 256 bits are sent to reduce overhead)
     32 
     33 -  replay protection (using nonces, timestamps, challenge-response,
     34    message counters and ephemeral keys)
     35 
     36 -  liveness (keep-alive messages, timeout)
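
As an illustration of the integrity-protection point above, the following
Python sketch models encrypt-then-MAC with a SHA-512-based tag truncated to
256 bits. The function names and the use of HMAC are assumptions made for
this example; GNUnet's actual construction differs in its details.

```python
import hashlib
import hmac

def mac_then_truncate(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC tag: authenticate the *ciphertext* with
    HMAC-SHA512, then send only the first 256 bits to reduce
    per-message overhead (illustrative sketch, not GNUnet code)."""
    full_tag = hmac.new(mac_key, ciphertext, hashlib.sha512).digest()
    return full_tag[:32]  # 256 of the 512 bits are transmitted

def verify(mac_key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the truncated tag and compare in constant time."""
    return hmac.compare_digest(mac_then_truncate(mac_key, ciphertext), tag)
```

Truncating the tag halves the MAC overhead per message while still leaving a
256-bit forgery barrier.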
     37 
In addition to the CORE-like security benefits, CADET offers other
properties that make it a more universal service than CORE.
     40 
     41 -  CADET can establish channels to arbitrary peers in GNUnet. If a peer
     42    is not immediately reachable, CADET will find a path through the
     43    network and ask other peers to retransmit the traffic on its behalf.
     44 
     45 -  CADET offers (optional) reliability mechanisms. In a reliable channel
     46    traffic is guaranteed to arrive complete, unchanged and in-order.
     47 
     48 -  CADET takes care of flow and congestion control mechanisms, not
     49    allowing the sender to send more traffic than the receiver or the
     50    network are able to process.
     51 
     52 .. _CORE-Subsystem:
     53 
     54 .. index::
     55    double: CORE; subsystem
     56 
     57 CORE - GNUnet link layer
     58 ========================
     59 
     60 The CORE subsystem in GNUnet is responsible for securing link-layer
     61 communications between nodes in the GNUnet overlay network. CORE builds
     62 on the TRANSPORT subsystem which provides for the actual, insecure,
     63 unreliable link-layer communication (for example, via UDP or WLAN), and
     64 then adds fundamental security to the connections:
     65 
-  confidentiality with so-called perfect forward secrecy; we use
   `ECDHE <http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman>`__
   powered by `Curve25519 <http://cr.yp.to/ecdh.html>`__
   for the key exchange and then use symmetric encryption, encrypting
   with both `AES-256 <http://en.wikipedia.org/wiki/Rijndael>`__ and
   `Twofish <http://en.wikipedia.org/wiki/Twofish>`__

-  `authentication <http://en.wikipedia.org/wiki/Authentication>`__ is
   achieved by signing the ephemeral keys using
   `Ed25519 <http://ed25519.cr.yp.to/>`__, a deterministic variant of
   `ECDSA <http://en.wikipedia.org/wiki/ECDSA>`__

-  integrity protection (using
   `SHA-512 <http://en.wikipedia.org/wiki/SHA-2>`__ to do
   `encrypt-then-MAC <http://en.wikipedia.org/wiki/Authenticated_encryption>`__)

-  `replay <http://en.wikipedia.org/wiki/Replay_attack>`__
   protection (using nonces, timestamps, challenge-response, message
   counters and ephemeral keys)

-  liveness (keep-alive messages, timeout)
     90 
     91 .. _Limitations:
     92 
     93 :index:`Limitations <CORE; limitations>`
     94 Limitations
     95 -----------
     96 
CORE does not perform
`routing <http://en.wikipedia.org/wiki/Routing>`__; using CORE it is
only possible to communicate with peers that happen to already be
"directly" connected with each other. CORE also does not have an API
to allow applications to establish such "direct" connections --- for
this, applications can ask TRANSPORT, but TRANSPORT might not be able to
establish a "direct" connection. The TOPOLOGY subsystem is responsible
for trying to keep a few "direct" connections open at all times.
Applications that need to talk to particular peers should use the CADET
subsystem, as it can establish arbitrary "indirect" connections.
    107 
    108 Because CORE does not perform routing, CORE must only be used directly
    109 by applications that either perform their own routing logic (such as
    110 anonymous file-sharing) or that do not require routing, for example
    111 because they are based on flooding the network. CORE communication is
    112 unreliable and delivery is possibly out-of-order. Applications that
    113 require reliable communication should use the CADET service. Each
    114 application can only queue one message per target peer with the CORE
    115 service at any time; messages cannot be larger than approximately 63
    116 kilobytes. If messages are small, CORE may group multiple messages
    117 (possibly from different applications) prior to encryption. If permitted
    118 by the application (using the `cork <http://baus.net/on-tcp_cork/>`__
    119 option), CORE may delay transmissions to facilitate grouping of multiple
    120 small messages. If cork is not enabled, CORE will transmit the message
    121 as soon as TRANSPORT allows it (TRANSPORT is responsible for limiting
bandwidth and congestion control). CORE does not provide flow control;
applications are expected to process messages at line-speed. If flow
control is needed, applications should use the CADET service.
    125 
    126 .. when is a peer connected
    127 .. _When-is-a-peer-_0022connected_0022_003f:
    128 
When is a peer "connected"?
-----------------------------
    131 
    132 In addition to the security features mentioned above, CORE also provides
    133 one additional key feature to applications using it, and that is a
    134 limited form of protocol-compatibility checking. CORE distinguishes
    135 between TRANSPORT-level connections (which enable communication with
    136 other peers) and application-level connections. Applications using the
    137 CORE API will (typically) learn about application-level connections from
    138 CORE, and not about TRANSPORT-level connections. When a typical
    139 application uses CORE, it will specify a set of message types (from
    140 ``gnunet_protocols.h``) that it understands. CORE will then notify the
    141 application about connections it has with other peers if and only if
    142 those applications registered an intersecting set of message types with
    143 their CORE service. Thus, it is quite possible that CORE only exposes a
    144 subset of the established direct connections to a particular application
    145 --- and different applications running above CORE might see different
    146 sets of connections at the same time.
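
The message-type matching described above can be modeled as a simple set
intersection. This toy function is not the GNUnet API; it only illustrates
when CORE would expose a connection to an application:

```python
def visible_to_app(app_types: set, remote_types: set) -> bool:
    """Toy model of CORE's protocol-compatibility check: a connection
    is exposed to an application only if the message types it
    registered intersect the types registered by the remote peer's
    applications. An empty registration means "monitor everything"
    (as gnunet-core does)."""
    if not app_types:  # no handlers registered: monitoring mode
        return True
    return bool(app_types & remote_types)
```

This also explains why two applications running above the same CORE can see
different sets of connections at the same time.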
    147 
A special case is applications that do not register a handler for any
message type. CORE assumes that these applications merely want to
monitor connections (or "all" messages via other callbacks) and will
notify those applications about all connections. This is used, for
    152 example, by the ``gnunet-core`` command-line tool to display the active
    153 connections. Note that it is also possible that the TRANSPORT service
    154 has more active connections than the CORE service, as the CORE service
    155 first has to perform a key exchange with connecting peers before
    156 exchanging information about supported message types and notifying
applications about the new connection.

.. _Distributed-Hash-Table-_0028DHT_0029:
    159 
    160 .. index::
    161    double: Distributed hash table; subsystem
    162    see: DHT; Distributed hash table
    163 
    164 DHT - Distributed Hash Table
    165 ============================
    166 
    167 GNUnet includes a generic distributed hash table that can be used by
    168 developers building P2P applications in the framework. This section
    169 documents high-level features and how developers are expected to use the
DHT. We have a research paper detailing how the DHT works [R5N2011]_.
Also, Nate's thesis includes a detailed description and performance
analysis (in chapter 6) [EVANS2011]_.
    173 
    174 Key features of GNUnet's DHT include:
    175 
    176 -  stores key-value pairs with values up to (approximately) 63k in size
    177 
    178 -  works with many underlay network topologies (small-world, random
    179    graph), underlay does not need to be a full mesh / clique
    180 
    181 -  support for extended queries (more than just a simple 'key'),
    182    filtering duplicate replies within the network (bloomfilter) and
    183    content validation (for details, please read the subsection on the
    184    block library)
    185 
    186 -  can (optionally) return paths taken by the PUT and GET operations to
    187    the application
    188 
    189 -  provides content replication to handle churn
    190 
    191 GNUnet's DHT is randomized and unreliable. Unreliable means that there
    192 is no strict guarantee that a value stored in the DHT is always found
    193 — values are only found with high probability. While this is somewhat
    194 true in all P2P DHTs, GNUnet developers should be particularly wary of
    195 this fact (this will help you write secure, fault-tolerant code). Thus,
    196 when writing any application using the DHT, you should always consider
    197 the possibility that a value stored in the DHT by you or some other peer
    198 might simply not be returned, or returned with a significant delay. Your
    199 application logic must be written to tolerate this (naturally, some loss
    200 of performance or quality of service is expected in this case).
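
A defensive lookup pattern along these lines can be sketched as follows.
This is illustrative only; ``lookup`` is a caller-supplied stand-in, not the
GNUnet DHT API, and the retry counts and timeouts are arbitrary:

```python
def dht_get_with_retry(lookup, key, attempts=3, base_timeout=2.0):
    """Treat a missing value as a normal outcome: retry the GET with
    growing timeouts, and fall back to None if the DHT never answers.
    `lookup(key, timeout)` is assumed to return the value or None."""
    for i in range(attempts):
        value = lookup(key, base_timeout * (2 ** i))
        if value is not None:
            return value
    return None  # application logic must tolerate this case
```

The essential point is the final ``return None``: the caller must have a code
path for the value never arriving.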
    201 
    202 .. _Block-library-and-plugins:
    203 
    204 Block library and plugins
    205 -------------------------
    206 
    207 .. _What-is-a-Block_003f:
    208 
    209 What is a Block?
    210 ^^^^^^^^^^^^^^^^
    211 
Blocks are small (< 63k) pieces of data stored under a key (a ``struct
GNUNET_HashCode``). Blocks have a type (``enum GNUNET_BlockType``) which
defines their data format. Blocks are used in GNUnet as units of static
    215 data exchanged between peers and stored (or cached) locally. Uses of
    216 blocks include file-sharing (the files are broken up into blocks), the
    217 VPN (DNS information is stored in blocks) and the DHT (all information
    218 in the DHT and meta-information for the maintenance of the DHT are both
    219 stored using blocks). The block subsystem provides a few common
    220 functions that must be available for any type of block.
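
A minimal model of a block, assuming nothing beyond what is stated above,
might look like the following sketch. The enum values are placeholders, not
GNUnet's actual type numbers, and the key derivation shown (SHA-512 of the
data) is a simplifying assumption:

```python
import hashlib
from dataclasses import dataclass
from enum import IntEnum

class BlockType(IntEnum):
    """Stand-in for enum GNUNET_BlockType; values are placeholders."""
    FS_DBLOCK = 1
    DHT_HELLO = 2

@dataclass(frozen=True)
class Block:
    """Stand-in for a GNUnet block: typed data (< 63k) stored under
    a key. struct GNUNET_HashCode is a 64-byte (512-bit) hash."""
    type: BlockType
    data: bytes

    def key(self) -> bytes:
        return hashlib.sha512(self.data).digest()
```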
    221 
    222 
    223 .. [R5N2011] https://bib.gnunet.org/date.html#R5N
    224 .. [EVANS2011] https://d-nb.info/1015129951
    225 .. index:: 
    226    double: File sharing; subsystem
    227    see: FS; File sharing
    228 
    229 .. _File_002dsharing-_0028FS_0029-Subsystem:
    230 
    231 FS — File sharing over GNUnet
    232 =============================
    233 
    234 This chapter describes the details of how the file-sharing service
    235 works. As with all services, it is split into an API (libgnunetfs), the
    236 service process (gnunet-service-fs) and user interface(s). The
    237 file-sharing service uses the datastore service to store blocks and the
    238 DHT (and indirectly datacache) for lookups for non-anonymous
    239 file-sharing. Furthermore, the file-sharing service uses the block
    240 library (and the block fs plugin) for validation of DHT operations.
    241 
    242 In contrast to many other services, libgnunetfs is rather complex since
    243 the client library includes a large number of high-level abstractions;
    244 this is necessary since the FS service itself largely only operates on
    245 the block level. The FS library is responsible for providing a
    246 file-based abstraction to applications, including directories, meta
    247 data, keyword search, verification, and so on.
    248 
The method used by GNUnet to break large files into blocks and to use
keyword search is called the "Encoding for Censorship-Resistant
Sharing" (ECRS). ECRS is largely implemented in the fs library; block
    252 validation is also reflected in the block FS plugin and the FS service.
    253 ECRS on-demand encoding is implemented in the FS service.
    254 
    255 .. note:: The documentation in this chapter is quite incomplete.
    256 
    257 .. _Encoding-for-Censorship_002dResistant-Sharing-_0028ECRS_0029:
    258 
    259 .. index::
    260    see: Encoding for Censorship-Resistant Sharing; ECRS
    261 
    262 :index:`ECRS — Encoding for Censorship-Resistant Sharing <single: ECRS>`
    263 ECRS — Encoding for Censorship-Resistant Sharing
    264 ------------------------------------------------
    265 
    266 When GNUnet shares files, it uses a content encoding that is called
    267 ECRS, the Encoding for Censorship-Resistant Sharing. Most of ECRS is
    268 described in the (so far unpublished) research paper attached to this
    269 page. ECRS obsoletes the previous ESED and ESED II encodings which were
    270 used in GNUnet before version 0.7.0. The rest of this page assumes that
    271 the reader is familiar with the attached paper. What follows is a
    272 description of some minor extensions that GNUnet makes over what is
    273 described in the paper. The reason why these extensions are not in the
    274 paper is that we felt that they were obvious or trivial extensions to
    275 the original scheme and thus did not warrant space in the research
    276 report.
    277 
    278 .. todo:: Find missing link to file system paper.
    279 
    280 .. index::
    281    double: GNU Name System; subsystem
    282    see: GNS; GNU Name System
    283 
    284 .. _GNU-Name-System-_0028GNS_0029:
    285 
GNS - the GNU Name System
    287 -------------------------
    288 
    289 The GNU Name System (GNS) is a decentralized database that enables users
    290 to securely resolve names to values. Names can be used to identify other
    291 users (for example, in social networking), or network services (for
    292 example, VPN services running at a peer in GNUnet, or purely IP-based
    293 services on the Internet). Users interact with GNS by typing in a
    294 hostname that ends in a top-level domain that is configured in the "GNS"
    295 section, matches an identity of the user or ends in a Base32-encoded
    296 public key.
    297 
Videos giving an overview of GNS and the motivations behind it are
available here and here. The remainder of this chapter targets
developers that are familiar with the high-level concepts of GNS as
presented in these talks.
    302 
    303 .. todo:: Link to videos and GNS talks?
    304 
    305 GNS-aware applications should use the GNS resolver to obtain the
    306 respective records that are stored under that name in GNS. Each record
    307 consists of a type, value, expiration time and flags.
    308 
The type specifies the format of the value. Types below 65536 correspond
to DNS record types; larger values are used for GNS-specific records.
    311 Applications can define new GNS record types by reserving a number and
    312 implementing a plugin (which mostly needs to convert the binary value
    313 representation to a human-readable text format and vice-versa). The
    314 expiration time specifies how long the record is to be valid. The GNS
    315 API ensures that applications are only given non-expired values. The
    316 flags are typically irrelevant for applications, as GNS uses them
    317 internally to control visibility and validity of records.
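
For illustration, the type-number split and the text conversion a
record-type plugin performs boil down to something like the sketch below.
The function names are invented for this example; only the 65536 boundary
and DNS type 1 (A, a 4-byte IPv4 address) are taken from established facts:

```python
def is_dns_record_type(record_type: int) -> bool:
    """Types below 65536 are DNS record types; larger values are
    GNS-specific (mirrors the rule stated in the text)."""
    return record_type < 65536

def a_record_to_text(value: bytes) -> str:
    """What a plugin does for DNS type 1 (A): convert the 4-byte
    binary value to its human-readable dotted-quad form."""
    if len(value) != 4:
        raise ValueError("A records carry a 4-byte IPv4 address")
    return ".".join(str(octet) for octet in value)
```

A real plugin also implements the reverse direction (text back to the binary
value) so that records can be entered by hand.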
    318 
    319 Records are stored along with a signature. The signature is generated
    320 using the private key of the authoritative zone. This allows any GNS
    321 resolver to verify the correctness of a name-value mapping.
    322 
    323 Internally, GNS uses the NAMECACHE to cache information obtained from
    324 other users, the NAMESTORE to store information specific to the local
    325 users, and the DHT to exchange data between users. A plugin API is used
    326 to enable applications to define new GNS record types.
    327 
    328 .. index::
    329    single: GNS; name cache
    330    double: subsystem; NAMECACHE
    331 
    332 .. _GNS-Namecache:
    333 
    334 NAMECACHE — DHT caching of GNS results
    335 ======================================
    336 
    337 The NAMECACHE subsystem is responsible for caching (encrypted)
    338 resolution results of the GNU Name System (GNS). GNS makes zone
    339 information available to other users via the DHT. However, as accessing
    340 the DHT for every lookup is expensive (and as the DHT's local cache is
    341 lost whenever the peer is restarted), GNS uses the NAMECACHE as a more
    342 persistent cache for DHT lookups. Thus, instead of always looking up
    343 every name in the DHT, GNS first checks if the result is already
    344 available locally in the NAMECACHE. Only if there is no result in the
    345 NAMECACHE, GNS queries the DHT. The NAMECACHE stores data in the same
    346 (encrypted) format as the DHT. It thus makes no sense to iterate over
    347 all items in the NAMECACHE – the NAMECACHE does not have a way to
    348 provide the keys required to decrypt the entries.
    349 
    350 Blocks in the NAMECACHE share the same expiration mechanism as blocks in
the DHT – the block expires whenever any of the records in the
    352 (encrypted) block expires. The expiration time of the block is the only
    353 information stored in plaintext. The NAMECACHE service internally
    354 performs all of the required work to expire blocks, clients do not have
    355 to worry about this. Also, given that NAMECACHE stores only GNS blocks
    356 that local users requested, there is no configuration option to limit
    357 the size of the NAMECACHE. It is assumed to be always small enough (a
    358 few MB) to fit on the drive.
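
The expiration rule above can be stated in one line. This sketch assumes
expiration times are given as plain numbers (e.g. UNIX timestamps):

```python
def block_expiration(record_expirations):
    """A block expires whenever *any* record in it expires, i.e. at
    the earliest record expiration time. This value is the only
    information stored in plaintext alongside the encrypted block."""
    return min(record_expirations)
```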
    359 
    360 The NAMECACHE supports the use of different database backends via a
    361 plugin API.
    362 
    363 .. index:: 
    364    double: subsystem; NAMESTORE
    365 
    366 .. _NAMESTORE-Subsystem:
    367 
    368 NAMESTORE — Storage of local GNS zones
    369 ======================================
    370 
The NAMESTORE subsystem provides persistent storage for local GNS zone
information. All local GNS zone information is managed by NAMESTORE. It
provides both the functionality to administer local GNS information
(e.g. delete and add records) as well as to retrieve GNS information
(e.g. to list name information in a client). NAMESTORE only manages the
persistent storage of zone information belonging to the user running
the service: GNS information from other users obtained from the DHT is
stored by the NAMECACHE subsystem.
    379 
NAMESTORE uses a plugin-based database backend to store GNS information
with good performance. Currently, sqlite and PostgreSQL are supported as
database backends. NAMESTORE clients interact with the IDENTITY
subsystem to obtain cryptographic information about zones based on egos
as described with the IDENTITY subsystem, but internally NAMESTORE
refers to zones using the respective private key.
    386 
NAMESTORE is queried and monitored by the ZONEMASTER service, which
periodically publishes the public records of GNS zones. ZONEMASTER also
collaborates with the NAMECACHE subsystem: when local information is
modified, zone information is stored in the NAMECACHE cache to increase
look-up performance for local information and to enable local access to
private records in zones through GNS.
    393 
NAMESTORE provides functionality to look up and store records, to
iterate over a specific zone or all zones, and to monitor zones for
changes. NAMESTORE functionality can be accessed using the NAMESTORE C
API, the NAMESTORE REST API, or the NAMESTORE command line tool.
    398 
    399 .. index::
    400    double: HOSTLIST; subsystem
    401 
    402 .. _HOSTLIST-Subsystem:
    403 
    404 HOSTLIST — HELLO bootstrapping and gossip
    405 =========================================
    406 
Peers in the GNUnet overlay network need address information so that
they can connect with other peers. GNUnet uses so-called HELLO messages
to store and exchange peer addresses. GNUnet provides several methods
for peers to obtain this information:
    411 
    412 -  out-of-band exchange of HELLO messages (manually, using for example
    413    gnunet-core)
    414 
    415 -  HELLO messages shipped with GNUnet (automatic with distribution)
    416 
    417 -  UDP neighbor discovery in LAN (IPv4 broadcast, IPv6 multicast)
    418 
    419 -  topology gossiping (learning from other peers we already connected
    420    to), and
    421 
    422 -  the HOSTLIST daemon covered in this section, which is particularly
    423    relevant for bootstrapping new peers.
    424 
    425 New peers have no existing connections (and thus cannot learn from
    426 gossip among peers), may not have other peers in their LAN and might be
    427 started with an outdated set of HELLO messages from the distribution. In
    428 this case, getting new peers to connect to the network requires either
    429 manual effort or the use of a HOSTLIST to obtain HELLOs.
    430 
    431 .. _HELLOs:
    432 
    433 HELLOs
    434 ------
    435 
The basic information peers require to connect to other peers is
contained in so-called HELLO messages, which you can think of as a
business card. Besides the identity of the peer (based on its
cryptographic public key), a HELLO message may contain address
information that specifies ways to contact the peer. By obtaining HELLO
messages, a peer can learn how to contact other peers.
    442 
    443 .. _Overview-for-the-HOSTLIST-subsystem:
    444 
    445 Overview for the HOSTLIST subsystem
    446 -----------------------------------
    447 
The HOSTLIST subsystem provides a way to distribute and obtain contact
information needed to connect to other peers using a simple HTTP GET
request. Its implementation is split into three parts: the main file
for the daemon itself (``gnunet-daemon-hostlist.c``), the HTTP client used
to download peer information (``hostlist-client.c``) and the server
component used to provide this information to other peers
(``hostlist-server.c``). The server is basically a small HTTP web server
(based on GNU libmicrohttpd) which provides a list of HELLOs known to
the local peer for download. The client component is basically an HTTP
client (based on libcurl) which can download hostlists from one or more
websites. The hostlist format is a binary blob containing a sequence of
HELLO messages. Note that while any HTTP server can theoretically serve
a hostlist, the built-in hostlist server simply makes it convenient to
offer this service.
    462 
    463 .. _Features:
    464 
    465 Features
    466 ^^^^^^^^
    467 
    468 The HOSTLIST daemon can:
    469 
-  provide HELLO messages with validated addresses obtained from
   PEERINFO for download by other peers

-  download HELLO messages and forward these messages to the TRANSPORT
   subsystem for validation

-  advertise the URL of this peer's hostlist to other peers via
   gossip

-  automatically learn about hostlist servers from the gossip of other
   peers
    481 
    482 .. _HOSTLIST-_002d-Limitations:
    483 
    484 HOSTLIST - Limitations
    485 ^^^^^^^^^^^^^^^^^^^^^^
    486 
    487 The HOSTLIST daemon does not:
    488 
    489 -  verify the cryptographic information in the HELLO messages
    490 
    491 -  verify the address information in the HELLO messages
    492 
    493 .. _Interacting-with-the-HOSTLIST-daemon:
    494 
    495 Interacting with the HOSTLIST daemon
    496 ------------------------------------
    497 
    498 The HOSTLIST subsystem is currently implemented as a daemon, so there is
    499 no need for the user to interact with it and therefore there is no
    500 command line tool and no API to communicate with the daemon. In the
    501 future, we can envision changing this to allow users to manually trigger
    502 the download of a hostlist.
    503 
    504 Since there is no command line interface to interact with HOSTLIST, the
    505 only way to interact with the hostlist is to use STATISTICS to obtain or
    506 modify information about the status of HOSTLIST:
    507 
    508 ::
    509 
    510    $ gnunet-statistics -s hostlist
    511 
In particular, HOSTLIST includes a **persistent** value in statistics
that specifies when the hostlist server might be queried next. As this
value increases exponentially during runtime, developers may want to
reset or manually adjust it. Note that HOSTLIST (but not STATISTICS)
needs to be shut down if changes to this value are to have any effect on
the daemon (as HOSTLIST does not monitor STATISTICS for changes to the
download frequency).
    519 
    520 .. _Hostlist-security-address-validation:
    521 
    522 Hostlist security address validation
    523 ------------------------------------
    524 
    525 Since information obtained from other parties cannot be trusted without
    526 validation, we have to distinguish between *validated* and *not
    527 validated* addresses. Before using (and so trusting) information from
    528 other parties, this information has to be double-checked (validated).
    529 Address validation is not done by HOSTLIST but by the TRANSPORT service.
    530 
    531 The HOSTLIST component is functionally located between the PEERINFO and
    532 the TRANSPORT subsystem. When acting as a server, the daemon obtains
    533 valid (*validated*) peer information (HELLO messages) from the PEERINFO
    534 service and provides it to other peers. When acting as a client, it
contacts the HOSTLIST servers specified in the configuration, downloads
the (unvalidated) list of HELLO messages and forwards this information
to the TRANSPORT service to validate the addresses.
    538 
    539 .. _The-HOSTLIST-daemon:
    540 
    541 :index:`The HOSTLIST daemon <double: daemon; HOSTLIST>`
    542 The HOSTLIST daemon
    543 -------------------
    544 
    545 The hostlist daemon is the main component of the HOSTLIST subsystem. It
    546 is started by the ARM service and (if configured) starts the HOSTLIST
    547 client and server components.
    548 
If the daemon provides a hostlist itself, it can advertise its own
hostlist to other peers. To do so, it sends a
``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT`` message to other peers
when they connect to this peer on the CORE level. This hostlist
advertisement message contains the URL to access the HOSTLIST HTTP
server of the sender. The daemon may also subscribe to this type of
message from the CORE service, and then forward these kinds of messages
to the HOSTLIST client. The client then uses all available URLs to
download peer information when necessary.
    559 
When starting, the HOSTLIST daemon first connects to the CORE subsystem
and, if hostlist learning is enabled, registers a CORE handler to
receive this kind of message. Next it starts (if configured) the client
and server. It passes pointers to CORE connect, disconnect and receive
handlers where the client and server store their functions, so the
daemon can notify them about CORE events.
    566 
    567 To clean up on shutdown, the daemon has a cleaning task, shutting down
    568 all subsystems and disconnecting from CORE.
    569 
    570 .. _The-HOSTLIST-server:
    571 
    572 :index:`The HOSTLIST server <single: HOSTLIST; server>`
    573 The HOSTLIST server
    574 -------------------
    575 
    576 The server provides a way for other peers to obtain HELLOs. Basically it
    577 is a small web server other peers can connect to and download a list of
    578 HELLOs using standard HTTP; it may also advertise the URL of the
    579 hostlist to other peers connecting on CORE level.
    580 
    581 .. _The-HTTP-Server:
    582 
    583 The HTTP Server
    584 ^^^^^^^^^^^^^^^
    585 
    586 During startup, the server starts a web server listening on the port
    587 specified with the HTTPPORT value (default 8080). In addition it
    588 connects to the PEERINFO service to obtain peer information. The
    589 HOSTLIST server uses the GNUNET_PEERINFO_iterate function to request
    590 HELLO information for all peers and adds their information to a new
    591 hostlist if they are suitable (expired addresses and HELLOs without
    592 addresses are both not suitable) and the maximum size for a hostlist is
    593 not exceeded (MAX_BYTES_PER_HOSTLISTS = 500000). When PEERINFO finishes
    594 (with a last NULL callback), the server destroys the previous hostlist
    595 response available for download on the web server and replaces it with
    596 the updated hostlist. The hostlist format is basically a sequence of
    597 HELLO messages (as obtained from PEERINFO) without any special
    598 tokenization. Since each HELLO message contains a size field, the
    599 response can easily be split into separate HELLO messages by the client.
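
Assuming the standard 4-byte GNUnet message header (a 16-bit size followed
by a 16-bit type, both in network byte order, where the size includes the
header itself), the client-side split can be sketched as follows. This is a
simplified sketch; the real client also inspects the message type:

```python
import struct

def split_hostlist(blob: bytes) -> list:
    """Split a downloaded hostlist into individual messages using the
    size field that starts every GNUnet message header."""
    messages, offset = [], 0
    while offset + 4 <= len(blob):
        (size,) = struct.unpack_from("!H", blob, offset)
        if size < 4 or offset + size > len(blob):
            break  # truncated or corrupt tail: stop parsing
        messages.append(blob[offset:offset + size])
        offset += size
    return messages
```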
    600 
    601 A HOSTLIST client connecting to the HOSTLIST server will receive the
    602 hostlist as an HTTP response and the server will terminate the
    603 connection with the result code ``HTTP 200 OK``. The connection will be
    604 closed immediately if no hostlist is available.
    605 
    606 .. _Advertising-the-URL:
    607 
    608 Advertising the URL
    609 ^^^^^^^^^^^^^^^^^^^
    610 
    611 The server also advertises the URL to download the hostlist to other
    612 peers if hostlist advertisement is enabled. When a new peer connects and
    613 has hostlist learning enabled, the server sends a
    614 ``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT`` message to this peer
    615 using the CORE service.
    616 
.. _The-HOSTLIST-client:
    619 
    620 The HOSTLIST client
    621 -------------------
    622 
    623 The client provides the functionality to download the list of HELLOs
    624 from a set of URLs. It performs a standard HTTP request to the URLs
    625 configured and learned from advertisement messages received from other
    626 peers. When a HELLO is downloaded, the HOSTLIST client forwards the
    627 HELLO to the TRANSPORT service for validation.
    628 
    629 The client supports two modes of operation:
    630 
    631 -  download of HELLOs (bootstrapping)
    632 
    633 -  learning of URLs
    634 
    635 .. _Bootstrapping:
    636 
    637 Bootstrapping
    638 ^^^^^^^^^^^^^
    639 
For bootstrapping, it schedules a task to download the hostlist from the
set of known URLs. The downloads are only performed if the number of
current connections is smaller than a minimum number of connections (at
the moment 4). The interval between downloads increases exponentially;
however, once it exceeds one hour, the exponential growth is limited and
the interval is capped at (number of connections \* 1h).
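
The schedule above can be sketched as follows; the doubling step, the
one-hour threshold and the cap mirror the description, while the
function name and the numeric arguments used below are illustrative
assumptions, not the actual HOSTLIST code:

```python
def next_download_interval(current: float, connections: int,
                           hour: float = 3600.0) -> float:
    """Compute the next hostlist download interval in seconds.

    The interval doubles each round (exponential backoff); once it
    would exceed one hour, it is capped at (connections * 1h).
    """
    interval = current * 2  # exponential growth
    if interval > hour:
        interval = min(interval, max(connections, 1) * hour)
    return interval
```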
    647 
Once the decision has been taken to download HELLOs, the daemon chooses
a random URL from the list of known URLs. URLs can be configured in the
configuration or be learned from advertisement messages. The client uses
an HTTP client library (libcurl) to initiate the download using the
libcurl multi interface. Libcurl passes the data to the
callback_download function, which stores the data in a buffer if space
is available and the maximum size for a hostlist download is not
exceeded (MAX_BYTES_PER_HOSTLISTS = 500000). When a full HELLO has been
downloaded, the HOSTLIST client offers this HELLO message to the
TRANSPORT service for validation. When the download finishes or fails,
statistical information about the quality of this URL is updated.
    659 
    660 .. _Learning:
    661 
    662 :index:`Learning <single: HOSTLIST; learning>`
    663 Learning
    664 ^^^^^^^^
    665 
The client also manages hostlist advertisements from other peers. The
HOSTLIST daemon forwards ``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT``
messages to the client subsystem, which extracts the URL from the
message. Next, the newly obtained URL is tested by triggering a
download from it. If the URL works correctly, it is added to the list
of working URLs.
    672 
The size of the list of URLs is restricted, so if an additional server
is added and the list is full, the URL with the worst quality ranking
(determined, for example, through the number of successful downloads
and HELLOs obtained) is discarded. During shutdown the list of URLs is
saved to a file for persistence and loaded on startup. URLs from the
configuration file are never discarded.
    679 
    680 .. _Usage:
    681 
    682 Usage
    683 -----
    684 
    685 To start HOSTLIST by default, it has to be added to the DEFAULTSERVICES
    686 section for the ARM services. This is done in the default configuration.
    687 
For more information on how to configure the HOSTLIST subsystem, see
the installation handbook sections \"Configuring the hostlist to
bootstrap\" and \"Configuring your peer to provide a hostlist\".
    691 
    692 .. index::
    693    double: IDENTITY; subsystem 
    694 
    695 .. _IDENTITY-Subsystem:
    696 
    697 IDENTITY — Ego management
    698 =========================
    699 
    700 Identities of \"users\" in GNUnet are called egos. Egos can be used as
    701 pseudonyms (\"fake names\") or be tied to an organization (for example,
    702 \"GNU\") or even the actual identity of a human. GNUnet users are
    703 expected to have many egos. They might have one tied to their real
    704 identity, some for organizations they manage, and more for different
    705 domains where they want to operate under a pseudonym.
    706 
The IDENTITY service allows users to manage their egos. The identity
service manages the private keys of the egos of the local user; it does
not manage identities of other users (public keys). Public keys for
other users need names to become manageable. GNUnet uses the GNU Name
System (GNS) to give names to other users and manage their public keys
securely. This chapter is about the IDENTITY service, which is about
the management of private keys.
    714 
    715 On the network, an ego corresponds to an ECDSA key (over Curve25519,
    716 using RFC 6979, as required by GNS). Thus, users can perform actions
    717 under a particular ego by using (signing with) a particular private key.
    718 Other users can then confirm that the action was really performed by
    719 that ego by checking the signature against the respective public key.
    720 
The IDENTITY service allows users to associate a human-readable name
with each ego. This way, users can use names that will remind them of
the purpose of a particular ego. The IDENTITY service stores the
respective private keys and allows applications to access key
information by name. Users can change the name that is locally (!)
associated with an ego. Egos can also be deleted, which means that the
private key will be removed and it will thus not be possible to perform
actions with that ego in the future.
    729 
    730 Additionally, the IDENTITY subsystem can associate service functions
    731 with egos. For example, GNS requires the ego that should be used for the
    732 shorten zone. GNS will ask IDENTITY for an ego for the \"gns-short\"
    733 service. The IDENTITY service has a mapping of such service strings to
    734 the name of the ego that the user wants to use for this service, for
    735 example \"my-short-zone-ego\".
    736 
    737 Finally, the IDENTITY API provides access to a special ego, the
    738 anonymous ego. The anonymous ego is special in that its private key is
    739 not really private, but fixed and known to everyone. Thus, anyone can
    740 perform actions as anonymous. This can be useful as with this trick,
    741 code does not have to contain a special case to distinguish between
    742 anonymous and pseudonymous egos.
    743 
    744 .. index::
    745    double: subsystem; MESSENGER
    746 
    747 .. _MESSENGER-Subsystem:
    748 
    749 MESSENGER — Room-based end-to-end messaging 
    750 ===========================================
    751 
    752 The MESSENGER subsystem is responsible for secure end-to-end
    753 communication in groups of nodes in the GNUnet overlay network.
    754 MESSENGER builds on the CADET subsystem which provides a reliable and
    755 secure end-to-end communication between the nodes inside of these
    756 groups.
    757 
In addition to the CADET security benefits, MESSENGER provides the
following properties designed for application-level usage:

-  MESSENGER provides integrity by signing the messages with the
   user's provided ego

-  MESSENGER adds (optional) forward secrecy by replacing the key pair
   of the used ego and signing the propagation of the new one with the
   old one (chaining egos)

-  MESSENGER provides verification of the original sender by checking
   against all used egos of a member which are currently in active use
   (active use depends on the state of a member session)
    771 
-  MESSENGER offers (optional) decentralized message forwarding between
   all nodes in a group to improve availability and prevent
   man-in-the-middle (MITM) attacks

-  MESSENGER handles new connections and disconnections from nodes in
   the group by reconnecting them while preserving an efficient
   structure for message distribution (ensuring availability and
   accountability)

-  MESSENGER provides replay protection (messages can be uniquely
   identified via SHA-512, include a timestamp and the hash of the last
   message)

-  MESSENGER allows detection of dropped messages by chaining them
   (messages refer to the last message by their hash), improving
   accountability

-  MESSENGER allows requesting messages from other peers explicitly to
   ensure availability

-  MESSENGER provides confidentiality by padding messages to a few
   discrete sizes (512 bytes, 4096 bytes, 32768 bytes and the maximal
   message size from CADET)

-  MESSENGER adds (optional) confidentiality with ECDHE to exchange and
   use symmetric encryption, encrypting with both AES-256 and Twofish
   but allowing only selected members to decrypt (using the receiver's
   ego for ECDHE)
    798 
MESSENGER also provides multiple features with privacy in mind:

-  MESSENGER allows the original sender to delete messages from all
   peers in the group (using the MESSENGER-provided verification)

-  MESSENGER allows using the publicly known anonymous ego instead of
   any uniquely identifying ego

-  MESSENGER allows your node to decide between acting as host of the
   used messaging room (sharing your peer's identity with all nodes in
   the group) or acting as guest (sharing your peer's identity only with
   the nodes you explicitly open a connection to)

-  MESSENGER handles members independently of the peer's identity,
   making forwarded messages indistinguishable from directly received
   ones (complicating the tracking of messages and the identification
   of their origin)

-  MESSENGER allows member names to be non-unique (names are also
   optional)

-  MESSENGER does not include information about the selected receiver of
   an explicitly encrypted message in its header, making it harder for
   other members to draw conclusions about communication partners
    822 
    823 
    824 
    825 .. index::
    826    single: subsystem; Network size estimation
    827    see: NSE; Network size estimation
    828 
    829 .. _NSE-Subsystem:
    830 
    831 NSE — Network size estimation
    832 =============================
    833 
    834 NSE stands for Network Size Estimation. The NSE subsystem provides other
    835 subsystems and users with a rough estimate of the number of peers
    836 currently participating in the GNUnet overlay. The computed value is not
    837 a precise number as producing a precise number in a decentralized,
    838 efficient and secure way is impossible. While NSE's estimate is
    839 inherently imprecise, NSE also gives the expected range. For a peer that
    840 has been running in a stable network for a while, the real network size
    841 will typically (99.7% of the time) be in the range of [2/3 estimate, 3/2
    842 estimate]. We will now give an overview of the algorithm used to
    843 calculate the estimate; all of the details can be found in this
    844 technical report.
    845 
    846 .. todo:: link to the report.
    847 
    848 .. _Motivation:
    849 
    850 Motivation
    851 ----------
    852 
Some subsystems, like DHT, need to know the size of the GNUnet network
to optimize some parameters of their own protocol. The decentralized
nature of GNUnet makes efficiently and securely counting the exact
number of peers infeasible. Although there are several decentralized
algorithms to count the number of peers in a system, so far there is
none that does so securely. Other protocols may allow any malicious
peer to manipulate the final result or to take advantage of the system
to perform Denial of Service (DoS) attacks against the network.
GNUnet's NSE protocol avoids these drawbacks.
    862 
    864 .. _Security:
    865 
    866 :index:`Security <single: NSE; security>`
    867 Security
    868 ^^^^^^^^
    869 
    870 The NSE subsystem is designed to be resilient against these attacks. It
    871 uses `proofs of
    872 work <http://en.wikipedia.org/wiki/Proof-of-work_system>`__ to prevent
    873 one peer from impersonating a large number of participants, which would
    874 otherwise allow an adversary to artificially inflate the estimate. The
    875 DoS protection comes from the time-based nature of the protocol: the
    876 estimates are calculated periodically and out-of-time traffic is either
    877 ignored or stored for later retransmission by benign peers. In
    878 particular, peers cannot trigger global network communication at will.
    879 
    880 .. _Principle:
    881 
    882 :index:`Principle <single: NSE; principle of operation>`
    883 Principle
    884 ---------
    885 
    886 The algorithm calculates the estimate by finding the globally closest
    887 peer ID to a random, time-based value.
    888 
    889 The idea is that the closer the ID is to the random value, the more
    890 \"densely packed\" the ID space is, and therefore, more peers are in the
    891 network.
    892 
    893 .. _Example:
    894 
    895 Example
    896 ^^^^^^^
    897 
Suppose all peers have IDs between 0 and 100 (our ID space), and the
random value is 42. If the closest peer has the ID 70, we can imagine
that the average \"distance\" between peers is around 30 and therefore
there are around 3 peers in the whole ID space. On the other hand, if
the closest peer has the ID 44, we can imagine that the space is rather
packed with peers, maybe as many as 50 of them. Naturally, we could
have been rather unlucky, and there is only one peer that happens to
have the ID 44. Thus, the current estimate is calculated as the average
over multiple rounds, and not just a single sample.
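
The intuition of the example can be captured in a toy calculation. The
real NSE formula works on matching ID-prefix bits and is given in the
technical report; this sketch only mirrors the example's reasoning:

```python
def toy_estimate(space_size: int, target: int, closest_id: int) -> float:
    """Toy single-round network size estimate.

    If the closest known peer is at distance d from the target value,
    peers are spaced roughly d apart on average, so approximately
    space_size / d peers exist in the whole ID space.
    """
    distance = abs(closest_id - target)
    return space_size / max(distance, 1)
```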
    907 
    908 .. _Algorithm:
    909 
    910 Algorithm
    911 ^^^^^^^^^
    912 
    913 Given that example, one can imagine that the job of the subsystem is to
    914 efficiently communicate the ID of the closest peer to the target value
    915 to all the other peers, who will calculate the estimate from it.
    916 
    917 .. _Target-value:
    918 
    919 Target value
    920 ^^^^^^^^^^^^
    921 
The target value itself is generated by hashing the current time,
rounded down to an agreed value. If the rounding amount is 1h (default)
and the time is 12:34:56, the time to hash would be 12:00:00. The
process is repeated every rounding amount (in this example, every
hour). Every repetition is called a round.
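
A sketch of this derivation; only the rounding step mirrors the
description, while the exact encoding that GNUnet actually hashes is an
assumption here:

```python
import hashlib

def round_target(timestamp: int, rounding: int = 3600) -> bytes:
    """Derive a round's target value.

    The timestamp (seconds) is rounded down to the agreed granularity
    (1h by default) and hashed, so all peers in the same round agree
    on the same target.
    """
    rounded = timestamp - (timestamp % rounding)
    return hashlib.sha512(str(rounded).encode()).digest()
```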
    927 
    928 .. _Timing:
    929 
    930 Timing
    931 ^^^^^^
    932 
The NSE subsystem has some timing control to avoid everybody
broadcasting its ID all at once. Once each peer has the target random
value, it compares its own ID to the target and calculates the
hypothetical size of the network if that peer were to be the closest.
Then it compares the hypothetical size with the estimate from the
previous rounds. For each value there is an associated point in the
period, let's call it \"broadcast time\". If its own hypothetical
estimate is the same as the previous global estimate, its \"broadcast
time\" will be in the middle of the round. If it is bigger, it will be
earlier, and if it is smaller (the most likely case), it will be later.
This ensures that the peers closest to the target value start
broadcasting their ID first.
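
The ordering described above can be illustrated with a toy mapping from
a peer's hypothetical estimate to its broadcast time within the round.
The scaling factor below is arbitrary and the real NSE timing function
differs; this sketch only reproduces the mid/earlier/later ordering:

```python
import math

def broadcast_offset(hypothetical: float, previous: float,
                     round_len: float = 3600.0) -> float:
    """Seconds into the round at which a peer broadcasts.

    Equal to the previous estimate -> mid-round; a larger hypothetical
    size -> earlier; a smaller one -> later. Offsets are clamped to
    the round length.
    """
    mid = round_len / 2
    # log-scale difference, arbitrarily scaled to fit the round
    shift = mid * (math.log2(previous) - math.log2(hypothetical)) / 32.0
    return min(max(mid + shift, 0.0), round_len)
```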
    945 
    946 .. _Controlled-Flooding:
    947 
    948 Controlled Flooding
    949 ^^^^^^^^^^^^^^^^^^^
    950 
    951 When a peer receives a value, first it verifies that it is closer than
    952 the closest value it had so far, otherwise it answers the incoming
    953 message with a message containing the better value. Then it checks a
    954 proof of work that must be included in the incoming message, to ensure
    955 that the other peer's ID is not made up (otherwise a malicious peer
    956 could claim to have an ID of exactly the target value every round). Once
    957 validated, it compares the broadcast time of the received value with the
    958 current time and if it's not too early, sends the received value to its
    959 neighbors. Otherwise it stores the value until the correct broadcast
    960 time comes. This prevents unnecessary traffic of sub-optimal values,
    961 since a better value can come before the broadcast time, rendering the
    962 previous one obsolete and saving the traffic that would have been used
    963 to broadcast it to the neighbors.
    964 
    965 .. _Calculating-the-estimate:
    966 
    967 Calculating the estimate
    968 ^^^^^^^^^^^^^^^^^^^^^^^^
    969 
Once the closest ID has been spread across the network, each peer gets
the exact distance between this ID and the target value of the round
and calculates the estimate with a mathematical formula described in
the tech report. The estimate generated with this method for a single
round is not very precise. Remember the case of the example, where the
only peer has the ID 44 and we happen to generate the target value 42,
concluding there are 50 peers in the network. Therefore, the NSE
subsystem remembers the last 64 estimates and calculates an average
over them, giving a result which usually has one bit of uncertainty
(the real size could be half of the estimate or twice as much). Note
that the actual network size is calculated in powers of two of the raw
input; thus one bit of uncertainty means a factor of two in the size
estimate.
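
The averaging step can be sketched as follows, averaging the round
estimates on a log2 scale (consistent with the powers-of-two remark
above); the function name and the window handling are illustrative:

```python
import math

def combined_estimate(round_estimates: list[float]) -> float:
    """Combine the last 64 single-round estimates into one value.

    Averaging happens in the exponent (log2 of each round estimate),
    so one bit of uncertainty in the averaged exponent corresponds to
    a factor of two in the final network size.
    """
    recent = round_estimates[-64:]
    mean_bits = sum(math.log2(e) for e in recent) / len(recent)
    return 2 ** mean_bits
```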
    982 
    983 .. index::
    984    double: subsystem; PEERINFO
    985 
    986 .. _PEERINFO-Subsystem:
    987 
    988 PEERINFO — Persistent HELLO storage
    989 ===================================
    990 
The PEERINFO subsystem is used to store verified (validated) information
about known peers in a persistent way. It obtains these addresses, for
example, from the TRANSPORT service, which is in charge of address
validation. Validation means that the information in the HELLO message
is checked by connecting to the addresses and performing a
cryptographic handshake to authenticate the peer instance claiming to
be reachable with these addresses. PEERINFO does not validate the HELLO
messages itself but only stores them and gives them to interested
clients.
    999 
As future work, we think about moving from storing just HELLO messages
to providing a generic persistent per-peer information store. More and
more subsystems tend to need to store per-peer information in a
persistent way. To avoid duplicating this functionality, we plan to
provide a PEERSTORE service offering it.
   1005 
   1006 .. _PEERINFO-_002d-Features:
   1007 
   1008 PEERINFO - Features
   1009 -------------------
   1010 
   1011 -  Persistent storage
   1012 
   1013 -  Client notification mechanism on update
   1014 
   1015 -  Periodic clean up for expired information
   1016 
   1017 -  Differentiation between public and friend-only HELLO
   1018 
   1019 .. _PEERINFO-_002d-Limitations:
   1020 
   1021 PEERINFO - Limitations
   1022 ----------------------
   1023 
   1024 -  Does not perform HELLO validation
   1025 
   1026 .. _DeveloperPeer-Information:
   1027 
Peer Information
----------------
   1030 
The PEERINFO subsystem stores this information in the form of HELLO
messages that you can think of as business cards. These HELLO messages
contain the public key of a peer and the addresses a peer can be
reached under. The addresses include an expiration date describing how
long they are valid. This information is updated regularly by the
TRANSPORT service by revalidating the addresses. If an address is
expired and not renewed, it can be removed from the HELLO message.
   1038 
Some peers do not want to have their HELLO messages distributed to
other peers, especially when GNUnet's friend-to-friend mode is enabled.
To prevent this undesired distribution, PEERINFO distinguishes between
*public* and *friend-only* HELLO messages. Public HELLO messages can be
freely distributed to other (possibly unknown) peers (for example using
the hostlist, gossiping, or broadcasting), whereas friend-only HELLO
messages may not be distributed to other peers. Friend-only HELLO
messages have an additional flag ``friend_only`` set internally. For
public HELLO messages this flag is not set. PEERINFO does not and
cannot check whether a client is allowed to obtain a specific HELLO
type.
   1049 
The HELLO messages can be managed using the GNUnet HELLO library. Other
GNUnet subsystems can obtain this information from PEERINFO and use it
for their purposes. Clients are, for example, the HOSTLIST component,
providing this information to other peers in the form of a hostlist, or
the TRANSPORT subsystem, using this information to maintain connections
to other peers.
   1056 
   1057 .. _Startup:
   1058 
   1059 Startup
   1060 -------
   1061 
During startup the PEERINFO service loads persistent HELLOs from disk.
First PEERINFO parses the directory configured in the HOSTS value of
the ``PEERINFO`` configuration section, where PEERINFO information is
stored. For all files found in this directory, valid HELLO messages are
extracted. In addition, it loads HELLO messages shipped with the GNUnet
distribution. These HELLOs are used to simplify network bootstrapping
by providing valid peer information with the distribution. The use of
these HELLOs can be prevented by setting ``USE_INCLUDED_HELLOS`` in the
``PEERINFO`` configuration section to ``NO``. Files containing invalid
information are removed.
   1072 
   1073 .. _Managing-Information:
   1074 
   1075 Managing Information
   1076 --------------------
   1077 
The PEERINFO service stores information about known peers and a single
HELLO message for every peer. A peer does not need to have a HELLO if
no information is available. HELLO information from different sources,
for example a HELLO obtained from a remote HOSTLIST and a second HELLO
stored on disk, is combined and merged into one single HELLO message
per peer which will be given to clients. During this merge process the
HELLO is immediately written to disk to ensure persistence.
   1085 
In addition, PEERINFO periodically scans the directory where the
information is stored for HELLO messages with expired TRANSPORT
addresses. This periodic task scans all files in the directory and
recreates the HELLO messages it finds. Expired TRANSPORT addresses are
removed from the HELLO, and if the HELLO does not contain any valid
addresses, it is discarded and removed from disk.
   1092 
   1093 .. _Obtaining-Information:
   1094 
   1095 Obtaining Information
   1096 ---------------------
   1097 
When a client requests information from PEERINFO, PEERINFO performs a
lookup for the respective peer, or for all peers if desired, and
transmits this information to the client. The client can specify
whether friend-only HELLOs should be included, and PEERINFO filters the
respective HELLO messages before transmitting the information.
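
The filtering step can be sketched as follows; the dictionary layout
with a boolean ``friend_only`` field is an illustrative model, not the
actual HELLO structure:

```python
def filter_hellos(hellos: list[dict], include_friend_only: bool) -> list[dict]:
    """Filter HELLO records before handing them to a client.

    Clients that did not ask for friend-only HELLOs never see records
    whose friend_only flag is set.
    """
    return [h for h in hellos
            if include_friend_only or not h["friend_only"]]
```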
   1103 
To notify clients about changes to PEERINFO information, PEERINFO
maintains a list of clients interested in these notifications. Such a
notification occurs if a HELLO for a peer was updated (due to a merge,
for example) or a new peer was added.
   1108 
   1109 .. index::
   1110    double: subsystem; PEERSTORE
   1111 
   1112 .. _PEERSTORE-Subsystem:
   1113 
   1114 PEERSTORE — Extensible local persistent data storage
   1115 ====================================================
   1116 
   1117 GNUnet's PEERSTORE subsystem offers persistent per-peer storage for
   1118 other GNUnet subsystems. GNUnet subsystems can use PEERSTORE to
   1119 persistently store and retrieve arbitrary data. Each data record stored
   1120 with PEERSTORE contains the following fields:
   1121 
   1122 -  subsystem: Name of the subsystem responsible for the record.
   1123 
   1124 -  peerid: Identity of the peer this record is related to.
   1125 
   1126 -  key: a key string identifying the record.
   1127 
   1128 -  value: binary record value.
   1129 
   1130 -  expiry: record expiry date.
   1131 
   1132 .. _Functionality:
   1133 
   1134 Functionality
   1135 -------------
   1136 
   1137 Subsystems can store any type of value under a (subsystem, peerid, key)
   1138 combination. A \"replace\" flag set during store operations forces the
   1139 PEERSTORE to replace any old values stored under the same (subsystem,
   1140 peerid, key) combination with the new value. Additionally, an expiry
   1141 date is set after which the record is \*possibly\* deleted by PEERSTORE.
   1142 
Subsystems can iterate over all values stored under any of the
following combinations of fields:
   1145 
   1146 -  (subsystem)
   1147 
   1148 -  (subsystem, peerid)
   1149 
   1150 -  (subsystem, key)
   1151 
   1152 -  (subsystem, peerid, key)
   1153 
   1154 Subsystems can also request to be notified about any new values stored
   1155 under a (subsystem, peerid, key) combination by sending a \"watch\"
   1156 request to PEERSTORE.
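
The store/iterate/watch semantics can be modeled with a small in-memory
sketch. This is a toy model of the semantics described above, not the
GNUnet PEERSTORE API; expiry-based deletion is omitted:

```python
class ToyPeerStore:
    """In-memory model of PEERSTORE record semantics."""

    def __init__(self):
        self._records = []   # (subsystem, peerid, key, value, expiry)
        self._watches = {}   # (subsystem, peerid, key) -> [callbacks]

    def store(self, subsystem, peerid, key, value, expiry, replace=False):
        """Store a record; with replace=True, drop old values under
        the same (subsystem, peerid, key) combination first."""
        if replace:
            self._records = [r for r in self._records
                             if r[:3] != (subsystem, peerid, key)]
        self._records.append((subsystem, peerid, key, value, expiry))
        # notify watchers of the newly stored value
        for cb in self._watches.get((subsystem, peerid, key), []):
            cb(value)

    def iterate(self, subsystem, peerid=None, key=None):
        """Iterate over values under any supported field combination."""
        return [r for r in self._records
                if r[0] == subsystem
                and (peerid is None or r[1] == peerid)
                and (key is None or r[2] == key)]

    def watch(self, subsystem, peerid, key, callback):
        """Request notification about new values under this triple."""
        self._watches.setdefault((subsystem, peerid, key), []).append(callback)
```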
   1157 
   1158 .. _Architecture:
   1159 
   1160 Architecture
   1161 ------------
   1162 
   1163 PEERSTORE implements the following components:
   1164 
   1165 -  PEERSTORE service: Handles store, iterate and watch operations.
   1166 
-  PEERSTORE API: API to be used by other subsystems to communicate
   with and issue commands to the PEERSTORE service.
   1169 
   1170 -  PEERSTORE plugins: Handles the persistent storage. At the moment,
   1171    only an \"sqlite\" plugin is implemented.
   1172 
   1173 .. index::
   1174    double: subsystem; REGEX
   1175 
   1176 .. _REGEX-Subsystem:
   1177 
   1178 REGEX — Service discovery using regular expressions
   1179 ===================================================
   1180 
Using the REGEX subsystem, you can discover peers that offer a
particular service using regular expressions. Peers that offer a
service specify it using a regular expression. Peers that want to
patronize a service search using a string. The REGEX subsystem will
then use the DHT to return a set of matching offerers to the patrons.
   1186 
   1187 For the technical details, we have Max's defense talk and Max's Master's
   1188 thesis.
   1189 
   1190 .. note:: An additional publication is under preparation and available
   1191    to team members (in Git).
   1192 
   1193 .. todo:: Missing links to Max's talk and Master's thesis
   1194 
   1195 .. _How-to-run-the-regex-profiler:
   1196 
   1197 How to run the regex profiler
   1198 -----------------------------
   1199 
   1200 The gnunet-regex-profiler can be used to profile the usage of mesh/regex
   1201 for a given set of regular expressions and strings. Mesh/regex allows
   1202 you to announce your peer ID under a certain regex and search for peers
   1203 matching a particular regex using a string. See
   1204 `szengel2012ms <https://bib.gnunet.org/full/date.html#2012_5f2>`__ for a
   1205 full introduction.
   1206 
   1207 First of all, the regex profiler uses GNUnet testbed, thus all the
   1208 implications for testbed also apply to the regex profiler (for example
   1209 you need password-less ssh login to the machines listed in your hosts
   1210 file).
   1211 
   1212 **Configuration**
   1213 
   1214 Moreover, an appropriate configuration file is needed. In the following
   1215 paragraph the important details are highlighted.
   1216 
   1217 Announcing of the regular expressions is done by the
   1218 gnunet-daemon-regexprofiler, therefore you have to make sure it is
   1219 started, by adding it to the START_ON_DEMAND set of ARM:
   1220 
   1221 ::
   1222 
   1223    [regexprofiler]
   1224    START_ON_DEMAND = YES
   1225 
   1226 Furthermore you have to specify the location of the binary:
   1227 
   1228 ::
   1229 
   1230    [regexprofiler]
   1231    # Location of the gnunet-daemon-regexprofiler binary.
   1232    BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
   1233    # Regex prefix that will be applied to all regular expressions and
   1234    # search string.
   1235    REGEX_PREFIX = "GNVPN-0001-PAD"
   1236 
   1237 When running the profiler with a large scale deployment, you probably
   1238 want to reduce the workload of each peer. Use the following options to
   1239 do this.
   1240 
   1241 ::
   1242 
   1243    [dht]
   1244    # Force network size estimation
   1245    FORCE_NSE = 1
   1246 
   1247    [dhtcache]
   1248    DATABASE = heap
   1249    # Disable RC-file for Bloom filter? (for benchmarking with limited IO
   1250    # availability)
   1251    DISABLE_BF_RC = YES
   1252    # Disable Bloom filter entirely
   1253    DISABLE_BF = YES
   1254 
   1255    [nse]
   1256    # Minimize proof-of-work CPU consumption by NSE
   1257    WORKBITS = 1
   1258 
   1259 **Options**
   1260 
   1261 To finally run the profiler some options and the input data need to be
   1262 specified on the command line.
   1263 
   1264 ::
   1265 
   1266    gnunet-regex-profiler -c config-file -d log-file -n num-links \
   1267    -p path-compression-length -s search-delay -t matching-timeout \
   1268    -a num-search-strings hosts-file policy-dir search-strings-file
   1269 
   1270 Where\...
   1271 
   1272 -  \... ``config-file`` means the configuration file created earlier.
   1273 
   1274 -  \... ``log-file`` is the file where to write statistics output.
   1275 
   1276 -  \... ``num-links`` indicates the number of random links between
   1277    started peers.
   1278 
   1279 -  \... ``path-compression-length`` is the maximum path compression
   1280    length in the DFA.
   1281 
-  \... ``search-delay`` is the time to wait between peers finishing
   linking and starting to match strings.

-  \... ``matching-timeout`` is the timeout after which the search is
   cancelled.

-  \... ``num-search-strings`` is the number of strings in the
   search-strings-file.
   1290 
   1291 -  \... the ``hosts-file`` should contain a list of hosts for the
   1292    testbed, one per line in the following format:
   1293 
   1294    -  ``user@host_ip:port``
   1295 
   1296 -  \... the ``policy-dir`` is a folder containing text files containing
   1297    one or more regular expressions. A peer is started for each file in
   1298    that folder and the regular expressions in the corresponding file are
   1299    announced by this peer.
   1300 
   1301 -  \... the ``search-strings-file`` is a text file containing search
   1302    strings, one in each line.
   1303 
   1304 You can create regular expressions and search strings for every AS in
   1305 the Internet using the attached scripts. You need one of the `CAIDA
   1306 routeviews
   1307 prefix2as <http://data.caida.org/datasets/routing/routeviews-prefix2as/>`__
   1308 data files for this. Run
   1309 
   1310 ::
   1311 
   1312    create_regex.py <filename> <output path>
   1313 
   1314 to create the regular expressions and
   1315 
   1316 ::
   1317 
   1318    create_strings.py <input path> <outfile>
   1319 
   1320 to create a search strings file from the previously created regular
   1321 expressions.
   1322 
   1323 
   1324 
   1325 .. index::
   1326   double: subsystem; REST
   1327 
   1328 .. _REST-Subsystem:
   1329 
   1330 REST — RESTful GNUnet Web APIs
   1331 ==============================
   1332 
   1333 .. todo:: Define REST
   1334 
   1335 Using the REST subsystem, you can expose REST-based APIs or services.
   1336 The REST service is designed as a pluggable architecture.
   1337 
   1338 **Configuration**
   1339 
   1340 The REST service can be configured in various ways. The reference config
   1341 file can be found in ``src/rest/rest.conf``:
   1342 
   1343 ::
   1344 
   1345    [rest]
   1346    REST_PORT=7776
   1347    REST_ALLOW_HEADERS=Authorization,Accept,Content-Type
   1348    REST_ALLOW_ORIGIN=*
   1349    REST_ALLOW_CREDENTIALS=true
   1350 
Both the port and the CORS (cross-origin resource sharing) headers
advertised by the REST service are configurable.
   1353 
   1354 .. index::
   1355    double: subsystem; REVOCATION
   1356 
   1357 .. _REVOCATION-Subsystem:
   1358 
   1359 REVOCATION — Ego key revocation
   1360 ===============================
   1361 
The REVOCATION subsystem is responsible for key revocation of Egos. If a
user learns that their private key has been compromised, or if they have
lost it, they can use the REVOCATION system to inform all of the other
users that the key is no longer valid. The subsystem thus includes ways
to query for the validity of keys and to propagate revocation messages.
   1367 
   1368 .. _Dissemination:
   1369 
   1370 Dissemination
   1371 -------------
   1372 
   1373 When a revocation is performed, the revocation is first of all
   1374 disseminated by flooding the overlay network. The goal is to reach every
   1375 peer, so that when a peer needs to check if a key has been revoked, this
   1376 will be purely a local operation where the peer looks at its local
   1377 revocation list. Flooding the network is also the most robust form of
   1378 key revocation --- an adversary would have to control a separator of the
   1379 overlay graph to restrict the propagation of the revocation message.
   1380 Flooding is also very easy to implement --- peers that receive a
   1381 revocation message for a key that they have never seen before simply
   1382 pass the message to all of their neighbours.
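
As a toy illustration (not the actual GNUnet code; the topology and
message representations here are made up), the flooding step amounts to
a simple graph traversal:

```python
# Sketch: flooding a revocation message through an overlay, modeled as
# an adjacency dict. Peers forward a message they have not seen before
# to all of their neighbours.

def flood(topology, start, message):
    """Return the set of peers that end up storing `message` locally."""
    stored = {start: message}          # each peer's local revocation list
    queue = [start]
    while queue:
        peer = queue.pop()
        for neighbour in topology[peer]:
            if neighbour not in stored:    # never seen: store and forward
                stored[neighbour] = message
                queue.append(neighbour)
    return set(stored)

topology = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
    "E": [],          # offline/partitioned peer: not reached by flooding
}
reached = flood(topology, "A", "revoke(key)")
```

Peer ``E`` illustrates the limitation addressed in the next paragraph:
flooding alone cannot reach peers that are not currently connected.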
   1383 
   1384 Flooding can only distribute the revocation message to peers that are
   1385 online. In order to notify peers that join the network later, the
   1386 revocation service performs efficient set reconciliation over the sets
   1387 of known revocation messages whenever two peers (that both support
   1388 REVOCATION dissemination) connect. The SET service is used to perform
   1389 this operation efficiently.
   1390 
   1391 .. _Revocation-Message-Design-Requirements:
   1392 
   1393 Revocation Message Design Requirements
   1394 --------------------------------------
   1395 
   1396 However, flooding is also quite costly, creating O(\|E\|) messages on a
   1397 network with \|E\| edges. Thus, revocation messages are required to
   1398 contain a proof-of-work, the result of an expensive computation (which,
   1399 however, is cheap to verify). Only peers that have expended the CPU time
   1400 necessary to provide this proof will be able to flood the network with
   1401 the revocation message. This ensures that an attacker cannot simply
   1402 flood the network with millions of revocation messages. The
proof-of-work required by GNUnet is set to take days on a typical PC to
compute; if the ability to quickly revoke a key is needed, users have
the option to pre-compute revocation messages, store them off-line, and
use them instantly should the key ever be compromised.
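
The following sketch shows the general proof-of-work idea under
simplified assumptions (a hash-leading-zero-bits scheme with a toy
difficulty; GNUnet's actual revocation proof-of-work differs in its
concrete construction and parameters):

```python
# Toy proof-of-work: the prover searches for a nonce such that
# SHA-512(nonce || message) has a given number of leading zero bits;
# verification is a single hash. This is illustrative, not GNUnet's
# actual scheme.
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def compute_pow(message: bytes, difficulty: int) -> int:
    nonce = 0
    while True:
        digest = hashlib.sha512(nonce.to_bytes(8, "big") + message).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce        # expensive: expected ~2**difficulty tries
        nonce += 1

def verify_pow(message: bytes, nonce: int, difficulty: int) -> bool:
    digest = hashlib.sha512(nonce.to_bytes(8, "big") + message).digest()
    return leading_zero_bits(digest) >= difficulty   # cheap: one hash

nonce = compute_pow(b"revoke:example-key", 12)   # low difficulty for the demo
```

Raising the difficulty makes ``compute_pow`` exponentially slower while
``verify_pow`` stays a single hash, which is exactly the asymmetry the
flooding protection relies on.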
   1407 
   1408 Revocation messages must also be signed by the private key that is being
   1409 revoked. Thus, they can only be created while the private key is in the
   1410 possession of the respective user. This is another reason to create a
   1411 revocation message ahead of time and store it in a secure location.
   1412 
   1413 .. index::
   1414    double: subsystems; Random peer sampling
   1415    see: RPS; Random peer sampling
   1416 
   1417 .. _RPS-Subsystem:
   1418 
   1419 RPS — Random peer sampling
   1420 ==========================
   1421 
In the literature, Random Peer Sampling (RPS) refers to the problem of
reliably [1]_ drawing random samples from an unstructured p2p network.
   1424 
   1425 Doing so in a reliable manner is not only hard because of inherent
   1426 problems but also because of possible malicious peers that could try to
   1427 bias the selection.
   1428 
It is useful for all kinds of gossip protocols that require the
selection of random peers in the whole network, such as gathering
statistics, spreading and aggregating information in the network, load
balancing, and overlay topology management.
   1433 
   1434 The approach chosen in the RPS service implementation in GNUnet follows
   1435 the `Brahms <https://bib.gnunet.org/full/date.html\#2009_5f0>`__ design.
   1436 
   1437 The current state is \"work in progress\". There are a lot of things
   1438 that need to be done, primarily finishing the experimental evaluation
   1439 and a re-design of the API.
   1440 
The abstract idea is as follows: a client connects to (or starts) the
RPS service and requests random peers; the service returns peers that,
with high probability, represent a random selection from the whole
network.
   1444 
   1445 An additional feature to the original Brahms-design is the selection of
   1446 sub-groups: The GNUnet implementation of RPS enables clients to ask for
   1447 random peers from a group that is defined by a common shared secret.
   1448 (The secret could of course also be public, depending on the use-case.)
   1449 
Another addition to the original protocol was made: the sampler
mechanism introduced in Brahms was slightly adapted and is used to
actually sample the peers that are returned to the client. This is
necessary because the original design only keeps peers connected to
random other peers in the network. For the peers returned to client
requests to be independently random, they cannot simply be drawn from
the set of connected peers. The adapted sampler makes sure that each
request for random peers is independent of the others.
   1458 
   1459 .. _Brahms:
   1460 
   1461 Brahms
   1462 ------
   1463 
The high-level concept of Brahms is two-fold: combining push-pull gossip
with locally fixing an assumed bias using cryptographic min-wise
permutations. The central data structure is the view --- a peer's
current local sample. This view is used to select peers to push to and
pull from. This simple mechanism can be biased easily. For this reason,
Brahms 'fixes' the bias by using the so-called sampler: a data structure
that takes a list of elements as input and outputs one of them, chosen
independently of each element's frequency in the input set. Both an
element that was put into the sampler a single time and an element that
was put into it a million times have the same probability of being the
output. This is achieved by exploiting min-wise independent
permutations. In the RPS service we use HMACs: on the initialisation of
a sampler element, a key is chosen at random. For each input, the HMAC
with the random key is computed. The sampler element keeps the element
with the minimal HMAC.
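
A minimal sketch of such a sampler element, assuming SHA-256-based
HMACs (the concrete hash function and data layout in the RPS service
may differ):

```python
# Min-wise sampler element: each input is fed in as bytes; the sampler
# keeps whichever input has the smallest HMAC under a randomly chosen
# key, so the output is independent of how often an element was
# inserted.
import hashlib
import hmac
import os

class SamplerElement:
    def __init__(self):
        self.key = os.urandom(32)     # random key chosen at initialisation
        self.min_hmac = None
        self.element = None

    def put(self, element: bytes):
        mac = hmac.new(self.key, element, hashlib.sha256).digest()
        if self.min_hmac is None or mac < self.min_hmac:
            self.min_hmac = mac
            self.element = element

    def get(self):
        return self.element

s = SamplerElement()
for _ in range(1000):
    s.put(b"frequent-peer")           # inserted many times...
s.put(b"rare-peer")                   # ...vs. once: equal chance of winning
```

Since the HMAC of an element under a fixed key never changes, repeated
insertions of the same element cannot improve its chances of being kept.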
   1478 
   1479 In order to fix the bias in the view, a fraction of the elements in the
   1480 view are sampled through the sampler from the random stream of peer IDs.
   1481 
According to the theoretical analysis of Bortnikov et al., this suffices
to keep the network connected and to keep random peers in the view.
   1484 
   1485 .. [1]
   1486    \"Reliable\" in this context means having no bias, neither spatial,
   1487    nor temporal, nor through malicious activity.
   1488 
   1489 .. index::
   1490    double: STATISTICS; subsystem
   1491 
   1492 .. _STATISTICS-Subsystem:
   1493 
   1494 STATISTICS — Runtime statistics publication
   1495 ===========================================
   1496 
   1497 In GNUnet, the STATISTICS subsystem offers a central place for all
   1498 subsystems to publish unsigned 64-bit integer run-time statistics.
   1499 Keeping this information centrally means that there is a unified way for
   1500 the user to obtain data on all subsystems, and individual subsystems do
   1501 not have to always include a custom data export method for performance
   1502 metrics and other statistics. For example, the TRANSPORT system uses
   1503 STATISTICS to update information about the number of directly connected
   1504 peers and the bandwidth that has been consumed by the various plugins.
   1505 This information is valuable for diagnosing connectivity and performance
   1506 issues.
   1507 
   1508 Following the GNUnet service architecture, the STATISTICS subsystem is
   1509 divided into an API which is exposed through the header
   1510 **gnunet_statistics_service.h** and the STATISTICS service
   1511 **gnunet-service-statistics**. The **gnunet-statistics** command-line
   1512 tool can be used to obtain (and change) information about the values
   1513 stored by the STATISTICS service. The STATISTICS service does not
   1514 communicate with other peers.
   1515 
Data is stored in the STATISTICS service in the form of tuples
**(subsystem, name, value, persistence)**. The subsystem determines
which GNUnet subsystem the data belongs to. The name is the label under
which the value is stored; it uniquely identifies the record from among
other records belonging to the same subsystem. In some parts of the
code, the pair **(subsystem, name)** is called a **statistic**, as it
identifies the values stored in the STATISTICS service. The persistence
flag determines whether the record has to be preserved across service
restarts. A record is said to be persistent if this flag is set for it;
if not, the record is treated as a non-persistent record and is lost
after a service restart. Persistent records are written to and read from
the file **statistics.data** before shutdown and upon startup. The file
is located in the HOME directory of the peer.
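
As a toy model of this tuple store (illustration only, not the
service's actual data structures):

```python
# Records are keyed by (subsystem, name) and carry (value, persistent);
# only persistent records survive a restart, mirroring the description
# above.

class StatisticsStore:
    def __init__(self):
        self.records = {}    # (subsystem, name) -> (value, persistent)

    def set(self, subsystem, name, value, persistent=False):
        self.records[(subsystem, name)] = (value, persistent)

    def get(self, subsystem, name):
        return self.records.get((subsystem, name), (0, False))[0]

    def restart(self):
        # non-persistent records are lost when the service restarts
        self.records = {k: v for k, v in self.records.items() if v[1]}

stats = StatisticsStore()
stats.set("transport", "# peers connected", 7)                 # volatile
stats.set("transport", "# bytes total sent", 123456, True)     # persistent
stats.restart()
```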
   1529 
   1530 An anomaly of the STATISTICS service is that it does not terminate
   1531 immediately upon receiving a shutdown signal if it has any clients
   1532 connected to it. It waits for all the clients that are not monitors to
   1533 close their connections before terminating itself. This is to prevent
   1534 the loss of data during peer shutdown — delaying the STATISTICS
   1535 service shutdown helps other services to store important data to
   1536 STATISTICS during shutdown.
   1537 
   1538 .. index:: 
   1539    double: TRANSPORT Next Generation; subsystem
   1540 
   1541 .. _TRANSPORT_002dNG-Subsystem:
   1542 
   1543 TRANSPORT-NG — Next-generation transport management
   1544 ===================================================
   1545 
   1546 The current GNUnet TRANSPORT architecture is rooted in the GNUnet 0.4
   1547 design of using plugins for the actual transmission operations and the
   1548 ATS subsystem to select a plugin and allocate bandwidth. The following
   1549 key issues have been identified with this design:
   1550 
   1551 -  Bugs in one plugin can affect the TRANSPORT service and other
   1552    plugins. There is at least one open bug that affects sockets, where
   1553    the origin is difficult to pinpoint due to the large code base.
   1554 
-  Relevant operating system default configurations often impose a limit
   of 1024 file descriptors per process. Thus, one plugin may impact
   other plugins' connectivity choices.
   1558 
   1559 -  Plugins are required to offer bi-directional connectivity. However,
   1560    firewalls (incl. NAT boxes) and physical environments sometimes only
   1561    allow uni-directional connectivity, which then currently cannot be
   1562    utilized at all.
   1563 
-  Distance-vector routing was implemented in 2009 but broke shortly
   afterwards; due to the complexity of implementing it as a plugin and
   of dealing with the resource allocation consequences, it was never
   useful.
   1567 
-  Most existing plugins communicate entirely in cleartext, exposing
   metadata (such as message sizes) and making it easy to fingerprint
   and possibly block GNUnet traffic.
   1571 
   1572 -  Various NAT traversal methods are not supported.
   1573 
-  The service logic is cluttered with \"manipulation\" support code for
   TESTBED to enable faking network characteristics like lossy
   connections or firewalls.
   1577 
   1578 -  Bandwidth allocation is done in ATS, requiring the duplication of
   1579    state and resulting in much delayed allocation decisions. As a
   1580    result, often available bandwidth goes unused. Users are expected to
   1581    manually configure bandwidth limits, instead of TRANSPORT using
   1582    congestion control to adapt automatically.
   1583 
   1584 -  TRANSPORT is difficult to test and has bad test coverage.
   1585 
   1586 -  HELLOs include an absolute expiration time. Nodes with unsynchronized
   1587    clocks cannot connect.
   1588 
   1589 -  Displaying the contents of a HELLO requires the respective plugin as
   1590    the plugin-specific data is encoded in binary. This also complicates
   1591    logging.
   1592 
   1593 .. _Design-goals-of-TNG:
   1594 
   1595 Design goals of TNG
   1596 -------------------
   1597 
   1598 In order to address the above issues, we want to:
   1599 
   1600 -  Move plugins into separate processes which we shall call
   1601    *communicators*. Communicators connect as clients to the transport
   1602    service.
   1603 
   1604 -  TRANSPORT should be able to utilize any number of communicators to the
   1605    same peer at the same time.
   1606 
   1607 -  TRANSPORT should be responsible for fragmentation, retransmission,
   1608    flow- and congestion-control. Users should no longer have to
   1609    configure bandwidth limits: TRANSPORT should detect what is available
   1610    and use it.
   1611 
   1612 -  Communicators should be allowed to be uni-directional and
   1613    unreliable. TRANSPORT shall create bi-directional channels from this
   1614    whenever possible.
   1615 
   1616 -  DV should no longer be a plugin, but part of TRANSPORT.
   1617 
-  TRANSPORT should provide communicators with help for communicating,
   for example in the case of uni-directional communicators or when
   out-of-band signalling is needed for NAT traversal. We call this
   functionality *backchannels*.
   1622 
   1623 -  Transport manipulation should be signalled to CORE on a per-message
   1624    basis instead of an approximate bandwidth.
   1625 
   1626 -  CORE should signal performance requirements (reliability, latency,
   1627    etc.) on a per-message basis to TRANSPORT. If possible, TRANSPORT
   1628    should consider those options when scheduling messages for
   1629    transmission.
   1630 
   1631 -  HELLOs should be in a human-readable format with monotonic time
   1632    expirations.
   1633 
   1634 The new architecture is planned as follows:
   1635 
   1636 .. image:: /images/tng.png
   1637 
   1638 TRANSPORT's main objective is to establish bi-directional virtual links
   1639 using a variety of possibly uni-directional communicators. Links undergo
   1640 the following steps:
   1641 
   1642 1. Communicator informs TRANSPORT A that a queue (direct neighbour) is
   1643    available, or equivalently TRANSPORT A discovers a (DV) path to a
   1644    target B.
   1645 
   1646 2. TRANSPORT A sends a challenge to the target peer, trying to confirm
   1647    that the peer can receive. FIXME: This is not implemented properly
   1648    for DV. Here we should really take a validated DVH and send a
   1649    challenge exactly down that path!
   1650 
3. The other TRANSPORT, TRANSPORT B, receives the challenge, and sends
   back a response, possibly using a different path. If TRANSPORT B does
   not yet have a virtual link to A, it must try to establish a virtual
   link.
   1655 
4. Upon receiving the response, TRANSPORT A creates the virtual link. If
   the response included a challenge, TRANSPORT A must respond to this
   challenge as well, effectively re-creating the TCP 3-way handshake
   (just with longer challenge values).
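
The four steps above can be modeled roughly as follows (a toy sketch;
real challenges are larger and all messages are signed):

```python
# Toy model of virtual-link setup: A sends a challenge, B echoes it and
# may attach its own challenge, and a link exists once a peer has seen
# a valid response to its own challenge.
import os

class Transport:
    def __init__(self, name):
        self.name = name
        self.out_challenge = None
        self.link_up = False

    def start(self):
        self.out_challenge = os.urandom(8)       # step 2: send challenge
        return self.out_challenge

    def on_challenge(self, challenge):
        # step 3: echo the peer's challenge; attach our own if no link yet
        reply_challenge = self.start() if not self.link_up else None
        return challenge, reply_challenge

    def on_response(self, echoed, their_challenge):
        if echoed == self.out_challenge:
            self.link_up = True                   # step 4: link established
        return their_challenge                    # must answer B's challenge

a, b = Transport("A"), Transport("B")
c = a.start()
echoed, b_challenge = b.on_challenge(c)
to_answer = a.on_response(echoed, b_challenge)
b.on_response(to_answer, None)
```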
   1660 
   1661 .. _HELLO_002dNG:
   1662 
   1663 HELLO-NG
   1664 --------
   1665 
HELLOs change in three ways. First of all, communicators encode their
respective addresses in a human-readable, URL-like string. This way, we
no longer require the communicator to print the contents of a HELLO.
Second, HELLOs no longer contain an expiration time, only a creation
time; the receiver only needs to compare the respective absolute values.
Given a HELLO from the same sender with a larger creation time, the
older one is no longer valid. This also obsoletes the need for the
gnunet-hello binary to set HELLO expiration times to never. Third, a
peer no longer generates one big HELLO that always contains all of its
addresses. Instead, each address is signed individually and shared only
over the address scopes where it makes sense to share the address. In
particular, care should be taken to not share MACs across the Internet
and to confine their use to the LAN. As each address is signed
separately, having multiple addresses valid at the same time (given the
new creation-time expiration logic) requires that those addresses have
exactly the same creation time. Whenever that monotonic time is
increased, all addresses must be re-signed and re-distributed.
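
The creation-time comparison can be sketched as follows (a simplified
model; real HELLO records also carry signatures and address scopes):

```python
# Among HELLO address records from the same sender, only those carrying
# the largest (monotonic) creation time are considered valid.

def valid_addresses(records):
    """records: list of (address, creation_time) tuples from one peer."""
    if not records:
        return []
    newest = max(t for _, t in records)
    # addresses signed with an older creation time are superseded
    return [addr for addr, t in records if t == newest]

records = [("tcp://1.2.3.4:2086", 100),
           ("udp://1.2.3.4:2086", 100),
           ("tcp://5.6.7.8:2086", 90)]   # old address: no longer valid
```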
   1683 
   1684 .. _Priorities-and-preferences:
   1685 
   1686 Priorities and preferences
   1687 --------------------------
   1688 
   1689 In the new design, TRANSPORT adopts a feature (which was previously
   1690 already available in CORE) of the MQ API to allow applications to
   1691 specify priorities and preferences per message (or rather, per MQ
   1692 envelope). The (updated) MQ API allows applications to specify one of
   1693 four priority levels as well as desired preferences for transmission by
   1694 setting options on an envelope. These preferences currently are:
   1695 
-  GNUNET_MQ_PREF_UNRELIABLE: Disables TRANSPORT waiting for ACKs on
   unreliable channels like UDP, making transmission fire-and-forget.
   These messages then cannot be used for RTT estimates either.
   1699 
   1700 -  GNUNET_MQ_PREF_LOW_LATENCY: Directs TRANSPORT to select the
   1701    lowest-latency transmission choices possible.
   1702 
   1703 -  GNUNET_MQ_PREF_CORK_ALLOWED: Allows TRANSPORT to delay transmission
   1704    to group the message with other messages into a larger batch to
   1705    reduce the number of packets sent.
   1706 
   1707 -  GNUNET_MQ_PREF_GOODPUT: Directs TRANSPORT to select the highest
   1708    goodput channel available.
   1709 
   1710 -  GNUNET_MQ_PREF_OUT_OF_ORDER: Allows TRANSPORT to reorder the messages
   1711    as it sees fit, otherwise TRANSPORT should attempt to preserve
   1712    transmission order.
   1713 
   1714 Each MQ envelope is always able to store those options (and the
   1715 priority), and in the future this uniform API will be used by TRANSPORT,
   1716 CORE, CADET and possibly other subsystems that send messages (like
   1717 LAKE). When CORE sets preferences and priorities, it is supposed to
   1718 respect the preferences and priorities it is given from higher layers.
   1719 Similarly, CADET also simply passes on the preferences and priorities of
   1720 the layer above CADET. When a layer combines multiple smaller messages
   1721 into one larger transmission, the ``GNUNET_MQ_env_combine_options()``
   1722 should be used to calculate options for the combined message. We note
   1723 that the exact semantics of the options may differ by layer. For
   1724 example, CADET will always strictly implement reliable and in-order
   1725 delivery of messages, while the same options are only advisory for
   1726 TRANSPORT and CORE: they should try (using ACKs on unreliable
   1727 communicators, not changing the message order themselves), but if
   1728 messages are lost anyway (e.g. because a TCP is dropped in the middle),
   1729 or if messages are reordered (e.g. because they took different paths
   1730 over the network and arrived in a different order) TRANSPORT and CORE do
   1731 not have to correct this. Whether a preference is strict or loose is
thus defined by the respective layer.
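
A hypothetical sketch of such option combining (the flag values and the
exact combination rules of ``GNUNET_MQ_env_combine_options()`` are
assumptions made here for illustration):

```python
# Combining per-envelope options when batching messages: the combined
# message must satisfy the strictest requirements. These flag constants
# and rules are made up for the sketch.

PREF_UNRELIABLE   = 1 << 0
PREF_LOW_LATENCY  = 1 << 1
PREF_CORK_ALLOWED = 1 << 2
PREF_GOODPUT      = 1 << 3
PREF_OUT_OF_ORDER = 1 << 4

def combine_options(a_prio, a_flags, b_prio, b_flags):
    # take the higher priority; keep a "relaxing" preference only if both
    # messages allow it, and request a "demanding" preference (like low
    # latency) if either message asks for it
    prio = max(a_prio, b_prio)
    relaxing = PREF_UNRELIABLE | PREF_CORK_ALLOWED | PREF_OUT_OF_ORDER
    demanding = PREF_LOW_LATENCY | PREF_GOODPUT
    flags = ((a_flags & b_flags) & relaxing) | ((a_flags | b_flags) & demanding)
    return prio, flags
```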
   1733 
   1734 .. _Communicators:
   1735 
   1736 Communicators
   1737 -------------
   1738 
   1739 The API for communicators is defined in
   1740 ``gnunet_transport_communication_service.h``. Each communicator must
   1741 specify its (global) communication characteristics, which for now only
   1742 say whether the communication is reliable (e.g. TCP, HTTPS) or
   1743 unreliable (e.g. UDP, WLAN). Each communicator must specify a unique
address prefix, or NULL if the communicator cannot establish outgoing
   1745 connections (for example because it is only acting as a TCP server). A
   1746 communicator must tell TRANSPORT which addresses it is reachable under.
   1747 Addresses may be added or removed at any time. A communicator may have
   1748 zero addresses (transmission only). Addresses do not have to match the
   1749 address prefix.
   1750 
   1751 TRANSPORT may ask a communicator to try to connect to another address.
   1752 TRANSPORT will only ask for connections where the address matches the
   1753 communicator's address prefix that was provided when the connection was
   1754 established. Communicators should then attempt to establish a
   1755 connection.
Whether to honor this request is at the discretion of the communicator;
reasons for not honoring it include an already-existing connection or
resource limitations. No response is provided to the TRANSPORT service
on failure.
   1760 The TRANSPORT service has to ask the communicator explicitly to retry.
   1761 
   1762 If a communicator succeeds in establishing an outgoing connection for
   1763 transmission, or if a communicator receives an incoming bi-directional
   1764 connection, the communicator must inform the TRANSPORT service that a
   1765 message queue (MQ) for transmission is now available.
   1766 For that MQ, the communicator must provide the peer identity claimed by the other end.
   1767 It must also provide a human-readable address (for debugging) and a maximum transfer unit
(MTU). An MTU of zero means sending is not supported; SIZE_MAX should be
used when there is no MTU. The communicator should also tell TRANSPORT
   1770 network type is used for the queue. The communicator may tell TRANSPORT
   1771 anytime that the queue was deleted and is no longer available.
   1772 
   1773 The communicator API also provides for flow control. First,
   1774 communicators exhibit back-pressure on TRANSPORT: the number of messages
   1775 TRANSPORT may add to a queue for transmission will be limited. So by not
   1776 draining the transmission queue, back-pressure is provided to TRANSPORT.
   1777 In the other direction, communicators may allow TRANSPORT to give
   1778 back-pressure towards the communicator by providing a non-NULL
   1779 ``GNUNET_TRANSPORT_MessageCompletedCallback`` argument to the
   1780 ``GNUNET_TRANSPORT_communicator_receive`` function. In this case,
   1781 TRANSPORT will only invoke this function once it has processed the
   1782 message and is ready to receive more. Communicators should then limit
   1783 how much traffic they receive based on this backpressure. Note that
   1784 communicators do not have to provide a
   1785 ``GNUNET_TRANSPORT_MessageCompletedCallback``; for example, UDP cannot
   1786 support back-pressure due to the nature of the UDP protocol. In this
   1787 case, TRANSPORT will implement its own TRANSPORT-to-TRANSPORT flow
   1788 control to reduce the sender's data rate to acceptable levels.
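
The outgoing direction of this flow control can be modeled with a
simple bounded queue (a toy model, not the actual API):

```python
# A bounded transmission queue: when it is full, TRANSPORT cannot add
# more messages, which is how a communicator exerts back-pressure.

class TransmissionQueue:
    def __init__(self, limit):
        self.limit = limit
        self.pending = []

    def send(self, msg) -> bool:
        if len(self.pending) >= self.limit:
            return False      # queue full: back-pressure on TRANSPORT
        self.pending.append(msg)
        return True

    def drain_one(self):
        return self.pending.pop(0) if self.pending else None

q = TransmissionQueue(limit=2)
assert q.send("m1") and q.send("m2")
assert not q.send("m3")       # TRANSPORT must wait...
q.drain_one()                 # ...until the communicator drains the queue
assert q.send("m3")
```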
   1789 
   1790 TRANSPORT may notify a communicator about backchannel messages TRANSPORT
   1791 received from other peers for this communicator. Similarly,
   1792 communicators can ask TRANSPORT to try to send a backchannel message to
   1793 other communicators of other peers. The semantics of the backchannel
   1794 message are up to the communicators which use them. TRANSPORT may fail
   1795 transmitting backchannel messages, and TRANSPORT will not attempt to
   1796 retransmit them.
   1797 
   1798 UDP communicator
   1799 ^^^^^^^^^^^^^^^^
   1800 
   1801 The UDP communicator implements a basic encryption layer to protect from
   1802 metadata leakage.
   1803 The layer tries to establish a shared secret using an Elliptic-Curve Diffie-Hellman
   1804 key exchange in which the initiator of a packet creates an ephemeral key pair
   1805 to encrypt a message for the target peer identity.
   1806 The communicator always offers this kind of transmission queue to a (reachable)
   1807 peer in which messages are encrypted with dedicated keys.
   1808 The performance of this queue is not suitable for high volume data transfer.
   1809 
If the UDP connection is bi-directional, or the TRANSPORT is able to offer a
backchannel connection, the resulting key can be re-used if the receiving
peer is able to ACK the reception.
   1813 This will cause the communicator to offer a new queue (with a higher priority
   1814 than the default queue) to TRANSPORT with a limited capacity.
   1815 The capacity is increased whenever the communicator receives an ACK for a
   1816 transmission.
   1817 This queue is suitable for high-volume data transfer and TRANSPORT will likely
   1818 prioritize this queue (if available).
   1819 
Communicators that try to establish a connection to a target peer
authenticate their peer ID (public key) in the first packets by signing
a monotonic time stamp, their own peer ID, and the target peer ID, and
by sending this data, together with the signature, in one of the first
packets.
   1824 Receivers should keep track (persist) of the monotonic time stamps for each
   1825 peer ID to reject possible replay attacks.
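
The receiver-side replay check can be sketched like this (a simplified
model; the real communicator also persists the timestamps across
restarts):

```python
# The receiver keeps the largest monotonic timestamp seen per peer ID
# and rejects handshakes that do not advance it.

class ReplayGuard:
    def __init__(self):
        self.last_seen = {}    # peer_id -> highest monotonic timestamp

    def accept(self, peer_id: str, timestamp: int) -> bool:
        if timestamp <= self.last_seen.get(peer_id, -1):
            return False       # replayed (or stale) handshake: reject
        self.last_seen[peer_id] = timestamp
        return True

guard = ReplayGuard()
assert guard.accept("peer-A", 1000)       # first contact
assert not guard.accept("peer-A", 1000)   # exact replay is rejected
assert guard.accept("peer-A", 1001)       # newer timestamp is fine
```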
   1826 
   1827 FIXME: Handshake wire format? KX, Flow.
   1828 
   1829 TCP communicator
   1830 ^^^^^^^^^^^^^^^^
   1831 
   1832 FIXME: Handshake wire format? KX, Flow.
   1833 
   1834 QUIC communicator
   1835 ^^^^^^^^^^^^^^^^^
   1836 The QUIC communicator runs over a bi-directional UDP connection.
   1837 TLS layer with self-signed certificates (binding/signed with peer ID?).
   1838 Single, bi-directional stream?
   1839 FIXME: Handshake wire format? KX, Flow.
   1840 
   1841 .. index::
   1842    double: TRANSPORT; subsystem
   1843 
   1844 .. _TRANSPORT-Subsystem:
   1845 
   1846 TRANSPORT — Overlay transport management
   1847 ========================================
   1848 
   1849 This chapter documents how the GNUnet transport subsystem works. The
   1850 GNUnet transport subsystem consists of three main components: the
   1851 transport API (the interface used by the rest of the system to access
   1852 the transport service), the transport service itself (most of the
   1853 interesting functions, such as choosing transports, happens here) and
   1854 the transport plugins. A transport plugin is a concrete implementation
   1855 for how two GNUnet peers communicate; many plugins exist, for example
   1856 for communication via TCP, UDP, HTTP, HTTPS and others. Finally, the
   1857 transport subsystem uses supporting code, especially the NAT/UPnP
   1858 library to help with tasks such as NAT traversal.
   1859 
   1860 Key tasks of the transport service include:
   1861 
   1862 -  Create our HELLO message, notify clients and neighbours if our HELLO
   1863    changes (using NAT library as necessary)
   1864 
   1865 -  Validate HELLOs from other peers (send PING), allow other peers to
   1866    validate our HELLO's addresses (send PONG)
   1867 
   1868 -  Upon request, establish connections to other peers (using address
   1869    selection from ATS subsystem) and maintain them (again using PINGs
   1870    and PONGs) as long as desired
   1871 
   1872 -  Accept incoming connections, give ATS service the opportunity to
   1873    switch communication channels
   1874 
   1875 -  Notify clients about peers that have connected to us or that have
   1876    been disconnected from us
   1877 
   1878 -  If a (stateful) connection goes down unexpectedly (without explicit
   1879    DISCONNECT), quickly attempt to recover (without notifying clients)
   1880    but do notify clients quickly if reconnecting fails
   1881 
   1882 -  Send (payload) messages arriving from clients to other peers via
   1883    transport plugins and receive messages from other peers, forwarding
   1884    those to clients
   1885 
   1886 -  Enforce inbound traffic limits (using flow-control if it is
   1887    applicable); outbound traffic limits are enforced by CORE, not by us
   1888    (!)
   1889 
   1890 -  Enforce restrictions on P2P connection as specified by the blacklist
   1891    configuration and blacklisting clients
   1892 
   1893 Note that the term \"clients\" in the list above really refers to the
   1894 GNUnet-CORE service, as CORE is typically the only client of the
   1895 transport service.
   1896 
   1897 .. index::
   1898    double: subsystem; SET
   1899 
   1900 .. _SET-Subsystem:
   1901 
   1902 SET — Peer to peer set operations (Deprecated)
   1903 ==============================================
   1904 
   1905 .. note:: 
   1906 
   The SET subsystem is in the process of being replaced by the SETU and
   SETI subsystems, which together provide essentially the same
   functionality, split across two separate subsystems. SETI and SETU
   should be used for new code.
   1910 
   1911 The SET service implements efficient set operations between two peers
   1912 over a CADET tunnel. Currently, set union and set intersection are the
   1913 only supported operations. Elements of a set consist of an *element
   1914 type* and arbitrary binary *data*. The size of an element's data is
   1915 limited to around 62 KB.
   1916 
   1917 .. _Local-Sets:
   1918 
   1919 Local Sets
   1920 ----------
   1921 
   1922 Sets created by a local client can be modified and reused for multiple
   1923 operations. As each set operation requires potentially expensive special
   1924 auxiliary data to be computed for each element of a set, a set can only
   1925 participate in one type of set operation (either union or intersection).
The type of a set is determined upon its creation. If the elements of
a set are needed for an operation of a different type, all of the set's
elements must be copied to a new set of the appropriate type.
   1929 
   1930 .. _Set-Modifications:
   1931 
   1932 Set Modifications
   1933 -----------------
   1934 
   1935 Even when set operations are active, one can add to and remove elements
   1936 from a set. However, these changes will only be visible to operations
   1937 that have been created after the changes have taken place. That is,
   1938 every set operation only sees a snapshot of the set from the time the
   1939 operation was started. This mechanism is *not* implemented by copying
   1940 the whole set, but by attaching *generation information* to each element
   1941 and operation.
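
The generation mechanism described above can be modeled with a short
sketch. This is illustrative Python, not GNUnet's C implementation, and
all names (``GenSet``, ``Operation``) are invented for the example:

```python
# Illustrative model of generation-based snapshots: instead of copying
# the whole set, each element carries the generation in which it was
# added (and possibly removed), and each operation records the
# generation at which it started.

class Operation:
    def __init__(self, owner, generation):
        self._owner = owner
        self._generation = generation

    def view(self):
        """Elements exactly as they were when the operation started."""
        return sorted(
            e for e, (added, removed) in self._owner.elements.items()
            if added < self._generation
            and (removed is None or removed >= self._generation))

class GenSet:
    def __init__(self):
        self.generation = 0
        self.elements = {}  # element -> (generation added, generation removed or None)

    def add(self, element):
        self.elements[element] = (self.generation, None)

    def remove(self, element):
        added, _ = self.elements[element]
        self.elements[element] = (added, self.generation)

    def start_operation(self):
        # Starting an operation advances the generation; later changes
        # get a higher generation and stay invisible to this operation.
        self.generation += 1
        return Operation(self, self.generation)
```

An operation started before a modification keeps seeing the old
snapshot, while operations started afterwards see the change, without
the set ever being copied.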
   1942 
   1943 .. _Set-Operations:
   1944 
   1945 Set Operations
   1946 --------------
   1947 
   1948 Set operations can be started in two ways: Either by accepting an
   1949 operation request from a remote peer, or by requesting a set operation
   1950 from a remote peer. Set operations are uniquely identified by the
   1951 involved *peers*, an *application id* and the *operation type*.
   1952 
   1953 The client is notified of incoming set operations by *set listeners*. A
   1954 set listener listens for incoming operations of a specific operation
   1955 type and application id. Once notified of an incoming set request, the
   1956 client can accept the set request (providing a local set for the
   1957 operation) or reject it.
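
The listener flow can be sketched as follows. This is an illustrative
Python model of the dispatch logic only, not the actual C API of the SET
service; the class and argument names are invented:

```python
# Illustrative model: listeners are keyed by (application id, operation
# type); an incoming request is offered to the matching listener, which
# either accepts it (binding a local set) or rejects it.

class SetListenerTable:
    def __init__(self):
        self._listeners = {}  # (app_id, op_type) -> callback

    def listen(self, app_id, op_type, on_request):
        self._listeners[(app_id, op_type)] = on_request

    def incoming_request(self, peer, app_id, op_type):
        on_request = self._listeners.get((app_id, op_type))
        if on_request is None:
            return None            # no matching listener: request refused
        return on_request(peer)    # client accepts (returns a set) or rejects (None)
```

A client might, for example, register a listener that accepts union
requests only from peers it already trusts and rejects everything else.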
   1958 
   1959 .. _Result-Elements:
   1960 
   1961 Result Elements
   1962 ---------------
   1963 
   1964 The SET service has three *result modes* that determine how an
   1965 operation's result set is delivered to the client:
   1966 
-  **Full Result Set.** All elements of the set resulting from the set
   operation are returned to the client.
   1969 
-  **Added Elements.** Only elements that result from the operation and
   are not already in the local peer's set are returned. Note that for
   some operations (like set intersection) this result mode will never
   return any elements. This can be useful if only the remote peer is
   actually interested in the result of the set operation.
   1975 
-  **Removed Elements.** Only elements that are in the local peer's
   initial set but not in the operation's result set are returned. Note
   that for some operations (like set union) this result mode will never
   return any elements. This can be useful if only the remote peer is
   actually interested in the result of the set operation.
   1981 
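The three result modes can be expressed with ordinary set algebra. The
following Python sketch is purely illustrative (the function name and
string constants are invented, not part of the SET API):

```python
# Illustrative computation of the three SET result modes for a union or
# intersection between a local and a remote element set.

def result_modes(local, remote, op_type):
    result = local | remote if op_type == "union" else local & remote
    return {
        "full": result,             # Full Result Set
        "added": result - local,    # elements new to the local peer
        "removed": local - result,  # elements no longer in the result
    }
```

As the text above notes, "added" is necessarily empty for intersections
(the result is a subset of the local set) and "removed" is necessarily
empty for unions (the local set is a subset of the result).
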
   1982 .. index::
   1983    double: subsystem; SETI
   1984 
   1985 .. _SETI-Subsystem:
   1986 
   1987 SETI — Peer to peer set intersections
   1988 =====================================
   1989 
   1990 The SETI service implements efficient set intersection between two peers
   1991 over a CADET tunnel. Elements of a set consist of an *element type* and
   1992 arbitrary binary *data*. The size of an element's data is limited to
   1993 around 62 KB.
   1994 
   1995 .. _Intersection-Sets:
   1996 
   1997 Intersection Sets
   1998 -----------------
   1999 
   2000 Sets created by a local client can be modified (by adding additional
   2001 elements) and reused for multiple operations. If elements are to be
   2002 removed, a fresh set must be created by the client.
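
The append-only restriction can be illustrated with a tiny sketch
(illustrative Python, not the SETI C API; all names are invented):

```python
# Illustrative model: a SETI set supports adding elements but not
# removing them; to drop elements, the client builds a fresh set.

class IntersectionSet:
    def __init__(self, elements=()):
        self._elements = set(elements)

    def add(self, element):
        self._elements.add(element)

    def elements(self):
        return set(self._elements)

    def copy_without(self, unwanted):
        # There is no remove(): "removal" means creating a fresh set
        # that simply omits the unwanted elements.
        return IntersectionSet(self._elements - set(unwanted))
```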
   2003 
   2004 .. _Set-Intersection-Modifications:
   2005 
   2006 Set Intersection Modifications
   2007 ------------------------------
   2008 
   2009 Even when set operations are active, one can add elements to a set.
   2010 However, these changes will only be visible to operations that have been
   2011 created after the changes have taken place. That is, every set operation
   2012 only sees a snapshot of the set from the time the operation was started.
   2013 This mechanism is *not* implemented by copying the whole set, but by
   2014 attaching *generation information* to each element and operation.
   2015 
   2016 .. _Set-Intersection-Operations:
   2017 
   2018 Set Intersection Operations
   2019 ---------------------------
   2020 
   2021 Set operations can be started in two ways: Either by accepting an
   2022 operation request from a remote peer, or by requesting a set operation
   2023 from a remote peer. Set operations are uniquely identified by the
   2024 involved *peers*, an *application id* and the *operation type*.
   2025 
   2026 The client is notified of incoming set operations by *set listeners*. A
   2027 set listener listens for incoming operations of a specific operation
   2028 type and application id. Once notified of an incoming set request, the
   2029 client can accept the set request (providing a local set for the
   2030 operation) or reject it.
   2031 
   2032 .. _Intersection-Result-Elements:
   2033 
   2034 Intersection Result Elements
   2035 ----------------------------
   2036 
The SETI service has two *result modes* that determine how an operation's
result set is delivered to the client:
   2039 
-  **Return intersection.** All elements of the set resulting from the
   set intersection are returned to the client.
   2042 
   2043 -  **Removed Elements.** Only elements that are in the local peer's
   2044    initial set but not in the intersection are returned.
   2045 
   2049 .. index:: 
   2050    double: SETU; subsystem
   2051 
   2052 .. _SETU-Subsystem:
   2053 
   2054 SETU — Peer to peer set unions
   2055 ==============================
   2056 
   2057 The SETU service implements efficient set union operations between two
   2058 peers over a CADET tunnel. Elements of a set consist of an *element
   2059 type* and arbitrary binary *data*. The size of an element's data is
   2060 limited to around 62 KB.
   2061 
   2062 .. _Union-Sets:
   2063 
   2064 Union Sets
   2065 ----------
   2066 
   2067 Sets created by a local client can be modified (by adding additional
   2068 elements) and reused for multiple operations. If elements are to be
   2069 removed, a fresh set must be created by the client.
   2070 
   2071 .. _Set-Union-Modifications:
   2072 
   2073 Set Union Modifications
   2074 -----------------------
   2075 
   2076 Even when set operations are active, one can add elements to a set.
   2077 However, these changes will only be visible to operations that have been
   2078 created after the changes have taken place. That is, every set operation
   2079 only sees a snapshot of the set from the time the operation was started.
   2080 This mechanism is *not* implemented by copying the whole set, but by
   2081 attaching *generation information* to each element and operation.
   2082 
   2083 .. _Set-Union-Operations:
   2084 
   2085 Set Union Operations
   2086 --------------------
   2087 
   2088 Set operations can be started in two ways: Either by accepting an
   2089 operation request from a remote peer, or by requesting a set operation
   2090 from a remote peer. Set operations are uniquely identified by the
   2091 involved *peers*, an *application id* and the *operation type*.
   2092 
   2093 The client is notified of incoming set operations by *set listeners*. A
   2094 set listener listens for incoming operations of a specific operation
   2095 type and application id. Once notified of an incoming set request, the
   2096 client can accept the set request (providing a local set for the
   2097 operation) or reject it.
   2098 
   2099 .. _Union-Result-Elements:
   2100 
   2101 Union Result Elements
   2102 ---------------------
   2103 
The SETU service has three *result modes* that determine how an
operation's result set is delivered to the client:
   2106 
   2107 -  **Locally added Elements.** Elements that are in the union but not
   2108    already in the local peer's set are returned.
   2109 
-  **Remote added Elements.** Additionally notify the client about the
   elements the remote peer lacked, that is, the elements the local
   client sends to the remote peer to be added to its union. Obtaining
   these elements requires setting the
   ``GNUNET_SETU_OPTION_SYMMETRIC`` option.