@c ***********************************************************************
@node GNUnet Developer Handbook
@chapter GNUnet Developer Handbook

This book is intended to be an introduction for programmers who want to
extend the GNUnet framework. GNUnet is more than a simple peer-to-peer
application.

For developers, GNUnet is:

@itemize @bullet
@item Developed by a community that believes in the GNU philosophy
@item Free Software (Free as in Freedom), licensed under the
GNU Affero General Public License
(@uref{https://www.gnu.org/licenses/licenses.html#AGPL})
@item A set of standards, including coding conventions and
architectural rules
@item A set of layered protocols, specifying both the communication
between peers and the communication between components
of a single peer
@item A set of libraries with well-defined APIs suitable for
writing extensions
@end itemize

In particular, the architecture specifies that a peer consists of many
processes communicating via protocols. Processes can be written in almost
any language.
@code{C}, @code{Java} and @code{Guile} APIs exist for accessing existing
services and for writing extensions.
It is possible to write extensions in other languages by
implementing the necessary IPC protocols.

GNUnet can be extended and improved along many possible dimensions, and
anyone interested in Free Software and Freedom-enhancing Networking is
welcome to join the effort. This Developer Handbook attempts to provide
an initial introduction to some of the key design choices and central
components of the system.
This part of the GNUnet documentation is far from complete,
and we welcome informed contributions, be it in the form of
new chapters, sections or insightful comments.
@menu
* Developer Introduction::
* Internal dependencies::
* Code overview::
* System Architecture::
* Subsystem stability::
* Naming conventions and coding style guide::
* Build-system::
* Developing extensions for GNUnet using the gnunet-ext template::
* Writing testcases::
* Building GNUnet and its dependencies::
* TESTING library::
* Performance regression analysis with Gauger::
* TESTBED Subsystem::
* libgnunetutil::
* Automatic Restart Manager (ARM)::
* TRANSPORT Subsystem::
* NAT library::
* Distance-Vector plugin::
* SMTP plugin::
* Bluetooth plugin::
* WLAN plugin::
* ATS Subsystem::
* CORE Subsystem::
* CADET Subsystem::
* NSE Subsystem::
* HOSTLIST Subsystem::
* IDENTITY Subsystem::
* NAMESTORE Subsystem::
* PEERINFO Subsystem::
* PEERSTORE Subsystem::
* SET Subsystem::
* STATISTICS Subsystem::
* Distributed Hash Table (DHT)::
* GNU Name System (GNS)::
* GNS Namecache::
* REVOCATION Subsystem::
* File-sharing (FS) Subsystem::
* REGEX Subsystem::
* REST Subsystem::
* RPS Subsystem::
@end menu

@node Developer Introduction
@section Developer Introduction

This Developer Handbook is intended as a first introduction to GNUnet for
new developers who want to extend the GNUnet framework. After the
introduction, each of the GNUnet subsystems (directories in the
@file{src/} tree) is (supposed to be) covered in its own chapter. In
addition to this documentation, GNUnet developers should be aware of the
services available to them on the GNUnet server.

New developers can have a look at the GNUnet tutorials for C and Java
available in the @file{src/} directory of the repository or under the
following links:

@c ** FIXME: Link to files in source, not online.
@c ** FIXME: Where is the Java tutorial?
@itemize @bullet
@item @xref{Top, Introduction,, gnunet-c-tutorial, The GNUnet C Tutorial}.
@item @uref{https://tutorial.gnunet.org/, GNUnet C tutorial}
@item GNUnet Java tutorial
@end itemize

In addition to the GNUnet Reference Documentation you are reading,
the GNUnet server at @uref{https://gnunet.org} contains
various resources for GNUnet developers and those
who aspire to become regular contributors.
They are all conveniently reachable via the "Developer"
entry in the navigation menu. Some additional tools (such as static
analysis reports) require special developer access to perform certain
operations. If you want (or require) access, you should contact
@uref{http://grothoff.org/christian/, Christian Grothoff},
GNUnet's maintainer.

@c FIXME: A good part of this belongs on the website or should be
@c extended in subsections explaining usage of this. A simple list
@c is just taking space people have to read.
The public subsystems on the GNUnet server that help developers are:

@itemize @bullet
@item The version control system (git) keeps our code and enables
distributed development.
It is publicly accessible at @uref{https://git.gnunet.org/}.
Only developers with write access can commit code; everyone else is
encouraged to submit patches to the GNUnet-developers mailing list:
@uref{https://lists.gnu.org/mailman/listinfo/gnunet-developers, https://lists.gnu.org/mailman/listinfo/gnunet-developers}
@item The bug tracking system (Mantis).
We use it to track feature requests, open bug reports and their
resolutions.
It can be accessed at
@uref{https://bugs.gnunet.org/, https://bugs.gnunet.org/}.
Anyone can report bugs.
@item Our site installation of the Continuous Integration (CI) system
@code{Buildbot} is used to check GNUnet builds automatically on a range
of platforms.
The web interface of this CI is exposed at
@uref{https://old.gnunet.org/buildbot/, https://old.gnunet.org/buildbot/}.
Builds are triggered automatically 30 minutes after the last commit to
our repository was made.
@item The current quality of our automated test suite is assessed using
code coverage analysis. This analysis is run daily; however, the webpage
is only updated if all automated tests pass at that time. Testcases that
improve our code coverage are always welcome.
@item We try to automatically find bugs using a static analysis scan.
This scan is run daily; however, the webpage is only updated if all
automated tests pass at that time. Note that not everything that is
flagged by the analysis is a bug; sometimes even good code can be marked
as possibly problematic. Nevertheless, developers are encouraged to at
least be aware of all issues in their code that are listed.
@item We use Gauger for automatic performance regression visualization.
@c FIXME: LINK!
Details on how to use Gauger are here.
@item We use @uref{http://junit.org/, junit} to automatically test
@command{gnunet-java}.
Automatically generated, current reports on the test suite are here.
@c FIXME: Likewise.
@item We use Cobertura to generate test coverage reports for gnunet-java.
Current reports on test coverage are here.
@c FIXME: Likewise.
@end itemize

@c ***********************************************************************
@menu
* Project overview::
@end menu

@node Project overview
@subsection Project overview

At this point, the GNUnet project consists of several sub-projects. This
section is supposed to give an initial overview of the various
sub-projects. Note that this description also lists projects that are far
from complete, including even those that have literally not a single line
of code in them yet.
GNUnet sub-projects in order of likely relevance are currently:

@table @asis
@item @command{gnunet}
Core of the P2P framework, including file-sharing, VPN and
chat applications; this is what the Developer Handbook mostly covers
@item @command{gnunet-gtk}
Gtk+-based user interfaces, including:

@itemize @bullet
@item @command{gnunet-fs-gtk} (file-sharing),
@item @command{gnunet-statistics-gtk} (statistics over time),
@item @command{gnunet-peerinfo-gtk}
(information about current connections and known peers),
@item @command{gnunet-namestore-gtk} (GNS record editor),
@item @command{gnunet-conversation-gtk} (voice chat GUI) and
@item @command{gnunet-setup} (setup tool for "everything")
@end itemize

@item @command{gnunet-fuse}
Mounting directories shared via GNUnet's file-sharing
on GNU/Linux distributions
@item @command{gnunet-update}
Installation and update tool
@item @command{gnunet-ext}
Template for starting 'external' GNUnet projects
@item @command{gnunet-java}
Java APIs for writing GNUnet services and applications
@item @command{gnunet-java-ext}
@item @command{eclectic}
Code to run GNUnet nodes on testbeds for research, development,
testing and evaluation
@c ** FIXME: Solve the status and location of gnunet-qt
@item @command{gnunet-qt}
Qt-based GNUnet GUI (is it deprecated?)
@item @command{gnunet-cocoa}
Cocoa-based GNUnet GUI (is it deprecated?)
@item @command{gnunet-guile}
Guile bindings for GNUnet
@item @command{gnunet-python}
Python bindings for GNUnet
@end table

We are also working on various supporting libraries and tools:
@c ** FIXME: What about gauger, and what about libmwmodem?
@table @asis
@item @command{libextractor}
GNU libextractor (meta data extraction)
@item @command{libmicrohttpd}
GNU libmicrohttpd (embedded HTTP(S) server library)
@item @command{gauger}
Tool for performance regression analysis
@item @command{monkey}
Tool for automated debugging of distributed systems
@item @command{libmwmodem}
Library for accessing satellite connection quality reports
@item @command{libgnurl}
gnURL (feature-restricted variant of cURL/libcurl)
@item @command{www}
Work in progress on the new gnunet.org website (based on the Jinja2
framework, to replace our current Drupal website)
@item @command{bibliography}
Our collected bibliography, papers, references, and so forth
@item @command{gnunet-videos-}
Videos about and around GNUnet activities
@end table

Finally, there are various external projects (see links for a list of
those that have a public website) which build on top of the GNUnet
framework.

@c ***********************************************************************
@node Internal dependencies
@section Internal dependencies

This section tries to give an overview of which processes a typical GNUnet
peer running a particular application would consist of. All of the
processes listed here should be automatically started by
@command{gnunet-arm -s}.
The list is given as a rough first guide to users for failure diagnostics.
Ideally, end-users should never have to worry about these internal
dependencies.
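
The "requires" annotations used in the lists of this section can be
checked mechanically. The following is a minimal sketch and not GNUnet
code: it hard-codes the minimum VPN process list from this section
(names shortened by dropping the @code{gnunet-service-} and
@code{gnunet-daemon-} prefixes) and verifies that every process is
listed after the processes it requires. The @code{struct proc} type and
the helpers @code{find_before} and @code{order_is_valid} are
hypothetical names introduced only for this illustration, and the
"required by all" entries are not modelled as explicit dependencies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration; not part of any GNUnet API. */
struct proc
{
  const char *name;    /* process name without its gnunet-* prefix */
  const char *deps[5]; /* NULL-terminated list of required processes */
};

/* Minimum VPN system, in the order given in this section. */
static const struct proc vpn_procs[] = {
  { "arm",        { NULL } },
  { "resolver",   { NULL } },
  { "statistics", { NULL } },
  { "peerinfo",   { NULL } },
  { "transport",  { "peerinfo", NULL } },
  { "core",       { "transport", NULL } },
  { "hostlist",   { "core", NULL } },
  { "dht",        { "core", NULL } },
  { "mesh",       { "dht", "core", NULL } },
  { "dns",        { "dht", NULL } },
  { "regex",      { "dht", NULL } },
  { "vpn",        { "regex", "dns", "mesh", "dht", NULL } },
};

#define VPN_PROC_COUNT (sizeof (vpn_procs) / sizeof (vpn_procs[0]))

/* Return the index of NAME among the first LIMIT entries, or -1. */
static int
find_before (const struct proc *procs, size_t limit, const char *name)
{
  for (size_t i = 0; i < limit; i++)
    if (0 == strcmp (procs[i].name, name))
      return (int) i;
  return -1;
}

/* Return 1 if every process is listed after all processes it
   requires, and 0 otherwise. */
static int
order_is_valid (const struct proc *procs, size_t count)
{
  assert (NULL != procs);
  for (size_t i = 0; i < count; i++)
    for (size_t d = 0; NULL != procs[i].deps[d]; d++)
      if (-1 == find_before (procs, i, procs[i].deps[d]))
        return 0;
  return 1;
}
```

Calling @code{order_is_valid (vpn_procs, VPN_PROC_COUNT)} checks the
list above. The same table layout could be reused for the other lists,
provided every required process also appears as an entry.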

In terms of internal dependencies, a minimum file-sharing system consists
of the following GNUnet processes (in order of dependency):

@itemize @bullet
@item gnunet-service-arm
@item gnunet-service-resolver (required by all)
@item gnunet-service-statistics (required by all)
@item gnunet-service-peerinfo
@item gnunet-service-transport (requires peerinfo)
@item gnunet-service-core (requires transport)
@item gnunet-daemon-hostlist (requires core)
@item gnunet-daemon-topology (requires hostlist, peerinfo)
@item gnunet-service-datastore
@item gnunet-service-dht (requires core)
@item gnunet-service-identity
@item gnunet-service-fs (requires identity, mesh, dht, datastore, core)
@end itemize

@noindent
A minimum VPN system consists of the following GNUnet processes (in
order of dependency):

@itemize @bullet
@item gnunet-service-arm
@item gnunet-service-resolver (required by all)
@item gnunet-service-statistics (required by all)
@item gnunet-service-peerinfo
@item gnunet-service-transport (requires peerinfo)
@item gnunet-service-core (requires transport)
@item gnunet-daemon-hostlist (requires core)
@item gnunet-service-dht (requires core)
@item gnunet-service-mesh (requires dht, core)
@item gnunet-service-dns (requires dht)
@item gnunet-service-regex (requires dht)
@item gnunet-service-vpn (requires regex, dns, mesh, dht)
@end itemize

@noindent
A minimum GNS system consists of the following GNUnet processes (in
order of dependency):

@itemize @bullet
@item gnunet-service-arm
@item gnunet-service-resolver (required by all)
@item gnunet-service-statistics (required by all)
@item gnunet-service-peerinfo
@item gnunet-service-transport (requires peerinfo)
@item gnunet-service-core (requires transport)
@item gnunet-daemon-hostlist (requires core)
@item gnunet-service-dht (requires core)
@item gnunet-service-mesh (requires dht, core)
@item gnunet-service-dns (requires dht)
@item gnunet-service-regex (requires dht)
@item gnunet-service-vpn (requires regex, dns, mesh, dht)
@item
gnunet-service-identity
@item gnunet-service-namestore (requires identity)
@item gnunet-service-gns (requires vpn, dns, dht, namestore, identity)
@end itemize

@c ***********************************************************************
@node Code overview
@section Code overview

This section gives a brief overview of the GNUnet source code.
Specifically, we sketch the function of each of the subdirectories in
the @file{gnunet/src/} directory. The order given is roughly bottom-up
(in terms of the layers of the system).

@table @asis
@item @file{util/} --- libgnunetutil
Library with general utility functions; all
GNUnet binaries link against this library. It covers anything from memory
allocation and data structures to cryptography and inter-process
communication. The goal is to provide an OS-independent interface and
more 'secure' or convenient implementations of commonly used primitives.
The API is spread over more than a dozen headers; developers should study
those closely to avoid duplicating existing functions.
@xref{libgnunetutil}.
@item @file{hello/} --- libgnunethello
HELLO messages are used to
describe under which addresses a peer can be reached (for example,
protocol, IP, port). This library manages parsing and generating of HELLO
messages.
@item @file{block/} --- libgnunetblock
The DHT and other components of GNUnet
store information in units called 'blocks'. Each block has a type and the
type defines a particular format and how that binary format is to be
linked to a hash code (the key for the DHT and for databases). The block
library is a wrapper around block plugins which provide the necessary
functions for each block type.
@item @file{statistics/} --- statistics service
The statistics service enables associating
values (of type @code{uint64_t}) with a component name and a string. The
main uses are debugging (counting events), performance tracking and user
entertainment (what did my peer do today?).
@item @file{arm/} --- Automatic Restart Manager (ARM)
The automatic-restart-manager (ARM) service
is the GNUnet master service. Its role is to start gnunet-services, to
restart them when they crash and finally to shut down the system when
requested.
@item @file{peerinfo/} --- peerinfo service
The peerinfo service keeps track of which peers are known
to the local peer and also tracks the validated addresses for each of
those peers (in the form of a HELLO message). The peer is not
necessarily connected to all peers known to the peerinfo service.
Peerinfo provides persistent storage for peer identities --- peers are
not forgotten just because of a system restart.
@item @file{datacache/} --- libgnunetdatacache
The datacache library provides (temporary) block storage for the DHT.
Existing plugins can store blocks in SQLite, Postgres or MySQL databases.
All data stored in the cache is lost when the peer is stopped or
restarted (datacache uses temporary tables).
@item @file{datastore/} --- datastore service
The datastore service stores file-sharing blocks in
databases for extended periods of time. In contrast to the datacache, data
is not lost when peers restart. However, quota restrictions may still
cause old, expired or low-priority data to be eventually discarded.
Existing plugins can store blocks in SQLite, Postgres or MySQL databases.
@item @file{template/} --- service template
Template for writing a new service. Does nothing.
@item @file{ats/} --- Automatic Transport Selection
The automatic transport selection (ATS) service
is responsible for deciding which address (i.e.
which transport plugin) should be used for communication with other peers,
and at what bandwidth.
@item @file{nat/} --- libgnunetnat
Library that provides basic functions for NAT traversal.
The library supports NAT traversal with
manual hole-punching by the user, UPnP and ICMP-based autonomous NAT
traversal.
The library also includes an API for testing whether the current
configuration works, and the @code{gnunet-nat-server}, which provides an
external service to test the local configuration.
@item @file{fragmentation/} --- libgnunetfragmentation
Some transports (UDP and WLAN, mostly) have restrictions on the maximum
transfer unit (MTU) for packets. The fragmentation library can be used to
break larger packets into chunks of at most 1k and transmit the resulting
fragments reliably (with acknowledgment, retransmission, timeouts,
etc.).
@item @file{transport/} --- transport service
The transport service is responsible for managing the
basic P2P communication. It uses plugins to support P2P communication
over TCP, UDP, HTTP, HTTPS and other protocols. The transport service
validates peer addresses, enforces bandwidth restrictions, limits the
total number of connections and enforces connectivity restrictions
(i.e. friends-only).
@item @file{peerinfo-tool/} --- gnunet-peerinfo
This directory contains the gnunet-peerinfo binary which can be used to
inspect the peers and HELLOs known to the peerinfo service.
@item @file{core/}
The core service is responsible for establishing encrypted, authenticated
connections with other peers, encrypting and decrypting messages and
forwarding messages to higher-level services that are interested in them.
@item @file{testing/} --- libgnunettesting
The testing library allows starting (and stopping) peers
for writing testcases.
It also supports automatic generation of configurations for peers,
ensuring that the ports and paths are disjoint. libgnunettesting is also
the foundation for the testbed service.
@item @file{testbed/} --- testbed service
The testbed service is used for creating small or large scale deployments
of GNUnet peers for evaluation of protocols.
It facilitates peer deployments on multiple
hosts (for example, in a cluster) and establishing various network
topologies (both underlay and overlay).
@item @file{nse/} --- Network Size Estimation
The network size estimation (NSE) service
implements a protocol for (securely) estimating the current size of the
P2P network.
@item @file{dht/} --- distributed hash table
The distributed hash table (DHT) service provides a
distributed implementation of a hash table to store blocks under hash
keys in the P2P network.
@item @file{hostlist/} --- hostlist service
The hostlist service allows learning about
other peers in the network by downloading HELLO messages from an HTTP
server. It can be configured to run such an HTTP server and also
implements a P2P protocol to advertise and automatically learn about other
peers that offer a public hostlist server.
@item @file{topology/} --- topology service
The topology service is responsible for
maintaining the mesh topology. It tries to maintain connections to friends
(depending on the configuration) and also tries to ensure that the peer
has a decent number of active connections at all times. If necessary, new
connections are added. All peers should run the topology service,
otherwise they may end up not being connected to any other peer (unless
some other service ensures that core establishes the required
connections). The topology service also tells the transport service which
connections are permitted (for friend-to-friend networking).
@item @file{fs/} --- file-sharing
The file-sharing (FS) service implements GNUnet's
file-sharing application. Both anonymous file-sharing (using gap) and
non-anonymous file-sharing (using dht) are supported.
@item @file{cadet/} --- cadet service
The CADET service provides a general-purpose routing abstraction to create
end-to-end encrypted tunnels in mesh networks. We wrote a paper
documenting key aspects of the design.
@item @file{tun/} --- libgnunettun
Library for building IPv4, IPv6 packets and creating
checksums for UDP, TCP and ICMP packets.
The header defines C structs for common Internet packet formats and in
particular structs for interacting with TUN (virtual network) interfaces.
@item @file{mysql/} --- libgnunetmysql
Library for creating and executing prepared MySQL
statements and for managing the connection to the MySQL database.
Essentially a lightweight wrapper for the interaction between GNUnet
components and libmysqlclient.
@item @file{dns/}
Service that allows intercepting and modifying DNS requests of
the local machine. Currently used for IPv4-IPv6 protocol translation
(DNS-ALG) as implemented by "pt/" and for the GNUnet naming system. The
service can also be configured to offer an exit service for DNS traffic.
@item @file{vpn/} --- VPN service
The virtual public network (VPN) service provides a virtual
tunnel interface (VTUN) for IP routing over GNUnet.
Needs some other peers to run an "exit" service to work.
Can be activated using the "gnunet-vpn" tool or integrated with DNS using
the "pt" daemon.
@item @file{exit/}
Daemon to allow traffic from the VPN to exit this
peer to the Internet or to specific IP-based services of the local peer.
Currently, an exit service can only be restricted to IPv4 or IPv6, not to
specific ports and/or IP address ranges. If this is not acceptable,
additional firewall rules must be added manually. The exit daemon
currently only works for normal UDP, TCP and ICMP traffic; DNS queries
need to leave the system via a DNS service.
@item @file{pt/}
Protocol translation daemon. This daemon enables 4-to-6,
6-to-4, 4-over-6 or 6-over-4 transitions for the local system. It
essentially uses "DNS" to intercept DNS replies and then maps results to
those offered by the VPN, which then sends them using mesh to some daemon
offering an appropriate exit service.
@item @file{identity/}
Management of egos (alter egos) of a user; identities are
essentially named ECC private keys and used for zones in the GNU name
system and for namespaces in file-sharing, but might find other uses
later.
@item @file{revocation/}
Key revocation service; can be used to revoke the
private key of an identity if it has been compromised.
@item @file{namecache/}
Cache for resolution results for the GNU name system;
data is encrypted and can be shared among users.
Loss of the data should ideally only result in a
performance degradation (persistence not required).
@item @file{namestore/}
Database for the GNU name system with per-user private information;
persistence required.
@item @file{gns/}
GNU name system, a GNU approach to DNS and PKI.
@item @file{dv/}
A plugin for distance-vector (DV)-based routing.
DV consists of a service and a transport plugin to provide peers
with the illusion of a direct P2P connection for connections
that use multiple (typically up to 3) hops in the actual underlay network.
@item @file{regex/}
Service for the (distributed) evaluation of regular expressions.
@item @file{scalarproduct/}
The scalar product service offers an API to perform a secure multiparty
computation which calculates a scalar product between two peers
without exposing the private input vectors of the peers to each other.
@item @file{consensus/}
The consensus service will allow a set of peers to agree
on a set of values via a distributed set union computation.
@item @file{rest/}
The REST API allows access to GNUnet services using RESTful interaction.
The services provide plugins that can be exposed by the REST server.
@c FIXME: Where did this disappear to?
@c @item @file{experimentation/}
@c The experimentation daemon coordinates distributed
@c experimentation to evaluate transport and ATS properties.
@end table

@c ***********************************************************************
@node System Architecture
@section System Architecture

@c FIXME: For those irritated by the textflow, we are missing images here,
@c in the short term we should add them back, in the long term this should
@c work without images or have images with alt-text.

GNUnet developers like LEGOs. The blocks are indestructible, can be
stacked together to construct complex buildings and it is generally easy
to swap one block for a different one that has the same shape. GNUnet's
architecture is based on LEGOs:

@image{images/service_lego_block,5in,,picture of a LEGO block stack - 3 APIs upon IPC/network protocol provided by a service}

This section documents the GNUnet LEGO system, also known as GNUnet's
system architecture.

The most common GNUnet component is a service. Services offer an API (or
several, depending on what you count as "an API") which is implemented as
a library. The library communicates with the main process of the service
using a service-specific network protocol. The main process of the service
typically doesn't fully provide everything that is needed --- it has holes
to be filled by APIs to other services.

User interfaces and daemons are a special kind of component in GNUnet.
Like services, they have holes to be filled by APIs of other services.
Unlike services, daemons do not implement their own network protocol and
they have no API:

@image{images/daemon_lego_block,5in,,A daemon in GNUnet is a component that does not offer an API for others to build upon}

The GNUnet system provides a range of services, daemons and user
interfaces, which are then combined into a layered GNUnet instance (also
known as a peer).

@image{images/service_stack,5in,,A GNUnet peer consists of many layers of services}

Note that while it is generally possible to swap one service for another
compatible service, there is often only one implementation.
However, during development we often have a "new" version of a service in
parallel with an "old" version. While the "new" version is not working,
developers working on other parts of the service can continue their
development by simply using the "old" service. Alternative design ideas
can also be easily investigated by swapping out individual components.
This is typically achieved by simply changing the name of the "BINARY"
in the respective configuration section.

Key properties of GNUnet services are that they must be separate
processes and that they must protect themselves by applying tight error
checking against the network protocol they implement (thereby achieving a
certain degree of robustness).

On the other hand, the APIs are implemented to tolerate failures of the
service, isolating their host process from errors by the service. If the
service process crashes, other services and daemons around it should not
also fail, but instead wait for the service process to be restarted by
ARM.

@c ***********************************************************************
@node Subsystem stability
@section Subsystem stability

This section documents the current stability of the various GNUnet
subsystems. Stability here describes the expected degree of compatibility
with future versions of GNUnet. For each subsystem we distinguish between
compatibility on the P2P network level (communication protocol between
peers), the IPC level (communication between the service and the service
library) and the API level (stability of the API). P2P compatibility is
relevant in terms of which applications are likely going to be able to
communicate with future versions of the network. IPC compatibility is
relevant for the implementation of language bindings that re-implement the
IPC messages. Finally, API compatibility is relevant to developers who
hope to be able to avoid changes to applications built on top of the APIs
of the framework.

The following table summarizes our current view of the stability of the
respective protocols or APIs:

@multitable @columnfractions .20 .20 .20 .20
@headitem Subsystem @tab P2P @tab IPC @tab C API
@item util @tab n/a @tab n/a @tab stable
@item arm @tab n/a @tab stable @tab stable
@item ats @tab n/a @tab unstable @tab testing
@item block @tab n/a @tab n/a @tab stable
@item cadet @tab testing @tab testing @tab testing
@item consensus @tab experimental @tab experimental @tab experimental
@item core @tab stable @tab stable @tab stable
@item datacache @tab n/a @tab n/a @tab stable
@item datastore @tab n/a @tab stable @tab stable
@item dht @tab stable @tab stable @tab stable
@item dns @tab stable @tab stable @tab stable
@item dv @tab testing @tab testing @tab n/a
@item exit @tab testing @tab n/a @tab n/a
@item fragmentation @tab stable @tab n/a @tab stable
@item fs @tab stable @tab stable @tab stable
@item gns @tab stable @tab stable @tab stable
@item hello @tab n/a @tab n/a @tab testing
@item hostlist @tab stable @tab stable @tab n/a
@item identity @tab stable @tab stable @tab n/a
@item multicast @tab experimental @tab experimental @tab experimental
@item mysql @tab stable @tab n/a @tab stable
@item namestore @tab n/a @tab stable @tab stable
@item nat @tab n/a @tab n/a @tab stable
@item nse @tab stable @tab stable @tab stable
@item peerinfo @tab n/a @tab stable @tab stable
@item psyc @tab experimental @tab experimental @tab experimental
@item pt @tab n/a @tab n/a @tab n/a
@item regex @tab stable @tab stable @tab stable
@item revocation @tab stable @tab stable @tab stable
@item social @tab experimental @tab experimental @tab experimental
@item statistics @tab n/a @tab stable @tab stable
@item testbed @tab n/a @tab testing @tab testing
@item testing @tab n/a @tab n/a @tab testing
@item topology @tab n/a @tab n/a @tab n/a
@item transport @tab stable @tab stable @tab stable
@item tun @tab n/a @tab n/a @tab stable
@item vpn @tab testing @tab n/a @tab n/a
@end multitable

Here
is a rough explanation of the values:

@table @samp
@item stable
No incompatible changes are planned at this time; for IPC/APIs, if
there are incompatible changes, they will be minor and might only require
minimal changes to existing code; for P2P, changes will be avoided if at
all possible for the 0.10.x-series
@item testing
No incompatible changes are
planned at this time, but the code is still known to be in flux; so while
we have no concrete plans, our expectation is that there will still be
minor modifications; for P2P, changes will likely be extensions that
should not break existing code
@item unstable
Changes are planned and will happen; however, they
will not be totally radical and the result should still resemble what is
there now; nevertheless, anticipated changes will break protocol/API
compatibility
@item experimental
Changes are planned and the result may look nothing like
what the API/protocol looks like today
@item unknown
Someone should think about where this subsystem is headed
@item n/a
This subsystem does not have an API/IPC-protocol/P2P-protocol
@end table

@c ***********************************************************************
@node Naming conventions and coding style guide
@section Naming conventions and coding style guide

Here you can find some rules to help you write code for GNUnet.
@c ***********************************************************************
@menu
* Naming conventions::
* Coding style::
@end menu

@node Naming conventions
@subsection Naming conventions

@c ***********************************************************************
@menu
* include files::
* binaries::
* logging::
* configuration::
* exported symbols::
* private (library-internal) symbols (including structs and macros)::
* testcases::
* performance tests::
* src/ directories::
@end menu

@node include files
@subsubsection include files

@itemize @bullet
@item _lib: library without need for a process
@item _service: library that needs a service process
@item _plugin: plugin definition
@item _protocol: structs used in network protocol
@item exceptions:
@itemize @bullet
@item gnunet_config.h --- generated
@item platform.h --- first included
@item plibc.h --- external library
@item gnunet_common.h --- fundamental routines
@item gnunet_directories.h --- generated
@item gettext.h --- external library
@end itemize
@end itemize

@c ***********************************************************************
@node binaries
@subsubsection binaries

@itemize @bullet
@item gnunet-service-xxx: service process (has listen socket)
@item gnunet-daemon-xxx: daemon process (no listen socket)
@item gnunet-helper-xxx[-yyy]: SUID helper for module xxx
@item gnunet-yyy: command-line tool for end-users
@item libgnunet_plugin_xxx_yyy.so: plugin for API xxx
@item libgnunetxxx.so: library for API xxx
@end itemize

@c ***********************************************************************
@node logging
@subsubsection logging

@itemize @bullet
@item services and daemons use their directory name in
@code{GNUNET_log_setup} (e.g. 'core') and log using
plain 'GNUNET_log'.
@item command-line tools use their full name in
@code{GNUNET_log_setup} (e.g. 'gnunet-publish') and log using
plain 'GNUNET_log'.
@item service access libraries log using
'@code{GNUNET_log_from}' and use '@code{DIRNAME-api}' for the
component (e.g. 'core-api')
@item pure libraries (without associated service) use
'@code{GNUNET_log_from}' with the component set to their
library name (without lib or '@file{.so}'),
which should also be their directory name (e.g. '@file{nat}')
@item plugins should use '@code{GNUNET_log_from}'
with the directory name and the plugin name combined to produce the
component name (e.g. 'transport-tcp').
@item logging should be unified per-file by defining a
@code{LOG} macro with the appropriate arguments,
along these lines:

@example
#define LOG(kind,...) GNUNET_log_from (kind, "example-api", __VA_ARGS__)
@end example

@end itemize

@c ***********************************************************************
@node configuration
@subsubsection configuration

@itemize @bullet
@item paths (that are substituted in all filenames) are in PATHS
(have as few as possible)
@item all options for a particular module (@file{src/MODULE})
are under @code{[MODULE]}
@item options for a plugin of a module
are under @code{[MODULE-PLUGINNAME]}
@end itemize

@c ***********************************************************************
@node exported symbols
@subsubsection exported symbols

@itemize @bullet
@item must start with @code{GNUNET_modulename_} and be defined in
@file{modulename.c}
@item exceptions: those defined in @file{gnunet_common.h}
@end itemize

@c ***********************************************************************
@node private (library-internal) symbols (including structs and macros)
@subsubsection private (library-internal) symbols (including structs and macros)

@itemize @bullet
@item must NOT start with any prefix
@item must not be exported in a way that linkers could use them or
other libraries might see them via headers; they must be either
declared/defined in C source files or in headers that are in the respective
directory under @file{src/modulename/} and NEVER be declared
in @file{src/include/}.
@end itemize

@node testcases
@subsubsection testcases

@itemize @bullet
@item must be called @file{test_module-under-test_case-description.c}
@item "case-description" may be omitted if there is only one test
@end itemize

@c ***********************************************************************
@node performance tests
@subsubsection performance tests

@itemize @bullet
@item must be called @file{perf_module-under-test_case-description.c}
@item "case-description" may be omitted if there is only one performance
test
@item must only be run if @code{HAVE_BENCHMARKS} is satisfied
@end itemize

@c ***********************************************************************
@node src/ directories
@subsubsection src/ directories

@itemize @bullet
@item gnunet-NAME: end-user applications (e.g., gnunet-search, gnunet-arm)
@item gnunet-service-NAME: service processes with accessor library
(e.g., gnunet-service-arm)
@item libgnunetNAME: accessor library (_service.h-header) or
standalone library (_lib.h-header)
@item gnunet-daemon-NAME: daemon process without accessor library
(e.g., gnunet-daemon-hostlist) and no GNUnet management port
@item libgnunet_plugin_DIR_NAME: loadable plugins
(e.g., libgnunet_plugin_transport_tcp)
@end itemize

@cindex Coding style
@node Coding style
@subsection Coding style

@c XXX: Adjust examples to GNU Standards!
@itemize @bullet
@item We follow the GNU Coding Standards (@pxref{Top, The GNU Coding Standards,, standards, The GNU Coding Standards});
@item Indentation is done with spaces, two per level, no tabs;
@item C99 struct initialization is fine;
@item declare only one variable per line:

@noindent
instead of

@example
int i,j;
@end example

@noindent
write:

@example
int i;
int j;
@end example

@c TODO: include actual example from a file in source

@noindent
This helps keep diffs small and forces developers to think precisely about
the type of every variable.
Note that @code{char *} is different from @code{const char*} and
@code{int} is different from @code{unsigned int} or @code{uint32_t}.
Each variable type should be chosen with care.

@item While @code{goto} should generally be avoided, having a
@code{goto} to the end of a function to a block of cleanup
statements (free, close, etc.) can be acceptable.

@item Conditions should be written with constants on the left (to avoid
accidental assignment) and with the @code{true} target being either the
@code{error} case or the significantly simpler continuation. For example:

@example
if (0 != stat ("filename", &sbuf))
@{
  error();
@}
else
@{
  /* handle normal case here */
@}
@end example

@noindent
instead of

@example
if (stat ("filename", &sbuf) == 0)
@{
  /* handle normal case here */
@}
else
@{
  error();
@}
@end example

@noindent
If possible, the error clause should be terminated with a @code{return}
(or @code{goto} to some cleanup routine) and in this case, the
@code{else} clause should be omitted:

@example
if (0 != stat ("filename", &sbuf))
@{
  error();
  return;
@}
/* handle normal case here */
@end example

This serves to avoid deep nesting. The 'constants on the left' rule
applies to all constants (including @code{GNUNET_SCHEDULER_NO_TASK},
NULL, and enums). With the two above rules (constants on left, errors in
'true' branch), there is only one way to write most branches correctly.

@item Combined assignments and tests are allowed if they do not hinder
code clarity. For example, one can write:

@example
if (NULL == (value = lookup_function()))
@{
  error();
  return;
@}
@end example

@item Use @code{break} and @code{continue} wherever possible to avoid
deep(er) nesting. Thus, we would write:

@example
next = head;
while (NULL != (pos = next))
@{
  next = pos->next;
  if (!
      should_free (pos))
    continue;
  GNUNET_CONTAINER_DLL_remove (head, tail, pos);
  GNUNET_free (pos);
@}
@end example

instead of

@example
next = head;
while (NULL != (pos = next))
@{
  next = pos->next;
  if (should_free (pos))
  @{
    /* unnecessary nesting! */
    GNUNET_CONTAINER_DLL_remove (head, tail, pos);
    GNUNET_free (pos);
  @}
@}
@end example

@item We primarily use @code{for} and @code{while} loops.
A @code{while} loop is used if the method for advancing in the loop is
not a straightforward increment operation. In particular, we use:

@example
next = head;
while (NULL != (pos = next))
@{
  next = pos->next;
  if (! should_free (pos))
    continue;
  GNUNET_CONTAINER_DLL_remove (head, tail, pos);
  GNUNET_free (pos);
@}
@end example

to free entries in a list (as the iteration changes the structure of the
list due to the free; the equivalent @code{for} loop no longer
follows the simple @code{for} paradigm of @code{for(INIT;TEST;INC)}).
However, for loops that do follow the simple @code{for} paradigm we do
use @code{for}, even if it involves linked lists:

@example
/* simple iteration over a linked list */
for (pos = head;
     NULL != pos;
     pos = pos->next)
@{
  use (pos);
@}
@end example

@item The first argument to all higher-order functions in GNUnet must be
declared to be of type @code{void *} and is reserved for a closure. We do
not use inner functions, as trampolines would conflict with setups that
use non-executable stacks.
The first statement in a higher-order function, which usually should
be part of the variable declarations, should assign the
@code{cls} argument to the precise expected type. For example:

@example
int
callback (void *cls,
          char *args)
@{
  struct Foo *foo = cls;
  int other_variables;

  /* rest of function */
@}
@end example

@item As shown in the example above, after the return type of a
function there should be a break. Each parameter should
be on a new line.

@item It is good practice to write complex @code{if} expressions instead
of using deeply nested @code{if} statements.
However, except for addition and multiplication, all operators should use
parens. This is fine:

@example
if ( (1 == foo) ||
     ( (0 == bar) &&
       (x != y) ) )
  return x;
@end example

However, this is not:

@example
if (1 == foo)
  return x;
if (0 == bar && x != y)
  return x;
@end example

@noindent
Note that splitting the @code{if} statement above is debatable as the
@code{return x} is a very trivial statement. However, once the logic after
the branch becomes more complicated (and is still identical), the "or"
formulation should definitely be used.

@item There should be two empty lines between the end of the function and
the comments describing the following function. There should be a single
empty line after the initial variable declarations of a function. If a
function has no local variables, there should be no initial empty line. If
a long function consists of several complex steps, those steps might be
separated by an empty line (possibly followed by a comment describing the
following step). The code should not contain empty lines in arbitrary
places; if in doubt, it is likely better to NOT have an empty line (this
way, more code will fit on the screen).
@end itemize

@c ***********************************************************************
@node Build-system
@section Build-system

If you have code that is likely not to compile, or build rules that you do
not want to trigger for most developers, use @code{if HAVE_EXPERIMENTAL}
in your @file{Makefile.am}.
Then it is OK to (temporarily) add non-compiling (or known-to-not-port)
code.

If you want to compile all testcases but NOT run them, run configure with
the @code{--enable-test-suppression} option.

If you want to run all testcases, including those that take a while, run
configure with the @code{--enable-expensive-testcases} option.

If you want to compile and run benchmarks, run configure with the
@code{--enable-benchmarks} option.
If you want to obtain code coverage results, run configure with the
@code{--enable-coverage} option and run the @file{coverage.sh} script in
the @file{contrib/} directory.

@cindex gnunet-ext
@node Developing extensions for GNUnet using the gnunet-ext template
@section Developing extensions for GNUnet using the gnunet-ext template

For developers who want to write extensions for GNUnet we provide the
gnunet-ext template as an easy-to-use skeleton.

gnunet-ext contains the build environment and template files for the
development of GNUnet services, command line tools, APIs and tests.

First of all you have to obtain gnunet-ext from git:

@example
git clone https://git.gnunet.org/gnunet-ext.git
@end example

The next step is to bootstrap and configure it. For configure you have to
provide the path containing GNUnet with
@code{--with-gnunet=/path/to/gnunet} and the prefix where you want to
install the extension using @code{--prefix=/path/to/install}:

@example
./bootstrap
./configure --prefix=/path/to/install --with-gnunet=/path/to/gnunet
@end example

When your GNUnet installation is not included in the default linker search
path, you have to add @code{/path/to/gnunet} to the file
@file{/etc/ld.so.conf} and run @code{ldconfig}, or add it to the
environment variable @code{LD_LIBRARY_PATH} by using

@example
export LD_LIBRARY_PATH=/path/to/gnunet/lib
@end example

@cindex writing testcases
@node Writing testcases
@section Writing testcases

Ideally, any non-trivial GNUnet code should be covered by automated
testcases. Testcases should reside in the same place as the code that is
being tested. The name of source files implementing tests should begin
with @code{test_} followed by the name of the file that contains
the code that is being tested.

Testcases in GNUnet should be integrated with the autotools build system.
This way, developers and anyone building binary packages will be able to
run all testcases simply by running @code{make check}.
The final testcases shipped with the distribution should output at most
some brief progress information and not display debug messages by default.
The success or failure of a testcase must be indicated by returning zero
(success) or non-zero (failure) from the @code{main} function of the
testcase.
The integration with the autotools is relatively straightforward and only
requires modifications to the @file{Makefile.am} in the directory
containing the testcase. For a testcase testing the code in @file{foo.c}
the @file{Makefile.am} would contain the following lines:

@example
check_PROGRAMS = test_foo
TESTS = $(check_PROGRAMS)
test_foo_SOURCES = test_foo.c
test_foo_LDADD = $(top_builddir)/src/util/libgnunetutil.la
@end example

Naturally, other libraries used by the testcase may be specified in the
@code{LDADD} directive as necessary.

Often testcases depend on additional input files, such as a configuration
file. These support files have to be listed using the @code{EXTRA_DIST}
directive in order to ensure that they are included in the distribution.

Example:

@example
EXTRA_DIST = test_foo_data.conf
@end example

Executing @code{make check} will run all testcases in the current
directory and all subdirectories. Testcases can be compiled individually
by running @code{make test_foo} and then invoked directly using
@code{./test_foo}. Note that due to the use of plugins in GNUnet, it is
typically necessary to run @code{make install} before running any
testcases. Thus the canonical command @code{make check install} has to be
changed to @code{make install check} for GNUnet.

@c ***********************************************************************
@cindex Building GNUnet
@node Building GNUnet and its dependencies
@section Building GNUnet and its dependencies

In the following section we will outline how to build GNUnet and
some of its dependencies. We will assume a fair amount of knowledge
for building applications under UNIX-like systems.
Furthermore we assume that the build environment is sane and that you
are aware of any implications actions in this process could have.
Instructions here can be seen as notes for developers (an extension to
the 'HACKING' section in README) as well as package maintainers.
@b{Users should rely on the available binary packages.}
We will use Debian as an example Operating System
environment. Substitute accordingly with your own Operating System
environment.

For the full list of dependencies, consult the appropriate, up-to-date
section in the @file{README} file.

First, we need to build or install (depending on your OS) the following
packages. If you build them from source, build them in this exact order:

@example
libgpgerror, libgcrypt, libnettle, libunbound, GnuTLS (with libunbound
support)
@end example

After we have built and installed those packages, we continue with
packages closer to GNUnet in this step: libgnurl (our libcurl fork),
GNU libmicrohttpd, and GNU libextractor. Again, if your package manager
provides one of these packages, use the packages it provides
unless you have good reasons (package version too old, conflicts, etc).
We advise against compiling widely used packages such as GnuTLS
yourself if your OS already provides a variant, unless you are prepared
to take over maintenance of those packages.

In the optimistic case, this command will give you all the dependencies:

@example
sudo apt-get install libgnurl libmicrohttpd libextractor
@end example

From experience we know that at the very least libgnurl is not
available in some environments. You could substitute libgnurl
with libcurl, but we recommend installing libgnurl, as it gives
you a libcurl build restricted to the small feature set GNUnet requires.
In the past namespaces of libcurl and libgnurl were shared, which caused
problems when you wanted to integrate both of them in one Operating
System. This has been resolved, and they can be installed side by side
now.
@cindex libgnurl
@cindex compiling libgnurl
GNUnet and some of its functions depend on a limited subset of
cURL/libcurl. Rather than trying to enforce a certain configuration on
the world, we opted to maintain a microfork of it that ensures we can
link against the right set of features. We called this specialized set
of libcurl ``libgnurl''. It is fully ABI compatible with libcurl and
currently used by GNUnet and some of its dependencies.

We download libgnurl and its digital signature from the GNU fileserver,
assuming @env{TMPDIR} exists.

Note: TMPDIR might be @file{/tmp}, @env{TMPDIR}, @env{TMP} or any other
location. For consistency we assume @env{TMPDIR} points to @file{/tmp}
for the remainder of this section.

@example
cd $TMPDIR
wget https://ftp.gnu.org/gnu/gnunet/gnurl-7.60.0.tar.Z
wget https://ftp.gnu.org/gnu/gnunet/gnurl-7.60.0.tar.Z.sig
@end example

Next, verify the digital signature of the file:

@example
gpg --verify gnurl-7.60.0.tar.Z.sig
@end example

If gpg fails, you might try with @command{gpg2} on your OS. If the error
states that ``the key can not be found'' or it is unknown, you have to
retrieve the key (A88C8ADD129828D7EAC02E52E22F9BBFEE348588) from a
keyserver first:

@example
gpg --keyserver pgp.mit.edu --recv-keys A88C8ADD129828D7EAC02E52E22F9BBFEE348588
@end example

and rerun the verification command.

libgnurl will require the following packages to be present at runtime:
gnutls (with DANE support / libunbound), libidn, zlib and at compile time:
libtool, groff, perl, pkg-config, and python 2.7.

Once you have verified that all the required packages are present on your
system, we can proceed to compile libgnurl:

@example
tar -xvf gnurl-7.60.0.tar.Z
cd gnurl-7.60.0
sh configure --disable-ntlm-wb
make
make -C tests test
sudo make install
@end example

After you've compiled and installed libgnurl, we can proceed to building
GNUnet.
First, in addition to the GNUnet sources you might require downloading the
latest version of various dependencies, depending on how recent the
software versions in your distribution of GNU/Linux are.
Most distributions do not include sufficiently recent versions of these
dependencies.
Thus, a typical installation on a "modern" GNU/Linux distribution
requires you to install the following dependencies (ideally in this
order):

@itemize @bullet
@item libgpgerror and libgcrypt
@item libnettle and libunbound (possibly from distribution), GnuTLS
@item libgnurl (read the README)
@item GNU libmicrohttpd
@item GNU libextractor
@end itemize

Make sure to first install the various mandatory and optional
dependencies including development headers from your distribution.

Another dependency that you should strongly consider installing is a
database (MySQL, sqlite or Postgres).
The following instructions will assume that you installed at least
sqlite. For most distributions you should be able to find pre-built
packages for the database. Again, make sure to install the client
libraries @b{and} the respective development headers (if they are
packaged separately) as well.

You can find specific, detailed instructions for installing the
dependencies (and possibly the rest of the GNUnet installation) in the
platform-specific descriptions, which can be found in the Index.
Please consult them now.
If your distribution is not listed, please study the build
instructions for Debian stable carefully as you try to install the
dependencies for your own distribution.
Contributing additional instructions for further platforms is always
appreciated.
Please keep in mind that operating system development tends to move at
a rather fast speed. Due to this you should be aware that some of
the instructions could be outdated by the time you are reading this.
If you find a mistake, please tell us about it (or even better: send
a patch to the documentation to fix it!).
Before proceeding further, please double-check the dependency list.
Note that in addition to satisfying the dependencies, you might have to
make sure that development headers for the various libraries are also
installed.
There may be files for other distributions, or you might be able to find
equivalent packages for your distribution.

While it is possible to build and install GNUnet without having root
access, we will assume that you have full control over your system in
these instructions.
First, you should create a system user @emph{gnunet} and an additional
group @emph{gnunetdns}. On the GNU/Linux distributions Debian and Ubuntu,
type:

@example
sudo adduser --system --home /var/lib/gnunet --group \
--disabled-password gnunet
sudo addgroup --system gnunetdns
@end example

@noindent
On other Unixes and GNU systems, this should have the same effect:

@example
sudo useradd --system --user-group --home-dir /var/lib/gnunet gnunet
sudo groupadd --system gnunetdns
@end example

Now compile and install GNUnet using:

@example
tar xvf gnunet-@value{VERSION}.tar.gz
cd gnunet-@value{VERSION}
./configure --with-sudo=sudo --with-nssdir=/lib
make
sudo make install
@end example

If you want to be able to enable DEBUG-level log messages, add
@code{--enable-logging=verbose} to the end of the
@command{./configure} command.
@code{DEBUG}-level log messages are in English only and should only be
useful for developers (or for filing really detailed bug reports).
@noindent
Next, edit the file @file{/etc/gnunet.conf} to contain the following:

@example
[arm]
START_SYSTEM_SERVICES = YES
START_USER_SERVICES = NO
@end example

@noindent
You may need to update your @code{ld.so} cache to include
files installed in @file{/usr/local/lib}:

@example
# ldconfig
@end example

@noindent
Then, switch from user @code{root} to user @code{gnunet} to start
the peer:

@example
# su -s /bin/sh - gnunet
$ gnunet-arm -c /etc/gnunet.conf -s
@end example

You may also want to add the last line to the gnunet user's @file{crontab}
prefixed with @code{@@reboot} so that it is executed whenever the system
is booted:

@example
@@reboot /usr/local/bin/gnunet-arm -c /etc/gnunet.conf -s
@end example

@noindent
This will only start the system-wide GNUnet services.
Type @command{exit} to get back your root shell.
Now, you need to configure the per-user part. For each
user that should get access to GNUnet on the system, run
(replace alice with your username):

@example
sudo adduser alice gnunet
@end example

@noindent
to allow them to access the system-wide GNUnet services. Then, each
user should create a configuration file @file{~/.config/gnunet.conf}
with the lines:

@example
[arm]
START_SYSTEM_SERVICES = NO
START_USER_SERVICES = YES
DEFAULTSERVICES = gns
@end example

@noindent
and start the per-user services using

@example
$ gnunet-arm -c ~/.config/gnunet.conf -s
@end example

@noindent
Again, adding a @code{crontab} entry to autostart the peer is advised:

@example
@@reboot /usr/local/bin/gnunet-arm -c $HOME/.config/gnunet.conf -s
@end example

@noindent
Note that some GNUnet services (such as SOCKS5 proxies) may need a
system-wide TCP port for each user.
For those services, systems with more than one user may require each user
to specify a different port number in their personal configuration file.

Finally, the user should perform the basic initial setup for the GNU Name
System (GNS) certificate authority.
This is done by running:

@example
$ gnunet-gns-proxy-setup-ca
@end example

@noindent
This command sets up the GNS Certificate Authority with the user's
browser. Now, to activate GNS in the
normal DNS resolution process, you need to edit your
@file{/etc/nsswitch.conf} where you should find a line like this:

@example
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
@end example

@noindent
The exact details may differ a bit, which is fine. Add the text
@emph{"gns [NOTFOUND=return]"} after @emph{"files"}.
Keep in mind that we included a backslash ("\") here just for
markup reasons. You should write the text below on @b{one line}
and @b{without} the "\":

@example
hosts: files gns [NOTFOUND=return] mdns4_minimal \
[NOTFOUND=return] dns mdns4
@end example

@c FIXME: Document new behavior.
You might want to make sure that @file{/lib/libnss_gns.so.2} exists on
your system; it should have been created during the installation.

@c **********************************************************************
@cindex TESTING library
@node TESTING library
@section TESTING library

The TESTING library is used for writing testcases which involve starting a
single or multiple peers. While peers can also be started by testcases
using the ARM subsystem, using the TESTING library provides an elegant way
to do this. The configurations of the peers are auto-generated from a
given template to have non-conflicting port numbers, ensuring that peers'
services do not run into bind errors. This is achieved by testing ports'
availability by binding a listening socket to them before allocating them
to services in the generated configurations.

Another advantage of using TESTING is that it shortens the testcase
startup time, as the hostkeys for peers are copied from a pre-computed set
of hostkeys instead of being generated at peer startup, which may take a
considerable amount of time when starting multiple peers or on an embedded
processor.
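
The port probing described above can be sketched in plain POSIX C. This is
a minimal illustration under stated assumptions, not the code the TESTING
library uses: the helper names @code{port_is_free} and
@code{find_free_port} are hypothetical, and only TCP on the loopback
interface is probed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of the TESTING API): returns 1 if a TCP
   listening socket can be bound to PORT on the loopback interface,
   0 otherwise.  The probe socket is closed again before returning. */
static int
port_is_free (uint16_t port)
{
  struct sockaddr_in addr;
  int fd;
  int ok;

  fd = socket (AF_INET, SOCK_STREAM, 0);
  if (-1 == fd)
    return 0;
  memset (&addr, 0, sizeof (addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  addr.sin_port = htons (port);
  ok = ( (0 == bind (fd, (const struct sockaddr *) &addr, sizeof (addr))) &&
         (0 == listen (fd, 1)) );
  close (fd);
  return ok;
}


/* Hypothetical helper: returns the first free port in [low, high],
   or 0 if every port in the range is taken. */
static uint16_t
find_free_port (uint16_t low,
                uint16_t high)
{
  uint32_t port; /* wider than uint16_t so that high == 65535 terminates */

  assert (low <= high);
  for (port = low; port <= high; port++)
    if (port_is_free ((uint16_t) port))
      return (uint16_t) port;
  return 0;
}
```

A probe of this kind is inherently racy: another process may claim the port
between the probe and the moment the service binds it, so the result is
only a strong hint.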
TESTING also allows for certain services to be shared among peers.
This feature is invaluable when testing with multiple peers as it helps to
reduce the number of services run per peer and hence the total
number of processes run per testcase.

The TESTING library only handles creating, starting and stopping peers.
Features useful for testcases such as connecting peers in a topology are
not available in TESTING but are available in the TESTBED subsystem.
Furthermore, TESTING only creates peers on the localhost; by using
TESTBED, testcases can benefit from creating peers across multiple
hosts.

@menu
* API::
* Finer control over peer stop::
* Helper functions::
* Testing with multiple processes::
@end menu

@cindex TESTING API
@node API
@subsection API

TESTING abstracts a group of peers as a TESTING system. All peers in a
system share a common hostname, and no two services of these peers use the
same port or the same UNIX domain socket path.

A TESTING system can be created with the function
@code{GNUNET_TESTING_system_create()} which returns a handle to the
system. This function takes a directory path which is used for generating
the configurations of peers, an IP address from which connections to the
peers' services should be allowed, the hostname to be used in peers'
configuration, and an array of shared service specifications of type
@code{struct GNUNET_TESTING_SharedService}.

The shared service specification must specify the name of the service to
share, the configuration pertaining to that shared service and the
maximum number of peers that are allowed to share a single instance of
the shared service.

A TESTING system created with @code{GNUNET_TESTING_system_create()}
chooses ports from the default range @code{12000} - @code{56000} while
auto-generating configurations for peers.
This range can be customised with the function
@code{GNUNET_TESTING_system_create_with_portrange()}.
This function is similar to @code{GNUNET_TESTING_system_create()} except
that it takes two additional parameters: the start and end of the port
range to use.

A TESTING system is destroyed with the function
@code{GNUNET_TESTING_system_destroy()}. This function takes the handle of
the system and a flag to remove the files created in the directory used
to generate configurations.

A peer is created with the function
@code{GNUNET_TESTING_peer_configure()}. This function takes the system
handle, a configuration template from which the configuration for the peer
is auto-generated and the index of the pre-computed hostkey that is to be
copied for the peer. When successful, this function returns a handle to
the peer which can be used to start and stop it and to obtain the identity
of the peer. If unsuccessful, a NULL pointer is returned with an error
message. This function ensures that the generated configuration has
non-conflicting ports and paths.

Peers can be started and stopped by calling the functions
@code{GNUNET_TESTING_peer_start()} and @code{GNUNET_TESTING_peer_stop()}
respectively. A peer can be destroyed by calling the function
@code{GNUNET_TESTING_peer_destroy}. When a peer is destroyed, the ports
and paths allocated in its configuration are reclaimed for usage in new
peers.

@c ***********************************************************************
@node Finer control over peer stop
@subsection Finer control over peer stop

Using @code{GNUNET_TESTING_peer_stop()} is normally fine for testcases.
However, calling this function for each peer is inefficient when trying to
shut down multiple peers, as this function sends the termination signal to
the given peer process and waits for it to terminate. It would be faster
in this case to send the termination signals to the peers first and then
wait on them.
This is accomplished by the functions
@code{GNUNET_TESTING_peer_kill()} which sends a termination signal to the
peer, and the function @code{GNUNET_TESTING_peer_wait()} which waits on
the peer.

Further finer control can be achieved by choosing to stop a peer
asynchronously with the function @code{GNUNET_TESTING_peer_stop_async()}.
This function takes a callback parameter and a closure for it in addition
to the handle to the peer to stop. The callback function is called with
the given closure when the peer is stopped. Using this function
eliminates blocking while waiting for the peer to terminate.

An asynchronous peer stop can be canceled by calling the function
@code{GNUNET_TESTING_peer_stop_async_cancel()}. Note that calling this
function does not prevent the peer from terminating if the termination
signal has already been sent to it. It does, however, cancel the
callback that would be called when the peer is stopped.

@c ***********************************************************************
@node Helper functions
@subsection Helper functions

Most of the testcases can benefit from an abstraction which configures a
peer and starts it. This is provided by the function
@code{GNUNET_TESTING_peer_run()}. This function takes the testing
directory pathname, a configuration template, a callback and its closure.
This function creates a peer in the given testing directory by using the
configuration template, starts the peer and calls the given callback with
the given closure.

The function @code{GNUNET_TESTING_peer_run()} starts the ARM service of
the peer which starts the rest of the configured services. A similar
function @code{GNUNET_TESTING_service_run} can be used to just start a
single service of a peer. In this case, the peer's ARM service is not
started; instead, only the given service is run.
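
The reason for separating kill from wait when stopping several peers can
be illustrated with plain POSIX processes. The following is only a sketch
of the pattern, not TESTING code: the functions @code{spawn_sleeper} and
@code{stop_all} are hypothetical, and ordinary child processes stand in
for peers. All termination signals are sent first, so the children shut
down concurrently, and only then are they reaped.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a peer: a child process that idles until it
   receives a signal.  Returns the child's PID, or -1 if fork() failed. */
static pid_t
spawn_sleeper (void)
{
  pid_t pid;

  pid = fork ();
  if (0 == pid)
  {
    for (;;)
      pause ();
  }
  return pid;
}


/* Send SIGTERM to all children first, then wait for each of them.
   Returns the number of children that were successfully reaped. */
static size_t
stop_all (const pid_t *pids,
          size_t n)
{
  size_t i;
  size_t reaped;

  assert (NULL != pids);
  for (i = 0; i < n; i++)
  {
    if (pids[i] <= 0)
      continue; /* never signal PID 0 or -1: that would hit other processes */
    (void) kill (pids[i], SIGTERM);
  }
  reaped = 0;
  for (i = 0; i < n; i++)
  {
    if (pids[i] <= 0)
      continue;
    if (pids[i] == waitpid (pids[i], NULL, 0))
      reaped++;
  }
  return reaped;
}
```

With a stop-one-at-a-time loop the total shutdown time is the sum of the
individual shutdown times; with the two-pass loop above it is roughly the
time of the slowest child.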
@c ***********************************************************************
@node Testing with multiple processes
@subsection Testing with multiple processes

When testing GNUnet, the splitting of the code into services and clients
often complicates testing. The solution to this is to have the testcase
fork @code{gnunet-service-arm}, ask it to start the required server and
daemon processes and then execute appropriate client actions (to test the
client APIs or the core module or both). If necessary, multiple ARM
services can be forked using different ports (!) to simulate a network.
However, most of the time only one ARM process is needed. Note that on
exit, the testcase should shut down ARM with a @code{TERM} signal (to give
it the chance to cleanly stop its child processes).

The following code illustrates spawning and killing an ARM process from a
testcase:

@example
static void
run (void *cls,
     char *const *args,
     const char *cfgfile,
     const struct GNUNET_CONFIGURATION_Handle *cfg)
@{
  struct GNUNET_OS_Process *arm_pid;

  /* pass the configuration file we were started with on to ARM */
  arm_pid = GNUNET_OS_start_process (NULL,
                                     NULL,
                                     "gnunet-service-arm",
                                     "gnunet-service-arm",
                                     "-c",
                                     cfgfile,
                                     NULL);
  /* do real test work here */
  if (0 != GNUNET_OS_process_kill (arm_pid, SIGTERM))
    GNUNET_log_strerror (GNUNET_ERROR_TYPE_WARNING, "kill");
  GNUNET_assert (GNUNET_OK == GNUNET_OS_process_wait (arm_pid));
  GNUNET_OS_process_close (arm_pid);
@}

GNUNET_PROGRAM_run (argc, argv, "NAME-OF-TEST", "nohelp", options, &run, cls);
@end example

An alternative way that works well to test plugins is to implement a
mock-version of the environment that the plugin expects and then to
simply load the plugin directly.

@c ***********************************************************************
@node Performance regression analysis with Gauger
@section Performance regression analysis with Gauger

To help avoid performance regressions, GNUnet uses Gauger.
Gauger is a simple logging tool that allows remote hosts to send performance data to a central server, where this data can be analyzed and visualized. Gauger shows graphs of the repository revisions and the performance data recorded for each revision, so sudden performance peaks or drops can be identified and linked to a specific revision number.

In the case of GNUnet, the buildbots log the performance data obtained during the tests after each build. The data can be accessed on GNUnet's Gauger page.

The menu on the left lets you select either the results of just one build bot (under "Hosts") or review the data from all hosts for a given test result (under "Metrics"). If the absolute values of the results differ widely, for instance between arm and amd64 machines, the option "Normalize" on a metric view can help to get an idea about the performance evolution across all hosts.

Using Gauger in GNUnet and having the performance of a module tracked over time is very easy. First, of course, the testcase must generate some consistent metric that is worth logging. Highly volatile or randomness-dependent metrics are probably not ideal candidates for meaningful regression detection.

To start logging any value, just include @code{gauger.h} in your testcase code. Then, use the macro @code{GAUGER()} to make the Buildbots log whatever value is of interest to you to @code{gnunet.org}'s Gauger server. No setup is necessary, as most Buildbots already have everything in place and new metrics are created on demand. To delete a metric, you need to contact a member of the GNUnet development team (a file will need to be removed manually from the respective directory).
The code in the test should look like this:

@example
[other includes]
#include <gauger.h>

int
main (int argc, char *argv[])
@{
  [run test, generate data]
  GAUGER ("YOUR_MODULE", "METRIC_NAME", (float) value, "UNIT");
@}
@end example

Where:

@table @asis
@item @strong{YOUR_MODULE}
is a category in the gauger page and should be the name of the module or subsystem like "Core" or "DHT"
@item @strong{METRIC_NAME}
is the name of the metric being collected and should be concise and descriptive, like "PUT operations in sqlite-datastore".
@item @strong{value}
is the value of the metric that is logged for this run.
@item @strong{UNIT}
is the unit in which the value is measured, for instance "kb/s" or "kb of RAM/node".
@end table

If you wish to use Gauger for your own project, you can grab a copy of the latest stable release or check out Gauger's Subversion repository.

@cindex TESTBED Subsystem
@node TESTBED Subsystem
@section TESTBED Subsystem

The TESTBED subsystem facilitates testing and measuring of multi-peer deployments on a single host or over multiple hosts.

The architecture of the testbed module is divided into the following:

@itemize @bullet
@item Testbed API: An API which is used by the testing driver programs. It provides functions for creating, destroying, starting and stopping peers, etc.
@item Testbed service (controller): A service which is started through the Testbed API. This service handles operations to create, destroy, start and stop peers, connect them, and modify their configurations.
@item Testbed helper: When a controller has to be started on a host, the testbed API starts the testbed helper on that host, which in turn starts the controller. The testbed helper receives a configuration for the controller through its stdin and changes it to ensure the controller doesn't run into any port conflict on that host.
@end itemize

The testbed service (controller) is different from the other GNUnet services in that it is not started by ARM and is not supposed to be run as a daemon.
It is started by the testbed API through a testbed helper. In a typical scenario involving multiple hosts, a controller is started on each host. Controllers take up the actual task of creating peers, starting and stopping them on the hosts they run on.

When running deployments on a single local host, the testbed API starts the testbed helper directly as a child process. When running deployments on remote hosts, the testbed API starts testbed helpers on each remote host through a remote shell. By default, the testbed API uses SSH as the remote shell. This can be changed by setting the environment variable @code{GNUNET_TESTBED_RSH_CMD} to the required remote shell program. This variable can also contain parameters which are to be passed to the remote shell program. For example:

@example
export GNUNET_TESTBED_RSH_CMD="ssh -o BatchMode=yes \
-o NoHostAuthenticationForLocalhost=yes %h"
@end example

Substitutions are allowed in the command string above through placemarks which begin with a `%'. At present, the following substitutions are supported:

@itemize @bullet
@item %h: hostname
@item %u: username
@item %p: port
@end itemize

Note that a substitution placemark is replaced only when the corresponding field is available, and only once. Note also that specifying

@example
%u@@%h
@end example

does not work. If you want to use username substitutions for @command{SSH}, use the argument @code{-l} before the username substitution. For example:

@example
ssh -l %u -p %p %h
@end example

The testbed API and the helper communicate through the helper's stdin and stdout. As the helper is started through a remote shell on remote hosts, any output messages from the remote shell interfere with the communication and result in a failure while starting the helper. For this reason, it is suggested to use flags to make the remote shells produce no output messages and to have password-less logins.
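The placemark substitution rules described earlier in this section can be illustrated with a small, self-contained C sketch. It is not the testbed implementation: @code{expand_rsh_template()} and its parameters are hypothetical names chosen for this example. The sketch assumes that a placemark whose field is unavailable, or which has already been substituted once, is left in the output verbatim, and it does not model the restriction on @code{%u@@%h} mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the GNUnet API): expand the placemarks
 * %h, %u and %p in 'tmpl' into 'out'.  Each placemark is replaced at most
 * once, and only if the corresponding field is non-NULL; anything else is
 * copied verbatim (an assumption of this sketch).
 * Returns 0 on success, -1 if 'out' is too small. */
static int
expand_rsh_template (const char *tmpl, const char *host, const char *user,
                     const char *port, char *out, size_t out_len)
{
  int used_h = 0;
  int used_u = 0;
  int used_p = 0;
  size_t pos = 0;

  if (0 == out_len)
    return -1;
  for (const char *c = tmpl; '\0' != *c; c++)
  {
    const char *value = NULL;

    if ('%' == c[0])
    {
      if (('h' == c[1]) && (NULL != host) && (! used_h))
      {
        value = host;
        used_h = 1;
      }
      else if (('u' == c[1]) && (NULL != user) && (! used_u))
      {
        value = user;
        used_u = 1;
      }
      else if (('p' == c[1]) && (NULL != port) && (! used_p))
      {
        value = port;
        used_p = 1;
      }
    }
    if (NULL != value)
    {
      size_t len = strlen (value);

      if (pos + len >= out_len)
        return -1;
      memcpy (out + pos, value, len);
      pos += len;
      c++; /* skip the letter following '%' */
    }
    else
    {
      if (pos + 1 >= out_len)
        return -1;
      out[pos++] = *c;
    }
  }
  out[pos] = '\0';
  return 0;
}
```

For instance, expanding @code{ssh -l %u -p %p %h} with the user @code{alice}, the port @code{22} and the host @code{10.0.0.2} yields @code{ssh -l alice -p 22 10.0.0.2} in this sketch.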
For the default remote shell, SSH, the default options are:

@example
-o BatchMode=yes -o NoHostAuthenticationForLocalhost=yes
@end example

Password-less logins should be ensured by using SSH keys.

Since the testbed API executes the remote shell as a non-interactive shell, certain scripts like @file{.bashrc} or @file{.profile} may not be executed. If this is the case, the testbed API can be forced to execute an interactive shell by setting the environment variable @code{GNUNET_TESTBED_RSH_CMD_SUFFIX} to a shell program.

An example could be:

@example
export GNUNET_TESTBED_RSH_CMD_SUFFIX="sh -lc"
@end example

The testbed API will then execute the remote shell program as:

@example
$GNUNET_TESTBED_RSH_CMD -p $port $dest $GNUNET_TESTBED_RSH_CMD_SUFFIX \
gnunet-helper-testbed
@end example

On some systems, problems may arise while starting testbed helpers if GNUnet is installed into a custom location, since the helper may not be found in the standard path. This can be addressed by setting the variable `@code{HELPER_BINARY_PATH}' to the path of the testbed helper. The testbed API will then use this path to start helper binaries both locally and remotely.

The testbed API can be accessed by including the @file{gnunet_testbed_service.h} file and linking with @code{-lgnunettestbed}.

@c ***********************************************************************
@menu
* Supported Topologies::
* Hosts file format::
* Topology file format::
* Testbed Barriers::
* TESTBED Caveats::
@end menu

@node Supported Topologies
@subsection Supported Topologies

While testing multi-peer deployments, it is often necessary for the peers to be connected in some topology. This requirement is addressed by the function @code{GNUNET_TESTBED_overlay_connect()} which connects any given two peers in the testbed.
The API also provides a helper function @code{GNUNET_TESTBED_overlay_configure_topology()} to connect a given set of peers in any of the following supported topologies:

@itemize @bullet
@item @code{GNUNET_TESTBED_TOPOLOGY_CLIQUE}: All peers are connected with each other
@item @code{GNUNET_TESTBED_TOPOLOGY_LINE}: Peers are connected to form a line
@item @code{GNUNET_TESTBED_TOPOLOGY_RING}: Peers are connected to form a ring topology
@item @code{GNUNET_TESTBED_TOPOLOGY_2D_TORUS}: Peers are connected to form a 2-dimensional torus topology. The number of peers need not be a perfect square; in that case, the resulting torus may not have uniform poloidal and toroidal lengths
@item @code{GNUNET_TESTBED_TOPOLOGY_ERDOS_RENYI}: Topology is generated to form a random graph. The number of links to be present has to be given
@item @code{GNUNET_TESTBED_TOPOLOGY_SMALL_WORLD}: Peers are connected to form a 2D torus with some random links among them. The number of random links has to be given
@item @code{GNUNET_TESTBED_TOPOLOGY_SMALL_WORLD_RING}: Peers are connected to form a ring with some random links among them. The number of random links has to be given
@item @code{GNUNET_TESTBED_TOPOLOGY_SCALE_FREE}: Connects peers in a topology where peer connectivity follows a power law: new peers are connected with high probability to well-connected peers. (See Emergence of Scaling in Random Networks. Science 286, 509-512, 1999 (@uref{https://git.gnunet.org/bibliography.git/plain/docs/emergence_of_scaling_in_random_networks__barabasi_albert_science_286__1999.pdf, pdf}))
@item @code{GNUNET_TESTBED_TOPOLOGY_FROM_FILE}: The topology information is loaded from a file. The path to the file has to be given. @xref{Topology file format}, for the format of this file.
@item @code{GNUNET_TESTBED_TOPOLOGY_NONE}: No topology
@end itemize

The above supported topologies can be specified respectively by setting the variable @code{OVERLAY_TOPOLOGY} to the following values in the configuration passed to the Testbed API functions @code{GNUNET_TESTBED_test_run()} and @code{GNUNET_TESTBED_run()}:

@itemize @bullet
@item @code{CLIQUE}
@item @code{LINE}
@item @code{RING}
@item @code{2D_TORUS}
@item @code{RANDOM}
@item @code{SMALL_WORLD}
@item @code{SMALL_WORLD_RING}
@item @code{SCALE_FREE}
@item @code{FROM_FILE}
@item @code{NONE}
@end itemize

Topologies @code{RANDOM}, @code{SMALL_WORLD} and @code{SMALL_WORLD_RING} require the option @code{OVERLAY_RANDOM_LINKS} to be set to the number of random links to be generated in the configuration. The option will be ignored for the rest of the topologies.

Topology @code{SCALE_FREE} requires the option @code{SCALE_FREE_TOPOLOGY_CAP} to be set to the maximum number of peers which can connect to a peer and the option @code{SCALE_FREE_TOPOLOGY_M} to be set to the minimum number of peers a peer should be connected to.

Similarly, the topology @code{FROM_FILE} requires the option @code{OVERLAY_TOPOLOGY_FILE} to contain the path of the file containing the topology information. This option is ignored for the rest of the topologies. @xref{Topology file format}, for the format of this file.

@c ***********************************************************************
@node Hosts file format
@subsection Hosts file format

The testbed API offers the function @code{GNUNET_TESTBED_hosts_load_from_file()} to load from a given file details about the hosts which testbed can use for deploying peers. This function is useful to keep the data about hosts separate instead of hard-coding them.

Another helper function from the testbed API, @code{GNUNET_TESTBED_run()}, also takes a hosts file name as its parameter. It uses the above function to populate the hosts data structures and start controllers to deploy peers.
These functions require the hosts file to be of the following format:

@itemize @bullet
@item Each line is interpreted to have details about a host
@item Host details should include the username to use for logging into the host, the hostname of the host and the port number to use for the remote shell program. All three values should be given.
@item These details should be given in the following format:
@example
<username>@@<hostname>:<port>
@end example
@end itemize

Note that having canonical hostnames may cause problems while resolving the IP addresses (See this bug). Hence it is advised to provide the hosts' numerical IP addresses as hostnames whenever possible.

@c ***********************************************************************
@node Topology file format
@subsection Topology file format

A topology file describes how peers are to be connected. It should adhere to the following format for testbed to parse it correctly.

Each line should begin with the target peer id. This should be followed by a colon (`:') and origin peer ids separated by `|'. All spaces except for newline characters are ignored. The API will then try to connect each origin peer to the target peer.

For example, the following file will result in 5 overlay connections: [2->1], [3->1], [4->3], [0->3], [2->0]

@example
1:2|3
3:4| 0
0: 2
@end example

@c ***********************************************************************
@node Testbed Barriers
@subsection Testbed Barriers

The testbed subsystem's barriers API facilitates coordination among the peers run by the testbed and the experiment driver. The concept is similar to the barrier synchronisation mechanism found in parallel programming or multi-threading paradigms: a peer waits at a barrier upon reaching it until the barrier is reached by a predefined number of peers. This predefined number of peers required to cross a barrier is also called the quorum. We say a peer has reached a barrier if the peer is waiting for the barrier to be crossed.
Similarly, a barrier is said to be reached if the required quorum of peers reach the barrier. A barrier which is reached is deemed as crossed after all the peers waiting on it are notified.

The barriers API provides the following functions:

@itemize @bullet
@item @strong{@code{GNUNET_TESTBED_barrier_init()}:} function to initialize a barrier in the experiment
@item @strong{@code{GNUNET_TESTBED_barrier_cancel()}:} function to cancel a barrier which has been initialized before
@item @strong{@code{GNUNET_TESTBED_barrier_wait()}:} function to signal to the barrier service that the caller has reached a barrier and is waiting for it to be crossed
@item @strong{@code{GNUNET_TESTBED_barrier_wait_cancel()}:} function to stop waiting for a barrier to be crossed
@end itemize

Among the above functions, the first two, namely @code{GNUNET_TESTBED_barrier_init()} and @code{GNUNET_TESTBED_barrier_cancel()}, are used by experiment drivers. All barriers should be initialised by the experiment driver by calling @code{GNUNET_TESTBED_barrier_init()}. This function takes a name to identify the barrier, the quorum required for the barrier to be crossed and a notification callback for notifying the experiment driver when the barrier is crossed. @code{GNUNET_TESTBED_barrier_cancel()} cancels an initialised barrier and frees the resources allocated for it. This function can be called upon an initialised barrier before it is crossed.

The remaining two functions, @code{GNUNET_TESTBED_barrier_wait()} and @code{GNUNET_TESTBED_barrier_wait_cancel()}, are used in the peer's processes. @code{GNUNET_TESTBED_barrier_wait()} connects to the local barrier service running on the same host the peer is running on and registers that the caller has reached the barrier and is waiting for the barrier to be crossed. Note that this function can only be used by peers which are started by testbed, as this function tries to access the local barrier service which is part of the testbed controller service.
Calling @code{GNUNET_TESTBED_barrier_wait()} on an uninitialised barrier results in failure. @code{GNUNET_TESTBED_barrier_wait_cancel()} cancels the notification registered by @code{GNUNET_TESTBED_barrier_wait()}.

@c ***********************************************************************
@menu
* Implementation::
@end menu

@node Implementation
@subsubsection Implementation

Since barriers involve coordination between the experiment driver and peers, the barrier service in the testbed controller is split into two components. The first component responds to the messages generated by the barrier API used by the experiment driver (functions @code{GNUNET_TESTBED_barrier_init()} and @code{GNUNET_TESTBED_barrier_cancel()}) and the second component to the messages generated by the barrier API used by peers (functions @code{GNUNET_TESTBED_barrier_wait()} and @code{GNUNET_TESTBED_barrier_wait_cancel()}).

Calling @code{GNUNET_TESTBED_barrier_init()} sends a @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_INIT} message to the master controller. The master controller then registers a barrier and calls @code{GNUNET_TESTBED_barrier_init()} for each of its subcontrollers. In this way, barrier initialisation is propagated to the controller hierarchy. While propagating initialisation, any errors at a subcontroller, such as a timeout during further propagation, are reported up the hierarchy back to the experiment driver.

Similar to @code{GNUNET_TESTBED_barrier_init()}, @code{GNUNET_TESTBED_barrier_cancel()} propagates a @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_CANCEL} message which causes controllers to remove an initialised barrier.

The second component is implemented as a separate service in the binary `gnunet-service-testbed', which already has the testbed controller service. Although this deviates from the GNUnet process architecture of having one service per binary, it is needed in this case as this component needs access to barrier data created by the first component.
This component responds to @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT} messages from local peers when they call @code{GNUNET_TESTBED_barrier_wait()}. Upon receiving @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT} message, the service checks if the requested barrier has been initialised before and if it was not initialised, an error status is sent through @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message to the local peer and the connection from the peer is terminated. If the barrier is initialised before, the barrier's counter for reached peers is incremented and a notification is registered to notify the peer when the barrier is reached. The connection from the peer is left open. When enough peers required to attain the quorum send @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT} messages, the controller sends a @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message to its parent informing that the barrier is crossed. If the controller has started further subcontrollers, it delays this message until it receives a similar notification from each of those subcontrollers. Finally, the barriers API at the experiment driver receives the @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} when the barrier is reached at all the controllers. The barriers API at the experiment driver responds to the @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message by echoing it back to the master controller and notifying the experiment controller through the notification callback that a barrier has been crossed. The echoed @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message is propagated by the master controller to the controller hierarchy. This propagation triggers the notifications registered by peers at each of the controllers in the hierarchy. 
Note the difference between this downward propagation of the @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message and its upward propagation: the upward propagation is needed to ensure that the barrier is reached by all the controllers, and the downward propagation signals that the barrier is crossed.

@cindex TESTBED Caveats
@node TESTBED Caveats
@subsection TESTBED Caveats

This section documents a few caveats when using the GNUnet testbed subsystem.

@c ***********************************************************************
@menu
* CORE must be started::
* ATS must want the connections::
@end menu

@node CORE must be started
@subsubsection CORE must be started

An uncomplicated issue is bug #3993 (@uref{https://bugs.gnunet.org/view.php?id=3993, https://bugs.gnunet.org/view.php?id=3993}): Your configuration MUST somehow ensure that for each peer the @code{CORE} service is started when the peer is set up, otherwise @code{TESTBED} may fail to connect peers when the topology is initialized, as @code{TESTBED} will start some @code{CORE} services but not necessarily all (but it relies on all of them running). The easiest way is to set

@example
[core]
IMMEDIATE_START = YES
@end example

@noindent
in the configuration file. Alternatively, having any service that directly or indirectly depends on @code{CORE} being started with @code{IMMEDIATE_START} will also do. This issue largely arises if users try to over-optimize by not starting any services with @code{IMMEDIATE_START}.

@c ***********************************************************************
@node ATS must want the connections
@subsubsection ATS must want the connections

When TESTBED sets up connections, it only offers the respective HELLO information to the TRANSPORT service. It is then up to the ATS service to @strong{decide} to use the connection. The ATS service will typically eagerly establish any connection if the number of total connections is low (relative to bandwidth).
Details may further depend on the specific ATS backend that was configured. If ATS decides to NOT establish a connection (even though TESTBED provided the required information), then that connection will count as failed for TESTBED. Note that you can configure TESTBED to tolerate a certain number of connection failures (see '-e' option of gnunet-testbed-profiler). This issue largely arises for dense overlay topologies, especially if you try to create cliques with more than 20 peers. @cindex libgnunetutil @node libgnunetutil @section libgnunetutil libgnunetutil is the fundamental library that all GNUnet code builds upon. Ideally, this library should contain most of the platform dependent code (except for user interfaces and really special needs that only few applications have). It is also supposed to offer basic services that most if not all GNUnet binaries require. The code of libgnunetutil is in the @file{src/util/} directory. The public interface to the library is in the gnunet_util.h header. 
The functions provided by libgnunetutil fall roughly into the following categories (in roughly the order of importance for new developers):

@itemize @bullet
@item logging (common_logging.c)
@item memory allocation (common_allocation.c)
@item endianness conversion (common_endian.c)
@item internationalization (common_gettext.c)
@item String manipulation (string.c)
@item file access (disk.c)
@item buffered disk IO (bio.c)
@item time manipulation (time.c)
@item configuration parsing (configuration.c)
@item command-line handling (getopt*.c)
@item cryptography (crypto_*.c)
@item data structures (container_*.c)
@item CPS-style scheduling (scheduler.c)
@item Program initialization (program.c)
@item Networking (network.c, client.c, server*.c, service.c)
@item message queuing (mq.c)
@item bandwidth calculations (bandwidth.c)
@item Other OS-related (os*.c, plugin.c, signal.c)
@item Pseudonym management (pseudonym.c)
@end itemize

It should be noted that only developers who fully understand this entire API will be able to write good GNUnet code.

Ideally, porting GNUnet should only require porting the gnunetutil library. More testcases for the gnunetutil APIs are therefore a great way to make porting of GNUnet easier.

@menu
* Logging::
* Interprocess communication API (IPC)::
* Cryptography API::
* Message Queue API::
* Service API::
* Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps::
* CONTAINER_MDLL API::
@end menu

@cindex Logging
@cindex log levels
@node Logging
@subsection Logging

GNUnet is able to log its activity, mostly for the purposes of debugging the program at various levels.

@file{gnunet_common.h} defines several @strong{log levels}:

@table @asis
@item ERROR
for errors (really problematic situations, often leading to crashes)
@item WARNING
for warnings (troubling situations that might have negative consequences, although not fatal)
@item INFO
for various information.
Used somewhat rarely, as GNUnet statistics is used to hold and display most of the information that users might find interesting.
@item DEBUG
for debugging. Does not produce much output on normal builds, but when extra logging is enabled at compile time, a staggering amount of data is output under this log level.
@end table

Normal builds of GNUnet (configured with @code{--enable-logging[=yes]}) are supposed to log nothing under DEBUG level. The @code{--enable-logging=verbose} configure option can be used to create a build with all logging enabled. However, such a build will produce large amounts of log data, which is inconvenient when one tries to hunt down a specific problem.

To mitigate this problem, GNUnet provides facilities to apply a filter to reduce the logs:

@table @asis
@item Logging by default
When no log levels are configured in any other way (see below), GNUnet will default to the WARNING log level. This mostly applies to GNUnet command line utilities, services and daemons; tests will always set the log level to WARNING or, if @code{--enable-logging=verbose} was passed to configure, to DEBUG. The default level is suggested for normal operation.
@item The -L option
Most GNUnet executables accept an "-L loglevel" or "--log=loglevel" option. If used, it makes the process set a global log level to "loglevel". Thus it is possible to run some processes with -L DEBUG, for example, and others with -L ERROR to enable specific settings to diagnose problems with a particular process.
@item Configuration files.
Because GNUnet service and daemon processes are usually launched by gnunet-arm, it is not possible to pass different custom command line options directly to every one of them. The options passed to @code{gnunet-arm} only affect gnunet-arm and not the rest of GNUnet. However, one can specify a configuration key "OPTIONS" in the section that corresponds to a service or a daemon, and put a value of "-L loglevel" there.
This will make the respective service or daemon set its log level to "loglevel" (as the value of OPTIONS will be passed as a command-line argument). To specify the same log level for all services without creating separate "OPTIONS" entries in the configuration for each one, the user can specify a config key "GLOBAL_POSTFIX" in the [arm] section of the configuration file. The value of GLOBAL_POSTFIX will be appended to all command lines used by the ARM service to run other services. It can contain any option valid for all GNUnet commands, thus in particular the "-L loglevel" option. The ARM service itself is, however, unaffected by GLOBAL_POSTFIX; to set log level for it, one has to specify "OPTIONS" key in the [arm] section. @item Environment variables. Setting global per-process log levels with "-L loglevel" does not offer sufficient log filtering granularity, as one service will call interface libraries and supporting libraries of other GNUnet services, potentially producing lots of debug log messages from these libraries. Also, changing the config file is not always convenient (especially when running the GNUnet test suite).@ To fix that, and to allow GNUnet to use different log filtering at runtime without re-compiling the whole source tree, the log calls were changed to be configurable at run time. To configure them one has to define environment variables "GNUNET_FORCE_LOGFILE", "GNUNET_LOG" and/or "GNUNET_FORCE_LOG": @itemize @bullet @item "GNUNET_LOG" only affects the logging when no global log level is configured by any other means (that is, the process does not explicitly set its own log level, there are no "-L loglevel" options on command line or in configuration files), and can be used to override the default WARNING log level. @item "GNUNET_FORCE_LOG" will completely override any other log configuration options given. @item "GNUNET_FORCE_LOGFILE" will completely override the location of the file to log messages to. 
It should contain a relative or absolute file name. Setting GNUNET_FORCE_LOGFILE is equivalent to passing "--log-file=logfile" or "-l logfile" option (see below). It supports "[]" format in file names, but not "@{@}" (see below). @end itemize Because environment variables are inherited by child processes when they are launched, starting or re-starting the ARM service with these variables will propagate them to all other services. "GNUNET_LOG" and "GNUNET_FORCE_LOG" variables must contain a specially formatted @strong{logging definition} string, which looks like this:@ @c FIXME: Can we close this with [/component] instead? @example [component];[file];[function];[from_line[-to_line]];loglevel[/component...] @end example That is, a logging definition consists of definition entries, separated by slashes ('/'). If only one entry is present, there is no need to add a slash to its end (although it is not forbidden either).@ All definition fields (component, file, function, lines and loglevel) are mandatory, but (except for the loglevel) they can be empty. An empty field means "match anything". Note that even if fields are empty, the semicolon (';') separators must be present.@ The loglevel field is mandatory, and must contain one of the log level names (ERROR, WARNING, INFO or DEBUG).@ The lines field might contain one non-negative number, in which case it matches only one line, or a range "from_line-to_line", in which case it matches any line in the interval [from_line;to_line] (that is, including both start and end line).@ GNUnet mostly defaults component name to the name of the service that is implemented in a process ('transport', 'core', 'peerinfo', etc), but logging calls can specify custom component names using @code{GNUNET_log_from}.@ File name and function name are provided by the compiler (__FILE__ and __FUNCTION__ built-ins). Component, file and function fields are interpreted as non-extended regular expressions (GNU libc regex functions are used). 
Matching is case-sensitive, "^" and "$" will match the beginning and the end of the text. If a field is empty, its contents are automatically replaced with a ".*" regular expression, which matches anything. Matching is done in the default way, which means that the expression matches as long as it's contained anywhere in the string. Thus "GNUNET_" will match both "GNUNET_foo" and "BAR_GNUNET_BAZ". Use '^' and/or '$' to make sure that the expression matches at the start and/or at the end of the string. The semicolon (';') can't be escaped, and GNUnet will not use it in component names (it can't be used in function names and file names anyway).
@end table

Every logging call in GNUnet code will be (at run time) matched against the log definitions passed to the process. If the fields of a log definition match the call arguments, then the call log level is compared to the log level of that definition. If the call log level is less than or equal to the definition log level, the call is allowed to proceed. Otherwise the logging call is forbidden, and nothing is logged. If no definitions matched at all, GNUnet will use the global log level or (if a global log level is not specified) will default to WARNING (that is, it will allow the call to proceed if its level is less than or equal to the global log level or to WARNING).

Definitions are evaluated from left to right, and the first matching definition is used to allow or deny the logging call. Thus it is advised to place narrow definitions at the beginning of the logdef string, and generic definitions at the end.

Whether a call is allowed or not is only decided the first time this particular call is made. The evaluation result is then cached, so that any attempts to make the same call later will be allowed or disallowed right away. Because of that, runtime log level evaluation should not significantly affect the process performance.
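The first-match evaluation described above can be sketched in a few lines of self-contained C. This is an illustration only, not the code in @file{common_logging.c}: the names @code{log_call_allowed()}, @code{field_matches()}, @code{parse_level()} and the @code{enum level} ordering are hypothetical, the sketch uses POSIX @code{regcomp()}/@code{regexec()} for the basic regular expressions, it simply stops at the first malformed entry, and it omits the per-call caching.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified ordering assumed by this sketch: lower value = more severe. */
enum level { LVL_ERROR, LVL_WARNING, LVL_INFO, LVL_DEBUG, LVL_INVALID };

static enum level
parse_level (const char *s)
{
  static const char *names[] = { "ERROR", "WARNING", "INFO", "DEBUG" };

  for (int i = 0; i < 4; i++)
    if (0 == strcmp (s, names[i]))
      return (enum level) i;
  return LVL_INVALID;
}

/* An empty pattern matches anything; otherwise do an unanchored,
 * case-sensitive basic-regex search in 'text'. */
static int
field_matches (const char *pattern, const char *text)
{
  regex_t re;
  int ok;

  if ('\0' == pattern[0])
    return 1;
  if (0 != regcomp (&re, pattern, REG_NOSUB))
    return 0;
  ok = (0 == regexec (&re, text, 0, NULL, 0));
  regfree (&re);
  return ok;
}

/* Returns 1 if the call may log, 0 otherwise.  'fallback' stands for the
 * global log level (or WARNING) used when no definition matches. */
static int
log_call_allowed (const char *logdef, enum level fallback,
                  const char *comp, const char *file, const char *func,
                  int line, enum level call_level)
{
  int allowed = (call_level <= fallback);
  char *copy = strdup (logdef);
  char *entry = copy;

  while ((NULL != entry) && ('\0' != *entry))
  {
    char *next = strchr (entry, '/');
    char *field[5];
    char *p = entry;
    int n = 0;
    int from = 0;
    int to = INT_MAX;
    enum level def_level;

    if (NULL != next)
      *next++ = '\0';
    while (n < 5) /* split on ';', keeping empty fields */
    {
      field[n++] = p;
      p = strchr (p, ';');
      if (NULL == p)
        break;
      *p++ = '\0';
    }
    if (5 != n)
      break; /* malformed entry: stop processing */
    def_level = parse_level (field[4]);
    if (LVL_INVALID == def_level)
      break;
    if (1 == sscanf (field[3], "%d-%d", &from, &to))
      to = from; /* single line number */
    if (field_matches (field[0], comp) && field_matches (field[1], file)
        && field_matches (field[2], func) && (line >= from) && (line <= to))
    {
      allowed = (call_level <= def_level);
      break; /* the first matching definition decides */
    }
    entry = next;
  }
  free (copy);
  return allowed;
}
```

With the definition string @code{transport.*;;.*send.*;;DEBUG/;;;;WARNING}, a DEBUG call from a function named @code{do_send} in a component named @code{transport-tcp} is allowed by the first entry in this sketch, while a DEBUG call from component @code{core} falls through to the second entry and is denied.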
Log definition parsing is only done once, at the first call to @code{GNUNET_log_setup ()} made by the process (which is usually done soon after it starts).

At the time of writing, there is no way to specify logging definitions from configuration files, only via environment variables.

At the moment, GNUnet will stop processing a log definition when it encounters an error in definition formatting or an error in regular expression syntax, and will not report the failure in any way.

@c ***********************************************************************
@menu
* Examples::
* Log files::
* Updated behavior of GNUNET_log::
@end menu

@node Examples
@subsubsection Examples

@table @asis
@item @code{GNUNET_FORCE_LOG=";;;;DEBUG" gnunet-arm -s}
Start GNUnet process tree, running all processes with DEBUG level (one should be careful with it, as log files will grow at an alarming rate!)
@item @code{GNUNET_FORCE_LOG="core;;;;DEBUG" gnunet-arm -s}
Start GNUnet process tree, running the core service under DEBUG level (everything else will use configured or default level).
@item Start GNUnet process tree, allowing any logging calls from gnunet-service-transport_validation.c (everything else will use configured or default level).
@example
GNUNET_FORCE_LOG=";gnunet-service-transport_validation.c;;;DEBUG" \
gnunet-arm -s
@end example
@item Start GNUnet process tree, allowing any logging calls from gnunet-service-fs_push.c (everything else will use configured or default level).
@example
GNUNET_FORCE_LOG="fs;gnunet-service-fs_push.c;;;DEBUG" gnunet-arm -s
@end example
@item Start GNUnet process tree, allowing any logging calls from the GNUNET_NETWORK_socket_select function (everything else will use configured or default level).
@example
GNUNET_FORCE_LOG=";;GNUNET_NETWORK_socket_select;;DEBUG" gnunet-arm -s
@end example

@item Start GNUnet process tree, allowing any logging calls from the components that have "transport" in their names, and are made from functions that have "send" in their names. Everything else will be allowed to be logged only if it has WARNING level.

@example
GNUNET_FORCE_LOG="transport.*;;.*send.*;;DEBUG/;;;;WARNING" gnunet-arm -s
@end example

@end table

On Windows, one can use batch files to run GNUnet processes with special environment variables, without affecting the whole system. Such a batch file will look like this:

@example
set GNUNET_FORCE_LOG=;;do_transmit;;DEBUG
gnunet-arm -s
@end example

(note the absence of double quotes in the environment variable definition, as opposed to earlier examples, which use the shell). Another limitation: on Windows, GNUNET_FORCE_LOGFILE @strong{MUST} be set in order for GNUNET_FORCE_LOG to work.

@cindex Log files
@node Log files
@subsubsection Log files

GNUnet can be told to log everything into a file instead of stderr (which is the default) using the "--log-file=logfile" or "-l logfile" option. This option can also be passed via command line, or from the "OPTION" and "GLOBAL_POSTFIX" configuration keys (see above). The file name passed with this option is subject to GNUnet filename expansion. If specified in "GLOBAL_POSTFIX", it is also subject to ARM service filename expansion; in particular, it may contain the "@{@}" (left and right curly brace) sequence, which will be replaced by ARM with the name of the service. This is used to keep logs from more than one service separate, while only specifying one template containing "@{@}" in GLOBAL_POSTFIX.

As part of a secondary file name expansion, the first occurrence of the "[]" sequence ("left square brace" followed by "right square brace") in the file name will be replaced with the process identifier of the process when it initializes its logging subsystem.
As a result, all processes will log into different files. This is convenient for isolating messages of a particular process, and prevents I/O races when multiple processes try to write into the file at the same time. This expansion is done independently of "@{@}" expansion that ARM service does (see above). The log file name that is specified via "-l" can contain format characters from the 'strftime' function family. For example, "%Y" will be replaced with the current year. Using "basename-%Y-%m-%d.log" would include the current year, month and day in the log file. If a GNUnet process runs for long enough to need more than one log file, it will eventually clean up old log files. Currently, only the last three log files (plus the current log file) are preserved. So once the fifth log file goes into use (so after 4 days if you use "%Y-%m-%d" as above), the first log file will be automatically deleted. Note that if your log file name only contains "%Y", then log files would be kept for 4 years and the logs from the first year would be deleted once year 5 begins. If you do not use any date-related string format codes, logs would never be automatically deleted by GNUnet. @c *********************************************************************** @node Updated behavior of GNUNET_log @subsubsection Updated behavior of GNUNET_log It's currently quite common to see constructions like this all over the code: @example #if MESH_DEBUG GNUNET_log (GNUNET_ERROR_TYPE_DEBUG, "MESH: client disconnected\n"); #endif @end example The reason for the #if is not to avoid displaying the message when disabled (GNUNET_ERROR_TYPE takes care of that), but to avoid the compiler including it in the binary at all, when compiling GNUnet for platforms with restricted storage space / memory (MIPS routers, ARM plug computers / dev boards, etc). This presents several problems: the code gets ugly, hard to write and it is very easy to forget to include the #if guards, creating non-consistent code. 
A new change in GNUNET_log aims to solve these problems. @strong{This change requires to @file{./configure} with at least @code{--enable-logging=verbose} to see debug messages.} Here is an example of code with dense debug statements: @example switch (restrict_topology) @{ case GNUNET_TESTING_TOPOLOGY_CLIQUE:#if VERBOSE_TESTING GNUNET_log (GNUNET_ERROR_TYPE_DEBUG, _("Blacklisting all but clique topology\n")); #endif unblacklisted_connections = create_clique (pg, &remove_connections, BLACKLIST, GNUNET_NO); break; case GNUNET_TESTING_TOPOLOGY_SMALL_WORLD_RING: #if VERBOSE_TESTING GNUNET_log (GNUNET_ERROR_TYPE_DEBUG, _("Blacklisting all but small world (ring) topology\n")); #endif unblacklisted_connections = create_small_world_ring (pg,&remove_connections, BLACKLIST); break; @end example Pretty hard to follow, huh? From now on, it is not necessary to include the #if / #endif statements to achieve the same behavior. The @code{GNUNET_log} and @code{GNUNET_log_from} macros take care of it for you, depending on the configure option: @itemize @bullet @item If @code{--enable-logging} is set to @code{no}, the binary will contain no log messages at all. @item If @code{--enable-logging} is set to @code{yes}, the binary will contain no DEBUG messages, and therefore running with @command{-L DEBUG} will have no effect. Other messages (ERROR, WARNING, INFO, etc) will be included. @item If @code{--enable-logging} is set to @code{verbose}, or @code{veryverbose} the binary will contain DEBUG messages (still, it will be necessary to run with @command{-L DEBUG} or set the DEBUG config option to show them). @end itemize If you are a developer: @itemize @bullet @item please make sure that you @code{./configure --enable-logging=@{verbose,veryverbose@}}, so you can see DEBUG messages. @item please remove the @code{#if} statements around @code{GNUNET_log (GNUNET_ERROR_TYPE_DEBUG, ...)} lines, to improve the readability of your code. 
@end itemize

Since activating DEBUG now automatically makes it VERBOSE and activates @strong{all} debug messages by default, you probably want to use the @uref{https://docs.gnunet.org/#Logging, https://docs.gnunet.org/#Logging} functionality to filter only relevant messages. A suitable configuration could be:

@example
$ export GNUNET_FORCE_LOG="^YOUR_SUBSYSTEM$;;;;DEBUG/;;;;WARNING"
@end example

This will behave almost like enabling DEBUG in that subsystem before the change. Of course you can adapt it to your particular needs; this is only a quick example.

@cindex Interprocess communication API
@cindex IPC
@node Interprocess communication API (IPC)
@subsection Interprocess communication API (IPC)

In GNUnet a variety of new message types might be defined and used in interprocess communication. In this tutorial we use the @code{struct AddressLookupMessage} as an example to introduce how to construct our own message type in GNUnet and how to implement the message communication between service and client. (Here, a client uses the @code{struct AddressLookupMessage} as a request to ask the server to return the address of any other peer connecting to the service.)

@c ***********************************************************************
@menu
* Define new message types::
* Define message struct::
* Client - Establish connection::
* Client - Initialize request message::
* Client - Send request and receive response::
* Server - Startup service::
* Server - Add new handles for specified messages::
* Server - Process request message::
* Server - Response to client::
* Server - Notification of clients::
* Conversion between Network Byte Order (Big Endian) and Host Byte Order::
@end menu

@node Define new message types
@subsubsection Define new message types

First of all, you should define the new message type in @file{gnunet_protocols.h}:

@example
// Request to look up addresses of peers in server.
#define GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP 29
// Response to the address lookup request.
#define GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY 30
@end example

@c ***********************************************************************
@node Define message struct
@subsubsection Define message struct

After the type definition, the specified message structure should also be described in the header file, e.g. transport.h in our case.

@example
GNUNET_NETWORK_STRUCT_BEGIN

struct AddressLookupMessage @{
  struct GNUNET_MessageHeader header;
  int32_t numeric_only GNUNET_PACKED;
  struct GNUNET_TIME_AbsoluteNBO timeout;
  uint32_t addrlen GNUNET_PACKED;
  /* followed by 'addrlen' bytes of the actual address, then
     followed by the 0-terminated name of the transport */
@};
GNUNET_NETWORK_STRUCT_END
@end example

Please note @code{GNUNET_NETWORK_STRUCT_BEGIN} and @code{GNUNET_PACKED} which both ensure correct alignment when sending structs over the network.

@c ***********************************************************************
@node Client - Establish connection
@subsubsection Client - Establish connection

At first, on the client side, the underlying API is employed to create a new connection to a service; in our example the transport service is connected.

@example
struct GNUNET_CLIENT_Connection *client;
client = GNUNET_CLIENT_connect ("transport", cfg);
@end example

@c ***********************************************************************
@node Client - Initialize request message
@subsubsection Client - Initialize request message

When the connection is ready, we initialize the message. In this step, all the fields of the message should be properly initialized, namely the size, type, and some extra user-defined data, such as timeout, address and name of transport.
@example
struct AddressLookupMessage *msg;
size_t len = sizeof (struct AddressLookupMessage)
  + addressLen
  + strlen (nameTrans)
  + 1;
msg = GNUNET_malloc (len);
msg->header.size = htons (len);
msg->header.type = htons (GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP);
msg->timeout = GNUNET_TIME_absolute_hton (abs_timeout);
msg->addrlen = htonl (addressLen);
char *addrbuf = (char *) &msg[1];
memcpy (addrbuf, address, addressLen);
char *tbuf = &addrbuf[addressLen];
memcpy (tbuf, nameTrans, strlen (nameTrans) + 1);
@end example

Note that here the functions @code{htonl}, @code{htons} and @code{GNUNET_TIME_absolute_hton} are applied to convert from host byte order into network byte order (big endian). For the usage of the two byte orders and the corresponding conversion functions, please refer to the section on conversion between Network Byte Order and Host Byte Order.

@c ***********************************************************************
@node Client - Send request and receive response
@subsubsection Client - Send request and receive response

@b{FIXME: This is very outdated, see the tutorial for the current API!}

Next, the client sends the constructed message as a request to the service and waits for the response from the service. To accomplish this goal, there are a number of API calls that can be used. In this example, @code{GNUNET_CLIENT_transmit_and_get_response} is chosen as the most appropriate function to use.

@example
GNUNET_CLIENT_transmit_and_get_response
(client, &msg->header, timeout, GNUNET_YES, &address_response_processor,
arp_ctx);
@end example

The argument @code{address_response_processor} is a function of type @code{GNUNET_CLIENT_MessageHandler}, which is used to process the reply message from the service.
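As a self-contained illustration of the request layout from the client sections above, the following sketch packs a fixed header, an address and a 0-terminated transport name into one buffer. The types @code{struct header} and @code{struct lookup_msg}, the constant @code{MSG_TYPE_LOOKUP} and the function @code{build_lookup} are hypothetical stand-ins so that the example compiles without GNUnet; they do not reproduce the exact GNUnet structures or their packing.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the GNUnet types; the layout is simplified. */
struct header { uint16_t size; uint16_t type; };
struct lookup_msg {
  struct header header;
  int32_t numeric_only;
  uint64_t timeout_nbo;   /* hypothetical field, left at zero here */
  uint32_t addrlen;
};

#define MSG_TYPE_LOOKUP 29

/* Allocate and fill a request.  The caller frees the result.
   Returns NULL if the message would not fit the 16-bit size field. */
struct lookup_msg *
build_lookup (const void *addr, uint32_t addr_len, const char *transport)
{
  size_t name_len;
  size_t len;
  struct lookup_msg *msg;
  char *payload;

  assert (NULL != transport);
  name_len = strlen (transport) + 1;   /* include the 0-terminator */
  if (addr_len > UINT16_MAX || name_len > UINT16_MAX)
    return NULL;
  len = sizeof (struct lookup_msg) + addr_len + name_len;
  if (len > UINT16_MAX)
    return NULL;
  msg = calloc (1, len);
  if (NULL == msg)
    return NULL;
  /* All multi-byte fields go out in network byte order. */
  msg->header.size = htons ((uint16_t) len);
  msg->header.type = htons (MSG_TYPE_LOOKUP);
  msg->addrlen = htonl (addr_len);
  /* The variable-size part starts right after the fixed struct. */
  payload = (char *) &msg[1];
  if (addr_len > 0)
    memcpy (payload, addr, addr_len);
  memcpy (payload + addr_len, transport, name_len);
  return msg;
}
```

The explicit bound check reflects that the size field of the header is only 16 bits wide, so a message larger than 65535 bytes cannot be described by it.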
@node Server - Startup service
@subsubsection Server - Startup service

To be able to receive request messages, the service first runs a standard GNUnet service startup sequence using @code{GNUNET_SERVICE_run}, as follows:

@example
int main(int argc, char**argv) @{
  GNUNET_SERVICE_run(argc, argv, "transport",
                     GNUNET_SERVICE_OPTION_NONE, &run, NULL);
@}
@end example

@c ***********************************************************************
@node Server - Add new handles for specified messages
@subsubsection Server - Add new handles for specified messages

In the function above, the argument @code{run} is used to initiate the transport service, and is defined like this:

@example
static void run (void *cls,
                 struct GNUNET_SERVER_Handle *serv,
                 const struct GNUNET_CONFIGURATION_Handle *cfg) @{
  GNUNET_SERVER_add_handlers (serv, handlers);
@}
@end example

Here, @code{GNUNET_SERVER_add_handlers} must be called in the run function to add new handlers in the service. The parameter @code{handlers} is a list of @code{struct GNUNET_SERVER_MessageHandler} to tell the service which function should be called when a particular type of message is received, and should be defined in this way:

@example
static struct GNUNET_SERVER_MessageHandler handlers[] = @{
  @{&handle_start, NULL, GNUNET_MESSAGE_TYPE_TRANSPORT_START, 0@},
  @{&handle_send, NULL, GNUNET_MESSAGE_TYPE_TRANSPORT_SEND, 0@},
  @{&handle_try_connect, NULL, GNUNET_MESSAGE_TYPE_TRANSPORT_TRY_CONNECT,
   sizeof (struct TryConnectMessage)@},
  @{&handle_address_lookup, NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP, 0@},
  @{NULL, NULL, 0, 0@}
@};
@end example

As shown, the first member of each struct is a callback function, which is called to process the specified message types, given as the third member.
The second member is the closure for the callback function, which is set to @code{NULL} in most cases, and the last member is the expected size of the message of this type. Usually we set it to 0 to accept variable size; for special cases the exact size of the specified message can also be set. In addition, the terminator @code{@{NULL, NULL, 0, 0@}} is set as the last entry.

@c ***********************************************************************
@node Server - Process request message
@subsubsection Server - Process request message

After the initialization of the transport service, the request message is processed. Before handling the main message data, the validity of this message should be checked, e.g., whether the size of the message is correct.

@example
size = ntohs (message->size);
if (size < sizeof (struct AddressLookupMessage)) @{
  GNUNET_break_op (0);
  GNUNET_SERVER_receive_done (client, GNUNET_SYSERR);
  return;
@}
@end example

Note that, in contrast to the construction of the request message in the client, in the server the functions @code{ntohl} and @code{ntohs} should be employed when extracting data from the message, so that the data in network byte order (big endian) is converted back into host byte order. For more detail, please refer to the section on conversion between Network Byte Order and Host Byte Order.

Moreover, in this example the name of the transport stored in the message is a 0-terminated string, so we should also check whether the name of the transport in the received message is 0-terminated:

@example
nameTransport = (const char *) &address[addressLen];
if (nameTransport[size - sizeof (struct AddressLookupMessage)
                  - addressLen - 1] != '\0') @{
  GNUNET_break_op (0);
  GNUNET_SERVER_receive_done (client, GNUNET_SYSERR);
  return;
@}
@end example

Here, @code{GNUNET_SERVER_receive_done} should be called to tell the service that the request is done and that it can receive the next message.
The argument @code{GNUNET_SYSERR} here indicates that the service didn't understand the request message, and the processing of this request is terminated. In contrast, when the argument is @code{GNUNET_OK}, the service continues to process request messages.

@c ***********************************************************************
@node Server - Response to client
@subsubsection Server - Response to client

Once the processing of the current request is done, the server should send the response to the client. A new @code{struct AddressLookupMessage} is produced by the server in a similar way as the client did and sent to the client, but here the type should be @code{GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY} rather than @code{GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP} as in the client.

@example
struct AddressLookupMessage *msg;
size_t len = sizeof (struct AddressLookupMessage)
  + addressLen
  + strlen (nameTrans) + 1;
msg = GNUNET_malloc (len);
msg->header.size = htons (len);
msg->header.type = htons (GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY);

// ...

struct GNUNET_SERVER_TransmitContext *tc;
tc = GNUNET_SERVER_transmit_context_create (client);
GNUNET_SERVER_transmit_context_append_data (tc,
                NULL,
                0,
                GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY);
GNUNET_SERVER_transmit_context_run (tc, rtimeout);
@end example

Note that there are also a number of other APIs provided to the service to send the message.

@c ***********************************************************************
@node Server - Notification of clients
@subsubsection Server - Notification of clients

Often a service needs to (repeatedly) transmit notifications to a client or a group of clients. In these cases, the client typically has registered once for a set of events and then needs to receive a message whenever such an event happens (until the client disconnects). The use of a notification context can help manage message queues to clients and handle disconnects.
Notification contexts can be used to send individualized messages to a particular client or to broadcast messages to a group of clients. An individualized notification might look like this:

@example
GNUNET_SERVER_notification_context_unicast(nc,
                                           client,
                                           msg,
                                           GNUNET_YES);
@end example

Note that after processing the original registration message for notifications, the server code still typically needs to call @code{GNUNET_SERVER_receive_done} so that the client can transmit further messages to the server.

@c ***********************************************************************
@node Conversion between Network Byte Order (Big Endian) and Host Byte Order
@subsubsection Conversion between Network Byte Order (Big Endian) and Host Byte Order
@c %** subsub? it's a referenced page on the ipc document.

Network Byte Order is big endian. Host Byte Order is whatever order the local machine uses, which is little endian on many common platforms. What is the difference between the two?

On a little-endian host, an integer that occupies 4 bytes in RAM is stored with its lowest-order byte at the lowest address and its highest-order byte at the highest address. Network Byte Order takes the opposite approach: the highest-order byte is stored at the lowest address (and is therefore transmitted first), and the lowest-order byte is stored at the highest address.

Hosts communicate by exchanging data packets over the network. In order for both hosts to interpret the data identically regardless of their own byte order, the byte order must be converted before sending and after receiving the data.
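The effect of the conversion can be made visible with a short C sketch. It is independent of GNUnet and only uses the standard @code{htonl} and @code{ntohl} functions; the helper names @code{put_u32} and @code{get_u32} are made up for this illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value into buf in network byte order. */
void
put_u32 (unsigned char buf[4], uint32_t host_value)
{
  uint32_t nbo = htonl (host_value);

  memcpy (buf, &nbo, sizeof (nbo));
}

/* Read a 32-bit value stored in network byte order from buf. */
uint32_t
get_u32 (const unsigned char buf[4])
{
  uint32_t nbo;

  memcpy (&nbo, buf, sizeof (nbo));
  return ntohl (nbo);
}
```

After @code{put_u32 (buf, 0x0A0B0C0D)}, the first byte of the buffer holds the highest-order byte @code{0x0A} on any host, and @code{get_u32} returns the original value. On a big-endian host both conversions are no-ops, which is why code should always call them rather than assume a particular host order.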
There are ten convenient functions for converting the byte order in GNUnet, as follows:

@table @asis

@item uint16_t htons(uint16_t hostshort)
Convert a short int from host byte order to network byte order
@item uint32_t htonl(uint32_t hostlong)
Convert a long int from host byte order to network byte order
@item uint16_t ntohs(uint16_t netshort)
Convert a short int from network byte order to host byte order
@item uint32_t ntohl(uint32_t netlong)
Convert a long int from network byte order to host byte order
@item unsigned long long GNUNET_ntohll (unsigned long long netlonglong)
Convert a long long int from network byte order to host byte order
@item unsigned long long GNUNET_htonll (unsigned long long hostlonglong)
Convert a long long int from host byte order to network byte order
@item struct GNUNET_TIME_RelativeNBO GNUNET_TIME_relative_hton (struct GNUNET_TIME_Relative a)
Convert relative time to network byte order.
@item struct GNUNET_TIME_Relative GNUNET_TIME_relative_ntoh (struct GNUNET_TIME_RelativeNBO a)
Convert relative time from network byte order.
@item struct GNUNET_TIME_AbsoluteNBO GNUNET_TIME_absolute_hton (struct GNUNET_TIME_Absolute a)
Convert absolute time to network byte order.
@item struct GNUNET_TIME_Absolute GNUNET_TIME_absolute_ntoh (struct GNUNET_TIME_AbsoluteNBO a)
Convert absolute time from network byte order.
@end table

@cindex Cryptography API
@node Cryptography API
@subsection Cryptography API

The gnunetutil API provides the cryptographic primitives used in GNUnet. GNUnet uses 2048 bit RSA keys for the session key exchange and for signing messages by peers and most other public-key operations. Most researchers in cryptography consider 2048 bit RSA keys as secure and practically unbreakable for a long time. The API provides functions to create a fresh key pair, read a private key from a file (or create a new file if the file does not exist), encrypt, decrypt, sign, verify and extract the public key into a format suitable for network transmission.
For the encryption of files and the actual data exchanged between peers GNUnet uses 256-bit AES encryption. Fresh, session keys are negotiated for every new connection.@ Again, there is no published technique to break this cipher in any realistic amount of time. The API provides functions for generation of keys, validation of keys (important for checking that decryptions using RSA succeeded), encryption and decryption. GNUnet uses SHA-512 for computing one-way hash codes. The API provides functions to compute a hash over a block in memory or over a file on disk. The crypto API also provides functions for randomizing a block of memory, obtaining a single random number and for generating a permutation of the numbers 0 to n-1. Random number generation distinguishes between WEAK and STRONG random number quality; WEAK random numbers are pseudo-random whereas STRONG random numbers use entropy gathered from the operating system. Finally, the crypto API provides a means to deterministically generate a 1024-bit RSA key from a hash code. These functions should most likely not be used by most applications; most importantly, GNUNET_CRYPTO_rsa_key_create_from_hash does not create an RSA-key that should be considered secure for traditional applications of RSA. @cindex Message Queue API @node Message Queue API @subsection Message Queue API @strong{ Introduction }@ Often, applications need to queue messages that are to be sent to other GNUnet peers, clients or services. As all of GNUnet's message-based communication APIs, by design, do not allow messages to be queued, it is common to implement custom message queues manually when they are needed. However, writing very similar code in multiple places is tedious and leads to code duplication. MQ (for Message Queue) is an API that provides the functionality to implement and use message queues. We intend to eventually replace all of the custom message queue implementations in GNUnet with MQ. 
@strong{ Basic Concepts }@
The two most important entities in MQ are queues and envelopes.

Every queue is backed by a specific implementation (e.g. for mesh, stream, connection, server client, etc.) that will actually deliver the queued messages. For convenience, some queues also allow specifying a list of message handlers. The message queue will then also wait for incoming messages and dispatch them appropriately.

An envelope holds the memory for a message, as well as metadata (Where is the envelope queued? What should happen after it has been sent?). Any envelope can only be queued in one message queue.

@strong{ Creating Queues }@
The following is a list of currently available message queues. Note that to avoid layering issues, message queues for higher level APIs are not part of @code{libgnunetutil}; instead, the respective API itself provides the queue implementation.

@table @asis

@item @code{GNUNET_MQ_queue_for_connection_client}
Transmits queued messages over a @code{GNUNET_CLIENT_Connection} handle. Also supports receiving with message handlers.

@item @code{GNUNET_MQ_queue_for_server_client}
Transmits queued messages over a @code{GNUNET_SERVER_Client} handle. Does not support incoming message handlers.

@item @code{GNUNET_MESH_mq_create}
Transmits queued messages over a @code{GNUNET_MESH_Tunnel} handle. Does not support incoming message handlers.

@item @code{GNUNET_MQ_queue_for_callbacks}
This is the most general implementation. Instead of delivering and receiving messages with one of GNUnet's communication APIs, implementation callbacks are called. Refer to "Implementing Queues" for a more detailed explanation.
@end table

@strong{ Allocating Envelopes }@
A GNUnet message (as defined by the GNUNET_MessageHeader) has three parts: the size, the type, and the body.

MQ provides macros to allocate an envelope containing a message conveniently, automatically setting the size and type fields of the message.
Consider the following simple message, with the body consisting of a single number value.

@example
struct NumberMessage @{
  /** Type: GNUNET_MESSAGE_TYPE_EXAMPLE_1 */
  struct GNUNET_MessageHeader header;
  uint32_t number GNUNET_PACKED;
@};
@end example

An envelope containing an instance of the NumberMessage can be constructed like this:

@example
struct GNUNET_MQ_Envelope *ev;
struct NumberMessage *msg;
ev = GNUNET_MQ_msg (msg, GNUNET_MESSAGE_TYPE_EXAMPLE_1);
msg->number = htonl (42);
@end example

In the above code, @code{GNUNET_MQ_msg} is a macro. The return value is the newly allocated envelope. The first argument must be a pointer to some @code{struct} containing a @code{struct GNUNET_MessageHeader header} field, while the second argument is the desired message type, in host byte order.

The @code{msg} pointer now points to an allocated message, where the message type and the message size are already set. The message's size is inferred from the type of the @code{msg} pointer: it will be set to 'sizeof(*msg)', properly converted to network byte order.

If the message body's size is dynamic, the macro @code{GNUNET_MQ_msg_extra} can be used to allocate an envelope whose message has additional space allocated after the @code{msg} structure.

If no structure has been defined for the message, @code{GNUNET_MQ_msg_header_extra} can be used to allocate additional space after the message header. The first argument then must be a pointer to a @code{GNUNET_MessageHeader}.

@strong{Envelope Properties}@
A few functions in MQ allow setting additional properties on envelopes:

@table @asis

@item @code{GNUNET_MQ_notify_sent}
Allows specifying a function that will be called once the envelope's message has been sent irrevocably. An envelope can be canceled precisely up to the point where the notify sent callback has been called.

@item @code{GNUNET_MQ_disable_corking}
No corking will be used when sending the message.
Not every queue supports this flag; by default, envelopes are sent with corking.

@end table

@strong{Sending Envelopes}@
Once an envelope has been constructed, it can be queued for sending with @code{GNUNET_MQ_send}.

Note that in order to avoid memory leaks, an envelope must either be sent (the queue will free it) or destroyed explicitly with @code{GNUNET_MQ_discard}.

@strong{Canceling Envelopes}@
An envelope queued with @code{GNUNET_MQ_send} can be canceled with @code{GNUNET_MQ_cancel}. Note that after the notify sent callback has been called, canceling a message results in undefined behavior. Thus it is unsafe to cancel an envelope that does not have a notify sent callback. When canceling an envelope, it is not necessary to call @code{GNUNET_MQ_discard}, and the envelope can't be sent again.

@strong{ Implementing Queues }@
@code{TODO}

@cindex Service API
@node Service API
@subsection Service API

Most GNUnet code lives in the form of services. Services are processes that offer an API for other components of the system to build on. Those other components can be command-line tools for users, graphical user interfaces or other services. Services provide their API using an IPC protocol. To do so, each service must listen on either a TCP port or a UNIX domain socket; for this, the service implementation uses the server API. This use of server is exposed directly to the users of the service API. Thus, when using the service API, one is usually also using large parts of the server API. The service API provides various convenience functions, such as parsing command-line arguments and the configuration file, which are not found in the server API. The dual to the service/server API is the client API, which can be used to access services.

The most common way to start a service is to use the @code{GNUNET_SERVICE_run} function from the program's main function.
@code{GNUNET_SERVICE_run} will then parse the command line and configuration files and, based on the options found there, start the server. It will then give back control to the main program, passing the server and the configuration to the @code{GNUNET_SERVICE_Main} callback. @code{GNUNET_SERVICE_run} will also take care of starting the scheduler loop. If this is inappropriate (for example, because the scheduler loop is already running), @code{GNUNET_SERVICE_start} and related functions provide an alternative to @code{GNUNET_SERVICE_run}. When starting a service, the service_name option is used to determine which sections in the configuration file should be used to configure the service. A typical value here is the name of the @file{src/} sub-directory, for example @file{statistics}. The same string would also be given to @code{GNUNET_CLIENT_connect} to access the service. Once a service has been initialized, the program should use the @code{GNUNET_SERVICE_Main} callback to register message handlers using @code{GNUNET_SERVER_add_handlers}. The service will already have registered a handler for the "TEST" message. @findex GNUNET_SERVICE_Options The option bitfield (@code{enum GNUNET_SERVICE_Options}) determines how a service should behave during shutdown. There are three key strategies: @table @asis @item instant (@code{GNUNET_SERVICE_OPTION_NONE}) Upon receiving the shutdown signal from the scheduler, the service immediately terminates the server, closing all existing connections with clients. @item manual (@code{GNUNET_SERVICE_OPTION_MANUAL_SHUTDOWN}) The service does nothing by itself during shutdown. The main program will need to take the appropriate action by calling GNUNET_SERVER_destroy or GNUNET_SERVICE_stop (depending on how the service was initialized) to terminate the service. This method is used by gnunet-service-arm and rather uncommon. 
@item soft (@code{GNUNET_SERVICE_OPTION_SOFT_SHUTDOWN}) Upon receiving the shutdown signal from the scheduler, the service immediately tells the server to stop listening for incoming clients. Requests from normal existing clients are still processed and the server/service terminates once all normal clients have disconnected. Clients that are not expected to ever disconnect (such as clients that monitor performance values) can be marked as 'monitor' clients using GNUNET_SERVER_client_mark_monitor. Those clients will continue to be processed until all 'normal' clients have disconnected. Then, the server will terminate, closing the monitor connections. This mode is for example used by 'statistics', allowing existing 'normal' clients to set (possibly persistent) statistic values before terminating. @end table @c *********************************************************************** @node Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps @subsection Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps A commonly used data structure in GNUnet is a (multi-)hash map. It is most often used to map a peer identity to some data structure, but also to map arbitrary keys to values (for example to track requests in the distributed hash table or in file-sharing). As it is commonly used, the DHT is actually sometimes responsible for a large share of GNUnet's overall memory consumption (for some processes, 30% is not uncommon). The following text documents some API quirks (and their implications for applications) that were recently introduced to minimize the footprint of the hash map. 
@c ***********************************************************************
@menu
* Analysis::
* Solution::
* Migration::
* Conclusion::
* Availability::
@end menu

@node Analysis
@subsubsection Analysis

The main reason for the "excessive" memory consumption by the hash map
is that GNUnet uses 512-bit cryptographic hash codes --- and the
(multi-)hash map also uses the same 512-bit
@code{struct GNUNET_HashCode}. As a result, storing just the keys
requires 64 bytes of memory for each key. As some applications like to
keep a large number of entries in the hash map (after all, that's what
maps are good for), 64 bytes per hash is significant: keeping a pointer
to the value and having a linked list for collisions consume between 8
and 16 bytes, and @code{malloc} may add about the same overhead per
allocation, putting us in the 16 to 32 byte per entry ballpark. Adding
a 64-byte key then at least triples the overall memory requirement for
the hash map.

To make things "worse", most of the time storing the key in the hash
map is not required: it is typically already in memory elsewhere! In
most cases, the values stored in the hash map are some
application-specific struct that @emph{also} contains the hash. Here is
a simplified example:

@example
struct MyValue @{
  struct GNUNET_HashCode key;
  unsigned int my_data;
@};

// ...
val = GNUNET_malloc (sizeof (struct MyValue));
val->key = key;
val->my_data = 42;
GNUNET_CONTAINER_multihashmap_put (map, &key, val, ...);
@end example

This is a common pattern as later the entries might need to be removed,
and at that time it is convenient to have the key immediately at hand:

@example
GNUNET_CONTAINER_multihashmap_remove (map, &val->key, val);
@end example

Note that here we end up with two times 64 bytes for the key, plus
maybe 64 bytes total for the rest of the @code{struct MyValue} and the
map entry in the hash map. The resulting redundant storage of the key
increases overall memory consumption per entry from the "optimal" 128
bytes to 192 bytes.
This is not just an extreme example: overheads in practice are
actually sometimes close to those highlighted in this example. This is
especially true for maps with a significant number of entries, as there
we tend to really try to keep the entries small.

@c ***********************************************************************
@node Solution
@subsubsection Solution

The solution that has now been implemented is to @strong{optionally}
allow the hash map to not make a (deep) copy of the hash but instead
have a pointer to the hash/key in the entry. This reduces the memory
consumption for the key from 64 bytes to 4 to 8 bytes. However, it can
also only work if the key is actually stored in the entry (which is the
case most of the time) and if the entry does not modify the key (which,
in all of the code I'm aware of, has always been the case if the key is
stored in the entry). Finally, when the client stores an entry in the
hash map, it @strong{must} provide a pointer to the key within the
entry, not just a pointer to a transient location of the key. If the
client code does not meet these requirements, the result is a dangling
pointer and undefined behavior of the (multi-)hash map API.

@c ***********************************************************************
@node Migration
@subsubsection Migration

To use the new feature, first check that the values contain the
respective key (and never modify it). Then, all calls to
@code{GNUNET_CONTAINER_multihashmap_put} on the respective map must be
audited and most likely changed to pass a pointer into the value's
struct. For the initial example, the new code would look like this:

@example
struct MyValue @{
  struct GNUNET_HashCode key;
  unsigned int my_data;
@};

// ...
val = GNUNET_malloc (sizeof (struct MyValue));
val->key = key;
val->my_data = 42;
GNUNET_CONTAINER_multihashmap_put (map, &val->key, val, ...);
@end example

Note that @code{&key} was changed to @code{&val->key} in the argument
to the @code{put} call.
This is critical as often @code{key} is on the stack or in some other
transient data structure and thus having the hash map keep a pointer to
@code{key} would not work. Only the key inside of @code{val} has the
same lifetime as the entry in the map (this must of course be checked
as well). Naturally, @code{val->key} must be initialized before the
@code{put} call. Once all @code{put} calls have been converted and
double-checked, you can change the call to create the hash map from

@example
map = GNUNET_CONTAINER_multihashmap_create (SIZE, GNUNET_NO);
@end example

to

@example
map = GNUNET_CONTAINER_multihashmap_create (SIZE, GNUNET_YES);
@end example

If everything was done correctly, you now use about 60 bytes less
memory per entry in @code{map}. However, if now (or in the future) any
call to @code{put} does not ensure that the given key is valid until
the entry is removed from the map, undefined behavior is likely to be
observed.

@c ***********************************************************************
@node Conclusion
@subsubsection Conclusion

The new optimization is often applicable and can result in a reduction
in memory consumption of up to 30% in practice. However, it makes the
code less robust as additional invariants are imposed on the multi hash
map client. Thus applications should refrain from enabling the new mode
unless the resulting reduction in memory consumption is deemed
significant enough. In particular, it should generally not be used in
new code (wait at least until benchmarks exist).

@c ***********************************************************************
@node Availability
@subsubsection Availability

The new multi hash map code was committed in SVN 24319 (which made its
way into GNUnet version 0.9.4). Various subsystems (transport, core,
dht, file-sharing) were previously audited and modified to take
advantage of the new capability. In particular, memory consumption of
the file-sharing service is expected to drop by 20-30% due to this
change.

@cindex CONTAINER_MDLL API
@node CONTAINER_MDLL API
@subsection CONTAINER_MDLL API

This text documents the @code{GNUNET_CONTAINER_MDLL} API. The
@code{GNUNET_CONTAINER_MDLL} API is similar to the
@code{GNUNET_CONTAINER_DLL} API in that it provides operations for the
construction and manipulation of doubly-linked lists. The key
difference from the (simpler) DLL-API is that the MDLL-version allows a
single element (instance of a "struct") to be in multiple linked lists
at the same time.

Like the DLL API, the MDLL API stores (most of) the data structures for
the doubly-linked list with the respective elements; only the 'head'
and 'tail' pointers are stored "elsewhere" --- and the application
needs to provide the locations of head and tail to each of the calls in
the MDLL API. The key difference for the MDLL API is that the "next"
and "previous" pointers in the struct can no longer be simply called
"next" and "prev" --- after all, the element may be in multiple
doubly-linked lists, so we cannot just have one "next" and one "prev"
pointer!

The solution is to have multiple fields that must have a name of the
format "next_XX" and "prev_XX" where "XX" is the name of one of the
doubly-linked lists. Here is a simple example:

@example
struct MyMultiListElement @{
  struct MyMultiListElement *next_ALIST;
  struct MyMultiListElement *prev_ALIST;
  struct MyMultiListElement *next_BLIST;
  struct MyMultiListElement *prev_BLIST;
  void *data;
@};
@end example

Note that by convention, we use all-uppercase letters for the list
names.
In addition, the program needs to have a location for the head and tail
pointers for both lists, for example:

@example
static struct MyMultiListElement *head_ALIST;
static struct MyMultiListElement *tail_ALIST;
static struct MyMultiListElement *head_BLIST;
static struct MyMultiListElement *tail_BLIST;
@end example

Using the MDLL-macros, we can now insert an element into the ALIST:

@example
GNUNET_CONTAINER_MDLL_insert (ALIST, head_ALIST, tail_ALIST, element);
@end example

Passing "ALIST" as the first argument to MDLL specifies which of the
next/prev fields in the @code{struct MyMultiListElement} should be
used. The extra "ALIST" argument and the "_ALIST" in the names of the
next/prev-members are the only differences between the MDLL and
DLL-API. Like the DLL-API, the MDLL-API offers functions for inserting
(at head, at tail, after a given element) and removing elements from
the list. Iterating over the list should be done by directly accessing
the "next_XX" and/or "prev_XX" members.

@cindex Automatic Restart Manager
@cindex ARM
@node Automatic Restart Manager (ARM)
@section Automatic Restart Manager (ARM)

GNUnet's Automatic Restart Manager (ARM) is the GNUnet service
responsible for system initialization and service babysitting. ARM
starts and halts services, detects configuration changes and restarts
services impacted by the changes as needed. It is also responsible for
restarting services in case of crashes. It is planned to incorporate
automatic debugging for diagnosing service crashes, giving developers
insight into the reasons for a crash. The purpose of this document is
to give GNUnet developers an idea about how ARM works and how to
interact with it.

@menu
* Basic functionality::
* Key configuration options::
* ARM - Availability::
* Reliability::
@end menu

@c ***********************************************************************
@node Basic functionality
@subsection Basic functionality

@itemize @bullet
@item ARM source code can be found under @file{src/arm}.
Service processes are managed by the functions in
@file{gnunet-service-arm.c}, which is controlled with
@file{gnunet-arm.c} (the main function in that file is ARM's entry
point).
@item The functions responsible for communicating with ARM and for
starting and stopping services (including the ARM service itself) are
provided by the ARM API in @file{arm_api.c}.
@code{GNUNET_ARM_connect()} returns an ARM handle to the caller after
binding it to the caller's context (the configuration and scheduler in
use). This handle can be used afterwards by the caller to communicate
with ARM. The functions @code{GNUNET_ARM_start_service()} and
@code{GNUNET_ARM_stop_service()} are used for starting and stopping
services respectively.
@item A typical example of using these basic ARM services can be found
in the file @file{test_arm_api.c}. The test case connects to ARM,
starts it, then uses it to start a service "resolver", stops the
"resolver" and then stops "ARM".
@end itemize

@c ***********************************************************************
@node Key configuration options
@subsection Key configuration options

Configurations for ARM and services should be available in a
@file{.conf} file (as an example, see @file{test_arm_api_data.conf}).
When running ARM, the configuration file to use should be passed to the
command:

@example
$ gnunet-arm -s -c configuration_to_use.conf
@end example

If no configuration is passed, the default configuration file will be
used (see @file{GNUNET_PREFIX/share/gnunet/defaults.conf}, which is
created from @file{contrib/defaults.conf}).
Each service has a section that starts with the service name in square
brackets, for example: "[arm]".
The following options configure how ARM configures or interacts with
the various services:

@table @asis
@item PORT
Port number on which the service is listening for incoming TCP
connections. ARM will start the service should it notice a request on
this port.
@item HOSTNAME
Specifies on which host the service is deployed. Note that ARM can only
start services that are running on the local system (but will not check
that the hostname matches the local machine name). This option is used
by the @code{gnunet_client_lib.h} implementation to determine which
system to connect to. The default is "localhost".
@item BINARY
The name of the service binary file.
@item OPTIONS
Options to be passed to the service.
@item PREFIX
A command to prepend to the actual command, for example to run a
service under "valgrind" or "gdb".
@item DEBUG
Run in debug mode (very verbose).
@item START_ON_DEMAND
ARM will listen on the UNIX domain socket and/or TCP port of the
service and start the service on demand.
@item IMMEDIATE_START
ARM will always start this service when the peer is started.
@item ACCEPT_FROM
IPv4 addresses the service accepts connections from.
@item ACCEPT_FROM6
IPv6 addresses the service accepts connections from.
@end table

Options that impact the operation of ARM overall are in the "[arm]"
section. ARM is a normal service and has (except for START_ON_DEMAND)
all of the options that other services do. In addition, ARM has the
following options:

@table @asis
@item GLOBAL_PREFIX
Command to be prepended to all services that are going to run.
@item GLOBAL_POSTFIX
Global option that will be supplied to all the services that are going
to run.
@end table

@c ***********************************************************************
@node ARM - Availability
@subsection ARM - Availability

As mentioned before, one of the features provided by ARM is starting
services on demand. Consider the example of one service, the "client",
that wants to connect to another service, the "server".
The "client" will ask ARM to run the "server". ARM starts the "server".
The "server" starts listening for incoming connections. The "client"
will establish a connection with the "server". Then they will start to
communicate.

One problem with that scheme is that it's slow!
The "client" service wants to communicate with the "server" service at
once and is not willing to wait for it to be started and listening for
incoming connections before serving its request.

One solution to that problem would be for ARM to start all services as
default services. That would solve the problem, yet it is not quite
practical, because some of the services started this way might never be
used or might only be used after a relatively long time.

The approach followed by ARM to solve this problem is as follows:

@itemize @bullet
@item For each service having a PORT field in the configuration file
and that is not one of the default services (a service that accepts
incoming connections from clients), ARM creates listening sockets for
all addresses associated with that service.
@item The "client" will immediately establish a connection with the
"server".
@item ARM --- pretending to be the "server" --- will listen on the
respective port and notice the incoming connection from the "client"
(but not accept it).
@item Once there is an incoming connection, ARM will start the
"server", passing on the listen sockets (now, the service is started
and can do its work).
@item Other client services can now connect directly to the "server".
@end itemize

@c ***********************************************************************
@node Reliability
@subsection Reliability

One of the features provided by ARM is the automatic restart of crashed
services. ARM needs to know which of the running services died. The
function @code{maint_child_death()} in @file{gnunet-service-arm.c} is
responsible for that. The function is scheduled to run upon receiving a
SIGCHLD signal.
The function then iterates over ARM's list of running services and
checks which services have died (crashed). ARM restarts every service
that has crashed.

Now consider the case of a service with a serious problem that causes
it to crash each time it is started by ARM. If ARM keeps blindly
restarting such a service, we are going to have the pattern
start-crash-restart-crash-restart-crash and so forth, which is of
course not practical.

For that reason, ARM schedules the service to be restarted after
waiting for some delay that grows exponentially with each crash/restart
of that service.

To clarify the idea, consider the following example:

@itemize @bullet
@item Service S crashed.
@item ARM receives the SIGCHLD and inspects its list of services to
find the dead one(s).
@item ARM finds S dead and schedules it for restarting after the
"backoff" time, which is initially set to 1ms. ARM then doubles the
backoff time for S (now backoff(S) = 2ms).
@item Because there is a severe problem with S, it crashes again.
@item Again ARM receives the SIGCHLD and detects that it is S that has
crashed again. ARM schedules it for restarting, but only after its new
backoff time (which became 2ms), and doubles its backoff time (now
backoff(S) = 4ms).
@item and so on, until backoff(S) reaches a certain threshold
(@code{EXPONENTIAL_BACKOFF_THRESHOLD} is set to half an hour). After
reaching it, backoff(S) will remain half an hour, hence ARM won't spend
a lot of time trying to restart a problematic service.
@end itemize

@cindex TRANSPORT Subsystem
@node TRANSPORT Subsystem
@section TRANSPORT Subsystem

This chapter documents how the GNUnet transport subsystem works. The
GNUnet transport subsystem consists of three main components: the
transport API (the interface used by the rest of the system to access
the transport service), the transport service itself (most of the
interesting functions, such as choosing transports, happen here) and
the transport plugins.
A transport plugin is a concrete implementation for how two GNUnet
peers communicate; many plugins exist, for example for communication
via TCP, UDP, HTTP, HTTPS and others. Finally, the transport subsystem
uses supporting code, especially the NAT/UPnP library to help with
tasks such as NAT traversal.

Key tasks of the transport service include:

@itemize @bullet
@item Create our HELLO message, notify clients and neighbours if our
HELLO changes (using NAT library as necessary)
@item Validate HELLOs from other peers (send PING), allow other peers
to validate our HELLO's addresses (send PONG)
@item Upon request, establish connections to other peers (using address
selection from ATS subsystem) and maintain them (again using PINGs and
PONGs) as long as desired
@item Accept incoming connections, give ATS service the opportunity to
switch communication channels
@item Notify clients about peers that have connected to us or that have
been disconnected from us
@item If a (stateful) connection goes down unexpectedly (without
explicit DISCONNECT), quickly attempt to recover (without notifying
clients) but do notify clients quickly if reconnecting fails
@item Send (payload) messages arriving from clients to other peers via
transport plugins and receive messages from other peers, forwarding
those to clients
@item Enforce inbound traffic limits (using flow-control if it is
applicable); outbound traffic limits are enforced by CORE, not by us (!)
@item Enforce restrictions on P2P connections as specified by the
blacklist configuration and blacklisting clients
@end itemize

Note that the term "clients" in the list above really refers to the
GNUnet-CORE service, as CORE is typically the only client of the
transport service.

@menu
* Address validation protocol::
@end menu

@node Address validation protocol
@subsection Address validation protocol

This section documents how the GNUnet transport service validates
connections with other peers.
It is a high-level description of the protocol necessary to understand
the details of the implementation. It should be noted that when we talk
about PING and PONG messages in this section, we refer to
transport-level PING and PONG messages, which are different from
core-level PING and PONG messages (both in implementation and
function).

The goal of transport-level address validation is to minimize the
chances of a successful man-in-the-middle attack against GNUnet peers
on the transport level. Such an attack would not allow the adversary to
decrypt the P2P transmissions, but a successful attacker could at least
measure traffic volumes and latencies (raising the adversary's
capabilities to those of a global passive adversary in the worst case).
The scenario we are concerned about is an attacker, Mallory, giving a
@code{HELLO} to Alice that claims to be for Bob, but contains Mallory's
IP address instead of Bob's (for some transport).
Mallory would then forward the traffic to Bob (by initiating a
connection to Bob and claiming to be Alice). As a further complication,
the scheme has to work even if, say, Alice is behind a NAT without
traversal support and hence has no address of her own (and thus Alice
must always initiate the connection to Bob).

An additional constraint is that @code{HELLO} messages do not contain a
cryptographic signature since other peers must be able to edit
(i.e. remove) addresses from the @code{HELLO} at any time (this was
not true in GNUnet 0.8.x). A basic @strong{assumption} is that each
peer knows the set of possible network addresses that it @strong{might}
be reachable under (so for example, the external IP address of the NAT
plus the LAN address(es) with the respective ports).

The solution is the following. If Alice wants to validate that a given
address for Bob is valid (i.e. is actually established @strong{directly}
with the intended target), she sends a @code{PING} message over that
connection to Bob.
Note that in this case, Alice initiated the connection so only Alice
knows which address was used for sure (Alice may be behind NAT, so
whatever address Bob sees may not be an address Alice knows she has).
Bob checks that the address given in the @code{PING} is actually one
of Bob's addresses (i.e. does not belong to Mallory), and if it is,
sends back a @code{PONG} (with a signature that says that Bob
owns/uses the address from the @code{PING}).
Alice checks the signature and is happy if it is valid and the address
in the @code{PONG} is the address Alice used.
This is similar to the 0.8.x protocol where the @code{HELLO} contained
a signature from Bob for each address used by Bob.
Here, the purpose code for the signature is
@code{GNUNET_SIGNATURE_PURPOSE_TRANSPORT_PONG_OWN}. After this, Alice
will remember Bob's address and consider the address valid for a while
(12h in the current implementation). Note that after this exchange,
Alice only considers Bob's address to be valid; the connection itself
is not considered 'established'. In particular, Alice may have many
addresses for Bob that Alice considers valid.

The @code{PONG} message is protected with a nonce/challenge against
replay attacks
(@uref{http://en.wikipedia.org/wiki/Replay_attack, replay})
and uses an expiration time for the signature (but those are almost
implementation details).

@cindex NAT library
@node NAT library
@section NAT library

The goal of the GNUnet NAT library is to provide a general-purpose API
for NAT traversal @strong{without} third-party support. So protocols
that involve contacting a third peer to help establish a connection
between two peers are outside of the scope of this API. That does not
mean that GNUnet doesn't support involving a third peer (we can do this
with the distance-vector transport or using application-level
protocols), it just means that the NAT API is not concerned with this
possibility.
The API is written so that it will work for IPv6-NAT in the future as well as current IPv4-NAT. Furthermore, the NAT API is always used, even for peers that are not behind NAT --- in that case, the mapping provided is simply the identity. NAT traversal is initiated by calling @code{GNUNET_NAT_register}. Given a set of addresses that the peer has locally bound to (TCP or UDP), the NAT library will return (via callback) a (possibly longer) list of addresses the peer @strong{might} be reachable under. Internally, depending on the configuration, the NAT library will try to punch a hole (using UPnP) or just "know" that the NAT was manually punched and generate the respective external IP address (the one that should be globally visible) based on the given information. The NAT library also supports ICMP-based NAT traversal. Here, the other peer can request connection-reversal by this peer (in this special case, the peer is even allowed to configure a port number of zero). If the NAT library detects a connection-reversal request, it returns the respective target address to the client as well. It should be noted that connection-reversal is currently only intended for TCP, so other plugins @strong{must} pass @code{NULL} for the reversal callback. Naturally, the NAT library also supports requesting connection reversal from a remote peer (@code{GNUNET_NAT_run_client}). Once initialized, the NAT handle can be used to test if a given address is possibly a valid address for this peer (@code{GNUNET_NAT_test_address}). This is used for validating our addresses when generating PONGs. Finally, the NAT library contains an API to test if our NAT configuration is correct. Using @code{GNUNET_NAT_test_start} @strong{before} binding to the respective port, the NAT library can be used to test if the configuration works. 
The test function acts as a local client, initializes the NAT traversal
and then contacts a @code{gnunet-nat-server} (running by default on
@code{gnunet.org}) and asks for a connection to be established. This
way, it is easy to test if the current NAT configuration is valid.

@node Distance-Vector plugin
@section Distance-Vector plugin

The Distance Vector (DV) transport is a transport mechanism that allows
peers to act as relays for each other, thereby connecting peers that
would otherwise be unable to connect. This gives a larger connection
set to applications that may work better with more peers to choose from
(for example, File Sharing and/or DHT).

The Distance Vector transport essentially has two functions. The first
is "gossiping" connection information about more distant peers to
directly connected peers. The second is taking messages intended for
non-directly connected peers and encapsulating them in a DV wrapper
that contains the required information for routing the message through
forwarding peers. Via gossiping, optimal routes through the known DV
neighborhood are discovered and utilized, and the message encapsulation
provides some benefits in addition to simply getting the message from
the correct source to the proper destination.

The gossiping function of DV provides an up-to-date routing table of
peers that are available up to some number of hops. We call this a
fisheye view of the network (as for a fish, nearby objects are known
while more distant ones are unknown). Gossip messages are sent only to
directly connected peers, but they are sent about other known peers
within the "fisheye distance". Whenever two peers connect, they
immediately gossip to each other about their respective other
neighbors. They also gossip about the newly connected peer to
previously connected neighbors.
In order to keep the routing tables up to date, disconnect
notifications are propagated as gossip as well (because disconnects may
not be sent/received, timeouts are also used to remove stale routing
table entries).

Routing of messages via DV is straightforward. When the DV transport is
notified of a message destined for a non-direct neighbor, the
appropriate forwarding peer is selected, and the base message is
encapsulated in a DV message which contains information about the
initial peer and the intended recipient. At each forwarding hop, the
initial peer is validated (the forwarding peer ensures that it has the
initial peer in its neighborhood, otherwise the message is dropped).
Next the base message is re-encapsulated in a new DV message for the
next hop in the forwarding chain (or delivered to the current peer, if
it has arrived at the destination).

Assume a three peer network with peers Alice, Bob and Carol. Assume
that

@example
Alice <-> Bob and Bob <-> Carol
@end example

@noindent
are direct (e.g. over TCP or UDP transports) connections, but that
Alice cannot directly connect to Carol. This may be the case due to NAT
or firewall restrictions, or perhaps based on one of the peers'
respective configurations. If the Distance Vector transport is enabled
on all three peers, it will automatically discover (from the gossip
protocol) that Alice and Carol can connect via Bob and provide a
"virtual" Alice <-> Carol connection. Routing between Alice and Carol
happens as follows: Alice creates a message destined for Carol and
notifies the DV transport about it. The DV transport at Alice looks up
Carol in the routing table and finds that the message must be sent
through Bob for Carol. The message is encapsulated setting Alice as the
initiator and Carol as the destination and sent to Bob. Bob receives
the message, verifies that both Alice and Carol are known to Bob, and
re-wraps the message in a new DV message for Carol.
The DV transport at Carol receives this message, unwraps the original
message, and delivers it to Carol as though it came directly from
Alice.

@cindex SMTP plugin
@node SMTP plugin
@section SMTP plugin

@c TODO: Update!

This section describes the new SMTP transport plugin for GNUnet as it
exists in the 0.7.x and 0.8.x branch. SMTP support is currently not
available in GNUnet 0.9.x. This page also describes the transport layer
abstraction (as it existed in 0.7.x and 0.8.x) in more detail and gives
some benchmarking results. The performance results presented are quite
old and may be outdated at this point.
Readers in the year 2019 will notice from the mention of versions 0.7,
0.8 and 0.9 that this section has to be taken with the usual grain of
salt and needs to be updated eventually.

@itemize @bullet
@item Why use SMTP for a peer-to-peer transport?
@item How does it work?
@item How do I configure my peer?
@item How do I test if it works?
@item How fast is it?
@item Is there any additional documentation?
@end itemize

@menu
* Why use SMTP for a peer-to-peer transport?::
* How does it work?::
* How do I configure my peer?::
* How do I test if it works?::
* How fast is it?::
@end menu

@node Why use SMTP for a peer-to-peer transport?
@subsection Why use SMTP for a peer-to-peer transport?

There are many reasons why one would not want to use SMTP:

@itemize @bullet
@item SMTP uses more bandwidth than TCP, UDP or HTTP.
@item SMTP has a much higher latency.
@item SMTP requires significantly more computation (encoding and
decoding time) for the peers.
@item SMTP is significantly more complicated to configure.
@item SMTP may be abused by tricking GNUnet into sending mail to
non-participating third parties.
@end itemize

So why would anybody want to use SMTP?

@itemize @bullet
@item SMTP can be used to contact peers behind NAT boxes (in virtual
private networks).
@item SMTP can be used to circumvent policies that limit or prohibit
peer-to-peer traffic by masquerading as "legitimate" traffic.
@item SMTP uses E-mail addresses which are independent of a specific
IP, which can be useful to address peers that use dynamic IP addresses.
@item SMTP can be used to initiate a connection (e.g. initial address
exchange) and peers can then negotiate the use of a more efficient
protocol (e.g. TCP) for the actual communication.
@end itemize

In summary, SMTP can for example be used to send a message to a peer
behind a NAT box that has a dynamic IP to tell the peer to establish a
TCP connection to a peer outside of the private network. Even an
extraordinary overhead for this first message would be irrelevant in
this type of situation.

@node How does it work?
@subsection How does it work?

When a GNUnet peer needs to send a message to another GNUnet peer that
has advertised (only) an SMTP transport address, GNUnet base64-encodes
the message and sends it in an E-mail to the advertised address. The
advertisement contains a filter which is placed in the E-mail header,
such that the receiving host can filter the tagged E-mails and forward
them to the GNUnet peer process. The filter can be specified
individually by each peer and be changed over time. This makes it
impossible to censor GNUnet E-mail messages by searching for a generic
filter.

@node How do I configure my peer?
@subsection How do I configure my peer?

First, you need to configure @code{procmail} to filter your inbound
E-mail for GNUnet traffic. The GNUnet messages must be delivered into a
pipe, for example @code{/tmp/gnunet.smtp}. You also need to define a
filter that is used by @command{procmail} to detect GNUnet messages.
You are free to choose whichever filter you like, but you should make
sure that it does not occur in your other E-mail. In our example, we
will use @code{X-mailer: GNUnet}.
The @file{~/.procmailrc} configuration file then looks like this:

@example
:0:
* ^X-mailer: GNUnet
/tmp/gnunet.smtp
# where do you want your other e-mail delivered to
# (default: /var/spool/mail/)
:0:
/var/spool/mail/
@end example

After adding this file, first make sure that your regular E-mail still works (e.g. by sending an E-mail to yourself). Then edit the GNUnet configuration. In the section @code{SMTP} you need to specify your E-mail address under @code{EMAIL}, your mail server (for outgoing mail) under @code{SERVER}, the filter (@code{X-mailer: GNUnet} in the example) under @code{FILTER} and the name of the pipe under @code{PIPE}. The completed section could then look like this:

@example
EMAIL = me@@mail.gnu.org
MTU = 65000
SERVER = mail.gnu.org:25
FILTER = "X-mailer: GNUnet"
PIPE = /tmp/gnunet.smtp
@end example

Finally, you need to add @code{smtp} to the list of @code{TRANSPORTS} in the @code{GNUNETD} section. GNUnet peers will use the E-mail address that you specified to contact your peer until the advertisement times out. Thus, if you are not sure if everything works properly or if you are not planning to be online for a long time, you may want to configure this timeout to be short, e.g. just one hour. For this, set @code{HELLOEXPIRES} to @code{1} in the @code{GNUNETD} section.

This should be it, but you will probably want to test it first.

@node How do I test if it works?
@subsection How do I test if it works?

Any transport can be subjected to some rudimentary tests using the @command{gnunet-transport-check} tool. The tool sends a message to the local node via the transport and checks that a valid message is received. While this test does not involve other peers and cannot check if firewalls or other network obstacles prohibit proper operation, it is a great testcase for the SMTP transport since it tests nearly all of the functionality.

@command{gnunet-transport-check} should only be used without running @command{gnunetd} at the same time.
By default, @command{gnunet-transport-check} tests all transports that are specified in the configuration file, but you can specifically test SMTP by giving the option @code{--transport=smtp}.

Note that this test always checks if a transport can receive and send. While you can configure most transports to only receive or only send messages, this test will only work if you have configured the transport to send and receive messages.

@node How fast is it?
@subsection How fast is it?

We have measured the performance of the UDP, TCP and SMTP transport layer directly and when used from an application using the GNUnet core. Measuring just the transport layer gives a better view of the actual overhead of the protocol, whereas evaluating the transport from the application puts the overhead into perspective from a practical point of view.

The loopback measurements of the SMTP transport were performed on three different machines spanning a range of modern SMTP configurations. We used a PIII-800 running RedHat 7.3 with the Purdue Computer Science configuration, which includes filters for spam. We also used a Xeon 2 GHz with a vanilla RedHat 8.0 sendmail configuration. Furthermore, we used qmail on a PIII-1000 running Sorcerer GNU Linux (SGL). The numbers for UDP and TCP are provided using the SGL configuration. The qmail benchmark uses qmail's internal filtering whereas the sendmail benchmarks rely on procmail to filter and deliver the mail.

We used the transport layer to send a message of b bytes (excluding transport protocol headers) directly to the local machine. This way, network latency and packet loss on the wire have no impact on the timings. n messages were sent sequentially over the transport layer, sending message i+1 after the i-th message was received.
All messages were sent over the same connection and the time to establish the connection was not taken into account since this overhead is minuscule in practice, as long as a connection is used for a significant number of messages.

@multitable @columnfractions .20 .15 .15 .15 .15 .15
@headitem Message size @tab UDP @tab TCP @tab SMTP (Purdue sendmail) @tab SMTP (RH 8.0) @tab SMTP (SGL qmail)
@item 11 bytes @tab 31 ms @tab 55 ms @tab 781 s @tab 77 s @tab 24 s
@item 407 bytes @tab 37 ms @tab 62 ms @tab 789 s @tab 78 s @tab 25 s
@item 1,221 bytes @tab 46 ms @tab 73 ms @tab 804 s @tab 78 s @tab 25 s
@end multitable

The benchmarks show that UDP and TCP are, as expected, both significantly faster than any of the SMTP services. Among the SMTP implementations, there can be significant differences depending on the SMTP configuration. Filtering with an external tool like procmail that needs to re-parse its configuration for each mail can be very expensive. Applying spam filters can also significantly impact the performance of the underlying SMTP implementation. The microbenchmark shows that SMTP can be a viable solution for initiating peer-to-peer sessions: a couple of seconds to connect to a peer are probably not even going to be noticed by users.

The next benchmark measures the possible throughput for a transport. Throughput can be measured by sending multiple messages in parallel and measuring packet loss. Note that not only UDP but also the TCP transport can actually lose messages since the TCP implementation drops messages if the @code{write} to the socket would block. While the SMTP protocol never drops messages itself, it is often so slow that only a fraction of the messages can be sent and received in the given time bounds. For this benchmark we report the message loss after allowing t time for sending m messages. If messages were not sent (or received) after an overall timeout of t, they were considered lost.
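
The loss criterion described above can be sketched in a few lines of Python. This is not the original benchmark code; the function names and the representation of arrival times are assumptions made for illustration.

```python
def message_loss(arrival_times, timeout):
    # arrival_times holds one entry per sent message: the number of
    # seconds after the start of the run at which the message was
    # received, or None if it never arrived.  (Hypothetical layout.)
    if not arrival_times:
        return 0.0
    lost = sum(1 for t in arrival_times if t is None or t > timeout)
    return lost / len(arrival_times)


def throughput_kbps(delivered, size_octets, seconds):
    # Payload throughput in kbit/s for `delivered` messages of
    # `size_octets` octets each, received within `seconds` seconds.
    return delivered * size_octets * 8 / seconds / 1000
```

A message counts as lost both when it never arrives and when it arrives after the overall timeout t, which is why a slow but reliable transport such as SMTP can still show a high loss rate in this kind of measurement.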
The benchmark was performed using two Xeon 2 GHz machines running RedHat 8.0 with sendmail. The machines were connected with a direct 100 Mbit Ethernet connection.

Figures udp1200, tcp1200 and smtp-MTUs show that the throughput for messages of size 1,200 octets is 2,343 kbps, 3,310 kbps and 6 kbps for UDP, TCP and SMTP respectively. The high per-message overhead of SMTP can be improved by increasing the MTU; for example, an MTU of 12,000 octets improves the throughput to 13 kbps, as figure smtp-MTUs shows. Our research paper has some more details on the benchmarking results.

@cindex Bluetooth plugin
@node Bluetooth plugin
@section Bluetooth plugin

This section describes the new Bluetooth transport plugin for GNUnet. The plugin is still in the testing stage so don't expect it to work perfectly. If you have any questions or problems, just ask on the IRC channel.

@itemize @bullet
@item What do I need to use the Bluetooth plugin transport?
@item How does it work?
@item What possible errors should I be aware of?
@item How do I configure my peer?
@item How can I test it?
@end itemize

@menu
* What do I need to use the Bluetooth plugin transport?::
* How does it work2?::
* What possible errors should I be aware of?::
* How do I configure my peer2?::
* How can I test it?::
* The implementation of the Bluetooth transport plugin::
@end menu

@node What do I need to use the Bluetooth plugin transport?
@subsection What do I need to use the Bluetooth plugin transport?

If you are a GNU/Linux user and you want to use the Bluetooth transport plugin you should install the BlueZ development libraries (if they aren't already installed). For instructions about how to install the libraries you should check out the BlueZ site (@uref{http://www.bluez.org/, http://www.bluez.org}).
If you don't know if you have the necessary libraries, don't worry: just run the GNUnet configure script and you will see a notification at the end which warns you if the necessary libraries are missing.

If you are a Windows user you should have installed @emph{MinGW}/@emph{MSys2} with the latest updates (especially the @emph{ws2bth} header). If this is your first build of GNUnet on Windows you should check out the SBuild repository. It will semi-automatically assemble a @emph{MinGW}/@emph{MSys2} installation with a lot of extra packages which are needed for the GNUnet build, so this will ease your work!

Finally, you just have to be sure that you have the correct drivers for your Bluetooth device installed and that your device is on and in discoverable mode. The Windows Bluetooth Stack supports only the RFCOMM protocol so we cannot turn on your device programmatically!

@c FIXME: Change to unique title
@node How does it work2?
@subsection How does it work2?

The Bluetooth transport plugin uses virtually the same code as the WLAN plugin and only the helper binary is different. The helper takes a single argument, which represents the interface name and is specified in the configuration file.
Here are the basic steps that are followed by the helper binary used on GNU/Linux:

@itemize @bullet
@item it verifies if the name corresponds to a Bluetooth interface name
@item it verifies if the interface is up (if it is not, it tries to bring it up)
@item it tries to enable the page and inquiry scan in order to make the device discoverable and to accept incoming connection requests

@emph{The above operations require root access so you should start the transport plugin with root privileges.}
@item it finds an available port number, registers an SDP service which other devices will use to find out on which port number the server is listening, and switches the socket to listening mode
@item it sends a HELLO message with its address
@item finally it forwards traffic from the reading sockets to STDOUT and from STDIN to the writing socket
@end itemize

Once in a while the device will make an inquiry scan to discover the nearby devices and will send them HELLO messages at random for peer discovery.

@node What possible errors should I be aware of?
@subsection What possible errors should I be aware of?

@emph{This section is dedicated to GNU/Linux users.}

There are many ways in which things could go wrong, but I will try to present some tools that you could use for debugging, along with some scenarios.

@itemize @bullet
@item @code{bluetoothd -n -d}: use this command to enable logging in the foreground and to print the logging messages
@item @code{hciconfig}: can be used to configure the Bluetooth devices. If you run it without any arguments it will print information about the state of the interfaces. So if you receive an error that the device couldn't be brought up, you should try to bring it up manually and see if it works (use @code{hciconfig -a hciX up}). If you can't and the Bluetooth address has the form 00:00:00:00:00:00, it means that there is something wrong with the D-Bus daemon or with the Bluetooth daemon.
Use the @code{bluetoothd} tool to see the logs.
@item @code{sdptool}: can be used to control and interrogate SDP servers. If you encounter problems regarding the SDP server (like the SDP server is down) you should check if the D-Bus daemon is running correctly and if the Bluetooth daemon started correctly (use the @code{bluetoothd} tool). Also, sometimes the SDP service could work but somehow the device couldn't register its service. Use @code{sdptool browse [dev-address]} to see if the service is registered. There should be a service with the name of the interface and GNUnet as provider.
@item @code{hcitool}: another useful tool which can be used to configure the device and to send some particular commands to it.
@item @code{hcidump}: can be used for low-level debugging
@end itemize

@c FIXME: A more unique name
@node How do I configure my peer2?
@subsection How do I configure my peer2?

On GNU/Linux, you just have to be sure that the interface name corresponds to the one that you want to use. Use the @code{hciconfig} tool to check that. By default it is set to hci0 but you can change it.

A basic configuration looks like this:

@example
[transport-bluetooth]
# Name of the interface (typically hciX)
INTERFACE = hci0
# Real hardware, no testing
TESTMODE = 0
TESTING_IGNORE_KEYS = ACCEPT_FROM;
@end example

In order to use the Bluetooth transport plugin when the transport service is started, you must add the plugin name to the default transport service plugins list. For example:

@example
[transport]
...
PLUGINS = dns bluetooth
...
@end example

If you want to use only the Bluetooth plugin, set @code{PLUGINS = bluetooth}.

On Windows, you cannot specify which device to use. The only thing that you should do is to add @code{bluetooth} to the plugins list of the transport service.

@node How can I test it?
@subsection How can I test it?
If you have two Bluetooth devices on the same machine and you are using GNU/Linux you must:

@itemize @bullet
@item create two different configuration files (one which will use the first interface (@emph{hci0}) and the other which will use the second interface (@emph{hci1})). Let's name them @file{peer1.conf} and @file{peer2.conf}.
@item run @code{gnunet-peerinfo -c peerX.conf -s} in order to generate the peers' private keys. The @strong{X} must be replaced with 1 or 2.
@item run @code{gnunet-arm -c peerX.conf -s -i=transport} in order to start the transport service. (Make sure that you have "bluetooth" on the transport plugins list if the Bluetooth transport service doesn't start.)
@item run @code{gnunet-peerinfo -c peer1.conf -s} to get the first peer's ID. If you already know your peer ID (you saved it from the first command), this can be skipped.
@item run @code{gnunet-transport -c peer2.conf -p=PEER1_ID -s} to start sending data for benchmarking to the other peer.
@end itemize

This scenario will try to connect the second peer to the first one and then start sending data for benchmarking.

On Windows you cannot test the plugin functionality using two Bluetooth devices from the same machine because, after you install the drivers, conflicts occur between the Bluetooth stacks. (At least that is what happened on my machine: I wasn't able to use the BlueSoleil stack and the WIDCOMM one at the same time.)

If you have two different machines and your configuration files are good, you can use the same scenario presented at the beginning of this section.

Another way to test the plugin functionality is to create your own application which will use the GNUnet framework with the Bluetooth transport service.

@node The implementation of the Bluetooth transport plugin
@subsection The implementation of the Bluetooth transport plugin

This section describes the implementation of the Bluetooth transport plugin.
First I want to remind you that the Bluetooth transport plugin uses virtually the same code as the WLAN plugin and only the helper binary is different. The purpose of the helper binary of the Bluetooth transport plugin is also the same as that of the one used for the WLAN transport plugin: it accesses the interface and then forwards traffic in both directions between the Bluetooth interface and stdin/stdout of the process involved.

The Bluetooth transport plugin can be used on both GNU/Linux and Windows platforms.

@itemize @bullet
@item Linux functionality
@item Windows functionality
@item Pending features
@end itemize

@menu
* Linux functionality::
* THE INITIALIZATION::
* THE LOOP::
* Details about the broadcast implementation::
* Windows functionality::
* Pending features::
@end menu

@node Linux functionality
@subsubsection Linux functionality

In order to implement the plugin functionality on GNU/Linux I used the BlueZ stack. For the communication with the other devices I used the RFCOMM protocol. I also used the HCI protocol to gain some control over the device. The helper binary takes a single argument (the name of the Bluetooth interface) and runs in two stages:

@c %** 'THE INITIALIZATION' should be in bigger letters or stand out, not
@c %** starting a new section?
@node THE INITIALIZATION
@subsubsection THE INITIALIZATION

@itemize @bullet
@item first, it checks if we have root privileges (@emph{Remember that we need to have root privileges in order to be able to bring the interface up if it is down or to change its state.}).
@item second, it verifies if the interface with the given name exists.
@strong{If the interface with that name exists and it is a Bluetooth interface:}

@item it creates an RFCOMM socket which will be used for listening and calls the @code{open_device} method.

The @code{open_device} method:
@itemize @bullet
@item creates an HCI socket used to send control events to the device
@item searches for the device ID using the interface name
@item saves the device MAC address
@item checks if the interface is down and tries to bring it up
@item checks if the interface is in discoverable mode and tries to make it discoverable
@item closes the HCI socket and binds the RFCOMM one
@item switches the RFCOMM socket to listening mode
@item registers the SDP service (the service will be used by the other devices to get the port on which this device is listening)
@end itemize

@item drops the root privileges

@strong{If the interface is not a Bluetooth interface, the helper exits with a suitable error.}
@end itemize

@c %** Same as for @node entry above
@node THE LOOP
@subsubsection THE LOOP

The helper binary uses a list where it saves all the connected neighbour devices (@code{neighbours.devices}) and two buffers (@code{write_pout} and @code{write_std}). The first message which is sent is a control message with the device's MAC address in order to announce the peer's presence to the neighbours. Here is a short description of what happens in the main loop:

@itemize @bullet
@item Every time it receives something from STDIN, it processes the data and saves the message in the first buffer (@code{write_pout}). When it has something in the buffer, it gets the destination address from the buffer, searches for the destination address in the list (if there is no connection with that device, it creates a new one and saves it to the list) and sends the message.
@item Every time it receives something on the listening socket, it accepts the connection and saves the socket in a list with the reading sockets.
@item Every time it receives something from a reading socket, it parses the message, verifies the CRC and saves it in the @code{write_std} buffer in order to be sent later to STDOUT.
@end itemize

So in the main loop we use the @code{select} function to wait until one of the file descriptors saved in one of the two file descriptor sets is ready to use. The first set (@code{rfds}) represents the reading set and it could contain the list with the reading sockets, the STDIN file descriptor or the listening socket. The second set (@code{wfds}) is the writing set and it could contain the sending socket or the STDOUT file descriptor. After the @code{select} function returns, we check which file descriptor is ready to use and we do what is supposed to be done on that kind of event. @emph{For example:} if it is the listening socket then we accept a new connection and save the socket in the reading list; if it is the STDOUT file descriptor, then we write the message from the @code{write_std} buffer to STDOUT.

To find out on which port a device is listening, we connect to the local SDP server and search for the registered service for that device.

@emph{You should be aware of the fact that if the device fails to connect to another one when trying to send a message, it will attempt one more time. If it fails again, then it skips the message.}

@emph{Also you should know that the Bluetooth transport plugin has support for @strong{broadcast messages}.}

@node Details about the broadcast implementation
@subsubsection Details about the broadcast implementation

First I want to point out that the broadcast functionality for the CONTROL messages is not implemented in a conventional way. Since the inquiry scan takes too long and it would take some time to send a message to all the discoverable devices, I decided to tackle the problem in a different way.
Here is how I did it:

@itemize @bullet
@item If it is the first time I have to broadcast a message, I make an inquiry scan and save all the devices' addresses to a vector.
@item After the inquiry scan ends, I take the first address from the list and try to connect to it. If it fails, I try to connect to the next one. If it succeeds, I save the socket to a list and send the message to the device.
@item When I have to broadcast another message, I first search the list for a new device which I'm not connected to. If there is no new device on the list, I go to the beginning of the list and send the message to the old devices. After 5 cycles I make a new inquiry scan to check if there are new discoverable devices and save them to the list. If there are no new discoverable devices, I reset the cycling counter, go through the old list again and send messages to the devices saved in it.
@end itemize

@strong{Therefore}:

@itemize @bullet
@item every time I have a broadcast message, I look in the list for a new device and send the message to it
@item if I have reached the end of the list 5 times and I'm connected to all the devices from the list, I make a new inquiry scan. @emph{The number of list cycles after an inquiry scan can be increased by redefining the MAX_LOOPS variable.}
@item when there are no new devices, I send messages to the old ones.
@end itemize

Doing so, the broadcast control messages will reach the devices, but with a delay.

@emph{NOTICE:} When I have to send a message to a certain device, I first check the broadcast list to see if we are connected to that device. If not, we try to connect to it and in case of success we save the address and the socket in the list. If we are already connected to that device, we simply use the socket.

@node Windows functionality
@subsubsection Windows functionality

For Windows I decided to use the Microsoft Bluetooth stack, which has the advantage of coming standard since Windows XP SP2.
The main disadvantage is that it only supports the RFCOMM protocol, so we are not able to have low-level control over the Bluetooth device. Therefore it is the user's responsibility to check if the device is up and in discoverable mode. Also, there are no tools which could be used for debugging in order to read the data coming from and going to a Bluetooth device, which obviously hindered my work. Another thing that slowed down the implementation of the plugin (besides the fact that I wasn't too familiar with the win32 API) was that there were some bugs in MinGW regarding Bluetooth. They are solved now, but keep in mind that you should have the latest updates (especially the @emph{ws2bth} header).

Besides the fact that it uses Windows Sockets, the Windows implementation follows the same principles as the GNU/Linux one:

@itemize @bullet
@item It has an initialization part where it initializes Windows Sockets, creates an RFCOMM socket which will be bound and switched to listening mode, and registers an SDP service. In the Microsoft Bluetooth API there are two ways to work with SDP:
@itemize @bullet
@item an easy way which works with very simple service records
@item a hard way which is useful when you need to update or to delete the record
@end itemize
@end itemize

Since I only needed the SDP service to find out on which port the device is listening, and that did not change, I decided to use the easy way. In order to register the service I used the @code{WSASetService} function and I generated the @emph{Universally Unique Identifier} with the Windows tool @command{guidgen.exe}.

In the loop section the only difference from the GNU/Linux implementation is that I used the @code{GNUNET_NETWORK} library for functions like @code{accept}, @code{bind}, @code{connect} or @code{select}.
I decided to use the @code{GNUNET_NETWORK} library because I also needed to interact with the STDIN and STDOUT handles, and on Windows the @code{select} function is only defined for sockets and will not work for arbitrary file handles.

Another difference between the GNU/Linux and Windows implementations is that on GNU/Linux the Bluetooth address is represented in 48 bits, while on Windows it is represented in 64 bits. Therefore I had to make some changes to the @file{plugin_transport_wlan} header.

Also, on Windows the Bluetooth plugin currently doesn't have support for broadcast messages. When it receives a broadcast message it will skip it.

@node Pending features
@subsubsection Pending features

@itemize @bullet
@item Implement the broadcast functionality on Windows @emph{(currently being worked on)}
@item Implement a testcase for the helper: @emph{The testcase consists of a program which emulates the plugin and uses the helper. It will simulate connections, disconnections and data transfers.}
@end itemize

If you have a new idea about a feature of the plugin or suggestions about how I could improve the implementation, you are welcome to comment or to contact me.

@node WLAN plugin
@section WLAN plugin

This section documents how the WLAN transport plugin works. Parts which are not implemented yet or could be better implemented are described at the end.

@cindex ATS Subsystem
@node ATS Subsystem
@section ATS Subsystem

ATS stands for "automatic transport selection", and the function of ATS in GNUnet is to decide which address (and thus transport plugin) should be used for two peers to communicate, and what bandwidth limits should be imposed on such an individual connection. To help ATS make an informed decision, higher-level services inform the ATS service about their requirements and the quality of the service rendered. The ATS service also interacts with the transport service to be apprised of working addresses and to communicate its resource allocation decisions.
Finally, the ATS service's operation can be observed using a monitoring API.

The main logic of the ATS service only collects the available addresses, their performance characteristics and the applications' requirements, but does not make the actual allocation decision. This last critical step is left to an ATS plugin, as we have implemented (currently three) different allocation strategies which differ significantly in their performance and maturity, and it is still unclear if any particular plugin is generally superior.

@cindex CORE Subsystem
@node CORE Subsystem
@section CORE Subsystem

The CORE subsystem in GNUnet is responsible for securing link-layer communications between nodes in the GNUnet overlay network. CORE builds on the TRANSPORT subsystem, which provides for the actual, insecure, unreliable link-layer communication (for example, via UDP or WLAN), and then adds fundamental security to the connections:

@itemize @bullet
@item confidentiality with so-called perfect forward secrecy; we use ECDHE (@uref{http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman, Elliptic-curve Diffie-Hellman}) powered by Curve25519 (@uref{http://cr.yp.to/ecdh.html, Curve25519}) for the key exchange and then use symmetric encryption, encrypting with both AES-256 (@uref{http://en.wikipedia.org/wiki/Rijndael, AES-256}) and Twofish (@uref{http://en.wikipedia.org/wiki/Twofish, Twofish})
@item @uref{http://en.wikipedia.org/wiki/Authentication, authentication} is achieved by signing the ephemeral keys using Ed25519 (@uref{http://ed25519.cr.yp.to/, Ed25519}), a deterministic variant of ECDSA (@uref{http://en.wikipedia.org/wiki/ECDSA, ECDSA})
@item integrity protection (using SHA-512 (@uref{http://en.wikipedia.org/wiki/SHA-2, SHA-512}) to do encrypt-then-MAC (@uref{http://en.wikipedia.org/wiki/Authenticated_encryption, encrypt-then-MAC}))
@item Replay (@uref{http://en.wikipedia.org/wiki/Replay_attack, replay}) protection (using nonces, timestamps, challenge-response, message
counters and ephemeral keys)
@item liveness (keep-alive messages, timeout)
@end itemize

@menu
* Limitations::
* When is a peer "connected"?::
* libgnunetcore::
* The CORE Client-Service Protocol::
* The CORE Peer-to-Peer Protocol::
@end menu

@cindex core subsystem limitations
@node Limitations
@subsection Limitations

CORE does not perform @uref{http://en.wikipedia.org/wiki/Routing, routing}; using CORE it is only possible to communicate with peers that happen to already be "directly" connected with each other. CORE also does not have an API to allow applications to establish such "direct" connections; for this, applications can ask TRANSPORT, but TRANSPORT might not be able to establish a "direct" connection. The TOPOLOGY subsystem is responsible for trying to keep a few "direct" connections open at all times. Applications that need to talk to particular peers should use the CADET subsystem, as it can establish arbitrary "indirect" connections.

Because CORE does not perform routing, CORE must only be used directly by applications that either perform their own routing logic (such as anonymous file-sharing) or that do not require routing, for example because they are based on flooding the network. CORE communication is unreliable and delivery is possibly out-of-order. Applications that require reliable communication should use the CADET service. Each application can only queue one message per target peer with the CORE service at any time; messages cannot be larger than approximately 63 kilobytes. If messages are small, CORE may group multiple messages (possibly from different applications) prior to encryption. If permitted by the application (using the @uref{http://baus.net/on-tcp_cork/, cork} option), CORE may delay transmissions to facilitate grouping of multiple small messages. If cork is not enabled, CORE will transmit the message as soon as TRANSPORT allows it (TRANSPORT is responsible for limiting bandwidth and congestion control).
CORE does not allow flow control; applications are expected to process messages at line-speed. If flow control is needed, applications should use the CADET service.

@cindex when is a peer connected
@node When is a peer "connected"?
@subsection When is a peer "connected"?

In addition to the security features mentioned above, CORE also provides one additional key feature to applications using it, and that is a limited form of protocol-compatibility checking. CORE distinguishes between TRANSPORT-level connections (which enable communication with other peers) and application-level connections. Applications using the CORE API will (typically) learn about application-level connections from CORE, and not about TRANSPORT-level connections. When a typical application uses CORE, it will specify a set of message types (from @file{gnunet_protocols.h}) that it understands. CORE will then notify the application about connections it has with other peers if and only if applications at those peers registered an intersecting set of message types with their CORE service. Thus, it is quite possible that CORE only exposes a subset of the established direct connections to a particular application, and different applications running above CORE might see different sets of connections at the same time.

A special case is applications that do not register a handler for any message type. CORE assumes that these applications merely want to monitor connections (or "all" messages via other callbacks) and will notify those applications about all connections. This is used, for example, by the @command{gnunet-core} command-line tool to display the active connections. Note that it is also possible that the TRANSPORT service has more active connections than the CORE service, as the CORE service first has to perform a key exchange with connecting peers before exchanging information about supported message types and notifying applications about the new connection.
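
The visibility rule described in this section can be summarised in a short Python sketch. It is only a model of the rule and not CORE's implementation; the function name, the peer names and the message type numbers are made up for illustration.

```python
def visible_peers(own_types, neighbour_types):
    # own_types: set of message types the local application registered.
    # neighbour_types: maps each peer that completed the key exchange to
    # the set of message types registered by applications at that peer.
    if not own_types:
        # Applications without any handler are treated as monitors and
        # are told about all connections.
        return sorted(neighbour_types)
    # Otherwise a connection is reported only if the type sets intersect.
    return sorted(peer for peer, types in neighbour_types.items()
                  if own_types & types)


# Hypothetical neighbours and message type numbers.
neighbours = dict(alice=set([1, 2]), bob=set([3]), carol=set())
print(visible_peers(set([2, 5]), neighbours))
print(visible_peers(set(), neighbours))
```

In this model an application that registered types 2 and 5 only learns about the connection to the hypothetical peer @code{alice}, while a monitoring application sees all three peers. Peers that are connected at the TRANSPORT level but have not yet finished the key exchange would simply not appear in @code{neighbour_types}.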
@cindex libgnunetcore
@node libgnunetcore
@subsection libgnunetcore

The CORE API (defined in @file{gnunet_core_service.h}) is the basic messaging API used by P2P applications built using GNUnet. It gives applications the ability to send encrypted messages to, and receive them from, the peer's "directly" connected neighbours.

As CORE connections are generally "direct" connections, applications must not assume that they can connect to arbitrary peers this way, as "direct" connections may not always be possible. Applications using CORE are notified about which peers are connected. Creating new "direct" connections must be done using the TRANSPORT API.

The CORE API provides unreliable, out-of-order delivery. While the implementation tries to ensure timely, in-order delivery, both message losses and reordering are not detected and must be tolerated by the application. Most importantly, the core will NOT perform retransmission if messages could not be delivered.

Note that CORE allows applications to queue one message per connected peer. The rate at which each connection operates is influenced by the preferences expressed by local applications as well as restrictions imposed by the other peer. Local applications can express their preferences for particular connections using the "performance" API of the ATS service.

Applications that require more sophisticated transmission capabilities such as TCP-like behavior, or that intend to send messages to arbitrary remote peers, should use the CADET API.

The typical use of the CORE API is to connect to the CORE service using @code{GNUNET_CORE_connect}, process events from the CORE service (such as peers connecting, peers disconnecting and incoming messages) and send messages to connected peers using @code{GNUNET_CORE_notify_transmit_ready}.
Note that applications must cancel pending transmission requests if they receive a disconnect event for a peer that had a transmission pending; furthermore, queuing more than one transmission request per peer per application using the service is not permitted.

The CORE API also allows applications to monitor all communications of the peer prior to encryption (for outgoing messages) or after decryption (for incoming messages). This can be useful for debugging, diagnostics or to establish the presence of cover traffic (for anonymity). As monitoring applications are often not interested in the payload, the monitoring callbacks can be configured to only provide the message headers (including the message type and size) instead of copying the full data stream to the monitoring client.

The init callback of the @code{GNUNET_CORE_connect} function is called with the hash of the public key of the peer. This public key is used to identify the peer globally in the GNUnet network. Applications are encouraged to check that the provided hash matches the hash that they are using (as theoretically the application may be using a different configuration file with a different private key, which would result in hard-to-find bugs).

As with most service APIs, the CORE API isolates applications from crashes of the CORE service. If the CORE service crashes, the application will see disconnect events for all existing connections. Once the connections are re-established, the applications will receive matching connect events.

@cindex core client-service protocol
@node The CORE Client-Service Protocol
@subsection The CORE Client-Service Protocol

This section describes the protocol between an application using the CORE service (the client) and the CORE service process itself.
@menu
* Setup2::
* Notifications::
* Sending::
@end menu

@node Setup2
@subsubsection Setup2

When a client connects to the CORE service, it first sends an @code{InitMessage} which specifies options for the connection and a set of message type values which are supported by the application. The options bitmask specifies which events the client would like to be notified about. The options include:

@table @asis
@item GNUNET_CORE_OPTION_NOTHING
No notifications
@item GNUNET_CORE_OPTION_STATUS_CHANGE
Peers connecting and disconnecting
@item GNUNET_CORE_OPTION_FULL_INBOUND
All inbound messages (after decryption) with full payload
@item GNUNET_CORE_OPTION_HDR_INBOUND
Just the @code{MessageHeader} of all inbound messages
@item GNUNET_CORE_OPTION_FULL_OUTBOUND
All outbound messages (prior to encryption) with full payload
@item GNUNET_CORE_OPTION_HDR_OUTBOUND
Just the @code{MessageHeader} of all outbound messages
@end table

Typical applications will only monitor for connection status changes.

The CORE service responds to the @code{InitMessage} with an @code{InitReplyMessage} which contains the peer's identity. Afterwards, both CORE and the client can send messages.

@node Notifications
@subsubsection Notifications

The CORE service will send @code{ConnectNotifyMessage}s and @code{DisconnectNotifyMessage}s whenever peers connect to or disconnect from the CORE service (assuming their type maps overlap with the message types registered by the client). When the CORE service receives a message that matches the set of message types specified during the @code{InitMessage} (or if monitoring is enabled for inbound messages in the options), it sends a @code{NotifyTrafficMessage} with the peer identity of the sender and the decrypted payload. The same message format (except with @code{GNUNET_MESSAGE_TYPE_CORE_NOTIFY_OUTBOUND} for the message type) is used to notify clients monitoring outbound messages; here, the peer identity given is that of the receiver.
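
The inbound notification rule described above can be sketched in C. This is only an illustration under stated assumptions, not code from the CORE service: the @code{DEMO_} flag values and both helper functions are hypothetical, and the real option constants are defined by the CORE headers and may have different values.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen for this sketch only. */
enum demo_core_option
{
  DEMO_OPTION_NOTHING = 0,
  DEMO_OPTION_STATUS_CHANGE = 1 << 0,
  DEMO_OPTION_FULL_INBOUND = 1 << 1,
  DEMO_OPTION_HDR_INBOUND = 1 << 2,
  DEMO_OPTION_FULL_OUTBOUND = 1 << 3,
  DEMO_OPTION_HDR_OUTBOUND = 1 << 4
};

/* Should an inbound message be passed to this client at all?
   Yes if the client registered the message type, or if it asked
   to monitor inbound traffic (full payload or headers only). */
bool
demo_wants_inbound (uint32_t options, bool type_registered)
{
  if (type_registered)
    return true;
  return 0 != (options
               & (DEMO_OPTION_FULL_INBOUND | DEMO_OPTION_HDR_INBOUND));
}

/* Is the header sufficient?  True for a client that did not
   register the type and only asked for inbound headers. */
bool
demo_header_only (uint32_t options, bool type_registered)
{
  return (! type_registered)
         && (0 == (options & DEMO_OPTION_FULL_INBOUND))
         && (0 != (options & DEMO_OPTION_HDR_INBOUND));
}
```
@end verbatim

A client that only sets the status-change flag and registers no matching type would thus see connect and disconnect events but no traffic notifications.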
@node Sending
@subsubsection Sending

When a client wants to transmit a message, it first requests a transmission slot by sending a @code{SendMessageRequest} which specifies the priority, deadline and size of the message. Note that these values may be ignored by CORE. When CORE is ready for the message, it answers with a @code{SendMessageReady} response. The client can then transmit the payload with a @code{SendMessage} message. Note that the actual message size in the @code{SendMessage} is allowed to be smaller than the size in the original request. A client may at any time send a fresh @code{SendMessageRequest}, which then supersedes the previous @code{SendMessageRequest}, which is then no longer valid. The client can tell which @code{SendMessageRequest} the CORE service's @code{SendMessageReady} message is for, as all of these messages contain a "unique" request ID (based on a counter incremented by the client for each request).

@cindex CORE Peer-to-Peer Protocol
@node The CORE Peer-to-Peer Protocol
@subsection The CORE Peer-to-Peer Protocol

@menu
* Creating the EphemeralKeyMessage::
* Establishing a connection::
* Encryption and Decryption::
* Type maps::
@end menu

@cindex EphemeralKeyMessage creation
@node Creating the EphemeralKeyMessage
@subsubsection Creating the EphemeralKeyMessage

When the CORE service starts, each peer creates a fresh ephemeral (ECC) public-private key pair and signs the corresponding @code{EphemeralKeyMessage} with its long-term key (which we usually call the peer's identity; the hash of the public long-term key is what results in a @code{struct GNUNET_PeerIdentity} in all GNUnet APIs). The ephemeral key is ONLY used for an ECDHE (@uref{http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman, Elliptic-curve Diffie-Hellman}) exchange by the CORE service to establish symmetric session keys. A peer will use the same @code{EphemeralKeyMessage} for all peers for @code{REKEY_FREQUENCY}, which is usually 12 hours.
After that time, it will create a fresh ephemeral key (forgetting the old one) and broadcast the new @code{EphemeralKeyMessage} to all connected peers, resulting in fresh symmetric session keys. Note that peers independently decide on when to discard ephemeral keys; it is not a protocol violation to discard keys more often. Ephemeral keys are also never stored to disk; restarting a peer will thus always create a fresh ephemeral key. The use of ephemeral keys is what provides @uref{http://en.wikipedia.org/wiki/Forward_secrecy, forward secrecy}.

Just before transmission, the @code{EphemeralKeyMessage} is patched to reflect the current @code{sender_status}, which specifies the current state of the connection from the point of view of the sender. The possible values are:

@itemize @bullet
@item @code{KX_STATE_DOWN} Initial value, never used on the network
@item @code{KX_STATE_KEY_SENT} We sent our ephemeral key, do not know the key of the other peer
@item @code{KX_STATE_KEY_RECEIVED} This peer has received a valid ephemeral key of the other peer, but we are waiting for the other peer to confirm its authenticity (ability to decode) via challenge-response.
@item @code{KX_STATE_UP} The connection is fully up from the point of view of the sender (now performing keep-alives)
@item @code{KX_STATE_REKEY_SENT} The sender has initiated a rekeying operation; the other peer has so far failed to confirm a working connection using the new ephemeral key
@end itemize

@node Establishing a connection
@subsubsection Establishing a connection

Peers begin their interaction by sending an @code{EphemeralKeyMessage} to the other peer once the TRANSPORT service notifies the CORE service about the connection. When a peer receives an @code{EphemeralKeyMessage} with a status indicating that the sender does not have the receiver's ephemeral key, it sends its own @code{EphemeralKeyMessage} in response.
Additionally, if the receiver has not yet confirmed the authenticity of the sender, it also sends an (encrypted) @code{PingMessage} with a challenge (and the identity of the target) to the other peer. Peers receiving a @code{PingMessage} respond with an (encrypted) @code{PongMessage} which includes the challenge. Peers receiving a @code{PongMessage} check the challenge, and if it matches, set the connection to @code{KX_STATE_UP}.

@node Encryption and Decryption
@subsubsection Encryption and Decryption

All functions related to the key exchange and encryption/decryption of messages can be found in @file{gnunet-service-core_kx.c} (except for the cryptographic primitives, which are in @file{util/crypto*.c}). Given the key material from ECDHE, a @uref{https://en.wikipedia.org/wiki/Key_derivation_function, key derivation function} is used to derive two pairs of encryption and decryption keys for AES-256 and Twofish, as well as initialization vectors and authentication keys (for @uref{https://en.wikipedia.org/wiki/HMAC, HMAC}). The HMAC is computed over the encrypted payload. Encrypted messages include an @code{iv_seed} and the HMAC in the header.

Each encrypted message in the CORE service includes a sequence number and a timestamp in the encrypted payload. The CORE service remembers the largest observed sequence number and a bit-mask which represents which of the previous 32 sequence numbers were already used. Messages with sequence numbers lower than the largest observed sequence number minus 32 are discarded. Messages with a timestamp that is more than @code{REKEY_TOLERANCE} (5 minutes) off are also discarded. This of course means that system clocks need to be reasonably synchronized for peers to be able to communicate. Additionally, as the ephemeral key changes every 12 hours, a peer would not even be able to decrypt messages older than 12 hours.
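
The sequence-number check described above follows the usual sliding-window pattern. The following C sketch shows one way to implement such a window; it is not taken from @file{gnunet-service-core_kx.c}. The structure and function names are hypothetical, sequence numbers are assumed to start at 1, and wrap-around of the 32-bit counter is ignored.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_replay_window
{
  uint32_t last_seq;  /* largest sequence number accepted so far */
  uint32_t seen_mask; /* bit i set: (last_seq - 1 - i) was seen */
};

/* Returns true (and records the number) if 'seq' is fresh;
   returns false for duplicates and for numbers that are more
   than 32 below the largest one observed. */
bool
demo_replay_check (struct demo_replay_window *w, uint32_t seq)
{
  assert (NULL != w);
  if (seq == w->last_seq)
    return false; /* duplicate of the newest message */
  if (seq > w->last_seq)
  {
    uint32_t shift = seq - w->last_seq;

    /* Slide the window; the old maximum ends up at bit (shift - 1).
       Shifts of 32 or more are handled separately because shifting
       a 32-bit value by 32 or more is undefined in C. */
    if (shift > 32)
      w->seen_mask = 0;
    else if (32 == shift)
      w->seen_mask = UINT32_C (1) << 31;
    else
      w->seen_mask = (w->seen_mask << shift)
                     | (UINT32_C (1) << (shift - 1));
    w->last_seq = seq;
    return true;
  }
  {
    uint32_t age = w->last_seq - seq; /* at least 1 here */
    uint32_t bit;

    if (age > 32)
      return false; /* outside the window: too old */
    bit = UINT32_C (1) << (age - 1);
    if (0 != (w->seen_mask & bit))
      return false; /* replay */
    w->seen_mask |= bit;
    return true;
  }
}
```
@end verbatim

Starting from a zeroed window, accepting 1 and then 5 leaves number 3 acceptable exactly once, while a second copy of 1 is rejected as a replay.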
@node Type maps
@subsubsection Type maps

Once an encrypted connection has been established, peers begin to exchange type maps. Type maps are used to allow the CORE service to determine which (encrypted) connections should be shown to which applications. A type map is an array of 65536 bits representing the different types of messages understood by applications using the CORE service. Each CORE service maintains this map, simply by setting the respective bit for each message type supported by any of the applications using the CORE service. Note that bits for message types embedded in higher-level protocols (such as CADET) will not be included in these type maps.

Typically, the type map of a peer will be sparse. Thus, the CORE service attempts to compress its type map using @code{gzip}-style compression ("deflate") prior to transmission, resulting in a @code{GNUNET_MESSAGE_TYPE_CORE_COMPRESSED_TYPE_MAP} message. However, if the compression fails to compact the map, the map is transmitted without compression in a @code{GNUNET_MESSAGE_TYPE_CORE_BINARY_TYPE_MAP} message. Upon receiving a type map, the respective CORE service notifies applications about the connection to the other peer if they support any message type indicated in the type map (or no message type at all). If the CORE service experiences a connect or disconnect event from an application, it updates its type map (setting or unsetting the respective bits) and notifies its neighbours about the change. The CORE services of the neighbours then in turn generate connect and disconnect events for the peer that sent the type map for their respective applications. As CORE messages may be lost, the CORE service confirms receiving a type map by sending back a @code{GNUNET_MESSAGE_TYPE_CORE_CONFIRM_TYPE_MAP}. If such a confirmation (with the correct hash of the type map) is not received, the sender will retransmit the type map (with exponential back-off).
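
To make the data structure concrete, here is a small C sketch of a 65536-bit type map and of the decision whether an application should learn about a connection. All names are hypothetical and the layout is an assumption made for this sketch; compression, hashing and confirmation are left out.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_TYPE_COUNT 65536
#define DEMO_WORDS (DEMO_TYPE_COUNT / 32)

/* One bit per 16-bit message type: 8 KiB in total. */
struct demo_type_map
{
  uint32_t bits[DEMO_WORDS];
};

void
demo_type_map_clear (struct demo_type_map *m)
{
  assert (NULL != m);
  memset (m, 0, sizeof (*m));
}

void
demo_type_map_set (struct demo_type_map *m, uint16_t type)
{
  m->bits[type / 32] |= UINT32_C (1) << (type % 32);
}

bool
demo_type_map_test (const struct demo_type_map *m, uint16_t type)
{
  return 0 != (m->bits[type / 32] & (UINT32_C (1) << (type % 32)));
}

/* An application sees the connection if it registered no types at
   all (pure monitoring) or if at least one of its types is set in
   the remote peer's type map. */
bool
demo_should_notify (const struct demo_type_map *remote,
                    const uint16_t *app_types,
                    size_t app_type_count)
{
  size_t i;

  if (0 == app_type_count)
    return true;
  for (i = 0; i < app_type_count; i++)
    if (demo_type_map_test (remote, app_types[i]))
      return true;
  return false;
}
```
@end verbatim

With this representation, two applications on the same peer can get different answers for the same remote peer, which matches the behaviour described above.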
@cindex CADET Subsystem
@cindex CADET
@cindex cadet
@node CADET Subsystem
@section CADET Subsystem

The CADET subsystem in GNUnet is responsible for secure end-to-end communications between nodes in the GNUnet overlay network. CADET builds on the CORE subsystem, which provides for the link-layer communication, and then adds routing, forwarding and additional security to the connections. CADET offers the same cryptographic services as CORE, but on an end-to-end level. This is done so peers retransmitting traffic on behalf of other peers cannot access the payload data.

@itemize @bullet
@item CADET provides confidentiality with so-called perfect forward secrecy; we use ECDHE powered by Curve25519 for the key exchange and then use symmetric encryption, encrypting with both AES-256 and Twofish
@item authentication is achieved by signing the ephemeral keys using Ed25519, a deterministic elliptic-curve signature scheme (EdDSA)
@item integrity protection (using SHA-512 to do encrypt-then-MAC, although only 256 bits are sent to reduce overhead)
@item replay protection (using nonces, timestamps, challenge-response, message counters and ephemeral keys)
@item liveness (keep-alive messages, timeout)
@end itemize

In addition to the CORE-like security benefits, CADET offers other properties that make it a more universal service than CORE.

@itemize @bullet
@item CADET can establish channels to arbitrary peers in GNUnet. If a peer is not immediately reachable, CADET will find a path through the network and ask other peers to retransmit the traffic on its behalf.
@item CADET offers (optional) reliability mechanisms. In a reliable channel traffic is guaranteed to arrive complete, unchanged and in-order.
@item CADET takes care of flow and congestion control mechanisms, not allowing the sender to send more traffic than the receiver or the network are able to process.
@end itemize

@menu
* libgnunetcadet::
@end menu

@cindex libgnunetcadet
@node libgnunetcadet
@subsection libgnunetcadet

The CADET API (defined in @file{gnunet_cadet_service.h}) is the messaging API used by P2P applications built using GNUnet. It provides applications the ability to send and receive encrypted messages to any peer participating in GNUnet. The API is heavily based on the CORE API.

CADET delivers messages to other peers in "channels". A channel is a permanent connection defined by a destination peer (identified by its public key) and a port number. Internally, CADET tunnels all channels towards a destination peer using one session key and relays the data on multiple "connections", independent from the channels.

Each channel has optional parameters, the most important being the reliability flag. If a channel is created as reliable and a message gets lost at the TRANSPORT/CORE level, CADET will retransmit the lost message and deliver it in order to the destination application.

@pindex GNUNET_CADET_connect
To communicate with other peers using CADET, it is necessary to first connect to the service using @code{GNUNET_CADET_connect}. This function takes several parameters in the form of callbacks, to allow the client to react to various events, like incoming channels or channels that terminate, as well as a list of ports the client wishes to listen to (at the moment it is not possible to start listening on further ports once connected, but nothing prevents a client from connecting several times to CADET, even using one connection per listening port). The function returns a handle which has to be used for any further interaction with the service.

@pindex GNUNET_CADET_channel_create
To connect to a remote peer, a client has to call the @code{GNUNET_CADET_channel_create} function.
The most important parameters given are the remote peer's identity (its public key) and a port, which specifies which application on the remote peer to connect to, similar to TCP/UDP ports. CADET will then find the peer in the GNUnet network, establish the proper low-level connections and do the necessary key exchanges to assure an authenticated, secure and verified communication. Similar to @code{GNUNET_CADET_connect}, @code{GNUNET_CADET_channel_create} returns a handle to interact with the created channel.

@pindex GNUNET_CADET_notify_transmit_ready
For every message the client wants to send to the remote application, @code{GNUNET_CADET_notify_transmit_ready} must be called, indicating the channel on which the message should be sent and the size of the message (but not the message itself!). Once CADET is ready to send the message, the provided callback will fire, and the message contents are provided to this callback.

Please note that CADET does not provide an explicit notification of when a channel is connected. In loosely connected networks, like big wireless mesh networks, this can take several seconds, even minutes in the worst case. To be alerted when a channel is online, a client can call @code{GNUNET_CADET_notify_transmit_ready} immediately after @code{GNUNET_CADET_channel_create}. When the callback is activated, it means that the channel is online. The callback may give 0 bytes to CADET if no message is to be sent.

@pindex GNUNET_CADET_notify_transmit_ready_cancel
If a transmission was requested but is no longer needed before the callback fires, it can be canceled with @code{GNUNET_CADET_notify_transmit_ready_cancel}, which uses the handle given back by @code{GNUNET_CADET_notify_transmit_ready}. As in the case of CORE, only one message can be requested at a time: a client must not call @code{GNUNET_CADET_notify_transmit_ready} again until the callback is called or the request is canceled.
@pindex GNUNET_CADET_channel_destroy When a channel is no longer needed, a client can call @code{GNUNET_CADET_channel_destroy} to get rid of it. Note that CADET will try to transmit all pending traffic before notifying the remote peer of the destruction of the channel, including retransmitting lost messages if the channel was reliable. Incoming channels, channels being closed by the remote peer, and traffic on any incoming or outgoing channels are given to the client when CADET executes the callbacks given to it at the time of @code{GNUNET_CADET_connect}. @pindex GNUNET_CADET_disconnect Finally, when an application no longer wants to use CADET, it should call @code{GNUNET_CADET_disconnect}, but first all channels and pending transmissions must be closed (otherwise CADET will complain). @cindex NSE Subsystem @node NSE Subsystem @section NSE Subsystem NSE stands for @dfn{Network Size Estimation}. The NSE subsystem provides other subsystems and users with a rough estimate of the number of peers currently participating in the GNUnet overlay. The computed value is not a precise number as producing a precise number in a decentralized, efficient and secure way is impossible. While NSE's estimate is inherently imprecise, NSE also gives the expected range. For a peer that has been running in a stable network for a while, the real network size will typically (99.7% of the time) be in the range of [2/3 estimate, 3/2 estimate]. We will now give an overview of the algorithm used to calculate the estimate; all of the details can be found in this technical report. @c FIXME: link to the report. @menu * Motivation:: * Principle:: * libgnunetnse:: * The NSE Client-Service Protocol:: * The NSE Peer-to-Peer Protocol:: @end menu @node Motivation @subsection Motivation Some subsystems, like DHT, need to know the size of the GNUnet network to optimize some parameters of their own protocol. 
The decentralized nature of GNUnet makes efficiently and securely counting the exact number of peers infeasible. Although there are several decentralized algorithms to count the number of peers in a system, so far there is none to do so securely. Other protocols may allow any malicious peer to manipulate the final result or to take advantage of the system to perform @dfn{Denial of Service} (DoS) attacks against the network. GNUnet's NSE protocol avoids these drawbacks.

@menu
* Security::
@end menu

@cindex NSE security
@cindex nse security
@node Security
@subsubsection Security

The NSE subsystem is designed to be resilient against these attacks. It uses @uref{http://en.wikipedia.org/wiki/Proof-of-work_system, proofs of work} to prevent one peer from impersonating a large number of participants, which would otherwise allow an adversary to artificially inflate the estimate. The DoS protection comes from the time-based nature of the protocol: the estimates are calculated periodically and out-of-time traffic is either ignored or stored for later retransmission by benign peers. In particular, peers cannot trigger global network communication at will.

@cindex NSE principle
@cindex nse principle
@node Principle
@subsection Principle

The algorithm calculates the estimate by finding the globally closest peer ID to a random, time-based value.

The idea is that the closer the ID is to the random value, the more "densely packed" the ID space is, and therefore, the more peers are in the network.

@menu
* Example::
* Algorithm::
* Target value::
* Timing::
* Controlled Flooding::
* Calculating the estimate::
@end menu

@node Example
@subsubsection Example

Suppose all peers have IDs between 0 and 100 (our ID space), and the random value is 42. If the closest peer has the ID 70, we can imagine that the average "distance" between peers is around 30 and therefore there are around 3 peers in the whole ID space.
On the other hand, if the closest peer has the ID 44, we can imagine that the space is rather packed with peers, maybe as many as 50 of them. Naturally, we could have been rather unlucky, and there is only one peer which happens to have the ID 44. Thus, the current estimate is calculated as the average over multiple rounds, and not just a single sample.

@node Algorithm
@subsubsection Algorithm

Given that example, one can imagine that the job of the subsystem is to efficiently communicate the ID of the closest peer to the target value to all the other peers, who will calculate the estimate from it.

@node Target value
@subsubsection Target value

The target value itself is generated by hashing the current time, rounded down to an agreed value. If the rounding amount is 1h (default) and the time is 12:34:56, the time to hash would be 12:00:00. The process is repeated once per rounding amount (in this example, every hour). Every repetition is called a round.

@node Timing
@subsubsection Timing

The NSE subsystem has some timing control to avoid everybody broadcasting its ID all at once. Once each peer has the target random value, it compares its own ID to the target and calculates the hypothetical size of the network if that peer were to be the closest. Then it compares the hypothetical size with the estimate from the previous rounds. For each value there is an associated point in the period, let's call it "broadcast time". If its own hypothetical estimate is the same as the previous global estimate, its "broadcast time" will be in the middle of the round. If it is bigger it will be earlier, and if it is smaller (the most likely case) it will be later. This ensures that the peers closest to the target value start broadcasting their ID first.
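
The rounding of the target time and the ordering of broadcast times can be illustrated with a short C sketch. The rounding function follows the description above directly. The mapping from estimates to a broadcast offset, however, is a purely hypothetical linear function chosen for this example; this section does not specify the formula the service uses, only that equal estimates land in the middle of the round, larger ones earlier and smaller ones later. All names and the @code{max_diff} parameter are assumptions.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Round a timestamp (in seconds) down to the start of its round. */
uint64_t
demo_round_start (uint64_t now_s, uint64_t round_len_s)
{
  assert (round_len_s > 0);
  return now_s - (now_s % round_len_s);
}

/* Hypothetical linear mapping from the difference between a peer's
   own (logarithmic) estimate and the previous estimate to an offset
   within the round.  Equal estimates give the middle of the round;
   a larger own estimate gives an earlier offset, a smaller one a
   later offset.  Differences are clamped to +/- max_diff. */
uint64_t
demo_broadcast_offset (double own_log_estimate,
                       double prev_log_estimate,
                       uint64_t round_len_s,
                       double max_diff)
{
  double diff = own_log_estimate - prev_log_estimate;
  double half = (double) round_len_s / 2.0;

  assert (max_diff > 0.0);
  if (diff > max_diff)
    diff = max_diff;
  if (diff < -max_diff)
    diff = -max_diff;
  return (uint64_t) (half * (1.0 - diff / max_diff));
}
```
@end verbatim

For the time 12:34:56 (45296 seconds after midnight) and a round length of 3600 seconds, @code{demo_round_start} yields 43200, that is 12:00:00.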
@node Controlled Flooding
@subsubsection Controlled Flooding

When a peer receives a value, first it verifies that it is closer than the closest value it had so far; otherwise it answers the incoming message with a message containing the better value. Then it checks a proof of work that must be included in the incoming message, to ensure that the other peer's ID is not made up (otherwise a malicious peer could claim to have an ID of exactly the target value every round). Once validated, it compares the broadcast time of the received value with the current time and, if it's not too early, sends the received value to its neighbors. Otherwise it stores the value until the correct broadcast time comes. This prevents unnecessary traffic of sub-optimal values, since a better value can come before the broadcast time, rendering the previous one obsolete and saving the traffic that would have been used to broadcast it to the neighbors.

@node Calculating the estimate
@subsubsection Calculating the estimate

Once the closest ID has been spread across the network, each peer gets the exact distance between this ID and the target value of the round and calculates the estimate with a mathematical formula described in the tech report. The estimate generated with this method for a single round is not very precise. Remember the case of the example, where the only peer has the ID 44 and we happen to generate the target value 42, thinking there are 50 peers in the network. Therefore, the NSE subsystem remembers the last 64 estimates and calculates an average over them, giving a result which usually has one bit of uncertainty (the real size could be half of the estimate or twice as much). Note that the actual network size is calculated in powers of two of the raw input, thus one bit of uncertainty means a factor of two in the size estimate.
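
Keeping the last 64 per-round estimates and averaging them can be sketched with a small ring buffer. This is a minimal illustration with hypothetical names. It assumes a plain arithmetic mean over the stored logarithmic values, whereas the service may weight rounds differently.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HISTORY 64

struct demo_history
{
  double values[DEMO_HISTORY]; /* logarithmic per-round estimates */
  size_t next;                 /* slot to overwrite next */
  size_t count;                /* number of valid entries */
};

/* Store a new per-round estimate, dropping the oldest one once
   64 values have been collected. */
void
demo_history_add (struct demo_history *h, double log2_estimate)
{
  h->values[h->next] = log2_estimate;
  h->next = (h->next + 1) % DEMO_HISTORY;
  if (h->count < DEMO_HISTORY)
    h->count++;
}

/* Arithmetic mean of the stored estimates. */
double
demo_history_mean (const struct demo_history *h)
{
  double sum = 0.0;
  size_t i;

  assert (h->count > 0);
  for (i = 0; i < h->count; i++)
    sum += h->values[i];
  return sum / (double) h->count;
}
```
@end verbatim

Because the stored values are logarithmic, an averaged value of 10 corresponds to a size estimate of 2^10 = 1024 peers.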
@cindex libgnunetnse
@node libgnunetnse
@subsection libgnunetnse

The NSE subsystem has the simplest API of all services, with only two calls: @code{GNUNET_NSE_connect} and @code{GNUNET_NSE_disconnect}.

The connect call takes a callback function as a parameter, and this function is called each time the network agrees on an estimate. This usually happens once per round, with some exceptions: if the closest peer has a late local clock and starts spreading its ID after everyone else agreed on a value, the callback might be activated twice in a round, the second value always being bigger than the first. The default round time is set to 1 hour.

The disconnect call disconnects from the NSE subsystem and the callback is no longer called with new estimates.

@menu
* Results::
* libgnunetnse - Examples::
@end menu

@node Results
@subsubsection Results

The callback provides two values: the average and the @uref{http://en.wikipedia.org/wiki/Standard_deviation, standard deviation} of the last 64 rounds. The values provided by the callback function are logarithmic; this means that the real estimate numbers can be obtained by calculating 2 to the power of the given value (2^average). From a statistics point of view this means that:

@itemize @bullet
@item 68% of the time the real size is included in the interval [2^(average-stddev), 2^(average+stddev)]
@item 95% of the time the real size is included in the interval [2^(average-2*stddev), 2^(average+2*stddev)]
@item 99.7% of the time the real size is included in the interval [2^(average-3*stddev), 2^(average+3*stddev)]
@end itemize

The expected standard deviation for 64 rounds in a network of stable size is 0.2.
Thus, we can say that normally:

@itemize @bullet
@item 68% of the time the real size is in the range [-13%, +15%]
@item 95% of the time the real size is in the range [-24%, +32%]
@item 99.7% of the time the real size is in the range [-34%, +52%]
@end itemize

As said in the introduction, we can be quite sure that usually the real size is between two thirds and one and a half times the estimate. This can of course vary with network conditions. Thus, applications may want to also consider the provided standard deviation value, not only the average (in particular, if the standard deviation is very high, the average may be meaningless: the network size is changing rapidly).

@node libgnunetnse - Examples
@subsubsection libgnunetnse - Examples

Let's close with a couple of examples.

@table @asis
@item Average: 10, std dev: 1
Here the estimate would be 2^10 = 1024 peers. (The range in which we can be 95% sure is: [2^8, 2^12] = [256, 4096]. We can be very (>99.7%) sure that the network is not a hundred peers and absolutely sure that it is not a million peers, but somewhere around a thousand.)
@item Average: 22, std dev: 0.2
Here the estimate would be 2^22 = 4 Million peers. (The range in which we can be 99.7% sure is: [2^21.4, 2^22.6] = [2.8M, 6.3M]. We can be sure that the network size is around four million, with absolutely no way of it being 1 million.)
@end table

To put this in perspective, some readers may remember that the LHC Higgs boson results were announced with "5 sigma" and "6 sigma" certainties. In this case a 5 sigma minimum would be 2 million and a 6 sigma minimum, 1.8 million.

@node The NSE Client-Service Protocol
@subsection The NSE Client-Service Protocol

As with the API, the client-service protocol is very simple; it only has 2 different messages, defined in @code{src/nse/nse.h}:

@itemize @bullet
@item @code{GNUNET_MESSAGE_TYPE_NSE_START}@ This message has no parameters and is sent from the client to the service upon connection.
@item @code{GNUNET_MESSAGE_TYPE_NSE_ESTIMATE}@ This message is sent from the service to the client for every new estimate and upon connection. It contains a timestamp for the estimate, the average and the standard deviation for the respective round.
@end itemize

When the @code{GNUNET_NSE_disconnect} API call is executed, the client simply disconnects from the service, with no message involved.

@cindex NSE Peer-to-Peer Protocol
@node The NSE Peer-to-Peer Protocol
@subsection The NSE Peer-to-Peer Protocol

@pindex GNUNET_MESSAGE_TYPE_NSE_P2P_FLOOD
The NSE subsystem only has one message in the P2P protocol, the @code{GNUNET_MESSAGE_TYPE_NSE_P2P_FLOOD} message.

The key contents of this message are the timestamp to identify the round (differences in system clocks may cause some peers to send messages way too early or way too late, so the timestamp allows other peers to identify such messages easily), the @uref{http://en.wikipedia.org/wiki/Proof-of-work_system, proof of work} used to make it difficult to mount a @uref{http://en.wikipedia.org/wiki/Sybil_attack, Sybil attack}, and the public key, which is used to verify the signature on the message.

Every peer stores a message for the previous, current and next round. The messages for the previous and current round are given to peers that connect to us. The message for the next round is simply stored until our system clock advances to the next round. The message for the current round is what we are flooding the network with right now.
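
The three storage slots can be pictured as a small classification step on the round timestamp of an incoming message. The C sketch below uses hypothetical names and assumes that rounds are identified by their start time in seconds; messages for any other round are treated as belonging to no slot.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

enum demo_slot
{
  DEMO_SLOT_NONE = -1,
  DEMO_SLOT_PREVIOUS = 0,
  DEMO_SLOT_CURRENT = 1,
  DEMO_SLOT_NEXT = 2
};

/* Pick the slot a flood message belongs to, based on the start
   time of its round relative to the local current round. */
enum demo_slot
demo_slot_for (uint64_t msg_round_start,
               uint64_t current_round_start,
               uint64_t round_len)
{
  assert (round_len > 0);
  if (msg_round_start == current_round_start)
    return DEMO_SLOT_CURRENT;
  if (msg_round_start + round_len == current_round_start)
    return DEMO_SLOT_PREVIOUS;
  if (current_round_start + round_len == msg_round_start)
    return DEMO_SLOT_NEXT;
  return DEMO_SLOT_NONE;
}
```
@end verbatim

When the local clock advances to the next round, the contents of the slots shift by one position: the "next" message becomes the current one and the current one becomes the previous one.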
At the beginning of each round the peer does the following:

@itemize @bullet
@item calculates its own distance to the target value
@item creates, signs and stores the message for the current round (unless it has a better message in the "next round" slot which came early in the previous round)
@item calculates, based on the stored round message (own or received), when to start flooding it to its neighbors
@end itemize

Upon receiving a message the peer checks the validity of the message (round, proof of work, signature). The next action depends on the contents of the incoming message:

@itemize @bullet
@item if the message is worse than the current stored message, the peer sends the current message back immediately, to stop the other peer from spreading suboptimal results
@item if the message is better than the current stored message, the peer stores the new message and calculates the new target time to start spreading it to its neighbors (excluding the one the message came from)
@item if the message is for the previous round, it is compared to the message stored in the "previous round slot", which may then be updated
@item if the message is for the next round, it is compared to the message stored in the "next round slot", which again may then be updated
@end itemize

Finally, when the time comes to send the stored message for the current round to the neighbors, a random delay is added for each neighbor, to avoid traffic spikes and minimize cross-messages.

@cindex HOSTLIST Subsystem
@node HOSTLIST Subsystem
@section HOSTLIST Subsystem

Peers in the GNUnet overlay network need address information so that they can connect with other peers. GNUnet uses so-called HELLO messages to store and exchange peer addresses.
GNUnet provides several methods for peers to obtain this information:

@itemize @bullet
@item out-of-band exchange of HELLO messages (manually, using for example gnunet-peerinfo)
@item HELLO messages shipped with GNUnet (automatic with distribution)
@item UDP neighbor discovery in LAN (IPv4 broadcast, IPv6 multicast)
@item topology gossiping (learning from other peers we already connected to), and
@item the HOSTLIST daemon covered in this section, which is particularly relevant for bootstrapping new peers.
@end itemize

New peers have no existing connections (and thus cannot learn from gossip among peers), may not have other peers in their LAN and might be started with an outdated set of HELLO messages from the distribution. In this case, getting new peers to connect to the network requires either manual effort or the use of a HOSTLIST to obtain HELLOs.

@menu
* HELLOs::
* Overview for the HOSTLIST subsystem::
* Interacting with the HOSTLIST daemon::
* Hostlist security address validation::
* The HOSTLIST daemon::
* The HOSTLIST server::
* The HOSTLIST client::
* Usage::
@end menu

@node HELLOs
@subsection HELLOs

The basic information peers require to connect to other peers is contained in so-called HELLO messages, which you can think of as business cards. Besides the identity of the peer (based on the cryptographic public key), a HELLO message may contain address information that specifies ways to contact a peer. By obtaining HELLO messages, a peer can learn how to contact other peers.

@node Overview for the HOSTLIST subsystem
@subsection Overview for the HOSTLIST subsystem

The HOSTLIST subsystem provides a way to distribute and obtain contact information to connect to other peers using a simple HTTP GET request.
Its implementation is split into three parts: the main file for the daemon itself (@file{gnunet-daemon-hostlist.c}), the HTTP client used to download peer information (@file{hostlist-client.c}) and the server component used to provide this information to other peers (@file{hostlist-server.c}). The server is basically a small HTTP web server (based on GNU libmicrohttpd) which provides a list of HELLOs known to the local peer for download. The client component is basically an HTTP client (based on libcurl) which can download hostlists from one or more websites. The hostlist format is a binary blob containing a sequence of HELLO messages. Note that any HTTP server can theoretically serve a hostlist; the built-in hostlist server simply makes it convenient to offer this service.

@menu
* Features::
* HOSTLIST - Limitations::
@end menu

@node Features
@subsubsection Features

The HOSTLIST daemon can:

@itemize @bullet
@item provide HELLO messages with validated addresses obtained from PEERINFO to download for other peers
@item download HELLO messages and forward these messages to the TRANSPORT subsystem for validation
@item advertise the URL of this peer's hostlist address to other peers via gossip
@item automatically learn about hostlist servers from the gossip of other peers
@end itemize

@node HOSTLIST - Limitations
@subsubsection HOSTLIST - Limitations

The HOSTLIST daemon does not:

@itemize @bullet
@item verify the cryptographic information in the HELLO messages
@item verify the address information in the HELLO messages
@end itemize

@node Interacting with the HOSTLIST daemon
@subsection Interacting with the HOSTLIST daemon

The HOSTLIST subsystem is currently implemented as a daemon, so there is no need for the user to interact with it and therefore there is no command line tool and no API to communicate with the daemon. In the future, we can envision changing this to allow users to manually trigger the download of a hostlist.
Since there is no command line interface to interact with HOSTLIST, the only way to interact with the hostlist is to use STATISTICS to obtain or modify information about the status of HOSTLIST:

@example
$ gnunet-statistics -s hostlist
@end example

@noindent
In particular, HOSTLIST includes a @strong{persistent} value in statistics that specifies when the hostlist server might be queried next. As this value is exponentially increasing during runtime, developers may want to reset or manually adjust it. Note that HOSTLIST (but not STATISTICS) needs to be shut down if changes to this value are to have any effect on the daemon (as HOSTLIST does not monitor STATISTICS for changes to the download frequency).

@node Hostlist security address validation
@subsection Hostlist security address validation

Since information obtained from other parties cannot be trusted without validation, we have to distinguish between @emph{validated} and @emph{not validated} addresses. Before using (and so trusting) information from other parties, this information has to be double-checked (validated). Address validation is not done by HOSTLIST but by the TRANSPORT service.

The HOSTLIST component is functionally located between the PEERINFO and the TRANSPORT subsystem. When acting as a server, the daemon obtains valid (@emph{validated}) peer information (HELLO messages) from the PEERINFO service and provides it to other peers. When acting as a client, it contacts the HOSTLIST servers specified in the configuration, downloads the (unvalidated) list of HELLO messages and forwards this information to the TRANSPORT service to validate the addresses.

@cindex HOSTLIST daemon
@node The HOSTLIST daemon
@subsection The HOSTLIST daemon

The hostlist daemon is the main component of the HOSTLIST subsystem. It is started by the ARM service and (if configured) starts the HOSTLIST client and server components.
@pindex GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT
If the daemon provides a hostlist itself, it can advertise its own hostlist to other peers. To do so it sends a @code{GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT} message to other peers when they connect to this peer on the CORE level. This hostlist advertisement message contains the URL to access the HOSTLIST HTTP server of the sender. The daemon may also subscribe to this type of message from the CORE service, and then forward this kind of message to the HOSTLIST client. The client then uses all available URLs to download peer information when necessary.

When starting, the HOSTLIST daemon first connects to the CORE subsystem and, if hostlist learning is enabled, registers a CORE handler to receive this kind of message. Next it starts (if configured) the client and server. It passes pointers to CORE connect and disconnect and receive handlers where the client and server store their functions, so the daemon can notify them about CORE events.

To clean up on shutdown, the daemon has a cleaning task, shutting down all subsystems and disconnecting from CORE.

@cindex HOSTLIST server
@node The HOSTLIST server
@subsection The HOSTLIST server

The server provides a way for other peers to obtain HELLOs. Basically it is a small web server other peers can connect to and download a list of HELLOs using standard HTTP; it may also advertise the URL of the hostlist to other peers connecting on CORE level.

@menu
* The HTTP Server::
* Advertising the URL::
@end menu

@node The HTTP Server
@subsubsection The HTTP Server

During startup, the server starts a web server listening on the port specified with the HTTPPORT value (default 8080). In addition it connects to the PEERINFO service to obtain peer information.
The HOSTLIST server uses the @code{GNUNET_PEERINFO_iterate} function to request HELLO information for all peers and adds their information to a new hostlist if they are suitable (expired addresses and HELLOs without addresses are both not suitable) and the maximum size for a hostlist is not exceeded (@code{MAX_BYTES_PER_HOSTLISTS} = 500000). When PEERINFO finishes (with a last NULL callback), the server destroys the previous hostlist response available for download on the web server and replaces it with the updated hostlist. The hostlist format is basically a sequence of HELLO messages (as obtained from PEERINFO) without any special tokenization. Since each HELLO message contains a size field, the response can easily be split into separate HELLO messages by the client.

A HOSTLIST client connecting to the HOSTLIST server will receive the hostlist as an HTTP response and the server will terminate the connection with the result code @code{HTTP 200 OK}. The connection will be closed immediately if no hostlist is available.

@node Advertising the URL
@subsubsection Advertising the URL

The server also advertises the URL to download the hostlist to other peers if hostlist advertisement is enabled. When a new peer connects and has hostlist learning enabled, the server sends a @code{GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT} message to this peer using the CORE service.

@cindex HOSTLIST client
@node The HOSTLIST client
@subsection The HOSTLIST client

The client provides the functionality to download the list of HELLOs from a set of URLs. It performs a standard HTTP request to the URLs configured and learned from advertisement messages received from other peers. When a HELLO is downloaded, the HOSTLIST client forwards the HELLO to the TRANSPORT service for validation.
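
Because a hostlist is just HELLO messages back to back, a client can split a downloaded buffer using only the size field. The following is a minimal sketch of that idea and not the code in @file{hostlist-client.c}. It assumes the generic GNUnet message header layout (a 16-bit size followed by a 16-bit type, both in network byte order, with the size covering the header itself). The names @code{split_hostlist}, @code{message_cb} and @code{count_message} are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout: 16-bit size, then 16-bit type, both big-endian.
   The size field counts the header bytes as well. */
#define HEADER_SIZE 4

/* Hypothetical callback type, invoked once per complete message. */
typedef void (*message_cb) (void *cls, const uint8_t *msg, uint16_t size);

/* Walk a downloaded buffer and hand every complete message to `cb`.
   Returns the number of bytes consumed.  A truncated or malformed tail
   is left unconsumed so the caller can wait for more data or give up. */
size_t
split_hostlist (const uint8_t *buf, size_t len, message_cb cb, void *cls)
{
  size_t off = 0;

  assert (NULL != cb);
  while (len - off >= HEADER_SIZE)
  {
    /* Decode the big-endian size field without assuming alignment. */
    uint16_t size = (uint16_t) ((buf[off] << 8) | buf[off + 1]);

    if (size < HEADER_SIZE)
      break;                    /* malformed: size must cover the header */
    if (size > len - off)
      break;                    /* incomplete: rest has not arrived yet */
    cb (cls, &buf[off], size);
    off += size;
  }
  return off;
}

/* Example callback: counts messages; `cls` points to an unsigned int. */
void
count_message (void *cls, const uint8_t *msg, uint16_t size)
{
  (void) msg;
  (void) size;
  (*(unsigned int *) cls)++;
}
```

In a real client the callback would hand each message to TRANSPORT for validation, and the unconsumed tail would be kept until the next chunk of the download arrives.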
The client supports two modes of operation:

@itemize @bullet
@item download of HELLOs (bootstrapping)
@item learning of URLs
@end itemize

@menu
* Bootstrapping::
* Learning::
@end menu

@node Bootstrapping
@subsubsection Bootstrapping

For bootstrapping, it schedules a task to download the hostlist from the set of known URLs. The downloads are only performed if the number of current connections is smaller than a minimum number of connections (at the moment 4). The interval between downloads increases exponentially; however, the exponential growth is limited if it becomes longer than an hour. At that point, the frequency growth is capped at (#number of connections * 1h).

Once the decision has been taken to download HELLOs, the daemon chooses a random URL from the list of known URLs. URLs can be configured in the configuration or be learned from advertisement messages. The client uses an HTTP client library (libcurl) to initiate the download using the libcurl multi interface. Libcurl passes the data to the @code{callback_download} function, which stores the data in a buffer if space is available and the maximum size for a hostlist download is not exceeded (@code{MAX_BYTES_PER_HOSTLISTS} = 500000). When a full HELLO has been downloaded, the HOSTLIST client offers this HELLO message to the TRANSPORT service for validation. When the download has finished or failed, statistical information about the quality of this URL is updated.

@cindex HOSTLIST learning
@node Learning
@subsubsection Learning

The client also manages hostlist advertisements from other peers. The HOSTLIST daemon forwards @code{GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT} messages to the client subsystem, which extracts the URL from the message. Next, a test of the newly obtained URL is performed by triggering a download from the new URL. If the URL works correctly, it is added to the list of working URLs.
The size of the list of URLs is restricted, so if an additional server is added and the list is full, the URL with the worst quality ranking (determined, for example, through the number of successful downloads and the number of HELLOs) is discarded. During shutdown the list of URLs is saved to a file for persistence and loaded on startup. URLs from the configuration file are never discarded.

@node Usage
@subsection Usage

To start HOSTLIST by default, it has to be added to the DEFAULTSERVICES section for the ARM services. This is done in the default configuration.

For more information on how to configure the HOSTLIST subsystem see the installation handbook:@ Configuring the hostlist to bootstrap@ Configuring your peer to provide a hostlist

@cindex IDENTITY Subsystem
@node IDENTITY Subsystem
@section IDENTITY Subsystem

Identities of "users" in GNUnet are called egos. Egos can be used as pseudonyms ("fake names") or be tied to an organization (for example, "GNU") or even the actual identity of a human. GNUnet users are expected to have many egos. They might have one tied to their real identity, some for organizations they manage, and more for different domains where they want to operate under a pseudonym.

The IDENTITY service allows users to manage their egos. The identity service manages the private keys of the egos of the local user; it does not manage identities of other users (public keys). Public keys for other users need names to become manageable. GNUnet uses the @dfn{GNU Name System} (GNS) to give names to other users and manage their public keys securely. This chapter is about the IDENTITY service, which is about the management of private keys.

On the network, an ego corresponds to an ECDSA key (over Curve25519, using RFC 6979, as required by GNS). Thus, users can perform actions under a particular ego by using (signing with) a particular private key. Other users can then confirm that the action was really performed by that ego by checking the signature against the respective public key.
The IDENTITY service allows users to associate a human-readable name with each ego. This way, users can use names that will remind them of the purpose of a particular ego. The IDENTITY service will store the respective private keys and allows applications to access key information by name. Users can change the name that is locally (!) associated with an ego. Egos can also be deleted, which means that the private key will be removed and it thus will not be possible to perform actions with that ego in the future.

Additionally, the IDENTITY subsystem can associate service functions with egos. For example, GNS requires the ego that should be used for the shorten zone. GNS will ask IDENTITY for an ego for the "gns-short" service. The IDENTITY service has a mapping of such service strings to the name of the ego that the user wants to use for this service, for example "my-short-zone-ego".

Finally, the IDENTITY API provides access to a special ego, the anonymous ego. The anonymous ego is special in that its private key is not really private, but fixed and known to everyone. Thus, anyone can perform actions as anonymous. This can be useful as with this trick, code does not have to contain a special case to distinguish between anonymous and pseudonymous egos.

@menu
* libgnunetidentity::
* The IDENTITY Client-Service Protocol::
@end menu

@cindex libgnunetidentity
@node libgnunetidentity
@subsection libgnunetidentity

@menu
* Connecting to the service::
* Operations on Egos::
* The anonymous Ego::
* Convenience API to lookup a single ego::
* Associating egos with service functions::
@end menu

@node Connecting to the service
@subsubsection Connecting to the service

First, typical clients connect to the identity service using @code{GNUNET_IDENTITY_connect}. This function takes a callback as a parameter. If the given callback parameter is non-null, it will be invoked to notify the application about the current state of the identities in the system.

@itemize @bullet
@item First, it will be invoked on all known egos at the time of the connection. For each ego, a handle to the ego and the user's name for the ego will be passed to the callback. Furthermore, a @code{void **} context argument will be provided which gives the client the opportunity to associate some state with the ego.
@item Second, the callback will be invoked with NULL for the ego, the name and the context. This signals that the (initial) iteration over all egos has completed.
@item Then, the callback will be invoked whenever something changes about an ego. If an ego is renamed, the callback is invoked with the ego handle of the ego that was renamed, and the new name. If an ego is deleted, the callback is invoked with the ego handle and a name of NULL. In the deletion case, the application should also release resources stored in the context.
@item When the application destroys the connection to the identity service using @code{GNUNET_IDENTITY_disconnect}, the callback is again invoked with the ego and a name of NULL (equivalent to deletion of the egos). This should again be used to clean up the per-ego context.
@end itemize

The ego handle passed to the callback remains valid until the callback is invoked with a name of NULL, so it is safe to store a reference to the ego's handle.

@node Operations on Egos
@subsubsection Operations on Egos

Given an ego handle, the main operations are to get its associated private key using @code{GNUNET_IDENTITY_ego_get_private_key} or its associated public key using @code{GNUNET_IDENTITY_ego_get_public_key}.

The other operations on egos are pretty straightforward. Using @code{GNUNET_IDENTITY_create}, an application can request the creation of an ego by specifying the desired name. The operation will fail if that name is already in use. Using @code{GNUNET_IDENTITY_rename} the name of an existing ego can be changed. Finally, egos can be deleted using @code{GNUNET_IDENTITY_delete}.
All of these operations will trigger updates to the callback given to the @code{GNUNET_IDENTITY_connect} function of all applications that are connected with the identity service at the time. @code{GNUNET_IDENTITY_cancel} can be used to cancel the operations before the respective continuations would be called. It is not guaranteed that the operation will not be completed anyway; only the continuation will no longer be called.

@node The anonymous Ego
@subsubsection The anonymous Ego

A special way to obtain an ego handle is to call @code{GNUNET_IDENTITY_ego_get_anonymous}, which returns an ego for the "anonymous" user. Anyone knows and can get the private key for this user, so it is suitable for operations that are supposed to be anonymous but require signatures (for example, to avoid a special path in the code). The anonymous ego is always valid and accessing it does not require a connection to the identity service.

@node Convenience API to lookup a single ego
@subsubsection Convenience API to lookup a single ego

As applications commonly simply have to look up a single ego, there is a convenience API to do just that. Use @code{GNUNET_IDENTITY_ego_lookup} to look up a single ego by name. Note that this is the user's name for the ego, not the service function. The resulting ego will be returned via a callback and will only be valid during that callback. The operation can be canceled via @code{GNUNET_IDENTITY_ego_lookup_cancel} (cancellation is only legal before the callback is invoked).

@node Associating egos with service functions
@subsubsection Associating egos with service functions

The @code{GNUNET_IDENTITY_set} function is used to associate a particular ego with a service function. The name used by the service and the ego are given as arguments. Afterwards, the service can use its name to look up the associated ego using @code{GNUNET_IDENTITY_get}.
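
The callback contract described under "Connecting to the service" (initial iteration, end-of-list marker, rename, deletion) can be modelled without the library. The sketch below is only an illustration under stated assumptions: @code{struct Ego}, @code{struct EgoState}, @code{struct AppState} and @code{ego_cb} are hypothetical stand-ins, and the argument order (closure, ego, context, name) merely mirrors the arguments named in the text. It is not libgnunetidentity code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the opaque ego handle (hypothetical). */
struct Ego
{
  int id;
};

/* Per-ego state that the application stores via the context pointer. */
struct EgoState
{
  char name[64];
};

/* Application-wide bookkeeping, passed to the callback as closure. */
struct AppState
{
  int known_egos;
  int initial_iteration_done;
};

/* Callback following the contract from the text:
   - ego == NULL:  the initial iteration over all egos has completed
   - name == NULL: the ego was deleted (or we disconnected); free context
   - otherwise:    a new ego, or a rename of an ego we already track */
void
ego_cb (void *cls, struct Ego *ego, void **ctx, const char *name)
{
  struct AppState *app = cls;
  struct EgoState *st;

  if (NULL == ego)
  {
    app->initial_iteration_done = 1;
    return;
  }
  assert (NULL != ctx);
  st = *ctx;
  if (NULL == name)
  {
    if (NULL != st)
    {
      free (st);
      *ctx = NULL;
      app->known_egos--;
    }
    return;
  }
  if (NULL == st)
  {
    /* First time this ego is reported: attach fresh state to it. */
    st = calloc (1, sizeof (*st));
    if (NULL == st)
      return;
    *ctx = st;
    app->known_egos++;
  }
  /* Remember the current name, always leaving room for the terminator. */
  strncpy (st->name, name, sizeof (st->name) - 1);
  st->name[sizeof (st->name) - 1] = '\0';
}
```

Keeping all per-ego state behind the context pointer means the deletion and disconnect cases share one cleanup path, which is the reason the text asks applications to release resources whenever the name is NULL.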

@node The IDENTITY Client-Service Protocol
@subsection The IDENTITY Client-Service Protocol

A client connecting to the identity service first sends a message with type @code{GNUNET_MESSAGE_TYPE_IDENTITY_START} to the service. After that, the client will receive information about changes to the egos by receiving messages of type @code{GNUNET_MESSAGE_TYPE_IDENTITY_UPDATE}. Those messages contain the private key of the ego and the user's name of the ego (or zero bytes for the name to indicate that the ego was deleted). A special bit @code{end_of_list} is used to indicate the end of the initial iteration over the identity service's egos.

The client can trigger changes to the egos by sending @code{CREATE}, @code{RENAME} or @code{DELETE} messages. The CREATE message contains the private key and the desired name.@ The RENAME message contains the old name and the new name.@ The DELETE message only needs to include the name of the ego to delete.@ The service responds to each of these messages with a @code{RESULT_CODE} message which indicates success or error of the operation, and possibly a human-readable error message.

Finally, the client can bind the name of a service function to an ego by sending a @code{SET_DEFAULT} message with the name of the service function and the private key of the ego. Such bindings can then be resolved using a @code{GET_DEFAULT} message, which includes the name of the service function. The identity service will respond to a GET_DEFAULT request with a SET_DEFAULT message containing the respective information, or with a RESULT_CODE to indicate an error.

@cindex NAMESTORE Subsystem
@node NAMESTORE Subsystem
@section NAMESTORE Subsystem

The NAMESTORE subsystem provides persistent storage for local GNS zone information. All local GNS zone information is managed by NAMESTORE. It provides both the functionality to administer local GNS information (e.g.
delete and add records) as well as to retrieve GNS information (e.g. to list name information in a client). NAMESTORE only manages the persistent storage of zone information belonging to the user running the service: GNS information from other users obtained from the DHT is stored by the NAMECACHE subsystem.

NAMESTORE uses a plugin-based database backend to store GNS information with good performance. SQLite, MySQL and PostgreSQL are the supported database backends. NAMESTORE clients interact with the IDENTITY subsystem to obtain cryptographic information about zones based on egos as described with the IDENTITY subsystem, but internally NAMESTORE refers to zones using the ECDSA private key. In addition, it collaborates with the NAMECACHE subsystem and stores zone information in the GNS cache when local information is modified, to increase look-up performance for local information.

NAMESTORE provides functionality to look up and store records, to iterate over a specific or all zones and to monitor zones for changes. NAMESTORE functionality can be accessed using the NAMESTORE API or the NAMESTORE command line tool.

@menu
* libgnunetnamestore::
@end menu

@cindex libgnunetnamestore
@node libgnunetnamestore
@subsection libgnunetnamestore

To interact with NAMESTORE, clients first connect to the NAMESTORE service using @code{GNUNET_NAMESTORE_connect}, passing a configuration handle. As a result they obtain a NAMESTORE handle that they can use for operations, or NULL is returned if the connection failed.

To disconnect from NAMESTORE, clients use @code{GNUNET_NAMESTORE_disconnect} and specify the handle to disconnect.

NAMESTORE internally uses the ECDSA private key to refer to zones. These private keys can be obtained from the IDENTITY subsystem.
Here @emph{egos} can be used to refer to zones, or the default ego assigned to the GNS subsystem can be used to obtain the master zone's private key.

@menu
* Editing Zone Information::
* Iterating Zone Information::
* Monitoring Zone Information::
@end menu

@node Editing Zone Information
@subsubsection Editing Zone Information

NAMESTORE provides functions to look up records stored under a label in a zone and to store records under a label in a zone.

To store (and delete) records, the client uses the @code{GNUNET_NAMESTORE_records_store} function and has to provide the namestore handle to use, the private key of the zone, the label to store the records under, the records and the number of records, plus a callback function. After the operation is performed, NAMESTORE will call the provided callback function with the result: @code{GNUNET_SYSERR} on failure (including timeout/queue drop/failure to validate), @code{GNUNET_NO} if content was already there or not found, and @code{GNUNET_YES} (or another positive value) on success, plus an additional error message.

Records are deleted by using the store command with 0 records to store. It is important to note that records are not merged when records already exist under the label. So a client first has to retrieve the existing records, merge them with the new records and then store the result.

To perform a lookup operation, the client uses the @code{GNUNET_NAMESTORE_records_lookup} function. Here it has to pass the namestore handle, the private key of the zone and the label. It also has to provide a callback function which will be called with the result of the lookup operation: the zone for the records, the label, and the records including the number of records included.

A special operation is used to set the preferred nickname for a zone. This nickname is stored with the zone and is automatically merged with all labels and records stored in a zone.
Here the client uses the @code{GNUNET_NAMESTORE_set_nick} function and passes the private key of the zone, the nickname as a string, plus a callback to be called with the result of the operation.

@node Iterating Zone Information
@subsubsection Iterating Zone Information

A client can iterate over all information in a zone or all zones managed by NAMESTORE. Here a client uses the @code{GNUNET_NAMESTORE_zone_iteration_start} function and passes the namestore handle, the zone to iterate over and a callback function to call with the result. If the client wants to iterate over all zones, it passes NULL for the zone. A @code{GNUNET_NAMESTORE_ZoneIterator} handle is returned to be used to continue iteration.

NAMESTORE calls the callback for every result and expects the client to call @code{GNUNET_NAMESTORE_zone_iterator_next} to continue to iterate or @code{GNUNET_NAMESTORE_zone_iterator_stop} to interrupt the iteration. When NAMESTORE has reached the last item, it will call the callback with a NULL value to indicate this.

@node Monitoring Zone Information
@subsubsection Monitoring Zone Information

Clients can also monitor zones to be notified about changes. Here the client uses the @code{GNUNET_NAMESTORE_zone_monitor_start} function and passes the private key of the zone and a callback function to call with updates for a zone. The client can specify to obtain zone information first by iterating over the zone and specify a synchronization callback to be called when the client and the namestore are synced.

On an update, NAMESTORE will call the callback with the private key of the zone, the label and the records and their number.

To stop monitoring, the client calls @code{GNUNET_NAMESTORE_zone_monitor_stop} and passes the handle obtained from the function to start the monitoring.

@cindex PEERINFO Subsystem
@node PEERINFO Subsystem
@section PEERINFO Subsystem

The PEERINFO subsystem is used to store verified (validated) information about known peers in a persistent way.
It obtains these addresses for example from the TRANSPORT service, which is in charge of address validation. Validation means that the information in the HELLO message is checked by connecting to the addresses and performing a cryptographic handshake to authenticate the peer instance claiming to be reachable with these addresses. Peerinfo does not validate the HELLO messages itself but only stores them and gives them to interested clients.

As future work, we think about moving from storing just HELLO messages to providing a generic persistent per-peer information store. More and more subsystems tend to need to store per-peer information in a persistent way. To not duplicate this functionality we plan to provide a PEERSTORE service providing this functionality.

@menu
* PEERINFO - Features::
* PEERINFO - Limitations::
* DeveloperPeer Information::
* Startup::
* Managing Information::
* Obtaining Information::
* The PEERINFO Client-Service Protocol::
* libgnunetpeerinfo::
@end menu

@node PEERINFO - Features
@subsection PEERINFO - Features

@itemize @bullet
@item Persistent storage
@item Client notification mechanism on update
@item Periodic clean up for expired information
@item Differentiation between public and friend-only HELLO
@end itemize

@node PEERINFO - Limitations
@subsection PEERINFO - Limitations

@itemize @bullet
@item Does not perform HELLO validation
@end itemize

@node DeveloperPeer Information
@subsection DeveloperPeer Information

The PEERINFO subsystem stores this information in the form of HELLO messages, which you can think of as business cards. These HELLO messages contain the public key of a peer and the addresses a peer can be reached under. The addresses include an expiration date describing how long they are valid. This information is updated regularly by the TRANSPORT service by revalidating the address. If an address is expired and not renewed, it can be removed from the HELLO message.
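
The expiration rule above can be sketched as a simple filter over an address list. This is a minimal model and not PEERINFO or HELLO library code: @code{struct Address} and @code{prune_expired} are hypothetical names, and timestamps are assumed to be plain absolute values in an arbitrary unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one address entry in a HELLO. */
struct Address
{
  const char *address;   /* textual form of the address */
  uint64_t expiration;   /* absolute time until which the address is valid */
};

/* Remove expired entries in place, preserving the order of the rest.
   Returns the number of addresses that remain.  An address whose
   expiration equals `now` is treated as expired. */
size_t
prune_expired (struct Address *addrs, size_t n, uint64_t now)
{
  size_t kept = 0;

  assert ((NULL != addrs) || (0 == n));
  for (size_t i = 0; i < n; i++)
    if (addrs[i].expiration > now)
      addrs[kept++] = addrs[i];
  return kept;
}
```

A return value of zero means that no valid address is left for the peer.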

Some peers do not want to have their HELLO messages distributed to other peers, especially when GNUnet's friend-to-friend mode is enabled. To prevent this undesired distribution, PEERINFO distinguishes between @emph{public} and @emph{friend-only} HELLO messages. Public HELLO messages can be freely distributed to other (possibly unknown) peers (for example using the hostlist, gossiping, broadcasting), whereas friend-only HELLO messages may not be distributed to other peers. Friend-only HELLO messages have an additional flag @code{friend_only} set internally. For public HELLO messages this flag is not set. PEERINFO does not and cannot check if a client is allowed to obtain a specific HELLO type.

The HELLO messages can be managed using the GNUnet HELLO library. Other GNUnet systems can obtain this information from PEERINFO and use it for their purposes. Clients are for example the HOSTLIST component, which provides this information to other peers in the form of a hostlist, or the TRANSPORT subsystem, which uses this information to maintain connections to other peers.

@node Startup
@subsection Startup

During startup the PEERINFO service loads persistent HELLOs from disk. First PEERINFO parses the directory configured in the HOSTS value of the @code{PEERINFO} configuration section to store PEERINFO information. For all files found in this directory valid HELLO messages are extracted. In addition it loads HELLO messages shipped with the GNUnet distribution. These HELLOs are used to simplify network bootstrapping by providing valid peer information with the distribution. The use of these HELLOs can be prevented by setting @code{USE_INCLUDED_HELLOS} in the @code{PEERINFO} configuration section to @code{NO}. Files containing invalid information are removed.

@node Managing Information
@subsection Managing Information

The PEERINFO service stores information about known peers and a single HELLO message for every peer.
A peer does not need to have a HELLO if no information is available. HELLO information from different sources, for example a HELLO obtained from a remote HOSTLIST and a second HELLO stored on disk, is combined and merged into one single HELLO message per peer which will be given to clients. During this merge process the HELLO is immediately written to disk to ensure persistence.

PEERINFO in addition periodically scans the directory where information is stored for empty HELLO messages with expired TRANSPORT addresses. This periodic task scans all files in the directory and recreates the HELLO messages it finds. Expired TRANSPORT addresses are removed from the HELLO and if the HELLO does not contain any valid addresses, it is discarded and removed from the disk.

@node Obtaining Information
@subsection Obtaining Information

When a client requests information from PEERINFO, PEERINFO performs a lookup for the respective peer or all peers if desired and transmits this information to the client. The client can specify if friend-only HELLOs have to be included or not and PEERINFO filters the respective HELLO messages before transmitting information.

To notify clients about changes to PEERINFO information, PEERINFO maintains a list of clients interested in these notifications. Such a notification occurs if a HELLO for a peer was updated (due to a merge for example) or a new peer was added.

@node The PEERINFO Client-Service Protocol
@subsection The PEERINFO Client-Service Protocol

To connect to and disconnect from the PEERINFO service, PEERINFO utilizes the util client/server infrastructure, so no special message types are used here.

To add information for a peer, the plain HELLO message is transmitted to the service without any wrapping. All pieces of information required are stored within the HELLO message. The PEERINFO service provides a message handler accepting and processing these HELLO messages.

When obtaining PEERINFO information using the iterate functionality, specific messages are used. To obtain information for all peers, a @code{struct ListAllPeersMessage} with message type @code{GNUNET_MESSAGE_TYPE_PEERINFO_GET_ALL} and a flag @code{include_friend_only} to indicate if friend-only HELLO messages should be included is transmitted. If information for a specific peer is required, a @code{struct ListPeerMessage} with @code{GNUNET_MESSAGE_TYPE_PEERINFO_GET} containing the peer identity is used.

For both variants the PEERINFO service replies for each HELLO message it wants to transmit with a @code{struct InfoMessage} with type @code{GNUNET_MESSAGE_TYPE_PEERINFO_INFO} containing the plain HELLO. The final message is a @code{struct GNUNET_MessageHeader} with type @code{GNUNET_MESSAGE_TYPE_PEERINFO_INFO_END}. If the client receives this message, it can proceed with the next request if any is pending.

@node libgnunetpeerinfo
@subsection libgnunetpeerinfo

The PEERINFO API consists mainly of three different functionalities:

@itemize @bullet
@item maintaining a connection to the service
@item adding new information to the PEERINFO service
@item retrieving information from the PEERINFO service
@end itemize

@menu
* Connecting to the PEERINFO Service::
* Adding Information to the PEERINFO Service::
* Obtaining Information from the PEERINFO Service::
@end menu

@node Connecting to the PEERINFO Service
@subsubsection Connecting to the PEERINFO Service

To connect to the PEERINFO service the function @code{GNUNET_PEERINFO_connect} is used, taking a configuration handle as an argument. To disconnect from PEERINFO, the function @code{GNUNET_PEERINFO_disconnect} has to be called with the PEERINFO handle returned from the connect function.

@node Adding Information to the PEERINFO Service
@subsubsection Adding Information to the PEERINFO Service

@code{GNUNET_PEERINFO_add_peer} adds a new peer to the PEERINFO subsystem storage.
This function takes the PEERINFO handle as an argument, the HELLO message to store and a continuation with a closure to be called with the result of the operation. @code{GNUNET_PEERINFO_add_peer} returns a handle to this operation, which can be used to cancel the operation with the respective cancel function @code{GNUNET_PEERINFO_add_peer_cancel}.

To retrieve information from PEERINFO, you can iterate over all information stored with PEERINFO, or you can tell PEERINFO to notify you when new peer information is available.

@node Obtaining Information from the PEERINFO Service
@subsubsection Obtaining Information from the PEERINFO Service

To iterate over information in PEERINFO, you use @code{GNUNET_PEERINFO_iterate}. This function expects the PEERINFO handle, a flag indicating whether HELLO messages intended for friend-only mode should be included, a timeout specifying how long the operation may take, and a callback with a callback closure to be called for the results. If you want to obtain information for a specific peer, you can specify the peer identity; if this identity is NULL, information for all peers is returned. The function returns a handle that can be used to cancel the operation with @code{GNUNET_PEERINFO_iterate_cancel}.

To get notified when peer information changes, you can use @code{GNUNET_PEERINFO_notify}. This function expects a configuration handle and a flag indicating whether friend-only HELLO messages should be included. The PEERINFO service will then call the callback function for every change. The function returns a handle to cancel notifications with @code{GNUNET_PEERINFO_notify_cancel}.

@cindex PEERSTORE Subsystem
@node PEERSTORE Subsystem
@section PEERSTORE Subsystem

GNUnet's PEERSTORE subsystem offers persistent per-peer storage for other GNUnet subsystems. GNUnet subsystems can use PEERSTORE to persistently store and retrieve arbitrary data.
Each data record stored with PEERSTORE contains the following fields:

@itemize @bullet
@item subsystem: Name of the subsystem responsible for the record.
@item peerid: Identity of the peer this record is related to.
@item key: A key string identifying the record.
@item value: Binary record value.
@item expiry: Record expiry date.
@end itemize

@menu
* Functionality::
* Architecture::
* libgnunetpeerstore::
@end menu

@node Functionality
@subsection Functionality

Subsystems can store any type of value under a (subsystem, peerid, key) combination. A "replace" flag set during store operations forces the PEERSTORE to replace any old values stored under the same (subsystem, peerid, key) combination with the new value. Additionally, an expiry date is set after which the record is @emph{possibly} deleted by PEERSTORE.

Subsystems can iterate over all values stored under any of the following combinations of fields:

@itemize @bullet
@item (subsystem)
@item (subsystem, peerid)
@item (subsystem, key)
@item (subsystem, peerid, key)
@end itemize

Subsystems can also request to be notified about any new values stored under a (subsystem, peerid, key) combination by sending a "watch" request to PEERSTORE.

@node Architecture
@subsection Architecture

PEERSTORE implements the following components:

@itemize @bullet
@item PEERSTORE service: Handles store, iterate and watch operations.
@item PEERSTORE API: API to be used by other subsystems to communicate with and issue commands to the PEERSTORE service.
@item PEERSTORE plugins: Handle the persistent storage. At the moment, only an "sqlite" plugin is implemented.
@end itemize

@cindex libgnunetpeerstore
@node libgnunetpeerstore
@subsection libgnunetpeerstore

libgnunetpeerstore is the library containing the PEERSTORE API. Subsystems wishing to communicate with the PEERSTORE service use this API to open a connection to PEERSTORE. This is done by calling @code{GNUNET_PEERSTORE_connect}, which returns a handle to the newly created connection.
This handle has to be used with any further calls to the API.

To store a new record, the function @code{GNUNET_PEERSTORE_store} is used. It requires the record fields and a continuation function that will be called by the API after the STORE request is sent to the PEERSTORE service. Note that calling the continuation function does not mean that the record is successfully stored, only that the STORE request has been successfully sent to the PEERSTORE service. @code{GNUNET_PEERSTORE_store_cancel} can be called to cancel the STORE request, but only before the continuation function has been called.

To iterate over stored records, the function @code{GNUNET_PEERSTORE_iterate} is used. @emph{peerid} and @emph{key} can be set to NULL. An iterator callback function will be called with each matching record found, and with a NULL record at the end to signal the end of the result set. @code{GNUNET_PEERSTORE_iterate_cancel} can be used to cancel the ITERATE request before the iterator callback is called with a NULL record.

To be notified of new values stored under a (subsystem, peerid, key) combination, the function @code{GNUNET_PEERSTORE_watch} is used. This registers the watcher with the PEERSTORE service; any new records matching the given combination will trigger the callback function passed to @code{GNUNET_PEERSTORE_watch}. This continues until @code{GNUNET_PEERSTORE_watch_cancel} is called or the connection to the service is destroyed.

After the connection is no longer needed, the function @code{GNUNET_PEERSTORE_disconnect} can be called to disconnect from the PEERSTORE service. Any pending ITERATE or WATCH requests will be destroyed. If the @code{sync_first} flag is set to @code{GNUNET_YES}, the API will delay the disconnection until all pending STORE requests are sent to the PEERSTORE service; otherwise, the pending STORE requests will be destroyed as well.
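The matching and replace rules described in this section can be illustrated with a small, self-contained C model. This is only a sketch and does not use the real PEERSTORE API: @code{struct record}, @code{record_matches}, @code{store_record} and @code{count_matches} are hypothetical names, peer identities and values are reduced to strings, and the expiry date is left out. A NULL @emph{peerid} or @emph{key} is treated as a wildcard, mirroring the description of @code{GNUNET_PEERSTORE_iterate}.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of a PEERSTORE record.  The real
   service stores binary values and an expiry date as well. */
struct record
{
  const char *subsystem;
  const char *peerid;
  const char *key;
  const char *value;
};

#define MAX_RECORDS 16

struct record records[MAX_RECORDS];
size_t records_len;

/* A NULL peerid or key in the query matches any record (wildcard);
   the subsystem must always be given. */
int
record_matches (const struct record *r,
                const char *subsystem,
                const char *peerid,
                const char *key)
{
  if (0 != strcmp (r->subsystem, subsystem))
    return 0;
  if ((NULL != peerid) && (0 != strcmp (r->peerid, peerid)))
    return 0;
  if ((NULL != key) && (0 != strcmp (r->key, key)))
    return 0;
  return 1;
}

/* Store a record.  With the replace flag set, old values under the
   same (subsystem, peerid, key) combination are dropped first.
   Returns 0 on success, -1 if the table is full. */
int
store_record (struct record r, int replace)
{
  if (replace)
  {
    size_t kept = 0;

    for (size_t i = 0; i < records_len; i++)
      if (! record_matches (&records[i], r.subsystem, r.peerid, r.key))
        records[kept++] = records[i];
    records_len = kept;
  }
  if (records_len >= MAX_RECORDS)
    return -1;
  records[records_len++] = r;
  return 0;
}

/* Count the records that an ITERATE request with these fields
   would visit in this model. */
size_t
count_matches (const char *subsystem, const char *peerid, const char *key)
{
  size_t n = 0;

  for (size_t i = 0; i < records_len; i++)
    if (record_matches (&records[i], subsystem, peerid, key))
      n++;
  return n;
}
```

In this model, storing two values under the same combination without the replace flag keeps both, while a later store with the flag set leaves only the newest value. Iterating with a NULL @emph{peerid} visits the records of all peers for the given subsystem and key.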
@cindex SET Subsystem
@node SET Subsystem
@section SET Subsystem

The SET service implements efficient set operations between two peers over a CADET channel. Currently, set union and set intersection are the only supported operations. Elements of a set consist of an @emph{element type} and arbitrary binary @emph{data}. The size of an element's data is limited to around 62 KB.

@menu
* Local Sets::
* Set Modifications::
* Set Operations::
* Result Elements::
* libgnunetset::
* The SET Client-Service Protocol::
* The SET Intersection Peer-to-Peer Protocol::
* The SET Union Peer-to-Peer Protocol::
@end menu

@node Local Sets
@subsection Local Sets

Sets created by a local client can be modified and reused for multiple operations. As each set operation requires potentially expensive special auxiliary data to be computed for each element of a set, a set can only participate in one type of set operation (i.e. union or intersection). The type of a set is determined upon its creation. If the elements of a set are needed for an operation of a different type, all of the set's elements must be copied to a new set of the appropriate type.

@node Set Modifications
@subsection Set Modifications

Even when set operations are active, one can add elements to and remove elements from a set. However, these changes will only be visible to operations that have been created after the changes have taken place. That is, every set operation only sees a snapshot of the set from the time the operation was started. This mechanism is @emph{not} implemented by copying the whole set, but by attaching @emph{generation information} to each element and operation.

@node Set Operations
@subsection Set Operations

Set operations can be started in two ways: either by accepting an operation request from a remote peer, or by requesting a set operation from a remote peer. Set operations are uniquely identified by the involved @emph{peers}, an @emph{application id} and the @emph{operation type}.
The client is notified of incoming set operations by @emph{set listeners}. A set listener listens for incoming operations of a specific operation type and application id. Once notified of an incoming set request, the client can accept the set request (providing a local set for the operation) or reject it.

@node Result Elements
@subsection Result Elements

The SET service has three @emph{result modes} that determine how an operation's result set is delivered to the client:

@itemize @bullet
@item @strong{Full Result Set.} All elements of the set resulting from the set operation are returned to the client.
@item @strong{Added Elements.} Only elements that result from the operation and are not already in the local peer's set are returned. Note that for some operations (like set intersection) this result mode will never return any elements. This can be useful if only the remote peer is actually interested in the result of the set operation.
@item @strong{Removed Elements.} Only elements that are in the local peer's initial set but not in the operation's result set are returned. Note that for some operations (like set union) this result mode will never return any elements. This can be useful if only the remote peer is actually interested in the result of the set operation.
@end itemize

@cindex libgnunetset
@node libgnunetset
@subsection libgnunetset

@menu
* Sets::
* Listeners::
* Operations::
* Supplying a Set::
* The Result Callback::
@end menu

@node Sets
@subsubsection Sets

New sets are created with @code{GNUNET_SET_create}. Both the local peer's configuration (as each set has its own client connection) and the operation type must be specified. The set exists until either the client calls @code{GNUNET_SET_destroy} or the client's connection to the service is disrupted. In the latter case, the client is notified by the return value of functions dealing with sets. This return value must always be checked.
Elements are added and removed with @code{GNUNET_SET_add_element} and @code{GNUNET_SET_remove_element}.

@node Listeners
@subsubsection Listeners

Listeners are created with @code{GNUNET_SET_listen}. Each time a remote peer suggests a set operation with an application id and operation type matching a listener, the listener's callback is invoked. The client then must synchronously call either @code{GNUNET_SET_accept} or @code{GNUNET_SET_reject}. Note that the operation will not be started until the client calls @code{GNUNET_SET_commit} (see Section "Supplying a Set").

@node Operations
@subsubsection Operations

Operations to be initiated by the local peer are created with @code{GNUNET_SET_prepare}. Note that the operation will not be started until the client calls @code{GNUNET_SET_commit} (see Section "Supplying a Set").

@node Supplying a Set
@subsubsection Supplying a Set

To create symmetry between the two ways of starting a set operation (accepting and initiating it), the operation handles returned by @code{GNUNET_SET_accept} and @code{GNUNET_SET_prepare} do not yet have a set to operate on, so they cannot do any work yet.

The client must call @code{GNUNET_SET_commit} to specify a set to use for an operation. @code{GNUNET_SET_commit} may only be called once per set operation.

@node The Result Callback
@subsubsection The Result Callback

Clients must specify both a result mode and a result callback with @code{GNUNET_SET_accept} and @code{GNUNET_SET_prepare}. The result callback is called with a status indicating either that an element was received, or that the operation failed or succeeded. The interpretation of the received element depends on the result mode. The callback needs to know which result mode it is used in, as the arguments do not indicate if an element is part of the full result set, or if it is in the difference between the original set and the final set.
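Because the callback arguments alone do not reveal the result mode, it can help to see which elements each mode would deliver. The following C sketch models sets as bitmasks (bit @emph{i} set means element @emph{i} is in the set). All names in it (@code{enum result_mode}, @code{enum operation}, @code{operation_result}, @code{delivered_elements}) are hypothetical and chosen for illustration; they are not part of libgnunetset.

```c
#include <stdint.h>

/* Hypothetical names that mirror the three result modes and the two
   operation types described in the text. */
enum result_mode { RESULT_FULL, RESULT_ADDED, RESULT_REMOVED };
enum operation { OP_UNION, OP_INTERSECTION };

/* Sets are modeled as bitmasks: bit i set means element i is present. */
uint32_t
operation_result (enum operation op, uint32_t local, uint32_t remote)
{
  return (OP_UNION == op) ? (local | remote) : (local & remote);
}

/* Elements the result callback would be given in the chosen mode. */
uint32_t
delivered_elements (enum operation op,
                    enum result_mode mode,
                    uint32_t local,
                    uint32_t remote)
{
  uint32_t result = operation_result (op, local, remote);

  switch (mode)
  {
  case RESULT_FULL:
    return result;            /* every element of the result set */
  case RESULT_ADDED:
    return result & ~local;   /* in the result, not in the local set */
  case RESULT_REMOVED:
    return local & ~result;   /* in the local set, not in the result */
  }
  return 0;
}
```

With a local set of elements 1 and 2 and a remote set of elements 2 and 3, a union in "added elements" mode delivers only element 3, and an intersection in "removed elements" mode delivers only element 1. The other two combinations (intersection with "added", union with "removed") deliver nothing, as noted in the description of the result modes.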
@node The SET Client-Service Protocol
@subsection The SET Client-Service Protocol

@menu
* Creating Sets::
* Listeners2::
* Initiating Operations::
* Modifying Sets::
* Results and Operation Status::
* Iterating Sets::
@end menu

@node Creating Sets
@subsubsection Creating Sets

For each set of a client, there exists a client connection to the service. Sets are created by sending the @code{GNUNET_SERVICE_SET_CREATE} message over a new client connection. Multiple operations for one set are multiplexed over one client connection, using a request id supplied by the client.

@node Listeners2
@subsubsection Listeners2

Each listener also requires a separate client connection. By sending the @code{GNUNET_SERVICE_SET_LISTEN} message, the client notifies the service of the application id and operation type it is interested in. A client rejects an incoming request by sending @code{GNUNET_SERVICE_SET_REJECT} on the listener's client connection. In contrast, when accepting an incoming request, a @code{GNUNET_SERVICE_SET_ACCEPT} message must be sent over the client connection of the set that is supplied for the set operation.

@node Initiating Operations
@subsubsection Initiating Operations

Operations with remote peers are initiated by sending a @code{GNUNET_SERVICE_SET_EVALUATE} message to the service. The client connection over which this message is sent determines the set to use.

@node Modifying Sets
@subsubsection Modifying Sets

Sets are modified with the @code{GNUNET_SERVICE_SET_ADD} and @code{GNUNET_SERVICE_SET_REMOVE} messages.

@node Results and Operation Status
@subsubsection Results and Operation Status

The service notifies the client of result elements and success/failure of a set operation with the @code{GNUNET_SERVICE_SET_RESULT} message.

@node Iterating Sets
@subsubsection Iterating Sets

All elements of a set can be requested by sending @code{GNUNET_SERVICE_SET_ITER_REQUEST}.
The server responds with @code{GNUNET_SERVICE_SET_ITER_ELEMENT} and eventually terminates the iteration with @code{GNUNET_SERVICE_SET_ITER_DONE}. After each received element, the client must send @code{GNUNET_SERVICE_SET_ITER_ACK}. Note that only one set iteration may be active for a set at any given time.

@node The SET Intersection Peer-to-Peer Protocol
@subsection The SET Intersection Peer-to-Peer Protocol

The intersection protocol operates over CADET and starts with a @code{GNUNET_MESSAGE_TYPE_SET_P2P_OPERATION_REQUEST} being sent by the peer initiating the operation to the peer listening for inbound requests. It includes the number of elements of the initiating peer, which is used to decide which side will send a Bloom filter first.

The listening peer checks if the operation type and application identifier are acceptable for its current state. If not, it responds with a @code{GNUNET_MESSAGE_TYPE_SET_RESULT} and a status of @code{GNUNET_SET_STATUS_FAILURE} (and terminates the CADET channel).

If the application accepts the request, the listener sends back a @code{GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_ELEMENT_INFO} if it has more elements in the set than the client. Otherwise, it immediately starts with the Bloom filter exchange. If the initiator receives a @code{GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_ELEMENT_INFO} response, it begins the Bloom filter exchange, unless the set size is indicated to be zero, in which case the intersection is considered finished after just the initial handshake.

@menu
* The Bloom filter exchange::
* Salt::
@end menu

@node The Bloom filter exchange
@subsubsection The Bloom filter exchange

In this phase, each peer transmits a Bloom filter over the remaining keys of the local set to the other peer using a @code{GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_BF} message. This message additionally includes the number of elements left in the sender's set, as well as the XOR over all of the keys in that set.
The number of bits 'k' set per element in the Bloom filter is calculated based on the relative size of the two sets. Furthermore, the size of the Bloom filter is calculated based on 'k' and the number of elements in the set to maximize the amount of data filtered per byte transmitted on the wire (while avoiding an excessively high number of iterations).

The receiver of the message removes all elements from its local set that do not pass the Bloom filter test. It then checks if the set size of the sender and the XOR over the keys match what is left of its own set. If they do, it sends a @code{GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_DONE} back to indicate that the latest set is the final result. Otherwise, the receiver starts another Bloom filter exchange, except this time as the sender.

@node Salt
@subsubsection Salt

Bloom filter operations are probabilistic: with some non-zero probability the test may incorrectly say an element is in the set, even though it is not.

To mitigate this problem, the intersection protocol iterates exchanging Bloom filters using a different random 32-bit salt in each iteration (the salt is also included in the message). With different salts, set operations may fail for different elements. By merging the results of the iterations, the probability of failure approaches zero.

The iterations terminate once both peers have established that they have sets of the same size, and where the XOR over all keys computes the same 512-bit value (leaving a failure probability of @math{2^{-511}}).

@node The SET Union Peer-to-Peer Protocol
@subsection The SET Union Peer-to-Peer Protocol

The SET union protocol is based on Eppstein's efficient set reconciliation without prior context. You should read this paper first if you want to understand the protocol.

The union protocol operates over CADET and starts with a @code{GNUNET_MESSAGE_TYPE_SET_P2P_OPERATION_REQUEST} being sent by the peer initiating the operation to the peer listening for inbound requests.
It includes the number of elements of the initiating peer, which is currently not used.

The listening peer checks if the operation type and application identifier are acceptable for its current state. If not, it responds with a @code{GNUNET_MESSAGE_TYPE_SET_RESULT} and a status of @code{GNUNET_SET_STATUS_FAILURE} (and terminates the CADET channel).

If the application accepts the request, it sends back a strata estimator using a message of type @code{GNUNET_MESSAGE_TYPE_SET_UNION_P2P_SE}. The initiator evaluates the strata estimator and initiates the exchange of invertible Bloom filters, sending a @code{GNUNET_MESSAGE_TYPE_SET_UNION_P2P_IBF}.

During the IBF exchange, if the receiver cannot invert the Bloom filter or detects a cycle, it sends a larger IBF in response (up to a defined maximum limit; if that limit is reached, the operation fails). Elements decoded while processing the IBF are transmitted to the other peer using @code{GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENTS}, or requested from the other peer using @code{GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENT_REQUESTS} messages, depending on the sign observed during decoding of the IBF. Peers respond to a @code{GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENT_REQUESTS} message with the respective element in a @code{GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENTS} message. If the IBF fully decodes, the peer responds with a @code{GNUNET_MESSAGE_TYPE_SET_UNION_P2P_DONE} message instead of another @code{GNUNET_MESSAGE_TYPE_SET_UNION_P2P_IBF}.

All Bloom filter operations use a salt to mingle keys before hashing them into buckets, such that future iterations have a fresh chance of succeeding if they failed due to collisions before.

@cindex STATISTICS Subsystem
@node STATISTICS Subsystem
@section STATISTICS Subsystem

In GNUnet, the STATISTICS subsystem offers a central place for all subsystems to publish unsigned 64-bit integer run-time statistics.
Keeping this information centrally means that there is a unified way for the user to obtain data on all subsystems, and individual subsystems do not have to always include a custom data export method for performance metrics and other statistics. For example, the TRANSPORT system uses STATISTICS to update information about the number of directly connected peers and the bandwidth that has been consumed by the various plugins. This information is valuable for diagnosing connectivity and performance issues.

Following the GNUnet service architecture, the STATISTICS subsystem is divided into an API, which is exposed through the header @strong{gnunet_statistics_service.h}, and the STATISTICS service @strong{gnunet-service-statistics}. The @strong{gnunet-statistics} command-line tool can be used to obtain (and change) information about the values stored by the STATISTICS service. The STATISTICS service does not communicate with other peers.

Data is stored in the STATISTICS service in the form of tuples @strong{(subsystem, name, value, persistence)}. The subsystem determines to which GNUnet subsystem the data belongs. The name is the name with which the value is associated; it uniquely identifies the record among the other records belonging to the same subsystem. In some parts of the code, the pair @strong{(subsystem, name)} is called a @strong{statistic}, as it identifies the values stored in the STATISTICS service. The persistence flag determines if the record has to be preserved across service restarts. A record is said to be persistent if this flag is set for it; if not, the record is treated as a non-persistent record and is lost after a service restart. Persistent records are written to the file @strong{statistics.data} before shutdown and read from it upon startup. The file is located in the HOME directory of the peer.
An anomaly of the STATISTICS service is that it does not terminate immediately upon receiving a shutdown signal if it has any clients connected to it. It waits for all the clients that are not monitors to close their connections before terminating itself. This is to prevent the loss of data during peer shutdown --- delaying the STATISTICS service shutdown helps other services to store important data to STATISTICS during shutdown.

@menu
* libgnunetstatistics::
* The STATISTICS Client-Service Protocol::
@end menu

@cindex libgnunetstatistics
@node libgnunetstatistics
@subsection libgnunetstatistics

@strong{libgnunetstatistics} is the library containing the API for the STATISTICS subsystem. Any process that needs to use STATISTICS should use this API to open a connection to the STATISTICS service. This is done by calling the function @code{GNUNET_STATISTICS_create()}. This function takes the name of the subsystem that is trying to use STATISTICS and a configuration. All values written to STATISTICS with this connection will be placed in the section corresponding to the given subsystem's name. The connection to STATISTICS can be destroyed with the function @code{GNUNET_STATISTICS_destroy()}. This function allows for the connection to be destroyed immediately or upon transferring all pending write requests to the service.

Note: The STATISTICS subsystem can be disabled by setting @code{DISABLE = YES} under the @code{[STATISTICS]} section in the configuration. With such a configuration, all calls to @code{GNUNET_STATISTICS_create()} return @code{NULL}, as the STATISTICS subsystem is unavailable, and no other functions from the API can be used.

@menu
* Statistics retrieval::
* Setting statistics and updating them::
* Watches::
@end menu

@node Statistics retrieval
@subsubsection Statistics retrieval

Once a connection to the statistics service is obtained, information about any other system which uses statistics can be retrieved with the function @code{GNUNET_STATISTICS_get()}.
This function takes the connection handle, the name of the subsystem whose information we are interested in (a @code{NULL} value will retrieve information of all available subsystems using STATISTICS), the name of the statistic we are interested in (a @code{NULL} value will retrieve all available statistics), a continuation callback which is called when all of the requested information is retrieved, an iterator callback which is called for each parameter in the retrieved information, and a closure for the aforementioned callbacks. The library then invokes the iterator callback for each value matching the request.

A call to @code{GNUNET_STATISTICS_get()} is asynchronous and can be canceled with the function @code{GNUNET_STATISTICS_get_cancel()}. This is helpful when retrieving statistics takes too long, and especially when we want to shut down and clean up everything.

@node Setting statistics and updating them
@subsubsection Setting statistics and updating them

So far we have seen how to retrieve statistics. Here we will learn how to set statistics and update them so that other subsystems can retrieve them.

A new statistic can be set using the function @code{GNUNET_STATISTICS_set()}. This function takes the name of the statistic, its value and a flag to make the statistic persistent. The value of the statistic should be of the type @code{uint64_t}. The function does not take the name of the subsystem; it is determined from the previous @code{GNUNET_STATISTICS_create()} invocation. If the given statistic is already present, its value is overwritten.

An existing statistic can be updated, i.e. its value can be increased or decreased by an amount, with the function @code{GNUNET_STATISTICS_update()}. The parameters to this function are similar to those of @code{GNUNET_STATISTICS_set()}, except that it takes the amount to be changed as a type @code{int64_t} instead of the value.
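The difference between setting a statistic (absolute, @code{uint64_t}) and updating it (relative, @code{int64_t}) can be sketched with a small stand-alone C model. It does not call the STATISTICS API: @code{struct statistic}, @code{statistic_find}, @code{statistic_set} and @code{statistic_update} are hypothetical names, the subsystem is implicit (as it is after @code{GNUNET_STATISTICS_create()}), and clamping at zero when a decrease exceeds the current value is an assumption of this sketch rather than documented behavior.

```c
#include <stdint.h>
#include <string.h>

#define MAX_STATISTICS 8

/* Hypothetical model of one (name, value, persistence) entry of a
   single subsystem. */
struct statistic
{
  const char *name;
  uint64_t value;
  int persistent;
};

struct statistic statistics[MAX_STATISTICS];
unsigned int statistics_len;

/* Look up an entry by name; returns NULL if it does not exist. */
struct statistic *
statistic_find (const char *name)
{
  for (unsigned int i = 0; i < statistics_len; i++)
    if (0 == strcmp (statistics[i].name, name))
      return &statistics[i];
  return NULL;
}

/* Look up an entry, creating it with value 0 if needed.
   Returns NULL if the table is full. */
struct statistic *
statistic_find_or_create (const char *name)
{
  struct statistic *s = statistic_find (name);

  if (NULL != s)
    return s;
  if (statistics_len >= MAX_STATISTICS)
    return NULL;
  s = &statistics[statistics_len++];
  s->name = name;
  s->value = 0;
  s->persistent = 0;
  return s;
}

/* Absolute set: an existing value is overwritten. */
void
statistic_set (const char *name, uint64_t value, int make_persistent)
{
  struct statistic *s = statistic_find_or_create (name);

  if (NULL == s)
    return;
  s->value = value;
  s->persistent = make_persistent;
}

/* Relative update: apply a signed delta to the unsigned value.
   Clamping at zero is an assumption made for this sketch. */
void
statistic_update (const char *name, int64_t delta, int make_persistent)
{
  struct statistic *s = statistic_find_or_create (name);

  if (NULL == s)
    return;
  if (delta < 0)
  {
    /* Magnitude of delta, computed without overflow for INT64_MIN. */
    uint64_t decrease = (uint64_t) (-(delta + 1)) + 1;

    s->value = (s->value < decrease) ? 0 : s->value - decrease;
  }
  else
  {
    s->value += (uint64_t) delta;
  }
  s->persistent = make_persistent;
}
```

Keeping the delta signed while the stored value is unsigned is what lets a single update function express both increments and decrements.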
The library will combine multiple set or update operations into one message if the client performs requests at a rate that is faster than the available IPC with the STATISTICS service. Thus, the client does not have to worry about sending requests too quickly.

@node Watches
@subsubsection Watches

An interesting feature of STATISTICS lies in serving notifications whenever a statistic of interest is modified. This is achieved by registering a watch through the function @code{GNUNET_STATISTICS_watch()}. The parameters of this function are similar to those of @code{GNUNET_STATISTICS_get()}. Changes to the respective statistic's value will then cause the given iterator callback to be called.

Note: A watch can only be registered for a specific statistic. Hence the subsystem name and the parameter name cannot be @code{NULL} in a call to @code{GNUNET_STATISTICS_watch()}.

A registered watch will keep notifying about value changes until @code{GNUNET_STATISTICS_watch_cancel()} is called with the same parameters that were used for registering the watch.

@node The STATISTICS Client-Service Protocol
@subsection The STATISTICS Client-Service Protocol

@menu
* Statistics retrieval2::
* Setting and updating statistics::
* Watching for updates::
@end menu

@node Statistics retrieval2
@subsubsection Statistics retrieval2

To retrieve statistics, the client transmits a message of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_GET} containing the given subsystem name and statistic parameter to the STATISTICS service. The service responds with a message of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_VALUE} for each of the statistics parameters that match the client request. The end of the retrieved information is signaled by the service by sending a message of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_END}.
@node Setting and updating statistics
@subsubsection Setting and updating statistics

The subsystem name, parameter name, its value and the persistence flag are communicated to the service through the message @code{GNUNET_MESSAGE_TYPE_STATISTICS_SET}.

When the service receives a message of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_SET}, it retrieves the subsystem name and checks for a statistic parameter whose name matches the name given in the message. If a statistic parameter is found, the value is overwritten by the new value from the message; if not, a new statistic parameter is created with the given name and value.

In addition to just setting an absolute value, it is possible to perform a relative update by sending a message of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_SET} with an update flag (@code{GNUNET_STATISTICS_SETFLAG_RELATIVE}) signifying that the value in the message should be treated as an update value.

@node Watching for updates
@subsubsection Watching for updates

The function @code{GNUNET_STATISTICS_watch()} registers the watch at the service by sending a message of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_WATCH}. The service then sends notifications through messages of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_WATCH_VALUE} whenever the statistic parameter's value is changed.

@cindex DHT
@cindex Distributed Hash Table
@node Distributed Hash Table (DHT)
@section Distributed Hash Table (DHT)

GNUnet includes a generic distributed hash table that can be used by developers building P2P applications in the framework. This section documents high-level features and how developers are expected to use the DHT. We have a research paper detailing how the DHT works. Also, Nate's thesis includes a detailed description and performance analysis (in chapter 6).
Key features of GNUnet's DHT include:

@itemize @bullet
@item stores key-value pairs with values up to (approximately) 63k in size
@item works with many underlay network topologies (small-world, random graph); the underlay does not need to be a full mesh / clique
@item support for extended queries (more than just a simple 'key'), filtering duplicate replies within the network (Bloom filter) and content validation (for details, please read the subsection on the block library)
@item can (optionally) return paths taken by the PUT and GET operations to the application
@item provides content replication to handle churn
@end itemize

GNUnet's DHT is randomized and unreliable. Unreliable means that there is no strict guarantee that a value stored in the DHT is always found --- values are only found with high probability. While this is somewhat true in all P2P DHTs, GNUnet developers should be particularly wary of this fact (this will help you write secure, fault-tolerant code). Thus, when writing any application using the DHT, you should always consider the possibility that a value stored in the DHT by you or some other peer might simply not be returned, or might be returned with a significant delay. Your application logic must be written to tolerate this (naturally, some loss of performance or quality of service is expected in this case).

@menu
* Block library and plugins::
* libgnunetdht::
* The DHT Client-Service Protocol::
* The DHT Peer-to-Peer Protocol::
@end menu

@node Block library and plugins
@subsection Block library and plugins

@menu
* What is a Block?::
* The API of libgnunetblock::
* Queries::
* Sample Code::
* Conclusion2::
@end menu

@node What is a Block?
@subsubsection What is a Block?

Blocks are small (< 63k) pieces of data stored under a key (@code{struct GNUNET_HashCode}). Blocks have a type (@code{enum GNUNET_BlockType}) which defines their data format. Blocks are used in GNUnet as units of static data exchanged between peers and stored (or cached) locally.
Uses of blocks include file-sharing (the files are broken up into blocks), the VPN (DNS information is stored in blocks) and the DHT (all information in the DHT and meta-information for the maintenance of the DHT are both stored using blocks). The block subsystem provides a few common functions that must be available for any type of block.

@cindex libgnunetblock API
@node The API of libgnunetblock
@subsubsection The API of libgnunetblock

The block library requires for each (family of) block type(s) a block plugin (implementing @file{gnunet_block_plugin.h}) that provides basic functions that are needed by the DHT (and possibly other subsystems) to manage the block. These block plugins are typically implemented within their respective subsystems. The main block library is then used to locate, load and query the appropriate block plugin. Which plugin is appropriate is determined by the block type (which is just a 32-bit integer). Block plugins contain code that specifies which block types are supported by a given plugin. The block library loads all block plugins that are installed at the local peer and forwards the application request to the respective plugin.

The central functions of the block APIs (plugin and main library) are to allow the mapping of blocks to their respective key (if possible) and the ability to check that a block is well-formed and matches a given request (again, if possible). This way, GNUnet can avoid storing invalid blocks, storing blocks under the wrong key and forwarding blocks in response to a query that they do not answer.

One key function of block plugins is that they allow GNUnet to detect duplicate replies (via the Bloom filter). All plugins MUST support detecting duplicate replies (by adding the current response to the Bloom filter and rejecting it if it is encountered again). If a plugin fails to do this, responses may loop in the network.
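The "add the response to the Bloom filter and reject it if it is encountered again" rule can be sketched in a few lines of C. This is an illustrative model only, not the block plugin API: @code{struct bloom_filter}, @code{bf_test}, @code{bf_add} and @code{check_and_record_reply} are hypothetical names, the filter size and number of hash rounds are arbitrary, and a simple FNV-1a style hash seeded with a mutator value stands in for whatever hash function the real implementation uses.

```c
#include <stddef.h>
#include <stdint.h>

#define BF_BITS 256u   /* filter size in bits (arbitrary for this sketch) */
#define BF_HASHES 4u   /* bits set per element (arbitrary for this sketch) */

struct bloom_filter
{
  uint8_t bits[BF_BITS / 8];
};

/* Stand-in hash (FNV-1a style), parameterized by the mutator and the
   hash round, mapping the data to a bit position in the filter. */
uint32_t
bf_bit_index (const void *data, size_t len, uint32_t mutator, uint32_t round)
{
  const uint8_t *bytes = data;
  uint32_t h = 2166136261u ^ mutator ^ (round * 0x9e3779b9u);

  for (size_t i = 0; i < len; i++)
  {
    h ^= bytes[i];
    h *= 16777619u;
  }
  return h % BF_BITS;
}

/* Returns 1 if all bits for the data are set (possibly a false
   positive), 0 if the data was definitely never added. */
int
bf_test (const struct bloom_filter *bf,
         const void *data, size_t len, uint32_t mutator)
{
  for (uint32_t k = 0; k < BF_HASHES; k++)
  {
    uint32_t bit = bf_bit_index (data, len, mutator, k);

    if (0 == (bf->bits[bit / 8] & (1u << (bit % 8))))
      return 0;
  }
  return 1;
}

/* Set all bits that correspond to the data. */
void
bf_add (struct bloom_filter *bf,
        const void *data, size_t len, uint32_t mutator)
{
  for (uint32_t k = 0; k < BF_HASHES; k++)
  {
    uint32_t bit = bf_bit_index (data, len, mutator, k);

    bf->bits[bit / 8] |= (uint8_t) (1u << (bit % 8));
  }
}

/* Returns 1 if the reply is new (and records it in the filter),
   0 if it is considered a duplicate and should be rejected. */
int
check_and_record_reply (struct bloom_filter *bf,
                        const void *reply, size_t len, uint32_t mutator)
{
  if (bf_test (bf, reply, len, mutator))
    return 0;
  bf_add (bf, reply, len, mutator);
  return 1;
}
```

The first time a reply passes through @code{check_and_record_reply} it is accepted and recorded; a second occurrence of the same reply is rejected, which is what prevents responses from looping. Because the test is probabilistic, a distinct reply can occasionally be rejected as well, which is why a different mutator value gives such replies a fresh chance.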
@node Queries
@subsubsection Queries

The query format for any block in GNUnet consists of four main components. First, the type of the desired block must be specified. Second, the query must contain a hash code. The hash code is used for lookups in hash tables and databases and need not be unique for the block (however, if possible a unique hash should be used as this would be best for performance). Third, an optional Bloom filter can be specified to exclude known results; replies that hash to the bits set in the Bloom filter are considered invalid. False positives can be eliminated by sending the same query again with a different Bloom filter mutator value, which parameterizes the hash function that is used. Finally, an optional application-specific "eXtended query" (xquery) can be specified to further constrain the results. It is entirely up to the type-specific plugin to determine whether or not a given block matches a query (type, hash, Bloom filter, and xquery). Naturally, not all xqueries are valid and some types of blocks may not support Bloom filters either, so the plugin also needs to check if the query is valid in the first place.

Depending on the results from the plugin, the DHT will then discard the (invalid) query, forward the query, discard the (invalid) reply, cache the (valid) reply, and/or forward the (valid and non-duplicate) reply.

@node Sample Code
@subsubsection Sample Code

The source code in @strong{plugin_block_test.c} is a good starting point for new block plugins; it does the minimal work by implementing a plugin that performs no validation at all. The respective @strong{Makefile.am} shows how to build and install a block plugin.

@node Conclusion2
@subsubsection Conclusion2

In conclusion, GNUnet subsystems that want to use the DHT need to define a block format and write a plugin to match queries and replies.
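
What such matching involves can be sketched with a standalone simulation of the four query components (type, hash, Bloom filter with mutator, xquery). Everything in it is an assumption made for illustration: the function names, the result strings, the 64-bit filter with a single hash function, and the rule that an xquery matches when the block starts with it. It is not the GNUnet block API. The filter is kept deliberately tiny so that false positives occur easily and the effect of changing the mutator becomes visible.

@example
```python
import hashlib

FILTER_BITS = 64  # deliberately tiny so that false positives are easy to provoke


def mutated_bit(mutator, block):
    """Map a block to one filter bit; the mutator parameterizes the hash."""
    block_hash = hashlib.sha512(block).digest()
    mixed = hashlib.sha512(mutator.to_bytes(4, "big") + block_hash).digest()
    return int.from_bytes(mixed[:4], "big") % FILTER_BITS


def build_filter(mutator, known_blocks):
    """Return the filter as an integer bit mask over the known blocks."""
    mask = 0
    for block in known_blocks:
        mask |= 1 << mutated_bit(mutator, block)
    return mask


def evaluate(query, block_type, key, block):
    """Classify a candidate reply against a query (type, key, filter, xquery)."""
    if block_type != query["type"] or key != query["key"]:
        return "MISMATCH"
    xquery = query.get("xquery")
    if xquery is not None and not block.startswith(xquery):
        return "MISMATCH"
    if query["filter"] & (1 << mutated_bit(query["mutator"], block)):
        return "DUPLICATE"  # a known reply, or a false positive
    return "OK"
```
@end example

A known reply is classified as @code{DUPLICATE} under every mutator, because the filter is rebuilt from the known replies each time. A fresh reply that is wrongly filtered under one mutator will usually pass once the query is repeated with another.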
For testing, the @code{GNUNET_BLOCK_TYPE_TEST} block type can be used; it accepts any query as valid and any reply as matching any query. This type is also used for the DHT command line tools. However, it should NOT be used for normal applications due to the lack of error checking that results from this primitive implementation. @cindex libgnunetdht @node libgnunetdht @subsection libgnunetdht The DHT API itself is pretty simple and offers the usual GET and PUT functions that work as expected. The specified block type refers to the block library which allows the DHT to run application-specific logic for data stored in the network. @menu * GET:: * PUT:: * MONITOR:: * DHT Routing Options:: @end menu @node GET @subsubsection GET When using GET, the main consideration for developers (other than the block library) should be that after issuing a GET, the DHT will continuously cause (small amounts of) network traffic until the operation is explicitly canceled. So GET does not simply send out a single network request once; instead, the DHT will continue to search for data. This is needed to achieve good success rates and also handles the case where the respective PUT operation happens after the GET operation was started. Developers should not cancel an existing GET operation and then explicitly re-start it to trigger a new round of network requests; this is simply inefficient, especially as the internal automated version can be more efficient, for example by filtering results in the network that have already been returned. If an application that performs a GET request has a set of replies that it already knows and would like to filter, it can call@ @code{GNUNET_DHT_get_filter_known_results} with an array of hashes over the respective blocks to tell the DHT that these results are not desired (any more). This way, the DHT will filter the respective blocks using the block library in the network, which may result in a significant reduction in bandwidth consumption. 
@node PUT
@subsubsection PUT

@c inconsistent use of ``must'' above it's written ``MUST''
In contrast to GET operations, developers @strong{must} manually re-run PUT operations periodically (if they intend the content to continue to be available). Content stored in the DHT expires or might be lost due to churn. Furthermore, GNUnet's DHT typically requires multiple rounds of PUT operations before a key-value pair is consistently available to all peers (the DHT randomizes paths and thus storage locations, and only after multiple rounds of PUTs will there be a sufficient number of replicas in large DHTs). An explicit PUT operation using the DHT API will only cause network traffic once, so in order to ensure basic availability and resistance to churn (and adversaries), PUTs must be repeated. While the exact frequency depends on the application, a rule of thumb is that there should be at least a dozen PUT operations within the content lifetime. Content in the DHT typically expires after one day, so DHT PUT operations should be repeated at least every 1-2 hours (24 hours divided by twelve PUTs gives one PUT every two hours).

@node MONITOR
@subsubsection MONITOR

The DHT API also allows applications to monitor messages crossing the local DHT service. The types of messages used by the DHT are GET, PUT and RESULT messages. Using the monitoring API, applications can choose to monitor these requests, possibly limiting themselves to requests for a particular block type.

The monitoring API is not only useful for diagnostics; it can also be used to trigger application operations based on PUT operations. For example, an application may use PUTs to distribute work requests to other peers. The workers would then monitor for PUTs that give them work, instead of looking for work using GET operations. This can be beneficial, especially if the workers have no good way to guess the keys under which work would be stored. Naturally, additional protocols might be needed to ensure that the desired number of workers will process the distributed workload.
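
The work-distribution pattern can be sketched as a filter over a stream of monitor events. The following standalone simulation does not use the DHT monitoring API: the @code{Event} record, the function names and the block type number 4242 are hypothetical. It only shows the idea of selecting PUT events of one block type and treating their payloads as work requests.

@example
```python
from collections import namedtuple

# Hypothetical event record; real monitor messages carry more fields.
Event = namedtuple("Event", "kind block_type key data")

WORK_BLOCK_TYPE = 4242  # hypothetical block type reserved for work requests


def matches(event, kinds, block_type=None):
    """Apply the monitor's filters: event kinds and an optional block type."""
    if event.kind not in kinds:
        return False
    return block_type is None or event.block_type == block_type


def collect_work(events):
    """Return the payloads of PUTs that carry work requests."""
    return [e.data for e in events if matches(e, ("PUT",), WORK_BLOCK_TYPE)]
```
@end example

A worker built this way never needs to know the keys in advance; it reacts to whatever PUTs of the chosen type cross its local peer.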
@node DHT Routing Options
@subsubsection DHT Routing Options

There are two important options for GET and PUT requests, followed by two that applications normally do not set:

@table @asis
@item GNUNET_DHT_RO_DEMULTIPLEX_EVERYWHERE
This option means that all peers should process the request, even if their peer ID is not closest to the key. For a PUT request, this means that all peers that a request traverses may make a copy of the data. Similarly for a GET request, all peers will check their local database for a result. Setting this option can thus significantly improve caching and reduce bandwidth consumption, at the expense of a larger DHT database. If in doubt, we recommend that this option should be used.

@item GNUNET_DHT_RO_RECORD_ROUTE
This option instructs the DHT to record the path that a GET or a PUT request is taking through the overlay network. The resulting paths are then returned to the application with the respective result. This allows the receiver of a result to construct a path to the originator of the data, which might then be used for routing. Naturally, setting this option requires additional bandwidth and disk space, so applications should only set this if the paths are needed by the application logic.

@item GNUNET_DHT_RO_FIND_PEER
This option is an internal option used by the DHT's peer discovery mechanism and should not be used by applications.

@item GNUNET_DHT_RO_BART
This option is currently not implemented. It may in the future offer performance improvements for clique topologies.
@end table

@node The DHT Client-Service Protocol
@subsection The DHT Client-Service Protocol

@menu
* PUTting data into the DHT::
* GETting data from the DHT::
* Monitoring the DHT::
@end menu

@node PUTting data into the DHT
@subsubsection PUTting data into the DHT

To store (PUT) data into the DHT, the client sends a @code{struct GNUNET_DHT_ClientPutMessage} to the service.
This message specifies the block type, routing options, the desired replication level, the expiration time, key, value and a 64-bit unique ID for the operation. The service responds with a @code{struct GNUNET_DHT_ClientPutConfirmationMessage} with the same 64-bit unique ID. Note that the service sends the confirmation as soon as it has locally processed the PUT request. The PUT may still be propagating through the network at this time. In the future, we may want to change this to provide (limited) feedback to the client, for example if we detect that the PUT operation had no effect because the same key-value pair was already stored in the DHT. However, changing this would also require additional state and messages in the P2P interaction. @node GETting data from the DHT @subsubsection GETting data from the DHT To retrieve (GET) data from the DHT, the client sends a @code{struct GNUNET_DHT_ClientGetMessage} to the service. The message specifies routing options, a replication level (for replicating the GET, not the content), the desired block type, the key, the (optional) extended query and unique 64-bit request ID. Additionally, the client may send any number of @code{struct GNUNET_DHT_ClientGetResultSeenMessage}s to notify the service about results that the client is already aware of. These messages consist of the key, the unique 64-bit ID of the request, and an arbitrary number of hash codes over the blocks that the client is already aware of. As messages are restricted to 64k, a client that already knows more than about a thousand blocks may need to send several of these messages. Naturally, the client should transmit these messages as quickly as possible after the original GET request such that the DHT can filter those results in the network early on. Naturally, as these messages are sent after the original request, it is conceivable that the DHT service may return blocks that match those already known to the client anyway. 
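
The figure of about a thousand blocks per message follows from simple arithmetic, sketched below as a standalone calculation. The hash size of 64 bytes corresponds to a 512-bit hash code; the exact message limit of 65535 bytes and the 80 bytes reserved for the fixed part of the message (header, key and request ID) are assumptions made for this sketch, as are the function names. The second function reflects the last point above: since the service may still deliver known blocks, a client should be prepared to discard them itself.

@example
```python
HASH_BYTES = 64            # one 512-bit hash code
MAX_MESSAGE_BYTES = 65535  # assumed exact value of the 64k limit
FIXED_BYTES = 80           # assumed size of header, key and 64-bit request ID


def hashes_per_message():
    """Number of hash codes that fit into one message."""
    return (MAX_MESSAGE_BYTES - FIXED_BYTES) // HASH_BYTES


def split_into_messages(known_hashes):
    """Split the known-result hashes into per-message batches."""
    step = hashes_per_message()
    return [known_hashes[i:i + step] for i in range(0, len(known_hashes), step)]


def is_new_result(seen_hashes, block_hash):
    """Client-side check; the service may still deliver known blocks."""
    if block_hash in seen_hashes:
        return False
    seen_hashes.add(block_hash)
    return True
```
@end example

Under these assumptions 1022 hash codes fit into one message, so a client that already knows 2500 blocks would send three messages.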
In response to a GET request, the service will send @code{struct GNUNET_DHT_ClientResultMessage}s to the client. These messages contain the block type, expiration, key, unique ID of the request and of course the value (a block). Depending on the options set for the respective operations, the replies may also contain the path the GET and/or the PUT took through the network. A client can stop receiving replies either by disconnecting or by sending a @code{struct GNUNET_DHT_ClientGetStopMessage} which must contain the key and the 64-bit unique ID of the original request. Using an explicit "stop" message is more common as this allows a client to run many concurrent GET operations over the same connection with the DHT service --- and to stop them individually. @node Monitoring the DHT @subsubsection Monitoring the DHT To begin monitoring, the client sends a @code{struct GNUNET_DHT_MonitorStartStop} message to the DHT service. In this message, flags can be set to enable (or disable) monitoring of GET, PUT and RESULT messages that pass through a peer. The message can also restrict monitoring to a particular block type or a particular key. Once monitoring is enabled, the DHT service will notify the client about any matching event using @code{struct GNUNET_DHT_MonitorGetMessage}s for GET events, @code{struct GNUNET_DHT_MonitorPutMessage} for PUT events and @code{struct GNUNET_DHT_MonitorGetRespMessage} for RESULTs. Each of these messages contains all of the information about the event. @node The DHT Peer-to-Peer Protocol @subsection The DHT Peer-to-Peer Protocol @menu * Routing GETs or PUTs:: * PUTting data into the DHT2:: * GETting data from the DHT2:: @end menu @node Routing GETs or PUTs @subsubsection Routing GETs or PUTs When routing GETs or PUTs, the DHT service selects a suitable subset of neighbours for forwarding. 
The exact number of neighbours can be zero or more and depends on the hop counter of the query (initially zero) in relation to the (log of) the network size estimate, the desired replication level and the peer's connectivity. Depending on the hop counter and our network size estimate, the selection of the peers may be randomized or by proximity to the key. Furthermore, requests include a set of peers that a request has already traversed; those peers are also excluded from the selection.

@node PUTting data into the DHT2
@subsubsection PUTting data into the DHT2

To PUT data into the DHT, the service sends a @code{struct PeerPutMessage} of type @code{GNUNET_MESSAGE_TYPE_DHT_P2P_PUT} to the respective neighbour. In addition to the usual information about the content (type, routing options, desired replication level for the content, expiration time, key and value), the message contains a fixed-size Bloom filter with information about which peers (may) have already seen this request. This Bloom filter is used to ensure that DHT messages never loop back to a peer that has already processed the request. Additionally, the message includes the current hop counter and, depending on the routing options, the message may include the full path that the message has taken so far. The Bloom filter should already contain the identity of the previous hop. The path, however, should not include the identity of the previous hop: the receiver appends the identity of the sender to the path, not its own identity (this is done to reduce bandwidth).

@node GETting data from the DHT2
@subsubsection GETting data from the DHT2

A peer can search the DHT by sending @code{struct PeerGetMessage}s of type @code{GNUNET_MESSAGE_TYPE_DHT_P2P_GET} to other peers.
In addition to the usual information about the request (type, routing options, desired replication level for the request, the key and the extended query), a GET request also contains a hop counter, a Bloom filter over the peers that have processed the request already and, depending on the routing options, the full path traversed by the GET. Finally, a GET request includes a variable-size second Bloom filter and a so-called Bloom filter mutator value which together indicate which replies the sender has already seen. During the lookup, each block that matches the block type, key and extended query is additionally subjected to a test against this Bloom filter. The block plugin is expected to take the hash of the block, combine it with the mutator value and check that the result is not yet in the Bloom filter. The originator of the query will from time to time modify the mutator to (eventually) allow false positives filtered by the Bloom filter to be returned.

Peers that receive a GET request perform a local lookup (depending on their proximity to the key and the query options) and forward the request to other peers. They then remember the request (including the Bloom filter for blocking duplicate results), and when they obtain a matching, non-filtered response, a @code{struct PeerResultMessage} of type @code{GNUNET_MESSAGE_TYPE_DHT_P2P_RESULT} is forwarded to the previous hop. Whenever a result is forwarded, the block plugin is used to update the Bloom filter accordingly, to ensure that the same result is never forwarded more than once. The DHT service may also cache forwarded results locally if the "CACHE_RESULTS" option is set to "YES" in the configuration.

@cindex GNS
@cindex GNU Name System
@node GNU Name System (GNS)
@section GNU Name System (GNS)

The GNU Name System (GNS) is a decentralized database that enables users to securely resolve names to values.
Names can be used to identify other users (for example, in social networking), or network services (for example, VPN services running at a peer in GNUnet, or purely IP-based services on the Internet). Users interact with GNS by typing in a hostname that ends in a top-level domain that is configured in the ``GNS'' section, matches an identity of the user or ends in a Base32-encoded public key.

Videos giving an overview of most of GNS and the motivations behind it are available here and here. The remainder of this chapter targets developers that are familiar with the high-level concepts of GNS as presented in these talks.

@c TODO: Add links to here and here and to these.

GNS-aware applications should use the GNS resolver to obtain the respective records that are stored under that name in GNS. Each record consists of a type, value, expiration time and flags.

The type specifies the format of the value. Types below 65536 correspond to DNS record types; larger values are used for GNS-specific records. Applications can define new GNS record types by reserving a number and implementing a plugin (which mostly needs to convert the binary value representation to a human-readable text format and vice-versa). The expiration time specifies how long the record is to be valid. The GNS API ensures that applications are only given non-expired values. The flags are typically irrelevant for applications, as GNS uses them internally to control visibility and validity of records.

Records are stored along with a signature. The signature is generated using the private key of the authoritative zone. This allows any GNS resolver to verify the correctness of a name-value mapping.

Internally, GNS uses the NAMECACHE to cache information obtained from other users, the NAMESTORE to store information specific to the local users, and the DHT to exchange data between users. A plugin API is used to enable applications to define new GNS record types.
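
The structure of a record and the two rules stated above (the split of the type number space at 65536 and the filtering of expired values) can be sketched as follows. This is a standalone illustration, not the GNS record API: the @code{Record} layout, the function names and the use of plain integers for time stamps are assumptions.

@example
```python
from collections import namedtuple

# Field names follow the text; the concrete layout is an assumption.
Record = namedtuple("Record", "record_type value expiration flags")

DNS_TYPE_LIMIT = 65536  # types below this correspond to DNS record types


def is_dns_type(record_type):
    """True for DNS record types, False for GNS-specific ones."""
    return record_type < DNS_TYPE_LIMIT


def unexpired(records, now):
    """Keep only records whose expiration time lies in the future."""
    return [r for r in records if r.expiration > now]
```
@end example

With this model, a record of type 1 (the DNS type number for A records) counts as a DNS record, while any type number of 65536 or above is GNS-specific.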
@menu * libgnunetgns:: * libgnunetgnsrecord:: * GNS plugins:: * The GNS Client-Service Protocol:: * Hijacking the DNS-Traffic using gnunet-service-dns:: * Serving DNS lookups via GNS on W32:: * Importing DNS Zones into GNS:: @end menu @node libgnunetgns @subsection libgnunetgns The GNS API itself is extremely simple. Clients first connect to the GNS service using @code{GNUNET_GNS_connect}. They can then perform lookups using @code{GNUNET_GNS_lookup} or cancel pending lookups using @code{GNUNET_GNS_lookup_cancel}. Once finished, clients disconnect using @code{GNUNET_GNS_disconnect}. @menu * Looking up records:: * Accessing the records:: * Creating records:: * Future work:: @end menu @node Looking up records @subsubsection Looking up records @code{GNUNET_GNS_lookup} takes a number of arguments: @table @asis @item handle This is simply the GNS connection handle from @code{GNUNET_GNS_connect}. @item name The client needs to specify the name to be resolved. This can be any valid DNS or GNS hostname. @item zone The client needs to specify the public key of the GNS zone against which the resolution should be done. Note that a key must be provided, the client should look up plausible values using its configuration, the identity service and by attempting to interpret the TLD as a base32-encoded public key. @item type This is the desired GNS or DNS record type to look for. While all records for the given name will be returned, this can be important if the client wants to resolve record types that themselves delegate resolution, such as CNAME, PKEY or GNS2DNS. Resolving a record of any of these types will only work if the respective record type is specified in the request, as the GNS resolver will otherwise follow the delegation and return the records from the respective destination, instead of the delegating record. @item only_cached This argument should typically be set to @code{GNUNET_NO}. Setting it to @code{GNUNET_YES} disables resolution via the overlay network. 
@item shorten_zone_key
If GNS encounters new names during resolution, their respective zones can automatically be learned and added to the "shorten zone". If this is desired, clients must pass the private key of the shorten zone. If NULL is passed, shortening is disabled.

@item proc
This argument identifies the function to call with the result. It is given proc_cls, the number of records found (possibly zero) and the array of the records as arguments. proc will only be called once. After proc has been called, the lookup must no longer be canceled.

@item proc_cls
The closure for proc.
@end table

@node Accessing the records
@subsubsection Accessing the records

The @code{libgnunetgnsrecord} library provides an API to manipulate the GNS record array that is given to proc. In particular, it offers functions such as converting record values to human-readable strings (and back). However, most @code{libgnunetgnsrecord} functions are not interesting to GNS client applications.

For DNS records, the @code{libgnunetdnsparser} library provides functions for parsing (and serializing) common types of DNS records.

@node Creating records
@subsubsection Creating records

Creating GNS records is typically done by building the respective record information (possibly with the help of @code{libgnunetgnsrecord} and @code{libgnunetdnsparser}) and then using the @code{libgnunetnamestore} to publish the information. The GNS API is not involved in this operation.

@node Future work
@subsubsection Future work

In the future, we want to expand @code{libgnunetgns} to allow applications to observe shortening operations performed during GNS resolution, for example so that users can receive visual feedback when this happens.

@node libgnunetgnsrecord
@subsection libgnunetgnsrecord

The @code{libgnunetgnsrecord} library is used to manipulate GNS records (in plaintext or in their encrypted format).
Applications mostly interact with @code{libgnunetgnsrecord} by using the functions to convert GNS record values to strings or vice-versa, or to look up a GNS record type number by name (or vice-versa). The library also provides various other functions that are mostly used internally within GNS, such as converting keys to names, checking for expiration, encrypting GNS records to GNS blocks, verifying GNS block signatures and decrypting GNS records from GNS blocks.

We will now discuss the four commonly used functions of the API.@ @code{libgnunetgnsrecord} does not perform these operations itself, but instead uses plugins to perform the operation. GNUnet includes plugins to support common DNS record types as well as standard GNS record types.

@menu
* Value handling::
* Type handling::
@end menu

@node Value handling
@subsubsection Value handling

@code{GNUNET_GNSRECORD_value_to_string} can be used to convert the (binary) representation of a GNS record value to a human-readable, 0-terminated UTF-8 string. NULL is returned if the specified record type is not supported by any available plugin.

@code{GNUNET_GNSRECORD_string_to_value} can be used to try to convert a human-readable string to the respective (binary) representation of a GNS record value.

@node Type handling
@subsubsection Type handling

@code{GNUNET_GNSRECORD_typename_to_number} can be used to obtain the numeric value associated with a given typename. For example, given the typename "A" (for DNS A records), the function will return the number 1. A list of common DNS record types is @uref{http://en.wikipedia.org/wiki/List_of_DNS_record_types, here}. Note that not all DNS record types are supported by GNUnet GNSRECORD plugins at this time.

@code{GNUNET_GNSRECORD_number_to_typename} can be used to obtain the typename associated with a given numeric value. For example, given the type number 1, the function will return the typename "A".
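
The two type-handling functions behave like a pair of lookups that are delegated to plugins. The standalone sketch below models each plugin as a table from typename to number and asks the plugins in turn. It does not call @code{libgnunetgnsrecord}; the function names and the convention of returning @code{None} for unknown types are assumptions of the sketch. The numbers in the table are the standard DNS type numbers.

@example
```python
# Standard DNS type numbers; a real plugin would cover its own types.
DNS_TYPES = (
    ("A", 1), ("NS", 2), ("CNAME", 5), ("SOA", 6),
    ("MX", 15), ("TXT", 16), ("AAAA", 28), ("SRV", 33),
)
DNS_PLUGIN = dict(DNS_TYPES)


def typename_to_number(name, plugins):
    """Ask each plugin in turn; None means no plugin knows the name."""
    for plugin in plugins:
        number = plugin.get(name)
        if number is not None:
            return number
    return None


def number_to_typename(number, plugins):
    """Reverse lookup; None means no plugin knows the number."""
    for plugin in plugins:
        for name, value in plugin.items():
            if value == number:
                return name
    return None
```
@end example

Adding support for a further record type then amounts to adding another table to the list of plugins, which mirrors how new record types are introduced through plugins rather than through changes to the library.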
@node GNS plugins
@subsection GNS plugins

Adding a new GNS record type typically involves writing (or extending) a GNSRECORD plugin. The plugin needs to implement the @code{gnunet_gnsrecord_plugin.h} API which provides basic functions that are needed by GNSRECORD to convert typenames and values of the respective record type to strings (and back). These gnsrecord plugins are typically implemented within their respective subsystems. Examples for such plugins can be found in the GNSRECORD, GNS and CONVERSATION subsystems.

The @code{libgnunetgnsrecord} library is then used to locate, load and query the appropriate gnsrecord plugin. Which plugin is appropriate is determined by the record type (which is just a 32-bit integer). The @code{libgnunetgnsrecord} library loads all gnsrecord plugins that are installed at the local peer and forwards the application request to the plugins. If the record type is not supported by the plugin, it should simply return an error code.

The central functions of the gnsrecord APIs (plugin and main library) are the same four functions for converting between values and strings, and typenames and numbers documented in the previous subsection.

@node The GNS Client-Service Protocol
@subsection The GNS Client-Service Protocol

The GNS client-service protocol consists of two simple messages, the @code{LOOKUP} message and the @code{LOOKUP_RESULT}. Each @code{LOOKUP} message contains a unique 32-bit identifier, which will be included in the corresponding response. Thus, clients can send many lookup requests in parallel and receive responses out-of-order. A @code{LOOKUP} request also includes the public key of the GNS zone, the desired record type and fields specifying whether shortening is enabled or networking is disabled. Finally, the @code{LOOKUP} message includes the name to be resolved.

The response includes the number of records and the records themselves in the format created by @code{GNUNET_GNSRECORD_records_serialize}.
They can thus be deserialized using @code{GNUNET_GNSRECORD_records_deserialize}. @node Hijacking the DNS-Traffic using gnunet-service-dns @subsection Hijacking the DNS-Traffic using gnunet-service-dns This section documents how the gnunet-service-dns (and the gnunet-helper-dns) intercepts DNS queries from the local system. This is merely one method for how we can obtain GNS queries. It is also possible to change @code{resolv.conf} to point to a machine running @code{gnunet-dns2gns} or to modify libc's name system switch (NSS) configuration to include a GNS resolution plugin. The method described in this chapter is more of a last-ditch catch-all approach. @code{gnunet-service-dns} enables intercepting DNS traffic using policy based routing. We MARK every outgoing DNS-packet if it was not sent by our application. Using a second routing table in the Linux kernel these marked packets are then routed through our virtual network interface and can thus be captured unchanged. Our application then reads the query and decides how to handle it. If the query can be addressed via GNS, it is passed to @code{gnunet-service-gns} and resolved internally using GNS. In the future, a reverse query for an address of the configured virtual network could be answered with records kept about previous forward queries. Queries that are not hijacked by some application using the DNS service will be sent to the original recipient. The answer to the query will always be sent back through the virtual interface with the original nameserver as source address. 
@menu
* Network Setup Details::
@end menu

@node Network Setup Details
@subsubsection Network Setup Details

The DNS interceptor adds the following rules to the Linux kernel:

@example
iptables -t mangle -I OUTPUT 1 -p udp --sport $LOCALPORT --dport 53 \
  -j ACCEPT
iptables -t mangle -I OUTPUT 2 -p udp --dport 53 -j MARK \
  --set-mark 3
ip rule add fwmark 3 table 2
ip route add default via $VIRTUALDNS table 2
@end example

The first rule makes sure that all packets coming from a port our application opened beforehand (@code{$LOCALPORT}) will be routed normally. The second rule marks every other packet to a DNS server with mark 3 (chosen arbitrarily). The third command adds a routing policy that sends packets carrying mark 3 to the second routing table, and the fourth command adds a default route via @code{$VIRTUALDNS} to that table.

@node Serving DNS lookups via GNS on W32
@subsection Serving DNS lookups via GNS on W32

This section documents how the libw32nsp (and gnunet-gns-helper-service-w32) do DNS resolutions of DNS queries on the local system. This only applies to GNUnet running on W32.

W32 has a concept of "Namespaces" and "Namespace providers". These are used to present various name systems to applications in a generic way. Namespaces include DNS, mDNS, NLA and others. For each namespace any number of providers could be registered, and they are queried in an order of priority (which is adjustable).

Applications can resolve names by using the WSALookupService*() family of functions.

However, these are WSA-only facilities. Common BSD socket functions for namespace resolutions are gethostbyname and getaddrinfo (among others). These functions are implemented internally (by default by mswsock, which also implements the default DNS provider) as wrappers around WSALookupService*() functions (see "Sample Code for a Service Provider" on MSDN).
On W32, GNUnet builds libw32nsp, a namespace provider, which can then be installed into the system by using w32nsp-install (and uninstalled by w32nsp-uninstall), as described in the "Installation Handbook".

libw32nsp is very simple and has almost no dependencies. As a response to NSPLookupServiceBegin(), it only checks that the provider GUID passed to it by the caller matches the GNUnet DNS Provider GUID, then connects to gnunet-gns-helper-service-w32 at 127.0.0.1:5353 (hardcoded) and sends the name resolution request there, returning the connected socket to the caller.

When the caller invokes NSPLookupServiceNext(), libw32nsp reads a completely formed reply from that socket, unmarshalls it, then gives it back to the caller.

At the moment gnunet-gns-helper-service-w32 is implemented to only ever give one reply, and subsequent calls to NSPLookupServiceNext() will fail with WSA_NODATA (the first call to NSPLookupServiceNext() might also fail if GNS failed to find the name, or there was an error connecting to it).

gnunet-gns-helper-service-w32 does most of the processing:

@itemize @bullet
@item Maintains a connection to GNS.
@item Reads the GNS configuration and loads the appropriate keys.
@item Checks the service GUID and decides on the type of record to look up, refusing to make a lookup outright when an unsupported service GUID is passed.
@item Launches the lookup.
@end itemize

When the lookup result arrives, gnunet-gns-helper-service-w32 forms a complete reply (including filling a WSAQUERYSETW structure and, possibly, a binary blob with a hostent structure for a gethostbyname() client), marshalls it, and sends it back to libw32nsp. If no records were found, it sends an empty header.

This works for most normal applications that use gethostbyname() or getaddrinfo() to resolve names, but fails to do anything with applications that use alternative means of resolving names (such as sending queries to a DNS server directly by themselves).
This includes some well-known utilities, like "ping" and "nslookup".

@node Importing DNS Zones into GNS
@subsection Importing DNS Zones into GNS

This section discusses the challenges and problems faced when writing the Ascension tool. It also takes a look at possible improvements in the future.

Consider the following diagram that shows the workflow of Ascension:

@image{images/ascension_ssd,6in,,Ascension's workflow}

Furthermore, the interaction between components of GNUnet is shown in the diagram below:

@center @image{images/ascension_interaction,,6in,Ascension's workflow}

@menu
* Conversions between DNS and GNS::
* DNS Zone Size::
* Performance::
@end menu

@cindex DNS Conversion
@node Conversions between DNS and GNS
@subsubsection Conversions between DNS and GNS

The differences between the two name systems lie in the details and are not always transparent. For instance, an SRV record is converted to a BOX record, which is unique to GNS.

An existing SRV record is converted to a BOX record as follows:

@example
# SRV
# _service._proto.name. TTL class SRV priority weight port target
_sip._tcp.example.com. 14000 IN SRV 0 0 5060 www.example.com.
# BOX
# TTL BOX flags port protocol recordtype priority weight port target
14000 BOX n 5060 6 33 0 0 5060 www.example.com
@end example

Other record types that need to undergo such a transformation are the MX record type and the SOA record type.

Transformation of a SOA record into GNS works as described in the following example. Very important to note are the rname and mname keys.

@example
# BIND syntax for a clean SOA record
@@   IN SOA master.example.com. hostmaster.example.com. (
    2017030300 ; serial
    3600       ; refresh
    1800       ; retry
    604800     ; expire
    600 )      ; ttl
# Recordline for adding the record
$ gnunet-namestore -z example.com -a -n @@ -t SOA -V \
    rname=master.example.com mname=hostmaster.example.com \
    2017030300,3600,1800,604800,600 -e 7200s
@end example

The transformation of MX records is done in a simple way.
@example
# mail.example.com. 3600 IN MX 10 mail.example.com.
$ gnunet-namestore -z example.com -n mail -R 3600 MX n 10,mail
@end example

Finally, one of the biggest stumbling blocks was the NS records found in top-level domain zones. The intended behaviour is to add GNS2DNS records for them so that gnunet-gns can resolve records for those domains on its own. These records require the values from DNS GLUE records, provided they are within the same zone.

The following two examples, both in the 'com' TLD, show one NS record that has a GLUE record and one that does not.

@example
# ns1.example.com 86400 IN A 127.0.0.1
# example.com 86400 IN NS ns1.example.com.
$ gnunet-namestore -z com -n example -R 86400 GNS2DNS n \
    example.com@@127.0.0.1

# example.com 86400 IN NS ns1.example.org.
$ gnunet-namestore -z com -n example -R 86400 GNS2DNS n \
    example.com@@ns1.example.org
@end example

As you can see, one of the GNS2DNS records has an IP address listed and the other one a DNS name. For the first one there is a GLUE record to do the translation directly, and the second one will issue another DNS query to figure out the IP of ns1.example.org.

A solution was found by creating a hierarchical zone structure in GNS and linking the zones to one another using PKEY records. This allows the resolution of the name servers to work within GNS while not taking control over unwanted zones.

Currently the following record types are supported:

@itemize @bullet
@item A
@item AAAA
@item CNAME
@item MX
@item NS
@item SRV
@item TXT
@end itemize

This is not due to technical limitations but rather practical ones. The problem occurs with DNSSEC-enabled DNS zones. As records within those zones are signed periodically, and every new signature is an update to the zone, there are many revisions of each zone. This is a problem for bigger zones, as many records are signed again without any substantive change.
Also, adding records of unknown types that require a different format takes time, as each of them causes a CLI call of the namestore. Furthermore, certain record types need to be transformed into a GNS-compatible format which, depending on the record type, takes more time.

Further, a blacklist was added to drop, for instance, DNSSEC-related records. If a record type is neither in the whitelist nor in the blacklist, this is considered a loss of data and a message is shown to the user. This helps with transparency and also with contributing, as the unsupported record types can then be added accordingly.

@node DNS Zone Size
@subsubsection DNS Zone Size

Another very big problem exists with very large zones. When migrating a small zone, the delay between adding records and their expiry is negligible. However, when working with big zones that easily have more than a few million records, this delay becomes a problem.

Records will start to expire well before the zone has finished migrating. This is usually not a problem but can cause a high CPU load when a peer is restarted and the records have expired.

A good solution has not been found yet. One of the ideas that floated around was to add the records with the s (shadow) flag to keep them resolvable even after they have expired. However, this would introduce the problem of how to detect that a record has been removed from the zone, and it would require deletion of said record(s).

Another problem that still persists is how to refresh records. Expired records are still displayed when calling gnunet-namestore but do not resolve with gnunet-gns. Zonemaster will sign the expired records again and make sure that the records are still valid. A recent change to gnunet-gns improved the suffix lookup, which allows for a fast lookup even with thousands of local egos.

Currently the pace of adding records is in general around 10 records per second. The cryptographic operations set the upper limit on the rate at which records can be added.
The performance of your machine can be tested with the perf_crypto_* tools. There is still a big discrepancy between the pace of Ascension and the theoretical limit.

A performance metric for measuring improvements has not yet been implemented in Ascension.

@node Performance
@subsubsection Performance

The performance when migrating a zone using the Ascension tool is limited by a handful of factors. First of all, Ascension is written in Python3 and calls the CLI tools of GNUnet. This is comparable to a fork and exec call, which costs a few CPU cycles. Furthermore, all the records that are added to the same label are signed using the zone's private key. This signing operation is very resource heavy and was optimized during development by adding the '-R' (Recordline) option to gnunet-namestore, which allows specifying multiple records in a single call of the CLI tool. Assuming that in a TLD zone every domain has at least two name servers, this halves the number of signatures needed.

Another improvement that could be made is the addition of multiple threads or the use of asynchronous subprocesses when opening the GNUnet CLI tools. This could be implemented by simply creating more workers in the program, but the resulting performance improvements were not tested.

Ascension was tested using different hardware and database backends. Performance differences between SQLite and PostgreSQL are marginal and almost non-existent. What did make a huge impact on record adding performance was the storage medium. On a traditional mechanical hard drive, adding records was slow compared to a solid state disk.

In conclusion, there are still many bottlenecks in the program, namely the single-threaded implementation and the inefficient, sequential calls of gnunet-namestore. In the future, a solution that uses the C API would be cleaner and better.
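
The effect of the '-R' option on the number of signatures can be illustrated with a small Python sketch. It only builds the command lines and does not run them. The function name and the record tuples are hypothetical and this is not Ascension's actual code; the flags -z, -n and -R are the ones used in the examples of this section. Whether a recordline must be quoted as a single shell argument is not covered here.

```python
from collections import OrderedDict


def build_namestore_commands(zone, records):
    """Group records by label and emit one gnunet-namestore call per label.

    records is an iterable of (label, ttl, rtype, flags, value) tuples.
    A label is signed once per call, so putting all records of a label
    into one call saves signatures.
    """
    by_label = OrderedDict()
    for label, ttl, rtype, flags, value in records:
        recordline = "%d %s %s %s" % (ttl, rtype, flags, value)
        by_label.setdefault(label, []).append(recordline)

    commands = []
    for label, recordlines in by_label.items():
        parts = ["gnunet-namestore", "-z", zone, "-n", label]
        for recordline in recordlines:
            parts.extend(["-R", recordline])
        commands.append(" ".join(parts))
    return commands


# Hypothetical sample data: two MX records share the label "mail".
commands = build_namestore_commands("example.com", [
    ("mail", 3600, "MX", "n", "10,mail"),
    ("mail", 3600, "MX", "n", "20,mail2"),
    ("www", 3600, "A", "n", "127.0.0.1"),
])
for command in commands:
    print(command)
```

Three records result in two calls, and therefore two signing operations, instead of three.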
@cindex GNS Namecache
@node GNS Namecache
@section GNS Namecache

The NAMECACHE subsystem is responsible for caching (encrypted) resolution results of the GNU Name System (GNS). GNS makes zone information available to other users via the DHT. However, as accessing the DHT for every lookup is expensive (and as the DHT's local cache is lost whenever the peer is restarted), GNS uses the NAMECACHE as a more persistent cache for DHT lookups. Thus, instead of always looking up every name in the DHT, GNS first checks if the result is already available locally in the NAMECACHE. GNS queries the DHT only if there is no result in the NAMECACHE. The NAMECACHE stores data in the same (encrypted) format as the DHT. It thus makes no sense to iterate over all items in the NAMECACHE, because the NAMECACHE does not have a way to provide the keys required to decrypt the entries.

Blocks in the NAMECACHE share the same expiration mechanism as blocks in the DHT: a block expires whenever any of the records in the (encrypted) block expires. The expiration time of the block is the only information stored in plaintext. The NAMECACHE service internally performs all of the required work to expire blocks; clients do not have to worry about this. Also, given that NAMECACHE stores only GNS blocks that local users requested, there is no configuration option to limit the size of the NAMECACHE. It is assumed to be always small enough (a few MB) to fit on the drive.

The NAMECACHE supports the use of different database backends via a plugin API.

@menu
* libgnunetnamecache::
* The NAMECACHE Client-Service Protocol::
* The NAMECACHE Plugin API::
@end menu

@node libgnunetnamecache
@subsection libgnunetnamecache

The NAMECACHE API consists of five simple functions. First, there is @code{GNUNET_NAMECACHE_connect} to connect to the NAMECACHE service. This returns the handle required for all other operations on the NAMECACHE.
Using @code{GNUNET_NAMECACHE_block_cache} clients can insert a block into the cache. @code{GNUNET_NAMECACHE_lookup_block} can be used to lookup blocks that were stored in the NAMECACHE. Both operations can be canceled using @code{GNUNET_NAMECACHE_cancel}. Note that canceling a @code{GNUNET_NAMECACHE_block_cache} operation can result in the block being stored in the NAMECACHE --- or not. Cancellation primarily ensures that the continuation function with the result of the operation will no longer be invoked. Finally, @code{GNUNET_NAMECACHE_disconnect} closes the connection to the NAMECACHE. The maximum size of a block that can be stored in the NAMECACHE is @code{GNUNET_NAMECACHE_MAX_VALUE_SIZE}, which is defined to be 63 kB. @node The NAMECACHE Client-Service Protocol @subsection The NAMECACHE Client-Service Protocol All messages in the NAMECACHE IPC protocol start with the @code{struct GNUNET_NAMECACHE_Header} which adds a request ID (32-bit integer) to the standard message header. The request ID is used to match requests with the respective responses from the NAMECACHE, as they are allowed to happen out-of-order. @menu * Lookup:: * Store:: @end menu @node Lookup @subsubsection Lookup The @code{struct LookupBlockMessage} is used to lookup a block stored in the cache. It contains the query hash. The NAMECACHE always responds with a @code{struct LookupBlockResponseMessage}. If the NAMECACHE has no response, it sets the expiration time in the response to zero. Otherwise, the response is expected to contain the expiration time, the ECDSA signature, the derived key and the (variable-size) encrypted data of the block. @node Store @subsubsection Store The @code{struct BlockCacheMessage} is used to cache a block in the NAMECACHE. It has the same structure as the @code{struct LookupBlockResponseMessage}. The service responds with a @code{struct BlockCacheResponseMessage} which contains the result of the operation (success or failure). 
In the future, we might want to make it possible to provide an error message as well.

@node The NAMECACHE Plugin API
@subsection The NAMECACHE Plugin API

The NAMECACHE plugin API consists of two functions, @code{cache_block} to store a block in the database, and @code{lookup_block} to look up a block in the database.

@menu
* Lookup2::
* Store2::
@end menu

@node Lookup2
@subsubsection Lookup2

The @code{lookup_block} function is expected to return at most one block to the iterator, and return @code{GNUNET_NO} if there were no non-expired results. If there are multiple non-expired results in the cache, the lookup is supposed to return the result with the largest expiration time.

@node Store2
@subsubsection Store2

The @code{cache_block} function is expected to try to store the block in the database, and return @code{GNUNET_SYSERR} if this was not possible for any reason. Furthermore, @code{cache_block} is expected to implicitly perform cache maintenance and purge blocks from the cache that have expired. Note that @code{cache_block} might encounter the case where the database already has another block stored under the same key. In this case, the plugin must ensure that the block with the larger expiration time is preserved. This can be done either by simply adding new blocks and selecting the largest expiration time during lookup, or by checking which block expires later during the store operation.

@cindex REVOCATION Subsystem
@node REVOCATION Subsystem
@section REVOCATION Subsystem

The REVOCATION subsystem is responsible for key revocation of Egos. If users learn that their private key has been compromised, or if they have lost it, they can use the REVOCATION system to inform all of the other users that their private key is no longer valid. The subsystem thus includes ways to query for the validity of keys and to propagate revocation messages.
@menu * Dissemination:: * Revocation Message Design Requirements:: * libgnunetrevocation:: * The REVOCATION Client-Service Protocol:: * The REVOCATION Peer-to-Peer Protocol:: @end menu @node Dissemination @subsection Dissemination When a revocation is performed, the revocation is first of all disseminated by flooding the overlay network. The goal is to reach every peer, so that when a peer needs to check if a key has been revoked, this will be purely a local operation where the peer looks at its local revocation list. Flooding the network is also the most robust form of key revocation --- an adversary would have to control a separator of the overlay graph to restrict the propagation of the revocation message. Flooding is also very easy to implement --- peers that receive a revocation message for a key that they have never seen before simply pass the message to all of their neighbours. Flooding can only distribute the revocation message to peers that are online. In order to notify peers that join the network later, the revocation service performs efficient set reconciliation over the sets of known revocation messages whenever two peers (that both support REVOCATION dissemination) connect. The SET service is used to perform this operation efficiently. @node Revocation Message Design Requirements @subsection Revocation Message Design Requirements However, flooding is also quite costly, creating O(|E|) messages on a network with |E| edges. Thus, revocation messages are required to contain a proof-of-work, the result of an expensive computation (which, however, is cheap to verify). Only peers that have expended the CPU time necessary to provide this proof will be able to flood the network with the revocation message. This ensures that an attacker cannot simply flood the network with millions of revocation messages. 
The proof-of-work required by GNUnet is set to take days on a typical PC to compute; if the ability to quickly revoke a key is needed, users have the option to pre-compute revocation messages to store off-line and use instantly after their key has expired. Revocation messages must also be signed by the private key that is being revoked. Thus, they can only be created while the private key is in the possession of the respective user. This is another reason to create a revocation message ahead of time and store it in a secure location. @node libgnunetrevocation @subsection libgnunetrevocation The REVOCATION API consists of two parts, to query and to issue revocations. @menu * Querying for revoked keys:: * Preparing revocations:: * Issuing revocations:: @end menu @node Querying for revoked keys @subsubsection Querying for revoked keys @code{GNUNET_REVOCATION_query} is used to check if a given ECDSA public key has been revoked. The given callback will be invoked with the result of the check. The query can be canceled using @code{GNUNET_REVOCATION_query_cancel} on the return value. @node Preparing revocations @subsubsection Preparing revocations It is often desirable to create a revocation record ahead-of-time and store it in an off-line location to be used later in an emergency. This is particularly true for GNUnet revocations, where performing the revocation operation itself is computationally expensive and thus is likely to take some time. Thus, if users want the ability to perform revocations quickly in an emergency, they must pre-compute the revocation message. The revocation API enables this with two functions that are used to compute the revocation message, but not trigger the actual revocation operation. @code{GNUNET_REVOCATION_check_pow} should be used to calculate the proof-of-work required in the revocation message. 
This function takes the public key, the required number of bits for the proof of work (which in GNUnet is a network-wide constant) and finally a proof-of-work number as arguments. The function then checks if the given proof-of-work number is a valid proof of work for the given public key. Clients preparing a revocation are expected to call this function repeatedly (typically with a monotonically increasing sequence of proof-of-work numbers) until a given number satisfies the check. That number should then be saved for later use in the revocation operation.

@code{GNUNET_REVOCATION_sign_revocation} is used to generate the signature that is required in a revocation message. It takes the private key that (possibly in the future) is to be revoked and returns the signature. The signature can again be saved to disk for later use, which will then allow performing a revocation even without access to the private key.

@node Issuing revocations
@subsubsection Issuing revocations

Given an ECDSA public key, the signature from @code{GNUNET_REVOCATION_sign_revocation} and the proof-of-work, @code{GNUNET_REVOCATION_revoke} can be used to perform the actual revocation. The given callback is called upon completion of the operation. @code{GNUNET_REVOCATION_revoke_cancel} can be used to stop the library from calling the continuation; however, in that case it is undefined whether or not the revocation operation will be executed.

@node The REVOCATION Client-Service Protocol
@subsection The REVOCATION Client-Service Protocol

The REVOCATION protocol consists of four simple messages.

A @code{QueryMessage} containing a public ECDSA key is used to check if a particular key has been revoked. The service responds with a @code{QueryResponseMessage} which simply contains a bit that says if the given public key is still valid, or if it has been revoked.

The second possible interaction is for a client to revoke a key by passing a @code{RevokeMessage} to the service.
The @code{RevokeMessage} contains the ECDSA public key to be revoked, a signature by the corresponding private key and the proof-of-work. The service responds with a @code{RevocationResponseMessage}, which either indicates that the @code{RevokeMessage} was invalid (for example, because the proof-of-work is incorrect) or that the revocation has been processed successfully.

@node The REVOCATION Peer-to-Peer Protocol
@subsection The REVOCATION Peer-to-Peer Protocol

Revocation uses two disjoint ways to spread revocation information among peers. First of all, P2P gossip exchanged via CORE-level neighbours is used to quickly spread revocations to all connected peers. Second, whenever two peers (that both support revocations) connect, the SET service is used to compute the union of the respective revocation sets.

In both cases, the exchanged messages are @code{RevokeMessage}s which contain the public key that is being revoked, a matching ECDSA signature, and a proof-of-work. Whenever a peer learns about a new revocation this way, it first validates the signature and the proof-of-work, then stores it to disk (typically to a file @file{$GNUNET_DATA_HOME/revocation.dat}) and finally spreads the information to all directly connected neighbours.

For computing the union using the SET service, the peer with the smaller hashed peer identity will connect (as a "client" in the two-party set protocol) to the other peer after one second (to reduce traffic spikes on connect) and initiate the computation of the set union. All revocation services use a common hash to identify the SET operation over revocation sets.

The current implementation accepts revocation set union operations from all peers at any time; however, well-behaved peers should only initiate this operation once after establishing a connection to a peer with a larger hashed peer identity.
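
The proof-of-work search described under "Preparing revocations", which is also the check a peer repeats before accepting a @code{RevokeMessage}, can be sketched in Python as follows. This is only an illustration under assumptions: SHA-512 from the standard library stands in for GNUnet's actual proof-of-work hash function, the input encoding and the helper names are hypothetical, and the difficulty is kept small so that the search finishes immediately.

```python
import hashlib


def leading_zero_bits(digest):
    """Count the zero bits at the start of a byte string."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        # 8 - bit_length() is the number of leading zero bits in this byte.
        count += 8 - byte.bit_length()
        break
    return count


def check_pow(public_key, pow_number, matching_bits):
    """Return True if pow_number is a valid proof for public_key.

    Stand-in check: the hash of (number, key) must start with at
    least matching_bits zero bits.
    """
    data = pow_number.to_bytes(8, "big") + public_key
    digest = hashlib.sha512(data).digest()
    return leading_zero_bits(digest) >= matching_bits


def find_pow(public_key, matching_bits):
    """Try 0, 1, 2, ... until a number satisfies check_pow."""
    pow_number = 0
    while not check_pow(public_key, pow_number, matching_bits):
        pow_number += 1
    return pow_number


key = b"hypothetical-public-key"
proof = find_pow(key, 10)
print(check_pow(key, proof, 10))
```

Each additional required bit doubles the expected number of attempts in the search, while verifying a given number always costs a single hash computation.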
@cindex FS @cindex FS Subsystem @node File-sharing (FS) Subsystem @section File-sharing (FS) Subsystem This chapter describes the details of how the file-sharing service works. As with all services, it is split into an API (libgnunetfs), the service process (gnunet-service-fs) and user interface(s). The file-sharing service uses the datastore service to store blocks and the DHT (and indirectly datacache) for lookups for non-anonymous file-sharing. Furthermore, the file-sharing service uses the block library (and the block fs plugin) for validation of DHT operations. In contrast to many other services, libgnunetfs is rather complex since the client library includes a large number of high-level abstractions; this is necessary since the Fs service itself largely only operates on the block level. The FS library is responsible for providing a file-based abstraction to applications, including directories, meta data, keyword search, verification, and so on. The method used by GNUnet to break large files into blocks and to use keyword search is called the "Encoding for Censorship Resistant Sharing" (ECRS). ECRS is largely implemented in the fs library; block validation is also reflected in the block FS plugin and the FS service. ECRS on-demand encoding is implemented in the FS service. NOTE: The documentation in this chapter is quite incomplete. @menu * Encoding for Censorship-Resistant Sharing (ECRS):: * File-sharing persistence directory structure:: @end menu @cindex ECRS @cindex Encoding for Censorship-Resistant Sharing @node Encoding for Censorship-Resistant Sharing (ECRS) @subsection Encoding for Censorship-Resistant Sharing (ECRS) When GNUnet shares files, it uses a content encoding that is called ECRS, the Encoding for Censorship-Resistant Sharing. Most of ECRS is described in the (so far unpublished) research paper attached to this page. ECRS obsoletes the previous ESED and ESED II encodings which were used in GNUnet before version 0.7.0. 
The rest of this page assumes that the reader is familiar with the attached paper. What follows is a description of some minor extensions that GNUnet makes over what is described in the paper. The reason why these extensions are not in the paper is that we felt that they were obvious or trivial extensions to the original scheme and thus did not warrant space in the research report. @menu * Namespace Advertisements:: * KSBlocks:: @end menu @node Namespace Advertisements @subsubsection Namespace Advertisements @c %**FIXME: all zeroses -> ? An @code{SBlock} with identifier all zeros is a signed advertisement for a namespace. This special @code{SBlock} contains metadata describing the content of the namespace. Instead of the name of the identifier for a potential update, it contains the identifier for the root of the namespace. The URI should always be empty. The @code{SBlock} is signed with the content provider's RSA private key (just like any other SBlock). Peers can search for @code{SBlock}s in order to find out more about a namespace. @node KSBlocks @subsubsection KSBlocks GNUnet implements @code{KSBlocks} which are @code{KBlocks} that, instead of encrypting a CHK and metadata, encrypt an @code{SBlock} instead. In other words, @code{KSBlocks} enable GNUnet to find @code{SBlocks} using the global keyword search. Usually the encrypted @code{SBlock} is a namespace advertisement. The rationale behind @code{KSBlock}s and @code{SBlock}s is to enable peers to discover namespaces via keyword searches, and, to associate useful information with namespaces. When GNUnet finds @code{KSBlocks} during a normal keyword search, it adds the information to an internal list of discovered namespaces. Users looking for interesting namespaces can then inspect this list, reducing the need for out-of-band discovery of namespaces. 
Naturally, namespaces (or more specifically, namespace advertisements) can also be referenced from directories, but @code{KSBlock}s should make it easier to advertise namespaces for the owner of the pseudonym since they eliminate the need to first create a directory.

Collections are also advertised using @code{KSBlock}s.

@c https://old.gnunet.org/sites/default/files/ecrs.pdf

@node File-sharing persistence directory structure
@subsection File-sharing persistence directory structure

This section documents how the file-sharing library implements persistence of file-sharing operations and specifically the resulting directory structure. This code is only active if the @code{GNUNET_FS_FLAGS_PERSISTENCE} flag was set when calling @code{GNUNET_FS_start}. In this case, the file-sharing library will try hard to ensure that all major operations (searching, downloading, publishing, unindexing) are persistent, that is, can live longer than the process itself. More specifically, an operation is supposed to live until it is explicitly stopped.

If @code{GNUNET_FS_stop} is called before an operation has been stopped, a @code{SUSPEND} event is generated and then when the process calls @code{GNUNET_FS_start} next time, a @code{RESUME} event is generated. Additionally, even if an application crashes (segfault, SIGKILL, system crash) and hence @code{GNUNET_FS_stop} is never called and no @code{SUSPEND} events are generated, operations are still resumed (with @code{RESUME} events). This is implemented by constantly writing the current state of the file-sharing operations to disk. Specifically, the current state is always written to disk whenever anything significant changes (the exception is block-wise progress in publishing and unindexing, since those operations would be slowed down significantly and can be resumed cheaply even without detailed accounting).
Note that if the process crashes (or is killed) during a serialization operation, FS does not guarantee that this specific operation is recoverable (no strict transactional semantics, again for performance reasons). However, all other unrelated operations should resume nicely. Since we need to serialize the state continuously and want to recover as much as possible even after crashing during a serialization operation, we do not use one large file for serialization. Instead, several directories are used for the various operations. When @code{GNUNET_FS_start} executes, the master directories are scanned for files describing operations to resume. Sometimes, these operations can refer to related operations in child directories which may also be resumed at this point. Note that corrupted files are cleaned up automatically. However, dangling files in child directories (those that are not referenced by files from the master directories) are not automatically removed. Persistence data is kept in a directory that begins with the "STATE_DIR" prefix from the configuration file (by default, "$SERVICEHOME/persistence/") followed by the name of the client as given to @code{GNUNET_FS_start} (for example, "gnunet-gtk") followed by the actual name of the master or child directory. The names for the master directories follow the names of the operations: @itemize @bullet @item "search" @item "download" @item "publish" @item "unindex" @end itemize Each of the master directories contains names (chosen at random) for each active top-level (master) operation. Note that a download that is associated with a search result is not a top-level operation. In contrast to the master directories, the child directories are only consulted when another operation refers to them. For each search, a subdirectory (named after the master search synchronization file) contains the search results. Search results can have an associated download, which is then stored in the general "download-child" directory. 
Downloads can be recursive, in which case children are stored in subdirectories mirroring the structure of the recursive download (either starting in the master "download" directory or in the "download-child" directory depending on how the download was initiated). For publishing operations, the "publish-file" directory contains information about the individual files and directories that are part of the publication. However, this directory structure is flat and does not mirror the structure of the publishing operation. Note that unindex operations cannot have associated child operations.

@cindex REGEX subsystem
@node REGEX Subsystem
@section REGEX Subsystem

Using the REGEX subsystem, you can discover peers that offer a particular service using regular expressions. The peers that offer a service specify it using regular expressions. Peers that want to patronize a service search using a string. The REGEX subsystem will then use the DHT to return a set of matching offerers to the patrons.

For the technical details, we have Max's defense talk and Max's Master's thesis.

@c An additional publication is under preparation and available to
@c team members (in Git).
@c FIXME: Where is the file? Point to it. Assuming that it's szengel2012ms

@menu
* How to run the regex profiler::
@end menu

@node How to run the regex profiler
@subsection How to run the regex profiler

The gnunet-regex-profiler can be used to profile the usage of mesh/regex for a given set of regular expressions and strings. Mesh/regex allows you to announce your peer ID under a certain regex and search for peers matching a particular regex using a string. See @uref{https://old.gnunet.org/szengel2012ms, szengel2012ms} for a full introduction.

First of all, the regex profiler uses GNUnet testbed, thus all the implications for testbed also apply to the regex profiler (for example, you need password-less ssh login to the machines listed in your hosts file).
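
The announce and search semantics described above can be illustrated with a purely local Python sketch. It is not how the REGEX subsystem is implemented: GNUnet distributes the matching over the DHT, whereas this example simply tests every announced expression with Python's @code{re} module. The peer names and expressions are made up; the prefix mirrors the @code{REGEX_PREFIX} of the example configuration in this section.

```python
import re


def find_offerers(announcements, search_string):
    """Return the peers whose announced regex matches the whole string.

    announcements maps a peer name to the list of regular expressions
    announced by that peer.
    """
    matches = []
    for peer, expressions in sorted(announcements.items()):
        for expression in expressions:
            if re.fullmatch(expression, search_string):
                matches.append(peer)
                break
    return matches


# Hypothetical announcements of three peers.
announcements = dict(
    peer_a=["GNVPN-0001-PAD(0|1)*"],
    peer_b=["GNVPN-0001-PAD1(0|1)(0|1)"],
    peer_c=["OTHER-(a|b)+"],
)
print(find_offerers(announcements, "GNVPN-0001-PAD101"))
```

A patron only supplies the string; the set of offerers is determined entirely by the announced expressions.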
@strong{Configuration} Moreover, an appropriate configuration file is needed. Generally you can refer to the @file{contrib/regex_profiler_infiniband.conf} file in the sourcecode of GNUnet for an example configuration. In the following paragraph the important details are highlighted. Announcing of the regular expressions is done by the gnunet-daemon-regexprofiler, therefore you have to make sure it is started, by adding it to the START_ON_DEMAND set of ARM: @example [regexprofiler] START_ON_DEMAND = YES @end example @noindent Furthermore you have to specify the location of the binary: @example [regexprofiler] # Location of the gnunet-daemon-regexprofiler binary. BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler # Regex prefix that will be applied to all regular expressions and # search string. REGEX_PREFIX = "GNVPN-0001-PAD" @end example @noindent When running the profiler with a large scale deployment, you probably want to reduce the workload of each peer. Use the following options to do this. @example [dht] # Force network size estimation FORCE_NSE = 1 [dhtcache] DATABASE = heap # Disable RC-file for Bloom filter? (for benchmarking with limited IO # availability) DISABLE_BF_RC = YES # Disable Bloom filter entirely DISABLE_BF = YES [nse] # Minimize proof-of-work CPU consumption by NSE WORKBITS = 1 @end example @noindent @strong{Options} To finally run the profiler some options and the input data need to be specified on the command line. @example gnunet-regex-profiler -c config-file -d log-file -n num-links \ -p path-compression-length -s search-delay -t matching-timeout \ -a num-search-strings hosts-file policy-dir search-strings-file @end example @noindent Where... @itemize @bullet @item ... @code{config-file} means the configuration file created earlier. @item ... @code{log-file} is the file where to write statistics output. @item ... @code{num-links} indicates the number of random links between started peers. @item ... 
@code{path-compression-length} is the maximum path compression length in the DFA. @item ... @code{search-delay} time to wait between peers finished linking and starting to match strings. @item ... @code{matching-timeout} timeout after which to cancel the searching. @item ... @code{num-search-strings} number of strings in the search-strings-file. @item ... the @code{hosts-file} should contain a list of hosts for the testbed, one per line in the following format: @itemize @bullet @item @code{user@@host_ip:port} @end itemize @item ... the @code{policy-dir} is a folder containing text files containing one or more regular expressions. A peer is started for each file in that folder and the regular expressions in the corresponding file are announced by this peer. @item ... the @code{search-strings-file} is a text file containing search strings, one in each line. @end itemize @noindent You can create regular expressions and search strings for every AS in the Internet using the attached scripts. You need one of the @uref{http://data.caida.org/datasets/routing/routeviews-prefix2as/, CAIDA routeviews prefix2as} data files for this. Run @example create_regex.py @end example @noindent to create the regular expressions and @example create_strings.py @end example @noindent to create a search strings file from the previously created regular expressions. @cindex REST subsystem @node REST Subsystem @section REST Subsystem Using the REST subsystem, you can expose REST-based APIs or services. The REST service is designed as a pluggable architecture. To create a new REST endpoint, simply add a library in the form ``plugin_rest_*''. The REST service will automatically load all REST plugins on startup. @strong{Configuration} The REST service can be configured in various ways. 
The reference config file can be found in @file{src/rest/rest.conf}:

@example
[rest]
REST_PORT=7776
REST_ALLOW_HEADERS=Authorization,Accept,Content-Type
REST_ALLOW_ORIGIN=*
REST_ALLOW_CREDENTIALS=true
@end example

The port as well as the @dfn{cross-origin resource sharing} (CORS)
headers that the REST service advertises are configurable.

@menu
* Namespace considerations::
* Endpoint documentation::
@end menu

@node Namespace considerations
@subsection Namespace considerations

The @command{gnunet-rest-service} will load all plugins that are
installed. It is therefore important that the endpoint namespaces do
not clash.

For example, plugin X might expose the endpoint ``/xxx'' while plugin Y
exposes endpoint ``/xxx/yyy''.
This is a problem if plugin X is also supposed to handle a call
to ``/xxx/yyy''.
Currently the REST service will not complain or warn about such clashes,
so please make sure that endpoints are unambiguous.

@node Endpoint documentation
@subsection Endpoint documentation

This is a work in progress. Endpoints should be documented
appropriately, preferably using annotations.

@cindex RPS Subsystem
@node RPS Subsystem
@section RPS Subsystem

In the literature, Random Peer Sampling (RPS) refers to the problem of
reliably drawing random samples from an unstructured p2p network.

Doing so reliably is hard, not only because of inherent problems but
also because malicious peers may try to bias the selection.

It is useful for all kinds of gossip protocols that require the
selection of random peers from the whole network, for example for
gathering statistics, spreading and aggregating information in the
network, load balancing and overlay topology management.

The approach chosen in the rps implementation in GNUnet follows the
@uref{https://bib.gnunet.org/full/date.html\#2009_5f0, Brahms} design.

The current state is "work in progress".
Much remains to be done, primarily finishing the experimental
evaluation and re-designing the API.

The abstract idea is to connect to (or start) the rps service and
request random peers, which are returned once they represent a random
selection from the whole network with high probability.

A feature added to the original Brahms design is the selection of
sub-groups: the GNUnet implementation of rps enables clients to ask for
random peers from a group that is defined by a common shared secret.
(The secret could of course also be public, depending on the use-case.)

A second addition to the original protocol concerns the sampler: the
sampler mechanism introduced in Brahms was slightly adapted and is used
to sample the peers that are returned to the client.
This is necessary because the original design only keeps peers
connected to random other peers in the network. For the peers returned
to different client requests to be independently random, they cannot
simply be drawn from the connected peers.
The adapted sampler makes sure that each request for random peers is
independent of the others.

@node Brahms
@subsection Brahms

The high-level concept of Brahms is two-fold: combining push-pull
gossip with locally fixing an assumed bias using cryptographic min-wise
permutations.
The central data structure is the view, a peer's current local sample.
This view is used to select peers to push to and pull from.
This simple mechanism can be biased easily. For this reason Brahms
'fixes' the bias by using the so-called sampler, a data structure that
takes a list of elements as input and outputs a random one of them
independently of how often each element occurs in the input.
An element that was put into the sampler a single time and an element
that was put into it a million times have the same probability of being
the output.
This is achieved by exploiting min-wise independent permutations.
In rps we use HMACs: when a sampler element is initialised, a key is
chosen at random. For each input, the HMAC under that key is computed,
and the sampler element keeps the input with the minimal HMAC.

In order to fix the bias in the view, a fraction of the elements in the
view are sampled through the sampler from the random stream of peer IDs.

According to the theoretical analysis of Bortnikov et al., this suffices
to keep the network connected and to keep random peers in the view.
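
The sampler element described above can be sketched in a few lines of
Python. This is only an illustration of the min-wise idea, not the
GNUnet implementation: the names @code{SamplerElement} and
@code{sample} are hypothetical, and the use of SHA-256 as the hash
function inside the HMAC is an assumption made for this sketch.

@example
import hashlib
import hmac
import os


class SamplerElement:
    """Keeps the one input whose keyed HMAC is smallest."""

    def __init__(self, key=None):
        # Fresh random key per sampler element. Passing a key explicitly
        # is only meant for reproducible tests (assumption of this sketch).
        self.key = key if key is not None else os.urandom(32)
        self.peer_id = None
        self.min_mac = None

    def update(self, peer_id):
        """Feed one peer ID (bytes) from the input stream."""
        # SHA-256 is an assumption; the text above only says "HMAC".
        mac = hmac.new(self.key, peer_id, hashlib.sha256).digest()
        # Keep the input with the minimal HMAC. A repeated input yields
        # the same HMAC and therefore never replaces the current minimum.
        if self.min_mac is None or mac < self.min_mac:
            self.min_mac = mac
            self.peer_id = peer_id


def sample(stream, key=None):
    """Run a whole stream of peer IDs through one sampler element."""
    element = SamplerElement(key)
    for peer_id in stream:
        element.update(peer_id)
    return element.peer_id


if __name__ == "__main__":
    key = os.urandom(32)
    ids = [b"peer-a", b"peer-b", b"peer-c"]
    # A stream flooded with one ID yields the same output as a fair one.
    flooded = [b"peer-a"] * 1000 + ids
    assert sample(ids, key) == sample(flooded, key)
@end example

@noindent
Because only the minimal HMAC is kept, the output depends on the set of
distinct peer IDs seen and on the random key, not on how often or in
which order the IDs arrived.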