* Architecture

Secure Share intends to implement a scalable P2P social network that enables real-time one-to-one, one-to-many and many-to-many message distribution for applications using the network, while fulfilling the privacy requirements described in the previous chapter. The first prototype version provides private and group messaging, status updates and profiles, and keeps the protocol extensible so that various social applications can be built on top of it later.

By combining PSYC with a P2P network architecture we get an efficient and extensible protocol provided by PSYC, together with the security and privacy properties provided by the underlying P2P network.

** P2P network architecture

Many P2P networks use an architecture where nodes connect to arbitrary peers and no trust relation exists between them. One problem with this approach is that some nodes could use more resources of the network than they contribute to it (freeloaders), which can be alleviated by applying an economic model in the network. For instance, GNUnet uses an excess-based economy: an idle node does favors for free, but a busy node only works for nodes it likes and charges them for the favors they request, which they can pay back by doing a favor in return.

Another problem that can arise in this architecture is malicious nodes performing various active attacks, including blocking access to parts of the network or returning false information in response to certain requests. These attacks can be avoided to some extent by randomized routing and by making it harder to create new identities in the network.

The approach we use instead is a friend-to-friend (F2F) architecture, where nodes only connect to friendly peers whom they trust. This has the advantage that it avoids many attacks involving malicious nodes in the network: an attacker has to infiltrate a user's social circle to perform a successful attack, which is much harder.
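
As a minimal sketch of the F2F rule described above, the following Python snippet admits a connection only if the peer's public key appears in a locally maintained friend list. The =FriendList= class and the string keys are hypothetical illustrations and are not part of GNUnet or Secure Share.

```python
class FriendList:
    """Hypothetical set of trusted peer public keys kept by an F2F node."""

    def __init__(self, trusted_keys=()):
        self._trusted = set(trusted_keys)

    def add_friend(self, public_key):
        """Record a peer as trusted, e.g. after a manual key exchange."""
        self._trusted.add(public_key)

    def accepts(self, public_key):
        """F2F rule: connect only to peers the user explicitly trusts."""
        return public_key in self._trusted


# Placeholder strings stand in for real public keys (assumption).
friends = FriendList(["alice-key", "bob-key"])
print(friends.accepts("alice-key"))    # a known friend
print(friends.accepts("mallory-key"))  # a stranger
```

In this sketch a stranger is rejected by default, so an attacker would first have to get their key into the friend list, which corresponds to infiltrating the user's social circle.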
By adding a trust level metric to social connections we can further differentiate between more and less trusted nodes in the network. An F2F architecture also gives better incentives to participants in the network: users help their friends, rather than random strangers, by forwarding packets for them. Nodes with high bandwidth and no connection restrictions -- e.g. server machines in data centers -- can improve throughput and connectivity in the network by serving their owner's social circle.

Other systems based on an F2F architecture include Freenet \cite{dark-freenet}, Drac \cite{drac} and Tonika; GNUnet has an F2F mode as well.

** Structure of the network

Another aspect of P2P networks is whether they are structured or not. In structured networks the structure of the network is predefined: the node ID determines the position of the node in the network, and this information is enough to route packets to their destination. Structured P2P networks often use a distributed hash table (DHT), which provides hash table functionality distributed over many nodes in the network. A different approach is an unstructured network like the Internet, where arbitrary nodes can connect and no structure is imposed upon the nodes. In this case a routing table is needed to route a packet to its destination.

A social network could be built purely using a DHT; LifeSocial \cite{lifesocial} is an example of such a network. In this case every shared status message, image or document would become an entry in the DHT, and a profile would consist of a collection of links to other DHT entries. To ensure that only the intended recipients have access to private data, DHT entries are encrypted with a symmetric key, which is attached to the entry once for each user who should have access, encrypted with that user's public key. This means that there is no forward secrecy in this network: if a user's private key is compromised, all these entries can still be decrypted with that key.
Even if the compromise is noticed in time, re-encrypting all entries affected by a compromised key is a costly operation once the number of entries has grown after years of use.

For our case either an unstructured network is suitable, or a structured network where the structure is only used for routing and not for storing user data in a DHT. In our architecture data is pushed once to recipients, who store it locally for as long as they need it. This means that all profile data, messages and received files are available locally -- even offline -- and can be viewed and searched using local tools on the personal device.

** Software components

In a P2P network every user runs the P2P software on their devices, so it is important that it is multi-platform, lightweight and written in a compiled language. This way we can easily run it on all popular desktop platforms and on small devices as well, including plug computers, home routers and even smartphones.

In our case the P2P software runs as a daemon -- a background process -- on the local machine or on another device on the network. Client applications connect to this daemon and integrate into the desktop or mobile GUI environment running on the system. Server machines, home routers and plug computers act as intermediary nodes in the system, helping their owners' social network by forwarding packets for them.

Mobile phones require a different approach. Continuous network usage would drain the battery quite fast, so we have to minimize it by disabling packet forwarding for mobile nodes and connecting only to a trusted node with good connectivity -- e.g. a server machine or a plug computer at home -- which would forward the necessary packets for the mobile node.

** Peer-to-peer framework

We have examined various P2P systems looking for an implementation that can serve as a basis for our social messaging platform. The criteria for a suitable P2P framework were:

- Free/libre/open-source software.
- Multi-platform, lightweight and written in a compiled language.
- Implements and provides an API for essential P2P features such as bootstrapping, addressing, routing, encryption and NAT traversal.

We have found GNUnet to be the most promising implementation satisfying these requirements. It is a modular P2P framework written in C that provides an API for essential P2P functionality. It supports advanced NAT (Network Address Translation) traversal, which enables contacting nodes without a public IP address, as typically found in home or corporate networks. Furthermore, it has several transport mechanisms with automatic transport selection, including TCP, UDP, HTTP(S), SMTP and ad-hoc WiFi mesh networks. It also provides various routing schemes and a distributed hash table. It has three operation modes: in P2P mode it makes connections with any peer in the network, in friend-to-friend (F2F) mode it only connects to trusted nodes, and in mixed mode a minimum number of trusted nodes are required to be connected at all times.

GNUnet currently has two options for routing packets in the network: the distance vector service and the mesh service.

The distance vector (DV) service uses a fish-eye bounded distance vector protocol \cite{gnunet-decrouting}, which builds a routing table by gossiping about neighboring peers within a limited number of hops. It is a link-state routing protocol with improved efficiency: nodes only know about the state of a local neighborhood, and the link state of nodes close to each other is updated more often than that of nodes multiple hops away. The DV service also provides onion routing of packets through multiple hops, which improves network connectivity by connecting two peers behind NAT through an intermediary hop, and makes it harder for an observer to determine who is talking to whom.

The mesh service creates tunnels through several hops and supports multicast as well. Initial routes to recipients are discovered using the DHT.
The mesh service is still being heavily worked on by the GNUnet team; for instance, encryption is missing and has to be implemented for the multicast groups to make it useful for our purpose.

These routing methods only support delivery of packets to connected nodes. In order to provide offline messaging, we need a store-and-forward mechanism in the network. This can be implemented by storing encrypted packets on more stable nodes in the network until the recipient comes back online.

#+BEGIN_COMMENT
GNUnet's DHT component can be used for facilitating the bootstrapping process by storing mappings from user public keys to current node IDs in the DHT. This allows peers that were offline for a longer period to look up the current node of a contact in order to re-establish their connection to the network. It can also be used to publish addresses of nodes hosting public groups or providing a public news feed.
#+END_COMMENT

GNUnet also has an anonymous file sharing component which uses a DHT together with the GNUnet Anonymity Protocol (GAP). For our use case -- transferring files between friends -- this is not needed. Instead, we transfer files just like other messages, using PSYC's multicast distribution channels. As the PSYC packet syntax supports binary data without any encoding, this causes no additional overhead. In order to transfer files, we would have to split them up into smaller fragments, as the maximum packet size supported by GNUnet is 64 KB.

#+CAPTION: Components and message flow in GNUnet
#+LABEL: fig:arch
#+ATTR_LaTeX: width=8.2cm placement=[h!]
[[./gnunet.png]]

** Messaging daemon

GNUnet's modular architecture allows us to extend it with a service that implements a messaging protocol, manages the connections between people, and provides a local client interface. This service -- called psycd -- uses the PSYC protocol for communication with both other peers and local clients.
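
The file fragmentation mentioned earlier, needed because of GNUnet's 64 KB packet limit, can be sketched as follows. This is a simplified illustration under two assumptions that the text does not state: that the messaging daemon would be the component doing the splitting, and that the whole 64 KB is available as payload (a real packet would also carry headers). The function names are hypothetical.

```python
MAX_PACKET_SIZE = 64 * 1024  # packet size limit stated in the text


def fragment(data, max_payload=MAX_PACKET_SIZE):
    """Split a byte string into consecutive chunks of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]


def reassemble(fragments):
    """Join fragments back together, assuming they arrive complete and in order."""
    return b"".join(fragments)


payload = bytes(150 * 1024)  # a 150 KB dummy file filled with zero bytes
parts = fragment(payload)
print(len(parts), [len(p) for p in parts])
```

A real transfer would additionally need fragment numbering and loss handling, which this sketch leaves out.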
Psycd sends messages through GNUnet core, which encrypts each message and passes it to the modular transport system, which in turn sends packets through one of its transport plugins.

In our prototype we use direct connections to peers. Users manually add their friends by exchanging hello messages, which contain their public key and current addresses. For the prototype version the focus was on the implementation of the messaging daemon; we intend to work on the underlying routing mechanism in future versions.

See figure \ref{fig:arch} for an illustration of the components used in the system. Dotted parts do not exist yet and are only planned. The arrows depict the flow of messages between components.

** Functionality

One of the core concepts of PSYC is programmable channels with their own subscription lists. Combined with custom user interfaces, this makes it possible to implement the usual functionality found in centralized and federated social networks, like private and group messages, status updates, and photo and link sharing, as well as features not found in those networks, like sharing of files and custom content, or real-time notifications for custom events.

As Secure Share runs on the users' own devices and stores all incoming messages and data locally, it enables offline usage and local search in the data received from subscribed friends or groups.
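
To illustrate the idea of a channel with its own subscription list combined with local storage, here is a minimal Python sketch. The =Channel= and =LocalStore= classes are hypothetical and do not reflect PSYC's actual channel API or packet format. They only show a message being pushed once to each subscriber, stored locally, and searched offline.

```python
class LocalStore:
    """Hypothetical per-device store of received messages."""

    def __init__(self):
        self.messages = []

    def receive(self, channel_name, text):
        self.messages.append((channel_name, text))

    def search(self, term):
        """Case-insensitive local search over stored message texts."""
        term = term.lower()
        return [m for m in self.messages if term in m[1].lower()]


class Channel:
    """Hypothetical channel that keeps its own subscription list."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, store):
        if store not in self.subscribers:
            self.subscribers.append(store)

    def publish(self, text):
        # Push the message once to every subscriber, who keeps a local copy.
        for store in self.subscribers:
            store.receive(self.name, text)


status = Channel("alice/status")
bob, carol = LocalStore(), LocalStore()
status.subscribe(bob)
status.publish("Hiking this weekend")
print(bob.search("hiking"))
print(carol.search("hiking"))  # carol never subscribed
```

Because each subscriber holds its own copy, the search runs entirely on local data and needs no network access.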