* Requirements and Related Work

This chapter describes our requirements for a system that we can use to build a
secure social network and introduces currently available alternatives to
centralized social networks. This chapter is partly based on \cite{fsw-paranoia}.

** Privacy

Our goal is to provide a system for social interaction in a privacy-protecting
and scalable manner. The truly private communication system we are aiming for
should have the following properties:

- End-to-end encryption: only the intended recipients can read the messages;
  neither servers nor network operators along the way between the
  communicating parties can. Link-level encryption between a client and a
  server is not enough to ensure this. Messages have to be encrypted end to
  end, which means that every participant in the system has to manage their own
  cryptographic keys on their own systems.
- Perfect forward secrecy: messages transmitted over the network cannot be
  decrypted later, even if a user's private key is compromised. To achieve
  this, temporary session keys need to be used when encrypting messages.
- A message logged to disk should not contain a cryptographic signature of the
  sender, so if someone gains access to the log, it does not prove that the
  sender actually transmitted the message.
- An observer should not be able to determine with certainty when two parties
  are communicating and how much data they exchange with each other. This
  requires a trade-off: routing packets through other participants in the
  network would ensure this, but it also increases message delay.
- Padding of packets is necessary to prevent attacks based on statistical
  analysis of packet lengths. This is essential when sending messages through
  multiple hops; otherwise, monitoring packet lengths would be enough to
  determine where a packet is forwarded.
- Delayed forwarding is also necessary to prevent correlation of received and
  transmitted packets when forwarding. Sending multiple packets at once at
  certain intervals helps to prevent this.
- Private contact list: visible only to those who need to see it -- typically
  other friends -- and neither available publicly nor managed on servers where
  server operators have access to it.
- Every component of the system should be open source, so that anyone can
  verify that it really works as advertised. A closed component would be a
  security risk, as it could leak information or otherwise weaken the security
  of the system, which is harder to detect when no source code is available.
  This can be enforced with a copyleft license, such as the Affero General
  Public License (AGPL).

Currently available alternatives to centralized social network services are in
most cases federated networks, which use a standardized protocol between
servers, enabling many service providers to take part in the network and
communicate with each other. Examples of such systems include web-based
platforms like Diaspora or Friendica, and others that extend a messaging
protocol such as XMPP (Extensible Messaging and Presence Protocol) or PSYC
(Protocol for SYnchronous Conferencing) with social network functionality --
friendship establishment, status messages to friends -- like OneSocialWeb,
which is based on XMPP.

These federated systems intend to offer more privacy than centralized systems,
but they still do not fulfill most of the requirements above: in most cases they
only provide link-level encryption, and they store personal data on servers
unencrypted, just like centralized systems. Users can run a server themselves,
but that requires server administration skills which average users do not have,
so the network ends up with a few large servers and several smaller ones, just
as in the case of email.
Privacy is an even more serious issue in this case, as it is no longer enough
to trust one company: several server operators in this architecture share
personal data with each other, and users' messages and profile data are
transmitted to and stored unencrypted on the servers of their friends as well.
Even users who run their own server still communicate with people who do not,
exposing their personal data to even more server operators.

It is possible to enhance the privacy of these federated protocols by adding
end-to-end encryption on top of them; this is what PGP (Pretty Good Privacy)
does for e-mail and OTR (Off-The-Record Messaging) does for instant messaging
protocols. While this prevents servers from reading the content of messages,
they still know everything else about a message, e.g. its sender, recipient,
and size. Base64 encoding, which is needed because the underlying messaging
protocols often do not support binary data transfer, adds further overhead of
about a third of the message size. Furthermore, PGP and OTR can only be used for
one-to-one messaging; they do not support one-to-many or many-to-many
messaging.

** Scalability

Efficient message distribution is crucial in social networks, as one of their
most prevalent features is sending one-to-many status updates, and many-to-many
group messaging is frequently used as well. To deliver these messages most
efficiently, multicast message distribution would be necessary. IP multicast
does not scale to a large number of channels, as multicast routing tables would
fill up very fast -- at least one channel would be needed for each user's status
updates, and similarly at least one for each group -- so multicast has to be
implemented at the application layer instead.

XMPP has a simple distribution strategy: it sends one copy of a message per
recipient server, which is only efficient if recipients are concentrated on
large sites.
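A minimal Python sketch makes this distribution argument concrete: it counts how
many copies of a single status update leave the sender's side when one copy is
sent per recipient, versus one copy per recipient server as XMPP does. The
`user@server` address format, the example domains and all function names are
hypothetical and chosen only for illustration.

```python
def server_of(address: str) -> str:
    """Return the server part of a hypothetical 'user@server' address."""
    return address.split("@", 1)[1]


def copies_per_recipient(recipients: list[str]) -> int:
    """Naive unicast: one copy of the update for every recipient."""
    return len(recipients)


def copies_per_server(recipients: list[str]) -> int:
    """XMPP-style fan-out: one copy per distinct recipient server,
    which then delivers it to its own local users."""
    return len({server_of(r) for r in recipients})


# Hypothetical follower lists of 1000 users each.
# Case 1: followers concentrated on two large sites.
concentrated = [f"u{i}@big{i % 2}.example" for i in range(1000)]
# Case 2: every follower runs their own small server.
dispersed = [f"u{i}@home{i}.example" for i in range(1000)]

for name, followers in [("concentrated", concentrated), ("dispersed", dispersed)]:
    print(name, copies_per_recipient(followers), copies_per_server(followers))
```

With followers on two large sites, per-server fan-out sends 2 copies instead of
1000; when every follower has their own server, it degenerates to one copy per
recipient, which is where application-layer multicast would be needed.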
XMPP's scalability is also limited by the way it handles presence updates,
which make up the majority of inter-server traffic in the XMPP network.

XMPP's use of an XML stream without any framing as its network protocol makes
it less efficient, as it complicates parsing and makes it impossible to
transport binary data without Base64 or similar encoding. Also, protocol
extensions described in XML add a large amount of unnecessary verbosity to the
protocol.

PSYC is another federated messaging protocol, with a compact but extensible
syntax that enables fast parsing and low bandwidth usage. It is a text-based
protocol with length prefixes for binary data. Our benchmarks show that it
outperforms XMPP and JSON in parsing speed \cite{psyc-bench}.

PSYC sends out one message per recipient server when distributing messages, but
it also supports manually configured multicast trees.

** Peer-to-peer networks

Peer-to-peer (P2P) networks come closer to fulfilling these privacy
requirements, as in many cases they are designed with security and privacy in
mind from the ground up.

Projects such as Tor and I2P aim to create an anonymous overlay network, while
Freenet and GNUnet focus on anonymous information storage and retrieval. GNUnet
also provides an extensive framework for writing P2P applications, including
packet-based communication over different transport mechanisms.

In a P2P network, every user runs the P2P software on their own computer (a
computer in the P2P network is referred to as a node). This makes it possible to
build a network architecture where no servers are needed to store and manage
user data: every user can do so on their own node, which gives them more control
over their data.
The high-capacity servers of federated networks would still be useful in a P2P
network: they can forward (and, when needed, store) encrypted data without being
able to decrypt it, thereby improving the throughput, connectivity and stability
of the network.

Combining peer-to-peer network technology with social network semantics makes it
possible to create a scalable, privacy-protecting social network based on
connections of trusted peers. The next section describes the architecture of
such a network.
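As a rough illustration of how such a forwarding node could serve the padding
and delayed-forwarding requirements from the Privacy section without decrypting
anything, the following Python sketch pads every message to a fixed cell size on
the sender's side and lets a relay release queued cells only in shuffled batches.
The 512-byte cell size, the 2-byte length prefix, the batch trigger, the shuffling
and all names are assumptions made for this example; encryption of the cells is
omitted entirely.

```python
import os
import random
import struct

CELL_SIZE = 512               # hypothetical fixed cell size in bytes
HEADER = struct.Struct("!H")  # hypothetical 2-byte big-endian length prefix


def pad(payload: bytes, cell_size: int = CELL_SIZE) -> bytes:
    """Sender side: prefix the payload with its length and fill the rest of the
    cell with random bytes, so that every cell has the same size. In a real
    design this would happen before end-to-end encryption."""
    if len(payload) > cell_size - HEADER.size:
        raise ValueError("payload too large for one cell")
    filler = os.urandom(cell_size - HEADER.size - len(payload))
    return HEADER.pack(len(payload)) + payload + filler


def unpad(cell: bytes) -> bytes:
    """Recipient side: recover the original payload from a padded cell."""
    (length,) = HEADER.unpack_from(cell)
    return cell[HEADER.size:HEADER.size + length]


class BatchingRelay:
    """Hypothetical relay that treats cells as opaque bytes and forwards them
    only in batches, so the timing and order of outgoing cells do not reveal
    which incoming cell each one corresponds to."""

    def __init__(self, batch_size: int = 4):
        self.batch_size = batch_size
        self.queue: list[bytes] = []

    def receive(self, cell: bytes) -> None:
        # The relay never looks inside a cell; it only checks the fixed size.
        if len(cell) != CELL_SIZE:
            raise ValueError("unexpected cell size")
        self.queue.append(cell)

    def flush(self) -> list[bytes]:
        """Meant to be called at fixed intervals (e.g. by a timer): release one
        shuffled batch if enough cells are queued, otherwise keep waiting."""
        if len(self.queue) < self.batch_size:
            return []
        batch = self.queue[:self.batch_size]
        self.queue = self.queue[self.batch_size:]
        random.shuffle(batch)
        return batch


relay = BatchingRelay(batch_size=3)
for message in [b"hi", b"status update", b"photo chunk"]:
    relay.receive(pad(message))
out = relay.flush()
print(len(out), {len(cell) for cell in out})  # 3 {512}
print(sorted(unpad(cell) for cell in out))
```

With little traffic, such a relay holds cells until a batch fills up, which is
exactly the delay cost of the anonymity trade-off described in the Privacy
section.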