* Implementation

This chapter describes core concepts in PSYC, how they are applied in a
peer-to-peer context and what changes we had to make to the federated PSYC
\cite{psyc-paper} protocol to make it work in a peer-to-peer network.

Federated PSYC is the existing implementation of the PSYC protocol designed for
a federated architecture. It is implemented as a stand-alone daemon process
written in the LPC language.

P2P PSYC is the new implementation we have developed and the one we use in
Secure Share. The messaging daemon -- called psycd -- is implemented in C as a
service in the GNUnet framework. It uses GNUnet libraries for communication with
the rest of GNUnet, and libpsyc for the parsing and rendering of PSYC packets.
It stores data in an SQLite database.

** Syntax

PSYC is a text-based protocol with length prefixes for binary data, which makes
it possible to transmit any kind of content in PSYC packets efficiently while
keeping the protocol extensible. Its syntax is described in [[#syntax][Appendix 1]].

An example packet looks like this:

#+BEGIN_SRC psyc
:_context	psyc://J61VSCQA:g/#test
:_source_relay	psyc://I0GCD93U:g/
70
=_simple_var	value
:_binary_var 5	value
_method_name
Packet
body
here.
|
#+END_SRC

A packet contains a routing header, followed by the length of the rest of the packet,
context state modifiers, the method name and the packet body.

** Identifiers

In federated PSYC a server is identified by its DNS domain name. A server hosts
person and group entities, each of which can manage several channels. Uniforms
serve as identifiers for entities or channels, described with a URI (Uniform
Resource Identifier) syntax:

: psyc://host[:port[transport]][/[entity-type]entity[#channel]]
: psyc://example.net/~alice#friends

In peer-to-peer PSYC DNS is not employed, a public key is used instead to
identify node, person or group. GNUnet uses a SHA-512 hash of the public key as
node identifiers, we use a similar method for identifying entities. The
ASCII-encoded version of this hash becomes the host part of the uniform, with no
port number and 'g' as transport identifier:

: psyc://pubkey-hash:g[/[entity-type]entity[#channel]]
: psyc://I0GC...L29G:g/#friends

As these identifiers are very long and not user-friendly, they can be aliased to
shorter nicknames. The aliases are only used in client applications, they do not
appear on the protocol level.

In the prototype version GNUnet's host keys are used for identifying person
entities as well, this simplification allows only one person per node. A more
elaborate identification scheme is to be implemented later.

Each user will have a master key which serves as the identifier of the person,
its purpose is to sign subkeys used by various devices of the person. If a
subkey gets compromised, the master key can be used to prune messages sent with
the compromised key.

These subkeys are assigned to person entities. A GNUnet node can host one or
more entities. When using the distance vector transport, node and entity IDs are
added to the DV routing table, and nodes gossip about available peers and
entities in a local neighborhood up to a limited number of hops away, in the
social circle of users. When using the mesh service, user ID to current node ID
mappings are stored in the DHT.

** Circuits

A circuit is a virtual connection between two PSYC nodes, packets are sent and
received over circuits. When sending packets the circuit type is determined by
the transport specified in the target uniform.

In federated PSYC we had TCP, UDP and TLS transports. In P2P PSYC psycd
implements two circuit types so far: TCP circuits for local clients and GNUnet
circuits for remote peers. Unix sockets, TLS and possibly UDP circuits are
planned for later.

** Contacting peers

In federated PSYC it was enough to know the uniform of a person or group to
establish contact. The uniform contains the host name, port number and transport
method, which is all the information needed to establish connection to the
remote entity.

When using PSYC over P2P, two nodes have to know each other's public key and
know how to reach the node associated with the public key. GNUnet introduces
nodes to each other using hello messages which contain a public key and various
transport methods and addresses which can be used to establish contact with the
node. In case of the DV transport a hello message contains the identifier of
another node through which it can be reached. The DV routing protocol gossips
about connected nodes and entities in the network so they become reachable by
their social network.

When two users want to talk to each other, they should have received a hello
message from the other party beforehand. When using the DV transport they might
already know about each other if they are connected through common friends and
received a gossip message about the other node. If they are on the same network
they would discover each other through IPv4 broadcast or IPv6 multicast, or when
using the WLAN transport a WiFi mesh network is created from the present
nodes. Otherwise a hello message can be exchanged manually between users, using
e.g. email or a USB stick. When sending a hello message over an insecure channel
it should be encrypted using a shared secret in order to maintain
confidentiality and integrity of the information contained within. Usually it's
enough to exchange hello messages manually once when establishing connection for
the first time, after that more stable, longer running nodes would be available
to bootstrap a reconnecting node.

When connection is established between two users, they set appropriate trust
levels for each other -- which can be used in routing decisions in the network --
and they subscribe one or more channels of the other party.

** Entities

Entities are addressable objects in the PSYC network. Entity types include place
entities which are used for group communication or news feeds, and person
entities which can make friendships between each other and subscribe to other
entities. Each entity manages one or more channels with different subscription
lists.

Psycd implements person entities enabling clients to link to their entity, send
and receive messages and manage membership of various channels. It also has a
simple implementation of place entities providing dedicated group messaging.

** Multicast contexts

PSYC uses multicast contexts for efficient distribution of messages. A context
is managed by the context master at the top of the distribution tree. Context
members send packets to the context master which distributes them to context
slaves on the next level in the multicast tree, which distribute them further
down the tree. Figure 4.1 shows such a tree.

#+CAPTION: Multicast context distribution tree
[[./context.png]]

Entities manage multiple channels, each of which is a separate multicast context
having different membership and multicast distribution tree. Social
interactions, such as status updates, group and private messaging can be modeled
using these channels. An entity manages membership of its channels, in case of a
person entity this could be used to create different circles of friends using a
channel for each of them, or provide different channels for various topics to
which interested friends -- or if desired anyone who can contact the person --
can subscribe to. Ad-hoc group and private chats with friends can be modeled as
well with channels of a person entity.

Federated PSYC only implemented manually configured multicast distribution trees
so far, this should be made fully automatic in the peer-to-peer version. When
multicast routing is added, every node becomes a multicast routing hop serving
several multicast contexts. A node can join a multicast context at any other
node already a member of that particular context. By adding encryption to
multicast contexts any node can help in the multicast routing process without
being able to decrypt message contents. This way receiving packets for a
multicast context does not necessarily mean that the given node can decrypt the
packets sent to it. In its simplest implementation multicast encryption involves
a symmetric key distributed by the context master to all the members which has
to be changed periodically, and when a member joins or leaves.

In \cite{hordes} Hordes, an anonymity protocol based on IP multicast is
suggested. While we're not using IP multicast, part of their analysis could be
applied to application-level multicast implemented in a P2P network.

The prototype does not implement actual multicast yet, multicast contexts are
modeled but messages to contexts are distributed to each member by unicast.

** Distributed state

PSYC has the concept of distributed state, a set of key-value pairs -- state
variables -- are assigned to each multicast context and distributed to every
member. It is used to model profile data, context membership, or any other data
related to a context. Advantage of this approach is that it avoids unnecessary
request-response operations as members have an up-to-date version of the state
data most of the time, and allows local browsing of profiles of contacts, even
offline. We have implemented distributed state for P2P PSYC in psycd -- a feature
federated PSYC has long planned for but still lacked.

Context state is kept in sync using state modifiers provided by the PSYC syntax.
A state modifier adds, removes or modifies a state variable. State changes are
distributed to context members only once, which means it is very bandwidth
efficient. Using state modifiers require reliable, in-order delivery of
packets. Packet loss can be detected with the help of a =_counter= variable in
the routing header of packets. As the name suggests, it is a counter incremented
by one for every packet sent to the context. When there's a missed packet, a
node can re-request it from its parent node in the multicast distribution
tree. After a node has joined a context, a full state synchronization is
necessary to bring the node up-to-date.

Syntax of a state modifier in Augmented Backus-Naur Form (ABNF):

#+BEGIN_SRC abnf
entity-modifier = operator variable entity-arg
entity-arg	= simple-arg / binary-arg / LF

operator	= "=" / ":" / "+" / "-" / "?" / "!" / "@"
variable	= 1*kwchar
simple-arg	= HTAB text-data LF
binary-arg	= SP length HTAB binary-data LF
length		= 1*DIGIT
binary-data	= *OCTET
#+END_SRC

Operators:
- =:= (set) -- set variable just for the current packet, state is not modified
- ~=~ (assign) -- assign value to state variable
- =+= (augment) -- concatenate string or add list/dictionary element, depending
  on type
- =-= (diminish) -- remove list or dictionary element
- =@= (update) -- update an item in a list or dictionary
- =?= alone on a line: request state synchronization, all state variables are
  returned in the response
- ~=~ alone on a line: reset state, i.e. remove all previously stored state
  variables
- the rest of the operators are reserved for future use

*** Syntax changes

The state implementation involved some syntax changes: we have added a
dictionary type in order to be able to store key-value pairs in a state
variable, and modified the list syntax to make it consistent with the new
dictionary syntax, allowing us to specify types for list elements as well. We
have also added a new update modifier, which allows for updating individual list
and dictionary elements.

These syntax changes were necessary to represent more complex data structures,
such as context members or alias mappings.

*** List syntax

A list is a list of ordered elements. Its syntax in ABNF is specified as the
following:

#+BEGIN_SRC abnf
list	   = [ default-type ] *list-elem
list-sep   = "|"
list-elem  = list-sep [ "=" type ] [ SP list-value ]
list-elem  =/ list-sep "=" type ":" ] [ length ] [ SP *OCTET ]
list-value = %x00-7B / %x7D-FF	; any byte except "|"
#+END_SRC

Examples:
#+BEGIN_SRC psyc
=_list_one	_type| elem1| elem2| elem3
=_list_two	|=_type1 elem1|=_type2 elem2|=_type3 elem3
#+END_SRC

**** Inserting list elements

For inserting values before a specified index the =+= operator is used. Index of
the first element is 1, index of the last is -1. 0 means the end of the list,
which is the default if the index is omitted.

Syntax of the value part:
#+BEGIN_SRC abnf
list-insert	= [ list-index SP ] list
list-index	= "#" 1*DIGIT
#+END_SRC

#+LaTeX: \pagebreak

Example:
#+BEGIN_SRC psyc
+_list_fruits	| banana| mango
+_list_fruits	#0 | banana| mango
#+END_SRC

**** Removing list elements

For removing elements the =-= operator is used. Parameters are the start index
which defaults to -1, and the amount of elements to be removed which defaults to 1.

Syntax of the value part:
#+BEGIN_SRC abnf
list-remove	= ( list-index SP uint | list-index | uint )
#+END_SRC

Example:
#+BEGIN_SRC psyc
-_list_fruits	#1
-_list_fruits	#1 1
#+END_SRC

*** Dictionary syntax

A dictionary is a set of key-value pairs. Its syntax specified in ABNF is:

#+BEGIN_SRC abnf
dict		= [ type ] *dict-item
dict-item	= dict-item-key dict-item-value
dict-item-key	= "{" ( dict-key / length SP *OCTET) "}"
dict-item-value	= type [ SP dict-value ]
dict-item-value =/ [ length ] [ ":" type ] [ SP *OCTET ]
dict-key	= %x00-7C / %x7E-FF	; any byte except "{"
dict-value	= %x00-7A / %x7C-FF	; any byte except "}"
#+END_SRC

=type= is the default type for elements which do not have a type specified.

Examples:
#+BEGIN_SRC psyc
=_dict_one	_type{4 key1}6 value1{key2} value2{key3}6 value3
=_dict_two	{4 key1}=_type1:6 val1{key2}=_type2 val2{key3}6 val3

=_dict_avatars	_picture{alice}3 \o/{bob}7 \oXoXo/
#+END_SRC

The =struct= type can be used to define dictionary values with less
repetition. The structure is first defined once, then used for one or all
elements. It works like a C struct, a list of types are defined in a specific
order, after that we don't have to specify the types again when specifying the values.

#+BEGIN_SRC psyc
=_struct_member	|=_nick|=_picture
=_dict_members	_struct_member{13 psyc://alice/}12 | alice| \o/
=_dict_members	{psyc://alice/}=_struct_member | alice| \o/
#+END_SRC

**** Adding dictionary entries

The =+= operator is used for adding entries to an existing dictionary. The syntax
is equivalent to the initial assignment of entries. If a key already exists in
the dictionary, its value is overwritten.

**** Removing entries from a dictionary

The =-= operator is used for removing entries, syntax is the same as assignment
but only the keys are listed.

Example, removing 2 entries:
#+BEGIN_SRC psyc
-_dict_members	{psyc://alice/}{psyc://bob/}
#+END_SRC

*** Update syntax

For updating specific entries in a list or dictionary the =@= operator is used. It
has the following syntax:

#+BEGIN_SRC abnf
update		= 1*index SP op [ type ] [ ":" length] [SP value]
index		= ( dict-item-key / index-list / index-struct )
index-list	= "#" 1*DIGIT
index-struct	= "." type
#+END_SRC

Examples:
#+BEGIN_SRC psyc
@_list_gallery	#-1 =_picture:7 \oXoXo/
@_list_gallery	#-1 =:7 \oXoXo/
@_list_fruits	#1 = pear
@_list_prices	#2 =_int 1000

@_dict_gallery	{alice} =_picture:7 \oXoXo/
@_dict_gallery	{alice} =:7 \oXoXo/
@_dict_members	{psyc://alice/}._nick = Alice
@_dict_members	{psyc://bob/}._nick + Bob
@_dict_members	{psyc://foo/}._int_score + 2
#+END_SRC

** Storage

Incoming and outgoing packets, state variables and channel configuration are
stored in an SQLite database. This allows for persistent storage of context
state as well, which is restored after a restart of the node. Packets are stored
for two purposes: it provides a message history for contexts and it can be used
later to resend lost packets to nodes requesting it.

SQLite is used mainly because of its efficient memory handling and wide platform
support.

The database consists of two tables with the following schema:
- *contexts* (*uni* blob primary key, *state* blob, *config* blob,
  *created* timestamp default current_timestamp)
- *packets* (*context* blob, *source* blob, *target* blob, *counter* unsigned int,
  *fragment* unsigned int, *packet* blob,
  *created* timestamp default current_timestamp,\\
  *primary key* (context, source, target, counter, fragment))

We store information about subscribed and hosted contexts in these tables.
The contexts table is used for storing configuration and state of contexts,
whereas the packets table is for storing packet history. All this information
is stored in PSYC packet format in the database.