presentations

Presentations
Log | Files | Refs

impl (16481B)


      1 * Implementation
      2 
      3 This chapter describes core concepts in PSYC, how they are applied in a
      4 peer-to-peer context and what changes we had to make to the federated PSYC
      5 \cite{psyc-paper} protocol to make it work in a peer-to-peer network.
      6 
      7 Federated PSYC is the existing implementation of the PSYC protocol designed for
      8 a federated architecture. It is implemented as a stand-alone daemon process
      9 written in the LPC language.
     10 
     11 P2P PSYC is the new implementation we have developed and the one we use in
     12 Secure Share. The messaging daemon -- called psycd -- is implemented in C as a
     13 service in the GNUnet framework. It uses GNUnet libraries for communication with
     14 the rest of GNUnet, and libpsyc for the parsing and rendering of PSYC packets.
     15 It stores data in an SQLite database.
     16 
     17 ** Syntax
     18 
     19 PSYC is a text-based protocol with length prefixes for binary data, which makes
     20 it possible to transmit any kind of content in PSYC packets efficiently while
     21 keeping the protocol extensible. Its syntax is described in [[#syntax][Appendix 1]].
     22 
     23 An example packet looks like this:
     24 
     25 #+BEGIN_SRC psyc
     26 :_context	psyc://J61VSCQA:g/#test
     27 :_source_relay	psyc://I0GCD93U:g/
     28 70
     29 =_simple_var	value
     30 :_binary_var 5	value
     31 _method_name
     32 Packet
     33 body
     34 here.
     35 |
     36 #+END_SRC
     37 
     38 A packet contains a routing header, followed by the length of the rest of the packet,
     39 context state modifiers, the method name and the packet body.
     40 
     41 ** Identifiers
     42 
     43 In federated PSYC a server is identified by its DNS domain name. A server hosts
     44 person and group entities, each of which can manage several channels. Uniforms
     45 serve as identifiers for entities or channels, described with a URI (Uniform
     46 Resource Identifier) syntax:
     47 
     48 : psyc://host[:port[transport]][/[entity-type]entity[#channel]]
     49 : psyc://example.net/~alice#friends
     50 
     51 In peer-to-peer PSYC DNS is not employed, a public key is used instead to
     52 identify node, person or group. GNUnet uses a SHA-512 hash of the public key as
     53 node identifiers, we use a similar method for identifying entities. The
     54 ASCII-encoded version of this hash becomes the host part of the uniform, with no
     55 port number and 'g' as transport identifier:
     56 
     57 : psyc://pubkey-hash:g[/[entity-type]entity[#channel]]
     58 : psyc://I0GC...L29G:g/#friends
     59 
     60 As these identifiers are very long and not user-friendly, they can be aliased to
     61 shorter nicknames. The aliases are only used in client applications, they do not
     62 appear on the protocol level.
     63 
     64 In the prototype version GNUnet's host keys are used for identifying person
     65 entities as well, this simplification allows only one person per node. A more
     66 elaborate identification scheme is to be implemented later.
     67 
     68 Each user will have a master key which serves as the identifier of the person,
     69 its purpose is to sign subkeys used by various devices of the person. If a
     70 subkey gets compromised, the master key can be used to prune messages sent with
     71 the compromised key.
     72 
     73 These subkeys are assigned to person entities. A GNUnet node can host one or
     74 more entities. When using the distance vector transport, node and entity IDs are
     75 added to the DV routing table, and nodes gossip about available peers and
     76 entities in a local neighborhood up to a limited number of hops away, in the
     77 social circle of users. When using the mesh service, user ID to current node ID
     78 mappings are stored in the DHT.
     79 
     80 ** Circuits
     81 
     82 A circuit is a virtual connection between two PSYC nodes, packets are sent and
     83 received over circuits. When sending packets the circuit type is determined by
     84 the transport specified in the target uniform.
     85 
     86 In federated PSYC we had TCP, UDP and TLS transports. In P2P PSYC psycd
     87 implements two circuit types so far: TCP circuits for local clients and GNUnet
     88 circuits for remote peers. Unix sockets, TLS and possibly UDP circuits are
     89 planned for later.
     90 
     91 ** Contacting peers
     92 
     93 In federated PSYC it was enough to know the uniform of a person or group to
     94 establish contact. The uniform contains the host name, port number and transport
     95 method, which is all the information needed to establish connection to the
     96 remote entity.
     97 
     98 When using PSYC over P2P, two nodes have to know each other's public key and
     99 know how to reach the node associated with the public key. GNUnet introduces
    100 nodes to each other using hello messages which contain a public key and various
    101 transport methods and addresses which can be used to establish contact with the
    102 node. In case of the DV transport a hello message contains the identifier of
    103 another node through which it can be reached. The DV routing protocol gossips
    104 about connected nodes and entities in the network so they become reachable by
    105 their social network.
    106 
    107 When two users want to talk to each other, they should have received a hello
    108 message from the other party beforehand. When using the DV transport they might
    109 already know about each other if they are connected through common friends and
    110 received a gossip message about the other node. If they are on the same network
    111 they would discover each other through IPv4 broadcast or IPv6 multicast, or when
    112 using the WLAN transport a WiFi mesh network is created from the present
    113 nodes. Otherwise a hello message can be exchanged manually between users, using
    114 e.g. email or a USB stick. When sending a hello message over an insecure channel
    115 it should be encrypted using a shared secret in order to maintain
    116 confidentiality and integrity of the information contained within. Usually it's
    117 enough to exchange hello messages manually once when establishing connection for
    118 the first time, after that more stable, longer running nodes would be available
    119 to bootstrap a reconnecting node.
    120 
    121 When connection is established between two users, they set appropriate trust
    122 levels for each other -- which can be used in routing decisions in the network --
    123 and they subscribe one or more channels of the other party.
    124 
    125 ** Entities
    126 
    127 Entities are addressable objects in the PSYC network. Entity types include place
    128 entities which are used for group communication or news feeds, and person
    129 entities which can make friendships between each other and subscribe to other
    130 entities. Each entity manages one or more channels with different subscription
    131 lists.
    132 
    133 Psycd implements person entities enabling clients to link to their entity, send
    134 and receive messages and manage membership of various channels. It also has a
    135 simple implementation of place entities providing dedicated group messaging.
    136 
    137 ** Multicast contexts
    138 
    139 PSYC uses multicast contexts for efficient distribution of messages. A context
    140 is managed by the context master at the top of the distribution tree. Context
    141 members send packets to the context master which distributes them to context
    142 slaves on the next level in the multicast tree, which distribute them further
    143 down the tree. Figure 4.1 shows such a tree.
    144 
    145 #+CAPTION: Multicast context distribution tree
    146 [[./context.png]]
    147 
    148 Entities manage multiple channels, each of which is a separate multicast context
    149 having different membership and multicast distribution tree. Social
    150 interactions, such as status updates, group and private messaging can be modeled
    151 using these channels. An entity manages membership of its channels, in case of a
    152 person entity this could be used to create different circles of friends using a
    153 channel for each of them, or provide different channels for various topics to
    154 which interested friends -- or if desired anyone who can contact the person --
    155 can subscribe to. Ad-hoc group and private chats with friends can be modeled as
    156 well with channels of a person entity.
    157 
    158 Federated PSYC only implemented manually configured multicast distribution trees
    159 so far, this should be made fully automatic in the peer-to-peer version. When
    160 multicast routing is added, every node becomes a multicast routing hop serving
    161 several multicast contexts. A node can join a multicast context at any other
    162 node already a member of that particular context. By adding encryption to
    163 multicast contexts any node can help in the multicast routing process without
    164 being able to decrypt message contents. This way receiving packets for a
    165 multicast context does not necessarily mean that the given node can decrypt the
    166 packets sent to it. In its simplest implementation multicast encryption involves
    167 a symmetric key distributed by the context master to all the members which has
    168 to be changed periodically, and when a member joins or leaves.
    169 
    170 In \cite{hordes} Hordes, an anonymity protocol based on IP multicast is
    171 suggested. While we're not using IP multicast, part of their analysis could be
    172 applied to application-level multicast implemented in a P2P network.
    173 
    174 The prototype does not implement actual multicast yet, multicast contexts are
    175 modeled but messages to contexts are distributed to each member by unicast.
    176 
    177 ** Distributed state
    178 
    179 PSYC has the concept of distributed state, a set of key-value pairs -- state
    180 variables -- are assigned to each multicast context and distributed to every
    181 member. It is used to model profile data, context membership, or any other data
    182 related to a context. Advantage of this approach is that it avoids unnecessary
    183 request-response operations as members have an up-to-date version of the state
    184 data most of the time, and allows local browsing of profiles of contacts, even
    185 offline. We have implemented distributed state for P2P PSYC in psycd -- a feature
    186 federated PSYC has long planned for but still lacked.
    187 
    188 Context state is kept in sync using state modifiers provided by the PSYC syntax.
    189 A state modifier adds, removes or modifies a state variable. State changes are
    190 distributed to context members only once, which means it is very bandwidth
    191 efficient. Using state modifiers require reliable, in-order delivery of
    192 packets. Packet loss can be detected with the help of a =_counter= variable in
    193 the routing header of packets. As the name suggests, it is a counter incremented
    194 by one for every packet sent to the context. When there's a missed packet, a
    195 node can re-request it from its parent node in the multicast distribution
    196 tree. After a node has joined a context, a full state synchronization is
    197 necessary to bring the node up-to-date.
    198 
    199 Syntax of a state modifier in Augmented Backus-Naur Form (ABNF):
    200 
    201 #+BEGIN_SRC abnf
    202 entity-modifier = operator variable entity-arg
    203 entity-arg	= simple-arg / binary-arg / LF
    204 
    205 operator	= "=" / ":" / "+" / "-" / "?" / "!" / "@"
    206 variable	= 1*kwchar
    207 simple-arg	= HTAB text-data LF
    208 binary-arg	= SP length HTAB binary-data LF
    209 length		= 1*DIGIT
    210 binary-data	= *OCTET
    211 #+END_SRC
    212 
    213 Operators:
    214 - =:= (set) -- set variable just for the current packet, state is not modified
    215 - ~=~ (assign) -- assign value to state variable
    216 - =+= (augment) -- concatenate string or add list/dictionary element, depending
    217   on type
    218 - =-= (diminish) -- remove list or dictionary element
    219 - =@= (update) -- update an item in a list or dictionary
    220 - =?= alone on a line: request state synchronization, all state variables are
    221   returned in the response
    222 - ~=~ alone on a line: reset state, i.e. remove all previously stored state
    223   variables
    224 - the rest of the operators are reserved for future use
    225 
    226 *** Syntax changes
    227 
    228 The state implementation involved some syntax changes: we have added a
    229 dictionary type in order to be able to store key-value pairs in a state
    230 variable, and modified the list syntax to make it consistent with the new
    231 dictionary syntax, allowing us to specify types for list elements as well. We
    232 have also added a new update modifier, which allows for updating individual list
    233 and dictionary elements.
    234 
    235 These syntax changes were necessary to represent more complex data structures,
    236 such as context members or alias mappings.
    237 
    238 *** List syntax
    239 
    240 A list is a list of ordered elements. Its syntax in ABNF is specified as the
    241 following:
    242 
    243 #+BEGIN_SRC abnf
    244 list	   = [ default-type ] *list-elem
    245 list-sep   = "|"
    246 list-elem  = list-sep [ "=" type ] [ SP list-value ]
    247 list-elem  =/ list-sep "=" type ":" ] [ length ] [ SP *OCTET ]
    248 list-value = %x00-7B / %x7D-FF	; any byte except "|"
    249 #+END_SRC
    250 
    251 Examples:
    252 #+BEGIN_SRC psyc
    253 =_list_one	_type| elem1| elem2| elem3
    254 =_list_two	|=_type1 elem1|=_type2 elem2|=_type3 elem3
    255 #+END_SRC
    256 
    257 **** Inserting list elements
    258 
    259 For inserting values before a specified index the =+= operator is used. Index of
    260 the first element is 1, index of the last is -1. 0 means the end of the list,
    261 which is the default if the index is omitted.
    262 
    263 Syntax of the value part:
    264 #+BEGIN_SRC abnf
    265 list-insert	= [ list-index SP ] list
    266 list-index	= "#" 1*DIGIT
    267 #+END_SRC
    268 
    269 #+LaTeX: \pagebreak
    270 
    271 Example:
    272 #+BEGIN_SRC psyc
    273 +_list_fruits	| banana| mango
    274 +_list_fruits	#0 | banana| mango
    275 #+END_SRC
    276 
    277 **** Removing list elements
    278 
    279 For removing elements the =-= operator is used. Parameters are the start index
    280 which defaults to -1, and the amount of elements to be removed which defaults to 1.
    281 
    282 Syntax of the value part:
    283 #+BEGIN_SRC abnf
    284 list-remove	= ( list-index SP uint | list-index | uint )
    285 #+END_SRC
    286 
    287 Example:
    288 #+BEGIN_SRC psyc
    289 -_list_fruits	#1
    290 -_list_fruits	#1 1
    291 #+END_SRC
    292 
    293 *** Dictionary syntax
    294 
    295 A dictionary is a set of key-value pairs. Its syntax specified in ABNF is:
    296 
    297 #+BEGIN_SRC abnf
    298 dict		= [ type ] *dict-item
    299 dict-item	= dict-item-key dict-item-value
    300 dict-item-key	= "{" ( dict-key / length SP *OCTET) "}"
    301 dict-item-value	= type [ SP dict-value ]
    302 dict-item-value =/ [ length ] [ ":" type ] [ SP *OCTET ]
    303 dict-key	= %x00-7C / %x7E-FF	; any byte except "{"
    304 dict-value	= %x00-7A / %x7C-FF	; any byte except "}"
    305 #+END_SRC
    306 
    307 =type= is the default type for elements which do not have a type specified.
    308 
    309 Examples:
    310 #+BEGIN_SRC psyc
    311 =_dict_one	_type{4 key1}6 value1{key2} value2{key3}6 value3
    312 =_dict_two	{4 key1}=_type1:6 val1{key2}=_type2 val2{key3}6 val3
    313 
    314 =_dict_avatars	_picture{alice}3 \o/{bob}7 \oXoXo/
    315 #+END_SRC
    316 
    317 The =struct= type can be used to define dictionary values with less
    318 repetition. The structure is first defined once, then used for one or all
    319 elements. It works like a C struct, a list of types are defined in a specific
    320 order, after that we don't have to specify the types again when specifying the values.
    321 
    322 #+BEGIN_SRC psyc
    323 =_struct_member	|=_nick|=_picture
    324 =_dict_members	_struct_member{13 psyc://alice/}12 | alice| \o/
    325 =_dict_members	{psyc://alice/}=_struct_member | alice| \o/
    326 #+END_SRC
    327 
    328 **** Adding dictionary entries
    329 
    330 The =+= operator is used for adding entries to an existing dictionary. The syntax
    331 is equivalent to the initial assignment of entries. If a key already exists in
    332 the dictionary, its value is overwritten.
    333 
    334 **** Removing entries from a dictionary
    335 
    336 The =-= operator is used for removing entries, syntax is the same as assignment
    337 but only the keys are listed.
    338 
    339 Example, removing 2 entries:
    340 #+BEGIN_SRC psyc
    341 -_dict_members	{psyc://alice/}{psyc://bob/}
    342 #+END_SRC
    343 
    344 *** Update syntax
    345 
    346 For updating specific entries in a list or dictionary the =@= operator is used. It
    347 has the following syntax:
    348 
    349 #+BEGIN_SRC abnf
    350 update		= 1*index SP op [ type ] [ ":" length] [SP value]
    351 index		= ( dict-item-key / index-list / index-struct )
    352 index-list	= "#" 1*DIGIT
    353 index-struct	= "." type
    354 #+END_SRC
    355 
    356 Examples:
    357 #+BEGIN_SRC psyc
    358 @_list_gallery	#-1 =_picture:7 \oXoXo/
    359 @_list_gallery	#-1 =:7 \oXoXo/
    360 @_list_fruits	#1 = pear
    361 @_list_prices	#2 =_int 1000
    362 
    363 @_dict_gallery	{alice} =_picture:7 \oXoXo/
    364 @_dict_gallery	{alice} =:7 \oXoXo/
    365 @_dict_members	{psyc://alice/}._nick = Alice
    366 @_dict_members	{psyc://bob/}._nick + Bob
    367 @_dict_members	{psyc://foo/}._int_score + 2
    368 #+END_SRC
    369 
    370 ** Storage
    371 
    372 Incoming and outgoing packets, state variables and channel configuration are
    373 stored in an SQLite database. This allows for persistent storage of context
    374 state as well, which is restored after a restart of the node. Packets are stored
    375 for two purposes: it provides a message history for contexts and it can be used
    376 later to resend lost packets to nodes requesting it.
    377 
    378 SQLite is used mainly because of its efficient memory handling and wide platform
    379 support.
    380 
    381 The database consists of two tables with the following schema:
    382 - *contexts* (*uni* blob primary key, *state* blob, *config* blob,
    383   *created* timestamp default current_timestamp)
    384 - *packets* (*context* blob, *source* blob, *target* blob, *counter* unsigned int,
    385   *fragment* unsigned int, *packet* blob,
    386   *created* timestamp default current_timestamp,\\
    387   *primary key* (context, source, target, counter, fragment))
    388 
    389 We store information about subscribed and hosted contexts in these tables.
    390 The contexts table is used for storing configuration and state of contexts,
    391 whereas the packets table is for storing packet history. All this information
    392 is stored in PSYC packet format in the database.