impl (16481B)
1 * Implementation 2 3 This chapter describes core concepts in PSYC, how they are applied in a 4 peer-to-peer context and what changes we had to make to the federated PSYC 5 \cite{psyc-paper} protocol to make it work in a peer-to-peer network. 6 7 Federated PSYC is the existing implementation of the PSYC protocol designed for 8 a federated architecture. It is implemented as a stand-alone daemon process 9 written in the LPC language. 10 11 P2P PSYC is the new implementation we have developed and the one we use in 12 Secure Share. The messaging daemon -- called psycd -- is implemented in C as a 13 service in the GNUnet framework. It uses GNUnet libraries for communication with 14 the rest of GNUnet, and libpsyc for the parsing and rendering of PSYC packets. 15 It stores data in an SQLite database. 16 17 ** Syntax 18 19 PSYC is a text-based protocol with length prefixes for binary data, which makes 20 it possible to transmit any kind of content in PSYC packets efficiently while 21 keeping the protocol extensible. Its syntax is described in [[#syntax][Appendix 1]]. 22 23 An example packet looks like this: 24 25 #+BEGIN_SRC psyc 26 :_context psyc://J61VSCQA:g/#test 27 :_source_relay psyc://I0GCD93U:g/ 28 70 29 =_simple_var value 30 :_binary_var 5 value 31 _method_name 32 Packet 33 body 34 here. 35 | 36 #+END_SRC 37 38 A packet contains a routing header, followed by the length of the rest of the packet, 39 context state modifiers, the method name and the packet body. 40 41 ** Identifiers 42 43 In federated PSYC a server is identified by its DNS domain name. A server hosts 44 person and group entities, each of which can manage several channels. Uniforms 45 serve as identifiers for entities or channels, described with a URI (Uniform 46 Resource Identifier) syntax: 47 48 : psyc://host[:port[transport]][/[entity-type]entity[#channel]] 49 : psyc://example.net/~alice#friends 50 51 In peer-to-peer PSYC DNS is not employed, a public key is used instead to 52 identify node, person or group. GNUnet uses a SHA-512 hash of the public key as 53 node identifiers, we use a similar method for identifying entities. The 54 ASCII-encoded version of this hash becomes the host part of the uniform, with no 55 port number and 'g' as transport identifier: 56 57 : psyc://pubkey-hash:g[/[entity-type]entity[#channel]] 58 : psyc://I0GC...L29G:g/#friends 59 60 As these identifiers are very long and not user-friendly, they can be aliased to 61 shorter nicknames. The aliases are only used in client applications, they do not 62 appear on the protocol level. 63 64 In the prototype version GNUnet's host keys are used for identifying person 65 entities as well, this simplification allows only one person per node. A more 66 elaborate identification scheme is to be implemented later. 67 68 Each user will have a master key which serves as the identifier of the person, 69 its purpose is to sign subkeys used by various devices of the person. If a 70 subkey gets compromised, the master key can be used to prune messages sent with 71 the compromised key. 72 73 These subkeys are assigned to person entities. A GNUnet node can host one or 74 more entities. When using the distance vector transport, node and entity IDs are 75 added to the DV routing table, and nodes gossip about available peers and 76 entities in a local neighborhood up to a limited number of hops away, in the 77 social circle of users. When using the mesh service, user ID to current node ID 78 mappings are stored in the DHT. 79 80 ** Circuits 81 82 A circuit is a virtual connection between two PSYC nodes, packets are sent and 83 received over circuits. When sending packets the circuit type is determined by 84 the transport specified in the target uniform. 85 86 In federated PSYC we had TCP, UDP and TLS transports. In P2P PSYC psycd 87 implements two circuit types so far: TCP circuits for local clients and GNUnet 88 circuits for remote peers. Unix sockets, TLS and possibly UDP circuits are 89 planned for later. 90 91 ** Contacting peers 92 93 In federated PSYC it was enough to know the uniform of a person or group to 94 establish contact. The uniform contains the host name, port number and transport 95 method, which is all the information needed to establish connection to the 96 remote entity. 97 98 When using PSYC over P2P, two nodes have to know each other's public key and 99 know how to reach the node associated with the public key. GNUnet introduces 100 nodes to each other using hello messages which contain a public key and various 101 transport methods and addresses which can be used to establish contact with the 102 node. In case of the DV transport a hello message contains the identifier of 103 another node through which it can be reached. The DV routing protocol gossips 104 about connected nodes and entities in the network so they become reachable by 105 their social network. 106 107 When two users want to talk to each other, they should have received a hello 108 message from the other party beforehand. When using the DV transport they might 109 already know about each other if they are connected through common friends and 110 received a gossip message about the other node. If they are on the same network 111 they would discover each other through IPv4 broadcast or IPv6 multicast, or when 112 using the WLAN transport a WiFi mesh network is created from the present 113 nodes. Otherwise a hello message can be exchanged manually between users, using 114 e.g. email or a USB stick. When sending a hello message over an insecure channel 115 it should be encrypted using a shared secret in order to maintain 116 confidentiality and integrity of the information contained within. Usually it's 117 enough to exchange hello messages manually once when establishing connection for 118 the first time, after that more stable, longer running nodes would be available 119 to bootstrap a reconnecting node. 120 121 When connection is established between two users, they set appropriate trust 122 levels for each other -- which can be used in routing decisions in the network -- 123 and they subscribe one or more channels of the other party. 124 125 ** Entities 126 127 Entities are addressable objects in the PSYC network. Entity types include place 128 entities which are used for group communication or news feeds, and person 129 entities which can make friendships between each other and subscribe to other 130 entities. Each entity manages one or more channels with different subscription 131 lists. 132 133 Psycd implements person entities enabling clients to link to their entity, send 134 and receive messages and manage membership of various channels. It also has a 135 simple implementation of place entities providing dedicated group messaging. 136 137 ** Multicast contexts 138 139 PSYC uses multicast contexts for efficient distribution of messages. A context 140 is managed by the context master at the top of the distribution tree. Context 141 members send packets to the context master which distributes them to context 142 slaves on the next level in the multicast tree, which distribute them further 143 down the tree. Figure 4.1 shows such a tree. 144 145 #+CAPTION: Multicast context distribution tree 146 [[./context.png]] 147 148 Entities manage multiple channels, each of which is a separate multicast context 149 having different membership and multicast distribution tree. Social 150 interactions, such as status updates, group and private messaging can be modeled 151 using these channels. An entity manages membership of its channels, in case of a 152 person entity this could be used to create different circles of friends using a 153 channel for each of them, or provide different channels for various topics to 154 which interested friends -- or if desired anyone who can contact the person -- 155 can subscribe to. Ad-hoc group and private chats with friends can be modeled as 156 well with channels of a person entity. 157 158 Federated PSYC only implemented manually configured multicast distribution trees 159 so far, this should be made fully automatic in the peer-to-peer version. When 160 multicast routing is added, every node becomes a multicast routing hop serving 161 several multicast contexts. A node can join a multicast context at any other 162 node already a member of that particular context. By adding encryption to 163 multicast contexts any node can help in the multicast routing process without 164 being able to decrypt message contents. This way receiving packets for a 165 multicast context does not necessarily mean that the given node can decrypt the 166 packets sent to it. In its simplest implementation multicast encryption involves 167 a symmetric key distributed by the context master to all the members which has 168 to be changed periodically, and when a member joins or leaves. 169 170 In \cite{hordes} Hordes, an anonymity protocol based on IP multicast is 171 suggested. While we're not using IP multicast, part of their analysis could be 172 applied to application-level multicast implemented in a P2P network. 173 174 The prototype does not implement actual multicast yet, multicast contexts are 175 modeled but messages to contexts are distributed to each member by unicast. 176 177 ** Distributed state 178 179 PSYC has the concept of distributed state, a set of key-value pairs -- state 180 variables -- are assigned to each multicast context and distributed to every 181 member. It is used to model profile data, context membership, or any other data 182 related to a context. Advantage of this approach is that it avoids unnecessary 183 request-response operations as members have an up-to-date version of the state 184 data most of the time, and allows local browsing of profiles of contacts, even 185 offline. We have implemented distributed state for P2P PSYC in psycd -- a feature 186 federated PSYC has long planned for but still lacked. 187 188 Context state is kept in sync using state modifiers provided by the PSYC syntax. 189 A state modifier adds, removes or modifies a state variable. State changes are 190 distributed to context members only once, which means it is very bandwidth 191 efficient. Using state modifiers require reliable, in-order delivery of 192 packets. Packet loss can be detected with the help of a =_counter= variable in 193 the routing header of packets. As the name suggests, it is a counter incremented 194 by one for every packet sent to the context. When there's a missed packet, a 195 node can re-request it from its parent node in the multicast distribution 196 tree. After a node has joined a context, a full state synchronization is 197 necessary to bring the node up-to-date. 198 199 Syntax of a state modifier in Augmented Backus-Naur Form (ABNF): 200 201 #+BEGIN_SRC abnf 202 entity-modifier = operator variable entity-arg 203 entity-arg = simple-arg / binary-arg / LF 204 205 operator = "=" / ":" / "+" / "-" / "?" / "!" / "@" 206 variable = 1*kwchar 207 simple-arg = HTAB text-data LF 208 binary-arg = SP length HTAB binary-data LF 209 length = 1*DIGIT 210 binary-data = *OCTET 211 #+END_SRC 212 213 Operators: 214 - =:= (set) -- set variable just for the current packet, state is not modified 215 - ~=~ (assign) -- assign value to state variable 216 - =+= (augment) -- concatenate string or add list/dictionary element, depending 217 on type 218 - =-= (diminish) -- remove list or dictionary element 219 - =@= (update) -- update an item in a list or dictionary 220 - =?= alone on a line: request state synchronization, all state variables are 221 returned in the response 222 - ~=~ alone on a line: reset state, i.e. remove all previously stored state 223 variables 224 - the rest of the operators are reserved for future use 225 226 *** Syntax changes 227 228 The state implementation involved some syntax changes: we have added a 229 dictionary type in order to be able to store key-value pairs in a state 230 variable, and modified the list syntax to make it consistent with the new 231 dictionary syntax, allowing us to specify types for list elements as well. We 232 have also added a new update modifier, which allows for updating individual list 233 and dictionary elements. 234 235 These syntax changes were necessary to represent more complex data structures, 236 such as context members or alias mappings. 237 238 *** List syntax 239 240 A list is a list of ordered elements. Its syntax in ABNF is specified as the 241 following: 242 243 #+BEGIN_SRC abnf 244 list = [ default-type ] *list-elem 245 list-sep = "|" 246 list-elem = list-sep [ "=" type ] [ SP list-value ] 247 list-elem =/ list-sep "=" type ":" ] [ length ] [ SP *OCTET ] 248 list-value = %x00-7B / %x7D-FF ; any byte except "|" 249 #+END_SRC 250 251 Examples: 252 #+BEGIN_SRC psyc 253 =_list_one _type| elem1| elem2| elem3 254 =_list_two |=_type1 elem1|=_type2 elem2|=_type3 elem3 255 #+END_SRC 256 257 **** Inserting list elements 258 259 For inserting values before a specified index the =+= operator is used. Index of 260 the first element is 1, index of the last is -1. 0 means the end of the list, 261 which is the default if the index is omitted. 262 263 Syntax of the value part: 264 #+BEGIN_SRC abnf 265 list-insert = [ list-index SP ] list 266 list-index = "#" 1*DIGIT 267 #+END_SRC 268 269 #+LaTeX: \pagebreak 270 271 Example: 272 #+BEGIN_SRC psyc 273 +_list_fruits | banana| mango 274 +_list_fruits #0 | banana| mango 275 #+END_SRC 276 277 **** Removing list elements 278 279 For removing elements the =-= operator is used. Parameters are the start index 280 which defaults to -1, and the amount of elements to be removed which defaults to 1. 281 282 Syntax of the value part: 283 #+BEGIN_SRC abnf 284 list-remove = ( list-index SP uint | list-index | uint ) 285 #+END_SRC 286 287 Example: 288 #+BEGIN_SRC psyc 289 -_list_fruits #1 290 -_list_fruits #1 1 291 #+END_SRC 292 293 *** Dictionary syntax 294 295 A dictionary is a set of key-value pairs. Its syntax specified in ABNF is: 296 297 #+BEGIN_SRC abnf 298 dict = [ type ] *dict-item 299 dict-item = dict-item-key dict-item-value 300 dict-item-key = "{" ( dict-key / length SP *OCTET) "}" 301 dict-item-value = type [ SP dict-value ] 302 dict-item-value =/ [ length ] [ ":" type ] [ SP *OCTET ] 303 dict-key = %x00-7C / %x7E-FF ; any byte except "{" 304 dict-value = %x00-7A / %x7C-FF ; any byte except "}" 305 #+END_SRC 306 307 =type= is the default type for elements which do not have a type specified. 308 309 Examples: 310 #+BEGIN_SRC psyc 311 =_dict_one _type{4 key1}6 value1{key2} value2{key3}6 value3 312 =_dict_two {4 key1}=_type1:6 val1{key2}=_type2 val2{key3}6 val3 313 314 =_dict_avatars _picture{alice}3 \o/{bob}7 \oXoXo/ 315 #+END_SRC 316 317 The =struct= type can be used to define dictionary values with less 318 repetition. The structure is first defined once, then used for one or all 319 elements. It works like a C struct, a list of types are defined in a specific 320 order, after that we don't have to specify the types again when specifying the values. 321 322 #+BEGIN_SRC psyc 323 =_struct_member |=_nick|=_picture 324 =_dict_members _struct_member{13 psyc://alice/}12 | alice| \o/ 325 =_dict_members {psyc://alice/}=_struct_member | alice| \o/ 326 #+END_SRC 327 328 **** Adding dictionary entries 329 330 The =+= operator is used for adding entries to an existing dictionary. The syntax 331 is equivalent to the initial assignment of entries. If a key already exists in 332 the dictionary, its value is overwritten. 333 334 **** Removing entries from a dictionary 335 336 The =-= operator is used for removing entries, syntax is the same as assignment 337 but only the keys are listed. 338 339 Example, removing 2 entries: 340 #+BEGIN_SRC psyc 341 -_dict_members {psyc://alice/}{psyc://bob/} 342 #+END_SRC 343 344 *** Update syntax 345 346 For updating specific entries in a list or dictionary the =@= operator is used. It 347 has the following syntax: 348 349 #+BEGIN_SRC abnf 350 update = 1*index SP op [ type ] [ ":" length] [SP value] 351 index = ( dict-item-key / index-list / index-struct ) 352 index-list = "#" 1*DIGIT 353 index-struct = "." type 354 #+END_SRC 355 356 Examples: 357 #+BEGIN_SRC psyc 358 @_list_gallery #-1 =_picture:7 \oXoXo/ 359 @_list_gallery #-1 =:7 \oXoXo/ 360 @_list_fruits #1 = pear 361 @_list_prices #2 =_int 1000 362 363 @_dict_gallery {alice} =_picture:7 \oXoXo/ 364 @_dict_gallery {alice} =:7 \oXoXo/ 365 @_dict_members {psyc://alice/}._nick = Alice 366 @_dict_members {psyc://bob/}._nick + Bob 367 @_dict_members {psyc://foo/}._int_score + 2 368 #+END_SRC 369 370 ** Storage 371 372 Incoming and outgoing packets, state variables and channel configuration are 373 stored in an SQLite database. This allows for persistent storage of context 374 state as well, which is restored after a restart of the node. Packets are stored 375 for two purposes: it provides a message history for contexts and it can be used 376 later to resend lost packets to nodes requesting it. 377 378 SQLite is used mainly because of its efficient memory handling and wide platform 379 support. 380 381 The database consists of two tables with the following schema: 382 - *contexts* (*uni* blob primary key, *state* blob, *config* blob, 383 *created* timestamp default current_timestamp) 384 - *packets* (*context* blob, *source* blob, *target* blob, *counter* unsigned int, 385 *fragment* unsigned int, *packet* blob, 386 *created* timestamp default current_timestamp,\\ 387 *primary key* (context, source, target, counter, fragment)) 388 389 We store information about subscribed and hosted contexts in these tables. 390 The contexts table is used for storing configuration and state of contexts, 391 whereas the packets table is for storing packet history. All this information 392 is stored in PSYC packet format in the database.