impl - presentations - Presentations

impl (16481B)

1 * Implementation
2
3 This chapter describes core concepts in PSYC, how they are applied in a
4 peer-to-peer context and what changes we had to make to the federated PSYC
5 \cite{psyc-paper} protocol to make it work in a peer-to-peer network.
6
7 Federated PSYC is the existing implementation of the PSYC protocol designed for
8 a federated architecture. It is implemented as a stand-alone daemon process
9 written in the LPC language.
10
11 P2P PSYC is the new implementation we have developed and the one we use in
12 Secure Share. The messaging daemon -- called psycd -- is implemented in C as a
13 service in the GNUnet framework. It uses GNUnet libraries for communication with
14 the rest of GNUnet, and libpsyc for the parsing and rendering of PSYC packets.
15 It stores data in an SQLite database.
16
17 ** Syntax
18
19 PSYC is a text-based protocol with length prefixes for binary data, which makes
20 it possible to transmit any kind of content in PSYC packets efficiently while
21 keeping the protocol extensible. Its syntax is described in [[#syntax][Appendix 1]].
22
23 An example packet looks like this:
24
25 #+BEGIN_SRC psyc
26 :_context psyc://J61VSCQA:g/#test
27 :_source_relay psyc://I0GCD93U:g/
28 70
29 =_simple_var value
30 :_binary_var 5 value
31 _method_name
32 Packet
33 body
34 here.
35 |
36 #+END_SRC
37
38 A packet contains a routing header, followed by the length of the rest of the packet,
39 context state modifiers, the method name and the packet body.
40
41 ** Identifiers
42
43 In federated PSYC a server is identified by its DNS domain name. A server hosts
44 person and group entities, each of which can manage several channels. Uniforms
45 serve as identifiers for entities or channels, described with a URI (Uniform
46 Resource Identifier) syntax:
47
48 : psyc://host[:port[transport]][/[entity-type]entity[#channel]]
49 : psyc://example.net/~alice#friends
50
51 In peer-to-peer PSYC DNS is not employed, a public key is used instead to
52 identify node, person or group. GNUnet uses a SHA-512 hash of the public key as
53 node identifiers, we use a similar method for identifying entities. The
54 ASCII-encoded version of this hash becomes the host part of the uniform, with no
55 port number and 'g' as transport identifier:
56
57 : psyc://pubkey-hash:g[/[entity-type]entity[#channel]]
58 : psyc://I0GC...L29G:g/#friends
59
60 As these identifiers are very long and not user-friendly, they can be aliased to
61 shorter nicknames. The aliases are only used in client applications, they do not
62 appear on the protocol level.
63
64 In the prototype version GNUnet's host keys are used for identifying person
65 entities as well, this simplification allows only one person per node. A more
66 elaborate identification scheme is to be implemented later.
67
68 Each user will have a master key which serves as the identifier of the person,
69 its purpose is to sign subkeys used by various devices of the person. If a
70 subkey gets compromised, the master key can be used to prune messages sent with
71 the compromised key.
72
73 These subkeys are assigned to person entities. A GNUnet node can host one or
74 more entities. When using the distance vector transport, node and entity IDs are
75 added to the DV routing table, and nodes gossip about available peers and
76 entities in a local neighborhood up to a limited number of hops away, in the
77 social circle of users. When using the mesh service, user ID to current node ID
78 mappings are stored in the DHT.
79
80 ** Circuits
81
82 A circuit is a virtual connection between two PSYC nodes, packets are sent and
83 received over circuits. When sending packets the circuit type is determined by
84 the transport specified in the target uniform.
85
86 In federated PSYC we had TCP, UDP and TLS transports. In P2P PSYC psycd
87 implements two circuit types so far: TCP circuits for local clients and GNUnet
88 circuits for remote peers. Unix sockets, TLS and possibly UDP circuits are
89 planned for later.
90
91 ** Contacting peers
92
93 In federated PSYC it was enough to know the uniform of a person or group to
94 establish contact. The uniform contains the host name, port number and transport
95 method, which is all the information needed to establish connection to the
96 remote entity.
97
98 When using PSYC over P2P, two nodes have to know each other's public key and
99 know how to reach the node associated with the public key. GNUnet introduces
100 nodes to each other using hello messages which contain a public key and various
101 transport methods and addresses which can be used to establish contact with the
102 node. In case of the DV transport a hello message contains the identifier of
103 another node through which it can be reached. The DV routing protocol gossips
104 about connected nodes and entities in the network so they become reachable by
105 their social network.
106
107 When two users want to talk to each other, they should have received a hello
108 message from the other party beforehand. When using the DV transport they might
109 already know about each other if they are connected through common friends and
110 received a gossip message about the other node. If they are on the same network
111 they would discover each other through IPv4 broadcast or IPv6 multicast, or when
112 using the WLAN transport a WiFi mesh network is created from the present
113 nodes. Otherwise a hello message can be exchanged manually between users, using
114 e.g. email or a USB stick. When sending a hello message over an insecure channel
115 it should be encrypted using a shared secret in order to maintain
116 confidentiality and integrity of the information contained within. Usually it's
117 enough to exchange hello messages manually once when establishing connection for
118 the first time, after that more stable, longer running nodes would be available
119 to bootstrap a reconnecting node.
120
121 When connection is established between two users, they set appropriate trust
122 levels for each other -- which can be used in routing decisions in the network --
123 and they subscribe one or more channels of the other party.
124
125 ** Entities
126
127 Entities are addressable objects in the PSYC network. Entity types include place
128 entities which are used for group communication or news feeds, and person
129 entities which can make friendships between each other and subscribe to other
130 entities. Each entity manages one or more channels with different subscription
131 lists.
132
133 Psycd implements person entities enabling clients to link to their entity, send
134 and receive messages and manage membership of various channels. It also has a
135 simple implementation of place entities providing dedicated group messaging.
136
137 ** Multicast contexts
138
139 PSYC uses multicast contexts for efficient distribution of messages. A context
140 is managed by the context master at the top of the distribution tree. Context
141 members send packets to the context master which distributes them to context
142 slaves on the next level in the multicast tree, which distribute them further
143 down the tree. Figure 4.1 shows such a tree.
144
145 #+CAPTION: Multicast context distribution tree
146 [[./context.png]]
147
148 Entities manage multiple channels, each of which is a separate multicast context
149 having different membership and multicast distribution tree. Social
150 interactions, such as status updates, group and private messaging can be modeled
151 using these channels. An entity manages membership of its channels, in case of a
152 person entity this could be used to create different circles of friends using a
153 channel for each of them, or provide different channels for various topics to
154 which interested friends -- or if desired anyone who can contact the person --
155 can subscribe to. Ad-hoc group and private chats with friends can be modeled as
156 well with channels of a person entity.
157
158 Federated PSYC only implemented manually configured multicast distribution trees
159 so far, this should be made fully automatic in the peer-to-peer version. When
160 multicast routing is added, every node becomes a multicast routing hop serving
161 several multicast contexts. A node can join a multicast context at any other
162 node already a member of that particular context. By adding encryption to
163 multicast contexts any node can help in the multicast routing process without
164 being able to decrypt message contents. This way receiving packets for a
165 multicast context does not necessarily mean that the given node can decrypt the
166 packets sent to it. In its simplest implementation multicast encryption involves
167 a symmetric key distributed by the context master to all the members which has
168 to be changed periodically, and when a member joins or leaves.
169
170 In \cite{hordes} Hordes, an anonymity protocol based on IP multicast is
171 suggested. While we're not using IP multicast, part of their analysis could be
172 applied to application-level multicast implemented in a P2P network.
173
174 The prototype does not implement actual multicast yet, multicast contexts are
175 modeled but messages to contexts are distributed to each member by unicast.
176
177 ** Distributed state
178
179 PSYC has the concept of distributed state, a set of key-value pairs -- state
180 variables -- are assigned to each multicast context and distributed to every
181 member. It is used to model profile data, context membership, or any other data
182 related to a context. Advantage of this approach is that it avoids unnecessary
183 request-response operations as members have an up-to-date version of the state
184 data most of the time, and allows local browsing of profiles of contacts, even
185 offline. We have implemented distributed state for P2P PSYC in psycd -- a feature
186 federated PSYC has long planned for but still lacked.
187
188 Context state is kept in sync using state modifiers provided by the PSYC syntax.
189 A state modifier adds, removes or modifies a state variable. State changes are
190 distributed to context members only once, which means it is very bandwidth
191 efficient. Using state modifiers require reliable, in-order delivery of
192 packets. Packet loss can be detected with the help of a =_counter= variable in
193 the routing header of packets. As the name suggests, it is a counter incremented
194 by one for every packet sent to the context. When there's a missed packet, a
195 node can re-request it from its parent node in the multicast distribution
196 tree. After a node has joined a context, a full state synchronization is
197 necessary to bring the node up-to-date.
198
199 Syntax of a state modifier in Augmented Backus-Naur Form (ABNF):
200
201 #+BEGIN_SRC abnf
202 entity-modifier = operator variable entity-arg
203 entity-arg = simple-arg / binary-arg / LF
204
205 operator = "=" / ":" / "+" / "-" / "?" / "!" / "@"
206 variable = 1*kwchar
207 simple-arg = HTAB text-data LF
208 binary-arg = SP length HTAB binary-data LF
209 length = 1*DIGIT
210 binary-data = *OCTET
211 #+END_SRC
212
213 Operators:
214 - =:= (set) -- set variable just for the current packet, state is not modified
215 - ~=~ (assign) -- assign value to state variable
216 - =+= (augment) -- concatenate string or add list/dictionary element, depending
217 on type
218 - =-= (diminish) -- remove list or dictionary element
219 - =@= (update) -- update an item in a list or dictionary
220 - =?= alone on a line: request state synchronization, all state variables are
221 returned in the response
222 - ~=~ alone on a line: reset state, i.e. remove all previously stored state
223 variables
224 - the rest of the operators are reserved for future use
225
226 *** Syntax changes
227
228 The state implementation involved some syntax changes: we have added a
229 dictionary type in order to be able to store key-value pairs in a state
230 variable, and modified the list syntax to make it consistent with the new
231 dictionary syntax, allowing us to specify types for list elements as well. We
232 have also added a new update modifier, which allows for updating individual list
233 and dictionary elements.
234
235 These syntax changes were necessary to represent more complex data structures,
236 such as context members or alias mappings.
237
238 *** List syntax
239
240 A list is a list of ordered elements. Its syntax in ABNF is specified as the
241 following:
242
243 #+BEGIN_SRC abnf
244 list = [ default-type ] *list-elem
245 list-sep = "|"
246 list-elem = list-sep [ "=" type ] [ SP list-value ]
247 list-elem =/ list-sep "=" type ":" ] [ length ] [ SP *OCTET ]
248 list-value = %x00-7B / %x7D-FF ; any byte except "|"
249 #+END_SRC
250
251 Examples:
252 #+BEGIN_SRC psyc
253 =_list_one _type| elem1| elem2| elem3
254 =_list_two |=_type1 elem1|=_type2 elem2|=_type3 elem3
255 #+END_SRC
256
257 **** Inserting list elements
258
259 For inserting values before a specified index the =+= operator is used. Index of
260 the first element is 1, index of the last is -1. 0 means the end of the list,
261 which is the default if the index is omitted.
262
263 Syntax of the value part:
264 #+BEGIN_SRC abnf
265 list-insert = [ list-index SP ] list
266 list-index = "#" 1*DIGIT
267 #+END_SRC
268
269 #+LaTeX: \pagebreak
270
271 Example:
272 #+BEGIN_SRC psyc
273 +_list_fruits | banana| mango
274 +_list_fruits #0 | banana| mango
275 #+END_SRC
276
277 **** Removing list elements
278
279 For removing elements the =-= operator is used. Parameters are the start index
280 which defaults to -1, and the amount of elements to be removed which defaults to 1.
281
282 Syntax of the value part:
283 #+BEGIN_SRC abnf
284 list-remove = ( list-index SP uint | list-index | uint )
285 #+END_SRC
286
287 Example:
288 #+BEGIN_SRC psyc
289 -_list_fruits #1
290 -_list_fruits #1 1
291 #+END_SRC
292
293 *** Dictionary syntax
294
295 A dictionary is a set of key-value pairs. Its syntax specified in ABNF is:
296
297 #+BEGIN_SRC abnf
298 dict = [ type ] *dict-item
299 dict-item = dict-item-key dict-item-value
300 dict-item-key = "{" ( dict-key / length SP *OCTET) "}"
301 dict-item-value = type [ SP dict-value ]
302 dict-item-value =/ [ length ] [ ":" type ] [ SP *OCTET ]
303 dict-key = %x00-7C / %x7E-FF ; any byte except "{"
304 dict-value = %x00-7A / %x7C-FF ; any byte except "}"
305 #+END_SRC
306
307 =type= is the default type for elements which do not have a type specified.
308
309 Examples:
310 #+BEGIN_SRC psyc
311 =_dict_one _type{4 key1}6 value1{key2} value2{key3}6 value3
312 =_dict_two {4 key1}=_type1:6 val1{key2}=_type2 val2{key3}6 val3
313
314 =_dict_avatars _picture{alice}3 \o/{bob}7 \oXoXo/
315 #+END_SRC
316
317 The =struct= type can be used to define dictionary values with less
318 repetition. The structure is first defined once, then used for one or all
319 elements. It works like a C struct, a list of types are defined in a specific
320 order, after that we don't have to specify the types again when specifying the values.
321
322 #+BEGIN_SRC psyc
323 =_struct_member |=_nick|=_picture
324 =_dict_members _struct_member{13 psyc://alice/}12 | alice| \o/
325 =_dict_members {psyc://alice/}=_struct_member | alice| \o/
326 #+END_SRC
327
328 **** Adding dictionary entries
329
330 The =+= operator is used for adding entries to an existing dictionary. The syntax
331 is equivalent to the initial assignment of entries. If a key already exists in
332 the dictionary, its value is overwritten.
333
334 **** Removing entries from a dictionary
335
336 The =-= operator is used for removing entries, syntax is the same as assignment
337 but only the keys are listed.
338
339 Example, removing 2 entries:
340 #+BEGIN_SRC psyc
341 -_dict_members {psyc://alice/}{psyc://bob/}
342 #+END_SRC
343
344 *** Update syntax
345
346 For updating specific entries in a list or dictionary the =@= operator is used. It
347 has the following syntax:
348
349 #+BEGIN_SRC abnf
350 update = 1*index SP op [ type ] [ ":" length] [SP value]
351 index = ( dict-item-key / index-list / index-struct )
352 index-list = "#" 1*DIGIT
353 index-struct = "." type
354 #+END_SRC
355
356 Examples:
357 #+BEGIN_SRC psyc
358 @_list_gallery #-1 =_picture:7 \oXoXo/
359 @_list_gallery #-1 =:7 \oXoXo/
360 @_list_fruits #1 = pear
361 @_list_prices #2 =_int 1000
362
363 @_dict_gallery {alice} =_picture:7 \oXoXo/
364 @_dict_gallery {alice} =:7 \oXoXo/
365 @_dict_members {psyc://alice/}._nick = Alice
366 @_dict_members {psyc://bob/}._nick + Bob
367 @_dict_members {psyc://foo/}._int_score + 2
368 #+END_SRC
369
370 ** Storage
371
372 Incoming and outgoing packets, state variables and channel configuration are
373 stored in an SQLite database. This allows for persistent storage of context
374 state as well, which is restored after a restart of the node. Packets are stored
375 for two purposes: it provides a message history for contexts and it can be used
376 later to resend lost packets to nodes requesting it.
377
378 SQLite is used mainly because of its efficient memory handling and wide platform
379 support.
380
381 The database consists of two tables with the following schema:
382 - *contexts* (*uni* blob primary key, *state* blob, *config* blob,
383 *created* timestamp default current_timestamp)
384 - *packets* (*context* blob, *source* blob, *target* blob, *counter* unsigned int,
385 *fragment* unsigned int, *packet* blob,
386 *created* timestamp default current_timestamp,\\
387 *primary key* (context, source, target, counter, fragment))
388
389 We store information about subscribed and hosted contexts in these tables.
390 The contexts table is used for storing configuration and state of contexts,
391 whereas the packets table is for storing packet history. All this information
392 is stored in PSYC packet format in the database.

	presentations Presentations
	Log \| Files \| Refs