File-sharing
------------

This chapter documents the GNUnet file-sharing application. The original
file-sharing implementation for GNUnet was designed to provide anonymous
file-sharing. However, over time, we have also added support for
non-anonymous file-sharing (which can provide better performance).
Anonymous and non-anonymous file-sharing are closely integrated in GNUnet
and, except for routing, share most of the concepts and implementation.
There are three primary file-sharing operations: publishing, searching
and downloading. For each of these operations, the user specifies an
anonymity level. If both the publisher and the searcher/downloader
specify “no anonymity”, non-anonymous file-sharing is used. If either
user specifies some desired degree of anonymity, anonymous file-sharing
will be used.

After a short introduction, we will first look at the various concepts
in GNUnet’s file-sharing implementation. Then, we will discuss specifics
as to how they impact users that publish, search or download files.

Searching
~~~~~~~~~

The command ``gnunet-search`` can be used to search for content on
GNUnet. The format is:

::

   $ gnunet-search [-t TIMEOUT] KEYWORD

The ``-t`` option specifies that the query should time out after
approximately ``TIMEOUT`` seconds. A value of zero (“0”) is interpreted
as no timeout, which is the default. In this case, ``gnunet-search``
will never terminate (unless you press CTRL-C).

If multiple words are passed as keywords, they will all be considered
optional. Prefix keywords with a “+” to make them mandatory.

Note that searching using:

::

   $ gnunet-search Das Kapital

is not the same as searching for

::

   $ gnunet-search "Das Kapital"

as the first will match files shared under the keywords “Das” or
“Kapital” whereas the second will match files shared under the keyword
“Das Kapital”.
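
The difference between the two invocations comes from how the shell
splits the command line into arguments before ``gnunet-search`` ever
sees them. As a small illustration (not part of GNUnet itself), Python's
standard ``shlex`` module can be used to mimic POSIX shell splitting:

.. code-block:: python

   import shlex

   # Unquoted: the shell hands two separate keywords to gnunet-search.
   unquoted = shlex.split('gnunet-search Das Kapital')

   # Quoted: the shell hands over one keyword that contains a space.
   quoted = shlex.split('gnunet-search "Das Kapital"')

   print(unquoted[1:])  # ['Das', 'Kapital']
   print(quoted[1:])    # ['Das Kapital']
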

Search results are printed like this:

::

   #15:
   gnunet-download -o "COPYING" gnunet://fs/chk/PGK8M...3EK130.75446

The whole line is the command you would have to enter to download the
file. The argument passed to ``-o`` is the suggested filename (you
may change it to whatever you like). It is followed by the key for
decrypting the file, the query for searching the file, a checksum (in
hexadecimal) and finally the size of the file in bytes.

Downloading
~~~~~~~~~~~

In order to download a file, you need the whole line returned by
``gnunet-search``. You can then use the tool ``gnunet-download`` to
obtain the file:

::

   $ gnunet-download -o <FILENAME> <GNUNET-URL>

``FILENAME`` specifies the name of the file where GNUnet is supposed to
write the result. Existing files are overwritten. If the existing file
contains blocks that are identical to the desired download, those blocks
will not be downloaded again (automatic resume).

If you want to download the GPL from the previous example, you do the
following:

::

   $ gnunet-download -o "COPYING" gnunet://fs/chk/PGK8M...3EK130.75446

If you ever have to abort a download, you can continue it at any time by
re-issuing ``gnunet-download`` with the same filename. In that case,
GNUnet will **not** download blocks again that are already present.

GNUnet’s file-encoding mechanism will ensure file integrity, even if the
existing file was not downloaded from GNUnet in the first place.

You may want to use the ``-V`` switch to turn on verbose reporting. In
this case, ``gnunet-download`` will print the current number of bytes
downloaded whenever new data is received.

Publishing
~~~~~~~~~~

The command ``gnunet-publish`` can be used to add content to the
network.
The basic format of the command is:

::

   $ gnunet-publish [-n] [-k KEYWORDS]* [-m TYPE:VALUE] FILENAME

For example:

::

   $ gnunet-publish -m "description:GNU License" -k gpl -k test -m "mimetype:text/plain" COPYING

The option ``-k`` is used to specify keywords for the file that should
be inserted. You can supply any number of keywords, and each of the
keywords will be sufficient to locate and retrieve the file. Please note
that you must repeat the ``-k`` option once for each expression you use
as a keyword for the file.

The ``-m`` option is used to specify meta-data, such as descriptions.
You can use ``-m`` multiple times. The ``TYPE`` passed must be from the
list of meta-data types known to libextractor. You can obtain this list
by running ``extract -L``. Use quotes around the entire meta-data
argument if the value contains spaces. The meta-data is displayed to
other users when they select which files to download. The meta-data and
the keywords are optional and may be inferred using GNU libextractor.

``gnunet-publish`` has a few additional options to handle namespaces and
directories. Refer to the man-page for details.

Indexing vs Inserting
~~~~~~~~~~~~~~~~~~~~~

By default, GNUnet indexes a file instead of making a full copy. This is
much more efficient, but requires the file to stay unaltered at the
location where it was when it was indexed. If you intend to move, delete
or alter a file, consider using the option ``-n`` which will force
GNUnet to make a copy of the file in the database.

Since making a full copy is much less efficient, using ``-n`` is
strongly discouraged for large files. When GNUnet indexes a file
(default), GNUnet does **not** create
an additional encrypted copy of the file but just computes a summary (or
index) of the file.
That summary is approximately two percent of the
size of the original file and is stored in GNUnet’s database. Whenever a
request for a part of an indexed file reaches GNUnet, this part is
encrypted on-demand and sent out. This way, there is no need for an
additional encrypted copy of the file to stay anywhere on the drive.
This is different from other systems, such as Freenet, where each file
that is put online must be in Freenet’s database in encrypted format,
doubling the space requirements if the user wants to preserve a directly
accessible copy in plaintext.

Thus indexing should be used for all files where the user will keep
using this file (at the location given to ``gnunet-publish``) and does
not want to retrieve it back from GNUnet each time. If you want to
remove a file that you have indexed from the local peer, use the tool
``gnunet-unindex`` to un-index the file.

The option ``-n`` may be used if the user fears that the file might be
found on their drive (assuming the computer comes under the control of
an adversary). When a file is published with the ``-n`` flag, the user
has a much better chance of denying knowledge of the existence of the
file, even if it is still (encrypted) on the drive and the adversary is
able to crack the encryption (e.g. by guessing the keyword).

.. _fs_002dConcepts:

Concepts
~~~~~~~~

For better results with file-sharing it is useful to understand the
following concepts. In addition to anonymous routing, GNUnet attempts to
give users a better experience in searching for content. GNUnet uses
cryptography to safely break content into smaller pieces that can be
obtained from different sources without allowing participants to corrupt
files. GNUnet makes it difficult for an adversary to send back bogus
search results. GNUnet enables content providers to group related
content and to establish a reputation.
Furthermore, GNUnet allows
updates to certain content to be made available. This section
introduces users to the concepts that are used to achieve
these goals.

.. _Files:

Files
^^^^^

A file in GNUnet is just a sequence of bytes. Any file-format is allowed
and the maximum file size is theoretically :math:`2^{64} - 1` bytes,
except that it would take an impractical amount of time to share such a
file. GNUnet itself never interprets the contents of shared files,
except when using GNU libextractor to obtain keywords.

.. _Keywords:

Keywords
^^^^^^^^

Keywords are the simplest mechanism to find files on GNUnet. Keywords
are **case-sensitive** and the search string must always match
**exactly** the keyword used by the person providing the file. Keywords
are never transmitted in plaintext. The only way for an adversary to
determine the keyword that you used to search is to guess it (which then
allows the adversary to produce the same search request). Since
providing keywords by hand for each shared file is tedious, GNUnet uses
GNU libextractor to help automate this process. Starting a keyword
search on a slow machine can take a little while since the keyword
search involves computing a fresh RSA key to formulate the request.

.. _Directories:

Directories
^^^^^^^^^^^

A directory in GNUnet is a list of file identifiers with meta data. The
file identifiers provide sufficient information about the files to allow
downloading the contents. Once a directory has been created, it cannot
be changed since it is treated just like an ordinary file by the
network. Small files (of a few kilobytes) can be inlined in the
directory, so that a separate download becomes unnecessary.

Directories are shared just like ordinary files.
If you download a
directory with ``gnunet-download``, you can use ``gnunet-directory`` to
list its contents. The canonical extension for GNUnet directories when
stored as files in your local file-system is ".gnd". The contents of a
directory are URIs and meta data. The URIs contain all the information
required by ``gnunet-download`` to retrieve the file. The meta data
typically includes the mime-type, description, a filename and other meta
information, and possibly even the full original file (if it was small).

.. _Egos-and-File_002dSharing:

Egos and File-Sharing
^^^^^^^^^^^^^^^^^^^^^

When sharing files, it is sometimes desirable to build a reputation as a
source for quality information. With egos, publishers can
(cryptographically) sign files, thereby demonstrating that various files
were published by the same entity. An ego thus allows users to link
different publication events, thereby deliberately reducing anonymity to
pseudonymity.

Egos used in GNUnet's file-sharing for such pseudonymous publishing also
correspond to the egos used to identify and sign zones in the GNU Name
System. However, if the same ego is used for file-sharing and for a GNS
zone, this will weaken the privacy assurances provided by the anonymous
file-sharing protocol.

Note that an ego is NOT bound to a GNUnet peer. There can be multiple
egos for a single user, and users could (theoretically) share an ego by
copying the respective private keys.

.. _Namespaces:

Namespaces
^^^^^^^^^^

A namespace is a set of files that were signed by the same ego. Today,
namespaces are implemented independently of GNS zones, but in the future
we plan to merge the two such that a GNS zone can basically contain
files using a file-sharing specific record type.

Files (or directories) that have been signed and placed into a namespace
can be updated. Updates are identified as authentic if the same secret
key was used to sign the update.

.. _Advertisements:

Advertisements
^^^^^^^^^^^^^^

Advertisements are used to notify other users about the existence of a
namespace. Advertisements are propagated using the normal keyword
search. When an advertisement is received (in response to a search), the
namespace is added to the list of namespaces available in the
namespace-search dialogs of gnunet-fs-gtk and printed by
``gnunet-identity``. Whenever a namespace is created, an appropriate
advertisement can be generated. The default keyword for the advertising
of namespaces is "namespace".

.. _Anonymity-level:

Anonymity level
^^^^^^^^^^^^^^^

The anonymity level determines how hard it should be for an adversary to
determine the identity of the publisher or the searcher/downloader. An
anonymity level of zero means that anonymity is not required. The
default anonymity level of "1" means that anonymous routing is
desired, but no particular amount of cover traffic is necessary. A
powerful adversary might thus still be able to deduce the origin of the
traffic using traffic analysis. Specifying higher anonymity levels
increases the amount of cover traffic required.

The meaning of the numeric value (for anonymity levels above 1) is
simple: given an anonymity level L (above 1), each request FS makes on
your behalf must be hidden in L-1 equivalent requests of cover traffic
(traffic your peer routes for others) in the same time-period. The
time-period is twice the average delay by which GNUnet artificially
delays traffic.

While higher anonymity levels may offer better privacy, they can also
significantly hurt performance.

.. _Content-Priority:

Content Priority
^^^^^^^^^^^^^^^^

Depending on the peer's configuration, GNUnet peers migrate content
between peers. Content in this sense means individual blocks of a file,
not necessarily entire files. When peers run out of space (due to local
publishing operations or due to migration of content from other peers),
blocks sometimes need to be discarded. GNUnet always discards expired
blocks first (typically, blocks are published with an expiration of
about two years in the future; the expiration time is another
configurable option). If there is
still not enough space, GNUnet discards the blocks with the lowest
priority. The priority of a block is decided by its popularity (in terms
of requests from peers we trust) and, in case of blocks published
locally, the base-priority that was specified by the user when the block
was published initially.

.. _Replication:

Replication
^^^^^^^^^^^

When peers migrate content to other systems, the replication level of a
block is used to decide which blocks need to be migrated most urgently.
GNUnet will always push the block with the highest replication level
into the network, and then decrement the replication level by one. If
all blocks reach replication level zero, the selection is simply random.

.. _Namespace-Management:

Namespace Management
~~~~~~~~~~~~~~~~~~~~

The ``gnunet-identity`` tool can be used to create egos. By default,
``gnunet-identity --display`` simply lists all locally available egos.

.. _Creating-Egos:

Creating Egos
^^^^^^^^^^^^^

With the ``--create=NICK`` option it can also be used to create a new
ego. An ego is the virtual identity of the entity in control of a
namespace or GNS zone. Anyone can create any number of egos. The
provided NICK name automatically corresponds to a GNU Name System domain
name.
Thus, henceforth name resolution for any name ending in ".NICK"
will use the NICK's zone. You should avoid using NICKs that collide with
well-known DNS names.

Currently, the IDENTITY subsystem supports two types of identity keys:
ECDSA and EdDSA. By default, identities are created with ECDSA
keys. In order to create an identity with EdDSA keys, you can use the
``--eddsa`` flag.

.. _Deleting-Egos:

Deleting Egos
^^^^^^^^^^^^^

With the ``-D NICK`` option egos can be deleted. Once the ego has been
deleted it is impossible to add content to the corresponding namespace
or zone. However, the existing GNS zone data is currently not dropped.
This may change in the future.

Deleting the ego does not make the namespace or any content in it
unavailable.

.. _File_002dSharing-URIs:

File-Sharing URIs
~~~~~~~~~~~~~~~~~

GNUnet (currently) uses four different types of URIs for file-sharing.
They all begin with "gnunet://fs/". This section describes the four
different URI types in detail.

For FS URIs empty KEYWORDs are not allowed. Quotes are allowed to denote
whitespace between words. Keywords must contain a balanced number of
double quotes. Double quotes cannot be used in the actual keywords.
This means that the string '""foo bar""' will be turned into two
OR-ed keywords 'foo' and 'bar', not into '"foo bar"'.

.. _Encoding-of-hash-values-in-URIs:

Encoding of hash values in URIs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most URIs include some hash values. Hashes are encoded using base32hex
(RFC 2938).

.. _chk-uri:

.. _Content-Hash-Key-_0028chk_0029:

Content Hash Key (chk)
^^^^^^^^^^^^^^^^^^^^^^

A chk-URI is used to (uniquely) identify a file or directory and to
allow peers to download the file. Files are stored in GNUnet as a tree
of encrypted blocks.
The chk-URI thus contains the information to
download and decrypt those blocks. A chk-URI has the format
"gnunet://fs/chk/KEYHASH.QUERYHASH.SIZE". Here, "SIZE" is the size
of the file (which allows a peer to determine the shape of the tree),
KEYHASH is the key used to decrypt the file (also the hash of the
plaintext of the top block) and QUERYHASH is the query used to request
the top-level block (also the hash of the encrypted block).

.. _loc-uri:

.. _Location-identifiers-_0028loc_0029:

Location identifiers (loc)
^^^^^^^^^^^^^^^^^^^^^^^^^^

For non-anonymous file-sharing, loc-URIs are used to specify which peer
is offering the data (in addition to specifying all of the data from a
chk-URI). Location identifiers include a digital signature of the peer
to affirm that the peer is truly the origin of the data. The format is
"gnunet://fs/loc/KEYHASH.QUERYHASH.SIZE.PEER.SIG.EXPTIME". Here,
"PEER" is the public key of the peer (in GNUnet format in base32hex),
SIG is the RSA signature (in GNUnet format in base32hex) and EXPTIME
specifies when the signature expires (in milliseconds after 1970).

.. _ksk-uri:

.. _Keyword-queries-_0028ksk_0029:

Keyword queries (ksk)
^^^^^^^^^^^^^^^^^^^^^

A keyword-URI is used to specify that the desired operation is the
search using a particular keyword. The format is simply
"gnunet://fs/ksk/KEYWORD". Non-ASCII characters can be specified using
the typical URI-encoding (using hex values) from HTTP. "+" can be used
to specify multiple keywords (which are then logically "OR"-ed in the
search; results matching both keywords are given a higher rank):
"gnunet://fs/ksk/KEYWORD1+KEYWORD2". ksk-URIs must not begin or end
with the plus ('+') character. Furthermore they must not contain '++'.

.. _sks-uri:

.. _Namespace-content-_0028sks_0029:

Namespace content (sks)
^^^^^^^^^^^^^^^^^^^^^^^

**Please note that the text in this subsection is outdated and needs**
**to be rewritten for version 0.10!** **This especially concerns the
terminology of Pseudonym/Ego/Identity.**

Namespaces are sets of files that have been approved by some (usually
pseudonymous) user, typically by that user publishing all of the
files together. A file can be in many namespaces. A file is in a
namespace if the owner of the ego (aka the namespace's private key)
signs the CHK of the file cryptographically. An SKS-URI is used to
search a namespace. The result is a block containing meta data, the CHK
and the namespace owner's signature. The format of a sks-URI is
"gnunet://fs/sks/NAMESPACE/IDENTIFIER". Here, "NAMESPACE" is the
public key for the namespace. "IDENTIFIER" is a freely chosen keyword
(or password!). A commonly used identifier is "root" which by
convention refers to some kind of index or other entry point into the
namespace.
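
As a rough summary of the four URI formats described in this section,
the following Python sketch splits an FS URI into its named fields. It
is not GNUnet's own parser: the function name ``parse_fs_uri`` and the
layout of the returned dictionary are assumptions made for this example,
the hash values in the usage lines are placeholders rather than real
hashes, and the quoting rules for keywords are not handled.

.. code-block:: python

   from urllib.parse import unquote

   PREFIX = "gnunet://fs/"


   def parse_fs_uri(uri):
       """Split a GNUnet FS URI into its type and fields (sketch only)."""
       if not uri.startswith(PREFIX):
           raise ValueError("not a GNUnet file-sharing URI")
       kind, _, rest = uri[len(PREFIX):].partition("/")
       if kind == "chk":
           # KEYHASH.QUERYHASH.SIZE
           keyhash, queryhash, size = rest.split(".")
           return {"type": kind, "keyhash": keyhash,
                   "queryhash": queryhash, "size": int(size)}
       if kind == "loc":
           # KEYHASH.QUERYHASH.SIZE.PEER.SIG.EXPTIME
           keyhash, queryhash, size, peer, sig, exptime = rest.split(".")
           return {"type": kind, "keyhash": keyhash,
                   "queryhash": queryhash, "size": int(size),
                   "peer": peer, "sig": sig, "exptime": int(exptime)}
       if kind == "ksk":
           # KEYWORD1+KEYWORD2; no empty keyword, no leading/trailing '+'
           if (not rest or rest.startswith("+") or rest.endswith("+")
                   or "++" in rest):
               raise ValueError("malformed keyword list")
           # Undo the HTTP-style percent-encoding of each keyword.
           return {"type": kind,
                   "keywords": [unquote(k) for k in rest.split("+")]}
       if kind == "sks":
           # NAMESPACE/IDENTIFIER
           namespace, _, identifier = rest.partition("/")
           return {"type": kind, "namespace": namespace,
                   "identifier": identifier}
       raise ValueError("unknown FS URI type: " + kind)


   # Placeholder values, for illustration only.
   print(parse_fs_uri("gnunet://fs/ksk/gpl+test"))
   print(parse_fs_uri("gnunet://fs/chk/AAAA.BBBB.75446"))

Splitting on "." is safe for the chk and loc formats because the
base32hex alphabet does not contain that character.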