fs.rst - gnunet-handbook - The GNUnet Handbook

      1 File-sharing
      2 ------------
      3 
      4 This chapter documents the GNUnet file-sharing application. The original
      5 file-sharing implementation for GNUnet was designed to provide anonymous
      6 file-sharing. However, over time, we have also added support for
      7 non-anonymous file-sharing (which can provide better performance).
      8 Anonymous and non-anonymous file-sharing are quite integrated in GNUnet
      9 and, except for routing, share most of the concepts and implementation.
     10 There are three primary file-sharing operations: publishing, searching
     11 and downloading. For each of these operations, the user specifies an
     12 anonymity level. If both the publisher and the searcher/downloader
     13 specify “no anonymity”, non-anonymous file-sharing is used. If either
     14 user specifies some desired degree of anonymity, anonymous file-sharing
     15 will be used.
     16 
     17 After a short introduction, we will first look at the various concepts
     18 in GNUnet’s file-sharing implementation. Then, we will discuss specifics
     19 as to how they impact users that publish, search or download files.
     20 
     21 Searching
     22 ~~~~~~~~~
     23 
     24 The command ``gnunet-search`` can be used to search for content on
     25 GNUnet. The format is:
     26 
     27 ::
     28 
     29    $ gnunet-search [-t TIMEOUT] KEYWORD
     30 
     31 The ``-t`` option specifies that the query should time out after
     32 approximately ``TIMEOUT`` seconds. A value of zero (“0”) is interpreted
     33 as no timeout, which is the default. In this case, gnunet-search will
     34 never terminate (unless you press CTRL-C).
     35 
     36 If multiple words are passed as keywords, they will all be considered
     37 optional. Prefix keywords with a “+” to make them mandatory.
     38 
     39 Note that searching using:
     40 
     41 ::
     42 
     43    $ gnunet-search Das Kapital
     44 
     45 is not the same as searching for
     46 
     47 ::
     48 
     49    $ gnunet-search "Das Kapital"
     50 
     51 as the first will match files shared under the keywords “Das” or
     52 “Kapital” whereas the second will match files shared under the keyword
     53 “Das Kapital”.
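
The difference arises before ``gnunet-search`` ever sees its arguments,
because the shell splits the command line into words. The following Python
sketch uses the standard ``shlex`` module to mimic POSIX shell word splitting.
It is only an illustration; the helper name ``keywords_seen`` is hypothetical
and not part of GNUnet.

.. code-block:: python

   import shlex

   def keywords_seen(command_line):
       """Return the arguments after the program name, split like a POSIX shell."""
       return shlex.split(command_line)[1:]

   # Unquoted: the shell passes two separate keywords.
   two = keywords_seen('gnunet-search Das Kapital')
   # Quoted: the shell passes a single keyword that contains a space.
   one = keywords_seen('gnunet-search "Das Kapital"')
   print(two)
   print(one)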
     54 
     55 Search results are printed like this:
     56 
     57 ::
     58 
     59    #15:
     60    gnunet-download -o "COPYING" gnunet://fs/chk/PGK8M...3EK130.75446
     61 
     62 The whole line is the command you would have to enter to download the
     63 file. The argument passed to ``-o`` is the suggested filename (you may
     64 change it to whatever you like). It is followed by the URI of the file,
     65 which encodes the key for decrypting the file, the query for requesting
     66 the file and, finally, the size of the file in bytes.
     67 
     68 Downloading
     69 ~~~~~~~~~~~
     70 
     71 In order to download a file, you need the whole line returned by
     72 gnunet-search. You can then use the tool ``gnunet-download`` to obtain
     73 the file:
     74 
     75 ::
     76 
     77    $ gnunet-download -o <FILENAME> <GNUNET-URL>
     78 
     79 ``FILENAME`` specifies the name of the file where GNUnet is supposed to
     80 write the result. Existing files are overwritten. If the existing file
     81 contains blocks that are identical to the desired download, those blocks
     82 will not be downloaded again (automatic resume).
     83 
     84 If you want to download the GPL from the previous example, you do the
     85 following:
     86 
     87 ::
     88 
     89    $ gnunet-download -o "COPYING" gnunet://fs/chk/PGK8M...3EK130.75446
     90 
     91 If you ever have to abort a download, you can continue it at any time by
     92 re-issuing gnunet-download with the same filename. In that case, GNUnet
     93 will **not** download blocks again that are already present.
     94 
     95 GNUnet’s file-encoding mechanism will ensure file integrity, even if the
     96 existing file was not downloaded from GNUnet in the first place.
     97 
     98 You may want to use the ``-V`` switch to turn on verbose reporting. In
     99 this case, gnunet-download will print the current number of bytes
    100 downloaded whenever new data is received.
    101 
    102 Publishing
    103 ~~~~~~~~~~
    104 
    105 The command ``gnunet-publish`` can be used to add content to the
    106 network. The basic format of the command is:
    107 
    108 ::
    109 
    110    $ gnunet-publish [-n] [-k KEYWORDS]* [-m TYPE:VALUE] FILENAME
    111 
    112 For example:
    113 
    114 ::
    115 
    116    $ gnunet-publish -m "description:GNU License" -k gpl -k test -m "mimetype:text/plain" COPYING
    117 
    118 The option ``-k`` is used to specify keywords for the file that should
    119 be inserted. You can supply any number of keywords, and each of the
    120 keywords will be sufficient to locate and retrieve the file. Please note
    121 that you must repeat the ``-k`` option for each keyword, using it once for
    122 each expression that serves as a keyword for the file.
    123 
    124 The ``-m`` option is used to specify meta-data, such as descriptions.
    125 You can use ``-m`` multiple times. The ``TYPE`` passed must be from the
    126 list of meta-data types known to libextractor. You can obtain this list
    127 by running ``extract -L``. Use quotes around the entire meta-data
    128 argument if the value contains spaces. The meta-data is displayed to
    129 other users when they select which files to download. The meta-data and
    130 the keywords are optional and may be inferred using GNU libextractor.
    131 
    132 ``gnunet-publish`` has a few additional options to handle namespaces and
    133 directories. Refer to the man-page for details.
    134 
    135 Indexing vs Inserting
    136 ~~~~~~~~~~~~~~~~~~~~~
    137 
    138 By default, GNUnet indexes a file instead of making a full copy. This is
    139 much more efficient, but requires the file to stay unaltered at the
    140 location where it was when it was indexed. If you intend to move, delete
    141 or alter a file, consider using the option ``-n`` which will force
    142 GNUnet to make a copy of the file in the database.
    143 
    144 Since inserting is much less efficient, it is strongly discouraged for large
    145 files. When GNUnet indexes a file (default), GNUnet does **not** create
    146 an additional encrypted copy of the file but just computes a summary (or
    147 index) of the file. That summary is approximately two percent of the
    148 size of the original file and is stored in GNUnet’s database. Whenever a
    149 request for a part of an indexed file reaches GNUnet, this part is
    150 encrypted on demand and sent out. This way, there is no need for an
    151 additional encrypted copy of the file to stay anywhere on the drive.
    152 This is different from other systems, such as Freenet, where each file
    153 that is put online must be in Freenet’s database in encrypted format,
    154 doubling the space requirements if the user wants to preserve a directly
    155 accessible copy in plaintext.
    156 
    157 Thus indexing should be used for all files where the user will keep
    158 using this file (at the location given to gnunet-publish) and does not
    159 want to retrieve it back from GNUnet each time. If you want to remove a
    160 file that you have indexed from the local peer, use the tool
    161 gnunet-unindex to un-index the file.
    162 
    163 The option ``-n`` may be used if the user fears that the file might be
    164 found on their drive (assuming the computer comes under the control of
    165 an adversary). When a file is published with the ``-n`` flag, the user
    166 has a much better chance of denying knowledge of the existence of the
    167 file, even if it is still (encrypted) on the drive and the adversary is
    168 able to crack the encryption (e.g. by guessing the keyword).
    169 
    170 .. _fs_002dConcepts:
    171 
    172 Concepts
    173 ~~~~~~~~
    174 
    175 For better results with file-sharing it is useful to understand the
    176 following concepts. In addition to anonymous routing, GNUnet attempts to
    177 give users a better experience in searching for content. GNUnet uses
    178 cryptography to safely break content into smaller pieces that can be
    179 obtained from different sources without allowing participants to corrupt
    180 files. GNUnet makes it difficult for an adversary to send back bogus
    181 search results. GNUnet enables content providers to group related
    182 content and to establish a reputation. Furthermore, GNUnet allows
    183 updates to certain content to be made available. This section is
    184 supposed to introduce users to the concepts that are used to achieve
    185 these goals.
    186 
    187 .. _Files:
    188 
    189 Files
    190 ^^^^^
    191 
    192 A file in GNUnet is just a sequence of bytes. Any file-format is allowed
    193 and the maximum file size is theoretically :math:`2^{64} - 1` bytes,
    194 except that it would take an impractical amount of time to share such a
    195 file. GNUnet itself never interprets the contents of shared files,
    196 except when using GNU libextractor to obtain keywords.
    197 
    198 .. _Keywords:
    199 
    200 Keywords
    201 ^^^^^^^^
    202 
    203 Keywords are the simplest mechanism to find files on GNUnet. Keywords
    204 are **case-sensitive** and the search string must always match
    205 **exactly** the keyword used by the person providing the file. Keywords
    206 are never transmitted in plaintext. The only way for an adversary to
    207 determine the keyword that you used to search is to guess it (which then
    208 allows the adversary to produce the same search request). Since
    209 providing keywords by hand for each shared file is tedious, GNUnet uses
    210 GNU libextractor to help automate this process. Starting a keyword
    211 search on a slow machine can take a little while since the keyword
    212 search involves computing a fresh RSA key to formulate the request.
    213 
    214 .. _Directories:
    215 
    216 Directories
    217 ^^^^^^^^^^^
    218 
    219 A directory in GNUnet is a list of file identifiers with meta data. The
    220 file identifiers provide sufficient information about the files to allow
    221 downloading the contents. Once a directory has been created, it cannot
    222 be changed since it is treated just like an ordinary file by the
    223 network. Small files (of a few kilobytes) can be inlined in the
    224 directory, so that a separate download becomes unnecessary.
    225 
    226 Directories are shared just like ordinary files. If you download a
    227 directory with ``gnunet-download``, you can use ``gnunet-directory`` to
    228 list its contents. The canonical extension for GNUnet directories when
    229 stored as files in your local file-system is \".gnd\". The contents of a
    230 directory are URIs and meta data. The URIs contain all the information
    231 required by ``gnunet-download`` to retrieve the file. The meta data
    232 typically includes the mime-type, description, a filename and other meta
    233 information, and possibly even the full original file (if it was small).
    234 
    235 .. _Egos-and-File_002dSharing:
    236 
    237 Egos and File-Sharing
    238 ^^^^^^^^^^^^^^^^^^^^^
    239 
    240 When sharing files, it is sometimes desirable to build a reputation as a
    241 source for quality information. With egos, publishers can
    242 (cryptographically) sign files, thereby demonstrating that various files
    243 were published by the same entity. An ego thus allows users to link
    244 different publication events, thereby deliberately reducing anonymity to
    245 pseudonymity.
    246 
    247 Egos used in GNUnet's file-sharing for such pseudonymous publishing also
    248 correspond to the egos used to identify and sign zones in the GNU Name
    249 System. However, if the same ego is used for file-sharing and for a GNS
    250 zone, this will weaken the privacy assurances provided by the anonymous
    251 file-sharing protocol.
    252 
    253 Note that an ego is NOT bound to a GNUnet peer. There can be multiple
    254 egos for a single user, and users could (theoretically) share an ego
    255 by copying its private key.
    256 
    257 .. _Namespaces:
    258 
    259 Namespaces
    260 ^^^^^^^^^^
    261 
    262 A namespace is a set of files that were signed by the same ego. Today,
    263 namespaces are implemented independently of GNS zones, but in the future
    264 we plan to merge the two such that a GNS zone can basically contain
    265 files using a file-sharing specific record type.
    266 
    267 Files (or directories) that have been signed and placed into a namespace
    268 can be updated. Updates are identified as authentic if the same secret
    269 key was used to sign the update.
    270 
    271 .. _Advertisements:
    272 
    273 Advertisements
    274 ^^^^^^^^^^^^^^
    275 
    276 Advertisements are used to notify other users about the existence of a
    277 namespace. Advertisements are propagated using the normal keyword
    278 search. When an advertisement is received (in response to a search), the
    279 namespace is added to the list of namespaces available in the
    280 namespace-search dialogs of gnunet-fs-gtk and printed by
    281 ``gnunet-identity``. Whenever a namespace is created, an appropriate
    282 advertisement can be generated. The default keyword for the advertising
    283 of namespaces is \"namespace\".
    284 
    285 .. _Anonymity-level:
    286 
    287 Anonymity level
    288 ^^^^^^^^^^^^^^^
    289 
    290 The anonymity level determines how hard it should be for an adversary to
    291 determine the identity of the publisher or the searcher/downloader. An
    292 anonymity level of zero means that anonymity is not required. The
    293 default anonymity level of \"1\" means that anonymous routing is
    294 desired, but no particular amount of cover traffic is necessary. A
    295 powerful adversary might thus still be able to deduce the origin of the
    296 traffic using traffic analysis. Specifying higher anonymity levels
    297 increases the amount of cover traffic required.
    298 
    299 The meaning of the numeric value (for anonymity levels above 1) is
    300 simple: given an anonymity level L (above 1), each request FS makes on
    301 your behalf must be hidden in L-1 equivalent requests of cover traffic
    302 (traffic your peer routes for others) in the same time-period. The
    303 time-period is twice the average delay by which GNUnet artificially
    304 delays traffic.
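
The rule above can be sketched in a few lines of Python. This is a minimal
illustration of the arithmetic only; the function names are hypothetical, and
how a peer actually counts cover traffic within a time-period is not modelled
here.

.. code-block:: python

   def cover_requests_needed(anonymity_level):
       """Cover-traffic requests required per own request in one time-period."""
       if anonymity_level < 0:
           raise ValueError("anonymity level must not be negative")
       # Levels 0 and 1 do not require any particular amount of cover traffic.
       if anonymity_level <= 1:
           return 0
       return anonymity_level - 1

   def may_send(anonymity_level, cover_seen_in_period):
       """True if enough cover traffic was routed in the current time-period."""
       return cover_seen_in_period >= cover_requests_needed(anonymity_level)

For example, with an anonymity level of 5 a request has to wait until the
peer has routed four equivalent requests for others in the same time-period.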
    305 
    306 While higher anonymity levels may offer better privacy, they can also
    307 significantly hurt performance.
    308 
    309 .. _Content-Priority:
    310 
    311 Content Priority
    312 ^^^^^^^^^^^^^^^^
    313 
    314 Depending on the peer's configuration, GNUnet peers migrate content
    315 between peers. Content in this sense means individual blocks of a file,
    316 not necessarily entire files. When peers run out of space (due to local
    317 publishing operations or due to migration of content from other peers),
    318 blocks sometimes need to be discarded. GNUnet always discards expired
    319 blocks first (typically, blocks are published with an expiration of
    320 about two years in the future; the expiration time is configurable). If
    321 there is still not enough space, GNUnet discards the blocks with the lowest
    322 priority. The priority of a block is decided by its popularity (in terms
    323 of requests from peers we trust) and, in case of blocks published
    324 locally, the base-priority that was specified by the user when the block
    325 was published initially.
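
The discard order described above can be sketched as follows. This is a
simplified model, not GNUnet's datastore code: the ``Block`` type, its fields
and the function ``choose_discards`` are hypothetical, and the sketch assumes
that blocks are discarded only until enough space has been freed.

.. code-block:: python

   from dataclasses import dataclass

   @dataclass
   class Block:
       name: str
       priority: int      # higher means more worth keeping
       expiration: float  # seconds since the epoch
       size: int          # bytes

   def choose_discards(blocks, bytes_needed, now):
       """Pick blocks to discard: expired ones first, then lowest priority."""
       # False sorts before True, so expired blocks come first; ties are
       # broken by ascending priority.
       order = sorted(blocks, key=lambda b: (b.expiration > now, b.priority))
       discards, freed = [], 0
       for block in order:
           if freed >= bytes_needed:
               break
           discards.append(block)
           freed += block.size
       return discards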
    326 
    327 .. _Replication:
    328 
    329 Replication
    330 ^^^^^^^^^^^
    331 
    332 When peers migrate content to other systems, the replication level of a
    333 block is used to decide which blocks need to be migrated most urgently.
    334 GNUnet will always push the block with the highest replication level
    335 into the network, and then decrement the replication level by one. If
    336 all blocks reach replication level zero, the selection is simply random.
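
A minimal sketch of this selection rule, assuming the replication levels are
kept in a plain dictionary that maps a block name to its remaining level. The
function name and the data layout are hypothetical and chosen only for
illustration.

.. code-block:: python

   import random

   def next_block_to_push(replication, rng=random):
       """Return the next block to migrate and update its replication level."""
       if not replication:
           return None
       best = max(replication, key=replication.get)
       if replication[best] > 0:
           # Push the block with the highest level, then decrement it by one.
           replication[best] -= 1
           return best
       # All levels are zero: the selection is simply random.
       return rng.choice(sorted(replication))

Calling the function repeatedly on ``{"a": 2, "b": 1}`` first drains the
replication levels and then falls back to random selection.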
    337 
    338 .. _Namespace-Management:
    339 
    340 Namespace Management
    341 ~~~~~~~~~~~~~~~~~~~~
    342 
    343 The ``gnunet-identity`` tool can be used to create egos. By default,
    344 ``gnunet-identity --display`` simply lists all locally available egos.
    345 
    346 .. _Creating-Egos:
    347 
    348 Creating Egos
    349 ^^^^^^^^^^^^^
    350 
    351 With the ``--create=NICK`` option, ``gnunet-identity`` creates a new
    352 ego. An ego is the virtual identity of the entity in control of a
    353 namespace or GNS zone. Anyone can create any number of egos. The
    354 provided NICK name automatically corresponds to a GNU Name System domain
    355 name. Thus, henceforth name resolution for any name ending in ".NICK"
    356 will use the NICK's zone. You should avoid using NICKs that collide with
    357 well-known DNS names.
    358 
    359 Currently, the IDENTITY subsystem supports two types of identity keys:
    360 ECDSA and EdDSA. By default, identities are created with ECDSA
    361 keys. In order to create an identity with EdDSA keys, you can use the
    362 ``--eddsa`` flag.
    363 
    364 .. _Deleting-Egos:
    365 
    366 Deleting Egos
    367 ^^^^^^^^^^^^^
    368 
    369 With the ``-D NICK`` option egos can be deleted. Once the ego has been
    370 deleted it is impossible to add content to the corresponding namespace
    371 or zone. However, the existing GNS zone data is currently not dropped.
    372 This may change in the future.
    373 
    374 Deleting the ego does not make the namespace or any content in it
    375 unavailable.
    376 
    377 .. _File_002dSharing-URIs:
    378 
    379 File-Sharing URIs
    380 ~~~~~~~~~~~~~~~~~
    381 
    382 GNUnet (currently) uses four different types of URIs for file-sharing.
    383 They all begin with \"gnunet://fs/\". This section describes the four
    384 different URI types in detail.
    385 
    386 For FS URIs empty KEYWORDs are not allowed. Quotes are allowed to denote
    387 whitespace between words. Keywords must contain a balanced number of
    388 double quotes. Double quotes cannot be used in the actual keywords.
    389 This means that the string '\"\"foo bar\"\"' will be turned into two
    390 OR-ed keywords 'foo' and 'bar', not into '\"foo bar\"'.
    391 
    392 .. _Encoding-of-hash-values-in-URIs:
    393 
    394 Encoding of hash values in URIs
    395 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    396 
    397 Most URIs include some hash values. Hashes are encoded using base32hex
    398 (RFC 2938).
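
To illustrate the encoding, here is a small Python sketch of base32hex with
the alphabet ``0-9A-V``. It assumes unpadded output; whether GNUnet uses
exactly this alphabet and omits padding is an assumption of this example, not
something stated above.

.. code-block:: python

   ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"

   def base32hex_encode(data):
       """Encode bytes as unpadded base32hex (5 bits per output character)."""
       bits = 0    # bit buffer holding all input bits seen so far
       nbits = 0   # number of bits in the buffer not yet emitted
       out = []
       for byte in data:
           bits = (bits << 8) | byte
           nbits += 8
           while nbits >= 5:
               nbits -= 5
               out.append(ALPHABET[(bits >> nbits) & 31])
       if nbits:
           # Pad the final partial group with zero bits on the right.
           out.append(ALPHABET[(bits << (5 - nbits)) & 31])
       return "".join(out)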
    399 
    400 .. _chk-uri:
    401 .. _Content-Hash-Key-_0028chk_0029:
    402 
    403 Content Hash Key (chk)
    404 ^^^^^^^^^^^^^^^^^^^^^^
    405 
    406 A chk-URI is used to (uniquely) identify a file or directory and to
    407 allow peers to download the file. Files are stored in GNUnet as a tree
    408 of encrypted blocks. The chk-URI thus contains the information to
    409 download and decrypt those blocks. A chk-URI has the format
    410 \"gnunet://fs/chk/KEYHASH.QUERYHASH.SIZE\". Here, \"SIZE\" is the size
    411 of the file (which allows a peer to determine the shape of the tree),
    412 KEYHASH is the key used to decrypt the file (also the hash of the
    413 plaintext of the top block) and QUERYHASH is the query used to request
    414 the top-level block (also the hash of the encrypted block).
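
The structure of a chk-URI can be made concrete with a short parser. This is
a sketch based only on the format given above; the names ``ChkUri`` and
``parse_chk_uri`` are hypothetical, and the example URI uses the placeholder
strings ``KEYHASH`` and ``QUERYHASH`` rather than real hash values.

.. code-block:: python

   from typing import NamedTuple

   CHK_PREFIX = "gnunet://fs/chk/"

   class ChkUri(NamedTuple):
       key_hash: str    # key used to decrypt the file
       query_hash: str  # query used to request the top-level block
       size: int        # file size in bytes

   def parse_chk_uri(uri):
       """Split a chk-URI into KEYHASH, QUERYHASH and SIZE."""
       if not uri.startswith(CHK_PREFIX):
           raise ValueError("not a chk-URI")
       parts = uri[len(CHK_PREFIX):].split(".")
       if len(parts) != 3 or not all(parts):
           raise ValueError("expected KEYHASH.QUERYHASH.SIZE")
       key_hash, query_hash, size = parts
       if not (size.isascii() and size.isdigit()):
           raise ValueError("SIZE must be a decimal number")
       return ChkUri(key_hash, query_hash, int(size))

   print(parse_chk_uri("gnunet://fs/chk/KEYHASH.QUERYHASH.75446"))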
    415 
    416 .. _loc-uri:
    417 .. _Location-identifiers-_0028loc_0029:
    418 
    419 Location identifiers (loc)
    420 ^^^^^^^^^^^^^^^^^^^^^^^^^^
    421 
    422 For non-anonymous file-sharing, loc-URIs are used to specify which peer
    423 is offering the data (in addition to specifying all of the data from a
    424 chk-URI). Location identifiers include a digital signature of the peer
    425 to affirm that the peer is truly the origin of the data. The format is
    426 \"gnunet://fs/loc/KEYHASH.QUERYHASH.SIZE.PEER.SIG.EXPTIME\". Here,
    427 \"PEER\" is the public key of the peer (in GNUnet format in base32hex),
    428 SIG is the RSA signature (in GNUnet format in base32hex) and EXPTIME
    429 specifies when the signature expires (in milliseconds after 1970).
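
A loc-URI can be split into its six fields in the same way. The following
sketch follows the format given above and does not verify the signature; the
names ``LocUri`` and ``parse_loc_uri`` are hypothetical, and interpreting
``EXPTIME`` as UTC is an assumption.

.. code-block:: python

   from datetime import datetime, timezone
   from typing import NamedTuple

   LOC_PREFIX = "gnunet://fs/loc/"

   class LocUri(NamedTuple):
       key_hash: str
       query_hash: str
       size: int
       peer: str
       signature: str
       expires: datetime

   def parse_loc_uri(uri):
       """Split a loc-URI into its fields; the signature is NOT checked."""
       if not uri.startswith(LOC_PREFIX):
           raise ValueError("not a loc-URI")
       parts = uri[len(LOC_PREFIX):].split(".")
       if len(parts) != 6 or not all(parts):
           raise ValueError("expected KEYHASH.QUERYHASH.SIZE.PEER.SIG.EXPTIME")
       key_hash, query_hash, size, peer, sig, exptime = parts
       # EXPTIME is given in milliseconds after 1970 (UTC assumed here).
       expires = datetime.fromtimestamp(int(exptime) / 1000, tz=timezone.utc)
       return LocUri(key_hash, query_hash, int(size), peer, sig, expires)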
    430 
    431 .. _ksk-uri:
    432 .. _Keyword-queries-_0028ksk_0029:
    433 
    434 Keyword queries (ksk)
    435 ^^^^^^^^^^^^^^^^^^^^^
    436 
    437 A keyword-URI is used to specify that the desired operation is the
    438 search using a particular keyword. The format is simply
    439 \"gnunet://fs/ksk/KEYWORD\". Non-ASCII characters can be specified using
    440 the typical URI-encoding (using hex values) from HTTP. \"+\" can be used
    441 to specify multiple keywords (which are then logically \"OR\"-ed in the
    442 search; results matching both keywords are given a higher rank):
    443 \"gnunet://fs/ksk/KEYWORD1+KEYWORD2\". ksk-URIs must not begin or end
    444 with the plus ('+') character. Furthermore they must not contain '++'.
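
The syntactic rules for ksk-URIs can be checked with a few lines of Python.
This is a sketch of the rules stated above only (prefix, ``+`` as separator,
URI-encoding of special characters); the function name ``parse_ksk_uri`` is
hypothetical, and any further conventions GNUnet may apply to keywords are
not modelled.

.. code-block:: python

   from urllib.parse import unquote

   KSK_PREFIX = "gnunet://fs/ksk/"

   def parse_ksk_uri(uri):
       """Return the list of OR-ed keywords encoded in a ksk-URI."""
       if not uri.startswith(KSK_PREFIX):
           raise ValueError("not a ksk-URI")
       body = uri[len(KSK_PREFIX):]
       # Empty keywords are not allowed: no leading, trailing or doubled '+'.
       if not body or body.startswith("+") or body.endswith("+") or "++" in body:
           raise ValueError("empty keyword in ksk-URI")
       # Split first, then undo the percent-encoding of each keyword.
       return [unquote(part) for part in body.split("+")]

   print(parse_ksk_uri("gnunet://fs/ksk/gpl+Das%20Kapital"))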
    445 
    446 .. _sks-uri:
    447 .. _Namespace-content-_0028sks_0029:
    448 
    449 Namespace content (sks)
    450 ^^^^^^^^^^^^^^^^^^^^^^^
    451 
    452 **Please note that the text in this subsection is outdated and needs
    453 to be rewritten for version 0.10! This especially concerns the
    454 terminology of Pseudonym/Ego/Identity.**
    455 
    456 Namespaces are sets of files that have been approved by some (usually
    457 pseudonymous) user --- typically by that user publishing all of the
    458 files together. A file can be in many namespaces. A file is in a
    459 namespace if the owner of the ego (aka the namespace's private key)
    460 signs the CHK of the file cryptographically. An sks-URI is used to
    461 search a namespace. The result is a block containing meta data, the CHK
    462 and the namespace owner's signature. The format of an sks-URI is
    463 \"gnunet://fs/sks/NAMESPACE/IDENTIFIER\". Here, \"NAMESPACE\" is the
    464 public key for the namespace. \"IDENTIFIER\" is a freely chosen keyword
    465 (or password!). A commonly used identifier is \"root\" which by
    466 convention refers to some kind of index or other entry point into the
    467 namespace.
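
As with the other URI types, the format can be illustrated with a short
parser. The function name ``parse_sks_uri`` is hypothetical, the example uses
the placeholder ``NAMESPACE`` instead of a real public key, and treating
everything after the first slash as the identifier is an assumption of this
sketch.

.. code-block:: python

   SKS_PREFIX = "gnunet://fs/sks/"

   def parse_sks_uri(uri):
       """Split an sks-URI into (NAMESPACE, IDENTIFIER)."""
       if not uri.startswith(SKS_PREFIX):
           raise ValueError("not an sks-URI")
       namespace, sep, identifier = uri[len(SKS_PREFIX):].partition("/")
       if not namespace or not sep or not identifier:
           raise ValueError("expected NAMESPACE/IDENTIFIER")
       return namespace, identifier

   print(parse_sks_uri("gnunet://fs/sks/NAMESPACE/root"))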
    468 
    469