lsd0003

LSD0003: Set Union
Log | Files | Refs | README

commit 7c6bef140c96746cb3be4e2b2907cc6fee52b70e
parent 64b92f47a1ee2fd808e22a075d984dd77aab0493
Author: Elias Summermatter <elias.summermatter@seccom.ch>
Date:   Tue, 11 May 2021 08:58:28 +0200

Improved RFC

Diffstat:
Mdraft-summermatter-set-union.xml | 590++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
1 file changed, 404 insertions(+), 186 deletions(-)

diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml @@ -106,7 +106,7 @@ </t> <t> The protocol described in this report is generic and - suitable for a wide range of applicaitons. As a result, + suitable for a wide range of applications. As a result, the internal structure of the elements in the sets must be defined and verified by the application using the protocol. This document thus does not cover the element @@ -121,9 +121,9 @@ using the protocol must provide a parameter that specifies the cost-ratio of round-trips vs. bandwidth usage. Given this trade-off factor, the protocol will then choose parameters - that minimize the total execution cost. In particular, there - is one major choice to be made, which is between sending the - full set of elements, or just sending the elements that differ. + that minimize the total execution costs. In particular, there + is one major choice to be made, namely between sending the + complete set of elements, or sending only the elements that differ. In the latter case, our design is basically a concrete implementation of a proposal by Eppstein.<xref target="Eppstein" format="default" /> </t> @@ -152,7 +152,7 @@ <t> To defend against some of these attacks, applications need to remember the number of elements previously - shared with a peer, and offer a means to check that + shared with a peer, and provide a way to check that elements are well-formed. Applications may also be able to provide an upper bound on the total number of valid elements that may exist. For example, in E-voting, the @@ -179,14 +179,14 @@ <section anchor="bf" numbered="true" toc="default"> <name>Bloom Filters</name> <t> - A Bloom filter (BF) is a space-efficient datastructure to test if am element is part of a set of elements. + A Bloom filter (BF) is a space-efficient datastructure to test if an element is part of a set of elements. Elements are identified by an element ID. Since a BF is a probabilistic datastructure, it is possible to have false-positives: when asked if an element is in the set, the answer from a BF is either "no" or "maybe". </t> <t> A BF consists of L buckets. Every bucket is a binary value that can be either 0 or 1. All buckets are initialized - to 0. A mapping function M is used to map each the ID of each element from the set to a subset of k buckets. M is non-injective + to 0. A mapping function M is used to map each ID of each element from the set to a subset of k buckets. M is non-injective and can thus map the same element multiple times to the same bucket. The type of the mapping function can thus be described by the following mathematical notation: </t> @@ -214,7 +214,7 @@ To check if an element may be in the set, one tests if all buckets under the map M are set to 1. </t> <t> - Further in this document a bitstream outputted by the mapping function is represented by + Further in this document a bitstream output by the mapping function is represented by a set of numeric values for example (0101) = (2,4). In the BF the buckets are set to 1 if the corresponding bit in the bitstream is 1. If there is a collision and a bucket is already set to 1, the bucket stays 1. @@ -231,8 +231,8 @@ ]]></artwork> </figure> <t> - Is easy to see that the M(element) = (0,3) could be in the BF bellow and M(element) = (0,2) can't be - in the BF bellow: + Is easy to see that the M(element) = (0,3) could be in the BF below and M(element) = (0,2) can't be + in the BF below: </t> <figure anchor="figure_bf_contains"> @@ -259,7 +259,7 @@ <name>Counting Bloom Filter</name> <t> A Counting Bloom Filter (CBF) is an extension of the <xref target="bf" format="title" />. In the CBF, buckets are - unsigned numbers instead of binary values. This allows the removal of an elements from the CBF. + unsigned numbers instead of binary values. This allows the removal of an element from the CBF. </t> <t> Adding an element to the CBF is similar to the adding operation of the BF. However, instead of setting the bucket on hit to 1 the @@ -294,7 +294,7 @@ </figure> <t> In practice, the number of bits available for the counters is usually finite. For example, given a 4-bit - counter, a CBF bucket would overflow once 16 elements are mapped to the same bucket. To efficiently + counter, a CBF bucket would overflow 16 elements are mapped to the same bucket. To efficiently handle this case, the maximum value (15 in our example) is considered to represent "infinity". Once the order of a bucket reaches "infinity", it is no longer incremented or decremented. </t> @@ -328,7 +328,7 @@ including the CPU architecture. </t> <t> - If the IBF size is to small or the mapping + If the IBF size is too small or the mapping function does not spread out the elements uniformly, the signed counter can overflow or underflow. As with the CBF, the "maximum" value is @@ -456,11 +456,11 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 | Note that it is possible to "remove" elements from an IBF that were never present in the IBF in the first place. A negative counter value is thus indicative of elements that were removed without having been added. Note that an IBF bucket - counter of zero no longer warrants that an element mapped to that bucket is not + counter of zero no longer guarantees that an element mapped to that bucket is not present in the set: a bucket with a counter of zero can be the result of one element being added and a different element (mapped to the same bucket) being removed. To check that an element is not present requires a counter of zero and an - IDSUM and HASHSUM of zero --- and some assurance that there was no collision due + IDSUM and HASHSUM of zero --- and some certainty that there was no collision due to the limited number of bits in IDSUM and HASHSUM. Thus, IBFs are not suitable to replace BFs or IBFs. </t> @@ -504,7 +504,7 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 | a different mapping function M. A more common scenario (especially if L was too small) is that IBF decoding fails because there is no pure bucket. In this case, the - higher-level protocol also should retry using different parameters. + higher-level protocol should also retry using different parameters. </t> <t> Suppose the IBF contains a pure bucket. In this case, the IDSUM in the @@ -513,9 +513,9 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 | was negative, and by removing it if the counter was positive). This is likely to cause other buckets to become pure, allowing further elements to be decoded. Eventually, decoding should succeed with - all counters and IDSUM and HASHSUM values reaching zero. However, it is also + all counters and IDSUM and HASHSUM values reach zero. However, it is also possible that an IBF only partly decodes and then decoding fails after - yielding some elements. + obtaining some elements. </t> <t> In the following example the successful decoding of an IBF containing @@ -554,7 +554,7 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 | </figure> <t> In the IBF only pure buckets are left, we choose to continue decoding bucket 2 and decode element - with the hash 0x4242. Now the IBF is empty (all buckets have count 0) that means the IBF has successfully + with the hash 0x4242. Now the IBF is empty (all buckets have count 0) that means the IBF has been successfully decoded. </t> <figure anchor="figure_ibf_decode_2"> @@ -577,7 +577,7 @@ hashSum | 0x0000 | 0x0000 | 0x0000 | 0x0000 | Given addition and removal as defined above, it is possible to define an operation on IBFs that computes an IBF representing the set difference. Suppose IBF1 represents set A, and IBF2 represents set B. Then this set difference operation will compute IBF3 which - represents the set A - B --- without needing elements from set A or B. + represents the set A - B --- without having to transfer the elements from set A or B. To calculate the IBF representing this set difference, both IBFs must have the same length L, the same number of buckets per element k and use the same map M. Given this, @@ -635,7 +635,7 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 | +-------------+-------------+-------------+-------------+ ]]></artwork> </figure> - <t>After calculating and decoding the IBF-AB its clear that in IBF-A the element with the hash 0x5050 + <t>After calculating and decoding the IBF-AB shows clear that in IBF-A the element with the hash 0x5050 is missing (-1 in bit-3) while in IBF-B the element with the hash 0101 is missing (1 in bit-1 and bit-2). The element with hash 0x4242 is present in IBF-A and IBF-B and is removed by the set difference operation (bit-4). @@ -648,8 +648,8 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 | <t> To facilitate a reasonably CPU-efficient implementation, this specification requires the - IBF counter to always use 8 bits. Fewer bits - would result in a paritcularly inefficient + IBF counter always to use 8 bits. Fewer bits + would result in a particularly inefficient implementation, while more bits are rarely useful as sets with so many elements should likely be represented using a larger number of buckets. This @@ -672,17 +672,17 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 | </t> <t> For the "HASHSUM", we always use a 32-bit - representation. Here, it is mostly important to + representation. Here, it is most important to avoid collisions, where different elements are mapped to the same hash. However, we note that by design only a few elements (certainly less than 127) should ever be mapped - to the same bucket, so a small number of bits + to the same bucket, a small number of bits should suffice. Furthermore, our protocol is designed to handle occasional collisions, so while with 32-bits there remains a chance of accidental collisions, at 32 bit the chance is - generally believed to be sufficiently small enough + generally believed to be sufficiently small for the protocol to handle those cases efficiently for a wide range of use-cases. Smaller hash values would safe bandwidth, but also drastically @@ -697,7 +697,7 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 | with HMAC-SHA512 as XTR and HMAC-SHA256 as PRF and salt is set to the unsigned 64-bit equivalent of 0. The output is then truncated to 64-bit. Its important that the elements can be redistributed over the buckets in case the IBF does not - decode, that's why the ID is salted with a random salt given in the SALT field of this message. + decode. That's why the ID is salted with a random salt given in the SALT field of this message. Salting is done by calculation the a random salt modulo 64 (using only the lowest 6-bits of the salt) and do a bitwise right rotation of output of KDF by the 6-bit salts numeric representation. </t> @@ -743,22 +743,22 @@ FUNCTION id_calculation (element,ibf_salt): <t> The mapping function M as described above in the figure <xref target="bf_mapping_function_math" format="default" /> decides in which buckets the ID and HASH have to be binary XORed to. In practice - there the following algorithm is used: + the following algorithm is used: </t> <t> The first index is simply the HASH modulo the IBF size. The second index is calculated by creating a new 64-bit value by shifting the 32-bit value left and setting the lower 32-bit to the number of indexes already processed. From the - resulting 64-bit value a CRC32 checksum is created the second index is now the modulo of the - CRC32 output this is repeated until the predefined amount indexes is generated. + resulting 64-bit value a CRC32 checksum is created. The second index is now the modulo of the + CRC32 output, this is repeated until the predefined amount of indexes is generated. In the case a index is hit twice, which would mean this bucket could not get pure again, - the second hit is just skipped and the next iteration is used as. + the second hit is just skipped and the next iteration is used. </t> <figure anchor="ibf_format_bucket_identification_pseudo_code"> <artwork name="" type="" align="left" alt=""><![CDATA[ # INPUTS: # key: Is the ID of the element calculated in the id_calculation function above. -# number_of_buckets_per_element: Pre-defined count of buckets elements are inserted into +# number_of_buckets_per_element: Pre-defined numbers of buckets elements are inserted into # ibf_size: the size of the ibf (count of buckets) # OUTPUT: # dst: Array with bucket IDs to insert ID and HASH @@ -807,7 +807,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <section anchor="se_description" numbered="true" toc="default"> <name>Description</name> <t> - Strata Estimators help estimate the size of the set difference between two set of elements. + Strata Estimators help estimate the size of the set difference between two sets of elements. This is necessary to efficiently determinate the tuning parameters for an IBF, in particular a good value for L. </t> @@ -845,7 +845,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <t> The simplest mode is the "full" synchronization mode. The idea is that if the difference between the sets of the two peers exceeds a certain threshold, the overhead to determine which elements are different outweighs - the overhead of sending the complete set. In this case, the most efficient method can be to just + the overhead of sending the complete set. In this case, the most efficient method can be just to exchange the full sets. </t> <t> @@ -854,7 +854,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) </t> <t> The second possibility is that the difference of the sets is small compared to the set size. - Here, an efficient "delta" synchronization mode is more efficient. Given these two possibilities, + In this case, an efficient "delta" synchronization mode is more efficient. These two possibilities given, the first steps of the protocol are used to determine which mode should be used. </t> @@ -920,7 +920,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <em><xref target="messages_ibf" format="title" /></em> message from the initiating peer and transitions into the <strong>Expecting IBF Last</strong> state when there are multiple <em><xref target="messages_ibf" format="title" /></em> messages to sent, - when there is just a single <em><xref target="messages_ibf" format="title" /></em> message the reviving peer + when there is just a single <em><xref target="messages_ibf" format="title" /></em> message the receiving peer transitions directly to the <strong>Active Decoding</strong> state. </t> <t> @@ -975,8 +975,8 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) When a new element message has been received the peer checks if a corresponding <em><xref target="messages_demand" format="title" /></em> for the element has been sent and the demand is still unsatisfied. - If the element has been demanded the peer checks the element for validity, removed it from the list - of pending demands and then then saves the element to the the set otherwise the peer + If the element has been demanded the peer checks the element for validity, removes it from the list + of pending demands and then saves the element to the set otherwise the peer rejects the element. </dd> <dt><em><xref target="messages_ibf" format="title" /></em> message:</dt> @@ -990,7 +990,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dt><em><xref target="messages_ibf_last" format="title" /></em> message:</dt> <dd> If an <em><xref target="messages_ibf_last" format="title" /></em> message is received this - indicates that the there is just one IBF slice and a direct state and role transition from + indicates that there is just one IBF slice left and a direct state and role transition from <strong>Passive Decoding</strong> to <strong>Active Decoding</strong> is initiated. </dd> <dt><em><xref target="messages_done" format="title" /></em> message:</dt> @@ -1017,7 +1017,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) sends a <em><xref target="messages_inquiry" format="title" /></em> to the passive peer. </t> <t> - In case the IBF does not successfully decode anymore, the active peer sends a new IBF to the passive client + In case the IBF does not successfully decode anymore, the active peer sends a new IBF to the passive peer and changes into <strong>Passive Decoding</strong> state. This initiates a role swap. To reduce overhead and prevent double transmission of offers and elements the new IBF is created on the new complete set after all demands and inquiries have been satisfied. @@ -1039,8 +1039,8 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) the active peer. If a Inquiry has been sent and <!-- FIXME: is this indeed a condition that is checked? --> the offered element is missing in the active peers set, the active peer sends a <em><xref target="messages_demand" format="title" /></em> message to the - passive peer. The send demand need to be added to a list with unsatisfied demands. - In the case the received offer is for an element that is already in the set of the peer the offer is ignored. + passive peer. The sent demand needs to be added to a list with unsatisfied demands. + In case the received offer is for an element that is already in the set of the peer the offer is ignored. <!-- FIXME: what happens if the offer is for an element that is not missing? I think then we just ignore it, right? --> </dd> <dt><em><xref target="messages_demand" format="title" /></em> message:</dt> @@ -1051,9 +1051,9 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <em><xref target="messages_elements" format="title" /></em> message if a offer request for the element has been sent. <!-- IMPLEMENT: This is not implemented in code // Change --> - In the case the demanded element does not exist in the - set there was probably a bucket decoded that was not really pure so potentially all <em><xref target="messages_offer" format="title" /></em> - and <em><xref target="messages_demand" format="title" /></em> messages sent after are invalid + In case the demanded element does not exist in the + set there was probably a bucket decoded that was not pure so potentially all <em><xref target="messages_offer" format="title" /></em> + and <em><xref target="messages_demand" format="title" /></em> messages sent later are invalid in this case a role change active -> passive with a new IBF is easiest. If a demand for the same element is received multiple times the demands should be discarded. @@ -1062,18 +1062,18 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) </dd> <dt><em><xref target="messages_elements" format="title" /></em> message:</dt> <dd> - A element that is received is marked in the list of demanded elements as satisfied, validated and - saved and not further action is taken. + An element that is received is marked in the list of demanded elements as satisfied, validated and + saved and no further action is taken. Elements that are not demanded or already known are discarded. </dd> <dt><em><xref target="messages_done" format="title" /></em> message:</dt> <dd> Receiving the message <em><xref target="messages_done" format="title" /></em> indicates that all demands of the passive peer have been satisfied. The active peer then changes into the - state <strong>Finish Closing</strong> state. + <strong>Finish Closing</strong> state. <!-- IMPLEMENT: This is not implemented in code // Change --> - If the IBF is not finished decoding and the <em><xref target="messages_done" format="title" /></em> - is received the other peer is not in compliance with the protocol and the set reconciliation MUST be aborted. + If the IBF has not finished decoding and the <em><xref target="messages_done" format="title" /></em> + is received, the other peer is not in compliance with the protocol and the set reconciliation MUST be aborted. <!-- IMPLEMENT: This is not implemented in code // Change --> </dd> </dl> @@ -1090,7 +1090,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dd> <t> In this states the peers are waiting for all demands to be satisfied and for the synchronisation - to be completed. When all demands are satisfied the peer changes into state <strong>Finished</strong>. + to be completed. When all demands are satisfied the peer changes into <strong>Finished</strong>state. </t> </dd> </dl> @@ -1104,7 +1104,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) </t> <t> The <xref target="modeofoperation_individual-elements" format="title" /> is only efficient on small set - differences or if the byte-size of the elements is large. Is the set difference is estimated to be large + differences or if the byte-size of the elements is large. If the set difference is estimated to be large the <xref target="modeofoperation_full-sync" format="title" /> is more efficient. The exact heuristics and parameters on which the protocol decides which mode should be used are described in the <xref target="performance" format="title" /> section of this document. @@ -1114,7 +1114,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) is always used. The first case is when one of the peers announces having an empty set. This is announced by setting the SETSIZE field in the <em><xref target="messages_se" format="title" /></em> to 0. - The second case is if the application requested full synchronization explicitly. + The second case is if the application requests full synchronization explicitly. This is useful for testing and should not be used in production. </t> <!-- @@ -1169,7 +1169,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes and header included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1209,7 +1209,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) The <em>IBF</em> message is sent at the start of the protocol from the initiating peer in the transaction between <strong>Expect SE</strong> -> <strong>Expecting IBF Last</strong> or when the IBF does not decode and there is a role change in the transition between <strong>Active Decoding</strong> -> <strong>Expecting IBF Last</strong>. - This message is only sent if there are more than one IBF slice to sent, in the case there is just + This message is only sent if there are more than one IBF slice to be sent, in case there is just one slice the <xref target="messages_ibf_last" format="title" /> message is sent. </t> </section> @@ -1219,12 +1219,12 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <artwork name="" type="" align="left" alt=""><![CDATA[ 0 8 16 24 32 40 48 56 +-----+-----+-----+-----+-----+-----+-----+-----+ - | MSG SIZE | MSG TYPE |ORDER| PAD | + | MSG SIZE | MSG TYPE | SIZE | +-----+-----+-----+-----+-----+-----+-----+-----+ - | OFFSET | SALT | + |IMCS | OFFSET | SALT | +-----+-----+-----+-----+-----+-----+-----+-----+ - | IBF-SLICE - + / + | IBF-SLICE + +----- / / / / / ]]></artwork> @@ -1233,21 +1233,25 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte orderwhichdescribes the message size in bytes and header included. </dd> <dt>MSG TYPE</dt> <dd> the type of SETU_P2P_REQUEST_IBF as registered in <xref target="gana" format="title" /> in network byte order. </dd> - <dt>ORDER</dt> + + <dt>SIZE</dt> <dd> - is a 8-bit unsigned integer which signals the order of the IBF. The order of the IBF - is defined as the logarithm of the number of buckets of the IBF. + is a 32-bit unsigned integer which signals the number of buckets in the IBF. </dd> - <dt>PAD</dt> + <dt>IMCS</dt> <dd> - is 24-bit always set to zero + IBF max counter size is a 8-bit unsigned integer which describes the number of bit that + is required to store a single counter. This is used for the unpacking function as described + in the <xref target="performance_counter_variable_size" format="title" /> section. </dd> + + <dt>OFFSET</dt> <dd> is a 32-bit unsigned integer which signals the offset to the following ibf slices in the original. @@ -1259,16 +1263,18 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) </dd> <dt>IBF-SLICE</dt> <dd> + <t> - are variable count of slices in an array. A single slice contains out multiple 64-bit IDSUMS, - 32-bit HASHSUMS and 8-bit COUNTERS. In the network order the array of IDSUMS is first, followed + are variable numbers of slices in an array. A single slice contains multiple 64-bit IDSUMS, + 32-bit HASHSUMS and 1-64bit COUNTERS of variable size. In the network order the array of IDSUMS is first, followed by an array of HASHSUMS and ended with an array of COUNTERS. Length of the array is defined - by MIN( 2^ORDER - OFFSET, MAX_BUCKETS_PER_MESSAGE). MAX_BUCKETS_PER_MESSAGE is defined as + by MIN( SIZE - OFFSET, MAX_BUCKETS_PER_MESSAGE). MAX_BUCKETS_PER_MESSAGE is defined as 32768 divided by the BUCKET_SIZE which is 13-byte (104-bit). </t> + <t> - To get the IDSUM field, all IDs who hit a bucket are added up with a binary XOR operation. - See <xref target="ibf_format_id_generation" format="title" /> for details about ID generation. + To get the IDSUM field, all IDs hitting a bucket are added up with a binary XOR operation. + See <xref target="ibf_format_id_generation" format="title" /> details about ID generation. </t> <t> The calculation of the HASHSUM field is done accordingly to the calculation of the IDSUM field: @@ -1279,7 +1285,11 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) The algorithm to find the correct bucket in which the ID and the HASH have to be added is described in detail in section <xref target="ibf_format_bucket_identification" format="title" />. </t> - + + <t> + Test vectors for an implementation can be found in the <xref target="test_vectors" format="title" /> section + </t> + <!-- FIXME: this is not sufficiently precise! How are the element IDs (and IDSUMS) computed? How are the HASHes (and HASHSUMS) computed? Which byte order is used? What role does @@ -1296,12 +1306,13 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) +-----+-----+-----+-----+-----+-----+-----+-----+ | IDSUMS | +-----+-----+-----+-----+-----+-----+-----+-----+ - | HASHSUMS | HASHSUMS | + | HASHSUMS | HASHSUMS | +-----+-----+-----+-----+-----+-----+-----+-----+ - | COUNTERS | COUNTERS | + | COUNTERS* | COUNTERS* | +-----+-----+-----+-----+-----+-----+-----+-----+ / / / / +* Counter size is variable. In this example the size is 32-bit (4-byte) ]]></artwork> </figure> </section> @@ -1313,7 +1324,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <section anchor="messages_ibf_last_description" numbered="true" toc="default"> <name>Description</name> <t> - This message indicates to the remote peer that all slices of the bloom filter have been sent. + This message indicates the remote peer that all slices of the bloom filter have been sent. The binary structure is exactly the same as the <xref target="messages_ibf_structure" format="title" /> of the message <xref target="messages_ibf" format="title" /> with a different "MSG TYPE" which is defined in <xref target="gana" format="title" /> "SETU_P2P_IBF_LAST". @@ -1343,7 +1354,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) as answer to a <em><xref target="messages_demand" format="title" /></em> message from the remote peer. The Element message can also be received in the <strong>Finish Closing</strong> or <strong>Finish Waiting</strong> state after receiving a <em><xref target="messages_done" format="title" /></em> message from the remote peer, in this - case the client changes to the <strong>Finished</strong> state as soon as all demands for elements have been satisfied. + case the peer changes to the <strong>Finished</strong> state as soon as all demands for elements have been satisfied. </t> <t> This message is exclusively sent in the <xref target="modeofoperation_individual-elements" format="title" />. @@ -1367,7 +1378,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes and header included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1375,7 +1386,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) </dd> <dt>E TYPE</dt> <dd> - element type is a 16-bit unsigned integer witch defines the element type for + element type is a 16-bit unsigned integer which defines the element type for the application. </dd> <dt>PADDING</dt> @@ -1384,7 +1395,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) </dd> <dt>E SIZE</dt> <dd> - element size is 16-bit unsigned integer that signals the size of the elements data part. + element size is a 16-bit unsigned integer that signals the size of the elements data part. </dd> <dt>AE TYPE</dt> <dd> @@ -1408,7 +1419,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) The offer message is an answer to an <em><xref target="messages_inquiry" format="title" /></em> message and transmits the full hash of an element that has been requested by the other peer. This full hash enables the other peer to check if the element is really missing in its set and - eventually sends a <em><xref target="messages_demand" format="title" /></em> message for that a element. + eventually sends a <em><xref target="messages_demand" format="title" /></em> message for that element. </t> <t> The offer is sent and received only in the <strong>Active Decoding</strong> and in the <strong>Passive Decoding</strong> @@ -1434,7 +1445,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes header included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1482,7 +1493,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes and header included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1503,7 +1514,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <name>Description</name> <t> The demand message is sent in the <strong>Active Decoding</strong> and in the <strong>Passive Decoding</strong> - state. It is a answer to a received <em><xref target="messages_offer" format="title" /></em> message + state. It is an answer to a received <em><xref target="messages_offer" format="title" /></em> message and is sent if the element described in the <em><xref target="messages_offer" format="title" /></em> message is missing in the peers set. In the normal workflow the answer to the demand message is an <em><xref target="messages_elements" format="title" /></em> message. @@ -1528,7 +1539,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes and the header is included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1576,7 +1587,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes and header included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1599,7 +1610,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <name>Description</name> <t> The full done message is sent in the <xref target="modeofoperation_full-sync" format="title" /> - to signal that all remaining elements of the set have been sent. The message is received and sent in in the + to signal that all remaining elements of the set have been sent. The message is received and sent in the <strong>Full Sending</strong> and in the <strong>Full Receiving</strong> state. When the full done message is received in <strong>Full Sending</strong> state the peer changes directly into <strong>Finished</strong> state. In <strong>Full Receiving</strong> state receiving a full done message initiates the sending of @@ -1622,7 +1633,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes and header included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1668,7 +1679,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes and header included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1684,7 +1695,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <section anchor="messages_se_description" numbered="true" toc="default"> <name>Description</name> <t> - The strata estimator is sent by the receiving peer at the start of the protocol right after the + The strata estimator is sent by the receiving peer at the start of the protocol, right after the <xref target="messages_operation_request" format="title" /> message has been received. </t> <t> @@ -1714,7 +1725,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes and header included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1743,7 +1754,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) </t> <t> Since the content of the message is the same as the uncompressed Strata Estimator, the details - aren't repeated here for details see section <xref target="messages_se" format="counter" />. + are not repeated here for details see section <xref target="messages_se" format="counter" />. </t> </section> </section> @@ -1765,7 +1776,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <strong>Full Receiving</strong> state. </t> <t> - After the last full element messages has been sent the <xref target="messages_full_done" format="title" /> message + After the last full element message has been sent the <xref target="messages_full_done" format="title" /> message is sent to conclude the full synchronisation of the element sending peer. </t> </section> @@ -1787,7 +1798,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) <dl> <dt>MSG SIZE</dt> <dd> - is 16-bit unsigned integer in network byte order witch describes the message size in bytes and the header is included. + is a 16-bit unsigned integer in network byte order which describes the message size in bytes and header included. </dd> <dt>MSG TYPE</dt> <dd> @@ -1795,7 +1806,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) </dd> <dt>E TYPE</dt> <dd> - element type is a 16-bit unsigned integer witch defines the element type for + element type is a 16-bit unsigned integer which defines the element type for the application. </dd> <dt>PADDING</dt> @@ -1804,7 +1815,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) </dd> <dt>E SIZE</dt> <dd> - element size is 16-bit unsigned integer that signals the size of the elements data part. + element size is a 16-bit unsigned integer that signals the size of the elements data part. </dd> <dt>AE TYPE</dt> <dd> @@ -1831,7 +1842,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size) The decision which mode of operations is used is described by the following code: </t> <t> - The function takes as input the initial the local setsize, the remote setsize, the + The function takes as input the initial local setsize, the remote setsize, the by the strata estimator calculated difference, a static boolean that enforces full synchronisation mode of operation and the bandwith/roundtrips tradeoff. As output the function returns "FULL" if the full synchronisation @@ -1862,9 +1873,9 @@ FUNCTION decide_operation_mode(initial_local_setsize, remote_setsize, set_diff, </figure> </section> <section anchor="performance_formulas_full_sending_dec_first_send" numbered="true" toc="default"> - <name>Full Synchronisation: Decision witch peer sends elements first</name> + <name>Full Synchronisation: Decision which peer sends elements first</name> <t> - The following function determinate which peer starts sending its full set in full synchronisation + The following function determinates which peer starts sending its full set in full synchronisation mode of operation. </t> <t> @@ -1892,11 +1903,11 @@ FUNCTION decide_full_sending(initial_local_size, remote_setsize) <section anchor="performance_formula_ibf_parameters" numbered="true" toc="default"> <name>IBF Parameters</name> <t> - The following function calculate the required parameter to create an optimal sized IBF. These - parameter are the number of buckets and the number of buckets a single element is mapped to. + The following function calculates the required parameter to create an optimal sized IBF. These + parameters are the number of buckets and the number of buckets a single element is mapped to. </t> <t> - The function takes as input the setsize and returns a array with two numbers the total number of buckets + The function takes as input the setsize and returns an array with two numbers, the total number of buckets and the number of buckets a single element is mapped to. </t> <figure anchor="performance_formula_ibf_parameters_code"> @@ -1907,14 +1918,163 @@ FUNCTION decide_full_sending(initial_local_size, remote_setsize) # returns: Array: first element total nr ob buckets and # second element number of buckets per element -FUNCTION calculate_ibf_params (setsize): +FUNCTION calculate_ibf_params (set_diff): number_of_bucket_per_element = 4 - total_number_of_buckets = setsize + total_number_of_buckets = set_diff return [ total_number_of_buckets, number_of_bucket_per_element ] ]]></artwork> </figure> </section> </section> + + <section anchor="performance_counter_variable_size" numbered="true" toc="default"> + <name>Counter variable size</name> + <t> + Since the optimal number of bytes a counter in the IBF contains is very variable and varies + due to different parameters. Details are described in the BSC thesis + by Elias Summermatter, BFH 2021. + <!-- TODO link Thesis --> + Therefore a compression algorithm has been implemented, which always creates the IBF counter in optimal size. + and thus minimizes the bandwidth needed to transmit the IBF. + </t> + <t> + A simple algorithm is used for the compression. In a first step it is determined, which is the largest counter + and how many bits are needed to store it. In a second step for every counter of every bucket the counter + is stored in the bits determined in the first step and these are concatenated. + </t> + <t> + Three individual functions are used for this purpose. The first one is a function that iterates over each bucket of the + bucket of the IBF to get the maximum counter in the IBF. As second it needs + a function that compresses the counter of the IBF and thirdly a function that decompresses the IBF. + </t> + <figure anchor="performance_counter_variable_size_code"> + <artwork name="" type="" align="left" alt=""><![CDATA[ + +# INPUTS: +# ibf: The IBF +# OUTPUTS: +# returns: minimal amount of bytes required to store the counter + +FUNCTION ibf_get_max_counter(ibf) + max_counter=0 + FOR bucket IN ibf + IF bucket.counter > max_counter + max_counter = bucket.counter + + RETURN CEILING( log2 ( max_counter ) ) # next bigger discrete number of the binary logarithm of the max counter + +# INPUTS: +# ibf: The IBF +# offset: The offset which defines the starting point from which bucket the compress operation should start +# count: The number of buckets to in the array that should be compressed +# OUTPUTS: +# returns: An byte array of compressed counters to send over the network + +FUNCTION pack_counter(ibf, offset, count) + counter_bytes = ibf_get_max_counter(ibf) + store = 0 + store_bits = 0 + byte_ctr = 0 + buffer=[] + + FOR bucket IN ibf[offset] to ibf[count] + byte_len = counter_bytes + counter = bucket.counter + + WHILE byte_len > 0 + byte_to_write = 0 + + IF counter_bytes + store_bits >= 8 + bit_to_shift=0 + + IF store_bits > 0 OR counter_bytes > 8 + bit_free = 8 - store_bits + bit_to_shift = counter_bytes - bit_free + store = store << bit_free + + byte_to_write = (( counter >> bit_to_shift) | store) & 0xFF + bit_to_shift -= 8 - store_bits + counter = counter & (( 1 << counter_bytes ) - 1) + store = 0 + store_bits = 0 + + ELSE + IF 0 == store_bits + store = counter + ELSE + store = (store << counter_bytes) | counter + + store_bits = store_bits + byte_len + byte_len = 0 + BREAK + + buffer[byte_ctr] = byte_to_write + byte_ctr++ + NEXT_BUCKET + + # Write the last partial compressed byte to the buffer + buffer[byte_ctr] = store << (8 - store_bits) + byte_ctr++ + + RETURN buffer + +# INPUTS: +# ibf: The IBF +# offset: The offset which defines the starting point from which bucket the compress operation should start +# count: The number of buckets to in the array that should be compressed +# counter_bit_length: The bit length of the counter can be found in the ibf message in the ibf_counter_bit_length field +# packed_data: A byte array which contains the data packed with the pack_counter function +# OUTPUTS: +# returns: Nothing because the unpacked counter is saved directly into the IBF + +FUNCTION unpack_counter(ibf, offset, count, counter_bit_length, packed_data) + store = 0 + store_bits = 0 + byte_ctr = 0 + ibf_bucket_ctr = 0 + + number_bytes_read = CEILING((count * counter_bit_length) / 8) + + WHILE ibf_bucket_ctr <= (count -1) + byte_to_read = packed_data[byte_ctr] + byte_ctr++ + bit_to_pack_left = 8 + + WHILE bit_to_pack_left >= 0 + + # Prevent packet from reading more than required + IF ibf_bucket_ctr > (count -1) + return + + IF ( store_bits + bit_to_pack_left ) >= counter_bit_length + bit_use = counter_bit_length - store_bits + + IF store_bits > 0 + store = store << bit_use + + bytes_to_shift = bit_to_pack_left - bit_use + counter_partial = byte_to_read >> bytes_to_shift + store = store | counter_partial + ibf.counter[ibf_bucket_ctr] = store + byte_to_read = byte_to_read & (( 1 << bytes_to_shift ) - 1) + + bit_to_pack_left -= bit_use + ibf_bucket_ctr++ + store = 0 + store_bits = 0 + + ELSE + store_bits += bit_to_pack_left + + IF 0 == store_bits + store = byte_to_read + ELSE + store = store << bit_to_pack_left + store = store | byte_to_read + BREAK + ]]></artwork> + </figure> + </section> </section> <!-- @@ -1931,19 +2091,19 @@ FUNCTION calculate_ibf_params (setsize): <t> The security considerations in this document focus mainly on the security - goal of availability, the primary goal of the protocol is prevent an attacker from + goal of availability. The primary goal of the protocol is to prevent an attacker from wasting cpu and network resources of the attacked peer. </t> <t> - To prevent denial of service attacks its vital to check that peers can only - reconcile a set once in a pre defined time span. This is a predefined values and need - to be adapted on per use basis. To enhance reliability and to allow - failures in the protocol its possible to introduce a threshold for max failed reconciliation + To prevent denial of service attacks, it is vital to check that peers can only + reconcile a set once in a predefined time span. This is a predefined value and needs + to be adapted per use basis. To enhance reliability and to allow + failures in the protocol, it is possible to introduce a threshold for max failed reconciliation ties. <!-- IMPLEMENT: How to implement? IP? Other construct? --> </t> <t> - The formal format of all messages needs to be properly validated, this is important to prevent many + The formal format of all messages needs to be properly validated. This is important to prevent many attacks on the code. The application data should be validated by the application using the protocol not by the implementation of the protocol. In case the format validation fails the set operation MUST be terminated. @@ -1951,17 +2111,18 @@ FUNCTION calculate_ibf_params (setsize): </t> <t> - To prevent an attacker from sending a peer into a endless loop between active and passive decoding a - limitation for active/passive roll switches in required. This can be implemented by - a simple counter which terminates the operation after a predefined count of switches. - The count of switches needs to be defined as such that its very undroppable that more + To prevent an attacker from sending a peer into an endless loop between active and passive decoding, a + limitation for active/passive roll switches is required. This can be implemented by + a simple counter which terminates the operation after a predefined number of switches. + The number of switches needs to be defined in such a way that it is very unprobable that more switches are required an the malicious intend of the other peer can be assumed. + <!-- IMPLEMENT: This counter --> </t> <t> <!--- SHOULD BE HANDLED BY UNDERLYING PROTOCOL BUT HOW IS IT HANDLED? --> - Its important to close and purge connections after a given timeout + It is important to close and purge connections after a given timeout to prevent draining attacks. <!-- IMPLEMENT: How ist this handheld right now? --> </t> @@ -1976,7 +2137,7 @@ FUNCTION calculate_ibf_params (setsize): <name>Duplicated or Missing Message detection</name> <t> Most of the messages received need to be checked that they are not - received multiple times this is solved with a global store (message) + received multiple times. This is solved with a global store (message) and the following code </t> <figure anchor="security_generic_functions_missing_message_code"> @@ -2031,7 +2192,7 @@ FUNCTION isStoreComplete(store) ENDFOR return TRUE -# Returns the count of message received +# Returns the number of message received # INPUTS: # store: Store to count # OUTPUTS: @@ -2046,8 +2207,8 @@ FUNCTION getNumberOfMessage(store) <section anchor="security_generic_functions_element_nr" numbered="true" toc="default"> <name>Store Remote Peers Element Number</name> <t> - To prevent an other peer from requesting the same set multiple times - its important to memorize the number of elements a peer had in previous + To prevent an other peer from requesting the same set multiple times, + it is important to memorize the number of elements a peer had in previous reconciliation sessions. </t> <figure anchor="security_generic_functions_element_nr_code"> @@ -2096,12 +2257,12 @@ FUNCTION save_number_of_elements_last_sync(client_id, remote_setsize) <t> It needs to be checked that the full synchronisation is plausible according to the formula deciding which operation mode - is applicable this is achieved by calculating the upper and lower + is applicable. This is achieved by calculating the upper and lower boundaries of the number of elements in the other peers set. The lower boundary of number of elements can be easily memorized as result from the last synchronisation and the upper boundary can be estimated with prior knowledge of the maximal - plausible increase of element since the last reconciliation and + plausible increase of elements since the last reconciliation and the maximal plausible number of elements. <!-- Entscheindungsfindung: myset fulltransimtion or epstein 5.3 ist zu verckürtzt fall: nur 2 cases Es fehlt traidof nach paramer in perfomance section @@ -2155,12 +2316,12 @@ FUNCTION validate_messages_request_full(client_id, remote_setsize, local_setsize <dt><xref target="messages_ibf" format="title" /></dt> <dd> <t> - Its important do define a threshold to limit + It is important to define a threshold to limit the maximal number of IBFs that are expected from the other peer. <!-- change count to number full section --> This maximal plausible size can be calculated with the known inputs: - number of elements in my set and the pre defined applications upper - limit as described in the performance section. + number of elements in the local set and the predefined applications upper + limit, as described in the performance section. <!-- IMPLEMENT: Is this already checked?--> <!-- TODO: Link performance section --> That the other peer chooses the correct mode of operation MUST @@ -2222,7 +2383,7 @@ FUNCTION get_bucket_number(remote_setsize, local_setsize, initial_local_size, se <dt><xref target="messages_full_element" format="title" /></dt> <dd> <t> - If a full element is received the set of the other peer + If a full element is received, the set of the other peer is smaller than the set of the peer in the <strong>Expecting IBF</strong> state and the set difference is smaller than threshold for full synchronisation as described in the performance section. @@ -2322,9 +2483,9 @@ FUNCTION validate_messages_full_element_init(client_id, remote_setsize, <dt><xref target="messages_full_element" format="title" /></dt> <dd> <t> - When receiving full elements there needs to be checked that every - element is a valid element, no element is resized more than once and - not more or less elements are received as the other peer has committed + When receiving full elements there needs to be checked, that every + element is a valid element, no element has been received more than once and + not more or less elements are received, as the other peer has committed to in the beginning of the operation. Detail pseudocode implementation can be found in <xref target="security_states_expecting_ibf" format="title" />. <!-- IMPLEMENT: Is this check already implemented?--> @@ -2333,12 +2494,12 @@ FUNCTION validate_messages_full_element_init(client_id, remote_setsize, <dt><xref target="messages_full_done" format="title" /></dt> <dd> <t> - When receiving the full done message its important to check that + When receiving the full done message it is important to check that not less elements are received as the other peer has committed to send. The 512-bit hash of the complete reconciled set contained in - the full done message is required to ensures that both sets are truly identical. If - the sets differ a resynchronisation is required. The count of possible + the full done message is required to ensure that both sets are truly identical. If + the sets differ, a resynchronisation is required. The number of possible resynchronisation MUST be limited to prevent resource exhaustion attacks. <!-- IMPLEMENT: Is this check already implemented?--> </t> @@ -2382,7 +2543,7 @@ FUNCTION validate_messages_full_done(full_done_msg, full_element_msg_store, <dt><xref target="messages_ibf" format="title" /></dt> <dd> <t> - When receiving multiple IBFs its important to check that the other + When receiving multiple IBFs it is important to check that the other peer can only send as many IBFs as expected. The number of expected IBFs can be calculated with the knowledge of the set difference as described in the performance section. @@ -2397,37 +2558,37 @@ FUNCTION validate_messages_full_done(full_done_msg, full_element_msg_store, <section anchor="security_states_active_decoding" numbered="true" toc="default"> <name>Active Decoding</name> <t> - In the Active Decoding state its important to prevent an attacker from - generating and passing unlimited amount of IBF that do not decode or - even worse generate an IBF that is constructed to sends the peers in an endless loop. - To prevent an endless loop in decoding a loop detection should be implemented - the simplest solution would be to prevent decoding of more than a given amount of elements, - a more robust solution is to implement a algorithm that detects a loop by - analyzing past partially decoded IBFs to detect cycles. This can be archived + In the Active Decoding state it is important to prevent an attacker from + generating and passing an unlimited amount of IBFs, that do not decode or + even worse, generate an IBF constructed, to send the peers in an endless loop. + To prevent an endless loop in decoding, a loop detection should be implemented. + The simplest solution would be to prevent decoding of more than a given amount of elements. + A more robust solution is to implement a algorithm that detects a loop by + analyzing past partially decoded IBFs. This can be archived by saving the hash of all prior partly decoded IBFs hashes in a hashmap and check - for every inserted hash if it is already in the hashmap. + for every inserted hash, if it is already in the hashmap. <!-- TODO: Link some algo to find loops in directed graph --> <!-- IMPLEMENT: Implement an algo that detects loops in IBF decoding --> </t> <t> - If the IBF decodes more or less elements than are plausible the + If the IBF decodes more or less elements than are plausible, the operation MUST be terminated. The upper and lower threshold - for the decoded elements can be calculated with the peers set size + for the decoded elements can be calculated with the peers set sizes and the other peer committed set sizes from the <strong>Expecting IBF</strong> State. </t> - <!-- Wenne element mehrfach decodiert seitenwechseln daher detecten. --> + <!-- Wenn ein Element mehrfach decodiert seitenwechseln daher detecten. --> <t>Security considerations for received messages:</t> <dl> <dt><xref target="messages_offer" format="title" /></dt> <dd> <t> - If an offer for an element that never has been requested by - an inquiry or if an offer is received twice the operation MUST be terminated. - This requirement can be fulfilled by saving lists that keeps track of the state of - all send inquiries and offers. When answering offers these lists MUST be checked. + If an offer for an element, that never has been requested by + an inquiry or if an offer is received twice, the operation MUST be terminated. + This requirement can be fulfilled by saving lists that keep track of the state of + all sent inquiries and offers. When answering offers these lists MUST be checked. <!-- IMPLEMENT: Check to keep track of all send Inquiries --> </t> <figure anchor="security_states_active_decoding_offer_code"> @@ -2462,9 +2623,9 @@ FUNCTION validate_messages_offer(offer_msg,inquiry_msg_store) If an element that never has been requested by a demand or is received double the operation MUST be terminated. This requirement can be fulfilled by a simple table that keeps track - of the state of all send demands. + of the state of all sent demands. <!-- IMPLEMENT: Check to keep track of all send demands --> - If an invalid element is received the operation has failed and the + If an invalid element is received, the operation has failed and it MUST be terminated. <!-- IMPLEMENT: Termination if invalid element si revived --> </t> @@ -2497,10 +2658,10 @@ FUNCTION validate_messages_elements(element_msg,demand_msg_store) <dt><xref target="messages_demand" format="title" /></dt> <dd> <t> - For every received demand a offer has to be send in advance. If an demand - for an element is received that never has been offered or the offer already has - been answered with a demand the operation MUST be terminated. Its required to implement - a list which keeps track of the state of all send offers and received demands. + For every received demand an offer has to be sent in advance. If a demand + for an element is received, that never has been offered or the offer already has + been answered with a demand, the operation MUST be terminated. It is required to implement + a list which keeps track of the state of all sent offers and received demands. </t> <figure anchor="security_states_active_decoding_demand_code"> <artwork name="" type="" align="left" alt=""><![CDATA[ @@ -2532,14 +2693,14 @@ FUNCTION validate_messages_demand(demand_msg,offer_msg_store) <dt><xref target="messages_done" format="title" /></dt> <dd> <t> - The done message is only received if the IBF has been finished + The done message is only received, if the IBF has been finished decoding and all offers have been sent. If the done message is received before the decoding of the IBF is finished or all open offers and demands - have been answered the operation MUST be terminated. + have been answered, the operation MUST be terminated. <!-- IMPLEMENT: Check that in active decoding no done message is received before ibf has been decoded--> The 512-bit hash of the complete reconciled set contained in - the done message is required to ensures that both sets are truly identical. If - the sets differ a resynchronisation is required. The count of possible + the done message is required to ensure that both sets are truly identical. If + the sets differ, a resynchronisation is required. The number of possible resynchronisation MUST be limited to prevent resource exhaustion attacks. </t> <figure anchor="security_states_active_decoding_done_code"> @@ -2591,8 +2752,8 @@ FUNCTION validate_messages_done(done_msg, offer_msg_store, <section anchor="security_states_finish_closing" numbered="true" toc="default"> <name>Finish Closing</name> <t> - In case not all sent demands or inquiries have ben answered in time a pre defined - timeout the operation has failed and MUST be terminated. + In case not all sent demands or inquiries have been answered in time, a predefined + timeout, the operation has failed and MUST be terminated. </t> <!-- FIXME: In state diagram in finish closing only Elements can be received. What happens if i receive an offer? --> <t>Security considerations for received messages:</t> @@ -2616,14 +2777,14 @@ FUNCTION validate_messages_done(done_msg, offer_msg_store, <dt><xref target="messages_se" format="title" /></dt> <dd> <t> - In case the Strata Estimator does not decode the + In case the Strata Estimator does not decode, the operation MUST be terminated to prevent to get to a unresolvable state. <!-- IMPLEMENT: If in expect SE state the strata estimator does not decode terminate the operation --> The set difference calculated from the strata estimator needs to be plausible, - to ensure this multiple factors need to be considered: The absolute plausible maximum of - elements in a set which has to be predefined according + to ensure this, multiple factors need to be considered: The absolute plausible maximum of + elements in a set, which has to be predefined according to the use case and the maximal plausible element increase since the last - successful set reconciliation which should be either predefined or can be calculated + successful set reconciliation, which should be either predefined or can be calculated with the gaussian distribution function over all passed set reconciliations. <!-- IMPLEMENT: Terminate if in check expect se state for a max size difference is exceeded --> </t> @@ -2673,7 +2834,7 @@ FUNCTION validate_messages_se(se_msg, remote_setsize, local_setsize) <t> The peer in <strong>Full Receiving</strong> state needs to check that exactly the number of elements is received from the remote peer as - he initially committed too. If the remote peer transmits less or + he has initially committed too. If the remote peer transmits less or more elements the operation MUST be terminated. </t> <t> @@ -2684,14 +2845,14 @@ FUNCTION validate_messages_se(se_msg, remote_setsize, local_setsize) <dd> <t> When the full done message is received from the remote peer all - elements that the remote peer has committed to needs to be received + elements, that the remote peer has committed to, need to be received, otherwise the operation MUST be terminated. After receiving the full done message no future elements should be accepted. <!-- FIXME: Check that after full done in full receiving no elements can be received anymore! Additional state? --> The 512-bit hash of the complete reconciled set contained in - the full done message is required to ensures that both sets are truly identical. If - the sets differ a resynchronisation is required. The count of possible - resynchronisation MUST be limited to prevent resource exhaustion attacks. + the full done message is required to ensure that both sets are truly identical. If + the sets differ, a resynchronisation is required. The number of possible + resynchronisations MUST be limited to prevent resource exhaustion attacks. </t> <t> Pseudocode for implementation described in section <xref target="security_states_full_sending" format="title" />. @@ -2707,8 +2868,8 @@ FUNCTION validate_messages_se(se_msg, remote_setsize, local_setsize) <dd> <t> In case an IBF message is received by the peer a active/passive role switch - is initiated by the active decoding remote peer. In this instance the peer should - wait for all open offers and demands to be fulfilled to prevent + is initiated by the active decoding remote peer. In this moment the peer should + wait for all open offers and demands to be fulfilled, to prevent retransmission before switching into active decoding operation mode. A switch into active decoding mode should only be permitted for a predefined number of times as described in the top section @@ -2755,13 +2916,13 @@ FUNCTION validate_messages_ibf(offer_msg_store, demand_msg_store, element_msg_st <dt><xref target="messages_inquiry" format="title" /></dt> <dd> <t> - A check needs to be in place that prevents receiving a inquiry + A check needs to be in place that prevents receiving an inquiry for an element multiple times or more inquiries than are plausible. - The amount of inquiries that is plausible can be estimated by considering - known values as the remote set size, the local set size, the - predefined absolute maximum of elements in the set which is defined + The amount of inquiries that is plausible, can be estimated by considering + known values, as the remote set size, the local set size, the + predefined absolute maximum of elements in the set, which is defined by real world limitations. - To implement this restrictions a list with all received inquiries + To implement this restrictions, a list with all received inquiries should be stored and new inquiries should be checked against. </t> <figure anchor="security_states_passive_decoding_inquiry_code"> @@ -2817,7 +2978,7 @@ FUNCTION validate_messages_inquiry(inquiry_msg, set_diff) <section anchor="security_states_finish_waiting" numbered="true" toc="default"> <name>Finish Waiting</name> <t> - In case not all sent demands or inquiries have ben answered in time the operation + In case not all sent demands or inquiries have ben answered in time, the operation has failed and MUST be terminated. </t> <!-- FIXME: In state diagram in finish closing only Elements can be received. What happens if i receive an offer? --> @@ -3082,6 +3243,15 @@ Type | Name | References | Description <date year="2011"/> </front> </reference> + <reference anchor="secure_set_reconciliation" target="http://ti.bfh.ch"> + <front> + <title>Secure Set Reconciliation</title> + <author initials="E." surname="Summermatter" fullname="Elias Summermatter"> + <organization>BFH - Berner Fachhochschule</organization> + </author> + <date year="2021"/> + </front> + </reference> <!-- <reference anchor="ISO20022"> <front> @@ -3103,34 +3273,82 @@ Type | Name | References | Description <t> INPUTS: </t> - <t> - key: 0xFFFFFFFFFFFFFFFF (64-bit) - number_of_buckets_per_element: 3 - ibf_size: 300 - </t> + <figure anchor="test_vectors_map_function_inputs"> + <artwork name="" type="" align="left" alt=""><![CDATA[ +number_of_buckets_per_element: 3 +ibf_size: 300 + +key1: 0xFFFFFFFFFFFFFFFF (64-bit) +key2: 0x0000000000000000 (64-bit) +key3: 0x00000000FFFFFFFF (64-bit) +key4: 0xC662B6298512A22D (64-bit) +key5: 0xF20fA7C0AA0585BE (64-bit) + ]]></artwork> + </figure> <t> OUTPUT: </t> - <t> - ["222","32","10"] - </t> + <figure anchor="test_vectors_map_function_outputs"> + <artwork name="" type="" align="left" alt=""><![CDATA[ +key1: ["122","157","192"] +key2: ["85","243","126"] +key3: ["208","101","222"] +key4: ["239","269","56"] +key5: ["150","104","33"] + ]]></artwork> + </figure> </section> <section anchor="test_vectors_id_function" numbered="true" toc="default"> <name>ID Calculation Function</name> <t> INPUTS: </t> + <figure anchor="test_vectors_id_function_inputs"> + <artwork name="" type="" align="left" alt=""><![CDATA[ +element 1: 0xFFFFFFFFFFFFFFFF (64-bit) +element 2: 0x0000000000000000 (64-bit) +element 3: 0x00000000FFFFFFFF (64-bit) +element 4: 0xC662B6298512A22D (64-bit) +element 5: 0xF20fA7C0AA0585BE (64-bit) + ]]></artwork> + </figure> <t> - element: 0xadadadadadadadad - ibf_salt 0x3F (6-bit) + OUTPUT: </t> + <figure anchor="test_vectors_id_function_outputs"> + <artwork name="" type="" align="left" alt=""><![CDATA[ +element 1: 0x5AFB177B +element 2: 0x64AB557C +element 3: 0xCB5DB740 +element 4: 0x8C6A2BB2 +element 5: 0x7EC42981 + ]]></artwork> + </figure> + </section> + <section anchor="test_counter_compression_function" numbered="true" toc="default"> + <name>Counter Compression Function</name> <t> - OUTPUT: + INPUTS: </t> + <figure anchor="test_counter_compression_function_inputs"> + <artwork name="" type="" align="left" alt=""><![CDATA[ +counter serie 1: [1,8,10,6,2] (min bytes 4) +counter serie 2: [26,17,19,15,2,8] (min bytes 5) +counter serie 3: [4,2,0,1,3] (min bytes 3) + ]]></artwork> + </figure> <t> - 0xFFFFFFFFFFFFFFFF + OUTPUT: </t> + <figure anchor="test_counter_compression_function_outputs"> + <artwork name="" type="" align="left" alt=""><![CDATA[ +counter serie 1: 0x18A62 +counter serie 2: 0x3519BC48 +counter serie 3: 0x440B + ]]></artwork> + </figure> </section> </section> </back> </rfc> +