commit 97a7e6ff4153fb10e009c43558683b5670968306
parent dc5c89dbe55665153cd8b4bd3b2cad156d2dffda
Author: Elias Summermatter <elias.summermatter@seccom.ch>
Date: Wed, 9 Jun 2021 09:06:57 +0200
Merge branch 'master' of git+ssh://gnunet.org/lsd0003
Diffstat:
1 file changed, 40 insertions(+), 37 deletions(-)
diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml
@@ -62,7 +62,7 @@
<name>Introduction</name>
<t>
This document describes a Byzantine fault-tolerant set reconciliation protocol used to efficient and securely
- synchronize two sets of elements between two peers.
+ compute the union of two sets across a network.
</t>
<t>
This Byzantine fault-tolerant set reconciliation
@@ -97,7 +97,7 @@
<t>
The protocol described in this report is generic and
suitable for a wide range of applications. As a result,
- the internal structure of the elements in the sets must
+ the internal structure of the elements in the sets MUST
be defined and verified by the application using the
protocol. This document thus does not cover the element
structure, except for imposing a limit on the maximum
@@ -108,7 +108,7 @@
the number of network round-trips and the number of bytes
sent over the network. Thus, for the protocol to choose
the right parameters for a given situation, applications
- using the protocol must provide a parameter that specifies
+ using the protocol SHOULD provide a parameter that specifies
the cost-ratio of round-trips vs. bandwidth usage. Given
this trade-off factor, the protocol will then choose parameters
that minimize the total execution costs. In particular, there
@@ -241,7 +241,7 @@
]]></artwork>
</figure>
<t>
- The parameters L and k depend on the set size and must be
+ The parameters L and k depend on the set size and MUST be
chosen carefully to ensure that the BF does not return too
many false-positives.
</t>
@@ -296,7 +296,7 @@
order of a bucket reaches "infinity", it is no longer incremented or decremented.
</t>
<t>
- The parameters L and k and the number of bits allocated to the counters should depend on the set size.
+ The parameters L and k and the number of bits allocated to the counters depend on the set size.
An IBF will degenerate when subjected to insert and remove iterations of different elements, and eventually all
buckets will reach "infinity". The speed of the degradation will depend on the choice of L and k in
relation to the number of elements stored in the IBF.
@@ -487,12 +487,12 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 |
decoder returns an IDSUM value that is actually the XOR of several IDSUMs.
This is primarily detected by checking that the HASHSUM is the hash of the IDSUM.
Only if the HASHSUM also matches, the bucket could be pure. Additionally,
- one should check that the IDSUM value actually would be mapped by M to
+ one MUST check that the IDSUM value actually would be mapped by M to
the respective bucket. If not, there was a hash collision.
</t>
<t>
The very rare case that after all these checks a bucket is still
- falsely identified as pure must be detected (say by determining that
+ falsely identified as pure MUST be detected (say by determining that
extracted element IDs do not match any actual elements), and addressed
at a higher level in the protocol. As these failures are probabilistic
and depend on element IDs and the IBF construction, they can typically
@@ -501,7 +501,7 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 |
a different mapping function M.
A more common scenario (especially if L was too small) is that
IBF decoding fails because there is no pure bucket. In this case, the
- higher-level protocol should also retry using different parameters.
+ higher-level protocol SHOULD also retry using different parameters.
</t>
<t>
Suppose the IBF contains a pure bucket. In this case, the IDSUM in the
@@ -509,7 +509,7 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 |
to remove that element from the IBF (by inserting it if the counter
was negative, and by removing it if the counter was positive). This
is likely to cause other buckets to become pure, allowing further
- elements to be decoded. Eventually, decoding should succeed with
+ elements to be decoded. Eventually, decoding ought to succeed with
all counters and IDSUM and HASHSUM values reach zero. However, it is also
possible that an IBF only partly decodes and then decoding fails after
obtaining some elements.
@@ -576,10 +576,10 @@ hashSum | 0x0000 | 0x0000 | 0x0000 | 0x0000 |
IBF2 represents set B. Then this set difference operation will compute IBF3 which
represents the set A - B --- without having to transfer the elements from set A or B.
- To calculate the IBF representing this set difference, both IBFs must have the same
+ To calculate the IBF representing this set difference, both IBFs MUST have the same
length L, the same number of buckets per element k and use the same map M. Given this,
one can compute the IBF representing the set difference by taking the XOR of the IDSUM and HASHSUM values
- of the respective buckets and subtracting the respective counters. Care should be taken
+ of the respective buckets and subtracting the respective counters. Care MUST be taken
to handle overflows and underflows by setting the counter to "infinity" as necessary.
The result is a new IBF with the same number of buckets representing the set difference.
</t>
@@ -645,10 +645,12 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 |
<t>
To facilitate a reasonably CPU-efficient
implementation, this specification requires the
- IBF counter always to use 8 bits. Fewer bits
+ IBF counter always to use 8 bits.
+ FIXME: I thought we lifted this constraint!?
+ Fewer bits
would result in a particularly inefficient
implementation, while more bits are rarely useful
- as sets with so many elements should likely be
+ as sets with so many elements should be
represented using a larger number of buckets. This
means the counter of this design can reach a
minimum of -127 and a maximum of 127 before the
@@ -656,7 +658,7 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 |
</t>
<t>
For the "IDSUM", we always use a 64-bit representation.
- The IDSUM value must have sufficient entropy for the
+ The IDSUM value MUST have sufficient entropy for the
mapping function M to yield reasonably random buckets
even for very large values of L. With a 32 bit
value the chance that multiple elements may be mapped
@@ -679,7 +681,7 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 |
to handle occasional collisions, so while with
32-bits there remains a chance of
accidental collisions, at 32 bit the chance is
- generally believed to be sufficiently small
+ generally believed to be sufficiently small
for the protocol to handle those cases efficiently
for a wide range of use-cases. Smaller hash
values would safe bandwidth, but also drastically
@@ -812,7 +814,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
Basically a Strata Estimator (SE) is a series of IBFs (with a rather small value of L)
in which increasingly large subsets of the full set
of elements are added to each IBF. For the n-th IBF, the function selecting the
- subset of elements should sample to select (probabilistically) 1/(2^n) of all
+ subset of elements MUST sample to select (probabilistically) 1/(2^n) of all
elements. This can be done by counting the number of trailing bits set to "1"
in an element ID, and then inserting the element into the IBF identified by that
counter. As a result, all elements will be mapped to one IBF, with the n-th
@@ -852,7 +854,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
<t>
The second possibility is that the difference of the sets is small compared to the set size.
In this case, an efficient "delta" synchronization mode is more efficient. These two possibilities given,
- the first steps of the protocol are used to determine which mode should be used.
+ the first steps of the protocol are used to determine which mode MUST be used.
</t>
<t>
@@ -979,7 +981,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
<dt><em><xref target="messages_ibf" format="title" /></em> message:</dt>
<dd>
If an <em><xref target="messages_ibf" format="title" /></em> message is received, this
- indicates that decoding of the IBF on the active site has failed and roles should be swapped.
+ indicates that decoding of the IBF on the active site has failed and roles will be swapped.
The receiving passive peer transitions into the <strong>Expecting IBF Last</strong> state,
and waits for more <em><xref target="messages_ibf" format="title" /></em> messages
or the final <em><xref target="messages_ibf_last" format="title" /></em> message to be received.
@@ -1050,8 +1052,10 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
set there was probably a bucket decoded that was not pure so potentially all <em><xref target="messages_offer" format="title" /></em>
and <em><xref target="messages_demand" format="title" /></em> messages sent later are invalid
in this case a role change active -> passive with a new IBF is easiest.
+
If a demand for the same element is received multiple times the demands should be
discarded.
+ FIXME: I thought demanding the same element multiple times is now verboten?
</dd>
<dt><em><xref target="messages_elements" format="title" /></em> message:</dt>
<dd>
@@ -1098,7 +1102,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
differences or if the byte-size of the elements is large. If the set difference is estimated to be large
the <xref target="modeofoperation_full-sync" format="title" /> is
more efficient. The exact heuristics and parameters on which the protocol decides which mode
- should be used are described in the <xref target="performance" format="title" /> section of this document.
+ MUST be used are described in the <xref target="performance" format="title" /> section of this document.
</t>
<t>
There are two main cases when a <xref target="modeofoperation_full-sync" format="title" />
@@ -1106,7 +1110,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
The first case is when one of the peers announces having an empty set. This is announced by setting
the SETSIZE field in the <em><xref target="messages_se" format="title" /></em> to 0.
The second case is if the application requests full synchronization explicitly.
- This is useful for testing and should not be used in production.
+ This is useful for testing and MUST NOT be used in production.
</t>
</section>
</section>
@@ -1211,7 +1215,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
<dd>
the type of SETU_P2P_REQUEST_IBF as registered in <xref target="gana" format="title" /> in network byte order.
</dd>
-
+
<dt>SIZE</dt>
<dd>
is a 32-bit unsigned integer which signals the number of buckets in the IBF.
@@ -1223,7 +1227,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
in the <xref target="performance_counter_variable_size" format="title" /> section.
</dd>
-
+
<dt>OFFSET</dt>
<dd>
is a 32-bit unsigned integer which signals the offset to the following ibf slices in the original.
@@ -1235,7 +1239,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
</dd>
<dt>IBF-SLICE</dt>
<dd>
-
+
<t>
are variable numbers of slices in an array. A single slice contains multiple 64-bit IDSUMS,
32-bit HASHSUMS and 1-64bit COUNTERS of variable size. In the network order the array of IDSUMS is first, followed
@@ -1244,7 +1248,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
defined by MIN( SIZE - OFFSET, MAX_BUCKETS_PER_MESSAGE). MAX_BUCKETS_PER_MESSAGE is defined as
32768 divided by the BUCKET_SIZE which is 13-byte (104-bit).
</t>
-
+
<t>
To get the IDSUM field, all IDs hitting a bucket are added up with a binary XOR operation.
See <xref target="ibf_format_id_generation" format="title" /> details about ID generation.
@@ -1258,7 +1262,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
The algorithm to find the correct bucket in which the ID and the HASH have to be added
is described in detail in section <xref target="ibf_format_bucket_identification" format="title" />.
</t>
-
+
<t>
Test vectors for an implementation can be found in the <xref target="test_vectors" format="title" /> section
</t>
@@ -1977,8 +1981,8 @@ FUNCTION ibf_get_max_counter(ibf)
# INPUTS:
# ibf: The IBF
-# offset: The offset which defines the starting point from which bucket the compress operation should start
-# count: The number of buckets to in the array that should be compressed
+# offset: The offset which defines the starting point from which bucket the compress operation starts
+# count: The number of buckets to in the array that will be compressed
# OUTPUTS:
# returns: An byte array of compressed counters to send over the network
@@ -2032,8 +2036,8 @@ FUNCTION pack_counter(ibf, offset, count)
# INPUTS:
# ibf: The IBF
-# offset: The offset which defines the starting point from which bucket the compress operation should start
-# count: The number of buckets to in the array that should be compressed
+# offset: The offset which defines the starting point from which bucket the compress operation starts
+# count: The number of buckets to in the array that will be compressed
# counter_bit_length: The bit length of the counter can be found in the ibf message in the ibf_counter_bit_length field
# packed_data: A byte array which contains the data packed with the pack_counter function
# OUTPUTS:
@@ -2107,7 +2111,7 @@ FUNCTION unpack_counter(ibf, offset, count, counter_bit_length, packed_data)
</t>
<t>
The formal format of all messages needs to be properly validated. This is important to prevent many
- attacks on the code. The application data should be validated by the application using
+ attacks on the code. The application data MUST be validated by the application using
the protocol not by the implementation of the protocol.
In case the format validation fails the set operation MUST be terminated.
</t>
@@ -2540,7 +2544,7 @@ FUNCTION validate_messages_full_done(full_done_msg, full_element_msg_store,
In the Active Decoding state it is important to prevent an attacker from
generating and passing an unlimited amount of IBFs, that do not decode or
even worse, generate an IBF constructed, to send the peers in an endless loop.
- To prevent an endless loop in decoding, a loop detection should be implemented.
+ To prevent an endless loop in decoding, a loop detection MUST be implemented.
The simplest solution would be to prevent decoding of more than a given amount of elements.
A more robust solution is to implement a algorithm that detects a loop by
analyzing past partially decoded IBFs. This can be archived
@@ -2750,7 +2754,7 @@ FUNCTION validate_messages_done(done_msg, offer_msg_store,
to ensure this, multiple factors need to be considered: The absolute plausible maximum of
elements in a set, which has to be predefined according
to the use case and the maximal plausible element increase since the last
- successful set reconciliation, which should be either predefined or can be calculated
+ successful set reconciliation, which SHOULD be either predefined or can be calculated
with the gaussian distribution function over all passed set reconciliations.
</t>
<t>
@@ -2812,7 +2816,7 @@ FUNCTION validate_messages_se(se_msg, remote_setsize, local_setsize)
When the full done message is received from the remote peer all
elements, that the remote peer has committed to, need to be received,
otherwise the operation MUST be terminated. After receiving the
- full done message no future elements should be accepted.
+ full done message, future elements MUST NOT be accepted.
The 512-bit hash of the complete reconciled set contained in
the full done message is required to ensure that both sets are truly identical. If
the sets differ, a resynchronisation is required. The number of possible
@@ -2832,10 +2836,10 @@ FUNCTION validate_messages_se(se_msg, remote_setsize, local_setsize)
<dd>
<t>
In case an IBF message is received by the peer a active/passive role switch
- is initiated by the active decoding remote peer. In this moment the peer should
+ is initiated by the active decoding remote peer. In this moment the peer MUST
wait for all open offers and demands to be fulfilled, to prevent
retransmission before switching into active decoding operation mode.
- A switch into active decoding mode should only be permitted for
+ A switch into active decoding mode MUST only be permitted for
a predefined number of times as described in the top section
of the security section.
</t>
@@ -2886,7 +2890,7 @@ FUNCTION validate_messages_ibf(offer_msg_store, demand_msg_store, element_msg_st
predefined absolute maximum of elements in the set, which is defined
by real world limitations.
To implement this restrictions, a list with all received inquiries
- should be stored and new inquiries should be checked against.
+ MUST be stored and new inquiries MUST be checked against.
</t>
<figure anchor="security_states_passive_decoding_inquiry_code">
<artwork name="" type="" align="left" alt=""><![CDATA[
@@ -3180,4 +3184,3 @@ counter serie 3: 0x440B
</section>
</back>
</rfc>
-