lsd0003

LSD0003: Set Union
Log | Files | Refs | README

commit 3d401040fbb64edb8b21f572645dffd756deb592
parent b492d44ea06c6a00f3f8ba8608bfe7438bf7827a
Author: Christian Grothoff <christian@grothoff.org>
Date:   Mon, 14 Jun 2021 14:27:07 +0200

fixes

Diffstat:
Mdraft-summermatter-set-union.xml | 106+++++++++++++++++++++++++++++++++++++++++++++----------------------------------
1 file changed, 60 insertions(+), 46 deletions(-)

diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml @@ -1829,7 +1829,7 @@ hashSum | 0x0101 | 0x5151 | 0x5050 | 0x0000 | More details can be found in section <xref target="performance_multi_se" format="title" />. </t> <t> - The IBFs in a strata estimator always have 79 buckets. The reason why can be found in Summermatter's work in section 3.4.2. <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> + The IBFs in a strata estimator always have 79 buckets. The reason why can be found in <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> in section 3.4.2. </t> <!-- Give a more precise reference into the thesis for this, do not cite the whole thesis! --> </section> @@ -2015,15 +2015,15 @@ hashSum | 0x0101 | 0x5151 | 0x5050 | 0x0000 | <t> The function takes as input the average element size, the local set size, the remote set size, the set differences as estimated from the strata estimator for both the local and remote sets, and the bandwidth/roundtrip tradeoff. - The function returns the exact <xref target="modeofoperation" format="title"/> as output: FULL_SYNC_REMOTE_SENDING_FIRST - if it is optimal that the other peer transmits his elements first, FULL_SYNC_LOCAL_SENDING_FIRST - if it is optimal that the elements are transmitted to the other peer directly and - DIFFERENTIAL_SYNC if the differential synchronisation is optimal. + The function returns the exact <xref target="modeofoperation" format="title"/> that is predicted to be best as output: FULL_SYNC_REMOTE_SENDING_FIRST + if it is likely cheapest that the other peer transmits his elements first, FULL_SYNC_LOCAL_SENDING_FIRST + if it is likely cheapest that the elements are transmitted to the other peer directly, and + DIFFERENTIAL_SYNC if the differential synchronisation is likely cheapest. </t> <t> The constant IBF_BUCKET_NUMBER_FACTOR is always 2 and IBF_MIN_SIZE is 37. The method for deriving - this can be found in the IBF parameter study in Summermatter's work in section 4.5.2. <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> + this can be found in the IBF parameter study in <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> in section 4.5.2. </t> <figure anchor="performance_formulas_operationmode_code"> <artwork name="" type="" align="left" alt=""><![CDATA[ @@ -2057,10 +2057,10 @@ FUNCTION decide_operation_mode(avg_es, # extreme cases, allowing us to omit this code. IF (0 == rss) RETURN FULL_SYNC_LOCAL_SENDING_FIRST - IF END + END IF IF (0 == lss) RETURN FULL_SYNC_REMOTE_SENDING_FIRST - IF END + END IF # Estimate required transferred bytes when doing a full synchronisation # and transmitting local set first. @@ -2132,15 +2132,15 @@ FUNCTION decide_operation_mode(avg_es, END IF ELSE RETURN DIFFERENTIAL_SYNC - IF END + END IF ]]></artwork> </figure> </section> <section anchor="performance_formula_ibf_parameters" numbered="true" toc="default"> <name>IBF Size</name> <t> - The functions, described in this section, calculate the optimal initial size (initial_ibf_size) - and in case of decoding failure, the optimal next IBF size (get_next_ibf_size). + The functions, described in this section, calculate a good initial size (initial_ibf_size) + and in case of decoding failure, a good next IBF size (get_next_ibf_size). </t> <t> These algorithms are described and justified in more details in @@ -2157,6 +2157,10 @@ FUNCTION decide_operation_mode(avg_es, # next_size: Size of the initial IBF FUNCTION initial_ibf_size(sd) + # We do not go below 37, as 37 buckets should + # basically always be below one MTU, so there is + # little to be gained, while a smaller IBF would + # increase the chance of a decoding failure. return MAX(37, IBF_BUCKET_NUMBER_FACTOR * sd); FUNCTION END @@ -2170,6 +2174,8 @@ FUNCTION END FUNCTION get_next_ibf_size(de, lis) next_size = IBF_BUCKET_NUMBER_FACTOR * (lis - de) + # The MAX operation here also ensures that the + # result is positive. return MAX(37, next_size); FUNCTION END ]]></artwork> @@ -2179,7 +2185,7 @@ FUNCTION END <name>Number of Buckets an Element is Hashed into</name> <t> The number of buckets an element is hashed to is hardcoded to 3. Reasoning and - justification can be found in the following work + justification can be found in <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> in the IBF parameter performance study in section 4.5.2. </t> @@ -2188,22 +2194,25 @@ FUNCTION END <section anchor="performance_counter_variable_size" numbered="true" toc="default"> <name>Variable Counter Size</name> <t> - Since the optimal number of bytes a counter in the IBF contains is very variable and varies - due to different parameters. Details are described in the BSC thesis - by Summermatter <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> - in section 3.2. - Therefore a packing algorithm has been implemented, which always creates the IBF counter in optimal size. - and thus minimizes the bandwidth needed to transmit the IBF. + The number of bits required to represent the counters of an IBF varies + due to different parameters as described in section 3.2 of + <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/>. + Therefore, a packing algorithm has been implemented. + This algorithm encodes the IBF counters in their optimal bit-width + and thus minimizes the bandwidth needed to transmit the IBF. </t> <t> - A simple algorithm is used for the packing. In a first step it is determined, which is the largest counter - and how many bits are needed to store it. In a second step for every counter of every bucket, the counter - is stored in the bit length, determined in the first step and these are concatenated. + A simple algorithm is used for the packing. + In a first step it is determined, which is the largest counter. + The the base 2 logarithm then determines how many bits are needed to store it. + In a second step for every counter of every bucket, the counter + is stored using this many bits. The resulting bit sequence is then simply concatenated. </t> <t> - Three individual functions are used for this purpose. The first one is a function that iterates over each bucket of - the IBF to get the maximum counter in the IBF. As second it needs - a function that pack the counter of the IBF and thirdly a function that unpack the counter. + Three individual functions are used for this purpose. + The first one is a function that iterates over each bucket of + the IBF to get the maximum counter in the IBF. The second function + packs the counters of the IBF, and the third function that unpacks the counters. </t> <figure anchor="performance_counter_variable_size_code"> <artwork name="" type="" align="left" alt=""><![CDATA[ @@ -2211,14 +2220,15 @@ FUNCTION END # INPUTS: # ibf: The IBF # OUTPUTS: -# returns: Minimal amount of bytes required to store the counter +# returns: Minimal amount of bits required to store the counter FUNCTION ibf_get_max_counter(ibf) - max_counter=0 + max_counter=1 # convince static analysis that we never take log2(0) FOR bucket IN ibf IF bucket.counter > max_counter max_counter = bucket.counter - + END IF + END FOR # next bigger discrete number of the binary logarithm of the max counter RETURN CEILING( log2 ( max_counter ) ) @@ -2245,19 +2255,18 @@ FUNCTION pack_counter(ibf, offset, count) byte_to_write = 0 IF counter_bytes + store_bits >= 8 - bit_to_shift=0 + bit_to_shift = 0 IF store_bits > 0 OR counter_bytes > 8 bit_free = 8 - store_bits bit_to_shift = counter_bytes - bit_free store = store << bit_free - + END IF byte_to_write = (( counter >> bit_to_shift) | store) & 0xFF bit_to_shift -= 8 - store_bits - counter = counter & (( 1 << counter_bytes ) - 1) + counter = counter & ((1 << counter_bytes) - 1) store = 0 store_bits = 0 - ELSE IF 0 == store_bits store = counter @@ -2267,14 +2276,16 @@ FUNCTION pack_counter(ibf, offset, count) store_bits = store_bits + byte_len byte_len = 0 BREAK - + END IF buffer[byte_ctr] = byte_to_write - byte_ctr++ - NEXT_BUCKET + byte_ctr = byte_ctr + 1 + END WHILE + END FOR # Write the last partial packed byte to the buffer + # TODO: should we not limit this to the case where store_bits > 0? buffer[byte_ctr] = store << (8 - store_bits) - byte_ctr++ + byte_ctr = byte_ctr + 1 RETURN buffer @@ -2306,15 +2317,16 @@ FUNCTION unpack_counter(ibf, offset, count, cbl, pd) WHILE bit_to_pack_left >= 0 # Prevent packet from reading more than required - IF ibf_bucket_ctr > (count -1) - return + IF ibf_bucket_ctr > (count - 1) + RETURN + END IF - IF ( store_bits + bit_to_pack_left ) >= cbl + IF (store_bits + bit_to_pack_left) >= cbl bit_use = cbl - store_bits IF store_bits > 0 store = store << bit_use - + END IF bytes_to_shift = bit_to_pack_left - bit_use counter_partial = byte_to_read >> bytes_to_shift store = store | counter_partial @@ -2325,9 +2337,8 @@ FUNCTION unpack_counter(ibf, offset, count, cbl, pd) ibf_bucket_ctr++ store = 0 store_bits = 0 - ELSE - store_bits += bit_to_pack_left + store_bits = store_bits + bit_to_pack_left IF 0 == store_bits store = byte_to_read @@ -2335,6 +2346,9 @@ FUNCTION unpack_counter(ibf, offset, count, cbl, pd) store = store << bit_to_pack_left store = store | byte_to_read BREAK + END IF + END WHILE + END WHILE ]]></artwork> </figure> </section> @@ -2382,7 +2396,7 @@ FUNCTION se_key_salting(value, salt) ]]></artwork> </figure> <t> - Performance study and details about the reasoning for the used methods can be found in the work of Summermatter in section + Performance study and details about the reasoning for the used methods can be found in <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> in section 3.4.1 under the title "Added Support for Multiple Strata Estimators". <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> </t> @@ -2687,7 +2701,7 @@ FUNCTION END ]]></artwork> </figure> <t> - This is based on the work of Summermatter. More details can be found in his thesis <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> in section 5.3 (Message Control Flow). + This is based on <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/>, section 5.3 (Message Control Flow). </t> </section> @@ -2700,8 +2714,8 @@ FUNCTION END </t> <t> The question after how many active/passive switches it can be assumed that the other peer is not honest, - depends on many factors and can only be solved probabilistically. In the work of Summermatter <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> - in the section 5.4 this is described in detail. From this work it is concluded that the probability of decoding failure is about + depends on many factors and can only be solved probabilistically. This is described in detail in <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> + in section 5.4. From this work it is concluded that the probability of decoding failure is about 15% for each round. The probability that there will be n active/passive changes is given by 0.15^{round number}. Which means that after about 30 active/passive switches it can be said with a certainty of 2^80 that one of the peers is not following the protocol. It is reasonable that a maximum of 30 active/passive changes should be set. @@ -3085,7 +3099,7 @@ FUNCTION END <name>Constants</name> <t> The following table contains constants used by the protocol. The constants marked with a * are - validated through experiments in Summermatter's work <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/> in sections given below. + validated through experiments in <xref target="byzantine_fault_tolerant_set_reconciliation" format="default"/>. </t> <figure anchor="figure_constants"> <artwork name="" type="" align="left" alt=""><![CDATA[