Diffstat (limited to 'src')
-rw-r--r-- | src/testbed/barriers.README.org | 65 |
1 files changed, 65 insertions, 0 deletions
diff --git a/src/testbed/barriers.README.org b/src/testbed/barriers.README.org new file mode 100644 index 000000000..4547009e2 --- /dev/null +++ b/src/testbed/barriers.README.org | |||
@@ -0,0 +1,65 @@ | |||
1 | * Description | ||
2 | The barriers component of testbed facilitates coordination among the peers run | ||
3 | by the testbed and the experiment driver. The concept is similar to the barrier | ||
4 | synchronisation mechanism found in parallel programming or multithreading | ||
5 | paradigms: a peer that reaches a barrier waits there until it is crossed, i.e., | ||
6 | reached by a predefined number of peers. This predefined number of peers | ||
7 | required to cross a barrier is also called the quorum. | ||
8 | |||
9 | Coordination among the peers and the experiment driver is achieved through the | ||
10 | barriers service and its respective barriers API. The barriers API provides the | ||
11 | following functions: | ||
12 | |||
13 | 1) barrier_init(): function to initialise a barrier in the experiment | ||
14 | 2) barrier_cancel(): function to cancel a barrier which has been initialised | ||
15 | before | ||
16 | 3) barrier_wait(): function to signal the barrier service that the caller has reached | ||
17 | a barrier and is waiting for it to be crossed | ||
18 | 4) barrier_wait_cancel(): function to stop waiting for a barrier to be crossed | ||
19 | |||
20 | Among the above functions, the first two, namely barrier_init() and | ||
21 | barrier_cancel(), are used by experiment drivers. All barriers should be | ||
22 | initialised by the experiment driver by calling barrier_init(). This function | ||
23 | takes a name to identify the barrier and a notification callback for notifying | ||
24 | the experiment driver when the barrier is crossed. The function | ||
25 | barrier_cancel() cancels an initialised barrier and frees the resources | ||
26 | allocated for it. This function can be called on an initialised barrier before | ||
27 | it is crossed. | ||
28 | |||
29 | The remaining two functions barrier_wait() and barrier_wait_cancel() are used in | ||
30 | the peers' processes. barrier_wait() connects to the local barrier service and | ||
31 | registers that the caller has reached the barrier and is waiting for the barrier | ||
32 | to be crossed. Note that this function can only be used by peers which are | ||
33 | started by testbed as this function tries to access the local barrier service | ||
34 | which is part of the testbed controller service. Calling barrier_wait() on an | ||
35 | uninitialised (or not-yet-initialised) barrier results in failure. | ||
36 | barrier_wait_cancel() cancels the notification registered by barrier_wait(). | ||
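
The lifecycle described above can be sketched as a small single-process model. This is only an illustration, not the testbed API: every name prefixed with toy_ is hypothetical, the quorum is passed explicitly to the init function as an assumption, and there is no networking, no controller hierarchy and no per-peer notification (so barrier_wait_cancel() is not modelled).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BARRIERS 8

/* Hypothetical callback type: invoked once when the barrier is crossed. */
typedef void (*toy_crossed_cb) (void *cls, const char *name);

struct toy_barrier
{
  const char *name;      /* identifies the barrier; NULL marks a free slot */
  unsigned int quorum;   /* number of peers required to cross the barrier */
  unsigned int nreached; /* number of peers that have reached it so far */
  int crossed;           /* set once the quorum has been met */
  toy_crossed_cb cb;     /* notifies the (toy) experiment driver */
  void *cb_cls;          /* closure passed to cb */
};

static struct toy_barrier barriers[MAX_BARRIERS];

static struct toy_barrier *
toy_lookup (const char *name)
{
  for (size_t i = 0; i < MAX_BARRIERS; i++)
    if ((NULL != barriers[i].name) && (0 == strcmp (barriers[i].name, name)))
      return &barriers[i];
  return NULL;
}

/* Driver side: register a barrier.  The name is not copied, so it must
   outlive the barrier.  Returns NULL if the name is already in use or no
   slot is free. */
static struct toy_barrier *
toy_barrier_init (const char *name, unsigned int quorum,
                  toy_crossed_cb cb, void *cb_cls)
{
  assert (quorum > 0);
  if (NULL != toy_lookup (name))
    return NULL;
  for (size_t i = 0; i < MAX_BARRIERS; i++)
  {
    if (NULL != barriers[i].name)
      continue;
    barriers[i].name = name;
    barriers[i].quorum = quorum;
    barriers[i].nreached = 0;
    barriers[i].crossed = 0;
    barriers[i].cb = cb;
    barriers[i].cb_cls = cb_cls;
    return &barriers[i];
  }
  return NULL;
}

/* Driver side: cancel a barrier and release its slot. */
static void
toy_barrier_cancel (struct toy_barrier *b)
{
  memset (b, 0, sizeof (*b));
}

/* Peer side: record that the caller reached the barrier.  Returns 0 on
   success and -1 if the barrier has not been initialised. */
static int
toy_barrier_wait (const char *name)
{
  struct toy_barrier *b = toy_lookup (name);

  if (NULL == b)
    return -1;
  b->nreached++;
  if ((! b->crossed) && (b->nreached >= b->quorum))
  {
    b->crossed = 1;
    if (NULL != b->cb)
      b->cb (b->cb_cls, b->name);
  }
  return 0;
}

/* Example driver callback: counts how often a barrier was crossed. */
static void
toy_count_crossing (void *cls, const char *name)
{
  (void) name;
  (*(unsigned int *) cls)++;
}
```

In this model a driver would call toy_barrier_init ("start", 2, &toy_count_crossing, &counter); the counter is incremented on the second toy_barrier_wait ("start"), and a wait on a name that was never initialised (or was cancelled) fails.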
37 | |||
38 | |||
39 | * Implementation | ||
40 | Since barriers involve coordination between the experiment driver and peers, the | ||
41 | barrier service is split into two components. The first component responds to | ||
42 | the barrier API used by the experiment driver (functions barrier_init() and | ||
43 | barrier_cancel()) and the second component to the barrier API used by peers | ||
44 | (functions barrier_wait() and barrier_wait_cancel()). | ||
45 | |||
46 | Calling barrier_init() sends a BARRIER_INIT message to the master controller. | ||
47 | The master controller then registers a barrier and calls barrier_init() for each | ||
48 | of its subcontrollers. In this way, barrier initialisation is propagated through | ||
49 | the controller hierarchy. While propagating initialisation, any errors at a | ||
50 | subcontroller, such as a timeout during further propagation, are reported up the | ||
51 | hierarchy back to the experiment driver. | ||
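
The propagation described above can be pictured as a walk over a tree of controllers. The sketch below is a simplified, synchronous model under stated assumptions: struct toy_controller, its fails flag (standing in for a timeout or similar error) and toy_propagate_init() are hypothetical names, whereas the real service exchanges BARRIER_INIT messages between controllers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

enum toy_init_status
{
  TOY_INIT_OK = 0,
  TOY_INIT_ERROR = 1
};

struct toy_controller
{
  int fails;        /* hypothetical fault injection, e.g. a timeout */
  int has_barrier;  /* set once the barrier is registered locally */
  size_t nchildren; /* number of valid entries in children[] */
  struct toy_controller *children[MAX_CHILDREN];
};

/* Register the barrier at controller c, then propagate to each of its
   subcontrollers.  An error anywhere below c is reported back to the
   caller, i.e. up the hierarchy towards the experiment driver. */
static enum toy_init_status
toy_propagate_init (struct toy_controller *c)
{
  enum toy_init_status status = TOY_INIT_OK;

  assert (c->nchildren <= MAX_CHILDREN);
  if (c->fails)
    return TOY_INIT_ERROR;
  c->has_barrier = 1;
  for (size_t i = 0; i < c->nchildren; i++)
    if (TOY_INIT_OK != toy_propagate_init (c->children[i]))
      status = TOY_INIT_ERROR;
  return status;
}
```

Calling toy_propagate_init() on the master controller returns TOY_INIT_OK only if every controller below it registered the barrier; a single failing subcontroller makes the call at the root report TOY_INIT_ERROR.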
52 | |||
53 | Similar to barrier_init(), barrier_cancel() propagates a BARRIER_CANCEL message | ||
54 | which causes controllers to remove an initialised barrier. | ||
55 | |||
56 | The second component, according to the GNUnet architecture, is actually another | ||
57 | service but runs in the same binary `gnunet-service-testbed'; the reason is | ||
58 | that it requires access to barrier data created by the first component. This | ||
59 | component responds to BARRIER_WAIT messages from local peers when they call | ||
60 | barrier_wait(). Upon receiving a BARRIER_WAIT message, the service checks whether | ||
61 | the requested barrier has been initialised. If it has not, an error status is | ||
62 | sent to the local peer through a BARRIER_STATUS message and the connection from | ||
63 | the peer is terminated. If the barrier has been initialised, the barrier's | ||
64 | counter for reached peers is incremented and a notification is registered to | ||
65 | notify this peer when the barrier is crossed. | ||
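
The handling of a BARRIER_WAIT message can be summarised by the following sketch. It is a toy model under assumptions, not the code in gnunet-service-testbed: the struct layout, the integer peer ids and the toy_ names are hypothetical, and sending BARRIER_STATUS or closing the connection is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 16

enum toy_status
{
  TOY_STATUS_ERROR,  /* barrier unknown: report error, drop connection */
  TOY_STATUS_WAITING /* peer registered, will be notified later */
};

struct toy_service_barrier
{
  int initialised;          /* set by the driver-facing component */
  unsigned int nreached;    /* counter for peers that reached the barrier */
  int waiters[MAX_WAITERS]; /* hypothetical peer ids to notify */
  size_t nwaiters;          /* number of valid entries in waiters[] */
};

/* Handle a BARRIER_WAIT request from the local peer peer_id. */
static enum toy_status
toy_handle_barrier_wait (struct toy_service_barrier *b, int peer_id)
{
  if ((NULL == b) || (! b->initialised))
    return TOY_STATUS_ERROR;
  assert (b->nwaiters < MAX_WAITERS);
  b->nreached++;
  b->waiters[b->nwaiters++] = peer_id;
  return TOY_STATUS_WAITING;
}
```

A request for a barrier that was never initialised leaves the counter untouched and yields TOY_STATUS_ERROR; otherwise the counter grows by one per request and the peer is remembered for later notification.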