author Christian Grothoff <christian@grothoff.org> 2011-11-19 20:26:33 +0000
committer Christian Grothoff <christian@grothoff.org> 2011-11-19 20:26:33 +0000
commit 461cf105fd0e7bf0eb7b7ae0481b327db44e950c
tree 21580e1c9b5c1e0d6c91c8d10c057188aee9b7df /RATIONALE
parent dbaa827eaaeded675980aa2f8f8af87f82923df6
moved to drupal
Diffstat (limited to 'RATIONALE')
-rw-r--r-- RATIONALE 316
1 files changed, 0 insertions, 316 deletions
diff --git a/RATIONALE b/RATIONALE
deleted file mode 100644
index 1851aeb78..000000000
--- a/RATIONALE
+++ /dev/null
@@ -1,316 +0,0 @@
This document is a summary of the changes made to GNUnet for version
0.9.x (from 0.8.x) and what this major redesign tries to address.

First of all, the redesign does not (intentionally) change anything
fundamental about the application-level protocols or how files are
encoded and shared. However, it is not protocol-compatible due to
other changes that do not relate to the essence of the application
protocols. This choice was made since productive development and
readable code were considered more important than compatibility at
this point.

The redesign tries to address the following major problem groups
describing issues that apply more or less to all GNUnet versions
prior to 0.9.x:


PROBLEM GROUP 1 (scalability):
* The code was modular, but bugs were not. Memory corruption
  in one plugin could cause crashes in others and it was not
  always easy to identify the culprit. This approach
  fundamentally does not scale (in the sense of GNUnet being
  a framework and a GNUnet server running hundreds of
  different application protocols -- and the result still
  being debuggable, secure and stable).
* The code was heavily multi-threaded resulting in complex
  locking operations. GNUnet 0.8.x had over 70 different
  mutexes and almost 1000 lines of lock/unlock operations.
  It is challenging for even good programmers to program or
  maintain good multi-threaded code with this complexity.
  The excessive locking essentially prevents GNUnet 0.8 from
  actually doing much in parallel on multicores.
* Despite efforts like Freeway, it was virtually
  impossible to contribute code to GNUnet 0.8 that was not
  written in C/C++.
* Changes to the configuration almost always required restarts
  of gnunetd; the existence of change-notifications does not
  really change that (how many users are even aware of SIGHUP,
  and how few options worked with that -- and at what expense
  in code complexity!).
* Valgrinding could only be done for the entire gnunetd
  process. Given that gnunetd does quite a bit of
  CPU-intensive crypto, this could not be done for a system
  under heavy (or even moderate) load.
* Stack overflows with threads, while rare under Linux these
  days, result in really nasty and hard-to-find crashes.
* Structs of function pointers in service APIs were
  needlessly adding complexity, especially since in
  most cases there was no actual polymorphism.

SOLUTION:
* Use multiple, loosely-coupled processes and one big select
  loop in each (supported by a powerful util library to eliminate
  code duplication for each process).
* Eliminate all threads, manage the processes with a
  master-process (gnunet-arm, for automatic restart manager)
  which also ensures that configuration changes trigger the
  necessary restarts.
* Use continuations (with timeouts) as a way to unify
  cron-jobs and other event-based code (such as waiting
  on network IO).
  => Using multiple processes ensures that memory corruption
     stays localized.
  => Using multiple processes will make it easy to contribute
     services written in other language(s).
  => Individual services can now be subjected to valgrind.
  => Process priorities can be used to schedule the CPU better.
  Note that we cannot just use one process with a big
  select loop because we have blocking operations (and the
  blocking is outside of our control, thanks to MySQL,
  sqlite, gethostbyaddr, etc.). So in order to perform
  reasonably well, we need some construct for parallel
  execution.

  RULE: If your service contains blocking functions, it
        MUST be a process by itself. If your service
        is sufficiently complex, you MAY choose to make
        it a separate process.
* Eliminate structs with function pointers for service APIs;
  instead, provide a library (still ending in _service.h) API
  that transmits the requests nicely to the respective
  process (easier to use, no need to "request" service
  in the first place; API can cause process to be started/stopped
  via ARM if necessary).
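
The idea of unifying cron-jobs and event-based code through
continuations with timeouts can be sketched as follows. This is
only an illustration under stated assumptions, not the GNUnet
scheduler API: all names (sched_add, sched_run_due, count_call)
are hypothetical, and the clock is passed in by the caller instead
of being read from the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the GNUnet scheduler API. */
typedef void (*continuation_fn) (void *cls);

struct task
{
  uint64_t deadline_ms;   /* absolute time at which the task becomes due */
  continuation_fn fn;     /* continuation to call */
  void *cls;              /* closure handed to the continuation */
};

#define MAX_TASKS 16

struct scheduler
{
  struct task tasks[MAX_TASKS];
  size_t count;
};

/* Queue a continuation; returns 0 on success, -1 if the table is full. */
static int
sched_add (struct scheduler *s, uint64_t deadline_ms,
           continuation_fn fn, void *cls)
{
  assert (NULL != fn);
  if (s->count >= MAX_TASKS)
    return -1;
  s->tasks[s->count].deadline_ms = deadline_ms;
  s->tasks[s->count].fn = fn;
  s->tasks[s->count].cls = cls;
  s->count++;
  return 0;
}

/* Run every task whose deadline has passed; returns how many ran.
   A real loop would first call select() with the time until the
   earliest deadline as its timeout and then call this function. */
static size_t
sched_run_due (struct scheduler *s, uint64_t now_ms)
{
  size_t ran = 0;
  size_t i = 0;

  while (i < s->count)
  {
    if (s->tasks[i].deadline_ms <= now_ms)
    {
      struct task t = s->tasks[i];

      s->tasks[i] = s->tasks[--s->count];   /* remove before calling */
      t.fn (t.cls);
      ran++;
    }
    else
      i++;
  }
  return ran;
}

/* Example continuation: increments the counter it is handed. */
static void
count_call (void *cls)
{
  (*(int *) cls)++;
}
```

A periodic "cron" job is then simply a continuation that re-adds
itself with a later deadline before it returns.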


PROBLEM GROUP 2 (UTIL-APIs causing bugs):
* The existing logging functions were awkward to use and
  their expressive power was never really used for much.
* While we had some rules for naming functions, there
  were still plenty of inconsistencies.
* Specification of default values in configuration could
  result in inconsistencies between defaults in
  config.scm and defaults used by the program; also,
  different defaults might have been specified for the
  same option in different parts of the program.
* The TIME API did not distinguish between absolute
  and relative time, requiring users to know which
  type of value some variable contained and to
  manually convert properly. Combined with the
  possibility of integer overflows this is a major
  source of bugs.
* The TIME API for seconds has a theoretical problem
  with a 32-bit overflow on some platforms which is
  only partially fixed by the old code with some
  hackery.

SOLUTION:
* Logging was radically simplified.
* Functions are now more consistently named.
* Configuration has no more defaults; instead,
  we load a global default configuration file
  before the user-specific configuration (which
  can be used to override defaults); the global
  default configuration file will be generated
  from config.scm.
* Time now distinguishes between
  struct GNUNET_TIME_Absolute and
  struct GNUNET_TIME_Relative. We use structs
  so that the compiler won't coerce for us
  (forcing the use of specific conversion
  functions which have checks for overflows, etc.).
  Naturally the need to use these functions makes
  the code a bit more verbose, but that's a good
  thing given the potential for bugs.
* There is no more TIME API function to do anything
  with 32-bit seconds.
* There is now a bandwidth API to handle
  non-trivial bandwidth utilization calculations.
130
131PROBLEM GROUP 3 (statistics):
132* Databases and others needed to store capacity values
133 similar to what stats was already doing, but
134 across process lifetimes ("state"-API was a partial
135 solution for that, but using it was clunky)
136* Only gnunetd could use statistics, but other
137 processes in the GNUnet system might have had
138 good uses for it as well
139
140SOLUTION:
* New statistics library and service that offer
  an API to inspect and modify statistics
* Statistics are distinguished by service name
  in addition to the name of the value
* Statistics can be marked as persistent, in
  which case they are written to disk when
  the statistics service shuts down.
  => One solution for existing stats uses,
     application stats, database stats and
     versioning information!
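
A minimal in-memory sketch of this model follows: values are keyed
by (service name, value name) and carry a persistence flag, and
only persistent entries are written out on shutdown. All names,
the fixed table size and the tab-separated output format are
assumptions for illustration; the real statistics service talks to
clients over IPC, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch; limits and names are not taken from GNUnet. */
#define MAX_STATS 32
#define SERVICE_LEN 32
#define NAME_LEN 64

struct stats_entry
{
  char service[SERVICE_LEN];   /* e.g. "fs" or "dht" */
  char name[NAME_LEN];         /* name of the value within the service */
  uint64_t value;
  int persistent;              /* non-zero: survives service shutdown */
};

struct stats_table
{
  struct stats_entry entries[MAX_STATS];
  size_t count;
};

/* Look up an entry by service and value name; NULL if absent. */
static struct stats_entry *
stats_find (struct stats_table *t, const char *service, const char *name)
{
  for (size_t i = 0; i < t->count; i++)
    if ( (0 == strcmp (t->entries[i].service, service)) &&
         (0 == strcmp (t->entries[i].name, name)) )
      return &t->entries[i];
  return NULL;
}

/* Create or overwrite a value; returns 0 on success, -1 if a name is
   too long or the table is full. */
static int
stats_set (struct stats_table *t, const char *service, const char *name,
           uint64_t value, int persistent)
{
  struct stats_entry *e;

  assert ( (NULL != service) && (NULL != name) );
  if ( (strlen (service) >= SERVICE_LEN) || (strlen (name) >= NAME_LEN) )
    return -1;
  e = stats_find (t, service, name);
  if (NULL == e)
  {
    if (t->count >= MAX_STATS)
      return -1;
    e = &t->entries[t->count++];
    strcpy (e->service, service);   /* lengths were checked above */
    strcpy (e->name, name);
  }
  e->value = value;
  e->persistent = persistent;
  return 0;
}

/* On shutdown: write only the persistent entries, one per line;
   returns the number of entries written. */
static size_t
stats_save (const struct stats_table *t, FILE *out)
{
  size_t written = 0;

  for (size_t i = 0; i < t->count; i++)
  {
    const struct stats_entry *e = &t->entries[i];

    if (! e->persistent)
      continue;
    fprintf (out, "%s\t%s\t%llu\n", e->service, e->name,
             (unsigned long long) e->value);
    written++;
  }
  return written;
}
```

Keying by service name lets two services use the same value name
without clashing, which is what allows one table to serve
application, database and versioning data alike.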


PROBLEM GROUP 4 (Testing):
* The existing structure of the code with modules
  stored in places far away from the test code
  resulted in tools like lcov not giving good results.
* The codebase had evolved into a complex, deeply
  nested hierarchy often with directories that
  then only contained a single file. Some of these
  files had the same name making it hard to find
  the source corresponding to a crash based on
  the reported filename/line information.
* Non-trivial portions of the code lacked good testcases,
  and it was not always obvious which parts of the code
  were not well-tested.

SOLUTION:
* Code that should be tested together is now
  in the same directory.
* The hierarchy is now essentially flat, each
  major service having one directory under src/;
  naming conventions help to make sure that
  files have globally-unique names.
* All code added to the new repository must
  come with testcases with reasonable coverage.


PROBLEM GROUP 5 (core/transports):
* The new DV service requires session key exchange
  between DV-neighbours, but the existing
  session key code cannot be used to achieve this.
* The core requires certain services
  (such as identity, pingpong, fragmentation,
  transport, traffic, session) which makes it
  meaningless to have these as modules
  (especially since there is really only one
  way to implement these)
* HELLOs are larger than necessary since we need
  one for each transport (and hence often have
  to pick a subset of our HELLOs to transmit)
* Fragmentation is done at the core level but only
  required for a few transports; future versions of
  these transports might want to be aware of fragments
  and do things like retransmission
* Autoconfiguration is hard since we have no good
  way to detect (and then use securely) our external IP address
* It is currently not possible for multiple transports
  between the same pair of peers to be used concurrently
  in the same direction(s)
* We're using lots of cron-based jobs to periodically
  try (and fail) to build and transmit

SOLUTION:
* Rewrite core to integrate most of these services
  into one "core" service.
* Redesign HELLO to contain the addresses for
  all enabled transports in one message (avoiding
  having to transmit the public key and signature
  many, many times)
* With discovery being part of the transport service,
  it is now also possible to "learn" our external
  IP address from other peers (we just add plausible
  addresses to the list; other peers will discard
  those addresses that don't work for them!)
* New DV will consist of a "transport" and a
  high-level service (to handle encrypted DV
  control- and data-messages).
* Move expiration from one field per HELLO to one
  per address
* Require signature in PONG, not in HELLO (and confirm
  one address at a time)
* Move fragmentation into helper library linked
  against by UDP (and others that might need it)
* Link-to-link advertising of our HELLO is transport
  responsibility; global advertising/bootstrap remains
  responsibility of higher layers
* Change APIs to be event-based (transports pull for
  transmission data instead of core pushing and failing)
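
The redesigned HELLO (one message listing the addresses of all
enabled transports, each with its own expiration) can be pictured
with the following sketch. It is an in-memory illustration only:
the struct layout, the names, the text addresses and the
millisecond timestamps are assumptions, and it does not reflect
the actual wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory layout; not the GNUnet wire format. */
#define MAX_ADDRS 8

struct hello_address
{
  const char *transport;    /* e.g. "tcp" or "udp" */
  const char *address;      /* transport-specific address, as text here */
  uint64_t expiration_ms;   /* absolute expiration of this one address */
};

struct hello
{
  /* The peer's public key would be stored once here, no matter how
     many transports are listed below. */
  struct hello_address addrs[MAX_ADDRS];
  size_t count;
};

/* Drop every address that has expired at `now_ms`, keeping the order
   of the remaining ones; returns how many addresses are left. */
static size_t
hello_expire (struct hello *h, uint64_t now_ms)
{
  size_t kept = 0;

  assert (h->count <= MAX_ADDRS);
  for (size_t i = 0; i < h->count; i++)
    if (h->addrs[i].expiration_ms > now_ms)
      h->addrs[kept++] = h->addrs[i];
  h->count = kept;
  return kept;
}
```

With a single expiration per HELLO, one stale address would force
the whole message to be refreshed; with one expiration per address,
each address ages out on its own.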


PROBLEM GROUP 6 (FS-APIs):
* As with gnunetd, the FS-APIs are heavily threaded,
  resulting in hard-to-understand code (slightly
  better than gnunetd, but not much).
* GTK in particular does not like this, resulting
  in complicated code to switch to the GTK event
  thread when needed (which may still be causing
  problems on Gnome, not sure).
* If GUIs die (or are not properly shut down), state
  of current transactions is lost (FSUI only
  saves to disk on shutdown)
* FILENAME metadata is killed by ECRS/FSUI to avoid
  exposing HOME, but what if the user set it manually?
* The DHT was a generic data structure with no
  support for ECRS-style block validation

SOLUTION:
* Eliminate threads from FS-APIs
* Incrementally store FS-state always also on disk using many
  small files instead of one big file
* Have API to manipulate sharing tree before
  upload; have auto-construction modify FILENAME
  but allow user-modifications afterwards
* DHT API was extended with a BLOCK API for content
  validation by block type; validators for FS and
  DHT block types were written; BLOCK API is also
  used by gap routing code.


PROBLEM GROUP 7 (User experience):
* Searches often do not return a sufficient / significant number of
  results
* Sharing a directory with thousands of similar files (image/jpeg)
  creates thousands of search results for the mime-type keyword
  (problem with DB performance, network transmission, caching,
  end-user display, etc.)
* Users that wanted to share important content had no way to
  tell the system to replicate it more; replication was also
  inefficient (this desired feature was sometimes called
  "power" publishing or content pushing)

SOLUTION:
* Have option to canonicalize keywords (see suggestion on mailing list
  end of June 2009: keep consonants and sort those alphabetically);
  not fully implemented yet
* When sharing directories, extract keywords first and then
  push keywords that are common in all files up to the
  directory level; when processing an AND-ed query and a directory
  is found to match the result, do an inspection on the metadata
  of the files in the directory to possibly produce further results
  (requires downloading of the directory in the background);
  needs more testing
* A desired replication level can now be specified and is tracked
  in the datastore; migration prefers content with a high
  replication level (which decreases as replicas are created)
  => datastore format changed; we also took out a size field
     that was redundant, so the overall overhead remains the same
* Peers with a full disk (or disabled migration) can now notify
  other peers that they are not interested in migration right
  now; as a result, less bandwidth is wasted pushing content
  to these peers (and replication counters are not generally
  decreased based on copies that are just discarded; naturally,
  there is still no guarantee that the replicas will stay
  available)
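
The keyword canonicalization mentioned in the first item ("keep
consonants and sort those alphabetically") could look roughly like
the sketch below. The text marks this feature as not fully
implemented, so this is one possible reading rather than the
actual code: it assumes ASCII input, folds to lower case, treats
"aeiou" as the vowels (so "y" counts as a consonant) and drops
everything that is not a letter. The function name is
hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort() on single characters. */
static int
cmp_char (const void *a, const void *b)
{
  return (int) *(const unsigned char *) a - (int) *(const unsigned char *) b;
}

/* Write the canonical form of `keyword` into `out` (capacity
   `out_size`, which must be at least 1) and return its length.
   Input that does not fit is silently truncated. */
static size_t
canonicalize_keyword (const char *keyword, char *out, size_t out_size)
{
  size_t n = 0;

  assert (out_size > 0);
  for (const char *p = keyword; ('\0' != *p) && (n + 1 < out_size); p++)
  {
    int c = tolower ((unsigned char) *p);

    if ( (c < 'a') || (c > 'z') )
      continue;                       /* not an ASCII letter */
    if (NULL != strchr ("aeiou", c))
      continue;                       /* vowel */
    out[n++] = (char) c;
  }
  out[n] = '\0';
  qsort (out, n, sizeof (char), &cmp_char);
  return n;
}
```

Under these assumptions "GNUnet" and a misspelling such as
"tenung" both map to "gnnt", so a search for either keyword would
find content published under the other.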



SUMMARY:
* Features eliminated from util:
  - threading (goal: good riddance!)
  - complex logging features [ectx-passing, target-kinds] (goal: good riddance!)
  - complex configuration features [defaults, notifications] (goal: good riddance!)
  - network traffic monitors (goal: eliminate)
  - IPC semaphores (goal: d-bus? / eliminate?)
  - second timers
* New features in util:
  - scheduler
  - service and program boot-strap code
  - bandwidth and time APIs
  - buffered IO API
  - HKDF implementation (crypto)
  - load calculation API
  - bandwidth calculation API
* Major changes in util:
  - more expressive server (replaces selector)
  - DNS lookup replaced by async service