FIX:
* check exiv2 memory consumption on very large files;
  also investigate the 500 KB (!) allocation/leak in exiv2 on test/test.html
  (reported by valgrind)
* 500 KB leak for each load/unload of the exiv2 plugin (glibc?)
* ffmpeg needs make 3.81: add a configure check for it

Core:
* error reporting facilities
* add support for different character sets (to 'all' extractors)

'Unclean' code:
* ASF
* RPM

Incomplete code (missing features):
* RIFF (idx1 attribute)
* ID3v2.{3,4} (some attributes; make the testcases in test/id3v2/ work)
* StarOffice sdw (some attributes, see doc/)
* man pages (interpret sections for authors, brief description)
* pdf: full-text extraction!
* EXIV2

Desirable missing formats:
* mbox / various e-mail formats
* info pages (scan for 'Node: %s^?ID' - see end of .info files!)
* sources (Java, C, C++, see doxygen!)
* a.out (== ar?)
* rtf
* EXE
* APEv2 (MPC file format, www.personal.uni-jena.de/~pfk/mpp/sv8/apetag.html)
* PRC (Palm module, http://web.mit.edu/tytso/www/pilot/prc-format.html)
* KOffice
* TGA
* ODF (OpenDocument format)
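A minimal sketch of the lookup hinted at in the "info pages" entry, assuming
tag-table lines of the form "Node: <name>", a DEL byte (0x7f, displayed as ^?),
then the node's decimal byte offset. info_find_node is a hypothetical helper,
not an existing libextractor function; it only scans a buffer already in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find "Node: <node>\x7f<offset>" in an Info tag table.
   Returns 1 and stores the offset if the node is found, 0 otherwise.
   Assumes the offset is plain decimal digits; overflow is not checked. */
static int info_find_node(const char *buf, size_t len,
                          const char *node, long *offset)
{
  static const char prefix[] = "Node: ";
  const size_t plen = sizeof(prefix) - 1;
  const size_t nlen = strlen(node);
  size_t pos = 0;

  assert(buf != NULL || len == 0);
  while (pos < len) {
    const char *line = buf + pos;
    const char *nl = memchr(line, '\n', len - pos);
    const size_t line_len = nl ? (size_t)(nl - line) : len - pos;

    /* need "Node: " + name + DEL + at least one more byte on this line */
    if (line_len > plen + nlen + 1 &&
        memcmp(line, prefix, plen) == 0 &&
        memcmp(line + plen, node, nlen) == 0 &&
        line[plen + nlen] == 0x7f) {
      const char *p = line + plen + nlen + 1;
      const char *end = line + line_len;
      long off = 0;
      if (*p >= '0' && *p <= '9') {
        /* parse digits without reading past the end of the line */
        while (p < end && *p >= '0' && *p <= '9')
          off = off * 10 + (*p++ - '0');
        *offset = off;
        return 1;
      }
    }
    if (nl == NULL)
      break;
    pos += line_len + 1;
  }
  return 0;
}
```

Requiring the DEL byte right after the name keeps "Top" from matching a node
called "Topic".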

==============

UTF-8 conversion (only listing what is left to do):
* DVI: special headers are in what format? (rest is ASCII)
* SDW: needs to be done (need info about charsets)
* JPEG: presumably ASCII (or not specified)
* PS?
* WAV?
* ZIP?
* TAR?
* RIFF?
* MAN: presumably ASCII/UTF-8
* DEB: to be done
* ASF: ?
* HTML: to be done
* OLE2: done
* OO: to be done
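A minimal sketch of the per-extractor conversion step, assuming POSIX iconv(3)
is available and the source charset name is known (which is exactly what is
still open for DVI, SDW and the entries marked "?"). to_utf8 is a hypothetical
helper, not an existing libextractor function. The cast for iconv()'s input
argument assumes the glibc prototype (char **); some older iconv
implementations declare it as const char **.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert len bytes in charset `from` into a freshly
   malloc'd, NUL-terminated UTF-8 string. Returns NULL if the charset is
   unknown, the input is invalid or truncated, or memory runs out. */
static char *to_utf8(const char *in, size_t len, const char *from)
{
  iconv_t cd = iconv_open("UTF-8", from);
  if (cd == (iconv_t)-1)
    return NULL;

  assert(in != NULL || len == 0);
  size_t cap = len + 16;
  char *out = malloc(cap);
  char *inp = (char *)in;   /* iconv() does not modify the input */
  size_t inleft = len;
  size_t used = 0;

  while (out != NULL) {
    char *outp = out + used;
    size_t outleft = cap - used - 1;   /* reserve one byte for the NUL */
    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    used = (size_t)(outp - out);
    if (r != (size_t)-1)
      break;                           /* all input converted */
    if (errno != E2BIG) {              /* EILSEQ / EINVAL: bad input */
      free(out);
      out = NULL;
      break;
    }
    char *bigger = realloc(out, cap * 2);   /* output buffer full: grow */
    if (bigger == NULL) {
      free(out);
      out = NULL;
      break;
    }
    out = bigger;
    cap *= 2;
  }
  if (out != NULL)
    out[used] = '\0';
  iconv_close(cd);
  return out;
}
```

Growing the buffer on E2BIG avoids guessing a worst-case expansion ratio, which
differs between source charsets.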