Diffstat (limited to 'doc/libextractor.texi')
-rw-r--r-- | doc/libextractor.texi | 124 |
1 files changed, 49 insertions, 75 deletions
diff --git a/doc/libextractor.texi b/doc/libextractor.texi
index 52c6635..bec63c0 100644
--- a/doc/libextractor.texi
+++ b/doc/libextractor.texi
@@ -51,36 +51,10 @@ Free Documentation License''. | |||
51 | @insertcopying | 51 | @insertcopying |
52 | @end titlepage | 52 | @end titlepage |
53 | 53 | ||
54 | |||
55 | @summarycontents | 54 | @summarycontents |
56 | @contents | 55 | @contents |
57 | 56 | ||
58 | 57 | ||
59 | @macro gnu{} | ||
60 | @acronym{GNU} | ||
61 | @end macro | ||
62 | |||
63 | @macro gpl{} | ||
64 | @acronym{GPL} | ||
65 | @end macro | ||
66 | |||
67 | @macro api{} | ||
68 | @acronym{API} | ||
69 | @end macro | ||
70 | |||
71 | @macro cfunction{arg} | ||
72 | @code{\arg\()} | ||
73 | @end macro | ||
74 | |||
75 | @macro mynull{} | ||
76 | @code{NULL} | ||
77 | @end macro | ||
78 | |||
79 | @macro gnule{} | ||
80 | @acronym{GNU libextractor} | ||
81 | @end macro | ||
82 | |||
83 | |||
84 | @ifnottex | 58 | @ifnottex |
85 | @node Top | 59 | @node Top |
86 | @top The GNU libextractor Reference Manual | 60 | @top The GNU libextractor Reference Manual |
@@ -88,15 +62,15 @@ Free Documentation License''. | |||
88 | @end ifnottex | 62 | @end ifnottex |
89 | 63 | ||
90 | @menu | 64 | @menu |
91 | * Introduction:: What is @gnule{}. | 65 | * Introduction:: What is GNU libextractor. |
92 | * Preparation:: What you should do before using the library. | 66 | * Preparation:: What you should do before using the library. |
93 | * Generalities:: General library functions and data types. | 67 | * Generalities:: General library functions and data types. |
94 | * Extracting meta data:: How to use @gnule{} to obtain meta data. | 68 | * Extracting meta data:: How to use GNU libextractor to obtain meta data. |
95 | * Language bindings:: How to use @gnule{} from languages other than C. | 69 | * Language bindings:: How to use GNU libextractor from languages other than C. |
96 | * Utility functions:: Utility functions of @gnule{}. | 70 | * Utility functions:: Utility functions of GNU libextractor. |
97 | * Existing Plugins:: What plugins are available. | 71 | * Existing Plugins:: What plugins are available. |
98 | * Writing new Plugins:: How to write new plugins for @gnule{}. | 72 | * Writing new Plugins:: How to write new plugins for GNU libextractor. |
99 | * Internal utility functions:: Utility functions of @gnule{} for writing plugins. | 73 | * Internal utility functions:: Utility functions of GNU libextractor for writing plugins. |
100 | * Reporting bugs:: How to report bugs or request new features. | 74 | * Reporting bugs:: How to report bugs or request new features. |
101 | 75 | ||
102 | Appendices | 76 | Appendices |
@@ -120,7 +94,7 @@ Indices | |||
120 | @chapter Introduction | 94 | @chapter Introduction |
121 | 95 | ||
122 | @cindex error handling | 96 | @cindex error handling |
123 | @gnule{} is GNU's library for extracting meta data from | 97 | GNU libextractor is GNU's library for extracting meta data from |
124 | files. Meta data includes format information (such as mime type, | 98 | files. Meta data includes format information (such as mime type, |
125 | image dimensions, color depth, recording frequency), content | 99 | image dimensions, color depth, recording frequency), content |
126 | descriptions (such as document title or document description) and | 100 | descriptions (such as document title or document description) and |
@@ -128,38 +102,38 @@ copyright information (such as license, author and contributors). | |||
128 | Meta data extraction is an inherently uncertain business --- a parse | 102 | Meta data extraction is an inherently uncertain business --- a parse |
129 | error can be a corrupt file, an incompatibility in the file format | 103 | error can be a corrupt file, an incompatibility in the file format |
130 | version, an entirely different file format or a bug in the parser. As | 104 | version, an entirely different file format or a bug in the parser. As |
131 | a result of this uncertainty, @gnule{} deliberately | 105 | a result of this uncertainty, GNU libextractor deliberately |
132 | avoids ever reporting any errors. Unexpected file contents simply | 106 | avoids ever reporting any errors. Unexpected file contents simply |
133 | result in less or possibly no meta data being extracted. | 107 | result in less or possibly no meta data being extracted. |
134 | 108 | ||
135 | @cindex plugin | 109 | @cindex plugin |
136 | @gnule{} uses plugins to handle various file formats. | 110 | GNU libextractor uses plugins to handle various file formats. |
137 | Technically a plugin can support multiple file formats; however, most | 111 | Technically a plugin can support multiple file formats; however, most |
138 | plugins only support one particular format. By default, | 112 | plugins only support one particular format. By default, |
139 | @gnule{} will use all plugins that are available and found | 113 | GNU libextractor will use all plugins that are available and found |
140 | in the plugin installation directory. Applications can | 114 | in the plugin installation directory. Applications can |
141 | request the use of only specific plugins or the exclusion of | 115 | request the use of only specific plugins or the exclusion of |
142 | certain plugins. | 116 | certain plugins. |
143 | 117 | ||
144 | @gnule{} is distributed with the @command{extract} | 118 | GNU libextractor is distributed with the @command{extract} |
145 | command@footnote{Some distributions ship @command{extract} in a | 119 | command@footnote{Some distributions ship @command{extract} in a |
146 | separate package.} which is a command-line tool for extracting | 120 | separate package.} which is a command-line tool for extracting |
147 | meta data. @command{extract} is given a list of filenames and | 121 | meta data. @command{extract} is given a list of filenames and |
148 | prints the resulting meta data to the console. The @command{extract} | 122 | prints the resulting meta data to the console. The @command{extract} |
149 | source code also serves as an advanced example for how to use | 123 | source code also serves as an advanced example for how to use |
150 | @gnule{}. | 124 | GNU libextractor. |
151 | 125 | ||
152 | This manual focuses on providing documentation for writing software | 126 | This manual focuses on providing documentation for writing software |
153 | with @gnule{}. The only relevant parts for end-users | 127 | with GNU libextractor. The only relevant parts for end-users |
154 | are the chapter on compiling and installing @gnule{} | 128 | are the chapter on compiling and installing GNU libextractor |
155 | (@xref{Preparation}.). Also, the chapter on existing plugins may be of | 129 | (@xref{Preparation}.). Also, the chapter on existing plugins may be of |
156 | interest (@xref{Existing Plugins}.). Additional documentation for | 130 | interest (@xref{Existing Plugins}.). Additional documentation for |
157 | end-users can be found in the man page on @command{extract} (using | 131 | end-users can be found in the man page on @command{extract} (using |
158 | @verb{|man extract|}). | 132 | @verb{|man extract|}). |
159 | 133 | ||
160 | @cindex license | 134 | @cindex license |
161 | @gnule{} is licensed under the GNU General Public License, | 135 | GNU libextractor is licensed under the GNU General Public License, |
162 | specifically, since version 0.7, @gnule{} is licensed under GPLv3 | 136 | specifically, since version 0.7, GNU libextractor is licensed under GPLv3 |
163 | @emph{or any later version}. | 137 | @emph{or any later version}. |
164 | 138 | ||
165 | @node Preparation | 139 | @node Preparation |
@@ -170,12 +144,12 @@ should apply to all systems. Specific instructions for known problems | |||
170 | for particular platforms are then described in individual sections | 144 | for particular platforms are then described in individual sections |
171 | afterwards. | 145 | afterwards. |
172 | 146 | ||
173 | Compiling @gnule{} follows the standard GNU autotools build process | 147 | Compiling GNU libextractor follows the standard GNU autotools build process |
174 | using @command{configure} and @command{make}. For details on the GNU | 148 | using @command{configure} and @command{make}. For details on the GNU |
175 | autotools build process, read the @file{INSTALL} file and query | 149 | autotools build process, read the @file{INSTALL} file and query |
176 | @verb{|./configure --help|} for additional options. | 150 | @verb{|./configure --help|} for additional options. |
177 | 151 | ||
178 | @gnule{} has various dependencies, most of which are optional. | 152 | GNU libextractor has various dependencies, most of which are optional. |
179 | Instead of specifying the names of the software packages, we | 153 | Instead of specifying the names of the software packages, we |
180 | will give the list in terms of the names of the respective | 154 | will give the list in terms of the names of the respective |
181 | Debian (unstable) packages that should be installed. | 155 | Debian (unstable) packages that should be installed. |
@@ -241,29 +215,29 @@ Please notify us if we missed some dependencies (note that the list is | |||
241 | supposed to only list direct dependencies, not transitive | 215 | supposed to only list direct dependencies, not transitive |
242 | dependencies). | 216 | dependencies). |
243 | 217 | ||
244 | Once you have compiled and installed @gnule{}, you should have a file | 218 | Once you have compiled and installed GNU libextractor, you should have a file |
245 | @file{extractor.h} installed in your @file{include/} directory. This | 219 | @file{extractor.h} installed in your @file{include/} directory. This |
246 | file should be the starting point for your C and C++ development with | 220 | file should be the starting point for your C and C++ development with |
247 | @gnule{}. The build process also installs the @file{extract} binary and | 221 | GNU libextractor. The build process also installs the @file{extract} binary and |
248 | man pages for @file{extract} and @gnule{}. The @file{extract} man page | 222 | man pages for @file{extract} and GNU libextractor. The @file{extract} man page |
249 | documents the @file{extract} tool. The @gnule{} man page gives a brief | 223 | documents the @file{extract} tool. The GNU libextractor man page gives a brief |
250 | summary of the C API for @gnule{}. | 224 | summary of the C API for GNU libextractor. |
251 | 225 | ||
252 | @cindex packaging | 226 | @cindex packaging |
253 | @cindex directory structure | 227 | @cindex directory structure |
254 | @cindex plugin | 228 | @cindex plugin |
255 | @cindex environment variables | 229 | @cindex environment variables |
256 | @vindex LIBEXTRACTOR_PREFIX | 230 | @vindex LIBEXTRACTOR_PREFIX |
257 | When you install @gnule{}, various plugins will be | 231 | When you install GNU libextractor, various plugins will be |
258 | installed in the @file{lib/libextractor/} directory. The main library | 232 | installed in the @file{lib/libextractor/} directory. The main library |
259 | will be installed as @file{lib/libextractor.so}. Note that | 233 | will be installed as @file{lib/libextractor.so}. Note that |
260 | @gnule{} will attempt to find the plugins relative to the | 234 | GNU libextractor will attempt to find the plugins relative to the |
261 | path of the main library. Consequently, a package manager can move | 235 | path of the main library. Consequently, a package manager can move |
262 | the library and its plugins to a different location later --- as long | 236 | the library and its plugins to a different location later --- as long |
263 | as the relative path between the main library and the plugins is | 237 | as the relative path between the main library and the plugins is |
264 | preserved. As a method of last resort, the user can specify an | 238 | preserved. As a method of last resort, the user can specify an |
265 | environment variable @verb{|LIBEXTRACTOR_PREFIX|}. If | 239 | environment variable @verb{|LIBEXTRACTOR_PREFIX|}. If |
266 | @gnule{} cannot locate a plugin, it will look in | 240 | GNU libextractor cannot locate a plugin, it will look in |
267 | @verb{|LIBEXTRACTOR_PREFIX/lib/libextractor/|}. | 241 | @verb{|LIBEXTRACTOR_PREFIX/lib/libextractor/|}. |
268 | 242 | ||
269 | 243 | ||
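Note on the hunk above: the fallback lookup described there can be sketched in C as follows. This is only an illustration of the path layout named in the text (`LIBEXTRACTOR_PREFIX/lib/libextractor/`). The helper names `fallback_plugin_dir` and `fallback_plugin_dir_from_env` are hypothetical and are not part of the libextractor API, and the library's actual search code may differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libextractor): writes the fallback
   plugin directory "<prefix>/lib/libextractor/" into buf.
   Returns 0 on success, -1 if prefix is NULL or empty, or if buf is
   too small to hold the full path. */
int
fallback_plugin_dir (const char *prefix, char *buf, size_t buflen)
{
  int n;

  if ((NULL == prefix) || ('\0' == prefix[0]))
    return -1;
  n = snprintf (buf, buflen, "%s/lib/libextractor/", prefix);
  if ((n < 0) || ((size_t) n >= buflen))
    return -1; /* output error or truncated path */
  return 0;
}

/* Same as above, but takes the prefix from the environment variable
   named in the manual text. */
int
fallback_plugin_dir_from_env (char *buf, size_t buflen)
{
  return fallback_plugin_dir (getenv ("LIBEXTRACTOR_PREFIX"), buf, buflen);
}
```

A caller would try the location relative to the main library first and use this directory only as the last resort, matching the order given in the text.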
@@ -280,7 +254,7 @@ Should work using the standard instructions without problems. | |||
280 | @section Installation on OpenBSD | 254 | @section Installation on OpenBSD |
281 | 255 | ||
282 | OpenBSD 3.8 also doesn't have CODESET in @file{langinfo.h}. CODESET | 256 | OpenBSD 3.8 also doesn't have CODESET in @file{langinfo.h}. CODESET |
283 | is used in @gnule{} in about three places. This causes problems | 257 | is used in GNU libextractor in about three places. This causes problems |
284 | during compilation. | 258 | during compilation. |
285 | 259 | ||
286 | 260 | ||
@@ -477,9 +451,9 @@ comment - Testing keyword extraction | |||
477 | 451 | ||
478 | @section Introduction to the libextractor library | 452 | @section Introduction to the libextractor library |
479 | 453 | ||
480 | Each public symbol exported by @gnule{} has the prefix | 454 | Each public symbol exported by GNU libextractor has the prefix |
481 | @verb{|EXTRACTOR_|}. All-caps names are used for constants. For the | 455 | @verb{|EXTRACTOR_|}. All-caps names are used for constants. For the |
482 | impatient, the minimal C code for using @gnule{} (on the | 456 | impatient, the minimal C code for using GNU libextractor (on the |
483 | executing binary itself) looks like this: | 457 | executing binary itself) looks like this: |
484 | 458 | ||
485 | @verbatim | 459 | @verbatim |
@@ -499,7 +473,7 @@ main (int argc, char ** argv) | |||
499 | @end verbatim | 473 | @end verbatim |
500 | 474 | ||
501 | The minimal API illustrated by this example is actually sufficient for | 475 | The minimal API illustrated by this example is actually sufficient for |
502 | many applications. The full external C API of @gnule{} is described | 476 | many applications. The full external C API of GNU libextractor is described |
503 | in chapter @ref{Extracting meta data}. Bindings for other languages | 477 | in chapter @ref{Extracting meta data}. Bindings for other languages |
504 | are described in chapter @ref{Language bindings}. The API for | 478 | are described in chapter @ref{Language bindings}. The API for |
505 | writing new plugins is described in chapter @ref{Writing new Plugins}. | 479 | writing new plugins is described in chapter @ref{Writing new Plugins}. |
@@ -507,7 +481,7 @@ writing new plugins is described in chapter @xref{Writing new Plugins}. | |||
507 | @node Extracting meta data | 481 | @node Extracting meta data |
508 | @chapter Extracting meta data | 482 | @chapter Extracting meta data |
509 | 483 | ||
510 | In order to extract meta data with @gnule{} you first need to | 484 | In order to extract meta data with GNU libextractor you first need to |
511 | load the respective plugins and then call the extraction API | 485 | load the respective plugins and then call the extraction API |
512 | with the plugins and the data to process. This section | 486 | with the plugins and the data to process. This section |
513 | documents how to load and unload plugins, the various types | 487 | documents how to load and unload plugins, the various types |
@@ -531,8 +505,8 @@ and finally the extraction API itself. | |||
531 | @cindex thread-safety | 505 | @cindex thread-safety |
532 | @tindex enum EXTRACTOR_Options | 506 | @tindex enum EXTRACTOR_Options |
533 | 507 | ||
534 | Using @gnule{} from a multi-threaded parent process requires some | 508 | Using GNU libextractor from a multi-threaded parent process requires some |
535 | care. The problem is that on most platforms @gnule{} starts | 509 | care. The problem is that on most platforms GNU libextractor starts |
536 | sub-processes for the actual extraction work. This is useful to | 510 | sub-processes for the actual extraction work. This is useful to |
537 | isolate the parent process from potential bugs; however, it can cause | 511 | isolate the parent process from potential bugs; however, it can cause |
538 | problems if the parent process is multi-threaded. The issue is that | 512 | problems if the parent process is multi-threaded. The issue is that |
@@ -545,7 +519,7 @@ actually been observed with a lock in GNU gettext that is triggered by | |||
545 | the plugin startup code when it interacts with libltdl. | 519 | the plugin startup code when it interacts with libltdl. |
546 | 520 | ||
547 | The problem can be solved by loading the plugins using the | 521 | The problem can be solved by loading the plugins using the |
548 | @code{EXTRACTOR_OPTION_IN_PROCESS} option, which will run @gnule{} | 522 | @code{EXTRACTOR_OPTION_IN_PROCESS} option, which will run GNU libextractor |
549 | in-process and thus avoid the locking issue. In this case, all of the | 523 | in-process and thus avoid the locking issue. In this case, all of the |
550 | functions for loading and unloading plugins, including | 524 | functions for loading and unloading plugins, including |
551 | @verb{|EXTRACTOR_plugin_add_defaults|} and | 525 | @verb{|EXTRACTOR_plugin_add_defaults|} and |
@@ -583,19 +557,19 @@ Unloads a particular plugin. The given name should be the short name of the plu | |||
583 | @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add (struct EXTRACTOR_PluginList *plugins, const char* name,const char* options, enum EXTRACTOR_Options flags) | 557 | @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add (struct EXTRACTOR_PluginList *plugins, const char* name,const char* options, enum EXTRACTOR_Options flags) |
584 | @findex EXTRACTOR_plugin_add | 558 | @findex EXTRACTOR_plugin_add |
585 | 559 | ||
586 | Loads a particular plugin. The plugin is added to the existing list, which can be NULL. The second argument specifies the name of the plugin (e.g. ``ogg''). The third argument can be NULL and specifies plugin-specific options. Finally, the last argument specifies whether the plugin should be executed out-of-process (@code{EXTRACTOR_OPTION_DEFAULT_POLICY}) or not. | 560 | Loads a particular plugin. The plugin is added to the existing list, which can be @code{NULL}. The second argument specifies the name of the plugin (e.g. ``ogg''). The third argument can be @code{NULL} and specifies plugin-specific options. Finally, the last argument specifies whether the plugin should be executed out-of-process (@code{EXTRACTOR_OPTION_DEFAULT_POLICY}) or not. |
587 | @end deftypefun | 561 | @end deftypefun |
588 | 562 | ||
589 | @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add_config (struct EXTRACTOR_PluginList *plugins, const char* config, enum EXTRACTOR_Options flags) | 563 | @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add_config (struct EXTRACTOR_PluginList *plugins, const char* config, enum EXTRACTOR_Options flags) |
590 | @findex EXTRACTOR_plugin_add_config | 564 | @findex EXTRACTOR_plugin_add_config |
591 | 565 | ||
592 | Loads and unloads plugins based on a configuration string, modifying the existing list, which can be NULL. The string has the format ``[-]NAME(OPTIONS)@{:[-]NAME(OPTIONS)@}*''. Prefixing the plugin name with a ``-'' means that the plugin should be unloaded. | 566 | Loads and unloads plugins based on a configuration string, modifying the existing list, which can be @code{NULL}. The string has the format ``[-]NAME(OPTIONS)@{:[-]NAME(OPTIONS)@}*''. Prefixing the plugin name with a ``-'' means that the plugin should be unloaded. |
593 | @end deftypefun | 567 | @end deftypefun |
594 | 568 | ||
595 | @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add_defaults (enum EXTRACTOR_Options flags) | 569 | @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add_defaults (enum EXTRACTOR_Options flags) |
596 | @findex EXTRACTOR_plugin_add_defaults | 570 | @findex EXTRACTOR_plugin_add_defaults |
597 | 571 | ||
598 | Loads all of the plugins in the plugin directory. This function is what most @gnule{} applications should use to set up the plugins. | 572 | Loads all of the plugins in the plugin directory. This function is what most GNU libextractor applications should use to set up the plugins. |
599 | @end deftypefun | 573 | @end deftypefun |
600 | 574 | ||
601 | 575 | ||
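Note on the hunk above: the configuration string format ``[-]NAME(OPTIONS){:[-]NAME(OPTIONS)}*'' accepted by `EXTRACTOR_plugin_add_config` can be illustrated with a small standalone parser. This is a sketch, not libextractor's own parsing code. `struct config_entry` and `parse_config_entry` are hypothetical names, the buffer sizes are arbitrary, and treating the `(OPTIONS)` part as optional is an assumption made here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one entry of the configuration string. */
struct config_entry
{
  int unload;        /* 1 if the entry was prefixed with '-' */
  char name[64];     /* plugin short name, e.g. "ogg" */
  char options[128]; /* empty string if no "(...)" part was given */
};

/* Parses the entry starting at *pos and advances *pos past the entry
   and its ':' separator.  Returns 1 if an entry was parsed, 0 at the
   end of the string, -1 if the entry is malformed or too long. */
int
parse_config_entry (const char **pos, struct config_entry *e)
{
  const char *p = *pos;
  size_t len;

  if ('\0' == *p)
    return 0;
  e->unload = ('-' == *p);
  if (e->unload)
    p++;
  len = strcspn (p, "(:"); /* the name ends at '(' or ':' or the end */
  if ((0 == len) || (len >= sizeof (e->name)))
    return -1;
  memcpy (e->name, p, len);
  e->name[len] = '\0';
  p += len;
  e->options[0] = '\0';
  if ('(' == *p)
    {
      const char *close = strchr (p, ')');

      if (NULL == close)
        return -1; /* unterminated option list */
      len = (size_t) (close - p - 1);
      if (len >= sizeof (e->options))
        return -1;
      memcpy (e->options, p + 1, len);
      e->options[len] = '\0';
      p = close + 1;
    }
  if (':' == *p)
    p++;
  else if ('\0' != *p)
    return -1; /* garbage after the entry */
  *pos = p;
  return 1;
}
```

Calling `parse_config_entry` in a loop over a string such as `ogg:-mp3(foo=bar):pdf` yields three entries, the second of which is marked for unloading. The option text `foo=bar` is made up for the example; real options are plugin-specific.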
@@ -607,14 +581,14 @@ Loads all of the plugins in the plugin directory. This function is what most @g | |||
607 | @tindex enum EXTRACTOR_MetaType | 581 | @tindex enum EXTRACTOR_MetaType |
608 | @findex EXTRACTOR_metatype_get_max | 582 | @findex EXTRACTOR_metatype_get_max |
609 | 583 | ||
610 | @verb{|enum EXTRACTOR_MetaType|} is a C enum which defines a list of over 100 different types of meta data. The total number can differ between different @gnule{} releases; the maximum value for the current release can be obtained using the @verb{|EXTRACTOR_metatype_get_max|} function. All values in this enumeration are of the form @verb{|EXTRACTOR_METATYPE_XXX|}. | 584 | @verb{|enum EXTRACTOR_MetaType|} is a C enum which defines a list of over 100 different types of meta data. The total number can differ between different GNU libextractor releases; the maximum value for the current release can be obtained using the @verb{|EXTRACTOR_metatype_get_max|} function. All values in this enumeration are of the form @verb{|EXTRACTOR_METATYPE_XXX|}. |
611 | 585 | ||
612 | @deftypefun {const char *} EXTRACTOR_metatype_to_string (enum EXTRACTOR_MetaType type) | 586 | @deftypefun {const char *} EXTRACTOR_metatype_to_string (enum EXTRACTOR_MetaType type) |
613 | @findex EXTRACTOR_metatype_to_string | 587 | @findex EXTRACTOR_metatype_to_string |
614 | @cindex gettext | 588 | @cindex gettext |
615 | @cindex internationalization | 589 | @cindex internationalization |
616 | 590 | ||
617 | The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short English string @samp{s} describing the meta data type. The string can be translated into other languages using GNU gettext with the domain set to @gnule{} (@verb{|dgettext("libextractor", s)|}). | 591 | The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short English string @samp{s} describing the meta data type. The string can be translated into other languages using GNU gettext with the domain set to GNU libextractor (@verb{|dgettext("libextractor", s)|}). |
618 | @end deftypefun | 592 | @end deftypefun |
619 | 593 | ||
620 | @deftypefun {const char *} EXTRACTOR_metatype_to_description (enum EXTRACTOR_MetaType type) | 594 | @deftypefun {const char *} EXTRACTOR_metatype_to_description (enum EXTRACTOR_MetaType type) |
@@ -622,7 +596,7 @@ The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short | |||
622 | @cindex gettext | 596 | @cindex gettext |
623 | @cindex internationalization | 597 | @cindex internationalization |
624 | 598 | ||
625 | The function @verb{|EXTRACTOR_metatype_to_description|} can be used to obtain a longer English string @samp{s} describing the meta data type. The description may be empty if the short description returned by @code{EXTRACTOR_metatype_to_string} is already comprehensive. The string can be translated into other languages using GNU gettext with the domain set to @gnule{} (@verb{|dgettext("libextractor", s)|}). | 599 | The function @verb{|EXTRACTOR_metatype_to_description|} can be used to obtain a longer English string @samp{s} describing the meta data type. The description may be empty if the short description returned by @code{EXTRACTOR_metatype_to_string} is already comprehensive. The string can be translated into other languages using GNU gettext with the domain set to GNU libextractor (@verb{|dgettext("libextractor", s)|}). |
626 | @end deftypefun | 600 | @end deftypefun |
627 | 601 | ||
628 | 602 | ||
@@ -661,7 +635,7 @@ libextractor-type describing the meta data; | |||
661 | format information about data | 635 | format information about data |
662 | 636 | ||
663 | @item data_mime_type | 637 | @item data_mime_type |
664 | mime-type of data (not of the original file); can be NULL (if mime-type is not known); | 638 | mime-type of data (not of the original file); can be @code{NULL} (if mime-type is not known); |
665 | 639 | ||
666 | @item data | 640 | @item data |
667 | actual meta-data found | 641 | actual meta-data found |
@@ -683,11 +657,11 @@ Return 0 to continue extracting, 1 to abort. | |||
683 | @cindex threads | 657 | @cindex threads |
684 | @cindex thread-safety | 658 | @cindex thread-safety |
685 | 659 | ||
686 | This is the main function for extracting keywords with @gnule{}. The first argument is a plugin list which specifies the set of plugins that should be used for extracting meta data. The @samp{filename} argument is optional and can be used to specify the name of a file to process. If @samp{filename} is NULL, then the @samp{data} argument must point to the in-memory data to extract meta data from. If @samp{filename} is non-NULL, @samp{data} can be NULL. If @samp{data} is non-null, then @samp{size} is the size of @samp{data} in bytes. Otherwise @samp{size} should be zero. For each meta data item found, GNU libextractor will call the @samp{proc} function, passing @samp{proc_cls} as the first argument to @samp{proc}. The other arguments to @samp{proc} depend on the specific meta data found. | 660 | This is the main function for extracting keywords with GNU libextractor. The first argument is a plugin list which specifies the set of plugins that should be used for extracting meta data. The @samp{filename} argument is optional and can be used to specify the name of a file to process. If @samp{filename} is @code{NULL}, then the @samp{data} argument must point to the in-memory data to extract meta data from. If @samp{filename} is non-@code{NULL}, @samp{data} can be @code{NULL}. If @samp{data} is non-null, then @samp{size} is the size of @samp{data} in bytes. Otherwise @samp{size} should be zero. For each meta data item found, GNU libextractor will call the @samp{proc} function, passing @samp{proc_cls} as the first argument to @samp{proc}. The other arguments to @samp{proc} depend on the specific meta data found. |
687 | 661 | ||
688 | @cindex SIGBUS | 662 | @cindex SIGBUS |
689 | @cindex bus error | 663 | @cindex bus error |
690 | Meta data extraction should never really fail --- at worst, @gnule{} should not call @samp{proc} with any meta data. By design, @gnule{} should never crash or leak memory, even given corrupt files as input. Note however, that running @gnule{} on a corrupt file system (or incorrectly @verb{|mmap|}ed files) can result in the operating system sending a SIGBUS (bus error) to the process. While @gnule{} runs plugins out-of-process, it first maps the file into memory and then attempts to decompress it. During decompression it is possible to encounter a SIGBUS. @gnule{} will @emph{not} attempt to catch this signal and your application is likely to crash. Note again that this should only happen if the file @emph{system} is corrupt (not if individual files are corrupt). If this is not acceptable, you might want to consider running @gnule{} itself also out-of-process (as done, for example, by @url{http://grothoff.org/christian/doodle/,doodle}). | 664 | Meta data extraction should never really fail --- at worst, GNU libextractor should not call @samp{proc} with any meta data. By design, GNU libextractor should never crash or leak memory, even given corrupt files as input. Note however, that running GNU libextractor on a corrupt file system (or incorrectly @verb{|mmap|}ed files) can result in the operating system sending a SIGBUS (bus error) to the process. While GNU libextractor runs plugins out-of-process, it first maps the file into memory and then attempts to decompress it. During decompression it is possible to encounter a SIGBUS. GNU libextractor will @emph{not} attempt to catch this signal and your application is likely to crash. Note again that this should only happen if the file @emph{system} is corrupt (not if individual files are corrupt). If this is not acceptable, you might want to consider running GNU libextractor itself also out-of-process (as done, for example, by @url{http://grothoff.org/christian/doodle/,doodle}). |
691 | 665 | ||
692 | @end deftypefun | 666 | @end deftypefun |
693 | 667 | ||
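Note on the hunk above: the argument rules stated for the main extraction function (a file name or an in-memory buffer must be given, and the size must be zero when there is no buffer) can be written down as a small check. `extract_args_valid` is a hypothetical helper for callers; the text does not say that libextractor performs such a check itself.

```c
#include <stddef.h>

/* Hypothetical caller-side check of the documented argument rules.
   Returns 1 if the combination is acceptable, 0 otherwise. */
int
extract_args_valid (const char *filename, const void *data, size_t size)
{
  if ((NULL == filename) && (NULL == data))
    return 0; /* neither a file nor a buffer to extract from */
  if ((NULL == data) && (0 != size))
    return 0; /* size should be zero when no buffer is given */
  return 1;
}
```

An application could assert this before calling the extraction function, which makes mistakes such as passing a stale size with a NULL buffer visible early.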
@@ -701,7 +675,7 @@ Meta data extraction should never really fail --- at worst, @gnule{} should not | |||
701 | @cindex PHP | 675 | @cindex PHP |
702 | @cindex Ruby | 676 | @cindex Ruby |
703 | 677 | ||
704 | @gnule{} works immediately with C and C++ code. Bindings for Java, Mono, Ruby, Perl, PHP and Python are available for download from the main @gnule{} website. Documentation for these bindings (if available) is part of the downloads for the respective binding. In all cases, a full installation of the C library is required before the binding can be installed. | 678 | GNU libextractor works immediately with C and C++ code. Bindings for Java, Mono, Ruby, Perl, PHP and Python are available for download from the main GNU libextractor website. Documentation for these bindings (if available) is part of the downloads for the respective binding. In all cases, a full installation of the C library is required before the binding can be installed. |
705 | 679 | ||
706 | @section Java | 680 | @section Java |
707 | 681 | ||
@@ -763,7 +737,7 @@ This binding is undocumented at this point. | |||
763 | @cindex concurrency | 737 | @cindex concurrency |
764 | @cindex threads | 738 | @cindex threads |
765 | @cindex thread-safety | 739 | @cindex thread-safety |
766 | This chapter describes various utility functions for @gnule{} usage. All of the functions are reentrant. | 740 | This chapter describes various utility functions for GNU libextractor usage. All of the functions are reentrant. |
767 | 741 | ||
768 | @menu | 742 | @menu |
769 | * Utility Constants:: | 743 | * Utility Constants:: |
@@ -961,12 +935,12 @@ below. | |||
961 | @cindex UTF-8 | 935 | @cindex UTF-8 |
962 | @cindex character set | 936 | @cindex character set |
963 | @findex EXTRACTOR_common_convert_to_utf8 | 937 | @findex EXTRACTOR_common_convert_to_utf8 |
964 | Various @gnule{} plugins make use of the internal | 938 | Various GNU libextractor plugins make use of the internal |
965 | @file{convert.h} header which defines a function | 939 | @file{convert.h} header which defines a function |
966 | 940 | ||
967 | @verb{|EXTRACTOR_common_convert_to_utf8|} which can be used to easily convert text from | 941 | @verb{|EXTRACTOR_common_convert_to_utf8|} which can be used to easily convert text from |
968 | any character set to UTF-8. This conversion is important since the | 942 | any character set to UTF-8. This conversion is important since the |
969 | linked list of keywords that is returned by @gnule{} is | 943 | linked list of keywords that is returned by GNU libextractor is |
970 | expected to contain only UTF-8 strings. Naturally, proper conversion | 944 | expected to contain only UTF-8 strings. Naturally, proper conversion |
971 | may not always be possible since some file formats fail to specify the | 945 | may not always be possible since some file formats fail to specify the |
972 | character set. In that case, it is often better to not convert at | 946 | character set. In that case, it is often better to not convert at |
@@ -990,9 +964,9 @@ caller, so storing the string in the keyword list is acceptable. | |||
990 | @chapter Reporting bugs | 964 | @chapter Reporting bugs |
991 | 965 | ||
992 | @cindex bug | 966 | @cindex bug |
993 | @gnule{} uses the @url{https://gnunet.org/bugs/,Mantis bugtracking | 967 | GNU libextractor uses the @url{https://gnunet.org/bugs/,Mantis bugtracking |
994 | system}. If possible, please report bugs there. You can also e-mail | 968 | system}. If possible, please report bugs there. You can also e-mail |
995 | the @gnule{} mailing list at @url{libextractor@@gnu.org}. | 969 | the GNU libextractor mailing list at @url{libextractor@@gnu.org}. |
996 | 970 | ||
997 | 971 | ||
998 | 972 | ||