libextractor

GNU libextractor

commit 6f3d4312d80f80b5d1c4ee65245f1eb3120e9200
parent fe70eca489fdf6b184b88c5f4af26e6ed95c3314
Author: Christian Grothoff <christian@grothoff.org>
Date:   Sat,  8 Sep 2012 08:41:03 +0000

removing macros

Diffstat:
M doc/libextractor.texi | 124 +++++++++++++++++++++++++++++++------------------------------------------------
1 file changed, 49 insertions(+), 75 deletions(-)

diff --git a/doc/libextractor.texi b/doc/libextractor.texi
@@ -51,36 +51,10 @@ Free Documentation License''.
 @insertcopying
 @end titlepage

-
 @summarycontents
 @contents

-@macro gnu{}
-@acronym{GNU}
-@end macro
-
-@macro gpl{}
-@acronym{GPL}
-@end macro
-
-@macro api{}
-@acronym{API}
-@end macro
-
-@macro cfunction{arg}
-@code{\arg\()}
-@end macro
-
-@macro mynull{}
-@code{NULL}
-@end macro
-
-@macro gnule{}
-@acronym{GNU libextractor}
-@end macro
-
-
 @ifnottex
 @node Top
 @top The GNU libextractor Reference Manual
@@ -88,15 +62,15 @@ Free Documentation License''.
 @end ifnottex

 @menu
-* Introduction:: What is @gnule{}.
+* Introduction:: What is GNU libextractor.
 * Preparation:: What you should do before using the library.
 * Generalities:: General library functions and data types.
-* Extracting meta data:: How to use @gnule{} to obtain meta data.
-* Language bindings:: How to use @gnule{} from languages other than C.
-* Utility functions:: Utility functions of @gnule{}.
+* Extracting meta data:: How to use GNU libextractor to obtain meta data.
+* Language bindings:: How to use GNU libextractor from languages other than C.
+* Utility functions:: Utility functions of GNU libextractor.
 * Existing Plugins:: What plugins are available.
-* Writing new Plugins:: How to write new plugins for @gnule{}.
-* Internal utility functions:: Utility functions of @gnule{} for writing plugins.
+* Writing new Plugins:: How to write new plugins for GNU libextractor.
+* Internal utility functions:: Utility functions of GNU libextractor for writing plugins.
 * Reporting bugs:: How to report bugs or request new features.

 Appendices
@@ -120,7 +94,7 @@ Indices
 @chapter Introduction

 @cindex error handling
-@gnule{} is GNU's library for extracting meta data from
+GNU libextractor is GNU's library for extracting meta data from
 files. Meta data includes format information (such as mime type, image
 dimensions, color depth, recording frequency), content descriptions
 (such as document title or document description) and
@@ -128,38 +102,38 @@ copyright information (such as license, author and contributors).
 Meta data extraction is an inherently uncertain business --- a parse
 error can be a corrupt file, an incompatibility in the file format
 version, an entirely different file format or a bug in the parser. As
-a result of this uncertainty, @gnule{} deliberately
+a result of this uncertainty, GNU libextractor deliberately
 avoids to ever report any errors. Unexpected file contents simply
 result in less or possibly no meta data being extracted.

 @cindex plugin
-@gnule{} uses plugins to handle various file formats.
+GNU libextractor uses plugins to handle various file formats.
 Technically a plugin can support multiple file formats; however, most
 plugins only support one particular format. By default,
-@gnule{} will use all plugins that are available and found
+GNU libextractor will use all plugins that are available and found
 in the plugin installation directory. Applications can request the use
 of only specific plugins or the exclusion of certain plugins.

-@gnule{} is distributed with the @command{extract}
+GNU libextractor is distributed with the @command{extract}
 command@footnote{Some distributions ship @command{extract} in a
 seperate package.} which is a command-line tool for extracting meta
 data. @command{extract} is given a list of filenames and prints the
 resulting meta data to the console. The @command{extract} source code
 also serves as an advanced example for how to use
-@gnule{}.
+GNU libextractor.

 This manual focuses on providing documentation for writing software
-with @gnule{}. The only relevant parts for end-users
-are the chapter on compiling and installing @gnule{}
+with GNU libextractor. The only relevant parts for end-users
+are the chapter on compiling and installing GNU libextractor
 (@xref{Preparation}.). Also, the chapter on existing plugins maybe of
 interest (@xref{Existing Plugins}.). Additional documentation for
 end-users can be find in the man page on @command{extract} (using
 @verb{|man extract|}).

 @cindex license
-@gnule{} is licensed under the GNU General Public License,
-specifically, since version 0.7, @gnule{} is licensed under GPLv3
+GNU libextractor is licensed under the GNU General Public License,
+specifically, since version 0.7, GNU libextractor is licensed under GPLv3
 @emph{or any later version}.

 @node Preparation
@@ -170,12 +144,12 @@ should apply to all systems. Specific instructions for known problems
 for particular platforms are then described in individual sections
 afterwards.

-Compiling @gnule{} follows the standard GNU autotools build process
+Compiling GNU libextractor follows the standard GNU autotools build process
 using @command{configure} and @command{make}. For details on the GNU
 autotools build process, read the @file{INSTALL} file and query
 @verb{|./configure --help|} for additional options.

-@gnule{} has various dependencies, most of which are optional.
+GNU libextractor has various dependencies, most of which are optional.
 Instead of specifying the names of the software packages, we will give
 the list in terms of the names of the respective Debian (unstable)
 packages that should be installed.
@@ -241,29 +215,29 @@ Please notify us if we missed some dependencies (note that the list is
 supposed to only list direct dependencies, not transitive
 dependencies).

-Once you have compiled and installed @gnule{}, you should have a file
+Once you have compiled and installed GNU libextractor, you should have a file
 @file{extractor.h} installed in your @file{include/} directory. This
 file should be the starting point for your C and C++ development with
-@gnule{}. The build process also installs the @file{extract} binary and
-man pages for @file{extract} and @gnule{}. The @file{extract} man page
-documents the @file{extract} tool. The @gnule{} man page gives a brief
-summary of the C API for @gnule{}.
+GNU libextractor. The build process also installs the @file{extract} binary and
+man pages for @file{extract} and GNU libextractor. The @file{extract} man page
+documents the @file{extract} tool. The GNU libextractor man page gives a brief
+summary of the C API for GNU libextractor.

 @cindex packageing
 @cindex directory structure
 @cindex plugin
 @cindex environment variables
 @vindex LIBEXTRACTOR_PREFIX
-When you install @gnule{}, various plugins will be
+When you install GNU libextractor, various plugins will be
 installed in the @file{lib/libextractor/} directory. The main library
 will be installed as @file{lib/libextractor.so}. Note that
-@gnule{} will attempt to find the plugins relative to the
+GNU libextractor will attempt to find the plugins relative to the
 path of the main library. Consequently, a package manager can move
 the library and its plugins to a different location later --- as long
 as the relative path between the main library and the plugins is
 preserved. As a method of last resort, the user can specify an
 environment variable @verb{|LIBEXTRACTOR_PREFIX|}. If
-@gnule{} cannot locate a plugin, it will look in
+GNU libextractor cannot locate a plugin, it will look in
 @verb{|LIBEXTRACTOR_PREFIX/lib/libextractor/|}.


@@ -280,7 +254,7 @@ Should work using the standard instructions without problems.
 @section Installation on OpenBSD

 OpenBSD 3.8 also doesn't have CODESET in @file{langinfo.h}. CODESET
-is used in @gnule{} in about three places. This causes problems
+is used in GNU libextractor in about three places. This causes problems
 during compilation.


@@ -477,9 +451,9 @@ comment - Testing keyword extraction

 @section Introduction to the libextractor library

-Each public symbol exported by @gnule{} has the prefix
+Each public symbol exported by GNU libextractor has the prefix
 @verb{|EXTRACTOR_|}. All-caps names are used for constants. For the
-impatient, the minimal C code for using @gnule{} (on the
+impatient, the minimal C code for using GNU libextractor (on the
 executing binary itself) looks like this:

 @verbatim
@@ -499,7 +473,7 @@ main (int argc, char ** argv)
 @end verbatim

 The minimal API illustrated by this example is actually sufficient for
-many applications. The full external C API of @gnule{} is described
+many applications. The full external C API of GNU libextractor is described
 in chapter @xref{Extracting meta data}. Bindings for other languages
 are described in chapter @xref{Language bindings}. The API for
 writing new plugins is described in chapter @xref{Writing new Plugins}.
@@ -507,7 +481,7 @@ writing new plugins is described in chapter @xref{Writing new Plugins}.
 @node Extracting meta data
 @chapter Extracting meta data

-In order to extract meta data with @gnule{} you first need to
+In order to extract meta data with GNU libextractor you first need to
 load the respective plugins and then call the extraction API
 with the plugins and the data to process. This section
 documents how to load and unload plugins, the various types
@@ -531,8 +505,8 @@ and finally the extraction API itself.
 @cindex thread-safety
 @tindex enum EXTRACTOR_Options

-Using @gnule{} from a multi-threaded parent process requires some
-care. The problem is that on most platforms @gnule{} starts
+Using GNU libextractor from a multi-threaded parent process requires some
+care. The problem is that on most platforms GNU libextractor starts
 sub-processes for the actual extraction work. This is useful to
 isolate the parent process from potential bugs; however, it can cause
 problems if the parent process is multi-threaded. The issue is that
@@ -545,7 +519,7 @@ actually been observed with a lock in GNU gettext that is triggered by
 the plugin startup code when it interacts with libltdl.

 The problem can be solved by loading the plugins using the
-@code{EXTRACTOR_OPTION_IN_PROCESS} option, which will run @gnule{}
+@code{EXTRACTOR_OPTION_IN_PROCESS} option, which will run GNU libextractor
 in-process and thus avoid the locking issue. In this case, all of the
 functions for loading and unloading plugins, including
 @verb{|EXTRACTOR_plugin_add_defaults|} and
@@ -583,19 +557,19 @@ Unloads a particular plugin. The given name should be the short name of the plu

 @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add (struct EXTRACTOR_PluginList *plugins, const char* name,const char* options, enum EXTRACTOR_Options flags)
 @findex EXTRACTOR_plugin_add

-Loads a particular plugin. The plugin is added to the existing list, which can be NULL. The second argument specifies the name of the plugin (i.e. ``ogg''). The third argument can be NULL and specifies plugin-specific options. Finally, the last argument specifies if the plugin should be executed out-of-process (@code{EXTRACTOR_OPTION_DEFAULT_POLICY}) or not.
+Loads a particular plugin. The plugin is added to the existing list, which can be @code{NULL}. The second argument specifies the name of the plugin (i.e. ``ogg''). The third argument can be @code{NULL} and specifies plugin-specific options. Finally, the last argument specifies if the plugin should be executed out-of-process (@code{EXTRACTOR_OPTION_DEFAULT_POLICY}) or not.
 @end deftypefun

 @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add_config (struct EXTRACTOR_PluginList *plugins, const char* config, enum EXTRACTOR_Options flags)
 @findex EXTRACTOR_plugin_add_config

-Loads and unloads plugins based on a configuration string, modifying the existing list, which can be NULL. The string has the format ``[-]NAME(OPTIONS)@{:[-]NAME(OPTIONS)@}*''. Prefixing the plugin name with a ``-'' means that the plugin should be unloaded.
+Loads and unloads plugins based on a configuration string, modifying the existing list, which can be @code{NULL}. The string has the format ``[-]NAME(OPTIONS)@{:[-]NAME(OPTIONS)@}*''. Prefixing the plugin name with a ``-'' means that the plugin should be unloaded.
 @end deftypefun

 @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add_defaults (enum EXTRACTOR_Options flags)
 @findex EXTRACTOR_plugin_add_defaults

-Loads all of the plugins in the plugin directory. This function is what most @gnule{} applications should use to setup the plugins.
+Loads all of the plugins in the plugin directory. This function is what most GNU libextractor applications should use to setup the plugins.
 @end deftypefun


@@ -607,14 +581,14 @@ Loads all of the plugins in the plugin directory. This function is what most @g
 @tindex enum EXTRACTOR_MetaType
 @findex EXTRACTOR_metatype_get_max

-@verb{|enum EXTRACTOR_MetaType|} is a C enum which defines a list of over 100 different types of meta data. The total number can differ between different @gnule{} releases; the maximum value for the current release can be obtained using the @verb{|EXTRACTOR_metatype_get_max|} function. All values in this enumeration are of the form @verb{|EXTRACTOR_METATYPE_XXX|}.
+@verb{|enum EXTRACTOR_MetaType|} is a C enum which defines a list of over 100 different types of meta data. The total number can differ between different GNU libextractor releases; the maximum value for the current release can be obtained using the @verb{|EXTRACTOR_metatype_get_max|} function. All values in this enumeration are of the form @verb{|EXTRACTOR_METATYPE_XXX|}.

 @deftypefun {const char *} EXTRACTOR_metatype_to_string (enum EXTRACTOR_MetaType type)
 @findex EXTRACTOR_metatype_to_string
 @cindex gettext
 @cindex internationalization

-The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short English string @samp{s} describing the meta data type. The string can be translated into other languages using GNU gettext with the domain set to @gnule{} (@verb{|dgettext("libextractor", s)|}).
+The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short English string @samp{s} describing the meta data type. The string can be translated into other languages using GNU gettext with the domain set to GNU libextractor (@verb{|dgettext("libextractor", s)|}).
 @end deftypefun

 @deftypefun {const char *} EXTRACTOR_metatype_to_description (enum EXTRACTOR_MetaType type)
@@ -622,7 +596,7 @@ The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short
 @cindex gettext
 @cindex internationalization

-The function @verb{|EXTRACTOR_metatype_to_description|} can be used to obtain a longer English string @samp{s} describing the meta data type. The description may be empty if the short description returned by @code{EXTRACTOR_metatype_to_string} is already comprehensive. The string can be translated into other languages using GNU gettext with the domain set to @gnule{} (@verb{|dgettext("libextractor", s)|}).
+The function @verb{|EXTRACTOR_metatype_to_description|} can be used to obtain a longer English string @samp{s} describing the meta data type. The description may be empty if the short description returned by @code{EXTRACTOR_metatype_to_string} is already comprehensive. The string can be translated into other languages using GNU gettext with the domain set to GNU libextractor (@verb{|dgettext("libextractor", s)|}).
 @end deftypefun


@@ -661,7 +635,7 @@ libextractor-type describing the meta data;
 format information about data

 @item data_mime_type
-mime-type of data (not of the original file); can be NULL (if mime-type is not known);
+mime-type of data (not of the original file); can be @code{NULL} (if mime-type is not known);

 @item data
 actual meta-data found
@@ -683,11 +657,11 @@ Return 0 to continue extracting, 1 to abort.
 @cindex threads
 @cindex thread-safety

-This is the main function for extracting keywords with @gnule{}. The first argument is a plugin list which specifies the set of plugins that should be used for extracting meta data. The @samp{filename} argument is optional and can be used to specify the name of a file to process. If @samp{filename} is NULL, then the @samp{data} argument must point to the in-memory data to extract meta data from. If @samp{filename} is non-NULL, @samp{data} can be NULL. If @samp{data} is non-null, then @samp{size} is the size of @samp{data} in bytes. Otherwise @samp{size} should be zero. For each meta data item found, GNU libextractor will call the @samp{proc} function, passing @samp{proc_cls} as the first argument to @samp{proc}. The other arguments to @samp{proc} depend on the specific meta data found.
+This is the main function for extracting keywords with GNU libextractor. The first argument is a plugin list which specifies the set of plugins that should be used for extracting meta data. The @samp{filename} argument is optional and can be used to specify the name of a file to process. If @samp{filename} is @code{NULL}, then the @samp{data} argument must point to the in-memory data to extract meta data from. If @samp{filename} is non-@code{NULL}, @samp{data} can be @code{NULL}. If @samp{data} is non-null, then @samp{size} is the size of @samp{data} in bytes. Otherwise @samp{size} should be zero. For each meta data item found, GNU libextractor will call the @samp{proc} function, passing @samp{proc_cls} as the first argument to @samp{proc}. The other arguments to @samp{proc} depend on the specific meta data found.

 @cindex SIGBUS
 @cindex bus error
-Meta data extraction should never really fail --- at worst, @gnule{} should not call @samp{proc} with any meta data. By design, @gnule{} should never crash or leak memory, even given corrupt files as input. Note however, that running @gnule{} on a corrupt file system (or incorrectly @verb{|mmap|}ed files) can result in the operating system sending a SIGBUS (bus error) to the process. While @gnule{} runs plugins out-of-process, it first maps the file into memory and then attempts to decompress it. During decompression it is possible to encounter a SIGBUS. @gnule{} will @emph{not} attempt to catch this signal and your application is likely to crash. Note again that this should only happen if the file @emph{system} is corrupt (not if individual files are corrupt). If this is not acceptable, you might want to consider running @gnule{} itself also out-of-process (as done, for example, by @url{http://grothoff.org/christian/doodle/,doodle}).
+Meta data extraction should never really fail --- at worst, GNU libextractor should not call @samp{proc} with any meta data. By design, GNU libextractor should never crash or leak memory, even given corrupt files as input. Note however, that running GNU libextractor on a corrupt file system (or incorrectly @verb{|mmap|}ed files) can result in the operating system sending a SIGBUS (bus error) to the process. While GNU libextractor runs plugins out-of-process, it first maps the file into memory and then attempts to decompress it. During decompression it is possible to encounter a SIGBUS. GNU libextractor will @emph{not} attempt to catch this signal and your application is likely to crash. Note again that this should only happen if the file @emph{system} is corrupt (not if individual files are corrupt). If this is not acceptable, you might want to consider running GNU libextractor itself also out-of-process (as done, for example, by @url{http://grothoff.org/christian/doodle/,doodle}).
 @end deftypefun


@@ -701,7 +675,7 @@ Meta data extraction should never really fail --- at worst, @gnule{} should not
 @cindex PHP
 @cindex Ruby

-@gnule{} works immediately with C and C++ code. Bindings for Java, Mono, Ruby, Perl, PHP and Python are available for download from the main @gnule{} website. Documentation for these bindings (if available) is part of the downloads for the respective binding. In all cases, a full installation of the C library is required before the binding can be installed.
+GNU libextractor works immediately with C and C++ code. Bindings for Java, Mono, Ruby, Perl, PHP and Python are available for download from the main GNU libextractor website. Documentation for these bindings (if available) is part of the downloads for the respective binding. In all cases, a full installation of the C library is required before the binding can be installed.


 @section Java
@@ -763,7 +737,7 @@ This binding is undocumented at this point.
 @cindex concurrency
 @cindex threads
 @cindex thread-safety
-This chapter describes various utility functions for @gnule{} usage. All of the functions are reentrant.
+This chapter describes various utility functions for GNU libextractor usage. All of the functions are reentrant.

 @menu
 * Utility Constants::
@@ -961,12 +935,12 @@ below.
 @cindex UTF-8
 @cindex character set
 @findex EXTRACTOR_common_convert_to_utf8
-Various @gnule{} plugins make use of the internal
+Various GNU libextractor plugins make use of the internal
 @file{convert.h} header which defines a function
 @verb{|EXTRACTOR_common_convert_to_utf8|} which can be used to easily
 convert text from any character set to UTF-8. This conversion is
 important since the
-linked list of keywords that is returned by @gnule{} is
+linked list of keywords that is returned by GNU libextractor is
 expected to contain only UTF-8 strings. Naturally, proper conversion
 may not always be possible since some file formats fail to specify the
 character set. In that case, it is often better to not convert at
@@ -990,9 +964,9 @@ caller, so storing the string in the keyword list is acceptable.
 @chapter Reporting bugs

 @cindex bug
-@gnule{} uses the @url{https://gnunet.org/bugs/,Mantis bugtracking
+GNU libextractor uses the @url{https://gnunet.org/bugs/,Mantis bugtracking
 system}. If possible, please report bugs there. You can also e-mail
-the @gnule{} mailinglist at @url{libextractor@@gnu.org}.
+the GNU libextractor mailinglist at @url{libextractor@@gnu.org}.