libextractor

GNU libextractor

commit 40b9c39604e1d2d9db792940500aa48f933d5588
parent 8372891411f4e97914386b4626f1dcdb5ec167e8
Author: Christian Grothoff <christian@grothoff.org>
Date:   Wed, 13 Jan 2010 13:42:34 +0000

adding support for tail extraction, documenting, using it for ID3v1

Diffstat:
Mdoc/extractor.texi | 212+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------
Mdoc/version.texi | 2+-
Msrc/main/extractor.c | 361++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
Msrc/plugins/Makefile.am | 8++++++++
Asrc/plugins/id3_extractor.c | 305++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Msrc/plugins/mp3_extractor.c | 275++-----------------------------------------------------------------------------
6 files changed, 753 insertions(+), 410 deletions(-)
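
The tail extraction added by this commit can be sketched outside the library as follows. This is a minimal illustration, not the library's code: the helper names `tail_offset` and `has_id3v1_tag` are hypothetical, and the read limit is passed in as a parameter rather than taken from the library's `MAX_READ`. The offset formula mirrors the one used in the `EXTRACTOR_extract` hunk below: the tail mapping starts at the first page boundary after `file_size - max_read`, so it ends exactly at the last byte of the file and is slightly shorter than the limit. The ID3v1 check relies on the tag format itself, which places a 128-byte record starting with "TAG" at the very end of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: page-aligned start offset of the tail window.
   Returns -1 when no separate tail mapping is needed or possible. */
static long long
tail_offset (long long file_size, long long max_read, long long page_size)
{
  long long offset;

  if ( (page_size <= 0) || (page_size >= max_read) )
    return -1;
  if (file_size <= max_read)
    return -1; /* the head window already covers the whole file */
  /* round up to the next page boundary after (file_size - max_read) */
  offset = (1 + (file_size - max_read) / page_size) * page_size;
  if (offset >= file_size)
    return -1;
  return offset;
}

/* Hypothetical helper: does the tail buffer end in an ID3v1 tag?
   The last byte of 'tail' must be the last byte of the file. */
static int
has_id3v1_tag (const unsigned char *tail, size_t tail_size)
{
  assert ( (tail != NULL) || (tail_size == 0) );
  if (tail_size < 128)
    return 0;
  return 0 == memcmp (tail + tail_size - 128, "TAG", 3);
}
```

A plugin that requests `want-tail` would apply a check like `has_id3v1_tag` to the `data` and `size` arguments it receives, since the diff guarantees that the last byte of `data` is the last byte of the file.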

diff --git a/doc/extractor.texi b/doc/extractor.texi @@ -10,8 +10,10 @@ @c %**end of header @copying This manual is for GNU libextractor -(version @value{VERSION}, @value{UPDATED}), -which is GNU's library for meta data extraction. +(version @value{VERSION}, @value{UPDATED}). + +GNU libextractor is a GNU package. + Copyright @copyright{} 2007, 2010 Christian Grothoff @@ -73,7 +75,7 @@ Free Documentation License". @code{NULL} @end macro -@macro le{} +@macro gnule{} @acronym{GNU libextractor} @end macro @@ -84,24 +86,22 @@ Free Documentation License". @insertcopying @end ifnottex -GNU libextractor is a GNU package. - @menu -* Introduction:: What is @le{}. +* Introduction:: What is @gnule{}. * Preparation:: What you should do before using the library. * Generalities:: General library functions and data types. -* Extracting meta data:: How to use @le{} to obtain meta data. -* Language bindings:: How to use @le{} from languages other than C. -* Utility functions:: Utility functions of @le{}. +* Extracting meta data:: How to use @gnule{} to obtain meta data. +* Language bindings:: How to use @gnule{} from languages other than C. +* Utility functions:: Utility functions of @gnule{}. * Existing Plugins:: What plugins are available. -* Writing new Plugins:: How to write new plugins for @le{}. -* Internal utility functions:: Utility functions of @le{} for writing plugins. +* Writing new Plugins:: How to write new plugins for @gnule{}. +* Internal utility functions:: Utility functions of @gnule{} for writing plugins. * Reporting bugs:: How to report bugs or request new features. Appendices * Copying:: The GNU General Public License says how you - can copy and share some parts of @le{}. + can copy and share some parts of @gnule{}. Indices @@ -120,7 +120,7 @@ Indices @chapter Introduction @cindex error handling -@le{} is GNU's library for extracting meta data from +@gnule{} is GNU's library for extracting meta data from files. 
Meta data includes format information (such as mime type, image dimensions, color depth, recording frequency), content descriptions (such as document title or document description) and @@ -128,55 +128,55 @@ copyright information (such as license, author and contributors). Meta data extraction is an inherently uncertain business --- a parse error can be a corrupt file, an incompatibility in the file format version, an entirely different file format or a bug in the parser. As -a result of this uncertainty, @le{} deliberately +a result of this uncertainty, @gnule{} deliberately avoids to ever report any errors. Unexpected file contents simply result in less or possibly no meta data being extracted. @cindex plugin -@le{} uses plugins to handle various file formats. +@gnule{} uses plugins to handle various file formats. Technically a plugin can support multiple file formats; however, most plugins only support one particular format. By default, -@le{} will use all plugins that are available and found +@gnule{} will use all plugins that are available and found in the plugin installation directory. Applications can request the use of only specific plugins or the exclusion of certain plugins. -@le{} is distributed with the @command{extract} +@gnule{} is distributed with the @command{extract} command@footnote{Some distributions ship @command{extract} in a seperate package.} which is a command-line tool for extracting meta data. @command{extract} is given a list of filenames and prints the resulting meta data to the console. The @command{extract} source code also serves as an advanced example for how to use -@le{}. +@gnule{}. This manual focuses on providing documentation for writing software -with @le{}. The only relevant parts for end-users -are the chapter on compiling and installing @le{} +with @gnule{}. The only relevant parts for end-users +are the chapter on compiling and installing @gnule{} (@xref{Preparation}.). 
Also, the chapter on existing plugins maybe of interest (@xref{Existing Plugins}.). Additional documentation for end-users can be find in the man page on @command{extract} (using @verb{|man extract|}). @cindex license -@le{} is licensed under the GNU General Public License. The +@gnule{} is licensed under the GNU General Public License. The developers have frequently received requests to license GNU -libextractor under alternative terms. However, @le{} +libextractor under alternative terms. However, @gnule{} borrows plenty of GPL-licensed code from various other projects. Hence we cannot change the license (even if we wanted to).@footnote{It maybe possible to switch to GPLv3 in the future. For this, an audit of the license status of our dependencies would be required. The new -code that was developed specifically for @le{} has +code that was developed specifically for @gnule{} has always been licensed under GPLv2 @emph{or any later version}.} @node Preparation @chapter Preparation -Compiling @le{} follows the standard GNU autotools +Compiling @gnule{} follows the standard GNU autotools build process using @command{configure} and @command{make}. For details, read the @file{INSTALL} file and query @verb{|./configure --help|} for additional options. -@le{} has various dependencies, some of which are optional. +@gnule{} has various dependencies, some of which are optional. Instead of specifying the names of the software packages, we will give the list in terms of the names of the respective Debian (unstable) packages that should be installed. @@ -246,29 +246,29 @@ Please notify us if we missed some dependencies (note that the list is supposed to only list direct dependencies, not transitive dependencies). -Once you have compiled and installed @le{}, you should have a file +Once you have compiled and installed @gnule{}, you should have a file @file{extractor.h} installed in your @file{include/} directory. 
This file should be the starting point for your C and C++ development with -@le{}. The build process also installs the @file{extract} binary and -man pages for @file{extract} and @le{}. The @file{extract} man page -documents the @file{extract} tool. The @le{} man page gives a brief -summary of the C API for @le{}. +@gnule{}. The build process also installs the @file{extract} binary and +man pages for @file{extract} and @gnule{}. The @file{extract} man page +documents the @file{extract} tool. The @gnule{} man page gives a brief +summary of the C API for @gnule{}. @cindex packageing @cindex directory structure @cindex plugin @cindex environment variables @vindex LIBEXTRACTOR_PREFIX -When you install @le{}, various plugins will be +When you install @gnule{}, various plugins will be installed in the @file{lib/libextractor/} directory. The main library will be installed as @file{lib/libextractor.so}. Note that -@le{} will attempt to find the plugins relative to the +@gnule{} will attempt to find the plugins relative to the path of the main library. Consequently, a package manager can move the library and its plugins to a different location later --- as long as the relative path between the main library and the plugins is preserved. As a method of last resort, the user can specify an environment variable @verb{|LIBEXTRACTOR_PREFIX|}. If -@le{} cannot locate a plugin, it will look in +@gnule{} cannot locate a plugin, it will look in @verb{|LIBEXTRACTOR_PREFIX/lib/libextractor/|}. @section Note to package maintainers @@ -304,9 +304,9 @@ resources. @node Generalities @chapter Generalities -Each public symbol exported by @le{} has the prefix +Each public symbol exported by @gnule{} has the prefix @verb{|EXTRACTOR_|}. All-caps names are used for constants. 
For the -impatient, the minimal C code for using @le{} (on the +impatient, the minimal C code for using @gnule{} (on the executing binary itself) looks like this: @verbatim @@ -326,6 +326,13 @@ int main(int argc, char ** argv) { @node Extracting meta data @chapter Extracting meta data +In order to extract meta data with @gnule{} you first need to +load the respective plugins and then call the extraction API +with the plugins and the data to process. This section +documents how to load and unload plugins, the various types +and formats in which meta data is returned to the application +and finally the extraction API itself. + @menu * Plugin management:: How to load and unload plugins * Meta types:: About meta types @@ -350,7 +357,7 @@ from multiple threads at the same time is not safe. Creating multiple plugin lists and using them concurrently is supported as long as the @code{EXTRACTOR_OPTION_IN_PROCESS} option is not used. -Generally, @le{} is fully thread-safe and mostly reentrant. +Generally, @gnule{} is fully thread-safe and mostly reentrant. All plugin code is expected required to be reentrant and state-less, but due to the extensive use of 3rd party libraries this cannot be guaranteed. Hence plugins are executed (by default) out of @@ -402,7 +409,7 @@ Loads and unloads plugins based on a configuration string, modifying the existin @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add_defaults (enum EXTRACTOR_Options flags) @findex EXTRACTOR_plugin_add_defaults -Loads all of the plugins in the plugin directory. This function is what most @le{} applications should use to setup the plugins. +Loads all of the plugins in the plugin directory. This function is what most @gnule{} applications should use to setup the plugins. @end deftypefun @@ -414,14 +421,14 @@ Loads all of the plugins in the plugin directory. 
This function is what most @l @tindex enum EXTRACTOR_MetaType @findex EXTRACTOR_metatype_get_max -@verb{|enum EXTRACTOR_MetaType|} is a C enum which defines a list of over 100 different types of meta data. The total number can differ between different @le{} releases; the maximum value for the current release can be obtained using the @verb{|EXTRACTOR_metatype_get_max|} function. All values in this enumeration are of the form @verb{|EXTRACTOR_METATYPE_XXX|}. +@verb{|enum EXTRACTOR_MetaType|} is a C enum which defines a list of over 100 different types of meta data. The total number can differ between different @gnule{} releases; the maximum value for the current release can be obtained using the @verb{|EXTRACTOR_metatype_get_max|} function. All values in this enumeration are of the form @verb{|EXTRACTOR_METATYPE_XXX|}. @deftypefun {const char *} EXTRACTOR_metatype_to_string (enum EXTRACTOR_MetaType type) @findex EXTRACTOR_metatype_to_string @cindex gettext @cindex internationalization -The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short English string @samp{s} describing the meta data type. The string can be translated into other languages using GNU gettext with the domain set to @le{} (@verb{|dgettext("libextractor", s)|}). +The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short English string @samp{s} describing the meta data type. The string can be translated into other languages using GNU gettext with the domain set to @gnule{} (@verb{|dgettext("libextractor", s)|}). @end deftypefun @deftypefun {const char *} EXTRACTOR_metatype_to_description (enum EXTRACTOR_MetaType type) @@ -429,7 +436,7 @@ The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short @cindex gettext @cindex internationalization -The function @verb{|EXTRACTOR_metatype_to_description|} can be used to obtain a longer English string @samp{s} describing the meta data type. 
The description may be empty if the short description returned by @code{EXTRACTOR_metatype_to_string} is already comprehensive. The string can be translated into other languages using GNU gettext with the domain set to @le{} (@verb{|dgettext("libextractor", s)|}). +The function @verb{|EXTRACTOR_metatype_to_description|} can be used to obtain a longer English string @samp{s} describing the meta data type. The description may be empty if the short description returned by @code{EXTRACTOR_metatype_to_string} is already comprehensive. The string can be translated into other languages using GNU gettext with the domain set to @gnule{} (@verb{|dgettext("libextractor", s)|}). @end deftypefun @@ -490,11 +497,11 @@ Return 0 to continue extracting, 1 to abort. @cindex threads @cindex thread-safety -This is the main function for extracting keywords with @le{}. The first argument is a plugin list which specifies the set of plugins that should be used for extracting meta data. The @samp{filename} argument is optional and can be used to specify the name of a file to process. If @samp{filename} is NULL, then the @samp{data} argument must point to the in-memory data to extract meta data from. If @samp{filename} is non-NULL, @samp{data} can be NULL. If @samp{data} is non-null, then @samp{size} is the size of @samp{data} in bytes. Otherwise @samp{size} should be zero. For each meta data item found, GNU libextractor will call the @samp{proc} function, passing @samp{proc_cls} as the first argument to @samp{proc}. The other arguments to @samp{proc} depend on the specific meta data found. +This is the main function for extracting keywords with @gnule{}. The first argument is a plugin list which specifies the set of plugins that should be used for extracting meta data. The @samp{filename} argument is optional and can be used to specify the name of a file to process. If @samp{filename} is NULL, then the @samp{data} argument must point to the in-memory data to extract meta data from. 
If @samp{filename} is non-NULL, @samp{data} can be NULL. If @samp{data} is non-null, then @samp{size} is the size of @samp{data} in bytes. Otherwise @samp{size} should be zero. For each meta data item found, GNU libextractor will call the @samp{proc} function, passing @samp{proc_cls} as the first argument to @samp{proc}. The other arguments to @samp{proc} depend on the specific meta data found. @cindex SIGBUS @cindex bus error -Meta data extraction should never really fail --- at worst, @le{} should not call @samp{proc} with any meta data. By design, @le{} should never crash or leak memory, even given corrupt files as input. Note however, that running @le{} on a corrupt file system (or incorrectly @verb{|mmap|}ed files) can result in the operating system sending a SIGBUS (bus error) to the process. While @le{} runs plugins out-of-process, it first maps the file into memory and then attempts to decompress it. During decompression it is possible to encounter a SIGBUS. @le{} will @emph{not} attempt to catch this signal and your application is likely to crash. Note again that this should only happen if the file @emph{system} is corrupt (not if individual files are corrupt). If this is not acceptable, you might want to consider running @le{} itself also out-of-process (as done, for example, by @url{http://grothoff.org/christian/doodle/,doodle}). +Meta data extraction should never really fail --- at worst, @gnule{} should not call @samp{proc} with any meta data. By design, @gnule{} should never crash or leak memory, even given corrupt files as input. Note however, that running @gnule{} on a corrupt file system (or incorrectly @verb{|mmap|}ed files) can result in the operating system sending a SIGBUS (bus error) to the process. While @gnule{} runs plugins out-of-process, it first maps the file into memory and then attempts to decompress it. During decompression it is possible to encounter a SIGBUS. 
@gnule{} will @emph{not} attempt to catch this signal and your application is likely to crash. Note again that this should only happen if the file @emph{system} is corrupt (not if individual files are corrupt). If this is not acceptable, you might want to consider running @gnule{} itself also out-of-process (as done, for example, by @url{http://grothoff.org/christian/doodle/,doodle}). @end deftypefun @@ -509,7 +516,7 @@ Meta data extraction should never really fail --- at worst, @le{} should not cal @cindex PHP @cindex Ruby -@le{} works immediately with C and C++ code. Bindings for Java, Mono, Ruby, Perl, PHP and Python are available for download from the main @le{} website. Documentation for these bindings (if available) is part of the downloads for the respective binding. In all cases, a full installation of the C library is required before the binding can be installed. +@gnule{} works immediately with C and C++ code. Bindings for Java, Mono, Ruby, Perl, PHP and Python are available for download from the main @gnule{} website. Documentation for these bindings (if available) is part of the downloads for the respective binding. In all cases, a full installation of the C library is required before the binding can be installed. @section Java @@ -571,7 +578,7 @@ This binding is undocumented at this point. @cindex concurrency @cindex threads @cindex thread-safety -This chapter describes various utility functions for @le{} usage. All of the functions are reentrant. +This chapter describes various utility functions for @gnule{} usage. All of the functions are reentrant. @menu * Utility Constants:: @@ -724,6 +731,115 @@ in-process (making it easier to debug) and without any of the other plugins. +@section Example for a minimal extract method + +The following example shows how a plugin can return the mime type of +a file. 
+@example + +int +EXTRACTOR_mymime_extract + (const char *data, + size_t data_size, + EXTRACTOR_MetaDataProcessor proc, + void *proc_cls, + const char * options) +{ + if (data_size < 4) + return 0; + if (0 != memcmp (data, "\177ELF", 4)) + return 0; + if (0 != proc (proc_cls, + "mymime", + EXTRACTOR_METATYPE_MIMETYPE, + EXTRACTOR_METAFORMAT_UTF8, + "text/plain", + "application/x-executable", + 1 + strlen("application/x-executable"))) + return 1; + /* more calls to 'proc' here as needed */ + return 0; +} + +@end example + +@section Plugin execution options + +Plugins can request that their execution be done in a particular way. +For this, the plugin defines a function with the following signature: + +@verbatim +const char * +EXTRACTOR_XXX_options (void); +@end verbatim + +The function should return a string with the execution options. +Individual options in this string should be separated by semicolons. +Options that are included in the string but not known to the library +are ignored. The following options are supported: + +@itemize @bullet +@item +@code{oop-only} ensures that the plugin is only run out-of-process; if +this is not possible, the plugin will not be executed at all if this +option is set. + +@item +@code{close-stderr} ensures that @code{stderr} is closed during the +execution of the plugin. This is useful if the plugin uses libraries +that write (error) messages to @code{stderr} and where this behavior cannot be +turned off. This option only works if the plugin is executed out-of-process. + +@item +@code{close-stdout} ensures that @code{stdout} is closed during the +execution of the plugin. This is useful if the plugin uses libraries +that write messages to @code{stdout} and where this behavior cannot be +turned off. This option only works if the plugin is executed out-of-process. + +@item +@code{force-kill} kills and restarts the plugin process for each +file that is being analyzed. 
This is useful if the plugin uses +libraries that keep global state between runs that is problematic or +if the plugin uses libraries that are known to have serious resource +leaks (such as memory leaks). + +@item +@code{want-tail} +In order to limit memory consumption, limit the amount if reading from +disk and to keep the API simple, the @samp{data} argument passed to +the @code{EXTRACTOR_XXX_extract} method bounded (to 32 MB of normal +data; for compressed data, a limit of 16 MB is imposed).@footnote{If +@gnule{} was given a pointer to an existing, uncompressed block of +data in memory, no bound is imposed for plugins executing in-process; +for out-of-process plugins, a 32 MB limit is still imposed.} Since +some file formats contain meta data at the end of the file, this option +provides a way for plugins to access not the first 16--32 MB of a file +but instead the last (roughly) 32 MB. + +Note that even for files larger than 32 MB, @samp{size} is not +guaranteed to be 32 MB since @samp{data} will be aligned to the page +size of the operating system. However, the last byte of @samp{data} +is guaranteed to be the last byte of the file. Furthermore, if the +file was large and compressed, unlike in the case of meta data +extraction from the header, the end of the file will not be +automatically decompressed by @gnule{}. + +@end itemize + +Note that using options other than @code{want-tail} is pretty much +always a kludge and should thus be avoided. + +@section Example for an options method + +The following example shows how a plugin can set some of the options listed above: +@example +const char * +EXTRACTOR_id3_options () +{ + return "close-stderr;want-tail"; +} +@end example + @node Internal utility functions @chapter Internal utility functions @@ -752,12 +868,12 @@ below. 
@cindex UTF-8 @cindex character set @findex EXTRACTOR_common_convert_to_utf8 -Various @le{} plugins make use of the internal +Various @gnule{} plugins make use of the internal @file{convert.h} header which defines a function @verb{|EXTRACTOR_common_convert_to_utf8|} which can be used to easily convert text from any character set to UTF-8. This conversion is important since the -linked list of keywords that is returned by @le{} is +linked list of keywords that is returned by @gnule{} is expected to contain only UTF-8 strings. Naturally, proper conversion may not always be possible since some file formats fail to specify the character set. In that case, it is often better to not convert at @@ -781,9 +897,9 @@ caller, so storing the string in the keyword list is acceptable. @chapter Reporting bugs @cindex bug -@le{} uses the @url{http://gnunet.org/bugs/,Mantis bugtracking +@gnule{} uses the @url{http://gnunet.org/bugs/,Mantis bugtracking system}. If possible, please report bugs there. You can also e-mail -the @le{} mailinglist at @url{libextractor@@gnu.org}. +the @gnule{} mailinglist at @url{libextractor@@gnu.org}. 
diff --git a/doc/version.texi b/doc/version.texi @@ -1,4 +1,4 @@ -@set UPDATED 1 January 2010 +@set UPDATED 13 January 2010 @set UPDATED-MONTH January 2010 @set EDITION 0.6.0 @set VERSION 0.6.0 diff --git a/src/main/extractor.c b/src/main/extractor.c @@ -630,6 +630,7 @@ EXTRACTOR_plugin_add_defaults(enum EXTRACTOR_Options flags) */ static void * get_symbol_with_prefix(void *lib_handle, + const char *template, const char *prefix, const char **options) { @@ -649,9 +650,9 @@ get_symbol_with_prefix(void *lib_handle, dot = strstr (sym, "."); if (dot != NULL) *dot = '\0'; - name = malloc(strlen(sym) + 32); + name = malloc(strlen(sym) + strlen(template) + 1); sprintf(name, - "_EXTRACTOR_%s_extract", + template, sym); /* try without '_' first */ symbol = lt_dlsym(lib_handle, name + 1); @@ -678,7 +679,8 @@ get_symbol_with_prefix(void *lib_handle, #endif } - if (symbol != NULL) + if ( (symbol != NULL) && + (NULL != options) ) { /* get special options */ sprintf(name, @@ -741,6 +743,7 @@ plugin_load (struct EXTRACTOR_PluginList *plugin) return -1; } plugin->extractMethod = get_symbol_with_prefix (plugin->libraryHandle, + "_EXTRACTOR_%s_extract", plugin->libname, &plugin->specials); if (plugin->extractMethod == NULL) @@ -1094,10 +1097,9 @@ transmit_reply (void *cls, /** - * 'main' function of the child process. - * Reads shm-filenames from 'in' (line-by-line) and - * writes meta data blocks to 'out'. The meta data - * stream is terminated by an empty entry. + * 'main' function of the child process. Reads shm-filenames from + * 'in' (line-by-line) and writes meta data blocks to 'out'. The meta + * data stream is terminated by an empty entry. 
* * @param plugin extractor plugin to use * @param in stream to read from @@ -1108,12 +1110,15 @@ process_requests (struct EXTRACTOR_PluginList *plugin, int in, int out) { - char fn[256]; + char hfn[256]; + char tfn[256]; + char *fn; FILE *fin; void *ptr; int shmid; struct IpcHeader hdr; size_t size; + int want_tail; #ifdef WINDOWS HANDLE map; #endif @@ -1129,6 +1134,13 @@ process_requests (struct EXTRACTOR_PluginList *plugin, #endif return; } + want_tail = 0; + if ( (plugin->specials != NULL) && + (NULL != strstr (plugin->specials, + "want-tail")) ) + { + want_tail = 1; + } if ( (plugin->specials != NULL) && (NULL != strstr (plugin->specials, "close-stderr")) ) @@ -1144,12 +1156,27 @@ process_requests (struct EXTRACTOR_PluginList *plugin, memset (&hdr, 0, sizeof (hdr)); fin = fdopen (in, "r"); - while (NULL != fgets (fn, sizeof(fn), fin)) + while (NULL != fgets (hfn, sizeof(hfn), fin)) { - if (strlen (fn) == 0) + if (strlen (hfn) <= 1) break; ptr = NULL; - fn[strlen(fn)-1] = '\0'; /* kill newline */ + hfn[strlen(hfn)-1] = '\0'; /* kill newline */ + if (NULL == fgets (tfn, sizeof(tfn), fin)) + break; + if ('!' 
!= tfn[0]) + break; + tfn[strlen(tfn)-1] = '\0'; /* kill newline */ + if ( (want_tail) && + (strlen (tfn) > 1) ) + { + fn = &tfn[1]; + } + else + { + fn = hfn; + } + #ifndef WINDOWS if ( (-1 != (shmid = shm_open (fn, O_RDONLY, 0))) && (((off_t)-1) != (size = lseek (shmid, 0, SEEK_END))) && @@ -1161,12 +1188,13 @@ process_requests (struct EXTRACTOR_PluginList *plugin, if (ptr != NULL) #endif { - if (0 != plugin->extractMethod (ptr, - size, - &transmit_reply, - &out, - plugin->plugin_options)) - break; + if ( (plugin->extractMethod != NULL) && + (0 != plugin->extractMethod (ptr, + size, + &transmit_reply, + &out, + plugin->plugin_options)) ) + break; if (0 != write_all (out, &hdr, sizeof(hdr))) break; } @@ -1195,8 +1223,10 @@ process_requests (struct EXTRACTOR_PluginList *plugin, close (out); } + #ifdef WINDOWS -static void write_plugin_data (HANDLE h, const struct EXTRACTOR_PluginList *plugin) +static void +write_plugin_data (HANDLE h, const struct EXTRACTOR_PluginList *plugin) { size_t i; DWORD len; @@ -1217,7 +1247,9 @@ static void write_plugin_data (HANDLE h, const struct EXTRACTOR_PluginList *plug WriteFile (h, plugin->plugin_options, i, &len, NULL); } -static struct EXTRACTOR_PluginList *read_plugin_data (FILE *f) + +static struct EXTRACTOR_PluginList * +read_plugin_data (FILE *f) { struct EXTRACTOR_PluginList *ret; size_t i; @@ -1239,7 +1271,9 @@ static struct EXTRACTOR_PluginList *read_plugin_data (FILE *f) return ret; } -void CALLBACK RundllEntryPoint(HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, int nCmdShow) + +void CALLBACK +RundllEntryPoint(HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, int nCmdShow) { int in, out; @@ -1253,6 +1287,7 @@ void CALLBACK RundllEntryPoint(HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, in } #endif + /** * Start the process for the given plugin. 
*/ @@ -1331,6 +1366,7 @@ start_process (struct EXTRACTOR_PluginList *plugin) * * @param plugin which plugin to call * @param shmfn file name of the shared memory segment + * @param tshmfn file name of the shared memory segment for the end of the data * @param proc function to call on the meta data * @param proc_cls cls for proc * @return 0 if proc did not return non-zero @@ -1338,6 +1374,7 @@ start_process (struct EXTRACTOR_PluginList *plugin) static int extract_oop (struct EXTRACTOR_PluginList *plugin, const char *shmfn, + const char *tshmfn, EXTRACTOR_MetaDataProcessor proc, void *proc_cls) { @@ -1347,7 +1384,19 @@ extract_oop (struct EXTRACTOR_PluginList *plugin, if (plugin->cpid == -1) return 0; - if (0 >= fprintf (plugin->cpipe_in, "%s\n", shmfn)) + if (0 >= fprintf (plugin->cpipe_in, + "%s\n", + shmfn)) + { + stop_process (plugin); + plugin->cpid = -1; + if (plugin->flags != EXTRACTOR_OPTION_DEFAULT_POLICY) + plugin->flags = EXTRACTOR_OPTION_DISABLED; + return 0; + } + if (0 >= fprintf (plugin->cpipe_in, + "!%s\n", + (tshmfn != NULL) ? tshmfn : "")) { stop_process (plugin); plugin->cpid = -1; @@ -1420,33 +1469,108 @@ extract_oop (struct EXTRACTOR_PluginList *plugin, /** - * Extract keywords from a file using the given set of plugins. + * Setup a shared memory segment. + * + * @param ptr set to the location of the shm segment + * @param shmid where to store the shm ID + * @param fn name of the shared segment + * @param fn_size size available in fn + * @param size number of bytes to allocated for the segment + * @return 0 on success + */ +static int +make_shm (int is_tail, + void **ptr, +#ifndef WINDOWS + int *shmid, +#else + HANDLE *mappedFile, + HANDLE *map, +#endif + char *fn, + size_t fn_size, + size_t size) +{ + snprintf (fn, + fn_size, +#ifdef WINDOWS + "%TEMP%\\" +#else + "/" +#endif + "libextractor-%sshm-%u-%u", + (is_tail) ? 
"t" : "", + getpid(), + (unsigned int) RANDOM()); +#ifndef WINDOWS + *shmid = shm_open (fn, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR); + *ptr = NULL; + if (-1 == (*shmid)) + return 1; + if ( (0 != ftruncate (*shmid, size)) || + (NULL == (*ptr = mmap (NULL, size, PROT_WRITE, MAP_SHARED, *shmid, 0))) || + (*ptr == (void*) -1) ) + { + close (*shmid); + *shmid = -1; + return 1; + } + return 0; +#else + *mappedFile = CreateFile (fn, + GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, CREATE_ALWAYS, + FILE_FLAG_DELETE_ON_CLOSE, NULL); + *map = CreateFileMapping (*mappedFile, NULL, PAGE_READWRITE, 1, 0, NULL); + ptr = MapViewOfFile (*map, FILE_MAP_READ, 0, 0, 0); + if (ptr == NULL) + { + CloseHandle (*map); + CloseHandle (*mappedFile); + return 1; + } +#endif + return 0; +} + + +/** + * Extract keywords using the given set of plugins. * * @param plugins the list of plugins to use - * @param filename the name of the file, can be NULL * @param data data to process, never NULL * @param size number of bytes in data, ignored if data is NULL + * @param tdata end of file data, or NULL + * @param tsize number of bytes in tdata * @param proc function to call for each meta data item found * @param proc_cls cls argument to proc */ static void extract (struct EXTRACTOR_PluginList *plugins, - const char * filename, const char * data, size_t size, + const char * tdata, + size_t tsize, EXTRACTOR_MetaDataProcessor proc, void *proc_cls) { struct EXTRACTOR_PluginList *ppos; -#ifndef WINDOWS - int shmid; -#else - HANDLE map, mappedFile; -#endif enum EXTRACTOR_Options flags; void *ptr; + void *tptr; char fn[255]; + char tfn[255]; int want_shm; + int want_tail; +#ifndef WINDOWS + int shmid; + int tshmid; +#else + HANDLE map; + HANDLE mappedFile; + HANDLE tmap; + HANDLE tmappedFile; +#endif want_shm = 0; ppos = plugins; @@ -1472,100 +1596,106 @@ extract (struct EXTRACTOR_PluginList *plugins, } ppos = ppos->next; } + ptr = NULL; + tptr = NULL; if (want_shm) { - snprintf (fn, 
- sizeof(fn), -#ifdef WINDOWS - "%TEMP%\\" + if (size > MAX_READ) + size = MAX_READ; + if (0 == make_shm (0, + &ptr, +#ifndef WINDOWS + &shmid, #else - "/" + &mappedFile, + &map, #endif - "libextractor-shm-%u-%u", - getpid(), - (unsigned int) RANDOM()); -#ifndef WINDOWS - shmid = shm_open (fn, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR); - ptr = NULL; - if (shmid != -1) + fn, sizeof(fn), size)) { - if ( (0 != ftruncate (shmid, size)) || - (NULL == (ptr = mmap (NULL, size, PROT_WRITE, MAP_SHARED, shmid, 0))) || - (ptr == (void*) -1) ) + memcpy (ptr, data, size); + if ( (tdata != NULL) && + (0 == make_shm (1, + &tptr, +#ifndef WINDOWS + &tshmid, +#else + &tmappedFile, + &tmap, +#endif + tfn, sizeof(tfn), tsize)) ) { - close (shmid); - shmid = -1; + memcpy (tptr, tdata, tsize); } else { - memcpy (ptr, data, size); + tptr = NULL; } } -#else - mappedFile = CreateFile (fn, GENERIC_READ | GENERIC_WRITE, - FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, CREATE_ALWAYS, - FILE_FLAG_DELETE_ON_CLOSE, NULL); - map = CreateFileMapping (mappedFile, NULL, PAGE_READWRITE, 1, 0, NULL); - ptr = MapViewOfFile (map, FILE_MAP_READ, 0, 0, 0); - if (ptr == NULL) - { - CloseHandle (map); - CloseHandle (mappedFile); - map = NULL; - } else - memcpy (ptr, data, size); -#endif + { + want_shm = 0; + } } - else -#ifndef WINDOWS - shmid = -1; - if (want_shm && (shmid == -1)) - _exit(1); -#else - map = NULL; - if (want_shm && map == NULL) - _exit(1); -#endif ppos = plugins; while (NULL != ppos) { flags = ppos->flags; -#ifndef WINDOWS - if (shmid == -1) -#else - if (map == NULL) -#endif + if (! want_shm) flags = EXTRACTOR_OPTION_IN_PROCESS; switch (flags) { case EXTRACTOR_OPTION_DEFAULT_POLICY: - if (0 != extract_oop (ppos, fn, proc, proc_cls)) + if (0 != extract_oop (ppos, fn, + (tptr != NULL) ? tfn : NULL, + proc, proc_cls)) return; if (ppos->cpid == -1) { start_process (ppos); - if (0 != extract_oop (ppos, fn, proc, proc_cls)) + if (0 != extract_oop (ppos, fn, + (tptr != NULL) ? 
tfn : NULL, + proc, proc_cls)) return; } break; case EXTRACTOR_OPTION_OUT_OF_PROCESS_NO_RESTART: - if (0 != extract_oop (ppos, fn, proc, proc_cls)) + if (0 != extract_oop (ppos, fn, + (tptr != NULL) ? tfn : NULL, + proc, proc_cls)) return; break; case EXTRACTOR_OPTION_IN_PROCESS: - if (NULL == ppos->extractMethod) + want_tail = ( (ppos->specials != NULL) && + (NULL != strstr (ppos->specials, + "want-tail"))); + if (NULL == ppos->extractMethod) plugin_load (ppos); if ( ( (ppos->specials == NULL) || (NULL == strstr (ppos->specials, - "oop-only")) ) && - (NULL != ppos->extractMethod) && - (0 != ppos->extractMethod (data, - size, - proc, - proc_cls, - ppos->plugin_options)) ) - return; + "oop-only")) ) ) + { + if (want_tail) + { + if ( (NULL != ppos->extractMethod) && + (tdata != NULL) && + (0 != ppos->extractMethod (tdata, + tsize, + proc, + proc_cls, + ppos->plugin_options)) ) + return; + } + else + { + if ( (NULL != ppos->extractMethod) && + (0 != ppos->extractMethod (data, + size, + proc, + proc_cls, + ppos->plugin_options)) ) + return; + } + } break; case EXTRACTOR_OPTION_DISABLED: break; @@ -1580,10 +1710,21 @@ extract (struct EXTRACTOR_PluginList *plugins, if (shmid != -1) close (shmid); shm_unlink (fn); + if (NULL != tptr) + munmap (tptr, tsize); + if (tshmid != -1) + close (tshmid); + shm_unlink (tfn); #else UnmapViewOfFile (ptr); CloseHandle (map); CloseHandle (mappedFile); + if (tptr != NULL) + { + UnmapViewOfFile (tptr); + CloseHandle (tmap); + CloseHandle (tmappedFile); + } #endif } } @@ -1595,17 +1736,19 @@ extract (struct EXTRACTOR_PluginList *plugins, * contents if they were not compressed). 
* * @param plugins the list of plugins to use - * @param filename the name of the file, can be NULL * @param data data to process, never NULL - * @param size number of bytes in data, ignored if data is NULL + * @param size number of bytes in data + * @param tdata end of file data, or NULL + * @param tsize number of bytes in tdata * @param proc function to call for each meta data item found * @param proc_cls cls argument to proc */ static void decompress_and_extract (struct EXTRACTOR_PluginList *plugins, - const char * filename, const unsigned char * data, size_t size, + const char * tdata, + size_t tsize, EXTRACTOR_MetaDataProcessor proc, void *proc_cls) { unsigned char * buf; @@ -1838,9 +1981,10 @@ decompress_and_extract (struct EXTRACTOR_PluginList *plugins, size = dsize; } extract (plugins, - filename, (const char*) data, size, + tdata, + tsize, proc, proc_cls); if (buf != NULL) @@ -1908,9 +2052,13 @@ EXTRACTOR_extract (struct EXTRACTOR_PluginList *plugins, { int fd; void * buffer; + void * tbuffer; struct stat fstatbuf; size_t fsize; + size_t tsize; int eno; + off_t offset; + long pg; fd = -1; buffer = NULL; @@ -1941,14 +2089,41 @@ EXTRACTOR_extract (struct EXTRACTOR_PluginList *plugins, if ( (buffer == NULL) && (data == NULL) ) return; + /* for footer extraction */ + tsize = 0; + tbuffer = NULL; + if ( (data == NULL) && + (fstatbuf.st_size > fsize) && + (fstatbuf.st_size > MAX_READ) ) + { + pg = sysconf (_SC_PAGE_SIZE); + if ( (pg > 0) && + (pg < MAX_READ) ) + { + offset = (1 + (fstatbuf.st_size - MAX_READ) / pg) * pg; + if (offset < fstatbuf.st_size) + { + tsize = fstatbuf.st_size - offset; + tbuffer = MMAP (NULL, tsize, PROT_READ, MAP_PRIVATE, fd, offset); + if ( (tbuffer == NULL) || (tbuffer == (void *) -1) ) + { + tsize = 0; + tbuffer = NULL; + } + } + } + } decompress_and_extract (plugins, - filename, buffer != NULL ? buffer : data, buffer != NULL ? 
fsize : size, + tbuffer, + tsize, proc, proc_cls); if (buffer != NULL) MUNMAP (buffer, fsize); + if (tbuffer != NULL) + MUNMAP (tbuffer, tsize); if (-1 != fd) close(fd); } diff --git a/src/plugins/Makefile.am b/src/plugins/Makefile.am @@ -86,6 +86,7 @@ plugin_LTLIBRARIES = \ libextractor_flv.la \ libextractor_gif.la \ libextractor_html.la \ + libextractor_id3.la \ libextractor_id3v2.la \ libextractor_id3v23.la \ libextractor_id3v24.la \ @@ -186,6 +187,13 @@ libextractor_html_la_LDFLAGS = \ libextractor_html_la_LIBADD = \ $(top_builddir)/src/common/libextractor_common.la +libextractor_id3_la_SOURCES = \ + id3_extractor.c +libextractor_id3_la_LDFLAGS = \ + $(PLUGINFLAGS) +libextractor_id3_la_LIBADD = \ + $(top_builddir)/src/common/libextractor_common.la + libextractor_id3v2_la_SOURCES = \ id3v2_extractor.c libextractor_id3v2_la_LDFLAGS = \ diff --git a/src/plugins/id3_extractor.c b/src/plugins/id3_extractor.c @@ -0,0 +1,305 @@ +/* + This file is part of libextractor. + (C) 2002, 2003, 2004, 2006, 2009, 2010 Vidyut Samanta and Christian Grothoff + + libextractor is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 2, or (at your + option) any later version. + + libextractor is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with libextractor; see the file COPYING. If not, write to the + Free Software Foundation, Inc., 59 Temple Place - Suite 330, + Boston, MA 02111-1307, USA. 
+ + */ + +#include "platform.h" +#include "extractor.h" +#include "convert.h" +#include <string.h> +#include <stdio.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <unistd.h> +#include <stdlib.h> + +typedef struct +{ + char *title; + char *artist; + char *album; + char *year; + char *comment; + const char *genre; + unsigned int track_number; +} id3tag; + +static const char *const genre_names[] = { + gettext_noop ("Blues"), + gettext_noop ("Classic Rock"), + gettext_noop ("Country"), + gettext_noop ("Dance"), + gettext_noop ("Disco"), + gettext_noop ("Funk"), + gettext_noop ("Grunge"), + gettext_noop ("Hip-Hop"), + gettext_noop ("Jazz"), + gettext_noop ("Metal"), + gettext_noop ("New Age"), + gettext_noop ("Oldies"), + gettext_noop ("Other"), + gettext_noop ("Pop"), + gettext_noop ("R&B"), + gettext_noop ("Rap"), + gettext_noop ("Reggae"), + gettext_noop ("Rock"), + gettext_noop ("Techno"), + gettext_noop ("Industrial"), + gettext_noop ("Alternative"), + gettext_noop ("Ska"), + gettext_noop ("Death Metal"), + gettext_noop ("Pranks"), + gettext_noop ("Soundtrack"), + gettext_noop ("Euro-Techno"), + gettext_noop ("Ambient"), + gettext_noop ("Trip-Hop"), + gettext_noop ("Vocal"), + gettext_noop ("Jazz+Funk"), + gettext_noop ("Fusion"), + gettext_noop ("Trance"), + gettext_noop ("Classical"), + gettext_noop ("Instrumental"), + gettext_noop ("Acid"), + gettext_noop ("House"), + gettext_noop ("Game"), + gettext_noop ("Sound Clip"), + gettext_noop ("Gospel"), + gettext_noop ("Noise"), + gettext_noop ("Alt. 
Rock"), + gettext_noop ("Bass"), + gettext_noop ("Soul"), + gettext_noop ("Punk"), + gettext_noop ("Space"), + gettext_noop ("Meditative"), + gettext_noop ("Instrumental Pop"), + gettext_noop ("Instrumental Rock"), + gettext_noop ("Ethnic"), + gettext_noop ("Gothic"), + gettext_noop ("Darkwave"), + gettext_noop ("Techno-Industrial"), + gettext_noop ("Electronic"), + gettext_noop ("Pop-Folk"), + gettext_noop ("Eurodance"), + gettext_noop ("Dream"), + gettext_noop ("Southern Rock"), + gettext_noop ("Comedy"), + gettext_noop ("Cult"), + gettext_noop ("Gangsta Rap"), + gettext_noop ("Top 40"), + gettext_noop ("Christian Rap"), + gettext_noop ("Pop/Funk"), + gettext_noop ("Jungle"), + gettext_noop ("Native American"), + gettext_noop ("Cabaret"), + gettext_noop ("New Wave"), + gettext_noop ("Psychedelic"), + gettext_noop ("Rave"), + gettext_noop ("Showtunes"), + gettext_noop ("Trailer"), + gettext_noop ("Lo-Fi"), + gettext_noop ("Tribal"), + gettext_noop ("Acid Punk"), + gettext_noop ("Acid Jazz"), + gettext_noop ("Polka"), + gettext_noop ("Retro"), + gettext_noop ("Musical"), + gettext_noop ("Rock & Roll"), + gettext_noop ("Hard Rock"), + gettext_noop ("Folk"), + gettext_noop ("Folk/Rock"), + gettext_noop ("National Folk"), + gettext_noop ("Swing"), + gettext_noop ("Fast-Fusion"), + gettext_noop ("Bebob"), + gettext_noop ("Latin"), + gettext_noop ("Revival"), + gettext_noop ("Celtic"), + gettext_noop ("Bluegrass"), + gettext_noop ("Avantgarde"), + gettext_noop ("Gothic Rock"), + gettext_noop ("Progressive Rock"), + gettext_noop ("Psychedelic Rock"), + gettext_noop ("Symphonic Rock"), + gettext_noop ("Slow Rock"), + gettext_noop ("Big Band"), + gettext_noop ("Chorus"), + gettext_noop ("Easy Listening"), + gettext_noop ("Acoustic"), + gettext_noop ("Humour"), + gettext_noop ("Speech"), + gettext_noop ("Chanson"), + gettext_noop ("Opera"), + gettext_noop ("Chamber Music"), + gettext_noop ("Sonata"), + gettext_noop ("Symphony"), + gettext_noop ("Booty Bass"), + gettext_noop 
("Primus"), + gettext_noop ("Porn Groove"), + gettext_noop ("Satire"), + gettext_noop ("Slow Jam"), + gettext_noop ("Club"), + gettext_noop ("Tango"), + gettext_noop ("Samba"), + gettext_noop ("Folklore"), + gettext_noop ("Ballad"), + gettext_noop ("Power Ballad"), + gettext_noop ("Rhythmic Soul"), + gettext_noop ("Freestyle"), + gettext_noop ("Duet"), + gettext_noop ("Punk Rock"), + gettext_noop ("Drum Solo"), + gettext_noop ("A Cappella"), + gettext_noop ("Euro-House"), + gettext_noop ("Dance Hall"), + gettext_noop ("Goa"), + gettext_noop ("Drum & Bass"), + gettext_noop ("Club-House"), + gettext_noop ("Hardcore"), + gettext_noop ("Terror"), + gettext_noop ("Indie"), + gettext_noop ("BritPop"), + gettext_noop ("Negerpunk"), + gettext_noop ("Polsk Punk"), + gettext_noop ("Beat"), + gettext_noop ("Christian Gangsta Rap"), + gettext_noop ("Heavy Metal"), + gettext_noop ("Black Metal"), + gettext_noop ("Crossover"), + gettext_noop ("Contemporary Christian"), + gettext_noop ("Christian Rock"), + gettext_noop ("Merengue"), + gettext_noop ("Salsa"), + gettext_noop ("Thrash Metal"), + gettext_noop ("Anime"), + gettext_noop ("JPop"), + gettext_noop ("Synthpop"), +}; + +#define GENRE_NAME_COUNT \ + ((unsigned int)(sizeof genre_names / sizeof (const char *const))) + + + +#define OK 0 +#define INVALID_ID3 1 + +static void +trim (char *k) +{ + while ((strlen (k) > 0) && (isspace (k[strlen (k) - 1]))) + k[strlen (k) - 1] = '\0'; +} + +static int +get_id3 (const char *data, size_t size, id3tag * id3) +{ + const char *pos; + + if (size < 128) + return INVALID_ID3; + + pos = &data[size - 128]; + if (0 != strncmp ("TAG", pos, 3)) + return INVALID_ID3; + pos += 3; + + id3->title = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1"); + trim (id3->title); + pos += 30; + id3->artist = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1"); + trim (id3->artist); + pos += 30; + id3->album = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1"); + trim (id3->album); + pos += 30; 
+ id3->year = EXTRACTOR_common_convert_to_utf8 (pos, 4, "ISO-8859-1"); + trim (id3->year); + pos += 4; + id3->comment = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1"); + trim (id3->comment); + if ( (pos[28] == '\0') && + (pos[29] != '\0') ) + { + /* ID3v1.1 */ + id3->track_number = pos[29]; + } + else + { + id3->track_number = 0; + } + pos += 30; + id3->genre = ""; + if (pos[0] < GENRE_NAME_COUNT) + id3->genre = dgettext (PACKAGE, genre_names[(unsigned) pos[0]]); + return OK; +} + + +#define ADD(s,t) do { if (0 != (ret = proc (proc_cls, "id3", t, EXTRACTOR_METAFORMAT_UTF8, "text/plain", s, strlen(s)+1))) goto FINISH; } while (0) + + +const char * +EXTRACTOR_id3_options () +{ + return "want-tail"; +} + + +int +EXTRACTOR_id3_extract (const char *data, + size_t size, + EXTRACTOR_MetaDataProcessor proc, + void *proc_cls, + const char *options) +{ + id3tag info; + char track[16]; + int ret; + + fprintf (stderr, "called with %llu bytes\n", (unsigned long long) size); + if (OK != get_id3 (data, size, &info)) + return 0; + if (strlen (info.title) > 0) + ADD (info.title, EXTRACTOR_METATYPE_TITLE); + if (strlen (info.artist) > 0) + ADD (info.artist, EXTRACTOR_METATYPE_ARTIST); + if (strlen (info.album) > 0) + ADD (info.album, EXTRACTOR_METATYPE_ALBUM); + if (strlen (info.year) > 0) + ADD (info.year, EXTRACTOR_METATYPE_PUBLICATION_YEAR); + if (strlen (info.genre) > 0) + ADD (info.genre, EXTRACTOR_METATYPE_GENRE); + if (strlen (info.comment) > 0) + ADD (info.comment, EXTRACTOR_METATYPE_COMMENT); + if (info.track_number != 0) + { + snprintf(track, + sizeof(track), "%u", info.track_number); + ADD (track, EXTRACTOR_METATYPE_TRACK_NUMBER); + } +FINISH: + free (info.title); + free (info.year); + free (info.album); + free (info.artist); + free (info.comment); + return ret; +} + +/* end of id3_extractor.c */ diff --git a/src/plugins/mp3_extractor.c b/src/plugins/mp3_extractor.c @@ -36,172 +36,6 @@ #include <unistd.h> #include <stdlib.h> -typedef struct -{ - char *title; - 
char *artist; - char *album; - char *year; - char *comment; - const char *genre; - unsigned int track_number; -} id3tag; - -static const char *const genre_names[] = { - gettext_noop ("Blues"), - gettext_noop ("Classic Rock"), - gettext_noop ("Country"), - gettext_noop ("Dance"), - gettext_noop ("Disco"), - gettext_noop ("Funk"), - gettext_noop ("Grunge"), - gettext_noop ("Hip-Hop"), - gettext_noop ("Jazz"), - gettext_noop ("Metal"), - gettext_noop ("New Age"), - gettext_noop ("Oldies"), - gettext_noop ("Other"), - gettext_noop ("Pop"), - gettext_noop ("R&B"), - gettext_noop ("Rap"), - gettext_noop ("Reggae"), - gettext_noop ("Rock"), - gettext_noop ("Techno"), - gettext_noop ("Industrial"), - gettext_noop ("Alternative"), - gettext_noop ("Ska"), - gettext_noop ("Death Metal"), - gettext_noop ("Pranks"), - gettext_noop ("Soundtrack"), - gettext_noop ("Euro-Techno"), - gettext_noop ("Ambient"), - gettext_noop ("Trip-Hop"), - gettext_noop ("Vocal"), - gettext_noop ("Jazz+Funk"), - gettext_noop ("Fusion"), - gettext_noop ("Trance"), - gettext_noop ("Classical"), - gettext_noop ("Instrumental"), - gettext_noop ("Acid"), - gettext_noop ("House"), - gettext_noop ("Game"), - gettext_noop ("Sound Clip"), - gettext_noop ("Gospel"), - gettext_noop ("Noise"), - gettext_noop ("Alt. 
Rock"), - gettext_noop ("Bass"), - gettext_noop ("Soul"), - gettext_noop ("Punk"), - gettext_noop ("Space"), - gettext_noop ("Meditative"), - gettext_noop ("Instrumental Pop"), - gettext_noop ("Instrumental Rock"), - gettext_noop ("Ethnic"), - gettext_noop ("Gothic"), - gettext_noop ("Darkwave"), - gettext_noop ("Techno-Industrial"), - gettext_noop ("Electronic"), - gettext_noop ("Pop-Folk"), - gettext_noop ("Eurodance"), - gettext_noop ("Dream"), - gettext_noop ("Southern Rock"), - gettext_noop ("Comedy"), - gettext_noop ("Cult"), - gettext_noop ("Gangsta Rap"), - gettext_noop ("Top 40"), - gettext_noop ("Christian Rap"), - gettext_noop ("Pop/Funk"), - gettext_noop ("Jungle"), - gettext_noop ("Native American"), - gettext_noop ("Cabaret"), - gettext_noop ("New Wave"), - gettext_noop ("Psychedelic"), - gettext_noop ("Rave"), - gettext_noop ("Showtunes"), - gettext_noop ("Trailer"), - gettext_noop ("Lo-Fi"), - gettext_noop ("Tribal"), - gettext_noop ("Acid Punk"), - gettext_noop ("Acid Jazz"), - gettext_noop ("Polka"), - gettext_noop ("Retro"), - gettext_noop ("Musical"), - gettext_noop ("Rock & Roll"), - gettext_noop ("Hard Rock"), - gettext_noop ("Folk"), - gettext_noop ("Folk/Rock"), - gettext_noop ("National Folk"), - gettext_noop ("Swing"), - gettext_noop ("Fast-Fusion"), - gettext_noop ("Bebob"), - gettext_noop ("Latin"), - gettext_noop ("Revival"), - gettext_noop ("Celtic"), - gettext_noop ("Bluegrass"), - gettext_noop ("Avantgarde"), - gettext_noop ("Gothic Rock"), - gettext_noop ("Progressive Rock"), - gettext_noop ("Psychedelic Rock"), - gettext_noop ("Symphonic Rock"), - gettext_noop ("Slow Rock"), - gettext_noop ("Big Band"), - gettext_noop ("Chorus"), - gettext_noop ("Easy Listening"), - gettext_noop ("Acoustic"), - gettext_noop ("Humour"), - gettext_noop ("Speech"), - gettext_noop ("Chanson"), - gettext_noop ("Opera"), - gettext_noop ("Chamber Music"), - gettext_noop ("Sonata"), - gettext_noop ("Symphony"), - gettext_noop ("Booty Bass"), - gettext_noop 
("Primus"), - gettext_noop ("Porn Groove"), - gettext_noop ("Satire"), - gettext_noop ("Slow Jam"), - gettext_noop ("Club"), - gettext_noop ("Tango"), - gettext_noop ("Samba"), - gettext_noop ("Folklore"), - gettext_noop ("Ballad"), - gettext_noop ("Power Ballad"), - gettext_noop ("Rhythmic Soul"), - gettext_noop ("Freestyle"), - gettext_noop ("Duet"), - gettext_noop ("Punk Rock"), - gettext_noop ("Drum Solo"), - gettext_noop ("A Cappella"), - gettext_noop ("Euro-House"), - gettext_noop ("Dance Hall"), - gettext_noop ("Goa"), - gettext_noop ("Drum & Bass"), - gettext_noop ("Club-House"), - gettext_noop ("Hardcore"), - gettext_noop ("Terror"), - gettext_noop ("Indie"), - gettext_noop ("BritPop"), - gettext_noop ("Negerpunk"), - gettext_noop ("Polsk Punk"), - gettext_noop ("Beat"), - gettext_noop ("Christian Gangsta Rap"), - gettext_noop ("Heavy Metal"), - gettext_noop ("Black Metal"), - gettext_noop ("Crossover"), - gettext_noop ("Contemporary Christian"), - gettext_noop ("Christian Rock"), - gettext_noop ("Merengue"), - gettext_noop ("Salsa"), - gettext_noop ("Thrash Metal"), - gettext_noop ("Anime"), - gettext_noop ("JPop"), - gettext_noop ("Synthpop"), -}; - -#define GENRE_NAME_COUNT \ - ((unsigned int)(sizeof genre_names / sizeof (const char *const))) - - #define MAX_MP3_SCAN_DEEP 16768 const int max_frames_scan = 1024; enum @@ -270,64 +104,15 @@ static const char * const layer_names[3] = { #define SYSERR 1 #define INVALID_ID3 2 -static void -trim (char *k) -{ - while ((strlen (k) > 0) && (isspace (k[strlen (k) - 1]))) - k[strlen (k) - 1] = '\0'; -} - -static int -get_id3 (const char *data, size_t size, id3tag * id3) -{ - const char *pos; - - if (size < 128) - return INVALID_ID3; - - pos = &data[size - 128]; - if (0 != strncmp ("TAG", pos, 3)) - return INVALID_ID3; - pos += 3; - - id3->title = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1"); - trim (id3->title); - pos += 30; - id3->artist = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1"); - 
trim (id3->artist); - pos += 30; - id3->album = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1"); - trim (id3->album); - pos += 30; - id3->year = EXTRACTOR_common_convert_to_utf8 (pos, 4, "ISO-8859-1"); - trim (id3->year); - pos += 4; - id3->comment = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1"); - trim (id3->comment); - if ( (pos[28] == '\0') && - (pos[29] != '\0') ) - { - /* ID3v1.1 */ - id3->track_number = pos[29]; - } - else - { - id3->track_number = 0; - } - pos += 30; - id3->genre = ""; - if (pos[0] < GENRE_NAME_COUNT) - id3->genre = dgettext (PACKAGE, genre_names[(unsigned) pos[0]]); - return OK; -} - - #define ADDR(s,t) do { if (0 != proc (proc_cls, "mp3", t, EXTRACTOR_METAFORMAT_UTF8, "text/plain", s, strlen(s)+1)) return 1; } while (0) -static int -mp3parse (const unsigned char *data, size_t size, - EXTRACTOR_MetaDataProcessor proc, - void *proc_cls) +/* mimetype = audio/mpeg */ +int +EXTRACTOR_mp3_extract (const unsigned char *data, + size_t size, + EXTRACTOR_MetaDataProcessor proc, + void *proc_cls, + const char *options) { unsigned int header; int counter = 0; @@ -474,50 +259,4 @@ mp3parse (const unsigned char *data, size_t size, return 0; } - -#define ADD(s,t) do { if (0 != (ret = proc (proc_cls, "mp3", t, EXTRACTOR_METAFORMAT_UTF8, "text/plain", s, strlen(s)+1))) goto FINISH; } while (0) - - -/* mimetype = audio/mpeg */ -int -EXTRACTOR_mp3_extract (const char *data, - size_t size, - EXTRACTOR_MetaDataProcessor proc, - void *proc_cls, - const char *options) -{ - id3tag info; - char track[16]; - int ret; - - if (0 != get_id3 (data, size, &info)) - return 0; - if (strlen (info.title) > 0) - ADD (info.title, EXTRACTOR_METATYPE_TITLE); - if (strlen (info.artist) > 0) - ADD (info.artist, EXTRACTOR_METATYPE_ARTIST); - if (strlen (info.album) > 0) - ADD (info.album, EXTRACTOR_METATYPE_ALBUM); - if (strlen (info.year) > 0) - ADD (info.year, EXTRACTOR_METATYPE_PUBLICATION_YEAR); - if (strlen (info.genre) > 0) - ADD (info.genre, 
EXTRACTOR_METATYPE_GENRE); - if (strlen (info.comment) > 0) - ADD (info.comment, EXTRACTOR_METATYPE_COMMENT); - if (info.track_number != 0) - { - snprintf(track, - sizeof(track), "%u", info.track_number); - ADD (track, EXTRACTOR_METATYPE_TRACK_NUMBER); - } - ret = mp3parse ((const unsigned char *) data, size, proc, proc_cls); -FINISH: - free (info.title); - free (info.year); - free (info.album); - free (info.artist); - free (info.comment); - return ret; -} - /* end of mp3_extractor.c */
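Note on the change above: the two ideas this commit combines can be sketched in isolation. The first is the page-aligned offset at which the tail of a large file is mapped. The second is reading the fixed 128-byte ID3v1 record from the end of whatever buffer the plugin receives. The sketch below is a minimal illustration and not the plugin code. `struct id3v1_fields`, `tail_offset`, `copy_trimmed` and `parse_id3v1` are hypothetical names introduced here, and the read limit and page size are passed as parameters because the actual value of MAX_READ is not shown in this diff. Bytes are handled as `unsigned char` so that a track or genre byte above 127 is not sign-extended on platforms where `char` is signed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration type; not libextractor's id3tag. */
struct id3v1_fields
{
  char title[31];
  char artist[31];
  char album[31];
  char year[5];
  unsigned int track;   /* 0 if the tag is plain ID3v1 (no track byte) */
  unsigned int genre;   /* raw genre byte, an index into a genre table */
};

/* Same arithmetic as the diff: the first page boundary strictly after
   (file_size - max_read), so the mapped tail is shorter than max_read. */
static long long
tail_offset (long long file_size, long long max_read, long long page)
{
  return (1 + (file_size - max_read) / page) * page;
}

/* Copy a fixed-width field, terminate it, strip trailing whitespace. */
static void
copy_trimmed (char *dst, const unsigned char *src, size_t n)
{
  size_t len;

  memcpy (dst, src, n);
  dst[n] = '\0';
  len = strlen (dst);
  while ((len > 0) && isspace ((unsigned char) dst[len - 1]))
    dst[--len] = '\0';
}

/* Parse the ID3v1 record in the last 128 bytes of buf.
   Returns 0 on success, 1 if no tag is present. */
static int
parse_id3v1 (const unsigned char *buf, size_t size,
             struct id3v1_fields *out)
{
  const unsigned char *pos;

  if (size < 128)
    return 1;
  pos = buf + size - 128;
  if (0 != memcmp (pos, "TAG", 3))
    return 1;
  copy_trimmed (out->title, pos + 3, 30);
  copy_trimmed (out->artist, pos + 33, 30);
  copy_trimmed (out->album, pos + 63, 30);
  copy_trimmed (out->year, pos + 93, 4);
  /* The comment field starts at byte 97.  ID3v1.1 stores a zero in its
     byte 28 and the track number in its byte 29. */
  if ((pos[97 + 28] == 0) && (pos[97 + 29] != 0))
    out->track = pos[97 + 29];
  else
    out->track = 0;
  out->genre = pos[127];
  return 0;
}
```

For example, assuming a 10000-byte file, a 4096-byte read limit and 1024-byte pages, `tail_offset` yields 6144, so the mapped tail is 3856 bytes long and still contains the final 128-byte record that `parse_id3v1` inspects.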