libextractor

GNU libextractor
man_extract.1 (3510B)


.TH EXTRACT 1 "Aug 7, 2012" "libextractor 0.7.0"
.\" $Id
.SH NAME
extract
\- determine meta\-information about a file
.SH SYNOPSIS
.B extract
[
.B \-bgihLmnvV
]
[
.B \-l
.I libraries
]
[
.B \-p
.I type
]
[
.B \-x
.I type
]
.I file
\&...
.br
.SH DESCRIPTION
This manual page documents version 0.7.0 of the
.B extract
command.
.PP
.B extract
tests each file specified in the argument list in an attempt to infer meta\-information from it.  Each file is subjected to the meta\-data extraction libraries from
.IR libextractor .
.PP
libextractor classifies meta\-information (also referred to as keywords) into types. A list of all types can be obtained with the
.B \-L
option.
.SH OPTIONS
.TP 8
.B \-b
Display the output in BibTeX format.
.TP 8
.B \-g
Use grep\-friendly output (all keywords on a single line for each file).  Use the verbose option (\fB\-V\fP) to print the filename first, followed by the keywords.  Use the verbose option twice to also display the keyword types.  This option will not print keyword types or non\-textual metadata.
.TP 8
.B \-h
Print a brief summary of the options.
.TP 8
.B \-i
Run plugins in\-process (for debugging).  By default, each plugin is run in its own process.
.TP 8
.BI \-l " libraries"
Use the specified libraries to extract keywords. The general format of libraries is
.I [[\-]LIBRARYNAME[:[\-]LIBRARYNAME]*]
where LIBRARYNAME is a libextractor\-compatible library, typically of the form
.IR jpeg .
A minus sign before the library name indicates that this library should be removed from the existing list.  To run only a few selected plugins, use
.B \-l
in combination with
.BR \-n .
.TP 8
.B \-L
Print a list of all known keyword types.
.TP 8
.B \-m
Load the file into memory and perform extraction from memory (for debugging).
.TP 8
.B \-n
Do not use the default set of extractors (typically all standard extractors, currently mp3, ogg, jpg, gif, png, tiff, real, html, pdf and mime\-types); use only the extractors specified with the
.B \-l
option.
.TP 8
.BI \-p " type"
Print only the keywords matching the specified type. By default, all keywords that are found and not removed as duplicates are printed.
.TP 8
.B \-v
Print the version number and exit.
.TP 8
.B \-V
Be verbose.  This option can be specified multiple times to increase verbosity further.
.TP 8
.BI \-x " type"
Exclude keywords of the specified type from the output. By default, all keywords that are found and not removed as duplicates are printed.
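The library list format accepted by the -l option, [[-]LIBRARYNAME[:[-]LIBRARYNAME]*], can be illustrated with a small Python sketch. This is not code from libextractor: the function name apply_library_spec is hypothetical, and the handling of duplicates, ordering and empty entries is an assumption made only for illustration. It shows the rule stated above: entries are separated by colons, and a leading minus removes that library from the existing list.

```python
def apply_library_spec(current, spec):
    """Return a new library list after applying a spec such as '-jpeg:pdf'.

    Hypothetical helper: illustrates the documented format only, not the
    actual parsing code in libextractor.
    """
    result = list(current)
    for token in spec.split(":"):
        if not token:
            continue  # assumption: empty entries are ignored
        if token.startswith("-"):
            name = token[1:]
            # A leading minus removes the library from the existing list.
            result = [lib for lib in result if lib != name]
        elif token not in result:
            # assumption: a library already in the list is not added twice
            result.append(token)
    return result


# Start from a default set, drop jpeg, add pdf.
print(apply_library_spec(["mp3", "jpeg", "png"], "-jpeg:pdf"))
# -> ['mp3', 'png', 'pdf']

# Starting from an empty list mirrors combining -n with -l png.
print(apply_library_spec([], "png"))
# -> ['png']
```

Starting from an empty list corresponds to the -n option, which is why -n together with -l runs only the selected plugins.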
.SH SEE ALSO
.BR libextractor (3)
\- description of the libextractor library
.br
.SH EXAMPLES
.nf
$ extract test/test.jpg
comment \- (C) 2001 by Christian Grothoff, using gimp 1.2 1
mimetype \- image/jpeg

$ extract \-V \-x comment test/test.jpg
Keywords for file test/test.jpg:
mimetype \- image/jpeg

$ extract \-p comment test/test.jpg
comment \- (C) 2001 by Christian Grothoff, using gimp 1.2 1

$ extract \-nV \-l png.so \-p comment test/test.jpg test/test.png
Keywords for file test/test.jpg:
Keywords for file test/test.png:
comment \- Testing keyword extraction
.fi
.SH LEGAL NOTICE
libextractor and the extract tool are released under the GPL.  libextractor is a GNU package.
.SH BUGS
A couple of file formats (on the order of 10^3) are not recognized...
.SH AUTHORS
.B extract
was originally written by Christian Grothoff <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use <libextractor@gnu.org> to contact the current maintainer(s).
.SH AVAILABILITY
You can obtain the original author's latest version from http://www.gnu.org/software/libextractor/