libextractor
============

libextractor is a simple library for keyword extraction.
It does not support every file format, but it provides a
simple plugin mechanism that lets you quickly add
extractors for additional formats, even without
recompiling libextractor. libextractor typically
ships with about a dozen helper libraries that
extract keywords from common file types.



extract
=======

extract is a simple command-line interface to libextractor.



Writing plugins
===============


If you want to write your own extractor for some file type,
all you need to do is write a small library that implements
a single function with this signature:


    KeywordList * <libraryname>_extract(const char * filename,
                                        char * data,
                                        size_t size,
                                        KeywordList * prev);

where <libraryname> is the name of the library file that you
will tell libextractor to load, minus the suffix.  For example,
if you link your extractor into a file called 'myextractor.so',
the function above should be called 'myextractor_extract'.

Here, filename is the name of the file, data is a pointer
to the contents of the file and size is the size of the file
in bytes.  The extract function must prepend the keywords that
it finds to the linked list 'prev' and return the new head.
The library must allocate (malloc) both the entry in the
keyword list and the memory for the keyword string, since both
will be freed by libextractor once the application calls
freeKeywords.

An example implementation can be found in mp3extractor.c.
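
For a smaller illustration of the contract described above, here is a
minimal sketch of a plugin built as 'myextractor.so'.  Several parts of
it are assumptions made for the example, not taken from libextractor:
the two-field KeywordList struct is a stand-in for the real definition
in libextractor's header (which may have more fields), add_keyword is a
hypothetical helper, and the "DEMO" signature and "demo-format" keyword
are made up.

```c
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: minimal stand-in for the list type.  A real plugin should
   use the definition from libextractor's header instead. */
typedef struct KeywordList {
  char * keyword;
  struct KeywordList * next;
} KeywordList;

/* Hypothetical helper: prepend a malloc'ed copy of 'keyword' to 'prev'.
   Returns the new head, or 'prev' unchanged if allocation fails. */
static KeywordList * add_keyword(const char * keyword,
                                 KeywordList * prev) {
  KeywordList * entry;
  size_t len;

  if (keyword == NULL)
    return prev;
  entry = malloc(sizeof(KeywordList));
  if (entry == NULL)
    return prev;
  len = strlen(keyword) + 1;
  entry->keyword = malloc(len);
  if (entry->keyword == NULL) {
    free(entry);
    return prev;
  }
  memcpy(entry->keyword, keyword, len);
  entry->next = prev;  /* prepend: the new entry becomes the head */
  return entry;
}

/* Entry point; the name matches the library file 'myextractor.so'. */
KeywordList * myextractor_extract(const char * filename,
                                  char * data,
                                  size_t size,
                                  KeywordList * prev) {
  static const char magic[] = "DEMO";  /* made-up file signature */

  (void) filename;  /* unused in this sketch */
  if (data == NULL || size < sizeof(magic) - 1)
    return prev;    /* nothing found: return the list unchanged */
  if (memcmp(data, magic, sizeof(magic) - 1) != 0)
    return prev;
  return add_keyword("demo-format", prev);
}
```

Both the list entry and the keyword string are allocated with malloc,
so they can be released later with free.  When the file does not match,
the function returns 'prev' untouched, so keywords found by other
extractors are preserved.  With gcc on a typical Linux system, such a
file could probably be built with something like
'gcc -shared -fPIC -o myextractor.so myextractor.c'; the exact flags
depend on your platform.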



Notes
=====

libextractor contains some very large C files.  gcc can
easily use more than 100 MB of memory to compile them.  With
that much memory available, libextractor compiles in about
a minute.  If you have less, you may want to consider using
the prebuilt binaries instead.

On Mac OS X, libextractor will avoid using GCC 3.1 because
of problems compiling one of the extractors.  GCC 3.3 and
2.95.2 are known to work well, so libextractor will
first look for 3.3 (by attempting to run gcc-3.3, cpp-3.3,
and g++-3.3) and then for 2.95.2 (by attempting to run gcc2
and g++2).
