[ale] Mining PDFs

Kevin O'Neill Stoll kevinostoll at yahoo.com
Fri Dec 6 17:41:25 EST 2002


It seems that my solution to this problem comes down to the
age-old question of time versus money.


If I have lots of money, there seem to be quite a few
products available that would let me index and catalog an
archive of PDFs. They have some limitations with scanned
PDFs, though, since those pages are stored as images rather
than text.

If I have lots of time, my solution leans toward Linux and
xpdf: convert the PDFs to text with xpdf's pdftotext utility,
use a script to load that text into a database table, and
then build a search application that performs a full-text
search on the table.
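One way the xpdf route could look, sketched in Python under some assumptions: the `pdftotext` command from xpdf is on the PATH, SQLite serves as the database, and the table names (`documents`, `postings`) and helper functions are hypothetical. The search side uses a plain hand-rolled inverted index, swapped in here so the sketch doesn't depend on SQLite's optional full-text extensions. It is not a finished application.

```python
import re
import sqlite3
import subprocess
from pathlib import Path

# ASCII-only word tokenizer, kept simple for the sketch.
WORD_RE = re.compile(r"[a-z0-9]+")


def pdf_to_text(pdf_path: Path) -> str:
    """Extract text with xpdf's pdftotext; '-' sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table of documents plus a term -> document index."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id   INTEGER PRIMARY KEY,
            path TEXT UNIQUE NOT NULL,   -- each file is indexed once
            body TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS postings (
            term   TEXT NOT NULL,
            doc_id INTEGER NOT NULL REFERENCES documents(id),
            PRIMARY KEY (term, doc_id)
        );
    """)
    return conn


def tokenize(text: str) -> set:
    """Lowercase the text and return its distinct words."""
    return set(WORD_RE.findall(text.lower()))


def add_document(conn: sqlite3.Connection, path: str, text: str) -> None:
    """Store the extracted text and record which terms occur in it."""
    cur = conn.execute(
        "INSERT INTO documents (path, body) VALUES (?, ?)", (path, text)
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT OR IGNORE INTO postings (term, doc_id) VALUES (?, ?)",
        [(term, doc_id) for term in tokenize(text)],
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str) -> list:
    """Return paths of documents that contain every word in the query."""
    terms = sorted(tokenize(query))
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    rows = conn.execute(
        f"""SELECT d.path FROM documents d
            JOIN postings p ON p.doc_id = d.id
            WHERE p.term IN ({placeholders})
            GROUP BY d.id
            HAVING COUNT(DISTINCT p.term) = ?
            ORDER BY d.path""",
        (*terms, len(terms)),
    ).fetchall()
    return [row[0] for row in rows]


def index_directory(conn: sqlite3.Connection, folder: Path) -> None:
    """Run pdftotext over every *.pdf in a folder and index the results."""
    for pdf in sorted(folder.glob("*.pdf")):
        add_document(conn, str(pdf), pdf_to_text(pdf))
```

Typical use would be `conn = open_index("pdfs.db")`, then `index_directory(conn, Path(...))` on the archive folder, then `search(conn, "some words")`. Scanned, image-only PDFs will come back from pdftotext with little or no text, so they would need OCR before this indexing step, which is the same limitation the commercial products have.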


That's what I came up with; I'm open to any comments or
critiques. It may not be the most elaborate solution, but it
meets all of the requirements my supervisor asked for and is
fairly savvy.


Thanks for the help.

=====
Kevin Stoll
http://kevinstoll.org

OpenSource Software...FREE!
Angering Bill Gates...priceless.
============================================================

_______________________________________________
Ale mailing list
Ale at ale.org
http://www.ale.org/mailman/listinfo/ale