[ale] Mining PDFs

Swantje Willms swillms at mail.sis.pitt.edu
Thu Dec 5 16:31:09 EST 2002


Thanks for the question and the answer :-)
This is cool; I wouldn't have thought of it. I'll be able to use this for
my PhD project :-)
In the past (before my Linux days) I used the Adobe email method, which is
cumbersome and not feasible for this project.
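
For the archive, here is a minimal sketch of the `strings file.pdf|grep /Title`
idea quoted below, extended to print the file size as well. It is an
illustration under assumptions, not a finished tool: `grep -a` stands in for
`strings`, the function names pdf_title and pdf_size are made up for this
example, and it only finds a /Title entry that is stored as uncompressed plain
text, so many PDFs will show no title. It does not extract body text, so it
does not cover the full-text search asked about.

```shell
#!/bin/sh
# Sketch only: print name, size, and any plain-text /Title entry per PDF.
# Assumes the /Title string is uncompressed and unencrypted.

pdf_title() {
    # grep -a treats the binary file as text (stand-in for strings | grep);
    # -o prints only the matching "/Title (...)" part; keep the first hit.
    # Titles containing an escaped ")" would be cut short by this pattern.
    grep -a -o '/Title *([^)]*)' "$1" | head -n 1
}

pdf_size() {
    # Size in bytes, with any padding spaces from wc removed.
    wc -c < "$1" | tr -d ' '
}

# One tab-separated line per file given on the command line.
for f in "$@"; do
    printf '%s\t%s bytes\t%s\n' "$f" "$(pdf_size "$f")" "$(pdf_title "$f")"
done
```

Saved under a hypothetical name such as pdfinfo.sh, it could be run as
`sh pdfinfo.sh *.pdf`.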

Swantje

On Thu, 5 Dec 2002, synco gibraldter wrote:

> 
> there is a utility... info at:
> www.linux4smallbiz.com/Members/Eugene%20von/poweruser/page3/portalarticle_view
> 
> also, `strings file.pdf | grep /Title` seems to give some keywords.
> 
> ------------------
> synco at xodarap.net
> the xodarap network [what you thought?]
> atl,georgia
> http://news.xodarap.net
> irc://irc.xodarap.net
> 
> On Thu, 5 Dec 2002, Kevin O'Neill Stoll wrote:
<snip>
> >
> > In mining the PDFs, the search functionality needs to grab
> > a title, file size, a summary, and relevance based on a text
> > search (e.g., if I search for 'dog', all PDFs containing the
> > word 'dog' would be returned). I'm just not sure
> > how to get the text out of a PDF.
> >
> > Anywho, thanks in advance.
> >
> >
> >
> > =====
> > Kevin Stoll
> > http://kevinstoll.org
> >
> > OpenSource Software...FREE!
> > Angering Bill Gates...priceless.
> > ============================================================
> >
> 
> 

_______________________________________________
Ale mailing list
Ale at ale.org
http://www.ale.org/mailman/listinfo/ale