[ale] Mining PDF's

synco gibraldter synco at xodarap.net
Thu Dec 5 14:29:15 EST 2002



There is a utility for this; more info at:
www.linux4smallbiz.com/Members/Eugene%20von/poweruser/page3/portalarticle_view

Also, `strings file.pdf | grep /Title` seems to turn up the title, and sometimes some keywords, from the PDF's metadata.
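
The same idea as that one-liner, sketched in Python under stated assumptions: it scans the raw bytes of the file for a `/Title (...)` literal string, the way `/Title` appears in an uncompressed document information dictionary. It will miss titles stored as hex strings (e.g. `<FEFF...>`) or inside compressed streams, and it does not handle unescaped nested parentheses. `pdf_titles` is a hypothetical helper name, not part of any library.

```python
import re
import sys

# Assumption: the title is a PDF literal string, i.e. "/Title (...)",
# possibly containing escapes like "\)" but no unescaped nested parens.
TITLE_RE = re.compile(rb"/Title\s*\(((?:\\.|[^\\)])*)\)")

def pdf_titles(data: bytes) -> list:
    """Return every literal-string /Title value found in raw PDF bytes."""
    titles = []
    for match in TITLE_RE.finditer(data):
        # Undo only the simplest escapes: \( \) and \\
        raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
        # Latin-1 is an approximation of PDFDocEncoding, good enough for ASCII titles.
        titles.append(raw.decode("latin-1"))
    return titles

if __name__ == "__main__":
    # Usage: python pdf_titles.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            for title in pdf_titles(fh.read()):
                print(f"{path}: {title}")
```

Like the `strings | grep` approach, this only reads metadata; it does not extract the body text of the PDF.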

------------------
synco at xodarap.net
the xodarap network [what you thought?]
atl,georgia
http://news.xodarap.net
irc://irc.xodarap.net

On Thu, 5 Dec 2002, Kevin O'Neill Stoll wrote:

> Hey all,
>
> I need to implement a search feature that can
> mine a URL directory structure containing PDFs. I was
> hoping someone knew of an open source project that
> has already done some of the grunt work; otherwise, I'm open
> to ideas on how to accomplish this task.
>
> In mining the PDFs, the search feature needs to grab
> a title, file size, a summary, and a relevance score based on a text
> search (i.e., if I search for 'dog', all PDFs containing
> the word 'dog' would be returned). I'm just not sure
> how to get the text out of a PDF.
>
> Anywho, thanks in advance.
>
>
>
> =====
> Kevin Stoll
> http://kevinstoll.org
>
> OpenSource Software...FREE!
> Angering Bill Gates...priceless.
> ============================================================
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://www.ale.org/mailman/listinfo/ale
>
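
A minimal sketch of the search step described in the quoted request, assuming the plain text has already been pulled out of each PDF by some separate tool (extraction is not shown here). Each document is passed in as a hypothetical `(name, size_in_bytes, text)` tuple; relevance is assumed to be the number of whole-word, case-insensitive matches, and the summary is the text around the first match. `search` and `Hit` are illustrative names only.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    name: str      # file name or URL of the PDF
    size: int      # file size in bytes, as supplied by the caller
    score: int     # number of whole-word, case-insensitive matches
    summary: str   # text surrounding the first match

def search(docs, term, context=40):
    """docs: iterable of (name, size_in_bytes, extracted_text) tuples."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for name, size, text in docs:
        matches = list(pattern.finditer(text))
        if not matches:
            continue  # documents without the term are not returned
        first = matches[0]
        start = max(0, first.start() - context)
        end = min(len(text), first.end() + context)
        # Collapse whitespace so the summary reads as one line.
        summary = " ".join(text[start:end].split())
        hits.append(Hit(name, size, len(matches), summary))
    # Highest match count first; ties keep their input order.
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits
```

Ranking by raw match count is the simplest possible relevance measure; it favors long documents that repeat the term, so a real search tool would likely weight matches more carefully.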
