[ale] Document Imaging under Linux
Jeff Hubbs
hbbs at comcast.net
Mon Sep 15 08:51:08 EDT 2003
For integrity purposes, you aren't supposed to rely on any storage
that's not a straight representation of the original paper. You can get
away with one-bit depth and lossless compression. If you OCRed stuff, a
one-character error could trip you up in court - "That's your name!"
"No, it isn't!"
It would be reasonable (and actually advantageous) to incorporate PDF
export, but I wouldn't store the documents that way.
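
On the file-size question further down the thread, here is a rough
back-of-envelope sketch, not a measurement. It assumes US Letter pages
scanned at 300 dpi in one-bit depth, and the compression ratios are
placeholder guesses: what Group 4 actually achieves depends heavily on
what is on the page. The function names are made up for illustration.

```python
# Rough size estimate for a document stored as 1-bit (bilevel) page images.
# Page size, resolution, and compression ratios are illustrative assumptions.

def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed bytes for one 1-bit page, each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8  # 8 pixels per byte, rounded up
    return bytes_per_row * height_px


def estimate_document_bytes(pages, ratio, width_in=8.5, height_in=11.0, dpi=300):
    """Estimated stored size for `pages` pages at an assumed compression ratio."""
    return pages * raw_bilevel_bytes(width_in, height_in, dpi) / ratio


if __name__ == "__main__":
    raw = raw_bilevel_bytes(8.5, 11.0, 300)
    print(f"raw 1-bit page: {raw} bytes")
    # Assumed lossless compression ratios; real results vary with page content.
    for ratio in (10, 20, 30):
        size = estimate_document_bytes(5, ratio)
        print(f"5 pages at {ratio}:1 -> about {size / 1024:.0f} KiB")
```

Under those assumptions an uncompressed page is about 1 MB, so the
answer for a 5 page doc hinges almost entirely on the compression ratio
you actually get. Scanning a few representative pages and checking the
real file sizes will tell you more than any estimate.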
- Jeff
On Mon, 2003-09-15 at 08:35, John Wells wrote:
> Matthew,
> Thanks for the reply. So you just scanned the docs in as normal
> images (i.e., not OCR) and saved as TIFF? Any idea what sort of file
> size I can expect there on, say, a standard 5 page doc?
>
> Thanks!
> John
>
> -----Original Message-----
> From: Matthew Brown [mailto:matthew.brown at cordata.com]
> Sent: Monday, September 15, 2003 8:31 AM
> To: Atlanta Linux Enthusiasts
> Subject: RE: [ale] Document Imaging under Linux
>
>
> I haven't looked at this in some time, and never on Linux, but
> when I did this for a living, we always scanned into TIFF-CALS
> Group 4, Black and White. It offered reasonable compression,
> multi-page capability, and excellent portability. I also
> know a lot of folks use PDF, but I was under the impression it
> created much larger files... just an impression, don't recall
> why.
>
> My OCR experience is similar to Geoffrey's.
>
>
>
>
> On Mon, 2003-09-15 at 07:21, John Wells wrote:
> > My thoughts exactly. So I wonder... if I'm not using OCR, is PDF the best
> > (and smallest file size) option?
> >
> > Thanks,
> >
> > John
> >
> >
> >
> > -----Original Message-----
> > From: Geoffrey [mailto:esoteric at 3times25.net]
> > Sent: Monday, September 15, 2003 7:12 AM
> > To: Atlanta Linux Enthusiasts
> > Subject: Re: [ale] Document Imaging under Linux
> >
> >
> > James P. Kinney III wrote:
> > > xsane + ADF scanner + script-foo + database + webserver + time
> > >
> > > Save the images as a PDF unless you plan on trying OCR, then scan using
> > > black and white and save as TIFF.
> >
> > All the free Linux OCR software is pretty green. I've not been
> > impressed. Then again, I've never found anything that was perfect in
> > any environment. With the massive number of fonts and such out there,
> > which continues to grow, it's a very difficult problem.
>
> --
> Best regards,
>
> Matthew Brown
> CorData, Inc.
> Office: 770-795-0089
> Fax: 404-806-4855
>
>
> ______________________________________________________________________
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://www.ale.org/mailman/listinfo/ale
--
Jeff Hubbs <hbbs at comcast.net>