[ale] Document Imaging under Linux

Jeff Hubbs hbbs at comcast.net
Mon Sep 15 08:51:08 EDT 2003


For integrity purposes, you aren't supposed to rely on any storage
that's not a straight representation of the original paper.  You can get
away with one-bit depth and lossless compression.  If you stored only OCR
output, a one-character error could trip you up in court - "That's your name!"
"No, it isn't!"
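
On the file-size question quoted below, here is a rough back-of-the-envelope
sketch of what one-bit scans come to.  The page size (US Letter), the 300 dpi
resolution, and especially the 15:1 compression ratio are my assumptions for
illustration, not measurements; real Group 4 results vary a lot with how much
ink is on the page.  The function names are made up for this example.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Raw raster size in bytes, with each scanline padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # round up to full bytes
    return row_bytes * height_px


def estimate_document_bytes(pages, ratio, width_in=8.5, height_in=11.0, dpi=300):
    """Estimated stored size for a multi-page one-bit scan.

    `ratio` is an ASSUMED lossless compression ratio (e.g. 15 means 15:1);
    it is a placeholder to plug your own measured figure into.
    """
    return pages * uncompressed_bytes(width_in, height_in, dpi) // ratio


if __name__ == "__main__":
    per_page = uncompressed_bytes(8.5, 11.0, 300)
    print("uncompressed bytes per page:", per_page)
    print("5 pages at an assumed 15:1:", estimate_document_bytes(5, 15))
```

That works out to roughly 1 MB per page before compression, so a 5-page
document lands in the low hundreds of kilobytes if the assumed ratio holds.
Scan a few representative pages and measure before sizing any storage.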

It would be reasonable (and actually advantageous) to incorporate PDF
export, but I wouldn't store the documents themselves as PDF.

- Jeff

On Mon, 2003-09-15 at 08:35, John Wells wrote:
> Matthew,
> Thanks for the reply.  So you just scanned the docs in as normal
> images (i.e., not OCR) and saved as TIFF?  Any idea what sort of file
> size I can expect on, say, a standard 5-page doc?
>  
> Thanks!
> John
>  
>         -----Original Message-----
>         From: Matthew Brown [mailto:matthew.brown at cordata.com]
>         Sent: Monday, September 15, 2003 8:31 AM
>         To: Atlanta Linux Enthusiasts
>         Subject: RE: [ale] Document Imaging under Linux
>         
>         
>         I haven't looked at this in some time, and never on Linux, but
>         when I did this for a living, we always scanned into TIFF-CALS
>         Group 4, Black and White.  It offered reasonable compression,
>         multi-page capability, and excellent portability.  I also
>         know a lot of folks use PDF, but I was under the impression it
>         created much larger files... just an impression, don't recall
>         why.
>         
>         My OCR experience is similar to Geoffrey's.
>         
>         
>         
>         
>         On Mon, 2003-09-15 at 07:21, John Wells wrote: 
>         > My thoughts exactly.  So I wonder...if I'm not using OCR, is pdf the best
>         > (and smallest file size) option?
>         > 
>         > Thanks,
>         > 
>         > John
>         > 
>         > 
>         > 
>         > -----Original Message-----
>         > From: Geoffrey [mailto:esoteric at 3times25.net]
>         > Sent: Monday, September 15, 2003 7:12 AM
>         > To: Atlanta Linux Enthusiasts
>         > Subject: Re: [ale] Document Imaging under Linux
>         > 
>         > 
>         > James P. Kinney III wrote:
>         > > xsane + ADF scanner + script-foo + database + webserver + time
>         > > 
>         > > Save the images as a pdf unless you plan on trying OCR, then scan using
>         > > black and white and save as tiff.
>         > 
>         > All the free Linux OCR software is pretty green.  I've not been 
>         > impressed.  Then again, I've never found anything that was perfect in 
>         > any environment.  With the massive number of fonts and such out there, 
>         > which continues to grow, it's a very difficult problem.
>         
>         -- 
>         Best regards,
>         
>         Matthew Brown
>         CorData, Inc.
>         Office:	770-795-0089
>         Fax:	404-806-4855
> 
> 
-- 
Jeff Hubbs <hbbs at comcast.net>


