[ale] comparing files

Michelangelo Grigni mic at mathcs.emory.edu
Thu Jul 18 12:59:51 EDT 2002


x3 writes:
> I have 100 text files that are all about 100Kb in size. The data in the files 
> is supposed to be sequential - however, in my haste to backup the files from 
> a dying system, I copied repetitive data in some of them.
> ... Anyone know of a program that can do this in Linux (or even Win)?

To find and report common passages among many text files,
try a plagiarism detector such as "copyfind" at:

  http://plagiarism.phys.virginia.edu/home.html

In the usual application the files are student writing or
programming assignments, so they would tend to be shorter
than your files; I am not sure whether this will become a
performance issue.
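
If you would rather script a rough check yourself, here is a
minimal Python sketch of the general idea: split each file
into overlapping runs of n words and report which pairs of
files share runs. This is not a description of how copyfind
works. The function names, the choice of n=6, and the
whitespace-only word splitting are all assumptions made for
illustration.

```python
from collections import defaultdict
from pathlib import Path


def shingles(text, n=6):
    """Yield every run of n consecutive whitespace-separated words."""
    words = text.split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def shared_passages(texts, n=6):
    """Map each n-word run to the set of names containing it,
    keeping only runs that occur under more than one name."""
    seen = defaultdict(set)
    for name, text in texts.items():
        for run in shingles(text, n):
            seen[run].add(name)
    return {run: names for run, names in seen.items() if len(names) > 1}


def overlap_counts(texts, n=6):
    """Count the shared n-word runs for every pair of names."""
    counts = defaultdict(int)
    for names in shared_passages(texts, n).values():
        ordered = sorted(names)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                counts[(a, b)] += 1
    return dict(counts)


def report(paths, n=6):
    """Read the given files and print file pairs, most overlap first."""
    texts = {str(p): Path(p).read_text(errors="replace") for p in paths}
    pairs = sorted(overlap_counts(texts, n).items(), key=lambda kv: -kv[1])
    for (a, b), count in pairs:
        print(f"{count:8d}  {a}  {b}")
```

Calling report() with a list of your file names would print
one line per overlapping pair. The sketch keeps every word
run in memory and only catches repeats within whitespace
differences, so treat its output as a hint about where to
look, not as a final answer.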

---
This message has been sent through the ALE general discussion list.
See http://www.ale.org/mailing-lists.shtml for more info. Problems should be 
sent to listmaster at ale dot org.





