<div dir="ltr"><div>Hmm....<br><br></div><div>This &lt;pseudo-code&gt; is off the top of my head, so there are probably some serious issues with it.<br><br></div><div># make sure the sums file exists before the first grep<br></div><div>touch &lt;file_of_sums&gt;<br></div><div># -print0 / read -d '' keeps file names with spaces intact<br></div><div>find /my_dir -type f -print0 | while IFS= read -r -d '' file<br></div><div>do<br></div>
<div> # md5sum prints "hash  filename"; keep only the hash<br></div><div> MD5=$(md5sum "$file" | cut -d ' ' -f 1)<br></div><div> # hash already recorded, so this file is a duplicate<br></div><div> if grep -qxF "$MD5" &lt;file_of_sums&gt;<br></div><div> then<br></div><div> rm -- "$file"<br></div><div> else<br></div>
<div> echo "$MD5" &gt;&gt; &lt;file_of_sums&gt;<br></div><div> fi<br></div><div>done<br><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
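<div>Since the loop above deletes files as it goes, a report-only pass may be a safer first step. What follows is a minimal sketch, assuming bash 4+ (for associative arrays) and GNU find, sort and md5sum. list_dupes is a made-up helper name, not an existing tool; it only prints each duplicate next to the first file seen with the same checksum and removes nothing.<br><br></div>
<pre>
```shell
#!/usr/bin/env bash
# Report-only sketch: print duplicate files instead of deleting them.
# list_dupes is a hypothetical helper name used for illustration.
list_dupes() {
    local dir=$1 file sum
    # maps each checksum to the first path seen with that checksum
    declare -A seen
    # sort -z gives a stable order, so the "first" file is predictable
    find "$dir" -type f -print0 | sort -z | while IFS= read -r -d '' file
    do
        # md5sum prints "hash  filename"; keep only the hash
        sum=$(md5sum "$file" | cut -d ' ' -f 1)
        if [ -n "${seen[$sum]:-}" ]
        then
            printf '%s duplicates %s\n' "$file" "${seen[$sum]}"
        else
            seen[$sum]=$file
        fi
    done
}
```
</pre>
<div>Calling it as list_dupes /my_dir prints one line per duplicate, which can be reviewed before anything is removed.<br><br></div>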
On Fri, Oct 18, 2013 at 12:59 PM, JD <span dir="ltr">&lt;<a href="mailto:jdp@algoloma.com" target="_blank">jdp@algoloma.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Slashdot had a question about this 1-2 yrs ago. Lots of people suggested<br>
scripting it, others pointed out some C code on sourceforge.<br>
<br>
I had a few hrs free that day and wrote some Perl (200+ LOC). Use it all the<br>
time, but I'd probably go with the C tool for any very large datasets. Mine<br>
doesn't automatically remove anything and is far from perfect, that is certain.<br>
It is relatively fast on most types of files, however.<br>
<br>
On 10/18/2013 12:34 PM, Calvin Harrigan wrote:<br>
> Good Afternoon,<br>
> I'm looking for a little advice/recommendation on file de-duplication<br>
> software. I have a disk filled with files that most certainly have<br>
> duplicates. What's the best way to get rid of the duplicates? I'd like to<br>
> check deeper than just file name/date/size. If possible I'd like to check<br>
> content (checksum?). Are you aware of anything like that? Linux or Windows is<br>
> fine. Thanks<br>
_______________________________________________<br>
Ale mailing list<br>
<a href="mailto:Ale@ale.org">Ale@ale.org</a><br>
<a href="http://mail.ale.org/mailman/listinfo/ale" target="_blank">http://mail.ale.org/mailman/listinfo/ale</a><br>
See JOBS, ANNOUNCE and SCHOOLS lists at<br>
<a href="http://mail.ale.org/mailman/listinfo" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>
</blockquote></div><br><br clear="all"><br>-- <br><div><a href="http://leamhall.blogspot.com/" target="_blank">Mind on a Mission</a></div>
</div>