[ale] File De-duplication

Jim Kinney jim.kinney at gmail.com
Fri Oct 18 18:32:44 EDT 2013


Oh yes! A de-dupe process can totally eat IO for hours. ZFS supports dedupe
at the block level, as do several SAN devices. It will gulp time and storage
to keep its table of checksums and block data, plus a lookup on every
block write. Still, it can save more space and time than it costs.

Oh. Don't even think about block level dedupe on encrypted drives. The
reason is left as an exercise for the reader :-)
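
To make the cost concrete, here is a toy sketch in Python of what block-level
dedupe has to do on each write. It is not how ZFS or any SAN implements it;
the DedupStore class, the 4096-byte block size, and the choice of SHA-256 are
all assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class DedupStore:
    """Toy in-memory block store that keeps one copy of each distinct block."""

    def __init__(self):
        self.blocks = {}    # checksum -> block bytes (the table that costs storage)
        self.refcount = {}  # checksum -> number of references to that block

    def write(self, data):
        """Store data and return the list of block checksums describing it."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # This hash and lookup are paid on every single block write.
            if digest not in self.blocks:
                self.blocks[digest] = block
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its list of block checksums."""
        return b"".join(self.blocks[digest] for digest in recipe)

    def stored_bytes(self):
        """Bytes actually held, counting each distinct block once."""
        return sum(len(block) for block in self.blocks.values())
```

Writing the same 12 KiB buffer twice through write() stores only its distinct
blocks, but the checksum table and the per-block lookup are paid on every
write, which is where the IO and memory go.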

On Oct 18, 2013 5:02 PM, "Jeff Hubbs" <jhubbslist at att.net> wrote:

> When I was running a previous employer's file server (that I built on
> Gentoo, btw, referencing the other thread), I would pipe find output to
> xargs to md5sum to sort so that I could get a text file that I could
> eyeball to see where the dupes tended to be.  In my view it wasn't
> a big deal until you had, like, ISO images that a dozen or more people had
> copies of - if that's going on, there needs to be some housecleaning and
> organization taking place.  I suppose if you wanted you could script
> something that moved dupes to a common area and generated links in place of
> the dupes, but I'm not sure that wouldn't introduce more problems than it
> solves.
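
A rough Python take on the find | xargs | md5sum | sort approach above, for
anyone who wants groups of identical files rather than a sorted list to
eyeball. Assumptions: find_duplicates and file_digest are names made up here,
SHA-256 stands in for md5sum, and files are bucketed by size first so that
only same-sized files get hashed. It only reports duplicates; it does not
delete or link anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # Pass 1: bucket regular files by size; a unique size cannot be a dupe.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_content = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_content[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_content.values() if len(paths) > 1]


if __name__ == "__main__":
    import sys

    for group in find_duplicates(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("\n".join(group))
        print()
```

Run it with a directory as the only argument; each blank-line-separated group
in the output is one set of files with the same content.
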
>
> As for auto-de-duping filesystems - which I suppose involves some sort of
> abstraction between what the OS thinks are files and what actually goes on
> disk - I wonder if there wouldn't wind up being some rather casual disk
> operations that could set off a whole flurry of r/w activity and plug up
> the works for a little while. Fun to experiment with, I'm sure.
>
> On 10/18/13 12:34 PM, Calvin Harrigan wrote:
>
>> Good Afternoon,
>>     I'm looking for a little advice/recommendation on file de-duplication
>> software. I have a disk filled with files that most certainly have
>> duplicates.  What's the best way to get rid of the duplicates?  I'd like to
>> check deeper than just file name/date/size.  If possible I'd like to check
>> content (checksum?).  Are you aware of anything like that?  Linux or
>> Windows is fine.  Thanks
>> _______________________________________________
>> Ale mailing list
>> Ale at ale.org
>> http://mail.ale.org/mailman/listinfo/ale
>> See JOBS, ANNOUNCE and SCHOOLS lists at
>> http://mail.ale.org/mailman/listinfo
>>
>>