[ale] best way to copy 3Tb of data
Scott Plante
splante at insightsys.com
Tue Oct 27 11:07:33 EDT 2015
You didn't say what you liked about the tarball. Is it the compression, or having just one file to deal with? If it's the compression, here are some ideas.
There is a FUSE compression filesystem called fusecompress. I can install it on my desktop (openSUSE) from my regular OS repos. You may be able to do the same; otherwise, you can get the source and some instructions here:
https://code.google.com/p/fusecompress/wiki/Usage
Basically, you could just create a directory on your NAS (say, mkdir /storage/datadirectory) and then type:
# fusecompress /storage/datadirectory
That creates a compressed filesystem (which would contain the contents of /storage/datadirectory, if there were any) and mounts it back over that spot. You could then use rsync to copy to /storage/datadirectory and get the compression advantage of a tarball along with the restart advantage of rsync.
Similarly, if your underlying filesystem happens to be btrfs, it has compression built in, and you can enable it for a directory using chattr. I haven't really played with this, though.
https://btrfs.wiki.kernel.org/index.php/Compression
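If /storage happened to be on btrfs (an assumption; device and paths below are examples), compression can be switched on per directory or at mount time; this is a sketch, not something I've tested:

```shell
# Mark a directory so new files written into it get compressed:
chattr +c /storage/datadirectory

# Or enable transparent compression filesystem-wide at mount time:
#   mount -o compress=zlib /dev/sdX /storage
```

Either way the compression is transparent, so rsync still sees ordinary files and can restart as usual.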
By the way, I've been using rsync for many years and never set up an rsync server. I always use it via ssh.
Scott
----- Original Message -----
From: "Todor Fassl" <fassl.tod at gmail.com>
To: "Atlanta Linux Enthusiasts" <ale at ale.org>
Sent: Tuesday, October 27, 2015 9:33:37 AM
Subject: [ale] best way to copy 3Tb of data
One of the researchers I support wants to back up 3T of data to his space
on our NAS. The data is on an HPC cluster on another network. It's not
an ongoing backup. He just needs to save it to our NAS while the HPC
cluster is rebuilt. Then he'll need to copy it right back.
There is a very stable 1G connection between the 2 networks. We have
plenty of space on our NAS. What is the best way to do the copy?
Ideally, we'd want both the ability to restart the copy if it fails
part way through and to end up with a compressed archive like a
tarball. Googling around tends to suggest either rsync or tar. But
with rsync, you wouldn't end up with a tarball, and with tar, you
can't restart it in the middle. Any other ideas?
Since the network connection is very stable, I am thinking of suggesting
tar.
tar zcvf - /datadirectory | ssh user@backup.server "cat > backupfile.tgz"
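As a local sketch of that pipeline (example paths; a plain redirect stands in for the ssh leg), the tar/compress step can be checked like this:

```shell
# Local stand-in for the tar-over-ssh pipeline.
# On the real run, the redirect is replaced by:
#   | ssh user@backup.server "cat > backupfile.tgz"
tar zcf - /datadirectory > backupfile.tgz

# Sanity-check the resulting archive by listing its contents:
tar ztf backupfile.tgz > /dev/null
```

The listing step catches a truncated or corrupted archive before anyone relies on it.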
If the researcher would prefer his data to be copied to our NAS as
regular files, we'd just use rsync with compression. We don't have an
rsync server that is accessible to the outside world. He could use
rsync over ssh, but I could set up an rsync server if it would be worthwhile.
Ideas? Suggestions?
He is going to need to copy the data back in a few weeks. It might even
be worthwhile to send it via tar without uncompressing/unarchiving it on
the receiving end.
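One way to get both properties asked about above (a tarball on the NAS and a restartable transfer) is to split the tar stream into fixed-size chunks and rsync the chunks. This is a sketch with example paths and chunk size, not something anyone in the thread proposed:

```shell
# Stream the compressed tarball into 1 GiB chunks instead of one file:
tar zcf - /datadirectory | split -b 1G - backup.tgz.part.

# rsync the chunks; after a failure, rerunning skips completed chunks:
rsync -av --partial backup.tgz.part.* user@nas:/storage/

# On the NAS, reassemble whenever the single file is needed:
#   cat /storage/backup.tgz.part.* > backupfile.tgz
```

The trade-off is needing scratch space for the chunks on the sending side; the win is that a mid-transfer failure only costs the chunk in flight.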
_______________________________________________
Ale mailing list
Ale at ale.org
http://mail.ale.org/mailman/listinfo/ale
See JOBS, ANNOUNCE and SCHOOLS lists at
http://mail.ale.org/mailman/listinfo