[ale] Archiving directories/files with "compressed" mirror version
Michael B. Trausch
mike at trausch.us
Thu Aug 14 01:02:52 EDT 2008
On Wed, 2008-08-13 at 14:48 -0400, Ed L. Cashin wrote:
> Do you have lots of free space and resources?
>
> You could do that with,
>
> cp -a foo foo.archive
> find foo.archive -type f -exec bzip2 '{}' ';'
This approach is nice and simple, and since find hands each name to
bzip2 as a single argument, it copes with spaces, tabs, etc. in file
names. It will, of course, only work if you have approximately 200% of
the data size available.
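As a rough way to check the free-space requirement before running the
copy, something like the following could be used. It is only a sketch:
the function name has_room_for_copy is made up for illustration, it
assumes a POSIX du and df, and it simply compares the size of the tree
with the free space on the filesystem that holds it.

```shell
# has_room_for_copy DIR (hypothetical helper): succeed if the filesystem
# holding DIR has at least as much free space as DIR currently occupies.
has_room_for_copy() {
    # Size of the tree in kilobytes (first column of du's summary line).
    used_kb=$(du -sk "$1" | awk '{print $1}')
    # Free kilobytes on DIR's filesystem; with -P the figures are always
    # on the second line, and the "Available" column is the fourth.
    free_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$used_kb" ]
}

# Example use:
#   has_room_for_copy foo && cp -a foo foo.archive
```

The comparison is approximate; anything else writing to the same
filesystem during the copy will change the answer.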
There is a slightly more complex way to do it that avoids the 200%
requirement, though: compress each file straight into a new tree, so
that you only need room for the compressed copy. Assuming that ./foo/
is the directory that needs to be copied and compressed with the tree
preserved:
$ find foo -type d | sed 's/^foo/foo.new/' | xargs mkdir -p
$ for FILE in `find foo -type f`; do \
    NEWFILE=$(echo $FILE | sed 's/^foo/foo.new/'); \
    cat $FILE | bzip2 > $NEWFILE; done
However, this is not friendly to file names that contain spaces. The
fault lies with how the shell works rather than with 'for' itself: the
output of the backquoted find is split into words at every space, tab
and newline before the loop runs. Instead of using 'for', a combination
of the 'while' and 'read' builtins can do a similar thing one line at a
time, but it gets a little large to enter on the command line:
$ find foo -type d | sed 's/^foo/foo.new/' | \
    while read NEWDIR; do mkdir -p "$NEWDIR"; done
$ find foo -type f | while read SOURCE_FILE; do \
    DEST_FILE=$(echo "${SOURCE_FILE}.bz2" | sed 's/^foo/foo.new/'); \
    cat "$SOURCE_FILE" | bzip2 > "$DEST_FILE"; done
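To see the difference between the two loop styles for yourself, a
small experiment along these lines may help. It is a sketch only: the
scratch directory and the variable names are made up for illustration,
and it assumes the temporary directory path itself contains no
whitespace.

```shell
# Build a scratch tree holding one file whose name contains a space.
scratch=$(mktemp -d)
touch "$scratch/two words.txt"

# Unquoted command substitution: the shell splits find's output on
# whitespace, so the loop body runs once per word, not once per file.
for_count=0
for f in $(find "$scratch" -type f); do
    for_count=$((for_count + 1))
done

# Line-oriented read: the loop body runs once per line of find's output.
# A here-document is used instead of a pipe so that the loop runs in the
# current shell and the counter is still set afterwards.
while_count=0
while IFS= read -r f; do
    while_count=$((while_count + 1))
done <<EOF
$(find "$scratch" -type f)
EOF

echo "for: $for_count, while read: $while_count"
rm -rf "$scratch"
```

With a single file named "two words.txt", the 'for' loop counts two
items and the 'while read' loop counts one.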
To make it easier to read, here it is in the form of a shell script,
generalized to work for any user-specified directory. (Note: this will
fail to act properly if a file name contains a newline character, but
that is a rare occurrence. If you have file names with newlines in
them, you should probably rename them anyway.)
---------------BEGIN
#!/bin/bash
#
# Duplicate the specified directory as 'directory.compressed', but with
# all the files in the tree compressed via bzip2.
#
# by Michael Trausch, 2008. Public domain.
#

SRCDIR="${1%/}"    # drop a trailing slash so "foo/" behaves like "foo"
DESTDIR="$SRCDIR.compressed"

if [ ! -d "$SRCDIR" ]; then
    echo "usage: $0 directory" >&2
    exit 1
fi

function ErrorExit {
    printf " Failed.\nbzip2 returned an error (%d)\n" "$1"
    exit "$1"
}

# mirror the directory tree, first. 'IFS= read -r' keeps leading
# blanks and backslashes in names intact; the ${VAR#prefix} expansion
# swaps the leading $SRCDIR for $DESTDIR without treating the name as a
# sed pattern.
find "$SRCDIR" -type d | while IFS= read -r DIR; do
    mkdir -p "$DESTDIR${DIR#"$SRCDIR"}"
done

# now, for each of the files, compress them and put them in the new
# tree.
find "$SRCDIR" -type f | while IFS= read -r SOURCE_FILE; do
    DEST_FILE="$DESTDIR${SOURCE_FILE#"$SRCDIR"}.bz2"
    printf "Compressing %s to %s..." "$SOURCE_FILE" "$DEST_FILE"
    bzip2 -c < "$SOURCE_FILE" > "$DEST_FILE" || ErrorExit $?
    printf " done!\n"
done
---------------END
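For anyone who does have newlines in file names, here is a rough sketch
of one way around that limitation. It is not what the attached script
does: it swaps the 'while read' loop for find's '-exec sh -c ... {} +'
form, where find passes each name to a small inner shell as a separate
argument, so the names are never parsed as lines of text. The function
name mirror_bz2 and its two-argument calling convention are made up for
illustration, and the destination is assumed to lie outside the source
tree.

```shell
# mirror_bz2 SRC DEST (hypothetical helper): recreate SRC's directory
# tree under DEST and write a bzip2-compressed copy of every regular file.
mirror_bz2() {
    src=${1%/}
    dest=${2%/}

    # Directories first. The inner shell receives SRC and DEST as its
    # first two arguments, followed by a batch of directory names.
    find "$src" -type d -exec sh -c '
        src=$1; dest=$2; shift 2
        for d in "$@"; do
            mkdir -p "$dest${d#"$src"}" || exit
        done' sh "$src" "$dest" {} + &&

    # Then the files, each compressed into the matching place in DEST.
    find "$src" -type f -exec sh -c '
        src=$1; dest=$2; shift 2
        for f in "$@"; do
            bzip2 -c < "$f" > "$dest${f#"$src"}.bz2" || exit
        done' sh "$src" "$dest" {} +
}

# Example use:
#   mirror_bz2 foo foo.compressed
```

Because the names travel as arguments rather than through a pipe,
spaces, tabs and newlines in them need no special handling.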
Here are the results of running this on a (slightly redacted) version of
my ${HOME}:
Thursday, 2008-Aug-14 at 00:57:13 - mbt at zest - Linux v2.6.24
Ubuntu Hardy:[1-120/566-0]:~/tst> tree
.
|-- test
|   |-- 100-pushups.ods
|   |-- Doctorow, C. - Little Brother.pdf
|   |-- FertigoProRegular
|   |   |-- Ferigo_Pro.pdf
|   |   |-- Fertigo_PRO.otf
|   |   `-- license_agreement.txt
|   |-- GalaxiumContactList1.png
|   |-- Hegadekatte_2006_PhD-Thesis.pdf
|   |-- Router configuration.conf
|   |-- UCAM-CL-TR-577.pdf
|   |-- WIU Grad App.pdf
|   |-- bubbltre.zip
|   |-- fcgi-lib-description.odt
|   |-- letter to parc.odt
|   `-- ll.odt
`-- test.compressed
    |-- 100-pushups.ods.bz2
    |-- Doctorow, C. - Little Brother.pdf.bz2
    |-- FertigoProRegular
    |   |-- Ferigo_Pro.pdf.bz2
    |   |-- Fertigo_PRO.otf.bz2
    |   `-- license_agreement.txt.bz2
    |-- GalaxiumContactList1.png.bz2
    |-- Hegadekatte_2006_PhD-Thesis.pdf.bz2
    |-- Router configuration.conf.bz2
    |-- UCAM-CL-TR-577.pdf.bz2
    |-- WIU Grad App.pdf.bz2
    |-- bubbltre.zip.bz2
    |-- fcgi-lib-description.odt.bz2
    |-- letter to parc.odt.bz2
    `-- ll.odt.bz2

4 directories, 28 files
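If you want more reassurance than a directory listing, one possible
check is to decompress every copy and compare it with its original. The
following is a sketch under my own assumptions, not part of the attached
script: the function name verify_mirror is made up, and it expects the
same layout as above (each file mirrored as 'name.bz2' in the second
tree).

```shell
# verify_mirror SRC DEST (hypothetical helper): print a line for every
# file in SRC whose compressed copy in DEST is missing or does not
# decompress to identical contents. Exit status is non-zero if any differ.
verify_mirror() {
    src=${1%/}
    dest=${2%/}
    find "$src" -type f -exec sh -c '
        src=$1; dest=$2; shift 2
        status=0
        for f in "$@"; do
            copy="$dest${f#"$src"}.bz2"
            # cmp reads the decompressed stream on stdin ("-") and
            # compares it byte for byte with the original file.
            if [ -f "$copy" ] && bzip2 -dc "$copy" 2>/dev/null | cmp -s - "$f"; then
                :
            else
                printf "MISMATCH: %s\n" "$f"
                status=1
            fi
        done
        exit "$status"' sh "$src" "$dest" {} +
}

# Example use:
#   verify_mirror test test.compressed && echo "all files match"
```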
For convenience, the shell script is attached. Consider it public
domain.
--- Mike
--
My sigfile ran away and is on hiatus.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dupdir-bz2
Type: application/x-shellscript
Size: 804 bytes
Desc: not available
Url : http://mail.ale.org/pipermail/ale/attachments/20080814/02415344/attachment-0002.bin