Secure and quick backup of huge amounts of files via ssh
For a long time I didn't have access to a stable and unlimited internet connection, so I developed a habit of storing anything I found useful or simply enjoyed. Unfortunately, three times already I have managed to unrecoverably delete or mangle my collections, and each new one was larger than the previous.
old setup
When I built a computer with a lot of storage I moved my collection to it and manually copied files from it to other devices through ssh when needed. Files were not archived or anything; everything was stored in a normal directory structure, and many of the things I moved often consisted of at least a couple of thousand files.
Storing them like that made copying much slower, as each file had to be located by the filesystem, read from a random place on the drive and then transferred by scp or rsync.
luks
The partition that stored them was secured with luks, which had always worked fine on my laptop, but when the drive was around 85% full I ran into some issues. If a lot of data was read or written at once, the drive would shut down and any access to it was denied with 'input/output error'.
I've seen a lot of old hdds die with this error. It's the most generic error possible and doesn't really say anything about what failed or how; even checking dmesg showed only SATA errors, which means that something in the drive failed but the kernel knows nothing about it.
The only way of "fixing" such a drive is disconnecting it from power and plugging it back in. After that it usually works for only a few minutes, very slowly, before it either fails the same way again or the mounted filesystem becomes completely unresponsive and, after a forced unmount, can only be mounted read-only. Drives usually become usable again after they cool down, or never.
My hdd was too new for such things to happen, and using other partitions caused no such problems. I checked the SMART data, searched for bad blocks and ran a filesystem check, but nothing wrong was found. I extended the luks volume by 150GB, but there was no noticeable improvement.
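For reference, those checks amounted to commands along these lines (the device and mapping names here are illustrative, and the filesystem check has to run on the opened luks mapping, not on the raw disk):
smartctl -a /dev/sdb            # SMART attributes and error log
badblocks -sv /dev/sdb          # read-only scan for bad blocks
e2fsck -f /dev/mapper/storage   # filesystem check on the opened luks mapping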
This happened rarely and only in special circumstances, so I ignored it and for a couple of months patiently retransferred files whenever it occurred.
But then the filesystem started getting slower and each mount became a gamble with a 30% chance of failure. Knowing that my files were in danger, I copied them to another drive, losing some in the process, then deleted the luks volume and made a new one. After a month of peace the problem came back.
This made me realize that storing millions of files in a filesystem on an hdd behind luks is a bad idea. A huge number of files messes up ext4, and reading them forces the hdd heads to wander around. The hdd overheats, all communication is cut off by thermal protection, and the abrupt shutdown damages the filesystem. After enough such incidents, even repairing the filesystem at mount time makes the hdd overheat.
On top of that, recovering data from broken ext4 over luks is a lot of pain and suffering.
solution
This convinced me that files should be kept on a normal filesystem and encrypted individually for easier recovery, and that directories should be archived so as not to strain the drive and the filesystem.
Inspired by git, I've made a bash script named sunt (https://github.com/TUVIMEN/sunt).
It's configured by environment variables or by setting them in ~/.config/sunt.
All files stored on the server are named by the command in SUNT_HASH and placed in the same directory; general information about them is stored in the SUNT_INDEX file.
Files are automatically encrypted and decrypted by the commands in SUNT_ENCRYPT and SUNT_DECRYPT.
Any directory is stored in a tar archive and unpacked upon restoring.
SUNT_DEST specifies the storage name, connection command, destination directory and local directory. These fields are delimited by a , and can be repeated to specify more than one destination (the total number of fields has to be divisible by 4).
An example ~/.config/sunt might look like this:
SUNT_INDEX="/home/user3/index"
SUNT_ENCRYPT="openssl enc -e -aes-256-cbc -salt -pbkdf2 -iter 1000000 -md sha512 -in /dev/stdin -pass file:/home/user3/pass -out /dev/stdout"
SUNT_DECRYPT="openssl enc -d -aes-256-cbc -salt -pbkdf2 -iter 1000000 -md sha512 -in /dev/stdin -pass file:/home/user3/pass -out /dev/stdout"
SUNT_COMPRESS="xz -e9"
SUNT_DECOMPRESS="unxz"
SUNT_HASH="sha256sum"
SUNT_DEST="user1,ssh user1@192.168.1.104,/media/sdc1,/home/user3/d1,user2,ssh user2@192.168.1.105,/media/sdc2,/home/user3/d2"
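With this configuration, uploading a directory to the first destination conceptually boils down to a pipeline like the one below. This is only a sketch of the idea, not what the script literally runs; NAME stands for the hash-based remote name recorded in the index, and the pass file is assumed to already exist (e.g. filled with random bytes and readable only by the user):
# pack the directory, encrypt it and store it under its hash name (illustrative values from the config above)
tar -cf - DIR \
    | openssl enc -e -aes-256-cbc -salt -pbkdf2 -iter 1000000 -md sha512 -pass file:/home/user3/pass \
    | ssh user1@192.168.1.104 "cat > /media/sdc1/NAME"
Restoring reverses the pipeline: the blob is read back over ssh, decrypted with the SUNT_DECRYPT command and, in the case of a directory, unpacked from the tar archive.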
The first argument of sunt has to be a command, which can be one of:
- l, list - prints out the index file
- a, add - adds files to the index
- d, delete - deletes files from the index, leaving local files intact
- i, index - uploads the index file to the remote
- u, upload - uploads files to the remote
- au, add-upload - adds files to the index and uploads them to the remote
- r, restore - restores files from the remote
After the command, paths to files or other options may be specified.
Files first have to be added to the index, but only if they are under a local directory specified in SUNT_DEST, in this example /home/user3/d1 or /home/user3/d2.
sunt add FILE1 PATH/FILE2 DIR
Files can be compressed if the -c option is specified; compression is done by SUNT_COMPRESS at upload, and SUNT_DECOMPRESS is run when restoring.
sunt add -c log.txt
Then running sunt list or cat /home/user3/index will output
238b424f2d10c9c4a49308dc90ad7ea5fcf3df71b142cc3fc90ab1cd3121596c FILE1 user1 f
f99640b523a230567fe2fca6e70f7cf41ad974ff02e098fa1bbd82120a2d3fde PATH/FILE2 user1 f
d313ad93be2d2599267359ea396fa98713c2a69d8530c6da6b00d6b9380bc872 DIR user1 d
d7e60b33c666e6c4849584b9c36ca85791b55bca10162b75a1d6aa02e8dfdd9e log.txt user1 fc
where fields are separated by tabs: the first field is the file name on the remote, the second is the local path relative to the directory in SUNT_DEST, the third is the storage name and the fourth is the type.
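Since the index is just a tab-separated text file, it can also be queried directly; for example, a hypothetical one-liner to print the remote name of a given local path:
awk -F'\t' -v p='log.txt' '$2 == p {print $1}' /home/user3/index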
Files can now be uploaded with
sunt upload FILE1 PATH/FILE2 DIR log.txt
All of this can be shortened with
sunt add-upload FILE1 PATH/FILE2 DIR
sunt add-upload -c log.txt
If files were modified and you want to upload them again, you have to specify the -f option to overwrite them on the remote
sunt upload -f log.txt
Files can be restored with
sunt restore /home/user3/PATH/FILE2 FILE1
You can also restore them, overwriting local files, with the -f option
sunt restore -f log.txt
Files can be deleted from the index (local files are not affected)
sunt delete DIR
This will however not delete them on the remote; to do that, use the -r option (local files are still not affected)
sunt delete -r DIR
If you want to back up the index file, you can upload it encrypted to the remote
sunt index
To make restoring files easier I've written a simple fzf wrapper named suntr
#!/bin/bash

# load the sunt configuration
. ~/.config/sunt

# split SUNT_DEST on commas; every destination consists of 4 fields
mapfile -t -d, destination <<< "$SUNT_DEST"
[ "$((${#destination[@]}%4))" -ne '0' ] && {
    echo "invalid SUNT_DEST" >&2
    exit 1
}

# local directory of the first destination (strip any newline appended by <<<)
path="${destination[3]//$'\n'}"

# fuzzy-pick a local path (second field of the index) and restore it
sel="$(cut -f2 "$SUNT_INDEX" | fzf)"
[ -z "$sel" ] && exit
sunt r "$path/$sel"
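Running suntr presents every indexed path in a fuzzy finder and restores the selected one into the local directory of the first destination in SUNT_DEST.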
less common
If you're really sure that a file is already correctly compressed, you can send it as-is with
sunt upload -c FILE3
When you restore it, it will be decompressed. You can also do the reverse
sunt restore -c FILE3
DO NOT USE GPG
gpg is a tool mainly intended for asymmetric encryption, which is slower and functionally excessive in the case of remote file storage. gpg does allow symmetric encryption like AES256, but because it was built around asymmetric encryption, if anything in an encrypted file changes it absolutely refuses to decrypt the rest of it. For a long time I used gpg like this:
SUNT_ENCRYPT="gpg -c --cipher-algo AES256 -z 0 --batch --passphrase-file /media/DRIVE/pass"
SUNT_DECRYPT="gpg -d --batch --passphrase-file /media/DRIVE/pass"
but after years of use, files on the hdd changed by themselves (hdds don't guarantee that data will stay intact). Only 9 files were affected, and only by a couple of bytes each. After discovering the first one I could not find any way to force gpg to decrypt it. To check for others I decrypted all of my files; fortunately only a handful were affected, but I could not recover them fully.
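On the bright side, this kind of silent corruption is easy to detect with sunt's layout, assuming the remote names are the output of SUNT_HASH over the stored blobs: re-hashing everything on a remote and comparing against the file names reveals anything that has rotted, for example
cd /media/sdc1 && sha256sum * | awk '$1 != $2 {print $2}'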
Just use openssl for encryption.