One day, my RAID5 became invisible to the operating system. Individually the disks showed no errors, but as a whole I could not see the volume. I tried a bunch of software, and only one tool was capable of reading the data. So I bought an external drive to be my backup and copied the data to it. Assuming that the issue was just some broken superblocks, and armed with an external backup, I reformatted the disks, copied everything back, and continued with my day.
A few months later the same issue repeated itself. The disks and data were fine, but no amount of research could conclusively tell me why the kernel and all Linux software refused to recognize the volume. This time I deleted the RAID and installed a ZFS partition. It’s held up since.
However, in the process of shuffling data back and forth, I failed to copy over one folder: my code folder. The folder existed on the external disk, but it was completely empty. Rsync had failed for a reason I never identified, and by the time I noticed, the original disks had been formatted and overwritten. My past projects, schoolwork, master’s thesis, resume, journal, and various other notes were all gone. My mistake was simple: I ignored commonly known best practices because I never thought this kind of data loss could happen to me. Multiple failures eventually cascaded into an unrecoverable disaster.
- I expected a RAID5 plus an external disk to be enough redundancy in all cases except for acts of god.
- I trusted myself enough to run an ad-hoc manual data-syncing process.
- I did not verify my backups and assumed one extra backup was enough.
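
The last point is the easiest to automate. Below is a minimal sketch of a verification pass that could be run after a copy and before wiping the originals. It assumes Python 3.9+ and that both trees are mounted locally; the function names `file_digest` and `find_backup_problems` are hypothetical and are not part of rsync or any other tool mentioned here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_backup_problems(source: Path, backup: Path) -> list[str]:
    """List files under `source` that are missing from, or differ in, `backup`."""
    problems = []
    files = sorted(
        (p for p in source.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(source).as_posix(),
    )
    for src in files:
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list means every source file has a byte-identical counterpart in the backup. A folder that exists on the backup disk but is empty shows up as one `missing:` entry per file it should contain.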
To prevent this from happening again, I prepared the following workflow:
- All git repositories have at least two remote sites: my personal server and GitLab.
- All personal data that I would regret losing (including git repositories) is encrypted weekly and copied to an external disk.
- All data that I would not mind losing too much (mostly large video files and datasets) is rsynced weekly to the external disk.
- If the hash of the latest week’s data is different from the previous week’s, the latest data is automatically copied to an external service.
- A monthly calendar reminder prompts me to spend a few minutes manually checking the backups. For data on the external disk, I verify that there is enough space to continue backing up and pick a few random files to open. For data on the external service, I download the past week’s drop and manually inspect a few files within.
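
The weekly hash comparison could be sketched roughly as follows. This is an illustration under assumptions, not the exact script behind the workflow: it assumes Python 3.9+, and it assumes the hash is taken over the unencrypted directory tree, since encrypted archives typically differ between runs even when the input is unchanged. The names `tree_digest` and `needs_upload` are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path, chunk_size: int = 1 << 20) -> str:
    """Return one SHA-256 digest covering every file's relative path and contents."""
    outer = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        # Length-prefix the path so that path bytes and content bytes cannot be confused.
        outer.update(len(rel).to_bytes(8, "big"))
        outer.update(rel)
        inner = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                inner.update(chunk)
        outer.update(inner.digest())
    return outer.hexdigest()


def needs_upload(this_week: Path, last_week: Path) -> bool:
    """True when the two weekly snapshots differ in any file name or file content."""
    return tree_digest(this_week) != tree_digest(last_week)
```

Hashing relative paths along with contents means a renamed or deleted file also changes the digest. Empty directories are ignored in this sketch, so a digest match does not prove the directory layout is identical.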