EXOTIC SILICON
“How to do it! How often is too often? We're talking backup, of course!”
Backup strategies - keeping data safe on tape, (or optical disc)
Safe on tape!
Back it up!
Today, Jay's talking backup strategies. Find out how to protect yourself - and your data - from the myriad threats it faces in the modern world: glitchy hard disks, buggy SSD firmware, user error, (fat fingers beware!), and even deliberate attacks from malware and network intrusion.
We'll also talk about keeping the backups themselves safe and sound, away from prying eyes and environmental factors that want to eat away at your media.
There's a good chance that the machine you are currently using to read this webpage contains a lot of your own valuable work. Whether that's photos or videos or even just plain old boring work stuff, you'd probably be lost if suddenly it wasn't there.
Have you backed it up? If not, shame on you! But for those who smugly answered, "yes!", just how sure are you of your backup strategy? Have you ever needed to restore anything? Have you tested it? How quickly can you get back up and running after disaster inevitably strikes? Have you covered all bases? What about slow, silent data corruption over the long term?
Hummm, see? It's not as simple and straightforward as it might seem. So let's dive right in and discuss the do's and don'ts of backup strategies.
“But I copy everything to a USB flash drive every Friday! I know I'm as safe as houses!”
Ahh... I was naïve too, 30 years ago.
Fun fact!
Readable after 50 years!
The tape pictured above, and in the background image, is the oldest tape in our archives. At the time of writing this article it was recorded 49½ years ago, and is still readable. (Of course, the drives can be quite stubborn, but that's another story.)
We're obviously not suggesting you invest in 9-track tape in 2023 for your daily backups, but it's solid proof that a good investment at the time pays off in terms of long term reliability.
Defining the problem
The myth...
When thinking about data backup, many people have tended to fixate on the possibility of a crashed hard disk, and in modern times, a totally dead SSD. It's been the classic disaster scenario for decades, assuming that your office doesn't burn down overnight. You sit down in front of your desktop in the morning, and it won't boot. As you reach in to fiddle with SATA cables and clean connections, you realise that the disk isn't even spinning up.
Maybe you knew enough to try a couple of short, sharp, ninety degree twists in the plane of the platters, in case it was caused by stiction. But sooner or later, reality dawns, and it becomes clear that the disk will never spin again. It, along with your data, is gone forever.
So a couple of full back-ups at regular intervals should suffice, right?
Except that isn't how it usually happens - most likely you'll be calling on your backups for some other reason.
The reality...
Aside from the fact that when modern SSDs fail they often remain readable, i.e. they become read-only, your data is much more likely to be at risk from silent corruption over time, or from being overwritten due to operator error.
Silent corruption can happen for reasons ranging from bad SATA cables and buggy SSD firmware, to malware and more. Operator error might go genuinely un-noticed, or be covered up.
Both of these scenarios can be protected against with an adequate backup strategy, but the simple approach of a regular, full backup, (which also often goes untested), in many cases just won't suffice.
Side note
The general principles about data backup that we're presenting in this article are applicable across the board, whether you're a home user or a large multinational.
However, we've tried to put the information across in a way that's most applicable to a small to medium size company, with most or all of their data created and stored locally, (rather than relying on cloud services). We can imagine that they might have up to a dozen workstations for individual employees, one or two local fileservers, and perhaps a remote server or VM for self hosting of web services and email. We'll also assume that all of the IT equipment is administered by one member of staff.
What we need to consider:
Pesky backup details:
  1. The amount of work that is not backed up at any one time, i.e. data that has been created since the last backup.
  2. How far back our backups go. A lot of data from current work-in-progress is constantly in flux and changing, but plenty more usually remains little used and unchanged over a long period of time once its immediate use is over. If this data is later found to be corrupted, including on recent backups, then a backup from a year ago might be perfectly useful. The question then becomes, will you have kept one?
  3. Speed of recovery. Downtime at home is inconvenient. Downtime at work is lost productivity, and lost income. Just how much time can you afford to lose getting back up and running?
  4. Security of backups. Confidential business information needs to be kept away from prying eyes, whether it's on a server's hard disk, or a tape in a safe deposit box. Without sufficient consideration, your backups might be an easy target for an attacker.
We need to address all of these criteria, otherwise our backups could end up being useless.
“But I have a RAID array so nothing to worry about, right?”
You couldn't be more wrong...
Why RAID won't save you, (and might just make things worse)
RAID is designed to protect against a very specific failure mode - a single dead hard disk. It should already be obvious by now that this is not the main scenario that concerns us. RAID won't help against data loss through operator error - the data will simply be dutifully overwritten on all of the disks in the array instead of just one. Likewise with the vast majority of data corruption cases. Although RAID can in theory be set up to verify data integrity by reading multiple copies across different disks in a mirror, and therefore potentially spot data corruption coming from an individual drive, in practice this is very rarely done.
In fact, we can achieve a similar level of integrity checking at the application level, anyway, without using RAID. Additionally, if a drive is spewing out bad data, surely we want to know about it and fix the fault, rather than have it mitigated by a RAID controller with a blinking light that is easily missed or ignored?
What is perhaps much less obvious is that installing RAID in place of a single disk actually introduces new failure modes, and always increases the overall probability of a fault occurring.
As well as the very real possibility of bugs in the firmware of the RAID card, simple maths tells us that with two disks in a mirror, there is, nominally, twice as much risk of any one of the drives failing at any particular moment. Although we might hope that a failed disk would simply be detected as such, and that the system would continue running, there is usually no guarantee of exactly how a dead disk will behave. It's entirely possible for it to hang the RAID controller and crash the system, for example. If the fault is power-supply related, all sorts of bad things could happen - including the loss of every single drive in the system.
What you are really doing with RAID, is trading the risk of a single disk failing, (which used to be much more of an issue in the distant past), for higher complexity and more potential points of failure overall. There is no free lunch - every gain comes with a corresponding loss.
RAID mirrors and backups serve very different purposes. One does not substitute for the other.
Listen to Jay
“There is no free lunch - RAID trades the risk of a single disk failing for higher complexity and more overall points of failure.”
The importance of archiving old, non-changing data:
An insurmountable task?
Let's review the first consideration in the list above:
The amount of work that is not backed up at any one time, i.e. data that has been created since the last backup.
In case it's not already clear, let's take an example. Imagine that we back up our entire server every Friday evening. If disaster strikes on Friday morning, then any data from the last six days won't be on the most recent backup.
Clearly, to minimise this concern, we want to make backups as frequently as possible. But this comes at a cost:
Whilst we can reduce the burden somewhat by doing incremental backups, (i.e. only copying new data, or data that has changed), this adds extra administrative work, and complicates the restoration process. Again, everything is a trade-off.
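As an illustration, the traditional BSD dump(8) and restore(8) tools implement exactly this idea using numeric dump levels: level 0 copies everything, and a higher level copies only what has changed since the last dump of a lower level. A minimal sketch, assuming a tape drive at /dev/nrst0 and a /home filesystem, might look like this:

  # Full (level 0) dump of /home to tape, updating /etc/dumpdates
  dump -0au -f /dev/nrst0 /home

  # Mid-week: a level 1 dump copies only files changed since the last level 0
  dump -1au -f /dev/nrst0 /home

  # Restoration replays the level 0 first, then the latest level 1, from the
  # root of the new filesystem - the extra complication mentioned above
  cd /mnt/newdisk && restore -rf /dev/nrst0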
“We back up the server every Friday evening!”
So, a disaster on Friday morning will lose you six days' worth of work...
Reducing the backup burden by archiving
What can reduce that burden is to identify data that is long-term non-changing, isolate it, and deal with it separately. In other words, take it right out of the regular backup loop. This is the concept of archiving.
Or to look at it another way, we stop treating our, (main), hard disk as an endless cesspit continuously filling up with more and more unsorted data. Instead, it's the home to work in progress. Once that work has reached its conclusion and won't change anymore, it's moved elsewhere for future quick access, backed up once, and then mostly just left alone.
In practice, this doesn't have to be a grandiose arrangement - 'moved elsewhere' might just mean a different partition on the same disk, and the archival backup can be to a regular tape or optical storage.
In some cases, where an old collection of data is unlikely to be needed at short notice - or even at all - and disk space is tight, the archival backup, (or preferably several), might be sufficient with no need to keep a copy on a local hard disk at all.
This concept of archiving unchanging data and taking it out of the regular backup loop also has two more advantages, one for data security and another for long-term data integrity:
Considering data security, the archived data can be encrypted with a separate key to the main hard disks. This key will then only be required on occasions that the old data needs to be accessed.
With regards to long-term data integrity - if the data shouldn't be changing, then by archiving it at a fixed point in time and keeping that copy, rather than continually re-writing it to each successive backup, we afford ourselves a degree of protection against silent data corruption. This could be in the form of malware, (typically ransomware), which is secretly encrypting our files, or more mundanely, the result of deteriorating memory cells in an SSD, or even buggy SSD firmware corrupting data as it moves it around for wear-leveling purposes. Whatever the cause, our long-time-ago copy isn't vulnerable to having had its good data unknowingly overwritten with bad.
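As a rough sketch of the separate key idea, using only tar and openssl from the base system - the filenames, key length and choice of cipher here are nothing more than illustrative assumptions:

  # Generate a random key for this archive set, stored apart from the live disk keys
  openssl rand -out /secure/archive-2023.key 64

  # Create the archive and encrypt it in one pass
  tar -czf - -C /archive project-x | \
      openssl enc -aes-256-cbc -salt -pass file:/secure/archive-2023.key \
      -out project-x.tar.gz.enc

  # Decryption is only ever needed when the old data has to be consulted
  openssl enc -d -aes-256-cbc -pass file:/secure/archive-2023.key \
      -in project-x.tar.gz.enc | tar -xzf -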
Not all data is equal
How much can we afford to lose?
Now that we've managed to reduce the volume of data that we have 'in flux' by identifying and archiving that which doesn't change, we can back up the rest of our data more frequently for the same cost in time and media. In other words, we have in one sense reached 'maximum efficiency'.
So at least in theory, all we need to consider now in order to determine our backup schedule is one question: how much can we afford to lose?
Obviously this is an open question, as it depends on the nature of the work and how difficult it would be to reproduce. It's fairly safe to say that most people certainly wouldn't want to lose a day's worth of effort. But can we really afford to lose even one hour's worth?
Handy hint!
Detecting silent data corruption
Whatever the cause, catching silent data corruption early will considerably reduce the burden and lost time it causes.
Two ways that you can do this are to keep checksums of your files and verify them against the stored values from time to time, and to routinely test-read and verify your backups rather than simply writing them and filing them away.
Tiered backups to the rescue
The upshot of all this is that to all practical intents and purposes we can't rely solely on a single backup schedule for all of our data.
The solution is tiered backups. Copies of the most important and most changeable data can be kept to hand on local storage for quick access in case of a fat-fingered deletion, whilst our more comprehensive backup is there to guard against possible tragic occurrences such as fire or physical theft.
Speed of restoration is often overlooked as a factor, but its importance shouldn't be underestimated. If you accidentally break a configuration file or lose an important email that's just arrived, you'll probably want to be back up and running as quickly as possible. That's going to be much easier from a local disk than it would be restoring from a tape or other off-line backup.
A typical comprehensive backup strategy:
For each tier, we describe the category of data, suitable backup media, storage location, typical speed of restoration, what the tier protects against, and other considerations.
Tier 1
Category of data:
  • Finished projects
  • Older archive material
  • Old log files
  • Previous years' financial documents
  • Old company reports
Backup media, one or more of:
  • Optical WORM
  • Tape
  • HDD off-line in cold storage
Location:
  • Preferably off-site.
  • On-site only if off-site is not an option, or as additional copies.
Speed of restoration:
  • Slow if off-site
  • Intermediate if on-site
Protects against complete melt-down:
  • Fire
  • Flood
  • Successful large-scale network intrusion (*)
  • Large scale damage-causing electrical system failure
  • Silent data corruption due to malware or bad hardware
Considerations:
  • Beware of format obsolescence.
  • Non-archival copies of the same data should be kept locally for quick access.
Tier 2
Category of data:
  • All current data.
  • Everything that is not archived in the category above.
Backup media, one or more of:
  • External HDD
  • Tape
  • Optical WORM or re-writable
Location:
  • Preferably off-site and on-site copies
Speed of restoration:
  • Intermediate
Protects against general system failures:
  • Dead hard disks and SSDs
  • Operator error
  • Silent data corruption that is caught early
  • Smaller scale, less sophisticated network intrusion (*)
Considerations:
  • These are the traditional 'once-a-week' full backups, although obviously the interval can be anything you like.
Tier 3
Category of data:
  • Current work in progress.
  • Excludes system files and anything in the next category below.
Backup media, one of:
  • Flash drive
  • External HDD
  • Optical re-writable
Location:
  • On-site.
  • If security policy permits, taken home
Speed of restoration:
  • Fast
Protects against individual workstation issues:
  • Failure of a single personal workstation
  • Operator error - deletion or overwriting
Considerations:
  • This is basically a personal 'my home directory at the end of the day' backup.
  • Much faster and easier to restore than a 'weekly', but may not provide as much protection against un-noticed operator error or silent data corruption.
Tier 4
Category of data - fast changing or new and irreplaceable data:
  • Mailspools
  • Phone call recordings
  • Documents about to undergo a large change
  • Collections of data about to be re-organised
  • Data from sensors or data collection equipment
Backup media, one of:
  • Local hard disk
  • Local network server
Location:
  • On-site, local
Speed of restoration:
  • Very fast
Protects against individual workstation issues:
  • Operator error
  • Sudden HDD or SSD failure
  • Power outage
Considerations:
  • This data is hard to protect in any other way due to its changing nature.
  • Backup should be made quick and easy to perform - a single command to be run as often as seen fit.
  • The system should keep the last three or four such backups and then overwrite the oldest automatically.
  • Backup can be made to a different partition on the same disk, if there is no other option.
  • RAID also offers some protection against sudden disk failure.
Tier 5
Category of data - system data:
  • Configuration files
  • Important log files
Backup media, one or more of:
  • Flash drive
  • HDD or SSD on another local machine
Location:
  • On-site, local
Speed of restoration:
  • Very fast
Protects against individual server failures:
  • Operator error
  • Sudden HDD or SSD failure
Considerations:
  • Copies should be made whenever configuration files are changed, and whenever log files are deemed to have a particular importance.
  • Recommended to keep old versions of configuration files until they have been copied to the regular 'weekly' backups.
(*) Note that in the case of a successful network intrusion, ("hack"), the availability of backups only protects against data loss. Exposure of confidential company data to third parties is a separate concern which is outside the scope of this article.
Where's the cloud?
You'll have noticed that we haven't mentioned 'the cloud' as a backup medium at any point so far. This is for very good reason.
Whilst various internet service providers would like to convince you otherwise, there is nothing magical about storing data 'in the cloud' - all you are doing is uploading it to somebody else's computer. As such, cloud storage can indeed serve for any of the backup categories described above, and could - in theory - be your only backup medium. But is that what you really want?
Firstly, you completely lose physical control over how and where your backups are stored. You place yourself at the mercy of whoever is keeping your data, if and when you ever need to get it back again. Everything will be done on their schedule, and at their agreed price.
Secondly, do you trust your chosen provider to follow appropriate security policies, and keep your data away from third parties?
Thirdly, can you be absolutely sure that a copy of your data will really be available when you need it? Other people have system failures too, and if it's not their data at risk, will they even care?
In any case, cloud backup is not a category in and of itself. It's merely one tool at our disposal, and not necessarily the best one. And if you're in the unfortunate situation of not only considering placing your backups in the cloud, but of also having the main and only copy of your data stored on somebody else's disks, then you should be concerned. Very concerned.
“If the main and only copy of your data is in the cloud, you should be worried. Very worried.”
Keep copies of your work on your own physical media...
Tier 1 - archiving
Identifying the material
As we've already discussed above, reducing the total volume of data in our regular backup pool is key to making the whole task as efficient as possible. But just how do we identify which data is a good candidate for archiving?
Depending on your particular business workflow, this can range from trivial to moderately time consuming. If you're lucky enough to generally work on a single project until its completion and then move on to the next, then you essentially already have your answer. Once the project has been handed to the client, or reached its 'time of transmission', then it should be fixed in stone at that moment. If an updated version ever needs to be produced, either to add new material or correct errors, the process to do that would be to create a new project using the files from the original 'finished version' as a base. This in turn would eventually be archived once no further changes were made to it for an extended period of time.
On the other hand, if like most organisations you have multiple projects on the go at any one time, then some diligence is required to remember to archive each one as it reaches its completion.
Of course, in reality it might be desirable to wait for a set period of time after the apparent closure of one work flow before finally committing it to a permanent tier 1 backup, as in the real world unexpected issues and requests for changes do occur.
An alternative method is to simply do a search once every year for everything that hasn't changed in the previous 18 months. For example, you might do this during the first week of January, in which case your Tier 1 backups can be conveniently indexed by year.
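A simple find(1) invocation is enough for that kind of yearly sweep - the path, and the cut-off of roughly 18 months expressed here in days, are obviously just examples:

  # List everything under /data that hasn't been modified in about 18 months
  find /data -type f -mtime +548 > /tmp/archive-candidates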
Performing the backup
Essentially, this process is a fairly straightforward one-off job for each dataset. By definition, we're not, (or no longer), backing up this data as part of a regular schedule.
Considering optical?
If you haven't already read Crystal's article about using optical storage on OpenBSD, now might be a good time.
There you can learn a simple technique for doing encrypted backups to blu-ray disc, with no additional software required over the base system.
What does require some thought and planning, though, is organising the data before committing it to its, (hopefully), final resting place. Although just copying the files and nothing else should in theory be sufficient, there are some extra steps well worth considering:
We strongly suggest that you checksum each file and record this index of checksums alongside the files it references on the same media. This should ensure that any errors that occur when reading the data back in the future don't go un-noticed. Alternatively, you could digitally sign the files. Don't assume that any possible read errors will be caught at the drive firmware level. We have had bad data read back as good from various media more times than we care to remember.
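One straightforward way to do this with base system tools, assuming OpenBSD's sha256(1), (sha256sum or shasum can stand in elsewhere), is to write a checksum manifest into the top of the directory tree before burning it, and then verify against that manifest whenever the media is read back:

  # Create a manifest covering every file that will go on the disc
  cd /archive/project-x
  find . -type f ! -name SHA256 -exec sha256 {} + > SHA256

  # Years later, after mounting the disc, recompute and compare
  cd /mnt/archive-disc
  sha256 -c SHA256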
Additionally, place a text file containing notes explaining the contents of each disc, including who prepared it, when, and on what equipment. Recording this information along with the data itself ensures that it's always available to anyone looking at it at a later date.
As this data is hopefully being written once, for access at any time into the distant future, we need to consider file format obsolescence. Basically this means avoiding the scenario where the data can be read back perfectly without errors, but is useless as there is no current software capable of interpreting its contents. Wherever possible, use open and well documented file formats, and for less common formats, include the format specification or source code for software capable of reading it. Whilst the source code may no longer compile automatically on a later system, somebody, (such as Exotic Silicon's commercial division!), will almost certainly be able to make it work or re-implement it.
You might also consider using different brands of media for multiple copies of the same data. New products are continuously entering the market, and old products can have their formulations changed and updated, (but not necessarily improved). Certain batches of media might turn out to be sub-standard.
The only thing you can be sure of is that unless you've already been testing the archival qualities of a particular brand, or have some trustworthy data about them, you'll likely only find out how good your choice was in many years to come when you need to read your data back. At that point, it's probably too late.
Long term media storage
Hopefully you'll rarely - if ever - need to access the archived copies of tier one data. As mentioned above, unless the data is truly historic and unlikely to ever be needed for reference, it's recommended to keep a copy of it on a separate local hard disk or on a local server for easy reference. Obviously this disk doesn't ever need to be backed up, as its contents can be replaced from the tier one archives.
However, this doesn't mean we should just dump our neatly labelled set of optical discs in a damp and dusty basement and never look at them again. If you did ever need them, it would be rather disappointing to say the least if they turned out to be unreadable.
The first step is to make sure that the set of media, be it discs or tapes, is well labelled. This not only helps to ensure that you can find any particular dataset quickly, but also reduces handling and potential damage of the media. Catalogue each individual tape or disc with a unique serial number, and note its location. This is especially important if you are keeping backups off-site.
The final storage location for your archives should be appropriate for the type of media you're using. Where possible, aim to keep temperature and humidity constant and within the specifications stated by the media manufacturer. Magnetic disks and tapes should obviously be kept away from magnetic fields. Optical media should be kept away from exposure to direct sunlight.
Infrequent but regular maintenance might also be required depending on the nature of the media, as we will discuss in the next section.
Regular maintenance
Check your archived data is still readable every few years! At the same time, perform any regular maintenance that is appropriate for the type of media:
Optical discs
Usually optical media doesn't require any special attention, but be careful about writing with regular marker pens on the data surface of low quality media that doesn't have a protective plastic layer. If in doubt, write just a serial number in the hub area and put any further details on a paper insert.
Tapes
Tape should be re-tensioned from time to time, perhaps bi-annually. Traditionally, this was done to minimise print-through of the magnetic recording from one layer to adjacent layers and also to avoid mechanical stress on any one part of the tape. These factors are much less of a concern with modern tape formats, however we still recommend re-tensioning any such media periodically as the tape itself can deteriorate and shed chemical binders, (basically glue), on to adjacent tape layers. Re-tensioning won't prevent this, but might help to reduce its effect on any one part of the tape. Older cartridge formats may contain plastic guides, pinch rollers and other mechanical elements that can become stuck to the tape or cause it to deform.
Always allow tape-based media to acclimatise to the temperature and humidity of the environment before trying to read it.
Magnetic disks
Hard disks in cold storage should be powered up and allowed to spin at infrequent intervals to avoid problems with static friction, (stiction). As with tapes, if you're literally moving them from a low temperature environment into a warm office, allow them to acclimatise for several hours before applying power. It's usually enough just to connect such disks to a power supply with no need for a data connection, unless you actually want to verify the content, (which can possibly be done less frequently). Before concluding that any disks which don't spin up are faulty, check to see if they have been configured to power up in standby, and therefore require a start unit command to be sent from the host.
If your routine inspections reveal signs of deteriorating media, format obsolescence or storage shortcomings, then address these issues without delay.
Tier 2 - weeklies
Scope of this tier
Although we casually call these 'weekly' backups, there is absolutely no reason why they have to be performed on a weekly basis. Fortnightly or even monthly may be perfectly sufficient for your needs, especially if you correctly implement the lower tiers. It's just a name.
These backups are arguably the most important, as they will be your go-to resource for anything but the most serious - or most trivial - failures. If you're only going to do one backup, (and we know that despite everything we advise, there will be some people who do), it should probably be this one.
Performing the backup
How you actually go about physically doing the backup will depend to a large extent on your work environment. If you're a small business with half a dozen or so workstations in a single office that store their data locally, it might be practical to walk around with an external hard disk. Otherwise you'll probably be backing up from a central on-site LAN server.
Far less preparation of the data is required here than it was at tier one. This is because these backups will mostly be obsolete within a few months, so the considerations about file format obsolescence don't apply, and neither is it so important to keep detailed records of the contents and who prepared them beyond a simple time and date stamp. We still recommend checksumming all of the files, though, as the risk of reading bad data as good is still a threat here.
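As a very rough sketch, assuming the external disk is mounted at /backup and the live data lives under /home and /data, the weekly run might be little more than the following, with the verification pass reading the whole archive back before the disk is put away:

  # Timestamped full backup of the live data to the external disk
  stamp=$(date +%Y%m%d)
  tar -czf /backup/weekly-$stamp.tar.gz -C / home data

  # Read the archive back in full to confirm that it's at least structurally sound
  tar -tzf /backup/weekly-$stamp.tar.gz > /dev/null && echo "weekly-$stamp verified"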
Backup media rotation
The chances are that you will be using re-writable media for tier two backups. This helps to reduce costs as well as avoid wastage and the bureaucracy of disposal of WORM media. But at the same time it raises two questions: how many times can each piece of media safely be re-used, and how should the media be rotated?
It goes without saying that any media, be it tape or disc, or even solid state, has a finite number of possible re-write cycles. External hard disks probably fare the best here, although beware of the ridiculously short life span of micro-USB connectors. Putting a bare SATA drive in a USB enclosure yourself, rather than buying a ready-made external HDD, mitigates this problem, as the enclosure can always be swapped for another one or the drive connected directly internally.
In general, we wouldn't advise intentionally using re-writable optical discs for more than ten to fifteen passes, possibly fewer. Although the media may be rated for many more cycles than this, in reality even careful handling is likely to cause dust and scratches to accumulate on the recording surface. By planning to retire discs from the backup cycle at this point, if you do need to continue using them unexpectedly because the new batch was faulty or never arrived, then they should still have some life left in them.
For tape, despite manufacturers' claims of re-usability, we would be reluctant to trust any particular piece of media beyond twenty or so passes. In reality, they probably will last much longer, but at the same time they are probably not being written to by a tape drive that is in perfect alignment, with perfectly clean and demagnetised heads. Nor will they be stored in ideal conditions and free from mis-handling. In short, there are a lot of environmental factors just waiting to reduce that claimed longevity by an order of magnitude, so considering the current price of media, why take chances? Just like with optical discs, if we plan to retire tapes whilst they are still within their usable lifespan we have some time in reserve if new stock fails to materialise.
In any case, be sure to mark the number of times each piece of media has been written to as you go along, and obviously discard media that shows physical signs of damage or excess wear, or that produces errors upon verification.
Speaking of verification, as always, be sure to verify these backups after writing them.
Regarding media rotation, the simplest strategy is a plain round robin approach. So you use tape A the first week, tape B the second week, tape C the third week, and then back to tape A for the fourth week.
This works, and in some cases where it's deemed sufficient it might be the best approach due to its simplicity. However, other more comprehensive options do exist.
One possible alternative using the same three tapes would be to use just two of them alternately for all of the regularly scheduled backups, except the first backup of every other month, which is always done to the C tape. This has the advantage of keeping older backups to hand - potentially useful protection against silent data corruption - with the disadvantage of a potentially larger gap since the last good backup if the A and B tapes ever both fail.
As an extension to this scheme, the C tape can be retired after six passes - one year of use - and kept indefinitely. This preserves a snapshot of your work in progress that might be useful if you ever need to look back over a longer timescale than anticipated at data that has not yet been committed to a tier one archival backup.
If you're using optical disc rather than tape, you can simply make each sixth monthly disc a WORM disc, and keep that independently of the regular C disc that is in rotation.
Check your restoration procedure actually works
It might seem like a stupid question, but do you actually know how to restore the data from one of your backups if you ever need it?
Almost certainly, you'll run the backup procedure an order of magnitude more often than the restoration procedure. That means the restore process won't be as familiar to you. Mistakes can happen - you don't want to accidentally over-write the exact tape that you wanted to restore!
Operator error aside, it's fair to say that if you verified your last backup when you made it then your data almost certainly exists there. So this question really becomes, how quickly can we get back up and running again?
In business, time is money. Having your system down for an hour is an inconvenience. Having it down for a day could be more serious.
At the very least, consider doing a partial test restoration of your data every time you change your backup procedure, or upgrade the hardware or software.
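Using the hypothetical tar-based weeklies sketched earlier as an example, a low-risk partial test is simply to extract one directory into a scratch area and compare it with the live copy; dump-based backups can be exercised in much the same way with restore(8) in interactive mode. The filename and paths below are placeholders:

  # Restore one directory from a recent weekly into a scratch area, never over live data
  mkdir -p /tmp/restore-test
  tar -xzf /backup/weekly-20230901.tar.gz -C /tmp/restore-test home/jay/projects

  # Differences in files changed since the backup was made are expected
  diff -r /tmp/restore-test/home/jay/projects /home/jay/projects | less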
To back up the entire OS, or just your own data?
Here at Exotic Silicon, we almost always suggest that you concentrate on backing up your own data and any essential configuration files, whilst leaving the operating system to be installed afresh.
Why? Backing up a running OS is awkward enough, but the real difficulty tends to come with the restoration. You'll usually need to install a minimal system anyway to even access your backups, at which point you've then got to overwrite parts of that running system whilst performing the restore. Any difference in versions between the new minimal installation and what you are restoring could lead to problems, but might be inevitable if you're restoring to different hardware - for example if a power surge took out not only your main system disk but also the motherboard as well.
Installing a BSD system from scratch is usually very quick and easy. The time consuming part is the configuration, but as long as you have all of your old configuration files to hand, re-installing any software packages and merging the old configuration with the defaults for whatever new version of the system you've installed, should be fairly straightforward.
Obviously it pays to make notes about any non-standard or obscure local modifications precisely for such future reference.
Additionally, whilst restoring a complete and working set of system binaries from backup might be fine in the case of a disk failure, for example, it's far less desirable when restoring from backup due to malware or network intrusion. In that case, it's necessary to be absolutely sure either that the backup pre-dates the attack, or alternatively to verify every single binary against a known good source, to avoid re-introducing unauthorised software to the system.
Disk image backups
What? Nobody has used whole-disk image backups for two decades!
True. For younger readers who were not around in the 1980s, a quick explanation might be in order. In that era, when backing up a hard disk we had a choice between making an image backup where the entire disk was copied sector by sector with no consideration of the underlying filesystem, or a file by file backup, which copied each file in the traditional way. Image backup has long fallen out of fashion and is rarely used nowadays. This is in most cases a good thing, because although the backup procedure was fairly simple and fast, it was inefficient, (copying even unused areas of the disk), and anything but a full restoration to the exact same make and model of disk was usually something of a challenge.
So why are we even talking about image backups, then? Because BSD systems do make this kind of backup particularly easy. You can boot into single user mode with all of the filesystems mounted read-only, (or some even unmounted), and copy the entire disk to tape or another raw device. Restoration is also trivial, as you can boot into a very minimal installation from a USB flash drive or CD, and restore with no additional software required.
There may be cases where this is a quick and easy way of backing up an embedded system. For example, the single board computers we used in our SBC bootcamp article have a removable eMMC chip as their main local storage. That can easily be removed, placed in a USB reader, and imaged to a file on a desktop machine in a matter of minutes.
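Both cases come down to dd(1) against a raw device. The device names below are purely illustrative - on OpenBSD the 'c' partition of the raw device covers the whole disk:

  # From single user mode, with filesystems mounted read-only: image the system disk to tape
  dd if=/dev/rsd0c of=/dev/nrst0 bs=64k

  # Image a removable eMMC module, attached via a USB reader, to a file on a desktop machine
  dd if=/dev/rsd2c of=sbc-emmc.img bs=1m

  # Restoring is the same command with if= and of= swapped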
Tier 3 - end of the day
Scope of this tier
Unlike the preceding tiers for which the system administrator is responsible, backups at this level become the responsibility of the individual workstation user.
Tier three is essentially a casual copy of the user's home directory at the end of the working day. In many cases this can be achieved with a simple shell script copying to a USB flash drive.
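A sketch of such a script - the mount point, username and naming scheme are of course just local assumptions, and the flash drive is assumed to be already mounted:

  #!/bin/sh
  # End-of-day copy of the user's home directory to a flash drive mounted at /mnt/flash
  set -e
  tar -czf /mnt/flash/home-jay-$(date +%Y%m%d).tar.gz -C / home/jay
  sync
  echo "End of day backup written to /mnt/flash"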
These backups are mainly intended to allow for quick recovery of a fairly recent version of any files that are accidentally deleted or overwritten. However in the event of a more serious system failure requiring the restoration of tier two or even tier one backups, the tier three backups may provide a newer version of files that were recently modified.
Additionally, in the event of a serious disaster such as fire or flood where your normal IT facilities cannot be immediately restored, tier three backups may allow users to do at least some useful work remotely from personal laptops.
Tier 4 - quick and casual
Scope of this tier
At its simplest, this can just be a shell script that copies the whole of the user's home directory to a separate partition or local network server. Since it's intended only as a way to undo an unintended modification to rapidly changing data, such as mailspools, that wouldn't have been fully captured on yesterday's tier three backup, it doesn't matter if the tier four backups are written to the same local disk.
This script should be quick and easy to invoke manually with a single command, and users should be encouraged to run it before making any difficult to reverse changes to local files, or after receiving difficult to replace data from outside your organisation by email or other means.
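A sketch of what that single command might look like, keeping the four most recent snapshots on a hypothetical /quick partition - both the location and the count are arbitrary local choices:

  #!/bin/sh
  # Quick snapshot of the user's home directory, keeping only the four newest copies
  set -e
  dest=/quick/$USER
  mkdir -p "$dest"
  tar -czf "$dest/snap-$(date +%Y%m%d-%H%M%S).tar.gz" -C / "home/$USER"

  # Discard everything beyond the four most recent snapshots
  ls -t "$dest"/snap-*.tar.gz 2>/dev/null | tail -n +5 | xargs rm -f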
Tier 5 - configuration files and logs
Scope of this tier
This is really intended as a convenience to the system administrator. It's effectively the equivalent of tiers three and four, but for important data specific to any servers they are responsible for.
Server configuration files should be backed up on the regular tier two backups anyway, but since they are usually very small, it's trivial to keep copies of them on a flash drive or another local machine. The backup could even be made to an archive in the administrator's personal home directory.
Having such a readily available copy to hand avoids the delay and inconvenience of fetching it from the tier two backups if a server needs to be re-installed, or if a configuration change is found to break something a few hours or days after it was made.
A shell script that archives all of the configuration files from all of the local servers in to a single timestamped tar file in the administrator's home directory, and automatically deletes the oldest copy when they exceed a certain number, is a reasonable way to implement this tier.
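A sketch along those lines - the server names, the ssh access as root, and the limit of ten retained copies are all assumptions to be adapted locally:

  #!/bin/sh
  # Gather /etc from each local server into a single timestamped tar file
  # in the administrator's home directory, keeping only the ten newest copies
  set -e
  dest=$HOME/config-backups
  stamp=$(date +%Y%m%d-%H%M%S)
  work=$(mktemp -d)

  for host in www1 mail1 files1; do
      mkdir "$work/$host"
      scp -rpq "root@$host:/etc" "$work/$host/"
  done

  mkdir -p "$dest"
  tar -czf "$dest/configs-$stamp.tar.gz" -C "$work" .
  rm -rf "$work"

  # Prune anything beyond the ten most recent archives
  ls -t "$dest"/configs-*.tar.gz 2>/dev/null | tail -n +11 | xargs rm -f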
But surely this is overkill?
At the end of the day, only you can answer this question, based on the value you put on your company's digital assets.
The comprehensive backup strategy we've outlined above is very similar to what we use internally at Exotic Silicon, and our archives still have plenty of data that we've preserved since the 1980s and 1990s. We've been keeping data safe since long before cloud storage was a thing, and along the way we've dealt with many a faulty tape drive, broken pins on IDC connectors, fried disk controllers, and deteriorating optical discs, as well as other challenges. Yet our old data is still intact, because we make a point of adopting a belt-and-braces attitude towards its safe storage.
If you simply don't have the resources to do all of this, it's possible to get reasonable protection by just doing what we described above as the tier two backups. This should be considered as the minimum necessary to be effective.
Take note!
Expensive hardware not required!
You absolutely don't need special 'encrypted' USB flash drives, or a high-end tape streamer to keep your data safe.
A regular blu-ray disc recorder available for under $100 or 100 euros will allow you to copy almost 50 GB of data to inexpensive but reliable write-once media in about an hour.
Whilst very cheap, low performance flash drives are best avoided for the tier three through five backups, moderately priced units are widely available that can sustain 20 MB/second when writing. Any encryption required can be trivially done on the host, so an ordinary flash drive is all that is required.
Of course, more advanced hardware makes a complex backup schedule faster and more convenient. However in modern times, cost really shouldn't be a barrier to entry for any small business, or even a serious home user.
Restoring data - the other side of backup
Restoration procedures
We've spoken in detail about performing backups, and how the procedures described in this article should offer strong protection against many types of data loss. But what exactly do we do if the worst happens? And how does this complicated multi-tier schedule actually protect us from the dreaded encrypting malware or a subtly faulty disk controller that's been corrupting our data over a long period of time?
Much depends on the exact situation that has caused you to need to revert to your backups in the first place. If you are recovering from a sudden data loss due to hardware failure, such as one or more disks that have just died, your considerations will be somewhat different compared to being in a scenario of having just discovered long term data corruption. In the former, you can usually expect all of your backups to contain valid data and will be looking to restore the most recent copies. In the latter, you'll need to identify which files need to be pulled from older backups, made before those files were damaged.
In any case, expect to be busy - the system administrator's work begins whilst the smoke is still rising. Take your time, and don't rush. Mistakes here will be costly, and the time required to physically read back the data is likely to far outweigh any interactive operator time you could save.
Before doing anything with the existing system, check who is responsible for assessing the damage, and ensure that all policy requirements have been met regarding investigation of the incident and possible insurance claims.
Warning!
Do not under any circumstances load valuable backup tapes into an untested drive!
Always test that the drive is mechanically sound with a blank or expendable tape first.
Restoration after disk failure, or significant loss and damage to hardware
This section is only intended as a broad outline of the steps you would need to take.
If only a single disk has failed:
If there is widespread hardware damage requiring replacement:
In both cases, continue with:
The exact procedure followed will always, of course, depend on the individual situation.
Restoration after discovering long-term data corruption:
This section is only intended as a broad outline of the steps you would need to take.
This scenario requires a somewhat different approach, as we can't automatically guarantee the integrity of the backed up data. Naturally, if the data was already bad before being backed up then no error will be thrown during the restoration - the data coming off the backup medium is intact as far as the drive and software are concerned, it just doesn't contain what you thought it would.
If the corruption was caused by hardware failure:
If the corruption was caused by malware or network intrusion:
In both cases continue with:
Always clearly mark any backups that are suspected to contain malware in any form, and if they are not needed for later analysis then destroy them.
Closing comments
We've discussed the importance of having an adequate backup schedule in place, how to reduce the size of the task by archiving unchanging data, presented a typical example of a comprehensive backup plan, and also touched on how to correctly store, label, and handle the backup media. We've seen that different approaches offer varying levels of protection against the diverse range of possible data loss scenarios, and talked a bit about how to approach the task of restoring.
If implemented correctly, with a high level of compliance, the five-tier backup schedule detailed in this article will not only protect your data from the vast majority of threats it faces, but also allow fast and reliable recovery in the event that disaster strikes.
Here's hoping that your backups are just as readable in 50 years time as our trusty 9-track tape!
Need help?
Whether you've got an old backup that stubbornly refuses to read, or need some one-to-one advice on the strategy that's most suited to your business, Exotic Silicon's commercial services are on standby for your call.