Selecting a Database Backup Process
Since data was first stored, there has been a need to recover data that was removed or lost. Computer systems make this easier than physical files due to the ease of copying data. The reasons for the loss varies from hardware failure to fat-fingered commands, but there has always been data somewhere that someone's needed to retrieve after it was gone. Here we evaluate backing up databases, which is part of a comprehensive system management plan and a requirement for anyone storing any kind of non-transient data in a database. Most of what is discussed here should be applicable to most database platforms in existence. Upcoming posts will explain how to set up these configurations for MySQL and SQL Server.
There are a few items that need to be taken into account when planning a backup strategy:
- How long do backups need to be stored?
- How much storage space is available?
- How much data can the business afford to lose in the event that restoring from backup is required?
- How long does a restoration from backup take?
All of these items can affect the choice of the backup strategy. A backup that takes 4 days to restore can be as bad as losing a day of data from a business perspective. Every stakeholder involved with the system needs to be consulted for input on the backup strategy and contingency plans need to be made available and agreed upon for all parties.
Database systems typically provide both full backups and incremental backups. A full backup is a snapshot of the entire dataset at the point in time the backup is initiated and contains every record and every committed change up to that point. The incremental backup works in conjunction with the full backup by containing changes in the order they were applied to the database since the last full backup was taken.
Full backups guarantee that as long as the backup file(s) are intact they can be loaded and the dataset will be complete as of the time backup was taken. Depending on the length of time a full backup takes to complete, a backup strategy could take a full backup several times a day if there was not a significant performance impact and the available storage space was available for the backups. Typically storage is the biggest limitation; so daily complete backups are fairly common with incremental backups filling in the gaps between the full backup.
Incremental backups supplement the full backups by containing the changes made to the data. When configured to do so, database systems will log all commands that modify the data in the order the operations were performed. The result is that if you have a full database backup (Backup A) and have all the log entries for the data operations from the time that full backup was taken until the next full backup (Backup B), Backup A with the applied logs will contain the exact same data as Backup B. Selecting a timeframe for creating a backup of these logs is the decision that will depend on how far back is acceptable data loss for the business and how long it takes to replay the logs against the last full backup.
The main drawback with incremental backups is that the length of time it takes to restore can be dependent on how much data changes. Consider the following example: Because the incremental backups store all data modification operations in the order performed, if all that occurs on a database is that a field is switched from a 1 to a 0 and back millions of times repeatedly between full backups, every single one of those operations will be logged and need to occur in the restoration. As a result, the time needed to restore using incremental backups needs to be calculated as the time to restore the last full backup as well as replay all the logged data operations. For some systems, this can mean multiple full backups per day even with the incremental backups if there is a lot of data churn.
The most common backup strategy we encounter here at Smart Software is a full backup taken each day. Daily full backups with hourly incrementals is a distant second. When setting up a backup strategy or reviewing an existing strategy, we always confirm with the client that the strategy fits their business goals and update the backup plan as necessary.
One thing that many people forget is that invalid backups are as bad as not having any backups at all. Whatever backup strategy is chosen, the recovery plan should be tested routinely to ensure that all the backups are valid and can be used for restoration. We have had many clients over the years that have set up their own backup plan and not tested to confirm it worked. Always test the restoration plan to ensure that resting works, and repeat the test periodically to ensure that it continues to work.