Backup for databases: Get familiar with the type you use

The design of a database determines what method is best suited for backing it up, and those methods vary quite a bit.

In order to back up a database, you need to know how it’s delivered, but you also need to know which of the more than 13 types of database designs it employs. Here we’ll cover four of them—relational, key-value, document, and wide column—that generate a lot of backup questions.

Understanding these models will help the backup team create a relationship and trust level with the database admins, and that will help both parties.

Four database types

Relational

A relational-database management system (RDBMS) is a series of tables with a defined schema, or layout, with records in rows of one or more attributes, or values.  There are relationships between the tables, which is why it is called a relational database, and why backups generally have to back up and restore everything. Examples of RDMBSs include Oracle, SQL Server, DB2, MySQL, and PostgreSQL

Key-value

A key-value database is one of many not-only SQL (NoSQL) databases with a single schema consisting of keys and values, where you can look up the value if you know the key. Examples include Redis and DynamoDB.

Document

A document database is a NoSQL DBMS specifically designed to store documents. Records do not need to conform to any uniform standard, and they can store very different types of data.  JSON is often used to store documents in such a database. MongoDB is easily the most popular database that supports only the document model.

Wide column

Wide-column databases are another schema-free NoSQL DBMS that can store very large numbers of columns of data, without a preschema. Column names and keys can be defined throughout the database. Cassandra is the best known database of this type.

Most admins are most familiar with backing up RDMBSs, which typically consist of daily database dumps of some sort, followed by hourly or continuous backup of the transaction logs. This allows you to restore the database to any point in time using the latest backup followed by replaying the transaction logs, which will recreate the transactions that occurred since the last backup. Keeping all of these backups on disk and accessible to the database in question makes restores much quicker.

Key-value databases are much simpler to back up because unlike RDMBs, there are no relationships between tables that prevent you from doing table-level backups versus backing up the whole database. Backups can therefore be done at the table level at any time using either a full or incremental dump.

Backing up document or wide-column databases, on the other hand, can be quite complicate.  This is because they typically run in a multi-node configuration, with parts of the database on each node, without the ability to back up the entire database in one step. What’s truly baffling to backup folks, though, is the consistency model that these databases use. Let’s take a look at these models.

Three consistency models

There are two very different ways in which views of inserted or updated database data are kept consistent (or not) for all viewers of the database, and this consistency model can affect how you think about the backup and recovery of that database. Consider what happens immediately after the insert of a new record. Will all users immediately see that inserted record or not?  The answer to that question determines if the database supports immediate consistency, eventual consistency, or a hybrid of the two.

Immediate consistency

Immediate consistency, also referred to as strong consistency, ensures that all users will see the same data at the same time, regardless of where or how they view the data. Most RDMSs follow this model. Backing up an immediately consistent database is easier because you can back it up at any point via any supported method, and you will always get a consistent view of the database.

Eventual consistency

The term eventual consistency comes from the idea that, provided no changes are made to the entity in question, all reads of that entity will return the same value—eventually. A great example of eventual consistency is the domain name system (DNS). It can take a few minutes to many hours for a DNS change to propagate across all DNS servers in the world, but eventually that change will make it, and all the DNS servers will return the same value. This is why backing up an eventually consistent database is more complicated than backing up one with immediate consistency.

Hybrid consistency

Hybrid consistency is often used in NoSQL databases that supports eventually consistent writes but allows you to specify on a per-API-call basis the level of consistency you need on a read. Backup processes can use this feature to specify they want fully consistent data.

The consistency model can affect your data protection method because you need to be sure you are backing up or have the ability to restore consistent data. If, for example, you backed up an out-of-date database node in a multi-node database like MongoDB or Cassandra, your backup would be out of date. And if one node had part of the data, and another node had another part of the data from a different point-in-time, you would not get a consistent backup of all data by backing them both up at the same time. You would have what is called referential integrity issues between the two nodes if you restored them to two different points in time. Here’s which databases support which consistency models:

database chart Rob Schultz

Three steps to follow

To successfully back up databases, first know how it’s being delivered—hardware or VMs you control, someone else’s  hardware delivered as a service, or serverless where you only insert, update, or delete records.

Next, know what kind of database you are dealing with, which will also dictate how it’s backed up. If it’s a popular RDBMS like Oracle or SQL Server, you can most likely use a backup agent from your favorite backup product. This will allow you to send backup data directly to whatever storage or cloud system you are using for backup. If it’s one of the other three database models discussed here, you will most likely have to use the dump-and-sweep method where you use the product’s backup tool to create a backup on disk that you then sweep up with your backup system. Dump-and-sweep may be more complicated than a backup-agent setup, but it comes with much greater flexibility. 

The third and biggest challenge comes if you have a multi-node database using the eventually consistent model. Do not listen to anyone who tells you that the database doesn’t need backups. While they can survive many things, like node failures, these databases are still susceptible to human error and cyberattacks. Research the product in question and find out how to get a backup of either the entire database or each table in that database. With such node-level resiliency, you will most likely be restoring things at a table level.

Learn as much as you can about the databases in your environment, whether they’re in your data center, running in VMs in the cloud, or simply provided to you as a service. Then learn what type of database and consistency model they use and use that knowledge to talk to the DBAs and be their friends. Then find the option that prioritizes the kind of restores that database is likely to need.

Copyright © 2022 IDG Communications, Inc.

The 10 most powerful companies in enterprise networking 2022