Databases, also referred to as structured data, are an essential part of any data center. While databases do not typically hold a high percentage of the terabytes housed in a given data center, they do hold a high percentage of mission-critical data. Understanding their unique structure and operation is key to backing them up.\nStructured data cannot be backed up like unstructured data due to three very big challenges. First, databases are typically stored in datafiles that are constantly changing as long as something is making updates to the database. This means you cannot just back them up like any other file.\nSecond, most databases will have some kind of journal that can be replayed to either restore transactions after a point-in-time restore or to roll back a partially completed transaction after a crash.\n\nAnd third, a typical restore starts by restoring the datafiles from the most recent backup and is then followed with a restore from a journal in order to restore the database to as recent a state as possible. The recovery point objective (RPO) made possible with typical single-step restores of most backup systems\u2014possibly more than 24 hours\u2014is simply not good enough for mission critical databases.\nThe key to getting the backups of your database right lies in understanding how your database solves these challenges.\nDatabase models\nThere are at least 13 different database models, and knowing how to back up yours starts with knowing what kind of database you are backing up.\nThese models include: relational (the most common), key-value, time series, document, graph, search engine, wide column, object oriented, RDF, multivalue, native XML, navigational, and event. The following is a list of just the most popular models, along with a few whose popular databases have generated a lot of backup questions.\nRelational\nA relational database management system (RDBMS) is what most people think of when they say the word database: a series of tables with a defined schema (table layout), records (rows), and attributes (values).\u00a0 Examples include Oracle, SQL Server, MySQL, and PostgreSQL.\u00a0 These databases are often called SQL databases, after the query language they use.\nKey-value\nA very simple NoSQL (Not only SQL) DBMS, consisting of keys and values, where you can look up the value if you know the key. Popular examples are Redis and DynamoDB.\nTime series\nA NoSQL database specifically designed to handle time data, as each entry has a time stamp.\u00a0 The popular Prometheus database is an example and is used quite a bit in Kubernetes.\nDocument\nA schema-free NoSQL DBMS is designed specifically to store documents. Records do not need to conform to any uniform standard and can store very different types of data. JSON is often used to store documents in such a database. MongoDB is the most popular database that supports only the document model.\nWide column\nAnother schema-free NoSQL DBMS that can store very large numbers of columns of data without a predefined schema is the wide-column model. Column names and keys can be defined throughout the database. Cassandra is the best known database of this type.\nDatabase terminology\nDatabase terminology is also important, so what follows is a list of important terms. Not all databases use the same terms, but they should have a similar term that means the same thing. NoSQL databases often use very different terms or may lack something similar to the item in question.\nDatafile: A datafile is where a database stores its data. This may be a raw device (e.g., \/dev\/hda1 in Linux), or a \u201ccooked\u201d file (e.g., \/sap\/datafiles\/dbs06.dbf or c:MySQLdatafile.dbf). At this point, most databases use cooked or regular files as datafiles, and most have more than one for each database.\nTable: This is where things get a bit murky.\u00a0 In a SQL, relational, database, a table is a bunch of related values that behaves kind of like a virtual spreadsheet. NoSQL databases may have something similar or they may not.\nTablespace: A tablespace is a space where you put tables and is a collection of one or more datafiles. If your database doesn\u2019t have tables, it probably doesn\u2019t have tablespaces.\nPartition: Modern databases can divvy up and spread or partition a table across multiple resources, including multiple tablespaces.\nSharding: Sharding takes partitioning to another level and is the key to large scale-out databases. Sharding can even place pieces\u2014shards\u2014of a table on different nodes.\nMaster database: A master database keeps track of the status of all databases and datafiles. If multiple databases are allowed, it needs to keep track of them as well.\nTransaction: A transaction is an activity within a database that changes one or more attributes within one or more tables. Simple transactions change one attribute, and complex transactions will change many attributes as a single, atomic action. NoSQL databases tend to use simple transactions, and many who use them don\u2019t even think of their transactions as such.\nTransaction log: A transaction log records each transaction and what elements it changed. This information is used in case of a system crash or after a restore to either undo or redo transactions.\nConsistency models\nThere are two very different ways databases ensure that views of inserted or updated database data are the same for all viewers of the database. These are referred to as consistency models, and they affect backup and recovery.\nThe first is immediate consistency, also known as strong consistency, and it ensures that all users will see the same data at the same time, regardless of where or how they view the data. Most traditional, relational, databases follow this model.\nThe second model is an eventually consistent or weak-consistency database, whiich\u00a0 ensures that a given attribute will eventually be consistent for all viewers, but that may take some time. A great example of eventual consistency is within the DNS system, which has to wait until the time-to-live for DNS records has expired before updating information about domain names. This can take up to 72 hours.\nWhat, how, and why are you backing up?\nIf you\u2019re responsible for backing up a database, you need to understand how it is built and how it works. You need to understand where it stores its data, such as datafiles, whether or not it uses complex or simple transactions, and where it stores the log of those transactions. You will need to know how to get a consistent backup of the stored data and the transaction log.\nYou also need to understand how distributed your database is. Is it partitioned, but all within one host or is it sharded and spread across dozens or hundreds of hosts?\u00a0 If it is the latter, you will most likely be dealing with an eventually consistent database. Getting a consistent snapshot of a database spread across hundreds of nodes will be quite challenging, and restoring it will be just as challenging.\nSome may think that an eventually consistent database that uses replication across many nodes doesn\u2019t need to be backed up, but it definitely does. While you are protected against node failure, you are definitely not protected against human error. If you drop a table, it doesn\u2019t matter how replicated it is. You will need to restore it.