If you want to make a backup person apoplectic, call an old backup an archive.
It’s just shy of saying that data on a RAID array doesn’t need to be backed up. The good news is that the differences between backup and archive are quite stark and easy to understand.
What is backup?
A backup is a copy of data created to restore said data in case of damage or loss. The original data is not deleted after a backup is made.
Examples of backups include a nightly backup of all files on your laptop or desktop, or all the photos on your iPhone being copied to iCloud in case you drop your phone. We also back up file servers (unstructured data) and databases (structured data). A backup might focus on the data, as in a database dump; on the operating system of the server, as in a bare-metal backup; or on both, as with a backup of VMware .VMDK files.
The definition of backup really comes down to purpose, and the purpose of a backup is always the same: to restore data if something happens to it. For example, a RAID 6 array might have a triple-disk failure, and all its data will need to be restored. Someone might accidentally or maliciously delete one or more VMs in your VMware, Hyper-V or AWS EC2 configuration, and they would need to be restored. You might one day realize every file in your organization has been encrypted by a ransomware package. Without a good backup system, your only choice would be to pay the ransom. With a good backup system, you could figure out the source of the ransomware, stop it, then restore all your data without ever paying the hacker.
What is an archive?
An archive is a copy of data created for reference purposes. Although not required, the original is often deleted after an archive is made.
Where the purpose of a backup is to put something back to how it looked (usually) yesterday, an archive can serve multiple purposes. The most common purpose is to help you find some data from a long time ago. It could be a single file that had a really important item in it, such as a contract a customer signed several years ago. It might be a related group of data, such as all the structural drawings of the building that just collapsed. Or it might be all the CAD drawings of the widget your company used to make that went out of style but is now back in style.
Another related data set might be all emails and/or files that can prove a given point. Perhaps an employee believes they were given permission to moonlight, and was then fired for doing so. Their lawsuit might include an electronic discovery request asking for all emails to and from them that contain the words moonlight, after-hours or the name of the company they were going to moonlight for. Someone else might be trying to prove a hostile work environment and want to see all emails from a particular set of managers that contain a certain list of words that we do not need to list here.
An archive is what would help you accomplish all of these tasks. You might have an archive of every sales order, quote or contract your company has ever issued. You might keep current contracts and orders online, but you keep all of them in the archive, which should have an index that lets you retrieve orders and contracts by their content. You also might have an archive of every email ever sent or received by your company.
Some email archive systems can purge from the email server emails that have been archived, are bigger than a certain size, and/or haven’t been accessed in over n days. This helps keep the email system lean, saving on computing and storage resources, and making it easier to back up. That might even be the purpose of that archive, if you’re not required by law to keep all emails.
Restore vs. retrieval
Even if the purpose of an archive is to save space on primary storage, it needs to be able to perform a retrieval rather than a restore if it is to be called an archive. Backup systems restore and archive systems retrieve.
When you restore something, it is typically a single file, server or database. When you retrieve something, it’s usually a collection of related data that may or may not have been stored on the same server or even in the same format. A restore is also done to a single point in time, such as restoring a database to the way it looked yesterday. A retrieval uses a range of time, such as all emails for the last three years.
Restores require you to know a lot about where the file or data was when it was backed up; otherwise, you can’t find it. You need to know the name of the server it was on, the database or directory it was in, the name(s) of the file or table you want back and the date when it was last seen. With a retrieval, you have none of that information; you just know you need all the files or records that match a set of parameters. For example: give me all files or emails that were created in the last three years and that contain a particular phrase or were authored by a particular person.
Why the difference matters
Many people try to use their backup system as an archive system, meaning they keep their backups for many years, or even forever. The first time you get a real retrieval request, you’ll find out how difficult it is to perform a retrieval from something that is meant to do restores. The retrieval will take much, much longer (potentially months instead of minutes) and cost much, much more (millions instead of a few dollars).
If the retrieval is for an electronic discovery request from a lawsuit, and you are unable to satisfy it in a timely manner, you run the risk of the judge issuing an adverse inference instruction. You’ve taken six months to satisfy what the judge knows to be a simple request, and you’re nowhere near complete. The judge infers you’re trying to hide something and tells the jury so. You have just lost the case. The most infamous example of this was the Morgan Stanley lawsuit, where the jury returned a verdict of more than a billion dollars against the firm in this exact scenario.
Don’t use your backups as archives. If you have a long-term storage need, investigate an actual archive system. There will be an upfront cost, but it will be worth it in the long run.
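For readers who think in code, the restore-versus-retrieval distinction described earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real backup or archive product works: the `BackupCopy` and `ArchivedItem` records, the `restore` and `retrieve` functions, and the sample server, path and author names are all hypothetical. The point is the shape of the two questions: a restore needs the exact server, path and point in time, while a retrieval needs only a date range and some search criteria.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class BackupCopy:
    """One backed-up version of one item (hypothetical record layout)."""
    server: str
    path: str
    backed_up_on: date
    content: str


@dataclass(frozen=True)
class ArchivedItem:
    """One archived item, indexed by author, date and content (hypothetical)."""
    source: str
    author: str
    created_on: date
    content: str


def restore(catalog: List[BackupCopy], server: str, path: str,
            as_of: date) -> Optional[BackupCopy]:
    """Restore: one known item, from one known place, at one point in time."""
    candidates = [
        c for c in catalog
        if c.server == server and c.path == path and c.backed_up_on <= as_of
    ]
    # Newest copy made on or before the requested date, or None if there is none.
    return max(candidates, key=lambda c: c.backed_up_on, default=None)


def retrieve(archive: List[ArchivedItem], start: date, end: date,
             phrase: Optional[str] = None,
             author: Optional[str] = None) -> List[ArchivedItem]:
    """Retrieve: every item in a date range that matches the given criteria."""
    results = []
    for item in archive:
        if not (start <= item.created_on <= end):
            continue
        if phrase is not None and phrase.lower() not in item.content.lower():
            continue
        if author is not None and item.author != author:
            continue
        results.append(item)
    return results


# Made-up sample data, for illustration only.
CATALOG = [
    BackupCopy("fs01", "/contracts/acme.docx", date(2024, 3, 1), "draft"),
    BackupCopy("fs01", "/contracts/acme.docx", date(2024, 3, 2), "signed"),
]
ARCHIVE = [
    ArchivedItem("mail", "alice", date(2022, 5, 10),
                 "Approved: you may moonlight on weekends."),
    ArchivedItem("fileserver", "bob", date(2023, 1, 15),
                 "Quarterly sales report"),
    ArchivedItem("mail", "carol", date(2019, 7, 4),
                 "Moonlight policy reminder"),
]
```

Calling `restore(CATALOG, "fs01", "/contracts/acme.docx", date(2024, 3, 2))` only works because the caller already knows the server, the path and the date. Calling `retrieve(ARCHIVE, date(2021, 1, 1), date(2024, 1, 1), phrase="moonlight")` needs none of that, and it returns matching items regardless of which system they originally lived on.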