Data backup and restoration can be something of a black-box effort. You often don't know whether you fully nailed it until disaster strikes, and there is always room for improvement, especially as cloud and hybrid options grow. We asked four network professionals to share what made them realize they should do more to bolster their organization's backup and recovery processes, and how they made that happen. Here are their stories.

A Kansas university outgrows tape backups

The aha moment: In May 2011, a tornado hit Joplin, Mo., and Tim Pearson, a volunteer fire chief in a nearby town, was called in to help in the aftermath. "Suddenly, I was in a town that I knew well but couldn't recognize anything. They literally painted intersection names on the streets to help people get oriented," says Pearson, who is director of infrastructure and security at Pittsburg State University in Pittsburg, Kan.

His colleagues with data centers in Joplin were struggling just to identify where their sites had stood, let alone how to get their networks back online. He realized that PSU's approach of keeping traditional tape backups, rotated weekly, in a bank vault across town didn't provide enough reliability for the region's weather patterns. "We had to take a fresh look at our vulnerabilities," he says.

Geographic diversity

The fix: Initially, Pearson and his team addressed the university's geographic vulnerability by placing another Dell EqualLogic storage array and 50 percent of its virtual computing horsepower in the basement of a library across campus from the university's primary data center.
The team also added a Dell MD3200 storage array at Wichita State University (WSU), which PSU connects to via the Kansas Research and Education Network, a high-speed fiber ring. Data was manually replicated to the secondary site (the library) several times throughout the day. Backups were sent nightly to WSU, eliminating the cumbersome tape process that had been in place.

"A tape retrieved from the vault might be a week old and take a day to recover," Pearson says, adding that a disaster that took out the primary and secondary sites would make it even more difficult to restore the data from the tapes.

Although the library and WSU arrays worked well, the PSU team decided to improve backup and recovery even more, weaving in Hedvig's Distributed Storage Platform (software-defined storage) for automated orchestration. Hedvig uses agreed-upon policies to manage data replication in real time among multiple nodes: the primary data center, the library and WSU. "As long as two of the three nodes are up and running, our data is accessible," he says.

The system was tested recently when the link to WSU was temporarily shut down due to an unplanned router reboot. "Hedvig noted a problem, isolated it and got the WSU system caught up as soon as the link came back online 15 minutes later. Our data center continued normal operations throughout the incident," Pearson says.

Hedvig works well with the university's legacy systems, which are still housed on a Unix server with iSCSI connections.
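The two-of-three availability Pearson describes is a classic majority-quorum scheme: operations succeed as long as a majority of replicas respond. A minimal sketch of the idea in Python (node names are illustrative; this is not Hedvig's actual interface):

```python
# Minimal sketch of majority-quorum replication, the general technique behind
# "as long as two of the three nodes are up, our data is accessible."
# Node names are hypothetical; this is not Hedvig's actual API.

NODES = ["primary-dc", "library", "wsu"]

def quorum(n_replicas: int) -> int:
    """Smallest majority of n replicas (2 of 3, 3 of 5, ...)."""
    return n_replicas // 2 + 1

def write(stores: dict, key: str, value: str, up: set) -> bool:
    """A write succeeds only if a majority of nodes acknowledge it."""
    acks = 0
    for node in NODES:
        if node in up:
            stores[node][key] = value
            acks += 1
    return acks >= quorum(len(NODES))

def read(stores: dict, key: str, up: set):
    """A read succeeds if a majority of nodes are reachable; take the
    value the majority of responding replicas agree on."""
    values = [stores[node].get(key) for node in NODES if node in up]
    if len(values) < quorum(len(NODES)):
        return None  # too few live replicas to guarantee a fresh answer
    return max(set(values), key=values.count)

stores = {node: {} for node in NODES}
# All three nodes up: write and read succeed.
assert write(stores, "roster", "v1", up={"primary-dc", "library", "wsu"})
# WSU link down (as in the router-reboot incident): two of three still suffice.
assert write(stores, "roster", "v2", up={"primary-dc", "library"})
assert read(stores, "roster", up={"primary-dc", "library"}) == "v2"
# Only one node up: below quorum, so the data is unavailable.
assert not write(stores, "roster", "v3", up={"library"})
```

With three nodes, any two can satisfy reads and writes, which is why a single failed link left the data center running normally while the lagging node caught up afterward.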
"Most of the other vendors we looked at didn't support that type of legacy configuration [which the school is dependent upon], but Hedvig handles it quite elegantly. Their client-facing 'proxy' interfaces (small physical or virtual Linux servers) serve as multiprotocol connectors into the Hedvig storage environment and offer a range of block and object-oriented protocols, including NFS, Amazon S3 and even iSCSI," Pearson says.

PSU's IT team tests recoverability as part of routine maintenance, bringing down nodes and recording response times. All of the storage network configurations are well documented and updated often.

"My experience in the fire service and at Joplin makes me aware that you can't take anything for granted, and my advice is to get as much geographic diversity in your storage network as possible," Pearson says.

Correctional services team shores up backup vulnerabilities

The aha moment: "There were two moments that really drove us into high gear for changing how we're doing backup and recovery – one man-made and the other a natural disaster," says Dwain Caldwell, a systems administrator in Iowa's Department of Correctional Services. Caldwell works in DCS's First Judicial District, which provides correctional services to 11 counties in northeast Iowa.

A few years ago, a user in a supervisory role visited a website, not knowing it hosted ransomware. "Nothing jumped out to the person," Caldwell says. The ransomware penetrated the main file systems, but Caldwell and his team were able to stop it relatively quickly. Although the team had a valid backup to restore to, the time it took to bring operations back to normal was longer than expected. "Training employees helps, but we can't control social engineering.
What we can control is how fast we can get back online," he says.

The second incident was a storm that sent water into the building housing the primary site and knocked out power in the secondary site's building. "I didn't think we were susceptible [to full downtime] until that happened," Caldwell says. Having the primary and secondary sites so close together, with no third alternative, was an unreliable strategy.

Virtualization speeds data recovery

The fix: In recent years, DCS and the Department of Corrections as a whole have worked to virtualize their computing environments, including adopting virtual desktop infrastructure, and Caldwell says his district of DCS is about 80 percent virtualized. This has made implementing a new data-backup and restoration plan much simpler.

DCS uses Nutanix Core hyperconverged infrastructure to handle VDI, data protection and disaster recovery in the data center and at remote sites. "We are able to set up our policies for backup and restore so it all happens behind the scenes if someone makes a mistake," he says.

Nutanix frequently takes and stores snapshots of production environments – typically every 15 minutes – so if DCS is hit by a ransomware attack, Caldwell and his team can automatically restore systems to the most recent clean snapshot.

The IT team has developed experiments to test recovery time, including taking down a server room so a node goes offline. "The goal is to see how long it takes VMs on that node to come back online on other nodes," he says.

Restoring applications goes hand in hand with restoring data, he says, because most of the applications, such as the probation and parole systems, are heavily data-dependent.
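Restoring to the most recent snapshot amounts to a simple selection over a snapshot timeline, and the 15-minute interval bounds how much data can be lost (the effective recovery point objective). A rough sketch, with hypothetical timestamps rather than Nutanix's actual API:

```python
# Sketch of "restore to the most recent clean snapshot": with snapshots every
# 15 minutes, the worst-case data loss (RPO) is the snapshot interval.
# Timestamps and structures are illustrative, not Nutanix's actual API.
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(minutes=15)

def snapshot_schedule(start: datetime, end: datetime):
    """All snapshot times between start and end, every 15 minutes."""
    t, times = start, []
    while t <= end:
        times.append(t)
        t += SNAPSHOT_INTERVAL
    return times

def latest_clean_snapshot(snapshots, incident: datetime) -> datetime:
    """Most recent snapshot taken strictly before the incident."""
    clean = [s for s in snapshots if s < incident]
    if not clean:
        raise ValueError("no snapshot predates the incident")
    return max(clean)

day = datetime(2019, 6, 3)
snaps = snapshot_schedule(day, day + timedelta(hours=12))
incident = datetime(2019, 6, 3, 9, 40)  # ransomware detected at 09:40
restore_point = latest_clean_snapshot(snaps, incident)
assert restore_point == datetime(2019, 6, 3, 9, 30)
# Only the 10 minutes of writes after 09:30 are lost -- always bounded
# by the 15-minute snapshot interval.
assert incident - restore_point <= SNAPSHOT_INTERVAL
```

The node-failure experiments the team runs measure the other half of the equation, recovery time: how long VMs take to come back on surviving nodes.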
"Users need access to historical data as much as the application itself," he says.

In the event data becomes unavailable from the Nutanix system, as in a flood or storm, Caldwell can tap into incremental backups stored on an EMC Data Domain storage appliance in the same city, as well as on another in a different geographic location; the closer location is backed up more frequently. "We'd spin the best backup into a virtual-sandbox environment and then push it to the main data center," he says.

"Backup solutions today are so much more universal than before. You used to have to make sure the environment you were restoring the tape in exactly matched the original configuration. In our hypervisor environment, we are able to have our data available more quickly and efficiently," Caldwell says. The virtualized environment and automation also allow all storage responsibilities to be handled by two members of the IT team. "We are able to perform the backup and restoration piece and still wear a lot of other hats."

Backup and recovery for Microsoft Office 365

The aha moment: The Aquilini Group has many subsidiaries, including the Vancouver Canucks and the team's home rink, Rogers Arena. The company also runs all of the arena's operations, including food and beverage services, and owns hotels, construction companies, restaurants, and blueberry and cranberry farms. The common theme across these investments is the need to protect data – whether customer information, surveillance-camera footage or point-of-sale transactions. That protection was tested when a third-party-led SAN upgrade went wrong and threatened to destroy a significant amount of data.

"We wouldn't have been able to serve food and beverage at an event, which would have resulted in revenue losses and customer dissatisfaction," says Bryce Hollweg, director of IT at Aquilini Investment Group in Vancouver, B.C.
Fortunately, the internal IT team had backed up the data properly and was able to restore all of it. But the episode left Hollweg wanting to be even more proactive about backing up all data – even data generated by applications in the cloud.

3rd-party backup for SaaS

The fix: The Aquilini Group has migrated to Microsoft Office 365 for its nearly 1,500 employees. And while Microsoft, like most SaaS providers, is good about guaranteeing uptime of the application, it is less willing to take responsibility for data integrity. "We have some sensitive data that traverses the Office 365 network and need to protect it," Hollweg says. In addition, loss of the company's mailboxes would undoubtedly hurt productivity. "The more layers you can put in place, the better. A secondary and tertiary measure for cloud applications is not a bad practice."

Aquilini uses Veeam Backup for Microsoft Office 365 as that secondary measure, protecting Exchange Online, SharePoint Online, Teams (chat) and OneDrive against accidental deletion, supporting rapid restores and meeting compliance demands. The backups can be stored on premises, in the cloud in Microsoft Azure or Amazon Web Services, or with a third-party provider.

Hollweg says he doesn't mind having multiple, targeted tools to manage, even with a lean staff, because the protection is customized to the type of data being stored, which makes recoverability faster and easier.
"Segregating information is good so there's not one pot where, if someone cracks the code, they have access to the crown jewels."

Local protection for virtual machines

The aha moment: When The CSI Companies, a recruiting and healthcare IT consulting firm based in Jacksonville, Fla., decided to virtualize its environment, including SQL Server, with VMware, Matt Greaves wanted to make sure that recovery time objectives remained intact.

"When we started doing recovery tests for all the virtual machines, the results were scary. An entire site restore, which we thought would take 30 hours, was more like 90 hours. That was a huge pain point," says Greaves, director of IT at The CSI Companies. "With 3,000 to 4,000 people needing to get paid each week, even two hours of downtime for payroll systems can cause a significant rift."

The previous backup and recovery software The CSI Companies used required IT to manually set policies for when to perform backups, for what period of time, and for which applications. Inevitably, there were gaps that would leave the team with an out-of-date or incomplete backup, and the only option after a disaster would be to manually dig through and restore individual transaction logs.

On-premises backup can cost less

The fix: Greaves decided to take advantage of the virtualized environment and deployed a stand-alone storage appliance from Rubrik that hooks directly into the VMware environment. IT can apply a specific policy – gold, for example – to the VMs listed in vCenter and automatically protect data at a granular level. "They do policy-driven backup points, so I can set the SQL Server to get a transaction log snapshot every few minutes and then a full database snapshot every couple of hours," he says.
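That mix of periodic full snapshots and frequent transaction-log snapshots is what makes fine-grained point-in-time restores possible: start from the newest full snapshot before the target time, then replay logs forward up to it. A simplified sketch (times and values are hypothetical; this is not Rubrik's interface):

```python
# Sketch of point-in-time restore from periodic full database snapshots plus
# frequent transaction-log snapshots, the general scheme described above.
# Times and log contents are illustrative, not Rubrik's actual API.

def restore_to(target: float, fulls: dict, logs: list):
    """Pick the newest full snapshot at or before `target`, then replay
    every transaction log after that snapshot up to `target`."""
    base_times = [t for t in fulls if t <= target]
    if not base_times:
        raise ValueError("no full snapshot predates the target time")
    base = max(base_times)
    state = dict(fulls[base])  # start from the full snapshot
    for t, key, value in sorted(logs):
        if base < t <= target:
            state[key] = value  # replay committed transactions in order
    return state

# Full snapshots every 2 "hours" (keys are hours since midnight).
fulls = {8.0: {"balance": 100}, 10.0: {"balance": 130}}
# Transaction-log snapshots every few minutes in between.
logs = [(10.1, "balance", 135), (10.2, "balance", 150), (10.4, "balance", 90)]

# Restore to 10.25: base is the 10.0 full, plus the logs at 10.1 and 10.2.
assert restore_to(10.25, fulls, logs) == {"balance": 150}
# Restore to 9.0: no logs apply; the state is just the 8.0 full snapshot.
assert restore_to(9.0, fulls, logs) == {"balance": 100}
```

The denser the log snapshots, the closer a restore can land to the moment just before a failure, without paying the cost of continuous full backups.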
Transaction logs are now applied automatically as needed for a full restore.

"Backup and recovery used to be something managed on a daily basis; now the only time we need to manage Rubrik is if we get an alert and need to investigate," he says. As for documentation, Greaves says coworkers can get up to speed on Rubrik's use with a one-page best-practices sheet that sits on the company's SharePoint site.

He considered moving applications and infrastructure to the cloud, including backup and recovery, but balked at the price. "It's so easy to get into the cloud for infrastructure and start spinning stuff up, but there is an hourly cost to all those tools. When we did a cost analysis, it was far cheaper to keep everything on premises," he says.

Experts recommend SaaS backup

Many IT managers feel confident about their ability to back up and restore data on-site or from a secondary data center. It's when you introduce cloud-based services that things get murky.

"We see companies engaging a cloud service to replace on-premise service for applications like CRM without any real understanding of how that service handles backup and restore issues," says John Burke, CIO and principal research analyst at Nemertes Research.

Customers often get hyper-focused on failover capabilities and business continuity but don't consider data corruption, or the times when they need to roll back to a previous week's data. "That's not always a default capability," Burke says.

Vinny Choinski, senior IT validation analyst at Enterprise Strategy Group, agrees, emphasizing that "data recovery is your responsibility" when it comes to SaaS. "What if someone deletes your data?
It's prudent to make sure you understand the recovery climate of your application."

One way to winnow the growing field of backup and recovery service providers is to ask your SaaS provider which ones it prefers. Opting for one of its partners could make integrating backup for SaaS easier as well.

And while signing on to backup and recovery services for your SaaS will likely add to what you planned to be a lower-cost option for your applications, both Burke and Choinski say not doing so will leave your data vulnerable.