My humble beginnings

Back in the early 2000s, I was the sole network engineer at a startup. By day, my role included managing four floors and 22 European locations packed with different vendors and servers, spread across three companies. In the evenings, I administered the largest enterprise streaming network in Europe with a group of highly skilled staff.

Since we were an early-stage startup, combined roles were the norm. I'm sure that most of you who joined as young engineers in similar situations can understand how I felt back then. Still, it was a good experience, so I battled through it. To keep my evenings stress-free and without any IT calls, I had to design in as much high availability (HA) as I possibly could. After all, the interesting technological learning was in the second part of my day, working with content delivery mechanisms and complex routing.

All of this came back to me when I read a recent post on Cato Networks' self-healing SD-WAN for global enterprise networks.

Cato is enriching the self-healing capabilities of Cato Cloud. Rather than requiring the enterprise to have the skill and knowledge to anticipate every type of failure in an HA design, Cato Cloud now heals itself end-to-end, ensuring service continuity.

The importance of redundancy

HA is a necessity for application stability. It is often misread as a value add-on, when in fact it is a must-have component. Digital transformation relies on network stability, so we require a stable and consistent networking experience.

Delivering an always-on, highly available network is easier said than done. Local redundancy isn't enough; you need to plan for multiple layers of failover across the entire network and security infrastructure, at the device, site, regional, and global levels.

Every end-to-end component needs to be made redundant, which increases design complexity and adds recurring costs for the additional equipment.
This additional equipment may be used only for minimal periods, alongside spares sitting in storage.

But the more equipment, the more complex the HA interactions. Within the IT network that I ran, for example, a particular vendor offered local device redundancy: supervisor engine failover that could perform nonstop forwarding if the primary supervisor failed. Essentially, the brain of the device was redundant.

The configuration followed the vendor's validated designs and was tested appropriately during the deployment phase. But there was a limitation: whenever a software bug or hardware failure actually struck, it never worked when I wanted it to.

I was often left to defend myself, without a professional explanation, to the chief executive officer (CEO). Most of the time, I ended up just saying, "sometimes these things just don't work." As a result, even today when I hear of self-healing and nonstop forwarding, I stop to take a breath.

Site-to-site redundancy

I knew in the back of my mind that high availability for a single device was not 100 percent foolproof, but I still had to extend the high-availability design across the enterprise, between multiple diverse locations. Each site had different edge equipment. Let's just say it was a complicated project.

I broke my high-availability strategy down into regions, generally based on latency. During intervals at night, I would replicate data between the regions. Although this worked most of the time, any latency interference would cause the job to fail.

My career quickly moved on to designing high availability for both greenfield and brownfield MPLS networks. Designing within the constraints of such networks takes extensive skill and effort. It is a challenge to get right, requiring knowledge and skill along with a lot of testing, feedback, and documentation.

In the mind of the engineer

Today's infrastructure is more diverse and interconnected than ever before.
There are even more moving parts. All of this makes HA complicated, and it gets even more complicated when the HA design lives solely in the mind of the engineer rather than in a central database where it can be modeled, updated, and controlled.

How an engineer designs often depends on his or her day-to-day working practices and prior technical knowledge. There are many ways to design high availability, and just as many ways to shoot yourself in the foot. If the design is in the mind of the engineer, it will consist of manual configurations, which cause problems when a location fails.

Daisy chain of manual events

A location failure results in a daisy chain of manual events. Engineers must manually update policies on the firewalls and other security or networking appliances. There have never been any "follow-the-network" security rules, where security rules change dynamically with the network.

More importantly, when connectivity is finally restored, you need to make sure that outdated security rules do not break the application service.

The rollback process was usually a document put together by someone who had already left the organization. It consisted of a series of steps: for example, if node 1 fails in data center 2, change policy x on firewall A to policy y on firewall B. The list goes on. Most of the time, we would just wait for the big bang.

Wait for the big bang

This is certainly not something you can test and pre-plan for. You just have to wait for the big bang to happen, usually around the three-year mark, and analyze what happens then.

I thought to myself: wouldn't it be great if we could push all this complexity to the cloud and let the cloud take care of it?

My take on Cato

Cato's self-healing capabilities minimize the chances that problems will crop up in an HA design.
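To make concrete what self-healing automates away, consider the manual runbook step quoted earlier: if node 1 fails in data center 2, change policy x on firewall A to policy y on firewall B. That mapping can be encoded once as data and driven by failure and recovery events, instead of living in a document in someone's drawer. A minimal sketch, with hypothetical node, firewall, and policy names throughout:

```python
from dataclasses import dataclass, field

@dataclass
class Firewall:
    name: str
    active_policy: str
    history: list = field(default_factory=list)  # previous policies, for rollback

    def apply(self, policy: str) -> None:
        self.history.append(self.active_policy)
        self.active_policy = policy

    def rollback(self) -> None:
        if self.history:
            self.active_policy = self.history.pop()

# The runbook, encoded as data instead of a document:
# failed node -> (firewall to change, failover policy to apply)
FAILOVER_PLAN = {
    "dc2-node1": ("firewall-B", "policy-y"),
}

def on_node_failure(node: str, firewalls: dict) -> None:
    """'Follow-the-network' step: a node failure triggers the mapped policy change."""
    if node in FAILOVER_PLAN:
        fw_name, policy = FAILOVER_PLAN[node]
        firewalls[fw_name].apply(policy)

def on_node_recovery(node: str, firewalls: dict) -> None:
    """On restored connectivity, roll back so stale rules don't break the service."""
    if node in FAILOVER_PLAN:
        fw_name, _ = FAILOVER_PLAN[node]
        firewalls[fw_name].rollback()
```

On failure, the mapped firewall swaps to its failover policy; on recovery, it rolls back automatically, so outdated rules never linger once connectivity returns. This is the kind of event-driven logic a self-healing platform runs for you end-to-end.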
Cato Cloud replaces the myriad of appliances, VNFs, and standalone services that make up the network with a single software engine for routing, optimizing, and securing all WAN and Internet traffic. A simpler network is a more reliable network.

Cato then builds self-healing from the data center, through the Cato network, and out to the remote office. Changes to the network automatically trigger updates to security policies. It's the kind of HA integration I wish I had when I ran my network.

Now that Cato has fully converged self-healing into its security and networking cloud platform, it has revolutionized the way we look at SD-WAN. By remediating network failures, updating the security infrastructure, and adapting workflows according to business priority, we are witnessing a new era of SD-WAN.