When it comes to enterprises successfully transitioning to DevOps and the cloud, the key isn’t just technological agility but also organizational agility, also known as “culture.”
At least, that was the contention of Constantin Gonzalez, principal solutions architect at Amazon Web Services (AWS), in a session at the recent AWS re:Invent conference in Las Vegas. Titled The Enterprise Fast Lane - What Your Competition Doesn’t Want You To Know, the session paired Gonzalez with Christian Deger, chief architect for European car-market site AutoScout24, to discuss the company’s ongoing journey from monolithic apps running on .Net/Windows in on-premises data centers to microservices architectures running on JVM/Linux in the AWS cloud. You can see the video of the entire talk below, but I wanted to highlight some of the most interesting takeaways.
1. Dev and Ops have different priorities
The journey to the cloud can raise issues between groups, Deger said. For example, while developers typically embrace change, operations staff traditionally strive for stability. “There’s always tension between those two mantras,” he noted.
2. More servers are not the answer
Before moving to the cloud, Deger said, the Munich-based company’s traditional architectures were coupled with agile principles and “supported the growth of AutoScout24 for many years.”
So far so good. But engineers didn’t know how their apps were run in production, “so Ops would just spin up new servers to compensate for any performance issues.” The company used a ticket system to request new resources, he explained, and engineers would specify what they needed and then wait for it to appear. No one was happy.
3. The desire for change is NOT universal
“A small group wanted change,” Deger recalled, “but a large group opposed it.” But when AutoScout24 was sold to new investors, the new CEO realized that “we are good, but not great” and gave the company “one year to get ready for the future.” That meant everything from attracting the best talent to moving to more modern platforms.
4. Start small, but do it all
AutoScout24 decided to start small to avoid overwhelming the company. The idea was to treat initial changes as experiments, Deger said, and learn from them before tackling the bigger issues.
At the same time it was upgrading its infrastructure, the company was revamping the AutoScout24 website. Given the risks in changing so many things at once, Deger said, the company wondered if there were any intermediate steps it could take—leaving others for later. Ultimately, that was rejected. “We need to do it all,” Deger said.
5. Scaling teams is difficult
In November of 2014, Deger recalled, a group of Dev and Ops folks spun up an AWS instance and moved from running .Net on Dell PCs to C# on Macs. This new team was responsible for everything, from infrastructure to app delivery.
As the process progressed, Deger said, the company ramped up that original team, adding more members and then splitting it into two to four new teams. These split teams worked to share knowledge, and the new teams learned the new way of working very fast. Unfortunately, this process destabilized the existing teams and didn’t allow them to become performant.
6. Decide on your core IT principles
In addition, the new team members didn’t always understand the original goals.
So, AutoScout24 came up with a set of core IT principles designed to help everyone get on board. These principles included both strategic goals—such as reduce time to market—and business objectives, as well as the architectural, design and delivery guidelines needed to support them.
While Deger presented this list of principles to the audience (and you can see them in the video below), he asked attendees not to copy them for their own use. Many are generic, he said, and you’d likely come up with similar ones yourself. More important, he added, “the valuable part is the discussions you have to come up with your own.”
7. Create technology “guilds”
In what seemed a charmingly German approach, AutoScout24 also set up what it called “guilds” to foster cross-team communications around topics such as macro architecture, infrastructure, QA and so on. These “self-organizing common-intent groups” meet weekly, Deger said. Some do actual work, some make decisions and others just share information as needed.
8. Microservices architectures mirror corporate structures
Citing Conway’s Law, Deger noted that autonomous systems tend to organize around business capabilities.
“We wanted to have a microservices architecture,” Deger said, “which means the company also had to be set up like that. … We wanted to build products, not projects,” and allow teams to make fast, local decisions.
The mantra became: You build it, you run it. Each team is responsible for fixing anything that goes wrong, which leads to resilient and robust services, he said, because the team does not want to be woken up in the middle of the night when something goes wrong.
9. Have a plan for making ongoing technical decisions
Each team was empowered to make its own technology decisions, but AutoScout24 didn’t want to end up with a stew of different technologies. So, while the first of the new teams made technology choices and built tooling around them, “when the third, fourth and fifth teams came onboard,” he said, they could pick up these choices. But they weren’t forced to do so if they had a good reason to do something different. Of course, these teams in turn would be required to support their own technology decisions.
10. Servers aren’t cattle; they’re hamburger
Expanding on the common meme that servers are becoming cattle, not pets, Deger went a step further. At AutoScout24, he said, servers are hamburger, not cattle: “We’re not interested in the cattle, but only in the meat.”
None of the company’s servers survive a software update, he said, and AutoScout24 is working on a containerized approach, though that is still in a transition phase.
11. Staging environments are history
Staging environments are designed to look like production, Deger said, and used to test every new software release. But in a microservices world, many releases behave differently in production than they do in testing.
AutoScout24 found it was an ongoing struggle to keep its staging environment looking like production. It took “a lot of effort,” Deger said, “but we couldn’t be confident it reflected what would happen in production.”
So, the company decided to ditch its staging environment and make all new releases directly to production.
Moving to continuous delivery and constant updates may seem scary, Deger said, but if you increase the number of changes, and you do it right, then your failure rate actually goes down.
12. “Monitoring is the new testing”
Testing is still important, Deger said, but in a continuous deployment microservices environment, you can’t easily tell how a change will affect other services. Still, the company wanted to “be bold, not stupid,” he said, so it used “canary releases” that ran under production load but were not reported as real. That created shadow traffic that could be monitored for problems before a change was rolled out to users.
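Deger did not show code for this kind of check, so what follows is only a minimal sketch of how a canary comparison might work, not AutoScout24’s actual tooling. It assumes you already collect request and error counts for the current version and for the canary. The names Stats and canary_is_healthy, the 1 percent tolerance and the 100-request minimum are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Request and error counts observed for one version of a service."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # A version that has seen no traffic has no observed errors.
        return self.errors / self.requests if self.requests else 0.0


def canary_is_healthy(baseline: Stats, canary: Stats,
                      max_extra_error_rate: float = 0.01,
                      min_requests: int = 100) -> bool:
    """Return True if the canary's error rate stays within tolerance of the baseline.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    if canary.requests < min_requests:
        # Too little shadow traffic to judge, so do not promote yet.
        return False
    return canary.error_rate - baseline.error_rate <= max_extra_error_rate


# Made-up numbers for illustration only.
baseline = Stats(requests=10_000, errors=50)   # 0.5% errors
good_canary = Stats(requests=500, errors=3)    # 0.6% errors
bad_canary = Stats(requests=500, errors=25)    # 5.0% errors

print(canary_is_healthy(baseline, good_canary))
print(canary_is_healthy(baseline, bad_canary))
```

A real setup would likely compare latency and business KPIs as well as error rates, and would pull the numbers from a monitoring system rather than hard-coding them.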
Monitoring should include business KPIs, he said, not just operational metrics. For example, AutoScout24’s engineers took it upon themselves to build their own dashboards, pulling everything together, including AWS costs.
Constant monitoring is especially important in the microservices world, Deger concluded, because your environment changes all the time.
If you want to learn more about AutoScout24’s journey to a cloud-native microservices architecture and what it discovered along the way, watch the video of Deger and Gonzalez’ full presentation below. I found it packed with worthwhile insights into an increasingly common transition.