Popular developer platform GitLab has concluded that the public IaaS cloud is not an effective platform for hosting CephFS, the open source file system it uses for storage, which has high input/output demands. So, GitLab is ditching the cloud.
In a blog post explaining the decision, GitLab engineers say they will move the company's CephFS storage to bare metal infrastructure that GitLab will manage itself. GitLab provides a platform to help teams of developers write, test and ship code.
GitLab's storage issue is a prime example of how not all workloads are ideally suited for the public cloud. GitLab is hardly the first company to pull an application from the public cloud; Dropbox announced plans earlier this year to build out its own cloud platform instead of using Amazon Web Services' cloud, for example. Still, many other enterprises are going all in on the cloud.
These conflicting examples reinforce the idea that each company must evaluate its own circumstances to determine whether the cloud is a fit for the organization.
GitLab’s CephFS needs a “really performant underlying infrastructure,” GitLab engineers wrote. “By choosing to use the cloud, we are by default sharing infrastructure with a lot of other people. The cloud is timesharing, i.e. you share the machine with others on the providers’ resources,” GitLab wrote. “As such, the provider has to ensure that everyone gets a fair slice of the time share. To do this, providers place performance limits and thresholds on the services they provide.”
GitLab attempted to run CephFS on the cloud (it does not indicate which one), but says that the application became the "noisy neighbor," demanding spikes of CPU usage on shared servers. "We became the neighbor who plays their music loud and really late. So, we were punished with latencies," the blog explains. A graph in the blog shows that latencies for CephFS ranged from around 10 seconds up to a minute.
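The blog does not detail the provider's throttling mechanism, but the "performance limits and thresholds" it describes are commonly implemented with something like a token bucket. A minimal sketch (purely illustrative, not GitLab's setup): each tick the provider refills a tenant's I/O allowance at a fixed rate; a tenant bursting above that rate drains the bucket and sees its operations deferred, which the tenant perceives as latency.

```python
# Illustrative token-bucket model of a provider capping a tenant's IOPS.
# All names and numbers here are hypothetical, chosen only to show the effect.

class TokenBucket:
    """Refill `rate` tokens per tick, holding at most `burst` tokens."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst

    def tick(self):
        # Periodic refill, capped at the burst allowance.
        self.tokens = min(self.burst, self.tokens + self.rate)

    def request(self, iops):
        """Grant as many of `iops` operations as tokens allow this tick;
        the remainder is deferred (experienced as latency)."""
        granted = min(iops, self.tokens)
        self.tokens -= granted
        return granted

def deferred_ops(demand_per_tick, rate, burst):
    """Total backlog of I/O operations left waiting after the demand trace."""
    bucket = TokenBucket(rate, burst)
    backlog = 0
    for demand in demand_per_tick:
        bucket.tick()
        backlog += demand
        backlog -= bucket.request(backlog)
    return backlog

# A tenant staying within its fair share is never throttled...
assert deferred_ops([100] * 10, rate=100, burst=200) == 0
# ...while a bursty "noisy neighbor" accumulates a growing queue of deferred I/O.
print(deferred_ops([500] * 10, rate=100, burst=200))
```

The point of the sketch is the asymmetry: steady demand at the refill rate clears every tick, while sustained bursts above it build an ever-growing backlog, which is exactly how throttling shows up as the latency spikes GitLab observed.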
Conventional wisdom is that the public cloud is good for scale-out workloads that have variable demand. Resources can be spun up as needed and decommissioned when load drops, and users pay only for the infrastructure they use. But GitLab found a problem with this: scaling out resources takes time. "What we discovered is that yes, you can keep spawning more machines but there is a threshold in time, particularly when you're adding heavy IOPS, where it becomes less effective and very expensive. You'll still have to pay for bigger machines. The nature of the cloud is time sharing so you still will not get the best performance. When it comes down to it, you're paying a lot of money to get a subpar level of service while still needing more performance."
In defense of public cloud vendors, there are ways end users can assure infrastructure performance for their applications. In AWS’s cloud, for example, customers can pay a premium to run their apps on dedicated infrastructure that is not shared with other users. Customers can also pay extra to get guaranteed provisioned input/output per second by using Amazon EBS Provisioned IOPS, the company’s elastic block storage service. Both of these scenarios are more expensive than on-demand virtual machine instances or standard non-provisioned IOPS storage, however.
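As a sketch of the Provisioned IOPS option, the AWS CLI can request an EBS volume with a guaranteed IOPS figure at creation time. The availability zone, size and IOPS values below are purely illustrative:

```shell
# Illustrative only: create a 100 GiB Provisioned IOPS SSD (io1) EBS volume
# with 4,000 IOPS guaranteed, rather than relying on shared, best-effort I/O.
aws ec2 create-volume \
    --availability-zone us-east-1a \
    --size 100 \
    --volume-type io1 \
    --iops 4000
```

Note that the guaranteed IOPS rate is what drives the premium pricing the article mentions: you pay for the provisioned figure whether or not the workload consumes it.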
GitLab's switch to self-managed bare metal servers will have its own challenges. The company will pay the upfront capital costs of server infrastructure, then must plan for the costs of maintaining and replacing it. The cloud frees users from those obligations and turns infrastructure into an operational expense. But for GitLab, the consistent, reliable performance of running its own infrastructure outweighs those challenges.