If I were to play buzzword bingo today, I'd really like to have the words Cloud, Fabric, OpenFlow, API, and Stack lined up. I would be sure to win within the first 42 seconds of any vendor briefing. Let's face it: even if a vendor has products that were designed for wiring closets or branch routing, they are all trying to find some way of hitching them to the Cloud train.

My fundamental question is: does that work? Can I take products and technology whose design points, price structures, and power draw were aimed squarely at large enterprises when they were conceived and built, and repurpose them for the cloud market effectively enough that customers and providers are not impacted?

See, the challenge is that public cloud is one of the purest forms of IT. IT Operations is THE business of a public cloud, period. There is nothing else that can mask the sins of the architecture: failures and outages are gloriously public, the CAPEX and OPEX of the architecture directly impact the bottom line, and, to paraphrase Steve Jobs, you can't mask bad IT decisions by selling more sugar water.

I have heard many people talk about 'The Cloud' (the way they say it, I feel it must be capitalized), and one conversation that struck me profoundly was with Randy Bias, CTO of Cloudscaling, who stated quite directly, "You cannot build a profitable and competitively priced public cloud with enterprise technology, the vendor taxes are just too steep."

Randy also stated that clouds require new design patterns, ones based on commodity hardware, open source software, APIs for programmatic access, simple systems that scale horizontally, automation, and multi-tenancy. So I explored this a bit and realized that cost comes into these cloud deployments in several ways, some more tangible than others:

Asset Acquisition Cost

You, a public cloud provider, have to buy gear. (Vendors like this, of course, and as cloud providers take on more and more small/medium business applications there can easily be some demand-side consolidation, so it's a crucial fight for the vendors affected.) The problem is that the providers have to operate this gear for profit, so the less they pay, or the more cost-effective the gear they use, the more profitable they are.

I have seen some configurations cut the revenue potential of a public cloud in half by drawing too much power, and then further reduce profitability through per-socket/CPU taxes on software and grossly expensive networking gear chock full of features that add to the complexity of the network, yet are not consistently manageable, cannot be automated, and have no programmatic APIs.

Operational expenses add up, and the ones many people home in on are power draw and maintenance contracts. I have seen many vendors pitch power draw as, "We are 30% less power than Competitor X, so we can save you money." I contend this is probably a bass-ackwards way of looking at it for a cloud provider: it's about compute density.
- If I have a 3MW space, I am going to fill it, to capacity, with as many servers as I possibly can.
- A decent SuperMicro server with a good chunk of memory and dual-socket CPUs will consume 350-400W.
- Best case, with no other infrastructure, I can fit approximately 8571 servers (at 350W each) into this data center, or about 215 cabinets of equipment.
- Everything else you put into that data center on the same power grid reduces your ability to support the maximum of 8571 servers.
- If each server can house 40 VM-SMALLs, at Amazon pricing of around $64/month per VM-SMALL, I can make $2,560 per server per month.
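The arithmetic in this list is easy to check with a few lines of Python. This is only a rough sketch: the constants are the figures quoted above, while the function names and the 100 kW overhead in the last line are made up for illustration.

```python
# Figures from the list above; function names are illustrative only.
FACILITY_WATTS = 3_000_000   # 3 MW of usable power
SERVER_WATTS = 350           # low end of the 350-400 W range
VMS_PER_SERVER = 40          # VM-SMALL instances per server
PRICE_PER_VM = 64            # USD per VM-SMALL per month


def max_servers(overhead_watts: int = 0) -> int:
    """Whole servers that fit in the power budget once other gear takes its share."""
    return (FACILITY_WATTS - overhead_watts) // SERVER_WATTS


def monthly_revenue(overhead_watts: int = 0) -> int:
    """Best-case monthly revenue in USD with every server full of VM-SMALLs."""
    return max_servers(overhead_watts) * VMS_PER_SERVER * PRICE_PER_VM


if __name__ == "__main__":
    print(max_servers())        # 8571
    print(monthly_revenue())    # 21941760
    # Hypothetical: 100 kW spent on switches, routers, load balancers, firewalls.
    print(monthly_revenue() - monthly_revenue(100_000))
```

Plugging in the power draw of whatever non-server gear you are considering shows how much earnings potential it displaces.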
The 3MW data center therefore has a maximum income potential of $21,941,760 per month. But this is where your mileage can really start to vary: every switch, router, load balancer, and firewall you put in reduces the number of servers you can support. Every 350W you consume takes out $2,560 per month of earnings potential.

Integration Costs

Can the equipment you are contemplating be managed by your cloud management system? Does it tie in and integrate? These costs can quickly surpass all others. If the system has local or remote APIs that are well defined, adhered to, and version controlled, or it comes pre-integrated with whatever cloud management software or framework you are using, you are probably off to a good start. If the best available is a CLI, and you are expected to write Perl and do some screen-scraping to provision VLANs and segments and add routing entries, you may want to continue looking.

The problems I see here are all variations on a theme, by the way. Many cloud management companies seem to have started with the premise that they could bare-metal provision the servers and, if they put nice UIs around it, have a neat new service. They got things up and running, and it worked great in one or two cabinets. Then they started to build out a bit, and then they ran into The Network. Many of them forgot that networks were designed the way they are for a reason, that many customers are not comfortable changing their networks wholesale, and that a network almost never exists in which every port can talk to every other port at wirespeed, at L2, with no hierarchy, across thousands of ports. (I don't want to get into a debate about whether it can be built and which vendors support what: it's buildable, but most existing networks are not set up that way.)

Common problems I have seen recently:
- A vendor supported only 1000 total VLANs, yet they wanted to have customers separated by VLAN and gave them multiple VNICs per VM. They quickly ran out of VLANs and could not fill their servers to capacity.
- A cloud management company didn't integrate with their underlying network, so they could never upgrade their switches. They had some easily correctable issues for which new code had been available for months, but they could never upgrade. Outages resulted.
- A customer bought a pre-configured 'Cloud in a Can' from a vendor consortium. In order to break even they have to sell each VM at $450/month. Their competition charges $50/month and is profitable, having used open source software and cost-effective hardware.
- A customer built a simple cloud based on a Top-of-Rack switch, but when they had to scale out they hadn't planned for the cost of expensive L3 aggregation switches, and they delayed their expansion until they had funding for it, costing them customers. Then, when they added the aggregation device, they realized they had not planned well for the addressing and VM mobility issues around L3 boundaries.
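The VLAN story in particular is mostly arithmetic. Here is a rough sketch of how quickly a 1000-VLAN ceiling bites, assuming each customer consumes a fixed number of VLANs. The VLANs-per-customer and VMs-per-customer inputs are invented for illustration; they are not numbers from the vendor in question.

```python
import math


def max_customers(vlan_limit: int, vlans_per_customer: int) -> int:
    """Customers that fit before the VLAN space runs out."""
    return vlan_limit // vlans_per_customer


def servers_needed(customers: int, vms_per_customer: int, vms_per_server: int) -> int:
    """Servers required to host every customer's VMs, rounded up."""
    return math.ceil(customers * vms_per_customer / vms_per_server)


if __name__ == "__main__":
    # Hypothetical inputs: 2 VLANs and 5 VMs per customer, 40 VMs per server.
    customers = max_customers(vlan_limit=1000, vlans_per_customer=2)
    print(customers)
    print(servers_needed(customers, vms_per_customer=5, vms_per_server=40))
```

With these made-up inputs the VLAN space is gone at 500 customers and 63 servers, far short of the 8571 servers the 3MW facility could power.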
So, my summation, drawn from these stories, from some spreadsheets I pulled together doing cost comparisons, and from what I have seen with many companies building public clouds, is this: if you use the 'Cloud in a Can' approach from vendors selling products that were designed and built for Enterprise customers, you may be able to get it to work, but you will lose in the near term:
- You will overpay in most cases
- You will generally be sold a very complex system that does not integrate with any cloud management systems in a meaningful fashion
- You will reduce the number of servers you can bring to market because of inefficient power draw and HVAC handling
But what have you all seen? Are there examples of profitable and growing public clouds that do not run on open-source software and cost-effective hardware, or that lack programmatic APIs and simple systems?