Rob Hirschfeld: I saw something come through; I didn’t really get into it yet. There are a couple of issues here. This is a very challenging but important thing. It’s doubly important, and what people don’t realize is that software-defined networking matters because there’s a lot of desire for network function virtualization. In these clouds we’re sending traffic VM to VM, so there’s a lot of traffic bouncing around inside the cloud. The traditional networking design was that you put protections at the edge, and a lot of network function at the edge, with big expensive pieces of iron.
When you virtualize everything you have all this east-west traffic, so you need to move your network functions into virtual instances. They’re not necessarily virtual machines; they might be running as agents or in containers or somewhere else in the infrastructure. You actually have to route traffic through these virtualized network functions, VNFs.
Now, all of a sudden the SDN layer is connecting these network function virtualization units, routing traffic all over the place, and in a lot of cases connecting you up to services that you have in the datacenter. It’s a big mess. It’s really hard, it’s really complex. It’s going to take more time to resolve, because even if we get software-defined networking right, if you’re depending on a network-virtualized router, that’s when the things you were talking about come in.
If that router is not reliable, your whole infrastructure is subject to a single point of failure. The thing that made Eucalyptus very problematic to use when we were looking at it, 5 years ago now, was that to emulate Amazon’s networking model they had to have a network chokepoint. There was a single point of failure in the Eucalyptus cloud.
That was a showstopper for anybody who wanted more than basically a proof-of-concept Eucalyptus cloud. That was 5 years ago, so I’m not saying that’s where they are now.
Art Fewell: Yeah. It explains a lot about why … There have been a lot of pieces here that have had to mature. While a lot of people in enterprises are wondering, “Hey, how come I don’t have my cloud yet, or my private cloud the way I want it yet?” there’s been a lot of work to do on the back end, right?
Rob Hirschfeld: The thing that we see with software-defined networking is that it’s incredibly sensitive to the physical underlay. The story we tell is that the first step to being successful with a cloud of that complexity is that the physical underlay has to be perfect. Every time you learn something, every rev that comes out, you have to be able to patch it and maintain it and keep it in sync and all that.
The thing that I see missing, and blocking a lot of this adoption, is that because there isn’t a consistent baseline, everybody does it differently and everybody has to troubleshoot it alone. They can’t help each other. Software-defined networking makes it even more complex because it’s very sensitive to the networking topology, the node topology, and how you configure the agents on the systems.
Art Fewell: Race conditions were really, really hard when we thought of networking in isolation. As it becomes more and more integrated, it’s a challenge.
Rob Hirschfeld: It’s crazy. There’s a change. I think there’s a real change coming where people are going to look at SDN and basically unplug it and throw it out. Here’s the scenario. Let’s set aside public cloud, ‘cause public cloud actually needs this layer for tenant isolation. I’m an operator, a private cloud person who’s running mountains of workloads. I’m trying to use software-defined networking because I’m supposed to, and it has some benefits. It’s a good thing.
Somebody calls me up and says, “Virtual Machine A can’t talk to VM B.” All of a sudden the operator is going, “Okay, let me check that. No, that’s not working. Let me look at the virtualized layer on the host. Let me look at the physical layer on that host. Let me look at the top-of-rack topology. Let me look at my switch fabric or backbone. Let me look at my next switch.”
By the time they’ve gone through this whole list of things to figure out what’s going on, they’re just going to say, “Screw this. I’m tired.” The first time they’ll troubleshoot it; the second time they’re going to toss it. They’re going to turn it off and just say, “I’m going back to flat networking, or I’m going to switch to IPv6 and just do point-to-point IPv6 with encryption.”
This is what happens: developers don’t think about the complexity of maintaining ops, and they don’t worry about the support calls in the middle of the night when things are breaking.
At the end of the day the operators have to deliver a working service, and if the service gets too complex, with too many layers of abstraction in it and too many knobs to turn, they’re going to get frustrated. That becomes a real cost.
A lot of times we overlook the deployment, complexity, and maintenance costs in the equation. I think that’s going on in the Docker environment quite a bit. Developers love it; I use Docker every day. It’s amazing and it helps me do my development faster. That’s a good thing. But it doesn’t necessarily help me operate a datacenter better. It could actually make it more problematic if it’s not done right.
We don’t yet have the operational expertise on what it takes to go to a Docker environment and containerized workloads. OpenStack is still working through its operational challenges. With the 6-month development cycle, with all this new stuff surfacing, we’re only now really putting operational workloads on some of the OpenStack components.
I have a very operator-focused perspective from my career. You have to be careful about how you build an operational framework. Developers can’t just toss things down and walk away. The whole DevOps movement has been about changing that and tightening that cycle. It’s just like lean manufacturing from the 90s: you can’t have somebody design something that can’t be built. We’re finally aware of that, we’re having the dialogues, and from my perspective it’s very exciting to see us treating IT and software creation like a pipeline.
Art Fewell: We talked about networking a lot, and about OpenStack, and I think the audience is really going to appreciate your perspective on that.
Now, is there anything else in 2015 that … I know there’s too much; this is not an easy question. If you had to pick another quick topic that you think is really cool happening in 2015, OpenStack or not, what gets your attention?
Rob Hirschfeld: The thing that I’m really excited about is service architecture. On the RackN and Crowbar side, we’re in the middle of building an architecture that’s basically turning datacenter operations into services.
It’s funny ‘cause we’ve been doing it for a while; a lot of this stuff is not new. What’s happening is that we’re finally describing it better, and we’re getting some interesting tools to do service discovery. We’re using Consul, and a lot of people are excited about etcd. They’re part of this Docker container ecosystem, but they’re not Docker-specific. They’re really about service discovery and being able to handle scale in datacenters and really move things.
I think that’s going to be a very exciting dialogue in 2015, because it’s a really significant operational improvement, it’s accelerated by DevOps and scripted automation, and it’s a necessary part of the Docker container story. I feel like that’s really exciting.
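The register-and-look-up pattern Rob describes is what Consul and etcd provide. A minimal in-memory sketch of that pattern, with illustrative class and method names that don’t correspond to any real Consul or etcd API:

```python
import time


class ServiceRegistry:
    """In-memory stand-in for a discovery store like Consul or etcd:
    services announce themselves under a name, and clients look up live
    endpoints instead of hard-coding addresses."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self.entries = {}  # service name -> {address: last heartbeat time}

    def register(self, name, address):
        """A service instance announces itself (and re-announces to stay alive)."""
        self.entries.setdefault(name, {})[address] = time.time()

    def lookup(self, name):
        """Return only the instances whose heartbeat is still fresh."""
        now = time.time()
        live = {addr: seen for addr, seen in self.entries.get(name, {}).items()
                if now - seen < self.ttl}
        self.entries[name] = live
        return sorted(live)


registry = ServiceRegistry(ttl_seconds=30)
registry.register("billing-api", "10.0.0.5:8080")
registry.register("billing-api", "10.0.0.6:8080")
print(registry.lookup("billing-api"))  # -> ['10.0.0.5:8080', '10.0.0.6:8080']
```

Real discovery stores add the hard parts this sketch skips, like distributed consensus and health checks, but the operational idea is the same: instances that stop heartbeating simply drop out of the lookup results.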
Art Fewell: It has some really interesting applications in … maybe I’m not worthy when I say the networking space, or the space that we networking people have thought we’ve laid claim to for a long time, in terms of finding things, registration and all that. That’s one of the other implications of Docker that I’m not sure most people quite understand yet. There’s a lot of awareness about VM sprawl, this idea that VM sprawl is something we need to get contained in a virtual machine world.
When you move into the Docker world, an important realization is that a Docker container can run anywhere from a bare-metal single system, basically right on the metal, all the way to a Docker container inside of a Docker container inside of a virtual machine.
You can really scale, and … with virtual machines only, I don’t think you would really have these micro-services. I know you didn’t love that word. This idea that we can have a container that’s really, really small.
When I develop each of the functions that cumulatively make up my application, instead of putting a bunch of those into one virtual machine, ‘cause we’ve got to pool them together since virtual machines are clunky and have a lot of overhead, I can take one of these little tiny Docker containers and say, “I just want it to run this one function, and it can be web-callable.” A bunch of those are going to make up my application.
It would put VM sprawl in the dust in a sense, but then we have ephemeral stuff at the same time. It’s hard, approaching it from a traditional mindset, to think about how you would operationalize it.
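The “one tiny web-callable function” idea Art describes can be sketched with nothing but the standard library: a service whose entire job is to expose a single function over HTTP. In a container, this one process would be the whole service. The function, route, and port here are invented for illustration:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def convert_to_cents(dollars):
    """The one tiny function this service exists to expose."""
    return int(round(dollars * 100))


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /?dollars=19.99 -> {"cents": 1999}
        query = parse_qs(urlparse(self.path).query)
        dollars = float(query.get("dollars", ["0"])[0])
        body = json.dumps({"cents": convert_to_cents(dollars)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def run(port=8080):
    # In a container, this single process serving a single function
    # is the entire service; an application is a fleet of these.
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

An application then becomes a bunch of these composed over the network, which is exactly why the service discovery Rob mentioned becomes load-bearing: something has to know where all the tiny pieces are.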
Rob Hirschfeld: You’re going to have micro-service sprawl without a doubt. The funny thing about micro-services is that if your service has to do a significant amount of work, it’s not a micro-service anymore. You’re going to end up needing a stateless micro-service front end for some data backend.
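The stateless-front-end pattern Rob is pointing at can be shown in a few lines. A hypothetical sketch in which a plain dict stands in for the shared data backend (a database or key-value store), with made-up handler and counter names:

```python
def handle_increment(backend, counter_name):
    """A stateless handler: everything it needs arrives with the request
    or lives in the shared backend, never in instance-local memory."""
    backend[counter_name] = backend.get(counter_name, 0) + 1
    return backend[counter_name]


# Two "copies" of the service are just two calls against the same backend;
# because the handler holds no state of its own, any copy can serve any
# request, and scale-out / scale-down are trivial.
shared_backend = {}  # stands in for a real database or key-value store
handle_increment(shared_backend, "page_views")  # copy 1 serves a request
handle_increment(shared_backend, "page_views")  # copy 2 serves the next
print(shared_backend["page_views"])  # -> 2
```

The work, and the hard consistency problems, move into the backend; a real store would need atomic increments rather than this read-modify-write, which is precisely the kind of operational subtlety the conversation is circling.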
Art Fewell: That’s why you’re going to have it automatically scale out and bring up 10 ephemeral copies of the micro-service. Then you’ll have 10 services running at, like, one cycle of CPU each.
Rob Hirschfeld: Now you’ve just … I love this term. Somebody at Facebook or Google or somewhere was talking about heisenbugs, which are basically bugs that only exist in the essence of this stateless, scaled environment. They show up, there’s this weird timing thing, and boom!
This is … we’re back to where the complexity is. The thing you just described requires somebody to architect for that purpose from the start. That’s five years out, because it’s going to take people time to develop for it.
I think that we’re closer to being able to take more traditional services and use the service brokers we have, with DevOps automation, to couple those together. We don’t have to do the full re-architecting of application development to take advantage of containerized service discovery. I don’t even think we need to containerize it.
I think that we can use these technologies in bits and pieces. We can do automation. We can do service discovery. We can take advantage of that and containerize some of these with true elastic, automatic scale-up and scale-down, which we saw coming with Cloud Foundry and OpenShift and Heroku and other PaaS-type stuff. That’s still there and it’s still going to happen. I think we’re going even a level beyond that, and it’s pretty exciting.
We’re definitely completely blurring the lines between IaaS and PaaS and things like that.
I’ll go back to a little bit of history, but it’s useful when we talk about PaaS and what PaaS really means. Dave McCrory and I … Dave worked at Dell. Dave and I founded a company together in ’99, so we’ve known each other for a long time. At the same time that we were banging around ideas about what platform-as-a-service meant and what that looked like, he was defining what data gravity was. He’s the data gravity guru, if you will; he’s made a good name for himself on that.
Those ideas came out of the fact that platform as a service is about stateless compute. Where does all the state go? What do you store that with?
The thing we understood when we looked at Amazon and Azure and Google (Google was new at that time) was that what they were selling was not the compute, but the services around it.
This is where data gravity comes in. Amazon wants you to store your data, ‘cause once the data is in Amazon’s cloud, or Google’s cloud, or Microsoft’s, you’re stuck in that cloud, right? They’re going to charge you for that.
What platform-as-a-service is really about is how you store the information and what services you offer around the elastic part. The elastic part is time-based; it’s where you’re manipulating the data. The data and the services that provide it, they’re really interesting. The same is true of database services or big data analytics, all of those. That’s what cloud is about, and people really lose sight of this.
Art Fewell: It’s been a fascinating conversation talking about all this stuff. Before we go, I wanted to make sure that we gave you a chance to talk about your new project. Is it fair to describe this as a new startup?
Rob Hirschfeld: It’s a new startup for an old project. I’m CEO and founder of a company called RackN, which was cofounded by basically the people who started the Crowbar project.
We started inside of Dell; Dell open sourced the project and then made the decision to stop investing in it in April. We really thought this was something special, that we were doing a unique operational model and bringing special value to the market. We made the decision to start the company, and some people who had left Dell a while ago came back and wanted to do this. It’s me and some other people.
What we’re doing is basically extending the life of the new version of Crowbar. We rewrote it, and it’s this really exciting, interesting physical-operations, physical-infrastructure abstraction that lets people take all of the value and the automation and the things they expect to happen in cloud, and do it against physical infrastructure.
That’s where you start getting into really software-converged infrastructure, where you can build networking based on what your needs are. You can build infrastructure dynamically based on what you need. You can’t manufacture infrastructure, but you can use it in a much “cloudier” way. It really redefines what you can do in a datacenter.