First-ever test of public cloud management wares
RightScale, Tap In Systems and Cloudkick allow IT to monitor and manage public cloud resources
It's simple to rent raw compute power on the public cloud. The challenge is to deploy, manage, and take down jobs that are hosted in the cloud. In this groundbreaking test, we set up public cloud accounts at Amazon, Rackspace and GoGrid, and tested management services from RightScale, Tap In Systems and Cloudkick under real-world conditions.
In our first cloud computing test, we reviewed virtual private clouds or enterprise clouds from Rackspace, Terremark and BlueLock. In that scenario, enterprises were looking for dedicated resources for persistent applications. The vendor provided a full menu of compute resources, operating system instances, security options, among other things. We ordered what we wanted and the vendor built or provided the infrastructure. Everything was determined in advance, and billing was on a monthly basis.
Public cloud resources, by contrast, are completely a la carte. Everything is up for sale, piecemeal. You can use the public cloud vendor's API, choose an operating system version, memory, storage, bandwidth, security metrics, IP and DNS addressing, and perhaps a place to store your stuff when you're done.
We signed on to Rackspace, GoGrid, and Amazon Web Services (AWS) with a simple credit card and verification step. We rented compute resources by the hour. The meter starts running as soon as the resources are used.
As in our enterprise cloud test, we chose to deploy Linux/Apache/MySQL/PHP (LAMP) instances.
The management tools that we tested allow IT administrators to look at rented resources and check their status and utilization in order to optimize resources and perhaps balance what's online and being used. One way to think about these products is as a cloud-based analog to traditional network management software, such as Tivoli or CA Unicenter. The focus is on monitoring processes and sending alerts.
We also found that RightScale and Tap In have capabilities that go far beyond these basic management functions. Running under the aegis of the public cloud vendor's API set, or directly through agents on the operating system instance, these services can control the process of building, deploying, monitoring usage, then shutting down jobs in a timely way.
In an upcoming test, we will review these advanced application or job focused management capabilities provided by RightScale, Tap In and others.
Our findings in this test are that RightScale impressed us most with its overall control and deep understanding of specific cloud vendors like Amazon. Tap In Systems has more breadth in terms of the different clouds that can be used, but it's not as easy to use. And we liked Cloudkick for its simplicity and ease of use.
RightScale Cloud Management Platform
RightScale sets itself apart from the competition through its mastery of the cloud provider's API and control plane, coupled with templated OS instance generation and monitoring. RightScale's process control of Amazon public cloud resources was impressive. One starts by taking a process job and breaking it down into components. The beauty of cloud computing is that a 100-hour job on a single machine can be broken down into a one-hour job on 100 server instances (where that's feasible) at the same cost.
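The arithmetic behind that claim can be sketched in a few lines. This is only an illustration: the function name and the 10-cent hourly price are our own placeholders, not a vendor's rate, and we assume the work divides evenly and that each instance is billed in whole hours.

```python
import math

HOURLY_RATE_CENTS = 10  # hypothetical price per instance-hour, not a real vendor rate


def job_cost(total_work_hours, instances, hourly_rate_cents=HOURLY_RATE_CENTS):
    """Return (wall_clock_hours, cost_in_cents) for evenly divisible work.

    Assumption: every instance is billed for whole hours, rounded up.
    """
    wall_clock_hours = total_work_hours / instances
    billed_hours = math.ceil(wall_clock_hours) * instances
    return wall_clock_hours, billed_hours * hourly_rate_cents


# One machine for 100 hours and 100 machines for one hour cost the same.
single = job_cost(100, 1)
spread = job_cost(100, 100)
# An uneven split pays for the rounded-up final hour on every instance.
uneven = job_cost(100, 30)
```

Under these assumptions the single-machine and 100-instance runs both bill 100 instance-hours, while the 30-instance split bills 120 because each instance's partial final hour is rounded up.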
But provisioning the server instances, seeding data, doing the job, getting the outputs, then tearing down the servers would be impossible to do without templates. RightScale has a library of approximately 50 server templates, and RightScale's partners provide an equal number. One takes a template, imports it, and uses it to spawn a server instance. The template can be edited to provide monitoring characteristics that can be used to watch the processes. The server templates include what RightScale calls RightScripts, but you can add and modify more at your convenience. Some scripts are in Ruby, others in Perl or Bash.
As an example, we imported a LAMP server template (CentOS Linux, Apache, MySQL, and PHP), then set characteristics called inputs, where we could choose its location on Amazon's server, the code package, data dump file, schema, processes to be monitored after setup, PHP modules to load, and MySQL username/password: more than 50 characteristics in all. We could save the modifications as a revision, CVS-style, and roll back to previous revisions of the template if we desired. We could then launch the instance into Amazon's cloud. RightScale monitors the Amazon Web Services API so that we could monitor resource consumption and relate that to budgeted consumption and, therefore, cost.
Once the instance template was built and stored, there were three steps: boot process, operational, and then subsequent decommissioning. An audit log shows each part of the steps. There are more than 300 scripts that do everything from backing up databases to the Amazon S3 storage cloud, to installing different server processes (like MySQL and Hadoop grid-based distributed processes), setting privileges, or installing locally-sourced components. Since Amazon supports Linux CentOS, as well as Windows 2003/2008 (not R2) server license rental, we had our choice of server substrate to run application jobs via our template choices. License provisioning isn't needed, as it's included in the Amazon price.
And as Amazon has a seeming superfluity of operating system choices, the capability to service jobs on a variety of platforms was made simple.
When monitoring the processes spawned by our deployments, we found that RightScale allowed us to view the servers by name. We could then watch histograms of 19 types of monitored processes. RightScale uses the FOSS collectd application with its RRDtool plug-in, and its numerous monitored object characteristics collect data and feed it to the histogram graphs.
RightScale understands the Amazon API very well, and alerted us as we tested monitoring in a way that was better and more understandable than its competition.
Philosophically, RightScale believes in cloud portability, a concept where cloud vendor resources are used deeply in terms of vendor-specific resources, but broadly, meaning that multiple cloud vendors are interchangeable. We weren't able to check that thoroughly, as support for the only other public cloud vendor that RightScale is compatible with, Rackspace, is in the alpha stage.
The RightScale Web user interface worked well for us. Importantly, most components of the user interface have Web-based help available so that procedures are clear. RightScale's scripts can also be modified. These and most RightScale actions can be bookmarked inside the user interface for re-use. RightScale lists the groups of instances across all clouds, server and instance templates, and more importantly, shows CPU, memory, disk, and application-specific usage for each instance.
This is an application that begs for a large-screen monitor for all the data that it can display. Server arrays are also manageable across instances, groups, or clouds. There is alert-based and queue-based messaging regarding server conditions. Alerts are triggered by server conditions, while queue messages drive job control actions, such as spawning a new worker instance when another queue is taking too long to complete something.
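The queue-based job control described here amounts to a simple decision rule. The sketch below is our own illustration of that idea, not RightScale's implementation; the class, the function, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueueStatus:
    """Snapshot of a work queue (hypothetical structure for illustration)."""
    pending_jobs: int
    oldest_job_age_s: float  # how long the oldest queued job has waited
    workers: int             # worker instances currently running


def workers_to_add(status, max_age_s=300.0, max_workers=10):
    """Return 1 if a new worker instance should be spawned, else 0.

    Assumed policy: add one worker when the oldest job has waited longer
    than max_age_s, unless the array is already at max_workers.
    """
    if status.pending_jobs == 0:
        return 0
    if status.oldest_job_age_s <= max_age_s:
        return 0
    if status.workers >= max_workers:
        return 0
    return 1


# A queue whose oldest job has waited ten minutes triggers one new worker.
slow_queue = QueueStatus(pending_jobs=40, oldest_job_age_s=600.0, workers=3)
decision = workers_to_add(slow_queue)
```

Adding one worker per check, with a hard cap, keeps a slow queue from spawning instances (and hourly charges) faster than the operator intended.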
Tap In Systems Cloud Management Service
Tap In Cloud Management Service allows you to connect to Amazon Web Services, GoGrid, OpSource, Terremark and 3tera Applogic clouds. We tested with AWS and GoGrid.
Tap In Systems doesn't provide instance launching, just straight up cloud monitoring. Tap In Systems uses cloud modules installed on the Tap In System server that talk to system agents on the server instances.
As an example, the agent for Amazon EC2 gets information about each instance in your account with information provided by the EC2 API. It's also possible to see CloudWatch data for your instances in the Web interface; CloudWatch is an Amazon optional monitoring API.
There are two components to Tap In's Cloud Management Service: a Web user interface and a Java-based QuickView application. Tap In's agent for GoGrid is similar to the Amazon agent in that it gets information about each instance and displays that information in the QuickView client or the Web user interface. Tap In agents, installed directly into each cloud instance, are actually scripts that gather detailed information and send it to the management server. One problem we discovered is that the agents aren't daemons and have to run as scheduled jobs in Windows. With Linux, the cron job scheduler handles this function.
The operating system provides more information than the agents do, but unlike RightScale, Tap In doesn't pick this information up. Although it's possible to integrate with open source network monitoring applications such as Nagios, Big Brother, or Ganglia, the documentation doesn't say where to get them or how to install them.
And for Windows, it's possible to integrate with MOM or System Center Operations Manager through the PowerShell script, which gathers the information from MOM or SC Ops Manager. But, again, documentation is scarce. Monitoring Windows cloud instances is augmented, however, by a Windows PowerShell script that sends information regarding CPU, memory, disk and virtual memory to the Tap In server.
As with the Linux script, you have to set up a scheduled task so that the script runs periodically and updates the statistics. A pair of ports, 9001 and 9009, must be open on each instance controlled in this way, which may be a small security risk, depending on how it's configured. We could modify the script to send more statistics to the server, which are accumulated and displayed in the Web interface or their client user interface.
We found a small problem with the agents: their logic doesn't seem to permit time stamping data, so that, for example, if the server or instance is shut down, the console of events displayed in the TIS Web page or QuickView app will still think the instance is working.
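To show why time stamps matter here, the following sketch marks an instance as stale once its scheduled reports stop arriving. It is our own illustration and not Tap In's code; the report format, the function names, and the five-minute interval are assumptions.

```python
import time


def make_report(instance_id, cpu_percent, now=None):
    """Build an agent report that carries the time it was collected."""
    return {
        "instance": instance_id,
        "cpu_percent": cpu_percent,
        "collected_at": time.time() if now is None else now,
    }


def instance_state(report, now, interval_s=300, missed_allowed=2):
    """Classify an instance from the age of its most recent report.

    Assumption: the agent runs every interval_s seconds as a scheduled
    job, and missing more than missed_allowed runs means it is down.
    """
    age_s = now - report["collected_at"]
    if age_s > interval_s * missed_allowed:
        return "stale"
    return "reporting"


# A report collected at t=1000 is fresh at t=1200 but stale at t=2000.
report = make_report("web-1", 12.5, now=1000.0)
fresh = instance_state(report, now=1200.0)
gone = instance_state(report, now=2000.0)
```

Without the `collected_at` field, the console can only show the last values it received and cannot distinguish a quiet instance from one that has been shut down.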
There are two main ways of viewing events. One is the Web interface, which is also where most of the configuration takes place. The second is the QuickView client, which is a Java app. In both user interfaces, the tabular data is displayed under a histogram, unfiltered. View settings with chosen and re-orderable columns would be handy.
The Web interface is a bit confusing and we feel it needs some more polish. Event monitoring isn't pretty either. It's organized into cloud vendor groups like GoGrid and Amazon, plus each individual system agent. However, even if a system agent is installed on an Amazon or GoGrid instance, the instance data collected by the agent will not be displayed under the labels Amazon or GoGrid, but separately. The QuickView client connects to the Tap In server to receive events. Unlike in the Web interface, here we were able to reorder the columns. There are also SQL filters to go through the data.
Cloudkick
Cloudkick is a basic monitoring tool that can manage public clouds including Amazon EC2, GoGrid, Slicehost, Rackspace and more. We tested Cloudkick with Amazon EC2, GoGrid and Rackspace and had reasonable, if comparatively elementary, success with it. We tested the developer edition, but there are several editions, based on the number of nodes monitored, messaging, and users of the monitoring tools.
We installed it onto Amazon EC2 first. After entering our Amazon EC2 credentials (Access Key ID and Secret Access Key) and later our GoGrid info (API key and Shared secret), and then our Rackspace API key and username, the Cloudkick Web interface started collecting information about our running instances on each provider.
There is also a new Hybrid Cloudkick which is supposed to be able to connect machines in your data center to the Cloudkick interface via the Cloudkick API. You install the agent onto a physical Debian/Ubuntu or CentOS/RHEL box and you can see the operating system plus data in the Web user interface. Simple as pie.
Besides the basic information provided by the public cloud vendors (in Amazon's case that includes IP address, DNS names, instance name/ID, creation time, and a few others) through their APIs, Cloudkick has Linux agents that are installed in Debian/Ubuntu or CentOS/RHEL instances. Cloudkick Windows agents are said to be coming soon. The agent is pretty easy to install with detailed instructions on their wiki.
Even without the agent, it's possible to monitor ssh, http, https, and ping. Details like CPU, memory, network, disk, IO, and others are only available when the agent is installed. Custom plug-ins can also be created to monitor whatever one would like.
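Agentless checks of this kind generally come down to probing the instance from outside. The sketch below is a generic example of the technique, not Cloudkick's implementation: it tests whether the standard ssh, http, and https ports accept a TCP connection. The function names are ours, and ping is left out because ICMP usually needs elevated privileges.

```python
import socket

# Standard well-known ports for the services named above.
DEFAULT_PORTS = {"ssh": 22, "http": 80, "https": 443}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_services(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Map each service name to whether its port accepted a connection."""
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}
```

Calling `check_services` with an instance's public DNS name returns a dictionary such as one entry each for ssh, http, and https. A successful connection shows only that something is listening, which is why CPU, memory, and disk details still need an agent inside the instance.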
After a Cloudkick instance agent was installed, we could view some diagnostics, graphs and other aforementioned information on the main overview page for that machine. Cloudkick also has a neat feature called a Web terminal for ssh which launches a terminal similar to a Quake-style (the 3D video game) pull down console that will connect to your instance once you install an ssh key provided through the browser-based Cloudkick user interface.
Another somewhat neat feature is the ability to color code a node. When there are a lot of instances, it's possible to quickly find the one(s) you need. It's also possible to drag and drop the nodes (instances) to reorder the list. We could make and set tags for each node, as well.