TwinStrata's CloudArray is a brokerage platform between enterprise networks and cloud storage services providers, and we found in testing that it's pretty clever.
CloudArray allows companies to expand their storage resources into the cloud storage provider's assets, subject to the bounds of budget, data rates, and realistic configurations. To operating systems and applications, the CloudArray platform appears as a simple iSCSI connection or CIFS share that's back-ended to the cloud.
CloudArray is simple enough to be installed and used by those with moderate skills. That ease could be a nightmare, given the product's ability to execute massive rogue data transfers. Fortunately, there are enough checks and balances to prevent hostile data theft: TwinStrata creates a long “paper trail,” and the CloudArray portal handles secure tracking.
CloudArray represents something of a milestone. For more than a decade, there have been connection brokers for OS terminal sessions. Then VMware broke new ground by offering live connection brokerage for virtual machine sessions. What TwinStrata proposes is to serve as a secure connection broker between network devices over iSCSI targets or CIFS shares that look like one more network hard drive -- of potentially gargantuan size. Our criticisms center on the lack of deduplication and on limitations with high-availability/disaster recovery.
We tested TwinStrata with two different cloud storage providers, Amazon's S3 and HP's CloudServices, and found that it does the connection brokerage well. It can augment high-availability scenarios, but doesn't substitute for a more comprehensive inner network-outer cloud backup and retrieval system. Instead, it extends storage capacity for static rather than dynamic data: it suits files, and isn't necessarily the best choice for live storage attached to instances.
The cloud is not your active transparent hard drive with the speed of directly attached SSD -- yet.
How It Works
The TwinStrata platform is an appliance, either a proprietary piece of hardware or a virtual machine equivalent. It is managed through a web interface and a limited command-line vocabulary, and it holds a configuration (config+keys) that can be saved, along with cache space. The appliance serves as a storage proxy machine between one network and another.
The traffic cop portion of the appliance is a bandwidth scheduler that can be set to lower CloudArray's network use during peak periods. We found this important, as CloudArray traffic is both aperiodic and potentially heavy enough to dominate available bandwidth. CloudArray can be throttled on a seven-day, 24-hour basis to prevent its activity from overwhelming other site traffic as it sends data to its cloud destination.
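To make the seven-day, 24-hour throttling idea concrete, here is a minimal Python sketch of a weekly schedule grid. It is not CloudArray's scheduler or its configuration format; the names `WeeklyThrottle`, `set_window`, and `limit_for`, along with the 2MBps cap and the weekday business-hours policy, are hypothetical and chosen only for illustration.

```python
from datetime import datetime
from typing import Optional


class WeeklyThrottle:
    """Hypothetical 7x24 grid of upload caps in MB per second (None = unthrottled)."""

    def __init__(self) -> None:
        # grid[weekday][hour]; weekday 0 is Monday, matching datetime.weekday()
        self.grid: list[list[Optional[float]]] = [[None] * 24 for _ in range(7)]

    def set_window(self, days, start_hour: int, end_hour: int, cap_mb_per_s: float) -> None:
        """Cap traffic on the given weekdays for hours start_hour..end_hour-1."""
        for day in days:
            for hour in range(start_hour, end_hour):
                self.grid[day][hour] = cap_mb_per_s

    def limit_for(self, when: datetime) -> Optional[float]:
        """Return the cap in force at a given moment, or None if unthrottled."""
        return self.grid[when.weekday()][when.hour]


throttle = WeeklyThrottle()
# Assumed policy: hold cloud uploads to 2MBps on weekdays, 08:00 to 18:00.
throttle.set_window(days=range(0, 5), start_hour=8, end_hour=18, cap_mb_per_s=2.0)
```

A grid like this makes it cheap to ask what limit applies right now, which is the only question a transfer loop needs answered before sending its next block.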
In our tests, we used the virtual machine version of the CloudArray appliance. We configured the VM appliance with a primary and secondary virtual hard drive atop VMware ESXi 5.1, with a 75GB local cache. The cache, in turn, goes to the cloud. Specific appliances are available for VMware, XenServer, and Hyper-V; a non-production/sample Windows version was also downloadable.
We could configure the appliance in one of two ways: a local backup device, or a cache to the cloud storage provider's resources. As the iSCSI/CIFS targets represent destinations for data, they can be used by backup applications like Veeam, but we didn't test this. Instead, we instantiated an iSCSI target with VMware, dedicating the VMware host machine to the iSCSI path, then linked the VMware server along with its CloudArray appliance.
In turn, we connected to both Amazon S3 and HP CloudServices, then used several sample data sets to gauge the speed of the circuit between source and destination -- then back again. Once we set up the appliance using the SSL logon keys and cloud service provider specifics (our accounts), we pumped the data sets.
The results, in our case, were circuit speed-bound, at about 12MBps up to the provider, and about 18MBps down. CloudArray has several compression algorithms that can be used, as data is both compressed and encrypted on the way out the door of the CloudArray appliance. In our tests, one sample showed that we achieved only about 20% compression, but that data set had binary contents. Throttling the connection worked, but the CloudArray appliance sampled the rate only once every 15 minutes (not changeable), and we wondered why the sampling was included at all.
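Those figures translate into transfer times with simple arithmetic. The sketch below is a rough estimator only: the function name and the 10GB example data set are illustrative, and it assumes the circuit is the sole bottleneck, that compression shrinks what crosses the wire in both directions, and that 1GB equals 1,000MB.

```python
def transfer_seconds(size_mb: float, rate_mb_per_s: float,
                     compression_saving: float = 0.0) -> float:
    """Seconds to move size_mb over a link after shrinking it by compression_saving (0-1)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    if not 0.0 <= compression_saving < 1.0:
        raise ValueError("compression_saving must be in [0, 1)")
    return size_mb * (1.0 - compression_saving) / rate_mb_per_s


size_mb = 10_000  # hypothetical 10GB data set
up = transfer_seconds(size_mb, 12, compression_saving=0.20)    # about 667 seconds
down = transfer_seconds(size_mb, 18, compression_saving=0.20)  # about 444 seconds
```

At the rates we measured, a data set of that size would take roughly 11 minutes to send and a little over seven minutes to retrieve.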
Not cloudy all day
Procedurally, one installs the CloudArray VM or appliance, and has a ready and compatible cloud services vendor along with that vendor's SSL keys and/or other additional security components. CloudArray is then populated with the path and keys to the desired cloud storage site, which opens a circuit that can move data from local resources across to the cloud -- one or many clouds.
TwinStrata has partnerships with more than two dozen cloud storage providers, and the appliance supports most of them through a simple drop-down selection. The depth of the relationship between the CloudArray appliance and the vendors we chose was good in terms of security (SSL) and ease of connectivity.
Possible cloud storage candidates range from Amazon, Google and Rackspace to Peer1 and Windstream, including many providers with international IP presence. We chose Amazon AWS and HP CloudServices. Measuring speed to providers proved more difficult.
Part of the problem is that some of the data is cached in the appliance, which makes for quick restorations if the data is located in cache, but makes the answer to the question "where exactly is my data?" more nebulous. As a logical data pipe, locally sourced restoration from local cache could be as fast as if from a local network share. If the data comes from the cloud during restoration, the circuit is different and the latency is captive to the responsiveness of the circuit and the host cloud storage provider's speed of delivery to that circuit. As all data is dragged through the CloudArray machine, it needs to be well-placed in terms of network connectivity.
A file/dataset/snapshot might be in cache, it might be in transit, or it might be in the cloud. Inside of the CloudArray appliance, files are broken into encrypted chunks as they're initially written, then as the cache fills, they're sent to providers, first-in/first-out. As things move back and forth, tabulation is stored in the CloudArray portal. We could look things up to see what was where.
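The cache behavior described above can be pictured with a toy model. The Python sketch below is not TwinStrata's implementation: the class and attribute names are hypothetical, the chunk size and capacity are arbitrary, a dictionary stands in for the provider's store, encryption is omitted, and uploading and evicting a chunk are treated as a single step, which may not match how the real appliance stages its uploads.

```python
import zlib
from collections import deque


class FifoCloudCache:
    """Toy write cache: chunk, compress, and flush the oldest chunks first."""

    def __init__(self, capacity_chunks: int, chunk_size: int) -> None:
        self.capacity_chunks = capacity_chunks
        self.chunk_size = chunk_size
        self.cache = deque()   # (chunk_id, payload) pairs, oldest on the left
        self.cloud = {}        # stand-in for the cloud provider's object store
        self.location = {}     # stand-in for the portal's record of what is where

    def write(self, name: str, data: bytes) -> None:
        for offset in range(0, len(data), self.chunk_size):
            chunk_id = f"{name}:{offset // self.chunk_size}"
            # A real chunk would also be encrypted; that step is left out here.
            payload = zlib.compress(data[offset:offset + self.chunk_size])
            self.cache.append((chunk_id, payload))
            self.location[chunk_id] = "cache"
            while len(self.cache) > self.capacity_chunks:
                self._flush_oldest()

    def _flush_oldest(self) -> None:
        # First in, first out: the oldest cached chunk goes to the provider.
        chunk_id, payload = self.cache.popleft()
        self.cloud[chunk_id] = payload
        self.location[chunk_id] = "cloud"


cache = FifoCloudCache(capacity_chunks=2, chunk_size=4)
cache.write("report", b"abcdefghijkl")  # three 4-byte chunks; the oldest is pushed out
```

After the write, `cache.location` records the first chunk as being in the cloud and the other two as still in cache, which is the kind of lookup the portal made possible for us.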
There are ways of predicting where any specific file, folder or snapshot of information might be, and we used this as a comparative method for speed of restoration, as speed of backup is what one might expect of an iSCSI interface or CIFS share -- captive to the speed of the local connection pathway. In our case, the pathway was Gigabit Ethernet as piped through our host. The CloudArray portal keeps track of what's where, allowing a replacement CloudArray appliance to be brought online remotely should a primary appliance become unavailable.
We saved configuration information, killed the appliance, and restored the configuration correctly into a newly created appliance. Although we could lose a bit of information if we crashed the appliance, the cloud services provider retained its information intact. Backup sessions typically ran at 5MBps. Restorations where we knew we were pulling the data from the cloud typically ran at less than half that speed. Our circuit speed, however, will be different from your circuit speed.
In the use case where the appliance is used for high-availability/disaster recovery, there is danger of having a large local cache set in the appliance, which might become unavailable for whatever reasons. Local cache, although forwarded to the cloud service provider somewhat quickly, is still an unknown synchronization point; the last completed job is what's now available to restore into a hot-site or another CloudArray appliance on a network -- the survivor, we'll call it. More cache means potentially more data in a transient state that can't be restored, so we suggest using smaller datasets rather than large ones with large cache if HA/DR is of concern. The bigger the chunk of data in cache you store locally, the more you can lose locally and cannot subsequently restore.
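The exposure can be put in rough numbers. This sketch, with a hypothetical function name, estimates how long unsent cache contents would take to reach the provider; it assumes a steady upload rate, 1GB equal to 1,000MB, and the worst case in which the entire cache is unsent. The 75GB and 5MBps figures are from our test setup, and the 10GB cache is an invented comparison point.

```python
def drain_hours(unsent_gb: float, upload_mb_per_s: float) -> float:
    """Hours needed to push unsent cache contents to the provider (1GB = 1,000MB)."""
    if upload_mb_per_s <= 0:
        raise ValueError("upload rate must be positive")
    return unsent_gb * 1000 / upload_mb_per_s / 3600


# Worst case for our test setup: the whole 75GB cache is unsent at 5MBps.
full_cache = drain_hours(75, 5)    # roughly 4.2 hours of exposure
# Hypothetical smaller cache for comparison.
small_cache = drain_hours(10, 5)   # roughly 0.6 hours
```

The window shrinks in direct proportion to the amount of unsent data, which is the arithmetic behind our suggestion to favor smaller caches and datasets where HA/DR matters.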
The appliance is a closed host and is shut off from external examination. We could tell what's inside the appliance by poking it, but we couldn't get a shell to go inside and fix something. We could download plentiful logs, but that was the extent of our access. This may or may not have security implications for organizations.
It's possible to have an accumulation of dead files in the cloud service provider's storage, because deletion of objects from CloudArray doesn't necessarily walk up the circuit to the provider's store and delete or purge them. We were told that external utilities must be used to delete from iSCSI stores, but that deletions from CIFS stores should take effect immediately, and we verified this.
As an example, to get files deleted from our iSCSI-connected HP store, we had to use HP-supplied PowerShell commands. This means that it's possible to occupy (and therefore likely get billed for) more cloud storage than your live data requires, although the deleted files will eventually go away when using CIFS. These files are around 800KB in size. Those counting pennies will get nervous, but storage pricing among cloud storage providers has recently been a race to the bottom, especially for smaller storage needs, so those deploying CloudArray likely won't care. These facts did, however, make testing/piloting/estimating storage needs, costs, and transaction times more difficult.
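For those who do want to count the pennies, the bookkeeping is a set difference plus a multiplication. The sketch below is illustrative only: the function name, the object keys, and the $0.03 per GB per month price are all hypothetical, and the 800KB object size is the approximate figure we observed. How the list of live chunks or provider objects would be obtained is outside its scope.

```python
def orphan_report(live_chunks: set, provider_objects: set,
                  object_kb: float = 800,
                  price_per_gb_month: float = 0.03) -> dict:
    """Find provider objects nothing references and estimate their monthly cost."""
    orphans = provider_objects - live_chunks
    orphan_gb = len(orphans) * object_kb / 1_000_000  # KB to GB, decimal units
    return {
        "orphans": sorted(orphans),
        "orphan_gb": orphan_gb,
        "monthly_cost": orphan_gb * price_per_gb_month,
    }


report = orphan_report(
    live_chunks={"vol1/0001", "vol1/0002"},                               # made-up keys
    provider_objects={"vol1/0001", "vol1/0002", "vol1/0003", "vol1/0004"},
)
```

Here the two unreferenced objects add up to 0.0016GB, which shows why a handful of strays is negligible while millions of them would not be.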
TwinStrata's CloudArray is potentially more manageable via CIFS shares, as described above, and that means compatibility with Microsoft and Linux/BSD/etc. uses. The CIFS shares or iSCSI target can be the replacement for drives and shares, as well as tapes in backup scenarios, although TwinStrata isn't really a traditional backup application.
Another scenario might be to establish datasets for replication purposes. The replicated data can be used for eDiscovery, directly accessible restoration, and, depending on real-time requirements, the aforementioned HA or disaster recovery to another site or IP address.
The most convenient use is to a cloud storage services provider as an extension of general internal storage assets, or as the gathering point for other cloud system assets either with the same cloud system asset provider, or as a backup/replicant to a distributed chain of diffuse assets.
The connection/share brokerage capabilities of CloudArray can be very diverse, if with some limitations. Most of what CloudArray does can be mimicked by doing all of the work yourself if you have the time and are very clever, as there's a large and tedious amount of work that CloudArray performs in lieu of our own manual construction of the same steps. CloudArray can be terribly convenient, and a comparatively simple way to extend storage resources into a cloud storage service provider's secure data stores. Despite the convenience, TwinStrata's CloudArray is still somewhat primitive and needs polish - but it's a great start.
Henderson is principal researcher for ExtremeLabs, of Bloomington, Ind. He can be reached at firstname.lastname@example.org.
How We Tested TwinStrata CloudArray
We hosted CloudArray as an appliance in our network operations center at nFrame/Expedient in Carmel, Ind., and controlled it from our lab in Bloomington. CloudArray was downloaded and hosted as a VMware ESXi 5.1 VM. We connected the appliance to AWS S3 and HP CloudServices via iSCSI, and tested the appliance's connectivity, speed, and logging from the NOC's resources, mainly Windows 2008R2 instances via iSCSI, and our Dell/Compellent Series 30 SAN, also connected through the switch via iSCSI interfaces. We also tested CIFS interfaces in a similar manner. The appliance host was an HP DL360Gen8 server with four Gigabit Ethernet interfaces into an Extreme 24-port Gigabit Ethernet crossbar switch, connected through nFrame/Expedient's switch to the cloud storage providers.