by Jon Toigo

The grid storage facade

Feature
Sep 27, 2004 (9 mins)
Data Center, IBM, SAN

Storage vendors are playing off the buzzy grid computing term to draw attention to their tools for scaling NAS capacity. One analyst examines whether this latest storage concept has more than a catchy name.

Lately the term “grid storage” has crept into the product literature of vendors ranging from storage stalwarts IBM and Network Appliance to numerous start-ups. While grid storage appears to borrow conceptually from grid computing – a set of technologies used to build supercomputers from clusters of inexpensive processors – the similarity ends there. The two have little else to do with each other.

Grid storage refers to two items: a topology for scaling the capacity of network-attached storage (NAS) in response to application requirements, and a technology for enabling and managing a single file system so that it can span an increasing volume of storage.

One way to view grid storage is as a means to scale NAS storage horizontally and vertically while avoiding the problems associated with each.

Currently, scaling horizontally means adding more NAS arrays to a LAN. This works until the number of NAS boxes becomes unmanageable. In a “grid” topology, NAS heads are joined together using clustering technology to create one virtual head. NAS heads are the components containing a thin operating system optimized for Network File System (NFS) protocol support and storage device attachment.

Conversely, the vertical scaling of NAS is accomplished by adding more disk drives to an array. Scalability is affected by NAS file system addressing limits (how many file names you can read and write) and by such physical features as the bandwidth of the interconnect between the NAS head and the back-end disk. In general, the more disks placed behind a NAS head, the greater the likelihood the system will become inefficient because of concentrated load or interconnect saturation.

Grid storage, in theory, attacks these limits by joining NAS heads into highly scalable clusters and by alleviating the constraints of file system address space through the use of an extensible file system.
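To make the idea of one virtual head concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's design: the `NasHead` and `VirtualHead` classes are hypothetical, and placing files by hashing the path is an assumption chosen for brevity.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class NasHead:
    """One physical NAS head and the files it holds (hypothetical model)."""
    name: str
    files: dict = field(default_factory=dict)


class VirtualHead:
    """Presents several NAS heads to clients as one namespace."""

    def __init__(self, heads):
        self.heads = list(heads)
        if not self.heads:
            raise ValueError("a cluster needs at least one NAS head")

    def _owner(self, path):
        # Assumption: pick the owning head by hashing the file path.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.heads)
        return self.heads[index]

    def write(self, path, data):
        self._owner(path).files[path] = data

    def read(self, path):
        return self._owner(path).files[path]

    def listing(self):
        # One sorted directory view spanning every head in the cluster.
        return sorted(p for head in self.heads for p in head.files)


cluster = VirtualHead([NasHead("head-a"), NasHead("head-b"), NasHead("head-c")])
cluster.write("/mail/user1/inbox", b"hello")
cluster.write("/mail/user2/inbox", b"world")
print(cluster.listing())
```

Clients see a single listing regardless of which head stores a file. Simple modulo placement like this would remap most files whenever a head is added, so a real extensible file system would need a placement scheme that tolerates growth.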

Who needs grid storage?

Grid storage would be useful to anyone with a large complement of NAS arrays to administer, according to a manager of a national Internet e-mail portal service who asked not to be named. He complains that his current complement of several hundred NAS storage devices from a prominent NAS vendor creates a huge management problem. Managing the capacity on each array requires that he access each array’s self-generated status and configuration Web page, which is “like surfing the Web all day.” To him, the possibility of one virtual NAS array, created from a cluster of individual arrays, is a management boon.
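The management gain he describes amounts to replacing hundreds of per-array status pages with one rolled-up view. A rough sketch of that roll-up follows, assuming each array can report a capacity figure and a used figure; the `ArrayStatus` record, the `summarize` function and the 85 percent warning threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArrayStatus:
    """Capacity figures reported by one NAS array (hypothetical record)."""
    name: str
    capacity_gb: float
    used_gb: float


def summarize(arrays, warn_at=0.85):
    """Roll per-array figures up into a single cluster-wide view."""
    total = sum(a.capacity_gb for a in arrays)
    used = sum(a.used_gb for a in arrays)
    # Flag arrays whose own utilization meets or exceeds the threshold.
    nearly_full = [
        a.name
        for a in arrays
        if a.capacity_gb > 0 and a.used_gb / a.capacity_gb >= warn_at
    ]
    return {
        "total_gb": total,
        "used_gb": used,
        "utilization": used / total if total else 0.0,
        "nearly_full": nearly_full,
    }


report = summarize([
    ArrayStatus("nas-001", 1000, 900),
    ArrayStatus("nas-002", 1000, 100),
])
print(report)
```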

The development of storage grids clearly is geared toward NAS users today – primarily because NAS vendors are spearheading such efforts. But others might one day benefit from the grid storage concept, particularly those who have unruly Fibre Channel fabrics. Take for example a hospital in northern Virginia with several isolated storage-area network (SAN) islands – the result of uncoordinated storage acquisitions made by various corporate turf lords. Making disparate SANs communicate and share data with each other in the face of non-interoperable switching equipment is a nightmare for the hospital. Conceivably, by using clustered NAS devices as gateways to, and managers of, the back-end SANs, the hospital would gain improved capacity, file sharing and overall management.

For those organizations with file storage consisting of millions of discrete files, the limitations of current file system address spaces can impose major hurdles for centralized management and capacity efficiency. Folding this data into a massively scalable storage grid-based file system would promise more efficient file sharing.

Competition in the making

Established vendors such as Network Appliance and Silicon Graphics (SGI), and newcomers such as Panasas, are working on clustered NAS technologies that sometimes are called grid storage. SGI might be ahead of the game with its application of proprietary server clustering technology to the NAS head, and Panasas has begun shipping a system based on Linux Beowulf clustering. Both companies’ products primarily target high-performance computing.


Burning questions about grid storage
  • Which vendors’ storage products and which server operating systems will work with your grid storage technology?
  • When will grid storage be available?
  • Does the strategy require replacing existing file systems?
  • Does the strategy require the deployment of new servers to host file namespace services? Will specialized agents need to be deployed on all client systems?
  • How much WAN bandwidth is required to make a distributed approach viable?
  • Which standards does it support, and who else supports those standards?
  • What management tools will you provide?
  • How will your grid storage product handle storage allocation and de-allocation?
  • How will the solution detect and respond to application demands for storage resources or file access?
  • How will file system metadata be managed: Is there a special agent architecture, a replacement file system or some sort of communication between storage platforms themselves?
  • What time delays will occur in file directory data in geographically distributed environments?
  • How will security be preserved in a universally accessible storage environment?

For grid storage, Network Appliance plans to use technology it gained when it acquired Spinnaker Networks in February, says Chris Bennet, a senior director with the vendor. Network Appliance’s challenge is particularly daunting. While the Filer products use a proprietary implementation of the Berkeley Fast File System, the Spinnaker products had used the Andrew File System. The two file systems have fundamental architectural differences that might require a departure from current product design. “Several years will be required to converge the technology at the code-line level,” Bennet concedes.

Like competitors, Network Appliance seeks to improve management of multiple physical NAS heads and to create one scalable, synchronized directory. This directory would represent all files stored on all the NAS arrays as the number of arrays is expanded. Here grid storage appears to be less about NAS architecture and more about file sharing.

Getting to grid storage

If you are a large corporation with skyrocketing storage needs, you might be interested in watching for eventual grid storage products. Before going to the vendors for a solution (none exist yet), you will need the following:
  • A comprehensive data assessment that tells you which applications generate your data files; what requirements exist for their long-term retention, security and accessibility; and rates or trends of growth over time.
  • Documentation of current WAN interconnect bandwidth and service levels, if you anticipate including storage devices at geographically dispersed locations in a storage grid. This will help you determine what enhancements are required to make interconnects serviceable with a grid storage approach.
  • A total cost of ownership analysis focused on current data storage to identify the costs associated with the hardware, software and labor overall and with respect to each tier or type of storage you have deployed. You will need this information to see whether a business case exists for going to grid storage when it becomes available.
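For the WAN item in the list above, a back-of-the-envelope calculation is a reasonable starting point. The sketch below estimates how long a given volume of data would take to cross a link. The `transfer_hours` function and its default 70 percent efficiency factor are assumptions for illustration, and real throughput also depends on latency and protocol behavior.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.7):
    """Estimate hours needed to move data_gb gigabytes over a WAN link.

    data_gb    -- data volume in decimal gigabytes (10**9 bytes)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed usable fraction of the nominal speed (0 to 1]
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits = data_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1000**2 * efficiency
    return bits / usable_bits_per_second / 3600


# 45 GB over a 100 Mbit/s link at full efficiency takes one hour.
print(transfer_hours(45, 100, efficiency=1.0))
# The same data over a 1.544 Mbit/s T1 line at the default efficiency.
print(round(transfer_hours(45, 1.544), 1))
```

Comparing results like these against daily file change rates from the data assessment gives a first indication of whether existing interconnects could keep distributed copies synchronized.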

At IBM’s Almaden Research Center, work is proceeding on a self-described grid storage project aimed at creating a “wide-area file sharing” approach, says Leo Luan, research staff manager on IBM’s Distributed Storage Tank (DST) project. The objective is to extend the capabilities of “Storage Tank” – a set of storage technologies IBM rolled out last year that includes virtualization services, file services and centralized management – to meet the needs of large, geographically distributed corporations. Such sprawling companies struggle to replicate and distribute copies of files among their dispersed data centers.

The heart of grid storage is a methodology, whether based on clustered NAS or other distributed storage topologies, to enable synchronized file sharing. IBM is looking at untapped capabilities in the NFS Version 4 standard to help meet the need. “DST extends to NFS clusters that can be used to build a much larger grid with a single global file namespace across a geographically distributed environment,” Luan says.

Making the approach open and standards-based requires a schema for file sharing that is independent of a server’s file and operating systems, and that does not require the deployment of a proprietary client on all machines. IBM is working with the Global Grid Forum’s File System Working Group because its intent is to produce a standards-based Lightweight Directory Access Protocol server to act as the master namespace server.

Industry observers disagree about the timeframe for, and even the likelihood of, a truly vendor-agnostic grid storage solution reaching the market. Some believe that the underlying technologies for global file namespace management, including virtualization and synchronized replication, are simply too immature or too prone to vendor infighting to be ready for prime time. Others take exception to the disruption inherent in most current extensible file systems, which commonly require either the modification or wholesale replacement of server file systems. To be successful, grid storage must be non-disruptive and transparent to users and applications.

Rival technologies

Yet others question the relevance of such complex grid storage architectures in the face of rival technologies. For example, global namespace servers, such as NuView’s StorageX, and networked file sharing appliances, such as Tacit Networks’ iShared, deploy without interfering with existing file systems. They also provide file accessibility and synchronization services for wide-area file sharing that users might find perform well enough for their needs.

The death knell for grid storage ultimately might result from a failure to define the term. Not only is there confusion surrounding use of the word “grid,” but there also is a similarity between much of the grid storage discussion and the description of storage utilities in 2003 – and of SANs the year before. Without a common industry definition for the term, it will remain more “marketecture” than architecture.

Toigo is CEO of Toigo Partners International, a technology analysis firm. He can be reached at jtoigo@intnet.net .