How to avoid bumps on the road to grid computing

A UPS IT executive shares a truckload of lessons learned on his grid journey

UPS has always been driven by technology. It invests close to $1 billion per year in IT. Recently it added the New Data Center technology of grid computing to its infrastructure, says Brian Cucci, manager for UPS' Advanced Technology Group in Mahwah, N.J. Last October, Cucci's group completed a year-long project that moved a mission-critical COBOL billing application from the mainframe to a Linux grid running DataSynapse GridServer. Here Cucci shares the five biggest lessons he learned from that and other grid projects.

• Software licenses, not technical issues, could be what prevents an application from running on your grid.

"If you are going to target an application to run on a grid, you are virtualizing that application to run anywhere. But you may be locked contractually to run it only on, say, two dual-core boxes, and then you are not going to get the power of that grid," Cucci says. His team analyzed which types of licenses were grid friends and which were foes, he says. The most grid-friendly is enterprise licensing, which lets software run anywhere. Concurrent-user licenses - or any form of license based on how many times an application runs somewhere - also are friendly. Node-locked licensing - or any form of license that dictates the type of machine - is the worst. CPU-based licensing can work but isn't great, because contractually limiting the software to a specified number of CPUs undermines the power of the grid. Getting vendors to modify their licenses to be more grid-friendly can be tough. Vendors are often fearful that multiprocessing computers will eat their revenues - and rightly so: Grids and multicore machines often let enterprises do more with less software.


• Expect capacity planning to be more guesswork and gut instinct than established engineering.

Because UPS chose Linux, an affordable operating system already well known internally, there were few surprises for the team in building the grid itself, Cucci says. But the team found little guidance on how to estimate workloads so it could determine how big a grid to build. Even the most intense workloads ran lightning-fast on the grid. A process that took 270 minutes to complete on the mainframe could be completed in less than 40 minutes on a two-server, eight-CPU grid, he says. Adding servers did not always translate into proportionately faster performance, however. For instance, tests showed that a two-server, eight-CPU grid connected to a storage-area network reduced application-processing time by 42% compared with a grid that had a single server and four CPUs. Adding a third server brought a smaller gain, reducing application-processing time by 53% compared with the single-server grid. A four-server, 16-CPU grid reduced processing time by 56% compared with the single-server grid, only three percentage points more than the three-server grid. As it experimented with capacity, UPS tended to overestimate the number of boxes needed, Cucci says. But there was an upside: Because the grid was inexpensive to buy and operate, there was no large financial penalty for overbuilding it.
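The diminishing returns in those test figures are easier to see as speedup and per-server efficiency. The following is a minimal sketch that uses only the 42%, 53% and 56% reductions quoted above; the baseline is normalized to 1.0 rather than taken from a measured run, and the function name `scaling_table` is made up for illustration.

```python
# Reductions in processing time relative to the one-server, four-CPU grid,
# as quoted in the article. The baseline time is normalized to 1.0 (an
# assumption for illustration, not a measured value).
reduction_by_servers = {1: 0.00, 2: 0.42, 3: 0.53, 4: 0.56}


def scaling_table(reductions):
    """Return (servers, remaining_time, speedup, efficiency) per grid size."""
    rows = []
    for servers in sorted(reductions):
        remaining = 1.0 - reductions[servers]  # fraction of baseline time left
        speedup = 1.0 / remaining              # how many times faster than baseline
        efficiency = speedup / servers         # speedup delivered per server
        rows.append((servers, remaining, speedup, efficiency))
    return rows


for servers, remaining, speedup, efficiency in scaling_table(reduction_by_servers):
    print(f"{servers} server(s): {remaining:.2f}x baseline time, "
          f"speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Under these assumptions, efficiency falls with each added server, which is consistent with the team's finding that extra boxes bought progressively less.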

• Don't expect help with utilization planning.

To maximize their investment, IT executives are going to want to run as many applications on the grid as it can handle. Cucci says his team's goal is 100% utilization, but mature workload-management tools are not available yet to help plan for such usage. "Chargeback tools exist in DataSynapse, but are only good once you build and deploy," he says. Discovering how many applications, as well as which application combinations, the grid can handle will be a matter of trial and error, so be sure your planning phase includes extra time for this, he says. Grid newbies also must remember to factor business-continuity capacity into the mix. For its business-continuity needs, UPS built two grids - one for each of its primary data centers - to run specific applications and to handle failover.

• Understand that small technical differences between the grid and mainframe can cause the biggest trouble.

Often a grid is built to run only portions of a mainframe application. The goal is to slice out the compute-intensive part, run it on the grid, then deliver the results back without skipping a beat. To make this work, the grid has to produce results identical to those of the original mainframe code. This will probably require lots of unexpected reengineering. For instance, UPS' billing application uses a timestamp in the file name. The mainframe relies on the name to work with the data. UPS discovered, however, that Linux uses a timestamp convention that's different from the one the mainframe uses - and that the grid operates faster. As a result, the grid was giving multiple files the same name, and in a timestamp format the mainframe didn't recognize. Before going live, Cucci's team had to fix this hidden problem.
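The article does not say how UPS fixed the file-naming problem. One common way to avoid both symptoms is to format the timestamp explicitly instead of relying on a platform default, and to append a node identifier and sequence number so that files created within the same second still get distinct names. The sketch below illustrates that idea only; the class `BatchFileNamer`, the `BILL` prefix, the node ID and the `%Y%m%d%H%M%S` layout are all hypothetical, not UPS' actual convention.

```python
import itertools
import threading
from datetime import datetime, timezone


class BatchFileNamer:
    """Hypothetical helper: timestamped file names that stay unique on a fast grid."""

    def __init__(self, prefix="BILL", node_id="n01", fmt="%Y%m%d%H%M%S"):
        self._prefix = prefix
        self._node_id = node_id            # distinguishes files from different grid nodes
        self._fmt = fmt                    # explicit format agreed with the consuming system
        self._counter = itertools.count(1)
        self._lock = threading.Lock()      # keeps the sequence safe across threads

    def next_name(self, now=None):
        """Build a name from the timestamp, node ID and a per-process sequence number."""
        if now is None:
            now = datetime.now(timezone.utc)
        with self._lock:
            seq = next(self._counter)
        return f"{self._prefix}.{now.strftime(self._fmt)}.{self._node_id}.{seq:06d}"


namer = BatchFileNamer()
same_second = datetime(2007, 1, 9, 12, 0, 0, tzinfo=timezone.utc)
print(namer.next_name(same_second))
print(namer.next_name(same_second))  # same timestamp, different name
```

The sequence number is unique only within one process, which is why the node ID is part of the name. Whatever layout is chosen has to match what the mainframe side expects to parse.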

• Plan on gutting your systems management processes.

If a long-running mainframe application has a problem, the IT folks have a reliable methodology for fixing it. When that same application - or just a part of it - moves to a new platform, IT executives need to build new systems management procedures for it. The tools used to diagnose problems on a Linux grid are different from those used to troubleshoot mainframe problems, plus the mainframe experts are often not the Linux experts. New support teams will likely need to be created. The most efficient project timeline considers the support process from the outset, Cucci says. Otherwise, this requirement will be discovered at some point - and it's best that it not be discovered after the production rollout is complete and a broken application is waiting to be fixed.

Copyright © 2007 IDG Communications, Inc.
