Corporate efforts to secure data, comply with regulations, tier storage and meet new legal-discovery demands depend on having a good data-classification method in place.
Traditional classification methods that rely on file-system metadata lack comprehensive content visibility, because the Common Internet File System for Windows and the Network File System for Unix offer no more than eight metadata summaries for classification, such as file name, directory name, file size, type and modified or access dates.
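To make the limitation concrete, here is a minimal sketch of what a metadata-only classifier has to work with. The function name `metadata_summary` and the dictionary keys are illustrative assumptions, not part of any ICM product; the sketch uses only Python's standard `os.stat` call.

```python
import os
from datetime import datetime, timezone

def metadata_summary(path):
    """Collect the file-system attributes a metadata-only classifier can see.

    Hypothetical helper for illustration: it never opens the file, so
    nothing about the file's content appears in the result.
    """
    info = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "directory": os.path.dirname(os.path.abspath(path)),
        "size_bytes": info.st_size,
        "type": os.path.splitext(path)[1].lower(),  # extension only
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }
```

A file holding a Social Security number and a file holding a lunch menu produce the same kind of summary here, which is why classification built on these attributes alone cannot see what is inside the file.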
These basic solutions are proving inadequate to address IT's requirements for accurate data classification.
That has spurred the emergence of a market segment called Information Classification and Management (ICM). These tools offer advanced features such as file-path metadata parsing, in-file content visibility, context category classification, file-classification tagging and policy-based management and tracking.
Unfortunately, some of these solutions still suffer from serious performance, scalability, flexibility and capability issues because of their foundational architectures, namely relational databases and/or enterprise search engines.
The latter have proven quite adequate for Web-based searching, as demonstrated by Google and others. Their limitations, however, can make them unwieldy, and many IT professionals find them unsuitable for ICM requirements in enterprise environments. Think of search as building a dictionary to find a few words: one must first build a large index of all words in all files.
This process is quite slow (as much as two weeks to index 10TB) and can consume lots of storage (increasing requirements by 50% to 300% in most cases).
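As a rough worked example of those figures, the sketch below turns the quoted percentages and the two-weeks-per-10TB worst case into numbers for a given corpus. The function names are hypothetical, and the linear scaling of indexing time is an assumption made for illustration only; real indexers will not scale this neatly.

```python
def index_storage_range_tb(corpus_tb, low=0.50, high=3.00):
    """Extra storage an index may need, using the 50%-300% range quoted above."""
    return corpus_tb * low, corpus_tb * high

def indexing_days(corpus_tb, reference_tb=10, reference_days=14):
    """Worst-case indexing time, ASSUMING it scales linearly from 10TB in two weeks."""
    return corpus_tb * reference_days / reference_tb

low_tb, high_tb = index_storage_range_tb(10)
print(f"10TB corpus: {low_tb:.0f}TB to {high_tb:.0f}TB of additional index storage")
print(f"50TB corpus: up to {indexing_days(50):.0f} days to index under the linear assumption")
```

For a 10TB corpus the index alone would add between 5TB and 30TB of storage.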
Advanced solutions targeted at ICM must go beyond search to provide true data mining of information. This includes the ability to find Social Security numbers, credit card numbers, source code or confidential information stored in unsecured locations. They also must be able to find data that resembles a name, company name, account number or litigation case name, or even a data-point value in a spreadsheet cell.
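The pattern side of this kind of data mining can be sketched with regular expressions plus a checksum. The example below is a simplified illustration, not how any particular ICM tool works: the patterns are deliberately loose, the names `find_sensitive` and `luhn_valid` are hypothetical, and the Luhn check is swapped in as a common way to discard digit runs that cannot be card numbers.

```python
import re

# Simplified patterns, assumed for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits

def luhn_valid(candidate):
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_sensitive(text):
    """Return SSN-like strings and Luhn-valid card-like strings found in text."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "credit_card": [m for m in CARD_PATTERN.findall(text) if luhn_valid(m)],
    }

sample = "Card 4111 1111 1111 1111, SSN 123-45-6789, order 1234567890123456"
print(find_sensitive(sample))
```

In the sample, the 16-digit order number matches the card pattern but fails the checksum, so it is not reported. A production tool would need far stricter patterns and additional context to keep false positives down.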
Some tools can use pattern or context recognition to detect document summaries or themes. This provides content visibility similar to search but adds context to make sure that John Apple is classified as a name and not a company or fruit.
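A toy version of context recognition might look at the words immediately surrounding a term. The sketch below is only an illustration of the idea under strong assumptions: the cue lists and the function `classify_mention` are invented for this example, and commercial tools would rely on much richer linguistic or statistical models.

```python
import re

# Hypothetical cue words; a real tool would use far richer context.
PERSON_CUES = {"mr", "mrs", "ms", "dr"}
COMPANY_CUES = {"inc", "corp", "ltd", "llc"}

def classify_mention(text, term):
    """Label the first occurrence of `term` using the words right next to it."""
    match = re.search(re.escape(term), text)
    if match is None:
        return None
    before = re.findall(r"[A-Za-z]+", text[:match.start()])
    after = re.findall(r"[A-Za-z]+", text[match.end():])
    prev_word = before[-1].lower() if before else ""
    next_word = after[0].lower() if after else ""
    if prev_word in PERSON_CUES:
        return "person"
    if next_word in COMPANY_CUES:
        return "company"
    return "unknown"

print(classify_mention("Please forward to Mr. John Apple today.", "John Apple"))
print(classify_mention("Invoice from John Apple Inc. is due.", "John Apple"))
```

The same two words receive different labels depending on their neighbors, which is the distinction a plain keyword index cannot make.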
In larger enterprises, files must be found and classified in a variety of locations, something solutions built on relational databases have difficulty accomplishing. What is needed is an architecture that transcends the monolithic single-database design and works more like a grid.
This requires a new distributed data model that enables individual slices of a database to reside in remote locations or on individual PCs rather than on centralized data repositories.
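One way to picture such a model is a set of independent slices, each holding the classification records for its own location, with queries fanned out across them and the results merged. The sketch below is an in-process stand-in written under that assumption; the class names `Slice` and `Grid` are hypothetical, and a real deployment would reach each slice over the network rather than through a local object.

```python
from dataclasses import dataclass, field

@dataclass
class Slice:
    """One piece of the classification store, kept where the data lives."""
    location: str
    records: dict = field(default_factory=dict)  # file path -> set of tags

    def tag(self, path, label):
        self.records.setdefault(path, set()).add(label)

    def find(self, label):
        return [p for p, tags in self.records.items() if label in tags]

class Grid:
    """Fans a query out to every slice and merges the answers by location."""
    def __init__(self, slices):
        self.slices = list(slices)

    def find(self, label):
        results = {}
        for s in self.slices:
            hits = s.find(label)
            if hits:
                results[s.location] = sorted(hits)
        return results

office = Slice("new-york")
office.tag("/hr/payroll.xls", "confidential")
laptop = Slice("laptop-42")
laptop.tag("/home/a/contract.doc", "confidential")
print(Grid([office, laptop]).find("confidential"))
```

No slice needs a copy of another slice's records, so adding a location means adding a slice rather than growing a central database.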
Moving classified data to appropriate repositories requires policy engines that start with data classification and include file tagging. The data value must be known before policies are established, and a successful policy engine must leave file stubs, or shortcuts. Also, all file directory structures, including access control lists, must be moved. This has to be accomplished in a heterogeneous storage environment regardless of the storage systems in place.
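The tag-then-move-then-stub sequence can be sketched as follows. Everything named here is an assumption for illustration: the `POLICY` table, the tier names, the `.stub` suffix and the `apply_policy` function are invented, and the sketch does not carry over access control lists, which a real policy engine would have to preserve.

```python
import shutil
from pathlib import Path

# Hypothetical policy table: classification tag -> target repository tier.
POLICY = {"confidential": "secure_tier", "archive": "cheap_tier"}

def apply_policy(path, tag, repo_root):
    """Move a tagged file to its tier and leave a stub at the old location.

    Returns the new path, or None when no policy covers the tag.
    NOTE: ACLs are not handled in this sketch.
    """
    src = Path(path)
    tier = POLICY.get(tag)
    if tier is None:
        return None  # unclassified data is left where it is
    dest_dir = Path(repo_root) / tier
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.move(str(src), str(dest))
    # The stub records where the file went so users can still locate it.
    stub = src.with_name(src.name + ".stub")
    stub.write_text(f"moved-to: {dest}\n")
    return dest
```

The function acts only on a tag that already exists, which mirrors the ordering in the text: classification comes first, and the policy follows from it.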