Network World - This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
It is estimated that unstructured data -- everything from email to spreadsheets, documents and digital media -- accounts for at least 90% of an organization's data. Your systems are bloated with everything from personal iTunes playlists to the early versions of that PowerPoint presentation you delivered in March. To make matters worse, analysts at Gartner and IDC predict that data in IT organizations will grow by as much as 800% in the next five years.
Corporations can fight information bloat by using tools that provide a file-by-file inventory to identify files that are duplicated, unused, infrequently accessed or in violation of policy. In short, there are ways to shed those unwanted terabytes. Here are some tips on how to do it:
Most companies don't know how big their problem is. They can't tell you what file content they have, how much exists, who created it, what resources it is consuming or how much data is duplicated. When we first begin working with an enterprise, we typically find that 50% to 60% of the organization's NAS data has not been viewed in several years.
Because many view the task of sifting the bad data from the good as too daunting, the problem just gets worse. Traditional, manual profiling is difficult and expensive. As a result, profiling is done infrequently -- sometimes only annually -- making it impossible to understand the data's impact on the corporation and its storage resources.
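To make the idea of profiling concrete, here is a minimal Python sketch that estimates what share of stored bytes has not been accessed in a given number of years. It is an illustration only, not any vendor's method: the function names `scan` and `stale_share` and the three-year default are assumptions, and it relies on file access times, which many systems update lazily or not at all (for example, volumes mounted with `noatime` or `relatime`). Note that `scan` walks the file tree, so it is suitable only for small directories.

```python
import os
import time

SECONDS_PER_YEAR = 365 * 24 * 3600


def scan(root):
    """Yield (path, size_bytes, last_access_epoch) for each file under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield path, st.st_size, st.st_atime


def stale_share(records, years=3, now=None):
    """Return the fraction of total bytes last accessed more than `years` ago."""
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    total = stale = 0
    for _path, size, atime in records:
        total += size
        if atime < cutoff:
            stale += size
    return stale / total if total else 0.0
```

Calling `stale_share(scan("/some/share"))` on a test directory (the path is a placeholder) returns a value between 0 and 1; a result of 0.55 would mean 55% of the bytes have gone untouched for at least three years.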
Before you can identify wasted files, re-tier storage or track trends in storage usage, you need to understand your current capabilities and decide what tools you need to be successful. Several tools offer varying degrees of visibility into some or all of your unstructured environments.
Native array monitoring tools often stop at array capacity and can't provide file-level information, such as when the file was last accessed. Furthermore, this view tends to overestimate your true capacity, leaving you searching for budget to buy more arrays sooner than is truly necessary. Solutions that walk the file tree tend to be cumbersome and place a significant burden on your system, slowing down not only your visibility reporting, but potentially your network as a whole. These "boil the ocean" tools tend to take months or years to deploy and may force users to install agents to feed a relational monitoring database, which can weigh your system down and present scalability challenges.
More lightweight solutions can be deployed in a matter of weeks, not months, and work without the use of agents. Some use a purpose-built database to collect file metadata (rather than the complete file). This enables them to characterize and report on billions of files 10x to 100x faster than a standard relational database. Many of these solutions can be paired with a data mover or user script to implement removal, archiving or re-tiering of data.
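As a rough illustration of the "user script" idea, the Python sketch below takes metadata-only records (path, size, last access time) and sorts them into actions based on how long each file has been idle. Everything here is hypothetical: the `POLICY` thresholds, the action labels and the function names are assumptions chosen for the example, and the script only produces a plan; it does not move or delete anything.

```python
from collections import defaultdict

DAY = 24 * 3600

# Hypothetical policy: (minimum idle days, action), checked oldest first.
POLICY = [
    (3 * 365, "remove-review"),
    (365, "archive"),
    (90, "re-tier"),
    (0, "keep"),
]


def action_for(idle_days):
    """Return the first action whose idle-days threshold is met."""
    for min_days, action in POLICY:
        if idle_days >= min_days:
            return action
    return "keep"


def plan(records, now):
    """Group (path, size) pairs by action.

    records: iterable of (path, size_bytes, last_access_epoch) tuples,
    i.e. file metadata only, never file contents.
    """
    result = defaultdict(list)
    for path, size, atime in records:
        idle_days = max(0, now - atime) / DAY
        result[action_for(idle_days)].append((path, size))
    return dict(result)
```

The resulting dictionary could then be handed to whatever data mover is in use, with the "remove-review" group going to a person for sign-off rather than being deleted automatically.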