Executive Editor

Software cleanses documents of unwanted data

Mar 27, 2006

Obscured content, metadata can expose companies to accidental data disclosure.

The Center for Army Lessons Learned is on the receiving end of sensitive information that it sanitizes and turns into instructional materials for military personnel.

For Dan Cindrich, a security specialist at CALL, the challenge is to make sure the documents that leave CALL don't inadvertently expose sensitive or extraneous information. For the last seven months he has been using software from SRS Technologies to help automate the task.

SRS makes Document Detective, a new electronic document security tool rolled out this week. The software is designed to find and strip dozens of varieties of hidden data and metadata, including tracked changes, comments, OLE files, embedded objects and object fragments.

The software exposes any hidden content and lets users determine what material to eliminate or retain. With its “flatten” tool, Document Detective can automatically discard extraneous content and reduce document file sizes in the process.
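Document Detective's own detection method isn't described here. As a rough illustration of what "finding" hidden content can involve, the Python sketch below inspects a document saved in the Office Open XML formats (.docx, .pptx, .xlsx) that Microsoft introduced after this article, with Office 2007. The binary .doc, .ppt and .xls formats of the Office XP/2003 era are not zip packages, so this approach does not apply to them. The function name `scan_ooxml` is hypothetical, and the part names it checks are assumptions based on the usual layout of Office-generated packages: embedded objects under `*/embeddings/`, comment parts, and `<w:ins>`/`<w:del>` revision markup for tracked changes in Word documents.

```python
import re
import zipfile
from dataclasses import dataclass, field

# Assumed part-name conventions for Office Open XML packages (Office 2007 and later).
# Binary .doc/.ppt/.xls files are not zip archives and are NOT handled here.
EMBEDDING_DIRS = ("word/embeddings/", "ppt/embeddings/", "xl/embeddings/")
COMMENT_PREFIXES = ("word/comments", "ppt/comments/", "xl/comments")
# Tracked insertions and deletions in WordprocessingML are <w:ins> and <w:del> elements.
# The trailing \b keeps tags such as <w:insideH> or <w:delText> from matching.
TRACKED_CHANGE_RE = re.compile(rb"<w:(ins|del)\b")


@dataclass
class HiddenDataReport:
    embedded_objects: list[str] = field(default_factory=list)  # e.g. a whole pasted workbook
    comment_parts: list[str] = field(default_factory=list)
    tracked_changes: int = 0  # count of revision elements in the main Word body


def scan_ooxml(path_or_file) -> HiddenDataReport:
    """Hypothetical helper: list likely hidden content in an OOXML package."""
    report = HiddenDataReport()
    with zipfile.ZipFile(path_or_file) as pkg:
        for name in pkg.namelist():
            if name.startswith(EMBEDDING_DIRS):
                report.embedded_objects.append(name)
            elif name.startswith(COMMENT_PREFIXES):
                report.comment_parts.append(name)
            elif name == "word/document.xml":
                body = pkg.read(name)
                report.tracked_changes = len(TRACKED_CHANGE_RE.findall(body))
    return report
```

A scan like this only reports what is present; deciding what to keep or strip, as Document Detective lets users do, would be a separate step that rewrites the package.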

IT professionals know all about security threats from hackers and malicious insiders, but they're less aware of the damage that can be done when employees share files via the Web and e-mail and inadvertently expose sensitive information, says Ron Hackett, a program manager at SRS and developer of Document Detective.

“Ordinary users have tremendous access to information and a legitimate need to share some of that information outside of the security boundary. The problem is, the document formats they like to share information in can contain lots of hidden data, and they don’t know how to clean it up,” Hackett says.

It's a problem that has caused a number of publicized data leaks, particularly among government agencies. In May 2005, for example, Multi-National Force-Iraq posted a report on an investigation in Iraq, but the organization's attempts to mask certain sections didn't hold up. By copying and pasting the blacked-out text in the PDF file, viewers could read the words censors had tried to hide.

A key factor contributing to such unwanted data disclosures is an ad hoc review feature Microsoft added in Office XP that automatically enables version tracking if a user e-mails a document – even if version tracking wasn’t turned on in the original document. “It’s automatically enabled every time you e-mail a Word, PowerPoint or Excel document using Outlook,” Hackett says.

Microsoft has since changed this default setting, but some companies may still be vulnerable. “If you do a clean install of Office 03, that switch is turned off. But if you do an upgrade from Office XP, that switch is still turned on,” Hackett says.

Accidental exposures also can happen when users create a summary chart in an Excel workbook, then copy and paste that chart into a PowerPoint presentation. “What they believe they have done is copy the chart. What they have done in reality is copy the entire workbook,” Hackett says.

At a Department of Defense conference where Hackett spoke, he received a CD with all the speakers' presentations. One speaker had included a chart containing caseload information, along with lots of additional material invisible at first glance. "I opened it up and found a 10-page workbook in it," Hackett says. "That workbook included things like defendants' names, court dates, case officers, charges and evidence logs. Some very sensitive information."

These are just the types of data-disclosure gaffes CALL hopes to avoid with Document Detective.

CALL, based in Fort Leavenworth, Kan., collects data from sources including Army operations staff, then turns that data into lessons that are published and distributed to military commanders, staff and students.

A quick turnaround is part of CALL's mission. CALL's officers collect data from the field that's turned into real-world lessons in as little as 30, 60 or 90 days. By contrast, formal Army doctrine can sometimes take a couple of years to change, Cindrich says. That's too long to wait to disseminate the experiences of soldiers just completing a tour of duty, for example. "We have to make those soldiers going into a combat area just as smart as the soldiers coming out," he says.

So far CALL has used Document Detective to scan and scrub nearly 650 documents. CALL found that 75% of the documents had information obscured by images, 20% had hidden layered objects, and 5% had information embedded in metadata.

“It’s given us greater confidence that our documents are completely in a pure form, that there’s no accidental information in there,” Cindrich says.

Document Detective is priced starting at $300 for a single-user license. The version available today is a desktop application that lets users initiate a document scan from a toolbar embedded in Word, PowerPoint and Excel.

Looking ahead, SRS plans to develop a plug-in for Outlook that would let companies warn users that an attached document needs to be scanned before being sent outside a defined security domain. A server-based version also is in the works.

SRS isn’t the only vendor in the document-scrubbing business. One competitor is Workshare, which offers software for uncovering hidden data in Office documents. Microsoft, too, offers a free add-in for Office 2003/XP that removes hidden data.