Electronic message archiving: Managing out-of-control e-mail growth

Feature
May 24, 2004 | 5 mins
Data Center, Legal, Messaging Apps

Jim Geis, director of system solutions and services, Forsythe Solutions Group

Business problem: As part of a distributed IT operation, an entertainment company placed e-mail servers at each of its 15 offices. With e-mail and instant messaging (IM) use rapidly growing, predicting storage requirements for electronic messages had become difficult. Distributed operations were complicating capacity planning. And planning was about to get worse. Because of compliance regulations and the corporate world’s increasingly litigious nature, the legal department mandated that IT keep a permanent record of all messages for at least seven years.

Traditional approach: Add e-mail servers to the nightly back-up process to satisfy legal’s mandate, then manage server space by reducing the message stores on the e-mail servers. This has several drawbacks. Even if Post Office Protocol is not used (so messages aren’t automatically downloaded to the client and deleted from the server), users remain free to manage their own e-mail. They can delete messages stored on the server at will and exchange information with whomever they wish, although administrators might filter out certain domains. A disgruntled employee could leak messages or wipe an in-box clean. Any message a user deletes from the main server before the nightly backup is gone for good. For users who never delete their messages, system administrators must ask them to do so when the servers run out of space, unless the administrators automatically expunge messages older than a specified date. Should IT need to locate messages on a specified subject from an archival tape, perhaps for use in legal proceedings, finding them by content would be an arduous task that could take weeks or months and expose the company to court-imposed penalties for untimely compliance. And that presumes the message was saved to begin with; most IMs are never saved at all.

New data center approach: Treat every electronic message sent or received as a potential evidentiary fact. Know the location of electronic information, who sees it, how long to keep it and when to delete it. Develop access, creation, deletion and retention guidelines. Create a plan that coordinates the physical management of e-mail storage with logical electronic message content management, including IMs. Use an electronic message archiving infrastructure as the technology that lets you execute these guidelines and plans.
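
As a rough illustration of a "how long to keep it and when to delete it" guideline, the sketch below computes the earliest date on which a message could be deleted under the seven-year minimum mentioned earlier. The function names and the leap-day handling are assumptions made for illustration; an actual policy would come from the legal department.

```python
from datetime import date


def earliest_deletion_date(received, retention_years=7):
    """Return the first day a message may be deleted under the retention rule."""
    try:
        return received.replace(year=received.year + retention_years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap target year. Assumption:
        # roll forward to Mar 1 so the message is never kept too briefly.
        return date(received.year + retention_years, 3, 1)


def may_delete(received, today, retention_years=7):
    """True once the minimum retention period has fully elapsed."""
    return today >= earliest_deletion_date(received, retention_years)
```

For example, `may_delete(date(2004, 5, 24), date(2011, 5, 23))` is false, and it becomes true one day later.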

With a good electronic message archiving infrastructure, all messages would be processed and stored centrally, accessible not only by the user, but also possibly by a subset of people from various business departments – legal and managerial, to name two. Two types of servers are required – one for processing messages and another for managing archival functions. Archival management includes indexing and searching messages based on various selection criteria, from dates and sender to content. The business gets the bonus of knowledge management – the ability to mine message vaults for useful business information.
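
To make the indexing and searching idea concrete, here is a minimal in-memory sketch of an archive index that can be queried by content, sender and date range. The `Message` and `ArchiveIndex` names are hypothetical, and a production archive would use a persistent, full-featured search engine rather than a Python dictionary.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    msg_id: int
    sender: str
    sent: date
    body: str


class ArchiveIndex:
    """Toy index: maps each word in a message body to the ids containing it."""

    def __init__(self):
        self._messages = {}
        self._by_word = defaultdict(set)  # word -> set of message ids

    def add(self, msg):
        self._messages[msg.msg_id] = msg
        for word in re.findall(r"[a-z0-9']+", msg.body.lower()):
            self._by_word[word].add(msg.msg_id)

    def search(self, word=None, sender=None, start=None, end=None):
        # Narrow by content first if a word is given, else scan everything.
        if word is not None:
            ids = self._by_word.get(word.lower(), set())
        else:
            ids = self._messages.keys()
        hits = []
        for msg_id in ids:
            msg = self._messages[msg_id]
            if sender is not None and msg.sender != sender:
                continue
            if start is not None and msg.sent < start:
                continue
            if end is not None and msg.sent > end:
                continue
            hits.append(msg)
        # Return matches in chronological order.
        return sorted(hits, key=lambda m: (m.sent, m.msg_id))
```

A legal search for every message mentioning "contract" after a given date would then be a single call such as `index.search(word="contract", start=date(2004, 2, 1))`, rather than a trawl through back-up tapes.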

This entertainment company would consolidate into its main data center the work that most of its 15-plus mail servers were doing, leaving on-site servers and storage only at larger field offices. Remote servers would be integrated with the main messaging infrastructure, and all messages would be archived centrally. Each function (message processing and archiving) would need two main multi-processor servers, clustered for failover. Clustering also would give administrators a way to increase processing capacity as needed, even while absorbing remote office message processing.

Clustering for failover also mandates two storage tiers. Effective electronic message archiving requires a network-based storage scheme, which could be built on any of several technologies: TCP/IP, Fibre Channel, storage-area network, network-attached storage, iSCSI or Fibre Channel over IP. One network-based tier would be high performing and handle the continuous read and write I/O of an intense application such as e-mail. Network-based storage connectivity also complements cluster failover by providing redundant access points.

The second network-attached tier would house the archive and would use more economical media. This tier would be somewhat slower performing but, like the primary storage, would have autonomic properties. It also would have features for short- and long-distance replication to help it integrate with disaster-recovery initiatives (the disaster-recovery location also would need duplication of the message-archiving infrastructure: servers, storage and data). Both storage tiers require easy integration into the existing tape back-up process and enterprise management and monitoring tools.

Policies are needed, too. Events and dates should trigger archival processes that move messages from the primary store to the archive. Policies that determine how IMs are to be used and integrated with the central archival store also are necessary. These should cover specific “messenger” names that can be tracked and associated with staff, how and when messages would be blocked or flagged as suspect, and access control lists that define who may use or view IMs and how. The message archive would be indexed by content, so that keywords or activities can be tracked and monitored.
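
The date and event triggers described above might be sketched as a periodic pass over the primary store. Everything here is an assumption for illustration: the `StoredMessage` type, the 90-day age threshold and the `flagged_owners` set (standing in for an event such as an employee's departure) are hypothetical, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StoredMessage:
    msg_id: int
    owner: str
    received: date


def run_archival_pass(primary, archive, today, max_age_days=90,
                      flagged_owners=frozenset()):
    """Move qualifying messages from primary to archive; return moved ids."""
    cutoff = today - timedelta(days=max_age_days)
    keep, moved = [], []
    for msg in primary:
        too_old = msg.received < cutoff             # date trigger
        owner_event = msg.owner in flagged_owners   # event trigger (hypothetical)
        if too_old or owner_event:
            archive.append(msg)
            moved.append(msg.msg_id)
        else:
            keep.append(msg)
    primary[:] = keep  # shrink the primary store in place
    return moved
```

Running such a pass nightly keeps the high-performance tier small while every message still ends up in the central archive.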

Educating employees will be critical. Human resources must help draft revised “e-policies” that state proper use of electronic messages, both e-mail and IM. Expect to train users on how they would use the archive to search and retrieve messages, too.

Electronic messages are an indispensable business tool, but recent regulations are forcing businesses to treat every message as a potential legal document. A systematic electronic message archiving infrastructure should become central to new data center plans.
