Understanding Shadow Redundancy in Exchange 2010

Improving Message Routing Reliability in an Exchange 2010 Environment

One of the improvements in Exchange 2010 is the redundancy built into the routing of messages through the Exchange environment.  Microsoft calls this technology “Shadow Redundancy.”  The concept is that a message is not deleted from the queue until the next hop has confirmed delivery to the subsequent hop.  If confirmation is not received, the message is resubmitted. If the next hop server is down, the message is resubmitted to another server.  Bottom line: messages are no longer lost when a routing server fails.

NOTE:  Assured delivery requires that there be redundant Hub Transport and Edge Transport servers to resubmit to if a failure of any given transport server occurs.

The components of shadow redundancy are as follows:

  • Primary Message—The original message submitted to transport for delivery.

  • Shadow Message—The copy of a message that a transport server retains until it confirms that all the next hops for that message have successfully delivered it.

  • Primary Server—The transport server that is currently processing a message.

  • Shadow Server—The transport server that holds shadow copies of a message after delivering the message to the primary server.

  • Shadow Queue—The queue that a transport server uses to store shadow messages. A transport server will have separate shadow queues for each Primary server to which it delivered the primary message.

  • Discard Status—The information a transport server maintains for shadow messages that indicates when a message is ready to be discarded.

  • Discard Notification—The response a shadow server receives from a primary server indicating a message is ready to be discarded.

  • Shadow Redundancy Manager—The Transport component that manages shadow redundancy.

  • Heartbeat—The process of transport servers verifying the availability of each other.
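To make the relationship between these components concrete, here is a minimal sketch in Python. It is a toy model and not Exchange code: the class name `ShadowServer`, its method names, and the server names `HT-A` and `HT-B` are all hypothetical, chosen only to mirror the terms defined above (one shadow queue per primary server, discard notifications, and a heartbeat failure that triggers resubmission).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ShadowServer:
    """Toy model of a transport server acting as a shadow server (hypothetical)."""
    name: str
    # One shadow queue per primary server the message was handed to.
    shadow_queues: dict = field(default_factory=lambda: defaultdict(set))

    def hand_off(self, message_id: str, primary: str) -> None:
        """Keep a shadow copy after delivering the message to `primary`."""
        self.shadow_queues[primary].add(message_id)

    def on_discard_notification(self, message_id: str, primary: str) -> None:
        """The primary reports the message is ready to discard; drop the copy."""
        self.shadow_queues[primary].discard(message_id)

    def on_heartbeat_failure(self, primary: str) -> list:
        """The primary is unreachable; return its shadow messages to resubmit."""
        return sorted(self.shadow_queues.pop(primary, set()))


server = ShadowServer("HT-A")
server.hand_off("msg-1", primary="HT-B")
server.hand_off("msg-2", primary="HT-B")
server.on_discard_notification("msg-1", primary="HT-B")  # msg-1 was delivered
to_resubmit = server.on_heartbeat_failure("HT-B")        # HT-B goes down
print(to_resubmit)  # ['msg-2']
```

Only the message that was never confirmed is left to resubmit, which is the point of keeping discard status per message rather than per server.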

In my book “Exchange 2010 Unleashed,” one of my co-authors, Chris Amaris, describes a mail flow using shadow redundancy with the following example.  In the example, Chris, with a mailbox on Exchange Server 2010 Mailbox server MB1, is sending a message to Michelle, who has a mailbox on Exchange Server 2010 Mailbox server MB2. There are two Exchange Server 2010 Hub Transport servers, HT1 and HT2. The process is:

  1. Chris Submits Message—The message is submitted to MB1. MB1 becomes the Primary Server for the message. (NOTE: Client submissions such as MAPI, Windows Mobile, or SMTP client are not redundant until the message is successfully stored on the mailbox or hub transport server. Then the Exchange Server high availability features take effect.)

  2. MB1 submits to HT1—The message is submitted by MB1 to HT1. HT1 becomes the Primary Server and MB1 becomes a Shadow Server. However, HT1 subsequently fails and never acknowledges the delivery of the message to MB2. MB1 times out and becomes the Primary Server again.

  3. MB1 submits to HT2—The message is resubmitted by MB1 to the redundant HT2. HT2 becomes the Primary Server and MB1 becomes a Shadow Server.

  4. HT2 submits to MB2—The message is submitted by HT2 to MB2. MB1 confirms that HT2 has delivered the message, deletes the message from its shadow queue, and is no longer a Shadow Server.

  5. Michelle Receives Message—The message is received by Michelle.
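The five steps above can be replayed as a short Python sketch. This is an illustration of the sequence only, not how Exchange implements it: the function `replay_flow` and its parameters are hypothetical, and the timeout is reduced to a simple membership check on a set of failed hubs.

```python
def replay_flow(hubs, failed):
    """Replay the example: MB1 tries each hub until one confirms delivery to MB2.

    `hubs` is the ordered list of Hub Transport servers MB1 may use, and
    `failed` is the set of hubs that never acknowledge delivery.
    Returns (log, delivered).
    """
    log = ["Chris submits message; MB1 is Primary"]
    for hub in hubs:
        # MB1 hands the message off but keeps a shadow copy.
        log.append(f"MB1 submits to {hub}; {hub} is Primary, MB1 is Shadow")
        if hub in failed:
            # No confirmation arrives, so MB1 takes the message back.
            log.append(f"{hub} never confirms; MB1 times out and is Primary again")
            continue
        log.append(f"{hub} submits to MB2 and confirms delivery")
        log.append("MB1 deletes its shadow copy; Michelle receives the message")
        return log, True
    log.append("No hub confirmed delivery; MB1 still holds the message")
    return log, False


log, delivered = replay_flow(["HT1", "HT2"], failed={"HT1"})
for line in log:
    print(line)
```

With HT1 marked as failed, the log shows MB1 taking the message back and succeeding through HT2. If every hub is in `failed`, the message stays with MB1 rather than being lost, which is why the earlier note about needing redundant transport servers matters.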

Shadow redundancy gives Exchange Server 2010 self-healing capabilities for mail flow. It enables the infrastructure to intelligently fail over between redundant paths if messages have not been delivered in a timely manner.

An administrator needs to do nothing to get Shadow Redundancy to work in Exchange 2010.  This feature is built in, and as long as you have redundant servers in your environment, message routing is highly available and rerouting happens automatically in the event of a routing (Hub Transport) server failure.

This design change in Exchange 2010 does raise the question “so what?” in terms of how we might architect Exchange 2010 servers.  Because routing is redundant and message delivery between servers is assured, an organization is better off with 2 Hub Transport servers on non-fault-tolerant hardware (i.e., basic servers with no RAID and no redundant power supplies) than with a single Hub Transport server with a lot of hardware fault tolerance.  If a basic Hub Transport server fails, messages are routed through the secondary or subsequent server system(s).  For organizations virtualizing their servers, having 2 (or more) basic Hub Transport servers will provide better redundancy for message routing than clustering the guest sessions.  The strategy Exchange has taken for redundancy is to have multiple low-end systems (i.e., scaling out) rather than a single highly available configuration (i.e., scaling up).
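The scale-out argument can be checked with simple arithmetic. The Python sketch below assumes server failures are independent and uses made-up availability figures (99% for a basic server, 99.9% for a hardened one); neither number comes from Microsoft or from measurements, so treat the result as an illustration of the reasoning rather than a sizing guide.

```python
def combined_availability(per_server: float, servers: int) -> float:
    """Probability that at least one of `servers` independent servers is up."""
    return 1 - (1 - per_server) ** servers


# Hypothetical figures, for illustration only.
basic_pair = combined_availability(0.99, 2)        # two basic servers
hardened_single = combined_availability(0.999, 1)  # one fault-tolerant server
print(f"two basic servers:   {basic_pair:.4f}")
print(f"one hardened server: {hardened_single:.4f}")
```

Under these assumptions the pair of basic servers comes out ahead, because both would have to be down at the same time for routing to stop. If failures are correlated (shared power, shared virtualization host), the advantage shrinks.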

The technology and model work very well, and our architecture of Exchange 2010 environments now takes into account this new design structure and the high availability and redundancy technology built into Exchange 2010.
