A few years ago, I was involved in a consulting project with a large company in the healthcare industry that was in the middle of a data center migration. After the networks and servers were stood up at the new location, they needed to migrate massive amounts of data in bulk, so the company secured a pair of OC-192 circuits, each providing nearly 10Gbps of throughput in each direction.
Everything seemed to be in order, so they began transferring data. To their surprise, they were only seeing throughput in the tens of megabits per second, even on servers connected to the network via gigabit Ethernet switches. After exhausting all the normal troubleshooting steps, they decided to bring in a fresh set of eyes. What we discovered may seem counterintuitive: this company’s pipes were just too big. The company was suffering from a Long Fat Network (LFN).
The LFN problem addressed here relates to one function of one protocol in one layer of the OSI model: the Transmission Control Protocol, or TCP. Layer 4, the Transport Layer, provides numerous functions, including:
- Segmentation of data. If the amount of data sent by an application exceeds the capability of the network, or of the sender or receiver’s buffer, the Transport Layer can split up the data into segments and send them separately.
- Ordered delivery of segments. If a piece of data is broken up into multiple segments and sent separately, there is no guarantee the segments will arrive at the destination in the correct order. The Transport Layer is responsible for receiving all the segments and, if necessary, putting the data stream back together in the correct order.
- Multiplexing. If a single computer is running multiple applications, the Transport Layer differentiates between them and ensures data arriving on the network is sent to the correct application.
In addition, the Transport Layer traditionally has been responsible for reliability, or guaranteed delivery of data. Not all Transport Layer protocols provide reliability mechanisms, and which Transport Layer protocol is used by a given application depends on a number of considerations. However, the majority of data traversing networks today uses TCP, which does indeed provide a reliability mechanism. And it is TCP’s reliability mechanism that is at the heart of the LFN problem.
When data is ready to be sent, TCP performs the following sequence of events:
1. TCP on the initiating computer establishes a connection with TCP on the remote computer.
2. Each computer advertises its Window Size, which is the maximum amount of data that the other computer should send before pausing to wait for an acknowledgment. The advertised window size is typically related to the size of the computer’s receive buffer.
3. TCP begins transmitting the data in segments no larger than the maximum segment size, or MSS (also negotiated by the hosts). Once the amount of data transmitted equals the window size, TCP pauses and waits for an acknowledgment. TCP will not send any more data until an acknowledgment has been received.
4. If an acknowledgment is not received in a timely manner, TCP retransmits the data and once again pauses to wait for an acknowledgment.
This “send and wait” method of reliability ensures that data has been delivered, and frees applications and their developers from having to reinvent the wheel every time they want to add reliability to their applications. However, this method lends itself to inefficiencies based on two factors: 1) how much data a computer sends before pausing, and 2) how long the computer has to wait to receive an acknowledgment. It is these two factors that are critical to understanding, and ultimately overcoming, the LFN problem.
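To make those two factors concrete, here is a minimal Python sketch of the ceiling that "send and wait" puts on throughput: at most one window of data per round trip. The function name and the sample window and round-trip values are hypothetical, chosen only for illustration, and the model ignores the TCP optimizations found in the real world.

```python
def window_limited_throughput_bps(window_bytes: int, round_trip_seconds: float) -> float:
    """Upper bound on throughput when a sender transmits one window, then
    waits a full round trip for the acknowledgment before sending more."""
    return window_bytes * 8 / round_trip_seconds


# Hypothetical values: a 65,535-byte window and a 50 ms round trip.
ceiling = window_limited_throughput_bps(65_535, 0.050)
print(f"{ceiling / 1e6:.1f} Mbps")  # about 10.5 Mbps, however fast the link is
```

Notice that the link speed does not appear in the formula at all. Under this simplified model, a bigger pipe does nothing for a sender that is waiting on acknowledgments.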
We now have enough information to understand the LFN problem. TCP is efficient on Short Skinny Networks, but not on Long Fat Networks. The longer the network (i.e., the higher the latency), the longer TCP has to sit by twiddling its thumbs waiting for an acknowledgment before it can send more data. And the fatter the network (i.e., the faster a sender can serialize data onto the wire), the greater the percentage of time TCP sits idle. When you put those two together -- Longness and Fatness, or high latency and high bandwidth -- TCP can become very inefficient.
Here is an analogy. Let’s say you have a coworker who talks a lot. It’s not that he has a lot to say; he just talks really slowly. When you are having a conversation face to face, he can pretty much just keep talking and talking and talking. He gets near-instant acknowledgment that you heard what he said, so he can just keep talking. There is very little dead air. This is equivalent to a Short Skinny network.
Now let’s say your coworker becomes an astronaut and flies to Mars. He calls you on his astronaut phone to tell you about the trip, and he is really excited so he talks really fast. But the delay is really long. Since he can’t see you, he decides that every 25 words he will pause and wait for you to respond before he continues speaking.
Since your coworker talks really fast, let’s say it only takes him five seconds to spit out 25 words before pausing to wait for a response. If the round trip delay between Earth and Mars is 10 seconds, he speaks for five seconds out of every 15, or only 33% of the time. The other 67% of the time the line between you and the Martian is sitting idle.
It wouldn’t be such a big deal if he didn’t speak so fast. If it took him two minutes to speak those same 25 words instead of blurting them out in five seconds, he’d be speaking for about 92% of the time. Likewise, if the round trip latency between you were lower, let’s say two seconds, the utilization percentage of the line would go up as well. In this case he would speak for five seconds and then pause for two, achieving a utilization of about 71%.
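The three percentages in the astronaut analogy all come from the same ratio: time spent talking divided by time spent talking plus waiting. A small Python sketch (the function name is made up for this illustration) reproduces them:

```python
def line_utilization(talk_seconds: float, round_trip_seconds: float) -> float:
    """Fraction of time the speaker is talking when each burst of speech
    is followed by one round-trip wait for a response."""
    return talk_seconds / (talk_seconds + round_trip_seconds)


scenarios = [
    ("fast talker, 10 s round trip", 5, 10),
    ("slow talker, 10 s round trip", 120, 10),
    ("fast talker, 2 s round trip", 5, 2),
]
for label, talk, wait in scenarios:
    print(f"{label}: {line_utilization(talk, wait):.0%}")
```

This prints 33%, 92% and 71%, matching the figures above. Slowing the speaker down or shortening the delay both raise utilization, which is the Short Skinny case in miniature.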
Let’s look at a real-world network scenario. Two computers, Computer A and Computer B, are located at two different sites that are connected by a T-3 link. The computers are connected to Gigabit Ethernet switches. The one-way latency is 70 milliseconds. Computer A initiates a data transfer to Computer B using an FTP PUT operation. The following sequence of events occurs (for the sake of simplicity, I will leave out some of the TCP optimizations that may occur in the real world):
1. Computer A initiates a TCP connection to Computer B for the data transfer.
2. Each computer advertises a window size of 16,384 bytes, and an MSS of 1,460 bytes is negotiated.
3. Computer A starts sending data to Computer B. With an MSS of 1,460 bytes and a window size of 16,384 bytes, Computer A can send 11 segments before pausing to wait for an acknowledgment from Computer B.
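The segment count in step 3 is simple integer division. The sketch below, with a hypothetical helper name, follows the article's simplification of counting only full-size segments:

```python
def full_segments_per_window(window_bytes: int, mss_bytes: int) -> int:
    """Number of full-size segments that fit in one advertised window."""
    return window_bytes // mss_bytes


window_bytes = 16_384
mss_bytes = 1_460

segments = full_segments_per_window(window_bytes, mss_bytes)
bytes_sent = segments * mss_bytes
print(segments)                   # 11 full segments
print(bytes_sent)                 # 16060 bytes sent before pausing
print(window_bytes - bytes_sent)  # 324 bytes of window left unused in this model
```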
So how efficient is our sample network? To figure this out, we need to calculate two numbers:
1. The maximum amount of data that could be in flight on the wire at any given point in time. This is called the bandwidth-delay product. Think of it like an oil pipeline: how much oil is contained within a one-mile stretch of pipe if the oil is flowing at 10mph and you are pumping 10 gallons per minute? (Answer: 60 gallons, because the oil takes six minutes to travel the mile.) In our example, the T-3 bandwidth is 44.736Mbps (or 5.592 megabytes per second) and the delay is 70 milliseconds. So the bandwidth-delay product is 5.592 x 0.07, or about 0.39MB (399.36KB). This means at any given point in time, if the T-3 link is totally saturated, there is 0.39MB of data in flight on the wire in each direction.
2. The amount of data actually transmitted by Computer A before pausing to wait for an acknowledgment. In our example, Computer A sends 11 segments of 1,460 bytes each. So Computer A can only send 1,460 x 11 = 16,060 bytes (15.68KB) before having to pause and wait for an acknowledgment from Computer B.
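Both numbers can be checked with a few lines of Python. This is a rough sketch using a hypothetical helper name. It works in plain bytes, so the bandwidth-delay product comes out as 391,440 bytes, which is the "about 0.39MB" above before rounding.

```python
def amount_in_flight(rate_per_second: float, delay_seconds: float) -> float:
    """Bandwidth-delay product: how much is 'in the pipe' at a given rate
    when each unit takes delay_seconds to reach the far end."""
    return rate_per_second * delay_seconds


# Pipeline analogy: 10 gallons per minute, and one mile at 10mph takes 6 minutes.
gallons = amount_in_flight(10 / 60, 6 * 60)
print(f"{gallons:.0f} gallons")  # 60 gallons

# T-3 link: 44.736 Mbps is 5,592,000 bytes per second; the delay is 70 ms.
t3_bytes_per_second = 44.736e6 / 8
bdp_bytes = amount_in_flight(t3_bytes_per_second, 0.070)
print(f"{bdp_bytes:,.0f} bytes in flight")  # 391,440 bytes

# Data Computer A actually sends per window: 11 segments of 1,460 bytes.
window_payload_bytes = 11 * 1_460
print(f"{window_payload_bytes:,} bytes per window")  # 16,060 bytes
```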
So, if the network link could support 399.36KB at any given point, but Computer A can only put 15.68KB on the wire before pausing to wait for an acknowledgment, the efficiency is only 3.9%. That means that the link is sitting idle 96.1% of the time!
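As a rough cross-check, the same ratio can be computed from unrounded byte counts. The helper name below is hypothetical. Carrying exact bytes through gives about 4.1% rather than 3.9%; the small gap comes from the rounded KB figures used in the text. The last line is an added assumption rather than part of the example above: it charges the sender for the full 140 ms round trip, since the acknowledgment also has to travel back.

```python
def window_efficiency(bytes_per_window: float, link_capacity_bytes: float) -> float:
    """Share of the link's in-flight capacity that one window of data fills."""
    return bytes_per_window / link_capacity_bytes


T3_BYTES_PER_SECOND = 44.736e6 / 8  # 5,592,000 bytes per second
bytes_per_window = 11 * 1_460       # 16,060 bytes

# The article's model: capacity based on the 70 ms one-way delay.
one_way = window_efficiency(bytes_per_window, T3_BYTES_PER_SECOND * 0.070)
print(f"{one_way:.1%}")  # 4.1%

# Assumption for comparison: capacity based on the 140 ms round trip.
round_trip = window_efficiency(bytes_per_window, T3_BYTES_PER_SECOND * 0.140)
print(f"{round_trip:.1%}")  # 2.1%
```

Either way the conclusion is the same: with a 16,384-byte window, the T-3 link spends the overwhelming majority of its time idle.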