The bake-off workload
Real-world Web traffic is incredibly complicated, both to understand and to simulate. Many workload attributes are well understood in isolation, but not in combination. For example, we have a clear idea of real-world object size distributions. But how does object size interact with popularity and content type? Are popular HTML objects larger than, smaller than, or the same size as unpopular ones? Web Polygraph models several key aspects of real Web traffic. We realize that our model is not complete, but rather than spending months or years up front developing a perfect model, we feel it is better to model the important parameters now and add new features in future tests.
Our model addresses the following characteristics of Web traffic:
- A mixture of content types.
- Varying offered load, depending on the test phase.
- A working set of URLs that changes its content with time but can preserve its size.
- A global URL set shared by all distributed clients.
- Crosstalk, meaning that all clients are able to talk to all servers.
- Object life cycles, including both expiration and last-modification times.
- Persistent connections.
- Network packet loss.
- Reply sizes.
- Server-side latencies.
- A mixture of cache hits and cache misses.
- A mixture of cacheable and uncacheable responses.
- Object popularity, meaning the frequency of re-visits to Web objects.
- Request rates and interarrival times.
- Embedded objects and browser behavior.
- A virtually infinite number of different objects that are added to the working set as needed.
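To make a few of these characteristics concrete (a content-type mixture, an effectively unbounded URL space, a working set that keeps its size while its content changes, popularity-driven re-visits, and request interarrival times), here is a minimal Python sketch of a request generator. It is not Polygraph's code or configuration language. The class WorkloadSketch, the CONTENT_TYPES table, the 55% re-visit probability, the Zipf-like popularity, the exponential reply sizes, and the Poisson arrivals are all hypothetical choices made for illustration.

```python
import itertools
import random
from collections import deque
from dataclasses import dataclass

# Hypothetical content-type mix: (type, share of new objects, mean reply size in bytes).
# Illustrative assumptions only; these are not PolyMix-2 parameters.
CONTENT_TYPES = [
    ("image", 0.65, 4_500),
    ("html", 0.15, 8_000),
    ("download", 0.005, 300_000),
    ("other", 0.195, 25_000),
]


@dataclass
class Request:
    url: str
    content_type: str
    size: int
    revisit: bool  # True if this URL was requested before (a potential cache hit)


class WorkloadSketch:
    """Toy request generator (hypothetical, not Polygraph's implementation).

    New URLs are minted on demand, so the URL space is effectively unbounded.
    Re-visits draw from a fixed-size working set whose contents change over time.
    """

    def __init__(self, revisit_prob=0.55, working_set_size=1000, zipf_s=1.0, seed=0):
        self.rng = random.Random(seed)
        self.revisit_prob = revisit_prob
        self.working_set_size = working_set_size
        # deque(maxlen=...) evicts the oldest object when a new one is appended,
        # so the working set keeps its size while its content changes.
        self.working_set = deque(maxlen=working_set_size)
        self.next_id = 0
        # Precomputed cumulative Zipf-like weights: rank r gets weight 1 / r**s.
        self._cum_weights = list(
            itertools.accumulate(1.0 / r ** zipf_s for r in range(1, working_set_size + 1))
        )

    def _new_object(self):
        shares = [share for _, share, _ in CONTENT_TYPES]
        ctype, _, mean_size = self.rng.choices(CONTENT_TYPES, weights=shares)[0]
        # Exponentially distributed reply sizes are an assumption for this sketch.
        size = max(1, int(self.rng.expovariate(1.0 / mean_size)))
        url = f"http://server.example/obj{self.next_id}"
        self.next_id += 1
        obj = (url, ctype, size)
        self.working_set.append(obj)
        return obj

    def _popular_object(self):
        # Assumption: popularity rank follows recency, so the newest object is rank 1.
        n = len(self.working_set)
        rank = self.rng.choices(range(n), cum_weights=self._cum_weights[:n])[0]
        return self.working_set[n - 1 - rank]

    def next_request(self):
        if self.working_set and self.rng.random() < self.revisit_prob:
            url, ctype, size = self._popular_object()
            return Request(url, ctype, size, revisit=True)
        url, ctype, size = self._new_object()
        return Request(url, ctype, size, revisit=False)

    def interarrival(self, rate_per_sec):
        # Poisson arrivals: exponentially distributed gaps with mean 1 / rate_per_sec.
        return self.rng.expovariate(rate_per_sec)


if __name__ == "__main__":
    gen = WorkloadSketch(seed=42)
    reqs = [gen.next_request() for _ in range(10_000)]
    # The re-visit fraction should land close to revisit_prob.
    print(sum(r.revisit for r in reqs) / len(reqs))
```

Note that a re-visit is only a potential hit: in a fuller model, uncacheable responses and expired objects would still turn some re-visits into misses, which is why the list above treats cacheability and object life cycles as separate characteristics.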
Noticeably absent from the Polygraph traffic model are:
- DNS-lookup latencies.
- Real content (HTML, images, etc.).
- Aborted requests.
- Cache validation.
- Forced cache validations, also called reloads.
- Client-side latencies.
- Bandwidth limits.
- Non-HTTP traffic.
Support for some of the missing items awaits a better understanding of how the corresponding traffic properties behave in real life. Other items are already supported by Polygraph, but vendors have not reached consensus on acceptable parameters for them. Missing items will eventually be added to the Polygraph model.
The PolyMix-2 test consists of about 14 hours of variable load, as shown in the picture below. Modeled on a typical ISP or corporate environment, the load pattern simulates two daily peaks and an idle phase in "compressed" simulation time. Most measurements discussed in this review are taken from the top2 phase, when the proxy is more likely to be in a steady state.
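A variable-load schedule like this one can be described as a sequence of phases, each ramping the offered request rate from one level to another. The Python sketch below assumes such a schedule; apart from the top2 name and the roughly 14-hour total, the phase names, durations, the 10% idle level, and the linear ramps are hypothetical and are not the actual PolyMix-2 settings.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    hours: float
    start_load: float  # fraction of peak request rate at the start of the phase
    end_load: float    # fraction of peak request rate at the end of the phase


# Hypothetical schedule: two peaks separated by an idle phase, about 14 hours total.
# Only the "top2" name comes from the review; everything else is illustrative.
SCHEDULE = [
    Phase("inc1", 0.5, 0.0, 1.0),
    Phase("top1", 4.0, 1.0, 1.0),
    Phase("dec1", 0.5, 1.0, 0.1),
    Phase("idle", 4.0, 0.1, 0.1),
    Phase("inc2", 0.5, 0.1, 1.0),
    Phase("top2", 4.0, 1.0, 1.0),
    Phase("dec2", 0.5, 1.0, 0.1),
]


def _locate(t_hours):
    """Return (phase, hours elapsed before that phase), or (None, total) after the test."""
    if t_hours < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for phase in SCHEDULE:
        if t_hours < elapsed + phase.hours:
            return phase, elapsed
        elapsed += phase.hours
    return None, elapsed


def phase_at(t_hours):
    phase, _ = _locate(t_hours)
    return phase.name if phase else None


def offered_load(t_hours, peak_rate):
    """Requests/sec offered at time t, interpolating linearly within a phase."""
    phase, start = _locate(t_hours)
    if phase is None:
        return 0.0  # the test is over
    frac = (t_hours - start) / phase.hours
    level = phase.start_load + frac * (phase.end_load - phase.start_load)
    return level * peak_rate
```

For example, offered_load(12.0, 1000) falls inside the hypothetical top2 window (hours 9.5 to 13.5) and returns the full peak rate, which is the kind of steady stretch where measurements are most meaningful.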
Rousskov is the author of Web Polygraph, and he can be reached at rousskov@ircache.net. Wessels is the creator of the IRCache group and the author of the Squid caching proxy, and he can be reached at wessels@ircache.net. Chisholm works on various IRCache projects, including Polygraph tools and Squid performance optimization, and he can be reached at chisolm@ircache.net.
