I noticed that when browsing one of our heavy-traffic websites in different browsers, I would see completely separate versions of a page depending on which browser I used. We use a cluster of Squid servers to cache the many pages of this particular website. I had assumed the only thing needed was to configure the HAProxy load balancer to balance requests by URI, so that each page would always be served by the same Squid server (to maximize the hit rate). Apparently not so!
So… what’s the deal?
Apparently, Squid returns a Vary: Accept-Encoding header in response to browser requests. That header makes sure browsers only receive pages in an encoding they support, if any. As a result, Squid stores a separate copy of a page for every distinct Accept-Encoding value it sees. For example, Firefox told Squid it accepts “gzip,deflate” as encoding methods, while Chrome told Squid it accepts “gzip,deflate,sdch”. Internet Explorer differed from Firefox by a single space (“gzip, deflate”), but that was enough for Squid to cache and serve a completely separate object. I felt this was a waste of resources. The “Big 3” browsers all support gzip and deflate (among other things), so to further improve performance (and hit rate), I decided to normalize the Accept-Encoding headers.
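Here is a rough model of why a single space matters. This is a hypothetical sketch, not Squid's actual code. It assumes the cache builds its key from the URI plus the raw value of every request header named in the Vary header, and that it compares those values byte for byte. That assumption is consistent with the behavior I observed. The names `variant_key` and `/index.html` are made up for illustration.

```python
# Hypothetical model of variant keying under "Vary: Accept-Encoding".
# Not Squid's implementation; it only illustrates the keying idea.


def variant_key(uri: str, vary: str, request_headers: dict[str, str]) -> tuple:
    """Build a cache key from the URI plus the raw value of each request
    header listed in the Vary header (values compared byte for byte)."""
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    # Header names are case-insensitive, so look them up in lowercase.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (uri,) + tuple((n, lowered.get(n, "")) for n in names)


browsers = {
    "Firefox": {"Accept-Encoding": "gzip,deflate"},
    "Chrome": {"Accept-Encoding": "gzip,deflate,sdch"},
    "Internet Explorer": {"Accept-Encoding": "gzip, deflate"},
}

keys = {variant_key("/index.html", "Accept-Encoding", h) for h in browsers.values()}
print(len(keys))  # 3 distinct cache objects for the same page
```

All three browsers accept gzip. Even so, the three raw strings differ, so this model ends up with three entries for one URI.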
My first attempt was to do something on the Squid servers themselves. However, the “hdr_replace” function I looked into had no effect on how Squid actually handles those requests. I could hdr_replace all day long and each browser would still get its own cache object for every page.
The alternative was to do it in HAProxy, and it turns out this works well! After a bit of reading, I found the “reqirep” directive, which lets you rewrite any header before it is passed to the back-end servers, ignoring case when matching. It uses regular expressions to do the deed. After some testing, Server Fault, and luck, I ended up adding the following line to the HAProxy backend configuration:
reqirep ^Accept-Encoding:\ gzip[,]*[\ ]*deflate.* Accept-Encoding:\ gzip,deflate
This regex matches IE (“gzip, deflate”), Chrome (“gzip,deflate,sdch”), and Firefox (“gzip,deflate”), and rewrites all three to “gzip,deflate”. It should also be noted that Googlebot sends the same header as Firefox. I didn’t bother looking into other browsers, since I was most concerned with the major ones. If someone wants to contribute a better regex to get the job done, let me know!
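If you want to check the regex before touching the load balancer, a quick sketch like this one can help. It assumes, based on my reading of the HAProxy docs, that reqirep replaces the entire matching header line with the given string. HAProxy's escaped spaces (`\ `) are written as plain spaces for Python's `re` module. The helper name `rewrite` and the sample lines are my own, for illustration only.

```python
import re

# Pattern and replacement from the reqirep line above; "\ " becomes " ".
PATTERN = re.compile(r"^Accept-Encoding: gzip[,]*[ ]*deflate.*", re.IGNORECASE)
REPLACEMENT = "Accept-Encoding: gzip,deflate"


def rewrite(header_line: str) -> str:
    """Approximate reqirep: if the pattern matches the line (ignoring case),
    the whole line is replaced; otherwise it passes through unchanged."""
    return REPLACEMENT if PATTERN.search(header_line) else header_line


samples = [
    "Accept-Encoding: gzip, deflate",      # Internet Explorer
    "Accept-Encoding: gzip,deflate,sdch",  # Chrome
    "Accept-Encoding: gzip,deflate",       # Firefox and Googlebot
    "Accept-Encoding: deflate, gzip",      # different order: not matched
    "Accept-Encoding: gzip",               # gzip only: not matched
]

for line in samples:
    print(f"{line!r:40} -> {rewrite(line)!r}")
```

The trailing `.*` is what drops extras such as “,sdch”. Any header that does not start with gzip followed by deflate passes through untouched, so it still gets its own cache object.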
Another reason normalized Accept-Encoding headers are good is that your cache will be primed much more quickly if everybody who visits your website retrieves the same version of a page. That means less work for your web server farm, database servers, and all that.
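To put a rough number on “primed much more quickly”, here is a toy cold-cache simulation. The figures are hypothetical and are not measurements from our site. It assumes 1,000 pages, each requested once with each of the three browser strings, and a cache keyed on (URI, Accept-Encoding). It counts how many requests would have to go back to the web servers.

```python
from itertools import product

# Hypothetical workload: 1,000 pages, each requested once per browser string.
PAGES = [f"/page/{i}" for i in range(1000)]
BROWSER_ENCODINGS = ["gzip,deflate", "gzip,deflate,sdch", "gzip, deflate"]


def origin_fetches(normalize: bool) -> int:
    """Count cache misses (origin fetches) starting from an empty cache
    keyed on (URI, Accept-Encoding)."""
    cache: set[tuple[str, str]] = set()
    misses = 0
    for page, encoding in product(PAGES, BROWSER_ENCODINGS):
        # With the reqirep rule in place, all three strings reach Squid
        # as "gzip,deflate".
        key = (page, "gzip,deflate" if normalize else encoding)
        if key not in cache:
            cache.add(key)
            misses += 1
    return misses


print(origin_fetches(normalize=False))  # 3000
print(origin_fetches(normalize=True))   # 1000
```

In this toy model, normalizing cuts the cold-cache origin load by a factor equal to the number of distinct header strings collapsed, three here.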
Speed is good. Good luck!