HTTP Caching With Htaccess

Monie on March 24th, 2011 in Articles

Learn how to optimize your web page performance with HTTP caching, which reduces round-trip time by eliminating numerous HTTP requests.

Overview

Setting an expiry date or a maximum age in the HTTP headers for static resources instructs the browser to load previously downloaded resources from local disk rather than over the network.

Web/HTTP caching is the caching of web documents (e.g., HTML pages, images) to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.
Wikipedia

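When no explicit caching headers are set, browsers typically check for freshness using a conditional GET request. The browser sends the last-modified date of its cached copy (in an If-Modified-Since header), and the server compares this with the file’s current modification date. If the item has not changed, the server answers 304 Not Modified and the browser uses its cached copy; if it has changed, the server sends the new version with a 200 response.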

Why Caching?

According to Google’s Optimize Caching tips:

Most web pages include resources that change infrequently, such as CSS files, image files, JavaScript files, and so on. These resources take time to download over the network, which increases the time it takes to load a web page.

HTTP caching allows these resources to be saved, or cached, by a browser or proxy. Once a resource is cached, a browser or proxy can refer to the locally cached copy instead of having to download it again on subsequent visits to the web page. Thus caching is a double win: you reduce round-trip time by eliminating numerous HTTP requests for the required resources, and you substantially reduce the total payload size of the responses. Besides leading to a dramatic reduction in page load time for subsequent user visits, enabling caching can also significantly reduce the bandwidth and hosting costs for your site.

Web page designs are getting richer and richer, which means more scripts, stylesheets, images, and Flash in the page. A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable. This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on all components including scripts, stylesheets, and Flash components.
Yahoo: Best Practices for Speeding Up Your Web Site

What To Cache?

HTTP/S supports local caching of static resources by the browser. Some of the newest browsers (e.g. IE 7, Chrome) use a heuristic to decide how long to cache all resources that don’t have explicit caching headers. Other older browsers may require that caching headers be set before they will fetch a resource from the cache; and some may never cache any resources sent over SSL.

To take advantage of the full benefits of caching consistently across all browsers, we recommend that you configure your web server to explicitly set caching headers and apply them to all cacheable static resources, not just a small subset (such as images). Cacheable resources include JS and CSS files, image files, and other binary object files (media files, PDFs, Flash files, etc.).
Google: Leverage Browser Caching

In general, HTML is not static, and shouldn’t be considered cacheable.

How To Cache?

There are several tools that Web developers and Webmasters can use to fine-tune how caches will treat their sites. It may require getting your hands a little dirty with your server’s configuration, but the results are worth it. In this article, we are going to look at one method of controlling the browser cache: the Apache HTTP Server module mod_expires, configured through an .htaccess file.

The module needs to be built into (or loaded by) Apache; although it is included in the distribution, it is not always turned on by default. Some Apache servers do turn it on by default. If it is not available to you, then you need to contact your server administrator. The module we’re looking for is mod_expires.

HTTP Caching With Htaccess (with mod_expires module)

Here’s an example .htaccess file that demonstrates how to set far-future expiry for your static assets.

###################################
#   HTTP Header Caching Setting   #
###################################
# Set far-future expiry for images, css, and js
# Remember that you MUST change the filename whenever you update these items!
# Enable this in your httpd.conf:
# LoadModule expires_module modules/mod_expires.so
<IfModule mod_expires.c>
<FilesMatch "\.(jpg|jpeg|png|gif|js|css|swf|ico)$">
    ExpiresActive On
    ExpiresDefault "access plus 1 year"
</FilesMatch>
</IfModule>

This aggressively caches all static assets (images, and so on), but not the pages themselves. A consequence is that, whenever you change a file (say, you alter your CSS), you must also change its name (add a version number or date-stamp).
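As a variation, mod_expires can also match on MIME type instead of file extension. The following is only a sketch: it assumes your server maps these file types to the MIME types shown (JavaScript in particular may be served as application/javascript, text/javascript, or application/x-javascript, depending on the server configuration).

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    # One rule per MIME type; types not listed here get no Expires header
    ExpiresByType image/png "access plus 1 year"
    ExpiresByType image/jpeg "access plus 1 year"
    ExpiresByType image/gif "access plus 1 year"
    ExpiresByType text/css "access plus 1 year"
    # Assumption: adjust to the type your server actually sends for .js files
    ExpiresByType application/javascript "access plus 1 year"
</IfModule>
```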

For all cacheable resources (images, CSS, JS), it is recommended to set the expiry date no more than one year in the future. Setting it further ahead violates the HTTP/1.1 specification (RFC 2616), which says servers should not send Expires dates more than one year in the future.
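One year works out to 365 × 24 × 60 × 60 = 31,536,000 seconds. As a rough sketch, the same limit can be written as an explicit Cache-Control header instead of going through mod_expires. This assumes mod_headers is enabled, which the mod_expires example does not require; use one approach or the other for a given file type, not both.

```apache
<IfModule mod_headers.c>
<FilesMatch "\.(jpg|jpeg|png|gif|js|css|swf|ico)$">
    # 31536000 seconds = 365 days, the one-year upper limit
    Header set Cache-Control "max-age=31536000, public"
</FilesMatch>
</IfModule>
```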

If you know exactly when a resource is going to change, setting a shorter expiration is okay. But if you think it “might change soon” but don’t know when, you should set a long expiration. Setting caching aggressively does not “pollute” browser caches: as far as we know, all browsers clear their caches according to a Least Recently Used algorithm; we are not aware of any browsers that wait until resources expire before purging them.
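For a resource that changes on a known schedule, mod_expires can count from the file’s modification time rather than from the time of access. A minimal sketch, assuming a hypothetical file named weekly-banner.png that is regenerated once a week; placing this block after the general far-future block should let it take precedence for that one file.

```apache
<IfModule mod_expires.c>
# weekly-banner.png is a made-up name used only for illustration
<Files "weekly-banner.png">
    ExpiresActive On
    # Expire 7 days after the file was last written, i.e. when the next version is due
    ExpiresDefault "modification plus 7 days"
</Files>
</IfModule>
```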

This kind of caching is far too aggressive for a web page, as the contents of the page may change, and you really don’t want to change the URL! However, caching the page is not very important, compared to caching the associated files.
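One way to keep pages out of the far-future rule is to tell browsers to revalidate HTML on every visit. This is a sketch, not the only option: it assumes mod_headers is enabled and only covers static .html and .htm files (dynamically generated pages usually set their own headers).

```apache
<IfModule mod_headers.c>
<FilesMatch "\.(html|htm)$">
    # "no-cache" lets the browser keep a copy but requires it to check with
    # the server (a conditional GET) before reusing that copy
    Header set Cache-Control "no-cache"
</FilesMatch>
</IfModule>
```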

Those commands set a one-year expiry date for all my CSS, JavaScript, and image files. This means that the browser won’t even check for new versions until one year later.

Forcing The Browser To Apply New Changes

What happens if you change your logo after one month? What happens if you make some changes to your CSS file? There’s no way for users to get the new logo or the new style unless they force a refresh of the web page. And why would they do that?

The solution is to change the file name. If the file was called logo1.png, you rename it to logo2.png (and you also change any references to it in your code). Then the browser will download it again, because as far as the browser is concerned, it’s a completely different file. Each time you change the logo, increment the version suffix by 1 (alternatively, you can use a date-stamp. Just make sure it’s always a file name you haven’t used before).

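Example (a version number or date in the query string also changes the URL, without renaming the file on disk):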

<link rel="stylesheet" type="text/css" href="style.css?date=13-12-2010">
<link rel="stylesheet" type="text/css" href="style.css?v=1.0">
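If you would rather put the version in the file name itself without renaming files on disk, a rewrite rule can strip the version number on the server side. This is a sketch under several assumptions: mod_rewrite is enabled, .htaccess overrides are allowed, and the version is a purely numeric segment just before the extension (for example style.20101213.css or logo.2.png, both made-up names).

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Only rewrite when no real file exists under the versioned name
    RewriteCond %{REQUEST_FILENAME} !-f
    # style.20101213.css -> style.css, logo.2.png -> logo.png
    RewriteRule ^(.+)\.(\d+)\.(css|js|png|jpe?g|gif|ico|swf)$ $1.$3 [L]
</IfModule>
```

With this in place, the page would reference style.20101213.css, and changing the number in the HTML is enough to make browsers fetch the file again.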

Let’s Test Our Page!

Now that we have the .htaccess file ready on our web server, it is time to test it and see the comparison visually.

Scenario #1 (2 HTTP Requests)

Normally, the conversation between browser and server goes like this:

  1. Browser: send me that web page please [HTTP request number 1]
  2. Server: sure, here it is. You will also need this large logo image
  3. Browser: let me just check if I’ve got it already…yes, but that was yesterday. I guess it might have been updated since then. Can you tell me when it was last modified? [HTTP request number 2]
  4. Server: sure. It was last modified a year ago
  5. Browser: okay, my cached version is more recent than that. I don’t need to download it again

So you can see that there are two HTTP requests, two separate occasions when the browser asks the server for data. First it asks for the web page, then it asks when the logo was last modified.

This is how caching normally works. YSlow recommends more aggressive caching, where you send expiry information along with the file. With the .htaccess commands I gave you, the conversation between server and browser will be different:

Scenario #2 (1 HTTP Request)

  • Browser: send me that web page please [HTTP request number 1]
  • Server: sure, here it is. You will also need this large logo image
  • Browser: let me just check if I’ve got it already…yes, and it doesn’t expire for another year. I can use my cached version

So this time, there’s only one HTTP request. Yet there’s a catch! Until the cached copy expires, the browser will not ask for a fresh version. It won’t even ask whether the image has been modified. The same goes for your CSS, JS, and the rest of your static asset files.

A Visual Comparison

First, let’s start off by loading a fresh website into our browser. The following image (captured with HttpWatch) shows the timeline of the page load, listing each page resource against the time it takes to load.

Each request from the browser results in one of the following response status codes:

  • 200 – The browser does not have the image in its cache. [First time visit]
  • 304 – The browser has the image in its cache, but needs to verify the last modified date.

We can see from the Time column that the web browser takes approximately 3.5 seconds to load this page.

Now, let’s have a look at how our browser loads the page with a locally cached copy of all the static elements!

As you can see, the page now loads in approximately 1.6 seconds, less than half the time of the previous load. The blue lines in the time chart represent resources served from the locally cached copy.

Summary

Using a far future Expires header affects page views only after a user has already visited your site. It has no effect on the number of HTTP requests when a user visits your site for the first time and the browser’s cache is empty. Therefore the impact of this performance improvement depends on how often users hit your pages with a primed cache. (A “primed cache” already contains all of the components in the page.)

By using a far future Expires header, you increase the number of components that are cached by the browser and re-used on subsequent page views without sending a single byte over the user’s Internet connection. In simple terms, it:

  • Reduces round-trip time
  • Avoids unnecessary HTTP requests
  • Reduces bandwidth usage
  • Reduces server load
  • …and you get a faster loading page!

4 Responses so far.

  1. [...] here to see the original: HTTP Caching With Htaccess [...]

  2. Monie says:

    Here is another way of doing it in your .htaccess file which will override the above .htaccess code:

    # The settings above will be overridden by the following (requires mod_headers)...
    
    # 10 years
    <FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf|txt)$">
        Header set Cache-Control "max-age=315360000, public"
    </FilesMatch>
    # 10 years
    <FilesMatch "\.(html|htm)$">
        Header set Cache-Control "max-age=315360000, must-revalidate"
    </FilesMatch>
    # Leave Cache-Control unset for dynamically generated pages
    <FilesMatch "\.(pl|php|cgi|spl|scgi|fcgi)$">
        Header unset Cache-Control
    </FilesMatch>
  3. Natalie says:

    The example you provided in:
    HTTP Caching With Htaccess (with mod_expires module)

    is not in the .htaccess file but in httpd.conf

  4. Monie says:

    Are you sure about that? I have been using them in my .htaccess file without any problem… Maybe it depends on one’s preference.
    You can use httpd.conf if you have access to it (most web hosts won’t give you access to that file); otherwise use .htaccess.
