Saturday, February 5, 2011

Chrome's 10 Caches

While defining the set of page load time metrics that we think are most important for benchmarking, Mike Belshe, James Simonsen and I went through a seemingly simple exercise: enumerate the ways in which Chrome caches data. The resulting list was interesting enough to me that I thought it worthwhile to share.

When most people think of "the browser's cache" they envision a single map of HTTP requests to HTTP responses on disk (and perhaps partially in memory). This cache may arguably have the most impact on page load times, but to get to a truly stable benchmark, we identified 10 caches that need to be considered. An understanding of the various caches is also useful to web page optimization experts who seek to maximize cache hits.

  1. HTTP disk cache

    Stores HTTP responses on disk as long as their headers permit caching. Lookups are usually significantly cheaper than fetching over the network, but they are not free: a single disk seek might take 10-15ms, and that excludes the time to read the data from disk.

    The maximum size of the cache is calculated as a percentage of available disk space. The contents can be viewed at chrome://net-internals/#httpCache. It can be cleared manually at chrome://settings/advanced or programmatically by calling chrome.benchmarking.clearCache() when Chrome is run with the --enable-benchmarking flag set. Note that for incognito windows this cache actually resides in memory. source
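
    To make "as long as their headers permit caching" concrete, here is a minimal sketch of how a cache might read the relevant response headers. It is a simplified reading of the standard Cache-Control and Expires rules, not Chrome's actual logic; the function name freshness_lifetime is made up for this example, and heuristic freshness and revalidation are deliberately left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, now=None):
    """Return how many seconds a response may be reused without revalidation.

    Returns 0 when the headers forbid reuse or give no explicit lifetime.
    Simplified illustration only; this is not Chrome's implementation.
    """
    now = now or datetime.now(timezone.utc)
    headers = {name.lower(): value for name, value in headers.items()}

    # Parse "Cache-Control: a, b=1" into {"a": "", "b": "1"}.
    directives = {}
    for part in headers.get("cache-control", "").split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"')

    # no-store forbids storing; no-cache forbids reuse without revalidation.
    if "no-store" in directives or "no-cache" in directives:
        return 0

    # max-age takes precedence over Expires.
    if "max-age" in directives:
        try:
            return max(0, int(directives["max-age"]))
        except ValueError:
            return 0

    if "expires" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
        except (TypeError, ValueError):
            return 0  # An unparseable date is treated as already expired.
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        return max(0, int((expires - now).total_seconds()))

    return 0
```

    For example, freshness_lifetime({"Cache-Control": "max-age=3600"}) returns 3600, while a response carrying no-store returns 0.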

  2. HTTP memory cache

    Similar to the HTTP disk cache, but entirely unrelated in code. Lookups in this cache are fast enough that they may be thought of as "free."

    This cache is limited to 32 megabytes. However, when the system is not under memory pressure, the effective limit may be higher due to its use of purgeable memory. Conversely, when multiple tabs are open, the limit may be divided among the tabs. It is cleared in the same manner as the HTTP disk cache. source

  3. DNS host cache

    Caches up to 100 DNS resolutions for up to 1 minute each. It is somewhat unfortunate that this cache needs to exist in Chrome, but OS level caching cannot be trusted across platforms.

    It can be viewed and manually cleared at chrome://net-internals/#dns. source
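
    The two limits above (100 entries, 1 minute each) can be sketched as a small bounded cache with a time-to-live. This is only an illustration of the idea: the class name HostCache, the injectable clock, and the choice to evict the oldest entry when full are assumptions for the example, not a description of Chrome's code.

```python
import time
from collections import OrderedDict


class HostCache:
    """Bounded hostname -> addresses cache with a fixed time-to-live."""

    def __init__(self, max_entries=100, ttl_seconds=60.0, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl_seconds = ttl_seconds
        self._clock = clock  # Injectable so tests can control time.
        self._entries = OrderedDict()  # hostname -> (addresses, expires_at)

    def get(self, hostname):
        """Return cached addresses, or None on a miss or an expired entry."""
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        addresses, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[hostname]
            return None
        return addresses

    def put(self, hostname, addresses):
        """Store a resolution, evicting the oldest entry if the cache is full."""
        self._entries.pop(hostname, None)
        if len(self._entries) >= self._max_entries:
            # Assumption: drop the entry that was inserted first.
            self._entries.popitem(last=False)
        self._entries[hostname] = (list(addresses), self._clock() + self._ttl_seconds)
```

    A caller would check get() first and only fall back to a real resolver (followed by put()) on a miss.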

  4. Preconnect domain cache

    A unique and important optimization in Chrome is the ability to remember the set of domains used by all subresources referenced by a page. Upon the next visit to the page, Chrome can preemptively perform DNS resolution and even establish a TCP connection to these domains.

    This cache can be viewed at about:dns. source
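
    The bookkeeping behind this can be sketched as a map from page URL to the set of hosts its subresources used. The names below (PreconnectHints, record_subresource, hosts_to_warm) are hypothetical, and the sketch only returns the hosts; actually resolving or connecting to them is left to the caller.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class PreconnectHints:
    """Remembers which hosts a page's subresources came from."""

    def __init__(self):
        self._hosts_by_page = defaultdict(set)  # page URL -> set of hostnames

    def record_subresource(self, page_url, subresource_url):
        """Note that loading page_url required a fetch from subresource_url."""
        host = urlsplit(subresource_url).hostname
        if host:
            self._hosts_by_page[page_url].add(host)

    def hosts_to_warm(self, page_url):
        """Return hosts worth resolving or connecting to before the next load."""
        return sorted(self._hosts_by_page.get(page_url, ()))
```

    On a later visit, the hosts returned by hosts_to_warm() are the candidates for early DNS resolution and TCP connection setup.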

  5. V8 compilation cache

    Compilation can be an expensive step in executing JavaScript. V8 stores compiled JS keyed off of a hash of the source for up to 5 generations (garbage collections). This means that two identical pieces of source code will share a cache entry regardless of how they were included. source
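
    As a rough illustration of keying compiled code by a hash of its source and aging entries across garbage collections, here is a sketch that uses Python's built-in compile() as a stand-in for the JavaScript compiler. The class and method names are invented for this example, and the choice to reset an entry's age on a hit is an assumption, not a statement about how V8 behaves.

```python
import hashlib


class CompilationCache:
    """Caches compiled code by source hash for a limited number of GC cycles."""

    def __init__(self, max_generations=5):
        self._max_generations = max_generations
        self._entries = {}  # source digest -> [code object, age in GC cycles]

    def get_or_compile(self, source):
        """Return compiled code for source, compiling only on a cache miss."""
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        entry = self._entries.get(key)
        if entry is not None:
            entry[1] = 0  # Assumption: a hit makes the entry young again.
            return entry[0]
        code = compile(source, "<cached>", "exec")
        self._entries[key] = [code, 0]
        return code

    def on_garbage_collection(self):
        """Age every entry by one generation and drop those that are too old."""
        for key in list(self._entries):
            self._entries[key][1] += 1
            if self._entries[key][1] >= self._max_generations:
                del self._entries[key]
```

    Because the key depends only on the source text, two identical scripts share one entry no matter which page or tag included them.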

  6. SSL session cache

    Caches SSL sessions to disk. This saves several round trips of negotiation when connecting to HTTPS pages by allowing the connection to skip directly to the encrypted stream. Implementation and limits vary by platform. For example, when OpenSSL is used, the limit is 1,024 sessions. source

  7. TCP connections

    Establishing a TCP connection takes about one round trip time. Newer connections also have a smaller window so they have a lower effective bandwidth. For this reason Chrome keeps connections open for a period in hopes that they can be reused. This can be thought of as an in-memory cache.

    Connections may be viewed at chrome://net-internals/#sockets and cleared programmatically by calling chrome.benchmarking.closeConnections() when Chrome is run with the --enable-benchmarking flag set. source
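
    Thinking of idle connections as a cache can be sketched as a pool keyed by host and port. Everything here is hypothetical: the class name IdleConnectionPool, the connect callback, and the 10 second idle timeout are placeholders for the example, since the actual period Chrome keeps connections open is not stated above.

```python
import time


class IdleConnectionPool:
    """Keeps released connections around briefly so they can be reused."""

    def __init__(self, connect, idle_timeout=10.0, clock=time.monotonic):
        self._connect = connect  # Callable (host, port) -> new connection.
        self._idle_timeout = idle_timeout  # Placeholder value, in seconds.
        self._clock = clock
        self._idle = {}  # (host, port) -> list of (connection, released_at)

    def acquire(self, host, port):
        """Reuse a recent idle connection if one exists, else open a new one."""
        bucket = self._idle.get((host, port), [])
        while bucket:
            connection, released_at = bucket.pop()
            if self._clock() - released_at < self._idle_timeout:
                return connection
            # Too old: drop it. A real pool would also close the socket here.
        return self._connect(host, port)

    def release(self, host, port, connection):
        """Return a connection to the pool once the caller is done with it."""
        self._idle.setdefault((host, port), []).append((connection, self._clock()))
```

    A hit in acquire() skips the connection round trip and inherits whatever congestion window the existing connection has already built up.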

  8. Cookies

    While not usually thought of as a cache, this is web page state which is persisted to disk. The presence of cookies can have a large impact on performance. They can bloat requests and change how the client and server behave in limitless ways.

    They can be cleared manually at chrome://settings/advanced. We are planning to add a method to chrome.benchmarking for the same. source

  9. HTML5 caches

    HTML5 introduces 3 major new ways for web pages to persist state to disk: Application Cache, Indexed Database and Web Storage. For a particular page, these stores may be viewed under the "Resources" panel of the Inspector. The entire Application Cache may also be viewed and manually cleared at chrome://appcache-internals/

  10. SDCH dictionary cache

    While currently only used by Google Search, the SDCH protocol requires a shared dictionary to be downloaded periodically. A performance hit is taken infrequently to download the dictionary which makes future requests much faster. source

I hope you found this as interesting as I did. Please let me know if I left anything out.

Edit: Will Chan points out additional caches for proxies, authentication, glyphs and backing stores. Proxy and authentication caches were intentionally omitted because they aren't relevant to our benchmark. However, glyphs and backing stores are two additional things we need to consider. Thanks!

13 comments:

Mihai Parparita said...

MemoryCache looks like it's per (renderer) process, is that actually the case? Does this mean that for resources that appear on multiple distinct sites (e.g. Analytics JS) we'll end up with multiple copies of it in memory, one per process?

The Nerdbirder said...

@Mihai Parparita - Yep. I think it is considered a security benefit to wall off the cache for each process, but I'm not sure about that. Seems like sharing some memory could be a big win if possible.

There are actually a lot of subtleties to the size limits and how it is divided. The source is here: http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/renderer_host/web_cache_manager.h?view=markup

Aaron said...

Hi Tony,

Nice article. Txs.
I am seeking the answer to this question:

what is the policy in Chrome for (disk) caching HTTPS content?
E.g. a .css file sent over HTTPS

Does Chrome want the Cache-Control:Public header? Is a normal Expires or Cache-Control header sufficient?

The Nerdbirder said...

@Aaron - Chrome doesn't need Cache-Control:public to cache an https subresource.

Eric Lawrence speaks for IE here: http://blogs.msdn.com/b/ieinternals/archive/2010/04/21/internet-explorer-may-bypass-cache-for-cross-domain-https-content.aspx

And FF changed its behavior here:
https://bugzilla.mozilla.org/show_bug.cgi?id=531801

Michael Mahemoff said...

Cool summary! I think you could really separate out all the HTML5 caches, and add Web SQL Database and File access (http://www.html5rocks.com/tutorials/file/filesystem/) too.

Unknown said...

thanks Tony, Please permit us to publish a link to your site from IT Server Press http://serverpress.wordpress.com

Thanks

Anonymous said...

thanks amigo! great post!

Anonymous said...

I love this post – totally kewl!!! Well done! I’m coming back to this one …

Zhen Wang said...

Hi Tony,

This is very nice article. It has answered several questions I have. Thx!

One further question is: how does http memory cache and http disk cache work together? any pointer to the implementation in the codes?

Also, is there any rationale behind the cache size numbers?

Brett Cave said...

@Aaron To clarify, the "public" vs "private" options for cache-control normally only apply to shared caches (e.g. a squid proxy), as a "private" response can still be cached locally by a browser, but not by a shared proxy (learnt a bit about caching from this: http://www.mnot.net/cache_docs/)

@Tony I came across a help forum post where a developer found that content served through an HTTPS server with a self-signed certificate wasn't being cached despite cache-specific directives. We have found similar behaviour with Chrome only, but have not isolated the cause. Is this behaviour expected? (reference: http://www.google.com/support/forum/p/Chrome/thread?tid=2f7803d278406baf&hl=en)

Anonymous said...

Lots of incredibly good reading here, thanks! I was searching on yahoo when I observed your post, I’m going to add your feed to Google Reader, I look forward to more from you.

Anonymous said...

@Brett did you ever find an answer to your question? I'm encountering the same issue with Chrome not caching content from a server using a self-signed certificate for SSL.

Anonymous said...

Hi tony
Chrome's cache control is a big headache especially if u want to refresh the page when someone click back buttons .I have a php application which dont work at all in chrome because of caching issue.Any suggestion how i can get rid of that issue in php.I have tried to refresh apge,disable caching etc already but no success.