Friday, January 28, 2011

The WebKit PreloadScanner

WebKit's HTMLDocumentParser frequently must block parsing while waiting for a script or stylesheet to download. During this time, the HTMLPreloadScanner looks ahead in the source for subresources whose downloads can be started speculatively. I've always assumed that discovering subresources sooner is a key factor in loading web pages efficiently, but until now I never had a good way to quantify it.
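To make the idea concrete, here is a toy sketch in Python of what a preload scanner does. It is not WebKit's actual C++ code. It tokenizes markup the real parser has not reached yet and collects URLs from tags that name subresources. The class name `ToyPreloadScanner`, the `scan()` helper, and the small set of tags it recognizes are assumptions made for illustration.

```python
from html.parser import HTMLParser


class ToyPreloadScanner(HTMLParser):
    """Hypothetical sketch: collect subresource URLs that could be fetched
    speculatively while the real parser is blocked on a script.
    NOT WebKit's HTMLPreloadScanner; it only knows three tag types."""

    def __init__(self):
        super().__init__()
        self.preloads = []  # (resource type, URL) pairs in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "script" and a.get("src"):
            self.preloads.append(("script", a["src"]))
        elif tag == "img" and a.get("src"):
            self.preloads.append(("image", a["src"]))
        elif tag == "link" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.preloads.append(("stylesheet", a["href"]))


def scan(html):
    """Return the subresources found in an HTML string."""
    scanner = ToyPreloadScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.preloads


page = ('<script src="a.js"></script>'
        '<link rel="stylesheet" href="s.css">'
        '<img src="logo.png">')
print(scan(page))
# → [('script', 'a.js'), ('stylesheet', 's.css'), ('image', 'logo.png')]
```

Because `HTMLParser` treats `<script>` contents as raw text, markup inside inline script is not reported, which is roughly what a real scanner must do as well.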

How effective is it?

Today I used Web Page Replay to test a build of Chromium with preload scanning disabled versus a stock build. The results were definitive. A sample of 43 URLs from Alexa's top 75 websites loaded in 1,086ms on average without the scanner and 879ms with it. That is a savings of roughly 20%!
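The headline figure follows directly from the two averages. A quick sketch of the arithmetic: the ~20% is time saved relative to the no-scanner load time, which is the same thing as a speedup of about 1.24x.

```python
without_scanner_ms = 1086  # mean load time with preload scanning disabled
with_scanner_ms = 879      # mean load time with the stock build

saved_ms = without_scanner_ms - with_scanner_ms  # absolute time saved
savings = saved_ms / without_scanner_ms          # fraction of load time saved
speedup = without_scanner_ms / with_scanner_ms   # how many times faster

print(f"saved {saved_ms} ms ({savings:.1%}), {speedup:.2f}x faster")
# → saved 207 ms (19.1%), 1.24x faster
```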

That number conceals some subtleties. The preload scanner has no effect on highly optimized sites such as google.com and bing.com. In stark contrast, it makes cnn.com, a subresource-heavy site, load twice as fast.

Why does this matter?

There is a lot of room for improvement in the preload scanner. These results tell me that it is worth spending time giving it some serious love. Some ideas:

  • It doesn't detect iframes, @import stylesheets, fonts, HTML5 audio/video, and probably lots of other types of subresources.
  • When blocked in the <head>, it doesn't scan the <body>.
  • It doesn't work on XHTML pages (Wikipedia is perhaps the most prominent example).
  • The tokens it generates are not reused by the parser, so in many cases parsing is done twice.
  • The scanner runs on the UI thread, so as data arrives from the network, it may not be scanned immediately if the UI thread is busy executing JavaScript.
  • External stylesheets are not scanned until they are entirely downloaded. They could be scanned as data arrives, as is done for the root document.
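The last bullet, combined with the missing @import support from the first, suggests scanning stylesheets incrementally. Below is a minimal, hypothetical sketch of that idea in Python; it is not how WebKit implements anything. The class name is made up, the regex is simplified (it ignores CSS comments and escapes), and it relies on the CSS rule that @import must come before any style rule or block at-rule, so scanning can stop at the first `{`.

```python
import re

# Matches @import "a.css"; @import 'a.css'; @import url(a.css) screen;
# Simplified on purpose: CSS comments and escapes are not handled.
IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)["']?\s*\)?[^;{]*;""",
    re.IGNORECASE,
)


class IncrementalCssImportScanner:
    """Hypothetical sketch: report @import URLs while a stylesheet is still
    downloading, instead of after the whole file has arrived."""

    def __init__(self):
        self._pending = ""  # text not yet consumed by a complete match
        self.done = False   # True once no further @import can legally appear
        self.found = []     # URLs discovered so far, in order

    def feed(self, chunk):
        """Scan one network chunk; return the URLs newly discovered."""
        if self.done:
            return []
        self._pending += chunk
        # Only text before the first "{" may still contain valid @imports.
        brace = self._pending.find("{")
        region = self._pending if brace == -1 else self._pending[:brace]
        new, consumed = [], 0
        for m in IMPORT_RE.finditer(region):
            new.append(m.group(1))
            consumed = m.end()
        if brace == -1:
            # Keep the unmatched tail: a rule may be split across chunks.
            self._pending = self._pending[consumed:]
        else:
            # A style rule has started, so stop scanning and drop the buffer.
            self.done = True
            self._pending = ""
        self.found.extend(new)
        return new


scanner = IncrementalCssImportScanner()
print(scanner.feed('@import url("a.css");\n@imp'))  # → ['a.css']
print(scanner.feed('ort "b.css" screen;\nbody { color: red }\n@import "c.css";'))  # → ['b.css']
```

The second rule is split across two chunks and is still found once its terminating `;` arrives, while the invalid trailing `@import "c.css"` is ignored.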

Test setup

The test was performed on OSX 10.6 with a simulated connection of 5Mbps download, 2Mbps upload, and a 40ms RTT. The full data set is available.
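As a rough illustration of why earlier discovery matters under these network conditions, here is a deliberately crude Python model. It is my assumption, not part of the test: each request costs one 40ms round trip plus transfer time at 5Mbps, and the model ignores TCP handshakes, slow start, server time, and per-host connection limits. It compares fetching resources one after another, which approximates a parser that discovers each resource only after the previous blocking one finishes, with issuing all requests at once, as a preload scanner allows. The page of five 20 KB resources is hypothetical.

```python
RTT_S = 0.040         # 40 ms round-trip time, from the test setup
DOWN_BPS = 5_000_000  # 5 Mbps download bandwidth, from the test setup


def fetch_time_s(size_bytes):
    """One request: a round trip plus transfer time (no handshake or slow start)."""
    return RTT_S + size_bytes * 8 / DOWN_BPS


def serial_load_s(sizes):
    """Each resource is discovered only after the previous one finishes."""
    return sum(fetch_time_s(s) for s in sizes)


def parallel_load_s(sizes):
    """All resources requested at once, sharing the link: one RTT plus total transfer."""
    if not sizes:
        return 0.0
    return RTT_S + sum(sizes) * 8 / DOWN_BPS


sizes = [20_000] * 5  # hypothetical page: five 20 KB scripts/stylesheets
print(f"serial:   {serial_load_s(sizes) * 1000:.0f} ms")    # → serial:   360 ms
print(f"parallel: {parallel_load_s(sizes) * 1000:.0f} ms")  # → parallel: 200 ms
```

In this model the serial case pays a round trip per resource, so the gap widens as the number of subresources grows. That is in the same direction as cnn.com benefiting most, though a model this simple cannot reproduce the measured numbers.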

8 comments:

Anonymous said...

Nice test!!

Let's make the fixes you suggest.

Steve Souders said...

Preload scanning (aka speculative parsing) was pretty much non-existent 3 years ago and now most of the main browsers have it. But I haven't seen any comparison. For example, it looks like IE won't preload images while waiting for scripts. I'd love to see you do this comparison given that you know a lot about how preload scanning works. Browserscope user test? ;-)

The Nerdbirder said...

@Steve Souders - A couple of the browserscope network tests are what motivated me to look into this. I'd be up for adding more tests as I learn more.

NicJ said...

IE9's speculative parser will preload images while waiting on scripts (this is a change from IE8).

Anonymous said...

"...It doesn't work on xhtml pages (wikipedia is perhaps the most prominent example)..."

Does this mean that, for this reason alone, changing XHTML to HTML is a great performance optimization?

kbalazs said...

Great benchmark!
What method/tool did you use to load the pages one-by-one?

Anonymous said...

How do you disable the PreloadScanner in WebKit?