WebKit's HTMLDocumentParser frequently must block parsing while waiting for a script or stylesheet to download. During this time, the HTMLPreloadScanner looks ahead in the source for subresource downloads that can be started speculatively. I've always assumed that discovering subresources sooner is a key factor in loading web pages efficiently, but until now I never had a good way to quantify it.
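To make the look-ahead idea concrete, here is a toy sketch in Python. It is not WebKit's implementation (the real scanner is C++ inside the parser); the class name `ToyPreloadScanner`, the helper `scan_ahead`, and the choice of which tags to look at are all assumptions made for illustration. It simply walks the not-yet-parsed source and collects URLs that could be fetched early.

```python
from html.parser import HTMLParser


class ToyPreloadScanner(HTMLParser):
    """Hypothetical look-ahead scanner: collects subresource URLs only."""

    def __init__(self):
        super().__init__()
        self.preload_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Scripts and images carry their URL in the src attribute.
        if tag in ("script", "img") and attrs.get("src"):
            self.preload_urls.append(attrs["src"])
        # Only <link> elements whose rel includes "stylesheet" are of interest.
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.preload_urls.append(attrs["href"])


def scan_ahead(remaining_source):
    """Return URLs found in the source the blocked parser has not reached."""
    scanner = ToyPreloadScanner()
    scanner.feed(remaining_source)
    return scanner.preload_urls


if __name__ == "__main__":
    rest = (
        '<link rel="stylesheet" href="site.css">'
        '<script src="app.js"></script>'
        '<img src="logo.png">'
    )
    for url in scan_ahead(rest):
        print("would start fetching:", url)
```

A real browser would hand these URLs to its network stack while the main parser is still blocked; the sketch only lists them.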
How effective is it?
Today I used Web Page Replay to test a build of Chromium with preload scanning disabled vs a stock build. The results were definitive. A sample of 43 URLs from Alexa's top 75 websites loaded on average in 1,086ms without the scanner and 879ms with it. That is a ~20% savings!
That number conceals some subtleties. The preload scanner has zero effect on highly optimized sites such as google.com and bing.com. In stark contrast, the preload scanner causes cnn.com, a subresource-heavy site, to load fully twice as fast.
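For anyone who wants to check the headline number, the arithmetic on the two averages quoted above works out as follows. The variable names are mine; the inputs are the 1,086ms and 879ms figures from the test.

```python
# Average load times reported for the 43-URL sample, in milliseconds.
without_scanner_ms = 1086
with_scanner_ms = 879

# Absolute time saved and the saving relative to the no-scanner baseline.
saved_ms = without_scanner_ms - with_scanner_ms
savings = saved_ms / without_scanner_ms

print(f"{saved_ms} ms saved, a {savings:.1%} reduction")
# prints: 207 ms saved, a 19.1% reduction
```

So the "~20%" is rounded up slightly from about 19%.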
Why does this matter?
There is a lot of room for improvement in the preload scanner. These results tell me that it is worth spending time giving it some serious love. Some ideas:
- It doesn't detect iframes, @import stylesheets, fonts, HTML5 audio/video, and probably lots of other types of subresources.
- When blocked in the <head>, it doesn't scan the <body>.
- It doesn't work on XHTML pages (Wikipedia is perhaps the most prominent example).
- The tokens it generates are not reused by the parser, so in many cases parsing is done twice.
- External stylesheets are not scanned until they are entirely downloaded. They could be scanned as data arrives, as is done for the root document.
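As a rough illustration of the last idea, scanning a stylesheet as its data arrives, here is a Python sketch that looks for @import rules chunk by chunk. Everything in it is hypothetical: the class name `IncrementalImportScanner`, the simplified regular expression, and the buffering strategy are my assumptions, not how WebKit does or would do it. It ignores CSS comments and escapes, and it keeps unmatched text in the buffer rather than trimming it.

```python
import re

# Deliberately simplified: matches @import "x.css"; and @import url(x.css) media;
IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)["']?\s*\)?[^;]*;""",
    re.IGNORECASE,
)


class IncrementalImportScanner:
    """Hypothetical scanner that reports @import URLs as chunks arrive."""

    def __init__(self):
        self._buffer = ""
        self.urls = []

    def feed(self, chunk):
        """Add one network chunk and return any newly completed @import URLs."""
        self._buffer += chunk
        found = []
        last_end = 0
        for match in IMPORT_RE.finditer(self._buffer):
            found.append(match.group(1))
            last_end = match.end()
        # Drop everything up to the last complete rule. A rule split across
        # chunks has no ';' yet, so it stays buffered for the next feed().
        self._buffer = self._buffer[last_end:]
        self.urls.extend(found)
        return found


if __name__ == "__main__":
    scanner = IncrementalImportScanner()
    for chunk in ['@import url("base.css"); @imp', 'ort "theme.css"; body { margin: 0 }']:
        for url in scanner.feed(chunk):
            print("would start fetching:", url)
```

The point of the sketch is only that the first import can be discovered after the first chunk, without waiting for the whole stylesheet.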
The test was performed with a simulated connection of 5Mbps download, 2Mbps upload, and a 40ms RTT on OS X 10.6. The full data set is available.
Let's make the fixes you suggest.
Preload scanning (aka speculative parsing) was pretty much non-existent 3 years ago and now most of the main browsers have it. But I haven't seen any comparison. For example, it looks like IE won't preload images while waiting for scripts. I'd love to see you do this comparison given that you know a lot about how preload scanning works. Browserscope user test? ;-)
@Steve Souders - A couple of the browserscope network tests are what motivated me to look into this. I'd be up for adding more tests as I learn more.
IE9's speculative parser will preload images while waiting on scripts (this is a change from IE8).
"...It doesn't work on xhtml pages (wikipedia is perhaps the most prominent example)..."
Does this mean that, for this reason alone, changing XHTML to HTML is a great performance optimization?
What method/tool did you use to load the pages one-by-one?
How do I disable the PreloadScanner in WebKit?