About Stats

We have many plans for the Stats area of the site. For now, here are a few high-level data tables showing the kind of analysis we plan to release with our first full site version.

If you have any ideas for aggregated information you'd like to see here, please do get in touch.

Browse Some Stats

User-Agents mentioned in the robots.txt files of all analysed sites

Bots most often given an explicit allow-all rule

Rank  User-Agent                 Mentions  Allow All
1     Googlebot                  16.3m     1.0m
2     facebookexternalhit        0.9m      0.4m
3     Mediapartners-Google       14.3m     0.4m
4     Googlebot-Image            1.7m      0.4m
5     AdsBot-Google              10.1m     0.3m
6     bingbot                    11.6m     0.3m
7     Twitterbot                 0.8m      0.3m
8     ia_archiver                9.5m      0.2m
9     Mediapartners (Googlebot)  0.2m      0.2m
10    Baiduspider                9.5m      0.1m

More Allowed User-Agents »

Download

The full dataset is available under a Creative Commons Attribution 4.0 International license.

Notes

These tables report what we see in robots.txt files. Because robots.txt files are rarely validated or checked, they contain misspelled and incorrectly formatted crawler names. Bear in mind that something labelled as a "bot" here may be a common typo rather than a real crawler.
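
To illustrate how counts like "Mentions" and "Allow All" could be derived, here is a minimal Python sketch. It is not the method used to build these tables, which is not described here. The function names `scan_robots` and `tally` are hypothetical, and the definition of "allow all" (a group containing `Allow: /` or an empty `Disallow:`) is an assumption. Names are kept exactly as written in each file, which is why a typo would show up as its own "bot".

```python
from collections import Counter


def scan_robots(text):
    """Return (mentioned, allow_all) sets of user-agent names for one robots.txt.

    Hypothetical helper: names are kept verbatim, typos included.
    """
    mentioned = set()
    allow_all = set()
    current_agents = []  # agents named in the group currently being read
    in_rules = False     # True once the current group has seen a rule line

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                current_agents = []
                in_rules = False
            if value:
                current_agents.append(value)
                mentioned.add(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            # Assumed definition of "allow all": "Allow: /" or an empty "Disallow:".
            if (field == "allow" and value == "/") or (
                field == "disallow" and value == ""
            ):
                allow_all.update(current_agents)

    return mentioned, allow_all


def tally(files):
    """Count, across many robots.txt texts, how many files mention each name
    and how many give that name an allow-all rule."""
    mentions, allowed = Counter(), Counter()
    for text in files:
        found, open_to = scan_robots(text)
        mentions.update(found)
        allowed.update(open_to)
    return mentions, allowed
```

Each file contributes at most one mention per name, so the totals count sites rather than lines. Treating names case-insensitively or merging near-duplicates would be a further design choice that this sketch deliberately leaves out.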

The files we provide are an early-release sample of the data. The layout and column headers are very likely to change in the future.