About Stats
We have many plans for the Stats area of the site. For now, here are a few high-level data tables showing the kind of analysis we plan to release with our first full site version.
If you have ideas for aggregated information you'd like to see here, please do get in touch.
Browse Some Stats
User-Agents mentioned in robots.txt files across all analysed sites
Bots most often given an explicit allow-all
Rank | User-Agent | Mentions | Allow All |
1 | Googlebot | 16.3m | 1.0m |
2 | facebookexternalhit | 0.9m | 0.4m |
3 | Mediapartners-Google | 14.3m | 0.4m |
4 | Googlebot-Image | 1.7m | 0.4m |
5 | AdsBot-Google | 10.1m | 0.3m |
6 | bingbot | 11.6m | 0.3m |
7 | Twitterbot | 0.8m | 0.3m |
8 | ia_archiver | 9.5m | 0.2m |
9 | Mediapartners (Googlebot) | 0.2m | 0.2m |
10 | Baiduspider | 9.5m | 0.1m |
Download
The full dataset is available under a Creative Commons Attribution 4.0 International license.
- Bot Statistics [Comma-Separated (.csv)]
Notes
These tables report what we see in robots.txt files. Because robots.txt files are rarely validated or checked, they contain misspelled and incorrectly formatted crawler names. Bear in mind that something labelled as a "bot" here may be a common typo rather than a real crawler.
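As a rough illustration of how counts like "Mentions" and "Allow All" could be tallied, and why typos end up in the results, here is a minimal Python sketch. It is not our production pipeline. The function names `scan_robots_txt` and `aggregate` are hypothetical, and the sketch assumes that "allow all" means a group containing either `Allow: /` or an empty `Disallow:` line. Names are kept exactly as written in the file, so a misspelling is counted as its own "bot".

```python
from collections import Counter


def scan_robots_txt(text):
    """Return (mentioned, allowed_all) sets of user-agent names for one file."""
    mentioned, allowed_all = set(), set()
    group = []        # user-agent names of the group currently being read
    in_rules = False  # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                group, in_rules = [], False
            group.append(value)   # kept verbatim, typos included
            mentioned.add(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            # Assumption: "allow all" is `Allow: /` or an empty `Disallow:`.
            if (field == "allow" and value == "/") or (field == "disallow" and not value):
                allowed_all.update(group)
    return mentioned, allowed_all


def aggregate(files):
    """Count, per user-agent name, how many files mention it and allow it everything."""
    mentions, allow_all = Counter(), Counter()
    for text in files:
        mentioned, allowed = scan_robots_txt(text)
        mentions.update(mentioned)
        allow_all.update(allowed)
    return mentions, allow_all
```

In this sketch a file that spells the crawler name "Googelbot" adds a separate entry rather than counting towards Googlebot, which is the effect described in the note above.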
The files we provide are an early-release sample of data. It is very likely that layout and column headers will change in the future.
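Since the headers may change, anything that reads the CSV is probably safer looking columns up by name and failing clearly when an expected column is missing, rather than relying on column positions. The sketch below shows one way to do that with Python's standard `csv` module. The column names `user_agent` and `mentions` and the function `read_bot_stats` are placeholders invented for this example, not the actual headers of the download.

```python
import csv


def read_bot_stats(handle, required=("user_agent", "mentions")):
    """Read rows as dicts, checking that the expected columns are present.

    The default `required` names are hypothetical; replace them with the
    headers found in the file you downloaded.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"unexpected header {header}; missing {missing}")
    return list(reader)
```

To use it, open the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the file object in. If a later release renames a column, the function raises a `ValueError` listing what is missing instead of silently returning wrong values.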