About OpenRobotsTXT Bot

OpenRobotsTXT is a project designed to archive and analyse the content of all robots.txt files on the web. The project will act as a robots.txt library, allowing anyone to search for both active and historical files.

The bot only wants to visit publicly accessible robots.txt files, then move on to the next website. It does not want to crawl whole domains.

The open project is led by Majestic, a UK-based specialist search engine used by hundreds of thousands of businesses worldwide.

Bot Info

Bot Type: Good crawler
Identifies Itself: Always
IP Range: Distributed, worldwide
Obeys Robots.txt: Yes*
Data served at: openrobotstxt.org

What is OpenRobotsTXT doing on my site(s)?

The crawler is designed to examine only robots.txt. Redirects will only be respected if they resolve to a robots.txt file. Any other content will be disregarded.

Why is there an asterisk after 'Yes*' in the Bot Info table?

The bot only wants to access robots.txt itself. It does not crawl site content, so no directives are triggered or violated. While it plays by the rules and tries to obey robots.txt instructions, it should not find itself in a situation where it has to.

What happens to the crawled data?

Data is archived, analysed and made freely available to verified users on this site, openrobotstxt.org.

Can I block OpenRobotsTXT?

Of course.

OpenRobotsTXT adheres to the robots.txt standard. As the bot is only interested in robots.txt, it should not visit any other page on your site unless your host's server configuration orders it to do so. However, for your peace of mind, you can request that it does not look at any page other than robots.txt by adding these lines:

User-agent: OpenRobotsTXT
Disallow: /
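
If you want to check how a standard parser reads that rule, here is a minimal sketch using urllib.robotparser from Python's standard library (illustrative only; the parser OpenRobotsTXT itself uses is not published):

# Check whether a standard parser sees OpenRobotsTXT as blocked.
# Replace example.com with your own domain.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetches and parses the live file

# False here means the OpenRobotsTXT group disallows the root path
print(parser.can_fetch("OpenRobotsTXT", "https://example.com/"))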

If you have reason to believe that OpenRobotsTXT did NOT obey your robots.txt commands, then please let us know. Please provide the URL of your website and log entries showing the bot trying to retrieve pages other than robots.txt.

Please note that OpenRobotsTXT accepts data from a few trusted partners. They may use their own crawl methods, with a different User Agent and ruleset.

Is there a down-side to blocking OpenRobotsTXT?

Yes.

As well as offering aggregated reports on webmaster sentiment regarding blocking and allowing certain bots, OpenRobotsTXT aims to act as a backup for those times when a site's robots.txt is not available. If OpenRobotsTXT is blocked, you remove the ability for other crawlers to check our secondary records to see if and how they should respectfully crawl your site.

Additionally, we will not be able to provide you with the ability to audit whether a named crawler should have been able to visit parts of your site on a given day.

What commands in robots.txt does OpenRobotsTXT support?

Robots.txt informs crawling behaviour for site content, which our crawler does not visit. Robots.txt instructions are not intended to apply to the scanning of robots.txt itself.

However, OpenRobotsTXT will obey the following optional instruction to control republishing of your robots.txt file, ONLY WHEN it is placed in the OpenRobotsTXT directive group:

User-agent: OpenRobotsTXT
Publish-Robots: yes
Disallow:

Options are [yes/no]. Set Publish-Robots to 'no' to request that we do not publish this robots.txt.
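
Publish-Robots is not part of the robots.txt standard, so stock parsers will ignore it. As an illustration of how a crawler might read the directive, here is a minimal sketch in Python (our own implementation is not published, and the group handling here is deliberately simplified):

# Extract the Publish-Robots value from the OpenRobotsTXT group.
# Simplified sketch: real robots.txt group-matching has more rules.
def publish_robots(robots_txt: str) -> bool:
    in_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip().lower()
        if field == "user-agent":
            in_group = (value == "openrobotstxt")
        elif field == "publish-robots" and in_group:
            return value != "no"
    return True  # default when the directive is absent: publish

print(publish_robots("User-agent: OpenRobotsTXT\nPublish-Robots: no\n"))  # False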

Does OpenRobotsTXT follow redirects?

Sometimes a server may send a 30x redirect in response to a request for robots.txt instead of the anticipated file.

The OpenRobotsTXT crawler will follow up to 5 redirects. The crawler will only follow redirects where the path resolves to “/robots.txt”. Redirect chains falling outside of these parameters will cause the crawler to log an error and move on.

This is a stricter interpretation than some other crawlers, which may accommodate internal (same-host) redirects to any path.
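
For illustration, here is a minimal sketch of that policy in Python, using the third-party requests library (the crawler's actual code is not published; the timeout and status handling are assumptions):

# Follow at most 5 redirects, and only while the redirect target's
# path is exactly /robots.txt; otherwise give up, as described above.
from urllib.parse import urljoin, urlsplit
import requests  # third-party: pip install requests

def fetch_robots(url, max_redirects=5):
    for _ in range(max_redirects + 1):
        resp = requests.get(url, allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 303, 307, 308):
            return resp  # final answer: the file, a 404, etc.
        target = urljoin(url, resp.headers.get("Location", ""))
        if urlsplit(target).path != "/robots.txt":
            return None  # chain leaves /robots.txt: log an error, move on
        url = target
    return None  # too many redirects

resp = fetch_robots("https://example.com/robots.txt")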

Why did my robots.txt block not work on OpenRobotsTXT?

We are keen to see any reports of potential violations of robots.txt by OpenRobotsTXT.

A number of the reports we receive turn out to be false positives. This can be a useful checklist when configuring a web server (a log-checking sketch follows the list):

  • Off-site redirects when requesting robots.txt - OpenRobotsTXT follows redirects, but only within the limits described above. The ideal is for robots.txt to be available at "/robots.txt" as specified in the standard.
  • Multiple domains running on the same server. Modern web servers such as Apache can log accesses for a number of domains to one file, which can cause confusion when trying to see which site was accessed at which point. You may wish to consider adding domain information to the access log, or splitting access logs on a per-domain basis.
  • Robots.txt out of sync with developer copy.
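
As a starting point for checking your own logs, here is a minimal sketch in Python. It assumes a combined-format access log at a hypothetical path (access.log); adjust both to your setup:

# Flag OpenRobotsTXT requests for anything other than /robots.txt.
# Assumes the combined log format, where the request line is the
# first double-quoted field.
with open("access.log") as log:
    for line in log:
        if "OpenRobotsTXT" not in line:  # match on the user-agent string
            continue
        parts = line.split('"')
        if len(parts) < 2:
            continue  # malformed line
        path = parts[1].split(" ")[1]  # 'GET /robots.txt HTTP/1.1' -> path
        if path.split("?")[0] != "/robots.txt":
            print(line.rstrip())  # candidate violation worth reporting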

What are the current versions of OpenRobotsTXT?

The current operating version of OpenRobotsTXT is:

  • v1.0.0 (Current - May 2025)

How do I verify requests are from you?

We intend to support Reverse DNS. Details coming soon.
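
When verification is available, it will most likely follow the usual forward-confirmed reverse DNS pattern. Here is a sketch of that general technique in Python (the validating hostname suffix below is a placeholder, not a published value):

# Forward-confirmed reverse DNS: reverse-resolve the IP, check the
# hostname suffix, then confirm the hostname resolves back to the IP.
# The suffix is a placeholder until OpenRobotsTXT publishes details.
import socket

def verified(ip, suffix=".openrobotstxt.org"):
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
        if not host.endswith(suffix):
            return False
        infos = socket.getaddrinfo(host, None)  # forward confirmation
        return ip in {info[4][0] for info in infos}
    except socket.error:
        return False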