About OpenRobotsTXT Bot

OpenRobotsTXT is a project designed to archive and analyse the content of all robots.txt files on the web. The project will act as a robots.txt library, allowing anyone to search for both active and historical files.

The bot only wants to visit publicly accessible robots.txt files, then move on to the next website. It does not want to crawl whole domains.

The open project is led by Majestic, a UK-based specialist search engine used by hundreds of thousands of businesses worldwide.

Bot Info

Bot Type: Good crawler
Identifies Itself: Always
IP Range: Distributed, worldwide
Obeys Robots.txt: Yes*
Data served at: openrobotstxt.org

What is OpenRobotsTXT doing on my site(s)?

The crawler is designed to examine only robots.txt. Redirects will only be respected if they resolve to a robots.txt file. Any other content will be disregarded.

Why is there an asterisk after 'Yes*' in the Bot Info table?

The bot only wants to access robots.txt itself. It does not crawl site content, so no directives are triggered or violated. While it plays by the rules and tries to obey robots.txt instructions, it should not find itself in a situation where it has to.

What happens to the crawled data?

Data is archived, analysed and made freely available to verified users on this site, openrobotstxt.org.

Can I block OpenRobotsTXT?

Of course.

OpenRobotsTXT adheres to the robots.txt standard. As the bot is only interested in robots.txt, it should not visit any other page on your site unless your host's server configuration orders it to do so. However, for your peace of mind, you can request that it does not look at any page other than robots.txt by adding these lines:

User-agent: OpenRobotsTXT
Disallow: /
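
If you want to check how a standard parser reads that rule, here is a minimal sketch using urllib.robotparser from Python's standard library (illustrative only; the parser OpenRobotsTXT itself uses is not published):

# Check whether a standard parser sees OpenRobotsTXT as blocked.
# Replace example.com with your own domain.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetches and parses the live file

# False here means the OpenRobotsTXT group disallows the root path
print(parser.can_fetch("OpenRobotsTXT", "https://example.com/"))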

If you have reason to believe that OpenRobotsTXT did NOT obey your robots.txt commands, then please let us know. Please provide the URL of your website and log entries showing the bot trying to retrieve pages other than robots.txt.

Please note that OpenRobotsTXT accepts data from a few trusted partners. They may use their own crawl methods, with a different User Agent and ruleset.

Is there a down-side to blocking OpenRobotsTXT?

Yes.

As well as offering aggregated reports on webmaster sentiment regarding blocking and allowing certain bots, OpenRobotsTXT aims to act as a backup for those times when a site's robots.txt is not available. If OpenRobotsTXT is blocked, you remove the ability for other crawlers to check our secondary records to see if and how they should respectfully crawl your site.

Additionally, we will not be able to provide you with the ability to audit whether a named crawler should have been able to visit parts of your site on a given day.

What commands in robots.txt does OpenRobotsTXT support?

Robots.txt informs crawling behaviour for site content, which our crawler does not visit. Robots.txt instructions are not intended to apply to the scanning of robots.txt itself.

However, OpenRobotsTXT will obey the following optional instruction to control republishing of your robots.txt file, ONLY WHEN it is placed in the OpenRobotsTXT directive group:

User-agent: OpenRobotsTXT
Publish-Robots: yes
Disallow:

Options are [yes/no]. Set Publish-Robots to 'no' to request that we do not publish this robots.txt.
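
Publish-Robots is not part of the robots.txt standard, so stock parsers will ignore it. As an illustration of how a crawler might read the directive, here is a minimal sketch in Python (our own implementation is not published, and the group handling here is deliberately simplified):

# Extract the Publish-Robots value from the OpenRobotsTXT group.
# Simplified sketch: real robots.txt group-matching has more rules.
def publish_robots(robots_txt: str) -> bool:
    in_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip().lower()
        if field == "user-agent":
            in_group = (value == "openrobotstxt")
        elif field == "publish-robots" and in_group:
            return value != "no"
    return True  # default when the directive is absent: publish

print(publish_robots("User-agent: OpenRobotsTXT\nPublish-Robots: no\n"))  # False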

Does OpenRobotsTXT follow redirects?

Sometimes a server may send a 30x redirect in response to a request for robots.txt instead of the anticipated file.

The OpenRobotsTXT crawler will follow up to 5 redirects. The crawler will only follow redirects where the path resolves to “/robots.txt”. Redirect chains falling outside of these parameters will cause the crawler to log an error and move on.

This is a stricter interpretation than some other crawlers, which may accommodate internal (same-host) redirects to any path.
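
For illustration, here is a minimal sketch of that policy in Python, using the third-party requests library (the crawler's actual code is not published; the timeout and status handling are assumptions):

# Follow at most 5 redirects, and only while the redirect target's
# path is exactly /robots.txt; otherwise give up, as described above.
from urllib.parse import urljoin, urlsplit
import requests  # third-party: pip install requests

def fetch_robots(url, max_redirects=5):
    for _ in range(max_redirects + 1):
        resp = requests.get(url, allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 303, 307, 308):
            return resp  # final answer: the file, a 404, etc.
        target = urljoin(url, resp.headers.get("Location", ""))
        if urlsplit(target).path != "/robots.txt":
            return None  # chain leaves /robots.txt: log an error, move on
        url = target
    return None  # too many redirects

resp = fetch_robots("https://example.com/robots.txt")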

Why did my robots.txt block not work on OpenRobotsTXT?

We are keen to see any reports of potential violations of robots.txt by OpenRobotsTXT.

A number of the reports we receive turn out to be false positives. This can be a useful checklist when configuring a web server (a log-checking sketch follows the list):

  • Off-site redirects when requesting robots.txt - OpenRobotsTXT follows redirects, but only within the limits described above. The ideal is for robots.txt to be available at "/robots.txt" as specified in the standard.
  • Multiple domains running on the same server. Modern web servers such as Apache can log accesses for a number of domains to one file, which can cause confusion when trying to see which site was accessed at which point. You may wish to consider adding domain information to the access log, or splitting access logs on a per-domain basis.
  • Robots.txt out of sync with developer copy.
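
As a starting point for checking your own logs, here is a minimal sketch in Python. It assumes a combined-format access log at a hypothetical path (access.log); adjust both to your setup:

# Flag OpenRobotsTXT requests for anything other than /robots.txt.
# Assumes the combined log format, where the request line is the
# first double-quoted field.
with open("access.log") as log:
    for line in log:
        if "OpenRobotsTXT" not in line:  # match on the user-agent string
            continue
        parts = line.split('"')
        if len(parts) < 2:
            continue  # malformed line
        path = parts[1].split(" ")[1]  # 'GET /robots.txt HTTP/1.1' -> path
        if path.split("?")[0] != "/robots.txt":
            print(line.rstrip())  # candidate violation worth reporting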

What are the current versions of OpenRobotsTXT?

The current operating version of OpenRobotsTXT is:

  • v1.0.0 (Current - May 2025)

How do I verify requests are from you?

We intend to support Reverse DNS. Details coming soon.
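
When verification is available, it will most likely follow the usual forward-confirmed reverse DNS pattern. Here is a sketch of that general technique in Python (the validating hostname suffix below is a placeholder, not a published value):

# Forward-confirmed reverse DNS: reverse-resolve the IP, check the
# hostname suffix, then confirm the hostname resolves back to the IP.
# The suffix is a placeholder until OpenRobotsTXT publishes details.
import socket

def verified(ip, suffix=".openrobotstxt.org"):
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
        if not host.endswith(suffix):
            return False
        infos = socket.getaddrinfo(host, None)  # forward confirmation
        return ip in {info[4][0] for info in infos}
    except socket.error:
        return False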