Many websites contain a robots.txt file. It is a set of instructions that tells search engine crawlers (like Google's) which files, folders, or URLs they may access on the website and which they should ignore. It is generally used to keep private, duplicate, or unimportant information from appearing in search results. It can also point crawlers to the website's sitemap.
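For illustration, a simple robots.txt might look like the following (the paths and domain are hypothetical examples, not from any real site):

```
# Block all crawlers from a private folder
User-agent: *
Disallow: /private/

# Point crawlers to the site's sitemap
Sitemap: https://www.example.com/sitemap.xml
```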

How robots.txt can interfere with DubBot

Depending on how the robots.txt file is set up on a website, it can prevent DubBot from crawling pages that need to be checked for accessibility.

When to check your Site settings

  • If a crawl of a site reports that 0 pages were found and you know the site has pages.

  • If a crawl of a site doesn’t return all of the pages you expected.

  • If you know that a site has pages blocked by a robots.txt file and you would like them crawled and checked for accessibility.

  • If you receive a message from DubBot about your robots.txt file:

DubBot error message: This site is blocked by robots.txt

What to do

The robots.txt settings are configured per site. View the instructions on modifying the robots.txt settings.

  • If you want DubBot to ignore all of the robots.txt file’s directives, uncheck the Obey robots.txt box.

  • If you want to give DubBot permission to crawl some but not all of a site's data, view the DubBot Crawler information and edit your robots.txt file to allow specific permissions for DubBot's crawler.
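As a sketch of the second option, a robots.txt file can grant one crawler broader access than others by adding a user-agent-specific group. The user-agent token below is a placeholder; check the DubBot Crawler information for the exact token DubBot's crawler uses.

```
# Hypothetical example — replace "DubBot" with the actual
# user-agent token from the DubBot Crawler documentation.
User-agent: DubBot
Allow: /

# All other crawlers stay blocked from the private folder.
User-agent: *
Disallow: /private/
```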

