What is robots.txt and why do I need to update my site settings?

Many websites have a robots.txt file at the root of the site (at /robots.txt). It gives crawlers, such as those used by search engines like Google, instructions on which files, folders, and URLs they may access on the website and which to ignore. It is generally used to keep private, duplicate, or unimportant content out of search results. It can also help crawlers find the website's sitemap.
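To make these directives concrete, here is a minimal sketch that feeds a sample robots.txt to Python's standard-library `urllib.robotparser` and asks which URLs a crawler may fetch. The file contents, the example.edu URLs, and the use of Python's parser are illustrative assumptions, not DubBot's implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and sitemap URL are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/

Sitemap: https://www.example.edu/sitemap.xml
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = parse_robots(ROBOTS_TXT)
    # /about/ is not covered by any Disallow rule, so it may be crawled.
    print(rp.can_fetch("Googlebot", "https://www.example.edu/about/"))
    # /private/ is disallowed for every user agent ("*").
    print(rp.can_fetch("Googlebot", "https://www.example.edu/private/report.html"))
    # Sitemap lines are collected regardless of which group they appear in.
    print(rp.site_maps())
```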

How robots.txt can interfere with DubBot

Depending on how a website's robots.txt file is set up, it can prevent DubBot from crawling pages that need to be checked for accessibility.
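As a sketch of the most restrictive setup, the example below again uses Python's `urllib.robotparser` as a stand-in for a crawler. A `Disallow: /` rule under `User-agent: *` blocks every path for every crawler, which is one way a crawl can end with no pages found. The robots.txt content, the example.edu URLs, and the "DubBot" user-agent token are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Placeholder user-agent token (assumption); the real name is documented by DubBot.
CRAWLER_UA = "DubBot"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://www.example.edu/", "https://www.example.edu/admissions/"):
        # Every path starts with "/", so every page is disallowed.
        print(url, is_allowed(BLOCK_ALL, CRAWLER_UA, url))
```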

When to check your Site settings

  • If a crawl of a site reports 0 pages found even though you know the site has pages.

  • If a crawl of a site doesn’t return all of the pages you expected.

  • If you know that a site has pages blocked by a robots.txt file and you would like them crawled and checked for accessibility.

  • If you receive this message from DubBot about your robots.txt file: "This site is blocked by robots.txt".
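If a crawl returns fewer pages than expected, one way to narrow down the cause is to test a list of pages you expect to be checked against the site's robots.txt. The sketch below does this with Python's `urllib.robotparser`; the robots.txt content, the URL list, and the "DubBot" user-agent token are hypothetical, and DubBot's crawler may evaluate rules differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only part of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news/
Disallow: /events/
"""

# Placeholder user-agent token (assumption, not DubBot's documented value).
CRAWLER_UA = "DubBot"

# Hypothetical pages you expect to be crawled; replace with your own URLs.
EXPECTED_PAGES = [
    "https://www.example.edu/",
    "https://www.example.edu/news/2024/announcement.html",
    "https://www.example.edu/events/",
    "https://www.example.edu/contact/",
]


def blocked_pages(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for url in blocked_pages(ROBOTS_TXT, CRAWLER_UA, EXPECTED_PAGES):
        print("Blocked by robots.txt:", url)
```

A check like this can help explain a partial crawl before you change any site settings.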

What to do

The robots.txt setting is configured separately for each site. To learn how to change it, see the Advanced Settings for Sites article.

  • If you want DubBot to ignore all of the robots.txt file’s directives, uncheck the Obey robots.txt box.

  • If you want to give DubBot permission to crawl some but not all of a site's pages, review the DubBot Crawler information article and edit your robots.txt file to grant specific permissions to DubBot's crawler.
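As a sketch of the second option, the robots.txt below adds a separate group that lets one named crawler access the whole site while other crawlers stay out of /private/. The "DubBot" token is a placeholder (assumption); use the user-agent name given in the DubBot Crawler information article. The rules are checked here with Python's `urllib.robotparser`, which stands in for a real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "DubBot" is a placeholder token, not a confirmed value.
ROBOTS_TXT = """\
User-agent: DubBot
Allow: /

User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt rules with Python's standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.edu/private/handbook.html"
    # The named group matches first and allows every path.
    print("DubBot:", can_crawl(ROBOTS_TXT, "DubBot", page))
    # Other crawlers fall back to the "*" group, which disallows /private/.
    print("Googlebot:", can_crawl(ROBOTS_TXT, "Googlebot", page))
```

Python's parser uses the first matching group and rule in file order, while other crawlers may resolve overlapping rules differently, so confirm the expected behavior against DubBot's crawler documentation.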

