Many websites contain a robots.txt file. It is a set of instructions that tells search engines (like Google) and other web crawlers which files, folders, and URLs they should access on the website and which to ignore. It is generally used to keep private, duplicate, or unimportant content from appearing in search results. It can also help robots locate a website's sitemap.
How robots.txt can interfere with DubBot
Depending on how the robots.txt file is set up on a website, it can prevent DubBot from crawling pages that need to be checked for accessibility.
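To illustrate how a robots.txt rule can keep a crawler away from pages, here is a minimal sketch using Python's standard-library urllib.robotparser module. The sample file contents, the example.edu URLs, and the crawler name "ExampleCrawler" are all assumptions made up for this illustration; they are not DubBot's actual configuration, and DubBot's own crawler may interpret rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/

Sitemap: https://www.example.edu/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "ExampleCrawler" is a placeholder name, not a real crawler's user agent.
CRAWLER = "ExampleCrawler"

# A page outside the disallowed folders may be crawled.
public_ok = parser.can_fetch(CRAWLER, "https://www.example.edu/about/")

# A page under /private/ is blocked for every crawler that obeys the file.
private_ok = parser.can_fetch(CRAWLER, "https://www.example.edu/private/report.html")

# Sitemap lines are exposed separately (Python 3.8 or later).
sitemaps = parser.site_maps()

print("about page allowed:", public_ok)
print("private page allowed:", private_ok)
print("sitemaps:", sitemaps)
```

In this sketch, any page under /private/ or /drafts/ would be skipped by a crawler that obeys the file, which means those pages would never be checked.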
When to check your Site settings
If a crawl of a site reports 0 pages found and you know the site has pages.
If a crawl of a site doesn’t return all of the pages you expected.
If you know that a site has pages blocked by a robots.txt file and you would like them crawled and checked for accessibility.
If you receive a message from DubBot about your robots.txt file.
What to do
The robots.txt setting is configured per site. To learn how to modify it, see the Advanced Settings for Sites article.
If you want DubBot to ignore all of the robots.txt file’s directives, uncheck the Obey robots.txt box.
If you want to give DubBot permission to crawl some but not all of a site’s data, review the DubBot Crawler information article and edit your robots.txt file to allow specific permissions for DubBot’s crawler.
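As a rough sketch of what such an edit might look like, the example below gives one crawler its own group of rules and then checks the result with Python's standard-library urllib.robotparser module. The user-agent token "DubBot", the folder names, and the example.edu URLs are assumptions for illustration only; use the actual user agent listed in the DubBot Crawler information article.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent token, for illustration only. Check the DubBot Crawler
# information article for the real value before editing your own file.
DUBBOT_AGENT = "DubBot"

# Hypothetical edited robots.txt: the first group applies only to the
# assumed DubBot token, the second group applies to every other crawler.
EDITED_ROBOTS_TXT = """\
User-agent: DubBot
Allow: /catalog/
Disallow: /admin/

User-agent: *
Disallow: /catalog/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(EDITED_ROBOTS_TXT.splitlines())

# The crawler-specific group lets the assumed DubBot token into /catalog/ ...
dubbot_catalog = parser.can_fetch(DUBBOT_AGENT, "https://www.example.edu/catalog/courses.html")

# ... but still keeps it out of /admin/.
dubbot_admin = parser.can_fetch(DUBBOT_AGENT, "https://www.example.edu/admin/login")

# Other crawlers fall back to the "*" group and stay blocked from /catalog/.
other_catalog = parser.can_fetch("OtherBot", "https://www.example.edu/catalog/courses.html")

print("DubBot may crawl /catalog/:", dubbot_catalog)
print("DubBot may crawl /admin/:", dubbot_admin)
print("OtherBot may crawl /catalog/:", other_catalog)
```

A crawler that matches a named group follows only that group's rules, so anything it should not access needs to be listed there as well as under the general group.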