Many websites contain a robots.txt file: a set of instructions telling search-engine crawlers (like Google's) which files, folders, or URLs they may access on the website and which they should ignore. It is generally used to keep private, duplicate, or unimportant content out of search results. It can also point robots to the website's sitemap.
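As an illustration, a minimal robots.txt might look like the following (the folder name and sitemap URL are placeholders, not real paths):

```
# Block all crawlers from a private folder
User-agent: *
Disallow: /private/

# Point robots to the site's sitemap
Sitemap: https://www.example.com/sitemap.xml
```

Directives apply to the user agent named above them; `User-agent: *` means the rules apply to every crawler that obeys robots.txt.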
How robots.txt can interfere with DubBot
Depending on how the robots.txt file is set up on a website, it can prevent DubBot from crawling pages that need to be checked for accessibility.
When to check your Site settings
If a site crawl reports that 0 pages were found, but you know the site has pages.
If a crawl of a site doesn’t return all of the pages you expected.
If you know that a site has pages blocked by a robots.txt file and you would like them crawled and checked for accessibility.
If you receive a message from DubBot about your robots.txt file.
What to do
To have the crawler ignore all of the robots.txt file's directives for a site, uncheck the Obey robots.txt box in the Advanced Site Settings tab of a Site's Settings.
If you want to give DubBot permission to crawl some but not all of a site's content, review the DubBot Crawler information article and edit your robots.txt file to grant specific permissions to DubBot's crawler.
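For example, a robots.txt edit along these lines would let one named crawler in while keeping other bots out of a folder. This is a hypothetical sketch: the user-agent token "DubBot" and the folder path are assumptions, so confirm the crawler's actual user-agent string in the DubBot Crawler information article before relying on it.

```
# Assumed user-agent token -- verify against DubBot's documentation
User-agent: DubBot
Allow: /

# All other crawlers remain blocked from this (example) folder
User-agent: *
Disallow: /drafts/
```

Because crawlers follow the most specific user-agent group that matches them, the named crawler would use its own `Allow: /` group and ignore the `Disallow` rule meant for everyone else.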