What is robots.txt and why do I need to update my site settings?

Many websites contain a robots.txt file. It is a set of instructions that tells search engines (like Google) and other web crawlers which files, folders, and URLs they should access on the website and which to ignore. It is generally used to keep private, duplicate, or unimportant content from appearing in search results. It can also help robots locate the website's sitemap.
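To make this concrete, here is a small sketch of how a crawler might read a robots.txt file. It uses Python's standard urllib.robotparser module. The file contents and the example.com URLs are made up for illustration, and real crawlers (including search engines) may interpret some rules differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: hide two folders from all crawlers and point to a sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pages outside the disallowed folders may be crawled.
print(parser.can_fetch("*", "https://www.example.com/about/"))               # True
# Pages inside a disallowed folder may not.
print(parser.can_fetch("*", "https://www.example.com/private/report.html"))  # False
# Sitemap lines are collected separately (requires Python 3.8 or later).
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```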

How robots.txt can interfere with DubBot

Depending on how the robots.txt file is set up on a website, it can prevent DubBot from crawling pages that need to be checked for accessibility.
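As a rough illustration of how this can happen, the sketch below checks a list of pages against a robots.txt file that blocks the entire site. The blocked_urls helper, the page URLs, and the "ExampleCrawler" name are all hypothetical. This is not DubBot's actual crawler logic, only a way to see why a crawler that obeys such a file finds nothing to check.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that robots_txt forbids user_agent from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# A blanket rule: every crawler that obeys robots.txt is told to skip the whole site.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

pages = [
    "https://www.example.com/",
    "https://www.example.com/admissions/",
    "https://www.example.com/contact/",
]

# "ExampleCrawler" is a placeholder name, not DubBot's real user agent.
blocked = blocked_urls(BLOCK_EVERYTHING, "ExampleCrawler", pages)
print(f"{len(blocked)} of {len(pages)} pages blocked")  # 3 of 3 pages blocked
```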

When to check your Site settings

If a site crawl reports that 0 pages were found, but you know the site has pages.
If a crawl of a site doesn’t return all of the pages you expected.
If you know that a site has pages blocked by a robots.txt file and you would like them crawled and checked for accessibility.
If you receive a message from DubBot about your robots.txt file:

DubBot error message: This site is blocked by robots.txt

What to do

To have the crawler ignore all of the robots.txt file’s directives for a site, uncheck the Obey robots.txt box in the Advanced tab of the site’s Settings panel.

Site Settings Panel highlighting the Advanced tab and the Obey robots.txt check box.

If you want to give DubBot permission to crawl some but not all of a site’s data, review the DubBot Crawler information article and edit your robots.txt file to grant DubBot’s crawler the specific permissions it needs.
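One common way to grant a single crawler partial access is to give it its own User-agent group in robots.txt. The sketch below shows the general pattern and checks it with Python's standard urllib.robotparser module. "ExampleCrawler" is a placeholder, not DubBot's real user agent name, so substitute the name listed in the DubBot Crawler information article. The folder names and URLs are also made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the named crawler may visit everything except
# /private/, while all other crawlers are kept out of the whole site.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named crawler can reach public pages but not the private folder.
print(parser.can_fetch("ExampleCrawler", "https://www.example.com/about/"))              # True
print(parser.can_fetch("ExampleCrawler", "https://www.example.com/private/notes.html"))  # False
# Any other crawler falls back to the catch-all group and is blocked.
print(parser.can_fetch("OtherBot", "https://www.example.com/about/"))                    # False
```

Checking a draft of your robots.txt this way before publishing it can help confirm that the pages you want checked for accessibility are actually reachable.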

More Information

If you have questions, please contact our DubBot Support team via email at help@dubbot.com or via the blue chat bubble in the lower right corner of your screen. We are here to help!