Basic site configuration is covered in the Site Setup article.
To access the Advanced settings for a site, select the site under the Settings panel, then click Advanced at the top of the site panel.
By default, DubBot obeys the robots.txt rules configured for a site. Administrators can have DubBot ignore those rules by deselecting the Obey robots.txt checkbox.
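As a general illustration of what obeying robots.txt means for a crawler, the sketch below uses Python's standard-library robots.txt parser. It does not describe how DubBot evaluates rules internally; the robots.txt content, the user agent name ExampleCrawler, and the example.edu URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleCrawler" is a made-up user agent name, not DubBot's.
    for url in (
        "https://www.example.edu/events/",
        "https://www.example.edu/private/page.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleCrawler", url))
```

A crawler that obeys these rules would skip anything under /private/, while a crawler set to disobey them would fetch those pages as well.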
Days between crawls
The default cadence for crawling and testing sites is every 7 days. To slow this cadence, update the Days between crawls field. The value must be a whole number of 7 or more days.
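The rule for this field (a whole number, 7 or more) can be sketched as a small check. The function name and the idea that the next crawl is counted from the date of the last crawl are assumptions made for illustration, not a description of DubBot's scheduler.

```python
from datetime import date, timedelta

MIN_DAYS_BETWEEN_CRAWLS = 7  # default and minimum cadence stated above


def next_crawl_date(last_crawl: date, days_between: int = MIN_DAYS_BETWEEN_CRAWLS) -> date:
    """Validate the cadence and return the assumed date of the next crawl."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(days_between, bool) or not isinstance(days_between, int):
        raise ValueError("Days between crawls must be a whole number")
    if days_between < MIN_DAYS_BETWEEN_CRAWLS:
        raise ValueError("Days between crawls must be 7 or more")
    # Assumption: the cadence is counted from the date of the last crawl.
    return last_crawl + timedelta(days=days_between)


if __name__ == "__main__":
    print(next_crawl_date(date(2024, 1, 1)))      # default 7-day cadence
    print(next_crawl_date(date(2024, 1, 1), 14))  # slower, 14-day cadence
```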
Page load timeout (in seconds)
For websites with slow-loading webpages, increasing the Page load timeout may be necessary to ensure the webpages are properly inventoried and tested rather than reported as timed out.
The Page load timeout (in seconds) field sets how long DubBot waits for a page to load before marking it as timed out. The default timeout is 60 seconds. It can be increased to a maximum of 120 seconds and must be entered as a whole number of seconds.
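The limits above can be expressed as a short validation sketch. The function names are hypothetical, and the lower bound of 1 second is an assumption, since only the default (60) and the maximum (120) are stated.

```python
DEFAULT_TIMEOUT_SECONDS = 60   # default stated above
MAX_TIMEOUT_SECONDS = 120      # maximum stated above
MIN_TIMEOUT_SECONDS = 1        # assumption: the minimum is not documented here


def validate_page_load_timeout(value: int = DEFAULT_TIMEOUT_SECONDS) -> int:
    """Return the timeout if it is a whole number within the allowed range."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("Page load timeout must be a whole number of seconds")
    if not MIN_TIMEOUT_SECONDS <= value <= MAX_TIMEOUT_SECONDS:
        raise ValueError("Page load timeout must be between 1 and 120 seconds")
    return value


def is_timed_out(load_seconds: float, timeout_seconds: int) -> bool:
    """Return True if a page that took load_seconds would exceed the timeout."""
    return load_seconds > validate_page_load_timeout(timeout_seconds)


if __name__ == "__main__":
    # A page that needs 75 seconds times out at the default but not at 90.
    print(is_timed_out(75, DEFAULT_TIMEOUT_SECONDS))
    print(is_timed_out(75, 90))
```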
Selector to click on page load
See the Remove optional, banner/popup content from displaying on webpages within DubBot article to learn how to close content sections such as cookie notifications, survey popups, and alerts.
Delay after each page crawl (in seconds)
Organizations can set a number of seconds for DubBot to wait between page crawls. This reduces the traffic DubBot sends to web servers that require a slower rate of requests.
Note: Setting any delay above 0 seconds also prevents DubBot from performing parallel crawls. With a delay of 1 second or more, the application crawls pages one by one. The delay is entered in whole seconds, up to a maximum of 10 seconds.
Changing this setting will result in a slower crawl and analysis by DubBot. When updating this setting, the recommendation is to start with a delay of 1 second.
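To get a rough sense of how much a delay slows a crawl, the sketch below estimates the time spent purely waiting between pages. It assumes the delay is applied once after every page and ignores page load and analysis time; the function names and the 1,000-page site are hypothetical.

```python
MAX_DELAY_SECONDS = 10  # maximum delay stated above


def validate_delay(delay_seconds: int) -> int:
    """Return the delay if it is a whole number from 0 to 10 seconds."""
    if isinstance(delay_seconds, bool) or not isinstance(delay_seconds, int):
        raise ValueError("Delay must be a whole number of seconds")
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("Delay must be between 0 and 10 seconds")
    return delay_seconds


def allows_parallel_crawl(delay_seconds: int) -> bool:
    """Any delay above 0 means pages are crawled one by one."""
    return validate_delay(delay_seconds) == 0


def added_wait_seconds(page_count: int, delay_seconds: int) -> int:
    """Estimate the seconds spent waiting, assuming one delay after each page."""
    return page_count * validate_delay(delay_seconds)


if __name__ == "__main__":
    pages = 1000  # hypothetical site size
    for delay in (0, 1, 10):
        print(delay, allows_parallel_crawl(delay), added_wait_seconds(pages, delay))
```

Under these assumptions, a 1-second delay adds about 1,000 seconds (roughly 17 minutes) of waiting to a 1,000-page crawl, in addition to the time lost by crawling pages sequentially.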
Ignored Paths is used to determine which folder paths will not be crawled while DubBot inventories a site. Administrators can ignore full folders or specific pages using this field.
This is commonly used for sites that contain legacy content that does not need to be tested and also for breaking a site into multiple site dashboards in DubBot.
To add a path, enter the full (root-relative) path to the page or folder that you want excluded from the crawl inventory.
Using the following link as an example:
To exclude the 2020 folder that resides in the events folder, you should add the full path that follows the .edu domain extension as part of the exclusion. That path would be: /events/2020. This exclusion will only be for content within the
Select the Add Path (+) button after entering the path.
Alternatively, a CSV of paths to ignore can be uploaded using the Import CSV button. For more information about the expected file layout, see the CSV upload format article.
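To illustrate how a root-relative path relates to a full URL, the sketch below extracts the path from a URL and checks it against a list of ignored paths. The example.edu URLs are hypothetical, and the matching rule shown (the exact path or anything beneath it) is an assumption, not a description of how DubBot matches ignored paths.

```python
from urllib.parse import urlsplit


def root_relative_path(url: str) -> str:
    """Return the part of the URL that follows the domain, e.g. /events/2020/."""
    return urlsplit(url).path or "/"


def is_ignored(url: str, ignored_paths: list[str]) -> bool:
    """Return True if the URL's path equals an ignored path or sits beneath it."""
    path = root_relative_path(url)
    for ignored in ignored_paths:
        prefix = ignored.rstrip("/")
        # Assumption: match on whole path segments, so /events/2020
        # does not also exclude /events/20201.
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


if __name__ == "__main__":
    ignored = ["/events/2020"]
    for url in (
        "https://www.example.edu/events/2020/fall-concert.html",
        "https://www.example.edu/events/2021/index.html",
    ):
        print(root_relative_path(url), is_ignored(url, ignored))
```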