Advanced Settings for Sites
Advanced Settings can be used to configure the site crawler to accommodate website server and performance needs.
Written by Blaine Herman
Updated over a week ago

Basic site setup configuration information can be found in the Site Setup article.

To configure the Advanced Settings for a Site, select the Advanced tab in the Site Settings section of the Site Settings panel.

Learn how to locate a Site's Settings panel for Modifying an existing Site's Settings.

The Advanced button is highlighted in the Site Settings section of the Site Settings panel in the DubBot app.

Obey Robots.txt

By default, the DubBot crawler obeys the robots.txt rules configured for a site.

Administrators can have the crawler ignore robots.txt rules by deselecting the Obey robots.txt checkbox.

Site Settings panel with the Obey robots.txt checkbox checked and highlighted in red.
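
As a rough illustration of what obeying robots.txt means for a crawler in general, the sketch below uses Python's standard urllib.robotparser module. It is not DubBot's implementation: the robots.txt content, the ExampleCrawler user agent, the example.edu URLs, and the obey_robots flag are all assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, user_agent="ExampleCrawler", obey_robots=True):
    """Return the URLs a crawler would visit, optionally honoring robots.txt."""
    if not obey_robots:
        # Comparable to deselecting the Obey robots.txt checkbox.
        return list(urls)
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


urls = [
    "https://www.example.edu/about/",
    "https://www.example.edu/private/report.html",
]
print(allowed_urls(urls, obey_robots=True))
print(allowed_urls(urls, obey_robots=False))
```

With the rules obeyed, only the /about/ page is kept. With the rules ignored, both pages are crawled.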

Page load timeout (in seconds)

For websites with slower-loading webpages, increasing the Page load timeout may be necessary so that pages are properly inventoried and tested rather than reported as timed out.

The Page load timeout (in seconds) field lets administrators set how long DubBot waits for a page to load before timing it out. The default timeout is 60 seconds. It can be increased to a maximum of 120 seconds and must be entered as a whole number of seconds.

Site Settings page with the Page load timeout (in seconds) field highlighted in red.
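
The limits for this field can be summarized in a small check. This is only a sketch of the rules stated above (default 60, maximum 120, whole seconds). The function name is hypothetical, and the minimum of 1 second is an assumption, since this article does not state a lower bound.

```python
DEFAULT_TIMEOUT_SECONDS = 60   # default stated in this article
MAX_TIMEOUT_SECONDS = 120      # maximum stated in this article
MIN_TIMEOUT_SECONDS = 1        # assumption: the article gives no minimum


def check_page_load_timeout(value=DEFAULT_TIMEOUT_SECONDS):
    """Return value if it is a whole number of seconds within the allowed range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("timeout must be a whole number of seconds")
    if not MIN_TIMEOUT_SECONDS <= value <= MAX_TIMEOUT_SECONDS:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT_SECONDS} "
            f"and {MAX_TIMEOUT_SECONDS} seconds"
        )
    return value


print(check_page_load_timeout())     # uses the default
print(check_page_load_timeout(90))   # a slower site might need more time
```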

Custom Crawler User Agent

Setting a Custom Crawler User Agent is a task for advanced users only. Refer to the Example Custom User Agents article for details.

Days between crawls

The default cadence for crawling and testing sites is every seven (7) days. To slow this cadence, update the Days between crawls field. The value must be a whole number of seven (7) or more days.

Site Settings section with Advanced button selected and the Days between crawls field highlighted in red.
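
To see how the Days between crawls value affects scheduling, the sketch below adds the cadence to the date of the last crawl. The function name is hypothetical, and counting from the last crawl's date is an assumption. DubBot's actual scheduler may count differently.

```python
from datetime import date, timedelta

MIN_DAYS_BETWEEN_CRAWLS = 7  # default and minimum cadence stated in this article


def next_crawl_date(last_crawl, days_between_crawls=MIN_DAYS_BETWEEN_CRAWLS):
    """Return the earliest date of the next crawl for a given cadence."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(days_between_crawls, bool) or not isinstance(days_between_crawls, int):
        raise ValueError("days between crawls must be a whole number")
    if days_between_crawls < MIN_DAYS_BETWEEN_CRAWLS:
        raise ValueError(f"days between crawls must be {MIN_DAYS_BETWEEN_CRAWLS} or more")
    return last_crawl + timedelta(days=days_between_crawls)


print(next_crawl_date(date(2024, 1, 1)))      # 2024-01-08
print(next_crawl_date(date(2024, 1, 1), 14))  # 2024-01-15
```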

Selector to click on page load

Refer to the Remove optional banner/popup content from displaying on webpages within DubBot article to learn how to close content sections such as cookie notifications, survey popups, and alerts before any analysis occurs on the page.

Entries in this field produce a click action on page load, which can be useful for expanding accordions or accepting cookie notices and terms of use.

Page Preview/Analysis Options heading with the Selector to click on page load and Selector to remove matching elements from cached page fields highlighted in red.

Selector to remove matching elements from a cached page

Use this field when an element needs to be removed from the page to improve the page preview and a JavaScript click action will not remove it.

It is useful for removing elements that make the cached copy of a page difficult to view inside the DubBot app. Common examples are loader icons that do not close on their own and chat boxes that take up space or cover important page elements. Modals and overlays that appear over pages are other examples.

Delay processing (in seconds)

Organizations can set a number of seconds for DubBot to wait between crawling webpages. This slows DubBot's traffic for web servers that need a lower request rate.

Note: Setting any amount of delay (above 0 seconds) will also prevent DubBot from performing parallel crawls. Entering the amount of 1 second (or higher) will result in the application crawling pages one-by-one. The delay is entered in whole seconds. The maximum amount of time for a delay is 10 seconds.

Changing this setting will result in a slower crawl and analysis by DubBot. If updating this setting, the recommendation is to start with 1 second.

Site Settings panel with the Delay processing (in seconds) field highlighted in red.
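
To get a feel for how much a delay can slow a crawl, the sketch below estimates the waiting time that the setting adds. It assumes one wait between each pair of consecutive pages and ignores page load and analysis time, so treat the result as a rough lower bound, not a prediction of actual DubBot crawl times. The function name and page counts are hypothetical.

```python
MAX_DELAY_SECONDS = 10  # maximum delay stated in this article


def added_wait_seconds(page_count, delay_seconds):
    """Estimate the total time spent waiting between pages during a crawl."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(delay_seconds, bool) or not isinstance(delay_seconds, int):
        raise ValueError("delay must be a whole number of seconds")
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError(f"delay must be between 0 and {MAX_DELAY_SECONDS} seconds")
    if page_count < 0:
        raise ValueError("page count cannot be negative")
    # Assumption: pages are crawled one by one, with one wait between each pair.
    return max(page_count - 1, 0) * delay_seconds


for delay in (1, 2, 10):
    seconds = added_wait_seconds(5000, delay)
    print(f"{delay} s delay on 5000 pages adds about {seconds / 3600:.1f} hours")
```

Because any delay above 0 also turns off parallel crawling, the real slowdown is likely larger than this estimate.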

Disable using proxy rewriting for URL

By default, DubBot uses proxy rewriting to display some content from a client’s site in the app. Some client setups do not allow this to work. If something isn’t loading correctly in a page, a simple test is to check this box and recrawl the site.

Scroll to the bottom of each page

Check this box if your site implements lazy loading. This ensures that all of your pages’ content is loaded before the crawler gets to work.

More on the Site Settings panel
