To configure the Advanced Settings for a Site, select the Advanced tab in the Site Settings panel.
Enable PDF Testing
Select the checkbox to enable PDF checking on the site.
Obey robots.txt
By default, the crawler obeys the robots.txt rules configured for a site.
To disobey robots.txt rules, deselect the Obey robots.txt checkbox.
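For example, if a site's robots.txt contains rules like the following (a hypothetical example), the crawler will skip the disallowed paths while Obey robots.txt is selected:

User-agent: *
Disallow: /search/
Disallow: /internal/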
Page Preview/Analysis Options
By default, the Viewport is set to Use Default Viewport, 1440 x 900.
For more options, select the Viewport dropdown, which includes the following options:
Use Default Viewport: 1440 x 900
Desktop: 1920 x 1080
MacBook Air: 1280 x 800
iPad Pro: 834 x 1194
iPad Air: 820 x 1180
iPad Mini: 768 x 1024
Galaxy Tab S7: 800 x 1280
iPhone 15 Plus/Pro Max: 430 x 932
iPhone 15 (Pro): 393 x 852
iPhone 13 Mini: 375 x 812
iPhone 12/13/14 (Pro): 390 x 844
iPhone 11/XR: 414 x 896
iPhone 5/SE: 320 x 568
Samsung Galaxy S20: 393 x 568
Google Pixel 5: 360 x 800
Selecting Custom Viewport in the Viewport dropdown allows you to enter the exact pixel specification you need.
The following fields will appear when this option is selected:
Viewport Width - Pixels (up to 2560) the crawler will use for the viewport width. Enter only a whole number; no unit is required.
Viewport Height - Pixels (up to 1440) the crawler will use for the viewport height. Enter only a whole number; no unit is required.
Crawler Viewport Pixel Ratio/Scale Factor - The ratio of physical device pixels to CSS (logical) pixels; for example, a scale factor of 2 means two physical pixels for every CSS pixel.
Crawl Using a Mobile Device and/or Touch Screen Device checkboxes - These let you choose whether the crawler emulates a mobile device and/or a touch screen device during the re-crawl needed after you adjust these settings.
The default values for the Viewport Width/Height and the Crawl Using a Mobile/Touch Screen checkboxes are taken from the last pre-set selection in the Viewport dropdown.
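For example, a hypothetical custom configuration approximating a recent phone (matching the iPhone 12/13/14 preset above) could look like this:

Viewport Width: 390
Viewport Height: 844
Crawler Viewport Pixel Ratio/Scale Factor: 3
Crawl Using a Mobile Device: checked
Crawl Using a Touch Screen Device: checked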
Selector to click on page load
Refer to the Remove optional banner/popup content from displaying on webpages within DubBot article to learn how to close out content sections like cookie notifications, survey popups, alerts, etc., before any analysis occurs on the page.
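For example, if a cookie notification's accept button used a hypothetical id of cookie-accept, you could enter a CSS selector such as:

#cookie-accept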
Selector to remove matching elements from a cached page
Useful for removing elements that make viewing the cached copy of a page difficult inside the DubBot app. This is common for loader icons that do not close on their own and chat boxes that take up space or cover important page elements. Other examples include modals and overlays that appear over pages.
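For example, to remove a hypothetical chat widget and a loading spinner from the cached copy, you could enter selectors such as:

#chat-widget, .loading-spinner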
Delay processing (in seconds)
Admins can set the number of seconds DubBot waits between webpages being crawled. This slows DubBot's traffic for web servers that require a lower rate of activity.
Note: Setting any delay above 0 seconds also prevents DubBot from performing parallel crawls; a delay of 1 second or more results in the application crawling pages one by one. The delay is entered in whole seconds, with a maximum of 10 seconds.
Changing this setting will result in a slower crawl and analysis by DubBot. If updating this setting, the recommendation is to start with 1 second.
Scroll to the bottom of each page
Check the Scroll to the bottom of each page box if your site implements lazy loading. This ensures that all of your pages’ content is loaded before the crawler gets to work.
Advanced Crawler Configuration
Disable using proxy rewriting for URL
By default, DubBot uses proxy rewriting to display some content from a client’s site in the app. Sometimes, clients have a setup that doesn't allow this to work. If something isn’t loading correctly on a page, a good, simple test is to check this box and recrawl the site.
Allow crawler to check HTTPS and HTTP links
Check the Allow crawler to check HTTPS and HTTP links box if you have a mix of these protocols on pages in your site.
Page load timeout (in seconds)
The Page load timeout (in seconds) field allows administrators to set how long the crawler waits for a page to load before timing it out. The default timeout is 60 seconds. It can be increased to up to 120 seconds, using whole numbers for seconds.
For websites with slower-loading webpages, increasing the Page load timeout can be necessary to ensure the webpages are properly inventoried and tested rather than reported as pages that timed out.
Delay after each page crawl (in seconds)
In this field, enter the number of seconds (up to 10) the crawler will wait before crawling the next page or asset. Use the default value of 0 for parallel crawling. This sometimes needs to be adjusted due to settings on a site's server.
Custom Crawler User Agent
Setting a Custom Crawler User Agent is a task for only the advanced user. Refer to the article that outlines Example Custom User Agents.
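For example, a custom user agent string typically follows the product/version (comment) convention; a purely hypothetical value might look like:

ExampleBot/1.0 (+https://www.example.com/bot-info)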
Enable crawler authentication checkbox
If the Site you are crawling is behind a login, you have the option of having our crawler access that content. Learn more about Crawling behind a login.
Days between crawls
The default cadence for crawling and testing sites is every seven (7) days. To slow this cadence, update the Days between crawls field. DubBot requires whole numbers, and the cadence can be seven (7) or more days.