Advanced Settings for Sites

Looking for basic information? Review Site Setup or Modifying an existing Site's Settings.

Page Preview/Analysis Options

Viewport

By default, the Viewport is set to Use Default Viewport (1440 x 900).

For more options, select the Viewport dropdown, which lists the following:

Use Default Viewport : 1440 x 900
Desktop : 1920 x 1080
MacBook Air : 1280 x 800
Tablets
- iPad Pro : 834 x 1194
- iPad Air : 820 x 1180
- iPad Mini : 768 x 1024
- Galaxy Tab S7 : 800 x 1280
Apple Smartphones
- iPhone 15 Plus/PRO MAX : 430 x 932
- iPhone 15 (PRO) : 393 x 852
- iPhone 13 Mini : 375 x 812
- iPhone 12/13/14 (PRO) : 390 x 844
- iPhone 11/XR : 414 x 896
- iPhone 5/SE : 320 x 568
Android Smartphones
- Samsung Galaxy S20 : 393 x 568
- Google Pixel 5 : 360 x 800
Custom Viewport

Custom Viewport

Selecting Custom Viewport in the Viewport dropdown will allow you to enter the exact pixel specification you need.

The following fields will appear when this option is selected:

Viewport Width - Pixels (up to 2560) the crawler will use for the viewport width. Enter a whole number only; no unit is required.
Viewport Height - Pixels (up to 1440) the crawler will use for the viewport height. Enter a whole number only; no unit is required.
Crawler Viewport Pixel Ratio/Scale Factor - The number of physical device pixels used to render each virtual pixel of the viewport.
Crawl Using a Mobile Device and/or Touch Screen Device checkboxes - These let you choose which device type the crawler emulates during the re-crawl that is needed after you adjust these settings.

The default values for the Viewport Width/Height and the Crawl Using a Mobile/Touch Screen checkboxes are taken from the last pre-set selection in the Viewport dropdown.

After adjusting the Viewport Width and/or Height, the site must be re-crawled for the data and viewports to be updated.
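
The field limits above can be expressed as a small check. The following is a minimal Python sketch, not DubBot's implementation: the CustomViewport class, its field names, and the lower bound of 1 pixel are hypothetical, and only the 2560 and 1440 pixel limits come from this article. It also assumes the usual meaning of a pixel ratio, where physical pixels equal viewport pixels multiplied by the ratio.

```python
from dataclasses import dataclass

MAX_WIDTH = 2560   # limit stated for Viewport Width
MAX_HEIGHT = 1440  # limit stated for Viewport Height


@dataclass(frozen=True)
class CustomViewport:
    """Hypothetical container for the Custom Viewport fields."""
    width: int
    height: int
    pixel_ratio: float = 1.0

    def validate(self) -> None:
        # Width and height must be whole numbers of pixels, with no unit.
        for name, value, limit in (("width", self.width, MAX_WIDTH),
                                   ("height", self.height, MAX_HEIGHT)):
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{name} must be a whole number")
            # Assumption: the smallest accepted value is 1 pixel.
            if not 1 <= value <= limit:
                raise ValueError(f"{name} must be between 1 and {limit}")
        if self.pixel_ratio <= 0:
            raise ValueError("pixel_ratio must be positive")

    def physical_pixels(self) -> tuple[int, int]:
        # Assumption: physical pixels = viewport pixels x pixel ratio.
        return (round(self.width * self.pixel_ratio),
                round(self.height * self.pixel_ratio))
```

For example, CustomViewport(390, 844, 3) passes validate(), and its physical_pixels() result is 1170 x 2532. A width of 2561 is rejected because it exceeds the stated limit.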

Page Preview/Analysis Options after choosing Custom Viewport in the Viewport dropdown. Additional fields include Viewport Width, Viewport Height, and Pixel Ratio/Scale Factor, plus checkboxes for Crawl Using a Mobile Device and/or Touch Screen Device.

Selector to click on page load

To learn how to close content sections such as cookie notifications, survey popups, and alerts before any analysis occurs on the page, refer to the article Remove optional banner/popup content from displaying on webpages within DubBot.

Entries in this field produce a click action, which can be useful for expanding accordions or accepting cookies' terms of use.

Selector to remove matching elements from a cached page

Use this field when an element needs to be removed from the page to improve the page preview and a JavaScript click action will not remove it.

This helps with elements that make the cached copy of a page hard to view inside the DubBot app. Common examples are loader icons that do not close on their own, chat boxes that take up space or cover important page elements, and modals or overlays that appear over pages.

Delay processing (in seconds)

Admins can set the number of seconds DubBot waits between crawling webpages. This slows DubBot traffic for web servers that require a lower rate of requests.

Note: Setting any delay above 0 seconds also prevents DubBot from performing parallel crawls. With a delay of 1 second or more, the application crawls pages one by one. Enter the delay in whole seconds, up to a maximum of 10 seconds.

Changing this setting slows DubBot's crawl and analysis. The recommendation is to start with 1 second when updating this setting.
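
To get a feel for how much a delay slows a crawl, the arithmetic can be sketched in a few lines of Python. This is an illustration only: minimum_added_wait is a hypothetical helper, and it assumes one pause between each pair of consecutive pages. It counts only the pauses themselves and ignores page load and analysis time.

```python
MAX_DELAY_SECONDS = 10  # maximum delay stated in this article


def minimum_added_wait(page_count: int, delay_seconds: int) -> int:
    """Return the fewest seconds the delay alone adds to a crawl.

    Hypothetical helper, not part of DubBot.
    """
    if isinstance(delay_seconds, bool) or not isinstance(delay_seconds, int):
        raise ValueError("delay must be a whole number of seconds")
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError(f"delay must be between 0 and {MAX_DELAY_SECONDS}")
    if page_count < 0:
        raise ValueError("page_count cannot be negative")
    # Assumption: with a delay above 0, pages are crawled one by one,
    # so there is one pause between each pair of consecutive pages.
    pauses = max(page_count - 1, 0)
    return pauses * delay_seconds
```

Under these assumptions, a 5,000-page site with a 1-second delay waits at least 4,999 seconds (about 83 minutes) in pauses alone, and the same site with a 10-second delay waits at least 49,990 seconds (nearly 14 hours).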

Scroll to the bottom of each page

If your site implements lazy loading, check the Scroll to the bottom of each page box. This ensures that all of a page’s content is loaded before the page is processed.

Advanced Crawler Configuration

Disable using proxy rewriting for URL

By default, DubBot uses proxy rewriting to display some content from a client’s site in the app. Sometimes, clients have a setup that doesn't allow this to work. A good, simple test if something isn’t displaying correctly in DubBot is to check this box and re-crawl the site.

Generally, disabling the proxy is fine, but there are two main reasons the proxy exists:

To show content (images, CSS) from HTTP sites (not as much of an issue currently, as HTTPS adoption is so high).
To access content when client servers have security settings that block our crawler.

Allow Crawler to check HTTPS and HTTP links

Check the Allow crawler to check HTTPS and HTTP links box if you have a mix of these protocols on pages in your site.

Allow Crawler to check links with or without www

Check the Allow crawler to check links with or without www box if you have a mix of these URLs in links on your site.

Often, sites with many editors have a mix of link URLs for the same page; for example, a site could have links to both https://www.mysite.edu/contact_us and https://mysite.edu/contact_us. If the web server is configured to redirect the page from one to another, checking this box allows these domains to be considered as the same thing, so pages aren't duplicated in a site's inventory.
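
The effect of this option, together with the HTTPS and HTTP option above, can be pictured as building one comparison key per URL. The Python sketch below is an illustration under stated assumptions, not DubBot's implementation: inventory_key and its two flags are hypothetical names, and the choice to normalize toward https and the non-www host is arbitrary.

```python
from urllib.parse import urlsplit


def inventory_key(url: str, merge_schemes: bool = True,
                  merge_www: bool = True) -> str:
    """Build a key so that equivalent URLs collapse to one entry.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    scheme = parts.scheme          # urlsplit lowercases the scheme
    host = parts.hostname or ""    # hostname is returned in lowercase
    if merge_schemes and scheme in ("http", "https"):
        scheme = "https"           # treat http and https as one protocol
    if merge_www and host.startswith("www."):
        host = host[len("www."):]  # treat www and non-www as one host
    if parts.port:
        host = f"{host}:{parts.port}"
    key = f"{scheme}://{host}{parts.path or '/'}"
    if parts.query:
        key += "?" + parts.query
    return key
```

With both flags on, https://www.mysite.edu/contact_us and https://mysite.edu/contact_us produce the same key, so a set of keys would hold a single entry for the page. With merge_www turned off, the two URLs produce different keys and would be counted separately.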

Additional Domains Allowed in Crawl

The Additional Domains Allowed in Crawl field lets you enter supplementary domains that your site crawl may need. Some possible uses:

The site uses both a www and www1 subdomain for its pages.
The site utilizes a media server with a different URL to hold PDF files.

One full domain, including http(s)://, should be entered on each line.
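
The one-domain-per-line format can be checked before you paste it into the field. The following Python sketch is a hedged illustration: parse_additional_domains is a hypothetical helper, the hostname media.mysite.edu in the usage note is made up, and DubBot may validate entries differently.

```python
from urllib.parse import urlsplit


def parse_additional_domains(field_text: str) -> list[str]:
    """Return the domains in the text, one per non-blank line.

    Hypothetical helper: raises ValueError for any entry that is not
    a full domain including http:// or https://.
    """
    domains = []
    for line_number, raw in enumerate(field_text.splitlines(), start=1):
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        parts = urlsplit(entry)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError(
                f"line {line_number}: expected a full domain "
                f"including http(s)://, got {entry!r}"
            )
        # Keep only the scheme and host, dropping any path.
        domains.append(f"{parts.scheme}://{parts.netloc}")
    return domains
```

Passing the two lines https://www1.mysite.edu and https://media.mysite.edu returns both entries in order, while a line containing only www1.mysite.edu raises an error because the protocol is missing.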

Page load timeout (in seconds)

The Page load timeout (in seconds) field allows administrators to set how long the crawler waits for a page to load before the page times out. The default timeout is 60 seconds. This can be increased up to 120 seconds, entered as a whole number of seconds.

For websites with slower loading webpages, increasing the Page load timeout can be necessary to ensure the webpages are properly inventoried, tested, and not reported as a page that timed out.

Delay after each page crawl (in seconds)

In this field, enter the number of seconds (up to 10) the crawler will wait before crawling the next page or asset. Use the default value of 0 for parallel crawling. This sometimes needs to be adjusted due to settings on a site's server.

Custom Crawler User Agent

Setting a Custom Crawler User Agent is only for the advanced user. Refer to the Example Custom User Agents article for details.

Advanced Crawler Configuration options in the Advanced Settings tab of the Site Settings panel in the DubBot app.

Crawler Authentication

DubBot must enable this option on the account. If you do not see this option in your Advanced tab, please contact DubBot's Support team via the app's chat or by email at help@dubbot.com.

Enable crawler authentication check box.

If the Site you are crawling is behind a login, you can have our crawler access that content. Learn more about Crawling behind a login.

Crawler Authentication and Scheduling Options in the Advanced Settings tab of the Site Settings panel in the DubBot app.

Crawler Scheduling

Days between crawls

The default cadence for crawling and testing sites is every seven (7) days. To slow this cadence, update the Days between crawls field. DubBot requires whole numbers, and the cadence can be seven (7) or more days.
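
The cadence rule can be illustrated with a short date calculation. This Python sketch is only an approximation of the idea: next_crawl_date is a hypothetical helper, and it assumes the next crawl falls exactly the configured number of days after the previous one, which DubBot's scheduler may not guarantee.

```python
from datetime import date, timedelta

MIN_DAYS_BETWEEN_CRAWLS = 7  # default and minimum stated in this article


def next_crawl_date(last_crawl: date,
                    days_between: int = MIN_DAYS_BETWEEN_CRAWLS) -> date:
    """Return the earliest date of the next crawl (hypothetical helper)."""
    if isinstance(days_between, bool) or not isinstance(days_between, int):
        raise ValueError("days between crawls must be a whole number")
    if days_between < MIN_DAYS_BETWEEN_CRAWLS:
        raise ValueError(
            f"days between crawls must be {MIN_DAYS_BETWEEN_CRAWLS} or more"
        )
    # Assumption: the next crawl is scheduled a fixed number of days
    # after the last one.
    return last_crawl + timedelta(days=days_between)
```

For a site last crawled on March 1, 2024, the default cadence gives March 8, 2024, and a value of 14 gives March 15, 2024. A value of 3 is rejected because the cadence cannot be shorter than seven days.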

Enable PDF Testing

Obey Robots.txt