Ignored Paths for Specific Sites

Used to determine which folder paths will not be crawled while DubBot inventories a site.


Administrators can use Ignored Paths to exclude full folders or specific pages from a crawl. This is commonly used for sites that contain legacy content that does not need to be tested, and for breaking a site into multiple Site dashboards in DubBot.

To learn how to locate a Site's Settings panel, see Modifying an Existing Site's Settings.

Ignored Paths

To configure Ignored Paths for a Site, select the Ignored Paths tab in the Site Settings section of the panel.

[Screenshot: the Ignored Paths tab highlighted in the Site Settings section of the DubBot app]

To add a path, enter the full (root-relative) path to the page or folder that you want excluded from the crawl inventory.

Consider the following URL as an example: https://www.benson.edu/events/2020.

To exclude the 2020 folder that resides in the events folder, add the full path that follows the .edu domain extension: /events/2020. This exclusion applies only to content within the 2020 folder.
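A few additional hypothetical entries, shown for illustration only, might look like:

/athletics/archive
/news/2019
/about/old-staff-directory.html

The first two would exclude entire folders, while the third would exclude a single page.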

Select the Add Path (+) button after entering the path.

[Screenshot: the Ignored Paths field highlighted on the Ignored Paths tab]

Alternatively, a CSV of paths to ignore can be uploaded using the Import CSV button. Refer to the article on CSV upload format for more information about formatting files for upload into DubBot.
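As a rough sketch, assuming a simple one-path-per-line layout, an uploaded file might contain:

/events/2020
/news/archive
/athletics/old-schedules

The paths above are placeholders; refer to the CSV upload format article mentioned above for the exact structure DubBot expects.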

[Screenshot: the Import CSV button highlighted on the Ignored Paths tab]

Ignored Paths by Regular Expressions

More complex exclusions can be created with regular expressions. Webpages or folders that match an added expression will be excluded from the site crawl. Regular expressions can be added one at a time or uploaded with the Import CSV button.
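For example, hypothetical patterns like the following could be added (adjust them to your own site's URL structure):

/events/20\d{2}
\.pdf$

The first would exclude any events folder named for a year in the 2000s (such as /events/2020 or /events/2021), and the second would exclude pages whose URLs end in .pdf. These expressions are illustrative only; test your own patterns against sample URLs before re-running the crawl.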

If you need help with this, reach out to DubBot Support using the in-app chat or email help@dubbot.com.

You can enable or disable the Ignore pages with URL queries checkbox. This setting is helpful if your crawl is picking up a lot of extra query pages, for example from a calendar. Query pages are those that contain a ? in the URL. In that case, simply check this box and re-run your crawl.
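For example, with the box checked, a URL like https://www.benson.edu/events/calendar?date=2024-05-01 (a made-up example) would be ignored during the crawl, while https://www.benson.edu/events/calendar would still be inventoried.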

[Screenshot: the Ignored Paths by Regular Expression box with the "Ignore pages with URL queries" checkbox checked]

More on the Site Settings panel
