Administrators can use Ignored Paths to exclude entire folders or specific pages from a Site's crawl. This is commonly used for sites that contain legacy content that does not need to be tested, and for breaking a site into multiple site dashboards in DubBot.
To learn how to locate a Site's Settings panel, see Modifying an existing Site's Settings.
To configure the Ignored Paths for a Site, select the Ignored Paths tab in the Site Settings section of the Site Settings panel.
To add a path, enter the full (root-relative) path to the page or folder that you want excluded from the crawl inventory.
Using the following link as an example:
To exclude the 2020 folder that resides in the events folder, add the full path that follows the .edu domain extension. That path would be: /events/2020. This exclusion applies only to content within that path.
Select the Add Path (+) button after entering the path.
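If you are unsure what the root-relative path of a page or folder is, the following Python sketch shows one way to derive it from a full URL. The URL and the helper names (root_relative_path, parent_folder) are hypothetical and used only for illustration; they are not part of DubBot.

```python
import posixpath
from urllib.parse import urlsplit


def root_relative_path(url: str) -> str:
    """Return the part of the URL after the domain, without query or fragment."""
    return urlsplit(url).path or "/"


def parent_folder(path: str) -> str:
    """Return the folder that contains the given page path."""
    return posixpath.dirname(path.rstrip("/")) or "/"


# Hypothetical page URL, for illustration only.
example_url = "https://www.example.edu/events/2020/spring-gala.html"

page_path = root_relative_path(example_url)   # path to enter to ignore one page
folder_path = parent_folder(page_path)        # path to enter to ignore the folder

print(page_path)
print(folder_path)
```

Either value could then be entered as an Ignored Path, depending on whether you want to exclude the single page or the whole folder.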
Alternatively, a CSV of paths to ignore can be uploaded using the Import CSV button. Refer to the article on CSV upload format for more information about formatting files for upload into DubBot.
Ignored Paths by Regular Expressions
More complicated exclusions can be created using regular expressions. Webpages or folders that match an added regular expression are excluded from the site crawl. Regular expressions can be added one at a time or by using the Import CSV button.
If you need help with this, reach out to DubBot support using the in-app chat or email email@example.com.
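Before adding a regular expression, it can help to try it against a few sample paths. The Python sketch below is one way to do that, under two assumptions that this article does not confirm: that the pattern is matched against the root-relative path, and that DubBot's regular expression syntax behaves like Python's for this simple pattern. The pattern and the sample paths are hypothetical.

```python
import re

# Hypothetical pattern: the events folders for the years 2010 through 2020.
pattern = re.compile(r"^/events/20(1[0-9]|20)(/|$)")

# Hypothetical sample paths, for illustration only.
sample_paths = [
    "/events/2019/open-house.html",
    "/events/2020/",
    "/events/2021/commencement.html",
    "/news/2020/events.html",
]

# Paths the pattern matches would be excluded; the rest would still be crawled.
excluded = [p for p in sample_paths if pattern.search(p)]
kept = [p for p in sample_paths if not pattern.search(p)]

print("Excluded:", excluded)
print("Kept:", kept)
```

Here the first two paths match and the last two do not, so a single expression covers eleven yearly folders without listing each one as a separate Ignored Path.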
You can enable or disable the Ignore pages with URL queries checkbox. Query pages are those that contain a ? in the URL. This setting is helpful if you find that your crawl is picking up a lot of extra query pages, for example from your calendar. In that case, change the setting so that pages with URL queries are ignored, then re-run your crawl.
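To judge whether query pages are inflating a crawl, you could count them in a list of crawled URLs. The following Python sketch uses the definition above (a URL that contains a ?) and groups query pages by path. The URL list and the is_query_page helper are hypothetical, and the sketch assumes you already have the list of URLs available.

```python
from collections import Counter
from urllib.parse import urlsplit


def is_query_page(url: str) -> bool:
    """Follow the article's definition: a query page contains a ? in the URL."""
    return "?" in url


# Hypothetical crawled URLs, for illustration only.
crawled = [
    "https://www.example.edu/calendar",
    "https://www.example.edu/calendar?month=2024-01",
    "https://www.example.edu/calendar?month=2024-02",
    "https://www.example.edu/about",
]

# Count query pages per path to see which page generates them.
query_counts = Counter(urlsplit(u).path for u in crawled if is_query_page(u))

for path, count in query_counts.most_common():
    print(path, count)
```

A path with a high count, such as a calendar, suggests that ignoring pages with URL queries would noticeably reduce the number of pages in the crawl.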