Where to find the Crawl Log
Information about the site's latest crawl is available through the link at the bottom of the Site Overview panel on the site dashboard.
Select the date shown beside Latest crawl to open the Crawl Log.
Crawl Log Header Information
The Crawl Log header contains the following information:
Status - current status of the crawl: Crawling, Processing, Done, or Cancelled
Total URLs Identified - number of URLs the crawler attempted to index
Pages Indexed - number of traditional web pages indexed (inventoried) in the crawl
Assets Indexed - number of supplemental files inventoried in the crawl
Retries - number of times the crawler failed to crawl an asset and tried it again
Errors - number of errors the crawler encountered during this crawl
Each number below these headings is a button or link that opens a list of the corresponding items in the body of the Crawl Log page.
Use the arrows at the far left and right of the header to view data from the previous or next crawl.
Breadcrumb Filters
The Crawled URLs breadcrumb allows you to filter the information shown from the latest crawl:
All URLs - filter by URL state (selectable in the Select a filter for URLs list):
Queued - pages currently in the queue, waiting to be analyzed
Crawling - pages currently crawling
Processing - pages currently running through all DubBot checks
Crawled - pages that have been successfully crawled. Selecting Crawled displays a secondary All Types filter:
Page - show only traditional web pages
Asset - show only PDF files
Error - pages that caused a crawler error, along with the error codes returned
Skipped - pages the crawler skipped and will not index, such as 404 pages and pages outside the site's scope
Ignored - URLs ignored by the crawler due to issues such as ignored paths or disabled query strings
Information per URL
Each filtered URL list shows the following information:
URL
Status (hover over an item for more detail)
Response Code
Started and Updated date (time shows when hovering over the date)
Learn more about the Site or Page Set Dashboard.
If you have questions, please reach out to our DubBot Support team via email at help@dubbot.com or via the blue chat bubble in the lower right corner of your screen. We are here to help!