
Crawl Log

Review what occurred during a site's last crawl.

Updated today

Where to find the Crawl Log

The Site Overview panel on a site dashboard shows the date of the Latest crawl at the bottom of the panel.

That date is a link. Select it to open the Crawl Log.

The Site Overview panel, highlighting the date of the latest crawl that links to the Crawl Log.

Crawl Log Header Information

The Crawl Log header contains the following information:

  • Status - current status of the crawl: Crawling, Processing, Done, or Cancelled

  • Total URLs Identified - number of items the crawler tried to index

  • Pages Indexed - number of traditional web pages that were indexed, or inventoried, in the crawl

  • Assets Indexed - number of supplemental files inventoried in the crawl

  • Retries - number of times the crawler failed to crawl an asset and tried it again

  • Errors - errors encountered by the crawler on this crawl

The numbers shown below these headings are buttons or links. Selecting one takes you to a listing of those items in the body of the Crawl Log page.

There are also arrows to the far left and right of the header to view the previous or next crawl's data.

Sample Crawl Log header showing the crawl's Status, Total URLs Identified, Pages Indexed, Assets Indexed, Retries and Errors.

Breadcrumb Filters

The Crawled URLs breadcrumb allows you to filter the information shown from the latest crawl:

The All URLs list (shown as Select a filter for URLs) filters by URL state:

  • Queued - pages currently in the Queue, waiting to be analyzed

  • Crawling - pages currently crawling

  • Processing - pages currently running through all DubBot checks

  • Crawled - pages that have been successfully crawled. When Crawled is selected, a secondary filter, All Types, appears:

    • Page - show only traditional web pages

    • Asset - show only PDF files

  • Error - pages that caused a crawler error, along with the error codes returned

  • Skipped - pages the crawler skipped and will not index, such as 404 pages and pages outside the scope of the site

  • Ignored - URLs ignored by the crawler due to issues such as ignored paths or disabled query strings

Information per URL

Each Filtered URL list shows the following information:

  • URL

  • Status (hovering over these items will give more detail)

  • Response Code

  • Started and Updated dates (hover over a date to see the time)
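
For readers who think in code, the URL states and per-URL fields described above can be sketched as a small data model. This is an illustration only, not DubBot's implementation or API: the names `UrlState`, `UrlType`, `CrawlLogEntry`, and `filter_entries` are hypothetical, the sample URLs and response codes are made up, and the rule that the type filter applies only to Crawled is taken from the filter description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class UrlState(Enum):
    """URL states listed in the Crawl Log filter (names are hypothetical)."""
    QUEUED = "Queued"
    CRAWLING = "Crawling"
    PROCESSING = "Processing"
    CRAWLED = "Crawled"
    ERROR = "Error"
    SKIPPED = "Skipped"
    IGNORED = "Ignored"


class UrlType(Enum):
    """Secondary filter values offered once Crawled is selected."""
    PAGE = "Page"
    ASSET = "Asset"


@dataclass
class CrawlLogEntry:
    """One row of the Crawl Log, reduced to the fields needed for filtering."""
    url: str
    state: UrlState
    response_code: Optional[int] = None
    url_type: Optional[UrlType] = None


def filter_entries(
    entries: Iterable[CrawlLogEntry],
    state: Optional[UrlState] = None,
    url_type: Optional[UrlType] = None,
) -> List[CrawlLogEntry]:
    """Return the entries matching a state and, for Crawled only, a type."""
    if url_type is not None and state is not UrlState.CRAWLED:
        # Assumption from the article: the type filter appears only for Crawled.
        raise ValueError("The type filter only applies to the Crawled state")
    matches = []
    for entry in entries:
        if state is not None and entry.state is not state:
            continue
        if url_type is not None and entry.url_type is not url_type:
            continue
        matches.append(entry)
    return matches


# Made-up sample rows for illustration.
sample = [
    CrawlLogEntry("https://example.com/", UrlState.CRAWLED, 200, UrlType.PAGE),
    CrawlLogEntry("https://example.com/guide.pdf", UrlState.CRAWLED, 200, UrlType.ASSET),
    CrawlLogEntry("https://example.com/missing", UrlState.SKIPPED, 404),
    CrawlLogEntry("https://example.com/?ref=abc", UrlState.IGNORED),
]

crawled_pages = filter_entries(sample, state=UrlState.CRAWLED, url_type=UrlType.PAGE)
```

Calling `filter_entries(sample)` with no arguments corresponds to the unfiltered All URLs view, while passing a state narrows the list the same way choosing an entry in the filter list does.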

Example of Crawl Log Panel

Learn more about the Site or Page Set Dashboard.


If you have questions, please reach out to our DubBot Support team via email at help@dubbot.com or via the blue chat bubble in the lower right corner of your screen. We are here to help!
