Crawler
The crawl endpoint lets you trigger a crawl of a website and store the contents in a collection. You can provide a start URL or a sitemap URL.
Crawl Website
This route triggers a crawl of a website and adds the webpage contents to a collection.
Path Params
- Name: collection_id
- Type: string
- Description: The ID of the collection to add the webpage contents to.
Required attributes
- Name: sites
- Type: array
- Description: Array of the sites to crawl. Each entry is an object with the shape:

start_url: string
settings?: { "type": "sitemap" }
metadata?: object
The settings object is optional. Currently you only need to include it if you are scraping using a sitemap. If you are scraping from a sitemap, set start_url to the sitemap URL. Otherwise, we will crawl the site and discover the URLs. Discovery takes a bit of time right now; we are working on speeding this up.
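For example, a sitemap crawl would point start_url at the sitemap itself and add the settings object. This is a sketch assuming the same endpoint and auth header as the Request example; the sitemap URL and SECRET_KEY are placeholders.

```shell
# Sitemap crawl: start_url is the sitemap URL, and settings marks
# the site as sitemap-based. SECRET_KEY and the sitemap URL below
# are placeholders.
curl -X POST 'https://api.scoutos.com/v1/collections/:collection_id/crawl' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SECRET_KEY' \
  --data-raw '{"sites": [{"start_url": "https://scoutos.com/sitemap.xml", "settings": {"type": "sitemap"}}]}'
```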
You can also add a metadata object to a site object. This metadata will be stored with the site in the collection, allowing you to filter by it later when querying the collection.
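As a sketch, attaching metadata to a site might look like the following. The metadata keys here ("category", "lang") are made up for illustration; any JSON object should work.

```shell
# Attach arbitrary metadata to a site. The keys below are
# illustrative placeholders you could filter on when querying
# the collection later.
curl -X POST 'https://api.scoutos.com/v1/collections/:collection_id/crawl' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SECRET_KEY' \
  --data-raw '{"sites": [{"start_url": "https://scoutos.com", "metadata": {"category": "marketing", "lang": "en"}}]}'
```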
Request
curl -X POST 'https://api.scoutos.com/v1/collections/:collection_id/crawl' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer SECRET_KEY' \
--data-raw '{"sites": [{"start_url": "https://scoutos.com"}]}'
Response
{
"ok": true
}