Exa Webpage Contents Block

Retrieve and scrape webpage contents using the Exa API

The Exa Webpage Contents block scrapes the contents of specified URLs using the Exa API. It supports live crawling, retrieving full page text, and crawling subpages, making it useful for gathering webpage content for analysis or processing within workflows.

Configuration (Required)

Exa API Key
string (Required)
Use Environment Variables to access sensitive credentials securely in a Scout workflow.

The API key required to authenticate with the Exa service.

URLs to Crawl (comma-separated)
string

A comma-separated list of URLs to be crawled. This field supports Jinja templating for dynamic URL generation.
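For example, `{{ state.urls | join(",") }}` would render a comma-separated string from a list stored in workflow state (the `state.urls` variable here is hypothetical).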

Return Full Page Text
boolean

Whether to return the full page text from the crawled URLs. The default value is true.

Livecrawl Option
string

Controls live crawling behavior. Choices are 'never', 'fallback', 'always', or 'auto'. The default value is 'never'. This field supports Jinja templating for dynamic option selection.
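For example, `{{ 'always' if state.fresh_data_required else 'auto' }}` could select the option at runtime (the `state.fresh_data_required` variable is hypothetical).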

Livecrawl Timeout (ms)
integer

The timeout duration for live crawling in milliseconds. The default value is 10000.

Number of Subpages to Crawl
integer

The number of subpages to crawl for each URL. The default value is 0.

Subpage Target Keyword
string

A keyword to target specific subpages during crawling. This field supports Jinja templating for dynamic keyword targeting.

See Workflow Logic & State > State Management for details on using dynamic variables in this block.
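
For reference, the configuration above maps onto Exa's contents endpoint roughly as follows. This is a minimal sketch in Python assuming Exa's public `/contents` REST API; it is not the block's actual implementation, and Exa's API reference remains authoritative for endpoint and parameter names.

```python
import requests

# Minimal sketch of the kind of request the block issues against Exa's
# /contents endpoint. Each JSON field mirrors a configuration field above.
response = requests.post(
    "https://api.exa.ai/contents",
    headers={"x-api-key": "YOUR_EXA_API_KEY"},  # Exa API Key
    json={
        "urls": [                                # URLs to Crawl
            "https://example.com",
            "https://example.org",
        ],
        "text": True,                            # Return Full Page Text
        "livecrawl": "fallback",                 # Livecrawl Option
        "livecrawlTimeout": 10000,               # Livecrawl Timeout (ms)
        "subpages": 2,                           # Number of Subpages to Crawl
        "subpageTarget": "docs",                 # Subpage Target Keyword
    },
    timeout=30,
)
response.raise_for_status()
contents = response.json()  # JSON object described under Outputs below
```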

Outputs

The output is a JSON object containing the results of the content scraping, including full page text and subpage contents when requested.
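
The precise schema is defined by Exa's contents response; the example below is illustrative only, and all values are placeholders.

```json
{
  "results": [
    {
      "url": "https://example.com",
      "title": "Example Domain",
      "text": "Full page text, present when Return Full Page Text is true...",
      "subpages": [
        {
          "url": "https://example.com/docs",
          "title": "Docs",
          "text": "..."
        }
      ]
    }
  ]
}
```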

Usage Context

Use this block to scrape webpage contents for analysis or processing within workflows. It is particularly useful when you need to gather data from multiple URLs efficiently.

Best Practices

  • Ensure your Exa API Key is valid and has the necessary permissions.
  • Use Jinja templating to dynamically generate URLs and options based on workflow state.
  • Set appropriate livecrawl options and timeouts to balance speed against data freshness.
  • Limit the number of subpages and use a target keyword to keep crawls focused and efficient.