Web Page Scrape Block | Scout

The Web Page Scrape block retrieves the content of a web page by making HTTP requests through the Browserless API. It is useful for extracting a page's HTML while excluding specified elements, so that the workflow receives only the information it needs.
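The request this block makes on your behalf can be sketched in Python using only the standard library. This is a minimal illustration, not Scout's implementation. It assumes Browserless's /content endpoint, which takes a JSON body with a url field and the API token as a query parameter. The base URL chrome.browserless.io and the helper name build_content_request are assumptions (Browserless offers several hosts). The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed Browserless host; other regional hosts exist.
BROWSERLESS_BASE = "https://chrome.browserless.io"


def build_content_request(page_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Browserless for page_url's HTML.

    Hypothetical helper: assumes the /content endpoint with the API token
    passed as a query parameter and the target URL in a JSON body.
    """
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BROWSERLESS_BASE}/content?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_content_request("https://example.com/", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # prints: POST https://chrome.browserless.io/content?token=YOUR_API_KEY
    # Sending it would be: urllib.request.urlopen(req).read().decode("utf-8")
```

Keeping request construction separate from sending makes the shape of the call easy to inspect without network access or a real key.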

Configuration (Required)

URL to which the request will be made (Required)

Specifies the web page to scrape. The URL must be valid and accessible, otherwise the content cannot be retrieved. This input supports Jinja templating, so the URL can be built from workflow variables.
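Because a rendered template can quietly produce an incomplete URL (for example when a variable is empty), it can help to check the final value before it reaches the block. The following is a minimal Python sketch using the standard library's urllib.parse. The helper name is_scrapable_url is hypothetical, and the validity rule (an absolute http or https URL with a host) is an assumption. It does not test whether the page is actually reachable.

```python
from urllib.parse import urlparse


def is_scrapable_url(url: str) -> bool:
    """Return True if url looks like an absolute http(s) URL with a host.

    Hypothetical pre-check; it does not verify that the page is reachable.
    """
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# A template such as "https://{{ domain }}/pricing" rendered with an empty
# variable yields "https:///pricing", which this check rejects.
print(is_scrapable_url("https://example.com/pricing"))  # True
print(is_scrapable_url("https:///pricing"))             # False
print(is_scrapable_url("example.com/pricing"))          # False
```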

Exclude Selectors


A comma-separated list of classes, IDs, or tags to exclude from the scraped content. Matching elements are removed from the HTML output, which is useful for dropping navigation, ads, or other unwanted markup. This input supports Jinja templating for dynamic content.
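Exclusion happens inside the block, but the effect of a selector list is easy to illustrate. The Python sketch below uses only the standard library's html.parser. It supports just the three simple forms the description mentions, written here in CSS style as .class, #id, or a bare tag name. Whether Scout expects exactly this syntax is an assumption. The sketch also assumes well-formed markup, ignores compound selectors, and is not Scout's implementation. All names in it are hypothetical.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they never open a region to skip.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_selectors(spec: str) -> list[str]:
    """Split a comma-separated string such as ".ads, #footer, nav"."""
    return [s.strip() for s in spec.split(",") if s.strip()]


def matches(tag: str, attrs: dict, selector: str) -> bool:
    """Match one simple selector: .class, #id, or a bare tag name."""
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    return tag == selector.lower()


class ExcludingParser(HTMLParser):
    """Re-emits HTML, dropping matching elements together with their children."""

    def __init__(self, selectors: list[str]):
        super().__init__(convert_charrefs=False)
        self.selectors = selectors
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside an excluded element

    def _excluded(self, tag, attrs) -> bool:
        attr_map = dict(attrs)
        return any(matches(tag, attr_map, s) for s in self.selectors)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # nested element inside an excluded one
        elif self._excluded(tag, attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1  # start skipping until the matching close
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the skip depth.
        if not self.skip_depth and not self._excluded(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def _emit(self, text):
        if not self.skip_depth:
            self.out.append(text)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def exclude_elements(html: str, spec: str) -> str:
    """Return html with every element matching the selector list removed."""
    parser = ExcludingParser(parse_selectors(spec))
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = '<div><nav>Menu</nav><p class="lead">Hi</p><div id="footer">&copy; 2024</div></div>'
print(exclude_elements(page, "nav, #footer"))
# <div><p class="lead">Hi</p></div>
```

Removing an element also removes everything nested inside it, which is usually what you want for containers such as navigation bars and footers.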

See Workflow Logic & State > State Management for details on using dynamic variables in this block.

Outputs

The block outputs the HTML content of the web page with the elements matching the excluded selectors removed. Later blocks in the workflow can use this output for further processing.

Usage Context

Use this block when a workflow needs the HTML content of a web page. Check that the URL is correct and that the selectors to exclude are listed accurately.

Best Practices

Verify the URL is accessible and correct: Ensure the URL is valid to avoid errors during the scraping process.
Use Jinja templating to build the URL and the list of excluded selectors dynamically: This allows flexible scraping across different pages and workflow runs.
Ensure the BROWSERLESS_API_KEY is configured in the environment: This is necessary for the block to function correctly.
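A missing key may only surface once a scrape is attempted, so a quick preflight check can make the problem obvious earlier. The following is a minimal Python sketch. The function name require_browserless_key is hypothetical, and how Scout itself reads the variable is not described in this document.

```python
import os


def require_browserless_key(env=None) -> str:
    """Return BROWSERLESS_API_KEY, or raise a clear error if it is unset or blank.

    Hypothetical preflight helper; env defaults to the process environment.
    Surrounding whitespace is stripped from the value.
    """
    env = os.environ if env is None else env
    key = (env.get("BROWSERLESS_API_KEY") or "").strip()
    if not key:
        raise RuntimeError(
            "BROWSERLESS_API_KEY is not set; the Web Page Scrape block "
            "needs it to make requests through Browserless."
        )
    return key
```

Passing a plain dictionary as env makes the check easy to exercise without touching the real environment.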