Web Search Block

Search the web and extract content from results

The Web Search block enables users to perform web searches and extract content from search engine results. It provides options for filtering results by time, including or excluding specific domains, and processing the extracted text into manageable chunks. This block is useful for gathering data from the web and integrating it into Scout workflows.

Configuration (Required)

Search Engine Query
string (Required)

Enter what you’re looking for. This is the main query used to search the web.

Search Results To Scrape
integer

The maximum number of search results to process. The default value is 1.

Time Filter
string

Filter search results by time range. The default value is Any time. Options include:

  • Any time
  • Past hour
  • Past 24 hours
  • Past week
  • Past month
  • Past year

Include Domains
list

List of domains to include in the search results. The default is an empty list, which will include results from all domains.

Exclude Domains
list

List of domains to exclude from the search results. The default is an empty list, which excludes no domains.
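To make the include and exclude semantics concrete, here is a minimal sketch of how the two lists could be applied to a result URL. It is an illustration only, not the block's actual implementation: the function names `domain_allowed` and `_matches` are hypothetical, and the rule that a listed domain also matches its subdomains is an assumption.

```python
from urllib.parse import urlparse


def _matches(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it (assumed rule)."""
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def domain_allowed(url: str, include: list[str], exclude: list[str]) -> bool:
    """Apply include/exclude lists to one result URL (illustrative only)."""
    host = (urlparse(url).hostname or "").lower()
    # An empty include list places no restriction on the domain.
    if include and not any(_matches(host, d) for d in include):
        return False
    # An empty exclude list removes nothing.
    return not any(_matches(host, d) for d in exclude)


urls = ["https://docs.python.org/3/", "https://example.com/post"]
kept = [u for u in urls if domain_allowed(u, include=[], exclude=["example.com"])]
```

With both lists empty, every URL passes, which matches the documented defaults.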

Split Page Text
boolean

Whether the extracted text is split into smaller chunks. The default value is true.

Splitter Strategy
string

The strategy to use for splitting text when Split Page Text is enabled. The default value is Smart Splitter.
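For readers unfamiliar with text chunking, the sketch below shows one simple strategy: grouping paragraphs into chunks under a character limit. This is not the Smart Splitter, whose behavior is not described here. The function `split_text` and the `max_chars` parameter are hypothetical and only illustrate the general idea.

```python
def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            # Either the paragraph fits, or it starts a new chunk.
            current = candidate
        else:
            # Adding the paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```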

Max Results to Return
integer

The maximum number of results to return after processing the scraped content. The default value is 0. If the value is 0 or not set, it defaults to the number of search results.

Content Capture Mode
string

How to capture web pages: Thorough (processes everything including JavaScript, more complete but slower) or Quick (basic HTML only, faster). The default value is Quick.

Minimum Similarity Score
float

The minimum similarity score for a result to be considered relevant. Set to 0.0 to include all results. The default value is 0.0.
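The effect of the threshold can be pictured as a simple filter. The sketch below assumes each result is a dictionary with a `similarity_score` key, that scores are non-negative, and that a score equal to the threshold passes. These are illustrative assumptions, and `filter_by_score` is a hypothetical helper rather than part of Scout.

```python
def filter_by_score(results: list[dict], min_score: float = 0.0) -> list[dict]:
    """Keep results whose score is at least min_score (illustrative only)."""
    # A missing score is treated as 0.0, which is an assumption of this sketch.
    return [r for r in results if r.get("similarity_score", 0.0) >= min_score]


results = [
    {"url": "https://example.com/a", "similarity_score": 0.82},
    {"url": "https://example.com/b", "similarity_score": 0.35},
]
relevant = filter_by_score(results, min_score=0.5)
```

With `min_score=0.0`, both entries are kept, which corresponds to the documented behavior of including all results.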

Page Search Term
string

The term to search for within the top search results. The default value is an empty string, in which case the Search Engine Query is used.

Text Extractor
string

The method to use for extracting text from web pages. The default value is readability.

See Workflow Logic & State > State Management for details on using dynamic variables in this block.
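As a compact summary of the fields and defaults listed above, the configuration can be modeled as a small data class. This is a sketch for reference only: the class name `WebSearchConfig`, its attribute names, and the two helper methods are hypothetical and do not reflect Scout's actual API or configuration keys. Only the default values and the time filter options come from this page.

```python
from dataclasses import dataclass, field

TIME_FILTERS = (
    "Any time", "Past hour", "Past 24 hours",
    "Past week", "Past month", "Past year",
)


@dataclass
class WebSearchConfig:
    query: str  # Search Engine Query (required)
    results_to_scrape: int = 1
    time_filter: str = "Any time"
    include_domains: list[str] = field(default_factory=list)
    exclude_domains: list[str] = field(default_factory=list)
    split_page_text: bool = True
    splitter_strategy: str = "Smart Splitter"
    max_results_to_return: int = 0
    content_capture_mode: str = "Quick"
    min_similarity_score: float = 0.0
    page_search_term: str = ""
    text_extractor: str = "readability"

    def __post_init__(self) -> None:
        if self.time_filter not in TIME_FILTERS:
            raise ValueError(f"Unknown time filter: {self.time_filter!r}")

    def effective_page_search_term(self) -> str:
        # An empty Page Search Term falls back to the Search Engine Query.
        return self.page_search_term or self.query

    def effective_max_results(self, num_search_results: int) -> int:
        # 0 (or unset) falls back to the number of search results.
        return self.max_results_to_return or num_search_results


config = WebSearchConfig(query="open source vector databases", time_filter="Past month")
```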

Outputs

The block outputs a list of extracted web page results, each containing text, similarity score, canonical URL, and metadata.
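A downstream step might rank these results and assemble them into a single context string. The sketch below shows one way to do that, assuming a result shape with `text`, `similarity_score`, `url`, and `metadata` fields. The field names, the `PageResult` class, and both helper functions are hypothetical, since the exact output schema is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class PageResult:
    text: str
    similarity_score: float
    url: str  # canonical URL of the page
    metadata: dict = field(default_factory=dict)


def top_results(results: list[PageResult], limit: int = 3) -> list[PageResult]:
    """Return the highest-scoring results, best first."""
    return sorted(results, key=lambda r: r.similarity_score, reverse=True)[:limit]


def to_context(results: list[PageResult]) -> str:
    """Join results into one string, labeling each chunk with its source URL."""
    return "\n\n".join(f"[{r.url}]\n{r.text}" for r in results)


pages = [
    PageResult("Intro to embeddings.", 0.41, "https://example.com/intro"),
    PageResult("Vector search in depth.", 0.87, "https://example.com/deep"),
]
context = to_context(top_results(pages, limit=1))
```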

Usage Context

Use this block to perform web searches and extract content from search engine results for integration into Scout workflows.

Best Practices

  • Ensure that the query is specific to obtain relevant search results.
  • Use the time filter to narrow down results to a specific time range if needed.
  • Specify include or exclude domains to refine the search scope.
  • Set an appropriate minimum similarity score to filter out less relevant results.
  • Choose the content capture mode based on whether you need completeness (Thorough) or speed (Quick).