Web Search Block
Search the web and extract content from results
The Web Search block enables users to perform web searches and extract content from search engine results. It provides options for filtering results by time, including or excluding specific domains, and processing the extracted text into manageable chunks. This block is useful for gathering data from the web and integrating it into Scout workflows.
Configuration (Required)
Enter what you’re looking for. This is the main query used to search the web.
The maximum number of search results to process. The default value is 1.
Filter search results by time range. The default value is Any time. Options include:
- Any time
- Past hour
- Past 24 hours
- Past week
- Past month
- Past year
List of domains to include in the search results. The default is an empty list, which will include results from all domains.
List of domains to exclude from the search results. The default is an empty list, which excludes no domains.
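The include and exclude lists described above can be sketched as a simple URL filter. This is only an illustration of the documented defaults (an empty include list keeps every domain, an empty exclude list removes none). The function name `domain_allowed` and the subdomain-matching rule are assumptions, not Scout's actual implementation.

```python
from urllib.parse import urlparse


def domain_allowed(url, include_domains=(), exclude_domains=()):
    """Return True if url passes the include/exclude domain lists.

    Assumption: an entry such as "example.com" matches that host and
    any of its subdomains. The real block may match differently.
    """
    host = (urlparse(url).hostname or "").lower()

    def matches(domain):
        domain = domain.lower().lstrip(".")
        return host == domain or host.endswith("." + domain)

    # An empty include list keeps results from all domains.
    if include_domains and not any(matches(d) for d in include_domains):
        return False
    # An empty exclude list removes nothing.
    return not any(matches(d) for d in exclude_domains)
```

For example, `domain_allowed("https://docs.example.com/a", exclude_domains=["example.com"])` returns `False` under this matching rule, while the same call with both lists empty returns `True`.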
Toggle whether the extracted text is chunked into smaller sections. The default value is true.
The strategy to use for splitting text when Split Page Text is enabled. The default value is Smart Splitter.
The maximum number of results to return after processing the scraped content. The default value is 0. If this is not set, or is set to 0, it falls back to the number of search results.
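The fallback rule for this setting can be written as a one-step helper. This is a minimal sketch of the documented behavior only; the name `resolve_max_results` and its parameters are hypothetical.

```python
def resolve_max_results(max_results, search_result_count):
    """Return the effective cap on returned results.

    Per the description above, an unset (None) or zero value falls back
    to the number of search results. Names here are illustrative.
    """
    if not max_results:  # covers both None and 0
        return search_result_count
    return max_results
```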
How to capture web pages: Thorough (processes everything including JavaScript, more complete but slower) or Quick (basic HTML only, faster). The default value is Quick.
The minimum similarity score for a result to be considered relevant. Set to 0.0 to include all results. The default value is 0.0.
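As a rough illustration of how a minimum similarity score behaves, the sketch below keeps only results that score at or above the threshold. It assumes each result is a dict with a `similarity` key and that the comparison is inclusive, which is consistent with 0.0 keeping every result. Both are assumptions, not confirmed details of the block.

```python
def filter_by_similarity(results, min_similarity=0.0):
    """Keep results whose similarity score is at least min_similarity.

    Assumption: results are dicts carrying a numeric "similarity" value.
    """
    return [r for r in results if r["similarity"] >= min_similarity]
```

With the default of 0.0 nothing is dropped; raising the threshold trades recall for relevance.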
The term to search for within the top search results. The default value is an empty string, in which case the Search Engine Query is used.
The method to use for extracting text from web pages. The default value is readability.
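The defaults listed in this section can be collected into one place, which may help when reviewing a block's settings. In the sketch below, every key name and the `build_config` helper are hypothetical stand-ins chosen for illustration; only the default values come from the descriptions above.

```python
# Hypothetical key names; the default values mirror this page.
DEFAULTS = {
    "max_search_results": 1,
    "time_range": "Any time",
    "include_domains": [],
    "exclude_domains": [],
    "split_page_text": True,
    "split_strategy": "Smart Splitter",
    "max_results": 0,        # 0 falls back to the number of search results
    "capture_mode": "Quick",
    "min_similarity": 0.0,   # 0.0 includes all results
    "search_term": "",       # empty falls back to the search query
    "text_extractor": "readability",
}


def build_config(query, **overrides):
    """Merge overrides into the defaults; the query itself is required."""
    if not query:
        raise ValueError("query is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    # Copy list defaults so callers never share mutable state.
    config = {k: (list(v) if isinstance(v, list) else v)
              for k, v in DEFAULTS.items()}
    config.update(overrides)
    config["query"] = query
    return config
```

Calling `build_config("scout release notes", time_range="Past week")` would yield the defaults with only the time range changed.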
Outputs
The block outputs a list of extracted web page results, each containing text, similarity score, canonical URL, and metadata.
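To make the output shape concrete, here is a sketch of what one result might look like and how a downstream step could read it. The field names (`text`, `similarity`, `canonical_url`, `metadata`) are assumed from the description above; the exact keys in the real output may differ.

```python
from typing import Any, TypedDict


class WebPageResult(TypedDict):
    """Assumed shape of one extracted result (field names are guesses)."""
    text: str
    similarity: float
    canonical_url: str
    metadata: dict[str, Any]


def summarize(results: list[WebPageResult]) -> list[str]:
    """Format each result as 'url (score)' for quick inspection."""
    return [f"{r['canonical_url']} ({r['similarity']:.2f})" for r in results]


sample: list[WebPageResult] = [
    {
        "text": "Example page text.",
        "similarity": 0.82,
        "canonical_url": "https://example.com/a",
        "metadata": {},
    }
]
print(summarize(sample))  # ['https://example.com/a (0.82)']
```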
Usage Context
Use this block to perform web searches and extract content from search engine results for integration into Scout workflows.
Best Practices
- Ensure that the query is specific to obtain relevant search results.
- Use the time filter to narrow down results to a specific time range if needed.
- Specify include or exclude domains to refine the search scope.
- Set an appropriate minimum similarity score to filter out less relevant results.
- Consider the content capture mode based on the need for thoroughness versus speed.