Querying Data

Learn how to search and retrieve data from your collections using semantic search, hybrid search and advanced filtering.

Search Types

Scout Collections support multiple search modes:

| Search Type | How It Works | Best For |
| --- | --- | --- |
| Semantic (Vector) | Matches by meaning using embeddings | Natural language queries, finding related concepts |
| Keyword (BM25) | Matches exact keywords | Specific terms, product names, codes |
| Hybrid | Combines both approaches | General-purpose search, mixed queries |

Semantic Search

Semantic search converts your query into a vector embedding and finds documents with similar vectors, so it can surface relevant content even when the exact words don't match.

Example:

Query: "how do I reset my password"
Finds: Documents about "password recovery", "account access", "login issues"

Best for:

  • Natural language queries
  • Finding conceptually similar content
  • Synonym-rich domains
  • Conversational interfaces
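As a rough illustration of "similar vectors", the sketch below ranks two documents against a query by cosine similarity. The three-dimensional vectors are hypothetical stand-ins for real embeddings, and cosine similarity is an assumption here: this page does not say which embedding model or distance metric Scout uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real ones have hundreds of dimensions.
query = [0.9, 0.1, 0.0]  # "how do I reset my password"
docs = {
    "password recovery": [0.8, 0.2, 0.1],
    "quarterly sales report": [0.0, 0.1, 0.9],
}

# Rank document names from most to least similar to the query.
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query, docs[name]),
    reverse=True,
)
print(ranked)
```

The document that shares no words with the query can still rank first because its vector points in nearly the same direction.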

Keyword Search

Keyword search uses BM25 to rank documents that contain the exact terms in your query. This is traditional full-text search.

Example:

Query: "API_KEY_12345"
Finds: Only documents containing "API_KEY_12345" exactly

Best for:

  • Exact term matching
  • Product codes, SKUs
  • Technical identifiers
  • Precise searches
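To make the exact-match behavior concrete, here is a minimal BM25 scorer. It is a sketch under stated assumptions, not Scout's implementation: it uses lowercase whitespace tokenization, the commonly cited parameters `k1 = 1.5` and `b = 0.75`, and the non-negative IDF variant. Scout's tokenizer and parameters are not documented on this page.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents that contain the term at least once.
            df = sum(1 for other in tokenized if term in other)
            if df == 0:
                continue
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            tf = counts[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "rotate the key api_key_12345 every 90 days",
    "general guidance on managing credentials",
]
scores = bm25_scores("API_KEY_12345", docs)
print(scores)
```

The second document is about the same topic but never mentions the identifier, so its score is zero.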

Hybrid Search

Hybrid search combines semantic and keyword approaches using Reciprocal Rank Fusion (RRF). Documents that rank well in either result list are kept, and documents that rank well in both rise to the top.

Example:

Query: "React hooks tutorial"
Finds:
  • Documents with "React", "hooks", "tutorial" (keyword matches)
  • Documents about "state management in React" (semantic matches)

Best for:

  • General-purpose search
  • Mixed query styles
  • Maximum coverage
  • Most production applications
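Reciprocal Rank Fusion can be sketched in a few lines. Each document earns `1 / (k + rank)` from every ranking it appears in, and the contributions are summed. The constant `k = 60` is the value commonly used in the literature and is an assumption here, as are the document IDs; this page does not state Scout's settings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists for the query "React hooks tutorial".
keyword_ranking = ["react-hooks-tutorial", "hooks-api-reference", "react-changelog"]
semantic_ranking = ["state-management-in-react", "react-hooks-tutorial", "hooks-api-reference"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents present in both lists outrank the top hit of either single list, which is why hybrid search tends to give broad coverage without losing exact matches.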

Querying via Workflows

Use the Query Collection Table block in your workflows to search collections.

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| Collection | Select your collection | Required |
| Table | Select the table to query | Required |
| Search Term | Your query (supports Jinja templating) | Required |
| Minimum Similarity | Threshold for relevance (0-1) | 0.35 |
| Hybrid Search | Enable combined search | false |
| Alpha | Weight for semantic vs keyword (0-1) | 0.5 |
| Filters | Filter results by metadata | Optional |
| Limit | Maximum results to return | 10 |

Example Workflow Query

    Collection: Knowledge Base
    Table: Documentation
    Search Term: "{{inputs.user_question}}"
    Minimum Similarity: 0.5
    Hybrid Search: true
    Alpha: 0.5
    Limit: 10

Using Query Results

The block returns an array of results that you can use in subsequent blocks:

    [
      {
        "details": {
          "vector_distance": 0.15,
          "hybrid_score": null
        },
        "record": {
          "id": "doc_abc123",
          "attributes": {
            "title": "Getting Started Guide",
            "content": "This guide walks you through...",
            "category": "tutorial",
            "url": "https://docs.example.com/getting-started"
          }
        }
      }
    ]

Access in later blocks with Jinja:

{{ query_results.output[0].record.attributes.title }}
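If you process the same result array in code rather than in Jinja, the nesting is the same: each item has a `details` object and a `record` object whose `attributes` hold your columns. The sketch below parses the sample payload shown above; the `summarize` helper is hypothetical and not part of any Scout SDK.

```python
import json

# The sample result array from this page, shortened to the fields used below.
raw = """
[
  {
    "details": {"vector_distance": 0.15, "hybrid_score": null},
    "record": {
      "id": "doc_abc123",
      "attributes": {
        "title": "Getting Started Guide",
        "category": "tutorial",
        "url": "https://docs.example.com/getting-started"
      }
    }
  }
]
"""


def summarize(results: list[dict]) -> list[dict]:
    """Flatten each result into the few fields a later step usually needs."""
    summaries = []
    for item in results:
        attributes = item["record"].get("attributes", {})
        summaries.append(
            {
                "id": item["record"]["id"],
                "title": attributes.get("title"),
                "url": attributes.get("url"),
                "vector_distance": item["details"].get("vector_distance"),
            }
        )
    return summaries


summaries = summarize(json.loads(raw))
print(summaries[0]["title"])
```

Using `.get` for attributes keeps the helper from failing when a record lacks an optional column.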

Use with Agents

1) Enable Tools

In the agent's Tools tab, enable the Collections query capability.

2) Add Instruction Snippet

    For questions that require internal knowledge:
    1. Query Collections before answering.
    2. Start with hybrid search using `min_similarity: 0.5` and `alpha: 0.5`.
    3. If results are noisy, increase `min_similarity`.
    4. If the user gives constraints like category, date or status, apply metadata filters.
    5. Return a concise answer, then include the key supporting records.

3) Prompt Examples

  • "Find troubleshooting steps for SSO login failures from the IT docs table."
  • "Search only category = policy and summarize PTO policy changes since Jan. 1."
  • "Query the sales enablement table for pricing objection handling and give me three approved responses."

4) Expected Behavior

  • The agent chooses semantic or hybrid search based on the prompt
  • The agent narrows scope with filters instead of broad retries
  • The agent returns grounded answers based on retrieved records
  • The agent states when evidence is weak or missing

Querying via API

Basic Query

    curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "search_term": "customer support",
        "min_similarity": 0.5,
        "limit": 10
      }'

Hybrid Search Query

    curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "search_term": "React hooks tutorial",
        "min_similarity": 0.5,
        "hybrid_search": true,
        "alpha": 0.5,
        "limit": 10
      }'

Using Python SDK

    from scoutos import Scout

    client = Scout(api_key="YOUR_API_KEY")

    # Run a hybrid query against one table in a collection.
    results = client.tables.query(
        collection_id="col_abc123",
        table_id="tab_xyz789",
        search_term="customer support",
        min_similarity=0.5,
        hybrid_search=True,
        limit=10
    )

    # Each result carries the matched record and its scoring details.
    for result in results:
        print(f"Title: {result['record']['attributes']['title']}")
        print(f"Distance: {result['details']['vector_distance']}")

Using TypeScript SDK

    import { ScoutClient } from "scoutos";

    const client = new ScoutClient({ apiKey: "YOUR_API_KEY" });

    // Run a hybrid query against one table in a collection.
    const results = await client.tables.query({
      collectionId: "col_abc123",
      tableId: "tab_xyz789",
      searchTerm: "customer support",
      minSimilarity: 0.5,
      hybridSearch: true,
      limit: 10
    });

    results.forEach(result => {
      console.log(`Title: ${result.record.attributes.title}`);
    });
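If you prefer not to depend on an SDK, the curl requests above map directly onto Python's standard library. The sketch below only builds the request object, so it can be inspected before anything is sent. The `build_query_request` helper and its range checks are illustrative assumptions; the URL, headers, and body fields are taken from the curl examples on this page.

```python
import json
import urllib.request

API_BASE = "https://api.scoutos.com/v2"


def build_query_request(
    api_key: str,
    collection_id: str,
    table_id: str,
    search_term: str,
    *,
    min_similarity: float = 0.5,
    hybrid_search: bool = False,
    alpha: float = 0.5,
    limit: int = 10,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the table query endpoint."""
    # Both parameters are documented on this page as values between 0 and 1.
    if not 0.0 <= min_similarity <= 1.0:
        raise ValueError("min_similarity must be between 0 and 1")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")

    body = {"search_term": search_term, "min_similarity": min_similarity, "limit": limit}
    if hybrid_search:
        body["hybrid_search"] = True
        body["alpha"] = alpha

    url = f"{API_BASE}/collections/{collection_id}/tables/{table_id}/query"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_query_request(
    "YOUR_API_KEY", "col_abc123", "tab_xyz789", "React hooks tutorial", hybrid_search=True
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

To send it, pass the object to `urllib.request.urlopen(request)` and decode the JSON response body.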

Advanced Filtering

Filters let you narrow down search results based on metadata columns. Filters use a JSON array format:

["column_id", "operator", "value"]

Available Operators

| Operator | Description | Example |
| --- | --- | --- |
| Eq | Equal to | ["status", "Eq", "active"] |
| NotEq | Not equal to | ["status", "NotEq", "archived"] |
| In | In a list of values | ["category", "In", ["tutorial", "guide"]] |
| NotIn | Not in a list | ["category", "NotIn", ["draft", "archived"]] |
| Gt | Greater than | ["price", "Gt", 50] |
| Gte | Greater than or equal | ["views", "Gte", 1000] |
| Lt | Less than | ["created_at", "Lt", 1704067200] |
| Lte | Less than or equal | ["stock", "Lte", 10] |
| Glob | Pattern match (case-sensitive) | ["url", "Glob", "*/docs/*"] |
| IGlob | Pattern match (case-insensitive) | ["title", "IGlob", "*quick*"] |
| And | All filters must match | ["And", [[...], [...]]] |
| Or | Any filter must match | ["Or", [[...], [...]]] |

Filter Examples

Single Filter:

["category", "Eq", "tutorial"]

Date Range:

    ["And", [
      ["created_at", "Gte", 1672531200],
      ["created_at", "Lte", 1704067200]
    ]]
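The integers in the date-range example correspond to Unix timestamps in seconds for 2023-01-01 and 2024-01-01 at midnight UTC. The sketch below shows one way to compute such bounds. It assumes your `created_at` column stores Unix seconds in UTC, which matches the example values but is not stated explicitly on this page; both helper names are hypothetical.

```python
from datetime import datetime, timezone


def epoch_seconds(year: int, month: int, day: int) -> int:
    """Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def date_range_filter(column: str, start: int, end: int) -> list:
    """Build an inclusive range filter in the ["And", [...]] format."""
    return ["And", [[column, "Gte", start], [column, "Lte", end]]]


range_filter = date_range_filter(
    "created_at",
    epoch_seconds(2023, 1, 1),
    epoch_seconds(2024, 1, 1),
)
print(range_filter)
```

Passing an explicit `timezone.utc` avoids results that shift with the local timezone of the machine running the code.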

Multiple Categories:

["category", "In", ["tutorial", "guide", "reference"]]

Text Contains:

["title", "IGlob", "*getting started*"]

Combined Conditions:

    ["And", [
      ["category", "Eq", "tutorial"],
      ["difficulty", "In", ["beginner", "intermediate"]],
      ["created_at", "Gte", 1704067200]
    ]]
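To check what a filter will match before running it against a collection, it can help to model the operators locally. The evaluator below is a sketch, not Scout's implementation: it approximates `Glob` and `IGlob` with Python's `fnmatch` rules, treats a missing column as a non-match, and uses hypothetical record data. Scout's exact pattern syntax and null handling may differ.

```python
from fnmatch import fnmatchcase

# One comparison function per leaf operator from the table above.
COMPARATORS = {
    "Eq": lambda actual, value: actual == value,
    "NotEq": lambda actual, value: actual != value,
    "In": lambda actual, value: actual in value,
    "NotIn": lambda actual, value: actual not in value,
    "Gt": lambda actual, value: actual > value,
    "Gte": lambda actual, value: actual >= value,
    "Lt": lambda actual, value: actual < value,
    "Lte": lambda actual, value: actual <= value,
    "Glob": lambda actual, value: fnmatchcase(actual, value),
    "IGlob": lambda actual, value: fnmatchcase(actual.lower(), value.lower()),
}


def matches(record: dict, flt: list) -> bool:
    """Return True if the record satisfies the filter expression."""
    # Combinators have the shape ["And", [filter, ...]] or ["Or", [filter, ...]].
    if len(flt) == 2 and flt[0] in ("And", "Or"):
        combine = all if flt[0] == "And" else any
        return combine(matches(record, sub) for sub in flt[1])
    column, operator, value = flt
    if column not in record:
        return False  # assumption: a missing column never matches
    return COMPARATORS[operator](record[column], value)


record = {
    "title": "Getting Started with Scout",
    "category": "tutorial",
    "difficulty": "beginner",
    "created_at": 1704067200,
}
combined = ["And", [
    ["category", "Eq", "tutorial"],
    ["difficulty", "In", ["beginner", "intermediate"]],
    ["created_at", "Gte", 1704067200],
]]
print(matches(record, combined))
print(matches(record, ["title", "IGlob", "*getting started*"]))
print(matches(record, ["title", "Glob", "*getting started*"]))
```

The last two calls show the difference between the case-insensitive and case-sensitive pattern operators on the same title.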

Using Filters in Workflows

Apply filters dynamically using Jinja templating:

Filters: ["category", "Eq", "{{inputs.category}}"]

For complex dynamic filters:

    {% if inputs.show_archived %}
    ["category", "Eq", "{{inputs.category}}"]
    {% else %}
    ["And", [["category", "Eq", "{{inputs.category}}"], ["status", "NotEq", "archived"]]]
    {% endif %}
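When the filter is assembled in code instead of a template (for example before an API call), building it as a Python list and serializing it with `json.dumps` avoids hand-escaping quotes. The sketch mirrors the conditional above; `category_filter` is a hypothetical helper name.

```python
import json


def category_filter(category: str, show_archived: bool) -> str:
    """Return the filter as a JSON string, excluding archived rows unless requested."""
    base = ["category", "Eq", category]
    if show_archived:
        flt = base
    else:
        flt = ["And", [base, ["status", "NotEq", "archived"]]]
    # json.dumps handles quoting and escaping of the category value.
    return json.dumps(flt)


print(category_filter("tutorial", show_archived=True))
print(category_filter("tutorial", show_archived=False))
```

Because the value is serialized rather than pasted into a string, a category containing quotes or backslashes still produces valid JSON.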

Minimum Similarity Threshold

The minimum similarity threshold filters out results that aren't relevant enough.

Scale:

| Value | Behavior |
| --- | --- |
| 0.0 - 0.4 | Broad results, includes marginal matches |
| 0.5 - 0.7 | Balanced precision and recall |
| 0.8 - 1.0 | Strict, only highly relevant results |

Choosing a Threshold:

  • 0.3-0.4 for exploratory searches, finding all potentially relevant content
  • 0.5-0.6 for general-purpose knowledge bases (recommended default)
  • 0.7-0.8 for technical documentation where precision matters
  • 0.8+ for finding exact or near-exact matches
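The effect of the threshold is easiest to see on a fixed set of scored results. The sketch below uses hypothetical similarity scores and assumes a result is kept when its score is greater than or equal to the threshold; whether Scout's comparison is inclusive is not stated on this page.

```python
def apply_threshold(scored: list[tuple[str, float]], min_similarity: float) -> list[str]:
    """Keep only documents whose similarity meets the threshold."""
    return [doc for doc, similarity in scored if similarity >= min_similarity]


# Hypothetical similarity scores for the query "how do I reset my password".
scored = [
    ("password recovery", 0.86),
    ("account access", 0.62),
    ("login issues", 0.48),
    ("billing FAQ", 0.31),
]

for threshold in (0.3, 0.5, 0.8):
    print(threshold, apply_threshold(scored, threshold))
```

Raising the threshold trims marginal matches first, which is why a noisy result set is usually fixed by increasing it in small steps.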

Alpha Parameter

The alpha parameter controls the balance between semantic and keyword search in hybrid mode.

Scale:

| Alpha | Behavior |
| --- | --- |
| 0.0 | Pure keyword search (BM25 only) |
| 0.3 | Mostly keyword, some semantic boost |
| 0.5 | Balanced (default) |
| 0.7 | Mostly semantic, some keyword precision |
| 1.0 | Pure semantic search |

When to Adjust:

  • Lower alpha (0.2-0.4) for technical docs with specific terms
  • Medium alpha (0.5) for general knowledge bases
  • Higher alpha (0.7-0.9) for natural language queries and content discovery
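One way to picture alpha is as the weight in a blend of two per-document scores. The sketch below uses a simple convex combination of normalized scores. This is an assumption made for illustration: the page says hybrid search uses Reciprocal Rank Fusion but does not specify how alpha enters the calculation, and the scores are hypothetical.

```python
def blend(semantic: float, keyword: float, alpha: float) -> float:
    """Weighted mix: alpha = 1.0 is pure semantic, alpha = 0.0 is pure keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic + (1.0 - alpha) * keyword


# Hypothetical normalized scores as (semantic, keyword) pairs.
candidates = {
    "conceptual overview": (0.9, 0.2),   # strong on meaning, weak on exact terms
    "error code reference": (0.4, 0.95), # weak on meaning, strong on exact terms
}


def top_document(alpha: float) -> str:
    """Name of the highest-scoring candidate at the given alpha."""
    return max(candidates, key=lambda name: blend(*candidates[name], alpha))


for alpha in (0.0, 0.5, 1.0):
    print(alpha, top_document(alpha))
```

Lowering alpha favors the document with exact term matches, and raising it favors the conceptually closer one, which matches the guidance in the list above.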

Best Practices

1. Start with Defaults

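Begin with these starting values, which differ from the workflow block defaults for Minimum Similarity (0.35) and Hybrid Search (false):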

  • min_similarity: 0.5
  • hybrid_search: true
  • alpha: 0.5

Adjust based on your results.

2. Use Hybrid Search

Hybrid search gives the best results for most use cases because it catches both exact matches and semantically related content.

3. Leverage Filters

Use metadata filters to:

  • Narrow results to relevant categories
  • Filter by date ranges
  • Exclude unwanted status values

4. Optimize Your Content

For better search results:

  • Write comprehensive, descriptive content
  • Use clear headings and structure
  • Include relevant keywords naturally
  • Keep metadata accurate and consistent

5. Test with Real Queries

Use actual user queries to tune:

  • Similarity thresholds
  • Alpha values
  • Filter configurations
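A small labeled set of real queries makes this tuning measurable. The sketch below computes precision and recall at several thresholds for one query. The scores and relevance labels are hypothetical; in practice they would come from your own query logs and judgments.

```python
def precision_recall(
    scored: list[tuple[str, float]],
    relevant: set[str],
    threshold: float,
) -> tuple[float, float]:
    """Precision and recall of the results kept at the given similarity threshold."""
    retrieved = {doc for doc, similarity in scored if similarity >= threshold}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical scored results for one real user query, plus hand-labeled relevant docs.
scored = [("a", 0.9), ("b", 0.7), ("c", 0.55), ("d", 0.4)]
relevant = {"a", "c"}

for threshold in (0.3, 0.5, 0.8):
    precision, recall = precision_recall(scored, relevant, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Averaging these numbers over many queries shows where raising the threshold starts to cost more recall than it gains in precision.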

Common Queries

Find Similar Documents

    {
      "search_term": "{{document.content}}",
      "min_similarity": 0.7,
      "limit": 5
    }

Recent Tutorials

    {
      "search_term": "{{user_query}}",
      "filters": ["And", [
        ["category", "Eq", "tutorial"],
        ["created_at", "Gte", {{thirty_days_ago}}]
      ]],
      "limit": 10
    }

Exclude Drafts

    {
      "search_term": "{{user_query}}",
      "filters": ["status", "NotEq", "draft"],
      "limit": 10
    }

Multiple Categories

    {
      "search_term": "{{user_query}}",
      "filters": ["category", "In", ["tutorial", "guide", "reference"]],
      "limit": 15
    }
