Querying Data
Learn how to search and retrieve data from your collections using semantic search, hybrid search and advanced filtering.
Search Types
Scout Collections support multiple search modes:
| Search Type | How It Works | Best For |
|---|---|---|
| Semantic (Vector) | Matches by meaning using embeddings | Natural language queries, finding related concepts |
| Keyword (BM25) | Matches exact keywords | Specific terms, product names, codes |
| Hybrid | Combines both approaches | General-purpose search, mixed queries |
Semantic Search
Semantic search converts your query into a vector embedding and finds documents with similar vectors. This means it can find relevant content even when the exact words don't match.
Example:
Query: "how do I reset my password"
Finds: Documents about "password recovery", "account access", "login issues"
Best for:
- Natural language queries
- Finding conceptually similar content
- Synonym-rich domains
- Conversational interfaces
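To make the idea of "similar vectors" concrete, here is a minimal sketch that ranks documents by cosine similarity to a query vector. The three-dimensional vectors and the helper names `cosine_similarity` and `rank_by_similarity` are invented for illustration. A real embedding model produces far larger vectors, and this is not a description of Scout's internal implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors (hypothetical): password-related documents point in
# roughly the same direction as the query vector.
docs = {
    "password recovery": [0.9, 0.1, 0.0],
    "account access": [0.7, 0.3, 0.1],
    "billing invoices": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for "how do I reset my password"

ranking = rank_by_similarity(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

The password-related documents rank first even though none of them share an exact word with the query, which is the behavior the section describes.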
Keyword Search
Keyword search uses BM25 to find documents containing exact keyword matches. This is traditional full-text search.
Example:
Query: "API_KEY_12345"
Finds: Only documents containing "API_KEY_12345" exactly
Best for:
- Exact term matching
- Product codes, SKUs
- Technical identifiers
- Precise searches
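For readers who want to see what BM25 scoring looks like, the following is a sketch of the textbook Okapi BM25 formula. The whitespace tokenizer, the lowercase normalization, and the `k1` and `b` values are common defaults chosen for illustration; the tokenizer and parameters Scout actually uses are not documented here, and `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with textbook Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            freq = counts[term]
            if freq == 0:
                continue  # an absent term contributes nothing
            # Rarer terms get a higher inverse document frequency.
            containing = sum(1 for other in tokenized if term in other)
            idf = math.log((n_docs - containing + 0.5) / (containing + 0.5) + 1)
            # Longer-than-average documents are penalized slightly.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "rotate the credential API_KEY_12345 before Friday",
    "how to create and store api keys safely",
    "password recovery for locked accounts",
]
scores = bm25_scores("API_KEY_12345", docs)
print(scores)
```

Only the first document contains the identifier as a token, so it is the only one with a non-zero score. The second document is about API keys but scores zero, which is the trade-off against semantic search.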
Hybrid Search
Hybrid search combines semantic and keyword approaches using Reciprocal Rank Fusion (RRF), so a single ranked list can surface both exact-term matches and semantically related documents.
Example:
Query: "React hooks tutorial"
Finds:
- Documents with "React", "hooks", "tutorial" (keyword matches)
- Documents about "state management in React" (semantic matches)
Best for:
- General-purpose search
- Mixed query styles
- Maximum coverage
- Most production applications
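The fusion step can be sketched with the standard RRF formula, where each document earns `1 / (k + rank)` from every ranked list it appears in. The constant `k=60` is the value commonly used in the literature, and the document IDs below are invented; neither is confirmed as what Scout uses internally.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical rankings for the query "React hooks tutorial".
keyword_ranking = ["react-hooks-tutorial", "hooks-api-reference", "react-changelog"]
semantic_ranking = ["state-management-in-react", "react-hooks-tutorial", "hooks-api-reference"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

Documents that appear in both lists rise to the top, while documents found by only one method still make it into the fused result.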
Querying via Workflows
Use the Query Collection Table block in your workflows to search collections.
Configuration
| Parameter | Description | Default |
|---|---|---|
| Collection | Select your collection | Required |
| Table | Select the table to query | Required |
| Search Term | Your query (supports Jinja templating) | Required |
| Minimum Similarity | Threshold for relevance (0-1) | 0.35 |
| Hybrid Search | Enable combined search | false |
| Alpha | Weight for semantic vs keyword (0-1) | 0.5 |
| Filters | Filter results by metadata | Optional |
| Limit | Maximum results to return | 10 |
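If you assemble these parameters in code before passing them to a workflow or API call, a small validation layer can catch out-of-range values early. The sketch below mirrors the defaults from the table above. The `QueryConfig` class is hypothetical and not part of any Scout SDK, and the rule that `limit` must be at least 1 is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QueryConfig:
    """Hypothetical container mirroring the block's parameters and defaults."""
    collection: str
    table: str
    search_term: str
    min_similarity: float = 0.35
    hybrid_search: bool = False
    alpha: float = 0.5
    filters: Optional[list] = None
    limit: int = 10

    def __post_init__(self):
        # Both weights are documented as values in the range 0-1.
        if not 0.0 <= self.min_similarity <= 1.0:
            raise ValueError("min_similarity must be between 0 and 1")
        if not 0.0 <= self.alpha <= 1.0:
            raise ValueError("alpha must be between 0 and 1")
        # Assumption: a query must ask for at least one result.
        if self.limit < 1:
            raise ValueError("limit must be at least 1")

config = QueryConfig("Knowledge Base", "Documentation", "how do I reset my password")
print(config)
```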
Example Workflow Query
Collection: Knowledge Base
Table: Documentation
Search Term: "{{inputs.user_question}}"
Minimum Similarity: 0.5
Hybrid Search: true
Alpha: 0.5
Limit: 10
Using Query Results
The block returns an array of results that you can use in subsequent blocks:
[
  {
    "details": {
      "vector_distance": 0.15,
      "hybrid_score": null
    },
    "record": {
      "id": "doc_abc123",
      "attributes": {
        "title": "Getting Started Guide",
        "content": "This guide walks you through...",
        "category": "tutorial",
        "url": "https://docs.example.com/getting-started"
      }
    }
  }
]
Access in later blocks with Jinja:
{{ query_results.output[0].record.attributes.title }}
Use with Agents
1) Enable Tools
In the agent's Tools tab, enable the Collections query capability.
2) Add Instruction Snippet
For questions that require internal knowledge:
1. Query Collections before answering.
2. Start with hybrid search using `min_similarity: 0.5` and `alpha: 0.5`.
3. If results are noisy, increase `min_similarity`.
4. If the user gives constraints like category, date or status, apply metadata filters.
5. Return a concise answer, then include the key supporting records.
3) Prompt Examples
- "Find troubleshooting steps for SSO login failures from the IT docs table."
- "Search only `category = policy` and summarize PTO policy changes since Jan. 1."
- "Query the sales enablement table for pricing objection handling and give me three approved responses."
4) Expected Behavior
- The agent chooses semantic or hybrid search based on the prompt
- The agent narrows scope with filters instead of broad retries
- The agent returns grounded answers based on retrieved records
- The agent states when evidence is weak or missing
Querying via API
Basic Query
curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"search_term": "customer support",
"min_similarity": 0.5,
"limit": 10
}'
Hybrid Search Query
curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"search_term": "React hooks tutorial",
"min_similarity": 0.5,
"hybrid_search": true,
"alpha": 0.5,
"limit": 10
}'
Using Python SDK
from scoutos import Scout

client = Scout(api_key="YOUR_API_KEY")

# Run a hybrid query against one table in a collection
results = client.tables.query(
    collection_id="col_abc123",
    table_id="tab_xyz789",
    search_term="customer support",
    min_similarity=0.5,
    hybrid_search=True,
    limit=10
)

# Print the title and vector distance of each returned record
for result in results:
    print(f"Title: {result['record']['attributes']['title']}")
    print(f"Distance: {result['details']['vector_distance']}")
Using TypeScript SDK
import { ScoutClient } from "scoutos";

const client = new ScoutClient({ apiKey: "YOUR_API_KEY" });

const results = await client.tables.query({
  collectionId: "col_abc123",
  tableId: "tab_xyz789",
  searchTerm: "customer support",
  minSimilarity: 0.5,
  hybridSearch: true,
  limit: 10
});

results.forEach(result => {
  console.log(`Title: ${result.record.attributes.title}`);
});
Advanced Filtering
Filters let you narrow down search results based on metadata columns. Filters use a JSON array format:
["column_id", "operator", "value"]
Available Operators
| Operator | Description | Example |
|---|---|---|
| Eq | Equal to | ["status", "Eq", "active"] |
| NotEq | Not equal to | ["status", "NotEq", "archived"] |
| In | In a list of values | ["category", "In", ["tutorial", "guide"]] |
| NotIn | Not in a list | ["category", "NotIn", ["draft", "archived"]] |
| Gt | Greater than | ["price", "Gt", 50] |
| Gte | Greater than or equal | ["views", "Gte", 1000] |
| Lt | Less than | ["created_at", "Lt", 1704067200] |
| Lte | Less than or equal | ["stock", "Lte", 10] |
| Glob | Pattern match (case-sensitive) | ["url", "Glob", "*/docs/*"] |
| IGlob | Pattern match (case-insensitive) | ["title", "IGlob", "*quick*"] |
| And | All filters must match | ["And", [[...], [...]]] |
| Or | Any filter must match | ["Or", [[...], [...]]] |
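To check how a filter expression behaves before sending it, you can evaluate it locally against sample records. The sketch below is a hypothetical client-side evaluator for the operators in the table above, not Scout's server-side logic. It assumes every referenced column is present in the record and uses Python's `fnmatch` for `Glob` and `IGlob`, whose pattern dialect may differ from Scout's.

```python
import operator
from fnmatch import fnmatchcase

# Map each leaf operator to a function of (actual column value, filter value).
OPERATORS = {
    "Eq": operator.eq,
    "NotEq": operator.ne,
    "In": lambda actual, values: actual in values,
    "NotIn": lambda actual, values: actual not in values,
    "Gt": operator.gt,
    "Gte": operator.ge,
    "Lt": operator.lt,
    "Lte": operator.le,
    "Glob": lambda actual, pattern: fnmatchcase(actual, pattern),
    "IGlob": lambda actual, pattern: fnmatchcase(actual.lower(), pattern.lower()),
}

def matches(record, flt):
    """Return True if a record (dict of column values) satisfies the filter."""
    if flt[0] in ("And", "Or"):
        combine = all if flt[0] == "And" else any
        return combine(matches(record, sub) for sub in flt[1])
    column, op, value = flt
    if op not in OPERATORS:
        raise ValueError(f"Unknown operator: {op}")
    return OPERATORS[op](record[column], value)

record = {
    "category": "tutorial",
    "status": "active",
    "views": 1500,
    "title": "Quick Start",
    "url": "https://example.com/docs/intro",
}
print(matches(record, ["And", [["category", "In", ["tutorial", "guide"]],
                               ["views", "Gte", 1000]]]))
```

Because `And` and `Or` take a list of filters, they can be nested to any depth, and the recursive call handles that without extra code.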
Filter Examples
Single Filter:
["category", "Eq", "tutorial"]
Date Range:
["And", [
  ["created_at", "Gte", 1672531200],
  ["created_at", "Lte", 1704067200]
]]
Multiple Categories:
["category", "In", ["tutorial", "guide", "reference"]]
Text Contains:
["title", "IGlob", "*getting started*"]
Combined Conditions:
["And", [
  ["category", "Eq", "tutorial"],
  ["difficulty", "In", ["beginner", "intermediate"]],
  ["created_at", "Gte", 1704067200]
]]
Using Filters in Workflows
Apply filters dynamically using Jinja templating:
Filters: ["category", "Eq", "{{inputs.category}}"]
For complex dynamic filters:
{% if inputs.show_archived %}
["category", "Eq", "{{inputs.category}}"]
{% else %}
["And", [["category", "Eq", "{{inputs.category}}"], ["status", "NotEq", "archived"]]]
{% endif %}
Minimum Similarity Threshold
The minimum similarity threshold filters out results that aren't relevant enough.
Scale:
| Value | Behavior |
|---|---|
| 0.0 - 0.4 | Broad results, includes marginal matches |
| 0.5 - 0.7 | Balanced precision and recall |
| 0.8 - 1.0 | Strict, only highly relevant results |
Choosing a Threshold:
- 0.3-0.4 for exploratory searches, finding all potentially relevant content
- 0.5-0.6 for general-purpose knowledge bases (recommended starting point)
- 0.7-0.8 for technical documentation where precision matters
- 0.8+ for finding exact or near-exact matches
Alpha Parameter (Hybrid Search)
The alpha parameter controls the balance between semantic and keyword search in hybrid mode.
Scale:
| Alpha | Behavior |
|---|---|
| 0.0 | Pure keyword search (BM25 only) |
| 0.3 | Mostly keyword, some semantic boost |
| 0.5 | Balanced (default) |
| 0.7 | Mostly semantic, some keyword precision |
| 1.0 | Pure semantic search |
When to Adjust:
- Lower alpha (0.2-0.4) for technical docs with specific terms
- Medium alpha (0.5) for general knowledge bases
- Higher alpha (0.7-0.9) for natural language queries and content discovery
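One way to picture how alpha shifts the balance is a weighted variant of RRF, where the semantic list is weighted by `alpha` and the keyword list by `1 - alpha`. This is an assumption about how the weighting could work, written to match the scale above (0.0 is pure keyword, 1.0 is pure semantic). The exact formula Scout applies is not documented here, and `weighted_rrf` and the document IDs are hypothetical.

```python
def weighted_rrf(semantic_ranking, keyword_ranking, alpha=0.5, k=60):
    """Fuse two ranked lists, weighting semantic by alpha and keyword by 1 - alpha."""
    scores = {}
    weighted_lists = ((alpha, semantic_ranking), (1.0 - alpha, keyword_ranking))
    for weight, ranking in weighted_lists:
        if weight == 0.0:
            continue  # a zero-weight list contributes no documents
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

semantic = ["concept-guide", "hooks-tutorial", "api-reference"]
keyword = ["api-reference", "hooks-tutorial", "changelog"]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, weighted_rrf(semantic, keyword, alpha=alpha))
```

At the two extremes the fused list collapses to one of the input rankings, and values in between let documents found by both methods outrank those found by only one.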
Best Practices
1. Start with Defaults
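Begin with these starting values, which differ from the block defaults of 0.35 for Minimum Similarity and false for Hybrid Search:
min_similarity: 0.5
hybrid_search: true
alpha: 0.5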
Adjust based on your results.
2. Use Hybrid Search
Hybrid search gives the best results for most use cases. It catches both exact matches and semantically related content.
3. Leverage Filters
Use metadata filters to:
- Narrow results to relevant categories
- Filter by date ranges
- Exclude unwanted status values
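The three uses above can be combined into one `And` filter. The sketch below builds such a filter in Python, converting dates to Unix timestamps in seconds as in the filter examples elsewhere on this page. The `build_filter` helper and the column names `category`, `created_at`, and `status` are illustrative and must match your own table schema.

```python
import json
from datetime import datetime, timezone

def to_unix(dt):
    """Convert a timezone-aware datetime to whole Unix seconds."""
    return int(dt.timestamp())

def build_filter(categories, created_from, created_to, excluded_status):
    """Build an And filter: category list, date range, and excluded status."""
    return ["And", [
        ["category", "In", list(categories)],
        ["created_at", "Gte", to_unix(created_from)],
        ["created_at", "Lte", to_unix(created_to)],
        ["status", "NotEq", excluded_status],
    ]]

flt = build_filter(
    ["tutorial", "guide"],
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    "archived",
)
print(json.dumps(flt))
```

Using timezone-aware datetimes avoids the timestamp shifting with the local timezone of the machine that builds the filter.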
4. Optimize Your Content
For better search results:
- Write comprehensive, descriptive content
- Use clear headings and structure
- Include relevant keywords naturally
- Keep metadata accurate and consistent
5. Test with Real Queries
Use actual user queries to tune:
- Similarity thresholds
- Alpha values
- Filter configurations
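A simple way to tune is to label which results are relevant for a handful of real queries, then measure precision and recall at several thresholds. The sketch below does this for one query with made-up similarity scores and relevance labels. The `precision_recall` helper is hypothetical, and treating the threshold as inclusive is an assumption.

```python
def precision_recall(scored_results, relevant_ids, min_similarity):
    """Precision and recall of the results kept at a given threshold."""
    returned = {doc_id for doc_id, sim in scored_results if sim >= min_similarity}
    if not returned:
        return 0.0, 0.0  # convention: nothing returned scores zero on both
    hits = len(returned & relevant_ids)
    return hits / len(returned), hits / len(relevant_ids)

# Made-up scores for one query, plus hand-labeled relevant documents.
scored = [("doc_a", 0.91), ("doc_b", 0.66), ("doc_c", 0.52), ("doc_d", 0.38)]
relevant = {"doc_a", "doc_b", "doc_d"}

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall(scored, relevant, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Raising the threshold trades recall for precision, so the right value depends on whether missing a relevant document or showing an irrelevant one is the costlier mistake for your application.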
Common Queries
Find Similar Documents
{
  "search_term": "{{document.content}}",
  "min_similarity": 0.7,
  "limit": 5
}
Recent Tutorials
{
  "search_term": "{{user_query}}",
  "filters": ["And", [
    ["category", "Eq", "tutorial"],
    ["created_at", "Gte", {{thirty_days_ago}}]
  ]],
  "limit": 10
}
Exclude Drafts
{
  "search_term": "{{user_query}}",
  "filters": ["status", "NotEq", "draft"],
  "limit": 10
}
Multi-Category Search
{
  "search_term": "{{user_query}}",
  "filters": ["category", "In", ["tutorial", "guide", "reference"]],
  "limit": 15
}
Next Steps
- Sources: Automate data syncing
- Creating Collections: Set up your schemas
- Workflows: Build AI-powered applications