Querying Scout Databases: Semantic and Hybrid Search

Querying is where Databases pay off. You’ve stored your documents, your embeddings are built, and now your agents and workflows need to retrieve the right information at the right time. Scout gives you three search modes — semantic, keyword, and hybrid — along with metadata filters and tunable thresholds so you can dial in exactly the results you need.

Query Modes

| Mode | How It Works | Best For |
| --- | --- | --- |
| Semantic (Vector) | Converts your query to an embedding and finds documents with similar vectors | Natural language questions, related concepts |
| Keyword (BM25) | Matches exact keywords using traditional full-text search | Product codes, SKUs, technical identifiers |
| Hybrid | Fuses both result sets using Reciprocal Rank Fusion (RRF) | General-purpose production search |

Semantic Search

Semantic search converts your query into a vector embedding and returns documents whose embeddings are closest in vector space. It finds relevant content even when the user’s words don’t appear verbatim in the document.

Query: "how do I reset my password"
Finds: Documents about "password recovery", "account access", "login issues"

Use semantic search for conversational interfaces, synonym-rich content, and queries where users express intent in their own words.
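
To make "closest in vector space" concrete, here is a minimal sketch that ranks three toy documents against a query vector. The vectors, the document names, and the choice of cosine similarity are illustrative assumptions; this page does not state which embedding model or distance metric Scout uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not real model output).
documents = {
    "password recovery": [0.9, 0.1, 0.0],
    "login issues": [0.7, 0.3, 0.1],
    "billing and invoices": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.0]  # stands in for "how do I reset my password"

# Rank documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)
print(ranked)
```

The two account-related documents rank ahead of the billing document even though none of them shares a word with the query, which is the behavior described above.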

Keyword Search (BM25)

Keyword search uses the BM25 algorithm to find documents containing exact keyword matches. It’s the same mechanism behind traditional full-text search.

Query: "API_KEY_12345"
Finds: Only documents containing "API_KEY_12345" exactly

Use keyword search when users search for specific identifiers, error codes, product names, or other terms where exact matching matters more than semantic similarity.
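
For intuition about why an identifier like API_KEY_12345 only surfaces documents that contain it, the following is a small, self-contained sketch of Okapi BM25 scoring. The whitespace tokenizer and the k1 and b values are common textbook choices used here as assumptions; Scout's actual tokenization and parameters are not documented on this page.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_counts[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "rotate the credential API_KEY_12345 before Friday",
    "how to rotate credentials safely",
    "billing overview for new customers",
]
scores = bm25_scores("API_KEY_12345", docs)
print(scores)
```

Only the first document receives a non-zero score. The second is topically related but never mentions the identifier, so a pure keyword pass ignores it.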

Hybrid Search

Hybrid search runs both a semantic pass and a keyword pass, then merges and re-ranks the results using Reciprocal Rank Fusion (RRF). You get the precision of keyword matching alongside the conceptual coverage of vector search in a single ranked result set.

Query: "React hooks tutorial"
Finds:
  - Documents with "React", "hooks", "tutorial" (keyword match)
  - Documents about "state management in React" (semantic match)

Hybrid search is the recommended default for most production applications. It handles mixed query styles — proper nouns blended with natural language — better than either mode alone.
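
The merge step can be sketched in a few lines. Reciprocal Rank Fusion gives each document a score of 1 / (k + rank) from every list it appears in and sums those scores. The constant k = 60 is the value commonly used in the literature and is an assumption here, as are the document IDs; Scout's internal constant is not stated on this page.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; each list is ordered best-first."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical result lists for "React hooks tutorial".
semantic_ranking = ["doc_state_mgmt", "doc_hooks_tutorial", "doc_redux"]
keyword_ranking = ["doc_hooks_tutorial", "doc_react_changelog", "doc_state_mgmt"]

fused_ranking = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused_ranking)
```

Documents that appear near the top of both lists rise to the top of the fused list, while documents found by only one pass are kept but ranked lower.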

Query Parameters

search_term (string, required)

The query string. In workflow blocks this supports Jinja templating, for example {{inputs.user_question}}.

min_similarity (number, default: 0.35)

Minimum relevance threshold. Results below this score are excluded. Range is 0.0 to 1.0. See Tuning min_similarity for guidance.

limit (integer, default: 10)

Maximum number of results to return.

hybrid_search (boolean, default: false)

When true, enables hybrid mode: results from the semantic and keyword passes are fused using RRF.

alpha (number, default: 0.5)

Controls the balance between semantic and keyword search in hybrid mode. 0.0 = pure keyword, 1.0 = pure semantic. Ignored when hybrid_search is false.

filters (array, optional)

Filter results by metadata column values. See Filtering by Metadata for the full syntax.
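
If you assemble query bodies in code, it can help to validate the ranges above before sending anything. The helper below is a hypothetical convenience function, not part of any Scout SDK; its defaults mirror the documented ones, and omitting alpha outside hybrid mode is a design choice of this sketch.

```python
import json

def build_query_payload(search_term, min_similarity=0.35, limit=10,
                        hybrid_search=False, alpha=0.5, filters=None):
    """Validate query parameters and return a JSON-serializable dict."""
    if not search_term:
        raise ValueError("search_term is required")
    if not 0.0 <= min_similarity <= 1.0:
        raise ValueError("min_similarity must be between 0.0 and 1.0")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    payload = {
        "search_term": search_term,
        "min_similarity": min_similarity,
        "limit": limit,
    }
    if hybrid_search:
        # alpha is ignored unless hybrid mode is on, so only send it then.
        payload["hybrid_search"] = True
        payload["alpha"] = alpha
    if filters is not None:
        payload["filters"] = filters
    return payload

payload = build_query_payload("React hooks tutorial", min_similarity=0.5, hybrid_search=True)
print(json.dumps(payload, indent=2))
```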

Querying via Workflow Blocks

Add a Query Database Table block to any workflow to search your data at runtime.

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| Database | The Database to query | Required |
| Table | The Table within the Database | Required |
| Search Term | Query string; supports Jinja templating | Required |
| Minimum Similarity | Relevance threshold (0–1) | `0.35` |
| Hybrid Search | Enable RRF fusion | `false` |
| Alpha | Semantic vs. keyword balance (0–1) | `0.5` |
| Filters | Metadata filter expression | Optional |
| Limit | Max results to return | `10` |

Example Block Configuration

Database: Knowledge Base
Table: Documentation
Search Term: "{{inputs.user_question}}"
Minimum Similarity: 0.5
Hybrid Search: true
Alpha: 0.5
Limit: 10

Working with Query Results

The block returns an array of result objects. Each result contains a details object with relevance scores and a record object with the document’s fields:

[
  {
    "details": {
      "vector_distance": 0.15,
      "hybrid_score": 0.87
    },
    "record": {
      "id": "doc_abc123",
      "attributes": {
        "title": "Getting Started Guide",
        "content": "This guide walks you through...",
        "category": "tutorial",
        "url": "https://docs.example.com/getting-started"
      }
    }
  }
]

details.vector_distance (number)

Distance from the query in vector space. Lower values indicate higher similarity. 0.0 is identical; 1.0 is completely unrelated. Present on all results.

details.hybrid_score (number)

Fused relevance score when hybrid_search: true. Higher is better. null for pure semantic queries.

record.id (string)

The unique document identifier.

record.attributes (object)

All metadata fields stored on the document, keyed by column name.

Access results in downstream blocks with Jinja:

{{ query_results.output[0].record.attributes.title }}

Handle empty results gracefully:

{% if query_results.output | length > 0 %}
  {{ query_results.output[0].record.attributes.content }}
{% else %}
  I couldn't find anything relevant. Try rephrasing your question.
{% endif %}
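
The same result shape can be handled in application code. The sketch below walks the array shown above, tolerates missing fields, and drops weak matches. The function name and the 0.5 distance cutoff are illustrative assumptions (the cutoff follows the troubleshooting guidance later on this page).

```python
def summarize_results(results, max_distance=0.5):
    """Return (title, url) pairs for results at or below max_distance."""
    summary = []
    for result in results:
        details = result.get("details", {})
        attributes = result.get("record", {}).get("attributes", {})
        # Treat a missing distance as the worst case so the result is skipped.
        if details.get("vector_distance", 1.0) > max_distance:
            continue
        summary.append((attributes.get("title", "(untitled)"), attributes.get("url")))
    return summary

sample = [
    {
        "details": {"vector_distance": 0.15, "hybrid_score": 0.87},
        "record": {
            "id": "doc_abc123",
            "attributes": {
                "title": "Getting Started Guide",
                "content": "This guide walks you through...",
                "category": "tutorial",
                "url": "https://docs.example.com/getting-started",
            },
        },
    }
]

print(summarize_results(sample))
print(summarize_results([]))  # an empty result set yields an empty list
```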

Querying via API

Basic Semantic Query

curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "search_term": "customer support",
    "min_similarity": 0.5,
    "limit": 10
  }'

Hybrid Search Query

curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "search_term": "React hooks tutorial",
    "min_similarity": 0.5,
    "hybrid_search": true,
    "alpha": 0.5,
    "limit": 10
  }'
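
If you would rather not depend on an SDK, the curl calls above translate directly to Python's standard library. This sketch only builds the request object so it can run without network access; the helper name is hypothetical, while the URL pattern, headers, and body fields are taken from the curl examples.

```python
import json
import urllib.request

API_BASE = "https://api.scoutos.com/v2"

def build_query_request(collection_id, table_id, payload, api_key):
    """Build (but do not send) the POST request shown in the curl examples."""
    url = f"{API_BASE}/collections/{collection_id}/tables/{table_id}/query"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_query_request(
    "col_abc123",
    "tab_xyz789",
    {
        "search_term": "React hooks tutorial",
        "min_similarity": 0.5,
        "hybrid_search": True,
        "alpha": 0.5,
        "limit": 10,
    },
    api_key="YOUR_API_KEY",
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen().
```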

Using the Python SDK

from scoutos import Scout

client = Scout(api_key="YOUR_API_KEY")

results = client.tables.query(
    collection_id="col_abc123",
    table_id="tab_xyz789",
    search_term="customer support",
    min_similarity=0.5,
    hybrid_search=True,
    limit=10
)

for result in results:
    print(f"Title: {result['record']['attributes']['title']}")
    print(f"Distance: {result['details']['vector_distance']}")

Using the TypeScript SDK

import { ScoutClient } from "scoutos";

const client = new ScoutClient({ apiKey: "YOUR_API_KEY" });

const results = await client.tables.query({
  collectionId: "col_abc123",
  tableId: "tab_xyz789",
  searchTerm: "customer support",
  minSimilarity: 0.5,
  hybridSearch: true,
  limit: 10
});

results.forEach(result => {
  console.log(`Title: ${result.record.attributes.title}`);
  console.log(`Score: ${result.details.hybrid_score}`);
});

Filtering by Metadata

Metadata filters let you narrow results using column values before or after the similarity ranking step. Filters use a JSON array with the format ["column_id", "operator", "value"].

Available Operators

| Operator | Description | Example |
| --- | --- | --- |
| `Eq` | Equal to | `["status", "Eq", "active"]` |
| `NotEq` | Not equal to | `["status", "NotEq", "archived"]` |
| `In` | Value is in a list | `["category", "In", ["tutorial", "guide"]]` |
| `NotIn` | Value is not in a list | `["category", "NotIn", ["draft", "archived"]]` |
| `Gt` | Greater than | `["price", "Gt", 50]` |
| `Gte` | Greater than or equal | `["views", "Gte", 1000]` |
| `Lt` | Less than | `["created_at", "Lt", 1704067200]` |
| `Lte` | Less than or equal | `["stock", "Lte", 10]` |
| `Glob` | Pattern match (case-sensitive) | `["url", "Glob", "*/docs/*"]` |
| `IGlob` | Pattern match (case-insensitive) | `["title", "IGlob", "*quick start*"]` |
| `And` | All sub-filters must match | `["And", [[...], [...]]]` |
| `Or` | Any sub-filter must match | `["Or", [[...], [...]]]` |

Filter Examples

Single column filter:

["category", "Eq", "tutorial"]

Date range:

["And", [
  ["created_at", "Gte", 1672531200],
  ["created_at", "Lte", 1704067200]
]]

Multiple categories:

["category", "In", ["tutorial", "guide", "reference"]]

Case-insensitive title match:

["title", "IGlob", "*getting started*"]

Combined conditions:

["And", [
  ["category", "Eq", "tutorial"],
  ["difficulty", "In", ["beginner", "intermediate"]],
  ["created_at", "Gte", 1704067200]
]]
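
To check a filter expression before using it in a query, you can evaluate it locally against a sample row. The evaluator below is a rough sketch of the operator semantics listed above, not Scout's implementation. It assumes glob patterns behave like Python's fnmatch and that a missing column never satisfies a pattern or ordering comparison; both are assumptions.

```python
import operator as op
from fnmatch import fnmatchcase

COMPARISONS = {"Gt": op.gt, "Gte": op.ge, "Lt": op.lt, "Lte": op.le}

def matches(filter_expr, row):
    """Evaluate a filter expression against one row (a dict of column values)."""
    if filter_expr[0] in ("And", "Or") and len(filter_expr) == 2:
        combine = all if filter_expr[0] == "And" else any
        return combine(matches(sub, row) for sub in filter_expr[1])

    column, operator_name, value = filter_expr
    actual = row.get(column)
    if operator_name == "Eq":
        return actual == value
    if operator_name == "NotEq":
        return actual != value
    if operator_name == "In":
        return actual in value
    if operator_name == "NotIn":
        return actual not in value
    if actual is None:
        return False  # remaining operators need a value to compare
    if operator_name == "Glob":
        return fnmatchcase(str(actual), value)
    if operator_name == "IGlob":
        return fnmatchcase(str(actual).lower(), value.lower())
    if operator_name in COMPARISONS:
        return COMPARISONS[operator_name](actual, value)
    raise ValueError(f"unknown operator: {operator_name}")

row = {
    "category": "tutorial",
    "difficulty": "beginner",
    "created_at": 1706745600,
    "title": "Getting Started with Scout",
}
combined = ["And", [
    ["category", "Eq", "tutorial"],
    ["difficulty", "In", ["beginner", "intermediate"]],
    ["created_at", "Gte", 1704067200],
]]
print(matches(combined, row))
print(matches(["title", "IGlob", "*getting started*"], row))
print(matches(["title", "Glob", "*getting started*"], row))
```

The last two calls show the difference between the case-insensitive and case-sensitive pattern operators on the same title.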

Using Filters in Workflow Blocks

Apply filters dynamically with Jinja templating:

Filters: ["category", "Eq", "{{inputs.category}}"]

For conditional filter logic:

{% if inputs.show_archived %}
["category", "Eq", "{{inputs.category}}"]
{% else %}
["And", [["category", "Eq", "{{inputs.category}}"], ["status", "NotEq", "archived"]]]
{% endif %}
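
When filters are assembled in application code rather than in a template, building them as lists and serializing with a JSON library avoids quoting mistakes if a value contains a quote character. The helper names below are hypothetical, and the sketch assumes created_at holds Unix timestamps in seconds, as the examples above suggest.

```python
import json
from datetime import datetime, timezone

def to_unix(year, month, day):
    """Midnight UTC on the given date as a Unix timestamp in seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

def build_filter(category, show_archived=False, created_since=None):
    """Combine optional conditions into a single filter expression."""
    conditions = [["category", "Eq", category]]
    if not show_archived:
        conditions.append(["status", "NotEq", "archived"])
    if created_since is not None:
        conditions.append(["created_at", "Gte", created_since])
    # A lone condition needs no "And" wrapper.
    return conditions[0] if len(conditions) == 1 else ["And", conditions]

print(json.dumps(build_filter("tutorial", show_archived=True)))
print(json.dumps(build_filter("tutorial", created_since=to_unix(2024, 1, 1))))
```

The timestamp 1704067200 used throughout this page corresponds to 2024-01-01 00:00:00 UTC, and 1672531200 to 2023-01-01 00:00:00 UTC.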

Tuning min_similarity

The min_similarity threshold cuts off results below a relevance score. Start with 0.5 and adjust based on what you observe.

| Value Range | Behavior |
| --- | --- |
| `0.0 – 0.4` | Broad: includes marginal and loosely related matches |
| `0.5 – 0.7` | Balanced: a good starting point for most knowledge bases |
| `0.8 – 1.0` | Strict: only highly relevant, near-exact matches |

Practical guidance:

Use 0.3–0.4 for exploratory search or content discovery.
Use 0.5–0.6 as a general-purpose default for production.
Use 0.7–0.8 for technical documentation where precision matters.
Use 0.8+ when you need near-exact semantic matches.
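
One way to pick a threshold from evidence rather than intuition is to label a handful of results for a real query and see how precision and recall move as the cutoff changes. The scores and labels below are invented for illustration, and the sketch assumes each result carries a similarity score where higher means more relevant.

```python
def precision_recall(scored_results, threshold):
    """scored_results: list of (similarity_score, is_relevant) pairs."""
    kept = [relevant for score, relevant in scored_results if score >= threshold]
    total_relevant = sum(1 for _, relevant in scored_results if relevant)
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_relevant if total_relevant else 1.0
    return precision, recall

# Hypothetical hand-labeled results for one real user query.
labeled = [
    (0.91, True), (0.78, True), (0.62, True), (0.55, False),
    (0.48, True), (0.41, False), (0.33, False),
]

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall(labeled, threshold)
    print(f"min_similarity={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy data, raising the threshold from 0.3 to 0.7 lifts precision from 0.57 to 1.00 while recall falls from 1.00 to 0.50, which is the trade-off the ranges above describe.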

Tuning Alpha (Hybrid Mode)

The alpha parameter shifts the balance between semantic and keyword scoring in hybrid mode.

| Alpha | Behavior |
| --- | --- |
| `0.0` | Pure keyword (BM25 only) |
| `0.3` | Mostly keyword with a semantic boost |
| `0.5` | Balanced: the recommended default |
| `0.7` | Mostly semantic with keyword precision |
| `1.0` | Pure semantic |

When to adjust:

Lower alpha (0.2–0.4) for technical docs with specific identifiers and exact terms.
Medium alpha (0.5) for general knowledge bases — start here.
Higher alpha (0.7–0.9) for natural language conversations and content discovery.
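
The effect of alpha is easiest to see on a toy example. The sketch below weights each pass's reciprocal-rank contribution by alpha and 1 - alpha. This weighting scheme, the constant k = 60, and the document IDs are assumptions made for illustration; this page does not specify how Scout applies alpha internally.

```python
def weighted_rrf(semantic_ranking, keyword_ranking, alpha=0.5, k=60):
    """Blend two best-first rankings; alpha weights the semantic side."""
    scores = {}
    for rank, doc_id in enumerate(semantic_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank)
    for rank, doc_id in enumerate(keyword_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# The two passes disagree completely about the best document.
semantic = ["concept_doc", "mixed_doc", "exact_id_doc"]
keyword = ["exact_id_doc", "mixed_doc", "concept_doc"]

for alpha in (0.0, 0.2, 0.8, 1.0):
    print(alpha, weighted_rrf(semantic, keyword, alpha=alpha))
```

Low alpha values put the exact-identifier match first, and high values put the conceptually related document first, matching the guidance above.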

Using Databases with Agents

1. Enable Databases Tools

In your agent’s Tools tab, enable the Databases query capability.

2. Add an Instruction Snippet

For questions that require internal knowledge:

Query Databases before answering.
Start with hybrid search using min_similarity: 0.5 and alpha: 0.5.
If results are noisy, increase min_similarity.
If the user gives constraints like category, date, or status, apply metadata filters.
Return a concise answer, then include the key supporting records.

3. Prompt Examples

“Find troubleshooting steps for SSO login failures from the IT docs table.”
“Search only category = policy and summarize PTO policy changes since Jan. 1.”
“Query the sales enablement table for pricing objection handling and give me three approved responses.”

Common Query Patterns

Find Similar Documents

{
  "search_term": "{{document.content}}",
  "min_similarity": 0.7,
  "limit": 5
}

Recent Tutorials Only

{
  "search_term": "{{user_query}}",
  "filters": ["And", [
    ["category", "Eq", "tutorial"],
    ["created_at", "Gte", 1704067200]
  ]],
  "limit": 10
}

Exclude Drafts

{
  "search_term": "{{user_query}}",
  "filters": ["status", "NotEq", "draft"],
  "limit": 10
}

Multi-Category Hybrid Search

{
  "search_term": "{{user_query}}",
  "hybrid_search": true,
  "filters": ["category", "In", ["tutorial", "guide", "reference"]],
  "limit": 15
}

Troubleshooting

Getting no results

Lower min_similarity — try 0.3 to cast a wider net.
Confirm your table has indexed documents by checking the row count in Scout Studio.
Try a simple, general search term to confirm the data is reachable.
Temporarily remove filters to check whether a filter condition is too restrictive.
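
These checks can also be automated as a fallback ladder: loosen the threshold step by step, then drop the filters as a last resort. In this sketch, run_query is a hypothetical callable standing in for whatever client call you use, and the threshold sequence is an illustrative choice.

```python
def query_with_fallback(run_query, search_term, filters=None,
                        thresholds=(0.5, 0.4, 0.3)):
    """Try progressively looser settings until something comes back.

    run_query is a hypothetical callable:
    (search_term, min_similarity, filters) -> list of results.
    Returns (results, settings_used), or ([], None) if nothing matched.
    """
    for min_similarity in thresholds:
        results = run_query(search_term, min_similarity, filters)
        if results:
            return results, {"min_similarity": min_similarity, "filters": filters}
    if filters is not None:
        # Last resort: drop the filters to see whether they were too restrictive.
        results = run_query(search_term, thresholds[-1], None)
        if results:
            return results, {"min_similarity": thresholds[-1], "filters": None}
    return [], None

# Stand-in for a real query call, so the sketch runs without network access.
def fake_query(search_term, min_similarity, filters):
    return ["doc_abc123"] if min_similarity <= 0.3 and filters is None else []

results, used = query_with_fallback(
    fake_query, "sso login", filters=["status", "Eq", "active"]
)
print(results, used)
```

Returning the settings that finally produced results makes it easier to tell whether the threshold or the filter was the cause of the empty response.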

Too many irrelevant results

Raise min_similarity to 0.6 or higher.
Add metadata filters to scope results to the right category or status.
Lower alpha toward 0.3 if your query uses specific terms that should match exactly.

Missing an obvious match

Confirm the document is in the correct table.
Check for typos in filter values — Eq and In operators are case-sensitive. Use IGlob for case-insensitive text matching.
Try hybrid search if you’ve been using semantic-only.
Re-sync your data source if the record was added recently and may not be indexed yet.

Low scores on relevant results

vector_distance values above 0.5 generally indicate weak semantic alignment. This usually has one of two causes:

Your query phrasing doesn’t match how the content is written.
The content is too short or generic to embed well.

In either case, consider enriching your documents with more descriptive text and re-syncing.

Best Practices

Start from a baseline: set min_similarity to 0.5, hybrid_search to true, and alpha to 0.5 explicitly (the parameter defaults are 0.35, false, and 0.5), then adjust based on observed results.
Prefer hybrid search: it handles mixed query styles better than pure semantic or pure keyword search for most real-world queries.
Use metadata filters: scoping to category, date, or status improves precision without sacrificing recall within the relevant subset.
Write rich content: comprehensive, descriptive text in the content field produces better embeddings and more accurate retrieval.
Test with real queries: use actual user questions (not synthetic ones) to tune thresholds and filter configurations.

Next Steps

Databases Overview

Understand the Databases data model and when to use it.

Creating Databases

Set up schemas optimized for search quality.

Sources

Keep database data fresh with automated syncs.
