Querying is where Databases pay off. You’ve stored your documents, your embeddings are built, and now your agents and workflows need to retrieve the right information at the right time. Scout gives you three search modes — semantic, keyword, and hybrid — along with metadata filters and tunable thresholds so you can dial in exactly the results you need.

Query Modes

Mode | How It Works | Best For
Semantic (Vector) | Converts your query to an embedding and finds documents with similar vectors | Natural language questions, related concepts
Keyword (BM25) | Matches exact keywords using traditional full-text search | Product codes, SKUs, technical identifiers
Hybrid | Fuses both result sets using Reciprocal Rank Fusion (RRF) | General-purpose production search

Semantic Search (Vector)

Semantic search converts your query into a vector embedding and returns documents whose embeddings are closest in vector space. It finds relevant content even when the user’s words don’t appear verbatim in the document.
Query: "how do I reset my password"
Finds: Documents about "password recovery", "account access", "login issues"
Use semantic search for conversational interfaces, synonym-rich content, and queries where users express intent in their own words.
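To make "closest in vector space" concrete, here is a minimal sketch using hand-made three-dimensional vectors. Real embeddings have hundreds or thousands of dimensions and come from an embedding model; the vectors, the document titles, and the choice of cosine distance are illustrative assumptions, not a description of Scout's internals.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values chosen by hand for illustration).
documents = {
    "Password recovery": [0.9, 0.1, 0.0],
    "Login issues": [0.7, 0.3, 0.1],
    "Quarterly revenue report": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.0]  # stands in for "how do I reset my password"

# Smallest distance first: the closest documents are the best matches.
ranked = sorted(documents, key=lambda title: cosine_distance(query_vector, documents[title]))
for title in ranked:
    print(f"{cosine_distance(query_vector, documents[title]):.3f}  {title}")
```

The two account-related documents land close to the query even though none of them contains the word "reset", while the unrelated report ends up far away.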

Keyword Search (BM25)

Keyword search uses the BM25 algorithm to find documents containing exact keyword matches. It’s the same mechanism behind traditional full-text search.
Query: "API_KEY_12345"
Finds: Only documents containing "API_KEY_12345" exactly
Use keyword search when users search for specific identifiers, error codes, product names, or other terms where exact matching matters more than semantic similarity.

Hybrid Search

Hybrid search runs both a semantic pass and a keyword pass, then merges and re-ranks the results using Reciprocal Rank Fusion (RRF). You get the precision of keyword matching alongside the conceptual coverage of vector search in a single ranked result set.
Query: "React hooks tutorial"
Finds:
  - Documents with "React", "hooks", "tutorial" (keyword match)
  - Documents about "state management in React" (semantic match)
Hybrid search is the recommended default for most production applications. It handles mixed query styles — proper nouns blended with natural language — better than either mode alone.
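The fusion step can be sketched in a few lines. The following is a generic Reciprocal Rank Fusion implementation, not Scout's actual code: the constant k=60 is a commonly used value and an assumption here, and the document IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1). k=60 is a commonly used constant, assumed here.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists for the query "React hooks tutorial".
semantic_hits = ["state-management", "hooks-tutorial", "react-intro"]
keyword_hits = ["hooks-tutorial", "hooks-api-reference", "react-intro"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that appear in both lists collect score from each, so they rise above documents that rank first in only one list.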

Query Parameters

search_term
string
required
The query string. In workflow blocks this supports Jinja templating, for example {{inputs.user_question}}.
min_similarity
number
default:"0.35"
Minimum relevance threshold. Results below this score are excluded. Range is 0.0 to 1.0. See Tuning min_similarity for guidance.
limit
integer
default:"10"
Maximum number of results to return.
hybrid_search
boolean
default:"false"
When true, enables hybrid mode: results from the semantic and keyword passes are fused using RRF.
alpha
number
default:"0.5"
Controls the balance between semantic and keyword search in hybrid mode. 0.0 = pure keyword, 1.0 = pure semantic. Ignored when hybrid_search is false.
filters
array
Filter results by metadata column values. See Filtering by Metadata for the full syntax.
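If you assemble query bodies in your own code, a small helper can catch out-of-range values before a request is sent. This is a sketch under assumptions: `build_query` is a hypothetical helper, not part of any Scout SDK, and the rule that limit must be at least 1 is a guess. The parameter names, defaults, and 0.0 to 1.0 ranges come from the list above.

```python
def build_query(search_term, min_similarity=0.35, limit=10,
                hybrid_search=False, alpha=0.5, filters=None):
    """Validate parameters and return a JSON-serializable query body."""
    if not search_term:
        raise ValueError("search_term is required")
    if not 0.0 <= min_similarity <= 1.0:
        raise ValueError("min_similarity must be between 0.0 and 1.0")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    if limit < 1:  # assumed lower bound; not stated in the parameter list
        raise ValueError("limit must be at least 1")

    body = {"search_term": search_term, "min_similarity": min_similarity, "limit": limit}
    if hybrid_search:
        # alpha only matters in hybrid mode, so it is sent only then.
        body["hybrid_search"] = True
        body["alpha"] = alpha
    if filters is not None:
        body["filters"] = filters
    return body

print(build_query("React hooks tutorial", min_similarity=0.5, hybrid_search=True))
```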

Querying via Workflow Blocks

Add a Query Database Table block to any workflow to search your data at runtime.

Configuration

Parameter | Description | Default
Database | The Database to query | Required
Table | The Table within the Database | Required
Search Term | Query string; supports Jinja templating | Required
Minimum Similarity | Relevance threshold (0–1) | 0.35
Hybrid Search | Enable RRF fusion | false
Alpha | Semantic vs. keyword balance (0–1) | 0.5
Filters | Metadata filter expression | Optional
Limit | Max results to return | 10

Example Block Configuration

Database: Knowledge Base
Table: Documentation
Search Term: "{{inputs.user_question}}"
Minimum Similarity: 0.5
Hybrid Search: true
Alpha: 0.5
Limit: 10

Working with Query Results

The block returns an array of result objects. Each result contains a details object with relevance scores and a record object with the document’s fields:
[
  {
    "details": {
      "vector_distance": 0.15,
      "hybrid_score": 0.87
    },
    "record": {
      "id": "doc_abc123",
      "attributes": {
        "title": "Getting Started Guide",
        "content": "This guide walks you through...",
        "category": "tutorial",
        "url": "https://docs.example.com/getting-started"
      }
    }
  }
]
details.vector_distance
number
Distance from the query in vector space. Lower values indicate higher similarity. 0.0 is identical; 1.0 is completely unrelated. Present on all results.
details.hybrid_score
number
Fused relevance score when hybrid_search: true. Higher is better. null for pure semantic queries.
record.id
string
The unique document identifier.
record.attributes
object
All metadata fields stored on the document, keyed by column name.
Access results in downstream blocks with Jinja:
{{ query_results.output[0].record.attributes.title }}
Handle empty results gracefully:
{% if query_results.output | length > 0 %}
  {{ query_results.output[0].record.attributes.content }}
{% else %}
  I couldn't find anything relevant. Try rephrasing your question.
{% endif %}
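The same empty-result handling applies when you process results in code rather than in Jinja. The sketch below assumes the result shape shown above; `best_match`, its `max_distance` cutoff, and the second sample record are hypothetical and exist only for illustration.

```python
def best_match(results, max_distance=0.5):
    """Return the attributes of the closest result, or None if nothing qualifies."""
    if not results:
        return None
    # Lower vector_distance means higher similarity.
    closest = min(results, key=lambda r: r["details"]["vector_distance"])
    if closest["details"]["vector_distance"] > max_distance:
        return None
    return closest["record"]["attributes"]

sample_results = [
    {"details": {"vector_distance": 0.15, "hybrid_score": 0.87},
     "record": {"id": "doc_abc123", "attributes": {"title": "Getting Started Guide"}}},
    {"details": {"vector_distance": 0.42, "hybrid_score": 0.55},
     "record": {"id": "doc_def456", "attributes": {"title": "Advanced Configuration"}}},
]

match = best_match(sample_results)
print(match["title"] if match else "I couldn't find anything relevant.")
```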

Querying via API

Basic Semantic Query

curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "search_term": "customer support",
    "min_similarity": 0.5,
    "limit": 10
  }'

Hybrid Search Query

curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "search_term": "React hooks tutorial",
    "min_similarity": 0.5,
    "hybrid_search": true,
    "alpha": 0.5,
    "limit": 10
  }'

Using the Python SDK

from scoutos import Scout

client = Scout(api_key="YOUR_API_KEY")

results = client.tables.query(
    collection_id="col_abc123",
    table_id="tab_xyz789",
    search_term="customer support",
    min_similarity=0.5,
    hybrid_search=True,
    limit=10
)

for result in results:
    print(f"Title: {result['record']['attributes']['title']}")
    print(f"Distance: {result['details']['vector_distance']}")

Using the TypeScript SDK

import { ScoutClient } from "scoutos";

const client = new ScoutClient({ apiKey: "YOUR_API_KEY" });

const results = await client.tables.query({
  collectionId: "col_abc123",
  tableId: "tab_xyz789",
  searchTerm: "customer support",
  minSimilarity: 0.5,
  hybridSearch: true,
  limit: 10
});

results.forEach(result => {
  console.log(`Title: ${result.record.attributes.title}`);
  console.log(`Score: ${result.details.hybrid_score}`);
});

Filtering by Metadata

Metadata filters let you narrow results using column values before or after the similarity ranking step. Filters use a JSON array with the format ["column_id", "operator", "value"].

Available Operators

Operator | Description | Example
Eq | Equal to | ["status", "Eq", "active"]
NotEq | Not equal to | ["status", "NotEq", "archived"]
In | Value is in a list | ["category", "In", ["tutorial", "guide"]]
NotIn | Value is not in a list | ["category", "NotIn", ["draft", "archived"]]
Gt | Greater than | ["price", "Gt", 50]
Gte | Greater than or equal | ["views", "Gte", 1000]
Lt | Less than | ["created_at", "Lt", 1704067200]
Lte | Less than or equal | ["stock", "Lte", 10]
Glob | Pattern match (case-sensitive) | ["url", "Glob", "*/docs/*"]
IGlob | Pattern match (case-insensitive) | ["title", "IGlob", "*quick start*"]
And | All sub-filters must match | ["And", [[...], [...]]]
Or | Any sub-filter must match | ["Or", [[...], [...]]]
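One way to reason about what a filter expression will match is to evaluate it locally against a sample record. The evaluator below is a rough sketch, not Scout's implementation: it approximates Glob and IGlob with Python's `fnmatchcase`, whose wildcard rules may differ from Scout's, and the sample record is made up.

```python
from fnmatch import fnmatchcase

# Each operator maps to a function of (column value, filter value).
COMPARISONS = {
    "Eq": lambda field, value: field == value,
    "NotEq": lambda field, value: field != value,
    "In": lambda field, value: field in value,
    "NotIn": lambda field, value: field not in value,
    "Gt": lambda field, value: field > value,
    "Gte": lambda field, value: field >= value,
    "Lt": lambda field, value: field < value,
    "Lte": lambda field, value: field <= value,
    "Glob": lambda field, value: fnmatchcase(field, value),
    "IGlob": lambda field, value: fnmatchcase(field.lower(), value.lower()),
}

def matches(record, flt):
    """Return True if a record (dict of column values) satisfies the filter."""
    if flt[0] == "And":
        return all(matches(record, sub) for sub in flt[1])
    if flt[0] == "Or":
        return any(matches(record, sub) for sub in flt[1])
    column, operator, value = flt
    return COMPARISONS[operator](record[column], value)

record = {"category": "tutorial", "status": "active", "views": 1500,
          "url": "https://example.com/docs/intro", "title": "Quick Start Guide"}

print(matches(record, ["And", [["status", "NotEq", "archived"], ["views", "Gte", 1000]]]))
print(matches(record, ["title", "Glob", "*quick start*"]))   # case-sensitive
print(matches(record, ["title", "IGlob", "*quick start*"]))  # case-insensitive
```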

Filter Examples

Single column filter:
["category", "Eq", "tutorial"]
Date range:
["And", [
  ["created_at", "Gte", 1672531200],
  ["created_at", "Lte", 1704067200]
]]
Multiple categories:
["category", "In", ["tutorial", "guide", "reference"]]
Case-insensitive title match:
["title", "IGlob", "*getting started*"]
Combined conditions:
["And", [
  ["category", "Eq", "tutorial"],
  ["difficulty", "In", ["beginner", "intermediate"]],
  ["created_at", "Gte", 1704067200]
]]
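The numeric values in the date examples above correspond to Unix timestamps in seconds at midnight UTC (1672531200 is 2023-01-01 and 1704067200 is 2024-01-01). Assuming your created_at column is stored the same way, a pair of small helpers (hypothetical names, not part of any SDK) can generate such filters from calendar dates:

```python
from datetime import datetime, timezone

def epoch_seconds(year, month, day):
    """Return the Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

def date_range_filter(column, start, end):
    """Build an And filter matching start <= column <= end (both inclusive)."""
    return ["And", [[column, "Gte", start], [column, "Lte", end]]]

calendar_2023 = date_range_filter(
    "created_at", epoch_seconds(2023, 1, 1), epoch_seconds(2024, 1, 1)
)
print(calendar_2023)
```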

Using Filters in Workflow Blocks

Apply filters dynamically with Jinja templating:
Filters: ["category", "Eq", "{{inputs.category}}"]
For conditional filter logic:
{% if inputs.show_archived %}
["category", "Eq", "{{inputs.category}}"]
{% else %}
["And", [["category", "Eq", "{{inputs.category}}"], ["status", "NotEq", "archived"]]]
{% endif %}

Tuning min_similarity

The min_similarity threshold cuts off results below a relevance score. Start with 0.5 and adjust based on what you observe.
Value Range | Behavior
0.0 – 0.4 | Broad: includes marginal and loosely related matches
0.5 – 0.7 | Balanced: good default for most knowledge bases
0.8 – 1.0 | Strict: only highly relevant, near-exact matches
Practical guidance:
  • Use 0.3–0.4 for exploratory search or content discovery.
  • Use 0.5–0.6 as a general-purpose default for production.
  • Use 0.7–0.8 for technical documentation where precision matters.
  • Use 0.8+ when you need near-exact semantic matches.
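A simple way to pick a threshold is to review the results for a handful of real queries by hand, then check how many results, and how many relevant ones, survive at each candidate value. The sketch below assumes you have already collected (score, is_relevant) pairs; the scores and judgments shown are invented for illustration.

```python
def sweep_thresholds(scored_results, thresholds):
    """For each threshold, count kept results and how many of them were relevant.

    scored_results is a list of (score, is_relevant) pairs, where is_relevant
    comes from your own manual review of real queries.
    """
    report = []
    for threshold in thresholds:
        kept = [relevant for score, relevant in scored_results if score >= threshold]
        report.append((threshold, len(kept), sum(kept)))
    return report

# Hypothetical scores and judgments for one query.
judged = [(0.82, True), (0.71, True), (0.58, True),
          (0.52, False), (0.41, False), (0.36, True)]

for threshold, kept, relevant in sweep_thresholds(judged, [0.35, 0.5, 0.7]):
    print(f"min_similarity={threshold}: kept {kept}, relevant {relevant}")
```

In this made-up sample, raising the threshold from 0.35 to 0.7 removes both irrelevant results but also drops two relevant ones, which is the precision and recall trade-off the table describes.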

Tuning Alpha (Hybrid Mode)

The alpha parameter shifts the balance between semantic and keyword scoring in hybrid mode.
Alpha | Behavior
0.0 | Pure keyword (BM25 only)
0.3 | Mostly keyword with a semantic boost
0.5 | Balanced: the recommended default
0.7 | Mostly semantic with keyword precision
1.0 | Pure semantic
When to adjust:
  • Lower alpha (0.2–0.4) for technical docs with specific identifiers and exact terms.
  • Medium alpha (0.5) for general knowledge bases — start here.
  • Higher alpha (0.7–0.9) for natural language conversations and content discovery.
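To build intuition for how alpha shifts a ranking, the sketch below weights each list's rank contribution by alpha and 1 - alpha before summing. How Scout actually combines alpha with RRF is not documented here, so treat this weighting scheme, the constant k=60, and the document IDs as assumptions.

```python
def weighted_rrf(semantic_ranking, keyword_ranking, alpha=0.5, k=60):
    """Blend two rankings; alpha weights the semantic list, 1 - alpha the keyword list."""
    scores = {}
    for weight, ranking in ((alpha, semantic_ranking), (1.0 - alpha, keyword_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest blended score first; ties broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical rankings in which the two passes disagree.
semantic = ["concept-guide", "faq", "sku-sheet"]
keyword = ["sku-sheet", "faq", "concept-guide"]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, weighted_rrf(semantic, keyword, alpha=alpha))
```

At alpha 0.0 the output follows the keyword order, at 1.0 it follows the semantic order, and values in between trade one off against the other.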

Using Databases with Agents

1. Enable Databases Tools

In your agent’s Tools tab, enable the Databases query capability.

2. Add an Instruction Snippet

For questions that require internal knowledge:

1. Query Databases before answering.
2. Start with hybrid search using min_similarity: 0.5 and alpha: 0.5.
3. If results are noisy, increase min_similarity.
4. If the user gives constraints like category, date, or status, apply metadata filters.
5. Return a concise answer, then include the key supporting records.

3. Prompt Examples

  • “Find troubleshooting steps for SSO login failures from the IT docs table.”
  • “Search only category = policy and summarize PTO policy changes since Jan. 1.”
  • “Query the sales enablement table for pricing objection handling and give me three approved responses.”

Common Query Patterns

Find Similar Documents

{
  "search_term": "{{document.content}}",
  "min_similarity": 0.7,
  "limit": 5
}

Recent Tutorials Only

{
  "search_term": "{{user_query}}",
  "filters": ["And", [
    ["category", "Eq", "tutorial"],
    ["created_at", "Gte", 1704067200]
  ]],
  "limit": 10
}

Exclude Drafts

{
  "search_term": "{{user_query}}",
  "filters": ["status", "NotEq", "draft"],
  "limit": 10
}

Hybrid Search Across Categories

{
  "search_term": "{{user_query}}",
  "hybrid_search": true,
  "filters": ["category", "In", ["tutorial", "guide", "reference"]],
  "limit": 15
}

Troubleshooting

Getting no results
  • Lower min_similarity — try 0.3 to cast a wider net.
  • Confirm your table has indexed documents by checking the row count in Scout Studio.
  • Try a simple, general search term to confirm the data is reachable.
  • Temporarily remove filters to check whether a filter condition is too restrictive.
Too many irrelevant results
  • Raise min_similarity to 0.6 or higher.
  • Add metadata filters to scope results to the right category or status.
  • Lower alpha toward 0.3 if your query uses specific terms that should match exactly.
Missing an obvious match
  • Confirm the document is in the correct table.
  • Check for typos in filter values — Eq and In operators are case-sensitive. Use IGlob for case-insensitive text matching.
  • Try hybrid search if you’ve been using semantic-only.
  • Re-sync your data source if the record was added recently and may not be indexed yet.
Low scores on relevant results
vector_distance values above 0.5 generally indicate weak semantic alignment. This often means:
  • Your query phrasing doesn’t match how the content is written.
  • The content is too short or generic to embed well.
In either case, consider enriching your documents with more descriptive text and re-syncing.
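The "Getting no results" checks can also be automated as a fallback chain. The sketch below is one possible approach, not a built-in feature: `run_query` stands in for whatever function you use to call the query endpoint, and the order of the relaxation steps is an assumption based on the suggestions above.

```python
def query_with_fallback(run_query, search_term, filters=None):
    """Try progressively looser settings until something comes back.

    run_query is any callable accepting search_term, min_similarity, and
    filters keyword arguments and returning a list of results.
    """
    attempts = [
        {"min_similarity": 0.5, "filters": filters},
        {"min_similarity": 0.3, "filters": filters},  # cast a wider net
        {"min_similarity": 0.3, "filters": None},     # rule out an over-strict filter
    ]
    for settings in attempts:
        results = run_query(search_term=search_term, **settings)
        if results:
            return results, settings
    return [], attempts[-1]

# Stand-in for a real query function: only unfiltered queries return a hit.
def fake_run_query(search_term, min_similarity, filters):
    return ["doc_abc123"] if filters is None else []

results, used = query_with_fallback(fake_run_query, "sso login", ["status", "Eq", "active"])
print(results, used)
```

Returning the settings that produced results makes it easy to log which relaxation step was needed, which in turn tells you whether the threshold or the filter was the culprit.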

Best Practices

  1. Start with sensible defaults (min_similarity: 0.5, hybrid_search: true, alpha: 0.5), then adjust based on observed results.
  2. Prefer hybrid search — it outperforms pure semantic or pure keyword for the vast majority of real-world queries.
  3. Use metadata filters — scoping to category, date, or status dramatically improves precision without sacrificing recall within the relevant subset.
  4. Write rich content — comprehensive, descriptive text in the content field produces better embeddings and more accurate retrieval.
  5. Test with real queries — use actual user questions (not synthetic ones) to tune thresholds and filter configurations.

Next Steps

Databases Overview

Understand the Databases data model and when to use it.

Creating Databases

Set up schemas optimized for search quality.

Sources

Keep database data fresh with automated syncs.