# Querying Scout Databases: Semantic and Hybrid Search

> Search Scout Databases using semantic or hybrid mode. Filter by metadata and tune similarity thresholds for precise agent data retrieval.

Querying is where Databases pay off. You've stored your documents, your embeddings are built, and now your agents and workflows need to retrieve the right information at the right time. Scout gives you three search modes — semantic, keyword, and hybrid — along with metadata filters and tunable thresholds so you can dial in exactly the results you need.

## Query Modes

| Mode                  | How It Works                                                                 | Best For                                     |
| --------------------- | ---------------------------------------------------------------------------- | -------------------------------------------- |
| **Semantic (Vector)** | Converts your query to an embedding and finds documents with similar vectors | Natural language questions, related concepts |
| **Keyword (BM25)**    | Matches exact keywords using traditional full-text search                    | Product codes, SKUs, technical identifiers   |
| **Hybrid**            | Fuses both result sets using Reciprocal Rank Fusion (RRF)                    | General-purpose production search            |

### Semantic Search

Semantic search converts your query into a vector embedding and returns documents whose embeddings are closest in vector space. It finds relevant content even when the user's words don't appear verbatim in the document.

```yaml theme={null}
Query: "how do I reset my password"
Finds: Documents about "password recovery", "account access", "login issues"
```

Use semantic search for conversational interfaces, synonym-rich content, and queries where users express intent in their own words.
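
To make "closest in vector space" concrete, here is a minimal sketch that ranks two documents against a query by cosine similarity. The three-dimensional vectors are made-up stand-ins for real embeddings, and cosine similarity is an assumption for illustration; this page does not state which distance metric Scout uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values).
# Real embeddings typically have hundreds of dimensions.
query = [0.9, 0.1, 0.0]
documents = {
    "password recovery": [0.8, 0.2, 0.1],
    "billing invoices": [0.0, 0.1, 0.9],
}

# Rank document titles from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda title: cosine_similarity(query, documents[title]),
    reverse=True,
)
print(ranked)
```

The document whose vector points in nearly the same direction as the query ranks first, even though the two share no literal words.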

### Keyword Search (BM25)

Keyword search uses the BM25 algorithm to find documents containing exact keyword matches. It's the same mechanism behind traditional full-text search.

```yaml theme={null}
Query: "API_KEY_12345"
Finds: Only documents containing "API_KEY_12345" exactly
```

Use keyword search when users search for specific identifiers, error codes, product names, or other terms where exact matching matters more than semantic similarity.
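
For intuition about why an exact identifier dominates a keyword ranking, the following is a small self-contained sketch of BM25 scoring. The `k1` and `b` values and the IDF variant are common textbook choices, not Scout's documented settings, and the whitespace tokenizer is a simplification.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # a term that is absent contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "rotate API_KEY_12345 before it expires".split(),
    "general notes about api keys".split(),
]
scores = bm25_scores(["API_KEY_12345"], docs)
print(scores)
```

Only the first document contains the exact token, so only it receives a non-zero score. The second document is topically related but is invisible to a pure keyword pass.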

### Hybrid Search

Hybrid search runs both a semantic pass and a keyword pass, then merges and re-ranks the results using Reciprocal Rank Fusion (RRF). You get the precision of keyword matching alongside the conceptual coverage of vector search in a single ranked result set.

```yaml theme={null}
Query: "React hooks tutorial"
Finds:
  - Documents with "React", "hooks", "tutorial" (keyword match)
  - Documents about "state management in React" (semantic match)
```

**Hybrid search is the recommended default for most production applications.** It handles mixed query styles — proper nouns blended with natural language — better than either mode alone.
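
The fusion step can be sketched in a few lines. This is the standard Reciprocal Rank Fusion formula, where each document earns `1 / (k + rank)` from every list it appears in. The constant `k = 60` is the conventional value from the RRF literature and is an assumption here, as are the document IDs; Scout's internal constants are not documented on this page.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two passes, best match first.
semantic = ["doc_state", "doc_hooks", "doc_intro"]
keyword = ["doc_hooks", "doc_tutorial", "doc_state"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one pass still appear lower in the fused ranking.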

## Query Parameters

<ParamField body="search_term" type="string" required>
  The query string. In workflow blocks this supports Jinja templating, for example `{{inputs.user_question}}`.
</ParamField>

<ParamField body="min_similarity" default="0.35" type="number">
  Minimum relevance threshold. Results below this score are excluded. Range is `0.0` to `1.0`. See [Tuning min\_similarity](#tuning-min_similarity) for guidance.
</ParamField>

<ParamField body="limit" default="10" type="integer">
  Maximum number of results to return.
</ParamField>

<ParamField body="hybrid_search" default="false" type="boolean">
  When `true`, enables hybrid mode — results from semantic and keyword passes are fused using RRF.
</ParamField>

<ParamField body="alpha" default="0.5" type="number">
  Controls the balance between semantic and keyword search in hybrid mode. `0.0` = pure keyword, `1.0` = pure semantic. Ignored when `hybrid_search` is `false`.
</ParamField>

<ParamField body="filters" type="array">
  Filter results by metadata column values. See [Filtering by Metadata](#filtering-by-metadata) for the full syntax.
</ParamField>

## Querying via Workflow Blocks

Add a **Query Database Table** block to any workflow to search your data at runtime.

### Configuration

| Parameter              | Description                             | Default  |
| ---------------------- | --------------------------------------- | -------- |
| **Database**           | The Database to query                   | Required |
| **Table**              | The Table within the Database           | Required |
| **Search Term**        | Query string; supports Jinja templating | Required |
| **Minimum Similarity** | Relevance threshold (0–1)               | `0.35`   |
| **Hybrid Search**      | Enable RRF fusion                       | `false`  |
| **Alpha**              | Semantic vs. keyword balance (0–1)      | `0.5`    |
| **Filters**            | Metadata filter expression              | Optional |
| **Limit**              | Max results to return                   | `10`     |

### Example Block Configuration

```yaml theme={null}
Database: Knowledge Base
Table: Documentation
Search Term: "{{inputs.user_question}}"
Minimum Similarity: 0.5
Hybrid Search: true
Alpha: 0.5
Limit: 10
```

### Working with Query Results

The block returns an array of result objects. Each result contains a `details` object with relevance scores and a `record` object with the document's fields:

```json theme={null}
[
  {
    "details": {
      "vector_distance": 0.15,
      "hybrid_score": 0.87
    },
    "record": {
      "id": "doc_abc123",
      "attributes": {
        "title": "Getting Started Guide",
        "content": "This guide walks you through...",
        "category": "tutorial",
        "url": "https://docs.example.com/getting-started"
      }
    }
  }
]
```

<ResponseField name="details.vector_distance" type="number">
  Distance from the query in vector space. Lower values indicate higher similarity. `0.0` is identical; `1.0` is completely unrelated. Present on all results.
</ResponseField>

<ResponseField name="details.hybrid_score" type="number">
  Fused relevance score when `hybrid_search: true`. Higher is better. `null` for pure semantic queries.
</ResponseField>

<ResponseField name="record.id" type="string">
  The unique document identifier.
</ResponseField>

<ResponseField name="record.attributes" type="object">
  All metadata fields stored on the document, keyed by column name.
</ResponseField>

Access results in downstream blocks with Jinja:

```jinja2 theme={null}
{{ query_results.output[0].record.attributes.title }}
```

Handle empty results gracefully:

```jinja2 theme={null}
{% if query_results.output | length > 0 %}
  {{ query_results.output[0].record.attributes.content }}
{% else %}
  I couldn't find anything relevant. Try rephrasing your question.
{% endif %}
```
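
If you post-process results in Python rather than Jinja, the same pattern applies. The sketch below assumes results shaped like the JSON example above. The helper name `build_context` and the `max_distance` cutoff are illustrative, not part of any Scout SDK.

```python
FALLBACK = "I couldn't find anything relevant. Try rephrasing your question."

def build_context(results, max_distance=0.5):
    """Turn query results into a prompt-ready string, closest matches first."""
    kept = [r for r in results if r["details"]["vector_distance"] <= max_distance]
    if not kept:
        return FALLBACK
    kept.sort(key=lambda r: r["details"]["vector_distance"])
    lines = []
    for r in kept:
        attrs = r["record"]["attributes"]
        # Use .get() so a missing column does not raise KeyError.
        lines.append(f"{attrs.get('title', '(untitled)')}: {attrs.get('content', '')}")
    return "\n".join(lines)

results = [
    {
        "details": {"vector_distance": 0.15, "hybrid_score": 0.87},
        "record": {
            "id": "doc_abc123",
            "attributes": {
                "title": "Getting Started Guide",
                "content": "This guide walks you through...",
            },
        },
    }
]
context = build_context(results)
print(context)
```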

## Querying via API

### Basic Semantic Query

```bash theme={null}
curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "search_term": "customer support",
    "min_similarity": 0.5,
    "limit": 10
  }'
```

### Hybrid Search Query

```bash theme={null}
curl -X POST https://api.scoutos.com/v2/collections/{collection_id}/tables/{table_id}/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "search_term": "React hooks tutorial",
    "min_similarity": 0.5,
    "hybrid_search": true,
    "alpha": 0.5,
    "limit": 10
  }'
```

### Using the Python SDK

```python theme={null}
from scoutos import Scout

client = Scout(api_key="YOUR_API_KEY")

results = client.tables.query(
    collection_id="col_abc123",
    table_id="tab_xyz789",
    search_term="customer support",
    min_similarity=0.5,
    hybrid_search=True,
    limit=10
)

for result in results:
    print(f"Title: {result['record']['attributes']['title']}")
    print(f"Distance: {result['details']['vector_distance']}")
```

### Using the TypeScript SDK

```typescript theme={null}
import { ScoutClient } from "scoutos";

const client = new ScoutClient({ apiKey: "YOUR_API_KEY" });

const results = await client.tables.query({
  collectionId: "col_abc123",
  tableId: "tab_xyz789",
  searchTerm: "customer support",
  minSimilarity: 0.5,
  hybridSearch: true,
  limit: 10
});

results.forEach(result => {
  console.log(`Title: ${result.record.attributes.title}`);
  console.log(`Score: ${result.details.hybrid_score}`);
});
```

## Filtering by Metadata

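Metadata filters let you narrow results using column values before or after the similarity ranking step. A single condition is a JSON array with the format `["column_id", "operator", "value"]`. The logical operators `And` and `Or` instead take a list of sub-filters: `["And", [filter_1, filter_2]]`.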

### Available Operators

| Operator | Description                      | Example                                        |
| -------- | -------------------------------- | ---------------------------------------------- |
| `Eq`     | Equal to                         | `["status", "Eq", "active"]`                   |
| `NotEq`  | Not equal to                     | `["status", "NotEq", "archived"]`              |
| `In`     | Value is in a list               | `["category", "In", ["tutorial", "guide"]]`    |
| `NotIn`  | Value is not in a list           | `["category", "NotIn", ["draft", "archived"]]` |
| `Gt`     | Greater than                     | `["price", "Gt", 50]`                          |
| `Gte`    | Greater than or equal            | `["views", "Gte", 1000]`                       |
| `Lt`     | Less than                        | `["created_at", "Lt", 1704067200]`             |
| `Lte`    | Less than or equal               | `["stock", "Lte", 10]`                         |
| `Glob`   | Pattern match (case-sensitive)   | `["url", "Glob", "*/docs/*"]`                  |
| `IGlob`  | Pattern match (case-insensitive) | `["title", "IGlob", "*quick start*"]`          |
| `And`    | All sub-filters must match       | `["And", [[...], [...]]]`                      |
| `Or`     | Any sub-filter must match        | `["Or", [[...], [...]]]`                       |
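
To check a filter expression before sending it, a local evaluator can be handy. The sketch below is an approximation of the operator semantics in the table, written for testing against sample records. It is not Scout's implementation, and the glob behavior in particular is assumed to follow Python's `fnmatch` rules, which may differ from the server's.

```python
from fnmatch import fnmatchcase

def matches(filter_expr, attrs):
    """Evaluate a filter expression against one record's attributes dict."""
    head = filter_expr[0]
    if head == "And":
        return all(matches(f, attrs) for f in filter_expr[1])
    if head == "Or":
        return any(matches(f, attrs) for f in filter_expr[1])
    column, op, value = filter_expr
    actual = attrs.get(column)
    if op == "Eq":
        return actual == value
    if op == "NotEq":
        return actual != value
    if op == "In":
        return actual in value
    if op == "NotIn":
        return actual not in value
    if actual is None:
        return False  # comparisons and globs need a stored value
    if op == "Gt":
        return actual > value
    if op == "Gte":
        return actual >= value
    if op == "Lt":
        return actual < value
    if op == "Lte":
        return actual <= value
    if op == "Glob":
        return fnmatchcase(str(actual), value)
    if op == "IGlob":
        return fnmatchcase(str(actual).lower(), value.lower())
    raise ValueError(f"Unknown operator: {op}")

record = {"category": "tutorial", "title": "Getting Started Guide", "views": 1200}
print(matches(["And", [["category", "Eq", "tutorial"], ["views", "Gte", 1000]]], record))
print(matches(["title", "IGlob", "*getting started*"], record))
```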

### Filter Examples

**Single column filter:**

```json theme={null}
["category", "Eq", "tutorial"]
```

**Date range:**

```json theme={null}
["And", [
  ["created_at", "Gte", 1672531200],
  ["created_at", "Lte", 1704067200]
]]
```

**Multiple categories:**

```json theme={null}
["category", "In", ["tutorial", "guide", "reference"]]
```

**Case-insensitive title match:**

```json theme={null}
["title", "IGlob", "*getting started*"]
```

**Combined conditions:**

```json theme={null}
["And", [
  ["category", "Eq", "tutorial"],
  ["difficulty", "In", ["beginner", "intermediate"]],
  ["created_at", "Gte", 1704067200]
]]
```
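
The numeric values in the date filters above correspond to Unix timestamps in seconds (for example, `1704067200` is 2024-01-01 00:00:00 UTC). Assuming your `created_at` column stores timestamps that way, a small helper can build the range filter from `datetime` objects. The helper name is illustrative.

```python
from datetime import datetime, timezone

def date_range_filter(column, start, end):
    """Build an inclusive range filter from two timezone-aware datetimes."""
    return ["And", [
        [column, "Gte", int(start.timestamp())],
        [column, "Lte", int(end.timestamp())],
    ]]

filters = date_range_filter(
    "created_at",
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(filters)
```

Passing timezone-aware datetimes avoids the local-time offset that `timestamp()` applies to naive datetimes.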

### Using Filters in Workflow Blocks

Apply filters dynamically with Jinja templating:

```yaml theme={null}
Filters: ["category", "Eq", "{{inputs.category}}"]
```

For conditional filter logic:

```jinja2 theme={null}
{% if inputs.show_archived %}
["category", "Eq", "{{inputs.category}}"]
{% else %}
["And", [["category", "Eq", "{{inputs.category}}"], ["status", "NotEq", "archived"]]]
{% endif %}
```
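
When you call the API from code instead of a workflow block, the same conditional logic can be expressed by building the filter as a data structure and serializing it once. This is a sketch with a hypothetical helper name. Serializing with `json.dumps` also escapes any quotes in user-supplied values, which plain string templating does not.

```python
import json

def category_filter(category, show_archived=False):
    """Filter by category, excluding archived records unless requested."""
    base = ["category", "Eq", category]
    if show_archived:
        return base
    return ["And", [base, ["status", "NotEq", "archived"]]]

print(json.dumps(category_filter("tutorial")))
print(json.dumps(category_filter("tutorial", show_archived=True)))
```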

## Tuning min\_similarity

The `min_similarity` threshold excludes results that score below it. The parameter defaults to `0.35`; for production use, start with `0.5` and adjust based on what you observe.

| Value Range | Behavior                                              |
| ----------- | ----------------------------------------------------- |
| `0.0 – 0.4` | Broad — includes marginal and loosely related matches |
| `0.5 – 0.7` | Balanced — good default for most knowledge bases      |
| `0.8 – 1.0` | Strict — only highly relevant, near-exact matches     |

**Practical guidance:**

* Use `0.3–0.4` for exploratory search or content discovery.
* Use `0.5–0.6` as a general-purpose default for production.
* Use `0.7–0.8` for technical documentation where precision matters.
* Use `0.8+` when you need near-exact semantic matches.
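
One way to pick a threshold empirically is to collect relevance scores for a representative query and count how many results survive each cutoff. The scores below are invented for illustration, and this page does not specify which response field the threshold is compared against, so treat the input as whatever similarity values you have on hand.

```python
def sweep_thresholds(similarities, thresholds=(0.3, 0.5, 0.7, 0.8)):
    """Count how many results would survive each min_similarity cutoff."""
    return {t: sum(1 for s in similarities if s >= t) for t in thresholds}

# Hypothetical similarity scores for one query, best match first.
observed = [0.91, 0.74, 0.62, 0.55, 0.41, 0.33]
survivors = sweep_thresholds(observed)
print(survivors)  # {0.3: 6, 0.5: 4, 0.7: 2, 0.8: 1}
```

A sharp drop between two adjacent cutoffs suggests where relevant results end and marginal ones begin.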

## Tuning Alpha (Hybrid Mode)

The `alpha` parameter shifts the balance between semantic and keyword scoring in hybrid mode.

| Alpha | Behavior                               |
| ----- | -------------------------------------- |
| `0.0` | Pure keyword (BM25 only)               |
| `0.3` | Mostly keyword with a semantic boost   |
| `0.5` | Balanced — the recommended default     |
| `0.7` | Mostly semantic with keyword precision |
| `1.0` | Pure semantic                          |

**When to adjust:**

* Lower alpha (`0.2–0.4`) for technical docs with specific identifiers and exact terms.
* Medium alpha (`0.5`) for general knowledge bases — start here.
* Higher alpha (`0.7–0.9`) for natural language conversations and content discovery.
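
To build intuition for how `alpha` shifts a ranking, here is a sketch of one plausible weighting scheme: each list's reciprocal-rank contribution is scaled by `alpha` (semantic) or `1 - alpha` (keyword). This is an assumption for illustration only. The page does not document how Scout combines `alpha` with RRF, and `k = 60` is the conventional constant rather than a confirmed setting.

```python
def weighted_rrf(semantic, keyword, alpha=0.5, k=60):
    """Fuse two ranked ID lists, weighting semantic by alpha and keyword by 1 - alpha."""
    scores = {}
    for rank, doc_id in enumerate(semantic, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank)
    for rank, doc_id in enumerate(keyword, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists, best match first.
semantic = ["a", "b"]
keyword = ["c", "a"]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, weighted_rrf(semantic, keyword, alpha=alpha))
```

At `alpha = 0.0` the keyword order wins outright, at `1.0` the semantic order does, and at `0.5` the document present in both lists comes first.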

## Using Databases with Agents

### 1. Enable Databases Tools

In your agent's **Tools** tab, enable the Databases query capability.

### 2. Add an Instruction Snippet

```markdown theme={null}
For questions that require internal knowledge:

1. Query Databases before answering.
2. Start with hybrid search using min_similarity: 0.5 and alpha: 0.5.
3. If results are noisy, increase min_similarity.
4. If the user gives constraints like category, date, or status, apply metadata filters.
5. Return a concise answer, then include the key supporting records.
```

### 3. Prompt Examples

* "Find troubleshooting steps for SSO login failures from the IT docs table."
* "Search only `category = policy` and summarize PTO policy changes since Jan. 1."
* "Query the sales enablement table for pricing objection handling and give me three approved responses."

## Common Query Patterns

### Find Similar Documents

```json theme={null}
{
  "search_term": "{{document.content}}",
  "min_similarity": 0.7,
  "limit": 5
}
```

### Recent Tutorials Only

```json theme={null}
{
  "search_term": "{{user_query}}",
  "filters": ["And", [
    ["category", "Eq", "tutorial"],
    ["created_at", "Gte", 1704067200]
  ]],
  "limit": 10
}
```

### Exclude Drafts

```json theme={null}
{
  "search_term": "{{user_query}}",
  "filters": ["status", "NotEq", "draft"],
  "limit": 10
}
```

### Multi-Category Hybrid Search

```json theme={null}
{
  "search_term": "{{user_query}}",
  "hybrid_search": true,
  "filters": ["category", "In", ["tutorial", "guide", "reference"]],
  "limit": 15
}
```

## Troubleshooting

**Getting no results**

* Lower `min_similarity` — try `0.3` to cast a wider net.
* Confirm your table has indexed documents by checking the row count in Scout Studio.
* Try a simple, general search term to confirm the data is reachable.
* Temporarily remove filters to check whether a filter condition is too restrictive.
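
The first suggestion can be automated with a fallback ladder that retries at progressively lower thresholds. In this sketch, `run_query` is a hypothetical callable you supply (for example, a thin wrapper around your SDK or HTTP call) that accepts `search_term` and `min_similarity` and returns a list.

```python
def query_with_fallback(run_query, search_term, thresholds=(0.5, 0.4, 0.3)):
    """Retry with lower min_similarity values until a query returns results."""
    for threshold in thresholds:
        results = run_query(search_term=search_term, min_similarity=threshold)
        if results:
            return threshold, results
    return None, []

# Stand-in for a real query function: only matches at 0.4 or below.
def fake_query(search_term, min_similarity):
    return ["doc_abc123"] if min_similarity <= 0.4 else []

used_threshold, found = query_with_fallback(fake_query, "sso login failures")
print(used_threshold, found)
```

Returning the threshold that succeeded makes it easy to log how often the fallback is needed, which is itself a signal that the baseline is too strict.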

**Too many irrelevant results**

* Raise `min_similarity` to `0.6` or higher.
* Add metadata filters to scope results to the right category or status.
* Lower `alpha` toward `0.3` if your query uses specific terms that should match exactly.

**Missing an obvious match**

* Confirm the document is in the correct table.
* Check for typos in filter values — `Eq` and `In` operators are case-sensitive. Use `IGlob` for case-insensitive text matching.
* Try hybrid search if you've been using semantic-only.
* Re-sync your data source if the record was added recently and may not be indexed yet.

**Low scores on relevant results**

`vector_distance` values above `0.5` generally indicate weak semantic alignment. Common causes:

* Your query phrasing doesn't match how the content is written.
* The content is too short or generic to embed well.

In either case, consider enriching your documents with more descriptive text and re-syncing.

## Best Practices

1. **Start from a baseline**: `min_similarity: 0.5`, `hybrid_search: true`, `alpha: 0.5`. These differ from the parameter defaults (`min_similarity` defaults to `0.35` and `hybrid_search` to `false`), so set them explicitly, then adjust based on observed results.
2. **Prefer hybrid search**: it usually handles real-world queries, which mix exact terms with natural language, better than pure semantic or pure keyword search.
3. **Use metadata filters**: scoping to category, date, or status improves precision without sacrificing recall within the relevant subset.
4. **Write rich content**: comprehensive, descriptive text in the `content` field produces better embeddings and more accurate retrieval.
5. **Test with real queries**: use actual user questions (not synthetic ones) to tune thresholds and filter configurations.

## Next Steps

<CardGroup cols={3}>
  <Card title="Databases Overview" icon="layer-group" href="/databases/overview">
    Understand the Databases data model and when to use it.
  </Card>

  <Card title="Creating Databases" icon="table" href="/databases/creating-databases">
    Set up schemas optimized for search quality.
  </Card>

  <Card title="Sources" icon="rotate" href="/databases/sources">
    Keep database data fresh with automated syncs.
  </Card>
</CardGroup>
