# Database Sources: Sync Data from Notion, Sheets, and More

> Automatically sync Scout Databases from websites, Notion, Google Sheets, Google Drive, and Microsoft 365. Set up once and keep your agent data current.

Sources eliminate the manual work of keeping your Databases current. Instead of re-uploading content every time your documentation updates or your spreadsheet changes, you configure a Source once and Scout handles ingestion automatically — on a schedule or whenever you trigger a manual run. This is the recommended approach for any Database that powers a live agent.

## What Sources Do

Each Source runs a sync job that:

1. Pulls content from an external system using your configured credentials and settings.
2. Maps the incoming fields to the columns in your destination table.
3. Creates new documents or updates existing ones — an **upsert** based on each item's unique source identifier.

Sources are non-destructive by default: items removed from the source are **not** automatically deleted from your table. If you want your table to mirror the source exactly, clear the table contents and trigger a fresh sync.

Scout deduplicates by matching the source item's unique identifier — the URL for web pages, the row ID for spreadsheet rows, the page ID for Notion. Running the same sync twice does not create duplicate records.
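The upsert behavior can be pictured with a small sketch. This is only an illustration of the idea, not Scout's implementation: the in-memory `table` dictionary, the `upsert` helper, and the `source_id` field name are all hypothetical stand-ins.

```python
from typing import Dict, Iterable

# Hypothetical in-memory stand-in for a table: source ID -> document.
Table = Dict[str, dict]


def upsert(table: Table, items: Iterable[dict]) -> None:
    """Insert new items and overwrite existing ones, keyed by source ID."""
    for item in items:
        # "source_id" is an assumed field name: a URL, row ID, or page ID.
        table[item["source_id"]] = item


table: Table = {}
batch = [
    {"source_id": "https://example.com/a", "title": "A"},
    {"source_id": "https://example.com/b", "title": "B"},
]

upsert(table, batch)
upsert(table, batch)  # the second run matches on the same IDs
print(len(table))  # 2
```

Because the source identifier is the key, a repeated run overwrites each document in place instead of adding a second copy.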

## Supported Sources

Scout supports a broad set of integrations so you can pull data from wherever your content already lives.

<CardGroup cols={2}>
  <Card title="Web Scraping" icon="globe" href="/databases/web-scraping">
    Crawl a website starting from a single URL. Scout follows internal links up to a configurable depth and extracts page content automatically.
  </Card>

  <Card title="Sitemap" icon="sitemap">
    Provide an XML sitemap URL and Scout fetches every listed page. Ideal for documentation portals with an existing sitemap.
  </Card>

  <Card title="Notion" icon="n">
    Sync Notion pages and databases. Connect via OAuth or an integration token and select the pages or database records to include.
  </Card>

  <Card title="Google Sheets" icon="table-cells">
    Sync rows from a Google Sheets spreadsheet. Each row becomes a document; column headers map to table fields.
  </Card>

  <Card title="Google Drive" icon="hard-drive">
    Pull documents and files directly from a Google Drive folder. Supports Docs, Sheets, and PDF files.
  </Card>

  <Card title="Microsoft 365 / SharePoint" icon="microsoft">
    Connect a Microsoft 365 tenant and sync content from SharePoint sites, document libraries, or Teams wikis.
  </Card>

  <Card title="OneDrive" icon="cloud">
    Sync files from a personal or business OneDrive account. Supports Word documents, PDFs, and text files.
  </Card>

  <Card title="Laserfiche" icon="folder-open">
    Pull documents from a Laserfiche repository into a Scout Database for AI-powered search and retrieval.
  </Card>
</CardGroup>

## How Syncs Work

When a sync runs, Scout compares the incoming data against what is already in your table:

* **New items** become new documents, embedded and indexed immediately.
* **Existing items** are updated in place — the document is overwritten using its source ID as the match key.
* **Items removed from the source** are left in the table unchanged. Delete them manually if your use case requires exact mirroring.

<Note>
  Running the same sync twice is safe. Scout deduplicates on the source item's unique identifier (URL, row ID, page ID), so repeated runs do not create duplicate documents.
</Note>
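If you need to find the documents a sync leaves behind, the three cases above reduce to set operations on source IDs. The sketch below assumes you can list the IDs on both sides yourself; `SyncPlan` and `plan_sync` are hypothetical names, not part of Scout.

```python
from typing import List, NamedTuple, Set


class SyncPlan(NamedTuple):
    to_create: List[str]      # in the source, not yet in the table
    to_update: List[str]      # present on both sides
    left_in_table: List[str]  # removed from the source, kept in the table


def plan_sync(table_ids: Set[str], source_ids: Set[str]) -> SyncPlan:
    """Classify source IDs into the three cases a sync distinguishes."""
    return SyncPlan(
        to_create=sorted(source_ids - table_ids),
        to_update=sorted(source_ids & table_ids),
        left_in_table=sorted(table_ids - source_ids),
    )


plan = plan_sync(
    table_ids={"page-1", "page-2", "page-3"},
    source_ids={"page-2", "page-3", "page-4"},
)
print(plan.left_in_table)  # ['page-1']
```

The `left_in_table` list is the set of documents you would delete manually to make the table mirror the source.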

## Configuring a Source

<Steps>
  <Step title="Open your Database and select a Table">
    Navigate to Databases in Scout, open the Database you want to populate, and click the target Table.
  </Step>

  <Step title="Open the Sources panel">
    Click the **Sources** tab at the top of the table view.
  </Step>

  <Step title="Add a Source">
    Click **Add Source** and choose the source type from the list of integrations.
  </Step>

  <Step title="Authenticate and configure">
    Complete the OAuth flow or paste your credentials for the chosen integration. Then configure source-specific settings such as:

    * **Web Scraping** — starting URL, crawl depth, URL filters
    * **Sitemap** — sitemap URL
    * **Notion** — database or page selector
    * **Google Sheets** — spreadsheet ID and sheet name
    * **SharePoint / OneDrive** — tenant, site, and library
  </Step>

  <Step title="Map fields to columns">
    Review the field mapping screen. Match the incoming fields from the source to your table's columns. See [Source Mapping](#source-mapping) below for guidance.
  </Step>

  <Step title="Set sync frequency">
    Choose a schedule or leave it as manual-only. See [Sync Frequency](#sync-frequency) for recommendations.
  </Step>

  <Step title="Run the first sync">
    Click **Run Now** to trigger the initial ingestion. Scout fetches the data, maps it, and populates the table. Watch the progress in the Sources panel.
  </Step>
</Steps>

## Source Mapping

Each source produces different fields. During setup you map those fields to columns in your table. Common mappings:

| Source Field         | Maps To                     | Notes                                                     |
| -------------------- | --------------------------- | --------------------------------------------------------- |
| `title`              | `title` (Single Line Text)  | Article or page title                                     |
| `body` / `content`   | `content` (Multi Line Text) | Main searchable text — must be mapped here for embeddings |
| `url` / `source_url` | `url` (URL)                 | Original URL for attribution                              |
| `last_modified`      | `updated_at` (Number)       | Unix timestamp of the last change                         |

<Tip>
  Always map the main body text to your `content` column. This is the field Scout embeds for semantic search. If you map it to a different column, vector search won't work as expected.
</Tip>

If the source doesn't include a field your table expects, that column remains empty for synced documents. You can fill the gap manually afterward or add a second Source that supplies the missing data.
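A minimal sketch of what field mapping amounts to, under stated assumptions: the `FIELD_MAP` dictionary, the `map_item` helper, and the sample item are hypothetical, and the column names follow the table above. It shows that a field missing from the source leaves its column empty.

```python
from typing import Any, Dict, List, Optional

# Assumed mapping from source field names to table column names.
FIELD_MAP: Dict[str, str] = {
    "title": "title",
    "body": "content",
    "source_url": "url",
    "last_modified": "updated_at",
}
TABLE_COLUMNS: List[str] = ["title", "content", "url", "updated_at"]


def map_item(
    item: Dict[str, Any],
    field_map: Dict[str, str],
    columns: List[str],
) -> Dict[str, Optional[Any]]:
    """Build a table row from a source item; unmapped columns stay None."""
    row: Dict[str, Optional[Any]] = {column: None for column in columns}
    for source_field, column in field_map.items():
        if source_field in item and column in row:
            row[column] = item[source_field]
    return row


item = {
    "title": "Reset a password",
    "body": "Open Settings and choose Security.",
    "source_url": "https://example.com/reset",
    # no "last_modified" field in this source item
}
row = map_item(item, FIELD_MAP, TABLE_COLUMNS)
print(row["updated_at"])  # None
```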

## Sync Frequency

| Schedule        | When to Use                                                                 |
| --------------- | --------------------------------------------------------------------------- |
| **Manual only** | Static content that rarely changes — a one-time import of archived articles |
| **Hourly**      | Live support docs or spreadsheets that update throughout the day            |
| **Daily**       | Documentation portals or Notion wikis updated a few times a week            |
| **Weekly**      | Reference content that changes infrequently                                 |

Schedule syncs only as often as the content actually changes. Every run consumes ingestion quota, so unnecessary runs waste quota and can slow down other operations.
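To get a feel for the difference between schedules, you can count how many runs each one triggers over a period. This is plain arithmetic, not Scout's quota accounting; the `RUNS_PER_DAY` table and `scheduled_runs` helper are hypothetical.

```python
# Approximate scheduled runs per day for each option in the table above.
RUNS_PER_DAY = {
    "manual": 0.0,
    "hourly": 24.0,
    "daily": 1.0,
    "weekly": 1 / 7,
}


def scheduled_runs(schedule: str, days: int = 30) -> int:
    """Return the approximate number of scheduled runs over `days` days."""
    return round(RUNS_PER_DAY[schedule] * days)


for name in ("manual", "hourly", "daily", "weekly"):
    print(name, scheduled_runs(name))
# manual 0
# hourly 720
# daily 30
# weekly 4
```

Over 30 days, an hourly schedule runs 24 times as often as a daily one, which is why it is worth reserving for content that changes throughout the day.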

## Triggering a Manual Sync

You can re-run any Source at any time without waiting for the scheduled window:

1. Open the **Sources** panel for your table.
2. Find the Source you want to run.
3. Click **Run Now**.

Scout queues the job immediately. Progress and any errors appear in the sync history log.

## Monitoring Sync Status

The **Sources** panel shows the current and historical state of every sync job:

| Status        | Meaning                                                                   |
| ------------- | ------------------------------------------------------------------------- |
| **Running**   | The sync is actively fetching and ingesting data.                         |
| **Completed** | The sync finished successfully. The timestamp and record count are shown. |
| **Failed**    | The sync encountered an error. Open the error log to see details.         |
| **Scheduled** | The sync is queued for the next scheduled window.                         |

From the Sources panel you can:

* View run history and record counts
* Inspect error messages and stack traces for failed runs
* Edit source credentials or mapping configuration
* Re-run failed or completed jobs

### Common Failure Causes

* **Permission changes** — The OAuth token or integration key was revoked. Re-authenticate the source.
* **Changed URLs** — The starting URL or sitemap location moved. Update the source configuration.
* **Rate limits** — The external system throttled Scout's requests. Re-run the sync after a short wait.
* **Mapping drift** — A column was renamed after the source was configured. Update the field mapping to match the new column name.

<Warning>
  Renaming a table column after configuring a Source breaks the field mapping for that column. The column stays empty on subsequent syncs until you update the mapping to reference the new column name.
</Warning>
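Mapping drift is easy to check for if you keep a copy of your mapping alongside the current column list. The sketch below is an assumption-laden illustration: `find_mapping_drift` is a hypothetical helper, and both inputs are values you would maintain yourself.

```python
from typing import Dict, Iterable, List


def find_mapping_drift(
    field_map: Dict[str, str],
    table_columns: Iterable[str],
) -> List[str]:
    """Return mapped column names that no longer exist in the table."""
    existing = set(table_columns)
    return sorted({column for column in field_map.values() if column not in existing})


field_map = {"title": "title", "body": "content", "source_url": "url"}
columns_after_rename = ["title", "content", "link"]  # "url" was renamed to "link"

print(find_mapping_drift(field_map, columns_after_rename))  # ['url']
```

Any name the function returns is a mapping target to update before the next sync.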

## Best Practices

* **Start small.** Test a Source on a subset of content before running a full ingestion. Create a temporary table, sync a sample, and verify field mapping and content quality before pointing the source at your production table.
* **Keep column names stable.** Renaming a column after a Source is configured breaks its field mapping. Design your columns up front and avoid renames.
* **Schedule only what needs freshness.** For static content, a one-time manual sync is sufficient and uses fewer resources.
* **Review failures quickly.** Mapping drift and revoked credentials are the most common failure modes. Check the Sources panel regularly so a failure does not persist across multiple sync cycles.
* **Use the `content` column.** Always map the primary text body to a column named `content` so Scout's automatic embedding pipeline picks it up correctly.

## Next Steps

<CardGroup cols={3}>
  <Card title="Creating Databases" icon="table" href="/databases/creating-databases">
    Design your table schema before configuring a Source.
  </Card>

  <Card title="Querying Data" icon="magnifying-glass" href="/databases/querying-data">
    Search synced content with semantic, keyword, and hybrid modes.
  </Card>

  <Card title="Databases Overview" icon="layer-group" href="/databases/overview">
    Understand the full Databases model and when to use it.
  </Card>
</CardGroup>
