Sources

Sources keep your tables up to date by syncing external data into Collections. Use them when you want repeatable ingestion instead of manually uploading or entering documents each time your source data changes.

What Sources Do

Each source runs a sync job that:

Pulls data from an external system
Maps that data to your table columns
Creates or updates documents in the destination table

You can run sources manually or on a schedule.
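The three stages above can be sketched as a small pipeline. This is a hypothetical illustration, not Scout's implementation: `run_sync`, `pull`, `map_record` and `upsert` are names invented here, and the caller supplies each stage as a plain function.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

Record = Dict[str, object]

def run_sync(
    pull: Callable[[], Iterable[Record]],
    map_record: Callable[[Record], Record],
    upsert: Callable[[Record], str],
) -> Counter:
    """Run one hypothetical sync job: pull -> map -> create/update.

    `upsert` writes one mapped document and returns "created" or "updated".
    The result tallies those outcomes for the run.
    """
    outcomes: Counter = Counter()
    for raw in pull():              # 1. pull data from the external system
        doc = map_record(raw)       # 2. map source fields to table columns
        outcomes[upsert(doc)] += 1  # 3. create or update the destination document
    return outcomes
```

Keeping each stage as a separate callable makes the boundaries explicit; the create-or-update rule itself is described under How Syncs Work.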

Source Types

Scout supports multiple source types, and you can combine them in the same table:

Source Type	What It Pulls	Best For
Web Scrape	Website pages via a single URL, a crawl or a sitemap	Public docs, help centers, blogs
Notion	Notion pages and databases	Internal knowledge bases and team wikis
Google Sheets	Rows from spreadsheets	Operational data and structured lists

Create a Source

Open your Collection and select a table
Click Sources
Click Add Source
Choose a source type
Configure mapping and frequency
Run the first sync

How Syncs Work

When a sync runs, Scout compares incoming data against documents already in the table:

New items become new documents
Existing items are updated in place (upsert by source ID)
Items removed from the source are not automatically deleted from the table

This means your table grows over time unless you manually remove stale documents. If you want the table to mirror the source exactly, delete the table contents and run a fresh sync.
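Stale documents are simply the IDs present in the table but absent from the source's current output. A minimal sketch, assuming you can list both sets of source IDs; `find_stale_ids` is a hypothetical helper, not a Scout feature:

```python
from typing import Iterable, Set

def find_stale_ids(table_ids: Iterable[str], source_ids: Iterable[str]) -> Set[str]:
    """Return IDs that exist in the table but no longer exist in the source.

    A sync never deletes these on its own, so they are the candidates
    for manual removal. IDs only in the source are new items, not stale ones.
    """
    return set(table_ids) - set(source_ids)
```

For example, `find_stale_ids(["a", "b", "c"], ["a", "c", "d"])` returns `{"b"}`: "d" will be created on the next sync, while "b" stays in the table until someone removes it.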

Scout deduplicates by matching on each source item’s unique identifier, such as the URL for web pages or the row ID for Google Sheets. Running the same sync twice won’t create duplicates.
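The upsert behavior can be modeled as a dict keyed by the source identifier. This is a sketch under stated assumptions: the field names `url` and `row_id`, the type labels `"web"` and `"google_sheets"`, and both helper functions are hypothetical, and identifiers for other source types are not covered here.

```python
from typing import Dict, Iterable

Document = Dict[str, object]

def source_id(item: Document, source_type: str) -> str:
    """Return the identifier used for deduplication (field names are assumed)."""
    if source_type == "web":
        return str(item["url"])      # web pages: keyed by URL
    if source_type == "google_sheets":
        return str(item["row_id"])   # spreadsheet rows: keyed by row ID
    raise ValueError(f"no identifier rule sketched for {source_type!r}")

def upsert_all(table: Dict[str, Document], items: Iterable[Document], source_type: str) -> None:
    """Insert new items and overwrite existing ones in place; never delete."""
    for item in items:
        table[source_id(item, source_type)] = item
```

Running `upsert_all` twice with the same pages leaves the table size unchanged, and a page that disappears from the source keeps its existing entry.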

Source Mapping

Each source returns different fields. You map those fields to columns in your table during setup.

Typical mappings:

title -> title
content -> content
url -> url
updated_at -> updated_at

If you’re syncing long-form text for retrieval, map the main body into your content column. That’s the field Scout embeds for semantic search.

If the source doesn’t include a field your table expects, that column stays empty for synced documents. You can fill gaps manually or add a second source that covers the missing data.
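The mapping step, including the empty-column behavior, can be sketched as follows. It assumes documents are plain dicts and represents an empty column as `None`; `apply_mapping` is a hypothetical helper, not part of Scout.

```python
from typing import Dict, List, Optional

def apply_mapping(
    record: Dict[str, object],
    mapping: Dict[str, str],
    columns: List[str],
) -> Dict[str, Optional[object]]:
    """Map one source record onto the table's columns.

    `mapping` goes from source field name to column name. A column that
    receives no value (unmapped, or the field is missing from this record)
    stays None, i.e. empty. Mappings that target a column the table does
    not have are skipped.
    """
    doc: Dict[str, Optional[object]] = {column: None for column in columns}
    for field, column in mapping.items():
        if field in record and column in doc:
            doc[column] = record[field]
    return doc
```

With the typical mapping above, a record that has no `updated_at` field produces a document whose `updated_at` column is `None`.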

Sync Frequency

Frequency is optional. You can:

Run once manually
Set a recurring schedule (hourly, daily, weekly) for automatic refresh

Use schedules for content that changes often, like active docs portals or live spreadsheets. For static content, a one-time manual sync is usually enough.
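The schedule options can be thought of as fixed intervals from the last run. This is a simplified sketch: the `INTERVALS` table and `next_run` are hypothetical, and Scout's scheduler may align runs differently (for example, to a time of day).

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mapping from the documented schedule options to intervals.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_run(last_run: datetime, frequency: Optional[str]) -> Optional[datetime]:
    """Return when the next automatic sync is due, or None for manual-only sources."""
    if frequency is None:
        return None  # no schedule: the source only runs when triggered manually
    return last_run + INTERVALS[frequency]
```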

Monitoring and Re-runs

From the Sources panel, you can:

View run status and sync history
Inspect errors and logs
Edit source configuration
Re-run failed or completed jobs

If a sync fails partway through, check the error log. Common causes are permission issues, changed URLs or revoked API access. Fix the root cause, then re-run the sync.
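When triaging a failed run, it can help to map the error to one of those causes. A minimal sketch, assuming the log exposes the HTTP status code returned by the external system (an assumption; the log format isn't documented here). `likely_cause` is a hypothetical helper that uses standard HTTP status semantics.

```python
from typing import Optional

def likely_cause(status: Optional[int]) -> str:
    """Map an HTTP status from a failed fetch to the usual root cause."""
    if status == 401:
        # Unauthenticated: the token or integration was revoked or expired.
        return "revoked or expired API access: reconnect the integration"
    if status == 403:
        # Authenticated but not allowed to read this resource.
        return "permission issue: share the page, sheet or site with the integration"
    if status in (404, 410):
        # The resource moved or no longer exists.
        return "changed or removed URL: update the source configuration"
    return "unknown: inspect the full log entry"
```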

Best Practices

Start with a small test sync before large runs
Keep column mappings explicit and stable: renaming columns after syncing can break the mapping and leave fields empty
Use schedules only where freshness matters
Review failed runs regularly and fix mapping drift quickly
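Mapping drift after a column rename can be caught by checking every mapping target against the table's current columns. A minimal sketch; `find_mapping_drift` is a hypothetical helper, and the mapping is assumed to be a source-field-to-column dict.

```python
from typing import Dict, Iterable, List

def find_mapping_drift(mapping: Dict[str, str], table_columns: Iterable[str]) -> List[str]:
    """Return mapping targets that no longer exist as table columns.

    A non-empty result means a column was renamed or removed after the
    source was configured, so those fields would sync into nothing.
    """
    existing = set(table_columns)
    return sorted(column for column in mapping.values() if column not in existing)
```

For example, if `content` is renamed to `body_text`, a mapping of `body -> content` shows up as `["content"]` until the mapping is updated.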

Next Steps

Web Scraping: Configure crawl settings and extraction options
Notion: Connect and sync Notion pages and databases
Google Sheets: Sync spreadsheet rows into table documents
Creating Collections: Design tables for source data
Querying Data: Search synced content with semantic and hybrid search
