Sources
Sources keep your tables up to date by syncing external data into Collections. Use them when you want repeatable ingestion instead of manually uploading or entering documents each time your source data changes.
What Sources Do
Each source runs a sync job that:
- Pulls data from an external system
- Maps that data to your table columns
- Creates or updates documents in the destination table
You can run sources manually or on a schedule.
Source Types
Scout supports multiple source types in the same table:
| Source Type | What It Pulls | Best For |
|---|---|---|
| Web Scrape | Website pages via single URL, crawl or sitemap | Public docs, help centers, blogs |
| Notion | Notion pages and databases | Internal knowledge bases and team wikis |
| Google Sheets | Rows from spreadsheets | Operational data and structured lists |
Create a Source
- Open your Collection and select a table
- Click Sources
- Click Add Source
- Choose a source type
- Configure mapping and frequency
- Run the first sync
How Syncs Work
When a sync runs, Scout compares incoming data against documents already in the table:
- New items become new documents
- Existing items are updated in place (upsert by source ID)
- Items removed from the source are not automatically deleted from the table
This means stale documents accumulate in the table over time unless you remove them manually. If you want the table to mirror the source exactly, delete the table contents and run a fresh sync.
Scout deduplicates by matching on the source item’s unique identifier — for example, the URL for web pages or the row ID for Google Sheets. Running the same sync twice won’t create duplicates.
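The comparison rules above can be sketched in a few lines of Python. This is an illustrative model only, not Scout's implementation: the in-memory `table` dict, the `sync` function, and the `id_field` parameter are all hypothetical names chosen for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical in-memory model: the table is a dict keyed by source ID.
Table = Dict[str, dict]


def sync(table: Table, incoming: Iterable[dict], id_field: str = "url") -> Dict[str, List[str]]:
    """Upsert incoming items into the table and report what happened."""
    created: List[str] = []
    updated: List[str] = []
    seen = set()
    for item in incoming:
        source_id = item[id_field]
        seen.add(source_id)
        if source_id in table:
            # Existing item: update in place, so no duplicate is created.
            table[source_id].update(item)
            updated.append(source_id)
        else:
            # New item: becomes a new document.
            table[source_id] = dict(item)
            created.append(source_id)
    # Items missing from the source are reported but never deleted.
    stale = sorted(sid for sid in table if sid not in seen)
    return {"created": created, "updated": updated, "stale": stale}
```

In this model, running `sync` twice with the same input leaves the table the same size, and the `stale` list shows which documents you would need to remove by hand to mirror the source.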
Source Mapping
Each source returns different fields. You map those fields to columns in your table during setup.
Typical mappings:
- title -> title
- content -> content
- url -> url
- updated_at -> updated_at
If you’re syncing long-form text for retrieval, map the main body into your content column — that’s the field Scout embeds for semantic search.
If the source doesn’t include a field your table expects, that column stays empty for synced documents. You can fill gaps manually or add a second source that covers the missing data.
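As a rough illustration of how a mapping behaves, the sketch below builds one table row from one source item. The `apply_mapping` helper, the `author` column, and the sample item are assumptions made for this example and are not part of Scout's API.

```python
from typing import Dict, List, Optional


def apply_mapping(item: dict, mapping: Dict[str, str], columns: List[str]) -> Dict[str, Optional[str]]:
    """Build a table row from a source item; mapping is source field -> column."""
    # Every column starts empty, so unmapped or missing fields stay empty.
    row: Dict[str, Optional[str]] = {column: None for column in columns}
    for source_field, column in mapping.items():
        if column in row and source_field in item:
            row[column] = item[source_field]
    return row


mapping = {"title": "title", "content": "content", "url": "url", "updated_at": "updated_at"}
columns = ["title", "content", "url", "updated_at", "author"]  # "author" is a hypothetical extra column
item = {"title": "Pricing", "content": "Plans and limits", "url": "https://example.com/pricing"}

row = apply_mapping(item, mapping, columns)
```

Here the source item has no `updated_at` field and nothing maps to `author`, so both columns remain empty in the resulting row.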
Sync Frequency
Frequency is optional. You can:
- Run once manually
- Set a recurring schedule (hourly, daily, weekly) for automatic refresh
Use schedules for content that changes often, like active docs portals or live spreadsheets. For static content, a one-time manual sync is usually enough.
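To make the schedule options concrete, here is a minimal sketch of how a next run time could be derived from a frequency. The `INTERVALS` table and `next_run` function are hypothetical; how Scout actually schedules jobs is not described in this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed intervals for the three schedule options named above.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_run(last_run: datetime, frequency: Optional[str]) -> Optional[datetime]:
    """Return the next scheduled run, or None for a manual-only source."""
    if frequency is None:
        return None  # no schedule: the source only runs when triggered manually
    return last_run + INTERVALS[frequency]


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
upcoming = next_run(last, "daily")
```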
Monitoring and Re-runs
From the Sources panel, you can:
- View run status and sync history
- Inspect errors and logs
- Edit source configuration
- Re-run failed or completed jobs
If a sync fails partway through, check the error log — common causes are permission issues, changed URLs or revoked API access. Fix the root cause and re-run.
Best Practices
- Start with a small test sync before large runs
- Keep column mappings explicit and stable — renaming columns after syncing can break mapping and leave fields empty
- Use schedules only where freshness matters
- Review failed runs regularly and fix mapping drift quickly
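One way to catch mapping drift early is to compare a source's mapping against the table's current columns before a large run. The sketch below assumes you can obtain both as plain Python values; `find_mapping_drift` is a hypothetical helper, not a Scout function.

```python
from typing import Dict, List


def find_mapping_drift(mapping: Dict[str, str], columns: List[str]) -> List[str]:
    """Return mapped target columns that no longer exist in the table."""
    existing = set(columns)
    return sorted({column for column in mapping.values() if column not in existing})


# Example: the "content" column was renamed to "body" after the mapping was set up.
mapping = {"title": "title", "content": "content", "url": "url"}
columns = ["title", "body", "url"]
missing = find_mapping_drift(mapping, columns)
```

A non-empty result means at least one mapped field would be left empty on the next sync.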
Next Steps
- Web Scraping: Configure crawl settings and extraction options
- Notion: Connect and sync Notion pages and databases
- Google Sheets: Sync spreadsheet rows into table documents
- Creating Collections: Design tables for source data
- Querying Data: Search synced content with semantic and hybrid search