
Sources

Sources keep your tables up to date by syncing external data into Collections. Use them when you want repeatable ingestion — rather than manually uploading or entering documents each time your source data changes.

What Sources Do

Each source runs a sync job that:

  1. Pulls data from an external system
  2. Maps that data to your table columns
  3. Creates or updates documents in the destination table

You can run sources manually or on a schedule.

Source Types

Scout supports the following source types, and a single table can combine sources of several types:

Source Type     What It Pulls                                        Best For
Web Scrape      Website pages via a single URL, a crawl or sitemap   Public docs, help centers, blogs
Notion          Notion pages and databases                           Internal knowledge bases and team wikis
Google Sheets   Rows from spreadsheets                               Operational data and structured lists

Create a Source

  1. Open your Collection and select a table
  2. Click Sources
  3. Click Add Source
  4. Choose a source type
  5. Configure mapping and frequency
  6. Run the first sync

How Syncs Work

When a sync runs, Scout compares incoming data against documents already in the table:

  • New items become new documents
  • Existing items are updated in place (upsert by source ID)
  • Items removed from the source are not automatically deleted from the table

This means your table grows over time unless you manually remove stale documents. If you want the table to mirror the source exactly, delete the table contents and run a fresh sync.

Scout deduplicates by matching on the source item’s unique identifier — for example, the URL for web pages or the row ID for Google Sheets. Running the same sync twice won’t create duplicates.
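The sync rules above can be modeled in a few lines. The following is a minimal Python sketch, not Scout's implementation: it assumes documents live in a plain dict keyed by source ID, and the names `sync`, `SyncResult` and `id_field` are hypothetical. It shows why running the same sync twice doesn't create duplicates and why items removed upstream linger in the table.

```python
from dataclasses import dataclass, field


# Hypothetical model of a sync run; not Scout's actual implementation.
@dataclass
class SyncResult:
    created: list[str] = field(default_factory=list)
    updated: list[str] = field(default_factory=list)
    stale: list[str] = field(default_factory=list)  # in the table but no longer in the source


def sync(table: dict[str, dict], incoming: list[dict], id_field: str) -> SyncResult:
    """Upsert incoming items into `table`, keyed by each item's source ID."""
    result = SyncResult()
    seen: set[str] = set()
    for item in incoming:
        source_id = item[id_field]  # e.g. the URL for web pages
        seen.add(source_id)
        if source_id in table:
            table[source_id].update(item)  # existing item: update in place
            result.updated.append(source_id)
        else:
            table[source_id] = dict(item)  # new item: new document
            result.created.append(source_id)
    # Items removed upstream are NOT deleted; they are only reported.
    result.stale = [doc_id for doc_id in table if doc_id not in seen]
    return result


table: dict[str, dict] = {}
pages = [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/b", "title": "B"},
]
first = sync(table, pages, id_field="url")
second = sync(table, pages, id_field="url")  # same data again: updates, no duplicates
third = sync(table, pages[:1], id_field="url")  # page b was removed upstream
print(len(table), second.created, third.stale)  # 2 [] ['https://example.com/b']
```

Reporting stale IDs instead of deleting them mirrors the documented behavior: the document for page b stays in the table until you remove it yourself or clear the table and run a fresh sync.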

Source Mapping

Each source returns different fields. You map those fields to columns in your table during setup.

Typical mappings:

  • title -> title
  • content -> content
  • url -> url
  • updated_at -> updated_at

If you’re syncing long-form text for retrieval, map the main body into your content column — that’s the field Scout embeds for semantic search.

If the source doesn’t include a field your table expects, that column stays empty for synced documents. You can fill gaps manually or add a second source that covers the missing data.
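To make the mapping step concrete, here is a minimal Python sketch of applying an explicit field-to-column mapping. The `MAPPING` dict, the extra `owner` column and `map_record` are hypothetical illustrations, not Scout's configuration format. Unmapped columns come out empty, and a mapping that targets a column the table no longer has writes nothing, which is how renaming a column after syncing leaves fields empty.

```python
# Hypothetical mapping config: source field -> table column.
MAPPING = {
    "title": "title",
    "content": "content",
    "url": "url",
    "updated_at": "updated_at",
}
# Hypothetical table schema; "owner" has no matching source field.
TABLE_COLUMNS = ["title", "content", "url", "updated_at", "owner"]


def map_record(record: dict, mapping: dict[str, str], columns: list[str]) -> dict:
    """Build one table row from one source record via an explicit mapping.

    Source fields not in the mapping (e.g. "author_email") are dropped.
    """
    row = {column: None for column in columns}  # every column starts empty
    for source_field, column in mapping.items():
        # Skip missing source fields and targets that aren't (or are no longer) table columns.
        if source_field in record and column in row:
            row[column] = record[source_field]
    return row


page = {
    "title": "Getting started",
    "content": "Welcome to the docs.",
    "url": "https://example.com/start",
    "updated_at": "2024-01-01",
    "author_email": "someone@example.com",
}
row = map_record(page, MAPPING, TABLE_COLUMNS)
print(row["content"], row["owner"])  # Welcome to the docs. None

# If "content" is renamed to "body" but MAPPING isn't updated, "body" stays empty.
renamed_columns = ["title", "body", "url", "updated_at", "owner"]
stale_row = map_record(page, MAPPING, renamed_columns)
print(stale_row["body"])  # None
```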

Sync Frequency

Frequency is optional. You can:

  • Run once manually
  • Set a recurring schedule (hourly, daily, weekly) for automatic refresh

Use schedules for content that changes often, like active docs portals or live spreadsheets. For static content, a one-time manual sync is usually enough.

Monitoring and Re-runs

From the Sources panel, you can:

  • View run status and sync history
  • Inspect errors and logs
  • Edit source configuration
  • Re-run failed or completed jobs

If a sync fails partway through, check the error log — common causes are permission issues, changed URLs or revoked API access. Fix the root cause and re-run.

Best Practices

  • Start with a small test sync before large runs
  • Keep column mappings explicit and stable — renaming columns after syncing can break mapping and leave fields empty
  • Use schedules only where freshness matters
  • Review failed runs regularly and fix mapping drift quickly


Built with ❤️ by Scout OS
