What Sources Do
Each Source runs a sync job that:
- Pulls content from an external system using your configured credentials and settings.
- Maps the incoming fields to the columns in your destination table.
- Creates new documents or updates existing ones — an upsert based on each item’s unique source identifier.
Supported Sources
Scout supports a broad set of integrations so you can pull data from wherever your content already lives.
Web Scraping
Crawl a website starting from a single URL. Scout follows internal links up to a configurable depth and extracts page content automatically.
Sitemap
Provide an XML sitemap URL and Scout fetches every listed page. Ideal for documentation portals with an existing sitemap.
Notion
Sync Notion pages and databases. Connect via OAuth or an integration token and select the pages or database records to include.
Google Sheets
Sync rows from a Google Sheets spreadsheet. Each row becomes a document; column headers map to table fields.
Google Drive
Pull documents and files directly from a Google Drive folder. Supports Docs, Sheets, and PDF files.
Microsoft 365 / SharePoint
Connect a Microsoft 365 tenant and sync content from SharePoint sites, document libraries, or Teams wikis.
OneDrive
Sync files from a personal or business OneDrive account. Supports Word documents, PDFs, and text files.
Laserfiche
Pull documents from a Laserfiche repository into a Scout Database for AI-powered search and retrieval.
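To make the Web Scraping source's "configurable depth" concrete, here is a minimal sketch of a depth-limited, same-host crawl. It is an illustration of the general technique, not Scout's actual crawler: the `crawl` function, the `SITE` link graph, and the `docs.example.com` URLs are all hypothetical, and links are read from an in-memory dictionary instead of fetched pages.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site": page URL -> links found on that page.
SITE = {
    "https://docs.example.com/": ["/guide", "/api", "https://other.example.org/"],
    "https://docs.example.com/guide": ["/guide/install", "/"],
    "https://docs.example.com/api": [],
    "https://docs.example.com/guide/install": ["/guide/install/linux"],
    "https://docs.example.com/guide/install/linux": [],
}


def crawl(start_url, max_depth, get_links):
    """Breadth-first crawl that stays on the start URL's host and stops at max_depth."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for href in get_links(url):
            target = urljoin(url, href)  # resolve relative links against the page URL
            if urlparse(target).netloc != host or target in seen:
                continue  # skip external links and pages already queued
            seen.add(target)
            queue.append((target, depth + 1))
    return visited


pages = crawl("https://docs.example.com/", 2, lambda url: SITE.get(url, []))
```

With a depth of 2, the sketch visits the start page, the pages it links to, and one further level; the `linux` page sits three links away and is skipped, as is the external host.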
How Syncs Work
When a sync runs, Scout compares the incoming data against what is already in your table:
- New items become new documents, embedded and indexed immediately.
- Existing items are updated in place — the document is overwritten using its source ID as the match key.
- Items removed from the source are left in the table unchanged. Delete them manually if your use case requires exact mirroring.
Running the same sync twice is safe. Scout deduplicates on the source item’s unique identifier (URL, row ID, page ID), so repeated runs do not create duplicate documents.
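The reconciliation rules above can be sketched as a small upsert keyed on the source identifier. This is an assumed model for illustration only: the `run_sync` function, the `source_id` field name, and the dictionary standing in for a table are hypothetical and do not reflect Scout's internals.

```python
def run_sync(table, incoming):
    """Upsert incoming items into the table, keyed by source ID, and return counts."""
    created = updated = 0
    for item in incoming:
        key = item["source_id"]  # e.g. a URL, row ID, or page ID
        if key in table:
            updated += 1
        else:
            created += 1
        table[key] = dict(item)  # overwrite in place; never append a duplicate
    # Items missing from `incoming` are deliberately left untouched.
    return {"created": created, "updated": updated}


table = {}
first = run_sync(table, [
    {"source_id": "https://example.com/a", "title": "A"},
    {"source_id": "https://example.com/b", "title": "B"},
])
# Second run: "a" was edited at the source and "b" was removed from it.
second = run_sync(table, [
    {"source_id": "https://example.com/a", "title": "A (edited)"},
])
```

After the second run the table still holds two documents: "a" carries the edited title, and "b" remains because removals at the source are not mirrored.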
Configuring a Source
Open your Database and select a Table
Navigate to Databases in Scout, open the Database you want to populate, and click the target Table.
Authenticate and configure
Complete the OAuth flow or paste your credentials for the chosen integration. Then configure source-specific settings such as:
- Web Scraping — starting URL, crawl depth, URL filters
- Sitemap — sitemap URL
- Notion — database or page selector
- Google Sheets — spreadsheet ID and sheet name
- SharePoint / OneDrive — tenant, site, and library
Map fields to columns
Review the field mapping screen. Match the incoming fields from the source to your table’s columns. See Source Mapping below for guidance.
Set sync frequency
Choose a schedule or leave it as manual-only. See Sync Frequency for recommendations.
Source Mapping
Each source produces different fields. During setup you map those fields to columns in your table. Common mappings:
| Source Field | Maps To | Notes |
|---|---|---|
| title | title (Single Line Text) | Article or page title |
| body / content | content (Multi Line Text) | Main searchable text; must be mapped here for embeddings |
| url / source_url | url (URL) | Original URL for attribution |
| last_modified | updated_at (Number) | Unix timestamp of the last change |
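As a rough sketch of what such a mapping does, the snippet below turns one source item into a table row using the field names from the table above. The `FIELD_MAP` structure and the `map_item` and `to_unix` helpers are hypothetical; in Scout the mapping is configured on the field mapping screen, and the ISO 8601 handling here is an assumption about what a source might return.

```python
from datetime import datetime, timezone

# Hypothetical mapping: table column -> candidate source field names, in priority order.
FIELD_MAP = {
    "title": ["title"],
    "content": ["body", "content"],
    "url": ["url", "source_url"],
    "updated_at": ["last_modified"],
}


def to_unix(value):
    """Accept a Unix timestamp or an ISO 8601 string and return whole seconds."""
    if isinstance(value, (int, float)):
        return int(value)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return int(parsed.timestamp())


def map_item(item):
    """Build a table row from a source item using the first matching source field."""
    row = {}
    for column, candidates in FIELD_MAP.items():
        for name in candidates:
            if item.get(name) is not None:
                row[column] = item[name]
                break
    if "updated_at" in row:
        row["updated_at"] = to_unix(row["updated_at"])
    if not row.get("content"):
        # Without text in the content column there is nothing to embed.
        raise ValueError("no body/content field to map to the content column")
    return row


row = map_item({
    "title": "Install guide",
    "body": "Run the installer.",
    "source_url": "https://example.com/install",
    "last_modified": "2024-01-01T00:00:00+00:00",
})
```

Failing loudly when no text field is present is a design choice for this sketch: it surfaces a bad mapping during a test sync instead of producing rows that cannot be searched.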
Sync Frequency
| Schedule | When to Use |
|---|---|
| Manual only | Static content that rarely changes — a one-time import of archived articles |
| Hourly | Live support docs or spreadsheets that update throughout the day |
| Daily | Documentation portals or Notion wikis updated a few times a week |
| Weekly | Reference content that changes infrequently |
Triggering a Manual Sync
You can re-run any Source at any time without waiting for the scheduled window:
- Open the Sources panel for your table.
- Find the Source you want to run.
- Click Run Now.
Monitoring Sync Status
The Sources panel shows the current and historical state of every sync job:
| Status | Meaning |
|---|---|
| Running | The sync is actively fetching and ingesting data. |
| Completed | The sync finished successfully. The timestamp and record count are shown. |
| Failed | The sync encountered an error. Open the error log to see details. |
| Scheduled | The sync is queued for the next scheduled window. |
From the Sources panel you can also:
- View run history and record counts
- Inspect error messages and stack traces for failed runs
- Edit source credentials or mapping configuration
- Re-run failed or completed jobs
Common Failure Causes
- Permission changes — The OAuth token or integration key was revoked. Re-authenticate the source.
- Changed URLs — The starting URL or sitemap location moved. Update the source configuration.
- Rate limits — The external system throttled Scout’s requests. Re-run the sync after a short wait.
- Mapping drift — A column was renamed after the source was configured. Update the field mapping to match the new column name.
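For rate limits in particular, "re-run after a short wait" is usually done with exponential backoff. The sketch below shows that pattern in isolation. Everything in it is a stand-in: `RateLimited`, `run_with_backoff`, and `flaky_sync` are hypothetical names, and the job is a fake callable, since this page does not describe a programmatic way to trigger a sync.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised when the external system throttles a request."""


def run_with_backoff(job, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call job(); on RateLimited, wait base_delay, then 2x, 4x, ... before retrying."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except RateLimited:
            if attempt == max_attempts:
                raise  # give up and surface the failure after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Fake job that is throttled twice and then succeeds.
calls = {"n": 0}


def flaky_sync():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("throttled")
    return "Completed"


waits = []  # record the delays instead of actually sleeping
result = run_with_backoff(flaky_sync, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; in real use the default `time.sleep` applies the waits.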
Best Practices
- Start small. Test a Source on a subset of content before running a full ingestion. Create a temporary table, sync a sample, and verify field mapping and content quality before pointing the source at your production table.
- Keep column names stable. Schema changes after a sync are disruptive. Design your columns up front and avoid renames.
- Schedule only what needs freshness. For static content, a one-time manual sync is sufficient and uses fewer resources.
- Review failures quickly. Mapping drift and revoked credentials are the most common failure modes. Check the Sources panel regularly and fix issues before they go unnoticed for multiple sync cycles.
- Use the content column. Always map the primary text body to a column named content so Scout’s automatic embedding pipeline picks it up correctly.
Next Steps
Creating Databases
Design your table schema before configuring a Source.
Querying Data
Search synced content with semantic, keyword, and hybrid modes.
Databases Overview
Understand the full Databases model and when to use it.