Sources eliminate the manual work of keeping your Databases current. Instead of re-uploading content every time your documentation updates or your spreadsheet changes, you configure a Source once and Scout handles ingestion automatically — on a schedule or whenever you trigger a manual run. This is the recommended approach for any Database that powers a live agent.

What Sources Do

Each Source runs a sync job that:
  1. Pulls content from an external system using your configured credentials and settings.
  2. Maps the incoming fields to the columns in your destination table.
  3. Creates new documents or updates existing ones — an upsert based on each item’s unique source identifier.
Sources are non-destructive by default: items removed from the source are not automatically deleted from your table. If you want your table to mirror the source exactly, clear the table contents and trigger a fresh sync.

Scout deduplicates by matching the source item’s unique identifier: the URL for web pages, the row ID for spreadsheet rows, the page ID for Notion. Running the same sync twice does not create duplicate records.
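
The upsert behavior described above can be sketched in a few lines. This is only an illustration, not Scout's implementation: the `Table` class, the `sync` function, and the `source_id` key are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical in-memory stand-in for a table, keyed by source ID."""
    docs: dict[str, dict] = field(default_factory=dict)


def sync(table: Table, source_items: list[dict]) -> dict[str, int]:
    """Upsert every source item; never delete rows missing from the source."""
    counts = {"created": 0, "updated": 0}
    for item in source_items:
        source_id = item["source_id"]  # e.g. a page URL, row ID, or page ID
        if source_id in table.docs:
            counts["updated"] += 1
        else:
            counts["created"] += 1
        table.docs[source_id] = item  # overwrite in place: the upsert
    return counts


table = Table()
pages = [
    {"source_id": "https://example.com/a", "title": "A"},
    {"source_id": "https://example.com/b", "title": "B"},
]
first = sync(table, pages)       # both items are new
second = sync(table, pages)      # same input again: updates only, no duplicates
third = sync(table, pages[:1])   # "b" left the source but stays in the table
```

After the third run the table still holds two documents, which is the non-destructive default in miniature.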

Supported Sources

Scout supports a broad set of integrations so you can pull data from wherever your content already lives.

Web Scraping

Crawl a website starting from a single URL. Scout follows internal links up to a configurable depth and extracts page content automatically.
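
To make "configurable depth" concrete, here is a minimal breadth-first sketch over a hypothetical in-memory site (the `SITE` dictionary stands in for fetched pages, so nothing touches the network). How Scout actually counts depth or filters links is not documented here; the sketch assumes depth means the number of link hops from the starting URL and that only same-host links are followed.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical site: each page URL maps to the hrefs found on that page.
SITE = {
    "https://docs.example.com/": ["/guide", "/api", "https://other.example.org/"],
    "https://docs.example.com/guide": ["/guide/install", "/"],
    "https://docs.example.com/api": [],
    "https://docs.example.com/guide/install": ["/guide/install/linux"],
    "https://docs.example.com/guide/install/linux": [],
}


def crawl(start_url: str, max_depth: int) -> list[str]:
    """Breadth-first crawl that follows same-host links up to max_depth hops."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    order = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # do not expand links beyond the configured depth
        for href in SITE.get(url, []):
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).netloc != host or target in seen:
                continue  # skip external links and pages already queued
            seen.add(target)
            queue.append((target, depth + 1))
    return order


shallow = crawl("https://docs.example.com/", max_depth=1)
deeper = crawl("https://docs.example.com/", max_depth=2)
```

With a depth of 1 the crawl stops at the pages linked from the start page; a depth of 2 also reaches `/guide/install` but not the page below it.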

Sitemap

Provide an XML sitemap URL and Scout fetches every listed page. Ideal for documentation portals with an existing sitemap.
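
As a rough illustration of what "every listed page" means, the sketch below extracts the `<loc>` entries from a sitemap using Python's standard library. The sample XML and the `sitemap_urls` helper are made up for the example; the namespace is the standard one from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://docs.example.com/guide</loc></url>
</urlset>"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the page URLs listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


urls = sitemap_urls(SAMPLE)
```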

Notion

Sync Notion pages and databases. Connect via OAuth or an integration token and select the pages or database records to include.

Google Sheets

Sync rows from a Google Sheets spreadsheet. Each row becomes a document; column headers map to table fields.
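
The row-to-document idea can be sketched with a CSV export standing in for the sheet. The sample data, the `rows_to_documents` helper, and the use of the spreadsheet row number as `row_id` are assumptions made for illustration; Scout's actual row identifier may differ.

```python
import csv
import io

# Hypothetical sheet content, as it might look when exported to CSV.
SHEET = """title,content,url
Reset password,Go to Settings and choose Reset.,https://example.com/reset
Billing FAQ,Invoices are issued monthly.,https://example.com/billing
"""


def rows_to_documents(csv_text: str) -> list[dict]:
    """Turn each data row into a document keyed by the column headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader, start=2):  # row 1 holds the headers
        documents.append({"row_id": row_number, **row})
    return documents


documents = rows_to_documents(SHEET)
```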

Google Drive

Pull documents and files directly from a Google Drive folder. Supports Docs, Sheets, and PDF files.

Microsoft 365 / SharePoint

Connect a Microsoft 365 tenant and sync content from SharePoint sites, document libraries, or Teams wikis.

OneDrive

Sync files from a personal or business OneDrive account. Supports Word documents, PDFs, and text files.

Laserfiche

Pull documents from a Laserfiche repository into a Scout Database for AI-powered search and retrieval.

How Syncs Work

When a sync runs, Scout compares the incoming data against what is already in your table:
  • New items become new documents, embedded and indexed immediately.
  • Existing items are updated in place — the document is overwritten using its source ID as the match key.
  • Items removed from the source are left in the table unchanged. Delete them manually if your use case requires exact mirroring.
Running the same sync twice is safe. Scout deduplicates on the source item’s unique identifier (URL, row ID, page ID), so repeated runs do not create duplicate documents.
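
If you do need exact mirroring, the three cases above reduce to set arithmetic on source IDs. The following sketch uses a hypothetical `plan_sync` helper, not a Scout function; it shows which IDs a sync would create or update, and which are left behind and would need manual deletion.

```python
def plan_sync(table_ids: set[str], source_ids: set[str]) -> dict[str, set[str]]:
    """Classify source IDs by what a sync would do with them."""
    return {
        "create": source_ids - table_ids,    # new items become new documents
        "update": source_ids & table_ids,    # existing items are overwritten
        "orphaned": table_ids - source_ids,  # left in the table; delete manually
    }


in_table = {"page-1", "page-2", "page-3"}
in_source = {"page-2", "page-3", "page-4"}
plan = plan_sync(in_table, in_source)
```

Here `page-1` is the only orphan: it no longer exists in the source but remains in the table until you remove it.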

Configuring a Source

  1. Open your Database and select a Table. Navigate to Databases in Scout, open the Database you want to populate, and click the target Table.
  2. Open the Sources panel. Click the Sources tab at the top of the table view.
  3. Add a Source. Click Add Source and choose the source type from the list of integrations.
  4. Authenticate and configure. Complete the OAuth flow or paste your credentials for the chosen integration. Then configure source-specific settings such as:
     • Web Scraping: starting URL, crawl depth, URL filters
     • Sitemap: sitemap URL
     • Notion: database or page selector
     • Google Sheets: spreadsheet ID and sheet name
     • SharePoint / OneDrive: tenant, site, and library
  5. Map fields to columns. Review the field mapping screen. Match the incoming fields from the source to your table’s columns. See Source Mapping below for guidance.
  6. Set sync frequency. Choose a schedule or leave it as manual-only. See Sync Frequency for recommendations.
  7. Run the first sync. Click Run Now to trigger the initial ingestion. Scout fetches the data, maps it, and populates the table. Watch the progress in the Sources panel.

Source Mapping

Each source produces different fields. During setup you map those fields to columns in your table. Common mappings:
Source Field | Maps To | Notes
title | title (Single Line Text) | Article or page title
body / content | content (Multi Line Text) | Main searchable text; must be mapped here for embeddings
url / source_url | url (URL) | Original URL for attribution
last_modified | updated_at (Number) | Unix timestamp of the last change
Always map the main body text to your content column. This is the field Scout embeds for semantic search. If you map it to a different column, vector search won’t work as expected.
If the source doesn’t include a field your table expects, that column remains empty for synced documents. You can fill the gap manually afterward or combine a second Source that covers the missing data.
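
A field mapping is essentially a lookup from source field names to column names. The sketch below is a simplified model rather than Scout's mapping engine: `TABLE_COLUMNS`, `FIELD_MAPPING`, and `apply_mapping` are hypothetical names. It shows how a column stays empty when the source item lacks the mapped field.

```python
TABLE_COLUMNS = ["title", "content", "url", "updated_at"]

# Hypothetical mapping: source field name -> table column name.
FIELD_MAPPING = {
    "title": "title",
    "body": "content",
    "source_url": "url",
    "last_modified": "updated_at",
}


def apply_mapping(item: dict, mapping: dict[str, str], columns: list[str]) -> dict:
    """Build a document with one key per column; unmapped columns stay None."""
    document = {column: None for column in columns}
    for source_field, column in mapping.items():
        if source_field in item and column in document:
            document[column] = item[source_field]
    return document


item = {
    "title": "Install guide",
    "body": "Download the installer and run it.",
    "source_url": "https://docs.example.com/install",
    # no last_modified field in this source item
}
document = apply_mapping(item, FIELD_MAPPING, TABLE_COLUMNS)
```

The resulting document has `content` filled from `body`, while `updated_at` remains empty because the source item did not provide `last_modified`.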

Sync Frequency

Schedule | When to Use
Manual only | Static content that rarely changes, such as a one-time import of archived articles
Hourly | Live support docs or spreadsheets that update throughout the day
Daily | Documentation portals or Notion wikis updated a few times a week
Weekly | Reference content that changes infrequently
Use scheduled syncs only for content where freshness actually matters. Unnecessary syncs consume ingestion quota and slow down other operations.
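
To get a feel for how quickly schedules add up, this small calculation counts the runs each schedule would trigger in a 30-day window. The intervals are inferred from the schedule names, and `runs_per_window` is a hypothetical helper; how Scout meters ingestion quota per run is not specified here.

```python
from datetime import timedelta

# Intervals inferred from the schedule names in the table above.
INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Weekly": timedelta(weeks=1),
}


def runs_per_window(schedule: str, window: timedelta = timedelta(days=30)) -> int:
    """Number of scheduled runs that fit in the window (0 for manual-only)."""
    if schedule == "Manual only":
        return 0
    return window // INTERVALS[schedule]


hourly = runs_per_window("Hourly")   # 720
daily = runs_per_window("Daily")     # 30
weekly = runs_per_window("Weekly")   # 4
```

An hourly schedule runs 24 times as often as a daily one, so it is worth reserving for content that really changes throughout the day.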

Triggering a Manual Sync

You can re-run any Source at any time without waiting for the scheduled window:
  1. Open the Sources panel for your table.
  2. Find the Source you want to run.
  3. Click Run Now.
Scout queues the job immediately. Progress and any errors appear in the sync history log.

Monitoring Sync Status

The Sources panel shows the current and historical state of every sync job:
Status | Meaning
Running | The sync is actively fetching and ingesting data.
Completed | The sync finished successfully. The timestamp and record count are shown.
Failed | The sync encountered an error. Open the error log to see details.
Scheduled | The sync is queued for the next scheduled window.
From the Sources panel you can:
  • View run history and record counts
  • Inspect error messages and stack traces for failed runs
  • Edit source credentials or mapping configuration
  • Re-run failed or completed jobs
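
If you track run history outside Scout (for example, by noting statuses from the panel), a simple check for repeated failures might look like the sketch below. The list-of-statuses format and the `consecutive_failures` helper are assumptions for illustration; Scout does not necessarily expose its history in this shape.

```python
def consecutive_failures(statuses: list[str]) -> int:
    """Count failed runs at the end of the history (most recent run last)."""
    count = 0
    for status in reversed(statuses):
        if status != "Failed":
            break
        count += 1
    return count


history = ["Completed", "Completed", "Failed", "Failed"]
streak = consecutive_failures(history)
needs_attention = streak >= 2  # hypothetical threshold for raising an alert
```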

Common Failure Causes

  • Permission changes — The OAuth token or integration key was revoked. Re-authenticate the source.
  • Changed URLs — The starting URL or sitemap location moved. Update the source configuration.
  • Rate limits — The external system throttled Scout’s requests. Re-run the sync after a short wait.
  • Mapping drift — A column was renamed after the source was configured. Update the field mapping to match the new column name.
Renaming a table column after configuring a Source breaks the field mapping for that column. The column stays empty on subsequent syncs until you update the mapping to reference the new column name.
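
Mapping drift can be caught by comparing the columns a mapping targets against the columns the table currently has. The following is a minimal sketch with hypothetical names (`find_mapping_drift`, the sample mapping, and the renamed column `body_text`), not a built-in Scout check.

```python
def find_mapping_drift(mapping: dict[str, str], table_columns: list[str]) -> dict[str, str]:
    """Return mapping entries whose target column no longer exists in the table."""
    columns = set(table_columns)
    return {
        source_field: column
        for source_field, column in mapping.items()
        if column not in columns
    }


mapping = {"title": "title", "body": "content", "source_url": "url"}
# Suppose the "content" column was renamed to "body_text" after setup.
current_columns = ["title", "body_text", "url"]
drift = find_mapping_drift(mapping, current_columns)
```

In this example `drift` contains the `body` field, whose target column `content` is gone, which is the entry to update in the field mapping.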

Best Practices

  • Start small. Test a Source on a subset of content before running a full ingestion. Create a temporary table, sync a sample, and verify field mapping and content quality before pointing the source at your production table.
  • Keep column names stable. Schema changes after a sync are disruptive. Design your columns up front and avoid renames.
  • Schedule only what needs freshness. For static content, a one-time manual sync is sufficient and uses fewer resources.
  • Review failures quickly. Mapping drift and revoked credentials are the most common failure modes. Check the Sources panel regularly and fix issues before they go unnoticed for multiple sync cycles.
  • Use the content column. Always map the primary text body to a column named content so Scout’s automatic embedding pipeline picks it up correctly.

Next Steps

Creating Databases

Design your table schema before configuring a Source.

Querying Data

Search synced content with semantic, keyword, and hybrid modes.

Databases Overview

Understand the full Databases model and when to use it.