Configuring AWS S3 (CSV) as a Source
In the Sources tab, click the “Add source” button at the top right of your screen. Then select the AWS S3 (CSV) option from the list of connectors. Click Next and you’ll be prompted to add your access credentials.

1. Add account access
You’ll need to provide the following credentials to connect to AWS S3:

- Bucket Name: The name of your AWS S3 bucket where the CSV files are stored.
- Stream Name: A unique identifier for the stream (e.g., sales_data). Must start with a letter and contain only lowercase letters, numbers, and underscores.
- Delimiter: The character that separates fields in your CSV files (default: ,). Common options:
  - Comma (,) for standard CSV files
  - Semicolon (;) for some European formats
  - Tab (\t) for TSV files
- S3 Folder Path (optional): The path within your bucket where the files are located (e.g., raw_data/sales/). Leave empty if files are in the bucket’s root.
- File Search Pattern: A regex pattern to match the files you want to process (default: .*). Examples:
  - .* matches all files
  - .*\.csv$ matches only files ending in .csv
  - sales_.*\.csv$ matches CSV files starting with “sales_”
  - 2024-.*\.csv$ matches CSV files from 2024
- Unique Key Columns: The column(s) that uniquely identify each row in your CSV files (e.g., id or order_number). This helps with deduplication and incremental syncs.
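If you are unsure whether a file search pattern will match the right objects, you can try it locally with Python’s re module before saving the source. The object keys below are made up for illustration:

```python
import re

# Hypothetical object keys as they might appear in your bucket
keys = [
    "raw_data/sales/sales_2024-01.csv",
    "raw_data/sales/2024-02-01-sales.csv",
    "raw_data/sales/notes.txt",
]

# The File Search Pattern is applied as a regular expression
pattern = re.compile(r".*\.csv$")
matched = [k for k in keys if pattern.match(k)]
print(matched)  # the .txt file is filtered out
```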
2. Select streams
Choose which data streams you want to sync; you can select all streams or pick only the ones that matter most to you.

Tip: you can find a stream more easily by typing its name.

Select the streams and click Next.
3. Configure data streams
Customize how you want your data to appear in your catalog. Select a name for each table (which will contain the fetched data) and the type of sync.

- Table name: we suggest a name, but feel free to customize it. You also have the option to add a prefix to speed this process up!
- Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
  - Incremental: every time the extraction happens, we’ll fetch only the files that were added or modified since the last extraction.
  - Full table: every time the extraction happens, we’ll process all matching files from the bucket.
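The difference between the two sync types can be sketched as follows. This is a minimal illustration, assuming the connector tracks a last-run timestamp and compares it against each object’s last-modified time; all names and dates are made up:

```python
from datetime import datetime, timezone

# Hypothetical bucket listing with last-modified times
objects = [
    {"Key": "sales_2024-01.csv", "LastModified": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"Key": "sales_2024-02.csv", "LastModified": datetime(2024, 2, 5, tzinfo=timezone.utc)},
]

last_sync = datetime(2024, 1, 31, tzinfo=timezone.utc)

# INCREMENTAL: only files added or modified since the last extraction
incremental = [o["Key"] for o in objects if o["LastModified"] > last_sync]

# FULL_TABLE: every matching file, every run
full_table = [o["Key"] for o in objects]

print(incremental)  # only the February file
print(full_table)   # both files
```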
4. Configure data source
Describe your data source for easy identification within your organization, using no more than 140 characters. To define your Trigger, consider how often you want data to be extracted from this source. This usually depends on how frequently you need the table data updated (every day, once a week, or only at specific times). Optionally, you can schedule a periodic full sync. This complements the incremental extractions, ensuring that your data is fully synchronized with your source every once in a while. Once you are ready, click Next to finalize the setup.

5. Check your new source
You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking the arrow button. Once it runs, your data will appear in your Catalog.

Additional Information
Best Practices
File Organization
- Keep related CSV files in dedicated folders
- Use consistent naming patterns for your files
- Consider using date-based prefixes for time-series data (e.g., YYYY-MM-DD-sales.csv)
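A small helper for producing date-prefixed names in the YYYY-MM-DD style suggested above (the function name is illustrative, not part of the connector):

```python
from datetime import date

def dated_filename(day: date, name: str = "sales") -> str:
    # An ISO-format date prefix keeps time-series files sorted chronologically
    return f"{day.isoformat()}-{name}.csv"

print(dated_filename(date(2024, 3, 1)))  # 2024-03-01-sales.csv
```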
CSV Structure
- Ensure consistent column names across files
- Use appropriate data types for each column
- Avoid special characters in column names
- Include a header row in all CSV files
Performance Optimization
- Use compression (e.g., .gz) for large files
- Set appropriate file search patterns to limit unnecessary file scanning
- Choose unique key columns that have good cardinality
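To illustrate the compression tip, a gzipped CSV can be produced and read back with Python’s standard library alone. This sketch uses an in-memory buffer instead of a real file:

```python
import csv
import gzip
import io

buf = io.BytesIO()

# Write a small CSV compressed with gzip into the buffer
with gzip.open(buf, "wt", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "amount"])
    writer.writerow(["1", "19.99"])

# Read it back; gzip decompresses transparently
buf.seek(0)
with gzip.open(buf, "rt", newline="") as f:
    rows = list(csv.reader(f))

print(rows)  # [['id', 'amount'], ['1', '19.99']]
```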
Troubleshooting
File Not Found
- Verify the bucket name is correct
- Check the file search pattern
- Ensure the S3 folder path is correct
- Confirm file permissions
Parsing Errors
- Verify the correct delimiter is specified
- Check for special characters or quotes in the CSV
- Ensure consistent column counts across rows
- Look for hidden characters (BOM, etc.)
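When debugging parsing errors locally, Python’s standard library can check for several of the issues above at once: a leading UTF-8 BOM, the delimiter in use, and consistent column counts. The sample bytes here are made up:

```python
import codecs
import csv
import io

# Sample file contents: a UTF-8 BOM followed by semicolon-delimited rows
raw = codecs.BOM_UTF8 + b"id;name\n1;Alice\n"

# A BOM at the start of the file is a common hidden-character culprit
has_bom = raw.startswith(codecs.BOM_UTF8)

# Decoding with utf-8-sig strips the BOM if present
text = raw.decode("utf-8-sig")

# Parse with the delimiter the file actually uses (here, a semicolon)
rows = list(csv.reader(io.StringIO(text), delimiter=";"))

# Verify that every row has the same number of columns
consistent = len({len(r) for r in rows}) == 1
print(has_bom, consistent, rows[0])
```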
Authentication Issues
- Verify AWS credentials are correct
- Check bucket permissions
- Ensure the credentials have not expired
- Confirm the bucket region
Performance Issues
- Review file search patterns for efficiency
- Check file sizes and consider splitting large files
- Monitor AWS request quotas
- Consider using compression for large files