
Configuring GitHub as a Source
In the Sources tab, click on the “Add source” button located on the top right of your screen. Then, select the GitHub option from the list of connectors. Click Next and you’ll be prompted to add your access.
1. Add account access
You’ll need a GitHub Personal Access Token (classic or fine-grained) with permission to read the repositories you want to sync. The following configurations are available:
- Personal Access Token: Your GitHub Personal Access Token (classic or fine-grained) with repo scope. This field is required and stored securely.
- Repositories: Optional list of repositories in owner/repo format (one per line). If provided, only these repositories are synced. If left empty, the connector syncs all repositories accessible by the token.
- Start Date: Optional starting point used by incremental commit syncs. Only sync records created or updated after this date.
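Before pasting a long list into the Repositories field, it can help to check that every line has the owner/repo shape. The sketch below is a minimal local check, not part of the connector: parse_repositories is a hypothetical helper, and the character set in the pattern is an assumption that only approximates the names GitHub accepts.

```python
import re

# Hypothetical local check; the connector's own validation may differ.
# Assumption: owner and repo consist of letters, digits, ".", "_" and "-".
_REPO_PATTERN = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def parse_repositories(raw: str) -> list[str]:
    """Split a one-per-line list and verify each entry looks like owner/repo."""
    repositories = []
    for line in raw.splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if not _REPO_PATTERN.fullmatch(entry):
            raise ValueError(f"expected owner/repo, got: {entry!r}")
        repositories.append(entry)
    return repositories
```

An empty result corresponds to leaving the field empty, in which case the connector syncs every repository the token can access.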
2. Select streams
Choose which data streams you want to sync:
- repositories
- pull_requests
- commits
3. Configure data streams
Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table, and the type of sync.
- Layer: Choose the layer where extracted GitHub tables will be created.
- Folder: Optionally group all GitHub tables inside a folder.
- Table name: A default name is suggested, but you can customize it. You can also add a prefix to all tables.
- Sync Type: Choose between INCREMENTAL and FULL_TABLE.
  - Incremental: Recommended for commits, using committed_at as the replication key.
  - Full table: Useful for one-off backfills or full refreshes.
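To make the difference between the two sync types concrete, here is a rough sketch of which records each one would pick up on a run. It is an illustration under stated assumptions, not the connector's implementation: select_records and next_bookmark are hypothetical names, and the strict "greater than" comparison against the bookmark is an assumption.

```python
def select_records(records, sync_type, bookmark=None):
    """Return the records a run would load (illustrative only)."""
    if sync_type not in ("INCREMENTAL", "FULL_TABLE"):
        raise ValueError(f"unknown sync type: {sync_type}")
    if sync_type == "FULL_TABLE" or bookmark is None:
        # A full table sync, or a first incremental run, reads everything.
        return list(records)
    # Assumption: committed_at values are ISO 8601 UTC strings in one format,
    # so plain string comparison orders them chronologically.
    return [r for r in records if r["committed_at"] > bookmark]


def next_bookmark(records, bookmark=None):
    """Advance the bookmark to the latest replication key value seen."""
    values = [r["committed_at"] for r in records]
    if bookmark is not None:
        values.append(bookmark)
    return max(values) if values else None
```

With a bookmark in place, an incremental run only touches commits newer than the last one seen, which is why it suits a stream that grows as steadily as commits.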
4. Configure data source
Describe your data source for easy identification within your organization, not exceeding 140 characters. To define your Trigger, consider how often your repositories change:
- Hourly / every few hours for active engineering analytics.
- Daily for standard operational reporting.
- Weekly for low-change repositories.
- Delta Log Retention: How long Nekt keeps previous table states. See Resource control.
- Additional Full Sync: Periodic full syncs in addition to incrementals.
5. Check your new source
You can view your new source on the Sources page. If needed, manually trigger the extraction by clicking on the arrow button. Once a run completes successfully, your data appears in the Catalog.
Streams and Fields
Below you’ll find the available GitHub streams and their core fields.
Repositories
Repository metadata for all repositories accessible by the token (or only the configured list in repositories).
Key fields:

| Field | Type | Description |
|---|---|---|
| id | Integer | Repository numeric ID (primary key) |
| full_name | String | Repository name in owner/repo format |
| private | Boolean | Indicates whether the repository is private |
| visibility | String | Repository visibility (public, private, etc.) |
| default_branch | String | Default branch |
| language | String | Primary detected language |
| stargazers_count | Integer | Number of stars |
| forks_count | Integer | Number of forks |
| open_issues_count | Integer | Number of open issues |
| created_at | DateTime | Timestamp when the repository was created |
| updated_at | DateTime | Timestamp when the repository was last updated |
| pushed_at | DateTime | Timestamp when the repository was last pushed to |

Notes:
- Primary key: id
- Replication: full-table style (no replication key)
- Child context: each repository emits owner and repo context used by pull_requests and commits
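The child context can be pictured as a small dictionary derived from each repository record and merged into every child record. The sketch below assumes the context is obtained by splitting full_name; repository_context and attach_context are hypothetical helpers, and the connector may build the context differently.

```python
def repository_context(repository: dict) -> dict:
    """Derive the context handed to child streams (assumed shape)."""
    owner, repo = repository["full_name"].split("/", 1)
    return {
        "owner": owner,
        "repo": repo,
        "_sdc_repository": repository["full_name"],
    }


def attach_context(child_record: dict, context: dict) -> dict:
    """Return a copy of a child record with the repository context added."""
    return {**child_record, **context}


context = repository_context({"id": 1, "full_name": "acme/api"})
commit = attach_context({"sha": "abc123"}, context)
# commit now carries owner, repo and _sdc_repository alongside sha
```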
Pull Requests
Pull requests for each repository. The connector fetches all pull request states (open, closed, and merged).
Key fields:

| Field | Type | Description |
|---|---|---|
| id | Integer | Pull request ID (primary key) |
| number | Integer | Pull request number inside the repository |
| title | String | Pull request title |
| body | String | Pull request description/body |
| state | String | Current state (open, closed) |
| draft | Boolean | Indicates if it is a draft PR |
| locked | Boolean | Indicates if the PR is locked |
| user | Object | Pull request author |
| head | Object | Source branch metadata |
| base | Object | Target branch metadata |
| merged_at | DateTime | Timestamp when merged |
| closed_at | DateTime | Timestamp when closed |
| created_at | DateTime | Timestamp when created |
| updated_at | DateTime | Timestamp when last updated |
| additions | Integer | Lines added |
| deletions | Integer | Lines deleted |
| changed_files | Integer | Number of files changed |
| comments | Integer | Number of comments |
| review_comments | Integer | Number of review comments |
| commits | Integer | Number of commits in the PR |
| _sdc_repository | String | Repository context in owner/repo format |

Notes:
- Primary key: id
- Replication: full-table style (no replication key)
- Includes repository context fields (owner, repo, _sdc_repository) for easier joins
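Because the state field only takes the values open and closed, a merged pull request has to be recognized through merged_at. A minimal sketch of that classification follows; pull_request_status is a hypothetical helper name, and records are assumed to be plain dictionaries with the fields listed above.

```python
def pull_request_status(pr: dict) -> str:
    """Classify a pull request as open, closed, or merged."""
    # A merged pull request reports state "closed" with merged_at populated.
    if pr.get("merged_at"):
        return "merged"
    return pr["state"]


pull_request_status({"state": "closed", "merged_at": "2024-05-01T12:00:00Z"})  # "merged"
pull_request_status({"state": "closed", "merged_at": None})  # "closed"
```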
Commits
Commits for each repository. This stream supports incremental sync using the commit timestamp.
Key fields:

| Field | Type | Description |
|---|---|---|
| sha | String | Git SHA hash value of the object (primary key). |
| node_id | String | GraphQL node identifier of the record. |
| html_url | String | Web URL for this resource in GitHub. |
| url | String | API URL for this resource. |
| commit.message | String | Commit message. |
| commit.author | Object | Author metadata embedded in the commit (name, email, date). |
| commit.committer | Object | Committer metadata embedded in the commit (name, email, date). |
| commit.tree | Object | Git tree object referenced by this commit. |
| commit.verification | Object | Verification details (verified, reason, signature, payload). |
| author | Object | GitHub user object for the author (when available). |
| committer | Object | GitHub user object for the committer (when available). |
| parents | Array | Parent commit references. |
| stats.additions | Integer | Lines added. |
| stats.deletions | Integer | Lines deleted. |
| stats.total | Integer | Total lines changed. |
| committed_at | DateTime | Replication key (derived from commit.committer.date). |
| _sdc_repository | String | Repository context in owner/repo format. |

Notes:
- Primary key: sha
- Replication key: committed_at
- Incremental sync sends the since parameter to the GitHub API, based on the state bookmark (or start_date when no state is available)
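The fallback from bookmark to start_date can be sketched as follows. The since and per_page query parameters exist on GitHub's list-commits endpoint; everything else here is an assumption made for illustration, including the layout of the state dictionary (a "bookmarks" mapping keyed by repository) and the helper names resolve_since and commit_request_params.

```python
from typing import Optional


def resolve_since(state: dict, repository: str, start_date: Optional[str]) -> Optional[str]:
    """Prefer the stored bookmark; fall back to the configured start date."""
    # Assumption: state looks like {"bookmarks": {"owner/repo": "<ISO 8601 timestamp>"}}.
    bookmark = state.get("bookmarks", {}).get(repository)
    return bookmark or start_date


def commit_request_params(state: dict, repository: str, start_date: Optional[str]) -> dict:
    """Build query parameters for a list-commits request."""
    params = {"per_page": 100}
    since = resolve_since(state, repository, start_date)
    if since:
        params["since"] = since  # ISO 8601 timestamp, e.g. 2024-05-01T00:00:00Z
    return params
```

When neither a bookmark nor a start date is present, no since value is sent and the request covers the full commit history.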
Data Model
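The connector follows a repository-centered model: repositories is the parent stream, and pull_requests and commits are child streams that carry the repository context (_sdc_repository).
Use Cases for Data Analysis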
This section includes practical SQL examples you can run in Explorer.
1. Pull Request throughput by repository
Measure how many pull requests are created, closed, and merged by repository.
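The logic of this metric can also be sketched outside Explorer in Python, for example to sanity-check a query result against a small extract. This is an illustrative aggregation, not the query from Explorer: pr_throughput is a hypothetical helper, and it assumes that "closed" counts every pull request with a closed_at value, which includes merged ones.

```python
from collections import defaultdict


def pr_throughput(pull_requests):
    """Count created, closed, and merged pull requests per repository."""
    totals = defaultdict(lambda: {"created": 0, "closed": 0, "merged": 0})
    for pr in pull_requests:
        row = totals[pr["_sdc_repository"]]
        row["created"] += 1
        if pr.get("closed_at"):
            row["closed"] += 1  # merged pull requests also have closed_at set
        if pr.get("merged_at"):
            row["merged"] += 1
    return dict(totals)
```

Subtracting merged from closed gives the pull requests that were closed without being merged.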
2. Commit activity in the last 30 days
Track commit volume and active contributors by repository.
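As a rough illustration of the same metric in Python, the sketch below counts commits and distinct authors per repository inside a rolling window. Several choices are assumptions rather than documented behavior: commit_activity is a hypothetical helper, records are treated as nested dictionaries, committed_at is expected to be an ISO 8601 string with a UTC designator, and a contributor is identified by commit.author.email.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing "Z" for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def commit_activity(commits, now, days=30):
    """Count commits and distinct author emails per repository in the window."""
    cutoff = now - timedelta(days=days)
    commit_counts = defaultdict(int)
    contributors = defaultdict(set)
    for commit in commits:
        if parse_timestamp(commit["committed_at"]) < cutoff:
            continue  # older than the window
        repo = commit["_sdc_repository"]
        commit_counts[repo] += 1
        # Assumption: the author email is a good enough contributor identity.
        contributors[repo].add(commit["commit"]["author"]["email"])
    return {
        repo: {"commits": commit_counts[repo], "active_contributors": len(contributors[repo])}
        for repo in commit_counts
    }


# Pass a timezone-aware "now" so it can be compared with the parsed timestamps.
activity = commit_activity([], datetime.now(timezone.utc))
```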
Skills for agents
Download GitHub skills file
GitHub connector documentation as plain markdown, for use in AI agent contexts.