GitHub is a software development platform for source control, collaboration, and code review. The Nekt GitHub connector uses the GitHub REST API to extract repository metadata, pull requests, and commits into your Catalog.

Configuring GitHub as a Source

In the Sources tab, click the “Add source” button at the top right of your screen. Then select the GitHub option from the list of connectors. Click Next, and you’ll be prompted to add your access.

1. Add account access

You’ll need a GitHub Personal Access Token (classic or fine-grained) with permission to read the repositories you want to sync. The following configurations are available:

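Personal Access Token: Your GitHub Personal Access Token. A classic token needs the repo scope; a fine-grained token has no scopes, so grant it read access to the repositories you want to sync instead. This field is required and stored securely.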
Repositories: Optional list of repositories in owner/repo format (one per line). If provided, only these repositories are synced. If left empty, the connector syncs all repositories accessible by the token.
Start Date: Optional starting point used by incremental commit syncs when no saved state exists yet. Only records created or updated after this date are synced.
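
To illustrate the owner/repo, one-per-line format expected in the Repositories field, here is a minimal Python sketch that cleans up such a list before you paste it in. The helper name parse_repositories and the character pattern are assumptions made for this example; the connector's own validation rules are not documented here and may differ.

```python
import re
from typing import List

# Assumed pattern for illustration only: letters, digits, '_', '.', and '-'
# on both sides of a single slash.
_REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def parse_repositories(raw: str) -> List[str]:
    """Split a one-per-line field into unique owner/repo entries, keeping order."""
    repos: List[str] = []
    for line in raw.splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if not _REPO_PATTERN.match(entry):
            raise ValueError(f"expected owner/repo, got: {entry!r}")
        if entry not in repos:
            repos.append(entry)  # drop duplicates, keep first occurrence
    return repos


print(parse_repositories("octocat/Hello-World\n\nocto-org/octo-repo\n"))
```

An empty input yields an empty list, which corresponds to leaving the field blank so that all repositories accessible by the token are synced.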

Once you’re done, click Next.

2. Select streams

Choose which data streams you want to sync:

repositories
pull_requests
commits

For faster extractions, select only the streams you need. Select the streams and click Next.

3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table, and the type of sync.

Layer: Choose the layer where extracted GitHub tables will be created.
Folder: Optionally group all GitHub tables inside a folder.
Table name: A default name is suggested, but you can customize it. You can also add a prefix to all tables.
Sync Type: Choose between INCREMENTAL and FULL_TABLE.
- Incremental: Recommended for commits, which use committed_at as the replication key. The repositories and pull_requests streams have no replication key.
- Full table: Useful for one-off backfills or full refreshes.

Once you are done configuring, click Next.

4. Configure data source

Describe your data source in 140 characters or fewer so it is easy to identify within your organization. To define your Trigger, consider how often your repositories change:

Hourly / every few hours for active engineering analytics.
Daily for standard operational reporting.
Weekly for low-change repositories.

Optionally, you can define:

Delta Log Retention: How long Nekt keeps previous table states. See Resource control.
Additional Full Sync: Periodic full syncs in addition to incrementals.

When you are ready, click Next to finalize the setup.

5. Check your new source

You can view your new source on the Sources page. If needed, manually trigger the extraction by clicking on the arrow button. Once a run completes successfully, your data appears in the Catalog.

You need at least one successful source run to see the tables in your Catalog.

Streams and Fields

Below you’ll find the available GitHub streams and their core fields.

Repositories

Repository metadata for all repositories accessible by the token (or only those listed in the Repositories setting). Key fields:

Field	Type	Description
`id`	Integer	Repository numeric ID (primary key)
`full_name`	String	Repository name in `owner/repo` format
`private`	Boolean	Indicates whether the repository is private
`visibility`	String	Repository visibility (`public`, `private`, etc.)
`default_branch`	String	Default branch
`language`	String	Primary detected language
`stargazers_count`	Integer	Number of stars
`forks_count`	Integer	Number of forks
`open_issues_count`	Integer	Number of open issues
`created_at`	DateTime	Timestamp when the repository was created
`updated_at`	DateTime	Timestamp when the repository was last updated
`pushed_at`	DateTime	Timestamp when the repository was last pushed to

Notes:

Primary key: id
Replication: full-table style (no replication key)
Child context: each repository emits owner and repo context used by pull_requests and commits
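
The child context described above can be sketched as follows. This is only an illustration: it assumes the context is derived by splitting full_name, and the function name child_context is hypothetical rather than taken from the connector.

```python
from typing import Dict


def child_context(repository: Dict[str, object]) -> Dict[str, str]:
    """Build the context a repository record hands to its child streams."""
    # full_name is documented as owner/repo, so split on the first slash only.
    owner, repo = str(repository["full_name"]).split("/", 1)
    return {"owner": owner, "repo": repo, "_sdc_repository": f"{owner}/{repo}"}


context = child_context({"id": 1, "full_name": "octocat/Hello-World"})
print(context["owner"], context["repo"], context["_sdc_repository"])
```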

Pull Requests

Pull requests for each repository. The connector fetches pull requests in every state: open and closed, including closed pull requests that were merged. Key fields:

Field	Type	Description
`id`	Integer	Pull request ID (primary key)
`number`	Integer	Pull request number inside the repository
`title`	String	Pull request title
`body`	String	Pull request description/body
`state`	String	Current state (`open`, `closed`)
`draft`	Boolean	Indicates if it is a draft PR
`locked`	Boolean	Indicates if the PR is locked
`user`	Object	Pull request author
`head`	Object	Source branch metadata
`base`	Object	Target branch metadata
`merged_at`	DateTime	Timestamp when merged
`closed_at`	DateTime	Timestamp when closed
`created_at`	DateTime	Timestamp when created
`updated_at`	DateTime	Timestamp when last updated
`additions`	Integer	Lines added
`deletions`	Integer	Lines deleted
`changed_files`	Integer	Number of files changed
`comments`	Integer	Number of comments
`review_comments`	Integer	Number of review comments
`commits`	Integer	Number of commits in the PR
`_sdc_repository`	String	Repository context in `owner/repo` format

Notes:

Primary key: id
Replication: full-table style (no replication key)
Includes repository context fields (owner, repo, _sdc_repository) for easier joins
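
Because state only takes the values open and closed, a merged pull request has to be identified through merged_at. The Python sketch below shows one way to derive a three-way status from those two fields; the function name and the status labels are choices made for this example, not fields produced by the connector.

```python
from typing import Optional


def pull_request_status(state: str, merged_at: Optional[str]) -> str:
    """Return 'open', 'merged', or 'closed' for a pull request record."""
    if state == "open":
        return "open"
    # A closed pull request with a merge timestamp was merged;
    # without one, it was closed unmerged.
    return "merged" if merged_at else "closed"


print(pull_request_status("closed", "2024-05-01T12:30:00Z"))
print(pull_request_status("closed", None))
```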

Commits

Commits for each repository. This stream supports incremental sync using the commit timestamp. Key fields:

Field	Type	Description
`sha`	String	Git SHA hash value of the object (primary key).
`node_id`	String	GraphQL node identifier of the record.
`html_url`	String	Web URL for this resource in GitHub.
`url`	String	API URL for this resource.
`commit.message`	String	Commit message.
`commit.author`	Object	Author metadata embedded in the commit (name, email, date).
`commit.committer`	Object	Committer metadata embedded in the commit (name, email, date).
`commit.tree`	Object	Git tree object referenced by this commit.
`commit.verification`	Object	Verification details (verified, reason, signature, payload).
`author`	Object	GitHub user object for the author (when available).
`committer`	Object	GitHub user object for the committer (when available).
`parents`	Array	Parent commit references.
`stats.additions`	Integer	Lines added.
`stats.deletions`	Integer	Lines deleted.
`stats.total`	Integer	Total lines changed.
`committed_at`	DateTime	Replication key (derived from `commit.committer.date`).
`_sdc_repository`	String	Repository context in `owner/repo` format.

Notes:

Primary key: sha
Replication key: committed_at
Incremental sync passes the since parameter to the GitHub API, taking its value from the state bookmark (or from start_date when no state is available)
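
The notes above can be sketched in Python. This is a simplified model under stated assumptions, not the connector's implementation: the function names are hypothetical, timestamps are assumed to be ISO 8601 strings ending in Z, and the bookmark is assumed to advance to the newest committed_at seen in a run.

```python
from datetime import datetime, timezone
from typing import List, Optional


def parse_github_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2024-05-01T12:30:00Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def committed_at(commit_record: dict) -> datetime:
    """Derive the replication key from commit.committer.date."""
    return parse_github_timestamp(commit_record["commit"]["committer"]["date"])


def since_parameter(bookmark: Optional[str], start_date: Optional[str]) -> Optional[str]:
    """Prefer the state bookmark, fall back to start_date, else no lower bound."""
    return bookmark or start_date


def next_bookmark(records: List[dict], previous: Optional[str]) -> Optional[str]:
    """Assumed rule: keep the newest committed_at seen so far."""
    candidates = [committed_at(record) for record in records]
    if previous:
        candidates.append(parse_github_timestamp(previous))
    if not candidates:
        return None
    newest = max(candidates).astimezone(timezone.utc)
    return newest.strftime("%Y-%m-%dT%H:%M:%SZ")


records = [
    {"commit": {"committer": {"date": "2024-05-01T12:30:00Z"}}},
    {"commit": {"committer": {"date": "2024-05-03T08:00:00Z"}}},
]
print(since_parameter(None, "2024-01-01T00:00:00Z"))
print(next_bookmark(records, "2024-05-02T00:00:00Z"))
```

On a first run there is no bookmark, so start_date (if set) becomes the lower bound; on later runs the saved bookmark takes precedence.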

Data Model

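The connector follows a repository-centered model: repositories is the parent stream, and pull_requests and commits are its child streams. Each child record carries its parent's identity in _sdc_repository (owner/repo format), which corresponds to full_name in repositories.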
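
One way to picture the relationships is a small in-memory example. The sketch below uses Python's built-in sqlite3 module with made-up sample rows and only a few columns per table; it assumes _sdc_repository in the child tables corresponds to full_name in repositories, as the field descriptions suggest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE repositories (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE pull_requests (id INTEGER PRIMARY KEY, _sdc_repository TEXT);
    CREATE TABLE commits (sha TEXT PRIMARY KEY, _sdc_repository TEXT);

    -- Illustrative sample rows, not real data.
    INSERT INTO repositories VALUES (1, 'octocat/Hello-World'), (2, 'octocat/Spoon-Knife');
    INSERT INTO pull_requests VALUES (101, 'octocat/Hello-World'), (102, 'octocat/Hello-World');
    INSERT INTO commits VALUES ('a1', 'octocat/Hello-World'), ('b2', 'octocat/Spoon-Knife');
    """
)

# Correlated subqueries count each child stream separately, which avoids the
# row multiplication that joining both child tables at once would cause.
rows = conn.execute(
    """
    SELECT
        r.full_name,
        (SELECT COUNT(*) FROM pull_requests p WHERE p._sdc_repository = r.full_name) AS pull_requests,
        (SELECT COUNT(*) FROM commits c WHERE c._sdc_repository = r.full_name) AS commits
    FROM repositories r
    ORDER BY r.full_name
    """
).fetchall()

for row in rows:
    print(row)
```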

Use Cases for Data Analysis

This section includes practical SQL examples you can run in Explorer.

1. Pull Request throughput by repository

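Measure how many pull requests each repository has in total, and how many of them are open, closed, or merged. Because merged pull requests also have closed_at set, closed_prs includes merged_prs.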

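SQL query. Two variants follow; they differ only in how the table name is quoted (the second uses BigQuery-style backticks).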

SELECT
   _sdc_repository AS repository,
   COUNT(*) AS total_prs,
   SUM(CASE WHEN state = 'open' THEN 1 ELSE 0 END) AS open_prs,
   SUM(CASE WHEN closed_at IS NOT NULL THEN 1 ELSE 0 END) AS closed_prs,
   SUM(CASE WHEN merged_at IS NOT NULL THEN 1 ELSE 0 END) AS merged_prs
FROM
   nekt_raw.github_pull_requests
GROUP BY
   1
ORDER BY
   total_prs DESC;

SELECT
   _sdc_repository AS repository,
   COUNT(*) AS total_prs,
   SUM(CASE WHEN state = 'open' THEN 1 ELSE 0 END) AS open_prs,
   SUM(CASE WHEN closed_at IS NOT NULL THEN 1 ELSE 0 END) AS closed_prs,
   SUM(CASE WHEN merged_at IS NOT NULL THEN 1 ELSE 0 END) AS merged_prs
FROM
   `nekt_raw.github_pull_requests`
GROUP BY
   1
ORDER BY
   total_prs DESC;

2. Commit activity in the last 30 days

Track commit volume and active contributors by repository.

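SQL query. Two variants follow: the first uses ANSI-style interval arithmetic, and the second uses BigQuery functions (TIMESTAMP_SUB, JSON_VALUE).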

SELECT
   _sdc_repository AS repository,
   COUNT(*) AS commits_last_30d,
   COUNT(DISTINCT COALESCE(author.login, commit.author.email)) AS active_authors
FROM
   nekt_raw.github_commits
WHERE
   CAST(committed_at AS timestamp) >= current_timestamp - interval '30' day
GROUP BY
   1
ORDER BY
   commits_last_30d DESC;

SELECT
   _sdc_repository AS repository,
   COUNT(*) AS commits_last_30d,
   COUNT(DISTINCT COALESCE(author.login, JSON_VALUE(TO_JSON_STRING(commit), '$.author.email'))) AS active_authors
FROM
   `nekt_raw.github_commits`
WHERE
   TIMESTAMP(committed_at) >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY
   1
ORDER BY
   commits_last_30d DESC;

Skills for agents

Download GitHub skills file

GitHub connector documentation as plain markdown, for use in AI agent contexts.