GitHub Integration Guide

This guide explains how to integrate Arc Memory with GitHub to enhance your knowledge graph with pull requests, issues, discussions, and other GitHub data.

Overview

Arc Memory can integrate with GitHub to:

- Fetch pull request data and link it to commits
- Retrieve issue information and connect it to related PRs
- Capture discussions and comments
- Track code reviews and feedback
- Analyze contribution patterns

Authentication

Before you can use GitHub integration, you need to authenticate with GitHub.

Using a Personal Access Token (PAT)

Create a token:
- Go to GitHub Settings > Developer settings > Personal access tokens > Tokens (classic)
- Click “Generate new token” and choose the classic token type (the scopes below apply to classic tokens)
- Select the following scopes:
  - repo (for private repositories)
  - read:user
  - read:org (if accessing organization repositories)
- Click “Generate token”
- Copy the token; GitHub shows it only once

Authenticate with Arc Memory:
```
arc auth gh --token YOUR_TOKEN
```

Verify authentication:
```
arc auth status
```
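
To check that a token actually carries the scopes listed above, its granted scopes can be compared against the required ones. The sketch below assumes a classic PAT, for which GitHub reports the granted scopes in the `X-OAuth-Scopes` response header. The helper name `missing_scopes` is hypothetical and not part of Arc Memory.

```python
# Scopes this guide asks for (read:org only matters for organization repositories).
REQUIRED_SCOPES = {"repo", "read:user", "read:org"}


def missing_scopes(scopes_header, required=REQUIRED_SCOPES):
    """Return the required scopes absent from an X-OAuth-Scopes header value.

    Note: broader parent scopes (for example `user`, which covers `read:user`)
    are not expanded here; this is a plain set comparison.
    """
    granted = {s.strip() for s in scopes_header.split(",") if s.strip()}
    return sorted(set(required) - granted)


if __name__ == "__main__":
    # The header value is a comma-separated list of scope names.
    print(missing_scopes("repo, read:user"))
```

An empty result means every scope from the list is present on the token.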

Using GitHub CLI

If you already use GitHub CLI (gh), you can use its authentication:

```
# First, ensure you're logged in with GitHub CLI
gh auth login

# Then, use GitHub CLI auth with Arc Memory
arc auth gh --use-gh-cli
```

Using a GitHub App (for Organizations)

For organization-wide use, you can create a GitHub App:

Create a GitHub App:
- Go to GitHub Settings > Developer settings > GitHub Apps
- Click “New GitHub App”
- Fill in the required information
- Set permissions:
  - Repository: Read-only
  - Issues: Read-only
  - Pull requests: Read-only
  - Discussions: Read-only
- Create the app

Install the app on your organization or repositories.

Generate a private key for the app.

Authenticate with Arc Memory:

```
arc auth gh-app --app-id YOUR_APP_ID --installation-id YOUR_INSTALLATION_ID --private-key-path path/to/private-key.pem
```
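
For context on why the app ID and private key are needed: GitHub App authentication works by signing a short-lived JSON Web Token with the app's private key and exchanging it for an installation access token. The sketch below builds only the JWT claims that GitHub documents (`iat`, `exp`, `iss`). Signing with RS256 requires a third-party library and is left out. This guide does not describe how Arc Memory does this internally, so the function name and structure are illustrative assumptions.

```python
import time


def build_app_jwt_claims(app_id, now=None, lifetime_seconds=540):
    """Build the claims for a GitHub App JWT (to be signed with RS256)."""
    if lifetime_seconds > 600:
        raise ValueError("GitHub rejects JWTs that expire more than 10 minutes ahead")
    now = int(time.time()) if now is None else int(now)
    return {
        "iat": now - 60,                # backdate 60 s to tolerate clock drift
        "exp": now + lifetime_seconds,  # at most 10 minutes in the future
        "iss": str(app_id),             # the App ID (YOUR_APP_ID above)
    }


if __name__ == "__main__":
    print(build_app_jwt_claims("YOUR_APP_ID"))
```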

Building with GitHub Data

Once authenticated, you can build your knowledge graph with GitHub data:

```
# Build with GitHub data
arc build --include-github

# Or build incrementally
arc build --incremental --include-github
```

Limiting GitHub Data

To limit the amount of GitHub data fetched:

```
# Limit to recent PRs and issues
arc build --include-github --days 30

# Limit to specific PR numbers
arc build --include-github --prs 123,456,789

# Limit to specific issue numbers
arc build --include-github --issues 100,200,300
```
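
When builds are scripted, the flags shown above can be assembled programmatically. The following is a minimal sketch: `build_arc_command` is a hypothetical helper, and it assumes the flags can be combined in a single invocation, which this guide does not confirm.

```python
import shlex


def build_arc_command(days=None, prs=None, issues=None, incremental=False):
    """Assemble an `arc build` argument list from the flags shown above."""
    cmd = ["arc", "build", "--include-github"]
    if incremental:
        cmd.append("--incremental")
    if days is not None:
        cmd += ["--days", str(days)]
    if prs:
        cmd += ["--prs", ",".join(str(n) for n in prs)]
    if issues:
        cmd += ["--issues", ",".join(str(n) for n in issues)]
    return cmd


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(build_arc_command(days=30)))
```

Returning a list rather than a string keeps the arguments safe to hand to `subprocess.run` without shell quoting.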

GitHub Data in the Knowledge Graph

After building with GitHub data, the knowledge graph will contain:

Node Types

| Node Type | Description | Example |
| --- | --- | --- |
| `PR` | Pull request | PR #123 |
| `Issue` | GitHub issue | Issue #456 |
| `Discussion` | GitHub discussion | Discussion #789 |
| `Comment` | Comment on PR, issue, or discussion | Comment on PR #123 |
| `Review` | Code review on a PR | Review on PR #123 |
| `User` | GitHub user | User “username” |

Edge Types

| Edge Type | Description | Example |
| --- | --- | --- |
| `PART_OF` | Commit is part of a PR | Commit abc123 → PR #123 |
| `REFERENCES` | PR references an issue | PR #123 → Issue #456 |
| `AUTHORED_BY` | PR/issue authored by user | PR #123 → User “username” |
| `REVIEWED_BY` | PR reviewed by user | PR #123 → User “reviewer” |
| `COMMENTED_BY` | Comment authored by user | Comment → User “commenter” |
| `RESPONDS_TO` | Comment responds to another comment | Comment → Comment |
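
To make the node and edge types above concrete, here is a small in-memory model of the example rows, with a traversal from a commit to the issue its PR references. This is only an illustration of the graph shape, not Arc Memory's storage format. The `commit:` and `pr:` ID prefixes follow the query examples in this guide; the `issue:` and `user:` prefixes are assumptions.

```python
from collections import defaultdict

# (source, edge type, target) triples mirroring the example column above.
EDGES = [
    ("commit:abc123", "PART_OF", "pr:123"),
    ("pr:123", "REFERENCES", "issue:456"),
    ("pr:123", "AUTHORED_BY", "user:username"),
    ("pr:123", "REVIEWED_BY", "user:reviewer"),
]


def build_index(edges):
    """Index outgoing edges by source node for quick lookups."""
    index = defaultdict(list)
    for source, edge_type, target in edges:
        index[source].append((edge_type, target))
    return index


def follow(index, start, edge_types):
    """Follow one edge type per hop from `start`; return the nodes reached."""
    frontier = [start]
    for edge_type in edge_types:
        frontier = [
            target
            for node in frontier
            for found_type, target in index.get(node, [])
            if found_type == edge_type
        ]
    return frontier


if __name__ == "__main__":
    index = build_index(EDGES)
    # Which issue motivated commit abc123? commit -> PR -> issue
    print(follow(index, "commit:abc123", ["PART_OF", "REFERENCES"]))
```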

Querying GitHub Data

You can query GitHub data using the `why`, `relate`, and `trace` commands. The examples below use `relate` and `why`.

Finding PRs for a Commit

```
arc relate node commit:abc123 --node-types PR
```

Finding Issues Referenced by a PR

```
arc relate node pr:123 --edge-types REFERENCES
```

Finding the Decision Trail for a Line

```
arc why file path/to/file.py 42 --include-prs --include-issues
```

GitHub Enterprise Support

Arc Memory supports GitHub Enterprise:

```
# Authenticate with GitHub Enterprise
arc auth gh --token YOUR_TOKEN --api-url https://github.your-company.com/api/v3
```
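
The `--api-url` value follows the GitHub Enterprise Server convention of serving the REST API under `/api/v3` on the instance's hostname. A small sketch of deriving it, assuming a GitHub Enterprise Server instance (`enterprise_api_url` is a hypothetical helper, not an Arc Memory function):

```python
from urllib.parse import urlparse


def enterprise_api_url(host_or_url):
    """Return the REST API base URL for a GitHub Enterprise Server host."""
    # Accept either a bare hostname or a full URL.
    if "://" not in host_or_url:
        host_or_url = f"https://{host_or_url}"
    parsed = urlparse(host_or_url)
    return f"{parsed.scheme}://{parsed.netloc}/api/v3"


if __name__ == "__main__":
    print(enterprise_api_url("github.your-company.com"))
```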

Rate Limiting Considerations

The GitHub API enforces rate limits that can slow down or interrupt Arc Memory builds:

- Authenticated requests: 5,000 requests per hour
- Unauthenticated requests: 60 requests per hour

To avoid rate limiting issues:

- Always authenticate with a token
- Use incremental builds when possible
- Limit the scope of your builds
- Consider using a GitHub App for higher rate limits
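
For scripts that call the GitHub API directly alongside Arc Memory, the remaining budget can be read from the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that GitHub sends with REST responses. The sketch below computes how long to wait once the budget is exhausted. The function name is hypothetical, and whether Arc Memory applies a similar backoff is not stated in this guide.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before retrying, based on rate-limit headers.

    Returns 0 while requests remain in the current window.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0
    now = time.time() if now is None else now
    reset_at = int(lowered.get("x-ratelimit-reset", 0))  # UTC epoch seconds
    return max(0, int(reset_at - now))


if __name__ == "__main__":
    sample = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000600"}
    print(seconds_until_reset(sample, now=1700000000))
```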

Troubleshooting

Authentication Issues

If you encounter authentication issues:

- Verify that your token has the correct scopes
- Check whether the token has expired
- Try re-authenticating:
```
arc auth gh --token YOUR_NEW_TOKEN
```

Rate Limit Errors

If you hit rate limits:

Check your current rate limit status:

```
curl -H "Authorization: token YOUR_TOKEN" https://api.github.com/rate_limit
```

Wait for the rate limit to reset.

Use incremental builds:

```
arc build --incremental --include-github
```

Limit the scope of your build:

```
arc build --include-github --days 7
```
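
The `/rate_limit` endpoint queried above returns JSON with a `resources.core` object containing `limit`, `remaining`, and `reset` (UTC epoch seconds). A short sketch of turning that payload into a readable summary follows; the function name is hypothetical and the sample values are made up for illustration.

```python
import json
from datetime import datetime, timezone


def summarize_rate_limit(payload):
    """Summarize the core REST budget from a /rate_limit response body."""
    core = payload["resources"]["core"]
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return (
        f"{core['remaining']}/{core['limit']} requests left, "
        f"resets at {reset_at:%H:%M:%S} UTC"
    )


if __name__ == "__main__":
    # In practice this string would be the output of the curl command above.
    sample = '{"resources": {"core": {"limit": 5000, "remaining": 4990, "reset": 1700000000}}}'
    print(summarize_rate_limit(json.loads(sample)))
```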

Missing Data

If GitHub data is missing from your knowledge graph:

Verify you included the GitHub flag:
```
arc build --include-github
```

Check that the data exists on GitHub.

Verify that your authentication has access to the repository.

Run with debug logging:
```
arc build --include-github --debug
```

Best Practices

- Use incremental builds to minimize API calls
- Authenticate with a token to increase rate limits
- Limit the scope of your builds to recent data when possible
- Consider using a GitHub App for organization-wide use
- Schedule builds during off-hours for large repositories
- Use specific filters when querying to improve performance

Examples

CI/CD Integration

```
# Example GitHub Actions workflow
name: Arc Memory Build

on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight UTC
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history, so every commit is available

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install Arc Memory
        run: pip install arc-memory

      - name: Build knowledge graph
        run: |
          arc auth gh --token ${{ secrets.GITHUB_TOKEN }}
          arc build --include-github --incremental

      - name: Upload database
        uses: actions/upload-artifact@v4  # v3 is deprecated and no longer runs on github.com
        with:
          name: arc-memory-db
          path: .arc/memory.db
          include-hidden-files: true  # .arc is hidden; v4.4+ skips hidden files by default
```

Team Dashboard Integration

```
from arc_memory import ArcMemory
import pandas as pd
import matplotlib.pyplot as plt

# Initialize Arc Memory
arc = ArcMemory()

# Query PRs from the last 30 days
prs = arc.query_prs(days=30)

# Create a DataFrame with one row per PR
pr_data = pd.DataFrame([
    {
        'PR': pr.number,
        'Author': pr.author,
        'Created': pr.created_at,
        'Merged': pr.merged_at,
        'Comments': len(pr.comments),
        'Files Changed': len(pr.files),
        'Additions': pr.additions,
        'Deletions': pr.deletions
    }
    for pr in prs
])

# Plot the number of PRs per author
plt.figure(figsize=(12, 6))
pr_data.groupby('Author').size().plot(kind='bar')
plt.title('PRs by Author (Last 30 Days)')
plt.ylabel('Pull requests')
plt.tight_layout()  # Keep rotated author labels from being clipped
plt.savefig('prs_by_author.png')
```
