GitHub Integration Guide

This guide explains how to integrate Arc Memory with GitHub to enhance your knowledge graph with pull requests, issues, discussions, and other GitHub data.

Overview

Arc Memory can integrate with GitHub to:

  1. Fetch pull request data and link it to commits
  2. Retrieve issue information and connect it to related PRs
  3. Capture discussions and comments
  4. Track code reviews and feedback
  5. Analyze contribution patterns

Authentication

Before you can use the GitHub integration, you need to authenticate with GitHub.

Using a Personal Access Token (PAT)

  1. Create a token:

  2. Authenticate with Arc Memory:

    arc auth gh --token YOUR_TOKEN
    
  3. Verify authentication:

    arc auth status
    

Using GitHub CLI

If you already use the GitHub CLI (gh), you can reuse its authentication:

# First, ensure you're logged in with GitHub CLI
gh auth login

# Then, use GitHub CLI auth with Arc Memory
arc auth gh --use-gh-cli

Using a GitHub App (for Organizations)

For organization-wide use, you can create a GitHub App:

  1. Create a GitHub App:

  2. Install the app on your organization or repositories

  3. Generate a private key for the app

  4. Authenticate with Arc Memory:

    arc auth gh-app --app-id YOUR_APP_ID --installation-id YOUR_INSTALLATION_ID --private-key-path path/to/private-key.pem
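For background, GitHub Apps generally authenticate by signing a short-lived JSON Web Token (JWT) with the app's private key and exchanging it for an installation token. Whether `arc auth gh-app` does exactly this internally is an assumption. The sketch below only builds the claims GitHub documents for that token. The function name `build_app_jwt_claims` is hypothetical, and the RS256 signing step is omitted because it needs a third-party library.

```python
import time
from typing import Optional


def build_app_jwt_claims(app_id: str,
                         now: Optional[int] = None,
                         lifetime_seconds: int = 540) -> dict:
    """Return the JWT claims GitHub documents for GitHub App authentication.

    Hypothetical helper for illustration; not part of Arc Memory.
    """
    # GitHub rejects app JWTs that expire more than 10 minutes in the future.
    if not 0 < lifetime_seconds <= 600:
        raise ValueError("lifetime_seconds must be between 1 and 600")
    current = int(time.time()) if now is None else now
    return {
        "iat": current - 60,                # backdated 60 s to tolerate clock drift
        "exp": current + lifetime_seconds,  # expiry time
        "iss": app_id,                      # the GitHub App ID
    }


claims = build_app_jwt_claims("YOUR_APP_ID", now=1_000_000)
print(claims)
```

Because these tokens are short-lived, a long-running build would need to mint a fresh one rather than cache it.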
    

Building with GitHub Data

Once authenticated, you can build your knowledge graph with GitHub data:

# Build with GitHub data
arc build --include-github

# Or build incrementally
arc build --incremental --include-github

Limiting GitHub Data

To limit the amount of GitHub data fetched:

# Limit to recent PRs and issues
arc build --include-github --days 30

# Limit to specific PR numbers
arc build --include-github --prs 123,456,789

# Limit to specific issue numbers
arc build --include-github --issues 100,200,300
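If you drive these builds from a script, a small helper can assemble the argument list from the flags shown above. This is a minimal sketch: `build_arc_command` is a hypothetical name, and whether `--days`, `--prs`, and `--issues` can be combined in a single invocation is an assumption this guide does not confirm.

```python
from typing import List, Optional, Sequence


def build_arc_command(days: Optional[int] = None,
                      prs: Sequence[int] = (),
                      issues: Sequence[int] = (),
                      incremental: bool = False) -> List[str]:
    """Assemble an `arc build` argument list (hypothetical helper)."""
    cmd = ["arc", "build"]
    if incremental:
        cmd.append("--incremental")
    cmd.append("--include-github")
    if days is not None:
        if days <= 0:
            raise ValueError("days must be a positive integer")
        cmd += ["--days", str(days)]
    if prs:
        # PR numbers are passed as one comma-separated value, as in the examples.
        cmd += ["--prs", ",".join(str(n) for n in prs)]
    if issues:
        cmd += ["--issues", ",".join(str(n) for n in issues)]
    return cmd


print(" ".join(build_arc_command(days=30)))
# arc build --include-github --days 30
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.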

GitHub Data in the Knowledge Graph

After building with GitHub data, the knowledge graph will contain:

Node Types

Node Type     Description                            Example
PR            Pull request                           PR #123
Issue         GitHub issue                           Issue #456
Discussion    GitHub discussion                      Discussion #789
Comment       Comment on PR, issue, or discussion    Comment on PR #123
Review        Code review on a PR                    Review on PR #123
User          GitHub user                            User “username”

Edge Types

Edge Type       Description                            Example
PART_OF         Commit is part of a PR                 Commit abc123 → PR #123
REFERENCES      PR references an issue                 PR #123 → Issue #456
AUTHORED_BY     PR/issue authored by user              PR #123 → User “username”
REVIEWED_BY     PR reviewed by user                    PR #123 → User “reviewer”
COMMENTED_BY    Comment authored by user               Comment → User “commenter”
RESPONDS_TO     Comment responds to another comment    Comment → Comment
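To make the node and edge types above concrete, the following sketch models a few of the example edges as plain triples and follows them from a commit to the issues its PR references. It illustrates the shape of the data only; it is not Arc Memory's internal representation, and the `issue:` and `user:` identifier prefixes are assumptions modeled on the `commit:` and `pr:` forms used in the query commands.

```python
from collections import defaultdict

# (source, edge type, target) triples mirroring the examples above.
EDGES = [
    ("commit:abc123", "PART_OF", "pr:123"),
    ("pr:123", "REFERENCES", "issue:456"),
    ("pr:123", "AUTHORED_BY", "user:username"),
    ("pr:123", "REVIEWED_BY", "user:reviewer"),
]


def build_index(edges):
    """Index targets by (source, edge type) for constant-time lookups."""
    index = defaultdict(list)
    for source, edge_type, target in edges:
        index[(source, edge_type)].append(target)
    return index


def issues_for_commit(index, commit_id):
    """Follow PART_OF to the PR, then REFERENCES to its issues."""
    issues = []
    for pr in index.get((commit_id, "PART_OF"), []):
        issues.extend(index.get((pr, "REFERENCES"), []))
    return issues


index = build_index(EDGES)
print(issues_for_commit(index, "commit:abc123"))  # ['issue:456']
```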

Querying GitHub Data

You can query GitHub data using the why, relate, and trace commands:

Finding PRs for a Commit

arc relate node commit:abc123 --node-types PR

Finding Issues Referenced by a PR

arc relate node pr:123 --edge-types REFERENCES

Finding the Decision Trail for a Line

arc why file path/to/file.py 42 --include-prs --include-issues

GitHub Enterprise Support

Arc Memory supports GitHub Enterprise:

# Authenticate with GitHub Enterprise
arc auth gh --token YOUR_TOKEN --api-url https://github.your-company.com/api/v3
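The `--api-url` value follows the GitHub Enterprise Server convention of serving the REST API under `/api/v3` on the instance's host. As a small sketch, a helper like the hypothetical `enterprise_api_url` below derives that URL from a hostname or a pasted web URL:

```python
from urllib.parse import urlsplit


def enterprise_api_url(host_or_url: str) -> str:
    """Return the REST API base URL for a GitHub Enterprise Server host.

    Hypothetical helper for illustration; not part of Arc Memory.
    """
    text = host_or_url.strip()
    if "://" not in text:
        text = "https://" + text  # assume HTTPS when no scheme is given
    parts = urlsplit(text)
    if not parts.netloc:
        raise ValueError(f"could not find a hostname in {host_or_url!r}")
    return f"{parts.scheme}://{parts.netloc}/api/v3"


print(enterprise_api_url("github.your-company.com"))
# https://github.your-company.com/api/v3
```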

Rate Limiting Considerations

The GitHub API enforces rate limits that may affect Arc Memory:

  • Authenticated requests: 5,000 requests per hour
  • Unauthenticated requests: 60 requests per hour

To avoid rate limiting issues:

  1. Always authenticate with a token
  2. Use incremental builds when possible
  3. Limit the scope of your builds
  4. Consider using a GitHub App for higher rate limits
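A quick back-of-the-envelope calculation shows why authentication matters. The sketch below estimates how many hourly rate-limit windows a build would span for a given number of API requests. The request count is a made-up input, since this guide does not state how many requests Arc Memory issues per PR or issue.

```python
import math

AUTHENTICATED_LIMIT = 5000   # requests per hour, from the figures above
UNAUTHENTICATED_LIMIT = 60   # requests per hour, from the figures above


def hours_to_complete(total_requests: int,
                      limit_per_hour: int = AUTHENTICATED_LIMIT) -> int:
    """Return how many hourly rate-limit windows total_requests would span."""
    if total_requests < 0 or limit_per_hour <= 0:
        raise ValueError("total_requests must be >= 0 and limit_per_hour > 0")
    return math.ceil(total_requests / limit_per_hour)


# Hypothetical build that needs 12,000 API requests.
print(hours_to_complete(12_000))                         # 3
print(hours_to_complete(12_000, UNAUTHENTICATED_LIMIT))  # 200
```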

Troubleshooting

Authentication Issues

If you encounter authentication issues:

  1. Verify your token has the correct scopes
  2. Check if the token has expired
  3. Try re-authenticating:
    arc auth gh --token YOUR_NEW_TOKEN
    

Rate Limit Errors

If you hit rate limits:

  1. Check your current rate limit status:
    curl -H "Authorization: token YOUR_TOKEN" https://api.github.com/rate_limit
    
  2. Wait for the rate limit to reset
  3. Use incremental builds:
    arc build --incremental --include-github
    
  4. Limit the scope of your build:
    arc build --include-github --days 7
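The `rate_limit` endpoint used in step 1 returns JSON whose `resources.core` object contains `limit`, `remaining`, and `reset` (a Unix timestamp). The following sketch turns such a payload into a readable summary so you know how long to wait. The sample payload is invented for illustration, and `summarize_rate_limit` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def summarize_rate_limit(payload: dict, now: datetime) -> str:
    """Summarize the core REST quota from a /rate_limit response body."""
    core = payload["resources"]["core"]
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    # Clamp at zero in case the window has already reset.
    wait = max(0, int((reset_at - now).total_seconds()))
    return (f"{core['remaining']}/{core['limit']} requests left, "
            f"resets in {wait // 60} min {wait % 60} s")


# Invented sample payload; a real one comes from the curl command above.
sample = json.loads(
    '{"resources": {"core": {"limit": 5000, "remaining": 0, "reset": 1700003600}}}'
)
now = datetime.fromtimestamp(1700000000, tz=timezone.utc)
print(summarize_rate_limit(sample, now))
# 0/5000 requests left, resets in 60 min 0 s
```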
    

Missing Data

If GitHub data is missing from your knowledge graph:

  1. Verify you included the GitHub flag:
    arc build --include-github
    
  2. Check if the data exists on GitHub
  3. Verify your authentication has access to the repository
  4. Run with debug logging:
    arc build --include-github --debug
    

Best Practices

  1. Use incremental builds to minimize API calls
  2. Authenticate with a token to increase rate limits
  3. Limit the scope of your builds to recent data when possible
  4. Consider using a GitHub App for organization-wide use
  5. Schedule builds during off-hours for large repositories
  6. Use specific filters when querying to improve performance

Examples

CI/CD Integration

# Example GitHub Actions workflow
name: Arc Memory Build

on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight UTC
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      
      - name: Install Arc Memory
        run: pip install arc-memory
      
      - name: Build knowledge graph
        run: |
          arc auth gh --token ${{ secrets.GITHUB_TOKEN }}
          arc build --include-github --incremental
      
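      - name: Upload database
        uses: actions/upload-artifact@v4  # v3 has been retired on GitHub.com
        with:
          name: arc-memory-db
          path: .arc/memory.db
          include-hidden-files: true  # .arc is a hidden directory, which v4 skips by default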

Team Dashboard Integration

from arc_memory import ArcMemory
import pandas as pd
import matplotlib.pyplot as plt

# Initialize Arc Memory
arc = ArcMemory()

# Query PR data
prs = arc.query_prs(days=30)

# Create a DataFrame
pr_data = pd.DataFrame([
    {
        'PR': pr.number,
        'Author': pr.author,
        'Created': pr.created_at,
        'Merged': pr.merged_at,
        'Comments': len(pr.comments),
        'Files Changed': len(pr.files),
        'Additions': pr.additions,
        'Deletions': pr.deletions
    }
    for pr in prs
])

# Generate visualizations
plt.figure(figsize=(12, 6))
pr_data.groupby('Author').size().plot(kind='bar')
plt.title('PRs by Author (Last 30 Days)')
plt.savefig('prs_by_author.png')
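A dashboard often also reports how long PRs take to merge. As a minimal sketch that uses only the standard library, the function below computes the median time to merge from `created_at` and `merged_at` pairs like those collected above. The sample timestamps are invented, and the assumption that `merged_at` is `None` for unmerged PRs is not confirmed by this guide.

```python
from datetime import datetime
from statistics import median


def median_hours_to_merge(prs):
    """Return the median hours from creation to merge, or None if nothing merged.

    prs is an iterable of (created_at, merged_at) datetime pairs;
    merged_at is assumed to be None for PRs that are still open.
    """
    durations = [
        (merged - created).total_seconds() / 3600
        for created, merged in prs
        if merged is not None
    ]
    return median(durations) if durations else None


# Invented sample data for illustration.
sample = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15)),  # merged after 6 h
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9)),   # merged after 24 h
    (datetime(2024, 1, 3, 9), None),                      # still open
]
print(median_hours_to_merge(sample))  # 15.0
```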

See Also