Build Process API
The Build Process API allows you to build a knowledge graph from Git commits, GitHub PRs and issues, and ADRs.
Overview
The build process discovers and executes plugins to ingest data from various sources, creates nodes and edges in the knowledge graph, and saves the result to a SQLite database. It supports both full and incremental builds, allowing for efficient updates to the graph.
Key Functions
CLI Command: arc build
Options
- `--repo`, `-r`: Path to the Git repository (default: current directory)
- `--output`, `-o`: Path to the output database file (default: `~/.arc/graph.db`)
- `--max-commits`: Maximum number of commits to process (default: 5000)
- `--days`: Maximum age of commits to process, in days (default: 365)
- `--incremental`: Only process new data since the last build (default: False)
- `--pull`: Pull the latest CI-built graph (not yet implemented)
- `--token`: GitHub token to use for API calls
- `--debug`: Enable debug logging
Example
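A minimal sketch, assuming `arc` is installed and on `PATH`: the hypothetical helper `build_command` assembles an `arc build` invocation from the options listed above, then runs a full build followed by an incremental one. Only documented flags are used; the helper itself is illustrative and not part of arc.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(
    repo: str = ".",
    output: Optional[str] = None,
    max_commits: int = 5000,
    days: int = 365,
    incremental: bool = False,
    token: Optional[str] = None,
    debug: bool = False,
) -> List[str]:
    """Assemble an `arc build` command line (hypothetical helper)."""
    cmd = ["arc", "build", "--repo", repo,
           "--max-commits", str(max_commits), "--days", str(days)]
    if output is not None:
        cmd += ["--output", output]  # omit to use the default ~/.arc/graph.db
    if incremental:
        cmd.append("--incremental")
    if token is not None:
        cmd += ["--token", token]
    if debug:
        cmd.append("--debug")
    return cmd


if __name__ == "__main__":
    # Full build of the last 90 days, capped at 1,000 commits.
    full = build_command(max_commits=1000, days=90)
    # Later runs only pick up data added since the previous build.
    incremental = build_command(incremental=True)
    if shutil.which("arc"):
        subprocess.run(full, check=True)
        subprocess.run(incremental, check=True)
    else:
        print("arc not found on PATH; would run:", " ".join(full))
```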
Build Process Flow
The build process follows these steps:
- Initialization:
  - Ensure the output directory exists
  - Check that the repository is a Git repository
  - Load the existing manifest (for incremental builds)
  - Initialize the database
- Plugin Discovery:
  - Discover and register plugins using the plugin registry
  - Plugins are discovered through entry points
- Data Ingestion:
  - For each plugin:
    - Get the last processed data (for incremental builds)
    - Call the plugin's `ingest` method with the appropriate parameters
    - Collect nodes and edges from the plugin
- Database Operations:
  - Write all nodes and edges to the database
  - Get node and edge counts
  - Compress the database
- Manifest Creation:
  - Create a build manifest with metadata about the build
  - Save the manifest for future incremental builds
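The steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not arc's implementation: plugins are already discovered and reduced to `get_name()`/`ingest()`, nodes and edges are plain dicts, the Git repository check is omitted, the table layout and the manifest location (a JSON file next to the database) are hypothetical, and `VACUUM` stands in for the compression step.

```python
import json
import sqlite3
import time
from pathlib import Path
from typing import Any, Dict, List, Optional, Protocol, Tuple

Node = Dict[str, Any]
Edge = Dict[str, Any]


class Plugin(Protocol):
    def get_name(self) -> str: ...
    def ingest(self, last_processed: Optional[Dict[str, Any]] = None
               ) -> Tuple[List[Node], List[Edge], Dict[str, Any]]: ...


def build_graph(db_path: Path, plugins: List[Plugin], incremental: bool = False) -> Dict[str, Any]:
    # Initialization: output directory, previous manifest (incremental only), schema.
    db_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path = db_path.with_suffix(".manifest.json")  # hypothetical location
    previous: Dict[str, Any] = {}
    if incremental and manifest_path.exists():
        previous = json.loads(manifest_path.read_text())
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS nodes (id TEXT PRIMARY KEY, type TEXT, data TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS edges "
                 "(src TEXT, dst TEXT, rel TEXT, PRIMARY KEY (src, dst, rel))")

    # Data ingestion: each plugin gets its own last-processed metadata.
    nodes: List[Node] = []
    edges: List[Edge] = []
    plugin_meta: Dict[str, Any] = {}
    for plugin in plugins:
        name = plugin.get_name()
        last = previous.get("last_processed", {}).get(name)
        p_nodes, p_edges, meta = plugin.ingest(last_processed=last)
        nodes.extend(p_nodes)
        edges.extend(p_edges)
        plugin_meta[name] = meta

    # Database operations: upsert, count, then compact the file.
    with conn:  # commits the inserts as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO nodes VALUES (?, ?, ?)",
            [(n["id"], n["type"], json.dumps(n.get("data", {}))) for n in nodes],
        )
        conn.executemany(
            "INSERT OR IGNORE INTO edges VALUES (?, ?, ?)",
            [(e["src"], e["dst"], e["rel"]) for e in edges],
        )
    node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    edge_count = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
    conn.execute("VACUUM")  # stand-in for the compression step
    conn.close()

    # Manifest creation: saved so the next incremental build knows where to resume.
    manifest = {
        "build_time": time.time(),
        "node_count": node_count,
        "edge_count": edge_count,
        "last_processed": plugin_meta,
    }
    manifest_path.write_text(json.dumps(manifest))
    return manifest
```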
Incremental Builds
Incremental builds only process new data since the last build, making them much faster than full builds. The process works as follows:
- Load the existing build manifest
- Pass the last processed data to each plugin
- Plugins use this data to determine what’s new
- Only new nodes and edges are added to the database
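To make the hand-off concrete, here is a minimal sketch of how a plugin might use its last processed data to decide what is new. The class name, the in-memory commit list, and the `last_commit_time` metadata key are hypothetical; a real Git plugin would read commits from the repository instead.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyCommitPlugin:
    """Hypothetical plugin that only emits commits newer than the last build."""

    def __init__(self, commits: List[Dict[str, Any]]):
        # Each commit is {"sha": str, "time": int}, with time in Unix seconds.
        self.commits = commits

    def get_name(self) -> str:
        return "toy-git"

    def ingest(self, last_processed: Optional[Dict[str, Any]] = None
               ) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]], Dict[str, Any]]:
        # No metadata means a full build: start from the beginning of time.
        since = (last_processed or {}).get("last_commit_time", 0)
        new = [c for c in self.commits if c["time"] > since]
        nodes = [{"id": c["sha"], "type": "commit"} for c in new]
        # Keep the previous watermark if nothing new arrived.
        latest = max((c["time"] for c in new), default=since)
        return nodes, [], {"last_commit_time": latest}
```

Calling `ingest()` with no argument behaves like a full build; passing the returned metadata back on the next run yields only commits newer than the stored timestamp.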
Plugin Integration
The build process integrates with plugins through the plugin registry. Each plugin must implement the `IngestorPlugin` protocol, which includes:
- `get_name()`: Returns the name of the plugin
- `get_node_types()`: Returns the node types the plugin can create
- `get_edge_types()`: Returns the edge types the plugin can create
- `ingest()`: Ingests data and returns nodes, edges, and metadata
Some plugins also receive source-specific parameters:
- Git plugin: receives the `max_commits` and `days` parameters
- GitHub plugin: receives a `token` parameter
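A minimal sketch of the protocol and a registry, using `typing.Protocol`. The four method names come from the list above; their exact signatures, the `PluginRegistry` class, the entry-point group name `arc_memory.plugins`, the plugin names `"git"` and `"github"`, and the `plugin_kwargs` helper are all assumptions for illustration.

```python
from importlib.metadata import entry_points
from typing import Any, Dict, List, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class IngestorPlugin(Protocol):
    def get_name(self) -> str: ...
    def get_node_types(self) -> List[str]: ...
    def get_edge_types(self) -> List[str]: ...
    def ingest(self, last_processed: Optional[Dict[str, Any]] = None, **kwargs: Any
               ) -> Tuple[List[Any], List[Any], Dict[str, Any]]: ...


class PluginRegistry:
    """Hypothetical registry keyed by plugin name."""

    def __init__(self) -> None:
        self._plugins: Dict[str, IngestorPlugin] = {}

    def register(self, plugin: Any) -> None:
        # isinstance on a runtime_checkable Protocol only checks that the methods exist.
        if not isinstance(plugin, IngestorPlugin):
            raise TypeError(f"{plugin!r} does not implement IngestorPlugin")
        self._plugins[plugin.get_name()] = plugin

    def discover(self, group: str = "arc_memory.plugins") -> None:  # hypothetical group
        eps = entry_points()
        # Python 3.10+ exposes .select(); 3.8/3.9 return a plain dict of groups.
        selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
        for ep in selected:
            self.register(ep.load()())  # assumes each entry point names a plugin class

    def get_all(self) -> List[IngestorPlugin]:
        return list(self._plugins.values())


def plugin_kwargs(name: str, max_commits: int, days: int,
                  token: Optional[str]) -> Dict[str, Any]:
    """Route CLI options to the plugins that use them (hypothetical helper)."""
    if name == "git":
        return {"max_commits": max_commits, "days": days}
    if name == "github":
        return {"token": token}
    return {}
```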
Error Handling
The build process includes several error-handling mechanisms:
- A specific `GraphBuildError` exception for build-related errors
- Detailed error messages for common issues
- Graceful handling of plugin failures
- Debug logging for troubleshooting (enabled with `--debug`)
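A minimal sketch of that strategy, assuming a failing plugin should be logged and skipped while the build aborts only if every plugin fails. `GraphBuildError` is named in the text; its definition here, the `run_plugins` helper, the logger name, and the rule for when a failure is fatal are assumptions.

```python
import logging
from typing import Any, Dict, List, Tuple

logger = logging.getLogger("arc.build")  # hypothetical logger name


class GraphBuildError(Exception):
    """Raised when the build as a whole cannot proceed."""


def run_plugins(plugins: List[Any], debug: bool = False
                ) -> Tuple[List[Any], List[Any], Dict[str, str]]:
    if debug:
        logging.basicConfig(level=logging.DEBUG)
    nodes: List[Any] = []
    edges: List[Any] = []
    failures: Dict[str, str] = {}
    for plugin in plugins:
        name = plugin.get_name()
        try:
            p_nodes, p_edges, _meta = plugin.ingest()
        except Exception as exc:  # one broken source should not sink the whole build
            logger.warning("Plugin %s failed: %s", name, exc)
            logger.debug("Traceback for plugin %s", name, exc_info=True)
            failures[name] = str(exc)
            continue
        nodes.extend(p_nodes)
        edges.extend(p_edges)
    if plugins and len(failures) == len(plugins):
        raise GraphBuildError(f"All plugins failed: {failures}")
    return nodes, edges, failures
```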
Performance
The build process is designed for performance:
- Incremental builds are very fast (typically < 0.5 seconds)
- Full builds scale linearly with repository size
- Database compression reduces storage requirements
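One possible way to implement the compression step, assumed here for illustration only: compact the SQLite file with `VACUUM`, then write a gzip copy using Python's standard library. The `compress_database` name and the `.gz` output path are hypothetical, and arc may compress its database differently.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def compress_database(db_path: Path) -> Path:
    """Compact a SQLite file, then write a gzip copy next to it (hypothetical helper)."""
    conn = sqlite3.connect(db_path)
    conn.execute("VACUUM")  # reclaim free pages before compressing
    conn.close()
    gz_path = db_path.with_name(db_path.name + ".gz")
    with open(db_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the file so large graphs fit in memory
    return gz_path
```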