Performance benchmarks
Performance Benchmarks
Beta Release: Arc Memory SDK is currently in beta. We’re actively gathering feedback and improving the platform.
This document provides performance benchmarks for Arc Memory, focusing on key operations that impact user experience.
Benchmark Methodology
We use a custom benchmarking script located in tests/benchmark/benchmark.py
to measure the performance of key operations:
- Plugin Discovery: Time to discover and register all plugins
- Initial Build: Time to build the knowledge graph from scratch
- Incremental Build: Time to update the knowledge graph with new data
- Trace History Query: Time to trace the history of a specific line
Benchmarks are run on repositories of different sizes:
- Small: Arc Memory repository (~100 commits)
- Medium: Flask repository (~5,000 commits)
- Large: Django repository (~30,000 commits)
Current Performance (as of April 2025)
Small Repository (Arc Memory)
Operation | Duration (seconds) | Notes |
---|---|---|
Plugin Discovery | 0.100 | Discovering and registering 3 built-in plugins |
Initial Build | 3.342 | Building the knowledge graph from scratch (79 nodes, 84 edges) |
Incremental Build | 0.446 | Updating the knowledge graph with new data |
Trace History Query | 0.000109 | Tracing history for a specific line (109 microseconds) |
Medium Repository (Flask)
Operation | Duration (seconds) | Notes |
---|---|---|
Plugin Discovery | 0.087 | Discovering and registering 3 built-in plugins |
Initial Build | 14.300 | Building the knowledge graph from scratch (217 nodes, 474 edges) |
Incremental Build | 0.032 | Updating the knowledge graph with new data |
Trace History Query | 0.000030 | Tracing history for a specific line (30 microseconds) |
Large Repository (Django)
Operation | Duration (seconds) | Notes |
---|---|---|
Plugin Discovery | 0.085 | Discovering and registering 3 built-in plugins |
Initial Build | 127.800 | Building the knowledge graph from scratch (2,360 nodes, 3,462 edges) |
Incremental Build | 0.108 | Updating the knowledge graph with new data |
Trace History Query | 0.000038 | Tracing history for a specific line (38 microseconds) |
Performance Targets
Based on the PRD and user experience requirements:
Operation | Target Duration | Rationale |
---|---|---|
Plugin Discovery | < 0.5 seconds | Fast startup time for CLI and VS Code extension |
Initial Build | < 30 seconds | Acceptable one-time cost for new users |
Incremental Build | < 5 seconds | Fast enough for CI integration |
Trace History Query | < 200ms | Required for responsive VS Code extension |
Performance Analysis
Based on the benchmarks across different repository sizes, we can make the following observations:
-
Plugin Discovery is very fast across all repository sizes, taking less than 100ms.
-
Initial Build scales with repository size:
- Small repository (Arc Memory): 3.342 seconds
- Medium repository (Flask): 14.300 seconds
- Large repository (Django): 127.800 seconds
The build time increases roughly linearly with the number of nodes and edges. For the large repository, the build time is still well below the 5-minute target specified in the PRD.
-
Incremental Build is very efficient across all repository sizes:
- Small repository: 0.446 seconds
- Medium repository: 0.032 seconds
- Large repository: 0.108 seconds
Incremental builds are consistently fast, regardless of repository size, making them suitable for CI integration.
-
Trace History Query is extremely fast across all repository sizes:
- Small repository: 109 microseconds
- Medium repository: 30 microseconds
- Large repository: 38 microseconds
All trace history queries are well below the 200ms target specified in the PRD, ensuring a responsive user experience in the VS Code extension.
Performance Optimization Opportunities
Based on the current benchmarks, we’ve identified the following optimization opportunities:
-
Trace History Algorithm
- Enhance the implementation to include full BFS algorithm
- Add support for PR, Issue, and ADR nodes in the results
- Optimize database queries for larger repositories
- Add caching for frequently accessed data
-
Build Process
- Optimize database writes for larger repositories
- Parallelize plugin processing
- Improve incremental build detection
-
Plugin Architecture
- Lazy loading of plugins
- Configuration options for disabling unused plugins
Recent Improvements
-
Build Process
- Fixed the build process to correctly add nodes and edges to the database
- Added a custom JSON encoder for datetime and date objects
- Updated the database schema to allow NULL values for title and body
-
ADR Ingestor
- Added support for multiple date formats
- Added proper error handling for non-string date values
-
Trace History
- Fixed the trace history implementation to work with the database
- Implemented full BFS algorithm for graph traversal
- Added support for all node types (Commit, PR, Issue, ADR, File)
- Optimized performance to achieve 109 microseconds per query (still well below the 200ms target)
Next Steps
- ✅ Add benchmarks for medium and large repositories
- Add more comprehensive tests for the trace history algorithm
- Optimize database writes for larger repositories
- Implement parallel processing for the build process
- Add caching for git blame results
- Continuously monitor performance as new features are added
Running Benchmarks
To run the benchmarks yourself:
The benchmark results will be saved to the specified output file in JSON format.