# Arc Memory Simulation Improvements
This document outlines the plan for improving the simulation history display and overall user experience in the Arc Memory simulation feature.
## Current Issues

### 1. Simulation History Display Problems
The current simulation history display shows:
- Missing dates (displayed as `Unknown`)
- Empty entries that have no ID but still show an `Unknown` date
- Missing or incomplete service information
- Inconsistent formatting
Example of the current output (illustrative; the command name and column layout are placeholders):
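```text
$ arc sim history
ID          Date      Service   Risk
----------  --------  --------  -------
sim_1a2b3c  Unknown   Unknown   medium
            Unknown
sim_4d5e6f  Unknown   unknown   high
```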
### 2. Lack of Clear Explanation in Simulation Output
- Missing detailed reasoning and explanations ("show your work")
- Lack of transparency about data sources and assumptions
- Insufficient integration with the LLM in `arc_memory/simulate/langgraph_flow.py`
### 3. Lack of Evidence of Testing in the Sandbox Environment
- Limited visibility into the actual testing that occurred
- No access to unit, integration, and end-to-end test results
- Code execution from `arc_memory/simulate/code_interpreter.py` is not visible to users
### 4. Output Format Issues
- JSON output not converted to human-readable format in the CLI
- No progress indicators for long-running operations
- Limited error handling and user feedback
## Improvement Plan

### 1. Fix Data Storage and Retrieval
**Tasks:**

- Review the database schema to ensure all necessary fields are defined
- Fix simulation node creation in `memory/storage.py` to include all metadata
- Update query functions in `memory/query.py` to retrieve complete simulation data
- Add validation to ensure no empty entries are displayed in history (see the sketch below)
- Format dates consistently in a human-readable format
**Files to modify:**

- `arc_memory/memory/storage.py`
- `arc_memory/memory/query.py`
- `arc_memory/cli/sim.py` (history command)
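A minimal sketch of the validation and date formatting described above; the function names and the `created_at` field are assumptions about the schema, not the actual code:

```python
from datetime import datetime, timezone
from typing import Optional

def format_timestamp(ts: Optional[float]) -> str:
    """Render a stored Unix timestamp as a human-readable UTC date, with a clear fallback."""
    if ts is None:
        return "unknown"
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")

def filter_history(rows: list[dict]) -> list[dict]:
    """Drop empty history entries and normalize dates before display."""
    cleaned = []
    for row in rows:
        if not row.get("id"):  # never render an entry that has no ID
            continue
        row["date"] = format_timestamp(row.get("created_at"))
        cleaned.append(row)
    return cleaned
```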
### 2. Enhance Explanation Generation
**Tasks:**

- Modify the LLM system prompt in `langgraph_flow.py` to request more detailed explanations
- Structure the prompt to explicitly ask for reasoning, data sources, and assumptions
- Update the explanation generation function to store the full explanation
- Add a way to display the full explanation in the CLI output
**Files to modify:**

- `arc_memory/simulate/langgraph_flow.py`
- `arc_memory/simulate/llm.py` (create it if it does not already exist)
Example prompt enhancement (an illustrative sketch, not the final wording):
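```text
You are simulating the impact of a code change. Show your work:

1. Reasoning: explain, step by step, how the change leads to each predicted effect.
2. Data sources: list every artifact you relied on (diff, dependency graph, telemetry).
3. Assumptions: state anything you assumed rather than observed, and how it could
   be verified.

Return a JSON object with "risk_level", "reasoning", "data_sources", and
"assumptions" fields.
```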
### 3. Improve Sandbox Testing Visibility
**Tasks:**

- Enhance logging in `code_interpreter.py` to capture more detailed information
- Store test execution details (commands run, outputs, errors) in the simulation node (see the sketch below)
- Create a way to display this information on demand (with a `--verbose` flag)
- Add a summary of tests performed to the standard output
**Files to modify:**

- `arc_memory/simulate/code_interpreter.py`
- `arc_memory/cli/sim.py`
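One way to store the execution details on the simulation node is a small record type; all names here are assumptions for illustration:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class TestExecution:
    """Record of a single command run inside the sandbox."""
    command: str
    stdout: str
    stderr: str
    exit_code: int

@dataclass
class SandboxReport:
    """All executions for one simulation, attached to the simulation node."""
    executions: list[TestExecution] = field(default_factory=list)

    def summary(self) -> str:
        passed = sum(1 for e in self.executions if e.exit_code == 0)
        return f"{passed}/{len(self.executions)} sandbox commands succeeded"

    def to_metadata(self) -> dict:
        return asdict(self)  # serializable payload for the simulation node
```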
### 4. Enhance Output Formatting
**Tasks:**
- Create a formatter to convert JSON output to markdown for CLI display
- Add progress indicators for long-running operations
- Implement better error handling with user-friendly messages
- Add color coding for risk levels and important information
**Files to modify:**

- `arc_memory/cli/sim.py`
- `arc_memory/cli/utils.py` (create it if it does not exist)
Example formatter function (a sketch assuming the result JSON contains `risk_level`, `affected_services`, and `explanation` keys; the real schema may differ):
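```python
import json

RISK_EMOJI = {"low": "🟢", "medium": "🟡", "high": "🔴"}

def format_simulation_result(raw: str) -> str:
    """Convert the raw JSON simulation result into markdown for CLI display."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return f"Could not parse simulation output: {raw}"

    risk = result.get("risk_level", "unknown")
    lines = [
        "## Simulation Result",
        "",
        f"**Risk level:** {RISK_EMOJI.get(risk, '⚪')} {risk}",
        "",
        "### Affected services",
    ]
    services = result.get("affected_services", [])
    lines += [f"- {svc}" for svc in services] if services else ["- none detected"]
    lines += ["", "### Explanation", "", result.get("explanation", "No explanation provided.")]
    return "\n".join(lines)
```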
### 5. Improve CLI Experience
**Tasks:**
- Add a progress bar for long-running simulations
- Provide clearer error messages with suggestions for resolution
- Add more detailed help text and examples
- Implement a `--format` option to allow output in different formats (text, markdown, json)
**Files to modify:**

- `arc_memory/cli/sim.py`
- `arc_memory/cli/utils.py`
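A sketch of the `--format` option using Typer (assuming the CLI is Typer-based; `run_simulation` is a hypothetical stand-in for the real entry point, and the formatter is the one sketched in section 4):

```python
import json
import typer

app = typer.Typer(help="Run and inspect Arc Memory simulations.")

def run_simulation(rev_range: str) -> str:
    """Placeholder for the real simulation entry point; returns raw JSON."""
    return json.dumps({"risk_level": "low", "affected_services": [], "explanation": "stub"})

@app.command()
def run(
    rev_range: str = typer.Argument(..., help="Rev range or diff to simulate"),
    output_format: str = typer.Option(
        "text", "--format", help="Output format: text, markdown, or json"
    ),
) -> None:
    """Run a simulation and print the result in the requested format."""
    raw = run_simulation(rev_range)
    if output_format == "json":
        typer.echo(raw)
    else:
        # Text and markdown share the section 4 formatter in this sketch.
        typer.echo(format_simulation_result(raw))

if __name__ == "__main__":
    app()
```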
## Implementation Strategy

### Phase 1: Data Storage and Retrieval
Focus on ensuring all simulation data is properly stored and retrieved, fixing the basic display issues in the history command.
### Phase 2: Output Formatting
Implement the formatter to convert JSON to human-readable output and add progress indicators.
### Phase 3: Explanation Enhancement
Improve the LLM prompts and explanation generation to provide more detailed reasoning.
### Phase 4: Sandbox Testing Visibility
Enhance the code interpreter to capture and display more detailed testing information.
### Phase 5: CLI Experience
Finalize the CLI experience with better error handling, help text, and additional options.
## Testing Strategy
For each phase:
- Write unit tests to verify the functionality
- Perform manual testing with different scenarios
- Verify backward compatibility with existing simulations
- Check edge cases (empty results, errors, etc.)
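For example, a unit test for the history validation sketched in section 1 (the import path and function names are hypothetical):

```python
from arc_memory.memory.query import filter_history  # hypothetical location

def test_filter_history_drops_empty_entries():
    rows = [
        {"id": "sim_1a2b3c", "created_at": 1716000000.0},
        {"id": None, "created_at": None},  # the kind of row currently shown as "Unknown"
    ]
    cleaned = filter_history(rows)
    assert len(cleaned) == 1
    assert cleaned[0]["date"] != "unknown"
```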
## Documentation Updates
After implementation:
- Update CLI documentation in `docs/cli/sim.md`
- Add examples in `docs/examples/simulation.md`
- Update README with new features
- Add troubleshooting tips for common issues
## Refactoring Plan: LangGraph to Smol Agents
Based on the issues identified and the need for a more modular, maintainable approach, we’ve decided to refactor the simulation workflow from LangGraph to Smol Agents. This section outlines the detailed refactoring plan.
### Why Smol Agents?
| Feature | LangGraph | Smol Agents |
|---|---|---|
| Architecture | Graph-based workflow with nodes and edges | Code-centric agent approach with Python execution |
| Complexity | Higher complexity with state management and graph definitions | Lower complexity with more direct code execution |
| Security | No built-in sandboxing | Built-in sandboxing options (local, E2B, Docker) |
| Control Flow | Explicit graph definition with conditional edges | More natural Python control flow |
| Debugging | Requires understanding graph state transitions | More straightforward Python debugging |
| Integration with E2B | Requires custom integration | Native integration with E2B |
| Code Size | Our implementation is ~1500 lines in one file | Would likely be more modular and smaller |
| Agent Autonomy | Limited by graph structure | Higher autonomy with code execution |
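To make the comparison concrete, a minimal Smol Agents sketch (the model choice is illustrative, and sandboxed execution is selected via the agent's executor options, whose exact names vary across smolagents versions):

```python
from smolagents import CodeAgent, LiteLLMModel

# Illustrative model; any LiteLLM-compatible model id works here.
model = LiteLLMModel(model_id="openai/gpt-4o")

# The agent writes and runs Python to answer, instead of traversing a graph of
# nodes and edges; domain tools (e.g. graph queries) would be passed in `tools`.
agent = CodeAgent(tools=[], model=model)

result = agent.run("Analyze the blast radius of this diff and report risk factors.")
print(result)
```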
### Module Structure
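A possible layout (hypothetical; the final structure will be settled in Phase 1):

```text
arc_memory/simulate/
├── __init__.py
├── agents.py        # Smol Agents agent definitions
├── sandbox.py       # E2B sandbox integration
├── workflow.py      # orchestration of the simulation steps
├── explanation.py   # prompts and explanation generation
└── utils.py         # shared helpers extracted from langgraph_flow.py
```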
### Implementation Phases

#### Phase 1: Core Module Creation
- Create the basic module structure
- Implement utility functions and shared code
- Extract core functionality from `langgraph_flow.py` into appropriate modules
#### Phase 2: Agent Implementation
- Create agent definitions using Smol Agents
- Implement the sandbox integration with E2B
- Develop the workflow orchestration
#### Phase 3: CLI Integration
- Update the CLI to use the new workflow
- Implement progress reporting and output formatting
- Add support for the `--memory` flag and the history subcommand
#### Phase 4: Testing and Documentation
- Write comprehensive tests for each module
- Update documentation to reflect the new architecture
- Create examples demonstrating the new workflow
### Benefits of This Approach
- **Improved maintainability:** Smaller, focused modules will be easier to maintain and test
- **Better security:** Native integration with E2B will improve sandbox security
- **Enhanced visibility:** Users will be able to see the code being executed in the sandbox
- **More flexible workflow:** The agent will have more autonomy to handle complex scenarios
- **Simplified error handling:** Python-native error handling will be more straightforward
For more details on this decision, see ADR-0003: Refactor Simulation Workflow to Smol Agents.