Overview

This example demonstrates how the Atlas SDK enables agents to learn efficient tool-usage patterns. Using the Model Context Protocol (MCP) to provide filesystem tools to a LangGraph agent, it shows measurable improvement across 25 progressive tasks: 30-40% fewer tool calls and a 95%+ completion rate by task 25.
What you’ll see:
  • MCP server with 5 file operation tools
  • LangGraph agent integration
  • Progressive learning (simple → complex tasks)
  • Measurable efficiency gains
  • Total cost: $0.10-0.20 for complete 25-run session
Repository: atlas-sdk/examples/mcp_tool_learning

Architecture

Learning Harness (25 tasks)
        ↓
Atlas SDK Core (orchestration + rewards)
        ↓
LangGraph Agent
        ↓
MultiServerMCPClient
        ↓
MCP Server (5 file operation tools)
Tool inventory:
  • read_file - Read file contents
  • write_file - Write/create files
  • list_files - List directory contents
  • search_content - Regex search in files
  • run_command - Safe shell commands (ls, grep, wc)
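The note that run_command only permits safe commands (ls, grep, wc) suggests an allowlist check. The following is an illustrative sketch of that idea, not the repo's actual mcp_server.py implementation:

```python
import shlex
import subprocess

# Assumed allowlist matching the tools listed above; adjust for your server.
SAFE_COMMANDS = {"ls", "grep", "wc"}

def run_command(command: str) -> str:
    """Run a shell command only if its binary is on the allowlist."""
    argv = shlex.split(command)
    if not argv or argv[0] not in SAFE_COMMANDS:
        raise ValueError(f"command not allowed: {command!r}")
    result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    return result.stdout
```

Parsing with shlex.split and passing an argv list (rather than shell=True) keeps the agent from injecting pipes or subshells.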

Quick Start

Prerequisites

pip install arc-atlas langchain-mcp-adapters langchain-openai langgraph mcp anyio
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=...
atlas init  # Start Postgres for telemetry

Run Complete Learning Session

cd examples/mcp_tool_learning
python learning_harness.py
The harness executes 25 tasks with progressive complexity:
  • Phase 1 (tasks 1-5): Basic file operations
  • Phase 2 (tasks 6-10): Multi-step operations
  • Phase 3 (tasks 11-15): Complex workflows
  • Phase 4 (tasks 16-20): Advanced scenarios
  • Phase 5 (tasks 21-25): Edge cases and error handling
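The five-phase split maps cleanly onto 1-based task numbers, five tasks per phase. A small helper (hypothetical, not in the repo) makes the grouping explicit:

```python
def phase_for_task(task_number: int) -> int:
    """Map a 1-based task number (1-25) to its phase (1-5), five tasks per phase."""
    if not 1 <= task_number <= 25:
        raise ValueError("task number must be between 1 and 25")
    return (task_number - 1) // 5 + 1
```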

Run Single Task

atlas run --config examples/mcp_tool_learning/config.yaml \
          --task "List all files in sample_workspace and read notes.txt"

Learning Objectives

The agent learns to:
  1. Minimize redundant operations - Cache file lists instead of listing repeatedly
  2. Optimize tool selection - Choose search vs read based on task requirements
  3. Handle errors gracefully - Recover from missing files and invalid operations
  4. Plan efficiently - Break complex tasks into minimal step sequences
  5. Build context awareness - Understand when list → read → write sequence is optimal
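Objective 1 (minimizing redundant operations) amounts to memoizing tool results within a task. A minimal sketch, not taken from the repo, of caching directory listings so repeated list_files calls cost only one real tool invocation:

```python
import os

class CachedLister:
    """Caches directory listings so repeated list_files calls cost one tool call."""

    def __init__(self):
        self._listings = {}
        self.tool_calls = 0  # count of real (non-cached) invocations

    def list_files(self, directory: str) -> list[str]:
        if directory not in self._listings:
            self.tool_calls += 1
            self._listings[directory] = sorted(os.listdir(directory))
        return self._listings[directory]
```

In the actual example, the agent learns this behavior from reward signals rather than having it hard-coded.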

Measured Results

Early Runs (Tasks 1-5)

  • Tool calls per task: 8-12 (trial and error)
  • Reward scores: 0.6-0.7
  • Occasional incorrect tool selection

Later Runs (Tasks 15-25)

  • Tool calls per task: 4-6 (optimized)
  • Reward scores: 0.8-0.9
  • Consistent correct tool selection
  • Proactive error handling
Key Metrics:
  • Tool call reduction: 30-40%
  • Completion rate: 95%+ by task 25
  • Reward progression: +0.2-0.3 average increase

Configure the Agent

The example uses a Python adapter to integrate the LangGraph agent:
agent:
  type: python
  import_path: examples.mcp_tool_learning.mcp_agent
  attribute: create_agent
The reward system provides learning signals for efficient tool usage:
rim:
  judge_prompt: |
    Reward effective tool usage:
    - Correct tool for each task
    - Minimal redundant operations
    - Proper error handling

Viewing Learning Progress

Check Learning Playbook

python -m atlas.cli.learning --project mcp-tool-learning
Shows:
  • Tool usage patterns over time
  • Reward progression
  • Common failure modes
  • Synthesized best practices

Export Session Traces

arc-atlas --database-url postgresql://atlas:atlas@localhost:5433/atlas \
          --output mcp_traces.jsonl \
          --limit 25
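Once exported, the JSONL file can be inspected with the standard library. Note that any per-record field names you rely on are assumptions about the export format, so check a real line first:

```python
import json

def load_traces(path: str) -> list[dict]:
    """Parse one JSON object per line, skipping blank lines."""
    with open(path) as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

From there, rewards or tool-call counts can be aggregated with ordinary Python.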

Query Database Directly

SELECT session_id, task, (reward_stats->>'score')::float as reward, created_at
FROM sessions
WHERE metadata->>'learning_key' = 'mcp-tool-learning'
ORDER BY session_id DESC
LIMIT 25;
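Given the 25 reward scores that query returns (reverse them first, since the query orders newest first), the "+0.2-0.3 average increase" metric can be reproduced with a small helper (hypothetical name, not part of the SDK):

```python
def reward_progression(rewards: list[float]) -> float:
    """Average of the last five rewards minus the average of the first five."""
    if len(rewards) < 10:
        raise ValueError("need at least 10 rewards")
    return sum(rewards[-5:]) / 5 - sum(rewards[:5]) / 5
```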

Customization

Add Domain-Specific Tools

Modify mcp_server.py to add tools for your use case:
@server.call_tool()
async def database_query(query: str) -> str:
    """Execute safe database queries."""
    results = ...  # your implementation: run the query and format the output
    return results

Adjust Learning Tasks

Edit LEARNING_TASKS in learning_harness.py:
LEARNING_TASKS = [
    "Your domain-specific task 1",
    "Your domain-specific task 2",
    # ... progressive complexity
]

Tune Reward Signals

Update judge_prompt in config.yaml to reward domain-specific behaviors:
rim:
  judge_prompt: |
    Reward effective database operations:
    - Efficient query construction
    - Proper index usage
    - Connection pooling

Troubleshooting

  • MCP server connection errors: Verify server_path in mcp_agent.py points to the correct file
  • Async event loop errors: Run with python learning_harness.py (not python -i)
  • API rate limits: Increase the sleep duration between tasks in learning_harness.py
  • High costs: Use GPT-4o-mini for both student and teacher; reduce the task count
  • Postgres connection refused: Start Postgres with atlas init or verify DATABASE_URL in .env
  • No learning improvement seen: Ensure storage is enabled and check reward scores in the database
  • Tool calls not reducing: Verify learning.enabled=true in the config and check playbook entries

Next Steps

Custom Adapters

Connect your own agent framework

Configuration Guide

Tune orchestration and learning parameters

Export Training Data

Use runtime traces for offline training

Learning System

Understand persistent memory and playbooks