Atlas runtime learning exists to make the teacher more effective between offline GRPO cycles. Instead of hard-coding
prompt tweaks or waiting for the next training run, the SDK captures guidance from successful sessions, synthesizes a
“playbook,” and reinjects that context on subsequent requests. This page explains how the learning pipeline works,
how data persists in Postgres, and which configuration levers production teams can use to control it.
Why Runtime Learning Matters
- Faster feedback loops – learning runs on live telemetry, so student/teacher personas improve within hours instead of waiting for the next fine-tune.
- Traceable changes – every playbook has a hash, metadata, and storage lineage so you can audit who learned what and when.
- Safe by default – review gating, drift detection, and update toggles let you pause or roll back learning if behavior regresses.
Feedback Loop at a Glance
- Session executes – the dual-agent runtime completes a task and logs telemetry into sessions, trajectory_events, and reward tables.
- Reward judges score – the RIM ensemble produces reward, uncertainty, and escalation data.
- Learning synthesizer runs – after reward evaluation, Atlas calls the LearningSynthesizer to summarize salient guidance for student and teacher personas.
- Registry update – Database.upsert_learning_state writes the new playbook to the learning_registry table (keyed by learning_key).
- Playbook cached – on the next run for the same key, resolve_playbook retrieves and hashes the playbook, then injects it into persona prompts.
- Evaluation harness audits – engineering teams query atlas.training_data (see the snippet below) to track reward deltas, mode shifts, and review status across learning keys.
Pipeline Components
Learning Synthesizer (atlas/learning/synthesizer.py)
The synthesizer is distinct from reward judges. It:
- runs after reward scoring so updates only occur on high-signal sessions,
- consumes trajectory summaries, reward stats, and recent history (bounded by history_limit),
- emits structured “student” and “teacher” guidance plus optional session notes,
- uses a dedicated LLM (configurable via learning.llm) to transform raw notes into concise playbooks.
When learning.update_enabled is false, the synthesizer skips persistence but can still write per-session learning
notes for auditing. This is useful for A/B testing new prompts before rolling them out.
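To make the output concrete, here is a rough sketch of the guidance structure the synthesizer produces. The actual classes and field names live in atlas/learning/synthesizer.py and may differ, so treat this as illustrative only:

```python
# Illustrative data shape only; the real implementation and field names are defined
# in atlas/learning/synthesizer.py and may differ.
from dataclasses import dataclass, field


@dataclass
class LearningUpdate:
    learning_key: str                      # scope the guidance applies to
    student_learning: str                  # playbook injected into the student persona
    teacher_learning: str                  # playbook injected into the teacher persona
    session_note: str | None = None        # per-session audit note (session_note_enabled)
    metadata: dict = field(default_factory=dict)  # e.g. reward stats, history window used


# With learning.update_enabled=false the synthesizer still produces an update like this
# for auditing, but never writes it to learning_registry.
```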
Playbook Resolver (atlas/learning/playbook.py)
resolve_playbook is invoked during persona construction. It handles:
- fetching the latest registry entry for the learning_key,
- trimming long sections to meet token budgets,
- caching the playbook on disk (per role) and computing a SHA256 hash,
- returning both content and metadata so prompts, validation payloads, and cache keys include the hash.
Set learning.apply_to_prompts=false to keep generating playbooks without injecting them into runtime prompts—a common
pattern for staging and smoke tests. The hash still flows through telemetry so you can verify the correct playbook would
have been used.
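Conceptually, the resolver’s trimming and hashing behave like the sketch below; the helper names here are hypothetical stand-ins, not the SDK’s actual functions:

```python
# Hypothetical stand-ins that mirror what resolve_playbook does conceptually:
# trim each playbook section to a budget and derive a stable SHA-256 fingerprint
# that can flow through prompts, validation payloads, and cache keys.
import hashlib


def trim_to_budget(text: str, max_chars: int = 4000) -> str:
    """Naive character-based stand-in for the SDK's token-budget trimming."""
    return text if len(text) <= max_chars else text[:max_chars]


def playbook_fingerprint(student: str, teacher: str) -> str:
    """Stable hash over both playbook bodies, usable as a telemetry tag."""
    digest = hashlib.sha256()
    digest.update(student.encode("utf-8"))
    digest.update(teacher.encode("utf-8"))
    return digest.hexdigest()
```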
Playbook Injection Modes
Atlas supports two playbook injection strategies optimized for different scenarios:
Prefix Mode (Default)
```yaml
learning:
  injection_mode: prefix  # default
```
Playbooks are injected at the beginning of the system prompt. Best for:
- General guidance and behavioral patterns
- Task-agnostic learning that should influence all reasoning
- Compatibility with providers that don’t support advanced caching
Suffix Mode (KV Cache Optimization)
```yaml
learning:
  injection_mode: suffix
```
Playbooks are appended after the base system prompt. Optimized for:
- Provider KV cache efficiency: Anthropic and other providers cache the static system prompt prefix, avoiding recomputation when only the playbook changes
- Reduced latency: Cache hits eliminate prompt reprocessing overhead
- Cost savings: Cached tokens aren’t rebilled on subsequent requests
The runtime automatically computes cache breakpoints when injection_mode: suffix is used with a compatible provider (Anthropic Claude and other providers that support prompt caching). The base system prompt becomes the cached prefix, and the playbook suffix can be updated without invalidating the cache.
When to use suffix mode: Enable injection_mode: suffix for production workloads with frequent playbook updates and providers supporting KV cache (e.g., Anthropic Claude). This can reduce prompt processing time by 80%+ when the base prompt remains stable.
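For intuition, the raw Anthropic request below shows how a static system prefix and a changing playbook suffix line up with prompt caching. Atlas computes the breakpoint for you, so this is only a sketch of the mechanism; the model name and prompt strings are examples:

```python
# Sketch of suffix-mode injection against Anthropic prompt caching.
# The cache_control block marks the end of the cached prefix (the stable base
# system prompt); the playbook suffix follows it, so updating the playbook does
# not invalidate the cache. Atlas performs this split automatically.
import anthropic

BASE_SYSTEM_PROMPT = "You are the Atlas teacher persona..."   # stable across requests
playbook_suffix = "Playbook: prefer small, verifiable steps."  # changes per registry update

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # example model; any prompt-caching-capable Claude works
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": BASE_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # everything up to here is cached
        },
        {"type": "text", "text": playbook_suffix},
    ],
    messages=[{"role": "user", "content": "Summarize the last session."}],
)
# Note: providers enforce a minimum cacheable prefix length, so real base prompts
# are much longer than this toy example before cache hits appear in response.usage.
```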
Learning Registry (atlas/runtime/storage/schema.sql)
learning_registry keeps the current playbooks for each learning_key. The table stores a single row per key with:
- learning_key – primary identifier (task or project scope).
- student_learning / teacher_learning – latest playbook bodies (text).
- metadata – optional JSON payload (e.g., synthesizer audit info, hashes you compute upstream).
- updated_at – timestamp of the most recent update.
Because both roles live in the same row, updates atomically replace the student and teacher playbooks together. Historical
snapshots remain accessible via sessions.student_learning and sessions.teacher_learning, giving you a time-series view
even as the registry is overwritten with fresher guidance.
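The SDK performs the write through Database.upsert_learning_state; the equivalent SQL looks roughly like the statement below, assuming the column names described above (check atlas/runtime/storage/schema.sql for the authoritative definition):

```sql
-- Illustrative upsert only; the runtime issues this via Database.upsert_learning_state.
INSERT INTO learning_registry (learning_key, student_learning, teacher_learning, metadata, updated_at)
VALUES ('mcp-tool-learning', '<student playbook text>', '<teacher playbook text>',
        '{"source": "manual-seed"}'::jsonb, now())
ON CONFLICT (learning_key) DO UPDATE
SET student_learning = EXCLUDED.student_learning,
    teacher_learning = EXCLUDED.teacher_learning,
    metadata         = EXCLUDED.metadata,
    updated_at       = EXCLUDED.updated_at;
```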
Discovery Telemetry
Autodiscovery (atlas env init) persists complementary context in discovery_runs. Each record stores module hashes,
autogenerated factory metadata, and preflight results. Learning summaries link back to matching discovery runs so you
can reproduce the environment that produced a given playbook.
Persistence Topology
```
discovery_runs
      │
      ▼
learning_registry ◄── learning synthesizer (updates)
      │
      ▼
   sessions ───► trajectory_events
      │                  ▲
      └── reward_stats ──┘
```
- discovery_runs captures onboarding metadata and is referenced in learning reports.
- sessions records adaptive summaries, reward stats, review status, and session-specific learning notes.
- trajectory_events stores per-step evidence (event_type, actor, payload digests).
- learning_registry contains the most recent student and teacher playbooks for each key.
Use learning_key to join these tables. The learning evaluation harness (below) already performs that join.
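If you want a quick manual check before reaching for the harness, a join like the one below works; column names beyond those described above (for example sessions.learning_key and sessions.created_at) are assumptions, so adjust them to the actual schema:

```sql
-- Spot-check recent sessions against the active playbook for one learning key.
SELECT s.id,
       s.created_at,
       s.review_status,
       lr.updated_at AS playbook_updated_at
FROM sessions AS s
JOIN learning_registry AS lr ON lr.learning_key = s.learning_key
WHERE s.learning_key = 'mcp-tool-learning'
ORDER BY s.created_at DESC
LIMIT 20;
```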
Configuration Reference
Add a learning block to your runtime config to control behavior:
| Parameter | Default | Purpose |
|---|---|---|
| enabled | true | Master switch. Set false to run without playbooks or registry updates. |
| update_enabled | true | Allow the synthesizer to write updated playbooks. Disable for read-only playbook usage. |
| llm | null | Override the synthesizer model; defaults to the runtime’s standard LLM if omitted. |
| prompts | null | Custom prompt templates for the synthesizer LLM. |
| history_limit | 10 | Max historical sessions considered when generating an update. |
| session_note_enabled | true | Emit per-session learning notes alongside registry updates. |
| apply_to_prompts | true | Inject playbooks into persona prompts and validation payloads. |
| playbook_injection_mode | prefix | Playbook injection strategy: prefix (default) or suffix (KV cache optimization). |
Example configuration:
```yaml
learning:
  enabled: true
  update_enabled: true
  history_limit: 25
  session_note_enabled: false
  apply_to_prompts: true
  llm:
    provider: openai
    model: gpt-5-mini
    api_key_env: OPENAI_API_KEY
```
- orchestration.forced_mode (see atlas/config/models.py) locks the runtime into a specific lane – helpful when you want deterministic evaluation while toggling learning features.
- runtime_safety.review.require_approval should remain true in production so only reviewed sessions feed the synthesizer. Override via ATLAS_REVIEW_REQUIRE_APPROVAL=0 for local experiments.
- runtime_safety.drift guardrails help spot reward regressions after a playbook change (see the combined sketch below).
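Taken together, these levers sit alongside the learning block. The nesting below is an assumption based on the key paths above (the authoritative layout is in atlas/config/models.py), and the forced_mode value and drift flag name are only examples:

```yaml
# Assumed nesting; see atlas/config/models.py for the authoritative config models.
orchestration:
  forced_mode: paired            # example value: lock the runtime to one lane for deterministic evals
runtime_safety:
  review:
    require_approval: true       # keep true in production; override locally with ATLAS_REVIEW_REQUIRE_APPROVAL=0
  drift:
    enabled: true                # assumed flag name; surfaces reward regressions after a playbook change
```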
Operating the Learning System
- Seed the registry – run a curated set of tasks with learning.update_enabled=true to capture baseline playbooks. Export these as JSON for code review if needed.
- Stage changes – toggle apply_to_prompts=false to generate candidate playbooks without impacting prompts.
- Promote – flip apply_to_prompts back to true once the evaluation harness shows improved reward/uncertainty metrics.
- Monitor – use the CLI and dashboards to track playbook_hash changes, reward drift, and review approvals per learning key.
- Rollback – set update_enabled=false (to freeze) or enabled=false (to bypass entirely) if drift guardrails or reviewers flag an issue.
Tip: store exported playbooks (from the evaluation harness) alongside release notes so you can diff guidance between deployments.
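One lightweight way to capture such an export (the paths and learning key here are examples) is a plain psql dump of the registry row:

```bash
# Snapshot the current playbooks for a key as JSON so releases can be diffed.
psql "$DATABASE_URL" -At -c \
  "SELECT row_to_json(lr) FROM learning_registry lr WHERE learning_key = 'mcp-tool-learning';" \
  > results/learning/mcp-tool-learning-playbook.json
```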
Run the official learning-report harness from the SDK when you need the full telemetry diff:
```bash
cd ../atlas-sdk
python scripts/report_learning.py \
  --database-url postgresql://atlas:atlas@localhost:5433/atlas \
  --recent-window 10 \
  --baseline-window 50 \
  --limit 5 \
  --output-dir results/learning
```
For quick inline spot checks, you can also query atlas.training_data directly:
```bash
python - <<'PY'
from statistics import mean

from atlas.training_data import get_training_sessions

sessions = get_training_sessions(
    db_url="postgresql://atlas:atlas@localhost:5433/atlas",
    learning_key="mcp-tool-learning",
    status_filters=["succeeded"],
    limit=200,
)

# Compare the most recent 25 scored sessions against the remaining history
# (assumes get_training_sessions returns newer sessions first).
recent_scores = [s.session_reward["score"] for s in sessions[:25] if s.session_reward]
baseline_scores = [s.session_reward["score"] for s in sessions[25:] if s.session_reward]

if not recent_scores or not baseline_scores:
    raise SystemExit("Not enough scored sessions to compare windows.")

def pct_diff(a, b):
    return (a - b) / b * 100 if b else 0

print(f"Recent avg reward:   {mean(recent_scores):.3f}")
print(f"Baseline avg reward: {mean(baseline_scores):.3f}")
print(f"Delta: {pct_diff(mean(recent_scores), mean(baseline_scores)):.2f}%")
PY
```
Persist the JSON payloads under results/learning/ (for example results/learning/mcp-tool-learning.json) so you can diff windows in CI. Many teams wrap the snippet above in a small script that also counts review statuses and execution modes before pushing the summary to dashboards.
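A minimal version of that wrapper might look like the following; the review_status and execution_mode attribute names are assumptions, so check the objects returned by get_training_sessions before relying on them:

```python
# Tally review statuses and execution modes for a learning key before pushing
# the summary to dashboards. Attribute names below are assumptions.
from collections import Counter

from atlas.training_data import get_training_sessions

sessions = get_training_sessions(
    db_url="postgresql://atlas:atlas@localhost:5433/atlas",
    learning_key="mcp-tool-learning",
    limit=200,
)

review_counts = Counter(getattr(s, "review_status", "unknown") for s in sessions)
mode_counts = Counter(getattr(s, "execution_mode", "unknown") for s in sessions)

print("Review statuses:", dict(review_counts))
print("Execution modes:", dict(mode_counts))
```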
Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Playbook hash does not change after successful runs | learning.update_enabled disabled or drift guardrails prevented persistence | Re-enable updates and confirm review approvals; check logs for guardrail warnings. |
| Prompts missing playbook content | learning.apply_to_prompts=false or cache invalidation failed | Flip the flag to true, clear .atlas/cache/learning/*, rerun atlas run. |
| Evaluation harness shows empty history | learning_key absent in sessions | Ensure atlas env init / runtime config tags sessions with consistent metadata. |
| Synthesizer timeouts | LLM provider throttling | Configure learning.llm with a provider-specific timeout or reduce history_limit. |
| Learning updates lost on restart | Registry not persisting (fixed in atlas-sdk v0.2.5+) | Upgrade to latest SDK version. Earlier versions had a persistence bug where updates were only cached in memory. |
| Playbook reverts to older version | Cache staleness or registry race condition | Clear .atlas/cache/learning/* and verify learning_registry table shows latest updated_at timestamp. |
Learning Persistence Fix (v0.2.5+): Earlier versions of atlas-sdk had a bug where learning updates were cached in memory but not reliably persisted to the database, causing playbooks to revert on restart. If you’re experiencing this issue, upgrade to atlas-sdk v0.2.5 or later:

```bash
pip install --upgrade arc-atlas
```
After upgrading, verify persistence is working:

```bash
# Run a session that should trigger learning
atlas run --config your_config.yaml --task "test task"

# Restart the runtime (new Python process), then verify the playbook loads from the database
atlas run --config your_config.yaml --task "another task"

# Check that playbook_hash in telemetry matches the registry
psql $DATABASE_URL -c "SELECT learning_key, updated_at FROM learning_registry WHERE learning_key='your-key';"
```