Atlas runtime learning exists to make the teacher more effective between offline GRPO cycles. Instead of hard-coding
prompt tweaks or waiting for the next training run, the SDK captures guidance from successful sessions, synthesizes a
“playbook,” and reinjects that context on subsequent requests. This page explains how the learning pipeline works,
how data persists in Postgres, and which configuration levers production teams can use to control it.
Why Runtime Learning Matters
- Faster feedback loops – learning runs on live telemetry, so student/teacher personas improve within hours instead of waiting for the next fine-tune.
- Traceable changes – every playbook has a hash, metadata, and storage lineage so you can audit who learned what and when.
- Safe by default – review gating, drift detection, and update toggles let you pause or roll back learning if behavior regresses.
Feedback Loop at a Glance
- Session executes – the dual-agent runtime completes a task and logs telemetry into sessions, trajectory_events, and reward tables.
- Reward judges score – the RIM ensemble produces reward, uncertainty, and escalation data.
- Learning synthesizer runs – after reward evaluation, Atlas calls the LearningSynthesizer to summarize salient guidance for student and teacher personas.
- Registry update – Database.upsert_learning_state writes the new playbook to the learning_registry table (keyed by learning_key).
- Playbook cached – on the next run for the same key, resolve_playbook retrieves and hashes the playbook, then injects it into persona prompts.
- Evaluation harness audits – engineering teams query atlas.training_data (see the snippet below) to track reward deltas, mode shifts, and review status across learning keys.
Pipeline Components
Learning Synthesizer (atlas/learning/synthesizer.py)
The synthesizer is distinct from reward judges. It:
- runs after reward scoring so updates only occur on high-signal sessions,
- consumes trajectory summaries, reward stats, and recent history (bounded by history_limit),
- emits structured “student” and “teacher” guidance plus optional session notes,
- uses a dedicated LLM (configurable via learning.llm) to transform raw notes into concise playbooks.
When learning.update_enabled is false, the synthesizer skips persistence but can still write per-session learning
notes for auditing. This is useful for A/B testing new prompts before rolling them out.
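To make the output concrete, here is a rough sketch of the guidance structure the synthesizer produces. The actual classes and field names live in atlas/learning/synthesizer.py and may differ, so treat this as illustrative only:

```python
# Illustrative data shape only; the real implementation and field names are defined
# in atlas/learning/synthesizer.py and may differ.
from dataclasses import dataclass, field


@dataclass
class LearningUpdate:
    learning_key: str                      # scope the guidance applies to
    student_learning: str                  # playbook injected into the student persona
    teacher_learning: str                  # playbook injected into the teacher persona
    session_note: str | None = None        # per-session audit note (session_note_enabled)
    metadata: dict = field(default_factory=dict)  # e.g. reward stats, history window used


# With learning.update_enabled=false the synthesizer still produces an update like this
# for auditing, but never writes it to learning_registry.
```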
Playbook Resolver (atlas/learning/playbook.py)
resolve_playbook is invoked during persona construction. It handles:
- fetching the latest registry entry for the learning_key,
- trimming long sections to meet token budgets,
- caching the playbook on disk (per role) and computing a SHA256 hash,
- returning both content and metadata so prompts, validation payloads, and cache keys include the hash.
Set learning.apply_to_prompts=false to keep generating playbooks without injecting them into runtime prompts—a common
pattern for staging and smoke tests. The hash still flows through telemetry so you can verify the correct playbook would
have been used.
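Conceptually, the resolver’s trimming and hashing behave like the sketch below; the helper names here are hypothetical stand-ins, not the SDK’s actual functions:

```python
# Hypothetical stand-ins that mirror what resolve_playbook does conceptually:
# trim each playbook section to a budget and derive a stable SHA-256 fingerprint
# that can flow through prompts, validation payloads, and cache keys.
import hashlib


def trim_to_budget(text: str, max_chars: int = 4000) -> str:
    """Naive character-based stand-in for the SDK's token-budget trimming."""
    return text if len(text) <= max_chars else text[:max_chars]


def playbook_fingerprint(student: str, teacher: str) -> str:
    """Stable hash over both playbook bodies, usable as a telemetry tag."""
    digest = hashlib.sha256()
    digest.update(student.encode("utf-8"))
    digest.update(teacher.encode("utf-8"))
    return digest.hexdigest()
```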
Playbook Injection Modes
Atlas supports two playbook injection strategies optimized for different scenarios:
Prefix Mode (Default)
```yaml
learning:
  injection_mode: prefix  # default
```
Playbooks are injected at the beginning of the system prompt. Best for:
- General guidance and behavioral patterns
- Task-agnostic learning that should influence all reasoning
- Compatibility with providers that don’t support advanced caching
Suffix Mode (KV Cache Optimization)
```yaml
learning:
  injection_mode: suffix
```
Playbooks are appended after the base system prompt. Optimized for:
- Provider KV cache efficiency: Anthropic and other providers cache the static system prompt prefix, avoiding recomputation when only the playbook changes
- Reduced latency: Cache hits eliminate prompt reprocessing overhead
- Cost savings: Cached tokens aren’t rebilled on subsequent requests
The runtime automatically computes cache breakpoints when injection_mode: suffix is used with a compatible provider (Anthropic Claude and other providers that support prompt caching). The base system prompt becomes the cached prefix, and the playbook suffix can be updated without invalidating the cache.
When to use suffix mode: Enable injection_mode: suffix for production workloads with frequent playbook updates and providers supporting KV cache (e.g., Anthropic Claude). This can reduce prompt processing time by 80%+ when the base prompt remains stable.
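For intuition, the raw Anthropic request below shows how a static system prefix and a changing playbook suffix line up with prompt caching. Atlas computes the breakpoint for you, so this is only a sketch of the mechanism; the model name and prompt strings are examples:

```python
# Sketch of suffix-mode injection against Anthropic prompt caching.
# The cache_control block marks the end of the cached prefix (the stable base
# system prompt); the playbook suffix follows it, so updating the playbook does
# not invalidate the cache. Atlas performs this split automatically.
import anthropic

BASE_SYSTEM_PROMPT = "You are the Atlas teacher persona..."   # stable across requests
playbook_suffix = "Playbook: prefer small, verifiable steps."  # changes per registry update

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # example model; any prompt-caching-capable Claude works
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": BASE_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # everything up to here is cached
        },
        {"type": "text", "text": playbook_suffix},
    ],
    messages=[{"role": "user", "content": "Summarize the last session."}],
)
# Note: providers enforce a minimum cacheable prefix length, so real base prompts
# are much longer than this toy example before cache hits appear in response.usage.
```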
Learning Registry (atlas/runtime/storage/schema.sql)
learning_registry keeps the current playbooks for each learning_key. The table stores a single row per key with:
- learning_key – primary identifier (task or project scope).
- student_learning / teacher_learning – latest playbook bodies (text).
- metadata – optional JSON payload (e.g., synthesizer audit info, hashes you compute upstream).
- updated_at – timestamp of the most recent update.
Because both roles live in the same row, updates atomically replace the student and teacher playbooks together. Historical
snapshots remain accessible via sessions.student_learning and sessions.teacher_learning, giving you a time-series view
even as the registry is overwritten with fresher guidance.
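The SDK performs the write through Database.upsert_learning_state; the equivalent SQL looks roughly like the statement below, assuming the column names described above (check atlas/runtime/storage/schema.sql for the authoritative definition):

```sql
-- Illustrative upsert only; the runtime issues this via Database.upsert_learning_state.
INSERT INTO learning_registry (learning_key, student_learning, teacher_learning, metadata, updated_at)
VALUES ('mcp-tool-learning', '<student playbook text>', '<teacher playbook text>',
        '{"source": "manual-seed"}'::jsonb, now())
ON CONFLICT (learning_key) DO UPDATE
SET student_learning = EXCLUDED.student_learning,
    teacher_learning = EXCLUDED.teacher_learning,
    metadata         = EXCLUDED.metadata,
    updated_at       = EXCLUDED.updated_at;
```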
Discovery Telemetry
Autodiscovery (atlas env init) persists complementary context in discovery_runs. Each record stores module hashes,
autogenerated factory metadata, and preflight results. Learning summaries link back to matching discovery runs so you
can reproduce the environment that produced a given playbook.
Persistence Topology
```
discovery_runs
      │
      ▼
learning_registry ◄── learning synthesizer (updates)
      │
      ▼
   sessions ───► trajectory_events
      │                  ▲
      └── reward_stats ──┘
```
- discovery_runs captures onboarding metadata and is referenced in learning reports.
- sessions records adaptive summaries, reward stats, review status, and session-specific learning notes.
- trajectory_events stores per-step evidence (event_type, actor, payload digests).
- learning_registry contains the most recent student and teacher playbooks for each key.
Use learning_key to join these tables. The learning evaluation harness (below) already performs that join.
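If you want a quick manual check before reaching for the harness, a join like the one below works; column names beyond those described above (for example sessions.learning_key and sessions.created_at) are assumptions, so adjust them to the actual schema:

```sql
-- Spot-check recent sessions against the active playbook for one learning key.
SELECT s.id,
       s.created_at,
       s.review_status,
       lr.updated_at AS playbook_updated_at
FROM sessions AS s
JOIN learning_registry AS lr ON lr.learning_key = s.learning_key
WHERE s.learning_key = 'mcp-tool-learning'
ORDER BY s.created_at DESC
LIMIT 20;
```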
Configuration Reference
Add a learning block to your runtime config to control behavior:
| Parameter | Default | Purpose |
|---|---|---|
| enabled | true | Master switch. Set false to run without playbooks or registry updates. |
| update_enabled | true | Allow the synthesizer to write updated playbooks. Disable for read-only playbook usage. |
| llm | null | Override the synthesizer model; defaults to the runtime’s standard LLM if omitted. |
| prompts | null | Custom prompt templates for the synthesizer LLM. |
| history_limit | 10 | Max historical sessions considered when generating an update. |
| session_note_enabled | true | Emit per-session learning notes alongside registry updates. |
| apply_to_prompts | true | Inject playbooks into persona prompts and validation payloads. |
| playbook_injection_mode | prefix | Playbook injection strategy: prefix (default) or suffix (KV cache optimization). |
Example configuration:
```yaml
learning:
  enabled: true
  update_enabled: true
  history_limit: 25
  session_note_enabled: false
  apply_to_prompts: true
  llm:
    provider: openai
    model: gpt-5-mini
    api_key_env: OPENAI_API_KEY
```
- orchestration.forced_mode (see atlas/config/models.py) locks the runtime into a specific lane – helpful when you want deterministic evaluation while toggling learning features.
- runtime_safety.review.require_approval should remain true in production so only reviewed sessions feed the synthesizer. Override via ATLAS_REVIEW_REQUIRE_APPROVAL=0 for local experiments.
- runtime_safety.drift guardrails help spot reward regressions after a playbook change (see the combined sketch below).
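Taken together, these levers sit alongside the learning block. The nesting below is an assumption based on the key paths above (the authoritative layout is in atlas/config/models.py), and the forced_mode value and drift flag name are only examples:

```yaml
# Assumed nesting; see atlas/config/models.py for the authoritative config models.
orchestration:
  forced_mode: paired            # example value: lock the runtime to one lane for deterministic evals
runtime_safety:
  review:
    require_approval: true       # keep true in production; override locally with ATLAS_REVIEW_REQUIRE_APPROVAL=0
  drift:
    enabled: true                # assumed flag name; surfaces reward regressions after a playbook change
```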
Operating the Learning System
- Seed the registry – run a curated set of tasks with learning.update_enabled=true to capture baseline playbooks. Export these as JSON for code review if needed.
- Stage changes – toggle apply_to_prompts=false to generate candidate playbooks without impacting prompts.
- Promote – flip apply_to_prompts back to true once the evaluation harness shows improved reward/uncertainty metrics.
- Monitor – use the CLI and dashboards to track playbook_hash changes, reward drift, and review approvals per learning key.
- Rollback – set update_enabled=false (to freeze) or enabled=false (to bypass entirely) if drift guardrails or reviewers flag an issue.
Tip: store exported playbooks (from the evaluation harness) alongside release notes so you can diff guidance between deployments.
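One lightweight way to capture such an export (the paths and learning key here are examples) is a plain psql dump of the registry row:

```bash
# Snapshot the current playbooks for a key as JSON so releases can be diffed.
psql "$DATABASE_URL" -At -c \
  "SELECT row_to_json(lr) FROM learning_registry lr WHERE learning_key = 'mcp-tool-learning';" \
  > results/learning/mcp-tool-learning-playbook.json
```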
Run the official learning-report harness from the SDK when you need the full telemetry diff:
```bash
cd ../atlas-sdk
python scripts/report_learning.py \
  --database-url postgresql://atlas:atlas@localhost:5433/atlas \
  --recent-window 10 \
  --baseline-window 50 \
  --limit 5 \
  --output-dir results/learning
```
For quick inline spot checks, you can also query atlas.training_data directly:
```bash
python - <<'PY'
from statistics import mean

from atlas.training_data import get_training_sessions

sessions = get_training_sessions(
    db_url="postgresql://atlas:atlas@localhost:5433/atlas",
    learning_key="mcp-tool-learning",
    status_filters=["succeeded"],
    limit=200,
)

# Compare the most recent 25 scored sessions against the remaining history
# (assumes get_training_sessions returns newer sessions first).
recent_scores = [s.session_reward["score"] for s in sessions[:25] if s.session_reward]
baseline_scores = [s.session_reward["score"] for s in sessions[25:] if s.session_reward]

if not recent_scores or not baseline_scores:
    raise SystemExit("Not enough scored sessions to compare windows.")

def pct_diff(a, b):
    return (a - b) / b * 100 if b else 0

print(f"Recent avg reward:   {mean(recent_scores):.3f}")
print(f"Baseline avg reward: {mean(baseline_scores):.3f}")
print(f"Delta: {pct_diff(mean(recent_scores), mean(baseline_scores)):.2f}%")
PY
```
Persist the JSON payloads under results/learning/ (for example results/learning/mcp-tool-learning.json) so you can diff windows in CI. Many teams wrap the snippet above in a small script that also counts review statuses and execution modes before pushing the summary to dashboards.
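A minimal version of that wrapper might look like the following; the review_status and execution_mode attribute names are assumptions, so check the objects returned by get_training_sessions before relying on them:

```python
# Tally review statuses and execution modes for a learning key before pushing
# the summary to dashboards. Attribute names below are assumptions.
from collections import Counter

from atlas.training_data import get_training_sessions

sessions = get_training_sessions(
    db_url="postgresql://atlas:atlas@localhost:5433/atlas",
    learning_key="mcp-tool-learning",
    limit=200,
)

review_counts = Counter(getattr(s, "review_status", "unknown") for s in sessions)
mode_counts = Counter(getattr(s, "execution_mode", "unknown") for s in sessions)

print("Review statuses:", dict(review_counts))
print("Execution modes:", dict(mode_counts))
```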
Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Playbook hash does not change after successful runs | learning.update_enabled disabled or drift guardrails prevented persistence | Re-enable updates and confirm review approvals; check logs for guardrail warnings. |
| Prompts missing playbook content | learning.apply_to_prompts=false or cache invalidation failed | Flip the flag to true, clear .atlas/cache/learning/*, rerun atlas run. |
| Evaluation harness shows empty history | learning_key absent in sessions | Ensure atlas env init / runtime config tags sessions with consistent metadata. |
| Synthesizer timeouts | LLM provider throttling | Configure learning.llm with a provider-specific timeout or reduce history_limit. |
| Learning updates lost on restart | Registry not persisting (fixed in atlas-sdk v0.2.5+) | Upgrade to latest SDK version. Earlier versions had a persistence bug where updates were only cached in memory. |
| Playbook reverts to older version | Cache staleness or registry race condition | Clear .atlas/cache/learning/* and verify learning_registry table shows latest updated_at timestamp. |
Learning Persistence Fix (v0.2.5+): Earlier versions of atlas-sdk had a bug where learning updates were cached in memory but not reliably persisted to the database, causing playbooks to revert on restart. If you’re experiencing this issue, upgrade to atlas-sdk v0.2.5 or later:

```bash
pip install --upgrade arc-atlas
```
After upgrading, verify persistence is working:

```bash
# Run a session that should trigger learning
atlas run --config your_config.yaml --task "test task"

# Restart the runtime (new Python process), then verify the playbook loads from the database
atlas run --config your_config.yaml --task "another task"

# Check that playbook_hash in telemetry matches the registry
psql $DATABASE_URL -c "SELECT learning_key, updated_at FROM learning_registry WHERE learning_key='your-key';"
```