Atlas runtime learning exists to make the teacher more effective between offline GRPO cycles. Instead of hard-coding prompt tweaks or waiting for the next training run, the SDK captures guidance from successful sessions, synthesizes a “playbook,” and reinjects that context on subsequent requests. This page explains how the learning pipeline works, how data persists in Postgres, and which configuration levers production teams can use to control it.
Why Runtime Learning Matters
- Faster feedback loops – learning runs on live telemetry, so student/teacher personas improve within hours instead of a full fine-tune.
- Traceable changes – every playbook has a hash, metadata, and storage lineage so you can audit who learned what and when.
- Safe by default – review gating, drift detection, and update toggles let you pause or roll back learning if behavior regresses.
Feedback Loop at a Glance
- Session executes – the dual-agent runtime completes a task and logs telemetry into sessions, trajectory_events, and reward tables.
- Reward judges score – the RIM ensemble produces reward, uncertainty, and escalation data.
- Learning synthesizer runs – after reward evaluation, Atlas calls the LearningSynthesizer to summarize salient guidance for student and teacher personas.
- Registry update – Database.upsert_learning_state writes the new playbook to the learning_registry table (keyed by learning_key).
- Playbook cached – on the next run for the same key, resolve_playbook retrieves and hashes the playbook, then injects it into persona prompts.
- Evaluation harness audits – engineering teams query atlas.training_data (see Verification & Tooling) to track reward deltas, mode shifts, and review status across learning keys.
Pipeline Components
Learning Synthesizer (atlas/learning/synthesizer.py)
The synthesizer is distinct from reward judges. It:
- runs after reward scoring so updates only occur on high-signal sessions,
- consumes trajectory summaries, reward stats, and recent history (bounded by history_limit),
- emits structured “student” and “teacher” guidance plus optional session notes,
- uses a dedicated LLM (configurable via learning.llm) to transform raw notes into concise playbooks.
When learning.update_enabled is false, the synthesizer skips persistence but can still write per-session learning notes for auditing. This is useful for A/B testing new prompts before rolling them out.
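The gating described above can be sketched in a few lines. This is a minimal illustration, not the SDK's implementation: InMemoryStore, summarize, and run_synthesis are hypothetical names, and the LLM call is replaced by a string join so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SynthesisResult:
    student_learning: str
    teacher_learning: str
    session_note: Optional[str] = None


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the Postgres registry and per-session notes."""
    registry: dict = field(default_factory=dict)
    session_notes: list = field(default_factory=list)


def summarize(history: list) -> SynthesisResult:
    """Placeholder for the synthesizer LLM call: it only concatenates notes."""
    notes = "; ".join(item["note"] for item in history)
    return SynthesisResult(
        student_learning=f"student: {notes}",
        teacher_learning=f"teacher: {notes}",
        session_note=f"summarized {len(history)} session(s)",
    )


def run_synthesis(store, learning_key, history, *, history_limit=10,
                  update_enabled=True, session_note_enabled=True):
    # Bound the context to the most recent sessions (history_limit).
    recent = history[-history_limit:] if history_limit > 0 else []
    result = summarize(recent)
    # Per-session notes can be written even when registry updates are off.
    if session_note_enabled and result.session_note:
        store.session_notes.append((learning_key, result.session_note))
    # Only overwrite the registry entry when updates are enabled.
    if update_enabled:
        store.registry[learning_key] = (
            result.student_learning,
            result.teacher_learning,
        )
    return result
```

Calling run_synthesis with update_enabled=False leaves store.registry untouched while still appending a session note, which mirrors the audit-only behavior described above.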
Playbook Resolver (atlas/learning/playbook.py)
resolve_playbook is invoked during persona construction. It handles:
- fetching the latest registry entry for the learning_key,
- trimming long sections to meet token budgets,
- caching the playbook on disk (per role) and computing a SHA256 hash,
- returning both content and metadata so prompts, validation payloads, and cache keys include the hash.
Set learning.apply_to_prompts=false to keep generating playbooks without injecting them into runtime prompts, a common pattern for staging and smoke tests. The hash still flows through telemetry so you can verify that the correct playbook would have been used.
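To make the resolver's responsibilities concrete, here is a rough sketch of the trim, hash, and cache steps. It is not the code in atlas/learning/playbook.py: resolve_playbook_sketch, the character-based budget (a stand-in for token counting), and the one-JSON-file-per-role cache layout are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def trim_to_budget(text: str, max_chars: int) -> str:
    """Crude stand-in for token budgeting: cap the playbook by characters."""
    return text if len(text) <= max_chars else text[:max_chars].rstrip()


def resolve_playbook_sketch(registry_entry: dict, role: str,
                            cache_dir: Path, max_chars: int = 2000) -> dict:
    # Pick the playbook body for this role ("student" or "teacher").
    body = trim_to_budget(registry_entry.get(f"{role}_learning") or "", max_chars)
    # Hash the trimmed content so prompts and telemetry can reference it.
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    # Cache one file per role on disk (hypothetical layout).
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{role}.json"
    cache_file.write_text(
        json.dumps({"hash": digest, "content": body}), encoding="utf-8"
    )
    # Return content plus metadata so callers can include the hash.
    return {
        "content": body,
        "hash": digest,
        "role": role,
        "cache_path": str(cache_file),
    }
```

Because the hash is computed over the trimmed content, two runs that inject the same text report the same hash, which is what makes the telemetry comparison in staging meaningful.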
Playbook Injection Modes
Atlas supports two playbook injection strategies optimized for different scenarios:
Prefix Mode (Default)
Best suited for:
- General guidance and behavioral patterns
- Task-agnostic learning that should influence all reasoning
- Compatibility with providers that don’t support advanced caching
Suffix Mode
Benefits:
- Provider KV cache efficiency: Anthropic and other providers cache the static system prompt prefix, avoiding recomputation when only the playbook changes
- Reduced latency: Cache hits eliminate prompt reprocessing overhead
- Cost savings: Cached tokens are not rebilled at the full input rate on subsequent requests
Use suffix mode (playbook_injection_mode: suffix) with compatible providers (Anthropic Claude and other providers that support prompt caching). The base system prompt becomes the cached prefix, and the playbook suffix can be updated without invalidating the cache.
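The difference between the two modes is easiest to see by comparing how much leading text two prompts share when only the playbook changes. The sketch below assumes prefix mode places the playbook ahead of the base system prompt and suffix mode places it after; build_system_prompt and shared_prefix_len are hypothetical helpers, not SDK functions.

```python
import os


def build_system_prompt(base_prompt: str, playbook: str, mode: str = "prefix") -> str:
    """Combine the static base prompt with the playbook for the chosen mode."""
    if not playbook:
        return base_prompt
    if mode == "prefix":
        # Assumed layout: playbook first, then the base prompt.
        return f"{playbook}\n\n{base_prompt}"
    if mode == "suffix":
        # Base prompt first, so it stays a stable prefix across playbook updates.
        return f"{base_prompt}\n\n{playbook}"
    raise ValueError(f"unknown injection mode: {mode!r}")


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the leading text two prompts share (a rough proxy for cache reuse)."""
    return len(os.path.commonprefix([a, b]))
```

With suffix mode, two prompts built from different playbooks share at least the full base prompt as a common prefix, which is the portion a provider-side prompt cache can reuse. With the assumed prefix layout, a playbook change alters the very start of the prompt.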
Learning Registry (atlas/runtime/storage/schema.sql)
learning_registry keeps the current playbooks for each learning_key. The table stores a single row per key with:
- learning_key – primary identifier (task or project scope).
- student_learning / teacher_learning – latest playbook bodies (text).
- metadata – optional JSON payload (e.g., synthesizer audit info, hashes you compute upstream).
- updated_at – timestamp of the most recent update.
Per-session learning notes are kept in sessions.student_learning and sessions.teacher_learning, giving you a time-series view even as the registry is overwritten with fresher guidance.
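The single-row-per-key behavior amounts to an upsert. The sketch below demonstrates it with Python's built-in sqlite3 module as a stand-in for Postgres so it runs without a database server. The column types, the SCHEMA text, and the upsert_sketch helper are assumptions for illustration and are not copied from atlas/runtime/storage/schema.sql or Database.upsert_learning_state.

```python
import json
import sqlite3

# Assumed shape of the table, based only on the columns listed above.
SCHEMA = """
CREATE TABLE learning_registry (
    learning_key TEXT PRIMARY KEY,
    student_learning TEXT,
    teacher_learning TEXT,
    metadata TEXT,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""

# Insert a new row, or overwrite the existing row for the same learning_key.
UPSERT = """
INSERT INTO learning_registry
    (learning_key, student_learning, teacher_learning, metadata)
VALUES (?, ?, ?, ?)
ON CONFLICT(learning_key) DO UPDATE SET
    student_learning = excluded.student_learning,
    teacher_learning = excluded.teacher_learning,
    metadata = excluded.metadata,
    updated_at = CURRENT_TIMESTAMP
"""


def upsert_sketch(conn, learning_key, student, teacher, metadata=None):
    """Write the latest playbooks for a key, replacing any previous row."""
    payload = json.dumps(metadata) if metadata is not None else None
    conn.execute(UPSERT, (learning_key, student, teacher, payload))
    conn.commit()
```

Running upsert_sketch twice with the same learning_key leaves exactly one row holding the newer playbooks, which is why historical guidance has to be read from the per-session columns rather than from the registry.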
Discovery Telemetry
Autodiscovery (atlas env init) persists complementary context in discovery_runs. Each record stores module hashes,
autogenerated factory metadata, and preflight results. Learning summaries link back to matching discovery runs so you
can reproduce the environment that produced a given playbook.
Persistence Topology
- discovery_runs captures onboarding metadata and is referenced in learning reports.
- sessions records adaptive summaries, reward stats, review status, and session-specific learning notes.
- trajectory_events stores per-step evidence (event_type, actor, payload digests).
- learning_registry contains the most recent playbook per key/role.
Use learning_key to join these tables. The learning evaluation harness (below) already performs that join.
Configuration Reference
Add a learning block to your runtime config to control behavior:
| Parameter | Default | Purpose |
|---|---|---|
| enabled | true | Master switch. Set false to run without playbooks or registry updates. |
| update_enabled | true | Allow the synthesizer to write updated playbooks. Disable for read-only playbook usage. |
| llm | null | Override synthesizer model; defaults to the runtime’s standard LLM if omitted. |
| prompts | null | Custom prompt templates for the synthesizer LLM. |
| history_limit | 10 | Max historical sessions considered when generating an update. |
| session_note_enabled | true | Emit per-session learning notes alongside registry updates. |
| apply_to_prompts | true | Inject playbooks into persona prompts and validation payloads. |
| playbook_injection_mode | prefix | Playbook injection strategy: prefix (default) or suffix (KV cache optimization). |
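As a reading aid, the defaults in the table can be mirrored in a small dataclass. LearningSettings and effective_behavior are hypothetical names, not the SDK's config model in atlas/config/models.py, and the types chosen for llm and prompts are guesses. The sketch only encodes what the table states.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class LearningSettings:
    """Mirror of the learning block defaults listed in the table above."""
    enabled: bool = True
    update_enabled: bool = True
    llm: Optional[Dict[str, Any]] = None       # assumed shape
    prompts: Optional[Dict[str, str]] = None   # assumed shape
    history_limit: int = 10
    session_note_enabled: bool = True
    apply_to_prompts: bool = True
    playbook_injection_mode: str = "prefix"

    def __post_init__(self) -> None:
        if self.playbook_injection_mode not in ("prefix", "suffix"):
            raise ValueError("playbook_injection_mode must be 'prefix' or 'suffix'")
        if self.history_limit < 0:
            raise ValueError("history_limit must be non-negative")


def effective_behavior(settings: LearningSettings) -> Dict[str, bool]:
    """Combine the master switch with the per-feature flags."""
    return {
        # Registry writes need both the master switch and update_enabled.
        "writes_registry": settings.enabled and settings.update_enabled,
        # Prompt injection needs both the master switch and apply_to_prompts.
        "injects_playbook": settings.enabled and settings.apply_to_prompts,
    }
```

For example, LearningSettings(apply_to_prompts=False) still writes to the registry but does not inject playbooks, while LearningSettings(enabled=False) does neither.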
Related Controls
- orchestration.forced_mode (see atlas/config/models.py) locks the runtime into a specific lane, which is helpful when you want deterministic evaluation while toggling learning features.
- runtime_safety.review.require_approval should remain true in production so only reviewed sessions feed the synthesizer. Override via ATLAS_REVIEW_REQUIRE_APPROVAL=0 for local experiments.
- runtime_safety.drift guardrails help spot reward regressions after a playbook change.
Operating the Learning System
- Seed the registry – run a curated set of tasks with learning.update_enabled=true to capture baseline playbooks. Export these as JSON for code review if needed.
- Stage changes – toggle apply_to_prompts=false to generate candidate playbooks without impacting prompts.
- Promote – flip apply_to_prompts back to true once the evaluation harness shows improved reward/uncertainty metrics.
- Monitor – use the CLI and dashboards to track playbook_hash changes, reward drift, and review approvals per learning key.
- Rollback – set update_enabled=false (to freeze) or enabled=false (to bypass entirely) if drift guardrails or reviewers flag an issue.
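The flag changes in the steps above can be collected into one lookup so each phase is an explicit, reviewable override. This is only a sketch: the phase names, PHASE_OVERRIDES, and learning_block_for are hypothetical, and the resulting dictionary stands in for the learning block of your runtime config. Monitoring has no entry because it changes no flags.

```python
from typing import Dict, Optional

# Defaults taken from the configuration table for the three relevant flags.
BASE_LEARNING_BLOCK: Dict[str, bool] = {
    "enabled": True,
    "update_enabled": True,
    "apply_to_prompts": True,
}

# One override set per operational phase described above.
PHASE_OVERRIDES: Dict[str, Dict[str, bool]] = {
    "seed": {"update_enabled": True},
    "stage": {"apply_to_prompts": False},
    "promote": {"apply_to_prompts": True},
    "freeze": {"update_enabled": False},   # rollback: stop new registry writes
    "bypass": {"enabled": False},          # rollback: skip learning entirely
}


def learning_block_for(phase: str,
                       base: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Return the learning flags for a phase without mutating the base block."""
    if phase not in PHASE_OVERRIDES:
        raise ValueError(f"unknown phase: {phase!r}")
    block = dict(BASE_LEARNING_BLOCK if base is None else base)
    block.update(PHASE_OVERRIDES[phase])
    return block
```

Keeping the overrides in one place makes a rollback a one-word change (for example, from "promote" to "freeze") rather than an ad hoc edit to several flags.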
Verification & Tooling
Run the official learning-report harness from the SDK when you need the full telemetry diff, or query atlas.training_data directly.
Reports are written to results/learning/ (for example results/learning/mcp-tool-learning.json) so you can diff windows in CI. Many teams wrap the atlas.training_data query in a small script that also counts review statuses and execution modes before pushing the summary to dashboards.
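A wrapper script of that kind might look like the sketch below. It deliberately does not call atlas.training_data. Instead it assumes you have already loaded sessions into plain dictionaries with learning_key, reward, review_status, execution_mode, and created_at fields. Those field names, the sample labels, and summarize_by_learning_key are assumptions for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List


def summarize_by_learning_key(sessions: List[dict], window: int = 2) -> Dict[str, dict]:
    """Group sessions by learning_key and compare early vs. recent rewards."""
    grouped = defaultdict(list)
    # Sort by timestamp so "baseline" and "latest" windows are well defined.
    for session in sorted(sessions, key=lambda s: s["created_at"]):
        grouped[session["learning_key"]].append(session)

    summary = {}
    for key, rows in grouped.items():
        rewards = [row["reward"] for row in rows]
        baseline, latest = rewards[:window], rewards[-window:]
        summary[key] = {
            "sessions": len(rows),
            # Positive delta means recent sessions score higher than early ones.
            "reward_delta": mean(latest) - mean(baseline),
            "review_status": dict(Counter(r["review_status"] for r in rows)),
            "execution_mode": dict(Counter(r["execution_mode"] for r in rows)),
        }
    return summary
```

The returned dictionary can be serialized with json.dumps and compared between windows in CI, in the same spirit as the reports under results/learning/.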
Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Playbook hash does not change after successful runs | learning.update_enabled disabled or drift guardrails prevented persistence | Re-enable updates and confirm review approvals; check logs for guardrail warnings. |
| Prompts missing playbook content | learning.apply_to_prompts=false or cache invalidation failed | Flip the flag to true, clear .atlas/cache/learning/*, rerun atlas run. |
| Evaluation harness shows empty history | learning_key absent in sessions | Ensure atlas env init / runtime config tags sessions with consistent metadata. |
| Synthesizer timeouts | LLM provider throttling | Configure learning.llm with a provider-specific timeout or reduce history_limit. |
| Learning updates lost on restart | Registry not persisting (fixed in atlas-sdk v0.2.5+) | Upgrade to latest SDK version. Earlier versions had a persistence bug where updates were only cached in memory. |
| Playbook reverts to older version | Cache staleness or registry race condition | Clear .atlas/cache/learning/* and verify learning_registry table shows latest updated_at timestamp. |
Learning Persistence Fix (v0.2.5+): Earlier versions of atlas-sdk had a bug where learning updates were cached in memory but not reliably persisted to the database, which caused playbooks to revert on restart. If you are experiencing this issue, upgrade to atlas-sdk v0.2.5 or later. After upgrading, verify that persistence is working: the learning_registry row for your learning_key should show a fresh updated_at after a run and keep that value across a restart.
Related Guides
- SDK Configuration – full YAML reference, including the learning block.
- Atlas CLI Reference – commands for discovery, execution, and review gating.
- Export Runtime Traces – move approved sessions into training datasets.
- Hybrid Learning Concept – theoretical background for offline + runtime learning.