Why Runtime Learning Matters
- Faster feedback loops – learning runs on live telemetry, so student/teacher personas improve within hours instead of waiting for a full fine-tune.
- Traceable changes – every playbook has a hash, metadata, and storage lineage so you can audit who learned what and when.
- Safe by default – review gating, drift detection, and update toggles let you pause or roll back learning if behavior regresses.
Feedback Loop at a Glance
- Session executes – the dual-agent runtime completes a task and logs telemetry into `sessions`, `trajectory_events`, and reward tables.
- Reward judges score – the RIM ensemble produces reward, uncertainty, and escalation data.
- Learning synthesizer runs – after reward evaluation, Atlas calls the `LearningSynthesizer` to summarize salient guidance for student and teacher personas.
- Registry update – `Database.upsert_learning_state` writes the new pamphlet to the `learning_registry` table (keyed by `learning_key`).
- Playbook cached – on the next run for the same key, `resolve_playbook` retrieves and hashes the pamphlet, then injects it into persona prompts.
- Evaluation harness audits – engineering teams run `scripts/eval_learning.py` to track reward deltas, mode shifts, and review status across learning keys.
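The handoff between these stages looks roughly like the sketch below. The module paths and names (`LearningSynthesizer`, `resolve_playbook`, `upsert_learning_state`) come from this page, but the call signatures are assumptions, not the actual Atlas APIs:

```python
from atlas.learning.synthesizer import LearningSynthesizer
from atlas.learning.playbook import resolve_playbook

def run_learning_cycle(db, session, learning_key):
    # Steps 1-2: telemetry and reward scores are already persisted by the
    # runtime (sessions, trajectory_events, reward tables) before this runs.
    rewards = db.fetch_rewards(session.id)           # assumed helper

    # Step 3: summarize salient guidance for both personas.
    pamphlet = LearningSynthesizer().synthesize(session, rewards)

    # Step 4: write the pamphlet to learning_registry, keyed by learning_key.
    db.upsert_learning_state(learning_key, pamphlet)

    # Step 5: on the next run, the resolver hashes and injects the pamphlet.
    return resolve_playbook(learning_key)
```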
Pipeline Components
Learning Synthesizer (`atlas/learning/synthesizer.py`)
The synthesizer is distinct from reward judges. It:
- runs after reward scoring so updates only occur on high-signal sessions,
- consumes trajectory summaries, reward stats, and recent history (bounded by `history_limit`),
- emits structured “student” and “teacher” guidance plus optional session notes,
- uses a dedicated LLM (configurable via `learning.llm`) to transform raw notes into concise playbooks.
When `learning.update_enabled` is false, the synthesizer skips persistence but can still write per-session learning
notes for auditing. This is useful for A/B testing new prompts before rolling them out.
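A rough sketch of that gate, with flag names taken from the configuration table below but an assumed control flow and config shape:

```python
def persist_learning(config, db, learning_key, pamphlet, session_note):
    """Illustrative persistence gate; not the actual synthesizer source."""
    if config.learning.session_note_enabled:
        db.write_session_note(session_note)   # assumed helper: audit-only trail
    if not config.learning.update_enabled:
        return False                          # A/B mode: notes only, no registry write
    db.upsert_learning_state(learning_key, pamphlet)
    return True
```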
Playbook Resolver (`atlas/learning/playbook.py`)
`resolve_playbook` is invoked during persona construction. It handles:
- fetching the latest registry entry for the `learning_key`,
- trimming long sections to meet token budgets,
- caching the playbook on disk (per role) and computing a SHA-256 hash,
- returning both content and metadata so prompts, validation payloads, and cache keys include the hash.
Set `learning.apply_to_prompts=false` to keep generating pamphlets without injecting them into runtime prompts, a common
pattern for staging and smoke tests. The hash still flows through telemetry so you can verify the correct playbook would
have been used.
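A condensed sketch of those steps, assuming a dict-shaped registry row and a crude character budget in place of real token trimming:

```python
import hashlib
from pathlib import Path

def resolve_playbook_sketch(db, learning_key, role, budget=4000):
    """Simplified stand-in for atlas/learning/playbook.py; illustration only."""
    row = db.get_learning_state(learning_key)               # assumed accessor
    content = (row.get(f"{role}_learning") or "")[:budget]  # naive budget trim

    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()

    # Per-role on-disk cache, mirroring the .atlas/cache/learning layout.
    cache = Path(".atlas/cache/learning") / role / f"{learning_key}.md"
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(content, encoding="utf-8")

    # Return content plus metadata so prompts, validation payloads, and cache
    # keys all carry the same hash.
    return {"content": content, "hash": digest, "path": str(cache)}
```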
Learning Registry (`atlas/runtime/storage/schema.sql`)
`learning_registry` keeps the current pamphlets for each `learning_key`. The table stores a single row per key with:
- `learning_key` – primary identifier (task or project scope).
- `student_learning` / `teacher_learning` – latest pamphlet bodies (text).
- `metadata` – optional JSON payload (e.g., synthesizer audit info, hashes you compute upstream).
- `updated_at` – timestamp of the most recent update.
Historical pamphlets are also snapshotted into `sessions.student_learning` and `sessions.teacher_learning`, giving you a time-series view
even as the registry is overwritten with fresher guidance.
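Assuming the default SQLite backend (the database path and learning key below are hypothetical), you can inspect the current row for a key directly:

```python
import sqlite3

conn = sqlite3.connect("atlas.db")  # hypothetical path to the runtime database
row = conn.execute(
    "SELECT student_learning, teacher_learning, metadata, updated_at "
    "FROM learning_registry WHERE learning_key = ?",
    ("checkout-service",),          # hypothetical learning_key
).fetchone()
print(row)
```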
Discovery Telemetry
Autodiscovery (`atlas env init`) persists complementary context in `discovery_runs`. Each record stores module hashes,
autogenerated factory metadata, and preflight results. Learning summaries link back to matching discovery runs so you
can reproduce the environment that produced a given playbook.
Persistence Topology
- `discovery_runs` captures onboarding metadata and is referenced in learning reports.
- `sessions` records adaptive summaries, reward stats, review status, and session-specific learning notes.
- `trajectory_events` stores per-step evidence (`event_type`, actor, payload digests).
- `learning_registry` contains the most recent playbook per key/role.
Use the `learning_key` to join these tables. The learning evaluation harness (below) already performs that join.
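A minimal manual version of that join, again assuming SQLite; the `sessions` column names are inferred from the topology above and may differ in your schema:

```python
import sqlite3

conn = sqlite3.connect("atlas.db")  # hypothetical path
rows = conn.execute(
    """
    SELECT s.id, s.review_status, s.student_learning, r.updated_at
    FROM sessions AS s
    JOIN learning_registry AS r ON r.learning_key = s.learning_key
    WHERE s.learning_key = ?
    """,
    ("checkout-service",),          # hypothetical learning_key
).fetchall()
```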
Configuration Reference
Add a `learning` block to your runtime config to control behavior:
| Parameter | Default | Purpose |
|---|---|---|
| `enabled` | `true` | Master switch. Set `false` to run without playbooks or registry updates. |
| `update_enabled` | `true` | Allow the synthesizer to write updated pamphlets. Disable for read-only playbook usage. |
| `llm` | `null` | Override the synthesizer model; defaults to the runtime’s standard LLM if omitted. |
| `prompts` | `null` | Custom prompt templates for the synthesizer LLM. |
| `history_limit` | `10` | Max historical sessions considered when generating an update. |
| `session_note_enabled` | `true` | Emit per-session learning notes alongside registry updates. |
| `apply_to_prompts` | `true` | Inject playbooks into persona prompts and validation payloads. |
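Spelled out as YAML with the defaults shown (illustrative; see the SDK Configuration guide for the authoritative schema):

```yaml
learning:
  enabled: true              # master switch for playbooks and registry updates
  update_enabled: true       # false freezes pamphlets (read-only playbooks)
  llm: null                  # falls back to the runtime's standard LLM
  prompts: null              # custom synthesizer prompt templates
  history_limit: 10          # sessions considered per update
  session_note_enabled: true # per-session learning notes
  apply_to_prompts: true     # false stages pamphlets without injecting them
```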
Related Controls
- `orchestration.forced_mode` (see `atlas/config/models.py`) locks the runtime into a specific lane, which is helpful when you want deterministic evaluation while toggling learning features.
- `runtime_safety.review.require_approval` should remain `true` in production so only reviewed sessions feed the synthesizer. Override via `ATLAS_REVIEW_REQUIRE_APPROVAL=0` for local experiments (see the example below).
- `runtime_safety.drift` guardrails help spot reward regressions after a playbook change.
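For local experiments only, the override can be applied per invocation (the bare `atlas run` invocation is illustrative; add your usual flags):

```bash
# Let unreviewed sessions feed the synthesizer for this run only.
ATLAS_REVIEW_REQUIRE_APPROVAL=0 atlas run
```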
Operating the Learning System
- Seed the registry – run a curated set of tasks with `learning.update_enabled=true` to capture baseline pamphlets. Export these as JSON for code review if needed.
- Stage changes – toggle `apply_to_prompts=false` to generate candidate pamphlets without impacting prompts (see the snippet after this list).
- Promote – flip `apply_to_prompts` back to `true` once the evaluation harness shows improved reward/uncertainty metrics.
- Monitor – use the CLI and dashboards to track `pamphlet_hash` changes, reward drift, and review approvals per learning key.
- Rollback – set `update_enabled=false` (to freeze) or `enabled=false` (to bypass entirely) if drift guardrails or reviewers flag an issue.
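During the staging step, the toggle combination looks like this (same flags as the reference table above):

```yaml
learning:
  update_enabled: true      # keep generating candidate pamphlets
  apply_to_prompts: false   # flip back to true at promotion time
```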
Verification & Tooling
Run the learning evaluation harness (`scripts/eval_learning.py`) to audit learning performance without enabling hints:
- `--filter-project`, `--filter-task`, `--filter-tag` – focus on a specific service or task taxonomy.
- `--learning-key` – target explicit keys.
- `--compare-to results/learning/index.json` – compute deltas against a prior run (the manifest is generated automatically).
- `--no-markdown` – suppress human-readable reports for CI pipelines.
Outputs land in `results/learning/`:
- `<slug>_summary.json` – structured payload with reward windows, execution-mode histograms, review-status counts, and the latest pamphlet metadata.
- `<slug>_summary.md` – business-friendly digest ready for incident reviews.
- `index.json` – manifest that drives comparison mode.
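A typical comparison run might look like this (the project filter value is a placeholder):

```bash
python scripts/eval_learning.py \
  --filter-project payments \
  --compare-to results/learning/index.json
```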
Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Playbook hash does not change after successful runs | `learning.update_enabled` disabled, or drift guardrails prevented persistence | Re-enable updates and confirm review approvals; check logs for guardrail warnings. |
| Prompts missing playbook content | `learning.apply_to_prompts=false` or cache invalidation failed | Flip the flag to `true`, clear `.atlas/cache/learning/*`, and rerun `atlas run` (see the snippet below). |
| Evaluation harness shows empty history | `learning_key` absent in `sessions` | Ensure `atlas env init` / runtime config tags sessions with consistent metadata. |
| Synthesizer timeouts | LLM provider throttling | Configure `learning.llm` with a provider-specific timeout or reduce `history_limit`. |
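For the missing-playbook-content row, the recovery sequence is roughly as follows (set `apply_to_prompts: true` in your config first; the cache path matches the table above):

```bash
rm -rf .atlas/cache/learning/*   # drop stale per-role pamphlet caches
atlas run                        # rerun so the resolver re-fetches and re-hashes
```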
Related Guides
- SDK Configuration – full YAML reference, including the `learning` block.
- Atlas CLI Reference – commands for discovery, execution, and review gating.
- Export Runtime Traces – move approved sessions into training datasets.
- Hybrid Learning Concept – theoretical background for offline + runtime learning.