Atlas runtime learning exists to make the teacher more effective between offline GRPO cycles. Instead of hard-coding prompt tweaks or waiting for the next training run, the SDK captures guidance from successful sessions, synthesizes a “playbook,” and reinjects that context on subsequent requests. This page explains how the learning pipeline works, how data persists in Postgres, and which configuration levers production teams can use to control it.

Why Runtime Learning Matters

  • Faster feedback loops – learning runs on live telemetry, so student/teacher personas improve within hours instead of waiting for the next full fine-tune.
  • Traceable changes – every playbook has a hash, metadata, and storage lineage so you can audit who learned what and when.
  • Safe by default – review gating, drift detection, and update toggles let you pause or roll back learning if behavior regresses.

Feedback Loop at a Glance

  1. Session executes – the dual-agent runtime completes a task and logs telemetry into sessions, trajectory_events, and reward tables.
  2. Reward judges score – the RIM ensemble produces reward, uncertainty, and escalation data.
  3. Learning synthesizer runs – after reward evaluation, Atlas calls the LearningSynthesizer to summarize salient guidance for student and teacher personas.
  4. Registry updated – Database.upsert_learning_state writes the new pamphlet to the learning_registry table (keyed by learning_key).
  5. Playbook cached – on the next run for the same key, resolve_playbook retrieves and hashes the pamphlet, then injects it into persona prompts.
  6. Evaluation harness audits – engineering teams run scripts/eval_learning.py to track reward deltas, mode shifts, and review status across learning keys.

Pipeline Components

Learning Synthesizer (atlas/learning/synthesizer.py)

The synthesizer is distinct from reward judges. It:
  • runs after reward scoring so updates only occur on high-signal sessions,
  • consumes trajectory summaries, reward stats, and recent history (bounded by history_limit),
  • emits structured “student” and “teacher” guidance plus optional session notes,
  • uses a dedicated LLM (configurable via learning.llm) to transform raw notes into concise playbooks.
When learning.update_enabled is false, the synthesizer skips persistence but can still write per-session learning notes for auditing. This is useful for A/B testing new prompts before rolling them out.
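The synthesizer's real return type lives in atlas/learning/synthesizer.py; as a rough mental model, the guidance it emits can be pictured as the hypothetical structure below (field names are illustrative, not the SDK's actual API):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SynthesizedLearning:
    """Hypothetical shape of one synthesizer result; field names are illustrative, not the SDK API."""
    learning_key: str                       # task/project scope the guidance applies to
    student_learning: str                   # concise playbook text for the student persona
    teacher_learning: str                   # concise playbook text for the teacher persona
    session_note: Optional[str] = None      # per-session audit note (when session_note_enabled is true)
    metadata: dict = field(default_factory=dict)  # e.g., model used, history window consumed
```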

Playbook Resolver (atlas/learning/playbook.py)

resolve_playbook is invoked during persona construction. It handles:
  • fetching the latest registry entry for the learning_key,
  • trimming long sections to meet token budgets,
  • caching the playbook on disk (per role) and computing a SHA256 hash,
  • returning both content and metadata so prompts, validation payloads, and cache keys include the hash.
Set learning.apply_to_prompts=false to keep generating pamphlets without injecting them into runtime prompts—a common pattern for staging and smoke tests. The hash still flows through telemetry so you can verify the correct playbook would have been used.
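The actual resolver handles token budgets and registry access; the sketch below only illustrates the trim / hash / cache behavior described above, with the function name, character-based trimming, and cache layout invented for illustration:

```python
import hashlib
from pathlib import Path

def resolve_playbook_sketch(pamphlet: str, role: str, max_chars: int = 4000,
                            cache_dir: Path = Path(".atlas/cache/learning")) -> dict:
    """Illustrative stand-in for resolve_playbook: trim, hash, cache, return content plus metadata."""
    trimmed = pamphlet[:max_chars]                      # crude stand-in for token-budget trimming
    digest = hashlib.sha256(trimmed.encode("utf-8")).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{role}.md").write_text(trimmed, encoding="utf-8")  # per-role on-disk cache
    return {
        "content": trimmed,
        "metadata": {"role": role, "pamphlet_hash": digest},  # hash travels with prompts and telemetry
    }
```

Because the hash covers the trimmed content that would be injected, any change to the guidance surfaces as a new pamphlet_hash in telemetry.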

Learning Registry (atlas/runtime/storage/schema.sql)

learning_registry keeps the current pamphlets for each learning_key. The table stores a single row per key with:
  • learning_key – primary identifier (task or project scope).
  • student_learning / teacher_learning – latest pamphlet bodies (text).
  • metadata – optional JSON payload (e.g., synthesizer audit info, hashes you compute upstream).
  • updated_at – timestamp of the most recent update.
Because both roles live in the same row, updates atomically replace the student and teacher pamphlets together. Historical snapshots remain accessible via sessions.student_learning and sessions.teacher_learning, giving you a time-series view even as the registry is overwritten with fresher guidance.
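To make the storage model concrete, a raw upsert along these lines is shown below. This is a sketch using psycopg2 directly and assumes learning_key carries a unique constraint; at runtime Database.upsert_learning_state performs this for you, and the exact SQL may differ:

```python
import json
import psycopg2

UPSERT = """
INSERT INTO learning_registry (learning_key, student_learning, teacher_learning, metadata, updated_at)
VALUES (%s, %s, %s, %s, NOW())
ON CONFLICT (learning_key) DO UPDATE SET
    student_learning = EXCLUDED.student_learning,
    teacher_learning = EXCLUDED.teacher_learning,
    metadata         = EXCLUDED.metadata,
    updated_at       = EXCLUDED.updated_at;
"""

with psycopg2.connect("postgresql://atlas:atlas@localhost:5433/atlas") as conn:
    with conn.cursor() as cur:
        cur.execute(UPSERT, (
            "billing-service",                              # learning_key (scope of the pamphlets)
            "Prefer small diffs; cite the failing tests.",  # student pamphlet body
            "Escalate to paired mode on low reward.",       # teacher pamphlet body
            json.dumps({"source": "manual-seed"}),          # optional metadata JSON
        ))
```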

Discovery Telemetry

Autodiscovery (atlas env init) persists complementary context in discovery_runs. Each record stores module hashes, autogenerated factory metadata, and preflight results. Learning summaries link back to matching discovery runs so you can reproduce the environment that produced a given playbook.

Persistence Topology

discovery_runs ──┐
                 │ (learning reports reference discovery runs)
                 ▼
          learning_registry ◄── learning synthesizer (updates)
                 ▲
                 │ (joined via learning_key)
             sessions ───► trajectory_events
                 │
                 └──► reward_stats
  • discovery_runs captures onboarding metadata and is referenced in learning reports.
  • sessions records adaptive summaries, reward stats, review status, and session-specific learning notes.
  • trajectory_events stores per-step evidence (event_type, actor, payload digests).
  • learning_registry contains the most recent playbook per key/role.
Use learning_key to join these tables. The learning evaluation harness (below) already performs that join.
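For ad-hoc inspection outside the harness, a join along learning_key might look like the sketch below; it assumes sessions carries a learning_key column as described above, and the selected columns are illustrative:

```python
import psycopg2

QUERY = """
SELECT learning_key,
       r.updated_at       AS registry_updated_at,
       s.student_learning AS session_snapshot,
       r.student_learning AS current_pamphlet
FROM sessions s
JOIN learning_registry r USING (learning_key)
ORDER BY r.updated_at DESC
LIMIT 20;
"""

with psycopg2.connect("postgresql://atlas:atlas@localhost:5433/atlas") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for key, updated_at, snapshot, current in cur.fetchall():
            stale = snapshot != current  # session ran with older guidance than the registry now holds
            print(f"{key}  registry_updated={updated_at}  session_used_older_pamphlet={stale}")
```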

Configuration Reference

Add a learning block to your runtime config to control behavior:
| Parameter | Default | Purpose |
| --- | --- | --- |
| enabled | true | Master switch. Set false to run without playbooks or registry updates. |
| update_enabled | true | Allow the synthesizer to write updated pamphlets. Disable for read-only playbook usage. |
| llm | null | Override synthesizer model; defaults to the runtime’s standard LLM if omitted. |
| prompts | null | Custom prompt templates for the synthesizer LLM. |
| history_limit | 10 | Max historical sessions considered when generating an update. |
| session_note_enabled | true | Emit per-session learning notes alongside registry updates. |
| apply_to_prompts | true | Inject playbooks into persona prompts and validation payloads. |
Example configuration:
learning:
  enabled: true
  update_enabled: true
  history_limit: 25
  session_note_enabled: false
  apply_to_prompts: true
  llm:
    provider: openai
    model: gpt-5-mini
    api_key_env: OPENAI_API_KEY
Related settings outside the learning block also affect learning behavior:
  • orchestration.forced_mode (see atlas/config/models.py) locks the runtime into a specific lane—helpful when you want deterministic evaluation while toggling learning features.
  • runtime_safety.review.require_approval should remain true in production so only reviewed sessions feed the synthesizer. Override via ATLAS_REVIEW_REQUIRE_APPROVAL=0 for local experiments.
  • runtime_safety.drift guardrails help spot reward regressions after a playbook change.

Operating the Learning System

  1. Seed the registry – run a curated set of tasks with learning.update_enabled=true to capture baseline pamphlets. Export these as JSON for code review if needed.
  2. Stage changes – toggle apply_to_prompts=false to generate candidate pamphlets without impacting prompts.
  3. Promote – flip apply_to_prompts back to true once the evaluation harness shows improved reward/uncertainty metrics.
  4. Monitor – use the CLI and dashboards to track pamphlet_hash changes, reward drift, and review approvals per learning key.
  5. Rollback – set update_enabled=false (to freeze) or enabled=false (to bypass entirely) if drift guardrails or reviewers flag an issue.
Tip: store exported playbooks (from the evaluation harness) alongside release notes so you can diff guidance between deployments.
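A minimal export script along the lines of step 1 and the tip above might look like this (not part of the SDK; it assumes the same connection string used by the evaluation harness):

```python
import json
import pathlib
import psycopg2

EXPORT_DIR = pathlib.Path("exports/playbooks")
EXPORT_DIR.mkdir(parents=True, exist_ok=True)

with psycopg2.connect("postgresql://atlas:atlas@localhost:5433/atlas") as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT learning_key, student_learning, teacher_learning, metadata, updated_at "
            "FROM learning_registry ORDER BY learning_key;"
        )
        for key, student, teacher, metadata, updated_at in cur.fetchall():
            payload = {
                "learning_key": key,
                "student_learning": student,
                "teacher_learning": teacher,
                "metadata": metadata,
                "updated_at": updated_at.isoformat() if updated_at else None,
            }
            # One JSON file per key keeps git diffs readable between deployments.
            out = EXPORT_DIR / f"{key.replace('/', '_')}.json"
            out.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n")
```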

Verification & Tooling

Run the learning evaluation harness to audit learning performance without enabling hints:
python scripts/eval_learning.py \
  --database-url postgresql://atlas:atlas@localhost:5433/atlas \
  --recent-window 10 \
  --baseline-window 50 \
  --summary-only
Key options:
  • --filter-project, --filter-task, --filter-tag – focus on a specific service or task taxonomy.
  • --learning-key – target explicit keys.
  • --compare-to results/learning/index.json – compute deltas against a prior run (manifest generated automatically).
  • --no-markdown – suppress human-readable reports for CI pipelines.
Outputs land under results/learning/:
  • <slug>_summary.json – structured payload with reward windows, execution-mode histograms, review status counts, and latest pamphlet metadata.
  • <slug>_summary.md – business-friendly digest ready for incident reviews.
  • index.json – manifest that drives comparison mode.
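If you want CI to consume these artifacts, a safe starting point is to load each <slug>_summary.json and inspect its fields before asserting on specific metrics; the sketch below makes no assumptions about the payload schema:

```python
import json
import pathlib

results_dir = pathlib.Path("results/learning")

# List each structured summary and show which top-level fields it exposes,
# so CI assertions can be written against keys that actually exist in your version.
for summary_path in sorted(results_dir.glob("*_summary.json")):
    payload = json.loads(summary_path.read_text())
    fields = sorted(payload.keys()) if isinstance(payload, dict) else type(payload).__name__
    print(f"{summary_path.name}: {fields}")
```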

Troubleshooting

| Symptom | Likely Cause | Resolution |
| --- | --- | --- |
| Playbook hash does not change after successful runs | learning.update_enabled disabled or drift guardrails prevented persistence | Re-enable updates and confirm review approvals; check logs for guardrail warnings. |
| Prompts missing playbook content | learning.apply_to_prompts=false or cache invalidation failed | Flip the flag to true, clear .atlas/cache/learning/*, rerun atlas run. |
| Evaluation harness shows empty history | learning_key absent in sessions | Ensure atlas env init / runtime config tags sessions with consistent metadata. |
| Synthesizer timeouts | LLM provider throttling | Configure learning.llm with a provider-specific timeout or reduce history_limit. |