Atlas uses the same “student” and “teacher” labels in two contexts—the adaptive runtime loop and offline training—but the responsibilities shift. At runtime the student is your agent executing the task; the teacher is a verifying reviewer that provides guidance and approvals. During training those labels refer to the models being optimized.
Runtime vs. Training: The SDK runtime orchestrates an existing agent to complete a task with real-time oversight. Model training optimizes teacher checkpoints for future tasks. For the conceptual overview, start with Adaptive Dual-Agent Reasoning.

Two Contexts, One Vocabulary

| Context | Student (your agent) | Verifying teacher | Primary Goal |
| --- | --- | --- | --- |
| SDK Runtime | Planner, executor, and synthesizer personas that drive your agent. | Plan reviewer, validator, and guidance author that certifies answers. | Ship a reliable answer for the current task. |
| Model Training | The model being improved (e.g., a new policy checkpoint). | The supervising model providing feedback during optimization. | Improve the long-term performance of the student model. |
The runtime personas live in atlas/personas/, while the training roles are part of the broader Atlas training system. Every runtime episode also flows through the adaptive controller described in adaptive_teaching: triage builds a dossier, a capability probe chooses a lane, and the student and teacher then behave differently depending on whether the mode is auto, paired, coach, or escalate.
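The lane differences described on this page can be summarized as a small lookup table. This is a minimal sketch, not Atlas code: `LaneBehavior`, `LANES`, and the field names are hypothetical, and the table records only what this page states (it does not capture how coach and escalate differ from each other).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneBehavior:
    """Hypothetical summary of one adaptive mode; not an Atlas type."""
    stepwise: bool      # True: reviewed multi-step plan; False: single synthetic step
    plan_review: bool   # True: the teacher reviews or rewrites the plan
    validation: str     # "none", "final", or "per_step"


# Values taken from the prose on this page; names are illustrative assumptions.
LANES = {
    "auto": LaneBehavior(stepwise=False, plan_review=False, validation="none"),
    "paired": LaneBehavior(stepwise=False, plan_review=False, validation="final"),
    "coach": LaneBehavior(stepwise=True, plan_review=True, validation="per_step"),
    "escalate": LaneBehavior(stepwise=True, plan_review=True, validation="per_step"),
}


def describe(mode: str) -> str:
    """Return a one-line description of how a mode behaves."""
    lane = LANES[mode]
    shape = "stepwise plan" if lane.stepwise else "single-shot step"
    return f"{mode}: {shape}, validation={lane.validation}"
```

For example, `describe("paired")` yields a line stating that the paired lane runs a single-shot step and validates only the final answer.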

The Runtime Student (your agent)

Located in atlas/personas/student.py, the runtime Student performs three key actions:
  1. Plan: Creates a dependency-aware plan (acreate_plan) when the selected lane is stepwise (coach / escalate). In single-shot lanes (auto / paired), the plan collapses to a single synthetic step. The prompt lives in student.prompts.planner.
  2. Execute: Runs either the single-shot step or each step from the reviewed plan, calling any necessary tools or adapters (aexecute_step). Lane choice controls whether retries are allowed.
  3. Synthesize: Compiles results into a final answer (asynthesize_final_answer). In paired mode the Teacher may validate only the final answer; in stepwise lanes synthesis happens after every validated step completes.
Key configuration levers include prompt templates (student.prompts), token budgets (max_*_tokens), and tool behavior (tool_choice).
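The plan, execute, synthesize sequence above can be sketched with a toy class. The method names mirror those mentioned in this section, but the class `ToyStudent`, its signatures, and its behavior are assumptions made for illustration; the real implementation in atlas/personas/student.py may differ substantially.

```python
import asyncio
from typing import List


class ToyStudent:
    """Hypothetical stand-in, NOT the class in atlas/personas/student.py."""

    async def acreate_plan(self, task: str, stepwise: bool) -> List[str]:
        # Single-shot lanes collapse the plan to one synthetic step.
        if not stepwise:
            return [task]
        # Toy planner (assumption): treat ';' as a step separator.
        return [part.strip() for part in task.split(";") if part.strip()]

    async def aexecute_step(self, step: str) -> str:
        # A real student would call tools or adapters here.
        return f"done: {step}"

    async def asynthesize_final_answer(self, results: List[str]) -> str:
        # Compile the per-step results into one answer.
        return " | ".join(results)


async def run_student(task: str, stepwise: bool) -> str:
    student = ToyStudent()
    plan = await student.acreate_plan(task, stepwise)
    results = [await student.aexecute_step(step) for step in plan]
    return await student.asynthesize_final_answer(results)


if __name__ == "__main__":
    print(asyncio.run(run_student("fetch data; summarize", stepwise=True)))
```

The sketch omits teacher review and validation entirely, so it shows only the student's own three phases.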

The Runtime Verifying Teacher

Defined in atlas/personas/teacher.py, the runtime teacher acts as the quality-assurance layer.
  1. Plan review: Approves or rewrites the student’s plan (areview_plan). In auto/paired, this may be skipped entirely if the runtime collapses the task into a single-shot step.
  2. Validation: Certifies either the final answer (paired) or every step (coach/escalate) via avalidate_step, recording certification verdicts when the lane is paired.
  3. Guidance: When validation fails or the reward score is below the retry threshold, the teacher generates guidance (agenerate_guidance) that feeds the next attempt and is logged in the execution context.
Configuration options include the teacher’s LLM (teacher.llm), token limits for feedback (max_review_tokens), and plan caching (plan_cache_seconds).
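One plausible reading of plan_cache_seconds is a time-to-live on reviewed plans, so that a repeated task can reuse an approved plan instead of triggering another review. The sketch below illustrates that idea only; `PlanCache`, its methods, and the expiry semantics are assumptions, not the Atlas implementation.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class PlanCache:
    """Hypothetical TTL cache for reviewed plans; not an Atlas class."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def put(self, task: str, plan: List[str]) -> None:
        # Record the plan together with the time it was stored.
        self._entries[task] = (self._clock(), plan)

    def get(self, task: str) -> Optional[List[str]]:
        entry = self._entries.get(task)
        if entry is None:
            return None
        stored_at, plan = entry
        # Drop entries older than the TTL so stale plans are never reused.
        if self._clock() - stored_at > self.ttl_seconds:
            del self._entries[task]
            return None
        return plan
```

Under this reading, a larger TTL trades fresher reviews for fewer teacher calls on repeated tasks.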

The Runtime Feedback Loop

The Student and Teacher collaborate differently depending on the adaptive mode. All lanes still capture telemetry, reward, and persona updates.
  1. The Teacher reviews or rewrites the plan when the chosen lane is stepwise. In single-shot lanes the plan is condensed and executed immediately.
  2. Validation happens either once (paired) or after every step (coach/escalate). Auto runs skip validation to prioritize latency.
  3. If a lane permits retries and the reward score falls below the retry threshold (default: 0.6), the Teacher issues guidance and the Student replays the step with that context.
  4. After completion, the orchestrator persists adaptive_summary, session_reward, and persona updates so the next run can reuse the outcome.
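The retry rule in step 3 can be sketched as a small loop. Only the 0.6 default comes from this page; the function `run_step_with_retries`, its callback parameters, and the `max_attempts` cap are hypothetical choices made for illustration.

```python
from typing import Callable, List, Optional, Tuple

RETRY_THRESHOLD = 0.6  # default retry threshold stated on this page


def run_step_with_retries(
    execute: Callable[[Optional[str]], str],  # student: runs the step with optional guidance
    score: Callable[[str], float],            # reward for one output
    guide: Callable[[str], str],              # teacher: turns a weak output into guidance
    allow_retries: bool,                      # set by the chosen lane
    max_attempts: int = 3,                    # assumption: this page does not state a cap
) -> List[Tuple[str, float]]:
    """Return every (output, reward) attempt, last one being the accepted result."""
    guidance: Optional[str] = None
    attempts: List[Tuple[str, float]] = []
    for _ in range(max_attempts):
        output = execute(guidance)
        reward = score(output)
        attempts.append((output, reward))
        # Stop when the reward clears the threshold or the lane forbids retries.
        if reward >= RETRY_THRESHOLD or not allow_retries:
            break
        # Otherwise the teacher issues guidance that feeds the next attempt.
        guidance = guide(output)
    return attempts
```

Returning the full attempt history, rather than only the final output, mirrors the idea that guidance and rewards are logged for later reuse.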

Summary: Runtime vs. Training

To keep the contexts clear, remember this summary:
  • In the SDK runtime, the student is your agent (planner/executor) and the verifying teacher is the reviewer who guides and certifies work, with responsibilities modulated by the adaptive lane chosen for that task.
  • In model training, the student is the model being improved, and the teacher is the supervising model that provides feedback during optimization.
When you see documentation referencing GRPO or reward shaping, you are in the Atlas Core training context. Discussions about continual learning now live in the SDK runtime documentation.

Next Steps

I