Runtime vs. Training: The SDK runtime orchestrates an existing agent to complete a task with real-time oversight. Model training optimizes teacher checkpoints for future tasks. For the conceptual overview, start with Adaptive Dual-Agent Reasoning.
Two Contexts, One Vocabulary
| Context | Student (your agent) | Verifying teacher | Primary Goal |
|---|---|---|---|
| SDK Runtime | Planner, executor, and synthesizer personas that drive your agent. | Plan reviewer, validator, and guidance author that certifies answers. | Ship a reliable answer for the current task. |
| Model Training | The model being improved (e.g., a new policy checkpoint). | The supervising model providing feedback during optimization. | Improve the long-term performance of the student model. |
The runtime personas live in `atlas/personas/`, while the training roles are part of the broader Atlas training system. Every runtime episode also flows through the adaptive controller described in `adaptive_teaching`: triage builds a dossier, a capability probe chooses a lane, and the student and teacher then behave differently depending on whether the mode is `auto`, `paired`, `coach`, or `escalate`.
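The lane behavior can be summarized as a small lookup table. This is only an illustrative sketch: `LaneBehavior`, `LANES`, and `describe` are hypothetical names, not part of the Atlas SDK, and the values simply restate what this page says about each mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneBehavior:
    """Hypothetical summary of how a lane shapes one runtime episode."""
    stepwise: bool    # True: multi-step plan that the teacher reviews
    validation: str   # "none", "final", or "per_step"


# Assumption: these values restate the prose on this page, not SDK internals.
LANES = {
    "auto": LaneBehavior(stepwise=False, validation="none"),
    "paired": LaneBehavior(stepwise=False, validation="final"),
    "coach": LaneBehavior(stepwise=True, validation="per_step"),
    "escalate": LaneBehavior(stepwise=True, validation="per_step"),
}


def describe(mode: str) -> str:
    """Return a one-line description of the given adaptive mode."""
    lane = LANES[mode]
    plan = "reviewed multi-step plan" if lane.stepwise else "single synthetic step"
    return f"{mode}: {plan}, validation={lane.validation}"
```

For example, `describe("paired")` reports a single synthetic step with validation of the final answer only.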
The Runtime Student (your agent)
Located in `atlas/personas/student.py`, the runtime Student performs three key actions:
- Plan: Creates a dependency-aware plan (`acreate_plan`) when the selected lane is stepwise (`coach`/`escalate`). In single-shot lanes (`auto`/`paired`), the plan collapses to a single synthetic step. The prompt lives in `student.prompts.planner`.
- Execute: Runs either the single-shot step or each step from the reviewed plan, calling any necessary tools or adapters (`aexecute_step`). Lane choice controls whether retries are allowed.
- Synthesize: Compiles results into a final answer (`asynthesize_final_answer`). In `paired` mode the Teacher may validate only the final answer; in stepwise lanes synthesis runs once all steps have been validated.
The Student's configurable settings include its prompts (`student.prompts`), token budgets (`max_*_tokens`), and tool behavior (`tool_choice`).
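The plan, execute, and synthesize sequence can be sketched with a toy class. The method names `acreate_plan`, `aexecute_step`, and `asynthesize_final_answer` come from this page, but their signatures, the `ToyStudent` class, and the `run` helper are assumptions made for illustration; the real Student calls an LLM and tools rather than returning canned strings.

```python
import asyncio
from typing import List


class ToyStudent:
    """Hypothetical stand-in for the runtime Student; not the SDK class."""

    async def acreate_plan(self, task: str, stepwise: bool) -> List[str]:
        if not stepwise:
            # Single-shot lanes: the plan collapses to one synthetic step.
            return [task]
        # Stepwise lanes: a toy two-step decomposition of the task.
        return [f"research: {task}", f"draft: {task}"]

    async def aexecute_step(self, step: str) -> str:
        # Toy execution: a real step would call tools or adapters here.
        return f"done({step})"

    async def asynthesize_final_answer(self, results: List[str]) -> str:
        # Toy synthesis: join the step results into one answer string.
        return " | ".join(results)


async def run(task: str, stepwise: bool) -> str:
    """Plan, execute each step in order, then synthesize the final answer."""
    student = ToyStudent()
    plan = await student.acreate_plan(task, stepwise)
    results = [await student.aexecute_step(step) for step in plan]
    return await student.asynthesize_final_answer(results)
```

Calling `asyncio.run(run("summarize report", stepwise=False))` exercises the single-shot path, while `stepwise=True` walks the two toy steps before synthesis.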
The Runtime Verifying Teacher
Defined in `atlas/personas/teacher.py`, the runtime teacher acts as the quality-assurance layer.
- Plan review: Approves or rewrites the student’s plan (`areview_plan`). In `auto`/`paired`, this may be skipped entirely if the runtime collapses the task into a single-shot step.
- Validation: Certifies either the final answer (`paired`) or every step (`coach`/`escalate`) via `avalidate_step`, recording certification verdicts when the lane is `paired`.
- Guidance: When validation fails or the reward score is below the retry threshold, the teacher generates guidance (`agenerate_guidance`) that feeds the next attempt and is logged in the execution context.
The teacher's configurable settings include its model (`teacher.llm`), token limits for feedback (`max_review_tokens`), and plan caching (`plan_cache_seconds`).
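The teacher's three responsibilities can be sketched the same way. Again, only the method names `areview_plan`, `avalidate_step`, and `agenerate_guidance` come from this page; the `ToyTeacher` class, the `Verdict` type, the signatures, and the toy pass/fail rule are assumptions for illustration, not the SDK's implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Verdict:
    """Hypothetical validation result; the SDK's real type may differ."""
    valid: bool
    reward: float


class ToyTeacher:
    """Hypothetical stand-in for the runtime teacher; not the SDK class."""

    def __init__(self, retry_threshold: float = 0.6) -> None:
        self.retry_threshold = retry_threshold

    async def areview_plan(self, plan: List[str]) -> List[str]:
        # Toy review: drop blank steps, otherwise approve the plan unchanged.
        return [step for step in plan if step.strip()]

    async def avalidate_step(self, output: str) -> Verdict:
        # Toy rule: any non-blank output passes with full reward.
        ok = bool(output.strip())
        return Verdict(valid=ok, reward=1.0 if ok else 0.0)

    async def agenerate_guidance(self, verdict: Verdict) -> Optional[str]:
        # Guidance is produced only on failure or a below-threshold reward.
        if verdict.valid and verdict.reward >= self.retry_threshold:
            return None
        return "Output was empty; return a concrete result for this step."
```

In this sketch, a passing verdict yields no guidance, while a failed or low-reward verdict yields a short hint that a retry could consume.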
The Runtime Feedback Loop
The Student and Teacher collaborate differently depending on the adaptive mode. All lanes still capture telemetry, reward, and persona updates.
- The Teacher reviews or rewrites the plan when the chosen lane is stepwise. In single-shot lanes the plan is condensed and executed immediately.
- Validation happens either once (`paired`) or after every step (`coach`/`escalate`). `auto` runs skip validation to prioritise latency.
- If a lane permits retries and the reward score falls below the retry threshold (default: 0.6), the Teacher issues guidance and the Student replays the step with that context.
- After completion, the orchestrator persists `adaptive_summary`, `session_reward`, and persona updates so the next run can reuse the outcome.
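The retry rule in this loop can be sketched as a small function. This is a minimal sketch under stated assumptions: `run_step`, its parameters, the attempt cap, and the guidance text are hypothetical; only the 0.6 default threshold and the "replay with guidance" behavior come from this page.

```python
from typing import Callable, Optional, Tuple

RETRY_THRESHOLD = 0.6  # default retry threshold stated on this page


def run_step(
    execute: Callable[[Optional[str]], str],
    score: Callable[[str], float],
    allow_retry: bool,
    max_attempts: int = 2,  # assumption: cap chosen for illustration (must be >= 1)
) -> Tuple[str, float, int]:
    """Execute a step, replaying it with guidance while the reward is low."""
    guidance: Optional[str] = None
    attempts = max_attempts if allow_retry else 1
    for attempt in range(1, attempts + 1):
        output = execute(guidance)
        reward = score(output)
        if reward >= RETRY_THRESHOLD:
            break
        # Hypothetical guidance text; a real teacher would generate this.
        guidance = f"Reward {reward:.2f} is below {RETRY_THRESHOLD}; add detail."
    return output, reward, attempt
```

Here `execute` receives the latest guidance (or `None` on the first attempt) and `score` stands in for the reward signal. When `allow_retry` is false, the step runs exactly once regardless of its reward.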
Summary: Runtime vs. Training
To keep the contexts clear, remember this summary:
- In the SDK runtime, the student is your agent (planner/executor) and the verifying teacher is the reviewer who guides and certifies work, with responsibilities modulated by the adaptive lane chosen for that task.
- In model training, the Student is the model being improved, and the Teacher is the expert coach providing feedback.
Next Steps
- Explore the YAML knobs in SDK Configuration and dive deeper into lanes with the Adaptive Runtime Guide.
- See the Student and Teacher in motion in How Orchestration Works.
- Jump into training workflows with the Training Quickstart or the Offline Training guide.