Runtime vs. Training: The SDK runtime focuses on orchestrating an existing agent to complete a task. Model training focuses on optimizing a model’s performance for future tasks.
Two Contexts, One Vocabulary
Context | Student | Teacher | Primary Goal |
---|---|---|---|
SDK Runtime | The planner, executor, and synthesizer that calls your agent. | The plan reviewer, output validator, and guidance author. | Deliver a reliable answer for the current task. |
Model Training | The model being improved (e.g., a new policy checkpoint). | The supervising model providing feedback during optimization. | Improve the long-term performance of the Student model. |
The runtime roles live in `atlas/roles/`, while the training roles are part of the broader Atlas training system.
The Runtime Student
Located in `atlas/roles/student.py`, the runtime Student performs three key actions:
- Plan: Creates a dependency graph to solve the task (`acreate_plan`). The prompt for this is set in `student.prompts.planner`.
- Execute: Runs each step from the plan, calling any necessary tools or adapters (`aexecute_step`).
- Synthesize: Compiles the results from all steps into a final answer for the user (`asynthesize_final_answer`).
Its configuration covers prompts (`student.prompts`), token budgets (`max_*_tokens`), and tool behavior (`tool_choice`).
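The plan, execute, synthesize sequence can be sketched as follows. This is a minimal illustration that does not import the Atlas SDK: only the three method names come from this page, while the `Step` dataclass, the method signatures, and the `run` helper are assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical plan step; the real SDK's plan schema may differ."""
    id: int
    description: str
    depends_on: list[int] = field(default_factory=list)


class SketchStudent:
    """Toy stand-in for the runtime Student (signatures are assumed)."""

    async def acreate_plan(self, task: str) -> list[Step]:
        # A real Student would prompt an LLM; this plan is hard-coded.
        return [
            Step(1, f"Gather facts for: {task}"),
            Step(2, "Draft an answer from the facts", depends_on=[1]),
        ]

    async def aexecute_step(self, step: Step, prior: dict[int, str]) -> str:
        # A step may only run once all of its dependencies have results.
        missing = [d for d in step.depends_on if d not in prior]
        if missing:
            raise ValueError(f"step {step.id} is missing dependencies {missing}")
        return f"result of step {step.id}"

    async def asynthesize_final_answer(self, task: str, results: dict[int, str]) -> str:
        # Combine step results in step-id order into one answer string.
        ordered = [results[k] for k in sorted(results)]
        return f"{task}: " + "; ".join(ordered)


async def run(task: str) -> str:
    student = SketchStudent()
    plan = await student.acreate_plan(task)
    results: dict[int, str] = {}
    for step in plan:  # the toy plan is already listed in dependency order
        results[step.id] = await student.aexecute_step(step, results)
    return await student.asynthesize_final_answer(task, results)


if __name__ == "__main__":
    print(asyncio.run(run("Summarize the report")))
```

The dependency check in `aexecute_step` is what makes the plan a graph rather than a flat list: a step refuses to run until the steps it depends on have produced results.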
The Runtime Teacher
Defined in `atlas/roles/teacher.py`, the runtime Teacher acts as the quality assurance layer.
- Plan Review: Approves or rejects the Student’s plan before execution (`areview_plan`).
- Validation: Checks the output of each step to ensure it meets quality standards (`avalidate_step`).
- Guidance: If an output is poor, it generates feedback to guide the Student on the next attempt (`agenerate_guidance`).
Its configuration covers the underlying LLM (`teacher.llm`), token limits for feedback (`max_review_tokens`), and plan caching (`plan_cache_seconds`).
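The Teacher's three responsibilities can be sketched in the same spirit. Again, only the method names are taken from this page; the signatures and the rule-based checks below are assumptions standing in for what would be LLM calls in the real role.

```python
import asyncio


class SketchTeacher:
    """Toy stand-in for the runtime Teacher (signatures are assumed)."""

    async def areview_plan(self, plan: list[str]) -> bool:
        # Placeholder rule: approve any non-empty plan with no blank steps.
        return bool(plan) and all(step.strip() for step in plan)

    async def avalidate_step(self, step: str, output: str) -> bool:
        # Placeholder quality rule: the output must not be empty.
        return bool(output.strip())

    async def agenerate_guidance(self, step: str, output: str) -> str:
        # Feedback the Student could use on its next attempt.
        return f"Output for '{step}' was empty; retry and include concrete details."


async def demo() -> tuple[bool, bool, str]:
    teacher = SketchTeacher()
    approved = await teacher.areview_plan(["look up data", "write summary"])
    valid = await teacher.avalidate_step("look up data", "")
    # Guidance is only generated when validation fails.
    guidance = "" if valid else await teacher.agenerate_guidance("look up data", "")
    return approved, valid, guidance


if __name__ == "__main__":
    print(asyncio.run(demo()))
```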
The Runtime Feedback Loop
The Student and Teacher collaborate in a tight loop to ensure quality.
- The Teacher must approve the Student’s plan before any work begins.
- After each step, the Teacher validates the output. If the output is low-quality, it engages the Reward System.
- If the Reward System’s score is below the `retry_threshold`, the Teacher generates guidance, and the Student attempts the step again.
- Once all steps pass validation, the Student synthesizes the final answer.
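The per-step retry logic described above might look roughly like this. The sketch uses plain callables instead of the SDK classes. Several details are assumptions not confirmed by this page: the default threshold of 0.7, the `max_retries` cap, and the choice to accept a step that fails validation but scores at or above `retry_threshold`.

```python
from typing import Callable, Optional


def run_step_with_retries(
    execute: Callable[[Optional[str]], str],  # Student: runs the step, given guidance
    validate: Callable[[str], bool],          # Teacher: pass/fail check
    score: Callable[[str], float],            # Reward System: numeric score
    guide: Callable[[str], str],              # Teacher: feedback for the next attempt
    retry_threshold: float = 0.7,             # assumed default, not from the docs
    max_retries: int = 2,                     # hypothetical cap to guarantee termination
) -> tuple[str, int]:
    """Return the accepted output and the number of attempts made."""
    guidance: Optional[str] = None
    attempts = 0
    while True:
        output = execute(guidance)
        attempts += 1
        # Accept when validation passes, or when the reward score clears the threshold.
        if validate(output) or score(output) >= retry_threshold:
            return output, attempts
        # Stop once the retry budget is used up, returning the last output.
        if attempts > max_retries:
            return output, attempts
        guidance = guide(output)


if __name__ == "__main__":
    result = run_step_with_retries(
        execute=lambda g: "detailed answer" if g else "",
        validate=lambda out: bool(out),
        score=lambda out: 1.0 if out else 0.0,
        guide=lambda out: "add detail",
    )
    print(result)
```

In this toy run the first attempt produces an empty output, fails validation, and scores below the threshold, so guidance is generated and the second attempt is accepted.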
Summary: Runtime vs. Training
To keep the contexts clear, remember this summary:
- In the SDK runtime, the Student is the doer (planner/executor) and the Teacher is the reviewer.
- In model training, the Student is the model being improved, and the Teacher is the expert coach providing feedback.
Next Steps
- Explore the YAML knobs in SDK Configuration.
- See the Student and Teacher in motion in How Orchestration Works.
- Jump into training workflows with the Training Quickstart or your First Experiment.