Overview
This guide shows how to take traces captured by the Atlas SDK, distill them with `scripts/validate_gkd.py`, and interpret the resulting metrics. The example uses GSM8K data, but the same steps apply to any dataset, whether Atlas traces in Postgres or a Hugging Face dataset loaded via `MathGKDDatasetConfig`.
Export traces from the SDK
- Run your agent with the Atlas SDK and persist sessions to Postgres via the `storage` block in `atlas.config`. Every approved session (teacher intervention, student attempt, rewards) lives in the same schema Atlas Core expects.
- Review sessions with `arc-atlas review sessions --database-url <postgres_url> --status pending` and approve the conversations you want to train on.
- (Optional) Export a JSONL snapshot with `arc-atlas --database-url <postgres_url> --include-status approved --output traces/runtime.jsonl` if you prefer file-based workflows; a consolidated sketch of these commands follows this list. `AtlasGKDTrainer` can consume either the live Postgres database or a JSONL file generated with the same schema.
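For reference, here is a consolidated sketch of the review and export steps. The `arc-atlas` commands and flags are the ones quoted above; the connection string and output path are illustrative placeholders.

```bash
# 1. Review pending sessions and approve the conversations worth training on.
arc-atlas review sessions \
  --database-url postgresql://user:pass@localhost:5432/atlas \
  --status pending

# 2. (Optional) Export the approved sessions to a JSONL snapshot for
#    file-based workflows.
arc-atlas \
  --database-url postgresql://user:pass@localhost:5432/atlas \
  --include-status approved \
  --output traces/runtime.jsonl
```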
Configure GKD (Postgres path)
Ensure `ATLAS_DB_URL` points to the same Postgres instance the SDK writes to, then use the default Hydra config to run distillation (a sketch of the command follows below). Adjust `trainer.learning_key`, `min_reward`, etc., as needed for your workflow.
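A minimal sketch of the Postgres-backed run, assuming `train.py` accepts standard Hydra-style `key=value` overrides. The config name, environment variable, and override names come from this guide; the connection string and override values are placeholders, not defaults.

```bash
# Point the trainer at the same database the Atlas SDK writes to.
export ATLAS_DB_URL="postgresql://user:pass@localhost:5432/atlas"

# Distill with the default GKD config; override the learning key and the
# reward floor as needed for your workflow (values below are placeholders).
python train.py --config-name teacher_gkd \
  trainer.learning_key=gkd_distillation \
  min_reward=0.7
```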
Run the validation script (Hugging Face path)
To validate end-to-end settings on public data, run the script as sketched below. Point `--dataset-name` / `--dataset-config` at any Hugging Face dataset that contains math or reasoning conversations; the script formats it into the same chat schema the trainer expects. For your own traces, skip the HF flags and let the trainer load from Postgres via `ATLAS_DB_URL`.
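A sketch of the GSM8K invocation. The script path and the dataset flags come from this guide, and `gsm8k` / `main` are the standard Hugging Face dataset name and config; any other flags or defaults (including the output directory) are assumptions.

```bash
# Validate the GKD settings end-to-end on GSM8K from the Hugging Face Hub.
python scripts/validate_gkd.py \
  --dataset-name gsm8k \
  --dataset-config main
```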
Interpret math_validation_metrics.json
After the script finishes, inspect `outputs/gkd_math_validation/math_validation_metrics.json`. It contains:
- `training.train_loss`: final training loss (useful for comparing configs).
- `baseline` and `distilled` blocks: eval accuracy, average generated tokens, etc.
- `success_delta` and `token_reduction_pct`: derived from the baseline/distilled metrics so you can see how the distilled student improved.
Compare the success delta (for example, 0.815 - 0.758 = +5.7 pp) and the token reduction (1 - 180/210 ≈ 14.3%) to judge whether the run met your targets.
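As a quick check, you can pull the headline numbers straight from the file. The key paths below mirror the field names described above, but the exact JSON layout is an assumption.

```bash
# Print the summary metrics (requires jq; `python -m json.tool` also works
# for a raw dump). Key paths are inferred from the field names above.
jq '{train_loss: .training.train_loss,
     success_delta: .success_delta,
     token_reduction_pct: .token_reduction_pct}' \
  outputs/gkd_math_validation/math_validation_metrics.json
```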
Next steps
- Use `scripts/examples/run_two_gear_gkd.py` to run the fast and reliability configs back-to-back and automatically print the comparison table (a minimal invocation is sketched after this list).
- Once you have Postgres traces from the Atlas runtime, re-run `train.py --config-name teacher_gkd` pointing at `ATLAS_DB_URL` to distill your own workflows instead of GSM8K.
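A sketch of the two-gear comparison run. The script path comes from this guide; it is assumed here to need no extra flags beyond the environment already set up above.

```bash
# Run the fast and reliability GKD configs back-to-back and print the
# comparison table (assumes ATLAS_DB_URL or the HF flags are already set).
python scripts/examples/run_two_gear_gkd.py
```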