Scope: This guide covers the Atlas training stack (Hydra configs, wrappers, optimization recipes). For the SDK runtime YAML, see the SDK Configuration Reference.

Directory Quick Reference

configs/
├── wrappers/          🎯 START HERE – wrap your existing agent
├── optimize/          📈 GEPA + vLLM optimization settings
├── examples/          ⚡ Ready-to-run tutorial configs
├── data/              💾 Dataset definitions
├── demo/              🔬 Full demo scenarios
├── rim_config.yaml    🏆 Reward system configuration
├── model/             🤖 Model architectures (advanced)
├── run/               🚀 Experiment recipes (SFT, GRPO, etc.)
└── trainer/           ⚙️ Algorithm defaults (GRPO, SFT)
Tip: Start with wrappers/ to plug Atlas into your agent, then explore optimize/ for GEPA runs or run/ for full training jobs. The deeper directories (model/, trainer/) are for advanced customisation.

Wrapping Your Agent (wrappers/)

Wrapper configs support three integration types: HTTP API, Python function, and CLI command. The HTTP API variant looks like this:
# configs/wrappers/my_api_agent.yaml
user_agent:
  type: custom
  config:
    integration_type: http_api
    endpoint: "http://localhost:8000/chat"
    prompt_field: "message"
    response_field: "response"
    headers:
      Authorization: "Bearer YOUR_API_KEY"
    timeout: 300

teacher_model: Arc-Intelligence/ATLAS-8B-Thinking
trainset: arc-atlas-rl
max_examples: 10
compatibility_mode: true

generation_config:
  max_tokens: 2048
  temperature: 0.7
  diagnostic_max_tokens: 500
Run the wrapped agent with:

./scripts/openai_agent_atlas.sh configs/wrappers/my_api_agent.yaml
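
Before a long run, it can help to temporarily shrink the same file for a quick smoke test. The snippet below reuses only keys already shown above; the specific values are illustrative, not recommended defaults.

# Temporary smoke-test values for configs/wrappers/my_api_agent.yaml (illustrative)
max_examples: 2            # touch only a couple of training examples
compatibility_mode: true

generation_config:
  max_tokens: 512          # shorter completions, faster feedback
  temperature: 0.0         # deterministic output makes failures easier to reproduce
  diagnostic_max_tokens: 200

Once the wrapper round-trips cleanly, restore the full values and rerun.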

Optimization & Datasets

  • configs/optimize/ – GEPA wrappers, vLLM clients, batching options.
  • configs/data/ – Dataset definitions (arc_atlas_rl.yaml, arc_atlas_sft.yaml). Adjust max_train_samples, preprocessing, or add new splits.
  • configs/rim_config.yaml – Reward system selections used by both runtime and training.
See Reward Design for guidance on judges, variance thresholds, and escalation strategies.
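
The dataset files mirror the data block that appears in the composition walkthrough below. As a rough sketch (the exact contents of the shipped file may differ):

# configs/data/arc_atlas_rl.yaml (sketch based on the composed data block below)
dataset_name: Arc-Intelligence/Arc-ATLAS
dataset_config: rl
max_train_samples: 100000
preprocessing:
  max_length: 2048
  pad_to_multiple_of: 16

Copying one of these files is the quickest way to define a new split or cap max_train_samples for a smaller run.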

Hydra Composition Deep Dive

Hydra composes experiment recipes from reusable building blocks.

Layer Breakdown

| Layer | Purpose | Example Files | When to Modify |
| --- | --- | --- | --- |
| train.yaml | Global defaults | Single file | Rarely – system-wide changes only |
| run/*.yaml | Experiment recipes | teacher_rcl.yaml, teacher_sft.yaml | New experiment types |
| model/*.yaml | Model specifications | qwen3_8b.yaml, llama3_8b.yaml | Adding new architectures |
| data/*.yaml | Dataset configs | arc_atlas_rl.yaml, arc_atlas_sft.yaml | New datasets or preprocessing |
| trainer/*.yaml | Algorithm settings | teacher_grpo.yaml, sft.yaml | Tweaking GRPO/SFT defaults |
| trainer/reward/*.yaml | Reward presets (Hydra _global_ group) | rim_teaching.yaml | Swap reward bundles or create new ensembles |
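
Reward presets are declared in Hydra's _global_ package so their keys land at the top level of the composed config. A minimal sketch of such a preset, reusing the _target_ and config_path values from the walkthrough below (the exact contents of the shipped rim_teaching.yaml are an assumption and may contain more):

# @package _global_
# Sketch of a reward preset under configs/trainer/reward/
teacher_reward:
  _target_: RIM.reward_adapter.RIMReward
  config_path: configs/rim_config.yaml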

Example: configs/run/teacher_rcl.yaml

defaults:
  - _self_
  - override /trainer: teacher_grpo
  - override /model: qwen3_8b
  - override /data: arc_atlas_rl
  - override /reward: rim_teaching
Entries later in the defaults list win conflicts; listing _self_ first means the recipe's own values are composed before (and can be overridden by) the listed groups.
Trainer settings (GRPO):

beta: 0.04
temperature: 0.7
grpo_alpha: 0.5
generation_aggregation_steps: 1

These control the GRPO algorithm; see the Trainers API.
Model settings:

model_name_or_path: Qwen/Qwen2.5-7B-Instruct
torch_dtype: bfloat16
attn_implementation: flash_attention_2
model_kwargs:
  trust_remote_code: true

Swap in new checkpoints or quantisation settings here.
Dataset settings:

dataset_name: Arc-Intelligence/Arc-ATLAS
dataset_config: rl
max_train_samples: 100000
preprocessing:
  max_length: 2048
  pad_to_multiple_of: 16

Adjust dataset selection, sample limits, and preprocessing here.
Reward settings:

teacher_reward:
  _target_: RIM.reward_adapter.RIMReward
  config_path: configs/rim_config.yaml

This points the trainer at the reward ensemble defined in rim_config.yaml.

Customisation Patterns

Command-line Overrides

# Override a single hyperparameter
scripts/launch.sh 8 configs/run/teacher_sft.yaml learning_rate=1e-5

# Swap a config group (here: the model)
scripts/launch.sh 8 configs/run/teacher_sft.yaml model=llama3_8b

# Combine several overrides in one launch
scripts/launch.sh 8 configs/run/teacher_sft.yaml \
  per_device_train_batch_size=2 \
  gradient_accumulation_steps=8
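
If the same overrides keep recurring, they can be baked into a small recipe that extends an existing one. The file below is hypothetical and assumes Hydra's same-group extension pattern:

# configs/run/teacher_sft_small_batch.yaml (hypothetical convenience recipe)
defaults:
  - teacher_sft      # inherit the existing recipe from the same group
  - _self_

per_device_train_batch_size: 2
gradient_accumulation_steps: 8

With _self_ last, the two batch settings take precedence over whatever teacher_sft.yaml provides.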

Creating New Configurations

1. Pick the layer: new model → configs/model/ • new dataset → configs/data/ • new experiment → configs/run/.

2. Copy a template:

cp configs/model/qwen3_8b.yaml configs/model/my_model.yaml

3. Edit the copy:

model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
torch_dtype: float16
load_in_4bit: true

4. Reference it in the run config:

defaults:
  - override /model: my_model
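
A new experiment recipe can mirror the defaults list from teacher_rcl.yaml shown earlier and swap in the new model group. The file name below is hypothetical:

# configs/run/my_experiment.yaml (hypothetical recipe mirroring teacher_rcl.yaml)
defaults:
  - _self_
  - override /trainer: teacher_grpo
  - override /model: my_model       # the copy created in step 2
  - override /data: arc_atlas_rl
  - override /reward: rim_teaching

It launches through scripts/launch.sh exactly like the built-in recipes.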

Multi-GPU Scaling

The first argument to scripts/launch.sh sets the GPU count (1, 4, or 8 in the examples throughout this guide). On a single GPU, lean on gradient accumulation and offloading:
scripts/launch.sh 1 configs/run/teacher_sft.yaml \
  per_device_train_batch_size=1 \
  gradient_accumulation_steps=32 \
  offload=true
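
The effective batch size is GPU count × per_device_train_batch_size × gradient_accumulation_steps, so adding GPUs lets you cut accumulation while keeping optimization behaviour roughly constant. The numbers below are illustrative, not shipped defaults:

# Effective batch = GPUs × per_device_train_batch_size × gradient_accumulation_steps
# 1 GPU : 1 × 1 × 32 = 32   (matches the command above)
# 4 GPUs: 4 × 1 × 8  = 32
# 8 GPUs: 8 × 2 × 2  = 32
per_device_train_batch_size: 1
gradient_accumulation_steps: 32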

Common Scenarios

Memory-Constrained GPUs

per_device_train_batch_size: 1    # smallest micro-batch
gradient_accumulation_steps: 16   # keep the effective batch size up
gradient_checkpointing: true      # recompute activations instead of storing them
offload: true                     # enable offloading (same flag as the single-GPU launch above)
torch_dtype: float16

Fast Iteration Mode

max_train_samples: 1000
num_train_epochs: 1
save_steps: 100
eval_steps: 100
logging_steps: 10

Production Training

num_train_epochs: 3
learning_rate: 5e-6
warmup_ratio: 0.1
weight_decay: 0.01
eval_strategy: "steps"
eval_steps: 500
save_total_limit: 3
load_best_model_at_end: true
metric_for_best_model: "eval_reward"

Debugging Config Issues

# Inspect the composed configuration: hydra.verbose=true logs every file Hydra includes
scripts/launch.sh 1 configs/run/teacher_rcl.yaml hydra.verbose=true

# Sanity-check overrides on a tiny run before launching a full job
scripts/launch.sh 1 configs/run/teacher_rcl.yaml model=my_model max_train_samples=100
Use hydra.verbose=true to see each included file. If composition fails, confirm the path exists and the file is in the defaults list.

Next Steps
