---
title: Training Configs API
description: Python reference for loading and overriding Atlas training configurations.
sidebarTitle: Training Configs
icon: sliders
---
## Overview
ATLAS configurations control every aspect of training and inference. Parameters are organized into logical groups for easier navigation.
## Typical Usage
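A minimal sketch of building a config and handing it to the trainer. The import paths follow the file layout listed under Source Code at the bottom of this page, the model name is a placeholder, and the trainer hand-off is shown only in outline because its exact constructor signature lives in `trainers/grpo.py`.

```python
from trainers.grpo_config import GRPOConfig  # path per the Source Code section below
from trainers.grpo import GRPOTrainer        # actual constructor signature: trainers/grpo.py

# Core training parameters; anything not set here keeps the defaults in the tables below.
config = GRPOConfig(
    output_dir="outputs/grpo-run",                   # inherited from transformers.TrainingArguments
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",   # placeholder model
    learning_rate=1e-6,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
)

# Hand the config to the trainer (argument names here are illustrative; check trainers/grpo.py):
# trainer = GRPOTrainer(args=config, train_dataset=my_dataset)
# trainer.train()
```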
## GRPOConfig Parameters
### Core Training Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model_name_or_path` | str | required | HuggingFace model ID or local path |
| `learning_rate` | float | 1e-6 | Initial learning rate for the AdamW optimizer |
| `num_train_epochs` | int | 3 | Number of training epochs (inherited) |
| `per_device_train_batch_size` | int | 8 | Batch size per GPU/TPU core (inherited) |
| `gradient_accumulation_steps` | int | 1 | Steps to accumulate gradients before an optimizer update (inherited) |
| `warmup_ratio` | float | 0.1 | Ratio of warmup steps (inherited) |
| `weight_decay` | float | 0.01 | L2 regularization coefficient (inherited) |
| `max_grad_norm` | float | 1.0 | Maximum gradient norm for clipping (inherited) |
Parameters marked *(inherited)* come from `transformers.TrainingArguments`.
### GRPO Algorithm Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `beta` | float | 0.04 | KL coefficient |
| `temperature` | float | 0.9 | Temperature for sampling completions |
| `num_generations` | int | 8 | Number of generations to sample |
| `max_completion_length` | int | 256 | Maximum length of each generated completion |
| `max_prompt_length` | int | 512 | Maximum prompt length (truncated from the left) |
| `reward_weights` | list[float] | None | Weights for each reward function |
Source: `trainers/grpo_config.py`
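For example, the GRPO-specific fields are set on the same config object. The values below are illustrative rather than recommended settings, and the model name is a placeholder.

```python
from trainers.grpo_config import GRPOConfig  # assumed path, see Source Code

config = GRPOConfig(
    output_dir="outputs/grpo-run",
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",  # placeholder
    beta=0.04,                  # KL coefficient
    temperature=0.9,            # sampling temperature for completions
    num_generations=8,          # number of generations to sample
    max_prompt_length=512,
    max_completion_length=256,
    reward_weights=[1.0, 0.5],  # one weight per configured reward function
)
```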
### Generation & Sampling
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `top_k` | int | None | Top-k sampling parameter |
| `top_p` | float | 1.0 | Nucleus sampling threshold |
| `min_p` | float | None | Minimum token probability |
| `repetition_penalty` | float | 1.0 | Penalty for token repetition |
| `generation_aggregation_steps` | int | None | Aggregates generations across steps |
| `shuffle_generation_inputs` | bool | False | Randomly shuffle prompts |
Source: `trainers/grpo_config.py`
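A short illustrative sketch of the sampling controls, with a placeholder model name; pick the values that suit your task.

```python
from trainers.grpo_config import GRPOConfig  # assumed path, see Source Code

config = GRPOConfig(
    output_dir="outputs/grpo-run",
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",  # placeholder
    temperature=0.7,
    top_p=0.95,              # nucleus sampling threshold
    repetition_penalty=1.1,  # values above 1.0 discourage repeated tokens
)
```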
### vLLM Integration
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `use_vllm` | bool | False | Use vLLM for generating completions |
| `use_vllm_server` | bool | False | Use a vLLM server for generation |
| `vllm_device` | str | "auto" | Device where vLLM generation runs |
| `vllm_gpu_memory_utilization` | float | 0.9 | GPU memory ratio for vLLM |
| `vllm_dtype` | str | "auto" | Data type for vLLM generation |
| `vllm_max_model_len` | int | None | Max model length for vLLM |
| `vllm_host` | str | None | Host of the vLLM server |
| `vllm_port` | int | None | Port of the vLLM server |
| `num_vllm_clients` | int | 1 | Number of vLLM clients |
Source: `trainers/grpo_config.py`, lines 102-184
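Two common shapes of this setup are sketched below: an in-process vLLM engine, or generation against a standalone vLLM server. The host, port, and model name are placeholders; which fields your launch scripts actually require depends on your deployment.

```python
from trainers.grpo_config import GRPOConfig  # assumed path, see Source Code

# In-process vLLM engine on the training node:
local_config = GRPOConfig(
    output_dir="outputs/grpo-vllm",
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",  # placeholder
    use_vllm=True,
    vllm_device="auto",
    vllm_gpu_memory_utilization=0.9,
)

# Generation against a separate vLLM server (host/port are placeholders):
server_config = GRPOConfig(
    output_dir="outputs/grpo-vllm-server",
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",
    use_vllm_server=True,
    vllm_host="127.0.0.1",
    vllm_port=8000,
    num_vllm_clients=1,
)
```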
### Memory & Training Optimization
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `offload_untrained_models` | bool | False | Offload reference/reward models to minimize memory |
| `ds3_gather_for_generation` | bool | True | Gather policy weights for generation (DeepSpeed ZeRO-3) |
| `backprop_accumulation_steps` | int | None | Accumulate loss during backprop computations |
| `backprop_accumulation_micro_batch_size` | int | None | Max per-device batch size during backprop |
| `remove_unused_columns` | bool | False | Whether to keep only the `prompt` column in the dataset |
See also: `transformers.TrainingArguments`.
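A sketch of a memory-constrained configuration using the fields above; the values are illustrative and the model name is a placeholder.

```python
from trainers.grpo_config import GRPOConfig  # assumed path, see Source Code

config = GRPOConfig(
    output_dir="outputs/grpo-lowmem",
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",   # placeholder
    offload_untrained_models=True,             # keep reference/reward models off-GPU when idle
    ds3_gather_for_generation=True,            # gather ZeRO-3 shards before generation
    backprop_accumulation_micro_batch_size=2,  # cap the per-device batch during backprop
)
```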
### Teacher Training Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_probe_tokens` | int | 500 | Maximum tokens for student diagnostic probing |
| `student_diagnostic_template` | str | None | Template for generating student diagnostic probes |
| `teacher_adaptive_template` | str | None | Template for generating teacher adaptive teaching |
| `student_with_teaching_template` | str | None | Template for the student solution with teaching |
| `student_baseline_template` | str | None | Template for the student baseline solution |
Source: `trainers/grpo_config.py`, lines 330-353 (teacher-specific parameters)
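The template fields are plain strings; the snippet below only illustrates where they plug in. Both the template text and its `{...}` placeholder names are made up here, so use the templates that ship with the repo's configs.

```python
from trainers.grpo_config import GRPOConfig  # assumed path, see Source Code

config = GRPOConfig(
    output_dir="outputs/teacher-grpo",
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",  # placeholder
    max_probe_tokens=500,
    # Hypothetical template strings and placeholder names, for illustration only:
    student_diagnostic_template="Problem: {problem}\nBriefly outline your approach.",
    teacher_adaptive_template="Student approach: {approach}\nGive targeted guidance.",
)
```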
### Logging & Checkpointing
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `logging_steps` | int | 10 | Log every N steps |
| `save_steps` | int | 500 | Save a checkpoint every N steps |
| `eval_steps` | int | 500 | Evaluate every N steps |
| `save_total_limit` | int | 3 | Maximum number of checkpoints to keep |
| `load_best_model_at_end` | bool | True | Load the best model after training |
| `metric_for_best_model` | str | "eval_reward" | Metric for model selection |
| `greater_is_better` | bool | True | Whether the metric should increase |
| `report_to` | list | ["wandb"] | Logging integrations |
- Set `save_steps` equal to `eval_steps` for consistency
- Use `save_total_limit` to manage disk space
- Enable W&B for experiment tracking
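Putting the table and the tips above together, a checkpointing setup might look like the following (illustrative values, placeholder model name):

```python
from trainers.grpo_config import GRPOConfig  # assumed path, see Source Code

config = GRPOConfig(
    output_dir="outputs/grpo-run",
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",  # placeholder
    logging_steps=10,
    save_steps=500,
    eval_steps=500,               # aligned with save_steps, per the tip above
    save_total_limit=3,
    load_best_model_at_end=True,
    metric_for_best_model="eval_reward",
    report_to=["wandb"],
)
```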
## Reward System Parameters
The reward ensemble is configured through Hydra rather than through direct constructor arguments. Before editing the YAML, decide which judges you need and how aggressively you want to escalate to the large arbiter. Once that intent is clear, update the `rim` block in the referenced config file.
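If you prefer applying such changes from code rather than editing the YAML by hand, Hydra's compose API can layer overrides on top of the config file. This is only a sketch: the config directory, config name, and the keys under `rim` are assumptions, not the repo's actual schema.

```python
from hydra import compose, initialize

# Compose the training config with an override on the reward (rim) block.
# "configs", "train", and rim.escalation_threshold are hypothetical names;
# check the repo's Hydra config files for the real ones.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(
        config_name="train",
        overrides=["rim.escalation_threshold=0.8"],
    )

print(cfg.rim)
```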
## Teacher Training Usage
`TeacherGRPOTrainer` uses the same `GRPOConfig` but accepts additional constructor parameters:
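A sketch of the hand-off. The keyword names for the extra constructor parameters are guesses based on the teacher-specific fields above, so check `trainers/teacher_trainers.py` for the real signature before copying this.

```python
from trainers.grpo_config import GRPOConfig               # assumed paths, see Source Code
from trainers.teacher_trainers import TeacherGRPOTrainer

config = GRPOConfig(
    output_dir="outputs/teacher-grpo",
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",  # placeholder teacher model
    max_probe_tokens=500,
)

# Constructor keywords below are illustrative; the real signature is in trainers/teacher_trainers.py:
# trainer = TeacherGRPOTrainer(
#     args=config,
#     train_dataset=my_dataset,
#     student_model="Qwen/Qwen2.5-1.5B-Instruct",  # hypothetical argument
# )
# trainer.train()
```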
## Command-Line Overrides

Any parameter can be overridden via the command line:
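Because `GRPOConfig` extends `transformers.TrainingArguments`, every field maps to a `--flag value` pair in `HfArgumentParser`-style entry points. The script name and flag values below are illustrative; whether a given launch script uses this parser or Hydra-style `key=value` overrides depends on the script, but the field-to-flag mapping is the same idea.

```python
from transformers import HfArgumentParser
from trainers.grpo_config import GRPOConfig  # assumed path, see Source Code

parser = HfArgumentParser(GRPOConfig)

# Equivalent of: python train.py --learning_rate 2e-6 --num_generations 16 ...
(config,) = parser.parse_args_into_dataclasses(args=[
    "--output_dir", "outputs/grpo-run",
    "--model_name_or_path", "Qwen/Qwen2.5-7B-Instruct",  # placeholder
    "--learning_rate", "2e-6",
    "--num_generations", "16",
])

print(config.learning_rate, config.num_generations)
```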
## Configuration Usage

ATLAS configurations are standard dataclasses extending `transformers.TrainingArguments`:
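A small sketch, assuming the import path listed under Source Code: because the config is a standard dataclass, everything `TrainingArguments` offers (field inspection, JSON serialization, and so on) still applies.

```python
from dataclasses import fields

from trainers.grpo_config import GRPOConfig  # assumed path, see Source Code

config = GRPOConfig(
    output_dir="outputs/grpo-run",
    model_name_or_path="Qwen/Qwen2.5-7B-Instruct",  # placeholder
)

# Standard TrainingArguments helpers keep working on the subclass:
print(config.to_json_string())                                          # full config as JSON
print([f.name for f in fields(config) if f.name.startswith("vllm_")])   # list vLLM-related fields
```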
## Source Code
For complete implementation details:

- GRPOConfig: `trainers/grpo_config.py`
- GRPOTrainer: `trainers/grpo.py`
- TeacherGRPOTrainer: `trainers/teacher_trainers.py`