## Overview

ATLAS configurations control every aspect of training and inference. Parameters are organized into logical groups for easier navigation.
## Typical Usage
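A minimal sketch of typical usage. The constructor call below assumes a keyword-argument API; a stub stands in for the real `GRPOConfig` (from `trainers/grpo_config.py`) so the snippet is self-contained and runnable anywhere.

```python
# Hypothetical usage sketch. The stub below only mirrors a few of the
# documented defaults; it is NOT the real ATLAS class.
class GRPOConfig:
    def __init__(self, **overrides):
        defaults = dict(learning_rate=1e-6, beta=0.04,
                        num_generations=8, max_completion_length=256)
        defaults.update(overrides)
        self.__dict__.update(defaults)

config = GRPOConfig(
    model_name_or_path="path/to/model",  # required: HF model id or local path
    learning_rate=2e-6,                  # override the 1e-6 default
    num_generations=4,                   # fewer completions per prompt
)
print(config.learning_rate, config.beta)  # 2e-06 0.04
```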
## GRPOConfig Parameters
### Core Training Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_name_or_path` | `str` | required | HuggingFace model ID or local path |
| `learning_rate` | `float` | `1e-6` | Initial learning rate for the AdamW optimizer |
| `num_train_epochs` | `int` | `3` | Number of training epochs (inherited) |
| `per_device_train_batch_size` | `int` | `8` | Batch size per GPU/TPU core (inherited) |
| `gradient_accumulation_steps` | `int` | `1` | Steps to accumulate gradients before an optimizer update (inherited) |
| `warmup_ratio` | `float` | `0.1` | Ratio of warmup steps (inherited) |
| `weight_decay` | `float` | `0.01` | L2 regularization coefficient (inherited) |
| `max_grad_norm` | `float` | `1.0` | Maximum gradient norm for clipping (inherited) |
Parameters marked "(inherited)" come from `transformers.TrainingArguments`.
### GRPO Algorithm Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `beta` | `float` | `0.04` | KL penalty coefficient |
| `temperature` | `float` | `0.9` | Sampling temperature for completions |
| `num_generations` | `int` | `8` | Number of completions sampled per prompt |
| `max_completion_length` | `int` | `256` | Maximum length of a generated completion |
| `max_prompt_length` | `int` | `512` | Maximum prompt length (truncated from the left) |
| `reward_weights` | `list[float]` | `None` | Weights for each reward function |
Source: `trainers/grpo_config.py`
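As a rough sketch of how the batch and generation knobs interact (assuming the batch size counts prompts and `num_generations` completions are sampled per prompt; some GRPO implementations count completions instead, so treat this as an illustration):

```python
# Hypothetical arithmetic sketch: how many completions one optimizer
# step consumes under the defaults from the tables above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_generations = 8
num_gpus = 1  # illustration only

prompts_per_step = (per_device_train_batch_size
                    * gradient_accumulation_steps
                    * num_gpus)
completions_per_step = prompts_per_step * num_generations
print(completions_per_step)  # 64
```

Raising `num_generations` therefore multiplies generation cost without touching the optimizer batch math.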
### Generation & Sampling

| Parameter | Type | Default | Description |
|---|---|---|---|
| `top_k` | `int` | `None` | Top-k sampling parameter |
| `top_p` | `float` | `1.0` | Nucleus (top-p) sampling threshold |
| `min_p` | `float` | `None` | Minimum token probability threshold |
| `repetition_penalty` | `float` | `1.0` | Penalty applied to repeated tokens |
| `generation_aggregation_steps` | `int` | `None` | Number of steps over which generations are aggregated |
| `shuffle_generation_inputs` | `bool` | `False` | Randomly shuffle prompts before generation |
Source: `grpo_config.py`
### vLLM Integration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `use_vllm` | `bool` | `False` | Use vLLM for generating completions |
| `use_vllm_server` | `bool` | `False` | Use a standalone vLLM server for generation |
| `vllm_device` | `str` | `"auto"` | Device where vLLM generation runs |
| `vllm_gpu_memory_utilization` | `float` | `0.9` | Fraction of GPU memory reserved for vLLM |
| `vllm_dtype` | `str` | `"auto"` | Data type for vLLM generation |
| `vllm_max_model_len` | `int` | `None` | Maximum model context length for vLLM |
| `vllm_host` | `str` | `None` | Host of the vLLM server |
| `vllm_port` | `int` | `None` | Port of the vLLM server |
| `num_vllm_clients` | `int` | `1` | Number of vLLM clients |
Source: `grpo_config.py`, lines 102-184
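The parameter names suggest two modes: colocated generation (`use_vllm`) and a standalone server (`use_vllm_server` plus host/port). The groupings and values below are illustrative assumptions, not a confirmed recipe:

```python
# Illustrative settings only; how use_vllm and use_vllm_server
# interact is an assumption based on the parameter names above.
colocated = {
    "use_vllm": True,
    "vllm_device": "auto",               # let vLLM pick the device
    "vllm_gpu_memory_utilization": 0.7,  # leave headroom for training
}
server = {
    "use_vllm_server": True,
    "vllm_host": "127.0.0.1",  # hypothetical server address
    "vllm_port": 8000,         # hypothetical port
    "num_vllm_clients": 2,
}
```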
### Memory & Training Optimization

| Parameter | Type | Default | Description |
|---|---|---|---|
| `offload_untrained_models` | `bool` | `False` | Offload reference/reward models to reduce GPU memory |
| `ds3_gather_for_generation` | `bool` | `True` | Gather policy weights for generation under DeepSpeed ZeRO-3 |
| `backprop_accumulation_steps` | `int` | `None` | Accumulate loss over steps during the backward computation |
| `backprop_accumulation_micro_batch_size` | `int` | `None` | Maximum per-device micro-batch size during the backward pass |
| `remove_unused_columns` | `bool` | `False` | Keep only the `prompt` column in the dataset |
See `transformers.TrainingArguments` for additional inherited memory options.
### Teacher Training Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_probe_tokens` | `int` | `500` | Maximum tokens for student diagnostic probing |
| `student_diagnostic_template` | `str` | `None` | Template for generating student diagnostic probes |
| `teacher_adaptive_template` | `str` | `None` | Template for generating the teacher's adaptive teaching |
| `student_with_teaching_template` | `str` | `None` | Template for the student solution with teaching |
| `student_baseline_template` | `str` | `None` | Template for the student baseline solution |
Source: `grpo_config.py`, lines 330-353 (teacher-specific parameters)
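The template parameters are format strings. The placeholder name `{question}` below is hypothetical, chosen only to illustrate the pattern of a diagnostic probe:

```python
# Hypothetical diagnostic template; the field name is an assumption,
# not the confirmed ATLAS placeholder vocabulary.
student_diagnostic_template = (
    "Problem: {question}\n"
    "Briefly outline your approach (do not solve yet):"
)
probe_prompt = student_diagnostic_template.format(
    question="What is 12 * 13?"
)
print(probe_prompt.splitlines()[0])  # Problem: What is 12 * 13?
```

The probe's response would then be truncated to `max_probe_tokens` before the teacher sees it.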
### Logging & Checkpointing

| Parameter | Type | Default | Description |
|---|---|---|---|
| `logging_steps` | `int` | `10` | Log every N steps |
| `save_steps` | `int` | `500` | Save a checkpoint every N steps |
| `eval_steps` | `int` | `500` | Evaluate every N steps |
| `save_total_limit` | `int` | `3` | Maximum number of checkpoints to keep |
| `load_best_model_at_end` | `bool` | `True` | Load the best model after training |
| `metric_for_best_model` | `str` | `"eval_reward"` | Metric used for model selection |
| `greater_is_better` | `bool` | `True` | Whether higher metric values are better |
| `report_to` | `list` | `["wandb"]` | Logging integrations |
- Set `save_steps` equal to `eval_steps` for consistency
- Use `save_total_limit` to manage disk space
- Enable W&B for experiment tracking
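The tips above can be sketched as a settings fragment; a plain dict is used here so the snippet is self-contained:

```python
# Sketch of the recommended logging/checkpointing settings.
eval_steps = 500
logging_cfg = {
    "logging_steps": 10,
    "eval_steps": eval_steps,
    "save_steps": eval_steps,   # keep saves aligned with evals
    "save_total_limit": 3,      # cap disk usage at 3 checkpoints
    "report_to": ["wandb"],     # W&B experiment tracking
}
assert logging_cfg["save_steps"] == logging_cfg["eval_steps"]
```

Aligning `save_steps` with `eval_steps` ensures every saved checkpoint has a matching evaluation score for `load_best_model_at_end`.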
## Teacher Training Usage
`TeacherGRPOTrainer` uses the same `GRPOConfig` but accepts additional constructor parameters.
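A sketch of what those extra constructor parameters might look like. The parameter names shown (`student_model` and the stub signatures) are assumptions, not the confirmed API, and stub classes stand in for the real ones so the snippet runs:

```python
# Stubs standing in for the real classes in trainers/grpo_config.py
# and trainers/teacher_trainers.py.
class GRPOConfig:
    def __init__(self, **kw):
        self.__dict__.update(kw)

class TeacherGRPOTrainer:
    def __init__(self, args, student_model=None, **extra):
        # student_model and extra kwargs are hypothetical additions
        # beyond what the base GRPOTrainer constructor takes.
        self.args = args
        self.student_model = student_model

config = GRPOConfig(max_probe_tokens=500)
trainer = TeacherGRPOTrainer(args=config, student_model="path/to/student")
print(trainer.args.max_probe_tokens)  # 500
```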
## Command-Line Overrides

Any parameter can be overridden via the command line.
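For example (the entry-point name `train.py` is an assumption; HF-style argument parsers typically expose every config field as a `--flag`):

```shell
# Hypothetical invocation; substitute the actual training script.
python train.py \
  --model_name_or_path path/to/model \
  --learning_rate 2e-6 \
  --num_generations 4 \
  --use_vllm true
```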
## Configuration Usage

ATLAS configurations are standard dataclasses extending `transformers.TrainingArguments`.
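The pattern looks like the following sketch; a stub base class stands in for `transformers.TrainingArguments` so the example is self-contained, and the field subset is illustrative:

```python
from dataclasses import dataclass

@dataclass
class TrainingArguments:  # stub for transformers.TrainingArguments
    learning_rate: float = 1e-6
    num_train_epochs: int = 3

@dataclass
class GRPOConfig(TrainingArguments):  # GRPO-specific fields on top
    beta: float = 0.04
    temperature: float = 0.9
    num_generations: int = 8

cfg = GRPOConfig(beta=0.1)  # inherited fields keep their defaults
print(cfg.beta, cfg.learning_rate)  # 0.1 1e-06
```

Because the config is a dataclass, every field is discoverable, typed, and overridable both programmatically and from the command line.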
## Source Code

For complete implementation details:

- GRPOConfig: `trainers/grpo_config.py`
- GRPOTrainer: `trainers/grpo.py`
- TeacherGRPOTrainer: `trainers/teacher_trainers.py`