Overview
ATLAS provides specialized trainer classes for different training paradigms. Each trainer extends HuggingFace’s base trainer with RL-specific capabilities.
Typical Usage
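A minimal usage sketch, assuming the module paths follow the file names on this page (trainers/grpo.py, trainers/grpo_config.py) and that the constructor accepts the parameters listed in the table below; check trainers/grpo.py for the actual signature.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from trainers.grpo import GRPOTrainer        # module path assumed from trainers/grpo.py
from trainers.grpo_config import GRPOConfig  # module path assumed from trainers/grpo_config.py

# Load the policy and a frozen reference copy for the KL penalty.
model = AutoModelForCausalLM.from_pretrained("path/to/base-model")
ref_model = AutoModelForCausalLM.from_pretrained("path/to/base-model")
tokenizer = AutoTokenizer.from_pretrained("path/to/base-model")

config = GRPOConfig(output_dir="outputs/grpo-run")

trainer = GRPOTrainer(
    config=config,
    model=model,                  # policy network being optimized
    ref_model=ref_model,          # reference model for the KL penalty
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # your prepared training Dataset
    eval_dataset=eval_dataset,    # your prepared evaluation Dataset
)
trainer.train()
```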
GRPOTrainer
Main trainer for Group Relative Policy Optimization.
Class Overview
GRPOTrainer is the main trainer class for Group Relative Policy Optimization, extending the standard HuggingFace Trainer with reinforcement learning capabilities.
Parameters
| Parameter | Type | Description |
|---|---|---|
| config | GRPOConfig | Training configuration |
| model | PreTrainedModel | Model to train (policy network) |
| ref_model | PreTrainedModel | Reference model for KL penalty |
| tokenizer | PreTrainedTokenizer | Tokenizer for encoding/decoding |
| train_dataset | Dataset | Training data |
| eval_dataset | Dataset | Evaluation data |
| reward_model | PreTrainedModel | Optional external reward model |
| compute_metrics | Callable | Custom metrics function |
| callbacks | List[TrainerCallback] | Training callbacks |
| optimizers | Tuple | Custom optimizer and scheduler |
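The optional parameters plug in the same way; an illustrative call (the keyword names follow the table above and are not a verified signature):

```python
trainer = GRPOTrainer(
    config=config,
    model=model,
    ref_model=ref_model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    reward_model=reward_model,          # external reward model
    compute_metrics=compute_metrics,    # custom metrics function
    callbacks=[my_callback],            # TrainerCallback instances
    optimizers=(optimizer, scheduler),  # custom optimizer and LR scheduler
)
```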
Key Features
- GRPO Training Loop: Implements the complete reinforcement learning training process with policy gradient optimization and KL divergence constraints.
- Generation Support: Supports both local generation and distributed generation via vLLM server integration.
- Memory Management: Includes optimizations for training large models with gradient checkpointing and model offloading.
- Reward Integration: Handles multiple reward functions and reward weighting for complex optimization objectives.
Implementation: See trainers/grpo.py for complete method signatures and implementation details.
Training Hooks
Override the hook methods defined in trainers/grpo.py for custom behavior.
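Since GRPOTrainer builds on the HuggingFace Trainer, one way to customize behavior is to override the standard Trainer hooks in a subclass; the sketch below uses the base-class hooks compute_loss and log, while the GRPO-specific hooks live in trainers/grpo.py.

```python
from trainers.grpo import GRPOTrainer

class CustomGRPOTrainer(GRPOTrainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Wrap the GRPO loss computation, e.g. to add auxiliary terms
        # or extra logging.
        return super().compute_loss(model, inputs, return_outputs=return_outputs, **kwargs)

    def log(self, logs, *args, **kwargs):
        # Inject additional metrics before they reach the configured loggers.
        logs["custom/marker"] = 1.0
        super().log(logs, *args, **kwargs)
```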
TeacherGRPOTrainer
Specialized trainer for adaptive teaching with a teacher-student paradigm.
Class Overview
TeacherGRPOTrainer extends GRPOTrainer to implement the two-pass teaching protocol. From the actual source code (trainers/teacher_trainers.py), this trainer:
- Inherits from both GRPOTrainer and TeacherTrainer
- Accepts a student_model parameter in its constructor
- Implements diagnostic probing and adaptive teaching templates
- Manages both teacher and student models during training
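A hypothetical instantiation based on the points above; only student_model is documented here, and the remaining arguments are assumed to mirror GRPOTrainer.

```python
from trainers.teacher_trainers import TeacherGRPOTrainer  # path assumed from trainers/teacher_trainers.py

trainer = TeacherGRPOTrainer(
    config=config,
    model=teacher_model,          # teacher policy being optimized
    student_model=student_model,  # student model used for diagnostic probing
    tokenizer=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```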
Unique Methods
Teaching Protocol
The two-pass protocol is implemented in trainers/teacher_grpo.py.
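As a conceptual illustration only (this is not the code in trainers/teacher_grpo.py, and the prompt templates are invented), the two passes look roughly like this:

```python
def two_pass_teaching(teacher, student, problem):
    # Pass 1: diagnostic probe -- the student attempts the problem so the
    # teacher can assess its current capability.
    student_attempt = student.generate(problem)

    # Pass 2: adaptive teaching -- the teacher conditions its guidance on
    # the observed attempt (adaptive teaching template).
    teaching_prompt = (
        f"Problem: {problem}\n"
        f"Student attempt: {student_attempt}\n"
        "Provide targeted guidance."
    )
    guidance = teacher.generate(teaching_prompt)

    # The student answers again with the guidance; the improvement over the
    # first attempt drives the reward signal.
    final_answer = student.generate(f"{problem}\n{guidance}")
    return student_attempt, guidance, final_answer
```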
SFTTrainer
Supervised fine-tuning trainer for warmup before RL.
Constructor
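A hedged constructor sketch; the argument names below follow common HuggingFace conventions and are not taken from trainers/sft.py, so treat them as placeholders.

```python
from trainers.sft import SFTTrainer  # path assumed from trainers/sft.py

sft_trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=sft_dataset,
    args=sft_config,  # packing, max sequence length, precision, etc.
)
sft_trainer.train()
```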
Key Features
- Sequence packing: Efficient batching of variable-length sequences
- Custom formatting: Apply templates to raw data
- Gradient accumulation: Handle large effective batch sizes
- Mixed precision: FP16/BF16 training support
Implementation: See trainers/sft.py for method signatures and implementation details.
Custom Trainer Implementation
Create your own trainer by extending base classes:
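For example, a custom trainer might subclass TeacherGRPOTrainer and override one of the standard HuggingFace Trainer extension points (get_train_dataloader below is a base Trainer method; the curriculum idea is purely illustrative):

```python
from trainers.teacher_trainers import TeacherGRPOTrainer

class CurriculumTeacherTrainer(TeacherGRPOTrainer):
    """Illustrative custom trainer; not a documented ATLAS extension API."""

    def get_train_dataloader(self):
        # Standard HuggingFace Trainer hook: reorder or filter training data
        # here, e.g. to implement an easy-to-hard curriculum.
        return super().get_train_dataloader()
```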
Callbacks and Monitoring
Available Callbacks
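Any HuggingFace TrainerCallback can be passed via the callbacks parameter; a minimal example (the reward metric key is an assumption about what the trainer logs):

```python
from transformers import TrainerCallback

class RewardLoggingCallback(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        # Print reward-related metrics as they are logged.
        if logs and "reward" in logs:
            print(f"step {state.global_step}: reward={logs['reward']:.4f}")

# trainer = GRPOTrainer(..., callbacks=[RewardLoggingCallback()])
```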
Custom Metrics
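The compute_metrics parameter follows the standard HuggingFace contract: it receives an EvalPrediction and returns a dict of named metrics. A simple sketch (the token-accuracy metric is illustrative):

```python
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred.predictions, eval_pred.label_ids
    # Token-level accuracy over the evaluation set.
    token_accuracy = float(np.mean(np.asarray(predictions) == np.asarray(labels)))
    return {"token_accuracy": token_accuracy}

# trainer = GRPOTrainer(..., compute_metrics=compute_metrics)
```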
Distributed Training
Multi-GPU Setup
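Because the trainers build on the HuggingFace Trainer, multi-GPU data parallelism usually comes from the launcher rather than code changes; a sketch (the script name in the comment is a placeholder):

```python
# Launch the unmodified training script with a distributed launcher, e.g.:
#   accelerate launch --num_processes 8 train_grpo.py
#   torchrun --nproc_per_node 8 train_grpo.py
# The HuggingFace Trainer base class detects the distributed environment and
# wraps the policy model for data-parallel training automatically.
trainer = GRPOTrainer(
    config=config,
    model=model,
    ref_model=ref_model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```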
DeepSpeed Integration
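Assuming GRPOConfig inherits the standard HuggingFace TrainingArguments fields, DeepSpeed is enabled by pointing the config at a ZeRO JSON file (the path below is a placeholder):

```python
from trainers.grpo_config import GRPOConfig

config = GRPOConfig(
    output_dir="outputs/grpo-ds",
    deepspeed="configs/ds_zero2.json",  # placeholder path to your ZeRO config
    bf16=True,
)
```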
Implementation Notes
ATLAS trainers extend standard HuggingFace Trainer classes with RL-specific functionality. The implementation details can be found in:
- trainers/grpo.py - Main GRPO trainer implementation
- trainers/teacher_trainers.py - Teacher-student training logic
- trainers/grpo_config.py - Configuration parameters
Troubleshooting
Memory Issues
Problem: CUDA OOM during training
Solutions:
- Reduce the per-device batch size and raise gradient accumulation to keep the effective batch size
- Enable gradient checkpointing and model offloading (see Memory Management above)
- Train in BF16/FP16 mixed precision
- Use DeepSpeed ZeRO for parameter and optimizer-state partitioning
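A hedged example of memory-saving settings, assuming GRPOConfig exposes the standard HuggingFace TrainingArguments fields shown:

```python
config = GRPOConfig(
    output_dir="outputs/grpo-run",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # preserve the effective batch size
    gradient_checkpointing=True,     # trade compute for activation memory
    bf16=True,                       # mixed precision
)
```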
Slow Training
Problem: Training is slower than expected
Solutions:
- Use distributed generation via the vLLM server integration instead of local generation
- Enable BF16/FP16 mixed precision
- Scale out with multi-GPU or DeepSpeed (see Distributed Training above)
Unstable Training
Problem: Loss spikes or NaN values
Solutions:
- Lower the learning rate
- Strengthen the KL penalty against the reference model
- Clip gradients and check reward scaling across reward functions