What is Hybrid Learning?
ATLAS separates learning into two phases: offline foundation training (one-time, compute-intensive) and runtime continual learning (continuous, lightweight). This architecture addresses the central enterprise AI constraint: the lack of high-quality preference data for complex business tasks.

Key insight: train a teacher once on mathematical reasoning (where data is abundant and logic is clear), then apply that reasoning to any domain (CRM, telecom, debugging) without domain-specific retraining.

The Two-Phase Paradigm
Phase 1: Offline Foundation Training (Atlas Core)
Establish deep, generalizable skills through reinforcement learning (see the data-loading sketch after the table):

| Characteristic | Detail |
|---|---|
| Compute requirement | Minimum 2 GPUs (1 for vLLM, 1 for training) |
| Training data | ~900 curated dual-agent demonstrations from Arc-ATLAS-Teach-v0 |
| Foundation domain | Mathematics (reasoning, sequential thinking, problem decomposition) |
| Transfer capability | Math-trained reasoning generalizes to debugging, coding, analytical tasks |
| Cost model | One-time; amortized over all deployments |
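As a concrete starting point, the sketch below loads the curated demonstrations and notes the minimum two-GPU split. The Hugging Face repo id and split name are assumptions, not confirmed identifiers; check the Offline Training Guide for the exact dataset location.

```python
# Minimal sketch: inspect the offline training data and plan the 2-GPU budget.
# The dataset repo id and split are assumptions; verify them against the
# Offline Training Guide before running.
from datasets import load_dataset

demos = load_dataset("Arc-Intelligence/Arc-ATLAS-Teach-v0", split="train")  # hypothetical repo id
print(f"Loaded {len(demos)} dual-agent demonstrations")  # expect on the order of ~900

# Minimum GPU budget for offline foundation training:
#   GPU 0 -> vLLM server generating rollouts
#   GPU 1 -> GRPO trainer consuming those rollouts
```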
Phase 2: Runtime Continual Learning (Atlas SDK)
The atlas-sdk runtime adapts pre-trained teachers to specific tasks (see the guardrail sketch after the table):

| Characteristic | Detail |
|---|---|
| Infrastructure | Managed APIs in SDK |
| Speed | Improves within hours, versus full retraining cycles |
| Safety | Non-degradation guarantee via reward guardrails |
| Feedback loop | Feeds fresh traces into next offline training job |
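To make the non-degradation guarantee concrete, here is an illustrative sketch of the guardrail idea: a candidate adaptation is promoted only if its measured reward does not fall below the current baseline, and every attempt is logged as a trace for the next offline job. All names and the trace format are assumptions, not the atlas-sdk API.

```python
# Illustrative sketch of a reward-guardrail gate (not the atlas-sdk API).
# A candidate adaptation is promoted only when its reward does not degrade,
# and every attempt is exported as a trace for the next offline GRPO job.
from dataclasses import dataclass
import json

@dataclass
class Adaptation:
    task_prompt: str      # task the runtime was adapting on (hypothetical field)
    teacher_update: str   # e.g. refined teacher guidance for this task family
    reward: float         # reward measured on a held-out slice of recent tasks

def apply_if_non_degrading(current_reward: float, candidate: Adaptation,
                           trace_path: str = "runtime_traces.jsonl") -> float:
    """Keep the candidate only if its reward is at least the current baseline."""
    promoted = candidate.reward >= current_reward
    with open(trace_path, "a") as f:  # export the trace for later offline training
        f.write(json.dumps({"prompt": candidate.task_prompt,
                            "teacher_update": candidate.teacher_update,
                            "reward": candidate.reward,
                            "promoted": promoted}) + "\n")
    return candidate.reward if promoted else current_reward
```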
Performance Comparison
| Training Approach | Time to Deploy | Performance Gain | Cost | Generalization |
|---|---|---|---|---|
| Fine-tuning | 1-2 weeks | +10-15% | $1000s | Poor |
| Few-shot prompting | Minutes | +3-5% | ~$1 | Limited |
| ATLAS (Runtime + GRPO) | Hours | +15.7% over baseline | API + GPU | Excellent |
Cross-Domain Transfer Results
Why Mathematics as Foundation?
- Clear correctness: Verifiable ground truth (unlike business tasks)
- Abundant data: Thousands of well-structured problems
- Pure reasoning: Systematic thinking, problem decomposition, logical flow
- Complexity gradient: Simple arithmetic → AIME-level competition problems
Validated Transfer
Mathematics → Telecom (τ²-bench):
- Teacher trained only on math problems
- Applied to telecom troubleshooting (no telecom training)
- Result: 24.0% pass@1 (vs 18.0% for GPT-4.1, Claude 3.7)

Mathematics → Policy compliance:
- Same math-trained teacher
- Applied to policy compliance tasks
- Result: 54% task completion (vs ~35% for leading models)
- Critical accuracy: 69.2% identifying policy violations
Quick Start
1. Offline Foundation - download a pre-trained teacher or train a custom one:
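A minimal sketch using `huggingface_hub`; the checkpoint repo id is an assumption, so substitute the published ATLAS teacher model or the checkpoint produced by your own Atlas Core run.

```python
# Download a pre-trained teacher checkpoint from the Hugging Face Hub.
# The repo id is a placeholder assumption; replace it with the published
# ATLAS teacher or your own trained checkpoint.
from huggingface_hub import snapshot_download

teacher_dir = snapshot_download(
    repo_id="Arc-Intelligence/ATLAS-8B-Thinking",  # hypothetical repo id
    local_dir="checkpoints/atlas-teacher",
)
print(f"Teacher checkpoint available at: {teacher_dir}")
```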
2. Run GRPO on Exported Traces:
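A hedged sketch using TRL's `GRPOTrainer` as a generic stand-in for the project's training pipeline (the ATLAS repo drives this through its own Hydra configs); the trace file name, its `prompt` field, and the toy reward function are assumptions.

```python
# Sketch: run GRPO over traces exported by the runtime SDK.
# TRL's GRPOTrainer is used as a generic stand-in for the ATLAS pipeline;
# the file name, the "prompt" column, and the toy reward are assumptions.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

traces = load_dataset("json", data_files="runtime_traces.jsonl", split="train")

def reward_fn(completions, **kwargs):
    # Placeholder reward: prefer non-empty, reasonably concise completions.
    # The real pipeline scores completions with its own reward model/checks.
    return [min(len(c) / 200.0, 1.0) for c in completions]

trainer = GRPOTrainer(
    model="checkpoints/atlas-teacher",                    # checkpoint from step 1
    reward_funcs=reward_fn,
    args=GRPOConfig(output_dir="checkpoints/atlas-teacher-grpo"),
    train_dataset=traces,                                 # must expose a "prompt" column
)
trainer.train()
```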
3. Deploy Enhanced Model - point the SDK at the new checkpoint:
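The snippet below is an illustrative sketch only: it writes a runtime config whose teacher path is the GRPO output directory. The keys and file name are assumptions, not the atlas-sdk schema; see the SDK Runtime guide for the real configuration format.

```python
# Illustrative only: point the runtime at the newly trained teacher checkpoint.
# The keys and file name are assumptions, not the atlas-sdk schema.
import yaml

runtime_config = {
    "teacher": {
        "model_path": "checkpoints/atlas-teacher-grpo",  # GRPO output from step 2
    },
    "guardrails": {
        "non_degradation": True,  # keep the reward guardrail enabled
    },
}

with open("atlas_runtime.yaml", "w") as f:  # hypothetical config file read by the SDK
    yaml.safe_dump(runtime_config, f)
```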
Next Steps
- Adaptive Dual-Agent Reasoning - two-pass dual-agent mechanism
- Offline Training Guide - run the GRPO training pipeline
- Training Configuration - Hydra composition and overrides
- SDK Runtime - export traces and manage runtime learning
References
- ATLAS Technical Report - Sections 3.1-3.3 on hybrid architecture
- GRPO Algorithm - Foundation for offline training
- SDK Runtime Guide - Export traces and continual learning