What is Hybrid Learning?

ATLAS separates learning into two phases: offline foundation training (one-time, compute-intensive) and runtime continual learning (continuous, lightweight). This architecture addresses the central enterprise AI constraint: the lack of high-quality preference data for complex business tasks. The key insight: train a teacher once on math reasoning (where data is abundant and correctness is verifiable), then apply that reasoning to any domain (CRM, telecom, debugging) without domain-specific retraining.

The Two-Phase Paradigm

Phase 1: Offline Foundation Training (Atlas Core)

Establish deep, generalizable skills through reinforcement learning:

```text
Offline RL Training (24-48 hours)
├── SFT Warmup: Base reasoning capabilities
├── GRPO Training: Adaptive teaching skills
└── Output: Teacher model with foundational knowledge
```
| Characteristic | Detail |
| --- | --- |
| Compute requirement | Minimum 2 GPUs (1 for vLLM, 1 for training) |
| Training data | ~900 curated dual-agent demonstrations from Arc-ATLAS-Teach-v0 |
| Foundation domain | Mathematics (reasoning, sequential thinking, problem decomposition) |
| Transfer capability | Math-trained reasoning generalizes to debugging, coding, analytical tasks |
| Cost model | One-time; amortized over all deployments |
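
For intuition, here is a minimal sketch of the group-relative advantage computation at the heart of GRPO. This is illustrative only, not the ATLAS training code; the function name is ours:

```python
# Minimal sketch of GRPO's group-relative advantage (illustrative, not the ATLAS API).
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Score each sampled completion against the mean of its own group,
    so the policy is pushed toward above-average teaching behavior."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    return [(r - mu) / (sigma + 1e-6) for r in rewards]

# Example: rewards for 4 teacher completions sampled for the same prompt
print(group_relative_advantages([0.0, 1.0, 1.0, 0.5]))
# -> approximately [-1.31, 0.78, 0.78, -0.26]
```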

Phase 2: Runtime Continual Learning (Atlas SDK)

The atlas-sdk runtime adapts pre-trained teachers to specific tasks:

```text
Runtime Loop (continuous)
├── Task Analysis: Identify performance gaps via rewards
├── Experimentation: Adjust teaching prompts and strategies
├── Trace Export: Capture high-signal interactions
└── Output: Data for next GRPO cycle + incremental improvements
```
| Characteristic | Detail |
| --- | --- |
| Infrastructure | Managed APIs in the SDK |
| Speed | Improves over hours vs. full retraining cycles |
| Safety | Non-degradation guarantee via reward guardrails |
| Feedback loop | Feeds fresh traces into next offline training job |
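
One step of the loop can be pictured as follows. This is an illustrative sketch, not the atlas-sdk API; every name in it (teacher.teach, student.solve, reward_fn, baseline_reward) is hypothetical:

```python
# Illustrative runtime step; all names are hypothetical, not the atlas-sdk API.
import json

def runtime_step(task, teacher, student, reward_fn, sink):
    guidance = teacher.teach(task.prompt)          # current teaching strategy
    answer = student.solve(task.prompt, guidance)
    reward = reward_fn(task, answer)
    if reward < task.baseline_reward:
        # Non-degradation guardrail: fall back to the unguided student
        answer = student.solve(task.prompt)
        reward = reward_fn(task, answer)
        guidance = None
    # Export only high-signal interactions for the next GRPO cycle
    if guidance is not None and reward - task.baseline_reward > 0.1:
        sink.write(json.dumps({"prompt": task.prompt,
                               "guidance": guidance,
                               "reward": reward}) + "\n")
```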

Performance Comparison

| Training Approach | Time to Deploy | Performance Gain | Cost | Generalization |
| --- | --- | --- | --- | --- |
| Fine-tuning | 1-2 weeks | +10-15% | $1000s | Poor |
| Few-shot prompting | Minutes | +3-5% | ~$1 | Limited |
| ATLAS (Runtime + GRPO)* | Hours | +15.7% over baseline | API + GPU | Excellent |

*With pre-trained teacher models

Cross-Domain Transfer Results

Why Mathematics as Foundation?

  • Clear correctness: Verifiable ground truth (unlike business tasks)
  • Abundant data: Thousands of well-structured problems
  • Pure reasoning: Systematic thinking, problem decomposition, logical flow
  • Complexity gradient: Simple arithmetic → AIME-level competition problems
Our teacher, trained on ~7,000 math problems, achieved 46% accuracy on AIME-25 (top-10 SOTA level).

Validated Transfer

Mathematics → Telecom (τ²-bench):
  • Teacher trained only on math problems
  • Applied to telecom troubleshooting (no telecom training)
  • Result: 24.0% pass@1 (vs. 18.0% for GPT-4.1 and Claude 3.7)
Mathematics → CRM (CRMArena-Pro):
  • Same math-trained teacher
  • Applied to policy compliance tasks
  • Result: 54% task completion (vs. ~35% for leading models)
  • Critical accuracy: 69.2% identifying policy violations
→ Full methodology in Technical Report

Quick Start

1. Offline Foundation

Download a pre-trained teacher or train a custom one:
```bash
# Option 1: Pre-trained
huggingface-cli download Arc-Intelligence/ATLAS-8B-Thinking

# Option 2: Custom (2+ GPUs)
scripts/launch.sh 2 configs/run/teacher_sft.yaml
scripts/launch_with_server.sh 1 1 configs/run/teacher_rcl.yaml
```
2. Run GRPO on Exported Traces

```bash
python scripts/run_offline_pipeline.py \
  --export-path traces/runtime.jsonl \
  --wandb-project atlas-production
```
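
Each line of traces/runtime.jsonl is one interaction record. The exact export schema depends on your SDK version; the shape below is a hypothetical illustration only:

```python
# Hypothetical trace record; the real export schema may differ.
import json

record = {
    "prompt": "Diagnose why the customer's eSIM fails to activate...",
    "guidance": "Check device compatibility before touching the account state.",
    "student_response": "...",
    "reward": 0.82,
}
print(json.dumps(record))
```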
3. Deploy Enhanced Model

Point the SDK at the new checkpoint:
```yaml
teacher:
  llm:
    provider: huggingface
    model: /models/atlas-teacher-grpo
    temperature: 0.2
```
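
Before routing traffic, it can be worth confirming the checkpoint loads cleanly. A minimal sanity check, assuming the checkpoint follows the standard Hugging Face layout:

```python
# Quick load test for the new checkpoint (assumes standard HF checkpoint layout).
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/models/atlas-teacher-grpo"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)
print(f"Loaded {model.config.model_type} with {model.num_parameters():,} parameters")
```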
