General Questions
What is ATLAS?
ATLAS (Adaptive Teaching and Learning Alignment System) is a framework that pairs your existing agent (the student) with a specialized verifying teacher that provides adaptive guidance. It uses a two-pass protocol: diagnostic assessment followed by targeted teaching.

Does ATLAS train on my production data?
No. The runtime loop operates through inference-time feedback; no model weights are modified during production execution. Weight updates only happen when you explicitly run offline GRPO training jobs on your own infrastructure.

Data control:
- All session traces write exclusively to the Postgres database you provide via storage.database_url
- ATLAS never operates its own data store or accesses your database
- Leave storage: null in your config to run in ephemeral mode with no persistent data (see the sketch after this list)
- Training is opt-in: you choose when to export traces and run offline training
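A minimal sketch of the two storage modes above. The storage.database_url key and storage: null come from the description; the surrounding layout and connection string are illustrative assumptions:

```yaml
# Persistent mode: traces are written only to the Postgres instance you provide.
storage:
  database_url: postgresql://atlas:secret@db.internal:5432/atlas_traces  # illustrative URL

# Ephemeral mode: no persistent data is written.
# storage: null
```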
How is ATLAS different from fine-tuning?
Unlike fine-tuning, which modifies model weights, ATLAS:
- Preserves the student model’s original capabilities
- Works with any model without retraining
- Adapts guidance based on student capability
- Provides immediate enhancement without training time
What performance improvements can I expect?
Across our evaluation suite we consistently see the closed-loop dual-agent runtime (student + verifying teacher) deliver an average +15.7% accuracy gain, 31% task completion lift, 97% non-degradation, and ~50% token savings versus baseline agents. Offline GRPO training then compounds those gains when you fine-tune custom teacher checkpoints using the traces exported from production. Actual results vary with task difficulty, data quality, and the strength of the underlying student model, but the closed loop plus GRPO stack gives you levers to reach those numbers. Online continual learning lives in the atlas-sdk runtime if you need rapid, task-specific adaptation.
Hardware & Setup
What hardware do I need?
Minimum Requirements:
- GPU: 16GB VRAM (RTX 4080, A5000)
- RAM: 32GB system memory
- Storage: 100GB for models and data

Recommended (for training):
- GPU: 4× A100 40GB or H100 80GB
- RAM: 128GB+ system memory
- Storage: 500GB NVMe SSD

Alternatives:
- Can run on CPU (slower)
- 8GB VRAM with quantization
- Cloud instances work well
Can I run ATLAS on CPU?
Yes. With API-based models you need no GPU at all. For local model inference:
- CPU inference is 10-50x slower than GPU
- Limited to smaller models (4B-8B)
- Quantization recommended
- Suitable for development/testing
Which models are compatible?
Verifying teacher checkpoints (pre-trained):
- ATLAS-8B-Thinking (reasoning)
- ATLAS-8B-Instruct (coding)

Compatible student models:
- Qwen series (4B-70B)
- Llama series (7B-70B)
- Mistral/Mixtral models
- GPT-3.5/4 (via API)
- Claude (via API)
Training Questions
How long does training take?
Offline RL Training (GRPO):
- SFT warmup: 4-8 hours
- GRPO training: 24-48 hours
- Hardware: 4-8 H100 GPUs
What’s the difference between online and offline training?
Offline Training (GRPO):
- Creates foundational teaching skills
- Requires significant compute
- Produces generalizable models
- One-time investment
Online Training:
- Adapts to specific tasks using the runtime loop
- Runs through the SDK CLI and APIs
- Rapid iteration cycles driven by live traces
- Keeps production agents improving between offline training runs
Can I train on custom data?
Yes, prepare your data in this format:
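A minimal sketch, assuming a JSONL layout with one record per task and prompt/ground-truth fields; the exact field names in your pipeline may differ:

```jsonl
{"prompt": "Prove that the sum of two even integers is even.", "ground_truth": "Let a = 2m and b = 2n ...", "domain": "math"}
{"prompt": "Write a function that reverses a linked list.", "ground_truth": "def reverse(head): ...", "domain": "coding"}
```

Implementation Questions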
How do I integrate ATLAS into my application?
Use the ATLAS teaching protocol: a diagnostic pass that assesses the student, followed by a targeted teaching pass that guides the final answer.
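A structural sketch of that flow with generic callables for the student and teacher; the function and prompt wording below are illustrative placeholders, not the actual atlas-sdk API (the SDK documentation covers the real adapters):

```python
from typing import Callable

# Placeholder callables: wrap your own agent and teacher model/API clients here.
StudentFn = Callable[[str], str]
TeacherFn = Callable[[str], str]

def atlas_two_pass(task: str, student: StudentFn, teacher: TeacherFn) -> str:
    """Illustrative two-pass loop: diagnose the student, then teach before answering."""
    # Pass 1: diagnostic assessment - probe the student's initial approach.
    draft = student(f"Outline your approach to this task:\n{task}")

    # The verifying teacher reviews the draft and produces targeted guidance.
    guidance = teacher(
        f"Task:\n{task}\n\nStudent approach:\n{draft}\n\n"
        "Identify gaps and give concise, targeted teaching."
    )

    # Pass 2: the student answers with the teacher's guidance in context.
    return student(f"Task:\n{task}\n\nGuidance from teacher:\n{guidance}\n\nSolve the task.")
```

Can ATLAS work with my existing agent?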
Yes. Use the atlas-sdk runtime wrappers (HTTP, Python callable, OpenAI Assistants, CLI) to orchestrate your agent, export traces, and hand them to Atlas Core for training. The SDK documentation covers the available adapters and configuration options.
How do I monitor performance in production?
Use RIM reward scoring to track quality, and export the scores to your monitoring stack (a sketch follows the list):
- Weights & Biases
- TensorBoard
- Prometheus
- Custom logging systems
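A minimal sketch of exporting per-session reward scores to Weights & Biases; how the RIM score is produced is left as a placeholder, and the project and metric names are assumptions:

```python
import wandb

# Illustrative: assume each completed session yields a RIM reward score.
sessions = [
    {"session_id": "abc123", "rim_reward": 0.82, "degraded": False},
    {"session_id": "def456", "rim_reward": 0.64, "degraded": False},
]

run = wandb.init(project="atlas-monitoring")  # project name is an assumption
for s in sessions:
    wandb.log({"rim_reward": s["rim_reward"], "degraded": int(s["degraded"])})
run.finish()
```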
Performance & Optimization
Why is inference slow?
Common causes and solutions:
- Not using Flash Attention: enable Flash Attention 2 on supported GPUs (see the sketch after this list)
- Small batch size: batch requests together to improve GPU utilization
- No caching: cache repeated prompts and intermediate results where possible
- CPU inference: Use GPU or quantization
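A minimal sketch of loading a local model with Flash Attention 2 via Hugging Face transformers; the model id is a placeholder, and it assumes flash-attn is installed and your GPU supports it:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/your-teacher-checkpoint"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # half precision reduces memory and speeds up inference
    attn_implementation="flash_attention_2",  # requires flash-attn and a supported GPU
    device_map="auto",                        # place weights on available GPUs
)
```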
How can I reduce memory usage?
Progressive solutions:
- Quantization (75% reduction): load the model in 4-bit (see the sketch after this list)
- Smaller models: Use 4B instead of 8B
- Offloading: Move to CPU/disk
- Batch size: Reduce to 1
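A minimal sketch of 4-bit quantization with bitsandbytes through transformers; the model id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # roughly 75% memory reduction vs fp16
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-teacher-checkpoint",   # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```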
What if the teacher makes things worse?
ATLAS maintains its 97% non-degradation rate through:
- Zero reward for performance drops
- Safety validation before deployment
- Fallback to baseline response
- Continuous monitoring
If you still see degradation:
- Check task-model compatibility
- Verify data quality
- Adjust teaching parameters
- Export fresh traces and schedule a GRPO training run
Cost Questions
How much does ATLAS cost to run?
Training Costs:
- Offline RL: $100-500 in compute (depends on GPUs and run length)

Inference Costs:
- Self-hosted: Electricity only
- Cloud GPU: $1-3/hour
- API-based: $0.001-0.01 per request
Is there a cloud service?
Currently ATLAS is open-source only. You can:
- Self-host on your infrastructure
- Use cloud GPU providers
- Deploy on Hugging Face Spaces
- Contact team for enterprise support
Troubleshooting
Where can I get help?
How do I report a bug?
File an issue with:
- Error message and stack trace
- System configuration
- Minimal reproduction code
- Expected vs actual behavior
Can I contribute to ATLAS?
Yes! We welcome contributions:
- Code improvements
- Documentation
- Bug fixes
- New features
- Dataset contributions