
What is ATLAS?
ATLAS is a continual learning framework for production LLM agents. It combines runtime quality control with offline reinforcement learning to improve agent reliability, reduce token costs, and build domain expertise through persistent memory.

The system layers a dual-agent reasoning loop on top of any model (GPT, Claude, Gemini, or custom checkpoints). A verifying teacher reviews the agent’s work at runtime, providing guidance when needed. The Atlas SDK streams those causality traces into Postgres, and Atlas Core reuses that exact data to run on-policy distillation (GKD) or GRPO, depending on whether a trusted teacher exists. This keeps the SDK and trainer in lockstep: the conversations you inspect in the runtime viewer are the same ones the distillation or GRPO jobs ingest when you launch them from Atlas Core.

ATLAS provides full observability and control. You own your data, models, and training decisions. The framework handles orchestration complexity while you configure supervision policies, reward functions, and deployment strategies. When you are ready to train, you can either export a JSONL snapshot or point the GKD/GRPO trainers directly at the same Postgres instance the SDK uses; both paths preserve the schema emitted during runtime.

The Value Proposition
- Reduced token costs: ATLAS uses adaptive supervision lanes to allocate expensive reasoning only when needed, and improves policy efficiency over time via reward-guided optimization and teacher checkpoint updates.
- Increased task success rate: the student–teacher architecture performs real-time coaching and escalation, correcting errors before they impact production workflows.
- Compounded & transferable knowledge: persistent memory and offline RL turn production traces into learning traces, updating the teacher model so the agent continuously improves post-deployment. Your agent builds a durable library of domain expertise rather than treating deployment as a static endpoint.

The result is agents that become cheaper, more accurate, and more reliable the longer they run.

How It Works: Closed-Loop Learning System
ATLAS wraps any base model (GPT, Claude, Gemini, open-source checkpoints, or your own) with an inference-time closed-loop learning system that observes the agent’s action space in its live environment. The system executes tasks with built-in quality control that reviews every decision, and the Reward System scores the outcome. That signal can immediately trigger retries or feed downstream training jobs. The same loop powers both the runtime SDK (real-time quality control) and the training stack (offline optimization).

What ATLAS Provides
ATLAS combines four components to create a complete learning loop. The Reasoning Core handles the dual-agent orchestration (student plus verifying teacher), and the Reward System turns feedback into dense signals. The Learning Engine runs either on-policy distillation (GKD) or GRPO depending on the workflow: if a strong teacher exists, you run the distillation recipes against the SDK’s Postgres database to compress it into a smaller checkpoint; when you need exploration, you switch to the GRPO pipeline. Persistent Memory stores every interaction in structured trace files so both trainers see the same conversations. Together these components form a closed loop: interaction traces flow into the reward system, the learning engine upgrades the reasoning core, and the refreshed models redeploy so your agent improves with each task.
ATLAS keeps your agent in a learn–evaluate–update cycle.
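Below is a minimal sketch of that cycle’s runtime entry point, assuming the SDK is installed (`pip install arc-atlas`). Only `atlas.core.run(..., stream_progress=True)` is taken from the lifecycle table later on this page; the `task` keyword and the shape of the return value are illustrative assumptions, not a confirmed API surface.

```python
import atlas.core

# Run one task through the dual-agent loop, streaming plan reviews,
# per-step traces, and live reward scores as they happen.
result = atlas.core.run(
    task="Summarize open incidents and propose a mitigation plan",  # assumed kwarg
    stream_progress=True,
)

# The orchestrator routes the task into auto, paired, or coach mode,
# the verifying teacher intervenes when needed, and the Reward System
# scores the outcome. Inspect whatever the run returns.
print(result)
```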
Runtime for ML Engineers
- Autodiscovery CLI – Install the SDK (`pip install arc-atlas`), run `atlas env init` to discover your agent/environment pair, and execute tasks with `atlas run`. The CLI loads `.env`, scaffolds configs when needed, and records metadata under `.atlas/`.
- Orchestrator loop – Each run triages a task, probes capability, and routes into `auto`, `paired`, or `coach` mode. The student agent works alongside a verifying teacher while telemetry streams through `atlas.runtime.telemetry`.
- Telemetry & exports – Persist sessions to Postgres (`storage` block) and export reviewed traces with the CLI (`arc-atlas … --include-status approved --output traces.jsonl`). Review gating keeps production datasets safe before they feed training, and both the GKD and GRPO trainers can attach to either the live Postgres instance or the exported JSONL file; the sketch after this list shows how to inspect an export.
- Learning playbooks – The runtime synthesizes student/teacher playbooks and stores them in `learning_registry`; see Learning System Architecture for how playbooks influence future prompts.
- Offline training – Feed exported traces into the Runtime Traces dataset config and GRPO trainers to ship bespoke teachers without hand-labeling.
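The exported file is plain JSONL, so you can sanity-check it with the standard library before pointing a trainer at it. A minimal sketch, assuming a `traces.jsonl` produced by the export command above; the per-session schema is whatever the SDK emitted, so print it rather than trusting any particular field names.

```python
import json

# Load the JSONL export produced by `arc-atlas ... --output traces.jsonl`.
with open("traces.jsonl") as fh:
    sessions = [json.loads(line) for line in fh if line.strip()]

print(f"{len(sessions)} reviewed sessions ready for GKD/GRPO")

# Print the top-level keys of a few sessions so you can map the runtime
# schema onto the trainer's dataset config before launching a job.
for session in sessions[:3]:
    print(sorted(session.keys()))
```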
See the Atlas CLI Reference for the full command list.
Data Ownership: Atlas never modifies model weights during runtime—only RL training (which you control) updates weights. Trace storage is optional and self-hosted. You own all data.
Runtime vs. Training: Online continual learning (adaptive runtime with dual-agent orchestration) is implemented in the atlas-sdk. Offline RL training (GRPO) is implemented in Atlas Core (this repository).
End-to-End Lifecycle at a Glance
| Stage | Run This | Output | Typical Effort |
|---|---|---|---|
| Runtime quality control | `atlas.core.run(..., stream_progress=True)` | Reviewed plan, per-step traces, live reward scores | Minutes |
| Persist + export | `storage:` block + `arc-atlas --database-url … --include-status approved --output traces.jsonl` | JSONL dataset mirroring production behaviour | Minutes |
| Export + train workflow | `scripts/run_offline_pipeline.py` | New teacher checkpoint built from runtime traces | Minutes to launch (training time depends on compute) |
| Custom training | GRPO pipeline | Bespoke teacher checkpoint, ready to deploy | Multi-hour job on GPUs |
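The middle two stages chain together naturally. A minimal sketch of that handoff, assuming the CLI flags shown in the table; the database URL is a placeholder for your own instance, and the `--traces` flag on `scripts/run_offline_pipeline.py` is an assumption rather than a documented option.

```python
import subprocess

# Persist + export: pull reviewed sessions out of the SDK's Postgres instance.
subprocess.run(
    [
        "arc-atlas",
        "--database-url", "postgresql://atlas:atlas@localhost:5432/atlas",  # placeholder URL
        "--include-status", "approved",
        "--output", "traces.jsonl",
    ],
    check=True,
)

# Export + train: hand the fresh JSONL to the offline pipeline.
# The --traces flag is assumed; check the script's --help for real options.
subprocess.run(
    ["python", "scripts/run_offline_pipeline.py", "--traces", "traces.jsonl"],
    check=True,
)
```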
Getting Started: Two Paths
Choose your starting point based on your goal:

🔧 Ready to ship code? Start with the SDK Quickstart, which walks through installation, configuration, and running your first dual-agent task in minutes.

See the Atlas SDK in action: from installation to measurable performance gains across real examples.
| I want to… | Use this Path | Key Docs |
|---|---|---|
| Orchestrate tasks with a structured runtime loop. | Atlas SDK | SDK Quickstart |
| Wrap my existing agent in a quality-control loop. | Atlas SDK | BYOA Adapters |
| Distill runtime traces into a smaller teacher. | Atlas Core | GKD Training |
| Convert runtime traces into GRPO training runs. | Atlas Core | Offline Training Guide |
SDK Runtime Orchestration
Use the Atlas orchestrator to run an existing agent with a closed-loop learning system. Get started in minutes.
Offline Training (Atlas Core)
Convert exported runtime traces into GRPO training jobs, evaluate reward deltas, and ship updated teacher checkpoints.
Research & Resources
Learn more about the methodology and science behind ATLAS:

- ATLAS Technical Report (PDF) – Complete methodology, benchmarks, and implementation details
- Arc Research – Our latest research advancing continual learning systems
- GitHub Repository – Source code, examples, and issue tracking
- HuggingFace Models – Pre-trained models
- Evaluation Harnesses – Scripts for measuring runtime, reward, and learning performance