Introduction

What is ATLAS?

ATLAS lets your agents learn from every task they run in production. Through persistent memory, agents improve their reliability, reduce token costs, and build domain expertise over time. ATLAS turns your production environment into a training ground: learning happens at inference time, so your agent’s performance does not plateau after deployment. Every interaction feeds a continuous loop of feedback, improvement, and redeployment, so deployment is not the end of learning.

How it works: you wrap your agent within your current stack and environment, and we handle the complexity of continual learning at scale. The result is higher task success rates, lower token usage, and more reliable mission-critical workflows.

ATLAS gives you full observability and control over the learning process. The framework handles the orchestration complexity while you retain ownership of your data, models, and training decisions.

The Value Proposition

Reduced token costs: ATLAS uses adaptive supervision lanes to spend expensive reasoning only when it is needed. It also improves policy efficiency over time through reward-guided optimization and teacher checkpoint updates.
Increased task success rate: the student and teacher architecture performs real-time coaching and escalation, correcting errors before they affect production workflows.
Compounded & transferable knowledge: persistent memory and offline RL turn production traces into learning traces and update the teacher model, so the agent keeps improving after deployment. Your agent builds a durable library of domain expertise instead of treating deployment as a static endpoint.

The result: agents that become cheaper, more accurate, and more reliable the longer they run.

How It Works: Closed-Loop Learning System

ATLAS wraps any base model (GPT, Claude, Gemini, open-source checkpoints, or your own) with an inference-time, closed-loop learning system that observes the agent’s action space in its live environment. Tasks run with built-in quality control that reviews every decision, and the Reward System scores each outcome. That reward signal can immediately trigger a retry or feed downstream training jobs. The same loop powers both the runtime SDK (real-time quality control) and the training stack (offline optimization).
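The execute, score, and retry-or-record cycle described above can be sketched in plain Python. This is an illustrative sketch, not the Atlas SDK API. `run_closed_loop`, `execute`, `score`, the 0.7 reward threshold, and the retry budget are all hypothetical stand-ins for the agent and the Reward System.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (task, output, reward) triples kept for downstream training.
Trace = Tuple[str, str, float]


@dataclass
class LoopResult:
    output: str
    reward: float
    attempts: int
    traces: List[Trace] = field(default_factory=list)


def run_closed_loop(
    task: str,
    execute: Callable[[str, int], str],
    score: Callable[[str, str], float],
    threshold: float = 0.7,
    max_attempts: int = 3,
) -> LoopResult:
    """Execute a task, score each attempt, and retry while the reward is below threshold.

    `execute` stands in for the agent and `score` for the reward system;
    both are hypothetical callables, not Atlas SDK functions.
    """
    traces: List[Trace] = []
    output, reward, attempts = "", 0.0, 0
    while attempts < max_attempts:
        attempts += 1
        output = execute(task, attempts)
        reward = score(task, output)
        traces.append((task, output, reward))  # every attempt is kept as a learning signal
        if reward >= threshold:
            break  # good enough: stop retrying
    return LoopResult(output, reward, attempts, traces)


# Toy agent that only answers correctly on its second attempt.
result = run_closed_loop(
    "What is 2 + 2?",
    execute=lambda task, attempt: "4" if attempt >= 2 else "5",
    score=lambda task, output: 1.0 if output == "4" else 0.0,
)
print(result.attempts, result.reward)  # 2 1.0
```

The sketch keeps failed attempts alongside the successful one. That is what lets a single loop serve both purposes: real-time retries and a record for later offline training.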

What ATLAS Provides

ATLAS wraps your existing agent framework with four components that create a complete learning loop:

Reasoning Core: Dual-agent reasoning loop (student + verifying teacher) that guides execution and captures learning signals
Reward System: Turns user feedback into dense reward signals (achieves 93.7% accuracy on RewardBench V2)
Learning Engine: Uses offline reinforcement learning (GRPO) to update models based on rewards
Persistent Memory: Stores all interactions in structured trace files for analysis and retraining
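GRPO (Group Relative Policy Optimization) avoids training a separate value model. Instead, it samples several responses per prompt and normalizes each response's reward against the rest of its group. The sketch below shows only that advantage computation. It assumes the rewards for one prompt arrive as a list of floats. The function name and `eps` are illustrative and are not taken from Atlas Core's trainer code. This version uses the population standard deviation; some implementations use the sample standard deviation instead.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt, scored by a reward model.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])  # [1.414, -1.414, 0.0, 0.0]
```

Responses above the group mean get positive advantages and are reinforced; those below are discouraged. A group whose rewards are all identical yields zero advantages, so it contributes no learning signal.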

Together, these components form a closed-loop system: interaction traces flow into the reward system, the learning engine upgrades the reasoning core, and the refreshed models redeploy so your agent improves performance with each task.

ATLAS keeps your agent in a learn–evaluate–update cycle.

Runtime for ML Engineers

Autodiscovery CLI – Install the SDK (pip install arc-atlas), run atlas env init to discover your agent/environment pair, and execute tasks with atlas run. The CLI loads .env, scaffolds configs when needed, and records metadata under .atlas/.
Orchestrator loop – Each run triages the task, probes capability, and routes the task into one of four lanes: auto, paired, coach, or escalate. The student agent works alongside a verifying teacher while telemetry streams through atlas.runtime.telemetry.
Telemetry & exports – Persist sessions to Postgres (storage block) and export reviewed traces with the CLI (arc-atlas … --include-status approved --output traces.jsonl). Review gating keeps production datasets safe before they feed training.
Learning playbooks – The runtime synthesizes student/teacher playbooks and stores them in learning_registry; see Learning System Architecture for how playbooks influence future prompts.
Offline training – Feed exported traces into the Runtime Traces dataset config and GRPO trainers to ship bespoke teachers without hand-labeling.
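The triage-and-route step in the orchestrator loop can be pictured as a threshold policy over a capability-probe score. This is a hypothetical sketch only. The score range, the cut-offs, the `route` function, and the assumption that lanes run from lightest (auto) to heaviest (escalate) supervision are all illustrative. None of them describes how atlas-sdk actually chooses a lane.

```python
from enum import Enum


class Lane(str, Enum):
    AUTO = "auto"
    PAIRED = "paired"
    COACH = "coach"
    ESCALATE = "escalate"


# Hypothetical cut-offs on a capability-probe score in [0, 1],
# checked from the most to the least confident lane.
THRESHOLDS = [(0.85, Lane.AUTO), (0.6, Lane.PAIRED), (0.35, Lane.COACH)]


def route(probe_score: float) -> Lane:
    """Pick a lane: the more confident the probe, the lighter the (assumed) supervision."""
    if not 0.0 <= probe_score <= 1.0:
        raise ValueError("probe_score must be in [0, 1]")
    for cutoff, lane in THRESHOLDS:
        if probe_score >= cutoff:
            return lane
    return Lane.ESCALATE  # low confidence: send the task to the heaviest lane


for probe in (0.95, 0.7, 0.4, 0.1):
    print(probe, route(probe).value)
```

Keeping the cut-offs in one ordered table makes the policy easy to tune from telemetry without touching the routing logic.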

For CLI details and flags, read the Atlas CLI Reference.

Data Ownership: Atlas never modifies model weights during runtime—only RL training (which you control) updates weights. Trace storage is optional and self-hosted. You own all data.

The runtime provides immediate quality improvements through dual-agent orchestration. You can export the same traces to train custom checkpoints with GRPO, so the traces captured at runtime serve both runtime learning and offline RL training.
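Review gating amounts to filtering sessions by review status before they are written to the JSONL dataset. The sketch below mimics the effect of `--include-status approved --output traces.jsonl` on in-memory records. The `export_reviewed` function and the record fields (`task`, `final_answer`, `reward`, `review_status`) are hypothetical. The real CLI reads sessions from Postgres with its own schema.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable


def export_reviewed(
    sessions: Iterable[dict],
    output: Path,
    include_status: Iterable[str] = ("approved",),
) -> int:
    """Write only sessions whose review status is allowed, one JSON object per line."""
    allowed = set(include_status)
    written = 0
    with output.open("w", encoding="utf-8") as fh:
        for session in sessions:
            if session.get("review_status") not in allowed:
                continue  # unreviewed or rejected sessions never reach training data
            fh.write(json.dumps(session) + "\n")
            written += 1
    return written


# Hypothetical session records; real exports carry a richer schema.
sessions = [
    {"task": "triage ticket", "final_answer": "P2", "reward": 0.9, "review_status": "approved"},
    {"task": "draft reply", "final_answer": "Hi", "reward": 0.2, "review_status": "rejected"},
    {"task": "summarize log", "final_answer": "ok", "reward": 0.8, "review_status": "pending"},
]
out_path = Path(tempfile.mkdtemp()) / "traces.jsonl"
count = export_reviewed(sessions, out_path)
lines = out_path.read_text(encoding="utf-8").splitlines()
print(count, json.loads(lines[0])["task"])  # 1 triage ticket
```

Filtering happens at write time, so a pending or rejected session cannot leak into the dataset even if it scored well.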

Runtime vs. Training: Online continual learning (adaptive runtime with dual-agent orchestration) is implemented in the atlas-sdk. Offline RL training (GRPO) is implemented in Atlas Core (this repository).

End-to-End Lifecycle at a Glance

Stage	Run This	Output	Typical Effort
Runtime quality control	`atlas.core.run(..., stream_progress=True)`	Reviewed plan, per-step traces, live reward scores	Minutes
Persist + export	`storage:` block + `arc-atlas --database-url … --include-status approved --output traces.jsonl`	JSONL dataset mirroring production behaviour	Minutes
Export + train workflow	`scripts/run_offline_pipeline.py`	New teacher checkpoint trained on runtime traces	Minutes to launch (training time depends on compute)
Custom training	GRPO pipeline	Bespoke teacher checkpoint, ready to deploy	Multi-hour job on GPUs

Every stage feeds the next—runtime traces become the input for optimization and training.

Getting Started: Two Paths

Choose your starting point based on your goal:

🔧 Ready to ship code? Start with the SDK Quickstart—it walks through installation, configuration, and running your first dual-agent task in minutes.


I want to…	Use this Path	Key Docs
Orchestrate tasks with a structured runtime loop.	Atlas SDK	`SDK Quickstart`
Wrap my existing agent in a quality-control loop.	Atlas SDK	`BYOA Adapters`
Convert runtime traces into GRPO training runs.	Atlas Core	`Offline Training Guide`
Fine-tune a custom model with RL.	Training & Optimization	`Offline Training Guide`

Choose your starting point:

SDK Runtime Orchestration

Use the Atlas orchestrator to run an existing agent with a closed-loop learning system. Get started in minutes.

Offline Training (Atlas Core)

Convert exported runtime traces into GRPO training jobs, evaluate reward deltas, and ship updated teacher checkpoints.
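Evaluating reward deltas can be as simple as scoring the same held-out tasks with the baseline and the updated teacher, then comparing per-task rewards. A minimal sketch follows. It assumes rewards have already been collected into dicts keyed by task ID. `reward_delta` and `DeltaReport` are hypothetical names, not part of Atlas Core.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict


@dataclass
class DeltaReport:
    mean_delta: float  # average (candidate - baseline) reward
    win_rate: float    # fraction of tasks where the candidate scored strictly higher
    n_tasks: int


def reward_delta(baseline: Dict[str, float], candidate: Dict[str, float]) -> DeltaReport:
    """Compare per-task rewards on the tasks both checkpoints were evaluated on."""
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("no overlapping tasks to compare")
    deltas = [candidate[t] - baseline[t] for t in shared]
    wins = sum(d > 0 for d in deltas)
    return DeltaReport(mean(deltas), wins / len(shared), len(shared))


baseline = {"t1": 0.5, "t2": 0.75, "t3": 0.25, "t4": 0.5}
candidate = {"t1": 0.75, "t2": 0.75, "t3": 0.5, "t4": 0.25, "t5": 1.0}
report = reward_delta(baseline, candidate)
print(f"{report.mean_delta:+.4f} {report.win_rate:.2f} {report.n_tasks}")  # +0.0625 0.50 4
```

Comparing only the shared tasks (here `t5` is skipped) keeps the delta from being skewed by tasks that only one checkpoint saw. Reporting a win rate next to the mean also exposes regressions that an average can hide.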

Research & Resources

Learn more about the methodology and science behind ATLAS:

ATLAS Technical Report (PDF) - Complete methodology, benchmarks, and implementation details
Arc Research - Our latest research advancing continual learning systems
GitHub Repository - Source code, examples, and issue tracking
HuggingFace Models - Pre-trained models
Evaluation Harnesses - Scripts for measuring runtime, reward, and learning performance
