What is ATLAS?

ATLAS lets your agents learn from every task they run in production. Through persistent memory, agents improve reliability, reduce token costs, and build domain expertise over time.

ATLAS turns your production environment into a training ground: learning happens at inference time, so your agent's performance does not plateau. Every interaction feeds a continuous loop of feedback, improvement, and redeployment, so deployment is not the end of learning.

How it works: you wrap your agent within your current stack and environment, and we handle the complexity of continual learning at scale.

The result: higher task success rates, lower token usage, and increased reliability for mission-critical workflows.

ATLAS gives you full observability and control over the learning process. The framework handles the orchestration complexity while you retain ownership of your data, models, and training decisions.

The Value Proposition

  • Reduced token costs: ATLAS uses adaptive supervision lanes to allocate expensive reasoning only when needed, and improves policy efficiency over time via reward-guided optimization and teacher checkpoint updates.
  • Increased task success rate: The student and teacher architecture performs real-time coaching and escalation, correcting errors before they impact production workflows.
  • Compounded & transferable knowledge: Persistent memory and offline RL turn production traces into learning traces, updating the teacher model so the agent continuously improves post-deployment. Your agent builds a durable library of domain expertise rather than treating deployment as a static endpoint.
This results in agents that become cheaper, more accurate, and more reliable the longer they run.

How It Works: Closed-Loop Learning System

ATLAS wraps any base model (GPT, Claude, Gemini, open source checkpoints, or your own) with an inference-time closed-loop learning system that observes the agent’s action space in its live environment. The system executes tasks with built-in quality control that reviews every decision, and the Reward System scores the outcome. That signal can immediately trigger retries or feed downstream training jobs. The same loop powers both the runtime SDK (real-time quality control) and the training stack (offline optimization).
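As a rough illustration of how a reward signal can trigger retries, the sketch below re-runs a task until the score clears a threshold. It is not the ATLAS SDK: `run_with_retries`, `execute`, `score`, `threshold`, and `max_attempts` are hypothetical names chosen for this example, and the threshold value is an arbitrary assumption.

```python
from typing import Callable, Tuple


def run_with_retries(
    task: str,
    execute: Callable[[str], str],       # hypothetical: runs the agent on a task
    score: Callable[[str, str], float],  # hypothetical: reward for (task, output)
    threshold: float = 0.8,              # assumed cut-off, not an ATLAS default
    max_attempts: int = 3,
) -> Tuple[str, float, int]:
    """Retry a task until its reward reaches the threshold or attempts run out.

    Returns the best output seen, its reward, and the number of attempts used.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    best_output, best_reward = "", float("-inf")
    for attempt in range(1, max_attempts + 1):
        output = execute(task)
        reward = score(task, output)
        # Keep the highest-scoring attempt in case none clears the threshold.
        if reward > best_reward:
            best_output, best_reward = output, reward
        if reward >= threshold:
            break
    return best_output, best_reward, attempt


if __name__ == "__main__":
    # Stub agent whose answers improve on each attempt.
    drafts = iter(["draft", "better draft", "final answer"])
    rewards = {"draft": 0.2, "better draft": 0.5, "final answer": 0.9}
    print(run_with_retries(
        "summarize the ticket",
        execute=lambda _task: next(drafts),
        score=lambda _task, out: rewards[out],
    ))  # ('final answer', 0.9, 3)
```

The same `(task, output, reward)` triples could also be logged for later training, which is the second use of the signal described above.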

What ATLAS Provides

ATLAS wraps your existing agent framework with four components that create a complete learning loop:
  1. Reasoning Core: Dual-agent reasoning loop (student + verifying teacher) that guides execution and captures learning signals
  2. Reward System: Turns user feedback into dense reward signals (achieves 93.7% accuracy on RewardBench V2)
  3. Learning Engine: Uses offline reinforcement learning (GRPO) to update models based on rewards
  4. Persistent Memory: Stores all interactions in structured trace files for analysis and retraining
Together, these components form a closed-loop system: interaction traces flow into the reward system, the learning engine upgrades the reasoning core, and the refreshed models redeploy so your agent improves performance with each task.
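To make the Learning Engine step more concrete, here is a minimal sketch of the group-relative advantage that gives GRPO its name: several responses to the same task are scored, and each reward is normalized against the group's mean and standard deviation. This shows only the general technique. How the ATLAS trainers compute it is not stated here, and the function name, the use of population standard deviation, and the `eps` term are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards for one group of responses to the same task.

    advantage_i = (reward_i - group mean) / (group std + eps)
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some trainers use sample std
    # eps keeps the division defined when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three sampled responses to one task, scored by a reward model.
    for adv in group_relative_advantages([0.0, 0.5, 1.0]):
        print(round(adv, 3))
```

Responses that score above the group average get a positive advantage and are reinforced. Those below get a negative one. A group with identical rewards contributes no learning signal.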
[Figure: ATLAS System Architecture]

ATLAS keeps your agent in a learn–evaluate–update cycle.

Runtime for ML Engineers

  • Autodiscovery CLI – Install the SDK (pip install arc-atlas), run atlas env init to discover your agent/environment pair, and execute tasks with atlas run. The CLI loads .env, scaffolds configs when needed, and records metadata under .atlas/.
  • Orchestrator loop – Each run triages a task, probes capability, and routes into auto, paired, coach, or escalate. The student agent works alongside a verifying teacher while telemetry streams through atlas.runtime.telemetry.
  • Telemetry & exports – Persist sessions to Postgres (storage block) and export reviewed traces with the CLI (arc-atlas … --include-status approved --output traces.jsonl). Review gating keeps production datasets safe before they feed training.
  • Learning playbooks – The runtime synthesizes student/teacher playbooks and stores them in learning_registry; see Learning System Architecture for how playbooks influence future prompts.
  • Offline training – Feed exported traces into the Runtime Traces dataset config and GRPO trainers to ship bespoke teachers without hand-labeling.
For CLI details and flags, read the Atlas CLI Reference.
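The routing step of the orchestrator loop can be pictured as a mapping from a capability estimate to one of the four routes named above. The sketch below is illustrative only. The `Lane` enum, the `choose_lane` function, the confidence score, and every threshold are hypothetical. It also assumes the routes are ordered from least to most teacher involvement, which is not spelled out here.

```python
from enum import Enum


class Lane(Enum):
    AUTO = "auto"
    PAIRED = "paired"
    COACH = "coach"
    ESCALATE = "escalate"


def choose_lane(confidence: float) -> Lane:
    """Map a capability estimate in [0, 1] to a route.

    Thresholds are made up for illustration. Higher confidence means less
    supervision, so expensive reasoning is spent only on harder tasks.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.85:
        return Lane.AUTO
    if confidence >= 0.60:
        return Lane.PAIRED
    if confidence >= 0.30:
        return Lane.COACH
    return Lane.ESCALATE


if __name__ == "__main__":
    for c in (0.95, 0.7, 0.4, 0.1):
        print(c, choose_lane(c).value)
```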
Data Ownership: Atlas never modifies model weights during runtime; only RL training (which you control) updates weights. Trace storage is optional and self-hosted. You own all data.
The runtime provides immediate quality improvements through dual-agent orchestration. The same captured traces can be exported to train custom checkpoints with GRPO, so one set of traces serves both runtime learning and offline RL training.
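Before handing an exported JSONL file to a trainer, it can be useful to check that only reviewed records are present. The sketch below filters JSON Lines records by a status field. The export schema is not documented in this section, so the `review_status` field name and the sample records are assumptions. The CLI's own `--include-status` flag already does this filtering at export time.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def approved_traces(
    lines: Iterable[str],
    status_field: str = "review_status",  # hypothetical field name
    wanted: str = "approved",
) -> Iterator[Dict[str, Any]]:
    """Yield parsed JSONL records whose status field equals `wanted`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        if record.get(status_field) == wanted:
            yield record


if __name__ == "__main__":
    # Stand-in for lines read from a traces.jsonl file.
    sample = [
        '{"task": "a", "review_status": "approved"}',
        '{"task": "b", "review_status": "pending"}',
        "",
        '{"task": "c", "review_status": "approved"}',
    ]
    print([r["task"] for r in approved_traces(sample)])  # ['a', 'c']
```

With a real file, pass the open file handle as `lines`, since iterating a text file yields one line at a time.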
Runtime vs. Training: Online continual learning (adaptive runtime with dual-agent orchestration) is implemented in the atlas-sdk. Offline RL training (GRPO) is implemented in Atlas Core (this repository).

End-to-End Lifecycle at a Glance

| Stage | Run This | Output | Typical Effort |
| --- | --- | --- | --- |
| Runtime quality control | atlas.core.run(..., stream_progress=True) | Reviewed plan, per-step traces, live reward scores | Minutes |
| Persist + export | storage: block + arc-atlas --database-url … --include-status approved --output traces.jsonl | JSONL dataset mirroring production behaviour | Minutes |
| Export + train workflow | scripts/run_offline_pipeline.py | Convert runtime traces into a new teacher checkpoint | Minutes to launch (training time depends on compute) |
| Custom training | GRPO pipeline | Bespoke teacher checkpoint, ready to deploy | Multi-hour job on GPUs |
Every stage feeds the next—runtime traces become the input for optimization and training.

Getting Started: Two Paths

Choose your starting point based on your goal:
🔧 Ready to ship code? Start with the SDK Quickstart—it walks through installation, configuration, and running your first dual-agent task in minutes.

| I want to… | Use this Path | Key Docs |
| --- | --- | --- |
| Orchestrate tasks with a structured runtime loop. | Atlas SDK | SDK Quickstart |
| Wrap my existing agent in a quality-control loop. | Atlas SDK | BYOA Adapters |
| Convert runtime traces into GRPO training runs. | Atlas Core | Offline Training Guide |
| Fine-tune a custom model with RL. | Training & Optimization | Offline Training Guide |

Research & Resources

Learn more about the methodology and science behind ATLAS: