
What is ATLAS?

ATLAS is a continual learning framework for production LLM agents. It combines runtime quality control with offline reinforcement learning to improve agent reliability, reduce token costs, and build domain expertise through persistent memory. The system layers a dual-agent reasoning loop on top of any model (GPT, Claude, Gemini, or custom checkpoints): a verifying teacher reviews the agent’s work at runtime and provides guidance when needed. The Atlas SDK streams the resulting causality traces into Postgres, and Atlas Core reuses exactly that data to run on-policy distillation (GKD) when a trusted teacher exists, or GRPO when it does not. This keeps the SDK and the trainer in lockstep: the conversations you inspect in the runtime viewer are the same ones the distillation or GRPO jobs ingest when you launch them from Atlas Core.

ATLAS provides full observability and control. You own your data, models, and training decisions. The framework handles orchestration complexity while you configure supervision policies, reward functions, and deployment strategies. When you are ready to train, you can either export a JSONL snapshot or point the GKD or GRPO trainers directly at the same Postgres instance the SDK uses; both paths preserve the schema emitted at runtime.
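Why the two data paths are interchangeable can be illustrated with a small, runnable sketch: records read straight from the database and records read back from a JSONL export come out identical. The sketch uses Python’s built-in sqlite3 module as a stand-in for Postgres, and the traces table, its columns, and the helper names (load_from_db, to_jsonl, from_jsonl) are hypothetical; they are not the schema or API of the Atlas SDK. It also applies review gating by keeping only approved rows.

```python
import json
import sqlite3

# Hypothetical, simplified trace table. The real Atlas SDK schema differs;
# sqlite3 stands in for Postgres so the sketch runs without a server.
SCHEMA = """
CREATE TABLE traces (
    session_id    TEXT PRIMARY KEY,
    task          TEXT NOT NULL,
    final_answer  TEXT NOT NULL,
    reward        REAL NOT NULL,
    review_status TEXT NOT NULL
)
"""
COLUMNS = ("session_id", "task", "final_answer", "reward", "review_status")


def load_from_db(conn: sqlite3.Connection) -> list[dict]:
    """Read approved traces directly from the database (review gating)."""
    rows = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM traces "
        "WHERE review_status = 'approved' ORDER BY session_id"
    ).fetchall()
    return [dict(zip(COLUMNS, row)) for row in rows]


def to_jsonl(conn: sqlite3.Connection) -> str:
    """Export the same approved traces as JSONL: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in load_from_db(conn))


def from_jsonl(text: str) -> list[dict]:
    """Parse a JSONL snapshot back into records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

In this sketch, a trainer reading from_jsonl(to_jsonl(conn)) receives exactly the records load_from_db(conn) returns. To persist the snapshot, write the string to a file such as traces.jsonl.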

The Value Proposition

  • Reduced token costs: ATLAS uses adaptive supervision lanes to allocate expensive reasoning only when needed, and improves policy efficiency over time through reward-guided optimization and teacher checkpoint updates.
  • Increased task success rate: the student and teacher architecture performs real-time coaching and escalation, correcting errors before they affect production workflows.
  • Compounded and transferable knowledge: persistent memory and offline RL turn production traces into learning traces, updating the teacher model so the agent keeps improving after deployment. Your agent builds a durable library of domain expertise instead of treating deployment as a static endpoint.
The result is an agent that becomes cheaper, more accurate, and more reliable the longer it runs.

How It Works: Closed-Loop Learning System

ATLAS wraps any base model (GPT, Claude, Gemini, open-source checkpoints, or your own) in an inference-time closed-loop learning system that observes the agent’s action space in its live environment. Tasks execute with built-in quality control that reviews every decision, and the Reward System scores each outcome. That score can immediately trigger a retry or feed downstream training jobs. The same loop powers both the runtime SDK (real-time quality control) and the training stack (offline optimization).
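A minimal sketch of the loop described above, assuming hypothetical callables for the student, the teacher, and the reward function (none of these names come from the Atlas SDK): the student attempts the task, the reward function scores the answer, and a score below a threshold triggers teacher guidance and a retry. Every attempt is returned so the whole trace could be stored for offline training. The threshold of 0.8 and the retry limit are illustrative values, not Atlas defaults.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces for illustration only (not Atlas SDK APIs).
Student = Callable[[str, str], str]   # (task, guidance) -> answer
Teacher = Callable[[str, str], str]   # (task, answer) -> guidance
Reward = Callable[[str, str], float]  # (task, answer) -> score in [0, 1]


@dataclass
class Attempt:
    answer: str
    guidance: str  # teacher guidance the student received before this attempt
    reward: float


def run_closed_loop(task: str, student: Student, teacher: Teacher,
                    reward: Reward, threshold: float = 0.8,
                    max_retries: int = 2) -> list[Attempt]:
    """Attempt, score, and retry with teacher guidance while below threshold."""
    attempts: list[Attempt] = []
    guidance = ""
    while True:
        answer = student(task, guidance)
        score = reward(task, answer)
        attempts.append(Attempt(answer, guidance, score))
        # Stop on success, or once the initial try plus all retries are used.
        if score >= threshold or len(attempts) > max_retries:
            return attempts
        # Low score: ask the teacher for guidance before the next attempt.
        guidance = teacher(task, answer)
```

Keeping the failed attempts alongside the guidance that corrected them is what lets the same record serve both the runtime retry and a later training job.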

What ATLAS Provides

ATLAS combines four components into a complete learning loop. The Reasoning Core handles dual-agent orchestration (a student plus a verifying teacher), and the Reward System turns feedback into dense signals. The Learning Engine runs either on-policy distillation (GKD) or GRPO, depending on the workflow: if a strong teacher exists, you run the distillation recipes against the SDK’s Postgres database to compress that teacher into a smaller model; when you need exploration, you switch to the GRPO pipeline. Persistent Memory stores every interaction as structured traces, so both trainers see the same conversations. Together these components form a closed loop: interaction traces flow into the Reward System, the Learning Engine upgrades the Reasoning Core, and the refreshed models redeploy, so your agent’s performance improves with each task.
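The choice between the two training recipes, and the group-relative advantage GRPO uses as its learning signal, can each be sketched in a few lines. choose_trainer is a hypothetical helper that encodes the rule stated above; it is not an Atlas Core API. group_relative_advantages follows the standard GRPO formulation, in which each sampled response’s reward is normalized against the mean and standard deviation of the group sampled for the same prompt. This sketch uses the population standard deviation plus a small epsilon; real implementations differ in such details.

```python
from enum import Enum
from statistics import mean, pstdev


class Trainer(Enum):
    GKD = "gkd"    # on-policy distillation: compress a trusted teacher
    GRPO = "grpo"  # reinforcement learning: explore when no teacher suffices


def choose_trainer(has_trusted_teacher: bool,
                   needs_exploration: bool = False) -> Trainer:
    """Hypothetical rule: distill when a strong teacher exists, else use GRPO."""
    if has_trusted_teacher and not needs_exploration:
        return Trainer.GKD
    return Trainer.GRPO


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one prompt's group of sampled responses.

    Each reward is centered on the group mean and scaled by the group's
    (population) standard deviation; eps avoids division by zero when
    every response received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Because the baseline is the group’s own mean, responses that beat their siblings get positive advantages and the rest get negative ones, with no separate value model required.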
ATLAS System Architecture

ATLAS keeps your agent in a learn–evaluate–update cycle.

Runtime for ML Engineers

  • Autodiscovery CLI – Install the SDK (pip install arc-atlas), run atlas env init to discover your agent/environment pair, and execute tasks with atlas run. The CLI loads .env, scaffolds configs when needed, and records metadata under .atlas/.
  • Orchestrator loop – Each run triages a task, probes capability, and routes into auto, paired, or coach mode. The student agent works alongside a verifying teacher while telemetry streams through atlas.runtime.telemetry.
  • Telemetry & exports – Persist sessions to Postgres (storage block) and export reviewed traces with the CLI (arc-atlas … --include-status approved --output traces.jsonl). Review gating keeps production datasets safe before they feed training, and both the GKD and GRPO trainers can attach to either the live Postgres instance or the exported JSONL file.
  • Learning playbooks – The runtime synthesizes student/teacher playbooks and stores them in learning_registry; see Learning System Architecture for how playbooks influence future prompts.
  • Offline training – Feed exported traces into the Runtime Traces dataset config and GRPO trainers to ship bespoke teachers without hand-labeling.
For CLI details and flags, read the Atlas CLI Reference.
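The orchestrator’s routing step can be pictured as a mapping from a capability-probe score to one of the three modes. This is a minimal sketch under assumptions: the confidence score, the thresholds, and the reading of auto, paired, and coach as increasing levels of teacher involvement are illustrative and are not taken from the Atlas SDK.

```python
from enum import Enum


class Mode(Enum):
    # Assumed ordering: teacher involvement increases from AUTO to COACH.
    AUTO = "auto"
    PAIRED = "paired"
    COACH = "coach"


def route(confidence: float,
          auto_threshold: float = 0.85,
          paired_threshold: float = 0.5) -> Mode:
    """Map a capability-probe confidence in [0, 1] to a supervision mode.

    The thresholds are hypothetical illustration values, not Atlas defaults.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= auto_threshold:
        return Mode.AUTO    # high confidence: the student runs on its own
    if confidence >= paired_threshold:
        return Mode.PAIRED  # medium confidence: the teacher verifies the work
    return Mode.COACH       # low confidence: the teacher coaches closely
```

Routing on a single score keeps the more expensive teacher out of tasks the student already handles well, which is the idea behind allocating expensive reasoning only when needed.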
Data Ownership: Atlas never modifies model weights during runtime; only offline training (GKD or GRPO, which you control) updates weights. Trace storage is optional and self-hosted. You own all data.
The runtime provides immediate quality improvements through dual-agent orchestration. The same traces can then be exported to train custom checkpoints with GRPO, so data captured at runtime doubles as training data for offline RL.
Runtime vs. Training: Online continual learning (the adaptive runtime with dual-agent orchestration) is implemented in the atlas-sdk. Offline training (GKD distillation and GRPO) is implemented in Atlas Core (this repository).

End-to-End Lifecycle at a Glance

Each stage lists what to run, what it produces, and the typical effort:
  • Runtime quality control – Run: atlas.core.run(..., stream_progress=True). Output: reviewed plan, per-step traces, live reward scores. Typical effort: minutes.
  • Persist + export – Run: a storage: block in your config, then arc-atlas --database-url … --include-status approved --output traces.jsonl. Output: JSONL dataset mirroring production behaviour. Typical effort: minutes.
  • Export + train workflow – Run: scripts/run_offline_pipeline.py. Output: a new teacher checkpoint converted from runtime traces. Typical effort: minutes to launch (training time depends on compute).
  • Custom training – Run: GRPO pipeline. Output: bespoke teacher checkpoint, ready to deploy. Typical effort: multi-hour job on GPUs.
Every stage feeds the next—runtime traces become the input for optimization and training.

Getting Started: Two Paths

Choose your starting point based on your goal:
🔧 Ready to ship code? Start with the SDK Quickstart—it walks through installation, configuration, and running your first dual-agent task in minutes.



I want to…
  • Orchestrate tasks with a structured runtime loop – Path: Atlas SDK. Key docs: SDK Quickstart.
  • Wrap my existing agent in a quality-control loop – Path: Atlas SDK. Key docs: BYOA Adapters.
  • Distill runtime traces into a smaller teacher – Path: Atlas Core. Key docs: GKD Training.
  • Convert runtime traces into GRPO training runs – Path: Atlas Core. Key docs: Offline Training Guide.

Research & Resources

Learn more about the methodology and science behind ATLAS: