The Atlas SDK persists every orchestration session, including per-step rewards, guidance history, and tool usage. You can access this data through:
  1. Direct Database Access (Recommended) - Query PostgreSQL directly with the atlas.training_data module for filtered, high-performance access
  2. JSONL Export (Alternative Method) - Use the arc-atlas CLI to export sessions to JSONL files
For training pipelines: Direct database access is recommended (SDK v0.1.13+). It eliminates schema drift, provides 10-100x faster queries with database indexes, and supports reward-based filtering at the database level.

1. Enable Postgres Persistence

Add a storage block to your SDK config:
storage:
  database_url: postgresql://atlas:atlas@localhost:5433/atlas
  min_connections: 1
  max_connections: 5
  statement_timeout_seconds: 30
Run your tasks with atlas.core.run(..., stream_progress=True) as usual. Each session, step result, and intermediate event is written to Postgres.

2. Direct Database Access (Recommended)

Query training sessions directly from PostgreSQL with reward-based filtering and selective data loading:
from atlas.training_data import get_training_sessions

# Query sessions with filters
sessions = get_training_sessions(
    db_url="postgresql://atlas:atlas@localhost:5433/atlas",
    min_reward=0.8,
    learning_key="security-review",
    status_filters=["succeeded"],
    limit=1000
)

# Access session data
for session in sessions:
    reward_score = session.session_reward["score"]
    trajectory = session.trajectory_events
    learning_data = session.learning_history
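
For a quick sanity check on what comes back (plain Python over the fields shown above; not an SDK helper):
# Count how many of the returned sessions cleared a higher reward bar.
high_reward = [s for s in sessions if s.session_reward["score"] >= 0.9]
print(f"{len(high_reward)} of {len(sessions)} sessions scored >= 0.9")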

Key Features

  • No intermediate files: Query directly from PostgreSQL
  • Database-level filtering: Reward, status, date range, and learning key filters
  • Selective loading: Control which fields are loaded (include_trajectory_events, include_learning_data); see the sketch after this list
  • Pagination support: Process large datasets in batches with async iterators
  • 10-100x faster: Database indexes optimize reward and date range queries
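
As a rough sketch of selective loading (assuming include_trajectory_events and include_learning_data are keyword arguments to get_training_sessions, as the flag names above suggest):
from atlas.training_data import get_training_sessions

# Metadata-only query: skip per-step telemetry and learning history when you
# only need session-level rewards. The include_* keywords are assumed here.
sessions = get_training_sessions(
    db_url="postgresql://atlas:atlas@localhost:5433/atlas",
    min_reward=0.8,
    include_trajectory_events=False,
    include_learning_data=False,
    limit=1000,
)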

Example: Pagination for Large Datasets

import asyncio

from atlas.training_data import paginate_sessions

async def main() -> None:
    # Stream sessions in batches of 100, keeping only runs with reward >= 0.7.
    async for batch in paginate_sessions(
        db_url="postgresql://atlas:atlas@localhost:5433/atlas",
        batch_size=100,
        min_reward=0.7,
    ):
        for session in batch:
            process_session(session)  # your own handler for each session

asyncio.run(main())
See the Training Data Pipeline Guide for complete API reference and advanced usage.

3. JSONL Export (Alternative Method)

Use the arc-atlas CLI to export stored sessions to JSONL files:
arc-atlas \
  --database-url postgresql://atlas:atlas@localhost:5433/atlas \
  --output traces/my-session.jsonl \
  --include-status approved \
  --trajectory-event-limit 500 \
  --status succeeded \
  --limit 50
Start Postgres before exporting (e.g., docker compose up -d postgres or brew services start postgresql) so the CLI can connect successfully.
If another tool owns the atlas command on your system, run the exporter with python -m atlas.cli.export ... or adjust PATH so arc-atlas resolves first.

Optional filters

  • --session-id 42 (repeatable) exports specific sessions.
  • --limit 25 / --offset 25 page through recent sessions.
  • --status succeeded --status failed filters on runtime completion state.
  • --include-status approved (repeatable) restricts review statuses; omit to inherit runtime_safety.review.default_export_statuses. Use --include-all-statuses for exploratory exports.
  • --trajectory-event-limit 200 caps the number of intermediate telemetry events embedded per session.
The exporter writes one JSON object per line. Each record aligns with AtlasSessionTrace:
{
  "task": "Summarize the latest Atlas SDK updates",
  "final_answer": "...",
  "adaptive_summary": {
    "adaptive_mode": "coach",
    "confidence": 0.58,
    "certification_run": false,
    "probe": {
      "mode": "coach",
      "confidence": 0.55,
      "evidence": ["persona_helpful_ratio=0.62", "risk_high_severity"]
    },
    "mode_history": [
      {"mode": "paired", "confidence": 0.71, "certification": true},
      {"mode": "coach", "confidence": 0.55}
    ]
  },
  "triage_dossier": {
    "task": "Summarize the latest Atlas SDK updates",
    "summary": "Capture highlights for stakeholders.",
    "risks": [{"category": "quality", "description": "Customer-facing copy", "severity": "moderate"}],
    "signals": [{"name": "tenant", "value": "demo"}],
    "tags": ["tenant:demo", "domain:sre"]
  },
  "plan": {"steps": [{"id": 1, "description": "Collect release notes"}, {"id": 2, "description": "Draft summary"}]},
  "steps": [
    {
      "step_id": 1,
      "description": "Collect release notes",
      "trace": "HUMAN: ...",
      "output": "...",
      "reward": {
        "score": 0.92,
        "judges": [
          {"identifier": "process", "score": 0.91, "rationale": "..."}
        ]
      },
      "guidance": ["Cite the release date."],
      "validation": {"valid": true, "rationale": "Complete"},
      "tool": "web_search",
      "tool_params": {"query": "Atlas SDK release notes"},
      "artifacts": {"sources": ["https://..."]},
      "deliverable": {"notes": ["https://..."]}
    }
  ],
  "session_reward": {
    "score": 0.88,
    "uncertainty": 0.07,
    "judges": [
      {"identifier": "process", "score": 0.90, "rationale": "..."}
    ]
  },
  "reward_summary": {"score": 0.88},
  "review_status": "approved",
  "personas_used": [
    {"persona": "planner", "instruction": "Focus on customer tone", "source": "memory"}
  ],
  "persona_updates": {
    "new_candidates": [
      {"persona": "planner", "instruction": "Mention adaptive modes", "tags": ["tenant:demo"]}
    ]
  },
  "session_metadata": {"batch": "aime-2025"}
}
Tip: Compress large exports with xz or gzip—the loader streams line-by-line, so you can decompress on the fly if desired.
Use adaptive_summary to audit routing choices, probe evidence, and certification status; triage_dossier captures the structured context that informed the decision (see triage dossier); personas_used and persona_updates highlight which personas were active and how memory evolved during the run. Each step also carries structured artifacts captured during execution and a deliverable payload that mirrors what the Student hands back to downstream systems.
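For a quick spot check of those fields, here is a minimal sketch that reads the export produced above with plain json parsing (no SDK helpers assumed):
import json

# Print one audit line per exported session: task, reward score, adaptive mode, review status.
with open("traces/my-session.jsonl") as handle:
    for line in handle:
        record = json.loads(line)
        print(
            record.get("task"),
            record.get("session_reward", {}).get("score"),
            record.get("adaptive_summary", {}).get("adaptive_mode"),
            record.get("review_status"),
        )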
Review gating defaults to approved sessions. Set ATLAS_REVIEW_REQUIRE_APPROVAL=0 only for local experiments and always note which review statuses were exported alongside your artifacts.

4. Feed the Training Stack

Using Direct Database Access (Recommended)

from atlas.training_data import get_training_sessions
from trainers.runtime_dataset import sessions_to_rl_records

# Query sessions directly
sessions = get_training_sessions(
    db_url="postgresql://atlas:atlas@localhost:5433/atlas",
    min_reward=0.8,
    status_filters=["succeeded"],
    limit=10000
)

# Convert to RL training records
records = sessions_to_rl_records(sessions)

Using JSONL Export (Alternative Method)

from trainers.runtime_dataset import load_runtime_traces, sessions_to_rl_records

sessions = load_runtime_traces("traces/my-session.jsonl")
records = sessions_to_rl_records(sessions)
Or use the Hydra shortcut (configs/data/runtime_traces.yaml) described in the top-level quickstart. The schema matches the training adapters, so no custom glue code is required.

Troubleshooting

Error | Likely cause | Fix
database connection refused | Postgres URL unreachable | Verify host/port and ensure the server is running.
Empty JSONL file | No sessions stored | Confirm the storage block is enabled and runs completed successfully.
Missing rewards in JSON | Judges disabled | Ensure your rim block activates the judges you expect.
With persistence enabled and the exporter in place, you can schedule nightly runs, collect batches of traces, and continuously fine-tune the teacher without manual wrangling.