- Direct Database Access (Recommended) - Query PostgreSQL directly with the atlas.training_data module for filtered, high-performance access
- JSONL Export (Alternative Method) - Use the arc-atlas CLI to export sessions to JSONL files
For training pipelines: Direct database access is recommended (SDK v0.1.13+). It eliminates schema drift, provides 10-100x faster queries with database indexes, and supports reward-based filtering at the database level.
1. Enable Postgres Persistence
Add a storage block to your SDK config:
Then call atlas.core.run(..., stream_progress=True) as usual. Each session, step result, and intermediate event is written to Postgres.
2. Direct Database Access (Recommended)
Query training sessions directly from PostgreSQL with reward-based filtering and selective data loading:
Key Features
- No intermediate files: Query directly from PostgreSQL
- Database-level filtering: Reward, status, date range, and learning key filters
- Selective loading: Control which fields are loaded (include_trajectory_events, include_learning_data)
- Pagination support: Process large datasets in batches with async iterators
- 10-100x faster: Database indexes optimize reward and date range queries
Example: Pagination for Large Datasets
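A minimal sketch of batch pagination with an async iterator. The `fetch_page(limit, offset)` coroutine and the in-memory rows are hypothetical stand-ins for a real database query; this illustrates the pattern and is not the atlas.training_data API.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

# Hypothetical fetcher signature: returns up to `limit` rows starting at `offset`.
FetchPage = Callable[[int, int], Awaitable[List[dict]]]


async def paginate(fetch_page: FetchPage, batch_size: int = 100) -> AsyncIterator[List[dict]]:
    """Yield batches until the source returns an empty or short page."""
    offset = 0
    while True:
        batch = await fetch_page(batch_size, offset)
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:
            return  # a short page means there is nothing left to fetch
        offset += len(batch)


# In-memory stand-in for a sessions table (assumption, for illustration only).
_FAKE_ROWS = [{"session_id": i, "reward": i / 10} for i in range(25)]


async def fake_fetch_page(limit: int, offset: int) -> List[dict]:
    await asyncio.sleep(0)  # simulate an awaitable database round trip
    return _FAKE_ROWS[offset:offset + limit]


async def collect_batch_sizes(batch_size: int) -> List[int]:
    """Walk every batch and record its size, as a training loop might."""
    sizes = []
    async for batch in paginate(fake_fetch_page, batch_size):
        sizes.append(len(batch))
    return sizes
```

Only one batch is held in memory at a time, which is the point of paginating large datasets rather than loading every session up front.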
3. JSONL Export (Alternative Method)
Start Postgres before exporting (e.g., docker compose up -d postgres or brew services start postgresql) so the CLI can connect successfully.
Optional filters
- --session-id 42 (repeatable) exports specific sessions.
- --limit 25 / --offset 25 page through recent sessions.
- --status succeeded --status failed filters on runtime completion state.
- --include-status approved (repeatable) restricts review statuses; omit to inherit runtime_safety.review.default_export_statuses. Use --include-all-statuses for exploratory exports.
- --trajectory-event-limit 200 caps the number of intermediate telemetry events embedded per session.
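If you script exports, the filter flags above can be assembled programmatically. The helper below is a hypothetical sketch: `build_filter_args` is not part of the SDK, it emits only the flags listed above, and it leaves the rest of the arc-atlas command line (connection and output options) to you.

```python
from typing import List, Optional, Sequence


def build_filter_args(
    session_ids: Sequence[int] = (),
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    statuses: Sequence[str] = (),
    include_statuses: Sequence[str] = (),
    include_all_statuses: bool = False,
    trajectory_event_limit: Optional[int] = None,
) -> List[str]:
    """Return the optional filter flags as an argument list (hypothetical helper)."""
    args: List[str] = []
    for session_id in session_ids:  # repeatable flag
        args += ["--session-id", str(session_id)]
    if limit is not None:
        args += ["--limit", str(limit)]
    if offset is not None:
        args += ["--offset", str(offset)]
    for status in statuses:  # repeatable flag
        args += ["--status", status]
    for review_status in include_statuses:  # repeatable flag
        args += ["--include-status", review_status]
    if include_all_statuses:
        args.append("--include-all-statuses")
    if trajectory_event_limit is not None:
        args += ["--trajectory-event-limit", str(trajectory_event_limit)]
    return args
```

Returning a list rather than a joined string keeps the result safe to pass to `subprocess.run` without shell quoting.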
AtlasSessionTrace:
Tip: Compress large exports with xz or gzip; the loader streams line-by-line, so you can decompress on the fly if desired.
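To illustrate what decompressing on the fly looks like, here is a minimal standard-library sketch that reads a plain, gzip, or xz JSONL file one line at a time. The `iter_jsonl` helper is hypothetical and is not the Atlas loader; it chooses the decompressor from the file suffix.

```python
import gzip
import json
import lzma
from pathlib import Path
from typing import Iterator, Union


def iter_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line, decompressing as it reads."""
    path = Path(path)
    if path.suffix == ".gz":
        opener = gzip.open
    elif path.suffix == ".xz":
        opener = lzma.open
    else:
        opener = open
    # Text mode ("rt") lets all three openers yield decoded lines lazily.
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because the function is a generator, the full export is never held in memory, whether or not it is compressed.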
Use adaptive_summary to audit routing choices, probe evidence, and certification status. triage_dossier captures the structured context that informed the decision (see triage dossier), while personas_used and persona_updates show which personas were active and how memory evolved during the run. Each step also carries structured artifacts captured during execution and a deliverable payload that mirrors what the Student hands back to downstream systems.
Review gating defaults to approved sessions. Set ATLAS_REVIEW_REQUIRE_APPROVAL=0 only for local experiments and always note which review statuses were exported alongside your artifacts.
4. Feed the Training Stack
Using Direct Database Access (Recommended)
Using JSONL Export (Alternative Method)
configs/data/runtime_traces.yaml) described in the top-level quickstart. The schema matches the training adapters, so no custom glue code is required.
Troubleshooting
| Error | Likely cause | Fix |
|---|---|---|
| database connection refused | Postgres URL unreachable | Verify host/port, ensure server is running. |
| Empty JSONL file | No sessions stored | Confirm storage block is enabled and runs completed successfully. |
| Missing rewards in JSON | Judges disabled | Ensure your rim block activates the judges you expect. |
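For the connection-refused case, a quick way to verify host and port is a plain TCP check before involving the SDK. This is a standard-library sketch; the helper names are hypothetical, and the default port of 5432 is the usual Postgres default rather than anything Atlas-specific.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def parse_host_port(database_url: str, default_port: int = 5432) -> Tuple[str, int]:
    """Extract host and port from a postgresql:// URL, with fallbacks."""
    parts = urlsplit(database_url)
    host = parts.hostname or "localhost"
    port = parts.port or default_port
    return host, port


def can_connect(database_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the database host/port succeeds."""
    host, port = parse_host_port(database_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A `True` result only shows that something is listening on that port; authentication and database-name problems would still surface later when the SDK or CLI connects.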