Overview
The Atlas SDK provides direct PostgreSQL access for training data extraction, eliminating JSONL export intermediates and preventing schema drift between SDK and ATLAS Core. Query training sessions with reward-based filtering, selective data loading, and pagination support for large datasets.
Prerequisites
- Atlas SDK v0.1.13 or higher
- PostgreSQL database with runtime traces (configured via storage.database_url)
- Python 3.10+
Direct Database Access
Basic Usage
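A basic query might look like the sketch below. The import path `atlas.training_data`, the function name `get_training_sessions`, and the `db_url` and `limit` keyword arguments are assumptions that this page does not confirm, so check them against the installed SDK. The `fetch` parameter is a hypothetical hook that lets the wrapper be exercised without a database.

```python
from typing import Any, Callable, Optional, Sequence


def load_training_sessions(
    db_url: str,
    *,
    limit: int = 1000,
    fetch: Optional[Callable[..., Sequence[Any]]] = None,
) -> list:
    """Load up to `limit` sessions from the database at `db_url`."""
    if fetch is None:
        # ASSUMPTION: import path and function name; verify against your SDK version.
        from atlas.training_data import get_training_sessions as fetch
    return list(fetch(db_url=db_url, limit=limit))
```

Passing a stub as `fetch` makes it possible to unit-test pipeline code before a PostgreSQL instance is available.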
Query training sessions directly from PostgreSQL.
Async Queries
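One way to use asynchronous queries in a pipeline is to run several filtered queries concurrently while capping how many hit the database at once. In this sketch `fetch_async` is a hypothetical stand-in for the SDK's async query function, whose name and signature are not given on this page; only the `asyncio` calls are standard.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def fetch_many(
    fetch_async: Callable[..., Awaitable[Sequence[Any]]],
    filter_sets: Sequence[dict],
    max_concurrency: int = 4,
) -> list:
    """Run one query per filter set, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(filters: dict) -> list:
        # The semaphore bounds the number of in-flight queries.
        async with semaphore:
            return list(await fetch_async(**filters))

    # gather returns results in the same order as filter_sets.
    return await asyncio.gather(*(run_one(f) for f in filter_sets))
```

Bounding concurrency keeps a burst of queries from exhausting the database connection limit.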
Use asynchronous queries for high-throughput training pipelines.
Query Filters
Reward-Based Filtering
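To illustrate what a reward filter on a JSONB column can look like at the SQL level, here is a minimal sketch of a parameterized query builder. The table name `sessions`, the JSONB column `reward`, and the key `score` are assumptions for illustration; the SDK's real schema and generated SQL may differ. The `$1`-style placeholders follow PostgreSQL's native parameter syntax.

```python
from typing import List, Optional, Tuple


def reward_filter_sql(
    min_reward: Optional[float] = None,
    max_reward: Optional[float] = None,
) -> Tuple[str, List[float]]:
    """Build a parameterized SELECT that filters on a JSONB reward score."""
    # ASSUMPTION: table `sessions`, JSONB column `reward`, key `score`.
    # `->>` extracts the value as text, so it is cast before comparing.
    score = "(reward->>'score')::float"
    clauses: List[str] = []
    params: List[float] = []
    if min_reward is not None:
        params.append(min_reward)
        clauses.append(f"{score} >= ${len(params)}")
    if max_reward is not None:
        params.append(max_reward)
        clauses.append(f"{score} <= ${len(params)}")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    return f"SELECT id FROM sessions{where} ORDER BY id", params
```

Keeping values in `params` rather than interpolating them into the SQL string avoids injection and lets PostgreSQL reuse the query plan.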
Filter sessions by reward score using JSONB operators.
Status Filtering
Filter by runtime completion status.
Date Range Filtering
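For date-range queries, it helps to compute the window boundaries as timezone-aware UTC datetimes so they compare correctly against timestamps stored in PostgreSQL. The helper below is a hypothetical sketch using only the standard library; the names of the SDK arguments that accept these boundaries are not given on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def last_n_days(days: int, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return a half-open UTC window [start, end) covering the last `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    end = now if now is not None else datetime.now(timezone.utc)
    if end.tzinfo is None:
        # Naive datetimes are ambiguous, so reject them early.
        raise ValueError("now must be timezone-aware")
    end = end.astimezone(timezone.utc)
    return end - timedelta(days=days), end
```

A half-open window makes consecutive windows tile without overlap, which matters when sessions are exported incrementally.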
Query sessions within a specific time window.
Selective Data Loading
Control which data is loaded to optimize performance:
- include_trajectory_events=False: 50-70% faster queries
- include_learning_data=False: 30-40% faster queries
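The speedups quoted above will vary with data volume and hardware, so it can be worth measuring them on your own database. The sketch below times a query callable with and without each flag. Here `fetch` is a hypothetical stand-in for whichever SDK query function you use; only the two flag names come from this page.

```python
import time
from typing import Any, Callable, Dict, Sequence


def time_loading_variants(
    fetch: Callable[..., Sequence[Any]],
    base_kwargs: Dict[str, Any],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each loading variant."""
    variants = {
        "full": {},
        "no_trajectory_events": {"include_trajectory_events": False},
        "no_learning_data": {"include_learning_data": False},
    }
    timings: Dict[str, float] = {}
    for name, extra in variants.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fetch(**{**base_kwargs, **extra})
            # Keep the fastest run to reduce noise from caching and scheduling.
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings
```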
Pagination
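Batch processing with an async iterator can be sketched as an async generator that requests one page at a time. The `fetch_page` callable and its `limit` and `offset` arguments are hypothetical stand-ins, since the SDK's own pagination helper is not named on this page.

```python
from typing import Any, AsyncIterator, Awaitable, Callable, List, Sequence


async def iter_batches(
    fetch_page: Callable[..., Awaitable[Sequence[Any]]],
    batch_size: int = 500,
    **filters: Any,
) -> AsyncIterator[List[Any]]:
    """Yield successive batches until the source returns a short or empty page."""
    offset = 0
    while True:
        page = list(await fetch_page(limit=batch_size, offset=offset, **filters))
        if not page:
            return
        yield page
        if len(page) < batch_size:
            # A short page means there is nothing left to fetch.
            return
        offset += len(page)
```

Consume it with `async for batch in iter_batches(...)`. Only one batch is held in memory at a time, and offset-based paging assumes the underlying query uses a stable sort order.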
Process large datasets in batches using async iterators.
Session Count Queries
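A count obtained up front (for example from count_training_sessions(), whose arguments are not shown on this page) can be turned into a batch plan before any session data is loaded. The helper below is a hypothetical sketch of that arithmetic.

```python
import math
from typing import List, Tuple


def plan_batches(total: int, batch_size: int) -> List[Tuple[int, int]]:
    """Return (offset, limit) pairs that cover `total` rows."""
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")
    n_batches = math.ceil(total / batch_size)
    return [
        # The last batch is shortened so the plan never overshoots `total`.
        (i * batch_size, min(batch_size, total - i * batch_size))
        for i in range(n_batches)
    ]
```

For 2,500 matching sessions and a batch size of 1,000 this yields three batches, the last one holding 500 rows.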
Get session counts without loading full data.
Fetch Individual Sessions
Retrieve a specific session by ID.
Schema Fields
AtlasSessionTrace
Essential fields (always loaded):
- session_reward: Aggregate reward with score and uncertainty
- trajectory_events: Ordered list of runtime events
- student_learning: Student persona learning notes
- teacher_learning: Teacher persona learning notes
- learning_history: Historical learning data
- adaptive_summary: Mode selection and probe evidence

- learning_key: Task identifier for grouping sessions
- teacher_notes: Guidance provided during execution
- reward_summary: Simplified reward statistics
- drift: Detected schema or behavior drift
- drift_alert: Critical drift warnings
- triage_dossier: Pre-execution risk assessment
- reward_audit: Detailed judge breakdowns
AtlasStepTrace
Essential fields:
- runtime: Execution time in milliseconds
- depends_on: Step dependency graph

- attempt_history: Previous attempt records
Performance Optimization
Database Indexes
The SDK automatically creates performance indexes.
Query Optimization
For training workloads with millions of sessions:
Integration with Training Pipeline
Step 1: Query Training Data
Step 2: Convert to Training Format
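As a rough illustration of the conversion step, the sketch below flattens session records into prompt, completion, and reward triples. It is not one of the SDK's converter functions: sessions are modeled as plain dicts, the `session_reward` field with a `score` comes from the schema described on this page, and the `task` and `final_answer` keys are assumptions.

```python
from typing import Any, Dict, Iterable, List


def to_training_records(sessions: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten session dicts into prompt/completion/reward records."""
    records = []
    for session in sessions:
        reward = session.get("session_reward") or {}
        score = reward.get("score")
        if score is None:
            continue  # Skip sessions without a usable reward score.
        records.append({
            # ASSUMPTION: `task` and `final_answer` are hypothetical key names.
            "prompt": session.get("task", ""),
            "completion": session.get("final_answer", ""),
            "reward": float(score),
        })
    return records
```

Dropping sessions without a score keeps unscored rollouts out of the reward signal, though whether that is the right policy depends on the training setup.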
Step 3: Train with GRPO
Migration from JSONL Export
Previous Approach (JSONL Files)
Direct Database Access
- No intermediate JSONL files
- Filters applied at database level
- 10-100x faster queries with indexes
- No schema drift between SDK and training
Troubleshooting
| Error | Cause | Solution |
|---|---|---|
| Connection refused | PostgreSQL not running | Start Postgres: docker compose up -d postgres |
| Empty result set | No sessions match filters | Verify filters with count_training_sessions() |
| Memory error | Loading too many sessions | Use pagination with smaller batch sizes |
| Missing fields | SDK version mismatch | Upgrade to atlas-sdk ≥ 0.1.13 |
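To narrow down a "Connection refused" error before involving the SDK, a plain TCP check against the host and port in the database URL can help. This is a standard-library sketch; it confirms only that something is listening on the port, not that credentials or the database name are valid.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def postgres_endpoint(db_url: str) -> Tuple[str, int]:
    """Extract (host, port) from a PostgreSQL URL, defaulting to port 5432."""
    parsed = urlparse(db_url)
    if parsed.scheme not in ("postgresql", "postgres"):
        # Report only the scheme so credentials never end up in error messages.
        raise ValueError(f"not a PostgreSQL URL: {parsed.scheme!r}")
    return parsed.hostname or "localhost", parsed.port or 5432


def can_connect(db_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the database host succeeds."""
    host, port = postgres_endpoint(db_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `can_connect` returns False, the problem lies with the server or the network rather than with query filters or the SDK version.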
API Reference
Core Functions
Converter Functions
Related Documentation
- GRPO Training Guide - Complete training pipeline
- Database Schema - PostgreSQL schema reference
- Runtime Export - JSONL export and direct database access methods