Keep credentials such as ANTHROPIC_API_KEY in a .env file and load them before orchestrating runs. Atlas defaults to Anthropic as the primary provider.
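A minimal sketch of that setup; the placeholder value and the shell export idiom are illustrative only, since the Atlas CLI also loads .env for you (see below):
# .env (keep this file out of version control)
ANTHROPIC_API_KEY=your-anthropic-key
# Export the variables into the current shell before orchestrating runs
set -a; source .env; set +a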
After the package installs, bootstrap your project with autodiscovery:
atlas env init --task "Summarize the latest AI news"
atlas run --config .atlas/generated_config.yaml --task "Summarize the latest AI news"
The CLI writes .atlas/discover.json, optional factory scaffolds, and metadata snapshots while automatically loading .env and extending PYTHONPATH. atlas env init now handles storage setup automatically, so there is no need to run atlas init separately. Re-run atlas env init --scaffold-config-full whenever you want a fresh runtime configuration derived from discovery output.
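For example, to refresh the configuration and immediately run against it (this assumes the regenerated file keeps the .atlas/generated_config.yaml path shown above; the task string simply repeats the earlier example):
# Regenerate the runtime configuration from the latest discovery output
atlas env init --scaffold-config-full
# Run with the refreshed config (path assumed to match the earlier example)
atlas run --config .atlas/generated_config.yaml --task "Summarize the latest AI news"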
Use our validated installation scripts for the smoothest setup:
For Python 3.11:
bash scripts/install_py311.sh
For Python 3.12:
bash scripts/install_py312.sh
These scripts automatically (a quick verification sketch follows this list):
- Install PyTorch with CUDA 12.4 support
- Configure vLLM 0.8.3
- Set up Flash Attention
- Install all dependencies
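To confirm the stack the scripts set up, here is a minimal check assuming the standard torch, vllm, and flash_attn import names (illustrative commands, not part of the Atlas CLI):
# Print the installed PyTorch version, its CUDA build, and GPU visibility
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# Confirm vLLM and Flash Attention import cleanly and report their versions
python -c "import vllm; print(vllm.__version__)"
python -c "import flash_attn; print(flash_attn.__version__)"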
Build a pinned training image directly from this repo:
docker build -t atlas-core:local .
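One hedged example of using the image afterwards is shown below; --gpus all requires the NVIDIA Container Toolkit, and dropping into bash is an assumption about the image contents rather than a documented entrypoint:
# Start an interactive shell in the freshly built image with all GPUs attached
docker run --rm -it --gpus all atlas-core:local bash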
Run the offline pipeline helper against a JSONL export:
Single GPU is supported for inference only. For RL training with limited VRAM, use model offloading or Zero-1 optimization (both still require at least 2 GPUs):
# Inference only with single GPU
python examples/quickstart/evaluate.py  # Quick evaluation test
# For training with limited VRAM (requires 2+ GPUs)
scripts/launch.sh offload 2 src/atlas_core/configs/recipe/teacher_rcl.yaml
# Or use Zero-1 optimization
scripts/launch.sh zero1 2 src/atlas_core/configs/recipe/teacher_rcl.yaml
Multi-GPU Setup
For distributed training across multiple GPUs:
# Minimum 2 GPUs for RL training (1 for vLLM, 1 for training)
scripts/launch_with_server.sh 1 1 src/atlas_core/configs/recipe/teacher_rcl.yaml
# Production setup with 4 GPUs (2 for vLLM, 2 for training)
scripts/launch_with_server.sh 2 2 src/atlas_core/configs/recipe/teacher_rcl.yaml
# Full 8 GPU setup
scripts/launch_with_server.sh 4 4 src/atlas_core/configs/recipe/teacher_rcl.yaml
Memory Optimization
Reduce memory usage with these settings:
# In config file
per_device_train_batch_size: 1
gradient_checkpointing: true
fp16: true  # or bf16 for A100/H100
CUDA Version Mismatch
If PyTorch was built against a different CUDA version than the one installed on your system, verify the versions and reinstall the matching wheel:
# Check CUDA version
nvidia-smi
nvcc --version
# Reinstall PyTorch with correct CUDA version
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu118  # For CUDA 11.8
Out of Memory Errors
Reduce memory usage:
# Use gradient checkpointing
atlas-core train gradient_checkpointing=true
# Reduce batch size
atlas-core train per_device_train_batch_size=1
# Enable CPU offloading
scripts/launch.sh offload 2 src/atlas_core/configs/recipe/teacher_rcl.yaml