SDK Only: 30 seconds • Full Training Stack: 10-15 minutes
Choose Your Path
Most users only need the SDK:
python -m pip install --upgrade arc-atlas
This gives you adaptive dual-agent orchestration, telemetry streaming, and data export. Skip to the Verification section after installation.
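To confirm the install succeeded, you can ask pip for the package metadata:
python -m pip show arc-atlas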
Only install the full training stack if you need to:
Train custom teacher models with GRPO
Run offline reinforcement learning
Fine-tune models on your own hardware
The training stack requires CUDA-capable GPUs, PyTorch 2.6.0, and vLLM 0.8.3. Most teams use pre-trained teacher models and never need this setup.
System Requirements
Minimum Requirements
2× NVIDIA GPUs with CUDA support (for RL training)
1× GPU minimum for inference only
32GB+ system RAM
100GB+ disk space
Python 3.10 or newer (3.11 or 3.12 for the validated install scripts)
Recommended Setup
4× or 8× H100 GPUs (40GB+ VRAM each)
128GB+ system RAM
200GB+ NVMe storage
Ubuntu 22.04 LTS
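One quick way to check what hardware you actually have (GPU name and total VRAM per card) is nvidia-smi's query mode:
nvidia-smi --query-gpu=name,memory.total --format=csv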
Prerequisites
Before installing: Run this 30-second check to verify your system meets requirements.
python - << 'EOF'
import shutil
import subprocess
import sys

checks = []

# Check Python version
py_version = sys.version_info
checks.append(("Python 3.11 or 3.12", py_version >= (3, 11), f"Found {py_version.major}.{py_version.minor}"))

# Check CUDA
try:
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
    cuda_available = result.returncode == 0
    checks.append(("NVIDIA GPU", cuda_available, "Found" if cuda_available else "Not found"))
except FileNotFoundError:
    checks.append(("NVIDIA GPU", False, "nvidia-smi not available"))

# Check disk space
stat = shutil.disk_usage("/")
free_gb = stat.free / (1024**3)
checks.append(("200GB+ free disk", free_gb >= 200, f"{free_gb:.1f}GB free"))

# Print results
print("\nPrerequisites Check:")
print("-" * 50)
for name, passed, detail in checks:
    status = "✅" if passed else "❌"
    print(f"{status} {name}: {detail}")

all_passed = all(c[1] for c in checks)
print("-" * 50)
if all_passed:
    print("✅ All checks passed! Proceed with installation.")
else:
    print("❌ Some checks failed. Review requirements before installing.")
    sys.exit(1)
EOF
Expected output:
Prerequisites Check:
--------------------------------------------------
✅ Python 3.11 or 3.12: Found 3.11
✅ NVIDIA GPU: Found
✅ 200GB+ free disk: 245.3GB free
--------------------------------------------------
✅ All checks passed! Proceed with installation.
SDK-only users can skip this. This check is only needed for the full training stack (Atlas Core).
Set up CUDA
Ensure NVIDIA drivers and CUDA are installed and compatible with PyTorch 2.6.0:
nvidia-smi  # Verify CUDA version
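The validated install scripts below target CUDA 12.4 wheels; the highest CUDA version your driver supports is printed in the nvidia-smi header, so you can extract it directly:
nvidia-smi | grep "CUDA Version"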
Python Environment
Verify Python version (3.10 or newer required):
python --version
Authenticate with HuggingFace
Authenticate for model and dataset access:
huggingface-cli login
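If you prefer authenticating from Python (for example inside a notebook), a minimal sketch using the huggingface_hub helper:
from huggingface_hub import login

# Prompts for your token interactively; alternatively pass token="hf_..." directly.
login()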
Installation Methods
python -m pip install --upgrade arc-atlas
Keep credentials such as ANTHROPIC_API_KEY in a .env file and load them before orchestrating runs. Atlas defaults to Anthropic as the primary provider.
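For example, a minimal .env for the runtime SDK (values are placeholders):
# .env (never commit this file)
ANTHROPIC_API_KEY=sk-ant-your-key
GEMINI_API_KEY=your-gemini-key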
After the package installs, bootstrap your project with autodiscovery:
atlas env init --task "Summarize the latest AI news"
atlas run --config .atlas/generated_config.yaml --task "Summarize the latest AI news"
The CLI writes .atlas/discover.json, optional factory scaffolds, and metadata snapshots while automatically loading .env and extending PYTHONPATH. atlas env init now handles storage setup automatically, so there is no need to run atlas init separately. Re-run atlas env init --scaffold-config-full whenever you want a fresh runtime configuration derived from discovery output.
Use our validated installation scripts for the smoothest setup:
For Python 3.11: bash scripts/install_py311.sh
For Python 3.12: bash scripts/install_py312.sh
These scripts automatically:
Install PyTorch with CUDA 12.4 support
Configure vLLM 0.8.3
Set up Flash Attention
Install all dependencies
For custom environments or debugging:
# Install PyTorch with CUDA support
python -m pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
# Install vLLM and TensorBoard
python -m pip install vllm==0.8.3 tensorboard
# Install Flash Attention (for optimal performance)
python -m pip install flash-attn --no-build-isolation
# Install FlashInfer
python -m pip install flashinfer-python -i https://flashinfer.ai/whl/cu124/torch2.6/
# Install remaining dependencies
python -m pip install --upgrade -r requirements-py311.txt  # or requirements-py312.txt
Create an isolated environment with Conda:
# Create environment
conda create -n atlas python=3.11
conda activate atlas
# Install PyTorch
conda install pytorch==2.6.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# Run installation script
bash scripts/install_py311.sh
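As a quick sanity check (not part of the official scripts), confirm the conda-installed PyTorch sees CUDA:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"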
API Keys
# Training stack
export HF_TOKEN="your-huggingface-token"
export WANDB_API_KEY="your-wandb-key"  # Optional
# Runtime SDK
export ANTHROPIC_API_KEY="sk-ant-your-key"  # Primary provider
export GEMINI_API_KEY="your-gemini-key"  # Optional for rewards
Store secrets in .env. The Atlas CLI loads .env automatically and extends PYTHONPATH with your project root and src/ directory.
Disable Tracking
To disable Weights & Biases tracking:
# In command line
python train.py report_to=null
# Or in config file
report_to: null
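Weights & Biases also honors an environment variable, which is convenient on shared machines:
export WANDB_MODE=disabled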
Verification
After installation, verify your setup:
3-Minute Smoke Test
Run this once to confirm CUDA, vLLM, and model downloads are working before you invest in longer training jobs.
python - << 'PY'
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load teacher model
teacher = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking",
    device_map="auto",
    torch_dtype=torch.float16,
)
teacher_tokenizer = AutoTokenizer.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking"
)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
print("Teacher model loaded:", teacher.config.model_type)
print("Model device:", next(teacher.parameters()).device)
PY
Expected output:
CUDA available: True
GPU count: 8
Teacher model loaded: qwen2
Model device: cuda:0
Check that the core dependencies import cleanly and report the expected versions:
python - << 'PY'
# Verify core dependencies
import torch
import transformers
import datasets
import vllm

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
print(f"Transformers: {transformers.__version__}")
print(f"Datasets: {datasets.__version__}")
print(f"vLLM: {vllm.__version__}")
PY
GPU Memory Management
For different GPU configurations:
Single GPU is supported for inference only. For RL training, use model offloading:
# Inference only with single GPU
python examples/quickstart/evaluate.py # Quick evaluation test
# For training with limited VRAM (requires 2+ GPUs)
python train.py +offload
# Or use Zero-1 optimization
python train.py +zero1
For distributed training across multiple GPUs, pass the number of vLLM server GPUs as the first argument and the number of training GPUs as the second:
# Minimum 2 GPUs for RL training (1 for vLLM, 1 for training)
scripts/launch_with_server.sh 1 1 configs/run/teacher_rcl.yaml
# Production setup with 4 GPUs (2 for vLLM, 2 for training)
scripts/launch_with_server.sh 2 2 configs/run/teacher_rcl.yaml
# Full 8 GPU setup
scripts/launch_with_server.sh 4 4 configs/run/teacher_rcl.yaml
Reduce memory usage with these settings:
# In config file
per_device_train_batch_size: 1
gradient_checkpointing: true
fp16: true  # or bf16 for A100/H100
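Assuming these config keys are also exposed as command-line overrides (as the troubleshooting section shows for gradient_checkpointing), the same settings can be passed directly to train.py:
python train.py per_device_train_batch_size=1 gradient_checkpointing=true fp16=true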
Security Best Practices
Follow these security guidelines to protect sensitive information:
Never commit secrets: Keep tokens, .env files, and API keys out of version control
Use environment variables: Store HF_TOKEN, WANDB_API_KEY, etc. as environment variables
Gitignore protection: Ensure results/, logs/, wandb/ remain in .gitignore
Least privilege: Restrict dataset access permissions
Log out on shared machines: Run huggingface-cli logout after use
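A .gitignore covering the items above might look like:
# Secrets and local artifacts
.env
results/
logs/
wandb/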
Platform Notes
Linux
Tested on Ubuntu 20.04/22.04 LTS:
Ensure CUDA toolkit matches PyTorch requirements
May need sudo for system package installations
macOS
Limited support for Apple Silicon:
CPU-only mode available
Use MPS backend where supported
vLLM may not be available
Windows
Run through WSL2 for best compatibility:
Install CUDA toolkit in WSL2
Use Linux installation instructions
Ensure WSL2 has GPU passthrough enabled
Troubleshooting
If you see CUDA errors:
# Check CUDA version
nvidia-smi
nvcc --version
# Reinstall PyTorch with correct CUDA version
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu118  # For CUDA 11.8
Reduce memory usage:
# Use gradient checkpointing
python train.py gradient_checkpointing=true
# Reduce batch size
python train.py per_device_train_batch_size=1
# Enable CPU offloading
python train.py +offload
HuggingFace Access Denied
Ensure proper authentication:
# Re-authenticate
huggingface-cli logout
huggingface-cli login
# Verify token
huggingface-cli whoami
Common vLLM issues:
# Install build dependencies
sudo apt-get install python3-dev
# Try pre-built wheel
pip install https://github.com/vllm-project/vllm/releases/download/v0.8.3/vllm-0.8.3-cp311-cp311-linux_x86_64.whl
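After installing the wheel, confirm the import succeeds and reports the pinned version:
python -c "import vllm; print(vllm.__version__)"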
Next Steps