SDK Only: 30 seconds • Full Training Stack: 10-15 minutes
Choose Your Path
Most users only need the SDK:
python -m pip install --upgrade arc-atlas
This gives you adaptive dual-agent orchestration, telemetry streaming, and data export. Skip to the Verification section after installation.
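To confirm the install succeeded, you can ask pip for the package metadata:
python -m pip show arc-atlas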
Only install the full training stack if you need to:
Train custom teacher models with GRPO
Run offline reinforcement learning
Fine-tune models on your own hardware
The training stack requires CUDA-capable GPUs, PyTorch 2.6.0, and vLLM 0.8.3. Most teams use pre-trained teacher models and never need this setup.
System Requirements
Minimum Requirements
2× NVIDIA GPUs with CUDA support (for RL training)
1× GPU minimum for inference only
32GB+ system RAM
100GB+ disk space
Python 3.10 or newer (3.11 or 3.12 for the validated install scripts)
Recommended Setup
4× or 8× H100 GPUs (40GB+ VRAM each)
128GB+ system RAM
200GB+ NVMe storage
Ubuntu 22.04 LTS
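One quick way to check what hardware you actually have (GPU name and total VRAM per card) is nvidia-smi's query mode:
nvidia-smi --query-gpu=name,memory.total --format=csv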
Prerequisites
Before installing: Run this 30-second check to verify your system meets requirements.
python - << 'EOF'
import shutil
import subprocess
import sys

checks = []

# Check Python version
py_version = sys.version_info
checks.append(("Python 3.11 or 3.12", py_version >= (3, 11), f"Found {py_version.major}.{py_version.minor}"))

# Check CUDA
try:
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
    cuda_available = result.returncode == 0
    checks.append(("NVIDIA GPU", cuda_available, "Found" if cuda_available else "Not found"))
except FileNotFoundError:
    checks.append(("NVIDIA GPU", False, "nvidia-smi not available"))

# Check disk space
stat = shutil.disk_usage("/")
free_gb = stat.free / (1024**3)
checks.append(("200GB+ free disk", free_gb >= 200, f"{free_gb:.1f}GB free"))

# Print results
print("\nPrerequisites Check:")
print("-" * 50)
for name, passed, detail in checks:
    status = "✅" if passed else "❌"
    print(f"{status} {name}: {detail}")

all_passed = all(c[1] for c in checks)
print("-" * 50)
if all_passed:
    print("✅ All checks passed! Proceed with installation.")
else:
    print("❌ Some checks failed. Review requirements before installing.")
    sys.exit(1)
EOF
Expected output:
Prerequisites Check:
--------------------------------------------------
✅ Python 3.11 or 3.12: Found 3.11
✅ NVIDIA GPU: Found
✅ 200GB+ free disk: 245.3GB free
--------------------------------------------------
✅ All checks passed! Proceed with installation.
SDK-only users can skip this. This check is only needed for the full training stack (Atlas Core).
Set up CUDA
Ensure NVIDIA drivers and CUDA are installed and compatible with PyTorch 2.6.0:
nvidia-smi  # Verify CUDA version
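The validated install scripts below target CUDA 12.4 wheels; the highest CUDA version your driver supports is printed in the nvidia-smi header, so you can extract it directly:
nvidia-smi | grep "CUDA Version"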
Python Environment
Verify Python version (3.10 or newer required):
python --version
Authenticate with HuggingFace
Authenticate for model and dataset access:
huggingface-cli login
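If you prefer authenticating from Python (for example inside a notebook), a minimal sketch using the huggingface_hub helper:
from huggingface_hub import login

# Prompts for your token interactively; alternatively pass token="hf_..." directly.
login()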
Installation Methods
python -m pip install --upgrade arc-atlas
Keep credentials such as ANTHROPIC_API_KEY in a .env file and load them before orchestrating runs. Atlas defaults to Anthropic as the primary provider.
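For example, a minimal .env for the runtime SDK (values are placeholders):
# .env (never commit this file)
ANTHROPIC_API_KEY=sk-ant-your-key
GEMINI_API_KEY=your-gemini-key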
After the package installs, bootstrap your project with autodiscovery:
atlas env init --task "Summarize the latest AI news"
atlas run --config .atlas/generated_config.yaml --task "Summarize the latest AI news"
The CLI writes .atlas/discover.json, optional factory scaffolds, and metadata snapshots while automatically loading .env and extending PYTHONPATH. atlas env init now handles storage setup automatically, so there is no need to run atlas init separately. Re-run atlas env init --scaffold-config-full whenever you want a fresh runtime configuration derived from discovery output.
Use our validated installation scripts for the smoothest setup:
For Python 3.11: bash scripts/install_py311.sh
For Python 3.12: bash scripts/install_py312.sh
These scripts automatically:
Install PyTorch with CUDA 12.4 support
Configure vLLM 0.8.3
Set up Flash Attention
Install all dependencies
For custom environments or debugging:
# Install PyTorch with CUDA support
python -m pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
# Install vLLM and TensorBoard
python -m pip install vllm==0.8.3 tensorboard
# Install Flash Attention (for optimal performance)
python -m pip install flash-attn --no-build-isolation
# Install FlashInfer
python -m pip install flashinfer-python -i https://flashinfer.ai/whl/cu124/torch2.6/
# Install remaining dependencies
python -m pip install --upgrade -r requirements-py311.txt  # or requirements-py312.txt
Create an isolated environment with Conda:
# Create environment
conda create -n atlas python=3.11
conda activate atlas
# Install PyTorch
conda install pytorch==2.6.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# Run installation script
bash scripts/install_py311.sh
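As a quick sanity check (not part of the official scripts), confirm the conda-installed PyTorch sees CUDA:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"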
API Keys
# Training stack
export HF_TOKEN="your-huggingface-token"
export WANDB_API_KEY="your-wandb-key"  # Optional
# Runtime SDK
export ANTHROPIC_API_KEY="sk-ant-your-key"  # Primary provider
export GEMINI_API_KEY="your-gemini-key"  # Optional for rewards
Store secrets in .env. The Atlas CLI loads .env automatically and extends PYTHONPATH with your project root and src/ directory.
Disable Tracking
To disable Weights & Biases tracking:
# In command line
python train.py report_to=null
# Or in config file
report_to: null
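Weights & Biases also honors an environment variable, which is convenient on shared machines:
export WANDB_MODE=disabled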
Verification
After installation, verify your setup:
3-Minute Smoke Test
Run this once to confirm CUDA, vLLM, and model downloads are working before you invest in longer training jobs.
python - << 'PY'
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load teacher model
teacher = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking",
    device_map="auto",
    torch_dtype=torch.float16,
)
teacher_tokenizer = AutoTokenizer.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking"
)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
print("Teacher model loaded:", teacher.config.model_type)
print("Model device:", next(teacher.parameters()).device)
PY
Expected output:
CUDA available: True
GPU count: 8
Teacher model loaded: qwen2
Model device: cuda:0
Check that the core dependencies import cleanly and report the expected versions:
python - << 'PY'
# Verify core dependencies
import torch
import transformers
import datasets
import vllm

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
print(f"Transformers: {transformers.__version__}")
print(f"Datasets: {datasets.__version__}")
print(f"vLLM: {vllm.__version__}")
PY
GPU Memory Management
For different GPU configurations:
Single GPU is supported for inference only. For RL training, use model offloading:
# Inference only with single GPU
python examples/quickstart/evaluate.py # Quick evaluation test
# For training with limited VRAM (requires 2+ GPUs)
python train.py +offload
# Or use Zero-1 optimization
python train.py +zero1
For distributed training across multiple GPUs, pass the number of vLLM server GPUs as the first argument and the number of training GPUs as the second:
# Minimum 2 GPUs for RL training (1 for vLLM, 1 for training)
scripts/launch_with_server.sh 1 1 configs/run/teacher_rcl.yaml
# Production setup with 4 GPUs (2 for vLLM, 2 for training)
scripts/launch_with_server.sh 2 2 configs/run/teacher_rcl.yaml
# Full 8 GPU setup
scripts/launch_with_server.sh 4 4 configs/run/teacher_rcl.yaml
Reduce memory usage with these settings:
# In config file
per_device_train_batch_size: 1
gradient_checkpointing: true
fp16: true  # or bf16 for A100/H100
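Assuming these config keys are also exposed as command-line overrides (as the troubleshooting section shows for gradient_checkpointing), the same settings can be passed directly to train.py:
python train.py per_device_train_batch_size=1 gradient_checkpointing=true fp16=true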
Security Best Practices
Follow these security guidelines to protect sensitive information:
Never commit secrets: Keep tokens, .env files, and API keys out of version control
Use environment variables: Store HF_TOKEN, WANDB_API_KEY, etc. as environment variables
Gitignore protection: Ensure results/, logs/, wandb/ remain in .gitignore
Least privilege: Restrict dataset access permissions
Log out on shared machines: Run huggingface-cli logout after use
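A .gitignore covering the items above might look like:
# Secrets and local artifacts
.env
results/
logs/
wandb/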
Platform Notes
Linux
Tested on Ubuntu 20.04/22.04 LTS:
Ensure CUDA toolkit matches PyTorch requirements
May need sudo for system package installations
macOS
Limited support for Apple Silicon:
CPU-only mode available
Use MPS backend where supported
vLLM may not be available
Windows
Run through WSL2 for best compatibility:
Install CUDA toolkit in WSL2
Use Linux installation instructions
Ensure WSL2 has GPU passthrough enabled
Troubleshooting
If you see CUDA errors:
# Check CUDA version
nvidia-smi
nvcc --version
# Reinstall PyTorch with correct CUDA version
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu118  # For CUDA 11.8
Reduce memory usage:
# Use gradient checkpointing
python train.py gradient_checkpointing=true
# Reduce batch size
python train.py per_device_train_batch_size=1
# Enable CPU offloading
python train.py +offload
HuggingFace Access Denied
Ensure proper authentication:
# Re-authenticate
huggingface-cli logout
huggingface-cli login
# Verify token
huggingface-cli whoami
Common vLLM issues:
# Install build dependencies
sudo apt-get install python3-dev
# Try pre-built wheel
pip install https://github.com/vllm-project/vllm/releases/download/v0.8.3/vllm-0.8.3-cp311-cp311-linux_x86_64.whl
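After installing the wheel, confirm the import succeeds and reports the pinned version:
python -c "import vllm; print(vllm.__version__)"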
Next Steps