Time required: 10-15 minutes • Difficulty: Beginner
TL;DR: If you only need the SDK runtime, install the packaged release:
python -m pip install --upgrade arc-atlas
That gives you the orchestrator, telemetry streaming, and the exporter CLI. Use the sections below when you need the full training stack (PyTorch, vLLM, Flash Attention) for SFT/GRPO. This guide covers installing ATLAS with Python 3.10+, PyTorch 2.6.0, and vLLM 0.8.3 (the high-throughput inference engine).
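To confirm the runtime package landed correctly, query its distribution metadata; this uses only the standard library, so it does not depend on what arc-atlas itself exports:
# Check the installed arc-atlas version (standard library only)
from importlib.metadata import version

print("arc-atlas:", version("arc-atlas"))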

System Requirements

Minimum Requirements

  • 2× NVIDIA GPUs with CUDA support (for RL training)
  • 1× GPU minimum for inference only
  • 32GB+ system RAM
  • 100GB+ disk space
  • Python 3.10 or newer

Recommended Setup

  • 4× or 8× NVIDIA H100 GPUs (80GB VRAM each)
  • 128GB+ system RAM
  • 200GB+ NVMe storage
  • Ubuntu 22.04 LTS

Prerequisites

1. CUDA Setup

Ensure NVIDIA drivers and CUDA are installed and compatible with PyTorch 2.6.0:
nvidia-smi  # Verify CUDA version
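If PyTorch is already installed, cross-check that its CUDA build can actually see the driver:
# Compare PyTorch's CUDA build against driver visibility
python -c "import torch; print('built for CUDA', torch.version.cuda, '| available:', torch.cuda.is_available())"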
2. Python Environment

Verify Python version (3.10 or newer required):
python --version
3. HuggingFace Authentication

Authenticate for model and dataset access:
huggingface-cli login
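To confirm the login took effect from Python, huggingface_hub provides a whoami() helper:
# Verify the stored HuggingFace token resolves to your account
python -c "from huggingface_hub import whoami; print('Logged in as:', whoami()['name'])"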

Installation Methods

Three paths are available: the minimal runtime SDK (covered below), a manual training installation, and a Conda environment.

Runtime SDK (Minimal)
python -m pip install --upgrade arc-atlas
Keep credentials such as OPENAI_API_KEY in a .env file and load them before orchestrating runs.
After the package installs, bootstrap your project with autodiscovery:
atlas env init --task "Summarize the latest AI news"
atlas run --config .atlas/generated_config.yaml --task "Summarize the latest AI news"
The CLI writes .atlas/discover.json, optional factory scaffolds, and metadata snapshots while automatically loading .env and extending PYTHONPATH. Re-run atlas env init --scaffold-config-full whenever you want a fresh runtime configuration derived from discovery output.
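Discovery output is plain JSON, so you can inspect it directly; the exact schema belongs to the atlas CLI, so treat the keys as version-dependent:
# Peek at the autodiscovery output (schema is owned by the atlas CLI)
import json
from pathlib import Path

data = json.loads(Path(".atlas/discover.json").read_text())
print(json.dumps(data, indent=2)[:500])  # first 500 characters for a quick look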

Environment Configuration

API Keys and Tracking

Configure authentication for various services:
# Required: HuggingFace for models
export HF_TOKEN="your-huggingface-token"

# Optional: Weights & Biases for experiment tracking
export WANDB_API_KEY="your-wandb-key"

# Optional: OpenAI/Gemini for runtime orchestration
export OPENAI_API_KEY="your-openai-key"
export GEMINI_API_KEY="your-gemini-key"
The training script automatically sets HF_HUB_ENABLE_HF_TRANSFER=1 to speed up model downloads.
Keep provider keys, DATABASE_URL, and other secrets in .env. The Atlas CLI family (atlas env, atlas run, atlas train, arc-atlas export) loads .env automatically and adds your project root plus src/ to PYTHONPATH, so custom adapters resolve without manual sys.path tweaks.
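The CLI handles .env loading for you. If you want the same behavior in a standalone script, python-dotenv (installed separately) is one common way to do it:
# Load .env into the process environment before creating provider clients
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory by default
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY missing from .env"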

Disable Tracking

To disable Weights & Biases tracking:
# In command line
python train.py report_to=null

# Or in config file
report_to: null

Verification

After installation, verify your setup:

3-Minute Smoke Test

Run this once to confirm CUDA, vLLM, and model downloads are working before you invest in longer training jobs.
python - <<'PY'
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load teacher model
teacher = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking",
    device_map="auto",
    torch_dtype=torch.float16
)
teacher_tokenizer = AutoTokenizer.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking"
)

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
print("Teacher model loaded:", teacher.config.model_type)
print("Model device:", next(teacher.parameters()).device)
PY
# Expected output: CUDA available: True, GPU count: 2+ (for RL training), model type shown
Next, confirm the core dependencies import cleanly and report their versions:
# Verify core dependencies
import torch
import transformers
import datasets
import vllm

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
print(f"Transformers: {transformers.__version__}")
print(f"vLLM: {vllm.__version__}")

GPU Memory Management

For different GPU configurations:
A single GPU supports inference only; RL training needs at least two GPUs. With limited VRAM, enable model offloading:
# Inference only with single GPU
python examples/quickstart/evaluate.py  # Quick evaluation test

# For training with limited VRAM (requires 2+ GPUs)
python train.py +offload

# Or use ZeRO-1 optimization
python train.py +zero1
For distributed training across multiple GPUs:
# Minimum 2 GPUs for RL training (1 for vLLM, 1 for training)
scripts/launch_with_server.sh 1 1 configs/run/teacher_rcl.yaml

# Production setup with 4 GPUs (2 for vLLM, 2 for training)
scripts/launch_with_server.sh 2 2 configs/run/teacher_rcl.yaml

# Full 8 GPU setup
scripts/launch_with_server.sh 4 4 configs/run/teacher_rcl.yaml
Reduce memory usage with these settings:
# In config file
per_device_train_batch_size: 1
gradient_checkpointing: true
fp16: true  # or bf16 for A100/H100
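If you shrink the per-device batch, you can hold the effective batch size steady with gradient accumulation. A sketch, assuming the config follows standard Hugging Face TrainingArguments naming (gradient_accumulation_steps is an assumption about this repo's schema):
# effective batch = per_device_train_batch_size × num_gpus × gradient_accumulation_steps
per_device_train_batch_size: 1
gradient_accumulation_steps: 8  # 1 × 2 GPUs × 8 = effective batch of 16
gradient_checkpointing: true
bf16: true  # prefer bf16 on A100/H100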

Security Best Practices

Follow these security guidelines to protect sensitive information:
  • Never commit secrets: Keep tokens, .env files, and API keys out of version control
  • Use environment variables: Store HF_TOKEN, WANDB_API_KEY, etc. as environment variables (see the sketch after this list)
  • Gitignore protection: Ensure results/, logs/, wandb/ remain in .gitignore
  • Least privilege: Restrict dataset access permissions
  • Logout on shared machines: Run huggingface-cli logout after use
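A minimal defensive pattern for scripts that consume these variables: fail fast with a clear message instead of letting a missing token surface later as an opaque 401:
# Fail fast when a required secret is absent from the environment
import os

def require_env(name: str) -> str:
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to .env")
    return value

hf_token = require_env("HF_TOKEN")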

Platform-Specific Notes

Linux (tested on Ubuntu 20.04/22.04 LTS) is the primary platform; macOS and Windows via WSL2 are also supported. On Linux:
  • Ensure the CUDA toolkit matches PyTorch requirements
  • System package installations may need sudo

Troubleshooting

If you see CUDA errors:
# Check CUDA version
nvidia-smi
nvcc --version

# Reinstall PyTorch with correct CUDA version
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu118  # For CUDA 11.8
Reduce memory usage:
# Use gradient checkpointing
python train.py gradient_checkpointing=true

# Reduce batch size
python train.py per_device_train_batch_size=1

# Enable CPU offloading
python train.py +offload
Ensure proper authentication:
# Re-authenticate
huggingface-cli logout
huggingface-cli login

# Verify token
huggingface-cli whoami
Common vLLM issues:
# Install build dependencies
sudo apt-get install python3-dev

# Try pre-built wheel
pip install https://github.com/vllm-project/vllm/releases/download/v0.8.3/vllm-0.8.3-cp311-cp311-linux_x86_64.whl
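Once vLLM installs, a quick generation against a tiny stand-in model confirms the engine runs end to end (facebook/opt-125m is just a small public model, not part of ATLAS):
# Minimal vLLM sanity check with a small model
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)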

Next Steps