This guide shows how to use the reward system in code. For conceptual understanding, see The ATLAS Reward System.

Using the Reward System

In Training (Offline RL)

The reward system integrates seamlessly with the GRPO trainer:
from trainers.grpo import GRPOTrainer
from RIM.reward_adapter import RIMReward
from datasets import load_dataset

# 1. Instantiate reward system
reward_system = RIMReward(config_path='configs/rim_config.yaml')

# 2. Pass to trainer
trainer = GRPOTrainer(
    model="path/to/your/teacher_model",
    args=grpo_config,
    reward_funcs=[reward_system],  # Just pass it in
    train_dataset=train_dataset
)

# 3. Train - the reward system runs automatically
trainer.train()
The trainer handles calling the reward system with batches of data during the RL loop. You don’t need to manage it manually.
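If you want an auxiliary reward alongside RIMReward, a plain callable works too. The sketch below is an assumption-laden example, not part of ATLAS: it presumes a TRL-style interface in which each entry in reward_funcs receives batched prompts and completions and returns one float per completion, and that the trainer accepts a list of reward functions. Check your trainer version for the exact signature.

# Hypothetical auxiliary reward (illustrative only): assumes the trainer calls each
# reward function with batched prompts/completions and expects one float per completion.
def length_penalty_reward(prompts, completions, **kwargs):
    return [max(0.0, 1.0 - len(c) / 4000) for c in completions]

trainer = GRPOTrainer(
    model="path/to/your/teacher_model",
    args=grpo_config,
    reward_funcs=[reward_system, length_penalty_reward],  # trainers that accept a list combine the rewards
    train_dataset=train_dataset
)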

For Ad-hoc Evaluation

Quick evaluation of teaching effectiveness:
from RIM.reward_adapter import RIMReward

# Create reward system
reward = RIMReward(config_path='configs/rim_config.yaml')

# Evaluate a single interaction
result = reward.evaluate({
    'question': 'What is 2+2?',
    'baseline_response': 'It is 4',
    'taught_response': 'The answer is 4 because 2 plus 2 equals 4',
    'teaching': 'Explain your reasoning step by step'
})

print(f"Accuracy: {result['accuracy']}")
print(f"Helpfulness: {result['helpfulness']}")
print(f"Improvement: {result['helpfulness'] - result['baseline_accuracy']}")

In Continual Learning

In the SDK runtime, the same reward signals drive continual learning loops and help teams decide when to export traces for GRPO training. See the atlas-sdk documentation for details on wiring reward feedback into production orchestration.
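As an illustration of that gating decision (not the SDK API itself), the sketch below reuses the RIMReward scores from the previous section to decide whether an interaction is worth exporting as a GRPO training trace. The JSONL file, the helper name, and the 0.75 threshold are stand-ins for whatever your pipeline actually uses.

import json

EXPORT_THRESHOLD = 0.75  # illustrative cutoff; tune for your workload

def maybe_export_trace(interaction, reward, path='traces_for_grpo.jsonl'):
    # Score the interaction with the same reward system used in training
    scores = reward.evaluate(interaction)
    if scores['helpfulness'] >= EXPORT_THRESHOLD:
        with open(path, 'a') as f:
            f.write(json.dumps({'interaction': interaction, 'scores': scores}) + '\n')
        return True
    return False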

Customizing Judges

Advanced Configuration: This section is for users who need custom evaluation criteria. Most users can use the default judges.

Modifying Existing Judges

Judge behavior is controlled by prompts defined in RIM/judges.py. To change what the AccuracyJudge prioritizes, edit its prompt builder:
# RIM/judges.py
class AccuracyJudge:
    def _build_prompt(self, inputs: Dict[str, Any]) -> str:
        # Customize this string to change evaluation criteria
        return f"""Evaluate these responses.

Prompt: {inputs.get('prompt', '')}
Response A: {inputs.get('response_a', '')}
Response B: {inputs.get('response_b', '')}

Step 1: Generate 2-3 evaluation principles with weights (must sum to 1.0)
Step 2: Score both responses against each principle
Step 3: Provide final scores (0.0 to 1.0)

Output JSON only: {{"principles": [...], "score_a": float, "score_b": float, "uncertainty": float}}"""
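A quick way to check a prompt edit before running a full evaluation is to build the prompt directly and inspect it. This assumes AccuracyJudge can be constructed without arguments and uses the input keys shown above; adjust if your constructor differs.

# Sanity-check a customized prompt (assumes a no-argument constructor)
judge = AccuracyJudge()
print(judge._build_prompt({
    'prompt': 'What is 2+2?',
    'response_a': 'It is 4',
    'response_b': 'The answer is 4 because 2 plus 2 equals 4'
}))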

Adding a New Judge

Step 1: Create judge class (RIM/judges.py):
class CreativityJudge:
    def __init__(self):
        self.name = 'creativity'

    def evaluate(self, inputs: Dict[str, Any], model_fn, temperature: float):
        # Keep the prompt flush-left so no stray indentation leaks into the model input
        prompt = f"""Score creativity (0.0 = formulaic, 1.0 = highly creative).
Response: {inputs.get('response', '')}
Output JSON: {{"score": float, "rationale": str, "uncertainty": float}}"""

        response = model_fn(prompt, temperature)
        return json.loads(response)  # requires `import json` at the top of RIM/judges.py
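Before wiring the judge into the adapter, you can smoke-test it with a stubbed model_fn so the prompt/parse round trip runs without calling a real model. The stub and its JSON payload are hypothetical; they only mirror the shape the judge's prompt requests.

def stub_model_fn(prompt, temperature):
    # Canned response matching the JSON shape the judge's prompt asks for
    return '{"score": 0.6, "rationale": "Some novel phrasing, conventional structure.", "uncertainty": 0.2}'

judge = CreativityJudge()
print(judge.evaluate({'response': 'The answer is 4 because 2 plus 2 equals 4'},
                     stub_model_fn, temperature=0.0))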
Step 2: Register in reward adapter (RIM/reward_adapter.py):
from RIM.judges import AccuracyJudge, HelpfulnessJudge, CreativityJudge

class RIMReward:
    def __init__(self, ...):
        self.judges = {
            'accuracy': AccuracyJudge(),
            'helpfulness': HelpfulnessJudge(),
            'creativity': CreativityJudge()  # Add here
        }
Step 3: Enable in config (configs/rim_config.yaml):
active_judges:
  accuracy: true
  helpfulness: true
  creativity: true  # Enable new judge
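For reference, the relationship between these flags and the judge registry can be sketched as below. This is illustrative only: it assumes a PyYAML-style config load, and the actual RIMReward implementation may consume the config differently.

import yaml

def load_active_judges(config_path, all_judges):
    # all_judges: name -> judge instance, as built in RIMReward.__init__
    with open(config_path) as f:
        config = yaml.safe_load(f)
    flags = config.get('active_judges', {})
    return {name: judge for name, judge in all_judges.items() if flags.get(name, False)}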

Performance & Monitoring

RewardBench V2 Results

The ensemble-and-escalation architecture achieves 93.7% overall accuracy, significantly outperforming individual models:
  • Component model (gemini-2.5-flash): 77.7% on its own
  • System performance: 93.7% (+16 points)
The architecture creates a result greater than the sum of its parts.
Figure: ATLAS Reward System Leaderboard

Category Breakdown

Figure: Performance by Category
See the complete Reward System Technical Report for full analysis.

Monitoring Rewards During Training

The training logs include reward system outputs:
# Example log entry
{
  'step': 150,
  'rim_rewards': {
    'accuracy': 0.85,
    'helpfulness': 0.72,
    'process': 0.78,
    'diagnostic': 0.80
  },
  'rim_explanations': {
    'accuracy': 'Response correctly solves the problem with proper units',
    'helpfulness': 'Teaching improved reasoning structure significantly'
  },
  'escalation_rate': 0.23  # 23% of cases went to Tier 2
}
Monitor these to:
  • Spot prompt regressions (dropping helpfulness scores)
  • Identify misconfigured thresholds (escalation rate too high/low)
  • Validate teaching improvements (rising scores over time)
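A minimal sketch of such a check, assuming log entries shaped like the example above (a dict with 'rim_rewards' and 'escalation_rate'); the thresholds are illustrative and should be tuned to your run:

def check_log_entry(entry, max_escalation=0.40, min_helpfulness=0.50):
    # Flag entries that suggest misconfigured thresholds or prompt regressions
    warnings = []
    if entry.get('escalation_rate', 0.0) > max_escalation:
        warnings.append(f"High escalation rate: {entry['escalation_rate']:.2f}")
    helpfulness = entry.get('rim_rewards', {}).get('helpfulness', 1.0)
    if helpfulness < min_helpfulness:
        warnings.append(f"Low helpfulness: {helpfulness:.2f}")
    return warnings

print(check_log_entry({'step': 150, 'rim_rewards': {'helpfulness': 0.72}, 'escalation_rate': 0.23}))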
