Overview

The teacher-student paradigm in ATLAS establishes an asymmetric learning relationship where a specialized teacher model enhances any student model’s capabilities without modifying the student’s weights or architecture.

Core Architecture

Model Roles

class TeacherStudentSystem:
    """
    Asymmetric model interaction for performance enhancement

    Teacher: 8B parameter model trained for adaptive teaching
    Student: Any LLM (4B-70B+) requiring enhancement
    """

    def __init__(self, teacher_model, student_model):
        self.teacher = teacher_model  # ATLAS-8B-Thinking or ATLAS-8B-Instruct
        self.student = student_model  # GPT-4, Claude, Llama, etc.

    def enhance_response(self, prompt: str) -> EnhancedResponse:
        # Step 1: Diagnostic probe
        student_capability = self.teacher.assess_capability(prompt, self.student)

        # Step 2: Adaptive guidance
        teaching_strategy = self.teacher.generate_guidance(
            prompt=prompt,
            capability_level=student_capability
        )

        # Step 3: Enhanced generation
        enhanced_output = self.student.generate(
            prompt=prompt,
            guidance=teaching_strategy
        )

        baseline_output = self.student.generate(prompt)
        return EnhancedResponse(
            baseline=baseline_output,
            enhanced=enhanced_output,
            improvement=self.measure_improvement(baseline_output, enhanced_output)
        )
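The `CapabilityLevel` and `EnhancedResponse` types referenced above are not defined in the snippet; a minimal sketch of what they might look like (names and fields assumed from usage, not part of a published API):

```python
from dataclasses import dataclass
from enum import Enum


class CapabilityLevel(Enum):
    """Student competence tiers returned by the teacher's diagnostic probe."""
    WEAK = "weak"
    MODERATE = "moderate"
    STRONG = "strong"


@dataclass
class EnhancedResponse:
    """Pairs the unguided and guided generations with a scalar improvement."""
    baseline: str
    enhanced: str
    improvement: float
```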

Technical Specifications

Model Requirements

| Component | Specification | Purpose |
|---|---|---|
| Teacher Model | 8B parameters, RL-trained | Provides adaptive guidance |
| Student Model | Any size (4B-70B+) | Executes enhanced reasoning |
| Context Window | 4096-32768 tokens | Accommodates teaching interaction |
| Inference Time | +30% overhead | Two-pass protocol cost |
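The +30% figure reflects the two-pass protocol: a short diagnostic pass plus guidance generation on top of the student's normal answer. A rough back-of-the-envelope sketch (token counts illustrative, assuming cost scales with tokens generated):

```python
def two_pass_overhead(probe_tokens: int, guidance_tokens: int,
                      baseline_tokens: int) -> float:
    """Fractional latency overhead of teach-then-generate, assuming cost
    scales roughly with the total tokens generated across both models."""
    return (probe_tokens + guidance_tokens) / baseline_tokens


# A 50-token probe plus ~150 tokens of guidance on a ~650-token answer
# lands near the documented +30% overhead.
overhead = two_pass_overhead(50, 150, 650)
```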

Capability Assessment

The teacher evaluates student competence through targeted probes:
def assess_capability(self, task: str, student: Model) -> CapabilityLevel:
    """
    Diagnostic assessment of student's task-specific capability

    Returns:
        CapabilityLevel: Enum of WEAK, MODERATE, STRONG
    """
    # Generate diagnostic probe (≤50 tokens)
    probe = self.generate_probe(task)

    # Collect student response
    response = student.generate(probe, max_tokens=50)

    # Analyze response quality
    indicators = {
        'reasoning_depth': self.analyze_reasoning(response),
        'domain_knowledge': self.check_domain_terms(response),
        'problem_structure': self.evaluate_structure(response),
        'confidence': self.estimate_confidence(response)
    }

    # Classify capability level from the averaged indicator score
    score = sum(indicators.values()) / len(indicators)
    if score < 0.3:
        return CapabilityLevel.WEAK
    elif score < 0.7:
        return CapabilityLevel.MODERATE
    else:
        return CapabilityLevel.STRONG

Adaptive Teaching Strategies

Strategy Selection Matrix

| Student Capability | Teaching Strategy | Guidance Tokens | Focus |
|---|---|---|---|
| WEAK | Comprehensive scaffolding | 200-300 | Step-by-step decomposition |
| MODERATE | Targeted hints | 100-150 | Key insights and corrections |
| STRONG | Minimal intervention | 50-100 | Edge case handling only |
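The token budgets in the matrix can be enforced with a simple lookup that callers pass as `max_tokens` when generating guidance. A sketch (the dictionary and helper are illustrative; the budget values mirror the table):

```python
# (min, max) guidance token budgets per capability level, from the matrix above
GUIDANCE_BUDGET = {
    "WEAK": (200, 300),      # comprehensive scaffolding
    "MODERATE": (100, 150),  # targeted hints
    "STRONG": (50, 100),     # minimal intervention
}


def guidance_max_tokens(capability: str) -> int:
    """Upper bound on guidance length for a given capability level."""
    return GUIDANCE_BUDGET[capability][1]
```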

Implementation Example

def generate_guidance(self, task: str, capability: CapabilityLevel) -> str:
    """
    Generate capability-appropriate teaching guidance
    """
    if capability == CapabilityLevel.WEAK:
        return self.comprehensive_scaffolding(task)
    elif capability == CapabilityLevel.MODERATE:
        return self.targeted_hints(task)
    else:  # STRONG
        return self.minimal_intervention(task)

def comprehensive_scaffolding(self, task: str) -> str:
    """Full problem decomposition for weak students"""
    return f"""
    Break this problem into steps:
    1. Identify the core requirement: {self.extract_requirement(task)}
    2. Gather necessary information: {self.list_prerequisites(task)}
    3. Apply systematic approach: {self.generate_methodology(task)}
    4. Verify solution: {self.create_verification(task)}
    """

def targeted_hints(self, task: str) -> str:
    """Key insights for moderate students"""
    return f"""
    Key insight: {self.identify_critical_insight(task)}
    Common pitfall: {self.highlight_common_error(task)}
    """

def minimal_intervention(self, task: str) -> str:
    """Edge case awareness for strong students"""
    return f"""
    Consider edge case: {self.identify_edge_case(task)}
    """

Empirical Performance

τ²-bench Results (Dual-Control Environment)

Our system was evaluated on τ²-bench’s most complex mms_issue tasks, establishing state-of-the-art performance:
| System | Pass@1 Rate | Pass@4 Rate | Notes |
|---|---|---|---|
| ATLAS Teacher-Student | 24.0% | 22.4% | Minimal degradation |
| GPT-4.1 | 18.0% | 10.0% | -8pt drop |
| Claude 3.7 Sonnet | 18.0% | 2.0% | -16pt drop |
| o4-mini | 12.0% | 2.0% | -10pt drop |
| Qwen3-8B (Student Only) | 4.1% | - | No teacher guidance |

Key Performance Metrics (from README)

  • Average accuracy improvement: 15.7% across tasks
  • Maximum improvement: 29.6% on specific domains
  • Completion rate: 31-point improvement (69% → 100%)
  • Token efficiency: ~50% reduction (4k → 2k tokens)
  • Non-degradation rate: 97%
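Aggregates like these can be reproduced from paired per-task scores. A sketch of the bookkeeping (the score arrays below are hypothetical, not from the benchmark):

```python
def summarize(baseline_scores, enhanced_scores):
    """Compute headline metrics from paired per-task accuracy scores."""
    deltas = [e - b for b, e in zip(baseline_scores, enhanced_scores)]
    return {
        "avg_improvement": sum(deltas) / len(deltas),
        "max_improvement": max(deltas),
        # Fraction of tasks where guidance did not hurt performance
        "non_degradation_rate": sum(d >= 0 for d in deltas) / len(deltas),
    }


stats = summarize([0.50, 0.40, 0.62], [0.65, 0.70, 0.60])
```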

Key Observations

  1. 6x performance lift: Teacher guidance improves Qwen3-8B from 4.1% to 24.0%
  2. Consistency advantage: Minimal pass@4 degradation vs competitors
  3. Cross-domain transfer: Math-trained teacher successfully guides telecom tasks

Case Study: Mathematical Reasoning

Demonstrating teacher-student interaction on a complex problem:

Task

“A bacteria culture doubles every 3 hours. Starting with 100 bacteria, how many will there be after 15 hours?”

Interaction Flow

Diagnostic Response (student, unguided): “100 × 2 × 5 = 1000”

Teacher Guidance:
This is exponential growth, not linear multiplication.
Steps:
1. Find the number of doubling periods: 15 ÷ 3 = 5
2. Apply the exponential formula: Initial × 2^periods
3. Calculate: 100 × 2^5 = 100 × 32 = 3200

Enhanced Response: “3200 bacteria (100 × 2^5)”
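The guided answer checks out directly: 15 hours spans five doubling periods, so the population is 100 × 2^5.

```python
def bacteria_after(initial: int, doubling_hours: int, elapsed_hours: int) -> int:
    """Exponential growth: initial * 2^(elapsed / doubling period)."""
    periods = elapsed_hours // doubling_hours
    return initial * 2 ** periods


print(bacteria_after(100, 3, 15))  # → 3200
```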

Integration Patterns

Pattern 1: Direct Enhancement

# Using optimize_teaching.py for enhancement
from trainers.prompt_adapter import ATLASGEPAAdapter

adapter = ATLASGEPAAdapter(
    teacher_model=teacher_generate_fn,
    student_model=student_generate_fn,
    all_prompts=optimized_prompts
)
result = adapter.evaluate([{"question": user_query}])
enhanced_output = result.outputs[0]["student_with_teaching"]

Pattern 2: Batch Processing

# Efficient processing of multiple queries
queries = [{"question": q} for q in query_list]
results = adapter.evaluate(
    queries,
    capture_traces=True  # For analysis
)
enhanced_outputs = [r["student_with_teaching"] for r in results.outputs]

Pattern 3: Streaming Applications

# Real-time enhancement for chat applications
class StreamingATLAS:
    def __init__(self, teacher, student):
        self.teacher = teacher
        self.student = student
        self.guidance_cache = {}

    async def stream_enhanced(self, prompt: str):
        # Get guidance once
        if prompt not in self.guidance_cache:
            self.guidance_cache[prompt] = await self.teacher.get_guidance(prompt)

        # Stream enhanced response
        async for token in self.student.stream(prompt, self.guidance_cache[prompt]):
            yield token

Advantages

Over Fine-tuning

  • No retraining required: Works with frozen student models
  • Preserves capabilities: No catastrophic forgetting
  • Instant deployment: No training time or cost

Over Prompting

  • Adaptive: Adjusts to student capability
  • Consistent: Systematic improvement approach
  • Efficient: Optimized token usage

Over Ensemble Methods

  • Lower latency: Single student inference
  • Lower cost: No multiple model calls
  • Better interpretability: Clear teaching rationale

Implementation Best Practices

Choose a teacher based on task type:
  • ATLAS-8B-Thinking: Mathematical and logical reasoning
  • ATLAS-8B-Instruct: Code generation and technical tasks
  • Custom trained: Domain-specific requirements

Verify the student model supports:
  • System prompts or instruction following
  • Sufficient context length (>4K tokens)
  • Deterministic generation (temperature control)

Optimize runtime costs:
  • Cache teacher guidance for repeated queries
  • Batch similar tasks together
  • Use streaming for interactive applications
  • Monitor token usage for cost control
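The student-compatibility checks can be automated before deployment. A sketch (the function name, parameters, and thresholds are illustrative, not part of the ATLAS API):

```python
from typing import List, Optional


def check_student_compat(context_window: int,
                         supports_instructions: bool,
                         temperature: Optional[float]) -> List[str]:
    """Return a list of compatibility problems; empty means ready to deploy."""
    problems = []
    if context_window <= 4096:
        problems.append("context length must exceed 4K tokens")
    if not supports_instructions:
        problems.append("student must follow system prompts / instructions")
    if temperature is None:
        problems.append("deterministic generation requires temperature control")
    return problems
```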
