
Coder

The Coder is the implementation workhorse. It writes code, runs experiments, debugs errors, and produces results — all within the boundaries set by the Planner's design.

Identity

| Property | Value |
| --- | --- |
| LLM | Codex (GPT) |
| Invocation | `omc team 1:codex:coder "task"` |
| Lifecycle | Persistent — stays alive across multiple tasks |
| Session | Named tmux session (`{prefix}-coder`) |

Why persistent?

Unlike the Writer (fresh session per section) or the Judge (stateless per invocation), the Coder keeps its tmux session alive across tasks. This preserves the development environment — conda env, working directory, running processes, terminal history. The Coder can resume exactly where it left off.
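A minimal shell sketch of how such a persistent session might be kept alive. The `omc` prefix is an assumption for illustration, not necessarily the system's actual prefix:

```shell
# Ensure the Coder's persistent tmux session exists.
# Session name follows the documented {prefix}-coder pattern; "omc" is an assumed prefix.
SESSION="omc-coder"
if command -v tmux >/dev/null 2>&1; then
  # Reuse the session if it is already alive; otherwise create it detached,
  # so environment, working directory, and history survive between tasks.
  tmux has-session -t "$SESSION" 2>/dev/null || tmux new-session -d -s "$SESSION"
  STATUS="ready"
else
  STATUS="tmux-missing"   # fall back gracefully on machines without tmux
fi
echo "$STATUS"
```

Because the session is created detached (`-d`), the Coder can be re-invoked later and reattach to exactly the same shell state.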

Responsibilities

| Responsibility | Description |
| --- | --- |
| Code implementation | Translate Planner's design into working code |
| Environment setup | Install dependencies, configure GPU, prepare data |
| Training execution | Launch and monitor training runs |
| Debugging | Fix errors in code, data pipeline, or training |
| Testing | Run tests to verify implementation correctness |
| Result extraction | Parse logs and produce structured results files |
| Figure generation | Create plots from Scout's figure descriptions |
| Experiment management | Track configs, logs, and checkpoints per experiment |

Task Flow

```mermaid
graph TD
    O[Orchestrator] -->|"task + spec"| C[Coder]
    C --> I{Task Type}
    I -->|implement| Code[Write Code]
    I -->|train| Train[Launch Training]
    I -->|debug| Debug[Fix Error]
    I -->|extract| Extract[Parse Results]

    Code --> T[Run Tests]
    T -->|pass| R[Report to Orchestrator]
    T -->|fail| Debug

    Train --> M[Monitor Start]
    M -->|healthy| R
    M -->|error| Debug

    Debug --> T2{Fixed?}
    T2 -->|yes| R
    T2 -->|no, retry < 3| Debug
    T2 -->|no, retry >= 3| E[Escalate to Orchestrator]

    Extract --> R

    style O fill:#f9f0ff,stroke:#7c3aed
    style C fill:#fef3c7,stroke:#d97706
    style E fill:#fee2e2,stroke:#dc2626
    style R fill:#dcfce7,stroke:#16a34a
```
Error Handling: The Ralph Self-Fix Loop

When the Coder encounters an error, it enters a ralph loop — a tight fix-test-retry cycle.

```text
Step 1: Coder runs code → Error: "RuntimeError: CUDA out of memory"
Step 2: Coder analyzes error → Reduces batch size in config
Step 3: Coder runs code → Error: "AssertionError: unexpected shape [64, 512]"
Step 4: Coder analyzes error → Fixes tensor reshape
Step 5: Coder runs code → Tests pass ✓
Step 6: Coder reports success to Orchestrator
```

Ralph Rules

| Rule | Detail |
| --- | --- |
| Max retries | 3 attempts per error type |
| Scope | Fix the immediate error only — no refactoring |
| Escalation | After 3 failures, stop and report to Orchestrator |
| Logging | Each attempt logged to `logs/errors.log` |

Ralph fixes, it doesn't redesign

The ralph loop is for fixing implementation errors — bugs, shape mismatches, OOM. If the Coder encounters a design problem (e.g., "this algorithm can't work because X"), it must escalate to the Orchestrator immediately. The Coder does not redesign experiments.
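The loop and its escalation rules can be sketched as a small retry driver. The function names and return values below are illustrative, not the system's actual API:

```python
import logging

MAX_RETRIES = 3  # per the ralph rules: 3 attempts per error type


class DesignProblem(Exception):
    """Raised when the failure is a design issue, not an implementation bug."""


def ralph_loop(run, fix, max_retries=MAX_RETRIES):
    """Tight fix-test-retry cycle (a sketch).

    `run` executes the code and raises on error; `fix` takes the exception
    and patches the immediate cause only — no refactoring, no redesign.
    Returns "success" or "escalate" (illustrative names).
    """
    for attempt in range(1, max_retries + 1):
        try:
            run()
            return "success"
        except DesignProblem:
            return "escalate"  # design issues go straight to the Orchestrator
        except Exception as err:
            logging.error("attempt %d failed: %s", attempt, err)
            fix(err)  # scoped fix of the immediate error only
    return "escalate"  # retry budget exhausted after max_retries attempts
```

Note the asymmetry: an ordinary exception buys another fix attempt, while a `DesignProblem` short-circuits the loop on the first occurrence, matching the rule that the Coder never redesigns experiments.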

Does Not Make Design Decisions

This is the Coder's most important constraint:

| The Coder Does | The Coder Does NOT |
| --- | --- |
| Implement the specified architecture | Choose which architecture to implement |
| Set hyperparameters per spec | Decide which hyperparameters to try |
| Run the specified experiments | Decide which experiments to run |
| Fix bugs in code | Change the experimental design |
| Report unexpected results | Interpret what unexpected results mean |
| Optimize code for speed | Change the algorithm for speed |

The Coder is an expert implementer, not a researcher

Think of the Coder as a highly skilled research engineer. You hand it a specification and it builds exactly that — efficiently, correctly, and reliably. It doesn't second-guess the research direction. If something seems wrong with the design, it reports the observation and lets the Orchestrator (the PI) decide.
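One way to picture this constraint: the spec arrives as read-only data. A hypothetical sketch, with field names invented for illustration:

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical shape of the spec the Planner hands down.
# Field names here are illustrative, not the system's actual schema.
@dataclass(frozen=True)
class TaskSpec:
    experiment_id: str
    architecture: str     # chosen by the Planner, never by the Coder
    learning_rate: float  # fixed per spec
    batch_size: int       # fixed per spec

spec = TaskSpec("exp-003", "transformer", 3e-4, 64)
# frozen=True makes every field read-only: the Coder implements the spec,
# it cannot edit it. Any change to these values is a design decision.
```

The frozen dataclass mirrors the table above: implementation reads the spec; changing it requires going back up to the Orchestrator.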

Example: Design vs. Implementation

```text
# This is an implementation fix (Coder handles it):
"RuntimeError: shape mismatch in attention projection"
→ Fix the projection dimensions

# This is a design problem (Coder escalates):
"Training converges but perplexity is 30% worse than baseline"
→ Report to Orchestrator: "Results significantly below expected.
   Perplexity: 25.1 vs expected 18.5. May need design revision."
```

Output

The Coder writes all output to disk:

| Output | Location | Format |
| --- | --- | --- |
| Code | `src/` | Python files |
| Configs | `experiments/exp-*/config.yaml` | YAML |
| Training logs | `experiments/exp-*/log.jsonl` | Structured JSONL |
| Results | `experiments/exp-*/results.yaml` | YAML |
| Figures | `papers/figures/*.pdf` | PDF plots |
| Error logs | `logs/errors.log` | Timestamped text |
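A minimal sketch of the result-extraction step, turning the JSONL training log into a flat results file. The log field names (`step`, `ppl`) are assumptions for illustration:

```python
import json

def extract_results(log_path, out_path):
    """Parse a JSONL training log and write a flat results file (a sketch).

    Assumes one JSON object per line with `step` and `ppl` keys —
    these field names are hypothetical, not the system's actual schema.
    """
    last = None
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if line:
                last = json.loads(line)  # keep only the final record
    results = {"final_step": last["step"], "final_perplexity": last["ppl"]}
    # Flat "key: value" pairs are already valid YAML,
    # so no extra dependency is needed to write results.yaml.
    with open(out_path, "w") as f:
        for key, value in results.items():
            f.write(f"{key}: {value}\n")
    return results
```

Writing results as plain YAML key-value pairs keeps them both machine-parseable and readable in the one-line summary the Orchestrator receives.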

The Orchestrator receives a one-line summary:

```text
"Experiment exp-003 training complete. Final perplexity: 18.7.
 Results in experiments/exp-003/results.yaml"
```

Next

AutoResearch — Multi-agent Deep Learning Research System