Coder
The Coder is the implementation workhorse. It writes code, runs experiments, debugs errors, and produces results — all within the boundaries set by the Planner's design.
Identity
| Property | Value |
|---|---|
| LLM | Codex (GPT) |
| Invocation | omc team 1:codex:coder "task" |
| Lifecycle | Persistent — stays alive across multiple tasks |
| Session | Named tmux session ({prefix}-coder) |
Why persistent?
Unlike the Writer (fresh session per section) or the Judge (stateless per invocation), the Coder keeps its tmux session alive across tasks. This preserves the development environment — conda env, working directory, running processes, terminal history. The Coder can resume exactly where it left off.
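The persistence contract can be sketched with the standard tmux CLI (`tmux has-session` exits non-zero when the session is missing; `tmux new-session -d` creates it detached). The `{prefix}-coder` naming comes from the Identity table above; the helper names themselves are hypothetical, not part of the system:

```python
import subprocess

def session_name(prefix: str) -> str:
    """Session naming per the Identity table: {prefix}-coder."""
    return f"{prefix}-coder"

def ensure_coder_session(prefix: str) -> str:
    """Reuse the Coder's tmux session if alive, else create it detached.

    A detached session keeps the conda env, working directory, and any
    running processes alive between tasks, so the Coder resumes where
    it left off.
    """
    session = session_name(prefix)
    # `tmux has-session` returns non-zero when the session does not exist.
    exists = subprocess.run(
        ["tmux", "has-session", "-t", session], capture_output=True
    ).returncode == 0
    if not exists:
        subprocess.run(["tmux", "new-session", "-d", "-s", session], check=True)
    return session
```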
Responsibilities
| Responsibility | Description |
|---|---|
| Code implementation | Translate Planner's design into working code |
| Environment setup | Install dependencies, configure GPU, prepare data |
| Training execution | Launch and monitor training runs |
| Debugging | Fix errors in code, data pipeline, or training |
| Testing | Run tests to verify implementation correctness |
| Result extraction | Parse logs and produce structured results files |
| Figure generation | Create plots from Scout's figure descriptions |
| Experiment management | Track configs, logs, and checkpoints per experiment |
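Result extraction (structured JSONL logs in, structured results out) might look like the sketch below. The per-line JSON layout and the "perplexity" field are assumptions about what a training script emits, not a documented schema:

```python
import json
from pathlib import Path

def final_metric(log_path: Path, key: str = "perplexity") -> float:
    """Return the last logged value of `key` from a JSONL training log.

    Each line of log.jsonl is assumed to be one JSON object; the metric
    field name is illustrative.
    """
    value = None
    with log_path.open() as f:
        for line in f:
            record = json.loads(line)
            if key in record:
                value = record[key]  # keep overwriting: last occurrence wins
    if value is None:
        raise KeyError(f"{key!r} never appeared in {log_path}")
    return value
```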
Task Flow
graph TD
O[Orchestrator] -->|"task + spec"| C[Coder]
C --> I{Task Type}
I -->|implement| Code[Write Code]
I -->|train| Train[Launch Training]
I -->|debug| Debug[Fix Error]
I -->|extract| Extract[Parse Results]
Code --> T[Run Tests]
T -->|pass| R[Report to Orchestrator]
T -->|fail| Debug
Train --> M[Monitor Start]
M -->|healthy| R
M -->|error| Debug
Debug --> T2{Fixed?}
T2 -->|yes| R
T2 -->|no, retry < 3| Debug
T2 -->|no, retry >= 3| E[Escalate to Orchestrator]
Extract --> R
style O fill:#f9f0ff,stroke:#7c3aed
style C fill:#fef3c7,stroke:#d97706
style E fill:#fee2e2,stroke:#dc2626
style R fill:#dcfce7,stroke:#16a34a

Error Handling: The Ralph Self-Fix Loop
When the Coder encounters an error, it enters a ralph loop — a tight fix-test-retry cycle.
Step 1: Coder runs code → Error: "RuntimeError: CUDA out of memory"
Step 2: Coder analyzes error → Reduces batch size in config
Step 3: Coder runs code → Error: "AssertionError: unexpected shape [64, 512]"
Step 4: Coder analyzes error → Fixes tensor reshape
Step 5: Coder runs code → Tests pass ✓
Step 6: Coder reports success to Orchestrator

Ralph Rules
| Rule | Detail |
|---|---|
| Max retries | 3 attempts per error type |
| Scope | Fix the immediate error only — no refactoring |
| Escalation | After 3 failures, stop and report to Orchestrator |
| Logging | Each attempt logged to logs/errors.log |
Ralph fixes, it doesn't redesign
The ralph loop is for fixing implementation errors — bugs, shape mismatches, OOM. If the Coder encounters a design problem (e.g., "this algorithm can't work because X"), it must escalate to the Orchestrator immediately. The Coder does not redesign experiments.
Does Not Make Design Decisions
This is the Coder's most important constraint:
| The Coder Does | The Coder Does NOT |
|---|---|
| Implement the specified architecture | Choose which architecture to implement |
| Set hyperparameters per spec | Decide which hyperparameters to try |
| Run the specified experiments | Decide which experiments to run |
| Fix bugs in code | Change the experimental design |
| Report unexpected results | Interpret what unexpected results mean |
| Optimize code for speed | Change the algorithm for speed |
The Coder is an expert implementer, not a researcher
Think of the Coder as a highly skilled research engineer. You hand it a specification and it builds exactly that — efficiently, correctly, and reliably. It doesn't second-guess the research direction. If something seems wrong with the design, it reports the observation and lets the Orchestrator (the PI) decide.
Example: Design vs. Implementation
# This is an implementation fix (Coder handles it):
"RuntimeError: shape mismatch in attention projection"
→ Fix the projection dimensions
# This is a design problem (Coder escalates):
"Training converges but perplexity is 30% worse than baseline"
→ Report to Orchestrator: "Results significantly below expected.
Perplexity: 25.1 vs expected 18.5. May need design revision."

Output
The Coder writes all output to disk:
| Output | Location | Format |
|---|---|---|
| Code | src/ | Python files |
| Configs | experiments/exp-*/config.yaml | YAML |
| Training logs | experiments/exp-*/log.jsonl | Structured JSONL |
| Results | experiments/exp-*/results.yaml | YAML |
| Figures | papers/figures/*.pdf | PDF plots |
| Error logs | logs/errors.log | Timestamped text |
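A results.yaml might look like the fragment below. The source fixes only the location and format, so every field name here is illustrative:

```yaml
# experiments/exp-003/results.yaml — hypothetical schema, not a fixed format
experiment: exp-003
status: complete
metrics:
  perplexity: 18.7      # final value extracted from log.jsonl
config: config.yaml     # the config this run used
```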
The Orchestrator receives a one-line summary:
"Experiment exp-003 training complete. Final perplexity: 18.7.
Results in experiments/exp-003/results.yaml"

Next
- Judge — who reviews the Coder's work
- Planner — who writes the Coder's specifications
- Implementation Stage — where the Coder is most active