Stage 7: Review

The review stage subjects the paper draft to a simulated peer review by a three-model review panel. Based on the reviews, the system produces a revision plan and can loop back to earlier stages (Implementation through Writing) for revisions.

Entering This Stage

What you have:

  • Complete paper draft (papers/drafts/current.tex)
  • All figures and tables
  • Complete bibliography
  • Analysis and results on disk

What you don't have yet:

  • Peer review feedback
  • Revision plan
  • Final camera-ready paper

Steps

mermaid
graph TD
    A[1. Review Dispatch] --> B[2. Three-Model Review]
    B --> C[3. Meta-Review]
    C --> D[4. Revision Planning]
    D --> E{Gate}
    E -->|accept| F[Done: Paper Complete]
    E -->|revise| G[5. Revision Execution]
    G --> H[6. Re-Review]
    H --> C

    style B fill:#fef3c7,stroke:#d97706
    style C fill:#f9f0ff,stroke:#7c3aed
    style F fill:#dcfce7,stroke:#16a34a
    style G fill:#ede9fe,stroke:#7c3aed

1. Review Dispatch

Agent: Orchestrator (Claude Opus)

The Orchestrator prepares the paper for review:

  • Compiles the full PDF (if the paper is LaTeX)
  • Extracts the paper text for LLM consumption
  • Prepares review instructions tailored to each reviewer
  • Sets venue-specific evaluation criteria
yaml
# Review dispatch configuration
venue: "ICML 2025"
review_format: "ICML 2025 reviewer guidelines"
criteria:
  - novelty
  - technical_correctness
  - experimental_thoroughness
  - clarity_of_writing
  - significance_of_results
  - reproducibility
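The dispatch step above can be sketched in Python. The config is inlined as a dict mirroring the YAML, and the per-reviewer focus mapping and function names are illustrative assumptions, not the pipeline's actual schema:

```python
# Sketch of review dispatch: build per-reviewer instructions from the
# dispatch configuration. REVIEWER_FOCUS and build_instructions are
# hypothetical names for illustration.
DISPATCH = {
    "venue": "ICML 2025",
    "review_format": "ICML 2025 reviewer guidelines",
    "criteria": [
        "novelty", "technical_correctness", "experimental_thoroughness",
        "clarity_of_writing", "significance_of_results", "reproducibility",
    ],
}

# Each reviewer shares the venue criteria but gets its own focus areas.
REVIEWER_FOCUS = {
    "codex": ["technical_correctness", "experimental_thoroughness", "reproducibility"],
    "claude": ["clarity_of_writing", "novelty"],
    "gemini": ["significance_of_results"],
}

def build_instructions(reviewer: str) -> str:
    focus = ", ".join(REVIEWER_FOCUS[reviewer])
    return (
        f"Review this paper for {DISPATCH['venue']} following "
        f"{DISPATCH['review_format']}. Pay particular attention to: {focus}."
    )

print(build_instructions("codex"))
```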

2. Three-Model Review

Agents: Codex (Judge), Claude (sub-agent), Gemini (Scout session)

Three independent reviews are conducted in parallel:

Reviewer | LLM | Focus | Invocation
Technical Reviewer | Codex | Correctness, experiments, reproducibility | codex exec (stateless)
Clarity Reviewer | Claude Opus | Writing quality, argument strength, novelty | Sub-agent (fresh context)
Positioning Reviewer | Gemini | Related work, broader impact, positioning | Tmux worker

Each reviewer is independent

No reviewer sees the other reviewers' feedback. Each evaluates the paper from a fresh perspective with different focus areas. This mirrors real peer review where reviewers work independently.
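The independent, parallel fan-out can be sketched as follows. The `run_reviewer` stub stands in for the real invocations (`codex exec`, a Claude sub-agent, a Gemini tmux worker), which this sketch does not implement:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-reviewer entry point; the real pipeline would shell out
# to `codex exec`, spawn a sub-agent, or drive a tmux worker here.
def run_reviewer(name: str, paper_text: str) -> dict:
    # Each reviewer sees only the paper text, never the other reviews.
    return {"reviewer": name, "overall": "BORDERLINE", "weaknesses": []}

def run_review_panel(paper_text: str) -> dict:
    reviewers = ["codex", "claude", "gemini"]
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {name: pool.submit(run_reviewer, name, paper_text)
                   for name in reviewers}
    # Results are collected only after all three reviews finish independently.
    return {name: fut.result() for name, fut in futures.items()}

reviews = run_review_panel("...paper text...")
```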

Each reviewer outputs a structured review:

yaml
# reviews/paper_reviews/codex_review.yaml
reviewer: codex
role: technical_reviewer
overall: WEAK_ACCEPT  # STRONG_REJECT / WEAK_REJECT / BORDERLINE / WEAK_ACCEPT / STRONG_ACCEPT

strengths:
  - "Clean experimental setup with 4 strong baselines"
  - "Thorough ablation study isolating each component"
  - "Statistical significance reported for all comparisons"

weaknesses:
  - "Missing comparison with Mamba (released after submission cutoff)"
  - "Wall-clock training time not reported"
  - "Error bars missing from throughput measurements in Table 2"

questions:
  - "How does the method scale beyond 32k tokens?"
  - "What is the performance impact on downstream tasks?"

detailed_comments:
  - section: "Method"
    line: "Equation 3"
    comment: "The recurrence relation should be stated more formally"
  - section: "Experiments"
    line: "Table 1"
    comment: "Add inference latency column"

confidence: 4  # 1 (low) to 5 (expert)
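Before aggregation, each structured review can be checked against the expected shape. This validator is a sketch; field names follow the example YAML above, and the function name is an assumption:

```python
# Minimal structural validation of a reviewer output shaped like the
# example above. Returns a list of problems (empty means the review is
# well-formed).
ALLOWED_OVERALL = {"STRONG_REJECT", "WEAK_REJECT", "BORDERLINE",
                   "WEAK_ACCEPT", "STRONG_ACCEPT"}

def validate_review(review: dict) -> list[str]:
    errors = []
    if review.get("overall") not in ALLOWED_OVERALL:
        errors.append(f"bad overall: {review.get('overall')}")
    if not 1 <= review.get("confidence", 0) <= 5:
        errors.append("confidence must be 1-5")
    for field in ("strengths", "weaknesses", "questions"):
        if not isinstance(review.get(field), list):
            errors.append(f"{field} must be a list")
    return errors

review = {"overall": "WEAK_ACCEPT", "confidence": 4,
          "strengths": [], "weaknesses": [], "questions": []}
assert validate_review(review) == []
```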

3. Meta-Review

Agent: Orchestrator (Claude Opus)

The Orchestrator aggregates the three reviews into a meta-review:

yaml
# reviews/meta_review.yaml
verdict: REVISE  # ACCEPT / REVISE / REJECT

review_summary:
  codex: WEAK_ACCEPT
  claude: WEAK_ACCEPT
  gemini: BORDERLINE

consensus_strengths:
  - "Strong experimental methodology (3/3 agree)"
  - "Clear writing in Method section (2/3 agree)"

consensus_weaknesses:
  - "Missing recent baseline (Mamba) — raised by 2/3"
  - "Insufficient error reporting — raised by 3/3"

critical_issues:
  - id: 1
    issue: "No Mamba comparison"
    severity: major
    raised_by: [codex, gemini]
    
  - id: 2
    issue: "Missing error bars in throughput"
    severity: minor
    raised_by: [codex, claude, gemini]

  - id: 3
    issue: "Downstream task evaluation absent"
    severity: major
    raised_by: [codex]

overall_assessment: |
  Paper presents a solid efficiency contribution but has two major gaps: 
  missing Mamba comparison and no downstream evaluation. Recommend 
  revision to address these before submission.

The meta-review is honest

The Orchestrator does not cherry-pick positive feedback. If two out of three reviewers raise an issue, it's flagged as a consensus weakness. The goal is to improve the paper, not to confirm its quality.
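The consensus tally can be sketched as a simple count, assuming reviews shaped like the examples above. Exact-string matching of weaknesses is a simplification here; the real Orchestrator reasons over free-text feedback:

```python
from collections import Counter

# Flag any weakness raised by at least two of the three reviewers,
# formatted like the consensus entries in the meta-review above.
def consensus_weaknesses(reviews: dict[str, dict], panel_size: int = 3) -> list[str]:
    counts = Counter()
    for review in reviews.values():
        counts.update(set(review["weaknesses"]))  # one vote per reviewer
    return [f"{w} (raised by {n}/{panel_size})"
            for w, n in counts.most_common() if n >= 2]

reviews = {
    "codex":  {"weaknesses": ["no Mamba comparison", "missing error bars"]},
    "claude": {"weaknesses": ["missing error bars"]},
    "gemini": {"weaknesses": ["no Mamba comparison", "missing error bars"]},
}
print(consensus_weaknesses(reviews))
# ['missing error bars (raised by 3/3)', 'no Mamba comparison (raised by 2/3)']
```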

4. Revision Planning

Agent: Planner (Claude Opus, sub-agent)

The Planner creates a revision plan from the meta-review:

markdown
# reviews/revision_plan.md
## Revision Plan

### Must Address (Major Issues)
1. **Add Mamba comparison** [raised by 2/3 reviewers]
   - Action: Coder implements Mamba baseline + runs experiment
   - Estimated time: 1 day (implementation) + 1 day (training)
   - Affected sections: Tables 1-2, Section 4.1

2. **Add downstream evaluation** [raised by 1/3, but valid]
   - Action: Coder runs on GLUE benchmark
   - Estimated time: 0.5 days
   - Affected sections: New Table 3, Section 4.4

### Should Address (Minor Issues)
3. **Add error bars to throughput** [raised by 3/3]
   - Action: Coder re-runs throughput benchmark 3x
   - Estimated time: 2 hours
   - Affected sections: Table 2

4. **Formalize recurrence relation** [raised by 1/3]
   - Action: Writer revises Equation 3
   - Affected sections: Section 3.2

### Will Not Address (with rebuttal)
5. **32k+ token scaling** [raised by 1/3]
   - Rebuttal: Outside compute budget, acknowledged as limitation
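The ordering behind such a plan can be sketched from the meta-review's critical issues: majors before minors, and within a severity, issues raised by more reviewers first. The field names follow the meta-review example; the ranking rule itself is an illustrative assumption:

```python
# Sketch of revision-plan prioritization over meta-review critical_issues.
SEVERITY_RANK = {"major": 0, "minor": 1}

def plan_order(issues: list[dict]) -> list[dict]:
    # Majors first; ties broken by how many reviewers raised the issue.
    return sorted(issues, key=lambda i: (SEVERITY_RANK[i["severity"]],
                                         -len(i["raised_by"])))

issues = [
    {"id": 2, "issue": "Missing error bars", "severity": "minor",
     "raised_by": ["codex", "claude", "gemini"]},
    {"id": 1, "issue": "No Mamba comparison", "severity": "major",
     "raised_by": ["codex", "gemini"]},
    {"id": 3, "issue": "No downstream eval", "severity": "major",
     "raised_by": ["codex"]},
]
assert [i["id"] for i in plan_order(issues)] == [1, 3, 2]
```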

5. Revision Execution

The pipeline loops back to relevant stages:

mermaid
graph LR
    RP[Revision Plan] --> I[Implementation<br/>Mamba baseline]
    RP --> T[Training<br/>New experiments]
    RP --> A[Analysis<br/>Updated results]
    RP --> W[Writing<br/>Revised sections]

    I --> T
    T --> A
    A --> W
    W --> RR[Re-Review]

    style RP fill:#fef3c7,stroke:#d97706
    style RR fill:#dbeafe,stroke:#2563eb

Targeted revisions, not full re-run

The revision only re-runs what's needed. If the revision plan calls for adding a Mamba baseline, only the Mamba experiment is run — not all experiments from scratch. Similarly, only affected paper sections are rewritten.
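One way to compute the targeted re-run set, sketched under illustrative assumptions (the action names and the action-to-stage mapping are not the pipeline's actual schema):

```python
# Map each revision action to the pipeline stages it touches, then re-run
# only the union, in pipeline order.
ACTION_STAGES = {
    "implement_baseline": ["implementation", "training", "analysis", "writing"],
    "rerun_benchmark":    ["training", "analysis", "writing"],
    "revise_text":        ["writing"],
}

PIPELINE_ORDER = ["implementation", "training", "analysis", "writing"]

def stages_to_rerun(actions: list[str]) -> list[str]:
    needed = {stage for a in actions for stage in ACTION_STAGES[a]}
    return [s for s in PIPELINE_ORDER if s in needed]

assert stages_to_rerun(["revise_text"]) == ["writing"]
assert stages_to_rerun(["rerun_benchmark", "revise_text"]) == [
    "training", "analysis", "writing"]
```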

6. Re-Review

Agent: Judge (Codex, codex exec)

After revision, a focused re-review checks:

  • Were all "must address" issues resolved?
  • Did the revisions introduce new problems?
  • Is the paper now ready for submission?

The re-review is lighter than the initial review — it focuses on the changes, not the entire paper.
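The completeness part of the re-review can be sketched as a checklist over the revision plan. The plan-item shape and function name are hypothetical:

```python
# Re-review gate check: every "must address" item from the revision plan
# has to be resolved before the paper can pass.
def rereview_ready(plan_items: list[dict],
                   resolved_ids: set[int]) -> tuple[bool, list[int]]:
    must = [i["id"] for i in plan_items if i["priority"] == "must"]
    unresolved = [i for i in must if i not in resolved_ids]
    return (not unresolved, unresolved)

items = [{"id": 1, "priority": "must"},
         {"id": 2, "priority": "must"},
         {"id": 4, "priority": "should"}]
ok, missing = rereview_ready(items, resolved_ids={1, 4})
assert not ok and missing == [2]
```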

Gate

Gate Type | Recommended | Behavior
human | Yes | User reads final paper before submission
auto-judge | Possible | Judge checks revision completeness
auto | Not recommended | Final paper quality deserves human eyes

Always review the final paper yourself

The review stage can loop multiple times. After revisions are complete and the re-review passes, you should read the final paper. You are the author — you should be confident in every claim and every sentence before submission.

Error Handling

Error | Recovery
All three reviewers reject | Orchestrator flags for major revision or pivot
Revision requires new experiments | Loop back to Implementation + Training
Revision introduces inconsistencies | Writer integration pass
Human disagrees with revision plan | Human overrides plan
Revision-resubmission loop exceeds 3 iterations | Orchestrator pauses, asks human for direction

When to stop revising

The review loop can theoretically continue indefinitely. In practice:

  • 1 revision round is typical for most papers
  • 2 rounds if the first revision revealed additional issues
  • 3+ rounds suggests a deeper problem — the Orchestrator pauses and asks the human whether to continue revising, pivot the framing, or accept the paper as-is
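The loop guard described above can be sketched as follows; the constant and function names are illustrative:

```python
# Revision-loop guard: stop on ACCEPT, hand control back to the human
# after three rounds, otherwise revise again.
MAX_ROUNDS = 3

def next_step(round_num: int, verdict: str) -> str:
    if verdict == "ACCEPT":
        return "done"
    if round_num >= MAX_ROUNDS:
        return "pause_for_human"
    return "revise"

assert next_step(1, "REVISE") == "revise"
assert next_step(3, "REVISE") == "pause_for_human"
assert next_step(2, "ACCEPT") == "done"
```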

Outputs Summary

File | Contents
reviews/paper_reviews/codex_review.yaml | Technical review
reviews/paper_reviews/claude_review.yaml | Clarity review
reviews/paper_reviews/gemini_review.yaml | Positioning review
reviews/meta_review.yaml | Aggregated meta-review
reviews/revision_plan.md | Structured revision plan
papers/drafts/v2.tex | Revised paper (if revised)
papers/drafts/current.tex | Symlink to latest version

Pipeline Complete

When the review gate passes (human approves the final paper):

  • pipeline.yaml status is set to complete
  • Final paper is in papers/drafts/current.tex
  • All research artifacts are preserved in .omc/research/
  • The project can be archived
yaml
# pipeline.yaml (final state)
current_stage: "complete"
stages:
  review:
    status: complete
    gate: human
    completed: "2025-06-10T15:00:00Z"
    revisions: 1

AutoResearch — Multi-agent Deep Learning Research System