Stage 7: Review

The review stage subjects the paper draft to a simulated peer review by a three-model review panel. Based on the reviews, the system produces a revision plan and can loop back to earlier stages (Implementation through Writing) for revisions.

Entering This Stage

What you have:

  • Complete paper draft (papers/drafts/current.tex)
  • All figures and tables
  • Complete bibliography
  • Analysis and results on disk

What you don't have yet:

  • Peer review feedback
  • Revision plan
  • Final camera-ready paper

Steps

mermaid
graph TD
    A[1. Review Dispatch] --> B[2. Three-Model Review]
    B --> C[3. Meta-Review]
    C --> D[4. Revision Planning]
    D --> E{Gate}
    E -->|accept| F[Done: Paper Complete]
    E -->|revise| G[5. Revision Execution]
    G --> H[6. Re-Review]
    H --> C

    style B fill:#fef3c7,stroke:#d97706
    style C fill:#f9f0ff,stroke:#7c3aed
    style F fill:#dcfce7,stroke:#16a34a
    style G fill:#ede9fe,stroke:#7c3aed

1. Review Dispatch

Agent: Orchestrator (Claude Opus)

The Orchestrator prepares the paper for review:

  • Compiles the full PDF (if the paper is LaTeX)
  • Extracts the paper text for LLM consumption
  • Prepares review instructions tailored to each reviewer
  • Sets venue-specific evaluation criteria
yaml
# Review dispatch configuration
venue: "ICML 2025"
review_format: "ICML 2025 reviewer guidelines"
criteria:
  - novelty
  - technical_correctness
  - experimental_thoroughness
  - clarity_of_writing
  - significance_of_results
  - reproducibility
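The dispatch step above can be sketched in Python. The config is inlined as a dict mirroring the YAML, and the per-reviewer focus mapping and function names are illustrative assumptions, not the pipeline's actual schema:

```python
# Sketch of review dispatch: build per-reviewer instructions from the
# dispatch configuration. REVIEWER_FOCUS and build_instructions are
# hypothetical names for illustration.
DISPATCH = {
    "venue": "ICML 2025",
    "review_format": "ICML 2025 reviewer guidelines",
    "criteria": [
        "novelty", "technical_correctness", "experimental_thoroughness",
        "clarity_of_writing", "significance_of_results", "reproducibility",
    ],
}

# Each reviewer shares the venue criteria but gets its own focus areas.
REVIEWER_FOCUS = {
    "codex": ["technical_correctness", "experimental_thoroughness", "reproducibility"],
    "claude": ["clarity_of_writing", "novelty"],
    "gemini": ["significance_of_results"],
}

def build_instructions(reviewer: str) -> str:
    focus = ", ".join(REVIEWER_FOCUS[reviewer])
    return (
        f"Review this paper for {DISPATCH['venue']} following "
        f"{DISPATCH['review_format']}. Pay particular attention to: {focus}."
    )

print(build_instructions("codex"))
```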

2. Three-Model Review

Agents: Codex (Judge), Claude (sub-agent), Gemini (Scout session)

Three independent reviews are conducted in parallel:

Reviewer | LLM | Focus | Invocation
Technical Reviewer | Codex | Correctness, experiments, reproducibility | codex exec (stateless)
Clarity Reviewer | Claude Opus | Writing quality, argument strength, novelty | Sub-agent (fresh context)
Positioning Reviewer | Gemini | Related work, broader impact, positioning | Tmux worker

Each reviewer is independent

No reviewer sees the other reviewers' feedback. Each evaluates the paper from a fresh perspective with different focus areas. This mirrors real peer review where reviewers work independently.
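The independent, parallel fan-out can be sketched as follows. The `run_reviewer` stub stands in for the real invocations (`codex exec`, a Claude sub-agent, a Gemini tmux worker), which this sketch does not implement:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-reviewer entry point; the real pipeline would shell out
# to `codex exec`, spawn a sub-agent, or drive a tmux worker here.
def run_reviewer(name: str, paper_text: str) -> dict:
    # Each reviewer sees only the paper text, never the other reviews.
    return {"reviewer": name, "overall": "BORDERLINE", "weaknesses": []}

def run_review_panel(paper_text: str) -> dict:
    reviewers = ["codex", "claude", "gemini"]
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {name: pool.submit(run_reviewer, name, paper_text)
                   for name in reviewers}
    # Results are collected only after all three reviews finish independently.
    return {name: fut.result() for name, fut in futures.items()}

reviews = run_review_panel("...paper text...")
```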

Each reviewer outputs a structured review:

yaml
# reviews/paper_reviews/codex_review.yaml
reviewer: codex
role: technical_reviewer
overall: WEAK_ACCEPT  # STRONG_REJECT / WEAK_REJECT / BORDERLINE / WEAK_ACCEPT / STRONG_ACCEPT

strengths:
  - "Clean experimental setup with 4 strong baselines"
  - "Thorough ablation study isolating each component"
  - "Statistical significance reported for all comparisons"

weaknesses:
  - "Missing comparison with Mamba (released after submission cutoff)"
  - "Wall-clock training time not reported"
  - "Error bars missing from throughput measurements in Table 2"

questions:
  - "How does the method scale beyond 32k tokens?"
  - "What is the performance impact on downstream tasks?"

detailed_comments:
  - section: "Method"
    line: "Equation 3"
    comment: "The recurrence relation should be stated more formally"
  - section: "Experiments"
    line: "Table 1"
    comment: "Add inference latency column"

confidence: 4  # 1 (low) to 5 (expert)
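Before aggregation, each structured review can be checked against the expected shape. This validator is a sketch; field names follow the example YAML above, and the function name is an assumption:

```python
# Minimal structural validation of a reviewer output shaped like the
# example above. Returns a list of problems (empty means the review is
# well-formed).
ALLOWED_OVERALL = {"STRONG_REJECT", "WEAK_REJECT", "BORDERLINE",
                   "WEAK_ACCEPT", "STRONG_ACCEPT"}

def validate_review(review: dict) -> list[str]:
    errors = []
    if review.get("overall") not in ALLOWED_OVERALL:
        errors.append(f"bad overall: {review.get('overall')}")
    if not 1 <= review.get("confidence", 0) <= 5:
        errors.append("confidence must be 1-5")
    for field in ("strengths", "weaknesses", "questions"):
        if not isinstance(review.get(field), list):
            errors.append(f"{field} must be a list")
    return errors

review = {"overall": "WEAK_ACCEPT", "confidence": 4,
          "strengths": [], "weaknesses": [], "questions": []}
assert validate_review(review) == []
```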

3. Meta-Review

Agent: Orchestrator (Claude Opus)

The Orchestrator aggregates the three reviews into a meta-review:

yaml
# reviews/meta_review.yaml
verdict: REVISE  # ACCEPT / REVISE / REJECT

review_summary:
  codex: WEAK_ACCEPT
  claude: WEAK_ACCEPT
  gemini: BORDERLINE

consensus_strengths:
  - "Strong experimental methodology (3/3 agree)"
  - "Clear writing in Method section (2/3 agree)"

consensus_weaknesses:
  - "Missing recent baseline (Mamba) — raised by 2/3"
  - "Insufficient error reporting — raised by 3/3"

critical_issues:
  - id: 1
    issue: "No Mamba comparison"
    severity: major
    raised_by: [codex, gemini]
    
  - id: 2
    issue: "Missing error bars in throughput"
    severity: minor
    raised_by: [codex, claude, gemini]

  - id: 3
    issue: "Downstream task evaluation absent"
    severity: major
    raised_by: [codex]

overall_assessment: |
  Paper presents a solid efficiency contribution but has two major gaps: 
  missing Mamba comparison and no downstream evaluation. Recommend 
  revision to address these before submission.

The meta-review is honest

The Orchestrator does not cherry-pick positive feedback. If two out of three reviewers raise an issue, it's flagged as a consensus weakness. The goal is to improve the paper, not to confirm its quality.
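The consensus tally can be sketched as a simple count, assuming reviews shaped like the examples above. Exact-string matching of weaknesses is a simplification here; the real Orchestrator reasons over free-text feedback:

```python
from collections import Counter

# Flag any weakness raised by at least two of the three reviewers,
# formatted like the consensus entries in the meta-review above.
def consensus_weaknesses(reviews: dict[str, dict], panel_size: int = 3) -> list[str]:
    counts = Counter()
    for review in reviews.values():
        counts.update(set(review["weaknesses"]))  # one vote per reviewer
    return [f"{w} (raised by {n}/{panel_size})"
            for w, n in counts.most_common() if n >= 2]

reviews = {
    "codex":  {"weaknesses": ["no Mamba comparison", "missing error bars"]},
    "claude": {"weaknesses": ["missing error bars"]},
    "gemini": {"weaknesses": ["no Mamba comparison", "missing error bars"]},
}
print(consensus_weaknesses(reviews))
# ['missing error bars (raised by 3/3)', 'no Mamba comparison (raised by 2/3)']
```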

4. Revision Planning

Agent: Planner (Claude Opus, sub-agent)

The Planner creates a revision plan from the meta-review:

markdown
# reviews/revision_plan.md
## Revision Plan

### Must Address (Major Issues)
1. **Add Mamba comparison** [raised by 2/3 reviewers]
   - Action: Coder implements Mamba baseline + runs experiment
   - Estimated time: 1 day (implementation) + 1 day (training)
   - Affected sections: Tables 1-2, Section 4.1

2. **Add downstream evaluation** [raised by 1/3, but valid]
   - Action: Coder runs on GLUE benchmark
   - Estimated time: 0.5 days
   - Affected sections: New Table 3, Section 4.4

### Should Address (Minor Issues)
3. **Add error bars to throughput** [raised by 3/3]
   - Action: Coder re-runs throughput benchmark 3x
   - Estimated time: 2 hours
   - Affected sections: Table 2

4. **Formalize recurrence relation** [raised by 1/3]
   - Action: Writer revises Equation 3
   - Affected sections: Section 3.2

### Will Not Address (with rebuttal)
5. **32k+ token scaling** [raised by 1/3]
   - Rebuttal: Outside compute budget, acknowledged as limitation
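The ordering behind such a plan can be sketched from the meta-review's critical issues: majors before minors, and within a severity, issues raised by more reviewers first. The field names follow the meta-review example; the ranking rule itself is an illustrative assumption:

```python
# Sketch of revision-plan prioritization over meta-review critical_issues.
SEVERITY_RANK = {"major": 0, "minor": 1}

def plan_order(issues: list[dict]) -> list[dict]:
    # Majors first; ties broken by how many reviewers raised the issue.
    return sorted(issues, key=lambda i: (SEVERITY_RANK[i["severity"]],
                                         -len(i["raised_by"])))

issues = [
    {"id": 2, "issue": "Missing error bars", "severity": "minor",
     "raised_by": ["codex", "claude", "gemini"]},
    {"id": 1, "issue": "No Mamba comparison", "severity": "major",
     "raised_by": ["codex", "gemini"]},
    {"id": 3, "issue": "No downstream eval", "severity": "major",
     "raised_by": ["codex"]},
]
assert [i["id"] for i in plan_order(issues)] == [1, 3, 2]
```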

5. Revision Execution

The pipeline loops back to relevant stages:

mermaid
graph LR
    RP[Revision Plan] --> I[Implementation<br/>Mamba baseline]
    RP --> T[Training<br/>New experiments]
    RP --> A[Analysis<br/>Updated results]
    RP --> W[Writing<br/>Revised sections]

    I --> T
    T --> A
    A --> W
    W --> RR[Re-Review]

    style RP fill:#fef3c7,stroke:#d97706
    style RR fill:#dbeafe,stroke:#2563eb

Targeted revisions, not full re-run

The revision only re-runs what's needed. If the revision plan calls for adding a Mamba baseline, only the Mamba experiment is run — not all experiments from scratch. Similarly, only affected paper sections are rewritten.
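One way to compute the targeted re-run set, sketched under illustrative assumptions (the action names and the action-to-stage mapping are not the pipeline's actual schema):

```python
# Map each revision action to the pipeline stages it touches, then re-run
# only the union, in pipeline order.
ACTION_STAGES = {
    "implement_baseline": ["implementation", "training", "analysis", "writing"],
    "rerun_benchmark":    ["training", "analysis", "writing"],
    "revise_text":        ["writing"],
}

PIPELINE_ORDER = ["implementation", "training", "analysis", "writing"]

def stages_to_rerun(actions: list[str]) -> list[str]:
    needed = {stage for a in actions for stage in ACTION_STAGES[a]}
    return [s for s in PIPELINE_ORDER if s in needed]

assert stages_to_rerun(["revise_text"]) == ["writing"]
assert stages_to_rerun(["rerun_benchmark", "revise_text"]) == [
    "training", "analysis", "writing"]
```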

6. Re-Review

Agent: Judge (Codex, codex exec)

After revision, a focused re-review checks:

  • Were all "must address" issues resolved?
  • Did the revisions introduce new problems?
  • Is the paper now ready for submission?

The re-review is lighter than the initial review — it focuses on the changes, not the entire paper.
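The completeness part of the re-review can be sketched as a checklist over the revision plan. The plan-item shape and function name are hypothetical:

```python
# Re-review gate check: every "must address" item from the revision plan
# has to be resolved before the paper can pass.
def rereview_ready(plan_items: list[dict],
                   resolved_ids: set[int]) -> tuple[bool, list[int]]:
    must = [i["id"] for i in plan_items if i["priority"] == "must"]
    unresolved = [i for i in must if i not in resolved_ids]
    return (not unresolved, unresolved)

items = [{"id": 1, "priority": "must"},
         {"id": 2, "priority": "must"},
         {"id": 4, "priority": "should"}]
ok, missing = rereview_ready(items, resolved_ids={1, 4})
assert not ok and missing == [2]
```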

Gate

Gate Type | Recommended | Behavior
human | Yes | User reads final paper before submission
auto-judge | Possible | Judge checks revision completeness
auto | Not recommended | Final paper quality deserves human eyes

Always review the final paper yourself

The review stage can loop multiple times. After revisions are complete and the re-review passes, you should read the final paper. You are the author — you should be confident in every claim and every sentence before submission.

Error Handling

Error | Recovery
All three reviewers reject | Orchestrator flags for major revision or pivot
Revision requires new experiments | Loop back to Implementation + Training
Revision introduces inconsistencies | Writer integration pass
Human disagrees with revision plan | Human overrides plan
Revision-resubmission loop exceeds 3 iterations | Orchestrator pauses, asks human for direction

When to stop revising

The review loop can theoretically continue indefinitely. In practice:

  • 1 revision round is typical for most papers
  • 2 rounds if the first revision revealed additional issues
  • 3+ rounds suggests a deeper problem — the Orchestrator pauses and asks the human whether to continue revising, pivot the framing, or accept the paper as-is
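The loop guard described above can be sketched as follows; the constant and function names are illustrative:

```python
# Revision-loop guard: stop on ACCEPT, hand control back to the human
# after three rounds, otherwise revise again.
MAX_ROUNDS = 3

def next_step(round_num: int, verdict: str) -> str:
    if verdict == "ACCEPT":
        return "done"
    if round_num >= MAX_ROUNDS:
        return "pause_for_human"
    return "revise"

assert next_step(1, "REVISE") == "revise"
assert next_step(3, "REVISE") == "pause_for_human"
assert next_step(2, "ACCEPT") == "done"
```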

Outputs Summary

File | Contents
reviews/paper_reviews/codex_review.yaml | Technical review
reviews/paper_reviews/claude_review.yaml | Clarity review
reviews/paper_reviews/gemini_review.yaml | Positioning review
reviews/meta_review.yaml | Aggregated meta-review
reviews/revision_plan.md | Structured revision plan
papers/drafts/v2.tex | Revised paper (if revised)
papers/drafts/current.tex | Symlink to latest version

Pipeline Complete

When the review gate passes (human approves the final paper):

  • pipeline.yaml status is set to complete
  • Final paper is in papers/drafts/current.tex
  • All research artifacts are preserved in .omc/research/
  • The project can be archived
yaml
# pipeline.yaml (final state)
current_stage: "complete"
stages:
  review:
    status: complete
    gate: human
    completed: "2025-06-10T15:00:00Z"
    revisions: 1

AutoResearch — Multi-agent Deep Learning Research System