Stage 7: Review
The review stage subjects the paper draft to a simulated peer review using a three-model review panel. Based on the reviews, the system produces a revision plan and can loop back to the Writing stage for revisions.
Entering This Stage
What you have:
- Complete paper draft (papers/drafts/current.tex)
- All figures and tables
- Complete bibliography
- Analysis and results on disk
What you don't have yet:
- Peer review feedback
- Revision plan
- Final camera-ready paper
Steps
graph TD
A[1. Review Dispatch] --> B[2. Three-Model Review]
B --> C[3. Meta-Review]
C --> D[4. Revision Planning]
D --> E{Gate}
E -->|accept| F[Done: Paper Complete]
E -->|revise| G[5. Revision Execution]
G --> H[6. Re-Review]
H --> C
style B fill:#fef3c7,stroke:#d97706
style C fill:#f9f0ff,stroke:#7c3aed
style F fill:#dcfce7,stroke:#16a34a
style G fill:#ede9fe,stroke:#7c3aed
1. Review Dispatch
Agent: Orchestrator (Claude Opus)
The Orchestrator prepares the paper for review:
- Compile the full PDF (if LaTeX)
- Extract the paper text for LLM consumption
- Prepare review instructions tailored to each reviewer
- Set venue-specific evaluation criteria
# Review dispatch configuration
venue: "ICML 2025"
review_format: "ICML 2025 reviewer guidelines"
criteria:
- novelty
- technical_correctness
- experimental_thoroughness
- clarity_of_writing
- significance_of_results
- reproducibility
2. Three-Model Review
Agents: Codex (Judge), Claude (sub-agent), Gemini (Scout session)
Three independent reviews are conducted in parallel:
| Reviewer | LLM | Focus | Invocation |
|---|---|---|---|
| Technical Reviewer | Codex | Correctness, experiments, reproducibility | codex exec (stateless) |
| Clarity Reviewer | Claude Opus | Writing quality, argument strength, novelty | Sub-agent (fresh context) |
| Positioning Reviewer | Gemini | Related work, broader impact, positioning | Tmux worker |
Each reviewer is independent
No reviewer sees the other reviewers' feedback. Each evaluates the paper from a fresh perspective with different focus areas. This mirrors real peer review where reviewers work independently.
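The parallel, isolated dispatch can be sketched as follows. This is a minimal sketch, not the pipeline's actual API: `run_reviewer` is a stand-in for shelling out to `codex exec`, a Claude sub-agent, and a Gemini tmux worker.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical reviewer function; in the real pipeline each call would
# invoke a different backend (codex exec, Claude sub-agent, Gemini worker).
def run_reviewer(name: str, focus: str, paper_text: str) -> dict:
    # Placeholder: each reviewer returns a structured review dict.
    return {"reviewer": name, "focus": focus, "overall": "BORDERLINE"}

REVIEWERS = [
    ("codex", "correctness, experiments, reproducibility"),
    ("claude", "writing quality, argument strength, novelty"),
    ("gemini", "related work, broader impact, positioning"),
]

def dispatch_reviews(paper_text: str) -> list[dict]:
    # Reviews run in parallel and never see each other's output:
    # each call receives only the paper text and its own focus areas.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(run_reviewer, name, focus, paper_text)
                   for name, focus in REVIEWERS]
        return [f.result() for f in futures]

reviews = dispatch_reviews("...paper text...")
print([r["reviewer"] for r in reviews])
```

Because each worker gets only the paper text and its own instructions, independence is enforced by construction rather than by policy.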
Each reviewer outputs a structured review:
# reviews/paper_reviews/codex_review.yaml
reviewer: codex
role: technical_reviewer
overall: WEAK_ACCEPT # STRONG_REJECT / WEAK_REJECT / BORDERLINE / WEAK_ACCEPT / STRONG_ACCEPT
strengths:
- "Clean experimental setup with 4 strong baselines"
- "Thorough ablation study isolating each component"
- "Statistical significance reported for all comparisons"
weaknesses:
- "Missing comparison with Mamba (released after submission cutoff)"
- "Wall-clock training time not reported"
- "Error bars missing from throughput measurements in Table 2"
questions:
- "How does the method scale beyond 32k tokens?"
- "What is the performance impact on downstream tasks?"
detailed_comments:
- section: "Method"
line: "Equation 3"
comment: "The recurrence relation should be stated more formally"
- section: "Experiments"
line: "Table 1"
comment: "Add inference latency column"
confidence: 4 # 1 (low) to 5 (expert)
3. Meta-Review
Agent: Orchestrator (Claude Opus)
The Orchestrator aggregates the three reviews into a meta-review:
# reviews/meta_review.yaml
verdict: REVISE # ACCEPT / REVISE / REJECT
review_summary:
codex: WEAK_ACCEPT
claude: WEAK_ACCEPT
gemini: BORDERLINE
consensus_strengths:
- "Strong experimental methodology (3/3 agree)"
- "Clear writing in Method section (2/3 agree)"
consensus_weaknesses:
- "Missing recent baseline (Mamba) — raised by 2/3"
- "Insufficient error reporting — raised by 3/3"
critical_issues:
- id: 1
issue: "No Mamba comparison"
severity: major
raised_by: [codex, gemini]
- id: 2
issue: "Missing error bars in throughput"
severity: minor
raised_by: [codex, claude, gemini]
- id: 3
issue: "Downstream task evaluation absent"
severity: major
raised_by: [codex]
overall_assessment: |
Paper presents a solid efficiency contribution but has two major gaps:
missing Mamba comparison and no downstream evaluation. Recommend
revision to address these before submission.
The meta-review is honest
The Orchestrator does not cherry-pick positive feedback. If two out of three reviewers raise an issue, it's flagged as a consensus weakness. The goal is to improve the paper, not to confirm its quality.
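The consensus step can be mechanized along these lines. The data shapes mirror the YAML examples above, but the 2-of-3 threshold is an assumption, and the weakness strings are simplified so identical issues match exactly; the real Orchestrator would have to cluster semantically similar comments.

```python
from collections import Counter

# Toy reviews mirroring the YAML schema above (strings simplified
# so that the same issue is worded identically across reviewers).
reviews = [
    {"reviewer": "codex",  "overall": "WEAK_ACCEPT",
     "weaknesses": ["no Mamba comparison", "missing error bars"]},
    {"reviewer": "claude", "overall": "WEAK_ACCEPT",
     "weaknesses": ["missing error bars"]},
    {"reviewer": "gemini", "overall": "BORDERLINE",
     "weaknesses": ["no Mamba comparison", "missing error bars"]},
]

def consensus_weaknesses(reviews: list[dict], threshold: int = 2) -> list[tuple[str, int]]:
    # Flag any weakness raised by `threshold` or more reviewers.
    counts = Counter(w for r in reviews for w in r["weaknesses"])
    return [(issue, n) for issue, n in counts.most_common() if n >= threshold]

for issue, n in consensus_weaknesses(reviews):
    print(f"{issue} (raised by {n}/3)")
```

An issue raised by a single reviewer is not discarded; it simply falls below the consensus threshold and is weighed individually, as with the downstream-evaluation issue above.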
4. Revision Planning
Agent: Planner (Claude Opus, sub-agent)
The Planner creates a revision plan from the meta-review:
# reviews/revision_plan.md
## Revision Plan
### Must Address (Major Issues)
1. **Add Mamba comparison** [raised by 2/3 reviewers]
- Action: Coder implements Mamba baseline + runs experiment
- Estimated time: 1 day (implementation) + 1 day (training)
- Affected sections: Tables 1-2, Section 4.1
2. **Add downstream evaluation** [raised by 1/3, but valid]
- Action: Coder runs on GLUE benchmark
- Estimated time: 0.5 days
- Affected sections: New Table 3, Section 4.4
### Should Address (Minor Issues)
3. **Add error bars to throughput** [raised by 3/3]
- Action: Coder re-runs throughput benchmark 3x
- Estimated time: 2 hours
- Affected sections: Table 2
4. **Formalize recurrence relation** [raised by 1/3]
- Action: Writer revises Equation 3
- Affected sections: Section 3.2
### Will Not Address (with rebuttal)
5. **32k+ token scaling** [raised by 1/3]
- Rebuttal: Outside compute budget, acknowledged as limitation
5. Revision Execution
The pipeline loops back to relevant stages:
graph LR
RP[Revision Plan] --> I[Implementation<br/>Mamba baseline]
RP --> T[Training<br/>New experiments]
RP --> A[Analysis<br/>Updated results]
RP --> W[Writing<br/>Revised sections]
I --> T
T --> A
A --> W
W --> RR[Re-Review]
style RP fill:#fef3c7,stroke:#d97706
style RR fill:#dbeafe,stroke:#2563eb
Targeted revisions, not full re-run
The revision only re-runs what's needed. If the revision plan calls for adding a Mamba baseline, only the Mamba experiment is run — not all experiments from scratch. Similarly, only affected paper sections are rewritten.
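The "only re-run what's needed" rule can be sketched as a mapping from revision actions to pipeline stages. The action and stage names here are illustrative, not the pipeline's real identifiers.

```python
# Hypothetical mapping from revision-plan action types to the pipeline
# stages that must be re-run. Anything unlisted touches only Writing.
ACTION_STAGES = {
    "new_baseline":      ["implementation", "training", "analysis", "writing"],
    "new_benchmark":     ["training", "analysis", "writing"],
    "rerun_measurement": ["analysis", "writing"],
    "rewrite_section":   ["writing"],
}

def stages_to_rerun(actions: list[str]) -> list[str]:
    # Union of required stages, kept in canonical pipeline order.
    needed = {s for a in actions for s in ACTION_STAGES.get(a, ["writing"])}
    order = ["implementation", "training", "analysis", "writing"]
    return [s for s in order if s in needed]

# A new baseline plus a re-measured benchmark re-runs four stages;
# a pure wording fix re-runs only Writing.
print(stages_to_rerun(["new_baseline", "rerun_measurement"]))
print(stages_to_rerun(["rewrite_section"]))
```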
6. Re-Review
Agent: Judge (Codex, codex exec)
After revision, a focused re-review checks:
- Were all "must address" issues resolved?
- Did the revisions introduce new problems?
- Is the paper now ready for submission?
The re-review is lighter than the initial review — it focuses on the changes, not the entire paper.
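A minimal sketch of the completeness check, assuming the revision plan and the set of resolved issue IDs are available as structured data (the field names are hypothetical):

```python
def rereview_complete(plan: dict, resolved_ids: set[int]) -> tuple[bool, list[int]]:
    # Only "must address" (major) issues block the gate; minor issues
    # are reported but do not fail the re-review on their own.
    must = [i["id"] for i in plan["issues"] if i["severity"] == "major"]
    unresolved = [i for i in must if i not in resolved_ids]
    return (not unresolved, unresolved)

plan = {"issues": [
    {"id": 1, "issue": "No Mamba comparison", "severity": "major"},
    {"id": 2, "issue": "Missing error bars", "severity": "minor"},
    {"id": 3, "issue": "Downstream eval absent", "severity": "major"},
]}

ok, missing = rereview_complete(plan, resolved_ids={1, 2})
print(ok, missing)  # issue 3 is still open, so the re-review fails
```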
Gate
| Gate Type | Recommended | Behavior |
|---|---|---|
| human | Yes | User reads final paper before submission |
| auto-judge | Possible | Judge checks revision completeness |
| auto | Not recommended | Final paper quality deserves human eyes |
Always review the final paper yourself
The review stage can loop multiple times. After revisions are complete and the re-review passes, you should read the final paper. You are the author — you should be confident in every claim and every sentence before submission.
Error Handling
| Error | Recovery |
|---|---|
| All three reviewers reject | Orchestrator flags for major revision or pivot |
| Revision requires new experiments | Loop back to Implementation + Training |
| Revision introduces inconsistencies | Writer integration pass |
| Human disagrees with revision plan | Human overrides plan |
| Revision-resubmission loop exceeds 3 iterations | Orchestrator pauses, asks human for direction |
When to stop revising
The review loop can theoretically continue indefinitely. In practice:
- 1 revision round is typical for most papers
- 2 rounds if the first revision revealed additional issues
- 3+ rounds suggests a deeper problem — the Orchestrator pauses and asks the human whether to continue revising, pivot the framing, or accept the paper as-is
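The stopping rule above can be sketched as a simple guard in the review loop. The function and outcome names (`pause_for_human`, `revise`, `done`) are assumptions, not the pipeline's actual vocabulary; the verdict strings match the meta-review schema.

```python
MAX_ROUNDS = 3

def next_step(round_num: int, verdict: str) -> str:
    # `verdict` comes from the meta-review: ACCEPT / REVISE / REJECT.
    if verdict == "ACCEPT":
        return "done"
    if round_num >= MAX_ROUNDS:
        # 3+ rounds suggests a deeper problem: hand control back to the
        # human to continue, pivot the framing, or accept as-is.
        return "pause_for_human"
    return "revise"

print(next_step(1, "REVISE"))  # -> revise
print(next_step(3, "REVISE"))  # -> pause_for_human
print(next_step(2, "ACCEPT"))  # -> done
```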
Outputs Summary
| File | Contents |
|---|---|
| reviews/paper_reviews/codex_review.yaml | Technical review |
| reviews/paper_reviews/claude_review.yaml | Clarity review |
| reviews/paper_reviews/gemini_review.yaml | Positioning review |
| reviews/meta_review.yaml | Aggregated meta-review |
| reviews/revision_plan.md | Structured revision plan |
| papers/drafts/v2.tex | Revised paper (if revised) |
| papers/drafts/current.tex | Symlink to latest version |
Pipeline Complete
When the review gate passes (human approves the final paper):
- pipeline.yaml status is set to complete
- Final paper is in papers/drafts/current.tex
- All research artifacts are preserved in .omc/research/
- The project can be archived
# pipeline.yaml (final state)
current_stage: "complete"
stages:
review:
status: complete
gate: human
completed: "2025-06-10T15:00:00Z"
revisions: 1