Scout
The Scout is the literature and knowledge agent. It uses Gemini's broad knowledge base to search for papers, generate research ideas, and prepare structured information for other agents.
Identity
| Property | Value |
|---|---|
| LLM | Gemini |
| Invocation | omc team 1:gemini:scout "task" |
| Lifecycle | Per-task — invoked for specific search/analysis tasks |
| Session | Named tmux session ({prefix}-scout) |
When Invoked
| Stage | Task | Output |
|---|---|---|
| Ideation | Idea generation | ideas/brainstorm.md |
| Ideation | Baseline finding | design/baselines.yaml |
| Design | Paper detail fetch | papers/related_work/summaries.yaml |
| Writing | Related work preparation | papers/related_work/papers.bib + summaries |
| Writing | Figure description design | papers/figures/descriptions.yaml |
| Review | Finding reviewer-cited papers | Additional entries in summaries.yaml |
Idea Generation
During ideation, the Scout generates research ideas by exploring the landscape around a topic.
What It Receives
topic: "efficient attention mechanisms for long sequences"
constraints:
- Must be implementable in 2 weeks
- Must run on 4x A100 GPUs
- Target venue: ICMLWhat It Produces
The Scout generates ideas without requiring code or implementation feasibility analysis — that's the Judge's job later.
```markdown
# ideas/brainstorm.md

## Idea 1: Flash-Recurrent Attention
Combine flash attention's IO-awareness with RetNet's recurrent form...
- Novelty angle: No one has applied flash attention's tiling to recurrent attention
- Expected benefit: O(n) memory, hardware-efficient

## Idea 2: Sparse Retention Patterns
Apply learned sparsity to the retention mechanism...
- Novelty angle: Retention + dynamic sparsity is unexplored
- Expected benefit: Sub-linear compute for very long sequences

## Idea 3: Multi-Scale Retention with Mixture of Experts
Different retention scales for different attention heads, gated by MoE...
```
Ideas are creative, not conservative
The Scout is instructed to be expansive during idea generation. Bad ideas can be filtered later by the Judge. Missing a good idea because the Scout was too conservative is a worse failure mode.
Baseline Finding
When the Orchestrator needs baselines for experiment design, the Scout searches for relevant papers with specific requirements:
Search Criteria
| Criterion | Requirement |
|---|---|
| Venue quality | Top venues only (ICML, NeurIPS, ICLR, ACL, EMNLP, CVPR, etc.) |
| Code availability | Required — baselines without code are flagged and deprioritized |
| Recency | Prefer last 2 years, allow older if seminal |
| Relevance | Direct comparison possible with proposed method |
Code is required for baselines
Unlike idea generation, baseline finding has a hard requirement: baselines must have available code. A baseline that can't be reproduced can't be compared against. The Scout flags papers without code as code_available: false and the Planner deprioritizes them.
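The deprioritization step can be sketched as a simple stable sort on the `code_available` flag — a minimal illustration, assuming baseline entries follow the baselines.yaml schema shown below (the helper name is hypothetical):

```python
# Hypothetical sketch: order candidate baselines so reproducible ones
# (code_available: true) come first; entries without the flag sort last.
def rank_baselines(baselines):
    """Return baselines with code-available entries first (stable order)."""
    return sorted(baselines, key=lambda b: not b.get("code_available", False))

candidates = [
    {"name": "HypotheticalNet", "code_available": False},
    {"name": "Flash Attention 2", "code_available": True},
]
ranked = rank_baselines(candidates)
# ranked[0]["name"] == "Flash Attention 2"
```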
Output Format
```yaml
# design/baselines.yaml
baselines:
  - name: "Flash Attention 2"
    paper: "Dao, 2023"
    venue: "ICLR 2024"
    code: "https://github.com/Dao-AILab/flash-attention"
    code_available: true
    key_results:
      throughput: "2.5x vanilla attention"
      memory: "O(n) instead of O(n^2)"
    comparison_axes: [throughput, memory, perplexity]
```
Paper Digestion
The Scout's most detailed work is digesting papers into structured documents for other agents (especially the Writer).
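Because downstream agents consume these digests programmatically, each entry needs a fixed set of fields. A minimal validation sketch, assuming the field names of the summaries.yaml format shown below (the helper itself is hypothetical):

```python
# Hypothetical sketch: check a digest entry for the fields downstream
# agents (notably the Writer) expect; field names mirror summaries.yaml.
REQUIRED_FIELDS = {
    "key", "title", "authors", "venue",
    "contribution", "method_summary", "key_results", "relevance", "bibtex",
}

def missing_fields(entry):
    """Return required fields absent from a digest entry, sorted by name."""
    return sorted(REQUIRED_FIELDS - entry.keys())

missing_fields({"key": "sun2023retnet", "title": "Retentive Network"})
# leaves seven fields still missing, e.g. "authors" and "bibtex"
```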
Input
"Digest this paper for the related work section:
Title: 'Retentive Network: A Successor to Transformer for Large Language Models'
Focus: retention mechanism, recurrent formulation, training parallelism"Output
```yaml
# papers/related_work/summaries.yaml (appended)
- key: "sun2023retnet"
  title: "Retentive Network: A Successor to Transformer for Large Language Models"
  authors: "Sun et al."
  venue: "arXiv 2023"
  contribution: |
    Proposes a retention mechanism that enables parallel training
    (like Transformers) and recurrent inference (like RNNs). Uses
    multi-scale exponential decay for position encoding.
  method_summary: |
    Replaces softmax attention with a retention mechanism based on
    exponential decay. Three computation modes: parallel (training),
    recurrent (inference), chunk-wise (long sequences).
  key_results:
    language_modeling: "Competitive with Transformer on perplexity"
    inference_speed: "8.4x faster than Transformer at 8k length"
  relevance: |
    Core inspiration for our work. We extend the recurrent mode
    with flash attention's tiling strategy for better hardware utilization.
  bibtex: "@article{sun2023retnet, ...}"
```
Figure Description Design
The Scout also helps design figures by writing structured descriptions that can be used to generate plots.
```yaml
# papers/figures/descriptions.yaml
figures:
  - id: "fig:throughput"
    type: "line_plot"
    title: "Throughput vs Sequence Length"
    x_axis: "Sequence Length"
    y_axis: "Tokens/second"
    series:
      - label: "Ours"
        data_source: "experiments/summary.yaml#throughput"
      - label: "Flash Attention 2"
        data_source: "experiments/summary.yaml#baseline_flash"
      - label: "RetNet"
        data_source: "experiments/summary.yaml#baseline_retnet"
    caption_draft: |
      Our method achieves higher throughput than both baselines
      for sequences longer than 4096 tokens.
```
The Scout describes, the Coder generates
The Scout designs what figures should show and writes descriptions. The actual figure generation (matplotlib/tikz code) is the Coder's job, using these descriptions as specifications.
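For instance, each series carries a data_source reference in "path#key" form, which the consumer has to resolve before plotting. A hypothetical parsing sketch (the function name is illustrative, not part of the pipeline):

```python
# Hypothetical sketch: split the "path#key" data_source references used
# in descriptions.yaml into a file path and a lookup key.
def parse_data_source(ref):
    """Split 'experiments/summary.yaml#throughput' into (path, key)."""
    path, _, key = ref.partition("#")
    return path, key

path, key = parse_data_source("experiments/summary.yaml#baseline_retnet")
# path == "experiments/summary.yaml", key == "baseline_retnet"
```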
Next
- Writer — who uses the Scout's literature output
- Judge — who evaluates the Scout's idea suggestions
- Ideation Stage — where the Scout is most active