Scout
The Scout is the literature and knowledge agent. It uses Gemini's broad knowledge base to search for papers, generate research ideas, and prepare structured information for other agents.
Identity
| Property | Value |
|---|---|
| LLM | Gemini |
| Invocation | omc team 1:gemini:scout "task" |
| Lifecycle | Per-task — invoked for specific search/analysis tasks |
| Session | Named tmux session ({prefix}-scout) |
When Invoked
| Stage | Task | Output |
|---|---|---|
| Ideation | Idea generation | ideas/brainstorm.md |
| Ideation | Baseline finding | design/baselines.yaml |
| Design | Paper detail fetch | papers/related_work/summaries.yaml |
| Writing | Related work preparation | papers/related_work/papers.bib + summaries |
| Writing | Figure description design | papers/figures/descriptions.yaml |
| Review | Finding reviewer-cited papers | Additional entries in summaries.yaml |
Idea Generation
During ideation, the Scout generates research ideas by exploring the landscape around a topic.
What It Receives
topic: "efficient attention mechanisms for long sequences"
constraints:
- Must be implementable in 2 weeks
- Must run on 4x A100 GPUs
- Target venue: ICMLWhat It Produces
The Scout generates ideas without requiring code or implementation feasibility analysis — that's the Judge's job later.
```markdown
# ideas/brainstorm.md

## Idea 1: Flash-Recurrent Attention
Combine flash attention's IO-awareness with RetNet's recurrent form...
- Novelty angle: No one has applied flash attention's tiling to recurrent attention
- Expected benefit: O(n) memory, hardware-efficient

## Idea 2: Sparse Retention Patterns
Apply learned sparsity to the retention mechanism...
- Novelty angle: Retention + dynamic sparsity is unexplored
- Expected benefit: Sub-linear compute for very long sequences

## Idea 3: Multi-Scale Retention with Mixture of Experts
Different retention scales for different attention heads, gated by MoE...
```
Ideas are creative, not conservative
The Scout is instructed to be expansive during idea generation. Bad ideas can be filtered later by the Judge. Missing a good idea because the Scout was too conservative is a worse failure mode.
Baseline Finding
When the Orchestrator needs baselines for experiment design, the Scout searches for relevant papers with specific requirements:
Search Criteria
| Criterion | Requirement |
|---|---|
| Venue quality | Top venues only (ICML, NeurIPS, ICLR, ACL, EMNLP, CVPR, etc.) |
| Code availability | Required — baselines without code are flagged and deprioritized |
| Recency | Prefer last 2 years, allow older if seminal |
| Relevance | Direct comparison possible with proposed method |
Code is required for baselines
Unlike idea generation, baseline finding has a hard requirement: baselines must have available code. A baseline that can't be reproduced can't be compared against. The Scout flags papers without code as code_available: false and the Planner deprioritizes them.
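The deprioritization step can be sketched as a simple stable sort on the `code_available` flag — a minimal illustration, assuming baseline entries follow the baselines.yaml schema shown below (the helper name is hypothetical):

```python
# Hypothetical sketch: order candidate baselines so reproducible ones
# (code_available: true) come first; entries without the flag sort last.
def rank_baselines(baselines):
    """Return baselines with code-available entries first (stable order)."""
    return sorted(baselines, key=lambda b: not b.get("code_available", False))

candidates = [
    {"name": "HypotheticalNet", "code_available": False},
    {"name": "Flash Attention 2", "code_available": True},
]
ranked = rank_baselines(candidates)
# ranked[0]["name"] == "Flash Attention 2"
```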
Output Format
```yaml
# design/baselines.yaml
baselines:
  - name: "Flash Attention 2"
    paper: "Dao, 2023"
    venue: "ICLR 2024"
    code: "https://github.com/Dao-AILab/flash-attention"
    code_available: true
    key_results:
      throughput: "2.5x vanilla attention"
      memory: "O(n) instead of O(n^2)"
    comparison_axes: [throughput, memory, perplexity]
```
Paper Digestion
The Scout's most detailed work is digesting papers into structured documents for other agents (especially the Writer).
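Because downstream agents consume these digests programmatically, each entry needs a fixed set of fields. A minimal validation sketch, assuming the field names of the summaries.yaml format shown below (the helper itself is hypothetical):

```python
# Hypothetical sketch: check a digest entry for the fields downstream
# agents (notably the Writer) expect; field names mirror summaries.yaml.
REQUIRED_FIELDS = {
    "key", "title", "authors", "venue",
    "contribution", "method_summary", "key_results", "relevance", "bibtex",
}

def missing_fields(entry):
    """Return required fields absent from a digest entry, sorted by name."""
    return sorted(REQUIRED_FIELDS - entry.keys())

missing_fields({"key": "sun2023retnet", "title": "Retentive Network"})
# leaves seven fields still missing, e.g. "authors" and "bibtex"
```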
Input
"Digest this paper for the related work section:
Title: 'Retentive Network: A Successor to Transformer for Large Language Models'
Focus: retention mechanism, recurrent formulation, training parallelism"Output
```yaml
# papers/related_work/summaries.yaml (appended)
- key: "sun2023retnet"
  title: "Retentive Network: A Successor to Transformer for Large Language Models"
  authors: "Sun et al."
  venue: "arXiv 2023"
  contribution: |
    Proposes a retention mechanism that enables parallel training
    (like Transformers) and recurrent inference (like RNNs). Uses
    multi-scale exponential decay for position encoding.
  method_summary: |
    Replaces softmax attention with a retention mechanism based on
    exponential decay. Three computation modes: parallel (training),
    recurrent (inference), chunk-wise (long sequences).
  key_results:
    language_modeling: "Competitive with Transformer on perplexity"
    inference_speed: "8.4x faster than Transformer at 8k length"
  relevance: |
    Core inspiration for our work. We extend the recurrent mode
    with flash attention's tiling strategy for better hardware utilization.
  bibtex: "@article{sun2023retnet, ...}"
```
Figure Description Design
The Scout also helps design figures by writing structured descriptions that can be used to generate plots.
```yaml
# papers/figures/descriptions.yaml
figures:
  - id: "fig:throughput"
    type: "line_plot"
    title: "Throughput vs Sequence Length"
    x_axis: "Sequence Length"
    y_axis: "Tokens/second"
    series:
      - label: "Ours"
        data_source: "experiments/summary.yaml#throughput"
      - label: "Flash Attention 2"
        data_source: "experiments/summary.yaml#baseline_flash"
      - label: "RetNet"
        data_source: "experiments/summary.yaml#baseline_retnet"
    caption_draft: |
      Our method achieves higher throughput than both baselines
      for sequences longer than 4096 tokens.
```
The Scout describes, the Coder generates
The Scout designs what figures should show and writes descriptions. The actual figure generation (matplotlib/tikz code) is the Coder's job, using these descriptions as specifications.
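For instance, each series carries a data_source reference in "path#key" form, which the consumer has to resolve before plotting. A hypothetical parsing sketch (the function name is illustrative, not part of the pipeline):

```python
# Hypothetical sketch: split the "path#key" data_source references used
# in descriptions.yaml into a file path and a lookup key.
def parse_data_source(ref):
    """Split 'experiments/summary.yaml#throughput' into (path, key)."""
    path, _, key = ref.partition("#")
    return path, key

path, key = parse_data_source("experiments/summary.yaml#baseline_retnet")
# path == "experiments/summary.yaml", key == "baseline_retnet"
```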
Next
- Writer — who uses the Scout's literature output
- Judge — who evaluates the Scout's idea suggestions
- Ideation Stage — where the Scout is most active