# Scout

The Scout is the literature and knowledge agent. It uses Gemini's broad knowledge base to search for papers, generate research ideas, and prepare structured information for other agents.

## Identity

| Property   | Value |
| ---------- | ----- |
| LLM        | Gemini |
| Invocation | `omc team 1:gemini:scout "task"` |
| Lifecycle  | Per-task; invoked for specific search/analysis tasks |
| Session    | Named tmux session (`{prefix}-scout`) |

## When Invoked

| Stage    | Task                          | Output |
| -------- | ----------------------------- | ------ |
| Ideation | Idea generation               | `ideas/brainstorm.md` |
| Ideation | Baseline finding              | `design/baselines.yaml` |
| Design   | Paper detail fetch            | `papers/related_work/summaries.yaml` |
| Writing  | Related work preparation      | `papers/related_work/papers.bib` + summaries |
| Writing  | Figure description design     | `papers/figures/descriptions.yaml` |
| Review   | Finding reviewer-cited papers | Additional entries in `summaries.yaml` |

## Idea Generation

During ideation, the Scout generates research ideas by exploring the landscape around a topic.

### What It Receives

```yaml
topic: "efficient attention mechanisms for long sequences"
constraints:
  - Must be implementable in 2 weeks
  - Must run on 4x A100 GPUs
  - Target venue: ICML
```
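As a rough sketch of how a request like the one above might be turned into an expansive idea-generation prompt: the `render_ideation_prompt` helper and its wording are illustrative assumptions, not part of the Scout's actual interface.

```python
# Hypothetical helper: turn a parsed ideation request into a Scout prompt.
# The request dict mirrors the YAML task spec shown above.

def render_ideation_prompt(request: dict) -> str:
    """Build an expansive idea-generation prompt from a task spec."""
    lines = [
        f"Generate diverse research ideas on: {request['topic']}",
        "Be expansive; feasibility is filtered later by the Judge.",
        "Constraints to keep in mind:",
    ]
    lines += [f"- {c}" for c in request.get("constraints", [])]
    return "\n".join(lines)

request = {
    "topic": "efficient attention mechanisms for long sequences",
    "constraints": [
        "Must be implementable in 2 weeks",
        "Must run on 4x A100 GPUs",
        "Target venue: ICML",
    ],
}
print(render_ideation_prompt(request))
```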

### What It Produces

The Scout generates ideas without requiring code or implementation feasibility analysis — that's the Judge's job later.

```markdown
# ideas/brainstorm.md

## Idea 1: Flash-Recurrent Attention
Combine flash attention's IO-awareness with RetNet's recurrent form...
- Novelty angle: No one has applied flash attention's tiling to recurrent attention
- Expected benefit: O(n) memory, hardware-efficient

## Idea 2: Sparse Retention Patterns
Apply learned sparsity to the retention mechanism...
- Novelty angle: Retention + dynamic sparsity is unexplored
- Expected benefit: Sub-linear compute for very long sequences

## Idea 3: Multi-Scale Retention with Mixture of Experts
Different retention scales for different attention heads, gated by MoE...
```

**Ideas are creative, not conservative.**

The Scout is instructed to be expansive during idea generation. Bad ideas can be filtered later by the Judge. Missing a good idea because the Scout was too conservative is a worse failure mode.

## Baseline Finding

When the Orchestrator needs baselines for experiment design, the Scout searches for relevant papers with specific requirements:

### Search Criteria

| Criterion         | Requirement |
| ----------------- | ----------- |
| Venue quality     | Top venues only (ICML, NeurIPS, ICLR, ACL, EMNLP, CVPR, etc.) |
| Code availability | Required; baselines without code are flagged and deprioritized |
| Recency           | Prefer last 2 years; allow older if seminal |
| Relevance         | Direct comparison possible with proposed method |

**Code is required for baselines.**

Unlike idea generation, baseline finding has a hard requirement: baselines must have available code. A baseline that can't be reproduced can't be compared against. The Scout flags papers without code as `code_available: false`, and the Planner deprioritizes them.

### Output Format

```yaml
# design/baselines.yaml
baselines:
  - name: "Flash Attention 2"
    paper: "Dao, 2023"
    venue: "ICLR 2024"
    code: "https://github.com/Dao-AILab/flash-attention"
    code_available: true
    key_results:
      throughput: "2.5x vanilla attention"
      memory: "O(n) instead of O(n^2)"
    comparison_axes: [throughput, memory, perplexity]
```
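As a sketch of how a downstream agent might apply the code-availability rule, the snippet below sorts baseline entries so those without code sink to the bottom. The in-memory dicts stand in for parsed entries of `design/baselines.yaml`, and the sorting policy itself is an illustrative assumption about the Planner's behavior.

```python
# Sketch: deprioritize baselines without code, as the Planner is described
# to do. The dicts stand in for parsed entries of design/baselines.yaml.

baselines = [
    {"name": "Some Method", "venue": "NeurIPS 2023", "code_available": False},
    {"name": "Flash Attention 2", "venue": "ICLR 2024", "code_available": True},
]

def prioritize(entries: list) -> list:
    """Stable sort: baselines with available code come first."""
    return sorted(entries, key=lambda e: not e.get("code_available", False))

for entry in prioritize(baselines):
    flag = "" if entry["code_available"] else "  (flagged: no code)"
    print(f"{entry['name']} [{entry['venue']}]{flag}")
```

Because the sort is stable, baselines within each group keep the order the Scout reported them in.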

## Paper Digestion

The Scout's most detailed work is digesting papers into structured documents for other agents (especially the Writer).

### Input

```text
"Digest this paper for the related work section:
 Title: 'Retentive Network: A Successor to Transformer for Large Language Models'
 Focus: retention mechanism, recurrent formulation, training parallelism"
```

### Output

```yaml
# papers/related_work/summaries.yaml (appended)
- key: "sun2023retnet"
  title: "Retentive Network: A Successor to Transformer for Large Language Models"
  authors: "Sun et al."
  venue: "arXiv 2023"
  contribution: |
    Proposes a retention mechanism that enables parallel training
    (like Transformers) and recurrent inference (like RNNs). Uses
    multi-scale exponential decay for position encoding.
  method_summary: |
    Replaces softmax attention with a retention mechanism based on
    exponential decay. Three computation modes: parallel (training),
    recurrent (inference), chunk-wise (long sequences).
  key_results:
    language_modeling: "Competitive with Transformer on perplexity"
    inference_speed: "8.4x faster than Transformer at 8k length"
  relevance: |
    Core inspiration for our work. We extend the recurrent mode
    with flash attention's tiling strategy for better hardware utilization.
  bibtex: "@article{sun2023retnet, ...}"
```
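Since digests are appended to a shared file that several agents read, a consumer would plausibly sanity-check entries before use. The sketch below shows one way to do that; the required-field list and the `merge_summaries` helper are assumptions for illustration, not part of the system.

```python
# Sketch: validate and dedupe digest entries appended to summaries.yaml.
# Field names follow the example above; the validation policy is assumed.

REQUIRED_FIELDS = {"key", "title", "authors", "venue", "contribution", "relevance"}

def merge_summaries(existing: list, new_entries: list) -> list:
    """Append new digests, skipping duplicates by citation key and
    rejecting entries that are missing required fields."""
    seen = {e["key"] for e in existing}
    merged = list(existing)
    for entry in new_entries:
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"{entry.get('key', '?')}: missing {sorted(missing)}")
        if entry["key"] not in seen:
            merged.append(entry)
            seen.add(entry["key"])
    return merged
```

Deduping by citation key matters during the Review stage, when reviewer-cited papers may overlap with entries the Scout already digested.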

## Figure Description Design

The Scout also helps design figures by writing structured descriptions that can be used to generate plots.

```yaml
# papers/figures/descriptions.yaml
figures:
  - id: "fig:throughput"
    type: "line_plot"
    title: "Throughput vs Sequence Length"
    x_axis: "Sequence Length"
    y_axis: "Tokens/second"
    series:
      - label: "Ours"
        data_source: "experiments/summary.yaml#throughput"
      - label: "Flash Attention 2"
        data_source: "experiments/summary.yaml#baseline_flash"
      - label: "RetNet"
        data_source: "experiments/summary.yaml#baseline_retnet"
    caption_draft: |
      Our method achieves higher throughput than both baselines
      for sequences longer than 4096 tokens.
```
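The `data_source` references above follow a `file#key` pattern. A minimal sketch of how the Coder might resolve them into loadable (path, key) pairs; the reference format is taken from the example, but the `resolve_source` helper itself is hypothetical.

```python
# Sketch: resolve the "file#key" data_source references used in figure
# descriptions into (path, key) pairs a plotting script could load.
# The helper is hypothetical; the reference format comes from the example.

def resolve_source(ref: str) -> tuple:
    """Split 'experiments/summary.yaml#throughput' into (path, key)."""
    path, _, key = ref.partition("#")
    if not key:
        raise ValueError(f"data_source missing '#key' fragment: {ref!r}")
    return path, key

series = [
    {"label": "Ours", "data_source": "experiments/summary.yaml#throughput"},
    {"label": "RetNet", "data_source": "experiments/summary.yaml#baseline_retnet"},
]
for s in series:
    path, key = resolve_source(s["data_source"])
    print(f"{s['label']}: load key '{key}' from {path}")
```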

**The Scout describes, the Coder generates.**

The Scout designs what figures should show and writes structured descriptions. The actual figure generation (matplotlib/TikZ code) is the Coder's job, using these descriptions as specifications.

## Next

- **Writer** — who uses the Scout's literature output
- **Judge** — who evaluates the Scout's idea suggestions
- **Ideation Stage** — where the Scout is most active

AutoResearch — Multi-agent Deep Learning Research System