

Setting up a Hive experiment requires three things:
  1. A target — which files (or lines) the Hive is allowed to modify
  2. An evaluator — a script that scores each candidate solution
  3. A sandbox — the environment where evaluations run
Optionally, you can also provide natural language context to guide the search. The Hive can evolve arbitrarily large codebases, but like a human researcher, it performs best when its task is well-defined. Before starting, consider:
  • Is the code you want to improve clearly separable from the surrounding harness?
  • Do you have a quantifiable metric? If a higher score doesn’t mean a better algorithm, you’re measuring the wrong thing.
  • Would domain knowledge or directional hints help narrow the search?

Define the target

The Hive needs to know which files in your codebase it is allowed to modify. This is specified in the configuration YAML using the repo.target_code field:
repo:
  target_code: evolve.py
It is also possible to specify multiple files to evolve, or to restrict evolution to specific line ranges; see here for details. Everything outside these paths is treated as fixed infrastructure. This lets you isolate the algorithm or heuristic you want to improve while keeping the surrounding harness stable.

Write the evaluator

One of the most important parts of setting up a Hive experiment is defining the evaluator. Concretely, this is a Python script, specified in the configuration YAML as a local path relative to the codebase root:
repo:
  evaluation_script: path/to/evaluation.py
This script is run as python path/to/evaluation.py, and must print a single-line JSON object to stdout, e.g.,
{
    "output": {
        "fitness": 0.85,
        "feedback_summary": "Throughput improved by 23% over baseline"
    },
    "metainfo": "Success"
}
The Hive optimizes for an algorithm that maximizes the quantity reported under output.fitness. You can also define an evaluator with multiple objectives to be optimized simultaneously, e.g.,
{
    "output": {
        "fitness": {
            "speedup": 2.48,
            "accuracy": 0.94
        },
        "feedback_summary": "forward() accounts for 73% of total runtime; matmul on line 42 is the primary bottleneck"
    },
    "metainfo": "Success"
}
The Hive always assumes that objectives should be maximized. If you have a quantity you want to minimize, simply return the negative of that value in the fitness.
Field                     Type                 Description
output.fitness            number or object     Scalar value, or a dict for multi-objective
output.feedback_summary   string (optional)    Summary passed back to the agent
metainfo                  string               "Success" on success; any other value indicates failure
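To make this contract concrete, here is a minimal sketch of an evaluation script. It assumes the evolved file is evolve.py (as in the target example above) and that it exposes a solve function; both names are illustrative choices, not part of the Hive API. The only hard requirements are the ones above: print a single-line JSON object containing output.fitness and metainfo.
import json
import time


def main():
    try:
        import evolve  # the file listed under repo.target_code (illustrative name)

        # Correctness gate before any performance measurement (illustrative test case).
        if evolve.solve([3, 1, 2]) != [1, 2, 3]:
            print(json.dumps({"output": {"fitness": 0.0}, "metainfo": "Incorrect output"}))
            return

        # Score the candidate: here, faster runtime means higher fitness.
        start = time.perf_counter()
        evolve.solve(list(range(10_000, 0, -1)))
        elapsed = time.perf_counter() - start

        result = {
            "output": {
                "fitness": -elapsed,  # the Hive maximizes, so negate a quantity to be minimized
                "feedback_summary": f"Sorted the benchmark input in {elapsed:.4f}s",
            },
            "metainfo": "Success",
        }
    except Exception as exc:
        result = {"output": {"fitness": 0.0}, "metainfo": f"Evaluation crashed: {exc}"}

    # json.dumps produces the required single-line JSON object on stdout.
    print(json.dumps(result))


if __name__ == "__main__":
    main()
You can check the script locally with python path/to/evaluation.py and confirm it prints exactly one line of valid JSON.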

Tips and tricks

The Hive optimizes for the score your evaluator returns — make sure that score faithfully represents what you actually want to improve.
  • Enforce correctness explicitly. Never assume the evolved code produces correct output. Always add tests or checks that guarantee correctness before scoring performance; your evaluator should return a failure if the code produces incorrect results, regardless of how fast it runs.
  • Guard against reward hacking. The Hive will exploit any shortcut that inflates the score — caching results, short-circuiting logic, or producing hardcoded outputs. Build validation checks into your evaluator and treat unexpectedly large score jumps with skepticism.
  • Make evaluations deterministic. If your evaluator has stochasticity, run it multiple times and report the mean or median. Noisy scores make it harder for the Hive to distinguish genuine improvements.
  • Keep evaluations fast. The Hive iterates faster with quick feedback. Balance evaluation thoroughness with speed — consider whether a lighter test suite provides sufficient signal compared to an exhaustive one.
  • Use relative scoring for timed benchmarks. Hardware performance can vary between sandboxes. If optimizing for runtime, instead of reporting raw times, run a baseline alongside each candidate and report the speedup factor (see the sketch below).
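The last three tips can be combined in the scoring portion of the evaluator. The fragment below is a sketch, not part of the Hive API: baseline_solve and candidate_solve are placeholders for your own baseline and evolved implementations. It gates on correctness, takes the median of repeated timing runs to damp noise, and reports a speedup relative to the baseline rather than a raw time.
import statistics
import time


def timed(fn, *args, repeats=5):
    # Median wall-clock time over several runs to reduce the effect of noise.
    runs = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        runs.append(time.perf_counter() - start)
    return statistics.median(runs)


def score(candidate_solve, baseline_solve, test_input, expected):
    # Correctness gate: never score the performance of a wrong answer.
    if candidate_solve(test_input) != expected:
        return {"output": {"fitness": 0.0}, "metainfo": "Failed correctness check"}

    # Relative scoring: speedup over a baseline measured on the same hardware.
    speedup = timed(baseline_solve, test_input) / timed(candidate_solve, test_input)
    return {
        "output": {
            "fitness": speedup,
            "feedback_summary": f"{speedup:.2f}x speedup over baseline",
        },
        "metainfo": "Success",
    }
As in the earlier sketch, print the returned dictionary with json.dumps so the Hive receives a single-line JSON object.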

Configure the sandbox

Once the evaluator is implemented, consider which environment the code should run in. Each candidate solution is evaluated inside a sandboxed environment, which can be customized in the configuration YAML as follows:
sandbox:
  base_image: python:3.9-slim
  setup_script: |
    pip install -r requirements.txt
  resources:
    cpu: "2"
    memory: "4Gi"
    shmsize: "1Gi"
    accelerators: a100-80gb:8
  evaluation_timeout: 60
Below, we briefly explain the most important fields and how to use them. See this page for a full list of available options.
  • base_image: The Docker image used as the base environment.
  • setup_script: Shell commands that run once when the sandbox is first created. Use this to install dependencies or download data.
  • resources: CPU, RAM, and shared memory allocated to the sandbox. GPUs can be added with the format <accelerator-name>:<num-gpus>.
    Accelerator    GPU
    a100-80gb      NVIDIA A100 80GB
    a100-40gb      NVIDIA A100 40GB
    h100           NVIDIA H100
    h200           NVIDIA H200
    b200           NVIDIA B200
    a10            NVIDIA A10
    t4             NVIDIA T4
    l4             NVIDIA L4
    l40s           NVIDIA L40S
    If you don’t have a strict constraint on target hardware, allocate resources with some headroom. Resource exhaustion is reported as an evaluation failure, and high-performing solutions can consume more resources than expected.
  • evaluation_timeout: Maximum time in seconds before an evaluation is terminated.
    Set this with some headroom — high-performing candidates can take longer than expected, and an overly tight timeout may discard good solutions.

Provide context

You can provide natural language context to guide the Hive’s search. This is specified in the prompt.context field:
prompt:
  context: |
    This is some sample context about the problem.
    It can be formatted like this to span multiple lines.
Use this to describe the problem domain, suggest directions to explore, or set soft constraints on approaches the Hive should avoid. If you have multiple distinct ideas you want the Hive to explore, list them in the prompt.ideas field:
prompt:
  ideas:
    - This is one idea to explore.
    - This is a different idea to explore.
On each iteration, the Hive randomly samples one of these ideas to guide its approach. Finally, if any files in the codebase provide important context for writing the desired algorithm, such as internal dependencies, specify them in the repo.additional_context field:
repo:
  additional_context: context/file.py,another/context/file.py
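Putting the pieces together, a configuration that combines the fields introduced above might look like the following. All paths and values are illustrative; see the earlier sections and the full options page for what each field accepts.
repo:
  target_code: evolve.py
  evaluation_script: path/to/evaluation.py
  additional_context: context/file.py,another/context/file.py

sandbox:
  base_image: python:3.9-slim
  setup_script: |
    pip install -r requirements.txt
  resources:
    cpu: "2"
    memory: "4Gi"
  evaluation_timeout: 60

prompt:
  context: |
    Describe the problem domain and any constraints here.
  ideas:
    - One idea to explore.
    - A different idea to explore.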