Setting up a Hive experiment requires three things:
- A target — which files (or lines) the Hive is allowed to modify
- An evaluator — a script that scores each candidate solution
- A sandbox — the environment where evaluations run
Before setting one up, it is worth asking:
- Is the code you want to improve clearly separable from the surrounding harness?
- Do you have a quantifiable metric? If a higher score doesn’t mean a better algorithm, you’re measuring the wrong thing.
- Would domain knowledge or directional hints help narrow the search?
Define the target
The Hive needs to know which files in your codebase it is allowed to modify. This is specified in the configuration YAML using the `repo.target_code` field:
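A minimal sketch of the field (the file paths are hypothetical, and listing paths directly under `repo.target_code` is an assumed structure):

```yaml
repo:
  target_code:
    - src/solver.py        # hypothetical: a file the Hive may rewrite
    - src/heuristics.py    # hypothetical: a second modifiable file
```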
Write the evaluator
One of the most important parts of setting up a Hive experiment is defining an evaluator. Concretely, this is a Python script, specified in the configuration YAML file as a local path relative to the codebase root. It is run as `python path/to/evaluation.py` and must print a single-line JSON object to stdout; the score is read from
`output.fitness`. It is also possible to define an evaluator with multiple objectives to optimize simultaneously, e.g., by returning a dict of named objectives in `output.fitness`.
The Hive always assumes that objectives should be maximized. If you have a quantity you want to minimize, simply return the negative of that value in the fitness.
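Putting this together, a minimal single-objective evaluator might look like the following sketch. The workload and correctness check are placeholders; only the output shape follows the fields described here (`output.fitness`, `output.feedback_summary`, `metainfo`):

```python
import json
import time

def evaluate():
    # Placeholder workload standing in for the candidate code under test.
    start = time.perf_counter()
    result = sum(range(1000))
    elapsed = time.perf_counter() - start

    # Correctness gate: fail regardless of performance if the result is wrong.
    if result != 499500:
        return {"output": {"fitness": 0.0}, "metainfo": "Incorrect result"}

    return {
        "output": {
            # Runtime should be minimized, so report its negative
            # (the Hive always maximizes fitness).
            "fitness": -elapsed,
            "feedback_summary": f"Ran in {elapsed:.6f}s",
        },
        "metainfo": "Success",
    }

if __name__ == "__main__":
    # The Hive reads a single-line JSON object from stdout.
    print(json.dumps(evaluate()))
```

Because `json.dumps` emits compact single-line JSON by default, the stdout contract is satisfied without extra formatting work.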
| Field | Type | Description |
|---|---|---|
| `output.fitness` | number or object | Scalar value, or a dict for multi-objective |
| `output.feedback_summary` | string (optional) | Summary passed back to the agent |
| `metainfo` | string | `"Success"` on success; any other value indicates failure |
Tips and tricks
The Hive optimizes for the score your evaluator returns — make sure that score faithfully represents what you actually want to improve.
- Enforce correctness explicitly. Never assume the candidate code's output is correct. Always add tests or checks that guarantee the code produces correct results before scoring performance, and return a failure if it does not, regardless of how fast it runs.
- Guard against reward hacking. The Hive will exploit any shortcut that inflates the score — caching results, short-circuiting logic, or producing hardcoded outputs. Build validation checks into your evaluator and treat unexpectedly large score jumps with skepticism.
- Make evaluations deterministic. If your evaluator has stochasticity, run it multiple times and report the mean or median. Noisy scores make it harder for the Hive to distinguish genuine improvements.
- Keep evaluations fast. The Hive iterates faster with quick feedback. Balance evaluation thoroughness with speed — consider if a lighter test suite provides sufficient signal compared to an exhaustive one.
- Use relative scoring for timed benchmarks. Hardware performance can vary between sandboxes. If optimizing for runtime, instead of reporting raw times, run a baseline alongside each candidate and report the speedup factor.
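The relative-scoring tip can be sketched as follows (function names and the repeat count are my own; median-of-runs also follows the determinism tip above):

```python
import time

def time_fn(fn, repeats=5):
    # Median of several runs to dampen timing noise.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]

def speedup(baseline_fn, candidate_fn):
    # > 1.0 means the candidate is faster than the baseline run
    # on the same hardware, so the score is portable across sandboxes.
    return time_fn(baseline_fn) / max(time_fn(candidate_fn), 1e-12)
```

Reporting `speedup(...)` as the fitness keeps scores comparable even when different sandboxes land on different hardware.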
Configure the sandbox
Once the evaluator is implemented, consider what environment the code should run in. Each candidate solution is evaluated inside a sandboxed environment, which can be customized in the configuration YAML as follows:
- `base_image`: The Docker image used as the base environment.
- `setup_script`: Shell commands that run once when the sandbox is first created. Use this to install dependencies or download data.
- `resources`: CPU, RAM, and shared memory allocated to the sandbox. GPUs can be added with the format `<accelerator-name>:<num-gpus>`.

| Accelerator | GPU |
|---|---|
| `a100-80gb` | NVIDIA A100 80GB |
| `a100-40gb` | NVIDIA A100 40GB |
| `h100` | NVIDIA H100 |
| `h200` | NVIDIA H200 |
| `b200` | NVIDIA B200 |
| `a10` | NVIDIA A10 |
| `t4` | NVIDIA T4 |
| `l4` | NVIDIA L4 |
| `l40s` | NVIDIA L40S |

- `evaluation_timeout`: Maximum time in seconds before an evaluation is terminated.
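A sketch combining these fields (the field names follow the list above, but the image, resource values, and resource-key names are illustrative assumptions):

```yaml
base_image: python:3.11-slim     # assumed example image
setup_script: |
  pip install numpy              # runs once when the sandbox is created
resources:
  cpu: 4                         # assumed resource-key names
  ram: 8Gi
  accelerators: h100:1           # <accelerator-name>:<num-gpus>
evaluation_timeout: 600          # seconds before an evaluation is terminated
```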
Provide context
You can provide natural language context to guide the Hive’s search. This is specified in the `prompt.context` field:
Specific ideas or directional hints for the Hive to explore can be listed in the `prompt.ideas` field:
Additional context about the repository can be supplied via the `repo.additional_context` field.
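A sketch of the three context fields together (the values and their structure are illustrative assumptions; only the field names come from this page):

```yaml
prompt:
  context: |
    The target implements a routing heuristic; runtime on the
    benchmark set is the metric that matters.   # hypothetical description
  ideas:
    - Try caching intermediate distance computations   # hypothetical hint
repo:
  additional_context: docs/architecture.md   # hypothetical path; assumed scalar form
```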