OpenEvolve + Geospatial Knowledge ⇒ Improved Geospatial Algorithms

Researchers from MIT and Stanford built GeoEvolve, a two-loop approach that combines an OpenEvolve-style evolutionary inner loop with an outer loop that retrieves and encodes geospatial domain knowledge ("GeoKnowRAG"). Applied to ordinary kriging (spatial interpolation) and geospatial conformal prediction (uncertainty quantification), GeoEvolve reports interpolation RMSE reductions of roughly 13–21% and interval-score improvements of about 17% on the evaluated datasets.

How is OpenEvolve used?

Problem setup: Geospatial tasks come with clear objectives and evaluators, such as RMSE/MAE for ordinary kriging and interval score or average interval size for geospatial conformal prediction, measured on real datasets or simulators.
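To make the evaluators concrete, here is a minimal sketch of the three scores in plain Python. The interval score below uses the standard Winkler form for a central (1 - alpha) interval; that the paper uses exactly this form is an assumption, and the function names are illustrative only.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error; lower is better."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error; lower is better."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def interval_score(y_true, lower, upper, alpha=0.1):
    """Mean interval (Winkler) score for central (1 - alpha) intervals.

    Assumed form: width plus a (2 / alpha) penalty for each unit by which
    the observation falls outside the interval. Lower is better.
    """
    total = 0.0
    for y, lo, hi in zip(y_true, lower, upper):
        score = hi - lo                          # reward narrow intervals
        if y < lo:
            score += (2.0 / alpha) * (lo - y)    # miss below the interval
        elif y > hi:
            score += (2.0 / alpha) * (y - hi)    # miss above the interval
        total += score
    return total / len(y_true)
```

Because the score adds width and miss penalties, a method cannot improve it just by widening every interval, which is why it suits the coverage-versus-width trade-off discussed in the results.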

Inner loop (OpenEvolve): Runs an evolutionary code search that generates candidate implementations, executes them against the task scorer, retains the top performers, and iterates.
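The generate, score, select, repeat cycle can be sketched in a few lines. This is a toy illustration, not OpenEvolve's API: the names evolve, toy_mutate, and toy_score are hypothetical, a single number stands in for a candidate implementation, and a random perturbation stands in for the step that proposes new code.

```python
import random

def evolve(seed, mutate, score, generations=30, population=8, elites=2, rng=None):
    """Generate candidates, score them, keep the best, repeat. Lower score is better."""
    rng = rng or random.Random(0)
    pool = [seed]
    for _ in range(generations):
        # Propose new candidates by mutating randomly chosen survivors.
        children = [mutate(rng.choice(pool), rng) for _ in range(population)]
        # Parents compete with children, so the best score never gets worse.
        pool = sorted(pool + children, key=score)[:elites]
    return pool[0]

# Toy stand-ins (assumptions): the "program" is a number, the evaluator a quadratic.
def toy_score(x):
    return (x - 3.0) ** 2

def toy_mutate(x, rng):
    return x + rng.gauss(0.0, 0.5)

best = evolve(seed=0.0, mutate=toy_mutate, score=toy_score)
```

In the real setting the score function would run the candidate kriging or conformal prediction code on held-out data and return RMSE or interval score.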

Outer loop (domain-guided): An agentic controller keeps global elites and invokes GeoKnowRAG, which retrieves structured geospatial knowledge and returns domain-informed prompts to guide the next round of evolution.
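As a rough sketch of how such a controller could be wired, the code below retrieves knowledge snippets, turns them into a prompt, runs one inner-loop round, and merges the results into a global elite set. Everything here is hypothetical: the KNOWLEDGE entries, the keyword-match retrieval, and the run_inner_loop callback are stand-ins, and the paper's GeoKnowRAG is presumably far richer than a dictionary lookup.

```python
# Hypothetical snippets for illustration; not the paper's knowledge base.
KNOWLEDGE = {
    "variogram": "Compare spherical, exponential and Gaussian variogram models before kriging.",
    "non-stationary": "Prefer localized kriging with a moving neighbourhood when the mean drifts.",
    "conformal": "Weight calibration residuals by distance to the test location.",
}

def retrieve(query, k=2):
    """Toy retrieval: rank snippets by how often their key appears in the query."""
    q = query.lower()
    hits = [(q.count(key), text) for key, text in KNOWLEDGE.items() if key in q]
    return [text for _, text in sorted(hits, reverse=True)[:k]]

def build_prompt(task, notes):
    """Turn retrieved notes into a domain-informed prompt for the next round."""
    lines = ["Task: " + task, "Domain guidance:"]
    lines += ["- " + note for note in notes]
    return "\n".join(lines)

def outer_loop(task, run_inner_loop, rounds=3, keep=3):
    """run_inner_loop(prompt) -> list of (score, candidate); lower score is better."""
    elites = []
    for _ in range(rounds):
        # Fold the current best score into the retrieval query once one exists.
        query = task if not elites else task + " best score so far %.3f" % elites[0][0]
        prompt = build_prompt(task, retrieve(query))
        # Global elites persist across rounds and compete with new candidates.
        elites = sorted(elites + run_inner_loop(prompt), key=lambda sc: sc[0])[:keep]
    return elites
```

The point of the design is the separation of concerns: the inner loop never sees the knowledge base directly, only the prompt the controller builds from it.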

Injecting geospatial theory grounds the search and narrows the space, which improves robustness in non-stationary or small-sample regimes. The evolved kriging and geospatial conformal prediction (GeoCP) variants explicitly incorporate variogram model selection, adaptive/localized kriging, and geographically weighted quantiles.
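For readers unfamiliar with variogram model selection, here is one simple way it could look: fit several textbook variogram models to empirical semivariances and keep the one with the lowest squared error. This is an illustrative sketch, not the code GeoEvolve evolved. The unweighted grid search, the crude sill estimate, and the practical-range convention (the factor 3 in the exponential and Gaussian models) are all assumptions.

```python
import math

def spherical(h, nugget, psill, practical_range):
    if h >= practical_range:
        return nugget + psill
    r = h / practical_range
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, psill, practical_range):
    return nugget + psill * (1.0 - math.exp(-3.0 * h / practical_range))

def gaussian(h, nugget, psill, practical_range):
    return nugget + psill * (1.0 - math.exp(-3.0 * (h / practical_range) ** 2))

MODELS = {"spherical": spherical, "exponential": exponential, "gaussian": gaussian}

def select_variogram(lags, gammas, ranges, nugget=0.0):
    """Grid-search the range for each model; return (sse, model_name, range)."""
    psill = max(gammas) - nugget  # crude partial-sill estimate (assumption)
    best = None
    for name, model in MODELS.items():
        for practical_range in ranges:
            sse = sum((model(h, nugget, psill, practical_range) - g) ** 2
                      for h, g in zip(lags, gammas))
            if best is None or sse < best[0]:
                best = (sse, name, practical_range)
    return best

# Usage: synthetic semivariances generated from an exponential model.
lags = [float(h) for h in range(1, 13)]
gammas = [exponential(h, 0.0, 2.0, 4.0) for h in lags]
sse, name, practical_range = select_variogram(lags, gammas, ranges=[2.0, 4.0, 6.0, 8.0])
```

A production fit would typically weight lags by pair counts and optimize nugget and sill jointly; the grid search keeps the selection logic easy to read.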

Figure 1: The GeoEvolve workflow, adapted from Figure 2 in the paper.

Results

Across two canonical tasks, GeoEvolve reports consistent gains over strong baselines:

Spatial interpolation (ordinary kriging): Test RMSE decreases by ≈13–21% across evaluated datasets, driven by domain-guided choices such as variogram model selection and adaptive/localized kriging (selected and parameterized by the system).
Spatial uncertainty quantification (geospatial conformal prediction): The interval score (lower is better) improves by ≈17%, using domain-informed calibration (e.g., geographically weighted quantiles) to balance coverage and width.
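A geographically weighted quantile can be illustrated as follows: calibration scores (for example, absolute residuals) near the test location count more than distant ones when picking the interval half-width. This is a minimal sketch under stated assumptions, not the paper's method. The Gaussian kernel, the bandwidth parameter, and the function name are illustrative choices, and the sketch omits the extra test-point mass that formal weighted conformal prediction adds.

```python
import math

def geo_weighted_quantile(test_xy, calib_xy, calib_scores, alpha=0.1, bandwidth=1.0):
    """Weighted (1 - alpha) quantile of calibration scores.

    Weights decay with squared distance from the test location through a
    Gaussian kernel (an assumed choice of kernel).
    """
    weights = []
    for x, y in calib_xy:
        d2 = (x - test_xy[0]) ** 2 + (y - test_xy[1]) ** 2
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    cumulative = 0.0
    # Walk the scores in ascending order until the weighted mass reaches 1 - alpha.
    for score, w in sorted(zip(calib_scores, weights)):
        cumulative += w / total
        if cumulative >= 1.0 - alpha:
            return score
    return max(calib_scores)  # guard against floating-point shortfall

# Usage: small residuals cluster near (0, 0), large ones near (10, 10).
calib_xy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
calib_scores = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
q_near = geo_weighted_quantile((0.0, 0.0), calib_xy, calib_scores)
q_far = geo_weighted_quantile((10.0, 10.0), calib_xy, calib_scores)
```

The prediction interval at a location is then the point prediction plus or minus the local quantile, so intervals tighten where nearby residuals are small and widen where they are large.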

Ablations (four variants):
Performance increases monotonically with added domain structure:

OpenEvolve (baseline)
  < OpenEvolve + generic knowledge
  < GeoEvolve w/o GeoKnowRAG
  < GeoEvolve (full: outer controller + GeoKnowRAG)

The full system yields the largest and most stable improvements; pure evolution alone shows smaller gains and plateaus.

Generic engine + domain priors beats bespoke

A common worry is that every domain needs its own bespoke, task-specific agent stack. We argue the opposite: a reusable evolutionary engine (like OpenEvolve) plus lightweight domain priors (retrieval, constraints, localized heuristics) is usually enough to outpace bespoke pipelines. The engine gives you a standardized search-and-evaluate substrate; the domain layer narrows the space and injects good inductive bias. You get stronger results with far less scaffolding and a setup you can reuse on the next problem.