Blog

New algorithmic frontiers.

OpenEvolve for AI-Driven Research for Systems (ADRS)

UC Berkeley's Sky Computing Lab used OpenEvolve as an open-source engine for AI-Driven Research for Systems (ADRS) to automatically discover and refine systems algorithms across multiple domains, including an MoE expert-parallelism load balancer, multi-region spot scheduling, LLM-SQL preprocessing, transaction scheduling, and more. Reported results include up to 5× runtime speedups and double-digit percentage-point cost reductions, often found within hours and on sub-$20 evaluation budgets. The Berkeley team has documented their experience with OpenEvolve and the results in their paper and blog.
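
To give a feel for how a domain plugs into such a workflow, here is a minimal, hypothetical evaluator for one of the listed problems (a MoE expert-parallelism load balancer). The `evaluate(program_path)` entry point, the `balance()` function expected on the candidate program, and the metric names are illustrative assumptions for this sketch, not Berkeley's actual harness or OpenEvolve's exact contract.

```python
# Hypothetical evaluator sketch for an evolved load-balancing heuristic.
# The evaluate(program_path) entry point, the candidate's balance() function,
# and the metric names are assumptions made for illustration only.
import importlib.util
import time


def evaluate(program_path: str) -> dict:
    """Load a candidate program, run it on a synthetic workload, and score it."""
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    candidate = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(candidate)

    # Synthetic workload: assign 10k requests across 64 "experts".
    requests = list(range(10_000))
    start = time.perf_counter()
    assignment = candidate.balance(requests, num_experts=64)  # assumed interface
    elapsed = time.perf_counter() - start

    # Imbalance = max load / mean load; 1.0 is a perfectly even split.
    loads = [0] * 64
    for expert in assignment:
        loads[expert] += 1
    imbalance = max(loads) / (sum(loads) / len(loads))

    # Higher is better for the evolutionary search.
    return {"balance_score": 1.0 / imbalance, "runtime_score": 1.0 / (1.0 + elapsed)}
```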

OpenEvolve + GeoSpatial Knowledge ⇒ Improved Algorithms

Researchers from MIT and Stanford built GeoEvolve, a two-loop approach that combines an OpenEvolve-style evolutionary inner loop with an outer loop that retrieves and encodes geospatial domain knowledge ("GeoKnowRAG"). Applied to ordinary kriging (spatial interpolation) and geospatial conformal prediction (uncertainty quantification), GeoEvolve reports interpolation RMSE reductions of roughly 13–21% and interval-score improvements of about 17% on the evaluated datasets.
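
To make the reported numbers concrete, here is a minimal sketch of the two metrics involved: interpolation RMSE for point predictions and the interval (Winkler) score used to grade prediction intervals. The alpha level and the toy data are placeholders, not GeoEvolve's evaluation setup.

```python
# Minimal sketch of the two metrics cited above. Toy data and alpha are placeholders.
import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root-mean-square error of point predictions (lower is better)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def interval_score(y_true: np.ndarray, lower: np.ndarray, upper: np.ndarray,
                   alpha: float = 0.1) -> float:
    """Mean interval (Winkler) score for (1 - alpha) prediction intervals;
    lower is better: it rewards narrow intervals that still cover y_true."""
    width = upper - lower
    below = (2.0 / alpha) * (lower - y_true) * (y_true < lower)
    above = (2.0 / alpha) * (y_true - upper) * (y_true > upper)
    return float(np.mean(width + below + above))


if __name__ == "__main__":
    y = np.array([1.0, 2.0, 3.0])
    pred = np.array([1.1, 1.8, 3.3])
    print(rmse(y, pred))                                   # point-estimate error
    print(interval_score(y, pred - 0.5, pred + 0.5, 0.1))  # uncertainty quality
```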

OptiLLM-Powered CePO: How Cerebras Turned Open Llama into a Fast, Test-Time Reasoner

Cerebras has applied CePO, or Cerebras Enhanced Planning and Optimization, to the GPT-OSS-120B model through its inference endpoint. CePO is an OptiLLM technique that leverages test-time computation for iterative planning and refinement, without retraining the model. It runs as an inference-time pipeline tailored for Cerebras hardware, letting the model plan, iterate on solutions, and refine outputs in real time. For complex tasks such as code generation, the pipeline breaks the work into steps: outlining a plan, generating multiple attempts, analyzing them for consistency, and picking the strongest result. This uses more tokens overall, but it turns hardware speed into an advantage for better reasoning, something that is hard on standard setups because of memory limits. The approach builds on earlier work with Llama models and has since been extended to models like DeepSeek R1 and Qwen QwQ 32B.
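
As a rough illustration of that plan, sample, check, and select flow, here is a sketch written against a generic OpenAI-compatible endpoint. The base URL, model identifier, prompts, and attempt count are placeholders; the actual CePO pipeline ships inside OptiLLM and the Cerebras endpoint rather than in user code like this.

```python
# Illustrative sketch of a CePO-style plan -> sample -> check -> select loop.
# base_url, model name, prompts, and attempt count are placeholders, not the
# real CePO implementation (which lives inside OptiLLM).
from openai import OpenAI

client = OpenAI(base_url="https://example-inference-endpoint/v1", api_key="YOUR_KEY")
MODEL = "gpt-oss-120b"  # placeholder model identifier


def cepo_style_answer(task: str, num_attempts: int = 4) -> str:
    # Step 1: ask the model to outline a plan before solving.
    plan = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Write a short step-by-step plan for: {task}"}],
    ).choices[0].message.content

    # Step 2: generate several independent attempts that follow the plan.
    attempts = []
    for _ in range(num_attempts):
        attempt = client.chat.completions.create(
            model=MODEL,
            temperature=0.8,
            messages=[{"role": "user",
                       "content": f"Task: {task}\nPlan:\n{plan}\nExecute the plan."}],
        ).choices[0].message.content
        attempts.append(attempt)

    # Step 3: have the model compare the attempts for consistency and pick one.
    numbered = "\n\n".join(f"Attempt {i + 1}:\n{a}" for i, a in enumerate(attempts))
    verdict = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Task: {task}\n\n{numbered}\n\nCompare the attempts, "
                              "note inconsistencies, and return the strongest final answer."}],
    ).choices[0].message.content
    return verdict
```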