Agents Hill-Climbing with Data Science

Dozens of coding agents attempt to reverse engineer a Spotify color assignment algorithm within a custom environment, showing how data science thinking helps agents improve.

Overview

In this project, I give dozens of coding agents access to an environment where they try to reverse engineer an algorithm created by Spotify that assigns a tasteful background color to an image of an album cover. The task seems trivial, but a correct solution requires a complex set of techniques, heuristics, and parameters. I ran many different models on different coding platforms against the task and have some interesting findings to share.

The environment includes some ‘training’ data, and the model’s task is to code up a good solution. The agent has access to a set of scripts to run predictions, analyze results, and view failing samples individually. The task tests an agent’s ability to ideate and analyze results over a long conversation.
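To make the feedback loop concrete, here is a minimal sketch of what such an evaluation script might look like. It is not the project's actual code: the names (Sample, mean_color, evaluate), the Euclidean RGB distance, and the tolerance of 30 are all assumptions for illustration, and the mean-color baseline is a deliberately naive stand-in, not Spotify's algorithm.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Sequence

RGB = tuple[int, int, int]


@dataclass
class Sample:
    """One hypothetical 'training' example."""
    name: str
    pixels: Sequence[RGB]  # decoded album cover pixels
    target: RGB            # background color assigned by the reference algorithm


def mean_color(pixels: Sequence[RGB]) -> RGB:
    """Naive baseline: average each channel over all pixels."""
    n = len(pixels)
    return tuple(round(sum(p[c] for p in pixels) / n) for c in range(3))


def rgb_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance in RGB space (an assumed error metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def evaluate(predict: Callable[[Sequence[RGB]], RGB],
             samples: Sequence[Sample],
             tolerance: float = 30.0) -> dict:
    """Score a candidate solution and list failing samples, worst first."""
    rows = [(s.name, rgb_distance(predict(s.pixels), s.target)) for s in samples]
    rows.sort(key=lambda row: row[1], reverse=True)
    failures = [row for row in rows if row[1] > tolerance]
    return {
        "mean_error": sum(d for _, d in rows) / len(rows),
        "pass_rate": 1 - len(failures) / len(rows),
        "failures": failures,
    }


# Two toy samples: a flat cover the baseline gets right, and a
# black-and-white cover where averaging gives grey and misses the target.
samples = [
    Sample("flat_red", [(200, 0, 0)] * 4, (200, 0, 0)),
    Sample("split", [(0, 0, 0), (255, 255, 255)], (0, 0, 0)),
]
report = evaluate(mean_color, samples)
```

The point of a report like this is that the agent gets an aggregate score to climb and a ranked list of failures to inspect, so each iteration can target the worst remaining cases.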

The purpose of the talk is not just to show off this particular project, but to showcase how proper environment setup and ‘data science thinking’ can enable coding agents to hill-climb towards better solutions faster. These ideas are relevant far beyond clearly defined X -> Y tasks. I use similar techniques regularly when building and benchmarking agentic systems.

Rough demo plan:

Intro to the task, set off a coding agent live to come up with a solution.
Walk through the environment, show off some interesting results and analysis of previous runs.
Talk about the general idea of building hill-climbing environments for LLMs.

Links

Tech stack