In 2024, SWE-bench & SWE-agent helped kickstart the coding agent revolution.
We now ask: What if SWE-agent was 100x smaller, and still worked nearly as well?
mini is for
Researchers who want to benchmark, fine-tune or RL without assumptions, bloat, or surprises
Developers who like their tools like their scripts: short, sharp, and readable
Engineers who want something trivial to sandbox & to deploy anywhere
Here's some details:
Minimal: Just 100 lines of python (+100 total for env, model, script) — no fancy dependencies!
Powerful: Resolves >74% of GitHub issues in the SWE-bench verified benchmark (leaderboard).
Convenient: Comes with UIs that turn this into your daily dev swiss army knife!
Deployable: In addition to local envs, you can use docker, podman, singularity, apptainer, and more
Tested: Codecov
Cutting edge: Built by the Princeton & Stanford team behind SWE-bench and SWE-agent.