#agents + #llm-coding

Public notes from activescott tagged with both #agents and #llm-coding

Monday, December 8, 2025

In 2024, SWE-bench & SWE-agent helped kickstart the coding agent revolution.

We now ask: What if SWE-agent was 100x smaller, and still worked nearly as well?

mini is for

Researchers who want to benchmark, fine-tune or RL without assumptions, bloat, or surprises
Developers who like their tools like their scripts: short, sharp, and readable
Engineers who want something trivial to sandbox & to deploy anywhere

Here's some details:

Minimal: Just 100 lines of python (+100 total for env, model, script) — no fancy dependencies!
Powerful: Resolves >74% of GitHub issues in the SWE-bench verified benchmark (leaderboard).
Convenient: Comes with UIs that turn this into your daily dev swiss army knife!
Deployable: In addition to local envs, you can use docker, podman, singularity, apptainer, and more
Tested: Codecov
Cutting edge: Built by the Princeton & Stanford team behind SWE-bench and SWE-agent.