prose/skills/open-prose/compiler.md at main · openprose/prose
prose.md language reference
Public notes from activescott tagged with both #llm and #agents
Why not just use plain English? English is already an agent framework; OpenProse structures it rather than replacing it. Plain English doesn't distinguish sequential from parallel, doesn't specify retry counts, and doesn't scope variables. OpenProse uses English exactly where ambiguity is a feature (inside ...), and structure everywhere else. The fourth-wall syntax lets you lean on AI judgment precisely when you want to.
How is this a VM? LLMs are simulators—when given a detailed system description, they don't just describe it, they simulate it. The prose.md spec describes a VM with enough fidelity that reading it induces simulation. But simulation with sufficient fidelity is implementation: each session spawns a real subagent, outputs are real artifacts, state persists in conversation history or files. The simulation is the execution.
Fast, script-friendly CLI for Gmail, Calendar, Chat, Classroom, Drive, Docs, Slides, Sheets, Forms, Apps Script, Contacts, Tasks, People, Groups (Workspace), and Keep (Workspace-only). JSON-first output, multiple accounts, and least-privilege auth built in.
I'm a simple boy, so I like simple things. Agents can run Bash and write code well. Bash and code are composable. So what's simpler than having your agent just invoke CLI tools and write code? This is nothing new. We've all been doing this since the beginning. I'd just like to convince you that in many situations, you don't need or even want an MCP server.
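A minimal sketch of the idea: the agent's "tool" is just a CLI invocation plus ordinary glue code. Everything here is illustrative (the JSON-emitting command is stood in for by `node -e` so the sketch runs anywhere); it is not any particular tool's API.

```typescript
// Hypothetical sketch: invoke a JSON-first CLI, then compose its output with code.
import { execFileSync } from "node:child_process";

// Stand-in for any JSON-emitting CLI. In practice this would be a real tool
// invoked via Bash; node itself is used here only so the sketch is runnable.
const raw = execFileSync(process.execPath, [
  "-e",
  'console.log(JSON.stringify({ issues: [{ id: 1, open: true }, { id: 2, open: false }] }))',
]).toString();

// Composability is the point: parse, filter, transform, pipe onward.
const openIssues = JSON.parse(raw).issues.filter((i: { open: boolean }) => i.open);
console.log(openIssues.length); // 1
```

No server, no protocol, no schema registration: a process, some stdout, and a few lines of code.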
An interesting tool that uses Playwright to extract page structure, apparently based on the accessibility roles and geometry of "important" elements, which an execution agent then uses to process the page. Important elements are ranked somehow, and geometry is inferred from them.
It also relies on Jest-style assertions to explicitly assert whether a step succeeded or failed.
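The pattern is easy to picture with a tiny shim. This is a hypothetical sketch of the Jest-style idiom, not the tool's actual API; the `expect` helper and `pageTitle` value are both assumptions for illustration.

```typescript
// Minimal Jest-style assertion shim (hypothetical, for illustration only).
function expect(actual: unknown) {
  return {
    toBe(expected: unknown) {
      if (actual !== expected) {
        throw new Error(`step failed: expected ${expected}, got ${actual}`);
      }
    },
  };
}

// Each agent step passes or fails explicitly, instead of the model
// guessing from the page state whether it worked.
const pageTitle = "Checkout"; // value the agent extracted from the page (assumed)
expect(pageTitle).toBe("Checkout"); // throws on mismatch, so failure is unambiguous
console.log("step ok");
```

An explicit pass/fail signal per step is what makes retries and error reporting tractable.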
We introduce the Berkeley Function Calling Leaderboard (BFCL), the first comprehensive and executable function call evaluation dedicated to assessing Large Language Models' (LLMs) ability to invoke functions. Unlike previous evaluations, BFCL accounts for various forms of function calls, diverse scenarios, and executability.
In 2024, SWE-bench & SWE-agent helped kickstart the coding agent revolution.
We now ask: What if SWE-agent was 100x smaller, and still worked nearly as well?
mini is for:

- Researchers who want to benchmark, fine-tune, or RL without assumptions, bloat, or surprises
- Developers who like their tools like their scripts: short, sharp, and readable
- Engineers who want something trivial to sandbox and to deploy anywhere

Here are some details:
- Minimal: just 100 lines of Python (+100 total for env, model, script), with no fancy dependencies
- Powerful: resolves >74% of GitHub issues in the SWE-bench Verified benchmark (leaderboard)
- Convenient: comes with UIs that turn this into your daily dev Swiss Army knife
- Deployable: in addition to local envs, you can use Docker, Podman, Singularity, Apptainer, and more
- Tested: Codecov
- Cutting edge: built by the Princeton & Stanford team behind SWE-bench and SWE-agent
This is all I feed to my agent. It's a handful of tools that cover all the bases for my use case. Each tool is a simple Node.js script that uses Puppeteer Core. By reading that README, the agent knows the available tools, when to use them, and how to use them via Bash.