#agents
Public notes from activescott tagged with #agents
Monday, February 23, 2026
The Enterprise AI for Complex Work | Coworker AI
Another thing like glean that integrates and sucks in data and answers questions
Sunday, February 22, 2026
Andrej Karpathy on X: "Bought a new Mac mini to properly tinker with claws over the weekend. The apple store person told me they are selling like hotcakes and everyone is confused :) I'm definitely a bit sus'd to run OpenClaw specifically - giving my private data/keys to 400K lines of vibe coded" / X
Sounds about right.
I'm definitely a bit sus'd to run OpenClaw specifically - giving my private data/keys to 400K lines of vibe coded monster that is being actively attacked at scale is not very appealing at all. Already seeing reports of exposed instances, RCE vulnerabilities, supply chain poisoning, malicious or compromised skills in the registry, it feels like a complete wild west and a security nightmare. But I do love the concept and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to a next level.
Sunday, February 15, 2026
steipete/gogcli: Google Suite CLI: Gmail, GCal, GDrive, GContacts.
Fast, script-friendly CLI for Gmail, Calendar, Chat, Classroom, Drive, Docs, Slides, Sheets, Forms, Apps Script, Contacts, Tasks, People, Groups (Workspace), and Keep (Workspace-only). JSON-first output, multiple accounts, and least-privilege auth built in.
What if you don't need MCP at all?
I'm a simple boy, so I like simple things. Agents can run Bash and write code well. Bash and code are composable. So what's simpler than having your agent just invoke CLI tools and write code? This is nothing new. We've all been doing this since the beginning. I'd just like to convince you that in many situations, you don't need or even want an MCP server.
Sunday, February 8, 2026
The Agent Skills Directory
Wednesday, February 4, 2026
Agent Skills Marketplace - Claude, Codex & ChatGPT Skills | SkillsMP
Sunday, February 1, 2026
OpenClaw — Personal AI Assistant
Wednesday, January 28, 2026
Sentience API - Verification & Control Layer for Browser AI Agents | Semantic snapshots, assertions, traces + artifacts. Local-ready, cloud-friendly, vision optional
An interesting tool that uses playwright to extract structure based on apparently accessibility roles and geometry of “important” elements and use that for an execution agent to process the page results. Important elements are somehow ranked. Then geometry is inferred from those elements.
Also relies on jest-style assertions to explicitly assert whether a step succeeded or failed.
Friday, January 23, 2026
The Leading Multi-Agent Platform
Monday, December 8, 2025
gorilla/berkeley-function-call-leaderboard at main · ShishirPatil/gorilla
We introduce the Berkeley Function Calling Leaderboard (BFCL), the first comprehensive and executable function call evaluation dedicated to assessing Large Language Models' (LLMs) ability to invoke functions. Unlike previous evaluations, BFCL accounts for various forms of function calls, diverse scenarios, and executability.
SWE-agent/mini-swe-agent: The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >74% on SWE-bench verified!
In 2024, SWE-bench & SWE-agent helped kickstart the coding agent revolution.
We now ask: What if SWE-agent was 100x smaller, and still worked nearly as well?
mini is for
Researchers who want to benchmark, fine-tune or RL without assumptions, bloat, or surprises Developers who like their tools like their scripts: short, sharp, and readable Engineers who want something trivial to sandbox & to deploy anywhereHere's some details:
Minimal: Just 100 lines of python (+100 total for env, model, script) — no fancy dependencies! Powerful: Resolves >74% of GitHub issues in the SWE-bench verified benchmark (leaderboard). Convenient: Comes with UIs that turn this into your daily dev swiss army knife! Deployable: In addition to local envs, you can use docker, podman, singularity, apptainer, and more Tested: Codecov Cutting edge: Built by the Princeton & Stanford team behind SWE-bench and SWE-agent.