#llm + #code

Public notes from activescott tagged with both #llm and #code

Friday, February 27, 2026

MCP Inspector - Model Context Protocol

modelcontextprotocol.io/docs/tools/inspector

The MCP Inspector is an interactive developer tool for testing and debugging MCP servers. While the Debugging Guide covers the Inspector as part of the overall debugging toolkit, this document provides a detailed exploration of the Inspector’s features and capabilities.

#2:48 AM

mcp code llm

Sunday, February 8, 2026

Tools - Model Context Protocol

modelcontextprotocol.io/specification/2025-06-18/server/tools

schema reference

#4:40 AM

mcp code llm

Tuesday, February 3, 2026

Gemini CLI | gemini-cli

google-gemini.github.io/gemini-cli/

#1:50 AM

llm/coding agent code cli google gemini llm

Sunday, February 1, 2026

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

agentdojo.spylab.ai/

To measure the adversarial robustness of AI agents, we introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data. To capture the evolving nature of attacks and defenses, AgentDojo is not a static test suite, but rather an extensible environment for designing and evaluating new agent tasks, defenses, and adaptive attacks. We populate the environment with 97 realistic tasks (e.g., managing an email client, navigating an e-banking website, or making travel bookings), 629 security test cases, and various attack and defense paradigms from the literature. We find that AgentDojo poses a challenge for both attacks and defenses: state-of-the-art LLMs fail at many tasks (even in the absence of attacks), and existing prompt injection attacks break some security properties but not all. We hope that AgentDojo can foster research on new design principles for AI agents that solve common tasks in a reliable and robust manner.

#1:15 AM

exfiltration-attacks prompt-injection code security llm

Saturday, January 31, 2026

google-research/camel-prompt-injection: Code for the paper "Defeating Prompt Injections by Design"

github.com/google-research/camel-prompt-injection

#4:44 PM

exfiltration-attacks code security llm

Wednesday, January 28, 2026

Schema Reference - Model Context Protocol

modelcontextprotocol.io/specification/2025-11-25/schema#toolannotations

interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

Additional properties describing a Tool to clients.

NOTE: all properties in ToolAnnotations are hints. They are not guaranteed to provide a faithful description of tool behavior (including descriptive properties like title).

Clients should never make tool use decisions based on ToolAnnotations received from untrusted servers.
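
A minimal sketch of how a client might apply that rule. The `decideToolUse` function, the `Decision` type, and the `serverIsTrusted` flag are hypothetical names for illustration and are not part of the MCP specification; only the `ToolAnnotations` shape comes from the schema excerpt.

```typescript
// Shape taken from the schema excerpt; every field is only a hint.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Hypothetical client-side policy outcome (an assumption, not spec).
type Decision = "auto-approve" | "ask-user";

// Hypothetical helper: decide whether a tool call needs user confirmation.
function decideToolUse(
  annotations: ToolAnnotations | undefined,
  serverIsTrusted: boolean,
): Decision {
  // Hints from untrusted servers are ignored entirely.
  if (!serverIsTrusted) {
    return "ask-user";
  }
  // For trusted servers, only an explicit read-only hint skips confirmation.
  if (annotations?.readOnlyHint === true) {
    return "auto-approve";
  }
  return "ask-user";
}
```

With this policy an untrusted server cannot talk its way past the confirmation prompt by declaring `readOnlyHint: true`, since the hint is never consulted for it.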

#12:29 AM

mcp code llm

Tuesday, January 27, 2026

ChatGPT Containers can now run bash, pip/npm install packages, and download files

simonwillison.net/2026/Jan/26/chatgpt-containers/

ChatGPT can directly run Bash commands now. Previously it was limited to Python code only, although it could run shell commands via the Python subprocess module. It has Node.js and can run JavaScript directly in addition to Python. I also got it to run “hello world” in Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C and C++. No Rust yet though! While the container still can’t make outbound network requests, pip install package and npm install package both work now via a custom proxy mechanism. ChatGPT can locate the URL for a file on the web and use a container.download tool to download that file and save it to a path within the sandboxed container.

Is this a data exfiltration vulnerability though? Could a prompt injection attack trick ChatGPT into leaking private data out to a container.download call to a URL with a query string that includes sensitive information?

I don’t think it can. I tried getting it to assemble a URL with a query string and access it using container.download and it couldn’t do it. It told me that it got back this error:

ERROR: download failed because url not viewed in conversation before. open the file or url using web.run first.

This looks to me like the same safety trick used by Claude’s Web Fetch tool: only allow URL access if that URL was either directly entered by the user or if it came from search results that could not have been influenced by a prompt injection.
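
A rough sketch of that kind of gate, assuming exact-match comparison on the full URL string. The `UrlGate` class and its method names are hypothetical; this is not how ChatGPT or Claude actually implement the check, which is not public in this excerpt.

```typescript
// Hypothetical gate: a URL may be fetched only if it was already seen
// from a trusted source (typed by the user or returned by search).
class UrlGate {
  private readonly seen = new Set<string>();

  // Record a URL that arrived through a trusted channel.
  recordTrusted(url: string): void {
    this.seen.add(url);
  }

  // Exact string match only, so a seen URL with an extra query string
  // appended by the model does not pass.
  mayFetch(url: string): boolean {
    return this.seen.has(url);
  }
}
```

Under this assumption, recording `https://example.com/file.csv` lets that exact URL through, while `https://example.com/file.csv?secret=abc` is refused because the model assembled it rather than saw it.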

#2:14 AM

prompt-injection-vulnerabilities mcp prompt-injection code security llm

MCP Apps - Bringing UI Capabilities To MCP Clients | Model Context Protocol Blog

blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/

The architecture of MCP Apps relies on two key MCP primitives:

Tools with UI metadata: Tools include a _meta.ui.resourceUri field pointing to a UI resource

UI Resources: Server-side resources served via the ui:// scheme containing bundled HTML/JavaScript

// Tool with UI metadata
{
  name: "visualize_data",
  description: "Visualize data as an interactive chart",
  inputSchema: { /* ... */ },
  _meta: {
    ui: {
      resourceUri: "ui://charts/interactive"
    }
  }
}

The host fetches the resource, renders it in a sandboxed iframe, and enables bidirectional communication via JSON-RPC over postMessage.

#2:07 AM

mcp code llm

lancedb/lancedb: Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.

github.com/lancedb/lancedb

LanceDB is designed for fast, scalable, and production-ready vector search. It is built on top of the Lance columnar format. You can store, index, and search over petabytes of multimodal data and vectors with ease. LanceDB is a central location where developers can build, train and analyze their AI workloads.

#1:35 AM

llm/embeddings databases code rag llm

Friday, January 23, 2026

The Leading Multi-Agent Platform

www.crewai.com/

#3:35 AM

python code agents llm

Monday, January 19, 2026

First impressions of Claude Cowork, Anthropic’s general agent

simonwillison.net/2026/Jan/12/claude-cowork/

Anthropic say that Cowork can only access files you grant it access to—it looks to me like they’re mounting those files into a containerized environment, which should mean we can trust Cowork not to be able to access anything outside of that sandbox.

Update: It’s more than just a filesystem sandbox—I had Claude Code reverse engineer the Claude app and it found out that Claude uses VZVirtualMachine—the Apple Virtualization Framework—and downloads and boots a custom Linux root filesystem.

I recently learned that the summarization applied by the WebFetch function in Claude Code and now in Cowork is partly intended as a prompt injection protection layer via this tweet from Claude Code creator Boris Cherny:

Summarization is one thing we do to reduce prompt injection risk. Are you running into specific issues with it?

#4:31 AM

llm/tool-calling prompt-injection code claude llm

A quote from Jeremy Daer

simonwillison.net/2026/Jan/17/jeremy-daer/

[On agents using CLI tools in place of REST APIs] To save on context window, yes, but moreso to improve accuracy and success rate when multiple tool calls are involved, particularly when calls must be correctly chained e.g. for pagination, rate-limit backoff, and recognizing authentication failures.

Other major factor: which models can wield the skill? Using the CLI lowers the bar so cheap, fast models (gpt-5-nano, haiku-4.5) can reliably succeed. Using the raw API is something only the costly "strong" models (gpt-5.2, opus-4.5) can manage, and it squeezes a ton of thinking/reasoning out of them, which means multiple turns/iterations, which means accumulating a ton of context, which means burning loads of expensive tokens. For one-off API requests and ad hoc usage driven by a developer, this is reasonable and even helpful, but for an autonomous agent doing repetitive work, it's a disaster.

#4:27 AM

prompt-engineering llm/tool-calling code llm

Thursday, January 15, 2026

The Typescript AI framework - Mastra

mastra.ai/

#9:38 PM

code js llm

Wednesday, January 14, 2026

My answers to the questions I posed about porting open source code with LLMs

simonwillison.net/2026/Jan/11/answers/

the short version is that it’s now possible to point a coding agent at some other open source project and effectively tell it “port this to language X and make sure the tests still pass” and have it do exactly that.

Does this library represent a legal violation of copyright of either the Rust library or the Python one?

I decided that the right thing to do here was to keep the open source license and copyright statement from the Python library author and treat what I had built as a derivative work, which is the entire point of open source.

Even if this is legal, is it ethical to build a library in this way?

After sitting on this for a while I’ve come down on yes, provided full credit is given and the license is carefully considered. Open source allows and encourages further derivative works! I never got upset at some university student forking one of my projects on GitHub and hacking in a new feature that they used. I don’t think this is materially different, although a port to another language entirely does feel like a slightly different shape.

The much bigger concern for me is the impact of generative AI on demand for open source. The recent Tailwind story is a visible example of this—while Tailwind blamed LLMs for reduced traffic to their documentation resulting in fewer conversions to their paid component library, I’m suspicious that the reduced demand there is because LLMs make building good-enough versions of those components for free easy enough that people do that instead.

#7:15 PM

open-source freedom-of-speech code copyright llm

Saturday, January 10, 2026

LiteLLM - Getting Started | liteLLM

docs.litellm.ai/docs/

Translate inputs to provider's endpoints (/chat/completions, /responses, /embeddings, /images, /audio, /batches, and more)

Consistent output - same response format regardless of which provider you use

Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router

Track spend & set budgets per project LiteLLM Proxy Server

#8:13 PM

code llm

Introducing advanced tool use on the Claude Developer Platform \ Anthropic

www.anthropic.com/engineering/advanced-tool-use

The Tool Search Tool lets Claude dynamically discover tools instead of loading all definitions upfront. You provide all your tool definitions to the API, but mark tools with defer_loading: true to make them discoverable on-demand. Deferred tools aren't loaded into Claude's context initially. Claude only sees the Tool Search Tool itself plus any tools with defer_loading: false (your most critical, frequently-used tools).
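
A small sketch of the split this describes. Only the `defer_loading` flag comes from the excerpt; the `ToolDef` shape, the helper names, the sample tools, and the naive keyword match are all assumptions for illustration and do not reflect how the real Tool Search Tool ranks or matches tools. Treating a missing `defer_loading` as "not deferred" is also an assumption.

```typescript
// Hypothetical tool definition; only defer_loading is from the excerpt.
interface ToolDef {
  name: string;
  description: string;
  defer_loading?: boolean;
}

// Tools placed in context up front: everything not explicitly deferred.
function initiallyVisible(tools: ToolDef[]): string[] {
  return tools.filter((t) => t.defer_loading !== true).map((t) => t.name);
}

// Naive stand-in for tool search: case-insensitive substring match
// over the name and description of deferred tools only.
function searchDeferred(tools: ToolDef[], query: string): string[] {
  const q = query.toLowerCase();
  return tools
    .filter((t) => t.defer_loading === true)
    .filter((t) => `${t.name} ${t.description}`.toLowerCase().includes(q))
    .map((t) => t.name);
}

// Made-up sample tools.
const sampleTools: ToolDef[] = [
  { name: "search_docs", description: "Search internal documentation", defer_loading: false },
  { name: "create_invoice", description: "Create a billing invoice", defer_loading: true },
  { name: "refund_payment", description: "Refund a billing payment", defer_loading: true },
];
```

Here only `search_docs` would occupy context at the start, and a search for "billing" would surface the two deferred tools on demand.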

With Programmatic Tool Calling:

Instead of each tool result returning to Claude, Claude writes a Python script that orchestrates the entire workflow. The script runs in the Code Execution tool (a sandboxed environment), pausing when it needs results from your tools. When you return tool results via the API, they're processed by the script rather than consumed by the model. The script continues executing, and Claude only sees the final output.

#3:43 AM

mcp code claude llm

Monday, January 5, 2026

Claude Code On-The-Go - granda

granda.org/en/2026/01/02/claude-code-on-the-go/

I run six Claude Code agents in parallel from my phone. No laptop, no desktop—just Termius on iOS and a cloud VM.

The loop is: kick off a task, pocket the phone, get notified when Claude needs input. Async development from anywhere.

#4:21 PM

code claude llm

Monday, December 22, 2025

geerlingguy/ai-benchmarks: Simple AI/LLM benchmarking tools.

github.com/geerlingguy/ai-benchmarks

#6:25 AM

benchmarks code gpu llm

Apple's macOS Tahoe 26.2 Enables RDMA Over Thunderbolt for AI Mac Clusters

www.webpronews.com/apples-macos-tahoe-26-2-enables-rdma-over-thunderbolt-for-ai-mac-clusters/

Apple’s release notes detail that RDMA integrates with the Thunderbolt framework to enable zero-copy data transfers, meaning data moves directly from one device’s memory to another’s without intermediate buffering. This eliminates bottlenecks associated with TCP/IP protocols, which Thunderbolt previously emulated. Insiders note that while Thunderbolt 5 offers peak speeds, real-world performance depends on factors like cable quality and device compatibility—only M4 and later chips fully support this enhanced mode.

Diving deeper into the technical specifics, Apple’s developer documentation explains that RDMA over Thunderbolt is exposed through new APIs in the macOS networking stack. Developers can initialize clusters using Swift or Objective-C calls that negotiate memory mappings directly over the Thunderbolt bus. This is a departure from traditional Ethernet-based RDMA, which relies on Infiniband or RoCE (RDMA over Converged Ethernet), adapting instead to Thunderbolt’s point-to-point topology.

For those building apps, the update introduces protocols for fault-tolerant clustering. If a device drops out—say, due to a disconnected cable—the system can redistribute workloads dynamically, minimizing disruptions. Testing scenarios outlined in the notes suggest latency as low as microseconds for small transfers, rivaling dedicated high-performance computing setups.

Security is paramount in such a powerful feature. Apple’s notes emphasize built-in encryption for RDMA transfers, preventing unauthorized memory access. A separate 9to5Mac report on the update’s patches reveals fixes for kernel vulnerabilities that could have been exploited in clustered environments, ensuring that the feature doesn’t become a vector for attacks.

Looking at adoption, early sentiment on X suggests enthusiasm among AI researchers. One thread discussed collaborative model training, where multiple users contribute compute power via clustered Macs, democratizing access to high-end AI tools. This could disrupt markets dominated by cloud providers, offering cost savings for startups avoiding subscription fees.

#6:12 AM

macos code apple llm

1.5 TB of VRAM on Mac Studio - RDMA over Thunderbolt 5 | Jeff Geerling

www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-studio-rdma-over-thunderbolt-5

RDMA lets the Macs all act like they have one giant pool of RAM, which speeds up things like massive AI models.

#6:06 AM

open-source code llm