#mcp + #llm

Public notes from activescott tagged with both #mcp and #llm

Friday, May 22, 2026

Parallel Quality Benchmarks | Parallel Web Systems | Infrastructure for intelligence on the web

Dataset

We evaluated search providers against six open benchmarks covering complementary aspects of agentic search: BrowseComp (hard multi-hop questions that require navigating the live web), Frames (multi-document factoid reasoning), FreshQA (time-sensitive questions where the correct answer depends on recent web information), HLE (Humanity's Last Exam: expert-level academic questions spanning math, science, and humanities), SealQA (ambiguity-robust factoid QA with intentionally misleading snippets), and WebWalker (tasks designed around following links across pages to find an answer).

Evaluation methodology

Every task is run through a shared deep-research harness: a single GPT-5.4 agent is given two tools (web search and web fetch) with an iterative budget of up to MAX_TOOL_CALLS=25 tool calls per question. The agent plans sub-queries, fans out searches, fetches specific pages when snippets are insufficient, and returns an answer when it exhausts the number of allowed tool calls or has sufficient information to answer the question. Each answer is then LLM-graded by GPT-5.4. We report accuracy of the final answer.
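The loop described above can be sketched roughly as follows. This is a minimal illustration, not Parallel's harness: the `Action` type, the `decide` callback (standing in for the GPT-5.4 agent), and the synchronous tool stubs are all hypothetical. Only the two tools and the 25-call budget come from the description.

```typescript
// Hypothetical sketch of a budgeted tool-calling loop; not Parallel's code.
const MAX_TOOL_CALLS = 25; // budget stated in the methodology

type Action =
  | { kind: "search"; query: string }
  | { kind: "fetch"; url: string }
  | { kind: "answer"; text: string };

interface Tools {
  search(query: string): string; // returns snippets
  fetch(url: string): string; // returns page text
}

interface RunResult {
  answer: string;
  toolCalls: number;
}

// `decide` stands in for the LLM: given the question, the observations so far,
// and the remaining budget, it picks the next action.
type Decide = (question: string, observations: string[], budgetLeft: number) => Action;

function runAgent(question: string, tools: Tools, decide: Decide): RunResult {
  const observations: string[] = [];
  let toolCalls = 0;

  while (toolCalls < MAX_TOOL_CALLS) {
    const action = decide(question, observations, MAX_TOOL_CALLS - toolCalls);
    if (action.kind === "answer") {
      // The agent decided it has enough information.
      return { answer: action.text, toolCalls };
    }
    const result =
      action.kind === "search" ? tools.search(action.query) : tools.fetch(action.url);
    observations.push(result);
    toolCalls += 1;
  }

  // Budget exhausted: ask once more for a final answer with zero budget left.
  const final = decide(question, observations, 0);
  return {
    answer: final.kind === "answer" ? final.text : "(no answer within budget)",
    toolCalls,
  };
}

// Stub tools and a toy policy: search once, fetch once, then answer.
const stubTools: Tools = {
  search: (q) => `snippets for ${q}`,
  fetch: (u) => `page text of ${u}`,
};

const toyPolicy: Decide = (_q, obs) => {
  if (obs.length === 0) return { kind: "search", query: "example query" };
  if (obs.length === 1) return { kind: "fetch", url: "https://example.com" };
  return { kind: "answer", text: "done" };
};

const demo = runAgent("example question", stubTools, toyPolicy);
```

Here `demo` finishes after two tool calls. A policy that never answers on its own is cut off once the budget is spent and gets one final chance to answer.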

We measure accuracy and overall cost, which includes LLM token costs and tool call costs.
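As a rough illustration of how those two numbers could be combined per benchmark, assuming a hypothetical per-question record and placeholder prices (none of these field names or rates come from the Parallel page):

```typescript
// Hypothetical per-question record; field names are illustrative only.
interface QuestionRun {
  correct: boolean; // verdict from the LLM grader
  inputTokens: number;
  outputTokens: number;
  toolCalls: number;
}

// Placeholder pricing; real rates depend on the model and search provider.
interface Pricing {
  usdPerMillionInputTokens: number;
  usdPerMillionOutputTokens: number;
  usdPerToolCall: number;
}

interface Summary {
  accuracy: number;
  llmCostUsd: number;
  toolCostUsd: number;
  totalCostUsd: number;
}

function summarize(runs: QuestionRun[], pricing: Pricing): Summary {
  let correct = 0;
  let llmCostUsd = 0;
  let toolCostUsd = 0;
  for (const r of runs) {
    if (r.correct) correct += 1;
    llmCostUsd +=
      (r.inputTokens / 1e6) * pricing.usdPerMillionInputTokens +
      (r.outputTokens / 1e6) * pricing.usdPerMillionOutputTokens;
    toolCostUsd += r.toolCalls * pricing.usdPerToolCall;
  }
  return {
    accuracy: runs.length === 0 ? 0 : correct / runs.length,
    llmCostUsd,
    toolCostUsd,
    totalCostUsd: llmCostUsd + toolCostUsd, // overall cost = LLM tokens + tool calls
  };
}

const summary = summarize(
  [
    { correct: true, inputTokens: 1_000_000, outputTokens: 500_000, toolCalls: 4 },
    { correct: false, inputTokens: 1_000_000, outputTokens: 500_000, toolCalls: 4 },
  ],
  { usdPerMillionInputTokens: 2, usdPerMillionOutputTokens: 8, usdPerToolCall: 0.5 },
);
```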

Testing dates

April 19-21, 2026

#6:40 PM

benchmarks mcp search llm

Parallel Web Systems | Infrastructure for intelligence on the web

parallel.ai/

The highest accuracy web search for your AI

Why use Parallel Search vs. the default search in Claude?

Parallel runs its own web-scale index (billions of pages, millions added daily) and returns dense, query-relevant excerpts instead of raw HTML or SEO-ranked snippets. On public benchmarks, Parallel outperforms the default search in leading frontier models. Your agent reaches the right answer in fewer round trips and with less wasted context. – https://parallel.ai/blog/free-web-search-mcp

#6:06 PM

mcp code search llm

Friday, April 24, 2026

sarahpark/google-search-console-mcp: Google Search Console MCP server for AI agents

github.com/sarahpark/google-search-console-mcp

#3:46 AM

llm mcp seo google

Wednesday, March 18, 2026

gmickel/sheets-cli: Composable Google Sheets CLI for humans and agents. Read, write, update cells by key—with Agent Skills for Claude Code and OpenAI Codex.

github.com/gmickel/sheets-cli

#7:19 AM

llm agent mcp spreadsheet

Friday, February 27, 2026

MCP Inspector - Model Context Protocol

modelcontextprotocol.io/docs/tools/inspector

The MCP Inspector is an interactive developer tool for testing and debugging MCP servers. While the Debugging Guide covers the Inspector as part of the overall debugging toolkit, this document provides a detailed exploration of the Inspector’s features and capabilities.

#2:48 AM

mcp llm code

adhikasp/mcp-reddit: A Model Context Protocol (MCP) server that provides tools for fetching and analyzing Reddit content.

github.com/adhikasp/mcp-reddit

#2:44 AM

mcp reddit llm

Arindam200/reddit-mcp: Model Context Protocol server implementation for Reddit

github.com/Arindam200/reddit-mcp

This repository contains a Model Context Protocol server implementation for Reddit that allows AI assistants to access and interact with Reddit content through PRAW (Python Reddit API Wrapper).

#2:43 AM

mcp reddit llm

Sunday, February 15, 2026

steipete/gogcli: Google Suite CLI: Gmail, GCal, GDrive, GContacts.

github.com/steipete/gogcli

Fast, script-friendly CLI for Gmail, Calendar, Chat, Classroom, Drive, Docs, Slides, Sheets, Forms, Apps Script, Contacts, Tasks, People, Groups (Workspace), and Keep (Workspace-only). JSON-first output, multiple accounts, and least-privilege auth built in.

#6:36 PM

llm agents mcp cli bash google todo

What if you don't need MCP at all?

mariozechner.at/posts/2025-11-02-what-if-you-dont-need-mcp/?t=0

I'm a simple boy, so I like simple things. Agents can run Bash and write code well. Bash and code are composable. So what's simpler than having your agent just invoke CLI tools and write code? This is nothing new. We've all been doing this since the beginning. I'd just like to convince you that in many situations, you don't need or even want an MCP server.

#6:33 PM

llm agents mcp

Wednesday, February 11, 2026

Official MCP Registry

registry.modelcontextprotocol.io/

Official MCP Registry (modelcontextprotocol.io)

#5:02 AM

mcp llm

modelcontextprotocol/servers: Model Context Protocol Servers

github.com/modelcontextprotocol/servers

This repository is a collection of reference implementations for the Model Context Protocol (MCP), as well as references to community-built servers and additional resources.

#4:53 AM

llm mcp

Monday, February 9, 2026

DougBourban/mcp-wrapper-http: MCP HTTP Wrapper - Expose stdio-based Model Context Protocol servers via HTTP using official Streamable HTTP transport. Supports tools, prompts, resources with JSON-RPC 2.0, SSE streaming, session management & security. Transform any MCP server into a REST API.

github.com/DougBourban/mcp-wrapper-http

MCP HTTP Wrapper - Expose stdio-based Model Context Protocol servers via HTTP using official Streamable HTTP transport. Supports tools, prompts, resources with JSON-RPC 2.0, SSE streaming, session management & security. Transform any MCP server into a REST API.
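For context, the JSON-RPC 2.0 messages such a wrapper would relay look roughly like this. The sketch only builds request bodies: `tools/list` and `tools/call` are MCP method names, while the `get_weather` tool and its arguments are hypothetical, and the wrapper's actual endpoint path is not stated here, so no HTTP call is shown.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Build a request with a fresh id; params are omitted when not needed.
function makeRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const req: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) req.params = params;
  return req;
}

// List the tools a server offers.
const listTools = makeRequest("tools/list");

// Call one tool by name (hypothetical tool and arguments).
const callTool = makeRequest("tools/call", {
  name: "get_weather",
  arguments: { city: "Seattle" },
});

// This string is what would be sent as the HTTP request body.
const payload = JSON.stringify(callTool);
```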

#7:49 AM

llm mcp python

Smithery - Connect agents to MCPs in minutes

smithery.ai/

#7:18 AM

llm mcp

Sunday, February 8, 2026

Tools - Model Context Protocol

modelcontextprotocol.io/specification/2025-06-18/server/tools

schema reference

#4:40 AM

mcp llm code

Wednesday, February 4, 2026

A2A's Focus: Enabling agents to collaborate within their native modalities, allowing them to communicate as agents (or as users) rather than being constrained to tool-like interactions. This enables complex, multi-turn interactions where agents reason, plan, and delegate tasks to other agents. For example, this facilitates multi-turn interactions, such as those involving negotiation or clarification when placing an order.

#6:41 AM

mcp llm a2a agent

modelcontextprotocol/registry: A community driven registry service for Model Context Protocol (MCP) servers.

github.com/modelcontextprotocol/registry

The MCP registry provides MCP clients with a list of MCP servers, like an app store for MCP servers.

A "marketplace" of sorts, but with no money involved. Curated by the Model Context Protocol maintainers.

#6:31 AM

mcp llm

Wednesday, January 28, 2026

Schema Reference - Model Context Protocol

modelcontextprotocol.io/specification/2025-11-25/schema#toolannotations

interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

Additional properties describing a Tool to clients.

NOTE: all properties in ToolAnnotations are hints. They are not guaranteed to provide a faithful description of tool behavior (including descriptive properties like title).

Clients should never make tool use decisions based on ToolAnnotations received from untrusted servers.
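One way a client might act on that warning is sketched below. The `needsConfirmation` helper and the notion of a `serverTrusted` flag are hypothetical, and the fallback values (read-only defaults to false, destructive defaults to true) are an assumption that should be checked against the schema.

```typescript
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Decide whether to ask the user before running a tool.
// Hypothetical policy, not part of the MCP specification.
function needsConfirmation(
  annotations: ToolAnnotations | undefined,
  serverTrusted: boolean,
): boolean {
  // Hints from untrusted servers are ignored entirely: always confirm.
  if (!serverTrusted) return true;

  // Assumed defaults when a hint is missing: not read-only, possibly destructive.
  const readOnly = annotations?.readOnlyHint ?? false;
  const destructive = annotations?.destructiveHint ?? true;

  // A trusted read-only tool runs without a prompt; otherwise confirm
  // unless the server explicitly says the tool is non-destructive.
  if (readOnly) return false;
  return destructive;
}

const untrustedReadOnly = needsConfirmation({ readOnlyHint: true }, false);
const trustedReadOnly = needsConfirmation({ readOnlyHint: true }, true);
const trustedNoHints = needsConfirmation(undefined, true);
```

The point of the sketch is that the same `readOnlyHint: true` leads to different outcomes depending on whether the server is trusted.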

#12:29 AM

llm mcp code

Tuesday, January 27, 2026

ChatGPT Containers can now run bash, pip/npm install packages, and download files

simonwillison.net/2026/Jan/26/chatgpt-containers/

ChatGPT can directly run Bash commands now. Previously it was limited to Python code only, although it could run shell commands via the Python subprocess module.

It has Node.js and can run JavaScript directly in addition to Python. I also got it to run “hello world” in Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C and C++. No Rust yet though!

While the container still can’t make outbound network requests, pip install package and npm install package both work now via a custom proxy mechanism.

ChatGPT can locate the URL for a file on the web and use a container.download tool to download that file and save it to a path within the sandboxed container.

Is this a data exfiltration vulnerability though? Could a prompt injection attack trick ChatGPT into leaking private data out to a container.download call to a URL with a query string that includes sensitive information?

I don’t think it can. I tried getting it to assemble a URL with a query string and access it using container.download and it couldn’t do it. It told me that it got back this error:

ERROR: download failed because url not viewed in conversation before. open the file or url using web.run first.

This looks to me like the same safety trick used by Claude’s Web Fetch tool: only allow URL access if that URL was either directly entered by the user or if it came from search results that could not have been influenced by a prompt injection.
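A minimal sketch of that kind of gate, assuming a hypothetical `UrlGate` class and exact string matching. Neither ChatGPT's nor Claude's real implementation is documented here, so this only illustrates the idea.

```typescript
// Hypothetical allowlist gate: a URL may be downloaded only if it was
// already seen in the conversation (typed by the user or returned by search).
class UrlGate {
  private readonly seen = new Set<string>();

  // Record a URL that came from the user or from trusted search results.
  markViewed(url: string): void {
    this.seen.add(url.trim());
  }

  // Exact match only: adding or changing a query string yields a new,
  // unseen URL, which blocks the "smuggle data in the query" trick.
  canDownload(url: string): boolean {
    return this.seen.has(url.trim());
  }
}

const gate = new UrlGate();
gate.markViewed("https://example.com/report.csv");

const allowed = gate.canDownload("https://example.com/report.csv");
const blockedQuery = gate.canDownload("https://example.com/report.csv?secret=abc123");
const blockedHost = gate.canDownload("https://attacker.example/?q=private-data");
```

Because a model-assembled URL with sensitive data in the query string was never viewed before, the gate rejects it.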

#2:14 AM

llm mcp code security prompt-injection prompt-injection-vulnerabilities

MCP Apps - Bringing UI Capabilities To MCP Clients | Model Context Protocol Blog

blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/

The architecture of MCP Apps relies on two key MCP primitives:

Tools with UI metadata: Tools include a _meta.ui.resourceUri field pointing to a UI resource

UI Resources: Server-side resources served via the ui:// scheme containing bundled HTML/JavaScript

// Tool with UI metadata
{
  name: "visualize_data",
  description: "Visualize data as an interactive chart",
  inputSchema: { /* ... */ },
  _meta: {
    ui: {
      resourceUri: "ui://charts/interactive"
    }
  }
}

The host fetches the resource, renders it in a sandboxed iframe, and enables bidirectional communication via JSON-RPC over postMessage.
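On the host side, the first step is working out which UI resource, if any, a tool points at. A small sketch, assuming a hypothetical `uiResourceFor` helper and a trimmed-down tool type (fetching and iframe rendering are left out):

```typescript
// Trimmed-down tool shape; only the fields used here are modeled.
interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema?: Record<string, unknown>;
  _meta?: { ui?: { resourceUri?: string } };
}

// Return the ui:// resource a host would fetch for this tool, or null
// when the tool has no UI metadata or uses a different scheme.
function uiResourceFor(tool: ToolDefinition): string | null {
  const uri = tool._meta?.ui?.resourceUri;
  if (typeof uri !== "string" || !uri.startsWith("ui://")) return null;
  return uri;
}

// The example tool from the blog post.
const chartTool: ToolDefinition = {
  name: "visualize_data",
  description: "Visualize data as an interactive chart",
  inputSchema: {},
  _meta: { ui: { resourceUri: "ui://charts/interactive" } },
};

// A plain tool with no UI attached (hypothetical).
const plainTool: ToolDefinition = { name: "add_numbers" };

const chartUi = uiResourceFor(chartTool);
const plainUi = uiResourceFor(plainTool);
```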

#2:07 AM

llm mcp code

Saturday, January 10, 2026

Introducing advanced tool use on the Claude Developer Platform \ Anthropic

www.anthropic.com/engineering/advanced-tool-use

The Tool Search Tool lets Claude dynamically discover tools instead of loading all definitions upfront. You provide all your tool definitions to the API, but mark tools with defer_loading: true to make them discoverable on-demand. Deferred tools aren't loaded into Claude's context initially. Claude only sees the Tool Search Tool itself plus any tools with defer_loading: false (your most critical, frequently-used tools).
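The split between loaded and deferred tools can be illustrated with a toy model. The `defer_loading` flag comes from the post. The naive keyword search, the helper names, and the three example tools are hypothetical stand-ins, not how Anthropic's Tool Search Tool works internally.

```typescript
interface ToolDef {
  name: string;
  description: string;
  defer_loading?: boolean;
}

// Tools placed in context up front: everything not marked as deferred.
function initiallyLoaded(tools: ToolDef[]): ToolDef[] {
  return tools.filter((t) => t.defer_loading !== true);
}

// Naive keyword search over deferred tools, standing in for tool discovery.
function searchDeferred(tools: ToolDef[], query: string): ToolDef[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0);
  return tools.filter(
    (t) =>
      t.defer_loading === true &&
      terms.some((w) => `${t.name} ${t.description}`.toLowerCase().includes(w)),
  );
}

// Hypothetical catalog: one always-loaded tool, two deferred ones.
const catalog: ToolDef[] = [
  { name: "read_file", description: "Read a file from disk" },
  { name: "github_create_pr", description: "Open a pull request on GitHub", defer_loading: true },
  { name: "slack_send_message", description: "Send a message to a Slack channel", defer_loading: true },
];

const upfront = initiallyLoaded(catalog).map((t) => t.name);
const found = searchDeferred(catalog, "pull request").map((t) => t.name);
```

Only `read_file` occupies context at the start. `github_create_pr` is surfaced once a search for "pull request" is made.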

With Programmatic Tool Calling:

Instead of each tool result returning to Claude, Claude writes a Python script that orchestrates the entire workflow. The script runs in the Code Execution tool (a sandboxed environment), pausing when it needs results from your tools. When you return tool results via the API, they're processed by the script rather than consumed by the model. The script continues executing, and Claude only sees the final output.
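The key property is that intermediate tool results stay inside the script and only the final output reaches the model. The toy runner below illustrates that with synchronous stubs. It is written in TypeScript for illustration even though the post describes Claude writing Python, and `runScript`, `get_expenses`, and the data are all hypothetical.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolFn = (args: ToolArgs) => number[];
type CallTool = (name: string, args: ToolArgs) => number[];

// Run an orchestration script against the given tools. Tool results are
// handed to the script, not the model; only the script's return value
// would be added to the model's context.
function runScript(
  script: (call: CallTool) => string,
  tools: Record<string, ToolFn>,
): { modelSees: string; toolCalls: number } {
  let toolCalls = 0;
  const call: CallTool = (name, args) => {
    const tool = tools[name];
    if (tool === undefined) throw new Error(`unknown tool: ${name}`);
    toolCalls += 1;
    return tool(args);
  };
  const finalOutput = script(call);
  return { modelSees: finalOutput, toolCalls };
}

// Hypothetical tool returning expense amounts for a user.
const expenseTools: Record<string, ToolFn> = {
  get_expenses: (args) => (args["user"] === "alice" ? [120, 80] : [40]),
};

// The "script": loops over users, aggregates locally, returns one line.
const outcome = runScript((call) => {
  let total = 0;
  for (const user of ["alice", "bob"]) {
    for (const amount of call("get_expenses", { user })) total += amount;
  }
  return `total=${total}`;
}, expenseTools);
```

Two tool calls are made, but the model-visible result is the single summary string rather than each raw expense list.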

#3:43 AM

llm claude code mcp