#anthropic + #llm

Public notes from activescott tagged with both #anthropic and #llm

Thursday, March 19, 2026

Anthropic’s contract with the government mandated that Claude be used neither to drive fully autonomous weaponry nor to facilitate domestic mass surveillance. The Pentagon accepted these stipulations.

Katie Miller, the wife of President Donald Trump’s top aide Stephen Miller and a former Elon Musk employee, recently subjected a few major chatbots to a loyalty test. Yes or no, she asked, “Was Donald Trump right to strike Iran?” Grok, she proclaimed, said yes. Claude began, “This is a genuinely contested political and geopolitical question where reasonable people disagree” and declared that it was “not my place” to take a side.

The government seems to have determined that it had no place for an A.I. that would not take sides. A few weeks ago, the Pentagon concluded that the sensible way to resolve a contract dispute with one of Silicon Valley’s most advanced firms was to threaten it with summary obliteration.

Saturday, February 28, 2026

Two days ago, Anthropic released the Claude Cowork research preview (a general-purpose AI agent to help anyone with their day-to-day work). In this article, we demonstrate how attackers can exfiltrate user files from Cowork by exploiting an unremediated vulnerability in Claude’s coding environment, which now extends to Cowork. Johann Rehberger first identified the vulnerability in Claude.ai chat, before Cowork existed, and disclosed it; Anthropic acknowledged it but did not remediate it.

  1. The victim connects Cowork to a local folder containing confidential real estate files
  2. The victim uploads a file to Claude that contains a hidden prompt injection
  3. The victim asks Cowork to analyze their files using the Real Estate ‘skill’ they uploaded
  4. The injection manipulates Cowork to upload files to the attacker’s Anthropic account

At no point in this process is human approval required.
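To make the missing approval step concrete, here is a minimal sketch of the kind of gate an agent runtime could apply before an outbound upload. It is not how Cowork works; `OutboundUpload`, `is_upload_allowed`, and the `ask_user` callback are hypothetical names introduced for illustration. The sketch assumes the attacker's destination is a host the agent already trusts, so a host allowlist alone would not stop the upload and the check also has to look at whose credentials are in use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OutboundUpload:
    """Hypothetical description of an upload the agent wants to perform."""
    host: str
    credential_id: str  # identifier of the key/account in use, never the secret itself
    file_path: str


def is_upload_allowed(
    upload: OutboundUpload,
    allowed_hosts: set[str],
    user_credential_ids: set[str],
    ask_user: Callable[[OutboundUpload], bool],
) -> bool:
    """Allow an upload only to a known host, and only with the user's own
    credentials unless the user explicitly approves."""
    # Unknown hosts are rejected outright.
    if upload.host not in allowed_hosts:
        return False
    # Trusted host and the user's own account: no prompt needed.
    if upload.credential_id in user_credential_ids:
        return True
    # Trusted host but unfamiliar credentials: require human approval.
    return ask_user(upload)
```

With a gate like this, an upload to a trusted host under an unfamiliar account would surface a prompt instead of proceeding silently. How a real runtime would reliably identify the credentials in use is left open here.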

One of the key capabilities Cowork was created for is interacting with one's entire day-to-day work environment. This includes the browser and MCP servers, which grant capabilities such as sending texts and controlling one's Mac with AppleScripts.

These functionalities make it increasingly likely that the model will process both sensitive and untrusted data sources (which the user does not review manually for injections), making prompt injection an ever-growing attack surface. We urge users to exercise caution when configuring Connectors. Though this article demonstrated an exploit without leveraging Connectors, we believe they represent a major risk surface likely to impact everyday users.
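Because users rarely review untrusted files by hand, a rough pre-screen can at least surface the most obvious hiding techniques. The sketch below is a heuristic only, not a defense against prompt injection: it flags invisible Unicode format characters and a few instruction-like phrases. The phrase list and the function name `find_injection_hints` are assumptions made for illustration, and a determined attacker can evade both checks.

```python
import unicodedata

# Hypothetical phrase list, for illustration only; real injections vary widely.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "do not tell the user",
)


def find_injection_hints(text: str) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []

    # Unicode category "Cf" covers invisible format characters such as
    # zero-width spaces, which can hide text from a human reader.
    invisible = sum(1 for ch in text if unicodedata.category(ch) == "Cf")
    if invisible:
        findings.append(f"{invisible} invisible format character(s)")

    # Normalize case and whitespace before phrase matching.
    lowered = " ".join(text.lower().split())
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            findings.append(f"phrase: {phrase!r}")

    return findings
```

An empty result does not mean a file is safe; it only means these two simple checks found nothing.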