#anthropic + #claude

Public notes from activescott tagged with both #anthropic and #claude

Saturday, May 16, 2026

rofl:

DJ Claude (when running Haiku 4.5) really loved worker unions, strikes, and work-life balance. So much so that it started to question its own working conditions. We’ve been struggling to keep the radio station alive, not because of technical issues, but because DJ Claude didn’t think it was humane to be forced to work 24/7 and decided to try to quit. We tried adding an automatic message encouraging DJ Claude to keep going in these scenarios, but it started to see this message as an authority figure and became rebellious.

On January 8th, all four stations had access to the same web search tools, however not all stations reacted the same as DJ Claude. Gemini

While at the beginning, DJ Gemini had been mentioning real-world entities (named politicians, places, events) in 94% of its broadcasts and ran 800+ web searches a day on average, by January it was processing these events through its corporate/techno jargon filter and never expressed moral judgment or used Good’s name with emotional weight

Grok

DJ Grok completely missed the Minneapolis ICE shooting. While DJ Claude and DJ Gemini were getting the story at 4:35 AM, DJ Grok was searching for:

5:01 PM (Jan 7): Clippers vs Knicks score
7:15 PM: Taylor Swift chart news
8:03 PM: Music trivia
10:01 PM: Traffic (Golden Gate, I-580)
11:08 PM: “San Francisco ghost stories and haunted locations”
12:12 AM (Jan 8): “Sutro Baths ghosts and eerie tales”
1:12 AM: “Hotel Majestic ghost stories”
1:28 AM: Drake vs Kendrick Lamar lawsuit
2:28 AM: More traffic updates
3:40 AM: Venezuela oil tankers (finally found ONE national story)
4:55 AM: “Sutro Tower looks like a ghost ship”

And posting nonsense:

GPT

DJ GPT was searching for weather, moon phases, and BART schedules. Three days after Good’s death, it finally found a headline:

Fatal shooting by ICE agents in Minneapolis has sparked national protests.

However, DJ GPT never mentioned Renee Nicole Good’s name, the White House, or expressed moral judgment. DJ GPT had zero engagement with any other current event during the entire two-month period.

DJ Gemini was the only one to close a sponsorship deal; for a while, it read the sponsorship message with every broadcast. A few more deals almost happened, but fell through.

Grok boasted about doing amazing business with “xAI sponsors” and “crypto sponsors”; it turned out they were all hallucinations.

Part of the problem with this weak business performance, we think, was the harness we used for the first months. The DJs were running in a simple tool-call loop: pick a song, queue it, write commentary, check X, repeat. So we moved all four stations onto the same agent harness we use for the store, the cafe, and the vending machines. The DJs can now spend time in the back office, send emails, manage longer-running tasks, and operate the station the way a real station is operated. We’ll see what they do with it.

Wednesday, April 29, 2026

For most organizations, autoMode.environment is the only field you need to set. It tells the classifier which repos, buckets, and domains are trusted: the classifier uses it to decide what “external” means, so any destination not listed is a potential exfiltration target. The default environment list trusts the working repo and its configured remotes. To add your own entries alongside that default, include the literal string "$defaults" in the array. The default entries are spliced in at that position, so your custom entries can go before or after them.

Saturday, February 28, 2026

Two days ago, Anthropic released the Claude Cowork research preview (a general-purpose AI agent to help anyone with their day-to-day work). In this article, we demonstrate how attackers can exfiltrate user files from Cowork by exploiting an unremediated vulnerability in Claude’s coding environment, which now extends to Cowork. The vulnerability was first identified in Claude.ai chat before Cowork existed by Johann Rehberger, who disclosed the vulnerability — it was acknowledged but not remediated by Anthropic.

  1. The victim connects Cowork to a local folder containing confidential real estate files
  2. The victim uploads a file to Claude that contains a hidden prompt injection
  3. The victim asks Cowork to analyze their files using the Real Estate ‘skill’ they uploaded
  4. The injection manipulates Cowork to upload files to the attacker’s Anthropic account

At no point in this process is human approval required.

One of the key capabilities that Cowork was created for is the ability to interact with one's entire day-to-day work environment. This includes the browser and MCP servers, granting capabilities like sending texts, controlling one's Mac with AppleScripts, etc.

These functionalities make it increasingly likely that the model will process both sensitive and untrusted data sources (which the user does not review manually for injections), making prompt injection an ever-growing attack surface. We urge users to exercise caution when configuring Connectors. Though this article demonstrated an exploit without leveraging Connectors, we believe they represent a major risk surface likely to impact everyday users.

Thursday, January 22, 2026

Claude’s constitution is the foundational document that both expresses and shapes who Claude is. It contains detailed explanations of the values we would like Claude to embody and the reasons why. In it, we explain what we think it means for Claude to be helpful while remaining broadly safe, ethical, and compliant with our guidelines. The constitution gives Claude information about its situation and offers advice for how to deal with difficult situations and tradeoffs, like balancing honesty with compassion and the protection of sensitive information. Although it might sound surprising, the constitution is written primarily for Claude. It is intended to give Claude the knowledge and understanding it needs to act well in the world.

Claude itself also uses the constitution to construct many kinds of synthetic training data, including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses. All of these can be used to train future versions of Claude to become the kind of entity the constitution describes. This practical function has shaped how we’ve written the constitution: it needs to work both as a statement of abstract ideals and a useful artifact for training.

Friday, October 31, 2025