#anthropic + #prompt-engineering

Public notes from activescott tagged with both #anthropic and #prompt-engineering

Friday, May 22, 2026

At Lasso, we have been building Intent Security, a runtime security framework that ensures every component in the agentic system behaves as intended. It monitors the behavior of each component and analyzes their alignment. Like auto mode, when alignment holds it allows actions to proceed. When misalignment is detected, it intervenes. When we read Anthropic's post, the overlap in core assumptions was hard to miss. This post provides a comparison of the two approaches.

Independent evaluation without cross-contamination is what enables misalignment detection.

‍Anthropic's input layer screens external content for injection attempts before it reaches the agent to determine whether tool outputs are safe. The output layer structurally evaluates whether the agent's tool calls are aligned with user intent. Critically, the output classifier never sees tool results, to prevent compromised external content from influencing the security decision.

Anthropic publishes the history of system prompts used on claude.ai and the mobile apps at https://platform.claude.com/docs/en/release-notes/system-prompts. That page is a single monolithic markdown document grouped by model, and each model lists one or more dated revisions.

Friday, October 31, 2025