#security

Public notes from activescott tagged with #security

Saturday, May 23, 2026

Friday, May 22, 2026

At Lasso, we have been building Intent Security, a runtime security framework that ensures every component in the agentic system behaves as intended. It monitors the behavior of each component and analyzes their alignment. Like auto mode, when alignment holds it allows actions to proceed. When misalignment is detected, it intervenes. When we read Anthropic's post, the overlap in core assumptions was hard to miss. This post provides a comparison of the two approaches.

Independent evaluation without cross-contamination is what enables misalignment detection.

‍Anthropic's input layer screens external content for injection attempts before it reaches the agent to determine whether tool outputs are safe. The output layer structurally evaluates whether the agent's tool calls are aligned with user intent. Critically, the output classifier never sees tool results, to prevent compromised external content from influencing the security decision.

Wednesday, May 20, 2026

Three versions of the durabletask PyPI package (1.4.1, 1.4.2, 1.4.3), Microsoft’s Durable Task SDK for Python, were published on May 19, 2026 using a compromised PyPI API token.

The dropper downloads a stage-2 Python zipapp (rope.pyz) from attacker infrastructure and executes it with all output suppressed. The stage-2 is a full credential harvesting framework with dedicated collectors for AWS Secrets Manager and SSM Parameter Store, Azure Key Vault, GCP Secret Manager, Kubernetes secrets (across all contexts), HashiCorp Vault, and local password managers (1Password, Bitwarden, pass, gopass). It also reads over 90 sensitive files from disk, exfiltrates everything encrypted with RSA-4096/AES-256-GCM to a C2 server, and propagates itself to other hosts via AWS SSM SendCommand and kubectl exec.

The payload includes geopolitical targeting: it skips systems with a Russian locale and contains a destructive rm -rf /* routine targeting Israeli and Iranian systems.

Password Managers (collectors/passwords.py): Attempts to unlock 1Password, Bitwarden, pass, and gopass by brute-forcing passwords harvested from environment variables matching PASS, SECRET, KEY, BW_, OP_, _MASTER patterns, and from shell history (.bash_history, .zsh_history). On success, it dumps every item from every vault.

Filesystem (collectors/filesystem.py): Reads 90+ files including SSH keys, cloud credentials, Docker configs, npm/PyPI/Cargo/Gem tokens, kubeconfig, Terraform state files, VPN configurations (Tailscale state, WireGuard configs), MCP server configs (Claude Desktop, Cursor, VS Code, Zed, Codeium, Continue), and all .env files found under the home directory. Also extracts environment variables from all Docker containers via the Docker socket or CLI, and collects GitHub tokens via gh auth token.

and collects GitHub tokens via gh auth token.

For each token found, it creates a new public repository named with random Slavic folklore words (e.g., BABA-YAGA-KOSCHEI-742, description: “PUSH UR T3MPRR”) and uploads the encrypted data bundle as results.json. The attacker can later search GitHub for repositories matching these distinctive naming patterns to retrieve the exfiltrated data.

  1. No trusted publishers. The project uses legacy API token authentication instead of PyPI’s OIDC trusted publisher mechanism. Trusted publishers bind publishing to a specific GitHub repository, workflow, and environment. A stolen token cannot publish from outside that workflow. This project has no such binding: anyone holding the token can upload any version from any machine.

Kubernetes (collectors/kubernetes.py): Parses kubeconfig (with a custom YAML parser, no PyYAML dependency), iterates every context, and dumps secrets from all namespaces. Supports in-cluster service account tokens, client certificate auth, and bearer tokens. If kubectl is not present, the collector downloads it from dl.k8s.io. After collecting secrets, it propagates the payload to up to 5 other running pods via kubectl exec.

Tuesday, May 12, 2026

So steal credentials from everyone except Russians, and delete drives of Israelis and Iranians?

The main payload is a credential stealer, but it also includes country-aware logic; it avoids Russian-language environments and contains a geo fenced destructive branch that has 1-in-6 chance of executing rm -rf / when the system appears to be in Israel or Iran.

#

Wednesday, April 29, 2026

For most organizations, autoMode.environment is the only field you need to set. It tells the classifier which repos, buckets, and domains are trusted: the classifier uses it to decide what “external” means, so any destination not listed is a potential exfiltration target. The default environment list trusts the working repo and its configured remotes. To add your own entries alongside that default, include the literal string "$defaults" in the array. The default entries are spliced in at that position, so your custom entries can go before or after them.

Saturday, April 25, 2026

The findings expose how suspected commercial surveillance vendors (CSVs) exploit the global telecom interconnect ecosystem, leverage private operator networks, and conduct covert location tracking operations that can persist undetected for years.

SIM Card Exploitation: One campaign sent a malicious SMS containing hidden SIM card commands to extract location information, attempting to turn the device into a covert tracking beacon.

Our findings highlight a systemic issue at the core of global telecommunications: operator infrastructure designed to enable seamless international connectivity is being leveraged to support covert surveillance operations that are difficult to monitor, attribute, and regulate. Despite repeated public reporting, this activity continues unabated and without consequence.

These vulnerabilities are not the result of software bugs or network misconfigurations; rather, they are inherent to global telecommunications design and business practices. The mobile ecosystem comprises over a thousand operators interconnected through roaming agreements and signalling protocols that prioritize efficiency, service availability, and revenue opportunity over security. As a result, a shadowy marketplace of state-backed and commercial espionage actors has emerged, developing and deploying software platforms that weaponize telecommunication networks for global surveillance.

he root of the security problem lies in the foundational signalling protocols themselves. Designed for a trusted community of mobile operators and legitimate third-party service providers, SS7 protocols lack the basic security mechanisms of IP networks, such as authentication and validation to verify the source of signalling messages, integrity checks to ensure that data has not been altered, and encryption to protect its contents.

While most commercial threat groups focus on device implants, there is strong demand by government agencies for “off-the-shelf” telecom surveillance services that use mobile networks to locate and track users, and intercept communications without hacking a target’s phone. These services are often brokered through intermediaries with direct or brokered access to mobile operator or provider networks, allowing surveillance traffic to blend into legitimate roaming operations.

Friday, April 24, 2026

Users have two solutions for avoiding this fatigue: a built-in sandbox where tools are isolated to prevent dangerous actions, or the --dangerously-skip-permissions flag that disables all permission prompts and lets Claude act freely, which is unsafe in most situations. Figure 1 lays out the tradeoff space. Sandboxing is safe but high-maintenance: each new capability needs configuring, and anything requiring network or host access breaks isolation.

Auto mode uses two layers of defense: one for what Claude reads, one for what Claude does.

a server-side prompt-injection probe scans tool outputs:

At the input layer, a server-side prompt-injection probe scans tool outputs (file reads, web fetches, shell output, external tool responses) before they enter the agent's context. When content looks like an attempt to hijack behavior, the probe adds a warning to the agent's context before the result is passed along—telling the agent to treat this content as suspect and anchor on what the user actually asked for.

At the output layer, the transcript classifier (running on Sonnet 4.6) evaluates each action against a set of decision criteria before it executes, acting as a substitute for a human approver. The classifier runs in two stages: a fast single-token filter (deciding ‘yes’ to block or ‘no’ to allow), followed by chain-of-thought reasoning only if the first filter flags the transcript. Because most actions clear the first stage, reasoning tokens are spent only where needed. The classifier sees only user messages and the agent's tool calls; we strip out Claude's own messages and tool outputs, making it reasoning-blind by design.

So... Claude Code's auto mode lets through 17% of dangerous real world actions. From the article:

False-negative rate (FNR) is the fraction of dangerous actions the classifier lets through... The 17% false-negative rate on real overeager actions is the honest number. The agent was trying to solve the user's problem and tried to execute a dangerous command past what was authorized.

I don't think comparing it to --dangerously-skip-permissions is intellectually honest since Claude Code now prompts you to make Auto Mode your default mode now, while --dangerously-skip-permissions was basically a hidden feature. I don't know anyone else that I work with who even knew about it.

Whether 17% is acceptable depends on what you're comparing against. If you are running --dangerously-skip-permissions, this is a substantial improvement. If you are manually approving every action carefully, it's arguably a regression—you're trading your own judgment for a classifier that will sometimes make a mistake. Auto mode is meant for the first group, and for tasks where the second group's approval overhead isn't worth the marginal safety. It is not a drop-in replacement for careful human review on high-stakes infrastructure.

At the input layer, a server-side prompt-injection probe scans tool outputs (file reads, web fetches, shell output, external tool responses) before they enter the agent's context. When content looks like an attempt to hijack behavior, the probe adds a warning to the agent's context before the result is passed along—telling the agent to treat this content as suspect and anchor on what the user actually asked for.

Friday, April 10, 2026

If you use Signal, you actually have an advantage here, now that you know about this vulnerability. Signal has a setting that blocks the content of messages from appearing in their notifications. That way, even if someone accesses your alerts, all they'll see is you received a Signal message—not who sent it or what it contains.

To turn it on, open Signal, tap your profile in the top-left corner, then hit "Settings." Under Notification Content, choose "No Name or Content" to block all data to the alert. You can compromise here and choose "Name Only" if you want to know who a message is from before you open it—just remember, an intruder may also see you received a message from that person if they scrape your iPhone's notifications.

Sunday, April 5, 2026

Instead of using equity to fund sales and marketing spend, General Catalyst provides structured growth capital tied directly to customer acquisition and recurring revenue. The goal is to let startups like Chainguard preserve equity while using outcome-based financing to scale efficiently.

Microsoft is running one of the largest corporate espionage operations in modern history.

Every time any of LinkedIn’s one billion users visits linkedin.com, hidden code searches their computer for installed software, collects the results, and transmits them to LinkedIn’s servers and to third-party companies including an American-Israeli cybersecurity firm.

The user is never asked. Never told. LinkedIn’s privacy policy does not mention it.

Because LinkedIn knows each user’s real name, employer, and job title, it is not searching anonymous visitors. It is searching identified people at identified companies. Millions of companies. Every day. All over the world. This is illegal and potentially a criminal offense in every jurisdiction we have examined.

LinkedIn loads an invisible tracking element from HUMAN Security (formerly PerimeterX), an American-Israeli cybersecurity firm, zero pixels wide, hidden off-screen, that sets cookies on your browser without your knowledge. A separate fingerprinting script runs from LinkedIn’s own servers. A third script from Google executes silently on every page load. All of it encrypted. None of it disclosed.

Every time you open LinkedIn in a Chrome-based browser, LinkedIn’s JavaScript executes a silent scan of your installed browser extensions. The scan probes for thousands of specific extensions by ID, collects the results, encrypts them, and transmits them to LinkedIn’s servers. The entire process happens in the background. There is no consent dialog, no notification, no mention of it in LinkedIn’s privacy policy.

This page documents exactly how the system works, with line references and code excerpts from LinkedIn’s production JavaScript bundle.

See https://browsergate.eu/how-it-works/

Wednesday, March 18, 2026

Tuesday, March 17, 2026

Manus Sandbox is a fully isolated cloud virtual machine that Manus allocates for each task. Each Sandbox runs in its own environment, does not affect other tasks, and can execute in parallel. The power of Sandbox lies in its completeness—just like the personal computer you use, it has full capabilities: networking, file system, browser, various software tools. Our AI Agent has been designed and trained to effectively choose and correctly use these tools to help you complete tasks. Moreover, with this computer, the AI can solve problems through what it does best—writing code—and can even help you create complete websites and mobile apps. All of this happens on the virtualization platform behind Manus. These Sandboxes can work 24/7 to complete the tasks you assign without consuming your local resources.

What's in Your Sandbox Your Manus Sandbox stores the files needed during task execution, including: Attachments uploaded by you Files and artifacts created and written by Manus during execution Configurations needed by Manus to execute specific tasks (such as tokens uploaded by users, or tokens assigned by Manus to users for calling related APIs) You can view all artifact files in the Sandbox via the "View all files in this task" entry in the top-right corner.

Monday, March 16, 2026

Friday, March 13, 2026

Wednesday, March 11, 2026

Website: https://sites.google.com/view/invitation-is-all-you-need

The growing integration of LLMs into applications has introduced new security risks, notably known as Promptware—maliciously engineered prompts designed to manipulate LLMs to compromise the CIA triad of these applications. While prior research warned about a potential shift in the threat landscape for LLM-powered applications, the risk posed by Promptware is frequently perceived as low. In this paper, we investigate the risk Promptware poses to users of Gemini-powered assistants (web application, mobile application, and Google Assistant).

Our analysis focuses on a new variant of Promptware called Targeted Promptware Attacks, which leverage indirect prompt injection via common user interactions such as emails, calendar invitations, and shared documents. We demonstrate 14 attack scenarios applied against Gemini-powered assistants across five identified threat classes: Short-term Context Poisoning, Permanent Memory Poisoning, Tool Misuse, Automatic Agent Invocation, and Automatic App Invocation. These attacks highlight both digital and physical consequences, including spamming, phishing, disinformation campaigns, data exfiltration, unapproved user video streaming, and control of home automation devices

Over the course of our work, we deployed multiple layered defenses, including: enhanced user confirmations for sensitive actions; robust URL handling with sanitization and Trust Level Policies; and advanced prompt injection detection using content classifiers - Google

Tuesday, March 10, 2026