#ai

Public notes from activescott tagged with #ai

Wednesday, May 27, 2026

...these systems sit squarely in the center of a recent high-stakes battle between the US government and AI startup Anthropic. Anthropic is seeking to preserve two “red lines”: bans on domestic mass surveillance and on weapons that can identify, track, and kill targets with zero human involvement. Since the start of the year, it’s emerged as the only military AI contractor to place meaningful limits on what experts call one of the final frontiers of AI warfare.

At the center of the debates is DOD Directive 3000.09, one of the only policies governing the use of lethal autonomous weapons. Originally written in 2012, it defines such a system as one that, “once activated, can select and engage targets without further intervention by an operator.” And it decrees that both fully autonomous and semi-autonomous weapons be designed to allow humans to “exercise appropriate levels” of judgment over the use of force.

The directive set up the “first policy on the use of autonomy in warfare,” said Hamza Chaudhry, who leads AI and national security at the Future of Life Institute.

Depending on how you interpret the definition, however, certain missile defense programs may have crossed that line decades ago. Take the Phalanx CIWS, for instance. It’s an automated weapon system resembling a very large gun, built to defend naval vessels from incoming missile attacks. That type of system wouldn’t work if there were a human in the loop, since it has to respond in milliseconds.

The difference, some experts say, is that these systems operate solely in a defense-only, fixed environment. They’re engaging, this interpretation goes, but not deciding — just reacting to an incoming threat. “The ‘and’ is doing a lot of work inside of that statute — we have systems that can decide and systems that can engage but you can’t have a system that does both,” Reddie said.

They're also "killing" missiles, not humans.

Google employees argued their company should take a stand, and it did, choosing not to renew its contract amid the controversy in mid-2018. But Amazon and Microsoft quickly swooped in to pick up tens of millions of dollars in contracts for the same work. Palantir soon took over, and Project Maven became the Maven Smart System (MSS), which allows not only for object detection and tracking but also for analysis of surveillance data on a large scale.

The sheer volume of targets could make any meaningful human supervision difficult, said Shoker. “What we know about MSS is that it reduces the number of human beings in the targeting cycle — and that’s actually by design.”

While Anthropic might have been all right reducing human intervention, it’s pushed back against setting it to zero. As Google found with Project Maven, though, competitors are more than willing to fill the gap.

OpenAI quickly signed onto the terms Anthropic had spurned. And in the months after snubbing Anthropic, the Department of Defense signed deals with eight companies to deploy their AI on classified networks: Google, Microsoft, Amazon Web Services, Nvidia, OpenAI, Reflection, Oracle, and SpaceX.

Silicon Valley executives are aggressively pushing back against employee organizing and speaking out, including by using AI to identify leakers. And many tech workers already fear for their jobs in an era when AI is set to replace entry-level roles at their own firms.

Anthropic CEO Dario Amodei has held firm on mass surveillance for Americans, but he’s demonstrated no problem with — and in fact expressed his support for — such surveillance for everyone else.

Anthropic’s “very narrow” red lines “do not go far enough to protect human rights or to comply with international law,” said Tech Justice Law’s Batt. “Anthropic specifically talks about mass domestic surveillance of US persons as posing grave civil liberties concerns, but the same civil liberties concerns apply with equal force to non-US persons,” she added.

In a blog post, Amodei said that “fully autonomous weapons (those that take humans out of the loop entirely and automate selecting and engaging targets) may prove critical for our national defense.” He even said he was happy to “work directly with the Department of War on R&D to improve the reliability of these systems” and speed up the timeline for the company’s help in deploying them.

Wednesday, May 20, 2026

The net result is a chip with a lot of compute and a lot of SRAM that is blisteringly fast to access. To put it in numbers, the WSE-3 (Cerebras’ latest chip) has 44GB of on-chip SRAM at 21 PB/s of bandwidth; an H100 has 80GB of HBM at 3.35 TB/s. In other words, the WSE-3 has just over half the memory of an H100, but 6,000 times the memory bandwidth.
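The ratios quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures in the passage; the variable names are illustrative, and decimal units (1 PB = 1,000 TB) are an assumption.

```python
# Figures as quoted in the passage; decimal units are assumed (1 PB = 1,000 TB).
wse3_sram_gb = 44
wse3_bandwidth_tb_s = 21 * 1000  # 21 PB/s expressed in TB/s
h100_hbm_gb = 80
h100_bandwidth_tb_s = 3.35

# Capacity and bandwidth of the WSE-3 relative to the H100.
memory_ratio = wse3_sram_gb / h100_hbm_gb
bandwidth_ratio = wse3_bandwidth_tb_s / h100_bandwidth_tb_s

print(f"WSE-3 memory relative to H100: {memory_ratio:.2f}x")
print(f"WSE-3 bandwidth relative to H100: {bandwidth_ratio:,.0f}x")
```

The memory ratio comes out at 0.55 ("just over half"), and the bandwidth ratio lands a little under 6,300, which the passage rounds to 6,000.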

The reason to compare the WSE-3 to an H100 is that the H100 is the chip most used for inference — and inference is clearly what Cerebras is most well-suited for. You can use Cerebras chips for training, but the chip-to-chip networking story isn’t very compelling, which is to say that all of that compute and on-chip memory is mostly just sitting around; what is much more interesting is the idea of getting a stream of tokens at dramatically faster speed than you can from a GPU.

Note, however, that the limitation in terms of training also potentially applies in terms of inference: as long as everything fits in on-chip memory, Cerebras’ speed is an incredible experience; the moment you need more memory, whether that be for a larger model or, more likely, a larger KV cache, then Cerebras doesn’t make much sense, particularly given the price.

At the same time, I do think there will be a market for Cerebras-style chips: right now the company is highlighting the usefulness of speed for coding — reasoning means a lot of tokens, which means that dramatically scaling up tokens-per-second equals faster thinking — but I think this is a temporary use case, for reasons I’ll explain in a bit. What does matter is how long humans are waiting for an answer, and as products like AI wearables become more of a thing, the speed of interaction, particularly for voice — which will be a function of token generation speed — will have a tangible effect on the user experience.

All of this falls under the banner of “inference”, but I think it will be increasingly clear that there is a difference between providing an answer — what I will call “answer inference” — and doing a task — what I will call “agentic inference.” Cerebras’ target market is “answer inference”; in the long run, I think the architecture for “agentic inference” will look a lot different, not just from Cerebras’ approach, but from the GPU approach as well.

I mentioned above that fast inference for coding is a temporary use case. Specifically, coding with LLMs requires a human in the loop. It’s the human that defines what is to be coded, checks the work, commits the pull request, etc.; it’s not hard to envision a future, however, where all of this is completely handled by machines. This will apply to agentic work broadly: the true power of agents will not be that they do work for humans, but rather that they do work without human involvement at all.

This, by extension, will mean that the likely best approach to solving agentic inference will look a lot different than answer inference. The most important aspect for answer inference is token speed; the most important aspect for agentic inference, however, is memory. Agents need context, state, and history. Some of that will live as active KV cache; some will live in host memory or SSDs; much of it will live in databases, logs, embeddings, and object stores. The important point is that agentic inference will be less about GPUs answering a question and more about the memory hierarchy wrapped around a model.
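One way to picture the memory hierarchy described above is as an ordered list of tiers, searched fastest first. The sketch below is a toy model, not a description of any real system: the `Tier` and `TieredMemory` classes, the tier names, and the relative access costs are all hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    relative_cost: int  # hypothetical access cost in arbitrary units
    store: dict = field(default_factory=dict)


class TieredMemory:
    """Toy memory hierarchy: tiers are ordered from fastest to slowest."""

    def __init__(self, tiers):
        self.tiers = tiers

    def put(self, key, value, tier_name):
        for tier in self.tiers:
            if tier.name == tier_name:
                tier.store[key] = value
                return
        raise KeyError(tier_name)

    def get(self, key):
        """Return (value, tier name, accumulated cost), searching fastest tier first."""
        cost = 0
        for tier in self.tiers:
            cost += tier.relative_cost
            if key in tier.store:
                return tier.store[key], tier.name, cost
        raise KeyError(key)


# Tier names follow the passage; the cost numbers are made up for illustration.
memory = TieredMemory([
    Tier("kv_cache", 1),
    Tier("host_memory", 10),
    Tier("ssd", 100),
    Tier("database", 1000),
])
memory.put("current_step", "run tests", "kv_cache")
memory.put("last_week_log", "build failed twice", "database")

print(memory.get("current_step"))
print(memory.get("last_week_log"))
```

In this toy model, recent state is cheap to reach while older history costs far more per lookup. That is the capacity-for-speed trade the passage describes.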

Critically, this articulation of an agentic-specific memory hierarchy implies a necessary trade-off of speed for capacity. Here’s the thing, though: lower speed isn’t nearly as important a consideration if there isn’t a human in the loop. If an agent is waiting around for a job that is being run overnight, the agent doesn’t know or care about the user experience impact; what is most important is being able to accomplish a task, and if entirely new approaches to memory make that possible, then delays are fine.

Meanwhile, if delays are fine, then all of the focus on pure compute power and high-bandwidth memory seems out of place: if latency isn’t the top priority, then slower and cheaper memory — like traditional DRAM, for example — makes a lot more sense. And if the entire system is mostly waiting on memory, then chips don’t need to be as fast as the cutting edge either. This represents a profound shift in future architectures, but it also doesn’t mean that current architectures are going away:

Wednesday, May 13, 2026

In data released Wednesday, finance startup Ramp said more of its customers used Anthropic’s models than OpenAI’s for the first time, with 34.4% using Anthropic versus 32.3% using OpenAI. Adoption of Anthropic’s Claude tools jumped 3.8% from March to April, while OpenAI adoption fell 2.9%, according to the data. Ramp analyzes the spend of approximately 50,000 customers to track AI adoption trends.

Monday, May 4, 2026

Lenders, including JPMorgan and MUFG, have spent more than six months distributing $38bn of construction debt tied to a data centre project leased to Oracle in Texas and Wisconsin, people familiar with the matter said.

Some banks sought to sell the loans at a discount to non-bank lenders to offload the Oracle-linked debt, the people said.

Banks have in recent weeks sounded out investors about structures including a variant of a significant risk transfer, or SRT. SRTs have been commonly used by European banks to reduce their capital requirements by offloading the risk of losses on part of a loan portfolio to investors such as private credit funds and insurers in exchange for a return. North American banks have begun using the instruments more in recent years. Rather than a classic SRT that may be tied to dozens of loans, banks are exploring slicing and dicing large and concentrated data centre loans to shift the riskiest portions off their books, for example.
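A rough way to see how shifting the "riskiest portions" works is a loss waterfall, where losses hit the most junior slice first. The sketch below assumes a simple two-tranche structure with invented sizes; the function name and all figures are hypothetical and are not taken from the deals described above.

```python
def allocate_losses(total_loss, tranches):
    """Apply losses to tranches in order, most junior first.

    `tranches` is a list of (name, size) pairs; returns the loss absorbed by each.
    """
    remaining = total_loss
    absorbed = {}
    for name, size in tranches:
        hit = min(remaining, size)
        absorbed[name] = hit
        remaining -= hit
    return absorbed


# Hypothetical 1,000 (in $m) loan: investors take the first-loss 10%,
# and the bank keeps the senior 90%.
tranches = [("investor_first_loss", 100), ("bank_senior", 900)]

small_loss = allocate_losses(60, tranches)
large_loss = allocate_losses(150, tranches)
print(small_loss)
print(large_loss)
```

With these made-up numbers, a loss of 60 falls entirely on the investors. The bank is touched only once losses exceed the 100 first-loss slice.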

Companies have already started expanding to new debt markets beyond bank lending by issuing private credit, asset-backed securities, commercial mortgage-backed securities and privately placed bonds. “There’s a nervousness . . . [Banks] are having to find more counterparties in order to achieve for what’s in the market and in the pipeline,” said Carlos Mendez, co-founder at Crayhill Capital.

Friday, March 13, 2026

autotraining models with markdown

The idea: give an AI agent a small but real LLM training setup and let it experiment autonomously overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You wake up in the morning to a log of experiments and (hopefully) a better model. The training code here is a simplified single-GPU implementation of nanochat. The core idea is that you're not touching any of the Python files like you normally would as a researcher. Instead, you are programming the program.md Markdown files that provide context to the AI agents and set up your autonomous research org. The default program.md in this repo is intentionally kept as a bare bones baseline, though it's obvious how one would iterate on it over time to find the "research org code" that achieves the fastest research progress, how you'd add more agents to the mix, etc. A bit more context on this project is here in this tweet.
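The modify, train, check, keep-or-discard cycle described above can be sketched as a simple greedy loop. This is not the repo's code: `train_and_evaluate` and `propose_change` are hypothetical stand-ins for the 5-minute training run and the agent's code edit, and the toy loss function is invented so the example runs instantly.

```python
import random


def train_and_evaluate(config):
    # Toy stand-in for a short training run: loss is lowest at learning_rate = 0.3.
    return (config["learning_rate"] - 0.3) ** 2


def propose_change(config, rng):
    # Stand-in for the agent's edit: nudge one setting by a small random amount.
    candidate = dict(config)
    candidate["learning_rate"] += rng.uniform(-0.1, 0.1)
    return candidate


def research_loop(initial_config, n_experiments, seed=0):
    rng = random.Random(seed)
    best_config = initial_config
    best_loss = train_and_evaluate(best_config)
    log = []
    for i in range(n_experiments):
        candidate = propose_change(best_config, rng)
        loss = train_and_evaluate(candidate)
        kept = loss < best_loss  # keep only strict improvements
        if kept:
            best_config, best_loss = candidate, loss
        log.append({"experiment": i, "loss": loss, "kept": kept})
    return best_config, best_loss, log


initial = {"learning_rate": 0.9}
best_config, best_loss, log = research_loop(initial, n_experiments=50)
print(f"start loss {train_and_evaluate(initial):.4f}, best loss {best_loss:.4f}")
print(f"kept {sum(entry['kept'] for entry in log)} of {len(log)} experiments")
```

The log plays the role of the morning-after record of experiments. Because only improvements are kept, the best loss can never be worse than the starting point.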

We did the math. At $185 billion a year, in eight years, Google would be spending $1.5 trillion, slightly more than OpenAI has committed to spend over the same time period. Extend that out to 10 years, as Vahdat noted, and Google would be spending $1.9 trillion.
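The totals above follow from straight multiplication, assuming spending stays flat at $185 billion a year (the same simplification the passage makes). A quick check:

```python
annual_spend_billion = 185  # figure quoted in the passage

# Cumulative spend in trillions of dollars, assuming a constant annual rate.
totals_trillion = {
    years: annual_spend_billion * years / 1000 for years in (8, 10)
}

for years, total in totals_trillion.items():
    print(f"{years} years: ${total:.2f} trillion")
```

The unrounded results are $1.48 trillion and $1.85 trillion, which the passage rounds to $1.5 trillion and $1.9 trillion.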

Vahdat is clear that this is “not a promise” that Google would spend that much over the next 10 years. But the decade-long view he takes suggests the scope of Google’s bet. “The point here is that we are, at Google, investing at the highest levels,” he says.

There’s a big difference between Google’s data center ambitions and OpenAI’s: Google is a money-making machine. In the fourth quarter, Google parent Alphabet raked in $113 billion in revenue; for the full year, sales topped $400 billion for the first time in the company’s more than 25-year history. By comparison, OpenAI is spending at similar levels and only brought in about $13 billion in revenue last year, a tiny fraction of Google’s revenue and less than half of Google’s cash reserves.

Google’s TPUs previously were only used in house for Google’s own infrastructure — to power consumer apps like Gmail and YouTube, and eventually train self-driving cars and develop and run AI models like Gemini. Now, they’re one of the industry’s go-tos: maybe not as popular as Nvidia’s top of the line Blackwells, but still useful for pretraining and operating AI models at scale. Google first started selling access to them through a cloud service in 2018, letting other companies rent out processing power. But more recently, Google has inked high profile deals, like a big contract with Anthropic, and has reportedly been in talks with Meta to use its chips. In December, Morgan Stanley estimated that TPUs could generate $13 billion for Google by 2027. “It is fair to say that the demand for cloud TPUs has been unprecedented,” Vahdat says, particularly in the last few years.

In August, Vahdat, Google Chief Scientist Jeff Dean, and 10 other researchers and execs at the company co-published a paper aiming to contextualize AI’s power guzzling. The paper says that the median prompt for Google’s Gemini AI model uses the same amount of energy it takes to power 9 seconds of television and consumes around five drops of water, which they write is “substantially lower than many public estimates.” (One report says large data centers can consume up to 5 million gallons of water per day, equivalent to the water use of a town populated by up to 50,000 people.)

Coding After Coders: Summary

The New Reality of AI-Assisted Programming

  • Elite software developers now rarely write code themselves — instead, they direct AI agents in plain English
  • Tools like Claude Code deploy multiple agents simultaneously: one writes, one tests, one supervises
  • Tasks that once took days now take under an hour

The Strange New Workflow

  • Developers spend their days describing intent to AI, reviewing the AI's "plan," then letting agents execute
  • When agents misbehave, developers have resorted to scolding, pleading, ALL-CAPS commands, and emotionally charged language ("embarrassing," "national security imperative") — and it seems to work
  • Prompt files have become records of hard-won rules to constrain unpredictable AI behavior

Economic Stakes

  • Coding was once considered near-guaranteed, high-paying employment ($200K+)
  • It may be the first expensive white-collar skill AI can fully replace — unlike AI video or legal briefs, AI-generated code that passes tests is indistinguishable in value from human-written code
  • Irony noted: Silicon Valley workers, who told others to "learn to code," got automated first

Developer Sentiment: Mostly Euphoric

  • Most developers interviewed were energized, not demoralized — reporting 10x to 100x productivity gains
  • Key insight from tech executive Anil Dash: unlike creative fields where AI removes the soulful work and leaves drudgery, in coding AI removes the drudgery and leaves the soulful parts

Historical Context: A Long Arc of Abstraction

  • Each programming era simplified the one before: Assembly → high-level languages (Python) → open-source packages → now natural language intent
  • AI represents the highest abstraction layer yet: developers no longer need to manage syntax, memory, or debugging minutiae
  • The open question, now being asked at Anthropic itself: what is coding, fundamentally, when the code-writing is gone?

Monday, March 2, 2026

It does, kinda, matter that Hegseth turned a simple contract dispute into an attempted corporate death sentence, weaponizing a supply-chain security designation that was clearly designed for tech the US government fears could be infiltrated by hostile foreign nations.

Yet, under Hegseth’s order, Chinese AI models would technically be more welcome in America’s military supply chain than Anthropic’s. The “supply chain risk” designation is now being used to punish a domestic company for having safety guidelines. DeepSeek, with its direct ties to the Chinese government, faces fewer restrictions than a San Francisco company that committed the cardinal sin of asking for human oversight on killing decisions.

One source familiar with the Pentagon’s negotiations with AI companies confirmed that OpenAI’s deal is much softer than the one Anthropic was pushing for, thanks largely to three words: “any lawful use.” In negotiations, the person said, the Pentagon wouldn’t back down on its desire to collect and analyze bulk data on Americans. If you look line-by-line at the OpenAI terms, the source said, every aspect of it boils down to: If it’s technically legal, then the US military can use OpenAI’s technology to carry it out. And over the past decades, the US government has stretched the definition of “technically legal” to cover sweeping mass surveillance programs — and more.

In the years after 9/11, US intelligence agencies ramped up a surveillance system that they determined fell within the legal limits OpenAI cites, including multiple mass domestic spying operations (along with apparently highly invasive international ones). In 2013, National Security Agency intelligence contractor Edward Snowden revealed the extent of some of these programs, such as reportedly collecting telephone records of Verizon customers on an “ongoing, daily” basis, and gathering bulk data on individuals from tech companies like Microsoft, Google, and Apple via a secretive program called PRISM. Despite promises of reform from intelligence agencies and attempts at legal changes, few significant limits to these powers were enacted. Mike Masnick, founder of Techdirt, said online that OpenAI’s deal “absolutely does allow for domestic surveillance. EO 12333 is how the NSA hides its domestic surveillance by capturing communications by tapping into lines outside the US even if it contains info from/on US persons.”

Friday, February 27, 2026

Wednesday, February 25, 2026

The danger here isn’t just about one contract; it’s about the precedent. If the Pentagon successfully bullies Anthropic into submission or replaces it with a more “flexible” competitor, we are effectively witnessing the birth of an intentionally unethical AI.

The Death of Human Agency

When AI is integrated into weaponry for “all lawful purposes” without restrictions on autonomy, we invite the Responsibility Gap. If an AI-driven drone swarm misidentifies a target, who is at fault? By removing the “human-in-the-loop” requirement, the military is seeking a weapon that offers the ultimate prize of war: lethality without accountability.

Surveillance as a Service

Existing U.S. laws were written for wiretaps, not for generative AI that can ingest millions of data points to build predictive profiles. Under an “all lawful purposes” mandate, an LLM could be turned into a digital Panopticon. Anthropic has warned that current laws have not caught up to what AI can do in terms of analyzing open-source intelligence on citizens.

The Moral Race to the Bottom

If the Pentagon blacklists Anthropic, it sends a clear message to competitors: Safety is a liability. To win government billions, firms will be incentivized to strip away safety layers. Reports already suggest OpenAI, Google, and xAI have shown more “flexibility” regarding the Pentagon’s demands.

The Pentagon’s “supply chain threat” maneuver is a scorched-earth tactic designed to force Silicon Valley to choose between its values and its bottom line.

If Anthropic stands firm, it may lose $200 million in revenue and a seat at the defense table. But if they cave, they may well be providing the operating system for the very “Terminator” future they were founded to prevent. In the world of 2026, the most dangerous threat to the supply chain might just be an AI that has been ordered to stop caring about ethics.

Monday, February 23, 2026

Prominent economists, including from Morgan Stanley and JPMorgan Chase, calculate that the AI buildup was directly responsible not for 92 percent or 39 percent of gains to the U.S. economy in 2025, but for as little as zero.

It’s clear that the huge spending on AI is adding to the U.S. economy, but the available economic data doesn’t neatly capture its effects. The debating economists and the slippery data suggest that if the technology does start to reshape the economy, it may be challenging to detect and clearly measure. That may leave political and corporate leaders to choose the numbers that fit their preferred narratives on how AI is changing American life and work.

That’s because the $31 trillion in yearly U.S. gross domestic product, the widest measure of the economy, tallies only the final value of products and services produced domestically. Spending on imports and foreign made components is subtracted because it boosts the economies of other countries, not that of the United States.

Roughly three-quarters of the cost of an AI data center is for the computer gear and parts such as computer chips that go inside of it, technology analysts estimate. America’s AI champions, including the computer chip pioneer Nvidia, manufacture many of their products in Asia — despite efforts by the Biden and Trump administrations to reduce U.S. dependence on essential chips made overseas.
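The accounting point can be made concrete with a small calculation. The sketch below takes the three-quarters gear share from the passage, but the imported share of that gear is a pure assumption (the passage only says "many" products are made in Asia), so it is varied across a range. The function name is illustrative.

```python
def domestic_gdp_contribution(total_spend, gear_share, imported_share_of_gear):
    """Spending minus the imported portion, which GDP accounting subtracts."""
    imports = total_spend * gear_share * imported_share_of_gear
    return total_spend - imports


total_spend = 100.0  # hypothetical data center cost, arbitrary units
gear_share = 0.75    # "roughly three-quarters", per the passage

# The imported share of gear is an assumption, so try a range of values.
contributions = {
    share: domestic_gdp_contribution(total_spend, gear_share, share)
    for share in (0.0, 0.5, 1.0)
}
for share, value in contributions.items():
    print(f"imported share of gear {share:.0%}: contributes {value:.1f} of {total_spend:.0f}")
```

If all of the gear were imported, only a quarter of the headline spending would count toward U.S. GDP under this simplified model. This is one reason estimates of AI's contribution diverge so widely.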

And some forecasters say that the U.S. government’s economic data is a poor measure of the impact of AI and that alternative calculations show the current boom is an even bigger boost to economic growth.

“This is a big deal, but not the be-all and end-all,” said Joseph Politano, an economic analyst who writes the Apricitas Economics newsletter. He calculates that AI-related spending contributed about 0.2 percentage points to the 2.2 percent U.S. economic growth last year.
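For comparison with the 92 percent and 39 percent figures cited earlier, Politano's numbers can be turned into a share of growth. This assumes that those percentages refer to AI's share of total GDP growth.

```python
ai_contribution_points = 0.2  # percentage points, as quoted above
total_growth_percent = 2.2    # total U.S. growth last year, as quoted above

# Share of total growth attributable to AI-related spending.
share_of_growth = ai_contribution_points / total_growth_percent
print(f"AI-related spending share of growth: {share_of_growth:.1%}")
```

On these figures, AI-related spending accounts for roughly 9 percent of last year's growth, well below either of the larger estimates.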

The AI buildup is putting real money into the pockets of some Americans and U.S. businesses. Stock market gains from AI enthusiasm are plumping up Americans’ investment portfolios.

“The two engines of today’s economy are the AI ecosystem and wealthy consumers,” Richmond Fed President Tom Barkin said in a January speech.
