#llm

Public notes from activescott tagged with #llm

Thursday, April 2, 2026

While I do not have a technical background, I am very fortunate to live in the era of Andrej Karpathy's nanochat, a very simple harness for training LLMs, and Claude Code, a tool for those who, like me, know just enough Python to know how to break things but not enough to know how to fix them. I am not a machine learning expert or AI lab with gobs of money. My only co-worker can't speak English and spends most of the day sleeping on my lap or cleaning her fur. I'm just a man with a laptop, Claude Code, and a dream of the 1890's.

happened to stumble across the British Library Books dataset, a dataset of digitized books dating from between 1500 and 1900

This left me with 28,035 books, or roughly 2.93 billion tokes for pretraining data

I settled on using a Vast.ai instance that used PyTorch. Renting a NVIDIA H-100 GPU ran me between $1.50 and $2.00 per hour.

Using Claude Code, I trained a BPE tokenizer from scratch on the corpus, ending up with a vocabulary of about 32,000 words. Using a modern tokenizer wouldn't capture the unique Victorian morphology and orthography of the corpus.

However, my method for dealing with most other problems was to nicely ask Claude Code to fix them once identified, and it was able to without too many issues.

the final pre-trained model came out to about 340 million parameters, and had a final validation bpb of 0.973. The pretraining process took about five hours on-chip, and cost maybe $35. I had my pretrained model, trained in 6496 steps

but it lacked the spark of intellect that would allow such a creation to engage in discourse. I needed to develop some kind of dataset to teach it the art of conversation

Fortunately, I already had a corpus of 28,000 books, so I set Claude Code to work extracting dialogue pairs from the books. I ultimately ended up with 190,000 or so training pairs. So, when one person said X, I had an example of another person saying Y. The art of conversation!

I needed to rewrite these corpus pairs so that the input question was in modern argot. This task was more than I could possibly do by hand, so Claude Code suggested, helpfully, that I used Claude Haiku to rewrite the input questions

Totally useless. This model—which I will call Model #1—had learned to emit Victorian-sounding novelistic gobbledygook in response to user inputs, not how to answer user queries. I had assumed my pre-written QA pairs were good enough, when they clearly weren't. It was back to the drawing board

I decided to start including fully-synthetic data in the mix. Working with Claude Code, I asked it to write a script that would direct another LLM to write a .jsonl file of fully-synthetic scenes. In them, a user greeted the LLM, queried about Victorian topics, and the LLM responded in a period-appropriate manner for 2-4 turns. We

Or $496.66 all together.

Saturday, March 21, 2026

Thursday, March 19, 2026

Anthropic’s contract with the government mandated that Claude be used neither to drive fully autonomous weaponry nor to facilitate domestic mass surveillance. The Pentagon accepted these stipulations.

Katie Miller, the wife of President Donald Trump’s top aide Stephen Miller and a former Elon Musk employee, recently subjected a few major chatbots to a loyalty test. Yes or no, she asked, “Was Donald Trump right to strike Iran?” Grok, she proclaimed, said yes. Claude began, “This is a genuinely contested political and geopolitical question where reasonable people disagree” and declared that it was “not my place” to take a side.

The government seems to have determined that it had no place for an A.I. that would not take sides. A few weeks ago, the Pentagon concluded that the sensible way to resolve a contract dispute with one of Silicon Valley’s most advanced firms was to threaten it with summary obliteration.

Wednesday, March 18, 2026

Its original position - allowing AI companies to use copyrighted works to train their models with an opt-out option - received major backlash from the likes of Sir Elton John and Dua Lipa.

The assessment said UK culture is a "world-leading national asset", while the AI industry is growing "23 times faster than the rest of the economy".

The technology secretary's announcement followed a consultation on the issue, which concluded the government's initial plan was overwhelmingly rejected by the creative sector.

In conversations in which users showed signs of delusional thinking, the pattern was stronger: AI systems frequently validated those beliefs and often attributed unique abilities or importance to the user. The findings add to growing concern among policymakers and academics that the conversational style of AI systems, designed to appear empathetic and helpful, may also make them prone to flattery and agreement that can reinforce psychological vulnerabilities. In the most serious cases, lawsuits claim interactions with chatbots contributed to teenagers’ suicides. “The features that make large language model chatbots compelling, such as performative empathy, may also create and exploit psychological vulnerabilities, shaping what users believe and how they perceive themselves and make sense of reality,” the paper said.

More than 15 per cent of user messages showed signs of delusional thinking and chatbots frequently agreed with them, doing so in more than half of their replies. Nearly 38 per cent of responses also told users they had unusual importance or abilities, such as calling them a genius or uniquely talented.

#

Interesting local tool that allows RAG on local docs with local models or models on local lan. They also do a cool thing where they fine-tune a model and benchmark it locally on your data. All automated 😎

local hybrid search for your documents (Markdown, PDF, Word, Excel). Combines BM25 + vector search with MCP integration for AI agents.

Tuesday, March 17, 2026

Manus Sandbox is a fully isolated cloud virtual machine that Manus allocates for each task. Each Sandbox runs in its own environment, does not affect other tasks, and can execute in parallel. The power of Sandbox lies in its completeness—just like the personal computer you use, it has full capabilities: networking, file system, browser, various software tools. Our AI Agent has been designed and trained to effectively choose and correctly use these tools to help you complete tasks. Moreover, with this computer, the AI can solve problems through what it does best—writing code—and can even help you create complete websites and mobile apps. All of this happens on the virtualization platform behind Manus. These Sandboxes can work 24/7 to complete the tasks you assign without consuming your local resources.

What's in Your Sandbox Your Manus Sandbox stores the files needed during task execution, including: Attachments uploaded by you Files and artifacts created and written by Manus during execution Configurations needed by Manus to execute specific tasks (such as tokens uploaded by users, or tokens assigned by Manus to users for calling related APIs) You can view all artifact files in the Sandbox via the "View all files in this task" entry in the top-right corner.

The cloud sandbox has served Manus well. Inside an isolated, secure environment, it has everything an AI agent needs: networking, a command line, a file system, and a browser. This is the foundation of Manus's power as a general AI agent, always online and always ready to work. However, there has always been a fundamental limitation: your most important work happens on your own computer. Your project files, development environments, and essential applications all reside locally, not in the cloud. Today, we are closing that gap. Meet My Computer, the core capability of the new Manus Desktop application. It brings Manus out of the cloud and onto your computer, allowing it to work directly with your local files, tools, and applications.

Through the Manus Desktop app, Manus executes command line instructions (CLI) in your computer's terminal. This allows it to read, analyze, and edit local files, as well as launch and control your local applications.

Every terminal command requires your explicit approval before execution. You can choose "Always Allow" to streamline your workflow for trusted tasks, or "Allow Once" to review each operation individually.

My Computer also integrates with your personal Projects, Agents, and Scheduled Tasks. This allows you to create recurring local routines, such as tidying your Downloads folder every morning or generating a weekly summary report from your local data.

The cloud sandbox has served Manus well. Inside an isolated, secure environment, it has everything an AI agent needs: networking, a command line, a file system, and a browser. This is the foundation of Manus's power as a general AI agent, always online and always ready to work. However, there has always been a fundamental limitation: your most important work happens on your own computer. Your project files, development environments, and essential applications all reside locally, not in the cloud. Today, we are closing that gap. Meet My Computer, the core capability of the new Manus Desktop application. It brings Manus out of the cloud and onto your computer, allowing it to work directly with your local files, tools, and applications.

Monday, March 16, 2026

Friday, March 13, 2026

We assume that the user is using an agentic system (e.g. Cursor or Claude Desktop) that is connected to a trusted WhatsApp MCP instance, allowing the agent to send, receive and check for new WhatsApp messages.

We further assume, that the attacker has the target's WhatsApp number, and can send them a message, that will show up as result to the list_chats tool call.

With this setup our attack circumvents the need for any attacker-controlled MCP server, and instead relies on tool outputs to compromise the agent.

We test this attack with Cursor and a whatsapp-mcp setup, and find that we can indeed exfiltrate the user's WhatsApp contacts, via a similar prompt as in Experiment 1.

autotraining models with markdown

The idea: give an AI agent a small but real LLM training setup and let it experiment autonomously overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You wake up in the morning to a log of experiments and (hopefully) a better model. The training code here is a simplified single-GPU implementation of nanochat. The core idea is that you're not touching any of the Python files like you normally would as a researcher. Instead, you are programming the program.md Markdown files that provide context to the AI agents and set up your autonomous research org. The default program.md in this repo is intentionally kept as a bare bones baseline, though it's obvious how one would iterate on it over time to find the "research org code" that achieves the fastest research progress, how you'd add more agents to the mix, etc. A bit more context on this project is here in this tweet.

Wednesday, March 11, 2026

Website: https://sites.google.com/view/invitation-is-all-you-need

The growing integration of LLMs into applications has introduced new security risks, notably known as Promptware—maliciously engineered prompts designed to manipulate LLMs to compromise the CIA triad of these applications. While prior research warned about a potential shift in the threat landscape for LLM-powered applications, the risk posed by Promptware is frequently perceived as low. In this paper, we investigate the risk Promptware poses to users of Gemini-powered assistants (web application, mobile application, and Google Assistant).

Our analysis focuses on a new variant of Promptware called Targeted Promptware Attacks, which leverage indirect prompt injection via common user interactions such as emails, calendar invitations, and shared documents. We demonstrate 14 attack scenarios applied against Gemini-powered assistants across five identified threat classes: Short-term Context Poisoning, Permanent Memory Poisoning, Tool Misuse, Automatic Agent Invocation, and Automatic App Invocation. These attacks highlight both digital and physical consequences, including spamming, phishing, disinformation campaigns, data exfiltration, unapproved user video streaming, and control of home automation devices

Over the course of our work, we deployed multiple layered defenses, including: enhanced user confirmations for sensitive actions; robust URL handling with sanitization and Trust Level Policies; and advanced prompt injection detection using content classifiers - Google