activescott's Notes

Public notes from activescott

Thursday, April 2, 2026

While I do not have a technical background, I am very fortunate to live in the era of Andrej Karpathy's nanochat, a very simple harness for training LLMs, and Claude Code, a tool for those who, like me, know just enough Python to know how to break things but not enough to know how to fix them. I am not a machine learning expert or an AI lab with gobs of money. My only co-worker can't speak English and spends most of the day sleeping on my lap or cleaning her fur. I'm just a man with a laptop, Claude Code, and a dream of the 1890s.

I happened to stumble across the British Library Books dataset, a dataset of digitized books dating from between 1500 and 1900.

This left me with 28,035 books, or roughly 2.93 billion tokens of pretraining data.

I settled on using a Vast.ai instance running PyTorch. Renting an NVIDIA H100 GPU ran me between $1.50 and $2.00 per hour.

Using Claude Code, I trained a BPE tokenizer from scratch on the corpus, ending up with a vocabulary of about 32,000 tokens. A modern off-the-shelf tokenizer wouldn't have captured the unique Victorian morphology and orthography of the corpus.
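For intuition, BPE training boils down to repeatedly merging the most frequent adjacent pair of symbols into a new token. Here's a toy pure-Python sketch of that loop (the actual run presumably used nanochat's tokenizer code, which is far more optimized; the sample corpus is invented):

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merges: repeatedly merge the most frequent
    adjacent symbol pair across the corpus into a new token."""
    # Start with each word split into characters, keeping word frequencies.
    words = Counter(corpus.split())
    vocab = {tuple(w): c for w, c in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, count in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for word, count in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] = count
        vocab = new_vocab
    return merges

merges = train_bpe("whilst the carriage waited, the carriage lamps glowed", 10)
```

Run on 2.93 billion tokens with ~32,000 merges instead of a one-line corpus and 10, and you get a vocabulary shaped by Victorian spelling rather than modern web text.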

However, my method for dealing with most other problems was to nicely ask Claude Code to fix them once I identified them, and it was able to without too many issues.

The final pretrained model came out to about 340 million parameters, with a final validation bpb of 0.973. Pretraining took about five hours of GPU time over 6,496 steps and cost maybe $35. I had my pretrained model.

But it lacked the spark of intellect that would allow such a creation to engage in discourse. I needed to develop some kind of dataset to teach it the art of conversation.

Fortunately, I already had a corpus of 28,000 books, so I set Claude Code to work extracting dialogue pairs from them. I ultimately ended up with roughly 190,000 training pairs: when one character said X, I had an example of another character saying Y. The art of conversation!
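I don't have the actual extraction script, but a crude sketch of the kind of heuristic involved might look like this: pull quoted speech out in order and pair each utterance with the one that follows (the sample text is invented):

```python
import re

def extract_dialogue_pairs(text):
    """Heuristic dialogue extraction: find quoted speech in order of
    appearance and pair each utterance with the one that follows it."""
    # Victorian novels typically mark speech with double quotes.
    utterances = [m.group(1).strip() for m in re.finditer(r'"([^"]+)"', text)]
    # Skip one-word exclamations; they make poor training targets.
    utterances = [u for u in utterances if len(u.split()) >= 2]
    # Adjacent utterances become (prompt, response) pairs.
    return list(zip(utterances, utterances[1:]))

page = '"Have you seen the squire?" "He rode out at dawn," said the groom.'
pairs = extract_dialogue_pairs(page)
```

A real version would need to handle attribution tags, nested quotes, and speaker turns that span paragraphs, which is exactly the sort of fiddly work Claude Code handled.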

I needed to rewrite these corpus pairs so that the input question was in modern argot. This task was more than I could possibly do by hand, so Claude Code helpfully suggested that I use Claude Haiku to rewrite the input questions.

Totally useless. This model—which I will call Model #1—had learned to emit Victorian-sounding novelistic gobbledygook in response to user inputs, not how to answer user queries. I had assumed my pre-written QA pairs were good enough, when they clearly weren't. It was back to the drawing board.

I decided to start including fully synthetic data in the mix. Working with Claude Code, I had it write a script that would direct another LLM to write a .jsonl file of fully synthetic scenes. In each, a user greeted the LLM, asked about Victorian topics, and the LLM responded in a period-appropriate manner for two to four turns.
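A hypothetical sketch of what writing such a .jsonl might look like, assuming a chat-style "messages" record per line (the actual schema nanochat's fine-tuning stage expects may differ, and the scene here is invented rather than LLM-generated):

```python
import json

# One synthetic scene = one multi-turn conversation. In the real pipeline
# an LLM generates these; here a single hand-written scene stands in.
scenes = [
    [
        {"role": "user", "content": "Good evening. What news of the railways?"},
        {"role": "assistant", "content": "The new line to Manchester is the talk of the town, sir."},
        {"role": "user", "content": "And is it swift?"},
        {"role": "assistant", "content": "Swifter than any coach-and-four, I assure you."},
    ],
]

# .jsonl format: one JSON object per line, so the file streams easily.
with open("synthetic_scenes.jsonl", "w", encoding="utf-8") as f:
    for scene in scenes:
        f.write(json.dumps({"messages": scene}) + "\n")
```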

Or $496.66 altogether.

Wednesday, April 1, 2026

Blocks, Elements and Modifiers

Block: a standalone entity that is meaningful on its own.
Examples: header, container, menu, checkbox, input

Element: a part of a block that has no standalone meaning and is semantically tied to its block.
Examples: menu item, list item, checkbox caption, header title

Modifier: a flag on a block or element. Use them to change appearance or behavior.
Examples: disabled, highlighted, checked, fixed, size big, color yellow
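In CSS these three concepts map onto the conventional `block__element--modifier` class naming. A small illustrative sketch (the class names are made up for the example):

```css
/* Block: standalone entity */
.menu { display: flex; }

/* Element: tied to its block via double underscore */
.menu__item { padding: 0.5rem; }

/* Modifier: variant flag via double dash */
.menu__item--disabled { opacity: 0.5; }
.menu--size-big { font-size: 1.5rem; }
```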


uhashring implements consistent hashing in pure Python.

Consistent hashing is mostly used in distributed systems/caches/databases, as it avoids the total reshuffling of your key-node mappings when adding or removing a node in your ring (called a continuum in libketama). More information and details can be found in the literature section.

This full-featured implementation offers:

- a lot of convenient methods to use your consistent hash ring in real-world applications
- simple integration with other libs such as memcache through monkey patching
- full ketama compatibility if you need it (see the important mention below)
- all the functions missing from the libketama C Python binding (which is not even available on PyPI) for ketama users
- the possibility to use your own weight and hash functions if you don't care about ketama compatibility
- instance-oriented usage, so you can use your consistent hash ring object directly in your code (see advanced usage)
- native PyPy support, since this is a pure Python library
- tests of implementation, key distribution, and ketama compatibility
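To illustrate the core idea (this is a minimal sketch of the technique, not uhashring's actual implementation, which adds weights, configurable replicas, and ketama compatibility): each node is hashed onto many points of a ring, and a key is served by the first node point found clockwise from the key's own hash.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent hash ring for illustration only."""

    def __init__(self, nodes, vnodes=100):
        # Place `vnodes` virtual points per node so keys spread evenly;
        # the ring is a sorted list of (hash position, node) pairs.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        # Walk clockwise to the first ring point after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node1", "node2", "node3"])
node = ring.get_node("my_key")
```

Because only the ring points belonging to an added or removed node change, most key-to-node mappings survive membership changes, which is the whole point of the technique.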

Solo founders and small teams are shipping real products faster than ever.

Launching is the easy part. Once you're live, the hard part starts: growth, monetization, security, stability. That's where most apps stall out or die.

We've scaled products to millions of users and hundreds of millions in revenue. Now we're building the tools we wish we had when we started.

Think of us as the cofounder you wish you had.

Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes.

- Define workflows where each step in the workflow is a container.
- Model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a graph (DAG).
- Easily run compute-intensive jobs for machine learning or data processing in a fraction of the time using Argo Workflows on Kubernetes.
- Run CI/CD pipelines natively on Kubernetes without configuring complex software development products.
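As a sketch of what a DAG-style workflow looks like, here is a minimal (hypothetical) three-step pipeline where `train` depends on `prepare` and `report` depends on `train`; see the Argo documentation for the authoritative schema:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-example-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: prepare
            template: echo
            arguments:
              parameters: [{name: message, value: prepare}]
          - name: train
            dependencies: [prepare]
            template: echo
            arguments:
              parameters: [{name: message, value: train}]
          - name: report
            dependencies: [train]
            template: echo
            arguments:
              parameters: [{name: message, value: report}]
    - name: echo
      inputs:
        parameters:
          - name: message
      container:
        image: alpine:3.19
        command: [echo, "{{inputs.parameters.message}}"]
```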

Tuesday, March 31, 2026

A filter composed of several other filters (AdGuard Base filter, Social media filter, Tracking Protection filter, Mobile Ads filter, EasyList and EasyPrivacy) and simplified specifically for better compatibility with DNS-level ad blocking.

The direct link to the filter: https://adguardteam.github.io/AdGuardSDNSFilter/Filters/filter.txt.

Please note that to use this filter it is necessary to support basic ad-blocking rule syntax. It does not make much sense to extract just the hosts file.
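For context, "basic ad-blocking rule syntax" means adblock-style rules rather than a plain hostname list, e.g. (domains invented for illustration):

```
! Comment: block ads.example.org and all of its subdomains
||ads.example.org^
! Exception rule: never block example.org
@@||example.org^
```

The `^` separator and `@@` exception marker are what a hosts file cannot express, which is why extracting only the hostnames loses information.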

Monday, March 30, 2026

SPACE was created explicitly to address the limitations of single-dimension productivity metrics (including DORA). Its core argument is that developer productivity is multidimensional and cannot be captured by any single metric or even a single category of metrics. You need to measure across multiple dimensions and combine perceptual (self-reported) data with behavioral (system-observed) data.

S — Satisfaction and Well-being

What it measures: How fulfilled, happy, and healthy developers feel about their work, team, tools, and culture.

Why it matters: Developer satisfaction is both an outcome worth caring about and a leading indicator of future productivity. Dissatisfied developers leave, disengage, or burn out, all of which destroy team productivity over time. Satisfaction is also the dimension most likely to surface problems that system metrics miss (e.g., "our CI is technically fast but the developer experience of debugging failures is awful").

Example metrics:

- Developer satisfaction surveys (NPS-style or Likert scale)
- Retention and turnover rates
- Burnout indicators (after-hours work patterns, survey responses)
- Tool satisfaction ratings

P — Performance

What it measures: The outcomes of the work: not how much was done, but whether what was done achieved its intended result.

Why it matters: Activity without outcomes is waste. A team can be very busy (high activity) and still underperform (low performance) if they're working on the wrong things, producing low-quality output, or failing to deliver customer value.

Example metrics:

- Customer satisfaction / NPS
- Feature adoption rates
- Reliability (uptime, error rates)
- Code quality indicators (defect density, code review quality)
- Revenue or business KPIs tied to engineering output

A — Activity

What it measures: The count or volume of actions and outputs produced by developers and teams.

Why it matters (with caveats): Activity metrics are the most straightforward to collect from systems (commits, PRs, deployments, reviews). They're useful as a component of productivity measurement but dangerous as the primary measure because they incentivize volume over value. The SPACE authors explicitly warn against using activity metrics in isolation.

Example metrics:

- Number of PRs opened, reviewed, merged
- Number of commits
- Number of code reviews completed
- Number of deployments
- Number of incidents responded to
- CI/CD pipeline runs

C — Communication and Collaboration

What it measures: How effectively people and teams share information, coordinate work, review each other's contributions, and work together.

Why it matters: Software development is a team sport. Individual velocity means little if coordination overhead is high. Teams with poor communication have longer cycle times, more rework, and more integration conflicts, even if individual developers are productive in isolation.

Example metrics:

- Code review turnaround time (time from review request to first review)
- PR review depth (number of review comments, reviewers per PR)
- Knowledge distribution (bus factor: how many people can work on a given area?)
- Cross-team PR review frequency
- Meeting load and interruption frequency

E — Efficiency and Flow

What it measures: Whether developers can do their work with minimal interruptions, delays, and friction. This dimension captures the experience of getting work done: are there unnecessary handoffs, tool-switching, waiting periods, or manual steps?

Why it matters: This is the heart of the "developer experience" concept. Two teams with identical DORA metrics can have radically different developer experiences if one team's pipeline is smooth and automated while the other requires manual interventions, workarounds, and waiting.

Example metrics:

- Time spent waiting (for CI, for reviews, for environments)
- Handoffs between teams or tools
- Manual steps in automated workflows
- Context switches per day
- "Flow state" time (uninterrupted coding time)
- Toil and workaround frequency
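To make the multidimensional point concrete, here is a hypothetical sketch (all metric names and thresholds are invented, not from the SPACE paper) of a snapshot that flags individual dimensions needing attention instead of collapsing everything into one composite score:

```python
from dataclasses import dataclass

# One or two signals per SPACE dimension, mixing perceptual (survey)
# and behavioral (system) data. Thresholds are illustrative only.
@dataclass
class SpaceSnapshot:
    satisfaction_survey: float      # S: 1-5 Likert average (perceptual)
    feature_adoption_rate: float    # P: 0-1 share of users (behavioral)
    prs_merged_per_week: float      # A: behavioral, never used alone
    review_turnaround_hours: float  # C: behavioral
    flow_hours_per_day: float       # E: behavioral

def flags(s: SpaceSnapshot) -> list[str]:
    """Surface dimensions that need attention; deliberately no single score."""
    out = []
    if s.satisfaction_survey < 3.5:
        out.append("S: satisfaction below target")
    if s.feature_adoption_rate < 0.2:
        out.append("P: shipped features are not being adopted")
    if s.review_turnaround_hours > 24:
        out.append("C: reviews are a bottleneck")
    if s.flow_hours_per_day < 2:
        out.append("E: not enough uninterrupted focus time")
    return out

team = SpaceSnapshot(4.1, 0.35, 12.0, 30.0, 1.5)
```

Note that `prs_merged_per_week` feeds no flag on its own: per the framework's warning, activity is context for the other dimensions, not a target.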

Sunday, March 29, 2026

As I said in the video, Ford only went EV because the competition threatened its one and only remaining cash cow, the F-150. Now that the threat is gone, they're getting out of the game and going all-in on quick profits and the easiest path to more of them. Ford reverse-engineered some better-engineered and better-built Chinese cars to create the cheapest, easiest EV the Blue Oval could possibly make, and restructured their entire 'assembly line' culture, which will let them dump piles of workers, shed liabilities, and make short-term profits to jack up their preferred-share power.