#code
Public notes from activescott tagged with #code
Sunday, December 14, 2025
Introducing Nested Learning: A new ML paradigm for continual learning
Continually updating a model's parameters with new data often leads to “catastrophic forgetting” (CF), where learning new tasks sacrifices proficiency on old tasks. Researchers traditionally combat CF through architectural tweaks or better optimization rules. However, for too long, we have treated the model's architecture (the network structure) and the optimization algorithm (the training rule) as two separate things, which prevents us from achieving a truly unified, efficient learning system.
By defining an update frequency rate, i.e., how often each component's weights are adjusted, we can order a model's components, viewed as a set of interconnected optimization problems, into "levels." This ordered set forms the heart of the Nested Learning paradigm.
We observed that many standard optimizers rely on simple dot-product similarity (a measure of how alike two vectors are, calculated as the sum of the products of their corresponding components), an objective whose update doesn't account for how different data samples relate to each other. By changing the underlying objective of the optimizer to a more standard loss metric, such as L2 regression loss (a common loss function in regression tasks that quantifies error by summing the squares of the differences between predicted and true values), we derive new formulations for core concepts like momentum, making them more resilient to imperfect data.
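As a toy illustration of that contrast (a sketch of the general idea, not the paper's actual formulation), here is a linear associative memory trained with each objective in NumPy: maximizing dot-product similarity yields a pure outer-product (Hebbian) update, while minimizing an L2 regression loss yields the error-correcting delta rule.

```python
# Toy illustration: a linear associative memory W that should map keys k to values v.
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 8, 4
keys = rng.normal(size=(n_pairs, dim))
values = rng.normal(size=(n_pairs, dim))

# Objective 1: maximize dot-product similarity v . (W k). Its gradient step is a
# pure outer-product (Hebbian) update, which keeps adding the same association
# no matter how well W already recalls it.
def dot_product_step(W, k, v, lr=0.1):
    return W + lr * np.outer(v, k)

# Objective 2: minimize the L2 regression loss ||W k - v||^2. Its gradient step
# (the classic delta rule) only corrects the remaining error, so repeated or
# noisy presentations of a pair stop changing W once it recalls v well.
def l2_step(W, k, v, lr=0.1):
    return W - lr * np.outer(W @ k - v, k)

W_dot = np.zeros((dim, dim))
W_l2 = np.zeros((dim, dim))
for epoch in range(50):                          # present each pair many times
    for k, v in zip(keys, values):
        noisy_v = v + 0.01 * rng.normal(size=dim)
        W_dot = dot_product_step(W_dot, k, noisy_v)
        W_l2 = l2_step(W_l2, k, noisy_v)

recall = lambda W: np.mean(np.linalg.norm(keys @ W.T - values, axis=1))
print("dot-product objective: |W| =", round(np.linalg.norm(W_dot), 2),
      "recall error =", round(recall(W_dot), 3))
print("L2 objective:          |W| =", round(np.linalg.norm(W_l2), 2),
      "recall error =", round(recall(W_l2), 3))
```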
In a standard Transformer, the sequence model acts as a short-term memory, holding the immediate context, while the feedforward neural networks act as long-term memory, storing pre-training knowledge. The Nested Learning paradigm extends this concept into what we call a “continuum memory system” (CMS), where memory is seen as a spectrum of modules, each updating at a different, specific frequency rate. This creates a much richer and more effective memory system for continual learning.
"Nested Learning" extends the traditional two-tier memory concept of "attention layers" (short-term memory / context window) and "feed-forward network layers" (long term memory) into a spectrum of modules that update at different rates, some very frequently (like attention), some rarely (like FFNs), and others at various points in between.
Friday, December 12, 2025
webinstall.dev
whalebrew/whalebrew: Homebrew, but with Docker images
Whalebrew creates aliases for Docker images so you can run them as if they were native commands. It's like Homebrew, but with Docker images.
Monday, December 8, 2025
CEL | Common Expression Language
Common Expression Language (CEL) is an expression language that’s fast, portable, and safe to execute in performance-critical applications. CEL is designed to be embedded in an application, with application-specific extensions, and is ideal for extending declarative configurations that your applications might already use.
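A rough sketch of what embedding CEL looks like from Python, assuming the third-party cel-python (celpy) package (the project itself provides C++, Go, and Java implementations); the expression and variables below are made up:

```python
import celpy

# Compile a CEL expression once, then evaluate it against application data.
env = celpy.Environment()
ast = env.compile('request.user == "alice" && request.path.startsWith("/admin")')
program = env.program(ast)

result = program.evaluate({
    "request": celpy.json_to_cel({"user": "alice", "path": "/admin/settings"}),
})
print(result)  # CEL evaluates the expression to a boolean (true here)
```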
Thursday, November 20, 2025
Ollama
A llama.cpp-based app for running local models.
GPT4All
A great open-source alternative that I used for running LLMs locally without having to use llama.cpp directly.
ggml-org/llama.cpp: LLM inference in C/C++
node-llama-cpp | Run AI models locally on your machine
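As a concrete example of using one of these: Ollama serves a local HTTP API on port 11434 by default, so a minimal non-streaming call looks roughly like this, assuming the server is running and a model such as llama3.2 has already been pulled:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",  # assumes this model has been pulled locally
        "prompt": "Explain catastrophic forgetting in one sentence.",
        "stream": False,      # return a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```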
Tuesday, November 18, 2025
A developer productivity tooling platform. | moonrepo
Sunday, November 16, 2025
What if you don't need MCP at all?
Wednesday, November 12, 2025
Configuring Web Applications
Explains how to optimize web apps for iOS.
Sunday, November 9, 2025
OpenGraph - Preview Images and Generate Open Graph Meta Tags
SVG to ICO | CloudConvert
Tuesday, November 4, 2025
All Contributors Bot
Display GitHub contributors in README | remarkablemark
Alternatively, you can add .png to the profile URL (which acts as a redirect to the avatar image):
https://github.com/remarkablemark.png?size=50
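A quick, purely optional way to confirm the redirect behavior from Python (using requests):

```python
import requests

# Don't follow the redirect; just inspect what GitHub returns for the .png profile URL.
r = requests.get("https://github.com/remarkablemark.png?size=50", allow_redirects=False)
print(r.status_code)              # a 3xx redirect
print(r.headers.get("Location"))  # the underlying avatar image URL
```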