Unsloth AI - Open Source Fine-tuning & RL for LLMs
Easy to use, well documented fine-tuning. NVIDIA optimized with AMD support and Apple M support in the works.
Public notes from activescott tagged with #code
All things code!
Easy to use, well documented fine-tuning. NVIDIA optimized with AMD support and Apple M support in the works.
continually updating a model's parameters with new data, often leads to “catastrophic forgetting” (CF), where learning new tasks sacrifices proficiency on old tasks. Researchers traditionally combat CF through architectural tweaks or better optimization rules. However, for too long, we have treated the model's architecture (the network structure) and the optimization algorithm (the training rule) as two separate things, which prevents us from achieving a truly unified, efficient learning system.
By defining an update frequency rate, i.e., how often each component's weights are adjusted, we can order these interconnected optimization problems into "levels." This ordered set forms the heart of the Nested Learning paradigm.
We observed that many standard optimizers rely on simple dot-product similarity (a measure of how alike two vectors are by calculating the sum of the products of their corresponding components) whose update doesn't account for how different data samples relate to each other. By changing the underlying objective of the optimizer to a more standard loss metric, such as L2 regression loss (a common loss function in regression tasks that quantifies the error by summing the squares of the differences between predicted and true values), we derive new formulations for core concepts like momentum, making them more resilient to imperfect data.
In a standard Transformer, the sequence model acts as a short-term memory, holding the immediate context, while the feedforward neural networks act as long-term memory, storing pre-training knowledge. The Nested Learning paradigm extends this concept into what we call a “continuum memory system” (CMS), where memory is seen as a spectrum of modules, each updating at a different, specific frequency rate. This creates a much richer and more effective memory system for continual learning.
"Nested Learning" extends the traditional two-tier memory concept of "attention layers" (short-term memory / context window) and "feed-forward network layers" (long term memory) into a spectrum of modules that update at different rates, some very frequently (like attention), some rarely (like FFNs), and others at various points in between.
Whalebrew creates aliases for Docker images so you can run them as if they were native commands. It's like Homebrew, but with Docker images.
Common Expression Language (CEL) is an expression language that’s fast, portable, and safe to execute in performance-critical applications. CEL is designed to be embedded in an application, with application-specific extensions, and is ideal for extending declarative configurations that your applications might already use.
A llama.cpp-based app for running local models.
A great open source alternative that I used for running llms locally without having to use llama.cpp directly.
This is all I feed to my agent. It's a handful of tools that cover all the bases for my use case. Each tool is a simple Node.js script that uses Puppeteer Core. By reading that README, the agent knows the available tools, when to use them, and how to use them via Bash.
Explains how to optimize website apps for iOS.
Alternatively, you can add .png to the profile url (which acts as a redirect to the avatar image):
https://github.com/remarkablemark.png?size=50