activescott's Notes

Public notes from activescott

Tuesday, December 9, 2025

Sphinx is cray cray:

Inside Python object description directives, reStructuredText field lists with these fields are recognized and formatted nicely:

param, parameter, arg, argument, key, keyword: Description of a parameter.

type: Type of a parameter. Creates a link if possible.

raises, raise, except, exception: That (and when) a specific exception is raised.

var, ivar, cvar: Description of a variable.

vartype: Type of a variable. Creates a link if possible.

returns, return: Description of the return value.

rtype: Return type. Creates a link if possible.

meta: Add metadata to description of the python object. The metadata will not be shown on output document. For example, :meta private: indicates the python object is private member. It is used in sphinx.ext.autodoc for filtering members.
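For reference, here is a minimal sketch of what those field lists look like inside real docstrings. The `RetryPolicy` class and its methods are hypothetical and exist only to show the fields in place. The markup follows the field names quoted above. How it renders depends on your Sphinx setup.

```python
class RetryPolicy:
    """Hypothetical retry settings, used only to illustrate the field lists.

    :ivar max_attempts: Maximum number of attempts before giving up.
    :vartype max_attempts: int
    :cvar DEFAULT_DELAY: Base delay between attempts, in seconds.
    :vartype DEFAULT_DELAY: float
    """

    DEFAULT_DELAY = 0.5

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts

    def delay_for(self, attempt, factor=2.0):
        """Return the delay to wait before a given attempt.

        :param attempt: 1-based attempt number.
        :type attempt: int
        :param factor: Multiplier applied for each attempt after the first.
        :type factor: float
        :raises ValueError: If ``attempt`` is outside ``1..max_attempts``.
        :returns: Delay in seconds.
        :rtype: float
        """
        if not 1 <= attempt <= self.max_attempts:
            raise ValueError(f"attempt must be in 1..{self.max_attempts}")
        return self.DEFAULT_DELAY * factor ** (attempt - 1)

    def _reset(self):
        """Restore the default attempt limit.

        :meta private:
        """
        self.max_attempts = 3
```

The `:meta private:` field on `_reset` is the one autodoc uses for filtering members, per the quoted description; the other fields only affect how the description is formatted.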

Monday, December 8, 2025

MLPerf Client is a benchmark developed collaboratively at MLCommons to evaluate the performance of large language models (LLMs) and other AI workloads on personal computers, from laptops and desktops to workstations. By simulating real-world AI tasks it provides clear metrics for understanding how well systems handle generative AI workloads. The MLPerf Client working group intends for this benchmark to drive innovation and foster competition, ensuring that PCs can meet the challenges of the AI-powered future.

Common Expression Language (CEL) is an expression language that’s fast, portable, and safe to execute in performance-critical applications. CEL is designed to be embedded in an application, with application-specific extensions, and is ideal for extending declarative configurations that your applications might already use.

We introduce the Berkeley Function Calling Leaderboard (BFCL), the first comprehensive and executable function call evaluation dedicated to assessing Large Language Models' (LLMs) ability to invoke functions. Unlike previous evaluations, BFCL accounts for various forms of function calls, diverse scenarios, and executability.

In 2024, SWE-bench & SWE-agent helped kickstart the coding agent revolution.

We now ask: What if SWE-agent was 100x smaller, and still worked nearly as well?

mini is for

  • Researchers who want to benchmark, fine-tune or RL without assumptions, bloat, or surprises
  • Developers who like their tools like their scripts: short, sharp, and readable
  • Engineers who want something trivial to sandbox & to deploy anywhere

Here's some details:

  • Minimal: Just 100 lines of python (+100 total for env, model, script), no fancy dependencies!
  • Powerful: Resolves >74% of GitHub issues in the SWE-bench verified benchmark (leaderboard).
  • Convenient: Comes with UIs that turn this into your daily dev swiss army knife!
  • Deployable: In addition to local envs, you can use docker, podman, singularity, apptainer, and more
  • Tested: Codecov
  • Cutting edge: Built by the Princeton & Stanford team behind SWE-bench and SWE-agent.

Rnj-1 is an 8B model that roughly follows the open-source Gemma 3 architecture. We employ global self-attention and YaRN to extend the context to 32k. The Rnj-1 Base and Instruct models compare favorably against similarly sized open weight models.

Rnj-1 Instruct dominates the pack on Agentic coding, one of our target abilities. SWE bench performance is indicative of the model's ability to tackle everyday software engineering tasks. We are an order of magnitude stronger than comparably sized models on SWE-bench and approach the capabilities available in much larger models (leaderboard: SWE-bench-Verified bash-only).

Sunday, December 7, 2025

It is beyond belief that this is happening and tolerated in this country. 😭

The National Park Service will offer free admission to U.S. residents on President Donald Trump’s birthday next year — which also happens to be Flag Day — but is eliminating the benefit for Martin Luther King Jr. Day and Juneteenth.


Saturday, December 6, 2025

RAILGUN is code that exists on every Ethereum node. It's a privacy system built directly on-chain for Ethereum, BSC, Polygon, and Arbitrum. It uses Zero-Knowledge (ZK) cryptography to enable private use of smart contracts and DeFi, all without leaving the security of the user’s preferred chain. The RAILGUN code has no owner. Interactions on your chain of choice are made private.

The RAIL token is purely a governance token and is not a privacy coin. Holding RAIL is not necessary to use the protocol and it does not confer any rights to holders. You can read more about governance here.

RAILGUN users have access to special 0zk addresses that are confidential on Etherscan, Arbiscan, or any similar resource. Independent wallet providers can use the RAILGUN protocol; head here to select from a list of wallet providers. RAILGUN is 100% non-custodial.

The user experience is similar to using a public wallet to interact with Ethereum/EVM chains, just with the added ability to interact privately.

As RAILGUN is simply on-chain smart contract code, privacy is achieved without the need to move to a separate chain.

Do Trump and the Republicans just hate the earth? Apparently nobody even wants to drill for oil there. So why is this such a priority?

The U.S. Senate is about to vote on a resolution to toss ex-President Biden’s limits on oil and gas leasing in the Arctic National Wildlife Refuge and ensure nothing like it is imposed again. ... Congress and the Trump administration have already nullified the Biden limits on leasing in the Arctic Refuge. But the latest nullification method uses the Congressional Review Act. That means a future president could not impose substantially similar limits without an act of Congress.

Sen. Martin Heinrich, D-N.M., spoke against the resolution. An outdoorsman who has travelled to the region, Heinrich described the refuge as a breathtaking wilderness that’s vital for hundreds of species of birds and wildlife.

“The Arctic Refuge is the crown jewel of our National Wildlife Refuge System, and it belongs to every single American,” he said. “It deserves our protection.”

Market forces may, in effect, provide that protection. No major oil companies bid when the first Trump administration held an ANWR lease sale in 2021. A lease sale during the Biden administration, with more restrictive conditions imposed, drew no bids at all.

The current MOU, negotiated by the Obama administration and providing for Israel to receive $38 billion in weapons, expires in FY2028. According to recent media reports, initial negotiations began recently between US and Israeli officials for the next MOU, with Israel proposing that its term be extended to 20 years.

Israel is also reportedly seeking even greater annual appropriations of weapons than the $3.8 billion outlined in the current MOU, meaning that taxpayers could be on the hook for providing Israel with $76 billion of weapons at a bare minimum if the Trump administration accedes to Israel’s requests.

Friday, December 5, 2025

Thursday, December 4, 2025

Wednesday, December 3, 2025

Some interesting subtle things he ever so briefly mentions that I think are notable:

  • "Everybody uses all the products": This means each person deeply knows what each product does, its use cases, and how users use it, because they are users of the product themselves. He mentioned this in the context of "developers just commit to other products": they will just download the repo and submit a PR. He mentions the value of Claude in that process, which I know is real, but he takes for granted the value of knowing the product.
  • While I'm sure these products have complex coding challenges, they're all well defined and narrowly scoped. I think it's much harder to describe a complex application or set of applications using proprietary services with sometimes odd design choices, and integrating with external proprietary services. With that said, I find AI to be exceptional at helping to understand complex code across many services and frontend components - maybe even more valuable than writing the code. However, it still is non-trivial. It also doesn't help with knowing what to build for your customer.