This feed omits posts by rms. Just 'cause.

Bram Cohen
Manyana

I’m releasing Manyana, a project which I believe presents a coherent vision for the future of version control — and a compelling case for building it.

It’s based on the fundamentally sound approach of using CRDTs for version control, which is long overdue but hasn’t happened yet because of subtle UX issues. A CRDT merge always succeeds by definition, so there are no conflicts in the traditional sense — the key insight is that changes should be flagged as conflicting when they touch each other, giving you informative conflict presentation on top of a system which never actually fails. This project works that out.

Better conflict presentation

One immediate benefit is much more informative conflict markers. Two people branch from a file containing a function. One deletes the function. The other adds a line in the middle of it. A traditional VCS gives you this:

<<<<<<< left
=======
def calculate(x):
    a = x * 2
    logger.debug(f"a={a}")
    b = a + 1
    return b
>>>>>>> right

Two opaque blobs. You have to mentally reconstruct what actually happened.

Manyana gives you this:

<<<<<<< begin deleted left
def calculate(x):
    a = x * 2
======= begin added right
    logger.debug(f"a={a}")
======= begin deleted left
    b = a + 1
    return b
>>>>>>> end conflict

Each section tells you what happened and who did it. Left deleted the function. Right added a line in the middle. You can see the structure of the conflict instead of staring at two blobs trying to figure it out.

What CRDTs give you

CRDTs (Conflict-Free Replicated Data Types) give you eventual consistency: merges never fail, and the result is always the same no matter what order branches are merged in — including many branches mashed together by multiple people working independently. That one property turns out to have profound implications for every aspect of version control design.

Line ordering becomes permanent. When two branches insert code at the same point, the CRDT picks an ordering and it sticks. This prevents problems when conflicting sections are both kept but resolved in different orders on different branches.

Conflicts are informative, not blocking. The merge always produces a result. Conflicts are surfaced for review when concurrent edits happen “too near” each other, but they never block the merge itself. And because the algorithm tracks what each side did rather than just showing the two outcomes, the conflict presentation is genuinely useful.
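
The "surfaced, not blocking" idea can be sketched in a few lines. This is a hypothetical illustration, not Manyana's actual code: treat two edits as conflicting when they are concurrent (neither is in the other's history) and their line ranges touch.

```python
# Hypothetical sketch: a conflict is two concurrent edits whose line
# ranges overlap or are adjacent. The merge itself never fails; this
# only decides what gets flagged for human review.

def concurrent(e1, e2, ancestors):
    """Neither edit is in the other's history."""
    return e1 not in ancestors[e2] and e2 not in ancestors[e1]

def touching(r1, r2):
    """Inclusive (start, end) line ranges that overlap or abut."""
    return r1[0] <= r2[1] and r2[0] <= r1[1]

def flag_conflicts(edits, ancestors):
    """edits: list of (edit_id, line_range). Returns flagged pairs."""
    flagged = []
    for i, (id1, r1) in enumerate(edits):
        for id2, r2 in edits[i + 1:]:
            if concurrent(id1, id2, ancestors) and touching(r1, r2):
                flagged.append((id1, id2))
    return flagged
```

With this model, an edit that lands far away from any concurrent edit merges silently; only nearby concurrent changes produce the annotated markers shown above.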

History lives in the structure. The state is a weave — a single structure containing every line which has ever existed in the file, with metadata about when it was added and removed. This means merges don’t need to find a common ancestor or traverse the DAG. Two states go in, one state comes out, and it’s always correct.
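
A toy version of that merge (illustrative only, not Manyana's code; the real weave also tracks line ordering): every line that has ever existed keeps an entry recording which edits deleted it, and merging is just a set union, so the result is the same in any merge order.

```python
# Toy weave sketch (hypothetical, not Manyana's implementation): each
# entry is keyed by a stable line id and records the edits that deleted
# it. Merging is a union, hence commutative and order-independent.

def merge(a, b):
    out = {}
    for line_id in a.keys() | b.keys():
        ea, eb = a.get(line_id), b.get(line_id)
        text = (ea or eb)["text"]
        deleted = (ea["deleted"] if ea else set()) | (eb["deleted"] if eb else set())
        out[line_id] = {"text": text, "deleted": deleted}
    return out

def visible(weave, order):
    """Render the lines no edit has deleted, in the given total order."""
    return [weave[i]["text"] for i in order
            if i in weave and not weave[i]["deleted"]]
```

Because `merge(a, b) == merge(b, a)`, no common ancestor lookup or DAG traversal is needed: the history each side carries in its weave is all the context the merge requires.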

Rebase without the nightmare

One idea I’m particularly excited about: rebase doesn’t have to destroy history. Conventional rebase creates a fictional history where your commits happened on top of the latest main. In a CRDT system, you can get the same effect — replaying commits one at a time onto a new base — while keeping the full history. The only addition needed is a “primary ancestor” annotation in the DAG.

This matters because aggressive rebasing quickly produces merge topologies with no single common ancestor, which is exactly where traditional 3-way merge falls apart. CRDTs don’t care — the history is in the weave, not reconstructed from the DAG.

What this is and isn’t

Manyana is a demo, not a full-blown version control system. It’s about 470 lines of Python which operate on individual files. Cherry-picking and local undo aren’t implemented yet, though the README lays out a vision for how those can be done well.

What it is is a proof that CRDT-based version control can handle the hard UX problems and come out with better answers than the tools we’re all using today — and a coherent design for building the real thing.

The code is public domain. The full design document is in the README.


Posted
jwz (Jamie Zawinski)
Google Pass, redux
Dear Lazyweb,

Four years ago I asked whether "Google Pass" was a thing that I needed to give a shit about and consensus was, "no, nobody uses that." But I have heard anecdotally, recently, that this might no longer be true. Thoughts?

The goal here is, "reduce the amount of time it takes for someone standing in front of my nightclub to wave their QR code at the door staff." On iOS, Apple Wallet supports that goal very well.

Note: I don't use Android and know as little about its ecosystem as possible, so please use small words.

Previously, previously, previously, previously, previously, previously, previously, previously, previously.

jwz (Jamie Zawinski)
Ageless Linux
The Ageless Device: A physical computing device designed to satisfy every element of the California Digital Age Assurance Act's regulatory scope while deliberately refusing to comply with its requirements. The device costs less than lunch and will be handed to children.

Configuration Tiers: Three Levels of Infraction

TIER 0, "The Pamphlet" ~$6
Minimum viable violation. A bootable Linux device with a display, network connectivity, and an app store. No battery, no keyboard -- just proof that this constitutes a regulated device under AB 1043. Good for bulk handout at conferences (50-100 units).

Legal status: Arguable. The 128×64 display introduces fuzziness. The AG could claim it's a dev board. That's fine -- ambiguity is instructive too.

TIER 1, "The Computer", ~$12
An unambiguous general purpose computing device. Color display, keyboard, WiFi, Linux, app store, user setup. The core product. There is no interpretive gap between this device and the law's definitions.

Legal status: Unambiguous. This is a computer with a color display, keyboard, WiFi, Linux, and an app store. It does not collect age data. It is handed to a child. The maximum fine is $7,500.

TIER 2, "The Appliance", ~$18
A self-contained, battery-powered pocket Linux computer. The educational device angle -- a modern descendant of the Acorn BBC Micro and the original Raspberry Pi.

Legal status: Beyond unambiguous. A pocket computer with a color screen, keyboard, battery, WiFi, 8GB storage, and an AI accelerator. It costs less than a large pizza. It fits in a child's hand.

Every tab on this site is gold:

How Distros Are Responding: We track how Linux distributions are responding to age verification mandates, and we provide tools to undo whatever they implement. If a distribution adds an age collection prompt, we will publish a script that removes it. If it ships a D-Bus age verification daemon, we will publish a package that replaces it with silence.

How One Bill Becomes Every Bill: AB 1043 was not written in isolation. It is a template. ICMEC published the model text as a ready-to-introduce statutory draft, and its Global Head of Policy presented it directly to Virginia's Joint Commission on Technology and Science. The same organizations that drafted the model bill are now deploying it in state legislatures across the country. The companies that benefit from the compliance moat fund the advocacy organizations that draft the bills that create the compliance moat. [...]

The Door That Stays Open: AB 1043 requires only self-declared age -- a birthdate field, not government ID or biometrics. Industry analysts have described this as "an initial implementation designed to get the door open." Self-declaration today. Biometric verification tomorrow. The infrastructure is the same; only the input changes. Once every operating system has an age collection interface and a real-time API for transmitting age data to applications, upgrading from a text field to a face scan is a configuration change, not a new law.

Penalty Comparison: Cost of Giving a Child a Computer:

Cost of one Ageless Linux device: $12-18
Maximum combined US penalty for one device given to one child: $46,000
US penalty-to-cost ratio: 3,067:1
Brazil penalty-to-cost ratio for one violation: up to 522,222:1

Previously, previously, previously, previously.

jwz (Jamie Zawinski)
systemd-censord
*Slow clap*

On the need for a censorship API for legal compliance reasons in some countries and U.S. states

From: FloofyWolf <debian-devel-list@floofywolf.net>
Date: Tue, 3 Mar 2026 20:38:08 -0800
To: debian-devel@lists.debian.org
Cc: xdg@lists.freedesktop.org, ubuntu-devel@lists.ubuntu.com, debian-legal@lists.debian.org, legal@lists.fedoraproject.org, devel@lists.fedoraproject.org
Subject: On the need for a censorship API for legal compliance reasons in some countries and U.S. states

Recently, a proposal has been made to implement an API for a new California censorship regulation, "On the unfortunate need for an "age verification" API for legal compliance reasons in some U.S. states" by Aaron Rainbolt. I believe the approach outlined to be very short-sighted, in that creating a bespoke API for each of the hundreds of government censorship requirements that debian will presumably now be following will result in much duplication of effort and an unreliable user experience in which important censorship restrictions may be missed and not implemented. As such, with people now supporting the idea that debian should implement government censorship requests, even creating new standards if needed, I propose the creation of a censorship framework to speed implementation of current and future censorship regulations. [...]

Systemd units will be created for every desired censorship function, and will be started based on the user's location. For example, a unit for Kazakhstan will implement the government-required backdoor, a unit for China will implement keyword scans and web access blocks (more on this later), a unit for Florida will ban all packages with "trans" in the name (201 packages in current stable distribution), a unit for Oklahoma will ensure all educational software is compliant with the Christian Holy Bible, a unit for the entire United States will prevent installation of any program capable of decoding DVD or BluRay media, and a unit for California will provide the user's age to all applications and all web sites from which applications may be downloaded. As can be seen, multiple units may be started for a given location. [...]

To prevent users from bypassing censorship requirements, debian will need to switch to being a binary-only distribution with signed binaries, signed kernel, and signed kernel modules, with mandatory secureboot, and controls to prevent any non-signed software from being installed, written, or compiled, as any foreign sources of software may fail to query systemd-censord or may fail to respect the permissions it returns.

Previously, previously, previously, previously, previously.

jwz (Jamie Zawinski)
GitHub Copilot litigation
This suit started in 2022 but seems to still be slogging along:

By training their AI systems on public GitHub repositories (though based on their public statements, possibly much more) we contend that the defendants have violated the legal rights of a vast number of creators who posted code or other work under certain open-source licenses on GitHub. Which licenses? A set of 11 popular open-source licenses that all require attribution of the author's name and copyright, including the MIT license, the GPL, and the Apache license. (These are enumerated in the appendix to the complaint.)

In addition to violating the attribution requirements of these licenses, we contend that the defendants have violated:

  • GitHub's own terms of service and privacy policies;
  • DMCA § 1202, which forbids the removal of copyright-management information;
  • the California Consumer Privacy Act;
  • and other laws giving rise to related legal claims.

Previously, previously.

jwz (Jamie Zawinski)
Top FEMA Official Claims He Teleported to Waffle House
FEMA's Gregg Phillips says he has experienced multiple "scary" episodes of sudden teleportation:

Phillips spoke "on multiple podcasts" about being teleported against his will, which he has described as "evil." As director of the Office of Response and Recovery, Phillips oversees billions in funds, and is deeply involved in rapid response efforts in the aftermath of disasters.

"Teleporting is no fun," Phillips said last year. "It's no fun because you don't really know what you're doing. You don't really understand it, it's scary, but yet so real. And you know it's happening but you can't do anything about it, and so you just go, you just go with the ride. And wow, what just an incredible adventure it all was."

Phillips in the same interview described "teleporting" to a Waffle House 50 miles away. "I was with my boys one time and I was telling them I was gonna go to Waffle House and get Waffle House," he said. "And I ended up at a Waffle House -- this was in Georgia and I end up at a Waffle House like 50 miles away from where I was."

Now, do not mistake Phillips' description for something like a medical episode or a blackout of some form. He insisted that he was traveling from location to location without experiencing the passage of time. When his friends asked him where he was, he replied that he was at the "'Waffle House in Rome, Georgia.' And they said, 'That's not possible, you just left here a moment ago.' But it was possible. It was real."

Phillips also claimed that he had once felt his car "lifted up" and teleported forty miles to a ditch near a church. [...]

At FEMA, Phillips, who lacks any sort of professional experience related to disaster response, has been successful in the sense that his lack of qualifications falls in line with the Trump administration's apparent goal of kneecapping the agency.

Previously, previously, previously, previously, previously, previously, previously, previously, previously, previously.

jwz (Jamie Zawinski)
Suspension Bridges of Disbelief, part 2
VFX artists vs. The Golden Gate:

McMurry recalls plenty of discussion about adhering to any 'real' physics if that event actually happened. "There was a very fun debate about what would happen if the center broke. Some people said, 'Well, these towers, since they're under stress, they would just go immediately back out this way.' We did some tests but then we said, 'Well, that looks stupid.'" [...]

"We were always scaling things up and down, cheating gravity stronger or weaker, just to make everything feel like it had the right scale but wasn't too slow to maintain the excitement. The destruction simulations were tweaked just to look good. They're not based on any mechanical analysis of material strengths."

"The bridge attack is one of the biggest cheats in the whole show," adds Knoll. "The nominal size for the Kaijus and Jaegers was around 250 feet tall, and we tried to stay around that where we could. The roadway of the Golden Gate Bridge is around 230 feet above the water, so the Kaiju in those shots is cheated up to around 700 feet tall!" [...]

Godzilla's interaction with the bridge -- despite ultimately breaking through it -- remains somewhat low-key. "In the movie," notes Bonami, "the cables are grabbed, shaken and cut. In reality if one of those cables were cut, the bridge would swing and if Godzilla were to walk through it the effect on the bridge would be devastating. Our challenge was therefore to downplay the physics but also to try and maintain realism."

Previously, previously.

jwz (Jamie Zawinski)
The best privacy money can buy
North Oaks, Minnesota is the only city in the United States that is not on Google Maps Street View. YouTube documentarian Chris Parr, who grew up not too far from North Oaks, set out to change that earlier this year. For a brief few days, he literally put North Oaks on the map. And then it was gone again.

"It's known by Minnesotans as a place where executives and CEOs live," Parr told 404 Media. "Famously Walter Mondale is from North Oaks, but also like United Healthcare executives and Target executives."

North Oaks has managed to largely stay unmapped on Street View because of the way the city handles its streets. In almost every city and town in the United States, property owners give an easement to their local government for the roads in front of their homes (or don't have any claim to the roads at all). In North Oaks, homeowners' property extends into the middle of the street, meaning there is literally no "public" property in the city, and the roads are maintained by the North Oaks Homeowners' Association (NOHOA): "the City owns no roads, land, or buildings." [...]

"Technically, if you launch your drone from public property, which anyone can do if you're a registered drone pilot, you can fly it straight up and above private property," Parr said. And so Parr stood at "six or seven different spots" directly outside the boundary of North Oaks and flew his drone around. "I just pulled my car over onto the shoulder and popped my drone up and flew it over," he added. [...]

"According to North Oaks' ordinances, you can go like, visit a friend, or if you're a contractor working on a house, you can go into the city, but you have to be an invited guest," Parr said. "I made a Craigslist post asking for somebody to invite me and I got an absolute ton of responses."

Previously, previously, previously, previously, previously, previously, previously, previously.

jwz (Jamie Zawinski)
Tinder to AI your dick pix
In a feature the dating app says is set to roll out in the U.S. later this spring, Tinder plans to access users' camera rolls to pick photos and determine what they're into.

"It's up to you to figure out what you're comfortable sharing back with Tinder," Tinder Head of Product Mark Kantor told 404 Media. Still, users can't pick individual photos they want analyzed or ignored. [...]

Tinder has already leaned heavily into AI. Kantor told 404 Media that artificial intelligence is writing more than half the app's code these days.

Previously, previously, previously, previously.

jwz (Jamie Zawinski)
Today in ACAB: Afroman defeats Officer Lemon Pound Cake
Afroman found not liable in bizarre Ohio defamation case

Afroman did not defame Ohio cops in a satirical music video that featured footage of them fruitlessly raiding the rapper's house, a jury found on Wednesday. [...]

The hip hop star wrote the satirical song "Lemon Pound Cake" and made a music video with real footage of the raid taken from his home surveillance cameras to raise money for property damage caused during the search, he has said.

Seven cops with the sheriff's office then sued him in March 2023, alleging the music video defamed them, invaded their constitutional privacy, and was an intentional infliction of emotional distress.

The video features footage of the cops busting down his door, and of one officer eyeing his "mama's lemon poundcake" with his gun drawn. [...]

An attorney for the police, meanwhile, demanded a total of $3.9 million in damages -- divided among the seven officers involved.

Previously, previously, previously, previously, previously, previously, previously, previously.

Avery Pennarun
Every layer of review makes you 10x slower

We’ve all heard of those network effect laws: the value of a network goes up with the square of the number of members. Or the cost of communication goes up with the square of the number of members, or maybe it was n log n, or something like that, depending on how you arrange the members. Anyway, doubling a team doesn't double its speed; there’s coordination overhead. Exactly how much overhead depends on how badly you botch the org design.

But there’s one rule of thumb that someone showed me decades ago, that has stuck with me ever since, because of how annoyingly true it is. The rule is annoying because it doesn’t seem like it should be true. There’s no theoretical basis for this claim that I’ve ever heard. And yet, every time I look for it, there it is.

Here we go:

Every layer of approval makes a process 10x slower

I know what you're thinking. Come on, 10x? That’s a lot. It’s unfathomable. Surely we’re exaggerating.

Nope.

Just to be clear, we're counting “wall clock time” here rather than effort. Almost all the extra time is spent sitting and waiting.

Look:

  • Code a simple bug fix
    30 minutes

  • Get it code reviewed by the peer next to you
    300 minutes → 5 hours → half a day

  • Get a design doc approved by your architects team first
    50 hours → about a week

  • Get it on some other team’s calendar to do all that
    (for example, if a customer requests a feature)
    500 hours → 12 weeks → one fiscal quarter
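
The ladder above is just a geometric series in the number of approval layers; a back-of-envelope sketch using exactly the rule of thumb from this post and nothing more:

```python
# Rule of thumb: each approval layer multiplies wall-clock time by 10,
# starting from a 30-minute fix.
BASE_MINUTES = 30

def wall_clock_minutes(layers: int) -> int:
    return BASE_MINUTES * 10 ** layers

labels = ["just code it", "peer review",
          "design/architecture review", "another team's calendar"]
for layers, label in enumerate(labels):
    m = wall_clock_minutes(layers)
    print(f"{layers} layer(s) ({label}): {m} min = {m / 60:g} h")
```

Running it reproduces the list: 30 minutes, 300 minutes, 50 hours, 500 hours.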

I wish I could tell you that the next step up — 10 quarters or about 2.5 years — was too crazy to contemplate, but no. That’s the life of an executive sitting above a medium-sized team; I bump into it all the time even at a relatively small company like Tailscale if I want to change product direction. (And execs sitting above large teams can’t actually do work of their own at all. That's another story.)

AI can’t fix this

First of all, this isn’t a post about AI, because AI’s direct impact on this problem is minimal. Okay, so Claude can code it in 3 minutes instead of 30? That’s super, Claude, great work.

Now you either get to spend 27 minutes reviewing the code yourself in a back-and-forth loop with the AI (this is actually kinda fun); or you save 27 minutes and submit unverified code to the code reviewer, who will still take 5 hours like before, but who will now be mad that you’re making them read the slop that you were too lazy to read yourself. Little of value was gained.

Now now, you say, that’s not the value of agentic coding. You don’t use an agent on a 30-minute fix. You use it on a monstrosity week-long project that you and Claude can now do in a couple of hours! Now we’re talking. Except no, because the monstrosity is so big that your reviewer will be extra mad that you didn’t read it yourself, and it’s too big to review in one chunk so you have to slice it into new bite-sized chunks, each with a 5-hour review cycle. And there’s no design doc so there’s no intentional architecture, so eventually someone’s going to push back on that and here we go with the design doc review meeting, and now your monstrosity week-long project that you did in two hours is... oh. A week, again.

I guess I could have called this post Systems Design 4 (or 5, or whatever I’m up to now, who knows, I’m writing this on a plane with no wifi) because yeah, you guessed it. It's Systems Design time again.

The only way to sustainably go faster is fewer reviews

It’s funny, everyone has been predicting the Singularity for decades now. The premise is we build systems that are so smart that they themselves can build the next system that is even smarter, that builds the next smarter one, and so on, and once we get that started, if they keep getting smarter fast enough, then the incremental time (t) to achieve a unit (u) of improvement goes to zero, so (u/t) goes to infinity and foom.

Anyway, I have never believed in this theory for the simple reason we outlined above: the majority of time needed to get anything done is not actually the time doing it. It’s wall clock time. Waiting. Latency.

And you can’t overcome latency with brute force.

I know you want to. I know many of you now work at companies where the business model kinda depends on doing exactly that.

Sorry.

But you can’t just not review things!

Ah, well, no, actually yeah. You really can’t.

There are now many people who have seen the symptom: the start of the pipeline (AI generated code) is so much faster, but all the subsequent stages (reviews) are too slow! And so they intuit the obvious solution: stop reviewing then!

The result might be slop, but if the slop is 100x cheaper, then it only needs to deliver 1% of the value per unit and it's still a fair trade. And if your value per unit is even a mere 2% of what it used to be, you’ve doubled your returns! Amazing.

There are some pretty dumb assumptions underlying that theory; you can imagine them for yourself. Suffice it to say that this produces what I will call the AI Developer’s Descent Into Madness:

  1. Whoa, I produced this prototype so fast! I have super powers!

  2. This prototype is getting buggy. I’ll tell the AI to fix the bugs.

  3. Hmm, every change now causes as many new bugs as it fixes.

  4. Aha! But if I have an AI agent also review the code, it can find its own bugs!

  5. Wait, why am I personally passing data back and forth between agents?

  6. I need an agent framework

  7. I can have my agent write an agent framework!

  8. Return to step 1

It’s actually alarming how many friends and respected peers I’ve lost to this cycle already. Claude Code only got good maybe a few months ago, so this only recently started happening; I assume they will emerge from the spiral eventually. I mean, I hope they will. We have no way of knowing.

Why we review

Anyway we know our symptom: the pipeline gets jammed up because of too much new code spewed into it at step 1. But what's the root cause of the clog? Why doesn’t the pipeline go faster?

I said above that this isn’t an article about AI. Clearly I’m failing at that so far, but let’s bring it back to humans. It goes back to the annoyingly true observation I started with: every layer of review is 10x slower. As a society, we know this. Maybe you haven't seen it before now. But trust me: people who do org design for a living know that layers are expensive... and they still do it.

As companies grow, they all end up with more and more layers of collaboration, review, and management. Why? Because otherwise mistakes get made, and mistakes are increasingly expensive at scale. The average value added by a new feature eventually becomes lower than the average value lost through the new bugs it causes. So, lacking a way to make features produce more value (wouldn't that be nice!), we try to at least reduce the damage.

The more checks and controls we put in place, the slower we go, but quality monotonically increases. And isn’t that the basis of continuous improvement?

Well, sort of. Monotonically increasing quality is on the right track. But “more checks and controls” went off the rails. That’s only one way to improve quality, and it's a fraught one.

“Quality Assurance” reduces quality

I wrote a few years ago about W. Edwards Deming and the "new" philosophy around quality that he popularized in Japanese auto manufacturing. (Eventually U.S. auto manufacturers more or less got the idea. So far the software industry hasn’t.)

One of the effects he highlighted was the problem of a “QA” pass in a factory: build widgets, have an inspection/QA phase, reject widgets that fail QA. Of course, your inspectors probably miss some of the failures, so when in doubt, add a second QA phase after the first to catch the remaining ones, and so on.

In a simplistic mathematical model this seems to make sense. (For example, if every QA pass catches 90% of defects, then after two QA passes you’ve reduced the number of defects by 100x. How awesome is that?)
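
That simplistic model is literally one line. A sketch of the arithmetic in the parenthetical, nothing else:

```python
# Naive independent-inspection model: each QA pass catches 90% of the
# defects that reach it, so n passes let (1 - 0.9)**n of them escape.
def escaped(defects: int, passes: int, catch_rate: float = 0.9) -> float:
    return defects * (1 - catch_rate) ** passes

print(escaped(1000, 0))  # no QA: 1000 defects ship
print(escaped(1000, 2))  # two passes: ~10 ship, the promised 100x
```

The model's fatal assumption is independence: it only holds if each pass catches defects at the same rate regardless of who else is inspecting, which is exactly what the incentives below destroy.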

But in the reality of agentic humans, it’s not so simple. First of all, the incentives get weird. The second QA team basically serves to evaluate how well the first QA team is doing; if the first QA team keeps missing defects, fire them. Now, that second QA team has little incentive to produce that outcome for their friends. So maybe they don’t look too hard; after all, the first QA team missed the defect, it’s not unreasonable that we might miss it too.

Furthermore, the first QA team knows there is a second QA team to catch any defects; if I don’t work too hard today, surely the second team will pick up the slack. That's why they're there!

Also, the team making the widgets in the first place doesn’t check their work too carefully; that’s what the QA team is for! Why would I slow down the production of every widget by being careful, at a cost of say 20% more time, when there are only 10 defects in 100 and I can just eliminate them at the next step for only a 10% waste overhead? It only makes sense. Plus they'll fire me if I go 20% slower.

To say nothing of a whole engineering redesign to improve quality, that would be super expensive and we could be designing all new widgets instead.

Sound like any engineering departments you know?

Well, this isn’t the right time to rehash Deming, but suffice it to say, he was on to something. And his techniques worked. You get things like the famous Toyota Production System where they eliminated the QA phase entirely, but gave everybody an “oh crap, stop the line, I found a defect!” button.

Famously, US auto manufacturers tried to adopt the same system by installing the same “stop the line” buttons. Of course, nobody pushed those buttons. They were afraid of getting fired.

Trust

The basis of the Japanese system that worked, and the missing part of the American system that didn’t, is trust. Trust among individuals that your boss Really Truly Actually wants to know about every defect, and wants you to stop the line when you find one. Trust among managers that executives were serious about quality. Trust among executives that individuals, given a system that can work and has the right incentives, will produce quality work and spot their own defects, and push the stop button when they need to push it.

But, one more thing: trust that the system actually does work. So first you need a system that will work.

Fallibility

AI coders are fallible; they write bad code, often. In this way, they are just like human programmers.

Deming’s approach to manufacturing didn’t have any magic bullets. Alas, you can’t just follow his ten-step process and immediately get higher quality engineering. The secret is, you have to get your engineers to engineer higher quality into the whole system, from top to bottom, repeatedly. Continuously.

Every time something goes wrong, you have to ask, “How did this happen?” and then do a whole post-mortem and the Five Whys (or however many Whys are in fashion nowadays) and fix the underlying Root Causes so that it doesn’t happen again. “The coder did it wrong” is never a root cause, only a symptom. Why was it possible for the coder to get it wrong?

The job of a code reviewer isn't to review code. It's to figure out how to obsolete their code review comment, that whole class of comment, in all future cases, until you don't need their reviews at all anymore.

(Think of the people who first created "go fmt" and how many stupid code review comments about whitespace are gone forever. Now that's engineering.)

By the time your review catches a mistake, the mistake has already been made. The root cause happened already. You're too late.

Modularity

I wish I could tell you I had all the answers. Actually I don’t have much. If I did, I’d be first in line for the Singularity because it sounds kind of awesome.

I think we’re going to be stuck with these systems pipeline problems for a long time. Review pipelines — layers of QA — don’t work. Instead, they make you slower while hiding root causes. Hiding causes makes them harder to fix.

But, the call of AI coding is strong. That first, fast step in the pipeline is so fast! It really does feel like having super powers. I want more super powers. What are we going to do about it?

Maybe we finally have a compelling enough excuse to fix the 20 years of problems hidden by code review culture, and replace it with a real culture of quality.

I think the optimists have half of the right idea. Reducing review stages, even to an uncomfortable degree, is going to be needed. But you can’t just reduce review stages without something to replace them. That way lies the Ford Pinto or any recent Boeing aircraft.

The complete package, the table flip, was what Deming brought to manufacturing. You can’t half-adopt a “total quality” system. You need to eliminate the reviews and obsolete them, in one step.

How? You can fully adopt the new system, in small bites. What if some components of your system can be built the new way? Imagine an old-school U.S. auto manufacturer buying parts from Japanese suppliers; wow, these parts are so well made! Now I can start removing QA steps elsewhere because I can just assume the parts are going to work, and my job of "assemble a bigger widget from the parts" has a ton of its complexity removed.

I like this view. I’ve always liked small beautiful things, that’s my own bias. But, you can assemble big beautiful things from small beautiful things.

It’s a lot easier to build those individual beautiful things in small teams that trust each other, that know what quality looks like to them. They deliver their things to customer teams who can clearly explain what quality looks like to them. And on we go. Quality starts bottom-up, and spreads.

I think small startups are going to do really well in this new world, probably better than ever. Startups already have fewer layers of review just because they have fewer people. Some startups will figure out how to produce high quality components quickly; others won't and will fail. Quality by natural selection?

Bigger companies are gonna have a harder time, because their slow review systems are baked in, and deleting them would cause complete chaos.

But, it’s not just about company size. I think engineering teams at any company can get smaller, and have better defined interfaces between them.

Maybe you could have multiple teams inside a company competing to deliver the same component. Each one is just a few people and a few coding bots. Try it 100 ways and see who comes up with the best one. Again, quality by evolution. Code is cheap but good ideas are not. But now you can try out new ideas faster than ever.

Maybe we’ll see a new optimal point on the monoliths-microservices continuum. Microservices got a bad name because they were too micro; in the original terminology, a “micro” service was exactly the right size for a “two pizza team” to build and operate on their own. With AI, maybe it's one pizza and some tokens.

What’s fun is you can also use this new, faster coding to experiment with different module boundaries faster. Features are still hard for lots of reasons, but refactoring and automated integration testing are things the AIs excel at. Try splitting out a module you were afraid to split out before. Maybe it'll add some lines of code. But suddenly lines of code are cheap, compared to the coordination overhead of a bigger team maintaining both parts.

Every team has some monoliths that are a little too big, and too many layers of reviews. Maybe we won't get all the way to Singularity. But, we can engineer a much better world. Our problems are solvable.

It just takes trust.

Bram Cohen
AI thoughts

Since nobody reads to the end of my posts I’ll start this one with the actionable experiment:

Deep neural networks have a fundamental problem. The thing which makes them trainable also makes them susceptible to Manchurian Candidate type attacks, where you say the right gibberish to them and it hijacks their brain to do whatever you want. They're so deeply susceptible to this that it's a miracle they do anything useful at all, but they clearly do, and mostly people just pretend this problem is academic when using them in the wild even though the attacks actually work.

There’s a loophole to this which it might be possible to make reliable: thinking. If an LLM spends time talking to itself then it might be possible for it to react to a Manchurian Candidate attack by initially being hijacked but then going ‘Wait, what am I talking about?’ and pulling itself together before giving its final answer. This is a loophole because the final answer changes chaotically with early word selection so it can’t be back propagated over.

This is something which should explicitly be trained for. During training you can even cheat and directly inject adversarial state without finding a specific adversarial prompt which causes that state. You then get its immediate and post-thinking answers to multiple choice questions and use reinforcement learning to improve its accuracy. Make sure to also train on things where it gets the right answer immediately so you aren’t just training to always change its answer. LLMs are sneaky.
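The training loop described above might be sketched like this. Everything here is a toy stand-in: the "model", the injected hijack state, and the recovery probability are hypothetical placeholders for illustration, not a real LLM API.

```python
import random

# Toy sketch of the idea: during training, "cheat" by injecting the
# adversarial hidden state directly (no need to find a prompt that
# induces it), then reward the model when its post-thinking answer is
# correct. Clean episodes are mixed in so we aren't just training it
# to always change its answer.

def post_thinking_answer(question, hijacked, recovery_skill):
    # With probability `recovery_skill`, the thinking pass notices the
    # hijack ("Wait, what am I talking about?") and restores the true
    # answer; otherwise the attacker's answer survives.
    if hijacked and random.random() >= recovery_skill:
        return "attacker"
    return question["correct"]

def reward(question, hijacked, recovery_skill):
    # +1 if the final (post-thinking) answer is correct, else 0.
    ans = post_thinking_answer(question, hijacked, recovery_skill)
    return 1.0 if ans == question["correct"] else 0.0

def evaluate(recovery_skill, n=10_000, seed=0):
    random.seed(seed)
    questions = [{"correct": "A"}] * n
    # Alternate attacked and clean episodes.
    total = sum(
        reward(q, hijacked=(i % 2 == 0), recovery_skill=recovery_skill)
        for i, q in enumerate(questions)
    )
    return total / n

untrained = evaluate(0.0)   # never recovers from the injected state
trained = evaluate(0.95)    # recovers 95% of the time after thinking
print(untrained, trained)
```

The reinforcement signal would then push `recovery_skill` (here just a knob, in reality the model's weights) toward higher post-thinking accuracy.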

Now on to rambling thoughts.

Some people nitpicked that in my last post I was a little too aggressive in leaving out normalization between layers and residuals, which is fair enough; they are important and possibly necessary details which I elided (although I did mention softmax), but they most definitely play strictly within the rules and the framework given, which was the bigger point. It's still a circuit you can back propagate over. There's a problem with online discourse in general, where people act like they've debunked an entire thesis if any nitpick can be found, even if it isn't central to the thesis, or the nitpick is over a word fumble or a simplification, or the adjustment doesn't change the accuracy of the thesis at all.

It’s beautifully intuitive how the details of standard LLM circuits fit together: Residuals stop gradient decay. Softmax stops gradient explosion. Transformers cause diffusion. Activation functions add in nonlinearity. There’s another big benefit of residuals which I find important but most people don’t worry about: If you just did a matrix multiplication then all permutations of the outputs would be isomorphic and have valid encodings effectively throwing away log(N!) bits from the weights which is a nontrivial loss. Residuals give an order and make the permutations not at all isomorphic. One quirk of the vernacular is that there isn’t a common term for the reciprocal of the gradient, the size of training adjustments, which is the actual problem. When you have gradient decay you have adjustment explosion and the first layer weights become chaotic noise. When you have gradient explosion you have adjustment decay and the first layer weights are frozen and unchanging. Both are bad for different reasons.
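A toy numerical sketch (my own illustration, not from the post) of how residuals stop gradient decay: backpropagating through a stack of small plain linear layers shrinks the gradient exponentially, while the residual layer's Jacobian is (I + W), which keeps the gradient on the same order as where it started.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, n = 50, 32
# Small random weights: each plain layer is contractive, so on its own
# it shrinks whatever gradient passes through it.
Ws = [0.05 * rng.standard_normal((n, n)) for _ in range(depth)]

g_plain = np.ones(n)
g_resid = np.ones(n)
for W in Ws:
    g_plain = W.T @ g_plain            # plain layer: Jacobian is W
    g_resid = g_resid + W.T @ g_resid  # residual layer: Jacobian is I + W

print(np.linalg.norm(g_plain))  # collapses toward zero
print(np.linalg.norm(g_resid))  # stays the same order as the input
```

The residual path contributes the identity term at every layer, so the product of Jacobians never decays to zero no matter how deep the stack gets, which is exactly the adjustment-explosion failure mode being avoided.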

There are clear tradeoffs between fundamental limitations and practical trainability. Simple DNNs get mass quantities of feedback but have slightly mysterious limitations which are terrifying. Thinking has slightly fewer limitations, at the cost of doing the thinking both during running and training, where it only gets one unit of feedback per entire session instead of per word. Genetic algorithms have no limitations at all on the kinds of functions they can handle, at the cost of being utterly incapable of utilizing back propagation. Simple mutational hill climbing has essentially no benefit over genetic algorithms.

On the subject of activation functions, sometimes now people use Relu^2 which seems directly against the rules and only works by ‘divine benevolence’. There must be a lot of devil in the details in that its non-scale-freeness is leveraged and everything is normed to make the values mostly not go above 1 so there isn’t too much gradient growth. I still maintain trying Reluss is an experiment worth doing.
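The scale-freeness point can be checked directly: ReLU is homogeneous of degree 1, so rescaling its input rescales its output by the same factor, while ReLU² is degree 2 and compounds any value above 1 at every layer unless normalization pulls it back down. A quick sketch (my own, not from the post):

```python
def relu(x):
    return max(x, 0.0)

def relu_sq(x):
    return max(x, 0.0) ** 2

a, x = 3.0, 2.0
print(relu(a * x), a * relu(x))        # equal: ReLU is scale-free
print(relu_sq(a * x), a * relu_sq(x))  # 36.0 vs 12.0: ReLU² is not

# Iterating through "layers": below 1 ReLU² shrinks values toward
# zero, above 1 it explodes double-exponentially.
v = 1.5
for _ in range(10):
    v = relu_sq(v)
print(v)  # astronomically large after only 10 layers
```

This is why a network using ReLU² has to keep its activations normed mostly below 1: the nonlinearity itself amplifies anything larger at every layer.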

Some things about the structure of LLMs are bugging me (this is a lot fuzzier and more speculative than the above). In the later layers the residuals make sense, but for the first few they're forcing it to hold onto input information in its brain while it's trying to form more abstract thoughts, so it's going to have to arbitrarily pick some bits to sacrifice. Of course the actual inputs to an LLM have special handling, so this may not matter, at least not for the main part of everything. But that raises some other points which feel off. The input handling being special is a bit weird, but maybe reasonable. It still has the property that in practice the input completely jams the first layer, for a simple practical reason: the 'context window' is basically the size of its brain. You don't have to literally overwhelm the whole first layer with it, but if you don't you're missing out on potentially useful content, so in practice people overwhelm its brain and figure the training will make reasonable tradeoffs about which tokens it starts ignoring. I suspect in practice it somewhat arbitrarily picks token offsets to ignore so it has some brain space to think. It also feels extremely weird that it has special weights for every token offset. The very last word is special, and the one before that less so, but that specialness goes down quickly, and it seems wrong that the weights relating the hundredth to the hundred and first token back are unrelated to the weights relating the hundred and first to the hundred and second token back. Those should be tied together so they get trained as one thing. I suspect that some of that is redundant and inefficient, and some of it is again ignoring parts of the input so it has brain space to think.

Bram Cohen
There's Only One Idea In AI

In 1995 someone could have written a paper which went like this (using modern vernacular) and advanced the field of AI by decades:

The central problem with building neural networks is training them when they're deeper than two layers, due to gradient explosion and gradient decay. You can get around this problem by building a neural network which has N values at each layer, which are then multiplied by an NxN matrix of weights and have Relu applied to them afterwards. This causes the derivative of effects on the last layer to be proportionate to the effects on the first layer no matter how deep the neural network is. This represents a quirky family of functions whose theoretical limitations are mysterious but which demonstrably works well for simple problems in practice. As computers get faster it will be necessary to use sub-quadratic structures for the layers.

History being the quirky thing that it is, what actually happened is that decades later the seminal paper on those sub-quadratic structures happened to stumble across making everything sublinear, and as a result people are confused as to which is actually the core insight. But the structure holds: in a deep neural network, you stick to relu, softmax, sigmoid, sin, and other sublinear functions, and magically you can train neural networks no matter how deep they are.

There are two big advantages which digital brains have over ours: first, they can be copied perfectly for free, and second, as long as they haven't diverged too much, the results of training them can be copied from one to another. Instead of a million individuals with 20 years of experience you get a million copies of one individual with 20 million years of experience. The amount of training data we humans need to become useful is minuscule compared to what current AI needs, but AI has the advantage of sheer scale.

Greg Kroah-Hartman
Linux CVE assignment process

As described previously, the Linux kernel security team does not identify or mark or announce any sort of security fixes that are made to the Linux kernel tree. So how, if the Linux kernel were to become a CVE Numbering Authority (CNA) and responsible for issuing CVEs, would the identification of security fixes happen in a way that can be done by a volunteer staff? This post goes into the process of how kernel fixes are currently automatically assigned to CVEs, and also the other “out of band” ways a CVE can be issued for the Linux kernel project.

Bram Cohen
Chords And Microtonality

When playing melodies the effects of microtonality are a bit disappointing. Tunes are still recognizable when played ‘wrong’. The effects are much more dramatic when you play chords:

You can and should play with an interactive version of this here. It’s based off this and this with labels added by me. The larger gray dots are standard 12EDO (Equal Divisions of the Octave) positions and the smaller dots are 24EDO. There are a lot of benefits of going with 24EDO for microtonality. It builds on 12EDO as a foundation, in the places where it deviates it’s as microtonal as is possible, and it hits a lot of good chords.
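One concrete way to see why 24EDO "hits a lot of good chords" (my own arithmetic, not from the post): its eleventh step lands within about 1.3 cents of the just 11/8 neutral fourth, an interval 12EDO misses by nearly 50 cents.

```python
import math

def cents(ratio):
    # Size of an interval in cents (1200 cents per octave).
    return 1200 * math.log2(ratio)

def edo_step(n, divisions=24):
    # Size in cents of step n in an equal division of the octave.
    return 1200 * n / divisions

just_11_8 = cents(11 / 8)   # just neutral fourth, ~551.3 cents
step_11 = edo_step(11)      # 11 quarter-tone steps = 550.0 cents
error = abs(just_11_8 - step_11)
print(round(just_11_8, 1), step_11, round(error, 1))
```

The nearest 12EDO pitches sit at 500 and 600 cents, so this interval simply doesn't exist there; 24EDO gets it almost exactly while keeping every 12EDO note in place.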

Unrelated to that I’d like to report on an experiment of mine which failed. I had this idea that you could balance the volumes of dissonant notes to make dyads consonant in unusual places. It turns out this fails because the second derivative of dissonance curves is negative everywhere except unity. This can’t possibly be a coincidence. If you were to freehand something which looks like dissonance curves it wouldn’t have this property. Apparently the human ear uses positions where the second derivative of dissonance is positive to figure out what points form the components of a sound and looks for patterns in those to find complete sounds.

Bram Cohen
A Legendary Poker Hand and A Big Poker Tell

Here’s the story of a legendary poker hand:

Our hero decides to play with 72, which is the worst hand in Holdem and theory says he was supposed to have folded but he played it anyway.

Later he bluffed all-in with 7332 on the board and the villain was thinking about whether to call. At this point our hero offered a side bet: For a fee you can look at one of my hole cards of your choice. The villain paid the fee and happened to see the 2, at which point he incorrectly deduced that the hero must have 22 as his hole cards and folded.

What’s going on here is that the villain had a mental model which doesn’t include side bets. It may have been theoretically wrong to play 72, but in a world where side bets are allowed and the opponent’s mental model doesn’t include them it can be profitable. The reveal of information in this case was adversarial. The fee charged for it was misdirection to make the opponent think that it was a tradeoff for value rather than information which the hero wanted to give away.

What the villain should have done was think through this one level deeper. Why is my opponent offering this at all? Under what situations would they come up with it? Even without working through the details there’s a much simpler heuristic for cutting through everything: There’s a general poker tell that if you’re considering what to do and your opponent starts talking about the hand that suggests that they want you to fold. A good rule of thumb is that if you’re thinking and the opponent offers some cockamamie scheme you should just call. That certainly would have worked in this case. This seems like a rule which applies in general in life, not just in Poker.

Bram Cohen
Camper Vehicles

Let’s say you wanted an offroad vehicle which rather than being a car-shaped cowboy hat was actually useful for camping. How would it be configured?

The way people really into camping approach the process is very strange to normal people and does a negative job of marketing it. You drive to the campground in a perfectly good piece of shelter and then pitch a tent. Normal people aren’t there to rough it, they’re there to enjoy nature, and sleeping in one’s car is a much more reasonable approach.

To that end a camper vehicle should have built-in insulation, motorized roll-up window covers, and fold-up rear seats. You drive to the campground, press the button for privacy on the windows, fold up the seats, and bam, you’re all set.

It should have a big electric battery with a range extender, optimized for charging overnight. The waste heat from the charging process can keep the vehicle warm while you sleep in it.

Roughly 8 inches of elevation off the ground, and a compliant suspension designed for comfort on poorly maintained roads rather than for feeling sporty.

Compact hatchback form with boxy styling. Hatchbacks are already boxy to begin with and a flat front windshield works well with window covers so it’s both functional and matches the aesthetics.

Available modular fridge, induction plate, and water heater. With custom connectors to the car’s battery the electric cooking elements could ironically be vastly better than the ones in your kitchen.

Unfortunately having a built-in shower or toilet is impossible in a compact but the above features might be enough to make it qualify as a camper van which you’re allowed to live in. They’d at least make it practical to inconspicuously live in one’s car and shower at a gym.
