This feed omits posts by rms. Just 'cause.

Please enjoy jwz mixtape 228.

Posted Sun Jun 13 16:57:46 2021 Tags:
Diver Nearly Swallowed By Humpback Whale

"I was in his closed mouth for about 30 to 40 seconds before he rose to the surface and spit me out," Packard wrote. "I am very bruised up but have no broken bones." [...]

"To be clear, the whale did not want him in its mouth," Kerr added, comparing the situation to an open-mouthed biker accidentally inhaling a fly.

Previously, previously, previously, previously, previously, previously, previously, previously, previously, previously.

Posted Sun Jun 13 06:06:50 2021 Tags:
"Ultimate Slip 'N Slide" Production Shut Down Over Explosive Diarrhea.

The upcoming TV competition series has shut down indefinitely after up to 40 crew members fell violently ill during production on a remote ranch in Simi Valley, California. The outbreak of "awful explosive diarrhea" left people "collapsing" on set and "being forced to run into port-o-potties." [...] According to an insider, many original crew members are reluctant to return to the shoot and blame the production for not ensuring that fresh, clean water was used on set.

Plus side, maybe Ow My Balls will get picked up for a full season.

Previously, previously, previously, previously, previously, previously, previously.

Posted Sat Jun 12 17:23:49 2021 Tags:
I always enjoy it when people tell on themselves with their font changes, making it clear that they have sent a hastily-edited form letter rather than a sincere one-to-one communication. Here's what a recent email looked like when viewed in Mail on iOS:

And here's what it looked like in Mail on macOS. The difference is more subtle, but the text of [MAJOR METROPOLITAN AREA] is still smaller (capital letters are 25px instead of 28px).

Also one of them is Arial and the other is Helvetica. Now, I know what you're thinking -- normally when someone makes that distinction you know that you're dealing with some kind of font-hipster super-nerd. This is true. But when they're side by side, sometimes even normal people can tell them apart!

Here's an easy way to screw up like this on macOS:

  • Compose an email in Mail.app.
  • Format / Make Rich Text.
  • Type stuff.
  • Open Notes.app and copy stuff.
  • Paste.

You will almost certainly get different fonts, because you didn't do "Paste and Match Style".

"Make Plain Text" also works, because nobody needs to see the damned fonts in your damned email in the first place (sorry not sorry).

And it's not just Apple; the Google and MICROS~1 tool sets all fuck up in nearly identical ways. Besides the highly visible breadcrumbs that they reveal through this kind of font fuckery, view source also reveals many other edits. You tend to get a new DIV or SPAN every time you paste something or join paragraphs, and that reveals clues about your editing process, information that you probably didn't intend the recipient to have.
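
Here's a quick way to see it from the receiving end: a hypothetical sketch of my own (not anything Mail.app or the other tools provide; "message.html" stands in for a saved copy of the message's HTML part) that lists the distinct font declarations a message carries. More than one font-family or font-size is usually the tell.

    # Rough sketch: collect the font-family / font-size declarations in an
    # HTML email body. Several different values = something was pasted in.
    from html.parser import HTMLParser
    import re

    class FontSniffer(HTMLParser):
        def __init__(self):
            super().__init__()
            self.declarations = set()

        def handle_starttag(self, tag, attrs):
            style = dict(attrs).get("style", "")
            for prop in ("font-family", "font-size"):
                m = re.search(prop + r"\s*:\s*([^;]+)", style, re.IGNORECASE)
                if m:
                    self.declarations.add(prop + ": " + m.group(1).strip())

    sniffer = FontSniffer()
    sniffer.feed(open("message.html").read())
    for decl in sorted(sniffer.declarations):
        print(decl)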

I guess it's unsurprising that the people writing these rich-text editors screw up in this way -- it's probably inconceivable to them to be concerned about the on-the-wire representation of the data they are emitting, beyond whether it displays correctly for them personally. For other examples of this lack of craftsmanship just look at how pathetically every mail reader encodes quoted-printable -- a format intentionally designed to be readable by humans in the raw, but now universally treated as a binary blob. Or the criminally incompetent manner in which they all convert HTML to text/plain in multipart/alternative. Or, gestures wildly at the entirety of XML and JSON.

But the font size changes are surprising, because they're so obvious. They make you look like you screwed up. Surely people have complained about this. And by "people" I mean "people who matter", you know, wealthy clients.

Previously, previously, previously, previously, previously, previously, previously, previously, previously.

Posted Fri Jun 11 17:47:30 2021 Tags:
XScreenSaver 6.01 is out now. This is an X11-only maintenance release, so there are no macOS, iOS or Android packages this time around.

The 6.00 refactor of the X11 XScreenSaver daemon has so far proven to be very stable and secure! The bugs fixed in this release are relatively minor. However:

  • If you are using 6.00, I strongly recommend upgrading to 6.01.
  • If you are using 5.xx, I strongly, strongly recommend upgrading to 6.01.

Sadly, due to recent catastrophic decision-making on the part of certain Linux distros, I was forced to add the following section to the XScreenSaver manual:

THE WAYLAND PROBLEM

Wayland is a completely different window system that is intended to replace X11. After 13+ years of trying, some Linux distros have finally begun enabling it by default. Most deployments of it also include XWayland, which is a compatibility layer that allows some X11 programs to continue to work within a Wayland environment.

Unfortunately, XScreenSaver is not one of those programs.

If your system is running XWayland, XScreenSaver will malfunction in two ways:

  1. It will be unable to detect user activity in non-X11 programs. This means that while a native Wayland program is selected, XScreenSaver will think that you are idle, and may blank the screen prematurely.

  2. It will be unable to lock the screen. This is because X11 grabs don't work properly under XWayland, so there is no way for XScreenSaver to prevent the user from switching away from the screen locker to another application.

If you are aware of some way for XScreenSaver to reliably detect user activity under Wayland, do let me know. Maybe there is some dbus/systemd signal that I have missed?

Now that the XScreenSaver 6.x daemon has been sandboxed and massively reduced in size, it might be plausible for someone to rewrite xscreensaver.c to speak Wayland instead of X11, to run inside the Wayland compositor, and then to launch the X11 programs xscreensaver-auth and xscreensaver-gfx as needed. But that someone will not be me.

Previously, previously.

Posted Wed Jun 9 16:34:45 2021 Tags:
One of life's simple joys is pointing a video camera at a TV. We should all do that more often. We recently got a screaming deal on this gargantuan 75" TV, so we turned it into an instagram trap under the main stairs. I think it's going to be very popular! "What software does it use?" None, it uses none software.

Video feedback works better if it stays in the analog domain, as with 30+ year old camcorders and CRTs rather than that four letter word "HDMI", but one does what one can.

Hey, here's an interior decoration project maybe one of you could help us out with. The Green Room, that room to the South of the Lounge that you enter from the stage-left stairs, has always been less inviting than we'd like. People treat it as more of a hallway than as a place to hang out. I think it could use a bunch of overstuffed thrift-shop couches and chairs -- really gaudy stuff -- and also a shitload of chandeliers. It has a surprisingly high ceiling, and I think that filling that space with dozens of chandeliers might look pretty cool.

So, if you are a person who already has the hobby of frequenting thrift shops and estate sales, I'd love to give you a small monthly budget for broken down chandeliers and couches to dress up that room. (The couches, I assume, won't long survive contact with the enemy and will need to be periodically replenished.)

Posted Sun Jun 6 17:34:46 2021 Tags:


Sometimes you just have to ask the right question. In all the excitement I forgot why I asked this specifically, but I duckducked "use maven central in python" and ran into scyjava. The documentation was not extensive, but installation was the logical pip install scyjava.

And the cool thing is, the Groovy code examples can be converted without big changes.

    from scyjava import config, jimport

    config.add_endpoints(
        'io.github.egonw.bacting:managers-cdk:0.0.16'
    )

    workspaceRoot = "."
    cdkClass = jimport("net.bioclipse.managers.CDKManager")
    cdk = cdkClass(workspaceRoot)
    print(cdk.fromSMILES("COC"))

The first two instructions are the equivalent of the @Grab in Groovy. The next three instructions define the Bacting manager, and the last line shows how to use the Chemistry Development Kit from Python!

Posted Sat Apr 3 09:54:00 2021 Tags:

TL;DR: There should be an option, taproot=lockintrue, which allows users to set lockin-on-timeout to true. It should not be the default, though.

As stated in my previous post, we need actual consensus, not simply the appearance of consensus. I’m pretty sure we have that for taproot, but I would like a template we can use in future without endless debate each time.

  • Giving every group a chance to openly signal for (or against!) gives us the most robust assurance that we actually have consensus. Being able to signal opposition is vital, since everyone can lie anyway; making opposition difficult just reduces the reliability of the signal.
  • Developers should not activate. They’ve tried to assure themselves that there’s broad approval of the change, but that’s not really a transferable proof. We should be concerned about future corruption, insanity, or groupthink. Moreover, even the perception that developers can set the rules will lead to attempts to influence them as Bitcoin becomes more important. As a (non-Bitcoin-core) developer I can’t think of a worse hell myself, nor do we want to attract developers who want to be influenced!
  • Miner activation is actually brilliant. It’s easy for everyone to count, and majority miner enforcement is sufficient to rely on the new rules. But its real genius is that miners are most directly vulnerable to the economic majority of users: in a fork they have to pick sides continuously knowing that if they are wrong, they will immediately suffer economically through missed opportunity cost.
  • Of course, economic users are ultimately in control. Any system which doesn’t explicitly encode that is fragile; nobody would argue that fair elections are unnecessary because if people were really dissatisfied they could always overthrow the government themselves! We should make it as easy for them to exercise this power as possible: this means not requiring them to run unvetted or home-brew modifications which will place them at more risk, so developers need to supply this option (setting it should also change the default User-Agent string, for signalling purposes). It shouldn’t be an upgrade either (which inevitably comes with other changes). Such a default-off option provides both a simple method, and a Schelling point for the lockinontimeout parameters. It also means much less chance of this power being required: “Si vis pacem, para bellum” (if you want peace, prepare for war).

This triumvirate model may seem familiar, being widely used in various different governance systems. It seems the most robust to me, and is very close to what we have evolved into already. Formalizing it reduces uncertainty for any future changes, as well.

Posted Fri Feb 26 02:17:10 2021 Tags:

Bitcoin’s consensus rules define what is valid, but this isn’t helpful when we’re looking at changing the rules themselves. The trend in Bitcoin has been to make such changes in an increasingly inclusive and conservative manner, but we are still feeling our way through this, and appreciating more nuance each time we do so.

To use Bitcoin, you need to remain in the supermajority of consensus on what the rules are. But you can never truly know if you are. Everyone can signal, but everyone can lie. You can’t know what software other nodes or miners are running: even expensive testing of miners by creating an invalid block only tests one possible difference, may still give a false negative, and doesn’t mean they can’t change a moment later.

This risk of being left out is heightened greatly when the rules change. This is why we need to rely on multiple mechanisms to reassure ourselves that consensus will be maintained:

  1. Developers assure themselves that the change is technically valid, positive and has broad support. The main tools for this are open communication, and time. Developers signal support by implementing the change.
  2. Users signal their support by upgrading their nodes.
  3. Miners signal their support by actually tagging their blocks.

We need actual consensus, not simply the appearance of consensus. Thus it is vital that all groups know they can express their approval or rejection, in a way they know will be heard by others. In the end, the economic supermajority of Bitcoin users can set the rules, but no other group or subgroup should have inordinate influence, nor should they appear to have such control.

The Goodwill Dividend

A Bitcoin community which has consensus and knows it is not only safest from a technical perspective: the goodwill and confidence gives us all assurance that we can make (or resist!) changes in future.

It will also help us defend against the inevitable attacks and challenges we are going to face, which may be a more important effect than any particular soft-fork feature.

Posted Thu Feb 18 03:29:02 2021 Tags:

Last year I wrote about how to create a user-specific XKB layout, followed by a post explaining that this won't work in X. But there's a pandemic going on, which is presumably the only reason people haven't all switched to Wayland yet. So it was time to figure out a workaround for those still running X.

This Merge Request (scheduled for xkeyboard-config 2.33) adds a "custom" layout to the evdev.xml and base.xml files. These XML files are parsed by the various GUI tools to display the selection of available layouts. An entry in there will thus show up in the GUI tool.

Our rulesets, i.e. the files that convert a layout/variant configuration into the components to actually load, already have wildcard matching [1]. So the custom layout will resolve to the symbols/custom file in your XKB data dir - usually /usr/share/X11/xkb/symbols/custom.

This file is not provided by xkeyboard-config. It can be created by the user though and whatever configuration is in there will be the "custom" keyboard layout. Because xkeyboard-config does not supply this file, it will not get overwritten on update.

From XKB's POV it is just another layout and it thus uses the same syntax. For example, to override the +/* key on the German keyboard layout with a key that produces a/b/c/d on the various Shift/Alt combinations, use this:


  default
  xkb_symbols "basic" {
      include "de(basic)"
      key <AD12> { [ a, b, c, d ] };
  };

This example includes the "basic" section from the symbols/de file (i.e. the default German layout), then overrides the 12th alphanumeric key from left in the 4th row from bottom (D) with the given symbols. I'll leave it up to the reader to come up with a less useful example.

There are a few drawbacks:

  • If the file is missing and the user selects the custom layout, the results are... undefined. For run-time configuration like GNOME it doesn't really matter - the layout compilation fails and you end up with the one the device already had (i.e. the default one built into X, usually the US layout).
  • If the file is missing and the custom layout is selected in the xorg.conf, the results are... undefined. I tested it and ended up with the US layout but that seems more by accident than design. My recommendation is to not do that.
  • No variants are available in the XML files, so the only accessible section is the one marked default.
  • If a commandline tool uses a variant of custom, the GUI will not reflect this. If the GUI goes boom, that's a bug in the GUI.

So overall, it's a hack[2]. But it's a hack that fixes real user issues and given we're talking about X, I doubt anyone notices another hack anyway.

[1] If you don't care about GUIs, setxkbmap -layout custom -variant foobar has been possible for years.
[2] Sticking with the UNIX principle, it's a hack that fixes the issue at hand, is badly integrated, and weird to configure.

Posted Thu Feb 18 01:57:00 2021 Tags:

As was pointed out to us stable kernel maintainers last week, the overflow of the .y release number was going to happen soon, and our proposed solution for it (using 16 bits instead of 8) turned out to break a userspace-visible API.
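
For context, here is a minimal sketch of why the 8-bit field matters (my own illustration based on how the userspace-visible LINUX_VERSION_CODE is packed, not code from the announcement):

    # LINUX_VERSION_CODE packs a version as (major << 16) + (minor << 8) + patch,
    # so a .y (patch) value of 256 spills into the minor field.
    def kernel_version(major, minor, patch):
        return (major << 16) + (minor << 8) + patch

    print(hex(kernel_version(4, 9, 255)))  # 0x409ff
    print(hex(kernel_version(4, 9, 256)))  # 0x40a00 -- indistinguishable from 4.10.0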

As we can’t really break this, I did a release of the 4.4.256 and 4.9.256 releases today that contain nothing but a new version number. See the links for the full technical details if curious.

Right now I’m asking everyone who uses these older kernel releases to upgrade to this release and do a full rebuild of their systems, in order to see what might, or might not, break. If problems happen, please let us know on the stable@vger.kernel.org mailing list as soon as possible, as I can only hold off on doing new stable releases for these branches for a single week (i.e. until February 12, 2021).

Posted Fri Feb 5 15:10:32 2021 Tags:

A number of months ago I did an “Ask Me Anything” interview on r/linux on Reddit. As part of that, a discussion of the hardware I used came up, and someone said, “I know someone that can get you a new machine” and “get that person a new machine!”, or something like that.

Fast forward a few months, and a “beefy” AMD Threadripper 3970X shows up on my doorstep thanks to the amazing work of Wendell Wilson at Level One Techs.

Ever since I started doing Linux kernel development the hardware I use has been a mix of things donated to me for development (workstations from Intel and IBM, laptops from Dell), machines my employers have bought for me (various laptops over the years), and machines I’ve bought on my own because I “needed” them (workstations built from scratch, Apple Mac Minis, laptops from Apple and Dell and ASUS and Panasonic). I know I am extremely lucky in this position, and anything that has been donated to me has been done so only to ensure that the hardware works well on Linux. “Will code for hardware” was an early mantra of many kernel developers, myself included, and hardware companies are usually willing to donate machines and peripherals to ensure kernel support.

This new AMD machine is just another in a long line of good workstations that help me read email really well. Oops, I mean, “do kernel builds really fast”…

For full details on the system, see this forum description, and this video that Wendell did in building the machine, and then this video of us talking about it before it was sent out. We need to do a follow-on one now that I’ve had it for a few months and have gotten used to it.

Benchmark tools

Below I post the results of some benchmarks that I have done to try to show the speed of different systems. I’ve used the tool Fio version fio-3.23-28-g7064, kcbench version v0.9.0 (from git), and perf version 5.7.g3d77e6a8804a. All of these are great for doing real-world tests of I/O systems (fio), kernel build tests (kcbench), and “what is my system doing at the moment” queries (perf). I recommend trying all of these out yourself if you haven’t done so already.

Fast Builds

I’ve been using a laptop as my primary development system for a number of years now, due to travel and moving around a bit, and because it was just “good enough” at the time. I do some local builds and testing, but have a “build machine” in a data center somewhere that I do all of my normal stable kernel builds on, as it is much, much faster than any laptop. It is set up to do kernel builds directly off of a RAM disk, ensuring that I/O isn’t an issue. Given that it has 128GB of RAM, carving out a 40GB ramdisk for kernel builds to run on (room for 4-5 at once) has worked really well, with builds of a full kernel tree finishing in a few minutes.

Here’s the output of kcbench on my data center build box which is running Fedora 32:

Processor:           Intel Core Processor (Broadwell) [40 CPUs]
Cpufreq; Memory:     Unknown; 120757 MiB
Linux running:       5.8.7-200.fc32.x86_64 [x86_64]
Compiler:            gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)
Linux compiled:      5.7.0 [/home/gregkh/.cache/kcbench/linux-5.7]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command:       make vmlinux
Filling caches:      This might take a while... Done
Run 1 (-j 40):       81.92 seconds / 43.95 kernels/hour [P:3033%]
Run 2 (-j 40):       83.38 seconds / 43.18 kernels/hour [P:2980%]
Run 3 (-j 46):       82.11 seconds / 43.84 kernels/hour [P:3064%]
Run 4 (-j 46):       81.43 seconds / 44.21 kernels/hour [P:3098%]
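
(For what it's worth, the "kernels/hour" figure is just 3600 divided by the build time in seconds; a quick sanity check of my own, not kcbench output:)

    print(round(3600 / 81.92, 2))   # 43.95, matching the first run above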

Contrast that with my current laptop:

Processor:           Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz [8 CPUs]
Cpufreq; Memory:     powersave [intel_pstate]; 15678 MiB
Linux running:       5.8.8-arch1-1 [x86_64]
Compiler:            gcc (GCC) 10.2.0
Linux compiled:      5.7.0 [/home/gregkh/.cache/kcbench/linux-5.7]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command:       make vmlinux
Filling caches:      This might take a while... Done
Run 1 (-j 8):        392.69 seconds / 9.17 kernels/hour [P:768%]
Run 2 (-j 8):        393.37 seconds / 9.15 kernels/hour [P:768%]
Run 3 (-j 10):       394.14 seconds / 9.13 kernels/hour [P:767%]
Run 4 (-j 10):       392.94 seconds / 9.16 kernels/hour [P:769%]
Run 5 (-j 4):        441.86 seconds / 8.15 kernels/hour [P:392%]
Run 6 (-j 4):        440.31 seconds / 8.18 kernels/hour [P:392%]
Run 7 (-j 6):        413.48 seconds / 8.71 kernels/hour [P:586%]
Run 8 (-j 6):        412.95 seconds / 8.72 kernels/hour [P:587%]

Then the new workstation:

Processor:           AMD Ryzen Threadripper 3970X 32-Core Processor [64 CPUs]
Cpufreq; Memory:     schedutil [acpi-cpufreq]; 257693 MiB
Linux running:       5.8.8-arch1-1 [x86_64]
Compiler:            gcc (GCC) 10.2.0
Linux compiled:      5.7.0 [/home/gregkh/.cache/kcbench/linux-5.7/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command:       make vmlinux
Filling caches:      This might take a while... Done
Run 1 (-j 64):       37.15 seconds / 96.90 kernels/hour [P:4223%]
Run 2 (-j 64):       37.14 seconds / 96.93 kernels/hour [P:4223%]
Run 3 (-j 71):       37.16 seconds / 96.88 kernels/hour [P:4240%]
Run 4 (-j 71):       37.12 seconds / 96.98 kernels/hour [P:4251%]
Run 5 (-j 32):       43.12 seconds / 83.49 kernels/hour [P:2470%]
Run 6 (-j 32):       43.81 seconds / 82.17 kernels/hour [P:2435%]
Run 7 (-j 38):       41.57 seconds / 86.60 kernels/hour [P:2850%]
Run 8 (-j 38):       42.53 seconds / 84.65 kernels/hour [P:2787%]

Having a local machine that builds kernels faster than my external build box has been a liberating experience. I can do many more local tests before sending things off to the build systems for “final test builds” there.

Here’s a picture of my local box doing kernel builds, and the remote machine doing builds at the same time, both running bpytop to monitor what is happening (htop doesn’t work well for huge numbers of cpus). It’s not really all that useful, but is fun eye-candy:

SSD vs. NVME

As shipped to me, the machine booted from a RAID array of NVME disks. Outside of laptops, I’ve not used NVME disks, only SSDs. Given that I didn’t really “trust” the Linux install on the disks, I deleted the data on them, installed a trusty SATA SSD, and got Linux up and running well on it.

After that was all up and running well (btw, I use Arch Linux), I looked into the NVME disk, to see if it really would help my normal workflow out or not.

Firing up fio, here are the summary numbers of the different disk systems using the default “examples/ssd-test.fio” test settings:

SSD:

Run status group 0 (all jobs):
   READ: bw=219MiB/s (230MB/s), 219MiB/s-219MiB/s (230MB/s-230MB/s), io=10.0GiB (10.7GB), run=46672-46672msec

Run status group 1 (all jobs):
   READ: bw=114MiB/s (120MB/s), 114MiB/s-114MiB/s (120MB/s-120MB/s), io=6855MiB (7188MB), run=60001-60001msec

Run status group 2 (all jobs):
  WRITE: bw=177MiB/s (186MB/s), 177MiB/s-177MiB/s (186MB/s-186MB/s), io=10.0GiB (10.7GB), run=57865-57865msec

Run status group 3 (all jobs):
  WRITE: bw=175MiB/s (183MB/s), 175MiB/s-175MiB/s (183MB/s-183MB/s), io=10.0GiB (10.7GB), run=58539-58539msec

Disk stats (read/write):
  sda: ios=4375716/5243124, merge=548/5271, ticks=404842/436889, in_queue=843866, util=99.73%

NVME:

Run status group 0 (all jobs):
   READ: bw=810MiB/s (850MB/s), 810MiB/s-810MiB/s (850MB/s-850MB/s), io=10.0GiB (10.7GB), run=12636-12636msec

Run status group 1 (all jobs):
   READ: bw=177MiB/s (186MB/s), 177MiB/s-177MiB/s (186MB/s-186MB/s), io=10.0GiB (10.7GB), run=57875-57875msec

Run status group 2 (all jobs):
  WRITE: bw=558MiB/s (585MB/s), 558MiB/s-558MiB/s (585MB/s-585MB/s), io=10.0GiB (10.7GB), run=18355-18355msec

Run status group 3 (all jobs):
  WRITE: bw=553MiB/s (580MB/s), 553MiB/s-553MiB/s (580MB/s-580MB/s), io=10.0GiB (10.7GB), run=18516-18516msec

Disk stats (read/write):
    md0: ios=5242880/5237386, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=1310720/1310738, aggrmerge=0/23, aggrticks=63986/25048, aggrin_queue=89116, aggrutil=97.67%
  nvme3n1: ios=1310720/1310729, merge=0/0, ticks=63622/25626, in_queue=89332, util=97.63%
  nvme0n1: ios=1310720/1310762, merge=0/92, ticks=63245/25529, in_queue=88858, util=97.67%
  nvme1n1: ios=1310720/1310735, merge=0/3, ticks=64009/24018, in_queue=88114, util=97.58%
  nvme2n1: ios=1310720/1310729, merge=0/0, ticks=65070/25022, in_queue=90162, util=97.49%

Full logs of both tests can be found here for the SSD, and here for the NVME array.

Basically the NVME array is up to 3 times faster than the SSD, depending on the specific read/write test, and is faster for everything overall.

But, does such fast storage actually matter for my normal workload of kernel builds? Normally a kernel build is very I/O intensive, but only up to a point. If the storage system can keep the CPU “full” of new data to build, and writes do not stall, a kernel build should be limited by CPU power rather than by storage speed.

So, is an SSD “fast” enough on a huge AMD Threadripper system?

In short, yes. Here’s the output of kcbench running on the NVME disk:

Processor:           AMD Ryzen Threadripper 3970X 32-Core Processor [64 CPUs]
Cpufreq; Memory:     schedutil [acpi-cpufreq]; 257693 MiB
Linux running:       5.8.8-arch1-1 [x86_64]
Compiler:            gcc (GCC) 10.2.0
Linux compiled:      5.7.0 [/home/gregkh/.cache/kcbench/linux-5.7/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command:       make vmlinux
Filling caches:      This might take a while... Done
Run 1 (-j 64):       36.97 seconds / 97.38 kernels/hour [P:4238%]
Run 2 (-j 64):       37.18 seconds / 96.83 kernels/hour [P:4220%]
Run 3 (-j 71):       37.14 seconds / 96.93 kernels/hour [P:4248%]
Run 4 (-j 71):       37.22 seconds / 96.72 kernels/hour [P:4241%]
Run 5 (-j 32):       44.77 seconds / 80.41 kernels/hour [P:2381%]
Run 6 (-j 32):       42.93 seconds / 83.86 kernels/hour [P:2485%]
Run 7 (-j 38):       42.41 seconds / 84.89 kernels/hour [P:2797%]
Run 8 (-j 38):       42.68 seconds / 84.35 kernels/hour [P:2787%]

Almost the exact same number of kernels built per hour.

So for a kernel developer, right now, an SSD is “good enough”, right?

It’s not just all builds

While kernel builds are the most time-consuming thing that I do on my systems, the other “heavy” thing that I do is lots of git commands on the Linux kernel tree. git is really fast, but it is limited by the speed of the storage medium for lots of different operations (clones, switching branches, and the like).

After I switched to running my kernel trees off of the NVME storage, it “felt” like git was going faster, so I came up with some totally artificial benchmarks to try to see if this was really true or not.

One common thing is cloning a whole kernel tree from a local version in a new directory to do different things with it. Git is great in that you can keep the “metadata” in one place, and only check out the source files in the new location, but dealing with 70 thousand files is not “free”.

$ cat clone_test.sh
#!/bin/bash
git clone -s ../work/torvalds/ test
sync

And, to make sure the data isn’t just coming out of the kernel cache, be sure to flush all caches first.

SSD output:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ perf stat ./clone_test.sh
Cloning into 'test'...
done.
Updating files: 100% (70006/70006), done.

 Performance counter stats for './clone_test.sh':

          4,971.83 msec task-clock:u              #    0.536 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            92,713      page-faults:u             #    0.019 M/sec
    14,623,046,712      cycles:u                  #    2.941 GHz                      (83.18%)
       720,522,572      stalled-cycles-frontend:u #    4.93% frontend cycles idle     (83.40%)
     3,179,466,779      stalled-cycles-backend:u  #   21.74% backend cycles idle      (83.06%)
    21,254,471,305      instructions:u            #    1.45  insn per cycle
                                                  #    0.15  stalled cycles per insn  (83.47%)
     2,842,560,124      branches:u                #  571.734 M/sec                    (83.21%)
       257,505,571      branch-misses:u           #    9.06% of all branches          (83.68%)

       9.270460632 seconds time elapsed

       3.505774000 seconds user
       1.435931000 seconds sys

NVME disk:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
~/linux/tmp $ perf stat ./clone_test.sh
Cloning into 'test'...
done.
Updating files: 100% (70006/70006), done.

 Performance counter stats for './clone_test.sh':

          5,183.64 msec task-clock:u              #    0.833 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            87,409      page-faults:u             #    0.017 M/sec
    14,660,739,004      cycles:u                  #    2.828 GHz                      (83.46%)
       712,429,063      stalled-cycles-frontend:u #    4.86% frontend cycles idle     (83.40%)
     3,262,636,019      stalled-cycles-backend:u  #   22.25% backend cycles idle      (83.09%)
    21,241,797,894      instructions:u            #    1.45  insn per cycle
                                                  #    0.15  stalled cycles per insn  (83.50%)
     2,839,260,818      branches:u                #  547.735 M/sec                    (83.30%)
       258,942,077      branch-misses:u           #    9.12% of all branches          (83.25%)

       6.219492326 seconds time elapsed

       3.336154000 seconds user
       1.593855000 seconds sys

So a “clone” is faster by 3 seconds; nothing earth-shattering, but noticeable.

But clones are rare; what’s more common is switching between branches, which checks out a subset of the files depending on what is contained in each branch. It’s a lot of logic to figure out exactly what files need to change.

Here’s the test script:

$ cat branch_switch_test.sh
#!/bin/bash
cd test
git checkout -b old_kernel v4.4
sync
git checkout -b new_kernel v5.8
sync

And the results on the different disks:

SSD:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ perf stat ./branch_switch_test.sh
Updating files: 100% (79044/79044), done.
Switched to a new branch 'old_kernel'
Updating files: 100% (77961/77961), done.
Switched to a new branch 'new_kernel'

 Performance counter stats for './branch_switch_test.sh':

          10,500.82 msec task-clock:u              #    0.613 CPUs utilized
                  0      context-switches:u        #    0.000 K/sec
                  0      cpu-migrations:u          #    0.000 K/sec
            195,900      page-faults:u             #    0.019 M/sec
     27,773,264,048      cycles:u                  #    2.645 GHz                      (83.35%)
      1,386,882,131      stalled-cycles-frontend:u #    4.99% frontend cycles idle     (83.54%)
      6,448,903,713      stalled-cycles-backend:u  #   23.22% backend cycles idle      (83.22%)
     39,512,908,361      instructions:u            #    1.42  insn per cycle
                                                   #    0.16  stalled cycles per insn  (83.15%)
      5,316,543,747      branches:u                #  506.298 M/sec                    (83.55%)
        472,900,788      branch-misses:u           #    8.89% of all branches          (83.18%)

      17.143453331 seconds time elapsed

       6.589942000 seconds user
       3.849337000 seconds sys

NVME:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
~/linux/tmp $ perf stat ./branch_switch_test.sh
Updating files: 100% (79044/79044), done.
Switched to a new branch 'old_kernel'
Updating files: 100% (77961/77961), done.
Switched to a new branch 'new_kernel'

 Performance counter stats for './branch_switch_test.sh':

         10,945.41 msec task-clock:u              #    0.921 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
           197,776      page-faults:u             #    0.018 M/sec
    28,194,940,134      cycles:u                  #    2.576 GHz                      (83.37%)
     1,380,829,465      stalled-cycles-frontend:u #    4.90% frontend cycles idle     (83.14%)
     6,657,826,665      stalled-cycles-backend:u  #   23.61% backend cycles idle      (83.37%)
    41,291,161,076      instructions:u            #    1.46  insn per cycle
                                                  #    0.16  stalled cycles per insn  (83.00%)
     5,353,402,476      branches:u                #  489.100 M/sec                    (83.25%)
       469,257,145      branch-misses:u           #    8.77% of all branches          (83.87%)

      11.885845725 seconds time elapsed

       6.741741000 seconds user
       4.141722000 seconds sys

Just over 5 seconds faster on the NVME disk array.

Now 5 seconds doesn’t sound like much, but I’ll take it…

Conclusion

If you haven’t looked into new hardware in a while, or are stuck doing kernel development on a laptop, please seriously consider doing so; the power in a small desktop tower these days (and who is traveling anymore that needs a laptop?) is well worth it, if possible.

Again, many thanks to Level1Techs for the hardware, it’s been put to very good use.

Posted Wed Feb 3 17:21:34 2021 Tags:

A recent email thread about “Why isn’t the 5.10 stable kernel listed as supported for 6 years yet!” on the linux-kernel mailing list ended up generating a bunch of direct emails to me asking what different companies and individuals could do to help out. What exactly was I looking for here?

Instead of having to respond to private emails with the same information over and over, I figured it was better to just put it here so that everyone can see what exactly I am expecting with regards to support in order to be able to maintain a kernel for longer than 2 years:

What I need help with

All I request is that people test the -rc releases when I announce them, and let me know if they work or not for their systems/workloads/tests/whatever.

If you look at the -rc announcements today, you will see a number of different people/groups responding with this information. If they want, they can provide a Tested-by: ... line that I will add to the release commit, or not, that’s up to them.

Here and here and here and here and here are all great examples of how people let me know that all is ok with the -rc kernels so that I know it is “safe” to do the release.

I also have a few companies send me private emails that all is good, there’s no requirement to announce this in public if you don’t want to (but it is nice, as kernel development should be done in public.)

Some companies can’t do tests on -rc releases due to their build infrastructures not handling that very well, so they email me after the stable release is out, saying all is good. Worst case, we end up reverting a patch in a released kernel, but it’s better to quickly do that based on testing than to miss it entirely because no one is testing at all.

And that’s it!

But, if you want to do more, I always really appreciate when people email me, or stable@vger.kernel.org, git commit ids that need to be backported to specific stable kernel trees, because they found them in their testing/development efforts. You know what problems you hit better than anyone, and once those issues are found and fixed, making sure they get backported is a good thing, so I always want to know that.

Again, if you look on the stable@vger.kernel.org list, you will see different companies and developers providing backports of things they want backported, or just a list of the git commit ids if the backports apply cleanly.

Does that sound reasonable? I want to make sure that the LTS kernels that you rely on actually work for you without regressions, so testing is key, as is finding any fixes that are needed for them.

It’s not much, but I can’t do it alone :)

So, 6 years or not for 5.10?

The above is what I need in order to be able to support a kernel for 6 years: constant testing by users of the kernels. If we don’t have that, then why even do these releases, since that would mean that no one is using them? So email me and let me know.

As of this point in time (February 3, 2021), I do not have enough commitments by companies to help out with this effort to be able to say I can do this for 6 years right now (note: no response yet from the company that originally asked this question…). Hopefully that changes soon, and if it does, the kernel.org release page will be updated with the new date.

Posted Wed Feb 3 17:21:34 2021 Tags:

Your XKB keymap contains two important parts. One is the mapping from the hardware scancode to some internal representation, for example:

  <AB10> = 61;  

Which basically means Alphanumeric key in row B (from bottom), 10th key from the left. In other words: the /? key on a US keyboard.

The second part is mapping that internal representation to a keysym, for example:

  key <AB10> {        [     slash,    question        ]       }; 

This is the actual layout mapping - once in place this key really produces a slash or question mark (on level2, i.e. when Shift is down).

This two-part approach exists so either part can be swapped without affecting the other. Swap the second part to an exclamation mark and paragraph symbol and you have the French version of this key, swap it to dash/underscore and you have the German version of the key - all without having to change the keycode.

Back in the golden days of everyone-does-what-they-feel-like, keyboard manufacturers (presumably happily so) changed the key codes and we needed model-specific keycodes in XKB. The XkbModel configuration is a leftover from these trying times.

The Linux kernel's evdev API has largely done away with this. It provides a standardised set of keycodes, defined in linux/input-event-codes.h, and ensures, with the help of udev [0], that all keyboards actually conform to that. An evdev XKB keycode is a simple "kernel keycode + 8" [1] and that applies to all keyboards. On top of that, the kernel uses semantic definitions for the keys as they'd be in the US layout. KEY_Q is the key that would, behold!, produce a Q. Or an A in the French layout because they just have to be different, don't they? Either way, with evdev the Xkb Model configuration largely points to nothing and only wastes a few cycles with string parsing.
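
To make the "kernel keycode + 8" rule concrete, here's a tiny sketch; the keycode values below are copied by hand from linux/input-event-codes.h rather than read from the header:

  # Illustration only: a few kernel keycodes and the XKB keycodes they map to.
  KERNEL_KEYCODES = {
      "KEY_Q": 16,
      "KEY_A": 30,
      "KEY_SLASH": 53,   # the /? key, i.e. <AB10> = 61 from the example above
  }

  for name, code in KERNEL_KEYCODES.items():
      print(f"{name:9}  kernel keycode {code:2}  ->  XKB keycode {code + 8}")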

The second part, the keysym mapping, uses two approaches. One is to use a named #define like the "slash", "question" outlined above (see X11/keysymdef.h for the defines). The other is to use Unicode directly, like this example from the Devanagari layout:

  key <AB10> { [ U092f, U095f, slash, question ] };

As you can see, mix and match is available too. Using Unicode code points of course makes the layouts less immediately readable but on the other hand we don't need to #define the whole of Unicode. So from a maintenance perspective it's a win.

However, there's a third type of key that we care about: functional keys. Those are the multimedia (historically: "internet") keys that most devices have these days. Volume up, touchpad on/off, cycle display connectors, etc. Those keys are special in that they don't have a Unicode representation and they are always mapped to the same fixed functionality. Even Dvorak users want their volume keys to do what it says on the key.

Because they have no Unicode code points, those keys are defined, historically, in XF86keysyms.h:

  #define XF86XK_MonBrightnessUp    0x1008FF02  /* Monitor/panel brightness */

And mapping such a key looks like this [2]:

  key <I21>   {       [ XF86Calculator        ] };

The only drawback: every key needs to be added manually. This has been done for some, but not for others. And some keys were added with different names than what the kernel uses [3].

So we're in this weird situation where we have a flexible keymap system but the kernel already tells us what a key does anyway and we don't want to change that. Virtually all keys added in the last decade or so fall into that group of keys, but to actually make use of them requires a #define in xorgproto and an update to the keycodes and symbols in xkeyboard-config. That again introduces discrepancies and we end up in the situation we're in right now: some keys don't work until someone files a bug, and then the users still need to wait for several components to be released and for those releases to trickle into the distributions.

10 years ago would've been a good time to make this more efficient. The situation wasn't that urgent then: most of the kernel keycodes added are >255, which means they cannot be used in X anyway [4]. The second best time to do it is now. What we need is basically a pass-through from kernel keycode to symbol, and that's currently sitting in various MRs:

- xkeyboard-config can generate the keycodes/evdev file based on the list of kernel keycodes, so all kernel keycodes are mapped to internal representations by default

- xorgproto has reserved a range within the XF86 keysym reserved range for pass-through mappings, i.e. any KEY_FOO define from the kernel is mapped to XF86XK_Foo with a specific value [5]. The #define format is fixed so it can be parsed.

- xkeyboard-config parses these XF86 keysyms and sets up a keysym mapping in the default keymap.

This is semi-automatic, i.e. there are helper scripts that detect changes and notify us, hooked into the CI, but the actual work must be done manually. These keysyms immediately become set-in-stone API so we don't want some unsupervised script to go wild on them.

There's a huge backlog of keys to be added (dating to kernels pre-v3.18) and I'll go through them one-by-one over the next weeks to make sure they're correct. But eventually they'll be done and we have a full keymap for all kernel keys to be immediately available in the XKB layout.

The last part of all of this is a calendar reminder for me to do this after every new kernel release. Let's hope this crucial part isn't the first to fail.

[0] 60-keyboard.hwdb has a mere ~1800 lines!
[1] Historical reasons, you don't want to know. *jedi wave*
[2] the XK_ part of the key name is dropped, implementation detail.
[3] This can also happen when a kernel define is renamed/aliased but we cannot easily do so for this header.
[4] X has an 8 bit keycode limit and that won't change until someone develops XKB2 with support for 32-bit keycodes, i.e. never.

[5] The actual value is an implementation detail and clients must not care about it.


Posted Fri Jan 22 00:58:00 2021 Tags:

This post explains how to parse the HID Unit Global Item as explained by the HID Specification, page 37. The table there is quite confusing and it took me a while to fully understand it (Benjamin Tissoires was really the one who cracked it). I couldn't find any better explanation online which means either I'm incredibly dense and everyone's figured it out or no-one has posted a better explanation. On the off-chance it's the latter [1], here are the instructions on how to parse this item.

We know a HID Report Descriptor consists of a number of items that describe the content of each HID Report (read: an event from a device). These Items include things like Logical Minimum/Maximum for axis ranges, etc. A HID Unit item specifies the physical unit to apply. For example, a Report Descriptor may specify that X and Y axes are in mm which can be quite useful for all the obvious reasons.

Like most HID items, a HID Unit Item consists of a one-byte item tag and 1, 2 or 4 byte payload. The Unit item in the Report Descriptor itself has the binary value 0110 01nn where the nn is either 1, 2, or 3 indicating 1, 2 or 4 bytes of payload, respectively. That's standard HID.

The payload is divided into nibbles (4-bit units) and goes from LSB to MSB. The lowest-order 4 bits (first byte & 0xf) define the unit System to apply: one of SI Linear, SI Rotation, English Linear or English Rotation (well, or None/Reserved). The rest of the nibbles are in this order: "length", "mass", "time", "temperature", "current", "luminous intensity". In something resembling code this means:


  system = value & 0xf
  length_exponent = (value & 0xf0) >> 4
  mass_exponent = (value & 0xf00) >> 8
  time_exponent = (value & 0xf000) >> 12
  ...

The System defines which unit is used for length (e.g. SILinear means length is in cm). The actual value of each nibble is the exponent for the unit in use [2]. In something resembling code:

  switch (system)
  case SILinear:
      print("length is in cm^{length_exponent}");
      break;
  case SIRotation:
      print("length is in rad^{length_exponent}");
      break;
  case EnglishLinear:
      print("length is in in^{length_exponent}");
      break;
  case EnglishRotation:
      print("length is in deg^{length_exponent}");
      break;
  case None:
  case Reserved:
      print("boo!");
      break;

For example, the value 0x321 means "SI Linear" (0x1) so the remaining nibbles represent, in ascending nibble order: Centimeters, Grams, Seconds, Kelvin, Ampere, Candela. The length nibble has a value of 0x2 so it's square cm, the mass nibble has a value of 0x3 so it is cubic grams (well, it's just an example, so...). This means that any report containing this item comes in cm²g³. As a more realistic example: 0xF011 would be cm/s.

If we changed the lowest nibble to English Rotation (0x4), i.e. our value is now 0x324, the units represent: Degrees, Slug, Seconds, F, Ampere, Candela [3]. The length nibble 0x2 means square degrees, the mass nibble is cubic slugs. As a more realistic example, 0xF014 would be degrees/s.

Any nibble with value 0 means the unit isn't in use, so the example from the spec with value 0x00F0D121 is SI linear, units cm² g s⁻³ A⁻¹, which is... Voltage! Of course you knew that and totally didn't have to double-check with wikipedia.

Because bits are expensive and the base units are of course either too big or too small or otherwise not quite right, HID also provides a Unit Exponent item. The Unit Exponent item (a separate item to Unit in the Report Descriptor) then describes the exponent to be applied to the actual value in the report. For example, a Unit Exponent of -3 means 10⁻³ to be applied to the value. If the report descriptor specifies an item of Unit 0x00F0D121 (i.e. V) and Unit Exponent -3, the value of this item is mV (milliVolt), Unit Exponent of 3 would be kV (kiloVolt).
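
Putting the pieces together, here is a minimal Python sketch of the decoding scheme described above (my own illustration of the rules, not code from the spec or from any HID library):

  # Decode a HID Unit item payload (plus optional Unit Exponent) into a
  # human-readable unit string, following the nibble layout described above.
  UNITS = {
      0x1: ("cm", "g", "s", "K", "A", "cd"),       # SI Linear
      0x2: ("rad", "g", "s", "K", "A", "cd"),      # SI Rotation
      0x3: ("in", "slug", "s", "F", "A", "cd"),    # English Linear
      0x4: ("deg", "slug", "s", "F", "A", "cd"),   # English Rotation
  }

  def signed_nibble(n):
      # nibbles are two's complement: 0x8..0xf mean -8..-1
      return n - 16 if n > 7 else n

  def parse_unit(value, unit_exponent=0):
      system = value & 0xf
      if system not in UNITS:
          return "None/Reserved"
      parts = []
      for i in range(6):   # length, mass, time, temperature, current, luminous intensity
          exp = signed_nibble((value >> (4 * (i + 1))) & 0xf)
          if exp:
              unit = UNITS[system][i]
              parts.append(unit if exp == 1 else f"{unit}^{exp}")
      prefix = f"10^{unit_exponent} " if unit_exponent else ""
      return prefix + " ".join(parts)

  print(parse_unit(0xF011))          # cm s^-1, i.e. cm/s
  print(parse_unit(0x00F0D121))      # cm^2 g s^-3 A^-1, i.e. Volt
  print(parse_unit(0x00F0D121, -3))  # 10^-3 cm^2 g s^-3 A^-1, i.e. mV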

Now, in hindsight all this is pretty obvious and maybe even sensible. It'd have been nice if the spec would've explained it a bit clearer but then I would have nothing to write about, so I guess overall I call it a draw.

[1] This whole adventure was started because there's a touchpad out there that measures touch pressure in radians, so at least one other person out there struggled with the docs...
[2] The nibble value is twos complement (i.e. it's a signed 4-bit integer). Values 0x1-0x7 are exponents 1 to 7, values 0x8-0xf are exponents -8 to -1.
[3] English Linear should've trolled everyone and use Centimetres instead of Centimeters in SI Linear.

Posted Wed Jan 13 11:28:00 2021 Tags:

"Systems design" is a branch of study that tries to find universal architectural patterns that are valid across disciplines.

You might think that's not a possibility. Back in university, students used to tease the Systems Design Engineers, calling it "boxes and arrows" engineering. Not real engineering, you see, since it didn't touch anything tangible, like buildings, motors, hydrochloric acid, or, uh, electrons.

I don't think the Systems Design people took this criticism too seriously since everyone also knew that programme had the toughest admittance criteria in the whole university.

(A mechanical engineer told me they saw electrical/computer engineers the same way: waveforms on a screen instead of real physical things that you could touch, change, and fix.)

I don't think any of us really understood what boxes-and-arrows engineering really was back then, but luckily for you, now I'm old. Let me tell you some stories.

What is systems design?

I started thinking more clearly about systems design when I was at a big tech company and helped people refine their self-promotion employee review packets. Most of it was straightforward, helping them map their accomplishments to the next step up the engineering ladder:

  • As a Novice going for Junior, you had to prove you could fix bugs without too much supervision;
  • Going for Senior, you had to prove you could implement a whole design with little supervision;
  • Going for Staff, you had to show you could produce designs based on business problems with basically no management;
  • Going for Senior Staff, you had to solve bigger and bigger business problems; and so on.

After helping a few dozen people with their assessments, I noticed a trend. Most developers mapped well onto the ladder, but some didn't fit, even though they seemed like great engineers to me.

There were two groups of misfits:

  1. People who maxed out as a senior engineer (building things) but didn't seem to want to, or be able to, make it to staff engineer (translating business problems).

  2. People who were ranked at junior levels, but were better at translating business problems than at fixing bugs.

Group #1 was formally accounted for: the official word was most employees should never expect to get past Senior Engineer. That's why they called it Senior. It wasn't much consolation to people who wanted to earn more money or to keep improving for the next 20-30 years of a career, but it was something we could talk about.

(The book Radical Candor by Kim Scott has some discussion about how to handle great engineers who just want to build things. She suggests a separate progression for "rock solid" engineers, who want to become world-class experts at things they're great at, and "steep trajectory" engineers, who might have less attention to detail but who want to manage ever-bigger goals and jump around a lot.)

People in group #2 weren't supposed to exist. They were doing some hard jobs - translating business problems into designs - with great expertise, but these accomplishments weren't interesting to the junior-level promotion committees, who had been trained to look for "exactly one level up" attributes like deep technical knowledge in one or two specific areas, a history of rapid and numerous bug fixes, small independent launches, and so on. Meanwhile, their peers who couldn't (yet) architect their way out of a paper bag rose more quickly through the early ranks, because they wrote reams of code fast.

Tanya Reilly has an excellent talk (and transcribed slides) called Being Glue that perfectly captures this effect. In her words: "Glue work is expected when you're senior... and risky when you're not."

What she calls glue work, I'm going to call systems design. They're two sides of the same issue. Humans are the most unruly systems of all, and yet, amazingly, they follow many of the same patterns as other systems.

People who are naturally excellent at glue work often stall out early in the prescribed engineering pipeline, even when they'd be great in later stages (staff engineers, directors, and executives) that traditional engineers struggle at. In fact, it's well documented that an executive in a tech company requires almost a totally different skill set than a programmer, and rising through the ranks doesn't prepare you for that job at all. Many big tech companies hire executives from outside the company, and sometimes even from outside their own industry, for that reason.

...but I guess I still haven't answered the question. What is systems design? It's the thing that will eventually kill your project if you do it wrong, but probably not right away. It's macroeconomics instead of microeconomics. It's fixing which promotion ladders your company even has, rather than trying to climb the ladders. It's knowing when a distributed system is or isn't appropriate, not just knowing how to build one. It's repairing the incentives in a political system, not just getting elected and passing your favourite laws.

Most of all, systems design is invisible to people who don't know how to look for it. At least with code, you can measure output by the line or the bug, and you can hire more programmers to get more code. With systems design, the key insight might be a one-sentence explanation given at the right time to the right person, that affects the next 5 years of work, or is the difference between hypergrowth and steady growth.

Sorry, I don't know how to explain it better than that. What I can do instead is talk about some systems design problems and archetypes that repeat, over and over, across multiple fields. If you can recognize these archetypes, and handle them before they kill your project, you're on your way to being a systems designer.

Systems of control: hierarchies and decentralization

Let's start with an obvious one: the problem of centralized vs distributed control structures. If I ask you what's a better org structure: a command-and-control hierarchy or a flat organization, most people have been indoctrinated to say the latter. Similarly if I ask whether you should have an old crusty centralized database or a fancy distributed database, everyone wants to build the latter. If you're an SRE and we start talking about pets and cattle, you always vote for cattle. You'd laugh at me if I suggested using anything but a distributed software version control system (ie. git). The future of money, I've heard, is distributed decentralized cryptocurrency. If you want to defeat censorship, you need a distributed social network. The trend is clear. What's to debate?

Well, real structures are more complicated than that. The best introductory article I know on this topic is Jo Freeman's The Tyranny of Structurelessness, which includes the famous quote: "This apparent lack of structure too often disguised an informal, unacknowledged and unaccountable leadership that was all the more pernicious because its very existence was denied."

"Informal, unacknowledged, and unaccountable" control is just as common in distributed computing systems as it is in human social systems.

The truth is, nearly every attempt to design a hierarchy-free, "flat" control system just moves the central control around until you can't see it anymore. Human structures all have leaders, whether implicit or explicit, and the explicit ones tend to be more diverse.

The web depends on centrally controlled DNS and centrally approved TLS certificate issuers; the global Internet depends on a small cabal who sorts out routing problems. Every blockchain depends on whoever decides if your preferred chain will fork this week, and whoever runs the popular exchanges, and whoever decides whether to arrest those people. Distributed radio networks depend on centralized government spectrum licenses. Democracy depends on someone enforcing your right to vote. Capitalism depends on someone enforcing the rules of a "free" marketplace.

At my first startup, we tried to run the development team as a flat organization, where everyone's opinions were listened to and everyone could debate the best way to do something. The overall consensus was that we mostly succeeded. But I was shocked when one of my co-workers said to me afterward: "Our team felt flat and egalitarian. But you can't ever forget that it was only that way because you forced it to be that way."

Truly distributed systems do exist. Earth's ecosystem is perhaps one (although it's becoming increasingly fragile and dependent on humans not to break it). Truly distributed databases using Raft consensus or similar algorithms certainly exist and work. Distributed version control (like git) really is distributed, although we ironically end up re-centralizing our usage of it through something like Github.

CAP theorem is perhaps the best-known statement of the tradeoffs in distributed systems, between consistency, availability, and "partition tolerance." Normally we think of the CAP theorem as applying to databases, but it applies to all distributed systems. Centralized databases do well at consistency and availability, but suck at partition tolerance; so do authoritarian government structures.
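
To make the tradeoff concrete, here's a minimal sketch (my own toy code, not any real database's API; the Replica class and the "CP"/"AP" modes are invented for illustration): a two-replica key-value store that, when the network partitions, either refuses writes to stay consistent or accepts them and lets the replicas drift apart.

    # Toy illustration of the CAP tradeoff (not any real database's API).
    class Replica:
        def __init__(self, name):
            self.name = name
            self.data = {}
            self.peers = []
            self.partitioned = False  # pretend this node's network link is down

        def write(self, key, value, mode="CP"):
            reachable = [p for p in self.peers if not p.partitioned]
            if mode == "CP":
                # Consistency over availability: refuse the write unless we
                # can replicate it to every peer right now.
                if len(reachable) < len(self.peers):
                    raise RuntimeError("partition: refusing write to stay consistent")
            # "AP" mode (or a "CP" write that passed the check): apply the write
            # locally and to whoever we can reach; under a partition, AP replicas
            # may disagree until some later reconciliation step.
            self.data[key] = value
            for p in reachable:
                p.data[key] = value

    a, b = Replica("a"), Replica("b")
    a.peers, b.peers = [b], [a]
    b.partitioned = True          # simulate a network partition

    a.write("x", 1, mode="AP")    # succeeds; b never sees the write
    # a.write("x", 2, mode="CP")  # would raise: b is unreachable, so stay consistent

The point isn't the code; it's that "which failure mode do you want during a partition?" is a decision someone makes, explicitly or not.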

In systems design, there is rarely a single right answer that applies everywhere. But with centralized vs distributed systems, my rule of thumb is to do exactly what Jo Freeman suggested: at least make sure the control structure is explicit. When it's explicit, you can debug it.

Chicken-egg problems

Another archetypal systems design question is the "chicken-egg problem," which is short for: which came first, the chicken or the egg?

In case that's not a common question where you come from, the idea is eggs produce chickens, and chickens produce eggs. That's all fine once it's going, but what happened, back in ancient history? Was the very first step in the first iteration an egg, or a chicken?

The question sounds silly and faux-philosophical at first, but there's a real answer and that answer applies to real problems in the business world.

The answer to the riddle is "neither"; unless you're a Bible literalist, you can't trace back to the Original Chicken that laid the Original Egg. Instead there was probably a chicken-like bird that laid a mostly egg-ish egg, and before that, there were millions of years of evolution, going all the way back to single-celled organisms and whatever phenomenon first spawned those. What came "first"? All that other stuff.

Chicken-egg problems appear all the time when building software or launching products. Which came first, HTML5 web browsers or HTML5 web content? Neither, of course. They evolved in loose synchronization, tracing back to the first HTML experiments and way before HTML itself, growing slowly and then quickly in popularity along the way.

I bring up chicken-egg problems a lot because designers are so often oblivious to them. Here are some famous examples:

  • Electrical distribution networks
  • Phone and fax technologies
  • The Internet
  • IPv6
  • Every social network (who will use it if nobody is using it?)
  • CDs, DVDs, and Blu-Ray vs HD DVD
  • HDTV (1080p etc), 4k TV, 8k TV, 3D TV
  • Interstate highways
  • Company towns (usually built around a single industry)
  • Ivy League universities (could you start a new one?)
  • Every new video game console
  • Every desktop OS, phone OS, and app store

The defining characteristic of a chicken-egg technology or product is that it's not useful to you unless other people use it. Since adopting new technology isn't free (in dollars, or time, or both), people aren't likely to adopt it unless they can see some value, but until they do, the value isn't there, so they don't. A conundrum.

It's remarkable to me how many dreamers think they can simply outwait the problem ("it'll catch on eventually!") or outspend the problem ("my new mobile OS will be great, we'll just subsidize a few million phones"). And how many people think getting past a chicken-egg problem, or not, is just luck.

But no! Just like with real chickens and real eggs, there's a way to do it by bootstrapping from something smaller. The main techniques are to lower the cost of adoption, and to deliver more value even when there are fewer users.
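
To see why those two levers work, here's a back-of-the-envelope simulation with made-up numbers (the function, its parameters, and the thresholds are all mine, purely illustrative): each potential user adopts only when the value they'd get exceeds the cost of switching, and the value grows with the number of people already using the thing.

    # Toy adoption model with invented numbers, just to show the tipping point.
    def simulate(population, cost, standalone_value, value_per_user, seed_users=0):
        users = seed_users
        for _ in range(population):
            value = standalone_value + value_per_user * users
            if value > cost and users < population:
                users += 1
        return users

    print(simulate(1000, cost=10, standalone_value=0, value_per_user=0.5))    # 0: nobody moves first
    print(simulate(1000, cost=10, standalone_value=11, value_per_user=0.5))   # 1000: standalone value bootstraps it
    print(simulate(1000, cost=10, standalone_value=0, value_per_user=0.5,
                   seed_users=25))                                            # 1000: a subsidized seed works too

With zero standalone value and nobody paying to seed the first users, adoption never starts, no matter how good the network effect would eventually be. Lower the cost, raise the standalone value, or subsidize a seed population, and the same product takes off.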

Video game console makers (Nintendo, Sony, Microsoft) have become skilled at this; they're the only ones I know who do it on purpose every few years. Some tricks they use are:

  • Subsidizing the cost of early console sales.
  • Backward compatibility, so people who buy the new console can keep playing their older games even before there's much native content.
  • Games that are "mostly the same" but "look better" on the new console.
  • Compatible gamepads between generations, so developers can port old games more easily.
  • "Exclusive launch titles": co-marketing that ensures there's value up front for consumers (new games!) and for content producers (subsidies, free advertising, higher prices).

In contrast, the designs that baffle me the most are the ones that absolutely ignore the chicken-egg problem: Firefox and Ubuntu phones, distributed open source social networks, alternative app stores, Linux on the desktop, Netflix competitors.

Followers of this diary have already seen me rant about IPv6: it provides nearly no value to anyone until it is 100% deployed (so we can finally shut down IPv4!), but costs immediately in added complexity and maintenance (building and running a whole parallel Internet). Could IPv6 have been rolled out faster, if the designers had prioritized unwinding the chicken-egg problem? Absolutely yes. But they didn't acknowledge it as the absolute core of their design problem, the way Android, Xbox, Blu-Ray, and Facebook did.

If your product or company has a chicken-egg problem, and you can't clearly spell out your concrete plan for solving it, then investors definitely should not invest in your company. Solving the chicken-egg problem should be the first thing on your list, not some afterthought.

By the way, while we're here, there are even more advanced versions of the chicken-egg problem. Facebook or faxes are the basic form: the more people who use Facebook or have a fax machine, the more value all those users get from each other.

The next level up is a two-sided market, such as Uber or eBay. Nobody can get a ride from Uber unless there are drivers; but drivers don't want to work for Uber unless they can get work. Uber has to attract both kinds of users (and worse: in the same geographic region! at the same time of day!) before either kind gets anything from the deal. This is hard. They decided to spend their way to success, although even Uber was careful to do so only in a few markets at a time, especially at first.

The most difficult level I know is a three-sided market. For example, UberEats connects consumers, drivers, and restaurants. Getting a three-sided market rolling is insanely complicated, expensive, and failure-prone. I would never attempt it myself, so I'm impressed at the people who try. UberEats had a head start since Uber had consumers and drivers in their network already, and only needed to add "one more side" to their market. Most of their competitors had to attract all three sides just to start. Whoa.

If you're building a one-sided, two-sided, or three-sided market, you'd better understand systems design, chickens, and eggs.

Second-system effect

Taking a detour from business, let's move to an issue that engineers experience more directly: second-system effect, a term that comes from the excellent book, The Mythical Man-Month, by Fred Brooks.

Second-system effect arises through the following steps:

  • An initial product starts small and is built incrementally, starting with a low budget and a few users.
  • Over time, the product gains popularity and becomes profitable.
  • The system evolves, getting more and more hacks on top, and early design tradeoffs start to be a bottleneck.
  • The engineers figure out a new design that would fix all the mistakes they know about, plus more! (And they're probably right.)
  • Since the product is already popular, it's easy to justify spending the time to "do it right this time" and "build a strong platform for the next 10 years." So a project is launched to rewrite everything from scratch. It's expected to take several months, maybe a couple of years, and a big engineering team.

Sound familiar? People were trying this back in 1975 when the book was written, and they're still trying it now. It rarely goes well; even when it does work, it's incredibly painful.

25 years after the book, Joel Spolsky wrote Things you should never do, part 1 about the company-destroying effect of Netscape/Mozilla trying this. "They did it by making the single worst strategic mistake that any software company can make: they decided to rewrite the code from scratch."

[Update 2020-12-28: I mention Joel's now-20-year-old article not because Mozilla was such a landmark example, but because it's such a great article.]

Some other examples of second-system effect are IPv6, Python 3, Perl 6, the Plan 9 OS, and the United States system of government.

The results are remarkably consistent:

  • The project takes longer than expected to reach feature parity.
  • The new design often does solve the architectural problems in the original; however, it unexpectedly creates new architectural problems that weren't in the original.
  • Development time is split (or different developers are assigned) between maintaining the old system and launching the new system.
  • As the project gets increasingly overdue, project managers are increasingly likely to shut down the old system to force users to switch to the new one, even though users still prefer the old one.

Second systems can be merely expensive, or they can bankrupt your company, or destroy your user community. The attention to Perl 6 severely weakened the progress of Perl; the work on Python 3 fractured the Python community for more than a decade (and still does); IPv6 is still obstinately trying to deprecate IPv4, 25 years later, even though the problems it was created to solve are now largely obsolete.

As for solutions, there isn't much to say about the second system effect except you should do your utmost to prevent it; it's entirely self-inflicted. Refactor your code instead. Even if it seems like incrementalism will be more work... it's worth it. Maintaining two systems in parallel is a lot more expensive than you think.

In his book, Fred Brooks called it the "second" system on purpose, because it was his opinion that after experiencing it once, any designer will build their third and later systems more incrementally so they never have to go through that again. If you're lucky enough to learn from historical wisdom, perhaps even your second system won't suffer from this strategic error.

A more embarrassing related problem is when large companies try to build a replacement for their own first system, but the developers of the first system have left or have already learned their Second System Lesson and are not willing to play that game. Thus, a new team is assembled to build the replacement, without the experience of having built the first one, but with all the confidence of a group of users who are intimately experienced with its surface flaws. I don't even know what this phenomenon should be called; the vicarious second system effect? Anyway, my condolences if you find yourself building or using such a product. You can expect years of pain.

[Update 2020-12-28: someone reminded me that CADT ("cascade of attention-deficit teenagers") is probably related to this last phenomenon.]

Innovator's dilemmas

Let's finally talk about a systems design issue that's good news for your startup, albeit bad news for big companies. The Innovator's Dilemma is a great book by Clayton Christensen that discusses a fascinating phenomenon.

Innovator's dilemmas are so elegant and beautiful you can hardly believe they exist as such a repeatable abstraction. Here's the latest one I've heard about, via an AnandTech article about Apple Silicon and the chart in it that tells the whole story at a glance.

A summary of the Innovator's Dilemma is as follows:

  • You (Intel in this case) make an awesome product in a highly profitable industry.
  • Some crappy startup appears (ARM in this case) and makes a crappy competing product with crappy specs. The only thing they seem to have going for them is they can make some low-end garbage for cheap.
  • As a big successful company, your whole business is optimized for improving profits and margins. Your hard-working employees realize that if they cede the ultra-low-end garbage portion of the market to this competitor, they'll have more time to spend on high-value customers. As a bonus, your average margin goes up! Genius.
  • The next year, your competitor's product gets just a little bit better, and you give up the new bottom of your market, and your margins and profits further improve. This cycle repeats, year after year. (We call this "retreating upmarket.")
  • The crappy competitor has some kind of structural technical advantage that allows their performance (however you define performance; something relevant to your market) to improve, year over year, at a higher percentage rate than your product can. And/or their product can do something yours can't do at all (in ARM's case: power efficiency).
  • Eventually, one year, the crappy competitor's product finally exceeds the performance metrics of your own product, and promptly blows your entire fucking company instantly to smithereens.

Hey now, we've started swearing, was that really called for? Yes, I think so. If I were an Intel executive looking at that chart and at Apple's new laptops, I would be scared out of my mind right now. There is no more upmarket left to retreat to. The competitor's product is now better, and it's getting better faster than mine. The game is already over, and I didn't even realize I was playing.
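
The compounding is what makes the trap snap shut. Here's a quick sketch with made-up numbers (not Intel's or ARM's actual figures): an incumbent that starts 10x ahead but improves 5% per year gets passed, in under a decade, by an entrant improving 40% per year.

    # Back-of-the-envelope crossover math with invented growth rates.
    def crossover_year(incumbent, entrant, incumbent_growth, entrant_growth):
        """First year the entrant's performance exceeds the incumbent's."""
        year = 0
        while entrant <= incumbent:
            incumbent *= 1 + incumbent_growth
            entrant *= 1 + entrant_growth
            year += 1
        return year

    print(crossover_year(100, 10, incumbent_growth=0.05, entrant_growth=0.40))  # 9

And because the improvement is exponential on both sides, by the time the lines cross it's already far too late to respond.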

What makes the Innovator's Dilemma so beautiful, from a systems design point of view, is the "dilemma" part. The dilemma comes from the fact that all large companies are heavily optimized to discard ideas that aren't as profitable as their existing core business. Any company that doesn't optimize like this fails; by definition their profitability would go down. So thousands of worker bees propose thousands of low-margin and high-margin projects, and the company discards the former and invests heavily in the latter (this is called "sustaining innovation" in the book), and they keep making more and more money, and all is well.

But this optimization creates a corporate political environment (aha, you see we're still talking about systems design?) where, for example, Intel could never create a product like ARM. A successful low-priced chip would take time, energy, and profitability away from the high-priced chips, and literally would have made Intel less successful for years of its history. Even once ARM appeared and its trendline of improvements was established, ARM chips still had lower margins, so competing with them would still have cannibalized Intel's own high-margin products; worse, by then ARM had a head start.

In case you're a big company reading this: the book has a few suggestions for what you can do to avoid this trap. But if you're Intel, you should have read the book a few years ago, not now.

Innovator's dilemma plots are the prettiest when discussing hardware and manufacturing, but the concept applies to software too, especially when software is held back by a hardware limitation. For example, distributed version control systems (where you download the entire repository history to every client) were amusing toys until suddenly disks were big enough and networks were fast enough, and then DVCSes wiped out everything else (except in projects with huge media files).

Fancy expensive databases were the only way to get high transaction throughput, until SSDs came along and made any dumb database fast enough for most jobs.

Complicated database indexes and schemas were great until AWS came along and let everyone just brute force mapreduce everything using short-term rental VMs.

JITs were mostly untenable until memory was so much slower than CPU that compiling was not the expensive part. Software-based network packet processing on a CPU was slower than custom silicon until generic CPUs got fast enough relative to RAM. And so on.

The Innovator's Dilemma is the book that first coined the term "disruptive innovation." Nowadays, startups talk about disrupting this and disrupting that. "Disruption" is an exciting word, everybody wants to do it! The word disruption has lost most of its meaning at this point; it's a joke as often as a serious claim.

But in the book, it had a meaning. There are two kinds of innovations: sustaining and disruptive. Sustaining is the kind that big companies are great at. If you want to make the fastest x86 processor, nobody does it better than Intel (with AMD occasionally nipping at their heels). Intel has every incentive to keep making their x86 processors better. They also charge the highest margins, which means the greatest profits, which means the most money available to pour into more sustaining innovation. There is no dilemma; they dump money and engineers and time into that, and they mostly deliver, and it pays off.

A "disruptive" innovation was meant to refer specifically to the kind you see in that AnandTech chart: the kind where an entirely new thing sucks for a very long time, and then suddenly and instantly blows you away. This is the kind that creates the dilemma.

If you're a startup and you think you have a truly disruptive innovation, then that's great news for you. It's a perfect answer to that awkward investor question, "What if [big company] decides to do this too?" because the honest truth is "their own politics will tear that initiative apart from the inside."

The trick is to determine whether you actually have one of these exact "disruption" things. They're rare. And as an early startup, you don't yet have a decades-long performance plot that makes it obvious; you have to convince yourself that you'll realistically be able to improve your thing faster than the incumbent can improve theirs, over a long period of time.

Or, if your innovation only depends on an existing trend - like in the software-based packet processing example above - then you can try to time it so that your software product matures just as the hardware trend crosses over.

In conclusion: watch out for systems design. It's the sort of thing that can make you massively succeed or completely fail, independent of how well you write code or run your company, and that's scary. Sometimes you need some boxes and arrows.

Posted Sun Dec 27 13:34:08 2020 Tags: