This feed omits posts by rms. Just 'cause.

Not only do you need leap seconds to keep solar time and atomic time in sync, you'll need a different kind of leap second to keep Lunar atomic time and Earth atomic time in sync, because mass distorts spacetime.

So good luck with that...

Defining lunar time is not simple:

Although the definition of the second is the same everywhere, the special theory of relativity dictates that clocks tick slower in stronger gravitational fields. The Moon's gravitational pull is weaker than Earth's, meaning that, to an observer on Earth, a lunar clock would run faster than an Earth one. Gramling estimates that a lunar clock would gain about 56 microseconds over 24 hours. Compared with one on Earth, a clock's speed would also subtly change depending on its position on the lunar surface, because of the Moon's rotation, says Tavella. "This is a paradise for experts in relativity, because you have to take into account so many things," she adds.

Previously, previously, previously, previously, previously, previously, previously.

Posted Sun Jan 29 02:57:56 2023 Tags:
Dear Lazyweb,

Can anyone explain to me what the default event alert time actually does on macOS and iOS with iCloud syncing?

What I would like it to mean: "When I create a new event, that event is created with an alert 15 minutes before, instead of needing to add that by hand."

What it seems to actually mean: "If you create an event, even a recurring event, and at any point change the time of that alert, or delete the alert, then the 'default' 15 minute alert is going to show back up anyway, but some random amount of time later, and also only like 70% of the time."

I suspect that if I turned off the default alert on my desktop as well as every other device it would stop re-adding it, but then I would accidentally end up with a bunch of events with no alerts at all.

Previously.

Posted Fri Jan 27 21:58:57 2023 Tags:
Mike Lacher:

Team,

There's no easy way to say this: I have made the difficult decision to lay off over six thousand of you. In the past two years, we have achieved huge wins together. But unfortunately, the macroeconomic environment has shifted in ways none of us could have foreseen, from an economy in which I did feel like paying you, to one in which I'd rather not.

In 2021, things looked different. Interest rates were low, and my enthusiasm for bankrolling your children's insulin was high. Given every available forecast, it was the perfect time to hire 1,200 blockchain developers, spin up original streaming content, and lead three rounds of funding for my nephew's AI-powered B2B sourdough recipe app. Who could have known that in just a few months, despite all our operational velocity, the world would pivot so dramatically? Supply chains have stalled. Inflation has risen. And suddenly all your salaries and dental work hang like millstones chafing the supple neck of my stock compensation package. [...]

This was not an easy decision to make. It's weighed heavily on me for the past month, keeping me up at night and nearly causing me to cancel the exec team's offsite, even though Bad Bunny's appearance fee was only 50 percent refundable. Let's not mince words, though; the accountability for this decision rests with me. The consequences, on the other hand, rest with you, but so does a pretty generous COBRA package.

Previously, previously, previously, previously, previously.

Posted Thu Jan 26 20:04:52 2023 Tags:
Jeffrey Paul:

Imagine my surprise when browsing these images in the Finder, Little Snitch told me that macOS is now connecting to Apple APIs via a program named mediaanalysisd (Media Analysis Daemon - a background process for analyzing media files). [...]

To recap:

  • In 2021, Apple said they'd scan your local files using your own hardware, in service of the police.

  • People got upset, because this is a clear privacy violation and is wholly unjustifiable on any basis whatsoever. (Some people speculated that such a move by Apple was to appease the US federal police in advance of their shipping better encryption features which would otherwise hinder police.)

  • Apple said some additional things that did NOT include "we will not scan your local files", but did include a confirmation that they intend to ship such features that they consider "critically important".

  • The media misreported this amended statement, and people calmed down.

  • Today, Apple scanned my local files and those scanning programs attempted to talk to Apple APIs, even though I don't use iCloud, Apple Photos, or an Apple ID. This would have happened without my knowledge or consent if I were not running third-party network monitoring software.

By default, Little Snitch allows all connections to Apple and iCloud. To block this process (and others) you have to un-check the "icloud.com" and "apple.com" rules on the "System" tab. And then endure two days of whack-a-mole while re-allowing the ones you actually want to be able to connect to Apple, like softwareupdated and IMTransferAgent and a dozen others.

Update: Lots of people keep sending me this rebuttal, and telling me "it no longer phones home as of the OS update that was released 5 minutes from now, so problem solved." Ok, that may well be. But when my OS was phoning home on my photos yesterday and happens to not be phoning home on them today... that doesn't really build trust. Intent matters, and we know what Apple's intent is because they told us. Code matters, and we are not allowed to see Apple's code.

Maybe the fact that it phoned home with a null response is only because the test photos didn't match some magic neural net -- congratulations, Apple didn't report your test images to the FBI.

We cannot know. But suspicion and mistrust are absolutely justified. Apple is examining your photos and then phoning home. The onus is on them to explain -- and prove -- what they are doing and why. They are undeserving of you taking them at their word.

Previously, previously, previously, previously, previously, previously.

Posted Wed Jan 25 00:59:55 2023 Tags:
We had surprisingly little flooding during this month's climate apocalypse. Harrison near Division was underwater for a while, but it turns out that even though that's only a couple blocks away, we're several feet higher in elevation, so the water didn't crest the sidewalk on our block. We did have some inexplicable roof leaks, but nothing too severe.

At one point at the height of the storm, we had dirty water jet up out of the sink drains and water fountains! It did not seem to be sewage, so our best guess is that it was roof water that had nowhere else to go because the sewers were already at capacity.

Despite our worry, we were not burgled a third time on New Year's Day, probably because:

A suspect was arrested on New Year's Eve after running a red light near 4th and Townsend.

[He] was booked into county jail for nine counts of burglary, possession of burglary tools, possession of methamphetamine and possession of narcotics.

SFPD posted this fun photo of his burglary tools, in case you're looking for some tips.

Man arrested, accused of breaking into 10 SF stores:

The burglaries followed a similar pattern of the suspect forcing entry through the front of a business, causing damage. Once inside, the suspect stole cash from registers, safes or ATMs, and various other items from the stores.

Not their first rodeo: Matt and Kayla were also arrested in 2021:

Officers arrived on scene and located two suspects in the process of stealing two vehicles. [...] they fled into another stolen vehicle and drove in the officer's direction at a high rate of speed, causing him to dive out of the way to prevent from being struck. The suspects then fled on foot into a nearby cemetery. [...]

Both suspects [admitted] to numerous thefts throughout the Bay Area. Lake and Gutierrez were transported and booked into San Mateo County Jail on numerous charges.

I can't imagine a scenario where we get any of our money back, however. You will be shocked, shocked to learn that insurance is a scam.

Turns out our insurance policy basically doesn't cover cash. It doesn't matter what kind of paperwork we have documenting the amount of cash that was stolen, the policy caps that at $5k, minus a $1k deductible.

And regardless of whether the insurance company found some reason to deny the claim entirely -- which they almost certainly would -- the mere act of filing the claim would cause our rate to (purely coincidentally) go up by more than $4k per year.

"Nice policy you have there, shame if something were to happen to it."

And I'm gonna guess that bringing a civil suit against a meth-head is also not going to turn out to be an effective strategy.

I'll bet Louis Vuitton doesn't have these problems.

Donations appreciated!

Posted Wed Jan 25 00:31:09 2023 Tags:
Big Tech layoffs are in the news, you say?

On January 20th, 1998, Netscape laid off a lot of people. One of them would have been me, as my "department", such as it was, had been eliminated, but I ended up momentarily moving from "clienteng" over to the "website" division. For about 48 hours I thought that I might end up writing a webmail product or something.

That, uh, didn't happen.

At 8am on January 22, 1998, Netscape put out a press release announcing that the source code to the web browser would be released to the public at the end of March. This was the first that I had heard that this was even being considered.

Lacking any coherent information or direction from management (spoiler alert, there was no plan! none!) a handful of us in the trenches had some impromptu meetings, which began something like:

"What the fuck, I mean what the actual fuck?"
"I thought you got fired? Someone told me you were fired."
"I don't think I'm fired, are you fired?" "I don't think so?"
"Ok so are we doing this? I guess we're doing this?"
"We're doing what now?"

"I got this."

So then I registered the domain mozilla.org. According to WHOIS, the registration went live on January 23rd at 9pm.

The rest, as they say, is cvs log. I mean history. The rest is history.

Here are some photos I took at a meeting we held in early February 1998. (On film! With a camera manufactured in the nineteen seventies! Every photo you took, even the bad ones, cost you like a dollar!) That's Pacman trying to explain to The Usual Suspects the proposed org chart that I had drawn on the wall. Please note that "THE INTERNET" is represented as A CLOUD, because that was the style at the time.


The oldest version of mozilla.org in the Wayback Machine is from December 12, 1998, so I have reconstructed some older versions of the web site.

For the first month, I was hosting the mozilla.org domain on my own server, just to have a placeholder there, and I don't seem to have a copy of that first version. It took me that whole month to figure out how to move the hosting into the corporate data center. But here's the oldest version that I was able to reconstruct from the mozilla.org CVS repository:

And here are a few later copies:

That "Sponsored by DevEdge Online" thing in the top banner is because upper management assumed that the way "open source" worked was, the internal "developer relations" consultancy division would just fart out a zip file and then corporate customers would... handwave handwave... pay us for something? Disabusing them of this notion was a big part of my job that first month.

Fun fact! When I wrote the mozilla.org web site, I designed it to have a "source" directory that contained just the document bodies, and a Makefile generated an output directory that wrapped the headers and menus and such around that to emit the static web site that was actually served. The output directory was not checked into the source control archive, obviously, so I don't have a copy of that. So... I dug up the old CVS archive, checked out those old web site source revs, and then I had to run that website-generating perl script that I wrote 25 years ago.

...it worked without any modifications. Self-high-five.

And I gotta say, that old web site design hasn't really decayed much. If I were tweaking it today I'd have put a max-width on body of 50em or so to avoid the long lines, and I would for sure be using something sans-serif, but I think it still looks pretty good! (Remember, CSS was not even remotely a thing yet. You wanted rounded corners, you had to chisel that shit from flint.)

Here's some other Mozilla-relevant stuff:


Previously, previously, previously, previously, previously, previously, previously, previously.

Posted Mon Jan 23 09:28:34 2023 Tags:
BMG is suing the makers of Poopsie Slime Surprise for ripping off "My Humps" with their own song, "My Poops."

Poopsie Slime Surprise comes from MGA, who are responsible for the Bratz line of dolls. According to the lawsuit, "My Poops" plays on one of the dolls when you press a button on its belly, leading both to dance moves and to a less savory movement: the toys "excrete sparkling slime." [...]

"My poops, my poops my poops my poops," the unicorn sings. The lyrics continue, "Whatcha gonna do with all that poop, all that poop," and adds, "I drive my parents crazy, I do it every day."

BMG alleges that Poopsie Slime Surprise has made MGA tens of millions of dollars and that the company ignored cease-and-desist warnings. They are asking for a steaming $10 million in damages.

(This is of course not the first time a wide-eyed shitting unicorn has graced these pages.)

Previously, previously, previously, previously, previously, previously, previously, previously.

Posted Fri Jan 20 22:11:42 2023 Tags:

After 8 months of work by Yinon Burgansky, libinput now has a new pointer acceleration profile: the "custom" profile. This profile allows users to tweak the exact response of their device based on their input speed.

A short primer: the pointer acceleration profile is a function that multiplies the incoming deltas with a given factor F, so that your input delta (x, y) becomes (Fx, Fy). How this is done is specific to the profile. libinput's existing profiles had either a flat factor or an adaptive factor that roughly resembles what Xorg used to have; see the libinput documentation for the details. The adaptive curve however has a fixed behaviour: all a user could do was scale the curve up/down, but not actually adjust the curve.

Input speed to output speed

The new custom filter allows exactly that: it allows a user to configure a completely custom ratio between input speed and output speed. That ratio will then influence the current delta. There is a whole new API to do this but simplified: the profile is defined via a series of points of (x, f(x)) that are linearly interpolated. Each point is defined as input speed in device units/ms to output speed in device units/ms. For example, to provide a flat acceleration equivalent, specify [(0.0, 0.0), (1.0, 1.0)]. With the linear interpolation this is of course a 45-degree function, and any incoming speed will result in the equivalent output speed.

Noteworthy: we are talking about the speed here, not any individual delta. This is not exactly the same as the flat acceleration profile (which merely multiplies the deltas by a constant factor) - it does take the speed of the device into account, i.e. device units moved per ms. For most use-cases this is the same but for particularly slow motion, the speed may be calculated across multiple deltas (e.g. "user moved 1 unit over 21ms"). This avoids some jumpiness at low speeds.

But because the curve is speed-based, it allows for some interesting features too: the curve [(0.0, 1.0), (1.0, 1.0)] is a horizontal function at 1.0. Which means that any input speed results in an output speed of 1 unit/ms. So regardless how fast the user moves the mouse, the output speed is always constant. I'm not immediately sure of a real-world use case for this particular case (some accessibility needs maybe) but I'm sure it's a good prank to play on someone.
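
To make the two curves above concrete, here is a minimal Python sketch of linear interpolation over (input speed, output speed) points. It is not libinput's implementation: the function name `output_speed` is made up for illustration, and continuing along the last segment for speeds beyond the final point is an assumption chosen so that the flat-equivalent curve maps any speed to itself.

```python
from bisect import bisect_right


def output_speed(points, speed):
    """Linearly interpolate the output speed (units/ms) for an input speed.

    `points` is a list of (input_speed, output_speed) pairs sorted by input
    speed. Speeds beyond the last point follow the final segment; that is an
    assumption of this sketch, not documented libinput behaviour.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    xs = [x for x, _ in points]
    # Pick the segment whose left end is at or below `speed`, clamped so the
    # first and last segments are reused outside the defined range.
    i = min(max(bisect_right(xs, speed) - 1, 0), len(points) - 2)
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (speed - x0) * (y1 - y0) / (x1 - x0)


flat_equivalent = [(0.0, 0.0), (1.0, 1.0)]   # 45-degree function
constant = [(0.0, 1.0), (1.0, 1.0)]          # horizontal function at 1.0

print(output_speed(flat_equivalent, 0.5))  # 0.5
print(output_speed(flat_equivalent, 3.0))  # 3.0
print(output_speed(constant, 7.0))         # 1.0
```

With the horizontal curve, every input speed yields 1 unit/ms, which is the "prank" behaviour described above.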

Because libinput is written in C, the API is not necessarily immediately obvious but: to configure you pass an array of (what will be) y-values and set the step-size. The curve then becomes: [(0 * step-size, array[0]), (1 * step-size, array[1]), (2 * step-size, array[2]), ...]. There are some limitations on the number of points but they're high enough that they should not matter.
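
The array-plus-step layout can be sketched in a few lines of Python. This only illustrates the mapping described above; `curve_points` is a hypothetical helper, not part of the libinput API.

```python
def curve_points(step, values):
    """Expand a step size and an array of y-values into (x, f(x)) points.

    Point i sits at x = i * step, as in the description above.
    """
    return [(i * step, y) for i, y in enumerate(values)]


# The flat-equivalent curve: y-values 0.0 and 1.0 at a step of 1.0.
print(curve_points(1.0, [0.0, 1.0]))       # [(0.0, 0.0), (1.0, 1.0)]

# A finer curve: three y-values spaced 0.5 units/ms apart.
print(curve_points(0.5, [0.0, 0.2, 1.0]))  # [(0.0, 0.0), (0.5, 0.2), (1.0, 1.0)]
```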

Note that any curve is still device-resolution dependent, so the same curve will not behave the same on two devices with different resolution (DPI). And since the curves uploaded by the user are hand-polished, the speed setting has no effect - we cannot possibly know how a custom curve is supposed to scale. The setting will simply update with the provided value and return that but the behaviour of the device won't change in response.

Motion types

Finally, there's another feature in this PR - the so-called "movement type" which must be set when defining a curve. Right now, we have two types, "fallback" and "motion". The "motion" type applies to, you guessed it, pointer motion. The only other type available is fallback which applies to everything but pointer motion. The idea here is of course that we can apply custom acceleration curves for various different device behaviours - in the future this could be scrolling, gesture motion, etc. And since those will have different requirements, they can be configured separately.

How to use this?

As usual, the availability of this feature depends on your Wayland compositor and how this is exposed. For the Xorg + xf86-input-libinput case however, the merge request adds a few properties so that you can play with this using the xinput tool:


# Set the flat-equivalent function described above
$ xinput set-prop "devname" "libinput Accel Custom Motion Points" 0.0 1.0
# Set the step, i.e. the above points are on 0 u/ms, 1 u/ms, ...
# Can be skipped, 1.0 is the default anyway
$ xinput set-prop "devname" "libinput Accel Custom Motion Step" 1.0
# Now enable the custom profile
$ xinput set-prop "devname" "libinput Accel Profile Enabled" 0 0 1
The above sets a custom pointer accel for the "motion" type. Setting it for fallback is left as an exercise to the reader (though right now, I think the fallback curve is pretty much only used if there is no motion curve defined).

Happy playing around (and no longer filing bug reports if you don't like the default pointer acceleration ;)

Availability

This custom profile will be available in libinput 1.23 and xf86-input-libinput-1.3.0. No release dates have been set yet for either of those.

Posted Tue Jan 17 04:47:00 2023 Tags:

In the beginning, there was the egg. Then fictional people started eating that from different ends, and the terms "little endians" and "Big Endians" were born.

Computer architectures (mostly) come with one of either byte order: MSB first or LSB first. The two are incompatible of course, and many a bug was introduced trying to convert between the two (or, more common: failing to do so). The two byte orders were termed Big Endian and little endian, because that hilarious naming scheme at least gives us something to laugh about while contemplating throwing it all away and considering a future as, I don't know, a strawberry plant.

Back in the mullet-infested 80s when the X11 protocol was designed, both little endian and big endian were common enough. And back then running the X server on a different host than the client was common too - the X terminals back then had less processing power than a smart toilet seat today so the CPU-intensive clients were running on some mainframe. To avoid overtaxing the poor mainframe already running dozens of clients for multiple users, the job of converting between the two byte orders was punted to the X server. So to this day whenever a client connects, the first byte it sends is a literal "l" or "B" to inform the server of the client's byte order. Where the byte order doesn't match the X server's byte order, the client is a "swapped client" in X server terminology and all 16, 32, and 64-bit values must be "byte-swapped" into the server's byte order. All of those values in all requests, and then again back to the client's byte order in all outgoing replies and events. Forever, till a crash do them part.

If you get one of those wrong, the number is no longer correct. And it's properly wrong too, the difference between 0x1 and 0x01000000 is rather significant. [0] Which has the hilarious side-effect of... well, pretty much anything. But usually it ranges from crashing the server (thus taking all other clients down in commiseration) to leaking random memory locations. The list of security issues affecting the various SProcFoo implementations (X server naming scheme for Swapped Procedure for request Foo) is so long that I'm too lazy to pull out the various security advisories and link to them. Just believe me, ok? *jedi handwave*
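
For anyone who wants to see the 0x1 versus 0x01000000 difference for themselves, here is a small Python sketch. It is an illustration only, not X server code; `swap32` and `needs_swap` are made-up helper names.

```python
import struct


def swap32(value):
    """Reinterpret a 32-bit unsigned value in the opposite byte order."""
    return struct.unpack(">I", struct.pack("<I", value))[0]


def needs_swap(client_order, server_order):
    """Return True if the client's byte-order byte differs from the server's.

    Both arguments are b"l" (little endian) or b"B" (big endian), matching
    the first byte a client sends when it connects.
    """
    for order in (client_order, server_order):
        if order not in (b"l", b"B"):
            raise ValueError(f"unknown byte order marker: {order!r}")
    return client_order != server_order


print(needs_swap(b"B", b"l"))             # True
print(hex(swap32(0x1)))                   # 0x1000000
print(hex(swap32(swap32(0x12345678))))    # 0x12345678
```

Forgetting a single swap turns a length of 1 into roughly 16 million, which is how a missed field in a swapped request can end up reading far outside the intended buffer.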

These days, encountering a Big Endian host is increasingly niche; letting it run an X client that connects to your local little-endian X server is even more niche [1]. I think the only regular real-world use-case for this is running X clients on an s390x, connecting to your local intel-ish (and thus little endian) workstation. Not something most users do on a regular basis. So right now, the byte-swapping code is mainly a free attack surface that 99% of users never actually use for anything real. So... let's not do that?

I just merged a PR into the X server repo that prohibits byte-swapped clients by default. A Big Endian client connecting to an X server will fail the connection with an error message of "Prohibited client endianess, see the Xserver man page". [2] Thus, a whole class of future security issues avoided - yay!

For the use-cases where you do need to let Big Endian clients connect to your little endian X server, you have two options: start your X server (Xorg, Xwayland, Xnest, ...) with the +byteswappedclients commandline option. Alternatively, and this only applies for Xorg: add Option "AllowByteSwappedClients" "on" to the xorg.conf ServerFlags section. Both of these will change the default back to the original setting. Both are documented in the Xserver(1) and xorg.conf(5) man pages, respectively.

Now, there's a drawback: in the Wayland stack, the compositor is in charge of starting Xwayland which means the compositor needs to expose a way of passing +byteswappedclients to Xwayland. This is compositor-specific; bugs are filed for mutter (merged for GNOME 44), kwin and wlroots. Until those are addressed, you cannot easily change this default (short of changing /usr/bin/Xwayland into a wrapper script that passes the option through).

There's no specific plan yet which X releases this will end up in, primarily because the release cycle for X is...undefined. Probably xserver-23.0 if and when that happens. It'll probably find its way into the xwayland-23.0 release, if and when that happens. Meanwhile, distributions interested in this particular change should consider backporting it to their X server version. This has been accepted as a Fedora 38 change.

[0] Also, it doesn't help that much of the X server's protocol handling code was written with the attitude of "surely the client wouldn't lie about that length value"
[1] little-endian client to Big Endian X server is so rare that it's barely worth talking about. But suffice to say, the exact same applies, just with little and big swapped around.
[2] That message is unceremoniously dumped to stderr, but that bit is unfortunately a libxcb issue.

Posted Fri Jan 6 02:15:00 2023 Tags:

Time for another status update on libei, the transport layer for bouncing emulated input events between applications and Wayland compositors [1]. And this time it's all about portals and how we're about to use them for libei communication. I've hinted at this in the last post, but of course you're forgiven if you forgot about this in the... uhm.. "interesting" year that was 2022. So, let's recap first:

Our basic premise is that we want to emulate and/or capture input events in the glorious new world that is Wayland (read: where applications can't do whatever they want, whenever they want). libei is a C library [0] that aims to provide this functionality. libei supports "sender" and "receiver" contexts and that just specifies which way the events will flow. A sender context (e.g. xdotool) will send emulated input events to the compositor, a "receiver" context will - you'll never guess! - receive events from the compositor. If you have the InputLeap [2] use-case, the server-side will be a receiver context, the client side a sender context. But libei is really just the transport layer and hasn't had that many changes since the last post - most of the effort was spent on trying to figure out how to exchange the socket between different applications. And for that, we have portals!

RemoteDesktop

In particular, we have a PR for the RemoteDesktop portal to add that socket exchange. Once a RemoteDesktop session starts, your application can request an EIS socket and send input events over that. This socket supersedes the current NotifyButton and similar DBus calls and removes the need for the portal to stay in the middle - the application and compositor now talk directly to each other. The compositor/portal can still close the session at any time though, so all the benefits of a portal stay there. The big advantage of integrating this into RemoteDesktop is that the infrastructure for that is already mostly in place - once your compositor adds the bits for the new ConnectToEIS method you get all the other pieces for free. In GNOME this includes a visual indication that your screen is currently being remote-controlled, same as from a real RemoteDesktop session.

Now, talking to the RemoteDesktop portal is nontrivial simply because using DBus is nontrivial, doubly so for the way sessions and requests work in the portals. To make this easier, libei 0.4.1 now includes a new library "liboeffis" that enables your application to catch the DBus. This library has a very small API and can easily be integrated with your mainloop (it's very similar to libei). We have patches for Xwayland to use that and it's really trivial to use. And of course, with the other Xwayland work we already had this means we can really run xdotool through Xwayland to connect through the XDG Desktop Portal as a RemoteDesktop session and move the pointer around. Because, kids, remember, uhm, Unix is all about lots of separate pieces.

InputCapture

On to the second mode of libei - the receiver context. For this, we also use a portal but a brand new one: the InputCapture portal. The InputCapture portal is the one to use to decide when input events should be captured. The actual events are then sent over the EIS socket.

Right now, the InputCapture portal supports PointerBarriers - virtual lines on the screen edges that, once crossed, trigger input capture for a capability (e.g. pointer + keyboard). And an application's basic approach is to request a (logical) representation of the available desktop areas ("Zones") and then set up pointer barriers at the edge(s) of those Zones. Get the EIS connection, Enable() the session and voila - the compositor will (hopefully) send input events when the pointer crosses one of those barriers. Once that happens you'll get a DBus signal in InputCapture and the events will start flowing on the EIS socket. The portal itself doesn't need to sit in the middle, events go straight to the application. The portal can still close the session anytime though. And the compositor can decide to stop capturing events at any time.

There is actually zero Wayland-y code in all this, it's display-system agnostic. So anyone with too much motivation could add this to the X server too. Because that's what the world needs...

The (currently) bad news is that this needs to be pulled into a lot of different repositories. And everything needs to get ready before it can be pulled into anything to make sure we don't add broken API to any of those components. But thanks to a lot of work by Olivier Fourdan, we have this mostly working in InputLeap (tbh the remaining pieces are largely XKB related, not libei-related). Together with the client implementation (through RemoteDesktop) we can move pointers around like in the InputLeap of old (read: X11).

Our current goal is for this to be ready for GNOME 45/Fedora 39.

[0] eventually a protocol but we're not there yet
[1] It doesn't actually have to be a compositor but that's the prime use-case, so...
[2] or barrier or synergy. I'll stick with InputLeap for this post

Posted Tue Dec 13 03:24:00 2022 Tags:

I just did a presentation at SREcon Conversations (which I call SREconcon) EMEA, called the "Curse of Unreasonably Sized Networks." I talked about the series of Dunbar's numbers and how they relate to different kinds of human social networks, and surprisingly, also to the evolution of the Internet.

Posted Sun Nov 20 02:04:01 2022 Tags:

In August I reported about 2D depiction of (CX)SMILES in Wikidata via linkouts (going back to 2017). Based on a script by Magnus Manske, I wrote a Wikidata gadget that uses the same CDK Depict (VHP4Safety mirror) to depict the 2D structure in Wikidata itself:

Depiction of part of a Wikidata page with 2D structures of a canonical SMILES and matching CXSMILES.

Note the depiction of the undefined (CIP) stereochemistry on two atoms. Thanks to Adriano and John for working that out.

More about CXSMILES in Wikidata in this Dagstuhl meeting results write up.

Posted Sat Nov 12 12:09:00 2022 Tags:

 In Silicon Valley is a very exclusive fast-food restaurant, which is always open. There is one table, where one guest at a time is served an absolutely fabulous hamburger. When you arrive, you wait in line until the table is available. Then the host takes you to the table and, this being America, you are asked a seemingly endless series of questions about how you would like your hamburger to be cooked and served.

But today we're not talking about culinary delights. We're talking about the queuing system used by the restaurant. If you are lucky enough to arrive at the restaurant when the table is available and there are no other guests waiting, you are seated right away. Otherwise, the host gives you a buzzer (from an infinite stack of buzzers!) and you are free to roam the neighborhood until your buzzer goes off. It is the host's job to ensure that guests are seated in order of arrival. When it is your turn, the host will cause your buzzer to go off and you make your way back to the restaurant, where you will be seated.

If you change your mind, you can return the buzzer to the host, who will take it back without lifting an eyebrow. If your buzzer has already gone off, the host will buzz the next guest, if any. Guests are always polite and don't abscond with their buzzers. The host is always fair and doesn't seat another guest ahead of you even if you take your time making it back.

The above description fits that of a Lock. A guest arriving corresponds to the acquire() call; leaving is a release() call. Changing your mind is like getting cancelled while waiting in acquire(). You can change your mind before or after your buzzer goes off, i.e., you can be cancelled before or after the lock has awakened your call (but before you return from acquire()).
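
As a rough illustration of the single-table restaurant, here is a small sketch using `asyncio.Lock`. The guest names and the `lock_demo` helper are invented for the example; only `asyncio.Lock` and its `acquire()`/`release()` pairing (used here via `async with`) come from asyncio itself.

```python
import asyncio


async def guest(name, table, seated):
    # Entering the block is acquire(); leaving it is release().
    async with table:
        seated.append(name)
        await asyncio.sleep(0)  # eat the hamburger, letting other tasks run


async def lock_demo():
    table = asyncio.Lock()      # one table, one guest at a time
    seated = []
    await asyncio.gather(
        *(guest(name, table, seated) for name in ("ann", "bob", "cy"))
    )
    return seated


print(asyncio.run(lock_demo()))  # ['ann', 'bob', 'cy']
```

The guests are seated in order of arrival, as the host promises.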

One day the restaurant expands, hiring extra sous-chefs and opening several new tables. There is still only one host, whose job is not really changed. However, since multiple guests can be seated concurrently, a Semaphore must now be used instead of a simple Lock.
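
The expanded restaurant might look like the following sketch, where `asyncio.Semaphore(n_tables)` plays the host. The bookkeeping dictionary and the `semaphore_demo` helper are assumptions made for the example; the point is only that the number of concurrently seated guests never exceeds the number of tables.

```python
import asyncio


async def guest(tables, state):
    async with tables:                       # wait for any free table
        state["seated"] += 1
        state["peak"] = max(state["peak"], state["seated"])
        await asyncio.sleep(0.01)            # eat
        state["seated"] -= 1


async def semaphore_demo(n_tables=3, n_guests=10):
    tables = asyncio.Semaphore(n_tables)
    state = {"seated": 0, "peak": 0}
    await asyncio.gather(*(guest(tables, state) for _ in range(n_guests)))
    return state["peak"]


print(asyncio.run(semaphore_demo()))  # 3
```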

It turns out that implementing synchronization primitives is hard. This is somewhat surprising in the case of asyncio, since only one task can be executing at a time, and task switches only happen at await. But in the past year its fairness, correctness, semantics and performance have all been challenged. In fact, the last three complaints happened in the last month, and, being the last asyncio expert standing, I had to learn in a hurry what's the best way to think about semaphores.

The restaurant metaphor was very useful. For example, the number of open tables is not the same as the number of guests who may be seated immediately: the difference is the number of guests whose buzzers have gone off but who haven't come back to the host yet.

There was one particular challenge to fairness, where a task that released a semaphore and then immediately tried to acquire it again could starve other tasks. This is like a guest walking out, turning around, and getting seated again ahead of other waiting guests.

And there was a bug where a cancelled acquire() call could leave the lock in a bad state. This is like the host getting confused when a guest with a buzzing buzzer returns it but declines to be seated.

The restaurant metaphor didn't help with everything: cancellation behavior in asyncio is just complex. In Python 3.11 we have started putting extra strain on cancellation, because of two new asynchronous context managers we added:

  • Class TaskGroup for managing a group of related tasks. When one task fails, the others are cancelled, and the context manager waits for all tasks to exit.
  • timeout() function for managing timeouts. When the timeout goes off, the current task is cancelled.

Here is the main complication of cancellation handling:

  • When waiting for a Future, that Future may be cancelled, and then the await operation fails, raising CancelledError.
  • But when awaiting a Future raises CancelledError you cannot assume that the Future was cancelled! It is also possible that the Future was already marked as having a result (so it can no longer be cancelled), and your task has been marked as runnable, but another (also runnable) task runs first and cancels your task. I am grateful to Cyker Way for pointing out this corner case.
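The corner case in the second bullet can be reproduced with a bare Future, as far as I can tell. This is only a sketch: waiter() and corner_case() are hypothetical names, and the "other task" that does the cancelling is simply the main coroutine.

```python
import asyncio

async def waiter(fut, log):
    try:
        await fut
    except asyncio.CancelledError:
        # The await failed, but was the Future itself cancelled?
        log.append(("fut.cancelled()", fut.cancelled()))
        log.append(("fut.result()", fut.result()))
        raise

async def corner_case():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    log = []
    task = asyncio.create_task(waiter(fut, log))
    await asyncio.sleep(0)         # let waiter() start and block on fut
    fut.set_result("table ready")  # waiter() is now runnable...
    task.cancel()                  # ...but gets cancelled before it runs
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log

if __name__ == "__main__":
    print(asyncio.run(corner_case()))
```

The awaiting task sees CancelledError even though the Future it awaited is done and holds a result.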

It helps to think of Futures as being in one of four states:

  • Waiting
  • Done, holding a result
  • Done, holding an exception
  • Done, but cancelled

From the waiting state a Future can transition to one of the other states, and then it cannot change state again. (Insert cute picture of state diagram here. :-)
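In lieu of the picture, a sketch that puts one Future into each of the four states and then checks that a done Future refuses to change state. The state() helper and its labels are my own naming, not part of asyncio.

```python
import asyncio

def state(fut):
    # Check cancelled() before exception(): calling exception() on a
    # cancelled Future raises CancelledError.
    if not fut.done():
        return "waiting"
    if fut.cancelled():
        return "cancelled"
    if fut.exception() is not None:
        return "exception"
    return "result"

async def four_states():
    loop = asyncio.get_running_loop()
    waiting, result, failed, cancelled = (loop.create_future() for _ in range(4))
    result.set_result(42)
    failed.set_exception(RuntimeError("boom"))
    cancelled.cancel()
    states = [state(f) for f in (waiting, result, failed, cancelled)]
    try:
        result.set_result(43)  # a done Future cannot change state again
    except asyncio.InvalidStateError:
        states.append("immutable")
    return states

if __name__ == "__main__":
    print(asyncio.run(four_states()))
```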

The semaphore manages a FIFO queue of waiters. It does not use the exception state, but it does use the other three states:

  • Waiting: a guest with a buzzer that hasn't gone off yet
  • Holding a result: a guest who has been buzzed
  • Cancelled: a guest who returns their buzzer before it goes off

Fairness is supposed to be ensured by always appending a new Future to the end of the queue when acquire() finds the semaphore locked, and by always marking the leftmost (i.e., oldest) Future in the queue as holding a result when release() is called while the queue isn't empty. The fairness bug was due to acquire() taking a shortcut when the Semaphore's level (the number of open tables) is nonzero. It should not do this when there are still Futures in the queue. In other words we were sometimes seating a newly arrived guest when there was an open table even though there was already a guest waiting.

Guess what caused the cancellation bug? The scenario where a Future is holding a result (guest with buzzer buzzing) but the task awaiting that Future gets cancelled (guest declining to be seated).
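Here is a sketch of that scenario against asyncio.Semaphore, assuming a Python version where the bug is fixed. The function name declined_seat() is invented. If the semaphore lost track of the table, the final acquire() would hang, so it is wrapped in wait_for() with an arbitrary one-second limit.

```python
import asyncio

async def declined_seat():
    sem = asyncio.Semaphore(1)
    await sem.acquire()                          # the only table is taken
    waiter = asyncio.create_task(sem.acquire())  # a guest takes a buzzer
    await asyncio.sleep(0)                       # the guest is now queued
    sem.release()                                # the buzzer goes off...
    waiter.cancel()                              # ...but the guest declines
    try:
        await waiter
    except asyncio.CancelledError:
        pass
    # The table should be open again for the next arrival.
    return await asyncio.wait_for(sem.acquire(), timeout=1)

if __name__ == "__main__":
    print(asyncio.run(declined_seat()))
```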

I struggled to visualize the state of the Semaphore for myself, with its level and FIFO queue of waiting Futures. I also struggled with the definition of locked(). If the level variable had been public I would have struggled with its semantics too. In the end I came up with the following definitions:

  • W, the list of waiting futures, or [f for f in queue if not f.done()]
  • R, the list of futures holding a result, or [f for f in queue if f.done() and not f.cancelled()]
  • C, the list of cancelled futures, or [f for f in queue if f.cancelled()]

and some invariants:

  • set(W + R + C) == set(queue) — all futures are either waiting, have a result, or are cancelled.
  • level >= len(R) — we must have at least as many open tables as there are guests holding buzzing buzzers.
  • define locked() as (len(W) > 0 or len(R) > 0 or level == 0) — we cannot immediately seat anyone unless (a) there are no guests waiting for their buzzer to go off, (b) there are no guests holding a buzzing buzzer,  and (c) there is at least one open table.
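These definitions translate almost directly into code. The sketch below is not the asyncio implementation: partition(), check_invariants(), locked() and snapshot() are standalone helpers I made up, operating on a plain deque of Futures and an integer level, to show the definitions on one hand-built snapshot.

```python
import asyncio
from collections import deque

def partition(queue):
    W = [f for f in queue if not f.done()]
    R = [f for f in queue if f.done() and not f.cancelled()]
    C = [f for f in queue if f.cancelled()]
    return W, R, C

def check_invariants(queue, level):
    W, R, C = partition(queue)
    assert set(W + R + C) == set(queue)  # every future is in exactly one bucket
    assert level >= len(R)               # a table is held for each buzzed guest
    return W, R, C

def locked(queue, level):
    W, R, _ = partition(queue)
    return len(W) > 0 or len(R) > 0 or level == 0

async def snapshot():
    loop = asyncio.get_running_loop()
    queue = deque(loop.create_future() for _ in range(3))
    queue[0].set_result(True)  # this guest has been buzzed
    queue[1].cancel()          # this guest returned the buzzer early
    level = 1                  # one open table, held for the buzzed guest
    W, R, C = check_invariants(queue, level)
    return (len(W), len(R), len(C),
            locked(queue, level),  # guests are queued: locked
            locked(deque(), 1),    # empty queue, open table: not locked
            locked(deque(), 0))    # empty queue, no table: locked

if __name__ == "__main__":
    print(asyncio.run(snapshot()))
```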

I leave you with a final link, to the current code.

Posted Wed Oct 5 06:39:00 2022 Tags:
Posted Thu Sep 8 22:00:00 2022 Tags: