This feed omits posts by jwz. Just 'cause.

Denmark is passing a law to privilege religion by making it a crime to burn, cut or dirty a religious scripture.

This is a surrender to bullying by people who demand we all bow down to their religion.

Posted Sat Dec 9 05:40:42 2023 Tags:

A court ordered Georgia Republicans to create another majority-black election district, and they did, but they cheated: they made a new one by taking apart an old one.

Posted Sat Dec 9 05:40:42 2023 Tags:

After a century of hostility following the war between them, Greece and Turkey have made a friendship agreement in an effort to put that hostility behind them.

Posted Sat Dec 9 05:40:42 2023 Tags:

Biden is thinking of legally imposing a lower price on insulin. Sanders calls for going well beyond that, including reducing prices for other drugs whose development was funded by public funds.

I agree, but we need more than that. We must prevent patents from being a tool to impose unconscionable prices on drugs sold in the US or elsewhere. The thorough way would be to eliminate patents from medicine.

Posted Sat Dec 9 05:40:42 2023 Tags:

*The Palestinian human rights organisation Al-Haq and the UK-based Global Legal Action Network have applied for a judicial review of the [UK] government's export licences for the sale of British weapons capable of being used in Israel’s action in Gaza.*

Posted Sat Dec 9 05:40:42 2023 Tags:

The Tory strangulation of Britain's National Health Service is achieving its effect: millions of Britons are now limited to private medical treatment. Often they cannot afford that, so they are without medical treatment at all.

I've asserted before that this was the Tories' goal. In the past, one could hope Labour would fix this. But Starmer refuses to tax the rich, and without more funds he will be unable to fix it.

Posted Sat Dec 9 05:40:42 2023 Tags:

*Biden’s rhetoric toward the Netanyahu government is toughening. But critics say his words aren’t backed up by the threat of action.*

I agree with those critics. Netanyahu's mass murder in Gaza has continued for weeks. Biden must insist that it stop.

Posted Sat Dec 9 05:40:42 2023 Tags:

*Texas judge rules woman with non-viable pregnancy can have an abortion.*

Posted Sat Dec 9 05:40:42 2023 Tags:

The Revolving Door Project reports that the head of the Commodity Futures Trading Commission, Rostin Behnam, is considering resigning early to work for business.

This raises the concern that his new employer might be paying him off for subtle past corruption, or paying for the use of his influence in subtle future corruption.

Posted Sat Dec 9 05:40:42 2023 Tags:

Spending time with one of the Tyre Extinguishers, who remind owners of SUVs what damage they do by buying and using an SUV.

Posted Sat Dec 9 05:40:42 2023 Tags:

Here are some examples of the intuitive insights I mentioned when talking about Beeping Booping Busy Beavers:

  • Fermat’s Last Theorem (as a conjecture) is an example of a level 1 question, answerable by an ordinary Busy Beaver

  • The Twin Primes Conjecture is an example of a level 2 question, which requires a level 2 Busy Beaver with access to an oracle that answers whether a level 1 beaver halts. The Beeping Busy Beaver equivalence clarifies this. Level 2 questions are generally more difficult than level 1 questions.

  • An example of a level 3 question is ‘Does there exist a C such that 3^A-2^B=C has an infinite number of solutions in the integers?’ The equivalence to Beeping Booping Beavers shows the level. Level 3 questions are generally more difficult than level 2 questions, but they’re thought about much less often in mathematics. I came up with this possibly original example because no simple famous one comes to mind.
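For concreteness, here is my own gloss of those levels in arithmetical-hierarchy terms (the formulas are my renderings, not from the original):

```latex
\begin{align*}
\text{Fermat (level 1, } \Pi_1\text{):} \quad
  & \forall\, a,b,c,n \in \mathbb{N}^{+} \;\;
    \big(n > 2 \implies a^n + b^n \neq c^n\big) \\
\text{Twin primes (level 2, } \Pi_2\text{):} \quad
  & \forall N \;\exists\, p > N \;\;
    \big(p \text{ and } p+2 \text{ both prime}\big) \\
\text{Level 3 example (} \Sigma_3\text{):} \quad
  & \exists C \;\forall N \;\exists\, A,B \;\;
    \big(\max(A,B) > N \;\wedge\; 3^A - 2^B = C\big)
\end{align*}
```

Each added quantifier alternation corresponds to one more level of halting oracle.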

Unrelated to that, I previously made a cool animation illustrating a possibly better fill pattern, along with code which generates it. There’s now a new plug-in system for Cura which should make it straightforward to add this pattern. If anyone implements that it would be much appreciated. People have criticized this pattern because it isn’t all curves like gyroid, and I initially assumed myself that should be changed, but now I’m not so sure. Printed gyroid isn’t really the mathematical gyroid: because of how layers work it gets very thin in places, which gives it points of weakness. And the vibration caused by sharp corners is much less of an issue now with input shaping and other 3d printer improvements. What matters most now is strength of support for the amount of support material used, and this pattern should score very well on that benchmark.

Posted Thu Dec 7 20:50:05 2023 Tags:

I previously babbled about possible antidotes to the insidious adversarial attacks on neural networks. In the interests of making something which is likely to work I’ve been having shower thoughts about how to detune my construction. Instead of increasing the number of bits an attacker needs to control from 2 to some factor of N, the goal is to increase it to 10 or so. That’s still a meaningful improvement and more likely to succeed. It may also result in big practical gains: although it isn’t all that much more resistant to true adversaries, it may ‘hallucinate’ a lot less against accidentally adversarial data which occurs just by chance.

Trivially we can make progress towards the goal by taking all the inputs, putting them into 3 different buckets, segregating everything into 3 different smaller neural networks which lead to 3 different outputs at the end and then making the one true output be the sum of those 3. This is straightforward to implement by designating a third of each layer to each of the buckets and zeroing out all connections between buckets, then on the final layer for the designated output value set the weight of the connection from one node in each bucket to 1 and the rest to zero.
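A minimal sketch of that layer masking in plain Python (the sizes, names, and random weights are my own illustration, not code from the post):

```python
import random

random.seed(0)
width, n_buckets = 12, 3
bucket = [i % n_buckets for i in range(width)]   # which bucket each unit belongs to

# an ordinary dense layer's weights...
W = [[random.gauss(0, 1) for _ in range(width)] for _ in range(width)]

# ...with every cross-bucket connection zeroed out
W_masked = [[w if bucket[i] == bucket[j] else 0.0
             for j, w in enumerate(row)] for i, row in enumerate(W)]

# final layer: weight 1 from exactly one node in each bucket, 0 elsewhere,
# so the one true output is the sum of the three sub-networks' outputs
out_w = [0.0] * width
for b in range(n_buckets):
    out_w[bucket.index(b)] = 1.0

# no information flows between buckets through the masked layer
assert all(W_masked[i][j] == 0.0
           for i in range(width) for j in range(width)
           if bucket[i] != bucket[j])
```

Stacking several such masked layers gives the three fully independent sub-networks described above.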

Obviously this increases adversary resistance but does so at substantial cost to accuracy. What we’d like to do is get both accuracy and adversary resistance by making it so that all inputs go into the next to last layer outputs but they do so via such wildly different paths that manipulating the original inputs doesn’t cause them all to change in tandem. Hopefully that results in a neural network which can be trained as normal and automatically has adversary resistance without a big hit in accuracy. I will now give a minimalist construction which has that structure.

Group the input values into six buckets. After that there will be phase 1, which is grouped into 15 different mini neural networks corresponding to the 15 different ways of picking exactly two of the input buckets to process. Next is a second phase which is also subdivided into 15 different mini neural networks. They correspond to the 15 different ways of taking three different things from the first phase such that each of the original buckets gets included exactly once, and those are taken as the inputs in the phase transition.

Concretely, let’s say that the input buckets are numbered 1 through 6. The first phase groups are A: 12, B: 13, C: 14, D: 15, E: 16, F: 23, G: 24, H: 25, I: 26, J: 34, K: 35, L: 36, M: 45, N: 46, O: 56. The second phase groupings are: AJO, AKN, ALM, BGO, BHN, BIM, CFO, CHL, CIK, DFN, DGL, DIJ, EFM, EGK, EHJ. This set of groupings has many beautiful mathematical properties: each of the original buckets goes into each of the final outputs exactly once, each bucket is paired with each other bucket in the first phase exactly once, and each first-phase group appears in the second phase exactly once with every other first-phase group it shares no bucket with.
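These second-phase groupings are, in effect, the 15 perfect matchings of the complete graph K6 on the six buckets, so their properties can be verified mechanically (my code, not from the post):

```python
from itertools import combinations
from collections import Counter

def perfect_matchings(vs):
    """All ways to partition vs into disjoint pairs."""
    if not vs:
        yield frozenset()
        return
    first, *rest = vs
    for partner in rest:
        remaining = [v for v in rest if v != partner]
        for sub in perfect_matchings(remaining):
            yield sub | {(first, partner)}

buckets = list(range(1, 7))
phase2 = list(perfect_matchings(buckets))          # the second-phase triples
assert len(phase2) == 15

# every triple covers each of the six buckets exactly once
assert all(len({v for e in m for v in e}) == 6 for m in phase2)

# each first-phase group (pair of buckets) appears in exactly 3 triples
edge_count = Counter(e for m in phase2 for e in m)
assert len(edge_count) == 15 and all(c == 3 for c in edge_count.values())

# each pair of disjoint first-phase groups shares exactly one triple
for e, f in combinations(edge_count, 2):
    if not set(e) & set(f):
        assert sum(1 for m in phase2 if e in m and f in m) == 1
```

The last check is the sense in which first-phase groups pair up exactly once: groups sharing a bucket can never appear in the same triple.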

Finally the last layer should take an input weight one from exactly one node in each of the phase 2 groups and all its other input connections are weight zero.

In order to keep this from simply repeating all the inputs at the end of the first phase and then doing all the computation in the second phase and hence not having any adversary resistance advantage any more it’s probably necessary to pinch things a bit, either by zeroing out a significant fraction of the values in the layer before the phase transition or limiting the depth of the second phase or both.

Posted Wed Dec 6 20:52:58 2023 Tags:

The Net Promoter Score (NPS) is a statistically questionable way to turn a set of 10-point ratings into a single number you can compare with other NPSes. That's not the good part.


To understand the good parts, first we have to start with humans. Humans have emotions, and those emotions are what they mostly use when asked to rate things on a 10-point scale.

Almost exactly twenty years ago, I wrote about sitting on a plane next to a musician who told me about music album reviews. The worst rating an artist can receive, he said, is a lukewarm one. If people think your music is neutral, it means you didn't make them feel anything at all. You failed. Someone might buy music that reviewers hate, or buy music that people love, but they aren't really that interested in music that is just kinda meh. They listen to music because they want to feel something.

(At the time I contrasted that with tech reviews in computer magazines (remember those?), and how negative ratings were the worst thing for a tech product, so magazines never produced them, lest they get fewer free samples. All these years later, journalism is dead but we're still debating the ethics of game companies sponsoring Twitch streams. You can bet there's no sponsored game that gets an actively negative review during 5+ hours of gameplay and still gets more money from that sponsor. If artists just want you to feel something, but no vendor will pay for a game review that says it sucks, I wonder what that says about video game companies and art?)

Anyway, when you ask regular humans, who are not being sponsored, to rate things on a 10-point scale, they will rate based on their emotions. Most of the ratings will be just kinda meh, because most products are, if we're honest, just kinda meh. I go through most of my days using a variety of products and services that do not, on any more than the rarest basis, elicit any emotion at all. Mostly I don't notice those. I notice when I have experiences that are surprisingly good, or (less surprisingly but still notably) bad. Or, I notice when one of the services in any of those three categories asks me to rate them on a 10-point scale.

The moment

The moment when they ask me is important. Many products and services are just kinda invisibly meh, most of the time, so perhaps I'd give them a meh rating. But if my bluetooth headphones are currently failing to connect, or I just had to use an airline's online international check-in system and it once again rejected my passport for no reason, then maybe my score will be extra low. Or if Apple releases a new laptop that finally brings back a non-sucky keyboard after making laptops with sucky keyboards for literally years because of some obscure internal political battle, maybe I'll give a high rating for a while.

If you're a person who likes manipulating ratings, you'll figure out what moments are best for asking for the rating you want. But let's assume you're above that sort of thing, because that's not one of the good parts.

The calibration

Just now I said that if I'm using an invisible meh product or service, I would rate it with a meh rating. But that's not true in real life, because even though I was having no emotion about, say, Google Meet during a call, when they ask me afterward how it was, that makes me feel an emotion after all. Maybe that emotion is "leave me alone, you ask me this way too often." Or maybe I've learned that if I pick anything other than five stars, I get a clicky multi-tab questionnaire that I don't have time to answer, so I almost always pick five stars unless the experience was so bad that I feel it's worth an extra minute because I simply need to tell the unresponsive and uncaring machine how I really feel.

Google Meet never gets a meh rating. It's designed not to. In Google Meet, meh gets five stars.

Or maybe I bought something from Amazon and it came with a thank-you card begging for a 5-star rating (this happens). Or a restaurant offers free stuff if I leave a 5-star rating and prove it (this happens). Or I ride in an Uber and there's a sign on the back seat talking about how they really need a 5-star rating because this job is essential so they can support their family and too many 4-star ratings get them disqualified (this happens, though apparently not at UberEats). Okay. As one of my high school teachers, Physics I think, once said, "A's don't cost me anything. What grade do you want?" (He was that kind of teacher. I learned a lot.)

I'm not a professional reviewer. Almost nobody you ask is a professional reviewer. Most people don't actually care; they have no basis for comparison; just about anything will influence their score. They will not feel bad about this. They're just trying to exit your stupid popup interruption as quickly as possible, and half the time they would have mashed the X button instead but you hid it, so they mashed this one instead. People's answers will be... untrustworthy at best.

That's not the good part.

And yet

And yet. As in so many things, randomness tends to average out, probably into a Gaussian distribution, says the Central Limit Theorem.

The Central Limit Theorem is the fun-destroying reason that you can't just average 10-point ratings or star ratings and get something useful: most scores are meh, a few are extra bad, a few are extra good, and the next thing you know, every Uber driver is a 4.997. Or you can ship a bobcat one in 30 times and still get 97% positive feedback.

There's some deep truth hidden in NPS calculations: that meh ratings mean nothing, that the frequency of strong emotions matters a lot, and that deliriously happy moments don't average out disastrous ones.

Deming might call this the continuous region and the "special causes" (outliers). NPS is all about counting outliers, and averages don't work on outliers.

The degrees of meh

Just kidding, there are no degrees of meh. If you're not feeling anything, you're just not. You're not feeling more nothing, or less nothing.

One of my friends used to say, on a scale of 6 to 9, how good is this? It was a joke about how nobody ever gives a score less than 6 out of 10, and nothing ever deserves a 10. It was one of those jokes that was never funny because they always had to explain it. But they seemed to enjoy explaining it, and after hearing the explanation the first several times, that part was kinda funny. Anyway, if you took the 6-to-9 instructions seriously, you'd end up rating almost everything between 7 and 8, just to save room for something unimaginably bad or unimaginably good, just like you did with 1-to-10, so it didn't help at all.

And so, the NPS people say, rather than changing the scale, let's just define meaningful regions in the existing scale. Only very angry people use scores like 1-6. Only very happy people use scores like 9 or 10. And if you're not one of those you're meh. It doesn't matter how meh. And in fact, it doesn't matter much whether you're "5 angry" or "1 angry"; that says more about your internal rating system than about the degree of what you experienced. Similarly with 9 vs 10; it seems like you're quite happy. Let's not split hairs.

So with NPS we take a 10-point scale and turn it into a 3-point scale. The exact opposite of my old friend: you know people misuse the 10-point scale, but instead of giving them a new 3-point scale to misuse, you just postprocess the 10-point scale to clean it up. And now we have a 3-point scale with 3 meaningful points. That's a good part.
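As a sketch, the postprocessing is tiny (the 1–6 / 7–8 / 9–10 boundaries are NPS's usual buckets; the function and variable names are mine):

```python
from collections import Counter

def classify(score):
    """Collapse a 1-10 rating into the three meaningful points."""
    if score <= 6:
        return "angry"     # the bad outliers
    if score <= 8:
        return "meh"       # the big, emotionless middle
    return "happy"         # the delighted outliers

ratings = [10, 7, 9, 3, 8, 8, 10, 5, 9, 8]
counts = Counter(classify(r) for r in ratings)

# three separate numbers; resist the urge to average or subtract them
print(counts["angry"], counts["meh"], counts["happy"])  # 2 4 4
```

Reporting the three counts separately is the point; the subtraction step comes later, and (as argued below) you can skip it.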


So then what? Average out the measurements on the newly calibrated 1-2-3 scale, right?

Still no. It turns out there are three kinds of people: the ones so mad they will tell everyone how mad they are about your thing; the ones who don't care and will never think about you again if they can avoid it; and the ones who had such an over-the-top amazing experience that they will tell everyone how happy they are about your thing.

NPS says, you really care about the 1s and the 3s, but averaging them makes no sense. And the 2s have no effect on anything, so you can just leave them out.

Cool, right?

Pretty cool. Unfortunately, that's still two valuable numbers but we promised you one single score. So NPS says, let's subtract them! Yay! Okay, no. That's not the good part.

The threefold path

I like to look at it this way instead. First of all, we have computers now, we're not tracking ratings on one of those 1980s desktop bookkeeping printer-calculators, you don't have to make every analysis into one single all-encompassing number.

Postprocessing a 10-point scale into a 3-point one, that seems pretty smart. But you have to stop there. Maybe you now have three separate aggregate numbers. That's tough, I'm sorry. Here's a nickel, kid, go sell your personal information in exchange for a spreadsheet app. (I don't know what you'll do with the nickel. Anyway I don't need it. Here. Go.)

Each of those three rating types gives you something different you can do in response:

  • The ones had a very bad experience, which is hopefully an outlier, unless you're Comcast or the New York Times subscription department. Normally you want to get rid of every bad experience. The absence of awful isn't greatness, it's just meh, but meh is infinitely better than awful. Eliminating negative outliers is a whole job. It's a job filled with Deming's special causes. It's hard, and it requires creativity, but it really matters.

  • The twos had a meh experience. This is, most commonly, the majority. But perhaps they could have had a better experience. Perhaps even a great one? Deming would say you can and should work to improve the average experience and reduce the standard deviation. That's the dream; heck, what if the average experience could be an amazing one? That's rarely achieved, but a few products achieve it, especially luxury brands. And maybe that Broadway show, Hamilton? I don't know, I couldn't get tickets, because everyone said it was great so it was always sold out and I guess that's my point.

    If getting the average up to three is too hard or will take too long (and it will take a long time!), you could still try to at least randomly turn a few of them into threes. For example, they say users who have a great customer support experience often rate a product more highly than the ones who never needed to contact support at all, because the support interaction made the company feel more personal. Maybe you can't afford to interact with everyone, but if you have to interact anyway, perhaps you can use that chance to make it great instead of meh.

  • The threes already had an amazing experience. Nothing to do, right? No! These are the people who are, or who can become, your superfan evangelists. Sometimes that happens on its own, but often people don't know where to put that excess positive energy. You can help them. Pop stars and fashion brands know all about this; get some true believers really excited about your product, and the impact is huge. This is a completely different job than turning ones into twos, or twos into threes.

What not to do

Those are all good parts. Let's ignore that unfortunately they aren't part of NPS at all and we've strayed way off topic.

From here, there are several additional things you can do, but it turns out you shouldn't.

Don't compare scores with other products. I guarantee you, your methodology isn't the same as theirs. The slightest change in timing or presentation will change the score in incomparable ways. You just can't. I'm sorry.

Don't reward your team based on aggregate ratings. They will find a way to change the ratings. Trust me, it's too easy.

Don't average or difference the bad with the great. The two groups have nothing to do with each other, require completely different responses (usually from different teams), and are often very small. They're outliers after all. They're by definition not the mainstream. Outlier data is very noisy and each terrible experience is different from the others; each deliriously happy experience is special. As the famous writer said, all meh families are alike.

Don't fret about which "standard" rating ranges translate to bad-meh-good. Your particular survey or product will have the bad outliers, the big centre, and the great outliers. Run your survey enough and you'll be able to find them.

Don't call it NPS. NPS nowadays has a bad reputation. Nobody can really explain the bad reputation; I've asked. But they've all heard it's bad and wrong and misguided and unscientific and "not real statistics" and gives wrong answers and leads to bad incentives. You don't want that stigma attached to your survey mechanic. But if you call it a satisfaction survey on a 10-point or 5-point scale, tada, clear skies and lush green fields ahead.

Bonus advice

Perhaps the neatest thing about NPS is how much information you can get from just one simple question that can be answered with the same effort it takes to dismiss a popup.

I joked about Google Meet earlier, but I wasn't really kidding; after having a few meetings, if I had learned that I could just rate from 1 to 5 stars and then not get guilted for giving anything other than 5, I would do it. It would be great science and pretty unobtrusive. As it is, I lie instead. (I don't even skip, because it's faster to get back to the menu by lying than by skipping.)

While we're here, only the weirdest people want to answer a survey that says it will take "just 5 minutes" or "just 30 seconds." I don't have 30 seconds, I'm busy being mad/meh/excited about your product, I have other things to do! But I can click just one single star rating, as long as I'm 100% confident that the survey will go the heck away after that. (And don't even get me started about the extra layer in "Can we ask you a few simple questions about our website? Yes or no")

Also, don't be the survey that promises one question and then asks "just one more question." Be the survey that gets a reputation for really truly asking that one question. Then ask it, optionally, in more places and more often. A good role model is those knowledgebases where every article offers just thumbs up or thumbs down (or the default of no click, which means meh). That way you can legitimately look at aggregates or even the same person's answers over time, at different points in the app, after they have different parts of the experience. And you can compare scores at the same point after you update the experience.

But for heaven's sake, not by just averaging them.

Posted Tue Dec 5 05:01:12 2023 Tags:

As part of the clustering algorithms stuff I sometimes noodle on, I’m trying to figure out how to define a toroidal-ish space which wraps around in a way corresponding to the kissing number for each number of dimensions, and which is straightforward and efficient to implement in software. I’m probably reinventing the wheel here, with the possible exception of keeping an eye on how to implement things efficiently in software; the details are probably mathematically trivial. But this is all new to me and isn’t written up anywhere in a way which I could understand, so here’s my presentation of it.

(I do this under the assumption that it will result in k+1 things magically packing equidistantly and nicely colorable, a property I will simply claim happens without any justification whatsoever.)

Despite your existing familiarity with sphere packing in 2 and 3 dimensions, it’s most insightful to start with 4. In 4 dimensions you start with a grid. This isn’t great, because each thing is only adjacent to 8 others, and it also has a problem similar to the one hyperspheres have: each point has a clear antipode, a single point most distant from it, when for packing purposes everything else should be about equidistant. Like with hyperspheres, the fix is to glue each point to its antipode, so you add in another grid where everything is offset by 1/2 in each dimension. In 4 dimensions this results in each point having 16 diagonals (because each dimension goes up or down by 1/2), which by the Pythagorean Theorem are magically of length 1, so they’re the same distance as the other 8, for a total of 24.

Happily 24 is exactly the kissing number for 4 dimensions. This construction doesn’t quite work verbatim for less than 4 dimensions because then the diagonals wind up being less than 1 unit away. That can be fixed by stretching out exactly one of the dimensions to make the diagonals all unit length resulting in face centered packing for 3 dimensions and the standard optimal packing for 2 dimensions.
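Enumerating the neighbors in the 4-dimensional construction makes a nice sanity check (my code, not from the post):

```python
from itertools import product

# the 8 unit grid steps in 4 dimensions...
neighbors = []
for i in range(4):
    for sign in (1.0, -1.0):
        v = [0.0] * 4
        v[i] = sign
        neighbors.append(tuple(v))

# ...plus the 16 half-offset diagonals (each coordinate goes up or down by 1/2)
for signs in product((0.5, -0.5), repeat=4):
    neighbors.append(signs)

assert len(neighbors) == 24                    # the kissing number in 4 dimensions
for v in neighbors:
    assert sum(x * x for x in v) == 1.0        # every neighbor at distance exactly 1
```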

This way of viewing things is also very nice from an implementation standpoint. To find the shortest distance from the origin to a particular point, independently figure out for each dimension whether to get there in the positive or negative direction, to find the closest version of the point; then do the same thing for the point offset by one half in each dimension, and take the shorter of those two vectors.
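A minimal sketch of that distance computation (names are mine; coordinates live in a unit cell of period 1):

```python
def wrapped_delta(x):
    """Shortest signed step to x on a circle of circumference 1."""
    return x - round(x)

def torus_distance(p, q):
    """Shortest distance in the doubled (grid plus half-offset copy) torus."""
    # closest version of q reachable directly...
    d1 = sum(wrapped_delta(a - b) ** 2 for a, b in zip(p, q))
    # ...and closest version of q offset by 1/2 in every dimension
    d2 = sum(wrapped_delta(a - b + 0.5) ** 2 for a, b in zip(p, q))
    return min(d1, d2) ** 0.5

# the half-offset copy glues each grid point to its antipode:
print(torus_distance((0, 0, 0, 0), (0.5, 0.5, 0.5, 0.5)))  # 0.0
```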

Above 4 dimensions this construction breaks down because the diagonals are already more than a unit away from the beginning. Again it’s easiest to understand what happens next by jumping up a bit. In 8 dimensions there is the E8 group. To construct that you knock out all the points in the grid where the sum of their offsets in each dimension is odd. That leaves all the major diagonals at length square root of 2, and leaves in minor diagonals, where you offset on the grid in exactly two dimensions, which are also length square root of 2. That makes the total number of kissing spheres (2^8)/2 + (8*7/2)*2^2 = 240, and we’ve got the optimal solution.
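That count can be checked by brute force (my code, reading the construction as the even-sum grid plus its half-offset copy):

```python
from itertools import product

minimal = []

# integer grid points with even coordinate sum and squared length 2:
# exactly two coordinates are +-1, giving the minor diagonals
for v in product((-1, 0, 1), repeat=8):
    if sum(v) % 2 == 0 and sum(x * x for x in v) == 2:
        minimal.append(v)
assert len(minimal) == 112            # (8*7/2) * 2^2

# half-offset points: all coordinates +-1/2 with an even number of minus
# signs (the survivors of knocking out odd offset sums), the major diagonals
for v in product((-0.5, 0.5), repeat=8):
    if sum(1 for x in v if x < 0) % 2 == 0:
        minimal.append(v)

assert len(minimal) == 240            # the kissing number of E8
```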

This is also straightforward to implement by doing the same thing as with 4 dimensions and below, but keep track of the number of times a point was flipped to the other side and if it’s odd in the end you have to undo whichever flipping made the least gains.

Like in the earlier case the problem with doing the same thing in a lower number of dimensions is that the diagonals are too short so you have to stretch out one of the dimensions to make them the same length as the other points. In 5 dimensions this nails it again, with 16 major diagonals and 24 minor diagonals for a total of 40 which is the best known, and in 6 dimensions it likewise nails it with 32+40=72.

In 7 dimensions things break down and I get horribly confused. This construction yields 124, but the best known is 126. This is likely too small a difference to matter in practice, but it’s very strange, and it would be nice if someone could explain to me where those extra 2 come from: some strange other diagonal, a completely different construction, or some much less symmetric smushing of the pieces around to squeeze in 2 more.

Above 8 dimensions I’m a bit lost. Presumably you can drop everything which isn’t 0 mod 3 instead of 0 mod 2 which will result in many more minor diagonals, but that seems like it isn’t a lattice any more and is unlikely to be able to nail constructing the Leech Lattice. But at least it’s still reasonably easy to implement, albeit with a few more caveats. It seems like a crude but vaguely ballpark estimate for kissing numbers is n choose n/2 divided by n/2.

It might not matter, of course. 8 is already a fairly high number of dimensions for what I’m trying to do, and I haven’t even benchmarked these against hyperspheres. It seems likely that their potential benefits have already run out at that scale, or maybe fall off a cliff after the magical construction at 8.

Posted Wed Nov 29 00:43:19 2023 Tags:
Responding to a recent blog post. #nist #uncertainty #errorbars #quantification
Posted Sat Nov 25 18:57:43 2023 Tags:

Busy Beaver numbers are the classic example of a well defined noncomputable function. The question is: For a given number of states of a Turing Machine, what’s the maximum number of ones which can be in its final output when it halts? If we’re playing the ‘name the biggest number’ game, this can be used to easily beat everything people normally come up with.

Beeping Busy Beaver numbers grow profoundly faster even than Busy Beaver numbers. For them, instead of having one of the state transitions be a halt, make one of the state transitions emit a ‘beep’. For a given machine the number is the number of steps it goes through before it emits its final beep. The rate of growth of these numbers is comparable to the rate of growth for machines which are given access to an oracle which can determine whether a given Turing Machine halts, but obviously the beeping construction is much more elegant. (Other formulations specify a particular state rather than a state transition as emitting the beep. Since the transition number is always at least as large and possibly much larger, I think it’s the better thing to go with.)

Now you may wonder: Can we go larger? Is there an elegant construction which corresponds in power to having a Turing Machine which can access an oracle which tells it whether a given Turing Machine which has access to whether a regular Turing Machine halts in turn halts? Sorry for how hard it is to parse that sentence, we’re looking for the level above Beeping Busy Beavers. It turns out the answer is yes there is, by having a Beeping Booping Busy Beaver, which is the new idea I’m presenting here.

Like a Beeping Busy Beaver, a Beeping Booping Busy Beaver never halts. It has two state transitions which are special, one which emits a ‘beep’ and one which emits a ‘boop’. Its output is interpreted by counting the number of beeps between each successive boop, resulting in a series of integers. To calculate its number, find the first output value which is later repeated an infinite number of times and count the number of steps it took to first finish that output.
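The output convention can be illustrated on a finite prefix of a run (a toy only: the machine's actual number depends on which value repeats infinitely often, which no finite prefix can settle):

```python
def boop_counts(events):
    """Count the beeps between each pair of successive boops."""
    counts, beeps = [], 0
    for e in events:
        if e == "beep":
            beeps += 1
        elif e == "boop":
            counts.append(beeps)   # close out the current run of beeps
            beeps = 0
    return counts

# a toy event stream from some never-halting machine
stream = ["beep", "boop", "beep", "beep", "boop", "boop", "beep", "boop"]
print(boop_counts(stream))  # [1, 2, 0, 1]
```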

Proving that this is computationally equivalent is left as an exercise to the reader, mostly because I don’t know how to do it, but Scott Aaronson assures me that it does in fact work. As a mathematician I’m a lot better at constructing things than proving things.

A few things jump out here. First of all the Beeping Booping construction gives some insights into which number theory questions correspond to which level of Turing Machine, which I for one didn’t realize before, so that’s an actually useful output of this silliness. Also it seems obvious that there should be some kind of Beep Boop Buzz construction which goes one higher, but oddly I have no idea how to construct that, so maybe mathematicians only rarely ask questions of that level.

Despite it not being obvious how to add in a Buzz there is a clear pattern going on here. Each Beeping Busy Beaver is a Turing Machine which was disqualified from having a Busy Beaver number because it never halts. Likewise each Beep Boop Beaver was disqualified from having a beeping number because it never stops beeping. Maybe a Beep Boop Buzz Beaver is a failed Beep Boop Beaver because each of its boop counts doesn’t repeat infinitely, but it isn’t at all obvious how that should work.

The funny thing about these higher level machines is that they aren’t new machines at all, they’re simply new ways of interpreting the behavior of regular Turing Machines. When something simply looks ‘messy’ you haven’t fully grokked the meaning of its output, and maybe there’s something very coherent which it’s doing if you think about it in the right way.

Posted Fri Nov 24 23:05:58 2023 Tags:

This is a model of how most people model probability. It came from the same place as most of the magic numbers I come up with. I have a lot of fiber in my diet. But my magic numbers seem to work well, and this suggests some concrete predictions which could be studied, so it may be worth considering.

In English we have a few words for probability which people have a reasonable internal model of: Definitely not, probably not, maybe, probably yes, definitely yes. These correspond to 0%, 10%, 50%, 90%, and 100% respectively. People round off to these with 0-5% going to 0%, 6-20% going to 10%, 21-79% to 50%, 80-94% to 90%, and 95-100% to 100%.
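The rounding model above can be written as a small function. As a sketch, the cutoffs here are the post's own estimated magic numbers, not empirically validated constants:

```python
def verbal_probability(p):
    """Map a probability in [0, 1] to the verbal bucket people
    round to, per the cutoffs above: 0-5% -> 0%, 6-20% -> 10%,
    21-79% -> 50%, 80-94% -> 90%, 95-100% -> 100%."""
    pct = p * 100
    if pct <= 5:
        return "definitely not"   # rounds to 0%
    if pct <= 20:
        return "probably not"     # rounds to 10%
    if pct < 80:
        return "maybe"            # rounds to 50%
    if pct < 95:
        return "probably yes"     # rounds to 90%
    return "definitely yes"       # rounds to 100%
```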

When people say something will ‘maybe’ happen they’re expressing that they will feel no emotional response when it goes either way. Most people have no meaningful ability to gauge changes in expected value over this range, and will often be dismissive of the difference between 40% and 60% even when it’s explained to them. Somehow the difference in expected value between 0% and 20% is much more meaningful than the difference between 40% and 60%. This shows up in bizarre ways in public discourse, where a road policy which causes an identifiable driver to die is viewed as murder, but a policy which causes a demonstrably much larger number of drivers to die, with the link being statistical, is viewed as the driver’s fault.

When people say something will ‘probably’ happen they mean they’ll be upset if it doesn’t go the way predicted. This fails in both directions: they’re overconfident in their own predictions about things which are merely more likely than not, and they get overly upset when ‘probable’ events fail to happen.

When people say something will ‘definitely’ happen they mean they’ll be shocked if it doesn’t. Numerous studies have been done about people failing to accurately estimate the chances of very unlikely events, viewing them as far less likely than they actually are, but that’s talking about people with some skill who are intellectually engaged in a very specific prediction. When most people are casually guessing chances they simply have no intuitive notion of probabilities below 1% and round down to 0 from chances even several times that.

When I say ‘people’ here I very much include myself. I have essentially no visceral sense of all quantitative values including time/distance/weight/value/speed etc. I can handle them well by reasoning and calculating but my instinctive ability to judge them is terrible.

Posted Thu Nov 23 00:45:34 2023 Tags:

There’s a family of games, including gin, crazy eights, and straight dominos, which involve a lot of spoiling the other player’s plans and inferring what they hold. Unfortunately it’s usually fairly hard to deduce what the opponent has, unlike in Holdem where they’re forced to hint strongly. Here is an idea for a game of this genre with the maximum amount of range inference possible:

2 players. There’s a deck of 16 cards, each of which has one of four colors in the center and one of four colors on the outer edge, with each combination occurring exactly once. Players are dealt 7 cards each, with one card placed face up in the center. The remaining card is permanently out of the game. The first player is selected randomly and thereafter the players alternate. On a turn a player puts down one of their cards face up whose outer edge (the ‘bottom’ color) matches the center (the ‘top’) color of the card below it. The first player to not have a legal move loses.
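The setup and the legality rule can be sketched in a few lines. This is only an illustration; the color names and helper names are invented:

```python
import itertools
import random

COLORS = ["red", "green", "blue", "yellow"]

# Each card is a (top, bottom) pair: 'top' is the center color, 'bottom'
# the outer-edge color. Every ordered pair occurs exactly once: 16 cards.
DECK = list(itertools.product(COLORS, COLORS))

def legal_moves(hand, pile_top_card):
    """A card is playable if its outer-edge ('bottom') color matches
    the center ('top') color of the card showing on the pile."""
    return [card for card in hand if card[1] == pile_top_card[0]]

def deal(rng=random):
    """Shuffle, give 7 cards to each player, flip one card face up in
    the center; the last card is removed from the game and kept hidden."""
    deck = DECK[:]
    rng.shuffle(deck)
    return deck[:7], deck[7:14], deck[14], deck[15]
```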

The funny thing about this game is that if the discarded card is made public then it’s a game of perfect information and analyzing it is straightforward, but with that one piece of hidden information complete analysis goes bonkers.

(For those of you who have gotten this far and are wondering: This won’t be in the suite of games I’m working on supporting on chain. It involves way too many turns and is of a very different flavor, where what I’m supporting are of necessity very few turns and are Nash equilibrium games which require mixed strategies, because that’s the flavor of Poker.)

Posted Sun Nov 19 19:08:16 2023 Tags:

People have been curious whether the recent Lightning attack applies to the state channel work I’m doing right now. The answer is no, the attack only applies at all when you’re routing payments and I’m not doing that yet, but there is a question of what should happen for the future.

The attack is a form of transaction bumping. There’s some transaction in the mempool which spends coin A which you wish to evict, so you make a higher fee transaction which spends both A and B to get it out, then make a still higher fee transaction spending just B, and now the transaction spending A has magically disappeared from the mempool. In principle full nodes could simply cache the transaction spending A and try to reapply it when the second bump happens, and they absolutely should do that, but Bitcoin full nodes don’t do that today.
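The eviction sequence can be demonstrated on a toy mempool with naive replace-by-fee. This is a deliberate oversimplification of real node policy, purely to show the shape of the attack; all names here are invented:

```python
def add_tx(mempool, txid, coins, fee):
    """Naive replace-by-fee: a new transaction evicts any conflicting
    transaction (one sharing a spent coin) if it pays a higher fee.
    Real mempool policies are far more involved; this only illustrates
    the bumping sequence described above."""
    conflicts = {t for t, (spent, f) in mempool.items() if spent & coins}
    if any(mempool[t][1] >= fee for t in conflicts):
        return False  # rejected: doesn't outbid what it would evict
    for t in conflicts:
        del mempool[t]
    mempool[txid] = (coins, fee)
    return True

mempool = {}
add_tx(mempool, "tx1", {"A"}, fee=1)       # victim: spends A
add_tx(mempool, "tx2", {"A", "B"}, fee=2)  # bump: evicts tx1
add_tx(mempool, "tx3", {"B"}, fee=3)       # bump again: evicts tx2
# Now no transaction in the mempool spends A at all.
assert all("A" not in spent for spent, _ in mempool.values())
```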

The way routed payments work is you have two HTLC coins in a route from A→B→C, one for A→B and one for B→C. The way HTLCs work is they can be pulled right immediately using a secure hash preimage reveal or to the left after a timeout. B’s strategy is to reuse the same preimage from B→C to claim the coin from A→B if C makes that claim. Otherwise B uses the timeout to claim the B→C coin, and the timeout on that coin is set shorter than the A→B coin to ensure that B is always protected. To exploit this with a fee bumping attack, C can wait until the B→C timeout comes up, then repeatedly foil B’s attempt to claim the B→C coin by timeout with a transaction bumping attack. Then A can take the A→B payment when that timeout happens, and then C can claim the B→C payment using the secure hash preimage without B being able to reuse that to claim the A→B payment because it’s already gone. The result is that A and C successfully conspire to steal money from B.

There are numerous practical difficulties with pulling this off in practice. One can argue that many of those are dependent on mempool convention and hence outside the threat model of blockchains, but that isn’t true for collaboratively controlled coins. If A and C could control the complete mempool for all full nodes, B would be hosed no matter what the smart coin logic did, so some amount of security dependence on mempool behavior is necessary here. As it happens, in Chia the conventional mempool behavior already defends against transaction bumping attacks very well, because it simply refuses to add a new transaction which doesn’t replace the spends of every single coin which was spent before. This was introduced as a practical solution to the problem of transaction bumping being very easy in Chia, because any transaction can be trivially aggregated with any other transaction.
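The Chia-style rule — only accept a conflicting transaction if it re-spends every coin spent by each transaction it would evict — might be sketched like this. As a simplification with invented names, not actual Chia node code:

```python
def chia_style_add(mempool, txid, coins, fee):
    """Sketch of the replacement rule described above: a conflicting
    transaction is accepted only if it spends every coin spent by each
    transaction it would evict, and pays a higher fee. So a spend of
    {A, B} cannot later be displaced by a spend of just {B}."""
    conflicts = {t for t, (spent, f) in mempool.items() if spent & coins}
    for t in conflicts:
        spent, f = mempool[t]
        if not spent <= coins:
            return False  # would leave a previously-spent coin unspent
        if f >= fee:
            return False  # must also outbid what it evicts
    for t in conflicts:
        del mempool[t]
    mempool[txid] = (coins, fee)
    return True

pool = {}
chia_style_add(pool, "tx1", {"A"}, fee=1)
chia_style_add(pool, "tx2", {"A", "B"}, fee=2)  # ok: re-spends all of tx1's coins
# The bump back down to just {B} is refused, blocking the attack:
assert not chia_style_add(pool, "tx3", {"B"}, fee=3)
```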

The downside of this strategy is that it allows a strange form of transaction pinning, where you can make a transaction which spends coins A and B with a low fee which then can’t get bumped by a transaction spending just A, regardless of how high the fee is set. This allows a funny attack where you bump a transaction by pinning it to a near zero fee when the mempool isn’t full, then fill up the mempool just enough to increase the fees to epsilon, and for a very small fee you’ve locked the other transaction out of the mempool.

We should probably modify this logic to allow transactions which weren’t completely replaced to get bumped if they don’t have a high enough fee to get into the next block. That allows transaction bumping, but only after a transaction has been bumped by overall fees getting set high enough, which is an expensive attack and an inherent issue anyway. Ideally transactions which get bumped from the outer parts of the mempool should get cached and reapplied, but as long as somebody somewhere tries to reintroduce all cached transactions after each new block is made you’re protected, and that’s a good thing for smart wallets to attempt to do for their own transactions anyway.

Posted Fri Nov 10 19:43:30 2023 Tags:

TLDR: see the title of this blog post, it's really that trivial.

Now that GodotWayland has been coming for ages and all new development focuses on a pile of software that steams significantly less, we're seeing cracks appear in the old Xorg support. Not intentionally, but there's only so much time that can be spent on testing, and the more niche things fall through. One of these was a bug I just had the pleasure of debugging, triggered by a GNOME on Xorg user using the xf86-input-libinput driver for tablet devices.

On the surface of it, this should be fine because libinput (and thus xf86-input-libinput) handles tablets just fine. But libinput is the new kid on the block. The old kid on said block is the xf86-input-wacom driver, older than libinput by slightly over a decade. And oh man, history has baked things into the driver that are worse than raisins in apple strudel [1].

The xf86-input-libinput driver was written as a wrapper around libinput and makes use of fancy things that (from libinput's POV) have always been around: things like input device hotplugging. Fancy, I know. For tablet devices the driver creates an X device for each new tool when it first comes into proximity. Future events from that tool will go through that device. A second tool, be it a new pen or the eraser on the original pen, will create a second X device and events from that tool will go through that X device. Configuration on any device will thus only affect that particular pen. Almost like the whole thing makes sense.

The wacom driver of course doesn't do this. It pre-creates X devices for some possible types of tools (pen, eraser, and cursor [2] but not airbrush or artpen). When a tool goes into proximity the events are sent through the respective device, i.e. all pens go through the pen tool, all erasers through the eraser tool. To actually track pens there is the "Wacom Serial IDs" property that contains the current tool's serial number. If you want to track multiple tools you need to query the property on proximity in [4]. At the time this was within a reasonable error margin of a good idea.

Of course, and because MOAR CONFIGURATION! will save us all from the great filter, you can specify the "ToolSerials" xorg.conf option as e.g. "airbrush;12345;artpen" and get some extra X devices pre-created, in this case an airbrush and an artpen X device and an X device just for the tool with the serial number 12345. All other tools multiplex through the default devices. Again, at the time this was a great improvement. [5]
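For illustration, such an option might appear in an xorg.conf snippet roughly like this. Only the "ToolSerials" value comes from the text above; the InputClass matching lines are illustrative, so check the wacom driver's man page before copying:

```
Section "InputClass"
    Identifier   "Wacom tablet tool serials"
    MatchDriver  "wacom"
    # Pre-create an airbrush device, an artpen device, and a dedicated
    # X device for the pen with serial number 12345.
    Option       "ToolSerials" "airbrush;12345;artpen"
EndSection
```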

Anyway, where was I? Oh, right. The above should serve as a good approximation of a reason why the xf86-input-libinput driver does not try to be fully compatible with the xf86-input-wacom driver. In everyday use these things barely matter [6] but for the desktop environment, which needs to configure these devices, all these differences mean multiple code paths. Those paths need to be tested but they aren't, so things fall through the cracks.

So quite a while ago, we made the decision that until Xorg goes dodo, the xf86-input-wacom driver is the tablet driver to use in GNOME. So if you're using a GNOME on Xorg session [7], do make sure the xf86-input-wacom driver is installed. It will make both of us happier and that's a good aim to strive for.

[1] It's just a joke. Put the pitchforks down already.
[2] The cursor is the mouse-like thing Wacom sells. Which is called cursor [3] because the English language has a limited vocabulary and we need to re-use words as much as possible lest we run out of them.
[3] It's also called puck. Because [2].
[4] And by "query" I mean "wait for the XI2 event notifying you of a property change". Because of lolz the driver cannot update the property on proximity in but needs to schedule that as idle func so the property update for the serial always arrives at some unspecified time after the proximity in but hopefully before more motion events happen. Or not, and that's how hope dies.
[5] Think about this next time someone says they long for some unspecified good old days.
[6] Except the strip axis, which on the wacom driver happily moves left/right as your finger moves up/down on the touch strip, and any X client needs to know this. libinput normalizes this to... well, a normal value, but now the X client needs to know which driver is running so, oh deary deary.
[7] e.g. because you're stockholmed into it by your graphics hardware

Posted Fri Nov 10 03:22:00 2023 Tags: