The lower-post-volume people behind the software in Debian. (List of feeds.)
Here are some examples of the intuitive insights I mentioned when talking about Beeping Booping Busy Beavers:
Fermat’s Last Theorem (as a conjecture) is an example of a level 1 question, answerable by an ordinary Busy Beaver.
The Twin Primes Conjecture is an example of a level 2 question, which requires a level 2 Busy Beaver which has access to a level 1 Busy Beaver oracle which answers questions about whether a level 1 beaver halts. The Beeping Busy Beaver equivalence clarifies this. Level 2 questions are generally more difficult than level 1 questions.
An example of a level 3 question is ‘Does there exist a C such that 3^A-2^B=C has an infinite number of solutions in the integers?’ The equivalence to Beeping Booping Beavers shows the level. Level 3 questions are generally more difficult than level 2 questions, but they’re thought about much less often in mathematics. I came up with this possibly original example because no simple famous one comes to mind.
Unrelated to that I previously made a cool animation illustrating a possibly better fill pattern along with code which generates it. There’s now a new plug-in system for Cura which should make it straightforward to add this pattern. If anyone implements that it would be much appreciated. People have criticized this pattern because it isn’t all curves like gyroid, and I initially assumed myself that should be changed, but now I’m not so sure. Gyroid isn’t really the mathematical gyroid because it gets very thin in places because of how layers work which gives it points of weakness, and the vibration of having those sharp corners is much less of an issue with input shaping and other 3d printer improvements. What matters most now is strength of support for the amount of support material used and this pattern should score very well on that benchmark.
I previously babbled about possible antidotes to the insidious adversarial attacks on neural networks. In the interests of making something which is likely to work I’ve been having shower thoughts about how to detune my construction. Instead of increasing the number of bits an attacker needs to control from 2 to some factor of N the goal is to increase it to 10 or so. That’s still a meaningful improvement and more likely to succeed. It may also result in big practical gains because although it isn’t all that much more resistant to true adversaries it may ‘hallucinate’ a lot less against accidentally adversarial data which occurs just by chance.
Trivially we can make progress towards the goal by taking all the inputs, putting them into 3 different buckets, segregating everything into 3 different smaller neural networks which lead to 3 different outputs at the end and then making the one true output be the sum of those 3. This is straightforward to implement by designating a third of each layer to each of the buckets and zeroing out all connections between buckets, then on the final layer for the designated output value set the weight of the connection from one node in each bucket to 1 and the rest to zero.
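A minimal sketch of the zeroing step, assuming a square layer whose nodes are split into three equal buckets by index. The names `bucket_mask` and `apply_mask` are made up for illustration and the weights are plain nested lists rather than any particular framework's tensors.

```python
def bucket_mask(width, buckets=3):
    """Return a width x width 0/1 matrix that is 1 only where the source
    node and the destination node belong to the same bucket.
    Assumption: node i belongs to bucket i * buckets // width."""
    owner = [i * buckets // width for i in range(width)]
    return [[1 if owner[i] == owner[j] else 0 for j in range(width)]
            for i in range(width)]

def apply_mask(weights, mask):
    """Zero out every connection that crosses between buckets."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]

# Example: a 6-node layer with all weights 0.5 keeps only three 2x2 blocks.
layer = [[0.5] * 6 for _ in range(6)]
masked = apply_mask(layer, bucket_mask(6))
```

In a real training loop the mask would have to be reapplied after every weight update (or multiplied in during the forward pass) so the cross-bucket connections stay at zero.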
Obviously this increases adversary resistance but does so at substantial cost to accuracy. What we’d like to do is get both accuracy and adversary resistance by making it so that all inputs go into the next to last layer outputs but they do so via such wildly different paths that manipulating the original inputs doesn’t cause them all to change in tandem. Hopefully that results in a neural network which can be trained as normal and automatically has adversary resistance without a big hit in accuracy. I will now give a minimalist construction which has that structure.
Group the input values into six buckets. After that there will be phase 1, which is grouped into 15 different mini neural networks corresponding to the 15 different ways of picking exactly two of the input buckets to process. Next is a second phase which is also subdivided into 15 different mini neural networks. They correspond to the 15 different ways of taking three different things from the first phase such that each of the original buckets gets included exactly once, and those are taken as the inputs in the phase transition.
Concretely let’s say that the input buckets are numbered 1 through 6. The first phase groups are A: 12, B: 13, C: 14, D: 15, E: 16, F: 23, G: 24, H: 25, I: 26, J: 34, K: 35, L: 36, M: 45, N: 46, O: 56. The second phase groupings are: AJO, AKN, ALM, BGO, BHN, BIM, CFO, CHL, CIK, DFN, DGL, DIJ, EFM, EGK, EHJ. This set of groupings has many beautiful mathematical properties, including that each of the original buckets goes into each of the final outputs exactly once, each bucket is paired with each other bucket in the first phase exactly once, and each first phase group is paired exactly once in the second phase with every other first phase group that shares no bucket with it.
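The groupings can be generated and checked mechanically. This is a small sketch using only the standard library; the variable names are illustrative, and the letter labels simply follow the lexicographic order used in the list above.

```python
from collections import Counter
from itertools import combinations
from string import ascii_uppercase

BUCKETS = range(1, 7)

# Phase 1: the 15 unordered pairs of buckets, labelled A..O in
# lexicographic order (A = 12, B = 13, ..., O = 56).
pairs = list(combinations(BUCKETS, 2))
label = {pair: ascii_uppercase[i] for i, pair in enumerate(pairs)}

# Phase 2: every way to pick three pairs that cover all six buckets once.
triples = [
    trio for trio in combinations(pairs, 3)
    if sorted(b for pair in trio for b in pair) == list(BUCKETS)
]
names = ["".join(label[pair] for pair in trio) for trio in triples]
print(names)

# How often each phase 1 group is used, and how often two phase 1 groups
# end up feeding the same phase 2 group.
uses = Counter(pair for trio in triples for pair in trio)
together = Counter(frozenset(duo) for trio in triples
                   for duo in combinations(trio, 2))
```

Every phase 1 group feeds exactly three phase 2 groups, and two phase 1 groups meet in phase 2 only when they have no bucket in common, in which case they meet exactly once.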
Finally the last layer should take an input weight one from exactly one node in each of the phase 2 groups and all its other input connections are weight zero.
In order to keep this from simply repeating all the inputs at the end of the first phase and then doing all the computation in the second phase and hence not having any adversary resistance advantage any more it’s probably necessary to pinch things a bit, either by zeroing out a significant fraction of the values in the layer before the phase transition or limiting the depth of the second phase or both.
The Net Promoter Score (NPS) is a statistically questionable way to turn a set of 10-point ratings into a single number you can compare with other NPSes. That's not the good part.
Humans
To understand the good parts, first we have to start with humans. Humans have emotions, and those emotions are what they mostly use when asked to rate things on a 10-point scale.
Almost exactly twenty years ago, I wrote about sitting on a plane next to a musician who told me about music album reviews. The worst rating an artist can receive, he said, is a lukewarm one. If people think your music is neutral, it means you didn't make them feel anything at all. You failed. Someone might buy music that reviewers hate, or buy music that people love, but they aren't really that interested in music that is just kinda meh. They listen to music because they want to feel something.
(At the time I contrasted that with tech reviews in computer magazines (remember those?), and how negative ratings were the worst thing for a tech product, so magazines never produced them, lest they get fewer free samples. All these years later, journalism is dead but we're still debating the ethics of game companies sponsoring Twitch streams. You can bet there's no sponsored game that gets an actively negative review during 5+ hours of gameplay and still gets more money from that sponsor. If artists just want you to feel something, but no vendor will pay for a game review that says it sucks, I wonder what that says about video game companies and art?)
Anyway, when you ask regular humans, who are not being sponsored, to rate things on a 10-point scale, they will rate based on their emotions. Most of the ratings will be just kinda meh, because most products are, if we're honest, just kinda meh. I go through most of my days using a variety of products and services that do not, on any more than the rarest basis, elicit any emotion at all. Mostly I don't notice those. I notice when I have experiences that are surprisingly good, or (less surprisingly but still notably) bad. Or, I notice when one of the services in any of those three categories asks me to rate them on a 10-point scale.
The moment
The moment when they ask me is important. Many products and services are just kinda invisibly meh, most of the time, so perhaps I'd give them a meh rating. But if my bluetooth headphones are currently failing to connect, or I just had to use an airline's online international check-in system and it once again rejected my passport for no reason, then maybe my score will be extra low. Or if Apple releases a new laptop that finally brings back a non-sucky keyboard after making laptops with sucky keyboards for literally years because of some obscure internal political battle, maybe I'll give a high rating for a while.
If you're a person who likes manipulating ratings, you'll figure out what moments are best for asking for the rating you want. But let's assume you're above that sort of thing, because that's not one of the good parts.
The calibration
Just now I said that if I'm using an invisible meh product or service, I would rate it with a meh rating. But that's not true in real life, because even though I was having no emotion about, say, Google Meet during a call, perhaps when they ask me (after every...single...call) how it was, that makes me feel an emotion after all. Maybe that emotion is "leave me alone, you ask me this way too often." Or maybe I've learned that if I pick anything other than five stars, I get a clicky multi-tab questionnaire that I don't have time to answer, so I almost always pick five stars unless the experience was so bad that I feel it's worth an extra minute because I simply need to tell the unresponsive and uncaring machine how I really feel.
Google Meet never gets a meh rating. It's designed not to. In Google Meet, meh gets five stars.
Or maybe I bought something from Amazon and it came with a thank-you card begging for a 5-star rating (this happens). Or a restaurant offers free stuff if I leave a 5-star rating and prove it (this happens). Or I ride in an Uber and there's a sign on the back seat talking about how they really need a 5-star rating because this job is essential so they can support their family and too many 4-star ratings get them disqualified (this happens, though apparently not at UberEats). Okay. As one of my high school teachers, Physics I think, once said, "A's don't cost me anything. What grade do you want?" (He was that kind of teacher. I learned a lot.)
I'm not a professional reviewer. Almost nobody you ask is a professional reviewer. Most people don't actually care; they have no basis for comparison; just about anything will influence their score. They will not feel badly about this. They're just trying to exit your stupid popup interruption as quickly as possible, and half the time they would have mashed the X button instead but you hid it, so they mashed this one instead. People's answers will be... untrustworthy at best.
That's not the good part.
And yet
And yet. As in so many things, randomness tends to average out, probably into a Gaussian distribution, says the Central Limit Theorem.
The Central Limit Theorem is the fun-destroying reason that you can't just average 10-point ratings or star ratings and get something useful: most scores are meh, a few are extra bad, a few are extra good, and the next thing you know, every Uber driver is a 4.997. Or you can ship a bobcat one in 30 times and still get 97% positive feedback.
There's some deep truth hidden in NPS calculations: that meh ratings mean nothing, that the frequency of strong emotions matters a lot, and that deliriously happy moments don't average out disastrous ones.
Deming might call this the continuous region and the "special causes" (outliers). NPS is all about counting outliers, and averages don't work on outliers.
The degrees of meh
Just kidding, there are no degrees of meh. If you're not feeling anything, you're just not. You're not feeling more nothing, or less nothing.
One of my friends used to say, on a scale of 6 to 9, how good is this? It was a joke about how nobody ever gives a score less than 6 out of 10, and nothing ever deserves a 10. It was one of those jokes that was never funny because they always had to explain it. But they seemed to enjoy explaining it, and after hearing the explanation the first several times, that part was kinda funny. Anyway, if you took the 6-to-9 instructions seriously, you'd end up rating almost everything between 7 and 8, just to save room for something unimaginably bad or unimaginably good, just like you did with 1-to-10, so it didn't help at all.
And so, the NPS people say, rather than changing the scale, let's just define meaningful regions in the existing scale. Only very angry people use scores like 1-6. Only very happy people use scores like 9 or 10. And if you're not one of those you're meh. It doesn't matter how meh. And in fact, it doesn't matter much whether you're "5 angry" or "1 angry"; that says more about your internal rating system than about the degree of what you experienced. Similarly with 9 vs 10; it seems like you're quite happy. Let's not split hairs.
So with NPS we take a 10-point scale and turn it into a 3-point scale. The exact opposite of my old friend: you know people misuse the 10-point scale, but instead of giving them a new 3-point scale to misuse, you just postprocess the 10-point scale to clean it up. And now we have a 3-point scale with 3 meaningful points. That's a good part.
Evangelism
So then what? Average out the measurements on the newly calibrated 1-2-3 scale, right?
Still no. It turns out there are three kinds of people: the ones so mad they will tell everyone how mad they are about your thing; the ones who don't care and will never think about you again if they can avoid it; and the ones who had such an over-the-top amazing experience that they will tell everyone how happy they are about your thing.
NPS says, you really care about the 1s and the 3s, but averaging them makes no sense. And the 2s have no effect on anything, so you can just leave them out.
Cool, right?
Pretty cool. Unfortunately, that's still two valuable numbers but we promised you one single score. So NPS says, let's subtract them! Yay! Okay, no. That's not the good part.
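For concreteness, here is a rough sketch of the postprocessing described so far, assuming the cutoffs mentioned above (1-6 angry, 9-10 happy, everything else meh). The function names are invented for this example, and real surveys may draw the lines elsewhere.

```python
from collections import Counter

def region(score):
    """Collapse a 1-10 rating into one of three regions."""
    if score <= 6:
        return "angry"
    if score >= 9:
        return "happy"
    return "meh"

def summarize(scores):
    """Return the count in each region plus the single subtracted score."""
    counts = Counter(region(s) for s in scores)
    total = len(scores)
    # The classic single number: percent happy minus percent angry.
    # The meh answers only show up in the denominator.
    nps = 100 * (counts["happy"] - counts["angry"]) / total
    return dict(counts), nps

counts, nps = summarize([10, 9, 8, 7, 7, 3, 10, 8, 6, 9])
print(counts, nps)
```

Note that the subtraction throws information away: a survey with 40% happy and 20% angry gets the same score as one with 20% happy and 0% angry, even though those are very different situations.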
The threefold path
I like to look at it this way instead. First of all, we have computers now, we're not tracking ratings on one of those 1980s desktop bookkeeping printer-calculators, you don't have to make every analysis into one single all-encompassing number.
Postprocessing a 10-point scale into a 3-point one, that seems pretty smart. But you have to stop there. Maybe you now have three separate aggregate numbers. That's tough, I'm sorry. Here's a nickel, kid, go sell your personal information in exchange for a spreadsheet app. (I don't know what you'll do with the nickel. Anyway I don't need it. Here. Go.)
Each of those three rating types gives you something different you can do in response:
- The ones had a very bad experience, which is hopefully an outlier, unless you're Comcast or the New York Times subscription department. Normally you want to get rid of every bad experience. The absence of awful isn't greatness, it's just meh, but meh is infinitely better than awful. Eliminating negative outliers is a whole job. It's a job filled with Deming's special causes. It's hard, and it requires creativity, but it really matters.
- The twos had a meh experience. This is, most commonly, the majority. But perhaps they could have had a better experience. Perhaps even a great one? Deming would say you can and should work to improve the average experience and reduce the standard deviation. That's the dream; heck, what if the average experience could be an amazing one? That's rarely achieved, but a few products achieve it, especially luxury brands. And maybe that Broadway show, Hamilton? I don't know, I couldn't get tickets, because everyone said it was great so it was always sold out and I guess that's my point.
If getting the average up to three is too hard or will take too long (and it will take a long time!), you could still try to at least randomly turn a few of them into threes. For example, they say users who have a great customer support experience often rate a product more highly than the ones who never needed to contact support at all, because the support interaction made the company feel more personal. Maybe you can't afford to interact with everyone, but if you have to interact anyway, perhaps you can use that chance to make it great instead of meh.
- The threes already had an amazing experience. Nothing to do, right? No! These are the people who are, or who can become, your superfan evangelists. Sometimes that happens on its own, but often people don't know where to put that excess positive energy. You can help them. Pop stars and fashion brands know all about this; get some true believers really excited about your product, and the impact is huge. This is a completely different job than turning ones into twos, or twos into threes.
What not to do
Those are all good parts. Let's ignore that unfortunately they aren't part of NPS at all and we've strayed way off topic.
From here, there are several additional things you can do, but it turns out you shouldn't.
Don't compare scores with other products. I guarantee you, your methodology isn't the same as theirs. The slightest change in timing or presentation will change the score in incomparable ways. You just can't. I'm sorry.
Don't reward your team based on aggregate ratings. They will find a way to change the ratings. Trust me, it's too easy.
Don't average or difference the bad with the great. The two groups have nothing to do with each other, require completely different responses (usually from different teams), and are often very small. They're outliers after all. They're by definition not the mainstream. Outlier data is very noisy and each terrible experience is different from the others; each deliriously happy experience is special. As the famous writer said, all meh families are alike.
Don't fret about which "standard" rating ranges translate to bad-meh-good. Your particular survey or product will have the bad outliers, the big centre, and the great outliers. Run your survey enough and you'll be able to find them.
Don't call it NPS. NPS nowadays has a bad reputation. Nobody can really explain the bad reputation; I've asked. But they've all heard it's bad and wrong and misguided and unscientific and "not real statistics" and gives wrong answers and leads to bad incentives. You don't want that stigma attached to your survey mechanic. But if you call it a satisfaction survey on a 10-point or 5-point scale, tada, clear skies and lush green fields ahead.
Bonus advice
Perhaps the neatest thing about NPS is how much information you can get from just one simple question that can be answered with the same effort it takes to dismiss a popup.
I joked about Google Meet earlier, but I wasn't really kidding; after having a few meetings, if I had learned that I could just rank from 1 to 5 stars and then not get guilted for giving anything other than 5, I would do it. It would be great science and pretty unobtrusive. As it is, I lie instead. (I don't even skip, because it's faster to get back to the menu by lying than by skipping.)
While we're here, only the weirdest people want to answer a survey that says it will take "just 5 minutes" or "just 30 seconds." I don't have 30 seconds, I'm busy being mad/meh/excited about your product, I have other things to do! But I can click just one single star rating, as long as I'm 100% confident that the survey will go the heck away after that. (And don't even get me started about the extra layer in "Can we ask you a few simple questions about our website? Yes or no")
Also, don't be the survey that promises one question and then asks "just one more question." Be the survey that gets a reputation for really truly asking that one question. Then ask it, optionally, in more places and more often. A good role model is those knowledgebases where every article offers just thumbs up or thumbs down (or the default of no click, which means meh). That way you can legitimately look at aggregates or even the same person's answers over time, at different points in the app, after they have different parts of the experience. And you can compare scores at the same point after you update the experience.
But for heaven's sake, not by just averaging them.
As part of the clustering algorithms stuff I sometimes noodle on I’m trying to figure out how to define a toroidal-ish space which wraps around in a way corresponding to the kissing number for each number of dimensions and is straightforward and efficient to implement in software. I’m probably reinventing the wheel here, with the possible exception of keeping an eye on how to efficiently implement things in software, the details of which are probably mathematically trivial, but this is all new to me and isn’t written up anywhere in a way which I could understand so here’s my presentation of it.
(I do this under the assumption that it will result in k+1 things magically packing equidistantly and nicely colorable, a property I will simply claim happens without any justification whatsoever.)
Despite your existing familiarity with sphere packing in 2 and 3 dimensions it’s most insightful to understand those starting with 4. In 4 dimensions you start with a grid. This isn’t great because each thing is only adjacent to 8 others, but also has a similar problem to what hyperspheres do which is that points have clear antipodes, a single point most distant from them, and you’re trying to pack everything in so everything else should be about equidistant. Like with hyperspheres the fix is to glue each point to its antipode, so you add in another grid where everything is offset by 1/2 in each dimension. In 4 dimensions this results in each point having 16 diagonals (because each dimension goes up or down by 1/2) which by the Pythagorean Theorem are magically of length 1 so they’re the same distance as the other 8 for a total of 24.
Happily 24 is exactly the kissing number for 4 dimensions. This construction doesn’t quite work verbatim for less than 4 dimensions because then the diagonals wind up being less than 1 unit away. That can be fixed by stretching out exactly one of the dimensions to make the diagonals all unit length resulting in face centered packing for 3 dimensions and the standard optimal packing for 2 dimensions.
This way of viewing this is also very nice from an implementation standpoint. To find the shortest distance from the origin to a particular point, independently figure out whether to get there in the positive or negative direction for each dimension to find the closest version, then do the same thing for the point offset by one half in each dimension, and take the shorter of those two vectors.
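A minimal sketch of that lookup, assuming the unstretched 4-dimensional case where the space wraps with period 1 in every dimension. The function names are made up for illustration; the stretched lower-dimensional versions would additionally need the stretched axis scaled before measuring lengths.

```python
import math

def wrap(x):
    """Shift x by a whole number so it lands in [-0.5, 0.5)."""
    return x - math.floor(x + 0.5)

def shortest_vector(point):
    """Shortest representative of `point`, treating both the grid and the
    half-offset grid as copies of the origin."""
    direct = [wrap(x) for x in point]
    via_offset = [wrap(x - 0.5) for x in point]
    return min(direct, via_offset, key=lambda v: sum(c * c for c in v))

def distance(point):
    return math.sqrt(sum(c * c for c in shortest_vector(point)))

print(distance([0.5, 0.5, 0.5, 0.5]))      # the glued antipode of the origin
print(distance([0.25, 0.25, 0.25, 0.25]))  # halfway towards that antipode
```

Each coordinate is handled independently, so the cost is linear in the number of dimensions.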
Above 4 dimensions this construction breaks down because the diagonals are already more than a unit away from the beginning. Again it’s easiest to understand what happens next by jumping up a bit. In 8 dimensions there is the E8 lattice. To construct that you knock out all the points in the grid where the sum of their offsets in each dimension is odd. That leaves all the major diagonals as length square root of 2, and leaves in minor diagonals where you offset on the grid in exactly two dimensions which are also length square root of 2. That leaves the total number of kissing spheres as (2^8)/2+(8*7/2)*2^2 = 240 and we’ve got the optimal solution.
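The count of 240 can be confirmed by brute force. This sketch assumes the usual description of E8: integer points plus points offset by one half in every dimension, keeping only those whose coordinates sum to an even number. Coordinates are doubled so that everything stays in integers; `kissing_vectors` is a name invented for this example.

```python
from itertools import product

def kissing_vectors():
    """Enumerate the shortest non-zero vectors, with every coordinate
    doubled so that half-integers become odd integers."""
    found = []
    # Doubled coordinates: grid points use {-2, 0, 2}, offset points {-1, 1}.
    for choices in ((-2, 0, 2), (-1, 1)):
        for v in product(choices, repeat=8):
            norm_sq = sum(c * c for c in v)  # 4x the true squared length
            coord_sum = sum(v)               # 2x the true coordinate sum
            # True squared length 2 and an even true coordinate sum.
            if norm_sq == 8 and coord_sum % 4 == 0:
                found.append(v)
    return found

vectors = kissing_vectors()
major = sum(1 for v in vectors if 0 not in v)
minor = len(vectors) - major
print(major, minor, len(vectors))
```

The split into major and minor diagonals matches the two terms of the formula above: 128 vectors with every coordinate at plus or minus one half, and 112 with exactly two non-zero coordinates.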
This is also straightforward to implement by doing the same thing as with 4 dimensions and below, but keep track of the number of times a point was flipped to the other side and if it’s odd in the end you have to undo whichever flipping made the least gains.
Like in the earlier case the problem with doing the same thing in a lower number of dimensions is that the diagonals are too short so you have to stretch out one of the dimensions to make them the same length as the other points. In 5 dimensions this nails it again, with 16 major diagonals and 24 minor diagonals for a total of 40 which is the best known, and in 6 dimensions it likewise nails it with 32+40=72.
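The counts quoted here fit a simple pattern, which can be tabulated against the best known kissing numbers. The formula below is only a reading of the numbers in the text (16 = 2^4 and 24 = 4 * C(4,2) in 5 dimensions, and so on), under the assumption that minor diagonals touching the stretched dimension are too long to count; `stretched_count` is a hypothetical helper, not an established result.

```python
from math import comb

# Best known kissing numbers for these dimensions.
BEST_KNOWN = {5: 40, 6: 72, 7: 126}

def stretched_count(n):
    """Neighbors produced by the stretched construction in n dimensions."""
    major = 2 ** (n - 1)        # half of the 2^n sign patterns survive
    minor = 4 * comb(n - 1, 2)  # pairs of unstretched axes, 4 sign choices
    return major + minor

for n in sorted(BEST_KNOWN):
    print(n, stretched_count(n), BEST_KNOWN[n])
```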
In 7 dimensions things break down and I get horribly confused. This construction yields 124, but the best known is 126. This is likely too small a difference to matter in practice, but it’s very strange and it would be nice if someone could explain to me whether those extra 2 come from some strange other diagonal, from a completely different construction, or from some much less symmetric smushing of the pieces around to squeeze in 2 more.
Above 8 dimensions I’m a bit lost. Presumably you can drop everything which isn’t 0 mod 3 instead of 0 mod 2 which will result in many more minor diagonals, but that seems like it isn’t a lattice any more and is unlikely to be able to nail constructing the Leech Lattice. But at least it’s still reasonably easy to implement, albeit with a few more caveats. It seems like a crude but vaguely ballpark estimate for kissing numbers is n choose n/2 divided by n/2.
It might not matter of course. 8 is already a fairly high number of dimensions for what I’m trying to do and I haven’t even benchmarked these against hyperspheres and it seems likely that their potential benefits have already run out at that scale, or maybe fall off a cliff after the magical construction at 8.
Busy Beaver numbers are the classic example of a well defined noncomputable function. The question is: For a given number of states of a Turing Machine, what’s the maximum number of ones which can be in its final output when it halts? If we’re playing the ‘name the biggest number’ game, this can be used to easily beat everything people normally come up with.
Beeping Busy Beaver numbers grow yet even more profoundly faster than Busy Beaver numbers. For them instead of having one of the state transitions be a halt make one of the state transitions emit a ‘beep’. For a given machine the number is the number of steps it goes through before it emits its final beep. The rate of growth of these numbers is comparable to the rate of growth of machines which are given access to an oracle which can determine whether a given Turing Machine halts, but obviously the beeping construction is much more elegant. (Other things specify a particular state rather than state transition as emitting the beep. Since the transition number is always at least as large and possibly much larger I think it’s a better thing to go with.)
Now you may wonder: Can we go larger? Is there an elegant construction which corresponds in power to having a Turing Machine which can access an oracle which tells it whether a given Turing Machine which has access to whether a regular Turing Machine halts in turn halts? Sorry for how hard it is to parse that sentence, we’re looking for the level above Beeping Busy Beavers. It turns out the answer is yes there is, by having a Beeping Booping Busy Beaver, which is the new idea I’m presenting here.
Like a Beeping Busy Beaver, a Beeping Booping Busy Beaver never halts. It has two state transitions which are special, one which emits a ‘beep’ and one which emits a ‘boop’. Its output is interpreted by counting the number of beeps between each successive boop, resulting in a series of integers. To calculate its number, find the first output value which is later repeated an infinite number of times and count the number of steps it took to first finish that output.
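To make the output interpretation concrete, here is a small sketch that runs a machine for a bounded number of steps and turns its beeps and boops into a series of integers. The simulator, the transition-table format, and the two-state `TOY` machine are all invented for illustration. A bounded run can only show a prefix of the series; deciding which value repeats infinitely often is exactly the part no program can do in general.

```python
from collections import defaultdict

def run(machine, steps, beep, boop, start="A"):
    """Run a Turing machine on a blank tape for a fixed number of steps.

    machine maps (state, symbol) -> (symbol_to_write, move, next_state).
    beep and boop name the (state, symbol) transitions that make a sound.
    Returns the sounds in the order they were emitted.
    """
    tape = defaultdict(int)  # blank tape of zeros
    state, head = start, 0
    sounds = []
    for _ in range(steps):
        key = (state, tape[head])
        if key == beep:
            sounds.append("beep")
        elif key == boop:
            sounds.append("boop")
        write, move, state = machine[key]
        tape[head] = write
        head += move
    return sounds

def beeps_per_boop(sounds):
    """Count the beeps before each boop; trailing beeps that have not yet
    been closed off by a boop are left out."""
    counts, current = [], 0
    for sound in sounds:
        if sound == "beep":
            current += 1
        else:
            counts.append(current)
            current = 0
    return counts

# Hypothetical machine that marches right forever, alternating states.
TOY = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, +1, "A"),
    ("B", 0): (1, +1, "A"),
    ("B", 1): (1, +1, "B"),
}
print(beeps_per_boop(run(TOY, 6, beep=("A", 0), boop=("B", 0))))
```

For this toy machine the series is 1, 1, 1, and so on forever, so the first output value is the one that repeats infinitely and it is finished at step 2.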
Proving that this is computationally equivalent is left as an exercise to the reader, mostly because I don’t know how to do it, but Scott Aaronson assures me that it does in fact work. As a mathematician I’m a lot better at constructing things than proving things.
A few things jump out here. First of all the Beeping Booping construction gives some insights into which number theory questions correspond to which level of Turing Machine, which I for one didn’t realize before, so that’s an actually useful output of this silliness. Also it seems obvious that there should be some kind of Beep Boop Buzz construction which goes one higher, but oddly I have no idea how to construct that, so maybe mathematicians only rarely ask questions of that level.
Despite it not being obvious how to add in a Buzz there is a clear pattern going on here. Each Beeping Busy Beaver is a Turing Machine which was disqualified from having a Busy Beaver number because it never halts. Likewise each Beep Boop Beaver was disqualified from having a beeping number because it never stops beeping. Maybe a Beep Boop Buzz Beaver is a failed Beep Boop Beaver because each of its boop counts doesn’t repeat infinitely, but it isn’t at all obvious how that should work.
The funny thing about these higher level machines is that they aren’t new machines at all, they’re simply new ways of interpreting the behavior of regular Turing Machines. When something simply looks ‘messy’ you haven’t fully grokked the meaning of its output, and maybe there’s something very coherent which it’s doing if you think about it in the right way.
This is a model of how most people model probability. It came from the same place as most of the magic numbers I came up with. I have a lot of fiber in my diet. But my magic numbers seem to work well, and this suggests some concrete predictions which could be studied, so it may be worth considering.
In English we have a few words for probability which people have a reasonable internal model of: Definitely not, probably not, maybe, probably yes, definitely yes. These correspond to 0%, 10%, 50%, 90%, and 100% respectively. People round off to these with 0-5% going to 0%, 6-20% going to 10%, 21-79% to 50%, 80-94% to 90%, and 95-100% to 100%.
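Written out as a function, the rounding rule looks like this. It is a direct transcription of the ranges above for whole-number percentages; the name `perceived` is made up, and how to treat values that fall between the listed ranges (5.5%, say) is an assumption.

```python
def perceived(percent):
    """Map an actual probability (0-100) to the value people round it to."""
    if percent <= 5:
        return 0      # definitely not
    if percent <= 20:
        return 10     # probably not
    if percent < 80:
        return 50     # maybe
    if percent < 95:
        return 90     # probably yes
    return 100        # definitely yes

for p in (3, 15, 40, 60, 85, 97):
    print(p, perceived(p))
```

Under this model 40% and 60% land in the same bucket, while 3% and 15% do not.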
When people say something will ‘maybe’ happen they’re expressing that they will feel no emotional response when it goes either way. Most people have no meaningful ability to gauge changes in expected value over this range, and will often be dismissive of the difference between 40% and 60% even when it’s explained to them. Somehow the difference in expected value between 0% and 20% is much more meaningful than the difference between 40% and 60%. This shows up in bizarre ways in public discourse, where a road policy which causes an identifiable driver to die is viewed as murder, but a policy which causes a demonstrably much larger number of drivers to die but with the link being statistical is viewed as the driver’s fault.
When people say something will ‘probably’ happen they mean they’ll be upset if it doesn’t go the way predicted. This fails in both directions: They’re both overconfident in their own predictions in things which more than likely will happen and get overly upset when ‘probable’ events fail.
When people say something will ‘definitely’ happen they mean they’ll be shocked if it doesn’t. Numerous studies have been done about people failing to accurately estimate the chances of very unlikely events, viewing them as far less likely than they actually are, but that’s talking about people with some skill who are intellectually engaged in a very specific prediction. When most people are casually guessing chances they simply have no intuitive notion of probabilities below 1% and round down to 0 from chances even several times that.
When I say ‘people’ here I very much include myself. I have essentially no visceral sense of any quantitative values including time/distance/weight/value/speed etc. I can handle them well by reasoning and calculating but my instinctive ability to judge them is terrible.
There’s a family of games including gin, crazy eights, and straight dominos which have a lot of spoilage of the other player and inferencing of what they have. Unfortunately it’s usually fairly hard to deduce what the opponent has, unlike in Holdem where they’re forced to hint strongly. Here is an idea for a game of this genre with the maximum amount of range inference possible:
2 players. There’s a deck of 16 cards, each of which has one of 4 colors in the center and one of four colors on the outer edge, with each combination occurring exactly once. Players are dealt 7 cards each with one card face up in the center. The remaining card is permanently out of the game. First player is selected randomly and thereafter they alternate. On a turn a player puts down one of their cards face up whose outer edge (the ‘bottom’ color) matches the center (‘top’) color on the card below it. First player to not have a legal move loses.
The funny thing about this game is that if the discarded card is made public then it’s a game of perfect information and analyzing it is straightforward, but with that one piece of hidden information complete analysis goes bonkers.
(For those of you who have gotten this far and are wondering: This won’t be in the suite of games I’m working on supporting on chain. It involves way too many turns and is of a very different flavor; the games I’m supporting necessarily have very few turns and are Nash equilibrium games which require mixed strategies, because that’s the flavor of Poker.)
People have been curious whether the recent Lightning attack applies to the state channel work I’m doing right now. The answer is no, the attack only applies at all when you’re routing payments and I’m not doing that yet, but there is a question of what should happen for the future.
The attack is a form of transaction bumping. There’s some transaction in the mempool which spends coin A which you wish to evict, so you make a higher fee transaction which spends A and B to get it out, then make a yet even higher fee transaction spending just B, and now the transaction to spend A has magically disappeared from the mempool. In principle full nodes could simply cache the transaction to spend A and try to reapply it when the second bump happens, and they absolutely should do that, but Bitcoin full nodes don’t do that today.
The way routed payments work is you have two HTLC coins in a route from A→B→C, one for A→B and one for B→C. The way HTLCs work is they can be pulled right immediately using a secure hash preimage reveal or to the left after a timeout. B’s strategy is to reuse the same preimage from B→C to claim the coin from A→B if C makes that claim. Otherwise B uses the timeout to claim the B→C coin, and the timeout on that coin is set shorter than the A→B coin to ensure that B is always protected. To exploit this with a fee bumping attack, C can wait until the B→C timeout comes up, then repeatedly foil B’s attempt to claim the B→C coin by timeout with a transaction bumping attack. Then A can take the A→B payment when that timeout happens, and then C can claim the B→C payment using the secure hash preimage without B being able to reuse that to claim the A→B payment because it’s already gone. The result is that A and C successfully conspire to steal money from B.
There are numerous practical difficulties with pulling this off in practice. One can argue that many of those are dependent on mempool convention and hence outside of the threat model of blockchains, but that isn’t true for collaboratively controlled coins. If A and C could control the complete mempool for all full nodes B would be hosed no matter what the smart coin logic did, so some amount of security dependence on mempool behavior is necessary here. As it happens in Chia the conventional mempool behavior already defends against transaction bumping attacks very well, because it simply refuses to add in a new transaction which doesn’t replace the spends of every single coin which was spent before. This was introduced as a practical solution to the problem of transaction bumping being very easy in Chia because any transaction can be trivially aggregated with any other transaction.
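To make the difference concrete, here is a toy model in Python of the two replacement policies discussed above. It is only a sketch: the `Tx` class, the rule names, and the "must pay more than everything it evicts" fee comparison are simplifications of my own and not the actual logic of any Bitcoin or Chia node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    name: str
    spends: frozenset  # ids of the coins this transaction spends
    fee: int


def conflicts(new_tx, mempool):
    """Transactions already in the mempool that share a coin with new_tx."""
    return [tx for tx in mempool if tx.spends & new_tx.spends]


def fee_only_rule(new_tx, mempool):
    """Naive policy: a replacement only has to outbid what it evicts."""
    return new_tx.fee > sum(tx.fee for tx in conflicts(new_tx, mempool))


def superset_rule(new_tx, mempool):
    """Policy described above: also re-spend every coin spent by what gets evicted."""
    evicted = conflicts(new_tx, mempool)
    covers_all = all(tx.spends <= new_tx.spends for tx in evicted)
    return covers_all and fee_only_rule(new_tx, mempool)


def try_add(new_tx, mempool, rule):
    """Apply a policy; on success evict the conflicting transactions and add new_tx."""
    if not rule(new_tx, mempool):
        return False
    for tx in conflicts(new_tx, mempool):
        mempool.remove(tx)
    mempool.append(new_tx)
    return True


victim = Tx("victim", frozenset({"A"}), fee=1)
bump1 = Tx("bump1", frozenset({"A", "B"}), fee=2)  # spends A and B
bump2 = Tx("bump2", frozenset({"B"}), fee=3)       # spends just B


def run(rule):
    mempool = []
    for tx in (victim, bump1, bump2):
        try_add(tx, mempool, rule)
    return [tx.name for tx in mempool]
```

With `fee_only_rule` the final mempool holds only `bump2`, so no spend of coin A is left. With `superset_rule` the second bump is refused and `bump1`, which still spends A, stays in.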
The downside of this strategy is that it allows for a strange form of transaction pinning to happen, where you can make a transaction which spends coins A and B with a low fee which then can’t get bumped by a transaction spending just A regardless of how high the fee is set. This allows for a funny attack where you bump a transaction by pinning it to a near zero fee when the mempool isn’t full, then fill up the mempool just enough to increase the fees to epsilon, and for a very small fee you’ve locked the other transaction out of the mempool. We should probably modify this logic to allow not completely replaced transactions to get bumped if they don’t have a high enough fee to get into the next block. That allows transaction bumping, but only after a transaction has been bumped by overall fees getting set high enough, which is an expensive attack and an inherent issue anyway. Ideally transactions which get bumped from the outer parts of the mempool should get cached and reapplied, but as long as somebody somewhere tries to reintroduce all cached transactions after each new block is made you’re protected, and that’s a good thing for smart wallets to attempt to do for their own transactions anyway.
TLDR: see the title of this blog post, it's really that trivial.
Now that Wayland has been coming for ages and all new development focuses on a pile of software that steams significantly less, we're seeing cracks appear in the old Xorg support. Not intentionally, but there's only so much time that can be spent on testing and things that are more niche fall through.
One of these was a bug I just had the pleasure of debugging, triggered by a GNOME on Xorg user using the xf86-input-libinput driver for tablet devices.
On the surface of it, this should be fine because libinput (and thus xf86-input-libinput) handles tablets just fine. But libinput is the new kid on the block. The old kid on said block is the xf86-input-wacom driver, older than libinput by slightly over a decade. And oh man, history has baked things into the driver that are worse than raisins in apple strudel [1].
The xf86-input-libinput driver was written as a wrapper around libinput and makes use of fancy things that (from libinput's POV) have always been around: things like input device hotplugging. Fancy, I know. For tablet devices the driver creates an X device for each new tool as it comes into proximity first. Future events from that tool will go through that device. A second tool, be it a new pen or the eraser on the original pen, will create a second X device and events from that tool will go through that X device. Configuration on any device will thus only affect that particular pen. Almost like the whole thing makes sense.
The wacom driver of course doesn't do this. It pre-creates X devices for some possible types of tools (pen, eraser, and cursor [2] but not airbrush or artpen). When a tool goes into proximity the events are sent through the respective device, i.e. all pens go through the pen tool, all erasers through the eraser tool. To actually track pens there is the "Wacom Serial IDs" property that contains the current tool's serial number. If you want to track multiple tools you need to query the property on proximity in [4]. At the time this was within a reasonable error margin of a good idea.
Of course and because MOAR CONFIGURATION! will save us all from the great filter you can specify the "ToolSerials" xorg.conf option as e.g. "airbrush;12345;artpen" and get some extra X devices pre-created, in this case an airbrush and artpen X device and an X device just for the tool with the serial number 12345. All other tools multiplex through the default devices. Again, at the time this was a great improvement. [5]
Anyway, where was I? Oh, right. The above should serve as a good approximation of a reason why the xf86-input-libinput driver does not try to be fully compatible with the xf86-input-wacom driver. In everyday use these things barely matter [6] but for the desktop environment which needs to configure these devices all these differences mean multiple code paths. Those paths need to be tested but they aren't, so things fall through the cracks.
So quite a while ago, we made the decision that until Xorg goes dodo, the xf86-input-wacom driver is the tablet driver to use in GNOME. So if you're using a GNOME on Xorg session [7], do make sure the xf86-input-wacom driver is installed. It will make both of us happier and that's a good aim to strive for.
[1] It's just a joke. Put the pitchforks down already.
[2] The cursor is the mouse-like thing Wacom sells. Which is called cursor [3] because the English language has a limited vocabulary and we need to re-use words as much as possible lest we run out of them.
[3] It's also called puck. Because [2].
[4] And by "query" I mean "wait for the XI2 event notifying you of a property change". Because of lolz the driver cannot update the property on proximity in but needs to schedule that as idle func so the property update for the serial always arrives at some unspecified time after the proximity in but hopefully before more motion events happen. Or not, and that's how hope dies.
[5] Think about this next time someone says they long for some unspecified good old days.
[6] Except the strip axis which on the wacom driver is actually a bit happily moving left/right as your finger moves up/down on the touch strip and any X client needs to know this. libinput normalizes this to...well, a normal value but now the X client needs to know which driver is running so, oh deary deary.
[7] e.g. because you're stockholmed into it by your graphics hardware
Covenants are a construction to allow introspection: a transaction output can place conditions on the transaction which spends it (beyond the specific “must provide a valid signature of itself and a particular pubkey”).
I previously looked at Examining ScriptPubkeys, but another useful thing covenants want to enforce is amounts. This is easy for equality, but consider the case where you are allowed to merge inputs: perhaps the first output amount must be the sum of the first and second inputs.
The problem is that Bitcoin Script deals in signed (sign-and-magnitude) values, and 31 bits limits us to 21.47483648 bitcoin. However, using OP_MULTISHA256 or OP_CAT, it’s possible to deal with full amounts. I’ve written some (untested!) script code below.
The Vexing Problem of Amounts
Using OP_TXHASH, we can get SHA256(input amount) and SHA256(output amount) on the stack. Since this involves hashing, we can’t evaluate the number for anything but equality, so as in other cases where we don’t have Fully Complete Covenants we need to have the user supply the actual values on the witness stack, and we test those for the conditions we want, and then make sure they match what OP_TXHASH says is in the transaction. I usually object to this backwards form (just give me the value on the stack!), but as you’ll see, we couldn’t natively use 64 bit values from OP_TX anyway (I had proposed pushing two values, which is its own kind of ugly).
A Value Form Bitcoin Script Can Deal With
21M BTC is just under 2^51 satoshis.
We split these bits into a pair of stack values:
- lower 24 bits
- upper bits (27, but we allow up to 31)
I call this tuple “Script-friendly pair” (SFP) form. Note that all script numbers on stack are represented in little-endian, with a sign bit (0x80 on the last byte). This is a nasty format to work with, unfortunately.
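To see the pair form outside of Script, here is a small Python sketch of the split and of the CScriptNum encoding it has to fight with. The function names are mine. I pad upper to five bytes here so that the serialized amount comes out as exactly eight bytes; that is an assumption of this sketch.

```python
LOWER_BITS = 24
LOWER_MASK = (1 << LOWER_BITS) - 1
MAX_SATOSHIS = 21_000_000 * 100_000_000  # 21M BTC in satoshis


def to_sfp(satoshis):
    """Split an amount into (lower 24 bits, upper bits)."""
    if not 0 <= satoshis <= MAX_SATOSHIS:
        raise ValueError("amount out of range")
    return satoshis & LOWER_MASK, satoshis >> LOWER_BITS


def sfp_to_le64(lower, upper):
    """Serialize a pair as an 8-byte little-endian amount: 3 bytes of lower, 5 of upper."""
    return lower.to_bytes(3, "little") + upper.to_bytes(5, "little")


def script_num(n):
    """Minimal CScriptNum bytes for a non-negative integer (sign bit lives in the last byte)."""
    out = bytearray()
    while n:
        out.append(n & 0xFF)
        n >>= 8
    if out and out[-1] & 0x80:
        out.append(0x00)  # extra byte so the value is not read as negative
    return bytes(out)


lower, upper = to_sfp(MAX_SATOSHIS)
```

Note how `script_num(0xFFFFFF)` needs four bytes even though the value fits in three. That is the zero-padding case the lower-half script has to undo.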
Converting A Script-Friendly Pair to an 8-byte Little-Endian Value
Here’s the code which takes a positive CScriptNum, and produces two stack values which can be concatenated to make a 4 byte unsigned value:
# !UNTESTED CODE!
# Stack (top to bottom): lower, upper
OP_SWAP
# Generate required prefix to append to stack value to make it 4 bytes long.
OP_SIZE
OP_DUP
OP_NOTIF
# 0 -> 00000000
OP_DROP
4 OP_PUSHDATA1 0x00 0x00 0x00 0x00
OP_ELSE
OP_DUP
1 OP_EQUAL OP_IF
# Single byte: prepend 0x00 0x00 0x00
OP_DROP
3 OP_PUSHDATA1 0x00 0x00 0x00
OP_ELSE
OP_DUP
2 OP_EQUAL OP_IF
# Two bytes: prepend 0x00 0x00
2 OP_PUSHDATA1 0x00 0x00
OP_ELSE
3 OP_EQUAL OP_IF
# Three bytes: prepend 0x00
1 OP_PUSHDATA1 0x00
OP_ELSE
# Prepend nothing.
0
OP_ENDIF
OP_ENDIF
OP_ENDIF
OP_ENDIF
OP_SWAP
# Stack (top to bottom): upper, pad, lower
That 46 bytes handles upper. Now lower is a CScriptNum between 0 and 16777215, and we want to produce two stack values which can be concatenated to make a 3 byte unsigned value. Here we have to remove the zero-padding in the four-byte case:
# !UNTESTED CODE!
# Stack (top to bottom): upper, pad, lower
OP_ROT
# Generate required prefix to append to stack value to make it 3 bytes long.
OP_SIZE
OP_DUP
OP_NOTIF
# 0 -> 000000
OP_DROP
3 OP_PUSHDATA1 0x00 0x00 0x00
OP_ELSE
OP_DUP
1 OP_EQUAL OP_IF
# Single byte: prepend 0x00 0x00
OP_DROP
2 OP_PUSHDATA1 0x00 0x00
OP_ELSE
OP_DUP
2 OP_EQUAL OP_IF
# Two bytes. Now maybe final byte is 0x00 simply so it doesn't
# appear negative, but we don't care.
1 OP_PUSHDATA1 0x00
OP_ELSE
# Three bytes: empty append below
3 OP_EQUAL OP_NOTIF
# Four bytes, e.g. 0xff 0xff 0xff 0x00
# Convert to three byte version: negate and add 2^23
# => 0xff 0xff 0xff
OP_NEG
4 OP_PUSHDATA1 0x00 0x00 0x80 0x00
OP_ADD
OP_ENDIF
# Prepend nothing.
0
OP_ENDIF
OP_ENDIF
OP_ENDIF
OP_SWAP
# Stack (top to bottom): lower, pad, upper, pad
You can optimize these 47 bytes a little, but I’ll leave that as an exercise for the reader!
Now we use OP_MULTISHA256 (or OP_CAT 3 times and OP_SHA256) to concatenate them to form an 8-byte little-endian number, for comparison against the format used by OP_TXHASH.
Basically, 95 bytes to compare our tuple to a hashed value.
Adding Two Script-Friendly Pairs
Let’s write some code to add two well-formed Script-Friendly Pairs!
# !UNTESTED CODE!
# Stack (top to bottom): a_lower, a_upper, b_lower, b_upper
OP_ROT
OP_ADD
OP_DUP
4 OP_PUSHDATA1 0x00 0x00 0x00 0x01
OP_GREATERTHANOREQUAL
OP_IF
# lower overflow, bump upper.
# FIXME: We can OP_TUCK this constant above!
4 OP_PUSHDATA1 0x00 0x00 0x00 0x01
OP_SUB
OP_SWAP
OP_1ADD
OP_ELSE
OP_SWAP
OP_ENDIF
# Stack now: a_upper(w/carry), lower_sum, b_upper.
OP_ROT
OP_ADD
OP_SWAP
# Stack now: lower_sum, upper_sum
Note that these 26 bytes don’t check that upper doesn’t overflow: if we’re dealing with verified amounts, we can add 16 times before it’s even possible (and it’s never possible with distinct amounts of course). Still, we can add OP_DUP 0 OP_GREATERTHANOREQUAL OP_VERIFY before the final OP_SWAP.
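The same carry logic, written as a Python sketch so it can be checked against ordinary integer addition. The function names are mine, and the explicit 31-bit overflow check is my own stand-in for the optional verify step mentioned above.

```python
LOWER_LIMIT = 1 << 24  # 0x1000000, the constant the script compares against


def add_sfp(a, b):
    """Add two well-formed (lower, upper) pairs, carrying from lower into upper."""
    a_lower, a_upper = a
    b_lower, b_upper = b
    lower = a_lower + b_lower
    carry = 0
    if lower >= LOWER_LIMIT:
        # lower overflow, bump upper
        lower -= LOWER_LIMIT
        carry = 1
    upper = a_upper + b_upper + carry
    if upper >= 1 << 31:
        raise OverflowError("upper no longer fits in a script number")
    return lower, upper


def sfp_value(pair):
    """Recombine a pair into a plain integer, for checking only."""
    lower, upper = pair
    return (upper << 24) | lower


total = add_sfp((0xFFFFFF, 0), (1, 0))
```

Here `total` is `(0, 1)`: the lower half wraps to zero and the carry lands in upper.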
Checking Script-Friendly Pairs
The code above assumes well-formed pairs, but since the pairs will come from the witness stack, we need to have a routine to check that a pair is well-formed:
# !UNTESTED CODE!
# Stack: lower, upper
OP_DUP
# lower must be 0 - 0xFFFFFF inclusive (OP_WITHIN excludes its upper bound)
0
4 OP_PUSHDATA1 0x00 0x00 0x00 0x01
OP_WITHIN
OP_VERIFY
OP_OVER
# upper must be 0 - 0x7FFFFFF inclusive (OP_WITHIN excludes its upper bound)
0
4 OP_PUSHDATA1 0x00 0x00 0x00 0x08
OP_WITHIN
OP_VERIFY
This ensures the ranges are all within spec: no negative numbers, no giant numbers.
Summary
While this shows that OP_CAT/OP_MULTISHA256 is sufficient to deal with bitcoin amounts in Script, the size (about 250 bytes to validate that two inputs equal one output) makes a fairly compelling case for optimization.
It’s worth noting that this is why Liquid chose to add the following 64-bit opcodes to Bitcoin Script: OP_ADD64, OP_SUB64, OP_MUL64, OP_DIV64, OP_NEG64, OP_LESSTHAN64, OP_LESSTHANOREQUAL64, OP_GREATERTHAN64, OP_GREATERTHANOREQUAL64.
(They also reenabled the bitwise opcodes (OP_XOR etc) to work just fine with these. They also implemented OP_SCRIPTNUMTOLE64, OP_LE64TOSCRIPTNUM and OP_LE32TOLE64 for conversion.)
In my previous post I proposed OP_LESS which works on arbitrary values, which doesn’t work for these because the endian is wrong! As a minimum, we’d need to add OP_LESSTHAN64, OP_ADD64 and OP_NEG64 to allow 64-bit comparison, addition and subtraction.
But, with only OP_CAT or OP_MULTISHA256, it’s possible to deal with amounts. It’s just not pretty!
Thanks for reading!
Covenants are a construction to allow introspection: a transaction output can place conditions on the transaction which spends it (beyond the specific “must provide a valid signature of itself and a particular pubkey”).
My preferred way of doing introspection is for Bitcoin Script to have a way of asking for various parts of the transaction onto the stack (aka OP_TX) for direct testing (Fully Complete Covenants), as opposed to using some tx hash, forcing the Script to produce a matching hash to pass (Equality Covenants). In the former case, you do something like:
# Is the nLocktime > 100?
OP_TX_BIT_NLOCKTIME OP_TX 100 OP_GREATERTHAN OP_VERIFY
In the latter you do something like:
# They provide nLocktime on the stack.
OP_DUP
# First check it's > 100
100 OP_GREATERTHAN OP_VERIFY
# Now check it's actually the right value, by comparing its hash to the hash of nLocktime
OP_SHA256
OP_TX_BIT_NLOCKTIME OP_TXHASH OP_EQUALVERIFY
However, when we come to examining an output’s ScriptPubkey, we’re forced into the latter mode unless we’re seeking an exact match: the ScriptPubkey is (almost always) a one-way function of the actual spending conditions.
Making a Simple Taproot, in Script
Let’s take a simple taproot case. You want to assert that the scriptPubkey pays to a known key K, or a script given by the covenant spender. This is the simplest interesting form of Taproot, with a single script path.
The steps to make this into a ScriptPubkey (following BIP 341) are:
- Get a tagged tapleaf hash of the script.
- Tweak the key K by this value.
- Prepend two bytes “0x51 0x20”.
- Compare with the ScriptPubkey of this tx.
Step 1: We need OP_CAT, or OP_MULTISHA256
If we spell out the things we need to hash, it looks like:
SHA256(SHA256("TapLeaf") + SHA256("TapLeaf") + 0xC0 + CSCRIPTNUM(LEN(script)) + script)
CSCRIPTNUM(X) is (if X is in canonical form, as it will be from OP_SIZE):
- if X is less than 253: X
- otherwise, if the length is less than 256: 0xFD 0x00 X
- otherwise, if the length is less than 65536: 0xFD X
- otherwise, we don’t care, make shorter scripts!
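For comparison with the Script versions, here is the same tagged tapleaf hash as a short Python sketch. The function names are mine. The length prefix uses Bitcoin's compact-size serialization as I understand BIP 341 to specify it, so for lengths of 253 and above the byte order should be checked against the BIP rather than taken from this sketch.

```python
import hashlib


def tagged_hash(tag, msg):
    """SHA256(SHA256(tag) + SHA256(tag) + msg)."""
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + msg).digest()


def compact_size(n):
    """Length prefix for scripts shorter than 65536 bytes."""
    if n < 253:
        return bytes([n])
    if n < 0x10000:
        return b"\xfd" + n.to_bytes(2, "little")
    raise ValueError("script too long for this sketch")


def tapleaf_hash(script, leaf_version=0xC0):
    """Tagged hash of leaf version, script length and script."""
    return tagged_hash("TapLeaf", bytes([leaf_version]) + compact_size(len(script)) + script)


leaf = tapleaf_hash(bytes([0x51]))  # a one-byte script, purely as an example input
```

The Script versions have to build the same byte string piece by piece, which is where the length-encoding branches come from.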
The obvious way to do this is to enable OP_CAT, but this was removed because it allows construction of giant stack variables. If that is an issue, we can instead use a “concatenate-and-hash” function OP_MULTISHA256, which turns out to be easiest to use if it hashes the stack from top to bottom.
OP_MULTISHA256 definition:
- If the stack is empty, fail.
- Pop N off the stack.
- If N is not a CScriptNum, fail.
- If there are fewer than N entries on the stack, fail.
- Initialize a SHA256 context.
- While N > 0:
  - Pop the top entry off the stack.
  - Hash it into the SHA256 context
  - Decrement N
- Finish the SHA256 context, and push the resulting 32 bytes onto the stack.
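The definition above can be modelled in a few lines of Python, using a list as the stack with the top at the end. This is only a sketch of a proposed opcode: the helper names are mine, and treating a negative N as a failure is an assumption the definition does not spell out.

```python
import hashlib


def decode_script_num(data):
    """Decode a CScriptNum (little-endian, sign bit in the last byte, at most 4 bytes)."""
    if len(data) > 4:
        raise ValueError("not a CScriptNum")
    if not data:
        return 0
    value = int.from_bytes(data, "little")
    if data[-1] & 0x80:
        # Sign bit set: clear it and negate.
        value &= ~(0x80 << (8 * (len(data) - 1)))
        return -value
    return value


def op_multisha256(stack):
    """Pop N, then hash the next N entries from top to bottom and push the digest."""
    if not stack:
        raise ValueError("empty stack")
    n = decode_script_num(stack.pop())
    if n < 0 or len(stack) < n:
        raise ValueError("fewer than N entries on the stack")
    ctx = hashlib.sha256()
    while n > 0:
        ctx.update(stack.pop())
        n -= 1
    stack.append(ctx.digest())


stack = [b"c", b"b", b"a", b"\x03"]  # top of stack is the last element
op_multisha256(stack)
```

After the call the stack holds a single entry, the SHA256 of `b"abc"`, because the entries are fed to the hash starting from the top.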
The result is either:
# Script is on stack, produce tagged tapleaf hash
# First, encode length
OP_SIZE
OP_DUP
# < 253?
OP_PUSHDATA1 1 253 OP_LESSTHAN
OP_IF
# Empty byte on stack:
0
OP_ELSE
OP_DUP
# > 255?
OP_PUSHDATA1 1 0xFF OP_GREATERTHAN
OP_IF
OP_PUSHDATA1 1 0xFD
OP_ELSE
# Needs padding byte
OP_PUSHDATA1 2 0xFD 0x00
OP_ENDIF
OP_ENDIF
# Push 0xC0 leaf_version on stack
OP_PUSHDATA1 1 0xC0
# Push hashed tag on stack, twice.
OP_PUSHDATA1 7 "TapLeaf"
OP_SHA256
OP_DUP
# Now, hash them together
6 OP_MULTISHA256
Or, using OP_CAT (assuming it also concatenates the top of stack to second on stack):
# Script is on stack, produce tagged tapleaf hash
# First, encode length
OP_SIZE
OP_DUP
# < 253?
OP_PUSHDATA1 1 253 OP_LESSTHAN
OP_NOTIF
OP_DUP
# > 255?
OP_PUSHDATA1 1 0xFF OP_GREATERTHAN
OP_IF
OP_PUSHDATA1 1 0xFD
OP_ELSE
# Needs padding byte
OP_PUSHDATA1 2 0xFD 0x00
OP_ENDIF
OP_CAT
OP_ENDIF
# Prepend length to script
OP_CAT
# Prepend 0xC0 leaf_version
OP_PUSHDATA1 1 0xC0
OP_CAT
# Push hashed tag on stack, twice, and prepend
OP_PUSHDATA1 7 "TapLeaf"
OP_SHA256
OP_DUP
OP_CAT
OP_CAT
# Hash the lot.
OP_SHA256
Step 2: We need to Tweak a Key, OP_KEYADDTWEAK
Now, we need to tweak a public key, as detailed in BIP 341:
def taproot_tweak_pubkey(pubkey, h):
    t = int_from_bytes(tagged_hash("TapTweak", pubkey + h))
    if t >= SECP256K1_ORDER:
        raise ValueError
    P = lift_x(int_from_bytes(pubkey))
    if P is None:
        raise ValueError
    Q = point_add(P, point_mul(G, t))
    return 0 if has_even_y(Q) else 1, bytes_from_int(x(Q))
Let’s assume OP_KEYADDTWEAK works like so:
- If there are less than two items on the stack, fail.
- Pop the tweak t off the stack. If t >= SECP256K1_ORDER, fail.
- Pop the key P off the stack. If it is not a valid compressed pubkey, fail. Convert to Even-Y if necessary (i.e. lift_x()).
- Q = P + t*G.
- Push the X coordinate of Q on the stack.
So now we just need to create the tagged hash, and feed it to OP_KEYADDTWEAK:
# Key, tapscript hash are on stack.
OP_OVER
OP_PUSHDATA1 8 "TapTweak"
OP_SHA256
OP_DUP
# Stack is now: key, tapscript, key, H(TapTweak), H(TapTweak)
4 OP_MULTISHA256
OP_KEYADDTWEAK
Or with OP_CAT instead of OP_MULTISHA256:
# Key, tapscript hash are on stack.
OP_OVER
OP_PUSHDATA1 8 "TapTweak"
OP_SHA256
OP_DUP
# Stack is now: key, tapscript, key, H(TapTweak), H(TapTweak)
OP_CAT
OP_CAT
OP_CAT
OP_SHA256
OP_KEYADDTWEAK
Step 3: We Need To Prepend The Taproot Bytes
This is easy with OP_CAT:
# ScriptPubkey, Taproot key is on stack.
# Prepend "OP_1 32" to make Taproot v1 ScriptPubkey
OP_PUSHDATA1 2 0x51 0x20
OP_CAT
OP_EQUALVERIFY
With OP_MULTISHA256 we need to hash the ScriptPubkey to compare it (or, if we only have OP_TXHASH, it’s already hashed):
# ScriptPubkey, Taproot key is on stack.
OP_SHA256
# Prepend "OP_1 32" to make Taproot v1 ScriptPubkey
OP_PUSHDATA1 2 0x51 0x20
2 OP_MULTISHA256
# SHA256(ScriptPubkey) == SHA256(0x51 0x20 taproot)
OP_EQUALVERIFY
Making a More Complete Taproot, in Script
That covers the “one key, one script” case.
If we have more than one taproot leaf, we need to perform the merkle on them, rather than simply use the taproot leaf directly. Let’s assume for simplicity that we have two scripts:
- Produce the tagged leaf hash for scripts, call them H1 and H2.
- If H1 < H2, merkle is TaggedHash("TapBranch", H1 + H2), otherwise TaggedHash("TapBranch", H2 + H1)
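In Python the two-leaf merkle step is a sort followed by a tagged hash. A minimal sketch, with function names of my own choosing and placeholder leaf hashes:

```python
import hashlib


def tagged_hash(tag, msg):
    """SHA256(SHA256(tag) + SHA256(tag) + msg)."""
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + msg).digest()


def tapbranch_hash(h1, h2):
    """Combine two 32-byte child hashes, lesser one first (bytewise comparison)."""
    left, right = sorted([h1, h2])
    return tagged_hash("TapBranch", left + right)


# Placeholder leaf hashes, only to show that the order of arguments does not matter.
H1 = hashlib.sha256(b"leaf one").digest()
H2 = hashlib.sha256(b"leaf two").digest()
root = tapbranch_hash(H1, H2)
```

Python compares `bytes` lexicographically, which is the comparison Script currently lacks for 32-byte blobs.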
Step 1: Tagged Hash
We’ve done this before, it’s just Step 1 as before.
Step 2: Compare and Hash: We Need OP_LESS or OP_CONDSWAP
Unfortunately, all the arithmetic functions except OP_EQUAL only take CScriptNums, so we need a new opcode to compare 32-byte blobs. Minimally, this would be OP_LESS, though OP_CONDSWAP (put lesser one on top of stack) is possible too. In our case we don’t care what happens in unequal lengths, but if we assume big-endian values are most likely, we could zero-prepend to the shorter value before comparing.
The result looks like this:
# Hash1, Hash2 are on the stack.
# Put lesser hash top of stack if not already
OP_LESS
OP_NOTIF OP_SWAP OP_ENDIF
OP_PUSHDATA1 9 "TapBranch"
OP_SHA256
OP_DUP
4 OP_MULTISHA256
Or, using OP_CAT and OP_CONDSWAP:
# Hash1, Hash2 are on the stack.
# Put lesser hash top of stack if not already
OP_CONDSWAP
OP_PUSHDATA1 9 "TapBranch"
OP_SHA256
OP_DUP
OP_CAT
OP_CAT
OP_CAT
OP_SHA256
So now we can make arbitrarily complex merkle trees from parts, in Script!
Making More Useful Templates: Reducing the Power of OP_SUCCESS
Allowing the covenant spender to specify a script branch of their own is OK if we simply want a condition which is “… OR anything you want”. But that’s not generally useful: consider vaults, where you want to enforce a delay, after which they can spend. In this case, we want “… AND anything you want”.
We can, of course, insist that the script they provide starts with 1000 OP_CHECKSEQUENCEVERIFY. But because any unknown opcode causes immediate script success (without actually executing anything), they can override this test by simply inserting an invalid opcode in the remainder of the script!
There are two ways I can see to resolve this: one is delegation, where the remainder of the script is popped off the stack (OP_POPSCRIPT?). You would simply insist that the script they provide be exactly 1000 OP_CHECKSEQUENCEVERIFY OP_POPSCRIPT.
The other way is to weaken OP_SUCCESSx opcodes. This must be done carefully! In particular, we can use a separator, such as OP_SEPARATOR, and change the semantics of OP_SUCCESSx:
- If there is an OP_SEPARATOR before OP_SUCCESSx:
  - Consider the part before the OP_SEPARATOR:
    - if (number of OP_IF) + (number of OP_NOTIF) > (number of OP_ENDIF): fail
    - Otherwise execute it as normal: if it fails, fail.
- Succeed the script
This insulates a prefix from OP_SUCCESSx, but care has to be taken that it is a complete script fragment: a future OP_SUCCESSx definition must not turn an invalid script into a valid one (by revealing an OP_ENDIF which would make the script valid).
Summary
I’ve tried to look at what it would take to make generic covenants in Script: ones which can meaningfully interrogate spending conditions assuming some way (e.g. OP_TXHASH) of accessing an output’s script. There are reasons to believe this is desirable (beyond a completeness argument): vaulting in particular requires this.
We need three new Script opcodes: I’ve proposed OP_MULTISHA256, OP_KEYADDTWEAK and OP_LESS, and a (soft-fork) revision to treatment of OP_SUCCESSx. None of these are grossly complex.
The resulting scripts are quite long (and mine are untested and no doubt buggy!). It’s 41 bytes to hash a tapleaf, 19 to combine two tapleaves, 8 to compare the result to the scriptpubkey. That’s at least 109 witness weight to do a vault, and in addition you need to feed it the script you’re using for the output. That seems expensive, but not unreasonable: if this were to become common then new opcodes could combine several of these steps.
I haven’t thought hard about the general applicability of these opcodes, so there may be variants which are better when other uses are taken into account.
Thanks for reading!
[ The below is a personal statement that I make on my own behalf. While my statement's release coincides with a release of an unrelated statement on similar topics made by my employer, Software Freedom Conservancy, and the Free Software Foundation Europe, please keep in mind that this statement is my own, personal opinion — written exclusively by me — and not necessarily the opinion of either of those organizations. I did not consult nor coordinate with either organization on this statement. ]
With great trepidation, I have decided to make this public statement regarding the psychological abuse, including menacing, that I suffered, perpetrated by Eben Moglen, both while I was employed at his Software Freedom Law Center (SFLC) from 2005-2010, and in the years after he fired me. No one revels in having psychological injuries and mistreatment they've suffered paraded to the public. I'll be frank that if it were not for Moglen's use of the USA Trademark Trial and Appeal Board (TTAB) as a method to perpetrate further abusive behavior, I wouldn't have written this post. Furthermore, sadly, Moglen has threatened in recent TTAB filings his intention to use the proceeding to release personal details about my life to the public (using the litigation itself as a lever). I have decided to preemptively make public the facts herein first myself — so that I can at least control the timing and framing of the information.
This post is long; the issues discussed in it are complicated, nuanced, and cannot be summed up easily. Nevertheless, I'm realistic that most people will stop reading soon, so I'll summarize now as best I can in a few sentences: I worked initially with, and then for, Eben Moglen for nearly a decade, during which time he was psychologically abusive and gaslighted me (under the guise of training and mentoring me). I thought for many years that he was one of my best friends (in retrospect, I believe that he tricked me into believing that he was). As such, I shared extremely personal details about myself with him, which he has used both contemporaneously and in years hence to attempt to discredit me with my colleagues and peers. Recently, Moglen declared his plans to use current TTAB proceedings to force me to answer questions about my mental health in deposition[0]. Because I disclosed key personal information to Moglen long ago, I have a pretty good idea of what his next move will be during that deposition questioning. Specifically, I believe Moglen was hoping to out me as omni/bisexual[1] as part of my deposition in this proceeding. As such, I'm outing myself here first (primarily) to disarm his ability to use what he knows about my sexual orientation against me. Since that last sentence makes me already out, Moglen will be unable to use the biggest “secret” that Moglen “has on me” in his future psychological and legal attacks.

I suspect some folks will stop reading here, but I really urge that you keep reading this post, and also to read the unrelated statement made by Conservancy and FSFE. The details are important and matter. I am admittedly embarrassed to talk publicly about how Moglen exacerbated, expanded, and caused new symptoms of my Post-Traumatic Stress Disorder (PTSD), which I already suffered from when I met him. But, I feel it is important to talk about these issues publicly for many reasons, including that Moglen seeks to expose these personal facts about me as an attempt to stigmatize what is actually a positive thing: I seek ongoing treatment for my PTSD (which Moglen himself, in part, caused) and to simultaneously process and reduce my (painful and stubborn) internalized shame about my LGBTQIA+ status. (Like many proud LGBTQIA+ folks, I struggle with this because living in a society unfriendly to LGBTQIA+ folks can lead to difficult shame issues; this is a well-documented phenomenon that LGBTQIA+ folks like myself suffer from.)
The primary recent catalyst for this situation is as follows: Moglen has insisted, as part of the ongoing trademark cancellation petition that SFLC filed against my employer, Software Freedom Conservancy, in the TTAB, that he personally be allowed both to be present at and to actually take the depositions[3] of me and my colleague, Karen Sandler.
This kind of behavior is typical of how abusers use litigation to perpetuate their abuse. The USA legal system is designed to give everyone “their day in Court”. Frankly, many of the rules established for Court proceedings did not contemplate that the process could be manipulated by abusers, and it remains an open problem on how to repair the rules that both preserve the egalitarian nature of our legal system, but also does not make it easy for abusers to misuse those same rules. Depositions, in particular, are a key tool in abusers' arsenals. Depositions allow Plaintiffs (in the TTAB, BTW, the Plaintiff is called “the Petitioner”) to gather evidence. Generally speaking, most Courts have no good default rules to prevent abusers from using these depositions to get themselves in the room with their victims and harass those victims further with off-topic haranguing. The only method (which is quite clunky as a legal tool) to curtail the harassment somewhat is called a protective order. However, Moglen has been smart enough to use the very process of the protective order application to further perpetuate abusive behavior.
To understand all this in context, I ask that you first read Conservancy's public response to the initial filing of the trademark cancellation proceeding (six years ago). In short, SFLC is seeking to “cancel” the trademark on the name “Software Freedom Conservancy”. Ostensibly, that's all this case is (or, rather should be) about.
The problem is that, upon reading the docket in detail, it's easily seen that at nearly every step, Moglen has attempted to use the proceeding as a method to harass and attack me and my colleague, Karen Sandler, regarding issues wholly unrelated to the trademarks. The recent arguments have been about our depositions[4] (mine and Karen's[2]).
After some complex legal back-and-forth, Judge Elgin ordered that I was legally required to sit for a deposition with and by Moglen. This is the point where a catch-22 began for me.
- Option 0: Sit in a room for 8+ hours with a person who had spent years verbally abusing me and let him ask me any question he wants[5], under penalty of perjury and contempt of Court if I refuse.
- Option 1: Give Conservancy's lawyers permission to talk openly, in public documents, about the details of the abuse I suffered from Moglen and the psychological harm that it caused me (which is the necessary backup document for a protective order motion).
Fortunately, that aforementioned sworn testimony was sufficient to convince Judge Elgin to at least entertain reconsidering her decision that I have to sit[8] for a deposition with Moglen. However, submitting the official motion then required that I give even more information about why the deposition with Moglen will be psychologically harmful. In particular, I had little choice but to add a letter from my (highly qualified) mental health provider speaking to the psychological dangers that I would face if deposed by Moglen personally and/or in his presence. I reluctantly asked my therapist to provide such a letter. It was really tough for me to publicly identify who my therapist is, but it was, again, my best option out of that catch-22. I admittedly didn't anticipate that Moglen might use this knowledge as a method to further his abuse against me publicly in his response filing.
As can be seen in Moglen's response filing, Moglen directly attacks my therapist's credentials — claiming she is not credible nor qualified. Moglen's argument is that because my therapist is a licensed, AASECT-certified sex therapist, she is not qualified to diagnose PTSD. Of course, Moglen's argument is without merit: my therapist's sex therapy credentials are in addition to her many other credentials and certifications — all of which is explained on her website that Moglen admits in his filing he has reviewed.
As I mentioned, at one time, I foolishly and erroneously considered Moglen a good friend. As such, I told Moglen a lot about my personal life, including that I was omni/bisexual, and that I was (at the time) closeted. So, Moglen already knows full well the reason that I would select a therapist who held among her credentials a certification to give therapy relating to sexuality. Moglen's filing is, in my view, a veiled threat to me that he's going to disclose publicly what he knows about my sexuality as part of this proceeding. So, I've decided — after much thought — that I should simply disarm him on this and say it first: I have identified as bisexual/omnisexual6 since 1993, but I have never been “out” in my professional community — until now. Moglen knows full well (because I told him on more than one occasion) that I struggled with whether or not to come out for decades. Thus, I chose a therapist who was both qualified to give treatment for PTSD as well as for sexual orientation challenges because I've lived much of my life with internalized shame about my sexual orientation. (I was (and still am, a bit) afraid that it would hurt my career opportunities in the FOSS community and technology generally if I came out; more on that below.) I was still working through these issues with my therapist when all these recent events occurred.
Despite the serious psychological abuse I've suffered from Moglen, until this recent filing, I wouldn't have imagined that Moglen would attempt to use the secrecy about my LGBTQIA+ status as a way to further terrorize me. All I can think to say to Moglen in response is to quote what Joe Welch said to Senator Joe McCarthy on 1954-06-09: “Have you no sense of decency, sir — at long last? Have you left no sense of decency?”.
It's hard to express coherently the difficult realization of the stark political reality of our world. There are people you might meet (and/or work for) who, if they have a policy disagreement8 with you later, will use every single fact about you to their advantage to prevail in that disagreement. There is truly no reason that Moglen needed to draw attention to the fact that I see a therapist who specializes (in part) in issues with sexuality. The fact that he goes on to further claim that the mere fact that she has such certification makes her unqualified to treat my other mental health illnesses — some of which Moglen himself (in part) personally caused — is unconscionable. I expect that even most of my worst political rivals who work for proprietary software companies and violate copyleft licenses on a daily basis would not stoop as low as Moglen has in this situation.
At this point, I really have no choice but to come out as omnisexual7 — even though I wasn't really ready to do so. Moglen has insisted now that my therapy has been brought up in the proceeding, that he has a legal right to force me to be evaluated by a therapist of his choosing (as if I were a criminal defendant). Moglen has also indicated that, during my deposition, he will interrogate me about my therapy and my reasons for choosing this particular therapist (see, for example, footnote 2 on page 11 (PDF-Page 27) of Moglen's declaration in support of the motion). Now, even if the judge grants Conservancy's motion to exclude Moglen from my deposition, Moglen will instruct his attorneys to ask me those questions about my therapy and my sexual orientation — with the obvious goal of seeking to embarrass me by forcing me to reveal such things publicly. Like those folks who sat before McCarthy in those HUAC hearings, I know that none of my secrets will survive Moglen's deposition. By outing myself here first, I am, at least, disarming Moglen from attempting to use my shame about my sexual orientation against me.
Regarding LGBTQIA+ Acceptance and FOSS
I would like to leave Moglen and his abusive behavior there, and spend the rest of this post talking about related issues of much greater importance. First, I want to explain why it was so difficult for me to come out in my professional community. Being somewhat older than most folks in FOSS today, I really need to paint the picture of the USA when my career in technology and FOSS got started. I was in my sophomore year of my Computer Science undergraduate program when Clinton implemented the Don't ask, Don't tell (DADT) policy for military in the USA. Now, as a pacifist, I had no desire to join the military, but the DADT approach was widely accepted in all areas of life. The whole sarcastic “Not that there's anything wrong with that …” attitude (made famous contemporaneously to DADT on an episode of the TV show, Seinfeld) made it clear in culture that the world, including those who ostensibly supported LGBTQIA+ rights, wanted queer folks to remain, at best, “quiet and proud”, not “loud and proud”. As a clincher, note that three years after DADT was put in effect, overwhelming bipartisan support came forward for the so-called “Defense of Marriage Act (DOMA)”. An overwhelming majority of everyone in Congress and the Presidency (regardless of party affiliation) was in 1996 anti-LGBTQIA+. Folks who supported and voted yes for DOMA include: Earl Blumenauer (still a senator from my current state), Joe Biden (now POTUS (!)), Barbara Mikulski (a senator until 2017 from my home state), and Chuck Schumer (still Senate majority leader today). DADT didn't end until 2011, and while SCOTUS ruled parts of DOMA unconstitutional in 2015, Congress didn't actually repeal DOMA until last year! Hopefully, that gives a clear sense of what the climate for LGBTQIA+ folks was like in the 1990s, and why I was terrified to be outed — even as the 1990s became the 2000s.
I also admit that my own shame about my sexual orientation grew as I got older and began my professional career. I “pass” as straight — particularly in our heteronormative culture that auto-casts everyone as cishet until proven otherwise. It was just easier to not bring it up. Why bother, I thought? It was off-topic (so I felt), and there were plenty of people around the tech world in the 1990s and early 2000s who were not particularly LGBTQIA+-friendly, or who feigned that they were but were still “weird” about it.
I do think tech in general and FOSS in particular are much more LGBTQIA+-friendly than they once were. However, there has been a huge anti-LGBTQIA+ backlash in certain areas of the USA in recent years, so even as I became more comfortable with the idea of being “out”, I also felt (and do feel) that the world has recently gotten a lot more dangerous for LGBTQIA+ folks. Folks like Moglen who wage “total war” against their political opponents know this, and it is precisely why they try to cast phrases like bisexual, gay, queer, and “sex therapist” as salacious.
Also, PTSD has this way of making you believe you're vulnerable in every situation. When you're suffering from the worst of PTSD's symptoms, you believe that you can never be safe anywhere — ever again. But, logically I know that I'm safe being a queer person (at least in the small FOSS world) — for two big reasons. First, the FOSS community of today is (in most cases) very welcoming to LGBTQIA+ folks and most of the cishet folks in FOSS identify as LGBTQIA+ allies. Second, I sheepishly admit that as I've reached my 0x32'nd year of life this year, I have a 20+ year credentialed career that has left me in a position of authority and privilege as a FOSS leader. I gain inherent safety from my position of power in the community to just be who I am.
While this is absolutely not the manner and time in which I wanted to come out, I'll try to make some proverbial lemonade out of the lemons. By now being out as LGBTQIA+ and already being a FOSS leader, I'd like to offer to anyone who is new to FOSS and faces fear and worry about LGBTQIA+ issues in FOSS to contact me if they think I can help. I can't promise to write back to everyone, but I will do my very best to try to either help or route you to someone else in FOSS who might be able to.
Also, I want to state something in direct contrast to Moglen's claims that the mere fact that a therapist who is qualified for treating people with issues related to sexual orientation is ipso facto unqualified to treat any other mental condition. I want to share publicly how valuable it has been for me in finding a therapist who “gets it” with regard to living queer in the world while also suffering from other conditions (such as PTSD). So many LGBTQIA+ youth are bullied due to their orientation, and sustained bullying commonly causes PTSD. I think we should all be so lucky to have a mental health provider, as I do, that is extensively qualified to treat the whole person and not just a single condition or issue. We should stand against people like Moglen who, upon seeing that someone's therapist specializes in helping people with their sexual orientation, would use that fact as a way to shame both the individual and the therapist. Doing that is wrong, and people who do that are failing to create safe spaces for the LGBTQIA+ community.
I am aghast that Moglen is trying to shame me for seeking help from a mental health provider who could help me overcome my internalized shame regarding my sexual orientation. I also want people to know that I did not feel safe as a queer person when I worked for Eben Moglen at SFLC. But I also know Moglen doesn't represent what our FOSS community and software freedom is about. I felt I needed to make this post not only to disarm the power Moglen held to “out me” before I was ready, but also to warn others that, in my opinion, Software Freedom Law Center (SFLC) is an organization that is not a safe space for LGBTQIA+ folks. Finally, I do know that Moglen is also a tenured professor at Columbia Law School. I have so often worried about his students — who may, as I did, erroneously believe they can trust Moglen with private information as important as their LGBTQIA+ status. I simply felt I couldn't stay silent about my experiences in good conscience any longer.
0, 4 A deposition is a form of testimony done during litigation before trial begins. Each party in a legal dispute can subpoena witnesses. Rules vary from venue to venue, but typically, a deposition is taken for eight hours, and opposing attorneys can ask as many questions as they want — including leading questions.
5 In most depositions, there is a time limit, but the scope of what questions can be asked is not bounded. Somewhat strangely, one's own lawyer is not usually permitted to object on grounds of relevancy to the case, so the questions can be as off-topic as the opposing counsel wants.
3, 8 The opposing attorney who asks the question is said to be “taking the deposition”. The witness is said to be “sitting for a deposition”. (IIUC, these are terms of art in litigation).
1, 6, 7 From 1993-2018, I identified as “bisexual”. That term, unfortunately, is, in my opinion, not friendly to non-binary people, since the “bi” part (at least to me, I know others disagree) assumes binary gender. The more common term used today is “pansexual”, but, personally I prefer the term “omnisexual” to “pansexual” for reasons that are beyond the scope of this particular post. I am, however, not offended if you use any of the three terms to refer to my sexual orientation.
2Note, BTW: when you read the docket, Judge Elgin (about 75% of the time) calls Karen by the name “Ms. Bradley” (using my first name as if it were Karen's surname). It's a bit confusing, so watch for it while you're reading so you don't get confused.
8 Footnote added 2023-10-12, 19:00 US/Eastern: Since I posted this about 30 hours ago, I've gotten so many statements of support emailed to me that I can't possibly respond to them all, but I'll try. Meanwhile, a few people have hinted at and/or outright asked what policy disagreements Moglen actually has with me. I was reluctant to answer because the point I'm making in this post is that even if Moglen thought every last thing I've ever done in my career was harmful policy-wise, it still would not justify these abusive behaviors. Nevertheless, I admit that if this post were made by someone else, I'd be curious about what the policy disagreements were, so I decided to answer the question. I think that my overarching policy disagreement with Eben Moglen is with regard to how and when to engage in enforcement of the GPL and other copyleft licenses through litigation. I think Moglen explains this policy disagreement best in his talk that the Linux Foundation contemporaneously promoted (and continues to regularly reference) entitled “Whither (Not Wither) Copyleft”. In this talk, Moglen states that I (among others) am “on a jihad for free software” (his words, direct quote) because we continued to pursue GPL enforcement through litigation. While I agree that litigation should still remain the last resort, I do think it remains a necessary step often. Moglen argues that even though litigation was needed in the past, it should never be used again for copyleft and GPL enforcement. As Moglen outlines in his talk, he supports the concept of “spontaneous compliance” — a system whereby there is no regulatory regime and firms simply choose to follow the rules of copyleft because it's so obviously in their own best interest. I've not seen this approach work in practice, which is why I think we must still sometimes file GPL (and LGPL) lawsuits — even today.
Moglen and I have plenty of other smaller policy disagreements: from appropriate copyright assignment structures for FOSS, to finer points of how GPLv3 should have been drafted, to tactics and strategy with regard to copyleft advocacy, to how non-profits and charities should be structured for the betterment of FOSS. However, I suspect all these smaller policy disagreements stem from our fundamental policy disagreement about GPL enforcement. However, I conclude by (a) saying again no policy disagreement with anyone justifies abusive behavior toward that person — not ever, and (b) please do note the irony that, in that 2016-11-02 speech, Moglen took the position that lawsuits should no longer be used to settle disputes in FOSS, and yet — less than 10 months later — Moglen sued Conservancy (his former client) in the TTAB.
A few conversations last week made me realize I use the word “interesting” in an unusual way.
I rely heavily on mental models. Of course, everyone relies on mental models. But I do it intentionally and I push it extra hard.
What I mean by that is, when I’m making predictions about what will happen next, I mostly don’t look around me and make a judgement based on my immediate surroundings. Instead, I look at what I see, try to match it to something inside my mental model, and then let the mental model extrapolate what “should” happen from there.
If this sounds predictably error prone: yes. It is.
But it’s also powerful, when used the right way, which I try to do. Here’s my system.
Confirmation bias
First of all, let’s acknowledge the problem with mental models: confirmation bias. Confirmation bias is the tendency of all people, including me and you, to consciously or subconsciously look for evidence to support what we already believe to be true, and try to ignore or reject evidence that disagrees with our beliefs.
This is just something your brain does. If you believe you’re exempt from this, you’re wrong, and dangerously so. Confirmation bias gives you more certainty where certainty is not necessarily warranted, and we all act on that unwarranted certainty sometimes.
On the one hand, we would all collapse from stress and probably die from bear attacks if we didn’t maintain some amount of certainty, even if it’s certainty about wrong things. But on the other hand, certainty about wrong things is pretty inefficient.
There’s a word for the feeling of stress when your brain is working hard to ignore or reject evidence against your beliefs: cognitive dissonance. Certain Internet Dingbats have recently made entire careers talking about how to build and exploit cognitive dissonance, so I’ll try to change the subject quickly, but I’ll say this: cognitive dissonance is bad… if you don’t realize you’re having it.
But your own cognitive dissonance is amazingly useful if you notice the feeling and use it as a tool.
The search for dissonance
Whether you like it or not, your brain is going to be working full time, on automatic pilot, in the background, looking for evidence to support your beliefs. But you know that; at least, you know it now because I just told you. You can be aware of this effect, but you can’t prevent it, which is annoying.
But you can try to compensate for it. What that means is using the part of your brain you have control over — the supposedly rational part — to look for the opposite: things that don’t match what you believe.
To take a slight detour, what’s the relationship between your beliefs and your mental model? For the purposes of this discussion, I’m going to say that mental models are a system for generating beliefs. Beliefs are the output of mental models. And there’s a feedback loop: beliefs are also the things you generalize in order to produce your mental model. (Self-proclaimed “Bayesians” will know what I’m talking about here.)
So let’s put it this way: your mental model, combined with current observations, produces your set of beliefs about the world and about what will happen next.
Now, what happens if what you expected to happen next, doesn’t happen? Or something happens that was entirely unexpected? Or even, what if someone tells you you’re wrong and they expect something else to happen?
Those situations are some of the most useful ones in the world. They’re what I mean by interesting.
The “aha” moment
The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka!” (I found it!) but “That’s funny…”
— possibly Isaac Asimov
When you encounter evidence that your mental model mismatches someone else’s model, that’s an exciting opportunity to compare and figure out which one of you is wrong (or both). Not everybody is super excited about doing that with you, so you have to be respectful. But the most important people to surround yourself with, at least for mental model purposes, are the ones who will talk it through with you.
Or, if you get really lucky, your predictions turn out to be demonstrably concretely wrong. That’s an even bigger opportunity, because now you get to figure out what part of your mental model is mistaken, and you don’t have to negotiate with a possibly-unwilling partner in order to do it. It’s you against reality. It’s science: you had a hypothesis, you did an experiment, your hypothesis was proven wrong. Neat! Now we’re getting somewhere.
What follows is then the often-tedious process of figuring out what actual thing was wrong with your model, updating the model, generating new outputs that presumably match your current observations, and then generating new hypotheses that you can try out to see if the new model works better more generally.
For physicists, this whole process can sometimes take decades and require building multiple supercolliders. For most of us, it often takes less time than that, so we should count ourselves fortunate even if sometimes we get frustrated.
The reason we update our model, of course, is that most of the time, the update changes a lot more predictions than just the one you’re working with right now. Turning observations back into generalizable mental models allows you to learn things you’ve never been taught; perhaps things nobody has ever learned before. That’s a superpower.
Proceeding under uncertainty
But we still have a problem: that pesky slowness. Observing outcomes, updating models, generating new hypotheses, and repeating the loop, although productive, can be very time consuming. My guess is that’s why we didn’t evolve to do that loop most of the time. Analysis paralysis is no good when a tiger is chasing you and you’re worried your preconceived notion that it wants to eat you may or may not be correct.
Let’s tie this back to business for a moment.
You have evidence that your mental model about your business is not correct. For example, let’s say you have two teams of people, both very smart and well-informed, who believe conflicting things about what you should do next. That’s interesting, because first of all, your mental model is that these two groups of people are very smart and make right decisions almost all the time, or you wouldn’t have hired them. How can two conflicting things be the right decision? They probably can’t. That means we have a few possibilities:
- The first group is right
- The second group is right
- Both groups are wrong
- The appearance of conflict is actually not correct, because you missed something critical
There is also often a fifth possibility:
- Okay, it’s probably one of the first four but I don’t have time to figure that out right now
In that case, there’s various wisdom out there involving one- vs two-way doors, and oxen pulling in different directions, and so on. But it comes down to this: almost always, it’s better to get everyone aligned to the same direction, even if it’s a somewhat wrong direction, than to have different people going in different directions.
To be honest, I quite dislike it when that’s necessary. But sometimes it is, and you might as well accept it in the short term.
The way I make myself feel better about it is to choose the path that will allow us to learn as much as possible, as quickly as possible, in order to update our mental models as quickly as possible (without doing too much damage) so we have fewer of these situations in the future. In other words, yes, we “bias toward action” — but maybe more of a “bias toward learning.” And even after the action has started, we don’t stop trying to figure out the truth.
Being wrong
Leaving aside many philosophers’ objections to the idea that “the truth” exists, I think we can all agree that being wrong is pretty uncomfortable. Partly that’s cognitive dissonance again, and partly it’s just being embarrassed in front of your peers. But for me, what matters more is the objective operational expense of the bad decisions we make by being wrong.
You know what’s even worse (and more embarrassing, and more expensive) than being wrong? Being wrong for even longer because we ignored the evidence in front of our eyes.
You might have to talk yourself into this point of view. For many of us, admitting wrongness hurts more than continuing wrongness. But if you can pull off that change in perspective, you’ll be able to do things few other people can.
Bonus: Strong opinions held weakly
Like many young naive nerds, when I first heard of the idea of “strong opinions held weakly,” I thought it was a pretty good idea. At least, clearly more productive than weak opinions held weakly (which are fine if you want to keep your job), or weak opinions held strongly (which usually keep you out of the spotlight).
The real competitor to strong opinions held weakly is, of course, strong opinions held strongly. We’ve all met those people. They are supremely confident and inspiring, until they inspire everyone to jump off a cliff with them.
Strong opinions held weakly, on the other hand, is really an invitation to debate. If you disagree with me, why not try to convince me otherwise? Let the best idea win.
After some decades of experience with this approach, however, I eventually learned that the problem with this framing is the word “debate.” Everyone has a mental model, but not everyone wants to debate it. And if you’re really good at debating — the thing they teach you to be, in debate club or whatever — then you learn how to “win” debates without uncovering actual truth.
Some days it feels like most of the Internet today is people “debating” their weakly-held strong beliefs and pulling out every rhetorical trick they can find, in order to “win” some kind of low-stakes war of opinion where there was no right answer in the first place.
Anyway, I don’t recommend it, it’s kind of a waste of time. The people who want to hang out with you at the debate club are the people who already, secretly, have the same mental models as you in all the ways that matter.
What’s really useful, and way harder, is to find the people who are not interested in debating you at all, and figure out why.
The standard approach to supporting real Poker (generally meaning Hold’em) on a blockchain is to use a bunch of zero knowledge stuff to get a secret permutation. A much more practical albeit unpleasant approach is to play hands out as normal and simply cancel any hand which has a duplicate card. You can make that much less unpleasant by doing a 2-party computation beforehand to figure out if there will be any duplicate cards and ‘pre-cancel’ the hand, resulting in a much better practical play experience and very low on-chain cost but with possibly impractical computational requirements of the machines involved. I’ve done some digging into this now and the indications aren’t encouraging but I’ll put my thoughts together here.
For context, here’s what the on-chain protocol looks like:
Alice and Bob commit to their 5th images (if you don’t want the 2-party computation you can optimize out half of this step but that’s no big difference.)
Alice and Bob reveal their 4th images. Alice calculates her two hole cards from Bob’s 4th image and her preimage, and Bob calculates his two hole cards from Alice’s 4th image and his preimage
Alice and Bob reveal their 3rd images, used to calculate the 3 flop cards
Alice and Bob reveal their 2nd images, used to calculate the turn card
Alice and Bob reveal their 1st images used to calculate the river card
Alice and Bob reveal their preimages
That’s about 20 sha256 hashes at about depth 6. For my specific case a secure hash function based off BLS12-381 groups could also be used.
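The reveal sequence above amounts to each player walking back down a hash chain. Here is a minimal Python sketch of that idea, assuming each image is the SHA-256 of the one before it. The `derive_cards` mapping (hash both inputs with an index, reduce mod 52) is a hypothetical stand-in for illustration, not the rule actually used on chain:

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hash_chain(preimage: bytes, depth: int = 5) -> list[bytes]:
    """Return [preimage, 1st image, ..., depth-th image]."""
    chain = [preimage]
    for _ in range(depth):
        chain.append(sha256(chain[-1]))
    return chain


def verify_reveal(revealed: bytes, previously_seen: bytes) -> bool:
    """A reveal is valid if hashing it gives the value seen one step earlier."""
    return sha256(revealed) == previously_seen


def derive_cards(own: bytes, other: bytes, count: int) -> list[int]:
    """Hypothetical card derivation: one card index (0..51) per counter value."""
    return [
        int.from_bytes(sha256(own + other + bytes([i])), "big") % 52
        for i in range(count)
    ]


def has_duplicate(cards: list[int]) -> bool:
    """True if the hand would have to be cancelled under the duplicate rule."""
    return len(set(cards)) != len(cards)
```

In this sketch a player commits to `hash_chain(secret)[5]` and then reveals indices 4, 3, 2, 1, 0 in turn, with the opponent calling `verify_reveal` at every step.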
It appears that a semi-honest protocol can be used as follows: The output of the circuit is Alice’s 5th image (her commit), Bob’s 5th image (his commit), whether a duplicate card happens, and the hash of a combination of Alice and Bob’s preimages, used to prevent chicanery. The protocol is run twice, once for Alice to do the calculations and once for Bob. After the calculations are done Alice and Bob send each other MAC challenges about that final value and if the other side isn’t able to generate a good response the hand is cancelled.
I’m unclear on whether this is a reasonable/safe use of semi-honest computation. There’s no harm done if either side can sniff some information as long as it results in cancellation of the hand, and even if one side is able to sniff, say, a single bit of the input of the other side’s preimage that doesn’t help cheating because the whole thing is hashed anyway. If a controlled bit of information from inside the circuit can be sniffed that’s much more of a problem. Feedback on this welcome.
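The MAC challenge step could look roughly like the following Python sketch. It assumes the final circuit output is used as an HMAC-SHA256 key over a random challenge. The function names and the choice of HMAC are illustrative assumptions, not a vetted construction:

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Fresh random challenge sent to the other player."""
    return secrets.token_bytes(32)


def respond(final_value: bytes, challenge: bytes) -> bytes:
    """Answer a challenge using our own copy of the circuit's final value."""
    return hmac.new(final_value, challenge, hashlib.sha256).digest()


def check_response(final_value: bytes, challenge: bytes, response: bytes) -> bool:
    """True if the other side evidently holds the same final value we do."""
    expected = respond(final_value, challenge)
    return hmac.compare_digest(expected, response)
```

If `check_response` returns False for either player, the hand is cancelled, which matches the rule described above.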
Here are some teetering-on-the-edge benchmarks which need to be met for a Poker protocol to be practical. If all of these are met the protocol is on the edge. If they’re all exceeded by an order of magnitude you’re on easy street:
No more than 10 seconds of computation time on a standard desktop
No more than 100 round trips
No more than 100 megabytes of data transferred each way
Initial digging indicates that the number of round trips can be met by selecting the right protocol (assuming semi-honest is okay) but the other two are off by a few orders of magnitude. Again feedback from experts would be welcome.
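To make the three thresholds concrete, here is a small Python sketch that classifies a measured protocol cost against them. The `Cost` type and the category labels are hypothetical names chosen for this example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    seconds: float      # computation time on a standard desktop
    round_trips: int    # network round trips
    megabytes: float    # data transferred each way


# The teetering-on-the-edge limits listed above.
LIMIT = Cost(seconds=10, round_trips=100, megabytes=100)


def classify(cost: Cost) -> str:
    pairs = [
        (cost.seconds, LIMIT.seconds),
        (cost.round_trips, LIMIT.round_trips),
        (cost.megabytes, LIMIT.megabytes),
    ]
    # Beating every limit by an order of magnitude is "easy street".
    if all(value * 10 <= limit for value, limit in pairs):
        return "easy street"
    if all(value <= limit for value, limit in pairs):
        return "on the edge"
    return "impractical"
```

A protocol that meets the round-trip limit but misses the other two by orders of magnitude, as described above, comes out as "impractical".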
At least two people have contacted me concerning the 2 BTC bounty:
2 BTC for a human-readable bolt 12 offer generator feature integrated into a popular iOS or android bitcoin wallet. “Human-readable” means something that can be used on feature phone without QR or copy/paste ability. For example, something that looks like LN address.
This, of course, is asking to solve Zooko’s Triangle, so one of decentralization, human readability, or security needs to be compromised! Fortunately, the reference to LN address gives a hint on how we might proceed.
The scenario, presumably, is Bob wants to pay Alice, where Alice shows Bob a “Human Readable Offer” and Bob types it into his phone. Each one runs Phoenix, Greenlight, or (if their phone is too low-end) uses some hosted service, but any new third party trust should be minimized.
There are three parts we need here:
- Bob finds Alice’s node.
- Bob requests an invoice from Alice’s node.
- If she wants, Alice can easily check Bob’s going to pay the right thing.
The Imagined Scenario
Consider the normal offer case: the offer encodes Alice’s nodeid and description (and maybe other info) about what’s on offer. Bob turns this into an invoice_request, sends an onion message to Alice’s node, which returns the (signed) invoice, which Bob pays. We need to encode that nodeid and extra information as compactly as we can.
Part 1: Finding Alice’s Node from a Human Readable Offer
The issue of “finding Alice’s node” has been drafted already for BOLT12, at https://github.com/rustyrussell/bolt12address (but it needs updating!). This means that if you say “rusty@blockstream.com” you can get a valid generic offer, either by contacting the webserver at “blockstream.com” or having someone else do it for you (important for privacy!), or even downloading a public list of common receivers.
Note that it’s easier to type * than @ on feature phones, so I suggest allowing both rusty@blockstream.com and RUSTY*BLOCKSTREAM.COM.
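Accepting both spellings is cheap on the client side. This is a minimal Python sketch, assuming that addresses are compared case-insensitively and that * is simply an alias for @; the function name is made up for illustration:

```python
def parse_bolt12_address(text: str) -> tuple[str, str]:
    """Split a typed address into (user, domain), accepting * in place of @."""
    normalized = text.strip().replace("*", "@").lower()
    user, separator, domain = normalized.partition("@")
    if not separator or not user or not domain or "@" in domain:
        raise ValueError(f"not a user@domain address: {text!r}")
    return user, domain
```

With this, rusty@blockstream.com and RUSTY*BLOCKSTREAM.COM both parse to the same user and domain pair.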
What’s Needed On The Server
- The BOLT 12 Address Format needs to be updated.
- It needs to be implemented for some Web server.
- Ideally, integrate it into BTC Payserver or the like.
Part 2: Getting the Invoice
Now, presumably, we want a specific invoice: it might be some default “donate to Alice”, but it could be a specific thing “$2 hot dog”. So you really want some (short!) short-code to indicate which invoice you want. I suggest a hash (#), followed by some randomly chosen alphanumeric string here (case-insensitive!): an implementation may choose to restrict itself to numbers however, as that’s faster to enter on a feature phone.
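Here is a rough Python sketch of generating and splitting off such a short-code. The default length of 6 and the function names are assumptions for illustration; nothing here is specified by BOLT 12:

```python
import secrets
import string

ALPHANUMERIC = string.digits + string.ascii_uppercase


def new_short_code(length: int = 6, digits_only: bool = False) -> str:
    """Random short-code; digits_only is quicker to enter on a feature phone."""
    alphabet = string.digits if digits_only else ALPHANUMERIC
    return "".join(secrets.choice(alphabet) for _ in range(length))


def split_short_code(text: str) -> tuple[str, str]:
    """Split ADDRESS#SHORT-CODE; the code is upper-cased so matching ignores case."""
    address, _, code = text.partition("#")
    return address, code.upper()
```

An empty code (no # present) would then mean "whatever the default offer is", though that convention is also an assumption.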
What’s Needed On The Server
- We can put the short-code in the invreq_payer_note field in BOLT 12 or add a new odd field.
- We need to implement (presumably in Core Lightning):
  - A way to specify/assign a short-code for each offer.
  - A way of serving a particular invoice based on this short-code match.
Part 3: Checking the Invoice
So, did you even get the right node id? That’s the insecure part; you’re trusting blockstream.com! Checking the nodeid is hard: someone can grind out a nodeid with the same first 16 digits in a few weeks. But I think you can provide some assurance, by creating a 4-color “flag” using the node id and the latest bitcoin blocks: this will change every new block, and is comparable between Alice and Bob at a glance:
This was made using this hacky code which turns my node id 024b9a1fa8e006f1e3937f65f66c408e6da8e1ca728ea43222a7381df1cc449605 into an RGB color (by hashing the nodeid+blockhash).
For a moment, when a new block comes in, one image might be displaced, hence the number, but it’ll only be out by one.
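The flag idea can be sketched in a few lines of Python. This is not the hacky code referred to above, just one possible reading of it: hash the node id together with the block hash using SHA-256 and use the first 12 bytes of the digest as four RGB triples. The byte layout and function names are assumptions:

```python
import hashlib


def nodeid_flag(nodeid_hex: str, blockhash_hex: str) -> list[tuple[int, ...]]:
    """Four (r, g, b) colors derived from sha256(nodeid || blockhash)."""
    digest = hashlib.sha256(
        bytes.fromhex(nodeid_hex) + bytes.fromhex(blockhash_hex)
    ).digest()
    return [tuple(digest[i:i + 3]) for i in range(0, 12, 3)]


def as_html_colors(flag: list[tuple[int, ...]]) -> list[str]:
    """Render each color as #rrggbb for display."""
    return ["#{:02x}{:02x}{:02x}".format(*rgb) for rgb in flag]
```

Alice and Bob would each compute the flag from the node id they believe they are dealing with and the latest block hash, then compare the four colors by eye.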
Putting it All Together
What’s Needed On Alice’s Client
- Alice needs to configure her BOLT12 Address with some provider when she sets up the phone: it should check that it works!
- She should be able to choose an existing offer (may be a “donation” by default), or create a new one on the fly (with a new short code).
- Display the BOLT12-ADDRESS#SHORT-CODE, and the current nodeid flag.
What’s Needed On Bob’s Client
- It needs to be able to convert BOLT12-ADDRESS into a bolt12 address request:
- Either via some service (to be implemented!), or by querying directly (ideally over Tor).
- It needs to be able to produce an offer from the returned bolt12 address response, by putting the SHORT-CODE into the invreq_payer_note.
- It needs to be able to fetch an invoice for this offer.
- It needs to be able to display the current nodeid flag for the invoice’s node id.
- It needs to let Bob confirm before sending the payment.
Is There Anything Else?
There are probably other ways of doing this, but this method has the advantage of driving maturity in several different areas which we want to see in Bitcoin:
- bolt12 address to support vendor field validation for offers.
- Simple name support for bootstrapping.
- Driving Bitcoin to be more accessible to everyone!
Feel free to contact me with questions!
I’ve been working lately on implementing on-chain gaming in Chialisp. Since on-chain costs and latency are very high this requires optimization at all levels:
Only two players are supported. Multiplayer creates problems in all aspects of implementation and practice.
The games supported have very few turns total. It turns out Poker is a good fit because it’s designed to allow people to rage quit in the middle of sessions, so a cash game is a series of very short games instead of a single long one.
Play on chain is inside of a ‘referee’ which accepts moves and states optimistically. When a player misbehaves the other player can provide evidence of their cheating and have the referee slash them. This is the only time the rules of the game are actually invoked on chain. Usually the evidence of cheating is just pointing out what rule was violated, but there are other things like invoking the dictionary in a word game.
Instead of hands being played on chain they’re done over a payment channel. This way play can be done at very low latency and cost.
Combining all of those makes for a gaming experience which actually makes sense. It also involves making state channels, which is good because this exercise is partially an excuse to work on state channels with a real use case to see what functionality makes sense. The minimum viable product of a payment channel network involves quite a bit of real world infrastructure and relationships, but the MVP of gaming, while more technically complex, only requires two people who want to play. If every session is done over an ephemeral channel, that requires a minute to set up at the beginning (assuming it’s on Chia), which is no big deal. Later on a state channel network can be created to get rid of that start-up time.
In service of this I’ve put together an initial suite of games which is meant to hash out different edge cases of implementation details while being based on proven concepts mostly only changed to adapt them to the medium. Here is the current list:
A game similar to closed Chinese Poker but which has fewer round trips and is a bit more of a real game. This is the simplest game and meant to test out the minimal edge cases along with forcing fleshing out of Chialisp itself with support for evaluating Poker hands.
A two-player variant of Wordle where one player picks a word and the other tries to guess it. To make it more of a real game I’m throwing in something similar to a doubling cube. This requires pulling in outside evidence from the dictionary, which hits some very important edge cases. This has the gameplay benefit of play being extremely fast.
A Poker variant. It turns out the hardest part of adapting it is getting a hidden permutation working. My current plan is the simple approach of changing the rules so duplicate cards are simply allowed. To keep the complexity down suits are then thrown out, resulting in no more flushes but five of a kind being possible. To get some of the gameplay involved in hole cards being suited back there’s an additional bit of hidden information which comes with hole cards which is whether they’re ‘boosted’, which has a one in three chance of being true and serves as the first tiebreak after hand type. Boosting has similar gameplay effects to cards being suited but is relevant in many more circumstances. Hopefully this is a net positive for gameplay but obviously is likely to be controversial. I may also change the play order so whoever was the last aggressor is out of position on each round, just because it reduces the total number of turns.
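To make the suitless rules concrete, here is a sketch of classifying a five-card hand and building a comparison key in which ‘boosted’ is the first tiebreak after hand type. The ordering of hand types, the encoding of ranks as 2 to 14, and the names hand_type and hand_key are assumptions for illustration; the actual rules may rank hands differently, and picking the best five cards out of seven is not covered.

```python
from collections import Counter

# Assumed ordering, weakest to strongest (no flushes, five of a kind on top).
HAND_TYPES = [
    "high card", "pair", "two pair", "three of a kind",
    "straight", "full house", "four of a kind", "five of a kind",
]
WHEEL = [2, 3, 4, 5, 14]  # ace-low straight, with the ace encoded as 14

def hand_type(ranks):
    """Classify five ranks (2..14) from a suitless deck with duplicates allowed."""
    counts = sorted(Counter(ranks).values(), reverse=True)
    if counts == [5]:
        return "five of a kind"
    if counts == [4, 1]:
        return "four of a kind"
    if counts == [3, 2]:
        return "full house"
    if counts == [1, 1, 1, 1, 1]:
        ordered = sorted(ranks)
        if ordered[4] - ordered[0] == 4 or ordered == WHEEL:
            return "straight"
        return "high card"
    if counts == [3, 1, 1]:
        return "three of a kind"
    if counts == [2, 2, 1]:
        return "two pair"
    return "pair"

def hand_key(ranks, boosted):
    """Sort key: hand type first, then boosted, then the ranks themselves."""
    if sorted(ranks) == WHEEL:
        kickers = [5, 4, 3, 2, 1]  # the ace plays low in the wheel
    else:
        # Most frequent rank first, then higher rank first.
        grouped = sorted(Counter(ranks).items(),
                         key=lambda rc: (rc[1], rc[0]), reverse=True)
        kickers = [rank for rank, _ in grouped]
    return (HAND_TYPES.index(hand_type(ranks)), boosted, kickers)
```

Under these assumptions a boosted two pair of tens and fours beats an unboosted aces and kings, because the boosted flag is compared before any ranks, while any pair still beats any boosted high card.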
Usually people implement random permutations using zero knowledge stuff but that appears to be a poor fit here because it puts some very expensive stuff on chain. Much more expedient would be to have both players reveal their commits before a hand even starts then do a multi-party computation to determine if that results in a card collision and if so cancel the hand. Otherwise it’s played out and the cards magically don’t come out as duplicates. That adds literally zero on-chain costs and only results in cancelling about half the hands for Hold’em because it’s nine cards out of a 52 card deck. I’m currently researching what libraries are available and how to use them; it may be practical but is a bit of a project.
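The “about half” figure can be checked with a birthday-style calculation. This sketch assumes each of the nine cards (two hole cards per player plus five board cards) is drawn independently and uniformly from 52; the function name is hypothetical.

```python
from fractions import Fraction

def no_collision_probability(cards_dealt: int, deck_size: int = 52) -> Fraction:
    """Chance that independently drawn cards are all distinct."""
    p = Fraction(1)
    for already_drawn in range(cards_dealt):
        # The next card must avoid every card drawn so far.
        p *= Fraction(deck_size - already_drawn, deck_size)
    return p

kept = float(no_collision_probability(9))  # fraction of hands that go ahead
cancelled = 1 - kept                       # fraction cancelled for a collision
```

Under that model a little under half of the hands come out collision-free, so slightly more than half get cancelled, which is consistent with the estimate above.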
There may be other games worth supporting, for example Liar’s Poker, but that doesn’t seem to be played very often and there isn’t an obvious canonical set of rules, especially for only two people. Much as I like the game designer indulgence of getting people to play games which I make up I’m running a crypto company not a game company and need to be responsible about sticking as much as possible to proven formulas with justifiable tweaks to adapt them to the medium and (hopefully) enhance gameplay.
Having a bit more experience with Chialisp development now, it’s becoming clearer how it compares to Solidity development. One very in-your-face aspect of it is that it forces the usage of the most reentrancy resistant design patterns possible. This is obviously a good thing. It forces a fair amount of pain up front but even total Solidity-heads are in favor of everyone being forced to do that properly. Another big change is how capabilities are handled. In Chialisp everyone carries around their certificates with them and proves their authenticity using backwards-pointing covenants. This requires some tricks which are currently quite unconventional to keep the certificates, but makes checking them trivial and super reliable. Solidity on the other hand requires checking capabilities with third party contracts, which is ugly and awkward and involves relying on an API with bespoke implementations instead of a standard audited piece of code. This is a much less clear tradeoff with obvious benefits and downsides, the main downside being that the techniques for maintaining certificates are currently not widely understood, but we’re working on that. That relates to the other big difference which is that the tooling and environment of Solidity is much more mature than that of Chialisp. That is improving rapidly and before too long will be at the point where people who wish to do something can simply start writing code, but it isn’t there yet. On the plus side even today writing actually secure and auditable code in Chialisp is easier than in Solidity because mistakes tend to make things simply fail rather than cause security problems. But we aspire to make it more convenient in all ways so there’s still much work to be done.
In Europe there are statutory penalties which airlines have to pay when flights are delayed. In the US there aren’t. This should be changed.
Airlines don’t control the weather, but they do control the decision of whether to use hub cities and if so which ones, how much extra capacity to keep in the system, and what contingency plans there are. Plus they have access to plain old fashioned insurance.
I have to admit to writing this grouchily: I was recently delayed nearly a full day on a connecting flight back home, with no compensation, not even for my overnight hotel. But my complaint is legit: It was a connecting flight in a city I have no connection to and which has frequent storms. The airline didn’t have the capacity to run redeye flights during the night hours when it normally doesn’t fly, so it bumped everyone to flights the next day instead, and it was in no rush to add extra flights early the next day since once you’re bumped because of weather they’re off the hook and the amount of delay doesn’t matter.
While I was trying to get a new connecting flight they were doing legally required auctions for who gets bumped and having to pay, in at least one case, $800 a seat, which makes sense because getting delayed generally disrupts the rest of your plans, often the first day of vacation or first day of work back from vacation. Mandatory compensation of people for much less than that but still in excess of the cost of a hotel room is clearly warranted.
Having delay penalties in place would result in some flights being a bit more expensive on their face, but with compensation when delays happened, and many fewer delays and better responses to them. It would overall be saving people a lot in the hidden costs of their inconvenience when flights get delayed. It’s possible that this would result in some airlines going under or some airports getting used a lot less. If it does, that will be a good thing.