The lower-post-volume people behind the software in Debian. (List of feeds.)

, literacy, and democracy. Via Mary Meeker at Presumably subject to some pretty fuzzy definitions ("democracy"), but I think it's fairly honest.

Posted Fri Oct 13 03:44:40 2017 Tags:

, if you have highly motivated, engaged, efficient people with not too many deadlines, you don't have to solve this explicitly: just let people do what they think is "important" and it'll work out. In small companies, probably like early-day Google, this informal system works out great. Once you have formal systems, you have incentives, and once you have incentives, you have to have high-level people (who are further from the actual problems being solved) make decisions about what to incent and disincent. Unsurprisingly, those decisions end up being mostly about "strategic direction" and not about day-to-day manageability, because executives don't have to do any day-to-day management. Instead, they just see the technical debt slowly build up and the teams slow down, but nobody can quite tell how it happened.

Posted Fri Oct 13 03:44:40 2017 Tags:

, but it's pretty great. As usual he comes across as a bit unrealistic about what's possible in the short term, but he still tells a pretty compelling story. I especially like the idea of underground car tunnels vs flying cars. ("I sure hope the people up there have kept up with their hubcap maintenance" was a pretty funny understatement.)

Posted Fri Oct 13 03:44:40 2017 Tags:

I've been thinking a lot about "overnight success" lately. Check out how long it's been taking for Tesla to achieve theirs! [Data collected by combining various public data sources. I might have screwed it up, but I think I'm mostly right.]

Of interest:

  • Wow, they really didn't sell very many Roadsters.

  • Recent increase in units/employee, probably due to better factory automation

  • Miles driven since the last (reported) autopilot fatality is more than 2x as many as before; they're either getting luckier or the software is getting better.

Posted Fri Oct 13 03:44:40 2017 Tags:

. Well that's not what I expected."

Posted Fri Oct 13 03:44:40 2017 Tags:

, such as "What is this?" when the meeting invite has no detailed description :)

Posted Fri Oct 13 03:44:40 2017 Tags:

keep posting about. But my deep dark secret is... I mostly got it from reading books in the first place.

Posted Fri Oct 13 03:44:40 2017 Tags:

, around the time GFiber started (2011-2012) to last year. The right strategy for an ISP certainly does change over time. (From Mary Meeker at It's filled with awesome.)

Posted Fri Oct 13 03:44:40 2017 Tags:

, I measure time in units of Bachelor's degrees. For most impressive results, I simultaneously measure life experience in units of Bachelor's degrees.

Happy 1.5 billion seconds, everyone :)

Posted Fri Oct 13 03:44:40 2017 Tags:

, Apple frittered away its dominance of the music industry. Perhaps this is old news to everyone but me, but the story is something like this:

  • "Nobody" wants to buy music anymore. It's all streaming.

  • In case you do download music, everyone long ago gave up on DRM and switched to watermarking. So you can carry your music away from whatever store you downloaded it from (including iTunes) and even switch platforms easily.

  • Even Apple wants you to switch to streaming: Apple Music. Perhaps because streaming is still DRMed.

  • Streaming apps (at least some of them) now work great with "offline mode." You can just pick entire albums and mark them offline, which downloads them to local storage, straight from the Internet.

  • That means you don't need iTunes to load music on your iPhone anymore.

  • Because you already bought a monthly subscription to the streaming service, you aren't making an "in-app purchase" when you do this, thus you bypass Apple's 30% cash grab on in-app purchases.

  • Most streaming apps are cross-platform (iOS, Android, Windows, Mac) so there is no vendor lock-in. And they store all your settings in the cloud, so you can switch devices seamlessly. (Incidentally there's not much vendor lock-in to particular streaming services either. With a few exceptions, they all basically have all the music.)

  • Spotify has ~2x as many users as Apple Music.

  • Tidal is smaller than Apple Music, but has better sound quality.

Goodness, how quickly things change. It wasn't so long ago that Apple was shutting down devices which tried to fake their way into auto-syncing music from iTunes. I wonder if they're actually worried about all this, or just don't care that much about the music market at this point. (It looks like the whole business is about $4B/year[1], and most of that presumably goes to the record companies. Not sure if that's US-only though.)

They still have the power to make things miserable for all these other music services, but they're not using that power.


Posted Fri Oct 13 03:44:40 2017 Tags:
Screenshot from the SIRIUS 3 Documentation.
License: unknown.
It has been ages since I blogged about work I heard about and think should receive more attention. So, I'll try to pick up that habit again.

After my PhD research (about machine learning (chemometrics, mostly), crystallography, QSAR) I first went into the field of metabolomics, because it combines core chemistry with the complexity of biology. My first position was with Chris Steinbeck, in Cologne, within the bioinformatics institute led by Prof. Schomburg (of the BRENDA database). During that year, I worked in a group that worked on NMR data (NMRShiftDb, Dr. Stefan Kuhn), Bioclipse (collaboration with Ola Spjuth), and, of course, the Chemistry Development Kit (see our new paper).

This new paper, actually, introduces functionality that was developed in that year, for example, work started by Miquel Rojas-Cheró. This includes the work on atom types, which we needed to handle radicals, lone pairs, etc, for delocalisation. It also includes work around handling molecular formulas and calculating them from (accurate) molecular masses. For the latter, more recent work even further improved on earlier work.

So, whenever metabolomics work is published and they use the CDK, I realize that what the CDK does has impact. This week Google Scholar alerted me about a user guidance document for SIRIUS 3 (see the screenshot). Seems really nice (great) work from Sebastian Böcker et al.!

It also makes me happy, as our Faculty of Health, Medicine, and Life Sciences (FHML) is now part of the Netherlands Metabolomics Center, and we recently published an article on our vision of a stronger, more FAIR European metabolomics community.
Posted Sun Oct 8 09:11:00 2017 Tags:

HUION PenTablet devices are graphics tablets aimed at artists. These tablets tend to aim for the lower end of the market, and driver support is often somewhere between meh and disappointing. The DIGImend project used to take care of them, but with that out of the picture, the bugs bubble up to userspace more often.

The most common bug at the moment is a lack of proximity events. On pen devices like graphics tablets, we expect a BTN_TOOL_PEN event whenever the pen goes in or out of the detectable range of the tablet ('proximity'). On most devices, proximity does not imply touching the surface (that's BTN_TOUCH or a pressure-based threshold); on anything that's not built into a screen, proximity without touching the surface is required to position the cursor correctly. libinput relies on proximity events to provide the correct tool state, which in turn is relied upon by compositors and clients.

The broken HUION devices only send BTN_TOOL_PEN once, when the pen first goes into proximity, and then never again until the device is disconnected. To make things more fun, HUION re-uses USB ids, so we cannot even reliably detect the broken devices and apply the usual hardware quirks. So far, libinput support for HUION devices has thus been spotty. The good news is that libinput git master (and thus libinput 1.9) will have a fix for this. The one thing we can rely on is that tablets keep sending events at the device's scanout frequency while the pen is in range. So in libinput we now add a timeout to the tablets: if no event arrives before it expires, we assume proximity-out has happened. libinput fakes a proximity out event and waits for the next event from the tablet, at which point we'll fake a proximity in before processing the events. This is enabled on all HUION devices now (re-using USB IDs, remember?) but not on any other device.
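The timeout logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not libinput's actual code (which is C): the class name, the method names and the 50ms timeout are all made up for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical timeout; the value libinput really uses is not stated in the post.
PROXIMITY_TIMEOUT_MS = 50


class ForcedProximity:
    """Fake proximity in/out for a tablet that never reports proximity-out."""

    def __init__(self) -> None:
        self.in_proximity = False
        self.last_event_ms: Optional[int] = None
        self.emitted: List[Tuple[str, int]] = []

    def check_timeout(self, now_ms: int) -> None:
        # No event for a full timeout period: assume the pen left the tablet.
        if (self.in_proximity and self.last_event_ms is not None
                and now_ms - self.last_event_ms >= PROXIMITY_TIMEOUT_MS):
            self.in_proximity = False
            self.emitted.append(
                ("proximity-out", self.last_event_ms + PROXIMITY_TIMEOUT_MS))

    def on_tablet_event(self, now_ms: int) -> None:
        self.check_timeout(now_ms)
        if not self.in_proximity:
            # First event after a (faked) proximity-out: fake a proximity-in
            # before the event itself is processed.
            self.in_proximity = True
            self.emitted.append(("proximity-in", now_ms))
        self.last_event_ms = now_ms
        self.emitted.append(("event", now_ms))


tablet = ForcedProximity()
for timestamp in (0, 8, 16, 200):  # pen hovers, leaves, comes back at 200ms
    tablet.on_tablet_event(timestamp)
```

In this sketch the gap between 16ms and 200ms is longer than the timeout, so a proximity-out is synthesized for the gap and a proximity-in is synthesized just before the event at 200ms.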

One down, many more broken devices to go. Yay.

Posted Thu Sep 21 04:52:00 2017 Tags:

As the 4.13 release has now happened, the merge window for the 4.14 kernel release is now open. I mentioned this many weeks ago, but as the word doesn’t seem to have gotten very far based on various emails I’ve had recently, I figured I need to say it here as well.

So, here it is officially, 4.14 should be the next LTS kernel that I’ll be supporting with stable kernel patch backports for at least two years, unless it really is a horrid release and has major problems. If so, I reserve the right to pick a different kernel, but odds are, given just how well our development cycle has been going, that shouldn’t be a problem (although I guess I just doomed it now…)

As always, if people have questions about this, email me and I will be glad to discuss it, or talk to me in person next week at the LinuxCon^WOpenSourceSummit or Plumbers conference in Los Angeles, or at any of the other conferences I’ll be at this year (ELCE, Kernel Recipes, etc.)

Posted Wed Sep 6 15:20:44 2017 Tags:

A few days ago, I pushed code for button debouncing into libinput, scheduled for libinput 1.9. What is button debouncing you ask? Well, I'm glad you asked, because otherwise typing this blog post would've been a waste of time :)

Over in Utopia, when you press the button on a device, you get a press event from the hardware. When you release said button, you get a release event from the hardware. Together, they form the button click interaction we have come to learn and love over the last couple of decades. Life is generally merry and the sunshine to rainbow to lollipop ratio is good. Meanwhile, over here in the real world, buttons can be quite dodgy, don't always work like they're supposed to, lollipops are unhealthy and boy, have you seen that sunburn the sunshine gave me? One way buttons can misbehave is that they lose contact for a fraction of a second and send release events even though the button is being held down. The device usually detects that the button is still down in the next hardware cycle (~8ms on most devices) and thus sends another button press.

For us, there are not a lot of hints that this is bad hardware besides the timestamps. It's not possible for a user to really release and press a button within 8ms, so we can take this as a signal for dodgy hardware. But at least that's something. All we need to do is ignore the release event (and subsequent button event) and only release when the button is released properly. This requires timeouts and delays of the event, something we generally want to avoid unless absolutely necessary. So the behaviour libinput has now is enabled but inactive button debouncing on all devices. We monitor button release and button press timestamps, but otherwise leave the events as-is, so no delays are introduced. Only if a device sends release/press combos with unfeasibly short timeouts do we activate button debouncing. Once active, we filter all button release events and instead set a timer. Once the timer expires, we send the button release event. But if at any time before then another button press is detected, the scheduled release is discarded, the button press is filtered and no event is sent. Thus, we paper over the release/press combo the hardware gives us and to anyone using libinput, it will look like the button was held down for the whole time.
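As a rough illustration of that state machine, here is a minimal Python sketch. It is not libinput's implementation: the class, the method names and both millisecond constants are assumptions picked for the example, and a real implementation would use a proper timer rather than the explicit tick() call.

```python
from typing import List, Optional, Tuple

# Hypothetical values, chosen only for the example.
BOUNCE_THRESHOLD_MS = 8   # release->press gaps shorter than this are bounces
RELEASE_DELAY_MS = 12     # how long a release is held back once active


class Debouncer:
    def __init__(self) -> None:
        self.active = False                      # enabled but inactive at first
        self.last_release_ms: Optional[int] = None
        self.pending_release_ms: Optional[int] = None  # timer deadline
        self.out: List[Tuple[str, int]] = []     # events passed on to the caller

    def tick(self, now_ms: int) -> None:
        # Timer expiry: the delayed release was real, so send it now.
        if (self.pending_release_ms is not None
                and now_ms >= self.pending_release_ms):
            self.out.append(("release", self.pending_release_ms))
            self.pending_release_ms = None

    def press(self, now_ms: int) -> None:
        self.tick(now_ms)
        if self.active and self.pending_release_ms is not None:
            # Press arrived before the timer expired: discard the scheduled
            # release and filter this press, so the button looks held down.
            self.pending_release_ms = None
            return
        if (not self.active and self.last_release_ms is not None
                and now_ms - self.last_release_ms < BOUNCE_THRESHOLD_MS):
            # Unfeasibly quick release/press combo: switch debouncing on.
            self.active = True
        self.out.append(("press", now_ms))

    def release(self, now_ms: int) -> None:
        self.tick(now_ms)
        if self.active:
            self.pending_release_ms = now_ms + RELEASE_DELAY_MS
        else:
            self.last_release_ms = now_ms
            self.out.append(("release", now_ms))


button = Debouncer()
button.press(0)
button.release(100)   # bounce while still inactive: this one leaks through
button.press(104)     # 4ms later, activates debouncing
button.release(200)   # bounce, now held back by the timer
button.press(205)     # cancels the pending release, press is filtered
button.release(400)   # the real release
button.tick(500)      # timer expired, release is finally sent
```

Note that the first bounce (at 100ms) is still passed through, because debouncing only becomes active after it has been observed once.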

There's one downside with this approach - the very first button debounce to happen on a device will still trigger an erroneous button release event. It remains to be seen whether this is a problem in real-world scenarios. That's the cost of having it as an auto-enabling feature rather than an explicit configuration option.

If you do have a mouse that suffers from button bouncing, I recommend you try libinput's master branch and file any issues if the debouncing doesn't work as it should. Might as well get any issues fixed before we have a release.

Posted Thu Jul 27 11:52:00 2017 Tags:
An effort to clean up several messes simultaneously. #rng #forwardsecrecy #urandom #cascade #hmac #rekeying #proofs
Posted Sun Jul 23 13:37:46 2017 Tags:
News regarding the SUPERCOP benchmarking system, and more recommendations to NIST. #benchmarking #supercop #nist #pqcrypto
Posted Wed Jul 19 18:15:13 2017 Tags:

An interesting math question is: for a given number of dimensions, how many different points can be placed in that many dimensions such that the angle formed by every triplet of them is acute? This paper shows a dramatic new result which gives an explicit construction of (2^(1/2))^d points, which is about 1.41^d. I’ve managed to improve this to (6^(1/5))^d, which is about 1.43^d.

Those of you not familiar with it already should read about dot products before proceeding. The important points for this result are: If the dot product is positive the angle is acute. The dot product across multiple dimensions is the sum of the products for each of the dimensions individually, and if one of the vector lengths in a dot product is very small then the dot product is very small as well.
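For readers who prefer code to prose, the acuteness condition can be checked by brute force. The sketch below is just a generic checker based on the dot product rule above, with function names I made up for the example; it is not part of the construction itself.

```python
from itertools import permutations
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    # Sum of the per-dimension products.
    return sum(a * b for a, b in zip(u, v))


def all_angles_acute(points: Sequence[Sequence[float]]) -> bool:
    """True if the angle at every vertex of every triplet is acute."""
    for vertex, p, q in permutations(points, 3):
        to_p = [a - b for a, b in zip(p, vertex)]
        to_q = [a - b for a, b in zip(q, vertex)]
        # The angle at `vertex` is acute exactly when this is positive.
        if dot(to_p, to_q) <= 0:
            return False
    return True


acute_triangle = [(0, 0), (2, 0), (1, 2)]
right_triangle = [(0, 0), (1, 0), (0, 1)]
```

The first triangle passes, while the second fails because of the right angle at the origin (the dot product there is exactly zero).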

My result is based on a recurrence which adds five dimensions and multiplies the number of points by six. It’s done like so:

Some explanation for this diagram is in order. The points a, b, c, p, q, and r all correspond to the same old point. The leftmost picture is the first and second new dimensions, the middle one the third and fourth, and the right one the fifth. The value ε1 is very small compared to all values which previously appeared and all normal functions on them. The value ε2 is very small compared to ε1 and this continues to ε5. (Just pretend I drew those as epsilons instead of es. My diagramming skills are limited.)

A more numeric way of defining the values for the new dimensions is as follows:

point  1                        2                        3               4               5
a      cos(θ1) * ε1             sin(θ1) * ε1             cos(θ3) * ε5    sin(θ3) * ε5    ε4
b      cos(θ1) * ε1             sin(θ1) * ε1             -cos(θ3) * ε5   -sin(θ3) * ε5   -ε4
c      cos(θ1) * ε1 * (1-ε3)    sin(θ1) * ε1 * (1-ε3)    cos(θ2) * ε2    sin(θ2) * ε2    0
p      -cos(θ1) * ε1            -sin(θ1) * ε1            cos(θ5) * ε5    sin(θ5) * ε5    ε4
q      -cos(θ1) * ε1            -sin(θ1) * ε1            -cos(θ5) * ε5   -sin(θ5) * ε5   -ε4
r      -cos(θ1) * ε1 * (1-ε3)   -sin(θ1) * ε1 * (1-ε3)   cos(θ4) * ε2    sin(θ4) * ε2    0

Those thetas are all angles. Their exact values don’t matter much, but they need to not be too close to each other compared to ε. Also θ2 and θ4 need to be between 0 and π/2 and not too close to the ends of that range.

Each triplet of points either all correspond to different old points, or two correspond to the same old point and another one to a second, or all three correspond to the same old point. In the case where all three correspond to different old points, their angle will be acute because the positions have only gotten changed by epsilon. The other cases can be straightforwardly enumerated. Because there are only two types of points there are only two cases for the singular external point to consider, labelled x and y in the diagram above. Because the last value of x can be positive or negative there are two positions labelled for it.

To check if each angle is acute, the dot product is calculated and checked to see if it’s positive. Rather than calculate these values exactly, it’s sufficient to check what the smallest epsilon each value is multiplied by and whether it’s positive or possibly negative. If the sum of the different dimensions is a positive larger epsilon (that is, one with a smaller subscript) then the dot product is positive and the angle is acute. Because many of the cases are very similar they can be consolidated down, so multiple endpoints are listed in some of the cases of the chart of all cases below. In cases where a value in one set of dimensions overwhelms all values which come after it, those values are skipped.

vertex end point end point 1 & 2 3 & 4 5
a b x 0 ε5 0
a b y 0 5 ε4
a c x ε3 5
a c y ε3 ε2
a pqr xy ε1
c ab x 3 ε2
c ab y 3 ε2
c pqr xy ε1
a b cr 0 5 ε4
a b pq 0 ε5 0
a c pq ε3 5
a pqr pqr ε1
c abpq bpq 3 2
c abpq r 3 2

Note that the extra restrictions on θ2 and θ4 are important in the CAY case (where A is the vertex).

In case you’re wondering how I came up with this: I can’t visualize in five dimensions any better than you can, but I’m pretty good at visualizing in three, so I worked out a 3 dimensions to 3 points recurrence which almost works. That’s dimensions three through five here. I put one of the recurrences from the previous work for dimensions one and two and moved the c and r points ‘down’ in the first two dimensions to fix the case which was broken, specifically CAX.

I’m quite certain this result can be improved on, although it’s hard for my poor little human brain to work out the more complicated cases. My guess is that it trends towards the absolute maximum of the nth root of n, which is at e (that’s the base of the natural log, not a miswritten epsilon). I conjecture that this is tight, so no matter how small ε is, (e^(1/e) – ε)^d < acute(d) < (e^(1/e) + ε)^d for sufficiently large d.
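The numbers are easy to sanity-check. A recurrence that adds k dimensions and multiplies the number of points by m gives a growth base of m^(1/k) per dimension; the short Python sketch below (the helper name is mine) compares the two recurrences mentioned in this post with the conjectured limit e^(1/e).

```python
import math


def per_dimension_base(point_multiplier: float, added_dimensions: float) -> float:
    """Growth base per dimension for a recurrence of the given shape."""
    return point_multiplier ** (1.0 / added_dimensions)


previous_result = per_dimension_base(2, 2)   # 2 points per 2 dimensions
this_result = per_dimension_base(6, 5)       # 6 points per 5 dimensions
conjectured_limit = per_dimension_base(math.e, math.e)

# n^(1/n) at a few integers, for comparison with the value at e.
nearby = {n: per_dimension_base(n, n) for n in (2, 3, 4)}
```

Rounded, these come out to about 1.414, 1.431 and 1.445 respectively, and the value at e is larger than the value at 2, 3 or 4.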

Posted Mon Jul 17 02:32:42 2017 Tags:

Two years ago, considering the blocksize debate, I made two attempts to measure average bandwidth growth, first using Akamai serving numbers (which gave an answer of 17% per year), and then using fixed-line broadband data from OFCOM UK, which gave an answer of 30% per annum.

We have two years more of data since then, so let’s take another look.

OFCOM (UK) Fixed Broadband Data

First, the OFCOM data:

  • Average download speed in November 2008 was 3.6Mbit/s
  • Average download speed in November 2014 was 22.8Mbit/s
  • Average download speed in November 2016 was 36.2Mbit/s
  • Average upload speed in November 2008 to April 2009 was 0.43Mbit/s
  • Average upload speed in November 2014 was 2.9Mbit/s
  • Average upload speed in November 2016 was 4.3Mbit/s

So in the last two years, we’ve seen a 26% per-annum increase in download speed and a 22% per-annum increase in upload speed, bringing the long-run average down from 36/37% (download/upload) per annum over the first 6 years to 33% per annum over the full 8 years. The divergence of download and upload improvements is concerning (I previously assumed they were the same, but we have to design for the lesser of the two for a peer-to-peer system).
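For anyone who wants to check the arithmetic, the per-annum figures are compound annual growth rates computed from the OFCOM averages listed above. A minimal Python sketch (the function name is mine, and the first upload figure is treated as a November 2008 value even though the report covers November 2008 to April 2009):

```python
def annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two measurements."""
    return (end / start) ** (1.0 / years) - 1.0


# Speeds in Mbit/s, from the OFCOM figures quoted in this post.
download_last_2y = annual_growth(22.8, 36.2, 2)
upload_last_2y = annual_growth(2.9, 4.3, 2)
download_first_6y = annual_growth(3.6, 22.8, 6)
upload_first_6y = annual_growth(0.43, 2.9, 6)
download_8y = annual_growth(3.6, 36.2, 8)
upload_8y = annual_growth(0.43, 4.3, 8)

for label, rate in [
    ("download, 2014-2016", download_last_2y),
    ("upload, 2014-2016", upload_last_2y),
    ("download, 2008-2014", download_first_6y),
    ("upload, 2008-2014", upload_first_6y),
    ("download, 2008-2016", download_8y),
    ("upload, 2008-2016", upload_8y),
]:
    print(f"{label}: {rate * 100:.0f}% per annum")
```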

The idea that upload speed may be topping out is reflected in the Nov-2016 report, which notes only an 8% upload increase in services advertised as “30Mbit” or above.

Akamai’s State Of The Internet Reports

Now let’s look at Akamai’s Q1 2016 report and Q1-2017 report.

  • Annual growth in global average speed, Q1 2015 – Q1 2016: 23%
  • Annual growth in global average speed, Q1 2016 – Q1 2017: 15%

This gives an estimate of 19% per annum in the last two years. Reassuringly, the US and UK (both fairly high-bandwidth countries, considered in my previous post to be a good estimate for the future of other countries) have increased by 26% and 19% in the last two years, indicating there’s no immediate ceiling to bandwidth.
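The 19% figure is the compounded (geometric) average of the two annual growth rates rather than their arithmetic mean, although the two happen to round to the same number here. A small Python sketch, with a helper name of my own choosing:

```python
from typing import Sequence


def average_annual_growth(yearly_rates: Sequence[float]) -> float:
    """Geometric mean of a sequence of year-on-year growth rates."""
    total = 1.0
    for rate in yearly_rates:
        total *= 1.0 + rate
    return total ** (1.0 / len(yearly_rates)) - 1.0


# Akamai global average speed growth for the two years quoted above.
akamai_two_year = average_annual_growth([0.23, 0.15])
print(f"{akamai_two_year * 100:.0f}% per annum")
```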

You can play with the numbers for different geographies on the Akamai site.

Conclusion: 19% Is A Conservative Estimate

17% growth now seems a little pessimistic: over the last 9 years the Akamai numbers suggest the US has increased by 19% per annum, and the UK by almost 21%.  The gloss seems to be coming off the UK fixed-broadband numbers, but they still show a 22% per-annum upload increase over the last two years.  Even Australia and the Philippines have managed almost 21%.

Posted Thu Jul 6 10:01:47 2017 Tags:
Figure from the article. CC-BY.
One of the projects I worked on at Karolinska Institutet with Prof. Grafström was the idea of combining transcriptomics data with dose-response data, because we wanted to know if there was a relation between the structures of chemicals (drugs, toxicants, etc) and how biological systems react to them. Basically, testing the whole idea behind quantitative structure-activity relationship (QSAR) modeling.

Using data from the Connectivity Map (Cmap, doi:10.1126/science.1132939) and NCI60, we set out to do just that. My role in this work was to explore the actual structure-activity relationship. The Chemistry Development Kit (doi:10.1186/s13321-017-0220-4) was used to calculate molecular descriptors, and we used various machine learning approaches to explore possible regression models. The bottom line was that it was not possible to correlate the chemical structures with the biological activities. We explored the reason and ascribe this to the high diversity of the chemical structures in the Cmap data set. In fact, they selected the chemicals in that study based on chemical diversity. All the details can be found in this new paper.

It's important to note that these findings do not invalidate the QSAR concept, but just show that they very unfortunately selected their compounds, making exploration of this idea impossible, by design.

However, using the transcriptomics data and a method developed by Juuso Parkkinen, it is possible to find multivariate patterns. In fact, what we saw is more than is presented in this paper, as we have not yet been able to back up further findings with supporting evidence. This paper, however, presents experimental confirmation that predictions based on this component model, coined the Predictive Toxicogenomics Gene Space, actually make sense. Biological interpretation is presented using a variety of bioinformatics analyses. But a full mechanistic description of the components is yet to be developed. My expectation is that we will be able to link these components to key events in biological responses to exposure to toxicants.

 Kohonen, P., Parkkinen, J. A., Willighagen, E. L., Ceder, R., Wennerberg, K., Kaski, S., Grafström, R. C., Jul. 2017. A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury. Nature Communications 8.
Posted Wed Jul 5 09:31:00 2017 Tags:

I (finally!) merged a patchset to detect palms based on pressure into libinput. This should remove a lot of issues that our users have seen with accidental pointer movement. Palm detection in libinput previously used two approaches: disable-while-typing and an edge-based approach. The former simply ignores touchpad events while keyboard events are detected, the latter ignores touches that happen in the edge zones of the touchpad where real interaction is unlikely. Both approaches have the obvious disadvantages: they're timeout- and location-dependent, causing erroneous pointer movements. But their big advantage is that they work even on old touchpads where a lot of other information is unreliable. Touchpads are getting better, so it's time to make use of that.

The new feature is relatively simple: libinput looks at per-touch pressure and if that pressure hits a given threshold, the touch is regarded as a palm. Once a palm, that touch will be ignored until touch up. The threshold is intended to be high enough that it cannot easily be hit. At least on the touchpads I have available for testing, I have to go through quite some effort to trigger palm detection with my finger.
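In pseudo-real code, the per-touch logic looks roughly like the Python sketch below. The names and the threshold value are invented for the example; libinput itself is written in C and takes its thresholds from device-specific data.

```python
# Hypothetical threshold in device pressure units; real values differ per device.
PALM_PRESSURE_THRESHOLD = 130


class Touch:
    """State for one touch, from touch down to touch up."""

    def __init__(self) -> None:
        self.is_palm = False

    def update(self, pressure: int) -> bool:
        """Process one pressure sample; return True if the touch may be used."""
        if pressure >= PALM_PRESSURE_THRESHOLD:
            # Sticky: once a palm, always a palm until touch up.
            self.is_palm = True
        return not self.is_palm


touch = Touch()
usable = [touch.update(pressure) for pressure in (40, 150, 30)]

# A new touch (after touch up) starts with a clean slate.
next_touch = Touch()
next_usable = next_touch.update(30)
```

The important detail is that the palm flag is sticky: the third sample is back below the threshold, but the touch stays ignored until it ends.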

Pressure on touchpads is unfortunately hardware-dependent and we can expect most laptops to have different pressure thresholds. For our users this means that the feature won't immediately work perfectly; it will require a lot of hwdb entries. libinput now ships a libinput measure touchpad-pressure tool to experiment with the various pressure thresholds. This makes it easy to figure out the right pressure threshold and submit a bug report (or patch) for libinput to get the pressure threshold updated. The documentation for this tool is available as part of libinput's online documentation.

TLDR: if libinput seems to misdetect touches as palms, figure out the right threshold with libinput measure touchpad-pressure and file a bug report so we can merge this into our hwdb.

Posted Tue Jul 4 04:43:00 2017 Tags: