
If you're one of the people in the software freedom community who is attending O'Reilly's Open Source Software Convention (OSCON) next week here in Portland, you may have seen debate about O'Reilly and Associates (ORA)'s surreptitious Code of Conduct change (and quick revocation thereof) to name “political affiliation” as a protected class. If you're going to OSCON or plan to go to an OSCON or ORA event in the future, I suggest that you familiarize yourself with this issue and the political and historical context in which these events of the last few days take place.

First, OSCON has always been political: software freedom is inherently a political struggle for the rights of computer users, so any conference including that topic is necessarily political. Additionally, O'Reilly himself had stated his political positions many times at OSCON, so it's strange that, in his response this morning, O'Reilly admits that he and his staff tried to require via agreements that speakers … refrain from all political speech. OSCON can't possibly be a software freedom community event if ORA's intent … [is] to make sure that conferences put on for the exchange of technical information aren't politicized (as O'Reilly stated today). OTOH, I'm not surprised by this tack, because O'Reilly, in large part via OSCON, often pushes forward political views that O'Reilly likes, and marginalizes those he doesn't.

Second, I must strongly disagree with ORA's new (as of this morning) position that Codes of Conduct should only include “protected classes” that the laws of a particular country currently recognize. Codes of Conduct exist in our community not only as a mechanism to assure the rights of protected classes, but also to assure that everyone feels safe and free of harassment and hate speech. In fact, most Codes of Conduct in our community have “including but not limited to” language alongside any list of protected classes, and IMO all of them should.

More than that, ORA has missed a key opportunity to delineate hate speech and political speech in a manner that is sorely needed here in the USA and in the software freedom community. We live in a political climate where our Politician-in-Chief governs via Twitter and smoothly co-mingles political positioning with statements that would violate the Code of Conduct at most conferences. In other words, in a political climate where the party-ticket-headline candidate is exposed for celebrating his own sexual harassing behavior and gets elected anyway, we are culturally going to have trouble nationwide distinguishing between political speech and hate speech. Furthermore, political manipulators now use that confusion to their own ends, and we must be ever-vigilant in efforts to assure that political speech is free, but that it is delineated from hate speech, and, most importantly, that our policy on the latter is zero-tolerance.

In this climate, I'm disturbed to see that O'Reilly, who is certainly politically savvy enough to fully understand these delineations, is ignoring them completely. The rancor in our current politics — which is not just at the national level but has also trickled down into the software freedom community — is fueled by bad actors who will gladly conflate their own hate speech and political speech, and (in the irony that only post-fact politics can bring), those same people will also accuse the other side of hate speech, primarily by accusing intolerance of the original “political speech” (which of course was, from the start, a mix of hate speech and political speech). (Examples of this abound, but one example that comes to mind is Donald Trump's public back-and-forth with San Juan Mayor Carmen Yulín Cruz.) None of ORA's policy proposals, nor O'Reilly's public response, address this nuance. ORA's detractors are legitimately concerned, because blanketly adding “political affiliation” to a protected class, married with an outright ban on political speech, creates an environment where selective enforcement favors the powerful, and furthermore allows the Code of Conduct to more easily become a political weapon by those who engage in the conflation practice I described.

However, it's no surprise that O'Reilly is taking this tack, either. OSCON (in particular) has a long history — on political issues of software freedom — of promoting (and even facilitating) certain political speech, even while squelching other political speech. Given that history (examples of which I include below), O'Reilly shouldn't be surprised that many in our community are legitimately skeptical about why ORA made these two changes without community discussion, only to quickly backpedal when exposed. I too am left wondering what political game O'Reilly is up to, since I recall well that Morozov documented O'Reilly's track record of political manipulation in his article, The Meme Hustler. I thus encourage everyone who attends ORA events to follow this political game with a careful eye and a good sense of OSCON history to figure out what's really going on. I've been watching for years, and OSCON is often a master class in achieving what Chomsky critically called “manufacturing consent” in politics.

For example, back in 2001, when OSCON was already in its third year, Microsoft executives went on the political attack against copyleft (calling it unAmerican and a “cancer”). O'Reilly, long unfriendly to copyleft himself, personally invited Craig Mundie of Microsoft to have a “Great Debate” keynote at the next OSCON — where Mundie would “debate” with “Open Source leaders” about the value of Open Source. In reality, O'Reilly put on stage lots of Open Source people with Mundie, but among them was no one who supported the strategy of copyleft, the primary component of Microsoft's political attacks. The “debate” was artfully framed to have only one “logical” conclusion: “we all love Open Source — even Microsoft (!) — it's just copyleft that can be problematic and which we should avoid”. It was no debate at all; only carefully crafted messaging that left out much of the picture.

That wasn't an isolated incident; both subtle and overt examples of crafted political messaging at OSCON became annual events after that. As another example, ten years later, O'Reilly ran almost the same playbook again: he invited the GitHub CEO to give a very political and completely anti-copyleft keynote. After years of watching how O'Reilly carefully framed the political issue of copyleft at OSCON, I am definitely concerned about how other political issues might be framed.

And, not all political issues are equal. I follow copyleft politics because it's been my day job for two decades. But, I admit there are stakes even higher with other political topics, and having watched how ORA has handled the politics of copyleft for decades, I'm fearful that ORA is (at best) ill-equipped to handle political issues that can cause real harm — such as the current political climate that permits hate speech, and even racist speech (think of Trump calling Elizabeth Warren “Pocahontas”), as standard political fare. The stakes of contemporary politics now leave people feeling unsafe. Since OSCON is a political event, ORA should face this directly rather than pretending OSCON is merely a series of technical lectures.

The most insidious part of ORA's response to this issue is that, until the issue was called out, it seems that all political speech (particularly that in opposition to the status quo) violated OSCON's policies by default. We've successfully gotten ORA to back down from that position, but not without a fight. My biggest concern is that ORA nearly ran OSCON this year with the problematic combination of banning political speech in the speaker agreement, while treating “political affiliation” as a protected class in the Code of Conduct. Regardless of intent, confusing and unclear rules like that are gamed primarily by bad actors, and O'Reilly knows that. Indeed, just days later, O'Reilly admits that both items were serious errors, yet still asks for voluntary compliance with the “spirit” of those confusing rules.

How could it be that an organization that's been running the same event for two decades only just began to realize that these are complex issues? Paradoxically, I'm both baffled and not surprised that ORA has handled this issue so poorly. They still have no improved solution for the original problem that O'Reilly states they wanted to address (i.e., preventing hate speech). Meanwhile, they've cycled through a series of failed (and alarming) solutions without community input. Would it have really been that hard for them to publicly ask first: “We want to welcome all political views at OSCON, but we also detest hate speech that is sometimes joined with political speech. Does anyone want to join a committee to work on improvements to our policies to address this issue?” I think if they'd handled this issue in that (Open Source) way, the outcome would not have been the fiasco it's become.

Posted Thu Jul 12 09:40:00 2018 Tags:

A common error when building from source is something like the error below:


meson.build:50:0: ERROR: Native dependency 'foo' not found
or a similar version error:

meson.build:63:0: ERROR: Invalid version of dependency, need 'foo' ['>= 1.1.0'] found '1.0.0'.
Seeing that can be quite discouraging, but luckily, in many cases it's not too difficult to fix. As usual, there are many ways to get to a successful result; I'll describe what I consider the simplest.

What does it mean? Dependencies are simply libraries or tools that meson needs to build the project. Usually these are declared like this in meson.build:


dep_foo = dependency('foo', version: '>= 1.1.0')
In human words: "we need the development headers for library foo (or 'libfoo') of version 1.1.0 or later". meson uses the pkg-config tool in the background to resolve that request. If we require package foo, pkg-config searches for a file foo.pc in the following directories:
  • /usr/lib/pkgconfig,
  • /usr/lib64/pkgconfig,
  • /usr/share/pkgconfig,
  • /usr/local/lib/pkgconfig,
  • /usr/local/share/pkgconfig
The error message simply means pkg-config couldn't find the file and you need to install the matching package from your distribution or from source.
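
If you want to double-check what pkg-config itself sees, independently of meson, you can query it directly. The commands below are just a sanity check using the placeholder name foo from above; substitute your real package name and expect the paths and versions to differ on your system:

$> pkg-config --variable pc_path pkg-config
/usr/lib64/pkgconfig:/usr/share/pkgconfig
$> pkg-config --exists foo; echo $?
1
$> pkg-config --modversion foo
1.0.0

The first command prints the directories pkg-config actually searches on your system, the second prints 0 if foo.pc was found and 1 otherwise, and the third prints the version pkg-config resolved once the package is installed.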

An important note here: in most cases, we need the development headers of said library; installing just the library itself is not sufficient. After all, we're trying to build against it, not merely run against it.

What package provides the foo.pc file?

In many cases the package is the development version of the package name. Try foo-devel (Fedora, RHEL, SuSE, ...) or foo-dev (Debian, Ubuntu, ...). yum and dnf provide a great shortcut to install any pkg-config dependency:


$> dnf install "pkgconfig(foo)"

$> yum install "pkgconfig(foo)"
Either command will automatically search for and install the right package, including its dependencies.
apt-get requires a bit more effort:

$> apt-get install apt-file
$> apt-file update
$> apt-file search --package-only foo.pc
foo-dev
$> apt-get install foo-dev
For those running Arch and pacman, the sequence is:

$> pacman -S pkgfile
$> pkgfile -u
$> pkgfile foo.pc
extra/foo
$> pacman -S extra/foo
Once that's done you can re-run meson and see if all dependencies have been met. If more packages are missing, follow the same process for the next file.

Any users of other distributions: let me know how to do this on yours and I'll update the post.

My version is wrong!

It's not uncommon to see the following error after installing the right package:


meson.build:63:0: ERROR: Invalid version of dependency, need 'foo' ['>= 1.1.0'] found '1.0.0'.
Now you're stuck and you have a problem. What this means is that the package version your distribution provides is not new enough to build your software. This is where the simple solutions end and it all gets a bit more complicated - with more potential errors. Unless you are willing to go into the deep end, I recommend moving on and accepting that you can't have the newest bits on an older distribution. Because now you have to build the dependencies from source, which may then require building their dependencies from source, and before you know it you've built 30 packages. If you're willing, read on; otherwise - sorry, you won't be able to run your software today.

Manually installing dependencies

Now you're in the deep end, so be aware that you may see more complicated errors in the process. First of all you need to figure out where to get the source from. I'll now use cairo as example instead of foo so you see actual data. On rpm-based distributions like Fedora run dnf or yum:


$> dnf info cairo-devel # or yum info cairo-devel
Loaded plugins: auto-update-debuginfo, langpacks
Installed Packages
Name : cairo-devel
Arch : x86_64
Version : 1.13.1
Release : 0.1.git337ab1f.fc20
Size : 2.4 M
Repo : installed
From repo : fedora
Summary : Development files for cairo
URL : http://cairographics.org
License : LGPLv2 or MPLv1.1
Description : Cairo is a 2D graphics library designed to provide high-quality
: display and print output.
:
: This package contains libraries, header files and developer
: documentation needed for developing software which uses the cairo
: graphics library.
The important field here is the URL line - go to that URL and you'll find the source tarballs. That should be true for most projects but you may need to google for the package name and hope. Search for the tarball with the right version number and download it. On Debian and related distributions, cairo is provided by the libcairo2-dev package. Run apt-cache show on that package:

$> apt-cache show libcairo2-dev
Package: libcairo2-dev
Source: cairo
Version: 1.12.2-3
Installed-Size: 2766
Maintainer: Dave Beckett
Architecture: amd64
Provides: libcairo-dev
Depends: libcairo2 (= 1.12.2-3), libcairo-gobject2 (= 1.12.2-3),[...]
Suggests: libcairo2-doc
Description-en: Development files for the Cairo 2D graphics library
Cairo is a multi-platform library providing anti-aliased
vector-based rendering for multiple target backends.
.
This package contains the development libraries, header files needed by
programs that want to compile with Cairo.
Homepage: http://cairographics.org/
Description-md5: 07fe86d11452aa2efc887db335b46f58
Tag: devel::library, role::devel-lib, uitoolkit::gtk
Section: libdevel
Priority: optional
Filename: pool/main/c/cairo/libcairo2-dev_1.12.2-3_amd64.deb
Size: 1160286
MD5sum: e29852ae8e8e5510b00b13dbc201ce66
SHA1: 2ed3534d02c01b8d10b13748c3a02820d10962cf
SHA256: a6099cfbcc6bd891e347dd9abc57b7f137e0fd619deaff39606fd58f0cc60d27
In this case it's the Homepage line that matters, but the process of downloading tarballs is the same as above. For Arch users, the interesting line is URL as well:

$> pacman -Si cairo | grep URL
Repository : extra
Name : cairo
Version : 1.12.16-1
Description : Cairo vector graphics library
Architecture : x86_64
URL : http://cairographics.org/
Licenses : LGPL MPL
....

Now to the complicated bit: In most cases, you shouldn't install the new version over the system version because you may break other things. You're better off installing the dependency into a custom folder ("prefix") and pointing pkg-config to it. So let's say you downloaded the cairo tarball, now you need to run:


$> mkdir $HOME/dependencies/
$> tar xf cairo-someversion.tar.xz
$> cd cairo-someversion
$> autoreconf -ivf
$> ./configure --prefix=$HOME/dependencies
$> make && make install
$> export PKG_CONFIG_PATH=$HOME/dependencies/lib/pkgconfig:$HOME/dependencies/share/pkgconfig
# now go back to original project and run meson again
So you create a directory called dependencies and install cairo there. This will install cairo.pc as $HOME/dependencies/lib/pkgconfig/cairo.pc. Now all you need to do is tell pkg-config that you want it to look there as well - so you set PKG_CONFIG_PATH. If you re-run meson in the original project, pkg-config will find the new version and meson should succeed. If you have multiple packages that all require a newer version, install them into the same path and you only need to set PKG_CONFIG_PATH once. Remember you need to set PKG_CONFIG_PATH in the same shell you run meson (or configure) from for the original project.

In the case of dependencies that use meson, you replace autotools and make with meson and ninja:


$> mkdir $HOME/dependencies/
$> tar xf foo-someversion.tar.xz
$> cd foo-someversion
$> meson builddir -Dprefix=$HOME/dependencies
$> ninja -C builddir install
$> export PKG_CONFIG_PATH=$HOME/dependencies/lib/pkgconfig:$HOME/dependencies/share/pkgconfig
# now go back to original project and run meson again

If you keep seeing the version error, the most common problem is that PKG_CONFIG_PATH isn't set in your shell, or doesn't point to the new cairo.pc file. A simple way to check is:


$> pkg-config --modversion cairo
1.13.1
Is the version number the one you installed or the system one? If it is the system one, you have a typo in PKG_CONFIG_PATH; just re-set it. If it still doesn't work, do this:

$> cat $HOME/dependencies/lib/pkgconfig/cairo.pc
prefix=/usr
exec_prefix=/usr
libdir=/usr/lib64
includedir=/usr/include

Name: cairo
Description: Multi-platform 2D graphics library
Version: 1.13.1

Requires.private: gobject-2.0 glib-2.0 >= 2.14 [...]
Libs: -L${libdir} -lcairo
Libs.private: -lz -lz -lGL
Cflags: -I${includedir}/cairo
If the Version field matches what pkg-config returns, then you're set. If not, keep adjusting PKG_CONFIG_PATH until it works. There is a rare case where the Version field in the installed library doesn't match what the tarball said. That's a defective tarball and you should report this to the project, but don't worry, this hardly ever happens. In almost all cases, the cause is simply PKG_CONFIG_PATH not being set correctly. Keep trying :)

Let's assume you've managed to build the dependencies and want to run the newly built project. The only problem is: because you built against a newer library than the one on your system, you need to point it to use the new libraries.


$> export LD_LIBRARY_PATH=$HOME/dependencies/lib
and now you can, in the same shell, run your project.
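
For a meson-built project that could look roughly like this; builddir and yourprogram are placeholders for your actual build directory and executable name:

$> ninja -C builddir
$> ./builddir/yourprogram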

Good luck!

Posted Sun Jul 8 23:46:00 2018 Tags:

This post is part of a series: Part 1, Part 2, Part 3, Part 4, Part 5.

In this post I'll describe the X server pointer acceleration for trackpoints. You will need to read Observations on trackpoint input data first to make sense of this post.

As described in that linked post, trackpoint input data varies wildly. Combined with all the options the server offers to configure everything, this makes this post a bit pointless, as almost every single behaviour can be changed.

The linked post also describes the three subjective pressure ranges: no real physical pressure, some physical pressure, and serious pressure. The line between the first two ranges is roughly where the trackpoint sends deltas at the maximum reporting rate (100Hz) but with a value of 1. Below that pressure, the intervals increase but the delta remains at 1. Above that pressure, the interval remains constant at 10ms but the deltas increase. I've used the default kernel trackpoint sensitivity of 128 for any data listed here. Here is the visualisation of how deltas and intervals change again.

The default pointer acceleration profile in the X server is the simple profile. We know this from the earlier posts: it has a double-plateau shape. On a trackpoint, mm/s doesn't make sense, so let's look at it in units/ms instead. A unit is simply a device-specific measurement of distance/pressure/tilt/whatever - it all depends on the device. On trackpoints that is (mostly) sideways pressure or tilt. On mice and touchpads we can convert units to mm based on their resolution. On trackpoints, we don't have a physical reference and we thus have to deal with it in units. The obvious problem here is that 1 unit on one device does not equal 1 unit on another device. And for configurable trackpoints, the definition of a unit changes as the sensitivity changes. And that's after the kernel already mangles it (if it does; it doesn't for all devices). So here's a box of asterisks, please sprinkle it liberally.

The smallest delta the kernel can send is 1. At a hardware report rate of 100Hz, continuous pressure to the smallest detected threshold thus generates 1 unit every 10 milliseconds or 0.1 units/ms. If I push uncomfortably hard, I can get deltas of around 10 units every 10ms or 1 unit/ms. In other words, we better zoom in here. Let's look at the meaningful range of this curve.

On my trackpoint, below 0.1 units/ms means virtually no pressure (pressure range one). Pressure range two is 0.1 to 0.4, approximately. Beyond that is pressure range three but that is also the range that becomes pointless quickly - I simply wouldn't want to press this hard in normal operation. 1 unit per ms (10 units per report) is very high pressure. This means the pointer acceleration curve is actually defined for the usable range with only outliers hitting the maximum acceleration. For mice this curve was effectively a constant acceleration for all but slow movements (see here). However, any configuration can change this curve to a point where none of the above applies.

Back to the minimum constant movement of 0.1 units/ms. That one effectively matches the start of the 'no accel' plateau. Anything below that will be decelerated, i.e. a delta of 1 unit will result in a pointer delta of less than 1 pixel. In other words, anything up to where you have to apply real pressure is decelerated.

The constant factor plateau goes all the way to 0.4 units/ms. Then there's the buggy jump to a factor of ~1.5, followed by a smooth curve to 0.8 units/ms where the factor maxes out. A bit of testing here suggests that 0.4 units/ms is in the upper limits of the second pressure range mentioned above. Going past 0.6 or 0.7 is definitely well within the third pressure range where things get uncomfortable quickly. This means that the acceleration bug is actually sitting right in the highest interesting range. Apparently no-one has noticed for 10 years.

But what does it matter? Well, probably not even that much. The only interesting bit I can see here is that we have deceleration for most low-pressure movements and a constant acceleration of 1 for most realistic movements. I very much doubt that the range above 0.4 really matters.

But hey, this is just the default configuration. It is affected when someone changes the speed slider in GNOME, or when someone changes the sensitivity at the sysfs level. Other trackpoints won't have the exact same behaviour. Any analysis is thrown out of the window as soon as someone changes the sysfs sensitivity or increases the acceleration threshold.

Let's talk sysfs - if we increase my trackpoint sensitivity to 200, the deltas coming from the trackpoint change. First, the pressure required to give me a constant stream of events often gives me deltas of size 2 or 3. So we're half-way into the no acceleration plateau here. Higher pressures easily give me deltas of size 10 or 1 unit per ms, the edge of the image above.
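
For reference, that sysfs knob is the sensitivity attribute of the trackpoint's serio device. The path below is what it looks like on my ThinkPad; on other machines the serio numbering will differ, so treat this as a sketch only:

$> cat /sys/devices/platform/i8042/serio1/serio2/sensitivity
128
$> echo 200 | sudo tee /sys/devices/platform/i8042/serio1/serio2/sensitivity
200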

I wish I could analyse this any further but realistically, the only takeaway here is that any change in configuration options results in some version of trial-and-error by the user until the trackpoint moves as they want to. But without knowing all those options, we just cannot know what exactly is happening.

However, what this is useful for is comparing it to libinput. libinput got a custom trackpoint acceleration function in 1.8, designed around the hardware delta range. The idea was that you (or someone) measures the trackpoint device's range once; if it's outside of the assumed default ranges, we add a hwdb entry and voila, it scales back to the right ranges and that device is fixed for good.

Except - this doesn't work. libinput scales into the delta range and calculates the factor from that but it doesn't take the time stamps into account. It works on the assumption that trackpoint deltas arrive at a constant frequency with a varying delta. That is simply not the case and the dynamic range of the trackpoint is so small that any acceleration of the deltas results in jerky movement.

This is of course fixable, we can just convert the deltas into a speed and then apply the acceleration curve based on that. So that's the next task, if you're interested in that, subscribe yourself to this issue.

Posted Tue Jun 26 23:15:00 2018 Tags:

My friend, colleague, and boss, Karen Sandler, yesterday tweeted about one of the unfortunately sexist incidents that she's faced in her life. This incident is a culmination of sexist incidents that Karen and I have seen since we started working together. I describe below how these events entice me to be complicit in sexist incidents, which I do my best to actively resist.

Ultimately, this isn't about me, Karen, or about a single situation, but this is a great example of how sexist behaviors manipulate a situation and put successful women leaders in no-win situations. If you read this tweet (and additionally already knew about Software Freedom Conservancy where I work)…

“#EveryDaySexism I'm Exec Director of a charity.  A senior tech exec is making his company's annual donation conditional on his speaking privately to a man who reports to me. I hope shining light on these situations erodes their power to build no-win situations for women leaders.” — Karen Sandler

… you've already guessed that I'm the male employee that this executive meant. When I examine the situation, I can't think of a single reason this donor could want to speak to me that would not be more productive if he instead spoke with Karen. Yet, the executive, who was previously well briefed on the role changes at Conservancy, repeatedly insisted that the donation was gated on a conversation with me.

Those who follow my and Karen's work know that I was Conservancy's first Executive Director. Now, I have a lower-ranking role since Karen came to Conservancy.

Back in 2014, Karen and I collaboratively talked about what role would make sense for her and me — and we made a choice together. We briefly considered a co-Executive Director situation, but that arrangement has been tried elsewhere and is typically not successful in the long term. Karen is much better than me at the key jobs of a successful Executive Director. Karen and I agreed she was better for the job than me. We took it to Conservancy's Board of Directors, and they moved my leadership role at Conservancy to be honorary, and we named Karen the sole Executive Director. Yes, I'm still nebulously a leader in the Free Software community (which I'm of course glad about). But for Conservancy matters, and specifically donor relations and major decisions about the organization, Karen is in charge.

Karen is an impressive leader and there is no one else that I'd want to follow in my software freedom activism work. She's the best Executive Director that Conservancy could possibly have — by far. Everyone in the community who works with us regularly knows this. Yet ever since Karen was named our Executive Director, she faces everyday sexist behavior, including people who seek to conscript me into participation in institutional sexism. As outlined above, I was initially Executive Director of Conservancy, and I was treated very differently than she is treated in similar situations, even though the organization has grown significantly under her leadership. More on that below, but first a few of the other everyday examples of sexism I've witnessed with Karen:

Many times when we're at conferences together, men who meet us assume that Karen works for me until we explain our roles. This happens almost every time both Karen and I are at the same conference, which is at least a few times each year.

Another time: a journalist wrote an article about some of “Bradley's work” at Conservancy. We pointed out to the journalist how strange it was that Karen was not mentioned in the article, and that it made it sound like I was the only person doing this work at our organization. He initially responded that because I was the “primary spokesperson”, it was natural to credit me and not her. Karen in fact had been more recently giving multiple keynotes on the topic, and had more speaking engagements than I did in that year. One of those keynotes was just weeks before the article, and it had been months since I'd given a talk or made any public statements. Fortunately, the journalist was willing to engage and discuss the importance of the issue (which was excellent), and the journalist even did agree it was a mistake, but nevertheless couldn't rewrite the article.

Another time: we were leaked (reliable) information about a closed-door meeting where some industry leaders were discussing Conservancy and its work. The person who leaked us the information told us that multiple participants kept talking only about me, not Karen's work. When someone in the meeting said wait, isn't Karen Sandler the Executive Director?, our source (who was giving us a real-time report over IRC) reported that the (male) meeting coordinator literally said: Oh sure, Karen works there, but Bradley is their guiding light. Karen had been Executive Director for years at that point.

I consistently say in talks, and in public conversations, that Karen is my boss. I literally use the word “boss”, so there is no confusion nor ambiguity. I did it this week at a talk. But instead of taking that as the fact that it is, many people make comments like well, Karen's not really your boss, right; that's just a thing you say?. So, I'm saying unequivocally here (surely not for the last time): I report to Karen at Conservancy. She is in charge of Conservancy. She has the authority to fire me. (I hope she won't, of course :). She takes views and opinions of our entire staff seriously but she sets the agenda and makes the decisions about what work we do and how we do it. (It shows how bad sexism is in our culture that Karen and I often have to explain in intricate detail what it means for someone to be an Executive Director of an organization.)

Interestingly but disturbingly, the actors here are not typically people who are actually sexist. They are rarely doing these actions consciously. Rather these incidents teach how institutional sexism operates in practice. Every time I'm approached (which is often) with some subtle situation where it makes Karen look like she's not really in charge, I'm given the opportunity to pump myself up, make myself look more important, and gain more credibility and power. It is clear to me that this comes at the expense of subtly denigrating Karen and that the enticement is part of an institutionally sexist zero-sum game.

These situations are no-win. I know that in the recent situation, the donation would be assured if I'd just agreed to a call right away without Karen's involvement. I didn't do it, because that approach would make me inherently complicit in institutional sexism. But, avoiding becoming “part of the problem” requires constant vigilance.

These situations are sadly very common, particularly for women who are banging cracks into the glass ceiling. For my part, I'm glad to help where I can by telling my side of the story, because I think it's essential for men to assist and corroborate the fight against sexism in our industry without mansplaining or white-knighting. I hope other men in technology will join me and refuse to participate in and support behavior that seeks to erode women's well-earned power in our community. When you are told that a woman is in charge of a free software project, that a woman is the executive director of the organization, or that a woman is the chair of the board, take the fact at face value, treat that person as the one who is in charge of that endeavor, and don't (inadvertently or explicitly) undermine her authority.

Posted Thu Jun 21 18:40:00 2018 Tags:

This post does not describe a configuration system. If that's all you care about, read this post here and go be angry at someone else. Anyway, with that out of the way let's get started.

For a long time, libinput has supported model quirks (first added in Apr 2015). These model quirks are bitflags applied to some devices so we can enable special behaviours in the code. Model flags can be very specific ("this is a Lenovo x230 Touchpad") or generic ("This is a trackball") and it just depends on what the specific behaviour is that we need. The x230 touchpad for example has a custom pointer acceleration but trackballs are marked so they get some config options mice don't have/need.

In addition to model tags we also have custom attributes. These are free-form and provide information that we cannot get from the kernel. These too can be specific ("this model needs a pressure threshold of N") or generic ("bluetooth keyboards are external keyboards").

Overall, it's a good system. Most users never have to care that we even have this. The whole point is that any device-specific quirks need to be merged only once for each model, then everyone with the same device gets to benefit on the next update.

Originally quirks were hardcoded but this required rebuilding libinput for any changes. So we moved this to utilise the udev hwdb. For the trivial work of fetching udev properties we got a lot of flexibility in how we can match against devices. For example, an entry may look like this:


libinput:name:*AlpsPS/2 ALPS GlidePoint:dmi:*svnDellInc.:pnLatitudeE6220:*
LIBINPUT_ATTR_PRESSURE_RANGE=100:90
The above uses a name match and the dmi modalias match to apply a property for the touchpad on the Dell Latitude E6220. The exact match format is defined by a bunch of udev rules that ship as part of libinput.

Using the udev hwdb made the quirk storage a plaintext file that can be updated independently of libinput, including local overrides for testing things before merging them upstream. Having said that, it's definitely not public API and can change even between stable branch updates as properties are renamed or rescoped to fit the behaviour more accurately. For example, a model-specific tag may be renamed to a behaviour-specific tag as we find more devices affected by the same issue.

The main issue with the quirks now is that we keep accumulating more and more of them and I'm starting to hit limits with the udev hwdb match behaviour. The hwdb is great for single matches but not so great for cascading matches where one match may overwrite another match. The hwdb match system is largely implementation-defined so it's not always predictable which match rule wins out in the end.

Second, debugging the udev hwdb is not at all trivial. It's a bit like git - once you're used to it it's just fine but until then the air turns yellow with all the swearing being excreted by the unsuspecting user.

So long story short, libinput 1.12 will replace the hwdb model quirks database with a set of .ini files. The model quirks will be installed in /usr/share/libinput/ or whatever prefix your distribution prefers instead. It's a bunch of files with fairly simplistic instructions, each [section] has a set of MatchFoo=Bar directives and the ModelFoo=bar or AttrFoo=bar tags. See this file for an example. If all MatchFoo directives apply to a device, the Model and Attr tags are applied. Matching works in inter- and intra-file sequential order so the last section in a file overrides the first section of that file and the highest-sorting file overrides the lowest-sorting file. Otherwise the tags are accumulated, so if two files match on the same device with different tags, both tags are applied. So far, so unexciting.

Sometimes it's necessary to install a temporary local quirk until upstream libinput is updated or the distribution updates its package. For this, the /etc/libinput/local-overrides.quirks file is read in as well (if it exists). Note though that the config files are considered internal API, so any local overrides may stop working on the next libinput update. Should've upstreamed that quirk, eh?
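
Purely as an illustration (the section name and values here are made up, and you should check the libinput documentation for the authoritative list of Match, Model and Attr directive names), a local override could look something like this:

$> cat /etc/libinput/local-overrides.quirks
[Never Ending Touchpad Pressure Bug]
MatchUdevType=touchpad
MatchName=*SynPS/2 Synaptics TouchPad*
AttrPressureRange=10:8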

These files give us the same functionality as the hwdb - we can drop in extra files without recompiling. They're more human-readable than a hwdb match and it's a lot easier to add extra match conditions to it. And we can extend the file format at will. But the biggest advantage is that we can quite easily write debugging tools to figure out why something works or doesn't work. The libinput list-quirks tool shows what tags apply to a device and using the --verbose flag shows you all the files and sections and how they apply or don't apply to your device.
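
For example, on a device that has a pressure range quirk applied, the output might look roughly like this; the event node is a placeholder for whatever your device shows up as:

$> libinput list-quirks /dev/input/event19
AttrPressureRange=100:90
$> libinput list-quirks --verbose /dev/input/event19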

As usual, the libinput documentation has details.

Posted Wed Jun 13 05:50:00 2018 Tags:

This time we talk trackpoints. Or pointing sticks, or whatever else you want to call that thing between the GHB keys. If you don't have one and you've never seen one, prepare to be amazed. [1]

Trackpoints are tiny joysticks that react to pressure [2], convert that pressure into relative x/y events and pass that on to whoever is interested in it. The harder you push, the higher the deltas. This is where the simple and obvious stops and it gets difficult. But then again, if it was that easy I wouldn't write this post, you wouldn't have anything to read, so somehow everyone wins. Whoop-dee-doo.

All the data and measurements below refer to my trackpoint, a Lenovo T440s. It may not apply to any other trackpoints, including those on different laptop models or even on the same laptop model with different firmware versions. I've written the below with a lot of cringing and handwringing. I want to write data that is irrefutable, but the universe is against me and what the universe wants, the universe gets. Approximately every second sentence below has a footnote of "actual results may vary". Feel free to re-create the data on your device though.

Measuring trackpoint range is highly subjective, so you'll have to trust me when I describe how specific speeds/pressure ranges feel. There are three ranges of pressure on my trackpoint (sort-of):

  • Pressure range one: When resting the finger on the trackpoint I don't really need to apply noticeable pressure to make the trackpoint send events. Just moving the finger on the trackpoint makes it send events, albeit sporadically.
  • Pressure range two: Going beyond range one requires applying real pressure and feels to me like we're getting into RSI territory. Not a problem for short periods, but definitely not something I'd want all the time. It's the pressure I'd use to cross the screen.
  • Pressure range three: I have to push hard. I definitely wouldn't want to do this during everyday interaction and it just feels wrong anyway. This pressure range is for testing maximum deltas, not one you would want to use otherwise.
The boundary between the first and second ranges is easier to delineate than the one between the second and third, because going from almost no pressure to some real pressure is easy. Going from some pressure to too much pressure is more blurry; there is some overlap between the second and third range. Either way, keep these ranges in mind as I'll be using them in the explanations below.

Ok, so with the physical conditions explained, let's look at what we have to worry about in software:

  • It is impossible to provide a constant input to a trackpoint if you're a puny human. Without a robotic setup you just cannot apply constant pressure so any measurements have some error. You also get to enjoy a feedback loop - pressure influences pointer motion but that pointer motion influences how much pressure you inadvertently apply. This makes any comparison filled with errors. I don't know if I'm applying the same pressure on the two devices I'm testing, I don't know if a user I'm asking to test something uses constant/the same/the right pressure.
  • Not all trackpoints are created equal. Some trackpoints (mostly in Lenovos) have configurable sensitivity - 256 levels of it. [3] So one trackpoint measured does not equal another trackpoint unless you keep track of the firmware-set sensitivity. Those trackpoints also have other toggles. More importantly and AFAIK, this type of trackpoint also has a built-in acceleration curve. [4] Other trackpoints (ALPS) just have a fixed sensitivity; I have no idea whether those have a built-in acceleration curve or merely a linear-ish pressure->delta mapping.

    Due to some design choices we made years ago, systemd increases the sensitivity on some devices (the POINTINGSTICK_SENSITIVITY property). So even on a vanilla install, you can't actually rely on the trackpoint being set to the manufacturer default. This was an attempt to make trackpoints behave more consistently: systemd had the hwdb and it seemed like the right place to put device-specific quirks. In hindsight, it was the wrong design choice.
  • Deltas are ... unreliable. At high sensitivity and high pressures you might get a sequence of [7, 7, 14, 8, 3, 7]. At lower pressure you get the deltas at seemingly random intervals. This could be because it's hard to keep exact constant pressure, or it could be a hardware issue.
  • evdev has been the default driver for almost a decade and before that it was the mouse driver for a long time. So the kernel will "Divide 4 since trackpoint's speed is too fast" [sic] for some trackpoints. Or by 8. Or not at all. In other words, the kernel adjusts for what the default user space is and userspace is based on what the kernel provides. On the newest ALPS trackpoints the kernel has stopped doing any in-kernel scaling (good!) but that means that the deltas are out by a factor of 8 now.
  • Trackpoints don't always have the same pressure ranges for x/y. AFAICT the y range is usually a bit less than the x range on many or most trackpoints. A bit weird because the finger position would suggest that strong vertical pressure is easier to apply than sideways pressure.
  • (Some? All?) Trackpoints have built-in calibration procedures to find and set their own center-point. Without that you'll get the trackpoint eventually being ever so slightly off center over time, causing a mouse pointer that just wanders off the screen, possibly into the woods, without the obligatory red cape and basket full of whatever grandma eats when she's sick.

    So the calibration is required but can be triggered accidentally by the user: If you push with the same pressure into the same direction for 2-5 seconds (depending on $THINGS) you trigger the calibration procedure and the current position becomes the new center point. When you release, the cursor wanders off for a few seconds until the calibration sets things straight again. If you ever see the cursor buzz off in a fixed direction or walking backwards for a centimetre or two you've triggered that calibration. The only way to avoid this is to make sure the pointer acceleration mechanism allows you to reach any target within 2 seconds and/or never forces you to apply constant pressure for more than 2 seconds. Now there's a challenge...

Ok. If you've been paying attention instead of hoping for a TLDR that's more elusive than Godot, we're now aware of the various drawbacks of collecting data from a trackpoint. Let's go and look at data. Sensitivity is set to the kernel default of 128 in sysfs, the default reporting rate is 100Hz. All observations are YMMV and whatnot, especially the latter.

Trackpoint deltas are in integers but the dynamic range of delta values is tiny. You mostly get 1 or 2 and it requires quite a fair bit of pressure to get up to 5 or more. At low pressure you get deltas of 1, but less frequently. Visualised, the relationship between deltas and the interval between deltas is like this:

At low pressure, we get deltas of 1 but high intervals. As the pressure increases, the interval between events shrinks until at some point the interval between events matches the reporting rate (100Hz/10ms). Increasing the pressure further now increases the deltas while the intervals remain at the reporting rate. For example, here's an event sequence at low pressure:

E: 63796.187226 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +20ms
E: 63796.227912 0002 0001 0001 # EV_REL / REL_Y 1
E: 63796.227912 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +40ms
E: 63796.277549 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.277549 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +50ms
E: 63796.436793 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.436793 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +159ms
E: 63796.546114 0002 0001 0001 # EV_REL / REL_Y 1
E: 63796.546114 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +110ms
E: 63796.606765 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.606765 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +60ms
E: 63796.786510 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.786510 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +180ms
E: 63796.885943 0002 0001 0001 # EV_REL / REL_Y 1
E: 63796.885943 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +99ms
E: 63796.956703 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.956703 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +71ms
This was me pressing lightly but with perceived constant pressure, and the time stamps between events go from 20ms to 180ms. Remember what I said above about unreliable deltas? Yeah, that.

Here's an event sequence from a trackpoint at a pressure that triggers almost constant reporting:


E: 72743.926045 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.926045 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.926045 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 72743.939414 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.939414 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.939414 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +13ms
E: 72743.949159 0002 0000 -002 # EV_REL / REL_X -2
E: 72743.949159 0002 0001 -002 # EV_REL / REL_Y -2
E: 72743.949159 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 72743.956340 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.956340 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.956340 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +7ms
E: 72743.978602 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.978602 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.978602 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +22ms
E: 72743.989368 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.989368 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.989368 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +11ms
E: 72743.999342 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.999342 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.999342 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 72744.009154 0002 0000 -001 # EV_REL / REL_X -1
E: 72744.009154 0002 0001 -001 # EV_REL / REL_Y -1
E: 72744.009154 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 72744.018965 0002 0000 -002 # EV_REL / REL_X -2
E: 72744.018965 0002 0001 -003 # EV_REL / REL_Y -3
E: 72744.018965 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +9ms
Note how there is an event in there with a 22ms interval? Maintaining constant pressure is hard. You can re-create the above recordings by running evemu-record.
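
For example (the event node is a placeholder; use whichever node your trackpoint shows up as, or run evemu-record without arguments to pick from a list):

$> sudo evemu-record /dev/input/event17 > trackpoint.events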

Pressing hard I get deltas up to maybe 5. That's staying within the second pressure range outlined above, I can force higher deltas but what's the point. So the dynamic range for deltas alone is terrible - we have a grand total of 5 values across the comfortable range.

Changing the sensitivity setting higher than the default will send higher deltas, including deltas greater than 1 before reaching the report rate. Setting it to lower than the default (does anyone do that?) sends smaller deltas. But doing so means changing the hardware properties, similar to how some gaming mice can switch dpi on the fly.

I leave you with a fun thought exercise in correlation vs. causation: your trackpoint uses PS/2, your touchpad probably uses PS/2. Your trackpoint has a reporting rate of 100Hz but when you touch the touchpad half the bandwidth is used by the touchpad. So your trackpoint sends half the events when you have the palm resting on the touchpad. From my observations, the deltas don't double in size. In other words, your trackpoint just slows down to roughly half the speed. I can reduce the reporting rate to approximately a third by putting two or more fingers onto the touchpad. Trackpoints haven't changed that much over the years but touchpads have. So the takeaway is: 10 years ago touchpads were smaller and trackpoints were faster. Simply because you could use them without touching the touchpad. Mind blown (if true, measuring these things is hard...)

Well, that was fun, wasn't it. I'm glad you stayed that long, because I did and it'd feel lonely otherwise. In the next post I'll outline the pointer acceleration curves for trackpoints and what we're going to do about that. Besides despairing, that is.

[1] I doubt you will be, but it always pays to be prepared.
[2] In this post I'm using "pressure" here as side-ways pressure, not downwards pressure. Some trackpoints can handle downwards pressure and modify the acceleration based on it (or expect userland to do so).
[3] Not that this number is always correct; the Lenovo CompactKeyboard USB with Trackpoint has a default sensitivity of 5 - any laptop trackpoint would be unusable at that low value (their default is 128).
[4] I honestly don't know this for sure but ages ago I found a hw spec document that actually detailed the process. Search for "TrackPoint System Version 4.0 Engineering Specification", page 43, "2.6.2 DIGITAL TRANSFER FUNCTION".
Posted Thu Jun 7 06:03:00 2018 Tags:

Thanks to Daniel Stone's efforts, libinput is now on gitlab. For a longer explanation on the move from the old freedesktop infrastructure (cgit, bugzilla, etc.) to the gitlab instance hosted by freedesktop.org, see this email.

All open bugs have been migrated from bugzilla to gitlab too, the documentation has been updated accordingly, and we're ready to go. The new base URL for libinput in gitlab is: https://gitlab.freedesktop.org/libinput/.

Posted Wed Jun 6 01:35:00 2018 Tags:

libinput 1.11 is just around the corner and among the new features added are the libinput-record and libinput-replay tools. These are largely independent of libinput itself (libinput-replay is a Python script) and replace the evemu-record and evemu-replay tools. The functionality is roughly the same, with a few handy new features. Note that these are debugging tools; if you're "just" a user, you may never have to use either tool. But for any bug report, expect me to ask for a libinput-record output, same as I currently ask everyone for an evemu recording.

So what does libinput-record do? Simple - it opens an fd to a kernel device node and reads events from it. These events are converted to YAML and printed to stdout (or the provided output file). The output is a combination of machine-readable information and human-readable comments. Included in the output are the various capabilities of the device but also some limited system information like the kernel version and the dmi modalias. The YAML file can be passed to libinput-replay, allowing me to re-create the event device on my test machines and hopefully reproduce the bug. That's about it. evemu did exactly the same thing and it has done wonders for how efficiently we could reproduce and fix bugs.
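
A typical session looks roughly like this - the event node and file name are placeholders, and you'll usually need root to read the device node:

$> sudo libinput-record /dev/input/event7 > touchpad-bug.yml
# copy the YAML file to the test machine, then:
$> sudo libinput-replay touchpad-bug.yml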

Alas, evemu isn't perfect. It's becoming 8 years old now and its API is a bit crufty. Originally, two separate tools generated two separate files (machine-readable only), and two further tools were needed to create the device and replay the events. Over the years it got more useful. Now we only have one tool each to record or replay events and the file includes human-readable comments. But we're hitting limits: its file format is very inflexible and the API is the same. So we'd have to add a new file format and the required parsing, break the API, deal with angry users, etc. Not worth it.

Thus libinput-record is the replacement for evemu. The main features that libinput-record adds are a more standardised file format that can be expanded and parsed easily, the ability to record and replay multiple devices at once, and the interleaving of evdev events with libinput events to check what's happening. And it's more secure by default: all alphanumeric keys are (by default) printed as KEY_A, so there's no risk of a password leaking into a file attached to Bugzilla. evemu required Python bindings; for libinput-record's output format we don't need those, since you can simply load the YAML into plain Python data structures. And finally - it's part of libinput, which means it's going to be easier to install (because distributions won't just ignore libinput) and it's going to be more up-to-date (because if you update libinput, you get the new libinput-record).

It's new code so it will take a while to iron out any leftover bugs but after that it'll be the glorious future ;)

Posted Tue May 29 05:58:00 2018 Tags:

With Xamarin.Forms 3.0, in addition to all the new feature work that we did, we have been doing some general optimizations across the board, from compile times to startup times, and wanted to share some recent results on the net effect on one of our larger sample apps.

These are the results when doing a cold start for the SmartHotel360 application on Android when compiled for 32bits (armeabi-v7a) on a Google Pixel (1st gen).

              Release     Release/AOT     Release/AOT+LLVM
Forms 2.5.0   6.59s       1.66s           1.61s
Forms 3.0.0   3.52s       1.41s           1.38s

This is independent of the work that we are doing to improve Android's startup speed, that both brings additional benefits today, and will bring additional benefits in the future.

One of the areas that we are investing in for Android is to remove any dynamic code execution at startup to integrate with the Java runtime; instead, all of this is being statically computed, similar to what we are doing on Mac and iOS where we completely eliminated reflection and code generation from startup.

Posted Fri May 18 13:40:20 2018 Tags:

This post is part of a four part series: Part 1, Part 2, Part 3, Part 4.

In the first three parts, I covered the X server and synaptics pointer acceleration curves and how libinput compares to the X server pointer acceleration curve. In this post, I will compare libinput to the synaptics acceleration curve.

Comparison of synaptics and libinput

libinput has multiple different pointer acceleration curves, depending on the device. In this post, I will only consider the one used for touchpads. So let's compare the synaptics curve with the libinput curve at the default configurations:

But this one doesn't tell the whole story, because the touchpad accel for libinput actually changes once we get faster. So here are the same two curves, but this time with the range up to 1000mm/s. These two graphs show that libinput is both very different from and quite similar to synaptics. Both curves have an acceleration factor less than 1 for the majority of speeds; they both decelerate the touchpad more than accelerating it. synaptics has two factors it sticks to and a short curve, libinput has a short deceleration curve and its plateau is the same or lower than synaptics for the most part. Once the threshold is hit at around 250 mm/s, libinput's acceleration keeps increasing until it hits a maximum much later.

So, for anything under ~20mm/s, libinput should be the same as synaptics (ignoring the <7mm/s deceleration). For anything less than 250mm/s, libinput should be slower. I say "should be" because that is not actually the case; synaptics is slower, so I suspect the server scaling slows down synaptics even further. Hacking around in the libinput code, I found that moving libinput's baseline to 0.2 matches the synaptics cursor's speed. However, AFAIK that scaling depends on the screen size, so your mileage may vary.

Comparing configuration settings

Let's overlay the libinput speed toggles. In Part 2 we've seen the synaptics toggles and they're open-ended, so it's a bit hard to pick a specific set to go with to compare. I'll be using the same combined configuration options from the diagram there.

And we need the diagram from 0-1000mm/s as well. There isn't much I can talk about here in direct comparison, the curves are quite different and the synaptics curves vary greatly with the configuration options (even though the shape remains the same).

Analysis

It's fairly obvious that the acceleration profiles are very different once we depart from the default settings. Most notably, only libinput's slowest speed setting matches the 0.2 speed that is the synaptics default setting. In other words, if your touchpad is too fast compared to synaptics, it may not be possible to slow it down sufficiently. Likewise, even at the fastest speed, the baseline is well below the synaptics baseline for e.g. 0.6 [1], so if your touchpad is too slow, you may not be able to speed it up sufficiently (at least for low speeds). That problem won't exist for the maximum acceleration factor, the main question here is simply whether they are too high. Answer: I don't know.

So the base speed of the touchpad in libinput needs a wider range; that's IMO a definite bug that I need to work on. The rest... I don't know. Let's see how we go.

[1] A configuration I found suggested in some forum when googling for MinSpeed, so let's assume there's at least one person out there using it.

Posted Thu May 10 05:37:00 2018 Tags:

This post is part of a four part series: Part 1, Part 2, Part 3, Part 4.

In Part 1 and Part 2 I showed the X server acceleration code as used by the evdev and synaptics drivers. In this part, I'll show how it compares against libinput.

Comparison to libinput

libinput has multiple different pointer acceleration curves, depending on the device. In this post, I will only consider the default one used for mice. A discussion of the touchpad acceleration curve comes later. So, back to the graph of the simple profile. Let's overlay this with the libinput pointer acceleration curve:

It turns out that libinput's pointer acceleration curve, which is mostly modeled after the xserver behaviour, roughly matches the xserver behaviour. Note that libinput normalizes to 1000dpi (provided MOUSE_DPI is set correctly) and thus the curves only match this way for 1000dpi devices.
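
For illustration only, that normalization step amounts to something like the following. This is a minimal sketch of the idea, not libinput's actual code; MOUSE_DPI is the udev property libinput reads the device resolution from:

/* Sketch only, not libinput's implementation: scale deltas so that a
 * device advertising its resolution via MOUSE_DPI behaves like a
 * 1000dpi device. */
static double
normalize_to_1000dpi(double delta, int mouse_dpi)
{
    return delta * 1000.0 / mouse_dpi;
}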

libinput's deceleration is slightly different but I doubt it is really noticeable. The plateau of no acceleration is virtually identical, i.e. at slow speeds libinput moves like the xserver's pointer does. Likewise, for speeds above ~33mm/s, libinput and the server accelerate by the same amount. The actual curve in between is slightly different: it is linear (I doubt that's noticeable) and it doesn't have that jump in it. The xserver acceleration maxes out at roughly 20mm/s. The only difference in acceleration is in the range of 10mm/s to 33mm/s.

30mm/s is still a relatively slow movement (just move your mouse by 30mm within a second, it doesn't feel fast). This means that for all but slow movements, both the current server and libinput provide nothing but flat acceleration at whatever the maximum acceleration factor is set to.

Comparison of configuration options

The biggest difference libinput has to the X server is that it exposes a single knob of normalised continuous configuration (-1.0 == slowest, 1.0 == fastest). It relies on settings like MOUSE_DPI to provide enough information to map a device into that normalised range.
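
For reference, this is the knob that callers such as compositors or the xf86-input-libinput driver set through libinput's configuration API; a minimal sketch, with error handling reduced to the availability check:

#include <libinput.h>

/* Minimal sketch: clamp a requested speed into libinput's normalised
 * [-1.0, 1.0] range and apply it to a device. */
static void
set_pointer_speed(struct libinput_device *device, double speed)
{
    if (speed < -1.0)
        speed = -1.0;
    if (speed > 1.0)
        speed = 1.0;

    if (libinput_device_config_accel_is_available(device))
        libinput_device_config_accel_set_speed(device, speed);
}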

Let's look at the libinput speed settings and their effect on the acceleration profile (libinput 1.10.x).

libinput's speed setting changes both the thresholds and the acceleration at the same time. The higher the speed setting, the sooner acceleration applies and the higher the maximum acceleration is. For very slow speeds, libinput provides deceleration. Noticeable here, though, is that the baseline speed is the same until we get to speed settings of less than -0.5 (where we have an effectively flat profile anyway). So up to the (speed-dependent) threshold, the mouse speed is always the same.

Let's look at the comparison of libinput's speed setting to the accel setting in the simple profile:

Clearly visible: libinput's range is a lot smaller than what the accel setting allows (that one is effectively unbounded). This applies to the deceleration as well. I'm not posting the threshold comparison, as Part 1 shows it does not affect the maximum acceleration factor anyway.

Analysis

So, where does this leave us? I honestly don't know. The curves are different, but the only paper I could find on comparing acceleration curves is Casiez and Roussel's 2011 UIST paper. It provides a comparison of the X server acceleration with the Windows and OS X acceleration curves [1]. It shows quite a difference between the three systems, but the authors note that no specific acceleration curve is definitely superior. However, the most interesting bit here is that both the Windows and the OS X curves seem to use constant acceleration (with very minor changes) rather than changing the curve shape.

Either way, there is one possible solution for libinput to implement: change the base plateau with the speed. Otherwise libinput's acceleration curve is well defined for the configurable range. And a maximum acceleration factor of 3.5 is plenty for a properly configured mouse (generally anything above 3 is tricky to control). AFAICT, the main issues with pointer acceleration come from mice that don't have MOUSE_DPI set, or from trackpoints, which are, unfortunately, a completely different problem.

I'll probably also give the Windows/OS X approach a try (i.e. same curve, different constant deceleration) and see how that goes. If it works well, that may be a solution because it's easier to scale into a large range. Otherwise, *shrug*, someone will have to come up with a better solution.

[1] I've never been able to reproduce the same gain (== factor) but at least the shape and x axis seem to match.
Posted Thu May 10 03:47:00 2018 Tags:

This post is part of a four part series: Part 1, Part 2, Part 3, Part 4.

In Part 1 I showed the X server acceleration code as used by the evdev driver (which leaves all acceleration up to the server). In this part, I'll show the acceleration code as used by the synaptics touchpad driver. This driver installs a device-specific acceleration profile but beyond that the acceleration is... difficult. The profile itself is not necessarily indicative of the real movement, the coordinates are scaled between device-relative, device-absolute, screen-relative, etc. so often that it's hard to keep track of what the real delta is. So let's look at the profile only.

Diagram generation

Diagrams were generated by gnuplot, parsing .dat files generated by the ptrveloc tool in the git repo. Helper scripts to regenerate all data are in the repo too. Default values unless otherwise specified:

  • MinSpeed: 0.4
  • MaxSpeed: 0.7
  • AccelFactor: 0.04
  • dpi: 1000 (used for converting units to mm)
All diagrams are limited to 100 mm/s and a factor of 5 so they are directly comparable. From earlier testing I found that movements above 300 mm/s are rare; once you hit 500 mm/s the acceleration doesn't really matter that much anymore, you're going to hit the screen edge anyway.

The choice of 1000 dpi is a difficult one. It makes the diagrams directly comparable to those in Part 1, but touchpads vary greatly in their resolution. For example, an ALPS DualPoint touchpad may have resolutions of 25-32 units/mm. A Lenovo T440s has a resolution of 42 units/mm over PS/2 but 20 units/mm over the newer SMBus/RMI4 protocol, and that is the same physical touchpad. Overall it doesn't actually matter that much though, see below.
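
For what it's worth, the unit conversion behind these diagrams is as simple as it sounds; a sketch of the idea (the names are mine, not from the ptrveloc tool):

/* Sketch only: convert a device delta into physical millimetres using
 * the touchpad's advertised resolution in units per mm. This is what
 * makes curves from devices with very different resolutions comparable. */
static double
units_to_mm(double delta_units, double resolution_units_per_mm)
{
    return delta_units / resolution_units_per_mm;
}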

The acceleration profile

This driver has a custom acceleration profile, configured by the MinSpeed, MaxSpeed and AccelFactor options. The former two put a cap on the factor but MinSpeed also adjusts (overwrites) ConstantDeceleration. The AccelFactor defaults to a device-specific size based on the device diagonal.

Let's look at the defaults of 0.4/0.7 for min/max and 0.04 (default on my touchpad) for the accel factor:

The simple profile from Part 1 is shown in this graph for comparison. The synaptics profile is printed as two curves, one for the profile output value and one for the real value used on the delta. Unlike the simple profile, you cannot configure ConstantDeceleration separately; it depends on MinSpeed. Thus the real acceleration factor is always less than 1, so the synaptics driver doesn't accelerate as such, it controls how much the deltas are decelerated.

The actual acceleration curve is just a plain old linear interpolation between the min and max acceleration values. If you look at the curves more closely you'll find that there is no acceleration up to 20mm/s and flat acceleration from 25mm/s onwards. Only in this small speed range does the driver adjust its acceleration based on input speed. Whether this is intentional or just happened, I don't know.
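
The shape described above can be sketched in a few lines. This is a rough illustration of the profile only, not the driver's code; in particular the exact scaling and the ConstantDeceleration handling are glossed over:

/* Rough illustration of the synaptics-style profile shape: a linear
 * function of speed, clamped to the [MinSpeed, MaxSpeed] range. The
 * resulting delta is then further scaled by ConstantDeceleration, which
 * the driver derives from MinSpeed. */
static double
synaptics_like_profile(double speed, double min_speed, double max_speed,
                       double accel_factor)
{
    double factor = speed * accel_factor;

    if (factor < min_speed)
        factor = min_speed;
    if (factor > max_speed)
        factor = max_speed;
    return factor;
}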

The accel factor depends on the touchpad x/y axis. On my T440s using PS/2, the factor defaults to 0.04. If I get it to use SMBus/RMI4 instead of PS/2, that same device has an accel factor of 0.09. An ALPS touchpad may have a factor of 0.13, based on the min/max values for the x/y axes. These devices all have different resolutions though, so here are the comparison graphs taking the axis range and the resolution into account:

The diagonal affects the accel factor, so these three touchpads (two curves are the same physical touchpad, just using a different bus) get slightly different acceleration curves. They're more similar than I expected though, and for the rest of this post we can get away with just looking at the 0.04 default value from my touchpad.

Note that due to how synaptics is handled in the server, this isn't the whole story; there is more coordinate scaling etc. happening after the acceleration code. The synaptics acceleration profile also does not account for uneven x/y resolutions, this is handled in the server afterwards. On touchpads with uneven resolutions the velocity thus depends on the vector: moving along the x axis produces differently sized deltas than moving along the y axis. However, anything applied later isn't speed dependent but merely a constant scale, so these curves are still a good representation of what happens.

The effect of configurations

What does the acceleration factor do? It changes when acceleration kicks in and how steep the acceleration is.

And how do the min/max values play together? Let's adjust MinSpeed but leave MaxSpeed at 0.7.

MinSpeed lifts the baseline (i.e. the minimum acceleration factor), somewhat expected from a parameter named this way. But it looks again like we have a bug here. When MinSpeed and MaxSpeed are close together, our acceleration actually decreases once we're past the threshold. So counterintuitively, a higher MinSpeed can result in a slower cursor once you move faster.

MaxSpeed is not too different here:

The same bug is present: if MaxSpeed is smaller than or close to MinSpeed, the acceleration actually goes down. A quick check of the sources didn't indicate anything enforcing MinSpeed < MaxSpeed either. But otherwise MaxSpeed lifts the maximum acceleration factor.

These graphs look at the options in isolation; in reality users would likely configure MinSpeed and MaxSpeed at the same time. Since both have an immediate effect on pointer movement, trial-and-error configuration is simple and straightforward. Below is a graph of all three adjusted semi-randomly:

No surprises in there: the baseline (and thus slowest speed) changes, the maximum acceleration changes, and how long it takes to get there changes. The curves vary quite a bit though, so without knowing the configuration options, it's impossible to predict how a specific touchpad behaves.

Epilogue

The graphs above show the effect of the configuration options in the synaptics driver. I purposely didn't put in any specific analysis or comparison to libinput. That comes in a future post.

Posted Thu May 10 03:26:00 2018 Tags:

This post is part of a four part series: Part 1, Part 2, Part 3, Part 4.

Over the last few days, I once again tried to tackle pointer acceleration. After all, I still get plenty of complaints about how terrible libinput is and how the world was so much better without it. So I once more tried to understand the X server's pointer acceleration code. Note: the evdev driver doesn't do any acceleration, it's all handled in the server. Synaptics will come in part two, so this post focuses mostly on pointer acceleration for mice/trackpoints.

After a few failed attempts at live analysis [1], I finally succeeded in extracting the pointer acceleration code into something that could be visualised. That helped me a great deal in going back and understanding the various bits and how they fit together.

The approach was: copy the ptrveloc.(c|h) files into a new project, set up a meson.build file, #define all the bits that are assumed to be there and voila, here's your library. Now we can build basic analysis tools provided we initialise all the structs the pointer accel code needs correctly. I think I succeeded. The git repo is here if anyone wants to check the data. All scripts to generate the data files are in the repository.

A note on language: the terms "speed" and "velocity" are subtly different but for this post the difference doesn't matter. The code uses "velocity" but "speed" is more natural to talk about, so just assume equivalence.

The X server acceleration code

There are 15 configuration options for pointer acceleration (ConstantDeceleration, AdaptiveDeceleration, AccelerationProfile, ExpectedRate, VelocityTrackerCount, Softening, VelocityScale, VelocityReset, VelocityInitialRange, VelocityRelDiff, VelocityAbsDiff, AccelerationProfileAveraging, AccelerationNumerator, AccelerationDenominator, AccelerationThreshold). Basically, every number is exposed as a configurable knob. The acceleration code is a product of a time when we were handing out configuration options like participation medals at a children's footy tournament. Assume that for the rest of this blog post, every behavioural description ends with "unless specific configuration combinations apply". In reality, I think only four options are commonly used: AccelerationNumerator, AccelerationDenominator, AccelerationThreshold, and ConstantDeceleration. These four have immediate effects on the pointer movement and thus it's easy to do trial-and-error configuration.

The server has different acceleration profiles (called the 'pointer transfer function' in the literature). Each profile is a function that converts speed into a factor. That factor is then combined with other things like constant deceleration, but eventually our output delta is computed as:


delta_out(x, y) = delta_in(x, y) * factor * deceleration
The output delta is passed back to the server and the pointer saunters over by a few pixels, happily bumping into any screen edge on the way.

The input for the acceleration profile is a speed in mickeys, a threshold (in mickeys) and a max accel factor (unitless). Mickeys are a bit tricky: they are device units, so the acceleration is device-specific. The deltas for a mouse at 1000 dpi are 25% larger than the deltas for a mouse at 800 dpi (assuming the same physical distance and speed). The "Resolution" option in evdev can work around this, but by default this means that the acceleration factor is (on average) higher for high-resolution mice for the same physical movement. It also means that that xorg.conf snippet you found on stackoverflow probably does not do the same on your device.

The second problem with mickeys is that they require a frequency to map to a physical speed. If a device sends events every N ms, delta/N gives us a speed in units/ms. But we need mickeys for the profiles. Devices generally have a fixed reporting rate, and the speed in mickeys is (units/ms * reporting rate). This rate defaults to 10 in the server (the VelocityScale default value) and thus matches a device reporting at 100Hz (a discussion of this comes later). All graphs below were generated with this default value.
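
As a back-of-the-envelope sketch of that scaling (the names are mine, not the server's):

/* Back-of-the-envelope sketch: a device reporting `units' every
 * `interval_ms' milliseconds gives units/ms; VelocityScale (default 10,
 * i.e. assuming a 100Hz device) turns that into the speed fed to the
 * profile. */
static double
profile_speed(double units, double interval_ms, double velocity_scale)
{
    return units / interval_ms * velocity_scale;
}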

Back to the profile function and how it works: the threshold (usually) defines the minimum speed at which acceleration kicks in. The max accel factor (usually) limits the acceleration. So the simplest algorithm is:


if (velocity < threshold)
    return base_velocity;
factor = calculate_factor(velocity);
if (factor > max_accel)
    return max_accel;
return factor;
In reality, things are somewhere between this simple and "whoops, what have we done".

Diagram generation

Diagrams were generated by gnuplot, parsing .dat files generated by the ptrveloc tool in the git repo. Helper scripts to regenerate all data are in the repo too. Default values unless otherwise specified:

  • threshold: 4
  • accel: 2
  • dpi: 1000 (used for converting units to mm)
  • constant deceleration: 1
  • profile: classic
All diagrams are limited to 100 mm/s and a factor of 5 so they are directly comparable. From earlier testing I found that movements above 300 mm/s are rare; once you hit 500 mm/s the acceleration doesn't really matter that much anymore, you're going to hit the screen edge anyway.

Acceleration profiles

The server provides a number of profiles, but I have seen very little evidence that people use anything but the default "Classic" profile. Synaptics installs a device-specific profile. Below is a comparison of the profiles just so you get a rough idea what each profile does. For this post, I'll focus on the default Classic only.

First thing to point out here is that if you want your pointer to travel to Mars, the linear profile is what you should choose. This profile is unusable without further configuration to bring the incline down to a more sensible level. Only the simple and limited profiles have a maximum factor; all others increase acceleration indefinitely. The faster you go, the more the movement is accelerated. I find them completely unusable at anything but low speeds.

The classic profile transparently maps to the simple profile, so the curves are identical.

Anyway, as said above, profile changes are rare. The one we care about is the default profile: the classic profile which transparently maps to the simple profile (SimpleSmoothProfile() in the source).

Looks like there's a bug in the profile formula: at the threshold value it jumps from 1 to 1.5 before the curve kicks in. This code was added around 2008; apparently no-one noticed this in a decade.

The profile has deceleration (accel factor < 1 and thus decreasing the deltas) at slow speeds. This provides extra precision at slow speeds without compromising pointer speed at higher physical speeds.

The effect of config options

Ok, now let's look at the classic profile and the configuration options. What happens when we change the threshold?

First thing that sticks out: one of these is not like the others. The classic profile changes to the polynomial profile at thresholds less than 1.0. *shrug* I think there's some historical reason, I didn't chase it up.

Otherwise, the threshold not only defines when acceleration starts kicking in, it also affects the steepness of the curve. So a higher threshold also means acceleration ramps up more slowly as the speed increases. It has no effect on the low-speed deceleration.

What happens when we change the max accel factor? This factor is actually set via the AccelerationNumerator and AccelerationDenominator options (because floats used to be more expensive than buying a house). At runtime, the Xlib function of your choice is XChangePointerControl(). That's what all the traditional config tools use (xset, your desktop environment pre-libinput, etc.).
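
For example, setting a maximum acceleration of 3/1 with a threshold of 4 through Xlib looks roughly like this (a minimal sketch, error handling omitted):

#include <X11/Xlib.h>

/* Minimal sketch: set the knobs that the AccelerationNumerator,
 * AccelerationDenominator and AccelerationThreshold options expose,
 * but at runtime via Xlib. */
int
main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy)
        return 1;

    XChangePointerControl(dpy, True, True,
                          3,   /* acceleration numerator */
                          1,   /* acceleration denominator */
                          4);  /* threshold */
    XCloseDisplay(dpy);
    return 0;
}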

First thing that sticks out: one is not like the others. When max acceleration is 0, the factor is always zero for speeds exceeding the threshold. No user impact though, the server discards factors of 0.0 and leaves the input delta as-is.

Otherwise it's relatively unexciting, it changes the maximum acceleration without changing the incline of the function. And it has no effect on deceleration. Because the curves aren't linear ones, they don't overlap 100% but meh, whatever. The higher values are cut off in this view, but they just look like a larger version of the visible 2 and 4 curves.

Next config option: ConstantDeceleration. This one is handled outside of the profile, but the code is easy enough to follow: it's a basic multiplier applied together with the factor. (I cheated and just did this in gnuplot directly.)

Easy to see what happens with the curve here, it simply stretches vertically without changing the properties of the curve itself. If the deceleration is greater than 1, we get constant acceleration instead.

All this means that with the default profile, we have three ways of adjusting it. What we can't directly change is the incline, i.e. the actual process of acceleration remains the same.

Velocity calculation

As mentioned above, the profile applies to a velocity so obviously we need to calculate that first. This is done by storing each delta and looking at their direction and individual velocity. As long as the direction remains roughly the same and the velocity between deltas doesn't change too much, the velocity is averaged across multiple deltas - up to 16 in the default config. Of course you can change whether this averaging applies, the max time deltas or velocity deltas, etc. I'm honestly not sure anyone ever used any of these options intentionally or with any real success.
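
A much-simplified sketch of that averaging idea follows. This is illustrative only, not the server's code; it assumes a zero-initialized tracker and leaves out the direction checks and the configurable limits (VelocityTrackerCount, VelocityRelDiff and friends):

#include <math.h>

#define MAX_TRACKERS 16

/* Illustrative only: average the most recent per-event velocities, and
 * start over whenever the newest velocity differs too much from that
 * average. */
struct tracker {
    double velocities[MAX_TRACKERS];
    int count;
};

static double
tracked_velocity(struct tracker *t, double new_velocity, double max_rel_diff)
{
    double avg = 0.0;
    int i;

    for (i = 0; i < t->count; i++)
        avg += t->velocities[i];
    if (t->count > 0)
        avg /= t->count;

    /* too different from the recent average? treat it as a new movement */
    if (t->count > 0 && fabs(new_velocity - avg) > avg * max_rel_diff)
        t->count = 0;

    /* keep at most the last MAX_TRACKERS samples */
    if (t->count == MAX_TRACKERS) {
        for (i = 1; i < MAX_TRACKERS; i++)
            t->velocities[i - 1] = t->velocities[i];
        t->count--;
    }
    t->velocities[t->count++] = new_velocity;

    avg = 0.0;
    for (i = 0; i < t->count; i++)
        avg += t->velocities[i];
    return avg / t->count;
}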

Velocity scaling was explained above (units/ms * reporting rate). The default value for the reporting rate is 10, equivalent to 100Hz. Of the 155 frequencies currently defined in 70-mouse.hwdb, only one is 100Hz. The most common one is 125Hz, followed by 1000Hz, then 166Hz and 142Hz. Now, the vast majority of devices don't have an entry in the hwdb file, so this data does not represent a significant sample set. But for modern mice, the default velocity scale of 10 is probably off by anywhere between 25% and a factor of 10. While this doesn't mean much for the local example (users generally just move the numbers around until they're happy enough), it means that the actual values are largely meaningless for anyone but those with the same hardware.

Of note: the synaptics driver automatically sets VelocityScale to 80Hz. This is correct for the vast majority of touchpads.

Epilogue

The graphs above show the X server's pointer acceleration for mice, trackballs and other devices and the effects of the configuration toggles. I purposely did not put any specific analysis in and/or comparison to libinput. That will come in a future post.

[1] I still have a branch somewhere where the server prints yaml to the log file which can then be extracted by shell scripts, passed on to python for processing and ++++ out of cheese error. redo from start ++++
Posted Wed May 9 22:34:00 2018 Tags:

My friend Aras recently wrote the same ray tracer in various languages, including C++, C# and the upcoming Unity Burst compiler. While it is natural to expect C# to be slower than C++, what was interesting to me was that Mono was so much slower than .NET Core.

The numbers that he posted did not look good:

  • C# (.NET Core): Mac 17.5 Mray/s,
  • C# (Unity, Mono): Mac 4.6 Mray/s,
  • C# (Unity, IL2CPP): Mac 17.1 Mray/s,

I decided to look at what was going on, and document possible areas for improvement.

As a result of this benchmark, and looking into this problem, we identified three areas of improvement:

  • First, we need better defaults in Mono, as users will not tune their parameters
  • Second, we need to better inform the world about the LLVM code optimizing backend in Mono
  • Third, we tuned some of the parameters in Mono.

The baseline for this test was to run Aras' ray tracer on my machine; since we have different hardware, I could not use his numbers for comparison. The results on my iMac at home were as follows for Mono and .NET Core:

Runtime                                                  MRay/sec
.NET Core 2.1.4, debug build, dotnet run                      3.6
.NET Core 2.1.4, release build, dotnet run -c Release        21.7
Vanilla Mono, mono Maths.exe                                  6.6
Vanilla Mono, with LLVM and float32                          15.5

During the process of researching this problem, we found a couple of problems which, once fixed, produced the following results:

Runtime                                                  MRay/sec
Mono with LLVM and float32                                   15.5
Improved Mono with LLVM, float32 and fixed inlining          29.6

Aggregated:

Chart visualizing the results of the table above

Just using LLVM and float32, you can get almost a 2.3x performance improvement in your floating point code. And with the tuning that we added to Mono as a result of this exercise, you can get 4.4x over running plain Mono - these will be the defaults in future versions of Mono.

This blog post explains our findings.

32 and 64 bit Floats

Aras is using 32-bit floats for most of his math (the float type in C#, or System.Single in .NET terms). In Mono, decades ago, we made the mistake of performing all 32-bit float computations as 64-bit floats while still storing the data in 32-bit locations.

My memory at this point is not as good as it used to be, and I do not quite recall why we made this decision.

My best guess is that it was a decision rooted in the trends and ideas of the time.

Around this time there was a positive aura around extended-precision computations for floats. For example, the Intel x87 processors use 80-bit precision for their floating point computations, even when the operands are doubles, giving users better results.

Another theme around that time was that the Gnumeric spreadsheet, one of my previous projects, had implemented better statistical functions than Excel had, and this was well received in many communities that could use the more correct and higher precision results.

In the early days of Mono, most mathematical operations available across all platforms only took doubles as inputs. C99, Posix and ISO had all introduced 32-bit versions, but they were not generally available across the industry in those early days (for example, sinf is the float version of sin, fabsf of fabs and so on).
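
To illustrate the difference in plain C terms (these are standard C99 functions, not Mono internals): without the single-precision entry points, every call pays for a round trip through double.

#include <math.h>

/* Illustration only, unrelated to Mono's code: two ways of computing the
 * sine of a float once C99's single-precision functions exist. */
float sine_via_double(float x)
{
    return (float)sin((double)x);   /* float -> double -> float */
}

float sine_direct(float x)
{
    return sinf(x);                 /* stays in single precision */
}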

In short, the early 2000’s were a time of optimism.

Applications did pay a heavier price for the extra computation time, but Mono was mostly used for Linux desktop applications, serving HTTP pages and some server processes, so floating point performance was never an issue we faced day to day. It was only noticeable in some scientific benchmarks, and those were rarely the use case for .NET in the 2003 era.

Nowadays, games, 3D applications, image processing, VR, AR and machine learning have made floating point a more common data type in modern applications. When it rains, it pours, and this is no exception. Floats are no longer your friendly data type that you sprinkle in a few places in your code, here and there. They come in an avalanche and there is no place to hide. There are so many of them, and they won’t stop coming at you.

The “float32” runtime flag

So a couple of years ago we decided to add support for performing 32-bit float operations with 32-bit operations, just like everyone else. We call this runtime feature “float32”; in Mono, you enable it by passing the -O=float32 option to the runtime, and for Xamarin applications, you change this setting in the project preferences.

This new flag has been well received by our mobile users, as the majority of mobile devices are still not very powerful, and users would rather have their data processed faster than with extra precision. Our guidance for mobile users has been to turn on both the LLVM optimizing compiler and the float32 flag at the same time.

While we have had the flag for some years, we have not made it the default, to avoid surprising our users. But we find ourselves facing scenarios where the current 64-bit behavior is already a surprise to our users; for example, see this bug report filed by a Unity user.

We are now going to change the default in Mono to be float32, you can track the progress here: https://github.com/mono/mono/issues/6985.

In the meantime, I went back to my friend Aras' project. He has been using some new APIs that were introduced in .NET Core. While .NET Core always performed 32-bit float operations as 32-bit floats, the System.Math API still forced some conversions from float to double in the course of doing business. For example, if you wanted to compute the sine function of a float, your only choice was to call Math.Sin (double) and pay the price of the float to double conversion.

To address this, .NET Core has introduced a new System.MathF type, which contains single-precision floating point math operations, and we have just brought System.MathF to Mono (https://github.com/mono/mono/pull/7941).

Moving from 64-bit floats to 32-bit floats certainly improves the performance, as you can see in the table below:

Runtime and Options                              MRay/sec
Mono with System.Math                                 6.6
Mono with System.Math, using -O=float32               8.1
Mono with System.MathF                                6.5
Mono with System.MathF, using -O=float32              8.2

So using float32 really improves things for this test; MathF itself had only a small effect.

Tuning LLVM

During the course of this research, we discovered that while Mono’s Fast JIT compiler had support for float32, we had not added this support to the LLVM backend. This meant that Mono with LLVM was still performing the expensive float to double conversions.

So Zoltan added support for float32 to our LLVM code generation engine.

Then he noticed that our inliner was using the same heuristics for LLVM as it was using for the Fast JIT. With the Fast JIT, you want to strike a balance between JIT speed and execution speed, so we limit how much we inline to reduce the work of the JIT engine.

But when you opt into using LLVM with Mono, you want to get the fastest code possible, so we adjusted the setting accordingly. Today you can change this setting via the MONO_INLINELIMIT environment variable, but it really should be baked into the defaults.

With the tuned LLVM setting, these are the results:

Runtime and Options                                           MRay/sec
Mono with System.Math, --llvm -O=float32                          16.0
Mono with System.Math, --llvm -O=float32, fixed heuristics        29.1
Mono with System.MathF, --llvm -O=float32, fixed heuristics       29.6

Next Steps

The effort to bring some of these improvements in was relatively low. We had some on-and-off discussions on Slack which led to these improvements. I even managed to spend a few hours one evening to bring System.MathF to Mono.

Aras' RayTracer code was an ideal subject to study, as it is self-contained and a real application, not a synthetic benchmark. We want to find more software like this that we can use to review the kind of bitcode that we generate, and to make sure that we are giving LLVM the best data we can so LLVM can do its job.

We are also considering upgrading the LLVM that we use, to leverage any new optimizations that have been added.

SideBar

The extra precision has some nice side effects. For example, recently, while reading the pull requests for the Godot engine, I saw that they were busily discussing making floating point precision for the engine configurable at compile time (https://github.com/godotengine/godot/pull/17134).

I asked Juan why anyone would want to do this; I thought that games were just content with 32-bit floating point operations.

Juan explained to me that while floats work great in general, once you “move away” from the center (say, in a game, you navigate 100 kilometers out from the center of your game world), the math errors start to accumulate and you end up with some interesting visual glitches. There are various mitigation strategies that can be used, and higher precision is just one possibility, one that comes with a performance cost.

Shortly after our conversation, this blog showed up on my Twitter timeline showing this problem:

http://pharr.org/matt/blog/2018/03/02/rendering-in-camera-space.html

A few images show the problem. First, we have a sports car model from the pbrt-v3-scenes distribution. Both the camera and the scene are near the origin and everything looks good. (Cool sports car model courtesy Yasutoshi Mori.)

Next, we’ve translated both the camera and the scene 200,000 units from the origin in x, y, and z. We can see that the car model is getting fairly chunky; this is entirely due to insufficient floating-point precision. (Thanks again to Yasutoshi Mori.)

If we move 5× farther away, to 1 million units from the origin, things really fall apart; the car has become an extremely coarse voxelized approximation of itself—both cool and horrifying at the same time. (Keanu wonders: is Minecraft chunky purely because everything’s rendered really far from the origin?)

(Apologies to Yasutoshi Mori for what has been done to his nice model.)

Posted Wed Apr 11 21:12:37 2018 Tags:

Ed Felten tweeted a few days ago: “Often hear that the reason today’s Internet is not more secure is that the early designers failed to imagine that security could ever matter. That is a myth.”

This is indeed a myth.  Much of the current morass can be laid at the feet of the United States government, due to its export regulations around cryptography.

I will testify against the myth.  Bob Scheifler and I started the X Window System in 1984 at MIT, which is a network transparent window system: that is, applications can reside on computers anywhere in the network and use the X display server. As keyboard events may be transmitted over the network, it was clear to us from the get-go that it was a security issue. It is in use to this day on Linux systems all over the world (remote X11 access is no longer allowed: the ssh protocol is used to tunnel the X protocol securely for remote use). By sometime in 1985 or 1986 we were distributing X under the MIT License, which was developed originally for use of the MIT X Window System distribution (I’d have to go dig into my records to get the exact date).

I shared an office with Steve Miller at MIT Project Athena, who was (the first?) programmer working on Kerberos authentication service, which is used by Microsoft’s Active Directory service. Needless to say, we at MIT were concerned about security from the advent of TCP/IP.

We asked MIT whether we could incorporate Kerberos (and other encryption) into the X Window System. According to the advice at the time (and MIT’s lawyers were expert in export control, and later involved in PGP), if we had even incorporated strong crypto for authentication into our sources, this would have put the distribution under export control, and that would have defeated X’s easy distribution. The best we could do was to leave enough hooks in the wire protocol that Kerberos support could be added as a source-level “patch” (even calls to functions using strong authentication/encryption provided by an external library would have made it covered under export control). Such a patch for X existed, but could never be redistributed: by the time that export controls were relaxed, the patch had become mostly moot, as ssh had become available, which, along with the advent of the World Wide Web, was “good enough”, though far from an ideal solution.

Long before the term Open Source software was invented, open source and free software was essential to the Internet's core services. The choice for all of us working on that software was stark: we could either distribute the product of our work, or enter a legal morass, and getting it wrong could end up in court, as Phil Zimmerman did somewhat later with PGP.

Anyone claiming security was a “failure of imagination” does not know the people or the history and should not be taken seriously. Security mattered not just to us, but everyone working on the Internet. There are three software legacies from Project Athena: Kerberos, the X Window System, and instant messaging. We certainly paid much more than lip service to Internet security!

Government export controls crippled Internet security and the design of Internet protocols from the very beginning: we continue to pay the price to this day. Getting security right is really, really hard, and current efforts towards “back doors” or other special access are misguided. We haven’t even recovered from the previous rounds of government regulation, which caused excessive complexity in an already difficult problem and many serious security problems. Let us not repeat this mistake…

 

 

Posted Mon Apr 9 22:02:05 2018 Tags:

This was driving me insane. For years, I have been using Command-Shift-4 to take screenshots on my Mac. When you hit that keypress, you get to select a region of the screen, and the result gets placed on your ~/Desktop directory.

Recently, the feature stopped working.

I first blamed Dropbox settings, but that was not it.

I read every article on the internet on how to change the default location and restart the SystemUIServer.

The screencapture command line tool worked, but not the hotkey.

Many reboots later, I disabled System Integrity Protection so I could use iosnoop and dtruss to figure out why screencapture was not logging. I was looking at the logs right there, and saw where things were different, but could not figure out what was wrong.

Then another one of my Macs got infected. So now I had two Mac laptops that could not take screenshots.

And then I realized what was going on.

When you trigger Command-Shift-4, the TouchBar lights up and lets you customize how you take the screenshot, like this:

And if you scroll it, you get these other options:

And I had recently used these settings.

If you change your default here, it will be preserved. So even though Command-Shift-4 is the shortcut for take-screenshot-and-save-to-file, if you use the TouchBar to change the destination, that choice will override any future uses of the shortcut.

Posted Wed Apr 4 17:40:08 2018 Tags:

As many people know, last week there was a court hearing in the Geniatech vs. McHardy case. This was a case brought claiming a license violation of the Linux kernel in Geniatech devices in the German court of OLG Cologne.

Harald Welte has written up a wonderful summary of the hearing, I strongly recommend that everyone go read that first.

In Harald’s summary, he refers to an affidavit that I provided to the court. Because the case was withdrawn by McHardy, my affidavit was not entered into the public record. I had always assumed that my affidavit would be made public, and since I have had a number of people ask me about what it contained, I figured it was good to just publish it for everyone to be able to see it.

There are some minor edits from what was exactly submitted to the court, such as the removal of the side-by-side German translation of the English text, and some reformatting around footnotes in the text, because I don’t know how to do that directly here, and they really were not all that relevant for anyone who reads this blog. Exhibit A is also not reproduced, as it’s just a huge list of all of the kernel releases in which I felt there was no evidence of any contribution by Patrick McHardy.

AFFIDAVIT

I, the undersigned, Greg Kroah-Hartman,
declare in lieu of an oath and in the
knowledge that a wrong declaration in
lieu of an oath is punishable, to be
submitted before the Court:

I. With regard to me personally:

1. I have been an active contributor to
   the Linux Kernel since 1999.

2. Since February 1, 2012 I have been a
   Linux Foundation Fellow.  I am currently
   one of five Linux Foundation Fellows
   devoted to full time maintenance and
   advancement of Linux. In particular, I am
   the current Linux stable Kernel maintainer
   and manage the stable Kernel releases. I
   am also the maintainer for a variety of
   different subsystems that include USB,
   staging, driver core, tty, and sysfs,
   among others.

3. I have been a member of the Linux
   Technical Advisory Board since 2005.

4. I have authored two books on Linux Kernel
   development including Linux Kernel in a
   Nutshell (2006) and Linux Device Drivers
   (co-authored Third Edition in 2009.)

5. I have been a contributing editor to Linux
   Journal from 2003 - 2006.

6. I am a co-author of every Linux Kernel
   Development Report. The first report was
   based on my Ottawa Linux Symposium keynote
   in 2006, and the report has been published
   every few years since then. I have been
   one of the co-author on all of them. This
   report includes a periodic in-depth
   analysis of who is currently contributing
   to Linux. Because of this work, I have an
   in-depth knowledge of the various records
   of contributions that have been maintained
   over the course of the Linux Kernel
   project.

   For many years, Linus Torvalds compiled a
   list of contributors to the Linux kernel
   with each release. There are also usenet
   and email records of contributions made
   prior to 2005. In April of 2005, Linus
   Torvalds created a program now known as
   “Git” which is a version control system
   for tracking changes in computer files and
   coordinating work on those files among
   multiple people. Every Git directory on
   every computer contains an accurate
   repository with complete history and full
   version tracking abilities.  Every Git
   directory captures the identity of
   contributors.  Development of the Linux
   kernel has been tracked and managed using
   Git since April of 2005.

   One of the findings in the report is that
   since the 2.6.11 release in 2005, a total
   of 15,637 developers have contributed to
   the Linux Kernel.

7. I have been an advisor on the Cregit
   project and compared its results to other
   methods that have been used to identify
   contributors and contributions to the
   Linux Kernel, such as a tool known as “git
   blame” that is used by developers to
   identify contributions to a git repository
   such as the repositories used by the Linux
   Kernel project.

8. I have been shown documents related to
   court actions by Patrick McHardy to
   enforce copyright claims regarding the
   Linux Kernel. I have heard many people
   familiar with the court actions discuss
   the cases and the threats of injunction
   McHardy leverages to obtain financial
   settlements. I have not otherwise been
   involved in any of the previous court
   actions.

II. With regard to the facts:

1. The Linux Kernel project started in 1991
   with a release of code authored entirely
   by Linus Torvalds (who is also currently a
   Linux Foundation Fellow).  Since that time
   there have been a variety of ways in which
   contributions and contributors to the
   Linux Kernel have been tracked and
   identified. I am familiar with these
   records.

2. The first record of any contribution
   explicitly attributed to Patrick McHardy
   to the Linux kernel is April 23, 2002.
   McHardy’s last contribution to the Linux
   Kernel was made on November 24, 2015.

3. The Linux Kernel 2.5.12 was released by
   Linus Torvalds on April 30, 2002.

4. After review of the relevant records, I
   conclude that there is no evidence in the
   records that the Kernel community relies
   upon to identify contributions and
   contributors that Patrick McHardy made any
   code contributions to versions of the
   Linux Kernel earlier than 2.4.18 and
   2.5.12. Attached as Exhibit A is a list of
   Kernel releases which have no evidence in
   the relevant records of any contribution
   by Patrick McHardy.
Posted Sun Mar 11 02:55:05 2018 Tags:

Mono has a pure C# implementation of the Windows.Forms stack which works on Mac, Linux and Windows. It emulates some of the core of the Win32 API to achieve this.

While Mono's Windows.Forms is not an actively developed UI stack, it is required by a number of third party libraries, and some of its data types are consumed by other Mono libraries (part of the original design contract), so we have kept it around.

On Mac, Mono's Windows.Forms was built on top of Carbon, an old C-based API that was available on MacOS. This backend was written by Geoff Norton for Mono many years ago.

As Mono switched to 64 bits by default, this meant that Windows.Forms could not be used. We had a couple of options: try to upgrade the 32-bit Carbon code to 64-bit Carbon code, or build a new backend on top of Cocoa (using Xamarin.Mac).

For years, I had assumed that Carbon on 64-bit was not really supported, but a recent trip to the console shows that Apple has added a 64-bit port. Either my memory is failing me (common at this age), or Apple at some point changed their mind. I spent all of 20 minutes trying to do an upgrade, but the header files and documentation for the APIs we rely on are just not available, so at best I would be doing some guesswork as to which APIs have been upgraded to 64 bits and which APIs are available (rumors on Google searches indicate that while Carbon is 64 bits, not all APIs might be there).

I figured that I could try to build a Cocoa backend with Xamarin.Mac, so I sent this pull request to let me do this outside of the Mono tree in my copious spare time, and this weekend I did some work on the Cocoa driver.

But this morning, on Twitter, Filip Navara noticed the above and contacted me:

He has been kind enough to upload this Cocoa-based backend to GitHub.

Going Native

There are a couple of interesting things about this Windows.Forms backend for Cocoa.

The first one is that it is using sysdrawing-coregraphics, a custom version of System.Drawing that we had originally developed for iOS users, which implements the API in terms of CoreGraphics instead of using Cairo, FontConfig, FreeType and Pango.

The second one is that some controls, those that implement the IMacNativeControl interface, are backed by native AppKit controls. Among those you can find Button, ComboBox, ProgressBar, ScrollBar and the UpDownStepper.

I will now abandon my weekend hack, and instead get this code drop integrated as the 64-bit Cocoa backend.

Stay tuned!

Posted Tue Feb 20 20:09:04 2018 Tags:

For what seems like a decade I have been promising my friend Lucas that I would build a game in Unity, but I still have not managed to build one.

Recently Aras shared his excitement for Unity in 2018. There is a ton on that blog post to unpack.

What I am personally excited about is that Unity now ships an up-to-date Mono in the core.

Jonathan Chambers and his team of amazing low-level VM hackers have been hard at work upgrading Unity's VM and libraries to bring the latest and greatest Mono runtime to Unity. We have had the privilege of assisting in this migration and providing technical support along the way.

The work that the Unity team has done lays the foundation for ongoing updates to their .NET capabilities, so future innovation in the platform can be quickly adopted, bringing new and more joyful capabilities to developers on the platform.

With this new runtime, Unity developers will be able to access and consume a large portion of third party .NET libraries, including all those shiny .NET Standard Libraries - the new universal way of sharing code in the .NET world.

C# 7

The Unity team has also provided very valuable guidance to the C# team, which has directly influenced features in C# 7 like ref locals and returns.

  • In our own tests using C# for an AR application, we doubled the speed of managed-code AR processing by using these new features.

When users use the new Mono support in Unity, they default to C# 6, as this is the version that Mono's C# compiler fully supports. One of the challenges is that Mono's C# compiler has not fully implemented support for C# 7, as Mono itself moved to Roslyn.

The team at Unity is now working with the Roslyn team to adopt the Roslyn C# compiler in Unity. Because Roslyn is a larger compiler, it is slower to start up, and Unity does many small incremental compilations. So the team is working towards adopting the server compilation mode of Roslyn. This runs the Roslyn C# compiler as a reusable service which can compile code very quickly, without having to pay the startup price every time.

Visual Studio

If you install the Unity beta today, you will also see that on Mac, it now defaults to Visual Studio for Mac as its default editor.

JB Evain leads our Unity support for Visual Studio, and he has brought the magic of his Unity plugin to Visual Studio for Mac.

As Unity upgrades its Mono runtime, it also benefits from the extended debugger protocol support in Mono, which brings years of improvements to the debugging experience.

Posted Tue Feb 20 20:09:04 2018 Tags:


Bufferbloat is responsible for much of the poor performance seen in the Internet today and causes latency (called “lag” by gamers), triggered even by your own routine web browsing and video playing.

But bufferbloat’s causes and solutions remind me of the old parable of the Blind Men and the Elephant, updated for the modern Internet:

It was six men of Interstan,
 To learning much inclined,
Who sought to fix the Internet,
 (With market share in mind),
That evil of congestion,
 Might soon be left behind.

Definitely, the pipe-men say,
A wider link we laud,
For congestion occurs only
When bandwidth exceeds baud.
If only the reverse were true,
We would all live like lords.

But count the cost, if bandwidth had
No end, money-men say,
Perpetual upgrade cycle;
True madness lies that way.

From DiffServ-land, their tried and true
Solution now departs,
Some packets are more equal than
Their lesser counterparts.
But choosing which is which presents
New problems in all arts.

“Every packet is sacred!” cries
The data-centre clan,
Demanding bigger buffers for
Their ever-faster LAN.
To them, a millisecond is
Eternity to plan.

The end-to-end principle guides
A certain band of men,
Detecting when congestion strikes;
Inferring bandwidth then.
Alas, old methods outcompete their
Algorithmic maven.

The Isolationists prefer
Explicit signalling,
Ensuring each and every flow
Is equally a king.
If only all the other men
Would listen to them sing!

And so these men of industry,
Disputed loud and long,
Each in his own opinion,
Exceeding stiff and strong,
Though each was partly in the right,
Yet mostly in the wrong!

So, oft in technologic wars,
The disputants, I ween,
Rail on in utter ignorance,
Of what each other mean,
And prate about an Internet,
Not one of them has seen!

Jonathan Morton, with apologies to John Godfrey Saxe

Most technologists are not truly wise: we are usually like the blind men of Interstan. The TCP experts, network operators, telecom operators, router vendors, Internet service operators and users have all had a grip *only* on their piece of the elephant.

The TCP experts look at TCP and think “if only TCP were changed” in their favorite way, all latency problems would be solved, forgetting that there are many other causes of saturating an Internet link, and that changing every TCP implementation on the planet will take time measured in decades. With the speed of today’s processors, almost everything can potentially transmit at gigabits per second. Saturated (“congested”) links are *normal* operation in the Internet, not abnormal. And indeed, improved congestion avoidance algorithms such as BBR and mark/drop algorithms such as CoDel and PIE are part of the solution to bufferbloat.

And since TCP is innately “unfair” to flows with different RTTs, and we have both nearby (~10ms) and (inter)continentally distant (~75-200ms) services, no TCP-only solution can possibly provide good service. This is a particularly acute problem for people living in remote locations in the world who have great inherent latency differences between local and non-local services. But TCP-only solutions cannot solve other problems causing unnecessary latency, and can never achieve really good latency, as the queues they build depend on the round trip time (RTT) of the flows.

Network operators often think “if only I can increase the network bandwidth” they can banish bufferbloat: but at best, they can only move the bottleneck’s location, and that bottleneck’s buffering may even be worse! When my ISP increased the bandwidth to my house several years ago (at no cost, without warning me), they made my service much worse, as the usual bottleneck moved from the broadband link (where I had bufferbloat controlled using SQM) to the WiFi in my home router. Suddenly, my typical lag became worse by more than a factor of ten, without my having touched anything I owned, and with double the bandwidth!

The logical conclusion of solving bufferbloat this way would be to build an ultimately uneconomic and impossible-to-build Internet where each hop is at least as fast as the previous link under all circumstances, which ultimately collides with the limits of wireless bandwidth available at the edge of the Internet. Here lies madness. Today, we often pay for much more bandwidth than we need for our broadband connections just to reduce bufferbloat’s effects; most applications are more sensitive to latency than to bandwidth, and we often see serious bufferbloat issues at peering points as well as in the last mile, at home, and on cellular systems. Unnecessary latency just hurts everyone.

Internet service operators optimize the experience of their applications for their users, but seldom stop to see if their service damages that of other Internet services and applications. Having a great TV experience at the cost of good phone or video conversations with others is not a good trade-off.

Telecom operators have often tried to limit bufferbloat damage by hacking the congestion window of TCP, which neither provides low latency nor prevents severe bufferbloat in their systems when they are loaded or the station is remote.

Some packets are much more important to deliver quickly, so as to enable timely behavior of applications. These include ACKS, TCP opens, TLS handshakes, and many other packet types such as VOIP, DHCP and DNS lookups. Applications cannot make progress until those responses return. Web browsing and other interactive applications suffer greatly if you ignore this reality. Some network operation experts have developed complex packet classification algorithms to try to send these packets on their way quickly.

Router manufacturers often have extremely complex packet classification rules and user interfaces, that no sane person can possibly understand. How many of you have successfully configured your home router QOS page, without losing hair from your head?

Lastly, packet networks are inherently bursty, and these packet bursts cause “jitter.” With only First In First Out (FIFO) queuing, bursts of tens or hundreds of packets happen frequently. You don’t want your VOIP packet or other time-sensitive traffic to be arbitrarily delayed by “head of line” blocking behind bursts of packets in other flows. Attacking the right problem is essential.

We must look to solve all of these problems at once, not just individual aspects of the bufferbloat elephant. Flow Queuing CoDel (FQ_CoDel) is the first, but will not be the last such algorithm. And we must attack bufferbloat however it appears.

Flow Queuing Codel Algorithm: FQ_CoDel

Kathleen Nichols and Van Jacobson invented the CoDel algorithm (now described formally in RFC 8289), which represents a fundamental advance over the Active Queue Management that preceded it, which required careful tuning and could hurt you if not tuned properly. Their recent blog entry helps explain its importance further. CoDel is based on the notion of “sojourn time”, the time that a packet spends in the queue, and drops packets at the head of the queue (rather than tail or random drop). Since the CoDel algorithm is independent of queue length, is self-tuning, and depends solely on sojourn time, combined algorithms become possible that were not possible with predecessors such as RED.
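
The core idea can be sketched in a few lines. This is a deliberate simplification of RFC 8289, not a faithful implementation: stamp each packet on enqueue, and at dequeue compare how long it sat in the queue against a small target; only after the sojourn time has stayed above the target for a whole interval does dropping from the head begin.

/* Greatly simplified sketch of the CoDel idea from RFC 8289, not a real
 * implementation: the actual algorithm also shortens the interval while
 * it remains in the dropping state. */
#define TARGET_MS   5     /* acceptable standing queue delay */
#define INTERVAL_MS 100   /* how long we tolerate exceeding it */

struct pkt {
    long enqueue_time_ms; /* stamped when the packet was queued */
    /* ... payload ... */
};

static long first_above_time_ms; /* 0 while the queue is draining fine */

/* called at dequeue time; returns 1 if the head packet should be dropped */
static int
codel_should_drop(const struct pkt *p, long now_ms)
{
    long sojourn_ms = now_ms - p->enqueue_time_ms;

    if (sojourn_ms < TARGET_MS) {
        first_above_time_ms = 0;
        return 0;
    }
    if (first_above_time_ms == 0) {
        first_above_time_ms = now_ms + INTERVAL_MS;
        return 0;
    }
    return now_ms >= first_above_time_ms;
}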

Dave Täht had been experimenting with Paul McKenney’s Stochastic Fair Queuing algorithm (SFQ) as part of his work on bufferbloat, confirming that many of the issues caused by head of line blocking (and by the unfairness between competing flows of differing RTTs) were well handled by that algorithm.

CoDel and Dave’s and Paul’s work inspired Eric Dumazet to invent the FQ_Codel algorithm almost immediately after the CoDel algorithm became available. Note we refer to it as “Flow Queuing” rather than “Fair Queuing” as it is definitely “unfair” in certain respects rather than a true “Fair Queuing” algorithm.

Synopsis of the FQ_Codel Algorithm Itself

FQ_CoDel, documented in RFC 8290, uses a hash (typically, but not limited to, the usual 5-tuple identifying a flow) and establishes a queue for each, as does the Stochastic Fair Queuing (SFQ) algorithm. The hash bounds the memory usage of the algorithm, enabling use even on small embedded routers or in hardware. The number of hash buckets (flows) and what constitutes a flow can be adjusted as needed; in practice the flows are independent TCP flows.

There are two sets of flow queues: those flows which have built a queue, and queues for new flows.

The packet scheduler preferentially schedules packets from flows that have not built a queue over those which have built a queue (with provisions to ensure that flows that have built queues cannot be starved and continue to make progress).

If a flow’s queue empties, and packets again appear later, that flow is considered a new flow again.

Since CoDel only depends on the time in queue, it is easily applied in FQ_CoDel to ensure that TCP flows are properly policed to keep their behavior from building large queues.

CoDel does not require tuning and works over a very wide range of bandwidths and traffic. Combining flow queuing with an older mark/drop algorithm such as Random Early Detection (RED) is impossible, because those algorithms depend on queue length rather than time in queue and require tuning. Without some mark/drop algorithm such as CoDel, individual flows might fill their FQ_CoDel queues, and while they would not affect other flows, those flows would still suffer bufferbloat.
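
Putting the pieces above together, the flow bookkeeping looks roughly like the sketch below. This is illustrative code following the description in RFC 8290, not the Linux fq_codel qdisc; the per-flow packet queues and the per-queue CoDel check (see the earlier sketch) are left out.

#include <stddef.h>
#include <stdint.h>

/* Structural sketch of FQ_CoDel's flow bookkeeping: hash packets into
 * per-flow buckets, keep new and old flows on separate lists, and serve
 * new flows first. Illustrative only. */
#define NUM_FLOWS 1024

struct flow {
    struct flow *next;  /* linkage on the new or old flow list */
    int queued;         /* 1 if already on one of the lists */
};

struct flow_list {
    struct flow *first, *last;
};

struct fq_codel {
    struct flow flows[NUM_FLOWS];
    struct flow_list new_flows; /* flows that have not built a queue */
    struct flow_list old_flows; /* flows that have */
};

static void
list_append(struct flow_list *l, struct flow *f)
{
    f->next = NULL;
    if (l->last)
        l->last->next = f;
    else
        l->first = f;
    l->last = f;
}

static struct flow *
list_pop(struct flow_list *l)
{
    struct flow *f = l->first;

    if (f) {
        l->first = f->next;
        if (!l->first)
            l->last = NULL;
    }
    return f;
}

/* on enqueue: hash the packet (e.g. its 5-tuple) into a flow bucket;
 * a flow not yet on a list starts out on the new-flow list */
static struct flow *
classify(struct fq_codel *fq, uint32_t flow_hash)
{
    struct flow *f = &fq->flows[flow_hash % NUM_FLOWS];

    if (!f->queued) {
        list_append(&fq->new_flows, f);
        f->queued = 1;
    }
    return f;
}

/* on dequeue: serve new flows before old ones; a flow that keeps sending
 * is moved to the old list so it cannot starve everyone else */
static struct flow *
next_flow(struct fq_codel *fq)
{
    struct flow *f = list_pop(&fq->new_flows);

    if (f) {
        list_append(&fq->old_flows, f);
        return f;
    }
    return list_pop(&fq->old_flows);
}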

What Effects Does the FQ_CoDel Algorithm Have?

FQ_CoDel has a number of wonderful properties for such a simple algorithm:

  • Simple “request/response” or time-based protocols are preferentially scheduled relative to bulk data transport. This means that your VOIP packets, your TCP handshakes, cryptographic associations, your button press in your game, your DHCP or other basic network protocols all get preferential service without the complexity of extensive packet classification, even under very heavy load from other ongoing flows. Your phone call can work well despite large downloads or video use.
  • Dramatically improved service for time-sensitive protocols, since ACKs, TCP opens, security associations, DNS lookups, etc., fly through the bottleneck link.
  • Idle applications get immediate service as they return from their idle to active state. This is often due to interactive users’ actions, and enabling rapid restart of such flows is preferable to ongoing, time-insensitive bulk data transfer.
  • Many/most applications put their most essential information in the first data packets after a connection opens: FQ_CoDel sees that these first packets are quickly forwarded. For example, web browsers need the size information included in an image before being able to complete page layout; this information is typically contained in the first packet of the image. Most later packets in a large flow have less critical time requirements than the first packet(s).
  • Your web browsing is significantly faster, particularly when you share your home connection with others, since head of line blocking is no longer such an issue.
  • Flows that are “well behaved” and do not use more than their share of the available bandwidth at the bottleneck are rewarded for their good citizenship. This provides a positive incentive for good behavior (which the packet pacing work at Google has been exploiting with BBR, for example, which depends on pacing packets). Unpaced packet bursts are poison to “jitter”, that is, the variance of latency, which determines how close to real time applications (such as VOIP) can operate.
  • If there are bulk data flows with differing RTTs, the algorithm ensures (in the limit) fairness between the flows, solving an issue that TCP cannot solve by itself.
  • User simplicity: without complex packet classification schemes, FQ_CoDel schedules packets far better than previous rigid classification systems.
  • FQ_CoDel is extremely small, simple, and fast; it scales up and down in bandwidth and number of flows, down to very constrained systems. It is safe for FQ_CoDel to be “always on”.

Having an algorithm that “just works” for 99% of cases and attacks the real problem is a huge advantage and simplification. Packet classification need only be used to provide guarantees when essential, rather than to work around bufferbloat. No (sane) human can possibly successfully configure the QoS page found on today’s home router. Once bufferbloat is removed and flow queuing is present, any classification rule set can be tremendously simplified and understood.

Limitations and Warnings

If a flow queuing AQM is not present at the bottleneck in your path, nothing you do elsewhere can save you under load. At most, you can move the bottleneck to somewhere it can be managed, by inserting an artificial bottleneck.

The FQ_CoDel algorithm [RFC 8290], particularly as a queue discipline, is not a panacea for bufferbloat, as buffers hide all over today’s systems. There have been about a dozen significant improvements to reduce bufferbloat in Linux alone, starting with Tom Herbert’s BQL and including Eric Dumazet’s TCP Small Queues, TSO sizing and the FQ scheduler, to name just a few. Our thanks to everyone in the Linux community for their efforts. FQ_CoDel, implemented in its generic form as the Linux queue discipline fq_codel, only solves part of the bufferbloat problem: that found in the qdisc layer. The FQ_CoDel algorithm itself is applicable to a wide variety of circumstances, not just in networking gear, but sometimes even in applications.

WiFi and Wireless

Good examples of the buffering problems occur in wireless and other technologies such as DOCSIS and DSL. WiFi is particularly hard: rapid bandwidth variation and packet aggregation mean that WiFi drivers must have significant internal buffering. Cellular systems have similar (and different) challenges.

Toke Høiland-Jørgensen, Michał Kazior, Dave Täht, Per Hurtig and Anna Brunstrom reported amazing improvements for WiFi at the Usenix Technical Conference. Here, the FQ_CoDel algorithm is but part of a more complex algorithm that handles aggregation and transmission air-time fairness (which Linux has heretofore lacked). The results are stunning, and we hope to see similar improvements in other technologies by applying FQ_CoDel.

This chart shows the latency under load, in milliseconds, of the new WiFi algorithm now being adopted in Linux. A log scale is used so that both results can be plotted on the same graph. The green curve shows the new driver’s latency under load (the cumulative probability of the latency observed by packets), while the orange curve shows the previous Linux driver implementation. You seldom see one to two orders of magnitude improvement in anything. Note that latency can exceed one second under load with existing drivers. The implementation of air-time fairness in this algorithm also significantly increases the total bandwidth available in a busy WiFi network, by using the spectrum more efficiently, while delivering this result! See the above paper for more details and data.

This new driver structure and algorithm is in the process of being adopted in Linux, though most drivers do not yet take advantage of what it offers. You should demand that device vendors using Linux update their drivers to the new framework.

While the details may vary for different technologies, we hope the WiFi work will help illuminate techniques appropriate for applying FQ_CoDel (or other flow queuing algorithms, as appropriate) to other technologies such as cellular networks.

Performance and Comparison to Other Work

In the implementations available in Linux, FQ_CoDel has seriously outperformed all comers, as shown in “The Good, the Bad and the WiFi: Modern AQMs in a residential setting,” by Toke Høiland-Jørgensen, Per Hurtig and Anna Brunstrom. This paper is unique(?) in that it compares the algorithms consistently, running identical tests on the same up-to-date test hardware and software versions; it is an apples-to-apples comparison of running code.

Our favorite benchmark for “latency under load” is the “rrul” test from Toke’s flent test tool. It demonstrates when flows may be damaging the latency of other flows, as well as goodput and latency. Low bandwidth performance is more difficult than high bandwidth: at 1 Mbps, a single 1500 byte packet represents 12 milliseconds! The following graph measures goodput versus latency for a number of queue disciplines at two different bandwidths, 1 and 10 Mbps (not atypical of the low end of WiFi or DSL performance). It shows fq_codel outperforming all comers in latency, while preserving very high goodput for the available bandwidth. The combination of an effective self-adjusting mark/drop algorithm with flow queuing is impossible to beat. See the paper for more details and results.
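
The 12 millisecond figure is just serialization delay; a quick back-of-the-envelope check (the helper name is ours):

```python
# Serialization delay of one packet on a link: bytes * 8 bits / link rate.
def serialization_delay_ms(packet_bytes, link_mbps):
    return packet_bytes * 8 / (link_mbps * 1_000_000) * 1000

print(serialization_delay_ms(1500, 1))    # 12.0 ms at 1 Mbps
print(serialization_delay_ms(1500, 10))   # 1.2 ms at 10 Mbps
```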

[Figure: goodput versus latency for several queue disciplines at 1 and 10 Mbps]

FQ_CoDel radically outperforms CoDel, while solving problems that no conventional TCP mark/drop algorithm can solve by itself. The combination of flow queuing with CoDel is a killer improvement, and improves the behavior of CoDel (or PIE), since ACKs are not delayed.

Pfifo_fast has previously been the default queue discipline in Linux (and similar queuing is used on other operating systems); for the purposes of this test it is a simple FIFO queue. Note that in most of today’s operating systems the default length of the FIFO queue is usually ten times longer than that displayed here. The FIFO queue length here was chosen so that the differences between algorithms would remain easily visible on the same linear plot!

Other self-tuning mark/drop algorithms such as PIE [RFC 8033] are becoming available, and some of those are beginning to be combined with flow queuing to good effect. As you see above, PIE without flow queuing is not competitive (nor is CoDel). An FQ-PIE algorithm, recently implemented for FreeBSD by Rasool Al-Saadi and Grenville Armitage, is more interesting and promising. Again, flow queuing combined with an auto-adjusting mark/drop algorithm is the key to FQ-PIE’s behavior.

Availability of Modern AQM Algorithms

Implementations of both CoDel and FQ_CoDel were released in Linux 3.5 in July 2012, and in FreeBSD 11 in October 2016 (and backported to FreeBSD 10.3); FQ-PIE is only available in FreeBSD (though preliminary patches for Linux exist).

FQ_CoDel has seen extensive testing, deployment and use in Linux since its initial release. There is never a reason to use plain CoDel; it is present in Linux solely to enable convenient performance comparisons, as in the above results. FQ_CoDel first saw wide use and deployment as part of OpenWrt, when it became the default queue discipline there and was used heavily as part of the SQM (Smart Queue Management) system to mitigate “last mile” bufferbloat. FQ_CoDel is now the default queue discipline in many circumstances on most Linux distributions, and it is being used in an increasing number of commercial products.

The FQ_CoDel based WiFi improvements shown here are available in the latest LEDE/OpenWrt release for a few chips. WiFi chip vendors and router vendors take note: your competitors may be about to pummel you, as this radical improvement is now becoming available. The honor of being the first commercial product to incorporate this new WiFi code goes to the Evenroute IQrouter.

PIE, but not FQ-PIE, was mandated in DOCSIS 3.1; it is much less effective than FQ_CoDel (or FQ-PIE).

Continuing Work: CAKE

Research into other AQMs continues apace. FQ_CoDel and FQ-PIE are far from the last words on the topic.

Rate limiting is necessary to mitigate bufferbloat in equipment that has no effective queue management (such as most existing DSL or most Cable modems). While we have used fq_codel in concert with Hierarchical Token Bucket (HTB) rate limiting for years as part of the Smart Queue Management (SQM) scripts (in OpenWrt and elsewhere), one can go further in an integrated queue discipline to address problems difficult to solve piecemeal, and make configuration radically simpler. The CAKE queue discipline, available today in OpenWrt, additionally adds integrated framing compensation, diffserv support and CoDel improvements. We hope to see CAKE upstream in Linux soon. Please come help in its testing and development.
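
To see why rate limiting helps, consider this bare-bones token-bucket sketch (the names and structure are ours, not HTB’s or CAKE’s actual shaper): by releasing bytes slightly slower than the broken device downstream can drain them, the queue forms in the shaper, where fq_codel or CAKE can manage it.

```python
# Bare-bones token-bucket shaper sketch: packets are released only as
# tokens (bytes of credit) accumulate at the configured rate, so the
# queue builds here, in front of a smart queue, rather than in the
# dumb modem downstream.
import time

class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8          # bytes of credit per second
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def try_send(self, packet_len):
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_len:
            self.tokens -= packet_len
            return True    # packet may be transmitted now
        return False       # hold the packet; the smart queue absorbs the delay
```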

As you see above, the Make WiFi Fast project is making headway, and makes WiFi much preferable to cellular service when available, but much, much more is left to be done. Investigating fixing bufferbloat in WiFi made us aware of just how much additional improvement in WiFi is possible. Please come help!

Conclusion

Ye men of Interstan, to learning much inclined, see the elephant, by opening your eyes: 

ISPs, insist that your equipment manufacturers support modern flow queuing automatic queue management. Chip manufacturers: ensure your chips implement best practices in the area, or see your competitors win business.

Gamers, stock traders, musicians, and anyone wanting decent telephone or video calls, take note: insist that your ISP fix bufferbloat in its network. Linux and RFC 8290 prove you can have your cake and eat it too. Stop suffering.

Insist on up to date flow queuing queue management in devices you buy and use, and insist that bufferbloat be fixed everywhere.

Posted Mon Feb 12 00:39:15 2018