The lower-post-volume people behind the software in Debian.

I’ve been pretty quiet lately, other than short posts on G+, because I’ve been grinding hard on NTPsec. We’re coming up on a 1.0 release and, although things are going very well technically, it’s been a shit-ton of work.

One consequence is the NTPsec Project Blog. My first major post there expands on some of the things I’ve written here about stripping crap out of the NTP codebase.

Expect future posts on spinoff tools, the NTPsec test farm, and the prospects for moving NTPsec out of C, probably about one a week. I have a couple of these in draft already.

Posted Fri Dec 2 21:09:48 2016 Tags:

I pushed the patch to require resolution today; expect this to hit the general public with libinput 1.6. If your graphics tablet does not provide axis resolution, we will need to add a hwdb entry. Please file a bug in systemd and CC me on it (@whot).

How do you know if your device has resolution? Run sudo evemu-describe against the device node and look for the ABS_X/ABS_Y entries:


# Event code 0 (ABS_X)
# Value 2550
# Min 0
# Max 3968
# Fuzz 0
# Flat 0
# Resolution 13
# Event code 1 (ABS_Y)
# Value 1323
# Min 0
# Max 2240
# Fuzz 0
# Flat 0
# Resolution 13
If the Resolution value is 0, you'll need a hwdb entry or your tablet will stop working in libinput 1.6. You can file the bug now and we can get it fixed; that way it'll be in place once 1.6 comes out.
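For illustration, a hwdb entry overriding the axis resolution looks roughly like the following. The match string and resolution values here are invented, not for any real device; the authoritative format is documented in systemd's 60-evdev.hwdb:

# /etc/udev/hwdb.d/61-evdev-local.hwdb (hypothetical device)
# Format: EVDEV_ABS_<event code>=<min>:<max>:<resolution>:<fuzz>:<flat>
evdev:input:b0003v1234pABCD*
 EVDEV_ABS_00=::100
 EVDEV_ABS_01=::100

After editing, run sudo systemd-hwdb update and re-plug the device for the override to take effect.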
Posted Tue Nov 29 02:48:00 2016 Tags:

The Fedora Change to retire the synaptics driver was approved by FESCO. This will apply to Fedora 26 and is part of a cleanup to, ironically, make the synaptics driver easier to install.

Since Fedora 22, xorg-x11-drv-libinput is the preferred input driver. For historical reasons, almost all users have the xorg-x11-drv-synaptics package installed. But to actually use the synaptics driver over xorg-x11-drv-libinput requires a manually dropped xorg.conf.d snippet. And that's just not ideal. Unfortunately, in DNF/RPM we cannot just say "replace the xorg-x11-drv-synaptics package with xorg-x11-drv-libinput on update but still allow users to install xorg-x11-drv-synaptics after that".

So the path taken is a package rename. Starting with Fedora 26, xorg-x11-drv-libinput's RPM will Provide/Obsolete [1] xorg-x11-drv-synaptics and thus remove the old package on update. Users that need the synaptics driver then need to install xorg-x11-drv-synaptics-legacy. This driver will then install itself correctly without extra user intervention and will take precedence over the libinput driver. Removing xorg-x11-drv-synaptics-legacy removes the driver assignment and thus falls back to libinput for touchpads. So aside from the name change, everything else works more smoothly now. Both packages are now updated in Rawhide and should be available from your local mirror soon.

What does this mean for you as a user? If you are a synaptics user, after an update/install you now need to manually install xorg-x11-drv-synaptics-legacy. You can remove any xorg.conf.d snippets assigning the synaptics driver unless they also include other custom configuration.

See the Fedora Change page for details. Note that this is a Fedora-specific change only; the upstream change for this is already in place.

[1] "Provide" in RPM-speak means the package provides functionality otherwise provided by some other package even though it may not necessarily provide the code from that package. "Obsolete" means that installing this package replaces the obsoleted package.
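As a rough illustration (the version numbers here are made up, not taken from the actual Fedora spec file), the relevant spec file lines for such a rename look something like:

# In xorg-x11-drv-libinput.spec (illustrative values only)
Provides:  xorg-x11-drv-synaptics = 1.9.0-1
Obsoletes: xorg-x11-drv-synaptics < 1.9.0-1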

Posted Sun Nov 20 03:57:00 2016 Tags:

I've written more extensively about this here but here's an analogy that should get the point across a bit better: Wayland is just a protocol, just like HTTP. In both cases, you have two sides with very different roles and functionality. In the HTTP case, you have the server (e.g. Apache) and the client (a browser, e.g. Firefox). The communication protocol is HTTP but both sides make a lot of decisions unrelated to the protocol. The server decides what data is sent, the client decides how the data is presented to the user. Wayland is very similar. The server, called the "compositor", decides what data is sent (also: which of the clients even gets the data). The client renders the data [1] and decides what to do with input like key strokes, etc.

Asking Does $FEATURE work under Wayland? is akin to asking Does $FEATURE work under HTTP?. The only answer is: it depends on the compositor and on the client. It's the wrong question. You should ask questions related to the compositor and the client instead, e.g. "does $FEATURE work in GNOME?" or "does $FEATURE work in GTK applications?". That's a question that can be answered.

Of course, there are some cases where the fault is really the protocol itself. But often enough, it's not.

[1] Although it does so by telling the compositor to display it. The analogy with HTTP only works to some extent... :)

Posted Mon Nov 14 00:45:00 2016 Tags:

Interested in hacking on some low-level stuff and implementing a feature that's useful to a lot of laptop owners out there? We have a feature on libinput's todo list but I'm just constantly losing my fight against the ever-growing todo list. So if you already know C and you're interested in playing around with some low-level bits of software this may be the project for you.

Specifically: within libinput, we want to disable certain devices based on a lid state. In the first instance this means that when the lid switch is toggled to closed, the touchpad and trackpoint get silently disabled to not send events anymore. [1] Since it's based on a switch state, this also means that we'll now have to listen to switch events and expose those devices to libinput users.

The things required to get all this working are:

  • Designing a switch interface plus the boilerplate code required (I've done most of this bit already)
  • Extending the current evdev backend to handle devices with EV_SW and exposing their events
  • Hooking up the switch devices to internal touchpads/trackpoints to disable them ad-hoc
  • Handling those devices where the lid switch is broken in the hardware (more details on this when we get to that point)
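The intended behaviour of the first two steps can be sketched in a few lines of Python. This is a toy model, not libinput's actual code (libinput is written in C); the constant values come from linux/input-event-codes.h, where a set SW_LID state means the lid is shut:

```python
# Switch constants from linux/input-event-codes.h
EV_SW, SW_LID = 0x05, 0x00

class TouchpadGate:
    """Toy state machine: an EV_SW/SW_LID event with value 1 (lid closed)
    suppresses touchpad events, value 0 (lid open) re-enables them."""
    def __init__(self):
        self.touchpad_enabled = True

    def handle_event(self, etype, code, value):
        if etype == EV_SW and code == SW_LID:
            self.touchpad_enabled = (value == 0)

gate = TouchpadGate()
gate.handle_event(EV_SW, SW_LID, 1)  # lid closes
print(gate.touchpad_enabled)         # False: touchpad events are dropped
gate.handle_event(EV_SW, SW_LID, 0)  # lid opens
print(gate.touchpad_enabled)         # True
```

The real work is of course in wiring this up to the evdev backend and the device-pairing logic, but the core decision is this small.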

You get to dabble with libinput and a bit of udev and the kernel. Possibly Xorg stuff, but that's unlikely at this point. This project is well suited for someone with a few spare weekends ahead. It's great for someone who hasn't worked with libinput before, but it's not a project to learn C, you better know that ahead of time. I'd provide the mentoring of course (I'm in UTC+10, so expect IRC/email). If you're interested let me know. Riches and fame may happen but are not guaranteed.

[1] A number of laptops have a hw issue where either device may send random events when the lid is closed

Posted Wed Nov 2 07:14:00 2016 Tags:

I finally have a bit of time to look at touchpad pointer acceleration in libinput. But when I did, I found a grand total of 5 bugs across freedesktop.org and Red Hat's bugzilla, despite this being the first thing anyone brings up whenever libinput is mentioned. 5 bugs - that's not much to work on. Note that over time there were also a lot of bugs where pointer acceleration was fixed once the touchpad's axis ranges were corrected, which is usually a two-liner for the udev hwdb.

Anyway, point of this post: if you're still having issues with pointer acceleration on your touchpad in libinput, please file a bug against libinput and make it block the new tracker bug 98535. The libinput documentation has instructions on how to report a touchpad bug, but amongst the various things I need is your laptop model name.

Don't complain about it on reddit, phoronix, HN, or in some random forum, because you're just wasting bytes there and it won't get fixed that way.

Posted Tue Nov 1 21:50:00 2016 Tags:
My comments to NIST on the first draft of their call for submissions. #standardization #nist #pqcrypto
Posted Sun Oct 30 23:01:57 2016 Tags:

[ This blog was crossposted on Software Freedom Conservancy's website. ]

As I mentioned in an earlier blog post, I had the privilege of attending Embedded Linux Conference Europe (ELC EU) and the OpenWrt Summit in Berlin, Germany earlier this month. I gave a talk (for which the video is available below) at the OpenWrt Summit. I also had the opportunity to host the first of many conference sessions seeking feedback and input from the Linux developer community about Conservancy's GPL Compliance Project for Linux Developers.

ELC EU has no “BoF Board” where you can post informal sessions. So, we scheduled the session by word of mouth over a lunch hour. We nevertheless got a good turnout of about 15 people (given that our session's main competition was eating food :).

Most notably and excitingly, Harald Welte, well-known Netfilter developer and leader of gpl-violations.org, was able to attend. Harald talked about his work with gpl-violations.org enforcing his own copyrights in Linux, and explained why this was important work for users of the violating devices. He also pointed out that some of the companies that were sued during his most active period of gpl-violations.org are now regular upstream contributors.

Two people who work in the for-profit license compliance industry attended as well. Some of the discussion focused on the usual debates that charities involved in compliance commonly have with the for-profit compliance industry. Specifically, one of them asked: how much compliance is enough, by percentage? I responded to his question on two axes. First, I addressed the axis of how many enforcement matters the GPL Compliance Program for Linux Developers takes on, as a percentage of products violating the GPL. There are, at any given time, hundreds of documented GPL violating products, and our coalition works on only a tiny percentage of those per year. It's a sad fact that only that tiny percentage of the products that violate Linux are actually pursued to compliance.

On the other axis, I discussed the percentage on a per-product basis. From that point of view, the question is really: Is there a ‘close enough to compliance’ that we can as a community accept and forget about the remainder? From my point of view, we frequently compromise anyway, since the GPL doesn't require someone to prepare code properly for upstream contribution. Thus, we all often accept compliance once someone completes the bare minimum of obligations literally written in the GPL but gives us a source release that cannot easily be converted to an upstream contribution. So, from that point of view, we're often accepting a less-than-optimal outcome. The GPL by itself does not inspire upstreaming; the other collaboration techniques that are enabled in our community because of the GPL work to finish that job, and adherence to the Principles assures that process can work. Having many people who work with companies in different ways assures that as a larger community, we try all the different strategies to encourage participation, and inspire today's violators to become tomorrow's upstream contributors — as Harald mentioned has already often happened.

That same axis does include one rare but important compliance problem: when a violator is particularly savvy, and refuses to release very specific parts of their Linux code (as VMware did), even though the license requires it. In those cases, we certainly cannot and should not accept anything less than required compliance — lest companies begin holding back all the most interesting parts of the code that the GPL requires them to produce. If that happened, the GPL would cease to function correctly for Linux.

After that part of the discussion, we turned to considerations of corporate contributors, and how they responded to enforcement. Wolfram Sang, one of the developers in Conservancy's coalition, spoke up on this point. He expressed that the focus on for-profit company contributions, and the achievements of those companies, seemed unduly prioritized by some in the community. As an independent contractor and individual developer, Wolfram believes that contributions from people like him are essential to a diverse developer base, that their opinions should be taken into account, and their achievements respected.

I found Wolfram's points particularly salient. My view is that Free Software development, including for Linux, succeeds because powerful and wealthy entities and individuals alike contribute and collaborate together on equal footing. While companies have typically enforced the GPL on their own copyrights only for business reasons (e.g., there is at least one example of a major Linux-contributing company using GPL enforcement merely as a counter-punch in a patent lawsuit), individual developers who join Conservancy's coalition follow community principles and enforce to defend the rights of their users.

At the end of the session, I asked two developers who hadn't spoken during the session, and who aren't members of Conservancy's coalition, their opinion on how enforcement was historically carried out by gpl-violations.org, and how it is currently carried out by Conservancy's GPL Compliance Program for Linux Developers. Both responded with a simple response (paraphrased): it seems like a good thing to do; keep doing it!

I finished up the session by inviting everyone to join the principles-discuss list, where public discussion about GPL enforcement under the Principles has already begun. I also invited everyone to attend my talk, which took place an hour later at the OpenWrt Summit, co-located with ELC EU.

In that talk, I spoke about a specific example of community success in GPL enforcement. As explained on the OpenWrt history page, OpenWrt was initially made possible thanks to GPL enforcement done by BusyBox and Linux contributors in a coalition together. (Those who want to hear more about the connection between GPL enforcement and OpenWrt can view my talk.)

Since there weren't opportunities to promote impromptu sessions on-site, this event was a low-key (but still quite nice) start to Conservancy's planned year-long effort seeking feedback about GPL compliance and enforcement. Our next session is an official BoF session at Linux Plumbers Conference, scheduled for next Thursday 3 November at 18:00. It will be led by my colleagues Karen Sandler and Brett Smith.

Posted Thu Oct 27 20:47:00 2016 Tags:
I always thought that Mathematica’s language (now coined the Wolfram Language) is well conceived. But the implementation has lots of oddities. I usually just shrug and move on, but I think I should document them. I won’t wait until I’ve collected a set but instead write as soon as I (re)discover or remember them.
Posted Wed Oct 12 06:02:44 2016 Tags:

Short version: the master branch of Mono now has support for TLS 1.2 out of the box. This means that SslStream now uses TLS 1.2, and uses of HttpWebRequest for HTTPS endpoints also use TLS 1.2 on the desktop.

This brings TLS 1.2 to Mono on Unix/Linux in addition to Xamarin.{Mac,iOS,tvOS} which were already enabled to use TLS 1.2 via the native Apple TLS stack.

To use it, install your fresh version of Mono and then run the btls-cert-sync command, which will convert your existing list of trusted certificates to the new format (if you used cert-sync or mozroots in the past).

In Detail

The new version of Mono now embeds Google's BoringSSL as the TLS implementation to use.

Last year, you might remember that we completed a C# implementation of TLS 1.2. But we were afraid of releasing a TLS stack that had not been audited, that might contain exploitable holes, and that we did not have the cryptographic chops to ensure that the implementation was bullet proof.

So we decided that rather than ship a brand new TLS implementation we would use a TLS implementation that had been audited and was under active development.

So we picked BoringSSL, which is Google's fork of OpenSSL. This is the stack that powers Android and Google Chrome, so we felt more comfortable using it than a brand new implementation.

Linux Distributions

We are considering adding a --with-openssl-cert-directory= option to the configure script so that Linux distributions that package Mono could pass a directory that contains trusted root certificates in the format expected by OpenSSL.

Let us discuss the details on mono-devel-list@lists.dot.net.

Posted Sat Oct 1 01:34:52 2016 Tags:

I just shipped what was probably the silliest and most pointless software release of my career. But hey, it’s the reference implementation of a language and I’m funny that way.

Because I write compilers for fun, I have a standing offer out to reimplement any weird old language for which I am sent a sufficiently detailed softcopy spec. (I had to specify softcopy because scanning and typo-correcting hardcopy is too much work.)

In the quarter-century this offer has been active, I have (re)implemented at least the following: INTERCAL, Michigan Algorithmic Decoder, a pair of obscure 1960s teaching languages called CORC and CUPL, and an obscure computer-aided-instruction language called Pilot.

Pilot…that one was special. Not in a good way, alas. I don’t know where I bumped into a friend of the language’s implementor, but it was in 1991 when he had just succeeded in getting IEEE to issue a standard for it – IEEE Std 1154-1991. He gave me a copy of the standard.

I should have been clued in by the fact that he also gave me an errata sheet not much shorter than the standard. But the full horror did not come home to me until I sat down and had a good look at both documents – and, friends, PILOT’s design was exceeded in awfulness only by the sloppiness and vagueness of its standard. Even after the corrections.

But I had promised to do a reference implementation, and I did. Delivered it to the inventor’s friend. He couldn’t get it to work – some problem with the version of YACC he was using, if I recall correctly. It wasn’t something I could fix remotely, and I left it to him to figure out, being pretty disgusted with the project. I don’t know if he ever did.

I did fix a couple of minor bugs in my masters; I even shipped occasional releases until late 1996. Then…I let the code molder in a corner for twenty years.

But these things have a way of coming back on you. I got a set of fixes recently from one Frank J. Lhota, forward-porting it to use modern Bison and Flex versions. Dear sweet fornicating Goddess, that meant I’d have to…issue another release. Because it’s bad form to let fix patches drop on the floor pour discourager les autres.

So here it is. It does have one point of mild interest; the implementation is both an interpreter and a compiler (it’s a floor wax! It’s a dessert topping!) for the language – that is, it can either interpret the parsed syntax tree or generate and compile corresponding C code.

I devoutly hope I never again implement a language design as botched as Pilot. INTERCAL was supposed to be a joke…

Posted Tue Sep 27 05:28:36 2016 Tags:

So, the Washington Post publishes yet another bullshit article on gun policy.

In this one, the NRA is charged with racism because it doesn’t leap to defend the right of black men to bear arms without incurring a lethal level of police suspicion.

In a previous blog post, I considered some relevant numbers. At 12% of the population, blacks commit 50% of violent index crimes. If you restrict to males in the age range that concentrates criminal behavior, the numbers work out to a black suspect being a more likely homicidal threat to cops and public safety by at least 26:1.

I haven’t worked out how the conditional probabilities crunch out if you have the prior that your suspect is armed, but it probably makes that 26:1 ratio worse rather than better.

Police who react to a random black male behaving suspiciously, who might be in the critical age range, as though he is a near-imminent lethal threat are being rational, not racist. They’re doing what crime statistics and street-level experience train them to do, and they’re right to do it. This was true even before the post-Ferguson wave of deliberate assassinations of police by blacks.

The NRA would, I’m sure, love to defend the RKBA of a black man who isn’t a thug or gangbanger. So would I. The trouble is that when you’re considering police stops nice cases like that are damned thin on the ground.

Seriously, the victims in these stop-and-shoot cases pretty much always turn out to have a history of violent behavior and rap sheets as long as your arm. Often (as in the recent Terence Crutcher case) there is PCP or some other dissociative anaesthetic in their system or within arm’s reach.

It’s hardly any wonder the NRA doesn’t want to spend reputational capital defending the RKBA of these mooks. I wouldn’t either in their shoes; this is not racism, it’s a very rational reluctance to get one’s cause entangled with the scum of the earth.

I cannot help but think that articles like the Post’s are intended to put the NRA on the horns of a dilemma: remain silent in these cases and be falsely accused of racism, versus speaking up and hearing “Aha! So all that other posturing was bogus, you think it’s just fine for hardened criminals to have guns!”

Sigh…

Posted Sat Sep 24 14:32:55 2016 Tags:

[ This blog was crossposted on Software Freedom Conservancy's website. ]

Last month, Conservancy made a public commitment to attend Linux-related events to get feedback from developers about our work generally, and Conservancy's GPL Compliance Program for Linux Developers specifically. As always, even before that, we were regularly submitting talks to nearly any event with Linux in its name. As a small charity, we always request travel funding from the organizers, who are often quite gracious. As I mentioned in my blog posts about LCA 2016 and GUADEC 2016, the organizers covered my travel funding there, and recently Karen and I both received travel funding to speak at LCA 2017 and DebConf 2016, as well as many other events this year.

Recently, I submitted talks for the CFPs of Linux Foundation's Embedded Linux Conference Europe (ELC EU) and the Prpl Foundation's OpenWRT Summit. The latter was accepted, and the folks at the Prpl Foundation graciously offered to fund my flight costs to speak at the OpenWRT Summit! I've never spoken at an OpenWRT event before, I'm looking forward to the opportunity to get to know the OpenWRT and LEDE communities better by speaking at that event, and I am excited to discuss Conservancy's work with them.

OpenWRT Summit, while co-located, is a wholly separate event from LF's ELC EU. Unfortunately, I was not so lucky in my talk submissions there: my talk proposal has been waitlisted since July. I was hopeful after a talk cancellation in mid-August. (I know because the speaker who canceled suggested that I request his slot for my waitlisted talk.) Unfortunately, the LF staff informed me that they understandably filled his open slot with a sponsored session that came in.

The good news is that my OpenWRT Summit flight is booked, and my friend (and Conservancy Board Member Emeritus) Loïc Dachary (who lives in Berlin) has agreed to let me crash with him for that week. So, I'll be in town for the entirety of ELC EU with almost no direct travel costs to Conservancy! The bad news is that it seems my ELC EU talk remains waitlisted. Therefore, I don't have a confirmed registration for the rest of ELC EU (beyond OpenWRT Summit).

While it seems like a perfect and cost-effective opportunity to attend both events, that turns out to be harder than I thought! Once I confirmed my OpenWRT Summit travel arrangements, I asked for the hobbyist discount to register for ELC EU, but LF staff informed me yesterday that the hobbyist discount (as well as the other discounts) is sold out. The moral of the story is that logistics are just plain tough and time-consuming when you work for a charity with an extremely limited travel budget. ☻

Yet, it seems a shame to waste the opportunity of being in town with so many Linux developers and not being able to see or talk to them, so Conservancy is asking for some help from you to fund the $680 of my registration costs for ELC EU. That's just about six new Conservancy supporter signups, so I hope we can get six new Supporters before Linux Foundation's ELC EU conference begins on October 10th. Either way, I look forward to seeing those developers who attend the co-located OpenWRT Summit! And, if the logistics work out — perhaps I'll see you at ELC EU as well!

Posted Wed Sep 21 21:30:00 2016 Tags:

First a definition: a trackstick is also called trackpoint, pointing stick, or "that red knob between G, H, and B". I'll be using trackstick here, because why not.

This post is the continuation of libinput and the Lenovo T450 and T460 series touchpads where we focused on a stalling pointer when moving the finger really slowly. Turns out the T460s at least, and possibly others in the *60 series, have another bug that caused much worse behaviour, but we didn't notice for ages because we were focusing on the high-precision cursor movement. Specifically, the pointer would just randomly stop moving for a short while (spoiler alert: 300ms), regardless of the movement speed.

libinput has built-in palm detection and one of the things it does is to disable the touchpad when the trackstick is in use. It's not uncommon to rest the hand near or on the touchpad while using the trackstick and any detected touch would cause interference with the pointer motion. So events from the touchpad are ignored whenever the trackpoint sends events. [1]

On (some of) the T460s the trackpoint sends spurious events. In the recording I have, we see random events at 9s, then again 3.5s later, then 14s later, then 2s later, etc. Each time, our palm detection code would assume the trackpoint was in use and disable the touchpad for 300ms. If you were using the touchpad while this was happening, the touchpad would suddenly stop moving for 300ms and then continue as normal. Depending on how often these spurious events come in and the user's current caffeination state, this was somewhere between odd, annoying and infuriating.
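The timeout logic described above can be sketched like this. This is a toy Python model with my own naming, not libinput's actual C implementation:

```python
DISABLE_WINDOW = 0.300  # seconds, the 300ms mentioned above

class PalmDetector:
    """Toy model of the behaviour: touchpad events arriving within
    300ms of the last trackpoint event are discarded."""
    def __init__(self):
        self.last_trackpoint_time = float("-inf")

    def trackpoint_event(self, t):
        self.last_trackpoint_time = t

    def touchpad_event_allowed(self, t):
        return (t - self.last_trackpoint_time) > DISABLE_WINDOW

pd = PalmDetector()
pd.trackpoint_event(9.0)                # a spurious trackpoint event at 9s
print(pd.touchpad_event_allowed(9.1))   # False: still inside the 300ms window
print(pd.touchpad_event_allowed(9.5))   # True: the window has expired
```

With spurious trackpoint events feeding into this, every such event silently opens another 300ms window in which touchpad motion is thrown away, which is exactly the stall users saw.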

The good news is: this is fixed in libinput now. libinput 1.5 and the upcoming 1.4.3 releases will have a fix that ignores these spurious events and makes the touchpad stalls a footnote of history. Hooray.

[1] we still allow touchpad physical button presses, and trackpoint button clicks won't disable the touchpad

Posted Tue Sep 20 06:43:00 2016 Tags:

This post explains how the evdev protocol works. After reading this post you should understand what evdev is and how to interpret evdev event dumps to understand what your device is doing. The post is aimed mainly at users having to debug a device, I will thus leave out or simplify some of the technical details. I'll be using the output from evemu-record as example because that is the primary debugging tool for evdev.

What is evdev?

evdev is a Linux-only generic protocol that the kernel uses to forward information and events about input devices to userspace. It's not just for mice and keyboards but any device that has any sort of axis, key or button, including things like webcams and remote controls. Each device is represented as a device node in the form of /dev/input/event0, with the trailing number increasing as you add more devices. The node numbers are re-used after you unplug a device, so don't hardcode the device node into a script. The device nodes are also only readable by root, thus you need to run any debugging tools as root too.

evdev is the primary way to talk to input devices on Linux. All X.Org input drivers on Linux use evdev as the protocol, and so does libinput. Note that "evdev" is also the shortcut used for xf86-input-evdev, the X.Org driver that handles generic evdev devices, so watch out for context when you read "evdev" on a mailing list.

Communicating with evdev devices

Communicating with a device is simple: open the device node and read from it. Any data coming out is a struct input_event, defined in /usr/include/linux/input.h:


struct input_event {
    struct timeval time;
    __u16 type;
    __u16 code;
    __s32 value;
};
I'll describe the contents later, but you can see that it's a very simple struct.
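As a sketch of how little is involved, here is how one could decode a raw event in Python. This assumes a 64-bit machine, where struct timeval is two 8-byte longs; in real code you should use libevdev instead:

```python
import struct

# struct input_event on 64-bit: two longs for the timeval (seconds,
# microseconds), then __u16 type, __u16 code and __s32 value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 24 bytes on 64-bit

def decode_event(buf):
    """Decode one raw evdev event into a small dict."""
    sec, usec, etype, code, value = struct.unpack(EVENT_FORMAT, buf)
    return {"time": sec + usec / 1e6, "type": etype, "code": code, "value": value}

# A fabricated raw event: type 2 (EV_REL), code 0 (REL_X), value 1
raw = struct.pack(EVENT_FORMAT, 0, 335996, 2, 0, 1)
print(decode_event(raw))
```

Reading a device node is then just a loop of read EVENT_SIZE bytes, decode, repeat.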

Static information about the device, such as its name and capabilities, can be queried with a set of ioctls. Note that you should always use libevdev to interact with a device; it blunts the few sharp edges evdev has. See the libevdev documentation for usage examples.

evemu-record, our primary debugging tool for anything evdev, is very simple. It reads the static information about the device, prints it, and then simply reads and prints all events as they come in. The output is in a machine-readable format, but it's annotated with human-readable comments (starting with #). You can always ignore the non-comment bits. There's a second command, evemu-describe, that only prints the description and exits without waiting for events.

Relative devices and keyboards

The top part of an evemu-record output is the device description. This is a list of static properties that tells us what the device is capable of. For example, the USB mouse I have plugged in here prints:


# Input device name: "PIXART USB OPTICAL MOUSE"
# Input device ID: bus 0x03 vendor 0x93a product 0x2510 version 0x110
# Supported events:
# Event type 0 (EV_SYN)
# Event code 0 (SYN_REPORT)
# Event code 1 (SYN_CONFIG)
# Event code 2 (SYN_MT_REPORT)
# Event code 3 (SYN_DROPPED)
# Event code 4 ((null))
# Event code 5 ((null))
# Event code 6 ((null))
# Event code 7 ((null))
# Event code 8 ((null))
# Event code 9 ((null))
# Event code 10 ((null))
# Event code 11 ((null))
# Event code 12 ((null))
# Event code 13 ((null))
# Event code 14 ((null))
# Event type 1 (EV_KEY)
# Event code 272 (BTN_LEFT)
# Event code 273 (BTN_RIGHT)
# Event code 274 (BTN_MIDDLE)
# Event type 2 (EV_REL)
# Event code 0 (REL_X)
# Event code 1 (REL_Y)
# Event code 8 (REL_WHEEL)
# Event type 4 (EV_MSC)
# Event code 4 (MSC_SCAN)
# Properties:
The device name is the one (usually) set by the manufacturer, and so are the vendor and product IDs. The bus is one of the constants like BUS_USB defined in /usr/include/linux/input.h. The version is often quite arbitrary; only a few devices have something meaningful here.

We also have a set of supported events, categorised by "event type" and "event code" (note how type and code are also part of the struct input_event). The type is a general category, and /usr/include/linux/input-event-codes.h defines quite a few of those. The most important types are EV_KEY (keys and buttons), EV_REL (relative axes) and EV_ABS (absolute axes). In the output above we can see that we have EV_KEY and EV_REL set.

As a subitem of each type we have the event code. The event codes for this device are self-explanatory: BTN_LEFT, BTN_RIGHT and BTN_MIDDLE are the left, right and middle button. The axes are a relative x axis, a relative y axis and a wheel axis (i.e. a mouse wheel). EV_MSC/MSC_SCAN is used for raw scancodes and you can usually ignore it. And finally we have the EV_SYN bits but let's ignore those, they are always set for all devices.

Note that an event code cannot stand on its own; it must be a tuple of (type, code). For example, REL_X and ABS_X have the same numerical value and without the type you won't know which one is which.
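A quick illustration with the actual constant values (taken from linux/input-event-codes.h):

```python
# Event type and code constants from linux/input-event-codes.h
EV_REL, EV_ABS = 0x02, 0x03
REL_X, ABS_X = 0x00, 0x00

# The bare code is ambiguous: REL_X and ABS_X are both 0...
assert REL_X == ABS_X
# ...so only the (type, code) tuple identifies the axis unambiguously:
assert (EV_REL, REL_X) != (EV_ABS, ABS_X)
print("event codes only make sense as (type, code) tuples")
```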

That's pretty much it. A keyboard will have a lot of EV_KEY bits set and the EV_REL axes are obviously missing (but not always...). Instead of BTN_LEFT, a keyboard would have e.g. KEY_ESC, KEY_A, KEY_B, etc. 90% of device debugging is looking at the event codes and figuring out which ones are missing or shouldn't be there.

Exercise: You should now be able to read an evemu-record description from any mouse or keyboard device connected to your computer and understand what it means. This also applies to most special devices such as remotes - the only thing that changes are the names for the keys/buttons. Just run sudo evemu-describe and pick any device in the list.

The events from relative devices and keyboards

evdev is a serialised protocol. It sends a series of events and then a synchronisation event to notify us that the preceding events all belong together. This synchronisation event, EV_SYN SYN_REPORT, is generated by the kernel, not the device, and hence all EV_SYN codes are always available on all devices.

Let's have a look at a mouse movement. As explained above, half the line is machine-readable but we can ignore that bit and look at the human-readable output on the right.


E: 0.335996 0002 0000 0001 # EV_REL / REL_X 1
E: 0.335996 0002 0001 -002 # EV_REL / REL_Y -2
E: 0.335996 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
This means that within one hardware event, we've moved 1 device unit to the right (x axis) and two device units up (y axis). Note how all events have the same timestamp (0.335996).

Let's have a look at a button press:


E: 0.656004 0004 0004 589825 # EV_MSC / MSC_SCAN 589825
E: 0.656004 0001 0110 0001 # EV_KEY / BTN_LEFT 1
E: 0.656004 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
E: 0.727002 0004 0004 589825 # EV_MSC / MSC_SCAN 589825
E: 0.727002 0001 0110 0000 # EV_KEY / BTN_LEFT 0
E: 0.727002 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
For button events, a value of 1 signals button pressed, a value of 0 signals button released.

And key events look like this:


E: 0.000000 0004 0004 458792 # EV_MSC / MSC_SCAN 458792
E: 0.000000 0001 001c 0000 # EV_KEY / KEY_ENTER 0
E: 0.000000 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
E: 0.560004 0004 0004 458976 # EV_MSC / MSC_SCAN 458976
E: 0.560004 0001 001d 0001 # EV_KEY / KEY_LEFTCTRL 1
E: 0.560004 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
[....]
E: 1.172732 0001 001d 0002 # EV_KEY / KEY_LEFTCTRL 2
E: 1.172732 0000 0000 0001 # ------------ SYN_REPORT (1) ----------
E: 1.200004 0004 0004 458758 # EV_MSC / MSC_SCAN 458758
E: 1.200004 0001 002e 0001 # EV_KEY / KEY_C 1
E: 1.200004 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
Mostly the same as button events. But wait, there is one difference: we have a value of 2 as well. For key events, a value 2 means "key repeat". If you're on the tty, then this is what generates repeat keys for you. In X and Wayland we ignore these repeat events and instead use XKB-based key repeat.
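A hypothetical decoder for the EV_KEY value field would look like this (key_action() is made up for illustration; the 0/1/2 values are the evdev ones):

```python
def key_action(value):
    """Interpret the value field of an EV_KEY event."""
    return {0: "released", 1: "pressed", 2: "repeat"}[value]

# From the recording above: KEY_LEFTCTRL 1, then KEY_LEFTCTRL 2
print(key_action(1))  # pressed
print(key_action(2))  # repeat
```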

Now look at the keyboard events again and see if you can make sense of the sequence. We have an Enter release (but no press), then ctrl down (and repeat), followed by a 'c' press - but no release. The explanation is simple - as soon as I hit enter in the terminal, evemu-record started recording, so it captured the enter release too. And it stopped recording as soon as ctrl+c was down, because that's when it was cancelled by the terminal. One important takeaway here: the evdev protocol is not guaranteed to be balanced. You may see a release for a key you've never seen the press for, and you may be missing a release for a key/button you've seen the press for (this happens when you stop recording). Oh, and there's one danger: if you record your keyboard and you type your password, the keys will show up in the output. Security experts generally recommend not publishing event logs with your password in them.

Exercise: You should now be able to read an evemu-record events list from any mouse or keyboard device connected to your computer and understand the event sequence. This also applies to most special devices such as remotes - the only thing that changes are the names for the keys/buttons. Just run sudo evemu-record and pick any device listed.

Absolute devices

Things get a bit more complicated when we look at absolute input devices like a touchscreen or a touchpad. Yes, touchpads are absolute devices in hardware and the conversion to relative events is done in userspace by e.g. libinput. The output of my touchpad is below. Note that I've manually removed a few bits to make it easier to grasp, they will appear later in the multitouch discussion.


# Input device name: "SynPS/2 Synaptics TouchPad"
# Input device ID: bus 0x11 vendor 0x02 product 0x07 version 0x1b1
# Supported events:
# Event type 0 (EV_SYN)
# Event code 0 (SYN_REPORT)
# Event code 1 (SYN_CONFIG)
# Event code 2 (SYN_MT_REPORT)
# Event code 3 (SYN_DROPPED)
# Event code 4 ((null))
# Event code 5 ((null))
# Event code 6 ((null))
# Event code 7 ((null))
# Event code 8 ((null))
# Event code 9 ((null))
# Event code 10 ((null))
# Event code 11 ((null))
# Event code 12 ((null))
# Event code 13 ((null))
# Event code 14 ((null))
# Event type 1 (EV_KEY)
# Event code 272 (BTN_LEFT)
# Event code 325 (BTN_TOOL_FINGER)
# Event code 328 (BTN_TOOL_QUINTTAP)
# Event code 330 (BTN_TOUCH)
# Event code 333 (BTN_TOOL_DOUBLETAP)
# Event code 334 (BTN_TOOL_TRIPLETAP)
# Event code 335 (BTN_TOOL_QUADTAP)
# Event type 3 (EV_ABS)
# Event code 0 (ABS_X)
# Value 2919
# Min 1024
# Max 5112
# Fuzz 0
# Flat 0
# Resolution 42
# Event code 1 (ABS_Y)
# Value 3711
# Min 2024
# Max 4832
# Fuzz 0
# Flat 0
# Resolution 42
# Event code 24 (ABS_PRESSURE)
# Value 0
# Min 0
# Max 255
# Fuzz 0
# Flat 0
# Resolution 0
# Event code 28 (ABS_TOOL_WIDTH)
# Value 0
# Min 0
# Max 15
# Fuzz 0
# Flat 0
# Resolution 0
# Properties:
# Property type 0 (INPUT_PROP_POINTER)
# Property type 2 (INPUT_PROP_BUTTONPAD)
# Property type 4 (INPUT_PROP_TOPBUTTONPAD)
We have a BTN_LEFT again and a set of other buttons that I'll explain in a second. But first we look at the EV_ABS output. We have the same naming system as above. ABS_X and ABS_Y are the x and y axis on the device, ABS_PRESSURE is an (arbitrary) ranged pressure value.

Absolute axes have a bit more state than just a simple bit. Specifically, they have a minimum and maximum (not all hardware has the top-left sensor position on 0/0, it can be an arbitrary position, specified by the minimum). Notable here is that the axis ranges are simply the ones announced by the device - there is no guarantee that the values fall within this range, and indeed a lot of touchpad devices tend to send values slightly outside that range. Fuzz and flat can be safely ignored, but resolution is interesting. It is given in units per millimeter and thus tells us the size of the device. In the above case, (5112 - 1024)/42 means the device is 97mm wide. The resolution is quite commonly wrong; a lot of axis overrides need the resolution changed to the correct value.

The axis description also has a current value listed. The kernel only sends events when the value changes, so even if the actual hardware keeps sending events, you may never see them in the output if the value remains the same. In other words, holding a finger perfectly still on a touchpad creates plenty of hardware events, but you won't see anything coming out of the event node.

Finally, we have properties on this device. These are used to indicate general information about the device that's not otherwise obvious. In this case INPUT_PROP_POINTER tells us that we need a pointer for this device (it is a touchpad after all, a touchscreen would instead have INPUT_PROP_DIRECT set). INPUT_PROP_BUTTONPAD means that this is a so-called clickpad, it does not have separate physical buttons but instead the whole touchpad clicks. Ignore INPUT_PROP_TOPBUTTONPAD because it only applies to the Lenovo *40 series of devices.

Ok, back to the buttons: aside from BTN_LEFT, we have BTN_TOUCH. This one signals that the user is touching the surface of the touchpad (with some in-kernel defined minimum pressure value). It's not just for finger touches, it's also used for graphics tablet stylus touches (so really, it's more "contact" than "touch", but meh).

The BTN_TOOL_FINGER event tells us that a finger is in detectable range. This gives us two bits of information: first, we have a finger (a tablet would have e.g. BTN_TOOL_PEN) and second, we may have a finger in proximity without touching. On many touchpads, BTN_TOOL_FINGER and BTN_TOUCH come in the same event, but others can detect a finger hovering over the touchpad too (in which case you'd also hope for ABS_DISTANCE being available on the touchpad).

Finally, the BTN_TOOL_DOUBLETAP up to BTN_TOOL_QUINTTAP tell us whether the device can detect 2 through to 5 fingers on the touchpad. This doesn't actually track the fingers, it merely tells you "3 fingers down" in the case of BTN_TOOL_TRIPLETAP.

Exercise: Look at your touchpad's description and figure out if the size of the touchpad is correct based on the axis information [1]. Check how many fingers your touchpad can detect and whether it can do pressure or distance detection.

The events from absolute devices

Events from absolute axes are not really any different than events from relative devices which we already covered. The same type/code combination with a value and a timestamp, all framed by EV_SYN SYN_REPORT events. Here's an example of me touching the touchpad:


E: 0.000001 0001 014a 0001 # EV_KEY / BTN_TOUCH 1
E: 0.000001 0003 0000 3335 # EV_ABS / ABS_X 3335
E: 0.000001 0003 0001 3308 # EV_ABS / ABS_Y 3308
E: 0.000001 0003 0018 0069 # EV_ABS / ABS_PRESSURE 69
E: 0.000001 0001 0145 0001 # EV_KEY / BTN_TOOL_FINGER 1
E: 0.000001 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +0ms
E: 0.021751 0003 0018 0070 # EV_ABS / ABS_PRESSURE 70
E: 0.021751 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +21ms
E: 0.043908 0003 0000 3334 # EV_ABS / ABS_X 3334
E: 0.043908 0003 0001 3309 # EV_ABS / ABS_Y 3309
E: 0.043908 0003 0018 0065 # EV_ABS / ABS_PRESSURE 65
E: 0.043908 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +22ms
E: 0.052469 0001 014a 0000 # EV_KEY / BTN_TOUCH 0
E: 0.052469 0003 0018 0000 # EV_ABS / ABS_PRESSURE 0
E: 0.052469 0001 0145 0000 # EV_KEY / BTN_TOOL_FINGER 0
E: 0.052469 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +9ms
In the first event you see BTN_TOOL_FINGER and BTN_TOUCH set (this touchpad doesn't detect hovering fingers). An x/y coordinate pair and a pressure value. The pressure changes in the second event, the third event changes pressure and location. Finally, we have BTN_TOOL_FINGER and BTN_TOUCH released on finger up, and the pressure value goes back to 0. Notice how the second event didn't contain any x/y coordinates? As I said above, the kernel only sends updates on absolute axes when the value changed.

Ok, let's look at a three-finger tap (again, minus the ABS_MT_ bits):


E: 0.000001 0001 014a 0001 # EV_KEY / BTN_TOUCH 1
E: 0.000001 0003 0000 2149 # EV_ABS / ABS_X 2149
E: 0.000001 0003 0001 3747 # EV_ABS / ABS_Y 3747
E: 0.000001 0003 0018 0066 # EV_ABS / ABS_PRESSURE 66
E: 0.000001 0001 014e 0001 # EV_KEY / BTN_TOOL_TRIPLETAP 1
E: 0.000001 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +0ms
E: 0.034209 0003 0000 2148 # EV_ABS / ABS_X 2148
E: 0.034209 0003 0018 0064 # EV_ABS / ABS_PRESSURE 64
E: 0.034209 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +34ms
[...]
E: 0.138510 0003 0000 4286 # EV_ABS / ABS_X 4286
E: 0.138510 0003 0001 3350 # EV_ABS / ABS_Y 3350
E: 0.138510 0003 0018 0055 # EV_ABS / ABS_PRESSURE 55
E: 0.138510 0001 0145 0001 # EV_KEY / BTN_TOOL_FINGER 1
E: 0.138510 0001 014e 0000 # EV_KEY / BTN_TOOL_TRIPLETAP 0
E: 0.138510 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +23ms
E: 0.147834 0003 0000 4287 # EV_ABS / ABS_X 4287
E: 0.147834 0003 0001 3351 # EV_ABS / ABS_Y 3351
E: 0.147834 0003 0018 0037 # EV_ABS / ABS_PRESSURE 37
E: 0.147834 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +9ms
E: 0.157151 0001 014a 0000 # EV_KEY / BTN_TOUCH 0
E: 0.157151 0003 0018 0000 # EV_ABS / ABS_PRESSURE 0
E: 0.157151 0001 0145 0000 # EV_KEY / BTN_TOOL_FINGER 0
E: 0.157151 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
In the first event, the touchpad detected all three fingers at the same time, so we get BTN_TOUCH, x/y/pressure and BTN_TOOL_TRIPLETAP set. Note that the various BTN_TOOL_* bits are mutually exclusive. BTN_TOOL_FINGER means "exactly 1 finger down" and you can't have exactly 1 finger down when you have three fingers down. In the second event, x and pressure update (y has no event, it stayed the same).

In the event after the break, we switch from three fingers to one finger. BTN_TOOL_TRIPLETAP is released, BTN_TOOL_FINGER is set. That's very common. Humans aren't robots, you can't release all fingers at exactly the same time, so depending on the hardware scanout rate you have intermediate states where one finger has left already, others are still down. In this case I released two fingers between scanouts, one was still down. It's not uncommon to see a full cycle from BTN_TOOL_FINGER to BTN_TOOL_DOUBLETAP to BTN_TOOL_TRIPLETAP on finger down or the reverse on finger up.

Exercise: test out the pressure values on your touchpad and see how close you can get to the actual announced range. Check how accurate the multifinger detection is by tapping with two, three, four and five fingers. (In both cases, you'll likely find that it's very much hit and miss).

Multitouch and slots

Now we're at the most complicated topic regarding evdev devices. In the case of multitouch devices, we need to send multiple touches on the same axes. So we need an additional dimension and that is called multitouch slots (there is another, older multitouch protocol that doesn't use slots but it is so rare now that you don't need to bother).

First: all axes that are multitouch-capable are repeated as ABS_MT_foo axes. So if you have ABS_X, you also get ABS_MT_POSITION_X, and both axes have the same axis ranges and resolutions. The reason here is backwards compatibility: if a device only sent multitouch events, older programs listening only to the ABS_X etc. events wouldn't work. Some axes may only be available for single-touch (ABS_TOOL_WIDTH in this case).

Let's have a look at my touchpad, this time without the axes removed:


# Input device name: "SynPS/2 Synaptics TouchPad"
# Input device ID: bus 0x11 vendor 0x02 product 0x07 version 0x1b1
# Supported events:
# Event type 0 (EV_SYN)
# Event code 0 (SYN_REPORT)
# Event code 1 (SYN_CONFIG)
# Event code 2 (SYN_MT_REPORT)
# Event code 3 (SYN_DROPPED)
# Event code 4 ((null))
# Event code 5 ((null))
# Event code 6 ((null))
# Event code 7 ((null))
# Event code 8 ((null))
# Event code 9 ((null))
# Event code 10 ((null))
# Event code 11 ((null))
# Event code 12 ((null))
# Event code 13 ((null))
# Event code 14 ((null))
# Event type 1 (EV_KEY)
# Event code 272 (BTN_LEFT)
# Event code 325 (BTN_TOOL_FINGER)
# Event code 328 (BTN_TOOL_QUINTTAP)
# Event code 330 (BTN_TOUCH)
# Event code 333 (BTN_TOOL_DOUBLETAP)
# Event code 334 (BTN_TOOL_TRIPLETAP)
# Event code 335 (BTN_TOOL_QUADTAP)
# Event type 3 (EV_ABS)
# Event code 0 (ABS_X)
# Value 5112
# Min 1024
# Max 5112
# Fuzz 0
# Flat 0
# Resolution 41
# Event code 1 (ABS_Y)
# Value 2930
# Min 2024
# Max 4832
# Fuzz 0
# Flat 0
# Resolution 37
# Event code 24 (ABS_PRESSURE)
# Value 0
# Min 0
# Max 255
# Fuzz 0
# Flat 0
# Resolution 0
# Event code 28 (ABS_TOOL_WIDTH)
# Value 0
# Min 0
# Max 15
# Fuzz 0
# Flat 0
# Resolution 0
# Event code 47 (ABS_MT_SLOT)
# Value 0
# Min 0
# Max 1
# Fuzz 0
# Flat 0
# Resolution 0
# Event code 53 (ABS_MT_POSITION_X)
# Value 0
# Min 1024
# Max 5112
# Fuzz 8
# Flat 0
# Resolution 41
# Event code 54 (ABS_MT_POSITION_Y)
# Value 0
# Min 2024
# Max 4832
# Fuzz 8
# Flat 0
# Resolution 37
# Event code 57 (ABS_MT_TRACKING_ID)
# Value 0
# Min 0
# Max 65535
# Fuzz 0
# Flat 0
# Resolution 0
# Event code 58 (ABS_MT_PRESSURE)
# Value 0
# Min 0
# Max 255
# Fuzz 0
# Flat 0
# Resolution 0
# Properties:
# Property type 0 (INPUT_PROP_POINTER)
# Property type 2 (INPUT_PROP_BUTTONPAD)
# Property type 4 (INPUT_PROP_TOPBUTTONPAD)
We have an x and y position for multitouch as well as a pressure axis. There are also two special multitouch axes that aren't really axes: ABS_MT_SLOT and ABS_MT_TRACKING_ID. The former specifies which slot is currently active, the latter is used to track touch points.

Slots are a static property of a device. My touchpad, as you can see above, only supports 2 slots (min 0, max 1) and thus can track 2 fingers at a time. Whenever the first finger is set down, its coordinates will be tracked in slot 0; the second finger will be tracked in slot 1. When the finger in slot 0 is lifted, the second finger continues to be tracked in slot 1, and if a new finger is set down, it will be tracked in slot 0. Sounds more complicated than it is; think of it as an array of possible touchpoints.

The tracking ID is an incrementing number that lets us tell touch points apart and also tells us when a touch starts and when it ends. The value is either -1 or a positive number: any positive number means "new touch" and -1 means "touch ended". So when you put two fingers down and lift them again, you'll get a tracking ID of 1 in slot 0, a tracking ID of 2 in slot 1, then a tracking ID of -1 in both slots to signal they ended. The tracking ID value itself is meaningless; it simply increases as touches are created.

Let's look at a single tap:


E: 0.000001 0003 0039 0387 # EV_ABS / ABS_MT_TRACKING_ID 387
E: 0.000001 0003 0035 2560 # EV_ABS / ABS_MT_POSITION_X 2560
E: 0.000001 0003 0036 2905 # EV_ABS / ABS_MT_POSITION_Y 2905
E: 0.000001 0003 003a 0059 # EV_ABS / ABS_MT_PRESSURE 59
E: 0.000001 0001 014a 0001 # EV_KEY / BTN_TOUCH 1
E: 0.000001 0003 0000 2560 # EV_ABS / ABS_X 2560
E: 0.000001 0003 0001 2905 # EV_ABS / ABS_Y 2905
E: 0.000001 0003 0018 0059 # EV_ABS / ABS_PRESSURE 59
E: 0.000001 0001 0145 0001 # EV_KEY / BTN_TOOL_FINGER 1
E: 0.000001 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +0ms
E: 0.021690 0003 003a 0067 # EV_ABS / ABS_MT_PRESSURE 67
E: 0.021690 0003 0018 0067 # EV_ABS / ABS_PRESSURE 67
E: 0.021690 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +21ms
E: 0.033482 0003 003a 0068 # EV_ABS / ABS_MT_PRESSURE 68
E: 0.033482 0003 0018 0068 # EV_ABS / ABS_PRESSURE 68
E: 0.033482 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +12ms
E: 0.044268 0003 0035 2561 # EV_ABS / ABS_MT_POSITION_X 2561
E: 0.044268 0003 0000 2561 # EV_ABS / ABS_X 2561
E: 0.044268 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +11ms
E: 0.054093 0003 0035 2562 # EV_ABS / ABS_MT_POSITION_X 2562
E: 0.054093 0003 003a 0067 # EV_ABS / ABS_MT_PRESSURE 67
E: 0.054093 0003 0000 2562 # EV_ABS / ABS_X 2562
E: 0.054093 0003 0018 0067 # EV_ABS / ABS_PRESSURE 67
E: 0.054093 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 0.064891 0003 0035 2569 # EV_ABS / ABS_MT_POSITION_X 2569
E: 0.064891 0003 0036 2903 # EV_ABS / ABS_MT_POSITION_Y 2903
E: 0.064891 0003 003a 0059 # EV_ABS / ABS_MT_PRESSURE 59
E: 0.064891 0003 0000 2569 # EV_ABS / ABS_X 2569
E: 0.064891 0003 0001 2903 # EV_ABS / ABS_Y 2903
E: 0.064891 0003 0018 0059 # EV_ABS / ABS_PRESSURE 59
E: 0.064891 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 0.073634 0003 0039 -001 # EV_ABS / ABS_MT_TRACKING_ID -1
E: 0.073634 0001 014a 0000 # EV_KEY / BTN_TOUCH 0
E: 0.073634 0003 0018 0000 # EV_ABS / ABS_PRESSURE 0
E: 0.073634 0001 0145 0000 # EV_KEY / BTN_TOOL_FINGER 0
E: 0.073634 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +9ms
We have a tracking ID (387) signalling finger down, as well as a position plus pressure. Then some updates and eventually a tracking ID of -1 (signalling finger up). Notice how there is no ABS_MT_SLOT here - the kernel filters those too, so while you stay in the same slot (0 in this case) you don't see any events for it. Also notice how you get both single-finger as well as multitouch data in the same event stream. This is for backwards compatibility [2].

Ok, time for a two-finger tap:


E: 0.000001 0003 0039 0496 # EV_ABS / ABS_MT_TRACKING_ID 496
E: 0.000001 0003 0035 2609 # EV_ABS / ABS_MT_POSITION_X 2609
E: 0.000001 0003 0036 3791 # EV_ABS / ABS_MT_POSITION_Y 3791
E: 0.000001 0003 003a 0054 # EV_ABS / ABS_MT_PRESSURE 54
E: 0.000001 0003 002f 0001 # EV_ABS / ABS_MT_SLOT 1
E: 0.000001 0003 0039 0497 # EV_ABS / ABS_MT_TRACKING_ID 497
E: 0.000001 0003 0035 3012 # EV_ABS / ABS_MT_POSITION_X 3012
E: 0.000001 0003 0036 3088 # EV_ABS / ABS_MT_POSITION_Y 3088
E: 0.000001 0003 003a 0056 # EV_ABS / ABS_MT_PRESSURE 56
E: 0.000001 0001 014a 0001 # EV_KEY / BTN_TOUCH 1
E: 0.000001 0003 0000 2609 # EV_ABS / ABS_X 2609
E: 0.000001 0003 0001 3791 # EV_ABS / ABS_Y 3791
E: 0.000001 0003 0018 0054 # EV_ABS / ABS_PRESSURE 54
E: 0.000001 0001 014d 0001 # EV_KEY / BTN_TOOL_DOUBLETAP 1
E: 0.000001 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +0ms
E: 0.012909 0003 002f 0000 # EV_ABS / ABS_MT_SLOT 0
E: 0.012909 0003 0039 -001 # EV_ABS / ABS_MT_TRACKING_ID -1
E: 0.012909 0003 002f 0001 # EV_ABS / ABS_MT_SLOT 1
E: 0.012909 0003 0039 -001 # EV_ABS / ABS_MT_TRACKING_ID -1
E: 0.012909 0001 014a 0000 # EV_KEY / BTN_TOUCH 0
E: 0.012909 0003 0018 0000 # EV_ABS / ABS_PRESSURE 0
E: 0.012909 0001 014d 0000 # EV_KEY / BTN_TOOL_DOUBLETAP 0
E: 0.012909 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +12ms
This was a really quick two-finger tap that illustrates the tracking IDs nicely. In the first event we get a touch down, then an ABS_MT_SLOT event. This tells us that subsequent events belong to the other slot, so it's the other finger. There too we get a tracking ID + position. In the next event we get an ABS_MT_SLOT to switch back to slot 0. Tracking ID of -1 means that touch ended, and then we see the touch in slot 1 ended too.

Time for a two-finger scroll:


E: 0.000001 0003 0039 0557 # EV_ABS / ABS_MT_TRACKING_ID 557
E: 0.000001 0003 0035 2589 # EV_ABS / ABS_MT_POSITION_X 2589
E: 0.000001 0003 0036 3363 # EV_ABS / ABS_MT_POSITION_Y 3363
E: 0.000001 0003 003a 0048 # EV_ABS / ABS_MT_PRESSURE 48
E: 0.000001 0003 002f 0001 # EV_ABS / ABS_MT_SLOT 1
E: 0.000001 0003 0039 0558 # EV_ABS / ABS_MT_TRACKING_ID 558
E: 0.000001 0003 0035 3512 # EV_ABS / ABS_MT_POSITION_X 3512
E: 0.000001 0003 0036 3028 # EV_ABS / ABS_MT_POSITION_Y 3028
E: 0.000001 0003 003a 0044 # EV_ABS / ABS_MT_PRESSURE 44
E: 0.000001 0001 014a 0001 # EV_KEY / BTN_TOUCH 1
E: 0.000001 0003 0000 2589 # EV_ABS / ABS_X 2589
E: 0.000001 0003 0001 3363 # EV_ABS / ABS_Y 3363
E: 0.000001 0003 0018 0048 # EV_ABS / ABS_PRESSURE 48
E: 0.000001 0001 014d 0001 # EV_KEY / BTN_TOOL_DOUBLETAP 1
E: 0.000001 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +0ms
E: 0.027960 0003 002f 0000 # EV_ABS / ABS_MT_SLOT 0
E: 0.027960 0003 0035 2590 # EV_ABS / ABS_MT_POSITION_X 2590
E: 0.027960 0003 0036 3395 # EV_ABS / ABS_MT_POSITION_Y 3395
E: 0.027960 0003 003a 0046 # EV_ABS / ABS_MT_PRESSURE 46
E: 0.027960 0003 002f 0001 # EV_ABS / ABS_MT_SLOT 1
E: 0.027960 0003 0035 3511 # EV_ABS / ABS_MT_POSITION_X 3511
E: 0.027960 0003 0036 3052 # EV_ABS / ABS_MT_POSITION_Y 3052
E: 0.027960 0003 0000 2590 # EV_ABS / ABS_X 2590
E: 0.027960 0003 0001 3395 # EV_ABS / ABS_Y 3395
E: 0.027960 0003 0018 0046 # EV_ABS / ABS_PRESSURE 46
E: 0.027960 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +27ms
E: 0.051720 0003 002f 0000 # EV_ABS / ABS_MT_SLOT 0
E: 0.051720 0003 0035 2609 # EV_ABS / ABS_MT_POSITION_X 2609
E: 0.051720 0003 0036 3447 # EV_ABS / ABS_MT_POSITION_Y 3447
E: 0.051720 0003 002f 0001 # EV_ABS / ABS_MT_SLOT 1
E: 0.051720 0003 0036 3080 # EV_ABS / ABS_MT_POSITION_Y 3080
E: 0.051720 0003 0000 2609 # EV_ABS / ABS_X 2609
E: 0.051720 0003 0001 3447 # EV_ABS / ABS_Y 3447
E: 0.051720 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +24ms
[...]
E: 0.272034 0003 002f 0000 # EV_ABS / ABS_MT_SLOT 0
E: 0.272034 0003 0039 -001 # EV_ABS / ABS_MT_TRACKING_ID -1
E: 0.272034 0003 002f 0001 # EV_ABS / ABS_MT_SLOT 1
E: 0.272034 0003 0039 -001 # EV_ABS / ABS_MT_TRACKING_ID -1
E: 0.272034 0001 014a 0000 # EV_KEY / BTN_TOUCH 0
E: 0.272034 0003 0018 0000 # EV_ABS / ABS_PRESSURE 0
E: 0.272034 0001 014d 0000 # EV_KEY / BTN_TOOL_DOUBLETAP 0
E: 0.272034 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +30ms
Note that "scroll" is something handled in userspace, so what you see here is just a two-finger move. Everything in there is something we've already seen, but pay attention to the two middle events: as updates come in for each finger, the ABS_MT_SLOT changes before the updates are sent. The kernel filter for identical events is still in effect, so in the third event we don't get an update for the X position on slot 1. The filtering is per touchpoint, so in this case this means that slot 1's x position is still 3511, just as it was in the previous event.

That's all you have to remember, really. If you think of evdev as a serialised way of sending an array of touchpoints, with the slots as the indices, then it should be fairly clear. The rest is then just about actually looking at the touch positions and making sense of them.

Exercise: do a pinch gesture on your touchpad. See if you can track the two fingers moving closer together. Then do the same but only move one finger. See how the non-moving finger gets less updates.

That's it. There are a few more details to evdev but much of that is just more event types and codes. The few details you really have to worry about when processing events are either documented in libevdev or abstracted away completely. The above should be enough to understand what your device does, and what goes wrong when your device isn't working. Good luck.

[1] If not, file a bug against systemd's hwdb and CC me so we can put corrections in
[2] We treat some MT-capable touchpads as single-touch devices in libinput because the MT data is garbage

Posted Mon Sep 19 03:50:00 2016 Tags:

Yes, there was a bug in my vint64 encapsulation commit. I will neither confirm nor deny any conjecture that I left it in there deliberately to see who would be sharp enough to spot it. I will however note that it is a perfect tutorial example for how you should spot bugs, and why revisions with a simple and provable relationship to their ancestors are best.

The following call in libntp/ntp_calendar.c is incorrect:

setvint64u(res, vint64s(res)-0x80000000);

Now consider the line this replaced:

res.Q_s -= 0x80000000;

And notice what that expands to, semantically, in C:

res.Q_s = res.Q_s - 0x80000000;

Spotted it yet?

My encapsulation patch is extremely regular in form. One of my blog commenters (the only one to spot the bug, so far) pointed out correctly that an ideal transformation of this kind looks like it was done using a text editor search and replace feature – and, in fact, I did most of it with regexp-replace commands in Emacs.

It’s good when your patches are this regular, because it means that you can spot bugs by looking for irregularities – places where a local change breaks a rule followed in the rest of the patch. Importantly, this way to spot defects works even when you don’t fully understand the code.

This is a major reason the code state after every change should have a single provable relationship to its antecedent – because if it has more than one change in it, telltale irregularities will be harder to see.

OK, here is the corrected form of the call:

setvint64u(res, vint64u(res)-0x80000000);

The one-character difference is that the correct inner call is to vint64u(), not vint64s(). You should have been able to spot this in one of a couple of ways.

One is by noticing that the original expression was doing unsigned arithmetic, so what is that call to get a signed value doing in there?

The even simpler way to spot the irregularity is to have noticed that in the rest of the diff there are no other calls like

setvint64X(res, vint64Y(res) … );

in which X and Y are unequal. There is a purely textual symmetry in the patch that this one statement breaks. Because the author was being careful about simplicity and provable relationships, that in itself should be enough to focus a reviewer’s suspicions even if the reviewer doesn’t know (or has forgotten) the meaning of the s and u suffixes.

I’m writing this as though the author and reviewer are different people, but these techniques for bug spotting – and, more importantly, these techniques for writing patches so bugs are easy to spot – apply even when you are your own reviewer and you are looking at a diff mere moments after you changed the code.

You get fast at coding, and you get good at doing it with a low defect rate, by developing the habits of mind that make self-checking like this easy. The faster you can self-check, the faster you can write while holding expected defect rates constant. The better you can self-check, the lower you can push your defect rate.

“To do large code changes correctly, factor them into a series of smaller steps such that each revision has a well-defined and provable relationship to the last” is good advice exactly because the “well-defined and provable” relationship creates regularities – invariants – that make buggy changes relatively easy to spot before you commit them.

I often go entire months per project without committing a bug to the repository. There have been good stretches on NTPsec in which my error rate was down around one introduced bug per quarter while I was coding at apparently breakneck speed. This is how I do that.

Having good tests – and the habit of adding a unit or regression test on every new feature or bug – helps a lot with that, of course. But prior to testing is good habits of mind. The combination of good habits of mind with good testing is not just additively effective, it’s multiplicatively so.

Posted Sun Sep 18 15:18:49 2016 Tags:

To do large code changes correctly, factor them into a series of smaller steps such that each revision has a well-defined and provable relationship to the last.

(This is the closest I’ve ever come to a 1-sentence answer to the question “How the fsck do you manage to code with such ridiculously high speed and low defect frequency? I was asked this yet again recently, and trying to translate the general principle into actionable advice has been on my mind. I have two particular NTPsec contributors in mind…)

So here’s a case study, and maybe your chance to catch me in a mistake.

NTP needs a 64-bit scalar type for calendar calculations; what it actually wants is 32 bits of seconds since a far-past epoch and 32 bits of fractional-second precision, which you can think of as a counter for units of seconds * 1e-32. (The details are a little messier than this, but never mind that for now.)

Consequently, one of the archaisms in the NTP code is an internal type called vint64. It dates from the era of 32-bit machines (roughly 1977 to 2008). In those days you couldn’t assume your C compiler had int64_t or uint64_t (64-bit integer and unsigned-integer types). Even after the 64-bit hardware transition, it was some years before you could safely assume that compilers for the remaining 32-bit machines (like today’s Raspberry Pis) would support int64_t/uint64_t.

Thus, a vint64 is an NTP structure wrapping 2 32-bit integers. It comes with a bunch of small functions that do 64-bit scalar arithmetic using it. Also, sadly, there was a lot of code using it that didn’t go through the functional interface, instead exposing the guts of the vint64 structure in unclean ways.

This is, for several reasons, an obvious cleanup target. Today in 2016 we can assume that all compilers of interest to us have 64-bit scalars. In fact the NTP code itself has long assumed this, though the assumption is so well-hidden in the ISC library off to the side that many people who have worked in the main codebase probably do not know it’s there.

If all the vint64s in NTP became typedefs to a scalar 64-bit type, we could use native machine operations in most cases and replace a lot of function calls and ugly exposed guts with C’s arithmetic operators. The result would be more readable, less bulky, and more efficient. In this case we’d only pare away about 300LOC, but relentless pursuit of such small improvements adds up to large ones.

The stupid way to do it would have been to try to go from vint64 to int64_t/uint64_t in one fell swoop. NSF and LF didn’t engage me to be that stupid.

Quoting myself: “A series of smaller steps such that each revision has a well-defined and provable relationship to the last.”

Generally, in cases like this, the thing to do is separate changing the interface from changing the implementation. So:

1. First, encapsulate vint64 into an abstract data type (ADT) with an entirely functional interface – un-expose the guts.

2. Then, change the implementation (struct to scalar), changing the ADT methods without disturbing any of the outside calls to them – if you have to do the latter, you failed step 1 and have to clean up your abstract data type.

3. Finally, hand-expand the function calls to native C scalar operations. Now you no longer have an ADT, but that’s OK; it was scaffolding. You knew you were going to discard it.
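The three steps above can be illustrated in miniature, with invented names (the real diff uses NTP's own accessors):

```c
#include <stdint.h>

/* Step 1: hide the guts behind an accessor macro.  Callers that used
 * to poke at struct fields directly now go through the macro. */
typedef struct { uint32_t hi; uint32_t lo; } vint64_old;
#define VINT64_HI_OLD(x) ((x).hi)

/* Step 2: swap the implementation.  The macro interface is unchanged,
 * so no caller outside these lines needs to be touched. */
typedef uint64_t vint64_new;
#define VINT64_HI_NEW(x) ((uint32_t)((x) >> 32))

/* Step 3: with a scalar underneath, call sites like
 *     vint64_add(a, b)
 * can be hand-expanded to plain
 *     a + b
 * and the scaffolding macros deleted. */
```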

The goal is that at each step it should be possible, and relatively easy to eyeball-check that the transformation you did is correct. Helps a lot to have unit tests for the code you’re modifying – then, one of your checks is that the unit tests don’t go sproing at any step. If you don’t have unit tests, write them. They’ll save your fallible ass. The better your unit tests are, the more time and pain you’ll save yourself in the long run.

OK, so here’s your chance to catch me in a mistake.

https://gitlab.com/NTPsec/ntpsec/commit/13fa1219f94d2b9ec00ae409ac4b54ee12b1e93f

That is the diff where I pull all the vint64 guts exposure into an ADT (done with macro calls, not true functions, but that’s a C implementation detail).

Can you find an error in this diff? If you decide not, how did it convince you? What properties of the diff are important?

(Don’t pass over that last question lightly. It’s central.)

If you’re feeling brave, try step 2. Start with ‘typedef uint64_t vint64;’, replacing the structure definition, and rewrite the ten macros near the beginning of the diff. (Hint: you’ll need two sets of them.)

Word to the newbies: this is how it’s done. Train your brain so that you analyze programming this way – mostly smooth sequences of refactoring steps with only occasional crisis points where you add a feature or change an assumption.

When you can do this at a microlevel, with code, you are inhabiting the mindset of a master programmer. When you can do it with larger design elements – data structures and entire code subsystems – you are getting inside system-architect skills.

Posted Sat Sep 17 12:30:12 2016 Tags:

libinput's touchpad acceleration is the cause for a few bugs and outcry from a quite vocal (maj|in)ority. A common suggestion is "make it like the synaptics driver". So I spent a few hours going through the pointer acceleration code to figure out what xf86-input-synaptics actually does (I don't think anyone knows at this point) [1].

If you just want the TLDR: synaptics doesn't use physical distances but works in device units coupled with a few magic factors, also based on device units. That pretty much tells you all that's needed.

Also a disclaimer: the last time some serious work was done on acceleration was in 2008/2009. A lot of things have changed since then and, since the server is effectively un-testable, we ended up with the mess below that seems to make little sense. It probably made sense 8 years ago and given that most or all of the patches have my signed-off-by it must've made sense to me back then. But now we live in the glorious future and holy cow it's awful and confusing.

Synaptics has three options to configure speed: MinSpeed, MaxSpeed and AccelFactor. The first two are not explained beyond "speed factor" but given how accel usually works let's assume they all somehow should work as a multiplication on the delta (so a factor of 2 on a delta of dx/dy gives you 2dx/2dy). AccelFactor is documented as "acceleration factor for normal pointer movements", so clearly the documentation isn't going to help clear any confusion.

I'll skip the fact that synaptics also has a pressure-based motion factor with four configuration options because oh my god what have we done. Also, that one is disabled by default and has no effect unless set by the user. And I'll also only handle default values here, I'm not going to get into examples with configured values.

Also note: synaptics has a device-specific acceleration profile (the only driver that does) and thus the acceleration handling is split between the server and the driver.

Ok, let's get started. MinSpeed and MaxSpeed default to 0.4 and 0.7. The MinSpeed is used to set constant acceleration (1/min_speed) so we always apply a 2.5 constant acceleration multiplier to deltas from the touchpad. Of course, if you set constant acceleration in the xorg.conf, then it overwrites the calculated one.

MinSpeed and MaxSpeed are mangled during setup so that MaxSpeed is actually MaxSpeed/MinSpeed and MinSpeed is always 1.0. I'm not 100% sure why, but the later clipping to the min/max speed range ensures that we never go below a 1.0 acceleration factor (and thus never decelerate).
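In code, the setup-time mangling looks something like this (a sketch with my own names, not the driver's exact variables):

```c
/* Setup-time mangling as described above: derive the constant
 * acceleration from MinSpeed, rescale MaxSpeed, pin MinSpeed to 1.0. */
static void mangle_speeds(double *min_speed, double *max_speed,
                          double *const_accel)
{
    *const_accel = 1.0 / *min_speed;  /* default 0.4 gives 2.5 */
    *max_speed  /= *min_speed;        /* default 0.7 becomes 1.75 */
    *min_speed   = 1.0;               /* clipping floor: never decelerate */
}
```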

The AccelFactor default is 200/diagonal-in-device-coordinates. On my T440s it's thus 0.04 (and will be roughly the same for most PS/2 Synaptics touchpads). But on a Cyapa with a different axis range it is 0.125. On a T450s it's 0.035 when booted into PS2 and 0.09 when booted into RMI4. Admittedly, the resolution halves under RMI4 so this possibly maybe makes sense. Doesn't quite make as much sense when you consider the x220t which also has a factor of 0.04 but the touchpad is only half the size of the T440s.

There's also a magic constant "corr_mul" which is set as:


/* synaptics seems to report 80 packet/s, but dix scales for
* 100 packet/s by default. */
pVel->corr_mul = 12.5f; /*1000[ms]/80[/s] = 12.5 */
It's correct that the frequency is roughly 80Hz but I honestly don't know what the 100 packets/s reference refers to. Either way, it means that we always apply a factor of 12.5, regardless of the timing of the events. Ironically, this one is hardcoded and not configurable unless you happen to know that it's the X server option VelocityScale or ExpectedRate (both of them set the same variable).

Ok, so we have three factors. 2.5 as a function of MaxSpeed, 12.5 because of 80Hz (??) and 0.04 for the diagonal.

When the synaptics driver calculates a delta, it does so in device coordinates and ignores the device resolution (because this code pre-dates devices having resolutions). That's great until you have a device with uneven resolutions like the x220t. That one has 75 and 129 units/mm for x and y, so for any physical movement you're going to get almost twice as many units for y than for x. Which means that if you move 5mm to the right you end up with a different motion vector (and thus acceleration) than when you move 5mm south.

The core X protocol actually defines how acceleration is supposed to be handled. Look up the man page for XChangePointerControl(), it sets a threshold and an accel factor:

The XChangePointerControl function defines how the pointing device moves. The acceleration, expressed as a fraction, is a multiplier for movement. For example, specifying 3/1 means the pointer moves three times as fast as normal. The fraction may be rounded arbitrarily by the X server. Acceleration only takes effect if the pointer moves more than threshold pixels at once and only applies to the amount beyond the value in the threshold argument.
Of course, "at once" is a bit of a blurry definition outside of maybe theoretical physics. Consider the definition of "at once" for a gaming mouse with 500Hz sampling rate vs. a touchpad with 80Hz (let us fondly remember the 12.5 multiplier here) and the above description quickly dissolves into ambiguity.

Anyway, moving on. Let's say the server just received a delta from the synaptics driver. The pointer accel code in the server calculates the velocity over time, basically by doing a hypot(dx, dy)/dtime-to-last-event. Time in the server is always in ms, so our velocity is thus in device-units/ms (not adjusted for device resolution).

Side-note: the velocity is calculated across several delta events so it gets more accurate. There are some checks though so we don't calculate across random movements: anything older than 300ms is discarded, anything not in the same octant of movement is discarded (so we don't get a velocity of 0 for moving back/forth). And there are two calculations to make sure we only calculate while the velocity is roughly the same and don't average between fast and slow movements. I have my doubts about these, but until I have some more concrete data let's just say this is accurate (although since the whole lot is in device units, it probably isn't).

Anyway. The velocity is multiplied with the constant acceleration (2.5, see above) and our 12.5 magic value. I'm starting to think that this is just broken and would only make sense if we used a delta of "event count" rather than milliseconds.

It is then passed to the synaptics driver for the actual acceleration profile. The first thing the driver does is remove the constant acceleration again, so our velocity is now just v * 12.5. According to the comment this brings it back into "device-coordinate based velocity" but this seems wrong or misguided since we never changed into any other coordinate system.

The driver applies the accel factor (0.04, see above) and then clips the whole lot into the MinSpeed/MaxSpeed range (which is adjusted to move MinSpeed to 1.0 and scale up MaxSpeed accordingly, remember?). After the clipping, the pressure motion factor is calculated and applied. I skipped this above but it's basically: the harder you press the higher the acceleration factor. Based on some config options. Amusingly, pressure motion has the potential to exceed the MinSpeed/MaxSpeed options. Who knows what the reason for that is...

Oh, and btw: the clipping is actually done based on the accel factor set by XChangePointerControl into the acceleration function here. The code is


double acc = factor from XChangePointerControl();
double factor = the magic 0.04 based on the diagonal;

accel_factor = velocity * factor;
if (accel_factor > MaxSpeed * acc)
accel_factor = MaxSpeed * acc;
So we have a factor set by XChangePointerControl() but it's only used to determine the maximum factor we may have, and then we clip to that. I'm missing some cross-dependency here because this is what the GUI acceleration config bits hook into. Somewhere this sets things and changes the acceleration by some amount but it wasn't obvious to me.

Alrighty. We have a factor now that's returned to the server and we're back in normal pointer acceleration land (i.e. not synaptics-specific). Woohoo. That factor is averaged across 4 events using Simpson's rule to smooth out abrupt changes. Not sure this really does much, I don't think we've ever done any evaluation on that. But it looks good on paper (we have that in libinput as well).

Now the constant accel factor is applied to the deltas. So far we've added the factor, removed it (in synaptics), and now we're adding it again. Which also makes me wonder whether we're applying the factor twice to all other devices but right now I'm past the point where I really want to find out. With all the above, our acceleration factor is, more or less:


f = units/ms * 12.5 * (200/diagonal) * (1.0/MinSpeed)
and the deltas we end up using in the server are

(dx, dy) = f * (dx, dy)
But remember, we're still in device units here (not adjusted for resolution).
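Numerically, with T440s-ish defaults (the diagonal and velocity values here are illustrative), the combined multiplier works out like this:

```c
#include <math.h>

/* The combined multiplier from the formula above, ignoring the
 * clipping and smoothing stages. */
static double accel_multiplier(double v_units_per_ms, double diagonal,
                               double min_speed)
{
    return v_units_per_ms * 12.5 * (200.0 / diagonal) * (1.0 / min_speed);
}
```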

Anyway. You think we're finished? Oh no, the real fun bits start now. And if you haven't headdesked in a while, now is a good time.

After acceleration, the server does some scaling because synaptics is an absolute device (with axis ranges) in relative mode [2]. Absolute devices are mapped into the whole screen by default but when they're sending relative events, you still want a 45 degree line on the device to map into 45 degree cursor movement on the screen. The server does this by adjusting dy in-line with the device-to-screen-ratio (taking device resolution into account too). On my T440s this means:


touchpad x:y is 1:1.45 (16:11)
screen is 1920:1080 is 1:1.77 (16:9)

dy scaling is thus: (16:11)/(16:9) = 9:11 -> y * 11/9
dx is left as-is. Now you have the delta that's actually applied to the cursor. Except that we're in device coordinates, so we map the current cursor position to device coordinates, then apply the delta, then map back into screen coordinates (i.e. pixels). You may have spotted the flaw here: when the screen size changes, the dy scaling changes and thus the pointer feel. Plug in another monitor, and touchpad acceleration changes. Also: the same touchpad feels different on laptops when their screen hardware differs.
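The scaling step can be sketched as follows (the ranges and pixel counts in the test values are examples):

```c
#include <math.h>

/* dy multiplier: the touchpad's y:x unit ratio relative to the
 * screen's, so a 45-degree swipe on the pad stays 45 degrees on
 * screen (device resolution adjustment omitted for simplicity). */
static double dy_scale(double pad_x_units, double pad_y_units,
                       double screen_w_px, double screen_h_px)
{
    return (pad_y_units / pad_x_units) / (screen_h_px / screen_w_px);
}
```

Note that the screen dimensions are an input, which is the flaw described above: change the monitor and the multiplier changes with it.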

Ok, let's wrap this up. Figuring out what the synaptics driver does is... "tricky". It seems much like a glorified random number scheme. I'm not planning to implement "exactly the same acceleration as synaptics" in libinput because this would be insane and despite my best efforts, I'm not that insane yet. Collecting data from synaptics users is almost meaningless, because no two devices really employ the same acceleration profile (touchpad axis ranges + screen size) and besides, there are 11 configuration options that all influence each other.

What I do plan though is collect more motion data from a variety of touchpads and see if I can augment the server enough that I can get a clear picture of how motion maps to the velocity. If nothing else, this should give us some picture on how different the various touchpads actually behave.

But regardless, please don't ask me to "just copy the synaptics code".

[1] fwiw, I had this really great idea of trying to get behind all this, with diagrams and everything. But then I was printing json data from the X server into the journal to be scooped up by sed and python script to print velocity data. And I questioned some of my life choices.
[2] why the hell do we do this? because synaptics at some point became a device that announces the axis ranges (seemed to make sense at the time, 2008) and then other things started depending on it and with all the fixes to the server to handle absolute devices in relative mode (for tablets) we painted ourselves into a corner. Synaptics should switch back to being a relative device, but last I tried it breaks pointer acceleration and that a) makes the internets upset and b) restoring the "correct" behaviour is, well, you read the article so far, right?

Posted Fri Sep 16 06:00:00 2016 Tags:

This last week has not been kind to the Great Beast of Malvern. Serenity is restored now, but there was drama and (at the last) some rather explosive humor.

For some time the Beast had been having occasional random flakeouts apparently related to the graphics card. My monitors would go black – machine still running but no video. Some consultation with my Beastly brains trust (Wendell Wilson, Phil Salkie, and John D. Bell) turned up a suitable replacement, a Radeon R7 360 that was interesting because it can drive three displays (I presently drive two and aim to upgrade).

Last Friday I tried to upgrade to the new card. To say it went badly would be to wallow in understatement. While I was first physically plugging it in, I lost one of the big knurled screws that the Beast’s case uses for securing both cards and case, down behind the power supply. Couldn’t get it to come out of there.

Then I realized that the card needed a PCI-Express power tap and oh shit the card vendor hadn’t provided one.

Much frantic running around to local computer stores ensued, because I did not yet know that Wendell had thoughtfully tucked several spares of the right kind of cable behind the disk drive bays when he built the Beast. Which turns out to matter because though the PCI-E end is standardized, the power supply end is not and they have vendor-idiosyncratic plugs.

Eventually I gave up and tried to put the old card back in. And that’s when the real fun began. I broke the retaining toggle on the graphics card’s slot while trying to haggle the new card out. When I tried to boot the machine with the old card plugged back in, my external UPS squealed – and then nothing. No on-board lights, no post beep, no sign of life at all. I knew what that meant; evidently either the internal PSU or the mobo was roached.

Exhausted, pissed off, and flagellating myself for my apparent utter incompetence, I went to bed. Next morning I called Phil and said “I have a hardware emergency.” Phil, bless him, was over here within a couple of hours with a toolkit and a spare Corsair PSU.

I explained the whole wretched sequence of events including the lost case screw and the snapped retaining clip and the external UPS squealing and the machine going dead, and Phil said “First thing we’ll do is get that case screw out of there.” He then picked up the Beast’s case and began shaking it around at various angles.

And because Phil’s hardware-fu is vastly greater than mine, we heard rattling and saw a screw drop into view in short order. But it was the wrong screw! Not the big knurled job I’d dropped earlier but a smaller one.

“Aha!” says Phil. “That’s a board-mount screw.” Sure enough we quickly found that the southeast corner of the mobo had a bare hole where its securing screw ought to be. I figured out what must have happened almost as soon as Phil did; we gazed at each other with a wild surmise. That screw had either worked itself loose or already been loose due to an assembly error, and it had fallen down behind the motherboard.

Where it had bothered nobody, until sometime during my attempt to change cards I inadvertently jostled it into a new position and that little piece of conductive metal shorted out my fscking motherboard. The big knurled screw (which he shook out a few seconds later) couldn’t have done it – that thing was too large to fit where it could do damage.

Phil being Phil, he had my NZXT PSU out of the case and apart almost as fast as he could mutter “I’m going to void your warranty.” Sure enough, the fuse was blown.

This was good on one level, because it meant the mobo probably wasn’t. And indeed when we dropped in Phil’s Corsair the Beast (and the new card, and its monitors!) powered up just fine. And that was a relief, because the ASUS X99 motherboard is hellaciously more expensive than the PSU.

Almost as much of a relief was the realization that I hadn’t been irredeemably cackhanded and fatally damaged the Beast through sheer fucking incompetence. Hadn’t been for that little screw, all might have gone smoothly.

I also got an informative lecture on why the innards of the PSU looked kinda dodgy. Hotglue in excessive use, components really crowded in, flywiring (that’s when you have wires unsupported in space except by their ends, looping or arcing around components, rather than being properly snugged down).

But Phil was also puzzled. “Why didn’t this thing crowbar?” he wondered. The fuse, you see, is a second-order backup. When a short draws too much power, the PSU is supposed to shut itself down before there’s been time for the fuse to blow.

Phil took it home to replace the fuse with a circuit breaker, leaving the Corsair in the Beast. Which is functioning normally and allowing me to write this blog post, none the worse for wear except for the broken retaining clip.

He texted me this morning. Here’s how it went, effectively verbatim. I feel this truly deserves to be preserved for posterity:

Phil: So, I did a very nice repair job, installing the circuit breaker in the power supply. Then, as I was about to connect it up to my machine, I thought “you know, it really _should_ have crowbarred – not blown a fuse.” So, I powered it up sitting on the floor instead, saying to Ariel “either it’ll work, or it’ll catch fire.”

Phil: So, you may as well hang on to the Corsair power supply – I’ve already picked up a replacement to go in my machine.

Me: Are you telling me it caught fire?

Phil: No, no, no, no, no! Well, a little bit, yes. Just some sparks, and a small amount of smoke – no actual _flames_ as such, not on the outside of the box, at least. Hardly a moment’s trouble, really, once it cooled down enough to toss it in the bin…

Me: The magic smoke got out.

Phil: You might well say that, yes. In fact, all the smoke got out – the magic, the mystic, the mundane, all gone up in, well, smoke, as it were.

Me: “No actual _flames_ as such, not on the outside” Cathy was highly amused. I suspect she was visualizing you behind a blast shield, a la Jamie Hyneman.

Phil: I was trying more for Monty Python’s dead parrot sketch. In retrospect, a blast shield may have been warranted, had I only thought to use one…

Me: What about all those extra cables in the Beast? Are they obsolete now?

Phil: Yes, yes they are. At some convenient point I’ll tidy up all the cabling and make sure you have proper spares for the next time we dare disturb the Beast…

No, I did not make any of this up. Yes, Phil is really like that. And thus endeth the tale of the Trials of the Beast.

Posted Tue Sep 13 09:32:36 2016 Tags:
In case someone is interested, I’ve uploaded a one-page PDF with a cheat sheet for the x86 instruction encoding. I use it all the time to the point that I have a laminated copy on my desk. Just in case someone else finds it useful.
Posted Sat Sep 10 17:33:52 2016 Tags: