
You’ve heard me uttering teasers about it for months. Now it’s here. The repository is available for cloning; we’re shipping the 0.9.0 beta of NTPsec. You can browse the web pages or clone the git repository by one of several methods. You can also fetch a tarball with wget.

This is an initial beta and has some rough edges, mostly due to the rather traumatic (but utterly necessary) replacement of the autoconf build system. Also, our range of ports is still narrow; if you’re on anything but Linux or a recent FreeBSD the build may not work for you yet. These things will be fixed.

However, the core function – syncing your clock via NTP – is solid, and using 0.9.0 for production might be judged a bit adventurous but wouldn’t be crazy. The next few beta releases will rapidly get more polished. Expect them to come quickly, like within weeks.

Most of the changes are under the hood and not user-visible. A few auxiliary tools have been renamed, most notably sntp to ntpdig. If you read documentation, you will notice that what’s there has been massively revised and improved.

The most important change you can’t see is that the code has been very seriously security-hardened, not only by plugging all publicly disclosed holes but by internal preventive measures to close off entire classes of vulnerabilities (by, for example, replacing all function calls that can produce buffer overruns with memory-safe equivalents.)

We’ve already established good relations with security-research and InfoSec communities. Near-future releases will include security fixes currently under embargo.

If you consider this work valuable, please support it by contributing at my Patreon page.

Posted Mon Nov 16 19:28:29 2015 Tags:

The hacker culture, and STEM in general, are under ideological attack. Recently I blogged a safety warning that according to a source I consider reliable, a “women in tech” pressure group has made multiple efforts to set Linus Torvalds up for a sexual assault accusation. I interpreted this as an attempt to beat the hacker culture into political pliability, and advised anyone in a leadership position to beware of similar attempts.

Now comes Roberto Rosario of the Django Software Foundation. Django is a web development framework that is a flourishing and well-respected part of the ecology around the Python language. On October 29th 2015 he reported that someone posting as ‘djangoconcardiff’ opened an issue against pull request #176 on ‘awesome-django’, addressing it to Rosario. This was the first paragraph.


great project!! I have one observation and a suggestion. I noticed that you have rejected some pull requests to add some good django libraries and that the people submitting thsoe pull requests are POCs (People of Colour). As a suggestion I recommend adopting the Contributor Code of Conduct ( to ensure everyone’s contributions are accepted regarless [sic] of their sex, sexual orientation, skin color, religion, height, place of origin, etc. etc. etc. As a white straight male and lead of this trending repository, your adoption of this Code of Conduct will send a loud and clear message that inclusion is a primary objective of the Django community and of the software development community in general. D.

Conversation on that issue is preserved in the Twitter link above, but the issue itself in GitHub has apparently been deleted in its totality. Normally, only GitHub staff can do this. A copy is preserved here.

It is unknown who was speaking as ‘djangoconcardiff’, and that login has now been deleted, like the GitHub issue. (DjangoCon Europe 2015 was this past May/June in Cardiff.)

The slippery, Newspeak-like quality of djangoconcardiff’s “suggestion” makes it hard to pin down from the text itself whether he/she is merely stumping for inclusiveness or insinuating that rejection of pull requests by “persons of color” is itself evidence of racism and thoughtcrime.

But, if you think you’re reading that ‘djangoconcardiff’ considers acceptance of pull requests putatively from “persons of color” to be politically mandatory, a look at the Contributor Covenant he/she advocates will do nothing to dissuade you. Paragraph 2 denounces the “pervasive cult of meritocracy”.

It is clear that djangoconcardiff and the author of the Covenant (self-described transgender feminist Coraline Ada Ehmke) want to replace the “cult of meritocracy” with something else. And equally clear that what they want to replace it with is racial and sexual identity politics.

Rosario tagged his Twitter report “Social Justice in action!” He knows who these people are: SJWs, “Social Justice Warriors”. And, unless you have been living under a rock, so do you. These are the people – the political and doctrinal tendency, united if in no other way by an elaborate shared jargon and a seething hatred of djangoconcardiff’s “white straight male” – who recently hounded Nobel laureate Tim Hunt out of his job with a fraudulent accusation of sexist remarks.

I’m not going to analyze SJW ideology here except to point out, again, why the hacker culture must consider anyone who holds it an enemy. This is because we must be a cult of meritocracy. We must constantly demand merit – performance, intelligence, dedication, and technical excellence – of ourselves and each other.

Now that the Internet – the hacker culture’s creation! – is everywhere, and civilization is increasingly software-dependent, we have a duty, the duty I wrote about in Holding Up The Sky. The invisible gears have to turn. The shared software infrastructure of civilization has to work, or economies will seize up and people will die. And for large sections of that infrastructure, it’s on us – us! – to keep it working. Because nobody else is going to step up.

We dare not give less than our best. If we fall away from meritocracy – if we allow the SJWs to remake us as they wish, into a hell-pit of competitive grievance-mongering and political favoritism for the designated victim group of the week – we will betray not only what is best in our own traditions but the entire civilization that we serve.

This isn’t about women in tech, or minorities in tech, or gays in tech. The hacker culture’s norm about inclusion is clear: anybody who can pull the freight is welcome, and twitching about things like skin color or shape of genitalia or what thing you like to stick into what thing is beyond wrong into silly. This is about whether we will allow “diversity” issues to be used as wedges to fracture our community, degrade the quality of our work, and draw us away from our duty.

When hackers fail our own standards of meritocracy, as we sometimes do, it’s up to us to fix it from within our own tradition: judge by the work alone, you are what you do, shut up and show us the code. A movement whose favored tools include the rage mob, the dox, and faked incidents of bigotry is not morally competent to judge us or instruct us.

I have been participating in and running open-source projects for a quarter-century. In all that time I never had to know or care whether my fellow contributors were white, black, male, female, straight, gay, or from the planet Mars, only whether their code was good. The SJWs want to make me care; they want to make all of us obsess about this, to the point of having quotas and struggle sessions and what amounts to political officers threatening us if we are insufficiently “diverse”.

Think I’m exaggerating? Read the whole djangoconcardiff thread. What’s there is totalitarianism in miniature: ideology is everything, merit counts for nothing against the suppression of thoughtcrime, and politics is conducted by naked intimidation against any who refuse to conform. Near the end of the conversation djangoconcardiff threatens to denounce Rosario to the board of the Django Software Foundation in the confused, illiterate, vicious idiom of an orc or a stormtrooper.

It has been suggested that djangoconcardiff might be a troll emulating an SJW, and we should thus take him less seriously. The problem with this idea is that no SJW disclaimed him – more generally, that “Social Justice” has reached a sort of Poe’s Law singularity at which the behavior of trolls and true believers becomes indistinguishable even to each other, and has the same emergent effects.

In the future, the hacker whose community standing the SJWs threaten could be you. The SJWs talk ‘diversity’ but like all totalitarians they measure success only by total ideological surrender – repeating their duckspeak, denouncing others for insufficient political correctness, loving Big Brother. Not being a straight white male won’t save you either – Roberto Rosario is an Afro-Hispanic Puerto Rican.

We must cast these would-be totalitarians out – refuse to admit them on any level except by evaluating on pure technical merit whatever code patches they submit. We must refuse to let them judge us, and learn to recognize their thought-stopping jargon and kafkatraps as a clue that there is no point in arguing with them and the only sane course is to disengage. We can’t fix what’s broken about the SJWs; we can, and must, refuse to let them break us.

(Roberto Rosario, Meredith L. Patterson, and Rick Moen assisted in the composition of this post. However, any errors are the sole responsibility of the author.)

Posted Fri Nov 13 20:10:46 2015 Tags:

I find myself in the embarrassing position of having generated a theoretical insight for a movement I don’t respect very much.

My feelings about the “Red Pill” movement are a lot like my feelings about feminism. Both started out asking important questions about why men and women treat each other badly. Early on, both began to develop some answers that made sense. Later, both movements degenerated – hijacked by whiny, broken, hating people who first edged into outright craziness and then increasingly crossed that line.

But the basic question that motivated the earliest Red-Pill/PUA analysis remains: why do so many women say they want nice guys and then sexually reward arrogant jerks? And the answer has a lot of staying power. Women are instinctive hypergamists who home in on dominance signaling the way men home in on physical pulchritude. And: they’re self-deceivers – their optimal mating strategy is to sincerely promise fidelity to hook a good-provider type while actually being willing to (a) covertly screw any sexy beast who wanders by in order to capture genetic diversity for their offspring, and (b) overtly trade up to a more dominant male when possible.

(This is really complicated compared to the optimal male strategy, which is basically to both find a fertile hottie you think you can keep faithful and screw every other female you can tap without getting killed in hopes of having offspring at the expense of other men.)

What I’ve figured out recently is that there’s another turn of the vise. Sorry, nice-guy betas; you’re even more doomed than the basic theory predicts.

There’s a social-status component to the female game: women use it to compete for the attention of fit males. Women are very, very concerned with how their mating value is perceived by others in their social group – they will take extreme measures all the way up to plastic surgery to boost it. Also, female mating value is increased by social status, even though status is not as overwhelmingly important as for males.

Some time back, I tripped over someone else’s realization that this gives women an incentive to be publicly cruel when they reject suitors.

A man courting a woman is implicitly making a status claim: I am good enough for you – in Red Pill terminology, my SMV (sexual market value) meets or exceeds yours. Because other women use male attention to measure SMV and status, such a claim can feel threatening to its target: coming from a low-status male, it tends to drag her own status down, especially if she accepts it.

A woman can deal with this by not merely rejecting a man she evaluates as not being worthy, but publicly insulting him for trying. “How dare you think you’re good enough for me?” is different from a simple “Not interested” because it’s a status defense.

Thus, hot chicks are systematically cruel to beta nerds. It’s a way of socially protecting the proposition that their SMV is high enough to capture a real alpha, and their status among peers.

But – and here’s my insight – it’s even worse than that.

Consider two cases. Bob is slightly lower status than Alice. Ted is much lower status than Alice. Both of them court Alice. She doesn’t think either has SMV to match hers, so her response is to reject both. But: Which one is the bigger status threat?

No, it’s not Ted. The status difference between him and Alice is quite visible to her peers; he can be easily dismissed as just nuts for pitching out of his league. Bob, on the other hand, may look plausible – and the closer to good enough he looks, the more likely it is that the status claim he makes by courting Alice will adjust her status downwards among her peers.

So it’s Bob who will get the cruel, status-defensive rejection, not Ted.

That’s right, guys – being in her league, or nearly so, increases the chance that she’ll have to be nasty to you to protect her game position. The well-spoken, decently groomed nerd is going to get it in the neck from the popular hot chick worst of all.

However, this analysis does present actionable advice. Because being Bob – being nearly good enough – also increases your odds of being able to raise the SMV she perceives just enough to connect. The advice is: don’t be a social threat. Pitch her privately, not publicly. Give her deniability on your status claim.

At the very least this will give her room to consider whether she likes you without being socially panicked about being seen with the wrong guy.

Posted Sun Nov 8 17:11:16 2015 Tags:

I received a disturbing warning today from a source I trust.

The short version is: if you are any kind of open-source leader or senior figure who is male, do not be alone with any female, ever, at a technical conference. Try to avoid even being alone, ever, because there is a chance that a “women in tech” advocacy group is going to try to collect your scalp.

IRC conversation, portions redacted to protect my informant, follows.

15:17:58 XXXXXXXXXXXX | I'm super careful about honey traps.  For a 
                      | while, that's how the Ada Initiative was    
                      | trying to pre-generate outrage and collect  
                      | scalps.                                     
15:18:12          esr | REALLY?                                    
15:18:22          esr | That's perverse.                           
15:18:42 XXXXXXXXXXXX | Yeah, because the upshot is, I no longer   
                      | can afford to mentor women who are already 
                      | in tech.                                    
15:18:54          esr | Right.                                     
15:19:01 XXXXXXXXXXXX | I can and do mentor ones who are not in    
                      | it, but are interested and able            
15:19:21 XXXXXXXXXXXX | but once one is already in...  nope        
15:20:08 XXXXXXXXXXXX | The MO was to get alone with the target,   
                      | and then immediately after cry "attempted  
                      | sexual assault".                            
15:23:27          esr | When the backlash comes it's going to be   
                      | vicious.  And women who were not part of   
                      | this bullshit will suffer for it.          
15:23:41 XXXXXXXXXXXX | I can only hope.                            
15:25:21          esr | Ah. On the "Pour encourager les autres"    
                      | principle?  I hadn't thought of that.      
                      | Still damned unfortunate, though.          
15:26:40 XXXXXXXXXXXX | Linus is never alone at any conference.    
                      | This is not because he lets fame go to his
                      | head and likes having a posse around.      
15:26:54 XXXXXXXXXXXX | They have made multiple runs at him.       
15:27:29          esr | Implied warning noted.                      
15:27:34            * | XXXXXXXXXXXX nods

An A&D regular who is not myself was present for this conversation, but I’ll let him choose whether to confirm his presence and the content.

“They have made multiple runs at him.” Just let the implications of that sink in for a bit. If my source is to be believed (and I have found him both well-informed and completely trustworthy in the past) this was not a series of misunderstandings, it was a deliberately planned and persistent campaign to frame Linus and feed him to an outrage mob.

I have to see it as an attempt to smear and de-legitimize the Linux community (and, by extension, the entire open-source community) in order to render it politically pliable.

Linus hasn’t spoken out about this; I can think of several plausible and good reasons for that. And the Ada Initiative shut down earlier this year. Nevertheless, this report is consistent with reports of SJW dezinformatsiya tactics from elsewhere and I think it would be safest to assume that they are being replicated by other women-in-tech groups.

(Don’t like that, ladies? Tough. You were just fine with collective guilt when the shoe was on the other foot. Enjoy your turn!)

I’m going to take my source’s implied advice. And view “sexual assault” claims fitting this MO with extreme skepticism in the future.

Posted Tue Nov 3 21:31:44 2015 Tags:
So, you validated your list of SMILES in the paper you were planning to use (or about to submit), and you found a shortlist of SMILES strings that do not look right. Well, let's visualize them.

We all used to use the Daylight Depict tool, but this is no longer online. I previously blogged about using AMBIT for SMILES depiction (which uses various tools for depiction; doi:10.1186/1758-2946-3-18), but now John May has released a CDK-only tool, called CDK Depict. The download section offers a jar file and a war for easy deployment in a Tomcat environment. But for the impatient, there is also this online host where you can give it a try (it may go offline at some point?).

Just copy/paste your shortlist there, and visually see what is wrong with them :) Big HT to John for doing all these awesome things!
Posted Sat Oct 31 14:35:00 2015 Tags:
When you stumble upon a nice paper describing a new predictive or explanatory model for a property or a class of compounds that has your interest, the first thing you do is test the training data. For example, validating SMILES (or OpenSMILES) strings in such data files is now easy with the many Open Source tools that can parse SMILES strings: the Chemistry Toolkit Rosetta provides many pointers for parsing SMILES strings. I previously blogged about a CDK/Groovy approach.

Cheminformatics toolkits need to understand what the input is, in order to correctly calculate descriptors. So, let's start there. It does not matter so much which toolkit you use and I will use the Chemistry Development Kit (doi:10.1021/ci025584y) here to illustrate the approach.

Let's assume we have a tab-separated values file, with the compound identifier in the first column and the SMILES in the second column. That can easily be parsed in Groovy. For each SMILES we parse it and determine the CDK atom types. For validation of the supplementary information we only want to report the fails, but let's first show all atom types:

import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.atomtype.CDKAtomTypeMatcher;

parser = new SmilesParser(SilentChemObjectBuilder.getInstance())
matcher = CDKAtomTypeMatcher.getInstance(SilentChemObjectBuilder.getInstance())

new File("suppinfo.tsv").eachLine { line ->
  fields = line.split(/\t/)
  id = fields[0]
  smiles = fields[1]
  if (smiles != "SMILES") { // skip the header line
    mol = parser.parseSmiles(smiles)
    println "$id -> $smiles";

    // check CDK atom types
    types = matcher.findMatchingAtomTypes(mol);
    types.each { type ->
      if (type == null) {
        println "  no CDK atom type"
      } else {
        println "  atom type: " + type.atomTypeName
      }
    }
  }
}

This gives output like:

mo1 -> COC
  atom type: C.sp3
  atom type: O.sp3
  atom type: C.sp3

If we rather only report the errors, we make some small modifications and do something like:

new File("suppinfo.tsv").eachLine { line ->
  fields = line.split(/\t/)
  id = fields[0]
  smiles = fields[1]
  if (smiles != "SMILES") {
    mol = parser.parseSmiles(smiles)
    errors = 0
    report = ""

    // check CDK atom types
    types = matcher.findMatchingAtomTypes(mol);
    types.each { type ->
      if (type == null) {
        errors += 1;
        report += "  no CDK atom type\n"
      }
    }

    // report only the fails
    if (errors > 0) {
      println "$id -> $smiles";
      print report;
    }
  }
}

Alternatively, you can use the InChI library to do such checking. And here too, we will use the CDK and the CDK-InChI integration (doi:10.1186/1758-2946-5-14).

import org.openscience.cdk.inchi.InChIGeneratorFactory;
import net.sf.jniinchi.INCHI_RET;

// reuses the parser defined in the first script
factory = InChIGeneratorFactory.getInstance();

new File("suppinfo.tsv").eachLine { line ->
  fields = line.split(/\t/)
  id = fields[0]
  smiles = fields[1]
  if (smiles != "SMILES") {
    mol = parser.parseSmiles(smiles)

    // check InChI warnings
    generator = factory.getInChIGenerator(mol);
    if (generator.returnStatus != INCHI_RET.OKAY) {
      println "$id -> $smiles";
      println generator.message;
    }
  }
}

The advantage of doing this is that it will also give warnings about stereochemistry, like:

mol2 -> BrC(I)(F)Cl
  Omitted undefined stereo

I hope this gives you some ideas on what to do with content in supplementary information of QSAR papers. Of course, this works just as well for MDL molfiles. What kind of validation do you normally do?
Posted Sat Oct 31 09:07:00 2015 Tags:

Here’s where I attempt to revive and popularize a fine old word in a new context.

hieratic, adj. Of or concerning priests; priestly. Often used of the ancient Egyptian writing system of abridged hieroglyphics used by priests.

Earlier today I was criticizing the waf build system in email. I wanted to say that its documentation exhibits a common flaw, which is that it reads not much like an explanation but as a memory aid for people who are already initiates of its inner mysteries. But this was not the main thrust of my argument; I wanted to observe it as an aside.

Here’s what I ended up writing:

waf notation itself relies on a lot of aspect-like side effects and spooky action at a distance. It has much poorer locality and compactness than plain Makefiles or SCons recipes. This is actually waf’s worst downside, other perhaps than the rather hieratic documentation.

I was using “hieratic” in a sense like this:

hieratic, adj. Of computer documentation, impenetrable because the author never sees outside his own intimate knowledge of the subject and is therefore unable to identify or meet the expository needs of newcomers. It might as well be written in hieroglyphics.

Hieratic documentation can be all of complete, correct, and nearly useless at the same time. I think we need this word to distinguish subtle disasters like the waf book – or most of the NTP documentation before I got at it – from the more obvious disasters of documentation that is incorrect, incomplete, or poorly written simply considered as expository prose.

Posted Fri Oct 30 15:21:41 2015 Tags:

An expectation of casual, cynical lying has taken over American political culture. Seldom has this been more obviously displayed than Barack Obama’s address to police chiefs in Chicago two days ago.

Here is what everyone in the United States of America except possibly a handful of mental defectives heard:

Obama’s anti-gun-rights base: “I’m lying. I’m really about the Australian-style gun confiscation I and my media proxies were talking up last week, and you know it. But we have to pretend so the knuckle-dragging mouth-breathers in flyover country will go back to sleep, not put a Republican in the White House in 2016, and not scupper our chances of appointing another Supreme Court justice who’ll burn a hole in the Bill of Rights big enough to let us take away their eeeeevil guns. Eventually, if not next year.”

Gun owners: “I’m lying. And I think you’re so fucking stupid that you won’t notice. Go back to screwing your sisters and guzzling moonshine now, oh low-sloping foreheads, everything will be juuust fiiine.”

Everyone else: “This bullshit again?”

Of course, the mainstream media will gravely pretend to believe Obama, so that they can maintain their narrative that anyone who doesn’t is a knuckle-dragging, mouth-breathing, sister-fucking, Confederate-flag-waving RAAACIST.

Posted Thu Oct 29 11:32:46 2015 Tags:

In the wake of the Ars Technica article on NTP vulnerabilities, and Slashdot coverage, there has been sharply increased public interest in the work NTPsec is doing.

A lot of people have gotten the idea that I’m engaged in a full rewrite of the code, however, and that’s not accurate. What’s actually going on is more like a really massive cleanup and hardening effort. To give you some idea how massive, I report that the codebase is now down to about 43% of the size we inherited – in absolute numbers, down from 227KLOC to 97KLOC.

Details, possibly interesting, follow. But this is more than a summary of work; I’m going to use it to talk about good software-engineering practice by example.

The codebase we inherited, what we call “NTP Classic”, was not horrible. When I was first asked to describe it, the first thought that leapt to my mind was that it looked like really good state-of-the-art Unix systems code – from 1995. That is, before API standardization and a lot of modern practices got rolling. And well before ubiquitous Internet made security hardening the concern it is today.

Dave Mills, the original designer and implementor of NTP, was an eccentric genius, an Internet pioneer, and a systems architect with vision and exceptional technical skills. The basic design he laid down is sound. But it was old code, with old-code problems that his successor (Harlan Stenn) never solved. Problems like being full of port shims for big-iron Unixes from the Late Cretaceous, and the biggest, nastiest autoconf hairball of a build system I’ve ever seen.

Any time you try to modify a codebase like that, you tend to find yourself up to your ass in alligators before you can even get a start on draining the swamp. Not the least of the problems is that a mess like that is almost forbiddingly hard to read. You may be able to tell there’s a monument to good design underneath all the accumulated cruft, but that’s not a lot of help if the cruft is overwhelming.

One thing the cruft was overwhelming was the effort to secure and harden NTP. This was a serious problem; by late last year (2014) NTP was routinely cracked and in use as a DDoS amplifier, with consequences Ars Technica covers pretty well.

I got hired (the details are complicated) because the people who brought me on believed me to be a good enough systems architect to solve the general problem of why this codebase had become such a leaky mess, even if they couldn’t project exactly how I’d do it. (Well, if they could, they wouldn’t need me, would they?)

The approach I chose was to start by simplifying. Chiseling away all the historical platform-specific cruft in favor of modern POSIX APIs, stripping the code to its running gears, tossing out as many superannuated features as I could, and systematically hardening the remainder.

To illustrate what I mean by ‘hardening’, I’ll quote the following paragraph from our hacking guide:

* strcpy, strncpy, strcat:  Use strlcpy and strlcat instead.
* sprintf, vsprintf: use snprintf and vsnprintf instead.
* In scanf and friends, the %s format without length limit is banned.
* strtok: use strtok_r() or unroll this into the obvious loop.
* gets: Use fgets instead. 
* gmtime(), localtime(), asctime(), ctime(): use the reentrant *_r variants.
* tmpnam() - use mkstemp() or tmpfile() instead.
* dirname() - the Linux version is re-entrant but this property is not portable.

This formalized an approach I’d used successfully on GPSD – instead of fixing defects and security holes after the fact, constrain your code so that it cannot have defects. The specific class of defects I was going after here was buffer overruns.
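To make that constraint concrete, here is a minimal sketch of the strlcpy contract – illustrative code, not NTPsec’s; `bounded_copy` is a hypothetical name standing in for strlcpy on platforms whose libc historically lacked it (glibc, for one):

```c
#include <stddef.h>
#include <string.h>

/* strlcpy-style bounded copy: always NUL-terminates when siz > 0,
 * and returns the length of src so the caller can detect truncation
 * instead of silently overrunning the destination buffer. */
size_t bounded_copy(char *dst, const char *src, size_t siz)
{
    size_t srclen = strlen(src);
    if (siz > 0) {
        size_t n = (srclen >= siz) ? siz - 1 : srclen;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;   /* return value >= siz means the copy was truncated */
}
```

Unlike strncpy, this can never leave the destination unterminated, which is why a blanket substitution kills the whole overrun class rather than individual bugs.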

OK, you experienced C programmers out there are thinking “What about wild-pointer and wild-index problems?” And it’s true that the achtung verboten above will not solve those kinds of overruns. But another prong of my strategy was systematic use of static code analyzers like Coverity, which actually is pretty good at picking up the defects that cause that sort of thing. Not 100% perfect, C will always allow you to shoot yourself in the foot, but I knew from prior success with GPSD that the combination of careful coding with automatic defect scanning can reduce the hell out of your bug load.

Another form of hardening is making better use of the type system to express invariants. In one early change, I ran through the entire codebase looking for places where integer flag variables could be turned into C99 booleans. The compiler itself doesn’t do much with this information, but it gives static analyzers more traction.
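A before-and-after sketch of that flag conversion – the field names here are invented for illustration, not taken from ntpd:

```c
#include <stdbool.h>

/* Before: an int used as a flag.  Nothing tells the compiler or a
 * static analyzer that only 0/1 are meaningful:
 *
 *     int leap_pending = 0;
 *     ...
 *     leap_pending = 86400;   // legal C, silently nonsense
 *
 * After: C99 bool states the invariant in the type itself, which is
 * exactly the extra traction analyzers like Coverity can use. */
typedef struct {
    bool leap_pending;   /* was: int */
    bool kernel_sync;    /* was: int */
} clock_state;

static bool needs_step(const clock_state *s)
{
    return s->leap_pending && !s->kernel_sync;
}
```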

Back to chiseling away code. When you do that, and simultaneously code-harden, and use static analyzers, you can find yourself in a virtuous cycle. Simplification enables better static checking. The code becomes more readable. You can remove more dead weight and make more changes with higher confidence. You’re not just flailing.

I’m really good at this game (see: 57% of the code removed). I’m stating that to make a methodological point; being good at it is not magic. I’m not sitting on a mountaintop having satori, I’m applying best practices. The method is replicable. It’s about knowing what best practices are, about being systematic and careful and ruthless. I do have an advantage because I’m very bright and can hold more complex state in my head than most people, but the best practices don’t depend on that personal advantage – its main effect is to make me faster at doing what I ought to be doing anyway.

A best practice I haven’t covered yet is to code strictly to standards. I’ve written before that one of our major early technical decisions was to assume up front that the entire POSIX.1-2001/C99 API would be available on all our target platforms and treat exceptions to that (like Mac OS X not having clock_gettime(2)) as specific defects that need to be isolated and worked around by emulating the standard calls.
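The shape of such a workaround, sketched under the assumption of a configure-time `HAVE_CLOCK_GETTIME` guard – the guard name and the fallback below are illustrative, not NTPsec’s actual code:

```c
#include <time.h>
#include <sys/time.h>

/* On platforms without clock_gettime(2), emulate the POSIX call on
 * top of gettimeofday(2) so the rest of the codebase can use the
 * standard interface unconditionally.  Only CLOCK_REALTIME-like
 * behavior is emulated here. */
#ifndef HAVE_CLOCK_GETTIME
int clock_gettime_fallback(int clk_id, struct timespec *tp)
{
    struct timeval tv;
    (void)clk_id;
    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    tp->tv_sec  = tv.tv_sec;
    tp->tv_nsec = tv.tv_usec * 1000;  /* microseconds -> nanoseconds */
    return 0;
}
#endif
```

The point is the direction of the dependency: the defect is isolated in one shim, instead of #ifdef thickets spreading through every caller.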

This differs dramatically from the traditional Unix policy of leaving all porting shims back to the year zero in place because you never know when somebody might want to build your code on some remnant dinosaur workstation or minicomputer from the 1980s. That tradition is not harmless; the thicket of #ifdefs and odd code snippets that nobody has tested in Goddess knows how many years is a major drag on readability and maintainability. It rigidifies the code – you can wind up too frightened of breaking everything to change anything.

Little things also matter, like fixing all compiler warnings. I thought it was shockingly sloppy that the NTP Classic maintainers hadn’t done this. The pattern detectors behind those warnings are there because they often point at real defects. Also, voluminous warnings make it too easy to miss actual errors that break your build. And you never want to break your build, because later on that will make bisection testing more difficult.
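One example of the kind of real defect those pattern detectors point at – illustrative, not from the NTP codebase:

```c
/* A classic defect class that -Wall catches.  The buggy original:
 *
 *     if (mode = 0)        // assigns, so the condition is always false
 *         return -1;
 *
 * gcc and clang flag it ("suggest parentheses around assignment used
 * as truth value").  With the warning count already at zero, that one
 * line stands out immediately; buried under hundreds of stale
 * warnings, it is easy to miss.  The corrected check: */
int classify(int mode)
{
    if (mode == 0)
        return -1;
    return mode;
}
```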

Yet another important thing to do on an expedition like this is to get permission – or give yourself permission, or fscking take permission – to remove obsolete features in order to reduce code volume, complexity, and attack surface.

NTP Classic had two control programs for the main daemon, one called ntpq and one called ntpdc. ntpq used a textual (“mode 6”) packet protocol to talk to ntpd; ntpdc used a binary one (“mode 7”). Over the years it became clear that ntpd’s handler code for mode 7 messages was a major source of bugs and security vulnerabilities, and ntpq mode 6 was enhanced to match its capabilities. Then ntpdc was deprecated, but not removed – the NTP Classic team had developed a culture of never breaking backward compatibility with anything.

And me? I shot ntpdc through the head specifically to reduce our attack surface. We took the mode 7 handler code out of ntpd. About four days later Cisco sent us a notice of critical DoS vulnerability that wasn’t there for us precisely because we had removed that stuff.

This is why ripping out 130KLOC is actually an even bigger win than the raw numbers suggest. The cruft we removed – the portability shims, the obsolete features, the binary-protocol handling – is disproportionately likely to have maintainability problems, defects and security holes lurking in it and implied by it. It was ever thus.

I cannot pass by the gains from taking a poleaxe to the autotools-based build system. It’s no secret that I walked into this project despising autotools. But the 31KLOC monstrosity I found would have justified a far more intense loathing than I had felt before. Its tentacles were everywhere. A few days ago, when I audited the post-fork commit history of NTP Classic in order to forward-port their recent bug fixes, I could not avoid noticing that a disproportionately large percentage of their commits were just fighting the build system, to the point where the actual C changes looked rather crowded out.

We replaced autotools with waf. It could have been scons – I like scons – but one of our guys is a waf expert and I don’t have time to do everything. It turns out waf is a huge win, possibly a bigger one than scons would have been. I think it produces faster builds than scons – it automatically parallelizes build tasks – which is important.

It’s important because when you’re doing exploratory programming, or mechanical bug-isolation procedures like bisection runs, faster builds reduce your costs. They also have much less tendency to drop you out of a good flow state.

Equally importantly, the waf build recipe is far easier to understand and modify than what it replaced. I won’t deny that waf dependency declarations are a bit cryptic if you’re used to plain Makefiles or scons productions (scons has a pretty clear readability edge over waf, with better locality of information) but the brute fact is this: when your build recipe drops from 31KLOC to 1.1KLOC you are winning big no matter what the new build engine’s language looks like.

The discerning reader will have noticed that though I’ve talked about being a systems architect, none of this sounds much like what you might think systems architects are supposed to do. Big plans! Bold refactorings! Major new features!

I do actually have plans like that. I’ll blog about them in the future. But here is truth: when you inherit a mess like NTP Classic (and you often will), the first thing you need to do is get it to a sound, maintainable, and properly hardened state. The months I’ve spent on that are now drawing to a close. Consequently, we have an internal schedule for first release; I’m not going to announce a date, but think weeks rather than months.

The NTP Classic devs fell into investing increasing effort merely fighting the friction of their own limiting assumptions because they lacked something that Dave Mills had and I have and any systems architect necessarily must have – professional courage. It’s the same quality that a surgeon needs to cut into a patient – the confidence, bordering on arrogance, that you do have what it takes to go in and solve the problem even if there’s bound to be blood on the floor before you’re done.

What justifies that confidence? The kind of best practices I’ve been describing. You have to know what you’re doing, and know that you know what you’re doing. OK, and I fibbed a little earlier. Sometimes there is a kind of Zen to it, especially on your best days. But to get to that you have to draw water and chop wood – you have to do your practice right.

As with GPSD, one of my goals for NTPsec is that it should not only be good software, but a practice model for how to do it right. This essay, in addition to being a progress report, was intended as a down payment on that promise.

To support my work on NTPsec, please pledge at my Patreon or GoFundMe pages.

Posted Fri Oct 23 19:13:18 2015 Tags:

NTPsec is preparing for a release, which brought a question to the forefront of my mind. Are tarballs obsolete?

The center of the open-source software-release ritual used to be making a tarball, dropping it somewhere publicly accessible, and telling the world to download it.

But that was before two things happened: pervasive binary-package managers and pervasive git. Now I wonder if it doesn’t make more sense to just say “Here’s the name of the release tag; git clone and checkout”.

Pervasive binary package managers mean that, generally speaking, people no longer download source code unless they’re either (a) interested in modifying it, or (b) a distributor intending to binary-package it. A repository clone is certainly better for (a) and as good or better for (b).

(Yes, I know about source-based distributions, you can pipe down now. First, they’re too tiny a minority to affect my thinking. Secondly, it would be trivial for their build scripts to include a clone and pull.)

Pervasive git means clones are easy and fast even for projects with a back history as long as NTPsec’s. And we’ve long since passed the point where disk storage is an issue.

Here’s an advantage of the clone/pull distribution system: every clone is implicitly validated by its SHA1 hash chain. It would be much more difficult to insert malicious code in the back history of a repo than it is to bogotify a tarball, because people trying to push to the tip of a modified branch would notice sooner.

What use cases are tarballs still good for? Discuss.

Posted Thu Oct 22 00:31:39 2015 Tags:

On Thursday I was writing some code, and I wanted to test if an array was all zero.  First I checked if ccan/mem had anything, in case I missed it, then jumped on IRC to ask the author (and overall CCAN co-maintainer) David Gibson about it.

We bikeshedded around names: memallzero? memiszero? memeqz? memeqzero() won by analogy with the already-extant memeq and memeqstr. Then I asked:

rusty: dwg: now, how much time do I waste optimizing?
dwg: rusty, in the first commit, none

Exactly five minutes later I had it implemented and tested.

The Naive Approach: Times: 1/7/310/37064 Bytes: 50

bool memeqzero(const void *data, size_t length)
{
    const unsigned char *p = data;

    while (length) {
        if (*p)
            return false;
        p++;
        length--;
    }
    return true;
}
As a summary, I’ll give the nanoseconds for searching through 1, 8, 512 and 65536 bytes only.

Another 20 minutes, and I had written that benchmark, and an optimized version.

128-byte Static Buffer: Times: 6/8/48/5872 Bytes: 108

Here’s my first attempt at optimization; using a static array of 128 bytes of zeroes and assuming memcmp is well-optimized for fixed-length comparisons.  Worse for small sizes, much better for big.

bool memeqzero(const void *data, size_t length)
{
    const unsigned char *p = data;
    static unsigned long zeroes[16];

    while (length > sizeof(zeroes)) {
        if (memcmp(zeroes, p, sizeof(zeroes)))
            return false;
        p += sizeof(zeroes);
        length -= sizeof(zeroes);
    }
    return memcmp(zeroes, p, length) == 0;
}

Using a 64-bit Constant: Times: 12/12/84/6418 Bytes: 169

dwg: but blowing a cacheline (more or less) on zeroes for comparison, which isn’t necessarily a win

Using a single zero uint64_t for comparison is pretty messy:

bool memeqzero(const void *data, size_t length)
{
    const unsigned char *p = data;
    const unsigned long zero = 0;
    size_t pre;

    pre = (size_t)p % sizeof(unsigned long);
    if (pre) {
        size_t n = sizeof(unsigned long) - pre;
        if (n > length)
            n = length;
        if (memcmp(p, &zero, n) != 0)
            return false;
        p += n;
        length -= n;
    }
    while (length > sizeof(zero)) {
        if (*(unsigned long *)p != zero)
            return false;
        p += sizeof(zero);
        length -= sizeof(zero);
    }
    return memcmp(&zero, p, length) == 0;
}

And, worse in every way!

Using a 64-bit Constant With Open-coded Ends: Times: 4/9/68/6444 Bytes: 165

dwg: rusty, what colour is the bikeshed if you have an explicit char * loop for the pre and post?

That’s slightly better, but memcmp still wins over large distances, perhaps due to prefetching or other tricks.

Epiphany #1: We Already Have Zeroes: Times 3/5/92/5801 Bytes: 422

Then I realized that we don’t need a static buffer: we know everything we’ve already tested is zero!  So I open-coded the first 16-byte compare, then memcmp()ed against the previous bytes, doubling each time.  Then a final memcmp for the tail.  Clever huh?

But it was no faster than the static buffer case on the high end, and much bigger.

dwg: rusty, that is brilliant. but being brilliant isn’t enough to make things work, necessarily :p

Epiphany #2: memcmp can overlap: Times 3/5/37/2823 Bytes: 307

My doubling logic above was because my brain wasn’t completely in phase: unlike memcpy, memcmp arguments can happily overlap!  It’s still worth doing an open-coded loop to start (gcc unrolls it here with -O3), but after 16 it’s worth memcmping with the previous 16 bytes.  This is as fast as naive with as little as 2 bytes, and the fastest solution by far with larger numbers:

bool memeqzero(const void *data, size_t length)
{
    const unsigned char *p = data;
    size_t len;

    /* Check first 16 bytes manually */
    for (len = 0; len < 16; len++) {
        if (!length)
            return true;
        if (*p)
            return false;
        p++;
        length--;
    }

    /* Now we know that's zero, memcmp with self. */
    return memcmp(data, p, length) == 0;
}

You can find the final code in CCAN (or on Github) including the benchmark code.

Finally, after about 4 hours of random yak shaving, it turns out lightning doesn’t even want to use memeqzero() any more!  Hopefully someone else will benefit.

Posted Tue Oct 20 00:09:33 2015 Tags:

I first heard about David Brooks' article criticizing Most Likely to Succeed from a Mom at school who told me it was a rebuttal to the movie, and that I should check it out.

I nodded, but did not really expect the article to change my mind.

David Brooks is an all-terrain commentator who dispenses platitudes and opinions on a wide range of topics, usually with little depth or understanding. In my book, anyone who supported and amplified the very fishy evidence for going to war with Iraq has to go an extra mile to prove their worth - and he was especially gross when it came to that.

Considering that the best part of David Brooks's writing is that it often prompts beautiful takedowns from Matt Taibbi, and that his columns have given rise to a cottage industry of bloggers who routinely point out just how wrong he is, my expectations were low.

Anyways, I did read the article.

While the tone of the article is a general disagreement with novel approaches to education, his prescription is bland and generic: you need some basic facts before you can build upon those facts, and by doing this, you will become a wise person.

The question of course is: just how many facts? Because it is one thing to know basic facts about our world, like the fact that there are countries, and another to memorize every date and place of a historic event.

But you won't find an answer to that in Brooks's piece. If there is a case to be made for continuing our traditional education and relying on tests to raise great kids, you will not find it here.

The only thing that transpires from the article is that he has not researched the subject - he is shooting from the hip, an action necessitated by the need to fill eight hundred words a short hour before lunch.

His contribution to the future of education brings as much intellectual curiosity as washing the dishes.

I would rather not shove useless information into our kids. Instead we should fill their most precious years with joy and passion, and give them the tools to plot their own destinies. Raise curious, critical and confident kids.

Ones that, when faced with a new problem, opt for the more rewarding in-depth problem solving - the kind that will have them research, reach out to primary sources, and help us invent the future.

Hopefully we can change education and raise a happier, kinder and better generation of humans. The road to get there will be hard, and we need to empower the teachers and schools that want to bring this change.

"Most Likely to Succeed" represents Forward Motion and helps us start this discussion; David's opinions should be dismissed for what they are: a case of sloppy stop energy.

Do not miss Ted Dintersmith's response to the article, my favorite part:

I agree with Brooks that some, perhaps even many, gain knowledge and wisdom over time. We just don’t gain it in school. It comes when we’re fully immersed in our careers, when we do things, face setbacks, apply our learning, and evolve and progress. But that almost always comes after our formal education is over. I interview a LOT of recent college graduates and I’m not finding lots of knowledge and wisdom. Instead, I find lots of student debt, fear of failure, and formulaic thinking. And what do I rarely see? Passion, purpose, creativity, and audacity.

So, game on, David Brooks and others defending the 19th Century model of education.

Posted Mon Oct 19 17:02:48 2015 Tags:
Sometimes we wish there were good ways to recreate a complex merge, replaying a previously resolved conflict resolution and reapplying a previously made evil merge, when merging a side branch to an updated mainline.

For example, you have a side-branch that consists of two commits, A and B, and you create a merge with the mainline that ends with X, like so:

      A---B
     /     \
 ---o---X---M

resulting in a new merge commit M. When you created this merge, it could be that changes A and B overlapped (either textually or semantically) with what was done on the mainline since the side branch forked, i.e. what was done by X. Such an overlap, if it is textual, would result in a merge conflict. Perhaps X added a new line at the same place A and/or B added a different line, resulting in something like:

original line 1
<<<<<<< HEAD
line added by X
||||||| O (common ancestor)
=======
line added by A
>>>>>>> B
original line 2

which you may resolve, when recording M, to:

original line 1
line added by A
line added by X
original line 2

Expressed in "git show --cc" format, such a merge result would appear this way:

  original line 1
 +line added by A
+ line added by X
  original line 2

A line with two leading spaces is a common line that both branches agree on; a line with a plus in the first column is from the mainline, and a line with a plus in the second column is from the side branch.

If the overlap were not just textual but semantic, you may have to further update parts of files that did not textually conflict. For example, X may have renamed an existing function F to newF, while A or B added new callsites of F. Such a change is not likely to overlap textually, but in the merge result M, you would need to change the new calls you added to F to instead call newF. Such a change may look like this:

  original line 1
 +line added by A
+ line added by X
  original line 2
 -a new call to F() added by A
++a new call to newF() added by A

A line with minus at the second column is what was only in the side branch but that does not appear in the result (i.e. the side branch added the line, but the result does not have it). A line with two pluses at the beginning is what appears in the result but does not exist in either branch.

A merge that introduces such a line that did not exist in either branch is called an evil merge. It is something that no automated textual merge algorithm would have produced.

Now, while you were working on producing the merge M, the mainline may have progressed and gained a new commit Y. You would like to somehow take advantage of what you have already done when you created M to merge your side branch to the updated mainline to produce N:

      A-------B
     /         \
 ---o---X---Y---N

The good news is that, when the evil merge is in a file that also has textual conflicts to resolve, "git rerere" will automatically take care of this situation. All you need to do is to set the configuration variable rerere.enabled to true before attempting the merge between X and B and recording their merge M, and then attempt a new merge between B and Y. Without even having to type "git rerere", the mechanism is invoked by "git merge" to replay the recorded resolution (which is where the name of the machinery, "rerere", comes from).

The bad news is that when an evil merge has to be made to a file that is not involved in any textual conflict (i.e. imagine the case where we didn't have the "line added by A" vs "line added by X" conflict earlier in the same file in the above example), "rerere" does not even kick in. The question is what to do, knowing B, X, and M, to recreate N while keeping the adjustment for semantic conflicts that was needed to record M.

One naive approach would be to take a difference between X and M and apply it to Y. In the previous example, X would have looked like:

original line 1
line added by X
original line 2

and the difference between X and M would be (1) addition of "line added by A", (2) addition of "a new call to newF() added by A", and (3) any other change made by A and B that did not overlap with what X did. Implementation-wise, it is unlikely that we would do this as a "diff | patch" pipeline; most likely we would do it as a three-way merge, i.e.

$ git checkout Y^0
$ git merge-recursive X HEAD M

to compute the state we would have obtained by making the same move as going from X to M, starting at Y, using the index and the working tree.

While that approach would work in simple cases where Y does not do anything interesting, it would not work well in general. The most obvious case is when Y is actually a merge between X and A:

 ---o---X---Y
     \     /
      A---'
       \
        B

The difference between X and M would contain all that was done by A and B, in addition to what was done at M to adjust for textual and semantic conflicts. Replaying that on top of Y, which already contains what was done by A but not B, would end up duplicating what A did. At best, we will get a huge and uninteresting merge conflict. At worst, we will get the same code silently duplicated.

I think the right approach to recreate the (potentially evil) merge M is to consider M as two steps.
The first step is to merge X and B mechanically, and make a tree out of the mechanical merge result, with conflict markers and all. Call it T. The difference between T and M is what the person who made M did to adjust for textual and semantic conflicts.

      A---B
     /     \
 ---o---X--(T)

Then you can think of recreating N as a similar two-step process. The first step is to merge Y and B mechanically, create a tree out of the mechanical merge result, and call it S. Applying the difference between T and M on top of S would give you the textual and semantic adjustments, the same way "git rerere" replays the recorded resolution.

      A-------B
     /         \
 ---o---X---Y--(S)

This should work better, whether or not Y is a merge with A.

$ git checkout X^0
$ git merge --no-commit B
$ git add -u
$ T=$(git write-tree)
$ git reset --hard Y^0
$ git merge --no-commit B
$ git add -u
$ S=$(git commit-tree $(git write-tree) -p HEAD -m S)
$ git checkout $S
$ git merge-recursive $T HEAD M

would compute the result using the index and the working tree, so after eyeballing the result and making sure it makes sense, the above can be concluded with a

$ git commit --amend

Of course, this article is only about outlining the idea. If this proves to be a viable approach, it would make sense to do these procedures inside "rebase --first-parent" or something.

Posted Thu Oct 15 21:47:00 2015 Tags:

High on my list of Things That Annoy Me When I Hack is sourcefiles that contain huge blobs of license text at the top. That is valuable territory which should be occupied by a header comment explaining the code, not a boatload of boilerplate that I’ve seen hundreds of times before.

Hackers have a lot of superstitious ideas about IP law and one is that these blobs are necessary for the license to be binding. They are not: incorporation by reference is a familiar concept to lawyers and courts, it suffices to unambiguously name the license you want to apply rather than quoting it in full.

This is what I do in my code. But to make the practice really comfortable for lawyers we need a registry of standardized license identifiers and an unambiguous way of specifying that we intend to include by reference.

Comes now the Software Package Data Exchange to solve this problem once and for all. It’s a great idea, I endorse it, and I will be using it in all my software projects from now on.

Here is what the hacking guide for NTPsec now says on this topic, lightly edited to remove some project-specific references:

We use the SPDX convention for inclusion by reference; you can read about it at the SPDX website.

When you create a new file, mark it as follows (updating the year as required):

/* Copyright 2015 by the NTPsec project contributors
 * SPDX-License-Identifier: BSD-2-Clause
 */

For documentation:

// Copyright 2015 by the NTPsec project contributors
// SPDX-License-Identifier: CC-BY-4.0

Modify as needed for whatever comment syntax the language or markup uses. Good places for these markings are at the end of an extended header comment, or at the very top of the file.

When you modify a file, leave existing copyrights in place. You may add a project copyright and replace the inline license with an SPDX tag. For example:

/* Copyright 2015 by the NTPsec project contributors
 * SPDX-License-Identifier: NTP
 */

We recognize that occasionally a file may have changed so much that the historic copyright is no longer appropriate, but such decisions cannot be made casually. Discuss it with the project management before making the change.

Posted Thu Oct 15 14:14:46 2015 Tags:

Sometimes you find performance improvements in the simplest places. Last night I improved the time-stepping precision of NTP by a factor of up to a thousand. With a change of less than 20 lines.

The reason I was able to do this is because the NTP code had not caught up to a change in the precision of modern computer clocks. When it was written, you set time with settimeofday(2), which takes a structure containing seconds and microseconds. But modern POSIX-conformant Unixes have a clock_settime(2) which takes a structure containing seconds and nanoseconds.

Internally, NTP represents times to a precision of under a nanosecond. But because the code was built around the old settimeofday(2) call, until last night it rounded to the nearest microsecond too soon, throwing away precision which clock_settime(2) was capable of passing to the system clock.

Once I noticed this it was almost trivial to fix. The round-off only has to happen if your target platform only has settimeofday(2). Moving it into the handler code for that case, and changing one argument-structure declaration, sufficed.

Now, in practice this is not going to yield a full thousand-fold improvement in stepping accuracy, because you can’t get clock sources that accurate. (Well, not unless you’re a national time authority and can afford a cesium-fountain clock.) This change only helps to the extent that your time-server is actually delivering corrections with sub-microsecond accuracy; otherwise those extra bits will be plain noise.

You won’t get that accuracy from a plain GPS, which is seriously wobbly in the 100-millisecond range. Nor from a GPS with 1PPS, which delivers around one microsecond accuracy. But plug in a GPS-conditioned oscillator (GPSDO) and now you’re talking. These commonly have accuracy in about the 100-nanosecond range, so we can expect computing in nanoseconds to actually pass through an order of magnitude in stepping precision.

Pretty good for a 20-line change!

What are our lessons for today?

First…roundoff is insidious. You should always compute at the highest available precision and round off, when you have to, at the latest possible moment. I knew this and had a to-do item in my head to change as many instances of the old struct timeval (microsecond precision) to struct timespec (nanosecond precision) as possible. This is the first place in the NTP code I’ve found that it makes a provable difference. I’ll be hunting others.

Second…you really ought to beat the dust out of your code every couple of years even if it’s working. Because APIs will improve on you, and if you settle for a quick least-effort shim you may throw away significant functional gains without realizing it. A factor of ten is not bupkis, and this one was stupid-easy to collect; I just had to be paying attention. Clearly the NTP Classic maintainers were not.

So, this is my first non-security-related functional improvement in NTP. To be followed by many others, I hope.

Posted Fri Oct 9 11:57:27 2015 Tags:

[ A version of this blog post was crossposted on Conservancy's blog. ]

Would software-related scandals, such as Volkswagen's use of proprietary software to lie to emissions inspectors, cease if software freedom were universal? Likely so, as I wrote last week. In a world where regulations mandate distribution of source code for all the software in all devices, and where no one ever cheats on that rule, VW would need means other than software to hide their treachery.

Universal software freedom is my lifelong goal, but I realized years ago that I won't live to see it. I suspect that generations of software users will need to repeatedly rediscover and face the harms of proprietary software before a groundswell of support demands universal software freedom. In the meantime, our community has invented semi-permanent strategies, such as copyleft, to maximize software freedom for users in our current mixed proprietary and Free Software world.

In the world we live in today, software freedom can impact the VW situation only if a few complex conditions are met. Let's consider the hypothetical series of events that, in today's real world, would have been necessary for Open Source and Free Software to have stopped VW immediately.

First, VW would have created a combined or derivative work of software with a copylefted program. While many cars today contain Linux, which is copylefted, I am not aware of any cars that use Linux outside of the on-board entertainment and climate control systems. The VW software was not part of those systems, and VW engineers almost surely wrote the emissions testing mode code from scratch. Even if they included some non-copylefted Open Source or Free Software in it, those licenses don't require disclosure of any source code; VW's ability to conceal its bad actions with non-copylefted code is roughly identical to the situation of proprietary VW code before us. As a thought experiment, though, let's pretend that VW based the nefarious code on Linux by writing a proprietary Linux module to trick the emissions testing systems.

In that case, VW would have violated the GPL. But that alone is far from enough to ensure anyone would catch VW. Indeed, GPL violations remain very prevalent, and only one organization enforces the GPL for Linux (full disclosure: that's Software Freedom Conservancy, where I work). That organization has such limited enforcement resources (only three people on staff, and enforcement is one of many of our programs), I suspect that years would pass before Conservancy had the resources to pursue the violation; Conservancy currently has hundreds of Linux GPL violations queued for action. Even once opened, most GPL violations take years to resolve. As an example, we are currently enforcing the GPL against one auto manufacturer who has Linux in their car. We've already spent hundreds of hours and the company to date continues to fail in their GPL compliance efforts. Admittedly, it's highly unlikely that particular violator has a GPL-violating Linux module specifically designed to circumvent automotive regulations. However, after enforcing the GPL in that case for more than two years, I still don't have enough data about their use of Linux to even know which proprietary Linux modules are present — let alone whether those modules are nefarious in any way other than as violating Linux's license.

Thus, in today's world, a “software freedom solution” to prevent the VW scandal must meet unbelievable preconditions: (a) VW would have to base all its software on copylefted Open Source and Free Software, and (b) an organization with a mission to enforce copyleft for the public good would require the resources to find the majority of GPL violators and ensure compliance in a timely fashion. This thought experiment quickly shows how much more work remains to advance and defend software freedom. While requirements of source code disclosure, such as those in copyleft licenses, are necessary to assure the benefits of software freedom, they cannot operate unless someone exercises the offers for source and looks at the details.

We live in a world where most of the population accepts proprietary software as legitimate. Even major trade associations, such as the OpenStack Foundation and the Linux Foundation, in the Open Source community laud companies who make proprietary software, as long as they adopt and occasionally contribute to some Free Software too. Currently, it feels like software freedom is winning, because the overwhelming majority in the software industry believe Open Source and Free Software is useful and superior in some circumstances. Furthermore, while I appreciate the aspirational ideal of voluntary Open Source, I find in my work that so many companies, just as VW did, will cheat against important social good policies unless someone watches and regulates. Mere adoption of Open Source won't work alone; we only yield the valuable results of software freedom if software is copylefted and someone upholds that copyleft.

Indeed, just as it has been since the 1980s, very few people believe that software freedom is of fundamental importance for all software users. Scandals, like VW's use of proprietary software to hide other bad acts, might slowly change opinions, but one scandal is rarely enough to permanently change public opinion. I therefore encourage those who support software freedom to take this incident as inspiration for a stronger stance, and to prepare yourselves for the long haul of software freedom advocacy.

Posted Mon Sep 28 19:00:00 2015 Tags:

The issue of software freedom is, not surprisingly, not mentioned in the mainstream coverage of Volkswagen's recent use of proprietary software to circumvent important regulations that exist for the public good. Given that Volkswagen is an upstream contributor to Linux, it's highly likely that Volkswagen vehicles have Linux in them.

Thus, we have a wonderful example of how much we sacrifice at the altar of “Linux adoption”. While I'm glad for some Free Software to appear in products rather than none, I also believe that, too often, our community happily accepts the idea that we should gratefully laud a company that includes a bit of Free Software in its product, and gives a little code back, even if most of what it does is proprietary software.

In this example, a company poisoned people and our environment with out-of-compliance greenhouse gas emissions, and hid its tracks behind proprietary software. IIUC, the EPA had to use an (almost literal) analog hole to catch these scoundrels.

It's not that I'm going to argue that end users should modify the software that verifies emissions standards. But if end users could extract these binaries from the physical device, recompile the source, and verify the binaries match, someone would have discovered this problem immediately when the models drove off the lot.
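That verification step can be sketched in a few lines of shell. This is purely illustrative: the file names and contents are stand-ins, and it assumes the vendor's build is reproducible, so that rebuilding the published source yields a byte-identical binary.

```shell
# Stand-in files: one binary "extracted" from the device, one rebuilt from
# the published source. In reality these would come from a firmware dump
# and a reproducible build, respectively.
printf 'engine-control-v1' > extracted-from-device.bin
printf 'engine-control-v1' > rebuilt-from-source.bin

# Byte-for-byte comparison: a match shows the device runs exactly the
# source the vendor published; a mismatch is the VW scenario.
if cmp -s extracted-from-device.bin rebuilt-from-source.bin; then
    result="MATCH: device binary corresponds to the published source"
else
    result="MISMATCH: device runs something other than the published source"
fi
echo "$result"

rm -f extracted-from-device.bin rebuilt-from-source.bin
```

Of course, real firmware verification also has to account for build timestamps and signing, but the principle is the same: independent rebuild plus comparison.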

So, why does no one demand this? To me, this feels like Diebold and voting machines all over again. So tell me, voters' rights advocates who claimed proprietary software was fine, as long as you could get voter-verified paper records: how are we going to “paper verify” our emissions testing?

Software freedom is the only solution to problems that proprietary software creates. Sadly, opposition to software freedom is so strong, nearly everyone will desperately try every other (failing) solution first.

Posted Wed Sep 23 02:00:00 2015 Tags:

As promised, we now have a version of our IDE powered by Roslyn, Microsoft's open-sourced C# compiler-as-a-service.

When we did the port we found various leaks in the IDE that were made worse by Roslyn, so we decided to take the time and fix those leaks, and optimize our use of Roslyn.

Next Steps

We want to get your feedback on how well it works, and we want to hear about any problems you run into. Once we feel that there are no regressions, we will make this part of the default IDE.

While Roslyn is very powerful, this power comes with a memory consumption price tag. The Roslyn edition of Xamarin Studio will use more memory.

We are working to reduce Roslyn's and Xamarin Studio's memory usage in future versions.

Posted Mon Sep 21 22:17:12 2015 Tags:

[ This post was cross-posted on Conservancy's blog. ]

In this post, I discuss one example of how a choice for software freedom can cause many strange problems that others will dismiss. My goal here is to explain in gory detail how proprietary software biases in the computing world continue to grow, notwithstanding Open Source ballyhoo.

Two decades ago, nearly every company, organization, entity, and tech-minded individual ran their own email server. Generally speaking, even back then, nearly all the software for both MTAs and MUAs was Free Software[0]. MTAs are the mail transport agents — the complex software that moves email around from one Internet domain to another. MUAs are the mail user agents, sometimes called mail clients — the local programs with which users manipulate their own email.

I've run my own MTA since around 1993: initially with sendmail, then with exim for a while, and with Postfix since 1999 or so. Also, everywhere I've worked throughout my entire career since 1995, I've either been in charge of — or been the manager of the person in charge of — the MTA installation for the organization where I worked. In all cases, that MTA has always been Free Software, of course.

However, the world of email has changed drastically during that period. The most notable change in the email world is the influx of massive amounts of spam, which has been used as an excuse to implement another disturbing change. Slowly but surely, email service — both the MTA and the MUA — has been outsourced for most organizations. Specifically, either (a) organizations run proprietary software on their own computers to deal with email and/or (b) people pay a third party to run proprietary and/or trade-secret software on their behalf to handle the email services. Email, generally speaking, isn't handled by Free Software all that much anymore.

This situation became acutely apparent to me earlier this month when Conservancy moved its email server. I had plenty of warning that the move was needed[1], and I'd set up a test site on the new server. We sent and received some of our email for months (mostly mailing list traffic) using that server, configured with a different domain. When the shut-off day came, I moved Conservancy's email officially. All looked good: I had a current Debian, with a new version of Postfix and Dovecot on a speedier host, with better spam protection settings in Postfix and better spam filtering with a newer version of SpamAssassin. All was going great, thanks to all those great Free Software projects — until the proprietary software vendors threw a spanner in our works.

For reasons that we'll never determine for sure[2], the IPv4 number that our new hosting provider gave us was already listed on many spam blacklists. I won't debate the validity of various blacklists here, but the fact is, for nearly every public-facing, pure-blacklist-only service, delisting is straightforward: it takes about 24 hours and requires at most answering some basic questions about your domain name and passing a captcha-like challenge. These services, even though some are quite dubious, are not the center of my complaint.

The real peril comes from third-party email hosting companies. These companies have arbitrary, non-public blacklisting rules. More importantly, they are not merely blacklist maintainers, they are MTA (and in some cases, even MUA) providers who sell their proprietary and/or trade-secret hosted solutions as a package to customers. Years ago, the idea of giving up that much control of what happens to your own email would be considered unbelievable. Today, it's commonplace.

And herein lies the fact that is obvious to most software freedom advocates but indiscernible by most email users. As a Free Software user, with your own MTA on your own machine, your software only functions if everyone else respects your right to run that software yourself. Furthermore, if the people you want to email are fully removed from their hosting service, they won't realize or understand that their hosting site might block your emails. These companies have their customers fully manipulated to oppose your software freedom. In other words, you can't appeal to those customers (the people you want to email), because you're likely the only person to ever raise this issue with them (i.e., unless they know you very well, they'll assume you're crazy). You're left begging the provider, with whom you have no business relationship, to convince them that their customers want to hear from you. Your voice rings out indecipherable from the spammers who want the same permission to attack their customers.

The upshot for Conservancy? For days, Microsoft told all its customers that Conservancy is a spammer; Microsoft did it so subtly that the customers wouldn't even believe it if we told them. Specifically, every time I or one of my Conservancy colleagues emailed organizations using Microsoft's “Exchange Online”, “Office 365” or similar products to host email for their domain[4], we got the following response:

        Sep  2 23:26:26 pine postfix/smtp[31888]: 27CD6E12B: to=,[]:25, delay=5.6, delays=0.43/0/0.16/5, dsn=5.7.1, status=bounced (host[] said: 550 5.7.1 Service unavailable; Client host [] blocked using FBLW15; To request removal from this list please forward this message to (in reply to RCPT TO command))

Oh, you ask, did you forward your message to the specified address? Of course I did; right away! I got back an email that said:

Hello ,

Thank you for your delisting request SRXNUMBERSID. Your ticket was received on (Sep 01 2015 06:13 PM UTC) and will be responded to within 24 hours.

Once we passed the 24-hour mark with no response, I started looking around for more information. I also saw a suggestion online that calling is the only way to escalate one of those tickets, so I phoned 800-865-9408, gave V-2JECOD my ticket number, and she told me that I could only raise these issues with the “Mail Flow Team”. She put me on hold for them, and told me that I was number 2 in the queue, so it should be a few minutes. I waited on hold for just under six hours. I finally reached a helpful representative, who said the ticket was at the lowest level of escalation available (he hinted that it would take weeks to resolve at that level, which is consistent with other comments about this problem I've seen online). The fellow on the phone agreed to escalate it to the highest priority available, and said that within four hours Conservancy should be delisted. Thus, ultimately, I did resolve these issues after about 72 hours. But I'd spent about 15 hours all told researching various blacklists, email hosting companies, and their procedures[3], and that was after I'd already carefully configured our MTA and DNS to be very RFC-compliant (which is complicated and confusing, but absolutely essential to stay off these blacklists once you're off).

Admittedly, this sounds like a standard Kafkaesque experience with a large company that almost everyone in post-modern society has experienced. However, it's different in one key way: I had to convince Microsoft to allow me to communicate with their customers who are paying Microsoft for proprietary and/or trade-secret software and services, ostensibly to improve efficiency of their communications. Plus, since Microsoft, by the nature of their so-called spam blocking, doesn't inform their customers whom they've blocked, I and my colleagues would have just sounded crazy if we'd asked our contacts to call their provider instead. (I actually considered this, and realized that we might negatively impact relationships with professional contacts.)

These problems do reduce email software freedom by network effects. Most people rely on third-party proprietary email software from Google, Microsoft, Barracuda, or others. Therefore, most people don't exercise any software freedom regarding email services. Since exercising software freedom for email slowly becomes rarer and rarer (rather than the norm it once was), society slowly but surely pegs those who do exercise software freedom as “random crazy people”.

There are a few companies who are seeking to do email hosting in a way that respects your software freedom. The real test of such companies is whether someone technically minded can get the same software configured on their own systems, and have it work the same way. Yet, in most cases, you go to one of these companies' GitHub pages and find a bunch of stuff pushed public, but limited information on how to configure it so that it functions the same way the hosted service does. RMS wrote years ago that Free Software cannot properly succeed without Free Documentation, and in many of these hosting cases, the hosting company is using fully upstreamed Free Software, but has configured the software in a way that is difficult to stumble upon by oneself. (For that reason, I'm committing to writing up tutorials on how Conservancy configured our mail server, so at least I'll be part of the solution instead of part of the problem.)

BTW, as I dealt with all this, I couldn't help but think of John Gilmore's activism efforts regarding open mail relays. While I don't agree with all of John's positions on this, his fundamental position is right: we must oppose companies who think they know better how we should configure our email servers (or on which IP numbers we should run those servers). I'd add a corollary that there's a serious threat to software freedom, at least with regard to email software, if we continue to allow such top-down control of the once beautifully decentralized email system.

The future of software freedom depends on issues like this. Imagine someone who has just learned that they can run their own email server, or bought some Free Software-based plug computing system that purports to be a “home cloud” service with email. There's virtually no chance that such users would bother to figure all this out. They'd see their email blocked, declare the “home cloud” solution useless, and would just get some third-party email account. Thus, I predict that the software freedom we once had, for our MTAs and MUAs, will eventually evaporate for everyone except the tiny few who invest the time to understand these complexities and fight the for-profit corporate power that curtails software freedom. Furthermore, that struggle becomes Sisyphean as our numbers dwindle.

Email is the oldest software-centric communication system on the planet. The global email system serves as a canary in the coal mine regarding software freedom and network service freedom issues. Frighteningly, software now controls most of the global communications systems. How long will it be before mobile network providers refuse to terminate PSTN calls or SMS messages sent from devices running modified Android firmwares like Replicant? Perhaps those providers, like large email providers, will argue that preventing robocalls (the telephone equivalent of spam) necessitates such blocking. Such network effects place so many dystopias on software freedom's horizon.

I don't deny that every day, there is more Free Software existing in the world than has ever existed before — the P.T. Barnums of Open Source have that part right. The part they leave out is that, each day, their corporate backers make it a little more difficult to complete mundane tasks using only Free Software. Open Source wins the battle while software freedom loses the war.

[0] Yes, I'm intimately aware that Elm's license was non-free, and that the software freedom of PINE's license was in question. That's slightly relevant here but mostly orthogonal to this point, because Free Software MUAs were still very common then, and there were (ultimately successful) projects to actively rewrite the ones whose software freedom was in question.

[1] For the last five years, one of Conservancy's Directors Emeriti, Loïc Dachary, has donated an extensive amount of personal time and in-kind donations by providing a Cloud server for Conservancy to host its three key servers, including the email server. The burden of maintaining this for us (very reasonably) became too time-consuming, and Loïc asked us to find another provider. I want, BTW, to thank Loïc for his years of volunteer work maintaining infrastructure for us; he provided this service for much longer than we could have hoped! Loïc also gave us plenty of warning that we'd need to move. None of these problems are his fault in the least!

[2] The obvious supposition is that, because IPv4 numbers are so scarce, this particular IP number was likely used previously by a spammer who was shut down.

[3] I of course didn't count the time on phone hold, as I was able to do other work while waiting, but less efficiently because the hold music was very distracting.

[4] If you want to see whether someone's domain is a Microsoft customer, check whether the MX record for that domain points to a Microsoft mail host.

Posted Tue Sep 15 23:30:16 2015 Tags:

Many modern mice have the ability to store profiles, customize button mappings and actions, and switch between several hardware resolutions. A number of those mice are targeted at gamers, but the features are increasingly common in standard mice. Under Linux, support for these devices is spotty, though there are a few projects dedicated to supporting parts of the available device range. [1] [2] [3]

Benjamin Tissoires and I started a new project: libratbag. libratbag is a library to provide a generic interface to these mice, enabling desktop environments to provide configuration tools without having to worry about the device model. As of the time of this writing, we have partial support for the Logitech HID++ 1.0 (G500, G5) and HID++ 2.0 protocols (G303), the Etekcity Scroll Alpha, and the Roccat Kone XTD. Thomas H. P. Anderson already added the G5, G9, and the M705.

git clone

The internal architecture is fairly simple: behind the library's API we have a couple of protocol-specific drivers that access the mouse. The drivers match a specific product/vendor ID combination and load the data from the device; the library then exports it to the caller as a struct ratbag_device. Each device has at least one profile, each profile has a number of buttons and at least one resolution. Where possible, the resolutions can be queried and set, and the buttons can likewise be queried and set for different functions. If the hardware supports it, you can map buttons to other buttons, assign macros, or special functions such as DPI/profile switching. The main goal of libratbag is to unify access to the devices so a configuration application doesn't need different libraries per hardware. Especially in the short term, we envision using some of the projects listed above through custom backends.

We're at version 0.1 at the moment, so the API is still subject to change. It looks like this:

#include <libratbag.h>

struct ratbag *ratbag;
struct ratbag_device *device;
struct ratbag_profile *p;
struct ratbag_button *b;
struct ratbag_resolution *r;

ratbag = ratbag_create_context(...);
device = ratbag_device_new_from_udev(ratbag, udev_device);

/* retrieve the first profile */
p = ratbag_device_get_profile(device, 0);

/* retrieve the first resolution setting of the profile */
r = ratbag_profile_get_resolution(p, 0);
printf("The first resolution is: %ddpi @ %dHz\n",
       ratbag_resolution_get_dpi(r),
       ratbag_resolution_get_report_rate(r));

/* retrieve the button at index 4 */
b = ratbag_profile_get_button(p, 4);

if (ratbag_button_get_action_type(b) == RATBAG_BUTTON_ACTION_TYPE_SPECIAL &&
    ratbag_button_get_special(b) == RATBAG_BUTTON_ACTION_SPECIAL_RESOLUTION_UP)
        printf("button 4 selects next resolution\n");


For testing and playing around with libratbag, we have a tool called ratbag-command that exposes most of the library:

$ ratbag-command info /dev/input/event8
Device 'BTL Gaming Mouse'
Capabilities: res profile btn-key btn-macros
Number of buttons: 11
Profiles supported: 5
Profile 0 (active)
0: 800x800dpi @ 500Hz
1: 800x800dpi @ 500Hz (active)
2: 2400x2400dpi @ 500Hz
3: 3200x3200dpi @ 500Hz
4: 4000x4000dpi @ 500Hz
5: 8000x8000dpi @ 500Hz
Button: 0 type left is mapped to 'button 1'
Button: 1 type right is mapped to 'button 2'
Button: 2 type middle is mapped to 'button 3'
Button: 3 type extra (forward) is mapped to 'profile up'
Button: 4 type side (backward) is mapped to 'profile down'
Button: 5 type resolution cycle up is mapped to 'resolution cycle up'
Button: 6 type pinkie is mapped to 'macro "": H↓ H↑ E↓ E↑ L↓ L↑ L↓ L↑ O↓ O↑'
Button: 7 type pinkie2 is mapped to 'macro "foo": F↓ F↑ O↓ O↑ O↓ O↑'
Button: 8 type wheel up is mapped to 'wheel up'
Button: 9 type wheel down is mapped to 'wheel down'
Button: 10 type unknown is mapped to 'none'
Profile 1

And to toggle/query the various settings on the device:

$ ratbag-command dpi set 400 /dev/input/event8
$ ratbag-command profile 1 resolution 3 dpi set 800 /dev/input/event8
$ ratbag-command profile 0 button 4 set action special doubleclick

libratbag is in a very early state of development. There are a bunch of FIXMEs in the code, the hardware support is still spotty, and we'll appreciate any help we can get, especially with the hardware driver backends. There's a TODO in the repo for some things that we already know need changing. Feel free to browse the repo on GitHub and drop us some patches.

Eventually we want this to be integrated into the desktop environments, either in the respective control panels or in a standalone application. libratbag already provides SVGs for some devices we support but we'll need some designer input for the actual application. Again, any help you want to provide here will be much appreciated.

Posted Tue Sep 15 23:10:00 2015 Tags: