The lower-post-volume people behind the software in Debian. (List of feeds.)

Apparently I never extended the cdk.cite JavaDoc Taglet to use DOIs from the bibliographic database to create hyperlinks in the JavaDoc. But fear no more! I have submitted a simple patch today to add these to the JavaDoc, and I assume it will be part of the next CDK release from the master branch.

Of course, many papers in this bibliographic database (i.e. this cheminf.bibx file) do not have DOIs yet :/

Of course, you can help out here! The only thing you need is a web browser and some knowledge of how to look up DOIs for papers. Just check this blog post (from Step 4 onwards) and line 260 in cheminf.bibx to see what a DOI addition to a BibTeXML entry should look like.
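For readers wondering what "creating a hyperlink from a DOI" boils down to, here is a minimal sketch in Python. It is not the cdk.cite Taglet code (which is Java); the function names are made up for illustration. The only fact it relies on is that a DOI resolves via the https://doi.org/ resolver.

```python
from html import escape
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn a bare or 'doi:'-prefixed DOI into a resolver URL (hypothetical helper)."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Keep the slash between prefix and suffix; percent-encode anything unusual.
    return "https://doi.org/" + quote(doi, safe="/")


def doi_to_html_link(doi: str) -> str:
    """Render an HTML anchor of the kind a documentation tool could emit (hypothetical helper)."""
    url = doi_to_url(doi)
    label = url[len("https://doi.org/"):]
    return f'<a href="{escape(url)}">doi:{escape(label)}</a>'


if __name__ == "__main__":
    print(doi_to_html_link("doi:10.1186/1758-2946-6-3"))
```

The DOI in the usage line is the ring perception paper mentioned elsewhere on this page.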
Posted Tue May 19 17:54:00 2015 Tags:

There’s a deeply annoying class of phenomena which, if you write code for any length of time, you will inevitably encounter. I have found it to be particularly prevalent in transformations to clean up or canonicalize large, complex data sets; repository export tools hit variants of it all the time, and so does my doclifter program for lifting [nt]roff markup to XML-DocBook.

It goes like this. You write code that handles a large fraction (say, 80%) of the problem space in a week. Then you notice that it’s barfing on the 20% remaining edge cases. These will be ugly to handle and greatly increase the complexity of your program, but it can be done, and you do it.

Once again, you have solved 80% of the remaining cases, and it took about a week – because your code is more complex than it used to be; testing it and making sure you don’t have regressions is about twice as difficult. But it can be done, at the cost of doubling your code complexity again, and you do it. Congratulations! You now handle 80% of the remaining cases. Then you notice that it’s barfing on 20% of remaining tricky edge cases….

…lather, rinse, repeat. If the problem space is seriously gnarly you can find yourself in a seemingly neverending cycle in which you’re expending multiplicatively more effort on each cycle for multiplicatively decreasing returns. This is especially likely if your test range is expanding to include weirder data sets – in my case, older and gnarlier repositories or newer and gnarlier manual pages.
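To put rough numbers on the cycle described above, here is a small back-of-the-envelope sketch. The 80% figure and the doubling of complexity come from the narrative; treating them as exact constants for every cycle is an assumption made purely for illustration, and the function name is made up.

```python
def zeno_tarpit(cycles, fraction_solved=0.8, complexity_growth=2.0):
    """Return (cycle, total coverage, relative code complexity) per cycle.

    Assumes every cycle solves the same fraction of what is still broken
    and multiplies code complexity by the same factor; real projects are
    messier than that.
    """
    rows = []
    remaining = 1.0    # fraction of the problem space still unhandled
    complexity = 1.0   # complexity relative to the first version
    for cycle in range(1, cycles + 1):
        remaining *= (1.0 - fraction_solved)
        rows.append((cycle, 1.0 - remaining, complexity))
        complexity *= complexity_growth
    return rows


if __name__ == "__main__":
    for cycle, coverage, complexity in zeno_tarpit(5):
        print(f"cycle {cycle}: coverage {coverage:.4%}, complexity x{complexity:g}")
```

Under these assumptions coverage creeps towards 100% but never gets there, while complexity grows geometrically, which is the whole point of the analogy.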

I think this is a common enough hazard of programming to deserve a name.

If this narrative sounds a bit familiar, you may be thinking of the paradox of motion usually attributed to the philosopher Zeno of Elea. From the Internet Encyclopedia of Philosophy:

In his Achilles Paradox, Achilles races to catch a slower runner–for example, a tortoise that is crawling away from him. The tortoise has a head start, so if Achilles hopes to overtake it, he must run at least to the place where the tortoise presently is, but by the time he arrives there, it will have crawled to a new place, so then Achilles must run to this new place, but the tortoise meanwhile will have crawled on, and so forth. Achilles will never catch the tortoise, says Zeno. Therefore, good reasoning shows that fast runners never can catch slow ones.

In honor of Zeno of Elea, and with some reference to the concept of a Turing tarpit, I propose that we label this programming hazard a “Zeno tarpit”.

Once you know this is a thing you can be watching for it and perhaps avoid overinvesting in improvement cycles that pile up code complexity you will regret later. Also – if somebody asks you why your project has run so long over its expected ship date, “It turned into a Zeno tarpit” is often both true and extremely expressive.

Posted Tue May 19 02:19:48 2015 Tags:
Originally a series I started in the CDK News, later for some issues part of this blog, and then for some time on Google+, CDK Literature is now returning to my blog. BTW, I created a poll about whether CDK News should be picked up again. The reason why we stopped was that we were not getting enough submissions anymore.

For those who are not familiar with the CDK Literature series, the posts discuss recent literature that cites one of the two CDK papers (the first one is now Open Access). A short description explains what the paper is about and why the CDK is cited. For that I am using the CiTO, of which the data is available from CiteULike. That allows me to keep track of how people are using the CDK, resulting, for example, in these wordles.

I will try to pick up this series again, but may be a bit more selective. The number of CDK citing papers has grown extensively, resulting in at least one new paper each week (indeed, not even close to the citation rate of DAVID). I aim at covering ~5 papers each week.

Ring perception
Ring perception has evolved in the CDK. Originally, there was the Figueras algorithm (doi:10.1021/ci960013p) implementation which was improved by Berger et al. (doi:10.1007/s00453-004-1098-x). Now, John May (the CDK release manager) has reworked the ring perception in the CDK, also introducing a new API which I covered recently. Also check John's blog.

May, J. W., Steinbeck, C., Jan. 2014. Efficient ring perception for the chemistry development kit. Journal of Cheminformatics 6 (1), 3+. URL

Screening Assistant 2
A bit longer ago, Vincent Le Guilloux published the second version of their Screening Assistant tool for mining large sets of compounds. The CDK is used for various purposes. The paper is already from 2012 (I am that much behind with this series) and the source code on SourceForge does not seem to have changed much recently.

Figure 2 of the paper (CC-BY) shows an overview of the Screening Assistant GUI.
Guilloux, V. L., Arrault, A., Colliandre, L., Bourg, S., Vayer, P., Morin-Allory, L., Aug. 2012. Mining collections of compounds with screening assistant 2. Journal of Cheminformatics 4 (1), 20+. URL

Similarity and enrichment
Using fingerprints for compound enrichment, i.e. finding the actives in a set of compounds, is a common cheminformatics application. This paper by Avram et al. introduces a new metric (eROCE). I will not go into details, which are best explained by the paper, but note that the CDK is used via PaDEL and that various descriptors and fingerprints are used. The data set they used to show the performance is one of close to 50 thousand inhibitors of ALDH1A1.

Avram, S. I., Crisan, L., Bora, A., Pacureanu, L. M., Avram, S., Kurunczi, L., Mar. 2013. Retrospective group fusion similarity search based on eROCE evaluation metric. Bioorganic & Medicinal Chemistry 21 (5), 1268-1278. URL

The International Chemical Identifier
It is only because Antony Williams advocated the importance of the InChI in these excellent slides that I list this paper again: I covered it here in more detail already. The paper describes work by Sam Adams to wrap the InChI library into a Java library, how it is integrated in the CDK, and how Bioclipse uses it. It does not formally cite the CDK, which now feels silly. Perhaps I did not add it for fear of self-citation? Who knows. Anyway, you find this paper cited on slide 30 in the aforementioned presentation from Tony.

Spjuth, O., Berg, A., Adams, S., Willighagen, E., 2013. Applications of the InChI in cheminformatics with the CDK and bioclipse. Journal of Cheminformatics 5 (1), 14+. URL

Predictive toxicology
Cheminformatics is a key tool in predictive toxicology. It starts with the assumption that compounds of similar structure behave similarly when coming in contact with biological systems. This is a long-standing paradigm which turns out to be quite hard to use, but has not been shown to be incorrect either. This paper proposes a new approach using Pareto points and used the CDK to calculate logP values for compounds. However, I cannot find which algorithm it is using to do so.

Palczewska, A., Neagu, D., Ridley, M., Mar. 2013. Using pareto points for model identification in predictive toxicology. Journal of Cheminformatics 5 (1), 16+. URL

Cheminformatics in Python
ChemoPy is a tool to do cheminformatics in Python. This paper cites the CDK just as one of the tools available for cheminformatics. The tool is available from Google Code. It has not been migrated yet, but they still have about half a year to do so. Then again, given that there does not seem to have been activity since 2013, I recommend looking at Cinfony instead (doi:10.1186/1752-153X-2-24): it exposes the CDK and is still maintained.

Cao, D.-S., Xu, Q.-S., Hu, Q.-N., Liang, Y.-Z., Apr. 2013. ChemoPy: freely available python package for computational biology and chemoinformatics. Bioinformatics 29 (8), 1092-1094. URL
Posted Fri May 15 13:29:00 2015 Tags:
It's been a while since I blogged about a release of my "Groovy Cheminformatics with the CDK" book, but not too long ago I made another release, 1.5.10-0. This was also the first one with white paper, and updated for the latest CDK development release.

There are two versions (and always check the special deals, e.g. today you can use UNPLUG10 to get an additional 10% off the below prices):
  1. paperback, for $25
  2. eBook, for $15, a PDF version
Compared to the 8th edition, this version offers this new material:
  • Chapter 1: Cheminformatics
  • Section 13.3: Ring counts (though it is not updated for John's ring perception work, doi:10.1186/1758-2946-6-3)
  • Section 14.1: Element and Isotope information
  • Section 16.4: SMARTS matching
  • Chapter 20: four more Chemistry Toolkit Rosetta solutions
  • Section 24.1: CDK 1.4 to 1.6 (see also this series)
This version of the book has 204 Groovy scripts, all of which have been tested against CDK 1.5.10.

Posted Fri May 15 10:30:00 2015 Tags:

Duck breasts
Soy sauce
Fennel bulbs

Sous vide the duck breasts with a bit of honey and salt at around 56C for about an hour to 90 minutes (sorry, but sous vide really is the path to tender duck breasts – if you can’t sous vide, then cook them however you like, to rare or medium rare). Let them cool down a little, then fry for a few minutes on each side to brown (if you’ve done the sous vide thing).

Let them rest for 5-10 minutes, slice into 1/4″ slices.

Thinly slice the fennel.

Peel the orange and break the segments into two or three chunks each.

Quickly stirfry the duck breasts for just a short while – 30 seconds or so. Add soy and honey. Throw in the orange chunks and sliced fennel and stirfry until the fennel has wilted slightly and the orange is warm (and the duck is still somewhat rare, so start with pretty rare duck!).

And then you’re done.

I suspect this would be improved with some sesame seeds stirfried just before the duck breasts, but I haven’t tried it yet.

Posted Thu May 14 16:28:01 2015 Tags:

Now that the Universe Splitter is out, it might be that a lot more people are going to trip over the word “mu” and wonder about it. Or it might be the word only occurs in the G+ poll about Universe Splitter – I don’t know, I haven’t seen the app (which appears to be a pretty good joke about the many-worlds interpretation of quantum mechanics) itself.

In any case, the most important thing to know about “mu” is that it is usually the correct answer to the question “Have you stopped beating your wife?”. More generally, it is a way of saying “Neither a yes nor a no would be a correct answer, because your question is incorrect”.

But the history of how it got that meaning is also entertaining.

The word “mu” is originally Chinese, and is one of the ways of saying a simple “no” or “nothing” in that language. It got its special meaning in English because it was borrowed by Japanese and appears in translations of a Zen koan titled “Joshu’s Dog” from the collection called Gateless Gate. To some (but not all) interpreters in the Zen school, the word “mu” in that koan is interpreted in a sense of denying the question.

Wikipedia will tell you this much, tracing the special question-denying sense of “mu” in English through Robert Pirsig’s Zen and the Art of Motorcycle Maintenance (1974) and Douglas Hofstadter’s Gödel, Escher, Bach (1979).

However, Wikipedia’s account is incomplete in two respects. First, it doesn’t report something I learned from the Japanese translator of The Cathedral and the Bazaar, which is that even educated speakers of modern Japanese are completely unaware of the question-denying use of “mu”. She reported that she had to learn it from me!

Second, Wikipedia is missing one important vector of transmission: Discordians, for whom “mu” seems to have had its question-denying sense before 1970 (the date of the 4th edition of Principia Discordia) and from whom Pirsig and Hofstadter may have picked up the word. I suspect most contemporary usage traces through the hacker culture to the Discordians, either directly or through Hofstadter.

Regardless, it’s a useful word which deserves more currency. Sacred Chao says “Mu!” and so should you, early and often!

Posted Mon May 11 12:44:30 2015 Tags:

It’s not news to long-time followers of this blog that I love listening to virtuoso guitarists. Once, long ago in the 1980s I went to see a guitarist named Michael Hedges who astonished the crap out of me. The guy made sounds come out of a wooden flattop that were like nothing else on Earth.

Hedges died a few years later in a car crash, tragically young, and is no longer very well remembered. But I was on IRC yesterday talking music with a friend who mentioned a harmonica and a whistler doing Jimi Hendrix in a “laid back, measured, acoustic style”, and I brought up Hedges because I remembered his cover of All Along The Watchtower as an utterly amazing thing.

Afterwards, in a mood of gentle nostalgia, I searched YouTube for a recording of it. Found one, from the Wolf Trap festival in ’86, and got a surprise.

It was undoubtedly very similar to the performance I heard at around the same time, but…it just didn’t sound that interesting. Technically accomplished, yes, but it didn’t produce the feeling of wonder and awe I experienced then. His original Because It’s There followed on the playlist, and held up better, but…huh?

It didn’t take me long to figure this out. It’s because in 2015 I’m surrounded by guitarists doing what Hedges was doing in the late 1980s. It even has a name these days: “percussive fingerstyle”. Andy McKee, Antoine Dufour, Erik Mongrain, Tommy Emmanuel; players like these come up on my Pandora feed a lot, intermixed with the jazz fusion and progressive metal.

Sometimes progress diminishes its pioneers. It can be difficult to remember how bold an artistic innovation was once we’ve become used to its consequences. Especially when the followers exceed the originator; I must concede that Andy McKee, for example, does Hedges’s thing better than Hedges himself did. It may take memories like mine, acting as a kind of time capsule, to remind us how special the moment of creation was.

(And somewhere out there, some people who made it to Jimi Hendrix concerts when they were very young are nodding at this.)

I’m here to speak up for you, Michael Hedges. Hm... I see Wikipedia doesn’t link him to percussive fingerstyle. I think I’ll fix that.

Posted Tue May 5 18:07:14 2015 Tags:
Summit - Asbjørn Floden (CC BY-NC 2.0)

Just this morning Lubomir released NetworkManager 1.0.2, the latest of the 1.0 stable series.  It’s  a great cleanup and bugfix release with contributions from lots of community members in many different areas of the project!

Some highlights of new functionality and fixes:

  • Wi-Fi device band capability indications, requested by the GNOME Shell team
  • Devices set to ignore carrier that use DHCP configurations will now wait a period of time for the carrier to appear, instead of failing immediately
  • Startup optimizations allow networking-dependent services to be started much earlier by systemd
  • Memory usage reductions through many memory leak fixes and optimizations
  • teamd interface management is now more robust and teamd is respawned when it terminates
  • dnsmasq is now respawned when it terminates in the local caching nameserver configuration
  • Fixes for an IPv6 DoS issue CVE-2015-2924, similar to one fixed recently in the kernel
  • IPv6 Dynamic DNS updates sent through DHCP now work more reliably (and require a fully qualified name, per the RFCs)
  • An IPv6 router solicitation loop due to a non-responsive IPv6 router has been fixed

While the list of generally interesting enhancements may be short, it masks 373 git commits and over 50 bugzilla issues fixed.  It’s a great release and we recommend that everyone upgrade.

Next up is NetworkManager 1.2, with DNS improvements, Wi-Fi scanning and AP list fixes for mobile uses, NM-in-containers improvements (no udev required!), even less dependence on the obsolete dbus-glib, less logging noise, device management fixes, continuing removal of external dependencies (like avahi-autoipd), configuration reload-ability, and much more!

Posted Tue May 5 16:39:37 2015 Tags:

In response to my last post about beamforming, some people wrote to say that MIMO (multi-input multi-output signal processing) is not so complicated and is easy to explain: in fact, your eyes do it.

Now, that's a fun example. It's not exactly how MIMO works in wifi, but it's a good way to start to understand. Let's look at it in more detail.

Imagine you place an array of LEDs some distance away from your eyes (say, 50 meters away).

Assuming there's no interference between the LEDs - which we'll get to in a moment - they can all be considered separate signals. So if they change colour or brightness or toggle on and off, they can each send a message independently of the others. That's common sense, right? And the maximum rate at which each one can send messages is defined by the Shannon limit, based on the colour spectrum your LEDs are able to produce (the "bandwidth" of the signal) and the ambient light (the "noise").

And so, trivially, this is an example of "cheating" the Shannon limit: by using several lights, you can send more data per unit time, using the same frequency spectrum, than the Shannon limit says you can.
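A quick sketch of the arithmetic behind that claim. The Shannon-Hartley formula (capacity = bandwidth × log2(1 + signal-to-noise ratio)) is standard; the specific bandwidth, SNR and channel count below are made-up illustrative numbers, and the function names are mine. The assumption that the channels do not interfere at all is exactly the assumption made in the text.

```python
import math


def shannon_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity of one channel, in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def aggregate_capacity(n_channels, bandwidth_hz, snr_linear):
    """Total capacity of n identical channels, assuming zero interference
    between them (an idealisation; real channels leak into each other)."""
    return n_channels * shannon_capacity(bandwidth_hz, snr_linear)


if __name__ == "__main__":
    bandwidth = 20e6   # 20 MHz, illustrative value only
    snr = 15.0         # linear signal-to-noise ratio, illustrative value only
    print("one channel  :", shannon_capacity(bandwidth, snr), "bit/s")
    print("four channels:", aggregate_capacity(4, bandwidth, snr), "bit/s")
```

Note that no single channel exceeds its own limit here; the "cheat" is simply that each independent spatial path gets its own limit.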


Okay, so why can your eyes so easily bypass the limit, while your wifi can't?

The answer is interference between the individual signals. Above, we assumed that there is no interference between the LEDs. That's (mostly) true in this case, because your eyes act like an array of highly directional antennas. Assuming your eye's lens works correctly, every point on the grid you're looking at will end up projected onto a different sensor on your retina. Like this:

Notice how each light emits rays in every direction, but for the rays that actually reach your eye, all the rays from any one light always end up focused on the same point on your retina. Magic!

The maximum number of lights is limited by the quality of your eye's lens and retina, atmospheric interference, etc. That's why a space telescope - with no atmospheric interference and a really great lens - can pick up such great high-resolution images at huge distances. Each "pixel" in the image can vary its signal with the Shannon limit.

Lenses are pretty cool. Just to put that into perspective, here's what you'd get without a lens:

Notice how every point on the retina is receiving a signal from every light in the array; the result is a meaningless blur. If you want an example of this in real life, imagine setting up a white projector screen 50 meters away from your LED array. What shows up on the screen? Nothing much; just a bunch of blurred colour. As you move the LED array closer and closer to the projector screen, the picture gets clearer and clearer, but is always a bit blurred unless you have a lens.

Most of what you need to know to understand lenses, you probably learned in high school, but since I forgot most of what I learned in high school, I had to look it up. A good reference site seems to be, which has all the lens math I vaguely remembered from school. It also has an Interactive Lenses and Mirrors toy, from which I clipped this helpful example:

In this diagram, the big black arrow on the left is the object you're looking at, and the upside-down black arrow on the right is the image projected onto your retina.

'f' is the focal distance of your eye's lens, which is quite short, since the image is projected on your retina, which is less than 2f away. Since the focal distance is so short, almost any object you look at will be much more than 2f away, as shown in this picture. In that case, mathematically it always turns out that the image will be projected somewhere between distance f and 2f beyond the lens.
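The "between f and 2f" claim falls out of the thin-lens equation, 1/f = 1/d_object + 1/d_image. Here is a minimal sketch that just solves it for the image distance. The 17 mm focal length is an assumed round number for illustration, not a measured property of anyone's eye, and the function name is mine.

```python
def image_distance(focal_length, object_distance):
    """Solve the thin-lens equation 1/f = 1/d_o + 1/d_i for d_i.

    Both arguments must use the same unit. Only meaningful for an ideal
    thin lens with the object farther away than the focal length.
    """
    if object_distance <= focal_length:
        raise ValueError("object must be farther away than the focal length")
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)


if __name__ == "__main__":
    f = 0.017  # metres; assumed illustrative focal length
    for d_o in (0.25, 1.0, 50.0):
        d_i = image_distance(f, d_o)
        print(f"object at {d_o} m -> image at {d_i * 1000:.3f} mm "
              f"(f = {f * 1000:.0f} mm, 2f = {2 * f * 1000:.0f} mm)")
```

An object at exactly 2f images at exactly 2f; anything farther away images closer to f, which is why distant objects all land in that narrow band behind the lens.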

Your eye adjusts the lens (and thus the focal distance f) by contracting or releasing its muscles until the image is focused properly on the retina.

You can generally find the point where any given part of the image will appear by drawing three lines from the input (the top of the left-hand arrow in this case): a line straight through the center of the lens, a line directly horizontal, which bends at the lens to pass through point f on the right, and a line passing through f on the left, which bends at the lens to become horizontal. The point where all three lines intersect is the location of the image.

And that's MIMO, the way evolution intended. Sadly, it's not how MIMO works with wifi, because unlike your eyes, the wifi antennas in your laptop or phone are not very directional, not adjustable, and don't use lenses at all. In fact, what wifi receivers see is more like the blurred mess on the projector screen that we talked about earlier. We then unblur the mess using the amazing power of math. Next time!

Posted Mon May 4 10:10:07 2015 Tags:

Previously I discussed the use of IBLTs (on the pettycoin blog).  Kalle and I got some interesting, but slightly different results; before I revisited them I wanted some real data to play with.

Finally, a few weeks ago I ran 4 nodes for a week, logging incoming transactions and the contents of the mempools when we saw a block.  This gives us some data to chew on when tuning any fast block sync mechanism; here are my first impressions from looking at the data (which is available on github).

These graphs are my first look; in blue is the number of txs in the block, and in purple stacked on top is the number of txs which were left in the mempool after we took those away.

The good news is that all four sites are very similar; there’s small variance across these nodes (three are in Digital Ocean data centres and one is behind two NATs and a wireless network at my local coworking space).

The bad news is that there are spikes of very large mempools around block 352,800; a series of 731kb blocks which I’m guessing is some kind of soft limit for some mining software [EDIT: 750k is the default soft block limit; reported in 1024-byte quantities as does, this is 732k.  Thanks sipa!].  Our ability to handle this case will depend very much on heuristics for guessing which transactions are likely candidates to be in the block at all (I’m hoping it’s as simple as first-seen transactions are most likely, but I haven’t tested yet).
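The unit conversion in the EDIT above is easy to check. This tiny sketch only redoes that arithmetic; the constant name is mine, and the 750,000-byte value is the soft limit quoted in the text.

```python
SOFT_LIMIT_BYTES = 750_000  # default soft block limit quoted above


def bytes_to_kib(n_bytes):
    """Convert a byte count to 1024-byte units."""
    return n_bytes / 1024


if __name__ == "__main__":
    print(bytes_to_kib(SOFT_LIMIT_BYTES))  # 732.421875
```

So a block filled right up to the soft limit shows up as roughly 732k when reported in 1024-byte units, and slightly smaller blocks as 731k.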

Transactions in Mempool and in Blocks: Australia (poor connection)

Transactions in Mempool and in Blocks: Singapore

Transactions in Mempool and in Blocks: San Francisco

Transactions in Mempool and in Blocks: San Francisco (using Relay Network)

Posted Thu Apr 30 12:26:32 2015 Tags:
I just encountered an interesting cherry-pick failure.

The change I was trying to cherry-pick was to remove a hunk of text. Its patch conceptually looked like this:

@@ ... @@
 A
-B
 C

even though the pre-context A, removed text B, and post-context C are all multi-line blocks.
After doing a significant rewrite to the same original codebase (i.e. that had A, B and then C next to each other), the code I wanted to cherry-pick the above commit onto had moved the text around, and the block corresponding to B now comes a lot later. A diff between that state and the original perhaps looked like this:

@@ ... @@
 A
-B
 C
@@ ... @@
+B

And cherry-picking the above change succeeded without doing anything (!?!?).

Logically, this behaviour "makes sense", in the sense that it can be explained. The change wants to make A and C adjacent by removing B, and the three-way merge noticed that the updated codebase already had that removal, so there is nothing that needs to be done. In this particular case, I did not remove B but moved it elsewhere, so what cherry-pick did was wrong, but in other cases I may indeed have removed it without adding the equivalent anywhere else, so it could have been correct. We simply cannot say. I wonder if we should at least flag this "both sides appear to have removed" case as conflicting, but I am not sure how that should be implemented (let alone implemented efficiently). After all, the moved block B might have gone to a completely different file. Would we scan for the matching block of text for the entire working tree?
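To make the idea concrete, here is a toy sketch of the kind of check being wondered about. It is emphatically not how git's three-way merge is implemented; it works on a single file held as a list of lines, the function names and result labels are invented for illustration, and it ignores the "B went to a different file" problem entirely.

```python
def find_block(lines, block):
    """Return the index where `block` occurs contiguously in `lines`, or -1."""
    n = len(block)
    for i in range(len(lines) - n + 1):
        if lines[i:i + n] == block:
            return i
    return -1


def classify_removal(ours, pre, removed, post):
    """Classify what applying 'remove B between A and C' would mean for `ours`.

    `pre`, `removed` and `post` are the A, B and C blocks of the change.
    The labels returned are hypothetical, not git terminology.
    """
    if find_block(ours, pre + removed + post) != -1:
        return "apply"                      # A, B, C still adjacent: remove B
    if find_block(ours, pre + post) != -1:  # A and C already adjacent
        if find_block(ours, removed) != -1:
            return "already-removed-but-moved"  # B lives elsewhere: suspicious
        return "already-removed"                # B is gone: a true no-op
    return "conflict"


if __name__ == "__main__":
    A, B, C = ["a1", "a2"], ["b1", "b2"], ["c1", "c2"]
    rewritten = A + C + ["d"] + B + ["e"]   # B was moved, not deleted
    print(classify_removal(rewritten, A, B, C))
```

Even this toy has to scan the whole file for B, which hints at why doing it across an entire working tree would be expensive.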

This is why you should always look at the output from "git show" for the commit being cherry-picked and the output from "git diff HEAD" before concluding the cherry-pick to see if anything is amiss.

Posted Sun Apr 26 05:56:00 2015 Tags:

It’s Penguicon 2015 at the Westin in Southfield, Michigan, and time for the 2015 Friends of Armed & Dangerous party.

9PM tonight, room 314. Nuclear ghost-pepper brownies will be featured.

Posted Fri Apr 24 13:49:43 2015 Tags:
Screenshot of an old CDK-based
JChemPaint, from the first CDK paper.
CC-BY :)
Already a while ago, the American Chemical Society (ACS) decided to allow the Creative Commons Attribution license (version 4.0) to be used on their papers, via their Author Choice program. ACS members pay $1500, which is low for a traditional publisher. While I would rather see them move to gold Open Access journals, it is a very welcome option! For the ACS business model it means a guaranteed sale of some 40 copies of this paper (at about $35 each), because it will not immediately affect the sale of the full journal (much). Some papers might have sold more than that had they remained closed access, but for many papers that sounds like a smart move money-wise. Of course, they also buy themselves some goodwill and green Open Access is just around the corner anyway.

Better, perhaps, is that you can also use this option to make a past paper Open Access under a CC-BY license! And that is exactly what Christoph Steinbeck did with five of his papers, including two on which I am co-author. And these are not the least of papers either. The first is the first CDK paper from 2003 (doi:10.1021/ci025584y), which featured the screenshot of JChemPaint shown above. Note that in those days, the print journal was still the target, so the screenshot is in gray scale :) BTW, given that this paper is cited 329 times (according to ImpactStory), maybe the ACS could have sold more than 40 copies. But for me, it means that finally people can read this paper about Open Science in chemistry, even after so many years. BTW, there is little chance the second CDK paper will be freed in a similar way.

The second paper that was liberated this way is the first Blue Obelisk paper (doi:10.1021/ci050400b), which was cited 276 times (see ImpactStory):

This screenshot nicely shows how readers can see the CC-BY license for this paper. Note that it also lists that the copyright is with the ACS, which is correct, because in those days you commonly gave away your copyright to the publisher (I have stopped doing this, bar some unfortunate recent exceptions).

So, head over to your email client and let the ACS know you also want your JCICS/JCIM paper available under a CC-BY license! No excuse anymore not to make your seminal work in cheminformatics available as gold Open Access!

Of course, submitting your new work to the Journal of Cheminformatics is cheaper and has the advantage that all papers are Open Access!
Posted Sat Apr 18 10:11:00 2015 Tags:

Back in 2012, Poul-Henning Kamp wrote a disgruntled article in ACM Queue, A Generation Lost in the Bazaar.
It did not occur to me to respond in public at the time, but someone else’s comment on a G+ thread about the article revived the thread. Rereading my reaction, I think it is still worth sharing for the fundamental point about scaling and chaos.

There are quite a lot of defects in the argument of this piece. One is that Kamp (rightly) complains about autoconf, but then leaps from that to a condemnation of the bazaar model without establishing that one implies the other.

I think, also, that when Kamp elevates control by a single person as a necessary way to get quality he is fooling himself about what is even possible at the scale of operating systems like today’s *BSD or Linux, which are far larger than the successful cathedrals of programming legend.

No single person can be responsible at today’s scale; the planning problem is too hard. It isn’t even really possible to “create architecture” because the attempt would exceed human cognitive capacity; the best we can do is make sure that the components of plannable size are clean, hope we get good emergent behavior from the whole system, and try to nudge it towards good outcomes as it evolves.

What this piece speaks of to me is a kind of nostalgia, and a hankering for the control (or just the illusion of control) that we had when our software systems were orders of magnitude smaller. We don’t have the choice that Kamp wants to take anymore, and it may be we only fooled ourselves into thinking we ever had it.

Our choices are all chaos – either chaos harnessed by a transparent, self-correcting social process, or chaos hidden and denied and eating at the roots of our software.

Posted Fri Apr 17 00:57:55 2015 Tags:
In case some readers of this blog would be interested in working with Open Source software and VoIP technologies, Be IP is hiring a developer. Please see the job description. You can contact me directly.
Posted Wed Apr 15 09:58:06 2015 Tags:

I’ve been sent my panel schedule for Penguicon 2015.

Building the “Great Beast of Malvern” – Saturday 5:00 pm

One of us needed a new computer. One of us kicked off the campaign to
fund it. One of us assembled the massive system. One of us installed the
software. We were never all in the same place at the same time. All of us
blogged about it, and had a great time with the whole folderol. Come hear
how Eric “esr” Raymond got his monster machine, with ‘a little help from
his friends’ scattered all over the Internet.

Dark Chocolate Around The World – Sunday 12:00 pm

What makes one chocolate different from others? It’s not just how much
cocoa or sugar it contains or how it’s processed. Different varieties of
cocoa are grown in different parts of the world, and sometimes it’s the type of
beans that makes for different flavor qualities. Join Cathy and Eric Raymond for
a tasting session designed to show you how to tell West African chocolate
from Ecuadorian.

Eric S. Raymond: Ask Me Anything – Sunday 3:00 pm

Ask ESR Anything. What’s he been working on? What’s he shooting?
What’s he thinking about? What’s he building in there?

We do also intend to run the annual “Friends of Armed & Dangerous” party, but don’t yet know if we’re in a party-floor room.

“Geeks With Guns” is already scheduled.

Posted Sun Apr 12 00:28:28 2015 Tags:

This is the fourth part of my series of posts explaining the bitcoin Lightning Networks 0.5 draft paper.  See Part I, Part II and Part III.

The key revelation of the paper is that we can have a network of arbitrarily complicated transactions, such that they aren’t on the blockchain (and thus are fast, cheap and extremely scalable), but at every point are ready to be dropped onto the blockchain for resolution if there’s a problem.  This is genuinely revolutionary.

It also vindicates Satoshi’s insistence on the generality of the Bitcoin scripting system.  And though it’s long been suggested that bitcoin would become a clearing system on which genuine microtransactions would be layered, it was unclear that we were so close to having such a system in bitcoin already.

Note that the scheme requires some solution to malleability to allow chains of transactions to be built (this is a common theme, so likely to be mitigated in a future soft fork), but Gregory Maxwell points out that it also wants selective malleability, so transactions can be replaced without invalidating the HTLCs which are spending their outputs.  Thus it proposes new signature flags, which will require active debate, analysis and another soft fork.

There is much more to discover in the paper itself: recommendations for lightning network routing, the node charging model, a risk summary, the specifics of the softfork changes, and more.

I’ll leave you with a brief list of requirements to make Lightning Networks a reality:

  1. A soft-fork is required, to protect against malleability and to allow new signature modes.
  2. A new peer-to-peer protocol needs to be designed for the lightning network, including routing.
  3. Blame and rating systems are needed for lightning network nodes.  You don’t have to trust them, but it sucks if they go down as your money is probably stuck until the timeout.
  4. More refinements (eg. relative OP_CHECKLOCKTIMEVERIFY) to simplify and tighten timeout times.
  5. Wallets need to learn to use this, with UI handling of things like timeouts and fallbacks to the bitcoin network (sorry, your transaction failed, you’ll get your money back in N days).
  6. You need to be online every 40 days to check that an old HTLC hasn’t leaked, which will require some alternate solution for occasional users (shut down channel, have some third party, etc).
  7. A server implementation needs to be written.

That’s a lot of work!  But it’s all simply engineering from here, just as bitcoin was once the paper was released.  I look forward to seeing it happen (and I’m confident it will).

Posted Wed Apr 8 03:59:37 2015 Tags:

This is the third part of my series of posts explaining the bitcoin Lightning Networks 0.5 draft paper.

In Part I I described how a Poon-Dryja channel uses a single in-blockchain transaction to create off-blockchain transactions which can be safely updated by either party (as long as both agree), with fallback to publishing the latest versions to the blockchain if something goes wrong.

In Part II I described how Hashed Timelocked Contracts allow you to safely make one payment conditional upon another, so payments can be routed across untrusted parties using a series of transactions with decrementing timeout values.
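
As a rough illustration of the decrementing timeouts (a sketch only: the hop names, the one-day step and the function name are assumptions, not taken from the paper), the last hop gets the shortest timeout and each earlier hop gets a longer one, so every intermediary still has time to claim its incoming payment after paying out:

```python
def route_timeouts(hops, final_timeout_days=1, step_days=1):
    """Return (payer, payee, timeout_days) for each hop, first hop first.

    Hypothetical helper: the final hop gets the shortest timeout, and
    every hop closer to the sender gets step_days more.
    """
    n_links = len(hops) - 1
    return [
        (hops[i], hops[i + 1], final_timeout_days + (n_links - 1 - i) * step_days)
        for i in range(n_links)
    ]


# Hypothetical route: Alice pays Dave through Bob and Carol.
schedule = route_timeouts(["Alice", "Bob", "Carol", "Dave"])
for payer, payee, days in schedule:
    print(f"{payer} -> {payee}: times out after {days} day(s)")
```

The printed timeouts shrink hop by hop towards the recipient, which is the property the routing relies on.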

Now we’ll join the two together: encapsulate Hashed Timelocked Contracts inside a channel, so they don’t have to be placed in the blockchain (unless something goes wrong).

Revision: Why Poon-Dryja Channels Work

Here’s half of a channel setup between me and you where I’m paying you 1c: (there’s always a mirror setup between you and me, so it’s symmetrical)

Half a channel: we will invalidate transaction 1 (in favour of a new transaction 2) to send funds.

The system works because after we agree on a new transaction (eg. to pay you another 1c), you revoke this one by handing me your private keys to unlock that 1c output.  Now if you ever release Transaction 1, I can spend both the outputs.  If we want to add a new output to Transaction 1, we need to be able to make it similarly stealable.

Adding a 1c HTLC Output To Transaction 1 In The Channel

I’m going to send you 1c now via an HTLC (which means you’ll only get it if the riddle is answered; if it times out, I get the 1c back).  So we replace transaction 1 with transaction 2, which has three outputs: $9.98 to me, 1c to you, and 1c to the HTLC: (once we agree on the new transactions, we invalidate transaction 1 as detailed in Part I)

Our Channel With an Output for an HTLC

Note that you supply another separate signature (sig3) for this output, so you can reveal that private key later without giving away any other output.

We modify our previous HTLC design so that your revealing sig3 would allow me to steal this output. We do this the same way we did for that 1c going to you: send the output via a timelocked mutually signed transaction.  But there are two transaction paths in an HTLC: the got-the-riddle path and the timeout path, so we need to insert those timelocked mutually signed transactions in both of them.  First let’s append a 1 day delay to the timeout path:

Timeout path of HTLC, with locktime so it can be stolen once you give me your sig3.

Similarly, we need to append a timelocked transaction on the “got the riddle solution” path, which now needs my signature as well (otherwise you could create a replacement transaction and bypass the timelocked transaction):

Full HTLC: If you reveal Transaction 2 after we agree it’s been revoked, and I have your sig3 private key, I can spend that output before you can, down either the settlement or timeout paths.

Remember The Other Side?

Poon-Dryja channels are symmetrical, so the full version has a matching HTLC on the other side (except with my temporary keys, so you can catch me out if I use a revoked transaction).  Here’s the full diagram, just to be complete:

A complete lightning network channel with an HTLC, containing a glorious 13 transactions.

Closing The HTLC

When an HTLC is completed, we just update transaction 2, and don’t include the HTLC output.  The funds either get added to your output (R value revealed before timeout) or my output (timeout).
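
The bookkeeping for adding and closing an HTLC can be sketched as follows. This is a toy model of the output amounts only, with no signatures, locktimes or revocation; the class and method names are hypothetical, SHA-256 stands in for whatever hash the riddle actually uses, and the $9.99 starting balance is inferred from the $9.98 + 1c + 1c outputs above:

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class HalfChannel:
    """Toy model of the outputs of the current transaction, in cents."""
    mine: int
    yours: int
    htlcs: dict = field(default_factory=dict)  # riddle hash (hex) -> amount

    def total(self):
        return self.mine + self.yours + sum(self.htlcs.values())

    def add_htlc(self, riddle_hash, amount):
        """New transaction: move `amount` from my output into an HTLC output."""
        if amount > self.mine:
            raise ValueError("not enough funds in my output")
        self.mine -= amount
        self.htlcs[riddle_hash] = amount

    def settle(self, answer):
        """Riddle answered before the timeout: the HTLC amount joins your output.

        Raises KeyError if the answer does not match any open HTLC.
        """
        riddle_hash = hashlib.sha256(answer).hexdigest()
        self.yours += self.htlcs.pop(riddle_hash)

    def timeout(self, riddle_hash):
        """Timeout reached: the HTLC amount returns to my output."""
        self.mine += self.htlcs.pop(riddle_hash)


answer = b"hypothetical riddle answer"
riddle = hashlib.sha256(answer).hexdigest()

channel = HalfChannel(mine=999, yours=1)   # transaction 1: $9.99 / 1c
channel.add_htlc(riddle, 1)                # transaction 2: $9.98 / 1c / 1c HTLC
channel.settle(answer)                     # updated: $9.98 / 2c, no HTLC output
print(channel.mine, channel.yours, channel.htlcs)
```

Whichever way the HTLC closes, the total across all outputs stays the same; only the split between the two sides changes.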

Note that we can have an arbitrary number of independent HTLCs in progress at once, and open and/or close as many in each transaction update as both parties agree to.

Keys, Keys Everywhere!

Each output for a revocable transaction needs to use a separate address, so we can hand the private key to the other party.  We use two disposable keys for each HTLC[1], and every new HTLC will change one of the other outputs (either mine, if I’m paying you, or yours if you’re paying me), so that needs a new key too.  That’s 3 keys, doubled for the symmetry, to give 6 keys per HTLC.

Adam Back pointed out that we can actually implement this scheme without the private key handover, and instead sign a transaction for the other side which gives them the money immediately.  This would permit more key reuse, but means we’d have to store these transactions somewhere on the off chance we needed them.

Storing just the keys is smaller, but more importantly, Section 6.2 of the paper describes using BIP 32 key hierarchies so the disposable keys are derived: after a while, you only need to store one key for all the keys the other side has given you.  This is vastly more efficient than storing a transaction for every HTLC, and indicates the scale (thousands of HTLCs per second) that the authors are thinking about.
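
To see why derived keys save storage, here is a deliberately simplified sketch. It is not BIP 32 and not the paper's construction: it uses a plain SHA-256 hash chain (an assumption made purely for illustration, with hypothetical function names) to show how the most recently revealed secret can regenerate every earlier one, so only one value needs to be kept:

```python
import hashlib


def older(secret: bytes) -> bytes:
    """Derive the previously revealed secret from a newer one."""
    return hashlib.sha256(secret).digest()


def make_chain(seed: bytes, length: int) -> list:
    """Build secrets in reveal order: chain[0] is handed over first, the seed last."""
    chain = [seed]
    for _ in range(length - 1):
        chain.append(older(chain[-1]))
    chain.reverse()
    return chain


def recover(latest: bytes, steps_back: int) -> bytes:
    """Regenerate a secret revealed `steps_back` reveals before `latest`."""
    for _ in range(steps_back):
        latest = older(latest)
    return latest


chain = make_chain(b"hypothetical seed", 5)

# The receiver keeps only the newest secret it has been given (here chain[3])
# and can still rebuild any earlier one on demand.
stored = chain[3]
print(recover(stored, 3) == chain[0])
```

Because hashing only runs one way, holding an early secret reveals nothing about later ones, while holding the latest one is enough to rebuild all of its predecessors.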

Next: Conclusion

My next post will be a TL;DR summary, and some more references to the implementation details and possibilities provided by the paper.


[1] The new sighash types are fairly loose, and thus allow you to attach a transaction to a different parent if it uses the same output addresses.  I think we could re-use the same keys in both paths if we ensure that the order of keys required is reversed for one, but we’d still need 4 keys, so it seems a bit too tricky.

Posted Mon Apr 6 11:21:26 2015 Tags:

I’ve released shipper 1.7. The main new feature in this release is that it now knows how to play nice with repository collections managed by gitolite and browseable through gitweb, like this one.

What’s new is that shipper (described in detail here shortly before I shipped the 1.0 version) now treats a gitolite/gitweb collection as just another publishing channel. When you call shipper to announce an update on a project in the collection, it updates the ‘description’ and ‘README.html’ files in the repository from the project control file, thus ensuring that the gitweb view of the collection always displays up-to-date metadata.

This is yet more fallout from the impending Gitorious shutdown. I don’t know if my refugee projects from Gitorious will be hosted there indefinitely; I’m considering several alternatives. But while they’re there I might as well figure out how to make updates as easy as possible so nobody else has to solve this problem and everyone’s productivity can go up.

Actually, I’m a little surprised that I have received neither bug reports nor feature requests on shipper since issuing the beta in 2013. This hints that either the software is perfect (highly unlikely) or nobody else has the problem it solves – that is, having to ship releases of software so frequently that one must either automate the process details or go mad.

Is that really true? Am I the only hacker with this problem? Or is there something I’m missing here? An enquiring mind wants to know.

Posted Sun Apr 5 11:50:22 2015 Tags:

Last Sunday I was informed by email that I have been nominated for the 2015 John W. Campbell award for best new science-fiction writer. I was also asked not to reveal this in public until 4 April.

This is a shame. I had a really elaborate April Fool’s joke planned where I was going to announce my nomination in the style of a U.S. presidential campaign launch. Lots of talk about a 50-state strategy and my hopes of appealing to swing voters disaffected with both the SJW and Evil League of Evil extremists, invented polling results, and nine yards of political bafflegab.

The plan was to write it so over-the-top that everyone would go “Oh, ha ha, great AFJ but you can’t fool us”…and then, three days later, the other shoe drops. Alas, I checked in with the organizers and they squelched the idea.

It is, of course, a considerable honor to be nominated, and one I am somewhat doubtful I actually deserve. But after considering the ramifications, I have decided not to decline the nomination, but rather to leave the decision on the merits up to the voters.

I make this choice because, even if I myself doubt that my single story is more than competent midlist work, and I want no part of the messy tribal politics in which I seem to have become partly swept up, there is something I don’t mind representing and giving people the opportunity to vote for.

That something is the proud tradition of classic SF, the Golden Age good stuff and its descendants today. It may be that I am among the least and humblest of those descendants, but I think both the virtues and the faults of Sucker Punch demonstrate vividly where I come from and how much that tradition has informed who I am as a writer and a human being.

If you choose to vote for Sucker Punch as a work which, individually flawed as it may be, upholds that tradition and carries it forward, that will make me happy and proud.

Posted Sat Apr 4 07:47:13 2015 Tags: