This feed omits posts by jwz. Just 'cause.

Ernest Moniz, former Secretary of Energy, warns that the danger of accidental nuclear war is increasing.

Posted Mon Feb 19 00:00:00 2018 Tags:

The Netherlands has so many cows that they can't dispose of the manure safely.

Posted Mon Feb 19 00:00:00 2018 Tags:

Brazil's president Temer got donations from companies "linked to slavery". So did many other Brazilian politicians.

Some of them actively oppose efforts to stamp out slavery.

Posted Mon Feb 19 00:00:00 2018 Tags:

A South Korean presidential candidate campaigns to end the burning of coal world-wide.

Posted Mon Feb 19 00:00:00 2018 Tags:

The formerly repressive prime minister of Ethiopia has resigned. Something complicated seems to be going on there.

Posted Mon Feb 19 00:00:00 2018 Tags:

It is fishy that investigating Kushner and other cronies of the bully for security clearances has taken a whole year. The delays are probably not the fault of the investigators.

Posted Mon Feb 19 00:00:00 2018 Tags:

The saboteur's "infrastructure" plan is designed to make toll roads in places where well-off people are likely to pay the tolls.

Posted Mon Feb 19 00:00:00 2018 Tags:

The Onion: New School Shooter Drill Includes Practicing Pleas To Lawmakers To Do Something About This.

Posted Mon Feb 19 00:00:00 2018 Tags:

Rich people are "less likely than poorer people to exhibit flexibility, empathy, and all the other traits: that lead to healthy, long-term relationships."

Posted Mon Feb 19 00:00:00 2018 Tags:

The bully wants to spend billions of dollars to upgrade B61 nuclear bombs which are kept in Europe, even though it is dangerous to have them there.

Posted Mon Feb 19 00:00:00 2018 Tags:

640px-elephas_maximus_eye_closeup

Bufferbloat is responsible for much of the poor performance seen in the Internet today and causes latency (called “lag” by gamers), triggered even by your own routine web browsing and video playing.

But bufferbloat’s causes and solutions remind me of the old parable of the Blind Men and the Elephant, updated for the modern Internet:

It was six men of Interstan,
 To learning much inclined,
Who sought to fix the Internet,
 (With market share in mind),
That evil of congestion,
 Might soon be left behind.

Definitely, the pipe-men say,
A wider link we laud,
For congestion occurs only
When bandwidth exceeds baud.
If only the reverse were true,
We would all live like lords.

But count the cost, if bandwidth had
No end, money-men say,
But count the cost, if bandwidth had
No end, money-men say,
Perpetual upgrade cycle;
True madness lies that way.

From DiffServ-land, their tried and true
Solution now departs,
Some packets are more equal than
Their lesser counterparts.
But choosing which is which presents
New problems in all arts.

“Every packet is sacred!” cries
The data-centre clan,
Demanding bigger buffers for
Their ever-faster LAN.
To them, a millisecond is
Eternity to plan.

The end-to-end principle guides
A certain band of men,
Detecting when congestion strikes;
Inferring bandwidth then.
Alas, old methods outcompete their
Algorithmic maven.

The Isolationists prefer
Explicit signalling,
Ensuring each and every flow
Is equally a king.
If only all the other men
Would listen to them sing!

And so these men of industry,
Disputed loud and long,
Each in his own opinion,
Exceeding stiff and strong,
Though each was partly in the right,
Yet mostly in the wrong!

So, oft in technologic wars,
The disputants, I ween,
Rail on in utter ignorance,
Of what each other mean,
And prate about an Internet,
Not one of them has seen!

Jonathan Morton, with apologies to John Godfrey Saxe

Most technologists are not truly wise: we are usually like the blind men of Interstan. The TCP experts, network operators, telecom operators, router makers, Internet service operators, router vendors and users have all had a grip *only* on their piece of the elephant.

The TCP experts look at TCP and think “if only TCP were changed” in their favorite way, all latency problems would be solved, forgetting that there are many other causes of saturating an Internet link, and that changing every TCP implementation on the planet will take time measured in decades. With the speed of today’s processors, almost everything can potentially transmit at gigabits per second. Saturated (“congested”) links are *normal* operation in the Internet, not abnormal. And indeed, improved congestion avoidance algorithms such as BBR and mark/drop algorithms such as CoDel and PIE are part of the solution to bufferbloat.

And since TCP is innately “unfair” to flows with different RTT’s, and we have both nearby (~10ms) or (inter)continental distant (~75-200ms) services, no TCP only solution can possibly provide good service. This is a particularly acute problem for people living in remote locations in the world who have great inherent latency differences between local and non-local services. But TCP only solutions cannot solve other problems causing unnecessary latency, and can never achieve really good latency as the queue they build depend on the round trip time (RTT) of the flows.

Network operators often think: “if only I can increase the network bandwidth,” they can banish bufferbloat: but at best, they can only move the bottleneck’s location and that bottleneck’s buffering may even be worse! When my ISP increased the bandwidth to my house several years ago (at no cost, without warning me), they made my service much worse, as the usual bottleneck moved from the broadband link (where I had bufferbloat controlled using SQM) to WiFi in my home router. Suddenly, my typical lag became worse by more than a factor of ten, without having touched anything I owned and having double the bandwidth!

The logical conclusion of solving bufferbloat this way would be to build an ultimately uneconomic and impossible-to-build Internet where each hop is at least as fast as the previous link under all circumstances, and which ultimately collides with the limits of wireless bandwidth available at the edge of the Internet. Here lies madness. Today, we often pay for much more bandwidth than needed for our broadband connections just to reduce bufferbloat’s effects; most applications are more sensitive to latency than bandwidth, and we often see serious bufferbloat issues at peering points as well in the last mile, at home and using cellular systems. Unnecessary latency just hurts everyone.

Internet service operators optimize the experience of their applications to users, but seldom stop to see if the their service damages that of other Internet services and applications. Having a great TV experience, at the cost of good phone or video conversations with others is not a good trade-off.

Telecom operators, have often tried to limit bufferbloat damage by hacking the congestion window of TCP, which does not provides low latency, nor does it prevents severe bufferbloat in their systems when they are loaded or the station remote.

Some packets are much more important to deliver quickly, so as to enable timely behavior of applications. These include ACKS, TCP opens, TLS handshakes, and many other packet types such as VOIP, DHCP and DNS lookups. Applications cannot make progress until those responses return. Web browsing and other interactive applications suffer greatly if you ignore this reality. Some network operation experts have developed complex packet classification algorithms to try to send these packets on their way quickly.

Router manufacturers often have extremely complex packet classification rules and user interfaces, that no sane person can possibly understand. How many of you have successfully configured your home router QOS page, without losing hair from your head?

Lastly, packet networks are inherently bursty, and these packet bursts cause “jitter.” With only First In First Out (FIFO) queuing, bursts of tens or hundreds of packets happen frequently. You don’t want your VOIP packet or other time sensitive to be arbitrarily delayed by “head of line” blocking of bursts of packets in other flows. Attacking the right problem is essential.

We must look to solve all of these problems at once, not just individual aspects of the bufferbloat elephant. Flow Queuing CoDel (FQ_CoDel) is the first, but will not be the last such algorithm. And we must attack bufferbloat however it appears.

Flow Queuing Codel Algorithm: FQ_CoDel

Kathleen Nichols and Van Jacobson invented the CoDel Algorithm (described now formally in RFC 8289), which represents a fundamental advance in Active Queue Management that preceded it, which required careful tuning and could hurt you if not tuned properly.  Their recent blog entry helps explain its importance further. CoDel is based on the notion of “sojourn time”, the time that a packet is in the queue, and drops packets at the head of the queue (rather than tail or random drop). Since the CoDel algorithm is independent of queue length, is self tuning, and is solely dependent on sojourn time, additional combined algorithms become possible not possible with predecessors such as RED.

Dave Täht had been experimenting with Paul McKenney’s Stochastic Fair Queuing algorithm (SFQ) as part of his work on bufferbloat, confirming that many of the issues are caused by head of line blocking (and unfairness of differing RTT’s of competing flows) were well handled by that algorithm.

CoDel and Dave’s and Paul’s work inspired Eric Dumazet to invent the FQ_Codel algorithm almost immediately after the CoDel algorithm became available. Note we refer to it as “Flow Queuing” rather than “Fair Queuing” as it is definitely “unfair” in certain respects rather than a true “Fair Queuing” algorithm.

Synopsis of the FQ_Codel Algorithm Itself

FQ_CoDel, documented in RFC 8290, uses a hash (typically but not limited of the usual 5-tuple identifying a flow), and establishes a queue for each, as does the Stochastic Fair Queuing (SFQ) algorithm. The hash bounds memory usage of the algorithm enabling use on even small embedded router or in hardware. The number of hash buckets (flows) and what constitutes a flow can be adjusted as needed; in practice the flows are independent TCP flows.

There are two sets of flow queues: those flows which have built a queue, and queues for new flows.

The packet scheduler preferentially schedules packets from flows that have not built a queue from those which have built a queue (with provisions to ensure that flows that have built queues cannot be starved, and continue to make progress).

If a flow’s queue empties, and packets again appear later, that flow is considered a new flow again.

Since CoDel only depends on the time in queue, it is easily applied in FQ_CoDel to ensure that TCP flows are properly policed to keep their behavior from building large queues.

CoDel does not require tuning and works over a very wide range of bandwidth and traffic. Combining flow queuing with a older mark/drop algorithm is impossible with older AQM algorithms such as Random Early Detection (RED) which depend on queue length rather than time in queue, and require tuning. Without some mark/drop algorithm such as CoDel, individual flows might fill their FQ_CoDel queues, and while they would not affect other flows, those flows would still suffer bufferbloat.

What Effects Does the FQ_CoDel Algorithm Have?

FQ_Codel has a number of wonderful properties for such a simple algorithm:

  • Simple “request/response” or time based protocols are preferentially scheduled relative to bulk data transport. This means that your VOIP packets, your TCP handshakes, cryptographic associations, your button press in your game, your DHCP or other basic network protocols all get preferential service without the complexity of extensive packet classification, even under very heavy load of other ongoing flows. Your phone call can work well despite large downloads or video use.
  • Dramatically improved service for time sensitive protocols, since ACKS, TCP opens, security associations, DNS lookups, etc., fly through the bottleneck link.
  • Idle applications get immediate service as they return from their idle to active state. This is often due to interactive user’s actions and enabling rapid restart of such flows is preferable to ongoing, time insensitive bulk data transfer.
  • Many/most applications put their most essential information in the first data packets after a connection opens: FQ_CoDel sees that these first packets are quickly forwarded. For example, web browsers need the size information included in an image before being able to complete page layout; this information is typically contained in the first packet of the image. Most later packets in a large flow have less critical time requirements than the first packet(s).
  • Your web browsing is significantly faster, particularly when you share your home connection with others, since head of line blocking is no longer such an issue.
  • Flows that are “well behaved” and do not use more than their share of the available bandwidth at the bottleneck are rewarded for their good citizenship. This provides a positive incentive for good behavior (which the packet pacing work at Google has been exploiting with BBR, for example, which depends on pacing packets). Unpaced packet bursts are poison to “jitter”, that is the variance of latency, which sets how close to real time other algorithms (such as VOIP) need to operate.
  • If there are bulk data flows of differing RTT’s, the algorithm ensures (in the limit) fairness between the flows, solving an issue that cannot itself be solved by TCP.
  • User simplicity: without complex packet classification schemes, FQ_CoDel schedules packets far better than previous rigid classification systems.
  • FQ_CoDel is extremely small, simple and fast and scalable both up and down in bandwidth and # of flows, and down onto very constrained systems. It is safe for FQ_CoDel to be “always on”.

Having an algorithm that “just works” for 99% of cases and attacks the real problem is a huge advantage and simplification. Packet classification need only be used only to provide guarantees when essential, rather than be used to work around bufferbloat. No (sane) human can possibly successfully configure the QOS page found on today’s home router.  Once bufferbloat is removed and flow queuing present, any classification rule set can be tremendously simplified and understood.

Limitations and Warnings

If flow queuing AQM is not present at your bottleneck in your path, nothing you do can save you under load. You can at most move the bottleneck by inserting an artificial bottleneck.

The FQ_CoDel algorithm [RFC 8290], particularly as a queue discipline, is not a panacea for bufferbloat as buffers hide all over today’s systems. There has been about a dozen significant improvements to reduce bufferbloat just in Linux starting with Tom Herbert’s Linux BQL, Eric Dumazet’s TCP Small queues, TSO Sizing and FQ scheduler, just naming a few. Our thanks to everyone in the Linux community for their efforts. FQ_Codel, implemented in its generic form as the Linux queue discipline fq_codel, only solves part of the bufferbloat problem, that found in the qdisc layer. The FQ_CoDel algorithm itself is applicable to a wide variety of circumstances, and not just in networking gear, but even sometimes in applications.

WiFi and Wireless

Good examples of the buffering problems occurs in wireless and other technologies such as DOCSIS and DSL. WiFi is particularly hard: rapid bandwidth variation and packet aggregation means that WiFi drivers must have significant internal buffering. Cellular systems have similar (and different) challenges.

Toke Høiland-Jørgensen, Michał Kazior, Dave Täht, Per Hertig and Anna Brunstrom reported amazing improvements for WiFi at the Usenix Technical Conference. Here, FQ_CoDel algorithm is but part of a more complex algorithm that handles aggregation and transmission air-time fairness (which Linux has heretofore lacked). The results are stunning, and we hope to see similar improvements in other technologies by applying FQ_Codel.

WiFiThis chart shows the latency under load in milliseconds of the new WiFi algorithm, beginning adoption in Linux. The log graph is used to enable plotting results on the same graph. The green curve is for the new driver latency under load (cumulative probability of the latency observed by packets) while the orange graph is for the previous Linux driver implementation. You seldom see 1-2 orders of magnitude improvements in anything. Note that latency can exceed one second under load with existing drivers. And implementation of air-time fairness in this algorithm also significantly increases the total bandwidth available in a busy WiFi network, by using the spectrum more efficiently, while delivering this result! See the above paper for more details and data.

This new driver structure and algorithm is in the process of adoption in Linux, though most drivers do not yet take advantage of what this offers. You should demand your device vendors using Linux update their drivers to the new framework.

While the details may vary for different technologies, we hope the WiFi work will help illuminate techniques appropriate for applying FQ_CoDel (or other flow queuing algorithms, as appropriate) to other technologies such as cellular networks.

Performance and Comparison to Other Work

In the implementations available in Linux, FQ_CoDel has seriously outperformed all comers, as shown in: “The Good, the Bad and the WiFi: Modern AQMs in a residential setting,” by Toke Høiland-Jørgensen, Per Hurtig and Anna Brunstrom. This paper is unique(?) in that it compares the algorithms consistently using the identical tests on all algorithms on the same up to date test hardware and software version, and is an apple-to-apple comparison of running code.

Our favorite benchmark for “latency under load” is the “rrul” test from Toke’s flent test tool. It demonstrates when flow(s) may be damaging the latency of other flow(s) as well as goodput and latency.  Low bandwidth performance is more difficult than high bandwidth. At 1Mbps, a single 1500 byte packet represents 12 milliseconds! This shows fq_codel outperforming all comers in latency, while preserving very high goodput for the available bandwidth. The combination of an effective self adjusting mark/drop algorithm with flow queuing is impossible to beat. The following graph measures goodput versus latency for a number of queue disciplines, at two different bandwidths, of 1 and 10Mbps (not atypical of the low end of WiFi or DSL performance). See the paper for more details and results.

aqmsummary

FQ_CoDel radically outperforms CoDel, while solving problems that no conventional TCP mark/drop algorithm can do by itself. The combination of flow queuing with CoDel is a killer improvement, and improves the behavior of CoDel (or PIE), since ACKS are not delayed.

Pfifo_fast previously has been the default queue discipline in Linux (and similar queuing is used on other operating systems): it is for the purposes of this test, a simple FIFO queue. Note that the default length of the FIFO queue is usually ten times longer than that displayed here in most of today’s operating systems. The FIFO queue length here was chosen so that the differences between algorithms would remain easily visible on the same linear plot!

Other self tuning TCP congestion avoidance mark/drop algorithms such as PIE [RFC 8033] are becoming available, and some of those are beginning to be combined with flow queuing to good effect. As you see above, PIE without flow queuing is not competitive (nor is CoDel). An FQ-PIE algorithm, recently implemented for FreeBSD by Rasool Al-Saadi and Grenville Armitage, is more interesting and promising. Again, flow queuing combined with an auto-adjusting mark/drop algorithm is the key to FQ-PIE’s behavior.

Availability of Modern AQM Algorithms

Implementations of both CoDel and FQ_Codel were released in Linux 3.5 in July 2012 and in FreeBSD 11, in October, 2016 (and backported to FreeBSD 10.3); FQ-PIE is only available in FreeBSD (though preliminary patches for Linux exist).

FQ_CoDel has seen extensive testing, deployment and use in Linux since its initial release. There is never a reason to use CoDel, and is present in Linux solely to enable convenient performance comparisons, as in the above results. FQ_CoDel first saw wide use and deployment as part of OpenWrt when it became the default queue discipline there, and used heavily as part of the SQM (Smart Queue Management) system to mitigate “last mile” bufferbloat. FQ_CoDel is now the default queue discipline on most Linux distributions in many circumstances.  FQ_CoDel is being used in an increasing number of commercial products.

The FQ_CoDel based WiFi improvements shown here are available in the latest LEDE/OpenWrt release for a few chips. WiFi chip vendors and router vendors take note: your competitors may be about to pummel you, as this radical improvement is now becoming available. The honor of the commercial product to incorporate this new WiFi code is the Evenroute IQrouter.

PIE, but not FQ-PIE was mandated in DOCSIS 3.1, and is much less effective than FQ_CoDel (or FQ-PIE).

Continuing Work: CAKE

Research in other AQM’s continue apace. FQ_CoDel and FQ-PIE are far from the last words on the topic.

Rate limiting is necessary to mitigate bufferbloat in equipment that has no effective queue management (such as most existing DSL or most Cable modems). While we have used fq_codel in concert with Hierarchical Token Bucket (HTB) rate limiting for years as part of the Smart Queue Management (SQM) scripts (in OpenWrt and elsewhere), one can go further in an integrated queue discipline to address problems difficult to solve piecemeal, and make configuration radically simpler. The CAKE queue discipline, available today in OpenWrt, additionally adds integrated framing compensation, diffserv support and CoDel improvements. We hope to see CAKE upstream in Linux soon. Please come help in its testing and development.

As you see above, the Make WiFi Fast project is making headway and makes WiFi much to be preferred to cellular service when available, much, much more is left to be done. Investigating fixing bufferbloat in WiFi made us aware of just how much additional improvement in WiFi was also possible. Please come help!

Conclusion

Ye men of Interstan, to learning much inclined, see the elephant, by opening your eyes: 

ISP’s, insist your equipment manufacturers support modern flow queuing automatic queue management. Chip manufacturers: ensure your chips implement best practices in the area, or see your competitors win business.

Gamers, stock traders, musicians, and anyone wanting decent telephone or video calls take note. Insist your ISP fix bufferbloat in its network. Linux and RFC 8290 prove you can have your cake and eat it too. Stop suffering.

Insist on up to date flow queuing queue management in devices you buy and use, and insist that bufferbloat be fixed everywhere.

Posted Mon Feb 12 00:39:15 2018 Tags:

Note

This post is based on a whitepaper I wrote at the beginning of 2016 to be used to help many different companies understand the Linux kernel release model and encourage them to start taking the LTS stable updates more often. I then used it as a basis of a presentation I gave at the Linux Recipes conference in September 2017 which can be seen here.

With the recent craziness of Meltdown and Spectre , I’ve seen lots of things written about how Linux is released and how we handle handles security patches that are totally incorrect, so I figured it is time to dust off the text, update it in a few places, and publish this here for everyone to benefit from.

I would like to thank the reviewers who helped shape the original whitepaper, which has helped many companies understand that they need to stop “cherry picking” random patches into their device kernels. Without their help, this post would be a total mess. All problems and mistakes in here are, of course, all mine. If you notice any, or have any questions about this, please let me know.

Overview

This post describes how the Linux kernel development model works, what a long term supported kernel is, how the kernel developers approach security bugs, and why all systems that use Linux should be using all of the stable releases and not attempting to pick and choose random patches.

Linux Kernel development model

The Linux kernel is the largest collaborative software project ever. In 2017, over 4,300 different developers from over 530 different companies contributed to the project. There were 5 different releases in 2017, with each release containing between 12,000 and 14,500 different changes. On average, 8.5 changes are accepted into the Linux kernel every hour, every hour of the day. A non-scientific study (i.e. Greg’s mailbox) shows that each change needs to be submitted 2-3 times before it is accepted into the kernel source tree due to the rigorous review and testing process that all kernel changes are put through, so the engineering effort happening is much larger than the 8 changes per hour.

At the end of 2017 the size of the Linux kernel was just over 61 thousand files consisting of 25 million lines of code, build scripts, and documentation (kernel release 4.14). The Linux kernel contains the code for all of the different chip architectures and hardware drivers that it supports. Because of this, an individual system only runs a fraction of the whole codebase. An average laptop uses around 2 million lines of kernel code from 5 thousand files to function properly, while the Pixel phone uses 3.2 million lines of kernel code from 6 thousand files due to the increased complexity of a SoC.

Kernel release model

With the release of the 2.6 kernel in December of 2003, the kernel developer community switched from the previous model of having a separate development and stable kernel branch, and moved to a “stable only” branch model. A new release happened every 2 to 3 months, and that release was declared “stable” and recommended for all users to run. This change in development model was due to the very long release cycle prior to the 2.6 kernel (almost 3 years), and the struggle to maintain two different branches of the codebase at the same time.

The numbering of the kernel releases started out being 2.6.x, where x was an incrementing number that changed on every release The value of the number has no meaning, other than it is newer than the previous kernel release. In July 2011, Linus Torvalds changed the version number to 3.x after the 2.6.39 kernel was released. This was done because the higher numbers were starting to cause confusion among users, and because Greg Kroah-Hartman, the stable kernel maintainer, was getting tired of the large numbers and bribed Linus with a fine bottle of Japanese whisky.

The change to the 3.x numbering series did not mean anything other than a change of the major release number, and this happened again in April 2015 with the movement from the 3.19 release to the 4.0 release number. It is not remembered if any whisky exchanged hands when this happened. At the current kernel release rate, the number will change to 5.x sometime in 2018.

Stable kernel releases

The Linux kernel stable release model started in 2005, when the existing development model of the kernel (a new release every 2-3 months) was determined to not be meeting the needs of most users. Users wanted bugfixes that were made during those 2-3 months, and the Linux distributions were getting tired of trying to keep their kernels up to date without any feedback from the kernel community. Trying to keep individual kernels secure and with the latest bugfixes was a large and confusing effort by lots of different individuals.

Because of this, the stable kernel releases were started. These releases are based directly on Linus’s releases, and are released every week or so, depending on various external factors (time of year, available patches, maintainer workload, etc.)

The numbering of the stable releases starts with the number of the kernel release, and an additional number is added to the end of it.

For example, the 4.9 kernel is released by Linus, and then the stable kernel releases based on this kernel are numbered 4.9.1, 4.9.2, 4.9.3, and so on. This sequence is usually shortened with the number “4.9.y” when referring to a stable kernel release tree. Each stable kernel release tree is maintained by a single kernel developer, who is responsible for picking the needed patches for the release, and doing the review/release process. Where these changes are found is described below.

Stable kernels are maintained for as long as the current development cycle is happening. After Linus releases a new kernel, the previous stable kernel release tree is stopped and users must move to the newer released kernel.

Long-Term Stable kernels

After a year of this new stable release process, it was determined that many different users of Linux wanted a kernel to be supported for longer than just a few months. Because of this, the Long Term Supported (LTS) kernel release came about. The first LTS kernel was 2.6.16, released in 2006. Since then, a new LTS kernel has been picked once a year. That kernel will be maintained by the kernel community for at least 2 years. See the next section for how a kernel is chosen to be a LTS release.

Currently the LTS kernels are the 4.4.y, 4.9.y, and 4.14.y releases, and a new kernel is released on average, once a week. Along with these three kernel releases, a few older kernels are still being maintained by some kernel developers at a slower release cycle due to the needs of some users and distributions.

Information about all long-term stable kernels, who is in charge of them, and how long they will be maintained, can be found on the kernel.org release page.

LTS kernel releases average 9-10 patches accepted per day, while the normal stable kernel releases contain 10-15 patches per day. The number of patches fluctuates per release given the current time of the corresponding development kernel release, and other external variables. The older a LTS kernel is, the less patches are applicable to it, because many recent bugfixes are not relevant to older kernels. However, the older a kernel is, the harder it is to backport the changes that are needed to be applied, due to the changes in the codebase. So while there might be a lower number of overall patches being applied, the effort involved in maintaining a LTS kernel is greater than maintaining the normal stable kernel.

Choosing the LTS kernel

The method of picking which kernel the LTS release will be, and who will maintain it, has changed over the years from an semi-random method, to something that is hopefully more reliable.

Originally it was merely based on what kernel the stable maintainer’s employer was using for their product (2.6.16.y and 2.6.27.y) in order to make the effort of maintaining that kernel easier. Other distribution maintainers saw the benefit of this model and got together and colluded to get their companies to all release a product based on the same kernel version without realizing it (2.6.32.y). After that was very successful, and allowed developers to share work across companies, those companies decided to not do that anymore, so future LTS kernels were picked on an individual distribution’s needs and maintained by different developers (3.0.y, 3.2.y, 3.12.y, 3.16.y, and 3.18.y) creating more work and confusion for everyone involved.

This ad-hoc method of catering to only specific Linux distributions was not beneficial to the millions of devices that used Linux in an embedded system and were not based on a traditional Linux distribution. Because of this, Greg Kroah-Hartman decided that the choice of the LTS kernel needed to change to a method in which companies can plan on using the LTS kernel in their products. The rule became “one kernel will be picked each year, and will be maintained for two years.” With that rule, the 3.4.y, 3.10.y, and 3.14.y kernels were picked.

Due to a large number of different LTS kernels being released all in the same year, causing lots of confusion for vendors and users, the rule of no new LTS kernels being based on an individual distribution’s needs was created. This was agreed upon at the annual Linux kernel summit and started with the 4.1.y LTS choice.

During this process, the LTS kernel would only be announced after the release happened, making it hard for companies to plan ahead of time what to use in their new product, causing lots of guessing and misinformation to be spread around. This was done on purpose as previously, when companies and kernel developers knew ahead of time what the next LTS kernel was going to be, they relaxed their normal stringent review process and allowed lots of untested code to be merged (2.6.32.y). The fallout of that mess took many months to unwind and stabilize the kernel to a proper level.

The kernel community discussed this issue at its annual meeting and decided to mark the 4.4.y kernel as a LTS kernel release, much to the surprise of everyone involved, with the goal that the next LTS kernel would be planned ahead of time to be based on the last kernel release of 2016 in order to provide enough time for companies to release products based on it in the next holiday season (2017). This is how the 4.9.y and 4.14.y kernels were picked as the LTS kernel releases.

This process seems to have worked out well, without many problems being reported against the 4.9.y tree, despite it containing over 16,000 changes, making it the largest kernel to ever be released.

Future LTS kernels should be planned based on this release cycle (the last kernel of the year). This should allow SoC vendors to plan ahead on their development cycle to not release new chipsets based on older, and soon to be obsolete, LTS kernel versions.

Stable kernel patch rules

The rules for what can be added to a stable kernel release have remained almost identical for the past 12 years. The full list of the rules for patches to be accepted into a stable kernel release can be found in the Documentation/process/stable_kernel_rules.rst kernel file and are summarized here. A stable kernel change:

  • must be obviously correct and tested.
  • must not be bigger than 100 lines.
  • must fix only one thing.
  • must fix something that has been reported to be an issue.
  • can be a new device id or quirk for hardware, but not add major new functionality
  • must already be merged into Linus’s tree

The last rule, “a change must be in Linus’s tree”, prevents the kernel community from losing fixes. The community never wants a fix to go into a stable kernel release that is not already in Linus’s tree so that anyone who upgrades should never see a regression. This prevents many problems that other projects who maintain a stable and development branch can have.

Kernel Updates

The Linux kernel community has promised its userbase that no upgrade will ever break anything that is currently working in a previous release. That promise was made in 2007 at the annual Kernel developer summit in Cambridge, England, and still holds true today. Regressions do happen, but those are the highest priority bugs and are either quickly fixed, or the change that caused the regression is quickly reverted from the Linux kernel tree.

This promise holds true for both the incremental stable kernel updates, as well as the larger “major” updates that happen every three months.

The kernel community can only make this promise for the code that is merged into the Linux kernel tree. Any code that is merged into a device’s kernel that is not in the kernel.org releases is unknown and interactions with it can never be planned for, or even considered. Devices based on Linux that have large patchsets can have major issues when updating to newer kernels, because of the huge number of changes between each release. SoC patchsets are especially known to have issues with updating to newer kernels due to their large size and heavy modification of architecture specific, and sometimes core, kernel code.

Most SoC vendors do want to get their code merged upstream before their chips are released, but the reality of project-planning cycles and ultimately the business priorities of these companies prevent them from dedicating sufficient resources to the task. This, combined with the historical difficulty of pushing updates to embedded devices, results in almost all of them being stuck on a specific kernel release for the entire lifespan of the device.

Because of the large out-of-tree patchsets, most SoC vendors are starting to standardize on using the LTS releases for their devices. This allows devices to receive bug and security updates directly from the Linux kernel community, without having to rely on the SoC vendor’s backporting efforts, which traditionally are very slow to respond to problems.

It is encouraging to see that the Android project has standardized on the LTS kernels as a “minimum kernel version requirement”. Hopefully that will allow the SoC vendors to continue to update their device kernels in order to provide more secure devices for their users.

Security

When doing kernel releases, the Linux kernel community almost never declares specific changes as “security fixes”. This is due to the basic problem of the difficulty in determining if a bugfix is a security fix or not at the time of creation. Also, many bugfixes are only determined to be security related after much time has passed, so to keep users from getting a false sense of security by not taking patches, the kernel community strongly recommends always taking all bugfixes that are released.

Linus summarized the reasoning behind this behavior in an email to the Linux Kernel mailing list in 2008:

On Wed, 16 Jul 2008, pageexec@freemail.hu wrote:
>
> you should check out the last few -stable releases then and see how
> the announcement doesn't ever mention the word 'security' while fixing
> security bugs

Umm. What part of "they are just normal bugs" did you have issues with?

I expressly told you that security bugs should not be marked as such,
because bugs are bugs.

> in other words, it's all the more reason to have the commit say it's
> fixing a security issue.

No.

> > I'm just saying that why mark things, when the marking have no meaning?
> > People who believe in them are just _wrong_.
>
> what is wrong in particular?

You have two cases:

 - people think the marking is somehow trustworthy.

   People are WRONG, and are misled by the partial markings, thinking that
   unmarked bugfixes are "less important". They aren't.

 - People don't think it matters

   People are right, and the marking is pointless.

In either case it's just stupid to mark them. I don't want to do it,
because I don't want to perpetuate the myth of "security fixes" as a
separate thing from "plain regular bug fixes".

They're all fixes. They're all important. As are new features, for that
matter.

> when you know that you're about to commit a patch that fixes a security
> bug, why is it wrong to say so in the commit?

It's pointless and wrong because it makes people think that other bugs
aren't potential security fixes.

What was unclear about that?

    Linus

This email can be found here, and the whole thread is recommended reading for anyone who is curious about this topic.

When security problems are reported to the kernel community, they are fixed as soon as possible and pushed out publicly to the development tree and the stable releases. As described above, the changes are almost never described as a “security fix”, but rather look like any other bugfix for the kernel. This is done to allow affected parties the ability to update their systems before the reporter of the problem announces it.

Linus describes this method of development in the same email thread:

On Wed, 16 Jul 2008, pageexec@freemail.hu wrote:
>
> we went through this and you yourself said that security bugs are *not*
> treated as normal bugs because you do omit relevant information from such
> commits

Actually, we disagree on one fundamental thing. We disagree on
that single word: "relevant".

I do not think it's helpful _or_ relevant to explicitly point out how to
tigger a bug. It's very helpful and relevant when we're trying to chase
the bug down, but once it is fixed, it becomes irrelevant.

You think that explicitly pointing something out as a security issue is
really important, so you think it's always "relevant". And I take mostly
the opposite view. I think pointing it out is actually likely to be
counter-productive.

For example, the way I prefer to work is to have people send me and the
kernel list a patch for a fix, and then in the very next email send (in
private) an example exploit of the problem to the security mailing list
(and that one goes to the private security list just because we don't want
all the people at universities rushing in to test it). THAT is how things
should work.

Should I document the exploit in the commit message? Hell no. It's
private for a reason, even if it's real information. It was real
information for the developers to explain why a patch is needed, but once
explained, it shouldn't be spread around unnecessarily.

    Linus

Full details of how security bugs can be reported to the kernel community in order to get them resolved and fixed as soon as possible can be found in the kernel file Documentation/admin-guide/security-bugs.rst

Because security bugs are not announced to the public by the kernel team, CVE numbers for Linux kernel-related issues are usually released weeks, months, and sometimes years after the fix was merged into the stable and development branches, if at all.

Keeping a secure system

When deploying a device that uses Linux, it is strongly recommended that all LTS kernel updates be taken by the manufacturer and pushed out to their users after proper testing shows the update works well. As was described above, it is not wise to try to pick and choose various patches from the LTS releases because:

  • The releases have been reviewed by the kernel developers as a whole, not in individual parts
  • It is hard, if not impossible, to determine which patches fix “security” issues and which do not. Almost every LTS release contains at least one known security fix, and many yet “unknown”.
  • If testing shows a problem, the kernel developer community will react quickly to resolve the issue. If you wait months or years to do an update, the kernel developer community will not be able to even remember what the updates were given the long delay.
  • Changes to parts of the kernel that you do not build/run are fine and can cause no problems to your system. To try to filter out only the changes you run will cause a kernel tree that will be impossible to merge correctly with future upstream releases.

Note, this author has audited many SoC kernel trees that attempt to cherry-pick random patches from the upstream LTS releases. In every case, severe security fixes have been ignored and not applied.

As proof of this, I demoed at the Kernel Recipes talk referenced above how trivial it was to crash all of the latest flagship Android phones on the market with a tiny userspace program. The fix for this issue was released 6 months prior in the LTS kernel that the devices were based on, however none of the devices had upgraded or fixed their kernels for this problem. As of this writing (5 months later) only two devices have fixed their kernel and are now not vulnerable to that specific bug.

Posted Mon Feb 5 17:40:11 2018 Tags:

For the last few weeks, Benjamin Tissoires and I have been working on a new project: Tuhi [1], a daemon to connect to and download data from Wacom SmartPad devices like the Bamboo Spark, Bamboo Slate and, eventually, the Bamboo Folio and the Intuos Pro Paper devices. These devices are not traditional graphics tablets plugged into a computer but rather smart notepads where the user's offline drawing is saved as stroke data in vector format and later synchronised with the host computer over Bluetooth. There it can be converted to SVG, integrated into the applications, etc. Wacom's application for this is Inkspace.

There is no official Linux support for these devices. Benjamin and I started looking at the protocol dumps last year and, luckily, they're not completely indecipherable and reverse-engineering them was relatively straightforward. Now it is a few weeks later and we have something that is usable (if a bit rough) and provides the foundation for supporting these devices properly on the Linux desktop. The repository is available on github at https://github.com/tuhiproject/tuhi/.

The main core is a DBus session daemon written in Python. That daemon connects to the devices and exposes them over a custom DBus API. That API is relatively simple, it supports the methods to search for devices, pair devices, listen for data from devices and finally to fetch the data. It has some basic extras built in like temporary storage of the drawing data so they survive daemon restarts. But otherwise it's a three-way mapper from the Bluez device, the serial controller we talk to on the device and the Tuhi DBus API presented to the clients. One such client is the little commandline tool that comes with tuhi: tuhi-kete [2]. Here's a short example:


$> ./tools/tuhi-kete.py
Tuhi shell control
tuhi> search on
INFO: Pairable device: E2:43:03:67:0E:01 - Bamboo Spark
tuhi> pair E2:43:03:67:0E:01
INFO: E2:43:03:67:0E:01 - Bamboo Spark: Press button on device now
INFO: E2:43:03:67:0E:01 - Bamboo Spark: Pairing successful
tuhi> listen E2:43:03:67:0E:01
INFO: E2:43:03:67:0E:01 - Bamboo Spark: drawings available: 1516853586, 1516859506, [...]
tuhi> list
E2:43:03:67:0E:01 - Bamboo Spark
tuhi> info E2:43:03:67:0E:01
E2:43:03:67:0E:01 - Bamboo Spark
Available drawings:
* 1516853586: drawn on the 2018-01-25 at 14:13
* 1516859506: drawn on the 2018-01-25 at 15:51
* 1516860008: drawn on the 2018-01-25 at 16:00
* 1517189792: drawn on the 2018-01-29 at 11:36
tuhi> fetch E2:43:03:67:0E:01 1516853586
INFO: Bamboo Spark: saved file "Bamboo Spark-2018-01-25-14-13.svg"
I won't go into the details because most should be obvious and this is purely a debugging client, not a client we expect real users to use. Plus, everything is still changing quite quickly at this point.

The next step is to get a proper GUI application working. As usual with any GUI-related matter, we'd really appreciate some help :)

The project is young and relying on reverse-engineered protocols means there are still a few rough edges. Right now, the Bamboo Spark and Slate are supported because we have access to those. The Folio should work, it looks like it's a re-packaged Slate. Intuos Pro Paper support is still pending, we don't have access to a device at this point. If you're interested in testing or helping out, come on over to the github site and get started!

[1] tuhi: Maori for "writing, script"
[2] kete: Maori for "kit"

Posted Tue Jan 30 08:59:00 2018 Tags:

I keep getting a lot of private emails about my previous post about the latest status of the Linux kernel patches to resolve both the Meltdown and Spectre issues.

These questions all seem to break down into two different categories, “What is the state of the Spectre kernel patches?”, and “Is my machine vunlerable?”

State of the kernel patches

As always, lwn.net covers the technical details about the latest state of the kernel patches to resolve the Spectre issues, so please go read that to find out that type of information.

And yes, it is behind a paywall for a few more weeks. You should be buying a subscription to get this type of thing!

Is my machine vunlerable?

For this question, it’s now a very simple answer, you can check it yourself.

Just run the following command at a terminal window to determine what the state of your machine is:

1
$ grep . /sys/devices/system/cpu/vulnerabilities/*

On my laptop, right now, this shows:

1
2
3
4
$ grep . /sys/devices/system/cpu/vulnerabilities/*
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Vulnerable
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Vulnerable: Minimal generic ASM retpoline

This shows that my kernel is properly mitigating the Meltdown problem by implementing PTI (Page Table Isolation), and that my system is still vulnerable to the Spectre variant 1, but is trying really hard to resolve the variant 2, but is not quite there (because I did not build my kernel with a compiler to properly support the retpoline feature).

If your kernel does not have that sysfs directory or files, then obviously there is a problem and you need to upgrade your kernel!

Some “enterprise” distributions did not backport the changes for this reporting, so if you are running one of those types of kernels, go bug the vendor to fix that, you really want a unified way of knowing the state of your system.

Note that right now, these files are only valid for the x86-64 based kernels, all other processor types will show something like “Not affected”. As everyone knows, that is not true for the Spectre issues, except for very old CPUs, so it’s a huge hint that your kernel really is not up to date yet. Give it a few months for all other processor types to catch up and implement the correct kernel hooks to properly report this.

And yes, I need to go find a working microcode update to fix my laptop’s CPU so that it is not vulnerable against Spectre…

Posted Fri Jan 19 11:10:09 2018 Tags:

Even these days, I still spend too much time on the command line. My friends still make fun of my MacOS desktop when they see that I run a full screen terminal, and the main program that I am running there is the Midnight Commander:

Every once in a while I write an interactive application, and I want to have full bash-like command line editing, history and search. The Unix world used to have GNU readline as a C library, but I wanted something that worked on both Unix and Windows with minimal dependencies.

Almost 10 years ago I wrote myself a C# library to do this, it works on both Unix and Windows and it was the library that has been used by Mono's interactive C# shell for the last decade or so.

This library used to be called getline.cs, and it was part of a series of single source file libraries that we distributed with Mono.

The idea of distributing libraries that were made up of a single source file did not really catch on. So we have modernized our own ways and now we publish these single-file libraries as NuGet packages that you can use.

You can now add an interactive command line shell with NuGet by installing the Mono.Terminal NuGet package into your application.

We also moved the single library from being part of the gigantic Mono repository into its own repository.

The GitHub page has more information on the key bindings available, how to use the history and how to add code-completion (even including a cute popup).

The library is built entirely on top of System.Console, and is distributed as a .NET Standard library which can run on your choice of .NET runtime (.NET Core, .NET Desktop or Mono).

Check the GitHub project page for more information.

Posted Fri Jan 12 15:10:07 2018 Tags:

Last year, just before the holidays Benjamin Tissoires and I worked on a 'new' project - libevdev-python. This is, unsurprisingly, a Python wrapper to libevdev. It's not exactly new since we took the git tree from 2016 when I was working on it the first time round but this time we whipped it into a better shape. Now it's at the point where I think it has the API it should have, pythonic and very easy to use but still with libevdev as the actual workhorse in the background. It's available via pip3 and should be packaged for your favourite distributions soonish.

Who is this for? Basically anyone who needs to work with the evdev protocol. While C is still a thing, there are many use-cases where Python is a much more sensible choice. The libevdev-python documentation on ReadTheDocs provides a few examples which I'll copy here, just so you get a quick overview. The first example shows how to open a device and then continuously loop through all events, searching for button events:


import libevdev

fd = open('/dev/input/event0', 'rb')
d = libevdev.Device(fd)
if not d.has(libevdev.EV_KEY.BTN_LEFT):
print('This does not look like a mouse device')
sys.exit(0)

# Loop indefinitely while pulling the currently available events off
# the file descriptor
while True:
for e in d.events():
if not e.matches(libevdev.EV_KEY):
continue

if e.matches(libevdev.EV_KEY.BTN_LEFT):
print('Left button event')
elif e.matches(libevdev.EV_KEY.BTN_RIGHT):
print('Right button event')
The second example shows how to create a virtual uinput device and send events through that device:

import libevdev
d = libevdev.Device()
d.name = 'some test device'
d.enable(libevdev.EV_REL.REL_X)
d.enable(libevdev.EV_REL.REL_Y)
d.enable(libevdev.EV_KEY.BTN_LEFT)
d.enable(libevdev.EV_KEY.BTN_MIDDLE)
d.enable(libevdev.EV_KEY.BTN_RIGHT)

uinput = d.create_uinput_device()
print('new uinput test device at {}'.format(uinput.devnode))
events = [InputEvent(libevdev.EV_REL.REL_X, 1),
InputEvent(libevdev.EV_REL.REL_Y, 1),
InputEvent(libevdev.EV_SYN.SYN_REPORT, 0)]
uinput.send_events(events)
And finally, if you have a textual or binary representation of events, the evbit function helps to convert it to something useful:

>>> import libevdev
>>> print(libevdev.evbit(0))
EV_SYN:0
>>> print(libevdev.evbit(2))
EV_REL:2
>>> print(libevdev.evbit(3, 4))
ABS_RY:4
>>> print(libevdev.evbit('EV_ABS'))
EV_ABS:3
>>> print(libevdev.evbit('EV_ABS', 'ABS_X'))
ABS_X:0
>>> print(libevdev.evbit('ABS_X'))
ABS_X:0
The latter is particularly helpful if you have a script that needs to analyse event sequences and look for protocol bugs (or hw/fw issues).

More explanations and details are available in the libevdev-python documentation. That doc also answers the question why libevdev-python exists when there's already a python-evdev package. The code is up on github.

Posted Sun Jan 7 23:45:00 2018 Tags:

By now, everyone knows that something “big” just got announced regarding computer security. Heck, when the Daily Mail does a report on it , you know something is bad…

Anyway, I’m not going to go into the details about the problems being reported, other than to point you at the wonderfully written Project Zero paper on the issues involved here. They should just give out the 2018 Pwnie award right now, it’s that amazingly good.

If you do want technical details for how we are resolving those issues in the kernel, see the always awesome lwn.net writeup for the details.

Also, here’s a good summary of lots of other postings that includes announcements from various vendors.

As for how this was all handled by the companies involved, well this could be described as a textbook example of how NOT to interact with the Linux kernel community properly. The people and companies involved know what happened, and I’m sure it will all come out eventually, but right now we need to focus on fixing the issues involved, and not pointing blame, no matter how much we want to.

What you can do right now

If your Linux systems are running a normal Linux distribution, go update your kernel. They should all have the updates in them already. And then keep updating them over the next few weeks, we are still working out lots of corner case bugs given that the testing involved here is complex given the huge variety of systems and workloads this affects. If your distro does not have kernel updates, then I strongly suggest changing distros right now.

However there are lots of systems out there that are not running “normal” Linux distributions for various reasons (rumor has it that it is way more than the “traditional” corporate distros). They rely on the LTS kernel updates, or the normal stable kernel updates, or they are in-house franken-kernels. For those people here’s the status of what is going on regarding all of this mess in the upstream kernels you can use.

Meltdown – x86

Right now, Linus’s kernel tree contains all of the fixes we currently know about to handle the Meltdown vulnerability for the x86 architecture. Go enable the CONFIG_PAGE_TABLE_ISOLATION kernel build option, and rebuild and reboot and all should be fine.

However, Linus’s tree is currently at 4.15-rc6 + some outstanding patches. 4.15-rc7 should be out tomorrow, with those outstanding patches to resolve some issues, but most people do not run a -rc kernel in a “normal” environment.

Because of this, the x86 kernel developers have done a wonderful job in their development of the page table isolation code, so much so that the backport to the latest stable kernel, 4.14, has been almost trivial for me to do. This means that the latest 4.14 release (4.14.12 at this moment in time), is what you should be running. 4.14.13 will be out in a few more days, with some additional fixes in it that are needed for some systems that have boot-time problems with 4.14.12 (it’s an obvious problem, if it does not boot, just add the patches now queued up.)

I would personally like to thank Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, Peter Zijlstra, Josh Poimboeuf, Juergen Gross, and Linus Torvalds for all of the work they have done in getting these fixes developed and merged upstream in a form that was so easy for me to consume to allow the stable releases to work properly. Without that effort, I don’t even want to think about what would have happened.

For the older long term stable (LTS) kernels, I have leaned heavily on the wonderful work of Hugh Dickins, Dave Hansen, Jiri Kosina and Borislav Petkov to bring the same functionality to the 4.4 and 4.9 stable kernel trees. I had also had immense help from Guenter Roeck, Kees Cook, Jamie Iles, and many others in tracking down nasty bugs and missing patches. I want to also call out David Woodhouse, Eduardo Valentin, Laura Abbott, and Rik van Riel for their help with the backporting and integration as well, their help was essential in numerous tricky places.

These LTS kernels also have the CONFIG_PAGE_TABLE_ISOLATION build option that should be enabled to get complete protection.

As this backport is very different from the mainline version that is in 4.14 and 4.15, there are different bugs happening, right now we know of some VDSO issues that are getting worked on, and some odd virtual machine setups are reporting strange errors, but those are the minority at the moment, and should not stop you from upgrading at all right now. If you do run into problems with these releases, please let us know on the stable kernel mailing list.

If you rely on any other kernel tree other than 4.4, 4.9, or 4.14 right now, and you do not have a distribution supporting you, you are out of luck. The lack of patches to resolve the Meltdown problem is so minor compared to the hundreds of other known exploits and bugs that your kernel version currently contains. You need to worry about that more than anything else at this moment, and get your systems up to date first.

Also, go yell at the people who forced you to run an obsoleted and insecure kernel version, they are the ones that need to learn that doing so is a totally reckless act.

Meltdown – ARM64

Right now the ARM64 set of patches for the Meltdown issue are not merged into Linus’s tree. They are staged and ready to be merged into 4.16-rc1 once 4.15 is released in a few weeks. Because these patches are not in a released kernel from Linus yet, I can not backport them into the stable kernel releases (hey, we have rules for a reason…)

Due to them not being in a released kernel, if you rely on ARM64 for your systems (i.e. Android), I point you at the Android Common Kernel tree All of the ARM64 fixes have been merged into the 3.18, 4.4, and 4.9 branches as of this point in time.

I would strongly recommend just tracking those branches as more fixes get added over time due to testing and things catch up with what gets merged into the upstream kernel releases over time, especially as I do not know when these patches will land in the stable and LTS kernel releases at this point in time.

For the 4.4 and 4.9 LTS kernels, odds are these patches will never get merged into them, due to the large number of prerequisite patches required. All of those prerequisite patches have been long merged and tested in the android-common kernels, so I think it is a better idea to just rely on those kernel branches instead of the LTS release for ARM systems at this point in time.

Also note, I merge all of the LTS kernel updates into those branches usually within a day or so of being released, so you should be following those branches no matter what, to ensure your ARM systems are up to date and secure.

Spectre

Now things get “interesting”…

Again, if you are running a distro kernel, you might be covered as some of the distros have merged various patches into them that they claim mitigate most of the problems here. I suggest updating and testing for yourself to see if you are worried about this attack vector

For upstream, well, the status is there is no fixes merged into any upstream tree for these types of issues yet. There are numerous patches floating around on the different mailing lists that are proposing solutions for how to resolve them, but they are under heavy development, some of the patch series do not even build or apply to any known trees, the series conflict with each other, and it’s a general mess.

This is due to the fact that the Spectre issues were the last to be addressed by the kernel developers. All of us were working on the Meltdown issue, and we had no real information on exactly what the Spectre problem was at all, and what patches were floating around were in even worse shape than what have been publicly posted.

Because of all of this, it is going to take us in the kernel community a few weeks to resolve these issues and get them merged upstream. The fixes are coming in to various subsystems all over the kernel, and will be collected and released in the stable kernel updates as they are merged, so again, you are best off just staying up to date with either your distribution’s kernel releases, or the LTS and stable kernel releases.

It’s not the best news, I know, but it’s reality. If it’s any consolation, it does not seem that any other operating system has full solutions for these issues either, the whole industry is in the same boat right now, and we just need to wait and let the developers solve the problem as quickly as they can.

The proposed solutions are not trivial, but some of them are amazingly good. The Retpoline post from Paul Turner is an example of some of the new concepts being created to help resolve these issues. This is going to be an area of lots of research over the next years to come up with ways to mitigate the potential problems involved in hardware that wants to try to predict the future before it happens.

Other arches

Right now, I have not seen patches for any other architectures than x86 and arm64. There are rumors of patches floating around in some of the enterprise distributions for some of the other processor types, and hopefully they will surface in the weeks to come to get merged properly upstream. I have no idea when that will happen, if you are dependant on a specific architecture, I suggest asking on the arch-specific mailing list about this to get a straight answer.

Conclusion

Again, update your kernels, don’t delay, and don’t stop. The updates to resolve these problems will be continuing to come for a long period of time. Also, there are still lots of other bugs and security issues being resolved in the stable and LTS kernel releases that are totally independent of these types of issues, so keeping up to date is always a good idea.

Right now, there are a lot of very overworked, grumpy, sleepless, and just generally pissed off kernel developers working as hard as they can to resolve these issues that they themselves did not cause at all. Please be considerate of their situation right now. They need all the love and support and free supply of their favorite beverage that we can provide them to ensure that we all end up with fixed systems as soon as possible.

Posted Sat Jan 6 15:20:26 2018 Tags:
Cover of the book.
No worries, this is just about my Groovy Cheminformatics book. Seven years ago I started a project that was very educational to me: self-publishing a book. With the help from Lulu.com I managed to get a book out that sold over 100 copies and that was regularly updated. But their lies the problem: supply creates demand. So, I had a system that supplied me with an automated set up that reran scripts, recreated text output and even figures for the book (2D chemical diagrams). I wanted to make an edition for every CDK release. All in all, I got quite far with that: eleven editions.

But the current research setting, or at least in academia, does not provide me with the means to keep this going. Sad thing is, the hardest part is actually updating the graphics for the cover, which needs to resize each time the book gets ticker. But John Mayfield introduced so many API changes, I just did not have the time to update the book. I tried, and I have a twelfth edition on my desk. But where my automated setup scales quite nicely, I don't.

It may we worth reiterating why I started the book. We have had several places where information was given, and questions were answered: the mailing list, wiki pages, JavaDoc, the Chemistry Toolkit Rosetta Wiki, and more. Nothing in the book was not already answered somewhere else. The book was just a boon for me to answer those questions and provide an easy way for people to get many answers.

Now, because I could not keep up with the recent API changes, I am no longer feeling comfortable with releasing the book. As such, I have "retired" the book.

I am now working out on how to move from here. An earlier edition is already online under a Creative Commons license, and it's tempting to release the latest version like this too. That said, I have also been talking with the other CDK project leaders about alternatives. More on this soon, I guess.

Here's an overview of posts about the book:
Posted Sat Jan 6 00:14:00 2018 Tags:

Earlier this year, in February, I wrote a blog post encouraging people to donate to where I work, Software Freedom Conservancy. I've not otherwise blogged too much this year. It's been a rough year for many reasons, and while I personally and Conservancy in general have accomplished some very important work this year, I'm reminded as always that more resources do make things easier.

I understand the urge, given how bad the larger political crises have gotten, to want to give to charities other than those related to software freedom. There are important causes out there that have become more urgent this year. Here's three issues which have become shockingly more acute this year:

  • making sure the USA keeps it commitment to immigrants to allow them make a new life here just like my own ancestors did,
  • assuring that the great national nature reserves are maintained and left pristine for generations to come,
  • assuring that we have zero tolerance abusive behavior — particularly by those in power against people who come to them for help and job opportunities.
These are just three of the many issues this year that I've seen get worse, not better. I am glad that I know and support people who work on these issues, and I urge everyone to work on these issues, too.

Nevertheless, as I plan my primary donations this year, I'm again, as I always do, giving to the FSF and my own employer, Software Freedom Conservancy. The reason is simple: software freedom is still an essential cause and it is frankly one that most people don't understand (yet). I wrote almost two years ago about the phenomenon I dubbed Kuhn's Paradox. Simply put: it keeps getting more and more difficult to avoid proprietary software in a normal day's tasks, even while the number of lines of code licensed freely gets larger every day.

As long as that paradox remains true, I see software freedom as urgent. I know that we're losing ground on so many other causes, too. But those of you who read my blog are some of the few people in the world that understand that software freedom is under threat and needs the urgent work that the very few software-freedom-related organizations, like the FSF and Software Freedom Conservancy are doing. I hope you'll donate now to both of them. For my part, I gave $120 myself to FSF as part of the monthly Associate Membership program, and in a few minutes, I'm going to give $400 to Conservancy. I'll be frank: if you work in technology in an industrialized country, I'm quite sure you can afford that level of money, and I suspect those amounts are less than most of you spent on technology equipment and/or network connectivity charges this year. Make a difference for us and give to the cause of software freedom at least as much a you're giving to large technology companies.

Finally, a good reason to give to smaller charities like FSF and Conservancy is that your donation makes a bigger difference. I do think bigger organizations, such as (to pick an example of an organization I used to give to) my local NPR station does important work. However, I was listening this week to my local NPR station, and they said their goal for that day was to raise $50,000. For Conservancy, that's closer to a goal we have for entire fundraising season, which for this year was $75,000. The thing is: NPR is an important part of USA society, but it's one that nearly everyone understands. So few people understand the threats looming from proprietary software, and they may not understand at all until it's too late — when all their devices are locked down, DRM is fully ubiquitous, and no one is allowed to tinker with the software on their devices and learn the wonderful art of computer programming. We are at real risk of reaching that distopia before 90% of the world's population understands the threat!

Thus, giving to organizations in the area of software freedom is just going to have a bigger and more immediate impact than more general causes that more easily connect with people. You're giving to prevent a future that not everyone understands yet, and making an impact on our work to help explain the dangers to the larger population.

Posted Sun Dec 31 11:50:00 2017 Tags:
Ten alkanes in Wikidata. The ones without CAS regsitry
number previously did not have InChIKey or
PubChem CID. But no more; I added those.
While working on the 'chemical class' aspect for Scholia yesterday I noted that the page for alkanes was quite large, with a list of more than 50 long chain alkanes with pages in the Japanese Wikipedia with no SMILES, InChI, InChIKey, etc.

So, I dug up my Bioclipse scripts to add chemicals to Wikidata starting with a SMILES (btw, the script has significantly evolved since) and extended the query of that Scholia aspect to list just the Wikidata Q-code and name.  This script starts with one or more SMILES strings and generated QuickStatements (a must-learner).

Because the Wikidata entries also had the English IUPAC name, I can use that to autogenerate SMILES. Enter the OPSIN (doi:10.1021/ci100384d) plugin for Bioclipse which in combination with the CDK allowed me to create the matching SMILES, InChI, InChIKey, and use the latter to look up the PubChem compound identifier (CID). This is the script I ended up with:

inputFile = "/Wikidata/Alkanes/alkanes.tsv"
new File(bioclipse.fullPath(inputFile)).eachLine { line ->
fields = line.split("\t")
if (fields[0].startsWith("http://www.wikidata.org/entity/")) {
wdid = fields[0].substring("http://www.wikidata.org/entity/".length())
name = fields[1]
if (fields.length > 2) { // skip entities that already have an InChIKey
inchikey = fields[2]
// println "Skipping: $wdid $inchikey"
} else { // ok, consider adding it
// println "Considering $wdid $name"
try {
mol = opsin.parseIUPACName(name)
smiles = cdk.calculateSMILES(
cdk.addImplicitHydrogens(
cdk.removeExplicitHydrogens(mol)
)
)
//println " SMILES: $smiles"
println "${smiles}\t${wdid}"
} catch (Exception error) {
//println "Could not parse $name with OPSIN: ${error.message}"
}
}
}
}

That way, I ended up with changes like this:


Posted Sat Dec 30 09:59:00 2017 Tags: