The kernelci.org project aims to continuously test the mainline Linux kernel, from stable branches to linux-next, on a variety of platforms. When a revision fails to build or boot, kernel developers are informed via email reports. A summary of all the results is also available directly on the website.
When a kernel revision fails to boot, the failure is reported in the emails, but it is not always obvious what caused it. Development branches are typically merged with many commits on them, and only one boot test is run after the merge. So the initial information is only that the main branch used to work and that it started failing after these commits were merged. The actual problem can be very hard to track down.
For a given pair of good and bad revisions, it is possible to run more boot tests, using Git’s bisection feature to choose which revision to test next, until only one candidate is left. Ideally, this is the change that caused the breakage. However, several subtleties complicate things. For example, there may be several changes introducing different problems, especially if the initial range of revisions is very wide. Also, failures to build some revisions or false positives from the boot tests can mislead the bisection logic and make it land on a change that is not the one that actually broke the boot.
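As a rough illustration of the idea above, here is a minimal Python sketch of a bisection over a simplified, linear list of revisions. It is not the kernelci.org implementation: real Git history is a graph and `git bisect` handles that, whereas this sketch only shows the halving logic. The names `bisect_first_bad`, `is_good`, `flaky_is_good` and the revision labels `c0` to `c15` are all hypothetical, and `is_good` stands in for "build this revision and run a boot test".

```python
from typing import Callable, Sequence


def bisect_first_bad(revisions: Sequence[str],
                     is_good: Callable[[str], bool]) -> str:
    """Return the first bad revision in a linear history.

    Assumes revisions[0] is known good, revisions[-1] is known bad,
    and every revision after the first bad one is also bad.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(revisions[mid]):
            good = mid   # breakage was introduced after mid
        else:
            bad = mid    # breakage was introduced at or before mid
    return revisions[bad]


# Hypothetical history: c0 is known good, c15 is known bad,
# and c11 is the change that actually breaks the boot.
revisions = [f"c{i}" for i in range(16)]


def is_good(rev: str) -> bool:
    # Stand-in for a real build and boot test.
    return int(rev[1:]) < 11


first_bad = bisect_first_bad(revisions, is_good)


def flaky_is_good(rev: str) -> bool:
    # Same test, except c7 wrongly fails once (a boot test false positive).
    return False if rev == "c7" else is_good(rev)


misled = bisect_first_bad(revisions, flaky_is_good)

print(first_bad)  # c11
print(misled)     # c7
```

With a reliable test, the search narrows 16 revisions down to `c11` in four boot tests. With a single spurious failure on `c7`, the same logic confidently reports `c7`, which shows why one unreliable boot result is enough to send a bisection to the wrong change.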
So, does it work?
There is currently an experimental feature to automatically run a bisection for each boot regression found on kernelci.org. Each bisection is started with the known good and bad revisions, on a given platform, in a given lab, with a given config. It is already starting to show some useful results, for example:
- QEMU x86 boot failure on linux-4.14.y
- Peach Pi Chromebook deadlock on v4.15-rc3
- Tegra124 Nyan Big Chromebook DRM driver issues
The main challenge is to bring the results to a high level of quality before actively publishing them. False positives in this area can be very harmful: if the bisection finds a change that is not responsible for the breakage, reporting it can be counterproductive. Developers may spend time chasing a red herring and lose trust in the reports. For this reason, each valid bisection result is currently manually verified, curated and shared on mailing lists or by contacting individuals directly.
We’re now in a maturing phase, identifying issues with the bisection tool and improving it until it’s ready for production. This will initially target only boot tests on mainline and stable branches. Future improvements can include extending it to more kernel trees, bisecting linux-next against mainline and covering more functional tests beyond booting to a prompt.