gtucker.io
VIXI – kernel bisection with Renelick
Advanced automated testing for the upstream Linux kernel

First of all, let’s go through some brief introductions. VIXI stands for Virtual Kernel Continuous Integration and is the main topic of this post. Renelick started as a fork of the new KernelCI stack in early 2024 and has kept evolving ever since into a general-purpose data-driven automation framework. VIXI is essentially an application powered by Renelick to continuously test the upstream Linux kernel in virtual environments. It is now reaching the end of its proof-of-concept phase and as such deserves a highlight. A secondary goal is to validate the Renelick API with a comprehensive use case before making a v1.0 release candidate, which is due very soon. I’ll cover that in another post once it has crossed the finish line. For now, let’s focus on the VIXI results.

Context

Building upon all the lessons learnt from working on KernelCI for several years, VIXI brings together a more advanced solution that aims to go beyond what classic automated testing typically provides. The scope is the upstream Linux kernel, which is very different from vertically-integrated products. Rather than being just one component among many others in a monolithic stack, the kernel is a moving part within a virtually endless spectrum of configurations. Another critical difference from regular product development is the way changes get merged upstream, with hundreds of patches sent every day via mailing lists and through a hierarchy of maintainers. The concept of CI/CD doesn’t really apply in the usual sense: it would only work if there were a small number of officially supported platforms with a designated user-space stack and compiler toolchain – which would tend to go against the spirit of a neutral, common mainline kernel.

Dynamic orchestration

Extraordinary workflows require extraordinary solutions. With such an open-ended system under test, the key is to rely on dynamic orchestration to activate particular coverage according to the circumstances: area of the code being changed, previous results, available resources, etc. Starting with a fast path to get initial results from lightweight tests, the orchestrator can then dynamically schedule additional runs depending on their outcome. Say a warning is detected at boot time in a particular device driver: a selection of especially relevant tests can then be run with debugging and instrumentation options enabled. Conversely, long-running tests may be scheduled only if the system booted fine in the first place within the allocated resource budget – no need to start a full range of test suites on a system that won’t even boot. While the logic isn’t quite there yet, VIXI is being designed with this vision in mind.
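To make the idea more concrete, here is a minimal sketch of that kind of conditional scheduling. All the names and data shapes are hypothetical illustrations, not the actual VIXI orchestrator API:

```python
# Hypothetical sketch of dynamic orchestration: decide which follow-up
# tasks to schedule based on the outcome of an earlier boot test.

def schedule_follow_ups(boot_result):
    """Return the list of extra tasks to run after a boot test."""
    tasks = []
    if not boot_result["booted"]:
        # No point scheduling long-running suites on a broken boot.
        return tasks
    if boot_result.get("warnings"):
        # A boot-time warning triggers targeted runs with debug options.
        tasks.append({"task": "kunit", "config": "debug+instrumentation"})
    # The system booted fine: long-running suites fit in the budget.
    tasks.append({"task": "full-suite", "budget": "remaining"})
    return tasks
```

A failed boot short-circuits everything, while a warning enables extra instrumented coverage on top of the regular suites.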

One primitive for dealing with such scenarios is implemented by VIXI as the “replay” feature, which enables any test to be run again while changing one or more parameters. Every time some results come in, the current proof-of-concept runs a post-processing delta task which compares them against some reference revision to detect new failures. This could be expanded in the future by running the replay with a different compiler version, CPU architecture, hardware platform etc.
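The essence of replay is that a result node records the parameters of the tasks that produced it, so the chain can be re-run with some of them overridden. A rough illustration with hypothetical names (not the real Renelick API):

```python
# Hypothetical sketch of the "replay" primitive: re-run a recorded task
# with some of its parameters overridden, e.g. a different Git revision.

def replay(node, **overrides):
    """Return the recorded task parameters with overrides applied."""
    params = dict(node["task-params"])  # parameters captured at run time
    params.update(overrides)            # e.g. revision, compiler, arch
    return params

node = {"task-params": {"task": "kunit", "revision": "b1d2fdb",
                        "compiler": "gcc-14"}}
replay(node, revision="f8f9c1f")  # same KUnit run, base revision instead
```

Everything not explicitly overridden stays exactly as recorded, which is what makes the comparison between two runs meaningful.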

When testing a maintainer branch, the base revision in mainline should be tested too in order to determine whether a failure was already there. Then extra checks need to be run to reduce false positives: is the “bad” revision really bad and the “good” one really good? If so, this provides a reliable range for running an automated bisection to try to identify the individual commit that caused the test to fail. This logic is already implemented in the current VIXI PoC as detailed in the demo section below.
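Those endpoint checks can be sketched as a small gate in front of the bisection; `run_test` and the revision names are hypothetical placeholders:

```python
# Sketch of the sanity checks before bisecting: confirm the "bad"
# revision really fails and the "good" one really passes before
# trusting the range. All names here are illustrative.

def bisect_range(run_test, good, bad):
    """Return a (good, bad) range only if both endpoints check out."""
    if run_test(bad):
        return None  # "bad" revision passes: false positive, no bisection
    if not run_test(good):
        return None  # "good" revision fails too: failure pre-dates the range
    return (good, bad)

results = {"f8f9c1f": True, "b1d2fdb": False}  # revision -> test passed?
bisect_range(results.get, "f8f9c1f", "b1d2fdb")  # -> ("f8f9c1f", "b1d2fdb")
```

Rejecting a bad range up front is much cheaper than letting a bisection run to completion and land on a meaningless commit.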

Bisection works well for high-volume branches and in particular linux-next as it integrates new commits from all the subsystems on a daily basis. Another dimension to the problem is testing patches from mailing lists before they get pushed to a Git branch. In this case, a better approach for short series is to test each patch individually, but the same concept of comparing results applies. Although the proof-of-concept doesn’t do that yet, some initial trials have been run with an IMAP inbox and it will be enabled in incremental steps.

Demo time

Early experiments have resulted in a few demos which show the progress made so far:

From Git to KUnit

The first demo is to lay down the foundation: Git checkout using mirrors within a Kubernetes deployment, a kernel build followed by a KVM boot test using KubeVirt as well as KUnit runs with results parsed into a tree of Renelick database nodes.

Delta and Git bisect

Building upon this, the second demo showcases the delta and bisect tasks. They can automatically compare results and run Git bisections when KUnit test regressions are detected. This only worked for KUnit at that stage, which highlighted the challenge of automatically reproducing arbitrary test results with a different kernel revision.

Automated replay

Then the third demo introduced the “replay” feature which enabled delta and bisect to deal with any kind of workload, not just KUnit. The data nodes now include the information needed to recreate the whole series of tasks which were run to produce them, which can be done while changing some parameters such as the Git revision.

All things combined

The outcome is a first orchestrator which listens for a simple trigger event pointing at a Git kernel revision. Here’s how all the various tasks unfold from there:

VIXI tasks

  • trigger: event sent manually with the Git tree name and revision to test
  • kbuild: kernel build run as a Tekton pipeline
  • kboot: kernel boot with the artifacts produced by kbuild using KubeVirt
  • kunit and parser: KUnit run in Tekton followed by a task to parse the JSON results in a plain Kubernetes pod

The kbuild, kboot and kunit + parser tasks eventually produce some database nodes with test results which can be consumed by the delta task. The orchestrator will therefore schedule a delta for each incoming set of results following the rule set in its configuration file:

# compute delta between test results
- task: delta
  channel: node
  match:
    kind: vixi.group
    op: populate
  scheduler: inline-vixi

Note: The inline-vixi scheduler is for running lightweight tasks directly in Python. For more details, see the Renelick documentation.
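The rule above reads as: whenever a node event of kind vixi.group with op populate arrives on the node channel, schedule a delta task. A minimal sketch of that event-to-rule matching, with a hypothetical structure rather than the actual Renelick implementation:

```python
# Minimal sketch of orchestrator rule matching: pick the tasks whose
# "match" criteria are all satisfied by an incoming event.

RULES = [
    {"task": "delta", "channel": "node",
     "match": {"kind": "vixi.group", "op": "populate"}},
    {"task": "bisect", "channel": "node",
     "match": {"kind": "vixi.delta", "op": "populate"}},
]

def tasks_for(event):
    """Return the names of the tasks triggered by this event."""
    return [
        rule["task"]
        for rule in RULES
        if rule["channel"] == event["channel"]
        and all(event.get(k) == v for k, v in rule["match"].items())
    ]

tasks_for({"channel": "node", "kind": "vixi.group", "op": "populate"})
# -> ["delta"]
```

The same mechanism drives the bisect rule shown further below: the delta task produces vixi.delta nodes, which in turn match the second rule.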

Each delta task will look for results from the base revision between the branch being tested and mainline. If it can’t find any, it will first get the tests replayed on that revision so that regressions can be detected, as in the KUnit example above. Once the base revision results are in, it creates a node for each new test failure detected. The orchestrator will then start a bisect task for each of them as per this other rule:
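At its core the delta boils down to a set comparison between the two result trees; a simplified sketch with hypothetical data shapes:

```python
# Simplified delta: a regression is a test that passes on the base
# revision but fails on the branch under test.

def delta(branch_results, base_results):
    """Return the names of newly failing tests, sorted for stability."""
    return sorted(
        name
        for name, passed in branch_results.items()
        if not passed and base_results.get(name, False)
    )

base = {"shift_sane_test": True, "shift_overflow_test": True}
branch = {"shift_sane_test": False, "shift_overflow_test": True}
delta(branch, base)  # -> ["shift_sane_test"]
```

Tests that were already failing on the base revision are deliberately excluded, since they are pre-existing issues rather than regressions introduced by the branch.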

# run a Git bisection based on found delta
- task: bisect
  channel: node
  match:
    kind: vixi.delta
    op: populate
  scheduler: inline-vixi

This logic works independently of the underlying tasks and can reproduce any test results automatically: kbuild, kunit or kboot. The data included within the result nodes contains the parameters required for replaying each task that produced them. As such, the sequence will differ depending on the type of test:

  • kbuild is a single task which directly produces some pass/fail results
  • kboot is run following kbuild results as it needs its artifacts as inputs
  • kunit is followed by the kunit-parser task to produce results

The example in the diagram above shows the kunit + parser case although kbuild and kbuild + kboot have also been verified as part of the demos. In fact, any number of tasks can be chained this way using the input-nodes task attribute to describe how to feed the results of a previous task into the next one. For example, with kboot which takes its inputs from kbuild:

"input-nodes": {
  "kbuild": {
    "id": "69734c80a855520ff4115a9c",
    "kind": "vixi.group",
    "name": "kbuild"
  },
  "image": {
    "id": "69734c67a855520ff4115a96",
    "kind": "artifact-v1",
    "name": "bzImage"
  },
  "modules": {
    "id": "69734c78a855520ff4115a9a",
    "kind": "artifact-v1",
    "name": "modules.tar.gz"
  }
}
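A sketch of how such an input-nodes mapping could be resolved into concrete task inputs, assuming a simple node store keyed by id (hypothetical helpers, not the real Renelick API):

```python
# Hypothetical sketch: resolve an "input-nodes" mapping by looking up
# each referenced node in a store keyed by node id.

NODE_STORE = {
    "69734c80a855520ff4115a9c": {"kind": "vixi.group", "name": "kbuild"},
    "69734c67a855520ff4115a96": {"kind": "artifact-v1", "name": "bzImage"},
    "69734c78a855520ff4115a9a": {"kind": "artifact-v1",
                                 "name": "modules.tar.gz"},
}

def resolve_inputs(input_nodes):
    """Map each input name to the stored node it references."""
    return {name: NODE_STORE[ref["id"]] for name, ref in input_nodes.items()}

inputs = resolve_inputs({
    "image": {"id": "69734c67a855520ff4115a96"},
})
# inputs["image"]["name"] == "bzImage"
```

Since each reference carries the node id, a replayed task can fetch its inputs from the database regardless of which run originally produced them.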

Bisection results

At the end of this long string of events, a successful bisect task will eventually find the commit that caused a test to fail and create its own node too. Here’s the Git history from the branch that was being bisected in demo 3, with an artificial KUnit test failure in the middle:

b1d2fdbfdf5f HACK BISECT 4.noop
9e1c130d734e HACK BISECT 3.noop
76819e0f306e HACK fail KUnit overflow.shift_sane_test
71746a491c0f HACK BISECT 1.noop
f8f9c1f4d0c7 Linux 6.19-rc3

The head of the branch b1d2fdbfdf5f was tested first, then the base revision v6.19-rc3 followed by the actual bisection which landed on the expected commit.
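On a linear history like this one, the bisection is a plain binary search between the verified endpoints; a sketch with shortened, illustrative revision names:

```python
# Sketch of bisection over a linear history: binary-search for the
# first "bad" commit, assuming both endpoints were already verified.

def bisect(commits, is_bad):
    """commits is ordered oldest (good) to newest (bad)."""
    lo, hi = 0, len(commits) - 1  # known-good, known-bad endpoints
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # failure reproduces: first bad commit is at or before mid
        else:
            lo = mid  # still good: first bad commit is after mid
    return commits[hi]  # first bad commit

history = ["f8f9c1f", "71746a4", "76819e0", "9e1c130", "b1d2fdb"]
bad = {"76819e0", "9e1c130", "b1d2fdb"}
bisect(history, bad.__contains__)  # -> "76819e0"
```

With four candidate commits this takes two test runs, which matches the two iterations visible in the event log below.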

What’s especially interesting to mention here is the continuous data path which goes all the way from the Git checkout to the final bisection result. It can be seen in the last node that was created; earlier ones will of course have a shorter path:

"path": [
  "gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:checkout",
  "gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:build",
  "gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:exec",
  "kunit",
  "overflow",
  "shift_sane_test",
  "shift_sane_test:break",
  "bisect:76819e0f306e1d1682ffcdeba5cd408c0eeb85c5"
],

This list contains the names of the nodes following their parent relationships in the tree. The checkout is the root node with no parent. Then the build and KUnit execution are added as child nodes, followed by the parsed results under kunit. There are actually over 800 nodes with individual test cases in a single tree of this kind; here we just see the ones that led to a failure. The shift_sane_test:break node was created by the delta task with the regression data. Finally, its child node comes from the bisect task with the SHA1 of the Git commit that was found.
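Deriving such a path from the parent relationships is a simple walk up the tree; a sketch with hypothetical node shapes and ids:

```python
# Sketch of deriving a node's path from its parent chain: walk up the
# tree collecting names, then reverse to get root-to-leaf order.

def node_path(node, nodes_by_id):
    """Return the list of node names from the root down to this node."""
    path = []
    while node is not None:
        path.append(node["name"])
        parent = node.get("parent")
        node = nodes_by_id[parent] if parent else None
    return list(reversed(path))

nodes = {
    "1": {"name": "checkout", "parent": None},
    "2": {"name": "kunit", "parent": "1"},
    "3": {"name": "shift_sane_test", "parent": "2"},
}
node_path(nodes["3"], nodes)  # -> ["checkout", "kunit", "shift_sane_test"]
```

Storing the precomputed path on each node, as in the excerpt above, avoids repeating this walk every time a node is queried.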

I won’t go into all the intricate details of how this came to be as that is covered more in depth by the demos themselves. To get a peek under the hood, you can take a look at the Renelick pub/sub event monitor log. Here’s an excerpt with just a few hand-picked lines from it:

I  2026-01-12T17:18:39.505717+00:00 VIXI   trigger       gtucker                               git                 linux-6.19-rc3-kunit-fail
I  2026-01-12T17:18:39.874351+00:00 TASK   add           0365a602-7c45-40ad-880a-c04795ee53db  tekton              kunit
I  2026-01-12T17:19:05.451992+00:00 NODE   add           69652d0967ff704f772f8ce2              vixi.kunit.step     gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:checkout
I  2026-01-12T17:22:05.844831+00:00 NODE   add           69652dbd67ff704f772f8cfe              vixi.kunit.step     gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:exec
I  2026-01-12T17:22:06.531947+00:00 NODE   add           69652dbe67ff704f772f8d00              artifact-v1         kunit.json
I  2026-01-12T17:22:15.510677+00:00 TASK   add           9ba6a2d9-72a1-49e4-bcfc-aca0325874b7  inline-vixi         kunit-parser
I  2026-01-12T17:22:18.566051+00:00 NODE   populate      69652dc867ff704f772f8d02              batch:1             kunit
I  2026-01-12T17:22:18.942489+00:00 TASK   add           72e89f23-7f09-4cca-835f-11bb96ee3f68  inline-vixi         delta
I  2026-01-12T17:22:21.821146+00:00 NODE   populate      69652dcd67ff704f772f9080              batch:1             shift_sane_test:break
I  2026-01-12T17:22:22.251833+00:00 TASK   add           0c8ab94d-290f-48be-8090-683c0da56c5d  inline-vixi         bisect
I  2026-01-12T17:22:23.532738+00:00 VIXI   bisect        v6.19-rc3-4-gb1d2fdbfdf5f             check:bad           b1d2fdbfdf5f9d869bea4714c3e00c5cb3faea38
I  2026-01-12T17:23:49.670967+00:00 VIXI   bisect        v6.19-rc3-4-gb1d2fdbfdf5f             check:good          f8f9c1f4d0c7a64600e2ca312dec824a0bc2f1da
I  2026-01-12T17:25:18.153491+00:00 VIXI   bisect        v6.19-rc3-4-gb1d2fdbfdf5f             iter  1/2           76819e0f306e1d1682ffcdeba5cd408c0eeb85c5
I  2026-01-12T17:26:43.290082+00:00 VIXI   bisect        v6.19-rc3-4-gb1d2fdbfdf5f             iter  2/2           71746a491c0f6557d43047dc9eca5f0a5613f37c
I  2026-01-12T17:28:09.578434+00:00 VIXI   bisect        v6.19-rc3-4-gb1d2fdbfdf5f             found               76819e0f306e1d1682ffcdeba5cd408c0eeb85c5
I  2026-01-12T17:28:09.907141+00:00 NODE   add           69652f2967ff704f772f9115              vixi.bisect         bisect:76819e0f306e1d1682ffcdeba5cd408c0eeb85c5
I  2026-01-12T17:28:10.478395+00:00 NODE   add           69652f2a67ff704f772f9116              artifact-v1         bisect.txt

Next steps

The VIXI experiments have proven to be rather conclusive so far. The replay feature is implemented entirely at the application level using the standard Renelick API primitive operations. This is good news as it shows the API can remain simple while enabling a variety of complex use cases.

It’s also encouraging to see that demo 3 which ran a KUnit bisection lasted only about 10 minutes, with an extra kernel build running in parallel on the same worker followed by a boot test on a separate bare-metal machine with KVM enabled. The Renelick API instance, the orchestrator and the workers are all running on different machines with remote connections so this is a fairly realistic topology for an actual deployment – at least in terms of latencies.

One current limitation is that the delta task only compares results with the base revision in mainline, not with previous runs on the same branch; that will be the next improvement. A more ambitious goal is to use Scalpel and make bisection an integral part of the dynamic orchestration logic using statistics.

Another issue is that while KubeVirt is great for scaling large workloads in the Cloud, the overhead of starting VMs and the general design aren’t really appropriate for short-lived tests such as booting kernels. Some more low-level Runtime implementations using QEMU and Cloud APIs directly would help there.

Overall, this sets the stage for the upcoming roadmap. Once Renelick v1.0-rc1 is released and hosted on a public instance, VIXI can be enabled as a continuous service: monitoring linux-next once a day, mainline every hour, patches from a couple of mailing lists e.g. KUnit and Rust-for-Linux to start with, sending email reports, showing results clearly on the frontend etc. There is also the open topic of sharing some of the bisection logic with KernelCI via a common Python package, which would close a very long loop.

Last but not least, the builds and KUnit tests in VIXI are all using kernel.org compiler toolchains as generic container images. This fits into the bigger picture of having them maintained upstream as per the mailing list discussion and the scripts/container tool that just landed in linux-next.


Last modified on 2026-01-24