First of all, let’s go through some brief introductions. VIXI stands for Virtual Kernel Continuous Integration and is the main topic of this post. Renelick started as a fork of the new KernelCI stack in early 2024 and has kept evolving ever since into a general-purpose data-driven automation framework. VIXI is essentially an application powered by Renelick to continuously test the upstream Linux kernel in virtual environments. It is now reaching the end of its proof-of-concept phase and as such deserves a highlight. A secondary goal is to validate the Renelick API with a comprehensive use case before making a v1.0 release candidate, which is due very soon. I’ll cover that in another post once it has crossed the finish line. For now, let’s focus on the VIXI results.
Context
Building upon all the lessons learnt from working on KernelCI for several years, VIXI brings together a more advanced solution aiming to go above and beyond what classic automated testing typically provides. The scope is the upstream Linux kernel, which is very different from vertically-integrated products. Rather than being just one component among many others in a monolithic stack, the kernel is a moving part within a virtually endless spectrum of configurations. Another critical difference with regular product development is the way changes get merged upstream, with hundreds of patches sent every day via mailing lists and through a hierarchy of maintainers. The concept of CI/CD doesn’t really apply in the usual sense: it would only work if there were a small number of officially supported platforms with a designated user-space stack and compiler toolchain – which would tend to go against the spirit of a neutral, common mainline kernel.
Dynamic orchestration
Extraordinary workflows require extraordinary solutions. With such an open-ended system under test, the key is to rely on dynamic orchestration to activate particular coverage according to the circumstances: area of the code being changed, previous results, available resources etc. Starting with a fast path to get initial results from lightweight tests, the orchestrator can then dynamically schedule additional runs depending on their outcome. For example, if a warning was detected at boot time in a particular device driver, a selection of especially relevant tests can be run with debugging and instrumentation options enabled. Conversely, long-running tests may be scheduled only if the system booted fine in the first place within the allocated resource budget – no need to start a full range of test suites on a system that won’t even boot. While the logic isn’t quite there yet, VIXI is being designed with this vision in mind.
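The decision logic sketched above could look something like this in Python. Everything here (the Result type, the task names, the kconfig options) is illustrative rather than the actual VIXI implementation:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of dynamic orchestration: pick follow-up runs from
# the outcome of a lightweight first pass. Not the real VIXI/Renelick API.

@dataclass
class Result:
    kind: str            # e.g. "kboot", "kunit"
    status: str          # "pass" or "fail"
    name: str = ""
    warnings: list = field(default_factory=list)

def next_tasks(result: Result) -> list:
    tasks = []
    if result.kind == "kboot" and result.status == "pass":
        # Long-running suites only once the system is known to boot.
        tasks.append({"task": "long-running-tests"})
    if result.warnings:
        # A warning at boot time triggers targeted runs with debugging
        # and instrumentation options enabled.
        tasks.append({"task": "replay", "target": result.name,
                      "params": {"kconfig": ["CONFIG_KASAN=y"]}})
    return tasks
```

A failed or warning-free boot would simply produce a shorter task list, keeping the heavy coverage within the resource budget.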
One primitive for dealing with such scenarios is implemented by VIXI as the “replay” feature, which enables any test to be run again while changing one or more parameters. Every time some results come in, the current proof-of-concept runs a post-processing delta task which compares them against some reference revision to detect new failures. This could be expanded in the future by running the replay with a different compiler version, CPU architecture, hardware platform etc.
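As a rough sketch, assuming a simplified dict representation of the database nodes (task name, parameters, parent link), the replay idea boils down to walking the chain of tasks that produced a result and re-running it with overridden parameters:

```python
# Sketch of the "replay" primitive; the real Renelick nodes carry more
# fields than the task/params/parent triplet used here.

def replay(node: dict, **overrides) -> list:
    """Task sequence to re-run, root first, with parameters overridden."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.get("parent")
    # Re-run from the checkout down, e.g. with a different compiler
    # version, CPU architecture or Git revision.
    return [(n["task"], {**n["params"], **overrides})
            for n in reversed(chain)]
```

With an override such as a different Git revision, the same sequence gets recreated against the reference tree, which is what makes the delta comparison possible.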
When testing a maintainer branch, the base revision in mainline should be tested too in order to determine whether a failure was already there. Then extra checks need to be run to reduce false positives: is the “bad” revision really bad and the “good” one really good? If so, this will provide a reliable range for running an automated bisection and trying to identify the individual commit that caused the test to fail. This logic is already implemented in the current VIXI PoC as detailed in the demo section below.
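The endpoint checks can be sketched as follows; the run_test callable is a hypothetical stand-in for replaying the failing test at a given revision and returning "pass" or "fail":

```python
# Sanity checks run before an automated bisection, to reduce false
# positives caused by flaky tests or pre-existing failures.

def bisect_range_is_reliable(run_test, good_rev, bad_rev):
    """Confirm both endpoints before handing the range to a bisection."""
    if run_test(bad_rev) != "fail":
        return False  # the "bad" revision is not reproducibly bad
    if run_test(good_rev) != "pass":
        return False  # the failure was already there at the base revision
    return True
```

Only when both endpoints hold is the good..bad range worth bisecting; anything else would send the bisection chasing noise.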
Bisection works well for high-volume branches and in particular linux-next as it integrates new commits from all the subsystems on a daily basis. Another dimension to the problem is testing patches from mailing lists before they get pushed to a Git branch. In this case, a better approach for short series is to test each patch individually, but the same concept of comparing results applies. Although the proof-of-concept doesn’t do that yet, some initial trials have been run with an IMAP inbox and it will be enabled in incremental steps.
Demo time
Early experiments have led to a few demos which show the progress made so far:
From Git to KUnit
The first demo is to lay down the foundation: Git checkout using mirrors within a Kubernetes deployment, a kernel build followed by a KVM boot test using KubeVirt as well as KUnit runs with results parsed into a tree of Renelick database nodes.
Delta and Git bisect
Building upon this, the second demo showcases the delta and bisect tasks. They can automatically compare results and run Git bisections when KUnit test regressions are detected. This initially only worked for KUnit, which highlighted the challenge of automatically reproducing arbitrary test results with a different kernel revision.
Automated replay
Then the third demo introduced the “replay” feature, which enabled delta and bisect to deal with any kind of workload, not just KUnit. The data nodes now include the information needed to recreate the whole series of tasks which were run to produce them, which can be done while changing some parameters such as the Git revision.
All things combined
The outcome is a first orchestrator which listens for a simple trigger event pointing at a Git kernel revision. Here’s how all the various tasks unfold from there:

- trigger: event sent manually with the Git tree name and revision to test
- kbuild: kernel build run as a Tekton pipeline
- kboot: kernel boot with the artifacts produced by kbuild using KubeVirt
- kunit and parser: KUnit run in Tekton followed by a task to parse the JSON results in a plain Kubernetes pod
The kbuild, kboot and kunit + parser tasks eventually produce some
database nodes with test results which can be consumed by the delta task.
The orchestrator will therefore schedule a delta for each incoming set of
results following the rule set in its configuration
file:
# compute delta between test results
- task: delta
  channel: node
  match:
    kind: vixi.group
    op: populate
  scheduler: inline-vixi
Note: The inline-vixi scheduler is for running lightweight tasks directly in Python. For more details, see the Renelick documentation.
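The matching implied by such a rule can be sketched as a simple predicate; the rule_matches helper below is hypothetical, not part of the actual Renelick API:

```python
# Sketch of how an orchestrator rule can be matched against incoming
# pub/sub events: the channel must agree and every field under "match"
# must be equal on the event.

def rule_matches(rule: dict, event: dict) -> bool:
    """True when the event channel and all match fields line up."""
    if event.get("channel") != rule.get("channel"):
        return False
    return all(event.get(key) == value
               for key, value in rule.get("match", {}).items())
```

With the delta rule above, a node event carrying kind vixi.group and op populate would trigger the task, while any other node event would be ignored.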
Each delta task will look for results from the base revision between the branch being tested and mainline. If it can’t find any, it will get the tests replayed there first in order to be able to detect regressions such as in the KUnit example above. Once the base revision results are in, it will create a node for each new test failure detected. The orchestrator will then start a bisect task for each of them as per this other rule:
# run a Git bisection based on found delta
- task: bisect
  channel: node
  match:
    kind: vixi.delta
    op: populate
  scheduler: inline-vixi
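Stripped of the database plumbing, the comparison performed by a delta task amounts to something like this; the dictionaries mapping test names to "pass"/"fail" are simplified stand-ins for the result nodes:

```python
# Core of a delta comparison: a regression is a test that fails on the
# branch under test but passed on the reference (base) revision.

def delta(branch_results: dict, base_results: dict) -> list:
    """Tests failing on the branch that passed on the base revision."""
    return sorted(
        name for name, status in branch_results.items()
        if status == "fail" and base_results.get(name) == "pass"
    )
```

Note that a test already failing on the base revision is not reported, which is exactly why the base results need to be replayed first when missing.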
This logic works independently of the underlying tasks and can reproduce any
test results automatically: kbuild, kunit or kboot. The data included
within the result nodes contains the parameters required for replaying each
task that produced them. As such, the sequence will differ depending on the
type of test:
- kbuild is a single task which directly produces some pass/fail results
- kboot is run following kbuild results as it needs its artifacts as inputs
- kunit is followed by the kunit-parser task to produce results
The example in the diagram above shows the kunit + parser case although
kbuild and kbuild + kboot have also been verified as part of the demos. In
fact, any number of tasks can be chained this way using the input-nodes task
attribute to describe how to feed the results of a previous task into the next
one. For example, with kboot which takes its inputs from kbuild:
"input-nodes": {
    "kbuild": {
        "id": "69734c80a855520ff4115a9c",
        "kind": "vixi.group",
        "name": "kbuild"
    },
    "image": {
        "id": "69734c67a855520ff4115a96",
        "kind": "artifact-v1",
        "name": "bzImage"
    },
    "modules": {
        "id": "69734c78a855520ff4115a9a",
        "kind": "artifact-v1",
        "name": "modules.tar.gz"
    }
}
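Recovering the replay sequence from such references can be sketched as a small graph walk; replay_chain and the plain inputs mapping below are illustrative, as the real orchestrator works on database nodes rather than dictionaries:

```python
# Chaining tasks via input references: each task lists the earlier tasks
# whose nodes it consumes, and the sequence to replay is recovered by
# walking those references back to the start.

def replay_chain(task_name: str, inputs: dict) -> list:
    """Order the tasks leading up to task_name, earliest first."""
    chain = []

    def visit(name):
        if name in chain:
            return
        for dep in inputs.get(name, []):  # tasks this one depends on
            visit(dep)
        chain.append(name)

    visit(task_name)
    return chain
```

For example, replaying a kboot result would first schedule kbuild to regenerate the kernel image and modules it needs as inputs.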
Bisection results
At the end of this long string of events, a successful bisect task will eventually find the commit that caused a test to fail and create its own node too. Here’s the Git history from the branch that was being bisected in demo 3, with an artificial KUnit test failure in the middle:
b1d2fdbfdf5f HACK BISECT 4.noop
9e1c130d734e HACK BISECT 3.noop
76819e0f306e HACK fail KUnit overflow.shift_sane_test
71746a491c0f HACK BISECT 1.noop
f8f9c1f4d0c7 Linux 6.19-rc3
The head of the branch b1d2fdbfdf5f was tested first, then the base revision v6.19-rc3, followed by the actual bisection which landed on the expected commit.
What’s especially interesting to mention here is the continuous data path which goes all the way from the Git checkout to the final bisection result. It can be seen in the last node that was created; earlier ones will have a shorter path of course:
"path": [
"gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:checkout",
"gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:build",
"gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:exec",
"kunit",
"overflow",
"shift_sane_test",
"shift_sane_test:break",
"bisect:76819e0f306e1d1682ffcdeba5cd408c0eeb85c5"
],
This list contains the names of the nodes following their parent relationships in the tree. The checkout is the root node with no parent. Then the build and KUnit execution are added as child nodes, followed by the parsed results under kunit. There are actually over 800 nodes with individual test cases in a single tree of this kind; here we just see the ones that led to a failure. The shift_sane_test:break node was created by the delta task with the regression data. Finally, its child node comes from the bisect task with the SHA1 of the Git commit that was found.
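Assuming a minimal stand-in for the node documents, such a path can be rebuilt by walking parent links up to the root:

```python
from dataclasses import dataclass
from typing import Optional

# Simplified node with only the fields needed to derive the path; the
# actual Renelick documents carry many more attributes.

@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None

def node_path(node: Node) -> list:
    """Node names from the root checkout down to `node`."""
    path = []
    while node is not None:
        path.append(node.name)
        node = node.parent
    return list(reversed(path))
```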
I won’t go into all the intricate details of how this came to be as that is covered more in depth by the demos themselves. To get a peek under the hood, you can take a look at the Renelick pub/sub event monitor log. Here’s an excerpt with just a few hand-picked lines from it:
I 2026-01-12T17:18:39.505717+00:00 VIXI trigger gtucker git linux-6.19-rc3-kunit-fail
I 2026-01-12T17:18:39.874351+00:00 TASK add 0365a602-7c45-40ad-880a-c04795ee53db tekton kunit
I 2026-01-12T17:19:05.451992+00:00 NODE add 69652d0967ff704f772f8ce2 vixi.kunit.step gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:checkout
I 2026-01-12T17:22:05.844831+00:00 NODE add 69652dbd67ff704f772f8cfe vixi.kunit.step gtucker:v6.19-rc3-4-gb1d2fdbfdf5f:exec
I 2026-01-12T17:22:06.531947+00:00 NODE add 69652dbe67ff704f772f8d00 artifact-v1 kunit.json
I 2026-01-12T17:22:15.510677+00:00 TASK add 9ba6a2d9-72a1-49e4-bcfc-aca0325874b7 inline-vixi kunit-parser
I 2026-01-12T17:22:18.566051+00:00 NODE populate 69652dc867ff704f772f8d02 batch:1 kunit
I 2026-01-12T17:22:18.942489+00:00 TASK add 72e89f23-7f09-4cca-835f-11bb96ee3f68 inline-vixi delta
I 2026-01-12T17:22:21.821146+00:00 NODE populate 69652dcd67ff704f772f9080 batch:1 shift_sane_test:break
I 2026-01-12T17:22:22.251833+00:00 TASK add 0c8ab94d-290f-48be-8090-683c0da56c5d inline-vixi bisect
I 2026-01-12T17:22:23.532738+00:00 VIXI bisect v6.19-rc3-4-gb1d2fdbfdf5f check:bad b1d2fdbfdf5f9d869bea4714c3e00c5cb3faea38
I 2026-01-12T17:23:49.670967+00:00 VIXI bisect v6.19-rc3-4-gb1d2fdbfdf5f check:good f8f9c1f4d0c7a64600e2ca312dec824a0bc2f1da
I 2026-01-12T17:25:18.153491+00:00 VIXI bisect v6.19-rc3-4-gb1d2fdbfdf5f iter 1/2 76819e0f306e1d1682ffcdeba5cd408c0eeb85c5
I 2026-01-12T17:26:43.290082+00:00 VIXI bisect v6.19-rc3-4-gb1d2fdbfdf5f iter 2/2 71746a491c0f6557d43047dc9eca5f0a5613f37c
I 2026-01-12T17:28:09.578434+00:00 VIXI bisect v6.19-rc3-4-gb1d2fdbfdf5f found 76819e0f306e1d1682ffcdeba5cd408c0eeb85c5
I 2026-01-12T17:28:09.907141+00:00 NODE add 69652f2967ff704f772f9115 vixi.bisect bisect:76819e0f306e1d1682ffcdeba5cd408c0eeb85c5
I 2026-01-12T17:28:10.478395+00:00 NODE add 69652f2a67ff704f772f9116 artifact-v1 bisect.txt
Next steps
The VIXI experiments have proven to be rather conclusive so far. The replay feature is implemented entirely at the application level using the standard Renelick API primitive operations. This is good news as it shows the API can remain simple while enabling a variety of complex use cases.
It’s also encouraging to see that demo 3, which ran a KUnit bisection, lasted only about 10 minutes, with an extra kernel build running in parallel on the same worker followed by a boot test on a separate bare-metal machine with KVM enabled. The Renelick API instance, the orchestrator and the workers are all running on different machines with remote connections, so this is a fairly realistic topology for an actual deployment – at least in terms of latencies.
One current limitation is that the delta task only compares results with the base revision in mainline, not with previous runs on the same branch, so this will be the next improvement. A more ambitious goal is to use Scalpel and make bisection an integral part of the dynamic orchestration logic using statistics.
Another issue is that while KubeVirt is great for scaling large workloads in the Cloud, the overhead of starting VMs and the general design aren’t well suited to short-lived tests such as booting kernels. Some more low-level Runtime implementations using QEMU and Cloud APIs directly would help there.
Overall, this sets the stage for the upcoming roadmap. Once Renelick v1.0-rc1 is released and hosted on a public instance, VIXI can be enabled as a continuous service: monitoring linux-next once a day, mainline every hour, patches from a couple of mailing lists e.g. KUnit and Rust-for-Linux to start with, sending email reports, showing results clearly on the frontend etc. There is also the open topic of sharing some of the bisection logic with KernelCI via a common Python package, which would close a very long loop.
Last but not least, the builds and KUnit tests in VIXI are all using kernel.org compiler toolchains as generic container images. This fits into the bigger picture of having them maintained upstream as per the mailing list discussion and the scripts/container tool that just landed in linux-next.
Last modified on 2026-01-24
