Data visualisation is something we’re all familiar with, from histograms to graphs and modern interactive tools. A less common technique is data sonification, even though it is already used in fields such as genetics and volcanology, in particular to detect patterns that can’t easily be seen. It can also address some accessibility issues with visualisation. On top of that, it’s an interesting tool for musical experiments. So, how can data from the Linux kernel be sonified? This is what the Linuxonification project is all about.
Linux kernel v5.17 was released a few weeks ago. It included a grand total of 14,200 individual new changes since the previous release, v5.16, three months earlier. The code is open source, and the history is publicly available too using Git. Each change or “commit” contains information about when it was made and which other commits happened just before and after. There are also “merge commits” that join two series of changes together. This ultimately creates a history as a tree of changes, with branches merging into each other.
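To make this concrete, here is a minimal sketch (my own illustration, not part of the project) modelling a Git history as a graph of commits with parent links; a merge commit is simply a commit with more than one parent:

```python
# Minimal model of a Git history: each commit maps to its parent commits.
# The commit names here are made up for illustration.
commits = {
    "a1": [],            # root commit
    "b2": ["a1"],        # follows a1 on one branch
    "c3": ["a1"],        # another branch diverging from a1
    "d4": ["b2", "c3"],  # merge commit joining the two branches
}

def is_merge(sha):
    """A merge commit has more than one parent."""
    return len(commits[sha]) > 1

merges = [sha for sha in commits if is_merge(sha)]
```

Running this on the tiny history above identifies `d4` as the only merge commit, which is exactly the shape of the tree described here, just at kernel scale.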
I’ve created a first experimental tool called Gitophone to generate an audio track based on Git history data. Here’s what it sounds like for the v5.17 kernel release:
If you’d rather not generate it yourself, please feel free to download and reuse or sample this audio file. It is released under the Creative Commons BY-NC-SA 3.0 licence.
What sonification is particularly useful for is identifying patterns: things that happen periodically. It’s not yet easy to link individual sounds to what happened in the history without more clues or the ability to zoom into parts of it; this is something to improve in future versions. What we can already hear is, at first, a number of changes on a single segment, probably a series that took a long time to complete and finally got merged. Then, on a few occasions, we can hear two segments with interleaving changes, which might show a dependency between maintainers' branches evolving hand in hand. There’s an obvious high-density climax towards the end, presumably around the merge window and the first -rc tags (release candidates).
Another way to look at it, or rather to listen to it, is from a musical and artistic point of view. It is arguably an experiment whereby a tailor-made “instrument” is played following a pre-existing “score”, written by code contributors without any conscious intent of creating sounds. I guess one might call it accidental crowd-composing… This gives an insight into the depth, breadth and dynamics of a Linux kernel development cycle without the need to know anything about the code or the nature of the changes themselves. It has also resulted in sonorities and textures I would never have been able to conceive otherwise.
This early version has relatively simple rules for converting Git history into sounds. First, the history is decomposed into linear segments of consecutive commits, each ending with a merge commit. The duration of each sound is a logarithmic value derived from the time difference between two consecutive commits on the segment (using the authors' timestamps, which come from the patch emails, not from when the patches were applied to a branch). The final merge commit ties the segment in time relative to its “upstream” segment, since that upstream segment also contains the same merge commit. The master branch is the first and main segment, from which all the others are derived via merge commits. This is how the timeline is determined.
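A sketch of the timing rule might look like the following; the scaling factor and the clamping of zero-second gaps are my own assumptions, not taken from Gitophone:

```python
import math

def sound_duration(t_prev, t_curr, scale=0.25):
    """Map the gap between two consecutive author timestamps (in
    seconds) to a sound duration, using a logarithm so that very long
    gaps don't dominate the track. `scale` is an arbitrary factor."""
    gap = max(t_curr - t_prev, 1)  # clamp to 1s to avoid log(0)
    return scale * math.log(gap)
```

The logarithm compresses the enormous spread of commit intervals (seconds to weeks) into a listenable range, which is presumably why a logarithmic value was chosen.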
All the sounds on a given segment are generated at the same frequency. An initial one is assigned to the main segment (440 Hz, a concert A), then each sub-segment gets a new frequency by multiplying its parent's by a natural ratio picked at random from a fixed set. This creates just intonation intervals. As this is done recursively, with branches getting merged into other branches, segments further away from the main segment tend to have a more remote frequency (either very high or very low). Also worth noting, distant segments start introducing dissonant intervals, as just intonation does not follow the equal-tempered chromatic scale (see also microtonal music). Similarly, amplitude levels are set so that distant segments are positioned further away from the central point in the stereophonic landscape; the left/right choices are made at random. Finally, all the sounds are generated with a simple triangle wave, and each sound's duration is half the time assigned to its commit.
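The recursive frequency assignment could be sketched like this; the particular set of ratios and the tree representation are hypothetical stand-ins for whatever Gitophone actually uses:

```python
import random

# Hypothetical fixed set of just-intonation ratios (pure intervals
# such as the perfect fifth 3/2 and its inverse 2/3).
RATIOS = [3/2, 4/3, 5/4, 6/5, 2/3, 3/4, 4/5, 5/6]

def assign_frequencies(segment, base=440.0, rng=None):
    """Assign a frequency to each segment: the main segment gets `base`
    (440 Hz), and each sub-segment multiplies its parent's frequency by
    a ratio picked at random from RATIOS. Segments are nested dicts
    with a "name" and optional "children"."""
    rng = rng or random.Random()
    freqs = {}

    def walk(seg, freq):
        freqs[seg["name"]] = freq
        for child in seg.get("children", []):
            walk(child, freq * rng.choice(RATIOS))

    walk(segment, base)
    return freqs
```

Because the multiplications compound at each level of nesting, deeply nested segments drift far from 440 Hz, which matches the observation that distant branches end up very high or very low.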
As a result, the frequency of the last sound (i.e. the commit carrying the v5.17 release tag) is always 440 Hz, since it belongs to the main segment. Everything else depends on the history, with a fully deterministic relationship from a timing point of view. Due to a few random elements, however, the generated audio will be slightly different every time the same data set is sonified: in particular, the frequencies and stereo levels will use different combinations. This property can be used to generate several sonifications and pick the most convincing one, for example by avoiding unfortunate cases where too many segments share the same frequency.
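For completeness, the triangle-wave synthesis mentioned above can be sketched as a plain sample generator; the sample rate and amplitude here are my own assumptions:

```python
def triangle_wave(freq, duration, sample_rate=44100, amplitude=0.5):
    """Generate `duration` seconds of a triangle wave at `freq` Hz as a
    list of samples in [-amplitude, amplitude]."""
    n = int(duration * sample_rate)
    samples = []
    for i in range(n):
        phase = (freq * i / sample_rate) % 1.0  # position in the period
        # Peaks at the start/end of each period, trough in the middle.
        samples.append(amplitude * (4 * abs(phase - 0.5) - 1))
    return samples
```

A triangle wave is a sensible minimal choice here: it is trivial to compute yet has odd harmonics, so it sounds less sterile than a pure sine.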
Please feel free to create GitLab issues or send an email if you want to take part in this experiment with suggestions, feedback or contributions of any kind. Thanks for listening, and hopefully see you around v5.18 for some new sounds.
Last modified on 2022-04-12