Strategic updates from the KernelCI project

The KernelCI project continues on its mission to ensure the quality, stability and long-term maintenance of the Linux kernel. That means helping the community (especially maintainers) not just run their code in a Continuous Integration (CI) system, but also delivering relevant, high-confidence results and reports. In this post, we will give you a strategic overview of the steps we have taken towards our mission over the past few months.

Enabling new infrastructure to run tests

Our legacy system has shown its age and has been failing to meet the growing testing needs of the community. It is a rather limited, unstable Python 2 project that focuses on embedded hardware. We continue on the journey to put our new infrastructure in place, so we can finally move away from the legacy KernelCI system.

We bring good news! The new core service for running tests is already in place, but it is still going through a stabilization phase. The team is therefore ramping up the number of tests slowly so it can deal with issues as they arise, especially those affecting the quality of the testing KernelCI provides. We do not want to repeat past mistakes with results that can’t really be trusted, so our focus right now is on quality rather than quantity. The team is iterating quickly on that process to enable open, wide adoption in the coming months.

In our new KernelCI infrastructure, we already have support to run tests in labs (through LAVA only at the moment), Docker containers, Kubernetes and natively. Adding new labs or test environments should be relatively straightforward. Then, as we add more test environments, we are also laying down the foundation for integration with other CI systems across the community so we can share kernel builds and offer test environments.
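To illustrate why adding a test environment can stay straightforward, here is a minimal sketch of a runtime abstraction. It is not KernelCI's actual code: the class names `Runtime`, `ShellRuntime` and `DockerRuntime`, the `run_job` helper and the image name are all assumptions made for this example.

```python
from abc import ABC, abstractmethod
import subprocess
import sys


class Runtime(ABC):
    """Hypothetical common interface for a place where a job can run."""

    @abstractmethod
    def run(self, command):
        """Run a command (a list of strings) and return its exit code."""


class ShellRuntime(Runtime):
    """Runs the job natively as a local process."""

    def run(self, command):
        return subprocess.run(command).returncode


class DockerRuntime(Runtime):
    """Wraps the job in a `docker run` invocation (image name is illustrative)."""

    def __init__(self, image):
        self.image = image

    def build_command(self, command):
        # Prefix the job command with the container invocation.
        return ["docker", "run", "--rm", self.image, *command]

    def run(self, command):
        return subprocess.run(self.build_command(command)).returncode


def run_job(runtime, command):
    """The caller does not need to know which runtime it was given."""
    return runtime.run(command)


# Usage: run a trivial job natively with the current Python interpreter.
exit_code = run_job(ShellRuntime(), [sys.executable, "-c", "print('job ran')"])
print("exit code:", exit_code)
```

With this shape, supporting another environment would mean adding one more subclass while the code that schedules jobs stays unchanged.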

As we shared before, the new infrastructure exposes an API for users to create accounts, query results and even drive tests themselves. At the moment, we are focusing on enabling our own pipeline there, so we can shut down the legacy system. But anyone is welcome to request a user account and try it out.

Another initiative from the KernelCI community is the GitLab CI support in the mainline kernel. Here, the goal is to offer maintainers a CI environment that they can manage themselves. With time, the KernelCI API can be leveraged to provide a backend for builds and test runs for repositories using GitLab CI.

Trusting test results and reports

On one end, we are stabilizing our new infrastructure to run the tests. On the other end, we are looking into improving the quality of the reports KernelCI sends out, so maintainers and developers can actually trust them. Given the huge amount of data coming out of test systems and bots today, we must invest in improving the delivery of the results; otherwise, KernelCI will add to maintainer burnout rather than help solve it. That means improving the quality and confidence of the data, so maintainers and developers only receive reports packed with relevant information, free of noise and false positives.

At the time of this writing, our new test infrastructure has a handful of trees enabled, along with boot testing and a few test suites (including kselftest support). That setup is enabling the team to triage ALL the results to identify infrastructure failures and general test patterns (flakiness, configuration issues, intermittent failures, etc.). There is a significant investment in developing better tests together with the community (like the device tree probe kselftest), which is improving the quality of the results compared to what exists in our legacy system.

As part of the effort, we are developing a layer for post-processing the test results in KCIDB, the KernelCI database that collects test results from the entire Linux kernel test ecosystem. The work in this area is at proof-of-concept level, but it is already enabling the team to evaluate the results coming from our new infrastructure. The post-processing layer should be a key part of the feedback loop with the community. The goal is to increase automation in triaging the results, saving precious time for kernel maintainers. Also, because KCIDB collects data from various CI systems, the post-processing of test results can be enabled for more systems than just KernelCI.
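As an illustration of the kind of automated triage a post-processing layer could perform, the sketch below labels a test as flaky when it both passed and failed on the same revision. This is only one possible heuristic, not the KCIDB implementation; the `classify` function, the tuple layout and the label names are assumptions made for this example.

```python
from collections import defaultdict


def classify(results):
    """Group results by (revision, test) and label each group.

    `results` is an iterable of (revision, test_name, status) tuples where
    status is "pass" or "fail". The layout and labels are illustrative only.
    """
    outcomes = defaultdict(set)
    for revision, test, status in results:
        outcomes[(revision, test)].add(status)

    labels = {}
    for key, statuses in outcomes.items():
        if statuses == {"pass"}:
            labels[key] = "stable-pass"
        elif statuses == {"fail"}:
            labels[key] = "consistent-fail"
        else:
            # Mixed outcomes on the same revision suggest flakiness.
            labels[key] = "flaky"
    return labels


sample = [
    ("rev1", "boot", "pass"),
    ("rev1", "boot", "fail"),
    ("rev1", "kunit", "pass"),
    ("rev1", "dt-probe", "fail"),
    ("rev1", "dt-probe", "fail"),
]
for key, label in classify(sample).items():
    print(key, label)
```

Only the consistent failures would then be worth reporting to a maintainer, while flaky results could be routed to the people maintaining the test or the lab.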

On top of that, the KernelCI team is redesigning the Web Dashboard UX to enable rich visualization of all that data for the entire community. A public request for feedback on UX should go out in the coming weeks.

It’s all about engaging the community in testing

Solving CI needs for the Linux kernel community is not just a technical challenge. It is in large part a community engagement challenge too. The KernelCI project has a strong focus on engaging the community in testing processes. With our new infrastructure coming into place, we are ready to give a new spin to our Community Engagement initiative.

For that, we are forming a Community Engagement Working Group (WG). The WG will focus on connecting with maintainers to discuss and implement improvements in test quality for their subsystems and also act as a feedback recipient for improvements in our post-processing of the test results. The Community Engagement WG will be led by Shuah Khan, kernel maintainer & Linux Fellow at The Linux Foundation.

A dedicated announcement of the Community Engagement WG was sent to the KernelCI mailing list. If you are interested in participating, raise your hand!

Where are we going from here?

As you can see, a lot is going on in KernelCI at the moment. The team is iterating quickly on the development of the new infrastructure, so we will be engaging with new maintainers and developers every month from now on, bringing them to the new infrastructure and pushing the system's limits further. If you are a maintainer and want to bring your tests to KernelCI, please send us an email at kernelci@lists.linux.dev.

That’s all for now! Stay tuned for updates on topics discussed in this article. Likewise, as the new infrastructure stabilizes, expect a significant amount of documentation updates too.

We thank all KernelCI member organizations and the developer community who have been investing in the project over the years. It is only because of the continued support from our community that we are making the legacy system a thing of the past!

Exciting KernelCI TSC updates

During our last TSC meeting, on March 14th, we had two votes that brought new winds to our community:

Shuah Khan joins the TSC

Shuah, Kernel Maintainer & Linux Fellow at The Linux Foundation, has been elected a new member of the TSC. We believe Shuah’s experience will help the KernelCI project expand its kernel community engagement efforts, taking testing and CI to more parts of the upstream kernel. Welcome, Shuah!

Nikolai Kondrashov elected as the new TSC Chair

Nikolai has been a longstanding contributor under the KernelCI umbrella and the person behind KCIDB. At Red Hat he also contributes to kernel integration technology. His knowledge around the problems KernelCI needs to solve will help the project greatly. His role will be effective on April 1st. All the best to Nikolai in his new role!

While at it, we thank Guillaume for his tenure as TSC Chair of the project!

The details of the motions that were voted on can be found in the TSC documentation.

UX Research for the new Web Dashboard

Following the UX Analysis RFP we shared a few months ago, we’re now delighted to announce that the KernelCI Advisory Board has accepted a proposal. First of all, we’re grateful to have received such a series of high-quality proposals. Many thanks once again to all the submitters for your interest in the project.

After carefully considering all the options with the board and some TSC members, the vote came out in favour of ProFUSION. Congratulations! We’re looking forward to the next milestones. In fact, the work has already started.

Milestones

The UX Analysis project is broken down into three milestones with four weeks assigned to each, so a total of twelve weeks. Some breaks will also be scheduled during this time so the calendar end date is likely to be in March 2024.

M1: Result of interviews

Some interviews will be carried out with various stakeholders to help refine user stories and start modelling the user experience and some design aspects.

M2: First UX design proposal

A first interactive prototype will then be created and made available for members of the public to evaluate and provide feedback.

M3: Final UX design proposal

Based on the feedback and outcome of the first two milestones, the last part is about creating high-level specifications of the web dashboard design suitable for the implementation phase.

KernelCI API Early Access

Free for all

Today is the beginning of the Early Access phase for the new KernelCI API. As explained briefly in the previous blog post, this is to give everyone a chance to create a user account and start using the API for beta-testing purposes. There’s now an Early Access documentation page with a quick guide to get started, so please go take a look there and start taking part.

A work in progress

Although the fundamental principles of the API & Pipeline have now settled a bit, it is still under active development. In particular, we’re expecting to see a fair amount of changes in these areas:

kci command line tool

It’s still very new and only provides some basic features, so now it needs some proper design. For example, new commands should be added and it might become more human-readable with things like kci find node rather than kci node find.

Build and test coverage

This can only grow as right now there’s only KUnit, one x86 build and one QEMU smoke test run for each kernel revision (and only from mainline). Starting to scale this up will help tackle the main bottlenecks and performance issues in the infrastructure before reaching production quality.

Documentation

Yes, now that things are shaping up we should also be taking good care of the overall documentation and general ease of use of the project. This should also encompass things such as moderation rules to ensure continuity of the project.

A two-way process

This system is made for you, all the members of the wider Linux kernel ecosystem. So while there’s a small but growing team of developers still typing away all the code needed to make it happen, we need your feedback to help shape things up in such a way that it actually delivers on its expectations.

Please experiment as much as you like and share your stories, thoughts and questions via the project’s usual communication channels. Also feel free to create issues on GitHub and send pull requests of course.

Happy beta-testing!

API Transition Timeline

Once upon a time, on a Thursday afternoon somewhere in Italy, a KernelCI backend API was created:

commit 08c9b0879ebe81463e124308192670c0e7447e0b
Author: Milo Casagrande <milo@ubuntu.com>
Date:   Thu Feb 20 16:10:41 2014 +0100

    First commit.

As you can see, it was nearly 10 years ago. How much does that represent in the modern software world? Of course, it depends. The Linux kernel is much older, still written in C and still going strong. But in most cases, including this particular one, it means a whole new world. The old API was written in Python 2.7 which stopped being maintained as a language on 1st January 2020. We could have just rewritten it in Python 3, which was the initial thought. But in the meantime, KernelCI was also growing as a project. It wasn’t just about building ARM kernels and doing boot testing on embedded dev boards any more. It had become a Linux Foundation project aiming to test the whole upstream kernel.

What is this new API?

Following this move, an increasing number of people became interested as it got under the spotlight. That is when we started to realise that the architecture needed to fundamentally evolve in order to match the scale of the new mission it had been assigned. The Linux kernel is a vast and complex open-source project with a unique ecosystem. As such, it requires some unique tooling too. We all know that Git was initially created out of a need to manage all the kernel patches. Now KernelCI needs an automated testing tool tailor-made to its unique requirements – and that’s why we’re finally launching the new API & Pipeline.

It comes with lots of improvements and it’s still a work-in-progress. We’ll keep publishing blog posts and update the documentation as things evolve over the next few months. Right now we have a pipeline that can monitor Git repositories for new revisions, take a snapshot of the kernel source tree in a tarball, run KUnit with it as well as an x86 kernel build and smoke test it in QEMU. It can then also send a summary email and detect regressions. That’s basically enough to prove we have a workable system. Nothing too groundbreaking there, you might think. So, what’s all the fuss about?

In a nutshell: a Pub/Sub interface to orchestrate distributed client-side services that can be run anywhere. You could have your own too at home. Also: user accounts so you can keep your own personal test data there, an abstraction for runtime environments so jobs can be run seamlessly in Docker, Kubernetes, a local shell, LAVA, [insert your own system here]… a new kci command line tool to rule them all and a unified Node schema to contain all the test data (revision, build, runtime test, regression…) in a tree. But again, we’ll go through all that later in more detail. It’s all based on requirements gathered from the community over the past few years.
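To make the Pub/Sub idea more concrete, here is a minimal in-process sketch of services that only communicate through events. It is not the KernelCI API: the `PubSub` class, the channel names and the event fields are hypothetical, and a real deployment would use a networked broker rather than direct function calls.

```python
class PubSub:
    """Tiny in-process stand-in for a Pub/Sub broker."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, channel, callback):
        self._subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, event):
        # Copy the list so callbacks may subscribe while we iterate.
        for callback in list(self._subscribers.get(channel, [])):
            callback(event)


def run_pipeline(revision):
    """Wire three independent services together and trigger them once."""
    bus = PubSub()
    log = []

    def make_tarball(event):
        log.append(f"tarball for {event['revision']}")
        bus.publish("tarball-ready", event)

    def build_kernel(event):
        log.append(f"build for {event['revision']}")
        bus.publish("build-done", event)

    def smoke_test(event):
        log.append(f"smoke test for {event['revision']}")

    # Each service only knows the channel it listens to, not the other services.
    bus.subscribe("new-revision", make_tarball)
    bus.subscribe("tarball-ready", build_kernel)
    bus.subscribe("build-done", smoke_test)

    bus.publish("new-revision", {"revision": revision})
    return log


for step in run_pipeline("example-revision"):
    print(step)
```

Because the services are decoupled in this way, any of them could in principle run on a different machine, including one at home, as long as it can reach the broker.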

Timeline

The main message in this blog post is the timeline for retiring the old system and getting the new API in production. Here’s the proposal:

  • Early Access: Monday 4th September 2023
  • Production Deployment: Monday 4th December 2023
  • Legacy System Deprecation: Monday 4th March 2024
  • Legacy System Sunset: Monday 4th November 2024
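As a quick sanity check of the proposed schedule, the short script below confirms that each date falls on a Monday and works out the number of weeks between consecutive milestones. The dates are taken from the proposal; the variable and function names are just for this example.

```python
from datetime import date

MILESTONES = [
    ("Early Access", date(2023, 9, 4)),
    ("Production Deployment", date(2023, 12, 4)),
    ("Legacy System Deprecation", date(2024, 3, 4)),
    ("Legacy System Sunset", date(2024, 11, 4)),
]


def weeks_between(milestones):
    """Return (from, to, weeks) for each pair of consecutive milestones."""
    gaps = []
    for (name_a, day_a), (name_b, day_b) in zip(milestones, milestones[1:]):
        gaps.append((name_a, name_b, (day_b - day_a).days // 7))
    return gaps


# date.weekday() returns 0 for Monday.
all_mondays = all(day.weekday() == 0 for _, day in MILESTONES)
print("all Mondays:", all_mondays)
for name_a, name_b, weeks in weeks_between(MILESTONES):
    print(f"{name_a} -> {name_b}: {weeks} weeks")
```

The first two gaps are 13 weeks each and the last one is 35 weeks, so the legacy system keeps running for most of a year after the new API reaches production.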

It only takes four Monday-the-Fourth milestones to get through all this. Here’s what they mean:

Early Access

This is when a new API & Pipeline instance becomes available to let the public experiment with it. It can be seen as some form of beta-testing. It will be deployed in the Cloud to evaluate how a real production instance would behave, but it’s only kept online as a best effort. There should be frequent updates as the code evolves, probably at least weekly and at most daily. Only changes that made it through early testing on the staging instance should be deployed so it’s meant to be reasonably stable.

Production Deployment

The plan is to build upon the experience learned from the Early Access deployment to prepare a persistent instance that would eventually become the production one. Data should be carefully kept and backed up, changes in the database schema should go through managed migrations and the API code should be deployed from tagged releases. As soon as this has become reliable enough we might shut down the Early Access instance since it should have become redundant by then.

Legacy System Deprecation

In other words, this is when the new API & Pipeline production instance becomes the official main KernelCI instance. We’ll first be going through a transition phase to ramp up the build and test coverage on the new API while equally reducing the load on the legacy system to avoid overloading the shared infrastructure. Ideally, coverage should have reached 80% on the new API and 20% on the old one by this date.

Legacy System Sunset

After being deprecated, the legacy system will keep running with a bare minimum of coverage just to facilitate the transition for users who depended the most on it. It will be definitively shut down and the data will be archived when the Sunset milestone is finally reached.

Stay tuned

These dates have been identified as realistic targets for having the new API rolled out and retiring the old one with a transition in between. We’ll be aiming to have the new API in place by these dates and conversely retire the legacy system no sooner than announced here.

In the meantime, we’ll be posting updates as these milestones get reached or if any alterations need to be managed. We’ll also clarify how to use the API and exactly what features become available alongside the main documentation. Please share with us any feedback you may have, if you need some clarifications or to raise any concerns. The best way to do this is via the mailing list as always. Stay tuned!

Request for Proposals: UX Analysis 2023 – Q&A

Following the UX Analysis RFP, we’ve received a number of questions which seem worth sharing publicly in order to equally benefit all the proposals we receive.

Big Picture

What are your organization’s most important broader goals with this new dashboard?

We identified the requirement for a web dashboard in a community survey we ran a couple of years ago. It’s mostly about providing Linux kernel developers with the information they need to facilitate their daily workflows. It should also serve other types of users, for example those who base their products on the upstream kernel and need to monitor its quality.

What are the biggest issues or problems you’re having with your current system that prompted this UX Analysis RFP?

We currently have a very old web dashboard with an associated backend that can’t be maintained any more. On top of that, the project has been growing and we’re now redesigning the whole approach to be able to better scale with a new API which doesn’t have any actual web dashboard right now.

What factors made your team decide to release an RFP for this project?

None of the KernelCI LF project members had enough in-house expertise. Also, looking for an independent external supplier appeared as an appropriate choice in this case.

Is there an incumbent bidder on this project?

No, this is the first RFP we have issued for UX Analysis and web development in general.

How will vendors be evaluated and scored?

We will come up with some criteria as a basic comparison method, then each member of the advisory board will look at all the proposals and we’ll discuss it and eventually hold a vote. We may also inquire further with some vendors if needed.

How many rounds of revisions and how many UX flows are you expecting as part of this project?

This is very hard to predict as we’re still in the early design stages. There should probably be a small number both of revisions and flows (e.g. 2 or 3), maybe later we would be dealing with incremental changes resulting in more revisions as part of the full implementation efforts.

Do you have a preference regarding the vendor’s location?

No, there is no preference over the country where the vendor is located. The KernelCI project’s team is remotely distributed around the world.

Content

Would you need any copywriting or content migration services?

None that we’re currently aware of.

Would you need any original or stock videography or photography?

Not with the UX Analysis phase. We might need some original content for a final website in production.

How much content do you currently have on your website?

Our static websites have tens of pages. Our current dynamic web dashboard has millions of entries, and this is what we’re expecting to see covered by the UX Analysis. Some static content may be part of the interactive web dashboard but it’s not the primary goal.

Is there a CMS that you have a preference for over the other?

No, however we do have an API for storing our data. How this is turned into a UX and web dashboard is up to the vendor to decide as part of the proposal.

What CMS platform do you use currently?

For some static websites we use WordPress and Hugo, but this UX Analysis work is for an interactive web dashboard. We don’t have any particular CMS requirements for it.

Requirements

Would you require hosting, dns or ssl services?

No, the KernelCI project can take care of this.

How much initial research has been done as part of this RFP?

A lot of research has been done in the past few years to try and understand what the public and users need. What’s missing is how this may translate into an actual UX. We now have some ideas about “what” we need but not “how” users can have it.

Are there any factors driving the timeline for the completion of the work?

We’re developing a new API which will be used hand-in-hand with the web dashboard. The timeline isn’t set in stone but having a prototype dashboard or basic demo around the end of September would be great. We’re thinking of having the new API in production in the first half of 2024 so it would be good to see the web dashboard getting finalised around that time too.

Can you give us a high-level overview of the demographics of each persona from the user stories?

  • “Someone who cares about the kernel” can be literally anyone, from a student to a high-profile maintainer or developer. The only real criterion is that they need to know about the upstream kernel code quality. There may be a million people in this category.
  • “Kernel / subsystem maintainer” covers a relatively small set of people in charge of accepting changes into the kernel. They form some sort of pyramid of trust with several maintainers sending their collected changes to a common maintainer etc. Like any kernel contributors, they are located around the world and have various levels of experience. There are maybe about 100 subsystem maintainers and 1000 maintainers responsible for smaller areas of the code.

Do you have examples of the email reports that are sent with summaries of test results?

Are the test results currently stored in a database that the new web dashboard will visualize?

Yes, the new API has auto-generated documentation with an OpenAPI description. This is still a staging instance for experiments; we’re planning to roll out a production-like instance in the coming weeks and start refining the schema. Basically, all the test data is contained in a tree of Node objects. The underlying engine is MongoDB, and we’re looking into using Atlas for this. The API also features a Pub/Sub interface for events that trigger different stages of the testing pipeline on the client side.

The KCIDB database has a different schema, but the web dashboard wouldn’t necessarily need to read data from both sources. That’s something we still need to define, as there are several ways to solve this. It’s also something which might depend on the outcome of the UX Analysis. There’s already an interim web dashboard for KCIDB based on Grafana.
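To give a rough idea of what a tree of Node objects means for a dashboard, here is a small in-memory sketch. The field names (`name`, `kind`, `result`, `children`) are assumptions for illustration and do not reflect the actual API schema, which is stored in MongoDB and is still being refined.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node: a revision, a build or a test run."""
    name: str
    kind: str                      # e.g. "revision", "build", "test"
    result: Optional[str] = None   # "pass", "fail", or None if not applicable
    children: List["Node"] = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it to allow chaining."""
        self.children.append(child)
        return child

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def failures(root):
    """Names of all nodes under `root` whose result is "fail"."""
    return [node.name for node in root.walk() if node.result == "fail"]


# Usage: one revision with a build, which in turn has two test runs.
revision = Node("example-revision", "revision")
build = revision.add(Node("x86-build", "build", "pass"))
build.add(Node("qemu-smoke", "test", "pass"))
build.add(Node("kunit", "test", "fail"))
print(failures(revision))
```

A dashboard view such as "all failures for this revision" then becomes a traversal of one subtree, whatever mix of builds and tests it contains.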

Request for Proposals: UX Analysis 2023

The KernelCI project runs thousands of Linux kernel tests every day, generating a huge amount of data to help the community identify issues and trends. One way to communicate all this is through a bespoke web application that truly embodies the kernel community’s use cases. This Request For Proposals (RFP) aims to be an initial investigation to understand what the User Experience (UX) of the Web Dashboard could look like, based on a set of user stories compiled by the KernelCI team.

Budget

We’re expecting vendors to submit a fixed-price proposal showing the total costs for the different phases they plan for the project. Optional or extra phases can be included as well. The vendors have autonomy to propose a process for the iterative feedback roadmap. Payments would be made by the Linux Foundation using the project’s own budget.

Sending proposals

Please take a look at the full RFP-UX-Analysis-2023-v2.pdf document for all the details. Proposals should be sent by email directly to the project members.

The deadline for responding to this RFP is 10th July 2023, six weeks after it has been made public. Then the KernelCI Advisory Board of Members will vote on the 24th July 2023. Exact dates might be subject to change in case of a major practical issue or unavailability of voting members.

Edit: Following a surge in last-minute queries, the timeline has now been extended. The new deadline for submitting proposals is 24th July 2023 and the advisory board is planning to hold a vote on 9th August 2023.

KernelCI at FOSDEM 2023

Taking place February 4th and 5th, FOSDEM is a fantastic event organised “by the community for the community” in Brussels, Belgium, Europe. FOSDEM provides Open Source communities a place to meet in person.

The KernelCI initiative is delighted to be present at FOSDEM 2023.

Testing is recognized by all to be critically important for Open Source communities at large, and not enough is being done. The KernelCI initiative (a Linux Foundation project) intends to change that.

During the weekend of FOSDEM 2023, please keep testing in mind throughout the conversations you will be having with peers and members of the community.

Remind yourself and your interlocutors of the importance of testing and the existence of the KernelCI initiative.

Find out more about KernelCI

Mission statement: https://kernelci.org/docs/org/

Interested in seeing your tests run by KernelCI natively? Here is how to get started with KernelCI.

You already have your own automated execution of tests and would be interested/willing to contribute your results? Please see the KCIDB submitter guide.

You can find the catalog (YAML file) of the tests integrated today here. Some of the most notable test suites reporting today: bpf, kselftest, ltp, perftool, podman, Red Hat’s test suites, stress-ng, syzkaller, xfstests, and many more.

If you would like some help, please reach out to the KernelCI community:

Members of the KernelCI Advisory Board (AB) and Technical Steering Committee (TSC) will be present in-person at FOSDEM 2023. Many will be in the Testing and Automation devroom (Sunday morning in room UB4.132).

Looking forward to meeting as many as possible in Brussels, February 4th and 5th 2023.

A case for DAG databases: Correlating revision history with CI results

  • Track: Graph Systems and Algorithms devroom
  • Room: K.4.601
  • Day: Saturday
  • Start: 12:00

Growing a lab for automated upstream testing: challenges and lessons learned

  • Track: Testing and Automation devroom
  • Room: UB4.132
  • Day: Sunday
  • Start: 09:30

Rethinking device support for the long-term

  • Track: Kernel devroom
  • Room: UA2.220 (Guillissen)
  • Day: Sunday
  • Start: 16:30

Now KernelCI has a dedicated SysAdmin!

Recently, we posted about the Request For Proposals to undertake SysAdmin tasks for KernelCI.org. That process is now complete, and we are delighted to announce that the KernelCI advisory board hired Vince Hillier from Revenni Inc to conduct the work.

Vince will work together with the Technical Steering Committee (TSC) to maintain and improve the current project infrastructure. As KernelCI.org scales with more kernels being built and more tests being run, we definitely need help to keep our systems stable and up to date.

The KernelCI project relies on a number of web services which need constant maintenance. These include databases, automation tools and web dashboards for several instances. Some are hosted on dedicated virtual machines (VMs), others in the cloud.

Request for Proposals: Sysadmin Maintenance 2022

Summary

The KernelCI project relies on a number of web services which need constant maintenance. These include databases, automation tools and web dashboards for several instances. Some are hosted on dedicated virtual machines (VMs), others in the cloud. This Request for Proposals seeks to extend the current team with dedicated sysadmins to ensure the maintenance of these services is being carried out to guarantee a good quality of service. Additionally, some improvements can be made to reduce the maintenance burden.

Budget

We’re expecting quotations for this work package to range between 15,000 and 30,000 USD depending on the contents of the proposal and for a period of six months. Longer sysadmin time available and extra improvements can justify a higher price. Payments would be made via the Linux Foundation using the project’s own budget.

Sending proposals

Please take a look at the full RFP-Sysadmin-Maintenance-2022-v3.pdf document for all the details. Proposals should be sent by email directly to the project members.

The deadline for responding to this RFP is 8th August 2022, six weeks after it has been made public. Then the KernelCI Advisory Board of Members will vote on the 24th August 2022. Exact dates might be subject to change in case of a major practical issue or unavailability of voting members.

KernelCI Hackfests

KernelCI hackfests span over a few days during which a number of contributors get together to focus on upstream Linux kernel testing. So far, mainly kernel and automated test system developers have been taking part in the hackfests but anyone is welcome to join. Topics mostly include extending test coverage in various ways: enabling new test suites as well as adding test cases to established frameworks such as kselftest and LTP, building additional kernel flavours, bringing up new types of hardware to be tested…

There have been two hackfests to date. The current plan is to hold one every few months. Future hackfests will be announced on the KernelCI mailing list as well as LKML and Twitter. Stay tuned!

Connecting the dots

There is a large ecosystem around the Linux kernel which includes testing in many shapes and forms: kernel developers, test developers, test system developers, OEMs testing fully integrated products… All these teams of people don’t necessarily interact with each other very much outside of their organisations, and kernel developers aren’t necessarily in the habit of writing tests as part of their daily work.

Events such as the KernelCI hackfests give a chance for people from these different areas to work together on solving common issues and keep the ecosystem healthy. It also helps with shifting the upstream Linux kernel development culture towards a more test-driven workflow, to bring mainline Linux closer to the real world where it is actually being used.

Let’s consider a plausible hackfest story. A first participant writes a new test, say in kselftest. A second participant enables the test to run in KernelCI, which fails in some cases and a kernel bug is found. A third participant makes a fix for the bug, which can then be tested directly in KernelCI to confirm it works as expected. This may even happen on the staging instance before the patches for the test and the fix are sent to any mailing list, in which case the fix would get a Tested-by: "kernelci.org bot" <bot@kernelci.org> trailer from the start. This scenario also relies on some hardware previously made available in test labs by other people, putting together efforts from at least four different participants.

Timeline

Here’s a summary of the first two hackfests:

Hackfest #1 – 27th May to 4th June 2021

  • Workboard: https://github.com/orgs/kernelci/projects/3
  • Participants
    • Several kernel developers from Google Chrome OS
    • Several kernel developers from Collabora
    • A few members of the core KernelCI team
  • Achievements
    • New KernelCI instance for Chrome OS on https://chromeos.kernelci.org
    • Added support for building Chrome OS configs on top of mainline
    • Enabled clang-13 builds for Chrome OS and main KernelCI builds
    • New test suite for libcamera enabled on https://linux.kernelci.org
    • Enabled LTP crypto tests with extra kernel config fragment
    • Several patches were also sent to extend kselftest and KUnit coverage in the kernel tree

Hackfest #2 – 6th to 10th September 2021

Lessons Learned

What went well

  • There were several new contributors. The hackfest is a great way to get people started with KernelCI.
  • Hackfest #2 showed more diversity with a wider representation from the ecosystem.
  • There were many improvements in various areas (bug fixes, documentation) which is a sign of a healthy project.
  • The workflow based on GitHub and the Big Blue Button platform appears to have been easily adopted by the participants.

What needs to be improved

  • Having more kernel maintainers involved would help with setting priorities and objectives for KernelCI in accordance with kernel developers’ needs.
  • A hackfest every 3 months may be a bit too often, or maybe some could be shorter or have a more particular theme.
  • Changes can take a long time to get merged. The main limitation seems to be the number of people available to do code reviews and drive discussions.
  • KernelCI has been focusing on running more tests for over a year (kselftest, LTP, IGT…). Now the core architecture needs to be improved to scale better.

Next hackfest

The actual dates haven’t been confirmed yet, but with the current 3-month frequency the next hackfest should take place in early December 2021. A proposed theme could be “KernelCI for newbies”, with a selection of tasks well suited for first-time contributors and documentation improved in that area prior to the hackfest. As always, suggestions are welcome, so please do get in touch if you have any.

See you there!

The first ever KernelCI hackfest

The first KernelCI test development and coverage hackfest took place from 27th May to 4th June 2021. For a total of seven days, developers from the KernelCI team, Google, and Collabora worked to improve many different aspects of KernelCI testing capabilities.

The hackfest was a community event promoted by the KernelCI team. It aimed at bringing developers and companies together to improve testing for areas of the Linux kernel they care about. Through this effort, the KernelCI team also expects to increase awareness for continuous kernel testing and validation – more hackfests will happen in the future, so stay tuned if you want to join.

The first-ever KernelCI hackfest was a success. It kicked off the work to enable kernel testing through Chromium OS, a product-specific userspace. Enabling full userspace images and real-world tests like video call simulations adds a lot of complexity to the testing process. However, the benefits are a clear win for the community. They allow more thorough kernel testing and validation through real application use cases, which can exercise several different kernel areas at the same time in an organized manner. Generally, it is not simple for lower-level kernel test suites like kselftest or LTP to orchestrate a similar use case.

Consider video call simulation for example. Once the user starts a video call, the test can begin by measuring the time needed to set up the video feed and show it in the browser rendered with the rest of the video call website. Then, as soon as the video call is up, many other measurements can be made: camera capture latency, camera stream to network latency, memory consumption, power consumption, GPU performance, background tasks latency, and user interaction latency. These types of tests stress the kernel in unique ways, exposing problems that might otherwise go unnoticed from release to release.

The support for the Chromium OS userspace is the start of full-stack tests in KernelCI. It is still quite experimental, but the support will evolve over the next few months, opening the path for other product-specific userspaces. Increased kernel testing diversity will definitely result in catching more regressions earlier. “Keeping upstream healthy is really important to us in the Chrome OS team (and Google broadly!) since we constantly pull in stable fixes and regularly push out major kernel version upgrades to our users.” said Jesse Barnes who leads the Base OS team of Chrome OS at Google.

On another front, there was progress enabling more testing for different kernel areas as well as improvements to the rootfs testing images used by KernelCI:

  • kselftests received new tests for the futex() system call, basic semantics validation, and soft-dirty page table entry mechanism corner cases;
  • improved LTP crypto tests by enabling missing kernel configs needed;
  • libcamera now has its first few tests running on KernelCI;
  • fs/unicode tests converted to the new KUnit mechanism;
  • bootrr test to check if all CPUs went online successfully;
  • experimental support for including firmware files in the rootfs.

The overall results were significant for only a few days of work. Kernel testing through product-specific userspace opens a whole new avenue of possibilities for KernelCI. On top of that, progress was made on many test cases and test suites, as detailed above. “It has been an amazing week and a half. We’ve achieved a lot in such a short time, in spite of a few workflow weaknesses which we’re now addressing to help further grow the KernelCI community with new developers.” said Guillaume Tucker, KernelCI project lead and Senior Software Engineer at Collabora.

More hackfests will come in the future – this was only the first one. Dates are being discussed for the next hackfest to happen around the end of August, a few weeks before the Linux Plumbers Conference. The KernelCI team invites developers and companies to participate. Joining a hackfest is a great way to quickly evolve the kernel testing knowledge of your team, leading to products and services that work better with easier maintenance over time. So make sure to raise your hand and join the next time a KernelCI hackfest happens.


KernelCI is a Linux Foundation project. If you are interested in learning more or becoming a member, contact us.

Looking back, looking forward

2020 was the first year of the KernelCI project under the Linux Foundation and has been an interesting one. Maybe slightly less “interesting” than the rest of the world-changing events of 2020, but it’s still been an adventure. This article aims to give a quick summary of the major milestones of the first year of the KernelCI project, and highlight our goals for the next year.

Highlights of the first year

The founding members spent the end of 2019 doing a formal launch and ramping up the project structure and organization. This led to our mission statement and key goals.  Throughout 2020 we gave talks and led discussions at several virtual conferences such as FOSDEM, Open-Source Summit / Embedded Linux Conference (ELC).  Check out our blog for more details about the talks and discussions from these events. 

Community Collaboration

In the middle of 2020, we did a Community survey to get a sense for what the kernel testing and automation community was looking for.  This survey has helped guide where we focus our time and resources.  See our blog for an article covering the full results of the survey.

One highlight of the 2020 conference circuit was Linux Plumbers Conference (LPC). At LPC, we gave talks and held focused discussions with our target audience: kernel developers and maintainers. The full details are in a blog article covering the event, but this is where we kicked off public discussions of how to unify test results and reporting from various testing and CI efforts in the community. We’re calling this common datastore for kernel testing results KCIDB. Thanks to the discussions kicked off at LPC, we’re now collecting results from several other projects such as Red Hat CKI, Google syzbot, Arm, Gentoo, and the Fuego project. Continued collaboration with these projects as well as other new ones will be a focus area for 2021.

Infrastructure

Another area of growth in 2020 was in our IT infrastructure. As you might expect, we do lots of kernel builds, and that requires lots of compute horsepower. Our build capacity had been capped by a fixed number of donated build machines. But thanks to the generous donations of founding members Google and Microsoft, we now have scalable cloud compute resources under Google Cloud Platform (GCP) and Microsoft Azure, which we manage with Kubernetes so that we can dynamically scale as our compute needs grow.

Looking forward

Kicking off 2021

The governing board kicked off our 2nd year with some project organizational matters such as budgeting and electing this year’s executive committee.  We are very happy to welcome Guillaume Tucker (Collabora) as the new board chair and Chris Paterson (CIP/Renesas) as the new treasurer.  We also say a big thank you to outgoing chair Kevin Hilman (BayLibre) and outgoing treasurer Guy Lunardi (Collabora).  

Focus areas

Data

As mentioned above, the collaboration with other testing and CI projects will remain a major focus for 2021.  We want it to be easy for anyone doing kernel testing to be able to submit results to our open, centralized datastore: KCIDB.  The amount of data we’re collecting is growing rapidly, so we’re also looking for help from “big data” experts to help us build the tooling to visualize and learn from all the data.  Please write to us on the mailing list if you have any interest in helping here.

Infrastructure

We’ve been using Jenkins for years to manage our CI pipeline jobs, but as we’ve moved more of our infrastructure into the cloud, we’re looking at ways to migrate our CI infrastructure to a cloud-native framework such as Tekton or Jenkins-X. We’re in the early stages of exploration here, so if you have experience in this area and could help guide us, we’d love to hear from you!

Data Visualization & Analysis

We’re also in the early stages of planning new dashboards for visualization and analysis of our growing data set.  We’re soliciting feedback from the broader community by collecting user stories to better understand what our users want from new dashboards.  In addition to making all the testing data and logs available through advanced visualization tools, we’d also like to enable analytics and deep learning on our growing data set.  Once again, this is something we’d love your help on, so if you’re a big data enthusiast and want to put your skills to use to help the Linux kernel, please let us know.

Get involved!

Did you notice any themes above?  We’re looking for help!  We have some big ideas and plans, but we’re still a very small team and are looking for expertise in a few areas to help guide the future of the project.

To keep in touch with what we’re up to or to get involved, you can read our blog, follow @kernelci on Twitter or join our mailing list.

Notes from OSS/ELC Europe 2020

The OSS/ELC Europe 2020 conference took place online from 26th to 28th October. There was one BoF session and one talk about KernelCI, followed by an impromptu video call. The notes below were gathered based on these events.

BoF: Lessons Learned

Guillaume Tucker, Collabora

A lot has happened since KernelCI was announced as a new Linux Foundation project at ELC-E 2019 in Lyon. One year on, what have we learnt?

See the full Event description for slides and more details. Below is a list of Q&A gathered from the session.

Q: I wonder if you plan to add any subsystem-specific CI? Are there any plans/ideas? e.g. for scsi drivers

There are already subsystem-specific tests being run, and subsystem branches can be monitored. Then results can be sent to subsystem mailing lists. For example, this is the case with v4l2: kernelci.org runs v4l2-compliance on a number of platforms for several branches including the media tree, mainline, stable and linux-next, and sends reports with regressions.

There should not be any subsystem-specific infrastructure needed on kernelci.org, but rather different tests and maybe different parameters to adjust to the workflows according to maintainers’ needs.

Q: Some time ago there was a way to search for test runs in a specific lab. I mean on the dashboard. But it seems this feature is gone now. Was that intended? Is it coming back? Can we help and contribute here? 🙂

The web frontend was scaled down to accommodate functional testing rather than boot testing. This was because all the boot testing search pages were tailor-made, which doesn’t scale very well and is very hard to maintain.

We’re now looking into a fresh web dashboard design with flexible search features to be able to do that. As a first step, we are collecting user stories.  If you have any, such as “I want to find out all the test results for the devices in my lab”, feel free to reply to this thread:
https://groups.io/g/kernelci/topic/rfc_dashboards/77367531
“RFC: dashboards, visualization and analytics for test results”

Q: What is the relationship between KernelCI project and LAVA project? Does KernelCI have non-upstream changes to LAVA? Do LAVA people participate in KernelCI?

LAVA is used in many test labs that provide results to KernelCI, but KernelCI doesn’t run any labs itself. Some people do contribute to both, as KernelCI is one of the biggest public use-cases of LAVA, but they really are independent projects. The core KernelCI tools are designed to facilitate working with LAVA labs, but this is not a requirement and other test lab frameworks are also used.

Q: Is there any documentation on how to write those “custom” tests and to integrate it with KernelCI? (e.g. the SCSI drivers/storage devices you just mention before)

See Khouloud Touil’s talk Let’s Test with KernelCI with some hands-on examples.

There is also the user guide as part of the KernelCI documentation:
https://github.com/kernelci/kernelci-core/blob/master/doc/kci_testsuite.md

Each test is a bit different as they all have their own dependencies and are written in various languages. Typically, they will require a user-space image with all the required packages installed to be able to run as well as the latest versions of some test suites built from source. This is the case with v4l-utils, igt-gpu-tools or LTP. Some are plain scripts and don’t depend on anything in particular, such as bootrr.

When prototyping some new tests to run in LAVA, the easiest approach is to use nfsroot with the plain Debian Buster image provided by KernelCI and install extra packages at runtime, before starting the tests. Then when this is working well, dependencies and any data files can be baked into a fixed rootfs image for performance and reproducibility.

Q: How to properly deal with boards which are able to boot only from a mass-storage device and prevent them from being stuck with a non-working image?

In order to be useful with KernelCI, it’s required to at least be able to dynamically load the kernel image as well as any modules and device tree with a ramdisk for the tests that fit in a small enough image. If this can’t be done, then the kernel and user-space images need to be written to the persistent storage before each job. It might also be possible to load the kernel over TFTP and then extract the image onto the persistent storage and use it as a chroot. Ultimately this is the lab’s responsibility and it will depend on many things. If the kernel and the user-space can’t be changed at all, or if there is a possibility of bricking the device, then it’s basically not practical to do any CI on such a platform.

Let’s Test with KernelCI

Khouloud Touil, BayLibre

A growing number of Linux developers want to use KernelCI to run their test suites, but there’s a bit of a learning curve for how to make test suites work with KernelCI. “Let’s Test with KernelCI” will give an overview of the ways to integrate test suites and/or test results into the KernelCI modular pipeline.

See the full Event description for more details. Below is a list of Q&A gathered from the session.

Q: Is there also support for custom YP/OE distros or is it currently limited to the usage of predefined kernels and file systems?

The kernels are all built with regular “make”; no packaging or Yocto recipe is supported right now. However, that could be done with a bit of plumbing. As for user-space, KernelCI only really tests the kernel: the Buildroot and Debian images are only there to be able to run kernel tests. If you create your own KernelCI instance, you can run tests with your own user-space built using Yocto and extend testing to cover some user-space if you want.

Q: Is there some kind of test config to require a certain kernel flag active? I am basically thinking about running some pre-defined test base, based on my own kernel config and then report back the results with something like “ran test X, which requires kernel config flag Y, on architecture/platform Z on kernel version V”.

Yes, there are a couple of ways to adjust the kernel config on kernelci.org. One way is with a special syntax like defconfig+CONFIG_SOMETHING=y. Another way is to define a config fragment. Each KernelCI test result will have the information you mentioned as meta-data.
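
To make the defconfig+CONFIG_SOMETHING=y syntax more concrete, here is a minimal sketch in Python of how such a string could be split into a base config target and the text of a config fragment. The function names split_config_spec and to_fragment are hypothetical and this is not how the KernelCI tools implement it; the sketch also assumes that every item after the base is of the form CONFIG_NAME=value.

```python
def split_config_spec(spec):
    """Split 'defconfig+CONFIG_A=y+CONFIG_B=m' into a base target and extra options.

    Hypothetical helper: only handles items of the form CONFIG_NAME=value.
    """
    base, *extras = spec.split("+")
    options = {}
    for extra in extras:
        name, sep, value = extra.partition("=")
        if not sep or not name.startswith("CONFIG_"):
            raise ValueError(f"not a CONFIG_NAME=value item: {extra!r}")
        options[name] = value
    return base, options


def to_fragment(options):
    """Render the options as the text of a kernel config fragment, one per line."""
    return "".join(f"{name}={value}\n" for name, value in options.items())


base, options = split_config_spec("defconfig+CONFIG_SOMETHING=y")
print(base)                          # defconfig
print(to_fragment(options), end="")  # CONFIG_SOMETHING=y
```

Either way, the end result is the same: a base configuration plus a few extra options, which can then be recorded alongside the test results as meta-data.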

Q: Which firewall streams must be permitted in order for KernelCI to use a custom Lab? I mean if we want to contribute a lab (with associated boards) to KernelCI.org?

LAVA exposes a REST API over HTTPS. It’s also possible to have the LAVA server hosted publicly and using LAVA dispatchers in a private network which will be connecting to the server as clients, with no incoming connections.

When not using LAVA, you can also periodically poll storage.kernelci.org for new kernel builds, download and test them, then send the results to api.kernelci.org. In this case, no incoming connections are required either.
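
The polling approach could look roughly like the Python sketch below. It deliberately leaves out any real HTTP calls: list_builds, run_tests and submit_results are hypothetical hooks that a lab would have to implement itself, and nothing here reflects the actual storage or API endpoints. The point is only that every step is initiated by the lab, so only outgoing connections are needed.

```python
import time


def poll_builds(list_builds, run_tests, submit_results, rounds, interval=0.0):
    """Poll for new builds, test each one once, and submit the results.

    All three callables are hypothetical hooks supplied by the lab:
    list_builds() returns an iterable of build identifiers,
    run_tests(build) returns a results object, and
    submit_results(build, results) sends it upstream.
    """
    seen = set()
    for _ in range(rounds):
        for build in list_builds():
            if build in seen:
                continue  # already tested in an earlier round
            seen.add(build)
            submit_results(build, run_tests(build))
        time.sleep(interval)
    return seen


# Usage with made-up data standing in for the real services.
submitted = []
builds_per_round = iter([["v5.9-rc1"], ["v5.9-rc1", "v5.9-rc2"]])
poll_builds(
    list_builds=lambda: next(builds_per_round),
    run_tests=lambda build: {"boot": "pass"},
    submit_results=lambda build, results: submitted.append((build, results)),
    rounds=2,
)
print(submitted)
# [('v5.9-rc1', {'boot': 'pass'}), ('v5.9-rc2', {'boot': 'pass'})]
```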

Q: In real life how are tests that need to check hardware I/O done? For example in your audio playback case it’s probably not enough to run the play command but we want to check that something was actually played e.g. by capturing the output.

For audio (and video), some hardware has loopback devices which can be used to compare against expected output. For more advanced setups, labs can have external capture equipment as well. But this ends up being lab-specific since there are many ways to do it.

Follow-up impromptu video discussion

As we neared the end of the time slot for the “Let’s Test with KernelCI” talk, we decided to start a public video call with anyone who was interested and attended the talk. We discussed various general things about the project, and a few notes and Q&A were captured:

Q: How can a test lab get added to kernelci.org?

This is something that would require better documentation. We can distinguish 3 different “levels” of integration for labs:

  1. LAVA-style: fully integrated into the pipeline
    If you have a LAVA lab, it’s the easiest way to contribute test results to KernelCI. It also enables automated bisection and is the most efficient way of getting tests run.
  2. Asynchronous test lab
    If you have a test lab with no way to receive requests to run tests, you can look for kernelci.org builds to appear and submit results with kci_data. A typical example is Labgrid. One way to improve this is to implement some notification protocol so these labs could avoid polling and get requests to run tests like the LAVA labs.
  3. Autonomous CI system: KCIDB
    With options 1 and 2, tests use kernel builds from kernelci.org and report results to the same backend. These are called the “native” KernelCI tests. Option 3 is for full CI systems creating their own kernel builds and running their own set of tests. The results are sent to the common reporting database using the KCIDB tools.

Q: Where can we find the source and definition of tests visible on kernelci.org frontend?

This is also something that would require better documentation, with a directory of all the test plans and how they are created. Functional tests are fairly recent on kernelci.org, which is why we don’t have that yet.

All the tests are normally defined in the kernelci-core repository. This includes building some test suites from source and including them in user-space rootfs images, and defining how to run the tests.

User story: Checking results for devices in “my” lab across all the branches and revisions.

KernelCI Notes from Plumbers 2020

The Linux Plumbers Conference 2020 was held as a virtual event this year. The online platform provided a really good experience, with talks and live discussions using Big Blue Button for the video and Rocket Chat for text-based discussions. KernelCI was mentioned many times in several micro-conferences, with two talks in Testing & Fuzzing which are now available on YouTube.

The notes below were gathered publicly from a number of attendees; they give a good insight into what was discussed. In short, while there is still a lot to be done, the KernelCI project is healthy and growing well in its role of a central CI system for the upstream Linux kernel.

Real-Time Linux

We’ve been making great progress with running LAVA jobs using the test-definitions repository from Linaro, thanks to Daniel Wagner’s help in particular. This was prompted by the discussions in the real-time micro-conference.

The next step from a KernelCI infrastructure point of view is to be able to detect performance regressions, as these are different to binary pass/fail results. KernelCI can already handle measurements, but not yet compare them to detect regressions. Real-time getting merged upstream means it is becoming increasingly important to be able to support this.
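
To illustrate why measurements need a different treatment from pass/fail results, here is a minimal sketch in Python of one possible comparison: a new value is flagged when it is worse than the mean of earlier runs by more than a few standard deviations. This is only an assumption about how such a check could work, not something KernelCI implements; the function name is_regression, the threshold of 3 and the sample latencies are all made up for the example.

```python
from statistics import mean, stdev


def is_regression(baseline, value, threshold=3.0, higher_is_worse=True):
    """Flag value if it is more than threshold standard deviations worse than the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    # For latencies a higher value is worse; for throughput it is the opposite.
    delta = (value - mu) if higher_is_worse else (mu - value)
    return delta > threshold * sigma


latencies_us = [41, 39, 40, 42, 38]  # made-up baseline latencies, in microseconds
print(is_regression(latencies_us, 43))  # False
print(is_regression(latencies_us, 60))  # True
```

Unlike a binary result, the outcome depends on the history of previous runs and on the chosen threshold, which is why this kind of comparison needs dedicated support in the infrastructure.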

There was also an interesting talk about determining the scheduler latency when using RT_PREEMPT and the introduction of a new tool “rtsl” to trace real-time latency. This might be an interesting area to investigate and potentially run automated tests with.

Static Analysis

The topic of static analysis and CI systems came up during the Kernel Dependability MC, and in particular, they were looking for a place to do “common reporting” in order to collect results for the various types of static analysis and checkers available.  We pointed them to the KernelCI common reporting talks/BoFs.

Some static analysis can also be done by KernelCI “native” tests using the kernelci.org Cloud infrastructure via Kubernetes, which is currently only used to build kernels. This is probably where KUnit and devicetree validation will be run, but the rest still needs to be defined.

KCIDB

Fuego

Tim Bird, the main developer of Fuego at SONY, joined the KCIDB BoF and we had a good discussion. Unfortunately, he did not have enough time to go through to an actual submission. We got about a quarter of the way through converting his mock data to KCIDB.

Gentoo Kernel CI

Alice Ferrazzi, maintainer of GKernelCI at Gentoo, had more time available for the KCIDB BoF and we talked through getting the data out of her system. A mockup of her data was made and successfully submitted to the KCIDB playground database setup.

Intel

Tim Orling, Yocto project architect at Intel, has expressed keen interest in KCIDB.  He said he would experiment at home and will push Intel internally to participate.

LLVM/Clang

The recently added support for “LLVM=1” upstream means we can now have better support for making Clang builds. In particular, this means we’re now using all the LLVM binaries and not just clang. It also solved the issue with merge_config.sh and the default CC=gcc in the top-level Makefile.

This was enabled in kernelci.org shortly after LPC.
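
As a rough illustration of the difference, the Python sketch below builds the argument list for a kernel make invocation with either LLVM=1 or only CC=clang. The make_command helper is hypothetical and is not part of kci_build; cross-compilation variables and output directories are left out to keep it short.

```python
def make_command(target, arch, use_llvm=True, jobs=1):
    """Build the argument list for a kernel make invocation with Clang.

    Hypothetical helper for illustration only.
    """
    cmd = ["make", f"-j{jobs}", f"ARCH={arch}"]
    # LLVM=1 selects the whole LLVM toolchain; CC=clang only swaps the compiler.
    cmd.append("LLVM=1" if use_llvm else "CC=clang")
    cmd.append(target)
    return cmd


print(" ".join(make_command("defconfig", "x86_64")))
# make -j1 ARCH=x86_64 LLVM=1 defconfig
print(" ".join(make_command("defconfig", "x86_64", use_llvm=False)))
# make -j1 ARCH=x86_64 CC=clang defconfig
```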

kselftest

The first kselftest results were produced on staging.kernelci.org during Plumbers as a collective effort.  We have now started enabling them in production, so stay tuned as they should soon start appearing on kernelci.org.

Initial set of results: https://kernelci.org/test/job/next/branch/master/kernel/next-20200923/plan/kselftest/

AutoFDO

AutoFDO will hopefully get merged upstream. Once it is, it might be useful for CI systems to share profiling data, from benchmarking runs in particular.

Building randconfig

The TuxML project carries out some research around Linux kernel builds: determining the build time, what can be optimised, which configurations are not valid… The project could benefit from the kernelci.org Cloud infrastructure to extend its build capacity while also providing more build coverage to kernelci.org. This could be done by detecting kernel configurations that don’t build or lead to problems that can’t be found with the regular defconfigs.

Using tuxmake

The goal of tuxmake is to provide a way to reproduce Linux kernel builds in a controlled environment. This is used primarily by LKFT, but it should be generic enough to cover any use-case related to building kernels. KernelCI uses its kci_build tool to generate kernel configurations and produce kernel builds with some associated meta-data. It could reuse tuxmake to avoid some duplication of effort and only implement the KernelCI-specific aspects.

KernelCI Community Survey Report

We are thrilled to share with you the results of our first KernelCI Community Survey. It has been a very interesting experience, with just under 100 responses from people who all provided quality feedback. We are really thankful for every single one of them. It was also a great way to engage more widely with the community. The full results are available for everyone to see in a shared spreadsheet. Individual comments are not shared publicly although they are very valuable and will be taken into account.
