EXOTIC SILICON
“More of this, and less of that! This week it's kernel configuration time!”
Jay shows us custom kernel configurations on OpenBSD
Reckless Guide
Part 2
Compiling a custom kernel on OpenBSD isn't difficult, and it's a great way to start learning about the kernel internals.
Today, Jay takes us through the process step by step.
This article is part of a series - check out the index.
Preamble
Like many large pieces of open source software, the OpenBSD kernel has a large number of options that you can configure at compile time. The official documentation doesn't encourage users to change the configuration file from its default settings. In fact, users are actively discouraged from doing so.
Nevertheless, we learn things by doing them. Whether you actually want to use a custom kernel on production machines or not, it's useful to have some practical experience of how the whole system works. Building a custom kernel was actually one of the very first things I ever did with an OpenBSD machine, partly as a learning experience, but mostly because I needed support for some hardware that wasn't automatically recognised.
There is no particular technical barrier to compiling such a custom kernel on OpenBSD, but it does come with some caveats. You're doing something that is not recommended by the project's own documentation, so you shouldn't expect any support. Such changes might introduce bugs that are not present in the GENERIC kernel, and cause erroneous or unwanted behaviour of the system, including opening security vulnerabilities. Of course, the standard GENERIC kernel doesn't exactly come with any guarantees anyway, and a custom kernel could also remove security vulnerabilities, so if you're prepared to solve your own problems and accept any undesired behaviour, that's your choice.
At the end of the day, here at Exotic Silicon we run our production machines with all sorts of bizarre configurations. But then, we have the experience and expertise to fix them if they break...
Kernel compiling pre-requisites
You'll need the kernel sources corresponding to the OpenBSD release that you have installed. These are in the sys.tar.gz archive that comes as part of the source code distribution. Later in this series we'll be looking at the source for some userland programs that are included in the base system too, so you'll probably want to install the src.tar.gz archive as well. Both archives extract to /usr/src, and should be checked for integrity using signify before extracting.
Assuming that we have a copy of the release files in /install, we can do the following:
# cd /install
# signify -C -p /etc/signify/openbsd-XX-base.pub -x SHA256.sig src.tar.gz sys.tar.gz
# cd /usr/src/
# tar -xzvf /install/src.tar.gz
# tar -xzvf /install/sys.tar.gz
Verify the signature on the checksums file and the checksums of the source archives, then extract the sources to /usr/src/.
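If you repeat this on several machines, the check and the extraction can be chained so that nothing is unpacked unless signify is satisfied. What follows is only a sketch: the function name extract_sources and its three arguments are made up for illustration, and it assumes the release files and public key really are at the paths you pass in.

```shell
# Hypothetical helper: verify the source archives, then unpack them.
# Usage: extract_sources /install /usr/src /etc/signify/openbsd-XX-base.pub
extract_sources() {
    rel=$1 dest=$2 key=$3
    # signify -C verifies SHA256.sig, then the checksums of the named files.
    ( cd "$rel" && signify -C -p "$key" -x SHA256.sig src.tar.gz sys.tar.gz ) || return 1
    # Only reached if the verification above succeeded.
    for archive in src.tar.gz sys.tar.gz; do
        tar -xzf "$rel/$archive" -C "$dest" || return 1
    done
}
```

Because the function returns as soon as any step fails, a bad signature or checksum leaves /usr/src untouched.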
Applying any available errata patches is also recommended.
Keeping a local copy of the errata patches for the current release in a consistent location across all of your machines is a good practice to adopt, as it allows you to quickly see if a particular machine is missing any of the available errata, as well as distribute the patches to local machines which lack direct internet connections.
Here we'll use /errata/ to store the local copies, with a subdirectory una/ for as-yet unapplied patches:
# mkdir -p /errata/una
$ cd /tmp
$ ftp https://ftp.openbsd.org/pub/OpenBSD/patches/X.X.tar.gz
$ tar -xvzf X.X.tar.gz
# mv /tmp/X.X/common/* /errata/una
# chown root:wheel /errata/una/*
Download any currently available errata patches and stage them in /errata/una/, ready to be applied to the system source code
Obviously, you'll want to replace X.X with the actual version number of the release you're using. If you're writing scripts to do this kind of thing, it's worth remembering that uname -r will give you this information. So in a script, you could use a construction similar to the following:
$ ftp https://ftp.openbsd.org/pub/OpenBSD/patches/`uname -r`.tar.gz
How to reference the current version number information in a script
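The same idea extends to the signify key, whose filename contains the version number without the dot. As a rough sketch, the two helpers here (errata_url and signify_key are names invented for this example) build both strings from whatever version you hand them:

```shell
# Hypothetical helpers: build release-specific names from a version string.
errata_url() {
    printf 'https://ftp.openbsd.org/pub/OpenBSD/patches/%s.tar.gz\n' "$1"
}

signify_key() {
    # "7.4" becomes "74"; deleting the dot also copes with two-digit majors.
    printf '/etc/signify/openbsd-%s-base.pub\n' "$(printf '%s' "$1" | tr -d .)"
}

# On an OpenBSD machine you might call: errata_url "$(uname -r)"
```

Taking the version as a parameter, rather than calling uname inside the function, makes it possible to prepare patches for a machine other than the one you're typing on.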
Fun fact!
Errata patches and directory structure
Many years ago, the common/ directory only contained patches applicable for all hardware architectures, and any architecture-specific errata were in their own subdirectories. However, the errata tarball for OpenBSD 4.6 seems to have been the last time that this was done, and since OpenBSD 4.7, which was released in 2010, all of the errata patches have been in the common/ directory.
Each errata patch contains instructions on how to apply it, but the ones that apply to the kernel code are almost always based on /usr/src/, so for those, you can do something like this:
# cd /usr/src
# signify -Vep /etc/signify/openbsd-XX-base.pub -x /errata/una/001_foobar.patch.sig -m - | patch -p0
Applying a patch to the kernel source
Signify will only output the patchfile, and thus it will only be piped to the patch command, if the signature verifies. If not, you'll either get a ‘signature verification failed’ message, or more rarely, ‘unsupported file’, followed by an error from patch complaining that no patch could be found.
Next, we can move the patch from the /errata/una/ directory to the main /errata/ directory, and either re-compile the kernel, or follow whatever other instructions are listed in the errata if it isn't a kernel-related one.
# mv /errata/una/001_foobar.patch.sig /errata
Move the applied patch to another directory so that we know that it's been applied on this machine, but we have it available for later reference
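With this layout, checking whether a machine is up to date becomes a matter of counting files. A minimal sketch, assuming the /errata/ and /errata/una/ convention described above (the function name errata_status is our own):

```shell
# Hypothetical helper: summarise the errata directory layout used above.
# Usage: errata_status [directory]   (defaults to /errata)
errata_status() {
    dir=${1:-/errata}
    applied=0 pending=0
    # Patches in the top-level directory have already been applied.
    for f in "$dir"/*.patch.sig; do
        [ -e "$f" ] && applied=$((applied + 1))
    done
    # Patches still in una/ are waiting to be applied.
    for f in "$dir"/una/*.patch.sig; do
        [ -e "$f" ] && pending=$((pending + 1))
    done
    echo "applied: $applied, unapplied: $pending"
}
```

The existence test inside each loop is there because an unmatched glob is passed through literally by the shell, and would otherwise be counted as one file.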
Top tip!
A shell function to apply errata patches
We can use a shell function to reduce the amount of repetitive typing required when applying errata patches.
For users of the korn shell, adding the following line to your profile will define a suitable function for this purpose:
function apply_errata { signify -Vep /etc/signify/openbsd-$(uname -r | tr -d .)-base.pub -x "$1" -m - | patch -p0; }
Note that we need to use a function rather than a simple alias, as we are passing a parameter.
This function can then be used by changing to the directory indicated in the individual errata, (typically /usr/src/ or /usr/xenocara/), and passing the full path to the patch file to the function we defined above:
# cd /usr/src/
# apply_errata /errata/una/001_foobar.patch.sig
The kernel configuration files
A standard set of configuration files, comprised of an architecture-specific one for each supported hardware architecture, and a global one containing directives which are common to all architectures, is provided in the distribution. Each file of the set resides in a different directory, and is named ‘GENERIC’. Apart from the multiprocessor ‘GENERIC.MP’ variant, and the special ‘RAMDISK’ kernel used by the installer, discussion of any other configuration is very sparse indeed.
To compile a custom kernel on OpenBSD, we need to create our own set of configuration files. In these examples, we'll call ours ‘custom’ and ‘custom.mp’. The choice of name is arbitrary, but it's worth adopting a sensible naming scheme if you're dealing with multiple different configurations, some of which have been more thoroughly tested than others. You might decide to use custom.1, custom.2, custom.3, and so on, as a kind of versioning system if you're changing a few options at a time and testing them.
The architecture-independent configuration file for GENERIC lives in /usr/src/sys/conf. We'll put our own architecture-independent configuration here too, although there is no reason that it has to be in /usr/src/sys/conf, or even that we split our file into architecture-independent and architecture-specific sections. The configuration files for the RAMDISK kernel, for example, are all self-contained in the various architecture-specific directories, and don't reference /usr/src/sys/conf/GENERIC at all.
The architecture-specific configuration files for GENERIC live in /usr/src/sys/arch/*/conf, one for each architecture. Since in our examples we're using the amd64 architecture, in this case we'll be using /usr/src/sys/arch/amd64/conf/.
Fun fact!
Unexpected comments in the source tree
If you look at the comments at the top of the architecture-specific GENERIC configuration file in /usr/src/sys/arch/amd64/conf/GENERIC, you'll see a couple of paragraphs with pointers to further information. These comments were added to GENERIC in 2005, (with revision 1.408 of the i386 architecture GENERIC), and earlier versions just mentioned that GENERIC included everything that was supported.
However, if you look in /usr/src/sys/arch/arm64/conf/GENERIC, you'll find a couple of extra paragraphs in the comments, telling us that the file can be customised to reduce the size of the kernel and improve performance. This might seem surprising, as the official OpenBSD documentation strongly suggests using GENERIC unmodified. In fact, the text is almost identical to the corresponding text in the NetBSD GENERIC kernel configuration file, where it was added in 2001, (with revision 1.448 of the i386 architecture GENERIC), differing only with the changing of ‘NetBSD’ to ‘OpenBSD’.
So unfortunately, we probably can't take the inclusion of these comments as a genuine suggestion for users of the arm64 architecture to go hacking around in their config files and then expecting any additional support if something breaks. It's probably just a copy and paste oversight. Oh well.
Now that we've looked at an overview of the directory layout, we can create our own customised configuration files using the GENERIC files as templates. Once we've done that, we'll delve into the actual syntax of the files and make our own local changes.
Creating another set of kernel configuration files
Starting from /usr/src/sys/ we'll first copy conf/GENERIC, arch/amd64/conf/GENERIC, and arch/amd64/conf/GENERIC.MP to new files:
# cd /usr/src/sys
# cp conf/GENERIC conf/custom
# cp arch/amd64/conf/GENERIC arch/amd64/conf/custom
# cp arch/amd64/conf/GENERIC.MP arch/amd64/conf/custom.mp
Using the GENERIC configuration files as templates for our own configuration.
Although the configuration options are split between three files, we only directly supply one to the kernel ‘make’ process. This will be /usr/src/sys/arch/amd64/conf/custom.mp, (unless we want to build a single-processor kernel at some point, in which case we can ignore custom.mp, and use /usr/src/sys/arch/amd64/conf/custom instead).
Don't forget to change the references to ‘GENERIC’ in the various include directives
The GENERIC.MP file contains an INCLUDE directive to include the contents of /usr/src/sys/arch/amd64/conf/GENERIC, which in turn includes the architecture-independent /usr/src/sys/conf/GENERIC. Since we just copied the GENERIC configuration files, if we were to build using /usr/src/sys/arch/amd64/conf/custom.mp, it would ignore any changes we had made to /usr/src/sys/arch/amd64/conf/custom, as it's still set to include /usr/src/sys/arch/amd64/conf/GENERIC. The same goes for /usr/src/sys/arch/amd64/conf/custom, which is still set to include /usr/src/sys/conf/GENERIC.
This is easy to overlook if you are new to creating custom configurations, so the first thing to do is to edit /usr/src/sys/arch/amd64/conf/custom.mp, add a comment near the top with the date and the modifications you're making, (not strictly necessary, but a good practice to adopt), and change the reference to GENERIC to custom. Next, we edit /usr/src/sys/arch/amd64/conf/custom, and change the INCLUDE line to reference custom instead of GENERIC.
Handy hint!
Using sed to avoid editing sessions
Readers who are familiar with sed will realise that you can easily create the copies and make the necessary changes without editing the files interactively:
sed -e 's!/GENERIC"$!/custom"!' /usr/src/sys/arch/amd64/conf/GENERIC.MP > /usr/src/sys/arch/amd64/conf/custom.mp
sed -e 's!/GENERIC"$!/custom"!' /usr/src/sys/arch/amd64/conf/GENERIC > /usr/src/sys/arch/amd64/conf/custom
Now we have our own set of config files that are ready for customisation.
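Since a forgotten include line fails silently, (the build works, it just ignores your changes), a quick check before building can save some head-scratching. This is a sketch only: check_includes is a made-up name, and it assumes the include lines are written in lower case with the path in double quotes, as they are in the stock files.

```shell
# Hypothetical helper: report include lines that still point at GENERIC.
# Prints any offending lines with their line numbers.
# Returns 0 if the file is clean, 1 if such a line was found.
check_includes() {
    if grep -n -E '^[[:space:]]*include[[:space:]].*GENERIC"' "$1"; then
        return 1
    fi
    return 0
}
```

Running it against both arch/amd64/conf/custom.mp and arch/amd64/conf/custom should print nothing once the references have been updated.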
Understanding and modifying the config files
Let's start with /usr/src/sys/conf/custom. This file is about 100 lines, and consists mostly of option directives, (as opposed to references to device drivers, which make up most of /usr/src/sys/arch/amd64/conf/custom).
Most, (but not all), of these options are described in the options(4) manual page. Ultimately, if you want to find out what any of them really do, including the undocumented ones, then just grep across the entire source tree and find where they are actually used in the code.
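In practice, that search might look something like the following sketch. The function name find_option is invented for this example, and the default search path assumes the kernel sources are in the usual place.

```shell
# Hypothetical helper: list source files that mention a kernel option.
# Usage: find_option UFS_DIRHASH [directory]   (defaults to /usr/src/sys)
find_option() {
    # -r recurses, -l prints file names only, and -w matches whole words,
    # so a longer identifier that merely contains the name is ignored.
    grep -rlw -- "$1" "${2:-/usr/src/sys}"
}
```

From the list of files, a second grep with -n on the interesting ones will usually lead you straight to the relevant #ifdef blocks.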
The main options of interest here are probably the various filesystems. Removing these has only ever caused problems for us on one occasion that I remember, which was with 5.7-release. The networking options towards the end of the config file, however, often will cause compile time breakage if removed. In fact, as recently as 2018, one of these options which is enabled by default, pseudo-device etherip, was the subject of an errata patch, as disabling it enabled a code path which could cause kernel memory to be freed twice.
This should serve as a reminder that you really can introduce subtle and potentially serious problems into your system by changing the kernel configuration from the default.
Also note that you gain almost nothing in terms of kernel size by disabling these options. Disabling FFS_SOFTUPDATES, UFS_DIRHASH, QUOTA, EXT2FS, NFSCLIENT, NFSSERVER, CD9660, and UDF, reduces the kernel by about 9 kilobytes.
Handy hint!
Comment out unwanted kernel options rather than deleting them
Like many configuration files, lines in the kernel config beginning with a # are considered comments, so things can easily be ‘commented out’.
Doing this, rather than actually deleting lines, makes it easier to see what has changed by performing a diff against a previous version of the file or against GENERIC. This can be an invaluable time-saver if you make a lot of changes at the same time, break your system unexpectedly, and can't remember exactly what you last changed.
It also makes restoring a configuration option that you previously disabled quicker, easier and more reliable, as you can simply un-comment it again.
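For the option lines at least, the commenting out can be scripted. The filter below is a sketch under the assumption that each option sits on its own line beginning with the word ‘option’; the name comment_out_option is ours.

```shell
# Hypothetical filter: comment out one "option NAME" line, pass the rest through.
# Usage: comment_out_option QUOTA < conf/GENERIC > conf/custom
comment_out_option() {
    # The trailing group requires whitespace or end of line after the name,
    # so an option whose name merely starts with NAME is left alone.
    sed -E 's/^(option[[:space:]]+'"$1"')([[:space:]]|$)/#\1\2/'
}
```

Afterwards, something like diff -u conf/GENERIC conf/custom shows exactly which lines were touched, and deleting the leading # restores the option.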
Leaving /usr/src/sys/conf/custom and moving our attention to /usr/src/sys/arch/amd64/conf/custom, we find a few more option directives, but most of this file deals with device drivers. This obviously makes sense, as physical devices and peripherals are much more likely to be specific to an architecture than a kernel option is.
There are, however, a few useful options which are not at the top of the file but scattered throughout it. The options PCIVERBOSE, USBVERBOSE, PCMCIAVERBOSE, and ONEWIREVERBOSE, include tables of vendor and product names into the kernel which allow for more verbose identification of some attached devices. PCMCIAVERBOSE is already commented out by default in GENERIC, but removing the other three reduces the compiled size of the kernel by about 300k. This might plausibly be useful on small VMs, where you're not so likely to need to identify hardware devices anyway.
The directives near the top of the file which are neither options nor references to device drivers, are described in config(8). Reducing the value of maxusers might be useful on very memory constrained or embedded systems, however the most interesting line here is probably the config one:
config bsd swap generic
Configuring the root, swap and dump devices
This seemingly cryptic directive allows you to manually specify the root, swap, and dump devices that the kernel will use. The default setting is a sort of auto-selection based on various heuristics which work in the vast majority of cases. Nevertheless, if you create a complicated multi-device partitioning setup and the logic gets things wrong, here is where you can override it.
If you want a separate dump partition rather than having it share the swap space, this can also be configured. Additional swap devices can be configured here as well, but there isn't really any need to as they can also be configured at runtime using the swapctl utility.
At this point, we reach the device driver definitions. These generally take the following form:
[driver_name][digit or wildcard *] [tab] at [connection_point][digit or wildcard ?] [disable] [various other parameters and flags]
General format of the device driver definitions in the kernel config
The GENERIC file often appends a free-form comment at the end of the line, too, explaining what the driver is.
Let's look at a typical entry in more detail:
umass* at uhub?
USB mass storage device driver entry in the kernel config
This is the driver for USB mass storage devices, such as an external hard disk, or flash drive. Generally each driver will have a manual page in section 4 with the same name, so in this case we can look up umass(4) and find more information about it.
The * following the driver name indicates that any number of these devices can connect to the specified point. You've probably seen such devices attach as umass0, umass1, umass2, and so on. This is the configuration option which enables such behaviour. Most devices are configured like this, especially modern PCI and USB ones. Older devices such as those that connect to the ISA bus are often configured to connect to a very specific place. If you have a single floppy disk controller, it's usually at port 0x3f0 and will attach as fdc0. If you have a single floppy controller at port 0x370 instead, which is traditionally the ‘secondary’ floppy controller port, it won't attach at all unless you enable the commented out option for fdc1. It would then appear as fdc1, and you wouldn't have an fdc0. Any number of floppy drives themselves though, (up to the maximum supported by the controller, obviously, which is usually two), will attach to the various fdc devices in your system.
It's possible to include two or more lines for the same device, either with different numbers or one with a wildcard and the rest numbered. In the latter case, the devices matched by the wildcard will be numbered starting one above the highest explicitly numbered instance of the device, (even if it connects to a different connection point). For example, many machines will have a uhub device connected to usb0, which would usually be the root hub that is built into the usb controller. By default, other uhub devices, (physical external usb hubs), can connect to this.
uhub* at uhub?
Other uhub devices can connect to the root hub since the GENERIC configuration includes this line
Those devices will typically produce messages similar to the following on the console when they attach:
uhub0 at usb0 configuration 1 interface 0 # This is the root hub
uhub1 at uhub0 port 5 configuration 1 interface 0 # This is a physical USB hub
uhub2 at uhub0 port 21 configuration 1 interface 0 # This is another physical USB hub
Typical console messages from USB devices attaching. First the root hub, then two more external physical hubs.
We can change the way that the devices are enumerated by modifying the configuration file. Consider the following change:
-uhub* at uhub?
+uhub2 at uhub?
Proposed configuration change for demonstration purposes
Configured in this way, the devices will be enumerated as follows:
uhub3 at usb0 configuration 1 interface 0 # This is the root hub
uhub2 at uhub3 port 5 configuration 1 interface 0 # This is a physical USB hub
ugen0 at uhub3 port 21 configuration 1 interface 0 # This is another physical USB hub
Console messages from USB devices attaching, after our change to the configuration file.
The root hub has been assigned to uhub3, as uhub2 was reserved for a specific device connection, and the next highest available number was 3.
Note that the second USB hub is no longer enumerated by the uhub driver, as we limited that driver to a single instance connecting to another uhub device. Instead the second USB hub is left unconfigured and later picked up by the ugen driver, which is a generic ‘catch-all’ driver for USB devices that don't otherwise connect to a more specific driver. In this configuration, the second USB hub would not be usable as a hub, (although we could access it in a low-level way using the ugen driver interface).
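The numbering rule at work here, (wildcard instances start one above the highest explicitly numbered instance), is simple enough to model in a few lines of awk. To be clear, this is only a sketch of the rule as described in this article, not the logic that config itself uses, and first_wildcard_unit is a made-up name.

```shell
# Hypothetical helper: given a driver name, read a kernel config on stdin
# and print the unit number that the first wildcard match would receive.
first_wildcard_unit() {
    awk -v drv="$1" '
        BEGIN { max = -1 }
        /^#/ { next }                      # ignore commented-out lines
        {
            name = $1                      # e.g. "uhub2" or "uhub*"
            if (index(name, drv) == 1) {
                unit = substr(name, length(drv) + 1)
                # Only explicitly numbered instances raise the floor.
                if (unit ~ /^[0-9]+$/ && unit + 0 > max) max = unit + 0
            }
        }
        END { print max + 1 }'
}
```

Fed a configuration containing ‘uhub* at usb?’ and ‘uhub2 at uhub?’, it prints 3, matching the uhub3 that the root hub received in the example above.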
The connection point is, apart from the special case of mainbus0 at root, just another configured device. So an individual USB serial port, ucom0, might connect to a USB serial adaptor uftdi0, which itself has connected to a USB hub at uhub0, the hub being connected to usb0 which is a kind of abstraction layer from one of several possible controllers, such as xhci, ehci, ohci, and uhci. That device, for example xhci0, would likely connect to the pci bus pci0, which is connected to mainbus0, the root of the device tree structure.
Each point of connection except for mainbus0 is just a device in itself, which will be found elsewhere in the kernel configuration file.
ucom0 -> uftdi0 -> uhub0 -> usb0 -> xhci0 -> pci0 -> mainbus0
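You can trace such a chain for any device on a running system from the ‘X at Y’ lines in the dmesg output. The following is a minimal sketch, with attach_chain being a name invented for the purpose. It assumes the usual ‘child at parent’ layout at the start of each attachment line, and only uses the first attachment it sees for each device.

```shell
# Hypothetical helper: read dmesg-style text on stdin and print the chain of
# connection points for the named device, up to the root of the tree.
# Usage: dmesg | attach_chain ucom0
attach_chain() {
    awk -v dev="$1" '
        # A line such as "usb0 at xhci0: USB revision 3.0" records one link.
        $2 == "at" && !($1 in parent) {
            p = $3
            sub(/[:,]$/, "", p)            # drop trailing punctuation
            parent[$1] = p
        }
        END {
            chain = dev
            # Walk upwards until we reach something with no recorded parent.
            # The hop counter guards against a malformed, circular input.
            while ((dev in parent) && hops++ < 64) {
                dev = parent[dev]
                chain = chain " -> " dev
            }
            print chain
        }'
}
```

Each name printed is a device that you should be able to find in the kernel configuration file, which is handy when deciding which drivers a particular peripheral really depends on.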
Removing device drivers from the kernel will reduce its size considerably. The kernel on the workstation that I'm using to prepare this page is currently 7.7 MB, compared to the 20.6 MB of the GENERIC kernel. This is a saving of almost 13 MB, which admittedly doesn't sound like much on a machine that might have 32 GB or even 64 GB of physical RAM. However, some VMs are much smaller than this, and remember that kernel memory cannot be swapped out. A small VM with just 512 MB of physical RAM, might genuinely benefit from recovering an extra 13 MB, (2.5% of total memory).
The kernel will also usually boot faster, which is probably a bigger advantage than the space savings on a typical workstation but also a useful gain on an embedded system or low power SBC. Once again using my current machine as an example, from the OpenBSD boot prompt the time to boot the kernel and arrive at the login prompt is under 15 seconds.
Obviously a kernel build after a ‘make clean’ is also quicker when there are fewer files to compile.
As for faster kernel performance in general, unfortunately this isn't usually significantly improved when running directly on modern hardware. Whilst we might imagine that smaller kernel code would fit better in cache memory and allow for a greater ratio of cache hits to misses, and that the removal of certain features should result in less kernel code running overall, in reality both benchmarks and real world experience show almost zero performance increase for cpu-bound tasks by running a kernel with most device drivers removed.
The two main exceptions that I have observed are when running within a virtual machine under vmd, and to a lesser extent on some SBCs. In the case of the VM, I'm going to speculate that memory access patterns on a host running a moderate number of VMs are not particularly cache-friendly, as a lot of physical memory is being allocated, (the total for all of the VMs), and then accessed in a fairly random way, (because the activities and memory accesses of one VM are almost certainly not related to those of any other VM). In the case of the SBCs, limited cache sizes might play a role in making small optimisations more noticeable, but it's also a case of small percentage improvements simply being more significant on slower hardware.
Nope!
The digit or wildcard after the connection point allows us to restrict devices to attaching to a certain instance of the connection point and not others.
However, this only works for connection points which are devices that have been explicitly numbered in the configuration file.
So if we have more than one USB root hub connecting to the usb driver and being automatically enumerated with a wildcard, we couldn't do the following:
uhub* at usb?
uhub* at uhub1
This won't work
If we try to, then the make config step complains that ‘uhub1’ is orphaned, (not declared):
/usr/src/sys/arch/amd64/conf/custom.mp:226: uhub* at uhub1 is orphaned
(no uhub1 declared)
*** Stop.
*** Error 1 in /usr/src/sys/arch/amd64/compile/custom.mp (Makefile:1854 'config')
See? We said it wouldn't work!
A work-around might be to replace the wildcard for uhub with two numbered entries:
uhub0 at usb?
uhub1 at usb?
uhub* at uhub1
A possible work-around
Now we can refer to uhub1 as a connection point without causing a configuration-time error, but obviously if the kernel was booted on a machine with three or more root hubs that would otherwise have connected to the usb driver, then the third and subsequent ones would instead be picked up by ugen and be unusable, (except via the ugen interface). Note that we could still successfully boot the kernel on a machine with a single usb root hub, and in that case uhub1 would not exist, and any non-root hubs that would have connected to the root hub as, presumably, uhub0 when using GENERIC, would now also be picked up by ugen instead.
Including the ‘disable’ directive will allow the driver to be compiled into the kernel, but it will not be used. This is most useful for drivers which might conflict with others, or cause problems if the corresponding hardware is not present. In this case, compiling them in but leaving them marked as disabled allows for re-enabling them interactively at boot time via boot_config. Of course, if we're building custom kernels for individual machines then in most cases we might as well just remove the un-needed drivers completely. Disabling a driver could also be a useful option if we don't want to use it, but its presence in the kernel is necessary for successful compilation. This issue of dependencies will be discussed in more detail below.
Some device drivers accept compile-time flags to change their behaviour. For example, the ahci driver will avoid negotiation of some higher data transfer speeds if bit 0 of the supplied flags is set. Details of possible flag settings can usually be found in the driver's section 4 manual page.
Re-compiling the kernel
Important note
Different ways to compile the kernel
Several ways to compile the OpenBSD kernel exist, and each one may have subtle differences in the locations of the files created, as well as their permissions. Be especially careful if you mix information from this guide with steps from material published by sources other than Exotic Silicon, as you might create an installation that seems to work, but which will break in subtle ways in the future.
Diagnosing the cause of such problems will likely be very difficult and time consuming unless you have a lot of experience with OpenBSD systems.
With our newly created configuration files in hand, we can finally build a customised kernel.
Essentially, you can compile and re-install the kernel with the following steps:
Step 1
# cd /usr/src/sys/arch/amd64/conf
# config custom.mp
Populate the compilation directory
Step 2
# cd ../compile/custom.mp
# make clean
Clean any previous compilation attempts
Step 3
# make -j 8
Start the build using eight CPU cores
Step 4
# make install
Install the new kernel as /bsd
Then just reboot into the newly installed kernel:
# reboot
Reboot into the new kernel
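If you're iterating over many configurations, steps one to three can be wrapped up so that the build stops at the first failure. This is a sketch rather than a recommended procedure: build_kernel and the SYSDIR and ARCH variables are names of our own choosing, and it deliberately stops short of ‘make install’ so that installing remains a conscious decision.

```shell
# Hypothetical helper: configure, clean and build a named kernel configuration.
# Usage: build_kernel custom.mp 8
build_kernel() {
    name=$1 jobs=${2:-1}
    sys=${SYSDIR:-/usr/src/sys}
    arch=${ARCH:-amd64}
    (
        # Each step only runs if the one before it succeeded.
        cd "$sys/arch/$arch/conf" &&
        config "$name" &&
        cd "../compile/$name" &&
        make clean &&
        make -j "$jobs"
    )
}
```

The subshell keeps the directory changes from leaking into your interactive session, and the return value tells you whether it's worth going on to install.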
The config command populates the directory /usr/src/sys/arch/amd64/compile/custom.mp/ with various object files. In fact, most of the contents will be stored in /usr/obj, as a symbolic link is made in /usr/src/sys/arch/amd64/compile/custom.mp/obj to /usr/obj/sys/arch/amd64/compile/custom.mp.
Once the compilation directory is populated, we change to it and run ‘make clean’. Skipping the ‘make clean’ step should cause the main make process to only re-compile the parts that have changed, which can significantly reduce compile time. However, some changes to the kernel configuration do require this ‘make clean’ step, and although the config command usually advises when that is the case, if you are unsure or experience any problems during the build then running make clean is usually a good idea. Compile time for a kernel with a streamlined configuration, running on a moderately fast amd64 machine, might only be around 50 seconds or possibly less, even after running ‘make clean’. So in any case, doing so even when it wasn't strictly necessary won't usually waste a large amount of time.
On the other hand, seeing exactly which files are re-compiled after changing the kernel configuration might be useful if you're unfamiliar with the layout of the code and trying to find where a particular feature is implemented.
Handy hint!
Attention when tidying up old files
If you're tempted to try removing old files by using something like rm -r /usr/src/sys/arch/amd64/compile/CONFIG_NAME, remember that /usr/src/sys/arch/amd64/compile/CONFIG_NAME/obj is a symbolic link to /usr/obj/sys/arch/amd64/compile/CONFIG_NAME, so you'll probably want to remove that path manually too.
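A small function can take care of both locations at once. This is a sketch with a made-up name, remove_compile_dir, and since it calls rm -rf you'll want to read it carefully and try it on a scratch directory first.

```shell
# Hypothetical helper: remove a compile directory and whatever obj points at.
# Usage: remove_compile_dir /usr/src/sys/arch/amd64/compile/CONFIG_NAME
remove_compile_dir() {
    dir=$1
    [ -n "$dir" ] || return 1
    if [ -L "$dir/obj" ]; then
        # Resolve the symbolic link to the real directory before removing it.
        # If the link is dangling, cd fails and nothing extra is removed.
        target=$(cd "$dir/obj" 2>/dev/null && pwd -P) && rm -rf "$target"
    fi
    rm -rf "$dir"
}
```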
The actual kernel compile itself is invoked with a simple ‘make’, but usually you will want to specify the -j parameter to speed up the process by making use of multiple CPU cores. At the end of the compile, assuming there were no errors, the newly built kernel image will be in /usr/src/sys/arch/amd64/compile/custom.mp/obj/bsd. This image has been stripped of debugging symbols, but there will also be an un-stripped kernel image in bsd.gdb.
Installing the new kernel image with ‘make install’, renames the existing /bsd kernel image to /obsd, copies the new kernel image to /bsd, copies the object files to /usr/share/relink/kernel/custom.mp, and stores a sha256 checksum of the new kernel image to /var/db/kernel.SHA256. This checksum is used by the /usr/libexec/reorder_kernel script, which is run by /etc/rc on every boot to randomise the linking order of the object files. The checksum should ensure that the object files it's about to parse in /usr/share/relink/kernel actually come from the same kernel as /bsd. Whilst this would usually be the case, it might not be, for example if you rename an /obsd kernel binary to /bsd.
Handy hint!
Beware of unexpected changes during kernel re-linking
Note that reorder_kernel is hard-coded to check the checksum against the kernel in /bsd and install a new kernel image in /bsd, too, regardless of the kernel you actually booted. This has the unexpected effect of replacing /bsd with a different kernel if the object files corresponding to the currently booted kernel are deemed to be available.
For example, imagine we were to compile a kernel using a configuration file called ‘known_good’, install this with ‘make install’, and then rename it from /bsd to /bsd.works_ok. Next, we continue to compile another kernel using a configuration file called ‘custom.mp’, and install it with ‘make install’, placing it in /bsd. Now, every time that we boot /bsd, the reorder_kernel script will replace /bsd with a new kernel image using the object files in /usr/share/relink/kernel/custom.mp. However, the first time that we boot /bsd.works_ok, the reorder_kernel script will see a valid checksum for /bsd, but create a new /bsd kernel image using the object files in /usr/share/relink/kernel/known_good, the configuration that was used to compile /bsd.works_ok. If we then reboot into /bsd, we'll be running a kernel with the known_good configuration instead of the custom.mp configuration.
This seems like unintended and undesirable behaviour to me, but whether it's a bug or an intentional design feature, I have no idea.
Furthermore, the creation of the path to the object files in a subdirectory of /usr/share/relink/kernel/ is done simply by taking the value of kern.version from the currently booted kernel and stripping the version number at the end. The new /bsd binary will be created from the object files in this directory, if they exist, regardless of whether they actually match the booted kernel or not. This can cause problems if you compile a kernel using a known good configuration and store it for use as a backup, then change the corresponding configuration file afterwards, in this case /usr/src/sys/arch/amd64/conf/known_good, and compile and install a new kernel using it, all the time keeping the old /bsd.works_ok binary in place to boot if the new /bsd becomes unusable. If you then configure, compile, and install a different kernel, say custom.mp, which doesn't work as intended and you try to boot /bsd.works_ok, then the reorder_kernel script will proceed to create a new /bsd image based on the files in /usr/share/relink/kernel/known_good, which now do not match either the currently booted kernel or the kernel that was previously in /bsd.
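To see in advance which set of object files would be picked for a given kernel, the name-stripping can be imitated by hand. This sketch makes an assumption that goes beyond the text above: that the string being processed has the short form NAME#BUILD, (for example custom.mp#12), which is what sysctl -n kern.osversion reports. The function name relink_dir_for is ours.

```shell
# Hypothetical helper: map a "NAME#BUILD" string to its relink directory.
# On OpenBSD you might call: relink_dir_for "$(sysctl -n kern.osversion)"
relink_dir_for() {
    # Strip everything from the "#" onwards: "custom.mp#12" -> "custom.mp".
    printf '/usr/share/relink/kernel/%s\n' "${1%%#*}"
}
```

Note that the build number plays no part in the result, so two different builds from a configuration with the same name map to the same directory, which is precisely the situation described above.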
Of course, since /usr/libexec/relink_kernel is just a shell script, it's easily customised or even completely disabled if you don't want the kernel re-linked in a random order on each boot. However, this is a security feature, and for normal production machines, especially servers, the benefit of such re-linking is obvious.
You might be wondering why ‘make install’ does the copying and renaming of /bsd to /obsd in several steps, rather than a simple move operation...
Why not?
mv /bsd /obsd
Surely this would be easier...
Instead of...
[[ ! -f /bsd ]] || cmp -s bsd /bsd || ln -f /bsd /obsd
Than this?
Because...
The initial tests skip the renaming altogether if there is no existing /bsd, or if the newly compiled kernel is already the same as /bsd. So if you run ‘make install’ several times in a row, /bsd will be overwritten with exactly the same file, but /obsd will be left untouched. This is usually the desired behaviour, as it doesn't make much sense to overwrite a previous and possibly known-good kernel image with an identical copy of a newly compiled one which might be broken.
Next we replace /obsd with a hard link to the current /bsd, which is about to be replaced. At this point, both /bsd and /obsd point to the same file, and if the system crashed it would still happily boot into the existing kernel image. This would not be the case if we'd simply renamed /bsd to /obsd, as there would be no /bsd for the bootloader to find, and it would require manual intervention at the console to bring the system back up again.
The final install command replaces /bsd with a copy of /usr/src/sys/arch/amd64/compile/custom.mp/obj/bsd, by writing it to a temporary file and then renaming it. This process effectively ensures that there is always a readable kernel image in /bsd, so in theory if the system crashed at any point, it should be able to reboot without manual intervention. The source code for /usr/bin/install is in /usr/src/usr.bin/xinstall, and it's quite well commented as well as easy to follow.
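The whole sequence can be tried out safely in a scratch directory. The following is only a sketch of the three steps described above, not the actual Makefile rules. The file names bsd.new and bsd.tmp are hypothetical, and plain cp and mv stand in for the install command.

```shell
# Scratch directory standing in for the root filesystem.
root=$(mktemp -d)
printf 'old kernel\n' > "$root/bsd"       # currently installed image
printf 'new kernel\n' > "$root/bsd.new"   # freshly compiled image

# 1. Unless the two images are identical, give the old one a second name.
[ ! -f "$root/bsd" ] || cmp -s "$root/bsd.new" "$root/bsd" ||
	ln -f "$root/bsd" "$root/obsd"

# 2. Write the new image under a temporary name, then rename it into
#    place, so that the name bsd always refers to a complete file.
cp "$root/bsd.new" "$root/bsd.tmp"
mv "$root/bsd.tmp" "$root/bsd"

# 3. Repeating step 1 now does nothing, because the images match.
[ ! -f "$root/bsd" ] || cmp -s "$root/bsd.new" "$root/bsd" ||
	ln -f "$root/bsd" "$root/obsd"

cat "$root/obsd"   # prints "old kernel"
cat "$root/bsd"    # prints "new kernel"
```

The hard link from step 1 keeps the old image reachable as obsd even after the rename in step 2 has pointed the name bsd at the new file.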
It's strongly recommended to keep a known good kernel image available in case you create a custom kernel which either doesn't boot or which boots but misbehaves. Although the existing /bsd kernel is preserved as /obsd after each successful re-compilation followed by a ‘make install’, imagine the situation if you have a non-functional kernel in /bsd, then boot into /obsd, and compile and install another kernel which is also broken. Then you will have non-functioning kernels in both /bsd and /obsd.
In this case, if you haven't manually copied a known good kernel somewhere else, such as /bsd.known_good, and you're using a multiprocessor machine, then you can probably rely on the fact that the installer will have copied the single processor kernel to /bsd.sp, and boot into this in order to fix your system. However, it's quicker and easier once you have a known good kernel configuration that you've used and tested for a period of time without problems, to copy this to a memorable location such as the /bsd.known_good that we just suggested. Alternatively just keep the GENERIC kernel as something like /bsd.generic.mp.
Common problems at compile time
If we start randomly removing device drivers from the kernel configuration, it soon becomes obvious that there are various inter-dependencies and that code for devices that are still compiled in can fail to compile because another seemingly unrelated item was removed.
For this reason, it's often a good idea to make just a few changes at a time and test them, rather than commenting out 75% of the GENERIC configuration in one go only to find that it no longer even compiles. With experience, it becomes easier to predict such inter-dependencies, but things can change from one release to the next.
An example of such non-obvious interdependencies would be DDB. For a long time it was possible to simply remove DDB from the kernel config, and reduce the size of the kernel by around 200 kilobytes.
At some point, though, changes to the code broke this, and trying to compile such a kernel on OpenBSD 7.0-release will fail with the following error:
In file included from /usr/src/sys/dev/pci/amas.c:27:
In file included from /usr/src/sys/dev/pci/amas.h:57:
In file included from /usr/src/sys/dev/pci/pcivar.h:77:
/usr/src/sys/arch/amd64/compile/test.mp/obj/machine/pci_machdep.h:91:19: error: declaration of 'struct cpu_info' will not be visible outside of this function [-Werror,-Wvisibility]
int, struct cpu_info *,
^
Typical error message at compile time, due to breakage following a change to the kernel configuration
The build stopped during compilation of the amas driver. Reading the manual page for amas, there isn't any obvious connection with DDB, and if you're not a C programmer then the error probably won't mean much to you either. In that case, your only options are either to re-enable DDB, or to remove the amas driver as well. If we look at the original GENERIC config, we can see that although amas is compiled in by default, it's also disabled. Since this means that no devices are going to attach to it (unless you're changing the kernel configuration at boot time), removing it from the kernel shouldn't cause any loss of functionality.
Of course, the real way to fix the problem is to work out why amas.c is failing to compile. If you are a C programmer, then you can probably guess that it's due to a missing include somewhere, which is mitigated by having DDB compiled in, as by chance it also includes the same header file.
The cpu_info structure is architecture-specific. On amd64, it's defined in /usr/src/sys/arch/amd64/include/cpu.h, but since the location varies with architecture, the correct way to include it in the kernel sources is using the ‘machine’ path:
#include <machine/cpu.h>
Including the missing header file
If we add this line to /usr/src/sys/dev/pci/amas.c before the other includes, then the compilation of amas.c will succeed and the kernel build will continue.
Fun fact!
Finding bugs through casual experimentation
The reference to struct cpu_info in /usr/src/sys/arch/amd64/include/pci_machdep.h was added in revision 1.29, which is probably when this particular bug was introduced.
This is a good example of how we can find real bugs in the OpenBSD source code, just by trying to compile a custom kernel configuration. It's not even necessary to run a custom kernel (which is when the concerns of instability and opening new vulnerabilities could arise). Simply compiling a non-standard configuration can help to uncover oversights such as missing includes, yet presumably very few people are doing this, as the particular example cited above has been in the source tree since June of 2020.
Having fixed the problem with the compilation of the amas driver, we can re-start the kernel build process. However, our success will be short-lived, as the build now fails at the linking stage:
ld -T ld.script -X --warn-common -nopie -o bsd ${SYSTEM_HEAD} vers.o ${OBJS}
ld: error: undefined symbol: setjmp
>>> referenced by x86emu.c:231 (/usr/src/sys/dev/x86emu/x86emu.c:231)
>>> x86emu.o:(x86emu_exec)
ld: error: undefined symbol: longjmp
>>> referenced by x86emu.c:287 (/usr/src/sys/dev/x86emu/x86emu.c:287)
>>> x86emu.o:(x86emu_halt_sys)
ld: error: undefined symbol: stacktrace_save_at
>>> referenced by dt_dev.c:0 (/usr/src/sys/dev/dt/dt_dev.c:0)
>>> dt_dev.o:(dt_pcb_ring_get)
*** Error 1 in /usr/src/sys/arch/amd64/compile/test.mp (Makefile:823 'bsd': @echo ld -T ld.script -X --warn-common -nopie -o bsd '${SYSTEM_H...)
Kernel build failing at the linking stage
Here we are seeing one error related to the dt pseudo-device driver, and two errors related to option X86EMU. In each case the linker can't find a symbol that the remaining code still references. This time, we'll just comment out those two directives from the configuration. Before we re-start the build process, we need to run ‘make clean’, as we've changed one of the option lines.
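If you'd rather script the change than edit the file by hand, something along these lines would do it. This is just a sketch working on a scratch file. The four configuration lines are sample content written for the demonstration, not a copy of any real configuration file, and the exact spacing in your own file may differ.

```shell
# Scratch file with a few sample configuration lines.
conf=$(mktemp)
cat > "$conf" <<'EOF'
option		X86EMU
option		FFS
pseudo-device	dt
pseudo-device	pf
EOF

# Prefix the two unwanted directives with '#', leaving the rest alone.
sed -E \
	-e '/^option[[:space:]]+X86EMU([[:space:]]|$)/s/^/#/' \
	-e '/^pseudo-device[[:space:]]+dt([[:space:]]|$)/s/^/#/' \
	"$conf" > "$conf.new"

cat "$conf.new"
```

Writing the result to a second file makes it easy to compare the two with diff before replacing the original.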
With that change, our compilation is now successful. Of course, just because the code compiles doesn't mean that the kernel will boot. Even if it does boot, it might not be stable in operation, so further testing is certainly warranted.
And with that, we wrap up week two's installment of this series!
Summary
This week we looked at how to customise the kernel configuration files, and compile a custom kernel. Along the way, we also looked at how the kernel re-linking and object file order randomisation script relink_kernel works. Even if we never actually use a custom kernel on a production machine, our efforts at building a custom kernel certainly paid off as we found and fixed a bug in the amas driver.
Next week we'll be building on this experience, and taking things one step further by reading through and modifying parts of the kernel code itself. Specifically, we'll be looking at some of the code that handles the framebuffer. So get both your technical and artistic hats on in time for part three!
IN NEXT WEEK'S INSTALLMENT, WE GO BEYOND CONFIGURATION CHANGES AND START MODIFYING THE KERNEL CODE ITSELF. SEE YOU THERE!