Georg Lukas, 2024-12-16 18:06

Exactly one year ago, after updating a bunch of Debian packages, my laptop stopped booting Linux. Instead, it briefly showed the GRUB banner, then rebooted into the BIOS setup. On every startup. Reproducibly. Last Friday 13th, I was bitten by this bug again, on a machine running Kali Linux, and had to spend an extra hour at work to fix it.

TL;DR: the GRUB config got extended with a call to fwsetup --is-supported. Older GRUB binaries don't know the parameter and will just reboot into the BIOS setup instead. Oops!

Screenshots of GRUB2 and BIOS setup overlaid with a red double-arrow

The analysis

Of course, I didn't know the root cause yet, and it took me two hours to isolate the problem and some more time to identify the root cause. This post documents the steps of the systematic analyis approach f*cking around and finding out phase, in the hope that it might help future you and me.

Booting my Debian via UEFI or from the SSD's "legacy" boot sector reproducibly crashed into BIOS setup. Upgrading the BIOS didn't improve the situation.

Starting the Debian 12 recovery worked, however. Manually typing the linux /boot/vmlinux-something root=UUID=long-hex-number and initrd /boot/initrd-same-something and boot commands from the Debian 12 GRUB also brought me back into "my" Linux.

Running update-grub and grub-install from there, in order to fix my GRUB, had no positive effect.

The installed GRUB wasn't displaying anything, so I used the recovery to disable gfx mode in GRUB. It still crashed, but there was a brief flash of some text output. Reading it required a camera, as it disappeared after half a second:

   bli.mod not found

A relevant error or a red herring? Googling it didn't yield anything back in 2023, but it was indeed another symptom of the same issue.

Another, probably much more significant finding was that merely loading my installation's grub.cfg from the Debian 12 installer's GRUB also crashed into the BIOS. So there was something wrong with the GRUB config after all.

Countless config changes and reboots later, the problem was bisected to the rather new "UEFI Firmware Settings" menu item. In retrospect, it's quite obvious that the enter setup menu will enter setup, except that... I wasn't selecting it.

But the config file ran fwsetup --is-supported in order to check whether to even display the new menu item. Quite sensible, isn't it?

Manually running fwsetup --is-supported from my installed GRUB or from the Debian installer... crashed into the BIOS setup! The obvious conclusion was that the new feature somehow had a bug or triggered a bug in the laptop's UEFI firmware.

But given that I was pretty late to the GRUB update, and I was running on a quite common Lenovo device, there should have been hundreds of users complaining about their Debian falling apart. And there were none. So it was something unique to my setup after all?

The code change

The "UEFI Firmware Settings" menu used to be unconditional on EFI systems. But then, somebody complained, and a small pre-check was added to grub_cmd_fwsetup() in the efifwsetup module in 2022:

if (argc >= 1 && grub_strcmp(args[0], "--is-supported") == 0)
    return !efifwsetup_is_supported ();

If the argument is passed, the module will check for support and return 0 or 1. If it's not passed, the code will fall through to resetting the system into BIOS setup.

No further argument checks exist in the module.

Before this addition, there were no checks for module arguments. None at all. Calling the pervious version of the module with --is-supported wouldn't check for support. It wouldn't abort with an unsupported argument error. It would do what the fwsetup call would do without arguments. It would reboot into the BIOS setup. This is where I opened Debian bug #1058818, deleted the whole /etc/grub.d/30_uefi-firmware file and moved on.

The root cause

The Debian 12 installer quite obviously had the old version of the module. My laptop, for some weird (specific to me) reason, also had the old module.

The relevant file, /boot/grub/x86_64-efi/efifwsetup.mod is not part of any Debian package, but there exists another copy that's normally distributed as part of the grub-efi-amd64-bin package, and gets installed to /boot/grub/ by grub-install:

   grub-efi-amd64-bin: /usr/lib/grub/x86_64-efi/efifwsetup.mod

My laptop had the file, but didn't have this package installed. This was caused by installing Debian, then restoring a full backup from the old laptop, which didn't use EFI yet, over the root filesystem.

The old system had the grub-pc package which satisfies the dependencies but only had the files to install GRUB into the [MBR] (https://en.wikipedia.org/wiki/Master_boot_record).

grub-install correctly identified the system as EFI, and copied the stale(!) modules from /usr/lib/grub/x86_64-efi/ to /boot/grub/. This had been working for two years, until Debian integrated the breaking change into the config and into the not installed grub-efi-amd64-bin package, and I upgraded GRUB2 from 2.04-1 to 2.12~rc1-12.

Simply installing grub-efi-amd64-bin properly resolved the issue for me, until one year later.

The Kali machine

Last Friday (Friday the 13th), I was preparing a headless pentest box for a weekend run on a slow network, and it refused to boot up. After attaching a HDMI-to-USB grabber I was greeted with this unwelcoming screen:

Screenshots of GRUB2 shell from Kali

Manually loading the grub.cfg restarted the box into UEFI setup. Now this is something I know from last year! Let's kickstart recovery and check the GRUB2 install:

┌──(root㉿pentest-mobil)-[~]
└─# dpkg -l | grep grub
ii  grub-common               2.12-5+kali1   amd64   GRand Unified Bootloader (common files)
ii  grub-efi                  2.12-5+kali1   amd64   GRand Unified Bootloader, version 2 (dummy package)
ii  grub-efi-amd64            2.12-5+kali1   amd64   GRand Unified Bootloader, version 2 (EFI-AMD64 version)
ii  grub-efi-amd64-bin        2.12-5+kali1   amd64   GRand Unified Bootloader, version 2 (EFI-AMD64 modules)
ii  grub-efi-amd64-unsigned   2.12-5+kali1   amd64   GRand Unified Bootloader, version 2 (EFI-AMD64 images)
ii  grub2-common              2.12-5+kali1   amd64   GRand Unified Bootloader (common files for version 2)

┌──(root㉿pentest-mobil)-[~]
└─# grub-install
Installing for x86_64-efi platform.
Installation finished. No error reported.

┌──(root㉿pentest-mobil)-[~]
└─# 

That looks like it should be working. Why isn't it?

┌──(root㉿pentest-mobil)-[~]
└─# ls -al /boot/efi/EFI 
total 16
drwx------ 4 root root 4096 Dec 13 17:11 .
drwx------ 3 root root 4096 Jan  1  1970 ..
drwx------ 2 root root 4096 Sep 12  2023 debian
drwx------ 2 root root 4096 Nov  4 12:53 kali

Oh no! This also used to be a Debian box before, but the rootfs got properly formatted when moving to Kali. The whole rootfs? Yes! But the EFI files are on a separate partition!

Apparently, the UEFI firmware is still starting the grubx64.efi file from Debian, which comes with a grub.cfg that will bootstrap the config from /boot/ and that... will run fwsetup --is-supported. BOOM!

Renaming the debian folder into something that comes after kali in the alphabet finally allowed me to call it a day.

The conclusion

When adding a feature that is spread over multiple places, it is very important to consider the potential side-effects. Not only of what the new feature adds, but also what a partial change can cause. This is especially true for complex software like GRUB2, that comes with different targeted installation pathways and is spread over a bunch of packages.


Comments on HN

Comments on Mastodon