Linux went away but then I got it to come back
I have been running this blog for two months. My original goal was to set up a Linux OS on my new laptop that I could use for day-to-day web browsing and development. This has not happened. My primary computer remains an obsolete MacBook. My new laptop is plugged directly into the TV and I run Windows on it for games and watching movies. It’s actually great.
But, I’m going to a conference tomorrow and I do need to get Linux working.
On my last post here, I had a successful Linux install, albeit lacking Lenovo drivers. The post ended with me happily booting back into Windows. What I didn’t mention here is that about a week after this I booted back into Linux, and it just… didn't. The GRUB bootloader no longer appeared at startup. The Lenovo BIOS-alike menu showed a list of bootable partitions, and one said "ubuntu", but when I selected that partition it just booted Windows.
The first suggestion I am given, when I ask around about what might be happening here, is to debug with a Windows program called "Visual BCD".
I dig around a little more and turn up these resources:
http://www.av8n.com/computer/htm/grub-reinstall.htm
https://help.ubuntu.com/community/RecoveringUbuntuAfterInstallingWindows
The second one-- the one promisingly named "RecoveringUbuntuAfterInstallingWindows"-- explains there's a nice, easy, GUI way to fix things from an Ubuntu LiveCD. However-- bafflingly given that this is semi-official advice from the Ubuntu website-- it requires you to set up a PPA (non-Linux users, that's a third party package repository) and install something over the internet. The Kubuntu LiveCD doesn't have the Lenovo wifi drivers, so that's out.
That leaves the terminal route, where the two guides actually just say to do the same thing. According to them, all I need to do is type `sudo grub-install /dev/sda`.
The worst imaginable thing happens.
My exact problem turns out to be on Stack Overflow, which is great, because there's nothing I love more than blindly copypasting commands containing "sudo" from Stack Overflow. (Non-Linux users, that was sarcasm.) The answer requires me to know the exact "/dev/sda#" names of my partitions, at which point I feel really glad I set up this blog, because it means I have screenshots of my disk setup during installation.
My screenshots tell me that my Ubuntu is on /dev/sda6, which allows me to proceed with Stack Overflow's approach. I mount /dev/sda6 to /mnt and…
Let's take a moment to reflect that this is the *most basic thing* you could be doing with Linux-- I am just trying to make Linux *boot*-- and the person who wrote this piece of "grub-install" software, this basic universal thing, decided to write error messages that could only possibly be understood if you read the program's source code.
On my first pass, research tells me that the problem is that I don't have a "/boot partition". Okay? Do I need a "/boot partition"? Nobody told me I needed a "/boot partition". I didn't even know such a thing as a "/boot partition" existed until today. Multiple guides suggest I create a new 2MB partition to hold the "/boot", which makes tons of sense, except if that's how it works then why didn't the installer do it? And why was whatever-the-installer-did *working* until suddenly it didn't? At about this point I really want to stop and play some video games, but I can't because my gaming PC is in an Ubuntu LiveCD, admonishing me not to use "blocklists".
Somewhere around here I go to Twitter for help and what multiple people tell me is that pretty much all available tutorials, documentation, forum threads and wiki pages within the Linux community refer to pre-UEFI computers and will be incorrect for my case.
A bunch of time on Google and several dead ends later, I have learned a series of things:
In the world of UEFI, every computer has an "EFI partition". This is a little FAT32 partition on the hard drive. Each OS creates a directory on this partition and installs its bootloader into it, in the form of .efi files, which are apparently just executables.
Before UEFI, when you installed Grub, you generally created a tiny partition and installed Grub onto it. Many walkthroughs and tools assume it still works this way.
But when Grub runs on a UEFI system, it installs itself into EFI/YourOSName on the EFI partition. It installs an executable named shimx64.efi, an executable named grubx64.efi, and a file named grub.cfg with information about how to find the OSes to boot. The grubx64.efi executable boots up, loads grub.cfg, and displays the boot choices menu. Meanwhile, shimx64.efi does nothing but execute grubx64.efi; it's there because shimx64.efi is signed with Microsoft's private key, which means Secure Boot will execute it, whereas grubx64.efi is not signed that way.
Normally, the way to install Grub is to run a program called grub-install. This program *completely assumes* you are trying to set up grub to boot the same copy of Linux you are running at that moment. This means it does not work right if you have booted from a USB stick, which is why it was freaking out before.
The way to run grub-install off of a USB stick is to mount the drive you wish to set up grub with, then chroot into the mounted drive directory. However, this is not enough by itself-- you also have to do some weird tricks to make the chrooted directory *appear* as if it were the OS you actually booted from, so grub-install will work. You have to have the devices mounted in the normal places, you have to have the EFI partition mounted at /boot/efi, and so on.
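To make the EFI partition layout described above a bit more concrete, here is a minimal sketch of a shell function that lists the .efi files each OS has installed. It is not something from the original troubleshooting session; the function name `list_efi_loaders` and the default mount point of /boot/efi are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: list the .efi files in each OS directory under EFI/ on a
# mounted EFI partition. The function name is hypothetical.
# $1 = where the EFI partition is mounted (assumed default: /boot/efi).
list_efi_loaders() {
    esp_root="${1:-/boot/efi}"
    # Each OS gets its own directory under EFI/ (e.g. ubuntu, Microsoft).
    for dir in "$esp_root"/EFI/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # Only lowercase ".efi" names are matched by this glob.
        for f in "$dir"*.efi; do
            [ -e "$f" ] || continue
            printf '%s: %s\n' "$name" "$(basename "$f")"
        done
    done
}
```

From a live USB, you would first mount the EFI partition somewhere and then call something like `list_efi_loaders /mnt/efi`, where /mnt/efi is whatever mount point you picked.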
It turns out a program called EasyBCD exists which is more human-comprehensible than "VisualBCD", and it lets me get the list of items on the EFI partition which the Lenovo BIOS-alike is using to construct its list of boot options.
What this screenshot tells me is that at *some point*, *something* installed the Grub shimx64.efi, and that the "ubuntu" option in the Lenovo boot menu is pointing at it. I decide that before I make any more attempts to install a new Grub, I should understand what happened to the old one. Is it still there?
I boot back to the Ubuntu USB stick. Because the EFI partition is just FAT32, I can mount it like a normal drive. I check, and the “ubuntu” folder is there!
There's also a "Microsoft" directory there, and it contains… well, exactly what I'd expect to see if I looked in the install directory for Microsoft's bootloader:
You know this isn't Linux anymore because whoever created this knows what "locales" are.
So: I have an EFI partition containing a normal-looking "Microsoft" directory, and an "ubuntu" directory containing alien garbage. After a few minutes and an unmount/remount, I can't even see the alien garbage anymore; ls just says "input/output error" when I attempt to look at the "ubuntu" directory. A friend suggests dmesg might have more information about what "input/output error" means; it does:
My bootloader partition is corrupted. D:
Well… can I fix it? It's just a FAT32 partition, after all. It turns out there's a fsck-for-FAT32 utility in Linux called "dosfsck":
Notice what decade this utility appears to have been written in: "toggle Atari filesystem format". Also, "Drop that file", which sounds like the party hit of the summer.
I run dosfsck three times with the "automatically repair errors" option; each time it finds a different error:
On the fourth try, there was silence:
And I awoke to a changed world:
I now have a working, repaired EFI partition, with a working Windows 8 bootloader (I reboot into Windows at about this point, just to make sure that's true). My corrupted "ubuntu" directory is gone forever. However, now that I have a usable EFI partition, and now that I actually understand what's going on, I can just reinstall Grub.
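Rerunning the repair by hand until it stops finding problems could also be scripted. The following is a rough sketch, not what was actually run here: the helper name `repair_until_clean` is made up, and it assumes the repair command exits with status 0 once it finds nothing left to fix (which is how the dosfsck man page describes its exit status, but check your version).

```shell
#!/bin/sh
# Sketch: re-run a repair command until it exits 0 or we give up.
# Usage: repair_until_clean MAX_ATTEMPTS COMMAND [ARGS...]
# Assumption: a nonzero exit status means "errors were found".
repair_until_clean() {
    max_attempts="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            echo "clean after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "still reporting errors after $max_attempts attempt(s)" >&2
    return 1
}
```

A call might look like `repair_until_clean 5 sudo dosfsck -a /dev/sdXN`, with the EFI partition unmounted and /dev/sdXN replaced by its real device name. The attempt cap is there so a partition that never comes up clean doesn't loop forever.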
I find some instructions on the main Ubuntu site for doing the weird chroot procedure; they're pre-EFI instructions, but I'm able to adapt them to what I need. My adapted procedure is:
Mount the Linux partition to /mnt/drive
for i in /dev /dev/pts /proc /sys /run; do sudo mount -B "$i" "/mnt/drive$i"; done
Mount the EFI partition to /mnt/drive/boot/efi
And then I can chroot into /mnt/drive and run grub-install. I take care to make it explicit I want the EFI version and I know where the efi directory is, since the errors the first time all seemed to indicate grub-install was failing to figure that stuff out on its own:
I then run update-grub to create the grub.cfg file, and I'm done.
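Put together, the whole procedure looks roughly like the sketch below. It only prints the commands rather than running them, so it is safe to try. The function name is hypothetical, the EFI partition device is something you have to supply yourself, and the grub-install line is a plausible reconstruction using the `--target` and `--efi-directory` options, not necessarily the exact command used above.

```shell
#!/bin/sh
# Sketch: print (not run) the commands for reinstalling Grub from a
# live USB via chroot. Function name and arguments are hypothetical.
#   $1 = Linux root partition (in this post, /dev/sda6)
#   $2 = EFI partition (assumption: you look this up yourself)
print_grub_reinstall_plan() {
    root_part="$1"
    efi_part="$2"
    target="/mnt/drive"
    echo "sudo mkdir -p $target"
    echo "sudo mount $root_part $target"
    # Bind-mount the live system's device and kernel directories so the
    # chroot looks like a booted system to grub-install.
    for i in /dev /dev/pts /proc /sys /run; do
        echo "sudo mount -B $i $target$i"
    done
    echo "sudo mount $efi_part $target/boot/efi"
    # Assumed flags: ask for the EFI build and say where the EFI
    # partition is mounted inside the chroot.
    echo "sudo chroot $target grub-install --target=x86_64-efi --efi-directory=/boot/efi"
    echo "sudo chroot $target update-grub"
}
```

Running `print_grub_reinstall_plan /dev/sda6 /dev/sdXN` (with /dev/sdXN standing in for the EFI partition) prints the ten commands in order, which you can then read over before typing any of them for real.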
Let us take a moment to reflect on how completely garbage it is that grub-install-- the tool you *use to install the software you use to boot*-- is designed such that it requires you to boot into the OS you're trying to boot before you install it.
I restart, and I'm in grub!
…except for some reason I don't have a menu.
At this point I am *totally lost*. I have never used grub, and apparently the grub console's normal mode of interaction is that if you type something incorrect it fails silently and prints no error message. This is what happens when you type "help" at the grub console:
Someone on IRC tells me to run:
configfile '(hd1,gpt6)'/boot/grub/grub.cfg
I have no idea what this means. It works, but then when I do a test restart, I'm back at the grub command line. IRC person tells me that after I'm booted into Linux, I need to run this to build the menu:
sudo grub-install --recheck /dev/sdX; sudo update-grub
I have no idea what this means. It looks exactly like what I typed before when I was running grub-install, except for the "--recheck". That was something I had made a specific decision *not* to include, since the help said that --recheck means "delete device map if it already exists", which sounds bad. Whatever, I run it. I reboot. I have a menu! I can boot into Linux! I boot a couple more times. It doesn't magically go away! Everything seems to work! I try booting back into Windows.
I now have a working Linux, but my Windows can't boot. I try to boot into my Windows recovery partition, but I get this same screen. I dig out the USB key with my Windows install ISO on it, and I run its "Startup Repair" feature:
…but it doesn't work. A friend gives me a thing to type into the install ISO's command line prompt that's supposed to fix the BCD error:
…*this* claims that it can't find a Windows partition to boot to. (At this exact moment, typing "dir C:" reveals that it is, in fact, my Windows partition.)
I dig around on Google. I find bizarre but repeatedly corroborated claims that the /rebuildbcd command in the Windows 8 installer only works if you've booted off of a DVD, and does not work when booting off a USB key. I find an *INCREDIBLY* detailed article about BCD files on support.microsoft.com which unfortunately specifically refers to Windows 7.
Then I find this, on some StackExchange, and it actually works:
http://superuser.com/a/504360
The advice here is to leave nothing to Microsoft at all, and manually mount the EFI drive:
Then manually remove the old “BCD” file, and manually create a new one, specifically telling it where to find Windows:
This sounds terrifying, but it actually works! *Everything* works! When I reboot, I find it boots into Grub by default, I can pick Ubuntu from the Grub menu and it works, I can pick Windows from the Grub menu and it works, I boot back and forth between OSes several times and nothing magically breaks. I have a working computer!
That took five hours. Plus another hour to write this summary.