fix(installer): harden disk selection and partitioning phase

The disk phase was the dominant source of incomplete installs. Six
concrete failure modes addressed in one pass:

1. Live-ISO USB excluded from the disk picker. select_disk previously
   filtered loop|ram|zram|sr but not the device the installer booted
   from; picking it would format the boot media mid-install. New
   detect_live_iso_devices walks /, /iso, /run/initramfs/live,
   /nix/.ro-store, /nix/store and resolves each backing device to its
   parent disk via lsblk -no PKNAME. Override with
   NOMARCHY_INSTALL_ALLOW_ISO_TARGET=1 for the developer case.

2. 10 GiB minimum-capacity preflight. Disko fails late and obscurely
   on undersized media; surface it while the picker is still open.

3. prewipe_target_drive rewritten:
   - Enumerates every active dm-crypt mapping via dmsetup ls and
     closes those whose backing device is on the target drive. The
     old version only knew about the hardcoded names "crypted" /
     "crypted_main" so an aborted multi-disk run or a non-Nomarchy
     install would leave a holder open and silently break the wipe.
   - Drops `|| true` from wipefs / sgdisk / dd. After the LUKS and
     swap teardown above, a real failure means something is still
     holding the device — surface that instead of papering over it.
   - udevadm settle bounded to 30s so a flapping USB can't hang.
   - Post-wipe sanity check: refuse to hand the disk to disko if
     anything is still mounted off it.

4. run_disko_with_retry wraps the disko call. On failure, shows the
   last 30 lines of output via gum style and offers Retry /
   View full log / Abort. set -e is suspended for the disko call so
   the exit code can be inspected. The previous bare `disko --mode
   disko` aborted the whole installer with output scrolled past.

5. Sed-templated disko-golden.nix + disko-btrfs-multi.nix pair
   replaced by a single disko-config.nix Nix function of
   { mainDrive, extraDrives ? [] } called via --argstr / --arg.
   Templating Nix via shell-escaped string substitution caused at
   least one production bug (3aadc36 fixed embedded-newline
   escaping); function arguments are the right shape and eliminate
   the entire class of escaping concerns. Single-disk path is
   `extraDrives = []`; multi-disk gets BTRFS `-d single -m raid1`
   plus the additional /dev/mapper/* devices. Hosts that shipped
   /etc/disko-golden.nix now ship /etc/disko-config.nix.

6. EXIT trap added so the tmpfs LUKS key file (/dev/shm/nomarchy-
   luks.key) is removed even if the script aborts between key-write
   and the explicit unset. Replaced redundant `shred -u` on tmpfs
   with `rm -f` (already in RAM).

Verification: bash -n on install.sh, nix-instantiate parse + strict
eval on disko-config.nix in both single and multi shapes, full
nix flake check --no-build evaluating all three NixOS configurations
(default, nomarchy-installer, nomarchy-live) plus the installerVm.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Bernardo Magri
2026-04-30 19:42:00 +01:00
parent 386da51178
commit f318585dc4
9 changed files with 317 additions and 254 deletions

View File

@@ -96,8 +96,8 @@
# Symlink for easy access (merged into systemPackages above)
# The nomarchy-install script is created by writeShellScriptBin in the main list
# Include disko configurations
environment.etc."disko-golden.nix".source = ../installer/disko-golden.nix;
# Include disko configuration
environment.etc."disko-config.nix".source = ../installer/disko-config.nix;
# Include Nomarchy source for installation
environment.etc."nomarchy".source = inputs.self;

View File

@@ -58,8 +58,8 @@
mode = "0644";
};
environment.etc."disko-golden.nix" = {
source = ../installer/disko-golden.nix;
environment.etc."disko-config.nix" = {
source = ../installer/disko-config.nix;
};
environment.etc."nomarchy".source = inputs.self;