fix(installer): pre-flight resume polish (4 gaps)

Four resume-flow papercuts in `installer/install.sh` that hurt the
"interrupted install" path the most.

1. `--resume` with no state file is no longer silent.
   The most common operator confusion: reboot the live ISO, forget
   /tmp/ is tmpfs, re-run with --resume, watch the installer start
   over from scratch without saying anything. Now: loud error, tmpfs
   explanation, exit 1.

2. Validate the saved TARGET_DRIVE still exists on resume.
   Live ISO USB sticks get unplugged between sessions, dev hosts
   sometimes have non-deterministic /dev/sdX numbering. Without the
   guard the install proceeds and fails with cryptic disko / mount
   errors deep in execute_installation. Now we fail at load_state
   with the actual reason and a clean recovery path.

3. Resume now shows what's being resumed.
   `save_state` stamps an ISO-8601 timestamp; `load_state` prints
   "Resumed from <path> (saved Xm ago)" plus a "Target: /dev/X → user
   @ host" summary line. Lets the user Ctrl-C before any destructive
   prompt fires if they're resuming onto the wrong machine.

4. `--help` documents the tmpfs limitation.
   Saved state lives in /tmp/ which is tmpfs on the live ISO; --resume
   only works within the same boot. The man-page now says so instead
   of letting users discover it the hard way.

`format_age` is the one new helper — pretty-prints "Xs/Xm/Xh Ym/Xd"
relative to now, falls back to the raw timestamp if `date -d` can't
parse the input. shellcheck --severity=error passes.

Out of scope (potential future work):
- Persistent state across reboots (would need a writable USB / external
  drive — chicken/egg with the installer setting up the only persistent
  storage in the first place).
- `--show-state` flag to inspect a saved file without running.
- State-file schema versioning.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Bernardo Magri
2026-05-18 18:00:02 +01:00
parent 7fa909ddf4
commit 9c672953bc
2 changed files with 70 additions and 11 deletions

View File

@@ -77,7 +77,7 @@ Each PR description should reference the row(s) in `docs/SCRIPTS.md` it closes,
- Richer disk metadata (Shipped).
- Form-factor → laptop preset (Now, depends on Pillar 5).
- `disko-golden.nix` variants for software-RAID and BTRFS-pool-as-root (Shipped).
- Pre-flight resume polish (Next).
- Pre-flight resume polish (Shipped).
- "What's installed?" summary screen on boot of a freshly-installed system, sourced from `state.json` + `nomarchy-system-scripts` introspection.
- Optional non-LUKS branch in the installer for users who explicitly opt out of FDE.
@@ -121,6 +121,7 @@ Each PR description should reference the row(s) in `docs/SCRIPTS.md` it closes,
(Move items here when they land — keep them brief, link the commit/PR.)
- _2026-05-18_ — Pillar 4: pre-flight resume polish. Fixed four resume-flow gaps in `installer/install.sh`: (1) `--resume` with a missing state file now errors loudly with a tmpfs explanation instead of silently falling through to a fresh prompt cycle (the most common operator confusion was "rebooted, forgot tmpfs eats /tmp/, watched the installer start over without realising"); (2) on resume, the saved target drive is validated as a block device before any disk-phase step runs — catches the live-ISO USB-unplugged / non-deterministic /dev/sdX class of mid-install failures; (3) `save_state` now stamps an ISO-8601 timestamp and `load_state` shows a `(saved Xm ago)` banner plus a `Target: /dev/X → user @ host` summary line, so the user can `Ctrl-C` if they're resuming onto the wrong host before any destructive prompt fires; (4) `--help` now documents the tmpfs limitation. `shellcheck --severity=error` passes.
- _2026-05-18_ — Declarative-state defaults centralization. Made `lib/state-schema.nix` the single source of truth for every state-default that previously lived in three places (the schema itself, `core/system/options.nix` / `core/home/options.nix` `default = …` clauses, and `core/home/state.nix` `or …` fallbacks). Replaced ~25 hardcoded literals with `schema.<scope>.<key>` reads. Side-effect: fixed a lingering bug where `core/home/options.nix:theme` still defaulted to `"summer-night"` after the system-side was moved to `"nord"` — half the codebase's home option resolved to the wrong theme when state.json was missing/blank. `nix flake check --no-build` confirms zero semantic change for every other field. Doesn't touch the installer-written `state.json` (separate batch — needs schema → JSON generation).
- _2026-05-18_ — Pillar 7 first step: Forgejo Actions CI (eval + lint). New `.forgejo/workflows/check.yml` runs on every push to `main` and every PR: (1) `nix flake check --no-build` to catch eval regressions, (2) `bash -n` + `shellcheck --severity=error` over every `nomarchy-*` bash script (whole-tree, not just changed files — gates branches that bypass the pre-commit hook), (3) `docs/SCRIPTS.md` drift check (fails loudly if a script change didn't regenerate the audit doc). All three checks pass locally on the current tree. Activation requires enabling Actions on the Forgejo repo and registering a `forgejo-runner`; the workflow itself is dormant until then. ISO build job is intentionally deferred — needs a binary cache (Cachix/Attic) to be tractable.
- _2026-05-18_ — **Pillar 3 Phase B: complete.** Final batch (restart/sudo/theme/misc clusters) cleared the last 13 `unused?` rows. Deleted five truly dead scripts: `nomarchy-restart-{hyprctl,mako}` (theme switching calls `hyprctl reload`/`makoctl reload` directly now), `nomarchy-restart-tmux` (one-liner of marginal value), `nomarchy-battery-present` (battery monitor checks `/sys/class/power_supply/BAT*` inline), `nomarchy-sudo-keepalive` (intended-to-be-sourced building block with no users). Surfaced eight useful tools in `SKILL.md` so the audit catches them as `kept` and AI assistants can discover them: `nomarchy-restart-trackpad` (intel_quicki2c reload), `nomarchy-sudo-{passwordless-toggle,reset}`, `nomarchy-theme-{bg-install,refresh,remove}`, `nomarchy-refresh-fastfetch`, `nomarchy-windows-vm` (new Virtualization section). Final state: 159 scripts, all `kept`, `unused?` = 0, missing references = 0.