fix(installer): pre-flight resume polish (4 gaps)

Four resume-flow papercuts in `installer/install.sh` that hurt the
"interrupted install" path the most.

1. `--resume` with no state file is no longer silent. The most common
   operator confusion: reboot the live ISO, forget /tmp/ is tmpfs, re-run
   with --resume, and watch the installer start over from scratch without
   saying anything. Now: loud error, tmpfs explanation, exit 1.

2. Validate that the saved TARGET_DRIVE still exists on resume. Live ISO
   USB sticks get unplugged between sessions, and dev hosts sometimes have
   non-deterministic /dev/sdX numbering. Without the guard, the install
   proceeds and fails with cryptic disko / mount errors deep in
   execute_installation. Now we fail at load_state with the actual reason
   and a clean recovery path.

3. Resume now shows what's being resumed. `save_state` stamps an ISO-8601
   timestamp; `load_state` prints "Resumed from <path> (saved Xm ago)" plus
   a "Target: /dev/X → user @ host" summary line. This lets the user Ctrl-C
   before any destructive prompt fires if they're resuming onto the wrong
   machine.

4. `--help` documents the tmpfs limitation. Saved state lives in /tmp/,
   which is tmpfs on the live ISO, so --resume only works within the same
   boot. The usage text now says so instead of letting users discover it
   the hard way.

`format_age` is the one new helper: it pretty-prints "Xs/Xm/Xh Ym/Xd"
relative to now and falls back to the raw timestamp if `date -d` can't
parse the input.

shellcheck --severity=error passes.

Out of scope (potential future work):
- Persistent state across reboots (would need a writable USB / external
  drive; chicken/egg with the installer setting up the only persistent
  storage in the first place).
- `--show-state` flag to inspect a saved file without running.
- State-file schema versioning.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
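
To make the `format_age` buckets concrete, here is a minimal standalone sketch that mirrors the helper described in the message and runs it against a few synthetic timestamps. It assumes GNU `date` (for `-d` and `-Iseconds`); the `ago` helper that manufactures past timestamps is hypothetical and exists only for this demonstration.

```shell
#!/usr/bin/env bash
# Standalone sketch of the format_age helper; assumes GNU date.
format_age() {
  local saved="$1" saved_epoch now_epoch diff
  # Unparseable input: echo it back unchanged instead of failing.
  saved_epoch=$(date -d "$saved" +%s 2>/dev/null) || { echo "$saved"; return; }
  now_epoch=$(date +%s)
  diff=$(( now_epoch - saved_epoch ))
  if   (( diff < 60 ));    then echo "${diff}s ago"
  elif (( diff < 3600 ));  then echo "$((diff / 60))m ago"
  elif (( diff < 86400 )); then echo "$((diff / 3600))h $((diff % 3600 / 60))m ago"
  else                          echo "$((diff / 86400))d ago"
  fi
}

# Hypothetical demo helper: ISO-8601 timestamp N seconds in the past.
ago() { date -d "@$(( $(date +%s) - $1 ))" -Iseconds; }

format_age "$(ago 150)"        # minutes bucket
format_age "$(ago 7500)"       # hours-and-minutes bucket
format_age "$(ago 200000)"     # days bucket
format_age "not-a-timestamp"   # fallback: printed as-is
```

Because the fallback simply echoes its input, a hand-edited or corrupted timestamp degrades the banner text rather than aborting the resume.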
@@ -77,7 +77,7 @@ Each PR description should reference the row(s) in `docs/SCRIPTS.md` it closes,
 - Richer disk metadata (Shipped).
 - Form-factor → laptop preset (Now, depends on Pillar 5).
 - `disko-golden.nix` variants for software-RAID and BTRFS-pool-as-root (Shipped).
-- Pre-flight resume polish (Next).
+- Pre-flight resume polish (Shipped).
 - "What's installed?" summary screen on boot of a freshly-installed system, sourced from `state.json` + `nomarchy-system-scripts` introspection.
 - Optional non-LUKS branch in the installer for users who explicitly opt out of FDE.
 
@@ -121,6 +121,7 @@ Each PR description should reference the row(s) in `docs/SCRIPTS.md` it closes,
 
 (Move items here when they land — keep them brief, link the commit/PR.)
 
+- _2026-05-18_ — Pillar 4: pre-flight resume polish. Fixed four resume-flow gaps in `installer/install.sh`: (1) `--resume` with a missing state file now errors loudly with a tmpfs explanation instead of silently falling through to a fresh prompt cycle (the most common operator confusion was "rebooted, forgot tmpfs eats /tmp/, watched the installer start over without realising"); (2) on resume, the saved target drive is validated as a block device before any disk-phase step runs — catches the live-ISO USB-unplugged / non-deterministic /dev/sdX class of mid-install failures; (3) `save_state` now stamps an ISO-8601 timestamp and `load_state` shows a `(saved Xm ago)` banner plus a `Target: /dev/X → user @ host` summary line, so the user can `Ctrl-C` if they're resuming onto the wrong host before any destructive prompt fires; (4) `--help` now documents the tmpfs limitation. `shellcheck --severity=error` passes.
 - _2026-05-18_ — Declarative-state defaults centralization. Made `lib/state-schema.nix` the single source of truth for every state-default that previously lived in three places (the schema itself, `core/system/options.nix` / `core/home/options.nix` `default = …` clauses, and `core/home/state.nix` `or …` fallbacks). Replaced ~25 hardcoded literals with `schema.<scope>.<key>` reads. Side-effect: fixed a lingering bug where `core/home/options.nix:theme` still defaulted to `"summer-night"` after the system-side was moved to `"nord"` — half the codebase's home option resolved to the wrong theme when state.json was missing/blank. `nix flake check --no-build` confirms zero semantic change for every other field. Doesn't touch the installer-written `state.json` (separate batch — needs schema → JSON generation).
 - _2026-05-18_ — Pillar 7 first step: Forgejo Actions CI (eval + lint). New `.forgejo/workflows/check.yml` runs on every push to `main` and every PR: (1) `nix flake check --no-build` to catch eval regressions, (2) `bash -n` + `shellcheck --severity=error` over every `nomarchy-*` bash script (whole-tree, not just changed files — gates branches that bypass the pre-commit hook), (3) `docs/SCRIPTS.md` drift check (fails loudly if a script change didn't regenerate the audit doc). All three checks pass locally on the current tree. Activation requires enabling Actions on the Forgejo repo and registering a `forgejo-runner`; the workflow itself is dormant until then. ISO build job is intentionally deferred — needs a binary cache (Cachix/Attic) to be tractable.
 - _2026-05-18_ — **Pillar 3 Phase B: complete.** Final batch (restart/sudo/theme/misc clusters) cleared the last 13 `unused?` rows. Deleted five truly dead scripts: `nomarchy-restart-{hyprctl,mako}` (theme switching calls `hyprctl reload`/`makoctl reload` directly now), `nomarchy-restart-tmux` (one-liner of marginal value), `nomarchy-battery-present` (battery monitor checks `/sys/class/power_supply/BAT*` inline), `nomarchy-sudo-keepalive` (intended-to-be-sourced building block with no users). Surfaced eight useful tools in `SKILL.md` so the audit catches them as `kept` and AI assistants can discover them: `nomarchy-restart-trackpad` (intel_quicki2c reload), `nomarchy-sudo-{passwordless-toggle,reset}`, `nomarchy-theme-{bg-install,refresh,remove}`, `nomarchy-refresh-fastfetch`, `nomarchy-windows-vm` (new Virtualization section). Final state: 159 scripts, all `kept`, `unused?` = 0, missing references = 0.

@@ -74,6 +74,9 @@ Usage: install.sh [--dry-run] [--resume] [-h|--help]
                 Doesn't touch the disk, doesn't run nixos-install.
   --resume      Reuse answers from a previous interrupted run
                 (saved at $STATE_FILE — passwords excluded).
+                NOTE: the live ISO uses tmpfs, so the state file is lost
+                on reboot. --resume only works within the same live-ISO
+                session as the original interrupted run.
   -h, --help    Print this message.
 USAGE
 }
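
The NOTE above hinges on the state directory being RAM-backed. As a rough sketch of how an operator could confirm that from a shell, the snippet below asks `stat` for the filesystem type of a path. It assumes GNU coreutils `stat` (`-f -c %T`); `is_on_tmpfs` is a hypothetical helper name, not something the installer defines.

```shell
#!/usr/bin/env bash
# Return 0 if the path sits on tmpfs, 1 if it sits on another filesystem,
# and 2 if the path cannot be inspected. Hypothetical helper for illustration.
is_on_tmpfs() {
  local fstype
  fstype=$(stat -f -c %T "$1" 2>/dev/null) || return 2
  [[ "$fstype" == "tmpfs" ]]
}

if is_on_tmpfs /tmp; then
  echo "/tmp is tmpfs: files there will not survive a reboot"
else
  echo "/tmp is not reported as tmpfs on this machine"
fi
```

On a host where the check reports something other than tmpfs, the same-boot restriction described in the help text may not apply, but that is a property of the host rather than of the installer.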
@@ -102,19 +105,74 @@ parse_args() {
 # will silently install a system with an empty password hash and lock the
 # user out. Keep the guard checking the hash.
 save_state() {
+  # The leading timestamp lets --resume surface "how old is this state?"
+  # and is parsed back via NOMARCHY_INSTALL_STATE_SAVED_AT.
+  {
+    echo "NOMARCHY_INSTALL_STATE_SAVED_AT=\"$(date -Iseconds)\""
   declare -p \
     TARGET_DRIVE USERNAME HOSTNAME TIMEZONE \
     KEYMAP_LAYOUT KEYMAP_VARIANT LOCALE FORM_FACTOR \
     ENABLE_IMPERMANENCE HARDWARE_MODULES NOMARCHY_HW_OPTS \
-    SELECTED_PROFILES NOMARCHY_REV \
-    > "$STATE_FILE"
+    SELECTED_PROFILES NOMARCHY_REV
+  } > "$STATE_FILE"
+}
+
+# Pretty-print "X minutes/hours/days ago" from an ISO-8601 timestamp.
+# Falls back to the raw string if `date -d` can't parse it (defensive —
+# the timestamp is always produced by `date -Iseconds` above, but we
+# don't want a stale state file to crash --resume).
+format_age() {
+  local saved="$1" saved_epoch now_epoch diff
+  saved_epoch=$(date -d "$saved" +%s 2>/dev/null) || { echo "$saved"; return; }
+  now_epoch=$(date +%s)
+  diff=$(( now_epoch - saved_epoch ))
+  if (( diff < 60 )); then echo "${diff}s ago"
+  elif (( diff < 3600 )); then echo "$((diff / 60))m ago"
+  elif (( diff < 86400 )); then echo "$((diff / 3600))h $((diff % 3600 / 60))m ago"
+  else echo "$((diff / 86400))d ago"
+  fi
 }
 
 load_state() {
-  if [[ "$RESUME" == "true" ]] && [[ -f "$STATE_FILE" ]]; then
+  if [[ "$RESUME" != "true" ]]; then
+    return
+  fi
+
+  # --resume with no state file is almost always operator error — the
+  # most common cause is "rebooted the live ISO and forgot tmpfs eats
+  # /tmp/". Fail loudly so the user doesn't sit through a fresh prompt
+  # cycle thinking it was resumed.
+  if [[ ! -f "$STATE_FILE" ]]; then
+    error "--resume was passed but no saved state exists at $STATE_FILE."
+    info "The live ISO uses tmpfs — saved state doesn't survive a reboot."
+    info "Re-run install.sh without --resume to start fresh."
+    exit 1
+  fi
+
     # shellcheck disable=SC1090
     source "$STATE_FILE"
-    info "Resumed from $STATE_FILE"
+
+  # If the saved target drive isn't visible right now, every later
+  # disk-phase step will fail with cryptic errors. Catch it here.
+  # Live ISOs frequently get their non-boot USB sticks unplugged
+  # between sessions, and dev hosts sometimes have non-deterministic
+  # /dev/sdX numbering.
+  if [[ -n "${TARGET_DRIVE:-}" ]] && [[ ! -b "$TARGET_DRIVE" ]]; then
+    error "Saved target drive $TARGET_DRIVE is no longer a block device."
+    info "The drive may have been unplugged or renamed since the saved run."
+    info "Delete $STATE_FILE and re-run without --resume."
+    exit 1
+  fi
+
+  # Show what we're resuming into so the user can Ctrl-C if they're on
+  # the wrong host before any password / disk-wipe prompts fire.
+  local age="unknown age"
+  if [[ -n "${NOMARCHY_INSTALL_STATE_SAVED_AT:-}" ]]; then
+    age=$(format_age "$NOMARCHY_INSTALL_STATE_SAVED_AT")
+  fi
+  info "Resumed from $STATE_FILE (saved $age)"
+  if [[ -n "${USERNAME:-}" || -n "${HOSTNAME:-}" || -n "${TARGET_DRIVE:-}" ]]; then
+    info " Target: ${TARGET_DRIVE:-?} → ${USERNAME:-?} @ ${HOSTNAME:-?}"
   fi
 }
 
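
The `save_state` / `load_state` pair above relies on `declare -p` emitting text that bash can read back. A minimal round-trip sketch of that mechanism follows; the variable values and the temporary path are invented for the example, and it assumes GNU `date` for `-Iseconds`.

```shell
#!/usr/bin/env bash
# Round-trip sketch: write a timestamp line plus `declare -p` output, then
# source the file back. All values and the temp path are made up.
state_file=$(mktemp)

TARGET_DRIVE="/dev/example0"
USERNAME="alice"
SELECTED_PROFILES=(desktop dev)

{
  echo "NOMARCHY_INSTALL_STATE_SAVED_AT=\"$(date -Iseconds)\""
  declare -p TARGET_DRIVE USERNAME SELECTED_PROFILES
} > "$state_file"

cat "$state_file"   # plain shell: one assignment, then declare statements

# Simulate a fresh run: forget everything, then restore from the file.
unset TARGET_DRIVE USERNAME SELECTED_PROFILES NOMARCHY_INSTALL_STATE_SAVED_AT
# shellcheck disable=SC1090
source "$state_file"

echo "drive=$TARGET_DRIVE user=$USERNAME profiles=${SELECTED_PROFILES[*]}"
rm -f "$state_file"
```

This sketch sources the file at top level. When a `declare -p` dump is sourced from inside a function, bash treats each `declare` as function-local unless the statement carries `-g`, so whether restored values outlive the call depends on how the surrounding script is structured.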
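
For exercising the two resume guards outside the installer, one option is to fold them into a function that returns a status instead of calling `exit`. The sketch below does that; `validate_resume`, its messages, and its return codes are illustrative assumptions, not the installer's actual code.

```shell
#!/usr/bin/env bash
# Sketch of the two resume guards as one testable function.
# Returns 1 for a missing state file, 2 for a missing block device, 0 otherwise.
validate_resume() {
  local state_file="$1" target_drive="$2"
  if [[ ! -f "$state_file" ]]; then
    echo "error: no saved state at $state_file" >&2
    return 1
  fi
  # -b is true only for block devices, so a vanished or renamed disk fails here.
  if [[ -n "$target_drive" && ! -b "$target_drive" ]]; then
    echo "error: $target_drive is not a block device" >&2
    return 2
  fi
  return 0
}

tmp_state=$(mktemp)
validate_resume /nonexistent/state.sh ""       ; echo "missing file -> $?"
validate_resume "$tmp_state" /nonexistent/disk ; echo "missing disk -> $?"
validate_resume "$tmp_state" ""                ; echo "no drive saved -> $?"
rm -f "$tmp_state"
```

Returning distinct codes keeps the checks usable from a test harness, while a caller that wants the installer's behaviour can still follow a non-zero status with `exit 1`.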