fix(installer): pre-flight resume polish (4 gaps)
Four resume-flow papercuts in `installer/install.sh` that hurt the "interrupted install" path the most. 1. `--resume` with no state file is no longer silent. The most common operator confusion: reboot the live ISO, forget /tmp/ is tmpfs, re-run with --resume, watch the installer start over from scratch without saying anything. Now: loud error, tmpfs explanation, exit 1. 2. Validate the saved TARGET_DRIVE still exists on resume. Live ISO USB sticks get unplugged between sessions, dev hosts sometimes have non-deterministic /dev/sdX numbering. Without the guard the install proceeds and fails with cryptic disko / mount errors deep in execute_installation. Now we fail at load_state with the actual reason and a clean recovery path. 3. Resume now shows what's being resumed. `save_state` stamps an ISO-8601 timestamp; `load_state` prints "Resumed from <path> (saved Xm ago)" plus a "Target: /dev/X → user @ host" summary line. Lets the user Ctrl-C before any destructive prompt fires if they're resuming onto the wrong machine. 4. `--help` documents the tmpfs limitation. Saved state lives in /tmp/ which is tmpfs on the live ISO; --resume only works within the same boot. The man-page now says so instead of letting users discover it the hard way. `format_age` is the one new helper — pretty-prints "Xs/Xm/Xh Ym/Xd" relative to now, falls back to the raw timestamp if `date -d` can't parse the input. shellcheck --severity=error passes. Out of scope (potential future work): - Persistent state across reboots (would need a writable USB / external drive — chicken/egg with the installer setting up the only persistent storage in the first place). - `--show-state` flag to inspect a saved file without running. - State-file schema versioning. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -77,7 +77,7 @@ Each PR description should reference the row(s) in `docs/SCRIPTS.md` it closes,
|
||||
- Richer disk metadata (Shipped).
|
||||
- Form-factor → laptop preset (Now, depends on Pillar 5).
|
||||
- `disko-golden.nix` variants for software-RAID and BTRFS-pool-as-root (Shipped).
|
||||
- Pre-flight resume polish (Next).
|
||||
- Pre-flight resume polish (Shipped).
|
||||
- "What's installed?" summary screen on boot of a freshly-installed system, sourced from `state.json` + `nomarchy-system-scripts` introspection.
|
||||
- Optional non-LUKS branch in the installer for users who explicitly opt out of FDE.
|
||||
|
||||
@@ -121,6 +121,7 @@ Each PR description should reference the row(s) in `docs/SCRIPTS.md` it closes,
|
||||
|
||||
(Move items here when they land — keep them brief, link the commit/PR.)
|
||||
|
||||
- _2026-05-18_ — Pillar 4: pre-flight resume polish. Fixed four resume-flow gaps in `installer/install.sh`: (1) `--resume` with a missing state file now errors loudly with a tmpfs explanation instead of silently falling through to a fresh prompt cycle (the most common operator confusion was "rebooted, forgot tmpfs eats /tmp/, watched the installer start over without realising"); (2) on resume, the saved target drive is validated as a block device before any disk-phase step runs — catches the live-ISO USB-unplugged / non-deterministic /dev/sdX class of mid-install failures; (3) `save_state` now stamps an ISO-8601 timestamp and `load_state` shows a `(saved Xm ago)` banner plus a `Target: /dev/X → user @ host` summary line, so the user can `Ctrl-C` if they're resuming onto the wrong host before any destructive prompt fires; (4) `--help` now documents the tmpfs limitation. `shellcheck --severity=error` passes.
|
||||
- _2026-05-18_ — Declarative-state defaults centralization. Made `lib/state-schema.nix` the single source of truth for every state-default that previously lived in three places (the schema itself, `core/system/options.nix` / `core/home/options.nix` `default = …` clauses, and `core/home/state.nix` `or …` fallbacks). Replaced ~25 hardcoded literals with `schema.<scope>.<key>` reads. Side-effect: fixed a lingering bug where `core/home/options.nix:theme` still defaulted to `"summer-night"` after the system-side was moved to `"nord"` — half the codebase's home option resolved to the wrong theme when state.json was missing/blank. `nix flake check --no-build` confirms zero semantic change for every other field. Doesn't touch the installer-written `state.json` (separate batch — needs schema → JSON generation).
|
||||
- _2026-05-18_ — Pillar 7 first step: Forgejo Actions CI (eval + lint). New `.forgejo/workflows/check.yml` runs on every push to `main` and every PR: (1) `nix flake check --no-build` to catch eval regressions, (2) `bash -n` + `shellcheck --severity=error` over every `nomarchy-*` bash script (whole-tree, not just changed files — gates branches that bypass the pre-commit hook), (3) `docs/SCRIPTS.md` drift check (fails loudly if a script change didn't regenerate the audit doc). All three checks pass locally on the current tree. Activation requires enabling Actions on the Forgejo repo and registering a `forgejo-runner`; the workflow itself is dormant until then. ISO build job is intentionally deferred — needs a binary cache (Cachix/Attic) to be tractable.
|
||||
- _2026-05-18_ — **Pillar 3 Phase B: complete.** Final batch (restart/sudo/theme/misc clusters) cleared the last 13 `unused?` rows. Deleted five truly dead scripts: `nomarchy-restart-{hyprctl,mako}` (theme switching calls `hyprctl reload`/`makoctl reload` directly now), `nomarchy-restart-tmux` (one-liner of marginal value), `nomarchy-battery-present` (battery monitor checks `/sys/class/power_supply/BAT*` inline), `nomarchy-sudo-keepalive` (intended-to-be-sourced building block with no users). Surfaced eight useful tools in `SKILL.md` so the audit catches them as `kept` and AI assistants can discover them: `nomarchy-restart-trackpad` (intel_quicki2c reload), `nomarchy-sudo-{passwordless-toggle,reset}`, `nomarchy-theme-{bg-install,refresh,remove}`, `nomarchy-refresh-fastfetch`, `nomarchy-windows-vm` (new Virtualization section). Final state: 159 scripts, all `kept`, `unused?` = 0, missing references = 0.
|
||||
|
||||
@@ -74,6 +74,9 @@ Usage: install.sh [--dry-run] [--resume] [-h|--help]
|
||||
Doesn't touch the disk, doesn't run nixos-install.
|
||||
--resume Reuse answers from a previous interrupted run
|
||||
(saved at $STATE_FILE — passwords excluded).
|
||||
NOTE: the live ISO uses tmpfs, so the state file is lost
|
||||
on reboot. --resume only works within the same live-ISO
|
||||
session as the original interrupted run.
|
||||
-h, --help Print this message.
|
||||
USAGE
|
||||
}
|
||||
@@ -102,19 +105,74 @@ parse_args() {
|
||||
# will silently install a system with an empty password hash and lock the
|
||||
# user out. Keep the guard checking the hash.
|
||||
save_state() {
|
||||
declare -p \
|
||||
TARGET_DRIVE USERNAME HOSTNAME TIMEZONE \
|
||||
KEYMAP_LAYOUT KEYMAP_VARIANT LOCALE FORM_FACTOR \
|
||||
ENABLE_IMPERMANENCE HARDWARE_MODULES NOMARCHY_HW_OPTS \
|
||||
SELECTED_PROFILES NOMARCHY_REV \
|
||||
> "$STATE_FILE"
|
||||
# The leading timestamp lets --resume surface "how old is this state?"
|
||||
# and is parsed back via NOMARCHY_INSTALL_STATE_SAVED_AT.
|
||||
{
|
||||
echo "NOMARCHY_INSTALL_STATE_SAVED_AT=\"$(date -Iseconds)\""
|
||||
declare -p \
|
||||
TARGET_DRIVE USERNAME HOSTNAME TIMEZONE \
|
||||
KEYMAP_LAYOUT KEYMAP_VARIANT LOCALE FORM_FACTOR \
|
||||
ENABLE_IMPERMANENCE HARDWARE_MODULES NOMARCHY_HW_OPTS \
|
||||
SELECTED_PROFILES NOMARCHY_REV
|
||||
} > "$STATE_FILE"
|
||||
}
|
||||
|
||||
# Pretty-print "X minutes/hours/days ago" from an ISO-8601 timestamp.
|
||||
# Falls back to the raw string if `date -d` can't parse it (defensive —
|
||||
# the timestamp is always produced by `date -Iseconds` above, but we
|
||||
# don't want a stale state file to crash --resume).
|
||||
format_age() {
|
||||
local saved="$1" saved_epoch now_epoch diff
|
||||
saved_epoch=$(date -d "$saved" +%s 2>/dev/null) || { echo "$saved"; return; }
|
||||
now_epoch=$(date +%s)
|
||||
diff=$(( now_epoch - saved_epoch ))
|
||||
if (( diff < 60 )); then echo "${diff}s ago"
|
||||
elif (( diff < 3600 )); then echo "$((diff / 60))m ago"
|
||||
elif (( diff < 86400 )); then echo "$((diff / 3600))h $((diff % 3600 / 60))m ago"
|
||||
else echo "$((diff / 86400))d ago"
|
||||
fi
|
||||
}
|
||||
|
||||
load_state() {
|
||||
if [[ "$RESUME" == "true" ]] && [[ -f "$STATE_FILE" ]]; then
|
||||
# shellcheck disable=SC1090
|
||||
source "$STATE_FILE"
|
||||
info "Resumed from $STATE_FILE"
|
||||
if [[ "$RESUME" != "true" ]]; then
|
||||
return
|
||||
fi
|
||||
|
||||
# --resume with no state file is almost always operator error — the
|
||||
# most common cause is "rebooted the live ISO and forgot tmpfs eats
|
||||
# /tmp/". Fail loudly so the user doesn't sit through a fresh prompt
|
||||
# cycle thinking it was resumed.
|
||||
if [[ ! -f "$STATE_FILE" ]]; then
|
||||
error "--resume was passed but no saved state exists at $STATE_FILE."
|
||||
info "The live ISO uses tmpfs — saved state doesn't survive a reboot."
|
||||
info "Re-run install.sh without --resume to start fresh."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# shellcheck disable=SC1090
|
||||
source "$STATE_FILE"
|
||||
|
||||
# If the saved target drive isn't visible right now, every later
|
||||
# disk-phase step will fail with cryptic errors. Catch it here.
|
||||
# Live ISOs frequently get their non-boot USB sticks unplugged
|
||||
# between sessions, and dev hosts sometimes have non-deterministic
|
||||
# /dev/sdX numbering.
|
||||
if [[ -n "${TARGET_DRIVE:-}" ]] && [[ ! -b "$TARGET_DRIVE" ]]; then
|
||||
error "Saved target drive $TARGET_DRIVE is no longer a block device."
|
||||
info "The drive may have been unplugged or renamed since the saved run."
|
||||
info "Delete $STATE_FILE and re-run without --resume."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Show what we're resuming into so the user can Ctrl-C if they're on
|
||||
# the wrong host before any password / disk-wipe prompts fire.
|
||||
local age="unknown age"
|
||||
if [[ -n "${NOMARCHY_INSTALL_STATE_SAVED_AT:-}" ]]; then
|
||||
age=$(format_age "$NOMARCHY_INSTALL_STATE_SAVED_AT")
|
||||
fi
|
||||
info "Resumed from $STATE_FILE (saved $age)"
|
||||
if [[ -n "${USERNAME:-}" || -n "${HOSTNAME:-}" || -n "${TARGET_DRIVE:-}" ]]; then
|
||||
info " Target: ${TARGET_DRIVE:-?} → ${USERNAME:-?} @ ${HOSTNAME:-?}"
|
||||
fi
|
||||
}
|
||||
|
||||
|
||||
Reference in New Issue
Block a user