install.sh skips service restart when no legacy migration is needed #2

open Opened by cole.christensen@gmail.com

Problem

The runner installer (install.sh) only restarts the runner service as part of the legacy migration path — i.e. when it detects an old anvil-runner binary and walks the stop-old / remove-old / runner service install / svc-start sequence.

When there’s no legacy install to migrate (i.e. every upgrade after the first, which is the common case), the script:

  1. Downloads the new binary to $DEST.tmp.$$
  2. mv -f replaces the binary on disk
  3. Exits successfully

It does not restart the running systemd/launchd service. Because mv -f swaps the directory entry rather than overwriting the file in place, the running process keeps the old binary's pages mapped (this holds on macOS as well as Linux), so the service continues executing the previous version indefinitely, until something else restarts it (reboot, manual systemctl restart, a crash).
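The stale-binary behavior can be reproduced without the runner at all. The sketch below is a stand-in, not the installer's code: a plain text file plays the binary, a held-open file descriptor plays the running process, and the temp directory and file contents are made up for illustration.

```shell
#!/bin/sh
# Stand-in demo: an open file descriptor plays the role of the running service.
set -eu
workdir=$(mktemp -d)
DEST="$workdir/anvil-runner"          # hypothetical install path

echo "version 0.1.0" > "$DEST"
exec 3< "$DEST"                       # the "running process" opens the old file

echo "version 2026.04.6" > "$DEST.tmp.$$"
mv -f "$DEST.tmp.$$" "$DEST"          # same kind of swap the installer performs

in_memory=$(cat <&3)                  # still reads the old inode
on_disk=$(cat "$DEST")                # new content at the same path
exec 3<&-
rm -rf "$workdir"

echo "held open: $in_memory"
echo "on disk:   $on_disk"
```

The held-open descriptor still yields "version 0.1.0" while the path now yields "version 2026.04.6", which is the same split the service is in after an upgrade with no restart.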

Note: the script itself is currently served by the anvil server and lives at anvil:lib/anvil_web/controllers/runner_download_controller.ex:111 (install_script_content/1). The fix lands there, but this is an anvil-cli deployment concern and should be tracked alongside the runner.

Impact (real incident, 2026-04-17)

This masked the runner regression behind #49. After install.sh was re-run on carl to upgrade to 2026.04.6 (which contains the #49 fix), the service kept running the in-memory 0.1.0 hand build from earlier. The fix was only verified because I then manually ran systemctl --user restart anvil-runner. A normal user would have assumed the upgrade took effect.

More broadly: any fix shipped via a new CalVer release is silently deferred on every host whose install has already been migrated. Security fixes, behavior fixes, schema updates: all affected.

Expected behavior

After a successful binary swap, install.sh should detect the running service and restart it, regardless of whether a legacy migration happened.

Rough shape (in install_script_content/1 in the anvil repo):

# After `mv -f $DEST_TMP $DEST`
if [ "$OS" = "Darwin" ]; then
  if launchctl list | grep -q com.anvil.runner; then
    info "Restarting service to pick up new binary"
    "$DEST" runner service svc-restart || info "WARNING: restart failed, restart manually"
  fi
else
  if systemctl --user is-active --quiet anvil-runner 2>/dev/null \
     || systemctl is-active --quiet anvil-runner 2>/dev/null; then
    info "Restarting service to pick up new binary"
    "$DEST" runner service svc-restart || info "WARNING: restart failed, restart manually"
  fi
fi

If runner service svc-restart doesn’t exist yet (this repo already has svc-start), add it here in anvil-cli as part of the fix.
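To keep a failed restart from passing silently, the restart call could be wrapped in a small helper that prints the manual command and returns non-zero. This is only a sketch: restart_runner is a hypothetical name, info is redefined here so the block runs standalone, and it assumes the svc-restart subcommand proposed above exists.

```shell
#!/bin/sh
# Hypothetical helper; `info` stands in for the installer's own logger.
info() { printf '%s\n' "$*" >&2; }

# restart_runner MANUAL_CMD CMD [ARGS...]
# Runs CMD with ARGS. On failure, prints MANUAL_CMD for the user and returns 1.
restart_runner() {
    manual_cmd=$1
    shift
    info "Restarting service to pick up new binary"
    if "$@"; then
        return 0
    fi
    info "WARNING: restart failed; run manually: $manual_cmd"
    return 1
}
```

On the Linux user-service path it might be called as `restart_runner "systemctl --user restart anvil-runner" "$DEST" runner service svc-restart || exit 1`. Whether the installer should exit non-zero or only warn is left to the acceptance criteria.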

Acceptance criteria

  • REQ-RUN-NEW-1: Running install.sh on a host with an active anvil-runner service MUST restart that service after the binary is replaced.
  • REQ-RUN-NEW-2: If the restart fails, the script MUST surface a clear warning and either exit non-zero or print the manual restart command; it MUST NOT silently succeed.
  • REQ-RUN-NEW-3: Running install.sh on a host with no service installed MUST continue to work as today (no restart attempted).
  • Add an E2E test that covers the “upgrade an already-installed runner” path and asserts the in-memory binary matches the on-disk binary after the script returns.
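For the E2E assertion, one Linux-only way to check that the in-memory binary matches the on-disk binary is to compare the device and inode behind /proc/<pid>/exe with those of the installed path. A sketch under stated assumptions: runner_is_current is a hypothetical name, the runner's PID is already known to the test, and stat supports -c (GNU or BusyBox). macOS would need a different mechanism.

```shell
#!/bin/sh
# runner_is_current PID DEST
# Returns 0 when the executable mapped by PID is the same file (device:inode)
# as the binary currently at DEST, 1 when they differ, 2 when stat fails.
# Linux-only: relies on /proc and on `stat -c`.
runner_is_current() {
    pid=$1
    dest=$2
    running=$(stat -L -c '%d:%i' "/proc/$pid/exe") || return 2
    on_disk=$(stat -L -c '%d:%i' "$dest") || return 2
    [ "$running" = "$on_disk" ]
}
```

A test could call `runner_is_current "$pid" "$DEST"` after install.sh returns and fail if it reports a mismatch, which is the state an upgrade without a restart leaves behind.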