Add WSL gateway dependency-recovery to setup wizard#804

Merged
shanselman merged 2 commits into
openclaw:mainfrom
bkudiess:bkudiess/wsl-gateway-install-recovery
Jun 24, 2026

Conversation

@bkudiess bkudiess commented Jun 23, 2026

Contributor

What & why

When the gateway config wizard (provider/model/OAuth onboarding) fails on an app-managed WSL gateway — typically because a required CLI/tool isn't installed in the distro — the wizard used to dead-end on a generic error with no way forward. This PR adds in-product recovery so the user can fix the problem and continue without leaving the app.

The wizard error state now offers:

  • Selectable error text — the gateway's error is shown verbatim and is selectable, so the user can copy whatever install command (or other guidance) the gateway reports.
  • Open terminal — drops the user into the WSL distro to run the fix.
  • Restart gateway — restarts the distro, reconnects, and re-enters the config wizard right where they left off.

We don't parse the gateway error text. The gateway's wording is outside our control and can change, so rather than guessing at "is this a missing dependency?" or extracting a command, we surface the message as-is and let the user decide. The recovery actions are gated purely on the gateway being an app-managed WSL distro we control (GatewayHostAccess.CanControlWslGateway + a known DistroName), so they never appear for gateways we can't act on.
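
The gating rule above can be sketched as follows. This is a minimal Python illustration, not the project's C# code: `GatewayRecord` and `should_show_recovery` are hypothetical names, and the two fields only mirror `GatewayHostAccess.CanControlWslGateway` and `DistroName` as described in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayRecord:
    """Hypothetical stand-in for the app's managed-gateway record."""
    can_control_wsl_gateway: bool  # mirrors GatewayHostAccess.CanControlWslGateway
    distro_name: Optional[str]     # mirrors DistroName; None or blank when unknown


def should_show_recovery(gateway: Optional[GatewayRecord], error_text: str) -> bool:
    """Decide whether Open terminal / Restart gateway should be offered.

    error_text is accepted only to make the point that it is never inspected.
    """
    if gateway is None:
        return False
    has_distro = bool(gateway.distro_name and gateway.distro_name.strip())
    return gateway.can_control_wsl_gateway and has_distro
```

Because the decision depends only on the gateway record, a change in the gateway's error wording cannot hide or wrongly show the recovery actions.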

Changes

  • Wizard recovery UI: GatewayRecovery panel in WizardPage.xaml (Open terminal + Restart gateway) plus handlers in WizardPage.xaml.cs (MaybeShowGatewayRecovery, open-terminal / restart). ErrorText is now selectable.
  • WSL helpers relocated: WslCommandRunner, WslGatewayController, GatewayHostAccess, GatewayTerminalLauncher move from OpenClaw.Tray.WinUI/Services into OpenClaw.Connection so the SetupEngine UI can consume them (history-preserving renames).
  • Docs: docs/ONBOARDING_WIZARD.md notes the recovery flow.

Hardening (from a dual-model review)

  • Restart handler claims the operation generation and locks the UI synchronously before the first await, and re-checks the generation after each await, so a double-click can't fire a stale restart.
  • WslGatewayController only reports "distro not registered" when enumeration definitively lacks it; an empty list (probe failure/timeout) fails open so the real error surfaces.
  • Gateway restart uses a 2-minute timeout (vs the 30s default) so a slow cold-distro restart isn't killed as a spurious timeout.
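
The first two hardening points can be illustrated with a small sketch. It is written in Python with asyncio rather than the project's C#, and every name in it (`RestartHandler`, `invalidate`, `distro_missing`, the injected `restart` / `reconnect` callables) is an assumption made for illustration. Only the 2-minute timeout and the overall behavior come from the list above.

```python
import asyncio

RESTART_TIMEOUT_SECONDS = 120  # the PR's 2-minute restart budget (default was 30s)


class RestartHandler:
    """Hypothetical stand-in for the wizard page's restart handler."""

    def __init__(self, restart, reconnect):
        self._generation = 0
        self.ui_locked = False
        self._restart = restart      # async callable that restarts the distro
        self._reconnect = reconnect  # async callable that reconnects to the gateway

    def invalidate(self) -> None:
        """Supersede any in-flight operation (for example, the page was reset)."""
        self._generation += 1
        self.ui_locked = False

    async def on_restart_clicked(self) -> str:
        # Everything up to the first await runs without interleaving, so a
        # second click observes ui_locked and is dropped.
        if self.ui_locked:
            return "ignored"
        self._generation += 1
        mine = self._generation
        self.ui_locked = True
        try:
            await asyncio.wait_for(self._restart(), timeout=RESTART_TIMEOUT_SECONDS)
            if mine != self._generation:
                return "stale"  # superseded while restarting; do nothing further
            await self._reconnect()
            if mine != self._generation:
                return "stale"
            return "done"
        finally:
            # Only the operation that still owns the generation unlocks the UI.
            if mine == self._generation:
                self.ui_locked = False


def distro_missing(registered: list[str], distro: str) -> bool:
    """True only when enumeration succeeded and definitively lacks the distro."""
    if not registered:  # empty list: probe failed or timed out, so fail open
        return False
    return distro not in registered
```

Claiming the generation and locking the UI before the first `await` is what makes the double-click case safe: the second click runs only after the first has already taken the lock.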

Validation

  • ./build.ps1 — ✅ all projects (ARM64, warnings-as-errors clean)
  • OpenClaw.Shared.Tests — ✅ 2370 passed / 0 failed / 29 skipped
  • OpenClaw.Tray.Tests — ✅ 1124 passed / 0 failed

Rebased onto main; the GatewayHostAccess relocation was merged with main's new
localization indirection (GatewayHostAccessLocalization is now public so the Tray
App.xaml.cs wiring still resolves across the assembly boundary).

Screenshots

The GatewayRecovery panel in the wizard error state: the gateway error shown verbatim (selectable) with Open terminal and Restart gateway actions — only rendered for an app-managed WSL gateway. No parsed command box or generated help text.

[Screenshot: WSL gateway recovery, showing the selectable error with Open terminal and Restart gateway actions]

clawsweeper Bot commented Jun 23, 2026

Codex review: needs real behavior proof before merge. Reviewed June 24, 2026, 3:29 PM ET / 19:29 UTC.

Summary
The PR adds setup-wizard WSL recovery actions, makes wizard error text selectable, relocates WSL terminal/control helpers into OpenClaw.Connection, updates docs, and adjusts tray tests.

Reproducibility: yes, from source inspection. Trigger a wizard error with a live server-side session, click Restart gateway, and have the WSL control command fail before stopping the gateway; the PR has already disconnected and can no longer send wizard.cancel.

Review metrics: 2 noteworthy metrics.

  • Touched surface: 12 files, +182/-26. The diff spans setup UI, connection helper ownership, docs, and tests, so review needs to cover both UX behavior and helper-boundary effects.
  • Moved WSL helpers: 4 renamed helpers. Relocating terminal and WSL control helpers into OpenClaw.Connection affects existing settings/diagnostics consumers beyond the new wizard panel.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🦐 gold shrimp
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Cancel the active wizard session before disconnecting for gateway restart.
  • [P1] Add redacted runtime proof for terminal launch, restart, reconnect, and wizard re-entry.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The screenshot shows the recovery panel, but the contributor still needs redacted recording, terminal output, logs, or a linked artifact proving terminal launch, WSL restart, reconnect, and wizard re-entry; after updating the PR body, ClawSweeper should re-review automatically or a maintainer can request it.

Mantis proof suggestion
A visible desktop proof would materially verify the setup wizard recovery UI plus real WSL terminal and restart behavior. A maintainer can ask Mantis to capture proof by posting this exact PR comment:

@openclaw-mantis visual task: verify a missing CLI error in the WSL setup wizard shows recovery buttons, opens the distro terminal, restarts the gateway, reconnects, and re-enters provider setup.

Risk before merge

  • [P1] If the WSL restart control command fails before the gateway stops, the PR has already disconnected and may be unable to cancel the still-active wizard session, causing setup retries to stall on an already-running session.
  • [P1] The screenshot shows the recovery panel but does not prove Open terminal, WSL restart, reconnect, or wizard re-entry in a real app-managed WSL gateway.
  • [P1] This read-only review did not run the AGENTS build/test commands; the PR body reports local validation and GitHub still had several in-progress checks when inspected.

Maintainer options:

  1. Fix restart ordering, then prove the flow (recommended)
    Cancel the active wizard session before clearing the client, then require redacted runtime proof that terminal launch, restart, reconnect, and wizard re-entry work in an app-managed WSL gateway.
  2. Accept source-only restart risk
    Maintainers could intentionally merge based on source review and screenshot-only UI proof, but they would own the risk of stranded wizard sessions in the failed-restart path.
  3. Pause WSL recovery
    If maintainers are not ready to own setup-wizard WSL recovery yet, pause the PR rather than landing a restart path that can strand retries.

Next step before merge

  • [P1] Contributor runtime proof is still required, and the restart-session ordering blocker should be fixed before normal maintainer merge review; do not queue automation while proof is insufficient.

Security
Cleared: No concrete security or supply-chain issue was found; the WSL terminal/restart actions reuse fixed helper commands gated to app-managed gateway records and do not execute gateway error text.

Review findings

  • [P2] Cancel the wizard session before restarting the gateway — src/OpenClaw.SetupEngine.UI/Pages/WizardPage.xaml.cs:831
Review details

Best possible solution:

Keep the WSL recovery UI, cancel the active wizard session before any restart disconnect, and require redacted runtime proof of terminal launch, restart, reconnect, and wizard re-entry.

Do we have a high-confidence way to reproduce the issue?

Yes, from source inspection. Trigger a wizard error with a live server-side session, click Restart gateway, and have the WSL control command fail before stopping the gateway; the PR has already disconnected and can no longer send wizard.cancel.

Is this the best way to solve the issue?

No, not yet. The recovery direction fits the setup flow, but the implementation should cancel the wizard session before detaching the client, and proof should show the real WSL terminal/restart/reconnect path.

Full review comments:

  • [P2] Cancel the wizard session before restarting the gateway — src/OpenClaw.SetupEngine.UI/Pages/WizardPage.xaml.cs:831
    RestartGatewayAsync clears _client via DisconnectAsync() before the WSL restart command has actually stopped the gateway. If that control command fails first, the later error/retry path cannot send wizard.cancel, so the next start can dead-end on an already-running wizard session.
    Confidence: 0.88
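
The ordering this finding asks for can be sketched as below. It is a hedged Python illustration, not the code in WizardPage.xaml.cs and not necessarily the fix that landed: `FakeClient`, `restart_gateway`, and the `send` / `disconnect` signatures are hypothetical, and only the `wizard.cancel` message name comes from the review.

```python
import asyncio


class FakeClient:
    """Hypothetical gateway client that records the calls it receives."""

    def __init__(self, log):
        self.log = log

    async def send(self, method: str) -> None:
        self.log.append(method)

    async def disconnect(self) -> None:
        self.log.append("disconnect")


async def restart_gateway(client, restart_distro, log) -> bool:
    """Cancel the wizard session while still connected, then disconnect,
    then run the WSL restart command."""
    await client.send("wizard.cancel")  # must happen before the client is dropped
    await client.disconnect()
    try:
        await restart_distro()
    except RuntimeError as exc:
        # The restart failed, but the server-side session was already
        # cancelled, so a retry does not meet an already-running wizard.
        log.append(f"restart failed: {exc}")
        return False
    log.append("restarted")
    return True
```

With this order, a failing control command leaves the client disconnected but the wizard session already cancelled, which is the state the retry path expects. The sketch does not handle a failure of the cancel call itself.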

Overall correctness: patch is incorrect
Overall confidence: 0.88

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 2cae69ba6ca4.

Label changes

Label changes:

  • add merge-risk: 🚨 session-state: The restart path can strand an active gateway wizard session because it clears the client before sending wizard.cancel.

Label justifications:

  • P2: This is a normal-priority setup-wizard improvement with limited blast radius but a concrete restart-session blocker before merge.
  • merge-risk: 🚨 availability: The failed-restart path can leave setup retries stalled, making the setup wizard unavailable for recovery.
  • merge-risk: 🚨 session-state: The restart path can strand an active gateway wizard session because it clears the client before sending wizard.cancel.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦐 gold shrimp.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The screenshot shows the recovery panel, but the contributor still needs redacted recording, terminal output, logs, or a linked artifact proving terminal launch, WSL restart, reconnect, and wizard re-entry; after updating the PR body, ClawSweeper should re-review automatically or a maintainer can request it.
  • proof: 📸 screenshot: Contributor real behavior proof includes screenshot evidence. The screenshot shows the recovery panel, but the contributor still needs redacted recording, terminal output, logs, or a linked artifact proving terminal launch, WSL restart, reconnect, and wizard re-entry; after updating the PR body, ClawSweeper should re-review automatically or a maintainer can request it.
Evidence reviewed

What I checked:

Likely related people:

  • bkudiess: Prior merged work changed the same setup wizard option flow that this PR extends. (role: recent area contributor; confidence: high; commits: b90ded9a5eea; files: src/OpenClaw.SetupEngine.UI/Pages/WizardPage.xaml.cs, src/OpenClaw.SetupEngine/SetupWizardRunner.cs)
  • ranjeshj: Recent merged work changed the gateway terminal helper stack that this PR relocates and reuses. (role: recent adjacent contributor; confidence: high; commits: 429be9ba9368; files: src/OpenClaw.Tray.WinUI/Services/GatewayTerminalLauncher.cs, src/OpenClaw.Tray.WinUI/Pages/ConnectionPage.xaml.cs, tests/OpenClaw.Tray.Tests/GatewayTerminalLaunchCommandBuilderTests.cs)
  • kmahone: The current-main setup wizard and initial WSL helper baseline trace to the grafted baseline commit in this checkout. (role: introduced behavior; confidence: medium; commits: 37b0ea672037; files: src/OpenClaw.SetupEngine.UI/Pages/WizardPage.xaml.cs, src/OpenClaw.Tray.WinUI/Services/GatewayHostAccess.cs, src/OpenClaw.Tray.WinUI/Services/WslGatewayController.cs)
  • shanselman: Merged the related stale-session cleanup PR and authored the latest docs-only commit on this PR branch. (role: recent merger and PR contributor; confidence: medium; commits: 80029feef028, 6877f6612c26; files: src/OpenClaw.SetupEngine.UI/Pages/WizardPage.xaml.cs, docs/ONBOARDING_WIZARD.md)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. labels Jun 23, 2026
bkudiess added a commit to bkudiess/openclaw-windows-node that referenced this pull request Jun 23, 2026
bkudiess added a commit to bkudiess/openclaw-windows-node that referenced this pull request Jun 23, 2026
@clawsweeper clawsweeper Bot added the proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. label Jun 23, 2026
@bkudiess bkudiess force-pushed the bkudiess/wsl-gateway-install-recovery branch from 247beac to b1fefc1 Compare June 23, 2026 17:32
@bkudiess bkudiess changed the base branch from master to main June 23, 2026 17:32
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. labels Jun 23, 2026
@bkudiess bkudiess force-pushed the bkudiess/wsl-gateway-install-recovery branch from acd4475 to 66e5936 Compare June 23, 2026 22:21
@bkudiess bkudiess marked this pull request as ready for review June 23, 2026 22:21
Copilot and others added 2 commits June 24, 2026 14:09
When the gateway config wizard (provider/model/OAuth onboarding) fails on
an app-managed WSL gateway, the wizard used to dead-end on a generic error
with no way forward. This adds in-product recovery: the gateway error is
shown verbatim and selectable (so the user can copy any install command it
reports), plus two actions:

- Open terminal  - drops into the WSL distro to run the fix
- Restart gateway - restarts the distro, reconnects, and re-enters the
  config wizard where the user left off

We don't parse the gateway error text; the gateway's wording is outside our
control and can change. The recovery actions are gated purely on the gateway
being an app-managed WSL distro we control (GatewayHostAccess.CanControlWslGateway
plus a known DistroName).

WSL helpers (WslCommandRunner, WslGatewayController, GatewayHostAccess,
GatewayTerminalLauncher) move from OpenClaw.Tray.WinUI/Services into
OpenClaw.Connection so the SetupEngine UI can consume them (history-preserving
renames).

Hardening (from a dual-model review):
- Restart handler claims the operation generation and locks the UI
  synchronously before the first await, re-checking after each await, so a
  double-click can't fire a stale restart.
- WslGatewayController only reports 'distro not registered' when enumeration
  definitively lacks it; an empty list (probe failure/timeout) fails open.
- Gateway restart uses a 2-minute timeout for slow cold-distro restarts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Document that the wizard shows WSL recovery actions based on the managed gateway record and does not parse gateway error text.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@shanselman shanselman force-pushed the bkudiess/wsl-gateway-install-recovery branch from 66e5936 to 6877f66 Compare June 24, 2026 19:23
@clawsweeper clawsweeper Bot added the merge-risk: 🚨 session-state 🚨 Merging this PR could lose, corrupt, stale, or mis-associate session or agent state. label Jun 24, 2026
@shanselman shanselman merged commit 1e69513 into openclaw:main Jun 24, 2026
12 checks passed

Labels

  • merge-risk: 🚨 availability: 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages.
  • merge-risk: 🚨 session-state: 🚨 Merging this PR could lose, corrupt, stale, or mis-associate session or agent state.
  • P2: Normal priority bug or improvement with limited blast radius.
  • proof: 📸 screenshot: Contributor real behavior proof includes screenshot evidence.
  • rating: 🦪 silver shellfish: Thin PR readiness signal; proof, validation, or implementation needs work.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

2 participants