
Skip attach E2E test on in-box Windows PowerShell (20260614 image regression); cap CI job #2318

Merged
andyleejordan merged 13 commits into main from andyleejordan/reduce-ci-test-timeout on Jun 22, 2026

andyleejordan commented Jun 18, 2026

What

CanAttachScriptWithPathMappings (added in #2251) started hanging the Windows leg of CI for hours, riding GitHub's 6-hour default job timeout without ever throwing.

This PR is a stopgap, not the real fix:

  1. Skip the test on in-box Windows PowerShell so the windows-latest leg can complete again.
  2. Cap the CI test job with a timeout-minutes backstop (30 minutes at first, raised to 60 in a later commit on this branch) so any future stall fails fast instead of burning a 6-hour runner.
  3. Minor harness hardening (yield instead of tight EOF/poll spins) that is good hygiene regardless.

The underlying in-box attach deadlock is tracked by #2323.

Root cause — a windows-latest runner-image refresh

This is not a code regression in PSES. By comparing the last green and first red main runs:

  • Last green (#2304) ran on image win25-vs2026/20260608.135.
  • First red (#2303) and every red after it ran on image 20260614.141.

Same image family, same runner (2.335.1), same -Preview PowerShell. The only thing that changed at the boundary is the weekly OS-servicing patch in the image. That refresh broke in-box Windows PowerShell 5.1's cross-process Debug-Runspace / Enter-PSHostProcess attach, which is exactly what this test exercises. The precise servicing delta (likely a .NET Framework / Windows IPC update) is still unknown and is the subject of #2323.

The hang is specific to the in-box Windows PowerShell E2E suite (TestE2EPowerShell). PowerShell Core (TestE2EPwsh) and the preview pass the same attach test, so the skip is scoped to IsWindowsPowerShell (covering the WinPS and WinPS-CLM suites), while Core, the preview, macOS, and Linux keep full coverage of the attach path.

Why #2303 is not the cause

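An earlier per-commit bisection fingered #2303 (the strong-name identity change), but that was confounded: #2303 happened to be the first merge onto the new image, so the image bump and the code change moved together. Two independent proofs exonerate it:

  • The same hang reproduced on this branch with #2303 reverted.
  • Re-running an older, previously green commit that predates #2303 now hangs the same way on the new image.

#2303 is left intact.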

Follow-up

Stopgap for #2323. Once the real in-box attach fix lands, the Skip.If here should be removed; the timeout-minutes backstop can stay as a permanent guard.

The `CanAttachScriptWithPathMappings` E2E test intermittently hung
`windows-latest` CI for the full six-hour default — three of the last
eleven `main` runs died this way, all the same test, interspersed with
green runs (a classic flaky race, not a regression). None of the commits
whose runs hung touched the debugger attach path.

The hang mechanism lived in `ReadScriptLogLineAsync`: at EOF
`StreamReader.ReadLineAsync()` completes *synchronously* with `null`, so
the `while`/`await` polling loop never actually yielded. It busy-spun one
CPU at 100%, which starved the scheduler so none of the existing
cooperative safety nets — xUnit's `[SkippableFact(Timeout = 15000)]`, the
30s `debugTaskCts`, or `WaitForExitAsync` — could ever schedule their
continuations. A flaky few-second race thus escalated into a six-hour
wedge. Ironically, the busy loop landed in #2208, a PR meant to reduce
flakiness, and lay dormant until #2251 added a Windows-racy attach test
that actually hits the EOF spin.

- Back off with `await Task.Delay(100, token)` on EOF so we yield instead
  of busy-spinning, and cap the whole read with a 15s linked CTS that
  throws a clear `TimeoutException` naming the log path.
- Add `timeout-minutes: 15` to the `ci` job as a backstop so any future
  hang fails in 15 minutes instead of riding GitHub's 6-hour default. A
  normal run finishes well under that (Windows, the slowest, is ~12-14m).
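A minimal sketch of the yielding tail loop, written as a Python asyncio analogue of the C# harness (the language and every name here are assumptions; the real change lives in `ReadScriptLogLineAsync`). The hypothetical `read_line` callable follows Python's file convention of returning `""` at EOF, standing in for `ReadLineAsync()` completing with `null`, and `asyncio.wait_for` plays the role of the linked 15s CTS.

```python
import asyncio
import io
from typing import Callable


async def read_script_log_line(
    read_line: Callable[[], str],
    log_path: str,
    timeout: float = 15.0,
    poll_interval: float = 0.1,
) -> str:
    """Return the next complete line from a log, or raise TimeoutError."""

    async def tail() -> str:
        while True:
            line = read_line()  # "" means EOF, like ReadLineAsync() returning null
            if line:
                return line.rstrip("\r\n")
            # Back off at EOF so the event loop (and the timeout below) can run,
            # instead of looping straight back into another read.
            await asyncio.sleep(poll_interval)

    try:
        # wait_for bounds the whole read, like the linked CTS with CancelAfter.
        return await asyncio.wait_for(tail(), timeout)
    except asyncio.TimeoutError:
        raise TimeoutError(
            f"Timed out after {timeout}s waiting for a line in {log_path}"
        ) from None


if __name__ == "__main__":
    log = io.StringIO("script started\n")
    print(asyncio.run(read_script_log_line(log.readline, "script.log")))
```

The timeout message carries the log path so a failure points straight at the file that never received the expected line.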

The underlying attach race (reflection-based wait for `Debug-Runspace` to
subscribe) is still worth hardening, but it now fails fast instead of
hanging.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan requested a review from a team as a code owner June 18, 2026 18:41
Copilot AI review requested due to automatic review settings June 18, 2026 18:41

Copilot AI left a comment


Pull request overview

This PR addresses an intermittent six-hour CI hang on windows-latest caused by the CanAttachScriptWithPathMappings E2E test. The root cause is a busy-spin in ReadScriptLogLineAsync: at EOF, StreamReader.ReadLineAsync() completes synchronously with null, so the polling loop never yields, pegging a CPU at 100% and starving the cooperative timeouts (xUnit Timeout, internal CTS, WaitForExitAsync) that would otherwise abort the test. The change makes the reader yield and adds a CI-level backstop, fitting into the repo's broader effort (#2208) to reduce E2E test flakiness.

Changes:

  • Add await Task.Delay(100, token) backoff on EOF in ReadScriptLogLineAsync so the reader yields instead of busy-spinning, and wrap the read in a 15s linked CancellationTokenSource that throws a descriptive TimeoutException naming the log path.
  • Add an optional CancellationToken parameter (defaulting to default(CancellationToken)) so all existing callers stay unchanged.
  • Add timeout-minutes: 15 to the ci job as a hung-run backstop.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/PowerShellEditorServices.Test.E2E/DebugAdapterProtocolMessageTests.cs | Replaces the EOF busy-spin with a yielding backoff and a 15s timeout that fails fast with a clear message. |
| .github/workflows/ci-test.yml | Caps the ci job at 15 minutes so any future hang fails quickly instead of riding GitHub's 6-hour default. |

andyleejordan commented:

@JustinGrote at least there's a timeout now, but I think I broke the build somehow...

andyleejordan and others added 2 commits June 18, 2026 13:51
…2303)"

This reverts commit b9fd1b3.

#2303 is what broke `CanAttachScriptWithPathMappings` on Windows. A clean
bisection shows its parent (#2304, 6ad4f46) passed Windows E2E in ~12
minutes, while #2303 itself hung for 5h51m on that exact test -- and every
commit built on top of it inherited the hang. Months of green Windows runs
precede #2303.

The mechanism is in `PsesLoadContext.Load`. #2303 tightened
`IsSatisfyingAssembly` to also require a matching public key token and
culture. When a `$PSHOME` assembly previously satisfied a dependency by
name+version, `Load` returned `null` and PSES *shared* PowerShell's single
copy. Under the stricter check a token mismatch now fails that first test,
so `Load` falls through and loads our *own* bundled copy into the isolated
`PsesLoadContext` instead -- producing two copies of the same assembly in
two load contexts and a split type identity. The debugger-attach handshake
(`Debug-Runspace` subscribing to `RunspaceBase.AvailabilityChanged`, plus
the stopped-event plumbing in SMA) relies on cross-context event wiring
that silently breaks under such a split, so the attach never completes and
the test waits forever. It only trips on Windows because that is where the
`$PSHOME`-versus-bundled token divergence occurs. #2303's "no bundled
dependency changes resolution" check was static and missed an assembly
loaded dynamically during attach.
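To make the matching change concrete, here is a hypothetical Python model of the two predicates. `AssemblyIdentity`, `resolve`, and the sample names and tokens are illustrative, and treating "satisfies" as same name with at least the requested version is an assumption. It only shows how a token mismatch flips resolution from the shared `$PSHOME` copy to the bundled one; whether that split caused the attach hang is what later commits on this branch disproved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class AssemblyIdentity:
    name: str
    version: Tuple[int, int, int, int]
    public_key_token: str
    culture: str


def satisfies_by_name_and_version(candidate: AssemblyIdentity, wanted: AssemblyIdentity) -> bool:
    # Assumed pre-#2303 rule: same name and at least the requested version.
    return candidate.name == wanted.name and candidate.version >= wanted.version


def satisfies_strictly(candidate: AssemblyIdentity, wanted: AssemblyIdentity) -> bool:
    # Tightened rule: public key token and culture must also match.
    return (
        satisfies_by_name_and_version(candidate, wanted)
        and candidate.public_key_token == wanted.public_key_token
        and candidate.culture == wanted.culture
    )


def resolve(
    wanted: AssemblyIdentity,
    pshome: Dict[str, AssemblyIdentity],
    satisfies: Callable[[AssemblyIdentity, AssemblyIdentity], bool],
) -> str:
    candidate = pshome.get(wanted.name)
    if candidate is not None and satisfies(candidate, wanted):
        return "shared $PSHOME copy"  # Load returns null; PowerShell's copy is reused
    return "bundled copy in PsesLoadContext"  # a second copy in the isolated context


if __name__ == "__main__":
    in_pshome = AssemblyIdentity("Contoso.Dependency", (1, 0, 0, 0), "1111111111111111", "neutral")
    requested = AssemblyIdentity("Contoso.Dependency", (1, 0, 0, 0), "2222222222222222", "neutral")
    pshome = {in_pshome.name: in_pshome}
    print(resolve(requested, pshome, satisfies_by_name_and_version))  # shared $PSHOME copy
    print(resolve(requested, pshome, satisfies_strictly))  # bundled copy in PsesLoadContext
```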

#2303 was self-described as "a focused trial of tightening" the matching,
so reverting it restores the long-standing, known-good behavior. We can
re-attempt the hardening later with this attach test as a guard.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan changed the title from "Reduce CI test timeout and fix busy-spin in ReadScriptLogLineAsync" to "Revert #2303 to fix Windows debugger-attach CI hang" on Jun 18, 2026
andyleejordan enabled auto-merge (squash) June 18, 2026 20:55
The internal `CancelAfter` cap was 15s, exactly equal to the
`[SkippableFact(Timeout = 15000)]` on `CanAttachScriptWithPathMappings`.
Because xUnit's per-test timer covers the whole test -- attach,
setBreakpoints, configurationDone and waiting for stopped events all run
before `ReadScriptLogLineAsync` is even entered -- xUnit's generic
timeout would almost always fire first, so the descriptive
`TimeoutException` naming the log path would never surface for the very
test that motivated it.

Drop the cap to 10s so the clearer message can win for that test, while
still bounding the untimed `[Fact]` callers. Per review feedback from
copilot-pull-request-reviewer on #2318.
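The ordering problem can be reproduced with two nested timeouts, sketched here in Python asyncio with times scaled down (0.5 stands in for the 15s xUnit timer, 0.1 for the setup work that runs before the read). This is an illustrative model with hypothetical names, not the xUnit harness: because the outer timer starts first and also covers setup, an inner cap equal to it never fires first, while a smaller inner cap surfaces its descriptive message.

```python
import asyncio


async def read_log_line_with_cap(read_timeout: float, log_path: str) -> str:
    """A read whose line never arrives, bounded by its own descriptive cap."""
    line_arrived = asyncio.Event()  # never set in this model
    try:
        await asyncio.wait_for(line_arrived.wait(), read_timeout)
    except asyncio.TimeoutError:
        raise TimeoutError(f"no line in {log_path} after {read_timeout}s") from None
    return "line read"


async def run_test(test_timeout: float, setup: float, read_timeout: float) -> str:
    async def body() -> str:
        await asyncio.sleep(setup)  # attach, setBreakpoints, configurationDone, ...
        return await read_log_line_with_cap(read_timeout, "script.log")

    try:
        # Like xUnit's per-test Timeout, this timer also covers the setup work.
        return await asyncio.wait_for(body(), test_timeout)
    except (asyncio.TimeoutError, TimeoutError) as exc:
        # The outer timeout carries no message; the inner cap names the log.
        return str(exc) or "generic test timeout"


if __name__ == "__main__":
    print(asyncio.run(run_test(0.5, 0.1, 0.5)))  # equal caps
    print(asyncio.run(run_test(0.5, 0.1, 0.3)))  # lower inner cap
```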

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan and others added 2 commits June 18, 2026 14:04
Reduce this branch to its one honest, effective change: a 30-minute
`timeout-minutes` on the CI test job. A normal run finishes well under
that (Windows, the slowest, is ~12-14 minutes), so the cap only bounds a
hung test instead of letting it ride GitHub's 6-hour default.

This un-reverts #2303 and drops the earlier `ReadScriptLogLineAsync`
change, both of which were based on a per-commit bisection that has since
been disproven. The Windows debugger-attach test
`CanAttachScriptWithPathMappings` intermittently wedges on the attach
handshake and rides the default timeout; the same hang reproduces on
`main` (which contains #2303) and reproduced here with #2303 reverted, so
#2303 is not the cause and is restored. The attach test wedges before it
ever reaches `ReadScriptLogLineAsync`, so that change could not affect the
hang and its short internal cap risked introducing new flakiness on a
slow-but-healthy attach; it is reverted too. The intermittent attach hang
is tracked separately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan changed the title from "Revert #2303 to fix Windows debugger-attach CI hang" to "Cap CI test job at 30 minutes to bound a hung Windows attach test" on Jun 18, 2026
CanAttachScriptWithPathMappings intermittently hung Windows CI for hours
instead of failing fast. Its ReadScriptLogLineAsync tailed the script log
with `while (...) await ReadLineAsync()`, but at EOF ReadLineAsync
completes synchronously with null, so the loop never released its
thread-pool thread. On constrained CI runners that starved the pool,
which both wedged the DAP client's background I/O and prevented the xUnit
(15s) and harness (30s) timeout continuations from ever running -- so a
transient stall rode the job timeout for hours.

Await a short delay between reads so the tail loop yields, and add a
matching sleep to the child process's Debug-Runspace readiness poll so it
cannot peg a core during the attach handshake. Combined with the
30-minute CI job cap, a genuine stall now fails fast via the test's own
timeout instead of hanging.
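Why awaiting a call that completes synchronously never yields can be shown with a small Python asyncio analogue. asyncio runs a single-threaded event loop rather than the .NET thread pool, so this illustrates the mechanism rather than reproducing the CI hang; `read_line_at_eof` is a hypothetical stand-in for `ReadLineAsync()` at EOF, and the `ticker` task stands in for any other continuation (such as a timeout) waiting for a turn.

```python
import asyncio


async def read_line_at_eof():
    # Stand-in for ReadLineAsync() at EOF: completes immediately with None,
    # so awaiting it never suspends the caller.
    return None


async def tail(iterations: int, yield_at_eof: bool) -> None:
    for _ in range(iterations):
        line = await read_line_at_eof()
        if line is None and yield_at_eof:
            await asyncio.sleep(0)  # actually give the event loop a turn


async def other_work_ticks(yield_at_eof: bool, iterations: int = 1000) -> int:
    """Count how often a concurrent task got to run while the tail loop ran."""
    ticks = 0
    done = False

    async def ticker() -> None:
        nonlocal ticks
        while not done:
            ticks += 1
            await asyncio.sleep(0)

    task = asyncio.create_task(ticker())
    await tail(iterations, yield_at_eof)
    done = True
    await task
    return ticks


if __name__ == "__main__":
    print("without yield:", asyncio.run(other_work_ticks(yield_at_eof=False)))
    print("with yield:", asyncio.run(other_work_ticks(yield_at_eof=True)))
```

Without the yield the ticker never runs until the loop finishes, which is the same starvation pattern the commit describes. The real fix awaits a 100ms delay rather than a bare yield so the poll also stops burning CPU.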

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan changed the title from "Cap CI test job at 30 minutes to bound a hung Windows attach test" to "Fix attach E2E test hang from thread-pool starvation; cap CI job" on Jun 18, 2026
andyleejordan and others added 2 commits June 18, 2026 17:15
CanAttachScriptWithPathMappings hangs on in-box Windows PowerShell 5.1
since the windows-2025-vs2026 runner image refreshed from 20260608 to
20260614. The cross-process Debug-Runspace attach wedges and the test
rides the job timeout; the windows-latest leg cannot complete.

Scope the skip to IsWindowsPowerShell so the in-box WinPS suites
(including CLM) are exempt while PowerShell Core, the preview, macOS, and
Linux keep full coverage of the attach path. This is a stopgap pending a
real fix for the in-box attach deadlock, tracked by #2323; the 30-minute
timeout-minutes backstop in ci-test.yml stays as a guard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The earlier comment asserted the EOF tight-loop was the cause of the
multi-hour Windows hang. Deconfounding analysis disproved that: the hang
is the in-box Windows PowerShell attach regression from the 20260614
runner image, not thread-pool starvation here. Keep the yield as genuine
harness hardening but describe it as such rather than claiming it as the
fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan changed the title from "Fix attach E2E test hang from thread-pool starvation; cap CI job" to "Skip attach E2E test on in-box Windows PowerShell (20260614 image regression); cap CI job" on Jun 19, 2026
andyleejordan and others added 2 commits June 19, 2026 11:51
CanAttachScriptWithPathMappings wedges during the per-test InitializeAsync
(PSES debug-adapter server startup) on in-box Windows PowerShell since the
windows-2025-vs2026 runner image refresh, riding the job timeout in CI.

The prior in-body Skip.If(IsWindowsPowerShell) never fired because xUnit runs
IAsyncLifetime.InitializeAsync before the test body, and that setup is where
the hang occurs. Add a SkippableFact subclass that sets Skip in its
constructor so xUnit skips the test at discovery time, before it instantiates
the class or runs InitializeAsync. The SkippableFact discoverer is retained so
the runtime Constrained Language Mode skip still works off-Windows.

See #2323.
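The ordering that defeated the in-body skip can be modeled with a toy runner, sketched in Python. `run_one`, `skip_at_discovery`, and `FakeE2ETests` are hypothetical and only mimic the xUnit lifecycle described above: a discovery-time skip returns before the class is constructed, while an in-body skip runs only after `initialize` (the `InitializeAsync` analogue) has already been called.

```python
from typing import List, Optional


class SkipException(Exception):
    """Raised by an in-body skip, like Skip.If(...)."""


def skip_at_discovery(reason: str):
    # Hypothetical analogue of a fact attribute that sets Skip in its constructor.
    def mark(method):
        method.discovery_skip_reason = reason
        return method
    return mark


class FakeE2ETests:
    def __init__(self, log: List[str]) -> None:
        self.log = log

    def initialize(self) -> None:
        # Stand-in for InitializeAsync (server startup), the step that wedged.
        self.log.append("initialize: start server")

    def test_in_body_skip(self) -> None:
        raise SkipException("Windows PowerShell")  # Skip.If(true, ...)

    @skip_at_discovery("Windows PowerShell")
    def test_discovery_skip(self) -> None:
        self.log.append("body ran")


def run_one(cls, name: str, log: List[str]) -> str:
    reason: Optional[str] = getattr(getattr(cls, name), "discovery_skip_reason", None)
    if reason is not None:
        return f"skipped at discovery: {reason}"  # never constructs or initializes
    instance = cls(log)
    instance.initialize()  # runs before the body, so an in-body skip is too late
    try:
        getattr(instance, name)()
    except SkipException as exc:
        return f"skipped in body: {exc}"
    return "passed"
```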

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 30-minute cap was too aggressive as a backstop; bump it to 60 so a
genuinely slow (but not hung) run is not killed prematurely, while still
capping a wedged test well short of GitHub's 6-hour default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan and others added 2 commits June 19, 2026 13:34
The CI hang is in the shared per-test InitializeAsync that starts the
PSES debug-adapter server, not in any single test, so skipping only
CanAttachScriptWithPathMappings just promotes the next test to first
victim. Each test pays a fresh cold-start, and intermittently any one of
them can be the test that wedges on the 20260614 runner image.

Broaden the discovery-time Windows PowerShell skip to the entire
DebugAdapterProtocolMessageTests class: add a SkippableTheory companion
to the existing SkippableFact variant, share the skip reason, and apply
the attributes to all test methods. The pwsh (.NET 8) E2E suite still
runs the full set, so only in-box Windows PowerShell debug-adapter
coverage is paused, pending a real fix tracked in #2323.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The in-box Windows PowerShell server wedges during startup on the current
windows-latest runner image, riding the job timeout. This is a runner-image
regression, not our code: re-running an old commit that predates all of our
recent PRs (and previously passed) now hangs the same way, while macOS and
Linux stay green. Because both E2E suites spawn the same WinPS-hosted server,
skipping only the debug adapter tests just relocated the hang to the language
server fixture's `LSPTestsFixture.InitializeAsync`, where `_psesHost.Start()`
launches the server.

- Apply the discovery-time `[SkippableFactOnWindowsPowerShell]` skip to every
  test method in `LanguageServerProtocolMessageTests`.
- Guard `LSPTestsFixture` so it does not start the server under Windows
  PowerShell, and dispose safely when it wasn't started. xUnit still creates an
  `IClassFixture<>` even when every method in the class is skipped at discovery
  time, so the discovery-time skip alone does not stop the fixture's own startup
  from hanging.
- Generalize the shared skip reason from `WindowsPowerShellDebugAdapterSkip` to
  `WindowsPowerShellServerStartupSkip`, since it now covers both protocols.
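A minimal sketch of the fixture guard, as a hypothetical Python class rather than the real `LSPTestsFixture`: `initialize` does not start the server under Windows PowerShell, and `dispose` tolerates a client that was never created, so teardown neither hangs nor fails with the Python equivalent of a `NullReferenceException`.

```python
from typing import Callable, Optional, Protocol


class Client(Protocol):
    def shutdown(self) -> None: ...


class LanguageServerFixtureSketch:
    """Hypothetical fixture mirroring the guard described above."""

    def __init__(self, is_windows_powershell: bool, start_server: Callable[[], Client]) -> None:
        self._is_windows_powershell = is_windows_powershell
        self._start_server = start_server
        self.client: Optional[Client] = None

    def initialize(self) -> None:
        if self._is_windows_powershell:
            return  # don't launch the server whose startup wedges
        self.client = self._start_server()

    def dispose(self) -> None:
        if self.client is None:
            return  # never started (skipped or failed): nothing to tear down
        self.client.shutdown()
        self.client = None
```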

Windows PowerShell unit coverage (`TestPS51`) still runs; this only skips the
WinPS-hosted E2E server tests as a stopgap pending a real fix. See #2323.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
andyleejordan merged commit 4835b11 into main Jun 22, 2026
10 checks passed
andyleejordan deleted the andyleejordan/reduce-ci-test-timeout branch June 22, 2026 17:51
andyleejordan added a commit that referenced this pull request Jun 24, 2026
andyleejordan added a commit that referenced this pull request Jun 24, 2026
andyleejordan added a commit that referenced this pull request Jun 24, 2026
andyleejordan added a commit that referenced this pull request Jun 24, 2026
…#2328)

* Fix Windows PowerShell host-start hang in `SetCorrectExecutionPolicy`

On recent in-box Windows PowerShell 5.1 servicing builds, the language and
debug servers intermittently hang on startup and ride the CI job timeout
(#2323). The wedge is `SetCorrectExecutionPolicy`: it calls
`Microsoft.PowerShell.Security\Get-ExecutionPolicy -List`, which autoloads
`Microsoft.PowerShell.Security` into the freshly created runspace. That
runspace's `InitialSessionState` is reused from the host runspace and already
carries the module's `ObjectSecurity` type data, so re-binding it throws "The
member `AuditToString` is already present". `PsesInternalHost.Run()` catches
it, faults `_started`, and the pipeline thread exits — but `TryStartAsync` is
still awaiting queued startup tasks that now never run, so it hangs forever.

Configuring the execution policy is best-effort, so we wrap the
`Get-ExecutionPolicy` query in try/catch (mirroring the existing
`Set-ExecutionPolicy` handling just below it) and skip policy configuration
when it fails, rather than letting a type-data hiccup abort host startup. I
also guard the subsequent indexing against a short result list.
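The best-effort pattern reads roughly like this Python sketch. `configure_execution_policy`, `query_policies`, `apply_policies`, and `min_scopes` are hypothetical names, and the five-entry default only reflects that `Get-ExecutionPolicy -List` reports five scopes on Windows PowerShell 5.1. The actual policy decision is left to the caller-supplied `apply_policies` placeholder.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("pses-sketch")


def configure_execution_policy(
    query_policies: Callable[[], Sequence[str]],
    apply_policies: Callable[[Sequence[str]], None],
    min_scopes: int = 5,
) -> bool:
    """Best-effort: a failed or short query skips configuration instead of aborting startup."""
    try:
        policies = query_policies()
    except Exception as exc:  # e.g. the "member is already present" type-data error
        logger.warning("Skipping execution policy configuration: %s", exc)
        return False
    if len(policies) < min_scopes:
        # Guard the indexing that follows against a short result list.
        logger.warning("Unexpected policy list %r; skipping", policies)
        return False
    apply_policies(policies)
    return True
```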

With the hang fixed we no longer need the in-box Windows PowerShell E2E skips
added in #2318, so this reverts them: the `SkippableFactOnWindowsPowerShell`
attributes, the `LSPTestsFixture` early-return, and the two attribute files.
The DAP suite passes clean under `powershell.exe`. Three LSP help tests
(`Expand-Archive` synopsis) still fail locally on my older servicing build
(.8655); they pass under `pwsh`, so I'm letting CI judge them on its build
rather than re-skipping. The `ci-test.yml` job timeout from #2318 stays as a
backstop.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore E2E test hardening reverted with the Windows PowerShell skips

PR #2328 reset the three E2E test files to their pre-#2318 state
(`b57653c40`) to undo the Windows PowerShell skips we no longer want now
that the host-start hang is actually fixed. But #2318 was a squash that
bundled genuine harness hardening *alongside* those skips, so reverting
wholesale quietly dropped the good parts too. Restore just those, keeping
the skips reverted:

- `ReadScriptLogLineAsync` now yields with `await Task.Delay(100)` at EOF
  instead of busy-spinning. At EOF `ReadLineAsync` completes synchronously
  with `null`, so the old `while`/`await` loop never released its
  thread-pool thread and could starve the scheduler on constrained CI
  runners.
- The child-process `Debug-Runspace` readiness poll in
  `CanAttachScriptWithPathMappings` sleeps 100ms per iteration so it can't
  peg a core during the attach handshake.
- `LSPTestsFixture.DisposeAsync` guards against a null `PsesLanguageClient`
  so a startup failure isn't masked by a `NullReferenceException` during
  teardown.

These are defense-in-depth independent of the skips, and they matter more
now that we un-skip: a Windows PowerShell server that fails to start
shouldn't busy-spin or NRE on teardown.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fail fast when the debug adapter's `OnInitialize` delegate throws

OmniSharp's `DebugAdapterServer.From()` (0.19.9) awaits an internal
`AsyncSubject` that its `InitializeRequest` handler only signals on the
success path. If an `OnInitialize` delegate throws, the handler faults
before signaling, nothing errors the subject, and `From()` -- and thus
`PsesDebugServer.StartAsync()` -- awaits it forever. So a startup failure
wedges the entire debug server and rides the CI/job timeout instead of
failing fast. This is a library limitation, not our bug: the same code is
present on the library's `master`, so we can't fix it upstream without a
fork or upgrade.

#2328 fixes the specific trigger we hit on in-box Windows PowerShell (the
`Get-ExecutionPolicy -List` type-data conflict), but the wedge mechanism is
generic. Guard against any future `OnInitialize` failure: wrap the delegate
body, log the exception, and signal `_serverStopped` so `WaitForShutdown`
unblocks and the process exits cleanly. `Dispose`'s `_serverStopped`
completion is now idempotent (`TrySetResult`) since the catch may have
already completed it.

This converts a silent multi-hour hang into a clean termination with a
logged error -- the client sees the session end instead of waiting forever.
See #2323.
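The guard can be sketched in Python asyncio, assuming a future that the shutdown wait awaits. `DebugServerSketch`, `try_set_result`, and `server_stopped` are hypothetical analogues of the C# members named above, not OmniSharp or PSES APIs. A failing initialize handler logs and completes the shutdown signal, and the later `dispose` call is a harmless no-op, mirroring the switch to `TrySetResult`.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("debug-server-sketch")


def try_set_result(future: "asyncio.Future[str]", value: str) -> bool:
    # Like TaskCompletionSource.TrySetResult: a second completion is a no-op.
    if future.done():
        return False
    future.set_result(value)
    return True


class DebugServerSketch:
    def __init__(self, on_initialize: Callable[[], Awaitable[None]]) -> None:
        self._on_initialize = on_initialize
        self.server_stopped: "asyncio.Future[str]" = asyncio.get_running_loop().create_future()

    async def handle_initialize(self) -> None:
        try:
            await self._on_initialize()
        except Exception:
            # Log and signal shutdown instead of leaving waiters hanging forever.
            logger.exception("OnInitialize failed; stopping the debug server")
            try_set_result(self.server_stopped, "initialize failed")

    async def wait_for_shutdown(self) -> str:
        return await self.server_stopped

    def dispose(self) -> None:
        try_set_result(self.server_stopped, "disposed")


async def demo() -> str:
    async def failing_on_initialize() -> None:
        raise RuntimeError("simulated startup failure")

    server = DebugServerSketch(failing_on_initialize)
    await server.handle_initialize()
    reason = await asyncio.wait_for(server.wait_for_shutdown(), timeout=1.0)
    server.dispose()  # already completed above, so this must not raise
    return reason


if __name__ == "__main__":
    print(asyncio.run(demo()))
```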

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Increase `CanAttachScriptWithPathMappings` timeouts on slow CI runners

This PR un-skips `CanAttachScriptWithPathMappings` on Windows PowerShell
(reverting the #2318 skips now that the host-start hang is fixed), and the
un-skipped test promptly failed on the WinPS 5.1 CI leg with "Attached
process exited before the script could start" -- at exactly 10 seconds.

The test spawns a child `powershell.exe`, then waits up to 10s for
`Debug-Runspace` to subscribe to the child's runspace as the marker that
the attach handshake completed, before driving a full breakpoint/continue
interaction under a 15s xUnit timeout. On the 2-vCPU CI runners the WinPS
attach alone exceeds that 10s budget, so the child throws its internal
timeout and exits before the debug session attaches. It isn't even close
to comfortable locally: xUnit flags the run as a long-running test at the
10s mark on a fast dev box, so the old budget was always on a knife's edge
for the slower WinPS path.

Rather than re-skip it, give the slow-but-correct path room:

- The child's `Debug-Runspace` subscription poll goes 10s -> 30s.
- The outer process-watch cancellation goes 30s -> 60s.
- The xUnit `Timeout` goes 15s -> 60s.

This continues 81b273b (which already bumped the xUnit timeout 10s -> 15s
for the same flakiness) and keeps real coverage of the attach path on
Windows PowerShell instead of skipping it. See #2323 and #2318.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Skip the attach and prefixed-completion E2E tests on Windows PowerShell

Un-skipping the entire Windows PowerShell E2E suite (now that the host-start
hang is fixed) gave us real WinPS coverage for the first time since #2318, but
it also surfaced two pre-existing WinPS-specific failures that have nothing to
do with host startup:

- `CanAttachScriptWithPathMappings` wedges on the in-box, cross-process
  `Debug-Runspace` attach handshake and rides whatever timeout we set. CI
  failed it at 10s, then again at exactly 30s after I bumped the budgets, so
  it genuinely never completes rather than merely running slow. This is the
  in-box attach deadlock tracked by #2323.
- `CanSendCompletionResolveWithModulePrefixRequestAsync` gets an empty
  completion list from the Windows PowerShell server for a prefix-imported
  command (it fails before any help assertion, so it's help-independent).
  That's the same "completion works in PS7 but not WinPS" family as #1355.

Because startup no longer hangs, we don't need #2318's discovery-time
`[SkippableFactOnWindowsPowerShell]` attribute anymore: an in-body
`Skip.If(IsWindowsPowerShell, ...)` runs after `InitializeAsync` (which now
starts the server fine) and skips before the broken code, so both tests skip
cleanly in ~1ms under `powershell.exe` instead of hanging or failing.

This also reverts my earlier timeout bump on `CanAttachScriptWithPathMappings`
(back to the 15s/10s/30s budgets from `main`) since the bigger budgets didn't
help and the test no longer runs on Windows PowerShell anyway. Everything else
stays un-skipped.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Skip the flaky child-attach-session E2E test on Windows PowerShell

Un-skipping the Windows PowerShell E2E suite surfaced one more attach-family
flake: `CanLaunchScriptWithNewChildAttachSession`. It passed on the prior CI run
(`a4e8a823e`) but timed out (`TaskCanceledException` at its 30s budget) on the
next (`25d9e58dd`) with no relevant code change in between, so it's genuinely
flaky on the slow in-box Windows PowerShell CI runners rather than broken.

The test runs `Start-DebugAttachSession` and waits for the server's
`startDebugging` reverse-request round-trip; on in-box Windows PowerShell that
round-trip is slow enough to intermittently miss the timeout. That's the same
in-box attach-E2E reliability bucket as #2323, and its two siblings are already
skipped there: `CanAttachScriptWithPathMappings` (the cross-process
`Debug-Runspace` wedge) and `CanLaunchScriptWithNewChildAttachSessionAsJob`
(no `ThreadJob` on Windows PowerShell). So skip this one on Windows PowerShell
too, keeping the rest of the now-un-skipped DAP suite running.

The flake only reproduces on the constrained CI runner, not on a fast dev box,
so I'm matching how the sibling attach tests are handled rather than chasing a
timeout bump I can't validate locally.

Drafted by Copilot (Claude Opus 4.8).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>