test: 5s default timeout flakes across multiple e2e test files #1173

@christso

Symptom

Multiple integration/e2e tests routinely time out at the 5000 ms default when run as part of the full `bun test` suite under contention. They all share the same root cause: real end-to-end operations (subprocess spawns, workspace materialisation, pool acquisition) whose wall-clock time under suite load exceeds Bun's 5 s default per-test timeout.

Same bug class as #1169 (fixed in #1170 for `pipeline-e2e.test.ts`) and the `pipeline input` block (fixed in #1176). New occurrences keep surfacing as more PRs get pushed — every push attempt this week has hit a different file.

Known offending tests (as of 2026-04-27)

`packages/core/test/evaluation/orchestrator.test.ts` (and related workspace test files):

  • `WorkspacePoolManager > slot acquisition > throws when all slots are locked`
  • `RepoManager > materializeAll > materializes multiple repos`
  • workspace lifecycle tests
  • `--workspace` flag tests

`apps/cli/test/`:

  • `eval.integration.test.ts` — multiple cases
  • `commands/results/serve.test.ts`
  • `pipeline grade — builtin assertions` tests
  • `agentv eval assert > exits 0 when grader returns ...`
  • `agentv eval assert > exits 1 when grader returns ...`
  • `trend command > ...`

This list is non-exhaustive — any test that spawns subprocesses or materialises workspaces is a candidate. Recommend a sweep rather than filing per-file issues.

Why this matters

`validate.yml` (CI) does not run `bun test` — but the local prek pre-push hook does, and it is the only test gate before push. PRs #1167, #1168, #1174, #1175 all required `--no-verify` bypass. PR #1176 needed seven push attempts before catching a contention-free run. That undermines the safety the hook is supposed to provide and trains contributors to ignore it.

Suggested fix

Same one-liner pattern as #1170 / #1176: bump the per-test timeout to 30000 ms using Bun's numeric third-argument form:

```ts
it('test name', async () => { /* slow subprocess / workspace work */ }, 30_000);
```

(The files import from `'vitest'` but are run by `bun test`: the numeric form works; vitest's options-object form does not.)

When all tests in a `describe` block share the same risk profile, prefer setting it once at the describe level.
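One runner-agnostic way to set the timeout once per block, relying only on the numeric third-argument form confirmed above, is a small wrapper around `it`. This is a sketch under assumptions: `withE2ETimeout`, `E2E_TIMEOUT_MS`, and the helper path are hypothetical names, not existing code in the repo, and the `ItWithTimeout` type is deliberately minimal, so it may need loosening if the runner's typings reject the assignment.

```typescript
// Hypothetical shared helper, e.g. test/helpers/e2e-timeout.ts (name and path are illustrative).
// Centralises the 30s budget so a whole describe block can opt in once.

/** Body of a test: sync or async. */
type TestBody = () => void | Promise<void>;

/** Minimal shape of `it` that accepts a numeric timeout as its third argument. */
type ItWithTimeout = (name: string, fn: TestBody, timeoutMs?: number) => void;

/** Per-test budget for subprocess / workspace-heavy e2e tests, in milliseconds. */
export const E2E_TIMEOUT_MS = 30_000;

/**
 * Wrap a runner's `it` so every test registered through the wrapper
 * is given E2E_TIMEOUT_MS via the numeric third-argument form.
 */
export function withE2ETimeout(it: ItWithTimeout): (name: string, fn: TestBody) => void {
  return (name, fn) => it(name, fn, E2E_TIMEOUT_MS);
}

// Usage inside a test file (sketch):
//
//   import { describe, it } from 'vitest';
//   import { withE2ETimeout } from './helpers/e2e-timeout';
//
//   describe('WorkspacePoolManager > slot acquisition', () => {
//     const itE2E = withE2ETimeout(it);
//     itE2E('throws when all slots are locked', async () => { /* ... */ });
//   });
```

Keeping the literal in one exported constant also means a later change to the budget is a single-line edit rather than another sweep.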

Approach

Recommend one sweep PR that walks the repo, identifies every test using `execa`, subprocess spawning, or `materializeAll`/`WorkspacePoolManager`, and applies the 30s timeout uniformly. The list above is a starting point; the sweep should grep more broadly:

```bash
grep -rnE 'execa|materializeAll|WorkspacePoolManager' apps/cli/test packages/core/test
```
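If the sweep is scripted rather than driven by hand from the grep output, a sketch along these lines could list candidate files and flag which ones already carry a bumped timeout. Everything here is hypothetical (`classify`, `sweep`, and a location such as `scripts/timeout-sweep.ts` are not part of the repo). It reuses the grep's search terms, adds `spawn`/`spawnSync` as an assumed marker for subprocess spawning, and treats any `30_000`/`30000` literal as a crude file-level signal, so per-test review is still needed.

```typescript
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Same terms as the grep, plus spawn/spawnSync (assumed markers for subprocess use).
// Non-global regexes, so .test() keeps no lastIndex state between calls.
const RISKY_PATTERN = /\b(execa|spawn|spawnSync|materializeAll|WorkspacePoolManager)\b/;

// Crude signal that the file already uses the 30s timeout somewhere (30_000 or 30000).
const BUMPED_TIMEOUT_PATTERN = /\b30_?000\b/;

export interface Classification {
  risky: boolean;
  hasBumpedTimeout: boolean;
}

export interface SweepHit {
  file: string;
  hasBumpedTimeout: boolean;
}

/** Classify one test file's source text. */
export function classify(source: string): Classification {
  return {
    risky: RISKY_PATTERN.test(source),
    hasBumpedTimeout: BUMPED_TIMEOUT_PATTERN.test(source),
  };
}

/** Recursively yield every *.test.ts / *.test.tsx file under `dir`. */
function* walkTestFiles(dir: string): Generator<string> {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      yield* walkTestFiles(full);
    } else if (/\.test\.tsx?$/.test(entry.name)) {
      yield full;
    }
  }
}

/** Return every risky test file under `roots`; roots that do not exist are skipped. */
export function sweep(roots: string[]): SweepHit[] {
  const hits: SweepHit[] = [];
  for (const root of roots) {
    if (!existsSync(root)) continue;
    for (const file of walkTestFiles(root)) {
      const { risky, hasBumpedTimeout } = classify(readFileSync(file, "utf8"));
      if (risky) hits.push({ file, hasBumpedTimeout });
    }
  }
  return hits;
}

// Example (run from the repo root):
//   for (const hit of sweep(["apps/cli/test", "packages/core/test"])) {
//     console.log(`${hit.hasBumpedTimeout ? "ok  " : "TODO"} ${hit.file}`);
//   }
```

Keeping `classify` pure makes the heuristic easy to unit-test separately from the filesystem walk.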

This is preferable to per-file PRs because:

  • Each new flake costs a separate PR plus another `--no-verify` bypass.
  • The fix is mechanical with no architectural risk.
  • Future contributors no longer have to bypass the hook to push.

Handoff context
