Skip to content

OUT-3544: Add retry mechanisms for transient network errors#242

Open
SandipBajracharya wants to merge 8 commits into
masterfrom
OUT-3544
Open

OUT-3544: Add retry mechanisms for transient network errors#242
SandipBajracharya wants to merge 8 commits into
masterfrom
OUT-3544

Conversation

@SandipBajracharya
Copy link
Copy Markdown
Collaborator

Summary

  • Expand withRetry classifier to cover transient 5xx (500/502/503/504), undici network failures (ECONNRESET/ECONNREFUSED/ETIMEDOUT/UND_ERR_CONNECT_TIMEOUT), and AbortSignal.timeout/AbortController rejections. Tune factor: 4, retries: 4 so maxTimeout: 10_000 actually engages on the final attempt (back-off becomes 0.5s/2s/8s/10s ≈ 20.5s across 5 attempts).
  • Add AbortSignal.timeout(EXTERNAL_FETCH_TIMEOUT_MS) (default 30s, env-overridable) to every external fetch in fetch.helper.ts and copilotAPI.manualFetch.
  • Validate response.ok in fetch helpers; non-2xx now throws a typed HttpFetchError (status, statusText, url, body — JSON-parsed if possible, raw text otherwise) instead of cryptic JSON-parse failures.
  • Remove 15 dead if (!res) throw new APIError(... 'no response') guards from intuitAPI.ts (97 LOC) now that helpers throw on non-2xx; route real status through getMessageAndCodeFromError so qb_sync_logs records 503 as 503 instead of 500.
  • Update useSettings/useQuickbooks to wrap postFetcher calls in try/catch since helpers now throw.

Worst-case op duration: 5 × 30s + 20.5s ≈ 170s, well within the webhook route's maxDuration = 300.

Test plan

  • yarn vitest run --project unit (72/72 passing — adds 19 cases covering 5xx retries, undici/cause network errors, TimeoutError, fetch.helper response.ok and timeout paths)
  • yarn lint:check clean (only pre-existing warnings)
  • yarn prettier:check clean
  • yarn tsc --noEmit clean
  • Manual smoke against QBO sandbox: trigger a webhook, point one outbound at an unroutable host to force ECONNREFUSED, confirm 5 attempts in qb_sync_logs over ~20s before final failure
  • UI spot-check: product/invoice settings save flow still surfaces error state correctly when the API route returns non-2xx

🤖 Generated with Claude Code

@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 6, 2026

@vercel
Copy link
Copy Markdown

vercel Bot commented May 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
quickbooks-sync Building Building May 11, 2026 10:28am
quickbooks-sync (dev) Ready Ready Preview, Comment May 11, 2026 10:28am

Request Review

SandipBajracharya and others added 3 commits May 11, 2026 14:31
The fetch.helper getFetcher/postFetcher always returned response.json()
(a Promise that resolves to a defined value or threw), so the 13
`if (!res) throw new APIError(...)` branches scattered across IntuitAPI
methods were unreachable. Removing them de-noises the file ahead of the
forthcoming throw-on-non-2xx behavior. Also corrects a stale comment in
resolveUniqueCustomerName whose math referenced the old retries=3 budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…for external calls

Outbound calls to QBO and Copilot could previously hang for the full 300s
Vercel function budget on a stalled connection, returned only 429 to the
retry classifier, and lost the real upstream error detail by the time the
sync log was written. This change makes all three layers resilient:

- AbortSignal.timeout (default 30s, EXTERNAL_FETCH_TIMEOUT_MS, NaN-safe)
  applied in fetch.helper and CopilotAPI.manualFetch so hung connections
  release the function slot and surface a retryable TimeoutError.
- New HttpFetchError captures {status, statusText, url, body} on non-2xx
  responses; buildHttpFetchError extracts the upstream detail
  (Intuit Fault.Error[].{Message,Detail}, generic {message}/{error}) into
  error.message so qb_sync_logs records the real reason, not "HTTP 400".
- isRetryableError now handles 429/5xx, undici fetch-failed envelopes,
  AbortSignal TimeoutError/AbortError, and Node ECONNRESET/REFUSED/etc.,
  with cause.code checked before the generic 'fetch failed' fallback so
  ENOTFOUND short-circuits to non-retryable. Retry budget retuned to
  5 attempts / ~20.5s backoff, fitting under the webhook execution cap.
- getMessageAndCodeFromError and withErrorHandler both grow a branch for
  HttpFetchError so source attribution (intuit/copilot) and Sentry capture
  work consistently for transport failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ntract

postFetcher/getFetcher now throw HttpFetchError on non-2xx instead of
returning the parsed body (which was always truthy), so the truthiness
checks in useSettings/useQuickbooks were already dead code. Switch them
to try/catch and pass { timeoutMs: null } so browser fetches against the
app's own routes are not subject to the external-API timeout — SWR
handles its own error-retry for GETs, and user-initiated POSTs surface
errors back to the form rather than auto-aborting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR hardens outbound HTTP calls with structured timeout, retry, and error-surfacing logic. It introduces HttpFetchError with body and status fields, adds AbortSignal.timeout to all external fetches, expands the withRetry classifier to cover 5xx and network errors, and guards non-idempotent QBO writes with an idempotent: false mode that restricts retries to 429-only.

  • Retry classifier rewrite: isRetryableError replaces the previous 429-only check with a documented two-mode classifier; 5xx and network errors retry for reads, are blocked for writes.
  • HttpFetchError class: Replaces silent JSON-parse failures with a typed error carrying status, url, body, and an Intuit-detail-aware message that flows into qb_sync_logs.
  • IntuitAPI method unwrapping: Methods that internally call the retry-wrapped customQuery are now exposed as plain .bind(this) entries, eliminating the previously identified double-retry nesting on 5xx and timeout errors.

Confidence Score: 5/5

Safe to merge — the classifier, timeout, and error-typing changes are well-scoped and the write-safety mode correctly blocks 5xx retries on non-idempotent QBO operations.

The three concerns raised in previous rounds (retry nesting on methods that chain through customQuery, AbortError vs TimeoutError distinction, and duplicate writes from 5xx retries) are each concretely addressed. Test coverage directly exercises each mode. No new logic gaps were found.

No files require special attention. The most behaviorally sensitive change — the idempotent:false guard in intuitAPI.ts — is the correct fix for the write-duplication risk and is tested.

Important Files Changed

Filename Overview
src/app/api/core/utils/withRetry.ts Expanded from a 429-only classifier to a two-mode (idempotent/strict) retry engine with documented backoff tuning; logic is well-tested and addresses the previously flagged nesting and AbortError concerns.
src/utils/intuitAPI.ts Correctly switches write-path methods to idempotent:false, unwraps methods that chain through customQuery to eliminate double-retry nesting, and removes 15 now-unreachable null-response guards.
src/helper/fetch.helper.ts Adds response.ok validation, structured HttpFetchError construction with Intuit/Copilot body parsing, and opt-in AbortSignal.timeout; API surface cleanly extended with FetcherOptions without breaking callers.
src/utils/error.ts Introduces HttpFetchError class and wires it into getMessageAndCodeFromError so real upstream status codes reach qb_sync_logs; URL-based source inference is correctly noted as a heuristic.
src/utils/copilotAPI.ts manualFetch now validates response.ok and attaches a 30s timeout signal; consistent with fetch.helper changes though per-call timeout override not exposed here.
src/hook/useSettings.ts Converts postFetcher result-truthy checks to try/catch, correctly gating SWR cache invalidation on success and showing the confirm button on any thrown error.
src/hook/useQuickbooks.ts Wraps disconnectAction postFetcher call in try/catch; error is logged but silently swallowed — previously flagged outside-diff comment about surfacing failure to the UI is still unaddressed.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant withRetry
    participant IntuitAPI
    participant fetch.helper
    participant QBO as QuickBooks API

    Note over withRetry: idempotent:true (reads) / idempotent:false (writes)

    Caller->>withRetry: customQuery(query)
    withRetry->>IntuitAPI: _customQuery(query)
    IntuitAPI->>fetch.helper: getFetcher(url, headers)
    fetch.helper->>QBO: GET + AbortSignal.timeout(30s)

    alt 2xx response
        QBO-->>fetch.helper: 200 OK
        fetch.helper-->>Caller: parsed JSON
    else 5xx on read (idempotent:true)
        QBO-->>fetch.helper: 503
        fetch.helper-->>withRetry: throws HttpFetchError(503)
        Note over withRetry: isRetryableError=true, backoff 500ms/2s/8s/10s
        withRetry->>IntuitAPI: retry
    else 5xx on write (idempotent:false)
        QBO-->>fetch.helper: 503
        fetch.helper-->>withRetry: throws HttpFetchError(503)
        Note over withRetry: isRetryableError=false, no retry
        withRetry-->>Caller: throws HttpFetchError(503)
    else 429 (both modes)
        QBO-->>fetch.helper: 429
        fetch.helper-->>withRetry: throws HttpFetchError(429)
        withRetry->>IntuitAPI: retry (429 always retryable)
    end
Loading

Reviews (3): Last reviewed commit: "fix(OUT-3544): strict retry classifier f..." | Re-trigger Greptile

Comment thread src/app/api/core/utils/withRetry.ts
Comment thread src/app/api/core/utils/withRetry.ts
SandipBajracharya and others added 4 commits May 11, 2026 15:01
…esting

Greptile P1: _getInvoice, _getSingleIncomeAccount, _getCompanyInfo and
their siblings (_getACustomer, _getAnItem, _getAllItems, _getAnAccount)
each call this.customQuery — the wrapWithRetry-wrapped version — from
inside their own wrapWithRetry-wrapped public exports. With the broader
retry classifier (5xx + timeouts), worst-case attempts compound to
5 outer × 5 inner = 25 calls × 30s timeout, far past the 300s webhook
maxDuration.

Drop the outer wrapWithRetry from the public read exports (now plain
.bind(this) like getCustomerByEmail already did) so retries happen only
at the customQuery network boundary. Writes (create*/update*/delete*/
void*/*SparseUpdate) stay wrapped — they hit the network directly via
postFetchWithHeaders and have no nesting concern.

_getCompanyInfo previously threw a RetryableError(NOT_FOUND, true) when
customQuery returned falsy, relying on the outer wrap to retry. With
the unwrap, that defensive retry path becomes ineffective. Replace it
with a terminal APIError — an empty CompanyInfo response is almost
always a permanent issue (wrong realmId, bad token) that retries do
not fix. Drops the unused RetryableError import as a side effect.

Add a wrap-convention comment block above the export list and a
nesting-hazard note in the withRetry JSDoc so future contributors do
not silently reintroduce the amplification by wrapping new read methods.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…redundancy

- Rename "does not retry on non-429 status errors" to "does not retry on
  permanent 4xx responses" — the old name was misleading once the retry
  classifier was broadened to cover 5xx.
- Drop the "throws HttpFetchError with parsed JSON body on non-2xx" test:
  its only assertion is instanceof HttpFetchError, which is already
  covered by every downstream test that catches and inspects err.status /
  err.url / err.body / err.message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shorten the HttpFetchError source-detection block in error.ts and the
wrap-convention block in intuitAPI.ts to the actionable rule. The
extended hostname list and budget arithmetic lived in the prose; they
belong in PR descriptions, not inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AbortController.abort() throws AbortError and signals a deliberate
cancellation; retrying would defeat the intent. Node 18+ emits a
distinct TimeoutError for AbortSignal.timeout() rejections, which is
the only abort flavor we actually want to retry. Update the test
that previously asserted AbortError as retryable to assert the
opposite, and switch the fetch.helper timeout test to mock
TimeoutError (matching production behavior).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SandipBajracharya
Copy link
Copy Markdown
Collaborator Author

@greptileai

Comment thread src/utils/intuitAPI.ts Outdated
Comment on lines +799 to +803
customQuery = this.wrapWithRetry(this._customQuery)
createInvoice = this.wrapWithRetry(this._createInvoice)
createCustomer = this.wrapWithRetry(this._createCustomer)
createItem = this.wrapWithRetry(this._createItem)
getSingleIncomeAccount = this.wrapWithRetry(this._getSingleIncomeAccount)
getSingleIncomeAccount = this._getSingleIncomeAccount.bind(this)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Non-idempotent writes retried on 5xx — risk of duplicate financial records

createInvoice, createCustomer, createItem, createPayment, createAccount, and createPurchase are all wrapped with wrapWithRetry. With the expanded retry set now including 500/502/503/504 and network errors, a scenario where QBO processes the write successfully but drops the response (e.g., network failure in the response path, or QBO proxy returns 503 after committing the transaction) will cause the retry to issue the same create request again — producing a duplicate invoice, payment, or customer in QuickBooks. A 429 (previous retry condition) signals the server rejected the request; a 503 does not. This gap is not addressed for write paths.

The OUT-3544 broadening of the retry classifier (429 → 429 + 5xx + network)
applies uniformly to every wrapped call, including QBO write endpoints
(`createInvoice`, `createCustomer`, `createItem`, `createPayment`,
`createAccount`, `createPurchase`) which have no Intuit-side request-key
dedupe. A 5xx or network error after the upstream commits could cause
pRetry to replay the write and produce a duplicate financial record.

Adds an `idempotent` option to `withRetry` / `isRetryableError`. In strict
mode (`idempotent: false`), the classifier retries only on 429 and explicit
`RetryableError.retry === true`; 5xx, network codes, and AbortSignal
timeouts are all treated as possibly-after-commit and never replayed.

Inverts the convention in both QBO-facing wrappers:

- `IntuitAPI.wrapWithRetry` defaults to strict; `customQuery` (the only
  read in the wrapped set) opts back into broad retry. All 13 wrapped
  writes (create/update/void/delete) inherit strict by default, so any
  future write method added without options is automatically safe.
- `Intuit.wrapWithRetry` (OAuth) also defaults to strict. `createToken`
  and `refreshAccessToken` consume single-use credentials and would
  previously misdiagnose a 5xx-after-commit refresh as `invalid_grant`
  via `tokenRefresh.handleInvalidGrant`, throwing `QBReconnectRequiredError`
  for a healthy connection. Strict mode lets the transport error surface
  honestly.

Both wrappers merge `{ idempotent: false, ...options }` rather than
relying on a parameter default, so an explicitly-passed `{}` cannot
silently flip back to the global broad-retry default.

Out of scope (not addressed here):
- The same after-commit-with-dropped-response can still produce a
  duplicate when the 3-hour resync re-runs the write. Eliminating that
  requires QBO's `requestid` query parameter on POST endpoints — a
  separate ticket.
- `copilotAPI.ts` / `authenticate.ts` retain the global `idempotent: true`
  default. None of those calls are QBO writes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SandipBajracharya
Copy link
Copy Markdown
Collaborator Author

@greptileai

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 11, 2026

Want your agent to iterate on Greptile's feedback? Try greploops.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant