OUT-3544: Add retry mechanisms for transient network errors#242
OUT-3544: Add retry mechanisms for transient network errors#242SandipBajracharya wants to merge 8 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
The fetch.helper getFetcher/postFetcher always returned response.json() (a Promise that resolves to a defined value or threw), so the 13 `if (!res) throw new APIError(...)` branches scattered across IntuitAPI methods were unreachable. Removing them de-noises the file ahead of the forthcoming throw-on-non-2xx behavior. Also corrects a stale comment in resolveUniqueCustomerName whose math referenced the old retries=3 budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…for external calls
Outbound calls to QBO and Copilot could previously hang for the full 300s
Vercel function budget on a stalled connection, returned only 429 to the
retry classifier, and lost the real upstream error detail by the time the
sync log was written. This change makes all three layers resilient:
- AbortSignal.timeout (default 30s, EXTERNAL_FETCH_TIMEOUT_MS, NaN-safe)
applied in fetch.helper and CopilotAPI.manualFetch so hung connections
release the function slot and surface a retryable TimeoutError.
- New HttpFetchError captures {status, statusText, url, body} on non-2xx
responses; buildHttpFetchError extracts the upstream detail
(Intuit Fault.Error[].{Message,Detail}, generic {message}/{error}) into
error.message so qb_sync_logs records the real reason, not "HTTP 400".
- isRetryableError now handles 429/5xx, undici fetch-failed envelopes,
AbortSignal TimeoutError/AbortError, and Node ECONNRESET/REFUSED/etc.,
with cause.code checked before the generic 'fetch failed' fallback so
ENOTFOUND short-circuits to non-retryable. Retry budget retuned to
5 attempts / ~20.5s backoff, fitting under the webhook execution cap.
- getMessageAndCodeFromError and withErrorHandler both grow a branch for
HttpFetchError so source attribution (intuit/copilot) and Sentry capture
work consistently for transport failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ntract
postFetcher/getFetcher now throw HttpFetchError on non-2xx instead of
returning the parsed body (which was always truthy), so the truthiness
checks in useSettings/useQuickbooks were already dead code. Switch them
to try/catch and pass { timeoutMs: null } so browser fetches against the
app's own routes are not subject to the external-API timeout — SWR
handles its own error-retry for GETs, and user-initiated POSTs surface
errors back to the form rather than auto-aborting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4ed6cf8 to
59ab70b
Compare
Greptile SummaryThis PR hardens outbound HTTP calls with structured timeout, retry, and error-surfacing logic. It introduces
Confidence Score: 5/5Safe to merge — the classifier, timeout, and error-typing changes are well-scoped and the write-safety mode correctly blocks 5xx retries on non-idempotent QBO operations. The three concerns raised in previous rounds (retry nesting on methods that chain through customQuery, AbortError vs TimeoutError distinction, and duplicate writes from 5xx retries) are each concretely addressed. Test coverage directly exercises each mode. No new logic gaps were found. No files require special attention. The most behaviorally sensitive change — the idempotent:false guard in intuitAPI.ts — is the correct fix for the write-duplication risk and is tested. Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller
participant withRetry
participant IntuitAPI
participant fetch.helper
participant QBO as QuickBooks API
Note over withRetry: idempotent:true (reads) / idempotent:false (writes)
Caller->>withRetry: customQuery(query)
withRetry->>IntuitAPI: _customQuery(query)
IntuitAPI->>fetch.helper: getFetcher(url, headers)
fetch.helper->>QBO: GET + AbortSignal.timeout(30s)
alt 2xx response
QBO-->>fetch.helper: 200 OK
fetch.helper-->>Caller: parsed JSON
else 5xx on read (idempotent:true)
QBO-->>fetch.helper: 503
fetch.helper-->>withRetry: throws HttpFetchError(503)
Note over withRetry: isRetryableError=true, backoff 500ms/2s/8s/10s
withRetry->>IntuitAPI: retry
else 5xx on write (idempotent:false)
QBO-->>fetch.helper: 503
fetch.helper-->>withRetry: throws HttpFetchError(503)
Note over withRetry: isRetryableError=false, no retry
withRetry-->>Caller: throws HttpFetchError(503)
else 429 (both modes)
QBO-->>fetch.helper: 429
fetch.helper-->>withRetry: throws HttpFetchError(429)
withRetry->>IntuitAPI: retry (429 always retryable)
end
Reviews (3): Last reviewed commit: "fix(OUT-3544): strict retry classifier f..." | Re-trigger Greptile |
…esting Greptile P1: _getInvoice, _getSingleIncomeAccount, _getCompanyInfo and their siblings (_getACustomer, _getAnItem, _getAllItems, _getAnAccount) each call this.customQuery — the wrapWithRetry-wrapped version — from inside their own wrapWithRetry-wrapped public exports. With the broader retry classifier (5xx + timeouts), worst-case attempts compound to 5 outer × 5 inner = 25 calls × 30s timeout, far past the 300s webhook maxDuration. Drop the outer wrapWithRetry from the public read exports (now plain .bind(this) like getCustomerByEmail already did) so retries happen only at the customQuery network boundary. Writes (create*/update*/delete*/ void*/*SparseUpdate) stay wrapped — they hit the network directly via postFetchWithHeaders and have no nesting concern. _getCompanyInfo previously threw a RetryableError(NOT_FOUND, true) when customQuery returned falsy, relying on the outer wrap to retry. With the unwrap, that defensive retry path becomes ineffective. Replace it with a terminal APIError — an empty CompanyInfo response is almost always a permanent issue (wrong realmId, bad token) that retries do not fix. Drops the unused RetryableError import as a side effect. Add a wrap-convention comment block above the export list and a nesting-hazard note in the withRetry JSDoc so future contributors do not silently reintroduce the amplification by wrapping new read methods. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…redundancy - Rename "does not retry on non-429 status errors" to "does not retry on permanent 4xx responses" — the old name was misleading once the retry classifier was broadened to cover 5xx. - Drop the "throws HttpFetchError with parsed JSON body on non-2xx" test: its only assertion is instanceof HttpFetchError, which is already covered by every downstream test that catches and inspects err.status / err.url / err.body / err.message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shorten the HttpFetchError source-detection block in error.ts and the wrap-convention block in intuitAPI.ts to the actionable rule. The extended hostname list and budget arithmetic lived in the prose; they belong in PR descriptions, not inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AbortController.abort() throws AbortError and signals a deliberate cancellation; retrying would defeat the intent. Node 18+ emits a distinct TimeoutError for AbortSignal.timeout() rejections, which is the only abort flavor we actually want to retry. Update the test that previously asserted AbortError as retryable to assert the opposite, and switch the fetch.helper timeout test to mock TimeoutError (matching production behavior). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| customQuery = this.wrapWithRetry(this._customQuery) | ||
| createInvoice = this.wrapWithRetry(this._createInvoice) | ||
| createCustomer = this.wrapWithRetry(this._createCustomer) | ||
| createItem = this.wrapWithRetry(this._createItem) | ||
| getSingleIncomeAccount = this.wrapWithRetry(this._getSingleIncomeAccount) | ||
| getSingleIncomeAccount = this._getSingleIncomeAccount.bind(this) |
There was a problem hiding this comment.
Non-idempotent writes retried on 5xx — risk of duplicate financial records
createInvoice, createCustomer, createItem, createPayment, createAccount, and createPurchase are all wrapped with wrapWithRetry. With the expanded retry set now including 500/502/503/504 and network errors, a scenario where QBO processes the write successfully but drops the response (e.g., network failure in the response path, or QBO proxy returns 503 after committing the transaction) will cause the retry to issue the same create request again — producing a duplicate invoice, payment, or customer in QuickBooks. A 429 (previous retry condition) signals the server rejected the request; a 503 does not. This gap is not addressed for write paths.
The OUT-3544 broadening of the retry classifier (429 → 429 + 5xx + network)
applies uniformly to every wrapped call, including QBO write endpoints
(`createInvoice`, `createCustomer`, `createItem`, `createPayment`,
`createAccount`, `createPurchase`) which have no Intuit-side request-key
dedupe. A 5xx or network error after the upstream commits could cause
pRetry to replay the write and produce a duplicate financial record.
Adds an `idempotent` option to `withRetry` / `isRetryableError`. In strict
mode (`idempotent: false`), the classifier retries only on 429 and explicit
`RetryableError.retry === true`; 5xx, network codes, and AbortSignal
timeouts are all treated as possibly-after-commit and never replayed.
Inverts the convention in both QBO-facing wrappers:
- `IntuitAPI.wrapWithRetry` defaults to strict; `customQuery` (the only
read in the wrapped set) opts back into broad retry. All 13 wrapped
writes (create/update/void/delete) inherit strict by default, so any
future write method added without options is automatically safe.
- `Intuit.wrapWithRetry` (OAuth) also defaults to strict. `createToken`
and `refreshAccessToken` consume single-use credentials and would
previously misdiagnose a 5xx-after-commit refresh as `invalid_grant`
via `tokenRefresh.handleInvalidGrant`, throwing `QBReconnectRequiredError`
for a healthy connection. Strict mode lets the transport error surface
honestly.
Both wrappers merge `{ idempotent: false, ...options }` rather than
relying on a parameter default, so an explicitly-passed `{}` cannot
silently flip back to the global broad-retry default.
Out of scope (not addressed here):
- The same after-commit-with-dropped-response can still produce a
duplicate when the 3-hour resync re-runs the write. Eliminating that
requires QBO's `requestid` query parameter on POST endpoints — a
separate ticket.
- `copilotAPI.ts` / `authenticate.ts` retain the global `idempotent: true`
default. None of those calls are QBO writes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7734b2a to
77df1c3
Compare
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
Summary
withRetryclassifier to cover transient 5xx (500/502/503/504), undici network failures (ECONNRESET/ECONNREFUSED/ETIMEDOUT/UND_ERR_CONNECT_TIMEOUT), andAbortSignal.timeout/AbortControllerrejections. Tunefactor: 4,retries: 4somaxTimeout: 10_000actually engages on the final attempt (back-off becomes 0.5s/2s/8s/10s ≈ 20.5s across 5 attempts).AbortSignal.timeout(EXTERNAL_FETCH_TIMEOUT_MS)(default 30s, env-overridable) to every external fetch infetch.helper.tsandcopilotAPI.manualFetch.response.okin fetch helpers; non-2xx now throws a typedHttpFetchError(status, statusText, url, body — JSON-parsed if possible, raw text otherwise) instead of cryptic JSON-parse failures.if (!res) throw new APIError(... 'no response')guards fromintuitAPI.ts(97 LOC) now that helpers throw on non-2xx; route real status throughgetMessageAndCodeFromErrorsoqb_sync_logsrecords 503 as 503 instead of 500.useSettings/useQuickbooksto wrappostFetchercalls in try/catch since helpers now throw.Worst-case op duration: 5 × 30s + 20.5s ≈ 170s, well within the webhook route's
maxDuration = 300.Test plan
yarn vitest run --project unit(72/72 passing — adds 19 cases covering 5xx retries, undici/cause network errors, TimeoutError, fetch.helper response.ok and timeout paths)yarn lint:checkclean (only pre-existing warnings)yarn prettier:checkcleanyarn tsc --noEmitcleanECONNREFUSED, confirm 5 attempts inqb_sync_logsover ~20s before final failure🤖 Generated with Claude Code