Fix DATE_BIN TIME scaling overflows by centralizing nanosecond conversion#22823
Draft
kosiew wants to merge 8 commits into
Draft
Fix DATE_BIN TIME scaling overflows by centralizing nanosecond conversion#22823kosiew wants to merge 8 commits into
kosiew wants to merge 8 commits into
Conversation
…e64MicrosecondArray - Introduced scalar Time64Microsecond(i64::MAX) overflow reproducer. - Introduced array Time64MicrosecondArray(i64::MAX) overflow reproducer. - Updated tests to catch current panic using catch_unwind.
…timize timestamp handling - Added `value_to_nanos(value, scale)` function. - Refactored to eliminate repeated `timestamp scalar checked_mul` blocks. - Implemented helper for timestamp array scaling. - Left TIME direct multiplies for SUB_ISSUE_03.
- Implemented value_to_nanos for TIME origin scaling. - Updated TIME scalar source scaling to use value_to_nanos. - Modified TIME array source scaling to include value_to_nanos and ArrowError::ComputeError mapping. - Revised overflow repro tests to ensure no panic occurs and handle normal errors appropriately.
…ate_bin - Renamed the overflow helper from `timestamp_scale_overflow_error` to `nanos_scale_overflow_error`. - Updated error message to be more generic: "DATE_BIN value ... cannot be represented in nanoseconds". - Added a new test helper: `invoke_time64_microsecond_date_bin(...)`. - Simplified scalar and array overflow tests by using the new helper.
…to reflect new representation limit
…dling - Restored timestamp overflow message for DATE_BIN source timestamp. - Retained generic TIME/value overflow message for DATE_BIN value. - Updated value_to_nanos to take an error constructor. - Revised timestamp paths to utilize timestamp_scale_overflow_error. - Updated TIME paths to use nanos_scale_overflow_error.
- Removed timestamp_scale_overflow_error from date_bin.rs - Updated value_to_nanos function to only use nanos_scale_overflow_error for scaling - Modified expected DATE_BIN value in date_bin_errors.slt
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #22685
Rationale for this change
DATE_BIN converts timestamp and TIME values to nanoseconds in multiple code paths. While timestamp paths already performed checked multiplication, the logic was duplicated and TIME paths still used direct multiplication that could overflow on extreme inputs.
This change centralizes value-to-nanoseconds scaling in a single helper so overflow handling is consistent across timestamp and TIME scalar/array paths. It also prevents panics or wrapped values when scaling extreme TIME values such as
Time64Microsecond(i64::MAX).What changes are included in this PR?
Added a private helper:
value_to_nanos(value, scale) -> Result<i64>Renamed the overflow helper to the more generic:
nanos_scale_overflow_errorReplaced duplicated timestamp scaling logic with calls to
value_to_nanosin:Replaced direct TIME scaling multiplications with
value_to_nanosin:Converted TIME array scaling failures into
ArrowError::ComputeErrorat the Arrow boundary.Updated overflow error text from:
DATE_BIN source timestamp ... cannot be represented in nanosecondsDATE_BIN value ... cannot be represented in nanosecondsAdded regression tests covering overflow in both scalar and array
Time64Microsecond(i64::MAX)DATE_BIN paths.Are these changes tested?
Yes.
Added the following unit tests:
test_date_bin_time64_microsecond_scalar_scaling_overflow_reproducertest_date_bin_time64_microsecond_array_scaling_overflow_reproducerThese tests verify that DATE_BIN returns a normal error for
Time64Microsecond(i64::MAX)scaling overflow rather than panicking.Also updated SQL logic test expectations to match the generalized overflow error message.
Are there any user-facing changes?
Yes, the overflow error message has been generalized from:
to:
Behavior is otherwise unchanged, except that previously overflow-prone TIME scaling paths now return normal execution errors instead of risking panic or overflow.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.