fix: silently ignore unrecognized trailing alphabetic tokens after pure numbers #286
0xSoftBoi wants to merge 4 commits into
GNU date accepts bare "UT" and "ut" as a synonym for UTC (+0).
parse_datetime rejected them because the abbreviation was absent from
the named-timezone lookup table in timezone_name_to_offset().
Add "ut" => Ok("+0") immediately after the existing "utc" entry and
add a regression test that verifies all four case variants are
accepted and resolve to a UTC-offset-0 instant.
Fixes uutils#280
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
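
As a rough illustration of the change this commit message describes, the lookup can be pictured as a match on the lowercased abbreviation. This is only a sketch: the function name `tz_name_to_offset_sketch`, its signature, and its error type are assumptions made for the example, not the crate's actual `timezone_name_to_offset()`.

```rust
/// Hypothetical stand-in for a named-timezone lookup table.
/// Only the "utc" and "ut" arms reflect the commit message; the name,
/// signature, and error type are assumptions for illustration.
fn tz_name_to_offset_sketch(name: &str) -> Result<&'static str, String> {
    // Lowercasing first is one simple way to accept every case variant.
    let lower = name.to_ascii_lowercase();
    match lower.as_str() {
        "utc" => Ok("+0"),
        // New entry: GNU date treats bare "UT" as a synonym for UTC.
        "ut" => Ok("+0"),
        other => Err(format!("unknown timezone abbreviation: {other}")),
    }
}
```

With a lookup shaped like this, `UT`, `ut`, `Ut`, and `uT` would all resolve to the same zero offset, which is what the regression test is described as checking.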
…re numbers

GNU date accepts inputs like '8j' and '8 j', treating the number as an hour and silently discarding the unrecognized trailing word-token. This commit matches that behaviour.

Implementation:
- Add Item::Noise variant for unrecognized alphabetic tokens
- Add noise_token() as the last alternative in parse_item(), so it only fires after every other parser has failed
- In DateTimeBuilder::try_from, accept Noise only when it directly follows a Pure number item (prev_was_pure guard); reject it anywhere else so that leading garbage (e.g. 'bogus +1 day') and post-date garbage (e.g. '2025-01-01 abcdef') still produce errors
- Add noise_after_pure_number regression test covering both '8j' and '8 j'

Fixes uutils#279
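
The prev_was_pure guard described in this commit message can be sketched roughly as follows. The `Item` enum here is a simplified stand-in with string payloads, and `check_noise` is a hypothetical helper; the real `DateTimeBuilder::try_from` is assumed to do considerably more than this.

```rust
/// Simplified stand-in for the parser's item type (payloads are assumptions).
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Pure(String),  // bare digits such as "8" or "1230"
    Date(String),  // placeholder for any recognized non-pure item
    Noise(String), // unrecognized alphabetic token
}

/// Hypothetical builder-side check: a Noise item is dropped silently only
/// when the item immediately before it was a Pure number; otherwise the
/// whole input is rejected.
fn check_noise(items: &[Item]) -> Result<Vec<Item>, String> {
    let mut kept = Vec::new();
    let mut prev_was_pure = false;
    for item in items {
        match item {
            Item::Noise(word) => {
                if !prev_was_pure {
                    return Err(format!("unexpected token: {word}"));
                }
                // Discard the noise; a second noise word in a row is rejected.
                prev_was_pure = false;
            }
            other => {
                prev_was_pure = matches!(other, Item::Pure(_));
                kept.push(other.clone());
            }
        }
    }
    Ok(kept)
}
```

Under these assumptions, `[Pure("8"), Noise("j")]` reduces to `[Pure("8")]`, while a leading `Noise("bogus")` or a `Noise` that follows a `Date` yields an error.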
Merging this PR will degrade performance by 11.2%

| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ❌ | parse_ctime_format | 66.8 µs | 75.2 µs | -11.2% |

Comparing 0xSoftBoi:fix/ignore-trailing-noise-tokens (06267b3) with main (53ed79b)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #286 +/- ##
=======================================
Coverage 99.30% 99.31%
=======================================
Files 20 21 +1
Lines 3894 3942 +48
Branches 122 123 +1
=======================================
+ Hits 3867 3915 +48
Misses 26 26
Partials 1 1
please run cargo fmt
Review bump on this PR. I see the current blockers are the formatting failure and the CodSpeed regression flag; I can follow up on the formatting immediately, but I’d still appreciate a maintainer read on whether the GNU-compat behavior for trailing noise-after-pure-number is acceptable before this drifts.
The original noise-token alt-branch made every invalid alphabetic input take an extra parse-item iteration plus a builder-level rejection, which showed up as a -13.27% regression on the parse_invalid_input bench.

Restructure: drop the global Item::Noise variant and instead absorb trailing GNU-compat noise inside parse_item only after a Pure item was matched, gated on a cheap peek that confirms the next token is not a real item (datetime/date/time/relative/weekday/offset/pure).

This keeps the hot invalid-input and weekday paths identical to main (no extra alt branch), while still passing all noise_after_pure_number cases:
- 8j -> 08:00:00
- 8 j -> 08:00:00
- 1230foo -> 12:30
- bogus +1 day -> error (leading garbage)
- 2025-01-01 abcdef -> error (noise after non-pure)
- notadate -> error (standalone unrecognized)

All 377 tests pass; cargo fmt and clippy clean.
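
The restructured approach in this commit message might look something like the sketch below, which works on plain string slices. Both function names are hypothetical, and the small keyword list is only a placeholder for the real peek over datetime/date/time/relative/weekday/offset/pure items.

```rust
/// Hypothetical cheap peek: does this word start a real item?
/// A short keyword list stands in for the parser's actual item checks.
fn looks_like_real_item(word: &str) -> bool {
    const KNOWN: &[&str] = &["am", "pm", "day", "days", "hour", "hours", "monday", "utc", "ut"];
    let lower = word.to_ascii_lowercase();
    KNOWN.contains(&lower.as_str())
}

/// Match a leading pure number, then absorb one trailing alphabetic word
/// only if the peek says it is not a real item. Returns the digits and the
/// input that remains for later parsers, or None if there is no number.
fn pure_then_noise(input: &str) -> Option<(&str, &str)> {
    let digits_end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if digits_end == 0 {
        return None; // no pure number, so noise is never absorbed
    }
    let (digits, rest) = input.split_at(digits_end);
    let after_ws = rest.trim_start();
    let word_end = after_ws
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(after_ws.len());
    let word = &after_ws[..word_end];
    if word.is_empty() || looks_like_real_item(word) {
        return Some((digits, rest)); // leave the rest for the real parsers
    }
    Some((digits, &after_ws[word_end..])) // swallow the noise word
}
```

Because the absorption only happens on the path where a pure number has already matched, inputs such as `bogus +1 day` or `notadate` never reach it and fail exactly as they would without this code.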
Pushed 06267b3 to address the CodSpeed regression. Instead of making That keeps the hot invalid-input and weekday paths byte-identical to main (no extra alt branch), so the previous
Summary
GNU date accepts inputs like `8j` and `8 j`, treating the leading number as an hour and silently discarding the unrecognized trailing word-token. `parse_datetime` currently rejects these with `InvalidInput`.

Implementation
- Add an `Item::Noise` variant for unrecognized alphabetic tokens.
- Add `noise_token()` as the last alternative in `parse_item()`, so it only fires after every other parser has already failed.
- In `DateTimeBuilder::try_from`, a Noise item is accepted only when it directly follows a Pure number item (prev_was_pure guard). This ensures `8j` and `8 j` succeed while `bogus +1 day`, `2025-01-01 abcdef`, and `NotADate` still return errors.
- Add a `noise_after_pure_number` regression test.

Test Plan
Fixes #279