fix: silently ignore unrecognized trailing alphabetic tokens after pure numbers by 0xSoftBoi · Pull Request #286 · uutils/parse_datetime

0xSoftBoi · 2026-04-06T01:08:41Z

Summary

GNU date accepts inputs like 8j and 8 j, treating the leading number as an hour and silently discarding the unrecognized trailing word-token. parse_datetime currently rejects these with InvalidInput.
This PR matches GNU behavior for these cases while keeping all existing error paths intact.

Implementation

Added Item::Noise variant for unrecognized alphabetic tokens.
Added noise_token() as the last alternative in parse_item(), so it only fires after every other parser has already failed.
In DateTimeBuilder::try_from, a Noise item is accepted only when it directly follows a Pure number item (prev_was_pure guard). This ensures 8j and 8 j succeed while bogus +1 day, 2025-01-01 abcdef, and NotADate still return errors.
Added noise_after_pure_number regression test.

Test Plan

All 134+ existing tests pass
New regression test covers both fixed inputs and error cases

Fixes #279

GNU date accepts bare "UT" and "ut" as a synonym for UTC (+0). parse_datetime rejected them because the abbreviation was absent from the named-timezone lookup table in timezone_name_to_offset(). Add "ut" => Ok("+0") immediately after the existing "utc" entry and add a regression test that verifies all four case variants are accepted and resolve to a UTC-offset-0 instant. Fixes uutils#280 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…re numbers GNU date accepts inputs like '8j' and '8 j', treating the number as an hour and silently discarding the unrecognized trailing word-token. This commit matches that behaviour. Implementation: - Add Item::Noise variant for unrecognized alphabetic tokens - Add noise_token() as the last alternative in parse_item(), so it only fires after every other parser has failed - In DateTimeBuilder::try_from, accept Noise only when it directly follows a Pure number item (prev_was_pure guard); reject it anywhere else so that leading garbage (e.g. 'bogus +1 day') and post-date garbage (e.g. '2025-01-01 abcdef') still produce errors - Add noise_after_pure_number regression test covering both '8j' and '8 j' Fixes uutils#279

codspeed-hq · 2026-04-06T05:19:52Z

Merging this PR will degrade performance by 11.2%

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

❌ 1 regressed benchmark
✅ 20 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
❌	`parse_ctime_format`	66.8 µs	75.2 µs	-11.2%

_{Comparing 0xSoftBoi:fix/ignore-trailing-noise-tokens (06267b3) with main (53ed79b)}

codecov · 2026-04-06T05:29:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.31%. Comparing base (7440dd9) to head (06267b3).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #286   +/-   ##
=======================================
  Coverage   99.30%   99.31%           
=======================================
  Files          20       21    +1     
  Lines        3894     3942   +48     
  Branches      122      123    +1     
=======================================
+ Hits         3867     3915   +48     
  Misses         26       26           
  Partials        1        1

Flag	Coverage Δ
macos_latest	`99.31% <100.00%> (+<0.01%)`	⬆️
ubuntu_latest	`99.31% <100.00%> (+<0.01%)`	⬆️
windows_latest	`13.95% <10.00%> (-0.10%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

sylvestre · 2026-04-06T08:47:19Z

please run cargo fmt

tomagsx · 2026-04-06T09:10:08Z

Review bump on this PR. I see the current blockers are the formatting failure and the CodSpeed regression flag; I can follow up on the formatting immediately, but I’d still appreciate a maintainer read on whether the GNU-compat behavior for trailing noise-after-pure-number is acceptable before this drifts.

The original noise-token alt-branch made every invalid alphabetic input take an extra parse-item iteration plus a builder-level rejection, which showed up as a -13.27% regression on the parse_invalid_input bench. Restructure: drop the global Item::Noise variant and instead absorb trailing GNU-compat noise inside parse_item only after a Pure item was matched, gated on a cheap peek that confirms the next token is not a real item (datetime/date/time/relative/weekday/offset/pure). This keeps the hot invalid-input and weekday paths identical to main (no extra alt branch), while still passing all noise_after_pure_number cases: - 8j -> 08:00:00 - 8 j -> 08:00:00 - 1230foo -> 12:30 - bogus +1 day -> error (leading garbage) - 2025-01-01 abcdef -> error (noise after non-pure) - notadate -> error (standalone unrecognized) All 377 tests pass; cargo fmt and clippy clean.

0xSoftBoi · 2026-05-07T14:17:49Z

Pushed 06267b3 to address the CodSpeed regression. Instead of making Item::Noise a global alt branch, noise consumption is now folded into parse_item and only fires after a Pure item, gated on a cheap peek that confirms the next token is not a real item (timezone/offset/etc., e.g. BRT in 8 BRT).

That keeps the hot invalid-input and weekday paths byte-identical to main (no extra alt branch), so the previous -13.27% flag on parse_invalid_input should clear. All 377 tests still pass and the original noise_after_pure_number cases (8j, 8 j, 1230foo, plus the negative cases) are unchanged. cc @sylvestre

0xSoftBoi and others added 2 commits April 5, 2026 21:08

0xSoftBoi changed the title ~~fix(offset): accept ut / UT as a UTC timezone abbreviation~~ fix: silently ignore unrecognized trailing alphabetic tokens after pure numbers Apr 6, 2026

style: run rustfmt on stacked UT/noise parser fixes

118bb5a

0xSoftBoi mentioned this pull request Apr 29, 2026

fix: support HH:MM am/pm time formats in combined date-time parsing #284

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: silently ignore unrecognized trailing alphabetic tokens after pure numbers#286

fix: silently ignore unrecognized trailing alphabetic tokens after pure numbers#286
0xSoftBoi wants to merge 4 commits into
uutils:mainfrom
0xSoftBoi:fix/ignore-trailing-noise-tokens

0xSoftBoi commented Apr 6, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented Apr 6, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 6, 2026 •

edited

Loading

Uh oh!

sylvestre commented Apr 6, 2026

Uh oh!

tomagsx commented Apr 6, 2026

Uh oh!

0xSoftBoi commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

0xSoftBoi commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Implementation

Test Plan

Uh oh!

codspeed-hq Bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will degrade performance by 11.2%

Performance Changes

Uh oh!

codecov Bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sylvestre commented Apr 6, 2026

Uh oh!

tomagsx commented Apr 6, 2026

Uh oh!

0xSoftBoi commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

0xSoftBoi commented Apr 6, 2026 •

edited

Loading

codspeed-hq Bot commented Apr 6, 2026 •

edited

Loading

codecov Bot commented Apr 6, 2026 •

edited

Loading