
fix: use byte-safe indexing in style encapsulation to prevent UTF-8 panics #248

Open
BenjaminDobler wants to merge 2 commits into voidzero-dev:main from BenjaminDobler:fix/utf8-char-boundary-panic

Conversation

@BenjaminDobler
Contributor

Summary

  • Fix panic when CSS selectors contain multibyte UTF-8 characters (e.g. ü, é, box-drawing chars in comments)
  • Five functions in encapsulation.rs used char indices to slice &str, which is indexed by byte offset; the slice panics whenever that index falls inside a multibyte UTF-8 sequence
  • split_by_combinators, find_pseudo_element_start, find_pseudo_class_start: switched from chars().collect() to char_indices().collect() so string slicing uses byte offsets
  • try_scope_pseudo_function_with_context, find_matching_paren: switched from chars[i] to bytes[i] matching, since ( and ) are ASCII and UTF-8 guarantees that ASCII byte values never appear inside a multibyte sequence
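
The two strategies in the list above can be sketched as follows. This is a minimal illustration, not the code from encapsulation.rs: `pseudo_element_start` and `matching_paren` are hypothetical stand-ins that only mirror the idea of the real functions (pairing each char with its byte offset via `char_indices()`, and scanning raw bytes for ASCII delimiters).

```rust
/// Hypothetical helper (not the PR's code): byte offset of the first `::`.
/// Collecting `char_indices()` keeps each char paired with its byte offset,
/// so the returned value is always safe to use for slicing.
fn pseudo_element_start(selector: &str) -> Option<usize> {
    let chars: Vec<(usize, char)> = selector.char_indices().collect();
    for i in 0..chars.len().saturating_sub(1) {
        if chars[i].1 == ':' && chars[i + 1].1 == ':' {
            return Some(chars[i].0); // byte offset, not char index
        }
    }
    None
}

/// Hypothetical helper (not the PR's code): given the byte offset of a `(`,
/// return the byte offset of its matching `)`.
/// Scanning bytes is safe here because `(` and `)` are ASCII, and every byte
/// of a multibyte UTF-8 sequence is >= 0x80.
fn matching_paren(s: &str, open: usize) -> Option<usize> {
    let bytes = s.as_bytes();
    if bytes.get(open) != Some(&b'(') {
        return None;
    }
    let mut depth = 0usize;
    for (i, &b) in bytes.iter().enumerate().skip(open) {
        match b {
            b'(' => depth += 1,
            b')' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    None
}
```

With `.büro::before`, the first `:` is char 5 but byte 6, because `ü` takes two bytes; the sketch returns the byte offset, so `&selector[..6]` yields `.büro` without panicking.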

Reproduction

Any SCSS file with multibyte UTF-8 characters processed through style encapsulation triggers the panic:

thread '<unnamed>' panicked at crates/oxc_angular_compiler/src/styles/encapsulation.rs:2279:37:
byte index 5 is not a char boundary; it is inside '─' (bytes 3..6)
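
To make the message concrete, here is a small sketch using a hypothetical input, `/* ─ */`, chosen only because it places `─` at bytes 3..6 as in the panic above; the actual stylesheet that triggered the panic is not shown in this PR. The helper names are illustrative assumptions; `str::get`, `str::is_char_boundary`, and `char_indices` are standard library methods.

```rust
/// Non-panicking prefix slice: `str::get` returns `None` when `end`
/// is not on a char boundary, where `&s[..end]` would panic.
fn prefix_checked(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

/// Convert a char index into the byte offset where that char starts.
fn char_to_byte_offset(s: &str, char_idx: usize) -> Option<usize> {
    s.char_indices().nth(char_idx).map(|(offset, _)| offset)
}
```

In `/* ─ */`, the char at index 5 is `*`, but byte 5 lies inside the three-byte `─`. Using the char index as a byte offset therefore fails, while `char_to_byte_offset` maps it to byte 7.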

Test plan

  • Added 4 new unit tests covering multibyte UTF-8 in attribute selectors, pseudo-elements, pseudo-classes, and combinator splitting
  • All 60 existing shadow CSS tests pass
  • Full cargo test passes
  • cargo fmt --check passes
  • No new clippy warnings introduced (pre-existing warnings remain)
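
The added tests are not reproduced in this description. As a rough sketch of what a combinator-splitting check with multibyte input could look like, the simplified splitter below is a hypothetical stand-in for `split_by_combinators`: it ignores brackets, parentheses, and quoting, which the real function presumably handles.

```rust
/// Hypothetical, simplified splitter (not the PR's code): splits a selector
/// on combinators and whitespace, slicing only at byte offsets that come
/// from `char_indices()`.
fn split_compounds(selector: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut start = 0usize; // byte offset where the current part begins
    for (offset, ch) in selector.char_indices() {
        if matches!(ch, ' ' | '\n' | '\t' | '\r' | '>' | '+' | '~') {
            if start < offset {
                parts.push(&selector[start..offset]);
            }
            // Skip past the delimiter using its encoded length.
            start = offset + ch.len_utf8();
        }
    }
    if start < selector.len() {
        parts.push(&selector[start..]);
    }
    parts
}
```

A test in this spirit would assert that `.grüße >\n.café` splits into `.grüße` and `.café` without panicking.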

🤖 Generated with Claude Code

BenjaminDobler and others added 2 commits April 25, 2026 15:32
…-8 panics

Several functions in the style encapsulation module used char indices
to slice UTF-8 strings, causing panics on selectors containing
multibyte characters (e.g. `ü`, `é`, `─`). This fixes
`split_by_combinators`, `find_pseudo_element_start`,
`find_pseudo_class_start`, `find_matching_paren`, and
`try_scope_pseudo_function_with_context` to use either
`char_indices()` or byte-level scanning for ASCII-only delimiters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merged upstream's addition of \n, \t, \r to combinator matching
with our byte-safe char_indices() indexing fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>