Optimise bitwise extraction and insertion #787

Open
lijunbie wants to merge 1 commit into QuEST-Kit:devel from lijunbie:optimise-bitwise-bmi2
@lijunbie

Summary

  • Add guarded BMI2 fast paths for insertBits() and getValueOfBits() using _pdep_u64 / _pext_u64 on x86-64 builds compiled with __BMI2__.
  • Keep the existing portable loop fallback for non-BMI2 builds, CUDA/HIP compilation, small fixed-size cases, and getValueOfBits() calls whose bit indices are not in increasing order.
  • Add unit coverage for insertBits() and getValueOfBits() over empty, low-bit, multi-bit, and non-sorted extraction cases.
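
For readers unfamiliar with the approach, the guarded fast paths described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `sketchInsertBits` and `sketchGetValueOfBits`, their signatures, and the use of `std::uint64_t` are assumptions, and the sketch additionally falls back to the loop for unsorted insertion indices. It also assumes every bit index is below 63. Only `_pdep_u64` and `_pext_u64` from `<immintrin.h>` are real intrinsics.

```cpp
#include <cstdint>

// Fast path only on x86-64 host builds compiled with BMI2 enabled
// (never under CUDA/HIP compilation).
#if defined(__BMI2__) && defined(__x86_64__) && !defined(__CUDACC__) && !defined(__HIPCC__)
  #include <immintrin.h>
  #define SKETCH_HAS_BMI2 1
#else
  #define SKETCH_HAS_BMI2 0
#endif

// True when indices are strictly increasing (also true for n == 0 or 1).
inline bool sketchIsIncreasing(const int* inds, int n) {
    for (int j = 1; j < n; ++j)
        if (inds[j] <= inds[j - 1]) return false;
    return true;
}

// Mask with one set bit per index.
inline std::uint64_t sketchMaskOf(const int* inds, int n) {
    std::uint64_t mask = 0;
    for (int j = 0; j < n; ++j) mask |= 1ULL << inds[j];
    return mask;
}

// Hypothetical stand-in for getValueOfBits(): bit inds[j] of `number`
// becomes bit j of the result.
inline std::uint64_t sketchGetValueOfBits(std::uint64_t number, const int* inds, int n) {
#if SKETCH_HAS_BMI2
    // PEXT gathers masked bits in increasing position order, so it only
    // matches the loop when the indices are sorted.
    if (sketchIsIncreasing(inds, n))
        return _pext_u64(number, sketchMaskOf(inds, n));
#endif
    std::uint64_t value = 0;
    for (int j = 0; j < n; ++j)
        value |= ((number >> inds[j]) & 1ULL) << j;
    return value;
}

// Hypothetical stand-in for insertBits(): opens a gap at each index (taken in
// order) and fills it with `bit`, shifting higher bits up by one each time.
inline std::uint64_t sketchInsertBits(std::uint64_t number, const int* inds, int n, int bit) {
#if SKETCH_HAS_BMI2
    if (sketchIsIncreasing(inds, n)) {
        std::uint64_t mask = sketchMaskOf(inds, n);
        // PDEP scatters the original bits into every position NOT in the mask;
        // the masked positions are then filled with the inserted bit value.
        return _pdep_u64(number, ~mask) | (bit ? mask : 0ULL);
    }
#endif
    for (int j = 0; j < n; ++j) {
        int i = inds[j];  // assumed < 63 so that (i + 1) is a valid shift
        std::uint64_t low  = number & ((1ULL << i) - 1);
        std::uint64_t high = (number >> i) << (i + 1);
        number = high | (static_cast<std::uint64_t>(bit & 1) << i) | low;
    }
    return number;
}
```

With indices `{0, 2}`, inserting 1-bits into `0b101` yields `0b10111` on either path. Extracting indices `{3, 5}` from `0b1100` gives 1, while the unsorted `{5, 3}` gives 2, which is why the unsorted case has to stay on the loop.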

Closes #717

Validation

  • Ran git diff --check.
  • Ran an independent 64-bit equivalence check over 261 sample cases comparing the BMI2-style algorithms against the existing loop semantics.
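
An equivalence check of this kind might look like the sketch below. It is not the script that was actually run: `softPext`, `softPdep`, `loopExtract`, `loopInsert` and `countMismatches` are hypothetical names, the software emulation of PEXT/PDEP stands in for the hardware instructions so the check runs on any machine, and the random case generation (sorted, distinct indices below 63) is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Software emulation of PEXT: gather the bits of src selected by mask
// into the low bits of the result, lowest mask bit first.
inline std::uint64_t softPext(std::uint64_t src, std::uint64_t mask) {
    std::uint64_t out = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit
        if (src & lowest) out |= bit;
        mask &= mask - 1;                           // clear lowest set bit
    }
    return out;
}

// Software emulation of PDEP: scatter the low bits of src to the
// positions selected by mask, lowest mask bit first.
inline std::uint64_t softPdep(std::uint64_t src, std::uint64_t mask) {
    std::uint64_t out = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1);
        if (src & bit) out |= lowest;
        mask &= mask - 1;
    }
    return out;
}

// Reference loop semantics: bit inds[j] of number becomes bit j of the result.
inline std::uint64_t loopExtract(std::uint64_t number, const std::vector<int>& inds) {
    std::uint64_t value = 0;
    for (std::size_t j = 0; j < inds.size(); ++j)
        value |= ((number >> inds[j]) & 1ULL) << j;
    return value;
}

// Reference loop semantics: insert `bit` at each index in turn,
// shifting higher bits up by one (indices assumed < 63).
inline std::uint64_t loopInsert(std::uint64_t number, const std::vector<int>& inds, int bit) {
    for (int i : inds) {
        std::uint64_t low  = number & ((1ULL << i) - 1);
        std::uint64_t high = (number >> i) << (i + 1);
        number = high | (static_cast<std::uint64_t>(bit & 1) << i) | low;
    }
    return number;
}

// Compare both formulations over pseudo-random cases; returns the mismatch count.
inline int countMismatches(int numCases, unsigned seed) {
    std::mt19937_64 rng(seed);
    std::vector<int> pool(63);
    std::iota(pool.begin(), pool.end(), 0);  // candidate indices 0..62
    int mismatches = 0;
    for (int c = 0; c < numCases; ++c) {
        std::uint64_t number = rng();
        int n = static_cast<int>(rng() % 9);  // 0 to 8 indices, includes the empty case
        std::shuffle(pool.begin(), pool.end(), rng);
        std::vector<int> inds(pool.begin(), pool.begin() + n);
        std::sort(inds.begin(), inds.end());  // sorted, distinct indices
        std::uint64_t mask = 0;
        for (int i : inds) mask |= 1ULL << i;
        int bit = static_cast<int>(rng() & 1);
        if (softPext(number, mask) != loopExtract(number, inds)) ++mismatches;
        std::uint64_t fast = softPdep(number, ~mask) | (bit ? mask : 0ULL);
        if (fast != loopInsert(number, inds, bit)) ++mismatches;
    }
    return mismatches;
}
```

A call such as `countMismatches(261, 1u)` should return 0 if the two formulations agree on every generated case. A check like this covers only the algorithmic equivalence, not the compiled intrinsic path or the Catch2 tests.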
  • I could not run the full Catch2/CMake test suite locally because cmake is not installed in this environment.

@TysonRayJones

Member

That's some truly low effort AI slop
