feat: Support RIGHT/FULL joins in NLJ memory-limited execution #21833

Open
viirya wants to merge 1 commit into apache:main from viirya:nlj-right-full-join-spill

@viirya (Member) commented Apr 24, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Previously RIGHT/FULL/RIGHT SEMI/RIGHT ANTI/RIGHT MARK joins were excluded from the memory-limited (multi-pass) fallback path because they need to track which right rows have been matched across all left chunks. They would OOM instead of spilling.
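
To see why deciding per left chunk is not enough, here is a minimal sketch. It uses plain `i32` keys and an equality predicate as stand-ins; both helper functions are illustrative assumptions, not DataFusion code. A right row that only matches a row in a later left chunk would be wrongly reported as unmatched if the decision were made after the first chunk:

```rust
/// Right rows that have no match within ONE left chunk.
/// Illustrative helper, not part of DataFusion.
fn unmatched_in_chunk(left_chunk: &[i32], right: &[i32]) -> Vec<i32> {
    right
        .iter()
        .copied()
        .filter(|r| !left_chunk.contains(r))
        .collect()
}

/// Right rows that have no match in ANY left chunk. This is the set a
/// RIGHT/FULL join has to emit as null-padded rows.
fn unmatched_overall(left_chunks: &[Vec<i32>], right: &[i32]) -> Vec<i32> {
    right
        .iter()
        .copied()
        .filter(|r| !left_chunks.iter().any(|chunk| chunk.contains(r)))
        .collect()
}
```

With left chunks `[1, 2]` and `[3, 4]` and right rows `[2, 3, 5]`, the per-chunk check flags `3` after the first chunk even though it matches in the second; only `5` is unmatched overall.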

Now all join types support the fallback:

  • A global right bitmap (Vec<BooleanBuffer>, indexed by right batch sequence number) accumulates matches across all left chunk passes. ReplayableStreamSource guarantees consistent batch boundaries across passes, so batch sequence numbers are stable.

  • In memory-limited mode, EmitRightUnmatched merges the current batch's bitmap into the global accumulator (bitwise OR) instead of emitting unmatched rows immediately.

  • After the last left chunk, a new state EmitGlobalRightUnmatched replays the right side one more time and uses the accumulated bitmap to emit unmatched right rows correctly.
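
The three steps above can be sketched roughly as follows. This is a simplified model, not the PR's code: `Vec<Vec<bool>>` stands in for `Vec<BooleanBuffer>` so the example needs no external crates, the join predicate is plain `i32` equality, and the names `GlobalRightBitmap` and `unmatched_right_rows` are hypothetical.

```rust
/// Stand-in for the global right bitmap: one Vec<bool> per right batch,
/// indexed by right batch sequence number. (Assumption: plain Vec<bool>
/// instead of an Arrow BooleanBuffer.)
struct GlobalRightBitmap {
    per_batch: Vec<Vec<bool>>,
}

impl GlobalRightBitmap {
    fn new() -> Self {
        Self { per_batch: Vec::new() }
    }

    /// OR-merge the bitmap computed for `batch_seq` during the current pass.
    fn merge(&mut self, batch_seq: usize, local: &[bool]) {
        if self.per_batch.len() <= batch_seq {
            self.per_batch.resize(batch_seq + 1, Vec::new());
        }
        let global = &mut self.per_batch[batch_seq];
        if global.is_empty() {
            global.resize(local.len(), false);
        }
        // Relies on batch boundaries being identical on every replay.
        assert_eq!(global.len(), local.len(), "batch boundaries changed between passes");
        for (g, l) in global.iter_mut().zip(local) {
            *g |= *l;
        }
    }
}

/// One pass over all right batches per left chunk, then a final replay of
/// the right side that emits the rows no left chunk ever matched.
fn unmatched_right_rows(left_chunks: &[Vec<i32>], right_batches: &[Vec<i32>]) -> Vec<i32> {
    let mut global = GlobalRightBitmap::new();
    for chunk in left_chunks {
        for (seq, batch) in right_batches.iter().enumerate() {
            // Per-pass bitmap: which rows of this batch matched this chunk.
            let local: Vec<bool> = batch.iter().map(|r| chunk.contains(r)).collect();
            global.merge(seq, &local);
        }
    }
    // Final replay: consult the accumulated bitmap, not the per-pass one.
    let mut out = Vec::new();
    for (seq, batch) in right_batches.iter().enumerate() {
        let bits = global.per_batch.get(seq);
        for (i, row) in batch.iter().enumerate() {
            let matched = bits.and_then(|b| b.get(i).copied()).unwrap_or(false);
            if !matched {
                out.push(*row);
            }
        }
    }
    out
}
```

The length assertion in `merge` is where the sketch depends on stable batch boundaries: if a replay split the right side differently, a sequence number would no longer identify the same rows.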

Single-pass behavior is unchanged: the global bitmap path is only active when is_memory_limited() is true.
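
As a rough illustration of that gating, the sketch below branches on a `memory_limited` flag once a right batch has been fully probed. The enum and function names are hypothetical, and the bitmap is again modelled as `Vec<bool>` rather than the operator's real types:

```rust
/// What happens once a right batch has been fully probed.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum RightBatchOutcome {
    /// Single-pass: indices of unmatched rows in this batch, emitted right away.
    EmitUnmatched(Vec<usize>),
    /// Memory-limited: nothing emitted yet; matches were folded into the
    /// global bitmap for this batch.
    Deferred,
}

fn finish_right_batch(
    memory_limited: bool,
    local: &[bool],
    global: &mut Vec<bool>,
) -> RightBatchOutcome {
    if memory_limited {
        if global.is_empty() {
            global.resize(local.len(), false);
        }
        // Bitwise OR of this pass's matches into the accumulator.
        for (g, l) in global.iter_mut().zip(local) {
            *g |= *l;
        }
        RightBatchOutcome::Deferred
    } else {
        // Single-pass path: the global bitmap is never touched.
        let unmatched = local
            .iter()
            .enumerate()
            .filter(|(_, matched)| !**matched)
            .map(|(i, _)| i)
            .collect();
        RightBatchOutcome::EmitUnmatched(unmatched)
    }
}
```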

Co-authored-by: Claude Code

Are these changes tested?

Yes, with unit tests and e2e tests.

Are there any user-facing changes?

No

@github-actions (bot) added the sqllogictest (SQL Logic Tests (.slt)) and physical-plan (Changes to the physical-plan crate) labels Apr 24, 2026
@viirya marked this pull request as draft April 24, 2026 15:52
@viirya force-pushed the nlj-right-full-join-spill branch 3 times, most recently from 75e43cf to eb80057, April 24, 2026 17:38
@viirya force-pushed the nlj-right-full-join-spill branch from eb80057 to 57c8cb8, April 24, 2026 17:41
@viirya marked this pull request as ready for review April 24, 2026 18:07