Adding Use of arrow's has_true() / has_false() #21806
raushanprabhakar1 wants to merge 1 commit into apache:main
run benchmarks

Benchmark for this request failed.

🤖 Benchmark running (GKE): comparing feat/21784-has-true-has-false (ca7e0a8) to 067ba4b (merge-base) using clickbench_partitioned

🤖 Benchmark running (GKE): comparing feat/21784-has-true-has-false (ca7e0a8) to 067ba4b (merge-base) using tpch

🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch

🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Which issue does this PR close?
Closes #21784
Rationale for this change
Apache Arrow added BooleanArray::has_true() and has_false() so callers can answer “any true/false?” without a full bit count. That can short-circuit and avoid unnecessary work compared to patterns like true_count() == 0 or true_count() > 0.

This PR applies those APIs across DataFusion where the logic is purely existential (or equivalent via null-safe “all true” / “no true” checks), matching the audit suggested in the issue.
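To illustrate why an existential check can be cheaper than a count, here is a minimal sketch over a packed bit buffer. The `Bits` type and its methods are hypothetical stand-ins written for this example; they are not Arrow's implementation, and the sketch ignores nulls entirely.

```rust
/// Hypothetical packed boolean buffer (a stand-in, not Arrow's BooleanArray).
pub struct Bits {
    words: Vec<u64>,
    len: usize,
}

impl Bits {
    /// Packs a slice of bools into 64-bit words; unused padding bits stay zero.
    pub fn from_bools(values: &[bool]) -> Self {
        let mut words = vec![0u64; (values.len() + 63) / 64];
        for (i, &v) in values.iter().enumerate() {
            if v {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        Bits { words, len: values.len() }
    }

    /// Full count: always visits every word.
    pub fn true_count(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }

    /// Existential check: stops at the first non-zero word.
    pub fn has_true(&self) -> bool {
        self.words.iter().any(|&w| w != 0)
    }

    /// Existential check: stops at the first word that is not all ones.
    /// The trailing partial word is compared against a mask of its valid bits.
    pub fn has_false(&self) -> bool {
        let full = self.len / 64;
        if self.words[..full].iter().any(|&w| w != u64::MAX) {
            return true;
        }
        let rem = self.len % 64;
        rem != 0 && self.words[full] != (1u64 << rem) - 1
    }
}
```

In this sketch `has_true()` returns the same answer as `true_count() > 0`, but it can return after inspecting a single word when a set bit appears early, whereas the count always scans the whole buffer.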
What changes are included in this PR?
- Uses has_true() / has_false() (and null_count() where needed), including:
  - array_has, list replace, and the Spark array_contains null-semantics fast path
  - evaluate_selection, binary AND/OR short-circuit, CASE/IN list loops
  - scatter fast paths
  - metadata.rs, has_any_exact_match
- Keeps true_count() / false_count() where an actual count is required (row counts, metrics, selectivity, to_array(n), etc.)
- Imports arrow::array::Array where null_count() is used on BooleanArray in trait-heavy paths

Are these changes tested?
Existing tests cover this behavior; the edits are semantics-preserving refactors (same conditions, cheaper primitives). No new tests were added.
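The "same conditions, cheaper primitives" claim can be sketched as a pair of equivalences. The `NullableBools` type below is a hypothetical model built for this example, not DataFusion or Arrow code. It assumes that `true_count()` counts only non-null true values and that `has_true()` / `has_false()` report non-null values only.

```rust
/// Hypothetical model of a nullable boolean column (None = null).
pub struct NullableBools(pub Vec<Option<bool>>);

impl NullableBools {
    pub fn len(&self) -> usize {
        self.0.len()
    }
    pub fn null_count(&self) -> usize {
        self.0.iter().filter(|v| v.is_none()).count()
    }
    /// Counts non-null true values (assumed semantics).
    pub fn true_count(&self) -> usize {
        self.0.iter().filter(|v| **v == Some(true)).count()
    }
    /// True if any non-null value is true; stops at the first hit.
    pub fn has_true(&self) -> bool {
        self.0.iter().any(|v| *v == Some(true))
    }
    /// True if any non-null value is false; stops at the first hit.
    pub fn has_false(&self) -> bool {
        self.0.iter().any(|v| *v == Some(false))
    }
}

/// "All true", counting form: every slot is a non-null true.
pub fn all_true_counting(a: &NullableBools) -> bool {
    a.true_count() == a.len()
}

/// "All true", existential form: no nulls and no false values.
pub fn all_true_existential(a: &NullableBools) -> bool {
    a.null_count() == 0 && !a.has_false()
}

/// "No true", counting form.
pub fn no_true_counting(a: &NullableBools) -> bool {
    a.true_count() == 0
}

/// "No true", existential form.
pub fn no_true_existential(a: &NullableBools) -> bool {
    !a.has_true()
}
```

Under these assumptions the two forms of each check agree on every input, including empty and all-null columns. The `null_count() == 0` guard is what makes the "all true" rewrite null-safe: a column such as `[true, null]` has no false values but is still not all true.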
Are there any user-facing changes?
No. Behavior should be unchanged; this is an internal performance/clarity improvement.