
[fix](be) Fix ANN query vector extraction to handle all constant expression types#62637

Merged
yiguolei merged 6 commits into apache:master from zhiqiang-hhhh:fix-ann-const-literal-extract
Apr 29, 2026

Conversation

@zhiqiang-hhhh
Contributor

Proposed changes

Fix extract_query_vector in ann_topn_runtime.cpp to handle all constant expression types instead of only VArrayLiteral and VCastExpr.

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

The extract_query_vector function previously checked whether the query vector argument was specifically a VArrayLiteral or VCastExpr via dynamic_pointer_cast. When FE produced other constant expression forms (e.g., after expression rewrites), ANN index queries would fail with InvalidArgument: Constant must be ArrayLiteral or CAST to array.

The fix removes the rigid type check and instead:

  1. Calls get_const_col() to materialize the constant expression
  2. Unwraps ColumnConst via check_and_get_column<ColumnConst> to get the underlying data column
  3. Validates the resulting column shape (Nullable → Array → Float32)

This approach works for any constant expression that materializes to an array column, regardless of the expression node type.
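
The three steps above can be sketched in isolation. This is a minimal, self-contained model and not the Doris code: `Column`, `ConstColumn`, `NullableColumn`, `ArrayColumn`, `Float32Column`, `ExtractResult`, `extract_query_vector_sketch`, and `make_const_float_array` are all hypothetical stand-ins, and the input column plays the role of whatever `get_const_col()` materializes in step 1. Rejecting a NULL vector is an assumption made for the sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified column model; these are NOT the Doris classes.
struct Column {
    virtual ~Column() = default;
};
using ColumnPtr = std::shared_ptr<const Column>;

struct Float32Column : Column {
    std::vector<float> data;
};
struct ArrayColumn : Column {
    ColumnPtr elements;  // element column of the single array row
};
struct NullableColumn : Column {
    ColumnPtr nested;
    bool is_null = false;  // null flag of the single row
};
struct ConstColumn : Column {
    ColumnPtr data;  // one-row column that the constant repeats
};

struct ExtractResult {
    bool ok = false;
    std::string error;
    std::vector<float> vec;
};

inline ExtractResult fail(std::string msg) {
    ExtractResult r;
    r.error = std::move(msg);
    return r;
}

// `materialized` stands in for the column produced in step 1.
ExtractResult extract_query_vector_sketch(ColumnPtr materialized) {
    if (!materialized) return fail("query vector is not a constant expression");

    // Step 2: peel off the const wrapper, if present.
    ColumnPtr col = materialized;
    if (auto* c = dynamic_cast<const ConstColumn*>(col.get())) {
        ColumnPtr inner = c->data;
        col = inner;
    }

    // Step 3: validate the shape Nullable -> Array -> Float32.
    auto* nullable = dynamic_cast<const NullableColumn*>(col.get());
    if (nullable == nullptr) return fail("expected a nullable column");
    // Assumption for this sketch: a NULL query vector is rejected.
    if (nullable->is_null) return fail("query vector is NULL");
    auto* array = dynamic_cast<const ArrayColumn*>(nullable->nested.get());
    if (array == nullptr) return fail("expected an array column");
    auto* floats = dynamic_cast<const Float32Column*>(array->elements.get());
    if (floats == nullptr) return fail("expected float32 elements");

    ExtractResult r;
    r.ok = true;
    r.vec = floats->data;
    return r;
}

// Helper that builds Const(Nullable(Array(Float32))) from plain values.
ColumnPtr make_const_float_array(std::vector<float> values) {
    auto floats = std::make_shared<Float32Column>();
    floats->data = std::move(values);
    auto array = std::make_shared<ArrayColumn>();
    array->elements = floats;
    auto nullable = std::make_shared<NullableColumn>();
    nullable->nested = array;
    auto constant = std::make_shared<ConstColumn>();
    constant->data = nullable;
    return constant;
}
```

The point of the design is that the function only inspects the materialized column, so it never needs to know which expression node produced it. In this model, `extract_query_vector_sketch(make_const_float_array({1.0f, 2.0f}))` succeeds, while passing a bare `Float32Column` fails the shape check.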

Release note

Fix ANN vector index query failure when the query vector expression is not a direct array literal or CAST expression.

Check List (For Author)

  • Test: Regression test
  • Behavior changed: No
  • Does this need documentation: No

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhiqiang-hhhh
Contributor Author

run buildall

HappenLee
HappenLee previously approved these changes Apr 21, 2026
Contributor

@HappenLee HappenLee left a comment

LGTM

@HappenLee
Contributor

It would be better to add a regression test to verify that this works, @zhiqiang-hhhh.

@github-actions github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) Apr 21, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@zhiqiang-hhhh
Contributor Author

run buildall

@github-actions github-actions Bot removed the approved label (Indicates a PR has been approved by one committer.) Apr 21, 2026
@zhiqiang-hhhh
Contributor Author

run buildall

…ession types

### What problem does this PR solve?

Problem Summary: The `extract_query_vector` function in `ann_topn_runtime.cpp` only accepted `VArrayLiteral` or `VCastExpr` as the query vector expression. This caused ANN index queries to fail when FE produced other constant expression types (e.g., `ColumnConst` wrapping). The fix removes the rigid type check and instead uses `get_const_col()` + `ColumnConst` unwrapping, which works for any constant expression that materializes to an array column.

### Release note

Fix ANN vector index query failure when the query vector expression is not a direct array literal or CAST expression.

### Check List (For Author)

- Test: Regression test
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Problem Summary: Add BE unit tests and regression coverage for ANN query vectors produced by constant expressions such as `array_repeat` and `array_with_constant`, and update the NULL-array expectation to match the new extraction behavior.

### Release note

None

### Check List (For Author)

- Test: Unit Test and Regression test
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Fix the ANN query vector unit test build failure caused by a stale MockConstVExpr override that no longer matches the current VExpr interface.

### Release note

None

### Check List (For Author)

- Test: No need to test (compile-only fix; full build not rerun in this session)
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Fix the ANN extract_query_vector unit test compile failure in CI by updating the mock expression to match the current VExpr execution interface.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - Built extract_query_vector_test.cpp object and vector_search_test in be/ut_build_ASAN
- Behavior changed: No
- Does this need documentation: No

@zhiqiang-hhhh force-pushed the fix-ann-const-literal-extract branch from eb2cd14 to 66ba080 on April 23, 2026 08:13
@zhiqiang-hhhh
Contributor Author

run buildall

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Fix the remaining BE UT compile failure in CI by matching MockConstVExpr to the current VExpr selector signature, where execute_column_impl takes const Selector*.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Verified latest TeamCity failure and aligned the mock signature with the VExpr API used in CI
- Behavior changed: No
- Does this need documentation: No
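
The mock fixes described in these commits come down to keeping a test double's virtual override in step with the base-class signature. A minimal sketch of that idea follows. Everything in it is a hypothetical stand-in (`Selector`, `Block`, `ExprBase`, `MockConstExpr`, `execute_impl`, and the `std::string` return type); only the `const Selector*` parameter is taken from the commit message, and the real `VExpr` interface is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; the real Doris types are not modeled.
struct Selector {};
struct Block {};

class ExprBase {
public:
    virtual ~ExprBase() = default;

    // Public entry point that forwards to the overridable hook.
    std::string execute(const Block& block, const Selector* selector) const {
        return execute_impl(block, selector);
    }

protected:
    // The hook takes `const Selector*`; a mock must match this exactly.
    virtual std::string execute_impl(const Block&, const Selector*) const {
        return "base";
    }
};

class MockConstExpr : public ExprBase {
protected:
    // `override` turns a stale signature into a compile error instead of
    // silently declaring an unrelated function that is never called.
    std::string execute_impl(const Block&, const Selector*) const override {
        return "mock";
    }
};
```

If the parameter in `MockConstExpr` were declared as `Selector*` (without `const`), the `override` specifier would make the compiler reject the class, which is the kind of build failure a stale mock produces when the base interface changes.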
@zhiqiang-hhhh
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.33% (20370/38194) |
| Line Coverage | 36.88% (192013/520619) |
| Region Coverage | 33.18% (149271/449848) |
| Branch Coverage | 34.30% (65316/190404) |

@zhiqiang-hhhh
Contributor Author

run cloud_p0

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.22% (26661/37433) |
| Line Coverage | 53.64% (278530/519230) |
| Region Coverage | 46.93% (213705/455328) |
| Branch Coverage | 50.27% (96837/192617) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.22% (26660/37433) |
| Line Coverage | 53.63% (278482/519230) |
| Region Coverage | 46.92% (213620/455328) |
| Branch Coverage | 50.27% (96827/192617) |

Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) Apr 23, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@yiguolei yiguolei merged commit 430781b into apache:master Apr 29, 2026
30 of 32 checks passed
github-actions Bot pushed a commit that referenced this pull request Apr 29, 2026
…ession types (#62637)

@zhiqiang-hhhh zhiqiang-hhhh deleted the fix-ann-const-literal-extract branch April 29, 2026 06:15

Labels

approved (Indicates a PR has been approved by one committer.), dev/4.1.x, reviewed


4 participants