[BugFix] Rewrite IS NOT NULL / IS NULL to native exists DSL in v2 filter pushdown#5378
Merged
mengweieric merged 2 commits intoopensearch-project:feature/vector-search-p0from Apr 22, 2026
Conversation
The v2 FilterQueryBuilder's luceneQueries map did not register entries for
BuiltinFunctionName.IS_NULL or IS_NOT_NULL, so visitFunction fell through to
buildScriptQuery for these unary predicates. This produced a compounded script
query in the pushed-down DSL, e.g.:
{"script": {"script": {"source": "{\"langType\":\"v2\",\"script\":\"is not null(age)\"}", ...}}}
instead of the native OpenSearch exists query. Script fallback is more
expensive to evaluate, is not supported on AOSS / serverless, and diverges
from what the Calcite path and downstream tooling already produce.
This change adds a new ExistsQuery lucene query class that overrides the
LuceneQuery.canSupport(FunctionExpression) / build(FunctionExpression) pair
to accept a single ReferenceExpression argument (the standard base class
contract is binary {ref, literal}). Two instances are registered in
FilterQueryBuilder, one for each predicate sense, producing:
IS NOT NULL col -> {"exists": {"field": "col"}}
IS NULL col -> {"bool": {"must_not": [{"exists": {"field": "col"}}]}}
matching the Calcite PredicateAnalyzer output for these predicates.
Nested-field predicates are explicitly guarded: ExistsQuery.canSupport
returns false when the argument is a nested() function, mirroring the
equivalent guard in PredicateAnalyzer ("OpenSearch DSL does not handle
IS_NULL / IS_NOT_NULL on nested fields correctly"). When canSupport returns
false the existing default branch in FilterQueryBuilder.visitFunction falls
through to the script query path, preserving the prior behavior for that
edge case.
This closes the tracker row "Pushdown coverage - IS NOT NULL pushes script
instead of exists". The change affects all v2 SQL queries, not just
vectorSearch table function queries.
Test coverage:
- FilterQueryBuilderTest: updated should_build_script_query_for_unsupported_lucene_query
to two new assertions - should_build_exists_query_for_is_not_null and
should_build_must_not_exists_query_for_is_null - pinning the new DSL shape.
- ExistsPushdownIT (new): explain-plan integration tests asserting that
WHERE col IS NOT NULL pushes as native exists DSL (no "script" key) and
WHERE col IS NULL pushes as bool/must_not[exists].
- AggregationIT (existing) continues to pass, confirming result-level
correctness is preserved for queries like "where int0 IS NOT NULL".
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
ExistsQuery.canSupport() already returns false for nested references, but the default branch of FilterQueryBuilder.visitFunction() has a second dispatch: when canSupport() is false, it checks isNestedPredicate() and, if true, routes to NestedQuery.buildNested(func, query). That helper reads func.getArguments().get(1) on the assumption of a binary predicate shape, so unary IS NULL / IS NOT NULL would throw at runtime. Override isNestedPredicate() to return false in ExistsQuery so the dispatch falls through to the compiled-script path for nested-field predicates. Add unit tests covering both IS NULL and IS NOT NULL to lock in the fallback. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
af73737
into
opensearch-project:feature/vector-search-p0
37 of 38 checks passed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
IS NOT NULL/IS NULLpushdown in the v2FilterQueryBuilderfrom a serialized compounded script query to native OpenSearchexistsDSL.vectorSearch()table-function queries.DSL shape: before / after
Before (both predicates serialized through the script engine):
{"script": {"script": {"source": "{\"langType\":\"v2\",\"script\":\"is not null(age)\"}", ...}}}After:
Native
existsis cheaper to evaluate, supported on AOSS / serverless, and matches what the CalcitePredicateAnalyzeralready emits for these predicates.Design
New
ExistsQuerylucene query class inopensearch/.../storage/script/filter/lucene/overrides theLuceneQuery.canSupport/buildpair to accept a singleReferenceExpressionargument (the base class contract is binary{ref, literal}). Two instances are registered inFilterQueryBuilder.luceneQueries:Nested-field predicates are explicitly guarded:
ExistsQuery.canSupportreturns false when the argument is anested()function. This mirrors the equivalent guard inPredicateAnalyzer("OpenSearch DSL does not handle IS_NULL / IS_NOT_NULL on nested fields correctly"). WhencanSupportreturns false, the default branch inFilterQueryBuilder.visitFunctionfalls through to the script query path, preserving prior behavior for that edge case.Test plan
./gradlew spotlessCheck- clean./gradlew :opensearch:test --tests "*FilterQueryBuilderTest*"- greenshould_build_script_query_for_unsupported_lucene_querywith two focused assertions:should_build_exists_query_for_is_not_nullandshould_build_must_not_exists_query_for_is_null, pinning the new DSL shape../gradlew :integ-test:integTest -Dtests.class="*ExistsPushdownIT*"- greenExistsPushdownITcovers both senses via_explainagainstTEST_INDEX_ACCOUNT, asserting"exists"is present and"script"is absent in the source builder JSON../gradlew :integ-test:integTest -Dtests.class="*AggregationIT*"- greenwhere int0 IS NOT NULLandwhere int0 IS NULLqueries continue to produce correct result-level output.Follow-up: nested unary predicate fallback (commit
aba5e89)Addresses reviewer feedback that
ExistsQuery.canSupport()returningfalsefor nested arguments was not sufficient:FilterQueryBuilder.visitFunction()has a second dispatch that checksisNestedPredicate()whencanSupport()is false, and the inheritedLuceneQuery.isNestedPredicate()returnstruefor any function whosearg[0]is a nested reference. UnaryIS NULL/IS NOT NULLcould therefore still route intoNestedQuery.buildNested(), which readsarg[1]and would throw.The follow-up commit overrides
ExistsQuery.isNestedPredicate()to returnfalse, forcing the script-query fallback for nested-field predicates. Two new regression unit tests (should_fallback_to_script_for_nested_is_not_null,should_fallback_to_script_for_nested_is_null) lock in the fallback inFilterQueryBuilderTest../gradlew :opensearch:test --tests "*FilterQueryBuilderTest*"- green (including two new nested fallback tests)./gradlew :integ-test:integTest -Dtests.class="*ExistsPushdownIT*"- green