Skip to content

[BugFix] Rewrite IS NOT NULL / IS NULL to native exists DSL in v2 filter pushdown#5378

Merged
mengweieric merged 2 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:fix/filter-is-not-null-exists
Apr 22, 2026
Merged

[BugFix] Rewrite IS NOT NULL / IS NULL to native exists DSL in v2 filter pushdown#5378
mengweieric merged 2 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:fix/filter-is-not-null-exists

Conversation

@mengweieric
Copy link
Copy Markdown
Collaborator

@mengweieric mengweieric commented Apr 21, 2026

Summary

  • Rewrites IS NOT NULL / IS NULL pushdown in the v2 FilterQueryBuilder from a serialized compounded script query to native OpenSearch exists DSL.
  • Closes tracker row "Pushdown coverage - IS NOT NULL pushes script instead of exists".
  • Affects all v2 SQL queries, not just vectorSearch() table-function queries.

DSL shape: before / after

Before (both predicates serialized through the script engine):

{"script": {"script": {"source": "{\"langType\":\"v2\",\"script\":\"is not null(age)\"}", ...}}}

After:

IS NOT NULL col  ->  {"exists": {"field": "col"}}
IS NULL col      ->  {"bool": {"must_not": [{"exists": {"field": "col"}}]}}

Native exists is cheaper to evaluate, supported on AOSS / serverless, and matches what the Calcite PredicateAnalyzer already emits for these predicates.

Design

New ExistsQuery lucene query class in opensearch/.../storage/script/filter/lucene/ overrides the LuceneQuery.canSupport / build pair to accept a single ReferenceExpression argument (the base class contract is binary {ref, literal}). Two instances are registered in FilterQueryBuilder.luceneQueries:

.put(BuiltinFunctionName.IS_NULL.getName(), new ExistsQuery(true /* negated */))
.put(BuiltinFunctionName.IS_NOT_NULL.getName(), new ExistsQuery(false))

Nested-field predicates are explicitly guarded: ExistsQuery.canSupport returns false when the argument is a nested() function. This mirrors the equivalent guard in PredicateAnalyzer ("OpenSearch DSL does not handle IS_NULL / IS_NOT_NULL on nested fields correctly"). When canSupport returns false, the default branch in FilterQueryBuilder.visitFunction falls through to the script query path, preserving prior behavior for that edge case.

Test plan

  • ./gradlew spotlessCheck - clean
  • ./gradlew :opensearch:test --tests "*FilterQueryBuilderTest*" - green
    • Replaced should_build_script_query_for_unsupported_lucene_query with two focused assertions: should_build_exists_query_for_is_not_null and should_build_must_not_exists_query_for_is_null, pinning the new DSL shape.
  • ./gradlew :integ-test:integTest -Dtests.class="*ExistsPushdownIT*" - green
    • New IT ExistsPushdownIT covers both senses via _explain against TEST_INDEX_ACCOUNT, asserting "exists" is present and "script" is absent in the source builder JSON.
  • ./gradlew :integ-test:integTest -Dtests.class="*AggregationIT*" - green
    • Regression check: existing where int0 IS NOT NULL and where int0 IS NULL queries continue to produce correct result-level output.

Follow-up: nested unary predicate fallback (commit aba5e89)

Addresses reviewer feedback that ExistsQuery.canSupport() returning false for nested arguments was not sufficient: FilterQueryBuilder.visitFunction() has a second dispatch that checks isNestedPredicate() when canSupport() is false, and the inherited LuceneQuery.isNestedPredicate() returns true for any function whose arg[0] is a nested reference. Unary IS NULL / IS NOT NULL could therefore still route into NestedQuery.buildNested(), which reads arg[1] and would throw.

The follow-up commit overrides ExistsQuery.isNestedPredicate() to return false, forcing the script-query fallback for nested-field predicates. Two new regression unit tests (should_fallback_to_script_for_nested_is_not_null, should_fallback_to_script_for_nested_is_null) lock in the fallback in FilterQueryBuilderTest.

  • ./gradlew :opensearch:test --tests "*FilterQueryBuilderTest*" - green (including two new nested fallback tests)
  • ./gradlew :integ-test:integTest -Dtests.class="*ExistsPushdownIT*" - green

The v2 FilterQueryBuilder's luceneQueries map did not register entries for
BuiltinFunctionName.IS_NULL or IS_NOT_NULL, so visitFunction fell through to
buildScriptQuery for these unary predicates. This produced a compounded script
query in the pushed-down DSL, e.g.:

    {"script": {"script": {"source": "{\"langType\":\"v2\",\"script\":\"is not null(age)\"}", ...}}}

instead of the native OpenSearch exists query. Script fallback is more
expensive to evaluate, is not supported on AOSS / serverless, and diverges
from what the Calcite path and downstream tooling already produce.

This change adds a new ExistsQuery lucene query class that overrides the
LuceneQuery.canSupport(FunctionExpression) / build(FunctionExpression) pair
to accept a single ReferenceExpression argument (the standard base class
contract is binary {ref, literal}). Two instances are registered in
FilterQueryBuilder, one for each predicate sense, producing:

    IS NOT NULL col  ->  {"exists": {"field": "col"}}
    IS NULL col      ->  {"bool": {"must_not": [{"exists": {"field": "col"}}]}}

matching the Calcite PredicateAnalyzer output for these predicates.

Nested-field predicates are explicitly guarded: ExistsQuery.canSupport
returns false when the argument is a nested() function, mirroring the
equivalent guard in PredicateAnalyzer ("OpenSearch DSL does not handle
IS_NULL / IS_NOT_NULL on nested fields correctly"). When canSupport returns
false the existing default branch in FilterQueryBuilder.visitFunction falls
through to the script query path, preserving the prior behavior for that
edge case.

This closes the tracker row "Pushdown coverage - IS NOT NULL pushes script
instead of exists". The change affects all v2 SQL queries, not just
vectorSearch table function queries.

Test coverage:
- FilterQueryBuilderTest: updated should_build_script_query_for_unsupported_lucene_query
  to two new assertions - should_build_exists_query_for_is_not_null and
  should_build_must_not_exists_query_for_is_null - pinning the new DSL shape.
- ExistsPushdownIT (new): explain-plan integration tests asserting that
  WHERE col IS NOT NULL pushes as native exists DSL (no "script" key) and
  WHERE col IS NULL pushes as bool/must_not[exists].
- AggregationIT (existing) continues to pass, confirming result-level
  correctness is preserved for queries like "where int0 IS NOT NULL".

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric added bug Something isn't working enhancement New feature or request SQL skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. labels Apr 21, 2026
ExistsQuery.canSupport() already returns false for nested references, but
the default branch of FilterQueryBuilder.visitFunction() has a second
dispatch: when canSupport() is false, it checks isNestedPredicate() and, if
true, routes to NestedQuery.buildNested(func, query). That helper reads
func.getArguments().get(1) on the assumption of a binary predicate shape,
so unary IS NULL / IS NOT NULL would throw at runtime.

Override isNestedPredicate() to return false in ExistsQuery so the dispatch
falls through to the compiled-script path for nested-field predicates. Add
unit tests covering both IS NULL and IS NOT NULL to lock in the fallback.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric merged commit af73737 into opensearch-project:feature/vector-search-p0 Apr 22, 2026
37 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant