Skip to content

[BugFix] Reject OFFSET, WHERE on _score, and script subtrees under filter_type=efficient on vectorSearch()#5374

Merged
mengweieric merged 2 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:fix/vector-search-query-builder-hardening
Apr 22, 2026
Merged

[BugFix] Reject OFFSET, WHERE on _score, and script subtrees under filter_type=efficient on vectorSearch()#5374
mengweieric merged 2 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:fix/vector-search-query-builder-hardening

Conversation

@mengweieric
Copy link
Copy Markdown
Collaborator

@mengweieric mengweieric commented Apr 20, 2026

Summary

  • Reject OFFSET on vectorSearch(). Previously OFFSET silently rewrote the search window and dropped top results.
  • Reject WHERE predicates referencing the synthetic _score column. Previously these pushed a range query on a non-existent field and silently returned 0 rows. Users should use option='min_score=...' instead.
  • Reject script subtrees when filter_type=efficient. Previously WHERE predicates that compile to script queries (arithmetic, function calls, CASE, date math) were embedded under knn.filter, which is incompatible with AOSS/serverless vector collections.

Test plan

  • Unit tests cover all three rejection paths.
  • Integration tests assert 400 status and user-facing error messages.
  • spotlessCheck passes.

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit c723515.

'Diff too large, requires skip by maintainers after manual review'


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@mengweieric mengweieric added bug Something isn't working skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. SQL labels Apr 20, 2026
@mengweieric mengweieric force-pushed the feature/vector-search-p0 branch from 9aa62fc to fa444fe Compare April 20, 2026 22:27
@mengweieric mengweieric added the enhancement New feature or request label Apr 20, 2026
@mengweieric mengweieric force-pushed the fix/vector-search-query-builder-hardening branch 3 times, most recently from 55160f9 to 3945b72 Compare April 21, 2026 18:52
…=efficient

Three bug fixes in VectorSearchQueryBuilder that close silent-data-loss paths
on vectorSearch() relations:

1. OFFSET on vectorSearch(): pushDownLimit now rejects any LogicalLimit with
   a non-zero offset. The parent path would push `from: <offset>` into the
   OpenSearch request, silently shifting the top-k window and dropping the
   highest-scoring results. LogicalSort carries only a count (no offset),
   so no change is needed on the sort path.

2. WHERE on _score: pushDownFilter walks the condition with an
   ExpressionNodeVisitor before compiling to DSL and rejects any
   ReferenceExpression whose attr matches "_score" case-insensitively.
   Previously the synthetic _score column compiled to a range query on a
   non-existent stored field, which returned zero rows at the cluster with
   no error. Users who want a score floor should use option='min_score=...'.

3. Script subtrees under filter_type=efficient: after building the WHERE
   QueryBuilder, the EFFICIENT branch recursively scans for any
   ScriptQueryBuilder (traversing BoolQueryBuilder's must/filter/should/
   mustNot lists and ConstantScoreQueryBuilder wrappers). If found, refuse
   to embed it under knn.filter. AOSS and serverless vector collections
   reject script queries inside knn.filter; without this guard the user
   saw an opaque cluster-side failure instead of a clear plan-time error.

Unit tests cover each rejection (including the compound predicate path for
_score, the uppercase _SCORE bypass attempt, and the POST-mode negative
case for script predicates). Integration tests assert 400 status and the
user-facing message from the SQL endpoint, including the combined sort +
limit + offset shape that exercises the pushDownSort() path.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric force-pushed the fix/vector-search-query-builder-hardening branch from 3945b72 to 154df52 Compare April 22, 2026 20:56
if (qb instanceof ConstantScoreQueryBuilder) {
return containsScriptQuery(((ConstantScoreQueryBuilder) qb).innerQuery());
}
return false;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The walker handles BoolQueryBuilder and ConstantScoreQueryBuilder, and falls through to return false for any other wrapper type. This is correct today because FilterQueryBuilder (the only source of whereQuery here) only produces those shapes plus leaf queries and ScriptQueryBuilder.

The concern: if FilterQueryBuilder later adds a new wrapper type (e.g. NestedQueryBuilder, FunctionScoreQueryBuilder, DisMaxQueryBuilder), a ScriptQueryBuilder nested inside would silently slip past this guard and be embedded under knn.filter, producing exactly the AOSS failure this PR is trying to prevent.

Two options to make this future-proof:

  1. Fail closed on unknown wrapper types — throw new ExpressionEvaluationException("unexpected QueryBuilder type " + qb.getClass().getSimpleName() + " in filter_type=efficient path; refusing to embed") instead of return false. Verbose, but safe.
  2. Leave the return false but add a unit test on FilterQueryBuilder that locks in "all shapes produced by build() must be leaf | BoolQuery | ConstantScoreQuery | ScriptQuery" so future additions fail loudly in tests.

Either is fine — just noting that the current shape carries a "quietly re-introduces the bug" risk if FilterQueryBuilder evolves.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, updated. Efficient mode now uses a fail-closed validator instead of a best-effort script walker: scripts are rejected, nested
predicates are rejected for the P0 preview, known safe leaf query builders are allowed, known containers are recursively checked, and unknown
query shapes fail closed. Added tests for nested rejection, safe leaf trees, exists, and unknown wrappers.

// defeats the entire point of a relevance-ranked top-k query. The parent path would push
// `from: <offset>` into the OpenSearch request; reject it explicitly so users get a clear
// error instead of surprising result shifts.
if (limit.getOffset() != null && limit.getOffset() != 0) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: limit.getOffset() — is this Integer (boxed) or int (primitive)? If LogicalLimit.getOffset() returns primitive int, the != null check is always true and dead. Worth verifying: if boxed, keep the null-check for defensive coding; if primitive, drop it for readability.

…ent filter

The prior containsScriptQuery walker missed any wrapper type it didn't know about, which risked silently embedding unsupported queries (e.g., NestedQueryBuilder) under knn.filter. Switch to an allow-list validator over FilterQueryBuilder's actual leaf output (term/range/wildcard/match*/query_string/simple_query_string/match_bool_prefix/exists), recurse bool and constant_score, reject script and nested with targeted messages, and fail closed on unknown shapes.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric merged commit 01a5d87 into opensearch-project:feature/vector-search-p0 Apr 22, 2026
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants