SOLR 18174 AsyncTracker Semaphore permit leak fix (branch_9x) #4292
Open
janhoy wants to merge 6 commits into apache:branch_9x
apache#4236)
Also add metric asyncPermits.available/max
Make max async requests configurable with sysprop solr.solrj.http.jetty.async_requests.max
Cherry-pick of 3792f2d from main, adapted for branch_9x:
- Http2SolrClient instead of HttpJettySolrClient
- LBHttp2SolrClient instead of LBJettySolrClient
- Dropwizard gauge registration instead of OTEL ObservableLongGauge
- Major-changes entry moved to major-changes-in-solr-9.adoc under 9.11
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add missing EnvUtils import in Http2SolrClient
- Fix test: use org.eclipse.jetty.client.api.{Request,Response,Result} (Jetty 10 API)
- Fix test: replace replicasForCollectionAreFullyActive with clusterShape(2,1)
- Fix test: disambiguate LBHttp2SolrClient.Builder varargs with new String[0]
- Fix test: use reflection to call package-private getHttpClient() for timeout recovery
- Add @SuppressForbidden on testSemaphoreLeakOnLBRetry for new reflection usage
- Run gradlew tidy to auto-format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- LBHttp2SolrClient.Builder: use new Endpoint[0] instead of deprecated String[] variant
- clusterShape(2,2): clusterShape counts TOTAL active replicas, not per-shard; 2 shards × 1 replica each = 2 total, so clusterShape(2,2) is correct
Both AsyncTrackerSemaphoreLeakTest tests now pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
janhoy commented Apr 20, 2026
Pull request overview
Backports the SOLR-18174 fix to Solr 9.11 by hardening Http2SolrClient’s async-request tracking to prevent semaphore permit leaks / IO-thread deadlocks, and adds observability + configurability for the async permit limit.
Changes:
- Fixes async-request retry failure handling to complete futures off the Jetty IO thread and adds an idempotency guard to prevent double-acquire on re-queued exchanges.
- Makes the async outstanding-request limit configurable via solr.solrj.http.jetty.async_requests.max and exposes gauges for available/max permits.
- Adds reproduction tests plus documentation and Prometheus exporter mapping for the new metrics.
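
The idempotency guard in the first bullet can be pictured with a small stand-alone sketch. This is not the code in Http2SolrClient; the class name `PermitTrackerSketch` and its method names are hypothetical, and the sketch only assumes that a request object may be queued more than once (for example on retry) while it should hold at most one permit.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: at most one semaphore permit per in-flight request. */
public class PermitTrackerSketch {
  private final int max;
  private final Semaphore permits;
  // Requests that currently hold a permit (keyed by object identity here,
  // because plain Object uses identity-based equals/hashCode).
  private final Set<Object> holders = ConcurrentHashMap.newKeySet();

  public PermitTrackerSketch(int max) {
    this.max = max;
    this.permits = new Semaphore(max);
  }

  /** Called every time the request is queued, including re-queues. */
  public void onQueued(Object request) {
    // Set.add returns false when the request is already tracked,
    // so a re-queued request does not acquire a second permit.
    if (holders.add(request)) {
      permits.acquireUninterruptibly();
    }
  }

  /** Called on completion; safe to call more than once. */
  public void onComplete(Object request) {
    // Release only if this request was actually holding a permit.
    if (holders.remove(request)) {
      permits.release();
    }
  }

  public int availablePermits() {
    return permits.availablePermits();
  }

  public int maxPermits() {
    return max;
  }
}
```

With a cap of 2, queuing the same request twice leaves one permit available, and completing it twice restores the count to 2 rather than 3. The two getters mirror the kind of values a gauge for available/max permits could report.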
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java | AsyncTracker fixes (idempotency guard), IO-thread dispatch on failure, sysprop-based permit cap, and permit getters for metrics/tests. |
| solr/core/src/java/org/apache/solr/handler/component/HttpShardHandlerFactory.java | Registers node-level gauges for async permit max/available. |
| solr/core/src/test/org/apache/solr/handler/component/AsyncTrackerSemaphoreLeakTest.java | Adds tests reproducing the two failure patterns and validating the fix. |
| solr/solr-ref-guide/modules/upgrade-notes/pages/major-changes-in-solr-9.adoc | Documents the new configurability and points to metrics. |
| solr/solr-ref-guide/modules/deployment-guide/pages/metrics-reporting.adoc | Documents the Dropwizard keys and Prometheus metric for async permits. |
| solr/prometheus-exporter/conf/solr-exporter-config.xml | Exports the new async-permit gauges as solr_client_request_async_permits. |
| changelog/unreleased/SOLR-18174-prevent-double-registration.yml | Adds an unreleased changelog entry for the fix/metrics. |
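
As a rough illustration of the sysprop-based permit cap, the sketch below reads the limit from the system property named in this PR and falls back to a default. The PR itself uses Solr's EnvUtils; this sketch substitutes plain `Integer.getInteger` to stay self-contained. The class name and the fallback value of 1000 are assumptions for illustration only, since the actual default is not stated here.

```java
/** Hypothetical sketch of resolving the async permit cap from a system property. */
public class AsyncPermitConfigSketch {
  // Property name taken from the PR text.
  public static final String MAX_PROP = "solr.solrj.http.jetty.async_requests.max";
  // Assumed fallback for illustration; not the value used by Solr.
  public static final int ASSUMED_DEFAULT = 1000;

  public static int resolveMaxPermits() {
    // Integer.getInteger returns null when the property is unset or not a number.
    Integer configured = Integer.getInteger(MAX_PROP);
    // Ignore non-positive values, since a semaphore with no permits would block every request.
    return (configured != null && configured > 0) ? configured : ASSUMED_DEFAULT;
  }
}
```

Under these assumptions, starting the JVM with `-Dsolr.solrj.http.jetty.async_requests.max=64` would make `resolveMaxPermits()` return 64.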
@dsmiley I'd like to get this backport into
https://issues.apache.org/jira/browse/SOLR-18174
Backport of the fix in #4236 to branch_9x, targeting Solr 9.11.
All tests seem to pass, but I have not had the chance to spin up the Solr exporter to test the metric-mapping jq expression. The snippet is AI-generated by Claude and reviewed by Copilot, but please review.