
SOLR 18174 AsyncTracker Semaphore permit leak fix (branch_9x)#4292

Open
janhoy wants to merge 6 commits into apache:branch_9x from janhoy:SOLR-18174-backport-branch9x

Conversation

@janhoy
Contributor

@janhoy janhoy commented Apr 19, 2026

https://issues.apache.org/jira/browse/SOLR-18174

Backport of the fix in #4236 to branch_9x, targeting Solr 9.11.

  • Backported the deadlock fixes and their test
  • Backported configurable max permits sysprop
  • Backported the metric and added it to SolrExporter, using the same Prometheus metric name as in main
  • Documented and flagged in major-changes-in-solr-9 under the 9.11 heading
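
For readers unfamiliar with the pattern, the configurable permit cap can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `PermitLimiter` and `ASSUMED_DEFAULT_MAX` are hypothetical names, the fallback value of 1000 is an assumption made for illustration, and only the property name `solr.solrj.http.jetty.async_requests.max` comes from the commit message in this PR.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical stand-in for an async-request limiter; not Solr's actual AsyncTracker. */
final class PermitLimiter {
  // Property name as given in this PR's commit message.
  static final String MAX_PROP = "solr.solrj.http.jetty.async_requests.max";
  // Assumed fallback for this sketch only; the real default is not stated in this PR text.
  static final int ASSUMED_DEFAULT_MAX = 1000;

  private final int maxPermits;
  private final Semaphore permits;

  PermitLimiter() {
    // Integer.getInteger reads a system property and returns the fallback
    // when the property is missing or is not a valid integer.
    int configured = Integer.getInteger(MAX_PROP, ASSUMED_DEFAULT_MAX);
    // Guard against zero or negative values, which would block every request.
    this.maxPermits = configured > 0 ? configured : ASSUMED_DEFAULT_MAX;
    this.permits = new Semaphore(maxPermits);
  }

  /** Non-blocking acquire; returns false when the cap is reached. */
  boolean tryAcquire() {
    return permits.tryAcquire();
  }

  void release() {
    permits.release();
  }

  /** Values a gauge could report as "max" and "available". */
  int maxPermits() {
    return maxPermits;
  }

  int availablePermits() {
    return permits.availablePermits();
  }
}
```

With this shape, starting the JVM with `-Dsolr.solrj.http.jetty.async_requests.max=3` would cap the limiter at three outstanding requests, and the two getters give a metrics layer something to poll.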

All tests seem to pass, but I have not had the chance to spin up the Solr exporter to test the jq metric mapping. That snippet was AI-generated by Claude and reviewed by Copilot, but please review it.

janhoy and others added 3 commits April 20, 2026 00:38
apache#4236)

Also add metric asyncPermits.available/max
Make max async requests configurable with sysprop solr.solrj.http.jetty.async_requests.max

Cherry-pick of 3792f2d from main, adapted for branch_9x:
- Http2SolrClient instead of HttpJettySolrClient
- LBHttp2SolrClient instead of LBJettySolrClient
- Dropwizard gauge registration instead of OTEL ObservableLongGauge
- Major-changes entry moved to major-changes-in-solr-9.adoc under 9.11

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add missing EnvUtils import in Http2SolrClient
- Fix test: use org.eclipse.jetty.client.api.{Request,Response,Result} (Jetty 10 API)
- Fix test: replace replicasForCollectionAreFullyActive with clusterShape(2,1)
- Fix test: disambiguate LBHttp2SolrClient.Builder varargs with new String[0]
- Fix test: use reflection to call package-private getHttpClient() for timeout recovery
- Add @SuppressForbidden on testSemaphoreLeakOnLBRetry for new reflection usage
- Run gradlew tidy to auto-format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- LBHttp2SolrClient.Builder: use new Endpoint[0] instead of deprecated String[] variant
- clusterShape(2,2): clusterShape counts TOTAL active replicas, not per-shard;
  2 shards × 1 replica each = 2 total, so clusterShape(2,2) is correct

Both AsyncTrackerSemaphoreLeakTest tests now pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the documentation, client:solrj, tests, and cat:search labels Apr 19, 2026
@janhoy janhoy changed the title from "SOLR 18174 AsyncTracker Semaphore permit leak fix (branch_9x(" to "SOLR 18174 AsyncTracker Semaphore permit leak fix (branch_9x)" Apr 20, 2026
Contributor

Copilot AI left a comment

Pull request overview

Backports the SOLR-18174 fix to Solr 9.11 by hardening Http2SolrClient’s async-request tracking to prevent semaphore permit leaks / IO-thread deadlocks, and adds observability + configurability for the async permit limit.
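
As a rough illustration of why moving completion off the IO thread matters: a non-async callback attached to a `CompletableFuture` runs on whichever thread completes it, so completing on an IO thread also runs any blocking callback there. The sketch below uses only the JDK; `OffThreadCompletion`, `failOffCallerThread`, and the `completion-worker` thread name are hypothetical and do not appear in the PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; not the Http2SolrClient implementation. */
final class OffThreadCompletion {

  /** Hands the failure to an executor instead of completing on the calling thread. */
  static <T> void failOffCallerThread(
      CompletableFuture<T> future, Throwable cause, Executor executor) {
    executor.execute(() -> future.completeExceptionally(cause));
  }

  /** Returns the name of the thread that ran the dependent callback. */
  static String completingThreadName() {
    ExecutorService worker =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "completion-worker"));
    try {
      CompletableFuture<String> response = new CompletableFuture<>();
      // Registered before completion, so this callback runs on the completing thread.
      CompletableFuture<String> observed =
          response.handle((value, error) -> Thread.currentThread().getName());
      failOffCallerThread(response, new RuntimeException("simulated retry failure"), worker);
      // handle() turns the failure into a normal value, so join() does not throw.
      return observed.join();
    } finally {
      worker.shutdown();
    }
  }
}
```

Here the callback observes the worker thread rather than the caller, which is the property a fix of this kind relies on: the thread that detected the failure stays free to keep servicing IO.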

Changes:

  • Fixes async-request retry failure handling to complete futures off the Jetty IO thread and adds an idempotency guard to prevent double-acquire on re-queued exchanges.
  • Makes the async outstanding-request limit configurable via solr.solrj.http.jetty.async_requests.max and exposes gauges for available/max permits.
  • Adds reproduction tests plus documentation and Prometheus exporter mapping for the new metrics.
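
The idempotency guard mentioned in the first bullet can be sketched in plain Java. This is an assumption-laden illustration rather than the PR's code: `GuardedTracker`, `Tracked`, `onQueued`, and `onComplete` are hypothetical names, and the sketch assumes queue and completion events for a single request arrive sequentially.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a permit tracker with an idempotency guard. */
final class GuardedTracker {
  private final Semaphore permits;

  GuardedTracker(int maxPermits) {
    this.permits = new Semaphore(maxPermits);
  }

  /** Per-request state: records whether this request currently holds a permit. */
  static final class Tracked {
    final AtomicBoolean holdsPermit = new AtomicBoolean(false);
  }

  /**
   * Called each time the request is queued, including re-queues on retry.
   * Returns false only when no permit is available.
   */
  boolean onQueued(Tracked request) {
    // The flag flips only once, so a re-queued request does not take a second permit.
    if (!request.holdsPermit.compareAndSet(false, true)) {
      return true; // already counted
    }
    if (permits.tryAcquire()) {
      return true;
    }
    request.holdsPermit.set(false); // roll back: nothing was acquired
    return false;
  }

  /** Called on success or failure; releases at most one permit per request. */
  void onComplete(Tracked request) {
    if (request.holdsPermit.compareAndSet(true, false)) {
      permits.release();
    }
  }

  int available() {
    return permits.availablePermits();
  }
}
```

Without the flag, a request queued twice but completed once would leave one permit permanently checked out, which is the kind of leak the PR title describes. The sketch uses a non-blocking `tryAcquire` for simplicity; a real tracker may block instead.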

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java | AsyncTracker fixes (idempotency guard), IO-thread dispatch on failure, sysprop-based permit cap, and permit getters for metrics/tests. |
| solr/core/src/java/org/apache/solr/handler/component/HttpShardHandlerFactory.java | Registers node-level gauges for async permit max/available. |
| solr/core/src/test/org/apache/solr/handler/component/AsyncTrackerSemaphoreLeakTest.java | Adds tests reproducing the two failure patterns and validating the fix. |
| solr/solr-ref-guide/modules/upgrade-notes/pages/major-changes-in-solr-9.adoc | Documents the new configurability and points to metrics. |
| solr/solr-ref-guide/modules/deployment-guide/pages/metrics-reporting.adoc | Documents the Dropwizard keys and Prometheus metric for async permits. |
| solr/prometheus-exporter/conf/solr-exporter-config.xml | Exports the new async-permit gauges as solr_client_request_async_permits. |
| changelog/unreleased/SOLR-18174-prevent-double-registration.yml | Adds an unreleased changelog entry for the fix/metrics. |

Comment thread changelog/unreleased/SOLR-18174-prevent-double-registration.yml Outdated
@janhoy janhoy requested a review from dsmiley April 20, 2026 11:25
@janhoy
Contributor Author

janhoy commented Apr 20, 2026

@dsmiley I'd like to get this backport into branch_9x before you cut the 9.11 release.
