HDDS-15114. Replace misconfigured ThreadPoolExecutor with Executors factory methods by ptlrs · Pull Request #10133 · apache/ozone

ptlrs · 2026-04-25T00:51:44Z

What changes were proposed in this pull request?

Some threadpools are misconfigured. Their effective number of threads is one or they never reach their maximum number of threads.

This PR:

Updates misconfigured threadpools to have more than one thread
Update misconfigured thredpools to be limited to their number of core threads

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15114

How was this patch tested?

CI: https://github.com/ptlrs/ozone/actions/runs/24918410704

…actory methods.

devmadhuu

Thanks @ptlrs for the patch. Largely LGTM, just a minor comment for an existing code.

devmadhuu · 2026-04-27T06:02:18Z

The %d placeholder will be substituted with the thread number, resulting in names like FetchOMDBTar-0somePrefix rather than the presumably intended FetchOMDBTar-somePrefix-0. This is a pre-existing issue, but since the code is being touched it's a good opportunity to fix it.

Thanks for the review @devmadhuu. I have updated the code. I took a look at the other ThreadFactories in our code. I adopted their convention and moved the threadNamePrefix to be the initial part of the thread name.

devmadhuu · 2026-04-28T05:28:34Z

Thanks @ptlrs for updating the patch. Can we add atleast one integration test testing the change because now there can be multiple concurrent worker threads running for downloading the OM DB tar for multiple sst files. May be a test that verifies tasks submitted to TarExtractor can run in parallel. I mean we can assert active threads count to verify throughput.

devmadhuu · 2026-04-28T05:34:58Z

    if (executorServiceStarted.compareAndSet(false, true)) {
-      this.executor =
-          new ThreadPoolExecutor(0, threadPoolSize, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(), threadFactory);
+      this.executor = Executors.newFixedThreadPool(threadPoolSize, threadFactory);


Just a thought: The pool size is Math.max(64, Runtime.getRuntime().availableProcessors()) at the call site in OzoneManagerServiceProviderImpl. On high-CPU machines this could be large. With the old code it barely mattered; now it will actually create that many threads. On a 256-core host that's 256 threads all blocking on I/O at the same time. Consider documenting or probably IMO we should be capping this.

devmadhuu · 2026-04-28T06:19:14Z

-        new ThreadFactoryBuilder()
-            .setNameFormat("DataNodeMetricsCollector-%d")
-            .build());
+    ThreadFactory threadFactory = new ThreadFactoryBuilder()


The original pool was intended (if misconfigured) to scale up to 500 threads under load (e.g., Recon polling thousands of DataNodes simultaneously). The fix correctly reflects actual behavior today, but it also avoiding efficient future scaling. If the intent was truly to allow bursting to a larger pool, can we think of using a bounded queue rather than simply removing it. At minimum, a comment explaining the corePoolSize choice and impact on removing the MAX_POOL_SIZE = 500.

Bu wait, this could make it slower in large cluster having hundreds of datanodes where it makes concurrent DataNode queries. The original MAX_POOL_SIZE = 500 captured the right intent — it was just broken by the unbounded LinkedBlockingQueue. A better fix would be to make the pool size configurable with a sensible default:

// e.g., add a config key OZONE_RECON_DN_METRICS_THREAD_COUNT, default 128 or min(nodeCount, 500) int poolSize = config.getInt(OZONE_RECON_DN_METRICS_THREAD_COUNT, 128); this.executorService = Executors.newFixedThreadPool(poolSize, threadFactory);

HDDS-15114. Replace misconfigured ThreadPoolExecutor with Executors f…

cb0bb86

…actory methods.

adoroszlai requested review from aryangupta1998 and devmadhuu April 25, 2026 04:47

devmadhuu reviewed Apr 27, 2026

View reviewed changes

HDDS-15114. Fix thread name format issue in TarExtractor thread factory.

cb65e89

ptlrs requested a review from devmadhuu April 27, 2026 17:29

devmadhuu reviewed Apr 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HDDS-15114. Replace misconfigured ThreadPoolExecutor with Executors factory methods#10133

HDDS-15114. Replace misconfigured ThreadPoolExecutor with Executors factory methods#10133
ptlrs wants to merge 2 commits intoapache:masterfrom
ptlrs:HDDS-15114-Fix-misconfigured-threadpools

ptlrs commented Apr 25, 2026

Uh oh!

devmadhuu left a comment

Uh oh!

devmadhuu Apr 27, 2026

Uh oh!

ptlrs Apr 27, 2026

Uh oh!

devmadhuu commented Apr 28, 2026

Uh oh!

devmadhuu Apr 28, 2026

Uh oh!

devmadhuu Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ptlrs commented Apr 25, 2026

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

Uh oh!

devmadhuu left a comment

Choose a reason for hiding this comment

Uh oh!

devmadhuu Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

ptlrs Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

devmadhuu commented Apr 28, 2026

Uh oh!

devmadhuu Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

devmadhuu Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants