Skip to content

Remove per-bind global lock contention in AuthenticatedUsers#660

Open
vharseko wants to merge 2 commits into
OpenIdentityPlatform:masterfrom
vharseko:features/bind
Open

Remove per-bind global lock contention in AuthenticatedUsers#660
vharseko wants to merge 2 commits into
OpenIdentityPlatform:masterfrom
vharseko:features/bind

Conversation

@vharseko

@vharseko vharseko commented Jun 25, 2026

Copy link
Copy Markdown
Member

Summary

Removes a global lock that every LDAP BIND (and unbind) takes on the hot
path, which serialized authentications under concurrency. Replaces the
AuthenticatedUsers map (a DITCacheMap guarded by a single
ReentrantReadWriteLock) with a ConcurrentHashMap, so registering /
deregistering a connection no longer contends on one global write lock.

Test PR vs Release: https://github.com/vharseko/OpenDJ/actions/runs/28219689094

Problem

Every successful bind registers the connection in the server-wide
AuthenticatedUsers map via ClientConnection.setAuthenticationInfo()
(put on bind, remove on unbind / re-bind). The map was backed by a
DITCacheMap protected by a single ReentrantReadWriteLock, so put() and
remove() took one global write lock on the hot path of every
authentication — even though connections authenticating as different users
never actually conflict.

Under load this serializes binds. With 200 connections binding as different
users, jstack sampling showed roughly half of the worker threads parked
acquiring that lock:

at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock
at org.opends.server.core.AuthenticatedUsers.put / .remove
at org.opends.server.api.ClientConnection.setAuthenticationInfo
at org.opends.server.core.BindOperationBasis.run

Change

  • AuthenticatedUsers now uses
    ConcurrentHashMap<DN, CopyOnWriteArraySet<ClientConnection>>. put /
    remove / get are lock-free at the map level (bin-granularity locking via
    computeIfAbsent / computeIfPresent); binds for different users run in
    parallel.
  • The post-response handlers that keep authenticated connections in sync with
    changes to their entries are updated:
    • modify → exact-DN lookup, O(1) (a modify never affects descendants,
      so no subtree scan is needed);
    • delete / modify-DN → keep subtree semantics via a single key-set
      pass (removeSubtree) with an isEmpty() fast-exit; the redundant
      pre-check operationDoesNotTargetAuthenticatedUser is removed.
  • Public behaviour and the get() return type (CopyOnWriteArraySet) are
    unchanged.

Benchmark

Packaged server, je backend, 5000 users (uid=user.N, userPassword
hashed {SSHA}), 200 client threads on persistent connections each
re-binding as a random distinct user, JDK 11.

metric before (global lock) after (ConcurrentHashMap)
throughput ~18,400 binds/s ~23,000 binds/s (+~25%)
latency mean 11.0 ms 8.5 ms
latency p50 9.5 ms 6.5 ms
latency p90 16.1 ms 12 ms
workers parked on the lock (jstack) ~half 0

At moderate (non-saturating) concurrency the win is larger and tail latency
also improves; at 200-thread saturation p99 is GC-bound at the higher
operating point (no longer the bind bookkeeping).

Testing

  • Adds BindLatencyBenchmarkTestCase — an opt-in concurrent BIND
    latency/throughput benchmark (N threads, distinct users, persistent
    connections). Disabled by default; it only runs with -Dbind.bench=true,
    so the normal test suite is unaffected. Example:

    mvn -P precommit -pl opendj-server-legacy verify \
        -Dit.test=BindLatencyBenchmarkTestCase -DfailIfNoTests=false \
        -Dbind.bench=true -Dbind.bench.threads=200 \
        -Dbind.bench.durationSeconds=120 -Dbind.bench.label=after
    
  • Existing bind / AuthenticatedUsers regression tests continue to pass;
    main and test sources compile (JDK 11, target 11).

Files

  • opendj-server-legacy/src/main/java/org/opends/server/core/AuthenticatedUsers.java
  • opendj-server-legacy/src/test/java/org/opends/server/core/BindLatencyBenchmarkTestCase.java

vharseko added 2 commits June 25, 2026 20:46
The `snmp` profile assembles the server template config.ldif in two antrun
steps: `copy-config-ldif` copies the pristine resource/config/config.ldif into
target/template/config, and `generate-config-ldif` appends config.snmp.ldif to
it with `concat append="true"`.

The copy used Ant's default `overwrite="false"`, so on an incremental build
(without `clean`) the already-populated target file was newer than the source
and was not refreshed, while the concat appended the SNMP fragment again on
every build. Repeated builds therefore accumulated several
`cn=SNMP Connection Handler,cn=Connection Handlers,cn=config` entries, and
`setup` then failed with "Entry Already Exists ... multiple times".

Add `overwrite="true"` to the config.ldif copy so the SNMP fragment is always
appended to a freshly copied base file, making config.ldif generation
idempotent regardless of incremental builds.
Every successful bind (and unbind) registers/deregisters the client
connection in the global AuthenticatedUsers map via
ClientConnection.setAuthenticationInfo(). The map was a DITCacheMap guarded
by a single ReentrantReadWriteLock, so put()/remove() serialized on one
write lock that is taken on the hot path of every authentication.

Under high concurrency this is a real bottleneck: with 200 connections
binding as different users, roughly half of the worker threads were parked
acquiring that lock (AuthenticatedUsers.put/remove -> WriteLock.lock),
confirmed by jstack sampling, even though the connections target different
users and never actually conflict.

Replace the DITCacheMap + global lock with a
ConcurrentHashMap<DN, CopyOnWriteArraySet<ClientConnection>>:

- put/remove/get are now lock-free at the map level (bin-granularity locking
  via computeIfAbsent/computeIfPresent); binds for different users no longer
  serialize.
- The post-response handlers that react to changes of authenticated user
  entries are adjusted accordingly:
  - modify uses an exact-DN lookup (O(1)) - a modify never affects
    descendants, so no subtree scan is needed;
  - delete and modify-DN keep subtree semantics via a single key-set pass
    (removeSubtree), guarded by an isEmpty() fast-exit; the redundant
    pre-check method operationDoesNotTargetAuthenticatedUser is removed.

Public behaviour and the get() return type (CopyOnWriteArraySet) are
unchanged.

Measured on a packaged JE server (5000 users, 200 threads, persistent
connections rebinding as random distinct users):

  throughput  18.4k -> ~23k binds/s   (+~25%)
  mean        11.0  -> ~8.5 ms
  p50          9.5  -> ~6.5 ms
  workers parked on AuthenticatedUsers lock: ~half -> 0

Add BindLatencyBenchmarkTestCase: an opt-in concurrent BIND latency /
throughput benchmark (N threads, distinct users, persistent connections)
used to capture the before/after numbers. It is disabled by default and only
runs with -Dbind.bench=true, so it never slows the normal test suite.
@vharseko vharseko added this to the 5.2.0 milestone Jun 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants