Benchmark the built Docker image against the released one #656

Open
vharseko wants to merge 16 commits into OpenIdentityPlatform:master from vharseko:features/performance-build

@vharseko
Member

Adds a "Build vs Release" performance benchmark to the Docker build jobs in
build.yml. On every push/PR, after the existing functional smoke tests, the
freshly built OpenDJ image is benchmarked against the latest released
image and the comparison is published in the job summary — so performance
regressions are caught at build time.

This builds on the OpenDJ-vs-OpenLDAP benchmark already on features/performance
(reusing benchmark.jmx and the now-generic summary.sh).

What's added

| File | Purpose |
| --- | --- |
| .github/benchmark/compare-opendj.sh | Reusable helper: benchmark two OpenDJ images and emit the comparison |
| .github/workflows/build.yml | Wire the benchmark into build-docker and build-docker-alpine |

How it works

compare-opendj.sh <A_name> <A_image> <B_name> <B_image> (env THREADS=200,
DURATION=150):

  • Installs ldap-utils/jq and Apache JMeter (cached).
  • For each image, sequentially (one container at a time on port 1389): start
    the OpenDJ container, wait for readiness, seed ou=People, capture the version
    (fullVendorVersion), run benchmark.jmx, stop the container.
  • Renders the comparison via the generic summary.sh
    (<name> <statistics.json> <version> <image> ×2) into $GITHUB_STEP_SUMMARY:
    versions, totals, per-operation p99 latency, and QuickChart bars.

Both sides are OpenDJ, so no per-server hashing/index setup is needed — identical
product ⇒ identical default password scheme ⇒ hashing parity is automatic.
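The "wait for readiness" step can be approximated with a small polling helper. This is a minimal sketch, not the code in compare-opendj.sh: the function name `wait_until`, the timeout, and the anonymous root DSE probe shown in the usage comment are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Assumed usage: probe the root DSE of the container listening on port 1389.
# wait_until 120 ldapsearch -x -H ldap://localhost:1389 -s base -b "" fullVendorVersion
```

Polling a real LDAP read, rather than only checking that the port is open, avoids starting the load before the server can answer requests.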

Wiring in build.yml

Both build-docker and build-docker-alpine gain:

  • a sparse actions/checkout of .github/benchmark (these jobs have no checkout,
    and the build artifact does not include those files);
  • a JMeter cache step;
  • a benchmark step after the functional tests:
    • build-docker: Build = localhost:5000/${GITHUB_REPOSITORY,,}:${release_version}
      vs Release = openidentityplatform/opendj:latest;
    • build-docker-alpine: Build-alpine = …:${release_version}-alpine
      vs Release-alpine = openidentityplatform/opendj:alpine.
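To illustrate how the image references for the build-docker job resolve, here is a small sketch. The values assigned to `GITHUB_REPOSITORY` and `release_version` are example assumptions; in the workflow they come from the job environment. The helper is only printed, not executed.

```shell
#!/usr/bin/env bash
# Assumed example values; in the workflow these come from the job environment.
GITHUB_REPOSITORY="OpenIdentityPlatform/OpenDJ"
release_version="1.0.0"

# ${VAR,,} (bash 4+) lowercases the value: Docker repository names must be lowercase.
build_image="localhost:5000/${GITHUB_REPOSITORY,,}:${release_version}"
release_image="openidentityplatform/opendj:latest"

# Print the helper invocation instead of running it.
echo "compare-opendj.sh Build ${build_image} Release ${release_image}"
```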

Isolation

build-docker and build-docker-alpine are separate jobs, each on its own
ephemeral runner (own registry:3, ports and containers), so they never contend.
Within a job the benchmark runs after the functional tests (whose containers
are already killed) and uses one OpenDJ container at a time on port 1389.

Notes

  • Load profile is 200 threads / 150 s per image.
  • Validated with actionlint and bash -n; full end-to-end runs only on CI.

vharseko added 13 commits June 22, 2026 11:30

Adds .github/workflows/performance.yml plus supporting assets under
.github/performance/ to run an automated LDAP benchmark comparing the
latest OpenDJ and OpenLDAP Docker images.

- Triggers: manual (workflow_dispatch) and after the Release workflow
  (workflow_run). Image tags and load profile (threads/duration/rampup/
  jmeter_version) are configurable inputs, defaulting to latest images and
  the original 256-thread/600s profile.
- Benchmarks OpenLDAP (osixia/openldap) first, then OpenDJ
  (openidentityplatform/opendj), sequentially so only one server is under
  load at a time.
- JMeter plan (benchmark.jmx): the admin bind is cached once per thread
  (Once Only Controller, labelled ADMIN_CONNECT and excluded from metrics);
  ADD/SEARCH/COMPARE/MODIFY/DELETE/ADD WITHOUT DELETE run on that admin
  connection; the measured BIND is a single bind/unbind (test=sbind, own
  connection) as the per-thread user with the password MODIFY sets, so real
  password-hash verification is exercised.
- Captures versions via shipped tools (OpenLDAP slapd -VV, OpenDJ rootDSE
  fullVendorVersion) and writes versions, a per-operation comparison table
  and two comparative Mermaid xychart-beta charts to GITHUB_STEP_SUMMARY;
  full JMeter HTML dashboards are uploaded as the jmeter-reports artifact.

Renames the workflow and its assets, and fixes the job failing before the
benchmark ran.

- Rename .github/workflows/performance.yml -> benchmark.yml (workflow name
  "LDAP benchmark: OpenDJ vs OpenLDAP") and .github/performance/ ->
  .github/benchmark/; update all internal paths and the concurrency group.
- Fix the "Capture OpenLDAP version" step failing with exit 141: `slapd -VV`
  piped to `head -1` got SIGPIPE when head closed the pipe early, and under
  `pipefail` that failed the whole step before any JMeter run. Capture the
  output fully and take the first line in bash, with a fallback to the image
  reference.
- Harden the OpenDJ version step the same way: guard ldapsearch with `|| true`
  so a transient bind failure cannot trip pipefail.
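The SIGPIPE fix described above (capture the output fully, then take the first line in bash, with a fallback) can be sketched as follows. The helper name `first_line_or` and the names in the usage comment are hypothetical; the actual workflow step may be written differently.

```shell
#!/usr/bin/env bash
set -o pipefail

# Hypothetical helper: run a command, keep the first line of its combined
# stdout/stderr, and print a fallback when that line is empty.
first_line_or() {
  local fallback="$1"; shift
  local out first
  # Capture everything first: no reader closes the pipe early, so no SIGPIPE.
  # "|| true" keeps a failing command from aborting the step.
  out="$("$@" 2>&1 || true)"
  # Strip everything from the first newline onward.
  first=${out%%$'\n'*}
  printf '%s\n' "${first:-$fallback}"
}

# Assumed usage (container and variable names are illustrative):
# version="$(first_line_or "$OPENLDAP_IMAGE" docker exec openldap slapd -VV)"
```

Because the first line is extracted with a parameter expansion instead of `head -1`, the step never depends on how the producer reacts to a closed pipe.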
The benchmark plan failed to compile with "__P called with wrong number of
parameters. Actual: 3. Expected: >= 1 and <= 2": the default value in
${__P(basedn,dc=example,dc=com)} contains commas, which JMeter's function
parser treats as extra argument separators. Escape them as
${__P(basedn,dc=example\,dc=com)} in the ADMIN_CONNECT and BIND samplers.

Without this the tree never compiled, so no statistics.json was produced and
the summary step failed on jq. Verified locally with JMeter 5.6.3: the plan
now compiles and runs, and report generation produces statistics.json.
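The comma escaping can be reproduced with a short helper that builds a `__P` expression from a property name and a default value. This is only an illustration of the escaping rule described above; `jmeter_p` is a hypothetical name and is not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a ${__P(name,default)} expression, escaping commas
# in the default so JMeter does not read them as argument separators.
jmeter_p() {
  local name="$1" default="$2" escaped
  escaped="$(printf '%s' "$default" | sed 's/,/\\,/g')"
  printf '${__P(%s,%s)}\n' "$name" "$escaped"
}

jmeter_p basedn 'dc=example,dc=com'
# prints: ${__P(basedn,dc=example\,dc=com)}
```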
Five follow-up tweaks to the benchmark workflow and plan.

- Rename the "ADD WITHOUT DELETE" sampler to READD (benchmark.jmx, summary.sh OPS).
- Stop each Docker container right after its own benchmark (new Stop OpenLDAP /
  Stop OpenDJ steps) to free resources and keep only one server under load at a
  time; the final always() cleanup stays as the failure-path safety net.
- Make both servers hash passwords with the same scheme (SSHA-256), hashed
  server-side on write: MODIFY sends the password in cleartext and each server
  hashes it. OpenLDAP loads the pw-sha2 and ppolicy modules, sets
  olcPasswordHash {SSHA256} and enables the ppolicy hash-cleartext overlay on the
  mdb database; OpenDJ sets the Default Password Policy default storage scheme to
  Salted SHA-256. Each server hashes/verifies with its own implementation, so
  there is no cross-implementation hash-format dependency.
- Charts: render each operation as two adjacent, non-overlapping columns
  (OpenLDAP / OpenDJ) in different colors instead of two overlapping series, using
  an interleaved x-axis with zero-padded series and an explicit color palette
  (Mermaid xychart-beta has no grouped bars).
- Bump actions/cache@v4 -> @v5 and actions/upload-artifact@v4 -> @v7 (latest used
  elsewhere in this repo).

Validated locally with JMeter 5.6.3: the plan compiles, statistics.json carries
the READD label with ADMIN_CONNECT filtered out, summary.sh renders the two-column
charts, and actionlint passes.

Two report fixes in summary.sh.

- Make the per-operation columns actually render side by side. xychart-beta
  merges duplicate x-axis category labels into one slot, so the previous
  "ADD","ADD" axis collapsed both series back onto a single column and the
  OpenLDAP/OpenDJ bars overlapped. Use distinct labels ("ADD OL","ADD DJ", ...)
  so the zero-padded series stay on separate columns.
- Replace per-operation throughput with a single total-throughput comparison.
  In a sequential loop every operation runs once per iteration, so per-op
  throughput just equals the loop rate (nearly identical across ops, which
  looked wrong). The per-operation table now shows latency only (mean/p99/
  errors); a two-bar chart shows total OpenLDAP vs OpenDJ throughput, with a
  note explaining why per-op throughput is not charted.

Verified by rendering the report from the real run's statistics.json artifacts.

benchmark.jmx
- SEARCH now filters on `mail` instead of `sn`. mail is equality-indexed by
  default on BOTH OpenDJ and OpenLDAP/osixia, whereas sn is indexed on OpenDJ
  only, so the search becomes a real indexed lookup on both servers.
- Every created value is unique: ADD keys each entry by a per-iteration JMeter
  counter, READD by a UUID. The search now matches exactly one entry, and the
  accumulated (never-deleted) READD entries no longer inflate the result set.
- Entries are minimal and index-symmetric: RDN is mail, objectClass is
  top/locality/extensibleObject, and no cn/sn/uid/givenName/telephoneNumber/
  member/uniqueMember are stored (those are indexed on OpenDJ but not osixia,
  which would bias the write cost). MODIFY writes description + userPassword.

summary.sh
- Replace the Mermaid xychart-beta charts with QuickChart (Chart.js) grouped bar
  charts rendered as images: proper side-by-side OpenLDAP/OpenDJ bars per
  operation, a legend, readable labels and no overlap. xychart-beta cannot group
  bars or show a legend and crowded the 14 x-axis labels. Throughput is a two-bar
  chart; latency is grouped bars per operation. Config is built with jq and
  URL-encoded.
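A rough sketch of the "config built with jq and URL-encoded" approach for the two-bar throughput chart is shown below. It assumes jq is installed and uses QuickChart's `https://quickchart.io/chart?c=<config>` URL form; the function name, the dataset label, and the argument order are illustrative, and the real summary.sh may build a richer config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a QuickChart URL for a two-bar chart.
# Names are passed as strings, values as JSON numbers.
quickchart_url() {
  local a_name="$1" a_value="$2" b_name="$3" b_value="$4"
  jq -rn \
    --arg a "$a_name" --argjson av "$a_value" \
    --arg b "$b_name" --argjson bv "$b_value" '
    {
      type: "bar",
      data: {
        labels: [$a, $b],
        datasets: [{ label: "Total throughput", data: [$av, $bv] }]
      }
    }
    | tojson          # compact JSON text of the Chart.js config
    | @uri            # percent-encode it for use in a query string
    | "https://quickchart.io/chart?c=" + .'
}
```

The resulting URL can be embedded as an image in the job summary; building the config in jq keeps quoting and escaping of server names out of the shell.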
…tifacts

- Lower the default load to 128 threads / 300s (was 256 / 600), in the workflow
  inputs and env fallbacks and in the JMX __P defaults.
- Latency chart: switch the Y axis to logarithmic so the small per-operation
  values stay readable next to the large ones (e.g. OpenLDAP BIND).
- Capture server logs and upload them as two per-server artifacts. Each Stop step
  (if: always(), so logs are kept on failure too) now saves the container's
  `docker logs` to <server>/server.log and copies the in-container log directory
  (OpenDJ /opt/opendj/data/logs -> opendj/internal; OpenLDAP /var/log ->
  openldap/var-log). Uploaded as artifacts logs-opendj and logs-openldap.
- Set retention-days: 90 on all artifacts (jmeter-reports, logs-openldap,
  logs-opendj).

- Raise the default concurrent thread count to 200 (was 128) in the workflow
  input and env fallback and in the JMX __P default.
- Latency chart now plots p99 instead of mean: tail latency is the more
  meaningful metric for the skewed LDAP latency distributions under load (mean
  hides the tail). The Y axis is linear (logarithmic scale removed).

Switch the OpenLDAP side from osixia/openldap (OpenLDAP 2.4.57, unmaintained) to
vegardit/docker-openldap (OpenLDAP 2.6.10).

- Adapt the setup to vegardit's interface: `LDAP_INIT_ORG_DN`,
  `LDAP_INIT_ROOT_USER_DN` (override to `cn=admin,<base>`), `LDAP_INIT_ROOT_USER_PW`,
  disable TLS/LDAPS, and neutralize the image's built-in ppolicy friction
  (lockout, pqChecker, min length) for the benchmark.
- vegardit ships no SHA-2 module, so use `{SSHA}` (Salted SHA-1) instead of
  `{SSHA256}`: it is OpenLDAP core and a built-in OpenDJ scheme, so both servers
  still hash identically. Set `olcPasswordHash {SSHA}` and enable hash-cleartext on
  vegardit's already-loaded ppolicy overlay (no module load or restart needed);
  set OpenDJ's default storage scheme to Salted SHA-1. cn=config edits go via
  EXTERNAL over ldapi as root.
- `mail` is still equality-indexed by default on vegardit (`uid,mail`), so the
  indexed SEARCH-on-mail comparison remains fair.

The Docker Hub image is published as `vegardit/openldap`, not
`vegardit/docker-openldap` (that is the GitHub repository name). The wrong name
made `docker run` fail with "pull access denied / repository does not exist".
Fix both the input default and the env fallback.

- Parametrize summary.sh: the server name, version and image are now arguments
  per server (`<name> <statistics.json> <version> <image>` x2), replacing the
  hardcoded "OpenLDAP"/"OpenDJ". The report title, tables and chart
  legends/labels are all driven by the passed names, so the script can compare
  any two LDAP servers.
- Move the benchmark-specific Notes out of the script into the workflow's
  "Build job summary" step (appended to $GITHUB_STEP_SUMMARY after the generic
  report).

Add a "Build vs Release" benchmark to the build-docker and build-docker-alpine
jobs: run the LDAP benchmark against the freshly built image and the latest
released image, and publish the comparison in the job summary to catch
performance regressions per push/PR (200 threads / 150s).

- New reusable helper .github/benchmark/compare-opendj.sh runs benchmark.jmx
  against two OpenDJ images sequentially (one container at a time on port 1389:
  start, seed ou=People, capture version, run, stop) and renders the comparison
  via the generic summary.sh. Both sides are OpenDJ, so no per-server
  hashing/index setup is needed (identical default password scheme).
- Each job gets a sparse `actions/checkout` of .github/benchmark (these jobs have
  no checkout and the build artifact does not include those files) plus a JMeter
  cache. build-docker compares localhost:5000/<repo>:<release_version> ("Build")
  vs openidentityplatform/opendj:latest ("Release"); build-docker-alpine compares
  the corresponding -alpine / :alpine tags.

vharseko added 2 commits June 24, 2026 15:22

- Rename the throughput unit from "ops/s" to "tests/s" in the Totals table
  column, the throughput section heading, and the chart title/dataset label/alt
  text (per-operation latency stays in ms). The value is unchanged — it is still
  the JMeter Total throughput (total samples/sec), just relabeled.
- Flip the Total throughput chart to a horizontal bar chart (type:horizontalBar),
  so the two servers sit on the vertical axis and throughput runs along the
  horizontal axis; widen it to suit horizontal bars.

vharseko added the CI label Jun 24, 2026

…hart

- build.yml: upload the build-vs-release benchmark outputs (JMeter HTML reports,
  *.jtl, docker logs, *.jmeter.out) as artifacts with `if: always()` and 90-day
  retention in both build-docker and build-docker-alpine, so benchmark failures
  are diagnosable from the run.
- compare-opendj.sh: stop discarding JMeter output (write it to <slug>.jmeter.out)
  and print a per-image error breakdown (count | op | code | message) parsed from
  the .jtl to the step log.
- summary.sh: revert the Total throughput chart back to vertical bars (keep the
  tests/s label).
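The per-image error breakdown (count | op | code | message) could be produced along these lines. This sketch assumes the .jtl is in JMeter's CSV format with a header row containing `label`, `responseCode`, `responseMessage` and `success` columns, and it splits on plain commas, so messages that themselves contain commas would be mis-parsed. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize failed samples in a CSV .jtl file as
# "count | op | code | message", most frequent first.
jtl_errors() {
  awk -F',' '
    # Header row: remember the position of each named column.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Count failed samples per (label, response code, response message).
    $(col["success"]) == "false" {
      key = $(col["label"]) " | " $(col["responseCode"]) " | " $(col["responseMessage"])
      n[key]++
    }
    END { for (k in n) print n[k] " | " k }
  ' "$1" | sort -t'|' -k1,1nr
}

# Assumed usage: jtl_errors build.jtl
```

Looking columns up by header name keeps the helper working if the JTL column order is changed through JMeter's save-service properties.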