Benchmark the built Docker image against the released one #656
Open
vharseko wants to merge 16 commits into
Adds .github/workflows/performance.yml plus supporting assets under .github/performance/ to run an automated LDAP benchmark comparing the latest OpenDJ and OpenLDAP Docker images.
- Triggers: manual (workflow_dispatch) and after the Release workflow (workflow_run). Image tags and load profile (threads/duration/rampup/jmeter_version) are configurable inputs, defaulting to latest images and the original 256-thread/600s profile.
- Benchmarks OpenLDAP (osixia/openldap) first, then OpenDJ (openidentityplatform/opendj), sequentially so only one server is under load at a time.
- JMeter plan (benchmark.jmx): the admin bind is cached once per thread (Once Only Controller, labelled ADMIN_CONNECT and excluded from metrics); ADD/SEARCH/COMPARE/MODIFY/DELETE/ADD WITHOUT DELETE run on that admin connection; the measured BIND is a single bind/unbind (test=sbind, own connection) as the per-thread user with the password MODIFY sets, so real password-hash verification is exercised.
- Captures versions via shipped tools (OpenLDAP slapd -VV, OpenDJ rootDSE fullVendorVersion) and writes versions, a per-operation comparison table and two comparative Mermaid xychart-beta charts to GITHUB_STEP_SUMMARY; full JMeter HTML dashboards are uploaded as the jmeter-reports artifact.
Renames the workflow and its assets, and fixes the job failing before the benchmark ran.
- Rename .github/workflows/performance.yml -> benchmark.yml (workflow name "LDAP benchmark: OpenDJ vs OpenLDAP") and .github/performance/ -> .github/benchmark/; update all internal paths and the concurrency group.
- Fix the "Capture OpenLDAP version" step failing with exit 141: `slapd -VV` piped to `head -1` got SIGPIPE when head closed the pipe early, and under `pipefail` that failed the whole step before any JMeter run. Capture the output fully and take the first line in bash, with a fallback to the image reference.
- Harden the OpenDJ version step the same way: guard ldapsearch with `|| true` so a transient bind failure cannot trip pipefail.
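The SIGPIPE fix can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual step: `first_line_or` is a hypothetical helper name, the `printf` call stands in for `slapd -VV`, and `example/image:tag` is a placeholder for the image reference.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the workflow): run a command to completion,
# then keep the first line of its output using bash parameter expansion.
# There is no `| head -1`, so the producer is never killed by SIGPIPE and
# `pipefail` has nothing to trip over.
first_line_or() {
  local fallback=$1; shift
  local out
  out=$("$@" 2>&1 || true)   # capture stdout+stderr; tolerate a non-zero exit
  out=${out%%$'\n'*}         # drop everything from the first newline onward
  printf '%s\n' "${out:-$fallback}"
}

# Stand-in for the real version command; falls back to the image reference.
first_line_or "example/image:tag" printf 'line one\nline two\n'
```

If the command prints nothing or fails outright, the helper emits the fallback instead of failing the step.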
The benchmark plan failed to compile with "__P called with wrong number of
parameters. Actual: 3. Expected: >= 1 and <= 2": the default value in
${__P(basedn,dc=example,dc=com)} contains commas, which JMeter's function
parser treats as extra argument separators. Escape them as
${__P(basedn,dc=example\,dc=com)} in the ADMIN_CONNECT and BIND samplers.
Without this the tree never compiled, so no statistics.json was produced and
the summary step failed on jq. Verified locally with JMeter 5.6.3: the plan
now compiles and runs, and report generation produces statistics.json.
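The escaping rule can be illustrated with a small shell helper. `jmeter_escape_arg` is a hypothetical name used only for this sketch; the plan itself was fixed by editing the JMX directly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: backslash-escape commas so a value can be used as a
# single argument inside a JMeter function call such as ${__P(name,default)}.
jmeter_escape_arg() {
  printf '%s\n' "${1//,/\\,}"
}

default=$(jmeter_escape_arg 'dc=example,dc=com')
# Single quotes keep the shell from expanding the JMeter ${...} expression.
printf '${__P(basedn,%s)}\n' "$default"
```

With the commas escaped, `__P` sees exactly two arguments (the property name and its default), which is within the range it accepts.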
Five follow-up tweaks to the benchmark workflow and plan.
- Rename the "ADD WITHOUT DELETE" sampler to READD (benchmark.jmx, summary.sh OPS).
- Stop each Docker container right after its own benchmark (new Stop OpenLDAP /
Stop OpenDJ steps) to free resources and keep only one server under load at a
time; the final always() cleanup stays as the failure-path safety net.
- Make both servers hash passwords with the same scheme (SSHA-256), hashed
server-side on write: MODIFY sends the password in cleartext and each server
hashes it. OpenLDAP loads the pw-sha2 and ppolicy modules, sets
olcPasswordHash {SSHA256} and enables the ppolicy hash-cleartext overlay on the
mdb database; OpenDJ sets the Default Password Policy default storage scheme to
Salted SHA-256. Each server hashes/verifies with its own implementation, so
there is no cross-implementation hash-format dependency.
- Charts: render each operation as two adjacent, non-overlapping columns
(OpenLDAP / OpenDJ) in different colors instead of two overlapping series, using
an interleaved x-axis with zero-padded series and an explicit color palette
(Mermaid xychart-beta has no grouped bars).
- Bump actions/cache@v4 -> @v5 and actions/upload-artifact@v4 -> @v7 (latest used
elsewhere in this repo).
Validated locally with JMeter 5.6.3: the plan compiles, statistics.json carries
the READD label with ADMIN_CONNECT filtered out, summary.sh renders the two-column
charts, and actionlint passes.
Two report fixes in summary.sh.
- Make the per-operation columns actually render side by side. xychart-beta
merges duplicate x-axis category labels into one slot, so the previous
"ADD","ADD" axis collapsed both series back onto a single column and the
OpenLDAP/OpenDJ bars overlapped. Use distinct labels ("ADD OL","ADD DJ", ...)
so the zero-padded series stay on separate columns.
- Replace per-operation throughput with a single total-throughput comparison.
In a sequential loop every operation runs once per iteration, so per-op
throughput just equals the loop rate (nearly identical across ops, which
looked wrong). The per-operation table now shows latency only (mean/p99/
errors); a two-bar chart shows total OpenLDAP vs OpenDJ throughput, with a
note explaining why per-op throughput is not charted.
Verified by rendering the report from the real run's statistics.json artifacts.
benchmark.jmx
- SEARCH now filters on `mail` instead of `sn`. mail is equality-indexed by default on BOTH OpenDJ and OpenLDAP/osixia, whereas sn is indexed on OpenDJ only, so the search becomes a real indexed lookup on both servers.
- Every created value is unique: ADD keys each entry by a per-iteration JMeter counter, READD by a UUID. The search now matches exactly one entry, and the accumulated (never-deleted) READD entries no longer inflate the result set.
- Entries are minimal and index-symmetric: RDN is mail, objectClass is top/locality/extensibleObject, and no cn/sn/uid/givenName/telephoneNumber/member/uniqueMember are stored (those are indexed on OpenDJ but not osixia, which would bias the write cost). MODIFY writes description + userPassword.
summary.sh
- Replace the Mermaid xychart-beta charts with QuickChart (Chart.js) grouped bar charts rendered as images: proper side-by-side OpenLDAP/OpenDJ bars per operation, a legend, readable labels and no overlap. xychart-beta cannot group bars or show a legend and crowded the 14 x-axis labels. Throughput is a two-bar chart; latency is grouped bars per operation. Config is built with jq and URL-encoded.
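For reference, a grouped-bar QuickChart URL can be put together along these lines. This is a sketch under assumptions, not the code in summary.sh: the script builds its config with jq, whereas this example writes the JSON by hand and percent-encodes it in pure bash so it has no dependencies. `urlencode` is a hypothetical helper, and the labels and numbers are placeholders, not benchmark results.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Percent-encode an ASCII string (unreserved characters pass through).
urlencode() {
  local LC_ALL=C s=$1 out='' c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# Chart.js config: one dataset per server gives side-by-side bars per label.
# The values 1..4 are placeholders, not measurements.
config='{"type":"bar","data":{"labels":["ADD","SEARCH"],"datasets":[{"label":"OpenLDAP","data":[1,2]},{"label":"OpenDJ","data":[3,4]}]}}'

printf 'https://quickchart.io/chart?c=%s\n' "$(urlencode "$config")"
```

Because each server is its own dataset, Chart.js groups the bars per label and draws a legend, which is what xychart-beta could not do.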
…tifacts
- Lower the default load to 128 threads / 300s (was 256 / 600), in the workflow inputs and env fallbacks and in the JMX __P defaults.
- Latency chart: switch the Y axis to logarithmic so the small per-operation values stay readable next to the large ones (e.g. OpenLDAP BIND).
- Capture server logs and upload them as two per-server artifacts. Each Stop step (if: always(), so logs are kept on failure too) now saves the container's `docker logs` to <server>/server.log and copies the in-container log directory (OpenDJ /opt/opendj/data/logs -> opendj/internal; OpenLDAP /var/log -> openldap/var-log). Uploaded as artifacts logs-opendj and logs-openldap.
- Set retention-days: 90 on all artifacts (jmeter-reports, logs-openldap, logs-opendj).
- Raise the default concurrent thread count to 200 (was 128) in the workflow input and env fallback and in the JMX __P default.
- Latency chart now plots p99 instead of mean: tail latency is the more meaningful metric for the skewed LDAP latency distributions under load (mean hides the tail). The Y axis is linear (logarithmic scale removed).
Switch the OpenLDAP side from osixia/openldap (OpenLDAP 2.4.57, unmaintained) to
vegardit/docker-openldap (OpenLDAP 2.6.10).
- Adapt the setup to vegardit's interface: `LDAP_INIT_ORG_DN`,
`LDAP_INIT_ROOT_USER_DN` (override to `cn=admin,<base>`), `LDAP_INIT_ROOT_USER_PW`,
disable TLS/LDAPS, and neutralize the image's built-in ppolicy friction
(lockout, pqChecker, min length) for the benchmark.
- vegardit ships no SHA-2 module, so use `{SSHA}` (Salted SHA-1) instead of
`{SSHA256}`: it is OpenLDAP core and a built-in OpenDJ scheme, so both servers
still hash identically. Set `olcPasswordHash {SSHA}` and enable hash-cleartext on
vegardit's already-loaded ppolicy overlay (no module load or restart needed);
set OpenDJ's default storage scheme to Salted SHA-1. cn=config edits go via
EXTERNAL over ldapi as root.
- `mail` is still equality-indexed by default on vegardit (`uid,mail`), so the
indexed SEARCH-on-mail comparison remains fair.
The Docker Hub image is published as `vegardit/openldap`, not `vegardit/docker-openldap` (that is the GitHub repository name). The wrong name made `docker run` fail with "pull access denied / repository does not exist". Fix both the input default and the env fallback.
- Parametrize summary.sh: the server name, version and image are now arguments per server (`<name> <statistics.json> <version> <image>` x2), replacing the hardcoded "OpenLDAP"/"OpenDJ". The report title, tables and chart legends/labels are all driven by the passed names, so the script can compare any two LDAP servers.
- Move the benchmark-specific Notes out of the script into the workflow's "Build job summary" step (appended to $GITHUB_STEP_SUMMARY after the generic report).
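The new calling convention could be handled roughly like this. The function name, variable names and title format are illustrative assumptions, not taken from summary.sh; only the argument order comes from the description above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical argument handling for:
#   summary.sh <name> <statistics.json> <version> <image>  (given twice)
parse_servers() {
  if (( $# != 8 )); then
    echo "usage: summary.sh <name> <statistics.json> <version> <image> x2" >&2
    return 2
  fi
  A_NAME=$1 A_STATS=$2 A_VERSION=$3 A_IMAGE=$4
  B_NAME=$5 B_STATS=$6 B_VERSION=$7 B_IMAGE=$8
}

# Placeholder arguments; real runs pass the two statistics.json paths,
# the captured version strings and the image references.
parse_servers "Server A" a/statistics.json "version A" image-a:tag \
              "Server B" b/statistics.json "version B" image-b:tag
# Illustrative title only; the real report wording may differ.
printf 'LDAP benchmark: %s vs %s\n' "$A_NAME" "$B_NAME"
```

Keeping the two servers as symmetric argument groups is what lets the same script serve both the OpenDJ-vs-OpenLDAP run and a comparison of two OpenDJ images.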
Add a "Build vs Release" benchmark to the build-docker and build-docker-alpine
jobs: run the LDAP benchmark against the freshly built image and the latest
released image, and publish the comparison in the job summary to catch
performance regressions per push/PR (200 threads / 150s).
- New reusable helper .github/benchmark/compare-opendj.sh runs benchmark.jmx
against two OpenDJ images sequentially (one container at a time on port 1389:
start, seed ou=People, capture version, run, stop) and renders the comparison
via the generic summary.sh. Both sides are OpenDJ, so no per-server
hashing/index setup is needed (identical default password scheme).
- Each job gets a sparse `actions/checkout` of .github/benchmark (these jobs have
no checkout and the build artifact does not include those files) plus a JMeter
cache. build-docker compares localhost:5000/<repo>:<release_version> ("Build")
vs openidentityplatform/opendj:latest ("Release"); build-docker-alpine compares
the corresponding -alpine / :alpine tags.
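The per-image sequence (start, seed, capture version, run, stop) needs the server to be accepting connections before seeding. A minimal sketch of such a readiness wait, assuming a simple poll loop; `wait_until` is a hypothetical helper and not necessarily how compare-opendj.sh does it:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: retry a probe command once per second until it
# succeeds or the attempt budget runs out.
wait_until() {
  local attempts=$1; shift
  local i
  for ((i = 0; i < attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# In the benchmark the probe would be an LDAP read against port 1389, e.g.
# (assumed invocation; requires that an anonymous rootDSE read is allowed):
#   wait_until 60 ldapsearch -x -H ldap://localhost:1389 -s base -b "" \
#     '(objectClass=*)' fullVendorVersion
# Stand-in probe so the sketch runs anywhere:
wait_until 3 true && echo "server ready"
```

Since only one container runs at a time on port 1389, the same probe works unchanged for both images.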
maximthomas approved these changes on Jun 24, 2026
- Rename the throughput unit from "ops/s" to "tests/s" in the Totals table column, the throughput section heading, and the chart title/dataset label/alt text (per-operation latency stays in ms). The value is unchanged: it is still the JMeter Total throughput (total samples/sec), just relabeled.
- Flip the Total throughput chart to a horizontal bar chart (type:horizontalBar), so the two servers sit on the vertical axis and throughput runs along the horizontal axis; widen it to suit horizontal bars.
…hart
- build.yml: upload the build-vs-release benchmark outputs (JMeter HTML reports, *.jtl, docker logs, *.jmeter.out) as artifacts with `if: always()` and 90-day retention in both build-docker and build-docker-alpine, so benchmark failures are diagnosable from the run.
- compare-opendj.sh: stop discarding JMeter output (write it to <slug>.jmeter.out) and print a per-image error breakdown (count | op | code | message) parsed from the .jtl to the step log.
- summary.sh: revert the Total throughput chart back to vertical bars (keep the tests/s label).
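The per-image error breakdown can be approximated with a short awk pass over the .jtl. This sketch assumes JMeter's default CSV layout (label, responseCode, responseMessage and success in columns 3, 4, 5 and 8) and fields without embedded commas; `jtl_error_breakdown` is a hypothetical name, and the real parser in compare-opendj.sh may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "count | op | code | message" for failed samples, most frequent first.
jtl_error_breakdown() {
  awk -F',' '
    NR > 1 && $8 == "false" { n[$3 " | " $4 " | " $5]++ }   # skip header row
    END { for (k in n) printf "%d | %s\n", n[k], k }
  ' "$1" | sort -rn
}

# Usage (the path is a placeholder):
#   jtl_error_breakdown build.jtl
```

A response message that contains a comma would shift the columns, so a stricter CSV parser is the safer choice if such messages show up.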
Adds a "Build vs Release" performance benchmark to the Docker build jobs in
build.yml. On every push/PR, after the existing functional smoke tests, the
freshly built OpenDJ image is benchmarked against the latest released
image and the comparison is published in the job summary, so performance
regressions are caught at build time.
This builds on the OpenDJ-vs-OpenLDAP benchmark already on
features/performance (reusing benchmark.jmx and the now-generic summary.sh).
What's added
- .github/benchmark/compare-opendj.sh
- .github/workflows/build.yml: build-docker and build-docker-alpine
How it works
compare-opendj.sh <A_name> <A_image> <B_name> <B_image> (env THREADS=200, DURATION=150):
- Install ldap-utils/jq and Apache JMeter (cached).
- For each image: start the OpenDJ container, wait for readiness, seed ou=People, capture the version (fullVendorVersion), run benchmark.jmx, stop the container.
- Render the comparison with summary.sh (<name> <statistics.json> <version> <image> ×2) into $GITHUB_STEP_SUMMARY: versions, totals, per-operation p99 latency, and QuickChart bars.
Both sides are OpenDJ, so no per-server hashing/index setup is needed: identical
product ⇒ identical default password scheme ⇒ hashing parity is automatic.
Wiring in build.yml
Both build-docker and build-docker-alpine gain:
- a sparse actions/checkout of .github/benchmark (these jobs have no checkout, and the build artifact does not include those files);
- build-docker: Build = localhost:5000/${GITHUB_REPOSITORY,,}:${release_version} vs Release = openidentityplatform/opendj:latest;
- build-docker-alpine: Build-alpine = …:${release_version}-alpine vs Release-alpine = openidentityplatform/opendj:alpine.
Isolation
- build-docker and build-docker-alpine are separate jobs, each on its own ephemeral runner (own registry:3, ports and containers), so they never contend.
- Within a job the benchmark runs after the functional tests (whose containers are already killed) and uses one OpenDJ container at a time on port 1389.
Notes
- Checked with actionlint and bash -n; full end-to-end runs only on CI.