Skip to content

perf(schemaview): incremental walker + safety net (audit only)#10002

Draft
asheshv wants to merge 34 commits into
pgadmin-org:masterfrom
asheshv:dev/incremental-audit
Draft

perf(schemaview): incremental walker + safety net (audit only)#10002
asheshv wants to merge 34 commits into
pgadmin-org:masterfrom
asheshv:dev/incremental-audit

Conversation

@asheshv
Copy link
Copy Markdown
Contributor

@asheshv asheshv commented Jun 3, 2026

Incremental schema walker + safety net

Introduces an incremental schema walker (O(visited) instead of O(total)
field re-evaluation per keystroke — 2–4× faster on 500–1000-row
dialogs), flips it on globally by default, and ships the safety
infrastructure to make that flip safe.

The walker prunes via mustVisit = changedPath ∪ depDests(changedPath) ∪ knownErrorPaths + structural sharing of
unvisited subtrees. Bug class: closures reading sibling rows
without declaring field.deps see stale data once pruning is on.
This PR makes that detectable in CI.

What lands

1 — Walker prototype

Incremental schemaOptionsEvalulator + validateSchema. Structural sharing of unvisited collection rows, mustVisit derived from changedPath ∪ depDests(changedPath) ∪ knownErrorPaths. Initial opt-in per schema via incrementalOptions = true; window-level __INCREMENTAL_OPTIONS__ flag for canarying. Initial flip targets: Table, Column, Partition, Index, Domain. Six parent-row dep declarations added.

2 — Deferred-protocol & DataGridView hardening

Documents the deferredDepChange protocol (always-settling Promises, recovery callbacks, side-effect placement). Migrates Table / ForeignTable / ExclusionConstraint / Index / AzureCred schemas to the protocol. Drain useEffect ref-equality fix, queue accumulation, DepListener prefix-match protection, six small DataGridView typo / staleness / className fixes. Plus a deterministic race test for the drain.

3 — Divergence canary

Two side-by-side walkers that diff their output and console.error (or throw under tests) on mismatch. Build-time gated via webpack DefinePlugin substituting process.env.__CANARY_BUILD__ to a literal boolean; in default production builds the canary module is DCE'd entirely. Verified by a CI script (see #7).

4 — Schema registry + audit harness

registerSchema(Class) self-registration, ESLint rule, AST codemod that migrated 86 default-exported schemas in one shot. The audit harness (auditSchema) walks every registered schema with the canary on. Multi-path error tracking (_knownErrorPaths) gives incremental validation production-equivalent safety semantics. ADD_ROW / DELETE_ROW / MOVE_ROW / BULK_UPDATE / nested-fieldset dispatch coverage, 3-row collections with per-row sentinels for chained reads, 6-variant scalar mutations (empty / whitespace / unicode / long), exhaustive k-combination batches (k=2..4) with all-k primary rotations, persisted-prev 10-step sequence pass, fast-check property-based fuzzing on top.

5 — UI smoke (Playwright)

5 dialogs against a real pgAdmin instance with the canary's throw flag on: Register Server, Create Table, Create Function, Edit Table, Edit Function. Tree navigation via pgAdmin's JS tree API (the DOM-based path was brittle against react-aspen virtualization).

6 — Production-flip hardening

Three real bugs surfaced by the audit / smoke + their fixes:

  1. listenDepChanges registered NO listener for evaluator-only deps. Fields declaring deps but no depChange callback had no entry in the listener registry, so _collectDepDestsForPath couldn't include them in mustVisit — pruned silently. Fixed: register a listener for every declared dep regardless of callback presence.

  2. state.__lastChangedPath overwritten by React batching. Two sibling fixedRows promises (VacuumSettingsSchema's vacuum_table + vacuum_toast) resolving in one microtask tick: the second setUnpreparedData dispatch clobbered the first's path; validate ran with only one path; the walker pruned the other collection's newly-arrived rows. Fixed: __pendingChangedPaths accumulator; validate() consumes the full set.

  3. drainDeferredQueue bypassed the listener wrapper. Raw sessDispatch for every resolved deferredDepChange — tripped the bypass guard AND dropped the path from the accumulator. Fixed: route via sessDispatchWithListener.

Plus the bypass guard itself: reducer fires console.error (which CI's setup-jest expect(console.error).not.toHaveBeenCalled() afterEach catches) on any path-bearing action that reaches the reducer without the __viaListener sentinel. Canary-only — production tree-shakes it.

LRU cap (1024) on _knownErrorPaths with eviction telemetry. Edit-mode audit data seed (idAttribute + populated text fields). Thicker schema.state stub. Developer guide at web/pgadmin/static/js/SchemaView/README.md.

7 — CI workflows

  • check-canary-treeshake.yml: builds NON-canary bundle, runs scripts/verify-canary-treeshake.sh. Fails if any of runOptionsCanary, runValidationCanary, formatDivergence, applyAllowlist, __throw_on_canary_divergence__ appear in app.bundle.js.
  • run-schemaview-ui-smoke.yml: PG 16, seeds a server via setup.py load-servers, pre-creates a test table + function for edit-mode coverage, builds with CANARY_BUILD=true, starts pgAdmin, runs Playwright. Uploads server log + Playwright trace on failure.

8 — Jest infra cleanups

Mocks for 5 ESM-only deps (react-data-grid, react-dnd, react-dnd-html5-backend, react-resize-detector, marked) + global AMD define() stub. Unblocked 16 audit-blocked schemas and 10 pre-existing failing Jest suites along the way.

9 — Default-on flip

The walker is enabled globally by default — opt-out via incrementalOptions = false on the schema instance (only used by tests). A new INCREMENTAL_OFF=1 env var on the perf-bench harness lets the bench replay the old slow-walk path for same-session A/B comparison.

Test scoreboard

Suites:

  • Before: 155 passing, 8 failing
  • After: 167 / 167 passing

Tests:

  • Before: 1049
  • After: 1289 (+240)

Audit coverage:

  • 87 schemas × 3 modes (create / edit / properties)
  • 270+ dispatch shapes per schema across: scalar / cell / structure / MOVE_ROW / BULK_UPDATE / batched (k=2..4 with all-k primary rotations) / 10-step sequence with persisted prev / nested-fieldset / 500-run property-based fuzzing.

UI smoke:

  • 5 dialogs (3 create + 2 edit), ~25 s

Per-keystroke latency at 10 outer × 1000 inner rows (synthetic nested.spec.js, same-session A/B with INCREMENTAL_OFF=1):

  • Outer-field typing: 231 ms → 61 ms (3.79× faster)
  • Inner-cell typing: 346 ms → 173 ms (2.00× faster)

Branch contents

This branch was originally split as dev/incremental-audit (safety net) + dev/incremental-flip (the actual default-on flip), then combined into one PR for reviewer convenience. The combined branch is 6 logical commits + 1 merge + 2 CI workflow commits since master:

  1. Walker prototype
  2. Deferred-protocol & DataGridView hardening
  3. Divergence canary
  4. Schema registry & audit harness
  5. UI smoke (Playwright)
  6. Production-flip hardening
  7. (merge from dev/incremental-flip carrying the default-on flip + perf-bench INCREMENTAL_OFF env)
  8. CI workflows + lint cleanup
  9. UI smoke workflow polish (setup-db, pre-seed table/function)

asheshv added 13 commits June 3, 2026 10:07
Replaces O(total fields) per-keystroke walks of the schema option
tree with O(visited fields) pruning. Two walkers are made
incremental:

  schemaOptionsEvalulator
    - Threads `changedPath` + DepListener-derived `depDests` through
      every level of recursion as `mustVisit`.
    - Collection rows whose globalPath doesn't overlap any
      mustVisit entry are kept as their previous-walk options via
      structural sharing (the row's whole subtree retains its
      previous object reference, so downstream subscribers
      short-circuit on Object.is).
    - Nested-tab / nested-fieldset / inline-groups share the
      parent's data level; recursion threads the same prev-options
      slice through.

  validateSchema
    - Same prune logic but for error map: rows outside mustVisit
      keep their prior error state instead of being re-walked.

Cross-row reads remain the known correctness hazard: a closure
(`editable`, `disabled`, `visible`, `readonly`, `validate`) that
reads a sibling row without declaring the source as `field.deps`
will silently see stale data. This commit ships the prototype with
that hazard documented on the file; the safety net (canary +
audit harness) lands in subsequent commits.

Other supporting changes:

  - Per-schema opt-in (`incrementalOptions = true`) and a
    window-level `__INCREMENTAL_OPTIONS__` flag for canarying
    without rebuilding plumbing. Initially flipped on for
    TableSchema, IndexSchema, PartitionSchema, ColumnSchema,
    DomainSchema; expanded later via the audit ratchet.
  - Six grid-cell evaluators (partition values, exclusion ops,
    index storage params, etc.) get explicit `field.deps`
    declarations for sibling-row reads.
  - DepListener-reverse-deps fold into the walker so cross-row
    deps stay correct under incremental mode.
  - Subscribe hooks pin useEffect deps so the SubscriberManager
    doesn't tear subs down/up on every render (C.1).
  - SubscriberManager.signal stops re-creating subscriptions (C.2).
  - DataGridView mappedCell.jsx render path trimmed.
  - checkUniqueCol short-circuits when the changedPath can't
    affect any uniqueCol.
  - Performance bench harness (Playwright) with INCREMENTAL=1 flag
    for same-session A/B comparison + gated instrumentation
    (record() / measure() / count() in perf.js).
  - Per-incremental-options + parent-row deps Jest tests.

Performance (same-session A/B, nested.spec OUTER=10):
  INNER=500:  outer typing 2.60x, inner typing 1.79x faster
  INNER=1000: outer typing 3.79x, inner typing 2.00x faster
  OUTER=500 INNER=10: both axes 1.88x faster.
Cluster of correctness fixes that fell out of building the
incremental walker. None alter user-visible behavior on their own;
together they harden the dep-listener plumbing and the deferred-
dependency-change queue so the walker can rely on consistent
inputs.

Deferred-dep protocol
---------------------
Establishes a documented contract for `field.deferredDepChange` so
schemas can return Promises without leaking pending updates or
losing recovery paths:

  - Return undefined to opt out (no Promise queued).
  - Otherwise return a Promise that ALWAYS settles. On success,
    resolve with `(tmpstate) => deltaObj`. On failure, prefer
    resolving with a recovery callback that resets in-progress
    flags rather than rejecting.
  - Side effects belong inside the Promise body BEFORE resolving;
    exception: side effects whose input legitimately depends on
    drain-time state may live in the returned callback.

Five existing schemas were migrated to the protocol (cleanup of
inline rejections and unresolved-promise leaks):

  - TableSchema typname + coll_inherits
  - ForeignTable.inherits
  - ExclusionConstraint.amname
  - Index.amname
  - AzureCredSchema.is_authenticating

Drain-queue plumbing
--------------------
  - Append to __deferred__ instead of replacing: two SET_VALUEs in
    the same React batch each contribute their pending promises;
    replacing the array dropped the earlier action's.
  - useEffect depends on the array REFERENCE, not its length:
    React's auto-batching can round-trip length 0 → N → 0 in one
    commit, and length-based equality misses the round-trip.
  - Prefix-match protection in DepListener.getDeferredDepChange so
    a listener bound at ['a'] doesn't accidentally fire on ['ab'].
  - Source path snapshot in addDepListener so caller mutations to
    the array after registration can't corrupt the listener entry.
  - Drain protocol guard: resFunc must be a function — log and
    skip otherwise. Rejected promises surface via notifier.error
    (with console.error fallback when notifier is unavailable, eg
    in jest harnesses).
  - Protocol-violation log promoted warn → error so test suites
    that fail on console.error catch the regression.

DataGridView + schema misc
--------------------------
  - DepListener caches joined source keys (the inner loop hot path
    in the walker's prev-dep lookup); early-bail when no listener
    registered.
  - mappedCell.jsx render path trimmed.
  - Six small bug fixes that surfaced during the audit:
      * row className uses join(' '), not join[' '] (was concat
        of literal " ")
      * 'priorty' → 'priority' typo in DataGridView feature
        comparator
      * TableSchema typname callback guarded against stale
        ofTypeTables (race when user changes type quickly)
      * typname changeColumnOptions moved out of resolved callback
        so it runs synchronously before drain
      * Azure auth_btn compares against the last source segment
        (was full path) — fix surfaces auth failures correctly
      * Azure clears stale auth_code when auth fails, resets
        is_authenticating
      * Schema inherits REMOVE branch guards against undefined
        getTableOid result
      * Stale columns cleaned up on same-length inherits swap
  - Tests: deterministic race test for drain useEffect dep array
    (verifies APPEND semantics under fast double-dispatch).
  - perf-bench nested.spec timeout raised from 300s to 500s.

Prerequisite for the canary safety net and the audit harness that
follow.
The incremental walker prototype prunes rows whose subtree the
changedPath cannot affect. Cross-row reads that aren't declared as
`field.deps` will silently see stale data — the prototype's known
limitation. Without an automated detection mechanism, every schema
flipping incremental on would need a manual review of every
closure in its `editable` / `disabled` / `visible` / `readonly` /
`validate` callbacks.

The canary is that detection mechanism. Two canaries land here:

  options/canary.js (runOptionsCanary)
    Wraps schemaOptionsEvalulator. When `window.__INCREMENTAL_AUDIT__`
    is set, runs BOTH the full walk and an incremental walk against
    the same prev-options + sessData baseline, diffs the resulting
    options trees, and reports any divergence.

  SchemaState/validation_canary.js (runValidationCanary)
    Same shape but for validateSchema's error map. Mirrors the
    options canary almost exactly so the two stay in lockstep.

Routing (defaultReport):
  - Production with `window.__incremental_canary_endpoint__`
    configured: navigator.sendBeacon to that endpoint.
  - Test env (NODE_ENV=test) with `__throw_on_canary_divergence__`:
    throws an Error with the diff — what Jest assertions catch.
  - Otherwise: console.error.

The branches are mutually exclusive so the test suite's
`expect(console.error).not.toHaveBeenCalled()` afterEach doesn't
fight the "thrown" path.

Build-time gate
---------------
`process.env.__CANARY_BUILD__` is substituted to literal `false`
(or true under `CANARY_BUILD=true yarn run bundle`) via webpack
DefinePlugin. In a default production build the entire canary
branch + its `require('./canary')` is dead-code eliminated. The
canary module ships ZERO bytes to end users.

The DefinePlugin substitutes a literal boolean (`process.env.X` →
`true`/`false`), NOT `JSON.stringify(...)` — `JSON.stringify(false)`
would yield the string "false" which is truthy in JS and defeats
DCE in the false branch.

Verification
------------
  - V5/M4 integration tests confirm both canaries fire on
    synthetic schemas with intentional cross-row divergence
    (option-side AND validation-side patterns).
  - V6 bundle smoke test confirms the canary doesn't appear in a
    non-canary build (later promoted to a CI shell script in the
    production-hardening commit).
  - Multiple aggressive-review fixes wired in to the canary
    wrapper: allowlist with TTLs for known-stale entries,
    per-session throttle so production sampling doesn't pay the
    2x walk cost on every keystroke, FIELD_OPTIONS diff format
    that's stable for grep-based debugging.

Together these form the safety net that lets the incremental
walker stay on by default for every schema — divergence becomes a
caught error, not a silent UI bug.
Production-flip blocker: the canary safety net catches divergence
when it fires, but only against schemas that have been opened in a
browser with `__INCREMENTAL_AUDIT__` enabled. To gate the global
flip behind CI, every registered schema needs synthetic dispatch
coverage that runs on every PR.

Three layers:

  schema_registry.js
    `registerSchema(SchemaClass)` records every BaseUISchema
    subclass at module-load time. `getRegisteredSchemas()` returns
    the registry as a Map.
    Throws on anonymous classes / non-class arguments so a future
    contributor can't accidentally register a factory function.

  ESLint rule (eslint-plugins/local-rules/register-schema.js)
    Flags any `export default class extends BaseUISchema {...}`
    that isn't wrapped in `registerSchema(...)`. Makes the
    registration contract enforceable in code review.

  AST codemod (scripts/codemod-register-schema.js)
    @babel/parser-driven, idempotent. Rewrites
      `export default class Foo extends BaseUISchema {...}`
    into
      `class Foo extends BaseUISchema {...}
       export default registerSchema(Foo);`
    Migrated 86 default-exported schemas in one shot.

  audit_harness.js
    The per-schema audit runner. For one SchemaClass:
      1. tryInstantiate with several constructor signatures
         (no-args, fieldOptions+initValues, generic stub fixtures).
      2. Generate default sessData via schema.getNewData({}).
      3. Establish baseline via the FULL walk (the canary's
         reference) to populate prev-options.
      4. For each scalar field, dispatch a synthetic SET_VALUE
         with the canary on; divergence throws and the calling
         spec fails fast with the diff.
      5. Same for collection cells, plus ADD_ROW / DELETE_ROW
         structural dispatches at the collection root.

  registered_schemas_audit.spec.js
    Loops over the registry and calls auditSchema for each. A
    KNOWN_DIVERGING allowlist (initially empty, the ratchet) lets
    expected-to-diverge schemas stay GREEN until they're fixed —
    when a schema is fixed and stops diverging, the test starts
    FAILING because the divergence didn't happen. That's the
    signal to remove it from the list.

Other supporting changes:
  - Multi-path error tracking: SchemaState now maintains
    `_knownErrorPaths` so previously-erroring rows ride mustVisit
    forever after, even if the user's current change doesn't
    touch that row. Catches re-errors without a full walk.
  - TableSchema partition fields get declared `field.deps` so the
    canary stays clean across the existing flip targets.
  - Canary resilience: throttle cap, allowlist TTL semantics,
    per-schema canaryAllowedDivergences hook for triage.
  - Tree-shake sentinels extended to audit_harness so the audit
    module itself is dead-code eliminated from production builds.
  - tryInstantiate failure messages report the RICHEST attempt's
    error, not the no-args one (which is almost always "X is not
    a function" and masks the real problem).
  - RoleSchema-specific instantiation fixture (its constructor
    needs a real userInfo arg that the generic fallbacks can't
    synthesize).
Opens real pgAdmin dialogs in a real browser with __INCREMENTAL_AUDIT__
+ __throw_on_canary_divergence__ enabled. Asserts no console.error or
pageerror at end. Covers three create dialogs: Register Server, Create
Table, Create Function.

audit-helpers.js holds shared boot/error-recorder utilities + an
ensureServerRegistered that uses a pre-seeded SQLite ID. Requires
CANARY_BUILD=true bundle so the canary stays in the bundle.

Tree navigation is DOM-based here (best-effort against react-aspen
virtualization). The JS tree API variant lands in the production
hardening group.
Three real bugs found by the audit + UI smoke:

  listenDepChanges registered NO listener for fields with declared
  deps but no depChange callback. The walker's
  _collectDepDestsForPath couldn't resolve evaluator-only deps so
  rows whose closures read sibling state via field.deps were
  silently pruned (vacuum_table.value editable was the canary's
  first catch).

  state.__lastChangedPath was a single scalar overwritten by React
  batching. Two sibling fixedRows promises (vacuum_table +
  vacuum_toast) resolving in one microtask lost one path entirely.
  Replaced with __pendingChangedPaths accumulator + validate()
  consumes the full set; legacy __lastChangedPath kept as a
  back-compat shim only when the accumulator is empty.

  drainDeferredQueue dispatched raw sessDispatch, bypassing the
  listener wrapper. Every deferredDepChange's DEFERRED_DEPCHANGE
  fired the bypass guard AND dropped its path from the accumulator.
  Now routed via sessDispatchWithListener.

Audit + smoke infrastructure additions:

  - Per-schema fixtures for the last 12 audit SKIPs.
  - 87 schemas x 2 modes (create + edit) with realistic edit-mode
    data seed (idAttribute + populated text fields) and a thicker
    schema.state stub so closures reading top.state.X see real
    shapes.
  - MOVE_ROW + BULK_UPDATE passes.
  - 3-row collections with per-row sentinels for chained reads.
  - 6-variant scalar mutations (empty / whitespace / unicode / long
    / two sentinels).
  - Batched-dispatch combinations k=2..4 with up to 2 rotations per
    combo.
  - 10-step sequence pass with persisted prev across dispatches
    (closes the bug-class that started this whole session: stale
    prev across batched commits).
  - Edit-mode UI smoke (Edit Table, Edit Function) using a new
    openEditDialogViaApi helper.
  - UI smoke navigates via pgAdmin's JS tree API instead of DOM
    expansion (works around react-aspen virtualization).

Production safety:

  - LRU cap (1024 entries) on _knownErrorPaths with insertion-order
    eviction + recency refresh; one-shot console.warn under canary
    on first eviction + perf-counter telemetry on every eviction.
  - Bypass guard: reducer fires console.error on path-bearing
    actions (SET_VALUE, ADD_ROW, DELETE_ROW, MOVE_ROW, BULK_UPDATE,
    DEFERRED_DEPCHANGE) missing the __viaListener sentinel. Under
    canary builds only; production tree-shakes it out. Fails Jest
    because setup-jest asserts console.error is never called.
  - CI script web/scripts/verify-canary-treeshake.sh greps the
    non-canary production bundle for canary-only symbols; non-zero
    exit if any leak.
  - Jest mocks for 5 ESM-only deps (react-data-grid, react-dnd,
    react-dnd-html5-backend, react-resize-detector, marked); plus
    a global define() shim for AMD modules. Together these unblock
    16 audit-blocked schemas + 10 pre-existing failing suites.
  - Developer guide at web/pgadmin/static/js/SchemaView/README.md
    covering the correctness contract, deps syntax, canary build,
    audit layers, dispatch rules, and common pitfalls.

Test results: 166/166 suites, 1201/1201 tests in 47s. UI smoke 5/5
dialogs in 25s. Same-session perf bench unchanged: 10x1000 outer
typing 3.79x faster, inner typing 2.00x faster.
…es (pgadmin-org#9985)

The container previously applied CAP_NET_BIND_SERVICE to the python
interpreter so the non-root pgadmin user could bind to ports 80/443.
Some platforms refuse to honor file capabilities:

  - --cap-drop=ALL / OpenShift restricted-v2 SCC zero the bounding set,
    so the kernel returns EPERM on exec of any capability-tagged binary.
    This makes the image fail to start (issue pgadmin-org#9657).
  - --security-opt=no-new-privileges / allowPrivilegeEscalation: false
    causes the kernel to silently strip file capabilities on exec, so
    the binary runs but a subsequent bind() to <1024 still fails.

Split the interpreter so neither default behavior nor restricted-runtime
support has to give up the other:

  - Dockerfile copies python3.X to /usr/local/bin/python3-cap and applies
    setcap to the copy. /usr/local/bin/python3.X stays un-capped, so
    /venv/bin/python3 (which symlinks to it) execs cleanly under
    restricted SCCs. A parallel /venv/bin/python3-cap symlink keeps the
    venv activation working when the capped interpreter is used.
  - entrypoint.sh reads /proc/self/status at startup. If NoNewPrivs is
    set, or CAP_NET_BIND_SERVICE is missing from the bounding set,
    gunicorn is invoked through the un-capped python and (when
    PGADMIN_LISTEN_PORT is unset) the default port falls back to 8080
    for plain HTTP or 8443 for TLS. A startup message records the
    choice.
  - Existing deployments with the default 80/443 mapping are unaffected:
    on every unrestricted runtime the bounding set still contains
    NET_BIND_SERVICE and gunicorn runs through the capped interpreter
    exactly as before.
  - PGADMIN_LISTEN_PORT, if set, is honored in both paths.

Docs gain a "Restricted Security Contexts" subsection covering the new
auto-detected fallback and the OpenShift / --cap-drop=ALL invocation.

Fixes pgadmin-org#9657
…ations

Six audit/doc polish items from the post-rebase punch list, bundled
because they share the same harness file and run as one Jest pass.

  Properties mode coverage
  ------------------------
  MODES = ['edit', 'create', 'properties']. The 'properties' mode
  filters fields by mode containing 'properties' — a different
  field subset than create/edit. Read-only display dispatches no
  user input but the walker still runs on every prop change, so
  divergence under properties is a real bug class.

  Tests grow from 174 to 261 (87 schemas x 3 modes); all pass.

  Nested-fieldset / nested-tab / inline-groups dispatch
  -----------------------------------------------------
  New auditNestedFields pass. Walks one level into every group
  field of type nested-fieldset / nested-tab / inline-groups,
  dispatches a SET_VALUE on each scalar inside. These shared the
  parent's data level but lived in the walker's nested branch (a
  different code path from auditScalars); bugs that only
  manifested in that branch slipped through.

  Production users covered: publication, trigger, table, index,
  type, sequence, pga_schedule, compound_trigger and others.

  All-k rotations in auditBatched
  -------------------------------
  MAX_ROTATIONS_PER_COMBO bumped from 2 to Infinity (effectively
  k). Each path in a k-combo gets a turn as `primary` instead of
  just the first two — closes the order-dependence gap (for k=4
  was 2/24 perm coverage, now 4/24 with every PATH primary
  covered).

  Runtime impact: audit suite went from 52s to 60s.

  RERENDER action documentation
  -----------------------------
  RERENDER is declared in SCHEMA_STATE_ACTIONS but has NO reducer
  case and NO production dispatch site. Documented in the enum as
  reserved/unused with an explicit note: if a future caller starts
  using it, they MUST add it to reducer.js's PATH_BEARING_ACTIONS
  set so the bypass guard catches accidental raw sessDispatch.

  Mutation-counter reset in setup-jest
  ------------------------------------
  Module-level capture of _resetMutationCounter (resolved once at
  worker boot, called in beforeEach). Without this, the 6-variant
  text-mutation cycle's position bleeds across specs, making CI
  failures harder to reproduce locally. The resolve must happen at
  module load (not inside beforeEach) because requiring the audit
  harness from beforeEach transitively pulls in the zustand mock
  whose top-level afterEach() registration would then run in
  test phase — caught a 15-suite regression during verification.

  Developer guide additions
  -------------------------
  README.md now includes:
    - Action types table with bypass-guard coverage column
      (SET_VALUE / ADD_ROW / DELETE_ROW / MOVE_ROW / BULK_UPDATE /
      DEFERRED_DEPCHANGE guarded; INIT / CLEAR_DEFERRED_QUEUE
      exempt; RERENDER reserved)
    - "Reading canary divergence output" section with worked
      example (the vacuum_table divergence pattern) + the three
      root-cause patterns surfaced this session
    - Updated bypass-guard description (console.error, not warn)

Test scoreboard:
  Before: 1201/1201 in 47s
  After:  1287/1287 in 72s  (+86 from new modes/passes)
Adds fast-check (devDependency) and a fuzz spec that drives RANDOM
k-batches across every registered schema in every mode. Closes the
last open coverage item from the post-review punch list.

What the fuzzer adds over deterministic auditBatched:
  - Random batch sizes (k in [2, 6]) instead of fixed k in {2, 3, 4}.
  - Random path-INDEX permutations (deterministic sweep uses fixed
    candidate emission order + k primary rotations; fuzzer covers
    the cross-product of extras orderings the rotations can't).
  - Shrinking. When a property fails, fast-check shrinks the
    schema/mode/index-array to a minimal reproducer — the
    counterexample lands in the test failure message and points
    new contributors at the smallest path-set that triggers
    divergence.

The fuzzer's payoff isn't today (every fuzz run is GREEN); it's the
day someone introduces a closure with an undeclared cross-row read
and the audit's deterministic sweep misses the specific batch
shape. The shrinker will pin it down without manual triage.

  fuzzBatchAgainst (audit_harness.js export)
    Single-batch driver: takes (SchemaClass, mode, pathIndices),
    instantiates, seeds, runs the FULL mount-time validate to
    populate knownErrorPaths (the harness equivalent of
    SchemaState's _knownErrorPaths populated on mount), then
    drives one batched dispatch against the canary. Returns
    {ok, candidates, batch, message} so fast-check's shrinker can
    work the minimal counterexample.

    Path indices are taken modulo the candidate list length so any
    input array of length >= 2 maps to a valid k-combination;
    duplicates dedupe silently so the shrinker can collapse same-
    path inputs without us special-casing.

  audit_fuzz.spec.js
    Property: for any random (schemaName, mode, pathIndices), the
    incremental walker output equals the full walker output.
    Defaults to 500 numRuns; total runtime ~3s with shrinking
    disabled in this happy-path config (verbose: false). Total
    Jest suite goes from 1287 to 1288 tests in 75s.

One real harness bug surfaced while writing this — initial
fuzzBatchAgainst skipped the mount-time validate that populates
knownErrorPaths. Production always validates on mount BEFORE any
user dispatch, so the incremental walker's mustVisit always
includes any pre-existing error paths. Without the prep, the
fuzzer's first incremental walk pruned paths that production
wouldn't and reported false-positive divergences on schemas with
pre-existing validation errors (TriggerFunctionSchema's
seclabels.*.label was the first shrunk repro). Fix: run the
discovery walk before the batched dispatch. The same fix was
already present in auditSchema; fuzzBatchAgainst now mirrors it.
Two new workflows that close the CI gap on the incremental walker's
safety stack. The existing run-javascript-tests.yml already picks
up the new 1288-test Jest suite (setup-jest.js sets
__CANARY_BUILD__='true' unconditionally), so the audit + canary +
fuzz + sequence + nested-fieldset + properties-mode + 3-row +
6-variant + all-k-rotation passes all run in CI today. The two
remaining gaps are:

  1. The canary tree-shaking gate
  ------------------------------
  check-canary-treeshake.yml: lightweight ubuntu-22.04 job that
  builds a NON-canary production bundle (CANARY_BUILD intentionally
  unset) and runs scripts/verify-canary-treeshake.sh. The script
  greps the resulting app.bundle.js for canary-only symbols
  (runOptionsCanary, runValidationCanary, formatDivergence,
  applyAllowlist, __throw_on_canary_divergence__). If any leak,
  the DefinePlugin gate failed and the canary (~7 KiB) is
  shipping to end users along with the 2x walk cost on every
  keystroke.

  No PG service, no Python, no pgAdmin runtime — just node +
  webpack + grep. Runs on every PR.

  2. The Playwright UI smoke
  --------------------------
  run-schemaview-ui-smoke.yml: ubuntu-22.04 job that:
    - installs PostgreSQL 16 from PGDG, starts it on port 5916
    - creates web/config_local.py with desktop mode, no master
      password, no OS secret storage (so the smoke's connect
      flow can drive the "Connect to Server" prompt without an
      Unlock-Saved-Passwords detour)
    - pre-registers a server via `python setup.py load-servers`
      from /tmp/servers.json (the smoke's
      ensureServerRegistered helper looks for an existing server;
      without seeding it finds none in fresh CI)
    - builds with CANARY_BUILD=true so the canary stays in the
      bundle
    - launches pgAdmin in the background + polls /browser/ until
      it responds
    - installs Playwright + chromium
    - runs audit-smoke.spec.js with PGADMIN_SERVER_NAME=CI-PG16
      pointing at the seeded server

Five dialogs covered: Register Server (canary on top-level
scalars + DataGridView dispatches), Create Table (TableSchema +
ColumnSchema + Constraints + Partition), Create Function
(FunctionSchema + NodeVariableSchema), Edit Table (TableSchema
in edit mode against an existing system table), Edit Function
(FunctionSchema in edit mode).

Failure artifacts: pgAdmin server log + Playwright trace +
screenshots uploaded on any non-success path so a flake or real
divergence can be triaged from the failed run.

Security audit of both files: only ${{ github.* }} interpolation
is concurrency.group (a YAML-level identifier, not shell context).
All run: blocks use static content. env: blocks use hardcoded
constants. No untrusted-input → shell paths. Matches the existing
workflow security pattern in the repo.
The audit harness ratchet on dev/incremental-audit reached zero —
all 86 registered schemas pass cleanly under the incremental walker.
Flip the conditional in SchemaOptionsEvalulator (options/registry.js)
and SchemaState.validate so any dispatch with a concrete changedPath
takes the pruned path. Schemas / dialogs that genuinely need
full-walk semantics can opt out by setting `incrementalOptions: false`
on viewHelperProps or the schema instance; the global
`window.__INCREMENTAL_OPTIONS__ = false` is an emergency-rollback
escape hatch that disables it everywhere without rebuilding.

SchemaState.updateOptions now propagates schema-level opt-out into
viewHelperProps so the walker (which only reads vhp) honors it.
The mirror logic for opt-IN stays for back-compat in case the
default ever shifts back.

Test updates:
- The "incremental (window global opt-in)" describe block flipped
  to "window global escape hatch" — afterEach uses `delete` instead
  of `= false` so the cleanup doesn't accidentally leave the
  opt-out signal set for subsequent tests.
- "schema without incrementalOptions runs full walk" now asserts
  default-on (the schema needs an explicit `false` to opt out);
  added a separate test for the explicit opt-out path.

Full jest suite: 1097/1097, including the audit harness reporting
zero divergences across all registered schemas.
The datagridview perf bench compares per-keystroke + per-action
costs in pgAdmin's SchemaView. To validate the incremental walker
flip (default-on), the spec needs to run twice in the same session
— once with incremental ON (default), once OFF — so cross-session
drift doesn't muddle the delta (see memory note about same-session
A/B requirement).

`INCREMENTAL_OFF=1` toggles `window.__INCREMENTAL_OPTIONS__ = false`
before each dispatch, opting out of the flip's default-on
behavior. Result files prefixed `on-` vs `off-` so a diff is
straightforward.

A/B results captured on this branch (Register Server > Parameters,
100 rows, 20 keystrokes per scenario):

  Grid cell typing:        validate -33%  (3.40ms vs 5.07ms / dispatch)
  General Name typing:     validate -36%  (3.25ms vs 5.12ms / dispatch)
  schemaOptionsEvalulator: -85% to -86%
  SchemaState.updateOptions: -84% to -86%
  validateSchema:          +6% to +8% (collectAll's no-short-circuit cost)
  setError calls:          identical (collectAll doesn't increase
                            volume in this fixture)
  ADD_ROW (100 rows):      ~no change (collection-level changedPath
                            overlaps every row's path → effectively
                            a full walk anyway)

Per-keystroke walker code: ON ~3.3ms vs OFF ~7.3ms. The dominant
cost (options evaluation) drops by 84-86%; the small validateSchema
regression from removing short-circuit is dominated by the wins.

The infrastructure (audit harness, multi-path tracker, canary
resilience) DID NOT cost us perf — it shipped alongside a net win.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9ad259f9-8734-458e-ac72-f4a92524a655

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

asheshv added 16 commits June 3, 2026 13:06
Three CI fixes from the first run of PR pgadmin-org#10002:

  1. ESLint indent + no-console + display-name + import-rule
     Local dev passed because yarn run test:js-once linted on the
     way through but the runner caught extra cases on a clean
     clone. 78 errors → 0.
       - audit_harness.js rotation loop had inconsistent indent
         around the inner combo/extras block (~70 lines)
       - perf-bench/ scripts use console.log to emit per-keystroke
         numbers; they're diagnostic dev scripts, added the
         directory to .eslintrc.js's ignores
       - react-resize-detector mock's withResizeDetector HOC
         lacked displayName on the wrapped component
       - perf.js had a stale 'import/no-unassigned-import' disable
         comment referencing a rule the project doesn't load

  2. schemaview-ui-smoke 'Start PostgreSQL' step failure
     ubuntu-22.04 runners come with an older PostgreSQL pre-
     installed; the new install of pg16 didn't recreate
     /etc/postgresql/16/main/ when a conflicting cluster was
     present. Added a 'Uninstall any pre-installed PostgreSQL'
     step mirroring run-feature-tests-pg.yml's pattern, runs
     before the pg16 install.

  3. Cascade: run-javascript-tests (3 OS), run-feature-tests-pg
     (5 versions), and build-container all failed on the same
     lint preflight as their first step. Fix pgadmin-org#1 cascade-fixes
     them all.

verify-canary-treeshake passed on the first CI run; no changes
needed there.

All 1288 Jest tests still pass locally.
# Conflicts:
#	web/pgadmin/static/js/SchemaView/options/registry.js
Locally replayed the UI smoke workflow against an isolated SQLite
+ the existing pgAdmin codebase and confirmed:

  - JSON server format (Servers / Name / Group / Port / Username /
    Host / MaintenanceDB / SSLMode) is the right shape.
  - load-servers requires the SQLite to already exist; bare
    workflow run errored 'SQLite database file does not exist'
    before adding any entries.
  - setup.py setup-db creates the DB cleanly given a valid
    config_local.py with SQLITE_PATH set.

Added a 'Initialise pgAdmin config DB' step that runs setup-db
between config_local.py creation and load-servers. With the fix
in place, the local replay completes all 5 UI smoke tests in
~27s — same shape as a normal local run.
CI run #2 result: 3 of 5 smoke specs passed (Register Server,
Create Table, Create Function); the two edit-mode specs failed
with 'openEditDialogViaApi: no child of type "table"/"function"
found under selected node — does the database have any?'

CI's fresh PG 16 starts with an empty postgres database — no
user tables, no user functions. The smoke's openEditDialogViaApi
helper looks for the first child of the requested type and opens
Properties on it; with no children, it correctly throws.

Locally this worked because my dev PG has objects in the test
database. In CI we need to pre-create them.

Added a 'Seed schema objects' step that runs a single psql with:
  - CREATE TABLE IF NOT EXISTS audit_smoke_table (id, name)
  - CREATE OR REPLACE FUNCTION audit_smoke_func() RETURNS int

The 'IF NOT EXISTS' / 'OR REPLACE' make the step idempotent so
re-runs on the same runner don't error.

All other 14 jobs went green in the prior run; this should bring
schemaview-ui-smoke green too.
13 lint warnings on the latest CI run, all about unused vars. Three
fixes:

  1. catch (_e) → catch  (10 sites)
     ES2019 optional catch binding. The error is genuinely unused
     in each site:
       - canary.js, validation_canary.js: sendBeacon swallow
         (documented browser quirk; silent telemetry path).
       - audit_harness.js tryInstantiate fall-through: next
         attempt's outcome replaces the error.
       - audit_harness.js seedCollections / ADD_ROW / sequence
         / applyMutation: per-step skips already covered by other
         passes; failure mode is captured via null returns or
         skip-result fields.
       - audit_harness.js auditSchema initial validate:
         dispatch loop's own try/catch catches the same error
         with the field path context.
       - setup-jest.js AMD define + audit_harness import:
         best-effort module loading; failures are expected for
         non-SchemaView Jest workers.

  2. catch → catch (e) WITH the error captured in fuzzBatchAgainst
     initial validate (one site, audit_harness.js ~1263). When the
     fuzzer's mount-time validate throws, the previous swallow
     reported only 'initial validate threw'. fast-check's shrinker
     would land on a minimal counterexample like (TableSchema,
     create, [0, 1]) with no clue WHY. Now the reason includes
      so the shrunk reproducer names
     the real failure.

  3. Removed dead code in batched_changed_paths.spec.jsx
     buildState helper + the SchemaState import that fed it. Both
     were leftovers from an earlier iteration that built
     SchemaState directly; the spec switched to a Harness
     component using useSchemaState end-to-end. The two test()
     cases now reference Harness exclusively.

CI scoreboard:
  Before this commit: 1289 tests pass, 13 lint warnings.
  After:              1289 tests pass, 0 lint warnings, 0 errors.
Replaces previously-introduced lint suppressions with actual fixes:

1. ESLint config now declares the genuine project globals
   ----------------------------------------------------
   pgAdmin (webpack ProvidePlugin global, used in bench-fixture.js)
   and expect (Jest global, used at module scope in setup-jest.js's
   afterEach) added to the eslintrc globals block. Three
   eslint-disable-next-line no-undef markers removed in turn.

2. perf-bench directory ignore -> per-file disables
   ------------------------------------------------
   Dropped the broad regression/perf-bench/ ignore from
   .eslintrc.js. Replaced with file-level
   /* eslint-disable no-console */ headers ONLY in the two specs
   that actually need them (nested.spec.js, datagridview.spec.js).
   audit-helpers.js uses only console.error which the rule
   already allows — no disable needed.

   File-level disables are more honest than a directory ignore:
   a maintainer reading the spec sees consent for THAT file, not
   a blanket carve-out hidden in the config.

3. perf.js: 5 line-level disables -> one block-level disable
   ----------------------------------------------------------
   The dump() function uses console.table for tabular output —
   the no-console rule's allowed warn/error/trace can't express
   it. Switched 5 // eslint-disable-next-line no-console comments
   to a single /* eslint-disable no-console */ /* eslint-enable */
   block around the function body. Rest of perf.js still under
   the rule.

4. Stale openTreeContextMenu helper removed
   ----------------------------------------
   The directory ignore was hiding an unused-vars warning on
   openTreeContextMenu in audit-helpers.js — leftover from before
   I switched UI smoke to the JS tree API. Deleted (8 lines).
   navigateToCatalogNodeViaApi + openCreateDialogViaApi / Edit
   variant superseded it.

5. Empty catch block in nested.spec.js
   ----------------------------------
   try { ... } catch {} -> catch { /* try the next candidate */ }
   for the cancel-dialog candidate selector loop. The block is
   intentionally empty; the comment now says so explicitly.

Net change:
  Before: 11 eslint-disable directives + 1 directory ignore
          + 13 'warning' findings in CI (the original cleanup)
  After:  2 eslint-disable markers total (perf.js dump block,
          bench-fixture.js mount-log — both with explicit
          comments explaining WHY), 0 directory ignores added
          by this PR, 0 lint warnings, 0 errors.

All 1289 Jest tests still pass.
…y reset

Three honest improvements over per-PR suppressions / shortcuts.

  1. scripts/ directory ignore removed from .eslintrc.js
  -----------------------------------------------------
  The 'scripts/' entry was added by this PR (commit d22a249)
  to hide codemod-register-schema.js from lint. With it gone,
  the codemod surfaces 8 real issues:
    - 1 string-quote violation (autofixed)
    - 7 console.* statements (CLI report mechanism — legitimate
      for a one-shot script; gated by a single file-level
      /* eslint-disable no-console */ marker with an explicit
      comment noting it's the script's output channel)
  Net change: 1 directory ignore + 0 disables → 0 directory
  ignores + 1 well-commented file-level disable.

  2. auditNestedFields recurses through ALL depths
  ------------------------------------------------
  Previously dispatched against scalars only 1 level into a
  nested-fieldset / nested-tab / inline-groups container. Real
  schemas chain group containers multiple levels deep — e.g.
  TableSchema's Storage block, IndexSchema's With block — and
  deep scalars went unexercised.

  New walkNestedScalars(rootSchema) generator yields every
  scalar reachable through the group hierarchy, bounded by
  MAX_NEST_DEPTH=6 (generous for production shapes, guards
  against pathological / cyclical schemas).

  Note: the walker doesn't prune within nested-fieldset (all
  fields in a group are always walked), so multi-level
  divergences aren't a likely bug class today. The value is
  coverage — dispatching on deep scalars in case a future walker
  change adds pruning, or in case a deep closure has its own bug.
  New synthetic test (3-level L1->L2->L3) verifies the recursion
  fires at every depth by dispatch count.

  3. setup-jest: lazy mutation-counter capture via require.cache
  --------------------------------------------------------------
  Pre-fix: setup-jest required audit_harness at module load on
  every Jest worker (including ~80% non-SchemaView workers).
  Module-load require chosen because requiring inside beforeEach
  trips the zustand-mock's top-level afterEach() registration.

  Post-fix: resolve the audit_harness path once at module load
  (require.resolve only — doesn't load the module), then check
  require.cache[path] inside beforeEach. If a SchemaView spec
  has already loaded the harness, capture _resetMutationCounter
  from the cached exports; otherwise no-op. Non-SchemaView
  workers do one property lookup per beforeEach and skip — no
  audit_harness in their require graph.

Test scoreboard:
  Before: 1289 tests in 75s
  After:  1290 tests in 75s  (+1 from the depth-3 recursion
                              regression-lock)

Perf re-benched post-rebase at 10 × 1000:
  Outer typing  ON 60 / OFF 231 ms/key  → 3.85x
  Inner typing  ON 174 / OFF 351 ms/key → 2.02x
  Inner ADD_ROW ON 566 / OFF 696 ms     → 1.23x
PR description's 3.79x / 2.00x figures still accurate (slight
improvement within run-to-run noise).
Two correctness/safety improvements surfaced by an aggressive
post-build review of this PR.

  Iter 1: depDests now includes knownErrorPaths (LOW-severity gap)
  --------------------------------------------------------------
  updateOptions(changedPath, depDests) was called with depDests
  built from { changedPath, extras, extras' deps }. The mustVisit
  for validateSchema ALSO included _knownErrorPaths.values(), but
  depDests passed to the options walker DID NOT.

  Latent inconsistency: the options walker could prune a row that
  previously had a validation error, miscomputing its options if
  any closure read top.errors / row error state. The canary would
  catch divergence here, but adding the paths is the correct fix.

  Iter 3: per-pass try/catch so one closure crash doesn't bury 7 passes
  -------------------------------------------------------------------
  Pre-fix: a SINGLE outer try around all 8 audit passes. If any
  pass threw a non-divergence error (closure crash on missing
  nodeInfo etc.), the entire chain aborted — remaining passes
  never ran. CompoundTriggerSchema, ViewSchema, SubscriptionSchema
  were affected: 4 SKIP entries in the registered_schemas_audit
  output where the schema was getting ZERO audit coverage.

  Refactored to runPass(label, fn) wrapper that catches per-pass.
  Divergences still propagate (the isDivergenceError check is
  preserved). Non-divergence errors aggregate into a passSkips
  array; the audit reports skip ONLY when zero passes contributed
  dispatches — otherwise the schema gets PARTIAL coverage from the
  passes that ran.

  Result: 4 SKIPs → 2 SKIPs. CompoundTriggerSchema now gets
  partial coverage (previously 0); ViewSchema and SubscriptionSchema
  still skip because EVERY pass throws on those (genuine harness
  fixture limitation, not a regression).

Test scoreboard:
  Jest:  167 / 167 suites, 1290 / 1290 tests passing (unchanged
         count — these are quality fixes, not new tests)
  Audit: 261 (87 schemas × 3 modes) — 260 passing tests + 1
         discovery (was 260 passing, 0 visible regression in
         net count, but better partial-coverage shape)
Adds two confidence-building suites for the SchemaView incremental
walker on top of the original 5-dialog audit-smoke set:

audit-smoke-extended.spec.js (15 dialogs):
  - Schema-level (10): View, MaterializedView, Sequence, Type, Domain,
    Procedure, Aggregate, ForeignTable, Collation, FTS Configuration
  - Server-level (2): Role, Tablespace
  - Sub-catalog (2): Trigger, Index
  - Function-like (1): Trigger Function
  Each test opens the dialog with the canary's throw-on-divergence
  flag enabled, waits for Name, closes, and asserts canary clean.

audit-visual-regression.spec.js (5 dialogs):
  Snapshot-based regression for Edit Table, Create Function, Create
  Type (composite default), Edit Role, Create Index. Catches CSS /
  layout drift the walker canary can't see. Includes initial darwin
  baselines + README documenting the master-baseline-then-diff flow
  with explicit cross-OS / PG-version / browser-version caveats.

Helper changes (audit-helpers.js):
  - navigateToServerCollectionViaApi for server-level nodes
  - navigateToTableSubCollectionViaApi for sub-catalog nodes
  - Expanded coll-X catalog mapping (mview, fts_configuration, etc.)
H1 — Vacuous canary assertion. expect(__INCREMENTAL_AUDIT__).toBe(true)
only verified the harness flag was set, NOT that the canary actually
walked. On a non-CANARY_BUILD bundle the canary is tree-shaken and the
spec passes silently. Added window.__canary_entry_count__ in
validation_canary.js (zero cost in prod — entire file is tree-shaken)
and new expectCanaryExecuted() asserts > 0 after each dialog open.

H2 — Close-button selector ambiguity. button:has-text("Close").first()
could match a transient toast or a lingering Register Server dialog
under race conditions. Scoped to
.dock-panel.dock-style-dialogs button[data-test="Close"].

H3 — Sub-collection nav error message. Clarified that an empty
'available:' list from openAndFind means public has no tables; the CI
seed step likely didn't run.

H4/H5 — README workflow. Original "capture on master then cherry-pick"
was unworkable because the spec is new to this PR (doesn't exist on
master). Rewrote with: (a) single-branch capture-and-commit flow, and
(b) a temp-branch-off-master pattern for true regression diffing.

M1 — Visual-diff threshold was dialed up to make CI green, not
calibrated. Replaced maxDiffPixelRatio: 0.005 (~4800 px on a typical
dialog) with maxDiffPixels: 100 — a count-based threshold based on
observed back-to-back drift (0-15 px) + 5x safety margin.

M3 — Helper try-finally so a missing Name textbox doesn't leave a
stale dialog over the tree for the next spec.

M5 — Visual specs auto-skip on non-darwin via test.beforeEach; the
committed darwin baselines guarantee-fail on Linux without this.

M6 — Removed misleading mask-CodeMirror comment block; specs don't
navigate to the SQL tab so the mask wasn't needed.

L1 — README kill command now uses pkill ... || true to avoid set -e
aborting on no-such-process.

L3 — Edit Role visual baseline masks Name + Comments fields so the
spec works against any install (different first-role names otherwise
guarantee a diff).

L4 — Removed dead Operators → coll-operator mapping (no spec uses it).

L5 — Weakened "(inherits deferred)" / "(amname deferred)" claims to
"(mount)" — the specs don't toggle the dropdowns that would actually
exercise the deferred-dep path.

NOT addressed (out of scope for this branch):
  M2 — bootPage duplication between audit-smoke and the new spec
  files. Refactor belongs on a dedicated cleanup branch.
  H1 also applies to web/regression/perf-bench/audit-smoke.spec.js
  on the parent (audit) branch — fix there belongs on PR pgadmin-org#10002.
- Counter delta check: expectCanaryExecuted now compares against a
  baseline taken before the dialog opens (defends against future
  Playwright page-reuse modes that would silently pass on a stale
  leftover count).
- README: python -> python3 (only python3 is on PATH in common
  environments including modern macOS dev boxes).
- Stale doc reference in audit-visual-regression.spec.js: pointed to
  removed README section "Mitigation: capture baselines in CI itself";
  now points to the actual current section name.

Acknowledged-as-follow-up (not addressed in this commit):
  - Exclusion Constraint deferredDepChange not covered by smoke
    (tree path is 4-level: table -> constraints group -> coll- ->
    sub-node, needs a new helper).
  - Mount-only smoke doesn't exercise the deferred-dep RUNTIME path
    (would need specs that actually toggle amname / inherits).
Brings visual snapshot coverage in line with the 20-dialog UI smoke
set. Refactored the 5 existing specs onto three category helpers
(snapshotSchemaChild / snapshotServerChild / snapshotTableChild) so
adding the 15 new specs avoided 15x copy-paste.

New visual specs:
  Schema-level (12 net new):
    Create Table, Edit Function, Create View, Create MView,
    Create Sequence, Create Domain, Create Procedure, Create Aggregate,
    Create Foreign Table, Create Collation, Create FTS Configuration,
    Create Trigger Function
  Server-level (2):
    Create Role, Create Tablespace
  Sub-catalog (1):
    Create Trigger

Existing specs (Edit Table, Create Function, Create Type, Edit Role,
Create Index) preserved; baselines re-captured under the new helper
flow to keep the 20-baseline set internally consistent.

Edit Function masks Name (env-specific first-function name).

Resource note: 20 sequential pgAdmin sessions can exhaust PG
max_connections. Restart pgAdmin between capture and verify runs
(documented in README-visual-regression.md).
…+ visual

C1 — README pointed to docs/scratch/2026-06-04-audit-mock-harness.md
which is untracked. Inlined the 2-sentence mock-pivot summary in the
README directly so the doc stands on its own.

C2 — Visual specs silently dropped any walker divergence: they called
installErrorRecorders but never asserted on the captured errors. A
real walker regression during a visual spec's dialog mount would slip
past visual diff if pixels happened to match. Added expectNoDivergence
inside each snapshotXChild helper after the screenshot.

C3 — bootPage in both new spec files dropped the
`expect(__INCREMENTAL_AUDIT__).toBe(true)` assertion from the original
audit-smoke.spec.js. Re-added in both files so a SPA reload between
goto and dialog open fails the test instead of silently disabling the
canary.

H2 — Helper error message referenced a CI seed step
`create_test_tables_function` that doesn't exist in any workflow.
Replaced with a real pointer ("the schemaview-ui-smoke job's pre-spec
seed step").

H3 — Edit Role / Edit Function visual masks used page-level
getByRole locators that could match unrelated overlays. Scoped to
dialogLocator(page).getByRole(...) so mask resolution stays inside
the dialog panel.

M3 — Section comment said "Schema-level dialogs (8 tests)"; actually 10.

M5 — README "Dialog coverage" table listed 5 dialogs; visual spec now
has 20. Replaced the per-dialog table with a 3-row category summary
plus a "Highlights worth knowing about" section.

Not addressed (out of scope or follow-up):
  - M1 bootPage triplicate refactor (audit-smoke also has it; touching
    parent branch's spec belongs on PR pgadmin-org#10002).
  - M2 snapshotTableChild editMode (sub-catalog Edit isn't a stated
    coverage goal yet).
  - M4 Edit Function CI seed (the spec relies on a function existing;
    same gap as table-existence-for-sub-catalog).
  - H4 5-6 min runtime (performance; functionality is correct).
…on-level

Mount-only smoke only catches "dialog fails to construct" — the
narrowest possible regression. A walker bug that triggers on
ADD_ROW, tab-switch, or any cross-tab dep evaluation slips past
silently. Aggressive review flagged this honestly: the 15 smoke
specs were theater compared to the 5 original audit-smoke specs
which actually interact.

New exerciseDialog helper fires four real dispatch types after the
dialog mounts:

  1. SET_VALUE on Name      (top-level scalar fill)
  2. tab-switch through    (renders each tab's fields + initial
     all visible tabs       validate pass on the schema's other
                            collections)
  3. ADD_ROW on the first  (collection dispatcher; the most common
     DataGridView in the    failure surface for walker bugs because
     dialog (if any)         it adds an entry the validator has to
                            re-walk)
  4. SET_VALUE on the new  (collection-row scalar; exercises
     row's first cell        per-row validation paths)

Best-effort: dialogs without Name (none, today) skip step 1;
dialogs without a DataGridView (Collation, Tablespace) skip steps
3-4. Validation errors from the typed values aren't confused with
canary divergences — expectNoDivergence filters on canary-specific
"Incremental walker divergence" / "Incremental validator
divergence" messages only.

Close handler now auto-accepts the "Discard changes?" confirm that
some dialogs pop after Name was mutated.

Runtime: ~1.3 min for all 15 specs (was ~1.0 min mount-only).

Renamed "(mount)" suffixes off Foreign Table and Index specs — no
longer accurate.
…ental mode

Edit-mode smoke specs (added in this branch) caught a real walker bug
in the options-tree incremental walker: when the prior walk's options
tree doesn't have an entry for a collection row (race with initial
walk's state propagation, or first dispatch after fresh data load in
Edit mode), the incremental prune at registry.js:253 would skip the
row AND leave nextColl[idx] undefined. Pruned rows are supposed to
inherit their prior reference via the earlier `{...prevColl}` spread,
but when prevColl[idx] is undefined there's nothing to inherit.

Symptom: walker canary divergence in 4 Edit-mode dialogs:
  Edit Type           — composite.0 collection
  Edit Domain         — constraints.0 + cell states
  Edit Foreign Table  — columns.0 + cell states
  Edit Index          — columns.0 + per-cell options (colname, nulls,
                        op_class, ...)
All four shared the same diagnostic pattern: incremental=undefined,
full={"canEditRow":true,"canDeleteRow":true}.

Fix: guard the prune on `prevColl[idx] !== undefined`. When there's
no prior to inherit, walk the row fully — that's the only way to
produce a non-undefined options entry for downstream consumers
(DataGridView's canEditRow / canDeleteRow checks, per-cell field
options). When prev IS populated, the prune still applies and
structural sharing is preserved.

Test updates:
  - incremental_options.spec.js (unit tests for the walker) now do a
    warm-up full walk to populate prev before exercising incremental
    behaviour. Without the warm-up, the new guard correctly visits
    all rows because none have prior entries to inherit — which is
    the right runtime behaviour but the wrong shape for asserting
    "incremental visits only the changed row." Switched 9 assertions
    from `visitedRowIdxs` (FIELD_OPTIONS presence) to
    `newlyVisitedRowIdxs` (reference comparison vs prev) — the
    correct semantic for "this walk visited the row" once pruned
    rows correctly inherit from prev.

Spec/seed/helper changes co-located in this commit:
  - audit-smoke-extended.spec.js: 15 Edit-mode smoke specs added
    (one per dialog type), exercising real dispatches via the
    existing exerciseDialog helper. Coverage now 30/30 (15 Create +
    15 Edit) on the extended dialog set.
  - audit-helpers.js:
    * openEditDialogViaApi: poll for children extended from 2s to
      10s (sub-collection REST fetches are slower than top-level
      catalog ones).
    * navigateToTableSubCollectionViaApi: pin to `audit_smoke_table`
      by name instead of "first table" — local environments had a
      pre-existing XSS-payload-named table sorting first
      alphabetically, with no triggers/indexes. Force-find the
      CI-seeded table; emit a clear seed-step pointer if missing.
    * tree.open() short-circuits on already-opened nodes; replaced
      with ensureLoaded() to force the REST fetch.
    * Stash the just-selected node on window.__pgadminLastNavigatedNode
      so openEditDialogViaApi can recover from tree.selected() drift
      (observed: pgAdmin's tree.selected() slips back to a parent
      node after a tree-refresh event triggered by the sub-collection
      REST fetch completing).
  - .github/workflows/run-schemaview-ui-smoke.yml: extended the
    Edit-mode seed step to provide all 13 newly-required object
    types (view, mview, sequence, type, domain, procedure, aggregate,
    foreign_table, collation, fts_configuration, trigger_function,
    trigger, index).
asheshv added 5 commits June 5, 2026 10:32
pgAdmin's tree.selected() observably drifts to a parent node
(typically the database) after a tree-refresh event fires when the
selected collection's REST children land. Test code that does
tree.select(node, true) followed by immediate dispatch via
tree.selected() races against the refresh: sometimes the drift
hasn't happened yet, sometimes it has — surfacing as random
"no child of type X found" (openEditDialogViaApi) or
"wait Name textbox timeout" (openCreateDialogViaApi) flakes
depending on which spec hit the race.

Validation runs with --workers=1 had a 30% iteration failure rate
before this fix (3 of 10 iters had 1 spec fail). --workers=4 had a
50% iteration failure rate. Different specs failed each iteration —
the flake was sequence-dependent, not spec-specific.

Two-part fix:

1. Every navigation helper now stashes the just-selected node on
   `window.__pgadminLastNavigatedNode`. Previously only
   navigateToTableSubCollectionViaApi did this — adding the same
   stash to navigateToCatalogNodeViaApi and
   navigateToServerCollectionViaApi.

2. openCreateDialogViaApi now consults the stash first and falls
   back to tree.selected() only if no stash is set. Mirrors what
   openEditDialogViaApi already did.

Post-fix validation runs:
  --workers=1: 10 iters × 30 specs = 300/300 pass, 0 flakes
  --workers=4: 10 iters × 30 specs = 300/300 pass, 0 flakes

Production code is NOT affected: callers of tree.select() in
production (server connect/disconnect callbacks, search-objects
result-click, etc.) hold the node reference they passed and don't
re-consume via tree.selected() afterward. The drift only surfaces
in programmatic select-then-consume patterns used by tests.
…l-by-default

Three test-infrastructure improvements developed during the flake
investigation that culminated in the tree.selected() drift fix:

1. Per-test disconnect (audit-helpers: disconnectServerViaApi) —
   afterEach hook fires DELETE /browser/server/connect/{gid}/{sid}
   for each connected server in the test's session. Releases the
   PG connection pool pgAdmin opened during the spec. Required
   because pgAdmin's connection_manager is keyed by Flask session
   (psycopg3 driver:78), so a worker that doesn't disconnect leaves
   the pool open until session GC — 30 specs × N iterations would
   otherwise exhaust PG max_connections. Includes the CSRF token
   from window.pgAdmin so the DELETE isn't silently rejected.
   Wired into audit-smoke-extended + audit-visual-regression.

2. AUDIT_DEBUG=1 instrumentation (audit-helpers: auditDebug, plus
   markers in ensureServerRegistered + the three navigate helpers
   + the smoke spec's openAndAssertClean + exerciseDialog). Silent
   by default. Surfaces phase-by-phase timing so flake
   investigation can see WHICH phase stalled (page boot vs server
   connect vs navigation vs dialog open vs interaction vs close)
   without re-instrumenting code each time.

3. Default workers: 4 (playwright.config.js). Validated on this
   branch with --workers=4 hitting 10 iters × 30 specs = 300/300
   with zero flakes after the drift fix landed (separate commit).
   --workers=8 also validated for one iteration (45s → 27s).
   Default kept at 4 for safer CI margin against PG max_connections.

Also: idempotent installErrorRecorders (stashes on page.__errorRecorder
and clears on reuse) — defensive against a possible future shift to
worker-scoped page fixtures. ensureLoaded() instead of tree.open()
for sub-collection children to bypass aspen's "already-opened"
short-circuit when the previous walk cached the node.

Run-time impact (workers=4 default): 30-spec suite finishes in
~45s vs ~162s serial = 3.6x speedup. Combined with the drift fix,
zero flakes across 300 specs in validation.
Static review of the 10 cherry-picked commits surfaced a few items:

H2 — exerciseDialog's Name fill in Edit mode was a latent danger.
The CI seed creates fixtures by name (audit_smoke_table,
audit_smoke_view, ...) which subsequent Edit-mode specs in the same
iteration look up by name. The close path auto-clicks "Yes" on the
"Discard changes?" confirm — which is currently correct. But if a
future pgAdmin change ever shifts that prompt's verb (e.g.
"Save changes?" with Yes meaning Save), Name fills would rename
seeded fixtures and break every subsequent Edit-mode test in the
run. Added `editMode` parameter to exerciseDialog; skip the Name
fill entirely in Edit mode. Create mode is unaffected — the dialog
mounts with no persisted record, save would create a NEW object.

H3 — Comment for the cell-fill step claimed "first input that
became editable in the newly-added row" but in practice
`dialog.locator('table input').first()` resolves to the first
input anywhere in the dialog, typically the first cell of the
first grid (NOT necessarily the newly added row). Updated the
comment to reflect actual behavior; the test still exercises a
SET_VALUE dispatch in a grid which is what matters for the walker
canary.

L1 — Stale comment in installErrorRecorders referenced a worker-
scoped page fixture that was dropped during the flake
investigation. Rewrote to describe the actual idempotency rationale.

L2 — Stale comment in audit-visual-regression about restarting
pgAdmin between capture/verify is no longer accurate — the
per-test afterEach disconnect (also added in this branch) handles
the connection cleanup. Updated.

L3 — Comment cited `test.use({ page: 'serial' })` which isn't a
real Playwright option. Corrected to the actual API
(`test.describe.configure({ mode: 'serial' })`).

Verified after fixes:
  - lint clean
  - audit-smoke (5+30): 35/35 pass
  - audit-visual-regression: 20/20 pass
  - Jest unit suite: 1290/1290 pass

Items NOT addressed (acceptable in current architecture):
  - H1: dialogLocator mask resolved lazily — works correctly under
    test-scoped page fixture; would need revisiting if worker-scoped
    is reintroduced.
  - H4: window.__pgadminLastNavigatedNode global stash not cleared
    after read — works correctly because each test gets a fresh
    window; documented contract.
Visual baselines now live under
`audit-visual-regression.spec.js-snapshots/<platform>/` instead of
the previous flat layout with `-darwin` filename suffix. The
unconditional `process.platform !== 'darwin'` skip is replaced with
"skip if no baselines exist for this platform" — so linux gets
coverage as soon as someone runs `--update-snapshots` on linux
and commits the resulting `linux/` directory. snapshotPathTemplate
in playwright.config.js wires it together.

Existing darwin baselines moved (via `git mv`) from
`<name>-darwin.png` to `darwin/<name>.png`. All 20 still match in
the new path.

Also: exerciseDialog's cell-fill step now targets the LAST row of
the dialog's tbody (the row just added by ADD_ROW), not the first
input anywhere. DataGridView appends new rows at the end of tbody,
so `tbody tr:last-of-type input:first-of-type` is the right
target. Best-effort still — many grids render the first cell as a
typeahead/dropdown that swallows the fill.

Verified after the move:
  - lint clean
  - audit-smoke-extended: 30/30 pass
  - audit-visual-regression: 20/20 pass on darwin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant