prompt-hook: HIGH-tier gate telemetry ('high-keyword'/'high-token') fires even when codegraph_explore errors or returns nothing, masking real failures from the gate's own measured recall funnel #1143

Description

@inth3shadows

Summary

The prompt-hook's HIGH tier calls codegraph_explore and only writes the <codegraph_context>
injection when the result succeeds and has content, which is correctly guarded. But the telemetry
call right after it sits outside that guard and fires unconditionally, so a silent explore failure
gets recorded as a successful HIGH-tier fire rather than folding into the noop-* signal that the
repo's own telemetry docs describe as the gate's measured recall/precision funnel.

Root cause

// src/bin/codegraph.ts:1131-1146
const result = await handler.execute('codegraph_explore', { query: prompt });
const text = result.content[0]?.text ?? '';
if (!result.isError && text.trim()) {
  // ... writes <codegraph_context> to stdout ...
}
gate(keyworded ? 'high-keyword' : 'high-token');
return;

gate(...) sits one indent level OUTSIDE the if (!result.isError && text.trim()) block — no
else, so it runs whether or not the injection actually happened.

// src/bin/codegraph.ts:1077-1079
const gate = (outcome: string): void => {
  try { getTelemetry().recordUsage('cli_command', `prompt-hook-gate-${outcome}`, true); } catch { /* never break the hook */ }
};

recordUsage's third argument (success/ok) is hardcoded true — the outcome name is the only
thing that varies, so gate('high-keyword')/gate('high-token') always records success regardless
of what codegraph_explore actually returned.
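
To make the effect concrete, here is a minimal stand-alone reproduction of the two snippets above. It is only a sketch: `ExploreResult`, `recordUsage`, `recorded` and `highTierCurrent` are hypothetical stand-ins I made up for illustration (the real handler and telemetry client are not reproduced here), but the control flow mirrors the quoted lines.

```typescript
// Hypothetical stand-in for the shape returned by handler.execute(...).
interface ExploreResult {
  isError: boolean;
  content: { text?: string }[];
}

// Hypothetical in-memory replacement for getTelemetry().recordUsage(...).
type UsageRecord = { kind: string; name: string; ok: boolean };
const recorded: UsageRecord[] = [];
const recordUsage = (kind: string, name: string, ok: boolean): void => {
  recorded.push({ kind, name, ok });
};

// Same shape as the quoted gate helper: the third argument is always true.
const gate = (outcome: string): void => {
  try { recordUsage('cli_command', `prompt-hook-gate-${outcome}`, true); } catch { /* never break the hook */ }
};

// Mirrors the current control flow: gate(...) runs outside the guard.
function highTierCurrent(result: ExploreResult, keyworded: boolean): boolean {
  const text = result.content[0]?.text ?? '';
  let injected = false;
  if (!result.isError && text.trim()) {
    injected = true; // stands in for writing <codegraph_context> to stdout
  }
  gate(keyworded ? 'high-keyword' : 'high-token');
  return injected;
}

// One successful explore, one errored explore, one empty explore.
const okInjected = highTierCurrent({ isError: false, content: [{ text: 'ctx' }] }, true);
const errInjected = highTierCurrent({ isError: true, content: [{ text: 'boom' }] }, true);
const emptyInjected = highTierCurrent({ isError: false, content: [] }, true);
```

Only the first call injects anything, yet all three leave an identical `prompt-hook-gate-high-keyword` record with `ok: true` in `recorded`.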

Why this matters here specifically

// docs/design/telemetry.md:75-81
The prompt hook additionally rolls up its gate DECISION as `cli_command`
counters named `prompt-hook-gate-<outcome>`, outcome ∈ `high-keyword` /
`high-token` / `medium-segment` / `nudge-projects` / `noop-shape` /
`noop-no-index` / `noop-unverified` — decision names only, never prompt
content. This is the gate's measured recall/precision funnel: a rising
`noop-*` share against the `high`/`medium` tiers is the signal that the
gate (keyword table or segment matching) is missing real questions.

That interpretation only holds if a high-* outcome reliably means the tier actually delivered
context. A silent codegraph_explore failure (transient error, or a query that happens to return no
relevant nodes) currently counts as a high-* success in that funnel instead of degrading it toward
noop-* — the exact signal this telemetry exists to surface.
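
A small worked example of how this skews the funnel. All counts below are invented for illustration (I have no real telemetry numbers), and `tierShare` is a hypothetical helper, not something in the repo. It compares the shares as currently recorded against the shares if 20 silent HIGH-tier failures were instead counted under `noop-unverified`.

```typescript
type Counters = Record<string, number>;

// Fraction of all gate decisions whose outcome name starts with `prefix`.
function tierShare(counters: Counters, prefix: string): number {
  let matched = 0;
  let total = 0;
  for (const name of Object.keys(counters)) {
    total += counters[name];
    if (name.indexOf(prefix) === 0) matched += counters[name];
  }
  return total === 0 ? 0 : matched / total;
}

// Made-up counts as the current code would record them: every HIGH-tier
// attempt lands in high-*, whether or not explore delivered context.
const recordedCounters: Counters = {
  'high-keyword': 60,
  'high-token': 20,
  'medium-segment': 10,
  'noop-shape': 10,
};

// Same 100 prompts, assuming (hypothetically) that 20 of the 80 HIGH-tier
// attempts failed silently and were attributed to noop-unverified instead.
const correctedCounters: Counters = {
  'high-keyword': 45,
  'high-token': 15,
  'medium-segment': 10,
  'noop-shape': 10,
  'noop-unverified': 20,
};

const recordedHighShare = tierShare(recordedCounters, 'high-');
const correctedHighShare = tierShare(correctedCounters, 'high-');
const recordedNoopShare = tierShare(recordedCounters, 'noop-');
const correctedNoopShare = tierShare(correctedCounters, 'noop-');
```

Under these assumed numbers the recorded high-* share is 80% against a corrected 60%, and the noop-* share is 10% against a corrected 30%, so the rollup would look healthier than what users actually received.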

Impact

I believe (not measured against live telemetry data — I don't have access to it) this understates
real HIGH-tier failures in the aggregate rollup, making the gate look more effective than it is on
any day/period where codegraph_explore errors or comes back empty for keyword/token-verified
prompts. I don't know how often that happens in practice.

Suggested fix

Move the gate(...) call inside the success branch, and add an explicit failure-outcome name to the
else (or fold it into an existing noop-* name — that's a naming call I'd leave to you, since the
telemetry doc's outcome enum is presumably meant to stay a fixed, documented set):

if (!result.isError && text.trim()) {
  // ... existing injection ...
  gate(keyworded ? 'high-keyword' : 'high-token');
} else {
  gate(keyworded ? 'high-keyword-failed' : 'high-token-failed');  // or reuse noop-unverified
}
return;
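
If it helps with a regression test, the branch decision could also be pulled into a pure function so both paths are checkable without spawning the hook. This is a sketch under assumptions: `highTierOutcome`, `ExploreResult` and `HighOutcome` are hypothetical names, and the `-failed` outcome names are just the placeholders from the snippet above, not a settled naming.

```typescript
// Hypothetical stand-in for the shape returned by handler.execute(...).
interface ExploreResult {
  isError: boolean;
  content: { text?: string }[];
}

// Placeholder outcome names; the real enum is the maintainers' call.
type HighOutcome =
  | 'high-keyword'
  | 'high-token'
  | 'high-keyword-failed'
  | 'high-token-failed';

// Decides both what to inject and which outcome to record, in one place,
// so the telemetry name can never disagree with what was actually delivered.
function highTierOutcome(
  result: ExploreResult,
  keyworded: boolean,
): { outcome: HighOutcome; inject: string | null } {
  const text = result.content[0]?.text ?? '';
  if (!result.isError && text.trim()) {
    return { outcome: keyworded ? 'high-keyword' : 'high-token', inject: text };
  }
  return {
    outcome: keyworded ? 'high-keyword-failed' : 'high-token-failed',
    inject: null,
  };
}

// Three cases a regression test would want to pin down.
const delivered = highTierOutcome({ isError: false, content: [{ text: 'graph context' }] }, true);
const errored = highTierOutcome({ isError: true, content: [{ text: 'explore failed' }] }, true);
const empty = highTierOutcome({ isError: false, content: [{ text: '   ' }] }, false);
```

The caller would then write `inject` to stdout when it is non-null and pass `outcome` to `gate(...)` exactly once.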

Verification / scope

Environment

Found on main (tip e699ee9, v1.2.0).
