feat(helm): Add monitoring.scrapeAnnotations to control whether or not to add prometheus.io/* annotations to Service by azagarelz · Pull Request #825 · Cloudzero/cloudzero-agent

azagarelz · 2026-05-25T22:30:13Z

Why?

When components.monitoring.enabled is true (or auto with CRDs installed), the chart creates ServiceMonitor that instruct the Prometheus Operator on how to scrape CZ metrics. In that configuration, the prometheus.io/* annotations on Services become redundant.
In clusters where both annotation-based discovery and CRD-based discovery are active simultaneously, CZ metrics may be scraped twice, generating unnecessary load or inconsistencies.

Setting scrapeAnnotations: false removes the annotations from Services, eliminating the risk of double scraping while keeping the ServiceMonitor based discovery intact. The default remains true for full backward compatibility.

What

Adds components.monitoring.scrapeAnnotations value (default: true)
When set to false, remove prometheus.io/* annotations from the agent, aggregator, and webhook Services
Updates monitoring-infrastructure.md to document the new flag and the double scraping scenario
Adds helm unit tests covering both true and false scenarios

How Tested

Helm unit tests added in charts/cloudzero-agent/tests/defaults_service_test.yaml covering:

Default behaviour: prometheus.io/scrape annotation present on all three Services
Opt-out: annotation absent on all three Services when scrapeAnnotations: false

…nnotations When monitoring.enabled is true (or auto with CRDs installed), the chart creates ServiceMonitor CRDs that instruct the Prometheus Operator to scrape CloudZero Agent metrics. In that configuration, the prometheus.io/* annotations on Services become redundant. In clusters where both annotation-based and CRD-based discovery are active simultaneously, metrics may be scraped twice, generating unnecessary load on the metrics pipeline. Adds components.monitoring.scrapeAnnotations (default: true) to address this. When set to false, prometheus.io/* annotations are omitted from the agent, aggregator, and webhook Services, eliminating the risk of double scraping while keeping ServiceMonitor-based discovery intact. Updates monitoring-infrastructure.md to document the new flag and the double scraping scenario, and adds helm unit tests covering both true and false states for all three affected Services. The default value of true preserves full backward compatibility for users relying on annotation-based Prometheus discovery without the Operator. This approach is purely additive, users without ServiceMonitors are unaffected, while Operator users gain a clean way to eliminate redundant scrape targets. A future consideration would be auto-disabling annotations when monitoring.enabled is true, though that would be a breaking change requiring a major version bump.

A customer contributed PR #825 proposing a components.monitoring.scrapeAnnotations boolean to drop the prometheus.io/* annotations and avoid double-scraping when ServiceMonitor-based discovery is in use. Rather than add a single-purpose boolean, this generalizes the chart's monitoring configuration: it decides what to emit based on whether the Prometheus Operator is present, and exposes the discovery mechanism as one explicit choice. It supersedes the scrapeAnnotations flag from PR #825. Implementation Approach: The chart emits two kinds of monitoring resources -- Prometheus Operator CRDs (ServiceMonitor + PrometheusRule) and prometheus.io/* annotations on the agent Services. Both are now driven by two values resolved in helm/templates/_helpers.tpl and consumed by the Service, ServiceMonitor, and PrometheusRule templates. Functional Requirements: 1. The Prometheus Operator resources must be created when the Operator is present and omitted otherwise, without the user having to configure it. Added components.monitoring.enabled (null | true | false, default null), resolved by cloudzero-agent.monitoring.operatorEnabled: null auto-detects the Operator via the monitoring.coreos.com/v1 capability, true forces on, false forces off -- the same idiom as integrations.istio.enabled. The four servicemonitor-*.yaml templates and prometheusrule.yaml are gated on it. Previously null defaulted to disabled while the feature was being validated; it now auto-detects, so an Operator cluster that relied on the default begins emitting these resources on upgrade. 2. Operators must be able to choose the discovery mechanism, including an annotation-only mode for clusters without the Operator. Added components.monitoring.discovery.method (auto | serviceMonitors | annotations, default auto). auto resolves to serviceMonitors today and is left as an enum so a future auto-detectable backend can extend it; annotations is an explicit opt-in and is never auto-selected. The prometheus.io/* annotations on agent-service.yaml, aggregator-service.yaml, and webhook-service.yaml are gated on cloudzero-agent.monitoring.annotationsActive. 3. The ServiceMonitors and the PrometheusRule must form a single Operator bundle; annotation-based discovery is discovery-only. serviceMonitorsActive and rulesActive resolve together on the serviceMonitors path, so choosing annotations yields no ServiceMonitors and no alert rules. 4. The new values must validate and be documented for users. Updated helm/values.schema.yaml (enabled: null|boolean; discovery.method enum) and regenerated helm/values.schema.json; set defaults and comments in helm/values.yaml; rewrote helm/docs/monitoring-infrastructure.md as user-facing reference, removing the prior validation-log content. Validation: - helm lint passing - helm unit tests passing; helm/tests/monitoring_integration_test.yaml and defaults_service_test.yaml were rewritten for the new model - helm schema tests passing, including new fixtures under tests/helm/schema for the default, serviceMonitors, annotations, none, and invalid-value cases. kubeconform validates the rendered ServiceMonitor and PrometheusRule against the CRD catalog - golden manifests regenerated; a default `helm template` (no Operator detected) emits no ServiceMonitors, no PrometheusRule, and no prometheus.io/* annotations - helm subchart tests passing Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A customer contributed PR #825 proposing a components.monitoring.scrapeAnnotations boolean to drop the prometheus.io/* annotations and avoid double-scraping when ServiceMonitor-based discovery is in use. Rather than add a single-purpose boolean, this generalizes the chart's monitoring configuration into an explicit, opt-in model and supersedes the scrapeAnnotations flag from PR #825. Implementation Approach: The chart emits two kinds of monitoring resources -- Prometheus Operator CRDs (ServiceMonitor + PrometheusRule) and prometheus.io/* annotations on the agent Services. Both are driven by two values resolved in helm/templates/_helpers.tpl and consumed by the Service, ServiceMonitor, and PrometheusRule templates. There is no cluster auto-detection; the configuration is explicit. Functional Requirements: 1. Monitoring must be off by default and turned on explicitly. Added components.monitoring.enabled as a plain boolean defaulting to false. The four servicemonitor-*.yaml templates, prometheusrule.yaml, and the prometheus.io/* annotations on the three Service templates are all gated on it via helpers in _helpers.tpl. Previously the operator CRDs defaulted to disabled while the feature was validated and the prometheus.io/* annotations were always emitted; both are now opt-in, so an upgrade emits nothing until enabled, and annotation-based scrapers must set discovery.method: annotations. 2. When enabled, the operator must be able to choose the discovery mechanism. Added components.monitoring.discovery.method (auto | serviceMonitors | annotations, default auto). auto resolves to serviceMonitors today and is left as an enum so a future mechanism can extend it; annotations is an explicit opt-in. serviceMonitors emits monitoring.coreos.com/v1 CRDs, so the install fails if the Prometheus Operator is absent; annotations needs no CRDs. 3. The ServiceMonitors and the PrometheusRule must form a single bundle; annotation-based discovery is discovery-only. serviceMonitorsActive and rulesActive resolve together on the serviceMonitors path, so method=annotations yields no ServiceMonitors and no alert rules. 4. The new values must validate and be documented for users. Updated helm/values.schema.yaml (enabled: boolean; discovery.method enum) and regenerated helm/values.schema.json; set defaults and comments in helm/values.yaml; rewrote helm/docs/monitoring-infrastructure.md as user-facing reference. Validation: - helm lint passing - helm unit tests passing; helm/tests/monitoring_integration_test.yaml and defaults_service_test.yaml were rewritten for the new model - helm schema tests passing, including new fixtures under tests/helm/schema for the default, serviceMonitors, annotations, none, and invalid-value cases. kubeconform validates the rendered ServiceMonitor and PrometheusRule against the CRD catalog - golden manifests regenerated; a default `helm template` emits no ServiceMonitors, no PrometheusRule, and no prometheus.io/* annotations - helm subchart tests passing Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

evan-cz · 2026-06-11T13:54:11Z

Hi @azagarelz. First off, thank you very much for the contribution! You're right, this is definitely an issue... sorry about that.

I've been thinking about this a bit, trying to decide on the best API, and I have some reservations on scrapeAnnotations as a boolean. What do you think of this for the defaults:

monitoring:
  enabled: false
  method: auto

The method is one of auto, serviceMonitors, annotations, and auto would be serviceMonitors if the monitoring.coreos.com/v1 CRD is present, annotations if not. So in your case, just setting components.monitoring.enabled = true should give you service monitors without annotations.

I opened up a new new PR #840 to implement it, largely because we can't actually merge external PRs right now (it's an issue with accessing secrets from people outside the organization), LMK if that sounds good to you, and would resolve your problem.

azagarelz changed the title ~~Add monitoring.scrapeAnnotations to control whether or not to add prometheus.io/* annotations to Service~~ feat(helm): Add monitoring.scrapeAnnotations to control whether or not to add prometheus.io/* annotations to Service May 25, 2026

azagarelz marked this pull request as ready for review May 25, 2026 22:32

azagarelz requested a review from a team as a code owner May 25, 2026 22:32

azagarelz force-pushed the feat/monitoring-scrape-annotations-toggle branch from f570231 to 20a7bf7 Compare May 25, 2026 22:37

evan-cz mentioned this pull request Jun 10, 2026

CP-42614: Add monitoring discovery model (enabled + discovery.method) #840

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(helm): Add monitoring.scrapeAnnotations to control whether or not to add prometheus.io/* annotations to Service#825

feat(helm): Add monitoring.scrapeAnnotations to control whether or not to add prometheus.io/* annotations to Service#825
azagarelz wants to merge 1 commit into
Cloudzero:developfrom
azagarelz:feat/monitoring-scrape-annotations-toggle

azagarelz commented May 25, 2026 •

edited

Loading

Uh oh!

evan-cz commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

azagarelz commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why?

What

How Tested

Uh oh!

evan-cz commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

azagarelz commented May 25, 2026 •

edited

Loading