docs: refactor service registration brief to HTTP traffic metering #181

Open
JoseSzycho wants to merge 1 commit into feat/service-catalog-registration-design from feat/http-traffic-metering

JoseSzycho commented Jun 10, 2026

This PR refactors and expands the initial service catalog registration design brief into a structured, standard enhancement proposal focusing on HTTP Traffic Metering for Network Services.

It addresses and resolves all open questions (OD-1 through OD-8) carried over from the previous design brief.

Related to:

…e diagrams and Taskfile automation

This change replaces the service-catalog-registration with documentation more focused on the architecture design and implementation.
JoseSzycho changed the base branch from main to feat/service-catalog-registration-design June 10, 2026 15:03
JoseSzycho requested a review from scotwells June 10, 2026 15:06

scotwells left a comment

Contributor

Directionally the architecture looks good. Let's just clean up some of the implementation details. We want to keep this document product / consumer focused.


### Goals

- Define a standard `Service` and `ServiceConfiguration` to register Network Services under the service domain `networking.datumapis.com`.

Contributor

This is more of an implementation detail. I'd keep goals high level / consumer focused.

Comment on lines +38 to +40
This enhancement defines the architecture, data structures, and roadmap to bring HTTP traffic metering and catalog registration to Network Services. The work is split into two phases:
- **Phase 1 (Catalog & Metadata):** Declare a `Service` and a companion `ServiceConfiguration` resource (`services.miloapis.com/v1alpha1`) carrying the monitored-resource and meter declarations inline. This is a YAML-only delivery packaged in the `config/services/` bundle.
- **Phase 2 (Emission & Integration):** Configure Envoy Gateway proxy logging and deploy a custom Vector Agent to scrape access logs, parse billing signals into CloudEvents, and forward them to the local `billing-usage-collector-vector` DaemonSet.

Contributor

I don't think it's important to highlight the phases in the enhancement. How the work breaks down and is sequenced belongs in a project management tool (e.g. GitHub issues), not an enhancement document.


Network Services is a core utility that incurs direct infrastructure costs. Capturing consumption signals is necessary for platform billing and cost-attribution.

Because `MeterDefinition` fields (such as `meterName` and `measurement.unit`) are immutable once published, establishing correct definitions in the `Draft`/`Provisional` phase is critical. Doing so avoids costly SDK upgrades, meter deprecation cycles, and data migrations.

Contributor

This feels like internal / implementation detail. The motivation should be more product / consumer focused.

- Altering the `MeterDefinition` schema or billing pipeline contract.
- Implementation of the core Billing SDK (owned by the Billing Team).
- Shared-infrastructure cost attribution or cross-project billing logic.
- **Deploying or modifying the Billing System or `billing-usage-collector-vector` DaemonSet.** These components are pre-existing, shared platform infrastructure. Our work is limited to deploying a custom Vector Agent (Log Parser) to parse logs and forward them to this existing collector.

Contributor

I don't think this should be a non-goal; it's more of an implementation detail.


- The **Monitored Resource** is the Kubernetes Gateway API `HTTPRoute` resource, representing the customer-facing HTTP endpoint.
- **Phase 1** registers the service with the service catalog via declarative YAML configurations. The service catalog fan-out controller automatically creates `MonitoredResourceType` and `MeterDefinition` resources in the billing namespace.
- **Phase 2** instruments the Envoy Gateway instances to write structured JSON access logs to stdout. A node-level Vector Agent (Log Parser) tails these logs, parses and maps the raw logs into CloudEvents, and forwards them locally via HTTP to the `billing-usage-collector-vector` DaemonSet. The billing collector then handles local disk buffering and reliably forwards them to the Billing System.
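
For readers following the thread, the Phase 2 mapping step described above (one structured JSON access-log line in, one CloudEvent out) can be sketched roughly as follows. This is only an illustration, not the proposal's actual Vector configuration: the log field names (`route_name`, `start_time`, `bytes_sent`, `bytes_received`), the event `type`, the `source` value, and the choice of metered quantities are all assumptions made for the example. Only the four required CloudEvents 1.0 attributes (`specversion`, `id`, `source`, `type`) come from the CloudEvents specification.

```python
import json
import uuid
from datetime import datetime, timezone


def access_log_to_cloudevent(line: str, source: str = "/hypothetical/envoy-gateway") -> dict:
    """Map one JSON access-log line to a CloudEvents 1.0 structured-mode dict.

    All log field names and the event type are hypothetical placeholders.
    """
    record = json.loads(line)
    return {
        # Required CloudEvents 1.0 context attributes.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": "com.example.networking.http.usage",  # hypothetical event type
        # Optional attributes: the route being metered and when the request began.
        "subject": record["route_name"],  # hypothetical field identifying the HTTPRoute
        "time": record.get("start_time") or datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # Assumed usage payload: one request plus its byte counts.
        "data": {
            "requests": 1,
            "bytes_sent": int(record.get("bytes_sent", 0)),
            "bytes_received": int(record.get("bytes_received", 0)),
        },
    }


if __name__ == "__main__":
    sample = json.dumps({
        "route_name": "default/my-route",
        "start_time": "2026-06-10T15:03:00Z",
        "bytes_sent": 512,
        "bytes_received": 128,
    })
    print(json.dumps(access_log_to_cloudevent(sample), indent=2))
```

Whichever agent ends up owning this transform, the per-line work stays about this small, so the node-level cost is more likely to come from log volume and buffering than from the mapping itself.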

Contributor

We should measure the resource impact this would have on the worker node. It may make sense to have the existing vector agent handle collection from envoy and transforming the results into billable usage events.


---

### Service and ServiceConfiguration Definitions

Contributor

I don't think it's important to have the actual service configuration file here. That's an implementation detail of registering the metering definitions.
