Skip to content

Support the OAP admin-server REST API (swctl admin ...) and adapt to OAP 11.0.0#228

Merged
wu-sheng merged 2 commits into
masterfrom
support-admin-host-rest-api
Jun 3, 2026
Merged

Support the OAP admin-server REST API (swctl admin ...) and adapt to OAP 11.0.0#228
wu-sheng merged 2 commits into
masterfrom
support-admin-host-rest-api

Conversation

@wu-sheng
Copy link
Copy Markdown
Member

@wu-sheng wu-sheng commented Jun 3, 2026

Motivation

SkyWalking OAP 11.0.0 introduced an admin-server — a second HTTP surface (default port 17128) separate from the public GraphQL/MQE surface (12800) — that bundles the operator-facing feature modules (status, inspect, ui-management, dsl-debugging, receiver-runtime-rule), all enabled by default. swctl previously spoke only GraphQL and had no concept of the admin host, so operators had to curl these endpoints by hand. Several long-standing endpoints (cluster status, effective config, TTL, alarm runtime status) also relocated here in 11.0.0 and were never wrapped by swctl.

This PR gives swctl a first-class admin-host REST client and a swctl admin ... command tree covering every admin feature module, and adapts the existing commands to the OAP 11.0.0 breaking changes.

Admin REST surface (default port 17128)

  • New global --admin-url flag (env SW_ADMIN_URL, config key admin-url), derived from --base-url's host with port 17128 when unset. --username/--password/--authorization/--insecure apply to it the same way.
  • New pkg/transport (shared TLS / basic-auth, factored out of the GraphQL client) and pkg/admin/client (REST client with a typed error envelope) + pkg/admin/preflight (admin-module feature detection via /debugging/config/dump, with friendly "module not enabled / admin unreachable" messages).
  • swctl admin ... commands, one group per module:
    • admin preflight
    • status: admin cluster nodes, admin config dump|ttl, admin alarm rules|rule
    • inspect: admin inspect metrics|entities
    • ui-management: admin ui-template list|get|create|update|disable
    • runtime-rule: admin runtime-rule list|bundled|get|add|inactivate|delete|dump (raw-YAML upload, X-Sw-*/ETag/304, tar.gz)
    • dsl-debugging: admin dsl-debug status|sessions|session start|get|stop and admin oal files|file|rules|rule

OAP 11.0.0 adaptations

  • alarm list: migrate the deprecated getAlarmqueryAlarms, adding --layer and --rules filters.
  • menu get: detect the retired getMenuItems query and report a clear message ("OAP 11.0.0+ no longer serves the UI menu …") instead of a raw GraphQL error.

E2E

  • Bump the e2e OAP to an 11.0.0+ build and switch storage from Elasticsearch → BanyanDB (lighter to spin up).
  • basic: layer list made order-insensitive via yq sort; the trace cases migrated to trace-v2 (BanyanDB rejects the v1 trace API: "BanyanDB Trace Model changed, please use queryTraces").
  • New admin case (static admin REST) and live-debugging case (OAL live capture driving admin dsl-debug session — asserts the captured pipeline is exactly the bound metric: correct source → cpm() → output, with per-metric gate isolation).
  • New admin-command-tests and live-debugging-tests CI jobs.

Verification

All three e2e suites were run locally against the bumped OAP + BanyanDB and pass: basic (27 checks), admin (11 checks), live-debugging (OAL capture with exact-metric assertions) — plus manual write round-trips against the live backend (ui-template CRUD, runtime-rule add/inactivate/delete lifecycle). go build / go vet / go test / golangci-lint (0 issues) / license-eye all green.

Compatibility

Adds a >= 0.15.0 → >= 11.0.0 row to the compatibility table — the admin commands and queryAlarms require OAP 11.0.0+.

🤖 Generated with Claude Code

wu-sheng and others added 2 commits June 3, 2026 13:53
Add a first-class admin-host (REST) client and a `swctl admin ...` command
tree, alongside the existing GraphQL surface, and adapt to the OAP 11.0.0
breaking changes.

Admin REST surface (default port 17128):
- New global `--admin-url` flag (env SW_ADMIN_URL / config key `admin-url`),
  derived from `--base-url`'s host with port 17128 when unset.
- New pkg/transport (shared TLS/basic-auth) and pkg/admin/client REST client
  with a typed error envelope and admin-module preflight detection.
- `swctl admin ...` covering every feature module: preflight; cluster nodes,
  config dump/ttl, alarm rules/rule (status); inspect metrics/entities;
  ui-template list/get/create/update/disable; runtime-rule
  list/bundled/get/add/inactivate/delete/dump; dsl-debug status/sessions/
  session start|get|stop and oal files/file/rules/rule.

OAP 11.0.0 adaptations:
- alarm list: migrate getAlarm -> queryAlarms, add --layer/--rules filters.
- menu get: detect the retired getMenuItems and report a clear message
  instead of a raw GraphQL error.

E2E:
- Bump OAP to 11.0.0+, switch storage from Elasticsearch to BanyanDB.
- basic: layer list normalized via `yq sort`; trace cases migrated to
  trace-v2 (BanyanDB rejects the v1 trace API).
- New `admin` case (static admin REST) and `live-debugging` case (OAL live
  capture, asserting the captured pipeline is exactly the bound metric).
- New admin-command-tests and live-debugging-tests CI jobs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- live-debugging: drop the per-record `.rule.ruleName` gate-isolation check;
  the server does not reliably populate the OAL `.rule` envelope. Gate
  isolation is already verified from the `.dsl` source and the output samples
  (no output sample for any other metric).
- runtime-rule: send the rule file bytes verbatim instead of
  `TrimSpace(...) + "\n"`. The API hashes the raw body for contentHash /
  no-change detection, so normalizing whitespace could make a byte-identical
  rule look changed.
- admin client: use net.JoinHostPort when deriving the admin URL so IPv6 base
  URLs are bracketed correctly (http://[::1]:12800 -> http://[::1]:17128);
  add an IPv6 unit-test case.
- CI: hoist OAP_TAG to a single workflow-level env and drop the single-value
  matrix from the e2e jobs, so the job names no longer carry the commit SHA.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wu-sheng wu-sheng added this to the 0.15.0 milestone Jun 3, 2026
@wu-sheng wu-sheng added the enhancement New feature or request label Jun 3, 2026
@wu-sheng wu-sheng merged commit 612a2df into master Jun 3, 2026
7 checks passed
@wu-sheng wu-sheng deleted the support-admin-host-rest-api branch June 3, 2026 07:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants