api: route per-call against unified hosts#5137
Conversation
The next PR teaches `databricks api` to detect workspace-vs-account scope per call. That decision needs a deny-list of paths under accounts/ that the SDK builds without an account-ID slot (workspace proxies). Hand-maintaining that list drifts from the SDK; this commit generates it. genpaths walks every service/*/impl.go in the pinned SDK with go/ast, classifies each `path :=` assignment, and emits cmd/api/paths_generated.go with a closed allowlist on account-ID source spellings. Refuses to emit prefixes that would over-match, fails loudly on idioms it doesn't recognize, handles var/define/assign forms and rejects compound assignments. Hooked into ./task generate-paths and the existing generated-files staleness gate in CI. The generated tables are not yet referenced at runtime; the next PR wires them in. Generated-file lint exclusions (lax rules) cover the unused declarations until then. Co-authored-by: Isaac
- Drop stdlib log in favor of fmt.Fprintf+os.Exit for the generator binary (depguard rule against the stdlib log package). - Make the path-class switch exhaustive by listing the no-op cases. - Lowercase the format-verb error string (staticcheck ST1005). Co-authored-by: Isaac
6db6f9e to
a775d99
Compare
databricks api host-agnosticKeep the unified-host routing support small for the initial PR by hand-maintaining the current workspace-proxy exceptions instead of parsing generated SDK source.
`databricks api <verb> <path>` previously bypassed the generated SDK
header logic and called client.Do directly, so on unified hosts where
workspace-vs-account routing is decided per-call it had no way to
distinguish the two. This wires up per-call detection using the
deny-list from the prior PR, plus three explicit overrides:
--account scope this call to the account API
--workspace-id <id> override the workspace routing identifier
{account_id} literal substituted from the active profile
Detection runs on URL.Path so query strings and fragments can't
false-match. The CLI-only WorkspaceIDNone sentinel (workspace_id =
none in .databrickscfg) is normalized to empty before the SDK's
idiomatic check sees it, so the literal "none" never goes on the
wire.
Behavior change for classic workspace profiles that have workspace_id
set: the routing identifier is now sent. Classic gateways ignore the
header so this should be benign; called out in the manual smoke plan
in case it surfaces.
Co-authored-by: Isaac
Keep the manual workspace-proxy list behind one helper so tests exercise the same path used by runtime account detection.
Approval status: pending
|
Add the standard OS and CI/cicd replacements to acceptance/cmd/api/test.toml and regenerate the recorded User-Agent strings to use os/[OS]. Without these, goldens generated locally on macOS contain os/darwin and no cicd/ segment, which fail on Linux + GitHub Actions where the SDK records os/linux ... cicd/github. Co-authored-by: Isaac
SPOG URLs from the Databricks UI carry the workspace ID as a query param (e.g. /api/2.2/jobs/list?o=7474644166319138). Recognize that param when present and use it as the routing identifier so pasted URLs route correctly without requiring --workspace-id. Precedence: --account > --workspace-id flag > ?o= > account-path auto-detect > profile workspace_id. Co-authored-by: Isaac
…' into simonfaltum/cli-api-spog-routing
cmd/api scripts pass POSIX paths like /api/2.0/clusters/list to the CLI. Git Bash on Windows converts a leading "/" arg to a Windows path, so the CLI sees /Program Files/Git/api/2.0/... and the testserver returns 404. Set MSYS_NO_PATHCONV=1 in the parent test.toml, matching the pattern used by cmd/workspace/export-dir-* and workspace/repos. Also regenerate out.test.toml with the diff-friendly inline format introduced by #5146. Co-authored-by: Isaac
| // correctly without requiring --workspace-id. | ||
| orgIDQueryParam = "o" | ||
|
|
||
| accountIDPlaceholder = "{account_id}" |
There was a problem hiding this comment.
This feels like it belongs in a different PR. The current impl doesn't interpolate account IDs and we don't need to start doing that (yet). Might be useful, but then we should see if we should interpolate more.
There was a problem hiding this comment.
That's fair - I just wanted to make it easy to copy from the docs.
| "/api/2.0/preview/accounts/access-control/assignable-roles": {}, | ||
| "/api/2.0/preview/accounts/access-control/rule-sets": {}, | ||
| } | ||
|
|
There was a problem hiding this comment.
The need for this list is unfortunate. What happens if we tack on the ?o= regardless?
There was a problem hiding this comment.
Or can we match on /accounts and /preview/accounts?
There was a problem hiding this comment.
You have the workspaceProxyPrefixes that makes that matching not possible afaik
| @@ -0,0 +1 @@ | |||
| $CLI api get /api/2.0/clusters/list --account | |||
There was a problem hiding this comment.
The goal is that we don't see any org ID in the request log, correct?
If so, it's worth capturing in another call here that confirms that assertion.
Easy to miss the intent otherwise.
Why
databricks apialways sent the workspace routing identifier (X-Databricks-Org-Id) when the profile had one, even when the path was an account API. On unified hosts (one host serving both workspace and account APIs) this misrouted account calls. There was also no way to explicitly route a call to the account API, override the identifier per call, or substitute the account ID into a path.Changes
Before: routing was decided once from the profile and applied to every call.
Now: routing is decided per call from the path being requested.
/accounts/{id}/are auto-detected as account-scope; the routing identifier is dropped.cmd/api/paths.gocarves out workspace-routed proxy APIs that happen to live under/accounts/, so they keep the identifier.--accountforces account-scope on a non-/accounts/path.--workspace-id <id>overrides the identifier per call. Mutually exclusive with--account.{account_id}in the path is substituted from the active profile'saccount_id. Missingaccount_idreturns an actionable error before any request is sent.workspace_id = nonesentinel is stripped before the routing decision so the literal "none" never goes on the wire.Routing logic lives in pure functions (
substituteAccountID,hasAccountSegment,resolveOrgID,normalizeWorkspaceID,isWorkspaceProxyPath) that take primitives. The cobraRunEis a thin adapter that resolves config and calls them.Test plan
go test ./cmd/apicovers the helpers with table-driven cases: deny-list hits and misses, query/fragment edge cases, mutual-exclusion errors, sentinel stripping, and{account_id}substitution including the missing-account-id error.go test ./acceptance -run TestAccept/cmd/apiexercises eight variants end to end: workspace path, account path, deny-listed proxy under/accounts/,--account,--workspace-id,{account_id}substitution, missingaccount_id, andworkspace_id = none. Each runs against terraform and direct engines.make checks