diff --git a/examples/proposals/vertex_ai_provider.md b/examples/proposals/vertex_ai_provider.md new file mode 100644 index 000000000..60cfe2c8b --- /dev/null +++ b/examples/proposals/vertex_ai_provider.md @@ -0,0 +1,229 @@ +# RFC: Native Google Vertex AI Provider + +> Status: RFC / planning. This document proposes a future provider. Native +> Vertex AI support does **not** exist in PentAGI today; every variable, type, +> and migration named here is a candidate, not a shipped feature. The intent is +> to agree on direction (especially the open questions) before any code is +> written. + +## Summary + +This RFC proposes a native **Google Vertex AI** provider (`vertex`) so users can +authenticate with GCP project credentials (Application Default Credentials or a +service-account key) instead of an AI Studio API key. It is motivated by #310 +and #321: today the only Google option is the AI Studio `gemini` provider, which +accepts an API key against `https://generativelanguage.googleapis.com` and +cannot consume Vertex project/service-account credentials. Anthropic +Claude-on-Vertex is likewise not reachable through any current provider. + +The RFC recommends a **staged** approach: a small v1 that adds Gemini-on-Vertex +with ADC / service-account authentication, and a separately-decided follow-up +for Claude-on-Vertex. It deliberately stops short of prescribing the final code +because two design questions (adapter strategy and Gemini-vs-Claude scope) need +maintainer direction first. + +## Goals + +- Let users authenticate to Vertex AI with GCP **Application Default + Credentials** or a **service-account JSON** key, with explicit project ID and + region/location, rather than an AI Studio API key. +- Support Gemini models served through Vertex AI in a first iteration. +- Keep the new path additive: existing providers and their configuration are + untouched. +- Reuse existing request-shaping logic where it is safe to do so, to minimize + new surface area. 
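To make the project ID and location goal concrete: on Vertex AI a Gemini model is addressed through a regional host and a project-scoped resource path, neither of which has an AI Studio equivalent. The following is a minimal sketch, assuming the public `v1` REST path shape for `generateContent`. `vertexGenerateContentURL` is a hypothetical helper, not PentAGI code, and it covers regional locations only (no `global` location, no private endpoint override).

```go
package main

import (
	"errors"
	"fmt"
)

// vertexGenerateContentURL builds the regional Vertex AI URL for a Gemini
// generateContent call. Hypothetical helper for illustration only; it does no
// escaping and assumes a regional location such as "us-central1".
func vertexGenerateContentURL(projectID, location, model string) (string, error) {
	if projectID == "" || location == "" || model == "" {
		return "", errors.New("project ID, location, and model are all required")
	}
	// Regional hosts follow the "<location>-aiplatform.googleapis.com" shape.
	host := location + "-aiplatform.googleapis.com"
	return fmt.Sprintf(
		"https://%s/v1/projects/%s/locations/%s/publishers/google/models/%s:generateContent",
		host, projectID, location, model,
	), nil
}
```

Unlike the AI Studio path, no API key appears anywhere in this URL; the request would instead carry an OAuth2 access token obtained from ADC or a service-account key.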
+ +## Non-Goals + +- Replacing or changing the existing AI Studio `gemini` provider. It stays as-is. +- Per-request dynamic credentials, multi-project routing, or credential rotation + (possible future work, explicitly out of scope here). +- Committing to Claude-on-Vertex in v1. Whether and how to add it is an open + question below, not a decision in this RFC. +- Any change to flow lifecycle, queueing, or persisted state. This is a provider + proposal only and introduces no hidden background state. + +## Current Provider Landscape + +PentAGI currently registers the following provider types: `openai`, +`anthropic`, `gemini`, `bedrock`, `ollama`, `custom`, `deepseek`, `glm`, +`kimi`, and `qwen`. The Google- and Anthropic-relevant options today are: + +- **Google AI Studio (`gemini`)**: API-key auth against + `https://generativelanguage.googleapis.com`. This is the consumer AI Studio + surface, not Vertex AI. It cannot accept a GCP project or service-account + credential. +- **Direct Anthropic (`anthropic`)**: `ANTHROPIC_API_KEY` / + `ANTHROPIC_SERVER_URL` against Anthropic's own API. Not Vertex. +- **AWS Bedrock (`bedrock`)**: Anthropic and other models via AWS, with a + multi-mode auth model (default AWS credential chain, bearer token, or static + access/secret keys). This is the closest existing precedent for + cloud-IAM-style provider auth. +- **Custom OpenAI-compatible (`custom`, `LLM_SERVER_*`)**: the present + workaround for Vertex is to front it with a **LiteLLM** proxy that exposes an + OpenAI-compatible endpoint, then point the `custom` provider at it. This works + but requires running and securing extra infrastructure, and it relies on the + proxy to translate Vertex auth and message schemas correctly. + +A native `vertex` provider would remove the need for the LiteLLM workaround for +the common Gemini-on-Vertex case. + +## Proposed v1 Scope + +Proposed for the first iteration: + +- A new `vertex` provider type that serves **Gemini models on Vertex AI**. 
+- Authentication via **ADC** or a **service-account JSON file**, plus explicit + **project ID** and **location**. +- Wiring through the same registration and validation path every other provider + uses, so the provider is selectable in flows and accepted by the REST API. + +Proposed to defer: + +- **Claude-on-Vertex** (Anthropic models through Vertex). See Open Questions Q2. +- Bearer-token / workload-identity auth beyond ADC and service-account file. +- Settings-UI configuration if the credential-file requirement makes env-only + configuration the safer starting point (Open Questions Q4). + +## Authentication Model + +Vertex AI uses GCP IAM (OAuth2 access tokens minted from ADC or a service +account), not a static API key. The AWS Bedrock provider already demonstrates a +multi-mode auth pattern in PentAGI, and a Vertex auth model could mirror its +shape: + +- **ADC (default)**: use Application Default Credentials resolved from the + environment (for example `GOOGLE_APPLICATION_CREDENTIALS`, a mounted metadata + service, or `gcloud auth application-default login`). Analogous to + `BEDROCK_DEFAULT_AUTH` using the AWS default credential chain. +- **Explicit service-account file**: a candidate `VERTEX_CREDENTIALS_FILE` + pointing at a mounted JSON key, used when ADC is not available. Analogous to + Bedrock static credentials. +- **Project and location**: candidate `VERTEX_PROJECT_ID` and `VERTEX_LOCATION` + (for example `us-central1`), which Vertex requires and which have no AI Studio + equivalent. +- **Optional regional/private endpoint**: a candidate `VERTEX_SERVER_URL` for + regional or private Service endpoints, analogous to `BEDROCK_SERVER_URL`. + +All of the above names are **candidate** keys for discussion, not shipped +configuration. + +## Provider Architecture Options + +The existing `gemini` provider is built on the langchaingo `googleai` client +configured for the AI Studio REST surface with an API key. 
Vertex AI changes +both the transport (GCP IAM auth, `aiplatform.googleapis.com` regional +endpoints) and, for Claude, the message schema. Two broad options: + +- **Option A - parameterize an existing adapter.** Add a Vertex transport/auth + mode to the Gemini path so request-shaping is shared and only auth + endpoint + differ. Lower duplication, but couples two surfaces that authenticate very + differently and risks regressing the stable AI Studio path. +- **Option B - a separate `vertex` package.** A dedicated provider that owns its + auth and endpoint logic and reuses request-shaping helpers where practical. + More code, cleaner separation, no risk to the existing `gemini` provider. + +This RFC leans toward **Option B for v1** (separation first, extract shared +helpers later if duplication proves real), but defers to maintainer preference +(Open Questions Q1). + +A key architectural note: **Claude-on-Vertex likely does not fit the same +adapter as Gemini-on-Vertex.** Gemini-on-Vertex uses the Gemini request/response +schema, while Claude-on-Vertex uses the Anthropic message schema over a Vertex +endpoint with GCP auth. That asymmetry argues for routing Claude-on-Vertex +through the **Anthropic** adapter with a Vertex auth/endpoint mode, rather than +bolting it onto a Gemini-shaped Vertex provider. Treating "Vertex" as one +monolithic provider for both model families would mix two schemas behind one +type. + +## Config and Migration Considerations + +A native provider would follow the repository's documented "Adding a New LLM +Provider" checklist (`CLAUDE.md`). At a high level that means, when +implementation is approved: + +- A `ProviderVertex` type constant and default provider name. +- Registration in the provider factory functions. +- Addition of `vertex` to the REST `Valid()` whitelist (without this the REST + API rejects the type with 422). +- Candidate config keys in the central config (the `VERTEX_*` names above). 
+- A goose migration adding `vertex` to the `PROVIDER_TYPE` enum, following the + enum-swap pattern used by the existing provider migrations under + `backend/migrations/sql/` (Up: recreate the enum including the new value; + Down: remove rows of the new type, then recreate the enum without it). + +The migration is the least reversible step and the Down path deletes any rows of +the new provider type, so it warrants explicit review. None of these changes are +part of this RFC; they are listed so the eventual implementation size is clear. + +## Frontend / Installer Considerations + +For parity with other providers, a future implementation would add a provider +icon and register it, and decide whether Vertex appears in the Settings UI. The +service-account-file requirement is the wrinkle: unlike an API key, a JSON key +is a file and should not be pasted into a web form or stored as plain settings +text. A reasonable starting point is **env/file-mounted configuration only**, +with Settings-UI support considered later (Open Questions Q4). The interactive +installer wizard could later grow a Vertex section that asks for project, +location, and credential-file path. + +## Testing Strategy + +- **Unit / config**: a future `vertex` provider would get the same provider + unit tests other providers have, exercised through the existing config-loading + path. +- **Provider validation**: the `ctester` utility (which tests LLM agent + capabilities and tool-calling agent types) would be the pre-merge smoke test + for the new provider once credentials are available. +- **Credentials caveat**: end-to-end testing requires real GCP credentials and a + Vertex-enabled project, which maintainers would need to supply or stub. This + is called out as a practical gating factor on any implementation PR. + +## Open Questions + +1. **Adapter strategy** - parameterize the existing Gemini adapter with a Vertex + transport/auth mode (Option A), or ship a separate `vertex` package + (Option B)? 
+2. **Scope** - Gemini-on-Vertex only in v1, or include Claude-on-Vertex? If + Claude-on-Vertex is in scope, should it route through the Anthropic adapter + (Vertex auth/endpoint mode) rather than a Gemini-shaped provider, given the + schema difference? +3. **Auth surface** - are ADC and a service-account JSON file sufficient for v1, + or is bearer-token / workload-identity auth (mirroring the Bedrock multi-auth + approach) also wanted? +4. **Web settings** - should Vertex be configurable from the Settings UI like + other providers, or env/file-mounted only at first, given the credential-file + requirement? + +## Security Considerations + +- **Service-account JSON is a sensitive secret.** It should be **file-mounted or + secret-managed**, never pasted into UI text, never committed, and never + written to logs. Provider initialization and any error surface must avoid + echoing credential contents or file paths beyond what is necessary to + diagnose a misconfiguration. +- **Least privilege**: documentation for any implementation should recommend a + dedicated service account scoped to the minimum Vertex AI prediction roles. +- **No hidden state**: this proposal adds a provider, not background lifecycle + state; credentials are supplied explicitly via env/mounted file and are not + cached or queued anywhere implicit. +- **Endpoint trust**: regional/private endpoint overrides should be validated so + a misconfigured `VERTEX_SERVER_URL` cannot silently redirect traffic. + +## Suggested First Milestone + +If maintainers confirm Option B and a Gemini-on-Vertex-only v1, a minimal first +PR could add the `vertex` provider package, the type constant and registration, +the REST whitelist entry, the candidate `VERTEX_*` config keys, the enum +migration, and `.env.example` plus docs - with Claude-on-Vertex, extra auth +modes, and Settings-UI support tracked as explicit follow-ups. Confirmation on +Open Questions Q1 and Q2 is the blocker before any of that work begins. 
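The candidate `VERTEX_*` keys and the endpoint-trust point from the security section could be combined into one small load-and-validate step. This is a minimal sketch for discussion, not PentAGI code: `VertexConfig`, `LoadVertexConfig`, `UsesADC`, and `Validate` are hypothetical names, the key names remain candidates, and the specific rules (https only, no embedded credentials) are an assumption about what "validated" might mean.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// VertexConfig mirrors the candidate VERTEX_* keys named in this RFC.
// Every field and key name is hypothetical, not shipped configuration.
type VertexConfig struct {
	ProjectID       string // candidate VERTEX_PROJECT_ID
	Location        string // candidate VERTEX_LOCATION, e.g. "us-central1"
	CredentialsFile string // candidate VERTEX_CREDENTIALS_FILE; empty means ADC
	ServerURL       string // candidate VERTEX_SERVER_URL; empty means default endpoint
}

// LoadVertexConfig reads the candidate keys through getenv (os.Getenv in a
// real process) so tests can inject values without touching the environment.
func LoadVertexConfig(getenv func(string) string) VertexConfig {
	return VertexConfig{
		ProjectID:       strings.TrimSpace(getenv("VERTEX_PROJECT_ID")),
		Location:        strings.TrimSpace(getenv("VERTEX_LOCATION")),
		CredentialsFile: strings.TrimSpace(getenv("VERTEX_CREDENTIALS_FILE")),
		ServerURL:       strings.TrimSpace(getenv("VERTEX_SERVER_URL")),
	}
}

// UsesADC reports whether the provider would fall back to Application Default
// Credentials because no explicit service-account file was configured.
func (c VertexConfig) UsesADC() bool {
	return c.CredentialsFile == ""
}

// Validate checks required keys and the optional endpoint override. Error
// messages name the offending key but never echo its value, so a URL with
// embedded credentials cannot leak into logs.
func (c VertexConfig) Validate() error {
	if c.ProjectID == "" {
		return errors.New("VERTEX_PROJECT_ID is required")
	}
	if c.Location == "" {
		return errors.New("VERTEX_LOCATION is required")
	}
	if c.ServerURL == "" {
		return nil // no override configured
	}
	u, err := url.Parse(c.ServerURL)
	if err != nil {
		return errors.New("VERTEX_SERVER_URL is not a valid URL")
	}
	if u.Scheme != "https" {
		return errors.New("VERTEX_SERVER_URL must use https")
	}
	if u.Hostname() == "" {
		return errors.New("VERTEX_SERVER_URL must include a host")
	}
	if u.User != nil {
		return errors.New("VERTEX_SERVER_URL must not embed credentials")
	}
	return nil
}
```

Failing at startup on a malformed override, rather than at first request, is one way to keep a misconfigured endpoint from silently redirecting traffic. A stricter variant could also check the host against an allowlist, which this sketch leaves out.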
+ +## References + +- #310 - original Vertex AI configuration request (clarified that Vertex is not + natively supported today). +- #321 - native Vertex AI provider request and implementation outline. +- `CLAUDE.md` - "Adding a New LLM Provider" checklist. +- `backend/migrations/sql/` - existing provider enum-swap migrations that + demonstrate the `PROVIDER_TYPE` pattern.