Skip to content

feat(tokowaka-client): add CloudFront Optimize-at-Edge control-plane#1722

Merged
nit23uec merged 15 commits into
mainfrom
feat/edge-optimize-cf-automation-simplified
Jun 27, 2026
Merged

feat(tokowaka-client): add CloudFront Optimize-at-Edge control-plane#1722
nit23uec merged 15 commits into
mainfrom
feat/edge-optimize-cf-automation-simplified

Conversation

@ABHA61

@ABHA61 ABHA61 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

What

Adds the CloudFront "Optimize at Edge" control-plane to @adobe/spacecat-shared-tokowaka-client. This is the AWS automation that onboards a customer's CloudFront distribution to Adobe LLM Optimizer —
assuming a cross-account connector role and wiring the distribution to route agentic/LLM-crawler traffic through the Edge Optimize origin.

Why

The control-plane belongs in the shared client (which already owns the CloudFront invalidation layer), so it can be reused and the spacecat-api-service controller stays thin. It lands under
src/cdn/cloudfront/ next to the existing CloudFront code.

What it provides

A CloudFrontEdgeClient, initialized once with the assumed connector credentials, that performs every step of the onboarding:

  • Connect — assume the customer's cross-account connector role (assumeConnectorRole).
  • Discover — list distributions and read a distribution's origins / cache behaviors.
  • Provision — add the Edge Optimize origin, create the viewer-request CloudFront Function, clone/update the cache policy, and create the Lambda@Edge origin handler (+ its execution role).
  • Associate — attach the function and Lambda@Edge to the chosen cache behavior.
  • Verify — probe the live distribution to confirm routing is active.
  • OrchestraterunDeployStep (step-on-poll, non-blocking) and planDeploy (non-mutating dry-run) drive the full sequence and report per-step status.
import { assumeConnectorRole, CloudFrontEdgeClient } from '@adobe/spacecat-shared-tokowaka-client';

const { credentials } = await assumeConnectorRole({ accountId, externalId, roleName });
const cf = new CloudFrontEdgeClient({ credentials });
const plan = await cf.planDeploy(params);   // preview
const step = await cf.runDeployStep(params); // execute, one step per poll

Scope

  • New: src/cdn/cloudfront/index.js (control-plane) and src/cdn/cloudfront/edge-code.js (CloudFront Function + Lambda@Edge source builders).
  • src/index.js / src/index.d.ts: public exports + typings (CloudFrontEdgeClient, AWSCredentials, DeployStep, plus the standalone functions and builders).
  • package.json: adds @aws-sdk/client-sts, -iam, -lambda.
  • No changes to existing client functionality.

Tests

src/cdn/cloudfront/** at 100% line / branch / statement / function coverage (package all: true gate); full suite green.

ABHA61 and others added 2 commits June 25, 2026 14:11
Move the BYOCDN "Optimize at Edge" CloudFront control-plane into the shared
client as free functions, re-exported from the package root: assume-role,
distribution discovery, origin/routing-function/cache/Lambda@Edge provisioning,
behavior association, routing verification, and the step-on-poll deploy
orchestrator + dry-run plan. Adds @aws-sdk/client-sts, -iam, -lambda.

Migrated verbatim from spacecat-api-service src/support/edge-optimize.js
(+ edge-optimize-edge-code.js). Option A (Adobe-managed assume-role) only:
drops cleanupHeaderValue; routing verification probes the real forwarded host.

100% line/branch/statement/function coverage; full package suite (938) passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

This PR will trigger a minor release when merged.

@@ -0,0 +1,149 @@
/*
* Copyright 2026 Adobe. All rights reserved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls move all this to cloudfront folder.

@ABHA61 ABHA61 Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall i move to src/cdn/cloudfront/ ? and by means all this , all these changes? or just this lambda and function code?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all of the changes
Later on we will extract it in a new pkg

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — moved the whole Edge Optimize control-plane to src/cdn/cloudfront/ (index.js + edge-code.js, test under test/cdn/cloudfront/) in 759f918. Public root exports are unchanged. Noted on extracting it into a separate package later.

…e create

When createEdgeOptimizeLambda creates the IAM exec role in the same call and the
function is still missing, return { status: 'provisioning', functionArn: null }
immediately instead of waiting ~12s for role propagation and then running
CreateFunction in the same request. The next poll (role now exists) performs the
create after the role has propagated. This keeps the first deploy poll well under
the CDN ~60s first-byte timeout, the root cause of the intermittent 503. Removes
the now-unused roleWaitMs option; tests updated, 100% coverage retained.

Co-authored-by: Cursor <cursoragent@cursor.com>
@nit23uec nit23uec requested a review from MysticatBot June 25, 2026 12:19

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ABHA61,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos’ docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Request changes - functional blocker (nodejs24.x runtime) and several reliability/correctness gaps that should be fixed before this ships to customers.
Complexity: HIGH - large diff; new dependency surface + cross-account AWS control-plane.
Changes: Moves the BYOCDN “Optimize at Edge” CloudFront control-plane into @adobe/spacecat-shared-tokowaka-client as free functions with full TypeScript declarations (7 files).

Must fix before merge

  1. [Critical] nodejs24.x Lambda runtime does not exist - will hard-fail every fresh Lambda@Edge deploy - src/edge-optimize/index.js:823 (details inline)
  2. [Important] verifyEdgeOptimizeRouting has no fetch timeout - can hang the poll request and cause 503 from gateway - src/edge-optimize/index.js:1146 (details inline)
  3. [Important] listCloudFrontDistributions does not paginate - customers with 100+ distributions hang at propagation - src/edge-optimize/index.js:167 (details inline)
  4. [Important] planEdgeOptimizeDeploy reports "blocked" when already EO-associated - success state reads as error - src/edge-optimize/index.js:1710 (details inline)
  5. [Important] Template string injection in code generators - values interpolated into JS without escaping - src/edge-optimize/edge-code.js:83 (details inline)
  6. [Important] Lambda code never updated once a published version exists - template fixes won’t reach existing customers - src/edge-optimize/index.js:872 (details inline)
Non-blocking (8): minor issues and suggestions
  • nit: JSDoc for createEdgeOptimizeRoutingFunction missing distributionId param - source docs disagree with implementation - src/edge-optimize/index.js:378
  • suggestion: Add URL scheme validation (https:// only) and metadata-endpoint blocklist to verifyEdgeOptimizeRouting for SSRF defense-in-depth - src/edge-optimize/index.js:1173
  • suggestion: Scope the logs:CreateLogGroup IAM resource from arn:aws:logs:*:${accountId}:* to the specific log group path - src/edge-optimize/index.js:623
  • nit: EDGE_OPTIMIZE_CACHE_POLICY_NAME exported but never used - src/edge-optimize/index.js:79
  • suggestion: 1729-line index.js mixes IAM, Lambda, CF Function, cache policy, and orchestration concerns - consider splitting into focused submodules - src/edge-optimize/index.js
  • suggestion: Consider a package.json "exports" map so consumers not using edge-optimize avoid loading 3 new AWS SDK clients - package.json
  • nit: Bot UA in verify probe is 'chatgpt-user' while routing function list uses 'ChatGPT-User' - works via .toLowerCase() but may drift - src/edge-optimize/index.js:1177
  • suggestion: Add a zip roundtrip decompression assertion in tests - hand-rolled binary format needs a decode-and-compare test - test/edge-optimize/index.test.js:1017

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 0m 23s | Cost: $8.71 | Commit: 7550bee85494f194260e9505d1e948ca1561f533
If this code review was useful, please react with 👍. Otherwise, react with 👎.

for (let attempt = 0; attempt < 3; attempt += 1) {
try {
const created = await lambda.send(new LambdaCreateFunctionCommand({
FunctionName: lambdaName,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): nodejs24.x Lambda runtime does not exist in AWS. Runtime: "nodejs24.x" will fail with InvalidParameterValueException on every fresh Lambda@Edge deployment. As of mid-2025, the latest Lambda@Edge-supported Node.js runtime is nodejs20.x (or nodejs22.x in some regions). Why it matters: This is a hard functional failure - no customer can onboard until this is fixed. Fix: Replace with nodejs20.x (current LTS, broadly supported for Lambda@Edge). Consider extracting it to a named constant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://aws.amazon.com/blogs/compute/node-js-24-runtime-now-available-in-aws-lambda/
Runtime: "nodejs24.x" is available from long time and now aws has stopped the support for nodejs20.x. so this is correct.

}

async function fetchEdgeOptimizeHeaders(url, userAgent) {
const response = await fetch(url, {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): No timeout on outbound fetch calls in fetchEdgeOptimizeHeaders. Both fetch calls have no AbortController/signal timeout. If the customer distribution is still deploying or has a non-responsive origin, fetch can hang indefinitely. Since the orchestrator sits behind a CDN/gateway with a ~60s first-byte timeout, a single hung probe causes a 503. Fix: Add an AbortController with a 10-15s timeout per probe. Treat a timeout as passed: false so the next poll retries.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4d9bec0. fetchEdgeOptimizeHeaders now wraps each probe in an AbortController (constant EDGE_OPTIMIZE_VERIFY_PROBE_TIMEOUT_MS = 20000, clearTimeout in finally). On timeout/abort or network error the probe resolves to a non-passing { status: 0, headers: {} } instead of throwing, so verifyEdgeOptimizeRouting returns passed:false and the poll loop retries on the next cycle — a single hung probe can no longer breach the ~60s gateway first-byte budget. Chose 20s (vs 10–15) to be generous for a cold ChatGPT-User prerender while staying well under 60s.

* @returns {Promise<Array<object>>} distributions projected to the fields the wizard needs.
*/
export async function listCloudFrontDistributions(credentials, region = EDGE_OPTIMIZE_REGION) {
const client = new CloudFrontClient({ region, credentials });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): listCloudFrontDistributions does not paginate. ListDistributionsCommand({}) returns at most 100 items per call. The propagation step uses this to find the distribution by ID - customers with 100+ distributions will hang at "waiting for the distribution to appear" forever. Fix: Either paginate with NextMarker/IsTruncated, or replace the propagation list-and-find with a direct GetDistribution call which returns Status directly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4d9bec0 — did both: listCloudFrontDistributions now paginates the Marker/IsTruncated/NextMarker chain (so the picker handles 100+ distributions), and the propagation gate no longer list-and-finds — it calls GetDistributionCommand({ Id }) directly for Status/DomainName.

const assocBehavior = config ? getBehaviorFromConfig(config, behavior) : null;
const assocConflict = assocBehavior
? findEdgeOptimizeAssociationConflict(assocBehavior, behavior) : null;
if (await isBehaviorAlreadyAssociated(client, distributionId, behavior)) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): planEdgeOptimizeDeploy reports canProceed: false when the behavior is already fully EO-associated. A customer who completed the wizard will see a "blocked" state when they re-run the plan. The deploy orchestrator correctly handles this (marks associate: done), but the plan endpoint rejects it. Fix: Return canProceed: true with steps[associate].action = "exists" when all associations are EO-owned. Keep the blocker only for the conflict case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not return proceed for now in this case. Can only give message of if already EO associated, than do the manual changes.Should not proceed to next Deploy step.


console.log("Routing to Default origin for userAgent: " + userAgent);
return request;
}`;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): Template string injection - defaultOriginId interpolated into generated JS without escaping. The value is interpolated directly into JavaScript source that becomes a CloudFront Function. Same applies to eoOriginDomain in buildEdgeOptimizeLambdaCode. Why it matters: Defense-in-depth at a code-generation trust boundary. Fix: Validate with a strict character set before interpolation, or use JSON.stringify(value) for proper escaping.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it's a code-gen trust boundary worth flagging. In this flow both values are low-risk: eoOriginDomain is our own env-controlled config, and defaultOriginId is read from the customer's own distribution and deployed back into the customer's own CloudFront Function using their own assumed role — so the blast radius is the customer's own account, not cross-tenant or our control plane. We're not changing it in this release; tracking the JSON.stringify/charset-validation hardening as a follow-up. Happy to pull it in before merge if you'd prefer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid code duplication and prefer a single source of truth for this code.

}

// Active and idle. If a numbered version already exists, reuse it (idempotent).
const existingVersion = await getLatestLambdaVersion(lambda, lambdaName);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): Lambda code is never updated once a published version exists. Once getLatestLambdaVersion finds a numbered version, createEdgeOptimizeLambda returns alreadyExisted: true without comparing the code hash. If the template changes, redeploying to an existing customer will never update their Lambda code. Fix: Compute the expected CodeSha256 from the zip buffer, compare to existingVersion.codeSha256, and if they differ, call UpdateFunctionCode + PublishVersion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not updating as of now as part of this release. Once it is created the same version can be used. Rest update in lambda and creating new version and associating new version with Behaviour is manual

@MysticatBot MysticatBot added ai-reviewed Reviewed by AI complexity:high High complexity PR labels Jun 25, 2026
(o) => o.Id === EDGE_OPTIMIZE_ORIGIN_ID || o.DomainName === originDomain,
);

if (existing) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it already exists, we shouldn't proceed any further. Instead, we should roll back all the steps that have already been completed. Not sure if i am missing something here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The create steps (origin/function/cache/lambda) are intentionally idempotent rather than rollback-on-exists. Every per-distribution resource is named with an -adobe-${distributionId} suffix — edgeoptimize-origin-adobe-<id> (Lambda), edgeoptimize-routing-adobe-<id> (CloudFront Function), the cloned cache policy <name>-adobe-<id> (e.g. CachingOptimized-adobe-E2VLBZCBR857CC), and the role. Before creating any of these we look it up by that suffixed name: if it exists we reuse it (Lambda/CF Function reuse the published version / update in place; for a managed source policy like CachingOptimized we associate the existing CachingOptimized-adobe-<id> clone if one exists — even if it isn't currently attached — rather than creating a duplicate), and only create when it's absent. The wizard is poll-driven and re-entrant, so each step must be safe to re-run; rolling back completed steps on a re-run would break that and could tear down a working setup mid-deploy. The deploy plan also surfaces, per resource, whether it's being created, reused, or updated for full visibility. The hard stop you're thinking of is applied at the associate step instead: when the behavior is already EO-associated we don't proceed and surface a message for manual handling. Let me know if you're seeing a specific gap.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it exists and it was created by us (through metadata) - we should proceed ahead and reuse it.
if it exists but was not created by us - then we should stop.

* @param {string[]|null} [targetedPaths] - explicit paths to target, or null for "all HTML pages".
* @returns {string} the CloudFront Function source code.
*/
export function buildRoutingFunctionCode(defaultOriginId, targetedPaths = null) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildCloudfrontFunctionCode

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — renamed buildRoutingFunctionCodebuildCloudfrontFunctionCode in e0c8486 (definition, re-exports, type decl, and tests). It's internal to the package (not consumed by spacecat-api-service), so the rename is self-contained.

* @param {string} eoOriginDomain - the Edge Optimize origin domain baked into the routing check.
* @returns {string} the Lambda@Edge function source code.
*/
export function buildEdgeOptimizeLambdaCode(eoOriginDomain) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we also thought about how we will keep the code in sync with the LLMO code samples whenever changes are required there?

Will these two codebases always remain separate, or are we planning to publish the LLMO code samples as an npm package and consume them here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either LLMO code samples as an npm package and consume them here , or i was reading about
Build-time vendoring - A prebuild/CI step downloads the files from a pinned tag/commit of llmo-code-samples → generates a bundled module → helix-deploy ships it; + a CI drift-check
Need to explore more. But yes would sync with llmo-code-sample apparently . Not considered this as P0 for our first release

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is handled in cloudflare by fetching it directly from github: https://github.com/adobe/spacecat-api-service/blob/4d58b178322eb3fc32838160e4a588ade20e1d23/src/controllers/llmo/llmo-cloudflare.js#L105 - pinning a commit id to avoid any non tested code to go through from main

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we will handle this in the next release?

// Name of the custom cache policy we create when cloning an AWS-managed policy.
export const EDGE_OPTIMIZE_CACHE_POLICY_NAME = 'edgeoptimize-cache';

// Per the BYOCDN doc, force the cache policy MinTTL to 0 so agentic responses are not

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a decision the customer needs to make. Can the customer make that decision during the deployment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree . As of now we are showing a message that we are making TTL 0 as a change on the Review Window. Can take this from customer as well in inputs and can give default as 0 in that. I can take this as well in next release. Noted at my end.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image image

@jindaliiita

Copy link
Copy Markdown
Contributor

I just had a quick look at it from a high level. I still need to review it in detail.

Akash Bhardwaj and others added 4 commits June 26, 2026 08:12
…/cloudfront

Addresses review nit (nit23uec): relocate the CloudFront-specific Edge Optimize
control-plane and edge runtime code into the cdn/cloudfront folder, alongside the
existing CDN clients. Pure move — public API (package root re-exports) unchanged.

  src/edge-optimize/index.js     -> src/cdn/cloudfront/index.js
  src/edge-optimize/edge-code.js -> src/cdn/cloudfront/edge-code.js
  test/edge-optimize/index.test.js -> test/cdn/cloudfront/index.test.js

Co-authored-by: Cursor <cursoragent@cursor.com>
…bution lookup

Two AWS-reliability fixes in the CloudFront Edge Optimize control plane:

- Verify probes (fetchEdgeOptimizeHeaders) now run under a bounded 20s
  AbortController timeout (EDGE_OPTIMIZE_VERIFY_PROBE_TIMEOUT_MS). On
  timeout/abort or any network error the probe resolves to a non-passing
  result instead of throwing, so verifyEdgeOptimizeRouting returns
  passed:false and the FE poll loop retries — a hung origin can no longer
  cascade into a gateway 503. clearTimeout runs in a finally.
- listCloudFrontDistributions now paginates the Marker/IsTruncated/
  NextMarker chain so accounts with >100 distributions are fully listed.
- The deploy propagation gate now reads this distribution directly via
  GetDistributionCommand instead of scanning the (previously unpaginated)
  list, preserving the existing not-found / not-Deployed / Deployed / error
  branches and distDomain reuse.

Public export surface unchanged; EDGE_OPTIMIZE_VERIFY_PROBE_TIMEOUT_MS is
the only additive export.

Co-authored-by: Cursor <cursoragent@cursor.com>
…oudfrontFunctionCode

Rename the CloudFront Function code-builder to a platform-typed name per
review feedback on #1722. Internal-only symbol (not consumed by
spacecat-api-service); updates definition, re-exports, type decl, and tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
The bare `cf` prefix is misleading since `cf` denotes Cloudflare elsewhere
in this codebase. Spell out CloudFront for the function ARN returned by
applyEdgeOptimizeAssociations.

Co-authored-by: Cursor <cursoragent@cursor.com>
@ABHA61 ABHA61 requested a review from MysticatBot June 26, 2026 07:51

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ABHA61,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Request changes - five reliability/hardening gaps remain that should be addressed before this ships to customer accounts.
Complexity: HIGH - large diff; new dependency surface + cross-account AWS control-plane.
Changes: Moves the BYOCDN "Optimize at Edge" CloudFront control-plane into @adobe/spacecat-shared-tokowaka-client as free functions with full TypeScript declarations (7 files).

Must fix before merge

  1. [Important] getLatestLambdaVersion does not paginate ListVersionsByFunction (50-version ceiling) - src/cdn/cloudfront/index.js:988 (details inline)
  2. [Important] logs:CreateLogGroup IAM grant is overly broad - least-privilege gap in customer accounts - src/cdn/cloudfront/index.js:908 (details inline)
  3. [Important] verifyEdgeOptimizeRouting has no URL scheme validation - SSRF defense-in-depth on a publicly exported function - src/cdn/cloudfront/index.js:1479 (details inline)
  4. [Important] No format validation for distributionId - malformed IDs could hit AWS naming limits or create unexpected resources - src/cdn/cloudfront/index.js:1029 (details inline)
  5. [Important] No retry on ETag conflict (PreconditionFailed) in UpdateDistribution calls - concurrent edits cause transient 412 errors the orchestrator does not recover from within a single poll - src/cdn/cloudfront/index.js:611 (details inline)
Non-blocking (5): minor issues and suggestions
  • nit: EDGE_OPTIMIZE_CACHE_POLICY_NAME exported but never used anywhere in the code - dead export may confuse consumers - src/cdn/cloudfront/index.js:349
  • nit: ResourceConflictException path in createEdgeOptimizeLambda returns functionArn: undefined instead of null, mismatching the TypeScript declaration - src/cdn/cloudfront/index.js:1142
  • suggestion: Accept an optional timeoutMs parameter on verifyEdgeOptimizeRouting so integration tests against slow staging environments can override the 20s default - src/cdn/cloudfront/index.js:1479
  • suggestion: Update the package README to mention the new CloudFront control-plane exports (12 functions + 10 constants) for discoverability - packages/spacecat-shared-tokowaka-client/README.md
  • suggestion: Consider a package.json "exports" map so consumers not using edge-optimize avoid loading 3 new AWS SDK clients on cold-start - packages/spacecat-shared-tokowaka-client/package.json

Previously flagged, now resolved

  • Fetch timeout on verify probes now bounded (AbortController, 20s)
  • listCloudFrontDistributions now paginates via Marker/IsTruncated chain
  • GetDistributionCommand used directly for propagation gate (no list-and-find)
  • Files moved to src/cdn/cloudfront/ per reviewer request
  • buildRoutingFunctionCode renamed to buildCloudfrontFunctionCode

const iam = new IAMClient({ region, credentials });

// Execution role status: present AND correctly configured (trust + logs policy). roleOk gates the
// deploy's "done" decision so a missing OR mis-configured role is healed even when the function

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): getLatestLambdaVersion does not paginate ListVersionsByFunction. The API returns at most 50 versions per page. A function published 50+ times (common during iterative development) will have its latest version beyond the first page, causing this code to sort only the first page and potentially miss the true latest. The orchestrator would then publish a redundant new version on every poll cycle.

Why it matters: Redundant version publishes create version churn and waste the 75-version soft limit per function.

Fix: Paginate to the last page (versions are returned in ascending order, so only the final page is needed), or pass MaxItems and document the ceiling.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this is a valid hardening item, but we are deferring it from the current blocker set. The current onboarding path creates/reuses one Lambda@Edge version per distribution, so 50+ versions is unlikely unless someone is manually iterating on the same function. I added this to the punch list as P1: paginate ListVersionsByFunction with Marker/NextMarker before we support automated Lambda code update/re-publish flows.

functionArn: cfg.FunctionArn,
versionArn: published.FunctionArn, // includes the :N version suffix
version: published.Version,
roleArn,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): logs:CreateLogGroup IAM resource is overly broad. The first statement grants logs:CreateLogGroup on arn:aws:logs:*:${accountId}:* - all regions, all log group names in the customer account. The second statement already scopes CreateLogStream/PutLogEvents to the specific function log group.

Why it matters: This role is created in the customer's AWS account. Overly broad grants are a compliance and audit concern.

Fix: Scope the resource to the specific log group pattern, matching the second statement.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked this against the current CloudFront BYOCDN sample/docs. The published cloudwatch-policy.json currently grants logs:CreateLogGroup on arn:aws:logs:*:ACCOUNT_ID:*, and the Experience League guide points customers to that policy because Lambda@Edge writes logs from edge regions while using the /aws/lambda/us-east-1.<functionName> log-group naming pattern. This implementation is intentionally mirroring that public policy.

I agree this can be tightened as a least-privilege hardening in the product automation. If we decide to diverge from the current published sample, the concrete change would be to scope CreateLogGroup to arn:aws:logs:*:${accountId}:log-group:/aws/lambda/us-east-1.${functionName} and keep CreateLogStream/PutLogEvents on arn:aws:logs:*:${accountId}:log-group:/aws/lambda/us-east-1.${functionName}:*; we should then update the sample/docs in the same follow-up so they stay aligned.

return { routingDeployed, verified, steps };
}
const result = await verifyEdgeOptimizeRouting(`https://${domain}/`);
// Per-probe summary the wizard renders (Human vs Agentic): UA, HTTP status, the

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): verifyEdgeOptimizeRouting performs outbound fetch to a caller-supplied URL with no scheme validation. The function is publicly exported. While current callers construct the URL from trusted sources, the exported surface area will gain new callers over time.

Why it matters: Defense-in-depth at a trust boundary. Without scheme validation, a caller passing unsanitized input enables SSRF against cloud metadata endpoints.

Fix: Validate https: scheme at the top of the function before issuing the fetch.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this suggestion. We should consider contructimg URL object inside the function impl


// Re-read the distribution for a fresh ETag, repoint the behavior to the new custom policy.
const freshDist = await client.send(new GetDistributionConfigCommand({ Id: distributionId }));
const freshConfig = freshDist.DistributionConfig;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): No retry on ETag conflict in UpdateDistribution calls. The orchestrator performs multiple sequential UpdateDistributionCommand calls across steps (origin, cache, association), each reading the config and ETag independently. A concurrent edit between the read and write causes PreconditionFailed (412).

Why it matters: In a poll-driven system where multiple tabs or retries can overlap, 412 errors will surface as intermittent failures. The next poll would succeed (idempotent steps), but the current poll reports error status, which may confuse the wizard UI.

Fix: Extract a bounded-retry wrapper for the read-modify-write pattern (re-read config on 412, reapply mutation, retry with fresh ETag, max 2 attempts). Apply in createEdgeOptimizeOrigin, applyEdgeOptimizeCacheHeaders, and applyEdgeOptimizeAssociations.

functionArn: cfg.FunctionArn,
versionArn: latest?.versionArn || null,
version: latest?.version,
ready,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): No format validation for distributionId. The parameter is checked only with hasText() but is interpolated into CloudFront Function names, Lambda function names, and IAM role names (all have a 64-char limit). A malformed or excessively long value could hit naming limits in non-obvious ways. CloudFront distribution IDs are 13-14 uppercase alphanumeric characters. Validating the format at the boundary provides fail-fast behavior and documents the constraint.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this is useful fail-fast validation, but we are not taking it as a current release blocker. Normal FE flow selects the distributionId from AWS results and AWS still performs the existence check; the added value here is boundary hardening before the id is reused in per-distribution resource names. I added it to the punch list as P1.

/**
* Temporary AWS credentials returned by {@link assumeConnectorRole}.
*/
export interface EdgeOptimizeCredentials {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWSCredentials

/**
* Create or update the routing CloudFront Function and publish it to LIVE (idempotent).
*/
export function createEdgeOptimizeRoutingFunction(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then naming should be generic enough for aws client - createCloudfrontFunction, createLambdaAtEdge, etc.

/**
* Add the Edge Optimize routing headers to the cache key for the target behavior.
*/
export function applyEdgeOptimizeCacheHeaders(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateCacheSettings

/**
* Create (or update) the Edge Optimize Lambda@Edge function and publish a version (idempotent).
*/
export function createEdgeOptimizeLambda(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createLambdaAtEdge

* Create (or update) the Edge Optimize Lambda@Edge function and publish a version (idempotent).
*/
export function createEdgeOptimizeLambda(
credentials: EdgeOptimizeCredentials,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid passing creds in every method and instead have a client class whose initialisation (through constructor) will have these creds.

@ABHA61

ABHA61 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Review feedback update (latest pushed code 2e4949bf)

Fixed

  • Code generation hardening: buildCloudfrontFunctionCode now serializes defaultOriginId with JSON.stringify, and buildEdgeOptimizeLambdaCode serializes eoOriginDomain with JSON.stringify. Tests were updated for the escaping path.
  • Shared API naming: added AWSCredentials, kept EdgeOptimizeCredentials as a deprecated compatibility alias, and added AWS-style aliases: listDistributions, createOrigin, createCloudFrontFunction, updateCacheSettings, createLambdaAtEdge, getLambdaAtEdgeStatus, applyAssociations, verifyRouting, runDeployStep, planDeploy.
  • Existing-origin ownership guard: the EO origin is reused only when the id is EdgeOptimize_Origin; if the EO domain already exists under a different origin id, plan/deploy now stop instead of reusing a non-EO origin. CloudFront origins do not expose per-origin metadata/tags, so this is the v1 ownership signal.
  • Rebuilt the shared tarball/gist and updated/pushed api-service to consume it.

Deferred / asking for next release

  • Client class initialized with credentials: kept the free-function surface for this release, with AWS-style aliases added now. In api-service, credentials are still per-request STS credentials from assumeConnectorRole, so they are not stored globally or shared across customers. Can we move to a CloudFrontEdgeOptimizeClient({ credentials, region }) wrapper in the next release?
  • Single source of truth for the edge code: agreed. Current release keeps the generated code local but now hardens interpolation; the pinned llmo-code-samples fetch or build-time vendoring path is tracked in the punch list. Can we take that in the next release?
  • Additional P1 hardening/reliability items already captured in CLOUDFRONT_WIZARD_PUNCHLIST.md: getLatestLambdaVersion pagination, tighter logs:CreateLogGroup, HTTPS-only validation in verifyEdgeOptimizeRouting, bounded UpdateDistribution retry on 412/ETag conflict, CloudFront distribution-id format validation, and automated Lambda code update/re-publish. Can these stay next-release follow-ups while this PR stays focused on the working onboarding path?

Focused verification: CloudFront shared test file = 944 passing; eslint on changed JS files passed.

Comment thread packages/spacecat-shared-tokowaka-client/src/index.d.ts Outdated
Comment thread packages/spacecat-shared-tokowaka-client/src/index.js Outdated
Comment thread packages/spacecat-shared-tokowaka-client/src/index.d.ts Outdated
ABHA61 and others added 2 commits June 27, 2026 02:03
…orts

Address review (#1722): nothing is released yet, so remove the duplicate
EdgeOptimize-prefixed exports instead of deprecating them.

- Generic names are now the real function definitions (createOrigin,
  listDistributions, runDeployStep, ...); deleted the alias block.
- Rename client class CloudFrontEdgeOptimizeClient -> CloudFrontEdgeClient.
- Remove the @deprecated tail, EdgeOptimizeCredentials, and the prefixed
  constant/helper declarations from the public surface (kept internal).
- EdgeOptimizeStep -> DeployStep.
- Add tests for all CloudFrontEdgeClient methods -> 100% coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@nit23uec nit23uec left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move it in a separate pkg in next iteration along with covering pending non blocking items. Also add the punching list doc.
Currently the pr desc says do not merge. Pls update it as necessary.

verifyRouting,
runDeployStep,
planDeploy,
CloudFrontEdgeClient,

@nit23uec nit23uec Jun 27, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally only the client and input output schema objects need to be exported.
The methods that rely on cf connection should be called via the client itself

@nit23uec nit23uec requested a review from MysticatBot June 27, 2026 04:50

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ABHA61,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Approve - prior blocking findings addressed or explicitly deferred with reviewer agreement; remaining items are non-blocking.
Complexity: HIGH - large diff; new dependency surface + cross-account AWS control-plane.
Changes: Moves the BYOCDN "Optimize at Edge" CloudFront control-plane into @adobe/spacecat-shared-tokowaka-client as free functions with full TypeScript declarations (7 files).

Non-blocking (5): minor issues and suggestions
  • suggestion: verifyRouting still lacks URL scheme validation (prior finding, nit23uec +1'd) - tracked on punch list but worth noting the exported function accepts any string and passes it to fetch; a bare new URL(url) check at the top would fail-fast on malformed input - src/cdn/cloudfront/index.js:1485
  • suggestion: planDeploy reports canProceed: false when the behavior is already EO-associated, making the "Review and Deploy" screen inaccessible after a successful onboard; consider returning action: 'exists' to match runDeployStep's idempotent semantics - src/cdn/cloudfront/index.js:2033
  • nit: JSDoc for createLambdaAtEdge says "up to ~5x5s, within ~30s" but the loop is attempt < 3 (3 attempts, max ~10s wait) - stale comment from a prior iteration - src/cdn/cloudfront/index.js:1012
  • nit: verifyRouting probe sends 'chatgpt-user' (all lowercase) while real bots send 'ChatGPT-User' - some WAF/bot-management layers inspect case; using canonical case would better match production conditions - src/cdn/cloudfront/index.js:1489
  • suggestion: The public export surface includes many implementation-detail constants (EDGE_OPTIMIZE_REGION, eoRoutingFunctionName, etc.) - already tracked by nit23uec for the next-iteration package split - src/cdn/cloudfront/index.js

Previously flagged, now resolved

  • Template string injection in code generators - defaultOriginId and eoOriginDomain now serialized with JSON.stringify
  • Fetch timeout on verify probes - bounded with AbortController (20s)
  • listCloudFrontDistributions pagination - paginated via Marker/IsTruncated chain
  • Propagation gate uses direct GetDistributionCommand instead of list-and-find
  • Files moved to src/cdn/cloudfront/ per reviewer request
  • buildRoutingFunctionCode renamed to buildCloudfrontFunctionCode
  • Existing-origin ownership guard added (refuses to reuse a non-EO origin)
  • AWS-style function aliases added (listDistributions, createOrigin, etc.)

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 1m 46s | Cost: $9.43 | Commit: 0ad84f67132bee1515ac9f30a2d6656e58a057a8
If this code review was useful, please react with 👍. Otherwise, react with 👎.

@nit23uec nit23uec merged commit e465be0 into main Jun 27, 2026
6 checks passed
@nit23uec nit23uec deleted the feat/edge-optimize-cf-automation-simplified branch June 27, 2026 05:10
solaris007 pushed a commit that referenced this pull request Jun 27, 2026
## [@adobe/spacecat-shared-tokowaka-client-v1.20.0](https://github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-tokowaka-client-v1.19.1...@adobe/spacecat-shared-tokowaka-client-v1.20.0) (2026-06-27)

### Features

* **tokowaka-client:** add CloudFront Optimize-at-Edge control-plane ([#1722](#1722)) ([e465be0](e465be0))
@solaris007

Copy link
Copy Markdown
Member

🎉 This PR is included in version @adobe/spacecat-shared-tokowaka-client-v1.20.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

ABHA61 added a commit to adobe/spacecat-api-service that referenced this pull request Jun 27, 2026
Bump @adobe/spacecat-shared-tokowaka-client off the temporary gist test build
to the published ^1.20.0 (adobe/spacecat-shared#1722, which moved the CloudFront
control-plane into the shared client).

Also rename the test stub variables to match the client's generic method names
(e.g. createEdgeOptimizeOriginStub -> createOriginStub) for consistency with the
purged package API. No behavioural change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-reviewed Reviewed by AI complexity:high High complexity PR released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants