Production agents on GCP

Ship durable Effect agents on GCP.

A long-form course on running stateful Effect Cluster runtimes behind a Cloud Run gateway, with Pulumi owning every resource.

Before you start

Three things to know up front.

Who this is for

Backend engineers who already write some TypeScript and want a serious-scale Effect deployment. Not an Effect introduction.

What you will know after

How to run Cluster entities and durable Workflow executions on GKE Autopilot, fronted by a stateless Cloud Run gateway, with Pulumi as the single source of truth.

Versions assumed

Node 22+, pnpm 9+, Pulumi 3.130+, Effect 3.x, and the latest @effect/cluster and @effect/workflow. Node no longer ships corepack as a stable feature, so the Quickstart installs pnpm directly and pins exact versions.

Quickstart

From gcloud login to a running agent

00

Copy-paste the project

Every box below is a real shell command. Replace anything in <ANGLE_BRACKETS> once. The flow takes about 25 minutes the first time.

  1. A

    Install prerequisites

    Versions known to work: Node 22+, pnpm 9+, Pulumi 3.130+, Docker, gcloud. We install pnpm directly because Node no longer treats corepack as a supported way to ship a package manager.

    setup.sh (local machine)
    # macOS — node@22 keeps you on the current LTS line
    brew install node@22 docker
    brew install pulumi/tap/pulumi google-cloud-sdk
    npm install -g pnpm@9    # install pnpm directly, do not use corepack
    
    # Linux (Debian/Ubuntu)
    curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
    sudo apt install -y nodejs docker.io
    curl -fsSL https://get.pulumi.com | sh
    curl https://sdk.cloud.google.com | bash
    sudo npm install -g pnpm@9
    
    # Sanity check — every version should print
    node --version       # v22.x
    pnpm --version       # 9.x
    docker --version
    pulumi version
    gcloud --version
  2. B

    Create the GCP project

    Replace PROJECT_ID with something globally unique. Billing account id comes from gcloud billing accounts list.

    gcp-project.sh (one-time)
    export PROJECT_ID=<effect-agents-PROJECT>
    export REGION=us-central1
    export BILLING_ACCOUNT=<XXXXXX-XXXXXX-XXXXXX>
    
    gcloud auth login
    gcloud projects create "$PROJECT_ID" --name="Effect agents"
    gcloud config set project "$PROJECT_ID"
    gcloud billing projects link "$PROJECT_ID" \
      --billing-account "$BILLING_ACCOUNT"
    
    gcloud services enable \
      container.googleapis.com \
      artifactregistry.googleapis.com \
      sqladmin.googleapis.com \
      secretmanager.googleapis.com \
      vpcaccess.googleapis.com \
      servicenetworking.googleapis.com \
      iam.googleapis.com \
      monitoring.googleapis.com \
      logging.googleapis.com
    
    gcloud auth application-default login
  3. C

    Scaffold the repo

    The capstone (module 23) is the full shape. Start with just enough to deploy a single runner.

    scaffold.sh (repo bootstrap)
    mkdir effect-agents && cd effect-agents
    pnpm init
    pnpm add effect @effect/ai @effect/ai-openai \
      @effect/cluster @effect/workflow \
      @effect/platform @effect/platform-node \
      @effect/rpc @effect/sql @effect/sql-pg
    pnpm add -D typescript tsx vitest @effect/vitest
    
    mkdir -p apps/agent-runtime/src apps/agent-gateway/src \
      packages/domain/src packages/event-store/src \
      infra/components

    The smallest runtime that actually boots, joins the shard ring, and serves a health probe. Replace with the full agent runtime once it works.

    apps/agent-runtime/src/main.ts (runnable)
    import { Entity, RunnerAddress, Rpc } from "@effect/cluster"
    import { HttpRouter, HttpServer, HttpServerResponse } from "@effect/platform"
    import {
      NodeClusterSocket,
      NodeHttpServer,
      NodeRuntime
    } from "@effect/platform-node"
    import { PgClient } from "@effect/sql-pg"
    import { Config, Effect, Layer, Logger, Option, Schema } from "effect"
    import { createServer } from "node:http"
    
    // A tiny entity that proves the cluster is wired correctly. Every
    // call to Echo(text) for a given entityId routes to the same runner.
    const Echo = Rpc.make("Echo", {
      payload: { text: Schema.String },
      success: Schema.String
    })
    
    const EchoEntity = Entity.make("Echo", [Echo])
    
    const EchoEntityLive = EchoEntity.toLayer(
      Effect.gen(function* () {
        const address = yield* Entity.CurrentAddress
        return EchoEntity.of({
          Echo: ({ payload }) =>
            Effect.succeed(`[${address.entityId}] ${payload.text}`)
        })
      })
    )
    
    // Probes for the StatefulSet from module 09.
    const HealthLive = HttpRouter.empty.pipe(
      HttpRouter.get("/healthz", HttpServerResponse.text("ok")),
      HttpRouter.get("/readyz", HttpServerResponse.text("ready")),
      HttpServer.serve()
    )
    
    const podHost = Config.string("POD_NAME").pipe(
      Config.map((name) =>
        `${name}.effect-agent-runners.agents.svc.cluster.local`
      ),
      Config.withDefault("localhost")
    )
    
    const ClusterLive = Layer.unwrapEffect(
      Effect.gen(function* () {
        const host = yield* podHost
        return NodeClusterSocket.layer({
          storage: "sql",
          runnerHealth: "k8s",
          runnerHealthK8s: {
            namespace: "agents",
            labelSelector: "app=effect-agent-runtime"
          },
          shardingConfig: {
            runnerAddress: Option.some(RunnerAddress.make(host, 50000)),
            runnerListenAddress: Option.some(
              RunnerAddress.make("0.0.0.0", 50000)
            )
          }
        })
      })
    )
    
    const SqlLive = PgClient.layerConfig({
      url: Config.redacted("DATABASE_URL")
    })
    
    const HttpLive = NodeHttpServer.layer(createServer, { port: 8080 })
    
    const MainLive = Layer.mergeAll(EchoEntityLive, HealthLive).pipe(
      Layer.provide(HttpLive),
      Layer.provideMerge(ClusterLive),
      Layer.provideMerge(SqlLive),
      Layer.provide(Logger.json)
    )
    
    Layer.launch(MainLive).pipe(NodeRuntime.runMain)
    apps/agent-runtime/Dockerfile (node:22-slim, no corepack)
    # syntax=docker/dockerfile:1.7
    
    # Node 22+ no longer guarantees a working corepack, so we install
    # pnpm directly with npm and pin the version. Multi-stage keeps the
    # production image small and free of build tooling.
    
    FROM node:22-slim AS base
    ENV PNPM_HOME=/pnpm PATH=/pnpm:$PATH
    RUN npm install -g pnpm@9.15.0
    WORKDIR /app
    
    # --- deps: lockfile + manifests only, maximises layer cache hits
    FROM base AS deps
    COPY package.json pnpm-lock.yaml pnpm-workspace.yaml ./
    COPY apps/agent-runtime/package.json apps/agent-runtime/
    COPY packages packages
    RUN --mount=type=cache,id=pnpm,target=/pnpm/store \
        pnpm install --frozen-lockfile
    
    # --- build: compile TypeScript
    FROM deps AS build
    COPY tsconfig.json ./
    COPY apps/agent-runtime apps/agent-runtime
    RUN pnpm --filter agent-runtime build
    
    # --- runtime: only what node needs to run
    FROM node:22-slim AS runtime
    ENV NODE_ENV=production NODE_OPTIONS=--enable-source-maps
    RUN npm install -g pnpm@9.15.0
    WORKDIR /app
    COPY --from=build /app/node_modules ./node_modules
    COPY --from=build /app/packages ./packages
    COPY --from=build /app/apps/agent-runtime/dist apps/agent-runtime/dist
    COPY --from=build /app/apps/agent-runtime/package.json apps/agent-runtime/
    USER node
    EXPOSE 8080 50000
    CMD ["node", "apps/agent-runtime/dist/main.js"]
  4. D

    Pulumi project and stack

    Run inside infra/. Pick TypeScript. Stack name is the environment; start with dev.

    pulumi-init.sh (infra/)
    cd infra
    pulumi new gcp-typescript --force \
      --name effect-agents-infra \
      --description "Effect agents on GCP"
    
    pulumi stack init dev
    pulumi config set gcp:project "$PROJECT_ID"
    pulumi config set gcp:region "$REGION"
    pulumi config set --secret postgresPassword \
      "$(openssl rand -hex 24)"
    pulumi config set --secret openaiApiKey "<sk-...>"
  5. E

    Build and push the runtime image

    Tag with the git sha — never latest. Artifact Registry repo name matches what Pulumi creates next.

    build.sh (repo root)
    export SHA=$(git rev-parse --short HEAD 2>/dev/null || echo dev)
    export IMAGE="$REGION-docker.pkg.dev/$PROJECT_ID/agents/agent-runtime:$SHA"
    
    gcloud auth configure-docker "$REGION-docker.pkg.dev" --quiet
    
    docker build -t "$IMAGE" -f apps/agent-runtime/Dockerfile .
    docker push "$IMAGE"
    
    cd infra
    pulumi config set agentRuntimeImage "$IMAGE"
  6. F

    Pulumi up

    First apply creates VPC, Cloud SQL HA, GKE Autopilot, Artifact Registry, Secret Manager, namespace, RBAC, StatefulSet, gateway Service. Expect 10–15 minutes.

    deploy.sh (infra/)
    pulumi preview        # review the plan
    pulumi up --yes       # apply
    
    pulumi stack output gatewayUrl
    pulumi stack output cloudSqlConnection
    pulumi stack output kubeconfig --show-secrets > ~/.kube/agents.yaml
    export KUBECONFIG=~/.kube/agents.yaml
  7. G

    Smoke test

    Start a run, poll the workflow, open a WebSocket and replay events.

    smoke.sh (any shell)
    export GW=$(cd infra && pulumi stack output gatewayUrl)
    
    RUN=$(curl -sX POST "$GW/agents/session-1/runs" \
      -H 'content-type: application/json' \
      -d '{"runId":"run-1","message":"hello"}')
    echo "$RUN"
    
    EXEC=$(echo "$RUN" | jq -r .executionId)
    curl -s "$GW/runs/$EXEC" | jq .
    
    npx -y wscat -c "${GW/https/wss}/ws/session-1"

Cost guard, honestly

This stack idles around $300–500/month in us-central1. Cloud SQL HA on db-custom-2-7680 is the bulk of it (~$250–350); GKE Autopilot adds ~$70 in cluster fee plus small per-pod resource cost; Artifact Registry and Secret Manager are rounding errors. Don't leave this running over a weekend by accident — run pulumi destroy from module 25.

Foundation

Mental model and architecture

01

North star: what you are building

A production deployment is a contract between Cloud Run as the user edge, GKE as the durable runtime, and Cloud SQL as the brain. Every later module sharpens this triangle.

User-facing tier

Cloud Run app

Stateless HTTP and WebSocket entry. Auth, rate-limits, and tenant boundaries live here.

Runtime tier

GKE Autopilot

Long-lived Effect cluster runners and durable workflow execution per agent session.

Durability tier

Cloud SQL Postgres

Cluster runner registry, workflow state, durable events, and pgvector memory for retrieval.

Course bias

Keep Cloud Run for stateless ingress. Keep GKE Autopilot for durable runtime. Avoid hybrid systems where Cloud Run holds in-flight agent state.

02

Docker-to-GKE mental model

Learn only the Kubernetes objects a Docker operator needs: image, replicas, DNS, secrets, ports, and health. No Helm. No custom controllers.

Docker concept | GKE object | Why for Effect | Course rule
docker run | Pod | One Effect runtime process. | Never deploy bare pods.
docker compose --scale | StatefulSet | Stable names agent-0, agent-1. | Required for runner addresses.
Compose network alias | Headless Service DNS | Per-pod DNS for cluster. | Maps to runnerAddress.
Container port | Service port | Runners speak TCP. | Expose one internal port.
.env | Secret + env var | DB and provider credentials. | Pulumi creates and binds.
Restart policy | Probe + rollout | Bad runners are replaced safely. | Boring startup is mandatory.

03

The Effect runtime you will ship

One Node container launches a single Layer graph. Cluster, workflow engine, entities, workflows, SQL, AI providers, and telemetry are services in the same graph.

src/runtime.ts (Layer graph)
const podDns = () =>
  `${process.env.POD_NAME}` +
  `.effect-agent-runners.agents.svc.cluster.local`

const ClusterLayer = NodeClusterSocket.layer({
  storage: "sql",
  runnerHealth: "k8s",
  runnerHealthK8s: {
    namespace: "agents",
    labelSelector: "app=effect-agent-runtime"
  },
  shardingConfig: {
    runnerAddress: Option.some(
      RunnerAddress.make(podDns(), 50000)
    ),
    runnerListenAddress: Option.some(
      RunnerAddress.make("0.0.0.0", 50000)
    )
  }
})

const RuntimeLayer = Layer.mergeAll(
  AgentEntityLive,
  AgentWorkflowLive,
  AgentGatewayLive
).pipe(
  Layer.provideMerge(ClusterWorkflowEngine.layer),
  Layer.provideMerge(ClusterLayer),
  Layer.provideMerge(SqlLayer),
  Layer.provideMerge(AiLayer)
)

Layer.launch(RuntimeLayer).pipe(NodeRuntime.runMain)

Effect type lens

Every agent action stays explicit

A typical business function reads Effect<Answer, AgentError, AgentMemory | LanguageModel | Tools>. Pulumi and GKE decide where requirements are provided. The signature decides correctness.

  • Success: answer, run id, stream event, workflow status.
  • Error: domain failure types like rate limit, tool failure, policy block.
  • Requirements: model, SQL, tools, config, telemetry.
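
The error channel above can be sketched in plain TypeScript as a closed union. Variant names here are illustrative, not the course's actual AgentError definitions:

```typescript
// A closed union of domain failures: the compiler forces every caller
// to handle each case, which is the point of a typed error channel.
type AgentError =
  | { readonly _tag: "RateLimited"; readonly retryAfterMs: number }
  | { readonly _tag: "ToolFailed"; readonly tool: string }
  | { readonly _tag: "PolicyBlocked"; readonly rule: string }

const describe = (e: AgentError): string => {
  switch (e._tag) {
    case "RateLimited":
      return `retry in ${e.retryAfterMs}ms`
    case "ToolFailed":
      return `tool ${e.tool} failed`
    case "PolicyBlocked":
      return `blocked by ${e.rule}`
  }
}
```

Because the union is closed, adding a fourth failure mode makes the switch stop compiling until the new case is handled.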

Platform

Pulumi stack, build path, and infra surface

04

Pulumi stack: one program owns infra and Kubernetes

Pulumi TypeScript first creates the GCP resources, then uses the generated kubeconfig to create namespace, RBAC, services, StatefulSet, secrets, and gateway objects.

GCP

Project services

Enable GKE, Artifact Registry, Cloud SQL, Secret Manager, IAM, VPC Access, Cloud Logging.

Network

Private path

VPC, subnet, private service access for Cloud SQL, optional Cloud Run VPC connector.

Data

Cloud SQL Postgres

Cluster runner registry, shard locks, persisted messages, workflow state, agent app tables.

Runtime

GKE Autopilot

Managed cluster runs the Docker image as a long-lived StatefulSet.

Kubernetes

Stateful objects

Namespace, ServiceAccount, RBAC, headless Service, StatefulSet, gateway Service.

Delivery

Artifact Registry

Images built in CI, pushed, and referenced by Pulumi config.

Where state lives

The Quickstart leaves Pulumi state in Pulumi Cloud, which is fine while you're learning. For real environments most teams self-host on GCS: pulumi login gs://<PROJECT_ID>-pulumi-state before stack init. You get versioning, IAM, and no third-party access to your infra graph. The bucket should be in the same project, have versioning on, and have uniform_bucket_level_access enabled.

infra/index.ts (Pulumi skeleton)
import * as gcp from "@pulumi/gcp"
import * as k8s from "@pulumi/kubernetes"
import * as pulumi from "@pulumi/pulumi"

const cfg = new pulumi.Config()
const region = cfg.require("region")

const network = new gcp.compute.Network("agents-vpc", {
  autoCreateSubnetworks: false
})
const subnet = new gcp.compute.Subnetwork("agents-subnet", {
  region,
  network: network.id,
  ipCidrRange: "10.42.0.0/20"
})
const repo = new gcp.artifactregistry.Repository("agents", {
  location: region,
  repositoryId: "agents",
  format: "DOCKER"
})
const db = new gcp.sql.DatabaseInstance("agents-pg", {
  region,
  databaseVersion: "POSTGRES_16",
  settings: {
    tier: "db-custom-2-7680",
    ipConfiguration: { ipv4Enabled: false, privateNetwork: network.id },
    backupConfiguration: { enabled: true, pointInTimeRecoveryEnabled: true },
    availabilityType: "REGIONAL"
  }
})
const cluster = new gcp.container.Cluster("agents-autopilot", {
  location: region,
  enableAutopilot: true,
  network: network.id,
  subnetwork: subnet.id,
  privateClusterConfig: { enablePrivateNodes: true }
})

const k8sProvider = new k8s.Provider("gke", {
  // makeKubeconfig (not shown) assembles a kubeconfig string from the
  // cluster's endpoint and CA certificate outputs.
  kubeconfig: makeKubeconfig(cluster)
})

// Namespace, ServiceAccount, RBAC, headless Service, StatefulSet,
// Secret, gateway Service — all created with k8sProvider.

05

Build path: local to production

Every module ships a lab outcome. You do not jump to multi-replica GKE on day one.

  1. 01

    Local Docker Compose

    One Node container plus Postgres. Use storage: "local" first, then switch to storage: "sql".

    docker-compose.yml (local dev)
    services:
      postgres:
        image: postgres:16
        environment:
          POSTGRES_USER: agents
          POSTGRES_PASSWORD: agents
          POSTGRES_DB: agents
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U agents"]
          interval: 2s
          timeout: 5s
          retries: 10
        ports: ["5432:5432"]
        volumes: [pgdata:/var/lib/postgresql/data]
    
      runtime:
        build:
          context: .
          dockerfile: apps/agent-runtime/Dockerfile
        depends_on:
          postgres: { condition: service_healthy }
        environment:
          DATABASE_URL: postgres://agents:agents@postgres:5432/agents
          POD_NAME: runtime-local
        ports:
          - "8080:8080"     # health + http gateway
          - "50000:50000"   # cluster socket (only for multi-node compose)
    
    volumes:
      pgdata:

    Run with docker compose up --build. First boot creates the database; subsequent boots reuse it. To simulate two runners, copy runtime as runtime-2 with POD_NAME: runtime-local-2 and a different host port.

  2. 02

    Pulumi bootstrap

    Create Artifact Registry, Cloud SQL HA, GKE Autopilot, Secret Manager, IAM, VPC.

  3. 03

    Single runner on GKE

    Deploy with replicas: 1. Confirm SQL tables, logs, and one workflow execution.

  4. 04

    Three runners

    Scale out. Confirm shard ownership moves and same session id routes consistently.

  5. 05

    Cloud Run gateway

    Cloud Run calls GKE gateway over private networking. Start runs, poll workflows, stream events.

  6. 06

    Production drills

    Kill a pod, restart the runtime, fail a tool, rotate a secret, restore from backup.

test/agent-workflow.test.ts (@effect/vitest)
import { it } from "@effect/vitest"
import { Effect, Layer, TestClock } from "effect"
import { TestRunner } from "@effect/cluster"
import { AgentWorkflow, AgentWorkflowLive } from "../src/AgentWorkflow"

// TestRunner.layer runs the whole cluster in-memory: no SQL, no
// network, no real time. Perfect for entity + workflow unit tests.
const TestLayer = AgentWorkflowLive.pipe(
  Layer.provideMerge(TestRunner.layer)
)

it.effect("workflow completes for a simple ask", () =>
  Effect.gen(function*() {
    const executionId = yield* AgentWorkflow.execute(
      { runId: "r1", sessionId: "s1", message: "hi" },
      { discard: true }
    )
    // Advance time so any DurableClock.sleep resolves instantly
    yield* TestClock.adjust("1 hour")

    const result = yield* AgentWorkflow.poll(executionId)
    // assert result is Complete with the expected exit
  }).pipe(Effect.provide(TestLayer))
)

What to test where

Three rings, decreasing speed

  • Unit with TestRunner.layer: entity protocols, workflow happy path + each error type, schema decoders. Fast, no Docker.
  • Integration with docker compose up postgres: SQL storage round-trip, real idempotencyKey dedup, real shard math. Slow enough to run on PR but not in watch mode.
  • Smoke against a deployed staging stack: the smoke test from Quickstart step G runs as a script after every deploy. This is the only test that proves Cloud Run, GKE, and Cloud SQL are wired together correctly.

06

Networking without hand plumbing

Effect runners need stable addresses. The public app does not need to join the cluster. Cloud Run calls a gateway; the gateway calls typed entity clients.

Runner-to-runner

Headless Service DNS

Use StatefulSet pod DNS as the runnerAddress. Use 0.0.0.0 only as listen address.

  • effect-agent-runtime-0.effect-agent-runners.agents.svc.cluster.local
  • Port 50000 for cluster socket traffic.
  • Pod labels drive Kubernetes health checks.
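
A minimal sketch of how the advertised address is derived, assuming the Quickstart's names (headless Service effect-agent-runners, namespace agents, cluster port 50000) — adjust to your manifests:

```typescript
interface Address {
  readonly host: string
  readonly port: number
}

// StatefulSet pods get per-pod DNS under the headless Service:
// <pod>.<service>.<namespace>.svc.cluster.local
const runnerAddressFor = (
  podName: string,
  service = "effect-agent-runners",
  namespace = "agents",
  port = 50000
): Address => ({
  host: `${podName}.${service}.${namespace}.svc.cluster.local`,
  port
})

// The listen address stays 0.0.0.0 — only the advertised
// runnerAddress uses the stable pod DNS name.
const listenAddress: Address = { host: "0.0.0.0", port: 50000 }
```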

Cloud Run-to-GKE

Gateway, not cluster client

Cloud Run hits a small HTTP gateway inside GKE. The gateway uses typed Effect entity clients.

  • POST /agents/:sessionId/runs — start workflow.
  • GET /runs/:executionId — poll workflow.
  • GET /runs/:executionId/events — stream events.

07

Security defaults you keep from day one

Least-privilege service accounts, secrets out of source, private DB. Pulumi owns both GCP IAM and Kubernetes RBAC.

Surface | Default | Why | Pulumi creates
Runtime identity | Workload identity | Pods use a GCP service account without static keys. | GSA, KSA, IAM binding.
Secrets | Secret Manager | No plaintext config in Git. | Secret versions, K8s Secret projection.
Cloud SQL | Private IP | No public DB endpoint. | Private service access and HA instance.
K8s health | Pod read RBAC | Effect runnerHealth: "k8s" needs pod visibility. | Role and RoleBinding.
Network | Internal first | Cloud Run reaches gateway privately where possible. | VPC connector or internal ingress.
infra/components/workload-identity.ts (Pulumi)
// 1. GCP service account the pod will act as
const runtimeGsa = new gcp.serviceaccount.Account("agent-runtime", {
  accountId: "agent-runtime",
  displayName: "Effect agent runtime pods"
})

// 2. Let the Kubernetes service account impersonate the GSA
new gcp.serviceaccount.IAMBinding("agent-runtime-wi", {
  serviceAccountId: runtimeGsa.name,
  role: "roles/iam.workloadIdentityUser",
  members: [
    // One interpolate for the whole member string — concatenating a
    // pulumi Output with + would stringify it as "[object Object]".
    pulumi.interpolate`serviceAccount:${project}.svc.id.goog[agents/agent-runtime]`
  ]
})

// 3. Give the GSA only what the pod actually needs
new gcp.projects.IAMMember("agent-runtime-sql", {
  project, role: "roles/cloudsql.client",
  member: pulumi.interpolate`serviceAccount:${runtimeGsa.email}`
})
new gcp.projects.IAMMember("agent-runtime-secrets", {
  project, role: "roles/secretmanager.secretAccessor",
  member: pulumi.interpolate`serviceAccount:${runtimeGsa.email}`
})

// 4. Annotate the K8s service account
new k8s.core.v1.ServiceAccount("agent-runtime", {
  metadata: {
    name: "agent-runtime", namespace: "agents",
    annotations: {
      "iam.gke.io/gcp-service-account": runtimeGsa.email
    }
  }
}, { provider: k8sProvider })

Cloud Run → GKE gateway

Authenticate with an OIDC ID token

The gateway sits inside the VPC behind an internal load balancer. Cloud Run reaches it through a VPC connector. To prove identity, Cloud Run grabs a Google-signed ID token from the metadata server and sends it as Authorization: Bearer <token>. The gateway verifies the token's aud claim against its custom audience.

  • Set audience = the gateway's internal URL (or a custom audience string).
  • Grant the Cloud Run service account roles/run.invoker only if you're invoking another Cloud Run; for a self-verifying gateway the gateway middleware checks the JWT directly using Google's JWKS.
  • Never trust x-tenant-id from Cloud Run. Re-extract tenant from a verified claim or your auth header inside the gateway.
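
As a sketch of the audience check alone (signature verification against Google's JWKS is still mandatory and usually delegated to a library such as google-auth-library), the gateway can decode the token payload and compare aud:

```typescript
// SKETCH ONLY: checks the `aud` claim of an ID token. Production
// middleware must also verify the Google signature via JWKS before
// trusting any claim; this shows just the audience comparison.
const b64urlToJson = (part: string): Record<string, unknown> =>
  JSON.parse(Buffer.from(part, "base64url").toString("utf8"))

const audienceMatches = (idToken: string, expectedAud: string): boolean => {
  const parts = idToken.split(".")
  if (parts.length !== 3) return false // not a JWT shape
  try {
    const payload = b64urlToJson(parts[1])
    return payload.aud === expectedAud
  } catch {
    return false
  }
}
```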

08

Observability that sees fibers, workflows, and infra

Agents fail in unusual ways: rate limits, tool failures, partitions, workflow suspension, runner churn. Logs alone are not enough.

Logs

Structured Effect logs

Annotate with sessionId, runId, executionId, model, tool, attempt.

Traces

OpenTelemetry

Export spans to Cloud Trace. Wrap agent turns, tool calls, model calls, and workflow activities.

Metrics

Runtime health

Active runners, shard ownership, workflow states, tool latency, tokens, retry counts.

Minimum dashboard checklist
  • Runner pods and Kubernetes readiness.
  • Cloud SQL CPU, connections, storage, lock waits.
  • Workflow completions, failures, suspensions, retries.
  • LLM latency, rate-limit count, fallback count, tool error rate.
  • Cloud Run gateway latency and error rate.

09

Operations and failure nuances

The course treats these as required labs, not optional notes.

DB sessions

Advisory lock nuance

Cluster SQL storage uses advisory locks. Avoid poolers that rotate sessions underneath ownership.

Rollouts

One change at a time

Deploy with replicas: 1 after schema or runtime changes, then scale out.

Passivation

Entity memory is cache

Active memory disappears on idle or restart. Durable facts live in SQL or workflow state.

Idempotency

Run IDs matter

Workflow idempotency keys are deterministic. Client retries do not double-trigger runs.

Scaling

Shards before replicas

Set shard count up front. Changing shards later is a migration decision.
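
The "shards before replicas" rule falls out of the routing math: entity ids map to shards by a stable hash modulo the shard count, so changing the count remaps most entities. A sketch with an illustrative FNV-1a hash (not the hash @effect/cluster uses internally):

```typescript
// Illustrative 32-bit FNV-1a hash — deterministic across processes,
// which is what shard routing needs.
const fnv1a = (s: string): number => {
  let h = 0x811c9dc5
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h >>> 0
}

// Same entity id always lands on the same shard — until shardCount
// changes, at which point most ids move. That's why shard count is a
// migration decision, not a config tweak.
const shardFor = (entityId: string, shardCount: number): number =>
  fnv1a(entityId) % shardCount
```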

Shutdown

Graceful drain

Termination grace, readiness, and logs confirm runners release work cleanly.

k8s/statefulset.yaml (excerpt, probes + drain)
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # give cluster time to shed shards
      containers:
        - name: agent-runtime
          # Startup probe lets the Layer graph come up before liveness fires.
          # First boot includes opening pg pool, joining the shard ring,
          # and running pending migrations — easy 20s.
          startupProbe:
            httpGet: { path: /healthz, port: 8080 }
            failureThreshold: 30
            periodSeconds: 2
          readinessProbe:
            httpGet: { path: /readyz, port: 8080 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }
            periodSeconds: 10
          lifecycle:
            preStop:
              exec:
                # Tell the runtime to stop accepting new entity messages,
                # then let in-flight workflow activities finish. The Effect
                # runtime's own scope finalizers release shard locks on
                # SIGTERM; preStop just gives the load balancer time to
                # mark this pod NotReady first.
                command: ["/bin/sh", "-c", "sleep 10"]

What the probes actually answer

/healthz = "the process is up." /readyz = "this runner is in the shard ring AND can reach its database AND has finished startup migrations." If /readyz 200s before the shard ring is joined, you'll route traffic to a runner that doesn't own any entities yet. Don't conflate them.
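
A sketch of that /readyz contract, with illustrative stand-ins for the real checks (shard ring joined, database reachable, migrations applied):

```typescript
// Ready means EVERY dependency check passes; liveness stays a constant
// 200 while the process is up. Check names are illustrative.
type Check = () => Promise<boolean>

const readyz = async (
  checks: Record<string, Check>
): Promise<{ status: number; failed: string[] }> => {
  const failed: string[] = []
  for (const [name, check] of Object.entries(checks)) {
    // A throwing check counts as not ready, never as a crash.
    const ok = await check().catch(() => false)
    if (!ok) failed.push(name)
  }
  return { status: failed.length === 0 ? 200 : 503, failed }
}
```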

Effect runtime

Cluster, workflow, and realtime

10

Effect Cluster: entities, runners, shards, clients

Cluster is typed distributed actors. An Entity is the protocol. An entity layer is the server. Entity.client is the typed remote handle.

Core idea

One entity id owns one serial lane

For agents the entity id is usually sessionId, agentId, or tenant:user. Calls to the same id route to the same active runner while alive.

  • Runner: a GKE pod with NodeClusterSocket.layer.
  • Shard: bucket assigning entity ids to runners.
  • Entity: typed RPC plus stateful handler.
  • Client: location-transparent typed caller.
cluster/AgentEntity.ts (protocol)
const Ask = Rpc.make("Ask", {
  payload: { runId: Schema.String, message: Schema.String },
  success: Schema.String
})
const Cancel = Rpc.make("Cancel", {
  payload: { executionId: Schema.String },
  success: Schema.Void
})

export const AgentEntity = Entity.make("Agent", [Ask, Cancel])

export const AgentEntityLive = AgentEntity.toLayer(
  Effect.gen(function* () {
    const address = yield* Entity.CurrentAddress
    const sessionId = address.entityId

    return AgentEntity.of({
      Ask: ({ payload }) => AgentWorkflow.execute({
        runId: payload.runId,
        sessionId,
        message: payload.message
      }, { discard: true }),

      Cancel: ({ payload }) =>
        AgentWorkflow.interrupt(payload.executionId)
    })
  }),
  { maxIdleTime: "10 minutes" }
)
Concept | Effect API | Agent use | Nuance
Protocol | Rpc.make | Ask, cancel, snapshot, append event. | Small schema-defined messages.
Entity definition | Entity.make | Defines the actor type. | No implementation yet.
Entity behavior | Entity.toLayer | Handlers and in-memory state. | Memory is cache.
Remote call | AgentEntity.client | Gateway calls clientFor(sessionId).Ask. | Caller does not know the pod.
Persistence | ClusterSchema.Persisted | Persist important messages. | Skip noisy read calls.
Concurrency | Rpc.fork | Run safe readers concurrently. | Default serial protects state.

11

Effect Workflow: durable agent runs

Cluster answers where does this agent live. Workflow answers how do I run this long task durably, retry it, sleep, resume, and inspect status.

workflow/AgentWorkflow.ts (durable run)
class AgentError extends Schema.TaggedError<AgentError>("AgentError")(
  "AgentError",
  { message: Schema.String }
) {}

export const AgentWorkflow = Workflow.make({
  name: "AgentWorkflow",
  payload: {
    runId: Schema.String,
    sessionId: Schema.String,
    message: Schema.String
  },
  success: Schema.String,
  error: AgentError,
  idempotencyKey: ({ runId }) => runId
})

export const AgentWorkflowLive = AgentWorkflow.toLayer(
  Effect.fn(function* (payload, executionId) {
    yield* EventStore.append({
      executionId, type: "started", message: payload.message
    })

    const answer = yield* Activity.make({
      name: "RunAgentTurn",
      success: Schema.String,
      error: AgentError,
      execute: Effect.gen(function* () {
        const assistant = yield* AiAssistant
        return yield* assistant.agent(payload.message)
      })
    }).pipe(Activity.retry({ times: 3 }))

    yield* EventStore.append({ executionId, type: "completed", answer })
    return answer
  })
)

Durable model

Workflow state survives runner death.

Use workflows for work that outlives one HTTP request: reasoning, tool chains, approvals, scheduled follow-ups, retries, and cancellation.

  • Workflow.make: name, schemas, idempotency key.
  • toLayer: durable implementation.
  • Activity.make: retryable external work.
  • DurableClock: sleep without holding compute.
  • DurableDeferred: approval or external signal.

Start

execute(payload, { discard: true })

Returns execution id quickly. Perfect for HTTP and WebSocket start endpoints.

Poll

poll(executionId)

Returns complete, suspended, or undefined. Status pages and retries use this.

Cancel

interrupt(executionId)

Map a user cancel action to workflow interruption.
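
The idempotency contract behind execute can be sketched with an in-memory map. The real guarantee lives in the workflow engine's SQL storage; this only demonstrates the semantics a retrying client relies on:

```typescript
// Starting the same runId twice returns the SAME executionId instead
// of launching a second run — that is what a deterministic
// idempotencyKey buys you under client retries.
const executions = new Map<string, string>()
let nextExecution = 0

const startRun = (runId: string): string => {
  const existing = executions.get(runId)
  if (existing !== undefined) return existing // retry: no new run
  const executionId = `exec-${nextExecution++}`
  executions.set(runId, executionId)
  return executionId
}
```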

12

WebSocket gateway: realtime over durable work

WebSockets belong at the gateway edge. They never own state. They start workflows, subscribe to persisted events, and reconnect by replaying from lastEventId.

Why GKE for sockets, not Cloud Run

Cloud Run treats WebSockets as long-lived HTTP requests and applies the 60-minute request timeout hard cap. Anything still connected at 60 minutes gets killed regardless of activity. That's fine for short chat sessions if your client reconnects cleanly, but for agent runs that observe a long workflow, GKE is the calmer home: idle timeouts are yours to choose, instance scaling is predictable, and you avoid the cold-start-during-reconnect dance.
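
If you do keep short sessions on Cloud Run, the client needs a reconnect policy for that cap. A sketch of capped exponential backoff; the numbers are illustrative, and real clients should add jitter:

```typescript
// Delay before reconnect attempt N: doubles each time, capped so a
// long outage doesn't push retries minutes apart.
const backoffMs = (attempt: number, baseMs = 500, capMs = 30_000): number =>
  Math.min(capMs, baseMs * 2 ** attempt)
```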

gateway/WebSocket.ts (@effect/platform)
const GatewayLive = HttpRouter.empty.pipe(
  HttpRouter.get(
    "/ws/:sessionId",
    Effect.gen(function* () {
      const req = yield* HttpServerRequest.HttpServerRequest
      const sessionId = req.pathParams.sessionId

      const outbound = EventStore.stream(sessionId).pipe(
        Stream.map((event) => JSON.stringify(event)),
        Stream.encodeText
      )

      const inbound = outbound.pipe(
        Stream.pipeThroughChannel(HttpServerRequest.upgradeChannel()),
        Stream.decodeText(),
        Stream.runForEach((raw) =>
          Effect.gen(function* () {
            const msg = yield* Schema.decodeUnknown(ClientMessage)(
              JSON.parse(raw)
            )
            const clientFor = yield* AgentEntity.client
            const agent = clientFor(sessionId)
            switch (msg._tag) {
              case "Ask":
                yield* agent.Ask({ runId: msg.runId, message: msg.text })
                return
              case "Cancel":
                yield* agent.Cancel({ executionId: msg.executionId })
                return
            }
          })
        )
      )

      yield* inbound
      return HttpServerResponse.empty()
    })
  ),
  HttpServer.serve()
)

Realtime contract

Sockets are a projection, not the source of truth.

The browser can disconnect anytime. Every meaningful event is persisted so a reconnect resumes by lastEventId.

  • Inbound: ask, cancel, approve, heartbeat.
  • Outbound: accepted, token, tool-call, tool-result, completed, failed.
  • Reconnect: client sends last seen event id.
  • Backpressure: bounded queues and heartbeat timeouts.
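
The replay contract can be sketched with a monotonically increasing event id. Shapes are illustrative; the real log lives in the SQL EventStore:

```typescript
// Events get increasing ids as they are appended; a reconnecting
// client sends the last id it saw and receives everything after it.
interface AgentEvent {
  readonly id: number
  readonly type: string
}

const log: AgentEvent[] = []

const append = (type: string): AgentEvent => {
  const event = { id: log.length + 1, type }
  log.push(event)
  return event
}

const replayFrom = (lastEventId: number): AgentEvent[] =>
  log.filter((e) => e.id > lastEventId)
```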
Problem | Pattern | Effect tool | GCP nuance
Client disconnect | Persist events; replay from offset. | Stream + SQL EventStore. | Do not rely on Cloud Run memory.
Long LLM response | Workflow emits events while running. | Activity + EventStore. | Gateway restarts are safe.
Cancel button | Cancel message interrupts workflow. | Workflow.interrupt | Return final cancelled event to all clients.
Multiple tabs | Same session id subscribes once. | AgentEntity.client | Entity serializes mutations.
Socket scaling | Gateway pods are stateless subscribers. | Stream + SQL/PubSub bridge. | Prefer GKE gateway over Cloud Run for sockets.

Scale

Capacity, data, delivery, reliability, governance

13

Capacity, SLOs, and scale model for 5–10M users

10 million users is not one number. Model MAU, peak concurrent users, concurrent WebSockets, active agent runs, tool calls per run, tokens per minute, and database writes per run.
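The worksheet can start as a handful of ratios. Every number below is a placeholder assumption to replace with your own measurements:

```typescript
// All ratios here are placeholder assumptions. Measure your own.
const model = {
  mau: 10_000_000,
  dauRatio: 0.2,             // DAU = 20% of MAU
  peakConcurrentRatio: 0.05, // 5% of DAU online at peak
  socketRatio: 0.6,          // of those, how many hold a WebSocket
  runsPerUserHour: 2,
  eventsPerRun: 40           // tokens + tool calls + lifecycle events
}

const dau = model.mau * model.dauRatio
const peakConcurrent = dau * model.peakConcurrentRatio
const concurrentSockets = peakConcurrent * model.socketRatio
const runsPerSecond = (peakConcurrent * model.runsPerUserHour) / 3600
const dbWritesPerSecond = runsPerSecond * model.eventsPerRun
```

With these particular guesses: 2M DAU, 100k peak concurrent, 60k sockets, and roughly 2.2k event-store writes per second, which is why the event schema and compaction plan matter before the gateway does.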

Metric | Why | Lever | Deliverable
MAU / DAU | Business scale, not raw load. | Plans, retention, tenant tiers. | Traffic worksheet per tier.
Peak concurrent users | Gateway CPU and request capacity. | Cloud Run autoscaling, GKE HPA. | Staged concurrency load test.
Concurrent WebSockets | Realtime connection pressure. | GKE gateway, heartbeat, idle timeout. | Soak test and reconnect test.
Active agent runs | Workflow and LLM pressure. | Quotas, rate limits, queues. | Admission policy.
DB writes per run | Cloud SQL bottleneck. | Event compaction, batching, retention. | Event schema plan.
Tokens per minute | Provider limits and cost. | Model router, fallback plan. | Per-tenant token dashboard.

SLO

Interactive path

p95 gateway response, first-token latency, WebSocket reconnect success, run-start acceptance.

SLO

Durable path

Workflow completion rate, retry exhaustion, suspended workflow age, event replay lag.

SLO

Data path

Cloud SQL utilization, p95 query latency, lock waits, backup success, restore drill age.

14

Production data architecture

At this scale the hard part is not starting a workflow. The hard part is deciding which state is hot, durable, searchable, replayable, compacted, and deletable.

State taxonomy

Four stores, four jobs.

  • Entity memory: hot coordination only.
  • Workflow state: durable execution and retries.
  • Event store: realtime replay and audit.
  • Memory store: semantic and user memory.
data/event-store.tsindustry pattern
type AgentEvent =
  | { _tag: "RunAccepted"; executionId: string; runId: string }
  | { _tag: "Token"; executionId: string; index: number; text: string }
  | { _tag: "ToolCall"; executionId: string; tool: string; callId: string }
  | { _tag: "ToolResult"; executionId: string; callId: string; ok: boolean }
  | { _tag: "Completed"; executionId: string; answerRef: string }
  | { _tag: "Failed"; executionId: string; reason: string }

// Persist every event with tenantId, sessionId, executionId,
// monotonic eventId, createdAt, and retention bucket.
Data | Primary GCP choice | Why | Nuance
Cluster + workflow | Cloud SQL Postgres | Official SQL-backed cluster storage. | Stable sessions; watch locks.
Event stream | Cloud SQL first, Pub/Sub later | SQL is simple and replayable. | Pub/Sub is not source of truth.
Semantic memory | AlloyDB pgvector or Vertex Vector Search | Search and embeddings at scale. | Clear PII and deletion path.
Artifacts and files | Cloud Storage | Cheap durable blobs. | Store refs in SQL, not blobs.
Analytics | BigQuery sink | Large reporting without OLTP. | Export redacted events only.
migrations/001_event_store.sqlstarting point
-- The event store is your code, not Effect's. This is one
-- reasonable shape; adapt freely. Cluster and workflow state are
-- created by @effect/cluster and @effect/workflow themselves — do
-- not touch those tables.

create table agent_events (
  tenant_id     text        not null,
  session_id    text        not null,
  execution_id  text        not null,
  event_id      bigserial   primary key,
  event_type    text        not null,
  payload       jsonb       not null,
  created_at    timestamptz not null default now()
);

create index agent_events_session_idx
  on agent_events (tenant_id, session_id, event_id);

create index agent_events_exec_idx
  on agent_events (execution_id, event_id);

-- Retention partitioning lives here too. The course leaves the
-- specifics to you because tenancy and compliance shape them.

Honest gap

You own the event store

Cluster runner registry and workflow execution tables are managed for you by @effect/cluster and @effect/workflow when you pick SQL storage. The event store above is application code — there is no built-in module to install. Treat the DDL above as a starting point, not a published library.

Two things to decide before you ship it:

  • Retention. Daily partition table or scheduled delete? Most teams partition by created_at::date and drop old partitions.
  • PII strategy. Either filter at append time or store raw and have a deletion workflow per tenant on request.

Connection pool reality

Cloud SQL on db-custom-2-7680 caps at roughly 200 connections. PgClient.layer defaults to a pool of 10 per process. With three runner pods + two gateway pods that's already 50 connections — fine. Add a third tier of pods (background jobs, a migrations runner) and you're at 80. Past about 120 total open connections, queue queries inside the pool instead of buying more headroom; the database is the bottleneck, not the pool. PgClient.layer accepts a maxConnections option (passed through to the underlying postgres driver) if you need to lower the per-pod cap.
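The budget above is just arithmetic, and it is worth keeping as code next to the Pulumi stack. A sketch with the same assumed pool sizes; the tier names and instance cap are placeholders:

```typescript
// Connection budget for a Cloud SQL instance. The cap and the tiers
// are assumptions to swap for your own deployment.
const instanceCap = 200 // db-custom-2-7680, roughly

const tiers = [
  { name: "runner", pods: 3, poolPerPod: 10 },
  { name: "gateway", pods: 2, poolPerPod: 10 },
  { name: "jobs", pods: 3, poolPerPod: 10 }
]

const totalConnections = tiers.reduce(
  (sum, t) => sum + t.pods * t.poolPerPod,
  0
)

// Leave headroom for superuser, replication, and migration sessions.
const headroom = instanceCap - totalConnections - 10
```

Wire a check like `headroom > 0` into CI so adding a pod tier that blows the budget fails the preview, not the rollout.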

15

Delivery: CI/CD, migrations, rollouts, rollback

Pulumi makes environments repeatable. Production also needs disciplined release mechanics.

  1. A

    PR checks

    Typecheck, tests, lint, Pulumi preview on every PR.

  2. B

    Image build

    Tagged by git SHA, signed, pushed to Artifact Registry.

  3. C

    Migration gate

    Apply backward-compatible DB changes before deploying the new image.

  4. D

    Progressive rollout

    Staging, one runner, then full replica count. Smoke tests after each step.

Pulumi

Stack separation

Distinct dev, staging, and prod stacks with protected resources.

Migrations

Expand, deploy, contract

Add columns and tables first. Deploy code. Backfill. Remove old fields later.

Rollback

Image is not enough

Release plans cover DB compatibility, workflow versions, event schema compatibility.

Workflow versioning

Never break old runs

Long-running workflows resume after deploys. Version names or preserve old handlers.

Policy

Preview before apply

CI posts Pulumi preview. Production apply requires approval and pinned providers.

Smoke

Real run after deploy

Start a workflow, stream events, kill a pod, verify completion, check logs.

Migration tool — pick one

What we use, why

For a course-shaped repo, Drizzle (drizzle-kit migrate) is the lightest option that produces plain SQL files you can read and audit. node-pg-migrate is also fine if you prefer hand-written up/down. The course's expand/deploy/contract rule applies to either:

  • Expand: additive only — new columns nullable, new tables, new indexes CONCURRENTLY.
  • Deploy: code reads/writes both old and new shapes.
  • Backfill: a workflow does this safely; never a long transaction.
  • Contract: drop the old fields one release later, never the same release.
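During the deploy step the same code has to accept rows written by either release. A minimal sketch, assuming a hypothetical rename of `reason` to `failure_reason`:

```typescript
// Hypothetical expand/contract example: `reason` is being renamed to
// `failure_reason`. During the deploy window rows may carry either.
interface OldRow { reason: string }
interface NewRow { failure_reason: string }
type EitherRow = Partial<OldRow & NewRow>

// Read both shapes; prefer the new column when present.
const readReason = (row: EitherRow): string =>
  row.failure_reason ?? row.reason ?? "unknown"

// Write both shapes until the contract release drops the old column.
const writeRow = (reason: string) => ({
  failure_reason: reason,
  reason
})
```

The contract release then deletes `reason` from both the table and `writeRow`, one release after nothing reads it.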
ci/pulumi-roles.txtleast privilege
# Service account the CI uses to run `pulumi up` against prod.
# Bind these roles project-wide unless you can scope tighter.

roles/container.admin          # GKE Autopilot cluster + workloads
roles/cloudsql.admin           # Cloud SQL instances and users
roles/artifactregistry.admin   # repos + image pushes
roles/secretmanager.admin      # versions + IAM bindings
roles/iam.serviceAccountAdmin  # to create the workload-identity GSAs
roles/iam.serviceAccountUser   # to *use* those GSAs in workloads
roles/compute.networkAdmin     # VPC, subnets, Cloud Armor
roles/servicenetworking.networksAdmin  # private services access
roles/storage.admin            # Pulumi state bucket + course buckets
roles/monitoring.admin         # dashboards + alert policies

# Do NOT grant roles/owner. It works, then it leaks.

16

Reliability operations: HA, DR, load, incidents

For a 5–10M user product, start with strong single-region HA and tested disaster recovery. Multi-region active-active comes after this is solid.

Area | Standard target | GCP mechanism | Course lab
GKE runtime | Multi-replica, disruption-safe rollouts. | StatefulSet, readiness, PDB, Autopilot. | Drain one pod mid-workflow.
Database | HA primary, PITR, tested restore. | Cloud SQL HA, backups, PITR. | Restore staging from backup.
Gateway | Horizontal autoscaling, stateless reconnect. | GKE HPA or Cloud Run autoscaling. | WebSocket reconnect soak test.
Provider outage | Graceful fallback or queueing. | ExecutionPlan, rate limits, fallback models. | Force primary model failure.
Traffic spike | Admission control before overload. | Per-tenant quotas, queue depth, 429 policy. | Load test to controlled rejection.
Incident response | Known runbooks and owners. | Cloud Monitoring alerts, dashboards. | Run a simulated incident review.

Production rule

If a failure mode does not have an alert, a dashboard panel, and a runbook entry, it is not production-ready.

17

Governance: tenancy, security, abuse, compliance, cost

Agent systems carry unique blast-radius problems: model spend, tool permissions, prompt injection, retention, and tenant isolation.

Tenancy

Tenant id everywhere

Every table, event, payload, trace, metric, log carries tenant id. Quotas before workflows start.

Abuse

Admission control

Rate limits by tenant, user, IP, and tool category. Reject early before LLM cost.

Cost

Budgeted reasoning

Track tokens, tool calls, retries, model class, run duration. Hard caps and fallbacks.

Privacy

Retention and deletion

Define TTLs for raw events, summaries, embeddings, artifacts, and audit records.

Tools

Least-privilege tools

Tool handlers are Effect services with typed requirements and policy checks.

Pulumi

Policy as code

Enforce private DB, no public buckets, required labels, deletion protection, approved regions.

Industry-standard readiness checklist
  • SLO document with error budget and alert thresholds.
  • Threat model for prompts, tools, data exfiltration, tenant isolation.
  • Runbooks for model outage, DB saturation, stuck workflows, socket storms.
  • Backup restore drill and rollback drill completed before launch.
  • Load test report with bottleneck and controlled rejection point.
  • Cost dashboard by tenant, model, tool, and workflow type.
  • Data retention and deletion policy implemented, not just documented.

Advanced

Multi-region, shard groups, typed RPC, ingress, cost

18

Multi-region HA: when single-region is not enough

Single-region HA covers most 5–10M-user products. Cross-region buys disaster recovery and lower global latency, but it adds replication lag, failover drills, and workflow-versioning rules. Start with read replicas before active-active.

Step 1

Cross-region read replica

Primary stays availabilityType: "REGIONAL". Add a replicaInstance in another region. Promote on disaster, then repoint the gateway.

Step 2

Stateless gateway in N regions

Deploy the gateway in two regions behind a global external load balancer. Sticky to primary DB until a failover playbook flips it.

Step 3

Workflow versioning

Cross-region failover may resume workflows under the new primary. Old workflow names must keep their handlers for at least one major version.

infra/components/cloud-sql-replica.tsPulumi
new gcp.sql.DatabaseInstance("agents-pg-replica", {
  region: "us-east1",
  databaseVersion: "POSTGRES_16",
  masterInstanceName: primary.name,
  // Postgres read replicas can never be auto-failover targets in
  // Cloud SQL — this field is effectively a no-op here. Failover for
  // Postgres uses the primary's REGIONAL availability + a manual
  // promote on the replica during a regional outage.
  replicaConfiguration: { failoverTarget: false },
  settings: {
    tier: "db-custom-2-7680",
    availabilityType: "ZONAL", // replicas don't need HA
    ipConfiguration: {
      ipv4Enabled: false,
      privateNetwork: network.id
    }
  }
})

Honest limit

Effect Cluster is not designed for split-brain across regions today. Real active-active means workflow ids partitioned by region or an external coordinator. Most 5–10M-user products do not need this.
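If you do end up partitioning workflow ids by region, the cheapest version is to make the owning region part of the execution id itself, so each region only ever resumes its own runs. A sketch; the region list and helper names are illustrative:

```typescript
// Region-partitioned execution ids: each region resumes only its own
// workflows, avoiding split-brain without an external coordinator.
const regions = ["us-central1", "us-east1"] as const
type Region = (typeof regions)[number]

const executionIdFor = (region: Region, runId: string) =>
  `${region}:${runId}`

// Which region owns this execution? undefined means "not ours".
const ownerOf = (executionId: string): Region | undefined =>
  regions.find((r) => executionId.startsWith(`${r}:`))
```

A runner then refuses to resume any execution whose `ownerOf` is not its own region; failover becomes an explicit re-parenting step in the playbook, not an accident.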

19

Shard groups per tenant or tier

By default all entities share one shard ring. For premium tenants, large customers, or noisy workloads, route them to a dedicated runner pool with shard groups.

cluster/AgentEntity.tsannotation
export const AgentEntity = Entity.make("Agent", [Ask, Cancel])
  .annotate(
    ClusterSchema.ShardGroup,
    (entityId) =>
      entityId.startsWith("ent-")
        ? "enterprise"
        : "default"
  )

Runtime config

Runners opt into groups

A runner only owns shards from its declared groups. Run two StatefulSets — one with shardGroups: ["default"], one with ["enterprise"] — and scale them independently.

  • New groups need a one-time SQL migration of the shards table.
  • Always keep "default" on at least one runner pool.
  • Use a single group until you have a real noisy-neighbor problem.

20

Typed RPC gateway with @effect/rpc

Hand-rolling HTTP between Cloud Run and GKE loses your types at the boundary. @effect/rpc gives you a typed schema, with no codegen, that both sides import from a shared package.

packages/domain/src/api.tsshared
export class StartRun extends Rpc.make("StartRun", {
  payload: {
    sessionId: Schema.String,
    runId: Schema.String,
    message: Schema.String
  },
  success: Schema.Struct({ executionId: Schema.String })
}) {}

export class PollRun extends Rpc.make("PollRun", {
  payload: { executionId: Schema.String },
  success: WorkflowResult
}) {}

export const AgentRpc = RpcGroup.make(StartRun, PollRun)

Why it pays

Schema is the contract

  • Cloud Run imports the typed client; GKE imports the handler.
  • Breaking changes appear as TypeScript errors before deploy.
  • Tracing context propagates automatically across the boundary.
  • Versioning a payload field is a schema change, not a docs change.

21

Cloud Armor and IAP: stricter ingress

Cloud Armor is the WAF and rate limiter in front of your load balancer. IAP enforces Google identity on internal endpoints. Both are Pulumi resources you add once the gateway is stable.

WAF

Block known bad

Use Google preconfigured rules for SQLi, XSS, and protocol attacks. Add per-tenant rate limit by header.

Rate limit

Per-IP + per-tenant

Two limit policies stack: a global IP throttle and a tenant-key throttle. Reject early with 429 before any LLM cost.

IAP

Internal endpoints

Admin dashboards, replay tooling, and runbook actions sit behind IAP with Google login. No VPN needed.

infra/components/armor.tsPulumi
new gcp.compute.SecurityPolicy("gw-armor", {
  rules: [
    {
      action: "deny(403)",
      priority: 1000,
      match: {
        expr: {
          expression:
            "evaluatePreconfiguredExpr('sqli-stable')"
        }
      }
    },
    {
    {
      action: "rate_based_ban",
      priority: 2000,
      rateLimitOptions: {
        rateLimitThreshold: { count: 100, intervalSec: 60 },
        conformAction: "allow",
        exceedAction: "deny(429)",
        banDurationSec: 300, // required when action is rate_based_ban
        enforceOnKey: "HTTP-HEADER",
        enforceOnKeyName: "x-tenant-id"
      },
      },
      match: {
        versionedExpr: "SRC_IPS_V1",
        config: { srcIpRanges: ["*"] }
      }
    },
    {
      action: "allow",
      priority: 2147483647,
      match: {
        versionedExpr: "SRC_IPS_V1",
        config: { srcIpRanges: ["*"] }
      }
    }
  ]
})

22

Cost attribution: tokens to dollars to tenants

At 5–10M users you must answer “which tenant spent how much, on which model, doing what.” Two streams feed it: GCP Billing export and your own token accounting.

Labels

Stamp every resource

Pulumi applies tenant, env, app, component labels to GKE node pools, Cloud SQL, Cloud Run, and Cloud Storage. Cloud Billing rolls these up.

Token usage

Effect AI exposes counts

response.usage from LanguageModel.generateText gives input/output tokens. Persist them keyed by tenant, workflow, model.

Reports

BigQuery sink

Export Cloud Billing to BigQuery. Join with your token usage table. Looker Studio for the per-tenant dashboard.

workflow/usage.tsactivity
const response = yield* LanguageModel.generateText({
  prompt: payload.message
})

// response.usage is { inputTokens, outputTokens, totalTokens,
// cachedInputTokens }. The model name is not on usage — pass it
// through from the workflow payload or your model config so the
// number you store matches the call you actually made.
yield* Activity.make({
  name: "RecordUsage",
  execute: TokenUsage.append({
    tenantId: payload.tenantId,
    executionId,
    model: payload.model,
    inputTokens: response.usage.inputTokens,
    outputTokens: response.usage.outputTokens,
    cachedInputTokens: response.usage.cachedInputTokens ?? 0
  })
})
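Turning those stored counts into dollars is a join against a price table. The per-million-token prices below are made-up placeholders, not real provider rates:

```typescript
// Placeholder price table: dollars per million tokens, by model.
// Pull real rates from your provider's price sheet.
const pricePerMTok: Record<string, { input: number; output: number }> = {
  "small-model": { input: 0.15, output: 0.6 },
  "large-model": { input: 3, output: 15 }
}

const costUsd = (row: {
  model: string
  inputTokens: number
  outputTokens: number
}): number => {
  const p = pricePerMTok[row.model]
  if (!p) return 0 // unknown model: surface in a data-quality alert
  return (
    (row.inputTokens / 1e6) * p.input +
    (row.outputTokens / 1e6) * p.output
  )
}
```

The BigQuery join does the same multiplication at reporting time; keeping one price table that both the dashboard and any hard-cap logic read avoids two sources of truth for spend.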

Capstone

The finished repo, fixing it, teardown

23

Capstone: the production repo

The capstone is a reusable repo: app code, infrastructure code, local development, and production deployment.

repo shapetarget
apps/
  cloudrun-main/          # existing app or adapter
  agent-runtime/          # Effect cluster + workflow runtime
  agent-gateway/          # internal HTTP + WebSocket API in GKE
packages/
  domain/                 # schemas, errors, tools, prompts
  effect-agents/          # Entity + Workflow modules
  event-store/            # SQL-backed durable events
infra/
  Pulumi.yaml
  index.ts                # GCP + Kubernetes resources
  components/
    network.ts
    artifact-registry.ts
    cloud-sql.ts
    gke-autopilot.ts
    agent-runtime.ts
    gateway.ts
docker-compose.yml        # local Postgres + runtime
Makefile                  # build, push, preview, deploy

Graduation criteria

You are done when these pass.

  • Local Docker Compose runs one agent workflow.
  • Pulumi creates GCP services, SQL, GKE, and K8s objects.
  • One GKE runner starts and writes cluster tables.
  • Three runners route the same session id consistently.
  • Cloud Run starts an agent run through the gateway.
  • WebSocket reconnect resumes from lastEventId.
  • Killing a pod does not lose durable workflow state.
  • Secrets rotate without code changes.
  • Logs show session, run, workflow, activity, and model spans.
  • Load test reaches controlled rejection, not collapse.

24

Troubleshooting: what actually goes wrong

Every entry below is a real failure mode from running this stack. Symptoms in the words you'll see in a log or alert, then the fix.

Cluster tables never created; the runner logs say "no shards assigned".
You left storage: "local" from local development. NodeClusterSocket.layer only persists shard ownership and runner registry when storage: "sql" is set and the SQL client layer is provided. Set both, redeploy with one replica, then scale out.
Runners can't reach each other; entity calls time out across pods.
runnerAddress and runnerListenAddress are not the same thing. runnerAddress is the address peers use to call this runner — it must be the pod's StatefulSet DNS (pod-0.svc.cluster.local). runnerListenAddress is what the local server binds to — that one is 0.0.0.0. Using 0.0.0.0 for both is the most common bug; the cluster registers an unreachable address.
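One way to avoid the bug is to never hand-write the peer address: derive it from the pod's identity. A sketch, assuming the usual StatefulSet headless-service DNS convention and a POD_NAME downward-API env var (names are illustrative):

```typescript
// Peer-visible runner address, built from the pod's own identity.
// podName typically arrives via the POD_NAME downward-API env var.
const runnerAddressFor = (
  podName: string,
  headlessService: string,
  namespace: string
) => `${podName}.${headlessService}.${namespace}.svc.cluster.local`

// The listen address is only the local bind target and stays wildcard.
const listenAddress = "0.0.0.0"
```

Deriving the address means a copy-pasted manifest cannot silently register `0.0.0.0` as the peer address.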
A workflow keeps firing duplicates every time the client retries.
Your idempotencyKey is not deterministic. Anything that includes Date.now(), crypto.randomUUID(), or the current pod hostname produces a new execution id every retry. The key must be a stable function of the request — usually a client-provided runId:
good vs bad idempotencyKey
// bad — new execution every call
idempotencyKey: () => crypto.randomUUID()

// good — client owns the run id, retries dedupe
idempotencyKey: ({ runId }) => runId
An Activity retries forever and never surfaces the error.
Activity.make on its own retries until the workflow's suspendedRetrySchedule gives up — effectively forever in dev. Wrap with Activity.retry({ times: N }) for a bounded budget, and consider .withCompensation so a final failure cleans up.
WebSockets disconnect every hour on Cloud Run.
Cloud Run caps every request — WebSockets included — at 60 minutes. There's no setting to extend it. Move the WebSocket gateway to GKE (module 12) and keep Cloud Run for short HTTP calls only.
Cloud SQL: FATAL: sorry, too many clients already.
db-custom-2-7680 caps at ~200 connections. With PgClient.layer's default pool of 10, you'll hit this around 20 pods. Either lower the per-pod pool (driver options on the layer), use the Cloud SQL Auth Proxy with its built-in pooling, or upsize the instance. Adding more pods to "fix" latency makes this worse, not better.
Cloud Run → GKE gateway returns 403 unauthorized.
Three things to check, in order. (1) The Cloud Run service identity has roles/run.invoker on the target — or the gateway is verifying the JWT itself and the audience doesn't match. (2) The ID token's aud claim equals the gateway's URL exactly, scheme and trailing slash included. (3) The token is in Authorization: Bearer, not in a custom header that the load balancer strips.
Pods OOMKilled on GKE Autopilot under load.
Autopilot enforces requests strictly — there's no burst headroom from co-tenants on the node. If your container peaks at 800Mi but requests 512Mi, it gets killed at 512Mi. Set resources.requests.memory to your real p95 plus a margin (e.g. 1Gi for the agent runtime), and set limits.memory equal to requests so Node sees the right heap target.
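A resources stanza matching that advice might look like this; the 1Gi figure is an example to replace with your measured p95, not a recommendation:

```yaml
# Autopilot bills and enforces requests; setting limits equal to
# requests keeps the Node heap target honest.
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    memory: 1Gi
```

Pair it with `NODE_OPTIONS=--max-old-space-size` somewhat below the limit so V8 starts collecting before the kernel kills the pod.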
Workload Identity 401: pod can't reach Cloud SQL or Secret Manager.
The Kubernetes service account is missing the annotation that binds it to the GCP service account. The pod is running under the default KSA with no IAM. Check the spec:
k8s/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: agent-runtime
  namespace: agents
  annotations:
    iam.gke.io/gcp-service-account: agent-runtime@PROJECT.iam.gserviceaccount.com
And the GSA must have roles/iam.workloadIdentityUser granted to PROJECT.svc.id.goog[agents/agent-runtime]. Module 07 has the Pulumi for both halves.
A pod dies, but its shards never move; entity calls hang.
RunnerHealth.layerK8s is what tells the cluster a pod is gone. If you're using runnerHealth: "k8s", the K8s API client needs read access to pods in the namespace — that's the Role + RoleBinding from module 07. If you've left RunnerHealth.layerNoop in by accident (very common when copy-pasting test setup), dead pods stay "alive" forever and their shards are orphaned.
Old workflow executions vanish after a deploy.
You renamed a workflow. Workflow names are the lookup key — Workflow.make({ name: "AgentWorkflow" }) binds executions to that exact string. Renaming, even cosmetically, abandons every in-flight execution. Keep old names. If you must evolve, add a new workflow alongside and migrate explicitly.
pulumi up sits on Cloud SQL for ten minutes then errors.
Private services access peering wasn't ready when the instance tried to create. Pulumi needs gcp.servicenetworking.Connection to exist and propagate before gcp.sql.DatabaseInstance with privateNetwork set. Add an explicit dependsOn: [privateVpcConnection] on the DB resource — Pulumi will not infer this dependency because the values aren't interpolated.

When this list isn't enough

Two diagnostics are worth running before opening an issue. First, tail the runner logs with kubectl -n agents logs -l app=effect-agent-runtime --tail=200 --prefix and filter for severity >= WARNING — the cluster prints fairly readable warnings before things fail. Second, query the cluster runner registry directly: SELECT * FROM cluster_runners; SELECT * FROM cluster_shards; together they tell you exactly who owns what.

25

Teardown: leave nothing behind

Pulumi destroys what it created. Before destroy, take a final backup and confirm no data you need is left in Cloud SQL or Cloud Storage. Resources outside Pulumi (manual buckets, ad-hoc IAM bindings) must be cleaned up by hand.

Step 1

Final backup

Force an on-demand Cloud SQL backup and export to Cloud Storage. Export workflow tables if you need them long-term.

Step 2

Drain runners

Scale the StatefulSet to zero first so workflows release shard locks and write final state.

Step 3

Destroy stack

Pulumi destroy removes every resource it owns. Dependency order is automatic.

teardown.shsafe order
# 1. final SQL backup + export
gcloud sql backups create \
  --instance="$(pulumi -C infra stack output cloudSqlInstance)" \
  --description="pre-destroy snapshot"

gcloud sql export sql \
  "$(pulumi -C infra stack output cloudSqlInstance)" \
  "gs://$PROJECT_ID-teardown/dump-$(date +%Y%m%d).sql.gz" \
  --database=effect_cluster

# 2. drain runners gracefully
kubectl -n agents scale statefulset effect-agent-runtime --replicas=0
kubectl -n agents rollout status statefulset/effect-agent-runtime

# 3. destroy infrastructure
cd infra
pulumi destroy --yes
pulumi stack rm dev --yes

# 4. delete the GCP project (irreversible — only if isolated)
gcloud projects delete "$PROJECT_ID"

Last guard

Never run gcloud projects delete on a shared or production project. Keep teardown stacks in their own GCP project so destruction has no blast radius.