Production agents on GCP

Ship durable Effect agents on GCP.

A long-form course on running stateful Effect Cluster runtimes behind a Cloud Run gateway, with Pulumi owning every resource.

Before you start

Three things to know up front.

Who this is for

Backend engineers who already write some TypeScript and want a serious-scale Effect deployment. Not an Effect introduction.

What you will know after

How to run Cluster entities and durable Workflow executions on GKE Autopilot, fronted by a stateless Cloud Run gateway, with Pulumi as the single source of truth.

Versions assumed

Node 22+, pnpm 9+, Pulumi 3.130+, Effect 3.x, and the latest @effect/cluster and @effect/workflow. Node no longer ships corepack as a stable feature, so the Quickstart installs pnpm directly and pins exact versions.

Quickstart

From gcloud login to a running agent

00

Copy-paste the project

Every box below is a real shell command. Replace anything in <ANGLE_BRACKETS> once. The flow takes about 25 minutes the first time.

  1. A

    Install prerequisites

    Versions known to work: Node 22+, pnpm 9+, Pulumi 3.130+, Docker, gcloud. We install pnpm directly because Node no longer treats corepack as a supported way to ship a package manager.

    setup.sh (local machine)
    # macOS — node@22 keeps you on the current LTS line
    brew install node@22 docker
    brew install pulumi/tap/pulumi google-cloud-sdk
    npm install -g pnpm@9    # install pnpm directly, do not use corepack
    
    # Linux (Debian/Ubuntu)
    curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
    sudo apt install -y nodejs docker.io
    curl -fsSL https://get.pulumi.com | sh
    curl https://sdk.cloud.google.com | bash
    sudo npm install -g pnpm@9
    
    # Sanity check — every version should print
    node --version       # v22.x
    pnpm --version       # 9.x
    docker --version
    pulumi version
    gcloud --version
  2. B

    Create the GCP project

    Replace PROJECT_ID with something globally unique. Billing account id comes from gcloud billing accounts list.

    gcp-project.sh (one-time)
    export PROJECT_ID=<effect-agents-PROJECT>
    export REGION=us-central1
    export BILLING_ACCOUNT=<XXXXXX-XXXXXX-XXXXXX>
    
    gcloud auth login
    gcloud projects create "$PROJECT_ID" --name="Effect agents"
    gcloud config set project "$PROJECT_ID"
    gcloud billing projects link "$PROJECT_ID" \
      --billing-account "$BILLING_ACCOUNT"
    
    gcloud services enable \
      container.googleapis.com \
      artifactregistry.googleapis.com \
      sqladmin.googleapis.com \
      secretmanager.googleapis.com \
      vpcaccess.googleapis.com \
      servicenetworking.googleapis.com \
      iam.googleapis.com \
      monitoring.googleapis.com \
      logging.googleapis.com
    
    gcloud auth application-default login
  3. C

    Scaffold the repo

    The capstone (module 23) is the full shape. Start with just enough to deploy a single runner.

    scaffold.sh (repo bootstrap)
    mkdir effect-agents && cd effect-agents
    pnpm init
    pnpm add effect @effect/ai @effect/ai-openai \
      @effect/cluster @effect/workflow \
      @effect/platform @effect/platform-node \
      @effect/rpc @effect/sql @effect/sql-pg
    pnpm add -D typescript tsx vitest @effect/vitest
    
    mkdir -p apps/agent-runtime/src apps/agent-gateway/src \
      packages/domain/src packages/event-store/src \
      infra/components

    The smallest runtime that actually boots, joins the shard ring, and serves a health probe. Replace with the full agent runtime once it works.

    apps/agent-runtime/src/main.ts (runnable)
    import { Entity, RunnerAddress, Rpc } from "@effect/cluster"
    import { HttpRouter, HttpServer, HttpServerResponse } from "@effect/platform"
    import {
      NodeClusterSocket,
      NodeHttpServer,
      NodeRuntime
    } from "@effect/platform-node"
    import { PgClient } from "@effect/sql-pg"
    import { Config, Effect, Layer, Logger, Option, Schema } from "effect"
    import { createServer } from "node:http"
    
    // A tiny entity that proves the cluster is wired correctly. Every
    // call to Echo(text) for a given entityId routes to the same runner.
    const Echo = Rpc.make("Echo", {
      payload: { text: Schema.String },
      success: Schema.String
    })
    
    const EchoEntity = Entity.make("Echo", [Echo])
    
    const EchoEntityLive = EchoEntity.toLayer(
      Effect.gen(function* () {
        const address = yield* Entity.CurrentAddress
        return EchoEntity.of({
          Echo: ({ payload }) =>
            Effect.succeed(`[${address.entityId}] ${payload.text}`)
        })
      })
    )
    
    // Probes for the StatefulSet from module 09.
    const HealthLive = HttpRouter.empty.pipe(
      HttpRouter.get("/healthz", HttpServerResponse.text("ok")),
      HttpRouter.get("/readyz", HttpServerResponse.text("ready")),
      HttpServer.serve()
    )
    
    const podHost = Config.string("POD_NAME").pipe(
      Config.map((name) =>
        `${name}.effect-agent-runners.agents.svc.cluster.local`
      ),
      Config.withDefault("localhost")
    )
    
    const ClusterLive = Layer.unwrapEffect(
      Effect.gen(function* () {
        const host = yield* podHost
        return NodeClusterSocket.layer({
          storage: "sql",
          runnerHealth: "k8s",
          runnerHealthK8s: {
            namespace: "agents",
            labelSelector: "app=effect-agent-runtime"
          },
          shardingConfig: {
            runnerAddress: Option.some(RunnerAddress.make(host, 50000)),
            runnerListenAddress: Option.some(
              RunnerAddress.make("0.0.0.0", 50000)
            )
          }
        })
      })
    )
    
    const SqlLive = PgClient.layerConfig({
      url: Config.redacted("DATABASE_URL")
    })
    
    const HttpLive = NodeHttpServer.layer(createServer, { port: 8080 })
    
    const MainLive = Layer.mergeAll(EchoEntityLive, HealthLive).pipe(
      Layer.provide(HttpLive),
      Layer.provideMerge(ClusterLive),
      Layer.provideMerge(SqlLive),
      Layer.provide(Logger.json)
    )
    
    Layer.launch(MainLive).pipe(NodeRuntime.runMain)
    apps/agent-runtime/Dockerfile (node:22-slim, no corepack)
    # syntax=docker/dockerfile:1.7
    
    # Node 22+ no longer guarantees a working corepack, so we install
    # pnpm directly with npm and pin the version. Multi-stage keeps the
    # production image small and free of build tooling.
    
    FROM node:22-slim AS base
    ENV PNPM_HOME=/pnpm PATH=/pnpm:$PATH
    RUN npm install -g pnpm@9.15.0
    WORKDIR /app
    
    # --- deps: lockfile + manifests only, maximises layer cache hits
    FROM base AS deps
    COPY package.json pnpm-lock.yaml pnpm-workspace.yaml ./
    COPY apps/agent-runtime/package.json apps/agent-runtime/
    COPY packages packages
    RUN --mount=type=cache,id=pnpm,target=/pnpm/store \
        pnpm install --frozen-lockfile
    
    # --- build: compile TypeScript
    FROM deps AS build
    COPY tsconfig.json ./
    COPY apps/agent-runtime apps/agent-runtime
    RUN pnpm --filter agent-runtime build
    
    # --- runtime: only what node needs to run
    FROM node:22-slim AS runtime
    ENV NODE_ENV=production NODE_OPTIONS=--enable-source-maps
    RUN npm install -g pnpm@9.15.0
    WORKDIR /app
    COPY --from=build /app/node_modules ./node_modules
    COPY --from=build /app/packages ./packages
    COPY --from=build /app/apps/agent-runtime/dist apps/agent-runtime/dist
    COPY --from=build /app/apps/agent-runtime/package.json apps/agent-runtime/
    USER node
    EXPOSE 8080 50000
    CMD ["node", "apps/agent-runtime/dist/main.js"]
  4. D

    Pulumi project and stack

    Run inside infra/. Pick TypeScript. Stack name is the environment; start with dev.

    pulumi-init.sh (infra/)
    cd infra
    pulumi new gcp-typescript --force \
      --name effect-agents-infra \
      --description "Effect agents on GCP"
    
    pulumi stack init dev
    pulumi config set gcp:project "$PROJECT_ID"
    pulumi config set gcp:region "$REGION"
    pulumi config set --secret postgresPassword \
      "$(openssl rand -hex 24)"
    pulumi config set --secret openaiApiKey "<sk-...>"
  5. E

    Build and push the runtime image

    Tag with the git sha — never latest. Artifact Registry repo name matches what Pulumi creates next.

    build.sh (repo root)
    export SHA=$(git rev-parse --short HEAD 2>/dev/null || echo dev)
    export IMAGE="$REGION-docker.pkg.dev/$PROJECT_ID/agents/agent-runtime:$SHA"
    
    gcloud auth configure-docker "$REGION-docker.pkg.dev" --quiet
    
    docker build -t "$IMAGE" -f apps/agent-runtime/Dockerfile .
    docker push "$IMAGE"
    
    cd infra
    pulumi config set agentRuntimeImage "$IMAGE"
  6. F

    Pulumi up

    First apply creates VPC, Cloud SQL HA, GKE Autopilot, Artifact Registry, Secret Manager, namespace, RBAC, StatefulSet, gateway Service. Expect 10–15 minutes.

    deploy.sh (infra/)
    pulumi preview        # review the plan
    pulumi up --yes       # apply
    
    pulumi stack output gatewayUrl
    pulumi stack output cloudSqlConnection
    pulumi stack output kubeconfig --show-secrets > ~/.kube/agents.yaml
    export KUBECONFIG=~/.kube/agents.yaml
  7. G

    Smoke test

    Start a run, poll the workflow, open a WebSocket and replay events.

    smoke.sh (any shell)
    export GW=$(cd infra && pulumi stack output gatewayUrl)
    
    RUN=$(curl -sX POST "$GW/agents/session-1/runs" \
      -H 'content-type: application/json' \
      -d '{"runId":"run-1","message":"hello"}')
    echo "$RUN"
    
    EXEC=$(echo "$RUN" | jq -r .executionId)
    curl -s "$GW/runs/$EXEC" | jq .
    
    npx -y wscat -c "${GW/https/wss}/ws/session-1"

Cost guard, honestly

This stack idles around $300–500/month in us-central1. Cloud SQL HA on db-custom-2-7680 is the bulk of it (~$250–350); GKE Autopilot adds ~$70 in cluster fee plus small per-pod resource cost; Artifact Registry and Secret Manager are rounding errors. Don't leave this running over a weekend by accident — run pulumi destroy from module 25.

Foundation

Mental model and architecture

01

North star: what you are building

A production deployment is a contract between Cloud Run as the user edge, GKE as the durable runtime, and Cloud SQL as the brain. Every later module sharpens this triangle.

User-facing tier

Cloud Run app

Stateless HTTP and WebSocket entry. Auth, rate-limits, and tenant boundaries live here.

Runtime tier

GKE Autopilot

Long-lived Effect cluster runners and durable workflow execution per agent session.

Durability tier

Cloud SQL Postgres

Cluster runner registry, workflow state, durable events, and pgvector memory for retrieval.

Course bias

Keep Cloud Run for stateless ingress. Keep GKE Autopilot for durable runtime. Avoid hybrid systems where Cloud Run holds in-flight agent state.

02

Docker-to-GKE mental model

Learn only the Kubernetes objects a Docker operator needs: image, replicas, DNS, secrets, ports, and health. No Helm. No custom controllers.

Docker concept | GKE object | Why for Effect | Course rule
docker run | Pod | One Effect runtime process. | Never deploy bare pods.
docker compose --scale | StatefulSet | Stable names agent-0, agent-1. | Required for runner addresses.
Compose network alias | Headless Service DNS | Per-pod DNS for cluster. | Maps to runnerAddress.
Container port | Service port | Runners speak TCP. | Expose one internal port.
.env | Secret + env var | DB and provider credentials. | Pulumi creates and binds.
Restart policy | Probe + rollout | Bad runners are replaced safely. | Boring startup is mandatory.

03

The Effect runtime you will ship

One Node container launches a single Layer graph. Cluster, workflow engine, entities, workflows, SQL, AI providers, and telemetry are services in the same graph.

src/runtime.ts (Layer graph)
const podDns = () =>
  `${process.env.POD_NAME}` +
  `.effect-agent-runners.agents.svc.cluster.local`

const ClusterLayer = NodeClusterSocket.layer({
  storage: "sql",
  runnerHealth: "k8s",
  runnerHealthK8s: {
    namespace: "agents",
    labelSelector: "app=effect-agent-runtime"
  },
  shardingConfig: {
    runnerAddress: Option.some(
      RunnerAddress.make(podDns(), 50000)
    ),
    runnerListenAddress: Option.some(
      RunnerAddress.make("0.0.0.0", 50000)
    )
  }
})

const RuntimeLayer = Layer.mergeAll(
  AgentEntityLive,
  AgentWorkflowLive,
  AgentGatewayLive
).pipe(
  Layer.provideMerge(ClusterWorkflowEngine.layer),
  Layer.provideMerge(ClusterLayer),
  Layer.provideMerge(SqlLayer),
  Layer.provideMerge(AiLayer)
)

Layer.launch(RuntimeLayer).pipe(NodeRuntime.runMain)

Effect type lens

Every agent action stays explicit

A typical business function reads Effect<Answer, AgentError, AgentMemory | LanguageModel | Tools>. Pulumi and GKE decide where requirements are provided. The signature decides correctness.

  • Success: answer, run id, stream event, workflow status.
  • Error: domain failure types like rate limit, tool failure, policy block.
  • Requirements: model, SQL, tools, config, telemetry.
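
The error channel above can be sketched in plain TypeScript as a closed union. Variant names here are illustrative, not the course's actual AgentError definitions:

```typescript
// A closed union of domain failures: the compiler forces every caller
// to handle each case, which is the point of a typed error channel.
type AgentError =
  | { readonly _tag: "RateLimited"; readonly retryAfterMs: number }
  | { readonly _tag: "ToolFailed"; readonly tool: string }
  | { readonly _tag: "PolicyBlocked"; readonly rule: string }

const describe = (e: AgentError): string => {
  switch (e._tag) {
    case "RateLimited":
      return `retry in ${e.retryAfterMs}ms`
    case "ToolFailed":
      return `tool ${e.tool} failed`
    case "PolicyBlocked":
      return `blocked by ${e.rule}`
  }
}
```

Because the union is closed, adding a fourth failure mode makes the switch stop compiling until the new case is handled.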

Platform

Pulumi stack, build path, and infra surface

04

Pulumi stack: one program owns infra and Kubernetes

Pulumi TypeScript first creates the GCP resources, then uses the generated kubeconfig to create namespace, RBAC, services, StatefulSet, secrets, and gateway objects.

GCP

Project services

Enable GKE, Artifact Registry, Cloud SQL, Secret Manager, IAM, VPC Access, Cloud Logging.

Network

Private path

VPC, subnet, private service access for Cloud SQL, optional Cloud Run VPC connector.

Data

Cloud SQL Postgres

Cluster runner registry, shard locks, persisted messages, workflow state, agent app tables.

Runtime

GKE Autopilot

Managed cluster runs the Docker image as a long-lived StatefulSet.

Kubernetes

Stateful objects

Namespace, ServiceAccount, RBAC, headless Service, StatefulSet, gateway Service.

Delivery

Artifact Registry

Images built in CI, pushed, and referenced by Pulumi config.

Where state lives

The Quickstart leaves Pulumi state in Pulumi Cloud, which is fine while you're learning. For real environments most teams self-host on GCS: pulumi login gs://<PROJECT_ID>-pulumi-state before stack init. You get versioning, IAM, and no third-party access to your infra graph. The bucket should be in the same project, have versioning on, and have uniform_bucket_level_access enabled.

infra/index.ts (Pulumi skeleton)
import * as gcp from "@pulumi/gcp"
import * as k8s from "@pulumi/kubernetes"
import * as pulumi from "@pulumi/pulumi"

const cfg = new pulumi.Config()
const region = cfg.require("region")

const network = new gcp.compute.Network("agents-vpc", {
  autoCreateSubnetworks: false
})
const subnet = new gcp.compute.Subnetwork("agents-subnet", {
  region,
  network: network.id,
  ipCidrRange: "10.42.0.0/20"
})
const repo = new gcp.artifactregistry.Repository("agents", {
  location: region,
  repositoryId: "agents",
  format: "DOCKER"
})
const db = new gcp.sql.DatabaseInstance("agents-pg", {
  region,
  databaseVersion: "POSTGRES_16",
  settings: {
    tier: "db-custom-2-7680",
    ipConfiguration: { ipv4Enabled: false, privateNetwork: network.id },
    backupConfiguration: { enabled: true, pointInTimeRecoveryEnabled: true },
    availabilityType: "REGIONAL"
  }
})
const cluster = new gcp.container.Cluster("agents-autopilot", {
  location: region,
  enableAutopilot: true,
  network: network.id,
  subnetwork: subnet.id,
  privateClusterConfig: { enablePrivateNodes: true }
})

const k8sProvider = new k8s.Provider("gke", {
  // makeKubeconfig (not shown) assembles a kubeconfig string from the
  // cluster's endpoint and CA certificate outputs.
  kubeconfig: makeKubeconfig(cluster)
})

// Namespace, ServiceAccount, RBAC, headless Service, StatefulSet,
// Secret, gateway Service — all created with k8sProvider.

05

Build path: local to production

Every module ships a lab outcome. You do not jump to multi-replica GKE on day one.

  1. 01

    Local Docker Compose

    One Node container plus Postgres. Use storage: "local" first, then switch to storage: "sql".

    docker-compose.yml (local dev)
    services:
      postgres:
        image: postgres:16
        environment:
          POSTGRES_USER: agents
          POSTGRES_PASSWORD: agents
          POSTGRES_DB: agents
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U agents"]
          interval: 2s
          timeout: 5s
          retries: 10
        ports: ["5432:5432"]
        volumes: [pgdata:/var/lib/postgresql/data]
    
      runtime:
        build:
          context: .
          dockerfile: apps/agent-runtime/Dockerfile
        depends_on:
          postgres: { condition: service_healthy }
        environment:
          DATABASE_URL: postgres://agents:agents@postgres:5432/agents
          POD_NAME: runtime-local
        ports:
          - "8080:8080"     # health + http gateway
          - "50000:50000"   # cluster socket (only for multi-node compose)
    
    volumes:
      pgdata:

    Run with docker compose up --build. First boot creates the database; subsequent boots reuse it. To simulate two runners, copy runtime as runtime-2 with POD_NAME: runtime-local-2 and a different host port.

  2. 02

    Pulumi bootstrap

    Create Artifact Registry, Cloud SQL HA, GKE Autopilot, Secret Manager, IAM, VPC.

  3. 03

    Single runner on GKE

    Deploy with replicas: 1. Confirm SQL tables, logs, and one workflow execution.

  4. 04

    Three runners

    Scale out. Confirm shard ownership moves and same session id routes consistently.

  5. 05

    Cloud Run gateway

    Cloud Run calls GKE gateway over private networking. Start runs, poll workflows, stream events.

  6. 06

    Production drills

    Kill a pod, restart the runtime, fail a tool, rotate a secret, restore from backup.

test/agent-workflow.test.ts (@effect/vitest)
import { it } from "@effect/vitest"
import { Effect, Layer, TestClock } from "effect"
import { TestRunner } from "@effect/cluster"
import { AgentWorkflow, AgentWorkflowLive } from "../src/AgentWorkflow"

// TestRunner.layer runs the whole cluster in-memory: no SQL, no
// network, no real time. Perfect for entity + workflow unit tests.
const TestLayer = AgentWorkflowLive.pipe(
  Layer.provideMerge(TestRunner.layer)
)

it.effect("workflow completes for a simple ask", () =>
  Effect.gen(function*() {
    const executionId = yield* AgentWorkflow.execute(
      { runId: "r1", sessionId: "s1", message: "hi" },
      { discard: true }
    )
    // Advance time so any DurableClock.sleep resolves instantly
    yield* TestClock.adjust("1 hour")

    const result = yield* AgentWorkflow.poll(executionId)
    // assert result is Complete with the expected exit
  }).pipe(Effect.provide(TestLayer))
)

What to test where

Three rings, decreasing speed

  • Unit with TestRunner.layer: entity protocols, workflow happy path + each error type, schema decoders. Fast, no Docker.
  • Integration with docker compose up postgres: SQL storage round-trip, real idempotencyKey dedup, real shard math. Slow enough to run on PR but not in watch mode.
  • Smoke against a deployed staging stack: the smoke test from Quickstart step G runs as a script after every deploy. This is the only test that proves Cloud Run, GKE, and Cloud SQL are wired together correctly.

06

Networking without hand plumbing

Effect runners need stable addresses. The public app does not need to join the cluster. Cloud Run calls a gateway; the gateway calls typed entity clients.

Runner-to-runner

Headless Service DNS

Use StatefulSet pod DNS as the runnerAddress. Use 0.0.0.0 only as listen address.

  • effect-agent-runtime-0.effect-agent-runners.agents.svc.cluster.local
  • Port 50000 for cluster socket traffic.
  • Pod labels drive Kubernetes health checks.
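
A minimal sketch of how the advertised address is derived, assuming the Quickstart's names (headless Service effect-agent-runners, namespace agents, cluster port 50000) — adjust to your manifests:

```typescript
interface Address {
  readonly host: string
  readonly port: number
}

// StatefulSet pods get per-pod DNS under the headless Service:
// <pod>.<service>.<namespace>.svc.cluster.local
const runnerAddressFor = (
  podName: string,
  service = "effect-agent-runners",
  namespace = "agents",
  port = 50000
): Address => ({
  host: `${podName}.${service}.${namespace}.svc.cluster.local`,
  port
})

// The listen address stays 0.0.0.0 — only the advertised
// runnerAddress uses the stable pod DNS name.
const listenAddress: Address = { host: "0.0.0.0", port: 50000 }
```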

Cloud Run-to-GKE

Gateway, not cluster client

Cloud Run hits a small HTTP gateway inside GKE. The gateway uses typed Effect entity clients.

  • POST /agents/:sessionId/runs — start workflow.
  • GET /runs/:executionId — poll workflow.
  • GET /runs/:executionId/events — stream events.

07

Security defaults you keep from day one

Least-privilege service accounts, secrets out of source, private DB. Pulumi owns both GCP IAM and Kubernetes RBAC.

Surface | Default | Why | Pulumi creates
Runtime identity | Workload identity | Pods use a GCP service account without static keys. | GSA, KSA, IAM binding.
Secrets | Secret Manager | No plaintext config in Git. | Secret versions, K8s Secret projection.
Cloud SQL | Private IP | No public DB endpoint. | Private service access and HA instance.
K8s health | Pod read RBAC | Effect runnerHealth: "k8s" needs pod visibility. | Role and RoleBinding.
Network | Internal first | Cloud Run reaches gateway privately where possible. | VPC connector or internal ingress.
infra/components/workload-identity.ts (Pulumi)
// 1. GCP service account the pod will act as
const runtimeGsa = new gcp.serviceaccount.Account("agent-runtime", {
  accountId: "agent-runtime",
  displayName: "Effect agent runtime pods"
})

// 2. Let the Kubernetes service account impersonate the GSA
new gcp.serviceaccount.IAMBinding("agent-runtime-wi", {
  serviceAccountId: runtimeGsa.name,
  role: "roles/iam.workloadIdentityUser",
  members: [
    // One interpolate for the whole member string — concatenating a
    // pulumi Output with + would stringify it as "[object Object]".
    pulumi.interpolate`serviceAccount:${project}.svc.id.goog[agents/agent-runtime]`
  ]
})

// 3. Give the GSA only what the pod actually needs
new gcp.projects.IAMMember("agent-runtime-sql", {
  project, role: "roles/cloudsql.client",
  member: pulumi.interpolate`serviceAccount:${runtimeGsa.email}`
})
new gcp.projects.IAMMember("agent-runtime-secrets", {
  project, role: "roles/secretmanager.secretAccessor",
  member: pulumi.interpolate`serviceAccount:${runtimeGsa.email}`
})

// 4. Annotate the K8s service account
new k8s.core.v1.ServiceAccount("agent-runtime", {
  metadata: {
    name: "agent-runtime", namespace: "agents",
    annotations: {
      "iam.gke.io/gcp-service-account": runtimeGsa.email
    }
  }
}, { provider: k8sProvider })

Cloud Run → GKE gateway

Authenticate with an OIDC ID token

The gateway sits inside the VPC behind an internal load balancer. Cloud Run reaches it through a VPC connector. To prove identity, Cloud Run grabs a Google-signed ID token from the metadata server and sends it as Authorization: Bearer <token>. The gateway verifies the token's aud claim against its custom audience.

  • Set audience = the gateway's internal URL (or a custom audience string).
  • Grant the Cloud Run service account roles/run.invoker only if you're invoking another Cloud Run; for a self-verifying gateway the gateway middleware checks the JWT directly using Google's JWKS.
  • Never trust x-tenant-id from Cloud Run. Re-extract tenant from a verified claim or your auth header inside the gateway.
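
As a sketch of the audience check alone (signature verification against Google's JWKS is still mandatory and usually delegated to a library such as google-auth-library), the gateway can decode the token payload and compare aud:

```typescript
// SKETCH ONLY: checks the `aud` claim of an ID token. Production
// middleware must also verify the Google signature via JWKS before
// trusting any claim; this shows just the audience comparison.
const b64urlToJson = (part: string): Record<string, unknown> =>
  JSON.parse(Buffer.from(part, "base64url").toString("utf8"))

const audienceMatches = (idToken: string, expectedAud: string): boolean => {
  const parts = idToken.split(".")
  if (parts.length !== 3) return false // not a JWT shape
  try {
    const payload = b64urlToJson(parts[1])
    return payload.aud === expectedAud
  } catch {
    return false
  }
}
```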

08

Observability that sees fibers, workflows, and infra

Agents fail in unusual ways: rate limits, tool failures, partitions, workflow suspension, runner churn. Logs alone are not enough.

Logs

Structured Effect logs

Annotate with sessionId, runId, executionId, model, tool, attempt.

Traces

OpenTelemetry

Export spans to Cloud Trace. Wrap agent turns, tool calls, model calls, and workflow activities.

Metrics

Runtime health

Active runners, shard ownership, workflow states, tool latency, tokens, retry counts.

Minimum dashboard checklist
  • Runner pods and Kubernetes readiness.
  • Cloud SQL CPU, connections, storage, lock waits.
  • Workflow completions, failures, suspensions, retries.
  • LLM latency, rate-limit count, fallback count, tool error rate.
  • Cloud Run gateway latency and error rate.

09

Operations and failure nuances

The course treats these as required labs, not optional notes.

DB sessions

Advisory lock nuance

Cluster SQL storage uses advisory locks. Avoid poolers that rotate sessions underneath ownership.

Rollouts

One change at a time

Deploy with replicas: 1 after schema or runtime changes, then scale out.

Passivation

Entity memory is cache

Active memory disappears on idle or restart. Durable facts live in SQL or workflow state.

Idempotency

Run IDs matter

Workflow idempotency keys are deterministic. Client retries do not double-trigger runs.

Scaling

Shards before replicas

Set shard count up front. Changing shards later is a migration decision.
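
The "shards before replicas" rule falls out of the routing math: entity ids map to shards by a stable hash modulo the shard count, so changing the count remaps most entities. A sketch with an illustrative FNV-1a hash (not the hash @effect/cluster uses internally):

```typescript
// Illustrative 32-bit FNV-1a hash — deterministic across processes,
// which is what shard routing needs.
const fnv1a = (s: string): number => {
  let h = 0x811c9dc5
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h >>> 0
}

// Same entity id always lands on the same shard — until shardCount
// changes, at which point most ids move. That's why shard count is a
// migration decision, not a config tweak.
const shardFor = (entityId: string, shardCount: number): number =>
  fnv1a(entityId) % shardCount
```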

Shutdown

Graceful drain

Termination grace, readiness, and logs confirm runners release work cleanly.

k8s/statefulset.yaml (excerpt, probes + drain)
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # give cluster time to shed shards
      containers:
        - name: agent-runtime
          # Startup probe lets the Layer graph come up before liveness fires.
          # First boot includes opening pg pool, joining the shard ring,
          # and running pending migrations — easy 20s.
          startupProbe:
            httpGet: { path: /healthz, port: 8080 }
            failureThreshold: 30
            periodSeconds: 2
          readinessProbe:
            httpGet: { path: /readyz, port: 8080 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }
            periodSeconds: 10
          lifecycle:
            preStop:
              exec:
                # Tell the runtime to stop accepting new entity messages,
                # then let in-flight workflow activities finish. The Effect
                # runtime's own scope finalizers release shard locks on
                # SIGTERM; preStop just gives the load balancer time to
                # mark this pod NotReady first.
                command: ["/bin/sh", "-c", "sleep 10"]

What the probes actually answer

/healthz = "the process is up." /readyz = "this runner is in the shard ring AND can reach its database AND has finished startup migrations." If /readyz 200s before the shard ring is joined, you'll route traffic to a runner that doesn't own any entities yet. Don't conflate them.
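
A sketch of that /readyz contract, with illustrative stand-ins for the real checks (shard ring joined, database reachable, migrations applied):

```typescript
// Ready means EVERY dependency check passes; liveness stays a constant
// 200 while the process is up. Check names are illustrative.
type Check = () => Promise<boolean>

const readyz = async (
  checks: Record<string, Check>
): Promise<{ status: number; failed: string[] }> => {
  const failed: string[] = []
  for (const [name, check] of Object.entries(checks)) {
    // A throwing check counts as not ready, never as a crash.
    const ok = await check().catch(() => false)
    if (!ok) failed.push(name)
  }
  return { status: failed.length === 0 ? 200 : 503, failed }
}
```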

Effect runtime

Cluster, workflow, and realtime

10

Effect Cluster: entities, runners, shards, clients

Cluster is typed distributed actors. An Entity is the protocol. An entity layer is the server. Entity.client is the typed remote handle.

Core idea

One entity id owns one serial lane

For agents the entity id is usually sessionId, agentId, or tenant:user. Calls to the same id route to the same active runner while alive.

  • Runner: a GKE pod with NodeClusterSocket.layer.
  • Shard: bucket assigning entity ids to runners.
  • Entity: typed RPC plus stateful handler.
  • Client: location-transparent typed caller.
cluster/AgentEntity.ts (protocol)
const Ask = Rpc.make("Ask", {
  payload: { runId: Schema.String, message: Schema.String },
  success: Schema.String
})
const Cancel = Rpc.make("Cancel", {
  payload: { executionId: Schema.String },
  success: Schema.Void
})

export const AgentEntity = Entity.make("Agent", [Ask, Cancel])

export const AgentEntityLive = AgentEntity.toLayer(
  Effect.gen(function* () {
    const address = yield* Entity.CurrentAddress
    const sessionId = address.entityId

    return AgentEntity.of({
      Ask: ({ payload }) => AgentWorkflow.execute({
        runId: payload.runId,
        sessionId,
        message: payload.message
      }, { discard: true }),

      Cancel: ({ payload }) =>
        AgentWorkflow.interrupt(payload.executionId)
    })
  }),
  { maxIdleTime: "10 minutes" }
)
Concept | Effect API | Agent use | Nuance
Protocol | Rpc.make | Ask, cancel, snapshot, append event. | Small schema-defined messages.
Entity definition | Entity.make | Defines the actor type. | No implementation yet.
Entity behavior | Entity.toLayer | Handlers and in-memory state. | Memory is cache.
Remote call | AgentEntity.client | Gateway calls clientFor(sessionId).Ask. | Caller does not know the pod.
Persistence | ClusterSchema.Persisted | Persist important messages. | Skip noisy read calls.
Concurrency | Rpc.fork | Run safe readers concurrently. | Default serial protects state.

11

Effect Workflow: durable agent runs

Cluster answers where does this agent live. Workflow answers how do I run this long task durably, retry it, sleep, resume, and inspect status.

workflow/AgentWorkflow.ts (durable run)
class AgentError extends Schema.TaggedError<AgentError>("AgentError")(
  "AgentError",
  { message: Schema.String }
) {}

export const AgentWorkflow = Workflow.make({
  name: "AgentWorkflow",
  payload: {
    runId: Schema.String,
    sessionId: Schema.String,
    message: Schema.String
  },
  success: Schema.String,
  error: AgentError,
  idempotencyKey: ({ runId }) => runId
})

export const AgentWorkflowLive = AgentWorkflow.toLayer(
  Effect.fn(function* (payload, executionId) {
    yield* EventStore.append({
      executionId, type: "started", message: payload.message
    })

    const answer = yield* Activity.make({
      name: "RunAgentTurn",
      success: Schema.String,
      error: AgentError,
      execute: Effect.gen(function* () {
        const assistant = yield* AiAssistant
        return yield* assistant.agent(payload.message)
      })
    }).pipe(Activity.retry({ times: 3 }))

    yield* EventStore.append({ executionId, type: "completed", answer })
    return answer
  })
)

Durable model

Workflow state survives runner death.

Use workflows for work that outlives one HTTP request: reasoning, tool chains, approvals, scheduled follow-ups, retries, and cancellation.

  • Workflow.make: name, schemas, idempotency key.
  • toLayer: durable implementation.
  • Activity.make: retryable external work.
  • DurableClock: sleep without holding compute.
  • DurableDeferred: approval or external signal.

Start

execute(payload, { discard: true })

Returns execution id quickly. Perfect for HTTP and WebSocket start endpoints.

Poll

poll(executionId)

Returns complete, suspended, or undefined. Status pages and retries use this.

Cancel

interrupt(executionId)

Map a user cancel action to workflow interruption.
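
The idempotency contract behind execute can be sketched with an in-memory map. The real guarantee lives in the workflow engine's SQL storage; this only demonstrates the semantics a retrying client relies on:

```typescript
// Starting the same runId twice returns the SAME executionId instead
// of launching a second run — that is what a deterministic
// idempotencyKey buys you under client retries.
const executions = new Map<string, string>()
let nextExecution = 0

const startRun = (runId: string): string => {
  const existing = executions.get(runId)
  if (existing !== undefined) return existing // retry: no new run
  const executionId = `exec-${nextExecution++}`
  executions.set(runId, executionId)
  return executionId
}
```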

12

WebSocket gateway: realtime over durable work

WebSockets belong at the gateway edge. They never own state. They start workflows, subscribe to persisted events, and reconnect by replaying from lastEventId.

Why GKE for sockets, not Cloud Run

Cloud Run treats WebSockets as long-lived HTTP requests and applies the 60-minute request timeout hard cap. Anything still connected at 60 minutes gets killed regardless of activity. That's fine for short chat sessions if your client reconnects cleanly, but for agent runs that observe a long workflow, GKE is the calmer home: idle timeouts are yours to choose, instance scaling is predictable, and you avoid the cold-start-during-reconnect dance.
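
If you do keep short sessions on Cloud Run, the client needs a reconnect policy for that cap. A sketch of capped exponential backoff; the numbers are illustrative, and real clients should add jitter:

```typescript
// Delay before reconnect attempt N: doubles each time, capped so a
// long outage doesn't push retries minutes apart.
const backoffMs = (attempt: number, baseMs = 500, capMs = 30_000): number =>
  Math.min(capMs, baseMs * 2 ** attempt)
```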

gateway/WebSocket.ts (@effect/platform)
const GatewayLive = HttpRouter.empty.pipe(
  HttpRouter.get(
    "/ws/:sessionId",
    Effect.gen(function* () {
      const req = yield* HttpServerRequest.HttpServerRequest
      const sessionId = req.pathParams.sessionId

      const outbound = EventStore.stream(sessionId).pipe(
        Stream.map((event) => JSON.stringify(event)),
        Stream.encodeText
      )

      const inbound = outbound.pipe(
        Stream.pipeThroughChannel(HttpServerRequest.upgradeChannel()),
        Stream.decodeText(),
        Stream.runForEach((raw) =>
          Effect.gen(function* () {
            const msg = yield* Schema.decodeUnknown(ClientMessage)(
              JSON.parse(raw)
            )
            const clientFor = yield* AgentEntity.client
            const agent = clientFor(sessionId)
            switch (msg._tag) {
              case "Ask":
                yield* agent.Ask({ runId: msg.runId, message: msg.text })
                return
              case "Cancel":
                yield* agent.Cancel({ executionId: msg.executionId })
                return
            }
          })
        )
      )

      yield* inbound
      return HttpServerResponse.empty()
    })
  ),
  HttpServer.serve()
)

Realtime contract

Sockets are a projection, not the source of truth.

The browser can disconnect anytime. Every meaningful event is persisted so a reconnect resumes by lastEventId.

  • Inbound: ask, cancel, approve, heartbeat.
  • Outbound: accepted, token, tool-call, tool-result, completed, failed.
  • Reconnect: client sends last seen event id.
  • Backpressure: bounded queues and heartbeat timeouts.
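
The replay contract can be sketched with a monotonically increasing event id. Shapes are illustrative; the real log lives in the SQL EventStore:

```typescript
// Events get increasing ids as they are appended; a reconnecting
// client sends the last id it saw and receives everything after it.
interface AgentEvent {
  readonly id: number
  readonly type: string
}

const log: AgentEvent[] = []

const append = (type: string): AgentEvent => {
  const event = { id: log.length + 1, type }
  log.push(event)
  return event
}

const replayFrom = (lastEventId: number): AgentEvent[] =>
  log.filter((e) => e.id > lastEventId)
```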
Problem | Pattern | Effect tool | GCP nuance
Client disconnect | Persist events; replay from offset. | Stream + SQL EventStore. | Do not rely on Cloud Run memory.
Long LLM response | Workflow emits events while running. | Activity + EventStore. | Gateway restarts are safe.
Cancel button | Cancel message interrupts workflow. | Workflow.interrupt | Return final cancelled event to all clients.
Multiple tabs | Same session id subscribes once. | AgentEntity.client | Entity serializes mutations.
Socket scaling | Gateway pods are stateless subscribers. | Stream + SQL/PubSub bridge. | Prefer GKE gateway over Cloud Run for sockets.

Scale

Capacity, data, delivery, reliability, governance

13

Capacity, SLOs, and scale model for 5–10M users

10 million users is not one number. Model MAU, peak concurrent users, concurrent WebSockets, active agent runs, tool calls per run, tokens per minute, and database writes per run.
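The worksheet can start as a handful of ratios. Every number below is a placeholder assumption to replace with your own measurements:

```typescript
// All ratios here are placeholder assumptions. Measure your own.
const model = {
  mau: 10_000_000,
  dauRatio: 0.2,             // DAU = 20% of MAU
  peakConcurrentRatio: 0.05, // 5% of DAU online at peak
  socketRatio: 0.6,          // of those, how many hold a WebSocket
  runsPerUserHour: 2,
  eventsPerRun: 40           // tokens + tool calls + lifecycle events
}

const dau = model.mau * model.dauRatio
const peakConcurrent = dau * model.peakConcurrentRatio
const concurrentSockets = peakConcurrent * model.socketRatio
const runsPerSecond = (peakConcurrent * model.runsPerUserHour) / 3600
const dbWritesPerSecond = runsPerSecond * model.eventsPerRun
```

With these particular guesses: 2M DAU, 100k peak concurrent, 60k sockets, and roughly 2.2k event-store writes per second, which is why the event schema and compaction plan matter before the gateway does.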

Metric | Why | Lever | Deliverable
MAU / DAU | Business scale, not raw load. | Plans, retention, tenant tiers. | Traffic worksheet per tier.
Peak concurrent users | Gateway CPU and request capacity. | Cloud Run autoscaling, GKE HPA. | Staged concurrency load test.
Concurrent WebSockets | Realtime connection pressure. | GKE gateway, heartbeat, idle timeout. | Soak test and reconnect test.
Active agent runs | Workflow and LLM pressure. | Quotas, rate limits, queues. | Admission policy.
DB writes per run | Cloud SQL bottleneck. | Event compaction, batching, retention. | Event schema plan.
Tokens per minute | Provider limits and cost. | Model router, fallback plan. | Per-tenant token dashboard.

SLO

Interactive path

p95 gateway response, first-token latency, WebSocket reconnect success, run-start acceptance.

SLO

Durable path

Workflow completion rate, retry exhaustion, suspended workflow age, event replay lag.

SLO

Data path

Cloud SQL utilization, p95 query latency, lock waits, backup success, restore drill age.

14

Production data architecture

At this scale the hard part is not starting a workflow. The hard part is deciding which state is hot, durable, searchable, replayable, compacted, and deletable.

State taxonomy

Four stores, four jobs.

  • Entity memory: hot coordination only.
  • Workflow state: durable execution and retries.
  • Event store: realtime replay and audit.
  • Memory store: semantic and user memory.
data/event-store.tsindustry pattern
type AgentEvent =
  | { _tag: "RunAccepted"; executionId: string; runId: string }
  | { _tag: "Token"; executionId: string; index: number; text: string }
  | { _tag: "ToolCall"; executionId: string; tool: string; callId: string }
  | { _tag: "ToolResult"; executionId: string; callId: string; ok: boolean }
  | { _tag: "Completed"; executionId: string; answerRef: string }
  | { _tag: "Failed"; executionId: string; reason: string }

// Persist every event with tenantId, sessionId, executionId,
// monotonic eventId, createdAt, and retention bucket.
Data | Primary GCP choice | Why | Nuance
Cluster + workflow | Cloud SQL Postgres | Official SQL-backed cluster storage. | Stable sessions; watch locks.
Event stream | Cloud SQL first, Pub/Sub later | SQL is simple and replayable. | Pub/Sub is not source of truth.
Semantic memory | AlloyDB pgvector or Vertex Vector Search | Search and embeddings at scale. | Clear PII and deletion path.
Artifacts and files | Cloud Storage | Cheap durable blobs. | Store refs in SQL, not blobs.
Analytics | BigQuery sink | Large reporting without OLTP. | Export redacted events only.
migrations/001_event_store.sqlstarting point
-- The event store is your code, not Effect's. This is one
-- reasonable shape; adapt freely. Cluster and workflow state are
-- created by @effect/cluster and @effect/workflow themselves — do
-- not touch those tables.

create table agent_events (
  tenant_id     text        not null,
  session_id    text        not null,
  execution_id  text        not null,
  event_id      bigserial   primary key,
  event_type    text        not null,
  payload       jsonb       not null,
  created_at    timestamptz not null default now()
);

create index agent_events_session_idx
  on agent_events (tenant_id, session_id, event_id);

create index agent_events_exec_idx
  on agent_events (execution_id, event_id);

-- Retention partitioning lives here too. The course leaves the
-- specifics to you because tenancy and compliance shape them.

Honest gap

You own the event store

Cluster runner registry and workflow execution tables are managed for you by @effect/cluster and @effect/workflow when you pick SQL storage. The event store above is application code — there is no built-in module to install. Treat the DDL above as a starting point, not a published library.

Two things to decide before you ship it:

  • Retention. Daily partition table or scheduled delete? Most teams partition by created_at::date and drop old partitions.
  • PII strategy. Either filter at append time or store raw and have a deletion workflow per tenant on request.

Connection pool reality

Cloud SQL on db-custom-2-7680 caps at roughly 200 connections. PgClient.layer defaults to a pool of 10 per process. With three runner pods + two gateway pods that's already 50 connections — fine. Add a third tier of pods (background jobs, a migrations runner) and you're at 80. Past about 120 total open connections, queue queries inside the pool instead of buying more headroom; the database is the bottleneck, not the pool. PgClient.layer accepts a maxConnections option (passed through to the underlying postgres driver) if you need to lower the per-pod cap.
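The budget above is just arithmetic, and it is worth keeping as code next to the Pulumi stack. A sketch with the same assumed pool sizes; the tier names and instance cap are placeholders:

```typescript
// Connection budget for a Cloud SQL instance. The cap and the tiers
// are assumptions to swap for your own deployment.
const instanceCap = 200 // db-custom-2-7680, roughly

const tiers = [
  { name: "runner", pods: 3, poolPerPod: 10 },
  { name: "gateway", pods: 2, poolPerPod: 10 },
  { name: "jobs", pods: 3, poolPerPod: 10 }
]

const totalConnections = tiers.reduce(
  (sum, t) => sum + t.pods * t.poolPerPod,
  0
)

// Leave headroom for superuser, replication, and migration sessions.
const headroom = instanceCap - totalConnections - 10
```

Wire a check like `headroom > 0` into CI so adding a pod tier that blows the budget fails the preview, not the rollout.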

15

Delivery: CI/CD, migrations, rollouts, rollback

Pulumi makes environments repeatable. Production also needs disciplined release mechanics.

  1. A

    PR checks

    Typecheck, tests, lint, Pulumi preview on every PR.

  2. B

    Image build

    Tagged by git SHA, signed, pushed to Artifact Registry.

  3. C

    Migration gate

    Apply backward-compatible DB changes before deploying the new image.

  4. D

    Progressive rollout

    Staging, one runner, then full replica count. Smoke tests after each step.

Pulumi

Stack separation

Distinct dev, staging, and prod stacks with protected resources.

Migrations

Expand, deploy, contract

Add columns and tables first. Deploy code. Backfill. Remove old fields later.

Rollback

Image is not enough

Release plans cover DB compatibility, workflow versions, event schema compatibility.

Workflow versioning

Never break old runs

Long-running workflows resume after deploys. Version names or preserve old handlers.

Policy

Preview before apply

CI posts Pulumi preview. Production apply requires approval and pinned providers.

Smoke

Real run after deploy

Start a workflow, stream events, kill a pod, verify completion, check logs.

Migration tool — pick one

What we use, why

For a course-shaped repo, Drizzle (drizzle-kit migrate) is the lightest option that produces plain SQL files you can read and audit. node-pg-migrate is also fine if you prefer hand-written up/down. The course's expand/deploy/contract rule applies to either:

  • Expand: additive only — new columns nullable, new tables, new indexes CONCURRENTLY.
  • Deploy: code reads/writes both old and new shapes.
  • Backfill: a workflow does this safely; never a long transaction.
  • Contract: drop the old fields one release later, never the same release.
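During the deploy step the same code has to accept rows written by either release. A minimal sketch, assuming a hypothetical rename of `reason` to `failure_reason`:

```typescript
// Hypothetical expand/contract example: `reason` is being renamed to
// `failure_reason`. During the deploy window rows may carry either.
interface OldRow { reason: string }
interface NewRow { failure_reason: string }
type EitherRow = Partial<OldRow & NewRow>

// Read both shapes; prefer the new column when present.
const readReason = (row: EitherRow): string =>
  row.failure_reason ?? row.reason ?? "unknown"

// Write both shapes until the contract release drops the old column.
const writeRow = (reason: string) => ({
  failure_reason: reason,
  reason
})
```

The contract release then deletes `reason` from both the table and `writeRow`, one release after nothing reads it.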
ci/pulumi-roles.txtleast privilege
# Service account the CI uses to run `pulumi up` against prod.
# Bind these roles project-wide unless you can scope tighter.

roles/container.admin          # GKE Autopilot cluster + workloads
roles/cloudsql.admin           # Cloud SQL instances and users
roles/artifactregistry.admin   # repos + image pushes
roles/secretmanager.admin      # versions + IAM bindings
roles/iam.serviceAccountAdmin  # to create the workload-identity GSAs
roles/iam.serviceAccountUser   # to *use* those GSAs in workloads
roles/compute.networkAdmin     # VPC, subnets, Cloud Armor
roles/servicenetworking.networksAdmin  # private services access
roles/storage.admin            # Pulumi state bucket + course buckets
roles/monitoring.admin         # dashboards + alert policies

# Do NOT grant roles/owner. It works, then it leaks.

16

Reliability operations: HA, DR, load, incidents

For a 5–10M user product, start with strong single-region HA and tested disaster recovery. Multi-region active-active comes after this is solid.

Area | Standard target | GCP mechanism | Course lab
GKE runtime | Multi-replica, disruption-safe rollouts. | StatefulSet, readiness, PDB, Autopilot. | Drain one pod mid-workflow.
Database | HA primary, PITR, tested restore. | Cloud SQL HA, backups, PITR. | Restore staging from backup.
Gateway | Horizontal autoscaling, stateless reconnect. | GKE HPA or Cloud Run autoscaling. | WebSocket reconnect soak test.
Provider outage | Graceful fallback or queueing. | ExecutionPlan, rate limits, fallback models. | Force primary model failure.
Traffic spike | Admission control before overload. | Per-tenant quotas, queue depth, 429 policy. | Load test to controlled rejection.
Incident response | Known runbooks and owners. | Cloud Monitoring alerts, dashboards. | Run a simulated incident review.

Production rule

If a failure mode does not have an alert, a dashboard panel, and a runbook entry, it is not production-ready.

17

Governance: tenancy, security, abuse, compliance, cost

Agent systems carry unique blast-radius problems: model spend, tool permissions, prompt injection, retention, and tenant isolation.

Tenancy

Tenant id everywhere

Every table, event, payload, trace, metric, log carries tenant id. Quotas before workflows start.

Abuse

Admission control

Rate limits by tenant, user, IP, and tool category. Reject early before LLM cost.

Cost

Budgeted reasoning

Track tokens, tool calls, retries, model class, run duration. Hard caps and fallbacks.

Privacy

Retention and deletion

Define TTLs for raw events, summaries, embeddings, artifacts, and audit records.

Tools

Least-privilege tools

Tool handlers are Effect services with typed requirements and policy checks.

Pulumi

Policy as code

Enforce private DB, no public buckets, required labels, deletion protection, approved regions.

Industry-standard readiness checklist
  • SLO document with error budget and alert thresholds.
  • Threat model for prompts, tools, data exfiltration, tenant isolation.
  • Runbooks for model outage, DB saturation, stuck workflows, socket storms.
  • Backup restore drill and rollback drill completed before launch.
  • Load test report with bottleneck and controlled rejection point.
  • Cost dashboard by tenant, model, tool, and workflow type.
  • Data retention and deletion policy implemented, not just documented.

Advanced

Multi-region, shard groups, typed RPC, ingress, cost

18

Multi-region HA: when single-region is not enough

Single-region HA covers most 5–10M-user products. Cross-region buys disaster recovery and lower global latency, but it adds replication lag, failover drills, and workflow-versioning rules. Start with read replicas before active-active.

Step 1

Cross-region read replica

Primary stays availabilityType: "REGIONAL". Add a replicaInstance in another region. Promote on disaster, then repoint the gateway.

Step 2

Stateless gateway in N regions

Deploy the gateway in two regions behind a global external load balancer. Sticky to primary DB until a failover playbook flips it.

Step 3

Workflow versioning

Cross-region failover may resume workflows under the new primary. Old workflow names must keep their handlers for at least one major version.

infra/components/cloud-sql-replica.tsPulumi
new gcp.sql.DatabaseInstance("agents-pg-replica", {
  region: "us-east1",
  databaseVersion: "POSTGRES_16",
  masterInstanceName: primary.name,
  // Postgres read replicas can never be auto-failover targets in
  // Cloud SQL — this field is effectively a no-op here. Failover for
  // Postgres uses the primary's REGIONAL availability + a manual
  // promote on the replica during a regional outage.
  replicaConfiguration: { failoverTarget: false },
  settings: {
    tier: "db-custom-2-7680",
    availabilityType: "ZONAL", // replicas don't need HA
    ipConfiguration: {
      ipv4Enabled: false,
      privateNetwork: network.id
    }
  }
})

Honest limit

Effect Cluster is not designed for split-brain across regions today. Real active-active means workflow ids partitioned by region or an external coordinator. Most 5–10M-user products do not need this.
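If you do end up partitioning workflow ids by region, the cheapest version is to make the owning region part of the execution id itself, so each region only ever resumes its own runs. A sketch; the region list and helper names are illustrative:

```typescript
// Region-partitioned execution ids: each region resumes only its own
// workflows, avoiding split-brain without an external coordinator.
const regions = ["us-central1", "us-east1"] as const
type Region = (typeof regions)[number]

const executionIdFor = (region: Region, runId: string) =>
  `${region}:${runId}`

// Which region owns this execution? undefined means "not ours".
const ownerOf = (executionId: string): Region | undefined =>
  regions.find((r) => executionId.startsWith(`${r}:`))
```

A runner then refuses to resume any execution whose `ownerOf` is not its own region; failover becomes an explicit re-parenting step in the playbook, not an accident.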

19

Shard groups per tenant or tier

By default all entities share one shard ring. For premium tenants, large customers, or noisy workloads, route them to a dedicated runner pool with shard groups.

cluster/AgentEntity.tsannotation
export const AgentEntity = Entity.make("Agent", [Ask, Cancel])
  .annotate(
    ClusterSchema.ShardGroup,
    (entityId) =>
      entityId.startsWith("ent-")
        ? "enterprise"
        : "default"
  )

Runtime config

Runners opt into groups

A runner only owns shards from its declared groups. Run two StatefulSets — one with shardGroups: ["default"], one with ["enterprise"] — and scale them independently.

  • New groups need a one-time SQL migration of the shards table.
  • Always keep "default" on at least one runner pool.
  • Use a single group until you have a real noisy-neighbor problem.

20

Typed RPC gateway with @effect/rpc

Hand-rolling HTTP between Cloud Run and GKE loses your types at the boundary. @effect/rpc gives you a typed schema, with no codegen, that both sides import from a shared package.

packages/domain/src/api.tsshared
export class StartRun extends Rpc.make("StartRun", {
  payload: {
    sessionId: Schema.String,
    runId: Schema.String,
    message: Schema.String
  },
  success: Schema.Struct({ executionId: Schema.String })
}) {}

export class PollRun extends Rpc.make("PollRun", {
  payload: { executionId: Schema.String },
  success: WorkflowResult
}) {}

export const AgentRpc = RpcGroup.make(StartRun, PollRun)

Why it pays

Schema is the contract

  • Cloud Run imports the typed client; GKE imports the handler.
  • Breaking changes appear as TypeScript errors before deploy.
  • Tracing context propagates automatically across the boundary.
  • Versioning a payload field is a schema change, not a docs change.

21

Cloud Armor and IAP: stricter ingress

Cloud Armor is the WAF and rate limiter in front of your load balancer. IAP enforces Google identity on internal endpoints. Both are Pulumi resources you add once the gateway is stable.

WAF

Block known bad

Use Google preconfigured rules for SQLi, XSS, and protocol attacks. Add per-tenant rate limit by header.

Rate limit

Per-IP + per-tenant

Two limit policies stack: a global IP throttle and a tenant-key throttle. Reject early with 429 before any LLM cost.

IAP

Internal endpoints

Admin dashboards, replay tooling, and runbook actions sit behind IAP with Google login. No VPN needed.

infra/components/armor.tsPulumi
new gcp.compute.SecurityPolicy("gw-armor", {
  rules: [
    {
      action: "deny(403)",
      priority: 1000,
      match: {
        expr: {
          expression:
            "evaluatePreconfiguredExpr('sqli-stable')"
        }
      }
    },
    {
    {
      action: "rate_based_ban",
      priority: 2000,
      rateLimitOptions: {
        rateLimitThreshold: { count: 100, intervalSec: 60 },
        conformAction: "allow",
        exceedAction: "deny(429)",
        banDurationSec: 300, // required when action is rate_based_ban
        enforceOnKey: "HTTP-HEADER",
        enforceOnKeyName: "x-tenant-id"
      },
      },
      match: {
        versionedExpr: "SRC_IPS_V1",
        config: { srcIpRanges: ["*"] }
      }
    },
    {
      action: "allow",
      priority: 2147483647,
      match: {
        versionedExpr: "SRC_IPS_V1",
        config: { srcIpRanges: ["*"] }
      }
    }
  ]
})

22

Cost attribution: tokens to dollars to tenants

At 5–10M users you must answer “which tenant spent how much, on which model, doing what.” Two streams feed it: GCP Billing export and your own token accounting.

Labels

Stamp every resource

Pulumi applies tenant, env, app, component labels to GKE node pools, Cloud SQL, Cloud Run, and Cloud Storage. Cloud Billing rolls these up.

Token usage

Effect AI exposes counts

response.usage from LanguageModel.generateText gives input/output tokens. Persist them keyed by tenant, workflow, model.

Reports

BigQuery sink

Export Cloud Billing to BigQuery. Join with your token usage table. Looker Studio for the per-tenant dashboard.

workflow/usage.tsactivity
const response = yield* LanguageModel.generateText({
  prompt: payload.message
})

// response.usage is { inputTokens, outputTokens, totalTokens,
// cachedInputTokens }. The model name is not on usage — pass it
// through from the workflow payload or your model config so the
// number you store matches the call you actually made.
yield* Activity.make({
  name: "RecordUsage",
  execute: TokenUsage.append({
    tenantId: payload.tenantId,
    executionId,
    model: payload.model,
    inputTokens: response.usage.inputTokens,
    outputTokens: response.usage.outputTokens,
    cachedInputTokens: response.usage.cachedInputTokens ?? 0
  })
})
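Turning those stored counts into dollars is a join against a price table. The per-million-token prices below are made-up placeholders, not real provider rates:

```typescript
// Placeholder price table: dollars per million tokens, by model.
// Pull real rates from your provider's price sheet.
const pricePerMTok: Record<string, { input: number; output: number }> = {
  "small-model": { input: 0.15, output: 0.6 },
  "large-model": { input: 3, output: 15 }
}

const costUsd = (row: {
  model: string
  inputTokens: number
  outputTokens: number
}): number => {
  const p = pricePerMTok[row.model]
  if (!p) return 0 // unknown model: surface in a data-quality alert
  return (
    (row.inputTokens / 1e6) * p.input +
    (row.outputTokens / 1e6) * p.output
  )
}
```

The BigQuery join does the same multiplication at reporting time; keeping one price table that both the dashboard and any hard-cap logic read avoids two sources of truth for spend.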

Capstone

The finished repo, fixing it, teardown

23

Capstone: the production repo

The capstone is a reusable repo: app code, infrastructure code, local development, and production deployment.

repo shapetarget
apps/
  cloudrun-main/          # existing app or adapter
  agent-runtime/          # Effect cluster + workflow runtime
  agent-gateway/          # internal HTTP + WebSocket API in GKE
packages/
  domain/                 # schemas, errors, tools, prompts
  effect-agents/          # Entity + Workflow modules
  event-store/            # SQL-backed durable events
infra/
  Pulumi.yaml
  index.ts                # GCP + Kubernetes resources
  components/
    network.ts
    artifact-registry.ts
    cloud-sql.ts
    gke-autopilot.ts
    agent-runtime.ts
    gateway.ts
docker-compose.yml        # local Postgres + runtime
Makefile                  # build, push, preview, deploy

Graduation criteria

You are done when these pass.

  • Local Docker Compose runs one agent workflow.
  • Pulumi creates GCP services, SQL, GKE, and K8s objects.
  • One GKE runner starts and writes cluster tables.
  • Three runners route the same session id consistently.
  • Cloud Run starts an agent run through the gateway.
  • WebSocket reconnect resumes from lastEventId.
  • Killing a pod does not lose durable workflow state.
  • Secrets rotate without code changes.
  • Logs show session, run, workflow, activity, and model spans.
  • Load test reaches controlled rejection, not collapse.

24

Troubleshooting: what actually goes wrong

Every entry below is a real failure mode from running this stack. Symptoms in the words you'll see in a log or alert, then the fix.

Cluster tables never created; the runner logs say "no shards assigned".
You left storage: "local" from local development. NodeClusterSocket.layer only persists shard ownership and runner registry when storage: "sql" is set and the SQL client layer is provided. Set both, redeploy with one replica, then scale out.
Runners can't reach each other; entity calls time out across pods.
runnerAddress and runnerListenAddress are not the same thing. runnerAddress is the address peers use to call this runner — it must be the pod's StatefulSet DNS (pod-0.svc.cluster.local). runnerListenAddress is what the local server binds to — that one is 0.0.0.0. Using 0.0.0.0 for both is the most common bug; the cluster registers an unreachable address.
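One way to avoid the bug is to never hand-write the peer address: derive it from the pod's identity. A sketch, assuming the usual StatefulSet headless-service DNS convention and a POD_NAME downward-API env var (names are illustrative):

```typescript
// Peer-visible runner address, built from the pod's own identity.
// podName typically arrives via the POD_NAME downward-API env var.
const runnerAddressFor = (
  podName: string,
  headlessService: string,
  namespace: string
) => `${podName}.${headlessService}.${namespace}.svc.cluster.local`

// The listen address is only the local bind target and stays wildcard.
const listenAddress = "0.0.0.0"
```

Deriving the address means a copy-pasted manifest cannot silently register `0.0.0.0` as the peer address.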
A workflow keeps firing duplicates every time the client retries.
Your idempotencyKey is not deterministic. Anything that includes Date.now(), crypto.randomUUID(), or the current pod hostname produces a new execution id every retry. The key must be a stable function of the request — usually a client-provided runId:
good vs bad idempotencyKey
// bad — new execution every call
idempotencyKey: () => crypto.randomUUID()

// good — client owns the run id, retries dedupe
idempotencyKey: ({ runId }) => runId
An Activity retries forever and never surfaces the error.
Activity.make on its own retries until the workflow's suspendedRetrySchedule gives up — effectively forever in dev. Wrap with Activity.retry({ times: N }) for a bounded budget, and consider .withCompensation so a final failure cleans up.
WebSockets disconnect every hour on Cloud Run.
Cloud Run caps every request — WebSockets included — at 60 minutes. There's no setting to extend it. Move the WebSocket gateway to GKE (module 12) and keep Cloud Run for short HTTP calls only.
Cloud SQL: FATAL: sorry, too many clients already.
db-custom-2-7680 caps at ~200 connections. With PgClient.layer's default pool of 10, you'll hit this around 20 pods. Either lower the per-pod pool (driver options on the layer), use the Cloud SQL Auth Proxy with its built-in pooling, or upsize the instance. Adding more pods to "fix" latency makes this worse, not better.
Cloud Run → GKE gateway returns 403 unauthorized.
Three things to check, in order. (1) The Cloud Run service identity has roles/run.invoker on the target — or the gateway is verifying the JWT itself and the audience doesn't match. (2) The ID token's aud claim equals the gateway's URL exactly, scheme and trailing slash included. (3) The token is in Authorization: Bearer, not in a custom header that the load balancer strips.
Pods OOMKilled on GKE Autopilot under load.
Autopilot enforces requests strictly — there's no burst headroom from co-tenants on the node. If your container peaks at 800Mi but requests 512Mi, it gets killed at 512Mi. Set resources.requests.memory to your real p95 plus a margin (e.g. 1Gi for the agent runtime), and set limits.memory equal to requests so Node sees the right heap target.
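A resources stanza matching that advice might look like this; the 1Gi figure is an example to replace with your measured p95, not a recommendation:

```yaml
# Autopilot bills and enforces requests; setting limits equal to
# requests keeps the Node heap target honest.
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    memory: 1Gi
```

Pair it with `NODE_OPTIONS=--max-old-space-size` somewhat below the limit so V8 starts collecting before the kernel kills the pod.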
Workload Identity 401: pod can't reach Cloud SQL or Secret Manager.
The Kubernetes service account is missing the annotation that binds it to the GCP service account. The pod is running under the default KSA with no IAM. Check the spec:
k8s/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: agent-runtime
  namespace: agents
  annotations:
    iam.gke.io/gcp-service-account: agent-runtime@PROJECT.iam.gserviceaccount.com
And the GSA must have roles/iam.workloadIdentityUser granted to PROJECT.svc.id.goog[agents/agent-runtime]. Module 07 has the Pulumi for both halves.
A pod dies, but its shards never move; entity calls hang.
RunnerHealth.layerK8s is what tells the cluster a pod is gone. If you're using runnerHealth: "k8s", the K8s API client needs read access to pods in the namespace — that's the Role + RoleBinding from module 07. If you've left RunnerHealth.layerNoop in by accident (very common when copy-pasting test setup), dead pods stay "alive" forever and their shards are orphaned.
Old workflow executions vanish after a deploy.
You renamed a workflow. Workflow names are the lookup key — Workflow.make({ name: "AgentWorkflow" }) binds executions to that exact string. Renaming, even cosmetically, abandons every in-flight execution. Keep old names. If you must evolve, add a new workflow alongside and migrate explicitly.
pulumi up sits on Cloud SQL for ten minutes then errors.
Private services access peering wasn't ready when the instance tried to create. Pulumi needs gcp.servicenetworking.Connection to exist and propagate before gcp.sql.DatabaseInstance with privateNetwork set. Add an explicit dependsOn: [privateVpcConnection] on the DB resource — Pulumi will not infer this dependency because the values aren't interpolated.

When this list isn't enough

Two diagnostics are worth running before opening an issue. First, tail the runner logs with kubectl -n agents logs -l app=effect-agent-runtime --tail=200 --prefix and filter for severity >= WARNING — the cluster prints fairly readable warnings before things fail. Second, query the cluster runner registry directly: SELECT * FROM cluster_runners; SELECT * FROM cluster_shards; together they tell you exactly who owns what.

25

Teardown: leave nothing behind

Pulumi destroys what it created. Before destroy, take a final backup and confirm no data you need is left in Cloud SQL or Cloud Storage. Resources outside Pulumi (manual buckets, ad-hoc IAM bindings) must be cleaned up by hand.

Step 1

Final backup

Force an on-demand Cloud SQL backup and export to Cloud Storage. Export workflow tables if you need them long-term.

Step 2

Drain runners

Scale the StatefulSet to zero first so workflows release shard locks and write final state.

Step 3

Destroy stack

Pulumi destroy removes every resource it owns. Dependency order is automatic.

teardown.shsafe order
# 1. final SQL backup + export
gcloud sql backups create \
  --instance="$(pulumi -C infra stack output cloudSqlInstance)" \
  --description="pre-destroy snapshot"

gcloud sql export sql \
  "$(pulumi -C infra stack output cloudSqlInstance)" \
  "gs://$PROJECT_ID-teardown/dump-$(date +%Y%m%d).sql.gz" \
  --database=effect_cluster

# 2. drain runners gracefully
kubectl -n agents scale statefulset effect-agent-runtime --replicas=0
kubectl -n agents rollout status statefulset/effect-agent-runtime

# 3. destroy infrastructure
cd infra
pulumi destroy --yes
pulumi stack rm dev --yes

# 4. delete the GCP project (irreversible — only if isolated)
gcloud projects delete "$PROJECT_ID"

Last guard

Never run gcloud projects delete on a shared or production project. Keep teardown stacks in their own GCP project so destruction has no blast radius.