Production agents on GCP
A long-form course on running stateful Effect Cluster runtimes behind a Cloud Run gateway, with Pulumi owning every resource.
Before you start
Backend engineers who already write some TypeScript and want a serious-scale Effect deployment. Not an Effect introduction.
How to run Cluster entities and durable Workflow executions on GKE Autopilot, fronted by a stateless Cloud Run gateway, with Pulumi as the single source of truth.
Node 22+, pnpm 9+, Pulumi 3.130+, Effect 3.x, and the latest @effect/cluster and @effect/workflow. Node no longer ships corepack as a stable feature, so the Quickstart installs pnpm directly and pins exact versions.
Quickstart
00
gcloud login to a running agent
Every box below is a real shell command. Replace anything in <ANGLE_BRACKETS> once. The flow takes about 25 minutes the first time.
Versions known to work: Node 22+, pnpm 9+, Pulumi 3.130+, Docker, gcloud. We install pnpm directly because Node no longer treats corepack as a supported way to ship a package manager.
# macOS — node@22 keeps you on the current LTS line
brew install node@22 docker
brew install pulumi/tap/pulumi google-cloud-sdk
npm install -g pnpm@9 # install pnpm directly, do not use corepack
# Linux (Debian/Ubuntu)
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt install -y nodejs docker.io
curl -fsSL https://get.pulumi.com | sh
curl https://sdk.cloud.google.com | bash
sudo npm install -g pnpm@9
# Sanity check — every version should print
node --version # v22.x
pnpm --version # 9.x
docker --version
pulumi version
gcloud --version
Replace PROJECT_ID with something globally unique. Billing account id comes from gcloud billing accounts list.
export PROJECT_ID=<effect-agents-PROJECT>
export REGION=us-central1
export BILLING_ACCOUNT=<XXXXXX-XXXXXX-XXXXXX>
gcloud auth login
gcloud projects create "$PROJECT_ID" --name="Effect agents"
gcloud config set project "$PROJECT_ID"
gcloud billing projects link "$PROJECT_ID" \
--billing-account "$BILLING_ACCOUNT"
gcloud services enable \
container.googleapis.com \
artifactregistry.googleapis.com \
sqladmin.googleapis.com \
secretmanager.googleapis.com \
vpcaccess.googleapis.com \
servicenetworking.googleapis.com \
iam.googleapis.com \
monitoring.googleapis.com \
logging.googleapis.com
gcloud auth application-default login
The capstone (module 23) is the full shape. Start with just enough to deploy a single runner.
mkdir effect-agents && cd effect-agents
pnpm init
pnpm add effect @effect/ai @effect/ai-openai \
@effect/cluster @effect/workflow \
@effect/platform @effect/platform-node \
@effect/rpc @effect/sql @effect/sql-pg
pnpm add -D typescript tsx vitest @effect/vitest
mkdir -p apps/agent-runtime/src apps/agent-gateway/src \
packages/domain/src packages/event-store/src \
infra/components
The smallest runtime that actually boots, joins the shard ring, and serves a health probe. Replace with the full agent runtime once it works.
import { Entity, RunnerAddress, Rpc } from "@effect/cluster"
import { HttpRouter, HttpServer, HttpServerResponse } from "@effect/platform"
import {
NodeClusterSocket,
NodeHttpServer,
NodeRuntime
} from "@effect/platform-node"
import { PgClient } from "@effect/sql-pg"
import { Config, Effect, Layer, Logger, Option, Schema } from "effect"
import { createServer } from "node:http"
// A tiny entity that proves the cluster is wired correctly. Every
// call to Echo(text) for a given entityId routes to the same runner.
const Echo = Rpc.make("Echo", {
payload: { text: Schema.String },
success: Schema.String
})
const EchoEntity = Entity.make("Echo", [Echo])
const EchoEntityLive = EchoEntity.toLayer(
Effect.gen(function* () {
const address = yield* Entity.CurrentAddress
return EchoEntity.of({
Echo: ({ payload }) =>
Effect.succeed(`[${address.entityId}] ${payload.text}`)
})
})
)
// Probes for the StatefulSet from module 09.
const HealthLive = HttpRouter.empty.pipe(
HttpRouter.get("/healthz", HttpServerResponse.text("ok")),
HttpRouter.get("/readyz", HttpServerResponse.text("ready")),
HttpServer.serve()
)
const podHost = Config.string("POD_NAME").pipe(
Config.map((name) =>
`${name}.effect-agent-runners.agents.svc.cluster.local`
),
Config.withDefault("localhost")
)
const ClusterLive = Layer.unwrapEffect(
Effect.gen(function* () {
const host = yield* podHost
return NodeClusterSocket.layer({
storage: "sql",
runnerHealth: "k8s",
runnerHealthK8s: {
namespace: "agents",
labelSelector: "app=effect-agent-runtime"
},
shardingConfig: {
runnerAddress: Option.some(RunnerAddress.make(host, 50000)),
runnerListenAddress: Option.some(
RunnerAddress.make("0.0.0.0", 50000)
)
}
})
})
)
const SqlLive = PgClient.layerConfig({
url: Config.redacted("DATABASE_URL")
})
const HttpLive = NodeHttpServer.layer(createServer, { port: 8080 })
const MainLive = Layer.mergeAll(EchoEntityLive, HealthLive).pipe(
Layer.provide(HttpLive),
Layer.provideMerge(ClusterLive),
Layer.provideMerge(SqlLive),
Layer.provide(Logger.json)
)
Layer.launch(MainLive).pipe(NodeRuntime.runMain)
# syntax=docker/dockerfile:1.7
# Node 22+ no longer guarantees a working corepack, so we install
# pnpm directly with npm and pin the version. Multi-stage keeps the
# production image small and free of build tooling.
FROM node:22-slim AS base
ENV PNPM_HOME=/pnpm PATH=/pnpm:$PATH
RUN npm install -g pnpm@9.15.0
WORKDIR /app
# --- deps: lockfile + manifests only, maximises layer cache hits
FROM base AS deps
COPY package.json pnpm-lock.yaml pnpm-workspace.yaml ./
COPY apps/agent-runtime/package.json apps/agent-runtime/
COPY packages packages
RUN --mount=type=cache,id=pnpm,target=/pnpm/store \
pnpm install --frozen-lockfile
# --- build: compile TypeScript
FROM deps AS build
COPY tsconfig.json ./
COPY apps/agent-runtime apps/agent-runtime
RUN pnpm --filter agent-runtime build
# --- runtime: only what node needs to run
FROM node:22-slim AS runtime
ENV NODE_ENV=production NODE_OPTIONS=--enable-source-maps
RUN npm install -g pnpm@9.15.0
WORKDIR /app
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/packages ./packages
COPY --from=build /app/apps/agent-runtime/dist apps/agent-runtime/dist
COPY --from=build /app/apps/agent-runtime/package.json apps/agent-runtime/
USER node
EXPOSE 8080 50000
CMD ["node", "apps/agent-runtime/dist/main.js"]
Run inside infra/. Pick TypeScript. Stack name is the environment; start with dev.
cd infra
pulumi new gcp-typescript --force \
--name effect-agents-infra \
--description "Effect agents on GCP"
pulumi stack init dev
pulumi config set gcp:project "$PROJECT_ID"
pulumi config set gcp:region "$REGION"
pulumi config set --secret postgresPassword \
"$(openssl rand -hex 24)"
pulumi config set --secret openaiApiKey "<sk-...>"
Tag with the git sha — never latest. Artifact Registry repo name matches what Pulumi creates next.
export SHA=$(git rev-parse --short HEAD 2>/dev/null || echo dev)
export IMAGE="$REGION-docker.pkg.dev/$PROJECT_ID/agents/agent-runtime:$SHA"
gcloud auth configure-docker "$REGION-docker.pkg.dev" --quiet
docker build -t "$IMAGE" -f apps/agent-runtime/Dockerfile .
docker push "$IMAGE"
cd infra
pulumi config set agentRuntimeImage "$IMAGE"
First apply creates VPC, Cloud SQL HA, GKE Autopilot, Artifact Registry, Secret Manager, namespace, RBAC, StatefulSet, gateway Service. Expect 10–15 minutes.
pulumi preview # review the plan
pulumi up --yes # apply
pulumi stack output gatewayUrl
pulumi stack output cloudSqlConnection
pulumi stack output kubeconfig --show-secrets > ~/.kube/agents.yaml
export KUBECONFIG=~/.kube/agents.yaml
Start a run, poll the workflow, open a WebSocket and replay events.
export GW=$(cd infra && pulumi stack output gatewayUrl)
RUN=$(curl -sX POST "$GW/agents/session-1/runs" \
-H 'content-type: application/json' \
-d '{"runId":"run-1","message":"hello"}')
echo "$RUN"
EXEC=$(echo "$RUN" | jq -r .executionId)
curl -s "$GW/runs/$EXEC" | jq .
npx -y wscat -c "${GW/https/wss}/ws/session-1"
Cost guard, honestly
This stack idles around $300–500/month in us-central1. Cloud SQL HA on db-custom-2-7680 is the bulk of it (~$250–350); GKE Autopilot adds ~$70 in cluster fee plus small per-pod resource cost; Artifact Registry and Secret Manager are rounding errors. Don't leave this running over a weekend by accident — run pulumi destroy from module 25.
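The guardrail above is just arithmetic. A sketch like this keeps the assumptions visible — every figure is a ballpark midpoint of the ranges quoted, not a number from GCP pricing, and the per-pod line is an assumption of three small always-on runners:

```typescript
// Rough monthly idle-cost model, USD, us-central1. All figures are
// midpoints of the ranges in the text above and will drift.
const idleCostUsd = {
  cloudSqlHa: 300,      // db-custom-2-7680 HA, midpoint of ~$250-350
  gkeAutopilotFee: 70,  // flat cluster fee; per-pod resources extra
  perPodResources: 30,  // assumption: three small always-on runners
  registryAndSecrets: 5 // Artifact Registry + Secret Manager rounding error
}

const totalIdle = Object.values(idleCostUsd).reduce((a, b) => a + b, 0)
console.log(`~$${totalIdle}/month while idle`)
```

Re-run the sum whenever you change the Cloud SQL tier — it dominates everything else.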
Foundation
01
A production deployment is a contract between Cloud Run as the user edge, GKE as the durable runtime, and Cloud SQL as the brain. Every later module sharpens this triangle.
User-facing tier
Stateless HTTP and WebSocket entry. Auth, rate-limits, and tenant boundaries live here.
Runtime tier
Long-lived Effect cluster runners and durable workflow execution per agent session.
Durability tier
Cluster runner registry, workflow state, durable events, and pgvector memory for retrieval.
Course bias
Keep Cloud Run for stateless ingress. Keep GKE Autopilot for durable runtime. Avoid hybrid systems where Cloud Run holds in-flight agent state.
02
Learn only the Kubernetes objects a Docker operator needs: image, replicas, DNS, secrets, ports, and health. No Helm. No custom controllers.
| Docker concept | GKE object | Why for Effect | Course rule |
|---|---|---|---|
| docker run | Pod | One Effect runtime process. | Never deploy bare pods. |
| docker compose --scale | StatefulSet | Stable names agent-0, agent-1. | Required for runner addresses. |
| Compose network alias | Headless Service DNS | Per-pod DNS for cluster. | Maps to runnerAddress. |
| Container port | Service port | Runners speak TCP. | Expose one internal port. |
| .env | Secret + env var | DB and provider credentials. | Pulumi creates and binds. |
| Restart policy | Probe + rollout | Bad runners are replaced safely. | Boring startup is mandatory. |
03
One Node container launches a single Layer graph. Cluster, workflow engine, entities, workflows, SQL, AI providers, and telemetry are services in the same graph.
const podDns = () =>
`${process.env.POD_NAME}` +
`.effect-agent-runners.agents.svc.cluster.local`
const ClusterLayer = NodeClusterSocket.layer({
storage: "sql",
runnerHealth: "k8s",
runnerHealthK8s: {
namespace: "agents",
labelSelector: "app=effect-agent-runtime"
},
shardingConfig: {
runnerAddress: Option.some(
RunnerAddress.make(podDns(), 50000)
),
runnerListenAddress: Option.some(
RunnerAddress.make("0.0.0.0", 50000)
)
}
})
const RuntimeLayer = Layer.mergeAll(
AgentEntityLive,
AgentWorkflowLive,
AgentGatewayLive
).pipe(
Layer.provideMerge(ClusterWorkflowEngine.layer),
Layer.provideMerge(ClusterLayer),
Layer.provideMerge(SqlLayer),
Layer.provideMerge(AiLayer)
)
Layer.launch(RuntimeLayer).pipe(NodeRuntime.runMain)
Effect type lens
A typical business function reads Effect<Answer, AgentError, AgentMemory | LanguageModel | Tools>. Pulumi and GKE decide where requirements are provided. The signature decides correctness.
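As a rough mental model only — not the real Effect API — the three type parameters can be sketched in plain TypeScript. `Eff`, `AgentMemory`, and `LanguageModel` below are illustrative stand-ins to make "the signature decides correctness" concrete:

```typescript
// A toy stand-in for Effect<A, E, R>: a function from its required
// environment R to either a success A or a typed error E.
type Eff<A, E, R> = (env: R) => { ok: true; value: A } | { ok: false; error: E }

interface AgentMemory { recall: (key: string) => string | undefined }
interface LanguageModel { complete: (prompt: string) => string }
type AgentError = { _tag: "AgentError"; message: string }

// The signature says: this can fail with AgentError and needs both
// memory and a model. Where those come from is the Layer graph's job.
const answer: Eff<string, AgentError, AgentMemory & LanguageModel> = (env) => {
  const context = env.recall("profile") ?? ""
  return { ok: true, value: env.complete(context + " hello") }
}
```

In the real runtime, Pulumi and GKE decide which concrete services satisfy `R`; the compiler refuses to run `answer` until every requirement is provided.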
Platform
04
Pulumi TypeScript first creates the GCP resources, then uses the generated kubeconfig to create namespace, RBAC, services, StatefulSet, secrets, and gateway objects.
GCP
Enable GKE, Artifact Registry, Cloud SQL, Secret Manager, IAM, VPC Access, Cloud Logging.
Network
VPC, subnet, private service access for Cloud SQL, optional Cloud Run VPC connector.
Data
Cluster runner registry, shard locks, persisted messages, workflow state, agent app tables.
Runtime
Managed cluster runs the Docker image as a long-lived StatefulSet.
Kubernetes
Namespace, ServiceAccount, RBAC, headless Service, StatefulSet, gateway Service.
Delivery
Images built in CI, pushed, and referenced by Pulumi config.
Where state lives
The Quickstart leaves Pulumi state in Pulumi Cloud, which is fine while you're learning. For real environments most teams self-host on GCS: pulumi login gs://<PROJECT_ID>-pulumi-state before stack init. You get versioning, IAM, and no third-party access to your infra graph. The bucket should be in the same project, have versioning on, and have uniform_bucket_level_access enabled.
import * as gcp from "@pulumi/gcp"
import * as k8s from "@pulumi/kubernetes"
import * as pulumi from "@pulumi/pulumi"
const cfg = new pulumi.Config()
const region = cfg.require("region")
const network = new gcp.compute.Network("agents-vpc", {
autoCreateSubnetworks: false
})
const subnet = new gcp.compute.Subnetwork("agents-subnet", {
region,
network: network.id,
ipCidrRange: "10.42.0.0/20"
})
const repo = new gcp.artifactregistry.Repository("agents", {
location: region,
repositoryId: "agents",
format: "DOCKER"
})
const db = new gcp.sql.DatabaseInstance("agents-pg", {
region,
databaseVersion: "POSTGRES_16",
settings: {
tier: "db-custom-2-7680",
ipConfiguration: { ipv4Enabled: false, privateNetwork: network.id },
backupConfiguration: { enabled: true, pointInTimeRecoveryEnabled: true },
availabilityType: "REGIONAL"
}
})
const cluster = new gcp.container.Cluster("agents-autopilot", {
location: region,
enableAutopilot: true,
network: network.id,
subnetwork: subnet.id,
privateClusterConfig: { enablePrivateNodes: true }
})
const k8sProvider = new k8s.Provider("gke", {
  // makeKubeconfig (not shown) assembles a kubeconfig from the
  // cluster's endpoint, CA certificate, and a gcloud auth entry.
  kubeconfig: makeKubeconfig(cluster)
})
// Namespace, ServiceAccount, RBAC, headless Service, StatefulSet,
// Secret, gateway Service — all created with k8sProvider.
05
Every module ships a lab outcome. You do not jump to multi-replica GKE on day one.
One Node container plus Postgres. Use storage: "local" first, then switch to storage: "sql".
services:
postgres:
image: postgres:16
environment:
POSTGRES_USER: agents
POSTGRES_PASSWORD: agents
POSTGRES_DB: agents
healthcheck:
test: ["CMD-SHELL", "pg_isready -U agents"]
interval: 2s
timeout: 5s
retries: 10
ports: ["5432:5432"]
volumes: [pgdata:/var/lib/postgresql/data]
runtime:
build:
context: .
dockerfile: apps/agent-runtime/Dockerfile
depends_on:
postgres: { condition: service_healthy }
environment:
DATABASE_URL: postgres://agents:agents@postgres:5432/agents
POD_NAME: runtime-local
ports:
- "8080:8080" # health + http gateway
- "50000:50000" # cluster socket (only for multi-node compose)
volumes:
pgdata:
Run with docker compose up --build. First boot creates the database; subsequent boots reuse it. To simulate two runners, copy runtime as runtime-2 with POD_NAME: runtime-local-2 and a different host port.
Create Artifact Registry, Cloud SQL HA, GKE Autopilot, Secret Manager, IAM, VPC.
Deploy with replicas: 1. Confirm SQL tables, logs, and one workflow execution.
Scale out. Confirm shard ownership moves and same session id routes consistently.
Cloud Run calls GKE gateway over private networking. Start runs, poll workflows, stream events.
Kill a pod, restart the runtime, fail a tool, rotate a secret, restore from backup.
import { it } from "@effect/vitest"
import { Effect, Layer, TestClock } from "effect"
import { TestRunner } from "@effect/cluster"
import { AgentWorkflow, AgentWorkflowLive } from "../src/AgentWorkflow"
// TestRunner.layer runs the whole cluster in-memory: no SQL, no
// network, no real time. Perfect for entity + workflow unit tests.
const TestLayer = AgentWorkflowLive.pipe(
Layer.provideMerge(TestRunner.layer)
)
it.effect("workflow completes for a simple ask", () =>
Effect.gen(function*() {
const executionId = yield* AgentWorkflow.execute(
{ runId: "r1", sessionId: "s1", message: "hi" },
{ discard: true }
)
// Advance time so any DurableClock.sleep resolves instantly
yield* TestClock.adjust("1 hour")
const result = yield* AgentWorkflow.poll(executionId)
// assert result is Complete with the expected exit
}).pipe(Effect.provide(TestLayer))
)
What to test where
- TestRunner.layer: entity protocols, workflow happy path + each error type, schema decoders. Fast, no Docker.
- docker compose up postgres: SQL storage round-trip, real idempotencyKey dedup, real shard math. Slow enough to run on PR but not in watch mode.

06
Effect runners need stable addresses. The public app does not need to join the cluster. Cloud Run calls a gateway; the gateway calls typed entity clients.
Runner-to-runner
Use StatefulSet pod DNS as the runnerAddress. Use 0.0.0.0 only as listen address.
effect-agent-runtime-0.effect-agent-runners.agents.svc.cluster.local:50000 for cluster socket traffic.

Cloud Run-to-GKE
Cloud Run hits a small HTTP gateway inside GKE. The gateway uses typed Effect entity clients.
POST /agents/:sessionId/runs — start workflow.
GET /runs/:executionId — poll workflow.
GET /runs/:executionId/events — stream events.

07
Least-privilege service accounts, secrets out of source, private DB. Pulumi owns both GCP IAM and Kubernetes RBAC.
| Surface | Default | Why | Pulumi creates |
|---|---|---|---|
| Runtime identity | Workload identity | Pods use a GCP service account without static keys. | GSA, KSA, IAM binding. |
| Secrets | Secret Manager | No plaintext config in Git. | Secret versions, K8s Secret projection. |
| Cloud SQL | Private IP | No public DB endpoint. | Private service access and HA instance. |
| K8s health | Pod read RBAC | Effect runnerHealth: "k8s" needs pod visibility. | Role and RoleBinding. |
| Network | Internal first | Cloud Run reaches gateway privately where possible. | VPC connector or internal ingress. |
// 1. GCP service account the pod will act as
const runtimeGsa = new gcp.serviceaccount.Account("agent-runtime", {
accountId: "agent-runtime",
displayName: "Effect agent runtime pods"
})
// 2. Let the Kubernetes service account impersonate the GSA
new gcp.serviceaccount.IAMBinding("agent-runtime-wi", {
serviceAccountId: runtimeGsa.name,
role: "roles/iam.workloadIdentityUser",
members: [pulumi.interpolate`serviceAccount:${project}.svc.id.goog[agents/agent-runtime]`]
})
// 3. Give the GSA only what the pod actually needs
new gcp.projects.IAMMember("agent-runtime-sql", {
project, role: "roles/cloudsql.client",
member: pulumi.interpolate`serviceAccount:${runtimeGsa.email}`
})
new gcp.projects.IAMMember("agent-runtime-secrets", {
project, role: "roles/secretmanager.secretAccessor",
member: pulumi.interpolate`serviceAccount:${runtimeGsa.email}`
})
// 4. Annotate the K8s service account
new k8s.core.v1.ServiceAccount("agent-runtime", {
metadata: {
name: "agent-runtime", namespace: "agents",
annotations: {
"iam.gke.io/gcp-service-account": runtimeGsa.email
}
}
}, { provider: k8sProvider })
Cloud Run → GKE gateway
The gateway sits inside the VPC behind an internal load balancer. Cloud Run reaches it through a VPC connector. To prove identity, Cloud Run grabs a Google-signed ID token from the metadata server and sends it as Authorization: Bearer <token>. The gateway verifies the token's aud claim against its custom audience.
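The audience comparison the gateway performs can be sketched in plain TypeScript. `decodePayload` and `audienceMatches` are illustrative helpers, and this check is only meaningful after the token's signature has been verified against Google's JWKS — it is not a substitute for signature verification:

```typescript
// Decode a JWT's payload segment (base64url) and compare its `aud`
// claim against the gateway's expected audience. Signature
// verification must happen before this check carries any weight.
const decodePayload = (jwt: string): Record<string, unknown> => {
  const [, payload] = jwt.split(".")
  const b64 = payload.replace(/-/g, "+").replace(/_/g, "/")
  return JSON.parse(Buffer.from(b64, "base64").toString("utf8"))
}

const audienceMatches = (jwt: string, expected: string): boolean =>
  decodePayload(jwt).aud === expected
```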
- audience = the gateway's internal URL (or a custom audience string).
- roles/run.invoker only if you're invoking another Cloud Run service; for a self-verifying gateway the gateway middleware checks the JWT directly using Google's JWKS.
- Do not trust x-tenant-id from Cloud Run. Re-extract tenant from a verified claim or your auth header inside the gateway.

08
Agents fail in unusual ways: rate limits, tool failures, partitions, workflow suspension, runner churn. Logs alone are not enough.
Logs
Annotate with sessionId, runId, executionId, model, tool, attempt.
Traces
Export spans to Cloud Trace. Wrap agent turns, tool calls, model calls, and workflow activities.
Metrics
Active runners, shard ownership, workflow states, tool latency, tokens, retry counts.
09
The course treats these as required labs, not optional notes.
DB sessions
Cluster SQL storage uses advisory locks. Avoid poolers that rotate sessions underneath ownership.
Rollouts
Deploy with replicas: 1 after schema or runtime changes, then scale out.
Passivation
Active memory disappears on idle or restart. Durable facts live in SQL or workflow state.
Idempotency
Workflow idempotency keys are deterministic. Client retries do not double-trigger runs.
Scaling
Set shard count up front. Changing shards later is a migration decision.
Shutdown
Termination grace, readiness, and logs confirm runners release work cleanly.
spec:
template:
spec:
terminationGracePeriodSeconds: 60 # give cluster time to shed shards
containers:
- name: agent-runtime
# Startup probe lets the Layer graph come up before liveness fires.
# First boot includes opening pg pool, joining the shard ring,
# and running pending migrations — easy 20s.
startupProbe:
httpGet: { path: /healthz, port: 8080 }
failureThreshold: 30
periodSeconds: 2
readinessProbe:
httpGet: { path: /readyz, port: 8080 }
periodSeconds: 5
livenessProbe:
httpGet: { path: /healthz, port: 8080 }
periodSeconds: 10
lifecycle:
preStop:
exec:
# Tell the runtime to stop accepting new entity messages,
# then let in-flight workflow activities finish. The Effect
# runtime's own scope finalizers release shard locks on
# SIGTERM; preStop just gives the load balancer time to
# mark this pod NotReady first.
command: ["/bin/sh", "-c", "sleep 10"]
What the probes actually answer
/healthz = "the process is up." /readyz = "this runner is in the shard ring AND can reach its database AND has finished startup migrations." If /readyz 200s before the shard ring is joined, you'll route traffic to a runner that doesn't own any entities yet. Don't conflate them.
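The separation reads naturally as two predicates over the same runtime status. The field names below are illustrative, not an @effect/cluster API — wire them to your real shard-ring, database, and migration checks:

```typescript
// /healthz vs /readyz as predicates over runtime state. Liveness
// asks "is the process up"; readiness asks "can this pod do work".
interface RuntimeStatus {
  processUp: boolean
  shardRingJoined: boolean
  databaseReachable: boolean
  migrationsDone: boolean
}

const healthz = (s: RuntimeStatus) => s.processUp

const readyz = (s: RuntimeStatus) =>
  s.processUp && s.shardRingJoined && s.databaseReachable && s.migrationsDone
```

A pod that is booting is healthy but not ready; conflating the two routes traffic to a runner that owns no entities yet.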
Effect runtime
10
Cluster is typed distributed actors. An Entity is the protocol. An entity layer is the server. Entity.client is the typed remote handle.
Core idea
For agents the entity id is usually sessionId, agentId, or tenant:user. Calls to the same id route to the same active runner while alive.
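Why the same id always lands on the same runner: the id hashes to a shard, and each shard has exactly one owning runner at a time. The FNV-1a hash below is illustrative — @effect/cluster uses its own hashing — but the routing shape is the same:

```typescript
// Stable entity routing: hash(entityId) % shardCount picks a shard;
// the cluster assigns each shard to exactly one live runner.
const fnv1a = (s: string): number => {
  let h = 0x811c9dc5
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h
}

const shardFor = (entityId: string, shardCount: number) =>
  fnv1a(entityId) % shardCount

// Note: changing shardCount remaps almost every entity, which is why
// shard count is a day-one decision, not a dial to turn later.
```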
The transport is NodeClusterSocket.layer.

const Ask = Rpc.make("Ask", {
payload: { runId: Schema.String, message: Schema.String },
success: Schema.String
})
const Cancel = Rpc.make("Cancel", {
payload: { executionId: Schema.String },
success: Schema.Void
})
export const AgentEntity = Entity.make("Agent", [Ask, Cancel])
export const AgentEntityLive = AgentEntity.toLayer(
Effect.gen(function* () {
const address = yield* Entity.CurrentAddress
const sessionId = address.entityId
return AgentEntity.of({
Ask: ({ payload }) => AgentWorkflow.execute({
runId: payload.runId,
sessionId,
message: payload.message
}, { discard: true }),
Cancel: ({ payload }) =>
AgentWorkflow.interrupt(payload.executionId)
})
}),
{ maxIdleTime: "10 minutes" }
)
| Concept | Effect API | Agent use | Nuance |
|---|---|---|---|
| Protocol | Rpc.make | Ask, cancel, snapshot, append event. | Small schema-defined messages. |
| Entity definition | Entity.make | Defines the actor type. | No implementation yet. |
| Entity behavior | Entity.toLayer | Handlers and in-memory state. | Memory is cache. |
| Remote call | AgentEntity.client | Gateway calls clientFor(sessionId).Ask. | Caller does not know the pod. |
| Persistence | ClusterSchema.Persisted | Persist important messages. | Skip noisy read calls. |
| Concurrency | Rpc.fork | Run safe readers concurrently. | Default serial protects state. |
11
Cluster answers where does this agent live. Workflow answers how do I run this long task durably, retry it, sleep, resume, and inspect status.
class AgentError extends Schema.TaggedError<AgentError>("AgentError")(
"AgentError",
{ message: Schema.String }
) {}
export const AgentWorkflow = Workflow.make({
name: "AgentWorkflow",
payload: {
runId: Schema.String,
sessionId: Schema.String,
message: Schema.String
},
success: Schema.String,
error: AgentError,
idempotencyKey: ({ runId }) => runId
})
export const AgentWorkflowLive = AgentWorkflow.toLayer(
Effect.fn(function* (payload, executionId) {
yield* EventStore.append({
executionId, type: "started", message: payload.message
})
const answer = yield* Activity.make({
name: "RunAgentTurn",
success: Schema.String,
error: AgentError,
execute: Effect.gen(function* () {
const assistant = yield* AiAssistant
return yield* assistant.agent(payload.message)
})
}).pipe(Activity.retry({ times: 3 }))
yield* EventStore.append({ executionId, type: "completed", answer })
return answer
})
)
Durable model
Use workflows for work that outlives one HTTP request: reasoning, tool chains, approvals, scheduled follow-ups, retries, and cancellation.
Start
execute(payload, { discard: true })
Returns execution id quickly. Perfect for HTTP and WebSocket start endpoints.
Poll
poll(executionId)
Returns complete, suspended, or undefined. Status pages and retries use this.
Cancel
interrupt(executionId)
Map a user cancel action to workflow interruption.
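On the client side, the poll endpoint naturally becomes a loop. `getStatus`, `RunStatus`, and `waitForRun` below are illustrative names, not part of @effect/workflow — the status function is injected so the sketch stays transport-agnostic:

```typescript
// Poll GET /runs/:executionId until the run completes or we give up.
type RunStatus = { state: "running" | "suspended" | "complete"; result?: string }

const waitForRun = async (
  getStatus: (executionId: string) => Promise<RunStatus>,
  executionId: string,
  opts = { intervalMs: 500, maxAttempts: 20 }
): Promise<RunStatus> => {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const status = await getStatus(executionId)
    if (status.state === "complete") return status
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs))
  }
  throw new Error(`run ${executionId} did not complete in time`)
}
```

Because the workflow's idempotency key is deterministic, the same loop is safe to restart after a client crash — re-starting the run dedupes server-side.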
12
WebSockets belong at the gateway edge. They never own state. They start workflows, subscribe to persisted events, and reconnect by replaying from lastEventId.
Why GKE for sockets, not Cloud Run
Cloud Run treats WebSockets as long-lived HTTP requests and applies the 60-minute request timeout hard cap. Anything still connected at 60 minutes gets killed regardless of activity. That's fine for short chat sessions if your client reconnects cleanly, but for agent runs that observe a long workflow, GKE is the calmer home: idle timeouts are yours to choose, instance scaling is predictable, and you avoid the cold-start-during-reconnect dance.
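Whichever tier holds the socket, the client must reconnect cleanly. A capped exponential backoff keeps a mass disconnect (deploy, 60-minute cut) from thundering-herding the gateway. This is a generic sketch; in production add jitter, which is left out here only to keep the schedule testable:

```typescript
// Reconnect delay schedule: exponential backoff with a ceiling.
// attempt 0 -> 250ms, attempt 1 -> 500ms, capped at 30s thereafter.
const backoffMs = (attempt: number, baseMs = 250, capMs = 30_000) =>
  Math.min(capMs, baseMs * 2 ** attempt)
```

On reconnect the client sends its lastEventId so the gateway can replay missed events rather than replaying the whole session.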
const GatewayLive = HttpRouter.empty.pipe(
HttpRouter.get(
"/ws/:sessionId",
Effect.gen(function* () {
const req = yield* HttpServerRequest.HttpServerRequest
const sessionId = req.pathParams.sessionId
const outbound = EventStore.stream(sessionId).pipe(
Stream.map((event) => JSON.stringify(event)),
Stream.encodeText
)
const inbound = outbound.pipe(
Stream.pipeThroughChannel(HttpServerRequest.upgradeChannel()),
Stream.decodeText(),
Stream.runForEach((raw) =>
Effect.gen(function* () {
const msg = yield* Schema.decodeUnknown(ClientMessage)(
JSON.parse(raw)
)
const clientFor = yield* AgentEntity.client
const agent = clientFor(sessionId)
switch (msg._tag) {
case "Ask":
yield* agent.Ask({ runId: msg.runId, message: msg.text })
return
case "Cancel":
yield* agent.Cancel({ executionId: msg.executionId })
return
}
})
)
)
yield* inbound
return HttpServerResponse.empty()
})
),
HttpServer.serve()
)
Realtime contract
The browser can disconnect anytime. Every meaningful event is persisted so a reconnect resumes by lastEventId.
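Replay-on-reconnect is just a filter over the monotonically increasing event ids the event store persists (the bigserial event_id from module 14). A minimal sketch, with illustrative names:

```typescript
// The client reports the last event id it saw; the gateway serves
// everything after it. Ids are monotonic per the event-store schema.
interface StoredEvent { eventId: number; payload: string }

const replayAfter = (events: ReadonlyArray<StoredEvent>, lastEventId: number) =>
  events.filter((e) => e.eventId > lastEventId)
```

In practice this runs as a SQL query (`where event_id > $1 order by event_id`), but the contract is exactly this filter.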
| Problem | Pattern | Effect tool | GCP nuance |
|---|---|---|---|
| Client disconnect | Persist events; replay from offset. | Stream + SQL EventStore. | Do not rely on Cloud Run memory. |
| Long LLM response | Workflow emits events while running. | Activity + EventStore. | Gateway restarts are safe. |
| Cancel button | Cancel message interrupts workflow. | Workflow.interrupt | Return final cancelled event to all clients. |
| Multiple tabs | Same session id subscribes once. | AgentEntity.client | Entity serializes mutations. |
| Socket scaling | Gateway pods are stateless subscribers. | Stream + SQL/PubSub bridge. | Prefer GKE gateway over Cloud Run for sockets. |
Scale
13
10 million users is not one number. Model MAU, peak concurrent users, concurrent WebSockets, active agent runs, tool calls per run, tokens per minute, and database writes per run.
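A traffic worksheet can start as a few lines of arithmetic. Every ratio below is an assumption you must replace with your own product's measurements — the point is to make each number explicit, not to get them right on the first pass:

```typescript
// Toy capacity worksheet: turn MAU into the concrete load numbers.
// Ratios are placeholders: 25% DAU/MAU, 5% of DAU at peak, 40% of
// peak users mid-run, 2 durable events per second per active run.
const worksheet = (mau: number) => {
  const dau = mau / 4
  const peakConcurrent = dau / 20
  const concurrentSockets = peakConcurrent // one socket per active user
  const activeRuns = (peakConcurrent * 2) / 5
  const dbWritesPerSec = activeRuns * 2
  return { dau, peakConcurrent, concurrentSockets, activeRuns, dbWritesPerSec }
}
```

At 10M MAU these placeholder ratios already imply six figures of concurrent sockets' worth of headroom planning and a six-figure event write rate — which is why event compaction and admission control get their own rows.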
| Metric | Why | Lever | Deliverable |
|---|---|---|---|
| MAU / DAU | Business scale, not raw load. | Plans, retention, tenant tiers. | Traffic worksheet per tier. |
| Peak concurrent users | Gateway CPU and request capacity. | Cloud Run autoscaling, GKE HPA. | Staged concurrency load test. |
| Concurrent WebSockets | Realtime connection pressure. | GKE gateway, heartbeat, idle timeout. | Soak test and reconnect test. |
| Active agent runs | Workflow and LLM pressure. | Quotas, rate limits, queues. | Admission policy. |
| DB writes per run | Cloud SQL bottleneck. | Event compaction, batching, retention. | Event schema plan. |
| Tokens per minute | Provider limits and cost. | Model router, fallback plan. | Per-tenant token dashboard. |
SLO
p95 gateway response, first-token latency, WebSocket reconnect success, run-start acceptance.
SLO
Workflow completion rate, retry exhaustion, suspended workflow age, event replay lag.
SLO
Cloud SQL utilization, p95 query latency, lock waits, backup success, restore drill age.
14
At this scale the hard part is not starting a workflow. The hard part is deciding which state is hot, durable, searchable, replayable, compacted, and deletable.
State taxonomy
type AgentEvent =
| { _tag: "RunAccepted"; executionId: string; runId: string }
| { _tag: "Token"; executionId: string; index: number; text: string }
| { _tag: "ToolCall"; executionId: string; tool: string; callId: string }
| { _tag: "ToolResult"; executionId: string; callId: string; ok: boolean }
| { _tag: "Completed"; executionId: string; answerRef: string }
| { _tag: "Failed"; executionId: string; reason: string }
// Persist every event with tenantId, sessionId, executionId,
// monotonic eventId, createdAt, and retention bucket.
| Data | Primary GCP choice | Why | Nuance |
|---|---|---|---|
| Cluster + workflow | Cloud SQL Postgres | Official SQL-backed cluster storage. | Stable sessions; watch locks. |
| Event stream | Cloud SQL first, Pub/Sub later | SQL is simple and replayable. | Pub/Sub is not source of truth. |
| Semantic memory | AlloyDB pgvector or Vertex Vector | Search and embeddings at scale. | Clear PII and deletion path. |
| Artifacts and files | Cloud Storage | Cheap durable blobs. | Store refs in SQL, not blobs. |
| Analytics | BigQuery sink | Large reporting without OLTP. | Export redacted events only. |
-- The event store is your code, not Effect's. This is one
-- reasonable shape; adapt freely. Cluster and workflow state are
-- created by @effect/cluster and @effect/workflow themselves — do
-- not touch those tables.
create table agent_events (
tenant_id text not null,
session_id text not null,
execution_id text not null,
event_id bigserial primary key,
event_type text not null,
payload jsonb not null,
created_at timestamptz not null default now()
);
create index agent_events_session_idx
on agent_events (tenant_id, session_id, event_id);
create index agent_events_exec_idx
on agent_events (execution_id, event_id);
-- Retention partitioning lives here too. The course leaves the
-- specifics to you because tenancy and compliance shape them.
Honest gap
Cluster runner registry and workflow execution tables are managed for you by @effect/cluster and @effect/workflow when you pick SQL storage. The event store above is application code — there is no built-in module to install. Treat the DDL on the left as a starting point, not a published library.
Decide retention before you ship it: partition on created_at::date and drop old partitions.

Connection pool reality
Cloud SQL on db-custom-2-7680 caps at roughly 200 connections. PgClient.layer defaults to a pool of 10 per process. With three runner pods + two gateway pods that's already 50 connections — fine. Add a fourth tier (background jobs, migrations runner) and you're at 80. Past about 120 total open connections, queue queries inside the pool instead of buying more headroom; the database is the bottleneck, not the pool. PgClient.layer({ url, transformQueryNames, transformResultNames }) takes pool tuning via the underlying postgres driver options if you need to lower the per-pod cap.
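The arithmetic behind that paragraph is worth writing down per environment: total open connections against the instance is pods × pool size, summed over every tier that talks to Postgres. A minimal sketch:

```typescript
// Connection budget: sum pods * poolSize across every tier that
// opens a pool against the Cloud SQL instance.
const totalConnections = (tiers: Array<{ pods: number; poolSize: number }>) =>
  tiers.reduce((sum, t) => sum + t.pods * t.poolSize, 0)

// Three runners + two gateways at the default pool of 10 = 50.
const base = totalConnections([
  { pods: 3, poolSize: 10 },
  { pods: 2, poolSize: 10 }
])
```

Keep the result comfortably under the instance cap (~200 here) with room for migrations, backups, and an incident psql session.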
15
Pulumi makes environments repeatable. Production also needs disciplined release mechanics.
Typecheck, tests, lint, Pulumi preview on every PR.
Tagged by git SHA, signed, pushed to Artifact Registry.
Apply backward-compatible DB changes before deploying the new image.
Staging, one runner, then full replica count. Smoke tests after each step.
Pulumi
Distinct dev, staging, and prod stacks with protected resources.
Migrations
Add columns and tables first. Deploy code. Backfill. Remove old fields later.
Rollback
Release plans cover DB compatibility, workflow versions, event schema compatibility.
Workflow versioning
Long-running workflows resume after deploys. Version names or preserve old handlers.
Policy
CI posts Pulumi preview. Production apply requires approval and pinned providers.
Smoke
Start a workflow, stream events, kill a pod, verify completion, check logs.
Migration tool — pick one
For a course-shaped repo, Drizzle (drizzle-kit migrate) is the lightest option that produces plain SQL files you can read and audit. node-pg-migrate is also fine if you prefer hand-written up/down. The course's expand/deploy/contract rule applies to either:
Add columns, tables, and indexes first (create indexes CONCURRENTLY), deploy code, backfill, then remove old fields in a later migration.
# Service account the CI uses to run `pulumi up` against prod.
# Bind these roles project-wide unless you can scope tighter.
roles/container.admin # GKE Autopilot cluster + workloads
roles/cloudsql.admin # Cloud SQL instances and users
roles/artifactregistry.admin # repos + image pushes
roles/secretmanager.admin # versions + IAM bindings
roles/iam.serviceAccountAdmin # to create the workload-identity GSAs
roles/iam.serviceAccountUser # to *use* those GSAs in workloads
roles/compute.networkAdmin # VPC, subnets, Cloud Armor
roles/servicenetworking.networksAdmin # private services access
roles/storage.admin # Pulumi state bucket + course buckets
roles/monitoring.admin # dashboards + alert policies
# Do NOT grant roles/owner. It works, then it leaks.
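Applied with gcloud, the list above becomes a short loop. This sketch only prints the commands so you can review them before running; the service-account name is an assumption:

```shell
#!/usr/bin/env bash
# Dry-run: print one add-iam-policy-binding per role for the CI deployer SA.
# Review the output, then re-run without the `echo` to apply.
set -euo pipefail
PROJECT_ID="${PROJECT_ID:-my-project}"
SA="ci-deployer@${PROJECT_ID}.iam.gserviceaccount.com"   # hypothetical SA name
ROLES=(
  roles/container.admin
  roles/cloudsql.admin
  roles/artifactregistry.admin
  roles/secretmanager.admin
  roles/iam.serviceAccountAdmin
  roles/iam.serviceAccountUser
  roles/compute.networkAdmin
  roles/servicenetworking.networksAdmin
  roles/storage.admin
  roles/monitoring.admin
)
for role in "${ROLES[@]}"; do
  echo gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:${SA}" --role="$role"
done
```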
16
For a 5–10M user product, start with strong single-region HA and tested disaster recovery. Multi-region active-active comes after this is solid.
| Area | Standard target | GCP mechanism | Course lab |
|---|---|---|---|
| GKE runtime | Multi-replica, disruption-safe rollouts. | StatefulSet, readiness, PDB, Autopilot. | Drain one pod mid-workflow. |
| Database | HA primary, PITR, tested restore. | Cloud SQL HA, backups, PITR. | Restore staging from backup. |
| Gateway | Horizontal autoscaling, stateless reconnect. | GKE HPA or Cloud Run autoscaling. | WebSocket reconnect soak test. |
| Provider outage | Graceful fallback or queueing. | ExecutionPlan, rate limits, fallback models. | Force primary model failure. |
| Traffic spike | Admission control before overload. | Per-tenant quotas, queue depth, 429 policy. | Load test to controlled rejection. |
| Incident response | Known runbooks and owners. | Cloud Monitoring alerts, dashboards. | Run a simulated incident review. |
Production rule
If a failure mode does not have an alert, a dashboard panel, and a runbook entry, it is not production-ready.
17
Agent systems carry unique blast-radius problems: model spend, tool permissions, prompt injection, retention, and tenant isolation.
Tenancy
Every table, event, payload, trace, metric, log carries tenant id. Quotas before workflows start.
Abuse
Rate limits by tenant, user, IP, and tool category. Reject early before LLM cost.
Cost
Track tokens, tool calls, retries, model class, run duration. Hard caps and fallbacks.
Privacy
Define TTLs for raw events, summaries, embeddings, artifacts, and audit records.
Tools
Tool handlers are Effect services with typed requirements and policy checks.
Pulumi
Enforce private DB, no public buckets, required labels, deletion protection, approved regions.
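The hard-cap and fallback ideas above can be sketched as a pure admission check, run before a workflow starts. All names here are hypothetical; real code would read usage from the token-usage table:

```typescript
// Hypothetical per-tenant budget. Only the decision logic is sketched.
type TenantBudget = {
  monthlyTokenCap: number
  tokensUsed: number
  fallbackModel?: string
}

type Admission =
  | { kind: "allow"; model: string }
  | { kind: "fallback"; model: string }
  | { kind: "reject"; reason: string }

const admit = (b: TenantBudget, requestedModel: string): Admission => {
  const remaining = b.monthlyTokenCap - b.tokensUsed
  // Hard cap: reject before any LLM cost is incurred.
  if (remaining <= 0) return { kind: "reject", reason: "token cap exhausted" }
  // Inside the last 10% of budget, degrade to the cheaper fallback model.
  if (remaining < b.monthlyTokenCap * 0.1 && b.fallbackModel)
    return { kind: "fallback", model: b.fallbackModel }
  return { kind: "allow", model: requestedModel }
}
```

The point of the shape: rejection and degradation are explicit return values the gateway can turn into a 429 or a model swap, not exceptions discovered mid-workflow.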
Advanced
18
Single-region HA covers most 5–10M-user products. Cross-region buys disaster recovery and lower global latency, but it adds replication lag, failover drills, and workflow-versioning rules. Start with read replicas before active-active.
Step 1
Primary stays availabilityType: "REGIONAL". Add a replicaInstance in another region. Promote on disaster, then repoint the gateway.
Step 2
Deploy the gateway in two regions behind a global external load balancer. Sticky to primary DB until a failover playbook flips it.
Step 3
Cross-region failover may resume workflows under the new primary. Old workflow names must keep their handlers for at least one major version.
new gcp.sql.DatabaseInstance("agents-pg-replica", {
region: "us-east1",
databaseVersion: "POSTGRES_16",
masterInstanceName: primary.name,
// Postgres read replicas can never be auto-failover targets in
// Cloud SQL — this field is effectively a no-op here. Failover for
// Postgres uses the primary's REGIONAL availability + a manual
// promote on the replica during a regional outage.
replicaConfiguration: { failoverTarget: false },
settings: {
tier: "db-custom-2-7680",
availabilityType: "ZONAL", // replicas don't need HA
ipConfiguration: {
ipv4Enabled: false,
privateNetwork: network.id
}
}
})
Honest limit
Effect Cluster is not designed for split-brain across regions today. Real active-active means workflow ids partitioned by region or an external coordinator. Most 5–10M-user products do not need this.
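The id-partitioning idea can be sketched as a pure routing rule. The prefix scheme here is an assumption, not an Effect Cluster feature:

```typescript
// Hypothetical scheme: a workflow id carries its home region as a prefix,
// so each region's cluster only ever owns executions it created.
const REGIONS = ["us-central1", "us-east1"] as const
type Region = (typeof REGIONS)[number]

const makeWorkflowId = (region: Region, runId: string): string =>
  `${region}:${runId}`

const homeRegion = (workflowId: string): Region | undefined => {
  const prefix = workflowId.split(":")[0]
  return REGIONS.find((r) => r === prefix)
}

// A gateway refuses (or forwards) ids homed elsewhere; that refusal is
// what prevents two regions from claiming the same execution.
const ownedHere = (workflowId: string, local: Region): boolean =>
  homeRegion(workflowId) === local
```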
19
By default all entities share one shard ring. For premium tenants, large customers, or noisy workloads, route them to a dedicated runner pool with shard groups.
export const AgentEntity = Entity.make("Agent", [Ask, Cancel])
.annotate(
ClusterSchema.ShardGroup,
(entityId) =>
entityId.startsWith("ent-")
? "enterprise"
: "default"
)
Runtime config
A runner only owns shards from its declared groups. Run two StatefulSets — one with shardGroups: ["default"], one with ["enterprise"] — and scale them independently.
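In raw Kubernetes terms, the two pools might look like the sketch below. The SHARD_GROUPS env var, and how the runtime consumes it, are assumptions about your own wiring rather than a published convention; the course's Pulumi components would generate the equivalent.

```yaml
# Pool 1: general traffic. The runtime reads SHARD_GROUPS at boot and
# declares those groups to the cluster (hypothetical wiring).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: agent-runtime-default
  namespace: agents
spec:
  replicas: 3
  serviceName: agent-runtime-default
  selector:
    matchLabels: { pool: default }
  template:
    metadata:
      labels: { pool: default }
    spec:
      containers:
        - name: runtime
          image: REGION-docker.pkg.dev/PROJECT/agents/agent-runtime:SHA
          env:
            - name: SHARD_GROUPS
              value: "default"
---
# Pool 2: same spec, different group and replica count, scaled
# independently for enterprise tenants.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: agent-runtime-enterprise
  namespace: agents
spec:
  replicas: 2
  serviceName: agent-runtime-enterprise
  selector:
    matchLabels: { pool: enterprise }
  template:
    metadata:
      labels: { pool: enterprise }
    spec:
      containers:
        - name: runtime
          image: REGION-docker.pkg.dev/PROJECT/agents/agent-runtime:SHA
          env:
            - name: SHARD_GROUPS
              value: "enterprise"
```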
"default" on at least one runner pool.20
Hand-rolling HTTP between Cloud Run and GKE loses your types at the boundary. @effect/rpc gives you a typed contract with no codegen: both sides import the same schema from a shared package.
export class StartRun extends Rpc.make("StartRun", {
payload: {
sessionId: Schema.String,
runId: Schema.String,
message: Schema.String
},
success: Schema.Struct({ executionId: Schema.String })
}) {}
export class PollRun extends Rpc.make("PollRun", {
payload: { executionId: Schema.String },
success: WorkflowResult
}) {}
export const AgentRpc = RpcGroup.make(StartRun, PollRun)
Why it pays
The contract lives in packages/domain; the gateway and the runtime both import it, so a breaking change is a compile error in both services instead of a runtime 500.
21
Cloud Armor is the WAF and rate limiter in front of your load balancer. IAP enforces Google identity on internal endpoints. Both are Pulumi resources you add once the gateway is stable.
WAF
Use Google preconfigured rules for SQLi, XSS, and protocol attacks. Add per-tenant rate limit by header.
Rate limit
Two limit policies stack: a global IP throttle and a tenant-key throttle. Reject early with 429 before any LLM cost.
IAP
Admin dashboards, replay tooling, and runbook actions sit behind IAP with Google login. No VPN needed.
new gcp.compute.SecurityPolicy("gw-armor", {
rules: [
{
action: "deny(403)",
priority: 1000,
match: {
expr: {
expression:
"evaluatePreconfiguredExpr('sqli-stable')"
}
}
},
{
action: "rate_based_ban",
priority: 2000,
rateLimitOptions: {
rateLimitThreshold: { count: 100, intervalSec: 60 },
conformAction: "allow",
exceedAction: "deny(429)",
enforceOnKey: "HTTP-HEADER",
enforceOnKeyName: "x-tenant-id"
},
match: {
versionedExpr: "SRC_IPS_V1",
config: { srcIpRanges: ["*"] }
}
},
{
action: "allow",
priority: 2147483647,
match: {
versionedExpr: "SRC_IPS_V1",
config: { srcIpRanges: ["*"] }
}
}
]
})
22
At 5–10M users you must answer “which tenant spent how much, on which model, doing what.” Two streams feed it: GCP Billing export and your own token accounting.
Labels
Pulumi applies tenant, env, app, component labels to GKE node pools, Cloud SQL, Cloud Run, and Cloud Storage. Cloud Billing rolls these up.
Token usage
response.usage from LanguageModel.generateText gives input/output tokens. Persist them keyed by tenant, workflow, model.
Reports
Export Cloud Billing to BigQuery. Join with your token usage table. Looker Studio for the per-tenant dashboard.
const response = yield* LanguageModel.generateText({
prompt: payload.message
})
// response.usage is { inputTokens, outputTokens, totalTokens,
// cachedInputTokens }. The model name is not on usage — pass it
// through from the workflow payload or your model config so the
// number you store matches the call you actually made.
yield* Activity.make({
name: "RecordUsage",
execute: TokenUsage.append({
tenantId: payload.tenantId,
executionId,
model: payload.model,
inputTokens: response.usage.inputTokens,
outputTokens: response.usage.outputTokens,
cachedInputTokens: response.usage.cachedInputTokens ?? 0
})
})
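Joining the two streams in BigQuery might look like the query below. The dataset and table names, the billing-export table suffix, and the tenant label key are all assumptions about your setup:

```sql
-- Per-tenant monthly view: infra cost from the billing export's tenant
-- label, next to LLM token spend from your own usage table.
SELECT
  infra.tenant,
  infra.infra_cost_usd,
  tokens.input_tokens,
  tokens.output_tokens
FROM (
  SELECT
    (SELECT l.value FROM UNNEST(labels) AS l WHERE l.key = "tenant") AS tenant,
    SUM(cost) AS infra_cost_usd
  FROM `PROJECT.billing.gcp_billing_export_v1_XXXXXX`
  WHERE invoice.month = "202501"
  GROUP BY tenant
) AS infra
LEFT JOIN (
  SELECT
    tenant_id AS tenant,
    SUM(input_tokens) AS input_tokens,
    SUM(output_tokens) AS output_tokens
  FROM `PROJECT.agents.token_usage`
  GROUP BY tenant_id
) AS tokens
USING (tenant)
```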
Capstone
23
The capstone is a reusable repo: app code, infrastructure code, local development, and production deployment.
apps/
cloudrun-main/ # existing app or adapter
agent-runtime/ # Effect cluster + workflow runtime
agent-gateway/ # internal HTTP + WebSocket API in GKE
packages/
domain/ # schemas, errors, tools, prompts
effect-agents/ # Entity + Workflow modules
event-store/ # SQL-backed durable events
infra/
Pulumi.yaml
index.ts # GCP + Kubernetes resources
components/
network.ts
artifact-registry.ts
cloud-sql.ts
gke-autopilot.ts
agent-runtime.ts
gateway.ts
docker-compose.yml # local Postgres + runtime
Makefile # build, push, preview, deploy
Graduation criteria
A pod killed mid-workflow completes without data loss, and reconnecting WebSocket clients resume the event stream from lastEventId.
24
Every entry below is a real failure mode from running this stack. Symptoms in the words you'll see in a log or alert, then the fix.
storage: "local" from local development. NodeClusterSocket.layer only persists shard ownership and runner registry when storage: "sql" is set and the SQL client layer is provided. Set both, redeploy with one replica, then scale out.runnerAddress and runnerListenAddress are not the same thing. runnerAddress is the address peers use to call this runner — it must be the pod's StatefulSet DNS (pod-0.svc.cluster.local). runnerListenAddress is what the local server binds to — that one is 0.0.0.0. Using 0.0.0.0 for both is the most common bug; the cluster registers an unreachable address.idempotencyKey is not deterministic. Anything that includes Date.now(), crypto.randomUUID(), or the current pod hostname produces a new execution id every retry. The key must be a stable function of the request — usually a client-provided runId:
// bad — new execution every call
idempotencyKey: () => crypto.randomUUID()
// good — client owns the run id, retries dedupe
idempotencyKey: ({ runId }) => runId
Activity retries forever and never surfaces the error
Activity.make on its own retries until the workflow's suspendedRetrySchedule gives up — effectively forever in dev. Wrap with Activity.retry({ times: N }) for a bounded budget, and consider .withCompensation so a final failure cleans up.
FATAL: sorry, too many clients already
db-custom-2-7680 caps at ~200 connections. With PgClient.layer's default pool of 10, you'll hit this around 20 pods. Either lower the per-pod pool (driver options on the layer), use the Cloud SQL Auth Proxy with its built-in pooling, or upsize the instance. Adding more pods to "fix" latency makes this worse, not better.
403 unauthorized between the Cloud Run app and the gateway
Check three things. (1) The calling service account has roles/run.invoker on the target — or the gateway is verifying the JWT itself and the audience doesn't match. (2) The ID token's aud claim equals the gateway's URL exactly, scheme and trailing slash included. (3) The token is in Authorization: Bearer, not in a custom header that the load balancer strips.
OOMKilled on GKE Autopilot under load
Set resources.requests.memory to your real p95 plus a margin (e.g. 1Gi for the agent runtime), and set limits.memory equal to requests so Node sees the right heap target.
GCP clients inside the pod cannot find credentials
Workload Identity needs both halves wired. On the Kubernetes side, the ServiceAccount carries the GSA annotation:
apiVersion: v1
kind: ServiceAccount
metadata:
name: agent-runtime
namespace: agents
annotations:
iam.gke.io/gcp-service-account: agent-runtime@PROJECT.iam.gserviceaccount.com
On the GCP side, the GSA needs roles/iam.workloadIdentityUser granted to PROJECT.svc.id.goog[agents/agent-runtime]. Module 07 has the Pulumi for both halves.
Dead pods keep their shards forever
RunnerHealth.layerK8s is what tells the cluster a pod is gone. If you're using runnerHealth: "k8s", the K8s API client needs read access to pods in the namespace — that's the Role + RoleBinding from module 07. If you've left RunnerHealth.layerNoop in by accident (very common when copy-pasting test setup), dead pods stay "alive" forever and their shards are orphaned.
Old workflows never resume after a deploy
Workflow.make({ name: "AgentWorkflow" }) binds executions to that exact string. Renaming, even cosmetically, abandons every in-flight execution. Keep old names. If you must evolve, add a new workflow alongside and migrate explicitly.
pulumi up sits on Cloud SQL for ten minutes then errors
The instance needs the gcp.servicenetworking.Connection to exist and propagate before gcp.sql.DatabaseInstance with privateNetwork set. Add an explicit dependsOn: [privateVpcConnection] on the DB resource — Pulumi will not infer this dependency because the values aren't interpolated.
When this list isn't enough
Two diagnostics worth running before opening an issue. First: kubectl -n agents logs -l app=effect-agent-runtime --tail=200 --prefix, filtered to severity>=WARNING — the cluster prints fairly readable warnings before things fail. Second: query the cluster registry directly — SELECT * FROM cluster_runners and SELECT * FROM cluster_shards tell you exactly who owns what.
25
Pulumi destroys what it created. Before destroy, take a final backup and confirm no data you need is left in Cloud SQL or Cloud Storage. Resources outside Pulumi (manual buckets, ad-hoc IAM bindings) must be cleaned up by hand.
Step 1
Force an on-demand Cloud SQL backup and export to Cloud Storage. Export workflow tables if you need them long-term.
Step 2
Scale the StatefulSet to zero first so workflows release shard locks and write final state.
Step 3
Pulumi destroy removes every resource it owns. Dependency order is automatic.
# 1. final SQL backup + export
gcloud sql backups create \
--instance="$(pulumi -C infra stack output cloudSqlInstance)" \
--description="pre-destroy snapshot"
gcloud sql export sql \
"$(pulumi -C infra stack output cloudSqlInstance)" \
"gs://$PROJECT_ID-teardown/dump-$(date +%Y%m%d).sql.gz" \
--database=effect_cluster
# 2. drain runners gracefully
kubectl -n agents scale statefulset effect-agent-runtime --replicas=0
kubectl -n agents rollout status statefulset/effect-agent-runtime
# 3. destroy infrastructure
cd infra
pulumi destroy --yes
pulumi stack rm dev --yes
# 4. delete the GCP project (irreversible — only if isolated)
gcloud projects delete "$PROJECT_ID"
Last guard
Never run gcloud projects delete on a shared or production project. Keep teardown stacks in their own GCP project so destruction has no blast radius.