For Adam Ceresia  ·  Concept brief

The agentic AI infrastructure layer purpose-built for 125+ FIs.

Posh AI runs REALM™, Operating Procedures, and 300+ FI deployments across voice, digital, search, and outreach. That shape — multi-tenant governed reasoning, LLM-heavy, latency-sensitive, compliance-burdened — maps almost 1:1 to Workers for Platforms, AI Gateway, and Workers AI. Cloudflare didn't build these for Posh, but they may as well have.

125+ FIs on Posh today
300+ live deployments
330+ Cloudflare POPs
~50ms voice TTFT at edge

The thesis

You're not a banking AI company.
You're a multi-tenant LLM platform that happens to serve banks.

Every Posh customer has its own Operating Procedures, knowledge base, voice persona, escalation policy, and compliance rules. 125+ tenants. Each one needs isolated reasoning, governed RAG, audit-ready execution, and sub-second response. That's not a banking problem. That's a multi-tenant agentic AI infrastructure problem — and it's the workload Cloudflare's developer platform was built for.

What we noticed in your stack

Your product runs on GCP (geographically redundant Google zones, per your Trust page). Your marketing runs on Webflow. Your corp DNS is on AWS Route 53. Three providers, three perimeters, three audit boundaries, all to serve a workload that should sit behind one edge. That's not a critique; it's how every fast-growing SaaS starts. The question is what comes next as you scale from 125 FIs to 500.

Value plays

Eight things Cloudflare changes for Posh AI.

Ranked by impact-per-effort for your specific workload shape.

01 — Flagship

Per-FI isolation with Workers for Platforms

Each FI gets its own isolated Worker inside a dispatch namespace. Citadel's Operating Procedures, Vystar's escalation policy, Camden National's compliance language: each fully isolated, instantly deployable, individually metered. No more "noisy neighbor" risk between tenants.

Workers for Platforms · Dispatch Namespaces
125 FIs × isolated compute = zero blast radius
02 — Highest ROI

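AI Gateway: cache hits across 125 FIs

"What's my routing number?" "How do I reset my password?" The same 100 or so questions repeat across every FI you serve. AI Gateway caches LLM responses at the edge with semantic similarity, so you pay for one inference and serve thousands of customers. Answers that differ per FI (a routing number, for example) need the FI in the cache key; generic how-to answers can be shared across tenants. Cache hit rates of 40–60% are realistic for FAQ-heavy traffic.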

AI Gateway · Semantic Cache · Multi-LLM
See calculator below ↓
03 — Voice latency

Sub-second voice agents at 330+ POPs

Voice assistants live or die by time-to-first-token. A 200ms regional round-trip to GCP is the difference between "AI feels human" and "AI feels broken." Workers AI runs inference at the same edge your customer's phone hits: Whisper for STT, then Llama or another model from the Workers AI catalog for the response, at roughly 50ms TTFT.

Workers AI · Whisper · Edge Inference
~3–4× faster than centralized GCP region
04 — Knowledge

Vectorize + R2 for governed RAG per FI

REALM™ pulls from "verified institutional knowledge, not the open web." Each FI's knowledge base becomes a Vectorize index plus an R2 bucket: encrypted at rest, isolated per tenant, queryable in <30ms. Operating Procedures stay deterministic; retrieval stays governed; audit trail stays clean.

Vectorize · R2 · Workers AI Embeddings
Zero egress fees on knowledge retrieval
05 — Session state

Durable Objects for multi-turn voice conversations

"Skip ahead, loop back, pivot mid-workflow without losing context" — that's Operating Procedures language for stateful agents. Durable Objects give each conversation its own single-threaded actor with strong consistency, automatic geographic routing, and zero session-affinity infrastructure to manage.

Durable Objects · WebSockets
No Redis cluster, no sticky-session load balancer
06 — Cost

Zero-egress storage with R2

Audio recordings, transcripts, training data for Posh Simulator, knowledge documents. Move that off S3 / GCS and pay $0 per GB egress. For 300+ deployments generating call recordings 24/7, R2 typically pays for itself in month 2.

R2 · Zero Egress
~40–60% storage TCO reduction
07 — Compliance

One audit boundary, not three

You're SOC 2 Type II and CSA STAR Level 1. Today that posture spans GCP + Webflow + AWS + your own controls. On Cloudflare's developer platform, knowledge → reasoning → execution → delivery sit inside one compliance perimeter (SOC 2 Type II, ISO 27001, PCI DSS, FedRAMP Moderate). Less to evidence in every FI vendor review.

Compliance · WAF · Audit Logs
~30% reduction in vendor-review cycle time
08 — Marketing

Pages: a clean exit from Webflow

www.posh.ai is on Webflow today. When you outgrow it (custom server-side logic, real A/B routing, in-page AI demos), Pages + Workers is the path: a Git-based workflow, real edge compute on every route, no CMS lock-in. Not urgent, but it's there when you need it.

Pages · Workers
No more Webflow tier escalation
Mapping

Your platform, mapped to ours.

Every pillar Posh names on /platform has a clean Cloudflare primitive. Not "kind of" — exactly.

Posh pillar / need | What it does | Cloudflare primitive
Knowledge | Governed RAG per FI, single source of truth, no open-web guesswork | Vectorize + R2 + Workers AI Embeddings
Reasoning (REALM™) | Contextual understanding, adaptive decisions, policy-bound LLM orchestration | AI Gateway + Workers AI + multi-provider routing
Control (Operating Procedures) | Deterministic guardrails, compliance rules, audit-ready execution | Workers for code-as-policy + Logpush for audit trail
Integrations | Plug into FIS / Jack Henry / Fiserv core systems, telephony, KMS | Workers + API Shield + mTLS per tenant
Security | SOC 2 Type II, CSA STAR, continuous monitoring, audit logs | WAF, Bot Mgmt, Cloudflare One, FedRAMP-ready posture
Per-FI tenant isolation | 125+ banks, each with its own OPs, knowledge, persona, escalation | Workers for Platforms dispatch namespaces
Voice TTFT | Sub-second response or the conversation breaks | Workers AI at edge + Durable Objects for session
Posh Outreach (proactive) | Initiate conversations on schedule across FIs | Cron Triggers + Queues + Workflows

Quantify it

The AI Gateway cache math for 125+ FIs.

The compounding insight: when you serve N tenants asking similar questions, the pool of cacheable answers is shared, so the cache hit rate tends to improve as N grows. The bigger Posh gets, the better the math.

AI Gateway savings calculator

Annual LLM inference cost — with and without semantic cache

Assumes GPT-4-class pricing (~$10/M input tokens, $30/M output tokens). Adjust for your actual model mix.

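FIs served: 125
LLM calls per FI per day: 8,000
Tokens per call: 2,500
Cache hit rate: 45%
Blended price per 1M tokens: $15

Total LLM calls / year: 365M
Total tokens / year: 913B
Cost without AI Gateway: $13.7M
Cost with semantic cache: $7.5M
Annual savings: $6.2M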

Calculator is directional; actual cache-hit rates depend on FAQ overlap, prompt structure, and TTL config. AI Gateway also adds free observability, rate limiting, fallback routing, and request logging — none of which is priced into the chart above.
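
The calculator's arithmetic fits in a few lines. This is a minimal sketch, assuming the five inputs are the FI count, LLM calls per FI per day, tokens per call, cache hit rate, and a blended price per million tokens. Those input meanings are inferred from the totals, and the names (`CacheInputs`, `estimateAnnualCost`) are illustrative rather than taken from the calculator itself. It also assumes a cache hit costs nothing and a miss costs full price.

```typescript
// Inputs mirror the five calculator values; field names are illustrative.
interface CacheInputs {
  fis: number;                   // financial institutions served
  callsPerFiPerDay: number;      // LLM calls per FI per day
  tokensPerCall: number;         // input + output tokens per call
  cacheHitRate: number;          // fraction of calls served from cache (0..1)
  pricePerMillionTokens: number; // blended USD price per 1M tokens
}

interface CacheEstimate {
  callsPerYear: number;
  tokensPerYear: number;
  costWithoutCache: number;
  costWithCache: number;
  annualSavings: number;
}

function estimateAnnualCost(input: CacheInputs): CacheEstimate {
  const callsPerYear = input.fis * input.callsPerFiPerDay * 365;
  const tokensPerYear = callsPerYear * input.tokensPerCall;
  const costWithoutCache = (tokensPerYear / 1e6) * input.pricePerMillionTokens;
  // Assumption: a cache hit is free and a miss is billed at the full price.
  const annualSavings = costWithoutCache * input.cacheHitRate;
  const costWithCache = costWithoutCache - annualSavings;
  return { callsPerYear, tokensPerYear, costWithoutCache, costWithCache, annualSavings };
}

const estimate = estimateAnnualCost({
  fis: 125,
  callsPerFiPerDay: 8000,
  tokensPerCall: 2500,
  cacheHitRate: 0.45,
  pricePerMillionTokens: 15,
});
console.log(estimate);
```

With these inputs the sketch lands on the same figures as the calculator: about 365M calls, 913B tokens, $13.7M without caching, $7.5M with it, and $6.2M saved. Swap in your own model mix and hit rate to see how sensitive the savings are.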

Architecture

How a Posh AI request flows on Cloudflare.

Follow a single voice call from a credit-union member along the full path.

1. Inbound voice hits the edge at the closest POP

WebRTC or SIP termination at the Cloudflare POP nearest the caller. WAF, bot mitigation, and DDoS protection applied at L3/L4/L7 before a single token is generated.

Magic Transit · WAF · Calls (WebRTC)

2. Workers for Platforms routes to the right FI's Worker

Hostname → user Worker lookup inside the dispatch namespace. Citadel's Worker, Vystar's Worker, Camden National's Worker: each isolated, each with its own Operating Procedures bundle. Zero noisy-neighbor risk.
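
The routing step might look roughly like this. It is a sketch under assumptions: the hostnames, Worker names, and the `TENANT_WORKERS` table are hypothetical, and the two interfaces are small structural stand-ins for the dispatch namespace binding, which exposes `get(name)` returning an object with `fetch(request)`. In a real deployment the table would more likely live in KV or a database than in code.

```typescript
// Structural stand-ins for the Workers for Platforms binding (not the real types).
interface UserWorker<Req, Res> {
  fetch(request: Req): Promise<Res>;
}
interface DispatchNamespace<Req, Res> {
  get(workerName: string): UserWorker<Req, Res>;
}

// Hypothetical hostname → user Worker table.
const TENANT_WORKERS: Record<string, string> = {
  "citadel.example.com": "fi-citadel",
  "vystar.example.com": "fi-vystar",
};

// Normalize the hostname and look up the tenant's Worker name; null if unknown.
function resolveWorkerName(
  hostname: string,
  table: Record<string, string> = TENANT_WORKERS,
): string | null {
  const key = hostname.trim().toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(table, key)) return null;
  const name: string | undefined = table[key];
  return name === undefined ? null : name;
}

// Forward the request to the tenant's Worker, or hand off to a fallback.
function dispatchToTenant<Req, Res>(
  hostname: string,
  request: Req,
  dispatcher: DispatchNamespace<Req, Res>,
  onUnknownTenant: (hostname: string) => Promise<Res>,
  table: Record<string, string> = TENANT_WORKERS,
): Promise<Res> {
  const name = resolveWorkerName(hostname, table);
  if (name === null) return onUnknownTenant(hostname);
  return dispatcher.get(name).fetch(request);
}
```

Keeping the lookup a pure function makes the tenant mapping easy to test without the Workers runtime. An unknown hostname never reaches any FI's code.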

Workers for Platforms · Dispatch Namespaces

3. Durable Object holds the conversation state

A single-threaded actor per call. Tracks turn history, authenticated identity, Operating Procedure state machine. Strongly consistent, geo-routed, persists if the caller switches networks.
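
To make the per-call state concrete, here is a plain TypeScript class of the kind a Durable Object could wrap. It is illustrative only: the step names, the `ConversationState` class, and the rule that steps after identity verification require authentication are all assumptions, not Posh's Operating Procedure model or the Durable Objects API.

```typescript
type Role = "caller" | "agent";

interface Turn {
  role: Role;
  text: string;
}

// Hypothetical Operating Procedure steps for one workflow.
type Step = "greet" | "verify_identity" | "collect_details" | "confirm" | "done";
const STEP_ORDER: Step[] = ["greet", "verify_identity", "collect_details", "confirm", "done"];

class ConversationState {
  private turns: Turn[] = [];
  private step: Step = "greet";
  private authenticated = false;

  addTurn(role: Role, text: string): void {
    this.turns.push({ role: role, text: text });
  }

  markAuthenticated(): void {
    this.authenticated = true;
  }

  // Jump to any step (skip ahead or loop back). Steps after
  // verify_identity are refused until the caller is authenticated.
  goTo(target: Step): boolean {
    const needsAuth = STEP_ORDER.indexOf(target) > STEP_ORDER.indexOf("verify_identity");
    if (needsAuth && !this.authenticated) return false;
    this.step = target;
    return true;
  }

  snapshot(): { step: Step; authenticated: boolean; turnCount: number } {
    return { step: this.step, authenticated: this.authenticated, turnCount: this.turns.length };
  }
}
```

Because one actor handles one call on a single thread, there is no locking around `goTo` or `addTurn`. In a Durable Object, the snapshot would be persisted through the Storage API so the call survives a reconnect.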

Durable Objects · Storage API

4. Workers AI transcribes the speech (Whisper) at the edge

Audio → text in <100ms at the same POP. No regional round-trip. Pay per second of audio, not provisioned GPU hours.

Workers AI · Whisper

5. Vectorize finds matching knowledge for this FI only

Semantic search across that FI's governed knowledge base. Per-tenant namespace isolation. Sub-30ms query latency. Returns the top-K passages REALM™ needs for the reasoning step.

Vectorize · R2 · Workers AI Embeddings

6. AI Gateway checks the semantic cache before any LLM call

Identical or semantically similar request from any other FI in the last N hours? Return the cached response. Otherwise route to your chosen LLM (OpenAI, Anthropic, Bedrock, Workers AI) with built-in rate limiting, retry, and fallback.
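
The idea behind a similarity cache can be sketched in memory. This is not how AI Gateway is implemented; it only illustrates the lookup, assuming requests are already embedded as vectors. The `scope` field, the 0.9 threshold used in the example, and all names here are hypothetical. The scope is what keeps an FI-specific answer from being served to another FI's member.

```typescript
interface CacheEntry {
  scope: string;       // "shared" for generic answers, or an FI id for tenant-specific ones
  embedding: number[]; // embedding of the cached request
  response: string;
  storedAtMs: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best in-scope, unexpired entry at or above the threshold, else null.
function lookupCache(
  entries: CacheEntry[],
  query: number[],
  fiId: string,
  nowMs: number,
  ttlMs: number,
  threshold: number,
): string | null {
  let best: CacheEntry | null = null;
  let bestScore = threshold;
  for (const entry of entries) {
    const inScope = entry.scope === "shared" || entry.scope === fiId;
    const fresh = nowMs - entry.storedAtMs <= ttlMs;
    if (!inScope || !fresh) continue;
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best === null ? null : best.response;
}
```

A `null` result means a cache miss, and the request goes on to the LLM. The threshold and TTL are the two knobs that trade hit rate against the risk of a stale or near-miss answer.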

AI Gateway · Semantic Cache · Multi-provider

7. The Worker enforces the Operating Procedure deterministically

LLM proposes; the Worker disposes. Operating Procedure code validates required disclosures, identity checks, regulatory language. The LLM never speaks before the policy layer approves.
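
"LLM proposes; the Worker disposes" can be pictured as a deterministic gate in front of the speech output. The sketch below is an assumption about shape, not Posh's Operating Procedure format: the `PolicyRule` fields, the substring checks, and the sample phrases are all hypothetical. A production policy layer would likely use stricter matching than case-insensitive substrings.

```typescript
interface PolicyRule {
  requiredDisclosures: string[];   // phrases that must appear in the response
  forbiddenPhrases: string[];      // phrases that must never appear
  requiresAuthentication: boolean; // whether the caller must be verified first
}

interface Draft {
  text: string;                 // the LLM's proposed response
  callerAuthenticated: boolean;
}

type Verdict =
  | { approved: true; text: string }
  | { approved: false; reasons: string[] };

// Deterministic check: same draft and rule always produce the same verdict.
function enforcePolicy(draft: Draft, rule: PolicyRule): Verdict {
  const reasons: string[] = [];
  const lower = draft.text.toLowerCase();

  if (rule.requiresAuthentication && !draft.callerAuthenticated) {
    reasons.push("caller not authenticated");
  }
  for (const disclosure of rule.requiredDisclosures) {
    if (lower.indexOf(disclosure.toLowerCase()) === -1) {
      reasons.push("missing disclosure: " + disclosure);
    }
  }
  for (const phrase of rule.forbiddenPhrases) {
    if (lower.indexOf(phrase.toLowerCase()) !== -1) {
      reasons.push("forbidden phrase: " + phrase);
    }
  }

  if (reasons.length === 0) return { approved: true, text: draft.text };
  return { approved: false, reasons: reasons };
}
```

A rejected draft carries its reasons, so the Worker can re-prompt the model, fall back to a scripted response, or escalate, and the audit log records why the draft was blocked.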

Workers · Code-as-policy

8. Response synthesized, logged, sent back as voice

Text → audio at the edge. Full request trace pushed to Logpush (S3 / GCS / BigQuery sink — your choice). Every turn is audit-ready. The customer hears a sub-second response and never knows there were eight layers behind it.

Workers AI TTS · Logpush · Workers Analytics Engine

Let's talk about what 500 FIs looks like.

125 customers is where the GCP-centric architecture still works. 500 is where the cache economics, voice latency, and per-tenant isolation start to dominate the P&L. A 30-minute conversation to map your roadmap to ours — no slides, no sales pitch, just the engineering math.

Book 30 min with Matt Holscher
Matt Holscher · Solutions Engineer · Cloudflare Developer Platform