Posh AI runs REALM™, Operating Procedures, and 300+ FI deployments across voice, digital, search, and outreach. That shape — multi-tenant governed reasoning, LLM-heavy, latency-sensitive, compliance-burdened — maps almost 1:1 to Workers for Platforms, AI Gateway, and Workers AI. Cloudflare didn't build these for Posh, but they may as well have.
Every Posh customer has its own Operating Procedures, knowledge base, voice persona, escalation policy, and compliance rules. 125+ tenants. Each one needs isolated reasoning, governed RAG, audit-ready execution, and sub-second response. That's not a banking problem. That's a multi-tenant agentic AI infrastructure problem — and it's the workload Cloudflare's developer platform was built for.
Your product runs on GCP (geographic-redundant Google zones — per your Trust page). Your marketing runs on Webflow. Your corp DNS is on AWS Route 53. Three clouds, three perimeters, three audit boundaries — to serve a workload that should sit behind one edge. That's not a critique; it's how every fast-growing SaaS starts. The question is what comes next as you scale from 125 FIs to 500.
Ranked by impact-per-effort for your specific workload shape.
Each FI gets its own dispatch namespace. Citadel's Operating Procedures, Vystar's escalation policy, Camden National's compliance language — fully isolated, instantly deployable, individually metered. No more "noisy neighbor" risk between tenants.
"What's my routing number?" "How do I reset my password?" — the same 100 questions repeat across every FI you serve. AI Gateway caches LLM responses at the edge with semantic similarity. You pay for one inference and serve thousands of customers. Cache hit rates of 40–60% are realistic.
Voice assistants live or die by time-to-first-token (TTFT). A 200ms regional round-trip to GCP is the difference between "AI feels human" and "AI feels broken." Workers AI runs inference at the same edge location your customer's phone connects to: Whisper for speech-to-text, Llama or another model from the Workers AI catalog for the response, all under 50ms TTFT.
REALM™ pulls from "verified institutional knowledge, not the open web." Each FI's knowledge base becomes a Vectorize index + R2 namespace — encrypted at rest, isolated per tenant, queryable in <30ms. Operating Procedures stay deterministic; retrieval stays governed; audit trail stays clean.
"Skip ahead, loop back, pivot mid-workflow without losing context" — that's Operating Procedures language for stateful agents. Durable Objects give each conversation its own single-threaded actor with strong consistency, automatic geographic routing, and zero session-affinity infrastructure to manage.
Audio recordings, transcripts, training data for Posh Simulator, knowledge documents. Move that off S3 / GCS and pay $0 per GB egress. For 300+ deployments generating call recordings 24/7, R2 typically pays for itself in month 2.
You're SOC 2 Type II and CSA STAR Level 1. Today that posture spans GCP + Webflow + AWS + your own controls. On Cloudflare's developer platform, knowledge → reasoning → execution → delivery sit inside one compliance perimeter (SOC 2 Type II, ISO 27001, PCI DSS, FedRAMP Moderate). Less to evidence in every FI vendor review.
www.posh.ai is on Webflow today. When you outgrow it (custom server-side logic, real A/B routing, in-page AI demos), Pages + Workers is the path. Same Git workflow, real edge compute on every route, no CMS lock-in. Not urgent — but it's there when you need it.
Every pillar Posh names on /platform has a clean Cloudflare primitive. Not "kind of" — exactly.
| Posh pillar / need | What it does | Cloudflare primitive |
|---|---|---|
| Knowledge | Governed RAG per FI, single source of truth, no open-web guesswork | Vectorize + R2 + Workers AI Embeddings |
| Reasoning (REALM™) | Contextual understanding, adaptive decisions, policy-bound LLM orchestration | AI Gateway + Workers AI + multi-provider routing |
| Control (Operating Procedures) | Deterministic guardrails, compliance rules, audit-ready execution | Workers for code-as-policy + Logpush for audit trail |
| Integrations | Plug into FIS / Jack Henry / Fiserv core systems, telephony, KMS | Workers + API Shield + mTLS per tenant |
| Security | SOC 2 Type II, CSA STAR, continuous monitoring, audit logs | WAF, Bot Management, Cloudflare One, FedRAMP-ready posture |
| Per-FI tenant isolation | 125+ FIs, each with its own OPs, knowledge, persona, escalation | Workers for Platforms dispatch namespaces |
| Voice TTFT | Sub-second response or the conversation breaks | Workers AI at edge + Durable Objects for session |
| Posh Outreach (proactive) | Initiate conversations on schedule across FIs | Cron Triggers + Queues + Workflows |
Drag the sliders. The compounding insight: when you serve N tenants asking similar questions, semantic caching scales with N. The bigger Posh gets, the better the math.
Assumes GPT-4-class pricing (~$10/M input tokens, $30/M output tokens). Adjust for your actual model mix.
Calculator is directional; actual cache-hit rates depend on FAQ overlap, prompt structure, and TTL config. AI Gateway also adds free observability, rate limiting, fallback routing, and request logging — none of which is priced into the chart above.
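For readers who want the arithmetic behind the calculator, here is a minimal sketch. It uses the token prices stated above and assumes, for simplicity, that a cache hit costs nothing and that every request has the same token counts. The workload shape (`requestsPerMonth`, token counts, hit rate) is hypothetical input, not Posh data.

```typescript
// Prices from the stated assumption: ~$10 per million input tokens, $30 per million output tokens.
const INPUT_USD_PER_MILLION = 10;
const OUTPUT_USD_PER_MILLION = 30;

interface InferenceWorkload {
  requestsPerMonth: number;
  inputTokensPerRequest: number;
  outputTokensPerRequest: number;
  cacheHitRate: number; // fraction between 0 and 1
}

interface MonthlyCostEstimate {
  uncachedUsd: number;
  withCacheUsd: number;
  savedUsd: number;
}

// Simplifying assumption: a cache hit costs $0 and a miss costs the full inference price.
function estimateMonthlyCost(workload: InferenceWorkload): MonthlyCostEstimate {
  if (workload.cacheHitRate < 0 || workload.cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  // Dollar cost of one request, scaled by one million to keep the math in whole numbers.
  const scaledCostPerRequest =
    workload.inputTokensPerRequest * INPUT_USD_PER_MILLION +
    workload.outputTokensPerRequest * OUTPUT_USD_PER_MILLION;
  const uncachedUsd = (workload.requestsPerMonth * scaledCostPerRequest) / 1_000_000;
  const withCacheUsd = uncachedUsd * (1 - workload.cacheHitRate);
  return { uncachedUsd, withCacheUsd, savedUsd: uncachedUsd - withCacheUsd };
}
```

With a hypothetical 1,000,000 requests per month at 1,000 input tokens and 200 output tokens each, this comes to $16,000 uncached and $8,000 at a 50% hit rate.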
Follow a single voice call from a credit-union member along the full path.
WebRTC or SIP termination at the Cloudflare POP nearest the caller. WAF, bot mitigation, and DDoS protection applied at L3/L4/L7 before a single token is generated.
Hostname → dispatch namespace lookup. Citadel's worker, Vystar's worker, Camden National's worker — each isolated, each with its own Operating Procedures bundle. Zero noisy-neighbor risk.
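The lookup step can be sketched as a plain function. The hostnames and script names below are invented for illustration, and a real deployment would more likely keep the mapping in KV or a database. In a dispatch Worker, the resolved name would then be handed to the dispatch namespace binding, along the lines of `env.DISPATCHER.get(scriptName).fetch(request)`, where `DISPATCHER` is an assumed binding name.

```typescript
// Hypothetical hostname-to-script table; every entry here is made up for the example.
const TENANT_SCRIPTS: Record<string, string> = {
  "citadel.example.com": "tenant-citadel",
  "vystar.example.com": "tenant-vystar",
  "camden.example.com": "tenant-camden",
};

// Normalizes a Host header value: trims, lowercases, drops a port and a trailing dot.
// IPv6 literals are not handled in this sketch.
function normalizeHost(hostHeader: string): string {
  const withoutPort = hostHeader.trim().toLowerCase().split(":")[0];
  return withoutPort.replace(/\.$/, "");
}

// Returns the tenant's script name, or undefined when the hostname is not a known tenant.
function resolveTenantScript(hostHeader: string): string | undefined {
  const host = normalizeHost(hostHeader);
  // hasOwnProperty guards against inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(TENANT_SCRIPTS, host)
    ? TENANT_SCRIPTS[host]
    : undefined;
}
```

Returning `undefined` for unknown hosts lets the caller decide between a 404 and a default tenant, rather than silently routing traffic to the wrong FI.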
A single-threaded actor per call. Tracks turn history, authenticated identity, Operating Procedure state machine. Strongly consistent, geo-routed, persists if the caller switches networks.
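A minimal sketch of the per-call state such an actor might hold. This is plain TypeScript, not the Durable Objects API, and the step names and fields are hypothetical. In practice one instance would exist per call, addressed by something like the call ID, so the state is tied to the call rather than to a network connection.

```typescript
interface CallTurn {
  speaker: "caller" | "assistant";
  text: string;
}

interface CallSnapshot {
  currentStep: string;
  turnCount: number;
  collected: Record<string, string>;
}

class CallSession {
  private steps: string[];
  private currentStep: string;
  private turns: CallTurn[] = [];
  private collected: Record<string, string> = {};

  constructor(steps: string[]) {
    if (steps.length === 0) {
      throw new Error("a procedure needs at least one step");
    }
    this.steps = steps.slice();
    this.currentStep = steps[0];
  }

  addTurn(turn: CallTurn): void {
    this.turns.push(turn);
  }

  // Stores a fact gathered during the call, such as a verified member ID.
  record(field: string, value: string): void {
    this.collected[field] = value;
  }

  // Moves forward or backward to any known step; collected context is left untouched.
  jumpTo(step: string): void {
    if (this.steps.indexOf(step) === -1) {
      throw new Error(`unknown step: ${step}`);
    }
    this.currentStep = step;
  }

  snapshot(): CallSnapshot {
    return {
      currentStep: this.currentStep,
      turnCount: this.turns.length,
      collected: { ...this.collected },
    };
  }
}
```

Because jumps only change `currentStep`, a caller can skip ahead or loop back without the session forgetting what was already verified.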
Audio → text in <100ms at the same POP. No regional round-trip. Pay per second of audio, not provisioned GPU hours.
Semantic search across that FI's governed knowledge base. Per-tenant namespace isolation. Sub-30ms query latency. Returns the top-K passages REALM™ needs for the reasoning step.
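To make the isolation property concrete, here is a small in-memory sketch of tenant-scoped top-K retrieval. It is not the Vectorize API. The passage shape and the `tenantId` field are assumptions, and embeddings are assumed to be unit-normalized so that a dot product ranks the same way cosine similarity would.

```typescript
interface KnowledgePassage {
  tenantId: string;
  text: string;
  embedding: number[]; // assumed unit-normalized
}

function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embeddings must be equal in length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Filters to one tenant first, then ranks by similarity and returns the top K texts.
function topKForTenant(
  passages: KnowledgePassage[],
  tenantId: string,
  queryEmbedding: number[],
  k: number
): string[] {
  return passages
    .filter((passage) => passage.tenantId === tenantId)
    .map((passage) => ({
      text: passage.text,
      score: dotProduct(queryEmbedding, passage.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(0, k))
    .map((scored) => scored.text);
}
```

Filtering by tenant before scoring means another FI's passage can never appear in the results, however similar it is to the query.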
Has an identical or semantically similar request arrived from any other FI in the last N hours? Return the cached response. Otherwise, route to your chosen LLM (OpenAI, Anthropic, Bedrock, Workers AI) with built-in rate limiting, retry, and fallback.
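The cache-then-route idea can be sketched in a few lines. This is a standalone illustration of the technique, not AI Gateway's API or configuration. Embeddings are passed in by the caller, providers are modeled as synchronous functions that throw on failure (real calls would be asynchronous), and the TTL window, retries, and rate limiting are left out.

```typescript
interface CachedAnswer {
  embedding: number[];
  answer: string;
}

// Hypothetical provider shape: returns text or throws.
type ProviderCall = (prompt: string) => string;

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("embeddings must be non-empty and equal in length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class CachingRouter {
  private entries: CachedAnswer[] = [];
  private threshold: number;
  private providers: ProviderCall[];

  constructor(threshold: number, providers: ProviderCall[]) {
    this.threshold = threshold;
    this.providers = providers;
  }

  // Returns the closest cached answer if it clears the threshold, else undefined.
  private lookup(queryEmbedding: number[]): string | undefined {
    let bestAnswer: string | undefined;
    let bestScore = -Infinity;
    for (const entry of this.entries) {
      const score = cosineSimilarity(queryEmbedding, entry.embedding);
      if (score > bestScore) {
        bestScore = score;
        bestAnswer = entry.answer;
      }
    }
    return bestScore >= this.threshold ? bestAnswer : undefined;
  }

  // Serves from cache when possible; otherwise tries providers in order and caches the first success.
  answer(prompt: string, queryEmbedding: number[]): { text: string; source: "cache" | "provider" } {
    const cached = this.lookup(queryEmbedding);
    if (cached !== undefined) {
      return { text: cached, source: "cache" };
    }
    const failures: string[] = [];
    for (const call of this.providers) {
      try {
        const text = call(prompt);
        this.entries.push({ embedding: queryEmbedding, answer: text });
        return { text, source: "provider" };
      } catch (err) {
        failures.push(err instanceof Error ? err.message : String(err));
      }
    }
    throw new Error(`all providers failed: ${failures.join("; ")}`);
  }
}
```

One design point worth weighing: answers that differ per FI (a routing number, for instance) would need the tenant included in the cache key, while generic how-to answers are the ones that can safely be shared across FIs.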
LLM proposes; the Worker disposes. Operating Procedure code validates required disclosures, identity checks, regulatory language. The LLM never speaks before the policy layer approves.
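A minimal sketch of what "the Worker disposes" could look like as code, assuming policy is expressed as required phrases and prohibited patterns. The rule shape, rule IDs, and example wording are hypothetical, not Posh's actual Operating Procedures format.

```typescript
interface PolicyRule {
  id: string;
  requiredPhrase?: string; // must appear in the draft (case-insensitive)
  forbiddenPattern?: RegExp; // must not match the draft; use non-global patterns
}

interface PolicyVerdict {
  approved: boolean;
  violations: string[];
}

// Checks an LLM draft against every rule; the draft is approved only if nothing is violated.
function reviewDraft(draft: string, rules: PolicyRule[]): PolicyVerdict {
  const normalizedDraft = draft.toLowerCase();
  const violations: string[] = [];
  for (const rule of rules) {
    if (
      rule.requiredPhrase !== undefined &&
      normalizedDraft.indexOf(rule.requiredPhrase.toLowerCase()) === -1
    ) {
      violations.push(`${rule.id}: missing required language`);
    }
    if (rule.forbiddenPattern !== undefined && rule.forbiddenPattern.test(draft)) {
      violations.push(`${rule.id}: contains prohibited language`);
    }
  }
  return { approved: violations.length === 0, violations };
}
```

The violations list doubles as audit evidence: each rejected draft records which rule blocked it before anything reached the caller.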
Text → audio at the edge. The full request trace is exported through Logpush to the sink of your choice (S3, GCS, or BigQuery). Every turn is audit-ready. The customer hears a sub-second response and never knows there were eight layers behind it.
125 customers is where the GCP-centric architecture still works. 500 is where the cache economics, voice latency, and per-tenant isolation start to dominate the P&L. A 30-minute conversation to map your roadmap to ours — no slides, no sales pitch, just the engineering math.
Book 30 min with Matt Holscher →