The pitch on posh.tech is clear: "bank-savvy AI" for credit unions and community banks. Every Karina call involves several model invocations: speech-to-text, intent classification, RAG against the financial institution's policy corpus, and response generation. All of it runs on PII-regulated data, and each financial institution (FI) has its own cost story.
Each FI you onboard is a new tenant with its own policy corpus, brand voice, call-volume curve, and compliance posture. Multi-tenant AI at scale isn't a feature; it's the foundation, and that foundation is what Cloudflare's developer platform is built for.
Three places the developer platform maps directly to how Posh delivers Karina:
AI Gateway, in the path of every Karina call: per-FI spend caps, PII redaction before the LLM sees account data, and one unified log across OpenAI, Anthropic, and any self-hosted model. It answers the question "which credit union spent what last quarter?"
Vectorize, for per-FI knowledge retrieval: 125+ separate policy corpora, each searched fast enough to keep up with a live conversation.
Workers + Durable Objects, for multi-tenant orchestration: per-FI session state, per-conversation context, and regional data residency wherever an FI's compliance posture requires it.
Is the bigger near-term pain cost attribution (answering each FI's "what's my AI spend this quarter?" with confidence) or multi-tenant scale (onboarding the next 100 FIs without the inference math falling apart)? Twenty minutes would be enough to find the right starting point.
The expanded version below has the detailed primitive-by-primitive mapping, including the eight things Cloudflare changes for Posh AI, the request-flow diagram for a Karina call on Cloudflare, the AI Gateway cache math for 125+ FIs (with an interactive calculator), and the path to 500 FIs.