When we designed JieGou’s LLM layer, we had a constraint most platforms don’t: customers should be able to use their own API keys, and we should never see the plaintext keys at rest. This post covers how our Bring Your Own Key (BYOK) system works, the encryption scheme, the provider routing architecture, and the guardrails that keep everything running reliably.
Why BYOK Matters
Most AI platforms proxy your calls through their own API keys. That means your data flows through their accounts, your usage is subject to their rate limits, and you have no control over which models or endpoints are used.
With BYOK, each customer connects their own Anthropic, OpenAI, or Google API keys. Calls go directly to the provider using the customer’s credentials. JieGou orchestrates the workflow but doesn’t see the request or response payloads when BYOK keys are in use.
Key Encryption: AES-256-GCM
API keys are encrypted at rest using AES-256-GCM with a 12-byte initialization vector and 16-byte authentication tag. The encryption key is a 256-bit value decoded from a 64-character hex string stored as an environment variable — it never touches the database.
The storage format is a Base64-encoded concatenation of IV + authTag + ciphertext. We store an 8-character safe prefix alongside for display purposes (“sk-proj…”) so users can identify which key is stored without decrypting it.
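The scheme above can be sketched with Node's built-in crypto module. This is illustrative, not our actual code: the environment variable name and function names are hypothetical, and the zero-filled fallback key exists only so the sketch is self-contained.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical env var name; a 64-char hex string decodes to the 32-byte AES-256 key.
const KEY = Buffer.from(process.env.BYOK_ENCRYPTION_KEY ?? "00".repeat(32), "hex");

// Storage format: Base64( IV(12 bytes) + authTag(16 bytes) + ciphertext )
function encryptApiKey(plaintext: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", KEY, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptApiKey(blob: string): string {
  const buf = Buffer.from(blob, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ciphertext = buf.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", KEY, iv);
  decipher.setAuthTag(tag); // GCM is authenticated: a tampered blob throws on final()
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// 8-character display prefix stored alongside the blob, e.g. "sk-proj…"
function safePrefix(key: string): string {
  return key.slice(0, 8);
}
```

GCM gives both confidentiality and integrity here: a flipped bit anywhere in the stored blob fails authentication at decrypt time rather than yielding a corrupted key.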
Keys are stored in a Firestore account_api_keys collection with fields for the account ID, provider name, encrypted key blob, and a validity flag.
Key Resolution Flow
When a workflow step needs to call an LLM, the key resolver runs through this sequence:
- Plan gating — Check whether the account’s subscription allows BYOK (cached in Redis with a 10-minute TTL).
- Redis cache lookup — Decrypted keys are cached for 5 minutes. A sentinel value (__none__) indicates a key was previously looked up and doesn’t exist, avoiding repeated Firestore reads.
- Firestore lookup — If the cache misses, fetch from account_api_keys.
- Validity check — Skip keys that have been marked invalid by the auto-invalidation system.
- Decryption — AES-256-GCM decryption happens on the fly, in memory.
- Fail-open — If anything goes wrong at any step, fall back to the platform’s own API key. Never degrade the user experience.
The fail-open design is deliberate. If Redis is down, if decryption fails, if the Firestore read errors — the workflow still runs, just using platform keys. Users see their work complete rather than getting a cryptic error.
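The sequence condenses to a sketch like the following. This is a synchronous simplification (the real resolver is async against Redis and Firestore), and every name here is hypothetical; a Map stands in for the Redis cache.

```typescript
const NONE = "__none__"; // sentinel: "we already checked, no key exists"

interface Deps {
  cache: Map<string, string>;                  // stands in for Redis (5-min TTL)
  firestoreGet: (account: string, provider: string) => string | null; // encrypted blob
  decrypt: (blob: string) => string;           // AES-256-GCM, in memory
  platformKey: string;                         // fail-open fallback
}

function resolveApiKey(account: string, provider: string, d: Deps): string {
  const cacheKey = `${account}:${provider}`;
  try {
    const cached = d.cache.get(cacheKey);
    if (cached === NONE) return d.platformKey; // negative cache: known-absent key
    if (cached) return cached;

    const blob = d.firestoreGet(account, provider);
    if (!blob) {
      d.cache.set(cacheKey, NONE);             // avoid repeated Firestore reads
      return d.platformKey;
    }
    const key = d.decrypt(blob);
    d.cache.set(cacheKey, key);
    return key;
  } catch {
    return d.platformKey;                      // fail-open on any error
  }
}
```

Note that the fail-open fallback is the same code path as "no key exists": the platform key is the answer whenever BYOK can't produce one.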
Provider Routing
The LLM layer is built on the Vercel AI SDK with a provider abstraction that supports Anthropic, OpenAI, and Google. Each provider has two instantiation paths:
Platform key (singleton) — A shared instance created at startup, used for free-tier accounts or as the BYOK fallback.
BYOK (ephemeral) — A new provider instance created per call with the customer’s decrypted key. This instance is not cached — it’s used for one request and discarded, so decrypted keys don’t linger in memory.
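The two instantiation paths look roughly like this. The factory below is a stand-in for the Vercel AI SDK's per-provider constructors (e.g. createAnthropic({ apiKey })); names and shapes here are illustrative.

```typescript
type Provider = { apiKey: string };

// Stand-in for an SDK constructor such as createAnthropic({ apiKey }).
const providerFactory = (apiKey: string): Provider => ({ apiKey });

// Platform path: one shared instance, created at startup.
const platformProvider = providerFactory("platform-key"); // placeholder key

// BYOK path: a fresh instance per call, discarded after the request,
// so the decrypted key doesn't linger in a long-lived object.
function providerForCall(byokKey: string | null): Provider {
  return byokKey ? providerFactory(byokKey) : platformProvider;
}
```

The asymmetry is the point: the platform instance is safe to share because its key is the platform's own, while customer keys live only as long as the call that needed them.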
Workflows can specify different models per step. A content pipeline might use Claude for nuanced writing in step 1 and GPT for structured data extraction in step 2. The provider routing handles this transparently.
Auto-Invalidation
When an LLM call returns an authentication error (HTTP 401, 402, or 403, or response bodies matching patterns like “invalid api key” or “unauthorized”), the system automatically marks that key as invalid in Firestore and evicts it from the Redis cache.
The user gets a clear message: “Your {Provider} API key is invalid or has been revoked. Please update your API key in Account Settings.” Subsequent calls fall back to platform keys until the user provides a new key.
We check against 11 known error patterns across providers. This catches rotated keys, revoked keys, and keys that have exceeded their spending limits.
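The detection boils down to a status-code set plus a pattern list. A minimal sketch, showing only the patterns named in this post (the real list has 11):

```typescript
const AUTH_STATUS = new Set([401, 402, 403]);
const AUTH_PATTERNS = [/invalid api key/i, /unauthorized/i]; // subset for illustration

// True when a provider response should invalidate the stored key.
function isAuthError(status: number, body: string): boolean {
  return AUTH_STATUS.has(status) || AUTH_PATTERNS.some((p) => p.test(body));
}
```

Matching on body text as well as status matters because some providers return auth failures wrapped in 200- or 400-series responses with an explanatory message.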
Circuit Breaker
Each LLM provider has a per-provider circuit breaker (not per-account — a single provider being down affects everyone).
The breaker trips after 5 errors within a 60-second window. Once open, it stays open for 30 seconds before allowing a single probe request (half-open state). If the probe succeeds, the circuit closes.
Only server-side failures count: 5xx responses, timeouts, and connection errors. Client errors (4xx) like invalid API key don’t trip the breaker — those are handled by auto-invalidation instead.
The circuit breaker itself is fail-open. If Redis is unavailable for state tracking, all requests are allowed through. This is consistent with our overall philosophy: when infrastructure is degraded, let the user’s work proceed.
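The state machine described above can be sketched in memory (the production version tracks state in Redis so it's shared across processes); the class and method names are hypothetical, but the thresholds are the ones from this post.

```typescript
class CircuitBreaker {
  private errors: number[] = [];      // timestamps of recent server-side failures
  private openedAt: number | null = null;
  private probing = false;

  constructor(
    private threshold = 5,            // errors to trip
    private windowMs = 60_000,        // rolling error window
    private cooldownMs = 30_000,      // open duration before half-open
  ) {}

  allowRequest(now = Date.now()): boolean {
    if (this.openedAt === null) return true;                 // closed
    if (now - this.openedAt < this.cooldownMs) return false; // open
    if (this.probing) return false;                          // half-open: one probe only
    this.probing = true;
    return true;
  }

  // Call only for 5xx, timeouts, and connection errors; 4xx never trips the breaker.
  recordFailure(now = Date.now()): void {
    this.probing = false;
    this.errors = this.errors.filter((t) => now - t < this.windowMs);
    this.errors.push(now);
    if (this.errors.length >= this.threshold) this.openedAt = now;
  }

  recordSuccess(): void {             // a successful probe closes the circuit
    this.openedAt = null;
    this.probing = false;
    this.errors = [];
  }
}
```

One breaker instance exists per provider, so a burst of Anthropic 5xx errors stops traffic to Anthropic without touching OpenAI or Google calls.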
Concurrency Control
Each account is limited to 10 concurrent LLM calls via a Redis-based semaphore. This prevents a single account’s large batch run from consuming all available connections to a provider.
The semaphore uses INCR/DECR with a 5-minute TTL safety net (if a process crashes without decrementing, the counter auto-expires). On capacity exceeded, the counter is immediately decremented to avoid a leak. Like everything else, it’s fail-open on Redis errors.
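The INCR/DECR semaphore can be sketched against a minimal Redis-like interface; the interface, key naming, and function are all illustrative rather than our actual client code.

```typescript
interface RedisLike {
  incr(key: string): number;
  decr(key: string): number;
  expire(key: string, seconds: number): void;
}

const MAX_CONCURRENT = 10;

// Returns a release function on success, or null when the account is at capacity.
function acquireSlot(redis: RedisLike, accountId: string): (() => void) | null {
  const key = `llm:sem:${accountId}`; // hypothetical key scheme
  try {
    const count = redis.incr(key);
    redis.expire(key, 300);           // 5-min safety net against crashed processes
    if (count > MAX_CONCURRENT) {
      redis.decr(key);                // immediately undo the increment to avoid a leak
      return null;
    }
    return () => redis.decr(key);     // caller releases the slot when the LLM call ends
  } catch {
    return () => {};                  // fail-open: Redis errors admit the request
  }
}
```

Incrementing first and decrementing on rejection keeps the operation atomic enough for this purpose without a Lua script: the counter can briefly overshoot, but never undercounts.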
Lessons Learned
Fail-open is the right default for optional features. BYOK is an enhancement, not a requirement. If the encryption pipeline, cache layer, or key resolution fails, the user should still be able to run their workflow. We log the failure for investigation but never block execution.
Cache sentinel values prevent thundering herds. Without the __none__ sentinel, an account without BYOK keys would hit Firestore on every single LLM call. Caching the negative result for 5 minutes keeps Firestore reads predictable.
Ephemeral provider instances prevent key leakage. By creating a new provider instance per call and letting it get garbage collected, we minimize the window where a decrypted key exists in memory. It’s not as strong as a hardware security module, but it’s a meaningful reduction in attack surface for a SaaS application.
Per-provider circuit breakers with per-account concurrency is the right granularity. Provider outages are global events; concurrency is a per-tenant concern. Swapping the granularities would be either too aggressive (a per-provider concurrency cap would let one account's batch run starve everyone) or too lenient (a per-account circuit breaker would never trip on a provider-wide outage).