Bring Your Own Model: How JieGou Supports Every LLM From Claude to Llama

How JieGou's multi-provider architecture lets you run Claude, GPT-5, Gemini, and open-source models like Llama 4 from a single platform — with per-step selection, auto-discovery, and zero-knowledge key encryption.

JieGou Team · 7 min read

Every AI automation platform claims “multi-model support.” In practice, that usually means you can switch between GPT-4o and GPT-5 in a settings dropdown. Maybe Claude is listed too. If you want to run an open-source model, you’re on your own.

JieGou takes a different approach. We built a universal model layer that treats every LLM — cloud-hosted or self-hosted, proprietary or open-source — as a first-class citizen. This post explains how it works and why it matters.

Four provider tiers in one platform

Tier 1: Cloud providers with BYOK

Bring your own API keys for Anthropic (Claude Sonnet 4.6, Haiku 4.5, Opus 4.6), OpenAI (GPT-5.2, GPT-5-mini, GPT-5-nano, o3, o4-mini), and Google (Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro/Flash).

Your keys are encrypted with AES-256-GCM using per-account derived keys via HKDF-SHA256. They’re decrypted in-memory only during execution and never stored in plaintext. You can also use platform-provided keys on the free tier to get started without entering any credentials.

Tier 2: Certified open-source models

We’ve tested four open-source models end-to-end on vLLM and certified them for full JieGou compatibility — including tool calling, structured JSON output, and recipe execution:

| Model | Parameters | Tool Calling | Structured Output | Vision | Context |
| --- | --- | --- | --- | --- | --- |
| Llama 4 Maverick | 400B+ MoE | Yes | Yes | Yes | 1M tokens |
| DeepSeek V3.2 | 671B MoE | Yes | Yes | No | 128K tokens |
| Qwen 3 235B | 235B MoE | Yes | Yes | No | 128K tokens |
| Mistral 3 Large | 123B dense | Yes | Yes | Yes | 128K tokens |

“Certified” means we’ve run thousands of recipe executions against these models, verified that tool calling and structured output work correctly, and documented the compatibility level. You can deploy them with confidence.

Tier 3: Community models

Any model accessible via an OpenAI-compatible API works with JieGou. We haven't certified these models, so they carry a "community" tier label, but the integration is identical. If it speaks the OpenAI API format, JieGou can use it.

Tier 4: Auto-discovered local models

JieGou probes for local inference servers at startup:

  1. http://ollama:11434 (Docker Compose service name)
  2. http://localhost:11434 (local Ollama)
  3. http://localhost:8000 (local vLLM)
  4. The OLLAMA_BASE_URL environment variable

When it finds a server, it queries the model list and makes those models available in the model picker. No manual configuration needed. The discovery result is cached for 5 minutes to avoid hammering your inference server.
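The probe order and the five-minute cache can be sketched as follows. This is illustrative, not JieGou's actual implementation; the injected `probe` callback and the cache shape are assumptions.

```python
import time

# Candidate endpoints, in the probe order listed above.
DEFAULT_ENDPOINTS = [
    "http://ollama:11434",     # Docker Compose service name
    "http://localhost:11434",  # local Ollama
    "http://localhost:8000",   # local vLLM
]

CACHE_TTL_SECONDS = 300  # discovery result cached for 5 minutes

_cache = {"expires_at": 0.0, "models": []}

def discover_models(probe, endpoints=DEFAULT_ENDPOINTS, now=time.monotonic):
    """Return the model list from the first endpoint that responds.

    `probe(url)` should return a list of model names, or raise OSError
    when no server answers at that address. Results are cached so
    repeated calls don't hammer the inference server.
    """
    if now() < _cache["expires_at"]:
        return _cache["models"]
    models = []
    for url in endpoints:
        try:
            models = probe(url)
            break  # first responsive server wins
        except OSError:
            continue  # nothing listening here; try the next address
    _cache["models"] = models
    _cache["expires_at"] = now() + CACHE_TTL_SECONDS
    return models
```

Injecting the `probe` callback keeps the discovery logic testable without a live Ollama or vLLM server.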

Per-step model selection

This is the feature that makes multi-provider support actually useful, rather than a checkbox on a comparison chart.

In a JieGou workflow, every step can use a different model. A typical setup:

| Workflow Step | Task | Model | Why |
| --- | --- | --- | --- |
| 1. Research | Deep competitive analysis | Claude Opus 4.6 | Best reasoning quality |
| 2. Classify | Categorize findings | GPT-5-nano | Fast and cheap for classification |
| 3. Extract | Pull structured data | Llama 4 Maverick | High volume at lowest cost |
| 4. Summarize | Write executive brief | Claude Sonnet 4.6 | Strong writing quality |
| 5. Translate | Localize to 5 languages | Qwen 3 235B | Best multilingual performance |

The same flexibility applies to recipes (each recipe has its own model setting), conversations (pick a model per chat), and batch runs (the selected model applies to all rows).
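A per-step setup like the one above can be expressed as plain data. The field names below are illustrative, not JieGou's actual workflow schema:

```python
# Hypothetical workflow definition; field names and model IDs are
# illustrative, not JieGou's real schema.
workflow = {
    "name": "competitive-brief",
    "steps": [
        {"id": "research",  "model": "claude-opus-4-6"},
        {"id": "classify",  "model": "gpt-5-nano"},
        {"id": "extract",   "model": "llama-4-maverick"},
        {"id": "summarize", "model": "claude-sonnet-4-6"},
        {"id": "translate", "model": "qwen-3-235b"},
    ],
}

def model_for_step(workflow, step_id, default="claude-sonnet-4-6"):
    """Resolve a step's model, falling back to a workflow-wide default."""
    for step in workflow["steps"]:
        if step["id"] == step_id:
            return step.get("model", default)
    raise KeyError(step_id)
```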

Model recommendation engine

Choosing the right model for every task sounds powerful but also complex. The recommendation engine makes it practical.

After 10+ runs of a recipe, the engine has enough data to score every model you’ve used:

score = successRate × 0.5 + costEfficiency × 0.3 + speed × 0.2

It looks at the last 60 days of execution history and compares:

  • Success rate — what percentage of runs completed without errors
  • Cost efficiency — cost per successful run (lower is better)
  • Speed — average execution duration (faster is better)

If your current model has ≥90% success rate across 10+ runs, the engine confirms it’s a good choice. Otherwise, it recommends the highest-scoring alternative with full metrics so you can make an informed switch.
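The formula can be made concrete with one assumption the post leaves implicit: cost and speed must be normalized so that higher is better before the weights apply. A minimal sketch, normalizing against the most expensive and slowest candidate:

```python
def score_model(success_rate, cost_per_success, avg_seconds,
                max_cost, max_seconds):
    """Blend the three 60-day metrics into one score.

    success_rate is 0..1. Cost and speed are normalized against the
    worst candidate so that lower raw values score higher; the
    normalization scheme here is an assumption, the post only gives
    the weights.
    """
    cost_efficiency = 1.0 - (cost_per_success / max_cost) if max_cost else 1.0
    speed = 1.0 - (avg_seconds / max_seconds) if max_seconds else 1.0
    return success_rate * 0.5 + cost_efficiency * 0.3 + speed * 0.2
```

Given equal success rates, the cheaper and faster model scores higher, which is the behavior the weights are meant to encode.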

For rigorous comparison, you can run a bakeoff — a head-to-head evaluation with LLM-as-judge scoring and 95% confidence intervals. Bakeoffs can compare any two models, any two recipes, or any two workflows.

Enterprise resilience

Running production workloads across multiple providers requires more than API key management. JieGou includes three resilience layers:

Circuit breakers

Each provider gets its own circuit breaker. If 5 calls fail within 60 seconds, the circuit opens — subsequent calls fail fast instead of timing out. After 30 seconds, the circuit enters half-open state and sends a probe request. If it succeeds, the circuit closes and traffic resumes.
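Those thresholds translate into a small state machine. The sketch below uses an injectable clock for testability; it is a simplified model, not JieGou's production breaker:

```python
import time

class CircuitBreaker:
    """Minimal breaker matching the thresholds above: open after
    5 failures inside a 60-second window, probe again after 30 seconds."""

    FAILURE_THRESHOLD = 5
    WINDOW = 60.0     # seconds over which failures are counted
    COOLDOWN = 30.0   # seconds before entering half-open state

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.failures = []     # timestamps of recent failures
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.COOLDOWN:
            return True        # half-open: allow a probe request
        return False           # open: fail fast instead of timing out

    def record_success(self):
        self.failures.clear()
        self.opened_at = None  # probe succeeded: close the circuit

    def record_failure(self):
        now = self.clock()
        self.failures = [t for t in self.failures if now - t < self.WINDOW]
        self.failures.append(now)
        if len(self.failures) >= self.FAILURE_THRESHOLD:
            self.opened_at = now
```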

For OpenAI-compatible providers, circuit breakers are scoped per account (each customer may have a different endpoint). Cloud providers share a global circuit breaker.

Critically, circuit breakers are fail-open — if Redis is down and we can’t check the circuit state, we let the call through. This means a monitoring failure never blocks your workflows.

Concurrency limits

A global semaphore limits concurrent LLM calls per account to prevent runaway usage. The limit scales with your plan:

| Plan Tier | Global Capacity Share | Per-Account Max |
| --- | --- | --- |
| Enterprise | 100% (150 slots) | 10 concurrent |
| Pro | 83% (125 slots) | 10 concurrent |
| Starter | 67% (100 slots) | 10 concurrent |

Cost tracking

Every LLM call records token usage and estimated cost. When you use BYOK, the cost is tracked separately — it shows up in your analytics dashboard but doesn’t count toward platform usage limits, since you’re paying your provider directly.

The cost estimator uses historical averages from your last 20 successful runs to project costs before you execute. You can see expected spend per recipe, per workflow step, and per batch run.
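The projection itself can be as simple as a trailing mean over successful runs. This sketch assumes a per-run record with `cost` and `success` fields, which is an illustrative shape rather than JieGou's stored schema:

```python
def project_cost(history, window=20):
    """Project the next run's cost as the mean of the last `window`
    successful runs (the post says 20). Failed runs are excluded."""
    recent = [run["cost"] for run in history if run["success"]][-window:]
    if not recent:
        return None  # no successful runs yet; nothing to project
    return sum(recent) / len(recent)
```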

Zero-knowledge key architecture

JieGou never sees your API keys in plaintext at rest. The encryption pipeline:

  1. Root key loaded from Secret Manager or environment variable (64-character hex)
  2. Per-account key derived via HKDF-SHA256: HKDF(rootKey, "", "jiegou-byok-envelope-v1:{accountId}", 32)
  3. Encryption: AES-256-GCM with random 12-byte IV and 16-byte auth tag
  4. Storage: Only the ciphertext + IV + auth tag are stored in Firestore
  5. Decryption: Happens in-memory at execution time, never persisted
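Steps 1 and 2 can be sketched with the Python standard library. The HKDF implementation follows RFC 5869, and the info-string format mirrors step 2; the AES-256-GCM encryption in step 3 would use a crypto library and is omitted here:

```python
import hashlib
import hmac

def hkdf_sha256(root_key: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF with SHA-256 (extract-then-expand).

    With an empty salt, extract uses a zero-filled key of hash
    length, as the RFC specifies.
    """
    hash_len = hashlib.sha256().digest_size
    prk = hmac.new(salt or b"\x00" * hash_len, root_key, hashlib.sha256).digest()
    okm, block = b"", b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def derive_account_key(root_key_hex: str, account_id: str) -> bytes:
    # Step 2 above: a 32-byte key bound to the account ID, so one
    # account's key can never decrypt another account's envelope.
    info = f"jiegou-byok-envelope-v1:{account_id}".encode()
    return hkdf_sha256(bytes.fromhex(root_key_hex), b"", info, 32)
```

Because the account ID is baked into the derivation info, rotating the root key or migrating an account re-derives every envelope key deterministically.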

Key rotation is supported — the system can migrate from the legacy global encryption scheme to per-account envelope encryption without downtime.

If an API call returns 401 or 403, the system automatically marks the key as invalid and surfaces a clear error. You can re-validate or replace the key from the settings page.

Getting started

  1. Free tier: Use platform-provided keys for Anthropic, OpenAI, and Google — no credentials needed
  2. BYOK: Go to Settings > API Keys, add your provider keys, and they’re encrypted immediately
  3. Open source: Enter a custom base URL (e.g., http://your-vllm-server:8000/v1) and model name
  4. Auto-discovery: If Ollama or vLLM is running locally, models appear automatically

Multi-provider model access is available on all plans. OpenAI-compatible endpoints and the model recommendation engine are available on Pro and above. Certified model registry and auto-discovery are Enterprise features.

Explore multi-provider model support or start your free trial.

Tags: byom · byok · multi-provider · open-source · llama · deepseek · vllm · ollama · model-selection