Every AI automation platform claims “multi-model support.” In practice, that usually means you can switch between GPT-4o and GPT-5 in a settings dropdown. Maybe Claude is listed too. If you want to run an open-source model, you’re on your own.
JieGou takes a different approach. We built a universal model layer that treats every LLM — cloud-hosted or self-hosted, proprietary or open-source — as a first-class citizen. This post explains how it works and why it matters.
Four provider tiers in one platform
Tier 1: Cloud providers with BYOK
Bring your own API keys for Anthropic (Claude Sonnet 4.6, Haiku 4.5, Opus 4.6), OpenAI (GPT-5.2, GPT-5-mini, GPT-5-nano, o3, o4-mini), and Google (Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro/Flash).
Your keys are encrypted with AES-256-GCM using per-account derived keys via HKDF-SHA256. They’re decrypted in-memory only during execution and never stored in plaintext. You can also use platform-provided keys on the free tier to get started without entering any credentials.
Tier 2: Certified open-source models
We’ve tested four open-source models end-to-end on vLLM and certified them for full JieGou compatibility — including tool calling, structured JSON output, and recipe execution:
| Model | Parameters | Tool Calling | Structured Output | Vision | Context |
|---|---|---|---|---|---|
| Llama 4 Maverick | 400B+ MoE | Yes | Yes | Yes | 1M tokens |
| DeepSeek V3.2 | 671B MoE | Yes | Yes | No | 128K tokens |
| Qwen 3 235B | 235B MoE | Yes | Yes | No | 128K tokens |
| Mistral 3 Large | 123B dense | Yes | Yes | Yes | 128K tokens |
“Certified” means we’ve run thousands of recipe executions against these models, verified that tool calling and structured output work correctly, and documented the compatibility level. You can deploy them with confidence.
Tier 3: Community models
Any model accessible via an OpenAI-compatible API works with JieGou. We haven’t tested it, so it gets a “community” tier label — but the integration is identical. If it speaks the OpenAI API format, JieGou can use it.
Tier 4: Auto-discovered local models
JieGou probes for local inference servers at startup:
http://ollama:11434(Docker Compose service name)http://localhost:11434(local Ollama)http://localhost:8000(local vLLM)- The
OLLAMA_BASE_URLenvironment variable
When it finds a server, it queries the model list and makes those models available in the model picker. No manual configuration needed. The discovery result is cached for 5 minutes to avoid hammering your inference server.
Per-step model selection
This is the feature that makes multi-provider support actually useful, rather than a checkbox on a comparison chart.
In a JieGou workflow, every step can use a different model. A typical setup:
| Workflow Step | Task | Model | Why |
|---|---|---|---|
| 1. Research | Deep competitive analysis | Claude Opus 4.6 | Best reasoning quality |
| 2. Classify | Categorize findings | GPT-5-nano | Fast and cheap for classification |
| 3. Extract | Pull structured data | Llama 4 Maverick | High volume at lowest cost |
| 4. Summarize | Write executive brief | Claude Sonnet 4.6 | Strong writing quality |
| 5. Translate | Localize to 5 languages | Qwen 3 235B | Best multilingual performance |
The same flexibility applies to recipes (each recipe has its own model setting), conversations (pick a model per chat), and batch runs (the selected model applies to all rows).
Model recommendation engine
Choosing the right model for every task sounds powerful but also complex. The recommendation engine makes it practical.
After 10+ runs of a recipe, the engine has enough data to score every model you’ve used:
score = successRate × 0.5 + costEfficiency × 0.3 + speed × 0.2
It looks at the last 60 days of execution history and compares:
- Success rate — what percentage of runs completed without errors
- Cost efficiency — cost per successful run (lower is better)
- Speed — average execution duration (faster is better)
If your current model has ≥90% success rate across 10+ runs, the engine confirms it’s a good choice. Otherwise, it recommends the highest-scoring alternative with full metrics so you can make an informed switch.
For rigorous comparison, you can run a bakeoff — a head-to-head evaluation with LLM-as-judge scoring and 95% confidence intervals. Bakeoffs can compare any two models, any two recipes, or any two workflows.
Enterprise resilience
Running production workloads across multiple providers requires more than API key management. JieGou includes three resilience layers:
Circuit breakers
Each provider gets its own circuit breaker. If 5 calls fail within 60 seconds, the circuit opens — subsequent calls fail fast instead of timing out. After 30 seconds, the circuit enters half-open state and sends a probe request. If it succeeds, the circuit closes and traffic resumes.
For openai-compatible providers, circuit breakers are scoped per-account (since each customer may have a different endpoint). Cloud providers share a global circuit breaker.
Critically, circuit breakers are fail-open — if Redis is down and we can’t check the circuit state, we let the call through. This means a monitoring failure never blocks your workflows.
Concurrency limits
A global semaphore limits concurrent LLM calls per account to prevent runaway usage. The limit scales with your plan:
| Plan Tier | Global Capacity Share | Per-Account Max |
|---|---|---|
| Enterprise | 100% (150 slots) | 10 concurrent |
| Pro | 83% (125 slots) | 10 concurrent |
| Starter | 67% (100 slots) | 10 concurrent |
Cost tracking
Every LLM call records token usage and estimated cost. When you use BYOK, the cost is tracked separately — it shows up in your analytics dashboard but doesn’t count toward platform usage limits, since you’re paying your provider directly.
The cost estimator uses historical averages from your last 20 successful runs to project costs before you execute. You can see expected spend per recipe, per workflow step, and per batch run.
Zero-knowledge key architecture
JieGou never sees your API keys in plaintext at rest. The encryption pipeline:
- Root key loaded from Secret Manager or environment variable (64-character hex)
- Per-account key derived via HKDF-SHA256:
HKDF(rootKey, "", "jiegou-byok-envelope-v1:{accountId}", 32) - Encryption: AES-256-GCM with random 12-byte IV and 16-byte auth tag
- Storage: Only the ciphertext + IV + auth tag are stored in Firestore
- Decryption: Happens in-memory at execution time, never persisted
Key rotation is supported — the system can migrate from the legacy global encryption scheme to per-account envelope encryption without downtime.
If an API call returns 401 or 403, the system automatically marks the key as invalid and surfaces a clear error. You can re-validate or replace the key from the settings page.
Getting started
- Free tier: Use platform-provided keys for Anthropic, OpenAI, and Google — no credentials needed
- BYOK: Go to Settings > API Keys, add your provider keys, and they’re encrypted immediately
- Open source: Enter a custom base URL (e.g.,
http://your-vllm-server:8000/v1) and model name - Auto-discovery: If Ollama or vLLM is running locally, models appear automatically
Multi-provider model access is available on all plans. OpenAI-compatible endpoints and the model recommendation engine are available on Pro and above. Certified model registry and auto-discovery are Enterprise features.
Explore multi-provider model support or start your free trial.