Bring Your Own Model: How JieGou Supports Every LLM From Claude to Llama

How JieGou's multi-provider architecture lets you run Claude, GPT-5, Gemini, and open-source models like Llama 4 from a single platform — with per-step selection, auto-discovery, and zero-knowledge key encryption.

JieGou Team · 7 min read

Every AI automation platform claims “multi-model support.” In practice, that usually means you can switch between GPT-4o and GPT-5 in a settings dropdown. Maybe Claude is listed too. If you want to run an open-source model, you’re on your own.

JieGou takes a different approach. We built a universal model layer that treats every LLM — cloud-hosted or self-hosted, proprietary or open-source — as a first-class citizen. This post explains how it works and why it matters.

Four provider tiers in one platform

Tier 1: Cloud providers with BYOK

Bring your own API keys for Anthropic (Claude Sonnet 4.6, Haiku 4.5, Opus 4.6), OpenAI (GPT-5.2, GPT-5-mini, GPT-5-nano, o3, o4-mini), and Google (Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro/Flash).

Your keys are encrypted with AES-256-GCM using per-account derived keys via HKDF-SHA256. They’re decrypted in-memory only during execution and never stored in plaintext. You can also use platform-provided keys on the free tier to get started without entering any credentials.

Tier 2: Certified open-source models

We’ve tested four open-source models end-to-end on vLLM and certified them for full JieGou compatibility — including tool calling, structured JSON output, and recipe execution:

| Model | Parameters | Tool Calling | Structured Output | Vision | Context |
| --- | --- | --- | --- | --- | --- |
| Llama 4 Maverick | 400B+ MoE | Yes | Yes | Yes | 1M tokens |
| DeepSeek V3.2 | 671B MoE | Yes | Yes | No | 128K tokens |
| Qwen 3 235B | 235B MoE | Yes | Yes | No | 128K tokens |
| Mistral 3 Large | 123B dense | Yes | Yes | Yes | 128K tokens |

“Certified” means we’ve run thousands of recipe executions against these models, verified that tool calling and structured output work correctly, and documented the compatibility level. You can deploy them with confidence.

Tier 3: Community models

Any model accessible via an OpenAI-compatible API works with JieGou. We haven't certified these models, so they carry a "community" tier label, but the integration is identical. If it speaks the OpenAI API format, JieGou can use it.

Tier 4: Auto-discovered local models

JieGou probes for local inference servers at startup:

  1. http://ollama:11434 (Docker Compose service name)
  2. http://localhost:11434 (local Ollama)
  3. http://localhost:8000 (local vLLM)
  4. The OLLAMA_BASE_URL environment variable

When it finds a server, it queries the model list and makes those models available in the model picker. No manual configuration needed. The discovery result is cached for 5 minutes to avoid hammering your inference server.
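The probe order and the five-minute cache can be sketched as follows. This is illustrative, not JieGou's actual implementation; the injected `probe` callback and the cache shape are assumptions.

```python
import time

# Candidate endpoints, in the probe order listed above.
DEFAULT_ENDPOINTS = [
    "http://ollama:11434",     # Docker Compose service name
    "http://localhost:11434",  # local Ollama
    "http://localhost:8000",   # local vLLM
]

CACHE_TTL_SECONDS = 300  # discovery result cached for 5 minutes

_cache = {"expires_at": 0.0, "models": []}

def discover_models(probe, endpoints=DEFAULT_ENDPOINTS, now=time.monotonic):
    """Return the model list from the first endpoint that responds.

    `probe(url)` should return a list of model names, or raise OSError
    when no server answers at that address. Results are cached so
    repeated calls don't hammer the inference server.
    """
    if now() < _cache["expires_at"]:
        return _cache["models"]
    models = []
    for url in endpoints:
        try:
            models = probe(url)
            break  # first responsive server wins
        except OSError:
            continue  # nothing listening here; try the next address
    _cache["models"] = models
    _cache["expires_at"] = now() + CACHE_TTL_SECONDS
    return models
```

Injecting the `probe` callback keeps the discovery logic testable without a live Ollama or vLLM server.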

Per-step model selection

This is the feature that makes multi-provider support actually useful, rather than a checkbox on a comparison chart.

In a JieGou workflow, every step can use a different model. A typical setup:

| Workflow Step | Task | Model | Why |
| --- | --- | --- | --- |
| 1. Research | Deep competitive analysis | Claude Opus 4.6 | Best reasoning quality |
| 2. Classify | Categorize findings | GPT-5-nano | Fast and cheap for classification |
| 3. Extract | Pull structured data | Llama 4 Maverick | High volume at lowest cost |
| 4. Summarize | Write executive brief | Claude Sonnet 4.6 | Strong writing quality |
| 5. Translate | Localize to 5 languages | Qwen 3 235B | Best multilingual performance |

The same flexibility applies to recipes (each recipe has its own model setting), conversations (pick a model per chat), and batch runs (the selected model applies to all rows).
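A per-step setup like the one above can be expressed as plain data. The field names below are illustrative, not JieGou's actual workflow schema:

```python
# Hypothetical workflow definition; field names and model IDs are
# illustrative, not JieGou's real schema.
workflow = {
    "name": "competitive-brief",
    "steps": [
        {"id": "research",  "model": "claude-opus-4-6"},
        {"id": "classify",  "model": "gpt-5-nano"},
        {"id": "extract",   "model": "llama-4-maverick"},
        {"id": "summarize", "model": "claude-sonnet-4-6"},
        {"id": "translate", "model": "qwen-3-235b"},
    ],
}

def model_for_step(workflow, step_id, default="claude-sonnet-4-6"):
    """Resolve a step's model, falling back to a workflow-wide default."""
    for step in workflow["steps"]:
        if step["id"] == step_id:
            return step.get("model", default)
    raise KeyError(step_id)
```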

Model recommendation engine

Choosing the right model for every task sounds powerful but also complex. The recommendation engine makes it practical.

After 10+ runs of a recipe, the engine has enough data to score every model you’ve used:

score = successRate × 0.5 + costEfficiency × 0.3 + speed × 0.2

It looks at the last 60 days of execution history and compares:

  • Success rate — what percentage of runs completed without errors
  • Cost efficiency — cost per successful run (lower is better)
  • Speed — average execution duration (faster is better)

If your current model has ≥90% success rate across 10+ runs, the engine confirms it’s a good choice. Otherwise, it recommends the highest-scoring alternative with full metrics so you can make an informed switch.
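The formula can be made concrete with one assumption the post leaves implicit: cost and speed must be normalized so that higher is better before the weights apply. A minimal sketch, normalizing against the most expensive and slowest candidate:

```python
def score_model(success_rate, cost_per_success, avg_seconds,
                max_cost, max_seconds):
    """Blend the three 60-day metrics into one score.

    success_rate is 0..1. Cost and speed are normalized against the
    worst candidate so that lower raw values score higher; the
    normalization scheme here is an assumption, the post only gives
    the weights.
    """
    cost_efficiency = 1.0 - (cost_per_success / max_cost) if max_cost else 1.0
    speed = 1.0 - (avg_seconds / max_seconds) if max_seconds else 1.0
    return success_rate * 0.5 + cost_efficiency * 0.3 + speed * 0.2
```

Given equal success rates, the cheaper and faster model scores higher, which is the behavior the weights are meant to encode.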

For rigorous comparison, you can run a bakeoff — a head-to-head evaluation with LLM-as-judge scoring and 95% confidence intervals. Bakeoffs can compare any two models, any two recipes, or any two workflows.

Enterprise resilience

Running production workloads across multiple providers requires more than API key management. JieGou includes three resilience layers:

Circuit breakers

Each provider gets its own circuit breaker. If 5 calls fail within 60 seconds, the circuit opens — subsequent calls fail fast instead of timing out. After 30 seconds, the circuit enters half-open state and sends a probe request. If it succeeds, the circuit closes and traffic resumes.
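Those thresholds translate into a small state machine. The sketch below uses an injectable clock for testability; it is a simplified model, not JieGou's production breaker:

```python
import time

class CircuitBreaker:
    """Minimal breaker matching the thresholds above: open after
    5 failures inside a 60-second window, probe again after 30 seconds."""

    FAILURE_THRESHOLD = 5
    WINDOW = 60.0     # seconds over which failures are counted
    COOLDOWN = 30.0   # seconds before entering half-open state

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.failures = []     # timestamps of recent failures
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.COOLDOWN:
            return True        # half-open: allow a probe request
        return False           # open: fail fast instead of timing out

    def record_success(self):
        self.failures.clear()
        self.opened_at = None  # probe succeeded: close the circuit

    def record_failure(self):
        now = self.clock()
        self.failures = [t for t in self.failures if now - t < self.WINDOW]
        self.failures.append(now)
        if len(self.failures) >= self.FAILURE_THRESHOLD:
            self.opened_at = now
```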

For OpenAI-compatible providers, circuit breakers are scoped per account (each customer may have a different endpoint). Cloud providers share a global circuit breaker.

Critically, circuit breakers are fail-open — if Redis is down and we can’t check the circuit state, we let the call through. This means a monitoring failure never blocks your workflows.

Concurrency limits

A global semaphore limits concurrent LLM calls per account to prevent runaway usage. The limit scales with your plan:

| Plan Tier | Global Capacity Share | Per-Account Max |
| --- | --- | --- |
| Enterprise | 100% (150 slots) | 10 concurrent |
| Pro | 83% (125 slots) | 10 concurrent |
| Starter | 67% (100 slots) | 10 concurrent |

Cost tracking

Every LLM call records token usage and estimated cost. When you use BYOK, the cost is tracked separately — it shows up in your analytics dashboard but doesn’t count toward platform usage limits, since you’re paying your provider directly.

The cost estimator uses historical averages from your last 20 successful runs to project costs before you execute. You can see expected spend per recipe, per workflow step, and per batch run.
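The projection itself can be as simple as a trailing mean over successful runs. This sketch assumes a per-run record with `cost` and `success` fields, which is an illustrative shape rather than JieGou's stored schema:

```python
def project_cost(history, window=20):
    """Project the next run's cost as the mean of the last `window`
    successful runs (the post says 20). Failed runs are excluded."""
    recent = [run["cost"] for run in history if run["success"]][-window:]
    if not recent:
        return None  # no successful runs yet; nothing to project
    return sum(recent) / len(recent)
```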

Zero-knowledge key architecture

JieGou never sees your API keys in plaintext at rest. The encryption pipeline:

  1. Root key loaded from Secret Manager or environment variable (64-character hex)
  2. Per-account key derived via HKDF-SHA256: HKDF(rootKey, "", "jiegou-byok-envelope-v1:{accountId}", 32)
  3. Encryption: AES-256-GCM with random 12-byte IV and 16-byte auth tag
  4. Storage: Only the ciphertext + IV + auth tag are stored in Firestore
  5. Decryption: Happens in-memory at execution time, never persisted
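Steps 1 and 2 can be sketched with the Python standard library. The HKDF implementation follows RFC 5869, and the info-string format mirrors step 2; the AES-256-GCM encryption in step 3 would use a crypto library and is omitted here:

```python
import hashlib
import hmac

def hkdf_sha256(root_key: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF with SHA-256 (extract-then-expand).

    With an empty salt, extract uses a zero-filled key of hash
    length, as the RFC specifies.
    """
    hash_len = hashlib.sha256().digest_size
    prk = hmac.new(salt or b"\x00" * hash_len, root_key, hashlib.sha256).digest()
    okm, block = b"", b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def derive_account_key(root_key_hex: str, account_id: str) -> bytes:
    # Step 2 above: a 32-byte key bound to the account ID, so one
    # account's key can never decrypt another account's envelope.
    info = f"jiegou-byok-envelope-v1:{account_id}".encode()
    return hkdf_sha256(bytes.fromhex(root_key_hex), b"", info, 32)
```

Because the account ID is baked into the derivation info, rotating the root key or migrating an account re-derives every envelope key deterministically.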

Key rotation is supported — the system can migrate from the legacy global encryption scheme to per-account envelope encryption without downtime.

If an API call returns 401 or 403, the system automatically marks the key as invalid and surfaces a clear error. You can re-validate or replace the key from the settings page.

Getting started

  1. Free tier: Use platform-provided keys for Anthropic, OpenAI, and Google — no credentials needed
  2. BYOK: Go to Settings > API Keys, add your provider keys, and they’re encrypted immediately
  3. Open source: Enter a custom base URL (e.g., http://your-vllm-server:8000/v1) and model name
  4. Auto-discovery: If Ollama or vLLM is running locally, models appear automatically

Multi-provider model access is available on all plans. OpenAI-compatible endpoints and the model recommendation engine are available on Pro and above. Certified model registry and auto-discovery are Enterprise features.

Explore multi-provider model support or start your free trial.

Tags: byom · byok · multi-provider · open-source · llama · deepseek · vllm · ollama · model-selection