JieGou supports models from Anthropic (Claude), OpenAI (GPT, o-series), and Google (Gemini). You can choose a different model for every recipe and every workflow step. But with this many options, how do you decide which model to use where?
This guide walks through a practical framework for model selection.
Start with the task type
Different models have different strengths. Based on thousands of recipe executions across our user base, here are general patterns:
Long-form writing and nuance — Claude (Sonnet and Opus) tends to produce more natural, nuanced writing. If your recipe generates customer-facing content, marketing copy, or detailed analysis, Claude is a strong starting point.
Structured extraction and classification — GPT models are often strong at extracting structured data from unstructured text. Invoice parsing, ticket categorization, and data transformation tasks frequently perform well with GPT.
Speed-sensitive tasks — For tasks where latency matters more than peak quality (chat responses, real-time suggestions), smaller models like Claude Haiku, GPT-5-mini, or Gemini Flash give faster responses at lower cost.
Reasoning-heavy tasks — For tasks requiring multi-step logic, planning, or mathematical reasoning, the o-series models (o3, o4-mini) and Gemini Pro are worth testing.
These are guidelines, not rules. The right model for your specific recipe depends on your prompt, your data, and your quality bar.
Use bakeoffs to validate
Instead of guessing, use JieGou’s bakeoff system to test empirically. Here’s a practical workflow:
Round 1: Quick screen (3 models, 10 inputs)
Create a recipe bakeoff comparing your top 3 model candidates on 10 representative inputs. Use a single LLM judge. This takes minutes and gives you a directional signal.
Look for clear winners and clear losers. If one model scores significantly lower, eliminate it. If two are close, they both advance to round 2.
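The elimination logic in a quick screen can be as simple as a threshold against the leader. The model names, scores, and the 0.05 cutoff below are illustrative assumptions, not JieGou's actual bakeoff output:

```python
# Hypothetical average judge scores (0-1) from a 10-input quick screen.
scores = {"claude-sonnet": 0.81, "gpt-5": 0.79, "gemini-pro": 0.62}

# Eliminate clear losers: anything more than 0.05 below the leader.
leader = max(scores.values())
advancing = sorted(m for m, s in scores.items() if leader - s <= 0.05)
print(advancing)  # gemini-pro is a clear loser; the other two advance
```

With only 10 inputs the scores are noisy, which is why near-ties advance to round 2 rather than being decided here.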
Round 2: Statistical evaluation (2 models, 50 inputs)
Take the top 2 candidates and run a more rigorous bakeoff with 50 inputs and multi-judge evaluation. Check the confidence intervals — if they don’t overlap, you have a winner. If they do overlap, the evaluation can’t distinguish the models at this sample size, so decide based on cost or speed.
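The overlap check itself is simple statistics. A minimal sketch using a normal-approximation 95% interval — the per-input scores below are made up for illustration, and JieGou's bakeoff report computes these for you:

```python
import statistics

def ci95(scores):
    """Mean 95% confidence interval via the normal approximation."""
    mean = statistics.mean(scores)
    half = 1.96 * statistics.stdev(scores) / len(scores) ** 0.5
    return mean - half, mean + half

def overlap(a, b):
    """True if two (low, high) intervals overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical per-input judge scores for the two finalists.
model_a = [0.82, 0.78, 0.85, 0.80, 0.79, 0.83, 0.81, 0.84, 0.77, 0.86]
model_b = [0.74, 0.71, 0.76, 0.73, 0.70, 0.75, 0.72, 0.74, 0.69, 0.77]

ci_a, ci_b = ci95(model_a), ci95(model_b)
print(overlap(ci_a, ci_b))  # False: the intervals are disjoint, model A wins
```

A t-based interval is slightly more accurate at small sample sizes, but the overlap logic is the same.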
Round 3: Production A/B test (optional)
If the offline evaluation is inconclusive or if you need production validation, set up a live A/B test. Route traffic between the two variants for 48-72 hours and let the auto-stop mechanism determine the winner based on real-world performance.
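A common way to split traffic for a test like this is to hash a stable identifier, so each user always lands on the same variant across requests. This is a generic sketch of that pattern, not JieGou's routing implementation:

```python
import hashlib

def assign_variant(user_id: str, variants=("recipe-v1", "recipe-v2")) -> str:
    """Deterministic 50/50 split: the same user always gets the same variant."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("user-42"))  # stable across calls and across processes
```

Deterministic assignment matters for A/B validity: if a user bounced between variants mid-session, per-user outcome metrics would be contaminated.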
Consider cost vs. quality trade-offs
Model pricing varies significantly. A frontier model might score 5% higher on quality but cost 10x more per token. For many tasks, that trade-off isn’t worth it.
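That arithmetic is worth making explicit. With made-up numbers matching the 5% / 10x example above:

```python
# Hypothetical per-task figures for illustration only.
frontier = {"quality": 0.84, "cost_per_1k_tasks": 40.00}
mid_tier = {"quality": 0.80, "cost_per_1k_tasks": 4.00}

quality_gain = frontier["quality"] / mid_tier["quality"] - 1
cost_multiple = frontier["cost_per_1k_tasks"] / mid_tier["cost_per_1k_tasks"]
print(f"{quality_gain:.0%} more quality for {cost_multiple:.0f}x the cost")
```

Whether that lift justifies the spend depends on where the output goes, which is what the common findings that follow reflect.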
JieGou bakeoffs show cost comparison alongside quality scores, so you can make informed decisions. Common findings:
- For 80% of internal-facing tasks (summaries, drafts, categorization), mid-tier models produce equivalent quality to frontier models at a fraction of the cost
- For customer-facing content and high-stakes analysis, the quality difference from frontier models is worth the cost
- For high-volume, low-complexity tasks (classification, extraction), the smallest sufficient model saves the most money
Mix models within workflows
One of JieGou’s strengths is per-step model selection in workflows. A common pattern:
- Extraction step — Use a fast, cheap model (Haiku, GPT-5-mini) to extract structured data from input
- Analysis step — Use a reasoning-focused model (o3, Gemini Pro) to analyze the extracted data
- Writing step — Use a strong writing model (Claude Sonnet, GPT-5) to produce the final output
Each step uses the model best suited to its task type, optimizing for both quality and cost across the entire workflow.
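As a sketch, the three-step pattern above might be declared like this. The field names and model identifiers are illustrative assumptions, not JieGou's actual workflow schema:

```python
# Illustrative only: each step names the model class suited to its job.
workflow = [
    {"step": "extract", "model": "claude-haiku",  "why": "fast, cheap structured extraction"},
    {"step": "analyze", "model": "o3",            "why": "multi-step reasoning over the fields"},
    {"step": "write",   "model": "claude-sonnet", "why": "natural customer-facing prose"},
]

for s in workflow:
    print(f"{s['step']:>8}: {s['model']:<14} ({s['why']})")
```

Each step can also be bakeoff-tested independently, so you can swap the model at one step without re-validating the whole workflow.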
Re-evaluate periodically
Model capabilities change with new releases. A model that was second-best six months ago might be the best option today. Set a reminder to re-run your bakeoffs quarterly, especially after major model updates.
JieGou makes this easy — your bakeoff configurations are saved, so re-running with updated models takes a single click.
Get started
Multi-provider model support is available on all plans. Bakeoffs for model comparison are available on Pro. Explore all supported models or start your first bakeoff.