The Problem With Chatbot Builders Today
Most chatbot platforms force you into one of two camps. Camp one: keyword-based rule engines. You define patterns like “hours” or “refund policy” and map them to canned responses. They are fast, deterministic, and cheap — but they break the moment a user phrases something differently. “When are you open?” matches, but “what time do you close on weekends?” does not.
Camp two: throw everything at an LLM. Every message goes to GPT or Claude, and you hope the model gets it right. It often does — but at 2-10 cents per conversation turn, with variable latency and no guarantee the model won’t hallucinate your return policy.
Neither approach is production-ready on its own. The first is too rigid. The second is too expensive and unpredictable. What you actually need is a system that uses each approach where it excels and falls through to the next tier only when necessary.
The 4-Tier Resolution Cascade
JieGou’s chat agents resolve messages through a 4-tier cascade, evaluated in order:
Tier 1 — Rule Table with Embedding Similarity. Your rules are stored as a table of pattern-response pairs. But unlike keyword matching, each pattern is embedded as a vector. When a message arrives, it’s embedded and compared against all rule centroids using cosine similarity. If the similarity exceeds a configurable threshold (default 0.82), the matched rule fires instantly. Zero LLM cost. Sub-100ms latency. Deterministic output.
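Tier 1 can be sketched in a few lines. This is a minimal illustration, not JieGou's implementation: `embed()` is assumed to exist (a real embedding model would produce the vectors), and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_rule(message_vec, rules, threshold=0.82):
    """Return the response of the best-matching rule, or None to
    fall through to the next tier. `rules` is a list of
    (centroid_vector, response) pairs."""
    best_score, best_response = 0.0, None
    for centroid, response in rules:
        score = cosine(message_vec, centroid)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None

# Toy 3-d vectors stand in for real embedding-model output.
rules = [
    ([0.9, 0.1, 0.0], "We're open 9am-6pm, Monday through Saturday."),
    ([0.0, 0.2, 0.9], "Refunds are processed within 5 business days."),
]
print(match_rule([0.88, 0.15, 0.05], rules))  # close to the hours rule → fires
print(match_rule([0.5, 0.5, 0.5], rules))     # ambiguous → None, fall through
```

Because the comparison is geometric rather than lexical, "When are you open?" and "What time do you close on weekends?" land near the same centroid even though they share almost no keywords.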
Tier 2 — Knowledge Base Retrieval (RAG). If no rule matches, the message is routed to your knowledge base — uploaded documents, FAQ pages, product manuals. RAG retrieves the most relevant chunks, and a lightweight LLM synthesizes a response grounded in your content. A configurable minimum-similarity threshold filters out low-quality retrievals.
Tier 3 — LLM Fallback. If RAG confidence is below threshold, the full conversation context plus your system prompt is sent to a large language model. The LLM handles open-ended questions, nuanced requests, and anything your rules and knowledge base don’t cover.
Tier 4 — Human Escalation. When the LLM’s confidence is low, or when the topic matches escalation triggers (e.g., legal questions, medical advice, billing disputes), the conversation is routed to a human agent with full context preserved.
The cascade is not just a priority list — it’s an economic optimizer. Most production traffic hits Tier 1 or Tier 2. LLM calls are reserved for the long tail. Human agents handle only what genuinely requires a human.
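The control flow of the cascade can be sketched as a chain of handlers, each returning a response or `None` to fall through. The tier functions below are stubs of my own invention (keyword checks standing in for embeddings, RAG, and model calls) — only the fall-through structure mirrors the description above.

```python
from typing import Callable, Optional

def tier1_rules(msg: str) -> Optional[str]:
    # Stub for embedding-similarity rule matching.
    return "Our hours are 9-6." if "hours" in msg.lower() else None

def tier2_rag(msg: str) -> Optional[str]:
    # Stub for knowledge-base retrieval.
    return "Per our FAQ: ..." if "refund" in msg.lower() else None

def tier3_llm(msg: str) -> Optional[str]:
    # Fall through on escalation triggers or low confidence.
    triggers = ("legal", "medication", "billing dispute")
    if any(t in msg.lower() for t in triggers):
        return None
    return "LLM answer (stub)"

def tier4_escalate(msg: str) -> Optional[str]:
    return "Routing you to a human agent."

CASCADE: list[Callable[[str], Optional[str]]] = [
    tier1_rules, tier2_rag, tier3_llm, tier4_escalate,
]

def resolve(msg: str) -> str:
    for tier in CASCADE:
        answer = tier(msg)
        if answer is not None:
            return answer
    return "Sorry, something went wrong."

print(resolve("What are your hours?"))      # resolved at Tier 1
print(resolve("I have a billing dispute"))  # trips a trigger → Tier 4
```

The economics follow directly from this structure: a message only incurs a tier's cost if every cheaper tier above it declined to answer.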
CSV Import for Non-Technical Teams
The rule table is designed for the people who actually know your business — support leads, clinic managers, product specialists. They don’t write code. They write spreadsheets.
Upload a CSV with two columns: pattern and response. JieGou auto-embeds every pattern, computes centroids for rules with multiple pattern variants, and the rule table is live. Need to handle “What are your hours?”, “When do you open?”, and “Are you open on Saturdays?” with the same response? Add three rows with the same response. The embedding model understands paraphrases — no regex required.
Rules can be updated at any time. Re-upload the CSV, and embeddings are recomputed. No redeployment. No downtime.
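The import step — grouping rows by response and averaging each group's pattern embeddings into a centroid — can be sketched like this. The `embed()` lookup table is a stand-in for a real embedding model, and the column names `pattern`/`response` match the CSV format described above.

```python
import csv, io

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model; returns toy 2-d vectors.
    toy = {
        "What are your hours?": [0.9, 0.1],
        "When do you open?": [0.8, 0.2],
        "Are you open on Saturdays?": [0.7, 0.3],
    }
    return toy.get(text, [0.0, 0.0])

def build_rule_table(csv_text: str) -> dict[str, list[float]]:
    """Group rows by response, then average each group's pattern
    embeddings into one centroid per rule."""
    groups: dict[str, list[list[float]]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["response"], []).append(embed(row["pattern"]))
    return {
        response: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for response, vecs in groups.items()
    }

csv_text = """pattern,response
What are your hours?,We're open 9-6 Mon-Sat.
When do you open?,We're open 9-6 Mon-Sat.
Are you open on Saturdays?,We're open 9-6 Mon-Sat.
"""
table = build_rule_table(csv_text)
print(table)  # one rule, centroid ≈ the mean of the three pattern vectors
```

Three phrasings, one response, one centroid — which is why the support lead adding a row to a spreadsheet is all it takes to widen a rule's coverage.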
Conversation Threading With Compaction
Real conversations are multi-turn. A user asks about pricing, then follows up with “What about the enterprise plan?”, then asks “Can I get a demo?” Each message depends on what came before.
JieGou maintains full conversation threads with automatic compaction. Recent messages are kept verbatim. Older messages are summarized by the LLM to preserve context while staying within token limits. This means your agent can handle 50-turn conversations without blowing through context windows or racking up costs on repeated full-history prompts.
Thread state is persisted across sessions. If a user returns the next day, the agent picks up where it left off.
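The compaction idea — recent turns verbatim, older turns collapsed into a summary — can be sketched as follows. `summarize()` is a placeholder where a real system would call the LLM, and the cutoff of 6 messages is an arbitrary choice for illustration.

```python
def summarize(messages: list[str]) -> str:
    # Placeholder: a real system would ask the LLM for a summary.
    return f"[summary of {len(messages)} earlier messages]"

def compact(thread: list[str], keep_recent: int = 6) -> list[str]:
    """Keep the newest `keep_recent` messages verbatim; collapse
    everything older into a single summary entry."""
    if len(thread) <= keep_recent:
        return thread
    older, recent = thread[:-keep_recent], thread[-keep_recent:]
    return [summarize(older)] + recent

thread = [f"turn {i}" for i in range(1, 11)]
print(compact(thread))  # first element summarizes turns 1-4; turns 5-10 kept verbatim
```

Each new prompt then contains one short summary plus a handful of verbatim turns, so token usage stays roughly flat no matter how long the conversation runs.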
Multi-Channel: Same Agent, Any Platform
Build your agent once. Deploy it to LINE, Instagram, WhatsApp, Facebook Messenger, and YouTube. The resolution cascade, rule table, knowledge base, and conversation threads work identically across every channel.
Channel-specific features — LINE rich menus, Instagram story replies, WhatsApp template messages — are handled at the adapter layer. Your agent logic stays unified. Update a rule, and it takes effect everywhere.
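The adapter layer can be sketched as a thin interface per channel: each adapter translates a platform-native event into plain text and a reply back into the platform's payload shape. The payload shapes below are simplified illustrations, not exact API contracts, and `resolve()` is a stand-in for the 4-tier cascade.

```python
class ChannelAdapter:
    """Per-channel translation layer; the agent logic never sees
    platform-specific payloads."""
    def to_message(self, event: dict) -> str: ...
    def send_payload(self, user_id: str, text: str) -> dict: ...

class LineAdapter(ChannelAdapter):
    def to_message(self, event):
        return event["message"]["text"]  # simplified LINE webhook shape
    def send_payload(self, user_id, text):
        return {"to": user_id, "messages": [{"type": "text", "text": text}]}

class WhatsAppAdapter(ChannelAdapter):
    def to_message(self, event):
        return event["text"]["body"]  # simplified WhatsApp payload shape
    def send_payload(self, user_id, text):
        return {"to": user_id, "type": "text", "text": {"body": text}}

def resolve(message: str) -> str:
    return f"echo: {message}"  # stand-in for the 4-tier cascade

def handle(adapter: ChannelAdapter, event: dict, user_id: str) -> dict:
    # Channel-agnostic agent logic; only the adapter knows the format.
    reply = resolve(adapter.to_message(event))
    return adapter.send_payload(user_id, reply)

print(handle(LineAdapter(), {"message": {"text": "hi"}}, "U1"))
```

Adding a channel means writing one adapter; the cascade, rules, and threads are untouched.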
This is particularly valuable in APAC markets where businesses routinely operate across LINE (Taiwan, Japan, Thailand), WhatsApp (Southeast Asia), and Instagram (everywhere) simultaneously.
Real Use Case: Healthcare Clinic on LINE
A medical clinic in Taiwan deployed a JieGou chat agent on LINE with 200+ rules covering appointment scheduling, insurance questions, clinic hours, and directions — in both Traditional Chinese and English.
Tier 1 handles 70% of incoming messages: “How do I book an appointment?”, “Do you accept National Health Insurance?”, “Where is the Xinyi branch?” These resolve in under 100ms with zero LLM cost.
Tier 2 covers knowledge base queries about specific procedures, preparation instructions, and post-visit care — synthesized from the clinic’s uploaded medical guides.
Tier 3 handles open-ended questions like “I have a rash on my arm that appeared after hiking last weekend, what should I do?” The LLM provides general guidance while clearly stating it is not medical advice.
Tier 4 escalates sensitive topics — medication interactions, symptom triage, insurance claim disputes — to human staff with full conversation history attached.
Governed by the Full Stack
Chat agents in JieGou are not standalone bots. They operate within the same governance framework as every other JieGou agent:
- RBAC controls who can create, edit, and deploy agents
- Audit logging records every message, resolution tier used, and response generated
- Sensitivity labels ensure PHI and PII in medical or financial conversations are handled according to policy
- Threat detection monitors for prompt injection attempts within chat messages
Your chat agent is intelligent, fast, and cost-effective. And it’s governed from day one.