Open Source LLMs + JieGou: Run AI Automation Without Cloud Dependencies

Deploy JieGou with Llama, DeepSeek, Qwen, and Mistral on your own infrastructure. A practical guide to air-gapped AI automation for enterprises that can't send data to the cloud.

JieGou Team · 5 min read

Some organizations can’t send data to OpenAI. Or Anthropic. Or Google. Not because the models aren’t good enough — because the data can’t leave the building.

Healthcare systems processing patient records. Financial institutions handling transaction data. Defense contractors working with classified information. Government agencies bound by data sovereignty requirements. For these organizations, the promise of AI automation has always come with an asterisk: as long as you’re comfortable sending your data to a cloud API.

JieGou removes that asterisk.

What changed

Two things converged to make self-hosted AI automation practical:

Open source models caught up. Llama 4 Maverick, DeepSeek V3, Qwen 3 235B, and Mistral 3 Large deliver quality that matches or exceeds GPT-4o on many tasks. Tool calling, structured output, long context windows — the capabilities that enterprise workflows need are all there.

Inference servers matured. vLLM, Ollama, SGLang, and LocalAI provide production-ready OpenAI-compatible APIs. You point your application at http://localhost:8000/v1 instead of https://api.openai.com/v1, and everything works.
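Swapping a cloud call for a local one really is just a different base URL. Here is a minimal sketch of the OpenAI-compatible chat request using only the Python standard library; the endpoint URL, model name, and helper name are placeholders, not JieGou code:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request for any
    inference server (vLLM, Ollama, SGLang, LocalAI) that speaks the API."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Point at a local server instead of https://api.openai.com/v1:
req = build_chat_request(
    "http://localhost:8000/v1",  # vLLM default
    "llama3.3",
    [{"role": "user", "content": "Summarize this ticket."}],
)
# with urllib.request.urlopen(req) as resp:  # requires a running server
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request body works against any of the servers above; only `base_url` changes.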

JieGou now supports any OpenAI-compatible endpoint as a first-class provider. Same recipe system, same workflow engine, same bakeoff comparisons, same approval gates — just running on your hardware with your models.

How it works

The OpenAI-compatible provider

JieGou treats custom endpoints the same way it treats Anthropic, OpenAI, and Google. When you configure a custom endpoint in Settings > API Keys, you provide:

  • Endpoint URL — Where your inference server lives (e.g., http://ollama:11434/v1)
  • Model name — Which model to use (e.g., llama3.3)
  • API key — Optional. Most local endpoints don’t require one.

From that point on, every JieGou feature works with your model: recipes, workflows, bakeoffs, batch runs, multi-turn chat, structured output extraction — all of it.
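Once the endpoint is saved, a quick sanity check is to ask the server which models it serves; OpenAI-compatible servers expose this at GET /v1/models. A stdlib sketch (`list_models` is our own illustrative helper, not a JieGou API):

```python
import json
import urllib.error
import urllib.request

def list_models(base_url: str, api_key: str = "", timeout: float = 5.0) -> list:
    """GET /models on an OpenAI-compatible endpoint and return the
    advertised model IDs. Most local servers accept no API key."""
    req = urllib.request.Request(base_url.rstrip("/") + "/models")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError):
        return []  # endpoint unreachable

# Example (requires a running server):
# print(list_models("http://ollama:11434/v1"))
```

If the list comes back empty, check the endpoint URL and that the inference server is actually up before debugging anything inside JieGou.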

Certified vs. community models

Not all open source models handle every JieGou feature equally well. Tool calling, structured JSON output, and long-context processing require specific model capabilities. We test and certify models that reliably handle the full JieGou feature set:

| Model | Size | Key Capabilities |
| --- | --- | --- |
| Llama 4 Maverick | 400B+ MoE | Tool calling, structured output, vision, 1M context |
| DeepSeek V3.2 | 671B MoE | Reasoning, code generation, structured output |
| Qwen 3 235B | 235B MoE | Multilingual, tool calling, structured output |
| Mistral 3 Large | 123B | Vision, tool calling, 128K context |

Certified models get a green badge in the model selector. Community models (everything else) get a gray badge with a note: “Not certified — quality may vary.” We encourage users to run a Bakeoff comparing community models against certified ones before putting them into production.

Auto-discovery

When JieGou starts, it probes well-known local endpoints:

  • http://ollama:11434 — Docker network (co-located Ollama container)
  • http://localhost:11434 — Ollama default port
  • http://localhost:8000 — vLLM default port

If it finds a running inference server, the admin dashboard shows a banner: “Local LLM endpoint detected” with a one-click “Configure” button that pre-fills the endpoint settings.
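The probe loop is simple to sketch. Assuming the check is a GET against each candidate's /v1/models route with a short timeout (the actual implementation may differ):

```python
import urllib.error
import urllib.request

# Well-known local endpoints, in the order described above.
CANDIDATES = [
    "http://ollama:11434",     # Docker network (co-located Ollama container)
    "http://localhost:11434",  # Ollama default port
    "http://localhost:8000",   # vLLM default port
]

def discover(candidates=CANDIDATES, timeout: float = 1.0):
    """Return the first endpoint whose /v1/models route answers, else None."""
    for base in candidates:
        try:
            with urllib.request.urlopen(base + "/v1/models", timeout=timeout):
                return base
        except (urllib.error.URLError, OSError):
            continue  # not running here; try the next candidate
    return None
```

The short timeout matters: startup should not stall for seconds on each endpoint that isn't there.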

Model download manager

For Ollama endpoints, JieGou includes a built-in model manager. Browse the certified model list, click “Pull,” and watch the download progress in real time. No terminal required.
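Behind the "Pull" button is Ollama's streaming /api/pull endpoint, which emits newline-delimited JSON progress events. A hedged sketch of the same flow from Python (`build_pull_request` and `pull_model` are illustrative helpers, not JieGou APIs):

```python
import json
import urllib.request

def build_pull_request(model: str, base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST to Ollama's /api/pull for the given model name."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/pull",
        data=json.dumps({"name": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def pull_model(model: str, base_url: str = "http://localhost:11434") -> None:
    """Stream a model download, printing each progress event."""
    with urllib.request.urlopen(build_pull_request(model, base_url)) as resp:
        for line in resp:  # newline-delimited JSON progress events
            event = json.loads(line)
            done, total = event.get("completed"), event.get("total")
            if done and total:
                print(f"{event['status']}: {100 * done // total}%")
            else:
                print(event.get("status", ""))

# pull_model("llama3.3")  # requires a running Ollama server
```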

Deployment options

Option 1: Docker Compose starter kit (simplest)

For evaluation or small teams. Everything in one docker compose up:

git clone https://github.com/JieGouAI/orion.git
cd orion/console/self-hosted-starter
cp .env.example .env
docker compose up -d
./models/pull-models.sh llama3.3

Five minutes to a working AI automation platform. JieGou auto-detects the co-located Ollama instance. Open http://localhost:3000 and start building.

For GPU acceleration:

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d

Option 2: Hybrid VPC deployment (enterprise)

For organizations that want JieGou’s managed control plane (UI, scheduling, monitoring) but need execution to happen on-premises. VPC execution agents run inside your network, receive step execution requests, and process them using your local LLM endpoints. The control plane never sees raw data.

Option 3: Full Kubernetes deployment

For large organizations running their own K8s clusters. JieGou ships a Helm chart (console/chart/) that deploys alongside your existing vLLM or Ollama services. Configure the custom endpoint to point at your inference service’s internal DNS name.

Comparing platforms

How does JieGou’s self-hosted story compare to alternatives?

n8n supports self-hosting and has an Ollama integration, but it’s a general-purpose workflow tool — not purpose-built for AI automation. No certified model registry, no bakeoff system for comparing model quality, no department-first workflow organization, no approval gates.

Zapier and Microsoft Copilot Studio are cloud-only. There’s no self-hosted option, period.

LangChain/LangGraph provides the building blocks but not the platform. You still need to build the UI, user management, scheduling, approval workflows, quality monitoring, and everything else. That’s the product, not a library call.

JieGou is the only platform that combines self-hosted AI automation with the enterprise features that regulated industries need: RBAC, approval workflows, audit logging, compliance presets, and quality bakeoffs — all running on your infrastructure.

What’s next

We’re investing heavily in the self-hosted experience:

  • Model performance benchmarks — Automated quality scoring for each certified model against JieGou’s recipe test suite
  • Inference cost calculator — Compare self-hosted GPU costs vs. cloud API pricing for your specific workload
  • Multi-GPU orchestration — Route different recipes to different models based on capability requirements
  • Offline model catalog — Pre-packaged model bundles for fully air-gapped environments with no internet access at all

Get started

The self-hosted starter kit is available now. Clone the repo, run Docker Compose, pull a model, and start automating.

If you need hybrid VPC deployment or compliance controls for regulated industries, contact our sales team about the Enterprise plan.
