How an Operations Partner runs its own AI operations.

JieGou is its own first customer. 819 QA waves in 22 days during a complete ICP pivot. Anthropic Memory swapped same-day from research preview to GA. 10-layer governance applied to JieGou itself before it ships to anyone else.

The 'customer' in this case study is JieGou itself. The numbers are commits, waves, and ship times, all auditable from the git history (the repo is private and shareable on request under NDA).

§1 — Why this case study exists

Eating our own dog food, in public, with the numbers attached.

When we tell CIOs that Operations Partner means production-grade AI ops discipline — QA cadence, observability, multi-LLM provider-portable architecture, hash-chain audit, shadow-mode trust escalation — the reasonable next question is "prove it on yourselves first."

That's what this case study does. The customer is JieGou. The numbers are commits, waves, and ship times. Everything below can be checked against the git history: the repo is private, but access is shareable on request under NDA, so you can verify.

Three concrete pieces of evidence — QA cadence, same-day model swap, and the 10-layer governance applied internally — together form the load-bearing claim: an Operations Partner that doesn't operate this way on its own platform shouldn't be trusted to operate that way on yours.

§2 — QA cadence — 819 waves in 22 days during a complete ICP pivot

Every new surface paired with tests. The cadence holds through pivots.

Between QA wave v461 (2026-05-01 — feedback API coverage) and wave v1279 (2026-05-22 — workflow failure-recipient handlers), JieGou shipped 819 QA waves in 22 calendar days. The window covered:

  • A complete ICP pivot (MSP-first → Wintec-shape engineering-led mid-market CIO)
  • A new brand identity rollout (Operations Partner) across the marketing site (36-PR push)
  • Three new Anthropic Managed Agents shim integrations (Outcomes, Multi-Agent, Dreaming)
  • A new live-customer pilot (PSKin LINE chat-agent) including a same-day fix for a parser bug and a model-routing upgrade
  • The R3 / R6 / R7 marketing positioning shipped

This is more QA waves in 22 days than in the entire prior history of the QA program combined. Each wave is a single test file covering a specific surface: an API route, a server-side helper, a UI flow, a parser case. The pattern is invariant: every new surface ships with tests, and the wave number increases monotonically, so per-surface coverage is auditable.
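The headline figures can be re-derived by inclusive counting. The sketch below is illustrative only: the helper names are hypothetical, and it assumes wave numbers are consecutive with no gaps and that numbering started at v1.

```typescript
// Inclusive count of consecutively numbered waves, e.g. v461..v1279.
// Assumption: wave numbers have no gaps.
function inclusiveCount(first: number, last: number): number {
  return last - first + 1;
}

// Inclusive number of calendar days between two ISO dates (UTC).
function inclusiveDays(startIso: string, endIso: string): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  const start = Date.parse(`${startIso}T00:00:00Z`);
  const end = Date.parse(`${endIso}T00:00:00Z`);
  return Math.round((end - start) / msPerDay) + 1;
}

const wavesInWindow = inclusiveCount(461, 1279);
// Assumption: numbering started at v1, so everything before v461 is prior history.
const wavesBeforeWindow = 461 - 1;
const daysInWindow = inclusiveDays("2026-05-01", "2026-05-22");
const wavesPerDay = wavesInWindow / daysInWindow;
```

Under those assumptions the window holds 819 waves over 22 days (roughly 37 per day), against 460 waves before it.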

The honest read on what this number means: not that 819 tests are inherently virtuous, but that the cadence survived a window where most companies would have suspended it. The ICP pivot, the brand rollout, and the live-customer firefighting all happened in parallel. The test-coverage discipline didn't slip. That's the trust signal.

What this looks like operationally: the unit-test suite runs in about ten seconds. npm run check exits with zero type errors on every commit (~15,600 tests across ~385 test files at last count). The check gates the commit; the commit gates the deploy; the deploy gates production. The thing customers buy when they buy Operations Partner is this gate holding even when the schedule is on fire.
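The gate chain described above can be sketched as an ordered list of checks that stops at the first failure. This is a minimal illustration, not JieGou's actual pipeline: the Gate type and the function names are hypothetical, and real gates would shell out to tools such as npm run check rather than return a boolean directly.

```typescript
type Gate = { name: string; run: () => boolean };
type GateResult = { gate: string; passed: boolean };

// Runs gates in order and stops at the first failure, so a later
// stage never runs unless every earlier stage passed.
function runGates(gates: Gate[]): GateResult[] {
  const results: GateResult[] = [];
  for (const gate of gates) {
    const passed = gate.run();
    results.push({ gate: gate.name, passed });
    if (!passed) break;
  }
  return results;
}

// A change reaches production only if every gate ran and passed.
function reachedProduction(gates: Gate[]): boolean {
  const results = runGates(gates);
  return results.length === gates.length && results.every((r) => r.passed);
}

// Hypothetical example: the check fails, so commit and deploy never run.
const exampleGates: Gate[] = [
  { name: "check", run: () => false },
  { name: "commit", run: () => true },
  { name: "deploy", run: () => true },
];
const exampleShipped = reachedProduction(exampleGates);
```

The design choice worth noting is the early break: a failed check is not merely reported, it prevents the downstream stages from executing at all.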

§3 — Same-day model swap — Anthropic Memory from research preview to GA

Architectural foresight: build the shim, swap when the dependency unblocks.

On 2026-04-23, Anthropic moved Memory from research preview to public beta under the standard managed-agents-2026-04-01 API header. JieGou's managed-agent memory module swapped to the real API the same day. The pre-swap and post-swap tool names, argument shapes, and MCP server schemas were identical, so no caller (agent prompts, tests, MCP integrations) needed updating.

The mechanic that made this possible was architectural, not heroic. Months earlier, JieGou had built a Firestore-backed memory module (the agent_memory collection) shaped to match the likely form of Anthropic's future Memory API, as inferred from public signals. Callers reached the shim only through a stable interface. When Memory left research preview, the swap was a backend implementation change behind an unchanged contract.
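A minimal sketch of the shim-then-swap pattern follows. Every name in it is hypothetical: the AgentMemory interface, its method names, and the ProviderClient type are illustrations, not JieGou's code or Anthropic's API. A Map stands in for the Firestore collection so the example stays self-contained.

```typescript
// Stable contract that callers depend on (illustrative method names).
interface AgentMemory {
  write(agentId: string, key: string, value: string): Promise<void>;
  read(agentId: string, key: string): Promise<string | undefined>;
}

// Shim backend. A Map stands in for the Firestore-backed store.
class ShimMemory implements AgentMemory {
  private store = new Map<string, string>();

  async write(agentId: string, key: string, value: string): Promise<void> {
    this.store.set(`${agentId}/${key}`, value);
  }

  async read(agentId: string, key: string): Promise<string | undefined> {
    return this.store.get(`${agentId}/${key}`);
  }
}

// Placeholder for whatever client the provider eventually ships.
interface ProviderClient {
  put(path: string, value: string): Promise<void>;
  get(path: string): Promise<string | undefined>;
}

// Post-swap backend: same contract, different implementation.
class ProviderMemory implements AgentMemory {
  private client: ProviderClient;

  constructor(client: ProviderClient) {
    this.client = client;
  }

  async write(agentId: string, key: string, value: string): Promise<void> {
    await this.client.put(`${agentId}/${key}`, value);
  }

  async read(agentId: string, key: string): Promise<string | undefined> {
    return this.client.get(`${agentId}/${key}`);
  }
}

// Caller code sees only AgentMemory, so it is unchanged by the swap.
async function rememberPreference(
  memory: AgentMemory,
  agentId: string,
  preference: string,
): Promise<string | undefined> {
  await memory.write(agentId, "preference", preference);
  return memory.read(agentId, "preference");
}
```

Swapping backends then amounts to constructing ProviderMemory instead of ShimMemory at the one place the dependency is wired in; rememberPreference and every other caller compile and run unchanged.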

The same pattern repeated three more times in May 2026: Anthropic Outcomes, Multi-Agent Orchestration, and Dreaming — announced at Code with Claude on May 6, integrated as shims against the future API by May 22. When those capabilities themselves go GA, the swap is a backend change behind an unchanged contract.

What this means for a customer evaluating Operations Partner: model-layer obsolescence is the fastest-moving risk in the AI stack right now. Yesterday's frontier model is next quarter's mid-tier. Yesterday's research preview is next month's GA. The shim-then-swap pattern, applied at the Operations Partner layer, decouples the customer's workflows from that churn. The customer doesn't re-architect when OpenAI ships GPT-N or when Google ships Gemini-N. The Operations Partner runs the swap as a backend operation.

The long-form engineering essay covering this pattern in detail is "How JieGou implements Anthropic Managed Agents end-to-end" (essay 10 in the CIO series). The same discipline is the case study here.

§4 — 10-layer governance applied to JieGou itself

The framework we sell is the framework we use.

The 10-layer governance frame — published on the marketing site and at the heart of the Operations Partner pitch — is operated on JieGou itself before it's offered to anyone else. Concretely:

  • Layer 1 — Use-case fit + ROI baseline. Every new internal capability (agent, recipe, workflow) gets scoped against measurable improvement vs the prior state. Capabilities that can't articulate the baseline don't ship.
  • Layer 2-3 — Recipe + workflow design with approval gates and tool scopes. The 31 Composio toolkits and the 13 messaging channels are scoped per-recipe; no blanket-grant integration runs in production.
  • Layer 4 — Bakeoff harness. LLM-as-judge evaluation + ground-truth eval sets gate model and prompt changes. The bakeoff infrastructure is the same product customers use for their own evaluations.
  • Layer 5 — Knowledge integrations + sensitivity labels. 13 knowledge sources (Coveo, Glean, Elasticsearch, Confluence, Notion, Drive, etc.) are connected with per-label retrieval scoping. The PII layer classifies query inputs before they reach the LLM.
  • Layer 6 — Multi-LLM provider-portable runtime. Anthropic, OpenAI, and Google API keys held with per-provider circuit breakers. The same runtime used by Operations Partner customers handles JieGou's internal workflows.
  • Layer 7 — Shadow Mode → Tier 1 → Tier 2 → Tier 3 trust escalation. New workflows start in shadow, with operator review on every output, before they progress up the trust tiers. The PSKin LINE chat-agent runs at the shadow-write tier today.
  • Layer 8 — Hash-chain audit trail. Audit records are append-only with optional hash-chain integrity. The same audit infrastructure is offered to customers.
  • Layer 9-10 — Drift detection + retirement policy. The QA cadence above is the drift detector. The cadence itself is what surfaces regressions before customers see them.
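To make the Layer 6 idea of per-provider circuit breakers concrete, here is a minimal sketch. It is an assumption-laden illustration, not JieGou's runtime: the class, the threshold and cooldown parameters, and the fallback function are all hypothetical, and calls are synchronous here for brevity where a real provider call would be asynchronous.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // threshold and cooldownMs are hypothetical tuning knobs.
  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  state(): BreakerState {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  // Runs fn unless the breaker is open, and records the outcome.
  call<T>(fn: () => T): T {
    if (this.state() === "open") throw new Error("circuit open");
    try {
      const result = fn();
      this.failures = 0;
      this.openedAt = null; // success closes the breaker
      return result;
    } catch (err) {
      this.failures += 1;
      // Open on reaching the threshold, or re-open after a failed half-open trial.
      if (this.failures >= this.threshold || this.openedAt !== null) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Tries providers in order, skipping any whose breaker is open.
function callWithFallback<T>(
  providers: string[],
  breakers: Map<string, CircuitBreaker>,
  invoke: (provider: string) => T,
): { provider: string; result: T } {
  for (const provider of providers) {
    const breaker = breakers.get(provider);
    if (!breaker || breaker.state() === "open") continue;
    try {
      return { provider, result: breaker.call(() => invoke(provider)) };
    } catch {
      // The breaker recorded the failure; fall through to the next provider.
    }
  }
  throw new Error("no provider available");
}
```

Keeping one breaker per provider means a failing provider is skipped without penalizing the healthy ones, and the cooldown lets it be retried later without manual intervention.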

The point of running the framework on ourselves is not to claim purity — every operator finds places the framework's discipline rubs against schedule pressure. The point is that we know the rough edges from the inside. When a customer asks "what breaks when Layer 7 trust escalation moves a workflow from Shadow to Tier 1?" the answer comes from operating it, not from reading the slides.
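The Layer 7 escalation path can be modeled as a one-step-at-a-time state machine. The tier order below comes from the frame as described; the promotion rule (a minimum number of operator-reviewed outputs and a minimum approval rate) is purely an assumption for illustration, as are all the names.

```typescript
// Tier order as described in Layer 7.
const TIERS = ["shadow", "tier1", "tier2", "tier3"] as const;
type Tier = (typeof TIERS)[number];

interface ReviewStats {
  reviewed: number; // outputs an operator has reviewed at the current tier
  approved: number; // of those, how many the operator approved
}

// Hypothetical promotion rule; the real criteria are not stated in this case study.
interface PromotionPolicy {
  minReviewed: number;
  minApprovalRate: number;
}

// Returns the tier a workflow should be at after applying the policy.
// A workflow moves up at most one tier per evaluation and never skips.
function nextTier(current: Tier, stats: ReviewStats, policy: PromotionPolicy): Tier {
  const index = TIERS.indexOf(current);
  const next = TIERS[index + 1];
  if (next === undefined) return current; // already at the top tier
  if (stats.reviewed < policy.minReviewed) return current; // not enough evidence yet
  const approvalRate = stats.approved / stats.reviewed;
  return approvalRate >= policy.minApprovalRate ? next : current;
}
```

For example, with a hypothetical policy of { minReviewed: 50, minApprovalRate: 0.95 }, a shadow workflow with 100 reviewed outputs and 96 approvals would move to tier1, while one with only 10 reviewed outputs would stay in shadow regardless of its approval rate.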

§5 — What this proves about the Operations Partner shape

The discipline is the deliverable.

The thing customers buy when they buy Operations Partner is the discipline shown above, applied to their workflows. Three implications worth being explicit about:

1. The QA cadence is your QA cadence, eventually. When the Operations Partner integrates with your environment, the per-workflow test coverage follows the same monotonic pattern. The customer's workflow gets QA waves at the same rhythm. The numbers above are the proof that the cadence survives schedule pressure.

2. The model-swap optionality is your model-swap optionality. When Anthropic, OpenAI, or Google ships a new capability that improves the customer's workflow, Operations Partner runs the swap as a backend operation. The customer doesn't re-platform, doesn't re-procure, doesn't re-test. The shim-then-swap pattern shown above is the recurring mechanism.

3. The 10-layer frame is auditable by you. The governance frame is operated on JieGou itself before it's offered. Customers can ask, and have asked, to see a specific layer's implementation against the published frame. The answer is the source code, not the slide.

The honest disclosure: this isn't a customer-success story in the traditional sense. JieGou is the customer, JieGou is the provider, and the case study is self-attested. The reason it works as a case study anyway is that the underlying numbers (git commits, wave counts, ship times) are not self-attested — they're auditable. If you want to verify, the repo is at github.com/JieGouAI/orion (private, shareable on request under NDA).

Want to see this discipline applied to your workflows?

30-min discovery call. No deck. We walk through your existing AI footprint and what the QA cadence + model-swap optionality + 10-layer frame would look like applied to it. Honest about when Operations Partner is the right shape and when it's not.