An engineering-led mid-market manufacturer chose Operations Partner over building the last 20% in-house.

NDA-anonymized architecture case study. Seven-component decomposition, multi-LLM fail-over, hash-chain audit, VPC-portable deployment — without re-architecting between managed and self-hosted.


Customer is referenced as 'engineering-led mid-market manufacturer (NDA-protected)'. Sub-vertical, headcount, and geography deliberately omitted. Architecture details are reproducible across the engineering-led mid-market IT segment — not customer-specific.

§1 — The shape of the problem

They had built 60–90% themselves. The last 20% was structurally different.

The customer is an engineering-led mid-market manufacturer with an IT team that ships real software in production. Before the engagement they had already prototyped an AI document-extraction pipeline (invoice intake from a dedicated mailbox, structured output back to their middleware) using Power Automate, a handful of custom Python scripts, and a Claude API key. The first 60–90% of the work was done.

The remaining 20% wasn't more of the same. It was a structurally different class of work: vendor-format drift (the eighth vendor sent a PDF a Konica copier had flattened into a single column), per-tenant credential scoping (Graph API had been granted tenant-wide because nobody had time to scope it tighter), audit-trail evidence in the shape the general counsel would actually accept, and multi-channel handoff when the workflow needed to fan out beyond email.

The internal debate had three options:

1. Keep going in-house. Add three to six months of senior engineering time and budget the operational hardening as a roadmap line item. Defensible. The risk: the engineers doing operational hardening are the same engineers who could be working on the next initiative.
2. Hire a dedicated platform engineer. Right answer at five or more in-production pipelines. Wrong answer at one to four: the engineer divides time across the whole roadmap and the AI-ops layer gets fractional attention.
3. Bring in an Operations Partner whose entire product surface is that operational 20%.

The customer chose option 3 for the document-extraction workflow, with the explicit understanding that the architecture had to keep option 2 open. If the in-house team was ready to take over in twelve months, the system had to be portable.

§2 — Seven-component decomposition

The pipeline maps onto seven independently-deployable services. No managed-cloud-specific glue.

On the discovery call the customer described the product shape as "specify the email pattern to mine, the type of handler, and the output format / destination." That abstraction mapped onto seven JieGou components:

Intake — subscribes to the dedicated mailbox; identifies invoice emails among mixed inbound; deduplicates by Message-ID and content hash. Microsoft Graph delta subscription; polling fallback.
Extraction — clean-PDF parse path (pdfplumber + LLM verification) plus hard-scan vision path (Claude Sonnet 4.6 vision); per-field confidence scoring; cross-path agreement raises confidence, disagreement lowers it.
Metadata — resolves customer ID / account ID via a deterministic reference-table match with configurable fuzzy-match fallback. Not LLM-based. ID resolution requires determinism; LLMs are non-deterministic.
Structuring — renders the enriched record into the customer's downstream schema (XML / EDI); schema validation before emit; never sends invalid output to the middleware.
Handoff — adapter pattern: file drop with directory ACL, REST API with mTLS, or queue (Azure Service Bus / SQS). Customer chose the mechanism; Operations Partner configured to match.
Observability + Audit — structured JSON logging at every step; append-only audit table with optional hash-chain integrity (each row hashes the prior row's hash + current payload, making tamper detection straightforward). Export to JSON dump, customer-controlled S3 bucket, or forward-to-SIEM.
Review Surface — SvelteKit web application; OIDC via the customer's Entra ID; RBAC with Operator / Reviewer / Admin / Auditor roles. Every approve / edit / reject action logged to audit with full diff (what the AI proposed → what the operator approved).
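The Intake component's deduplication rule (by Message-ID and by content hash) can be illustrated with a minimal Python sketch. This is not the production implementation: the `Deduplicator` class and its in-memory sets are hypothetical, and a real service would presumably persist the seen keys in a durable store.

```python
import hashlib


class Deduplicator:
    """Hypothetical in-memory dedup keyed on Message-ID and attachment hash."""

    def __init__(self) -> None:
        self._seen_message_ids: set = set()
        self._seen_content_hashes: set = set()

    def is_duplicate(self, message_id: str, attachment: bytes) -> bool:
        # The content hash catches the same invoice re-sent or forwarded
        # under a new Message-ID; the Message-ID catches redelivery.
        content_hash = hashlib.sha256(attachment).hexdigest()
        duplicate = (
            message_id in self._seen_message_ids
            or content_hash in self._seen_content_hashes
        )
        # Record both keys so later copies are recognised either way.
        self._seen_message_ids.add(message_id)
        self._seen_content_hashes.add(content_hash)
        return duplicate
```

Checking both keys matters because either one alone misses a case: a forwarded invoice arrives with a fresh Message-ID, and a redelivered email may carry a re-encoded attachment.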

Each component is a separately-deployable containerized service. That decomposition is the thing that kept option 2 open. Twelve months from now, if the customer's team is ready to take over, no re-architecture is required — the same images redeploy to the customer's environment with a Helm values override.
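For the Metadata component in the list above, deterministic resolution with a configurable fuzzy fallback might look roughly like the following sketch. The reference table contents, the `resolve_account_id` name, and the 0.9 cutoff are all illustrative assumptions; the standard-library `difflib` matcher stands in for whatever fuzzy matcher the real service uses.

```python
import difflib
from typing import Optional

# Hypothetical reference table: normalized vendor name -> account ID.
REFERENCE_TABLE = {
    "acme industrial supply": "ACC-1001",
    "northwind fasteners": "ACC-1002",
}


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so lookups are repeatable."""
    return " ".join(name.lower().split())


def resolve_account_id(
    vendor_name: str, fuzzy_cutoff: Optional[float] = 0.9
) -> Optional[str]:
    key = normalize(vendor_name)
    # 1. Exact match against the reference table.
    if key in REFERENCE_TABLE:
        return REFERENCE_TABLE[key]
    # 2. Optional fuzzy fallback; pass fuzzy_cutoff=None to disable it.
    if fuzzy_cutoff is None:
        return None
    candidates = sorted(REFERENCE_TABLE)  # fixed order keeps ties stable
    matches = difflib.get_close_matches(key, candidates, n=1, cutoff=fuzzy_cutoff)
    return REFERENCE_TABLE[matches[0]] if matches else None
```

The same input always yields the same ID, and an unresolved name returns `None` (to be routed to review) rather than a best guess.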

§3 — Auth + secret model

Mailbox access scoped to one mailbox. LLM keys held by Operations Partner. Customer-side OIDC for operator SSO.

The customer flagged on the discovery call that Microsoft Graph's default grant is too broad and that their IT team disables scopes via PowerShell. The pilot access model addressed that explicitly:

Mailbox access. Entra app registration with Application Access Policy (New-ApplicationAccessPolicy -AppId <app-id> -PolicyScopeGroupId <dedicated-mailbox> -AccessRight RestrictAccess). Result: the app registration can only call Mail.Read against the dedicated pilot mailbox. Calls against any other mailbox return access-denied at the Exchange layer. Shared-mailbox delegation available as an alternative if Application Access Policy isn't viable in the tenant configuration.
LLM provider authentication. Anthropic, OpenAI, and Google API keys held in Operations Partner's AWS Secrets Manager. Customer keys not required and not used. No provisioning burden on the customer's IT team for LLM access. Quarterly rotation default.
Handoff authentication. Matched to the customer's middleware: directory ACL for file drop, mTLS for API, SAS token or managed identity for queue. Locked in pre-pilot with customer IT.
Operator authentication. OIDC via the customer's Entra ID. No new identity store, no separate password to manage, no shadow-IT credential.
Secret storage + audit. AWS Secrets Manager with CloudTrail audit on every secret-access event. No secrets in environment variables, code, or logs.

The point of including this section is not the specific mechanism — it's that the auth model was an explicit conversation with the customer's IT lead before pilot kickoff, with the customer's tightening preferences captured and applied. That conversation is standard practice across the Operations Partner shape.

§4 — LLM primary + fail-over + training-data posture

Multi-provider orchestration for reliability, not for marketing.

Primary: Anthropic Claude Sonnet 4.6 — used for both extraction and hard-case OCR via Claude's vision capability. Selected for invoice-extraction accuracy on the hard-scan path (the differentiated 20%), reliable structured-output mode, and strong instruction-following on field-by-field confidence reporting.

Fail-over: OpenAI GPT-4 (vision-capable) and Google Gemini as secondary fail-overs. Circuit breaker trips on per-provider error rate, timeout, or latency SLO miss; routes to the next provider automatically; ops alert on sustained primary-down.
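The fail-over behaviour described above can be sketched as a per-provider circuit breaker plus an ordered provider list. This is a simplified illustration under stated assumptions: it trips on consecutive failures rather than a true error rate, the provider callables are placeholders rather than real SDK calls, and `CircuitBreaker`, `extract_with_failover`, and the thresholds are hypothetical names and values.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class CircuitBreaker:
    """Opens after max_failures consecutive failures; retries after cooldown_s."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, let one trial call through (half-open).
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


def extract_with_failover(
    document: bytes,
    providers: List[Tuple[str, Callable[[bytes], dict]]],
    breakers: Dict[str, CircuitBreaker],
    latency_slo_s: float = 20.0,
) -> Tuple[str, dict]:
    """Try providers in priority order, skipping any whose breaker is open."""
    errors = []
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.available():
            errors.append(f"{name}: circuit open")
            continue
        started = time.monotonic()
        try:
            result = call(document)
        except Exception as exc:  # provider error or timeout
            breaker.record_failure()
            errors.append(f"{name}: {exc!r}")
            continue
        if time.monotonic() - started > latency_slo_s:
            breaker.record_failure()  # a slow success still counts against the SLO
        else:
            breaker.record_success()
        return name, result
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

In this sketch the ops alert on sustained primary-down would hang off the breaker opening; that wiring is omitted here.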

Training-data posture (explicit). Operations Partner does not train or fine-tune models on the customer's data. Anthropic, OpenAI, and Google all contractually commit (commercial API tier) to not training on customer data. DPAs from all three providers available on request. Apart from the inference calls to those providers, the customer's invoice content does not leave the Operations Partner managed cloud except as the final structured output back to the customer's middleware, and is not retained in long-term object storage past the configurable processing horizon (default 7 days for reprocessing; configurable to zero beyond the audit-record hash).

§5 — Hash-chain audit

Tamper-evident audit trail in the shape the general counsel will actually accept.

The pre-engagement audit posture was structured JSON written to standard output, retained for 28 days, with no schema enforcement. The general counsel had asked once, two years out, what the AI decided on a specific date and who approved it. Answering took two engineers eight hours of reconstructing CloudWatch streams. That experience drove the audit-trail conversation in the pilot.

The post-engagement architecture records, per invoice, every processing step's input, output, model used, version, confidence scores, and timestamp; and every operator action — who, what, when, with what edits — in full diff against the AI proposal. Storage is append-only (Postgres or Firestore, both viable); each row hashes the prior row's hash plus the current payload. Tampering with any row breaks the chain at every subsequent row. AES-256-GCM at rest with per-tenant encryption keys.
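The hash-chain rule (each row hashes the prior row's hash plus the current payload) can be sketched in a few lines of Python. The choice of SHA-256, the all-zero genesis value, the canonical-JSON serialization, and the function names are assumptions made for illustration; the source does not specify them.

```python
import hashlib
import json
from typing import List, Optional

GENESIS_HASH = "0" * 64  # assumed placeholder for "no prior row"


def row_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the same payload
    # always serializes to the same bytes.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_row(chain: List[dict], payload: dict) -> dict:
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    row = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": row_hash(prev_hash, payload),
    }
    chain.append(row)
    return row


def first_broken_index(chain: List[dict]) -> Optional[int]:
    """Return the index of the first row that fails verification, or None."""
    prev_hash = GENESIS_HASH
    for i, row in enumerate(chain):
        expected = row_hash(prev_hash, row["payload"])
        if row["prev_hash"] != prev_hash or row["hash"] != expected:
            return i
        prev_hash = row["hash"]
    return None
```

Because each hash folds in its predecessor, editing one payload invalidates that row, and recomputing its hash to hide the edit would mismatch the next row's `prev_hash`. That is why a retained hash can still verify a historical claim after the raw record has aged out.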

Retention defaults to one year for full per-invoice trace, configurable per the customer's policy (longer for compliance, shorter if preferred). The audit-record hash is retained indefinitely — long after raw records age out, a historical claim can still be verified against the chain. Export options include on-demand JSON dump, customer-controlled S3 push, and forward-to-SIEM (Splunk, Microsoft Sentinel, syslog — the customer's tooling, delivered in the format the SIEM wants).

The operator-action audit is queryable separately from the system-action audit. Investigations into "who approved this draft?" don't have to filter through "what did the LLM say?" Both share the hash chain; the queries are separate.

§6 — Deployment shape, designed to keep all options open

Shape A today. Shape B or C without re-architecture when the customer's team is ready.

Three deployment shapes were on the table at pilot scoping:

Shape A — Operations Partner managed cloud. What the pilot runs today. Single-tenant VPC for the customer; isolated compute and storage. Operations Partner owns the platform; customer owns the workflow outputs.
Shape B — VPC hybrid. The same containers deployed inside the customer's network; Operations Partner cloud holds only the control plane (auth, audit aggregation, ops monitoring). Data never leaves the customer's network except as audit aggregates. The Phase 17 hybrid architecture is already built and shipping; the reference VPC agent is in console/code/scripts/vpc-agent/.
Shape C — Fully self-hosted. Customer operates the platform end-to-end. Operations Partner provides containers, runbook, training, support. Maximum sovereignty; maximum operational burden on customer IT.

The pilot runs Shape A today, with the Week 4 review explicitly able to flip to Shape B or C. Because the architecture is decomposed into containerized services with no managed-cloud-specific glue, the transition is a packaging change, not a re-architecture. The same ECR images that run in Operations Partner's managed cloud today redeploy to the customer's environment with a Helm values override.

That portability is what the customer paid for. Not the pilot throughput. Not the model selection. The portability — because it keeps the in-house-takeover option open at twelve months without stranding the work done in the first three.

§7 — What this enabled

Optionality at every layer that mattered to the CIO.

What the customer ended up with — at every layer where the original in-house build had been about to compromise:

LLM portability. When Anthropic shipped Claude 4.6, the pilot swapped models same-day. When Anthropic ships 4.7, same pattern. The customer didn't re-negotiate, didn't re-architect, didn't ask permission.
Deployment portability. Shape A → Shape B is a Helm values flip. The in-house-takeover scenario at twelve months is no longer a re-platforming project.
Audit portability. Forward-to-SIEM means the customer's existing tools — Splunk, Sentinel — see the audit feed in their format. No new tool to procure, no new dashboard to learn.
Operator-team portability. The Review Surface uses OIDC via the customer's Entra ID. When the security team rotates the operator group's membership in Entra, the Review Surface reflects it without an Operations-Partner-side change.
Operational accountability portability. One named human on the Operations Partner side is responsible at 3am: the founder at pilot kickoff, and a named owner on the Operations Partner team as the partnership scales. The customer's CIO has one number to call.

§8 — Disclosure + how to read this

NDA-clean, architecture-grade, operator-honest.

The customer is NDA-protected. Sub-vertical, headcount, geography, and the engagement timeline are deliberately omitted from this page. The architecture details are reproducible across the engineering-led mid-market IT segment — the case study is meant to show you the shape of what you'd get from the Operations Partner relationship, not to claim that your situation will land at exactly these specifics.

The named-reference unlock for this customer is in flight. When it happens, this page gets updated with the customer's name and a quote. Until then, the architecture story is what's available — and per Operations Partner discipline, the architecture story is actually the substantive part of the case study. Logos are decoration; the seven-component decomposition is the answer to "what would we get?"

If you want the architect-to-architect version of this material with your team's specifics overlaid — your existing stack, your middleware, your IT-sovereignty preference — that's what the 30-minute discovery call is for.

30-min discovery call. We'll walk through the architecture with your team's specifics overlaid.

No deck. No demo. We look at your existing stack, identify whether Operations Partner is the right shape — or whether build / hire / consultant fits better. Honest either way.
