Question 1

How does your authentication scope to a single source without granting broader access?

Accepted Answer

For Microsoft Graph: an Entra app registration restricted by an Application Access Policy (configured via Exchange Online PowerShell: New-ApplicationAccessPolicy with -AccessRight RestrictAccess). Result: the app registration can exercise its Mail.Read permission only against the specific mailbox, or the members of the mail-enabled security group, named in the policy; calls against any other mailbox return access-denied. Alternative mechanism: shared-mailbox delegation. Your IT team chooses the mechanism; we configure to match. For other source systems (Slack, Teams, REST APIs): equivalent narrow-scope OAuth grants per platform.

Question 2

How are LLM API keys secured?

Accepted Answer

Keys are held in AWS Secrets Manager (default) or your own KMS / Vault if you prefer. Rotation is quarterly by default; on-demand rotation is supported. Every secret-access event is audited in CloudTrail. No secrets are stored in environment variables, code, or logs. For BYOK customers: the keys you provision are used and are never transmitted to the JieGou control plane.

Question 3

What is the LLM training-data posture?

Accepted Answer

JieGou does NOT train or fine-tune models on customer data. Anthropic, OpenAI, and Google all contractually commit (commercial API tier) to not training on customer data; DPAs from all three providers are available on request. We do not augment LLM training with your inputs, outputs, edits, or operator actions.

Question 4

What goes in the audit log and how long is it retained?

Accepted Answer

Every processing step: input, output, model used (where applicable), version, confidence scores, timestamp. Every operator action: who, what, when, with what edits (full diff). Records go to an append-only table with optional hash-chain integrity. Full per-record traces are retained for 1 year by default; configurable per your policy. The audit-record hash is retained indefinitely, which lets you verify a historical claim against the chain even after the raw record's TTL has expired.
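
How a retained hash can settle a historical claim after the raw record is gone can be sketched in a few lines of Python. This is a minimal illustration, not JieGou's actual schema: it assumes SHA-256 over canonical JSON and an all-zero genesis value, and the function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed sentinel used as the predecessor of the first record


def record_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the previous hash, using canonical JSON."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list[str], record: dict) -> str:
    """Append the record's hash to the chain and return it."""
    prev = chain[-1] if chain else GENESIS
    digest = record_hash(prev, record)
    chain.append(digest)
    return digest


def verify_claim(chain: list[str], index: int, claimed: dict) -> bool:
    """Check a claimed historical record using only the retained hashes."""
    prev = chain[index - 1] if index > 0 else GENESIS
    return record_hash(prev, claimed) == chain[index]


chain: list[str] = []
first = {"actor": "system", "step": "extraction", "model": "example-model"}
second = {"actor": "operator-1", "action": "approve", "diff": "-foo +bar"}
append(chain, first)
append(chain, second)

# The raw records may have expired; the hashes alone still settle a claim.
print(verify_claim(chain, 1, second))                           # True
print(verify_claim(chain, 1, {**second, "diff": "-foo +baz"}))  # False
```

Because each hash covers its predecessor, altering any earlier record would change every later hash, which is what makes the chain useful as tamper evidence.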

Question 5

How do I export audit data to my SIEM?

Accepted Answer

Three options: JSON dump (on-demand or scheduled); S3 bucket you control (push); forward-to-SIEM in your format of choice (Splunk, Microsoft Sentinel, generic syslog). Plus an API for ad-hoc query during pilot phases. Operator-action audit is separate from system-action audit so investigations into "who approved this draft?" don't have to filter through system logs.

Question 6

What does Shadow Mode actually mean in practice?

Accepted Answer

AI drafts; human approves before any output reaches a customer or modifies a system. Configurable per workflow: low-impact actions can run autonomously (drafting, summarizing); high-impact actions stay drafts until a named human approves (sending, modifying, deleting). Approval queue with notification routing (email + Slack + Teams). Sub-1-minute review per draft is the design target. Every approval audit-logged with operator identity + diff between AI-proposed and operator-approved output.
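
The gating rule described above can be sketched as a small Python model. It is an illustration under stated assumptions: the action names come from the examples in this answer, while the `Draft` type, the `approve` and `may_execute` helpers, and the choice to block unknown actions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed classification, taken from the examples in the answer above.
LOW_IMPACT = {"draft", "summarize"}
HIGH_IMPACT = {"send", "modify", "delete"}


@dataclass
class Draft:
    action: str
    proposed: str                      # AI-proposed output
    approved_by: Optional[str] = None  # named operator, once approved
    approved: Optional[str] = None     # operator-approved output


def approve(draft: Draft, operator: str, final_text: str) -> dict:
    """Record an approval and return an audit entry holding both versions."""
    draft.approved_by = operator
    draft.approved = final_text
    return {
        "operator": operator,
        "action": draft.action,
        "proposed": draft.proposed,
        "approved": final_text,
        "edited": final_text != draft.proposed,
    }


def may_execute(draft: Draft) -> bool:
    """Low-impact actions run autonomously; high-impact ones need approval."""
    if draft.action in LOW_IMPACT:
        return True
    if draft.action in HIGH_IMPACT:
        return draft.approved_by is not None
    return False  # unknown actions stay blocked (an assumption of this sketch)


reply = Draft(action="send", proposed="Hello, your order shipped.")
print(may_execute(reply))  # False
entry = approve(reply, "alice@example.com", "Hello, your order has shipped.")
print(may_execute(reply))  # True
print(entry["edited"])     # True
```

Keeping both the proposed and the approved text in the audit entry is what allows a diff to be reconstructed later.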

Question 7

What is the SOC 2 status?

Accepted Answer

SOC 2 Type II: in flight; target Q4 2026. Type I attestation is available now on request. DPA available; it covers JieGou plus sub-processor pass-through. Sub-processor list: Anthropic (LLM), OpenAI (LLM fail-over), Google (LLM fail-over), AWS (infrastructure: EKS, S3, Secrets Manager, CloudTrail, CloudWatch). GDPR-ready data residency (region-pinning available). ISO 27001: on roadmap; not certified. Industry-specific: not HIPAA-eligible by default (assessed per engagement); PCI-DSS-eligible architecture (encryption + audit) but not currently certified.

Question 8

Can I deploy this entirely inside my own VPC?

Accepted Answer

Yes. Phase 17 hybrid architecture (already built and shipping) deploys the same containers inside your network; JieGou cloud holds only the control plane (auth, audit aggregation, ops monitoring). Same image tags as our managed cloud; same Helm chart; same operational behavior. Customer-VPC is the default deployment option for engineering-led IT teams. For full sovereignty (no JieGou control plane connection), Option C self-hosted is also available.

Customer framing	Component	Purpose	Implementation	Failure mode
Pattern to mine	Intake	Subscribes to data sources (email, API, queue); identifies relevant events; deduplicates by Message-ID + content hash	Microsoft Graph subscription with delta query; fallback to scheduled poll; webhook listeners	Source unreachable → exponential backoff with operator alert at configurable threshold
Type of handler (parse)	Extraction	Parses content; produces per-field values with confidence scores; routes hard cases through alternate paths	Anthropic Claude Sonnet 4.6 primary (extraction + vision for hard scans); OpenAI/Gemini failover via circuit breaker	Low-confidence field flagged for review; whole-record failure routes to exception queue
Type of handler (enrich)	Metadata	Resolves lookup fields against authoritative sources (reference tables, ERPs)	Deterministic fuzzy-match against reference data; NOT LLM-based (ID resolution requires determinism)	No-match flags for human review (never silently autoresolved); stale reference data triggers operator alert
Output format	Structuring	Renders enriched record in target schema (XML / EDI / JSON)	Schema-driven template engine; schema version-pinned; validates output before emitting	Schema validation failure → exception queue; never emits invalid output downstream
Output destination	Handoff	Delivers structured output to your existing middleware	Adapter pattern: file drop (directory ACL), REST API (mTLS), queue (SAS token / managed identity) — your choice	Delivery failure → retry policy + DLQ + Review Surface alert; idempotency key prevents duplicate processing
Governance layer	Observability + Audit	Captures every system + operator action with full provenance	Structured JSON logging; append-only audit table; optional hash-chain integrity; export to SIEM (Splunk / Sentinel / syslog)	Logging backpressure → upstream services degrade gracefully; audit-table write failure halts pipeline (never process without audit)
Human oversight	Review Surface	Web UI for operators to review, approve, edit, reject AI-drafted outputs before they reach customers or modify systems	SvelteKit; SSO via OIDC (Entra / Okta / Google Workspace); RBAC (Operator / Reviewer / Admin / Auditor)	Review Surface down → outputs queue at Structuring; no auto-handoff without operator approval
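
As one illustration of the Intake row above, deduplication by Message-ID plus content hash might look like the following Python sketch. It assumes the two values form a composite key, so a redelivered message is dropped while the same Message-ID with different content is treated as new; the `Deduplicator` class and the in-memory set are hypothetical, and a real deployment would need durable storage.

```python
import hashlib


class Deduplicator:
    """Tracks (Message-ID, content hash) pairs already seen by intake."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()

    @staticmethod
    def key(message_id: str, body: bytes) -> tuple[str, str]:
        """Build the composite key from the Message-ID and a SHA-256 of the body."""
        return (message_id.strip(), hashlib.sha256(body).hexdigest())

    def is_new(self, message_id: str, body: bytes) -> bool:
        """Return True the first time a pair is seen, False on any repeat."""
        k = self.key(message_id, body)
        if k in self._seen:
            return False
        self._seen.add(k)
        return True


dedup = Deduplicator()
print(dedup.is_new("<a1@example.com>", b"invoice 42"))     # True
print(dedup.is_new("<a1@example.com>", b"invoice 42"))     # False (redelivery)
print(dedup.is_new("<a1@example.com>", b"invoice 42 v2"))  # True (content changed)
```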

Crossing	Data	Classification	Encryption	Retention
Source → JieGou (intake)	Raw event content + attachments	Confidential (PII; financial data; customer-specific)	TLS 1.2+ in transit; AES-256-GCM at rest	7 days default in processing storage; configurable to 0 beyond audit-record hash
Internal component → component (within JieGou perimeter)	Extraction records, enriched records, structured payloads	Confidential	TLS 1.2+ between services; AES-256-GCM at rest	Same as intake
Structuring → Handoff → customer middleware	Structured output payload	Confidential	Per chosen handoff (TLS for API; ACL + at-rest encryption for file drop; broker-managed for queue)	Customer-controlled post-handoff
Audit → customer SIEM (optional)	Audit records	Confidential	TLS in transit	Customer-controlled in SIEM
Operator → Review Surface	Operator approvals + edits	Confidential	TLS 1.2+; session via OIDC	Logged to audit per audit retention
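
The intake retention rule in the table above (7 days by default, configurable down to 0, with the audit-record hash kept) can be sketched in Python as follows. The record layout, the `purge_expired` helper, and the placeholder hash strings are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(records: list[dict], now: datetime,
                  retention: timedelta = timedelta(days=7)) -> int:
    """Drop raw payloads past retention, keep each hash, return the count purged."""
    purged = 0
    for rec in records:
        expired = now - rec["received_at"] >= retention
        if rec["payload"] is not None and expired:
            rec["payload"] = None  # raw content removed; rec["sha256"] untouched
            purged += 1
    return purged


now = datetime(2026, 1, 10, tzinfo=timezone.utc)
records = [
    # "sha256" values are placeholders, not real digests.
    {"received_at": now - timedelta(days=8), "payload": b"old", "sha256": "ab12"},
    {"received_at": now - timedelta(days=1), "payload": b"new", "sha256": "cd34"},
]
print(purge_expired(records, now))                          # 1
print(purge_expired(records, now, retention=timedelta(0)))  # 1
```

With retention set to zero, every remaining payload is dropped on the next pass while the hashes stay available for verification.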

Reference architecture for engineering-led IT teams.

Seven components. Each separately deployable. Each with named failure mode.

Three trust boundaries. Two of them you control.

Ten named failure modes. Each with detection + defined behavior.

Three deployment options. Same containers. Packaging change, not re-architecture.

Explicit boundaries. The things you should not have to ask about.

Multi-provider for reliability. Not for marketing.

This is the architecture your engineering lead can evaluate without a sales rep present.

Questions engineering leads ask before procurement.
