Operations Hub: Revenue, Availability, and Security Monitoring in One Dashboard

JieGou’s Operations Hub started as a place to see what your AI automations were doing — which departments were active, who had permission to do what, how many runs were happening. Useful, but incomplete. If you’re running a business on JieGou, you need to see more than automation metrics.

Today we’re expanding the Operations Hub with revenue analytics, availability monitoring, billing health, user adoption tracking, security monitoring, and a dead letter queue dashboard. It’s a full SaaS operations center.

Revenue analytics

The revenue dashboard pulls live data from Stripe to show the numbers that matter:

MRR and ARR — Calculated from active subscriptions, normalized for annual plans. You see total MRR, a breakdown by plan tier, and trailing trend data. ARPU is computed automatically from paying account count.
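To make the normalization concrete, here is a minimal sketch of how MRR, ARR, and ARPU could be derived from a list of subscriptions. The `Subscription` fields and the `revenue_summary` helper are illustrative assumptions, not JieGou's or Stripe's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Subscription:          # hypothetical shape, not the Stripe object
    account_id: str
    plan_tier: str
    amount_cents: int        # price per billing interval
    interval: str            # "month" or "year"
    status: str = "active"

def monthly_cents(sub: Subscription) -> float:
    """Normalize a price to a monthly amount (annual plans are divided by 12)."""
    return sub.amount_cents / 12 if sub.interval == "year" else float(sub.amount_cents)

def revenue_summary(subs):
    active = [s for s in subs if s.status == "active"]
    mrr_by_tier = {}
    for s in active:
        mrr_by_tier[s.plan_tier] = mrr_by_tier.get(s.plan_tier, 0.0) + monthly_cents(s)
    mrr = sum(mrr_by_tier.values())
    paying = {s.account_id for s in active if s.amount_cents > 0}
    return {
        "mrr": mrr,
        "arr": mrr * 12,
        "mrr_by_tier": mrr_by_tier,
        "arpu": mrr / len(paying) if paying else 0.0,
    }

subs = [
    Subscription("a1", "team", 4900, "month"),
    Subscription("a2", "enterprise", 120000, "year"),
    Subscription("a3", "team", 4900, "month", status="canceled"),
]
summary = revenue_summary(subs)
```

In this example the annual enterprise plan contributes 10,000 cents per month, and the canceled subscription is excluded from every figure.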

Churn and retention — Logo churn rate (accounts lost), revenue churn rate (MRR lost), and Net Revenue Retention (NRR). NRR accounts for expansion revenue from upgrades and contraction from downgrades, giving you the full picture of whether existing customers are growing or shrinking. Expansion and contraction are tracked via Stripe subscription update events with previous_attributes comparison, so every plan change is captured.
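As a rough illustration of how these three rates relate, the sketch below compares per-account MRR at the start and end of a period. It assumes one common set of definitions (revenue churn counts only fully lost accounts, and NRR excludes new accounts); the function name and inputs are hypothetical.

```python
def retention_metrics(start_mrr: dict, end_mrr: dict) -> dict:
    """start_mrr / end_mrr map account_id -> MRR for accounts paying at period start."""
    starting = sum(start_mrr.values())
    churned_accounts = [a for a in start_mrr if end_mrr.get(a, 0) == 0]
    churned = sum(start_mrr[a] for a in churned_accounts)
    # Expansion: MRR gained from upgrades on surviving accounts.
    expansion = sum(max(end_mrr.get(a, 0) - m, 0) for a, m in start_mrr.items())
    # Contraction: MRR lost to downgrades on accounts that are still paying.
    contraction = sum(
        max(m - end_mrr[a], 0) for a, m in start_mrr.items() if end_mrr.get(a, 0) > 0
    )
    return {
        "logo_churn_rate": len(churned_accounts) / len(start_mrr),
        "revenue_churn_rate": churned / starting,
        "nrr": (starting + expansion - contraction - churned) / starting,
    }

metrics = retention_metrics(
    start_mrr={"a": 100, "b": 200, "c": 50, "d": 50},
    end_mrr={"a": 150, "b": 150, "c": 0, "d": 50},
)
```

Here one of four accounts churns (25% logo churn), the upgrade on `a` offsets the downgrade on `b`, and NRR lands at 87.5%.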

Unit economics — Per-account view of MRR versus cost (token usage). JieGou calculates margin percentage for each account, so you can identify which customers are profitable and which are consuming more than they pay for. Costs are tracked from usage records and aggregated monthly.
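The margin figure is simple arithmetic. A minimal sketch, assuming margin is defined as (MRR minus cost) divided by MRR and that accounts with no MRR have no defined margin:

```python
from typing import Optional

def margin_pct(mrr: float, monthly_cost: float) -> Optional[float]:
    """Margin as a percentage of MRR; None when there is no revenue to divide by."""
    if mrr <= 0:
        return None
    return (mrr - monthly_cost) * 100 / mrr

profitable = margin_pct(mrr=100.0, monthly_cost=40.0)     # healthy account
underwater = margin_pct(mrr=50.0, monthly_cost=80.0)      # costs exceed revenue
free_tier = margin_pct(mrr=0.0, monthly_cost=5.0)         # no revenue
```

A negative result flags an account that consumes more than it pays for.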

Revenue snapshots are stored daily, enabling period-over-period comparison and accurate churn calculation against a known starting baseline.

Billing health

Revenue can look healthy in aggregate while individual billing problems hide in the noise. The billing health dashboard surfaces them:

Failed payments — Charges that failed in the last 30 days, with failure reason and amount at risk
Past-due subscriptions — Accounts with overdue payments that need dunning attention
Recent refunds — Refunds issued in the last 30 days with reason codes
Upcoming renewals — Subscriptions renewing in the next 7 days, so you can proactively address any issues
Revenue reconciliation — Expected MRR versus actual revenue collected (including overage charges), with a discrepancy percentage
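The reconciliation figure can be read as a signed percentage. The sketch below assumes the discrepancy is measured relative to expected MRR; the function and parameter names are illustrative.

```python
def reconciliation_discrepancy_pct(expected_mrr: float,
                                   subscription_revenue: float,
                                   overage_revenue: float) -> float:
    """Signed gap between collected revenue (including overage) and expected MRR."""
    actual = subscription_revenue + overage_revenue
    return (actual - expected_mrr) * 100 / expected_mrr

gap = reconciliation_discrepancy_pct(
    expected_mrr=10_000.0, subscription_revenue=9_300.0, overage_revenue=200.0
)
```

A negative value means less was collected than the subscriptions imply, which usually points back to the failed-payment and past-due lists.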

User adoption

Knowing who pays is one thing. Knowing who’s actually using the product is another.

Active users — DAU, WAU, and MAU tracked via Redis HyperLogLog. HyperLogLog estimates unique-user counts with an error margin typically under 2% (Redis documents a standard error of 0.81%) while using minimal memory, because it never stores individual user IDs. The DAU/MAU ratio tells you at a glance how sticky the product is.
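One way to structure this is a HyperLogLog key per day, merged at read time. The sketch below assumes a hypothetical `active:<date>` key scheme and a client exposing `pfadd` and `pfcount` with the same call shape as redis-py. The `ExactSetClient` class is only a stand-in so the example runs without a Redis server; it stores exact sets, not real HyperLogLog registers.

```python
from datetime import date, timedelta

def day_key(d: date) -> str:
    return f"active:{d.isoformat()}"          # hypothetical key scheme

def record_activity(client, user_id: str, d: date) -> None:
    client.pfadd(day_key(d), user_id)

def active_users(client, end: date, days: int) -> int:
    # PFCOUNT over several keys returns the cardinality of their union.
    keys = [day_key(end - timedelta(days=i)) for i in range(days)]
    return client.pfcount(*keys)

def stickiness(client, today: date) -> float:
    mau = active_users(client, today, 30)
    return active_users(client, today, 1) / mau if mau else 0.0

class ExactSetClient:
    """Stand-in for a Redis client: same two methods, exact sets instead of HLL."""
    def __init__(self):
        self.sets = {}
    def pfadd(self, name, *values):
        s = self.sets.setdefault(name, set())
        before = len(s)
        s.update(values)
        return int(len(s) > before)
    def pfcount(self, *sources):
        return len(set().union(*(self.sets.get(k, set()) for k in sources)))

client = ExactSetClient()
today = date(2025, 1, 31)
record_activity(client, "u1", today)
record_activity(client, "u2", today)
record_activity(client, "u1", today - timedelta(days=10))
record_activity(client, "u3", today - timedelta(days=10))
record_activity(client, "u4", today - timedelta(days=40))   # outside the 30-day window
```

With a real Redis client the counts would be estimates rather than exact values, but the key layout and the DAU/MAU arithmetic stay the same.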

Feature adoption — Ten key features are tracked individually: chat, agent, workflows, schedules, triggers, bakeoffs, batch runs, documents, brand voice, and MCP tools. For each feature, you see unique accounts using it and total usage count, plus an adoption rate against total paying accounts.

Activation funnel — Seven milestones from first login to power user: pick department, run first AI task, give feedback, schedule a task, invite teammate, create workflow, view quality trend. Conversion rate at each stage shows where users get stuck.
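Stage-to-stage conversion is each milestone's count divided by the count for the milestone before it. A minimal sketch using the seven milestones above; the counts are made-up sample data and the `funnel` helper is hypothetical.

```python
MILESTONES = [
    "pick department", "run first AI task", "give feedback", "schedule a task",
    "invite teammate", "create workflow", "view quality trend",
]

def funnel(counts: dict) -> list:
    """Return (milestone, users, conversion_from_previous_stage) rows."""
    rows, prev = [], None
    for name in MILESTONES:
        n = counts.get(name, 0)
        rate = None if prev is None else (n / prev if prev else 0.0)
        rows.append((name, n, rate))
        prev = n
    return rows

sample_counts = dict(zip(MILESTONES, [1000, 800, 400, 300, 150, 120, 60]))
rows = funnel(sample_counts)
# The stage with the lowest conversion is where users get stuck first.
worst = min((r for r in rows if r[2] is not None), key=lambda r: r[2])
```

In this sample, the largest early drop is at "give feedback", where only half of the users who ran a first task continue.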

A 30-day DAU trend chart rounds out the picture, showing daily active user counts over the last month.

Availability and SLA monitoring

Uptime isn’t optional. The availability monitoring system records health checks at per-minute resolution, tracking both Firestore and Redis component status.

Uptime calculation — Current month and rolling 30-day uptime percentage, measured against a 99.9% SLA target. A status indicator goes green (>= 99.9%), yellow (>= 99.5%), or red (< 99.5%). Error budget remaining shows how many minutes of degradation you can absorb before breaching the SLA.
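The thresholds and error budget follow directly from the per-minute checks. A minimal sketch, assuming one health check per minute and that the budget is computed over the same window as the checks; `uptime_report` is an illustrative name.

```python
SLA_TARGET_PCT = 99.9

def uptime_report(total_checks: int, healthy_checks: int) -> dict:
    uptime_pct = healthy_checks * 100 / total_checks
    if uptime_pct >= 99.9:
        status = "green"
    elif uptime_pct >= 99.5:
        status = "yellow"
    else:
        status = "red"
    # With one check per minute, each failed check costs one minute of budget.
    budget_minutes = total_checks * (100 - SLA_TARGET_PCT) / 100
    used_minutes = total_checks - healthy_checks
    return {
        "uptime_pct": uptime_pct,
        "status": status,
        "error_budget_remaining_min": budget_minutes - used_minutes,
    }

# A 30-day window has 43,200 minutes, so a 99.9% target allows about 43.2 minutes down.
report = uptime_report(total_checks=43_200, healthy_checks=43_180)
degraded = uptime_report(total_checks=43_200, healthy_checks=43_000)
```

The first report stays green with roughly 23 minutes of budget left; the second has overspent its budget and drops to yellow.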

Auto-incident detection — Three consecutive health check failures automatically create an incident record. Incidents are categorized by severity (minor, major, critical) based on how many components are affected. When health returns to normal, the incident auto-resolves.
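The open-and-resolve behavior can be modeled as a small state machine. The sketch below is an assumption-laden illustration: the class names are hypothetical, and the mapping from affected-component count to severity (one is minor, two is major, more is critical) is a guess, since the exact rule is not spelled out here.

```python
from dataclasses import dataclass
from typing import Optional

FAILURE_THRESHOLD = 3   # consecutive failed checks before an incident opens

@dataclass
class Incident:
    opened_at: int                      # minute index of the check that opened it
    severity: str
    resolved_at: Optional[int] = None

def severity_for(failed_components: set) -> str:
    n = len(failed_components)          # assumed mapping, see lead-in
    return "minor" if n <= 1 else "major" if n == 2 else "critical"

class IncidentDetector:
    def __init__(self):
        self.consecutive_failures = 0
        self.open_incident: Optional[Incident] = None
        self.history = []

    def record_check(self, minute: int, component_ok: dict) -> None:
        failed = {name for name, ok in component_ok.items() if not ok}
        if failed:
            self.consecutive_failures += 1
            if self.open_incident is None and self.consecutive_failures >= FAILURE_THRESHOLD:
                self.open_incident = Incident(minute, severity_for(failed))
                self.history.append(self.open_incident)
        else:
            self.consecutive_failures = 0
            if self.open_incident is not None:   # health is back: auto-resolve
                self.open_incident.resolved_at = minute
                self.open_incident = None

detector = IncidentDetector()
redis_ok = [True, False, False, True, False, False, False, False, True]
for minute, ok in enumerate(redis_ok):
    detector.record_check(minute, {"firestore": True, "redis": ok})
```

The two failures at minutes 1 and 2 do not open an incident because a healthy check resets the counter; the run starting at minute 4 opens one at minute 6, and it resolves at minute 8.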

Incident metrics — MTTR (Mean Time to Resolve) and MTBF (Mean Time Between Failures) over a 90-day window. These are the numbers auditors and enterprise buyers ask for.
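Both figures can be computed from incident open and resolve times. The sketch below uses one common definition of MTBF (uptime in the window divided by the number of incidents); other definitions exist, and the names and sample data are illustrative.

```python
WINDOW_MINUTES = 90 * 24 * 60   # 90-day window

def incident_metrics(incidents, window_minutes=WINDOW_MINUTES):
    """incidents: list of (opened_minute, resolved_minute) within the window."""
    if not incidents:
        return {"mttr_min": None, "mtbf_min": None}
    durations = [resolved - opened for opened, resolved in incidents]
    downtime = sum(durations)
    return {
        "mttr_min": downtime / len(incidents),
        "mtbf_min": (window_minutes - downtime) / len(incidents),
    }

sample = [(1_000, 1_030), (50_000, 50_090), (100_000, 100_060)]
stats = incident_metrics(sample)
```

Three incidents lasting 30, 90, and 60 minutes give an MTTR of 60 minutes and an MTBF of roughly 30 days.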

Public status API — A public endpoint at /api/health returns current status with per-component breakdown. No authentication required. Returns HTTP 200 when healthy, 503 when degraded.

Security monitoring

Security incidents don’t announce themselves. The security monitoring layer watches for anomalies continuously:

Brute force detection — Per-IP authentication failure tracking with a 5-minute sliding window. More than 10 failures from a single IP within that window trigger an automatic block of the IP. Unique failing IPs and 24-hour failure counts are visible at a glance.
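A sliding window like this can be kept as a per-IP queue of failure timestamps. The following is an in-memory sketch under that assumption; a production version would more likely live in a shared store, and `BruteForceGuard` is a hypothetical name.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60
MAX_FAILURES = 10

class BruteForceGuard:
    def __init__(self):
        self.failures = defaultdict(deque)   # ip -> timestamps of recent failures
        self.blocked = set()

    def record_failure(self, ip: str, now: float) -> bool:
        """Record a failed login; return True if the IP is blocked afterwards."""
        q = self.failures[ip]
        q.append(now)
        # Drop failures that have aged out of the 5-minute window.
        while q and q[0] <= now - WINDOW_SECONDS:
            q.popleft()
        if len(q) > MAX_FAILURES:
            self.blocked.add(ip)
        return ip in self.blocked

guard = BruteForceGuard()
# 11 failures in under two minutes: blocked on the 11th.
burst = [guard.record_failure("203.0.113.7", t) for t in range(0, 110, 10)]
# 11 failures spread one per minute: never more than 5 inside the window.
slow = [guard.record_failure("198.51.100.9", t) for t in range(0, 660, 60)]
```

The window distinguishes a rapid burst from the same number of failures spread over a longer period.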

API key health — Every API key’s age, validity status, and last validation timestamp. Keys older than 90 days trigger rotation alerts. You shouldn’t need to remember when you last rotated — the dashboard tells you.

Usage spike detection — A daily check compares each account’s usage against its trailing 7-day average. Usage exceeding 3x the average generates an alert, categorized by severity: low (3-5x), medium (5-10x), high (>10x). This catches compromised keys, runaway automations, and unexpected usage patterns.
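The severity bands translate into a short classification function. This sketch assumes that a ratio of exactly 5x counts as medium and exactly 10x as medium, since the bands above share those boundaries; the function name is illustrative.

```python
from typing import Optional, Sequence

def spike_severity(today_usage: float, trailing_7_days: Sequence[float]) -> Optional[str]:
    """Classify today's usage against the trailing 7-day average, or None if no alert."""
    if not trailing_7_days:
        return None
    average = sum(trailing_7_days) / len(trailing_7_days)
    if average <= 0:
        return None                      # no baseline to compare against
    ratio = today_usage / average
    if ratio <= 3:
        return None                      # alerts fire only when usage exceeds 3x
    if ratio < 5:
        return "low"
    if ratio <= 10:
        return "medium"
    return "high"

baseline = [100, 100, 100, 100, 100, 100, 100]
levels = [spike_severity(u, baseline) for u in (250, 300, 350, 600, 1200)]
```

Accounts with no usage history return no alert here; how a real system treats a zero baseline is a separate design decision.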

Role change auditing — All role changes in the last 7 days, showing who changed whom and what the old and new roles were. Permission escalation is a common vector for insider threats, and visibility is the defense.

Alerts are dismissible by staff and tracked with audit metadata (who dismissed, when).

Dead letter queue dashboard

Async operations fail. Webhooks don’t deliver. Emails bounce. Scheduled runs time out. The DLQ dashboard shows all of it in one place.

JieGou tracks 15 categories of async operations, including webhook delivery, email, audit logs, notifications, usage records, overage charges, scheduled runs, trigger runs, output destinations, connector syncs, insights digests, batch executions, pipeline runs, and Slack notifications.

For each category, you see pending, retrying, and exhausted counts, plus a retry success rate and the age of the oldest pending entry. Failed operations retry automatically with increasing backoff delays (1 minute, then 5 minutes, then 15 minutes), for up to 3 attempts before being marked as exhausted.
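The retry schedule can be expressed as a lookup on the number of retries that have already failed. The sketch below assumes the three delays apply to three retries after the initial failure; the function name and state labels mirror the dashboard columns but are otherwise hypothetical.

```python
RETRY_DELAYS_MINUTES = [1, 5, 15]

def schedule_after_failure(failed_retries: int):
    """Return ("retrying", delay_minutes) or ("exhausted", None).

    failed_retries is the number of retries that have already failed,
    so 0 means only the original attempt has failed so far.
    """
    if failed_retries < len(RETRY_DELAYS_MINUTES):
        return ("retrying", RETRY_DELAYS_MINUTES[failed_retries])
    return ("exhausted", None)

timeline = [schedule_after_failure(n) for n in range(4)]
```

Under this reading, an operation that keeps failing is marked exhausted about 21 minutes after its first failure.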

This isn’t just an error log. It’s an operational dashboard that tells you which subsystems need attention and whether the retry mechanism is actually recovering from failures.

Everything in one place

The expanded Operations Hub brings together six views that teams typically scatter across multiple tools:

Automation landscape — Department-level health, cross-department dependencies
Governance — User permissions, change history, compliance levels
Revenue & billing — MRR, churn, billing health, reconciliation
Adoption — DAU/WAU/MAU, feature usage, activation funnel
Availability — Uptime, SLA compliance, incident history
Security — Brute force detection, key health, usage anomalies

No Grafana dashboards to maintain. No Stripe dashboard tabs to juggle. No separate security monitoring tools. One console, one login, one set of alerts.

Plan availability

The automation landscape, governance, and org analytics views are available on all plans. Revenue analytics, availability monitoring, and security monitoring are available on Team and Enterprise plans. Learn more about the Operations Hub or start your free trial.