On-Call & Support

Owner: Engineering
Last reviewed: 2026-Q2

This page defines expectations for support ownership, alert handling, and handoff.

Coverage Model

  • During active development, the engineer or pod that owns a feature owns first-line triage for regressions in that feature.
  • Production-critical services need a named primary and backup on-call engineer for each week before production is enabled.
  • Dev and showcase incidents are handled during working hours unless they block demos, releases, or security validation.

Responsibilities

  • Monitor deploy outcomes, health endpoints, and key product flows after changes merge.
  • Triage support reports by environment, user impact, affected workflow, and recent changes.
  • Post status updates in the incident or support thread so the current state stays visible.
  • Escalate quickly for issues involving security, data access, billing/entitlements, migrations, or provider outages.

Handoff

Use this handoff format:

Current status:
Affected environment:
Known impact:
Recent changes checked:
Logs/dashboards checked:
Next recommended action:
Open risks:
Links:
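
The handoff format above can also be filled in programmatically, for example by a chat bot or a small script. The following is a minimal sketch, not an existing tool: the HandoffNote class, its field names, and the "None known" / "None" defaults are all hypothetical and chosen only to mirror the labels above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HandoffNote:
    """One handoff entry. Hypothetical helper; fields mirror the format above."""

    current_status: str
    affected_environment: str
    known_impact: str
    recent_changes_checked: str
    logs_dashboards_checked: str
    next_recommended_action: str
    open_risks: str = "None known"  # assumed default, not defined by this page
    links: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the note as plain text, one labeled line per field."""
        rows = [
            ("Current status", self.current_status),
            ("Affected environment", self.affected_environment),
            ("Known impact", self.known_impact),
            ("Recent changes checked", self.recent_changes_checked),
            ("Logs/dashboards checked", self.logs_dashboards_checked),
            ("Next recommended action", self.next_recommended_action),
            ("Open risks", self.open_risks),
            ("Links", ", ".join(self.links) or "None"),
        ]
        return "\n".join(f"{label}: {value}" for label, value in rows)
```

Making the first six fields required means an incomplete handoff fails when the note is constructed rather than being discovered by the next person on call.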

Triage Checklist

  • Confirm the environment: local, dev, showcase, or prod.
  • Confirm user identity and permissions without exposing sensitive data.
  • Check recent deploys for frontend and backend.
  • Check backend health endpoints and frontend load path.
  • Check provider dependencies: Clerk, Azure PostgreSQL, Blob Storage, OpenAI, Vision, Document Intelligence, Front Door/CDN.
  • Reproduce with the narrowest workflow possible.
  • Decide whether this is an incident, a bug, a configuration issue, or a user support request.
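
One way to keep track of the checklist during a live triage is a small record that starts from the confirmed environment and shows which steps are still open. This is a sketch under stated assumptions: TriageRecord, the shortened step labels, and the two enums are hypothetical names, and nothing here calls a real health endpoint or provider API.

```python
from enum import Enum
from typing import List, Optional, Set


class Environment(Enum):
    LOCAL = "local"
    DEV = "dev"
    SHOWCASE = "showcase"
    PROD = "prod"


class Classification(Enum):
    INCIDENT = "incident"
    BUG = "bug"
    CONFIGURATION = "configuration issue"
    USER_SUPPORT = "user support"


# Checklist steps after the environment is confirmed, in the order listed
# above. The labels are shortened paraphrases, not official names.
STEPS = (
    "identity and permissions",
    "recent deploys",
    "health endpoints and load path",
    "provider dependencies",
    "narrowest reproduction",
)


class TriageRecord:
    """Tracks which checklist steps are done for one report (hypothetical)."""

    def __init__(self, environment: Environment) -> None:
        self.environment = environment
        self.completed: Set[str] = set()
        self.classification: Optional[Classification] = None

    def complete(self, step: str) -> None:
        """Mark a step as done; reject labels that are not on the checklist."""
        if step not in STEPS:
            raise ValueError(f"unknown triage step: {step!r}")
        self.completed.add(step)

    def remaining(self) -> List[str]:
        """Return the open steps in checklist order."""
        return [s for s in STEPS if s not in self.completed]
```

The record deliberately does not block classification while steps remain open, since some reports (for example a suspected security issue) may need to be classified and escalated before the full checklist is finished.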

Escalation Triggers

  • Suspected cross-user data exposure.
  • Authentication or authorization failure.
  • Failed migration or potential data loss.
  • Production deploy failure.
  • Cost spike from AI, embeddings, OCR, storage, or retry loops.
  • Provider outage with user-visible impact.
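
If triage tooling records which triggers were observed, the list above can be turned into a one-line escalation notice. The sketch below is illustrative only: EscalationTrigger, escalation_notice, and the "ESCALATE (...)" message format are assumptions, not an existing convention.

```python
from enum import Enum
from typing import Iterable, Optional


class EscalationTrigger(Enum):
    CROSS_USER_DATA_EXPOSURE = "Suspected cross-user data exposure"
    AUTH_FAILURE = "Authentication or authorization failure"
    MIGRATION_OR_DATA_LOSS = "Failed migration or potential data loss"
    PROD_DEPLOY_FAILURE = "Production deploy failure"
    COST_SPIKE = "Cost spike from AI, embeddings, OCR, storage, or retry loops"
    PROVIDER_OUTAGE = "Provider outage with user-visible impact"


def escalation_notice(
    triggers: Iterable[EscalationTrigger], environment: str
) -> Optional[str]:
    """Return a one-line notice if any trigger was observed, else None."""
    # Deduplicate and sort so the same inputs always give the same text.
    reasons = sorted({t.value for t in triggers})
    if not reasons:
        return None
    return f"ESCALATE ({environment}): " + "; ".join(reasons)
```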

Support Notes

  • Never ask users to send secrets, tokens, raw credentials, or sensitive documents in chat.
  • Redact document names, emails, IDs, and provider request IDs when sharing outside the engineering context.
  • Turn repeated support issues into tests, monitoring, docs, or product improvements.
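
The redaction rule above can be partly automated before text is pasted outside engineering. The following is a minimal sketch, not a complete scrubber: the patterns are assumptions (emails, UUID-shaped IDs, and file names with a few common extensions), and provider request IDs in other formats would need their own patterns. Manual review is still needed.

```python
import re

# Assumed patterns; extend these for the ID formats your providers actually use.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
DOC_NAME_RE = re.compile(
    r"\b[\w.-]+\.(?:pdf|docx?|xlsx?|png|jpe?g)\b", re.IGNORECASE
)


def redact(text: str) -> str:
    """Replace emails, UUID-shaped IDs, and document names with placeholders."""
    # Emails go first so their domains are not mistaken for file names.
    text = EMAIL_RE.sub("[email]", text)
    text = UUID_RE.sub("[id]", text)
    text = DOC_NAME_RE.sub("[document]", text)
    return text
```

For example, redact("jane.doe@example.com uploaded contract-v2.pdf") returns "[email] uploaded [document]".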