On-Call And Support
Owner: Engineering
Last reviewed: 2026-Q2
This page defines expectations for support ownership, alert handling, and handoff.
Coverage Model
- During active development, the engineer or pod that owns a feature owns first-line triage for regressions in that feature.
- Production-critical services need a named primary and backup for each week before production is enabled.
- Dev and showcase incidents are handled during working hours unless they block demos, releases, or security validation.
Responsibilities
- Monitor deploy outcomes, health endpoints, and key product flows after changes merge.
- Triage support reports by environment, user impact, affected workflow, and recent changes.
- Keep status visible in the incident or support thread.
- Escalate quickly when an issue involves security, data access, billing or entitlements, migrations, or provider outages.
Handoff
Use this handoff format:
Current status:
Affected environment:
Known impact:
Recent changes checked:
Logs/dashboards checked:
Next recommended action:
Open risks:
Links:
Triage Checklist
- Confirm environment: local, dev, showcase, prod.
- Confirm user identity and permissions without exposing sensitive data.
- Check recent deploys for frontend and backend.
- Check backend health endpoints and frontend load path.
- Check provider dependencies: Clerk, Azure PostgreSQL, Blob Storage, OpenAI, Vision, Document Intelligence, Front Door/CDN.
- Reproduce with the narrowest workflow possible.
- Decide whether this is an incident, a bug, a configuration issue, or user support.
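One way to keep triage results consistent with the handoff format is to capture them in a small record and render the note from it. The sketch below is illustrative only: `TriageRecord`, `Environment`, `Classification`, and `render_handoff` are hypothetical names, not existing tooling, and appending the classification to the status line is an assumption rather than part of the handoff format.

```python
from dataclasses import dataclass, field
from enum import Enum


class Environment(Enum):
    # The four environments named in the checklist.
    LOCAL = "local"
    DEV = "dev"
    SHOWCASE = "showcase"
    PROD = "prod"


class Classification(Enum):
    # The four outcomes of the final checklist step.
    INCIDENT = "incident"
    BUG = "bug"
    CONFIGURATION = "configuration issue"
    USER_SUPPORT = "user support"


@dataclass
class TriageRecord:
    # Hypothetical container for what the checklist produces.
    environment: Environment
    classification: Classification
    current_status: str
    known_impact: str
    recent_changes_checked: list[str] = field(default_factory=list)
    logs_dashboards_checked: list[str] = field(default_factory=list)
    next_action: str = ""
    open_risks: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)


def _joined(items: list[str]) -> str:
    # Empty lists render as "none" so no field is silently blank.
    return ", ".join(items) if items else "none"


def render_handoff(record: TriageRecord) -> str:
    """Render the record using the field order of the handoff format."""
    lines = [
        f"Current status: {record.current_status} "
        f"(classified as {record.classification.value})",
        f"Affected environment: {record.environment.value}",
        f"Known impact: {record.known_impact}",
        f"Recent changes checked: {_joined(record.recent_changes_checked)}",
        f"Logs/dashboards checked: {_joined(record.logs_dashboards_checked)}",
        f"Next recommended action: {record.next_action or 'none'}",
        f"Open risks: {_joined(record.open_risks)}",
        f"Links: {_joined(record.links)}",
    ]
    return "\n".join(lines)
```

Rendering unfilled fields as "none" makes it obvious to the next person which checks were skipped rather than leaving a blank they might overlook.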
Escalation Triggers
- Suspected cross-user data exposure.
- Authentication or authorization failure.
- Failed migration or potential data loss.
- Production deploy failure.
- Cost spike from AI, embeddings, OCR, storage, or retry loops.
- Provider outage with user-visible impact.
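The triggers above can be sketched as a simple check over observed signals. This is a minimal illustration, assuming a hypothetical `Signals` record whose field names are invented here; it is not an existing alerting rule set. It treats a provider outage as a trigger only when there is user-visible impact, matching the wording of the list.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    # Hypothetical flags a triager might set while working the checklist.
    suspected_cross_user_exposure: bool = False
    auth_failure: bool = False
    migration_failed: bool = False
    potential_data_loss: bool = False
    prod_deploy_failed: bool = False
    cost_spike: bool = False
    provider_outage: bool = False
    user_visible_impact: bool = False


def escalation_reasons(s: Signals) -> list[str]:
    """Return the matching triggers, in the order they are listed above."""
    checks = [
        (s.suspected_cross_user_exposure, "Suspected cross-user data exposure"),
        (s.auth_failure, "Authentication or authorization failure"),
        (s.migration_failed or s.potential_data_loss,
         "Failed migration or potential data loss"),
        (s.prod_deploy_failed, "Production deploy failure"),
        (s.cost_spike, "Cost spike"),
        # An outage alone is not a trigger; it must affect users.
        (s.provider_outage and s.user_visible_impact,
         "Provider outage with user-visible impact"),
    ]
    return [reason for matched, reason in checks if matched]


def should_escalate(s: Signals) -> bool:
    # Any single matching trigger is enough to escalate.
    return bool(escalation_reasons(s))
```

Returning the reasons rather than a bare boolean lets the escalation message state why it was raised.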
Support Notes
- Never ask users to send secrets, tokens, raw credentials, or sensitive documents in chat.
- Redact document names, emails, IDs, and provider request IDs when sharing outside the engineering context.
- Turn repeated support issues into tests, monitoring, docs, or product improvements.
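For the redaction note above, a small helper can reduce the chance of leaking identifiers when pasting text outside the engineering context. The sketch below is a starting point under stated assumptions: it matches email addresses with a deliberately simple pattern, treats IDs as UUID-shaped strings (provider request ID formats vary and would need their own patterns), and relies on the caller to supply document names explicitly. The `redact` function and placeholder strings are hypothetical.

```python
import re

# Deliberately simple email pattern; it is not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumption: IDs look like UUIDs. Other ID formats need extra patterns.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def redact(text: str, document_names: tuple[str, ...] = ()) -> str:
    """Replace known document names, emails, and UUID-shaped IDs."""
    # Document names cannot be detected reliably, so the caller lists them.
    for name in document_names:
        text = text.replace(name, "[REDACTED_DOCUMENT]")
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = UUID_RE.sub("[REDACTED_ID]", text)
    return text
```

A helper like this is a safety net, not a guarantee; text should still be read through before it is shared.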