AI And Document Operations

Owner: Engineering
Last reviewed: 2026-Q2

This runbook covers operational debugging for AI, OCR, document analysis, embeddings, vector search, and file-to-action-point workflows.

Key Workflows

  • Text to action points: /api/openai/optibot.
  • File/query to action points: /api/query/boards/{board_id}/process-query.
  • OCR: /api/vision/*.
  • Document Intelligence: /api/doc-intel/*.
  • Embeddings: /api/embeddings/*.
  • Vector document workflows: /api/azuredocs/*.
  • Blob-backed document operations: /api/azure/* and /api/documents/*.

Common Failure Modes

Symptom                            | Likely Area                        | First Checks
Upload rejected                    | File validation                    | Size, MIME, extension, configured limits
Document stuck processing          | Background work/provider           | Logs, provider status, blob existence, DB status fields
OCR/document extraction timeout    | Azure Vision/Document Intelligence | Provider latency, file size, retry limits, operation polling
Empty or poor AI output            | Prompt/context/model               | Extracted text quality, chunking, model deployment, JSON parsing
Embedding failure                  | Azure OpenAI embeddings            | Endpoint/model env vars, batch size, provider throttling
Vector search misses expected docs | pgvector/chunking                  | Document chunks, embedding dimensions, metadata filters
Cost spike                         | Retry loop or bulk processing      | Request volume, batch endpoints, provider retries

Debugging Steps

  1. Confirm the authenticated user has access to the board/document/action point.
  2. Check the file metadata, document status, content hash, version, and storage path.
  3. Check provider-specific logs and response status.
  4. Confirm that the required env vars listed in environment-variables.md are set.
  5. Reproduce with a small known-good file.
  6. Verify no real provider calls are happening in automated tests.

Reprocessing Guidance

  • Prefer idempotent reprocessing keyed by document ID/content hash where available.
  • Avoid manually editing document state unless the expected state transition is clear.
  • If a document was partially processed, inspect chunks, embeddings, and blob contents before retrying.
  • Record manual reprocessing in the incident/support thread.
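
A minimal sketch of idempotent reprocessing keyed by document ID and content hash follows. It assumes SHA-256 as the hash and uses an in-memory set as a stand-in for whatever status table the service actually uses; ReprocessLedger and reprocess are hypothetical names.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Callable


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the document bytes (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ReprocessLedger:
    """In-memory stand-in for a DB table of completed processing runs."""
    done: set[tuple[str, str]] = field(default_factory=set)


def reprocess(
    ledger: ReprocessLedger,
    document_id: str,
    data: bytes,
    pipeline: Callable[[bytes], object],
) -> bool:
    """Run the pipeline unless this exact (document, content) pair is done.

    Returns True if the pipeline ran, False if the call was a no-op.
    """
    key = (document_id, content_hash(data))
    if key in ledger.done:
        return False
    pipeline(data)
    # Mark as done only after the pipeline succeeds, so a failed run
    # can be retried safely.
    ledger.done.add(key)
    return True
```

With this shape, retrying the same document with unchanged contents does nothing, while a new version of the document (different hash) is processed again.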

Provider Limits And Cost

  • Batch endpoints should have bounded concurrency and maximum item counts.
  • Provider throttling should degrade gracefully and produce actionable logs.
  • Expensive AI, OCR, and embedding workflows should be covered by tests with mocked providers.
  • Watch for retry loops after provider outages or bad credentials.
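
The first point above can be sketched with an asyncio semaphore. This is an illustration under stated assumptions, not the service's implementation: run_batch, MAX_ITEMS, and MAX_CONCURRENCY are hypothetical names, and the limits shown are placeholder values.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

MAX_ITEMS = 100        # placeholder cap on items per batch request
MAX_CONCURRENCY = 4    # placeholder cap on in-flight provider calls


async def run_batch(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    max_items: int = MAX_ITEMS,
    max_concurrency: int = MAX_CONCURRENCY,
) -> List[R]:
    """Run worker over items with a hard item cap and bounded concurrency."""
    if len(items) > max_items:
        # Reject oversized batches up front instead of partially processing.
        raise ValueError(f"batch of {len(items)} exceeds limit of {max_items}")

    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: T) -> R:
        # At most max_concurrency workers hold the semaphore at once.
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return list(await asyncio.gather(*(guarded(item) for item in items)))
```

Rejecting oversized batches before any provider call is made keeps a single bad request from turning into a cost spike.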

Security Checks

  • Treat document contents as untrusted input.
  • Do not pass secrets into prompts or model context.
  • URL-based analysis must be protected against SSRF before it is opened to broad external use.
  • Generated action points must still pass authorization checks before being saved to a board.
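
As a rough illustration of the SSRF point, the sketch below rejects URLs that are not plain http/https or that resolve to any non-public address. is_url_allowed is a hypothetical helper, and the resolver is injectable so it can be tested without DNS. This check alone is not a complete defense: the fetch must also connect to the address that was validated and must re-check redirects.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlsplit


def _resolve(host: str) -> List[str]:
    """Resolve a hostname to its IP addresses using the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_url_allowed(
    url: str,
    resolve: Callable[[str], List[str]] = _resolve,
) -> bool:
    """Return True only for http(s) URLs whose host resolves to public IPs."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolve(parts.hostname)
    except OSError:
        return False
    if not addresses:
        return False
    try:
        # Every resolved address must be globally routable; this rejects
        # loopback, private, and link-local ranges (including 169.254.0.0/16).
        return all(ipaddress.ip_address(addr).is_global for addr in addresses)
    except ValueError:
        return False
```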