AI Integration

How AI capabilities are integrated into the WorkOpti product. All AI features are backed by Azure-hosted services, accessed through the backend service layer, and connected to the Kanban board workflow.

Operational debugging and reprocessing guidance lives in ai-operations.md.

Architecture Overview

User Query / File Upload
    │
    ▼
[FastAPI Routes] ─── auth via Clerk JWT
    │
    ▼
[Service Layer]
    ├── OpenAI Service ──────────► Azure OpenAI (GPT-4-turbo)
    ├── Embeddings Service ──────► Azure OpenAI (text-embedding-ada-002)
    ├── Vision Service ──────────► Azure Computer Vision (OCR)
    ├── Doc Intelligence Service ► Azure Document Intelligence
    ├── Adaptive Chunker ────────► Local (token-aware splitting)
    ├── Enhanced Query Service ──► Local (query analysis + ranking)
    └── Vector Store Service ────► PostgreSQL pgvector
                                       │
                                       ▼
                                  [Kanban Board]
                                  Action points, boards, columns

Capabilities

1. Action Point Generation from Text (OptiBot)

Route: POST /api/openai/optibot Service: services/openai_service.py Model: Azure OpenAI GPT-4-turbo (configurable via AZURE_OPENAI_DEPLOYMENT)

Generates 3-5 actionable action points from a user query, optionally enriched with document context. The LLM is instructed to produce imperative-verb titles of 10 words or fewer with concrete deliverables in 1-3 sentences. Output is forced to JSON format.

Flow: User query + optional context → adaptive chunking (if context exceeds token limit) → Azure OpenAI (JSON mode, temp=0.1) → parsed action-point list → saved to Kanban board.

Security: System prompt treats document context as untrusted data with explicit jailbreak resistance. Context is passed as a separate user message, never embedded in the system prompt.

2. Document Text Extraction (OCR)

Routes: POST /api/vision/analyze-url, analyze-image, analyze-upload, batch-analyze, batch-analyze-upload Service: services/azure_ai_vision_service.py Backing service: Azure Computer Vision (READ visual feature)

Extracts text from images with spatial layout, per-word confidence scores, and bounding polygons. Supports URL, base64, and file upload inputs. Batch operations process up to 10 images concurrently (8 max concurrent API calls) with retry logic and rate-limit handling.

3. Document Structure Analysis

Routes: POST /api/doc-intel/analyze-url, analyze-image, analyze-upload, batch-analyze, batch-analyze-upload Service: services/azure_document_intelligence_service.py Backing service: Azure Document Intelligence (prebuilt-layout model)

Extracts structured content from PDFs, Word docs, and presentations: page layout, tables (with row/column structure and cell spans), key-value pairs, paragraphs with role tags (title, abstract, etc.), font styles, and language detection. Uses async polling for long-running operations (5 max concurrent, exponential backoff, 5-minute timeout).

4. Text Embeddings

Routes: POST /api/embeddings/get-embeddings, get-batch-embeddings, add-document, add-documents Service: services/openai_embeddings_service.py Model: Azure OpenAI text-embedding-ada-002 (1536 dimensions)

Converts text into vector embeddings for semantic search and similarity matching. Batch operations process texts concurrently via ThreadPoolExecutor. Embeddings are stored in PostgreSQL using the pgvector extension.

5. Vector Search & RAG

Service: services/vector_store_service.py Storage: PostgreSQL pgvector via LlamaIndex PGVectorStore

Stores document chunk embeddings and performs similarity search for retrieval-augmented generation. Supports metadata filtering and top-k retrieval with relevance scores. Used by the query pipeline to provide document context to the LLM.

6. File-to-Action-Points Pipeline

Route: POST /api/query/boards/{board_id}/process-query Service: services/file_processing_service.py

End-to-end workflow: upload file → extract text (Vision for images, Document Intelligence for documents) → combine with user query → generate action points via OpenAI → save to board. Supports PDF, DOCX, PPTX, TXT, MD, CSV, and common image formats. Files are uploaded to Azure Blob Storage in the background.

7. Bot Interaction Tracking

Route: historical bot interaction route; current query workflows are under /api/query. Models: BotInteraction, BotGeneratedActionPoint

Records AI-generated content for audit: stores the original query, LLM response, and links generated action points back to the interaction. Provides traceability for AI-created action points.

Supporting Services

Adaptive Chunking

Service: services/adaptive_chunker_service.py

Breaks large documents into semantically meaningful chunks that respect token limits. Detects content type (code, JSON, XML, tabular, lists, technical docs, prose) and applies type-specific chunking strategies with appropriate boundaries, size limits, and overlap percentages. Uses tiktoken for accurate token counting.

Enhanced Query Processing

Service: services/enhanced_query_service.py

Analyzes queries before processing: classifies complexity (simple → analytical), detects intent (extraction, analysis, summarization, comparison, generation, explanation, search), and selects a processing strategy (direct, chunked, iterative, hierarchical). Ranks context chunks by relevance using keyword matching, content type matching, structure bonuses, and intent-specific scoring.

Configuration

Models (defined in core/config.py):

Model	Purpose	Token Limit
`gpt-4-turbo`	Chat / action-point generation (default)	128K
`gpt-5`	Chat / action-point generation (supported)	200K
`text-embedding-ada-002`	Embeddings	1536 dims

Required environment variables:

AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_API_VERSION — chat model
AZURE_OPENAI_DEPLOYMENT — deployment name (may differ from model name)
AZURE_OPENAI_EMBEDDINGS_ENDPOINT, AZURE_OPENAI_EMBEDDINGS_MODELNAME — embeddings
AZURE_COMPUTER_VISION_ENDPOINT, AZURE_COMPUTER_VISION_KEY — OCR
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT, AZURE_DOCUMENT_INTELLIGENCE_KEY — document analysis

Performance

Concurrent processing — ThreadPoolExecutor for synchronous Vision/Embeddings APIs; asyncio + aiohttp for Document Intelligence long-running operations
Token management — pre-flight token counting via tiktoken with model-specific limits, 95% threshold warnings, fallback character-based estimation
Connection pooling — aiohttp TCPConnector with DNS caching (300s TTL) for Document Intelligence
Adaptive chunking — content-type-specific strategies preserve context integrity vs. naive truncation

Security

Prompt injection resistance — system prompt is final and cannot be overridden; document context treated as untrusted data in a separate message
File upload validation — MIME type detection from file content (magic bytes), extension whitelist, max 10MB/file and 30MB/session (core/config.py)
Authentication — all AI routes require Clerk JWT via get_current_user dependency
No secrets in prompts — API keys and credentials are never passed to the LLM

On-Call & Support AI Operations