Agentic Engineering

Owner: Engineering
Last reviewed: 2026-Q2

Guidelines for using AI coding agents (Claude Code, Codex, Cursor, etc.) within our engineering workflow. The goal is to increase developer productivity while preserving reliability, security, and maintainability.

AI agents can accelerate development, but they must never replace engineering judgment. Every change produced with agent assistance must meet the same quality bar as manually written code.

Non-Negotiables

1. Product Thinking Before Prototyping

Agents make it trivially easy to build something. That does not mean it should be built. Before starting an agent session for a new feature, have a clear answer to: what problem does this solve, is it worth solving, and is it aligned with the product roadmap? Feature scope decisions should involve engineering leadership.

2. Tests First (TDD)

All AI-assisted code must follow test-driven development:

  1. Red — Write a test for the behavior you want. Run it. It fails because the code doesn’t exist yet.
  2. Green — Write the minimum code to make the test pass. Nothing more.
  3. Refactor — Clean up the code while keeping the test green.

AI should never generate untested production logic.
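
As a minimal sketch of one red-green cycle, assume a hypothetical helper named normalize_email (an illustrative name, not taken from any project code). The test is written and run first; it fails until the function exists, and the implementation then does only what the test demands.

```python
# Hypothetical example: normalize_email is an illustrative name, not project code.


def normalize_email(raw: str) -> str:
    """Green step: the minimum logic needed to satisfy the test below."""
    return raw.strip().lower()


def test_normalize_email_trims_and_lowercases() -> None:
    # Red step: this test is written first. Before normalize_email exists,
    # running it fails with a NameError.
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"
```

New behavior (for example, rejecting malformed addresses) would start with another failing test rather than with a change to the function.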

3. Engineers Own the Code

AI output must be treated as untrusted code. Engineers are fully responsible for correctness, security, performance, and maintainability. All generated code must be reviewed, understood, and verified before merging.

4. Manual Verification is Mandatory

Automated tests are not sufficient. Before opening a PR, engineers must manually exercise the feature, verify expected behavior, test edge cases, and confirm no regressions were introduced.

5. Secrets Must Never Be Shared

Agents must never be given secrets or sensitive credentials: API keys, database credentials, access tokens, private keys, customer data, or proprietary datasets. Sensitive data must remain in secure systems (Key Vault, environment variables).
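
One way to keep secret values out of source files, and therefore out of anything an agent reads, is to resolve them from the environment at runtime. The sketch below assumes a hypothetical helper require_secret and a hypothetical variable name EXAMPLE_API_KEY; neither comes from an existing codebase.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name: str) -> str:
    # The value is read at runtime, so it never appears in source or prompts.
    value = os.environ.get(name)
    if not value:
        # The error names the variable only; it never echoes a value.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value
```

A caller would write something like api_key = require_secret("EXAMPLE_API_KEY"), so the code an agent sees contains the variable name but not the credential.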

6. Small, Reviewable PRs

Agent-assisted PRs must be small and focused: one logical change, clear tests, limited surface area. Large AI-generated changes are not acceptable.
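
A rough size check can make "small" measurable. The sketch below parses the tab-separated output of git diff --numstat; the function names and the default limits (10 files, 400 lines) are placeholders chosen for illustration, not thresholds set by this guideline.

```python
def summarize_numstat(numstat: str) -> tuple[int, int]:
    """Return (files_changed, lines_changed) from `git diff --numstat` text."""
    files = 0
    lines = 0
    for row in numstat.splitlines():
        parts = row.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        added, deleted, _path = parts
        files += 1
        # Binary files are reported as "-" for both counts; count the file only.
        if added.isdigit() and deleted.isdigit():
            lines += int(added) + int(deleted)
    return files, lines


def is_reviewable(numstat: str, max_files: int = 10, max_lines: int = 400) -> bool:
    # The limits are illustrative placeholders, not policy values.
    files, lines = summarize_numstat(numstat)
    return files <= max_files and lines <= max_lines
```

The input could come from a command such as git diff --numstat main...HEAD, assuming the base branch is named main.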

7. Security-Sensitive Changes Require Second Review

Changes affecting authentication, authorization, user data isolation, data storage boundaries, secrets handling, infrastructure configuration, or database schema require explicit second reviewer approval.

Standard Agent Workflow

Stage 1: Before the Session

  • Ensure the local test suite runs and passes
  • Confirm the project builds successfully
  • Review relevant code and architecture
  • Define a clear goal for the task

Agents perform best when given precise, scoped tasks.

Stage 2: During the Session

  • Give clear and focused prompts
  • Generate code incrementally
  • Review generated output carefully
  • Run tests after every change

Avoid:

  • Asking agents to generate large multi-feature implementations
  • Letting the agent absorb a hacky workaround instead of refactoring the original design — if the iteration path forces compromises, pause and fix the design
  • Blindly accepting generated changes
  • Delegating architecture decisions to the agent
  • Building features speculatively because “it’s easy to prompt” — easy to build is not the same as worth building

Stage 3: Before Opening a PR

  • Run the full test suite
  • Run linting and formatting tools
  • Manually verify functionality
  • Confirm documentation is updated (if applicable)
  • Review the entire diff to confirm no unnecessary changes, no secrets, and alignment with project patterns
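
The "no secrets" part of the diff review can be backed by a quick scan of added lines. This is a minimal sketch with a few illustrative patterns; the function name and the pattern list are assumptions, and it complements rather than replaces reading the diff or running a dedicated secret scanner.

```python
import re

# Illustrative patterns only; a dedicated secret scanner covers far more cases.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),  # PEM private key header
    re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),  # quoted literal assigned to a sensitive-looking name
]


def find_suspect_lines(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, text) for added lines that look like secrets."""
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines are checked; lines starting with "+++" are treated
        # as file headers and skipped.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        if any(pattern.search(added) for pattern in SECRET_PATTERNS):
            hits.append((number, added.strip()))
    return hits
```

An empty result does not prove the diff is clean; it only means none of these particular patterns matched.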

Stage 4: During Code Review

Reviewers must hold AI-assisted PRs to the same standard as PRs containing manually written code. Verify correctness of logic, readability, test coverage, and adherence to project conventions. For sensitive changes (auth, storage, user data isolation), apply extra scrutiny.

Stage 5: After Merge

  • Monitor logs and telemetry
  • Watch for unexpected regressions
  • Validate system stability

If issues arise, investigate and correct immediately.

Security and High-Risk Areas

Exercise extra caution when using agents in these areas:

  • Authentication and authorization — avoid privilege escalation vulnerabilities in identity handling, permission checks, session validation, and access tokens
  • User data boundaries — user isolation must be strictly enforced; queries must always scope through ownership or explicit grants; verify AI-generated queries preserve data isolation
  • File storage and user data — review for proper access control, user ownership validation, and safe file handling
  • Database migrations — review for data loss risks, backward compatibility, migration performance, and rollback safety; validate in staging first
  • Prompt injection risks — treat external inputs as untrusted; avoid passing unvalidated content to agents; verify generated code does not execute unsafe instructions
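
To illustrate the user data boundary rule, the sketch below shows what an ownership-scoped query can look like. It uses Python's built-in sqlite3 module and a hypothetical documents table with an owner_id column; the table, columns, and function names are assumptions for illustration, and explicit grants are not modeled.

```python
import sqlite3


def list_documents(conn: sqlite3.Connection, user_id: int) -> list[tuple[int, str]]:
    # The owner filter is part of every read, and the id is bound as a
    # parameter rather than interpolated into the SQL string.
    rows = conn.execute(
        "SELECT id, title FROM documents WHERE owner_id = ? ORDER BY id",
        (user_id,),
    )
    return rows.fetchall()


def get_document(conn: sqlite3.Connection, user_id: int, document_id: int):
    # Filtering by primary key alone would let any caller read any row,
    # so the owner check is part of the same query.
    return conn.execute(
        "SELECT id, title FROM documents WHERE id = ? AND owner_id = ?",
        (document_id, user_id),
    ).fetchone()  # None when the row is missing or owned by someone else
```

When reviewing agent-generated queries, the thing to look for is a lookup that filters by id alone, with the ownership check missing or done later in application code.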

Tooling and Context

  • PR template checklist — tests added/updated, tests pass locally, manual verification completed, linting passes, no secrets included, diff reviewed
  • CI enforcement — test execution, lint checks, formatting rules, build validation; PRs failing CI should not be merged
  • Project context files — agents should rely on CLAUDE.md, architecture docs, API schemas, and project conventions; maintaining accurate context improves agent output quality
  • API client generation — when API schemas change, regenerate clients (npm run orval) to ensure backend/frontend contract consistency

Rollout

Phase                      | Focus                                        | Goal
1. Exploration             | Small features, test generation, refactoring | Build familiarity with tools
2. Structured Usage        | Broader development under these guidelines   | Measure productivity, quality, review effort
3. Operational Integration | Regular part of development workflows        | Periodic review of stability, efficiency, cost

Success Metrics

  • Productivity — development cycle time, ticket-to-merge time, PR throughput
  • Quality — bug rate in production, rollback frequency, hotfix rate
  • Review Efficiency — PR review time, number of revisions required, reviewer load
  • Cost — agent usage cost per sprint, cost per completed feature
  • Developer Experience — developer satisfaction, perceived productivity improvements, ease of maintaining AI-generated code