Quality and CI

Quality is not a phase at the end of development. It's a property of how you work. When quality is built into your process—through testing, automation, and careful release practices—you catch problems early, ship with confidence, and spend less time firefighting.

This page covers the systems that make quality sustainable: testing strategies, CI/CD pipelines, feature flags, code review, and release hygiene. The goal is not perfection but a quality level that matches your users' needs and your team's capacity.

What Problem This Solves

Quality problems compound. A bug that escapes to production can cost an order of magnitude more to fix than one caught in development. A flaky test suite that everyone ignores teaches the team to distrust automation. A slow CI pipeline that takes 45 minutes creates pressure to skip it.

When quality systems are weak, you see the symptoms:

Bugs escape to production. Users find issues before the team does. Trust erodes. The team spends cycles on hotfixes instead of features.

CI is a bottleneck. Developers wait for pipelines. They batch changes to avoid the wait, making reviews harder and risk higher.

Testing is inconsistent. Some areas are well-tested; others are untested. Coverage is a vanity metric disconnected from confidence.

Releases are scary. Nobody wants to deploy on Friday. Rollbacks are manual and stressful. The team ships less frequently to avoid risk.

Strong quality systems prevent all of these. They create the confidence to ship frequently and the safety net to catch problems early.


When to Invest in Quality Systems

Actively invest when:

  • Bugs are escaping to production at an unacceptable rate
  • CI is slow, flaky, or distrusted
  • Releases are infrequent because they're risky
  • New engineers don't know what or how to test
  • You're scaling the team and need consistent practices

Maintain when:

  • Quality is stable but needs to stay that way
  • Onboarding new engineers to existing practices
  • Evolving the codebase (new tests for new code)

Quality investment is continuous. The goal is not a one-time fix but a sustainable system.


Ownership

| Role | Responsibility |
| ---- | -------------- |
| Tech Lead | Defines testing strategy, code review standards, and quality bar |
| Platform/DevOps | Owns CI/CD infrastructure, deployment pipelines, feature flag systems |
| Engineering Manager | Ensures team has time for quality work, addresses systemic quality issues |
| Individual Contributors | Write tests, participate in code review, maintain quality in their work |
| QA (if present) | Advises on testing strategy, owns manual/exploratory testing, maintains test infrastructure |

Quality is everyone's job

Don't create a quality silo. When only QA cares about quality, engineers write code that "someone else will test." Quality must be owned by the people who write the code.


Testing Strategy

The Testing Pyramid

The testing pyramid is a guide to where you should invest testing effort:

        /\
       /  \      E2E / UI
      /----\
     /      \    Integration / Service
    /--------\
   /          \  Unit Tests
  /------------\

Unit tests (base): Fast, isolated, test individual functions or classes. High volume, low cost per test. Catch logic errors.

Integration tests (middle): Test how components work together. Slower, more complex, but catch interaction bugs that unit tests miss.

End-to-end tests (top): Test the full user journey. Slowest, most brittle, but catch issues that only appear in the real environment.

The principle: Most of your tests should be at the base. Each layer up should have fewer tests because they're slower and more expensive to maintain.
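
A base-of-pyramid test looks like this in practice: fast, isolated, no I/O. This is a minimal sketch; `apply_discount` is a hypothetical function used purely for illustration.

```python
# Sketch of a unit test at the base of the pyramid: fast, isolated, no I/O.
# `apply_discount` is a hypothetical function, not from any real codebase.

def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by `percent`, never below zero."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return max(price * (1 - percent / 100), 0.0)

def test_apply_discount():
    assert abs(apply_discount(100.0, 10) - 90.0) < 1e-9
    assert apply_discount(100.0, 100) == 0.0
    # Edge case: invalid input should fail loudly, not silently.
    try:
        apply_discount(100.0, 150)
        assert False, "expected ValueError"
    except ValueError:
        pass

test_apply_discount()
```

Hundreds of tests like this run in seconds, which is what makes the high-volume base of the pyramid affordable.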

What to Test

Test the things that matter. Focus testing effort on:

  • Core business logic
  • Integrations with external systems
  • Edge cases that have caused bugs before
  • Paths where failure would be costly

Don't test everything. Low-value tests create maintenance burden without proportional confidence. Simple CRUD operations with well-tested frameworks don't need exhaustive testing.

Test Quality Over Quantity

Coverage metrics are misleading. 80% coverage with shallow tests is worse than 50% coverage with meaningful tests.

Signs of good tests:

  • Tests break when the feature breaks
  • Tests don't break when unrelated things change
  • Tests are readable—you can understand what they're testing
  • Tests run fast enough that people actually run them

Signs of bad tests:

  • Tests pass when the feature is broken
  • Tests break on every refactor
  • Tests are brittle (depend on timing, order, external state)
  • Tests are so slow that developers skip them
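
The contrast between good and bad is easiest to see side by side. A sketch, using a hypothetical `parse_email` helper: the shallow test passes even if the function returns garbage, while the meaningful test pins the behavior and an edge case.

```python
def parse_email(address: str) -> tuple[str, str]:
    """Split an address into (local, domain). Hypothetical helper."""
    local, _, domain = address.partition("@")
    if not local or not domain:
        raise ValueError(f"not a valid address: {address!r}")
    return local, domain

# Bad: only checks that no exception was raised, so it still passes
# when the function returns the wrong result.
def shallow_test():
    parse_email("user@example.com")  # no assertion on the output

# Good: breaks when the behavior breaks, and covers the failure case.
def meaningful_test():
    assert parse_email("user@example.com") == ("user", "example.com")
    try:
        parse_email("no-at-sign")
        assert False, "expected ValueError"
    except ValueError:
        pass

shallow_test()
meaningful_test()
```

Both tests count identically toward coverage; only one of them would catch a regression.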

CI/CD Pipeline

Pipeline Design Principles

Fast feedback first. Structure your pipeline so the fastest checks run first. If linting fails in 30 seconds, developers shouldn't wait 15 minutes for integration tests to find out.

Fail early, fail clearly. When something fails, the failure message should tell developers exactly what's wrong and where. "Build failed" is useless. "Test X failed because Y" is useful.

Make it green. A pipeline that's usually red teaches developers to ignore it. If tests are flaky, fix them or remove them. A green pipeline should mean "safe to merge."

Keep it fast. Every minute of pipeline time is a minute of developer waiting. If your pipeline takes 30+ minutes, developers will batch changes, skip CI, or context-switch excessively. Target < 10 minutes for the critical path.

Typical Pipeline Stages

Commit → Lint/Format → Build → Unit Tests → Integration Tests → Deploy to Staging → E2E Tests → Deploy to Production

Commit hooks: Fast checks (linting, formatting) can run before commit to catch obvious issues immediately.

Pre-merge checks: Lint, build, unit tests—the fast stuff that gates every PR.

Post-merge checks: Slower integration tests, deployment to staging.

Deployment: Automated deployment to staging, gated deployment to production.
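
The fail-fast ordering can be sketched as a loop that runs the cheapest stages first and stops at the first failure. Stage names and commands here are illustrative placeholders; a real pipeline would live in your CI system's configuration, but the ordering principle is the same.

```python
# Sketch of fail-fast stage ordering: cheapest checks first, stop on
# first failure, and name the failing stage clearly in the output.
import subprocess

STAGES = [
    ("lint",        ["echo", "running linter"]),        # seconds
    ("build",       ["echo", "running build"]),         # ~1 min
    ("unit",        ["echo", "running unit tests"]),    # ~2 min
    ("integration", ["echo", "running integration"]),   # ~10 min
]

def run_pipeline() -> bool:
    for name, cmd in STAGES:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Fail clearly: say which stage broke and surface its output.
            print(f"stage {name!r} failed:\n{result.stdout}{result.stderr}")
            return False
    return True
```

A 30-second lint failure should never be hidden behind a 10-minute integration run.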

Handling Flaky Tests

Flaky tests—tests that sometimes pass and sometimes fail without code changes—are quality poison. They teach developers to distrust the suite and retry until green.

How to handle flaky tests:

  1. Quarantine immediately. Move flaky tests out of the critical path so they don't block merges.
  2. Track and prioritize. Maintain a list of quarantined tests. Prioritize fixing them.
  3. Investigate root causes. Flaky tests usually have deterministic causes: timing issues, test pollution, external dependencies.
  4. Don't just retry. Automatic retries hide the problem. Fix the test or remove it.
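
One of the most common deterministic causes is a fixed sleep racing a background task. The fix is to poll for the condition with a bounded deadline rather than guessing a duration. A sketch (the helper name is illustrative):

```python
# A fixed sleep races the work it waits for; polling with a deadline
# is deterministic up to a bounded timeout.
import time

def wait_until(condition, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()

# Flaky: assumes the work always finishes within exactly 0.1 s.
#   time.sleep(0.1); assert job.done
# Deterministic: waits only as long as needed, up to a bounded timeout.
#   assert wait_until(lambda: job.done, timeout=5.0)
```

The same idea applies to test pollution (reset shared state in setup) and external dependencies (stub them): find the nondeterminism and remove it, rather than retrying around it.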

Feature Flags

Feature flags decouple deployment from release. You can deploy code to production without exposing it to users, then enable it gradually or for specific segments.

Why Use Feature Flags

Reduce deployment risk. If a feature causes problems, disable the flag instead of rolling back the deployment.

Enable gradual rollout. Release to 1% of users, then 10%, then 50%, then 100%. Catch problems at small scale.

Support testing in production. Enable features for internal users or beta testers before general release.

Enable trunk-based development. Long-lived feature branches create merge pain. Flags let you merge incomplete work safely.
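
The gradual-rollout mechanics usually rest on stable bucketing: hash the user ID so the same user always gets the same answer, and raising the percentage only adds users rather than reshuffling who sees the feature. A minimal sketch of the core idea (a real system would use a flag service; names here are illustrative):

```python
# Percentage rollout via stable hashing: deterministic per user,
# monotonic as the percentage grows.
import hashlib

def in_rollout(user_id: str, flag_name: str, percent: float) -> bool:
    """True if this user falls inside the first `percent` of buckets."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000   # stable bucket in 0..9999
    return bucket < percent * 100           # `percent` of 10,000 buckets

# Raising percent from 1 to 10 keeps the original 1% enabled and adds
# users; including flag_name in the hash decorrelates different flags.
```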

Feature Flag Hygiene

Flags have a lifecycle. A flag should be temporary. Once a feature is fully rolled out, remove the flag and the old code path. Stale flags create confusion and technical debt.

Limit active flags. Too many flags create combinatorial complexity. Set a limit and enforce cleanup.

Name flags clearly. `enable_new_checkout_flow` is better than `flag_123`. Include the ticket or initiative in the name.

Test both paths. When a flag exists, you have two code paths. Test both, or you'll discover bugs when you flip the flag.

Document flag behavior. What does the flag do? What's the rollout status? Who owns it? When should it be removed?
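
"Test both paths" in practice means running the same behavioral assertions with the flag forced on and forced off. A sketch, using a hypothetical `checkout_total` with old and new code paths:

```python
# Exercise both sides of a flag explicitly; `checkout_total` and its
# two code paths are hypothetical examples.
def checkout_total(items: list[float], new_flow_enabled: bool) -> float:
    if new_flow_enabled:
        # New path: round once at the end.
        return round(sum(items), 2)
    # Old path: round per item, then sum.
    return round(sum(round(i, 2) for i in items), 2)

def test_checkout_both_paths():
    items = [9.99, 5.00]
    for flag in (True, False):   # run the same test against both paths
        total = checkout_total(items, new_flow_enabled=flag)
        assert total == 14.99, f"flag={flag}: got {total}"

test_checkout_both_paths()
```

If only the flag-off path is tested, the bug ships the day you flip the flag.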


Code Review

Code review is a quality gate, a knowledge-sharing mechanism, and a mentorship tool. Done well, it catches bugs, spreads context, and raises the team's collective skill. Done poorly, it's a bottleneck that adds friction without value.

Code Review Principles

Review for correctness, not style. Style should be automated (linters, formatters). Review time is for logic, design, and things machines can't catch.

Be timely. Unreviewed PRs block progress. Aim to review within hours, not days. If you can't review quickly, say so.

Be specific. "This doesn't look right" is unhelpful. "This query will N+1 because X—consider Y" is helpful.

Distinguish blocking from non-blocking. Make it clear which comments require changes before merge and which are suggestions for future consideration.

Approve when good enough. Don't block on perfection. If it's correct and doesn't make things worse, approve. Improvements can come in follow-up PRs.

Code Review Checklist

Reviewers should check:

  • Correctness: Does the code do what it's supposed to?
  • Tests: Are the right things tested? Do the tests actually test them?
  • Edge cases: What happens with empty inputs, large inputs, concurrent access?
  • Security: Any injection risks, auth issues, data exposure?
  • Performance: Any obvious performance problems? O(n²) where O(n) is possible?
  • Readability: Will future maintainers understand this?
  • Consistency: Does this follow existing patterns and conventions?
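
The O(n²)-vs-O(n) item is the kind of thing a reviewer can catch in seconds. A common instance, sketched with illustrative helpers: membership checks against a list inside a loop are quadratic, while a set makes them linear.

```python
# Quadratic: `x in b` scans the whole list for every element of `a`.
def shared_ids_quadratic(a: list[int], b: list[int]) -> list[int]:
    return [x for x in a if x in b]

# Linear: build the set once, then each lookup is O(1) on average.
def shared_ids_linear(a: list[int], b: list[int]) -> list[int]:
    b_set = set(b)
    return [x for x in a if x in b_set]
```

Same output, drastically different behavior once the inputs grow past toy sizes.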

Release Hygiene

Release Cadence

Ship frequently. Smaller, more frequent releases are less risky than large, infrequent ones. Each release has less change, so problems are easier to identify and fix.

Target: Most teams should be able to release at least weekly, ideally daily or continuously.

What blocks frequent releases:

  • Manual QA requirements
  • Change approval processes
  • Fear of release (lack of confidence in quality gates)
  • Lack of rollback capability

Address these blockers to enable healthy release cadence.

Deployment Strategies

Blue-green deployment: Run two identical environments. Deploy to the inactive one, then switch traffic. Easy rollback by switching back.

Canary deployment: Route a small percentage of traffic to the new version. Monitor for problems before increasing traffic.

Rolling deployment: Gradually update instances. Some traffic goes to old version, some to new. Standard for containerized deployments.

Feature flag release: Deploy code but keep it behind a flag. Enable the flag gradually. Decouple deployment from release.

Rollback Readiness

Every deployment should be easy to roll back. If rolling back is hard, you'll resist doing it when you should, and problems will persist longer than necessary.

Rollback requirements:

  • One-click or one-command rollback
  • Rollback tested regularly (not just theoretically possible)
  • Database migrations are backward-compatible (or separately sequenced)
  • Team knows when to roll back vs. roll forward

What Good Looks Like

You'll know your quality systems are working when:

| Signal | What it looks like |
| ------ | ------------------ |
| Few production bugs | Users rarely discover issues before the team does |
| Fast CI | Pipeline runs in < 10 minutes; developers don't skip it |
| Green is meaningful | A green build reliably means the code works |
| Confident releases | Team ships frequently without anxiety |
| Quick rollback | Problems are reverted in minutes, not hours |
| Sustainable pace | Quality work is budgeted, not squeezed out |

Metrics to Track

  • Change failure rate: Percentage of deployments that cause a rollback or fix
  • Mean time to recovery (MTTR): How long from problem detection to resolution
  • CI pipeline duration: How long developers wait
  • Test flakiness rate: Percentage of test runs with flaky failures
  • Deployment frequency: How often you ship
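
Two of these metrics fall straight out of a deployment log. A sketch, with an illustrative record shape (real data would come from your deploy tooling or incident tracker):

```python
# Change failure rate and deployment frequency from a deployment log.
# The record format is a made-up example for illustration.
from datetime import date

deploys = [
    {"day": date(2024, 6, 3), "caused_incident": False},
    {"day": date(2024, 6, 4), "caused_incident": True},
    {"day": date(2024, 6, 5), "caused_incident": False},
    {"day": date(2024, 6, 7), "caused_incident": False},
]

# Fraction of deployments that needed a rollback or fix.
change_failure_rate = sum(d["caused_incident"] for d in deploys) / len(deploys)

# Deploys per week over the observed span (inclusive of both endpoints).
span_days = (deploys[-1]["day"] - deploys[0]["day"]).days + 1
deploys_per_week = len(deploys) / span_days * 7

print(f"change failure rate: {change_failure_rate:.0%}")   # 25%
print(f"deploy frequency: {deploys_per_week:.1f}/week")
```

Tracking these weekly from real data is more honest than estimating them in a quarterly review.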

Failure Modes and Mitigations

The Flaky Suite

Symptom: Tests pass and fail randomly. Developers retry until green. Nobody trusts the suite.

Root cause: Test pollution, timing dependencies, external dependencies, poor isolation.

Mitigation: Quarantine flaky tests immediately. Invest in fixing them. Track flakiness rate. Consider flaky tests as bugs, not annoyances.

The Slow Pipeline

Symptom: CI takes 30+ minutes. Developers batch changes, skip CI, or context-switch constantly.

Root cause: Too many slow tests, no parallelization, slow infrastructure.

Mitigation: Parallelize tests. Move slow tests to a post-merge stage. Invest in faster infrastructure. Measure and set targets.

The Review Bottleneck

Symptom: PRs sit for days awaiting review. Developers have multiple PRs in flight. Merging is a relief, not routine.

Root cause: Not enough reviewers, reviews not prioritized, PRs too large.

Mitigation: Smaller PRs (< 400 lines). Shared on-call reviewer rotation. Team norms around review turnaround.

The Coverage Theater

Symptom: High coverage numbers but bugs still escape. Tests exist but don't catch real problems.

Root cause: Tests written for coverage, not confidence. Tests don't exercise real failure modes.

Mitigation: Stop measuring coverage as a goal. Focus on meaningful tests for critical paths. Review tests as carefully as production code.

The Flag Graveyard

Symptom: Dozens of feature flags, most stale. Nobody knows which flags are active or what they do.

Root cause: Flags created but never cleaned up. No ownership.

Mitigation: Flag lifecycle policy: flags must be removed within N days of full rollout. Limit active flags. Regular flag audits.


Copy-Paste Artifact: PR Checklist

## PR Checklist

**Before requesting review:**

- [ ] Code compiles/builds without errors
- [ ] All tests pass locally
- [ ] New tests added for new functionality
- [ ] Existing tests updated if behavior changed
- [ ] No commented-out code
- [ ] No debug logging left in
- [ ] PR description explains what and why
- [ ] Self-reviewed the diff for obvious issues

**For reviewers:**

- [ ] Logic is correct and handles edge cases
- [ ] Tests cover the right scenarios
- [ ] No obvious security issues
- [ ] No obvious performance issues
- [ ] Code is readable and maintainable
- [ ] Follows existing patterns and conventions
- [ ] Changes are appropriately sized (consider splitting if > 400 lines)

**Before merging:**

- [ ] CI is green
- [ ] Required approvals received
- [ ] Merge conflicts resolved
- [ ] Feature flag configured (if applicable)

Copy-Paste Artifact: Feature Flag Spec

## Feature Flag: [Flag Name]

**Created:** [Date]
**Owner:** [Name]
**Related ticket:** [Link]

### Purpose

[What does this flag control? What feature is being released?]

### Flag key

`[flag_key_name]`

### Default state

- [ ] Disabled (feature hidden by default)
- [ ] Enabled (feature shown by default)

### Rollout plan

| Stage | Audience         | Date   | Notes             |
| ----- | ---------------- | ------ | ----------------- |
| 1     | Internal/dogfood | [Date] | Testing by team   |
| 2     | 5% of users      | [Date] | Monitor metrics   |
| 3     | 25% of users     | [Date] | If stage 2 stable |
| 4     | 100% of users    | [Date] | Full rollout      |
| 5     | Flag removal     | [Date] | Clean up code     |

### Success metrics

[How do we know the feature is working? What metrics should improve?]

### Rollback criteria

[Under what conditions should we disable the flag?]

### Cleanup deadline

**Remove flag by:** [Date, typically 2-4 weeks after full rollout]

---

**Status:** [ ] Active [ ] Fully rolled out [ ] Removed
**Removal PR:** [Link when removed]

Copy-Paste Artifact: Quality Review Template

Use this quarterly to assess your quality systems.

## Quality Systems Review

**Quarter:** [Q_ YYYY]
**Team:** [Name]
**Reviewer:** [Name]

### Metrics

| Metric               | Target   | Actual      | Trend |
| -------------------- | -------- | ----------- | ----- |
| Change failure rate  | < 15%    | \_\_\_%     | ↑/↓/→ |
| CI pipeline duration | < 10 min | \_\_\_ min  | ↑/↓/→ |
| Test flakiness rate  | < 2%     | \_\_\_%     | ↑/↓/→ |
| Deployment frequency | Daily    | \_\_\_/week | ↑/↓/→ |
| MTTR                 | < 1 hour | \_\_\_ min  | ↑/↓/→ |

### Assessment

**Testing:**

- Test coverage adequate for critical paths? [ ] Yes [ ] No [ ] Partial
- Tests catch real bugs? [ ] Usually [ ] Sometimes [ ] Rarely
- Test suite maintainable? [ ] Yes [ ] No [ ] Declining

**CI/CD:**

- Pipeline fast enough? [ ] Yes [ ] No
- Flaky tests under control? [ ] Yes [ ] No
- Deployment automated? [ ] Fully [ ] Partially [ ] Manual

**Releases:**

- Releasing frequently? [ ] Daily+ [ ] Weekly [ ] Less
- Rollback capability? [ ] Easy [ ] Possible [ ] Hard
- Feature flags healthy? [ ] Yes [ ] Stale [ ] Excessive

### Top Issues

1. [Issue]
2. [Issue]
3. [Issue]

### Improvement Actions

| Action   | Owner  | Due    |
| -------- | ------ | ------ |
| [Action] | [Name] | [Date] |
| [Action] | [Name] | [Date] |

Further Reading

  • Continuous Delivery by Jez Humble and David Farley – The foundational text on CI/CD
  • Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim – The research behind DORA metrics
  • Release It! by Michael Nygard – Patterns for building resilient systems