Engineering Metrics

Engineering metrics help you understand how effectively your teams deliver value. They reveal patterns in your delivery system, surface bottlenecks, and provide a foundation for improvement conversations. Used well, they help you ship better software faster. Used poorly, they create bureaucracy and invite gaming.

This page covers the core delivery metrics most engineering teams should track, how to implement and review them, and how to avoid the common failure modes.

What Problem This Solves

Engineering delivery is complex and often opaque. Without measurement:

You cannot see patterns. Is your team getting faster or slower? Is quality improving? Are certain areas of the codebase consistently problematic? Without data, you are guessing.

Conversations become subjective. "We're slow" versus "We're doing fine" becomes an argument of competing perceptions. Metrics provide shared ground.

Improvement lacks feedback. You try something new—smaller PRs, more pairing, different tooling—but you cannot tell if it helped.

Problems hide until they explode. Degradation happens gradually. You do not notice cycle time creeping up or reliability declining until it becomes a crisis.

Good engineering metrics make these patterns visible so you can address them before they become severe.


When to Measure

Actively invest in metrics when:

  • You are establishing a new team and want to understand baseline performance
  • You suspect delivery problems but lack data to confirm
  • Leadership is asking questions you cannot answer with confidence
  • You want to evaluate whether an improvement effort is working
  • You are scaling and need visibility across multiple teams

Maintain when:

  • Things are working—metrics are health checks, not deep investigations
  • Onboarding new people who need to understand team performance

Investigate when:

  • Metrics move unexpectedly in either direction
  • Metrics and team sentiment diverge (numbers look good but team feels bad)
  • External stakeholders express concern about delivery

Ownership

| Role                    | Responsibility                                                                           |
| ----------------------- | ---------------------------------------------------------------------------------------- |
| Engineering Manager     | Owns metric visibility and review cadence; addresses systemic issues revealed by metrics |
| Tech Lead               | Interprets technical implications; proposes improvement actions                          |
| Platform/DevOps         | Provides metric infrastructure; ensures data accuracy                                    |
| Individual Contributors | Understand what metrics mean; contribute to improvement                                  |

Metrics are for teams, not individuals

Never use engineering metrics to evaluate individual performance. This destroys collaboration, encourages gaming, and makes the metrics useless. Keep metrics at the team level.


Core Metrics: DORA

The DORA (DevOps Research and Assessment) metrics are the most validated predictors of software delivery performance. Research consistently shows they correlate with organizational performance and team wellbeing.

The Four Key Metrics

| Metric                        | Definition                                  | What It Indicates                  |
| ----------------------------- | ------------------------------------------- | ---------------------------------- |
| Deployment Frequency          | How often you deploy to production          | Ability to ship incrementally      |
| Lead Time for Changes         | Time from code commit to production         | Speed of delivery                  |
| Change Failure Rate           | Percentage of deployments causing failures  | Quality of releases                |
| Mean Time to Recovery (MTTR)  | Time from incident detection to resolution  | Resilience and recovery capability |

Why These Four

DORA metrics capture both speed and stability. The research shows that high performers are fast and stable—these are not trade-offs. Teams that deploy frequently have lower failure rates and recover faster. The metrics reinforce each other.

Performance Benchmarks

| Level  | Deploy Frequency   | Lead Time        | Change Failure Rate | MTTR             |
| ------ | ------------------ | ---------------- | ------------------- | ---------------- |
| Elite  | Multiple times/day | < 1 hour         | 0-15%               | < 1 hour         |
| High   | Weekly to daily    | 1 day - 1 week   | 16-30%              | < 1 day          |
| Medium | Monthly to weekly  | 1 week - 1 month | 16-30%              | 1 day - 1 week   |
| Low    | Monthly or less    | 1-6 months       | 46-60%              | 1 week - 1 month |

Use benchmarks as reference, not gospel. Your context matters. A startup and a regulated healthcare system have different acceptable risk profiles. Compare to yourself over time, not just to industry benchmarks.
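If you automate benchmark tracking, a small classifier can turn a raw lead-time number into the level names above. This is an illustrative sketch: the function name is ours, and the threshold boundaries are taken from the table above, not from an official DORA definition. Values falling between the table's bands are assigned to the faster level.

```python
def lead_time_level(lead_time_hours: float) -> str:
    """Classify median lead time against the benchmark table above.

    Thresholds are illustrative, derived from the table, not official
    DORA cut-offs. Input is hours from commit to production.
    """
    if lead_time_hours < 1:
        return "Elite"
    if lead_time_hours <= 24 * 7:    # up to one week
        return "High"
    if lead_time_hours <= 24 * 30:   # up to roughly one month
        return "Medium"
    return "Low"

# A team shipping in about two days lands in the High band:
print(lead_time_level(48))  # → High
```

Comparing this output to your own history over time is more useful than the label itself.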


Flow Metrics

Beyond DORA, flow metrics help you understand how work moves through your system.

Cycle Time

Definition: Time from when work starts to when it is done (typically "In Progress" to "Done" or "Deployed").

Why it matters: Long cycle time means slow feedback, high risk per change, and work that sits "in flight" too long. It is one of the most actionable metrics because so many practices affect it.

What affects cycle time:

  • Work item size (smaller is faster)
  • WIP limits (lower WIP reduces queue time)
  • Handoffs and waiting
  • Review bottlenecks
  • Deploy frequency

Typical targets: 2-5 days for most teams. If median cycle time exceeds 10 days, investigate.
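Cycle time is straightforward to compute once you can export workflow timestamps. A minimal sketch, assuming a hypothetical export of (started, done) ISO-8601 timestamp pairs; adapt the field names to whatever your issue tracker actually emits:

```python
from datetime import datetime
from statistics import median

def median_cycle_time_days(items):
    """Median days from 'In Progress' to 'Done' for completed items.

    `items` is a list of (started_at, done_at) ISO-8601 string pairs —
    a hypothetical export shape, not any tracker's native format.
    """
    durations = [
        (datetime.fromisoformat(done) - datetime.fromisoformat(start)).total_seconds() / 86400
        for start, done in items
    ]
    return median(durations)

items = [
    ("2024-03-01T09:00:00", "2024-03-04T09:00:00"),  # 3 days
    ("2024-03-02T09:00:00", "2024-03-07T09:00:00"),  # 5 days
    ("2024-03-03T09:00:00", "2024-03-05T09:00:00"),  # 2 days
]
print(median_cycle_time_days(items))  # → 3.0
```

Use the median rather than the mean: a single long-running item should prompt investigation, not distort the headline number.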

Throughput

Definition: Number of work items completed per unit time.

Why it matters: Throughput indicates team capacity and is useful for forecasting. Combined with cycle time, it reveals whether you are doing fewer things faster (good) or more things slower (concerning).

Caution: Do not optimize throughput by making items artificially small. Count value delivered, not tickets closed.

Work in Progress (WIP)

Definition: Number of items currently being worked on.

Why it matters: High WIP correlates with longer cycle time (Little's Law: Cycle Time ≈ WIP / Throughput). High WIP also means more context switching and cognitive load.

Typical targets: WIP should not exceed 1.5-2x the number of people working. If you have 5 engineers, WIP above 10 is a red flag.
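Little's Law makes the WIP guidance concrete. A worked example, assuming throughput is measured in items completed per week and a 5-day working week (both assumptions, adjust to your cadence):

```python
def estimated_cycle_time_days(wip: int, throughput_per_week: float) -> float:
    """Little's Law: average cycle time ≈ WIP / throughput.

    Throughput is items completed per week; the result is converted to
    working days assuming a 5-day week.
    """
    weeks = wip / throughput_per_week
    return weeks * 5

# 10 items in flight, 8 completed per week → roughly 6.25 working days each:
print(estimated_cycle_time_days(wip=10, throughput_per_week=8))  # → 6.25
```

The practical implication: if cycle time is too long, cutting WIP is usually faster and cheaper than trying to raise throughput.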

Flow Efficiency

Definition: Active time divided by total time. How much of cycle time is spent actually working versus waiting.

Why it matters: Most work spends more time waiting (in queues, in review, blocked) than being actively worked on. Flow efficiency reveals this.

Typical numbers: 15-40% is common. Below 15% indicates a waiting problem worth investigating.
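The calculation itself is trivial; the hard part is capturing active versus waiting time in the first place. A sketch, assuming you already track active hours per item somewhere:

```python
def flow_efficiency(active_hours: float, total_hours: float) -> float:
    """Active time as a percentage of total elapsed time."""
    return 100 * active_hours / total_hours

# An item that spent 80 elapsed working hours in flight but received
# only 16 hours of active work:
print(flow_efficiency(active_hours=16, total_hours=80))  # → 20.0
```

A result like 20% is typical, and it reframes the improvement conversation: four-fifths of the elapsed time here is queuing, review wait, or blockage, not work.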


Quality Metrics

Quality metrics reveal whether your delivery is sustainable or building up problems.

Change Failure Rate

Definition: Percentage of deployments that cause a production failure requiring rollback, hotfix, or incident response.

Why it matters: This is part of DORA but worth highlighting. High failure rate means your quality gates are not catching problems. Low failure rate gives confidence to deploy frequently.

Typical targets: Under 15% for elite performance. If above 30%, invest in testing, staging validation, or deployment strategies.
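Computing the rate is simple once deployments are tagged as failed; the judgment is entirely in the tagging (what counts as "requiring rollback, hotfix, or incident response"). A minimal sketch:

```python
def change_failure_rate(deployments: int, failed_deployments: int) -> float:
    """Percentage of deployments that required rollback, hotfix,
    or incident response in the period."""
    return 100 * failed_deployments / deployments

# 40 deploys in the period, 5 of which triggered an incident or rollback:
print(change_failure_rate(deployments=40, failed_deployments=5))  # → 12.5
```

Agree on the failure definition before you start measuring, and keep it stable; a shifting definition makes the trend meaningless.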

Escaped Defects

Definition: Bugs discovered in production (by users or monitoring) versus bugs caught before production.

Why it matters: Where you find bugs matters as much as how many you find. Bugs caught in development are cheap. Bugs caught in production are expensive.

What to track: Count of production bugs by severity. Trend over time. Ratio of production bugs to pre-production bugs.

Test Coverage (with Caveats)

Definition: Percentage of code executed by tests.

Why it matters: Coverage alone is a poor metric—you can have 80% coverage with meaningless tests. But declining coverage suggests new code is not being tested. And very low coverage in critical areas is a risk.

How to use it: Track coverage trends, not absolute numbers. Require coverage for critical paths. Do not set coverage targets that encourage writing tests for coverage rather than confidence.

Incident Rate

Definition: Number of incidents per time period, often segmented by severity.

Why it matters: Increasing incident rate suggests reliability problems. Decreasing incident rate (alongside stable or increasing deploy frequency) suggests quality is improving.


Implementing Metrics

Data Sources

| Metric              | Typical Source                                     |
| ------------------- | -------------------------------------------------- |
| Deploy frequency    | CI/CD system (GitHub Actions, Jenkins, etc.)       |
| Lead time           | Version control + CI/CD timestamps                 |
| Change failure rate | Incident tracking system + deployment correlation  |
| MTTR                | Incident tracking timestamps                       |
| Cycle time          | Issue tracker workflow timestamps                  |
| Throughput          | Issue tracker                                      |
| WIP                 | Issue tracker snapshot                             |
| Test coverage       | CI coverage reports                                |
| Incident rate       | Incident tracking system                           |
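Lead time is the trickiest of these to assemble because it spans two systems: version control supplies commit timestamps and the CI/CD system supplies deploy timestamps. A minimal sketch that joins the two on commit SHA; both export shapes here are hypothetical, so adapt them to what your tooling actually produces:

```python
from datetime import datetime
from statistics import median

def median_lead_time_hours(commits, deploys):
    """Median hours from commit to production deploy.

    `commits` maps SHA → commit timestamp (ISO-8601 string);
    `deploys` maps SHA → production deploy timestamp. Both are
    hypothetical export shapes; join on the SHA each deploy shipped.
    """
    hours = [
        (datetime.fromisoformat(deploys[sha]) - datetime.fromisoformat(ts)).total_seconds() / 3600
        for sha, ts in commits.items()
        if sha in deploys
    ]
    return median(hours)

commits = {"a1b2": "2024-03-01T10:00:00", "c3d4": "2024-03-01T14:00:00"}
deploys = {"a1b2": "2024-03-01T16:00:00", "c3d4": "2024-03-02T14:00:00"}
print(median_lead_time_hours(commits, deploys))  # → 15.0
```

Commits that never reach production simply drop out of the join, which is usually what you want; a growing count of unmatched SHAs is itself a signal worth checking.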

Building Dashboards

Start simple. A spreadsheet updated weekly is better than an elaborate dashboard nobody looks at.

Minimum viable dashboard:

  1. DORA metrics (four numbers, trended over time)
  2. Cycle time (median, trended)
  3. Current WIP
  4. Incident count by severity

Evolution path:

  • Start with manual collection if automated tooling is not available
  • Automate data collection as you confirm the metrics are useful
  • Add drill-down capability once you understand what questions you need to answer
  • Build team-specific views once you have multiple teams
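Even in the manual-collection phase, the mechanical parts of the dashboard can be generated rather than typed. A small sketch that emits one markdown row in the format of the dashboard spec at the end of this page, deriving the trend arrow from the two values; the function name is ours, not from any tool:

```python
def render_dora_row(name, this_week, last_week, target):
    """One markdown table row with a trend arrow derived from the values."""
    trend = "↑" if this_week > last_week else "↓" if this_week < last_week else "→"
    return f"| {name} | {this_week} | {last_week} | {trend} | {target} |"

print(render_dora_row("Deployment Frequency", 5, 3, "daily"))
```

A dozen lines like this, run weekly against a spreadsheet export, is a perfectly adequate dashboard until the metrics have proven their worth.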

Review Cadence

Metrics need regular review to be useful. Without review, dashboards become decoration.

Weekly (Team Level)

  • Quick check on cycle time and WIP
  • Any anomalies worth investigating?
  • Takes 5 minutes in standup or async

Sprint/Iteration (Team Level)

  • Review DORA metrics and cycle time
  • Discuss any changes from previous period
  • Connect to retro actions—did experiments help?
  • Takes 10-15 minutes in retro or planning

Monthly (Team + Leadership)

  • Trend analysis across multiple sprints
  • Compare to previous quarter
  • Strategic decisions about process investment
  • Takes 30-45 minutes

Quarterly (Organization Level)

  • Cross-team comparison (for patterns, not ranking)
  • Investment decisions (tooling, process, capacity)
  • Goal-setting for next quarter
  • Takes 60-90 minutes

What Good Looks Like

| Signal                   | What It Looks Like                                                                     |
| ------------------------ | -------------------------------------------------------------------------------------- |
| Metrics inform decisions | "We are investing in CI because lead time has increased 40% over two quarters"          |
| Trends are visible       | Anyone can pull up a chart showing how metrics have changed                             |
| No gaming                | Teams focus on actual improvement, not making numbers look good                         |
| Balance maintained       | Speed and stability are both tracked; neither sacrificed                                |
| Context included         | Metrics are discussed alongside qualitative context                                     |
| Actions follow signals   | When a metric degrades, the team investigates and responds                              |

Failure Modes and Mitigations

Gaming the Metrics

Symptom: Cycle time improves but value delivered does not. People split work into tiny items or close things prematurely.

Root cause: Metric became a target instead of a signal. Pressure to hit numbers.

Mitigation: Pair metrics with outcome measures (user satisfaction, business impact). Never set numeric targets tied to evaluation.

Dashboard Graveyard

Symptom: Dashboards exist but nobody looks at them. Metrics are technically available but not used.

Root cause: No review rhythm. Metrics not connected to decisions.

Mitigation: Schedule recurring reviews. Start each review with "what decisions could this inform?"

Context-Free Comparison

Symptom: Teams compared on metrics without accounting for different contexts (team size, domain complexity, tech debt).

Root cause: Metrics used to rank rather than understand.

Mitigation: Compare teams to their own history, not each other. Use metrics to spark questions, not assign blame.

Measurement Overload

Symptom: Dozens of metrics, none clearly important. Analysis paralysis.

Root cause: Adding metrics without retiring any. Fear of missing something.

Mitigation: Limit core metrics to 5-7. Ask "what decision does this inform?" for every metric. Sunset unused ones.

Speed Without Stability

Symptom: Deploy frequency increases but so does incident rate. Fast but broken.

Root cause: Optimizing one DORA metric without the others.

Mitigation: Always review DORA metrics together. Speed and stability should improve together or neither should be pushed.


Copy-Paste Artifact: Metrics Dashboard Spec

## Engineering Metrics Dashboard

**Team:** [Name]
**Last updated:** [Date]

### DORA Metrics

| Metric               | This Week    | Last Week    | Trend | Target            |
| -------------------- | ------------ | ------------ | ----- | ----------------- |
| Deployment Frequency | ___/week     | ___/week     | ↑/↓/→ | [e.g., daily]     |
| Lead Time (median)   | ___ days     | ___ days     | ↑/↓/→ | [e.g., < 3 days]  |
| Change Failure Rate  | ___%         | ___%         | ↑/↓/→ | [e.g., < 15%]     |
| MTTR (median)        | ___ hours    | ___ hours    | ↑/↓/→ | [e.g., < 4 hours] |

### Flow Metrics

| Metric              | Current      | Trend | Notes                    |
| ------------------- | ------------ | ----- | ------------------------ |
| Cycle Time (median) | ___ days     | ↑/↓/→ |                          |
| Throughput          | ___/week     | ↑/↓/→ |                          |
| WIP                 | ___ items    | ↑/↓/→ | Target: < [2x team size] |
| Flow Efficiency     | ___%         | ↑/↓/→ |                          |

### Quality Metrics

| Metric                         | This Period | Previous | Trend |
| ------------------------------ | ----------- | -------- | ----- |
| Incidents (SEV1/2)             | ___         | ___      | ↑/↓/→ |
| Escaped Defects                | ___         | ___      | ↑/↓/→ |
| Test Coverage (critical paths) | ___%        | ___%     | ↑/↓/→ |

### Data Sources

| Metric              | Source          | Collection |
| ------------------- | --------------- | ---------- |
| Deploy frequency    | [CI/CD tool]    | Automated  |
| Lead time           | [Git + CI/CD]   | Automated  |
| Change failure rate | [Incident tool] | Manual tag |
| MTTR                | [Incident tool] | Automated  |
| Cycle time          | [Issue tracker] | Automated  |
| Throughput          | [Issue tracker] | Automated  |
| WIP                 | [Issue tracker] | Snapshot   |

### Review Schedule

- **Weekly:** Quick anomaly check (5 min in standup)
- **Sprint:** Full review in retro (15 min)
- **Monthly:** Trend analysis with leadership (30 min)
- **Quarterly:** Organization review and goal-setting (60 min)

Copy-Paste Artifact: Monthly Metrics Review Agenda

## Monthly Engineering Metrics Review

**Date:** [Date]
**Attendees:** [Team leads, EMs, relevant stakeholders]
**Duration:** 45 minutes

### Pre-work

- [ ] Update metrics dashboard with current data
- [ ] Prepare trend charts for last 3 months
- [ ] Note any known context (holidays, major releases, incidents)

### Agenda

**1. Metrics Snapshot (10 min)**

| Metric              | This Month | Last Month | 3-Month Trend |
| ------------------- | ---------- | ---------- | ------------- |
| Deploy Frequency    |            |            |               |
| Lead Time           |            |            |               |
| Change Failure Rate |            |            |               |
| MTTR                |            |            |               |
| Cycle Time          |            |            |               |

**2. Analysis (15 min)**

- What moved significantly?
- What's the likely cause?
- Is action needed?

**3. Context (10 min)**

- What does the team say qualitatively?
- Any divergence between metrics and sentiment?
- External factors affecting the data?

**4. Actions (10 min)**
| Issue | Proposed Action | Owner | Due |
| ----- | --------------- | ----- | --- |
| | | | |

### Follow-up

- [ ] Share summary with team
- [ ] Update action tracker
- [ ] Schedule next review

Copy-Paste Artifact: Metric Investigation Template

## Metric Investigation: [Metric Name]

**Date:** [Date]
**Investigator:** [Name]

### The Signal

**Metric:** [Which metric]
**Expected:** [Baseline or target]
**Actual:** [What we observed]
**Period:** [When this occurred]
**Magnitude:** [How significant is the change?]

### Hypotheses

| Possible Cause | Supporting Evidence | Contradicting Evidence |
| -------------- | ------------------- | ---------------------- |
|                |                     |                        |
|                |                     |                        |
|                |                     |                        |

### Investigation

**Data reviewed:**

- [ ] Trend data over longer period
- [ ] Correlated metrics
- [ ] Deployment/release history
- [ ] Incident history
- [ ] Team feedback

**Root cause assessment:**
[What we believe is causing this]

### Impact

- **Who is affected:** [Teams, users, stakeholders]
- **Severity:** Low / Medium / High
- **Trend:** Improving / Stable / Degrading

### Recommendations

| Action | Priority | Owner | Timeline |
| ------ | -------- | ----- | -------- |
|        |          |       |          |

### Monitoring

- **How we'll know if it's fixed:** [Metric target]
- **Check-in date:** [Date]

Further Reading

  • Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim — The research behind DORA metrics
  • The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford — DevOps principles in narrative form
  • Measuring and Managing Performance in Organizations by Robert Austin — Why measurement often backfires and how to avoid it
  • The State of DevOps Report (annual) — Ongoing research on software delivery performance