Engineering Metrics¶
Engineering metrics help you understand how effectively your teams deliver value. They reveal patterns in your delivery system, surface bottlenecks, and provide a foundation for improvement conversations. Used well, they help you ship better software faster. Used poorly, they create bureaucracy and invite gaming.
This page covers the core delivery metrics most engineering teams should track, how to implement and review them, and how to avoid the common failure modes.
What Problem This Solves¶
Engineering delivery is complex and often opaque. Without measurement:
You cannot see patterns. Is your team getting faster or slower? Is quality improving? Are certain areas of the codebase consistently problematic? Without data, you are guessing.
Conversations become subjective. "We're slow" versus "We're doing fine" becomes an argument of competing perceptions. Metrics provide shared ground.
Improvement lacks feedback. You try something new—smaller PRs, more pairing, different tooling—but you cannot tell if it helped.
Problems hide until they explode. Degradation happens gradually. You do not notice cycle time creeping up or reliability declining until it becomes a crisis.
Good engineering metrics make these patterns visible so you can address them before they become severe.
When to Measure¶
Actively invest in metrics when:
- You are establishing a new team and want to understand baseline performance
- You suspect delivery problems but lack data to confirm
- Leadership is asking questions you cannot answer with confidence
- You want to evaluate whether an improvement effort is working
- You are scaling and need visibility across multiple teams
Maintain when:
- Things are working—metrics are health checks, not deep investigations
- Onboarding new people who need to understand team performance
Investigate when:
- Metrics move unexpectedly in either direction
- Metrics and team sentiment diverge (numbers look good but team feels bad)
- External stakeholders express concern about delivery
Ownership¶
| Role | Responsibility |
|---|---|
| Engineering Manager | Owns metric visibility and review cadence; addresses systemic issues revealed by metrics |
| Tech Lead | Interprets technical implications; proposes improvement actions |
| Platform/DevOps | Provides metric infrastructure; ensures data accuracy |
| Individual Contributors | Understand what metrics mean; contribute to improvement |
Metrics are for teams, not individuals
Never use engineering metrics to evaluate individual performance. This destroys collaboration, encourages gaming, and makes the metrics useless. Keep metrics at the team level.
Core Metrics: DORA¶
The DORA (DevOps Research and Assessment) metrics are the most validated predictors of software delivery performance. Research consistently shows they correlate with organizational performance and team wellbeing.
The Four Key Metrics¶
| Metric | Definition | What It Indicates |
|---|---|---|
| Deployment Frequency | How often you deploy to production | Ability to ship incrementally |
| Lead Time for Changes | Time from code commit to production | Speed of delivery |
| Change Failure Rate | Percentage of deployments causing failures | Quality of releases |
| Mean Time to Recovery (MTTR) | Time from incident detection to resolution | Resilience and recovery capability |
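All four metrics can be computed from two event streams: deployments (carrying the commit timestamp of the change they shipped) and incidents. A minimal sketch in Python, assuming you have already exported these events from your CI/CD and incident tooling — the field names here are illustrative, not from any specific tool:

```python
from datetime import datetime
from statistics import median

def dora_metrics(deploys, incidents, window_days=28):
    """Compute the four DORA metrics from exported event data.

    deploys:   list of dicts with 'committed_at', 'deployed_at',
               and 'caused_failure' (bool). Field names are
               assumptions -- map them from your own tooling.
    incidents: list of dicts with 'detected_at' and 'resolved_at'.
    """
    weeks = window_days / 7
    deploy_frequency = len(deploys) / weeks  # deploys per week

    # Lead time for changes: commit -> production, median in hours
    lead_times = [
        (d["deployed_at"] - d["committed_at"]).total_seconds() / 3600
        for d in deploys
    ]
    lead_time_h = median(lead_times) if lead_times else None

    # Change failure rate: share of deploys that caused a failure
    failures = sum(1 for d in deploys if d["caused_failure"])
    cfr = failures / len(deploys) if deploys else None

    # MTTR: median detection -> resolution, in hours
    recoveries = [
        (i["resolved_at"] - i["detected_at"]).total_seconds() / 3600
        for i in incidents
    ]
    mttr_h = median(recoveries) if recoveries else None

    return {
        "deploy_frequency_per_week": deploy_frequency,
        "lead_time_hours": lead_time_h,
        "change_failure_rate": cfr,
        "mttr_hours": mttr_h,
    }
```

Computing all four from the same export makes it hard to report speed without stability, which matters for the reasons below.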
Why These Four¶
DORA metrics capture both speed and stability. The research shows that high performers are fast and stable—these are not trade-offs. Teams that deploy frequently have lower failure rates and recover faster. The metrics reinforce each other.
Performance Benchmarks¶
| Level | Deploy Frequency | Lead Time | Change Failure Rate | MTTR |
|---|---|---|---|---|
| Elite | Multiple times/day | < 1 hour | 0-15% | < 1 hour |
| High | Weekly to daily | 1 day - 1 week | 16-30% | < 1 day |
| Medium | Monthly to weekly | 1 week - 1 month | 16-30% | 1 day - 1 week |
| Low | Monthly+ | 1-6 months | 16-30% | 1 week - 1 month |
Use benchmarks as reference, not gospel. Your context matters. A startup and a regulated healthcare system have different acceptable risk profiles. Compare to yourself over time, not just to industry benchmarks.
Flow Metrics¶
Beyond DORA, flow metrics help you understand how work moves through your system.
Cycle Time¶
Definition: Time from when work starts to when it is done (typically "In Progress" to "Done" or "Deployed").
Why it matters: Long cycle time means slow feedback, high risk per change, and work that sits "in flight" too long. It is one of the most actionable metrics because so many practices affect it.
What affects cycle time:
- Work item size (smaller is faster)
- WIP limits (lower WIP reduces queue time)
- Handoffs and waiting
- Review bottlenecks
- Deploy frequency
Typical targets: 2-5 days for most teams. If median cycle time exceeds 10 days, investigate.
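The median is deliberate: cycle time distributions are long-tailed, so an average hides the typical experience behind a few outliers. A small sketch, assuming you can export (started, done) timestamp pairs from your issue tracker:

```python
from datetime import datetime
from statistics import median

def median_cycle_time_days(items):
    """Median time from 'started' to 'done', in days.

    items: list of (started, done) datetime pairs. The mapping of
    workflow states to these two timestamps is an assumption --
    every tracker names its transitions differently.
    """
    durations = [(done - started).total_seconds() / 86400
                 for started, done in items]
    return median(durations) if durations else None

items = [
    (datetime(2024, 3, 1), datetime(2024, 3, 3)),   # 2 days
    (datetime(2024, 3, 2), datetime(2024, 3, 7)),   # 5 days
    (datetime(2024, 3, 4), datetime(2024, 3, 16)),  # 12 days
]
ct = median_cycle_time_days(items)  # 5.0 -- one slow item does not dominate
```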
Throughput¶
Definition: Number of work items completed per unit time.
Why it matters: Throughput indicates team capacity and is useful for forecasting. Combined with cycle time, it reveals whether you are doing fewer things faster (good) or more things slower (concerning).
Caution: Do not optimize throughput by making items artificially small. Count value delivered, not tickets closed.
Work in Progress (WIP)¶
Definition: Number of items currently being worked on.
Why it matters: High WIP correlates with longer cycle time (Little's Law: Cycle Time ≈ WIP / Throughput). High WIP also means more context switching and cognitive load.
Typical targets: WIP should not exceed 1.5-2x the number of people working. If you have 5 engineers, WIP above 10 is a red flag.
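Little's Law also gives you a quick consistency check: if your measured cycle time is far from WIP divided by throughput, one of your data sources is probably stale. A sketch:

```python
def expected_cycle_time_days(wip, throughput_per_week):
    """Little's Law: average cycle time ~= WIP / throughput.

    wip: average number of items in progress
    throughput_per_week: items completed per week
    Returns the expected cycle time in days (7-day weeks).
    """
    if throughput_per_week == 0:
        return float("inf")
    return wip / throughput_per_week * 7

# A 5-engineer team with WIP of 10, finishing 7 items a week,
# should expect roughly 10 days of cycle time -- which is why
# WIP above 2x team size is a red flag.
```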
Flow Efficiency¶
Definition: Active time divided by total time. How much of cycle time is spent actually working versus waiting.
Why it matters: Most work spends more time waiting (in queues, in review, blocked) than being actively worked on. Flow efficiency reveals this.
Typical numbers: 15-40% is common. Below 15% indicates a waiting problem worth investigating.
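Flow efficiency can be reconstructed from an issue tracker's status-change log by classifying each state as active or waiting. Which states count as "active" is a judgement call per team; the set below is an illustrative assumption:

```python
def flow_efficiency(intervals, active_states=("in_progress", "in_review")):
    """Active time / total time across an item's status history.

    intervals: list of (state, hours) pairs reconstructed from
    status-change timestamps. State names are assumptions -- use
    whatever your tracker calls them.
    Returns a 0-1 ratio, or None if there is no elapsed time.
    """
    total = sum(hours for _, hours in intervals)
    if total == 0:
        return None
    active = sum(hours for state, hours in intervals
                 if state in active_states)
    return active / total

history = [("todo", 10), ("in_progress", 6),
           ("blocked", 20), ("in_review", 4)]
eff = flow_efficiency(history)  # 0.25 -- most of the time was waiting
```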
Quality Metrics¶
Quality metrics reveal whether your delivery is sustainable or building up problems.
Change Failure Rate¶
Definition: Percentage of deployments that cause a production failure requiring rollback, hotfix, or incident response.
Why it matters: This is part of DORA but worth highlighting. High failure rate means your quality gates are not catching problems. Low failure rate gives confidence to deploy frequently.
Typical targets: Under 15% for elite performance. If above 30%, invest in testing, staging validation, or deployment strategies.
Escaped Defects¶
Definition: Bugs discovered in production (by users or monitoring) versus bugs caught before production.
Why it matters: Where you find bugs matters as much as how many you find. Bugs caught in development are cheap. Bugs caught in production are expensive.
What to track: Count of production bugs by severity. Trend over time. Ratio of production bugs to pre-production bugs.
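All three of those numbers fall out of a single tagged bug list. A sketch, assuming each bug record carries a `found_in` stage and a `severity` — both field names are illustrative:

```python
from collections import Counter

def escaped_defects(bugs):
    """Summarise where bugs were found.

    bugs: list of dicts with 'found_in' (e.g. 'dev', 'staging',
    'prod') and 'severity'. Returns the share of all found bugs
    that escaped to production, plus a severity breakdown of the
    escaped ones.
    """
    prod = [b for b in bugs if b["found_in"] == "prod"]
    share = len(prod) / len(bugs) if bugs else None
    by_severity = Counter(b["severity"] for b in prod)
    return {"escaped_share": share, "prod_by_severity": dict(by_severity)}
```

Tracking the share rather than the raw production count keeps the metric honest: finding more bugs overall before release should read as improvement, not noise.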
Test Coverage (with Caveats)¶
Definition: Percentage of code executed by tests.
Why it matters: Coverage alone is a poor metric—you can have 80% coverage with meaningless tests. But declining coverage suggests new code is not being tested. And very low coverage in critical areas is a risk.
How to use it: Track coverage trends, not absolute numbers. Require coverage for critical paths. Do not set coverage targets that encourage writing tests for coverage rather than confidence.
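"Trends, not absolutes" can be enforced mechanically: a CI step that fails only when coverage drops relative to a recorded baseline, rather than gating on a fixed percentage. A sketch (the tolerance value is an assumption to absorb measurement noise):

```python
def coverage_regressed(baseline_pct, current_pct, tolerance_pct=0.5):
    """True if coverage fell more than `tolerance_pct` points
    below the recorded baseline.

    Gating on the trend rather than an absolute target avoids
    rewarding tests written for coverage instead of confidence.
    """
    return current_pct < baseline_pct - tolerance_pct
```

In practice the baseline would be updated whenever coverage genuinely improves, so the ratchet only moves upward.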
Incident Rate¶
Definition: Number of incidents per time period, often segmented by severity.
Why it matters: Increasing incident rate suggests reliability problems. Decreasing incident rate (alongside stable or increasing deploy frequency) suggests quality is improving.
Implementing Metrics¶
Data Sources¶
| Metric | Typical Source |
|---|---|
| Deploy frequency | CI/CD system (GitHub Actions, Jenkins, etc.) |
| Lead time | Version control + CI/CD timestamps |
| Change failure rate | Incident tracking system + deployment correlation |
| MTTR | Incident tracking timestamps |
| Cycle time | Issue tracker workflow timestamps |
| Throughput | Issue tracker |
| WIP | Issue tracker snapshot |
| Test coverage | CI coverage reports |
| Incident rate | Incident tracking system |
Building Dashboards¶
Start simple. A spreadsheet updated weekly is better than an elaborate dashboard nobody looks at.
Minimum viable dashboard:
- DORA metrics (four numbers, trended over time)
- Cycle time (median, trended)
- Current WIP
- Incident count by severity
Evolution path:
- Start with manual collection if automated tooling is not available
- Automate data collection as you confirm the metrics are useful
- Add drill-down capability once you understand what questions you need to answer
- Build team-specific views once you have multiple teams
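The "spreadsheet first" stage can even be a script that renders the minimum viable dashboard as a Markdown table from hand-collected numbers — no tooling dependency, and the trend arrows come for free:

```python
def render_dashboard(rows):
    """Render metric rows as a Markdown table.

    rows: list of (metric_name, current, previous) tuples,
    collected by hand or by whatever automation exists.
    """
    lines = ["| Metric | Current | Previous | Trend |",
             "|---|---|---|---|"]
    for name, current, previous in rows:
        trend = "↑" if current > previous else "↓" if current < previous else "→"
        lines.append(f"| {name} | {current} | {previous} | {trend} |")
    return "\n".join(lines)

print(render_dashboard([
    ("Deploys/week", 4, 3),
    ("Cycle time (days)", 5.0, 5.0),
    ("Open SEV1/2", 1, 2),
]))
```

Note the arrows only show direction, not whether the move is good — falling incidents and falling deploy frequency both render as ↓, so the review conversation still has to supply the interpretation.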
Review Cadence¶
Metrics need regular review to be useful. Without review, dashboards become decoration.
Weekly (Team Level)¶
- Quick check on cycle time and WIP
- Any anomalies worth investigating?
- Takes 5 minutes in standup or async
Sprint/Iteration (Team Level)¶
- Review DORA metrics and cycle time
- Discuss any changes from previous period
- Connect to retro actions—did experiments help?
- Takes 10-15 minutes in retro or planning
Monthly (Team + Leadership)¶
- Trend analysis across multiple sprints
- Compare to previous quarter
- Strategic decisions about process investment
- Takes 30-45 minutes
Quarterly (Organization Level)¶
- Cross-team comparison (for patterns, not ranking)
- Investment decisions (tooling, process, capacity)
- Goal-setting for next quarter
- Takes 60-90 minutes
What Good Looks Like¶
| Signal | What It Looks Like |
|---|---|
| Metrics inform decisions | "We are investing in CI because lead time has increased 40% over two quarters" |
| Trends are visible | Anyone can pull up a chart showing how metrics have changed |
| No gaming | Teams focus on actual improvement, not making numbers look good |
| Balance maintained | Speed and stability are both tracked; neither sacrificed |
| Context included | Metrics are discussed alongside qualitative context |
| Actions follow signals | When a metric degrades, the team investigates and responds |
Failure Modes and Mitigations¶
Gaming the Metrics¶
Symptom: Cycle time improves but value delivered does not. People split work into tiny items or close things prematurely.
Root cause: Metric became a target instead of a signal. Pressure to hit numbers.
Mitigation: Pair metrics with outcome measures (user satisfaction, business impact). Never set numeric targets tied to evaluation.
Dashboard Graveyard¶
Symptom: Dashboards exist but nobody looks at them. Metrics are technically available but not used.
Root cause: No review rhythm. Metrics not connected to decisions.
Mitigation: Schedule recurring reviews. Start each review with "what decisions could this inform?"
Context-Free Comparison¶
Symptom: Teams compared on metrics without accounting for different contexts (team size, domain complexity, tech debt).
Root cause: Metrics used to rank rather than understand.
Mitigation: Compare teams to their own history, not each other. Use metrics to spark questions, not assign blame.
Measurement Overload¶
Symptom: Dozens of metrics, none clearly important. Analysis paralysis.
Root cause: Adding metrics without retiring any. Fear of missing something.
Mitigation: Limit core metrics to 5-7. Ask "what decision does this inform?" for every metric. Sunset unused ones.
Speed Without Stability¶
Symptom: Deploy frequency increases but so does incident rate. Fast but broken.
Root cause: Optimizing one DORA metric without the others.
Mitigation: Always review DORA metrics together. Speed and stability should improve together or neither should be pushed.
Copy-Paste Artifact: Metrics Dashboard Spec¶
## Engineering Metrics Dashboard
**Team:** [Name]
**Last updated:** [Date]
### DORA Metrics
| Metric | This Week | Last Week | Trend | Target |
| -------------------- | ------------ | ------------ | ----- | ----------------- |
| Deployment Frequency | \_\_\_/week | \_\_\_/week | ↑/↓/→ | [e.g., daily] |
| Lead Time (median) | \_\_\_ days | \_\_\_ days | ↑/↓/→ | [e.g., < 3 days] |
| Change Failure Rate | \_\_\_% | \_\_\_% | ↑/↓/→ | [e.g., < 15%] |
| MTTR (median) | \_\_\_ hours | \_\_\_ hours | ↑/↓/→ | [e.g., < 4 hours] |
### Flow Metrics
| Metric | Current | Trend | Notes |
| ------------------- | ------------ | ----- | ------------------------ |
| Cycle Time (median) | \_\_\_ days | ↑/↓/→ | |
| Throughput | \_\_\_/week | ↑/↓/→ | |
| WIP | \_\_\_ items | ↑/↓/→ | Target: < [2x team size] |
| Flow Efficiency | \_\_\_% | ↑/↓/→ | |
### Quality Metrics
| Metric | This Period | Previous | Trend |
| ------------------------------ | ----------- | -------- | ----- |
| Incidents (SEV1/2) | \_\_\_ | \_\_\_ | ↑/↓/→ |
| Escaped Defects | \_\_\_ | \_\_\_ | ↑/↓/→ |
| Test Coverage (critical paths) | \_\_\_% | \_\_\_% | ↑/↓/→ |
### Data Sources
| Metric | Source | Collection |
| ------------------- | --------------- | ---------- |
| Deploy frequency | [CI/CD tool] | Automated |
| Lead time | [Git + CI/CD] | Automated |
| Change failure rate | [Incident tool] | Manual tag |
| MTTR | [Incident tool] | Automated |
| Cycle time | [Issue tracker] | Automated |
| Throughput | [Issue tracker] | Automated |
| WIP | [Issue tracker] | Snapshot |
### Review Schedule
- **Weekly:** Quick anomaly check (5 min in standup)
- **Sprint:** Full review in retro (15 min)
- **Monthly:** Trend analysis with leadership (30 min)
- **Quarterly:** Organization review and goal-setting (60 min)
Copy-Paste Artifact: Monthly Metrics Review Agenda¶
## Monthly Engineering Metrics Review
**Date:** [Date]
**Attendees:** [Team leads, EMs, relevant stakeholders]
**Duration:** 45 minutes
### Pre-work
- [ ] Update metrics dashboard with current data
- [ ] Prepare trend charts for last 3 months
- [ ] Note any known context (holidays, major releases, incidents)
### Agenda
**1. Metrics Snapshot (10 min)**
| Metric | This Month | Last Month | 3-Month Trend |
| ------------------- | ---------- | ---------- | ------------- |
| Deploy Frequency | | | |
| Lead Time | | | |
| Change Failure Rate | | | |
| MTTR | | | |
| Cycle Time | | | |
**2. Analysis (15 min)**
- What moved significantly?
- What's the likely cause?
- Is action needed?
**3. Context (10 min)**
- What does the team say qualitatively?
- Any divergence between metrics and sentiment?
- External factors affecting the data?
**4. Actions (10 min)**
| Issue | Proposed Action | Owner | Due |
| ----- | --------------- | ----- | --- |
| | | | |
### Follow-up
- [ ] Share summary with team
- [ ] Update action tracker
- [ ] Schedule next review
Copy-Paste Artifact: Metric Investigation Template¶
## Metric Investigation: [Metric Name]
**Date:** [Date]
**Investigator:** [Name]
### The Signal
**Metric:** [Which metric]
**Expected:** [Baseline or target]
**Actual:** [What we observed]
**Period:** [When this occurred]
**Magnitude:** [How significant is the change?]
### Hypotheses
| Possible Cause | Supporting Evidence | Contradicting Evidence |
| -------------- | ------------------- | ---------------------- |
| | | |
| | | |
| | | |
### Investigation
**Data reviewed:**
- [ ] Trend data over longer period
- [ ] Correlated metrics
- [ ] Deployment/release history
- [ ] Incident history
- [ ] Team feedback
**Root cause assessment:**
[What we believe is causing this]
### Impact
- **Who is affected:** [Teams, users, stakeholders]
- **Severity:** Low / Medium / High
- **Trend:** Improving / Stable / Degrading
### Recommendations
| Action | Priority | Owner | Timeline |
| ------ | -------- | ----- | -------- |
| | | | |
### Monitoring
- **How we'll know if it's fixed:** [Metric target]
- **Check-in date:** [Date]
Further Reading¶
- Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim — The research behind DORA metrics
- The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford — DevOps principles in narrative form
- Measuring and Managing Performance in Organizations by Robert Austin — Why measurement often backfires and how to avoid it
- The State of DevOps Report (annual) — Ongoing research on software delivery performance
Related¶
- Metrics in Execution — Connecting metrics to daily work
- Quality and CI — The practices that drive quality metrics
- Team Health Metrics — The human side of measurement
- Continuous Improvement — Acting on what metrics reveal