Metrics

What gets measured gets improved. We use a balanced set of metrics to understand our engineering health.

DORA Metrics

The DORA (DevOps Research and Assessment) team at Google identified four key metrics that measure software delivery performance.

1. Deployment Frequency

How often do we deploy to production?

Level | Frequency | Description
🟢 Elite | On-demand (multiple times per day) | Changes flow to production quickly and safely
🟡 High | Between once per day and once per week | Regular, predictable releases
🟠 Medium | Between once per week and once per month | Slower release cycles
🔴 Low | Between once per month and once every 6 months | Big bang releases, high risk

Why it matters: Higher frequency means smaller changes, less risk, faster feedback.

How to improve:

  • Smaller batch sizes
  • Feature flags for incomplete work
  • Automated deployment pipelines
  • Reduce manual approval steps
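
As a rough illustration of how the frequency levels could be computed, the sketch below counts production deployments in a trailing window and maps the daily rate onto the four levels. The function names, the 30-day month, and the choice to treat anything slower than once per month as Low are assumptions made for this example, not part of any particular tool.

```python
from datetime import datetime, timedelta


def deployments_per_day(deploy_times: list[datetime], window_days: int, now: datetime) -> float:
    """Average number of production deployments per day over the trailing window."""
    start = now - timedelta(days=window_days)
    recent = [t for t in deploy_times if start < t <= now]
    return len(recent) / window_days


def frequency_level(per_day: float) -> str:
    """Map a deployments-per-day rate onto the deployment frequency levels."""
    if per_day > 1:          # multiple times per day
        return "Elite"
    if per_day >= 1 / 7:     # between once per day and once per week
        return "High"
    if per_day >= 1 / 30:    # between once per week and once per month (assumes a 30-day month)
        return "Medium"
    return "Low"             # assumption: everything slower counts as Low


# Hypothetical data: one deployment every 6 hours for 10 days (40 deployments).
now = datetime(2024, 1, 31)
deploys = [now - timedelta(hours=6 * i) for i in range(40)]
rate = deployments_per_day(deploys, window_days=10, now=now)
print(rate, frequency_level(rate))
```

A trailing window keeps the number responsive to recent changes in release practice; the window length is a judgment call for each team.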

2. Lead Time for Changes

How long does it take from commit to production?

Level | Lead Time | Description
🟢 Elite | Less than one hour | Rapid feedback and iteration
🟡 High | Between one day and one week | Reasonable pace
🟠 Medium | Between one week and one month | Slower iteration
🔴 Low | Between one month and six months | Long delays

Why it matters: Shorter lead time means faster value delivery and quicker learning.

How to improve:

  • Fast builds and tests
  • Parallel pipeline stages
  • Automated quality gates
  • Reduce manual handoffs
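
One possible way to measure lead time is to take the median of the time from commit to production deployment across recent changes. The sketch below assumes each change is recorded as a hypothetical (committed_at, deployed_at) pair. The lead time levels leave values between one hour and one day unassigned, so this example counts them as High, and it also treats a month as 30 days; both are assumptions.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median of (deployed_at - committed_at) over a list of changes."""
    seconds = [(deployed - committed).total_seconds() for committed, deployed in changes]
    return timedelta(seconds=median(seconds))


def lead_time_level(lead: timedelta) -> str:
    """Map a lead time onto the lead time levels."""
    if lead < timedelta(hours=1):
        return "Elite"
    if lead <= timedelta(weeks=1):
        return "High"    # assumption: 1 hour to 1 day is also counted as High
    if lead <= timedelta(days=30):
        return "Medium"  # assumes a 30-day month
    return "Low"


# Hypothetical changes with lead times of 30 minutes, 2 hours, and 24 hours.
changes = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)),
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 12, 0)),
    (datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 3, 8, 0)),
]
lead = median_lead_time(changes)
print(lead, lead_time_level(lead))
```

The median is used here rather than the mean so that a single slow change does not dominate the figure.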

3. Change Failure Rate

What percentage of deployments cause failures?

Level | Failure Rate | Description
🟢 Elite | 0-15% | Most changes succeed
🟡 High | 0-15% | Acceptable failure rate
🟠 Medium | 16-30% | Frequent issues
🔴 Low | 16-30% or higher | Unreliable deployments

Why it matters: Lower failure rate means more confidence in deployments.

How to improve:

  • Comprehensive testing
  • Incremental rollouts
  • Canary deployments
  • Automated rollbacks
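
The arithmetic behind change failure rate is simple: failed deployments divided by total deployments, as a percentage. The sketch below assumes a "failed" deployment is one that needed a rollback, hotfix, or incident response; that definition and the function name are assumptions for illustration.

```python
def change_failure_rate(total_deployments: int, failed_deployments: int) -> float:
    """Percentage of deployments that caused a failure in production."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    if not 0 <= failed_deployments <= total_deployments:
        raise ValueError("failed_deployments must be between 0 and total_deployments")
    return 100.0 * failed_deployments / total_deployments


# Hypothetical month: 100 deployments, 8 of which needed remediation.
print(change_failure_rate(100, 8))  # 8.0
```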

4. Time to Restore Service

How quickly can we recover from failures?

Level | Recovery Time | Description
🟢 Elite | Less than one hour | Rapid recovery
🟡 High | Less than one day | Same-day recovery
🟠 Medium | Less than one week | Slower recovery
🔴 Low | More than one week | Extended outages

Why it matters: Faster recovery reduces impact on customers and business.

How to improve:

  • Good monitoring and alerting
  • Runbooks for common issues
  • Automated remediation
  • Chaos engineering practice
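
Time to restore can be sketched as the average duration between detecting an incident and resolving it. The example below assumes incidents are available as hypothetical (detected_at, resolved_at) pairs and counts a recovery of exactly one week as Low; both are assumptions for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved_at - detected_at) over a list of incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


def restore_level(restore_time: timedelta) -> str:
    """Map a recovery time onto the recovery time levels."""
    if restore_time < timedelta(hours=1):
        return "Elite"
    if restore_time < timedelta(days=1):
        return "High"
    if restore_time < timedelta(weeks=1):
        return "Medium"
    return "Low"  # assumption: exactly one week also counts as Low


# Hypothetical incidents lasting 10 and 20 minutes.
incidents = [
    (datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 14, 10)),
    (datetime(2024, 1, 20, 3, 0), datetime(2024, 1, 20, 3, 20)),
]
mttr = mean_time_to_restore(incidents)
print(mttr, restore_level(mttr))
```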

Balanced Metrics

DORA metrics tell us about delivery performance. We also track:

Quality Metrics

Metric | Target | Why
Test Coverage | >80% | Confidence in changes
Code Review Time | <24 hours | Fast feedback
Security Vulnerabilities | 0 critical/high | Security first
Technical Debt Ratio | <10% | Sustainable codebase

Productivity Metrics

Metric | Target | Why
Build Time | <10 minutes | Fast feedback
Time to First PR | <3 days | Reduced WIP
Developer Satisfaction | >4/5 | Retention and motivation
Onboarding Time | <2 weeks | Team scalability

Reliability Metrics

Metric | Target | Why
Uptime | >99.9% | Availability
MTBF (Mean Time Between Failures) | >30 days | Stability
Error Rate | <0.1% | Quality in production
Alert Fatigue | <2 false alerts/week | Sustainable operations
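
To make the uptime and MTBF targets concrete, the sketch below converts an uptime percentage into a downtime budget and computes MTBF as operating time divided by the number of failures. The function names and the 30-day period are assumptions chosen for this example.

```python
def allowed_downtime_minutes(uptime_target_pct: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, implied by an uptime target over a period."""
    period_minutes = period_days * 24 * 60
    return (100 - uptime_target_pct) / 100 * period_minutes


def mtbf_days(period_days: float, total_downtime_days: float, failures: int) -> float:
    """Mean time between failures: operating time divided by number of failures."""
    if failures <= 0:
        raise ValueError("failures must be positive")
    return (period_days - total_downtime_days) / failures


# A 99.9% uptime target leaves roughly 43 minutes of downtime in a 30-day period.
print(round(allowed_downtime_minutes(99.9), 1))  # 43.2

# Hypothetical quarter: 90 days, 2 failures, negligible downtime.
print(mtbf_days(90, 0, 2))  # 45.0
```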

Using Metrics Wisely

Do’s ✅

  • Use metrics to guide improvement, not punish
  • Look at trends, not single data points
  • Combine multiple metrics for a balanced view
  • Share metrics transparently with the team
  • Review metrics regularly in retrospectives

Don’ts ❌

  • Do not use metrics to compare teams unfairly
  • Do not optimize a single metric at the expense of others
  • Do not ignore context when interpreting metrics
  • Do not make metrics a target (Goodhart’s Law)
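
The advice to look at trends rather than single data points can be illustrated with a simple rolling average. The sketch below is one minimal way to smooth a weekly series; the function name, the sample values, and the three-week window are assumptions for illustration.

```python
def rolling_mean(values: list[float], window: int) -> list[float]:
    """Mean of each run of `window` consecutive values, oldest first."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical weekly change failure rates (%); week 3 is a one-off spike.
weekly_cfr = [8, 10, 25, 9, 7, 8]
trend = rolling_mean(weekly_cfr, window=3)
print([round(x, 1) for x in trend])
```

Read on its own, the week 3 value looks alarming; the smoothed series shows the rate settling back to its earlier level, which is the signal worth discussing in a retrospective.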

Dashboard Example

A good engineering dashboard shows:

┌──────────────────────────────────────────────────┐
│  Engineering Health Dashboard                    │
├──────────────────────────────────────────────────┤
│  DORA Metrics                                    │
│  ├── Deployment Frequency:    12/day  🟢        │
│  ├── Lead Time:               2 hours 🟢        │
│  ├── Change Failure Rate:     8%      🟢        │
│  └── Time to Restore:         15 min  🟢        │
│                                                  │
│  Quality                                         │
│  ├── Test Coverage:           85%     ✅        │
│  └── Open Vulnerabilities:    2       ⚠️         │
│                                                  │
│  Productivity                                    │
│  ├── Build Time:              8 min   ✅        │
│  └── Developer Satisfaction:  4.2/5   ✅        │
└──────────────────────────────────────────────────┘
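
The status markers on a dashboard like this could be derived by comparing current values against the targets listed under Balanced Metrics. The sketch below is one way to do that; the TARGETS mapping, the metric labels, and the rule that any open vulnerability produces a warning are assumptions for this example.

```python
import operator
from typing import Callable

# Hypothetical target table: metric name -> (comparison, target value).
TARGETS: dict[str, tuple[Callable[[float, float], bool], float]] = {
    "Test Coverage (%)": (operator.gt, 80),
    "Open Vulnerabilities": (operator.eq, 0),   # assumption: any open item is a warning
    "Build Time (min)": (operator.lt, 10),
    "Developer Satisfaction": (operator.gt, 4),
}


def status_lines(current: dict[str, float]) -> list[str]:
    """Return one 'name: value STATUS' line per metric in TARGETS."""
    lines = []
    for name, (compare, target) in TARGETS.items():
        ok = compare(current[name], target)
        lines.append(f"{name}: {current[name]} {'OK' if ok else 'WARN'}")
    return lines


# Values taken from the example dashboard.
current = {
    "Test Coverage (%)": 85,
    "Open Vulnerabilities": 2,
    "Build Time (min)": 8,
    "Developer Satisfaction": 4.2,
}
for line in status_lines(current):
    print(line)
```

Keeping the targets in one mapping means the dashboard and the written targets cannot silently drift apart.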

Resources