Org Blueprint
R-028 · Human · Technical · P0 · Critical · Unfilled

Head of SRE

Owns reliability of the production estate — SLOs, error budgets, capacity planning, and the on-call posture.

Live Ops

Due-diligence relevance

Acquirers run scenario analysis on the on-call schedule. A single-name on-call rotation is a fragility signal. A well-instrumented SLO/error-budget posture with multi-region capacity is a positive valuation signal — it tells the acquirer the product can absorb their growth without breaking.

Responsibilities

  • Set and defend SLOs and error budgets across services
  • Own capacity planning and cost-of-infrastructure ratios
  • Run the on-call rotation and post-incident discipline
  • Coordinate with engineering on reliability investments
  • Represent reliability posture to strategic acquirers
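
The relationship between an SLO target and its error budget can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the 30-day window, and the 99.9% target are assumptions, not values taken from this blueprint.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of SLO breach allowed in the window for an availability target.

    slo_target is a fraction, e.g. 0.999 for "three nines" (assumed example).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, breach_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - breach_minutes) / budget
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43 minutes of breach, and a service that has already spent half of that has about half its budget left for the rest of the window.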

Inputs

  • Service architecture
  • Capacity metrics
  • Incident history

Outputs

  • SLO dashboard
  • Capacity plan
  • On-call rotation health

Qualifications

  • Senior SRE / Production Engineering leadership
  • Has run on-call at scale
  • Strong post-incident discipline

KPIs

  • SLO breach minutes
  • MTTR
  • Pages per on-call rotation
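
As a rough illustration of how two of these KPIs might be derived from incident history, the sketch below assumes a hypothetical `Incident` record with open and resolve timestamps and a page count. The record shape and function names are assumptions for illustration, not part of this blueprint.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape; real incident tooling will differ.
    opened: datetime
    resolved: datetime
    pages: int  # pages sent to on-call for this incident


def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean time to restore, in minutes, across the given incidents."""
    return mean((i.resolved - i.opened).total_seconds() / 60 for i in incidents)


def pages_per_rotation(incidents: list[Incident], rotations: int) -> float:
    """Average number of pages per on-call rotation in the period."""
    if rotations <= 0:
        raise ValueError("rotations must be positive")
    return sum(i.pages for i in incidents) / rotations
```

Both functions take the same incident list, so one export from the incident tracker could feed both numbers. `mttr_minutes` raises `statistics.StatisticsError` on an empty list, which is one possible way to surface "no data" rather than reporting a misleading zero.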

Interfaces