Org Blueprint
R-028 · Human · Technical · P0 · Critical · Unfilled
Head of SRE
Owns reliability of the production estate — SLOs, error budgets, capacity planning, and the on-call posture.
Live Ops
Due-diligence relevance
Acquirers run scenario analysis on the on-call schedule. An on-call rotation that depends on a single named engineer is a fragility signal. A well-instrumented SLO and error-budget posture, backed by multi-region capacity, is a positive valuation signal, because it tells the acquirer the product can absorb their growth without breaking.
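One way to make the single-name fragility signal checkable is to measure how concentrated the schedule is. The sketch below assumes the schedule is exported as one primary on-call name per shift. The function name `rotation_fragility` and the 50% share threshold are hypothetical illustrations, not an established standard.

```python
from collections import Counter

# Hypothetical threshold: flag the rotation if one person covers more than half the shifts.
MAX_SHARE = 0.5


def rotation_fragility(shifts: list[str], max_share: float = MAX_SHARE) -> dict:
    """Summarize how concentrated an on-call schedule is.

    `shifts` holds one primary on-call name per shift (e.g. per week).
    """
    if not shifts:
        raise ValueError("schedule has no shifts")
    counts = Counter(shifts)
    top_name, top_count = counts.most_common(1)[0]
    top_share = top_count / len(shifts)
    return {
        "distinct_people": len(counts),
        "top_person": top_name,
        "top_share": top_share,
        # A single name, or one person carrying most shifts, is the fragility signal.
        "fragile": len(counts) < 2 or top_share > max_share,
    }
```

For example, a four-week schedule staffed entirely by one engineer is flagged as fragile, while a schedule that spreads shifts across three people with no one above half is not.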
Responsibilities
- Set and defend SLOs and error budgets across services
- Own capacity planning and cost-of-infrastructure ratios
- Run the on-call rotation and post-incident discipline
- Coordinate with engineering on reliability investments
- Represent reliability posture to strategic acquirers
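Setting and defending an error budget comes down to simple arithmetic: the budget is the share of the SLO window the service is allowed to be "bad", that is (1 - SLO target) x window length. The sketch below assumes a time-based availability SLO measured in minutes. Request-based SLOs are equally common, and the release-freeze rule shown here is a hypothetical policy for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Time-based error budget for one service (an assumed model, not the only one)."""

    slo_target: float    # e.g. 0.999 means 99.9% of minutes in the window must be "good"
    window_minutes: int  # length of the rolling SLO window

    def allowed_bad_minutes(self) -> float:
        # Budget = (1 - SLO target) x window length.
        return (1.0 - self.slo_target) * self.window_minutes

    def remaining_minutes(self, bad_minutes: float) -> float:
        # A negative result means the SLO has been breached for this window.
        return self.allowed_bad_minutes() - bad_minutes

    def burn_fraction(self, bad_minutes: float) -> float:
        # Share of the budget already spent (1.0 = fully spent).
        return bad_minutes / self.allowed_bad_minutes()

    def freeze_releases(self, bad_minutes: float) -> bool:
        # Hypothetical policy: pause risky changes once the budget is exhausted.
        return self.remaining_minutes(bad_minutes) <= 0
```

With a 99.9% target over a 30-day window (43,200 minutes), `ErrorBudget(0.999, 43_200).allowed_bad_minutes()` comes to about 43.2 minutes; after 30 bad minutes roughly 13.2 remain and releases are not yet frozen.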
Inputs
- Service architecture
- Capacity metrics
- Incident history
Outputs
- SLO dashboard
- Capacity plan
- On-call rotation health
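A capacity plan usually answers "how long until we run out of headroom?". The sketch below assumes compound monthly growth in peak load and an 80% target-utilization ceiling; both the growth model and the `months_of_headroom` name are illustrative assumptions, and real plans often use per-resource forecasts instead.

```python
import math


def months_of_headroom(peak_load: float, capacity: float,
                       monthly_growth: float,
                       target_utilization: float = 0.8) -> float:
    """Whole months until peak load crosses target_utilization x capacity.

    Assumes compound growth: load(m) = peak_load * (1 + monthly_growth) ** m.
    Returns math.inf when load is flat or shrinking.
    """
    ceiling = capacity * target_utilization
    if peak_load >= ceiling:
        return 0  # already over the target: capacity is needed now
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking load never reaches the ceiling
    # Smallest whole m with peak_load * (1 + g) ** m >= ceiling.
    return math.ceil(math.log(ceiling / peak_load) / math.log(1 + monthly_growth))
```

For instance, a service peaking at 400 units on 1,000 units of capacity and growing 10% a month crosses the 800-unit ceiling in its eighth month, which is the runway an acquirer's growth scenario would be tested against.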
Qualifications
- Senior SRE / Production Engineering leadership
- Has run on-call at scale
- Strong post-incident discipline
KPIs
- SLO breach minutes
- MTTR
- Pages per on-call rotation
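The three KPIs can be derived from an incident log plus a page count. The sketch below makes several assumptions worth stating: incidents are tagged when they consume SLO budget, overlapping incidents are not merged, MTTR is read as mean time to recovery measured from detection to resolution, and the `Incident` record and helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    detected: datetime
    resolved: datetime
    slo_impacting: bool  # assumption: incidents are tagged when they burn SLO budget

    @property
    def duration(self) -> timedelta:
        return self.resolved - self.detected


def slo_breach_minutes(incidents: list[Incident]) -> float:
    # Total minutes of SLO-impacting incidents (overlapping incidents are not merged).
    return sum(i.duration.total_seconds() / 60 for i in incidents if i.slo_impacting)


def mttr(incidents: list[Incident]) -> timedelta:
    # Mean time to recovery, measured here from detection to resolution.
    if not incidents:
        return timedelta(0)
    total = sum((i.duration for i in incidents), timedelta(0))
    return total / len(incidents)


def pages_per_rotation(page_count: int, rotation_count: int) -> float:
    # Average number of pages each on-call rotation (shift) absorbed.
    return page_count / rotation_count if rotation_count else 0.0
```

Tracking pages per rotation alongside the schedule-concentration check helps show whether on-call load is both shared and sustainable, not merely staffed.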