Roles and Staffing
How you staff incident response depends on your team size and structure. This document covers options from small teams to larger organizations.
See Incident-Response-Policy for role definitions.
Core Roles (All Team Sizes)
Every incident needs these roles filled, even if one person wears multiple hats:
| Role | Responsibility |
|---|---|
| Incident Leader | Coordinates response, assigns tasks, makes decisions |
| Scribe | Documents everything in Incident Log |
| Responders | Execute fixes, investigate, implement mitigations |
For larger incidents, add:
- Communication Manager - Handles internal/external comms
- Subject Matter Experts - Specialists for specific domains
Small Teams (2-5 people)
Approach
With a team this small, everyone is expected to know the whole system, so the main requirement is that someone is always reachable.
Whether you need formal on-call depends more on your commitments (SLAs, assets held, user expectations) than team size alone. Small teams with high-value assets may still need structured coverage.
Structure
- Designate 1-2 people as default Incident Leaders (only one leads any given incident)
- Everyone else responds as needed
- Leader also serves as Scribe for minor incidents
- Separate Scribe for P1/P2 incidents
Expectations
- Keep a shared contact list (see Contacts)
- Establish one communication channel for incidents
- Someone should always be reachable (informal coverage)
What You Might Not Need
- Formal on-call rotation (unless your commitments require it)
- Separate First Responder program
- Multiple communication managers
Medium Teams (5-15 people)
Approach
Define subject matter experts. Consider a simple on-call rotation.
Structure
Subject Matter Experts (SMEs)
| Domain | Primary | Backup |
|---|---|---|
| Smart Contracts | | |
| Infrastructure | | |
| Frontend | | |
| Security | | |
Option A: Informal
- No formal schedule, but SMEs are expected to be reachable during their working hours
- Clear escalation for after-hours: who to call first
Option B: Simple Rotation
- Weekly rotation among willing team members
- One person on-call, responsible for initial triage
- They pull in SMEs as needed
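As an illustration of Option B, a weekly rotation can be generated by cycling through the list of willing team members. This is a minimal sketch, not a prescribed tool: the function name `weekly_rotation`, the member names, and the start date are all hypothetical, and it assumes a fixed roster with no swaps or holidays.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def weekly_rotation(members, start, weeks):
    """Return (week_start, on_call_member) pairs for a simple weekly rotation.

    Assumption: each member takes one full week in roster order,
    and the roster repeats once everyone has had a turn.
    """
    if not members:
        raise ValueError("rotation needs at least one member")
    schedule = []
    for i, member in enumerate(islice(cycle(members), weeks)):
        schedule.append((start + timedelta(weeks=i), member))
    return schedule


# Hypothetical roster of three volunteers, five weeks of coverage.
schedule = weekly_rotation(["alice", "bob", "carol"], date(2024, 1, 1), 5)
for week_start, member in schedule:
    print(week_start.isoformat(), member)
```

In practice a paging tool such as PagerDuty usually owns the schedule; a script like this is mainly useful for drafting a rotation before entering it there.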
Expectations
- SMEs respond quickly when paged for their domain
- On-call person handles initial assessment and escalation
- Separate Scribe and Incident Leader for P1/P2 incidents
Larger Teams (15+ people)
Approach
Formal First Responder program with trained personnel and scheduled on-call.
First Responder Program
What First Responders Do:
- Initial triage when an incident is detected
- Assess severity
- Kick off the incident response process
- Pull in the right people
- Hand off to Incident Leader
What First Responders Don't Do:
- Fix the issue themselves (unless they're also the SME)
- Make major decisions without escalation
Why This Works:
- Distributes knowledge across the organization
- Reduces burden on any single team
- Ensures someone is always ready to start the process
- Doesn't require deep expertise in all domains
On-Call Structure
Consider parallel schedules for different domains:
| Schedule | Coverage | Rotation |
|---|---|---|
| Infrastructure | 24/7 | Weekly among 6-8 people |
| Smart Contracts | 24/7 | Weekly among 6-8 people |
Example: with 8 people per rotation, each person is on call one week in every eight, or roughly one week every two months.
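The rotation arithmetic above generalizes to any rotation size. The sketch below assumes a weekly rotation with equal shares and a 52-week year; the function name `on_call_load` and its return keys are hypothetical.

```python
def on_call_load(people, weeks_per_year=52):
    """Estimate per-person on-call load for a weekly rotation.

    Assumption: one person is on call per week and shifts are
    shared equally, so each person serves one week per cycle.
    """
    if people < 1:
        raise ValueError("rotation needs at least one person")
    return {
        # Length of one full pass through the rotation, in weeks.
        "cycle_weeks": people,
        # Average number of on-call weeks per person per year.
        "shifts_per_year": weeks_per_year / people,
    }


load = on_call_load(8)
print(load["cycle_weeks"], load["shifts_per_year"])  # 8 6.5
```

With 6 people instead of 8, the same calculation gives a six-week cycle and about 8.7 on-call weeks per person per year.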
First Responder Training
Before going on-call, complete:
- Review Incident-Response-Policy
- Review Incident Log and Post-Mortem templates
- Read 2-3 past post-mortems
- Understand basic architecture (infra and smart contracts)
- Know how to reach SMEs and Decision Makers
- Test alerting system access
On-Call Expectations
During your shift:
- Keep alerting device accessible
- Respond to pages within 15 minutes
- Triage and escalate appropriately
- You don't need to fix everything; get the right people involved
Between shifts:
- Stay current on documentation
- Review new post-mortems
- Participate in tabletop exercises
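If you want to review how well the 15-minute response expectation is being met, a small check over page and acknowledgment timestamps is enough. This is a sketch under assumptions: the names `RESPONSE_TARGET` and `responded_in_time` are hypothetical, and the timestamps would come from whatever alerting system you use.

```python
from datetime import datetime, timedelta

# The 15-minute target comes from the on-call expectations above.
RESPONSE_TARGET = timedelta(minutes=15)


def responded_in_time(paged_at, acknowledged_at, target=RESPONSE_TARGET):
    """Return True if the page was acknowledged within the target window."""
    if acknowledged_at < paged_at:
        raise ValueError("acknowledgment cannot precede the page")
    return acknowledged_at - paged_at <= target


# Hypothetical page sent at 03:00 and acknowledged 14 minutes later.
paged = datetime(2024, 1, 1, 3, 0)
print(responded_in_time(paged, paged + timedelta(minutes=14)))  # True
```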
Decision Makers
Regardless of team size, define who can make high-stakes decisions during P1 incidents:
| Role | Name | Contact |
|---|---|---|
These people should be reachable 24/7 for critical incidents. Consider:
- Founders / C-level
- Security Lead
- Engineering Lead
- Legal (for incidents with legal implications)
Tools Checklist
Ensure your on-call personnel have access to:
- Alerting system (PagerDuty, etc.)
- Communication platform (Slack, Discord, etc.)
- Video conferencing
- Monitoring dashboards
- On-call schedule
- Contacts list
Choosing Your Model
| Team Size | Recommended Approach |
|---|---|
| 2-5 | Informal coverage, designated leaders |
| 5-10 | SME structure, optional simple rotation |
| 10-15 | Simple rotation with SMEs |
| 15+ | First Responder program with parallel schedules |
Start simple and add structure as you grow. A lightweight process that people follow beats a heavyweight process that gets ignored.
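The sizing table above can be expressed as a simple lookup. Because the ranges in the table share their boundary values (5 and 10), this sketch assumes those sizes fall into the lighter model, and that exactly 15 people counts as 15+; the function name `recommended_model` is hypothetical.

```python
def recommended_model(team_size):
    """Map a team size to the recommended approach from the sizing table.

    Assumptions: shared boundaries 5 and 10 resolve to the lighter
    model, and a team of exactly 15 is treated as "15+".
    """
    if team_size < 2:
        raise ValueError("the sizing table starts at 2 people")
    if team_size <= 5:
        return "Informal coverage, designated leaders"
    if team_size <= 10:
        return "SME structure, optional simple rotation"
    if team_size < 15:
        return "Simple rotation with SMEs"
    return "First Responder program with parallel schedules"


print(recommended_model(12))  # Simple rotation with SMEs
```

Team size is only a starting point: a small team with SLAs or high-value assets may still need more structure than this lookup suggests.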
See Incident-Response-Policy for how these roles work during an actual incident.