This documentation covers parts of the PagerDuty Incident Response process. It is a cut-down version of our internal documentation, which is used at PagerDuty for any major incident and to prepare new employees for on-call responsibilities. It covers not only how to prepare for an incident, but also what to do during and after one. It is intended for on-call practitioners and those involved in an operational incident response process (or those wishing to enact a formal incident response process). See the about page for more information on what this documentation is and why it exists.
!!! tip "Don't know where to start?"
    If you're new to incident response and don't yet have a formal process in your organization, we recommend looking at our Getting Started page for a quick list of things you can do to begin, and our Training Course page for a more detailed overview of our process.
If you've never been on-call before, you might be wondering what it's all about. These pages describe what the expectations of being on-call are, along with some resources to help you.
- Being On-Call - A guide to being on-call, both what your responsibilities are, and what they are not.
- Alerting Principles - The principles we use to determine what things page an engineer, and what time of day they page.
Reading material for things you probably want to know before an incident occurs. You likely don't want to be reading these during an actual incident.
- What is an Incident? - Before we can talk about incident response, we need to define what an incident actually is.
- Severity Levels - Information on our severity level classification. What constitutes a SEV-3 vs SEV-1? What response do they get?
- Different Roles for Incidents - Information on the roles during an incident: Incident Commander, Scribe, etc.
- Incident Call Etiquette - Our etiquette guidelines for incident calls, before you find yourself in one.
- Complex Incidents - Our guide for handling larger, more complex incidents.
Information and processes during a major incident.
- During an Incident - Information on what to do during an incident, and how to constructively contribute.
- Security Incident Response - Security incidents are handled differently to normal operational incidents.
Our follow-up processes: how we make sure we don't repeat mistakes and are always improving.
- After an Incident - Information on what to do after an incident is resolved.
- Post-Mortem Process - Information on our post-mortem process: what's involved and how to write or run a post-mortem.
- Post-Mortem Template - The template we use for writing our post-mortems for major incidents.
- Effective Post-Mortems - A guide for writing effective post-mortems.
So, you want to learn about incident response? You've come to the right place.
- Training Overview - An overview of our training guides and additional training material from third parties.
- Glossary of Incident Response Terms - A collection of terms that you may hear being used, along with their definitions.
- Incident Commander Training - A guide to becoming our next Incident Commander.
- Deputy Training - How to be a deputy and back up the Incident Commander.
- Scribe Training - A guide to scribing.
- Subject Matter Expert Training - A guide on responsibilities and behavior for all participants in a major incident.
- Customer Liaison Training - A guide on how to act as our public representative during an incident.
- Internal Liaison Training - A guide on how to liaise with internal teams during an incident.
- Incident Response Training Course - An introductory course on incident response and the role of the Incident Commander.
Useful material and resources from external parties that are relevant to incident response.
- Reading - Recommended reading material relevant to incident response.
- ChatOps - Description of the chat bot commands we've referenced in this documentation.
- Anti-Patterns - A list of things we've tried and then rejected, so you can learn from our mistakes.