Incident Management
The process of identifying, responding to, and resolving IT incidents to restore normal service.
Incident Management is the end-to-end process of detecting, alerting on, triaging, resolving, and learning from IT incidents. A robust incident management process includes on-call rotations, escalation policies, runbooks, post-mortems, and status page communication. Modern incident management tools automate alerting, route incidents to the right responder, and provide collaboration features for distributed teams.
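As a rough illustration of how an escalation policy can route an unacknowledged incident to the right responder, here is a minimal Python sketch. The `EscalationLevel` class, the `current_responder` function, the responder names, and the wait times are all hypothetical and chosen for illustration; real incident management tools expose their own configuration formats.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EscalationLevel:
    responder: str    # hypothetical responder or rotation name
    wait: timedelta   # time this level has to acknowledge before escalation


def current_responder(levels, opened_at, now):
    """Return who should be paged for a still-unacknowledged incident at `now`."""
    elapsed = now - opened_at
    deadline = timedelta(0)
    for level in levels:
        deadline += level.wait
        if elapsed < deadline:
            return level.responder
    # Every level has timed out: keep paging the last level in the policy.
    return levels[-1].responder


# Assumed example policy: primary, then secondary, then a manager.
policy = [
    EscalationLevel("primary-on-call", timedelta(minutes=5)),
    EscalationLevel("secondary-on-call", timedelta(minutes=10)),
    EscalationLevel("engineering-manager", timedelta(minutes=15)),
]

opened = datetime(2024, 1, 1, 2, 0)
print(current_responder(policy, opened, opened + timedelta(minutes=7)))
```

In this sketch each level's wait time is added to the previous ones, so an incident that has gone 7 minutes without acknowledgement has passed the primary's 5-minute window and is routed to the secondary.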
Related Terms
MTTR (Mean Time to Recovery): The average time it takes to recover a system or service after a failure.
MTTD (Mean Time to Detect): The average time it takes to detect a problem or failure in a system.
On-Call: A rotation where engineers are available outside normal hours to respond to incidents.
Post-Mortem: A blameless analysis conducted after an incident to understand causes and prevent recurrence.
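The two averages defined above (time to recover and time to detect) can be computed from incident timestamps. The sketch below uses made-up incident records and a hypothetical `mean_minutes` helper, and it assumes both intervals are measured from the moment the failure started; some teams measure recovery from the moment of detection instead.

```python
from datetime import datetime
from statistics import mean

# Made-up incident records for illustration only.
incidents = [
    {"started": datetime(2024, 1, 1, 10, 0),
     "detected": datetime(2024, 1, 1, 10, 4),
     "resolved": datetime(2024, 1, 1, 10, 34)},
    {"started": datetime(2024, 1, 2, 14, 0),
     "detected": datetime(2024, 1, 2, 14, 10),
     "resolved": datetime(2024, 1, 2, 15, 0)},
    {"started": datetime(2024, 1, 3, 9, 0),
     "detected": datetime(2024, 1, 3, 9, 1),
     "resolved": datetime(2024, 1, 3, 9, 26)},
]


def mean_minutes(records, start_key, end_key):
    """Average interval between two timestamps across all records, in minutes."""
    return mean(
        (r[end_key] - r[start_key]).total_seconds() / 60 for r in records
    )


mttd = mean_minutes(incidents, "started", "detected")  # average time to detect
mttr = mean_minutes(incidents, "started", "resolved")  # average time to recover

print(f"Time to detect: {mttd:.1f} min, time to recover: {mttr:.1f} min")
```

With these sample records the detection intervals are 4, 10, and 1 minutes (average 5.0) and the recovery intervals are 34, 60, and 26 minutes (average 40.0).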