Runbook

A step-by-step guide for handling specific operational tasks or incidents.

By Niketa Sharma, Founder at Runframe·Last updated Mar 2026

The "Checklist"

A runbook is a recipe. It assumes the reader is smart but stressed, so it focuses on action.

Elements of a Good Runbook

  1. Triggers: "Use this when Alert X fires."
  2. Impact: "This issue causes 500 errors on checkout."
  3. Steps:
      1. Check Dashboard Y.
      2. If CPU > 90%, run command Z.
      3. If not, escalate to Database Team.
  4. Verification: "How do I know it's fixed?"
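The four elements can be captured as a small data structure, which makes a missing trigger or verification check obvious at review time. This is a minimal Python sketch, not a Runframe feature: the `Runbook` class, its field names, and the `triage` helper are hypothetical, and "Alert X", "Dashboard Y", "command Z", and the 90% threshold are the placeholders from the list above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Runbook:
    """Hypothetical structure mirroring the four elements of a good runbook."""
    trigger: str                 # when to use it, e.g. "Alert X fires"
    impact: str                  # what users experience while the issue lasts
    verify: Callable[[], bool]   # answers "How do I know it's fixed?"
    steps: List[str] = field(default_factory=list)


def triage(cpu_percent: float) -> str:
    """Hypothetical encoding of the branching step: act on high CPU, otherwise escalate."""
    if cpu_percent > 90:
        return "run command Z"
    return "escalate to Database Team"


checkout_runbook = Runbook(
    trigger="Alert X fires",
    impact="500 errors on checkout",
    verify=lambda: True,  # placeholder: swap in a real health check
    steps=[
        "Check Dashboard Y.",
        "If CPU > 90%, run command Z.",
        "If not, escalate to Database Team.",
    ],
)

print(triage(95))  # run command Z
print(triage(40))  # escalate to Database Team
```

Writing the branch as code forces the threshold to be explicit: "high CPU" becomes a number a stressed reader does not have to interpret.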

Runbook vs. Documentation

  • Docs: "Here is how the system works." (Read this on Tuesday morning).
  • Runbook: "Here is how to fix the system." (Read this at 3 AM on Saturday).

Example: The "Restart" Runbook

A complex microservice required a specific restart order (DB -> Cache -> App).

Impact
Engineers often guessed the order, corrupting data.
Resolution
A simple checklist runbook was created: "Step 1: Stop App. Step 2: Flush Cache. Step 3: Restart DB." Incidents became trivial.
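The fix in this example is that the order is written down rather than remembered. A minimal Python sketch of executing such a checklist follows, assuming each step is a hypothetical callable that returns `True` on success. The step names mirror the checklist above; the executor halts at the first failed step rather than continuing in an unknown state.

```python
from typing import Callable, List, Optional, Tuple

# A step is a human-readable name plus a hypothetical action returning True on success.
Step = Tuple[str, Callable[[], bool]]


def run_checklist(steps: List[Step]) -> Optional[str]:
    """Run steps strictly in order; return the first failed step's name, or None if all pass."""
    for name, action in steps:
        print(f"-> {name}")
        if not action():
            print(f"FAILED at: {name}. Stop and escalate.")
            return name
    return None


# Hypothetical stand-ins for the real commands; each just reports success here.
restart_steps: List[Step] = [
    ("Step 1: Stop App", lambda: True),
    ("Step 2: Flush Cache", lambda: True),
    ("Step 3: Restart DB", lambda: True),
]

run_checklist(restart_steps)
```

Because the order lives in data, it is reviewed once instead of being reconstructed by each engineer under pressure.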

Why Runbooks Matter

Runbooks reduce cognitive load during incidents: responders follow written steps instead of working out a fix live.

Good runbooks let any on-call engineer, not only the system's expert, resolve an incident, and they shorten time to resolution.

Common Pitfalls

Outdated Info
Runbooks must be "living" documents. If a runbook fails, update it immediately.
Assuming Knowledge
Include exact commands. Don't write "Connect to the server". Write the full command: "ssh [email protected]".
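Both pitfalls can be caught mechanically. The sketch below is a hypothetical lint in Python: the `Step` class, the `lint_runbook` function, and the 90-day staleness threshold are assumptions chosen for illustration, and `systemctl restart app` stands in for whatever exact command a real step would contain. Steps without a command are flagged for human review rather than rejected, since some steps, such as checking a dashboard, legitimately have none.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Step:
    instruction: str
    command: Optional[str] = None  # the exact command to paste, if the step has one


def lint_runbook(steps: List[Step], last_updated: date, today: date,
                 max_age_days: int = 90) -> List[str]:
    """Return warnings for stale content and for steps that lack an exact command."""
    warnings = []
    age = (today - last_updated).days
    if age > max_age_days:  # hypothetical threshold; tune per team
        warnings.append(f"stale: last updated {age} days ago")
    for i, step in enumerate(steps, start=1):
        if not step.command:
            warnings.append(f"step {i} has no exact command: {step.instruction!r}")
    return warnings


steps = [
    Step("Connect to the server"),  # vague: which server, which user?
    Step("Restart the service", command="systemctl restart app"),  # hypothetical service name
]
for warning in lint_runbook(steps, last_updated=date(2025, 1, 1), today=date(2025, 6, 1)):
    print(warning)
```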

How to Use Runbooks

  • Keep It Simple: Checklists work better than essays.
  • Update Often: Stale runbooks are worse than none.
  • Test During Game Days: Verify that runbooks actually work.
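A game day can be reduced to three phases: inject a fault, follow the runbook, then run its verification check. The Python sketch below assumes a hypothetical `FakeService` in place of a real staging system, and the `game_day` helper is likewise illustrative. It refuses to pass a drill whose fault injection broke nothing, since such a drill proves nothing about the runbook.

```python
from typing import Callable, List


class FakeService:
    """Hypothetical stand-in for a staging system used during a game day."""

    def __init__(self) -> None:
        self.healthy = True

    def break_it(self) -> None:
        self.healthy = False  # simulated fault

    def restart(self) -> bool:
        self.healthy = True   # simulated runbook action
        return True


def game_day(inject_fault: Callable[[], None],
             steps: List[Callable[[], bool]],
             verify: Callable[[], bool]) -> bool:
    """Inject a fault, follow the runbook steps in order, then run the verification check."""
    inject_fault()
    if verify():
        raise RuntimeError("fault injection broke nothing; the drill proves nothing")
    for step in steps:
        if not step():
            return False  # the runbook failed mid-way: fix it before the next real incident
    return verify()


svc = FakeService()
print(game_day(svc.break_it, [svc.restart], lambda: svc.healthy))  # True
```

A `False` result is the useful outcome of a drill: it exposes a runbook that no longer matches the system before a real incident does.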
