One of the things I pay close attention to in my organization is toil.
Toil refers to manual operational work that doesn’t provide leverage to the organization. It’s a drain on resources that could be eliminated with automation or changes in organizational processes.
We all know these routines: manually cleaning old data from a database, copying data between systems, scanning dashboards or logs for specific metrics, supervising workflows, or deploying systems by hand. The list goes on.
How is toil different from tech debt?
Toil is about repetitive operational work. In contrast, tech debt arises from limitations in software design or implementation that need to be refactored or reworked later.
Why do I monitor toil?
- It’s a waste of resources: Time spent on toil could be redirected to valuable, impactful initiatives.
- It leads to burnout: Engineers, like most people, want to spend their time creating value, not repeating tasks that could be automated.
- It creates interruptions and context switches: Toil breaks focus and disrupts productivity.
Here are some tips on how to control and reduce toil:
- Identify and score manual tasks: Evaluate tasks by frequency, effort, and impact.
- Analyze processes: Determine if the task can be eliminated by changing an organizational process.
- Delegate to self-service: Educate teams to handle common tasks independently, e.g., teaching SQL or building better internal tools.
- Automate (or semi-automate): Use scripts, tools, or workflows to reduce manual effort.
- Proactive and smart alerting: Implement alerts based on health (not just liveness), anomalies, pattern detection, and series correlation.
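To make the first tip concrete, here is a minimal sketch of how tasks could be scored by frequency, effort, and impact. The formula (hours lost per month, weighted by a 1-5 impact rating), the names `ManualTask`, `toil_score`, and `rank_by_toil`, and the sample numbers are all assumptions for illustration; adjust the weighting to whatever matters most in your organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManualTask:
    """A recurring manual task (hypothetical structure for illustration)."""
    name: str
    runs_per_month: int   # frequency
    minutes_per_run: int  # effort
    impact: int           # assumed scale: 1 (minor) to 5 (blocks people or risks outages)


def monthly_hours(task: ManualTask) -> float:
    """Hours spent on the task each month."""
    return task.runs_per_month * task.minutes_per_run / 60


def toil_score(task: ManualTask) -> float:
    """Assumed formula: monthly hours weighted by the impact rating."""
    return monthly_hours(task) * task.impact


def rank_by_toil(tasks):
    """Return tasks sorted from highest to lowest toil score."""
    return sorted(tasks, key=toil_score, reverse=True)


if __name__ == "__main__":
    # Made-up example values, not measurements.
    tasks = [
        ManualTask("purge old rows from database", 4, 30, 2),
        ManualTask("manual production deploy", 20, 45, 4),
        ManualTask("copy report data between systems", 8, 15, 1),
    ]
    for task in rank_by_toil(tasks):
        print(f"{task.name}: {toil_score(task):.1f}")
```

The tasks at the top of the ranking are the first candidates for elimination, self-service, or automation.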
Toil may seem small on the surface, but its impact compounds. By addressing it, you free up your team to focus on work that truly moves the needle.
