No matter where we work, we will experience failures, or ‘undesirable outcomes’, of some kind; as long as we work with other humans, this will be the case. These failures may manifest as production delays, injuries, customer complaints, missed deadlines, lost profits, legal claims, and the like.
To prevent the recurrence of such undesirable outcomes, we must fully understand the causes that led to them. In many of our worlds, the process for analyzing and understanding what went wrong is called Root Cause Analysis (RCA). However, for the sake of this article, call this process whatever you want: problem solving, brainstorming, troubleshooting, etc. The common denominator among these terms is that they aim to address a failure and prevent its recurrence.
Call it whatever you want – if it doesn’t explain why the failure made sense at the time, it isn’t real RCA.
Let’s get away from labels and specific industries and focus on the anatomy of a ‘failure’. Where does a failure come from? Consider this wherever you work and see if it applies.

Where Failures Really Begin: Management Systems
The seeds of failure are planted in what we will call our management (or organizational) systems. These are the rules and guidelines by which our organizations operate, much like the laws that govern how our countries function. Since these systems are created and maintained by humans, they are not flawless.
They can be insufficient, inadequate, incorrect, or even non-existent (in unforeseen situations). We refer to these management-system flaws as Latent Root Causes, because they are always present, lying dormant until a human activates them.
When our management systems are flawed, we feed incomplete information to people who must process it to make their decisions. Ultimately, this will likely result in an inappropriate decision! We refer to these ‘decision errors’ as Human Root Causes.
When humans make a wrong decision, it is expressed in one of two ways: 1) errors of commission or 2) errors of omission. This means we either took an inappropriate action (error of commission) or failed to take an appropriate action when we should have (error of omission).
Every human error is either an action taken that shouldn’t have been – or an action not taken that should have been.
Examples of both abound. An error of commission might be closing a valve in a manufacturing operation that should have been left open. An error of omission might be an ER nurse failing to recognize a patient’s true acuity during triage; as a result, the patient dies waiting for care in the waiting room.
Human decision errors often produce observable consequences, but not right away: at first, the error chain is not yet apparent because it exists only in the decision-maker’s mind. Only when the decision is acted upon do the consequences become observable. We will refer to these consequences as Physical Root Causes.
Let’s follow through with the examples used earlier. A manufacturing plant operator closes a valve that would have supplied water to cool an overheated process. As a result, the overheating triggers an unexpected trip that automatically shuts down the entire operation.
In the hospital emergency room, the improperly triaged patient flatlines, triggering a Code Blue and requiring a rapid response team to attend to the patient. An underlying condition had gone undetected during the initial triage assessment; the patient suffered a stroke and passed away.
In both scenarios, the consequences of the decisions became apparent.
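If it helps to see this chain as a model, the sketch below expresses it as a small data structure. This is purely illustrative: the class names, and the hypothetical procedure flaw attached to the valve example, are my own, not part of any formal RCA methodology.

```python
from dataclasses import dataclass, field
from enum import Enum


class ErrorMode(Enum):
    """The two ways a human decision error is expressed."""
    COMMISSION = "took an inappropriate action"
    OMISSION = "failed to take an appropriate action"


@dataclass
class LatentRootCause:
    """A management-system flaw lying dormant: insufficient,
    inadequate, incorrect, or non-existent guidance."""
    system: str   # which rule or guideline is flawed
    flaw: str     # how it falls short


@dataclass
class HumanRootCause:
    """A decision error made on the incomplete information
    the flawed system supplied."""
    decision: str
    mode: ErrorMode
    fed_by: list[LatentRootCause] = field(default_factory=list)


@dataclass
class PhysicalRootCause:
    """The observable consequence once the decision is acted on."""
    consequence: str
    triggered_by: HumanRootCause


# The manufacturing example, expressed in the model. The procedure
# flaw named here is hypothetical, added only to complete the chain.
valve_error = HumanRootCause(
    decision="closed the cooling-water valve",
    mode=ErrorMode.COMMISSION,
    fed_by=[LatentRootCause(
        system="valve line-up procedure",
        flaw="did not identify the valve as critical to cooling",
    )],
)
shutdown = PhysicalRootCause(
    consequence="overheated process trips and the operation shuts down",
    triggered_by=valve_error,
)
```

Read bottom-up, the chain mirrors how an RCA should proceed: from the observable consequence, back through the decision, to the dormant system flaw that fed it.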
RCA Effectiveness Starts With Looking in the Mirror
Now that we understand where failures originate and how the error chain grows, how can we make our RCA processes more effective? Why do we often conduct RCAs on the same events repeatedly? Are we not learning from the past? Is it that our RCAs just aren’t that good?
As an RCA practitioner for over 40 years across various industry sectors, my observation is that we have difficulty looking in the mirror and accepting that we could be part of the problem!
Many organizations seem content with their RCA processes as long as their analyses pass regulatory audits, because that keeps the regulators off their backs.
However, passing an audit is a misleading measure of RCA effectiveness. RCA effectiveness should be measured using quantifiable, meaningful bottom-line metrics that align with corporate dashboards or KPIs.

In our hospital scenario, does passing an RCA audit or survey make the patient any safer? Almost all 6,000 hospitals in the U.S. are accredited, yet deaths due to medical error continue to rise, to the point that medical error is the third-leading killer of Americans today at over 1,000 deaths per day.
The key to RCA effectiveness is facing the truth, and, unfortunately, we are not very good at accepting it when it involves ourselves.
Why Blame-Based RCA Guarantees Repeat Failures
The ‘truth’ is embedded in the management systems we spoke about earlier. Oftentimes, we focus on the decision-makers and then levy discipline for making a poor decision. However, RCA is not about ‘who’ made the poor decision. We are more interested in why the person believed their decision was appropriate at the time. In my opinion, this is what RCA is all about!
When we examine decision-makers’ reasoning, most of the time their rationales are perfectly logical. Their decisions are most often well-intended. And more importantly, others would likely make the same decision given the same information.
If a decision made sense at the time, the failure lives in the system – not the person.
When we delve this deep, we return to the flawed management systems that supplied these people their information. These systems are designed to help our people make better decisions; when they are flawed, they cannot perform as intended.
Let us reflect on the hospital scenario described earlier. A patient presents to their local ER and is assessed by a nurse, PA, or MD. Those conducting the triage certainly did not intend for the patient to be harmed while waiting for care. What could have led them to believe that this particular patient could wait, given the acuity of the other patients in the ER? Here are just a couple of possibilities:
- Inexperienced person conducting triage.
- ER was overloaded, and the staff were time-pressured and understaffed.
From a management-system standpoint, if the above existed, we would need to examine more closely the systems that enabled those conditions.
- Why would we have an inexperienced person conducting triage in the ER?
  - The person scheduled to do triage was pulled away by another emergency (either at the hospital or a family emergency), so someone available from another department was pulled in.
  - The person was a new hire and new to the position.
- Why would we be understaffed when the ER was overloaded?
  - We did not anticipate the overload.
  - We did not have a plan in place to activate under such conditions.
  - We had a plan in place to handle the overload, but we did not follow it.
  - We had a plan in place and followed it, but it was obsolete: it had not been updated since new technologies were added and the ER was expanded.
Indeed, this is not a comprehensive listing, but it makes the point. This is where the mirror becomes relevant.
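Notice that this drill-down is naturally a tree: each ‘why’ question branches into the conditions that could answer it, and the leaves are the candidate latent root causes. Here is a minimal sketch of that structure using the ER example; the wording is my own condensation of the list above.

```python
from dataclasses import dataclass, field


@dataclass
class WhyNode:
    """One hypothesis in the drill-down; children answer
    'why would that be true?' one level deeper."""
    hypothesis: str
    children: list["WhyNode"] = field(default_factory=list)


triage_tree = WhyNode("patient improperly triaged", [
    WhyNode("inexperienced person conducting triage", [
        WhyNode("scheduled triage person pulled away by another emergency"),
        WhyNode("person was a new hire, new to the position"),
    ]),
    WhyNode("ER overloaded; staff time-pressured and understaffed", [
        WhyNode("the overload was not anticipated"),
        WhyNode("no plan existed for overload conditions"),
        WhyNode("a plan existed but was not followed"),
        WhyNode("the plan was followed but was obsolete after the ER expansion"),
    ]),
])


def leaves(node: WhyNode) -> list[str]:
    """Return the deepest hypotheses: the candidate latent root causes."""
    if not node.children:
        return [node.hypothesis]
    return [h for child in node.children for h in leaves(child)]


print(leaves(triage_tree))  # the system-level conditions to verify
```

The structure makes one point: an RCA is not finished at the first branch. It keeps asking ‘why’ until it reaches conditions the management system is responsible for.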
The Mirror Test: What Effective RCA Actually Requires

What if we were the person who:
- Allowed the inexperienced triage person to work in that capacity, because things were hectic and confusing at the time?
- Did not update the procedure for handling an overload condition when the ER was revised and expanded?
- Did not follow the procedure for an overload condition?
- Trained the person conducting the triage and cleared them before they were ready?
These are the sensitive issues that a true RCA would seek to understand and uncover. This is the hard part of RCA: uncovering the truth. This is where most RCAs lack depth, and people often avoid addressing these sensitive, yet necessary issues.
Think about it: if we choose to ignore these deeper issues (because it is easier and more comfortable to do so), the ‘seeds’ of failure are still embedded in our systems. This means another party will activate them at a later time, and the patient or the operation will be at risk again.
For RCAs to be genuinely effective, we must look in the mirror and face the possibility that we may have unintentionally contributed to the adverse outcome; that is the only way we will make progress. This type of openness and non-punitive environment is a key principle of a High Reliability Organization (HRO).
Remember, “We NEVER seem to have the time and budget to do things right, but we ALWAYS seem to have the time and budget to do them again!”