Solving Lubrication Degradation Starts with Asking the Right Why

by Sanya Mathura and Bob Latino

Can Oil Fail?

Within the industry, there has long been a great debate over the claim that it is not the oil that fails, but rather the machine. When we consider the failure of a component, the immediate thought is that it can no longer perform its specified function under normal operating conditions.

If we apply this principle to the argument above, we can deduce that both the machine and the lubricant have failed. Although they can be classified as separate components, they are integral to each other in performing their respective functions.

Oil doesn’t fail in isolation – it fails because the system made it fail.

In principle, any component can fail if it is subjected to conditions outside its operating envelope. In this particular case, if a lubricant fails under certain conditions, we must investigate the source of those conditions. By deductive reasoning, that source can be the immediate environment or, in this situation, the machine.

We must now investigate what caused the machine to exceed its operating limits. Hence, the great argument should perhaps be rephrased to state, “Oil can fail when its conditions induce such a failure.”

Excerpted from Lubrication Degradation: Getting to the Root Causes by Sanya Mathura and Robert J. Latino.

Using RCA to Analyze How and Why Oil Can Fail

In this section, we will explore the fundamentals of constructing a logic tree to analyze how and why oil can fail. First, let’s define some terms so we’re all on the same page when interpreting some commonly confusing words.

Cause-and-Effect Logic: From level to level, the logic tree represents a cause-and-effect relationship. This does not have to be a linear relationship, as there may be (and likely are) multiple causes that must occur simultaneously to create a single effect. The tree is simply a graphical representation of logic that reflects the facts of what happened and ultimately led to an undesirable outcome.

Event: Typically, this is the reason you care! What brought this incident to your attention? Many believe that we do RCA on incidents themselves, but we think we do RCA on their consequences. Consider your situation; there is usually a business-level reason why we conduct an RCA, such as an injury or fatality, production loss exceeding a certain amount, maintenance cost exceeding a certain amount, a regulatory violation, and the like (often referred to as triggers). Sound familiar? These are the known FACTS.

Failure Mode: These are the issues that typically initiate a Root Cause Analysis (RCA), such as pump failure, injury, loss of production, or environmental excursion. These physical failures led to the Event. They are anomalies that must be explained using evidence when reconstructing the Event using RCA. These, too, MUST BE FACTS.

Hypotheses: Just like in high school, these are ‘educated guesses.’ They are potential causes of the preceding node. When exploring equipment-related failures, the initial question after identifying the Failure Mode is, ‘How could the preceding node have occurred?’

Verifications: These are the methods by which we proved, with sound evidence, that Hypotheses were either True or Not True. Fun fact…hearsay is NOT a valid verification technique!

Physical Root Causes: These are where the physics of failure are identified. These are observable, tangible things we can see. These are usually the immediate consequences of decision errors (Human Roots).

Human Root Causes: THIS IS NOT ‘THE WHO DUNNIT’! This is the act of decision-making. These are usually errors of omission and/or commission. We did something we were not supposed to do, or we were supposed to do something and didn’t. The key here is not to blame indiscriminately and to take the opportunity to understand human reasoning.

To facilitate this thought process, when we encounter a human choice that was made, we switch our questioning from ‘How Could?’ to ‘Why?’ We are now interested in ‘why’ the decision at the time made sense (and it usually made complete sense given the working environment).

Latent/Systemic Root Causes: These are the organizational systems, cultural norms, and sociotechnical factors that influence and contribute to our decision-making. Unfortunately, our ‘systems’ are far from perfect and are always a ‘work in progress.’ They include, but are not limited to, our policies, procedures, training systems, purchasing systems, HR systems, incentive systems, compliance systems, and the like.

Contributing Factors: Identification of conditions that did not directly lead to the failure but created vulnerabilities allowing the failure to occur. These are usually conditions that we don’t have control over, but we can often compensate for them (if we are aware of them). For instance, some failures may only occur when it’s freezing outside. This is a condition that we cannot change, but we can compensate for it to mitigate its potential consequences.
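The terms above map naturally onto a small tree data structure. The following is a minimal sketch in Python, not the authors' tooling: the class names (`NodeKind`, `Node`), the `verified` and `evidence` fields, and the placeholder evidence strings are all assumptions made for illustration. It encodes one rule from the definitions, namely that the Event and Failure Mode must be known facts.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class NodeKind(Enum):
    EVENT = "Event"
    FAILURE_MODE = "Failure Mode"
    HYPOTHESIS = "Hypothesis"
    PHYSICAL_ROOT = "Physical Root Cause"
    HUMAN_ROOT = "Human Root Cause"
    LATENT_ROOT = "Latent/Systemic Root Cause"
    CONTRIBUTING_FACTOR = "Contributing Factor"


# Node kinds that the definitions say must be known facts.
MUST_BE_FACT = {NodeKind.EVENT, NodeKind.FAILURE_MODE}


@dataclass
class Node:
    label: str
    kind: NodeKind
    verified: Optional[bool] = None  # True, False (Not True), or None (not yet verified)
    evidence: str = ""               # how the node was verified (hearsay does not count)
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self):
        # Reject an Event or Failure Mode that is not backed by verified fact.
        if self.kind in MUST_BE_FACT and self.verified is not True:
            raise ValueError(f"{self.kind.value} must be a verified fact: {self.label!r}")

    def add(self, child: "Node") -> "Node":
        """Attach a cause one level below this node and return it."""
        self.children.append(child)
        return child


# Hypothetical usage; the evidence strings are placeholders, not real findings.
event = Node("Business impact on the operation", NodeKind.EVENT,
             verified=True, evidence="placeholder: production records")
mode = event.add(Node("Crucial pump bearing failed", NodeKind.FAILURE_MODE,
                      verified=True, evidence="placeholder: inspection report"))
mode.add(Node("Lubrication degradation", NodeKind.HYPOTHESIS))
```

Hypotheses start as unverified (`None`) and only become True or Not True once a verification has been recorded against them.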

Applying RCA Basics to the Six (6) Lubrication Degradation Mechanisms (LDM)

In our previously mentioned book, Lubrication Degradation: Getting to the Root Causes, we cover the fundamentals of Oil and RCA and then apply those principles to the six (6) primary LDMs. These are as follows:

Micro dieseling
Oxidation
Additive Depletion
Electrostatic Spark Discharge (ESD)
Contamination
Thermal Degradation

For example, we will explore a portion of a Logic Tree for the micro-dieseling mechanism. Let’s walk through the reconstruction steps.

Figure 1. Micro Dieseling Logic Tree (Part 1)

The Event, in this case, is defined as the business impact on the operation. The Mode in this example is the factual functional failure of the process equipment involved that led to the Event. We ask HOW COULD that mode have occurred?

In this example, a crucial pump bearing failed due to lubrication degradation. This resulted from micro-dieseling, which was caused by entrained air moving through different pressure zones. Please note that each node should be supported by sound evidence to continue drilling down.

Figure 2. Micro Dieseling Logic Tree (Part 2)

As we let the evidence direct us to where we should drill down, we are at the point where we’re asking, ‘How could entrained air move through different pressure zones?’ Our team of SMEs provides us with three hypotheses:

Air Leaks into System
Introduction of Air Within the Closed System, and
Varying Pressure Zones

Our evidence collected (in this mock example) validates that we have introduced air within our closed system. That’s the beauty of holistic RCA: when properly facilitated, it directs us to the next question. In this case, ‘How could we have introduced air into a closed system?’

Our SMEs shine once again and hypothesize either 1) churning is occurring as lubricant re-enters the sump and/or 2) air is trapped in the system.

Our evidence confirms we have air trapped in the system. At this point, note that the three dots under the various nodes represent additional hypotheses that have been explored.

However, if any node is NOT TRUE, there is no need to drill down further. It is important to keep that information on the logic tree because this will become a troubleshooting flow diagram for future analysts (lessons learned).

So, what may not be true in your RCA may be true in theirs, and they can pick up from where you left off. We aim to prevent RCA rework and capitalize on the collective learning of all analysts across the company.
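The idea of keeping disproven hypotheses on the tree, while drilling down only beneath verified ones, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the `Hypothesis` class and the `outline` and `frontier` helpers are hypothetical names, and the NOT TRUE statuses assigned to the alternative hypotheses are assumed for the mock example (the text above only states which hypotheses the evidence confirmed).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    label: str
    verified: Optional[bool] = None  # True, False (Not True), or None (unverified)
    children: List["Hypothesis"] = field(default_factory=list)


def outline(node: Hypothesis, depth: int = 0) -> List[str]:
    """List every node, disproven ones included, as an indented outline."""
    status = {True: "TRUE", False: "NOT TRUE", None: "UNVERIFIED"}[node.verified]
    lines = ["  " * depth + f"{node.label} [{status}]"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


def frontier(node: Hypothesis) -> List[str]:
    """Return verified nodes with no verified cause beneath them.

    These are the points where the next 'How could?' question is asked.
    Nodes that are Not True or unverified are never drilled into.
    """
    if node.verified is not True:
        return []
    deeper = [label for child in node.children for label in frontier(child)]
    return deeper or [node.label]


# Mock example; the False statuses are assumptions for illustration only.
tree = Hypothesis("Entrained air moving through different pressure zones", True, [
    Hypothesis("Air leaks into system", False),
    Hypothesis("Introduction of air within the closed system", True, [
        Hypothesis("Churning as lubricant re-enters the sump", False),
        Hypothesis("Air trapped in the system", True),
    ]),
    Hypothesis("Varying pressure zones", False),
])

print("\n".join(outline(tree)))
print("Next question at:", frontier(tree))
```

The outline retains all six nodes, so a future analyst can see which paths were already ruled out, while `frontier` points only at the verified node where questioning should continue.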

OK, let’s wrap this up and put a bow on it.

Figure 3. Micro Dieseling Logic Tree (Part 3)

‘How can air be trapped in the system?’ Our experts, once again, come to the rescue and suggest that 1) lines have not been bled before use, and/or 2) intake lines are positioned too high relative to the adequate sump level and introduce air into the system.

For the sake of our example, we find that both possibilities are true. Notice now that these two hypotheses have been labeled Human Roots (and turned orange). This is because they involve human choices that were made. As a result of these choices, air was trapped in the system, which is a Physical Root (purple node) because it is the first observable consequence of the decisions.

Now that we have reached human choices, we will, as promised, switch our questioning from ‘How Could’ to ‘Why.’ We aim to understand why the decision made sense to the decision-makers at the time and in that particular environment.
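The switch in questioning can be expressed as a simple rule keyed on the type of node being examined. The sketch below is illustrative only; `NodeKind` and `next_question` are hypothetical names, not part of any RCA software.

```python
from enum import Enum


class NodeKind(Enum):
    HYPOTHESIS = "hypothesis"
    PHYSICAL_ROOT = "physical root"
    HUMAN_ROOT = "human root"


def next_question(kind: NodeKind, statement: str) -> str:
    """Phrase the next drill-down question for a verified node."""
    if kind is NodeKind.HUMAN_ROOT:
        # A human choice was made: ask why it made sense at the time.
        return f"Why {statement}?"
    # Physical nodes: ask how the condition could have occurred.
    return f"How could {statement}?"


print(next_question(NodeKind.PHYSICAL_ROOT, "air be trapped in the system"))
print(next_question(NodeKind.HUMAN_ROOT, "were the lines not bled before use"))
```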

Figure 4. Micro Dieseling Logic Tree (Part 4)

Let’s start with the first orange node and ask, ‘Why were the lines not bled before use?’ This is where we delve into the sensitive human issues. Here is where we apply our knowledge of the social sciences (the human aspects) and tread lightly, as we don’t want to accuse or blame anyone.

We learn through our interviews and validations that two organizational or systemic flaws (red nodes) influenced the decision-makers: 1) Lack of experience/training in the proper procedure for this equipment, AND 2) Less Than Adequate (LTA) procedures for bleeding lines.

To conclude this leg of the logic tree, we must now identify our Corrective Actions (green nodes) and ensure they are completed and effective.

Let’s finish the last Human Root and ask, ‘Why were the intake lines positioned too high in relation to the adequate sump level, introducing air into the system?’ Our systemic root here is that the intake line/system has an LTA design, and a redesign is necessary.
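One way to keep track of whether each systemic root has a corrective action that is both completed and effective is sketched below. This is an assumption-laden illustration: the `CorrectiveAction` and `LatentRoot` classes, the `open_roots` helper, and the status flags are hypothetical. Only the redesign action comes from the example above; the other two roots are deliberately left without actions because the text does not name them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CorrectiveAction:
    description: str
    completed: bool = False
    effective: bool = False


@dataclass
class LatentRoot:
    label: str
    actions: List[CorrectiveAction] = field(default_factory=list)

    def is_closed(self) -> bool:
        """Closed only if at least one action exists and all are completed and effective."""
        return bool(self.actions) and all(a.completed and a.effective for a in self.actions)


def open_roots(roots: List[LatentRoot]) -> List[str]:
    """Labels of systemic roots that still need attention."""
    return [r.label for r in roots if not r.is_closed()]


redesign = CorrectiveAction("Redesign intake line/system")
roots = [
    LatentRoot("Lack of experience/training in the proper procedure"),
    LatentRoot("LTA procedures for bleeding lines"),
    LatentRoot("LTA design of intake line/system", [redesign]),
]

print(open_roots(roots))   # all three roots are still open

# Hypothetical follow-up: the redesign is done and verified to work.
redesign.completed = True
redesign.effective = True
print(open_roots(roots))   # only the two roots without actions remain
```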

Applying Your New Logic Tree Knowledge to the Other Lubrication Degradation Mechanisms

Now that we have learned how to construct a logic tree on an LDM Event, we should be able to read other logic trees like a blueprint and/or a troubleshooting flow diagram.

If you are interested in receiving the full-blown logic trees from our referenced book, please CONTACT US, and in the ADDITIONAL COMMENTS field, enter ‘LDM LOGIC TREES.’ That’ll be our cue to email them to you.

Authors

Sanya Mathura

Sanya Mathura is the Founder, Managing Director, and Senior Consultant at Strategic Reliability Solutions Ltd in Trinidad & Tobago. She specializes in reliability and asset management and works with global affiliates. Sanya holds a BSc in Electrical and Computer Engineering and an MSc in Engineering Asset Management and is the first ICML-certified Machinery Lubrication Engineer (MLE) in the Caribbean. She was also the first woman globally to earn the ICML Varnish badges (VIM & VPR) and Mobius FL CAT I certification. Sanya is the only MLE registered by the Board of Engineering Trinidad & Tobago. She serves on the Editorial Board of Precision Lubrication Magazine, is a digital editor for STLE’s TLT Magazine, and is a columnist for Equipment Today. Additionally, she is on the Lubricant Expo North America board and an external steward of UWI's Equality, Diversity & Inclusion Mainstreaming Committee. She is the author or co-author of six books, including Lubrication Degradation Mechanisms: A Complete Guide; Lubrication Degradation: Getting to the Root Causes; Machinery Lubrication Technician (MLT) I & II Certification Exam Guide; and Preventing Turbomachinery ‘Cholesterol’ – The Story of Varnish. She was appointed Series Editor of the series including Empowering Women in STEM, Empowering Women in STEM – Personal Stories and Career Journeys from Around the World, and Empowering Women in STEM – Working Together to Inspire the Future.
Bob Latino

Principal of Prelical Solutions, LLC and former CEO of Reliability Center, Inc. (RCI), Bob has 38+ years of global experience in Root Cause Analysis (RCA). He’s trained over 10,000 professionals in 25+ countries and co-authored ten books on RCA, FMEA, and Reliability. Bob serves on the Board of the Community of Human and Organizational Learning (CHOLearning), and is Series Editor for CRC Press’s “Reliability, Maintenance, and Safety Engineering” series.