How an RCM Review Exposed Gaps in a Cooling Water Pump Program

by Farshad Bakhshi | Articles, Bearings, Maintenance and Reliability, Pumps

When More Maintenance Activity Does Not Mean Better Failure Control

In many organizations, maintenance dashboards and KPIs show high PM-to-EM ratios, more planned than unplanned shutdowns, and many other attractive indicators. But do these numbers necessarily mean that equipment reliability is improving?

This case study shows that more maintenance activity does not always lead to better reliability.

The equipment evaluated in this project was a cooling water pump operating in a refinery utility system. From an operational perspective, the pump was in service, in acceptable condition, and supported by existing Preventive Maintenance (PM) and Condition-Based Maintenance (CBM) programs. Maintenance activities were properly recorded in the CMMS, and the maintenance history was well documented. From the organization’s point of view, the equipment was therefore considered to be operating under acceptable and controlled conditions.

However, after performing an RCM review of the equipment, it became clear that many of the existing maintenance activities had not been developed directly to control the pump’s actual failure modes.

Why This RCM Review Was Performed

An important point in this case study is that the RCM review was not triggered by a major breakdown or a failure investigation. The maintenance team was already reviewing critical refinery assets as part of a reliability improvement program, and this cooling water pump was selected because of its operational importance in the utility system.

The review focused on whether the existing maintenance programs were actually aligned with the equipment failure modes, operating conditions, and risk level.

In other words, the main question was not:

“Why did the equipment fail?”

The real question was:

“Are the current maintenance programs and activities really preventing the critical failure modes of the equipment?”

This difference shifted the review from a reactive maintenance mindset to a proactive engineering decision-making process.

Starting the RCM Review: From Equipment Review to Failure Mode Analysis

The RCM3 review for this equipment was performed in accordance with the principles of SAE JA1011, the NAVAIR RCM guidelines, and ISO 14224. In the first step, the operating conditions, equipment boundary definition, technical documents, CMMS history, and refinery environmental conditions were reviewed.

The equipment review showed that the pump:

was installed outdoors,
was exposed to environmental dust,
operated under changing temperature and humidity conditions,
and operated in intermittent service.

In addition, in accordance with ISO 14224 principles, the equipment boundary was clearly defined, and items such as the electric motor were placed outside the analysis boundary for this pump.

Next, the equipment functions, failure modes, failure effects, and failure consequences were identified using the principles and classification structure of ISO 14224 and OREDA, so that the analysis stayed aligned with industry standards and best practices.

Identifying the Main Issue: 73% Ineffective Work Orders

During the RCM project and the review of CMMS history from the previous four years, it became clear that many of the existing maintenance activities had little impact on controlling or reducing equipment failure modes.

The analysis indicated that about 73% of the work orders issued for this equipment were not contributing effectively to failure control.

This did not mean that the maintenance activities had no value. However, it showed that many activities:

had been defined as repetitive routine tasks,
did not have a direct connection to a specific failure mode,
were performed mainly based on organizational habits,
or were generated mainly to complete PM routines inside the CMMS.

In practice, the organization was performing “maintenance activities,” but not necessarily “failure control.”

This became one of the most important observations during the RCM review.
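
One way to arrive at a figure of this kind from a CMMS export is to tag each work order with the failure mode it is meant to control, if any, and then count the work orders with no such link. The sketch below is only an illustration, not the method used in this project. The field names (`wo_id`, `failure_mode`) and the sample records are hypothetical, and a real review would also judge whether a linked task is technically effective, not just whether a link exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkOrder:
    wo_id: str                     # hypothetical CMMS work order number
    description: str
    failure_mode: Optional[str]    # failure mode the task is meant to control, if any


def unlinked_share(work_orders: list[WorkOrder]) -> float:
    """Return the fraction of work orders with no linked failure mode."""
    if not work_orders:
        return 0.0
    unlinked = sum(1 for wo in work_orders if not wo.failure_mode)
    return unlinked / len(work_orders)


# Illustrative records only; these are not data from the case study.
history = [
    WorkOrder("WO-001", "Routine visual check", None),
    WorkOrder("WO-002", "Vibration measurement on pump bearings", "bearing wear"),
    WorkOrder("WO-003", "General cleaning", None),
    WorkOrder("WO-004", "Monthly PM checklist", None),
]

print(f"{unlinked_share(history):.0%} of work orders have no linked failure mode")
```

A count like this only flags candidates for review. Deciding whether a task actually controls its failure mode remains an engineering judgment.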

Why RCA Was Not Needed in This Case

The maintenance history showed that the equipment failures were already known to the maintenance team. The team had worked with this pump for many years, and the main failure modes had been identified through previous operating and maintenance experience.

Maintenance effectiveness depends on choosing the right strategy for each failure mode.

The issue was not missing failure information or lack of RCA data.

The real challenge was the quality of maintenance decisions.

For example:

Which failure modes actually require PM?
Which failure modes should be controlled through CBM?
Which failure modes are not economically justified for PM?
And which failure modes can be managed under a Run-to-Failure strategy?

For this reason, the project’s focus shifted from analyzing failures after they occurred to selecting the appropriate maintenance strategy for the equipment.

Example: Bearing Failure Mode

One of the identified failure modes for this equipment was gradual bearing damage, which could increase vibration levels and reduce pump reliability over time.

In many maintenance programs, fixed periodic PM activities are typically defined for this failure mode. However, the RCM analysis showed that:

this failure mode did not have a direct safety or environmental consequence,
it could be detected before functional failure occurred,
and calendar-based PM was not considered cost-effective for this failure mode.

Based on this analysis, instead of defining a time-based PM task, the decision was made to manage this failure mode through Condition Monitoring. Run-to-Failure was accepted only as the fallback for cases where early failure indications were not detected in time.
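
The reasoning applied to the bearing failure mode can be written as a small decision function. This is a heavily simplified sketch, not the full RCM3 or SAE JA1011 decision diagram. The attribute names, the strategy labels, and the order of the checks are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    CONDITION_MONITORING = "Condition Monitoring (CBM)"
    TIME_BASED_PM = "Time-based PM"
    RUN_TO_FAILURE = "Run-to-Failure"
    REDESIGN_OR_REVIEW = "Redesign or further review"


@dataclass
class FailureMode:
    name: str
    safety_or_env_consequence: bool      # does the failure affect safety or the environment?
    detectable_before_failure: bool      # is there a usable warning before functional failure?
    time_based_pm_cost_effective: bool   # is a calendar-based task worth its cost?


def select_strategy(fm: FailureMode) -> Strategy:
    """Simplified strategy selection (assumed check order, not a standard's diagram)."""
    if fm.detectable_before_failure:
        return Strategy.CONDITION_MONITORING
    if fm.time_based_pm_cost_effective:
        return Strategy.TIME_BASED_PM
    if fm.safety_or_env_consequence:
        # No proactive task applies and the failure cannot be tolerated.
        return Strategy.REDESIGN_OR_REVIEW
    return Strategy.RUN_TO_FAILURE


# The bearing failure mode as described in the text above.
bearing = FailureMode(
    name="gradual bearing damage",
    safety_or_env_consequence=False,
    detectable_before_failure=True,
    time_based_pm_cost_effective=False,
)

print(bearing.name, "->", select_strategy(bearing).value)
```

With the attributes described for the bearing failure mode, the function selects Condition Monitoring, which matches the decision taken in the review.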

This decision is a clear example of the difference between:

“Performing more maintenance activities”

and

“Making better maintenance decisions.”

Effective Maintenance Is Not Created by More PM Tasks

This project confirmed that a high level of maintenance activity does not always lead to better failure control.

In many organizations, large numbers of work orders, high PM activity, and green KPI indicators can create the impression that equipment risks are under control, even though some of these activities may not effectively manage equipment failure modes.

The purpose of RCM is not to increase maintenance activity. The purpose is to align maintenance decisions with:

the actual failure modes,
equipment risk level,
operating conditions,
and failure consequences.

This case study showed that effective maintenance is not achieved by more maintenance activity but by clearer, more effective engineering decisions.

Better reliability begins with better decisions about how each failure mode should be controlled.

And one of the most important lessons from this project is this:

Equipment reliability is not achieved by creating more work orders.

It is the result of making the right decisions to control failure.

RCM does not improve reliability by creating more maintenance work. It improves reliability by helping organizations make better maintenance decisions.

Author

Farshad Bakhshi

Farshad Bakhshi is a Maintenance & Reliability consultant and CMMS implementation specialist with over 20 years of experience in asset-intensive industries. He helps organizations improve reliability performance through maintenance strategy, data governance, preventive maintenance optimization, and root cause analysis.