RCM in Action: The Method That Extends Asset Life Dramatically

An RCM process systematically identifies all the asset’s functions and functional failures, as well as all reasonably likely failure causes. It then proceeds to determine the effects of these potential failure modes and to identify how those effects matter. Once it has gathered this information, the RCM process then selects the most appropriate asset management policy.

RCM considers all asset management options, including on-condition tasks, scheduled restoration tasks, scheduled discard tasks, failure-finding tasks, and one-time changes (to hardware design, operating procedures, personnel training, or other aspects of the asset outside the strict realm of maintenance). This breadth of options sets RCM apart from other maintenance development processes.

Seven Questions Addressed by RCM

Fundamentally, the RCM process seeks to answer the following seven questions in sequential order:

Functions

What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?

The specific criteria that the process must satisfy are:

  • The operating context of the asset shall be defined.
  • All the functions of the asset/system shall be identified (all primary and secondary functions, including the functions of all protective devices).
  • All function statements shall contain a verb, an object, and a performance standard (quantified in every case where this can be done).
  • Performance standards incorporated in function statements shall be the level of performance desired by the owner or user of the asset/system in its operating context.

The operating context refers to the circumstances in which the asset is operated. The same hardware does not always require the same failure management policy in all installations. For example, a single pump in a system typically requires a different failure management policy than a pump that is one of several redundant units in the same system.

A pump moving corrosive fluids will typically require a different policy than a pump moving benign fluids. Protective devices are often overlooked; an RCM process shall ensure that their functions are identified. Finally, the owner/user shall specify the level of performance that the maintenance program is designed to sustain.

Functional Failures

In what ways can it fail to fulfill its functions (functional failures)? This question has only one specific criterion: All the failed states associated with each function shall be identified. If functions are well-defined, listing functional failures is relatively straightforward. For example, if a function is to “keep system temperature between 50°C and 70°C,” then functional failures might be:

  • Unable to raise system temperature above ambient
  • Unable to keep system temperature above 50°C
  • Unable to keep system temperature below 70°C
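As a rough illustration of how a quantified performance standard makes the failed states easy to enumerate, the sketch below maps a temperature reading onto the three functional failures listed above. The function name, the inclusive treatment of the 50°C and 70°C limits, and the assumption that ambient temperature is below 50°C are illustrative choices, not part of the RCM criteria.

```python
from typing import Optional

# Performance standard from the example function statement.
LOW_C = 50.0
HIGH_C = 70.0


def functional_failure(temp_c: float, ambient_c: float) -> Optional[str]:
    """Return the failed state a reading indicates, or None if the function is met.

    Assumes ambient_c is below LOW_C (hypothetical simplification).
    """
    if temp_c <= ambient_c:
        return "unable to raise system temperature above ambient"
    if temp_c < LOW_C:
        return "unable to keep system temperature above 50°C"
    if temp_c > HIGH_C:
        return "unable to keep system temperature below 70°C"
    return None  # within 50°C to 70°C: the function is being fulfilled


if __name__ == "__main__":
    for reading in (20.0, 42.0, 60.0, 75.0):
        print(reading, functional_failure(reading, ambient_c=20.0))
```

Each failed state is a separate branch because each one typically has different failure modes behind it, which is why the list is worth writing out before moving to the next question.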

Failure Modes

What causes each functional failure (failure modes)? In Failure Modes, Effects, and Criticality Analysis (FMECA), the term “failure mode” is used in the way that RCM uses the term “functional failure.” However, the RCM community uses the term “failure mode” to refer to the event that causes a functional failure. The standard’s criteria for a process that identifies failure modes are:

  • All failure modes reasonably likely to cause each functional failure shall be identified.
  • The method used to decide what constitutes a “reasonably likely” failure mode shall be acceptable to the owner/user of the asset.
  • Failure modes shall be identified at a level of causation that enables the identification of an appropriate failure management policy.
  • Lists of failure modes shall include failure modes that have happened before, failure modes that are currently being prevented by existing maintenance programs, and failure modes that have not yet happened but are thought to be reasonably likely (credible) in the operating context.
  • Lists of failure modes should include any event or process that is likely to cause a functional failure, including deterioration, design defects, and human error (whether caused by operators or maintainers).

RCM is the most thorough of the analytic processes that develop maintenance programs and manage physical assets. It is therefore appropriate for RCM to identify every reasonably likely failure mode.

Failure Effects

What happens when each of the failures occurs (failure effects)? The criteria for identifying failure effects are:

  • Failure effects shall describe what would happen if no specific task were done to anticipate, prevent, or detect the failure.
  • Failure effects shall include all the information needed to support the evaluation of the consequences of the failure, such as:
    • What evidence (if any) is there that the failure has occurred (in the case of hidden functions, what would happen if a multiple failure occurred)?
    • What does it do (if anything) to kill or injure someone, or to hurt the environment?
    • What does it do (if anything) to have an adverse effect on production or operations?
    • What physical damage (if any) is caused by the failure?
    • What (if anything) must be done to restore the function of the system after the failure?

FMECA or FMEA typically describes failure effects in terms of their impact at the local level, subsystem level, and system level.

Failure Consequences

In what way does each failure matter (failure consequences)? The standard’s criteria for a process that identifies failure consequences are:

  • The assessment of failure consequences shall be carried out as if no specific task is currently being done to anticipate, prevent, or detect the failure.
  • The consequences of every failure mode shall be formally categorized as follows:
    • The consequence categorization process shall separate hidden failure modes from evident failure modes.
    • The consequence categorization process shall clearly distinguish events (failure modes and multiple failures) that have safety and/or environmental consequences from those that only have economic consequences (operational and non-operational consequences).

RCM assesses failure consequences as if nothing were being done about the failure. Some people are tempted to say, “Oh, that failure doesn’t matter because we always do (something), which protects us from it.” However, RCM is thorough: it checks the assumption that the action “we always do” actually does protect against the failure, and it checks the assumption that the action is worth the effort.

RCM assesses failure consequences by formally assigning each failure mode to one of four categories: hidden, evident safety/environmental, evident operational, and evident non-operational. The explicit distinction between hidden and evident failures, performed at the outset of consequence assessment, is one of the characteristics that most clearly distinguishes RCM, as defined by Stan Nowlan and Howard Heap, from MSG-2 and earlier U.S. civil aviation processes.
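The four-way categorization can be sketched as a small decision function. This is a minimal illustration under assumed names: the enum, the function, and its three yes/no inputs are hypothetical, and a real analysis would reach each answer through the failure effects recorded earlier rather than simple flags.

```python
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden"
    EVIDENT_SAFETY_ENVIRONMENTAL = "evident safety/environmental"
    EVIDENT_OPERATIONAL = "evident operational"
    EVIDENT_NON_OPERATIONAL = "evident non-operational"


def categorize(evident: bool,
               safety_or_environmental: bool,
               affects_operations: bool) -> Consequence:
    """Assign a failure mode to one of the four consequence categories."""
    # The hidden/evident split comes first, at the outset of the assessment.
    if not evident:
        return Consequence.HIDDEN
    # Safety and environmental consequences are separated from economic ones.
    if safety_or_environmental:
        return Consequence.EVIDENT_SAFETY_ENVIRONMENTAL
    # What remains is economic: operational or non-operational.
    if affects_operations:
        return Consequence.EVIDENT_OPERATIONAL
    return Consequence.EVIDENT_NON_OPERATIONAL


if __name__ == "__main__":
    print(categorize(evident=True, safety_or_environmental=False,
                     affects_operations=True).value)
```

The order of the checks mirrors the order described in the text: a hidden failure mode is set aside before any question about safety or cost is asked.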

Proactive Tasks

What should be done to predict or prevent each failure (proactive tasks and task intervals)? This is a complex topic, and so its criteria are presented in two groups. The first group pertains to the overall subject of selecting failure management policies. The second group of criteria pertains to scheduled tasks and intervals, which comprise proactive tasks as well as one default action (failure-finding tasks).

The criteria for selecting failure management policies are:

  • The selection of failure management policies shall be carried out as if no specific task is currently being done to anticipate, prevent, or detect the failure.
  • The failure management selection process shall take account of the fact that the conditional probability of some failure modes will increase with age (or exposure to stress), that the conditional probability of others will not change with age, and that the conditional probability of others will decrease with age.
  • All scheduled tasks shall be technically feasible and worth doing (applicable and effective), and the means by which this requirement will be satisfied are set out under scheduled tasks in the failure management section.
  • If two or more proposed failure management policies are technically feasible and worth doing (applicable and effective), the most cost-effective policy shall be selected.
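The three age-related behaviors named in the criteria above (increasing, constant, and decreasing conditional probability of failure) can be illustrated with a Weibull hazard function. The Weibull model is an assumption introduced here for illustration only; the criteria do not prescribe any particular distribution, and the parameter values are hypothetical.

```python
def weibull_hazard(age: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    if age <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("age, shape, and scale must all be positive")
    return (shape / scale) * (age / scale) ** (shape - 1.0)


def age_trend(shape: float) -> str:
    """Describe how the hazard rate changes with age for a given shape."""
    if shape > 1.0:
        return "increases with age"
    if shape < 1.0:
        return "decreases with age"
    return "does not change with age"


if __name__ == "__main__":
    # Hypothetical shape values; the scale is fixed at 1000 operating hours.
    for shape in (0.5, 1.0, 3.0):
        early = weibull_hazard(100.0, shape, 1000.0)
        late = weibull_hazard(900.0, shape, 1000.0)
        print(f"shape={shape}: h(100)={early:.6f}, h(900)={late:.6f}, "
              f"hazard {age_trend(shape)}")
```

Only the increasing case supports an age-based task such as scheduled discard or scheduled restoration; for the other two, intervening at a fixed age does not lower the chance of failure.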

Scheduled tasks are tasks that are “performed at fixed, predetermined intervals, including ‘continuous monitoring’ (where the interval is effectively zero).” Scheduled tasks should be identified that fit the following criteria:

  • In the case of an evident failure mode that has safety or environmental consequences, the task shall reduce the probability of the failure mode to a level that is tolerable to the owner/user of the asset.
  • In the case of a hidden failure mode that has safety or environmental consequences, the task shall reduce the probability of the hidden failure mode to an extent that reduces the likelihood of the associated multiple failure to a level tolerable to the owner/user of the asset.
  • In the case of an evident failure mode that does not have safety or environmental consequences, the direct and indirect costs of doing the task shall be less than the direct and indirect costs of the failure mode when measured over comparable periods of time.
  • In the case of a hidden failure mode where the associated multiple failure does not have safety or environmental consequences, the direct and indirect costs of doing the task shall be less than the direct and indirect costs of the multiple failure plus the cost of repairing the hidden failure mode, when measured over comparable periods of time.
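The criteria above reduce to two kinds of test: a tolerable-probability test where safety or environmental consequences are involved, and a cost comparison over a common period otherwise. The sketch below shows both under assumed names; the functions and all figures are hypothetical, and in practice the owner/user of the asset sets the tolerable level.

```python
def annualized(cost_per_event: float, events_per_year: float) -> float:
    """Express a cost over a common one-year period."""
    return cost_per_event * events_per_year


def worth_doing_safety(probability_with_task: float,
                       tolerable_probability: float) -> bool:
    """Safety/environmental case: the task must bring the probability of the
    failure mode (or of the associated multiple failure, for a hidden failure
    mode) down to a level the owner/user tolerates."""
    return probability_with_task <= tolerable_probability


def worth_doing_economic(task_cost: float, failure_cost: float) -> bool:
    """Economic case: the task must cost less than the failure it addresses,
    both measured over the same period. For a hidden failure mode,
    failure_cost should cover the multiple failure plus the repair of the
    hidden failure mode."""
    return task_cost < failure_cost


if __name__ == "__main__":
    # Hypothetical figures: a monthly inspection at 500 per visit versus a
    # failure costing 40,000 that is expected once every four years.
    task = annualized(500.0, 12)          # 6000.0 per year
    failure = annualized(40_000.0, 0.25)  # 10000.0 per year
    print(worth_doing_economic(task, failure))
```

Putting both costs on the same annual basis before comparing them is what "measured over comparable periods of time" asks for; comparing a per-visit cost with a per-failure cost directly would be misleading.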

Categories of Tasks

Three general categories of tasks are considered to be proactive in nature:

On-condition Tasks

An on-condition task is “a scheduled task used to detect a potential failure.” Such a task has many other names in the maintenance community, such as:

  • “Predictive” tasks (in contrast to “preventive” tasks, a name that users of this terminology apply to scheduled discard and scheduled restoration tasks)
  • “Condition-based” tasks, referring to “condition-based maintenance” or CBM (again, in contrast to “time-based maintenance,” that is, scheduled discard and scheduled restoration tasks)
  • “Condition-monitoring” tasks, since the tasks monitor the condition of the asset

Scheduled Discard Tasks

The next kind of task is a scheduled discard task, defined as “a scheduled task that entails discarding an item at or before a specified age limit regardless of its condition at the time.” A scheduled discard task must be subjected to the following criteria before accepting the task:

  • There shall be a clearly defined (preferably a demonstrable) age at which there is an increase in the conditional probability of the failure mode under consideration.
  • A sufficiently large proportion of the occurrences of this failure mode shall occur after this age to reduce the probability of premature failure to a level that is tolerable to the owner or user of the asset.

Scheduled Restoration Tasks

The next kind of task is a scheduled restoration task, defined as “a scheduled task that restores the capability of an item at or before a specified interval (age limit), regardless of its condition at the time, to a level that provides a tolerable probability of survival to the end of another specified interval.” The following criteria must be applied to a scheduled restoration task before accepting the task:

  • There shall be a clearly defined (preferably a demonstrable) age at which there is an increase in the conditional probability of the failure mode under consideration.
  • The task shall restore the resistance to failure (condition) of the component to a level that is acceptable to the owner or user of the asset.
  • A sufficiently large proportion of the occurrences of this failure mode shall occur after this age to reduce the probability of premature failure to a level that is tolerable to the owner or user of the asset.

Default Actions

What should be done if a suitable proactive task cannot be identified (default actions)? This question pertains to unscheduled failure management policies: the decision to let an asset run to failure, and the decision to modify something about the asset’s operating context (such as its design or the way it is operated).

Failure-Finding Tasks

A failure-finding task is defined as “a scheduled task used to determine whether a specific hidden failure has occurred.” Failure-finding tasks usually apply to protective devices that fail without notice.

Failure-finding tasks don’t prevent breakdowns—they expose hidden failures before they cascade into bigger problems.

This task represents a transition from the sixth question (proactive tasks) to the seventh question (default actions, or actions taken in the absence of proactive tasks).

Failure-finding tasks are scheduled tasks, like proactive tasks, but they are not proactive: they do not predict or prevent failures. They detect failures that have already occurred, which reduces the likelihood of a multiple failure, that is, the failure of a protected function while its protective device is already in a failed state.

Run to Failure

If a process offers a decision to let an asset run to failure, the following criteria should be applied before accepting the decision:

In cases where the failure is hidden and there is no scheduled task in place, the associated multiple failures shall not have safety or environmental consequences.

In cases where the failure is evident and there is no scheduled task in place, the associated failure mode shall not have safety or environmental consequences. In other words, the process must not allow its users to select “run to failure” if the failure mode, or (in the case of a hidden failure) the associated multiple failure, has safety or environmental consequences.
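The run-to-failure rule above can be written as a single guard. This is a minimal sketch with hypothetical names; it encodes only the safety and environmental restriction stated in the two criteria, not the economic judgment of whether running to failure is the cheapest policy.

```python
def run_to_failure_allowed(hidden: bool,
                           failure_mode_safety_env: bool,
                           multiple_failure_safety_env: bool) -> bool:
    """Return True if run to failure may be selected for this failure mode."""
    if hidden:
        # Hidden failure: what matters is the associated multiple failure.
        return not multiple_failure_safety_env
    # Evident failure: what matters is the failure mode itself.
    return not failure_mode_safety_env


if __name__ == "__main__":
    # Evident failure with purely economic consequences.
    print(run_to_failure_allowed(hidden=False,
                                 failure_mode_safety_env=False,
                                 multiple_failure_safety_env=False))
    # Hidden failure of a protective device guarding against a safety hazard.
    print(run_to_failure_allowed(hidden=True,
                                 failure_mode_safety_env=False,
                                 multiple_failure_safety_env=True))
```

A process that lets this guard be bypassed would allow "run to failure" on a safety-critical item, which is exactly what the criteria rule out.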

Author

  • Ricky Smith, CMRP, CMRT

    Ricky Smith, CMRP, CMRT is the Vice President of World Class Maintenance and a leading Maintenance Reliability Consultant with over 35 years of experience. He holds certifications such as Certified Maintenance and Reliability Professional (CMRP) and Certified Maintenance and Reliability Technician (CMRT). Ricky has worked with global companies like Coca-Cola, Honda, and Georgia Pacific, delivering expert maintenance solutions across 30 countries. His career began in the U.S. Army, advancing to leadership roles, including a position at the Pentagon as Facility Investigator for the Secretary of Defense. Ricky is also the co-author of Rules of Thumb for Maintenance and Reliability Engineers and Lean Maintenance: Reduce Costs, Improve Quality, and Increase Market Share.
