Why Reliability Engineering That Ignores Context Will Fail Fast

by Howard Penrose | Articles, Maintenance and Reliability

Reliability Engineering

Reliability engineering is a subdiscipline of industrial and manufacturing engineering that has increasingly become a field of its own over the past few decades, as shown by the growing number of reliability engineer titles handed out within organizations.

The problem is that the work assigned to the reliability engineer is often not what it should be, and the title is frequently just another name for maintenance manager, maintenance engineer, or planner. In the end, the opportunities that a true reliability engineering department offers are lost to the facility or company, as another box is checked off to state that ‘we are reliable.’

Reliability engineering has become a title without teeth—confused, diluted, and too often mistaken for maintenance in disguise.

To understand the problem we are facing, we must first know what we are discussing, because the information available in the aftermarket space uses terms such as reliability, maintenance, physical asset management, and asset management almost interchangeably.

Overall, the information available from various organizations and consulting groups is a mishmash that perpetuates the confusion faced by senior management, and even by those working in the industry itself.

Figure 1: Wind Turbine Study to Identify Component and Maintenance Practice Modifications to Improve Operational Availability

What is reliability? The general definition is: the probability that a system or product (system) will perform satisfactorily for a given period when used under specified operating conditions or operational context.

This means that reliability gives us a measurable probability, at a stated confidence level, that the system will be available upon demand to perform as expected, not necessarily as designed, in a given operating context.

As a result, the concept of reliability allows us to determine means and methods to: know what will occur if we operate outside the design thresholds, and identify changes that would cause the system to no longer perform satisfactorily. It also enables us to perform tasks that either maintain system operation or project a time to failure based on tests or observations.

Figure 2: Design Study to Evaluate Hybrid Vehicle Improvements for Improved Inherent Availability

One of the concepts discussed within the last decade is ‘inherent reliability.’ This has been an undefined term from the design and manufacturing perspective. Still, it has recently been used in marketing as the ‘design reliability’ of the system, which, in industrial and design engineering, is addressed through availability (inherent, achieved, and operational).
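The three availability measures named above are commonly defined in the systems engineering literature as ratios of uptime to total time, differing only in which downtime they count. The following is a minimal sketch under those textbook definitions. The function names and the hour values are hypothetical and are not taken from the studies described in this article.

```python
def inherent_availability(mtbf: float, mttr: float) -> float:
    """Ai: counts only corrective repair time (ideal support, no preventive work)."""
    return mtbf / (mtbf + mttr)


def achieved_availability(mtbm: float, mean_active_maint: float) -> float:
    """Aa: counts corrective and preventive active maintenance time."""
    return mtbm / (mtbm + mean_active_maint)


def operational_availability(mtbm: float, mean_downtime: float) -> float:
    """Ao: counts all downtime, including logistics and administrative delay."""
    return mtbm / (mtbm + mean_downtime)


# Hypothetical values in hours, for illustration only.
ai = inherent_availability(mtbf=2000.0, mttr=8.0)
aa = achieved_availability(mtbm=1500.0, mean_active_maint=10.0)
ao = operational_availability(mtbm=1500.0, mean_downtime=36.0)
print(f"Ai={ai:.4f}  Aa={aa:.4f}  Ao={ao:.4f}")
```

Because each measure adds more kinds of downtime to the denominator, operational availability is normally the lowest of the three and is the one most sensitive to the operating context.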

For example, when participating in the design of a hybrid tractor, the work we performed on the electric machines was not to determine an ‘inherent reliability,’ but to decide how to optimally set the ‘availability’ for a percentage of the systems to survive across a period of time to a specific level of confidence for a particular operating context.

Inherent reliability sets a glass ceiling that limits innovation—real engineering demands we challenge it, not obey it.

We can define that as a B20 life of 20,000 hours (no more than 20% of the units have failed by 20,000 hours), to a confidence level of 85% (not the real number, just an example). Research into components and survival can then be conducted, and the proper and cost-effective selection of materials and manufacturing processes can be developed.
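To show what such a target implies numerically, here is a minimal sketch that assumes failures follow a two-parameter Weibull distribution and that the target is demonstrated with a zero-failure (success-run) test. Both are assumptions for illustration, as is the shape parameter; the article does not state which model or test plan was used on the tractor program.

```python
import math


def b_life(p_fail: float, beta: float, eta: float) -> float:
    """Time by which a fraction p_fail of a Weibull(beta, eta) population has failed."""
    return eta * (-math.log(1.0 - p_fail)) ** (1.0 / beta)


def eta_for_b_life(target_hours: float, p_fail: float, beta: float) -> float:
    """Characteristic life eta needed so that the B-life equals target_hours."""
    return target_hours / (-math.log(1.0 - p_fail)) ** (1.0 / beta)


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Units that must all survive one full test duration with zero failures."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


beta = 2.0  # hypothetical wear-out shape parameter, not from the article
eta = eta_for_b_life(20_000.0, 0.20, beta)
n = success_run_sample_size(reliability=0.80, confidence=0.85)
print(f"characteristic life needed: {eta:,.0f} h")
print(f"zero-failure test units needed: {n}")
```

Under these assumptions, a B20 of 20,000 hours means 80% reliability at 20,000 hours, and the success-run relation C = 1 - R^n gives the number of units that must all survive the test to claim that reliability at 85% confidence.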

However, we cannot do that using the variety of inherent reliability definitions that can be found. Finally, the concept of inherent reliability itself puts extreme limits on the reliability engineer. In effect, it creates a glass ceiling, saying that ‘you cannot pass this level of reliability’ with the system, which, by definition, is false.

Figure 3: Field Testing Design Study Results at General Motors Engineering Laboratory

Figure 4: 644 Tractor Hybrid Prototype Being Prepared for Testing Circa 2010

The aftermarket reliability engineer, who becomes involved after the design stage, should determine the impact of operating the system outside the designer’s operating context.

A system, such as a pumping system, is designed and developed, and information is made available so that it can be designed into an application that, most likely, will not operate within the manufacturer’s original design context.

A reliability engineer would be tasked with determining the operational availability of the system in the new operating context and, understanding that context, identifying options such as monitoring, re-engineering, or run to failure. This is where tools such as Reliability-Centered Maintenance (RCM) come into play, as well as a variety of modeling systems available to the engineer, either from the manufacturer or elsewhere.

Figure 5: Hybrid Tractor Testing and Improvements to Lower Reliability Components for Improvement and Aging Studies

One of the options available to the reliability engineer is re-engineering, or modifying, the system to meet the expected operating context and the required availability. If we are working with a product line that has a specific end date for product line replacement or retirement, then expectations can be developed for the system or its components.

When involved in the specification development of the product line, reliability engineers can determine the optimal costs associated with meeting the prescribed availability expected by the organization and provide options along with their associated financial impacts.

Once we understand the expectations, we can then take a closer look at the maintenance needs, transitioning from corrective to predictive maintenance practices. This can even occur on the design side of the system.

True reliability engineering isn’t about accepting limits—it’s about redefining them through smart redesign, context-driven specs, and cost-informed tradeoffs.

Using the example of the hybrid tractor, the electric machine selected has specific design characteristics due to the motor manufacturer’s patents.

However, based on the reliability expectations of the tractor manufacturer, it is determined that the design characteristics cannot be met through the motor manufacturer’s process, and the material selection does not meet the aging requirements in the operating context of the tractor.

The reliability engineer then works with the manufacturer to determine if motor manufacturing or design can be modified to meet the system’s availability expectations. Alternatively, expectations can be modified or maintenance tasks developed to meet the manufacturer’s target.

Once the tractor arrives at the site, such as a quarry, it is quickly discovered that the operators have been tasked with moving a specific amount of product within a specified period of time, which exceeds the tractor’s capabilities, as per the manufacturer’s design context.

A review by the owner’s reliability engineer and the manufacturer’s engineer determines that the result will be increased particulate in the cooling system, a component of the hydraulic system, which would cause the motor bearings to fail or the windings to short, based on product research.

Changing the filter to a much denser one would restrict the cooling system to the point where the thermal life of the machine would be similar to that of taking no action. It is therefore determined that a slightly denser filter will be selected and that the filter replacement maintenance task will be performed more frequently.
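The thermal side of this tradeoff can be illustrated with a common rule of thumb for electrical insulation: thermal life roughly halves for every 10 °C increase in operating temperature. The sketch below assumes that rule applies; the function name and the temperature rises are hypothetical, and the actual analysis performed for the tractor is not described here.

```python
def relative_thermal_life(temp_rise_c: float, halving_interval_c: float = 10.0) -> float:
    """Insulation life relative to baseline, assuming life halves per halving_interval_c."""
    return 2.0 ** (-temp_rise_c / halving_interval_c)


# Hypothetical temperature rises caused by progressively more restrictive filters.
for rise in (0.0, 5.0, 10.0, 20.0):
    life = relative_thermal_life(rise)
    print(f"+{rise:4.1f} C -> {life:.2f} x baseline thermal life")
```

A filter that protects the bearings and windings from particulate but costs a large temperature rise can therefore give back most of what it gains, which is why the compromise pairs a modest filter change with a shorter replacement interval.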

Other modifications are made to the machine’s operation to accommodate the increased workload. Under the new modifications and operating context, the operating temperature of the motors and hydraulic fluid is closely monitored to identify early defects due to wear, and the new inherent availability and overall reliability are determined.

When operating context collides with design assumptions, only real-time adaptation and engineering judgment can preserve reliability.

Sometimes the issue arises from the outcome of a root-cause failure analysis. A flywheel energy storage system was designed with special characteristics that included direct exposure of the insulation system to a vacuum.

The 0.5-megawatt system was initially designed to provide power for a specific amount of time at 460 volts as the rotational energy of the flywheel was converted back to electrical energy through a control mechanism.

To do this, an inverter sped the motor-generator up to a speed not to exceed 7,000 RPM and maintained that speed, per the design. The vacuum had to be held at less than 0.08 bar for both friction and insulation resistance purposes. A sales engineer sold the system with the expectation of a longer runtime based on smaller units, which required an operating speed of 12,000 RPM.

The insulation systems began to fail within a few hundred hours of operation. Engineers decided, at this point, to increase the operating pressure to 0.4 bar, which would have a minimal effect on the coast-down. They immediately noticed that the machines started failing catastrophically within tens of hours. A root-cause failure analysis (RCFA) was initiated.
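The appeal of the higher speed follows from basic physics: the kinetic energy stored in a flywheel grows with the square of its rotational speed. The sketch below compares the speeds mentioned in this case. The moment of inertia is a hypothetical placeholder (the ratios do not depend on it), and losses and the usable speed range are ignored.

```python
import math


def stored_energy_j(inertia_kg_m2: float, rpm: float) -> float:
    """Rotational kinetic energy E = 1/2 * I * omega^2, in joules."""
    omega = rpm * 2.0 * math.pi / 60.0  # convert RPM to rad/s
    return 0.5 * inertia_kg_m2 * omega ** 2


INERTIA = 100.0  # kg*m^2, hypothetical value for illustration only
base = stored_energy_j(INERTIA, 7_000)
for rpm in (7_000, 9_000, 12_000):
    ratio = stored_energy_j(INERTIA, rpm) / base
    print(f"{rpm:>6} RPM -> {ratio:.2f} x the energy stored at 7,000 RPM")
```

For a fixed power draw, runtime scales with stored energy, so raising the speed is the cheapest way on paper to promise a longer runtime, and it is also the change that pushes the machine furthest from its design context.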

Figure 7: Flywheel Control Trailer Testing as Part of RCFA

A review of the original design research revealed that the maximum speed at which the machine could operate was 9,000 RPM, and that specific electrical and mechanical conditions existed beyond this point that would severely impact the system’s reliability. It was also determined that increasing the pressure in the system caused additional problems per Paschen’s law, under which the gas around the winding could now break down and conduct to ground.

Another unexpected finding was that the output filters between the drive and motor-generator were home-made, resulting in significant stress to the winding. To achieve an acceptable operational availability that would allow for a reasonable and simple payback to the owner and investors, the proposal included operating at 9,000 RPM, adding sine wave filters, and returning the pressure to 0.08 bar.
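Paschen’s law relates the breakdown voltage of a gas gap to the product of pressure and gap distance, and the resulting curve has a minimum. The sketch below uses the textbook form of the law with commonly quoted constants for air; the constants, the secondary emission coefficient, and both gap sizes are assumptions chosen only to show the shape of the curve. It does not reproduce the RCFA measurements, which are not given here.

```python
import math

A = 15.0       # 1/(cm*Torr), commonly quoted value for air (assumed)
B = 365.0      # V/(cm*Torr), commonly quoted value for air (assumed)
GAMMA = 0.01   # secondary electron emission coefficient (assumed)
TORR_PER_BAR = 750.062


def breakdown_voltage(pressure_bar: float, gap_cm: float) -> float:
    """Paschen breakdown voltage in volts; inf where the model predicts no breakdown."""
    pd = pressure_bar * TORR_PER_BAR * gap_cm
    denom = math.log(A * pd) - math.log(math.log(1.0 + 1.0 / GAMMA))
    if denom <= 0.0:
        return math.inf  # left of the curve's asymptote
    return B * pd / denom


def pd_at_minimum() -> float:
    """Pressure-distance product (Torr*cm) at the minimum of the Paschen curve."""
    return math.e * math.log(1.0 + 1.0 / GAMMA) / A


# Two hypothetical gaps: a very small one and a larger one.
for gap_cm in (0.002, 0.1):
    for p_bar in (0.08, 0.4):
        v = breakdown_voltage(p_bar, gap_cm)
        print(f"gap={gap_cm} cm, p={p_bar} bar -> breakdown at {v:.0f} V")
```

With these assumed values, raising the pressure lowers the breakdown voltage of the small gap but raises it for the larger one. Whether a pressure change helps or hurts therefore depends on where the actual gaps sit relative to the minimum of the curve, which is the kind of question an RCFA has to answer with test data.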

While the increase in speed would reduce the overall expected life cycle of the machine, the reliability of the system could be calculated, and owners or investors could make a business decision based on a lower return on investment or retiring the system. The purpose of reliability engineering then becomes to investigate, assist in design review, and provide data back to the owners/investors so that a decision can be made.

Figure 8: Modified Vacuum Pressure Tank for Insulation Studies as Part of Flywheel RCFA and Reliability Improvements

In less dramatic cases, a reliability engineer would investigate aspects such as greasing frequencies, the application of IoT systems, the development of cost-effective maintenance strategies, root cause analysis, design reviews, manufacturing process reviews, and other activities that can potentially have a significant impact on the organization.

According to the National Institute of Standards and Technology study, “Economics of Manufacturing Machinery Maintenance: A Survey and Analysis of US Costs and Benefits,” just the development of optimal maintenance practices can have a significant impact (in relation to NAICS 321-339 excluding NAICS 324 and 325 – petro-chem) per year:

Reduction of maintenance costs by up to $16.3 billion from unplanned failures, plus buffer inventory costs of at least $0.9 billion, out of the $74.5 billion in annual maintenance activity expenditures;
Avoidance of losses due to preventable maintenance issues of up to $119.1 billion: $18.1 billion due to downtime, $0.8 billion due to defects, and $100.2 billion due to lost sales from delays and defects;
Reduction of an estimated 16.03 injuries and 0.05 deaths per million employees;
Advanced maintenance strategy impacts of predictive maintenance: $6.5 billion from downtime reduction and $67.3 billion in increased sales;
Additional benefits where less than 50% of the strategy is reactive maintenance: 15% less downtime, 87% lower defect rate, and 66% less inventory; and,
Energy cost improvements of at least 15%.

Overall, a properly selected and applied reliability engineer or reliability engineering department will have a significant impact on the company’s bottom line. The reliability engineer has the most significant impact when tasked with developing specifications for new and repaired equipment, conducting root cause analysis, and creating optimized maintenance strategies.

Author

Howard Penrose

Howard W. Penrose, Ph.D., CMRP, CEM, CMVP, is president of MotorDoc® LLC, a Veteran-Owned Small Business. He chairs standards at American Clean Power (2022-25), previously led SMRP (2018), and has been active with IEEE since 1993. He represents the USA for CIGRE machine standards (2024-28) and serves on NEMA rail electrification standards (2024+). A former Senior Research Engineer at the University of Chicago, he’s a 5-time UAW-GM Quality Award winner. His work spans GM and John Deere hybrids, Navy machine repair, and high-temperature motors. He holds certifications in reliability, energy, M&V, and data science from Kennedy-Western, Stanford, Michigan, AWS, and IBM.