Mastering the FRACAS Root Cause Analysis Process for Real Failure Elimination

by Reliable Media, Alison Field | Cartoons

[Cartoon: FRACAS root cause analysis process]

The cartoon captures the frustration perfectly: instead of a structured investigation, FRACAS meetings often deteriorate into sticky notes, arrows in every direction, speculation, and symptom-chasing. “ROOT CAUSE???” becomes a mystery board rather than an engineering conclusion.

A strong FRACAS root cause analysis process requires technical discipline, statistically credible reasoning, modern standards, and an honest acknowledgment of uncertainty when evidence is incomplete. Failures in industrial environments are rarely simple. Mechanisms interact, duty cycles matter, indicators vary with load, and evidence may degrade before anyone analyzes it. Effective FRACAS embraces this complexity rather than forcing simplistic, linear narratives.

Failure Progression Is Not Linear – It Is Conditional and Often Multi-Mechanism

Classic descriptions of failure sequences can imply a neat progression: initiating condition → mechanism → symptoms → failure. While this model is helpful for conceptual framing, real assets often deviate significantly from it.

Failures can include:

Concurrent degradation mechanisms (e.g., corrosion-fatigue, or contamination combined with adhesive wear)
Latent cracks or defects producing no detectable signals until sudden failure
Intermittent conditions that disappear before monitoring detects them
Duty-cycle effects where indicators appear only under certain loads or speeds
Situations where evidence is destroyed at the moment of failure

A more accurate model within the FRACAS root cause analysis process is:

One or more initiating conditions create stressors or environmental mismatches.
Degradation mechanisms evolve, possibly in combination or non-linear interaction.
Indicators may appear depending on monitoring resolution, technique, and operational conditions.
A functional failure occurs when the asset can no longer perform a required function to a required standard (per SAE JA1011/1012 definitions), even if it has not reached a fully failed state.

Clarifying Indicators vs. Mechanistic Signatures

Indicators such as vibration, temperature rise, or oil-condition changes are signals. Some correlate strongly with mechanisms: for example, amplitude at the outer-race ball pass frequency (BPFO) rises as an outer-race defect develops in a rolling-element bearing. But indicators do not independently reveal initiating conditions. The FRACAS team’s role is to connect signals to physical mechanisms and upstream drivers.
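
To make the BPFO example concrete, here is a minimal sketch of the standard kinematic formula, assuming a stationary outer race and no rolling-element slip. The function name and the bearing dimensions are hypothetical and chosen only for illustration; real bearings slip slightly, so measured peaks usually sit a little off the calculated value.

```python
import math

def bpfo_hz(shaft_rpm, n_elements, element_dia, pitch_dia, contact_angle_deg=0.0):
    """Ball pass frequency, outer race (Hz), for a stationary outer race.

    element_dia and pitch_dia must share the same unit (e.g., mm).
    """
    shaft_hz = shaft_rpm / 60.0
    ratio = (element_dia / pitch_dia) * math.cos(math.radians(contact_angle_deg))
    return (n_elements / 2.0) * shaft_hz * (1.0 - ratio)

# Hypothetical bearing: 9 elements, 7.94 mm element diameter,
# 39.04 mm pitch diameter, running at 1,800 rpm.
example_bpfo = bpfo_hz(1800, 9, 7.94, 39.04)
print(f"BPFO = {example_bpfo:.1f} Hz")
```

A peak near this frequency (and its harmonics) points to the outer race as the location of damage. It says nothing by itself about whether the initiating condition was contamination, misalignment, a fit error, or something else.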

Causal Chains Must Reflect Engineering Evidence, Statistical Credibility, and Uncertainty

Weak FRACAS programs rely on brainstorming. Strong ones rely on defensible causal chains that integrate physical evidence, probabilistic reasoning, and operational context.

Three Elements of a Technically Valid Causal Chain

Initiating Conditions
These may be single or multiple. Examples include off-tolerance fits, moisture ingress, incorrect torque, VFD-induced shaft voltage, thermal cycling, or transient overloads.
Importantly, not all initiating conditions can be definitively proven. DoD and NASA practices (NASA-STD-8729.1A, DLAI 3200.4) explicitly allow classification of causes as probable, suspected, or undetermined when evidence is insufficient or compromised.
Degradation Mechanisms
Mechanisms such as abrasive wear, adhesive wear, corrosion, fatigue propagation, electrical discharge machining (EDM) damage, or wear under boundary lubrication all follow known physical principles. However, many real failures are mixed-mode, and distinguishing dominant contributors may require metallurgical examination, microscopy, or tribological testing.
Indicators (Context-Dependent)
Indicators must be interpreted in the context of load, speed, duty cycle, and environmental conditions. Some vibration indicators disappear under light load, while others – like particle counts or temperature drift from lubrication starvation – may persist regardless of instantaneous load.

Evidence Types Used in FRACAS

A causal chain is valid when supported by:

Physical evidence (wear patterns, particles, microscale features)
Historical evidence (CM data, alarms, process logs)
Statistically credible inference (Weibull fits, Poisson modeling, Bayesian updating, confidence intervals)

Not all investigations yield deterministic results, and FRACAS frameworks must allow for that.
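
As one illustration of the statistical inference listed above, the sketch below fits a two-parameter Weibull distribution by median rank regression, using Bernard's approximation for the median ranks. It assumes complete data (every unit ran to failure); suspensions need rank adjustment or maximum likelihood, which are not shown. The function name and the failure times are hypothetical.

```python
import math

def weibull_mrr(failure_times):
    """Fit a two-parameter Weibull by median rank regression (complete data).

    Regresses ln(-ln(1 - F)) on ln(t); returns (beta, eta).
    """
    t = sorted(failure_times)
    n = len(t)
    if n < 2:
        raise ValueError("need at least two failure times")
    xs = [math.log(ti) for ti in t]
    # Bernard's approximation: F_i = (i - 0.3) / (n + 0.4)
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                    # slope = shape parameter
    intercept = y_bar - beta * x_bar    # intercept = -beta * ln(eta)
    eta = math.exp(-intercept / beta)   # scale (characteristic life)
    return beta, eta

# Hypothetical operating hours to failure for six identical pumps.
beta, eta = weibull_mrr([410, 620, 790, 980, 1150, 1400])
print(f"shape beta = {beta:.2f}, characteristic life eta = {eta:.0f} h")
```

A shape parameter below 1 suggests early-life failures, a value near 1 suggests random failures, and a value above 1 suggests wear-out. With only a handful of points the estimate is wide, so it supports a causal chain rather than proving one.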

Strengthening the FRACAS Root Cause Analysis Process With Modern Standards and Appropriate Data

Strong FRACAS programs use structured methodologies aligned with modern, active standards, not outdated or withdrawn ones.

Relevant and Current Standards That Support FRACAS

These standards do not define FRACAS, but they directly support its rigor:

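ISO 14224 – Covers the collection and exchange of reliability and maintenance data, including equipment taxonomy and failure coding. It was written for the petroleum, petrochemical, and natural gas industries but is often adapted elsewhere.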
IEC 62740 – A modern, formal standard for root cause analysis methodology.
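SAE JA1011/1012 – The evaluation criteria for reliability-centered maintenance (RCM) processes (JA1011) and the companion guide (JA1012). They supply the functional failure definitions that FRACAS should apply in system-level analysis.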
NASA-STD-8729.1A – Incorporates evidence requirements and uncertainty handling for causal analysis.

Data Resolution Must Match Mechanism Time Scales

Sampling or bandwidth requirements depend entirely on the mechanism and defect geometry:

Bearing fault detection
Effective bandwidths of 5–20 kHz are common, depending on bearing size and speed.
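Envelope or demodulation techniques (PeakVue, Shock Pulse/SPM-HD, HFD/SEE) extract low-frequency defect repetition rates from high-frequency content, so the bandwidth of the analyzed spectrum may be much lower than the band in which the raw signal is captured.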
Gear defects
Require resolution around gearmesh frequencies and associated sidebands.
Electrical transients
Microsecond-scale capture is relevant for power-quality and switching studies but not routine mechanical CM.
Thermal degradation
Hourly or daily trend data may be sufficient.

FRACAS must interpret data realistically based on method capability and mechanism behavior.
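
The bandwidth and resolution points above can be checked with simple arithmetic. The sketch below is a rough planning aid, not a measurement specification. It assumes the 2.56 × Fmax sampling convention common in vibration analyzers (the theoretical Nyquist minimum is 2 × Fmax) and a hypothetical rule of thumb of four spectral bins between adjacent sidebands. All function names and the gearbox figures are illustrative.

```python
import math

def gearmesh_hz(shaft_rpm, teeth):
    """Gearmesh frequency (Hz) = number of teeth x shaft speed in Hz."""
    return teeth * shaft_rpm / 60.0

def sample_rate_for_bandwidth(f_max_hz, factor=2.56):
    """Sampling rate for a desired analysis bandwidth; factor must exceed 2."""
    if factor <= 2.0:
        raise ValueError("factor must exceed the Nyquist minimum of 2")
    return factor * f_max_hz

def spectral_lines_needed(f_max_hz, spacing_hz, bins_per_spacing=4):
    """Lines of resolution so that peaks spacing_hz apart sit
    bins_per_spacing bins apart (bin width = f_max_hz / lines)."""
    return math.ceil(f_max_hz * bins_per_spacing / spacing_hz)

# Hypothetical gearbox: 23-tooth pinion at 1,500 rpm (25 Hz shaft speed).
shaft_hz = 1500 / 60.0
gmf = gearmesh_hz(1500, 23)                   # gearmesh frequency
f_max = 2000.0                                # chosen to cover 3 x gearmesh
fs = sample_rate_for_bandwidth(f_max)
lines = spectral_lines_needed(f_max, shaft_hz)
print(f"GMF = {gmf:.0f} Hz, sample at >= {fs:.0f} Hz, use >= {lines} lines")
```

If the stored spectrum has fewer lines than this, sidebands at shaft-speed spacing blur together, and their absence in the record is not evidence that the gear was healthy.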

Corrective Actions Must Eliminate Initiating Conditions – Not Just Address Symptoms or Mechanisms

Many ineffective corrective actions target:

Symptoms (noise, alarms, heat)
Or the mechanism only (replacing bearings, cleaning filters, adding lubrication)

But the initiating conditions remain intact.

Corrective Action Hierarchy

Eliminate initiating conditions (off-spec fits, contamination paths, shaft voltage, inadequate lubrication design, incorrect installation torque).
Interrupt or reduce degradation mechanisms (improved lubrication practices, alignment correction, filtration upgrades, electrical filtering).
Manage symptoms only for short-term stabilization.

EDM Bearing Damage: Sources and Mitigations

EDM damage commonly originates from VFD PWM switching, but:

Non-VFD motors can also suffer EDM damage due to magnetic asymmetry or grounding issues.
Cable construction, switching frequency, and grounding paths affect severity.

Primary mitigations (ranked by typical effectiveness):

Shaft-grounding ring/brush (e.g., AEGIS SGR)
Insulated or ceramic bearings
Common-mode chokes or dV/dt filters

Secondary/supplemental mitigations:

Shielded VFD cable
Improved motor-frame grounding (necessary, but rarely sufficient alone)

Replacing bearings without correcting the shaft voltage leaves the initiating condition in place, so recurrence should be expected.

Corrective Action Validation Requires Statistical Rigor

A fixed guideline of 3–6 months is too narrow.

The time required for statistically meaningful validation varies drastically:

High-criticality or fast-cycle assets: validation may occur within 1–3 repeats or via accelerated testing.
High-volume processes: statistical confidence may be established in weeks.
Low-failure-rate assets (turbines, generators): validation may require 12–60 months or substantial operating hours.
Confidence bounds (Weibull, Poisson, χ², Bayesian) should guide when validation is “sufficient,” not a fixed timeline.
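
A small Poisson calculation shows why the validation window depends on the failure rate. The sketch below assumes failures arrive at a constant rate (a homogeneous Poisson process), which is itself an assumption to verify. It asks how likely the observed post-fix count would be if the corrective action had changed nothing. The function names and the baseline figures are hypothetical.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k + 1))

def prob_if_unchanged(baseline_rate_per_hour, hours_observed, failures_observed):
    """Chance of seeing this few failures if the old failure rate still applies."""
    mu = baseline_rate_per_hour * hours_observed
    return poisson_cdf(failures_observed, mu)

def zero_failure_hours(baseline_rate_per_hour, confidence=0.90):
    """Failure-free hours needed before a clean record is unlikely
    (probability below 1 - confidence) under the old failure rate."""
    return -math.log(1.0 - confidence) / baseline_rate_per_hour

# Hypothetical asset: 4 failures per 8,760 operating hours before the fix,
# then 0 failures in the 4,380 hours since.
baseline = 4 / 8760
p = prob_if_unchanged(baseline, 4380, 0)
hours_needed = zero_failure_hours(baseline, confidence=0.90)
print(f"P(0 failures | no improvement) = {p:.3f}")
print(f"Failure-free hours for 90% confidence: {hours_needed:.0f}")
```

In this example six clean months still leave roughly a 14% chance that nothing improved, so the action is not yet validated at 90% confidence. An asset that historically failed once every few years would need a proportionally longer failure-free run.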

FRACAS Trend Analysis Must Use Proper Statistics, Adequate Sample Sizes, and Evidence Discipline

FRACAS trend analysis is robust when datasets support it. When they don’t, trend interpretation can be dangerously misleading.

Three Trend Views That Provide Real Insight

Initiating Condition Frequency Trends
Mechanism Distribution Across Asset Classes
Corrective Action Effectiveness (Statistically Validated)

Statistical Requirements Often Overlooked

Effective FRACAS trend analysis should use:

Weibull distribution fits for time-to-failure patterns
Poisson modeling for failure counts
Confidence intervals to assess significance
Normalization for duty cycle, environment, and operating hours
Bayesian updating for low-n datasets
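
For the low-n case in the list above, one common approach is a conjugate Gamma-Poisson update, with exposure expressed in operating hours so that assets with different duty cycles are compared on the same basis. The sketch below is a minimal illustration under that model. The prior values stand in for fleet or handbook experience and are hypothetical, as are the function names.

```python
def gamma_poisson_update(prior_shape, prior_rate_hours, failures, exposure_hours):
    """Conjugate update of a Gamma(shape, rate) prior on the failure rate
    (failures per operating hour) with Poisson-distributed failure counts."""
    return prior_shape + failures, prior_rate_hours + exposure_hours

def posterior_mean_rate(shape, rate_hours):
    """Mean of a Gamma(shape, rate) distribution, in failures per hour."""
    return shape / rate_hours

# Hypothetical prior: equivalent to 2 failures in 20,000 operating hours.
prior = (2.0, 20000.0)
# Site data for one asset: 1 failure in 5,000 operating hours.
post = gamma_poisson_update(*prior, failures=1, exposure_hours=5000.0)
mean_rate = posterior_mean_rate(*post)
print(f"Posterior mean failure rate = {mean_rate:.2e} per hour")
```

The raw site estimate alone would be 1 failure per 5,000 hours; the update pulls it toward the prior in proportion to how little site data exists. A credible interval requires Gamma quantiles, which a statistics library can supply and which are omitted here.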

Evidence Preservation Matters

Investigations frequently encounter:

Premature teardown destroying wear evidence
Missing lubrication samples
CM data overwritten or unavailable
Contaminated samples compromising root-cause signatures
Incomplete or missing historical records

These limitations legitimately shift an investigation into the probable or undetermined category.
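
One way to keep that classification consistent across investigations is to record which evidence types actually survived. The sketch below is a hypothetical rubric written for illustration only; it is not taken from any of the standards named in this article. The "confirmed" level and the counting thresholds are assumptions that a site would need to define for itself.

```python
from dataclasses import dataclass
from enum import Enum

class CauseStatus(Enum):
    CONFIRMED = "confirmed"        # assumption: level added for illustration
    PROBABLE = "probable"
    SUSPECTED = "suspected"
    UNDETERMINED = "undetermined"

@dataclass
class Evidence:
    physical: bool      # wear patterns, particles, microscale features
    historical: bool    # CM data, alarms, process logs
    statistical: bool   # Weibull, Poisson, or Bayesian support

def classify(evidence):
    """Hypothetical rule: status follows the number of evidence types
    that independently support the proposed causal chain."""
    supporting = sum([evidence.physical, evidence.historical, evidence.statistical])
    if supporting == 3:
        return CauseStatus.CONFIRMED
    if supporting == 2:
        return CauseStatus.PROBABLE
    if supporting == 1:
        return CauseStatus.SUSPECTED
    return CauseStatus.UNDETERMINED

# Premature teardown destroyed the wear evidence; CM data and trends remain.
status = classify(Evidence(physical=False, historical=True, statistical=True))
print(status.value)
```

Recording the status this way makes the loss of evidence visible in the FRACAS record instead of hiding it behind a confident-sounding root cause.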

A rigorous FRACAS root cause analysis process recognizes that real-world failures involve uncertainty, context, and interacting mechanisms. By grounding investigations in physical evidence, statistically credible reasoning, appropriate standards, and validation cycles that match asset behavior, FRACAS becomes a high-precision tool for eliminating failure, not endlessly documenting it.

Authors

Reliable Media

Reliable Media simplifies complex reliability challenges with clear, actionable content for manufacturing professionals.
Alison Field

Alison Field captures the everyday challenges of manufacturing and plant reliability through sharp, relatable cartoons. Follow her on LinkedIn for daily laughs from the factory floor.