Maintenance and Reliability Glossary

About this glossary: An independent, regularly updated reference of maintenance and reliability terminology covering 50 essential terms across maintenance strategies, reliability metrics, methodologies, condition monitoring techniques, software systems, and operational roles. Built for maintenance professionals, reliability engineers, students, and anyone needing clear definitions of industry terminology.

Quick Navigation

Maintenance Fundamentals – strategies and approaches
Reliability Metrics – KPIs and measurement
Methodologies and Frameworks – RCM, TPM, FMEA, and more
Software and Systems – CMMS, EAM, APM
Roles and Operations – people and processes
Condition Monitoring Techniques – predictive maintenance technologies

Maintenance Fundamentals

Preventive Maintenance (PM)

Maintenance performed at scheduled intervals – based on calendar time, runtime hours, or production cycles – to prevent equipment failure before it occurs.

PM tasks are planned in advance and follow defined procedures regardless of equipment condition at the time of execution. Common examples include scheduled lubrication, filter changes, belt replacements, and inspection routines. PM is most effective when applied to assets with predictable failure patterns and least effective when applied indiscriminately to assets with random failure modes – a common reason PM programs underperform expectations.

Corrective Maintenance

Maintenance performed to restore equipment to working order after a failure or detected defect.

Corrective maintenance includes both immediate emergency repairs and planned corrective work scheduled after a problem is identified but before failure occurs. The distinction between planned corrective work and reactive maintenance is important: planned corrective work can be prepared in advance, while reactive work is performed under emergency conditions. Mature maintenance programs minimize reactive maintenance while accepting that some level of corrective maintenance is normal and unavoidable.

Reactive Maintenance

Maintenance performed only after equipment has failed or stopped functioning, with no preventive or predictive activities to anticipate the failure.

Also called breakdown maintenance or fire-fighting maintenance. Reactive maintenance is the most expensive and disruptive maintenance approach because failures occur at unpredictable times, often require expedited parts and labor at premium prices, and frequently cause secondary damage to other components. Industry studies typically estimate that a reactive repair costs three to five times as much as the same work performed proactively.

Predictive Maintenance (PdM)

Maintenance triggered by equipment condition data – vibration readings, thermal images, oil analysis, ultrasonic measurements, or motor current signatures – that indicates an emerging problem before failure occurs.

PdM uses condition monitoring technology to predict the optimal time to perform maintenance, enabling planned intervention before failure rather than fixed-interval preventive work or post-failure reactive repair. Modern PdM increasingly incorporates machine learning and IoT sensor data to automate condition assessment and forecast remaining useful life.

Condition-Based Maintenance (CBM)

Maintenance performed based on real-time or near-real-time monitoring of equipment condition rather than fixed intervals.

CBM is closely related to predictive maintenance but typically refers to the broader strategy of using condition data – including manual inspections – to drive maintenance decisions. CBM programs can use simple operator rounds with checklists, scheduled inspections by reliability technicians, or fully automated continuous monitoring systems, depending on asset criticality and economics.

Run-to-Failure

A deliberate maintenance strategy in which equipment is operated until failure occurs, then repaired or replaced.

Run-to-failure is appropriate for low-criticality assets where the cost of preventive maintenance exceeds the cost of failure, such as easily replaceable consumables or redundant systems. Run-to-failure is a legitimate strategy when consciously chosen – it becomes a problem when it is the default outcome of inadequate maintenance planning rather than an explicit decision based on asset criticality. Within RCM methodology, run-to-failure is acceptable only for failure modes without safety or environmental consequences, and only when no proactive task is cost-effective. See our RCM methodology guide for the full consequence-based framework.

Maintenance Strategy

The overall approach an organization uses to manage equipment maintenance, typically combining preventive, predictive, condition-based, and run-to-failure tactics applied differently across asset categories based on criticality, failure modes, and economics.

Mature maintenance organizations use formal strategy frameworks like Reliability-Centered Maintenance (RCM) to assign appropriate strategies to different asset types rather than applying one approach universally. The “right” strategy varies by asset and circumstance – there is no single correct maintenance strategy. See our RCM methodology guide for a structured framework for selecting the right strategy for each asset.

Planned Maintenance

Any maintenance work that is scheduled in advance, with required parts, tools, procedures, and labor identified before execution.

Planned maintenance includes preventive, predictive, and planned corrective work – distinguished from unplanned reactive maintenance. The percentage of work that is planned versus unplanned is one of the most important indicators of maintenance maturity. Best-in-class organizations typically plan 80% or more of their work, while reactive organizations may plan less than 50%.

Unplanned Maintenance

Maintenance performed without prior scheduling, typically in response to unexpected equipment failure or urgent production needs.

High levels of unplanned maintenance indicate immature reliability programs and lead to higher costs, longer downtime, and increased safety risk. Reducing unplanned maintenance is one of the primary objectives of reliability improvement initiatives, achieved through better predictive monitoring, improved planning, and root cause elimination of recurring failures.

Proactive Maintenance

A maintenance approach focused on identifying and eliminating root causes of equipment failure rather than addressing symptoms.

Proactive maintenance combines predictive monitoring with reliability engineering practices to extend equipment life and reduce overall maintenance burden. Proactive maintenance treats individual equipment failures as data points to investigate rather than just problems to solve, using techniques like root cause analysis to systematically reduce future failures.

Reliability Metrics

MTBF (Mean Time Between Failures)

The average time an asset operates between failures, calculated as total operating time divided by number of failures during that period.

MTBF is one of the most widely used reliability metrics and is typically expressed in hours or days. Higher MTBF indicates better reliability. MTBF is most meaningful when applied to large populations of similar assets over significant time periods – applying MTBF to a single asset over short periods often produces misleading results because failure intervals vary significantly. See our MTBF and MTTR methodology guide for the full formula, worked examples, and common calculation mistakes.
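
To make the calculation concrete, here is a minimal Python sketch. The function name and the fleet figures are illustrative assumptions, not data from any study or CMMS.

```python
def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """Mean time between failures: operating time divided by number of failures."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined when no failures were recorded")
    return total_operating_hours / failure_count


# Hypothetical fleet: 10 similar pumps, 8,000 operating hours each, 16 failures in total.
fleet_hours = 10 * 8_000
print(mtbf(fleet_hours, 16))  # 5000.0
```

Pooling hours across the whole fleet reflects the point above: the figure is more stable for a population of similar assets than for one asset over a short period.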

MTTR (Mean Time to Repair)

The average time required to repair a failed asset and return it to service, calculated as total repair time divided by number of repairs.

MTTR measures maintainability and is a key indicator of how quickly an organization responds to and resolves equipment failures. Lower MTTR indicates better maintainability. MTTR is influenced by parts availability, technician skill, accessibility of equipment, and quality of maintenance procedures – improving any of these factors typically reduces MTTR. See our MTBF and MTTR methodology guide for the full formula, the important MTTR-versus-MDT distinction, and worked examples.
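
A minimal Python sketch of the same calculation follows. The repair durations are made-up values, and what counts as "repair time" (for example, whether waiting for parts is included) is an assumption that each organization defines for itself.

```python
from typing import Sequence


def mttr(repair_hours: Sequence[float]) -> float:
    """Mean time to repair: total repair time divided by number of repairs."""
    if not repair_hours:
        raise ValueError("MTTR is undefined when no repairs were recorded")
    return sum(repair_hours) / len(repair_hours)


# Hypothetical repair durations, in hours, for four failures of one asset class.
repairs = [2.0, 4.5, 1.5, 4.0]
print(mttr(repairs))  # 3.0
```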

MTTF (Mean Time to Failure)

The average time a non-repairable asset operates before failure, calculated as the total operating time of all units divided by the number of units that failed.

MTTF applies to single-use components or assets that are replaced rather than repaired, while MTBF applies to assets that are repaired and returned to service. The distinction matters for spare parts planning and for assets where replacement cost is similar to repair cost.

MTBR (Mean Time Between Replacements)

The average time between component replacements for a given asset, used particularly for components that are replaced as a maintenance action rather than repaired.

MTBR is useful for spare parts planning and lifecycle cost analysis. For example, MTBR for a centrifugal pump’s mechanical seal helps determine inventory levels, replacement budgets, and optimal preventive replacement intervals.

Availability

The percentage of time an asset is operating or capable of operating when required, calculated as MTBF divided by the sum of MTBF and MTTR.

Availability is one of the three components of OEE and is a primary measure of equipment effectiveness in production environments. Availability of 95% means an asset is ready when needed 95% of the time. Availability is distinct from reliability – an asset can have high availability through fast repairs even if it fails frequently. See our MTBF and MTTR methodology guide for the inherent versus operational availability distinction.
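
The formula, and the difference between availability and reliability, can be sketched in a few lines of Python. The function name and both sets of figures are illustrative assumptions.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as MTBF / (MTBF + MTTR), returned as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Two hypothetical assets: one fails often but is repaired quickly,
# the other fails ten times less often but takes ten times longer to repair.
frequent_fast = inherent_availability(mtbf_hours=95, mttr_hours=5)
rare_slow = inherent_availability(mtbf_hours=950, mttr_hours=50)
print(frequent_fast, rare_slow)  # 0.95 0.95
```

Both assets show 95% availability even though the first fails ten times as often, which is why availability alone says little about reliability.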

Reliability

The probability that an asset will perform its intended function for a specified period under stated operating conditions.

Reliability is typically expressed as a percentage and is closely related to but distinct from availability – reliability measures freedom from failure, while availability measures readiness to operate. An asset can have high reliability and low availability if repairs take a long time when failures do occur, or low reliability and high availability if repairs are fast.
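
As a rough illustration, reliability over a mission time can be estimated from MTBF if the failure rate is assumed to be constant (the exponential model). That assumption is ours for the sake of the sketch; it does not hold for assets with strong infant-mortality or wear-out behavior.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving mission_hours, assuming a constant failure rate."""
    return math.exp(-mission_hours / mtbf_hours)


# Under this model, an asset run for a period equal to its MTBF
# survives without failure only about 37% of the time.
print(round(reliability(1000, 1000), 3))  # 0.368
```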

Maintainability

The ease and speed with which an asset can be restored to operating condition after a failure.

Maintainability is influenced by equipment design, accessibility, parts availability, technician skill, and procedural quality, and is measured primarily through MTTR. Designing for maintainability – ensuring that components can be accessed, removed, and replaced efficiently – is one of the highest-leverage reliability improvements available during equipment design phases.

OEE (Overall Equipment Effectiveness)

A composite metric measuring how effectively equipment is used in production, calculated as the product of three factors: Availability, Performance, and Quality.

OEE expresses production effectiveness as a single percentage where 100% would indicate perfect production with no downtime, no slow cycles, and no defects. World-class OEE is generally considered to be 85% or higher. OEE is widely used in TPM programs and is a foundational metric in lean manufacturing. See our OEE methodology guide for the full formula, worked examples, and honest critique of how OEE is commonly misused.
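
A minimal Python sketch of the calculation, using made-up factor values expressed as fractions:

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """OEE as the product of its three factors, each given as a fraction (0 to 1)."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Hypothetical line: 90% availability, 95% performance, 99% quality.
print(round(oee(0.90, 0.95, 0.99), 3))  # 0.846
```

Because the factors multiply, three individually strong numbers still land just under the 85% benchmark in this example.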

PM Compliance

The percentage of scheduled preventive maintenance work orders completed on time, within their target window.

PM compliance above 90% indicates a mature maintenance program; rates below 70% typically indicate planning, resource, or prioritization problems. Low PM compliance is often a leading indicator of future reliability problems because deferred PMs eventually translate into failures that could have been prevented.
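
One way to compute the metric from work order records is sketched below in Python. The `PMWorkOrder` class and its fields are hypothetical, not the schema of any particular CMMS, and the sketch assumes the target window is represented by its end date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PMWorkOrder:
    window_end: date                      # last day of the target window
    completed_on: Optional[date] = None   # None while the work order is still open


def pm_compliance(orders: List[PMWorkOrder]) -> float:
    """Percentage of scheduled PM work orders completed on or before window_end."""
    if not orders:
        raise ValueError("no scheduled PM work orders in the period")
    on_time = sum(
        1 for wo in orders
        if wo.completed_on is not None and wo.completed_on <= wo.window_end
    )
    return 100.0 * on_time / len(orders)


orders = [
    PMWorkOrder(date(2026, 3, 10), date(2026, 3, 9)),   # on time
    PMWorkOrder(date(2026, 3, 15), date(2026, 3, 15)),  # on time, last day of window
    PMWorkOrder(date(2026, 3, 20), date(2026, 3, 27)),  # completed late
    PMWorkOrder(date(2026, 3, 25)),                     # still open
]
print(pm_compliance(orders))  # 50.0
```

Note that open and late work orders both count against compliance here; some organizations report them separately.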

Wrench Time

The percentage of a maintenance technician’s shift spent actively performing maintenance work – turning wrenches – as opposed to traveling, waiting for parts, getting instructions, or performing administrative tasks.

Industry studies typically find wrench time of 25-35% in poorly run organizations and 55-65% in mature ones. Improving wrench time through better planning, kitting, and scheduling is one of the highest-leverage maintenance productivity improvements because it does not require additional headcount.

Methodologies and Frameworks

RCM (Reliability-Centered Maintenance)

A structured methodology for determining the optimal maintenance strategy for each asset based on its function, failure modes, and consequences of failure.

RCM uses systematic analysis – typically following SAE JA1011 standards – to assign maintenance tasks that genuinely address failure modes rather than applying generic preventive schedules. RCM analysis can be time-intensive but typically produces maintenance strategies that are both more effective and more efficient than less rigorous approaches. See our RCM methodology guide for the seven SAE JA1011 questions, the four consequence categories, the proactive task hierarchy, and worked examples.

TPM (Total Productive Maintenance)

A holistic equipment management approach originating in Japanese manufacturing that involves operators in routine maintenance tasks, emphasizes equipment effectiveness through OEE, and pursues zero unplanned downtime.

TPM is built on eight pillars: autonomous maintenance, planned maintenance, focused improvement, early equipment management, quality maintenance, training and education, safety, and TPM in administration. TPM and RCM are complementary rather than competing approaches.

FMEA (Failure Mode and Effects Analysis)

A systematic technique for identifying potential failure modes in an asset or process, evaluating their causes and effects, and prioritizing them based on severity, occurrence, and detection.

FMEA produces a Risk Priority Number (RPN) or Action Priority (AP) that guides where to focus maintenance and reliability improvement efforts. FMEA is used both during equipment design (Design FMEA) and ongoing operations (Process FMEA), and is foundational to RCM analysis. See our FMEA methodology guide for the AIAG-VDA 2019 seven-step approach, the RPN-versus-AP distinction, and worked examples.
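
The RPN is conventionally the product of the severity, occurrence, and detection scores. The Python sketch below assumes the common 1-10 scale for each score; the failure modes and their scores are invented for illustration.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: severity x occurrence x detection, each scored 1-10."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError(f"scores must be between 1 and 10, got {score}")
    return severity * occurrence * detection


# Hypothetical failure modes with (severity, occurrence, detection) scores.
failure_modes = {
    "bearing seizure": (8, 4, 3),  # RPN 96
    "seal leak": (5, 6, 2),        # RPN 60
    "coupling wear": (6, 3, 7),    # RPN 126
}

# Rank failure modes from highest to lowest RPN.
ranked = sorted(failure_modes, key=lambda m: rpn(*failure_modes[m]), reverse=True)
print(ranked)  # ['coupling wear', 'bearing seizure', 'seal leak']
```

In this example the hard-to-detect coupling wear outranks the more severe bearing seizure, which illustrates one reason a pure product of scores is often reviewed alongside severity on its own.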

Root Cause Analysis (RCA)

A problem-solving methodology used to identify the underlying causes of equipment failures or operational problems rather than just addressing symptoms.

RCA techniques include the 5 Whys, fishbone diagrams, fault tree analysis, and formal investigation processes. Effective RCA distinguishes between proximate causes (what immediately caused the failure) and root causes (what underlying conditions allowed the proximate cause to occur). Without addressing root causes, the same failures recur.

5 Whys

A simple root cause analysis technique that involves asking “why” repeatedly – typically five times – to drill from a surface symptom down to the underlying cause of a problem.

Originally developed at Toyota as part of its production system, the 5 Whys is widely used because it requires no special tools and surfaces causes that single-pass analysis often misses. The technique works best when team members from different disciplines participate, since each “why” tends to expose a different facet of the problem.

Bathtub Curve

A graph showing the failure rate of a population of assets over time, named for its characteristic shape: high failure rates early in life (infant mortality), a low and stable failure rate during normal operating life, and rising failure rates as assets reach end of life (wear-out).

Modern reliability analysis has shown that many assets do not actually follow this pattern. The seminal study by Nowlan and Heap for United Airlines found that only about 4% of the aircraft components studied followed the classic bathtub curve and only about 11% showed any age-related wear-out pattern, while the remaining 89% showed no wear-out zone and failed at random or predominantly early in life. This finding fundamentally changed how reliability professionals approach maintenance strategy. See our RCM methodology guide for the full Nowlan-Heap context and how it shaped modern reliability practice.

P-F Curve

A graph showing the period between when a potential failure becomes detectable (Point P) and when functional failure occurs (Point F).

The P-F interval determines how often condition monitoring inspections need to occur to reliably catch developing failures before they cause functional failure. If the P-F interval is six months, monthly inspections are appropriate. If it is two days, only continuous monitoring will reliably catch failures in time.
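
A widely used rule of thumb, assumed here rather than taken from this glossary, is to inspect at no more than half the P-F interval so that at least one inspection falls inside the window with time left to respond. A minimal Python sketch:

```python
def max_inspection_interval_days(pf_interval_days: float, divisor: float = 2.0) -> float:
    """Upper bound on the inspection interval: the P-F interval divided by a safety divisor."""
    if pf_interval_days <= 0 or divisor < 1:
        raise ValueError("P-F interval must be positive and divisor at least 1")
    return pf_interval_days / divisor


print(max_inspection_interval_days(180))  # 90.0 (monthly rounds sit well inside this bound)
print(max_inspection_interval_days(2))    # 1.0  (daily rounds leave almost no time to react)
```

The result is only an upper bound. The time needed to plan and carry out the repair also has to fit inside the remaining P-F window, which is why very short intervals push toward continuous monitoring.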

Asset Criticality Analysis

A systematic process for ranking assets based on the consequences of their failure – including safety, environmental, production, quality, and cost impacts.

Criticality analysis drives where maintenance resources should be concentrated and which assets warrant predictive monitoring versus run-to-failure approaches. A formal criticality assessment typically produces a ranked list with categorical labels (high, medium, low) or numerical scores that align maintenance investment with actual business risk.
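
The scoring step can be sketched as a weighted average of consequence ratings. Everything specific in this Python example is a hypothetical choice: the category weights, the 1-5 rating scale, the label thresholds, and the two assets.

```python
from typing import Dict

# Hypothetical weights (in percent) for each consequence category; they sum to 100.
WEIGHTS = {"safety": 30, "environmental": 20, "production": 25, "quality": 10, "cost": 15}


def criticality_score(ratings: Dict[str, int]) -> float:
    """Weighted average of 1-5 consequence ratings across all categories."""
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS) / 100


def criticality_label(score: float) -> str:
    """Map a score to a label using assumed thresholds of 4.0 and 2.5."""
    if score >= 4.0:
        return "high"
    if score >= 2.5:
        return "medium"
    return "low"


assets = {
    "boiler feed pump": {"safety": 5, "environmental": 4, "production": 5, "quality": 3, "cost": 4},
    "office exhaust fan": {"safety": 1, "environmental": 1, "production": 1, "quality": 1, "cost": 2},
}

# Print assets from most to least critical.
for name in sorted(assets, key=lambda n: criticality_score(assets[n]), reverse=True):
    score = criticality_score(assets[name])
    print(name, score, criticality_label(score))
# boiler feed pump 4.45 high
# office exhaust fan 1.15 low
```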

Software and Systems

CMMS (Computerized Maintenance Management System)

Software that manages the day-to-day execution of maintenance work, including work order creation and tracking, preventive maintenance scheduling, parts and inventory management, asset records, and maintenance reporting.

CMMS is distinguished from EAM by its focus on maintenance execution rather than full asset lifecycle management. Common CMMS platforms include MaintainX, Limble, UpKeep, Coast, Fiix, and eMaint. See our CMMS comparison guide for detailed evaluations.

EAM (Enterprise Asset Management)

Software that manages the full lifecycle of physical assets, including procurement, depreciation, maintenance, capital planning, compliance, and end-of-life decisions.

EAM encompasses CMMS functionality but extends to financial accountability, multi-site enterprise management, and integration with ERP and procurement systems. Common EAM platforms include IBM Maximo, SAP EAM, Oracle EAM, Hexagon EAM (formerly Infor EAM), and AVEVA. See our EAM comparison guide and CMMS vs EAM explainer.

APM (Asset Performance Management)

Software that applies analytics, machine learning, and condition monitoring data to optimize asset reliability and performance.

APM platforms typically integrate with EAM and CMMS systems to provide predictive maintenance insights, reliability analytics, and asset health scoring. APM is increasingly being absorbed into broader EAM platforms – IBM Maximo Application Suite, AVEVA, and Hexagon all include APM capabilities – but standalone APM platforms remain available. See our APM comparison guide and APM vs CMMS explainer for detailed evaluations.

Work Order

A formal instruction to perform specific maintenance work on an asset, including the tasks to be completed, parts and tools required, labor estimated, and procedures to follow.

Work orders are the fundamental transaction unit of maintenance management and the primary record of maintenance activity. Well-structured work orders enable accurate cost tracking, performance measurement, and historical analysis. Poorly structured work orders make maintenance data nearly useless for analysis.

Asset Hierarchy

The structured organization of assets within a CMMS or EAM, typically arranged in parent-child relationships from the site or facility level down through systems, equipment, and individual components.

A well-designed asset hierarchy enables accurate failure tracking, cost roll-up, and reporting at meaningful levels of aggregation. Most reliability analytics depend on a properly structured asset hierarchy, and rebuilding hierarchies after CMMS deployment is one of the most expensive and disruptive corrections an organization can make.
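
Cost roll-up through a parent-child hierarchy can be sketched as a simple tree in Python. The `AssetNode` class, the asset names, and the cost figures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssetNode:
    name: str
    direct_cost: float = 0.0  # maintenance cost booked directly against this node
    children: List["AssetNode"] = field(default_factory=list)

    def total_cost(self) -> float:
        """Direct cost of this node plus the rolled-up cost of all descendants."""
        return self.direct_cost + sum(child.total_cost() for child in self.children)


# Hypothetical hierarchy: site > system > equipment > component.
pump = AssetNode("Pump P-101", children=[
    AssetNode("Motor", direct_cost=1200.0),
    AssetNode("Mechanical seal", direct_cost=300.0),
])
cooling = AssetNode("Cooling water system", children=[
    pump,
    AssetNode("Fan F-201", direct_cost=450.0),
])
site = AssetNode("Plant A", children=[cooling])

print(pump.total_cost())  # 1500.0
print(site.total_cost())  # 1950.0
```

If costs are booked against the wrong level, or components are missing from the tree, every roll-up above them is distorted, which is why the structure matters so much for analytics.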

Bill of Materials (BOM)

A structured list of components, parts, and materials required for an asset or maintenance task.

In maintenance contexts, BOMs identify the spare parts associated with a specific asset and enable accurate parts planning, kitting for work orders, and inventory management. Maintenance BOMs differ from manufacturing BOMs in their focus on consumable parts, wear items, and items needed for repair rather than parts that constitute the finished product.

Roles and Operations

Maintenance Planner

A maintenance professional responsible for preparing maintenance work – defining scope, identifying parts and tools, estimating labor, writing procedures, and ensuring work is ready to execute efficiently.

Effective planning is one of the highest-leverage activities in maintenance operations and directly drives wrench time and PM compliance. Industry research consistently shows that organizations with dedicated planners achieve significantly higher productivity than organizations where technicians plan their own work, with typical multipliers of 2-3x in wrench time.

Reliability Engineer

An engineering professional responsible for improving asset reliability through analysis of failure data, root cause investigation, asset criticality analysis, and reliability-centered maintenance.

Reliability engineers focus on eliminating root causes of failure rather than executing maintenance work directly. The role typically requires both engineering credentials and deep understanding of maintenance practices, and serves as the bridge between maintenance execution and engineering design improvements.

Maintenance Technician

A skilled trades professional who performs hands-on maintenance work on equipment, including repairs, preventive maintenance tasks, troubleshooting, and condition assessments.

Maintenance technicians typically specialize in mechanical, electrical, instrumentation, or multi-craft disciplines. Modern technicians increasingly work with digital tools – CMMS mobile apps, condition monitoring instruments, smart glasses, and remote expert support systems – alongside traditional hand tools.

Maintenance Manager

The leader responsible for overall maintenance operations at a facility or organization, including strategy, budget, staffing, and coordination with operations and engineering.

Maintenance managers balance reliability improvement, cost control, and production demands under significant constraints. The role has evolved significantly in recent decades from primarily tactical execution toward strategic asset management, reliability improvement, and integration with broader business objectives.

MRO (Maintenance, Repair, and Operations)

The category of supplies, parts, equipment, and services used to maintain and repair production assets but not incorporated into finished products.

MRO inventory typically includes spare parts, consumables, tools, lubricants, and PPE, and is managed distinctly from production materials. MRO inventory management is a significant operational concern in industrial operations, with carrying costs, stockouts, and obsolescence all creating financial pressure that requires balanced solutions.

Backlog

The accumulated work orders that have been identified but not yet completed, typically measured in labor weeks.

A healthy maintenance backlog is generally considered to be 2-4 weeks of work – too little indicates poor work identification (problems are not being found), while too much indicates insufficient resources or poor prioritization. Backlog management is a core responsibility of maintenance planners and managers.
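
Backlog in weeks is usually derived by dividing the estimated labor hours in the backlog by the crew's weekly capacity. The Python sketch below assumes that approach; the crew size and hours are made-up figures.

```python
def backlog_weeks(backlog_labor_hours: float,
                  technicians: int,
                  available_hours_per_tech_per_week: float) -> float:
    """Backlog expressed in weeks of work for the available crew."""
    weekly_capacity = technicians * available_hours_per_tech_per_week
    if weekly_capacity <= 0:
        raise ValueError("weekly capacity must be positive")
    return backlog_labor_hours / weekly_capacity


# Hypothetical crew: 8 technicians at 40 available hours each, 960 estimated hours of open work.
print(backlog_weeks(960, 8, 40))  # 3.0
```

Three weeks falls inside the 2-4 week range described above. The result is only as good as the labor estimates on the open work orders.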

Condition Monitoring Techniques

Vibration Analysis

A condition monitoring technique that measures vibration patterns in rotating equipment to detect developing faults including imbalance, misalignment, bearing defects, looseness, and gear problems.

Vibration analysis is the most widely used predictive maintenance technique for rotating machinery and typically uses accelerometers to capture vibration signals, which are then converted into frequency-domain spectra for diagnosis. Modern vibration analysis ranges from periodic route-based collection by analysts to continuous online monitoring with automated diagnostics, with the right approach determined by asset criticality and failure mode characteristics.

ISO 10816 / ISO 20816

International standards that define vibration severity zones for evaluating machine condition based on overall vibration measurements.

ISO 10816 is the legacy standard being progressively replaced by ISO 20816, which provides updated guidance for different machine classes including reciprocating machines, gas turbines, and large industrial equipment. The standards define four vibration severity zones (A, B, C, D) corresponding to good, satisfactory, unsatisfactory, and unacceptable conditions.

Ultrasonic Testing (UT)

A non-destructive testing technique that uses high-frequency sound waves to detect internal defects in materials, measure thickness, and identify corrosion.

UT is widely used in pipeline integrity management, pressure vessel inspection, and structural assessments. The technique sends ultrasonic waves into a material and measures the reflected signals, with discontinuities and defects creating characteristic reflection patterns. UT is distinct from airborne ultrasound, which detects sound transmitted through air.

Airborne Ultrasound

A condition monitoring technique that detects high-frequency sound emissions in air to identify compressed air leaks, vacuum leaks, electrical discharge (corona, arcing, tracking), and mechanical problems including bearing faults and steam trap failures.

Airborne ultrasound is distinct from ultrasonic testing, which uses ultrasound transmitted through materials. Airborne ultrasound has expanded significantly as a maintenance technology because it identifies high-cost problems – particularly compressed air leaks and electrical discharge – that are otherwise invisible to operators and maintenance teams.

Thermography (Infrared Thermography)

A condition monitoring technique that uses infrared cameras to visualize temperature differences on equipment, identifying problems including loose electrical connections, overloaded circuits, bearing faults, insulation failures, and refractory damage.

Thermography is widely used in electrical inspection, mechanical equipment monitoring, and building envelope assessment. The technique is particularly valuable for electrical inspections where loose connections create heat well before they fail, allowing repair during planned outages rather than after equipment damage. See our thermal camera comparison guide.

Oil Analysis

A condition monitoring technique that analyzes lubricant samples to assess equipment health, lubricant condition, and contamination levels.

Oil analysis tests typically include wear metals analysis (which metals are present in the oil indicates which components are wearing), viscosity testing, particle counting, water content, and additive depletion, providing insight into both lubricant performance and equipment condition. See our oil analysis labs guide.

Motor Current Signature Analysis (MCSA)

A condition monitoring technique that analyzes electrical current signatures from motors to detect mechanical and electrical faults including broken rotor bars, air gap eccentricity, bearing problems, and stator winding issues.

MCSA is non-intrusive and can detect faults without requiring physical access to the motor. The technique uses the motor itself as a transducer – fault conditions modulate the current signature in characteristic patterns that can be identified through frequency-domain analysis of the current waveform.
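
As one example of a characteristic pattern, broken rotor bars in an induction motor are commonly associated with sidebands around the supply frequency at f(1 ± 2s), where s is the motor slip. The Python sketch below uses that textbook relation; the motor parameters are hypothetical, and real diagnosis also depends on sideband amplitude and load.

```python
from typing import Tuple


def synchronous_speed_rpm(supply_hz: float, poles: int) -> float:
    """Synchronous speed of an induction motor: 120 * f / number of poles."""
    return 120.0 * supply_hz / poles


def slip(supply_hz: float, poles: int, rotor_rpm: float) -> float:
    """Per-unit slip: (synchronous speed - rotor speed) / synchronous speed."""
    n_sync = synchronous_speed_rpm(supply_hz, poles)
    return (n_sync - rotor_rpm) / n_sync


def rotor_bar_sidebands_hz(supply_hz: float, poles: int, rotor_rpm: float) -> Tuple[float, float]:
    """Lower and upper first-order sideband frequencies, f * (1 -/+ 2s)."""
    s = slip(supply_hz, poles, rotor_rpm)
    return supply_hz * (1 - 2 * s), supply_hz * (1 + 2 * s)


# Hypothetical 4-pole motor on a 60 Hz supply, running at 1,764 rpm (2% slip).
lower, upper = rotor_bar_sidebands_hz(60.0, 4, 1764.0)
print(round(lower, 2), round(upper, 2))  # 57.6 62.4
```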

Shaft Alignment

The process of positioning two or more rotating shafts so that their centerlines are collinear when the machine is running under normal operating conditions.

Proper shaft alignment reduces vibration, extends bearing and seal life, lowers energy consumption, and prevents premature failure of coupled equipment including pumps, motors, fans, and gearboxes. Misalignment is one of the most common root causes of premature rotating equipment failure, and precision alignment is one of the highest-leverage reliability improvement practices.

Laser Shaft Alignment

A precision shaft alignment method that uses laser-based measurement systems to determine and correct misalignment between coupled shafts.

Laser alignment tools provide significantly higher accuracy than dial indicators and have become the standard method for precision alignment work in industrial maintenance. Modern laser alignment systems include wireless units, automatic measurement modes, and integrated soft foot detection, making precision alignment faster and more reliable than traditional methods.

Soft Foot

A condition where a machine’s mounting feet do not all sit flat on the baseplate when bolted down, causing distortion of the machine frame when bolts are tightened.

Soft foot creates alignment problems that cannot be corrected through normal alignment procedures and must be diagnosed and corrected before precision alignment can be achieved. Common causes include warped baseplates, dirt or debris under feet, machined surface irregularities, and incorrect shimming. Modern laser alignment systems include automatic soft foot detection.

Suggesting an Addition

This glossary is updated regularly as terminology evolves and new techniques emerge in maintenance and reliability practice. If you would like to suggest a term to be added or refined, please reach out to the Reliable Magazine editorial team. The goal of this resource is to be the most accurate, useful, and accessible reference for maintenance and reliability terminology available.

Last updated: May 2, 2026. This glossary is editorial reference content from Reliable Magazine. No vendor paid for inclusion or editorial input.

Author

Reliable Media

Reliable Media simplifies complex reliability challenges with clear, actionable content for manufacturing professionals.