Failure Time Frames Help Maintenance Teams Plan and Save Resources

by Mark Kingkade | Articles, Maintenance and Reliability, Predictive Maintenance

For decades, the debate over how people report their machine findings when using different monitoring technologies has been ongoing. Do you link a failure date to your report when asked or not? Do you even have a discussion about a projected time frame for the repairs needed?

The Foundation of Effective Vibration Programs

I started my vibration career in 1995. Since I left my first in-house vibration program, I have been told not to add a timestamp to my reports on machine failures. During my early years, many of my discussions were around the time to complete failure. I was told part of my job was to help the plant spend its limited resources as effectively as possible.

By resources, they meant maintenance dollars, manpower hours, parts dollars and availability, and machine downtime windows. We had two shutdowns each year: one seven-day outage and one ten-day outage. These outages were used to address more complex repairs that could not always be performed during normal operating schedules.

As a result, I was taught to use rate of change, historical data (we kept everything), outreach to equipment SMEs, and so on. I was taught to use all my senses when collecting data and to take notes on what I saw, felt, heard, and so on. These notes were added to the vibration database and linked to assets, and at times to specific points.

Also, I was taught to talk with the local maintenance personnel and machine operators to help identify issues that occurred between my route collections. We used a combination of band alarms, B sub-S alarms, and statistical alarms as the first filter. Once we had tripped alarms, sensory findings, and comments from maintenance or operators, the reviews began.
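To make the "first filter" idea concrete, here is a minimal Python sketch. It assumes a statistical alarm defined as the historical mean plus three sample standard deviations and a single fixed band limit. Both definitions, the function names, and the readings are illustrative assumptions; the article does not say how the alarms were actually configured.

```python
from statistics import mean, stdev


def statistical_alarm_limit(history, k=3.0):
    """Assumed definition: mean + k * sample standard deviation of past readings."""
    if len(history) < 2:
        raise ValueError("need at least two historical readings")
    return mean(history) + k * stdev(history)


def first_filter(reading, history, band_limit, k=3.0):
    """Return the names of the alarms that this reading trips."""
    tripped = []
    if reading > band_limit:
        tripped.append("band")
    if reading > statistical_alarm_limit(history, k):
        tripped.append("statistical")
    return tripped


# Hypothetical overall velocity readings (in/sec) for one measurement point.
history = [0.08, 0.09, 0.08, 0.10, 0.09, 0.08]
print(first_filter(0.15, history, band_limit=0.20))
```

In this made-up case the reading is still below the fixed band limit but well outside the point's own history, so only the statistical alarm trips. That is the kind of finding that would then go to review alongside notes and operator comments.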

This way of running vibration programs has changed with the increase in online 24/7 programs today. The biggest change is that in some cases, we are no longer able to be present at a machine or system. In some cases, we also do not have particularly good communication, if any, with operations or maintenance techs.

Building Relationships Across the Organization

I was also taught that I needed to build relationships with everyone involved in operating and maintaining the machines. These relationships ran from the operations floor to the front office, from the floor maintenance tech to the supervisor, and up to the maintenance manager. In some organizations, my communications extended to corporate levels.

I also needed to help educate all these different groups about what I was doing and what I and my data could and could not do. There are still many misconceptions about this topic.

The Case Against Avoiding Time Frames

When people say you shouldn’t set time frames for repairs, I ask why. The usual answer is that if you do, you are being set up as a scapegoat if the machine fails before the time you gave them. My response is that they do not fully understand the value of your data or how to apply it correctly to their operations. This is where the first steps of customer education and relationships come in.

In some cases, it may also indicate a lack of machine education for the vibration analyst. If you cannot use your data to help the customer develop an action plan and a time frame for repairing the asset or implementing a design/operation change, it also devalues your service in my eyes. I have always said identification of the developing defective component is the easy part, especially if I have the correct component information. However, many places and companies are also telling people that this data is no longer needed with the use of AI. For me, that is another discussion.

When the Site Won’t Commit to Reliability

If, after all this, you still feel the site is set up to point fingers and deflect “Blame” for failure to others, then they are not committed to Reliability. They are just parts changers with no long-term reliability vision for their organization, and every failure mode identified becomes another “Emergency Repair”.

Do yourself a favor and try to move away from working for them. At the very least, understand the operation, “Their Vision,” and how it will impact you. Run your routes or vibration-monitoring process, point out defects and the need for repair, and move on. When things fail that you have identified, do not worry; just point out that the defect had been identified and move on. If you missed it, figure out why and adjust your monitoring for this asset so it will not happen again, and move on.

It will not be a healthy relationship for you or your employer, and never will be, because the operation becomes a CYA exercise.

Dealing with the “Blame Game”

I had a customer like this early on in my career. The first time I walked onto the site, I was told, “Corporate said I have to have you monitor my machines. I have been running this plant successfully for 25 years and really do not need you to tell me how to do things. You can come in, do your thing, and drop off the report, or you can throw it in the trash can and save me the trouble. I have made ticket every month.”

Several months into this assignment, I found out that “Ticket” was based on a 5-day operation. So if they had breakdowns, they just ran full shifts through the weekend to make up for the lost production. At the end of the month, if they produced the set amount of product, they made “Ticket”. I ran that route for several years until a reorganization, and I was moved out of the plant to cover others. The plant never changed operations until that maintenance manager quit.

However, I built strong relationships with the operations manager, the operators, the maintenance techs, and the planner. We developed workarounds over time to get work done. The planner and operations manager told me that at times they were concerned about failures and the possible impact on their careers.

What I helped them do to CYA (Cover Their Behind) was suggest they write Work Requests and put them in the system. Then, when they had failures, at least they could point to the fact that the “Work Request” had been entered and was never approved by the maintenance manager to move it into a “Work Order”.

When Blame Replaces Accountability

In another case, the maintenance manager was not fully committed to using the data he was provided and was always looking for scapegoats for failures. Vibration missed it. They failed to properly identify the severity of the defect. Operators are not running things correctly. Oil analysis is wrong. The manufacturer said there was nothing wrong with that machine.

The Catastrophic Gearbox Failure

The site’s vibration person flagged a developing issue in a gearbox, and it failed several months later. During the RCA after the catastrophic failure, almost every one of these excuses was given. Two days before the failure, the vibration related to the bearing cage frequency had jumped by around 1000%. The amplitude levels were not extremely high at this frequency, but the bearing was running at around 56 rpm in a larger gearbox, and the data were not collected directly next to the bearing. The change was there.
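For a sense of scale, the sketch below uses the standard kinematic formula for cage (fundamental train) frequency with a stationary outer race, and converts a percent increase into an amplitude ratio. The bearing geometry (ball diameter, pitch diameter, contact angle) is a hypothetical placeholder, since the article does not give the bearing details. It also assumes that "jumped by 1000%" means an increase of 1000% over the prior level.

```python
import math


def cage_frequency_cpm(shaft_rpm, ball_dia, pitch_dia, contact_angle_deg=0.0):
    """Cage (fundamental train) frequency in cycles per minute, outer race fixed."""
    ratio = (ball_dia / pitch_dia) * math.cos(math.radians(contact_angle_deg))
    return 0.5 * shaft_rpm * (1.0 - ratio)


def percent_increase_to_ratio(percent):
    """A 1000% increase means the new value is 11 times the old one."""
    return 1.0 + percent / 100.0


# Hypothetical geometry: ball diameter is one fifth of the pitch diameter.
ftf_cpm = cage_frequency_cpm(56, ball_dia=1.0, pitch_dia=5.0)
print(f"cage frequency: {ftf_cpm:.1f} cpm ({ftf_cpm / 60:.2f} Hz)")
print(f"amplitude ratio after a 1000% jump: {percent_increase_to_ratio(1000):.0f}x")
```

With these assumed proportions the cage frequency at 56 rpm is roughly 22 cycles per minute, under 0.4 Hz. At such a low frequency the absolute amplitude can stay small even while the relative change is very large, which is why the trend matters more than the raw level here.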

Also, during the run-up to the failure, the oil filtration system kicked out because of plugged filters. The quarterly oil reports indicated extreme gear wear. The sampling interval was shortened during the run-up to the total failure, but the maintenance manager continued to challenge the accuracy of the results. When the reports were sent to the manufacturer, the response was that this was just break-in wear and that they should not be concerned. They never actually visited the site to do a visual inspection of the gears.

This was the result of the failure to address what the data was telling them. In my mind, the way the maintenance manager treated this analyst took a mental toll on him as he retraced his steps to figure out what he could have done differently to prevent this. Sometimes we will be put in a difficult situation and have to use those events to understand that there was nothing we could have done differently and to move on.

When Time-Frame Estimates Led to Better Outcomes

A bearing defect was identified in this gearbox related to the 4th shaft bottom bearing. The gearbox in question did have what I felt was a design flaw, based on other failures I had been involved in. At the time, they did not have a spare box on site or in the corporate supply chain.

They did have a gearbox from a plant that had been shut down, but it also had other issues that made it a less-than-desirable long-term replacement option for this location. The site’s maintenance manager planned to upgrade the drive and base to a more robust design, which required an extended outage window.

[Photo: Electric motor and gearbox]

Understanding the Design Flaw

When you look at this base, you will notice it sits on top of the floor and is basically just an I-beam framework with a plated top. The motor base is also just a bent plate made from ¼ inch flat stock. It has a couple of cross braces under it that you cannot see. The motor base would flex during startup, which affected the alignment.

Over time, the motor would shift, and the alignment would need to be adjusted. The fundamental design issue here was that the base probably weighed only about as much as the drive, not the typical three times the drive weight I was taught back in my early days. Going back to the machine’s history, when it was started under full load without following the correct procedure, it would eventually shift the motor.

Over time, these small shifts would result in alignment issues that would blow the grid coupling apart if we were not allowed to adjust the alignments. This was an across-the-line start with no way to reduce the torque/shock load, which also compounded the design issues.

When the bearing issue first developed, the question I was asked was how long it would last in the present condition. The maintenance manager had a long-term vision: a frame and foundation design change to address all the issues with this drive. New style gearbox, frame, and motor base.

Monitoring Through Delayed Shutdowns

The discussion first centered on “Will it last until the summer shutdown in June?” My response was yes, if we can keep it from tripping out and going down under load. Every time this happens, we will need to reevaluate things. The drive was also located in a hazardous area, so no welding or spark-producing work could be done without a complete area purge.

Summer shutdown was about a month away, and it was announced that “Margins were too good to shut down”. We were told they were going to push the outage out for a couple of months. That time came, and the outage was pushed out again. There was another round of conversations about time frames.

My feeling was that we were on the edge with no local spare, but I was not seeing a doubling in the vibration levels; it was a slow increase. We were doing monthly vibration routes and bumped up the oil sampling rate. We talked about the other gearbox, the one at the plant that had been shut down.
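The distinction between a slow increase and a doubling can be expressed as a simple route-to-route check. The sketch below is an illustration only: the factor of two comes from the text, while the function name and the monthly readings are made up.

```python
def step_change_indices(readings, step_ratio=2.0):
    """Return indices of readings at least step_ratio times the previous reading."""
    flagged = []
    for i in range(1, len(readings)):
        previous, current = readings[i - 1], readings[i]
        if previous > 0 and current / previous >= step_ratio:
            flagged.append(i)
    return flagged


# Hypothetical monthly route readings (in/sec): a slow, steady climb.
slow_climb = [0.10, 0.11, 0.12, 0.14, 0.15]
# Hypothetical readings with a sudden jump on the last route.
sudden_jump = [0.10, 0.11, 0.25]

print(step_change_indices(slow_climb))
print(step_change_indices(sudden_jump))
```

A steady climb produces no flags and supports continuing to run with monitoring. A flagged reading would be a reason to reopen the time-frame conversation.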

I explained it had issues, but was better than what we currently had. They planned to have it removed and shipped to our plant. When it arrived, we conducted the inspections we could through the available inspection covers, and it was placed near the location, ready if needed.

We continued to monitor and discuss time to failure. We eventually got the shutdown later that year, at which time the old gearbox was pulled along with the frame. They cut a hole in the floor and installed the new frame, completely grouted the base in place, and filled it. The motor base was upgraded to a much better design at this time.

The following photos show the end result of the drive upgrade.

[Photo: Industrial gearbox]

[Photo: Electric motor]

This is shown as an example of the value of open discussions and of developing an action plan that requires time-frame discussions.

The Cost of Skipping Severity and Time-Frame Discussions

A Site Without Time-Frame Discussions

I was moved to fill in for a site when the vibration analyst left. As part of that transition, my first trip was a little awkward. I did not get a warm welcome when I showed up. I completed my routes and delivered my report at the end of the week. I asked for a time on Friday to go over the report with everyone before I left the area.

The first thing I was told was that they had never had this happen with the last person. We set up a meeting, and I told them I was impressed with how well things were running and had only a few minor things to discuss. That is when the discussion took an uncomfortable turn. The first thing I was told by the maintenance manager was, “You guys are killing me. I have already spent over half my maintenance budget, and I have 9 months left.”

I was a little confused and asked him to explain this to me. I was told, “Go ahead and go through what you have on your list, and we will deal with it.” We went through the list, and there were actually very minor issues with alignment, balance, and a couple of early bearing issues. I could tell there was frustration building during the meeting.

The maintenance manager looked at the operations manager and asked when he could shut the plant down to do this work. I was a little caught off guard. Nothing I found needed immediate work. The fan imbalance issue showed a vibration level slightly over 0.225 in/sec, which was the corporate vibration spec.

The data showed a step change, and when I asked about it, they told me they had a bag in the dust collector that developed a hole. But after they replaced the bags, the vibration levels were steady, and I suggested cleaning the fan and balancing it if needed. There was a city water booster pump that was just beginning to show signs of a defect. The alignment issue was in an area where floor tiles were coming loose. In that case, I was saying they really needed to address the floor tile issues in the area.

I went on to explain that nothing I was seeing needed urgent attention. I asked when their summer shutdown was. It was around 4 months away. I explained that, based on the data I had and the history I reviewed, there was no reason this work could not be done during that outage.

They seemed a little shocked and told me the last guy always told them that scheduling the work was not part of his job. They just needed to fix things ASAP; he did not discuss severity or time frames. They would shut down machines and areas typically within the next week or two and complete the work. That is why they were burning through their budgets so quickly.

At times, they would increase overtime or bring in contractors on short notice, adding to the cost of the repairs. Regarding the floor issue, they planned to replace it during the outage, so they would reset several pumps in that area anyway. Had they gone through with the work I had requested, they would have had to do it twice.

We discussed severity, rate of change, and other things. I explained to them that as long as increases happened in a logarithmic fashion, we should be fine. However, if anything in the operation changed, or if, say, they plugged up the pump, we would need to reevaluate everything.
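One rough way to turn a rate of change into a time-frame conversation is to fit a trend to the readings and project when it would reach an action limit. The sketch below fits a straight line by least squares. The linear model, the action limit, and the readings are all assumptions for illustration, not the method described in this article, and any operating change would invalidate the projection.

```python
def days_until_limit(days, levels, limit):
    """Project days from the last reading until a fitted straight line reaches limit.

    Returns None when the fitted trend is flat or falling.
    """
    n = len(days)
    if n < 2 or n != len(levels):
        raise ValueError("need two or more paired readings")
    mean_x = sum(days) / n
    mean_y = sum(levels) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("readings must be taken on different days")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, levels)) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical monthly readings (in/sec) and a hypothetical action limit.
days = [0, 30, 60, 90]
levels = [0.10, 0.12, 0.14, 0.16]
remaining = days_until_limit(days, levels, limit=0.30)
days_to_outage = 120  # roughly four months

print(f"projected days to limit: {remaining:.0f}")
print("fits the outage" if remaining > days_to_outage else "reevaluate now")
```

The projection is only a starting point for the discussion. Each new route reading, alarm, or operator comment should update it.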

Over the next several months at the site, we had great engagement once they understood how to effectively use the data I was giving them. They also began to understand risk management and time-to-failure estimates, which were what I was giving them. How they used these estimates was up to their group.

In the end, I personally have always felt my focus in my programs was to help sites spend their resources effectively. To do this, I need to help them out by giving them a time to failure, meaning the point at which the machine has no choice but to be shut down.

I also need to help educate people on what I can and cannot do, and how to effectively use my data. These are just my feelings, and I know others may not agree, and that is fine. We all need to do what we feel comfortable with.

Author

Mark Kingkade

Mark Kingkade is a Reliability Solutions Architect for Waites Wireless with over 30 years in Maintenance and Reliability. A CMRP since 2008, he has presented at multiple conferences on vibration analysis. His experience spans roles as Field Analyst, Lead Analyst, SME, and millwright planner. An ISO Cat 3 vibration analyst, he also works with oil analysis, thermography, ultrasonics, and more. He has supported 45+ PDM programs, developed inspection procedures, and optimized PM strategies. Passionate about mentoring, he helps others grow in condition monitoring.