When equipment fails, we need to repair it as soon as possible, especially in a fast-paced environment where availability is critical. The key to solving technical problems effectively is to find the cause quickly and take action to fix it. However, this process is usually not as easy as it sounds.
Sometimes maintenance actions put the machine back in service even though the cause is still there. When this happens, we are likely to face the same problem again in the near future. Other times the cause of the failure is not so evident, and the asset remains out of service because we are unable to find what is wrong.
Each of the cases above has its own particularities and calls for a different maintenance approach. The first case is a chronic situation, while the second is an acute one. We will discuss each of them separately.
The chronic case is the most frequent. It covers situations where we can make temporary repairs that keep the production line running. However, we end up dealing with the same issue over and over again because we are not attacking the root cause; we are only treating the symptoms. The correct approach in this case is Root Cause Analysis (RCA), which allows us to detect and remove what is really causing the problem and not solely the symptoms.
The advantage of this situation is that we have more time to study the problem. Since the production line is working again, there is no pressure from the operational point of view. However, the problem can worsen if we do not act in the short term, and the time to repair could increase dramatically.
The disadvantage is that this type of problem is sometimes extremely difficult to detect, mainly for the following reasons:
- Different technicians, all of whom are unaware of the repeated failure, perform the corrective actions, so the repetition remains undiscovered.
- Simple, quick repairs and replacements are usually not recorded anywhere, making it difficult for the maintenance engineering office to detect the problems.
- The worker who repairs the equipment feels pressure to have the line back in service as soon as possible and so sees RCA as an extra delay.
- Sometimes, technicians think that RCA only increases their work, so they prefer to stick to the quick fix to avoid it.
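One way to surface repetitions that individual technicians miss is to record even the quick fixes and scan the log for the same symptom recurring on the same asset. The following is a minimal Python sketch under stated assumptions: the log format (asset ID, symptom, technician, date), the sample data, the 30-day window and the threshold of three repairs are all hypothetical and do not come from any particular maintenance system.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical work-order log: (asset_id, symptom, technician, repair_date).
WORK_ORDERS = [
    ("PUMP-01", "seal leak", "Ana", date(2024, 1, 3)),
    ("PUMP-01", "seal leak", "Ben", date(2024, 1, 17)),
    ("CONV-07", "belt slip", "Ana", date(2024, 1, 20)),
    ("PUMP-01", "seal leak", "Carla", date(2024, 1, 29)),
    ("CONV-07", "belt slip", "Ben", date(2024, 6, 2)),
]


def find_chronic_failures(orders, window_days=30, min_repeats=3):
    """Return {(asset, symptom): [dates]} for failures repaired at least
    min_repeats times within any span of window_days."""
    by_key = defaultdict(list)
    for asset, symptom, _technician, when in orders:
        # Group by asset and symptom, ignoring who did the repair, so that
        # repetitions spread across several technicians still show up.
        by_key[(asset, symptom)].append(when)

    window = timedelta(days=window_days)
    chronic = {}
    for key, dates in by_key.items():
        dates.sort()
        # Check whether any min_repeats consecutive repairs fit in the window.
        for i in range(len(dates) - min_repeats + 1):
            if dates[i + min_repeats - 1] - dates[i] <= window:
                chronic[key] = dates
                break
    return chronic


if __name__ == "__main__":
    for (asset, symptom), dates in find_chronic_failures(WORK_ORDERS).items():
        print(f"{asset}: '{symptom}' repaired {len(dates)} times")
```

With the sample data, only the pump seal leak is flagged, even though three different technicians handled it. The window and threshold would need tuning to each plant's failure rates.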
There are several tools and techniques for performing an RCA, which will be discussed in a future post. As general guidelines, though, I can mention the following:
- If the problem happens frequently, and keeping downtime to a minimum is crucial (especially in fast-paced environments), we can plan a different corrective approach each time it recurs until we discover the root cause.
- When a complete component has been replaced, the defective one can be studied in the maintenance workshop to discover what went wrong.
- If we send the replaced part to the manufacturer for repairs, we can ask for a report explaining the cause of the failure.
- For complex problems, a failure reporting system should be implemented to detect patterns that could lead us to the root cause.
- If possible, always create a work group to perform the RCA since it is much more efficient than a single person working alone.
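As an illustration of the failure reporting guideline above, a report can be as simple as a few fixed fields that are tallied to see which values keep showing up. This is only a sketch: the `FailureReport` fields, the sample data and the 70% threshold are assumptions chosen for the example, not part of any standard reporting system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureReport:
    # Hypothetical fields; a real system would record whatever the team
    # suspects could be relevant (operator, ambient temperature, batch...).
    asset_id: str
    component: str
    shift: str
    product: str


REPORTS = [
    FailureReport("FILLER-02", "nozzle", "night", "syrup"),
    FailureReport("FILLER-02", "nozzle", "day", "syrup"),
    FailureReport("FILLER-02", "valve", "day", "water"),
    FailureReport("FILLER-02", "nozzle", "night", "syrup"),
]


def tally_patterns(reports, fields=("component", "shift", "product")):
    """Count how often each value of each field appears across the reports."""
    return {f: Counter(getattr(r, f) for r in reports) for f in fields}


def dominant_values(reports, min_share=0.7):
    """Return {field: value} where one value accounts for at least
    min_share of all reports."""
    if not reports:
        return {}
    total = len(reports)
    hints = {}
    for field, counts in tally_patterns(reports).items():
        value, n = counts.most_common(1)[0]
        if n / total >= min_share:
            hints[field] = value
    return hints


if __name__ == "__main__":
    print(dominant_values(REPORTS))
```

Here the nozzle and the syrup product each appear in three of four reports, while the shift is evenly split. A dominant value is only a hint about where to look first; it does not prove the cause, particularly if the line runs that product most of the time anyway.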
What are your experiences in fault isolation and root cause analysis?
Which was your most difficult case?
Do you have a different approach?
Please share your viewpoint in the comment section!
Thanks for reading!