The MaintXL System™ – Accelerate Maintenance Excellence
“Maintenance Excellence” is two words that every company strives for but very few actually define. We say we want to be excellent in how we carry out our maintenance and everyone agrees, but what does it actually mean? How can we be sure we are moving towards it, and how do we know when we have achieved it?
Read on to find out more or use the table of contents to jump straight to a section you’re interested in.
- 1 Why use the MaintXL System™?
- 2 Maintenance Philosophy
- 3 Inputs
- 4 Work Execution Process
- 5 Outputs and Reporting
- 6 Reporting
- 7 Performance Analysis
- 8 Continuous Improvement
- 9 External Changes
- 10 Summary
Why use the MaintXL System™?
The MaintXL System provides a framework for understanding the goals of the maintenance department, understanding the current performance in relation to these goals and then identifying specific areas that need to be addressed in order to meet these goals. It is also an excellent tool for communicating the performance of the maintenance department with the rest of the organisation in a format that is easy to understand for most managers and executives.
Maintenance excellence starts with having the maintenance system in place and functioning correctly. The results will then follow as long as you stick to the system.
Note: In this context, “maintenance system” does not mean the computerised maintenance management system (CMMS). The CMMS is a key tool that is part of the maintenance system but is not in itself the maintenance system.
The MaintXL System is also a great tool which can be used to articulate and sell the benefits of maintenance. Representing maintenance as a system puts it into a context that most people can understand and relate to. Systems have the following elements:
- A purpose – we know what we want to achieve
- Inputs – we feed the system with inputs of various types
- Processing – work gets done on our plant and equipment
- Outputs – the system produces results
- Feedback process – we take those results and use them to calibrate the system. This can be changing the inputs or modifying the expected outputs
Let’s now break the MaintXL System down into its component parts.
Maintenance Philosophy
The maintenance philosophy is a high-level statement outlining the objectives and policy of the company and its approach to maintenance.
The maintenance policy and objectives need to be agreed with all departments within the company and will often also outline the scope of the maintenance system. For example, the plant and its systems will be the responsibility of the maintenance organisation; however, in larger buildings or plants there will often be a facilities maintenance team responsible for roads, buildings, offices, IT infrastructure etc., so it can be useful to make the distinction. Below is an example used at an oil and gas company.
Preventative Maintenance (PM) Equipment Strategies
The PM equipment strategies (sometimes known as equipment or asset life plans) are the specific preventative maintenance tasks that will be carried out against the assets. It is useful to have an overall written plan for each classification of equipment, e.g. pumps, motors and vessels. However, the specific tasks, tags, frequency, man hours, craft codes and so on will be loaded into the CMMS to allow for the planning and scheduling of the workload as well as keeping a history of the work that has been carried out against each asset.
A vital activity that will need to be carried out as part of developing these equipment strategies is a criticality analysis. There are various methods of carrying out this analysis but in general equipment is classified as either non-critical or critical to safety, the environment or production.
This will then be followed by a failure modes analysis for the most critical equipment to understand how it could fail. This analysis will in turn drive the tasks that maintenance needs to carry out to detect and/or prevent these failures. The greater the criticality of the equipment, the greater the value there is in determining and then performing the tasks to detect or prevent a functional failure. In other words, we can invest more money in determining failure modes and then carrying out the maintenance tasks because there will be a bigger impact on the business if the equipment fails in service.
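As a minimal sketch of the classification described above, the snippet below records a yes/no assessment per consequence category for each asset. The field names, the example tags and the rule that any critical category justifies a failure modes analysis are illustrative assumptions, not part of the MaintXL System.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    tag: str
    safety_critical: bool = False
    environment_critical: bool = False
    production_critical: bool = False


def criticality(asset):
    """Return the categories an asset is critical to (empty list = non-critical)."""
    categories = []
    if asset.safety_critical:
        categories.append("safety")
    if asset.environment_critical:
        categories.append("environment")
    if asset.production_critical:
        categories.append("production")
    return categories


def needs_failure_modes_analysis(asset):
    # Assumption: being critical in any category justifies the extra analysis.
    return bool(criticality(asset))


# Hypothetical assets for illustration only.
assets = [
    Asset("P-101", production_critical=True),
    Asset("PSV-205", safety_critical=True, environment_critical=True),
    Asset("FAN-310"),
]

for asset in assets:
    label = ", ".join(criticality(asset)) or "non-critical"
    print(f"{asset.tag}: {label}")
```

A real criticality analysis usually scores consequence and likelihood rather than using simple flags, but the output is the same kind of classification that then decides where analysis effort is spent.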
Standards are the technical benchmarks against which the maintenance that is carried out will be compared. An example is a performance standard that a piece of equipment is expected to achieve, e.g. pump a certain rate at a certain pressure or provide a defined quantity of power. External regulatory bodies can also provide standards.
Procedures / Guidelines
These are a set of written documents that outline how maintenance will be carried out. This library of documents will need to be controlled through a document management process that manages changes and updates. Examples of procedures and guidelines are:
- Equipment Criticality Allocation
- Risk Based Inspection
- Planning and Scheduling
- Corrective Work Identification
- Applying Isolations
- Work Deferrals
- Work Order Close Out and Capturing of History
This list is by no means exhaustive but provides an indication of the types of procedures and guidelines required.
The key IT systems that are used to manage the maintenance are outlined in the strategy. It will also include a broad description of the information that will be contained and managed in each system. It can be useful to have a systems map that shows how these systems join together and the information that flows into and out of each one as this is often a source of inefficiency, data errors and even data losses.
Examples of IT systems are:
- Document Management
- HR (Skills and Competency)
- Control Of Work
- Work Scheduling
Many companies have invested in ERP systems like SAP or Oracle. These systems can integrate many of the examples above into a single computer system, while other companies may have various levels of manual or automatic interfaces between different systems. For example, the CMMS might be in Maximo while Finance and Procurement are carried out in SAP and the documents are managed in Documentum. The key is to understand what these systems are and the information they should contain so the organisation has a clear understanding of whether they are fit for purpose. If they are not, then corrective actions can be taken to improve the situation.
KPIs / Targets
It’s essential to be able to measure the efficiency and effectiveness of the maintenance strategy. The strategy will outline the KPIs and measures which will be used to do this. It will also set specific targets for these measures, for example zero safety critical work in the backlog or 85% schedule compliance.
Where no fixed targets can be defined, the strategy will outline the method by which these targets will be set. An example of this could be setting targets based on the best historical performance at the site over the last 12 months, or based on the performance that similar plants in the organisation or within the industry achieve.
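The "best historical performance over the last 12 months" method can be sketched in a few lines. This is an illustration only: the function name and the monthly figures are hypothetical, and it assumes one KPI value is recorded per month, oldest first.

```python
def target_from_history(monthly_values, higher_is_better=True):
    """Set a target equal to the best monthly value in the last 12 months."""
    recent = monthly_values[-12:]
    if not recent:
        raise ValueError("no history available")
    return max(recent) if higher_is_better else min(recent)


# Hypothetical monthly schedule compliance (%), oldest first (13 months of data).
schedule_compliance = [62, 65, 70, 68, 74, 71, 77, 80, 78, 82, 79, 81, 76]
print(target_from_history(schedule_compliance))  # 82

# Hypothetical count of safety critical work orders in the backlog; lower is better.
safety_backlog = [5, 4, 6, 3, 2, 4]
print(target_from_history(safety_backlog, higher_is_better=False))  # 2
```

The `higher_is_better` flag matters because some measures (compliance, reliability) improve as they rise while others (backlog, MTTR, cost) improve as they fall.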
Inputs
Every system will have inputs. These are the information and resources that are required to make the system function effectively.
The maintenance strategy will be the key feed that will determine the inputs we need. These inputs come in the form of the following categories:
- Money (Budget) – How much budget is allocated to the maintenance department to allow it to carry out the activities defined in the strategy?
- Material – The spares and consumables required.
- Organisation – The number of people and skills and competency of the technicians, management and administration organisation.
- Information and systems – accurate information and fit-for-purpose systems. Examples of this could be accurate planned hours and task descriptions on the preventative maintenance routines, spare parts lists or a CMMS workflow that supports the work execution process.
Work Execution Process
The work execution process is the part of the system that will carry out the maintenance activities. The main objective is to carry out the work identified in the strategy efficiently.
This is split into 6 key steps:
- Corrective Work Identification – This step is mainly concerned with identifying emergent work. Preventative Maintenance work requirements will be defined in the strategy, loaded into the CMMS and automatically generated as work orders. However, there needs to be a clear process in place for the identification and logging of defects.
- Prioritisation – Once a defect has been clearly and accurately identified the next step is to understand its impact or potential impact on the business. This impact is quantified based on risk using a risk-based prioritisation matrix. The matrix is an objective tool to help management understand the risk the defect poses to the business and therefore determine a duration that the business can tolerate this risk. This will drive a target date that states when the defect must be fixed by.
- Preparation (Planning) – Preparation, also known as planning, is the process of creating a work-pack containing a clear definition of the tasks, manpower, spares, tools and documents required to carry out the work. Planning is an in-depth subject of its own but overall the objective of planning is to:
- Provide the scheduler with accurate estimates of the crafts, man hours and plant condition to allow them to schedule the work at the right time and for efficient use of resources and plant outages.
- Provide the technician who is carrying out the work with all of the resources and information required to get the job done right first time and on time.
- Scheduling – Once a work order has been fully planned it can be scheduled. Scheduling is the process of allocating an execution date to a work order to make the most efficient use of the crew available. This date is allocated based on a number of different factors including:
- available crew size and craft mix (manpower)
- work priority
- condition of the plant required i.e. running or shutdown
The scheduling process also needs to accommodate any emergency work for example equipment breakdowns.
- Execution – The execution step is concerned with all the short-term activities associated with allocating and performing the work. This includes pre-job toolbox talks, creation of control of work (CoW) permits, preparation of the worksite and performing the actual work.
- Close Out – Last but by no means least is the close out of the work. This part of the process includes the accurate recording of work history and all the follow-on activities that should take place or be considered.
- Recording of crafts and the man hours used against each work order or step
- An accurate and complete history narrative of the work that was carried out and the results.
- Actual spare parts used
- Returning unused spare parts to stores
- Failure codes to allow failure analysis to be carried out in the future
- Creation of follow-on work orders. These may be required if a PM routine identifies a fault or a condition monitoring activity detects an anomaly
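The prioritisation and scheduling steps above can be sketched as follows. The 3x3 matrix, the tolerable durations per priority and the work order data are all assumptions made for illustration; a real matrix is agreed by the business. The scheduler is a simple greedy fill that ignores craft mix and plant condition.

```python
from datetime import date, timedelta

# Assumed matrix: (consequence, likelihood) -> priority, 1 being most urgent.
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 1, ("high", "low"): 2,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 3,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 4,
}
# Assumed number of days the business can tolerate the risk for each priority.
DAYS_TO_FIX = {1: 1, 2: 7, 3: 30, 4: 90}


def target_date(raised_on, consequence, likelihood):
    """Return (priority, date by which the defect must be fixed)."""
    priority = PRIORITY_MATRIX[(consequence, likelihood)]
    return priority, raised_on + timedelta(days=DAYS_TO_FIX[priority])


def schedule(work_orders, available_hours):
    """Fill the available crew hours, most urgent and earliest target first."""
    scheduled, remaining = [], available_hours
    for wo in sorted(work_orders, key=lambda w: (w["priority"], w["target"])):
        if wo["planned_hours"] <= remaining:
            scheduled.append(wo["id"])
            remaining -= wo["planned_hours"]
    return scheduled


# Hypothetical defects: (id, consequence, likelihood, planned hours).
raised = date(2024, 1, 8)
defects = [
    ("WO-1", "low", "medium", 6),
    ("WO-2", "high", "medium", 10),
    ("WO-3", "medium", "high", 8),
]
work_orders = []
for wo_id, consequence, likelihood, hours in defects:
    priority, target = target_date(raised, consequence, likelihood)
    work_orders.append(
        {"id": wo_id, "priority": priority, "target": target, "planned_hours": hours}
    )

print(schedule(work_orders, available_hours=20))  # ['WO-2', 'WO-3']
```

With 20 crew hours available, the two most urgent jobs use 18 hours and the low priority job waits for a later schedule. The planned hours come from the preparation step, which is why accurate estimates matter to the scheduler.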
Outputs and Reporting
The outputs of the maintenance system are the results it produces. Some outputs are measured as soon as the output is created; others need to be measured and monitored on an ongoing basis.
The output of the system can be broken into 2 categories, these being efficiency and effectiveness. It’s important to know the difference between these two outputs and both of them need to be measured to understand if the maintenance system is producing the desired results i.e. those outlined in the maintenance philosophy and strategy agreed by the business.
Efficiency is a measure of how well something is achieved. Efficiency means that there is very little wasted time, effort and materials.
Effectiveness is concerned with the question “Are we doing the right thing?”.
The diagram below outlines the relationship between efficiency and effectiveness.
As you can see, an organisation could be very efficient at executing its maintenance. It could have high tool time, skilled and competent individuals and very little waste. However, if the system is set up in such a way that the team are fixing and maintaining the non-critical equipment or not carrying out the correct PM tasks at the correct frequency then we may not be effective. This would show up in reduced reliability, poor safety, high cost etc.
There is a need to measure where the site is on this efficiency vs. effectiveness square and strive to move into the top right quadrant and then stay there.
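As a rough illustration of placing a site on this square, the sketch below assumes that efficiency and effectiveness have each been rolled up into a single score between 0 and 1, and that 0.5 is the dividing line. Both the scoring and the threshold are assumptions for the example.

```python
def quadrant(efficiency, effectiveness, threshold=0.5):
    """Place a site on the efficiency vs. effectiveness square.

    Both scores are assumed to be normalised to the range 0..1.
    """
    efficient = efficiency >= threshold
    effective = effectiveness >= threshold
    if efficient and effective:
        return "efficient and effective"
    if efficient:
        return "efficient but not effective"
    if effective:
        return "effective but not efficient"
    return "neither efficient nor effective"


# Hypothetical site: high tool time, but working on the wrong equipment.
print(quadrant(efficiency=0.9, effectiveness=0.3))
```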
Measures of Efficiency and Effectiveness
- Tool Time – Also known as wrench time, this is a measure of how much actual time the technicians are working on maintaining the plant and equipment.
- Work Quality – Measuring the quality of workmanship indicates if the team are maintaining the equipment to a high standard and not introducing new defects or carrying out high levels of rework. This is also a measure of competency.
- Backlog – How much work is being deferred?
- Schedule Attainment – Is there the organisational capability to produce and then execute a schedule of work on a consistent basis that matches the available capacity of the team?
- PM Compliance – Are the preventative maintenance tasks outlined in the strategy for each asset being carried out on time? If the preventative maintenance work is not carried out then we won’t know if it is effective. If it is being carried out and reliability is still below target then it could be that the wrong tasks and/or frequencies are in place.
- MTTR – How quickly is the system able to enact repairs of our equipment?
- Safety and Environment – How many people are being injured and what impact on the environment is the facility making in the process of carrying out the maintenance on the plant?
- Reliability – How reliable is the equipment? High reliability is one of the key reasons to carry out maintenance on the equipment.
- Production Efficiency – This is a measure of the actual plant output as a percentage of the maximum theoretical output. It is also often measured in the slightly different measure of Overall Equipment Effectiveness (OEE).
- Cost – Most companies have a requirement to make a return on the capital invested in them. If a plant can’t be maintained at a cost that provides an acceptable return on investment then it is not effective.
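Several of the measures above reduce to simple ratios. The sketch below shows one common way to calculate them. The exact definitions (for example, expressing tool time as a percentage of paid hours) vary between organisations, so treat these formulas and the sample figures as assumptions.

```python
def tool_time_pct(hands_on_hours, paid_hours):
    """Share of paid technician time spent working on the equipment."""
    return 100 * hands_on_hours / paid_hours


def schedule_attainment_pct(completed_as_scheduled, scheduled):
    """Share of scheduled work orders completed in the scheduled period."""
    return 100 * completed_as_scheduled / scheduled


def pm_compliance_pct(pms_done_on_time, pms_due):
    """Share of due PM routines completed on time."""
    return 100 * pms_done_on_time / pms_due


def mttr_hours(repair_durations):
    """Mean time to repair: average duration of the recorded repairs."""
    return sum(repair_durations) / len(repair_durations)


def production_efficiency_pct(actual_output, max_theoretical_output):
    """Actual plant output as a percentage of the maximum theoretical output."""
    return 100 * actual_output / max_theoretical_output


# Hypothetical figures for one week at one site.
print(tool_time_pct(14, 40))                   # 35.0
print(schedule_attainment_pct(34, 40))         # 85.0
print(pm_compliance_pct(45, 50))               # 90.0
print(mttr_hours([4, 6, 2]))                   # 4.0
print(production_efficiency_pct(9200, 10000))  # 92.0
```

All of these depend on the data captured at close out: man hours, completion dates and repair durations recorded against each work order.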
Reporting
The system will create outputs; that is inevitable. These outputs will be real and will have an impact; for example, there will be safety performance, reliability, tool-time, production efficiency etc.
The reporting element of the MaintXL System is concerned with defining, collecting and distributing this performance information, and it is the input to the performance analysis process.
The reporting element ensures that correct and accurate data is collected and transformed into a format that can be analysed at the right time by the right people in the organisation, providing them with decision support to allow performance analysis to be carried out quickly, easily and effectively.
Information is often collected in IT systems but can be hard to extract from the source systems and present in a meaningful manner. Dashboards are a great tool for displaying information in a way that is clear, concise and supports management in making decisions.
The subject of displaying information is a science in itself and too in-depth for this article. However, there are excellent books by Stephen Few that cover the subject of reporting and dashboard design and I’ve used some of these principles in the dashboards I’ve created to great effect.
Performance Analysis
A key component of every system is the ability to calibrate and improve itself over time. Performance analysis is the element of the system that is responsible for taking the reported outputs and measuring them against the targets that have been set for them. These targets will be defined and agreed in the maintenance strategy document.
Once the target vs. actual values are known for each output we can understand if the current performance is acceptable or unacceptable. There is a decision to be made in both instances.
If the actual value is better than the target values then the management team can start to look at proactive improvements to the system.
If the actual value is worse than the target value then there is a decision to be made: either tolerate the difference or start to examine what changes to the system could be made to improve the situation and bring the output back on target.
Tolerating the difference is the default position if no action is taken. However, there could also be a deliberate management decision to accept reduced performance for a defined period of time. This mode of operating is unsustainable in the long term and will lead to gradual or even fast deterioration of the system’s ability to produce the required outputs.
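The target vs. actual comparison can be sketched as below. The two targets are the examples given earlier in this article (85% schedule compliance and zero safety critical work in the backlog); the actual values are hypothetical, and treating an actual value equal to the target as acceptable is an assumption.

```python
def assess(actual, target, higher_is_better=True):
    """Compare an output with its target and return the decision to be made."""
    on_target = actual >= target if higher_is_better else actual <= target
    if on_target:
        return "acceptable: consider proactive improvements"
    return "unacceptable: tolerate the difference or investigate changes"


# name: (actual, target, higher_is_better); actual values are hypothetical.
kpis = {
    "schedule compliance %": (78, 85, True),
    "safety critical backlog": (0, 0, False),
}

for name, (actual, target, higher_is_better) in kpis.items():
    print(f"{name} -> {assess(actual, target, higher_is_better)}")
```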
Continuous Improvement
This is the element of the system that focuses on identifying and effecting changes to the system in order to improve it. Continuous improvement is a key element of the system that is often looked upon as a nice-to-have addition by some companies, or paid lip service and not given the time and effort it requires.
If a company is not continually improving in an environment where its competition is, then it is in effect becoming less competitive and will eventually fail.
At the heart of continuous improvement is root cause analysis (RCA). An RCA process identifies the root cause that resulted in the output(s) being off-target. Understanding the root cause gives management the confidence that any changes they do invest in are going to address the causes for deficiencies in the system and bring the outputs into alignment with the targets.
Root cause analysis can also be used to identify positive changes that can be made to the system to ensure that it is meeting the target and continues to do so with reduced input and therefore more efficiently (less cost and resources).
The RCA information allows management to make two important decisions:
- Where to make changes to the system
- What specific changes are required to the system
There are 3 areas where changes can be applied to the MaintXL System:
- Changes to the inputs to the system
- Changes to some element of the strategy, which includes the work execution process since it is defined in the strategy
- Proactive changes to improve the system performance
Once the element of the system which needs to be changed is identified, a decision needs to be made on which type of corrective action to take. There is an almost limitless number of options; however, the corrective actions will typically fall into the categories below:
- New Technology
- Updated Techniques
- Skills Training
- Design Out
It may even be the case that the organisation changes its targets. For example, this can happen if the market conditions change or as the plant ages and nears the end of its useful life.
1.) Poor Schedule Attainment (Efficiency)
The organisation was struggling to create a schedule of work and had very poor schedule attainment. This was causing technicians to waste time on work orders which were not fully planned, reducing their tool-time and impacting reliability of critical equipment.
Change to Strategy – scheduling was carried out for the remote site from a central office. The scheduler had contact with the site through phone, email and video conference but not day-to-day, hour-by-hour close contact with the site team. As a result, the supervisory team at the site were carrying out the scheduling function independently and were not working to the official schedule which was issued to the site from the central office. This was inefficient in a number of ways.
- The scheduler was wasting time selecting the most important jobs and scheduling them in the CMMS
- The supervisory team were wasting time creating an unofficial schedule when this time should have been used to supervise their teams
- The KPI measures were not correct as they were looking at dates and work order status codes in the CMMS, not the unofficial Excel spreadsheet which the supervisors created and were following
A change to the strategy led to an update in the work execution process and the organisation. The scheduler position was changed and was included as part of the site organisation instead of the remote office support team. This increased day-to-day interaction and improved communication between the scheduler and the site supervision team. The supervisors started to input into an official site schedule and the quality of this schedule improved as the scheduler started to better understand the site constraints and resource availability. As a result schedule compliance increased and the site tool-time also started to increase.
2.) Spare Parts Issues
Measuring the outputs of the MaintXL System showed that the Mean Time To Repair (MTTR) for critical equipment had increased over the last 12 months. This was leading to reduced production efficiency.
A root cause analysis identified that spare parts for work orders kept getting misplaced or used on other jobs. There were also situations where the stock in the storeroom did not match the stock quantity listed in the CMMS; there was often a lower number in the storeroom. This caused delays in carrying out critical work as new parts had to be ordered, not to mention the additional cost of expediting purchase orders.
It was identified that the lack of a dedicated storeman at the site was resulting in poor control of the stores. The storeman position had been cut during an earlier round of job losses and was now a shared position which the site deck crew foreman filled. He was not trained as a storeman and could only be in the stores for around half the day. This meant that operations and maintenance teams could help themselves to the stock and often did not update the issues register. This led to differences between the actual stock in the storeroom and the number stated in the CMMS.
Change to Inputs – Using the MaintXL System as a tool, the site maintenance team were able to demonstrate that there was a business case for a full-time storeman. If MTTR could be reduced by increasing the availability of critical spares, this would reduce production losses and increase revenue and profit. Additional manpower was employed and a dedicated storeman reinstated at the site.
3.) Reducing Vendor Cost
The team had worked hard to reduce barriers to getting work done and there had been strong focus on increasing their efficiency. As a result, they were seeing an increased amount of wrench time.
Proactive Improvement – This meant they had reduced their backlog. One member of the team was then allocated to plan the shutdown scope and this meant that less contract labour was required to carry out this role and the company saved money on external contractors who would normally be brought in to carry out the shutdown planning.
External Changes
All systems exist within an environment and that environment can exert external influences on the system that can’t be controlled by the system.
There is a principle in cybernetics called the “Law of Requisite Variety”. There are plenty of websites that explain this in far greater detail than I can, but the basic idea is that a system needs to provide a variety of responses that at least matches the variety of problems applied to it in order to survive in its current state.
An example of this is the weather. Weather happens and as we carry out maintenance we will have to deal with it; we cannot change or control it. However, we can respond to the weather in a way that allows us to continue to carry out our maintenance tasks. This could be scheduling work based on the weather forecast, building temporary weather enclosures to protect people and equipment from the rain or wind while the equipment is being worked on, or even redesigning the plant to ensure that equipment is housed in a permanent enclosure.
The MaintXL System has the concept of a Gate Keeper. This is a position or person within the system that is responsible for monitoring and then coordinating the responses to external influences.
Some other forms of external influence are:
The Health and Safety Executive continually reviews workplace safety and from time-to-time creates or modifies legislation to address issues, concerns or trends it identifies within an industry.
An example of this was the legislation that was introduced after the Piper Alpha disaster in 1988. A number of pieces of new legislation were introduced to improve how work was carried out and controlled offshore, and this would have had an impact on maintenance systems at the time and still does today. It must be said that these were positive impacts but would have required a change to the strategy and inputs at the time. These would have included additional information and systems, increased budget, improved methods etc. All of these would have cost money and changed how the system operated.
After the implementation of new legislation, the outputs would be expected to improve, particularly in safety and environmental elements. If this is not the case then the MaintXL System is an excellent tool for understanding why they have not improved and pinpointing areas where actions could be taken to address this lack of improvement.
There is constant pressure between the need to maintain equipment and protect its reliability and the need to produce the goods the plant is designed to create.
Production targets may impact our maintenance system by preventing us from carrying out maintenance as outlined in the equipment strategy. This will mean the work has to be rescheduled, which may have a knock-on effect on equipment reliability and on cost. A gate keeper can keep up to date with the production schedule and plant shutdown dates and attempt to mitigate the inefficiency, e.g. by avoiding scheduling work when they know it won’t be done, or by carrying out additional condition monitoring or inspections on overdue equipment.
Plant Modifications and Upgrades
Market demand may call for an increase or change in the plant configuration and design. Involvement of maintenance at each stage of the project to implement the modification will allow the maintenance department to understand and even control the impact on the Maintenance System. This will allow the strategy and inputs to be modified accordingly and will limit or eliminate any negative impacts on the outputs.
An example of this is when a new processing module is added to an offshore installation in order to process new hydrocarbons from a new well through the processing plant.
This new medium being introduced to the plant may put different loads on the equipment. It may also have different fluid and flow characteristics and so may impact the integrity of the pipes and vessels in a different manner. As a result, this may need different inspection frequencies and updated risk-based inspection models. The new equipment may need additional resources to maintain it, and the technicians may need to be trained with additional skills to work on the new equipment.
These external changes will need to be managed by the gate keeper through a plan of actions to protect the current output of the system.
Sometimes it will be necessary for the business to cut costs because poor financial performance or market conditions have resulted in reduced profit or even losses. Maintenance should be viewed as a profit centre and not a cost centre, but often the reality of the situation is different. The maintenance department will employ people and spend money, and this money may simply not be available. Targets to reduce the number of people employed, often known as “head count”, can be imposed on maintenance management by people in positions who are only looking at the company through the lens of a financial system.
In this situation inputs are imposed on the system and will most likely result in reduced output performance for a period of time. The MaintXL System can be used as a tool to communicate this information and the site leadership team can make an informed choice to tolerate this or reduce the targets and expectations on what the maintenance system can deliver.
Resource shortage can occur through a reduced budget to pay for staff or vendors, or through the unavailability of these resources on the market. This can be particularly acute if a company is using highly specialist equipment or operates in a very remote or hostile location where people may not be willing to work.
In these situations the gate keeper function needs to act to carry out a criticality analysis of the equipment and work that is in the system and decide which work to focus on to protect the outputs of the system. For example, safety critical activities will need to be a focus to protect the people working on the plant.
Summary
So there you have it. The MaintXL™ System: a framework for implementing and sustaining maintenance excellence.
It starts with understanding what the system needs to achieve through defining a clear maintenance philosophy that is aligned to the requirements of the business and its legal obligations.
These requirements are then translated into an overall maintenance strategy. The strategy outlines the criticality of the equipment, its failure modes and their consequences, and the specific preventative maintenance tasks or life plans that need to be in place to detect and prevent these failures. The strategy also outlines the specific processes and procedures used to carry out maintenance activities, and finally the KPIs and targets which have been agreed and will be used to measure the efficiency and effectiveness of the system.
The requirements outlined in the strategy will then determine the inputs that the system needs. These inputs are the money, spares, maintenance organisation, skills and information, as well as the IT systems (e.g. CMMS, document management, safe systems of work etc.) that need to be designed and implemented to allow the maintenance to be carried out efficiently.
The process of actually executing work is next. This process is where the work is actually carried out at the site in an operational sense.
Next, we will measure the outputs. These are the results that the system is producing and they are split into those that measure efficiency and those that measure effectiveness. The system has to be both efficient and effective for it to be sustainable.
The results the system produces are then compared to the targets outlined by the business as we enter the continuous improvement element of the system. Root cause analysis is carried out if the outputs are off-target, and the findings are used to feed actions that modify the system and correct any anomalies. If the targets are being met, further refinements can be made to the inputs, i.e. could we get the same results for fewer inputs?
If you have any questions or comments on the MaintXL™ System then please comment below or contact me via the contact form. I’d love to hear your thoughts and suggestions.