Dashboard for analyzing SCADA data log: a case study of urban railway in Malaysia

The operation of maintenance in the railway industry is high priorities for safety which reflects the train services. The Kelana Jaya Line (KJL) is the leading urban metro train operator among light rail transit (LRT) categories in Malaysia. The main issue in KJL, they are facing huge historical data from the supervisory control and data acquisition (SCADA) logger when organizing the corrective maintenance work schedule. The previous conventional method leads to time constraint which causes redundancy of data reading. Hence, SCADA Dashboard was developed as a tool to structure their work prioritization for weekly planning. It was developed by using Microsoft Power Query and Power Pivot, in the advanced Excel software. The dashboard evaluation was performed based on usability and user experience studies. The evaluation test proves that the dashboard can be used effectively and it’s beneficial to the organizations as well as a contribution towards systematic maintenance.


INTRODUCTION
The job of constantly supervising machinery and train activities can be an arduous task. It would be a waste of a person's expertise on trivial activity to be a regular patrol duty around the equipment or computer for performing checks in 24 hours. Therefore, the engineers designed equipment and sensors that might reduce this burdensome function. As a result, control systems and its various supporting units were established in the supervisory control and data acquisition (SCADA) systems. SCADA provides simple monitoring of sensors mounted from a central location at distances [1]- [3]. Since the 1960s SCADA systems have been around and developed according to technological changes. It was commonly used to monitor and control various operations automatically in service networks where manual tasks are difficult and prone to error [4]. The SCADA system is part of the communication system in the Kelana Jaya Line (KJL) and it supervises and controls security functions at the station related to facilities, trains and Traction Power substation (TPSS). Therefore, a smooth and reliable SCADA system is vital for the railway transportation sector where both, data acquisition and control, are critically important for more efficient and effective operation. Based on an interview session with the department of KJL signalling and communication (S&C) team, interruptions in KJL operations occurred due to the lack of monitoring of equipment and system maintenance that eventually led to an unplanned situation. This real-case scenario sometimes impacts the overall operation of KJL [5]. Hence, KJL often faces a major crisis related to transit service reliability, customer dissatisfaction as well as bad company reputation. In KJL, all the trains, station facilities, and traction power supply (TPS) activities are monitored and controlled by the SCADA system. Therefore, this study focused on the SCADA data log, which requires a systematic tool to present the SCADA data log in a graphical view in order to plan effective maintenance works.
The SCADA system in KJL has more than 50 RTUs that communicate with the control organization by using the communication network. The RTU accepts digital and analog input signals from remote field installations [6]. The PLC is a small industrial machine designed to perform logic functions performed by electrical hardware such as switches, relays, and mechanical timers or counters. PLCs are often used as field instruments in a SCADA system because they are cost-effective, scalable, robust and configurable to RTU. human machine interface (HMI) is both software and hardware which allows human operators to track the status of a controlled process, change the control objective by modifying control settings and manually override automatic control operations in emergencies [1]. The SCADA's function in KJL is to provide dynamic monitoring and control functions for the security status, such as facilities at station entrance shutters, entrance to the station's equipment rooms and ticket vending machines. It integrates with the environmental control system (ECS) to monitor and control devices, such as tunnel ventilation fans, escalators, elevators, lightings, and doors. Typically, the SCADA host in KJL is an industrial personal computer (PC) running sophisticated SCADA man-machine interface (MMI) or HMI. Using this program, remote areas are polluted, and the data collected is stored in its central database. The SCADA Host program will configure logic, which then tracks and controls the plant or equipment. The control may be automated or activated with the commands of an operator. The data is then analyzed to identify current warning situations, and an alert message will appear on the operator screen and be connected to an alarm on the computer when an alarm is activated. The operator must then acknowledge this alarm. The data collected is usually viewed on one or more SCADA Host PCs located at the central or master site [1].
The S&C maintenance team focuses on the planning and control processes as well as the railway engineering and maintenance departments; however, it is not equipped with advanced planning and control tools. In general, maintenance planning and control jobs are mainly paper-based, while subjectivity plays a predominant role in the manual task. Usually, maintenance planners comprise several key maintenance personnel who are highly experienced and possess expert knowledge and they have the final say on how the maintenance is planned. Knowledge of the maintenance staff is critical in deciding when different procedures are needed for certain properties, causing more of the human factor in the maintenance chain and therefore making the overall process less analytical and more prone to errors. According to the Indian Ministry of Railways, 59% of the accidents in the last ten years were caused by railway staff, 25.5% by those other than railway staff, 6.5% was due to equipment failure and 9% was due to miscellaneous reasons. Therefore, railway staff was the major cause of railway accidents in India. In the SCADA system, the human factor plays a crucial role. The study of the human factor in the SCADA system started to gain momentum during the last decade. It has been estimated that up to 90 % of all workplace accidents were caused by human errors [7]. The current practice is that KJL maintenance managers with a high level of expertise are entitled to have the final say on maintenance preparation. Knowledge of the maintenance staff is critical in deciding when certain properties require specific action, triggering the human factor in the maintenance chain; thereby making the overall process less objective and more error-prone.
According to experts from KJL, they need to manage and trace huge amounts of complex data for making decisions regarding corrective maintenance (CM) scheduling. Meanwhile, the existing database is based on records from the IT department, which were extracted from a SCADA database server. Often due to unstructured data from the server, the maintenance management staff can't locate the correct data in the database. Consequently, the unsystematic database of maintenance management systems leads to difficulties faced by maintenance management staff as they determine the order of work correctly. Information collection's low usability affects the criticality of assigning the facilities condition and evaluation issues at the KJL wayside. The conventional method, such as using a paper-based process along with an unsystematic database, cannot capture long-term market objectives [8]. In-comprehensiveness current method leads to repetitive reading and causes negligent data retrieval and the data might not be useful for the users or impact the results. This research focused on developing a tool for simplifying tasks by making them similar to natural tasks but give mental aids to the enhanced method. Additionally, these tools might facilitate them in decision-making CM work order, to speed up the sorting and filtering process for maintenance and corrective action whenever a failure occurs. Globalization and the rapid development of communication, computer and information technologies has changed the pattern of maintenance works. In the era of industrial revolution 4.0, machines and systems must irreversibly change the way of data collection and interpretation to optimize maintenance works. To help the maintenance process, a high level of expert knowledge acquired by 253 maintenance managers can be codified in maintenance rules and implemented in proper decision methods such as data visualization, which are more advanced compared to paper-based planning instruments. Data visualization represents data in a structured manner, including knowledge unit attributes and variables [8]. Visualization-based methods of data exploration allow business users to create personalized analytical views by mashing disparate data sources. Advanced analytics can be incorporated into the tools used to create interactive and animated graphics on desktops, laptops or mobile devices, such as tablets and smartphones [9]. Table 1 indicates the advantages of data visualization, based on the percentage of the survey respondents. It shows that 77% agreed that data visualization contributes to improving an organization's decision-making process. Data Visualization allows us to quickly upload data from a variety of sources (for example, spreadsheets, .csv files, fusion applications, and many databases) to a system or model it in a few easy steps. It can easily blend data sets so that they can analyze a larger data set to reveal different patterns and information. This project will use a dashboard to visualize a million data from the SCADA data logger [10]. The dashboard as a technology solution makes it possible to create, maintain, retrieve, and instantly provide the latest information on the right place, at the right time, at the lowest cost for use in decisionmaking [11].

MATERIAL AND METHOD 2.1. Introduction to dashboard
A dashboard comes with an interactive graphical interface that depicts information garnered from databases. As a major concern nowadays on pandemic Covid-19 situation, the adaptation of Dashboard as a centralized information and easily accessible by providing the workers greater support for the decision makers tools which majority of the them are working from home [12]. The basis of the interactive interface is to incorporate information visualization in the context of providing information to end-users, which then assists them in making better and prompt decisions [13]- [17]. According to the previous research on the development of a maintenance dashboard, the occurrences of the event in the system can be classified and categorized into specific levels. The occurrences are accumulated over time as historical information and represented in a visual format over a timeline based on an entity-relationship diagram. The information result will display in a single page view of maintenance activities [18]. Two types of dashboards can be used, namely the analytical and operational dashboards [19]. This study focused on the analytical dashboard since it has to deal with a large volume of historical data related to alarms and events from the SCADA database server to assist the S&C maintenance team with the corrective maintenance schedule. Figure 1 shows the current practice and enhancement method that is executed through the dashboard for their maintenance works. The process of developing the SCADA dashboard involves various steps, such as user requirement gathering, designing sketch, data extraction, transform and load (ETL) and testing process, as shown in Figure 2. This research process was designed based on the previous study [20].

User requirements
In any system development, gathering the requirements is the most important aspect that should be taken seriously by project members and experts from KJL. Obtaining information from the owners of the system is a very important phase in creating a dashboard as this will enable developers to get a clear picture of the user's needs. The implementation of this step will reduce the risk of a failed project. A failed project is bound to occur if the user's requirements are not gathered and understood accordingly [21]. To understand the requirements of the dashboard, several meetings, training, and unstructured interview sessions were held with the S&C team and to obtain an overview of the actual S&C system as well as to identify their requirements of the dashboard and the data structures.

Dashboard design requirements
After the user's requirements are successfully gathered, the next activity would be to design the dashboard connecting one or more data sources and data sets that are represented as charts. The dashboard's design is usually documented in the form of storyboards. Developers create storyboards to indicate the flow of the dashboard, which also includes other features, such as selectors and searching capabilities. The SCADA dashboard comprises four focus areas, which are locations of the alarm, the date and time when the alarm occurred, the trend of the alarm and details of the alarm description, as shown in Figure 3. The dashboard concentrates on quantitative descriptors that illustrate the frequency of the alarm that appears in the SCADA data logger. This dashboard enables the user to identify the most dominant failure rate and occurrence of repetitive alarms showing the output in the graphical view in one screen with a specific time and location selected by the user's input. This dashboard could help them to prioritize their work schedule and avoid delays in performing work tasks.
The dashboard implementation uses the Microsoft Excel software as requested by the customer, as it requires only intermediate and simple programming skills that younger employees have already learned in schools or colleges and do not require special programming expertise. Not all businesses work for multibillion-dollar parent companies that can afford reportable solutions at the enterprise level. Funding for new servers or new software is minimal in most small businesses, let alone funding for costly Dashboard software packages. Excel is the most cost-effective way to deliver key business information using operational monitoring tools, without compromising usability and feature too much [22], [23]. Bremser and Wagner state that the dashboard software must have the capability to access the organization's data warehouse for information. They claimed the Microsoft Excel dashboard tool are cost-effective and easy to use [24]. Excel is easy to understand and generally used by many collaborators with basic computer skills. It also intends to be functional and take a short time in updating the data and generating information [25]. Therefore, this dashboard was developed using Excel as requested by the user as it would be easy for them to access using several devices and for troubleshooting.

Extract, transform and load
The development process is based on previous studies, which including the ETL process. There are two types of data envisaged when developing a dashboard, namely structured data, and unstructured data [26]. Structured data is easily obtainable and searchable as the data are stored in a well-organized database. Although unstructured data is important, it is unorganized and difficult to retrieve. For example, unstructured data can be found in video files, audio files, and Microsoft word files. However, to obtain a better dashboard performance, data transformation is required to make full use of unstructured data. Data log in the SCADA server is structured data since the data can be used directly but requires transformation.
Firstly, the Extract phase is a stage where data extraction occurs, which involves data extracted from various sources of sources. Extract the raw data from the SCADA server. The raw in the .csv format. Secondly, transformation is ideal when configuring a standard database structure, which is important for subduing multiple instances of the same field that would eventually help smoothen the loading process. This section will deal with understanding, transforming, cleansing and clustering data. This process converts data or information from one format to another, usually from the format of a source system into the required format of a new destination system by using Power Query. Power Query does several functions such as removing columns, filtering, grouping data, splitting, extracting keywords, appending rows from another table, and making a conditional column. Hence, the transformation results have been separated according to user requirements according to hourly, daily, monthly, yearly, error code, alarm description, and alarm status.
Then, the alarm location will be clustered by using the VLOOKUP function to look up and retrieve data in a table. The "V" in VLOOKUP stands for vertical, which means that data in the table must be arranged vertically and in rows. VLOOKUP requires that the table be structured so that lookup values appear in the left-most column. Data that needs to be retrieved (result values) can appear in any column to the right. In this process, data that needs to be looked up is the KJL train body unit (BU), stations, substations, etc. There are 76 BTU, 38 stations including a depot, 30 substations, and other locations. Finally, data were loaded to the Power Pivot and an interactive chart as the Dashboard was developed.

Development of the dashboard
The dashboard is designed to display information based on user requirements, as illustrated in Figure 3. The interactive charts for the dashboard design use the Power Pivot. The tools are embedded in Microsoft Excel and most widely used by companies due to powerful data connectivity with various data sources and possessing automation capabilities [27]. This type of dashboard needs only intermediate Excel and basic programming skills which have already been mastered by younger employees in schools or universities; thus, no special programmer expertise is needed. A cheaper and more varied-packed manner can use these parts by creating Excel dashboards as indicators and statistics tools [28]. This dashboard was developed using Excel as requested by the user due to the ease of accessing it with several devices and for troubleshooting. Once the data have been established, and the transformed data is loaded into Power Pivot, the creation of the dashboard comes into place.  Figure 4 shows the SCADA dashboard. The type of alarm and frequency of events of the respective locations can be determined according to the user's input by slicer selectors and filter functions. Eight input slicers can be selected, which are yearly, monthly, daily, hourly, locations, tag_name, alarm type, and alarm description. This slicer is very important for sorting and filtering data while carrying out maintenance tracking processing for reporting purposes.

System testing and validation
The validation process is a preliminary check to ensure that it meets the criteria and requirements outlined in the project plan [29]. Validation is a routine process designed to ensure that reported values meet the quality goals of the data operations. Some of that validation can be performed by the technical team independently. Another aspect, especially in ensuring that the cleaning and filtering of data are correct, is that it must be performed by the primary dashboard users or their representatives. This validation activity is a routine activity that is conducted two or three times per month together with the experts from the S&C department and IT personnel to validate the function of the dashboard, to obtain feedback, as well as brainstorming ideas and knowledge to improve the dashboard prototype. Furthermore, participants were given an overview tutorial about the dashboard and explained its features and functionalities of the dashboard. This activity is important to establish the compliance of the dashboard output integrated well to the raw data and meet the expected right result.
During the test validation session, train body unit 88 was selected as a sample for the pilot test. Figure 5 shows the trend of BU 88 activities and events that occurred in December 2018. By looking at the trends in the dashboard, the user can choose which train should be monitored. Records refer to the trends that indicate the most dominant failure alarm based on the alarm description graph. The user can monitor weekly to trace the activities and failure alarms by selecting day slicers and choosing a date that needs to be viewed. For example, this study selected data for 1 week from 10 until 16 December for the pilot test. One-week data for trends shows that the highest record was the FC1 track brake. The alarms fall under fault category (FC) 1. Number 1 indicates the risk of a fault. There are 15 categories of FC in KJL, FC1 until FC4 in Table 2 represents the high-risk categories that require immediate action from the maintenance team. The rest of the When the description of the alarm was determined based on the data, it revealed a summary of FC 1 records for a week. There were 53 alarms, of which 49 automatically reverted to normal by the system, and there was four remaining alarm loss. Hence, the detailed status of the alarms can be traced hourly in the dashboard. Thus, if the status of the alarm and the normal situation is balanced or not, it means that the alarm has been cleared automatically. Unfortunately, if the fault alarm is more than a normal fault, further action by the maintenance specialist would be required. Therefore, with this dashboard, the user may find the details alarm as well as speed up the sorting and filtering in the data log process to prioritize maintenance and corrective action.
The visualize data in the dashboard matches the data logger captured and analyze manually. In previous practices, the company analyzes the data log manually, and it is time-consuming and prone to delay the maintenance work activities due to difficulty to prioritize work order. By using Excel, the dashboard's versatility enables it to incorporate vast amounts of data, options such as sorting, filtering, exporting, adding, deleting, scheduling, and updating, improving user experiences in their daily task rather than using traditional methods. Hence, this special feature through various options for further analysis was found to be a major strength of the dashboard and in line with the goals set by the S&C team. Pleasurable and friendly user interaction can improve the quality of the work-life of the user, as it helps users to access information at the right time with minimal effort and also keeps the dashboard dynamically stable [30].

CONCLUSION
This research process has successfully achieved its objectives. The dashboard development process had faced one limitation, which was the ETL data processing. The data were too complex to understand due to the different backgrounds of the knowledge. This led to a dependency on expertise from the S&C maintenance personnel regarding data cleansing, filtering, and clustering. Several discussions and evaluation sessions were held during the development with different experts from KJL to obtain their reports and reactions to the drawbacks of dashboard designs. Consequently, the dashboard can simplify the huge volume of data in a graphical interface to increase the readability and efficiency of the SCADA data log. This tool can assist them when making decisions to prioritize their corrective maintenance work order and fulfill the user's requirements with the responsive dashboard designed as well as incurring no additional funding. All data from the dashboard can be used by the maintenance team as supporting evidence for making the right decisions for quick maintenance action, which sometimes is related to costs, for example, staff overtime (OT), assets replacement, etc. Additionally, this study is aligned with Industrial Revolution 4.0, in which the railway industry must aggressively move towards a smart maintenance system that applies the latest technology. Therefore, this research was a good start for the KJL team that was looking forward to a smart maintenance system. In the future, it might be good to enhance the dashboard tools as a solution for predictive maintenance, whereby all this data can be gathered in real-time, while the tools allow managers to set up alerts to quickly notify them of important occurrences. This includes setting up audible or visible alerts in the dashboard for work orders that have remained open for a certain number of days.