Using machine learning techniques to predict the cost of repairing hard failures in underground fiber optics networks

Fiber optic cable has been adopted by telecommunication companies worldwide as the primary transmission medium. It is steadily replacing long-haul microwave, copper cable, and satellite transmission systems. Fiber cable has been deployed in underground, submarine, and aerial architectures to transmit high-speed signals over intercontinental, inter-country, inter-city and intra-city links. Underground fiber cable has experienced more major failures than other kinds of fiber transmission infrastructure. Failures are rampant; in particular, cables are cut frequently in areas with road construction, road expansion projects, and other development projects. The cost of repairing these failures is enormous, and it depends largely on the cause of failure and the geographical area in which the fault occurred. The main aim of this paper was to investigate the cost of repairing underground fiber cable failures, cluster the causes of faults, and then use feedforward neural networks (FFNN) and linear regression to predict the cost of repairing future faults. The predictive model is significant to the telecommunications industry because it lets industry players estimate the cost of repairing an underground optical network before a fault occurs. Depending on the area, the cause of the failure and the mean time to repair (MTTR), the predictive model tells mobile network operators the cost of repairing the damaged cable. The accuracy obtained indicates that the predictive model is suitable for predicting the cost of repairing fiber cable cuts in underground optical networks.


Literature review
The cost prediction in this work uses machine learning (ML) and deep learning models, for instance neural networks and K-means, to obtain the expected result. ML algorithms receive and analyze input data to predict output values, and their performance improves as they are fed new data; here they collect and analyze high volumes of data about fiber cable cuts. Applying ML modelling to this problem is a new research area, and no literature has so far been identified on the topic presented in this paper. However, a few works have provided cost-effective predictive models for big data processing and the quality of its results [5]. Ahmadvand and Goudarzi [6] presented, through simulation, a quick and effective mechanism to rank portions of data in big data processing for cost-effectiveness purposes. Warehouse-scale computers can then process the data more time- and cost-efficiently, or process part of the data with an optimized approximation that respects resource, time and energy limitations. Ahmadvand and Goudarzi [6] showed that different data portions, from the same or different sources, have different significance in determining the outcome of the computation, and hence that they can be prioritized so that more resources are assigned to processing the more essential data. A cost-effective predictive model using an adaptive genetic algorithm was adopted by Padhy, Singh and Satpathy [7] to predict fault tolerance in web service applications.
In an optical network transmission system, the installation and deployment of cables and plant is expensive. Critical protective measures have therefore been adopted over the years to safeguard the various components of the optical transmission ecosystem from damage. That notwithstanding, negligence on the part of some organizations and developers has led to the destruction of many optical cables. When cables are destroyed by known or unidentified developers, the affected mobile network operator (MNO) has to fund the repair of the optical cable quickly. This situation has led to a substantial loss of funds for the MNO, which arguably increases the Capex of the MNO. For long-haul transmission systems, the optical cables are mostly buried underground. When a cut occurs in an underground cable, the technical team must dig up the cable and repair it. The cost of repairing a fault in an underground optical cable is not known to the MNOs in advance. The use of machine learning to predict the cost of repairing a fault in fiber optic cable has not attracted the attention of the scientific research community. This study uses an ML model to predict the cost of repairing cuts in underground optical cables.

Hard failures in underground optical networks
Despite the massive infrastructure and the meshed optical networks built to support the deployment of services, mobile network operators (MNOs) continue to suffer fiber cable cuts, which degrade service quality and interrupt service to users. Hard failures refer to physical damage to the underground fiber optic cable, such as cable breaks or cuts caused by various activities [1,2]. The telecommunications industry in Ghana records a monthly average of 200 underground fiber cable cuts, which account for close to 38% of all network interruptions and obstruct annual quality of service targets. Cuts to underground fiber cables reduce optical network availability, affect customer experience, and increase the MNOs' cable repair costs. The rampant underground cable cuts further slow rural network expansion and affect last-mile connectivity strategies [1][2][3].
Private developers and road construction are the leading causes of the destruction of underground fiber cables. Additionally, drainage works, dig-ups and other utility service organizations that access the road reservation corridors have contributed immensely to underground fiber cable destruction. It is crucial to note that repeated cuts in an underground fiber cable degrade the optical signal, because each repair adds further splicing joints. This phenomenon severely affects the quality of service (QoS). To ensure QoS, MNOs are confronted with the challenge of replacing long spans of fiber cable to eliminate the multiple splicing joints, and this comes at huge cost and extends the time needed to restore network outages [2].
Other developing nations such as Nigeria, Kenya and India have experienced similar hard failures in underground optical networks. Studies have found that underground fiber cables were more likely to be cut in rural areas, even though several unapproved activities in urban areas have accounted for a higher number of fiber optic cable destructions in a given area [1,3].

K-means clustering
The K-means algorithm is an iterative algorithm that attempts to partition the dataset into K pre-defined, distinct, non-overlapping subgroups called clusters, where each data point belongs to only one group. The algorithm makes the data points within a cluster as similar as possible while keeping the clusters as different as possible. It assigns data points to clusters in such a way that the sum of the squared distances between the data points and their cluster's centroid is minimized. The less variation there is within a cluster, the more homogeneous its data points are. Clustering is one of the most common exploratory data analysis techniques used to get an intuition about the structure of the data. In this paper, the K-means algorithm grouped the dataset into K clusters [8].
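The iterative procedure described above can be sketched in a few lines of Python. This is only an illustration of the standard assignment and update steps (Lloyd's algorithm) under simple assumptions: random initial centroids drawn from the data, squared Euclidean distance, and a fixed iteration cap. It is not the Weka implementation used in the study, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Feature-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def k_means(points, k, iterations=100, seed=0):
    """Return (labels, centroids) after running Lloyd's algorithm.

    Assumption: initial centroids are k distinct data points chosen at random.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids
```

Calling `k_means(points, 2)` on a list of numeric feature vectors returns one cluster label per point together with the two centroids. Categorical attributes such as region or cause of cut would first need a numeric encoding, which this sketch does not cover.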
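Let $C_1, \dots, C_K$ denote sets containing the indices of the data points in each cluster. According to [9], these sets satisfy two properties. First, $C_1 \cup C_2 \cup \dots \cup C_K = \{1, \dots, n\}$, that is, each data point belongs to at least one of the K clusters. Second, $C_k \cap C_{k'} = \emptyset$ for all $k \neq k'$, that is, the clusters do not overlap.
If the ith data point is in the kth cluster, then $i \in C_k$. The within-cluster variation for cluster $C_k$ is a measure $W(C_k)$ of the amount by which the data points within a cluster differ from each other, as derived by [10] in Eqs. (1), (2), (3) and (4). Using the squared Euclidean distance, the within-cluster variation is:
$$W(C_k) = \frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 \tag{1}$$
where $|C_k|$ denotes the number of data points in the kth cluster and $p$ is the number of features. K-means seeks the partition that solves:
$$\underset{C_1, \dots, C_K}{\text{minimize}} \; \sum_{k=1}^{K} W(C_k) \tag{2}$$
$$\underset{C_1, \dots, C_K}{\text{minimize}} \; \sum_{k=1}^{K} \frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 \tag{3}$$
There are almost $K^n$ ways to partition n data points into K clusters, so the problem is solved iteratively using the identity:
$$\frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 = 2 \sum_{i \in C_k} \sum_{j=1}^{p} (x_{ij} - \bar{x}_{kj})^2 \tag{4}$$
where $\bar{x}_{kj} = \frac{1}{|C_k|} \sum_{i \in C_k} x_{ij}$ is the mean for feature j in the cluster $C_k$, which minimizes the sum-of-squared deviations.
Silhouette analysis was used to determine the degree of separation between the clusters. The silhouette coefficient takes values in the interval [−1, 1]:
If it is 0, then the sample is very close to the neighbouring clusters.
If it is 1, then the sample is far away from the neighbouring clusters.
If it is −1, then the sample is assigned to the wrong cluster.
For each sample i, let $a_i$ be the average distance from all data points in the same cluster, and let $b_i$ be the average distance from all data points in the closest cluster [10,11].
Hence, the computation of the coefficient is given in Eq. (5) as:
$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \tag{5}$$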
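As an illustration of how the silhouette coefficient is evaluated, the sketch below computes it per sample in the usual form: the average distance to the closest other cluster minus the average distance within the sample's own cluster, divided by the larger of the two. It assumes Euclidean distance, at least two clusters, and a score of 0 for a sample that is alone in its cluster (a common convention, not something stated in the paper). The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_coefficients(points, labels):
    """Return one silhouette coefficient per point (requires >= 2 clusters)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for a singleton cluster
            continue
        # a: average distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest average distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores
```

Averaging the returned values gives a single score for the whole clustering; values near 1 indicate well-separated clusters, and negative values flag samples that sit closer to another cluster than to their own.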

Feedforward neural networks
Feedforward neural networks fundamentally consist of three layers: the input layer, the hidden layer, and the output layer. An FFNN with one hidden layer of monotonically increasing differentiable activation functions is able to approximate continuous functions [12]. The FFNN model used in this paper has two input neurons and a hidden layer with two nodes. The inputs are the data collected from the field that determine the cost of fiber cable repairs: the cause of a fault and the area/region in which it occurred [13]. The output of the FFNN for the input pattern $z_p$ was computed with a forward pass through the multilayer network. According to [14], each output unit $o_k$ is given in Eqs. (6), (7) and (8) as:
$$o_{k,p} = f_{o_k}(net_{o_k,p}) \tag{6}$$
$$o_{k,p} = f_{o_k}\left( \sum_{j=1}^{J+1} w_{kj} \, f_{y_j}(net_{y_j,p}) \right) \tag{7}$$
$$o_{k,p} = f_{o_k}\left( \sum_{j=1}^{J+1} w_{kj} \, f_{y_j}\left( \sum_{i=1}^{I+1} v_{ji} \, z_{i,p} \right) \right) \tag{8}$$
where $f_{o_k}$ and $f_{y_j}$ are the activation functions of the output unit $o_k$ and the hidden unit $y_j$, $w_{kj}$ is the weight between the output unit $o_k$ and the hidden unit $y_j$, $v_{ji}$ is the weight between the hidden unit $y_j$ and the input unit $z_i$, and $z_{i,p}$ is the value of the input unit $z_i$ of the input pattern $z_p$.
The (I + 1)th input unit and the (J + 1)th hidden unit are the bias units representing the threshold values of the neurons in the next layer [15]. After applying the transform function to Eqs. (6), (7) and (8), our predictive model has $x_1$, $x_2$ and $x_3$ as the input variables of the model, with the weighted sum $\sum_{i=1}^{n} x_i w_i$ as the input to the activation function [16], as shown in Fig. 1.
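The forward pass described above, a weighted sum at each hidden unit, an activation, then a weighted sum and activation at each output unit, can be sketched as follows. The sketch assumes sigmoid activations by default and models each bias unit as an extra input fixed at 1.0 whose weight plays the role of the threshold; that sign convention, the function names and the weight layout are illustrative assumptions, not the paper's trained network.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(z, v, w, f_hidden=sigmoid, f_output=sigmoid):
    """Single forward pass through a one-hidden-layer FFNN.

    z: list of I input values for one pattern.
    v: J rows of I + 1 input-to-hidden weights (last entry is the bias weight).
    w: K rows of J + 1 hidden-to-output weights (last entry is the bias weight).
    Returns the list of K output activations.
    """
    z_ext = list(z) + [1.0]  # (I + 1)th input unit acts as the bias unit
    hidden = [
        f_hidden(sum(v_ji * z_i for v_ji, z_i in zip(row, z_ext)))
        for row in v
    ]
    y_ext = hidden + [1.0]  # (J + 1)th hidden unit acts as the bias unit
    return [
        f_output(sum(w_kj * y_j for w_kj, y_j in zip(row, y_ext)))
        for row in w
    ]
```

For a regression target such as repair cost, a linear output unit is a common choice; it can be obtained here by passing `f_output=lambda x: x`.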

Research design
According to the Ghana Chamber of Telecoms [1], the leading MNO in the country experiences 2000 underground fiber cable cuts every 6 months, and the other MNOs suffer about 1500 cuts over the same period. On average, 200 cable cuts are recorded monthly, which affects close to 38% of all network infrastructure. The study was designed to collect the necessary data for simulation, testing and evaluation. The Weka 3.8 machine learning tool was used for the simulation. Weka is a collection of machine learning algorithms for solving real-world data mining and big data problems. We used Weka because it is open-source machine learning software with a good graphical user interface and standard terminal applications, and it contains built-in tools for standard

Description of causes of fiber cable cut
Road expansion: Deliberate action taken to increase, extend or broaden an existing road.
Road construction: A project that involves the creation of a new road and drainage system.
Private developers: Any individual, firm, corporation or entity, other than a nonprofit corporation, limited profit entity, or public corporation, that acquires buildings or land in order to construct or refurbish building projects on the site.
Cable defect: A fault caused by internal damage to an underground fiber cable.
Water pipe laying: Laying pipes along or across the road.
Rodent chewing: When animals deliberately chew part of the underground fiber cable.
Dig ups: When electricity companies dig trenches to mount electricity poles along the road.
Railway construction: When existing track infrastructure is expanded or a new track is under construction.
Table 2 presents the distribution of the causes of fiber cable cuts against the total fiber cable cuts throughout the observation period, together with the cost of repairing each fault. According to Table 2, the cost of repairing a fiber cable cut caused by road expansion, road construction, private developers and so on is $3800, $3800, $4200, $4400 and so on, respectively. Table 3 shows the total fiber cable cuts in each region throughout the observation period and the total amount spent on repairing those faults. Table 3 reveals that region 1 (R1) recorded the highest number of cable cuts, followed closely by region 5 (R5), with the fewest cable cuts occurring in region 8 (R8). The pattern of the causes of cable cuts and the associated cost of repairs was critically examined and modelled to give an accurate cost value using machine learning techniques such as K-means clustering, FFNN and linear regression. The cost of repairing a faulty underground fiber cable depends largely on the cause of the cut, the location, the MTTR, and the texture of the soil in which the cable is buried. The location could be dry land, across a road, or a waterlogged area. The soil texture could be sandy, rocky or clayey. All these factors directly affect the cost of repairs and may even cause delays in tracing the fault [17,18].
The graphical representation of the predictive model is shown in Fig. 2a, b. The cost of repairing a faulty cable depends largely on the cause of the cable cut and the location at which the event occurred. The higher the MTTR, the larger the cost of repairing the fault. Fig. 2a indicates that the highest cost of cable repair was in R2, which was a result of a considerable number of cable cuts within the period of observation [18].
Fig. 3 compares the cost of repair against the total cable cuts for each cause of cut. It shows that road construction was the leading cause of cable cuts and also accounted for the highest total cost of repairing cable cuts.

Conceptual model for the prediction of cost
Data were collected from the field in eight regions. The field datasets included the cause of fiber cable cuts, the distance of cuts, the time of cuts, etc. The cost of repairing a fiber cut according to the Ghana Chamber of Telecoms is $3000. The datasets were integrated and pre-processed by data selection, data cleansing, and data normalization. The cleaned dataset was partitioned into an 80% training dataset and a 20% test dataset, as shown in Fig. 4. K-means clustering, FFNN with sigmoid activation, and linear regression were the machine learning algorithms [19] from the Weka library that were applied to the datasets to classify and predict the cost of repairing cut cables. The iterative character of an ML model is significant in that, as the model is exposed to new datasets, it can adapt autonomously. The model learns from previous computations of the cost of repairing faults and the cause of the cable cut to produce reliable and highly accurate results. We adopted the ML technique for our simulation because of its ability to classify, integrate and train new datasets into the predictive model to produce the expected output. The ML model was carefully chosen to provide good accuracy in our cost prediction research. The cost of repairing a fiber cable cut, y, is given by:
$$y = \text{cost of repair} + \text{revenue lost} \times \text{MTTR}$$
Mean time between failures (MTBF) is the average time elapsed from one failure to the next [4]. MTTR is the average time taken to repair faults after a failure. In Ghana, the MTTR for fault repair is ≤ 4 h.
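The cost relation stated above (the cost of repair plus the revenue lost multiplied by the MTTR) can be written as a small helper. The sketch assumes that revenue lost is expressed per hour of outage so that multiplying by an MTTR in hours yields a monetary amount; the paper does not state the units, and the revenue figure in the usage note is purely hypothetical.

```python
def total_repair_cost(cost_of_repair, revenue_lost_per_hour, mttr_hours):
    """Cost of a cable cut: direct repair cost plus revenue lost over the outage.

    Assumption: revenue_lost_per_hour is a rate, so the product with the
    MTTR (in hours) is in the same currency as cost_of_repair.
    """
    if cost_of_repair < 0 or revenue_lost_per_hour < 0 or mttr_hours < 0:
        raise ValueError("all inputs must be non-negative")
    return cost_of_repair + revenue_lost_per_hour * mttr_hours
```

For example, with the $3000 repair cost and the 4 h upper bound on MTTR quoted in the text, and an assumed loss of $500 per hour, `total_repair_cost(3000, 500, 4)` evaluates to 5000.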
Calculating the actual MTBF requires a set of observations of fiber cable cuts. Each observation consists of:
Uptime: The moment at which the optical network began operating (initially or after a repair).
Downtime: The moment at which the optical network stopped operating as a result of a cut, following the previous uptime moment.
Three quantities are required:
$n$ = the number of observations.
$u_i$ = the ith uptime moment.
$d_i$ = the ith downtime moment following the ith uptime moment.
So the mean time between failures is:
$$\text{MTBF} = \frac{\sum_{i=1}^{n} (d_i - u_i)}{n}$$
for all i = 1 through n cable cuts. More simply, it is the total working time divided by the number of failures.
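The MTBF calculation above, total working time divided by the number of failures, can be sketched as follows. Timestamps are assumed to be plain numbers in a common unit (for example hours since the start of observation); the function name and the sample values in the usage note are hypothetical.

```python
def mean_time_between_failures(uptimes, downtimes):
    """Average of (downtime - uptime) over all observed cable cuts.

    uptimes[i] is the moment the network started operating and
    downtimes[i] is the moment of the following cut, in the same unit.
    """
    if len(uptimes) != len(downtimes) or not uptimes:
        raise ValueError("need equally many uptime and downtime moments")
    total_working_time = 0.0
    for up, down in zip(uptimes, downtimes):
        if down < up:
            raise ValueError("a downtime moment precedes its uptime moment")
        total_working_time += down - up
    return total_working_time / len(uptimes)
```

With uptime moments `[0, 110, 250]` and downtime moments `[100, 240, 400]`, the working periods are 100, 130 and 150, so the function returns 380 / 3.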

Discussion and evaluation
The final cluster centroids produced by applying the K-means clustering technique to the dataset are shown in Table 4 and Fig. 3. Table 4 presents the two main clusters in the clustering process, Cluster 0 and Cluster 1. The clusters share the same attributes: Region, Cause of cut and Cost of cut repair. Figure 5 shows the graphical representation of the clusters and their degree of association. Silhouette analysis was used to determine the degree of separation between the clusters and the distance from all data points in the same cluster.
The classifier model (full training set) produced a result that made the clustering of the fiber cable cuts, causes of cut and cost of repairs a good one. Tables 5, 6, 7 and 8 present the nodes of the neural network used in the predictive model and the thresholds for the weights and inputs, respectively. The result obtained when the sigmoid activation function was applied to each node, in relation to the weights of the input variables, indicates a good correlation among the features used in the predictive model. A linear regression model [20,21] was used to train the multilayer perceptron FFNN that produced the predicted cost of repairing a cut cable, as shown in Fig. 6. Three inputs were fed into the multilayer perceptron FFNN cost predictive model, which was computed using Eq. (9). The output of the model [22][23][24] was the predicted cost of repair, which accurately correlates with the pattern of the clustered input data. The predicted cost [25][26][27] of repairing an underground fiber cable cut, according to the model, can be determined when variables such as the region and the cause of the cut are known. The empirical machine learning model [28] used in this paper must be updated with recent fault variables to be able to predict the actual cost of repairing the cable.

Results evaluation
Two evaluation metrics were adopted to measure the performance of the predictive model: the mean absolute error (MAE) and the root mean squared error (RMSE). Both measure the accuracy of the predicted cost of repairing a fiber cable cut. The accuracy obtained was good for the predictive model. MAE measures the average magnitude of the errors in a set of predictions, while RMSE is a quadratic scoring rule that also measures the average size of the error. Both MAE and RMSE express the average model prediction error; they range from 0 to ∞ and are indifferent to the direction of errors. The metrics are negatively oriented scores, which means lower values are better. Equations (12), (13) and (14) were used in the computation of the error metrics [10,11].
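For reference, the two error metrics can be computed as in the sketch below. It uses the standard textbook definitions, MAE as the mean of $|y_i - \hat{y}_i|$ and RMSE as the square root of the mean of $(y_i - \hat{y}_i)^2$, which are assumed to match the paper's error equations; the function names and the sample costs in the usage note are illustrative only.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_squared = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_squared)
```

For actual costs `[3000, 3800, 4200]` and predictions `[3100, 3700, 4200]`, the MAE is 200 / 3 and the RMSE is the square root of 20000 / 3. RMSE is never smaller than MAE because squaring weights large errors more heavily.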
The predictive model was able to predict the cost of repairing an underground fiber cable cut with a high accuracy rate. The MAE value obtained, as indicated in Table 9, shows the accurate prediction that this paper sought to achieve. We achieved a best value of 0.0658 for the MAE performance metric. The obtained MAE value shows that our model has high accuracy in predicting the cost of repairing underground fiber cables. The result is highly significant for the telecommunication industry, which experiences frequent underground fiber cable cuts and needs to determine the cost of repairing the cable as quickly as possible. The telecommunication company will be able to allocate the predicted cost of repair for the maintenance of the cable. Using a machine learning model for such industrial practice offers significant relief from unknown budgets and unreliable overhead costs, which may stifle the MNOs' Opex. The correlation coefficient (CC) indicates the strength of the relationship between the actual cost of repairing a fiber cable and the predicted cost.
Additionally, the accuracy obtained was good for the predictive model. The two main evaluation metrics used to measure the accuracy of the prediction were the mean absolute error (MAE) and the root mean squared error (RMSE). According to [10] and [11], lower values of MAE and RMSE are better, which supports the results of the model deployed in this paper. The model also gave a correlation coefficient of 0.5179, indicating a moderate positive correlation between the actual and the predicted cost. The performance metrics met the empirical threshold of measurement, which makes our result relevant for predicting the expected cost when hard failures occur in underground optical network infrastructure.
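The correlation coefficient between actual and predicted cost can be illustrated with the sketch below. The paper does not say which coefficient was reported, so the sketch assumes the Pearson product-moment correlation, which is the usual choice for regression output; the function name is hypothetical.

```python
import math


def pearson_correlation(actual, predicted):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    covariance = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    spread_a = sum((a - mean_a) ** 2 for a in actual)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    if spread_a == 0 or spread_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(spread_a * spread_p)
```

The result lies in [−1, 1]: values near 1 mean the predictions rise and fall with the actual costs, values near 0 mean little linear relationship, and negative values mean the predictions move in the opposite direction.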

Conclusion
The main aim of this paper was to investigate the cost of repairing underground fiber cable failures, classify the causes of faults, and then use an ML model to predict the cost of repairing future faults. We achieved this by carefully collecting a dataset and then using the Weka ML simulation tool. The predictive model is significant to the telecommunications industry because it lets industry players estimate the cost of repairing an underground optical network before a fault occurs. Depending on the area, the cause of the failure and the MTTR, the predictive model tells mobile network operators the cost of repairing the damaged cable.
Road construction, private developers, water pipe installation, railway construction, etc. are some of the causes of underground fiber cable cuts indicated in the dataset used for predictive modelling. The K-means clustering model provided balanced clustering with accurate centroid computation. The multilayer perceptron FFNN was analyzed and evaluated with the best MAE value, which gives the result of this paper good, acceptable accuracy in predicting the cost of repairing fiber cable cuts. This accuracy means that MNOs can now estimate the cost of repairing a cut in an underground optical cable before it happens. The cost of repairs depends on the features used for prediction, namely the region and the cause of the cut.
There have been serious and rampant fiber cable cuts in Ghana, parts of Africa and other low-income countries. The obtained MAE value shows that our model has high accuracy in predicting the cost of repairing underground fiber cables. The result is highly significant for the telecommunication industry, which experiences frequent underground fiber cable cuts and needs to determine the cost of repairing the cable as quickly as possible. The telecommunication company will be able to allocate the predicted cost of repair for the maintenance of the cable. Using a machine learning model for such industrial practice offers significant relief from unknown budgets and unreliable overhead costs, which may stifle the MNOs' Opex.