
Machine learning-based turbulence-risk prediction method for the safe operation of aircrafts


This study proposes a method for predicting turbulence, a primary factor that influences safe aircraft operation. Because the number of observed turbulence events is limited, a workflow is required that can detect turbulence events from a small number of samples. In addition, the opinions and experience of pilots must be reflected from the initial stage, because a high risk of turbulence can result in airline operations being cancelled. Thus, the proposed method predicts turbulence occurrence based on the turbulence occurrence dates provided by airlines and meteorological data sets obtained from open data available in Japan, used as teacher data. Because commonly used machine learning methods were unable to detect the turbulence occurrence dates, the proposed method employs principal component analysis coupled with the k-means method to generate risk clusters with a high likelihood of turbulence occurrence, which are then checked statistically. Subsequently, the risk clusters are used as supervisory data for turbulence occurrence, and a support vector machine is used to predict turbulence occurrence. The results obtained with the proposed method were checked statistically and verified in practice by a pilot to confirm the appropriateness of the predicted turbulence occurrence dates.


One of the most important requirements for airlines is to provide a comfortable space for customers, and avoiding and mitigating aircraft shaking is a crucial factor. Turbulence is among the common causes of aviation accidents [1, 2]. In addition, the potential increase in aircraft turbulence owing to the effects of global warming is a prevalent concern [3].

When a pilot reports encountering one or more instances of severe turbulence during a flight, the corresponding aircraft must undergo maintenance work to confirm its airworthiness. Therefore, turbulence remains a major issue for airlines. In addition, if the maximum acceleration recorded exceeds the operational acceleration limit of the aircraft, the scope of maintenance work increases considerably, thereby significantly impacting aircraft operation schedules. Therefore, airlines must strive to avoid severe turbulence to the greatest extent possible. However, because turbulence reports rely primarily on the opinions of pilots, which tend to vary, variation among reports is inevitable.

Consequently, this study proposes a method for predicting turbulence occurrence, with the aim of contributing to the safe and comfortable operation of aircraft. Figure 1 outlines this method, which involves the accumulation and aggregation of open data and quick access recorder (QAR) data [4, 5], the prediction of turbulence using machine learning methods, and the feedback of the results to airlines and pilots. Flights to and from Matsumoto Airport in Japan, on E-170 aircraft operated by Fuji Dream Airlines (FDA), have been observed to frequently experience turbulence during the winter season. In this study, Matsumoto Airport was considered as the model airport representing mountainous areas subject to turbulence. The proposed technique can also be adapted to other airports.

Fig. 1 Framework of the proposed method

For conducting the study, meteorological data from Japan and turbulence information provided by FDA were used. Because turbulence is a relatively rare event, first, the risk cluster was estimated. To this end, a principal component analysis (PCA) of the meteorological data was conducted to obtain a projection matrix \({\varvec{W}}\) such that the number of dimensions of the data to be analyzed was reduced. Subsequently, using the turbulence-occurrence indicator and meteorological data transformed by \({\varvec{W}}\), the k-means method was employed to calculate the risk cluster, which is required for predicting the days with turbulence risk for meteorological data from the year 2019 through support vector classification (SVC). The results based on this meteorological data revealed that the prediction method accurately identified the days with a risk of turbulence.

Related work

Most existing research on turbulence prediction has been performed from a meteorological perspective [6, 7], including studies that examine past turbulence incidents [8]. For an event that occurred in central Colorado on January 11, 1972, optimal conditions for strong mountain wave generation were detectable from sounding data 12–24 h in advance and approximately 1000 km upstream [9]. Further, for a fatal accident involving a light aircraft near Clonvina Inn, Victoria, Australia, on July 31, 2007, the observed environment was analyzed and the region where turbulence intensified was identified through a three-dimensional simulation [10]. When a Boeing 777 encountered severe clear-air turbulence (CAT) over western Greenland at an altitude of 10 km on May 25, 2010, digital flight data recorder (DFDR) analysis and high-resolution numerical simulations confirmed that a high-resolution non-hydrostatic simulation model could predict mountain-wave turbulence (MWT) [11]. Thus, understanding past examples is crucial to identifying and predicting the conditions for turbulence.

Currently, analysis using the QAR (Quick Access Recorder) data on board the aircraft is also under consideration for predicting turbulence. Further, new methods for estimating eddy dissipation rate (EDR), considered as a measurement of turbulence, through QAR [12], comparison of calculation algorithms [13], and development of QAR data analysis software to calculate meteorological quantities such as three wind components, wind shear risk coefficient, and turbulence intensity parameters [14] have been proposed as well.

In the current aviation industry, a method for turbulence detection involves the use of Doppler lidar [15,16,17]. A laser beam (using a wavelength band that is safe for the pilot’s eyes) can be fired into the atmosphere to observe winds in the sky. Although CAT cannot be detected by conventional aviation weather radars, airborne predictive windshear (PWS) radars enhanced with algorithms designed for turbulence detection and long-range airborne Doppler lidars have been developed and operated [18,19,20]. Consequently, turbulence detection using these systems has resulted in a reduction in the number of turbulence encounters by alerting pilots to the possibility of encounters.

However, in these studies, data were acquired in real time from many sensors and analyzed using a time-series approach [21]. Although turbulence forecasting with pinpoint accuracy is desirable, preparing a suitable environment for the sensors results in significant cost, and thus, it is infeasible for airlines.

In recent years, owing to the accumulation of aviation data and improvements in computational speed, the concept of turbulence prediction via machine learning has been introduced [22, 23]. However, studies concerning this subject are limited. Furthermore, determining an optimal machine learning approach for turbulence prediction is challenging. Moreover, open data (such as meteorological data) need to be utilized to improve analysis accuracy, as they can aid in the development of turbulence predictions that can be logically deduced from the data provided by the airlines. For example, a detailed study of the causes of 700 fatal aviation accidents involving commercial airliners that occurred worldwide between 1990 and 2006 found that the composition of accident causes varied greatly depending on the region of the world, type of operation, and category of aircraft [24]. Further, another study proposed a turbulence prediction algorithm based on the examination of turbulent weather phenomena and aircraft operations using a stepwise multiple regression analysis model [25].

Thus, the above discussion reiterates the importance of developing a system that can predict turbulence, which is among the most common causes of aviation accidents, independent of the equipment and environment used to acquire the data.

This study attempted to approach the problem through machine learning from a non-meteorological perspective. PCA was employed to generate risk clusters for the data and determine the prediction accuracy. Several studies have followed a procedure similar to that of this study [26]. For instance, to identify the relevant genes for gene expression classification, the data were passed through PCA and independent component analysis methods, and based on the variants of the class obtained, the selected elements were individually transformed to lower dimensions. The classification performance of the experiment was then evaluated using a support vector machine kernel classifier [27]. Further, in classifying colon cancer microarray data, PCA and partial least squares (PLS) have also been used to extract more features [13].

However, this approach has not previously been applied to turbulence analysis. Therefore, this study examined the possibility of applying it as a new method in the field of turbulence prediction.

Basic analysis of turbulence at Matsumoto Airport

This section describes a basic analysis of the data collected at Matsumoto Airport.

Examples of the effects of turbulence on flights from Matsumoto Airport

Considering the topographical characteristics shown in Fig. 2, it can be inferred that flights operating from Matsumoto Airport are susceptible to mountain waves [28] from the Northern Alps, particularly on the route toward New Chitose Airport.

Fig. 2 Impact of mountain waves on flights departing from Matsumoto Airport

Table 1 summarizes the turbulence, presumably caused by mountain waves, reported by flights departing from the Matsumoto Airport. The authors were present on the flight that departed on December 27th, 2017, to gain a real-world understanding of the level of turbulence faced during a flight. Table 2 shows the meteorological conditions during operation on the day of the turbulence event. These values were found to be significantly different from the conditions during normal operations.

Table 1 Examples of the impact of turbulence on operations
Table 2 Weather conditions at the time of operation on days when turbulence occurs

Visualization of the wind direction, speed of mountain waves, and sway of aircrafts

The first step toward solving the problem involves visualizing the turbulence and its resulting impact on operations. Thus, a visualization depicting a severe turbulence scenario was created, wherein the altitude changes during turbulence were modeled as per the flight of an E170 aircraft. Further, the aircraft altitude at every second was depicted using Google Earth Pro 7.3.4. Figure 3 shows the visual representation of a journey via FDA Flight 211 in January 2018, wherein the pilot encountered severe turbulence during ascent. Latitude, longitude, altitude, heading, pitch, and roll recorded by the aircraft were reflected in the parameters of Google Earth for accurate rendering. The average wind directions during the turbulence were represented using red lines. In addition, the wind blew over the Northern Alps directed towards the aircraft (from the back to the front of the figure). Consequently, significant altitude changes were observed during this period.

Fig. 3 Visualization of altitude changes owing to turbulence

Elementary analysis of turbulence occurrences using open data

To create the dataset, weather information from October 1, 2017, through March 31, 2018, was obtained from Sunny Spot [29] and the website of the Japan Meteorological Agency [30]. In addition, an environmental database provided by Iowa State University [31] was used. Subsequently, a dataset with 165 rows and 45 columns was created as the explanatory variables. Table 3 summarizes the items in this dataset. Using real-world QAR data from a pilot report provided by FDA, daily Yes/No values were obtained indicating whether any FDA flight that either departed from or landed at Matsumoto Airport encountered turbulence of a “moderate-plus” (greater than moderate) or higher level during the observation period. Three instances of moderate-plus turbulence exist in the data used in this study. These data were described based on “location-time-altitude-type.”

Table 3 List of data used

Figure 4 illustrates the boxplots of fx106-03-500-spd, Wajima-12-700-temp, Matsumoto-12-500-hum, and fx106-03-500-shear; here, all data are normalized. The circle, triangle, and square in each boxplot represent the instances of turbulence in the data. In addition, on the days when turbulence was observed, the wind speed and shear were high, while the temperature was low [33].

Fig. 4 Dataset comparison by wind speed, temperature, humidity, and shear difference


Turbulence-occurrence analysis using PCA

Owing to a lack of sufficient data for observing patterns in annual turbulence, predicting its occurrence through supervised learning is challenging [34]. In addition, weather conditions may affect operations on days other than those on which turbulence was reported. Further, meteorological data comprise many explanatory variables, and determining which variables contribute to turbulence is complex. Thus, in this study, risk clusters were formed using PCA and statistical information on their weather conditions. The aims were to supplement the scarce information on the days of turbulence occurrence, to represent the weather conditions affecting flight operations, and to contribute to the decision-making process of pilots and airlines when operating flights in high-risk environments. A method for determining forecast accuracy was applied as well. First, the dimensionality of the explanatory variables was reduced via PCA, and the resulting weights were used to calculate the risk clusters with the k-means method. The risk clusters obtained were then used to predict the occurrence of turbulence through SVC. The program was executed in a Python 3.7.0 environment with scikit-learn version 0.23.2. The algorithm is described as follows.

  1. Creation of a dataset for turbulence prediction, using open data

  2. Calculation of the turbulence risk cluster

    (a) A projection matrix \({\varvec{W}}\) is created via PCA [35]

      • (i) Let the i-th data point be \({\varvec{x}}_{i}\), let \(\overline{x}\) be the sample mean, and let \({\varvec{Y}}\) be the data matrix \({\varvec{X}}\) with \(\overline{x}\) subtracted from every row: \({\varvec{Y}} = \left( \begin{array}{c} \left( {\varvec{x}}_{1} - \overline{x} \right)^{T} \\ \vdots \\ \left( {\varvec{x}}_{n} - \overline{x} \right)^{T} \end{array} \right) = \left( x_{ij} - \frac{1}{n} \sum \nolimits_{k = 1}^{n} x_{kj} \right)\). The covariance matrix \({\varvec{S}}\) of \({\varvec{X}}\) is as follows: \({\varvec{S}} = \frac{1}{n} \sum \nolimits_{i = 1}^{n} \left( {\varvec{x}}_{i} - \overline{x} \right)\left( {\varvec{x}}_{i} - \overline{x} \right)^{T} = \frac{1}{n}{\varvec{Y}}^{T} {\varvec{Y}}\)

      • (ii) Let \({\varvec{S}}\) be decomposed by singular value decomposition, \({\varvec{S}} = {\varvec{U}}\Sigma {\varvec{V}}^{T}\), and let \({\varvec{V}}^{\left( c \right)}\) be the matrix of the first \(c\) columns of \({\varvec{V}}\), where \(c\) is the number of dimensions retained after dimensionality reduction. Let \({\varvec{W}} = \left( {\varvec{V}}^{\left( c \right)} \right)^{T}\).

    (b) The data are converted to the principal component (PC) vector \({\varvec{Z}}\) using \({\varvec{W}}\), such that \({\varvec{Z}} = {\varvec{Wx}}.\)

    (c) The risk clusters are generated based on \({\varvec{Z}}\) using the k-means method [19].

      • (i) If the set of indices of the transformed points \({\varvec{z}}_{i}\) belonging to the j-th cluster is \(I_{j}\), the center of gravity \(G_{j}\) of the cluster is \(G_{j} = \frac{1}{|I_{j}|} \sum \nolimits_{i \in I_{j}} {\varvec{z}}_{i} .\)

      • (ii) For each \({\varvec{z}}_{i}\), calculate the distance from each center of gravity, assign the point to the cluster with the closest center, and repeat until the assignments no longer change.

  3. Prediction of turbulence using risk clusters

    (a) Prediction of turbulence-occurrence dates via SVC [35, 36]. The test data set \({\varvec{X}}^{\prime}\) is converted using \({\varvec{W}}\): \({\varvec{Z}}^{\prime} = {\varvec{W}}{\varvec{X}}^{\prime}.\)

      • (i) Consider the following optimization problem for a map \(\phi :{\mathbb{R}}^{c} \to {\mathbb{R}}^{c} ,\)

        Maximize \(f\left( {\varvec{a}} \right) = \sum \nolimits_{k = 1}^{n} a_{k} - \frac{1}{2} \sum \nolimits_{k = 1}^{n} \sum \nolimits_{l = 1}^{n} a_{k} a_{l} y_{k} y_{l} K\left( {\varvec{z}}^{\prime}_{k} , {\varvec{z}}^{\prime}_{l} \right).\)

        Subject to \( \sum \nolimits_{i = 1}^{n} a_{i} y_{i} = 0,\) \(0 \le a_{i} \le C.\)

        Where \(\phi \left( {\varvec{z}}^{\prime}_{k} \right)^{T} \phi \left( {\varvec{z}}^{\prime}_{l} \right) = K\left( {\varvec{z}}^{\prime}_{k} ,{\varvec{z}}^{\prime}_{l} \right), \;{\varvec{a}} = \left( a_{1} ,a_{2} , \ldots ,a_{n} \right)^{T} , \;y_{i} \in \left\{ - 1,1 \right\}.\)

    (b) Validation of predicted turbulence-occurrence dates.
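The PCA step of the algorithm above (centering, covariance matrix, singular value decomposition, and projection) can be sketched in NumPy as follows. This is a minimal illustration on synthetic data, not the authors' code: the function name `pca_projection`, the random input, and the convention that the rows of `W` hold the principal directions (so that each score is `W @ (x - x_bar)`) are assumptions made for this sketch.

```python
import numpy as np

def pca_projection(X, c):
    """Return (W, x_bar); the rows of W are the top-c principal directions."""
    x_bar = X.mean(axis=0)
    Y = X - x_bar                        # centered data matrix, n x d
    S = (Y.T @ Y) / X.shape[0]           # covariance matrix, d x d
    # SVD of the symmetric matrix S; singular values come sorted in
    # descending order, so the first rows of Vt are the leading directions.
    U, sigma, Vt = np.linalg.svd(S)
    W = Vt[:c]                           # c x d projection matrix
    return W, x_bar

# Synthetic stand-in with the same shape as the dataset in the text
rng = np.random.default_rng(0)
X = rng.normal(size=(165, 45))

W, x_bar = pca_projection(X, c=13)
Z = (X - x_bar) @ W.T                    # row i is z_i = W (x_i - x_bar)
```

With this convention the columns of `Z` are uncorrelated and ordered by decreasing variance, which is what allows the later steps to work with only the first few coordinates.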

Dimensionality reduction and coordinate transformation in PCA

PCA was employed to determine the factors that cause turbulence. Figure 5 depicts a plot for each observation date, with PCs 1 and 2 forming the x- and y-axes, respectively; the points indicated by arrows represent the three actual instances of turbulence. Flights with turbulence are plotted in the upper-right part of the figure. Figure 6 illustrates a scatter plot of the elements on the plane of the first and second PCs. As can be observed, the wind speed, contour line, and trough elements are concentrated in the upper-right quadrant along the PC1 axis, and the temperature elements in the upper-left. It can be inferred that the farther to the right a point lies along PC1, the higher the wind speed and the lower the temperature (i.e., there are many contour lines). Further, the humidity elements are at the top of the plot, while the wind direction and cloud height elements are at the bottom, indicating that when humidity is high, the wind direction is negative. Thus, when most of the wind is from the west, the wind from the southwest exerts a significant influence on the aircraft when turbulence occurs. Furthermore, the cloud height is low toward the top of the PC2 axis. The values of these PC loads are listed in Table 4. Figure 5 reveals that in PC4, troughs and cloud height were major influencing factors on the days when turbulence occurred. Further, it can be observed that the wind speed difference significantly influences PC5.

Fig. 5 Scatter plot of PC1 and PC2

Fig. 6 Plot of each element on PC1 and PC2 planes

Table 4 Value of each PC load

The cumulative contribution rate from PC1 to PC2 was determined as 43.23%, with 13 components required to achieve a cumulative contribution of at least 80%. Therefore, 13 PCs were considered to obtain the matrix \({\varvec{W}}\) that performs coordinate transformations based on \({\varvec{Z}} = {\varvec{Wx}}\). Here, \({\varvec{x}}\) is the original data and \({\varvec{Z}}\) is the coordinate after transformation.
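The rule used here, keeping the smallest number of PCs whose cumulative contribution rate reaches 80%, can be expressed in a few lines. The sketch below is illustrative only: the helper name `components_needed` and the contribution rates in `ratios` are hypothetical and are not the values from Table 4.

```python
import numpy as np

def components_needed(explained_ratio, threshold=0.80):
    """Smallest number of leading PCs whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(explained_ratio)
    # searchsorted returns the first index whose cumulative value is
    # >= threshold; adding 1 converts the index into a component count.
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical contribution rates, sorted in descending order
ratios = [0.30, 0.13, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.03, 0.02]
n_pcs = components_needed(ratios, threshold=0.80)
```

In scikit-learn, passing a fraction such as `PCA(n_components=0.80)` applies a similar variance-based selection automatically; whether the authors chose the 13 components this way or by inspection is not stated in the text.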

Calculation of risk clusters by k-means method

Using the coordinate transformation matrix obtained from the PCA described in the previous section, the risk cluster was calculated with the k-means method applied to the coordinates \({\varvec{Z}}\) transformed via \({\varvec{W}}\). Figure 7 shows the resulting classification into six clusters. Clusters where turbulence was expected to occur are indicated in red; they included almost all the dates on which turbulence was observed, as presented in Table 1. Although Cluster ID 5 might have been affected by turbulence, the turbulence did not significantly affect flight operations. It is also probable that the other clusters were less affected by turbulence.
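A minimal sketch of this clustering step with scikit-learn is given below. The PC scores are synthetic, the row indices in `turbulence_days` are hypothetical, and the rule used to designate risk clusters (every cluster that contains at least one observed turbulence day) is an assumption for illustration; the text does not spell out the exact selection rule.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
Z = rng.normal(size=(165, 13))               # synthetic PC scores
turbulence_days = np.array([20, 87, 120])    # hypothetical row indices
Z[turbulence_days] += 4.0                    # make those days stand out

# Six clusters, as in the text; n_init and random_state are set
# explicitly so that the run is reproducible.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(Z)
labels = kmeans.labels_

# Assumed rule: a cluster is a risk cluster if it holds an observed event
risk_cluster_ids = np.unique(labels[turbulence_days])
is_risk_day = np.isin(labels, risk_cluster_ids)
```

The boolean vector `is_risk_day` then plays the role of the supervisory label: it marks many more days than the three observed events, which is what makes a subsequent supervised classifier trainable.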

Fig. 7 Six clusters obtained via the k-means method

Figure 8 presents a comparison of the risk clusters with the other clusters. It is evident that the risk clusters exhibit faster wind speeds, lower temperatures, lower humidity, and larger wind speed differences. Further, a t-test or Welch’s test conducted on the risk cluster and the other clusters yielded p-values below 0.05, confirming that the means of the two groups were significantly different for all the items in Fig. 8.
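The group comparison can be reproduced in outline with Welch's unequal-variance t-test from SciPy. The two samples below are synthetic wind-speed values with hypothetical means, spreads, and group sizes; only the test call itself reflects the procedure named in the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical wind speeds for risk-cluster days and all other days
risk_wind = rng.normal(loc=60.0, scale=8.0, size=25)
other_wind = rng.normal(loc=40.0, scale=12.0, size=140)

# equal_var=False selects Welch's test, which does not assume that the
# two groups share a common variance.
t_stat, p_value = stats.ttest_ind(risk_wind, other_wind, equal_var=False)
significant = p_value < 0.05
```

Welch's variant is a reasonable default here because the risk cluster is much smaller than the remainder of the data, so equal variances cannot be taken for granted.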

Fig. 8 Comparison of risk cluster and other clusters

Result and discussion

Turbulence prediction for validation data

The risk clusters described in the previous section were used to predict the occurrence of turbulence using 179 data points collected in 2019 for the items listed in Table 3. After these data were normalized, axis transformation was performed using the transformation matrix \({\varvec{W}}\) described in the previous section.

Calculation of risk date using the risk cluster via SVC

The risk cluster was used as the training data to predict the turbulence dates for the 2019 data using SVC. Table 5 lists the validation data and SVC parameters.
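The prediction step can be sketched as follows, assuming that the normalization and PCA projection fitted on the training period are reused unchanged for the 2019 data. All arrays are synthetic, `risk_label` is a hypothetical stand-in for risk-cluster membership, and the SVC settings shown are scikit-learn defaults rather than the parameters in Table 5.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X_train = rng.normal(size=(165, 45))         # synthetic training period
X_2019 = rng.normal(size=(179, 45))          # synthetic validation period
# Hypothetical stand-in for "day belongs to a risk cluster"
risk_label = (X_train[:, 0] + X_train[:, 1] > 1.5).astype(int)

# Fit normalization and projection on the training period only
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=13).fit(scaler.transform(X_train))

Z_train = pca.transform(scaler.transform(X_train))
Z_2019 = pca.transform(scaler.transform(X_2019))

# Assumed SVC settings; Table 5 holds the values actually used
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(Z_train, risk_label)
predicted_risk = clf.predict(Z_2019)         # 1 = predicted risk day
```

Fitting the scaler and the projection on the earlier period and only applying them to 2019 keeps the validation data from influencing the transformation.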

Table 5 Usage data and SVC parameters

Figure 9 presents a comparison of the days that were predicted to exhibit turbulence with those that were not. It is evident that the distributions of wind speed and temperature are similar to those of the risk cluster. For fx106-03-500-shear, all values between 10/01/2019 and 12/31/2019 equaled 0. It was concluded that the days with predicted turbulence exhibited strong wind speeds, low temperatures, and large wind speed differences.

Fig. 9 Comparison of days with and without turbulence predicted for the 2019 data

Verification of forecasted turbulence dates via SVC

Using a weather map, the days with a risk of turbulence predicted by the risk clusters and SVC were verified; the results are summarized in Table 6. Turbulence risk was assigned on four levels in increasing order of risk, 1 (normal), 2 (caution), 3 (warning), and 4 (critical), to make it easier to propose to airlines. The highest risk was observed on January 9, 2019, when flight cancellations were considered. Moreover, even on the dates when the turbulence risk level was at least 2, passenger safety, if not flight cancellation, was seriously considered. Therefore, it was confirmed that this analysis can adequately predict turbulence-risk days.

Table 6 Verification of the predicted turbulence days using the weather map

Figure 10 compares the per-minute average of the maximum standard deviations (SDs) of the vertical sway of the aircraft [37, 38], obtained from actual QAR data, between the turbulence dates predicted via SVC and the other days. However, 02/04/2019 was excluded because QAR data could not be obtained for that date. From left to right, the graph shows the average of the maximum SDs of the vertical sway, followed by the values for climb and descent. The vertical sway is generally larger on the days for which turbulence is predicted, and considerable shaking is observed during descent.
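The exact definition of the sway metric is not given in the text. One possible reading, sketched below, computes the SD of a vertical-motion signal within each one-minute window, takes the maximum over the flight, and averages that maximum across flights. The function name `max_minute_sd`, the sampling rate, and the sample signals are all hypothetical.

```python
import numpy as np

def max_minute_sd(vertical_signal, samples_per_minute):
    """Largest per-minute standard deviation of a vertical-motion signal."""
    n_minutes = len(vertical_signal) // samples_per_minute
    if n_minutes == 0:
        raise ValueError("signal is shorter than one minute")
    # Drop the incomplete final minute, then view the data as one row per minute
    trimmed = np.asarray(vertical_signal[: n_minutes * samples_per_minute], dtype=float)
    per_minute_sd = trimmed.reshape(n_minutes, samples_per_minute).std(axis=1)
    return float(per_minute_sd.max())

# Two hypothetical flights sampled once per second
calm_flight = [1.0] * 120
bumpy_flight = [1.0] * 60 + [0.0, 2.0] * 30   # second minute oscillates

flights = [calm_flight, bumpy_flight]
mean_of_max_sd = float(np.mean([max_minute_sd(f, 60) for f in flights]))
```

Applying the same function separately to the climb and descent segments of each flight would give phase-specific values comparable to those shown in the figure.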

Fig. 10 Comparison of the average of the maximum SDs of vertical sway for days with and without predicted turbulence from QAR data

Comparison with other methods

The proposed method was compared with other machine learning methods. Table 7 shows the results of validating the data in Table 3 using 10-fold cross-validation (K = 10), where the records for the days when no turbulence occurred were set as true, and the accuracy was the highest among the methods used. For all methods and models, the false negative (FN) item, which captures the days on which turbulence is detected, is 0, indicating that none of them detected a turbulence occurrence.
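Why accuracy alone says little when only three of 165 days are turbulence days can be illustrated with a baseline that always predicts the majority class. The sketch below uses synthetic features and hypothetical event positions, and codes turbulence days as 1, a labelling chosen for this example that differs from the convention used in Table 7.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(4)
X = rng.normal(size=(165, 45))               # synthetic features
y = np.zeros(165, dtype=int)
y[[20, 87, 120]] = 1                         # three hypothetical turbulence days

# Baseline that ignores the features and predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent")
cv = KFold(n_splits=10, shuffle=True, random_state=0)
y_pred = cross_val_predict(baseline, X, y, cv=cv)

accuracy = accuracy_score(y, y_pred)                       # 162 / 165
turbulence_recall = recall_score(y, y_pred, pos_label=1)   # 0 of 3 found
```

The baseline reaches about 98% accuracy without identifying a single turbulence day, so per-class counts are needed alongside accuracy when comparing methods on data of this kind.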

Table 7 Comparison with other machine learning methods


This study used open data to predict the occurrence of turbulence to render aircraft operations safer and more comfortable. Although turbulence occurs infrequently, it is a leading cause of aircraft damage and changes in flight schedules. The findings of this study are twofold. First, following the confirmation of the statistical information using the risk clusters, they were used as supervisory data to make appropriate predictions even for low frequency events such as turbulence. Moreover, the turbulence-risk cluster was derived through k-means clustering after reducing the dimensions of available data via PCA, instead of using the rare instances of turbulence as the training data. In addition, the process of creating risk clusters provided an opportunity to examine the factors that influenced turbulence occurrence. In the case of high-risk events such as aircraft operations, this can have a synergistic effect with the experience and knowledge of the pilots themselves. Further, using this turbulence-risk cluster as training data, the turbulence occurrences for 2019 were predicted through SVC, with the obtained results being confirmed to be sufficiently accurate for utilization by pilots. Second, it was found that using open data, the prediction of turbulence occurrence was possible. Further, the meteorological data used in this study is routinely used by pilots and airlines, and thus can be used at airports other than the one covered in this study.

However, there exist certain issues that need to be addressed in the future. As the present study was focused on aircraft taking off from and landing at airports in Japan, the impact of the season is significant. The occurrence of turbulence over Japan is concentrated in the winter season. Although the present data can be used to predict turbulence in Japan, further data is essential to cover all regions of the world. Moreover, to generalize the model, the availability of such data in various parts of the world must be investigated.

Although this study was conducted for predicting turbulence occurrence for Matsumoto Airport, the same method can be employed to analyze turbulence for other airports. Further, it is suggested that the prediction accuracy of the proposed technique can be improved via the combination of daily aircraft data with open data, such as weather data. The proposed method is expected to aid in turbulence prediction and also result in increased systems expertise and technological advancements in combating turbulence, to compensate for future human resource shortages in aviation.

Availability of data and materials

All data generated or analyzed during this study are included in this published article. The publicly available dataset and the source code for the analysis can be found at the following link Github:



PCA: Principal component analysis

FDA: Fuji Dream Airlines

SVC: Support vector classification

QAR: Quick access recorder

SD: Standard deviation


  1. Digest of aircraft accident analyses for prevention of accidents due to the shaking of the aircraft. In: JTSB digest, vol. 15. Japan Transport Safety Board. 2015. Accessed 7 Dec 2020.

  2. Cabinet Office in Japan. White paper on traffic safety in Japan 2018. Accessed 7 Dec 2020.

  3. Regular Airlines Association. Movements towards alternative aviation fuel use in the aviation industry. Accessed 12 Sept 2018.

  4. International Aircraft Development Fund. Trends in flight data analysis technology (FDM/FOQA). Accessed 24 Dec 2020. (In Japanese).

  5. Dubois P, AIRBUS—Airlines SMS & FDA Assistance. Flight data analysis. Accessed 24 Dec 2020.

  6. Japan Aerospace Exploration Agency. Smart flight technology. Accessed 29 Aug 2018.

  7. Japan Aerospace Exploration Agency. Demonstration of turbulence prevention airframe technology (SafeAvio). Accessed 7 Dec 2020.

  8. Clark TL, Hall WD, Kerr RM, Middleton D, Radke L, Ralph FM, et al. Origins of aircraft-damaging clear-air turbulence during the 9 December 1992 Colorado downslope windstorm: numerical simulations and comparison with observations. J Atmos Sci. 2000;57:1105–31.

  9. Lilly DK. A severe downslope windstorm and aircraft turbulence event induced by a mountain wave. J Atmos Sci. 1978;35:59–77.

  10. Parker T, Lane T. Trapped mountain waves during a light aircraft accident. AMOJ. 2013;63:377–89.

  11. Sharman R, Tebaldi C, Wiener G, Wolff J. An integrated approach to mid- and upper-level turbulence forecasting. Weather Forecast. 2006;21:268–87.

  12. Huang R, Sun H, Wu C, Wang C, Lu B. Estimating Eddy dissipation rate with QAR flight big data. Appl Sci. 2019;9:5192.

  13. Lee JCW, Leung CYY, Kok MH, Chan PW. A comparison study of EDR estimates from the NLR and NCAR algorithms. Atmosphere. 2022;13:132.

  14. Haverdings H, Chan PW. Quick access recorder data analysis software for windshear and turbulence studies. J Aircr. 2010;47:1443–7.

  15. Ralph FM, Neiman PJ, Levinson D. Lidar observations of a breaking mountain wave associated with extreme turbulence. Geophys Res Lett. 1997;24:663–6.

  16. Williams JK. Using random forests to diagnose aviation turbulence. Mach Learn. 2014;95:51–70.

  17. Veermann H, Vrancken P, Lombard L. Flight testing DELICAT—a promise for medium-range clear air turbulence protection. Luleå, Schweden; 2014. Accessed 28 Jan 2022.

  18. Hamilton DW, Proctor FH. Convectively induced turbulence encountered during NASA’s fall-2000 flight experiments. 2002. Accessed 28 Jan 2022.

  19. Hamada Y, Kikuchi R, Inokuchi H. LIDAR-based gust alleviation control system: obtained results and flight demonstration plan. IFAC-PapersOnLine. 2020;53:14839–44.

  20. Inokuchi H, Tanaka H, Ando T. Development of an onboard Doppler Lidar for flight safety. J Aircr. 2009;46:1411–5.

  21. Oikawa H, Inokuchi H, Izumi K, Kikuchi Y, Hayasaki N. Relation between resolution enhancement and accuracy in prediction of turbulence area which perturbs aircraft. Tenki. 2010;57(9):669–80 (In Japanese).

  22. Aviation today: delta develops artificial intelligence tool to address weather disruption, improve flight operations. Accessed 7 Dec 2020.

  23. Muñoz-Esparza D, Sharman RD, Deierling W. Aviation turbulence forecasting at upper levels with machine learning techniques based on regression trees. J Appl Meteorol Clim. 2020;59:1883–99.

  24. Oster CV, Strong JS, Zorn K, editors. Why airplanes crash: causes of accidents worldwide. In: 51st annual transportation research forum conference paper; 2010.

  25. Weli V, Emenike G. Turbulent weather events and aircraft operations: implications for aviation safety at the Port Harcourt international airport, Nigeria. IJWCCCR. 2016;2:11–21.

  26. Arowolo MO, Adebiyi MO, Adebiyi AA. An efficient PCA ensemble learning approach for prediction of RNA-Seq malaria vector gene expression data classification. Int J Eng Res Technol. 2020;13:163–9.

  27. Arowolo MO, Adebiyi MO, Adebiyi AA, Okesola OJ. A hybrid heuristic dimensionality reduction methods for classifying malaria vector gene expression data. IEEE Access. 2020;8:182422–30.

  28. Carney TQ, Bedard Jr AJ, Brown JM, McGinley J, Lindholm T, Kraus MJ. Hazardous mountain winds and their visual indicators. US Department of Commerce, National Oceanic and Atmospheric Administration; 1995. p. 55.

  29. Sunny Spot Inc. Weather and climate information site. Accessed 10 Dec 2020.

  30. Japan Meteorological Agency. Historical weather data retrieval (high rise), Accessed 10 Dec 2020.

  31. Iowa State University. Iowa environmental Mesonet: download ASOS/AWOS/METAR data. Accessed 10 Dec 2020.

  32. Japan Meteorological Agency. Aviation forecast. Accessed 24 Dec 2020. (In Japanese).

  33. Center VAA. Characteristics of high-degree disturbances near Japan. Tenki. 1967;14(5):179–87 (In Japanese).

  34. Fukui K. Machine learning learned with Python and examples: identification/prediction/abnormality detection. Ohmsha 2018. (In Japanese).

  35. Yang XS. Optimization techniques and applications with examples. 1st ed. New Jersey: Wiley; 2018.

  36. Bishop CM. Pattern recognition and machine learning. 1st ed. New York: Springer; 2006.

  37. Sato M, Endo E. Spectrum analysis for turbulence and induced accelerations. J Jpn Soc Aeronaut Space Sci. 2008;56(Supp 653):293–5 (In Japanese).

  38. Prince JB, Buck BK, Robinson PA, Ryan T. In-service evaluation of the turbulence auto-PIREP system and enhanced turbulence radar technologies. NASA, CR-2007-214887. 2007.



The authors would like to thank Fuji Dream Airlines Co., Ltd. for providing information useful for this study.


Not applicable.

Author information




KI and SM conceptualized the study; SM designed the study; SM, HO, and KI were responsible for data acquisition and analysis; SM and HO were responsible for data interpretation; SM and HO contributed new methods or models or software; SM was responsible for writing the draft manuscript and revisions. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shinya Mizuno.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Mizuno, S., Ohba, H. & Ito, K. Machine learning-based turbulence-risk prediction method for the safe operation of aircrafts. J Big Data 9, 29 (2022).



  • PCA
  • k-means
  • SVC
  • Risk cluster
  • Open data
  • Meteorological data
  • Mountain waves