Prediction of flight departure delays caused by weather conditions adopting data-driven approaches


In this study, we utilize data-driven approaches to predict flight departure delays. The growing demand for air travel is outpacing the capacity and infrastructure available to support it. In addition, abnormal weather patterns caused by climate change contribute to the frequent occurrence of flight delays. Given the extensive network of international flights covering vast distances across continents and oceans, forecasting flight delays over extended time periods is increasingly important. Existing research has predominantly concentrated on short-term predictions, prompting our study to address longer prediction horizons. We collected datasets spanning over 10 years from three airports, ICN in South Korea and JFK and MDW in the United States, capturing flight information at six different time intervals (2, 4, 8, 16, 24, and 48 h) prior to flight departure. The datasets comprise 1,569,879 instances for ICN, 773,347 for JFK, and 404,507 for MDW. We employed a range of machine learning and deep learning approaches, including Decision Tree, Random Forest, Support Vector Machine, K-nearest neighbors, Logistic Regression, Extreme Gradient Boosting, and Long Short-Term Memory, to predict flight delays. Our models achieved accuracy rates of 0.749 for ICN airport, 0.852 for JFK airport, and 0.785 for MDW airport in 2-h predictions. For 48-h predictions, our models achieved accuracy rates of 0.748 for ICN airport, 0.846 for JFK airport, and 0.772 for MDW airport. Consequently, our results validate the accuracy of flight delay predictions for longer time frames. The implications and future research directions derived from these findings are also discussed.


With the increasing demand for air travel, the number of air passengers has significantly increased. The global air passenger transport market doubles every 15 years [1]. For example, as of February 2023, revenue passenger kilometers in Asia Pacific and North America had increased by 105.4% and 25.1%, respectively, relative to 2022. Despite a temporary decline in passenger traffic during the Covid-19 pandemic, the number of air passengers has steadily increased over the past few decades [2].

Meeting the increasing demand for air travel and ensuring efficient supply chain operations require the development of aviation infrastructure. This includes expanding airport facilities, updating airline fleets, and implementing effective air schedule management. Addressing these issues is crucial to provide a seamless and reliable travel experience for passengers. However, a significant challenge in delivering satisfactory services is the frequent occurrence of unexpected flight delays and cancellations [3].

According to Tileagă and Oprisan [4], the number of compensation cases due to delayed flight schedules is increasing steadily. Table 1 shows that the number of compensation recipients for air delays and cancellations is steeply increasing every year. Flight delays have significant economic consequences for both airlines and passengers, rendering it a notable issue within the aviation industry.

Table 1 Number of eligible passengers for compensation versus the number of total passengers

Table 2 shows the types and proportions of delays from 2010 to 2021 at the John F. Kennedy International Airport (JFK). Weather-related delays account for a small proportion of delays (3.86%). However, they were longer than other types of delays, with an average delay time of 69.81 min and a standard deviation of 100.79 min [5].

Table 2 Different types of delays

The frequency of abnormal weather phenomena, which are known to contribute to an increase in flight delays [6], is on the rise worldwide. In addition, the regional climate determined by geographical location plays a significant role in flight operations [7]. For example, in South Korea, rainfall is concentrated in the period from July to September each year, with approximately 42.5% of rainfall occurring in July, 27.4% in August, and 12.8% in September. In addition, the region is directly affected by typhoons from the end of August through early September every year.

While previous studies on flight delay prediction have often incorporated weather information [8,9,10], the majority of these studies have centered around predicting delays within relatively short timeframes, typically within thresholds of 15 min or up to 4 h, primarily tailored to airline services. However, the unique context of international flights covering vast distances across continents and oceans, with flight durations spanning from as little as 10 h to as long as 20 h, underscores the necessity for delay prediction over more extended timeframes.

Therefore, this study aims to predict flight delays over more extended timeframes (2 to 48 h) based on weather data. We focus on three well-known international airports: Incheon International Airport in South Korea (ICN), John F. Kennedy International Airport (JFK), and Chicago Midway International Airport (MDW) in the United States. In addition, we use weather information from the meteorological agencies located at each airport. “Background and related work” section reviews previous research in this area, whereas “Methodology” section presents the machine learning and deep learning models along with the evaluation methods utilized in the study. The experimental procedures and the comparison of the results across the models are presented in “Implementation and result” section. “Discussion and concluding remarks” section concludes this paper by presenting the interpretation of the results, noteworthy findings, limitations, and suggestions for future research.

Background and related work

Several studies have been conducted to forecast flight departure delays using various statistical methods, machine learning, and deep learning techniques. Table 3 provides a summary of prior flight delay detection research based on machine learning and neural network approaches.

Researchers [9, 11, 12] have utilized Bayesian modeling, clustering, classification, and regression with diverse datasets from different regions. The time span of the data varied, ranging from 1 month to 5 years, and the airports under investigation differed as well. Khaksar and Sheikholeslami [9] identified parameters that enable effective estimation of delays, using Bayesian modeling, decision tree, cluster classification, random forest, and hybrid methods. They used 2,825,647 records for US airlines and 15,428 records for Iranian airlines and achieved an accuracy of approximately 70%.

Al-Tabbakh et al. [11] analyzed flight delay patterns using four decision tree classifiers: DecisionStump, J48, Random Forest, and REPTree. They utilized 512 records from a brief period of 1 month, January 2018. The findings revealed that among the classifiers evaluated on the Egypt Airline dataset, REPTree attained the highest accuracy score of 80.3%.

Ye et al. [12] and Atlioğlu et al. [13] conducted flight delay prediction via supervised learning methods. Ye et al. [12] employed multiple linear regression, a support vector machine, extremely randomized trees, and LightGBM. They used 105,993 records and reported a highest accuracy of 86.53%.

Atlioğlu et al. [13] studied 11 machine learning models using data obtained following feature selection and transformation. They used 8086 records and achieved F1-scores of approximately 81%.

Other researchers have predicted airline delays using neural networks and hybrid models [8, 10, 14]. Kim et al. [8] investigated the effectiveness of deep learning models in predicting air traffic delays. Daily sequences of departure and arrival flight delays for individual airports were modeled using long short-term memory (LSTM) and recurrent neural network (RNN) architectures. The accuracy of the RNN improved with deeper architectures, with the highest accuracy of 90.95% obtained on the Atlanta air traffic data.

Qu et al. [10] analyzed and predicted flight delays using convolutional neural network (CNN) and RNN models, which are well-suited for classification problems in the field of deep learning. They improved the CondenseNet network by incorporating CBAM modules within the CNN-based CondenseNet algorithm to develop CBAMCondenseNet. Additionally, they constructed a CNN-MLSTM network based on the CNN model and injected the SimAM module to enhance attention over flight chain data. They used 36,287 records from China and achieved the highest accuracy score of 91.36%.

Yazdi et al. [14] designed a model that incorporates a technique based on a stacked denoising autoencoder to account for noisy flight delay data. They constructed SAE-LM based on an autoencoder and the LM algorithm; the stacked denoising autoencoder is built solely from denoising autoencoders. They utilized a comprehensive dataset spanning 5 years of US flight operations, comprising a total of 3,601,679 data points. The results demonstrated that the proposed model exhibited enhanced accuracy compared with the RNN model, highlighting its effectiveness in predicting flight delays.

While numerous researchers have utilized state-of-the-art machine learning and deep learning techniques to study weather-related takeoff delays from various angles, the majority of studies have focused on predicting delays within a time criterion of approximately 15 min. There has been limited exploration and prediction of flight delays exceeding 2 h.

By employing established research methodologies, it is feasible to aggregate the outcomes of short-term predictions to generate long-term forecasts. Nevertheless, it is vital to recognize that repeated predictions may introduce inaccuracies. In terms of practical utility, the ability to predict aviation delays over extended time intervals based on input data widens the scope of possibilities for long-haul flights and diverse flight schedules. This expanded capability offers benefits not only from the perspective of airport resource management but also in various other aspects. Therefore, there is a pressing need for research on machine learning and neural network models capable of forecasting further ahead using real data with long time differences. Hence, in this study, our objective is to specifically address and forecast flight delays of more than 2 h.

Table 3 Summary of prior prediction of flight delay


Methodology

Classification models

We used the following machine learning models and LSTM neural network to predict flight takeoff delays. The LSTM model boasts the advantage of effectively managing time-series data, but it comes with the drawback of requiring considerably more complex and powerful hardware. From this standpoint, machine learning (ML) models allow predictions at the individual time-unit level and are notably more computationally efficient when compared to the LSTM model.

  • Decision Tree (DT): DT is a type of supervised learning model that classifies or regresses data by applying a set of classification rules. The resulting model has a tree-like structure, hence the name ‘Decision Tree.’ Pruning techniques can be applied to enhance the model’s generalization performance and prevent overfitting, ensuring that it performs effectively on unknown data. Grid search can be used to find the optimal parameter values for the DT model, optimizing its performance [15]. It does not necessitate data preprocessing, such as normalization or handling missing values and outliers. It also has the capability to simultaneously handle both numerical and categorical variables. However, it has the limitation of considering only one variable at a time, which can make it challenging to capture interactions between variables. Moreover, the shape of the resulting decision tree can exhibit significant variations with minor differences in the data [16, 17].

  • Random Forest (RF): RF is an ensemble algorithm that trains multiple DT models and combines their results to make predictions. The method entails the random selection of a subset of features from the complete feature set to build one decision tree, followed by the selection of another random feature subset to create additional decision trees. Multiple decision trees are generated using this process. The final prediction is made by choosing the most frequently occurring prediction from these multiple decision trees [18]. This approach is versatile as it can be applied to both classification and regression problems. It is particularly effective in handling large-scale data and mitigates the issue of overfitting by reducing model noise, ultimately improving model accuracy [19, 20].

  • Support Vector Machine (SVM): SVM is a powerful supervised learning model that can be used for various tasks such as classification, regression, and anomaly detection. It aims to find a decision boundary that maximizes the separation between two classes while satisfying certain conditions. SVM can handle both linear and non-linear classification problems by using different kernel functions [21]. It determines the side of the decision boundary to which a data point belongs, allowing it to effectively classify data. Although it may be slower and less interpretable due to the requirement for multiple combination tests, it offers the advantage of being applicable to both categorical and numerical prediction problems, with minimal vulnerability to outlier data. Additionally, it is less susceptible to overfitting and more user-friendly compared to neural networks [22, 23].

  • K-Nearest Neighbors (KNN): KNN is a classification algorithm that operates based on the principle of similarity. It assigns a class label to a given data point by considering the labels of its “k” nearest neighbors in the feature space. The distance between data points is typically calculated using the Euclidean distance measurement method [24]. It offers several advantages, such as high accuracy and the ability to exclude outlier data from consideration by using only the top k closest data points. Furthermore, it does not rely on assumptions about the data since it is based on existing data. However, it has the disadvantage of increased processing time as the dataset size grows, as it needs to compare with all existing data points, and it may require significant memory usage for large datasets [22, 25].

  • Logistic Regression (LR): LR is one of the simplest classification models. It predicts the probability of data belonging to a certain category as a value between 0 and 1 and classifies it into the category with a higher probability [26]. It has the advantage of being less complex and faster due to linear combinations, making it easy to interpret the results. However, it may suffer a reduction in learning ability when dealing with non-linear relationships and can be sensitive to outliers and anomalies, which are its disadvantages [27, 28].

  • Extreme Gradient Boosting (XGB): XGB is an algorithm based on the boosting technique. It supports both regression and classification problems and offers good performance and resource efficiency. It is robust owing to its built-in regularization, which limits overfitting [29, 30].

  • Long Short-Term Memory (LSTM): LSTM networks are a type of RNN that can learn the order dependence in sequence prediction problems. RNNs are modified by adding a memory cell that can store information for an extended period. LSTM was proposed as a solution to address the issue of vanishing gradients in RNN when processing long sequential data [31]. However, it has the drawback of being computationally intensive and having a complex model structure due to the incorporation of forget gates, input gates, and output gates [32,33,34].
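
As one concrete instance of the models listed above, the KNN rule (Euclidean distance followed by a majority vote among the k closest points) can be sketched in a few lines. This is a minimal illustration and not the implementation used in the study; the function name `knn_predict`, the two toy features, and their values are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: (temperature, wind speed); label 1 = delayed, 0 = on time
train_X = [(2.0, 12.0), (1.0, 14.0), (3.0, 11.0), (18.0, 3.0), (20.0, 2.0), (17.0, 4.0)]
train_y = [1, 1, 1, 0, 0, 0]
prediction = knn_predict(train_X, train_y, (4.0, 10.0), k=3)
```

Because every query is compared with all stored points, the cost of one prediction grows linearly with the training set, which is the processing-time drawback noted in the KNN description.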

Evaluation methods

To evaluate the performance of each classifier, we calculated the confusion matrix and measured the accuracy, precision, recall, and F1-score. Table 4 shows the confusion matrix, a 2 × 2 matrix representation of classification results. The number of correctly classified instances is the sum of the diagonal entries of the matrix, while all other instances are incorrectly classified. The confusion matrix consists of the following four indicators.

Table 4 Confusion matrix

The first indicator is True Positive (TP), which signifies that the predicted value is positive when the actual value is positive. The second indicator is True Negative (TN), indicating that the predicted value is negative when the actual value is negative. The third indicator is False Positive (FP), denoting that the predicted value is positive when the actual value is negative. Lastly, the fourth indicator is False Negative (FN), showing that the predicted value is negative when the actual value is positive [35].

Accuracy serves as “a metric for assessing the overall performance of each model by computing the ratio of correctly classified samples to the total number of samples” [36]. However, in situations with a significant imbalance between positive and negative samples, accuracy may not provide a suitable evaluation measure. Precision presents “the proportion of true positive cases among all predicted positive cases” [37], while recall computes “the ratio of correctly predicted positive samples to the total number of true positive samples” [38]. F1-score represents “a balanced measure that combines both precision and recall” [39].
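
The four confusion-matrix counts and the metrics derived from them can be written out directly. The sketch below uses the standard definitions (accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of precision and recall); the function names and the toy labels are illustrative assumptions, not values from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification result."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Derive accuracy, precision, recall, and F1-score from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = delayed, 0 = on time
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```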

Implementation and result

Data description and analysis

We collected three datasets including flight and weather information of Incheon International Airport in South Korea (ICN) [40], John F. Kennedy International Airport (JFK) [41], and Chicago Midway International Airport (MDW) [42] in the United States.

The flight information [43, 44] comprises all flight-related features, including scheduled departure time, actual departure time, and delay type. The weather information consists of officially reported regional weather features. Flight information from 2010 to 2021 was examined, spanning a total of 11 years, and the weather information for the same period was also collected. For the experiment, weather and flight information were merged with a time difference during data preprocessing so that flight delays could be predicted from weather conditions. The merged datasets include the attributes listed in Tables 5 and 6. Among these attributes, the airline, flight number, and destination were not used in the actual model training. Additionally, since the features wind direction (e.g., NW, WNW) and condition (e.g., Cloudy, Windy) are categorical, they were one-hot encoded before being included in the training dataset.

Table 5 Incheon International Airport’s attributes list
Table 6 John F. Kennedy International Airport, and Chicago Midway International Airport’s attributes list

Data processing

ICN dataset

When the actual departure time differs from the scheduled departure time by more than 1 h, we classify the flight as delayed. The ICN dataset comprises 1,562,029 instances of normal flights and 7850 instances of delayed flights caused by weather conditions. To achieve a balanced distribution between normal and delayed cases, we randomly sampled an equal number of normal and delayed flight instances. To address missing features, we utilized a data interpolation method that was previously validated in a research study [45]. Because the ICN weather information is recorded hourly, some feature values were missing; to fill these gaps, we employed linear interpolation to estimate the values for the unmeasured time periods. The interpolated data comprises 953 data points, which accounts for 0.9% of the total 105,192 data points. Furthermore, we included flight takeoff results with time differences as additional features. To fulfill the objectives of the present study, we combined flight and weather cases using a time difference criterion, with time differences of 2, 4, 8, 16, 24, and 48 h.
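
The two preprocessing steps described above, filling gaps in hourly weather readings by linear interpolation and pairing each flight with the weather observed a fixed number of hours before departure, can be sketched as follows. This is a simplified illustration under stated assumptions: weather is indexed by an integer hour, a single numeric feature is used, and the names `interpolate_gaps`, `lagged_features`, and `LAGS_HOURS` are hypothetical.

```python
LAGS_HOURS = (2, 4, 8, 16, 24, 48)  # time differences used in the study


def interpolate_gaps(values):
    """Fill None entries by linear interpolation between the nearest known values.

    Gaps at the very start or end have no neighbor on one side and stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def lagged_features(weather_by_hour, departure_hour, lags=LAGS_HOURS):
    """Map each lag to the weather value observed that many hours before departure."""
    return {lag: weather_by_hour.get(departure_hour - lag) for lag in lags}


# Hypothetical hourly temperatures with two missing readings in the middle
temps = interpolate_gaps([10.0, None, None, 16.0, None])

# Hypothetical weather series indexed by hour, and a flight departing at hour 60
weather_by_hour = {hour: 5.0 + 0.5 * hour for hour in range(72)}
features = lagged_features(weather_by_hour, departure_hour=60)
```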

JFK dataset and MDW dataset

Similar to the ICN dataset, we created delayed flight instances for the JFK and MDW datasets based on the time difference between the scheduled and actual departure times. The JFK dataset consisted of 763,930 normal cases and 9417 delayed cases attributed to weather conditions, while the MDW dataset comprised 398,945 normal cases and 5562 delayed flight instances. Similar to the approach followed for the ICN dataset, we conducted down-sampling procedures to achieve a 1:1 ratio of normal and delayed cases.
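
The labeling rule and the 1:1 down-sampling can be illustrated with a short sketch. It assumes that each flight is a dictionary with a boolean `delayed` field and that the delay threshold is the 1 h criterion stated for the ICN dataset; it ignores the delay type, whereas the study counts only delays attributed to weather. The names `is_delayed` and `downsample_to_balance` are hypothetical.

```python
import random
from datetime import datetime, timedelta

DELAY_THRESHOLD = timedelta(hours=1)


def is_delayed(scheduled, actual):
    """A flight counts as delayed when it departs more than 1 h after schedule."""
    return actual - scheduled > DELAY_THRESHOLD


def downsample_to_balance(rows, label_key="delayed", seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    delayed = [r for r in rows if r[label_key]]
    normal = [r for r in rows if not r[label_key]]
    minority, majority = sorted((delayed, normal), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Hypothetical example: 3 delayed flights among 13
rows = [{"flight": i, "delayed": i < 3} for i in range(13)]
balanced = downsample_to_balance(rows)
```

Down-sampling discards most normal flights, so the balanced set is much smaller than the raw counts reported above.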

In both the JFK and MDW datasets, the weather information consists of several categorical features, such as wind direction and condition details. To incorporate these features into our data-driven approaches for machine learning and neural network frameworks, we employed a one-hot encoding technique. This encoding method allows us to represent the categorical variables as binary vectors, facilitating their utilization in the models. Additionally, we included flight takeoff results with time differences as one of the features in the dataset. Subsequently, both the JFK and MDW datasets with weather information were merged.
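
One-hot encoding as described above turns each category into a binary vector with a single 1. A minimal sketch, with the helper name `one_hot_encode` and the sample wind directions chosen for illustration:

```python
def one_hot_encode(values):
    """Return the sorted category list and one binary vector per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        vectors.append(vector)
    return categories, vectors


# Hypothetical wind-direction column
wind_direction = ["NW", "WNW", "NW", "S"]
categories, encoded = one_hot_encode(wind_direction)
```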


Figure 1 shows the flow chart of our overall approach. For machine learning models, we input the data sampled following the process as mentioned above, while we stack the sampled data to create time-series data and input them to the LSTM model.
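
Stacking sampled data into time-series input for the LSTM can be pictured as a sliding window over consecutive samples. The sketch below is one plausible reading, not the study's code: it assumes that each sequence holds `window` consecutive samples and takes the label of its last step, and the name `stack_windows` is hypothetical.

```python
def stack_windows(samples, labels, window):
    """Stack `window` consecutive samples into one sequence, labeled by its last step."""
    sequences, targets = [], []
    for end in range(window, len(samples) + 1):
        sequences.append(samples[end - window:end])
        targets.append(labels[end - 1])
    return sequences, targets


# Hypothetical single-feature samples and delay labels
samples = [[0], [1], [2], [3], [4]]
labels = [0, 0, 1, 1, 0]
sequences, targets = stack_windows(samples, labels, window=3)
```

The window length corresponds to the "number of stacked time-series data" hyperparameter mentioned in the grid search.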

Fig. 1 Flow charts for (a) machine learning and (b) LSTM models

To begin, we partitioned the dataset into a development subset and a testing subset in an 80:20 ratio. Subsequently, we further divided the development subset into training and validation subsets in an 80:20 ratio, resulting in a distribution of the training, validation, and test datasets with a ratio of 67:13:20. Table 7 presents the number of instances used for training, validation, and testing.
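
A two-stage split of this kind can be sketched as below. The function name `nested_split` and the seed are assumptions. Note that a strict nested 80:20 split yields shares of 64:16:20, so the proportions reported in Table 7 may reflect a slightly different procedure or rounding.

```python
import random


def nested_split(rows, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Hold out a test set first, then carve a validation set from the remainder."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(rest) * val_fraction)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train, val, test = nested_split(range(1000))
```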

Table 7 Summary of the employed datasets in training, validation, and test sessions

All experiments were conducted on a single GeForce RTX 3080 Ti 10GB GPU and implemented in Python 3.6. We performed a grid search to determine the optimal hyperparameters, including learning rates, number of epochs, number of layers, and number of stacked time-series data, and selected the parameters that yielded the best performance. Tables 8 and 9 list the hyperparameters for DT and LSTM tested in the grid search. In the case of the LSTM model, the number of trainable parameters varied by airport dataset: the model for the ICN dataset had 2,385 parameters, while the models for the JFK and MDW datasets had 2,833 parameters.
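
A grid search simply evaluates every combination of candidate values and keeps the best one. The sketch below shows the mechanics only: the grid values are placeholders (Tables 8 and 9 hold the values actually tested), and `toy_validation_score` is a hypothetical stand-in for training a model and measuring its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every parameter combination and return the best one with its score."""
    best_params, best_score = None, float("-inf")
    names = list(param_grid)
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score > best_score:  # ties keep the first combination found
            best_params, best_score = params, score
    return best_params, best_score


# Placeholder grid, not the values from the study
param_grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "num_layers": [1, 2, 3],
    "epochs": [50, 100],
}


def toy_validation_score(params):
    """Hypothetical score that peaks at learning_rate=0.01 and num_layers=2."""
    return -abs(params["learning_rate"] - 0.01) - 0.1 * abs(params["num_layers"] - 2)


best_params, best_score = grid_search(param_grid, toy_validation_score)
```

The number of evaluations is the product of the list lengths (18 here), which is why grid search becomes costly for models that take hundreds of seconds to train.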

Table 8 Tested parameters in DT
Table 9 Tested parameters in LSTM model


Flight delay prediction

Tables 10, 11 and 12 show the prediction results of flight departure delays based on weather data using various models. The results were obtained corresponding to a total of six different time differences (2, 4, 8, 16, 24, and 48 h).

Table 10 summarizes the results of the ICN dataset. The RF model reported the highest accuracy score of 0.749 with a time difference of 2 h. Except for the DT model that showed the best recall performance of 0.700, the RF model displayed superior performance in other metrics.

For the JFK airport dataset with a time difference of 2 h, the LSTM model achieved the highest accuracy score of 0.852 (Table 11). In terms of recall for predicting flight delays, the DT model outperformed all other models (0.826), whereas in terms of precision of prediction of on-time flights, the RF model outperformed all other models (0.835). Nonetheless, the LSTM model demonstrated superior performance in other evaluation metrics.

The result corresponding to the MDW airport dataset for a time difference of 2 h is presented in Table 12. The LSTM model achieved the highest accuracy score of 0.785. Although the DT model exhibited the best performance in terms of recall (0.759), the LSTM model outperformed the other models in all other evaluation metrics.

Table 10 Results of ICN airport
Table 11 Results of JFK airport
Table 12 Results of MDW airport

Flight delay prediction (1 to 24 h, hourly)

Tables 13, 14 and 15 provide an hourly breakdown of model accuracy from 1 h to 24 h for the same three datasets (ICN, JFK, and MDW airports), along with average training and testing times. The hyperparameters that yielded the best performance in the prior experiments were applied. Across all three airport datasets, the highest accuracy was observed at a 1-h time difference, with a declining trend in performance as the time difference increased. The magnitude of performance decline from 1 h to 24 h for each model is detailed in Table 16. Notably, the Random Forest model exhibited the least performance degradation, with a decrease of only 3.6%, while the SVM model showed the most significant decline, with an average decrease of 16.1%. Machine learning models completed their training in just a few seconds, while LSTM required several hundred seconds, making it approximately 100 times more time-consuming. Testing time ranged from 1 ms to a maximum of around 1.3 ms.

Table 13 Results of ICN airport (accuracy from 1 to 24 h)
Table 14 Results of JFK airport (accuracy from 1 to 24 h)
Table 15 Results of MDW airport (accuracy from 1 to 24 h)
Table 16 Comparison of accuracy levels between 1 and 24 h

Ablation study

We conducted training on the ICN dataset with identical parameters and training strategies, except for the exclusion of linear interpolation, while examining a time difference of 2 h. The results, as depicted in Table 17, reveal a slight reduction in overall performance, ranging from 1 to 2%, when interpolation was omitted. It is noteworthy that the interpolated data constitutes only 0.9% (953 out of 105,192) of the entire dataset, which lends credibility to the decision to incorporate linear interpolation in our research.

Table 17 Ablation study on linear interpolation in the ICN dataset with a time difference of 2 h

Feature importance

To determine the features with a substantial impact on our models, we conducted feature importance analysis. We chose the Random Forest and LSTM models, which demonstrated the best performance. For the Random Forest model, we made use of the built-in feature importance function, whereas for the LSTM model, we employed external algorithms using loss data. Consequently, in the case of Random Forest, higher values correspond to greater feature importance, whereas for LSTM, lower values signify reduced importance. Considering the results of the ICN airport dataset, Random Forest attributed the highest importance to temperature, dew point, and weather phenomena in that order, while LSTM assigned the highest importance to temperature, wind speed, weather phenomena, and local pressure. Notably, temperature was identified as the most crucial feature in both models (Table 18).
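
The text does not name the loss-based algorithm used for the LSTM model. One widely used technique of that kind is permutation importance, in which a single feature column is shuffled and the resulting increase in loss is taken as that feature's importance. The sketch below illustrates that technique only, as an assumption about what such an analysis could look like; the toy model, data, and function names are hypothetical.

```python
import random


def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, loss, seed=0):
    """Importance of each feature = loss after shuffling that column minus baseline loss."""
    rng = random.Random(seed)
    baseline = loss(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        column = [row[col] for row in X]
        rng.shuffle(column)
        permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, column)]
        importances.append(loss(y, [predict(row) for row in permuted]) - baseline)
    return importances


def toy_predict(row):
    """Hypothetical model that depends only on the first feature."""
    return 1 if row[0] < 5 else 0


X = [[1.0, 9.0], [2.0, 1.0], [8.0, 7.0], [9.0, 3.0], [3.0, 5.0], [7.0, 2.0]]
y = [1, 1, 0, 0, 1, 0]
importances = permutation_importance(toy_predict, X, y, error_rate)
```

In this formulation a feature the model ignores receives an importance of exactly zero, while a larger loss increase indicates a more influential feature.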

Table 18 Feature importance of ICN airport

For the JFK airport dataset, Random Forest identified pressure, temperature, and dew point as the most important features, while LSTM emphasized pressure, precipitation, and wind speed as the top influential factors. Notably, pressure was recognized as the most crucial feature in both models for this dataset (Table 19).

Table 19 Feature importance of JFK airport

In the case of the MDW airport dataset, Random Forest indicated that pressure, humidity, and temperature were the top features in terms of importance, while LSTM emphasized pressure, precipitation, and wind speed as the most influential factors. Notably, pressure was consistently identified as the most important feature in both models for this dataset (Table 20).

Table 20 Feature importance of MDW airport

Comparison with prior approaches

We conducted a performance comparison between our models and a prior research model [8]. Using the same JFK airport dataset, we compared our research’s Random Forest and LSTM models with the prior research model’s LSTM model. Our Random Forest model achieved an accuracy of 84.3% with a 2-h time difference and 84.6% with a 48-h time difference. In contrast, the LSTM model in our research achieved an accuracy of 85.2% with a 2-h time difference and 73.6% with a 48-h time difference. It’s worth noting that the previous model exhibited a performance of 86.51% at a short time interval of 15 min.

Discussion and concluding remarks

To predict flight takeoff delays using weather information for the ICN, JFK, and MDW airports, machine learning and LSTM models were employed. At the minimum time difference of 2 h, the RF model demonstrated the highest performance for the ICN airport, while the LSTM model exhibited the highest performance for the JFK and MDW airports. The accuracy scores were 0.749 for ICN, 0.852 for JFK, and 0.785 for MDW airports. At the maximum time difference of 48 h, the RF model displayed the best performance for all three airports, with accuracy scores of 0.748 for ICN, 0.846 for JFK, and 0.772 for MDW airports. In addition, all of the models required less than 2 ms at test time, which makes them suitable for real-time predictions. These findings confirm the feasibility of predicting flight takeoff delays using weather data collected 2 h prior to the scheduled departure time.

Our analysis incorporated datasets spanning from 2011 to 2021, encompassing a long time period. This extensive dataset allowed us to leverage both actual flight operation data and weather information for our analysis. By utilizing these comprehensive datasets, our proposed models exhibited outstanding performance in predicting delayed flights across three different datasets. The utilization of a long-term dataset facilitated robust predictions and enhanced the reliability of our models. Furthermore, the approaches we developed can be applied to various other transportation-related domains, including ocean vessel delays, vehicle operation restrictions, and outdoor construction work stoppages. In these application areas, early-stage warnings play a crucial role in mitigating potential risks to human safety and property damage. By leveraging our proposed models, it becomes feasible to anticipate and prepare for potential disruptions, enabling proactive measures to be taken in advance. This can significantly contribute to minimizing the adverse impacts associated with delays and restrictions in these transportation-related sectors.

The presented implications notwithstanding, it is important to acknowledge the presence of notable limitations. One such limitation is the significant influence of national and regional factors on weather conditions, rendering it challenging to generalize the results to other locations. The generalization of findings beyond the specific context may not be straightforward owing to these variations. Furthermore, the performance of the ICN airport dataset was relatively lower compared with the JFK and MDW airport datasets. This discrepancy in performance could be attributed to several factors, including the presence of missing features in the dataset. The absence of these features may have impacted the overall performance of the models.

Future research endeavors should focus on addressing these limitations by exploring more comprehensive datasets and improving data collection methods to minimize missing features. This would enhance the generalizability and accuracy of the models in predicting flight delays.

In future research, our aim is to develop a more robust model that incorporates geographic information, enabling its application to other airports beyond the specific datasets analyzed in this study.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References


  1. Economics-IATA: air passenger market analysis 2014. 2014.

  2. Economics-IATA: air passenger market analysis 2023. 2023.

  3. Efthymiou M, Njoya ET, Lo PL, Papatheodorou A, Randall D. The impact of delays on customers’ satisfaction: an empirical analysis of the British Airways on-time performance at Heathrow Airport. J Aerosp Technol Manag. 2018;11:e0219.

  4. Tileagă C, Oprisan O. Flights delay compensation 261/2004: a challenge for airline companies? In: Organizations and Performance in a complex world: 26th international economic conference of Sibiu (IECS) 26. Springer; 2021. p. 335–44.

  5. U.S.D. of transportation, airline on-time performance data. 2023. &QO_anzr=Nv4yv0r.

  6. Sim G-M, Kim Y-S, Jung M-P, Kim J-W, Park M-S, Hong S-H, Kang K-K. Changes in the frequency of abnormal weather events in South Korea in recent years. J Korean Soc Clim Change. 2018;9(4):461–70.

  7. Lee J-W, Yoo H-I, Kim G-H. Analysis of South Korea’s heavy rain characteristics from 2006 to 2015 using AWS data. In: Proceedings of the Korean meteorological society conference. 2016. p. 521–2.

  8. Kim YJ, Choi S, Briceno S, Mavris D. A deep learning approach to flight delay prediction. In: 2016 IEEE/AIAA 35th digital avionics systems conference (DASC). IEEE; 2016. p. 1–6.

  9. Khaksar H, Sheikholeslami A. Airline delay prediction by machine learning algorithms. Scientia Iranica. 2019;26(5):2689–702.

  10. Qu J, Wu S, Zhang J. Flight delay propagation prediction based on deep learning. Mathematics. 2023;11(3):494.

  11. Al-Tabbakh SM, El-Zahed H. Machine learning techniques for analysis of Egyptian flight delay. J Sci Res Sci. 2018;35(part 1):390–9.

  12. Ye B, Liu B, Tian Y, Wan L. A methodology for predicting aggregate flight departure delays in airports based on supervised learning. Sustainability. 2020;12(7):2749.

  13. Atlioğlu MC, Bolat M, Şahin M, Tunali V, Kilinç D. Supervised learning approaches to flight delay prediction. Sakarya Univ J Sci. 2020;24(6):1223–31.

  14. Yazdi MF, Kamel SR, Chabok SJM, Kheirabadi M. Flight delay prediction based on deep learning and Levenberg–Marquart algorithm. J Big Data. 2020;7:1–28.

  15. Lee J, Cha J, Park E. Data-driven approaches into political orientation and news outlet discrimination: the case of news articles in South Korea. Telemat Inform. 2023;85:102066.

  16. Gao Z, Gatpandan MP, Gatpandan PH. Classification decision tree algorithm in predicting students’ course preference. In: 2021 2nd international symposium on computer engineering and intelligent communications (ISCEIC). IEEE; 2021. p. 93–7.

  17. Sharma A, Sharma M, Dwivedi R. Improved decision tree classification (IDT) algorithm for social media data. In: 2021 10th international conference on system modeling & advancement in research trends (SMART). IEEE; 2021. p. 155–7.

  18. Kim E, Ji H, Kim J, Park E. Classifying apartment defect repair tasks in South Korea: a machine learning approach. J Asian Archit Build Eng. 2022;21(6):2503–10.

  19. Soumya A, Kumar GH. Classification of ancient epigraphs into different periods using random forests. In: 2014 fifth international conference on signal and image processing. IEEE; 2014. p. 171–8.

  20. Ardiansyah D, Mantoro T, Syafei WA. Potential classification prediction of solar and wind energy in Indonesia using machine learning with random forest algorithm. In: 2022 5th international conference of computer and informatics engineering (IC2IE). IEEE; 2022. p. 297–302.

  21. Lee J, Park E. D-HRSP: dataset of helpful reviews for service providers. Telemat Inform. 2023;82:102001.

  22. Fadhil IM, Sibaroni Y. Topic classification in Indonesian-language tweets using fast-text feature expansion with support vector machine (SVM). In: 2022 international conference on data science and its applications (ICoDSA). IEEE; 2022. p. 214–9.

  23. Charan PVS, Ramkumar G. Black fungus classification using Adaboost with SVM-based classifier and compare accuracy with support vector machine. In: 2022 5th international conference on contemporary computing and informatics (IC3I). IEEE; 2022. p. 1895–901.

  24. Hwang S, Ahn H, Park E. iMovieRec: a hybrid movie recommendation method based on a user-image-item model. Int J Mach Learn Cybern. 2023;14:3205–16.

  25. Auleria M, Arrahmah AI, Saputra DE. A review on KN nearest neighbour based classification for object recognition. In: 2021 international conference on data science and its applications (ICoDSA). IEEE; 2021. p. 274–80.

  26. Kim S, An C, Cha J, Kim D, Park E. D-visa: a dataset for detecting visual sentiment from art images. In: Proceedings of the IEEE/CVF international conference on computer vision. 2023. p. 3051–9.

  27. Akoulih M, Tigani S, Saadane R, Tazi A. Electrocoagulation based chromium removal efficiency classification using logistic regression. Appl Sci. 2020;10(15):5179.

  28. Guan X, Zhang J, Chen S. Logistic regression based on statistical learning model with linearized kernel for classification. Comput Inform. 2021;40(2):298–317.

  29. Paleczek A, Grochala D, Rydosz A. Artificial breath classification using XGBoost algorithm for diabetes detection. Sensors. 2021;21(12):4187.

  30. Liang H, Li J, Wu H, Li L, Zhou X, Jiang X. Mammographic classification of breast cancer microcalcifications through extreme gradient boosting. Electronics. 2022;11(15):2435.

  31. Lee S, Jeong D, Park E. MultiEmo: multi-task framework for emoji prediction. Knowl-Based Syst. 2022;242:108437.

  32. Hur Y. Malaysian name-based ethnicity classification using LSTM. KSII Trans Internet Inf Syst. 2022;16(12):3855–67.

  33. Zerrouki N, Houacine A, Harrou F, Bouarroudj R, Cherifi MY, Sun Y. Exploiting deep learning-based LSTM classification for improving hand gesture recognition to enhance visitors’ museum experiences. In: 2022 international conference on innovation and intelligence for informatics, computing, and technologies (3ICT). IEEE; 2022. p. 451–6.

  34. Madanan M, Venugopal A, Velayudhan NC. A hybrid anomaly based intrusion detection methodology using IWD for LSTM classification. In: 2020 IEEE international conference on advanced networks and telecommunications systems (ANTS). IEEE; 2020. p. 1–5.

  35. Lee S, Kim J, Kim D, Kim KJ, Park E. Computational approaches to developing the implicit media bias dataset: assessing political orientations of nonpolitical news articles. Appl Math Comput. 2023;458:128219.

  36. Lee S, Kim J, Park E. Can book covers help predict bestsellers using machine learning approaches? Telemat Inform. 2023;78:101948.

  37. Park E. CRNet: a multimodal deep convolutional neural network for customer revisit prediction. J Big Data. 2023;10(1):1–10.

  38. Oh S, Ji H, Kim J, Park E, del Pobil AP. Deep learning model based on expectation–confirmation theory to predict customer satisfaction in hospitality service. Inform Technol Tour. 2022;24(1):109–26.

  39. Yu H, Park E. A harmless webtoon for all: an automatic age-restriction prediction system for webtoon contents. Telemat Inform. 2023;76:101906.

  40. Incheon airport weather. &tabNo=1.

  41. New York City weather.

  42. Chicago City weather.

  43. Incheon airport flight.

  44. United States Department of Transport. &QO_anzr=Nv4yv0r.

  45. Panda B, Adhikari RK. A method for classification of missing values using data mining techniques. In: 2020 international conference on computer science, engineering and applications (ICCSEA). IEEE; 2020. p. 1–5.


This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2023S1A5A8075518). This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) support program (IITP-2023-RS-2023-00259497) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

Author information



Kim contributed to the design, implementation, and analysis of the research and to the examination of the manuscript. Kim and Park wrote and revised the manuscript. Park approved the final version of the manuscript.

Corresponding author

Correspondence to Eunil Park.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Kim, S., Park, E. Prediction of flight departure delays caused by weather conditions adopting data-driven approaches. J Big Data 11, 11 (2024).
