The effect of feature extraction and data sampling on credit card fraud detection
Journal of Big Data volume 10, Article number: 6 (2023)
Training a machine learning algorithm on a class-imbalanced dataset can be a difficult task, a process that could prove even more challenging under conditions of high dimensionality. Feature extraction and data sampling are among the most popular preprocessing techniques. Feature extraction is used to derive a richer set of reduced dataset features, while data sampling is used to mitigate class imbalance. In this paper, we investigate these two preprocessing techniques, using a credit card fraud dataset and four ensemble classifiers (Random Forest, CatBoost, LightGBM, and XGBoost). Within the context of feature extraction, the Principal Component Analysis (PCA) and Convolutional Autoencoder (CAE) methods are evaluated. With regard to data sampling, the Random Undersampling (RUS), Synthetic Minority Oversampling Technique (SMOTE), and SMOTE Tomek methods are evaluated. The F1 score and Area Under the Receiver Operating Characteristic Curve (AUC) metrics serve as measures of classification performance. Our results show that the implementation of the RUS method followed by the CAE method leads to the best performance for credit card fraud detection.
An unequal distribution of classes in a dataset is known as class imbalance. Under this condition, the majority class can overburden machine learning algorithms, thus making recognition of the minority class more challenging. As a result, classification performance scores for the impacted algorithms can become biased in favor of the dominant class. Random Undersampling (RUS), Synthetic Minority Oversampling Technique (SMOTE), and SMOTE Tomek are examples of data-level approaches for dealing with class imbalance. Algorithm-level approaches that address this imbalance typically involve various cost-sensitive strategies. In this study, the focus is on data-level approaches that mitigate class imbalance.
RUS discards members of the majority class until the ratio of majority to minority instances reaches a predetermined level. SMOTE is a data augmentation technique that increases the representation of the minority class in a dataset. It synthesizes new minority instances from existing minority instances that lie close to each other. SMOTE Tomek combines SMOTE, which adds instances of the minority class, with Tomek links, which removes instances of the majority class that have been identified as members of Tomek links. Tomek links is further described in the “Background information” section, as this technique is not as well-known as SMOTE and RUS.
Feature extraction begins with a dataset that contains the original features and uses them to generate derived features which are designed to be informative and non-redundant. This process facilitates generalization and may improve interpretation and classification performance scores. Feature extraction generally leads to dimensionality reduction. As depicted in Figure 1, the original features of a dataset are transformed into a reduced set of features. Before training classifiers on large-scale imbalanced datasets, feature extraction or dimensionality reduction is often performed. Feature extraction is carried out with various algorithms, such as Principal Component Analysis (PCA) and Convolutional Autoencoders (CAEs). PCA is based on linear transformations, while autoencoders use non-linear complex functions. The feature extraction techniques used in this study are further described in the “Background information” section.
Our motivation for this work comes from the fact that there are yearly increases in the number of credit card fraud incidents and that machine learning techniques have been successfully used to detect fraudulent activity. Our paper examines the use of feature extraction and data sampling on a class-imbalanced dataset. To be more specific, our research involves the evaluation of PCA and CAE techniques for feature extraction, with RUS, SMOTE, and SMOTE Tomek used as the data sampling techniques. To the best of our knowledge, this is the first study that investigates the use of PCA, CAE, RUS, SMOTE, and SMOTE Tomek on class-imbalanced data. Our research uses a credit card fraud detection dataset from the Kaggle community, aptly named the Credit Card Fraud Detection Dataset. Given the solid performance of ensemble classifiers in many studies [11,12,13], we use four ensemble learners based on the Decision Tree classifier: Random Forest, XGBoost, LightGBM, and CatBoost. Classification performance is measured with the F1 score and Area Under the Receiver Operating Characteristic Curve (AUC) metric.
The contribution of our research is highlighted as follows:
Examines effect of PCA and CAE on ensemble classifiers
Examines effect of RUS, SMOTE, and SMOTE Tomek on ensemble classifiers
Examines effect of the order of preprocessing tasks on ensemble classifiers
The remainder of this paper is organized as follows: the “Background information” section provides background information on the Tomek links, PCA, and CAE algorithms; the “Related work” section reviews relevant literature; the “Methodology” section covers data preprocessing and classification tasks; the “Results and discussion” section provides and analyzes our findings; and the “Conclusion” section summarizes the key points of this paper and provides suggestions for future work.
The Tomek links algorithm acts as a label noise filter and is denoted by pairs of instances. A Tomek link is a pair of data points x and y from different classes, such that, if d stands for the distance metric, there exists no example z such that d(x,z) is lower than d(x,y), or d(y,z) is lower than d(x,y). Hence, where the two examples x and y form a Tomek link, either one is noise or both are borderline. These two examples are thus eliminated from the training data. To elaborate further, in a binary classification environment with classes 0 and 1, a Tomek link pair would have an instance of each class, and the two instances would be each other's nearest neighbors across the dataset. These cross-class pairs are valuable in defining the class boundary. Figure 2 shows an alignment of Tomek link pairs at the class boundary. It is important to note that the use of Tomek links for label noise detection does not involve the calculation of reconstruction error.
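The definition above can be illustrated with a minimal sketch. It assumes Euclidean distance and a dataset small enough for a full pairwise distance matrix; the function name `tomek_links` and the toy points are hypothetical and are not taken from the study. The sketch only detects the pairs, and which members to remove is left to the caller.

```python
import numpy as np

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbors and carry different class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Full pairwise Euclidean distance matrix (O(n^2) memory, toy sizes only).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nn = dist.argmin(axis=1)        # nearest neighbor of every instance
    links = []
    for i, j in enumerate(nn):
        # Mutual nearest neighbors from different classes form a Tomek link.
        if i < j and nn[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links

# Toy one-dimensional example: classes 0 and 1 meet between 2.0 and 2.4.
X_toy = [[0.0], [1.1], [2.0], [2.4], [4.0], [5.0]]
y_toy = [0, 0, 0, 1, 1, 1]
print(tomek_links(X_toy, y_toy))
```

In the toy data, only the pair at 2.0 and 2.4 sits on the class boundary, so it is the only pair reported.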
The PCA algorithm is a feature extraction technique with a variety of applications in exploratory data analysis, visualization, and dimensionality reduction. It is an unsupervised algorithm that generates new features, each of which is a linear combination of the original features, such that the new features are uncorrelated with one another. Furthermore, the generated features are ranked according to the amount of variance that they explain. As a result, Principal Component 1 explains the greatest amount of variance in the dataset, Principal Component 2 explains the second greatest amount of variance, and so on. It is therefore possible to reduce the dimensionality of the data by retaining only the leading principal components.
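As a minimal sketch of these properties, PCA can be written with a singular value decomposition of the centered data. The helper names (`pca_fit`, `pca_transform`, `pca_inverse_transform`) and the random toy data are assumptions made for illustration; they are not the implementation used in the study.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return (mean, components, explained_variance_ratio) for the leading components."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions,
    # already sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained_variance = (S ** 2) / (len(X) - 1)
    ratio = explained_variance / explained_variance.sum()
    return mean, Vt[:n_components], ratio[:n_components]

def pca_transform(X, mean, components):
    """Project the data onto the retained principal directions."""
    return (np.asarray(X, dtype=float) - mean) @ components.T

def pca_inverse_transform(Z, mean, components):
    """Map reduced features back to the original feature space (lossy if truncated)."""
    return Z @ components + mean

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # correlated toy features
mean, components, ratio = pca_fit(X, n_components=3)
Z = pca_transform(X, mean, components)       # 6 features reduced to 3
X_back = pca_inverse_transform(Z, mean, components)
print(Z.shape, ratio)
```

The printed ratios are non-increasing, which reflects the ranking of components by explained variance, and the columns of `Z` are mutually uncorrelated.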
An autoencoder is a neural network architecture that tries to learn a compact or latent representation of an input. This latent representation contains the extracted features. The autoencoder is frequently part of a larger model that tries to recreate the input. Despite the fact that an autoencoder is an unsupervised learning method, it is technically trained via supervised learning and hence can be considered a type of semi-supervised learning.
Figure 3 illustrates the structure of a typical autoencoder, which is a feed-forward neural network containing an encoder, one or more hidden layers, and a decoder. The encoder feeds information from the input into the hidden layer, and the decoder feeds information from the hidden layer into the output layer. It is assumed that an autoencoder model will reconstruct the identical inputs that flowed through the input layer during the training process. Consequently, the decoder acts as a mirror image of the encoder, with a matching number of neurons to the encoder in both directions. For feature extraction and dimensionality reduction, the smallest hidden layer in the architecture (also referred to as the bottleneck) is used to compress the input to the lowest level of space (also known as latent space) in order to achieve the desired dimensionality reduction. During the training phase, the decoder is used to calculate the error rate of the model, but it is not utilized to recover the original input dimension of the data. Several distinct types of autoencoders are available, and their uses range widely.
The CAE has a similar architecture to the Convolutional Neural Network (CNN). Both algorithms use some of the same fundamental components, including convolutional filters and pooling layers. The encoder performs feature extraction and dimensionality reduction by using the convolution filters and pooling layers of the CNN. The decoder performs the reverse operation. Figure 4 shows the structure of a typical CAE.
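To make the dimensionality reduction of the encoder concrete, the following sketch runs the forward pass of two convolution-plus-pooling stages on a single 30-feature input. It is an illustration only: the filter is random and untrained, the features are assumed to be treated as a one-dimensional sequence, and the kernel size, pooling size, and ReLU activation are assumptions rather than the architecture used in the study.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide the kernel over x without padding (cross-correlation, as in CNN layers)."""
    n = len(x) - len(kernel) + 1
    return np.array([np.dot(x[i:i + len(kernel)], kernel) for i in range(n)])

def max_pool1d(x, size=2):
    """Keep the maximum of each non-overlapping window; a trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def encoder_stage(x, kernel):
    """One encoder stage: convolution, ReLU activation, then max pooling."""
    return max_pool1d(np.maximum(conv1d_valid(x, kernel), 0.0))

rng = np.random.default_rng(0)
x = rng.normal(size=30)         # one instance with 30 features
kernel = rng.normal(size=3)     # untrained filter, random for illustration
h1 = encoder_stage(x, kernel)   # 30 -> 28 after convolution -> 14 after pooling
h2 = encoder_stage(h1, kernel)  # 14 -> 12 after convolution -> 6 after pooling
print(len(x), len(h1), len(h2))
```

In a trained CAE, the filter weights would be learned by minimizing reconstruction loss, and a mirrored decoder would upsample the bottleneck back to the input size.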
The reduction of high-dimensional data, such as genomic information, images, videos, and text, is seen as an important and necessary data preprocessing step that generates high-level representations. One reason for reducing dimensionality is to provide deeper insight into the inherent structure of data. Various feature extraction techniques have been explored. The early approaches are based on projection and involve mapping input features in the original high-dimensional space to a new low-dimensional space while minimizing information loss. PCA and Linear Discriminant Analysis (LDA) are two of the most well-known projection techniques. The former is an unsupervised method that maximizes variance to project original data into its principal directions. The latter is a supervised approach that locates a linear subspace by maximizing the separation between classes. The main disadvantage of these approaches is that they conduct linear projection. Subsequent research overcame this problem by utilizing non-linear methods. Another drawback of the early approaches is that the majority of these works tend to map data from high-dimensional to low-dimensional space by extracting features once, rather than stacking them to build deeper levels of representation progressively. Autoencoders compress dimensionality by minimizing reconstruction loss using artificial neural networks. As a result, it is simple to stack autoencoders by adding hidden layers. This gives the autoencoder and its variants, such as the CAE, the ability to extract meaningful features.
Compared to the plain autoencoder, the CAE has the ability to extract smooth features by use of its pooling layers, which is advantageous for classification. Polic et al. employ a CAE to reduce the optical-based output for a tactile sensor image. The authors validate their method with a set of benchmarking cases. Shallow neural networks and other machine learning models are used to estimate contact object shape, edge position, orientation, and indentation depth. A contact force estimator is also trained, resulting in the confirmation that the extracted features contain sufficient information on both the spatial and mechanical properties of the object.
Meng et al. note that the plain autoencoder fails to take into account relationships between data features. These relationships may impact results if original and/or novel features are used. For feature extraction, Meng et al. propose a relational autoencoder model that factors in both data features and their relationships. The authors also make their model compatible with other autoencoder variants, such as a sparse autoencoder, denoising autoencoder, and variational autoencoder. Upon testing the proposed model on a set of benchmark datasets, results show that the incorporation of data relationships generates more robust features with lower reconstruction error loss, when compared to the other autoencoder variants.
In another related work, Lee et al. use a CAE to perform feature extraction and dimensionality reduction for radar data analysis. The aim of their study is to obtain a fast, accurate, and human-like image-processing algorithm. In addition, Maggipinto et al. use a CAE to extract features in data-driven applications for virtual metrology. Values for optical emission spectrometry serve as the input data.
Finally, we investigated autoencoder studies performed on the Credit Card Fraud Detection Dataset published by Kaggle. The relevant works are described in the following two paragraphs.
Using both a plain autoencoder algorithm and a Logistic Regression algorithm, Al-Shabi evaluated balanced and imbalanced data to detect credit card fraud in the dataset. Results show that the autoencoder outperformed Logistic Regression. However, we note that the F1 score for the autoencoder is only 0.04, due to the low value for the Precision metric.
Within the framework of one-class classification, Chen et al. combined a sparse autoencoder with a Generative Adversarial Network to detect credit card fraud in the dataset. Other one-class classification algorithms were evaluated, namely One-Class Gaussian Process and Support Vector Data Description. Based on the results, the authors’ proposed model, with a top F1 score of 0.8736, performed the best. The reproducibility of their work is questionable, since the hyperparameters used for the One-Class Gaussian Process and Support Vector Data Description algorithms have not been provided.
With regard to the final two related works, we point out that their best values for F1 score are noticeably lower than our best value obtained in this study. Further, we note that none of the related works discussed in this section use data sampling in conjunction with feature extraction.
The Credit Card Fraud Detection Dataset was published by Worldline and the Université Libre de Bruxelles (ULB). There are 284,807 instances and 30 independent variables in the raw dataset, which shows credit card purchases by Europeans in September 2013. The label (dependent variable) of this binary dataset is 1 for a fraudulent transaction and 0 for a non-fraudulent transaction. Fraudulent transactions constitute 492 instances, or 0.172%, thus rendering the dataset highly imbalanced with regard to the majority and minority classes.
In this study, we evaluate three data sampling techniques (RUS, SMOTE, and SMOTE Tomek) and two feature extraction techniques (PCA and CAE). The impact on classifier performance is evaluated with various scenarios, as depicted in Table 1.
The RUS technique is used to obtain five different ratios (1:1, 1:5, 1:10, 1:20, and 1:50), as shown in Tables 2, 3, 4, 5, and 6. These ratios represent the minority-to-majority instances for each dataset obtained by down-sampling the original dataset. The use of this range of ratios strengthens the validity of our study. The SMOTE and SMOTE Tomek data sampling techniques are associated with Tables 7 and 8, respectively. Table 9 shows the results where no preprocessing was performed, i.e., no data sampling and no feature extraction.
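The five ratios can be sketched as follows. This is a minimal illustration of random undersampling, not the imbalanced-learn implementation used in the study; the function name `random_undersample` and its `majority_per_minority` parameter are hypothetical. The label convention (1 for the minority class, 0 for the majority class) follows the dataset description.

```python
import numpy as np

def random_undersample(X, y, majority_per_minority, seed=0):
    """Keep every minority row (label 1) and a random subset of majority rows
    (label 0) so that the minority-to-majority ratio is 1:majority_per_minority."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n_keep = min(len(majority), majority_per_minority * len(minority))
    kept = rng.choice(majority, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([minority, kept]))
    return X[idx], y[idx]

# Majority-class sizes implied by the five ratios for 492 fraud instances.
for r in (1, 5, 10, 20, 50):
    print(f"1:{r} -> {492 * r} majority instances kept")
```

Because the minority class is never touched, the size of the sampled dataset is governed entirely by the chosen ratio.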
The SMOTE, SMOTE Tomek, and RUS algorithms are included in the imbalanced-learn Python library. For SMOTE and SMOTE Tomek, we set the k_neighbors parameter to 5. After the implementation of the PCA algorithm with Scikit-Learn, the number of principal components obtained was 15, which is half the number of original dataset features. To recreate the original transactions from the PCA components, the inverse_transform function from Scikit-Learn is used. The CAE is implemented with Keras and TensorFlow, with optimum parameters selected during preliminary experimentation.
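The role of the number of neighbors in SMOTE can be illustrated with a simplified sketch. It is not the imbalanced-learn implementation: the function name `smote_sketch` is hypothetical, Euclidean distance is assumed, and each synthetic instance is placed at a random point on the segment between a minority instance and one of its k nearest minority neighbors.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority instance and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("at least two minority instances are required")
    # Pairwise Euclidean distances among minority instances only.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(dist, axis=1)[:, :k]  # k nearest minority neighbors

    base = rng.integers(0, len(X_min), size=n_new)  # starting instances
    pick = rng.integers(0, k, size=n_new)           # one neighbor per start
    nbr = neighbors[base, pick]
    gap = rng.random((n_new, 1))                    # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
synthetic = smote_sketch(X_min, n_new=10, k=3)
print(synthetic.shape)
```

A larger k lets synthetic instances spread across a wider neighborhood of the minority class, while a smaller k keeps them close to existing instances.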
The learners used in this study are Random Forest, XGBoost, LightGBM, and CatBoost. Random Forest, which is an ensemble of Decision Trees, uses the bagging technique. XGBoost, LightGBM, and CatBoost are Gradient-Boosted Decision Trees (GBDTs), which are ensembles of Decision Trees that are trained sequentially with the boosting technique. XGBoost is based on a weighted quantile sketch and a sparsity-aware function. A weighted quantile sketch uses approximate tree learning for merging and pruning operations, while sparsity is concerned with zero or missing values. LightGBM is defined by Exclusive Feature Bundling and Gradient-based One-Side Sampling. Exclusive Feature Bundling reduces the count of variables through the categorization of mutually exclusive features, while One-Side Sampling excludes a chunk of instances associated with small gradients. CatBoost is designed around Ordered Boosting, an algorithm that orders instances used by Decision Trees.
Training and testing are performed with k-fold cross-validation, where the model is trained on k-1 folds each time and tested on the remaining fold. This ensures that as much data as possible is used during the classification phase. Our cross-validation process is stratified, which seeks to ensure that each class is proportionally represented across the folds. In this experiment, a value of five was assigned to k: four folds used in training and one fold used in testing. The process was repeated five times.
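Stratified splitting can be sketched in a few lines. The generator name `stratified_kfold_indices`, the toy labels, and the use of a different seed for each repetition are assumptions made for illustration; the study does not state how the repetitions were seeded.

```python
import numpy as np

def stratified_kfold_indices(y, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == label))
        # Deal the shuffled members of this class round-robin into the folds.
        fold_of[members] = np.arange(len(members)) % k
    for fold in range(k):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)

y = np.array([0] * 95 + [1] * 5)  # toy labels with a 5% minority class
for repeat in range(5):           # five repetitions, each with its own shuffle
    for train_idx, test_idx in stratified_kfold_indices(y, k=5, seed=repeat):
        # A classifier would be fit on train_idx and scored on test_idx here.
        assert y[test_idx].sum() == 1  # every test fold holds one minority instance
```

Without stratification, a fold drawn from a highly imbalanced dataset could contain very few minority instances, or none, which would distort the F1 score and AUC for that fold.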
The AUC metric is used to measure classifier performance. AUC refers to the area under the Receiver Operating Characteristic (ROC) curve, which plots True Positive Rate (TPR) against False Positive Rate (FPR). AUC summarizes overall model performance and is reflective of all classification thresholds along the curve. The F1 score metric is the harmonic mean of precision and recall. Like AUC, the F1 score is well-suited for datasets with a high class imbalance. For the F1 score, the default threshold of 0.5 was used.
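Both metrics can be computed directly from their definitions, as in the sketch below. The function names and toy scores are hypothetical, and treating a score equal to the threshold as a positive prediction is an assumption. AUC is computed through its rank interpretation: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative instance, with ties counted as one half.

```python
import numpy as np

def f1_at_threshold(y_true, scores, threshold=0.5):
    """F1 score after converting scores to labels at the given threshold."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    tp = int(((y_pred == 1) & (y_true == 1)).sum())
    fp = int(((y_pred == 1) & (y_true == 0)).sum())
    fn = int(((y_pred == 0) & (y_true == 1)).sum())
    # The harmonic mean of precision and recall simplifies to 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def roc_auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]
print(f1_at_threshold(y_true, scores))  # 1 TP, 1 FP, 1 FN -> 0.5
print(roc_auc(y_true, scores))          # 7 of 8 pairs ranked correctly -> 0.875
```

The example shows why the two metrics can disagree: the F1 score depends on the single threshold of 0.5, whereas AUC depends only on how the scores rank positives relative to negatives.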
Results and discussion
Table 9 reflects the baseline scores, where no preprocessing activity (no data sampling and no feature extraction) was implemented. The highest values of 0.853 and 0.891 are for the F1 score and AUC, respectively. These two scores were obtained with CatBoost.
The highest values among the tabulated results obtained through the RUS sampling technique are in Table 6, which is associated with a minority-to-majority class ratio of 1:50. For the F1 score and AUC, the highest values in this table are 0.909 and 0.988, respectively. The score of 0.909 was obtained by LightGBM, while the score of 0.988 was obtained by XGBoost, LightGBM, and CatBoost, all GBDTs. Interestingly, the highest F1 score for the baseline (Table 9) is greater than any of the F1 scores for the RUS ratios of 1:1, 1:5, and 1:10 (Tables 2, 3, and 4, respectively).
Table 7 was obtained with the SMOTE sampling technique. The highest values in this table for the F1 score and AUC are 0.872 and 0.940, respectively. The score of 0.872 was obtained by XGBoost, while the score of 0.940 was obtained by LightGBM and CatBoost. In Table 8, which was obtained with the SMOTE Tomek sampling technique, the highest values of 0.899 and 0.970 are associated with the F1 score and AUC, respectively. The score of 0.899 was obtained by Random Forest, while the score of 0.970 was obtained by Random Forest and XGBoost.
To determine the statistical significance of the performance scores, we perform three-way ANalysis Of VAriance (ANOVA) tests. ANOVA reveals whether there is a significant difference between the group means. A 95% (α = 0.05) confidence level is used for our ANOVA tests. The results are shown in Tables 10 and 11 for the F1 score and AUC, respectively.
In these tables, Df is the degrees of freedom, Sum Sq is the sum of squares, Mean Sq is the mean sum of squares, F value is the F-statistic, and Pr(>F) is the p-value. Note that for the Sampling Technique factor, only the 1:50 minority-to-majority class ratio is considered for the RUS technique, since this ratio yields the highest performance scores among all the RUS ratios obtained. As shown in Tables 10 and 11, the p-value for each factor is practically 0, well below the level of α. Hence, we infer that all factors have a significant impact on performance in terms of both the F1 score and AUC. Since this is the case, Tukey’s Honestly Significant Difference (HSD) tests are carried out to find out which groups are significantly different from each other. For a particular experiment, letter groups assigned through the Tukey method indicate similarity or significant differences in performance results within a factor.
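The table columns described above can be illustrated for a single factor. The sketch below is a one-way simplification of the three-way tests reported in this study, and the scores are made-up toy numbers rather than results from the paper. It computes the degrees of freedom, sums of squares, and F value; the p-value is omitted because it requires the F distribution.

```python
import numpy as np

def one_way_anova(groups):
    """Return (df_between, df_within, ss_between, ss_within, f_value) for one factor."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Sum Sq between groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Sum Sq within groups: spread of the observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F value is the ratio of the two Mean Sq terms (Sum Sq divided by Df).
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_value

# Toy scores for three hypothetical groups (not taken from the study).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
df_b, df_w, ss_b, ss_w, f_value = one_way_anova(toy_groups)
print(df_b, df_w, ss_b, ss_w, f_value)
```

A large F value means that the variation between group means is large relative to the variation within groups, which is what leads to a small p-value.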
The Tukey method is first applied within the scope of the F1 score metric. With regard to the Scenario factor (Table 12), data sampling alone is ranked in group ‘a’, the best-performing group. Data sampling followed by CAE is ranked in group ‘b’, the second-best performing group. The bottom group ‘f’ consists of PCA followed by data sampling. In terms of the Classifier factor (Table 13), Random Forest, the top performer, is in group ‘a’, XGBoost is in group ‘b’, and at the bottom is LightGBM in group ‘d’. For the Sampling factor (Table 14), RUS, the best performer, is in group ‘a’, SMOTE Tomek is in group ‘b’, and SMOTE is in group ‘c’.
The Tukey method is next applied within the scope of the AUC metric. With regard to the Scenario factor (Table 15), data sampling followed by CAE is ranked in group ‘a’, the best-performing group. Data sampling alone is ranked in group ‘b’, the second-best performing group. The bottom group ‘h’ is associated with no preprocessing activity. In terms of the Classifier factor (Table 16), CatBoost, the top performer, is in group ‘a’, XGBoost is in group ‘b’, and at the bottom is LightGBM in group ‘d’. For the Sampling factor (Table 17), RUS, the best performer, is in group ‘a’, SMOTE Tomek is in group ‘b’, and SMOTE is in group ‘c’.
Based on the Tukey’s HSD results for the F1 score and AUC metrics, the RUS technique is the clear-cut top choice for the Sampling factor. However, the choice of best classifier could not be established from the rankings. This is because the HSD results for the F1 score metric (Table 13) show Random Forest as the best classifier, while the HSD results for the AUC metric (Table 16) show CatBoost as the best classifier. The Scenario factor shows data sampling followed by CAE as the second-best choice for the F1 score metric (Table 12) and the best choice for the AUC metric (Table 15). Conversely, data sampling alone is the top choice for the F1 score metric and the second-best choice for the AUC metric. We recommend the use of data sampling followed by CAE for the Scenario factor. This is because the implementation of CAE, which is a feature extraction technique, tends to reduce computational burden and decrease the training time of machine learning algorithms. As stated earlier, feature extraction may also improve generalization and the interpretation of results. With regard to the Scenario factor for the F1 score and AUC, the following was observed: Sampling + CAE is better than Sampling + PCA; CAE + Sampling is better than PCA + Sampling; and CAE + None is better than PCA + None. We believe that CAE has an advantage over PCA because the autoencoder can model non-linear functions.
In this research, we use a credit card fraud dataset to investigate the effect of data sampling and feature extraction on four ensemble classifiers. Three data sampling techniques (RUS, SMOTE, and SMOTE Tomek) and two feature extraction techniques (PCA and CAE) are evaluated. The results indicate that the use of the RUS data sampling technique followed by the use of the CAE feature extraction technique yields the best results.
Future work will involve additional classifiers and datasets, with a focus on incorporating data that contains audio and images. In addition, further work will consider other data sampling and feature extraction algorithms.
AUC: Area Under the Receiver Operating Characteristic Curve
ANOVA: ANalysis Of VAriance
CNN: Convolutional Neural Network
FPR: False Positive Rate
GBDT: Gradient-Boosted Decision Tree
HSD: Honestly Significant Difference
LDA: Linear Discriminant Analysis
PCA: Principal Component Analysis
ROC: Receiver Operating Characteristic
SMOTE: Synthetic Minority Oversampling Technique
TPR: True Positive Rate
ULB: Université Libre de Bruxelles
1. Liu B, Tsoumakas G. Dealing with class imbalance in classifier chains via random undersampling. Knowl-Based Syst. 2020;192:105292.
2. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57.
3. Jonathan B, Putra PH, Ruldeviyani Y. Observation imbalanced data text to predict users selling products on female daily with smote, tomek, and smote-tomek. In: 2020 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT). IEEE; 2020. p. 81–85.
4. Thai-Nghe N, Gantner Z, Schmidt-Thieme L. Cost-sensitive learning methods for imbalanced data. In: The 2010 International Joint Conference on Neural Networks (IJCNN). IEEE; 2010. p. 1–8.
5. Tomek I, et al. Two modifications of CNN. IEEE Trans Syst Man Cybern. 1976;11:769–72.
6. Peng C, Chen Y, Kang Z, Chen C, Cheng Q. Robust principal component analysis: a factorization-based approach with linear complexity. Inf Sci. 2020;513:581–99.
7. Maggipinto M, Masiero C, Beghi A, Susto GA. A convolutional autoencoder approach for feature extraction in virtual metrology. Procedia Manufacturing. 2018;17:126–33.
8. Alsenan SA, Al-Turaiki IM, Hafez AM. Feature extraction methods in quantitative structure–activity relationship modeling: a comparative study. IEEE Access. 2020;8:78737–52.
9. Popat RR, Chaudhary J. A survey on credit card fraud detection using machine learning. In: 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI). IEEE; 2018. p. 1120–1125.
10. Kaggle: Credit Card Fraud Detection. https://www.kaggle.com/mlg-ulb/creditcardfraud
11. Hancock JT, Khoshgoftaar TM. CatBoost for big data: an interdisciplinary review. J Big Data. 2020;7(1):1–45.
12. Zuech R, Hancock J, Khoshgoftaar TM. Detecting web attacks using random undersampling and ensemble learners. J Big Data. 2021;8(1):1–20.
13. Leevy JL, Hancock J, Zuech R, Khoshgoftaar TM. Detecting cybersecurity attacks across different network features and learners. J Big Data. 2021;8(1):1–29.
14. Patel HH, Prajapati P. Study and analysis of decision tree based classification algorithms. Int J Computer Sci Eng. 2018;6(10):74–8.
15. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
16. Shi X, Wong YD, Li MZ-F, Palanisamy C, Chai C. A feature learning approach based on XGBoost for driving assessment and risk prediction. Accid Anal Prev. 2019;129:170–9.
17. Tang C, Luktarhan N, Zhao Y. An efficient intrusion detection method based on LightGBM and autoencoder. Symmetry. 2020;12(9):1458.
18. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems. 2018. p. 6638–6648.
19. He H, Ma Y. Imbalanced Learning: Foundations, Algorithms, and Applications. New York: Wiley; 2013.
20. Brownlee J. Undersampling algorithms for imbalanced classification. https://machinelearningmastery.com/undersampling-algorithms-for-imbalanced-classification/
21. Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans R Soc A. 2016;374(2065):20150202.
22. Meng Q, Catchpoole D, Skillicorn D, Kennedy PJ. Relational autoencoder for feature extraction. In: 2017 International Joint Conference on Neural Networks (IJCNN). IEEE; 2017. p. 364–371.
23. Nicholson C. A Beginner’s Guide to Important Topics in AI, Machine Learning, and Deep Learning: Deep Autoencoders. https://wiki.pathmind.com/deep-autoencoder
24. Safayenikoo P, Akturk I. Weight update skipping: Reducing training time for artificial neural networks. arXiv preprint arXiv:2012.02792. 2020.
25. Chablani M. Autoencoders: Introduction and Implementation in TF. https://towardsdatascience.com/autoencoders-introduction-and-implementation-3f40483b0a85
26. Khalid S, Khalil T, Nasreen S. A survey of feature selection and feature extraction techniques in machine learning. In: 2014 Science and Information Conference. IEEE; 2014. p. 372–378.
27. Sharma A, Paliwal KK. Linear discriminant analysis for the small sample size problem: an overview. Int J Mach Learn Cybern. 2015;6(3):443–54.
28. Polic M, Krajacic I, Lepora N, Orsag M. Convolutional autoencoder for feature extraction in tactile sensing. IEEE Robot Autom Lett. 2019;4(4):3671–8.
29. García JG, Robertsson A, Ortega JG, Johansson R. Generalized contact force estimator for a robot manipulator. In: Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006). IEEE; 2006. p. 4019–4024.
30. Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K. Deep learning approach combining sparse autoencoder with SVM for network intrusion detection. IEEE Access. 2018;6:52843–56.
31. Meng Z, Zhan X, Li J, Pan Z. An enhancement denoising autoencoder for rolling bearing fault diagnosis. Measurement. 2018;130:448–54.
32. Zavrak S, Iskefiyeli M. Anomaly-based intrusion detection from network flow features using variational autoencoder. IEEE Access. 2020;8:108346–58.
33. Lee H, Kim J, Kim B, Kim S. Convolutional autoencoder based feature extraction in radar data analysis. In: 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS). IEEE; 2018. p. 81–84.
34. Al-Shabi M. Credit card fraud detection using autoencoder model in unbalanced datasets. J Adv Math Computer Sci. 2019;33(5):1–16.
35. Chen J, Shen Y, Ali R. Credit card fraud detection using sparse autoencoder and generative adversarial network. In: 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). IEEE; 2018. p. 1054–1059.
36. Kemmler M, Rodner E, Wacker E-S, Denzler J. One-class classification with Gaussian processes. Pattern Recogn. 2013;46(12):3507–18.
37. Kim S, Choi Y, Lee M. Deep learning with support vector data description. Neurocomputing. 2015;165:111–7.
38. The imbalanced-learn developers. Imbalanced-learn documentation. https://imbalanced-learn.org/stable/
39. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
40. Gulli A, Pal S. Deep Learning with Keras. New York: Packt Publishing Ltd; 2017.
41. Gonzalez S, García S, Del Ser J, Rokach L, Herrera F. A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities. Inform Fusion. 2020;64:205–37.
42. Wen Z, He B, Kotagiri R, Lu S, Shi J. Efficient gradient boosted decision tree training on GPUs. In: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE; 2018. p. 234–243.
43. Basha SM, Rajput DS, Vandhan V. Impact of gradient ascent and boosting algorithm in classification. Int J Intell Eng Syst (IJIES). 2018;11(1):41–9.
44. Gupta A, Nagarajan V, Ravi R. Approximation algorithms for optimal decision trees and adaptive TSP problems. Math Oper Res. 2017;42(3):876–96.
45. Seliya N, Khoshgoftaar TM, Van Hulse J. A study on the relationships of classifier performance metrics. In: 21st International Conference on Tools with Artificial Intelligence (ICTAI’09). IEEE; 2009. p. 59–66.
46. Gu Q, Zhu L, Cai Z. Evaluation measures of the classification performance of imbalanced data sets. In: International Symposium on Intelligence Computation and Applications. Springer; 2009. p. 461–71.
47. Iversen GR, Norpoth H, Norpoth HP. Analysis of Variance. New York: Sage; 1987.
48. Tukey JW. Comparing individual means in the analysis of variance. Biometrics. 1949;8:99–114.
We would like to thank the reviewers in the Data Mining and Machine Learning Laboratory at Florida Atlantic University.
The authors declare that they have no competing interests.
Salekshahrezaee, Z., Leevy, J.L. & Khoshgoftaar, T.M. The effect of feature extraction and data sampling on credit card fraud detection. J Big Data 10, 6 (2023). https://doi.org/10.1186/s40537-023-00684-w