Wang XD, Chen RC, Yan F, et al. Fast adaptive K-means subspace clustering for high-dimensional data. IEEE Access. 2019;7:42639–51.
Jaiswal JK, Samikannu R. Application of random forest algorithm on feature subset selection and classification and regression. In: World Congress on Computing and Communication Technologies (WCCCT). IEEE, 2017, p. 65–8.
Chen RC. Using deep learning to predict user rating on imbalance classification data. IAENG Int J Comput Sci. 2019;46:109–17.
Caraka RE. Prediction of Euro 50 using back propagation neural network (BPNN) and genetic algorithm (GA). Int J Eng Bus Manag. 2017;1:35–42.
García-Escudero LA, Gordaliza A, Matrán C, et al. A review of robust clustering methods. Adv Data Anal Classif. 2010;4:89–109.
Blum AL, Langley P. Selection of relevant features and examples in machine learning. Artif Intell. 1997;97:245–71.
Schmidtler MAR, et al. Data classification methods using machine learning techniques. US Patent Application. 2012;11:691.
Chen XW, Wasikowski M. FAST: A roc-based feature selection metric for small samples and imbalanced data classification problems. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, p. 124–32.
Segal MR. Machine Learning Benchmarks and Random Forest Regression. Biostatistics 2004; 1–14.
Cenggoro TW, Mahesworo B, Budiarto A, et al. Features importance in classification models for colorectal cancer cases phenotype in Indonesia. Procedia Comput Sci. 2019;157:313–20.
Tao J, Kang Y. Features importance analysis for emotional speech classification. Lecture Notes Comput Sci. 2005;3784:449–57.
Rodriguez-Galiano V, Sanchez-Castillo M, Chica-Olmo M, et al. Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol Rev. 2015;71:804–18.
Zhu F, Jiang M, Qiu Y, et al. RSLIME: an efficient feature importance analysis approach for industrial recommendation systems. In: 2019 International Joint Conference on Neural Networks (IJCNN). 2019, p. 1–6.
Liaw A, Wiener M. Classification and Regression by randomForest. R News. 2002;2:18–22.
Kella BJ, HimaBindu K, Suryanarayana D. A comparative study of random forest & k-nearest neighbors on the HAR dataset using caret. Int J Innov Res Technol. 2017;3:6–9.
Casanova R, Saldana S, Chew EY, et al. Application of random forests methods to diabetic retinopathy classification analyses. PLoS ONE. 2014;9:1–8.
Grömping U. Variable importance assessment in regression: linear regression versus random forest. Am Stat. 2009;63:308–19.
Khoshgoftaar TM, Golawala M, Van Hulse J. An empirical study of learning from imbalanced data using random forest. In: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI. 2007, p. 310–7.
Li Y, Xia J, Zhang S, et al. An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Syst Appl. 2012;39:424–30.
Hsu HH, Hsieh CW, Lu MD. Hybrid feature selection by combining filters and wrappers. Expert Syst Appl. 2011;38:8144–50.
Dewi C, Chen R-C. Human Activity Recognition Based on Evolution of Features Selection and Random Forest. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). 2019.
Wei W, Qiang Y, Zhang J. A bijection between lattice-valued filters and lattice-valued congruences in residuated lattices. Math Probl Eng. 2013;36:4218–29.
Wei W, Yang XL, Shen PY, et al. Holes detection in anisotropic sensornets: topological methods. Int J Distrib Sens Netw. 2012;8:9.
Dewi C, Chen R-C. Random forest and support vector machine on features selection for regression analysis. Int J Innov Comput Inform Control. 2019;15.
Murray K, Conner MM. Methods to quantify variable importance: implications for the analysis of noisy ecological data. Ecology. 2009;90:348–55.
Warton DI, Blanchet FG, O’Hara RB, et al. So many variables: joint modeling in community ecology. Trends Ecol Evol. 2015;30:766–79.
Caraka RE, Chen RC, Lee Y, et al. Variational approximation multivariate generalized linear latent variable model in diversity termites. Sylwan. 2020;164:161–77.
Jeliazkov A, Mijatovic D, Chantepie S, et al. A global database for metacommunity ecology, integrating species, traits, environment and space. Sci Data. 2020;7:1–15.
Haidar A, Verma B. A novel approach for optimizing climate features and network parameters in rainfall forecasting. Soft Comput. 2018;22:8119–30.
Caraka RE, Bakar SA, Tahmid M, et al. Neurocomputing fundamental climate analysis. Telkomnika. 2019;17:1818–27.
Hu J, Ghamisi P, Zhu X. Feature extraction and selection of Sentinel-1 dual-pol data for global-scale local climate zone classification. ISPRS Int J Geo-Inform. 2018. https://doi.org/10.3390/ijgi7090379.
Bechtel B, Daneke C. Classification of local climate zones based on multiple earth observation data. IEEE J Select Topics Appl Earth Observ Remote Sens. 2012. https://doi.org/10.1109/jstars.2012.2189873.
Torija AJ, Ruiz DP. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods. Sci Total Environ. 2015;505:680–93.
Caraka RE, Chen RC, Toharudin T, et al. Prediction of status particulate matter 2.5 using state Markov chain stochastic process and HYBRID VAR-NN-PSO. IEEE Access. 2019;7:161654–65.
De Vito S, Piga M, Martinotto L, et al. CO, NO2 and NOx urban pollution monitoring with on-field calibrated electronic nose by automatic Bayesian regularization. Sens Actuat B. 2009;143:182–91.
Prastyo DD, Nabila FS, Suhartono, et al. VAR and GSTAR-based feature selection in support vector regression for multivariate spatio-temporal forecasting. In: Communications in Computer and Information Science. 2019, p. 46–57.
Bui DT, Tsangaratos P. Flash flood susceptibility modeling using an optimized fuzzy rule based feature selection technique and tree based ensemble methods. Sci Total Environ. 2019;6:1038–54.
Hosseini FS, Choubin B, Mosavi A, et al. Flash-flood hazard assessment using ensembles and Bayesian-based machine learning models: application of the simulated annealing feature selection method. Sci Total Environ. 2020;711:135161.
Micheletti N, Foresti L, Robert S, et al. Machine learning feature selection methods for landslide susceptibility mapping. Math Geosci. 2014;46:33–57.
Brett PTB, Guida R. Earthquake damage detection in urban areas using curvilinear features. IEEE Trans Geosci Remote Sens. 2013;51:4877–84.
Zhuang J, Ogata Y, Vere-Jones D. Analyzing earthquake clustering features by using stochastic reconstruction. J Geophys Res. 2004. https://doi.org/10.1029/2003jb002879.
Wieland M, Liu W, Yamazaki F. Learning change from synthetic aperture radar images: performance evaluation of a support vector machine to detect earthquake and tsunami-induced changes. Remote Sens. 2016;8:792.
Caraka RE, Nugroho NT, Tai SK, et al. Feature importance of the aortic anatomy on endovascular aneurysm repair (EVAR) using Boruta and Bayesian MCMC. Comm Math Biol Neurosci. 2020. https://doi.org/10.28919/cmbn/4584.
De Silva K, Jönsson D, Demmer RT. A combined strategy of feature selection and machine learning to identify predictors of prediabetes. J Am Med Inform Assoc. 2020;27:394–406.
Caraka RE, Goldameir NE, et al. An end to end of scalable tree boosting system. Sylwan. 2020;165:1–11.
Garcia-Carretero R, Vigil-Medina L, Mora-Jimenez I, et al. Use of a K-nearest neighbors model to predict the development of type 2 diabetes within 2 years in an obese, hypertensive population. Med Biol Eng Comput. 2020:1–12.
Magesh G, Swarnalatha P. Optimal feature selection through a cluster-based DT learning (CDTL) in heart disease prediction. Evol Intel. 2020. https://doi.org/10.1007/s12065-019-00336-0.
Kavitha R, Kannan E. An efficient framework for heart disease classification using feature extraction and feature selection technique in data mining. In: 1st International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS). 2016. https://doi.org/10.1109/icetets.2016.7603000.
Shilaskar S, Ghatol A. Feature selection for medical diagnosis: evaluation for cardiovascular diseases. Expert Syst Appl. 2013;40:4146–53.
Sodhi P, Aggarwal P. Feature selection using SEER data for the survivability of ovarian cancer patients. In: Advances in computing and intelligent systems. 2019, p. 271–9.
García-Díaz P, Sánchez-Berriel I, Martínez-Rojas JA, et al. Unsupervised feature selection algorithm for multiclass cancer classification of gene expression RNA-Seq data. Genomics. 2020;112:1916–25.
Singh RK, Sivabalakrishnan M. Feature selection of gene expression data for cancer classification: a review. Procedia Comput Sci. 2015:52–7.
Naftchali RE, Abadeh MS. A multi-layered incremental feature selection algorithm for adjuvant chemotherapy effectiveness/futileness assessment in non-small cell lung cancer. Biocybern Biomed Eng. 2017;37:477–88.
Fung G, Stoeckel J. SVM feature selection for classification of SPECT images of Alzheimer’s disease using spatial information. Knowl Inf Syst. 2007;11:243–58.
Sankhwar S, Gupta D, Ramya KC, et al. Improved grey wolf optimization-based feature subset selection with fuzzy neural classifier for financial crisis prediction. Soft Comput. 2020;24(1):101–10.
Wei W, Xia X, Wozniak M, et al. Multi-sink distributed power control algorithm for Cyber-physical-systems in coal mine tunnels. Comput Netw. 2019;161:210–9.
Sani NS, Rahman MA, Bakar AA, et al. Machine learning approach for Bottom 40 Percent Households (B40) poverty classification. Int J Adv Sci Eng Inform Technol. 2018. https://doi.org/10.18517/ijaseit.8.4-2.6829.
Njuguna C, McSharry P. Constructing spatiotemporal poverty indices from big data. J Bus Res. 2017;70:318–27.
Matos T, Macedo JA, Lettich F, et al. Leveraging feature selection to detect potential tax fraudsters. Expert Syst Appl. 2020;145:113–28.
Zhang H. Optimization of risk control in financial markets based on particle swarm optimization algorithm. J Comput Appl Math. 2020;368:112530.
Caraka RE, Chen RC, Toharudin T, et al. Ramadhan short-term electric load: a hybrid model of cycle spinning wavelet and group method data handling (CSW-GMDH). IAENG Int J Comput Sci. 2019;46:670–6.
Abedinia O, Amjady N, Zareipour H. A new feature selection technique for load and price forecast of electrical power systems. IEEE Trans Power Syst. 2017;32:62–74.
Caraka RE, Bakar SA. Evaluation Performance of Hybrid Localized Multi Kernel SVR (LMKSVR) in electrical load data using 4 different optimizations. J Eng Appl Sci. 2020;13(17):7440–9.
Sałat R, Osowski S, Siwek K. Principal Component Analysis (PCA) for feature selection at the diagnosis of electrical circuits. Przegląd Elektrotechniczny. 2003;79:667–70.
Lojowska A, Kurowicka D, Papaefthymiou G, et al. Stochastic modeling of power demand due to EVs using copula. IEEE Trans Power Syst. 2012. https://doi.org/10.1109/tpwrs.2012.2192139.
Sun J, Li H, Fujita H, et al. Class-imbalanced dynamic financial distress prediction based on Adaboost-SVM ensemble combined with SMOTE and time weighting. Inform Fusion. 2020;54:128–44.
Caraka RE, Hafianti S, Hidayati S, et al. Identifying indicators of household indebtedness by provinces. In: The Ninth Research Dive for Development on Household Vulnerability. 2019, p. 10–15.
Kaban PA, Kurniawan R, Caraka RE, et al. Biclustering method to capture the spatial pattern and to identify the causes of social vulnerability in Indonesia: a new recommendation for disaster mitigation policy. Procedia Comput Sci. 2019;157:31–7.
Kurniawan R, Siagian TH, Yuniarto B, et al. Construction of social vulnerability index in Indonesia using partial least squares structural equation modeling. Int J Eng Technol. 2018;7:6131–6.
Ravisankar P, Ravi V, Raghava Rao G, et al. Detection of financial statement fraud and feature selection using data mining techniques. Decis Support Syst. 2011;50:491–500.
Derrig RA. Insurance Fraud. J Risk Insur. 2002;69:271–87.
Altinbas H, Biskin OT. Selecting macroeconomic influencers on stock markets by using feature selection algorithms. Procedia Econ Finance. 2015. https://doi.org/10.1016/s2212-5671(15)01251-4.
Wei W, Xu Q, Wang L, et al. GI/Geom/1 queue based on communication model for mesh networks. Int J Commun Syst. 2014;27:3013–29.
Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46:389–422.
Zhang C, Zhou Y, Guo J, et al. Research on classification method of high-dimensional class-imbalanced datasets based on SVM. Int J Mach Learn Cybern. 2018;10:1765–78.
Wei W, Zhou B, Połap D, et al. A regional adaptive variational PDE model for computed tomography image reconstruction. Pattern Recogn. 2019;92:64–81.
R Development Core Team. R: A Language and Environment for Statistical Computing. 2011. https://doi.org/10.1007/978-3-540-74686-7.
Sain SR, Vapnik VN. The nature of statistical learning theory. Technometrics. 1996. https://doi.org/10.2307/1271324.
Altman NS. An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat. 1992;46:175–85.
Cunningham P, Delany SJ. K-Nearest neighbour classifiers. Multiple Classifier Systems. 2007;34:1–17.
Bonyad M, Tieng Q, Reutens D. Optimization of distributions differences for classification. IEEE Trans Neural Netw Learn Syst. 2019;30:511–23.
Zhang ML, Zhou ZH. A k-nearest neighbor based algorithm for multi-label classification. In: IEEE International Conference on Granular Computing. 2005, p. 718–21.
Peterson L. K-nearest neighbor. Scholarpedia. 2009;4. https://doi.org/10.4249/scholarpedia.1883.
Hand DJ, Vinciotti V. Choosing k for two-class nearest neighbour classifiers with unbalanced classes. Pattern Recogn Lett. 2003;24:1555–62.
Tharwat A, Gaber T, Ibrahim A, et al. Linear discriminant analysis: a detailed tutorial. AI Commun. 2017. https://doi.org/10.3233/aic-170729.
Ferizal R, Wibirama S, Setiawan NA. Gender recognition using PCA and LDA with improve preprocessing and classification technique. In: Proceedings - 2017 7th International Annual Engineering Seminar, InAES 2017. 2017, p. 1–6.
Pardamean B, Budiarto A, Caraka RE. Bioinformatika dengan R Tingkat Lanjut [Advanced Bioinformatics with R]. 1st ed. Yogyakarta: Teknosains; 2018.
Juárez I, Mira-McWilliams J, González C. Important variable assessment and electricity price forecasting based on regression tree models: classification and regression trees, Bagging and Random Forests. IET Gener Transm Distrib. 2015;9:1120–8.
Andrew AM. An Introduction to support vector machines and other kernel-based learning methods. Kybernetes. 2001. https://doi.org/10.1108/k.2001.30.1.103.6.
Srivastava DK, Bhambhu L. Data classification using support vector machine. J Theor Appl Inform Technol. 2010;12:1–7.
Salakhutdinov R, Hinton G. Learning a nonlinear embedding by preserving class neighbourhood structure. J Mach Learn Res. 2007;2:412–9.
Yasin H, Caraka RE, et al. Prediction of crude oil prices using support vector regression (SVR) with grid search—Cross validation algorithm. Global J Pure Appl Math. 2016;12:3009–20.
Caraka RE, Bakar SA, Pardamean B, et al. Hybrid support vector regression in electric load during national holiday season. In: ICITech. IEEE, 2018, p. 1–6.
Chen RC, Hsieh CH. Web page classification based on a support vector machine using a weighted vote schema. Expert Syst Appl. 2006;31:427–35.
Ertekin S, Huang J, Bottou L, et al. Learning on the border: active learning in imbalanced data classification. In: International conference on information and knowledge management, proceedings. 2007, p. 127–36.
Wei W, Liu S, Li W, et al. Fractal intelligent privacy protection in online social network using attribute-based encryption schemes. IEEE Trans Comput Soc Syst. 2018;5:736–47.
Sharma A, Lee YD, Chung WY. High accuracy human activity monitoring using neural network. In: 3rd International Conference on Convergence and Hybrid Information Technology, ICCIT 2008. 2008, p. 430–35.
R Core Team. R software. R Foundation for Statistical Computing; 2008.
Wei W, Su J, Song H, et al. CDMA-based anti-collision algorithm for EPC global C1 Gen2 systems. Telecommun Syst. 2018;67:63–71.
Caffo B. Developing Data Products in R. 2015.
Yang JY, Wang JS, Chen YP. Using acceleration measurements for activity recognition: an effective learning algorithm for constructing neural classifiers. Pattern Recogn Lett. 2008;29:2213–20.
Ting KM. Confusion Matrix. In: Encyclopedia of Machine Learning and Data Mining. 2017, p. 260.
Hernández-Orallo J. ROC curves for regression. Pattern Recogn. 2013;46:3395–411.
Sedgwick P. Receiver operating characteristic curves. BMJ. 2013:1–3.
Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–32.
Mishra P, Mishra M, Somani AK. Applications of Hadoop Ecosystems Tools. In: NoSQL: Database for Storage and Retrieval of Data in Cloud. 2017, p. 173–90.
Mishra M, Mishra P, Somani AK. Understanding the data science behind business analytics. In: Big Data Analytics. 2017, p. 93–116.
Díaz-Uriarte R, Alvarez de Andrés S. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006;7:1–13.
Efron B, Tibshirani R. Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc. 1997;92:548–60.
Wei W, Song H, Li W, et al. Gradient-driven parking navigation using a continuous information potential field based on wireless sensor network. Inform Sci. 2017. https://doi.org/10.1016/j.ins.2017.04.042.
Asim S, Muhammad H, Rehman SU, et al. A comparative study of feature selection approaches: 2016–2020. Int J Sci Eng Res. 2020;11:469–78.
Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans R Soc A. 2016. https://doi.org/10.1098/rsta.2015.0202.
Álvarez JD, Matias-Guiu JA, Cabrera-Martín MN, et al. An application of machine learning with feature selection to improve diagnosis and classification of neurodegenerative disorders. BMC Bioinform. 2019. https://doi.org/10.1186/s12859-019-3027-7.
Imtiaz T, Rifat S, Fattah SA, et al. Automated brain tumor segmentation based on multi-planar superpixel level features extracted from 3D MR images. IEEE Access. 2020. https://doi.org/10.1109/access.2019.2961630.
Dong L, Xing L, Liu T, et al. Very high resolution remote sensing imagery classification using a fusion of random forest and deep learning technique (subtropical area for example). IEEE J Select Topics Appl Earth Observ Remote Sens. 2020. https://doi.org/10.1109/jstars.2019.2953234.
Tumar I, Hassouneh Y, Turabieh H, et al. Enhanced binary moth flame optimization as a feature selection algorithm to predict software fault prediction. IEEE Access. 2020. https://doi.org/10.1109/access.2020.2964321.
Liu Y, Ju S, Wang J, et al. A new feature selection method for text classification based on independent feature space search. Math Probl Eng. 2020:1–14.
Schapire RE. Explaining AdaBoost. In: Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik. 2013. https://doi.org/10.1007/978-3-642-41136-6_5.
Cabezas M, Oliver A, Valverde S, et al. BOOST: a supervised approach for multiple sclerosis lesion segmentation. J Neurosci Methods. 2014. https://doi.org/10.1016/j.jneumeth.2014.08.024.
Kubankova A, Kubanek D, Prinosil J. Digital modulation classification based on characteristic features and GentleBoost algorithm. In: 2011 34th International Conference on Telecommunications and Signal Processing (TSP). 2011. https://doi.org/10.1109/tsp.2011.6043692.