MPA-SVM: An Effective Feature Selection Approach for High-Dimensional Datasets

Authors

DOI:

https://doi.org/10.32792/jeps.v15i2.526

Keywords:

Marine Predators Algorithm, Feature selection, Support Vector Machine, Data mining, High-dimensional datasets

Abstract

Choosing a subset of candidate features is a crucial phase in the data mining process. The ultimate aim of feature selection is to find the ideal number of high-quality features that maximizes the learning algorithm's performance. As a dataset's feature count rises, however, this problem becomes increasingly difficult to solve, so modern optimization techniques are employed to find the best feature combinations. The Marine Predators Algorithm (MPA) is a recent metaheuristic that has proven effective on many optimization problems, and the Support Vector Machine (SVM) is a well-established method for classification tasks. In this work, the MPA is adapted to the feature selection problem in high-dimensional datasets, with the SVM serving as the classifier; the present study proposes the resulting MPA-SVM method as a solution to this problem. The suggested method's efficacy was confirmed on ten high-dimensional datasets acquired from the Arizona State University (ASU) repository, and its outcomes were contrasted with those of six state-of-the-art feature selection algorithms: Atom Search Optimization (ASO), Equilibrium Optimizer (EO), Emperor Penguin Optimizer (EPO), Monarch Butterfly Optimization (MBO), Satin Bowerbird Optimizer (SBO), and the Sine Cosine Algorithm (SCA). The results confirm that the proposed MPA-SVM method outperformed these metaheuristic algorithms and demonstrated a remarkable capacity to choose the most informative features: across all datasets, MPA-SVM produced the lowest average error rates, the smallest classification standard deviation (STD) values, and the lowest feature selection (FS) rates.
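The wrapper formulation summarized above — a metaheuristic searching over binary feature masks, each mask scored by a classifier's error combined with a penalty on the number of selected features — can be sketched in a few lines. The sketch below is illustrative only, not the authors' implementation: a leave-one-out 1-nearest-neighbour error stands in for the SVM classifier, and the common weighting `alpha * error + (1 - alpha) * |S| / |F|` is an assumption, as are the function names and the value of `alpha`.

```python
import math

def loo_1nn_error(X, y, mask):
    """Leave-one-out 1-nearest-neighbour error on the selected features.
    Stand-in for the SVM classifier used in wrapper-based evaluation."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 1.0  # an empty feature subset is maximally penalised
    errors = 0
    for i in range(len(X)):
        best, best_d = None, math.inf
        for k in range(len(X)):
            if k == i:
                continue  # leave sample i out
            d = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if d < best_d:
                best, best_d = k, d
        errors += y[best] != y[i]
    return errors / len(X)

def fitness(X, y, mask, alpha=0.99):
    """Wrapper fitness minimised by the metaheuristic:
    weighted classification error plus the feature selection (FS) rate."""
    fs_rate = sum(mask) / len(mask)
    return alpha * loo_1nn_error(X, y, mask) + (1 - alpha) * fs_rate
```

A metaheuristic such as the MPA would evolve a population of such masks, keeping the ones with the lowest fitness; the `alpha` term trades classification accuracy against subset size.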

References

Agyeman, M., Guerrero, A., et al. (2022). A review of classification techniques for arrhythmia patterns using convolutional neural networks and Internet of Things (IoT) devices. IEEE Access. Retrieved October 13, 2022, from https://ieeexplore.ieee.org/abstract/document/9832886/

Al-Betar, M. A., Awadallah, M. A., Heidari, A. A., Chen, H., Al-khraisat, H., & Li, C. (2021). Survival exploration strategies for Harris Hawks Optimizer. Expert Systems with Applications, 168(December 2019), 114243. https://doi.org/10.1016/j.eswa.2020.114243

Al-Qaness, M. A. A., Ewees, A. A., Fan, H., Abualigah, L., & Elaziz, M. A. (2020). Marine predators algorithm for forecasting confirmed cases of COVID-19 in Italy, USA, Iran and Korea. International Journal of Environmental Research and Public Health, 17(10). https://doi.org/10.3390/ijerph17103520

Battiti, R. (1994). Using Mutual Information for Selecting Features in Supervised Neural Net Learning. IEEE Transactions on Neural Networks, 5(4), 537–550. https://doi.org/10.1109/72.298224

Beheshti, Z. (2022). BMPA-TVSinV: A Binary Marine Predators Algorithm using time-varying sinus and V-shaped transfer functions for wrapper-based feature selection. Knowledge-Based Systems, 252. https://doi.org/10.1016/j.knosys.2022.109446

Bradley, P. S., & Mangasarian, O. L. (1998). Feature selection via concave minimization and support vector machines. Proceedings of the Fifteenth International Conference on Machine Learning (ICML ’98), 6, 82–90.

Datasets | Feature Selection @ ASU. (n.d.). Retrieved August 1, 2024, from https://jundongl.github.io/scikit-feature/OLD/datasets_old.html

De Stefano, C., Fontanella, F., & Scotto di Freca, A. (2016). A novel GA-based feature selection approach for high dimensional data. GECCO 2016 Companion - Proceedings of the 2016 Genetic and Evolutionary Computation Conference, 87–88. https://doi.org/10.1145/2908961.2909049

Dhiman, G., & Kumar, V. (2018). Emperor penguin optimizer: A bio-inspired algorithm for engineering problems. Knowledge-Based Systems, 159, 20–50. https://doi.org/10.1016/j.knosys.2018.06.001

Dorigo, M., & Stützle, T. (2009). Ant colony optimization: Overview and recent advances. Technical report, IRIDIA, Université Libre de Bruxelles.

Drucker, H., Wu, D., & Vapnik, V. N. (1999). Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10(5), 1048–1054. https://doi.org/10.1109/72.788645

Dupin, N., & Talbi, E. G. (2020). Machine learning-guided dual heuristics and new lower bounds for the refueling and maintenance planning problem of nuclear power plants. Algorithms, 13(8). https://doi.org/10.3390/A13080185

Einstein, A. (n.d.). Investigations on the Theory of the Brownian Movement (R. Fürth, Ed.; A. D. Cowper, Trans.).

Elminaam, D. S. A., Nabil, A., Ibraheem, S. A., & Houssein, E. H. (2021). An Efficient Marine Predators Algorithm for Feature Selection. IEEE Access, 9, 60136–60153. https://doi.org/10.1109/ACCESS.2021.3073261

Emary, E., & Zawbaa, H. M. (2019). Feature selection via Lèvy Antlion optimization. Pattern Analysis and Applications, 22(3), 857–876. https://doi.org/10.1007/s10044-018-0695-2

Faramarzi, A., Heidarinejad, M., Mirjalili, S., & Gandomi, A. H. (2020). Marine Predators Algorithm: A nature-inspired metaheuristic. Expert Systems with Applications, 152. https://doi.org/10.1016/j.eswa.2020.113377

Faramarzi, A., Heidarinejad, M., Stephens, B., & Mirjalili, S. (2020). Equilibrium optimizer: A novel optimization algorithm. Knowledge-Based Systems, 191, 105190. https://doi.org/10.1016/j.knosys.2019.105190

Feature Selection using Salp Swarm Algorithm for Real Biomedical Datasets. (2017). IJCSNS International Journal of Computer Science and Network Security.

Gheyas, I. A., & Smith, L. S. (2010). Feature subset selection in large dimensionality domains. Pattern Recognition, 43(1), 5–13. https://doi.org/10.1016/j.patcog.2009.06.009

Patro, S. G. K., & Sahu, K. K. (n.d.). Normalization: A preprocessing stage.

Hermes, L., & Buhmann, J. M. (2000). Feature Selection for Support Vector Machines. Proc. {IEEE} Intl. Conf. Pattern Recognition ({ICPR’00}), 2, 716–719. https://doi.org/10.1109/ICPR.2000.906174

Hussain, K., Neggaz, N., Zhu, W., & Houssein, E. H. (2021). An efficient hybrid sine-cosine Harris hawks optimization for low and high-dimensional feature selection. Expert Systems with Applications, 176. https://doi.org/10.1016/j.eswa.2021.114778

Ibrahim, H. T., Mazher, W. J., Ucan, O. N., & Bayat, O. (2018). A grasshopper optimizer approach for feature selection and optimizing SVM parameters utilizing real biomedical data sets. Neural Computing and Applications. https://doi.org/10.1007/s00521-018-3414-4

Izmailov, A. F. (2010). Solution sensitivity for Karush-Kuhn-Tucker systems with non-unique Lagrange multipliers. Optimization, 59(5), 747–775. https://doi.org/10.1080/02331930802434922

Jha, K., & Saha, S. (2021). Incorporation of multimodal multiobjective optimization in designing a filter based feature selection technique. Applied Soft Computing, 98. https://doi.org/10.1016/j.asoc.2020.106823

Jia, L., Gong, W., & Wu, H. (n.d.). An Improved Self-adaptive Control Parameter of Differential Evolution for Global Optimization.

Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Neural Networks, 1995. Proceedings., IEEE International Conference On, 4, 1942–1948 vol.4. https://doi.org/10.1109/ICNN.1995.488968

Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1–2), 273–324. https://doi.org/10.1016/S0004-3702(97)00043-X

Kumar, L., & Bharti, K. K. (2021). A novel hybrid BPSO–SCA approach for feature selection. Natural Computing, 20(1), 39–61. https://doi.org/10.1007/s11047-019-09769-z

Luo, J., Zhou, D., Jiang, L., & Ma, H. (2022). A particle swarm optimization based multiobjective memetic algorithm for high-dimensional feature selection. Memetic Computing, 14(1), 77–93. https://doi.org/10.1007/s12293-022-00354-z

Mafarja, M. M., & Mirjalili, S. (2017). Hybrid Whale Optimization Algorithm with simulated annealing for feature selection. Neurocomputing, 260, 302–312. https://doi.org/10.1016/j.neucom.2017.04.053

Mammone, A., Turchi, M., & Cristianini, N. (2009). Support vector machines. Wiley Interdisciplinary Reviews: Computational Statistics, 1–39. https://doi.org/10.1002/wics.049

Mantegna, R. N. (1994). Fast, accurate algorithm for numerical simulation of Lévy stable stochastic processes. Physical Review E, 49(5), 4677–4683. https://doi.org/10.1103/PhysRevE.49.4677

Mirjalili, S. (2016). SCA: A Sine Cosine Algorithm for solving optimization problems. Knowledge-Based Systems, 96, 120–133. https://doi.org/10.1016/j.knosys.2015.12.022

Moorthy, U., & Gandhi, U. D. (2021). A novel optimal feature selection technique for medical data classification using ANOVA based whale optimization. Journal of Ambient Intelligence and Humanized Computing, 12(3), 3527–3538. https://doi.org/10.1007/s12652-020-02592-w

Mukherjee, S., Chapelle, O., Weston, J., Pontil, M., Poggio, T., & Vapnik, V. (2000). Feature selection for SVMs. https://www.researchgate.net/publication/221619995

Parouha, R. P., & Das, K. N. (2016). A memory based differential evolution algorithm for unconstrained optimization. Applied Soft Computing, 38, 501–517. https://doi.org/10.1016/J.ASOC.2015.10.022

Pehlivanlı, A. Ç. (2016). A novel feature selection scheme for high-dimensional data sets: four-Staged Feature Selection. Journal of Applied Statistics, 43(6), 1140–1154. https://doi.org/10.1080/02664763.2015.1092112

Song, Q., Ni, J., & Wang, G. (2011). A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2011.181

Salcedo-Sanz, S., Prado-Cumplido, M., Pérez-Cruz, F., & Bousoño-Calzón, C. (2002). Feature selection via genetic optimization. 547–548.

Samareh Moosavi, S. H., & Khatibi Bardsiri, V. (2017). Satin bowerbird optimizer: A new optimization algorithm to optimize ANFIS for software development effort estimation. Engineering Applications of Artificial Intelligence, 60, 1–15. https://doi.org/10.1016/j.engappai.2017.01.006

Shaheen, M. A. M., Yousri, D., Fathy, A., Hasanien, H. M., Alkuhayli, A., & Muyeen, S. M. (2020). A Novel Application of Improved Marine Predators Algorithm and Particle Swarm Optimization for Solving the ORPD Problem. Energies, 13(21). https://doi.org/10.3390/en13215679

Shen, C., & Zhang, K. (2022). Two-stage improved Grey Wolf optimization algorithm for feature selection on high-dimensional classification. Complex and Intelligent Systems, 8(4), 2769–2789. https://doi.org/10.1007/s40747-021-00452-4

Soliman, M. A., Hasanien, H. M., & Alkuhayli, A. (2020). Marine Predators Algorithm for Parameters Identification of Triple-Diode Photovoltaic Models. IEEE Access, 8, 155832–155842. https://doi.org/10.1109/ACCESS.2020.3019244

Storn, R., & Price, K. (1997). Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces. Journal of Global Optimization, 11(4), 341–359. https://doi.org/10.1023/A:1008202821328

Tawhid, M. A., & Dsouza, K. B. (2018). Hybrid binary bat enhanced particle swarm optimization algorithm for solving feature selection problems. Applied Computing and Informatics, 16(1–2), 117–136. https://doi.org/10.1016/j.aci.2018.04.001

Too, J., Mafarja, M., & Mirjalili, S. (2021). Spatial bound whale optimization algorithm: an efficient high-dimensional feature selection approach. Neural Computing and Applications, 33(23), 16229–16250. https://doi.org/10.1007/s00521-021-06224-y

Tran, B., Xue, B., & Zhang, M. (2019). Adaptive multi-subswarm optimisation for feature selection on high-dimensional classification. GECCO 2019 - Proceedings of the 2019 Genetic and Evolutionary Computation Conference, 481–489. https://doi.org/10.1145/3321707.3321713

Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, New York.

Wang, G. G., Deb, S., & Cui, Z. (2019). Monarch butterfly optimization. Neural Computing and Applications, 31(7), 1995–2014. https://doi.org/10.1007/s00521-015-1923-y

Winter, G., Periaux, J., Galan, M., & Cuesta, P. (1996). Genetic Algorithms in Engineering and Computer Science. http://dl.acm.org/citation.cfm?id=547504

Xu, Z., Huang, G., Weinberger, K. Q., & Zheng, A. X. (2014). Gradient boosted feature selection. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 522–531. https://doi.org/10.1145/2623330.2623635

Yan, C., Ma, J., Luo, H., & Patel, A. (2019). Hybrid binary Coral Reefs Optimization algorithm with Simulated Annealing for Feature Selection in high-dimensional biomedical datasets. Chemometrics and Intelligent Laboratory Systems, 184, 102–111. https://doi.org/10.1016/j.chemolab.2018.11.010

Zhang, B., & Cao, P. (2019). Classification of high dimensional biomedical data based on feature selection using redundant removal. PLoS ONE, 14(4), 1–19. https://doi.org/10.1371/journal.pone.0214406

Zhao, W., Wang, L., & Zhang, Z. (2019). Atom search optimization and its application to solve a hydrogeologic parameter estimation problem. Knowledge-Based Systems, 163, 283–304. https://doi.org/10.1016/j.knosys.2018.08.030

Published

2025-06-01