A ROBUST MACHINE LEARNING-BASED FRAMEWORK TO LEVERAGE CLASSIFICATION OF MALWARE

Authors

Lingaraj Sethi1*, Dr. Prof. Prashanta Kumar Patra2

Keywords: Malware, Malicious Software, Spyware, Adware, Gradient Boosting

Abstract

Malicious software, or malware, is a growing problem in today's cyber landscape and threatens the availability, confidentiality, and integrity of digital information. This paper shows that an effective malware detection system can be built from ensemble machine-learning models. The study uses the Canadian Institute for Cybersecurity's CIC-MalMem-2022 dataset, which was designed for research on complex malware classification. To improve the accuracy and resilience of the detection model, the proposed approach combines Principal Component Analysis, Recursive Feature Elimination, Decision Trees, Light Gradient Boosting Machine, and Gradient Boosting. The ensemble model achieved 99.96% accuracy on the test set, which demonstrates its effectiveness. Hyperparameter tuning identifies the best-performing parameters, and the ensemble's confusion matrix gives a closer look at classification performance. Comparisons with existing methods show that the proposed approach is superior at detecting malware. The study closes with recommendations for deploying the model in a secure environment and for updating it frequently to keep pace with shifting cybersecurity threats.

Published

2024-01-30

Issue

Vol. 43 No. 01 (2024)