Performance analysis of machine learning algorithms for Malware classification

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022.

Bibliographic details
Main authors: Bushra, Raisa Hasan; Alam, Md Taukir; Saha, Aniruddho; Fahim, Nazmus Sakib; Binty, Nabila Mourium
Other authors: Chakrabarty, Amitabha
Format: Thesis
Language: English
Published: Brac University, 2023
Online access: http://hdl.handle.net/10361/21825
Record ID: 10361-21825
Additional contributor: Rodoshi, Ahanaf Hassan
Department: Department of Computer Science and Engineering, Brac University
Degree: B.Sc. in Computer Science
Date issued: 2022-09-29
Date added to repository: 2023-10-15
Copyright: ©2022
Student IDs: 18301064, 18301277, 18201117, 18201166, 19101082
Extent: 47 pages (application/pdf)
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 32-36).
Rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.

Abstract:
Malware detection has been an active research area for years, as the variety and complexity of malware attacks increase daily. Over the past several years, supervised and unsupervised machine learning algorithms have proven to be very effective techniques for detecting, identifying, and classifying malware attacks. Common and widely concerning malware attacks include Trojans, adware, ransomware, and zero-day attacks. In this paper, we used ten ML algorithms, namely AdaBoost, Stochastic Gradient Descent (SGD), Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), XGBoost, Logistic Regression (LR), Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), to classify software-based Trojan, ransomware, adware, and zero-day attacks. The research was conducted on a dataset of 12,863 malware samples belonging to the categories above, from which features were extracted and patterns learned. After testing each classifier on the selected dataset, we compare these ML methods and analyze how well each one classifies these common malware types.
After implementation, RF achieved the highest accuracy (86.97%) and Gaussian NB the lowest (47.84%). MLP, XGBoost, KNN, DT, AdaBoost, SVM, LR, and SGD achieved 83.60%, 82.59%, 80.68%, 79.63%, 73.30%, 73.22%, 67.08%, and 64.40% accuracy, respectively. Beyond overall accuracy, our analysis considers the per-class accuracy, precision, F1-score, true positive rate (TPR), true negative rate (TNR), false positive rate (FPR), and false negative rate (FNR) of each malware class for every classifier.
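For a multi-class problem such as this one, the per-class rates named in the abstract (TPR, TNR, FPR, FNR, precision, F1-score) are commonly computed one-vs-rest: each malware class in turn is treated as "positive" and all other classes as "negative". The sketch below assumes that convention; the thesis's exact definitions are not stated in this record, so per-class "accuracy" is left out rather than guessed. The label lists are hypothetical toy data, not the thesis's dataset or predictions, and the helper names `safe_ratio`, `overall_accuracy`, and `one_vs_rest_rates` are illustrative.

```python
# Hypothetical toy labels for the four malware categories named in the abstract.
# These are NOT the thesis's data or predictions; they only illustrate the metrics.
CLASSES = ["Trojan", "Adware", "Ransomware", "Zero-day"]

y_true = ["Trojan", "Trojan", "Adware", "Ransomware",
          "Zero-day", "Adware", "Ransomware", "Trojan"]
y_pred = ["Trojan", "Adware", "Adware", "Ransomware",
          "Trojan", "Adware", "Zero-day", "Trojan"]


def safe_ratio(num, den):
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return safe_ratio(correct, len(y_true))


def one_vs_rest_rates(y_true, y_pred, cls):
    """Per-class rates, treating `cls` as positive and every other class as negative."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == cls and p == cls:
            tp += 1          # cls correctly predicted as cls
        elif t != cls and p == cls:
            fp += 1          # another class mislabelled as cls
        elif t == cls:
            fn += 1          # cls mislabelled as another class
        else:
            tn += 1          # neither the true nor the predicted label is cls
    precision = safe_ratio(tp, tp + fp)
    tpr = safe_ratio(tp, tp + fn)        # recall / sensitivity
    return {
        "TPR": tpr,
        "TNR": safe_ratio(tn, tn + fp),  # specificity
        "FPR": safe_ratio(fp, fp + tn),
        "FNR": safe_ratio(fn, fn + tp),
        "precision": precision,
        "F1": safe_ratio(2 * precision * tpr, precision + tpr),
    }


report = {cls: one_vs_rest_rates(y_true, y_pred, cls) for cls in CLASSES}

print(f"overall accuracy: {overall_accuracy(y_true, y_pred):.3f}")
for cls, rates in report.items():
    print(cls, {name: round(value, 3) for name, value in rates.items()})
```

On these toy labels 5 of 8 predictions are correct, so the overall accuracy is 0.625. For every class, TPR and FNR sum to 1, as do TNR and FPR, which is a quick sanity check when reading per-class result tables.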
Institution: Brac University
Collection: Institutional Repository
Subjects: Machine learning; Trojan; Adware; Ransomware; Classification; Malware; Zero-day; Naïve Bayes; Stochastic gradient descent; Random forest; Decision tree; AdaBoost; XGBoost; Logistic regression; Multi-layer perceptron; K-nearest neighbour; Support vector machine; Regression analysis; Computer algorithms