Recognizing sentimental emotions in text by using Machine Learning

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.

Bibliographic details
Main authors: Bushra, Tabassum Khan; Saha, Kallol; Mulki, Ammin Hossain; Khan, Sanjana Sabah; Binta Amzad, Afrin
Other authors: Mostakim, Moin
Format: Thesis
Language: English
Published: Brac University 2023
Subjects: BERT; Bag of Words; TF-IDF; Naive Bayes; LSTM; Machine learning
Online access: http://hdl.handle.net/10361/18070
id 10361-18070
record_format dspace
spelling 10361-18070 2023-04-03T21:01:51Z
Department: Department of Computer Science and Engineering, Brac University
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 67-68).
Degree: B. Computer Science
Dates: 2023-04-03T08:00:38Z; 2023-04-03T08:00:38Z; 2022; 2022-10
ID: 18101163; 18101461; 18101468; 18101502; 19301267
Language: en
Rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
Extent: 68 pages, application/pdf
Abstract: Sentiment analysis is one of the fastest-growing and most prominent applications of deep learning today. It can reveal an individual's emotions by analyzing their speech, text, facial expressions, gestures, and so on. The technology is widely used to understand how individuals feel or react in particular circumstances, and the information obtained from such analyses is then processed to characterize the subject's sentimental reactions, which can be put to use in a multitude of ways. While the technology is constantly being improved, opportunities remain to make it more efficient.
This research applies a variety of machine learning algorithms and language models to sentiment detection in textual data, and examines how each of them approaches the problem. Five models are used, in three groups: simple models featuring TF-IDF and Bag of Words; a mid-complexity model, Naive Bayes; and advanced, context-aware state-of-the-art models, namely LSTM and BERT. The datasets are the Spotify App Reviews dataset and the 100K Coursera Course Reviews dataset. We used 10,000 samples from these datasets.
After running the models, the research aims to determine which of them works best on which dataset, whether there are similarity patterns between them, and whether any of them give poor results. The findings are presented descriptively, quantitatively, and graphically. For 5-label sentiment classification, Multinomial Naive Bayes gave the highest accuracy on the Coursera Course Reviews dataset (74.81%) and LSTM gave the highest accuracy on the Spotify App Reviews dataset (62.7%). For 3-label classification, pretrained BERT gave the highest accuracy on the Coursera dataset (91.2%) and LSTM gave the highest accuracy on the Spotify dataset (78.3%). However, because our datasets are highly imbalanced, accuracy is a poor metric for evaluating the algorithms, so we looked at the F1 scores instead. We also addressed the imbalance in our datasets using different bias-handling techniques, such as random oversampling of the minority classes. After carefully examining the per-class F1 scores of every algorithm under both labeling schemes, we concluded that LSTM and BERT performed best on both datasets.
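The abstract's point that accuracy misleads on imbalanced data, and that random oversampling is one way to address the imbalance, can be illustrated with a small standard-library sketch. This is not the thesis's code or data: the function names (`accuracy`, `per_class_f1`, `random_oversample`) and the toy labels are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 score for each label, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy, invented data: 9 positive reviews and 1 negative review.
y_true = ["positive"] * 9 + ["negative"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["positive"] * 10

acc = accuracy(y_true, y_pred)      # high, despite missing every negative
f1 = per_class_f1(y_true, y_pred)   # the negative class scores 0.0

texts = [f"review {i}" for i in range(10)]
bal_texts, bal_labels = random_oversample(texts, y_true)
balanced_counts = Counter(bal_labels)

print(acc, f1, balanced_counts)
```

In this toy case the majority-class predictor reaches 90% accuracy while its F1 score for the minority class is zero, which is why per-class F1 is the more informative metric here. In practice, oversampling is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.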
institution Brac University
collection Institutional Repository
language English
topic BERT
Bag of Words
TF-IDF
Naive Bayes
LSTM
Machine learning
author2 Mostakim, Moin
format Thesis
author Bushra, Tabassum Khan
Saha, Kallol
Mulki, Ammin Hossain
Khan, Sanjana Sabah
Binta Amzad, Afrin
title Recognizing sentimental emotions in text by using Machine Learning
publisher Brac University
publishDate 2023
url http://hdl.handle.net/10361/18070