Enhanced hate speech detection in social media using transformer-based models

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.

Bibliographic Details
Main Authors: Tabasshum, Anika; Ashrafi, Fairuz Tasnim; Afreen, Sadia
Other Authors: Alam, Md Golam Rabiul
Format: Thesis
Language: English
Published: Brac University 2024
Subjects:
Online Access: http://hdl.handle.net/10361/22841
id 10361-22841
record_format dspace
spelling 10361-22841 2024-10-21T05:54:34Z
Enhanced hate speech detection in social media using transformer-based models
Tabasshum, Anika; Ashrafi, Fairuz Tasnim; Afreen, Sadia; Alam, Md Golam Rabiul
Offensive language; Neural network; Machine learning; Social media; CNN; Comment classification
Neural networks (Computer science). Online social networks--Security measures. Social media. Natural language processing (Computer science).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024. Cataloged from PDF version of thesis. Includes bibliographical references (pages 61-66).
Hate speech on social media can escalate into “cyber conflict,” detrimentally impacting social life. With the exponential growth of Internet users and media content, identifying abusive language in audio and video content has become increasingly challenging. The nuances of human communication mean that individuals might employ seemingly non-hateful language in derogatory ways, often accompanied by specific voice tones and gestures that are not captured when multimedia is converted into text.
This research investigates hate speech detection, aiming to automatically identify harmful content across various social media platforms. Initially focused on text, our study used remote supervision to create automatically labeled datasets and employed word embeddings with a bias toward hate. We analyzed datasets from Twitter, testing various machine-learning models to gauge the representation of hate speech and abusive language. Any tweet or online post exhibiting racist or sexist sentiments was categorized as “hate speech.” Our objective was to classify such messages systematically for better content moderation.
As the research advanced, we extended our detection capabilities to audio content. By leveraging simple feed-forward neural networks, RNNs, and CNNs, we can now discern hate speech patterns in audio with enhanced accuracy. However, the vastness of content on social media platforms means that networks cannot manually moderate every piece of user content. This underscores the need for automated hate speech detection, especially for content written in linguistically challenging languages.
Our study provides a transformer-based methodology for detecting hate speech in social media. The proposed model uses Natural Language Processing (NLP) approaches to assess text and audio input. To increase the accuracy of hate speech identification, we use deep learning architectures such as attention mechanisms and transformers. Our model is trained on a large dataset of tweets and audio recordings, and its performance is measured using a variety of evaluation criteria. According to the results, our transformer-based approach outperforms existing state-of-the-art hate speech identification methods. Our study contributes to the field of computer science and engineering by addressing the critical issue of hate speech on social media and proposing an effective solution based on modern machine learning techniques.
Anika Tabasshum; Fairuz Tasnim Ashrafi; Sadia Afreen
B.Sc. in Computer Science
2024-05-15T08:26:26Z 2024-05-15T08:26:26Z ©2024 2024-01 Thesis
ID: 19201106 ID: 19201035 ID: 19201105
http://hdl.handle.net/10361/22841
en
Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
76 pages application/pdf Brac University
institution Brac University
collection Institutional Repository
language English
topic Offensive language
Neural network
Machine learning
Social media
CNN
Comment classification
Neural networks (Computer science).
Online social networks--Security measures.
Social media.
Natural language processing (Computer science).
description This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.
author2 Alam, Md Golam Rabiul
author_facet Alam, Md Golam Rabiul
Tabasshum, Anika
Ashrafi, Fairuz Tasnim
Afreen, Sadia
format Thesis
author Tabasshum, Anika
Ashrafi, Fairuz Tasnim
Afreen, Sadia
author_sort Tabasshum, Anika
title Enhanced hate speech detection in social media using transformer-based models
publisher Brac University
publishDate 2024
url http://hdl.handle.net/10361/22841