Quality assessment of extracted information from newspaper comment sections using natural language processing

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.

Bibliographic details
Main Authors: Deb, Arnob; Islam, Maidul; Hossain, Sadab Sifar; Alam, Farjana
Other Authors: Sadeque, Farig Yousuf
Format: Thesis
Language: English
Published: Brac University 2024
Subjects:
Online Access: http://hdl.handle.net/10361/22782
id 10361-22782
record_format dspace
spelling 10361-22782
2024-05-09T21:03:17Z
Quality assessment of extracted information from newspaper comment sections using natural language processing
Deb, Arnob
Islam, Maidul
Hossain, Sadab Sifar
Alam, Farjana
Sadeque, Farig Yousuf
Department of Computer Science and Engineering, Brac University
Natural language processing
Information extraction
S-BERT
RoBERTa
Similarity
Natural language processing (Computer science)
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 39-40).
Newspaper comment sections, where readers can leave their opinions, can be an excellent source of supplementary information if used properly. Although there is a risk of fake news and misinformation being spread through the comment section, quality information can also be extracted from these comments to supplement the original news. According to recent research, a comment can range from irrelevant to informative. In this thesis, we aim to identify informative news comments that can then be used to supplement the original news article. We also identify the level of informativeness of a newspaper comment to determine whether the task of assigning the Editor's Pick flag (which is currently done by hand at every large news outlet) can be performed with the help of state-of-the-art natural language processing and information extraction techniques. We evaluated the similarity between comments and their respective news articles using transformer models such as Sentence-BERT. Furthermore, we checked textual entailment for each comment using different models, from a simple RNN and LSTM to more advanced ones such as RoBERTa and larger models such as ELECTRA. The final model for the textual entailment task (RoBERTa) outperformed all the other models with an accuracy of 88.60%, and the final model for the textual similarity task (SBERT) outperformed all the other similarity models with an accuracy of 68.49%.
Arnob Deb
Maidul Islam
Sadab Sifar Hossain
Farjana Alam
B.Sc. in Computer Science
2024-05-09T03:23:08Z
2024-05-09T03:23:08Z
©2024
2024-01
Thesis
ID: 23241076
ID: 20101309
ID: 23341064
ID: 20101022
http://hdl.handle.net/10361/22782
en
Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
52 pages
application/pdf
Brac University
institution Brac University
collection Institutional Repository
language English
topic Natural language processing
Information extraction
S-BERT
RoBERTa
Similarity
Natural language processing (Computer science)
description This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.
author2 Sadeque, Farig Yousuf
author_facet Sadeque, Farig Yousuf
Deb, Arnob
Islam, Maidul
Hossain, Sadab Sifar
Alam, Farjana
format Thesis
author Deb, Arnob
Islam, Maidul
Hossain, Sadab Sifar
Alam, Farjana
author_sort Deb, Arnob
title Quality assessment of extracted information from newspaper comment sections using natural language processing
publisher Brac University
publishDate 2024
url http://hdl.handle.net/10361/22782