Text classification with an efficient preprocessing technique for cross-language and multilingual data

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.

Bibliographic Details
Main Authors: Khan, Towhid; Mallick, David Dew; Khan, Md.Shakiful Islam; Hasan, Md Mahadi
Other Authors: Ashraf, Faisal Bin
Format: Thesis
Language: English
Published: Brac University 2023
Subjects:
Online Access: http://hdl.handle.net/10361/21865
id 10361-21865
record_format dspace
spelling 10361-21865 2023-10-17T21:04:23Z
Text classification with an efficient preprocessing technique for cross-language and multilingual data
Khan, Towhid; Mallick, David Dew; Khan, Md.Shakiful Islam; Hasan, Md Mahadi
Ashraf, Faisal Bin
Department of Computer Science and Engineering, Brac University
Random forest; Logistic regression; TF-IDF; SVM; XGB; mLSTM; LSTM; Information retrieval; Sentiment analysis; NLP; Natural language processing (Computer science); Computational linguistics--Congresses
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 43-44).
Text preprocessing usually refers to the procedure of removing extraneous textual elements and preparing the values to be fed into the classifier model. There are several preprocessing methods; however, not all of them are effective when used with cross-language and multilingual datasets. Running a cross-lingual or multilingual dataset through a single preprocessing method and text classification model is rather challenging. What if a technique could be used to better classify data from multilingual and cross-lingual datasets? To speed up the process of improving accuracy, we tested various combinations of data preprocessing with text classification models on datasets in Bangla, English, and cross-lingual text (native language written in English letters). We may infer from our experiment that mLSTM functioned effectively for datasets in Bangla and English. Thus, mLSTM can be a helpful preprocessing method for datasets containing a variety of languages.
Towhid Khan; David Dew Mallick; Md.Shakiful Islam Khan; Md Mahadi Hasan
B.Sc. in Computer Science
2023-10-17T08:43:07Z 2023-10-17T08:43:07Z ©2022 2022-09-28
Thesis
ID 18201035; ID 18201045; ID 18201198; ID 18201062
http://hdl.handle.net/10361/21865
en
Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
54 pages
application/pdf
Brac University
institution Brac University
collection Institutional Repository
language English
topic Random forest
Logistic regression
TF-IDF
SVM
XGB
mLSTM
LSTM
Information retrieval
Sentiment analysis
NLP
Natural language processing (Computer science)
Computational linguistics--Congresses
description This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.
author2 Ashraf, Faisal Bin
author_facet Ashraf, Faisal Bin
Khan, Towhid
Mallick, David Dew
Khan, Md.Shakiful Islam
Hasan, Md Mahadi
format Thesis
author Khan, Towhid
Mallick, David Dew
Khan, Md.Shakiful Islam
Hasan, Md Mahadi
author_sort Khan, Towhid
title Text classification with an efficient preprocessing technique for cross-language and multilingual data
publisher Brac University
publishDate 2023
url http://hdl.handle.net/10361/21865