Text classification with an efficient preprocessing technique for cross-language and multilingual data
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.
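Main Authors: | Khan, Towhid; Mallick, David Dew; Khan, Md.Shakiful Islam; Hasan, Md Mahadi |
---|---|
Other Authors: | Ashraf, Faisal Bin |
Format: | Thesis |
Language: | English |
Published: | Brac University 2023 |
Subjects: | Random forest; Logistic regression; TF-IDF; SVM; XGB; mLSTM; LSTM; Information retrieval; Sentiment analysis; NLP; Natural language processing (Computer science); Computational linguistics--Congresses |
Online Access: | http://hdl.handle.net/10361/21865 |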
id |
10361-21865 |
---|---|
record_format |
dspace |
spelling |
10361-21865 2023-10-17T21:04:23Z Text classification with an efficient preprocessing technique for cross-language and multilingual data Khan, Towhid Mallick, David Dew Khan, Md.Shakiful Islam Hasan, Md Mahadi Ashraf, Faisal Bin Department of Computer Science and Engineering, Brac University Random forest Logistic regression TF-IDF SVM XGB mLSTM LSTM Information retrieval Sentiment analysis NLP Natural language processing (Computer science) Computational linguistics--Congresses This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022. Cataloged from PDF version of thesis. Includes bibliographical references (pages 43-44). Text preprocessing is the procedure of removing extraneous textual elements and preparing the values to be fed into the classifier model. There are several preprocessing methods; however, not all of them are effective when used with cross-language and multilingual datasets. Running a cross-lingual or multilingual dataset through a single preprocessing method and text classification model is rather challenging. What if a technique could be used to better classify data from multilingual and cross-lingual datasets? To accelerate the process of improving accuracy, we tested various combinations of data preprocessing with text classification models on datasets in Bangla, English, and cross-lingual text (native language written in English letters). We may infer from our experiment that mLSTM functioned effectively for datasets in Bangla and English. Thus, mLSTM can be a helpful preprocessing method for datasets containing a variety of languages. Towhid Khan David Dew Mallick Md.Shakiful Islam Khan Md Mahadi Hasan B.Sc. in Computer Science 2023-10-17T08:43:07Z 2023-10-17T08:43:07Z ©2022 2022-09-28 Thesis ID 18201035 ID 18201045 ID 18201198 ID 18201062 http://hdl.handle.net/10361/21865 en Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. 54 pages application/pdf Brac University |
institution |
Brac University |
collection |
Institutional Repository |
language |
English |
topic |
Random forest Logistic regression TF-IDF SVM XGB mLSTM LSTM Information retrieval Sentiment analysis NLP Natural language processing (Computer science) Computational linguistics--Congresses |
spellingShingle |
Random forest Logistic regression TF-IDF SVM XGB mLSTM LSTM Information retrieval Sentiment analysis NLP Natural language processing (Computer science) Computational linguistics--Congresses Khan, Towhid Mallick, David Dew Khan, Md.Shakiful Islam Hasan, Md Mahadi Text classification with an efficient preprocessing technique for cross-language and multilingual data |
description |
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022. |
author2 |
Ashraf, Faisal Bin |
author_facet |
Ashraf, Faisal Bin Khan, Towhid Mallick, David Dew Khan, Md.Shakiful Islam Hasan, Md Mahadi |
format |
Thesis |
author |
Khan, Towhid Mallick, David Dew Khan, Md.Shakiful Islam Hasan, Md Mahadi |
author_sort |
Khan, Towhid |
title |
Text classification with an efficient preprocessing technique for cross-language and multilingual data |
title_short |
Text classification with an efficient preprocessing technique for cross-language and multilingual data |
title_full |
Text classification with an efficient preprocessing technique for cross-language and multilingual data |
title_fullStr |
Text classification with an efficient preprocessing technique for cross-language and multilingual data |
title_full_unstemmed |
Text classification with an efficient preprocessing technique for cross-language and multilingual data |
title_sort |
text classification with an efficient preprocessing technique for cross-language and multilingual data |
publisher |
Brac University |
publishDate |
2023 |
url |
http://hdl.handle.net/10361/21865 |
work_keys_str_mv |
AT khantowhid textclassificationwithanefficientpreprocessingtechniqueforcrosslanguageandmultilingualdata AT mallickdaviddew textclassificationwithanefficientpreprocessingtechniqueforcrosslanguageandmultilingualdata AT khanmdshakifulislam textclassificationwithanefficientpreprocessingtechniqueforcrosslanguageandmultilingualdata AT hasanmdmahadi textclassificationwithanefficientpreprocessingtechniqueforcrosslanguageandmultilingualdata |
_version_ |
1814307890109874176 |