Text classification with an efficient preprocessing technique for cross-language and multilingual data

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.

Bibliographic Details
Main authors:	Khan, Towhid; Mallick, David Dew; Khan, Md.Shakiful Islam; Hasan, Md Mahadi
Other authors:	Ashraf, Faisal Bin
Type:	Thesis
Language:	English
Published:	Brac University, 2023
Subjects:	Random forest; Logistic regression; TF-IDF; SVM; XGB; mLSTM; LSTM; Information retrieval; Sentiment analysis; NLP; Natural language processing (Computer science); Computational linguistics > Congresses
Online access:	http://hdl.handle.net/10361/21865

id	10361-21865
record_format	dspace
timestamp	2023-10-17T21:04:23Z
department	Department of Computer Science and Engineering, Brac University
notes	Cataloged from PDF version of thesis. Includes bibliographical references (pages 43-44).
abstract	Text preprocessing is the procedure of removing extraneous textual elements and preparing, or processing, the values that are fed into a classifier model. Several preprocessing methods exist, but not all of them are effective on cross-language and multilingual datasets, and running such a dataset through a single preprocessing method and text classification model is challenging. This raises the question of whether some technique could classify data from multilingual and cross-lingual datasets better. To speed up the search for higher accuracy, we tested various combinations of data preprocessing and text classification models on Bangla, English, and cross-lingual datasets (a native language written in English letters). Our experiments suggest that mLSTM performed effectively on the Bangla and English datasets. Thus, mLSTM can be a helpful preprocessing method for datasets containing a variety of languages.
degree	B.Sc. in Computer Science
dates	©2022; 2022-09-28; added to repository 2023-10-17T08:43:07Z
ids	ID 18201035; ID 18201045; ID 18201198; ID 18201062
rights	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
extent	54 pages (application/pdf)
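The abstract describes text preprocessing only in general terms, and this record does not say which cleaning steps the thesis actually used. As an illustration only, here is a minimal sketch in Python, assuming a generic pipeline (URL removal, Unicode-aware removal of punctuation and digits, lowercasing, stopword filtering) followed by TF-IDF weighting, one of the techniques listed among the subjects. The stopword set and the helper names `preprocess`, `_clean_char` and `tf_idf` are hypothetical; the smoothed IDF, ln((1 + n) / (1 + df)) + 1, is one common variant (the default in scikit-learn's `TfidfVectorizer`), not necessarily the one used in the thesis.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical stopword set for illustration only; a real pipeline would load
# curated lists for English, Bangla and romanized Bangla.
STOPWORDS = {"the", "is", "a", "and", "of", "এবং", "ও"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")


def _clean_char(ch: str) -> str:
    """Map one character to what survives preprocessing."""
    category = unicodedata.category(ch)
    if category[0] in "LM":  # letters and combining marks (Bangla vowel signs are marks)
        return ch
    if category == "Cf":  # zero-width joiners and other format characters are dropped
        return ""
    return " "  # punctuation, digits, symbols and whitespace become separators


def preprocess(text: str) -> list[str]:
    """Strip extraneous elements from mixed-script text and return tokens."""
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub(" ", text)
    cleaned = "".join(_clean_char(ch) for ch in text).lower()
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight tokens by term frequency times smoothed inverse document frequency."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append({
            tok: (count / length) * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    docs = [
        "Movie ta khub bhalo chilo!",               # romanized Bangla (cross-lingual)
        "The movie is great: https://example.com",  # English
        "সিনেমাটা খুব ভালো ছিল।",                   # Bangla script
    ]
    tokens = [preprocess(d) for d in docs]
    for toks, vec in zip(tokens, tf_idf(tokens)):
        print(toks, {t: round(w, 3) for t, w in vec.items()})
```

Filtering by Unicode general category rather than by an ASCII character class is what lets a single function handle English, Bangla script, and romanized Bangla: Bangla vowel signs are combining marks, so a filter that kept only letters would split Bangla words apart.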
institution	Brac University
collection	Institutional Repository
language	English
topic	Random forest; Logistic regression; TF-IDF; SVM; XGB; mLSTM; LSTM; Information retrieval; Sentiment analysis; NLP; Natural language processing (Computer science); Computational linguistics--Congresses
author2	Ashraf, Faisal Bin
format	Thesis
author	Khan, Towhid; Mallick, David Dew; Khan, Md.Shakiful Islam; Hasan, Md Mahadi
publisher	Brac University
publishDate	2023
url	http://hdl.handle.net/10361/21865