Towards Santali linguistic inclusion: building the first Santali-to-English translation model using mT5 transformer and data augmentation
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2023.
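Main Authors: Billah, Syed Mohammed Mostaque; Subarna, Ateya Ahmed; Sarna, Sudipta Nandi; Wasit, Ahmad Shawkat; Shawkat, Ahmad
Other Authors: Sadeque, Farig Yousuf
Format: Thesis
Language: English
Published: Brac University, 2024
Subjects: Parallel corpus; Machine translation; Neural Machine Translation; Low-resource language; Aligner; Computer linguistics
Online Access: http://hdl.handle.net/10361/23605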
Abstract: Around seven million people in India, Bangladesh, Bhutan, and Nepal speak Santali, making it roughly the third most widely spoken Austroasiatic language. Despite its prominence within the Munda subfamily of the Austroasiatic language family, Santali lacks global recognition. Currently, no translation models exist for the Santali language. This paper aims to bring Santali into the NLP spectrum by examining the feasibility of building Santali-English translation models from the available Santali corpora. The study addresses the low-resource problem and, with promising results, explores the viability of machine translation for Santali. We believe this study will open the door to further exploration of Santali-English machine translation.
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 38-39).
Department: Department of Computer Science and Engineering, Brac University
Degree: B.Sc. in Computer Science
Author statement: Syed Mohammed Mostaque Billah; Ateya Ahmed Subarna; Sudipta Nandi Sarna; Ahmad Shawkat Wasit; Anika Fariha Chowdhury
IDs: 20101057; 23341089; 20101257; 20101398; 20101042
Date issued: 2023-09 (©2023)
Date added to repository: 2024-06-26
Rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
Extent: 45 pages (application/pdf)
Collection: Institutional Repository, Brac University