Survey of Afghan (Dari) Language NLP for Building Afghan NLIDB System
This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, 2022.
Main Author: | Karimi, Sadullah |
---|---|
Other Authors: | Rasel, Annajiat Alim |
Format: | Thesis |
Language: | en_US |
Published: | Brac University, 2022 |
Subjects: | |
Online Access: | http://hdl.handle.net/10361/17532 |
id |
10361-17532 |
---|---|
record_format |
dspace |
spelling |
10361-17532 2022-10-24T21:01:39Z Survey of Afghan (Dari) Language NLP for Building Afghan NLIDB System Karimi, Sadullah Rasel, Annajiat Alim Department of Computer Science and Engineering, Brac University Natural Language Querying; Translating From Afghan to English; Lexical analysis; Syntax analysis; Semantic analysis; Query Generation; Python Library; Data dictionary; Natural language interface to database; NLIDB; Non-English NLIDB; Natural language interface; NLI; Natural language user interface; NLUI; Afghan NLP survey; Dari; Dari language. This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, 2022. Cataloged from PDF version of thesis. Includes bibliographical references (pages 167-194).
Technology adoption is extremely limited in Afghanistan, largely because power limitations and the high cost of Internet service restrict access to the Internet, smartphones, and computers. Internet service is provided by the private sector at high cost and with very low speed and quality. Natural Language Processing (NLP) has various applications and improves access to information and systems. To advance as a country, Afghanistan needs to be able to use existing databases and datasets, and to create and maintain new ones. Initially, people need a system that lets them access databases providing various guidance with the limited resources available to them. Later, they would benefit from higher-level access for maintenance and crowdsourced contributions. This work first focuses on building a system through which people in Afghanistan can access databases in their native language. The Afghan (Dari) language is one of the widely used languages, with up to 110 million speakers worldwide. It is used in countries such as Afghanistan, Azerbaijan, Iran, Iraq, Russia, Tajikistan, Turkmenistan, and Uzbekistan.
The Afghan language lacks resources and requires better-qualified lexicon translation. The proposed Afghan Natural Language Interface to Database (NLIDB) is based on a natural language query-response model, in which the Afghan language is used to extract the desired data from a database. Retrieving data from a database requires either knowledge of SQL or a very well-designed user interface. Domain experts can retrieve data from databases easily. For non-expert users, however, accessing a database through SQL queries is quite challenging in the absence of a proper, friendly user interface. This work addresses that challenge for speakers of the Afghan language worldwide who need access to different databases and datasets. First, we surveyed the current state of Afghan NLP to find research gaps for future researchers of the Afghan language, and identified NLIDB systems as one such gap. Second, we surveyed non-English NLIDB systems and conducted a systematic review of the current methods used in them. We then propose an NLIDB system for the Afghan language. Through our system, users in Afghanistan can access the database from feature phones and landline phone calls, by way of an open-source Interactive Voice Response (IVR) system, in addition to smartphones and computers. Users can reach the system without high-speed Internet, reliable power, a computer, or a smartphone. The system is designed around the limited technology situation in Afghanistan. The Afghan spoken NLIDB is built on lexical analysis, semantic analysis, and syntax analysis, which process an Afghan-language natural language query and transform it into Structured Query Language (SQL).
Sadullah Karimi M. Computer Science and Engineering 2022-10-24T10:37:55Z 2022-10-24T10:37:55Z 2022 2022-09 Thesis ID: 21166041 http://hdl.handle.net/10361/17532 en_US Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. 194 Pages application/pdf Brac University |
institution |
Brac University |
collection |
Institutional Repository |
language |
en_US |
topic |
Natural Language Querying; Translating From Afghan to English; Lexical analysis; Syntax analysis; Semantic analysis; Query Generation; Python Library; Data dictionary; Natural language interface to database; NLIDB; Non-English NLIDB; Natural language interface; NLI; Natural language user interface; NLUI; Afghan NLP survey; Dari; Dari language. |
spellingShingle |
Natural Language Querying; Translating From Afghan to English; Lexical analysis; Syntax analysis; Semantic analysis; Query Generation; Python Library; Data dictionary; Natural language interface to database; NLIDB; Non-English NLIDB; Natural language interface; NLI; Natural language user interface; NLUI; Afghan NLP survey; Dari; Dari language. Karimi, Sadullah. Survey of Afghan (Dari) Language NLP for Building Afghan NLIDB System |
description |
This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, 2022. |
author2 |
Rasel, Annajiat Alim |
author_facet |
Rasel, Annajiat Alim Karimi, Sadullah |
format |
Thesis |
author |
Karimi, Sadullah |
author_sort |
Karimi, Sadullah |
title |
Survey of Afghan (Dari) Language NLP for Building Afghan NLIDB System |
title_sort |
survey of afghan (dari) language nlp for building afghan nlidb system |
publisher |
Brac University |
publishDate |
2022 |
url |
http://hdl.handle.net/10361/17532 |