Survey of Afghan (Dari) Language NLP for Building Afghan NLIDB System

This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, 2022.

Bibliographic Details
Main Author: Karimi, Sadullah
Other Authors: Rasel, Annajiat Alim
Format: Thesis
Language: en_US
Published: Brac University 2022
Subjects:
Online Access: http://hdl.handle.net/10361/17532
id 10361-17532
record_format dspace
spelling 10361-17532 2022-10-24T21:01:39Z Survey of Afghan (Dari) Language NLP for Building Afghan NLIDB System. Karimi, Sadullah. Rasel, Annajiat Alim. Department of Computer Science and Engineering, Brac University.
Cataloged from PDF version of thesis. Includes bibliographical references (pages 167-194).

Technology adoption is extremely limited in Afghanistan, especially because people have limited access to the Internet, smartphones, and computers due to power limitations and the high cost of Internet service. Internet access is provided by the private sector at high cost and with very low speed and quality. Natural Language Processing (NLP) has various applications and improves access to information and systems. To advance as a country, Afghanistan needs to be able to use existing databases and datasets and to create and maintain new ones. Initially, people need a system that lets them access databases offering various kinds of guidance using the limited resources available to them. Later, they would benefit from higher-level access for maintenance and crowdsourced contributions. This work first focuses on building a system through which people in Afghanistan can access databases in their native language. The Afghan (Dari) language is one of the widely used languages, with up to 110 million speakers worldwide. It is used in countries such as Afghanistan, Azerbaijan, Iran, Iraq, Russia, Tajikistan, Turkmenistan, and Uzbekistan.

The Afghan language lacks resources and requires more qualified lexicon translation. The proposed Afghan Natural Language Interface to Database (NLIDB) is based on a natural language query-response model in which the Afghan language is used to extract the desired data from a database. Retrieving data from a database requires either knowledge of SQL or a very well-designed user interface. Domain experts can retrieve data from databases easily, but non-expert users find it quite challenging to access a database with SQL queries in the absence of a proper, friendly user interface. This work addresses that challenge for Afghan-language speakers worldwide who need to access different databases and datasets.

First, we surveyed the current state of Afghan NLP to find research gaps for future researchers of the Afghan language, and identified NLIDB systems as one such gap. Second, we surveyed non-English NLIDB systems and conducted a systematic review of their current methods. We then propose an NLIDB system for the Afghan language. Through our system, users in Afghanistan can access the database by feature phone or landline call, based on an open-source Interactive Voice Response (IVR) system, in addition to smartphones and computers. Users can reach the system without high-speed Internet, stable power, a computer, or a smartphone. The system is built for the limited technology situation in Afghanistan. The Afghan Spoken NLIDB is built on lexical analysis, semantic analysis, and syntax analysis, which together process an Afghan-language natural language query and transform it into Structured Query Language (SQL).

Sadullah Karimi. M. Computer Science and Engineering. 2022-10-24T10:37:55Z 2022-10-24T10:37:55Z 2022 2022-09. Thesis ID: 21166041. http://hdl.handle.net/10361/17532 en_US. 194 Pages. application/pdf. Brac University.
Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
institution Brac University
collection Institutional Repository
language en_US
topic Natural Language Querying
Translating From Afghan to English
Lexical analysis
Syntax analysis
Semantic analysis
Query Generation
Python Library
Data dictionary
Natural language interface to database
NLIDB
Non-English NLIDB
Natural language interface
NLI
Natural language user interface
NLUI
Afghan NLP survey
Dari
Dari language.
author2 Rasel, Annajiat Alim
author_facet Rasel, Annajiat Alim
Karimi, Sadullah
format Thesis
author Karimi, Sadullah
author_sort Karimi, Sadullah
title Survey of Afghan (Dari) Language NLP for Building Afghan NLIDB System
publisher Brac University
publishDate 2022
url http://hdl.handle.net/10361/17532
work_keys_str_mv AT karimisadullah surveyofafghandarilanguagenlpforbuildingafghannlidbsystem
_version_ 1814307425562394624
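
The abstract describes a pipeline in which lexical, syntax, and semantic analysis turn an Afghan-language query into SQL, and the subject terms mention a data dictionary. The following is only a minimal sketch of that general idea, not the thesis's implementation: the dictionary entries, the table and column names (students, name, city), and the functions tokenize and to_sql are all hypothetical, and the single syntax rule (a column word directly followed by a value word becomes a filter) is an assumption made for illustration.

```python
import sqlite3

# Hypothetical data dictionary: Dari surface words mapped to schema roles.
# These entries and the schema they point at are illustrative assumptions.
DATA_DICTIONARY = {
    "نام": ("column", "name"),
    "شهر": ("column", "city"),
    "محصلان": ("table", "students"),
    "کابل": ("value", "Kabul"),
}


def tokenize(query):
    """Lexical analysis: drop question marks and split on whitespace."""
    return query.replace("؟", " ").replace("?", " ").split()


def to_sql(query):
    """Map known tokens to schema roles and assemble a parameterized SELECT."""
    # Semantic step: keep only tokens found in the data dictionary.
    tagged = [DATA_DICTIONARY[t] for t in tokenize(query) if t in DATA_DICTIONARY]

    table = None
    select_cols = []
    filters = []
    i = 0
    # Syntax step (assumed rule): column followed by value is a WHERE filter,
    # any other column is a selected column.
    while i < len(tagged):
        kind, name = tagged[i]
        if kind == "table":
            table = name
        elif kind == "column":
            nxt = tagged[i + 1] if i + 1 < len(tagged) else None
            if nxt is not None and nxt[0] == "value":
                filters.append((name, nxt[1]))
                i += 1  # the value token has been consumed
            else:
                select_cols.append(name)
        i += 1

    if table is None:
        raise ValueError("no table recognized in query")

    sql = f"SELECT {', '.join(select_cols) or '*'} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in filters)
    params = [value for _, value in filters]
    return sql, params


if __name__ == "__main__":
    # Run the generated query against a small in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO students VALUES (?, ?)",
        [("Ahmad", "Kabul"), ("Zahra", "Herat")],
    )
    sql, params = to_sql("نام محصلان شهر کابل")
    print(sql, params)
    print(conn.execute(sql, params).fetchall())
```

Values are passed as bound parameters rather than pasted into the SQL string, so a spoken or typed value cannot change the structure of the query. A real system would need morphological handling, stop-word treatment, and a much richer grammar than this one rule.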