Finite state recognizer and string similarity based spelling checker for Bangla

This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2007.

Bibliographic details
Main author: Asadullah, Munshi
Other authors: Khan, Mumit
Format: Thesis
Language: English
Published: BRAC University, 2010
Subjects: Computer science and engineering
Online access: http://hdl.handle.net/10361/416
Notes: Department of Computer Science and Engineering, BRAC University. Cataloged from PDF version of thesis report. Includes bibliographical references (page 44).

Abstract: A crucial figure of merit for a spelling checker is not just whether it can detect misspelled words, but also how it ranks the suggestions for each word. Spelling checker algorithms based on edit distance tend to produce a large number of candidates for a misspelled word. We propose an alternative approach to checking the spelling of Bangla text that uses a finite state automaton (FSA) to probabilistically create the suggestion list for a misspelled word. FSAs have proven effective for problems that require probabilistic solutions and high error tolerance. We start by building a finite state representation of all the words in the Bangla dictionary. The algorithm then uses the state tables to test a string; for an erroneous string, it tries to find all possible corrections by attempting single-step and multi-step transitions that consume one or more characters, using the subsequent characters as look-ahead. Finally, backtracking adds each possible correction to the suggestion list. Because words are stored in finite state form, the algorithm is much more efficient for non-inflected forms; this matters especially for nouns, since Bangla nouns are heavily used in their non-inflected form. For error detection and correction, the algorithm uses statistics of Bangla error patterns and therefore produces a small number of significant suggestions.
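The pipeline in the abstract (a finite state representation of the dictionary, a recognizer that tests each string, and a backtracking search that collects corrections) can be illustrated with a small sketch. It is not the thesis's implementation: it assumes a trie-shaped, unminimized DFA, charges a uniform cost of one per insertion, deletion, or substitution, and omits both the multi-step look-ahead transitions and the ranking by Bangla error statistics. The class `FSA`, its methods `add_word`, `accepts`, and `suggest`, the `max_errors` parameter, and the transliterated sample words are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FSA:
    """Trie-shaped (unminimized) deterministic automaton over a toy word list."""

    # transitions[s] maps an input symbol to the next state; state 0 is the start state
    transitions: list[dict[str, int]] = field(default_factory=lambda: [{}])
    # states in which a complete dictionary word ends
    accepting: set[int] = field(default_factory=set)

    def add_word(self, word: str) -> None:
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:  # no transition yet: create a fresh state
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.accepting.add(state)

    def accepts(self, word: str) -> bool:
        """Recognize a string by following one transition per symbol."""
        state = 0
        for ch in word:
            if ch not in self.transitions[state]:
                return False  # dead end: the string is not in the dictionary
            state = self.transitions[state][ch]
        return state in self.accepting

    def suggest(self, word: str, max_errors: int = 1) -> list[str]:
        """Return dictionary words reachable with at most max_errors insertions,
        deletions, or substitutions, found by depth-first search with backtracking."""
        best: dict[str, int] = {}  # candidate word -> fewest errors needed to reach it
        path: list[str] = []       # symbols along the current path from the start state

        def walk(state: int, i: int, errors: int) -> None:
            if errors > max_errors:
                return  # prune: error budget exhausted
            if i == len(word) and state in self.accepting:
                cand = "".join(path)
                best[cand] = min(errors, best.get(cand, errors))
            if i < len(word):
                # spurious symbol in the input: skip it without moving
                walk(state, i + 1, errors + 1)
            for ch, nxt in self.transitions[state].items():
                path.append(ch)
                if i < len(word):
                    # match (no cost) or substitution (cost 1): consume one input symbol
                    walk(nxt, i + 1, errors + (ch != word[i]))
                # symbol missing from the input: follow the transition without consuming
                walk(nxt, i, errors + 1)
                path.pop()  # backtrack before trying the next transition

        walk(0, 0, 0)
        # fewest errors first, then alphabetical
        return sorted(best, key=lambda w: (best[w], w))


if __name__ == "__main__":
    fsa = FSA()
    for w in ["ami", "amar", "tumi", "bhalo", "boi"]:  # transliterated toy dictionary
        fsa.add_word(w)
    print(fsa.accepts("amr"))  # False
    print(fsa.suggest("amr"))  # ['amar', 'ami']
```

The error budget `max_errors` prunes the search: once a path has used it up, only exact matches can extend it, so most of the automaton is never visited. A ranking step based on error statistics, as the abstract describes, would replace the simple (errors, alphabetical) sort used here.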

One notable limitation is that a transposition error cannot be handled as a single-edit error. This is less significant than it may seem, because transposition errors are less common in Bangla than other kinds of errors. This thesis presents the structure and the algorithm of a practical Bangla spelling checker and discusses the results obtained from the prototype implementation.

Extent: 44 pages (PDF)
Added to repository: 2010-10-10
Rights: BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
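The transposition limitation can be made concrete with edit distance. When only insertion, deletion, and substitution count as single operations (Levenshtein distance), swapping two adjacent characters costs two; the optimal string alignment variant of Damerau-Levenshtein distance adds the adjacent swap as a single operation. The sketch below compares the two using a hypothetical `edit_distance` function and a transliterated sample word; it illustrates the general point and is not taken from the thesis.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Levenshtein distance between a and b. With transpositions=True, compute the
    optimal string alignment (restricted Damerau-Levenshtein) distance instead,
    in which swapping two adjacent symbols counts as one edit."""
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


if __name__ == "__main__":
    # "bhaol" is "bhalo" with its last two letters swapped
    print(edit_distance("bhalo", "bhaol"))                       # 2
    print(edit_distance("bhalo", "bhaol", transpositions=True))  # 1
```

Under the plain metric, a single swapped pair looks like two independent substitutions, so a corrector that allows only one error per word may fail to propose the intended word.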
Institution: BRAC University
Collection: Institutional Repository