Rule based segmentation of lower modifiers in complex Bangla scripts

Includes bibliographical references (page 5).

Detailed Bibliography
Main Authors: Hasnat, Md. Abul; Khan, Mumit
Other Authors: Center for Research on Bangla Language Processing (CRBLP), BRAC University
Material Type: Article
Language: English
Publication Information: BRAC University, 2010
Online Access: http://hdl.handle.net/10361/338
Extent: 8 pages (PDF)
Date: 2009 (deposited in the repository on 2010-10-05)
Abstract: Segmentation is the most challenging part of Bangla optical character recognition (OCR). Several algorithms with varying degrees of accuracy have been proposed in the literature to solve the problem of joining errors. However, the selection of the units that contain lower modifiers, and the subsequent extraction of those modifiers from the core unit during segmentation, have not been studied extensively. We present a dissection-based method that segments lower modifiers across a wide range of document images. A key goal of the method is to avoid over-segmenting units that do not actually contain a lower modifier, because such over-segmentation leads to unacceptably high error rates. The method consists of four tasks:
1. Identify the lower modifier separator line using character height information, then select the primary lower modifier containers.
2. Filter this set to eliminate the units/characters that do not actually contain a lower modifier.
3. Extract the lower modifier unit using features of the core units and of the lower modifiers.
4. Apply a set of empirical rules, aided by dictionary lookups, to eliminate most of the remaining errors.
The method achieves an accuracy of 99.6%.
Collection: Institutional Repository, BRAC University
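The first three tasks of the pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Unit` bitmap representation, the rule placing the separator line at the median unit top plus the median unit height, and the ink-count filter (`min_ink`) are all hypothetical assumptions chosen for illustration. Task 4 (empirical rules with dictionary lookups) is omitted because the abstract does not specify those rules.

```python
"""Hypothetical sketch of lower-modifier dissection (not the authors' method)."""
from dataclasses import dataclass
from statistics import median


@dataclass
class Unit:
    """Hypothetical segmented unit: a binary bitmap whose first row sits at `top`."""
    top: int
    rows: list[list[int]]  # 1 = ink, 0 = background

    @property
    def height(self) -> int:
        return len(self.rows)

    @property
    def bottom(self) -> int:
        # Exclusive bottom row, in text-line coordinates.
        return self.top + self.height


def separator_row(units: list[Unit]) -> int:
    # Task 1 (assumed rule): most units carry no lower modifier, so the median
    # top plus the median height approximates where core characters end.
    return int(median(u.top for u in units) + median(u.height for u in units))


def candidate_containers(units: list[Unit], sep: int, min_ink: int = 2) -> list[Unit]:
    # Task 1: units that extend below the separator are primary candidates.
    # Task 2 (hypothetical filter): keep only candidates with at least `min_ink`
    # ink pixels below the separator, discarding short strokes and noise.
    kept = []
    for u in units:
        if u.bottom <= sep:
            continue
        below = u.rows[max(0, sep - u.top):]
        if sum(map(sum, below)) >= min_ink:
            kept.append(u)
    return kept


def dissect(unit: Unit, sep: int) -> tuple[list[list[int]], list[list[int]]]:
    # Task 3: cut the bitmap at the separator row into core and lower parts.
    cut = max(0, sep - unit.top)
    return unit.rows[:cut], unit.rows[cut:]


if __name__ == "__main__":
    units = [
        Unit(0, [[1, 1] for _ in range(4)]),                         # core glyph, 4 rows
        Unit(0, [[1, 1] for _ in range(4)]),                         # core glyph, 4 rows
        Unit(0, [[1, 1] for _ in range(4)] + [[0, 1], [1, 1]]),      # extends 2 rows lower
    ]
    sep = separator_row(units)
    print(sep)  # 4
    for unit in candidate_containers(units, sep):
        core, lower = dissect(unit, sep)
        print(len(core), lower)  # 4 [[0, 1], [1, 1]]
```

Using a median rather than a maximum is a deliberate choice in this sketch: because most units in a line are assumed to have no lower modifier, the median is not pulled down by the few units that do, which is in the spirit of the abstract's goal of avoiding over-segmentation.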