Rule based segmentation of lower modifiers in complex Bangla scripts
Includes bibliographical references (page 5).
Main Authors: | Hasnat, Md. Abul; Khan, Mumit |
---|---|
Other Authors: | Center for Research on Bangla Language Processing (CRBLP), BRAC University |
Format: | Article |
Language: | English |
Published: | BRAC University, 2010 |
Online Access: | http://hdl.handle.net/10361/338 |
id | 10361-338 |
---|---|
record_format | dspace |
spelling | 10361-338; 2019-09-29T05:27:25Z; Rule based segmentation of lower modifiers in complex Bangla scripts; Hasnat, Md. Abul; Khan, Mumit; Center for Research on Bangla Language Processing (CRBLP), BRAC University; Includes bibliographical references (page 5). Abstract: Segmentation is the most challenging part of Bangla optical character recognition (OCR). To solve the problems of joining errors, several algorithms have been proposed in the literature, with varying degrees of accuracy. The selection of the lower modifier container units and the subsequent extraction of the modifiers from the core unit during segmentation have not been studied extensively. We present a dissection-based lower modifier segmentation method which solves the problem of segmenting lower modifiers across a wide range of document images. A key goal in our methodology is to avoid over-segmentation of units that do not actually contain any lower modifier, which would otherwise lead to unacceptably high error rates during segmentation. Our methodology consists of four tasks: we first identify the lower modifier separator line using character height information, and then select the primary lower modifier containers; we filter this set to eliminate the units/characters that do not actually contain any lower modifier; we then extract the lower modifier unit using the features of the core units and the lower modifiers; the final step consists of a set of empirical rules, aided by dictionary lookups, to eliminate most of the errors, resulting in an accuracy of 99.6%. Md. Abul Hasnat; Mumit Khan; 2010-10-05T06:16:48Z; 2010-10-05T06:16:48Z; 2009; 2009; Article; http://hdl.handle.net/10361/338; en; 8 pages; application/pdf; BRAC University |
institution | Brac University |
collection | Institutional Repository |
language | English |
description | Includes bibliographical references (page 5). |
author2 | Center for Research on Bangla Language Processing (CRBLP), BRAC University |
author_facet | Center for Research on Bangla Language Processing (CRBLP), BRAC University Hasnat, Md. Abul Khan, Mumit |
format | Article |
author | Hasnat, Md. Abul Khan, Mumit |
spellingShingle | Hasnat, Md. Abul Khan, Mumit Rule based segmentation of lower modifiers in complex Bangla scripts |
author_sort | Hasnat, Md. Abul |
title | Rule based segmentation of lower modifiers in complex Bangla scripts |
publisher | BRAC University |
publishDate | 2010 |
url | http://hdl.handle.net/10361/338 |
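
The abstract's first task (finding a lower modifier separator line from character height information, then selecting candidate containers) can be illustrated with a small sketch. This record does not give the paper's actual rules, so everything here is an assumption made for illustration: the `Unit` type, the use of the median bottom row as the separator, and the `min_overshoot` threshold are hypothetical and are not the authors' method.

```python
from statistics import median
from typing import List, NamedTuple


class Unit(NamedTuple):
    """Bounding box of one segmented unit (hypothetical type).

    Row indices grow downward, so a larger `bottom` means the unit
    reaches further below the text line.
    """
    label: str
    top: int
    bottom: int


def separator_row(units: List[Unit]) -> int:
    """Assumed heuristic: take the median bottom row of all units in a line."""
    return int(median(u.bottom for u in units))


def candidate_containers(units: List[Unit], min_overshoot: int = 3) -> List[Unit]:
    """Return units extending at least `min_overshoot` rows below the separator."""
    sep = separator_row(units)
    return [u for u in units if u.bottom - sep >= min_overshoot]


# Illustrative data only: five units in one text line.
line = [
    Unit("a", 10, 40),
    Unit("b", 12, 41),
    Unit("c", 10, 52),  # reaches well below its neighbours
    Unit("d", 11, 40),
    Unit("e", 10, 39),
]

print(separator_row(line))                            # 40
print([u.label for u in candidate_containers(line)])  # ['c']
```

A median is used here only because it is insensitive to the few units that do carry a lower modifier. The filtering, extraction, and dictionary-aided rule steps described in the abstract are not sketched, since the record gives no detail on them.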