Bangla text extraction by digital image processing

This thesis is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017.

Bibliographic details
Authors: Hoque, A. K. M Rashedul, Pandit, Proma Mrittika, Nasreen, Najia, Raihan, Hasin
Other authors: Uddin, Dr. Jia
Format: Thesis
Language: English
Published: 2018
Subjects: Image processing; Text extraction; Bangla language
Online access: http://hdl.handle.net/10361/10142
Department: Department of Computer Science and Engineering, BRAC University
Collection: Brac University Institutional Repository
Degree: B. Computer Science and Engineering
Student IDs: 13301049, 13301032, 13301050, 13301102
Date: 2017-12 (added to the repository 2018-05-14)
Physical description: 34 pages, application/pdf. Cataloged from PDF version of thesis. Includes bibliographical references (pages 32-34).
Rights: BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.

Abstract:
Optical character recognition (OCR) is a technology for extracting text from an image. The main purpose of OCR is to produce editable text documents from scanned documents, image files or books. Because Bangla script contains characters of many different shapes and sizes, extracting Bengali text from images is challenging. In this paper, we discuss the process of developing an OCR for the Bangla language, focusing on training data preparation, the integration of Tesseract for character recognition, and post-processing techniques. Before recognition, scanned documents need a few preprocessing steps, such as noise removal, grayscale conversion and binarization. We present the basic steps required to develop a Bangla OCR and a complete workflow for the development process, along with the probable errors encountered during recognition using several techniques. We used Tesseract version 3.04 on the Windows operating system and the ‘NIKOSH’ Bangla font. For clear documents, a word-level recognition accuracy of around 95% was obtained.
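The abstract names three preprocessing steps (noise removal, grayscale conversion and binarization) but this record does not say which methods the thesis used. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation: ITU-R BT.601 luma weights for grayscale, a 3x3 median filter for noise removal, and Otsu's global threshold for binarization. Images are plain nested lists so the example runs without third-party libraries; a real pipeline would normally use an imaging library instead.

```python
from statistics import median
from typing import List, Tuple

RGB = Tuple[int, int, int]
Gray = List[List[int]]


def to_grayscale(rgb: List[List[RGB]]) -> Gray:
    # Assumption: BT.601 luma weights; the thesis does not specify a formula.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def median_filter(img: Gray, radius: int = 1) -> Gray:
    # Replace each pixel with the median of its (2r+1)x(2r+1) neighbourhood,
    # clipped at the image border; isolated specks ("salt and pepper") vanish.
    h, w = len(img), len(img[0])
    out: Gray = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(int(median(window)))
        out.append(row)
    return out


def otsu_threshold(img: Gray) -> int:
    # Otsu's method: pick the threshold t that maximizes the between-class
    # variance of the "<= t" (ink) and "> t" (paper) pixel classes.
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    sum_bg, weight_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Gray, t: int) -> Gray:
    # Pixels above the threshold become white paper (255), the rest black ink (0).
    return [[255 if v > t else 0 for v in row] for row in img]


def preprocess(rgb: List[List[RGB]]) -> Gray:
    # Hypothetical pipeline order: grayscale, then denoise, then binarize.
    gray = to_grayscale(rgb)
    clean = median_filter(gray)
    return binarize(clean, otsu_threshold(clean))
```

Filtering before thresholding means a single dark speck on the page is removed before it can be mistaken for part of a character. The binarized page would then be handed to the recognizer (Tesseract, in the thesis).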
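The abstract reports around 95% word-level recognition accuracy on clear documents, but the record does not define the metric. One common definition, assumed here for illustration only, is 1 minus the word error rate: count the word-level Levenshtein distance (substitutions, insertions, deletions) between a reference transcript and the OCR output, then divide by the number of reference words. The function names are hypothetical.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    # Levenshtein dynamic programming over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # reference word deleted
                         cur[j - 1] + 1,           # extra word inserted by OCR
                         prev[j - 1] + (r != h))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


def word_accuracy(reference: str, ocr_output: str) -> float:
    # Assumed metric: 1 - WER, clipped at 0 so many insertions cannot go negative.
    ref, hyp = reference.split(), ocr_output.split()
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - word_edit_distance(ref, hyp) / len(ref))
```

For example, `word_accuracy("আমি বাংলায় গান গাই", "আমি বাংলায় গান গাঈ")` returns `0.75`: one of the four reference words was misrecognized. Splitting on whitespace works for Bangla because words are space-separated.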