Implementation of an Optical Character Recognizer (OCR) for Bengali language

Cataloged from PDF version of thesis report.

Bibliographic Details
Main Authors:	Chowdhury, Muhammed Tawfiq; Islam, Md. Saiful; Bipul, Baijed Hossain
Other Authors:	Rahman, Dr. Md. Khalilur
Format:	Thesis
Language:	English
Published:	BRAC University 2015
Subjects:	Computer science and engineering; Optical Character Recognition (OCR); Bengali language; Tesseract; JTessboxEditor; Netbeans IDE
Online Access:	http://hdl.handle.net/10361/4374

Department:	Department of Computer Science and Engineering, BRAC University
Degree:	B. Computer Science and Engineering
Notes:	Includes bibliographical references (page 44). This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2015.
Student IDs:	ID 11101009; ID 11101061; ID 11101047
Dates:	2015; 8/24/2015; 2015-09-03T07:14:51Z; 2022-01-26T10:18:19Z
Extent:	52 pages, application/pdf
Rights:	BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.

Abstract:	Optical character recognition (OCR) is the process of extracting text from an image. The main purpose of OCR is to produce editable documents from existing paper documents or image files. Developing an OCR system requires a number of algorithms; its steps include noise removal, skew identification and correction, and segmentation. OCR primarily works in two phases: character detection and word detection. More sophisticated approaches also perform sentence detection to preserve the document's structure. In this paper, we discuss the process of developing an OCR system for the Bengali language. Considerable effort has gone into developing OCR for Bengali. Although some OCR systems have been developed, none of them is completely error-free. For our thesis, we trained the Tesseract OCR engine to develop an OCR system for Bengali. Tesseract is currently the most accurate OCR engine. It was developed at HP Labs and is currently owned by Google. We used several software tools to prepare our training files. Our OCR's library contains 18110 characters and 2617 words. We used the "Solaimanlipi" font in our project and 200 input files to test the accuracy of our OCR. We use version 3.03 of Tesseract, the latest, for the Windows operating system. For clean image files, the accuracy of our software was as high as 97.56%. We measured accuracy as the percentage of correct characters and words.
Institution:	Brac University
Collection:	Institutional Repository
