Automatic Bangla corpus creation

Includes bibliographical references (pages 4-5).

Bibliographic description
Main authors: Sarkar, Asif Iqbal; Pavel, Dewan Shahriar Hossain; Khan, Mumit
Other authors: Center for Research on Bangla Language Processing (CRBLP), BRAC University
Format: Technical report
Language: English
Published: BRAC University 2010
Online access: http://hdl.handle.net/10361/652
Institution: BRAC University
Collection: Institutional Repository
Date issued: 2007
Date added to repository: 2010-10-28
Extent: 5 pages (application/pdf)

Abstract
This paper addresses the issue of automatic Bangla corpus creation, which would significantly benefit lexicon development, morphological analysis, automatic part-of-speech detection, automatic grammar extraction, and machine translation. The plan is to collect all freely available Bangla documents, both on the World Wide Web and offline, and extract all the words in them to build a large repository of text. After it is converted to Unicode, this body of text (the corpus) will serve several Bangla language processing tasks. The conversion process is itself an associated and equally important research and development issue. Among several possible procedures, our research focuses on combining font detection, language detection, and Unicode conversion of the retrieved Bangla text as a solution for automatic Bangla corpus creation; the paper describes this methodology.
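The abstract only outlines the pipeline. As a rough illustration of the language-detection and word-extraction steps, the Python sketch below (not the authors' method) treats a document as Bangla when at least a hypothetical threshold of 50% of its non-whitespace characters fall in the Bengali Unicode block (U+0980 to U+09FF), then counts maximal runs of Bengali-block characters as words. It assumes the text is already Unicode; the font detection and legacy-to-Unicode conversion described in the report are not modelled. The names bangla_ratio, is_bangla, and build_word_counts are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Bengali Unicode block: U+0980 to U+09FF (letters, vowel signs, Bengali digits).
BANGLA_RUN = re.compile("[\u0980-\u09FF]+")


def bangla_ratio(text: str) -> float:
    """Share of non-whitespace characters that fall in the Bengali block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bangla = sum(1 for c in chars if "\u0980" <= c <= "\u09FF")
    return bangla / len(chars)


def is_bangla(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical language check: Bangla if enough characters are Bengali script."""
    return bangla_ratio(text) >= threshold


def build_word_counts(documents: Iterable[str], threshold: float = 0.5) -> Counter:
    """Collect word frequencies from documents that pass the language check.

    A 'word' here is a maximal run of Bengali-block characters, so spaces,
    Latin text, and the danda (U+0964, outside the block) act as separators.
    """
    counts: Counter = Counter()
    for doc in documents:
        if is_bangla(doc, threshold):
            counts.update(BANGLA_RUN.findall(doc))
    return counts


if __name__ == "__main__":
    docs = [
        "আমি বাংলায় গান গাই।",  # Unicode Bangla: kept
        "This page is in English.",  # rejected by the language check
    ]
    print(build_word_counts(docs).most_common())
```

A script-ratio check like this is cheap and needs no training data, but it will not recognise Bangla text stored in legacy non-Unicode fonts; handling that case is the role of the font detection and Unicode conversion steps the abstract mentions.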