Abstract- Detecting spelling errors is not easy in Bangla, and checking a word for a real-word error is harder still. In this paper, we focus on correcting homophone errors, a class of real-word error. We use the N-gram model, which serves many purposes such as machine translation, speech recognition, and extracting syntactic information. We use a combination of bigrams and trigrams built around the candidate word, which is the word being tested for a real-word error. We have created corpora in which (i) one is a collection of sets of homophone (confusable) words, (ii) two others are collections of bigrams and trigrams that use homophone words, and (iii) the other seven are test sets. For a candidate word, the corresponding set of homophone words is retrieved from the corpus. In our proposed method, we form trigrams and bigrams using each homophone word, check their validity, take the frequency of each bigram or trigram, and finally calculate the probability used to make the final decision about the candidate word. We have used around a million words to test our system. The proposed method achieves more than 96% accuracy in detecting and correcting real-word errors in Bangla text.
Keywords: Bangla homophones, NLP, Real-word error, N-gram, Markov model.
I. INTRODUCTION
People exchange their views through language. Like other animals, we communicate through verbal, sign, or textual representation to express our views to others.
Textual representation is one of the emerging means of communication through which people can easily express their wishes to others. We encounter textual representation in newspapers, diaries, manuals, literature, novels, publications, etc. Through the textual representation of language, we can preserve and maintain information through media and the evolution of the legal system. Bangla is the primary language of Bangladesh and the second most spoken language in India. It is one of the most widely spoken languages, with about 250 million speakers.
Bangla belongs to the Indo-Aryan branch of the Indo-European languages and is one of the most important languages. In Bangla, there are 11 vowels and 39 consonants, so a total of 50 letters make up the Bangla alphabet. Bangla is difficult to process because of its complex orthographic rules. There are many important grammatical rules that are hard to follow consistently in written text. That is why automatic correction of text, known as spelling correction, is a common requirement.
A spell checker is an application that detects errors and also offers suggestions. Generally, a spell checker reports whether a word is misspelled or not. If the word does not exist in the lexicon or dictionary, it is an invalid or misspelled word. Common causes of spelling errors include phonetically similar letters in Bangla, the existence of similarly pronounced words, and limited knowledge of spelling rules. There are many types of errors, such as typographical errors and cognitive errors.
Kukich [1] classified spelling errors into two types: typographical errors and cognitive errors. Typographical errors occur while typing (‘দোসর’ as ‘দোসরর’), and cognitive errors (‘বাস’ as ‘বাষ’) occur from a lack of knowledge of how to spell the word. Typographical errors include insertion, deletion, substitution, and transposition errors. Cognitive errors include phonetic errors.
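As an illustration of the four typographical operations named above, the following Python sketch labels the single edit that turns an intended word into a typed word. It is not part of the proposed system. The function name `classify_single_edit` is a hypothetical helper, and the comparison is made on Unicode code points, which is a simplifying assumption for Bangla text because vowel signs and conjuncts span several code points.

```python
def classify_single_edit(intended: str, typed: str) -> str:
    """Label the single edit that turns `intended` into `typed`.

    Returns "insertion", "deletion", "substitution", "transposition",
    "none" (identical strings) or "other" (more than one edit).
    Comparison is done per Unicode code point (an assumption).
    """
    if intended == typed:
        return "none"
    n, m = len(intended), len(typed)

    # Length of the common prefix of the two strings.
    i = 0
    while i < min(n, m) and intended[i] == typed[i]:
        i += 1

    # One extra character was typed at position i.
    if m == n + 1 and intended[i:] == typed[i + 1:]:
        return "insertion"
    # One character of the intended word is missing at position i.
    if m == n - 1 and intended[i + 1:] == typed[i:]:
        return "deletion"
    if m == n:
        # Exactly one character differs.
        if intended[i + 1:] == typed[i + 1:]:
            return "substitution"
        # Two adjacent characters were swapped.
        if (i + 1 < n
                and intended[i] == typed[i + 1]
                and intended[i + 1] == typed[i]
                and intended[i + 2:] == typed[i + 2:]):
            return "transposition"
    return "other"


# The two example pairs from the text.
print(classify_single_edit("দোসর", "দোসরর"))
print(classify_single_edit("বাস", "বাষ"))
```

The edit operation alone does not separate the two error types: ‘বাস’ typed as ‘বাষ’ is a one-character substitution at the code point level, yet it is a cognitive (phonetic) error because the two letters sound alike.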
Kukich also introduced the distinction between real-word errors and non-word errors. A non-word error is a word-level