Statistical Machine Translation (SMT) for Highly-Inflectional Scarce-Resource Language

Namdar, Saman; Faili, Hesham; Khadivi, Shahram

Volume 5, Issue 1 (3-2013) 2013, 5(1): 39-52

Namdar S, Faili H, Khadivi S. Statistical Machine Translation (SMT) for Highly-Inflectional Scarce-Resource Language. International Journal of Information and Communication Technology Research 2013; 5(1): 39-52
URL: http://ijict.itrc.ac.ir/article-1-165-en.html

Abstract:

Statistical Machine Translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models. The model parameters are derived from the analysis of a parallel corpus, and translation quality depends on how well the system learns word translations. Enriching an SMT system with a suitable morphological analyser dramatically reduces both the number of out-of-vocabulary words and the dictionary size. The effect is more pronounced for a highly inflectional, low-resource language such as Persian. Choosing a suitable granularity for word segmentation may improve alignment quality in the parallel corpus. In this paper, different segmentation schemes and combinations of word segments are investigated in a Persian-to-English SMT experiment, and the scheme giving the best one-to-one alignment, called the En-like scheme, is proposed. Using this scheme improves Persian-to-English translation quality by about 3 BLEU points over the phrase-based SMT system.

Keywords: Statistical Machine Translation, Segmentation Schemes, Lexical Granularities, Morpheme, Persian Language

Type of Study: Research | Subject: Information Technology

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
