Persian Wordnet Construction using Supervised Learning
This paper presents an automated supervised method for Persian wordnet construction. Using a Persian corpus and a bilingual dictionary, initial links between Persian words and Princeton WordNet synsets are generated. These links are then classified as correct or incorrect by a trained classification system that employs seven features. At its core, the method is a classifier trained on a training set in which links from a pre-existing Persian wordnet, FarsNet, serve as correct instances, and a set of randomly selected links is added as incorrect instances. A set of distributional and semantic features is proposed for use in the classifier. The links classified as correct are collected and included in the final wordnet. State-of-the-art results are achieved for automatically derived Persian wordnets. The resulting wordnet has a precision of 91.18% and includes more than 16,000 words and 22,000 synsets.
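The first two steps described above, generating candidate links through a bilingual dictionary and assembling a labeled training set, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the synset identifiers, the stand-in for FarsNet, and the function names `candidate_links` and `build_training_set` are all hypothetical. Excluding known-correct links from the random negatives is also an assumption, and the seven features and the classifier itself are left out.

```python
import random

# Toy, made-up data for illustration only (not taken from the paper).
# Persian word (transliterated) -> English translations.
BILINGUAL_DICT = {
    "ketab": ["book"],
    "shir": ["lion", "milk"],
}
# English word -> synset identifiers (hypothetical ids in a WordNet-like style).
ENGLISH_SYNSETS = {
    "book": ["book.n.01", "book.v.01"],
    "lion": ["lion.n.01"],
    "milk": ["milk.n.01", "milk.v.01"],
}
# Links attested in an existing wordnet (a stand-in for FarsNet).
KNOWN_CORRECT = {
    ("ketab", "book.n.01"),
    ("shir", "lion.n.01"),
    ("shir", "milk.n.01"),
}


def candidate_links(bilingual, synsets):
    """Link each Persian word to every synset of each of its translations."""
    links = set()
    for fa_word, translations in bilingual.items():
        for en_word in translations:
            for synset_id in synsets.get(en_word, []):
                links.add((fa_word, synset_id))
    return links


def build_training_set(known_correct, words, synset_ids, n_negative, seed=0):
    """Label known links 1 and randomly drawn other links 0."""
    rng = random.Random(seed)
    all_pairs = [(w, s) for w in sorted(words) for s in sorted(synset_ids)]
    # Assumption: random negatives are drawn only from pairs not known to be correct.
    pool = [pair for pair in all_pairs if pair not in known_correct]
    negatives = rng.sample(pool, min(n_negative, len(pool)))
    positives = [(pair, 1) for pair in sorted(known_correct)]
    return positives + [(pair, 0) for pair in negatives]


links = candidate_links(BILINGUAL_DICT, ENGLISH_SYNSETS)
all_synsets = {s for ids in ENGLISH_SYNSETS.values() for s in ids}
training = build_training_set(KNOWN_CORRECT, BILINGUAL_DICT, all_synsets, n_negative=3)
```

In a full pipeline, each candidate in `links` would be turned into a feature vector and passed to a classifier trained on `training`; only the links predicted as correct would enter the final wordnet.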
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.