Persian Wordnet Construction using Supervised Learning

  • Zahra Mousavi, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
  • Heshaam Faili, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
  • Marzieh Fadaee, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
Keywords: wordnet, ontology, supervised, Persian language

Abstract

This paper presents an automated, supervised method for constructing a Persian wordnet. Initial links between Persian words and Princeton WordNet synsets are generated from a Persian corpus and a bilingual dictionary. A trained classifier then labels each link as correct or incorrect using seven features, which combine distributional and semantic information. The classifier is trained on a set in which links from a pre-existing Persian wordnet, FarsNet, serve as correct instances and randomly selected links serve as incorrect instances. Links classified as correct are collected into the final wordnet. The method achieves state-of-the-art results for automatically derived Persian wordnets: the resulting wordnet has a precision of 91.18% and includes more than 16,000 words and 22,000 synsets.
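The first two steps of the pipeline described above, generating candidate links and assembling a labelled training set, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy dictionary, the synset identifiers, the transliterated Persian words, and the function names `candidate_links` and `build_training_set` are all hypothetical, and the seven features and the classifier itself are omitted because the abstract does not specify them.

```python
import random

# Hypothetical toy data; none of it comes from the paper.
BILINGUAL_DICT = {              # Persian word -> English translations
    "ketab": ["book"],
    "shir": ["milk", "lion"],
}
PWN_SYNSETS = {                 # English word -> made-up synset identifiers
    "book": ["book.n.01", "book.v.01"],
    "milk": ["milk.n.01"],
    "lion": ["lion.n.01"],
}
KNOWN_CORRECT = {               # stands in for links attested in an existing wordnet
    ("ketab", "book.n.01"),
    ("shir", "milk.n.01"),
    ("shir", "lion.n.01"),
}


def candidate_links(bilingual, synsets):
    """Link each Persian word to every synset of each of its translations."""
    links = set()
    for word, translations in bilingual.items():
        for english in translations:
            for synset_id in synsets.get(english, []):
                links.add((word, synset_id))
    return links


def build_training_set(positives, words, synset_ids, n_negative, seed=0):
    """Label known links 1 and randomly drawn non-attested links 0."""
    rng = random.Random(seed)
    all_pairs = [(w, s) for w in sorted(words) for s in sorted(synset_ids)]
    pool = [pair for pair in all_pairs if pair not in positives]
    negatives = rng.sample(pool, min(n_negative, len(pool)))
    return [(p, 1) for p in sorted(positives)] + [(p, 0) for p in negatives]


links = candidate_links(BILINGUAL_DICT, PWN_SYNSETS)
all_synsets = {s for ids in PWN_SYNSETS.values() for s in ids}
train = build_training_set(KNOWN_CORRECT, BILINGUAL_DICT.keys(), all_synsets, n_negative=3)
```

In a full system, each labelled pair in `train` would be turned into a feature vector and passed to a binary classifier, which would then be applied to every pair in `links`; only the pairs predicted correct would enter the final wordnet.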

Author Biographies

Zahra Mousavi, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Zahra Mousavi received her B.Sc. degree in Software Engineering from Alzahra University in 2007 and her M.Sc. degree in Artificial Intelligence from the University of Tehran in 2017. Her research interests are Machine Intelligence, Robotics, and Natural Language Processing.

Heshaam Faili, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Heshaam Faili received his B.Sc. and M.Sc. degrees in Software Engineering from Sharif University of Technology in 1997 and 1999, respectively, and his Ph.D. degree in Artificial Intelligence from the same university in 2006. He is currently an Associate Professor at the University of Tehran. His areas of research include Machine Intelligence and Robotics, Information Technology, and Software.

Marzieh Fadaee, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

Marzieh Fadaee received her B.Sc. degree in Computer Engineering from Sharif University of Technology and her M.Sc. degree in Artificial Intelligence from the University of Tehran. She is currently a Ph.D. candidate in the Information and Language Processing Systems (ILPS) group at the University of Amsterdam. Her research interest is Information and Language Processing.

Published
2017-06-30
How to Cite
Mousavi, Z., Faili, H., & Fadaee, M. (2017, June 30). Persian Wordnet Construction using Supervised Learning. International Journal of Information & Communication Technology Research, 9(2), 35-44. Retrieved from http://journal.itrc.ac.ir/index.php/ijictr/article/view/9
Section
Information Technology