A Probabilistic Topic Model based on an Arbitrary-Length Co-occurrence Window
Probabilistic topic models have been widely used in automatic text analysis since their introduction. These models rely on word co-occurrence, but they are inflexible with respect to the context in which co-occurrence is considered: many probabilistic topic models cannot take local or positional information into account. In this paper, we introduce a probabilistic topic model that benefits from an arbitrary-length co-occurrence window and encodes local word dependencies for extracting topics. We place a multinomial distribution with a Dirichlet prior over the window positions so that the word in every position has a chance to influence topic assignments. In the proposed model, topics are represented by word pairs, which makes their presentation more interpretable. We apply the model to a dataset of 2,000 documents; it produces coherent, meaningful topics and reduces the problem of sparseness.
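The abstract does not specify the sampler, but the windowed co-occurrence it builds on can be sketched in a few lines. The sketch below is illustrative only (the function name, the window size, and the example sentence are not from the paper): it enumerates the word pairs that fall within an arbitrary-length window, which is the raw material the proposed model assigns topics to.

```python
from collections import Counter

def cooccurrence_pairs(tokens, window=2):
    """Yield (word, context) pairs for every context word that lies
    within `window` positions to the right of each token."""
    pairs = []
    for i, word in enumerate(tokens):
        # Look ahead up to `window` positions, stopping at the end of the text.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pairs.append((word, tokens[j]))
    return pairs

# Count how often each pair co-occurs inside the window.
counts = Counter(cooccurrence_pairs("the model extracts topics from text".split(), window=2))
```

A larger `window` lets more distant words co-occur, which trades sharper local dependencies for denser (less sparse) pair statistics.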
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.