Capturing Semantically Meaningful Word Dependencies with an Admixture of Poisson MRFs

Abstract

We develop a fast algorithm for the Admixture of Poisson MRFs (APM) topic model and propose a novel metric to directly evaluate this model. The APM topic model, recently introduced by Inouye et al. (2014), is the first topic model that allows for word dependencies within each topic, unlike previous topic models such as LDA that assume independence between words within a topic. Research on both the semantic coherence of topic models and measures of model fitness provides strong support that explicitly modeling word dependencies, as in APM, could be both semantically meaningful and essential for appropriately modeling real text data. Though APM shows significant promise for providing a better topic model, it has a high computational complexity because O(p^2) parameters must be estimated, where p is the number of words (previous work could only provide results for datasets with p = 200). In light of this, we develop a parallel alternating Newton-like algorithm for training the APM model that can handle p = 10^4 as an important step towards scaling to large datasets. In addition, Inouye et al. (2014) only provided tentative and inconclusive results on the utility of APM. Thus, motivated by simple intuitions and previous evaluations of topic models, we propose a novel evaluation metric based on human evocation scores between word pairs (i.e., how much one word “brings to mind” another word). We provide compelling quantitative and qualitative results on the BNC corpus that demonstrate the superiority of APM over previous topic models for identifying semantically meaningful word dependencies. (MATLAB code available at: http://bigdata.ices.utexas.edu/software/apm/)

Publication
Neural Information Processing Systems (NeurIPS)