Sunday, February 22, 2015

Topic modeling - Machine Learning

Probabilistic topic modeling provides methods for organizing, understanding, searching, and summarizing large electronic archives.

Latent Dirichlet Allocation (LDA): The simple intuition behind LDA is that documents exhibit multiple topics.
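To make the "multiple topics per document" intuition concrete, here is a minimal sketch of the generative story LDA assumes, using only the Python standard library. The function names (`sample_dirichlet`, `generate_document`), the vocabulary size, and the hyperparameter values are illustrative assumptions, not part of any particular library.

```python
import random

def sample_dirichlet(alphas, rng):
    # Draw from a Dirichlet by normalizing independent Gamma draws.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one document; `topics` is a list of word distributions."""
    num_topics = len(topics)
    vocab_size = len(topics[0])
    # Per-document topic proportions: this is what lets one document mix topics.
    proportions = sample_dirichlet([alpha] * num_topics, rng)
    words, assignments = [], []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        z = rng.choices(range(num_topics), weights=proportions)[0]
        w = rng.choices(range(vocab_size), weights=topics[z])[0]
        assignments.append(z)
        words.append(w)
    return words, assignments, proportions

rng = random.Random(0)
# Three hypothetical topics over a toy vocabulary of six word ids.
topics = [sample_dirichlet([0.5] * 6, rng) for _ in range(3)]
words, assignments, proportions = generate_document(
    topics, alpha=0.5, length=20, rng=rng
)
```

Here `proportions` plays the role of the document's topic mixture, and `assignments` records which topic produced each word.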

In reality, we only observe the documents; the rest of the structure (the topics, the per-document topic proportions, and the per-word topic assignments) consists of hidden variables.
Our goal is to infer the hidden variables, i.e. to compute their posterior distribution conditioned on the documents: p(topics, proportions, assignments | documents).
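Written out, the posterior above can be sketched as follows. The symbols are an assumed notation, not taken from the text: \beta_{1:K} for the topics, \theta_{1:D} for the per-document proportions, z for the per-word assignments, and w for the observed words.

```latex
p(\beta_{1:K}, \theta_{1:D}, z \mid w)
  = \frac{p(\beta_{1:K}, \theta_{1:D}, z, w)}{p(w)},
\qquad
p(\beta_{1:K}, \theta_{1:D}, z, w)
  = \prod_{k=1}^{K} p(\beta_k)
    \prod_{d=1}^{D} p(\theta_d)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\,
                      p(w_{d,n} \mid \beta_{1:K}, z_{d,n})
```

The numerator is easy to evaluate for any setting of the hidden variables, but the denominator p(w) requires summing over every possible topic structure, which is why approximate methods such as sampling are used.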

Gibbs sampling for LDA: here we sample the topic of one word in one of the documents, conditioned on the topics of all other words, the topic distributions, and the data, and repeat this for every word. A sample from a recent coursework at Chalmers is shown below:

Another instance is shown below, for 10 topics:
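Separately from the coursework results, the Gibbs sampling step described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch written for this post, not the coursework code: the function names, the symmetric priors `alpha` and `beta`, and the toy corpus are all assumptions. It keeps the topic distributions explicit and resamples them after each pass; the widely used collapsed variant integrates them out instead.

```python
import random

def sample_dirichlet(alphas, rng):
    # Draw from a Dirichlet by normalizing independent Gamma draws.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def gibbs_lda(docs, num_topics, vocab_size, alpha=0.5, beta=0.5,
              sweeps=50, seed=0):
    rng = random.Random(seed)
    # Random initial topic assignment for every word position.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    theta = [sample_dirichlet([alpha] * num_topics, rng) for _ in docs]
    phi = [sample_dirichlet([beta] * vocab_size, rng)
           for _ in range(num_topics)]
    for _ in range(sweeps):
        # 1. Resample each word's topic given the current distributions.
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                weights = [theta[d][k] * phi[k][w] for k in range(num_topics)]
                z[d][n] = rng.choices(range(num_topics), weights=weights)[0]
        # 2. Resample each document's topic proportions given its assignments.
        for d in range(len(docs)):
            counts = [alpha] * num_topics
            for k in z[d]:
                counts[k] += 1
            theta[d] = sample_dirichlet(counts, rng)
        # 3. Resample each topic's word distribution given all assignments.
        topic_word = [[beta] * vocab_size for _ in range(num_topics)]
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                topic_word[z[d][n]][w] += 1
        phi = [sample_dirichlet(row, rng) for row in topic_word]
    return z, theta, phi

# Toy corpus of word ids: two "pure" documents and one mixed document.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 3, 4, 2]]
z, theta, phi = gibbs_lda(docs, num_topics=2, vocab_size=6)
```

After the sweeps, `z` holds one sampled topic per word, `theta[d]` the sampled topic proportions of document `d`, and `phi[k]` the sampled word distribution of topic `k`. In practice one would average over many samples after a burn-in period rather than read off the final state.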