Overview

Topic status: We're looking for students to study this topic.

The rapid evolution of the content available on the Internet is now accelerated by the opportunity for any user to contribute their knowledge or comments through Web 2.0 applications. Most of this information is subjective and redundant, making it hard to get a quick overview of what is being said on a given topic. Classification of documents based on the keywords they contain has long been investigated and is now employed in a number of information access and retrieval systems. However, these systems cope poorly with very short messages, such as the comments found in Web 2.0 applications like blogs or Twitter. Another dimension that classical approaches fail to address is the multiplicity of perspectives one can adopt when analysing messages, which may combine topics and opinions at the same time.

This project aims to generate a quick overview of thousands of short messages through complex topic extraction and automatic labelling. Topic extraction will follow the principles of either probabilistic approaches, such as topic models, or algebraic approaches, such as non-negative matrix factorisation. First, these methods will be improved to allow user input on what does or does not constitute a topic, and to allow dynamic manipulation of the results. The new method will then be evaluated both against gold standards (topic labels assigned by users) and for its usability in an interactive framework.
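
As a rough illustration of the algebraic route, the following minimal sketch factorises a small term-by-message count matrix V into non-negative factors W (terms x topics) and H (topics x messages) using the standard Lee-Seung multiplicative updates for squared error, then labels each topic by its highest-weighted terms. Everything here is an illustrative assumption rather than the method this project will develop: the whitespace tokeniser, the toy messages, the number of topics k, the iteration count and the top-n labelling rule (the helper names `build_matrix`, `nmf` and `label_topics` are hypothetical). User input and dynamic manipulation of results are not shown.

```python
import random
from collections import Counter

# Minimal NMF topic-extraction sketch. Assumptions (illustrative only):
# whitespace tokenisation, raw term counts, Lee-Seung multiplicative
# updates for squared error, and topics labelled by their top-n terms.


def build_matrix(messages):
    """Return (vocab, V) where V is a terms x messages count matrix."""
    tokenised = [m.lower().split() for m in messages]
    vocab = sorted({w for toks in tokenised for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    V = [[0.0] * len(messages) for _ in vocab]
    for j, toks in enumerate(tokenised):
        for w, c in Counter(toks).items():
            V[index[w]][j] = float(c)
    return vocab, V


def matmul(A, B):
    """Plain-Python matrix product of A (n x p) and B (p x m)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factorise V ~= W H with W, H >= 0, reducing squared error each step."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]  # terms x topics
    H = [[rng.random() for _ in range(m)] for _ in range(k)]  # topics x messages
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def label_topics(W, vocab, top_n=3):
    """Label each topic (column of W) with its top_n highest-weighted terms."""
    labels = []
    for t in range(len(W[0])):
        ranked = sorted(range(len(vocab)), key=lambda i: W[i][t], reverse=True)
        labels.append([vocab[i] for i in ranked[:top_n]])
    return labels


# Toy short messages (hypothetical data for illustration).
messages = [
    "battery life is great",
    "great battery and fast charger",
    "screen is too dim",
    "dim screen and bad display",
]
vocab, V = build_matrix(messages)
W, H = nmf(V, k=2)
for t, words in enumerate(label_topics(W, vocab)):
    print(f"topic {t}: {' '.join(words)}")
```

Here k and the labelling rule are fixed by hand; they are the kind of choices the project proposes to open up to user input. A probabilistic topic model could be substituted for the factorisation step without changing the labelling idea.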

References

  • Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs. Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su and ChengXiang Zhai. WWW 2007, Data Mining track.

Study level: Honours
Supervisors: QUT
Organisational unit: Science and Engineering Faculty
Research area: Computer Science
Contact: Please contact the supervisor.