Overview

Topic status: We're looking for students to study this topic.

Project Summary

This project investigates novel query representations for processing phrases in queries, in particular ways of combining the senses of the words in a query.

The first step will be to collect a number of novel noun compounds from a corpus (such as the current Wikipedia) by extracting noun compounds that do not appear in a second corpus (such as an older version of Wikipedia). This can be done by using existing libraries to POS-tag the text.
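As a rough illustration of this first step, the sketch below treats a noun compound as a run of two or more consecutive noun tokens and keeps the compounds that occur in the newer corpus but not in the older one. It assumes the text has already been POS-tagged with Penn Treebank style tags (for example by an existing tagger); the function names and the tiny tagged samples are hypothetical and only stand in for real Wikipedia data.

```python
from typing import Iterable, Set, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag), Penn Treebank style tags assumed


def noun_compounds(tagged: Iterable[TaggedToken]) -> Set[Tuple[str, ...]]:
    """Collect maximal runs of two or more consecutive noun tokens."""
    compounds: Set[Tuple[str, ...]] = set()
    run = []
    for word, tag in tagged:
        if tag.startswith("NN"):  # NN, NNS, NNP, NNPS
            run.append(word.lower())
        else:
            if len(run) >= 2:
                compounds.add(tuple(run))
            run = []
    if len(run) >= 2:  # flush a run that ends the text
        compounds.add(tuple(run))
    return compounds


def novel_compounds(new_tagged: Iterable[TaggedToken],
                    old_tagged: Iterable[TaggedToken]) -> Set[Tuple[str, ...]]:
    """Compounds seen in the newer corpus but absent from the older one."""
    return noun_compounds(new_tagged) - noun_compounds(old_tagged)


# Hypothetical, hand-tagged samples standing in for the two corpus versions.
new_corpus = [("The", "DT"), ("smartphone", "NN"), ("app", "NN"),
              ("uses", "VBZ"), ("a", "DT"), ("search", "NN"),
              ("engine", "NN"), (".", ".")]
old_corpus = [("A", "DT"), ("search", "NN"), ("engine", "NN"),
              ("indexes", "VBZ"), ("pages", "NNS"), (".", ".")]

novel = novel_compounds(new_corpus, old_corpus)
print(novel)
```

A real pipeline would probably also need frequency thresholds and some filtering of tagging errors, which this sketch leaves out.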

The second step will be to evaluate the output of a search engine on the older collection for a number of queries made of these extracted compounds, and also for the same queries with additional words manually inserted.

The third step will be to evaluate a range of predetermined combination approaches for representing the compounds, such that the manually inserted words are found to be semantically related to the compound and can therefore be found by automatic query expansion processes. Such approaches generally involve vector space representations of the words in a large collection, also referred to as distributional semantics.
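To make the idea of combining word vectors concrete, here is a minimal sketch of two commonly used composition functions, vector addition and element-wise multiplication, followed by a cosine-similarity ranking of candidate expansion words. The project description does not say which combination approaches will be evaluated, so these two are only illustrative assumptions, and the three-dimensional vectors are made-up toy values rather than representations learned from a collection.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compose_additive(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Represent a two-word compound as the sum of its word vectors."""
    return [a + b for a, b in zip(u, v)]


def compose_multiplicative(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Represent a two-word compound as the element-wise product."""
    return [a * b for a, b in zip(u, v)]


def rank_candidates(compound_vec: Sequence[float],
                    candidates: Sequence[str],
                    vectors: Dict[str, List[float]]) -> List[str]:
    """Order candidate expansion words by similarity to the compound vector."""
    return sorted(candidates,
                  key=lambda w: cosine(compound_vec, vectors[w]),
                  reverse=True)


# Toy vectors, invented for illustration only.
vectors = {
    "search": [0.9, 0.1, 0.3],
    "engine": [0.2, 0.8, 0.4],
    "query":  [0.8, 0.3, 0.4],
    "banana": [0.0, 0.1, 0.9],
}

added = compose_additive(vectors["search"], vectors["engine"])
multiplied = compose_multiplicative(vectors["search"], vectors["engine"])

ranked_add = rank_candidates(added, ["banana", "query"], vectors)
ranked_mul = rank_candidates(multiplied, ["banana", "query"], vectors)
print(ranked_add, ranked_mul)
```

In this setup, a combination approach would count as more useful when the manually inserted words from the second step rank near the top of the candidate list for the compound.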

Expected outcomes, applications and/or benefits

This project will provide motivation for better models for representing noun compounds semantically, and will give insight into which solutions are most likely to work and should be investigated in the future.

Required student skills/experience

Some programming skills, plus knowledge of information retrieval and the basics of natural language processing (INX344).

Study level
Vacation research experience scholarship
Supervisors
QUT
Organisational unit

Science and Engineering Faculty

Research area

Computer Science

Keywords
search, engines
Contact
Contact the supervisor for more information