Overview
Topic status: We're looking for students to study this topic.
Search engines are primarily based on the inverted file data structure. An alternative to the inverted file is document-signature-based indexing. The advantage of the document signature approach is that it supports efficient modifications to the document collection (insertions, deletions and updates), whereas inverted files require an excessive number of index modifications to support truly volatile collections. On the other hand, document signatures do not achieve retrieval quality quite as good as inverted files, and inverted files are also faster to search. In this project we seek to tackle both problems: improve the retrieval quality and the search efficiency of document signatures. The outcome of this project will be a prototype of a small-footprint search engine that can efficiently index and search very large document collections (e.g. the English Wikipedia, some 50 GB of text).
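A minimal sketch of how a document signature can work, assuming superimposed coding: each term sets K hashed bits in an M-bit signature, a document's signature is the bitwise OR of its terms' signatures, and a query matches when every query bit is also set in the document signature. The parameters M = 256 and K = 3, the use of `std::hash`, and the names `SignatureFile`, `term_signature`, `document_signature` and `may_match` are hypothetical choices for illustration, not the project's design. The sketch shows why updates are cheap: inserting, deleting or replacing a document touches exactly one fixed-size signature.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parameters: signature width (bits) and bits set per term.
constexpr std::size_t M = 256;
constexpr std::size_t K = 3;
using Signature = std::bitset<M>;

// Map a term to K bit positions by hashing the term with K different salts.
Signature term_signature(const std::string& term) {
    Signature sig;
    std::hash<std::string> h;
    for (std::size_t i = 0; i < K; ++i) {
        sig.set(h(term + '#' + std::to_string(i)) % M);
    }
    return sig;
}

// A document (or query) signature superimposes (ORs) its terms' signatures.
Signature document_signature(const std::string& text) {
    Signature sig;
    std::istringstream in(text);
    std::string term;
    while (in >> term) sig |= term_signature(term);
    return sig;
}

// True if every query bit is set in the document signature. This never misses
// a real match, but unrelated bits can collide, giving false positives.
bool may_match(const Signature& doc, const Signature& query) {
    return (doc & query) == query;
}

// A flat "signature file": one signature per document, plus a deletion flag.
struct SignatureFile {
    std::vector<Signature> sigs;
    std::vector<bool> live;

    // Insertion appends a single signature; no other index entry changes.
    std::size_t insert(const std::string& text) {
        sigs.push_back(document_signature(text));
        live.push_back(true);
        return sigs.size() - 1;
    }

    // Deletion only flags the document as dead.
    void remove(std::size_t id) {
        assert(id < live.size());
        live[id] = false;
    }

    // An update overwrites the document's signature in place.
    void update(std::size_t id, const std::string& text) {
        assert(id < sigs.size());
        sigs[id] = document_signature(text);
    }

    // Linear scan over all live signatures; candidates may include false positives.
    std::vector<std::size_t> search(const std::string& query) const {
        const Signature q = document_signature(query);
        std::vector<std::size_t> hits;
        for (std::size_t id = 0; id < sigs.size(); ++id) {
            if (live[id] && may_match(sigs[id], q)) hits.push_back(id);
        }
        return hits;
    }
};
```

For example, after inserting three short texts and calling `remove(2)`, `search("signatures terms")` is guaranteed to return document 1 if it contains both words, and never returns the deleted document 2. Two costs of this simple design help explain the trade-offs described above. Candidates still have to be checked against the actual text to weed out false positives. The search also scans every signature, whereas an inverted file reads only the postings for the query terms.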
Requirements: This project will suit a student who is interested in search engine technology and wants to learn what is under the hood of systems such as Google or Yahoo!. You will be required to write code in C/C++ and to develop highly efficient code that utilises multi-core hardware. Naturally, you will have to be a good programmer, love programming, and enjoy creating non-obvious solutions to tough problems.
- Study level: Honours
- Supervisors: QUT
- Organisational unit: Science and Engineering Faculty
- Research area:
- Contact: Please contact the supervisor.