Overview

Topic status: We're looking for students to study this topic.

Link-the-Wiki: This topic uses the Wikipedia collection, about 5GB of XML comprising 660,000 documents. The document set is extensively hyperlinked, but not completely and not always effectively. The Link-the-Wiki task aims to create link discovery algorithms. More specifically, given a new Wikipedia document, the task is to analyse its text and recommend a set of incoming and outgoing links that connect it to the existing collection through anchor text. Going beyond traditional text document analysis, Link-the-Wiki aims to operate at the XML element level. This means that anchor text will link not only to a related document, but to a specific XML element within it, or to the best entry point from which to start reading the referenced material. More than one link will be allowed per anchor, extending the current Wikipedia link structure. We also consider modifications to the Wikipedia page viewer so that it can support browsing with multiple links per anchor.
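
To make the outgoing-link part of the task concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions, not the method the project prescribes: it assumes candidate anchors can be found by matching the titles of existing pages against the new document's text (longest match first), and that each title maps to one or more element-level targets. The class name `LinkSketch`, the method `suggestOutgoing`, and the target identifiers in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class LinkSketch {

    /** One suggested anchor: its text, its character offset, and one or more targets. */
    static final class Suggestion {
        final String anchor;
        final int offset;
        final List<String> targets;

        Suggestion(String anchor, int offset, List<String> targets) {
            this.anchor = anchor;
            this.offset = offset;
            this.targets = targets;
        }
    }

    /**
     * Scans the text left to right and, at each word start, picks the longest
     * title that matches on word boundaries. Assumption: keys of titleIndex are
     * lower-case page titles; values are hypothetical element-level targets.
     */
    static List<Suggestion> suggestOutgoing(String text, Map<String, List<String>> titleIndex) {
        String lower = text.toLowerCase(Locale.ROOT);
        List<Suggestion> out = new ArrayList<>();
        int pos = 0;
        while (pos < lower.length()) {
            String best = null;
            if (isBoundary(lower, pos - 1)) {
                for (String title : titleIndex.keySet()) {
                    boolean matches = !title.isEmpty()
                            && lower.startsWith(title, pos)
                            && isBoundary(lower, pos + title.length());
                    if (matches && (best == null || title.length() > best.length())) {
                        best = title;
                    }
                }
            }
            if (best == null) {
                pos++;
            } else {
                // Several targets per anchor are allowed, as the task describes.
                out.add(new Suggestion(text.substring(pos, pos + best.length()), pos, titleIndex.get(best)));
                pos += best.length();
            }
        }
        return out;
    }

    /** True if index i lies outside the text or holds a non-alphanumeric character. */
    private static boolean isBoundary(String s, int i) {
        return i < 0 || i >= s.length() || !Character.isLetterOrDigit(s.charAt(i));
    }

    public static void main(String[] args) {
        // Hypothetical index: file names and element paths are made up for illustration.
        Map<String, List<String>> index = Map.of(
                "xml", List.of("XML.xml#/article[1]", "XML.xml#/article[1]/body[1]/section[2]"),
                "information", List.of("Information.xml#/article[1]"),
                "information retrieval", List.of("Information_retrieval.xml#/article[1]/body[1]/section[1]"));
        String text = "XML retrieval extends information retrieval to structured documents.";
        for (Suggestion s : suggestOutgoing(text, index)) {
            System.out.println(s.offset + " \"" + s.anchor + "\" -> " + s.targets);
        }
    }
}
```

A real system would replace the linear scan over titles with an index over the collection and would rank candidate targets rather than return all of them. Incoming links could be approached symmetrically, by searching the existing documents for anchors that match the new document.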

Good Java, XML, and database skills are required.

Study level

Honours

Supervisors

QUT

Organisational unit

Science and Engineering Faculty

Research area

Computer Science

Contact

Please contact the supervisor.