
Automatically Assessing the Trustworthiness of Written Documents

Study level

PhD

Master of Philosophy

Honours

Faculty/Lead unit

Science and Engineering Faculty

School of Information Systems

Topic status

We're looking for students to study this topic.

Supervisors

Professor Peter Bruza
Position
Professor
Division / Faculty
Science and Engineering Faculty
Professor Colin Fidge
Position
Professor
Division / Faculty
Science and Engineering Faculty

Overview

In a world increasingly flooded by "fake news" and unreliable information, there is a pressing need for automated ways of assessing the trustworthiness of documents.

Natural language machine learning is still a long way from being able to understand the contents of written text, let alone assess its reliability.

Nonetheless, we can exploit various data sources available online to help us judge the quality of documents.

This project will take an "all sources" approach to the problem by showing how to assess the trustworthiness of written documents in multiple ways.

Research activities

Although automatically understanding the meaning of a document in order to assess its quality or trustworthiness is infeasible at present, even with machine learning techniques, there are a number of algorithmic processes that could be applied immediately to help assess a document's quality.

This project will attempt to develop some or all of the following approaches (from easiest to hardest):

  • quantifying the reputation of the author through sources such as Google Scholar
  • quantifying the reputation of the publication venue
    • using sources such as journal rankings, "website credibility" ratings and Google Scholar metrics
  • quantifying the reliability of other documents cited to support the current one
  • confirming the veracity of citations in the document by searching online for the source documents
  • assessing the cohesiveness of the document by matching synonyms and related phrases in sentences to determine which parts of the document concern the same topic
  • identifying the components of the document
    • i.e. premises, reasoning steps and conclusions, by applying a natural language "argumentation mining" algorithm
  • quantifying the quality of argumentation within the document by applying a "labeled argumentation framework" algorithm to propagate trustworthiness values from the premises through the reasoning steps to the claimed conclusions (a minimal sketch of this propagation step appears after this list).

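To give a flavour of the final approach, here is a minimal, illustrative Python sketch of trust propagation. It assumes the argument structure has already been extracted (for example, by an argumentation mining step) and that each premise carries an initial trust score between 0 and 1; the ArgumentNode class, the propagate_trust function and the "weakest link" propagation rule are illustrative assumptions, not the project's prescribed method.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArgumentNode:
    """A premise, reasoning step or conclusion in an extracted argument graph."""
    name: str
    trust: Optional[float] = None            # known for premises, computed otherwise
    supports: List["ArgumentNode"] = field(default_factory=list)


def propagate_trust(node: ArgumentNode) -> float:
    """Compute a node's trust from its supporting nodes (assumes an acyclic graph)."""
    if node.trust is not None:
        return node.trust
    if not node.supports:
        return 0.0  # an unscored, unsupported claim earns no trust
    # Conservative rule: a conclusion is only as trustworthy as its weakest support.
    node.trust = min(propagate_trust(parent) for parent in node.supports)
    return node.trust


if __name__ == "__main__":
    # Premise scores might come from author or venue reputation measures.
    p1 = ArgumentNode("premise: peer-reviewed survey", trust=0.9)
    p2 = ArgumentNode("premise: anonymous blog post", trust=0.4)
    step = ArgumentNode("reasoning step", supports=[p1, p2])
    conclusion = ArgumentNode("claimed conclusion", supports=[step])
    print(f"Conclusion trust: {propagate_trust(conclusion):.2f}")  # prints 0.40

In practice, the propagation rule itself (minimum, weighted average, probabilistic combination) is one of the questions this project could investigate.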
This is a large and complex project, and it can be undertaken at Honours, Masters or PhD level.

An Honours project could attempt just one of the aims above, a Masters project could tackle a few, and a PhD might look at them all.

Feel free to discuss with the supervisors which parts interest you.

Outcomes

Ultimately the intention is to develop a program which can quantify the trustworthiness of non-trivial written documents, such as corporate reports and government white papers.

Previous research has looked at documents with more rigid structures such as academic and legal publications.

Ideally such a program would be able to explain its conclusions through appropriate visualisations that highlight and link related parts of the text.

Skills and experience

For this project we require an applicant with strong programming skills.

For Honours or Masters level research, experience with scraping and parsing web documents would be very helpful, as would experience in natural language machine learning.

For PhD level research, a sound knowledge of formal logic would be beneficial.

While not essential, a background in linguistics or text analytics would also be an advantage.

Scholarships

You may be able to apply for a research scholarship in our annual scholarship round.

Contact

Contact the supervisors for more information.