Study level

  • Honours

Faculty/School

Faculty of Science

School of Mathematical Sciences

Topic status

We're looking for students to study this topic.

Supervisors

Distinguished Professor Kerrie Mengersen
Position: Professor of Statistics
Division / Faculty: Faculty of Science

Dr Robert Salomone

Overview

In the 21st century, there is an abundance of data, often containing insights that could benefit many stakeholders. Despite this opportunity, the data is often sensitive and cannot be released by organisations or government agencies due to privacy concerns. One possible solution to this dilemma is to carefully construct a 'twin' data set that contains similar information (and ideally, the same insights) as the original data set, but without directly releasing any private data.

Researchers at QUT are actively engaged in the creation of a synthetic data software framework that employs the latest developments in computer science (particularly in the fields of deep learning and privacy preservation), combined with traditional modelling ideas, to create a world-first suite of tools for rapidly experimenting with and testing models that can create high-quality synthetic data.

This project will involve learning about the methods used in the software, and subsequently testing, applying, and experimenting with the software on a practical use case. The purpose of these activities is to inform future software development, and to potentially provide an industry partner with a synthetic dataset to release to interested parties.

Research activities

During this project you will:

  • learn about the variety of models and methods used in the new software such as:
    • generative adversarial networks
    • normalising flows
    • directed graphical models
    • autoregressive neural networks
  • learn how these models can be combined, and how they can be made to preserve privacy (via a concept called differential privacy)
  • learn about methods for assessing synthetic data
  • apply these methods to a specialised case study (or case studies), and critically evaluate the synthetically generated datasets that result.
You will work alongside those who have developed the software (including faculty members, research fellows, and PhD students).
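
To give a flavour of the differential privacy concept mentioned in the list above, here is a minimal sketch of the classic Laplace mechanism applied to a counting query. It is a textbook illustration only and is not taken from the QUT software; the function names (`laplace_noise`, `private_count`) and the example ages are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential draws with rate
    1 / scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count using the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one record
    changes the true count by at most 1), so the noise scale is
    1 / epsilon. Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical example: how many people in a small dataset are aged 40 or over?
ages = [23, 35, 41, 58, 62, 19, 47]
rng = random.Random(0)
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(f"Noisy count: {noisy:.2f}")
```

Each released value is the true count plus random noise, so no single answer reveals with certainty whether any one individual is in the data. Differentially private training of generative models builds on the same idea of calibrated noise, but in a considerably more involved form.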

Outcomes

The project aims to produce one of the first case studies employing synthetic data software recently developed by QUT researchers to generate a synthetic dataset that captures the essential information in the original dataset, while preserving the privacy of the original data.

The outcome of this work would be synthetic data for a practical use case that represents the state of the art in terms of quality and privacy.

Skills and experience

It is essential that you have:

  • experience programming in a high-level programming language (preferably Python)
  • a basic understanding of statistical modelling and data visualisation (such as linear models, the R-squared statistic, and different types of plots)
  • a willingness to learn the basics of deep learning and the associated software and hardware as they relate to the synthetic data software, such as PyTorch and graphics processing units (GPUs) or cloud computing (some background knowledge in these topics is required to operate the software).

Ideally you should also have:

  • knowledge of, or an interest in learning about, machine learning and advanced statistical models
  • knowledge of, or an interest in learning about, deep generative models (such as generative adversarial networks, normalising flows, deep latent variable models), specifically in the context of tabular (not image or video) data
  • a desire to learn about advanced probability models from a high-level perspective.

As the project is at the intersection of statistics and computer science, you should have either an interest in both, or experience in one of the fields and a desire to complement it by learning some aspects of the other.

Scholarships

You may be eligible to apply for a research scholarship.

Explore our research scholarships

Contact

For more information, contact Dr Robert Salomone.