Study level

  • PhD
  • Master of Philosophy
  • Honours
  • Vacation research experience scheme


Topic status

We're looking for students to study this topic.

Research centre


Professor Yuefeng Li
Division / Faculty
Faculty of Science


A wide variety of companies now use personalized prediction models to improve customer satisfaction, with applications such as detecting cancer relapses, detecting attacks in networks (e.g., software-defined networks, SDN) and understanding customers' online shopping behaviour. However, the dramatic increase in the size and complexity of newly generated data from various sources makes it challenging for domain experts to make personalized predictions.

For example, early detection of cancer can drastically improve the chance of successful treatment. Recently, supervised deep learning has brought breakthroughs in some settings (e.g., in dealing with image data). Most supervised machine learning algorithms require a large amount of labeled data to train their classifiers. However, in many applications, domain experts can only obtain small training sets (labeled data) that come from multiple sources or sensors. Stricter regulations on data privacy and security exacerbate this data fragmentation and isolation problem: data holders are unwilling, or not permitted, to share their raw data freely for building machine learning applications, especially when multiple data types are required.

Usually, labeled data is obtained by asking human users to make judgments on unlabeled data (e.g., “what is the topic of this news article?”). In many cases, generating the large amounts of labeled data (samples) required by traditional machine learning methods is too expensive. Semi-supervised learning (SSL) is a popular way to enlarge small training sets by using unlabeled data to boost performance. However, existing SSL techniques (e.g., co-training, data augmentation or uncertainty sampling for classifiers) do not provide a systematic way to measure uncertainty in unlabeled data. Recently, unsupervised learning has been considered for deep learning, and it has been predicted that unsupervised learning will become far more important in the longer term.

Research activities

  • The project will discuss or contribute leading techniques for building high-performance learning systems that increase institutional intelligence capacity in a timely manner.
  • The research group, AI-based Data Analysis, will foster the development of students through the transfer of knowledge from world-class supervisors.
  • If you are a VRES or honours student, you will gain research experience in the areas of machine learning and data mining.
  • If you are an HDR student, you will learn how to develop world-leading research capabilities and skills for your career.


  • For VRES or honours students, the aim of the project is to develop survey reports or a conceptual model to understand current research trends or issues in deploying machine learning algorithms for personalised prediction in the fields of gynaecological oncology, software-defined networking or sentiment analysis.
  • For HDR students, the aim of the project is to develop new models or algorithms that address research gaps in one of these fields.

Skills and experience

  • You are expected to have a solid background in computer science.
  • You have Python or Java experience.
  • GPA >= 5.5



Contact the supervisor for more information.