Semantic-based source code embeddings for software vulnerability discovery

Study level

Master of Philosophy
Honours

Faculty/School

Faculty of Science

School of Computer Science

Topic status

We're looking for students to study this topic.

Supervisors

Associate Professor Yue Xu

Position: Associate Professor
Division / Faculty: Faculty of Science

Overview

Operational Technology (OT) is a field of computing that is becoming increasingly prominent in modern society. It underpins a variety of critical services, especially in industrial contexts, including power generation, manufacturing and transport. This role makes OT an especially tempting target for attackers. To counter this, tools must be developed to locate vulnerabilities and flaws in OT software systems before attacks can be launched. Vulnerability discovery in software systems, including OT systems, however, remains a challenging and unsolved problem.

Recently, deep learning-based models have been proposed for vulnerability discovery in software systems. One important reason for the emergence of these models is their ability to capture the semantics of source code. Deep learning models can discover latent features representing the meaning of the code that human experts may never be able to define. However, existing deep learning models are mainly developed for the vulnerability discovery task itself, not for source code representation (also called code embedding). This project investigates software vulnerability discovery based on source code embeddings using deep learning.

Research activities

In this project, we will evaluate existing deep learning-based vulnerability detection models and explore the effectiveness of semantic-based code embeddings for vulnerability discovery in OT networks.

Specifically, the project aims to:

adapt the Code2Vec method to generate code embeddings that represent source code semantically
evaluate the impact of code semantics on the accuracy of vulnerability discovery using supervised classification-based models
develop and evaluate a semi-supervised method to identify vulnerabilities in a large unlabelled dataset based on the code embeddings learnt from a small labelled dataset.
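
The Code2Vec approach named in the first aim represents a code snippet as a bag of path-contexts: triples made of two leaf tokens and the syntax-tree path that connects them. As a rough illustration only, the sketch below extracts such triples from Python source using the standard `ast` module. The function names `collect_leaves` and `path_contexts`, the `max_len` cut-off, and the `^` (up) and `_` (down) path separators are assumptions made for this example, not part of Code2Vec or of the project's planned implementation.

```python
import ast
from itertools import combinations


def collect_leaves(tree):
    """Return (token, root_to_leaf_nodes) for each identifier or literal leaf."""
    leaves = []

    def walk(node, trail):
        trail = trail + [node]
        # Treat names, function arguments and constants as leaf tokens.
        if isinstance(node, ast.Name):
            leaves.append((node.id, trail))
            return
        if isinstance(node, ast.arg):
            leaves.append((node.arg, trail))
            return
        if isinstance(node, ast.Constant):
            leaves.append((repr(node.value), trail))
            return
        for child in ast.iter_child_nodes(node):
            walk(child, trail)

    walk(tree, [])
    return leaves


def path_contexts(source, max_len=8):
    """Return (token, path, token) triples for every pair of leaves.

    max_len is a hypothetical cut-off on the number of nodes in a path.
    """
    leaves = collect_leaves(ast.parse(source))
    contexts = []
    for (tok_a, trail_a), (tok_b, trail_b) in combinations(leaves, 2):
        # Count the shared prefix; its last node is the lowest common ancestor.
        k = 0
        while k < min(len(trail_a), len(trail_b)) and trail_a[k] is trail_b[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(trail_a[k:])]
        top = type(trail_a[k - 1]).__name__
        down = [type(n).__name__ for n in trail_b[k:]]
        if len(up) + 1 + len(down) <= max_len:
            path = "^".join(up + [top]) + "_" + "_".join(down)
            contexts.append((tok_a, path, tok_b))
    return contexts


snippet = "def f(a, b):\n    return a + b\n"
contexts = path_contexts(snippet)
for ctx in contexts:
    print(ctx)
```

In Code2Vec itself, each path-context is then mapped to a learned vector and the vectors are combined with attention into a single code embedding; that learning step is not shown here.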

Outcomes

Upon conclusion of this research project, we expect:

to have improved models or algorithms that generate code embeddings representing source code semantically
to have developed a semi-supervised method that identifies vulnerabilities in a large unlabelled dataset based on the code embeddings learnt from a small labelled dataset.
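
The project description does not say which semi-supervised technique will be used. One common option is self-training, where a classifier fitted on the small labelled set assigns pseudo-labels to the unlabelled items it is confident about and is then refitted. The sketch below shows that idea with a nearest-centroid classifier over toy two-dimensional "embeddings". The function names, the `margin` confidence threshold and the data are all hypothetical and chosen only for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def self_train(labelled, unlabelled, margin=0.5, max_rounds=10):
    """Pseudo-label unlabelled vectors by repeated nearest-centroid assignment.

    labelled:   list of (vector, label), label 0 (benign) or 1 (vulnerable);
                assumed to contain at least one example of each class.
    unlabelled: list of vectors.
    Returns a dict mapping unlabelled index -> pseudo-label.
    """
    labelled = list(labelled)
    pending = dict(enumerate(unlabelled))
    assigned = {}
    for _ in range(max_rounds):
        cents = {
            y: centroid([v for v, lab in labelled if lab == y]) for y in (0, 1)
        }
        confident = []
        for i, v in pending.items():
            d0 = math.dist(v, cents[0])
            d1 = math.dist(v, cents[1])
            # Accept a pseudo-label only when one centroid is clearly closer.
            if abs(d0 - d1) >= margin:
                confident.append((i, 0 if d0 < d1 else 1))
        if not confident:
            break
        for i, y in confident:
            labelled.append((pending.pop(i), y))
            assigned[i] = y
    return assigned


# Toy data: two labelled embeddings and four unlabelled ones.
labelled = [([0.0, 0.0], 0), ([1.0, 1.0], 1)]
unlabelled = [[0.1, 0.0], [0.9, 1.0], [0.5, 0.5], [0.2, 0.1]]
pseudo = self_train(labelled, unlabelled)
print(pseudo)  # the ambiguous midpoint (index 2) stays unlabelled
```

A realistic pipeline would replace the toy vectors with embeddings produced by the code-embedding model and would likely use a stronger classifier. The confidence threshold controls how quickly pseudo-labelling errors can accumulate.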

Skills and experience

To be considered for this project, we expect you to have:

knowledge of data mining and machine learning
knowledge of networking
good programming skills (preferably in Python or C#)

Contact

Contact the supervisor for more information.
