Shotgun metagenomic sequencing has become commonplace in studies of microbial communities, their relationship with the health of our planet, and their direct effects on our own health. More than 180,000 shotgun metagenomes are currently publicly available. Until recently, however, treating these data as a resource was challenging because of their extreme size (>700 trillion base pairs).
We have recently developed a tool that efficiently converts this base pair information into a straightforward assessment of which microorganisms are present in a sample and at what abundances. This dataset can now be searched in milliseconds.
Having completed this first step of tabulating a microbial community profile for each of these datasets, we can now treat this public resource as an enormous dataset that informs our understanding of the world's microbiomes and their properties.
This project will apply machine learning tools to predict one particular property of these microbial communities – the temperature at which they grow.
The main challenge previously encountered when applying predictive algorithms is that the properties of each community (e.g. is it derived from a human faecal sample? What is the carbon dioxide concentration?) are sometimes missing.
The first task will be to determine how the temperature of each sample can be obtained or imputed, using metadata directly associated with the sample, climate modelling and/or semi-supervised machine learning techniques.
The second task will be to identify broad patterns in the relationship between microbial community dynamics and the assigned temperature of each sample, and to determine whether recurring patterns appear when microbial communities differ in temperature.
The work will be supported by the excellent computational resources available at the Centre for Microbiome Research (CMR), comprising >2,100 hyperthreaded CPU cores, >8 TB RAM and an NVIDIA V100 GPU spread across 9 nodes. Hardware and OS maintenance is carried out by QUT's eResearch arm, and CMR employs a system administrator for technical software support.
Skills and experience
You should have some knowledge of a programming language such as Python or R.
Contact the supervisor for more information.