The QUT Ecoacoustics research group collects a massive amount of passively recorded environmental audio data. The data, currently 93 TB in size, constitutes more than 46 years of combined environmental monitoring. This audio data is analyzed so that ecologists may scale their observations of the environment.
However, as with all data-intensive projects, the data is not perfect. One of our larger collections of data, collected from the Sturt desert, has been misdated. The result is that for large sub-sections of the data, audio events (like the dawn chorus) do not occur at the correct time. The resulting confusion has made the data very difficult for our ecologist stakeholders to use. The information normally collected to confirm the starting time of these recordings was lost.
In this project, you will attempt to identify and measure a consistent soundmark in the recordings (like the dawn chorus). Then you will look for deviations in the time of day at which this soundmark occurs, relative to the surrounding days, in order to identify badly dated data.
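As a rough illustration of the idea, the sketch below compares each day's soundmark time with the median of its neighbouring days and reports the shift needed to bring it back into line. It assumes the soundmark time has already been measured for each day. The function name `estimate_shifts`, the `window` and `tolerance` defaults, and the sample values are all hypothetical; this is one possible heuristic, not the method the project prescribes.

```python
from statistics import median


def estimate_shifts(soundmark_minutes, window=3, tolerance=30.0):
    """Flag days whose soundmark time deviates from surrounding days.

    soundmark_minutes: soundmark times in minutes after midnight,
        one value per day, in date order (assumed already measured).
    window: neighbouring days to use on each side (hypothetical default).
    tolerance: deviation in minutes that counts as misdated
        (hypothetical default).
    Returns one (needs_correction, shift_minutes) tuple per day.
    """
    results = []
    for i, observed in enumerate(soundmark_minutes):
        lo = max(0, i - window)
        hi = min(len(soundmark_minutes), i + window + 1)
        # Surrounding days only; the day under test is excluded.
        neighbours = soundmark_minutes[lo:i] + soundmark_minutes[i + 1:hi]
        if not neighbours:
            results.append((False, 0.0))
            continue
        expected = median(neighbours)
        shift = expected - observed  # minutes to add to the recorded time
        flagged = abs(shift) > tolerance
        results.append((flagged, shift if flagged else 0.0))
    return results


# Made-up example: day 4 appears three hours late.
times = [330, 331, 329, 510, 330, 332, 331]
for day, (flagged, shift) in enumerate(estimate_shifts(times), start=1):
    print(day, flagged, shift)
```

A local median only catches short runs of bad days. Where a misdated run is longer than the window, the neighbouring days agree with one another and nothing is flagged, so a wider window or an external reference would be needed.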
You can expect to:
- Process large amounts of CSV data (upwards of 60GB)
- Try a range of techniques for identifying soundmarks (using your choice of heuristics, statistics, or machine learning)
- Do this work in R, C#, F#, or Python
- Contribute to a research paper detailing your work
- Work in a group with other computer scientists
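CSV data of this size will not fit in main memory, so it generally has to be streamed. The sketch below reads rows one at a time and keeps only a small running summary per recording. The column names `recording_id`, `offset_seconds`, and `index_value` are hypothetical, as is the idea of taking the peak value as a stand-in for a soundmark; the real files may be laid out quite differently.

```python
import csv


def peak_offset_per_recording(rows):
    """Return the offset (seconds) of the largest index value per recording.

    rows: any iterable of dict-like rows with the hypothetical keys
        'recording_id', 'offset_seconds', and 'index_value'.
    Only one (offset, value) pair is held per recording, so memory use
    grows with the number of recordings, not the number of rows.
    """
    best = {}
    for row in rows:
        rec = row["recording_id"]
        offset = float(row["offset_seconds"])
        value = float(row["index_value"])
        if rec not in best or value > best[rec][1]:
            best[rec] = (offset, value)
    return {rec: offset for rec, (offset, _) in best.items()}


def peak_offsets_from_csv(path):
    """Stream a CSV file from disk without loading it whole."""
    with open(path, newline="") as handle:
        # DictReader yields one row at a time as it reads the file.
        return peak_offset_per_recording(csv.DictReader(handle))
```

Because `peak_offset_per_recording` accepts any iterable of rows, the same logic can be tried on a small in-memory sample before it is run against the full dataset.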
At the end of the project, all recordings in the selected dataset should be classified as either having the correct date-time or needing a correction to their date-time. The amount of time shift needed for each file should also be output. The method you use to identify the soundmarks should be written up (with assistance) into a research paper.
Skills and experience
- You should be able to program in at least one of the following languages: R, C#, F#, or Python. Experience with machine learning or statistics is desirable but not required; there will be some opportunity to learn as you go.
- You should be keen to work with challenging datasets that are too big to comprehend or fit into main memory.
- This project will produce real, applied outcomes for a dataset that is used by ecologists. If you're interested in practical outcomes then this is a good choice for you.
Contact the supervisor for more information.