The Speech, Audio, Image and Video Technologies (SAIVT) research program is based at our Gardens Point campus.
We conduct world-class research, postgraduate training, industrial consultancy and product development in the areas of speech, audio, image and video technologies. A major focus of our research is applying machine learning and pattern recognition techniques to solve real-world problems in computer vision and speech and language processing.
Our research group was established in 1989 and has seen 52 PhD students and 10 masters-by-research students graduate in the areas of speech, audio, image and video technologies. Currently, 20 full-time PhD students are enrolled in the group.
For general enquiries, information on PhD programs within SAIVT, or to arrange a consultation, contact Professor Sridha Sridharan.
We are accepting PhD students in:
You must have:
The PhD research will be conducted in the Speech, Audio, Image and Video Technologies (SAIVT) laboratories within the School of Electrical Engineering and Computer Science.
For more information on research projects or PhD studies, or to arrange a consultation, contact our Program Leader, Professor Sridha Sridharan.
If you are interested in applying for a PhD with us, email an expression of interest with your CV attached (including details of your undergraduate studies and your cumulative grade point average), along with copies of your academic transcripts for all undergraduate and postgraduate courses.
Research study within the SAIVT research group is well supported by funding to purchase resources required for research, and to attend national and international conferences.
3-6 month PhD internships at prestigious institutions are also available.
Scholarships to support PhD studies, which include a living allowance and tuition fees, are available for both domestic and international students. Find out more about scholarships and financial support.
Our research has been funded by:
Our researchers collaborate extensively with:
Our research has led to the publication of 6 book chapters, over 80 journal papers, and over 400 conference papers to date.
Our database collection is freely available to download and includes installation instructions. For more information on our databases, contact Dr David Dean or Dr Simon Denman.
In addition to citing our paper, we request the following text be included in your publications:
'We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-SoftBio database for our research'.
Download and unzip this archive
At this point, you should have the following data structure and the SAIVT-SoftBio database is installed:
SAIVT-SoftBio
+-- Calibration
|   +-- C1--U1-17
|   +-- C2--U18-48
|   ...
|   +-- C10--U140-152
+-- Uncontrolled
|   +-- Subject001
|   +-- Subject002
|   +-- Subject003
|   ...
|   +-- Subject152
+-- Bialkowski2012 - A database for person re-identification in multi-camera surveillance networks.pdf
+-- LICENSE.txt
+-- README.txt
+-- SAIVTSoftBioDatabase.xml
The 'Calibration' directory contains camera calibration and background images (one image per camera) for the dataset. It is arranged into groups of subjects (e.g. C1--U1-17 contains camera calibration and background images valid for subjects 1 to 17). All camera calibration has been calculated using Tsai's method.
The 'Uncontrolled' directory contains the image sequences for each subject, arranged by camera view.
The 'SAIVTSoftBioDatabase.xml' file defines the database. This file specifies the number of cameras used and number of calibrations present, the regions of interest for each camera (<camera> tags), the location of the calibration information (<calibration> tags), and the subjects themselves (<uncontrolledsubject> tags). Note that for each subject, a camera calibration is specified.
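For illustration, the XML can be walked with Python's standard library. This is a minimal sketch based only on the tag names described above; the attributes present will depend on your copy of the file, so verify them before relying on this.

import xml.etree.ElementTree as ET

# Parse the database definition file described above.
tree = ET.parse('SAIVT-SoftBio/SAIVTSoftBioDatabase.xml')
root = tree.getroot()

# Regions of interest for each camera.
for camera in root.iter('camera'):
    print('camera:', camera.attrib)

# Location of the calibration information.
for calibration in root.iter('calibration'):
    print('calibration:', calibration.attrib)

# The subjects; each specifies the camera calibration valid for it.
for subject in root.iter('uncontrolledsubject'):
    print('subject:', subject.attrib)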
This distribution contains the QUT-NOISE database and the code required to create the QUT-NOISE-TIMIT database from the QUT-NOISE database and a locally installed copy of the TIMIT database. It also contains code to create the QUT-NOISE-SRE protocol on top of an existing speaker recognition evaluation database (such as NIST evaluations). Further information on the QUT-NOISE and QUT-NOISE-TIMIT databases is available in our paper: D. Dean, S. Sridharan, R. Vogt, M. Mason (2010) "The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms", in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.
This paper is also available in the file: docs/Dean2010, The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithm.pdf, distributed with this database.
Further information on the QUT-NOISE-SRE protocol is available in our paper: D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015) "The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition", in Proceedings of Interspeech 2015, September, Dresden, Germany.
This paper is also available in the file: docs/Dean2015, The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition.pdf, distributed with this database.
The QUT-NOISE data itself is licensed CC-BY-SA, and the code required to create the QUT-NOISE-TIMIT database and QUT-NOISE-SRE protocols is licensed under the BSD license. Please consult the appropriate LICENSE.txt files (in the code and QUT-NOISE directories) for more information. To attribute this database, please include the following citation: D. Dean, S. Sridharan, R. Vogt, M. Mason (2010) "The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms", in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.
If your work is based upon the QUT-NOISE-SRE, please also include this citation: D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015) "The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition", in Proceedings of Interspeech 2015, September, Dresden, Germany.
Download the following QUT-NOISE*.zip files:
Please unzip all QUT-NOISE*.zip files into the same directory, and you should have the following directory structure:
QUT-NOISE
+-- QUT-NOISE        (.wav files collected for QUT-NOISE)
|   +-- labels       (time labels)
|   +-- impulses     (calculated room impulse responses)
+-- QUT-NOISE-TIMIT  (will contain the QUT-NOISE-TIMIT database after installation)
+-- code             (code used to create QUT-NOISE-TIMIT)
+-- docs             (this file and the publications)
At this point, you have the QUT-NOISE database. If you wish to create the QUT-NOISE-TIMIT database, or create a database based upon the QUT-NOISE-SRE protocol, please continue reading the following sections.
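The distributed code in the 'code' directory performs the actual database creation. Purely to illustrate the underlying idea of mixing noise into clean speech at a controlled SNR, here is a minimal Python sketch; the file names and the numpy/soundfile dependencies are our assumptions, not part of the distribution.

import numpy as np
import soundfile as sf

def mix_at_snr(speech, noise, snr_db):
    # Scale the noise so the speech-to-noise power ratio equals snr_db,
    # then add it to the speech. Assumes the noise recording is at
    # least as long as the speech.
    noise = noise[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise

speech, rate = sf.read('timit_utterance.wav')   # hypothetical file name
noise, _ = sf.read('QUT-NOISE/cafe_noise.wav')  # hypothetical file name
sf.write('noisy_utterance.wav', mix_at_snr(speech, noise, 5.0), rate)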
The SAIVT-Campus dataset was captured at the Queensland University of Technology, Australia.
It contains two video files from real-world surveillance footage without any actors:
This dataset contains a mixture of crowd densities and it has been used in the following paper for abnormal event detection:
The normal activities include pedestrians entering or exiting the building, entering or exiting a lecture theatre (yellow door), and going to the counter at the bottom right. The abnormal events are caused by heavy rain outside, and include people running in from the rain, people walking towards the door to exit and then turning back, people wearing raincoats, loitering and standing near the door, and overcrowded scenes. The rain occurs only in the later part of the test dataset.
As a result, we assume that the training dataset contains only normal activities. We have manually annotated the data as follows:
SAIVT-QUTCrowdCountingDatabase
+-- Datasets
|   +-- PETS2006
|   +-- PETS2009
|   +-- QUT
+-- Results
|   +-- PETS2006
|   +-- PETS2009
|   +-- QUT
+-- LICENSE.txt
+-- README.txt
+-- Ryan2011 - Scene invariant crowd counting.pdf
+-- Ryan2012 - Scene invariant crowd counting and crowd occupancy analysis.pdf
The database is located in the 'Datasets/QUT' subdirectory. Calibration and ground truth annotation are included within this directory, as well as a clean background image and region of interest for each of the three sequences. Ground truth annotation for the PETS2006 and PETS2009 databases is contained within the 'Datasets/PETS2006' and 'Datasets/PETS2009' subdirectories. A summary follows:
Ground truth ('dot' annotations) and an initial background frame are provided for each of the following views:
- PETS 2009 View 1
- PETS 2009 View 2
- PETS 2006 View 3
- PETS 2006 View 4
- QUT Camera A
- QUT Camera B
- QUT Camera C
Each file contains a list of images and new landmark points. Each line consists of (in order) the filename, right eye X coordinate, right eye Y coordinate, left eye X coordinate, and left eye Y coordinate. These landmark files should be used in place of the landmarks provided with the MOBIO database.
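A file in this format can be read with a few lines of Python; whitespace-separated fields are assumed from the description above.

def read_landmarks(path):
    # Returns {filename: {'right_eye': (x, y), 'left_eye': (x, y)}}.
    landmarks = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 5:
                continue  # skip blank or malformed lines
            name, rx, ry, lx, ly = fields
            landmarks[name] = {'right_eye': (float(rx), float(ry)),
                               'left_eye': (float(lx), float(ly))}
    return landmarks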
The SAIVT Semantic Person Search Database was developed to provide a suitable platform to develop and evaluate techniques that search for a person using a semantic query (e.g. tall, red shirt, jeans). The database provides sequences for 110 subjects, each consisting of 30 initialisation frames (to, for instance, learn a background model), a number of annotated frames containing the target subject, and a description of the subject incorporating traits including clothing type and colour, gender, height and build.
You can read our paper on eprints.
Contact Dr Simon Denman for further information.
Download and unzip the following archives:
At this point, you should have the following data structure and the SAIVT-SoftBioSearch database is installed:
SAIVT-SoftBioSearch
+-- C1-BlackBlueStripe-BlueJeans
+-- C1-BlackShirt-PinkShorts
+-- ...
+-- C6-YellowWhiteSpotDress
+-- Calibration
+-- Data
|   +-- CultureColours
|   |   +-- Black
|   |   +-- Blue
|   |   +-- ...
|   +-- Videos
|       +-- Cam1
|       +-- Cam2
|       +-- ...
+-- Halstead 2014 - Locating People in Video from Semantic Descriptions.pdf
+-- LICENSE.txt
+-- README.txt (this document)
+-- SAIVTSoftBioDatabase.xml
Sequences for the individual subjects are contained within the directories named C[1..6]-[TorsoDescription]-[LegDescription]. There are 110 subjects captured from six different cameras. Each directory contains an XML file with the annotation for that sequence, and the images that belong to the sequence. For each sequence, the first 30 frames are reserved for updating/learning a background model, and as such have no annotation.
The 'Calibration' directory contains a camera calibration (using Tsai's method) for the six cameras used in the database.
The 'Data' directory contains additional data that may be of use. In particular, it contains a collection of colour patches within 'Data/CultureColours' that can be used to train models for a specific colour, as well as a set of patches for skin and for non-skin colours. 'Data/Videos' contains videos for each camera that can be used to learn the background. Note that for a portion of the time when the database was captured, a temporary wall was up due to construction works. This impacted the following sequences captured from cameras 1 and 6:
Additional videos for these cameras are also included and are named CamX_Wall.avi. The 'SAIVTSoftBioSearchDB.xml' file defines the database. This file specifies the cameras and their calibrations/background sequences, includes definitions for the traits/soft biometrics, and lists the sequences.
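As a sketch of how one sequence might be consumed, following the layout described above (the .png frame extension and a sortable frame naming scheme are assumptions):

import glob
import os

def sequence_frames(sequence_dir):
    # Frames sort into temporal order; the first 30 are reserved for
    # background modelling and carry no annotation.
    frames = sorted(glob.glob(os.path.join(sequence_dir, '*.png')))
    return frames[:30], frames[30:]

background, annotated = sequence_frames('SAIVT-SoftBioSearch/C1-BlackShirt-PinkShorts')
print(len(background), 'background frames,', len(annotated), 'annotated frames')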
The SAIVT-BNews database consists of multi-modal annotation for a corpus of 55 Australian broadcast news videos. For each video, metadata, speech and speaker ground truth, face timing and identity ground truth, face locations, and an on-screen text transcription are provided. The videos are not included within the archive; however, a script to automatically download them is provided. Contact Dr David Dean or Dr Simon Denman for further information.
This distribution contains the SAIVT-BNEWS database, consisting of ground truth information and metadata for a selection of 55 Australian broadcast news videos that need to be downloaded separately. Further information on the SAIVT-BNEWS database is available in our paper.
This paper is also available alongside this document in the file 'Ghaemmaghami2013, Speaker Attribution of Australian Broadcast News Data.pdf'.
The SAIVT-BNEWS ground truth information and associated metadata is licensed CC-BY-SA, and the 55 Australian broadcast news videos (downloaded separately, instructions below) are copyright All Rights Reserved by Fairfax Media.
SAIVT-BNEWS
+-- The_Sydney_Morning_Herald_MRSS_Feed
|   +-- <videoid> (for each video)
|       +-- <videoid>*.txt (video metadata)
|       +-- <videoid>*.diarref.lab (speech and speaker ground truth)
|       +-- <videoid>*.faceref.lab (face timing ground truth)
|       +-- <videoid>*.facepositions (face position ground truth)
|       +-- <videoid>*.ocrref.lab (ocr ground truth)
+-- code (python script to help download videos)
+-- docs (this file and the publication)

At this point, you have the SAIVT-BNEWS ground truth information and associated metadata. To download the associated videos, the URLs can be found using the information in the <videoid>*.txt files on the lines starting with 'media_content:'. A python script is provided in the code folder to automate this process; simply run 'python code/downloadvids.py'.
The videos will be downloaded into the appropriate SAIVT-BNEWS/The_Sydney_Morning_Herald_MRSS_Feed/ <videoid> folders.
If you aren't using the python script to download the videos, please ensure that only one file is downloaded at a time, and pause briefly between videos to ensure that the media provider doesn't blacklist your IP address.
Contains information about the video, including the title, a short summary, a link to the video's web page (link), a link to the video itself (media_content), and the id. This file has one line per field, with the field name and the value separated by a ':'.

-- Example (3123523_high.mp4.txt) --
updated: Wed, 14 March 2012 09:47:50
title: Carr crashes into Senate
summary: After being officially sworn into the Senate, former premier Bob Carr unleashed on the Opposition.
link: http://media.smh.com.au/news/national-news/carr-crashes-into-senate-3123523.html
media_content: http://mediadownload2.f2.com.au/flash/media/2012/03/13/3123523/3123523_high.mp4
id: 3123523
------------------------------------
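A file in this format is easy to read programmatically, for instance with this minimal sketch (assuming the 'field: value' layout above):

def read_metadata(path):
    metadata = {}
    with open(path) as f:
        for line in f:
            if ':' not in line:
                continue
            field, value = line.split(':', 1)  # split on the first ':' only
            metadata[field.strip()] = value.strip()
    return metadata

meta = read_metadata('3123523_high.mp4.txt')
print(meta['media_content'])  # URL of the video itself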
Diarisation ground truth
Contains information about the speakers appearing on the audio track of the video, as well as a transcription of their speech. Each line has the start and end time of the speech (in seconds), followed by a database-level unique speaker identity, and finally the speech transcription. There may be comments, indicated by a '#' in the first column, which should be ignored, and a commented header indicating the overall length of the video (in seconds).

-- Example (3123523.diarref.lab) --
#length=100.14
3.444518 10.693765 paul_bongiorno BACK INTO THE FRAY BOB CARR SWORN IN AS SENATOR SO HE CAN FULFIL A LONG TIME DREAM TO BECOME FOREIGN MINISTER
10.693765 12.922571 bob_carr I WILL BE FAITHFUL A BE A TRUE ALLEGIANCE
13.035137 17.312643 paul_bongiorno THE BIPARTISAN WELCOME WON'T DETER HIM FROM BEING A GOVERNMENT BOMB THROWER
17.312643 18.618408 bob_carr TONY ABBOTT IS LIKE THE
-- ... continues ... ----------------
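A minimal Python sketch for parsing this format (whitespace-separated fields, as in the example above, are assumed):

def read_diarisation(path):
    # Returns a list of (start, end, speaker, transcript) tuples,
    # skipping '#' comment lines such as the length header.
    segments = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            start, end, speaker, transcript = line.split(None, 3)
            segments.append((float(start), float(end), speaker, transcript))
    return segments

for start, end, speaker, text in read_diarisation('3123523.diarref.lab'):
    print(f'{speaker}: {start:.2f}-{end:.2f}s')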
Face ground truth
Contains information about the faces appearing in the video. Only faces judged to be sufficiently prominent and frontal are labelled at this stage. Each line has the start and end time of the face appearance (in seconds), followed by a database-level unique identity. Identity labels are shared between faces and speakers if they are the same person. There may be comments, indicated by a '#' in the first column, which should be ignored, and a commented header indicating the overall length of the video (in seconds).

-- Example (3123523.faceref.lab) --
#length=100.14
2.96 6.64 bob_carr
10.92 14.2 bob_carr
14.2 15.36 bob_carr
-- ... continues ... ----------------

While this file indicates the timing information of the faces in the videos, it does not contain the actual locations of the faces in the video. That information is in the matching <videoid>*.facepositions file, where each line has the time, the face id, and the top, left, height and width of the face, collected around 2.5 times per second (or every 10 frames) whenever a face is visible.

-- Example (3123523.facepositions) --
#time id top left height width
2.96 bob_carr 74 526 76 64
3.36 bob_carr 100 508 70 60
3.76 bob_carr 92 526 72 62
4.16 bob_carr 68 500 70 62
-- ... continues ... ----------------
OCR ground truth
Contains information about the on-screen text appearing in the video. Only text appearing in the lower third of the video is considered. Each line has the start and end time of the text appearance (in seconds), followed by a video-level unique OCR identity. The identity is designed to indicate when different lines of text appear in the same area within the video. There may be comments, indicated by a '#' in the first column, which should be ignored, and a commented header indicating the overall length of the video (in seconds). At this stage the OCR reference does not indicate the location of the OCR text. This may be provided in the future, and/or QUT would be happy to incorporate this information back into the ground truth if it is produced by other researchers.

-- Example (3123523.ocrref.lab) --
#length=100.14
5.3 7.7 OCR_1 PAUL BONGIORNO
5.3 7.7 OCR_1 NATIONAL AFFAIRS EDITOR
19.1 21.6 OCR_2 BOB CARR
19.1 21.6 OCR_2 FOREIGN MINISTER
-- ... continues ... ----------------
The SAIVT-Thermal Feature Detection Database contains a number of images suitable for evaluating the performance of feature detection and matching in the thermal image domain.
The database includes conditions unique to the thermal domain, such as non-uniformity noise, as well as conditions common to other domains, such as viewpoint changes, compression and blur.
You can read our paper on eprints.
Contact Dr Simon Denman for further information.
The SAIVT Thermal Feature Detection Database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
To attribute this database, please include the following citation:
In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications:
We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT Thermal Feature Detection Database for our research.
Download and unzip the following archive:
A copy of the publication can be found at http://eprints.qut.edu.au/48161/, and is also included in this package (Vidas 2011 - An exploration of feature detector performance in the thermal-infrared modality.pdf).
Related publications of interest may be found on the following webpages:
The database has the following structure:
Each of these folders contains images under a single, controlled image transformation; the acronyms for these transformations are expanded at the end of this document. The level of transformation varies (generally increasing in severity) as the numerical label of each subfolder increases.
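As an illustration of the kind of evaluation the database supports, features can be detected in a reference image and matched against a transformed version of the same scene. The sketch below uses OpenCV's ORB detector purely as an example; the file names are hypothetical.

import cv2

img_ref = cv2.imread('reference.png', cv2.IMREAD_GRAYSCALE)   # hypothetical
img_tf = cv2.imread('transformed.png', cv2.IMREAD_GRAYSCALE)  # hypothetical

# Detect keypoints and compute binary descriptors in both images.
orb = cv2.ORB_create()
kp_ref, des_ref = orb.detectAndCompute(img_ref, None)
kp_tf, des_tf = orb.detectAndCompute(img_tf, None)

# Cross-checked brute-force matching with the Hamming distance.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des_ref, des_tf)
print(len(matches), 'cross-checked matches')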
In addition to citing our paper, we request that the following text be included in your publications:
'We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-BuildingMonitoring database for our research'.
Annotated Data (8 GB)
To rejoin the individual parts, use:
Pre-processed Motion Segmentation (47 GB)
At this point, you should have the following data structure and the SAIVT-BuildingMonitoring database is installed:
SAIVT-BuildingMonitoring
+-- AnnotatedData
|   +-- P_Lev_4_Entry_Way_ip_107
|   |   +-- Frames
|   |   |   +-- Entry_ip107_00000.png
|   |   |   +-- Entry_ip107_00001.png
|   |   |   +-- ...
|   |   +-- GroundTruth.xml
|   |   +-- P_Lev_4_Entry_Way_ip_107-20140730-090000.avi
|   |   +-- perspectivemap.xml
|   |   +-- ROI.xml
|   +-- P_Lev_4_external_419_ip_52
|   +-- ...
|   +-- P_Lev_4_External_Lift_foyer_ip_70
|   |   +-- Frames
|   |   |   +-- Entry_ip107_00000.png
|   |   |   +-- Entry_ip107_00001.png
|   |   |   +-- ...
|   |   +-- GroundTruth.xml
|   |   +-- P_Lev_4_External_Lift_foyer_ip_70-20140730-090000.avi
|   |   +-- perspectivemap.xml
|   |   +-- ROI.xml
|   |   +-- VG-GroundTruth.xml
|   |   +-- VG-ROI.xml
|   +-- ...
+-- Calibration
|   +-- Lev4Entry_ip107.xml
|   +-- Lev4Ext_ip51.xml
|   +-- ...
+-- FullSequences
|   +-- P_Lev_4_Entry_Way_ip_107-20140730-090000.avi
|   +-- P_Lev_4_external_419_ip_52-20140730-090000.avi
|   +-- ...
+-- MotionSegmentation
|   +-- Lev4Entry_ip107.avi
|   +-- Lev4Entry_ip107-Full.avi
|   +-- Lev4Ext_ip51.avi
|   +-- Lev4Ext_ip51-Full.avi
|   +-- ...
+-- Denman 2015 - Large scale monitoring of crowds and building utilisation.pdf
+-- LICENSE.txt
+-- README.txt
Data is organised into two sections, AnnotatedData and FullSequences. Additional data that may be of use is provided in Calibration and MotionSegmentation.
AnnotatedData contains the two-hour sections that have been annotated (from 11am to 1pm), alongside the ground truth and any other data generated during the annotation process. Each camera has a directory, the contents of which depend on what the camera has been annotated for.
All cameras will have:
The following files exist for crowd counting cameras:
For cameras that have been annotated with a virtual gate, the following additional files are present:
The Calibration directory contains camera calibration for the cameras (with the exception of ip107, which has an uneven ground plane and is thus difficult to calibrate). All calibration is done using Tsai's method.
FullSequences contains the full sequences (9am - 5pm) for each of the cameras.
MotionSegmentation contains motion segmentation videos for all clips. Segmentation videos are provided for both the full sequences and the two-hour annotated segments. Motion segmentation is done using the ViBE algorithm. Motion videos for the entire sequence have 'Full' in the file name before the extension (e.g. Lev4Entry_ip107-Full.avi).
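As one illustration of how the segmentation videos might be used, foreground-pixel counts per frame are a common low-level feature for crowd counting. A minimal OpenCV sketch (assuming white foreground on a black background, which should be verified against the videos):

import cv2

def foreground_counts(video_path):
    counts = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        counts.append(int((grey > 127).sum()))  # foreground pixels this frame
    cap.release()
    return counts

counts = foreground_counts('SAIVT-BuildingMonitoring/MotionSegmentation/Lev4Entry_ip107.avi')
print('mean foreground pixels per frame:', sum(counts) / len(counts))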
Further information on the SAIVT-BuildingMonitoring database is available in our paper: S. Denman, C. Fookes, D. Ryan, & S. Sridharan (2015) Large scale monitoring of crowds and building utilisation: A new database and distributed approach. In 12th IEEE International Conference on Advanced Video and Signal Based Surveillance, 25-28 August 2015, Karlsruhe, Germany.
Further information about the SAIVT-DGD database is available in our paper:
Sivapalan, Sabesan, Chen, Daniel, Denman, Simon, Sridharan, Sridha, & Fookes, Clinton B. (2011) Gait energy volumes and frontal gait recognition using depth images. In Proceedings of the International Joint Conference on Biometrics, Washington DC, USA. Available at http://eprints.qut.edu.au/46382/
'We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-DGD database for our research'.
At this point, you should have the following data structure and the SAIVT-DGD database is installed:
SAIVT-DGD
+-- DGD
|   +-- depth_raw
|   +-- depth_silhouette
|   +-- volume
+-- docs
The database itself is located in the DGD subdirectory. Documentation on the database, including calibration information and a copy of our paper, is included in the docs subdirectory.
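As a rough illustration of the representation the paper builds on, binary silhouette frames can be averaged into a 2D gait energy image; the paper's gait energy volumes extend this idea to 3D. The file layout and PNG format under 'depth_silhouette' are assumptions here.

import glob
import numpy as np
from PIL import Image

def gait_energy_image(silhouette_dir):
    # Pixel-wise average of the binary silhouettes over a sequence;
    # assumes all frames share the same resolution.
    paths = sorted(glob.glob(silhouette_dir + '/*.png'))
    stack = [np.asarray(Image.open(p).convert('L'), dtype=np.float64) / 255.0
             for p in paths]
    return np.mean(stack, axis=0)

gei = gait_energy_image('SAIVT-DGD/DGD/depth_silhouette/subject01')  # hypothetical path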