Overview

The Speech, Audio, Image and Video Technologies (SAIVT) research program is based at our Gardens Point campus.

We conduct world-class research, postgraduate training, industrial consultancy and product development in the areas of speech, audio, image and video technologies. A major focus of our research is applying machine learning and pattern recognition techniques to solve real-world problems in computer vision and in speech and language processing.

Our research group was established in 1989 and has seen 52 PhD students and 10 Masters by Research students graduate in the areas of speech, audio, image and video technologies. Currently, 20 full-time PhD students are enrolled in the group.


Research leader
Research team
QUT External collaborators
  • Dr Simon Lucey (CMU)
  • Dr Patrick Lucey (Disney Research)
  • Dr Tim Wark (CSIRO)
Organisational unit
Lead unit: Science and Engineering Faculty
Keywords
computer vision, image processing, machine learning, face recognition, video surveillance, person tracking, biometrics, human identification, speaker recognition, spoken term detection, audio-visual processing

Contact

For general enquiries, information on PhD programs within SAIVT or to arrange a consultation, contact Professor Sridha Sridharan.

 

Details

PhD applications

We are accepting PhD students in:

  • image processing, machine learning and computer vision
  • speech processing, machine learning and speech technologies.

You must have:

  • an undergraduate degree in electrical or computer engineering or related fields with first class honours (or equivalent)
  • sound mathematical and computing skills
  • a background in signal processing.

The PhD research will be conducted in the Speech, Audio, Image and Video Technologies (SAIVT) laboratories within the School of Electrical Engineering and Computer Science.

For more information on research projects or PhD studies, or to arrange a consultation, contact our Program Leader, Professor Sridha Sridharan.

If you are interested in applying for a PhD with us, email an expression of interest with your CV attached (including details of your undergraduate studies and your cumulative grade point average), together with copies of academic transcripts of all undergraduate and postgraduate courses.

Scholarships

Research study within the SAIVT research group is well supported by funding to purchase resources required for research, and to attend national and international conferences.

3-6 month PhD internships at prestigious institutions are also available.

Scholarships to support PhD studies, which include a living allowance and tuition fees, are available for both domestic and international students. Find out more about scholarships and financial support.

Research areas

Image and video technology

  • Computer vision
  • Video surveillance
  • Multi-camera management
  • Crowd monitoring
  • Abnormal event detection
  • Person tracking
  • Vehicle tracking
  • Video event detection
  • Human identification at distance
  • Soft biometrics
  • Multimodal biometrics
  • Anti-spoofing biometrics
  • Iris recognition at a distance
  • Gait recognition
  • 2D and 3D face recognition – cooperative and uncooperative
  • Facial expression recognition
  • Face clustering and diarisation
  • Human action recognition
  • Object recognition and scene understanding
  • Multispectral and hyperspectral image analysis
  • Sports analytics
  • Image analysis for unmanned aircraft
  • 3D modelling of objects and scenes
  • Robot navigation and robot-human interaction
  • Information indexing, search, retrieval and summarisation

Speech and audio technology

  • Speech detection
  • Speech quality estimation
  • Speech enhancement (single- and multi-microphone)
  • Language identification
  • Speaker verification and identification
  • Speech recognition
  • Speech emotion detection
  • Keyword spotting/spoken term detection
  • Speaker indexing, diarisation, segmentation and clustering
  • Speaker role detection
  • Audio-visual speech recognition
  • Multimodal emotion recognition

Partnerships

Our research has been funded by:

  • Australian Research Council (ARC) – Discovery and Linkage Program
  • Australian Institute of Sport (AIS)
  • Commonwealth Scientific and Industrial Research Organisation (CSIRO)
  • CRC for Smart Services
  • National Security Science and Technology (NSST)
  • Auto CRC
  • CRC for Rail Innovation
  • Office of Naval Research (USA)
  • Department of Defence (DoD) (USA)
  • Defence Science and Technology Organisation (DSTO)
  • Telstra
  • ValidVoice Pty Ltd
  • KAZ Pty Ltd
  • Boeing Australia
  • Australian Federal Police
  • Queensland Police Services
  • Australian Customs
  • Brisbane Airport Corporation
  • Attorney General's Department
  • Genista Corporation (Japan)
  • Disney Research (USA).

Our researchers collaborate extensively with:

  • Carnegie Mellon University (CMU), USA
  • University of Sassari, Italy
  • Radboud University, Netherlands
  • Michigan State University, USA.

Publications and output

Our research has led to the publication of 6 book chapters, over 80 journal papers, and over 400 conference papers to date.

Book chapters

Refereed journal articles (2011-2015)

Refereed journal articles (2006-2010)

Refereed journal articles (2001-2005)

Refereed journal articles (1996-2000)

  • Robust Lip Tracking using Active Shape Models and Gradient Vector Flow
    • S. Lucey, S. Sridharan, and V. Chandran
    • Australian Journal of Intelligent Information Processing Systems, vol. 6, no. 3, pp. 175 ‑ 179, 2000
  • Improving speech recognition accuracy for small vocabulary applications in adverse environments
    • D. Thambiratnam and S. Sridharan
    • International Journal of Speech Technology, vol. 3, no. 2, pp. 109 ‑ 117, 2000
  • Frequency Offset Correction for HF Radio Speech Reception
    • D. Cole, S. Sridharan, and M. Moody
    • IEEE Transactions on Industrial Electronics, vol. 47, no. 2, pp. 438 ‑ 443, 2000
  • Learning object dynamics for smooth tracking of moving lip contours
    • T. Wark, S. Sridharan, and V. Chandran
    • Electronics Letters, vol. 36, no. 6, pp. 521 ‑ 522, 2000
  • Fast Search Methods for Spectral Quantization
    • J. Leis and S. Sridharan
    • Digital Signal Processing, vol. 9, no. 2, pp. 76 ‑ 88, 1999
  • Adaptive Vector Quantization for Speech Spectrum Coding
    • J. Leis and S. Sridharan
    • Digital Signal Processing, vol. 9, no. 2, pp. 89 ‑ 106, 1999
  • Very low rate speech coding using temporal decomposition
    • S. Ghaemmaghami and S. Sridharan
    • Electronics Letters, vol. 35, no. 6, pp. 456 ‑ 457, 1999
  • Coding speech at very low bitrates using temporal decomposition based spectral interpolation and mixed excitation in the LPC model
    • S. Ghaemmaghami, S. Sridharan, and V. Chandran
    • Applied Signal Processing, vol. 6, no. 4, pp. 203 ‑ 223, 1999
  • Comparing the multiple binary classifier model to other automatic speaker verification models
    • S. Slomka, P. Chatelain, and S. Sridharan
    • Applied Signal Processing, vol. 6, pp. 13 ‑ 24, 1999
  • Speech compaction using temporal decomposition
    • S. Ghaemmaghami, S. Sridharan, and V. Chandran
    • Electronics Letters, vol. 34, no. 24, pp. 2317 ‑ 2319, 1998
  • Multi channel speech separation by eigen decomposition and its application to co-talker interference removal
    • Y. Cao, S. Sridharan, and M. Moody
    • IEEE Transactions on Speech and Audio Processing, vol. 5, no. 3, pp. 209 ‑ 219, 1997
  • Position independent enhancement of reverberant speech
    • D. Cole, S. Sridharan, and M. Moody
    • Journal of the Audio Engineering Society, vol. 45, no. 3, pp. 142 ‑ 147, 1997
  • Compression of high quality audio using a hybrid LPC Discrete Wavelet Transform algorithm
    • S. Boland, M. Deriche, and S. Sridharan
    • Applied Signal Processing, no. 4, pp. 39 ‑ 55, 1997
  • Co-talker separation using the Cocktail Party Effect
    • Y. Cao, S. Sridharan, and M. Moody
    • Journal of the Audio Engineering Society, vol. 44, no. 12, pp. 1084 ‑ 1096, 1996
  • Simulation of cocktail party effect with neural network controlled iterative Wiener filter
    • Y. Cao, S. Sridharan, and M. Moody
    • IEICE Transactions on Fundamentals of Electronic Communication and Computer Sciences, vol. E79-A, no. 6, pp. 944 ‑ 946, 1996
  • The development of a new signal Processing program at the Queensland University of Technology
    • B. Boashash, S. Sridharan, and V. Chandran
    • IEEE Transactions on Education, vol. 39, no. 2, pp. 186 ‑ 191, 1996
  • A two stage fuzzy decision classifier for speaker identification
    • P. Castellano and S. Sridharan
    • Speech Communication, vol. 18, no. 2, pp. 139 ‑ 149, 1996
  • Effects of speech coding on speaker verification
    • P. Castellano and S. Sridharan
    • Electronics Letters, vol. 32, no. 6, pp. 517 ‑ 518, 1996
  • Speech enhancement using microphone array with multi-stage processing
    • Y. Cao, S. Sridharan, and M. Moody
    • IEICE Transactions on Fundamentals of Electronic Communication and Computer Sciences, vol. E79-A, no. 3, pp. 386 ‑ 394, 1996
  • A speech enhancement system incorporating neural network simulation of cocktail party effect
    • Y. Cao, S. Sridharan, and M. Moody
    • Applied Signal Processing, vol. 3, no. 3, pp. 143 ‑ 150, 1996

Refereed journal articles (1990-1995)

  • Text independent speaker identification with tensor link neural networks
    • P. Castellano and S. Sridharan
    • Applied Signal Processing, vol. 1, pp. 155 ‑ 165, 1994
  • Cryptanalysis of frequency domain analog speech scramblers
    • S. Sridharan, B. Goldburg, and E. Dawson
    • IEE Proceedings, Part I, Communication, Speech and Vision, vol. 140, no. 4, pp. 235 ‑ 239, 1993
  • Design and cryptanalysis of transform-based analog speech scramblers
    • B. Goldburg, S. Sridharan, and E. Dawson
    • IEEE Journal on Selected Areas in Communication, vol. 11, no. 5, pp. 735 ‑ 744, 1993
  • A Fast Fourier Transform based speech encryption system
    • S. Sridharan, E. Dawson, and B. Goldburg
    • IEE Proceedings, Part I, Communication, Speech and Vision, vol. 138, no. 3, pp. 215 ‑ 223, 1991
  • Design of a discrete cosine transform based speech scrambler
    • B. Goldburg, S. Sridharan, and E. Dawson
    • Electronics Letters, vol. 27, no. 7, pp. 613 ‑ 614, 1991
  • Speech encryption in the Transform domain
    • S. Sridharan, E. Dawson, and B. Goldburg
    • Electronics Letters, vol. 26, no. 10, pp. 655 ‑ 657, 1990

Refereed conference papers (2011-2015)

Refereed conference papers (2006-2010)

Refereed conference papers (2001-2005)

Refereed conference papers (1996-2000)

  • Face recognition using fractal codes
    • H. Ebrahimpour-Komleh, V. Chandran, and S. Sridharan
    • in Proc., Workshop on Signal Processing and its Applications (WoSPA), 2000
  • Development of a low cost, model-based video coding system using stereo vision
    • D. Butler and S. Sridharan
    • in Proc., APRS/IEEE Workshop on Stereo Image and Video Processing, 2000, pp. 7 ‑ 10
  • Language identification using efficient Gaussian mixture model
    • E. Wong, J. Pelecanos, S. Myers, and S. Sridharan
    • in Proc., 8th Australian International Conference on Speech, Science and Technology (SST), 2000, pp. 78 ‑ 83
  • An Improvement of Automatic Speech Reading using an Intensity to Contour Stochastic Transformation
    • S. Lucey, S. Sridharan, and V. Chandran
    • in Proc., 8th Australian International Conference on Speech, Science and Technology (SST), 2000, pp. 98 ‑ 103
  • Speech Enhancement using Near-field Superdirectivity with an Adaptive Sidelobe Canceler and Post-filter
    • I. McCowan, D. Moore, and S. Sridharan
    • in Proc., 8th Australian International Conference on Speech, Science and Technology (SST), 2000, pp. 268 ‑ 273
  • Two speaker detection by dual mixture modelling
    • S. Myers, J. Pelecanos, and S. Sridharan
    • in Proc., 8th Australian International Conference on Speech, Science and Technology (SST), 2000, pp. 300 ‑ 305
  • Rapid channel compensation for one and two speaker detection in the NIST 2000 speaker recognition evaluation
    • J. Pelecanos, S. Myers, and S. Sridharan
    • in Proc., 8th Australian International Conference on Speech, Science and Technology (SST), 2000, pp. 306 ‑ 311
  • A comparison of static and dynamic classifier performance for multi-modal speaker verification
    • T. Wark, S. Sridharan, and V. Chandran
    • in Proc., 8th Australian International Conference on Speech, Science and Technology (SST), 2000, pp. 318 ‑ 323
  • Compression of Speech for Mass Storage using Speech Recognition and Text-to-Speech
    • J. Dines, S. Sridharan, and M. Moody
    • in Proc., 8th Australian International Conference on Speech, Science and Technology (SST), 2000, pp. 392 ‑ 397
  • A Comparison of two hybrid audio coding structures incorporating discrete wavelet transforms
    • M. Mason, S. Sridharan, and V. Chandran
    • in Proc., 8th Australian International Conference on Speech, Science and Technology (SST), 2000, pp. 410 ‑ 415
  • Improving the performance of a small microphone array at low frequencies using critical band and LPC codebooks
    • Y. Cao and S. Sridharan
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000, pp. 1033 ‑ 1036
  • The use of temporal speech and lip information for multi-modal speaker identification via multi-stream HMM
    • T. Wark, S. Sridharan, and V. Chandran
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000, pp. 2389 ‑ 2392
  • Hybrid coding of mixed signals for digital covert audio surveillance
    • M. Mason, S. Sridharan, and V. Chandran
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000, pp. 3654 ‑ 3657
  • Initialised Eigenlip Estimator for Fast Lip Tracking Using Linear Regression
    • S. Lucey, S. Sridharan, and V. Chandran
    • in Proc., International Conference on Pattern Recognition (ICPR), 2000, pp. 182 ‑ 185
  • Vector Quantization Based Gaussian Modelling for Speaker Verification
    • J. Pelecanos, S. Sridharan, and V. Chandran
    • in Proc., International Conference on Pattern Recognition (ICPR), 2000, pp. 298 ‑ 301
  • A speaker independent phonetic vocoder for the English language
    • J. Dines and S. Sridharan
    • in Proc., IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 2000, pp. 696 ‑ 701
  • Narrowband speech enhancement using fricative spreading and bandwidth extension
    • M. Mason, D. Butler, S. Sridharan, and V. Chandran
    • in Proc., IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 2000, pp. 714 ‑ 717
  • Modelling Output Probability Distributions for Enhancing Speaker Recognition
    • J. Pelecanos and S. Sridharan
    • in Proc., 6th European Conference on Speech Communication and Technology (EUROSPEECH), vol. 2, 1999, pp. 999 ‑ 1002
  • Digital Coding of Covert Audio for Monitoring and Storage
    • M. Mason, S. Sridharan, and R. Prandolini
    • in Proc., 5th International Symposium on Signal Processing and its Applications (ISSPA), 1999, pp. 475 ‑ 478
  • Speech Compaction Using Vector Quantisation and Hidden Markov Models
    • D. Cole and S. Sridharan
    • in Proc., 5th International Symposium on Signal Processing and its Applications (ISSPA), 1999, pp. 479 ‑ 483
  • Enhancing Automatic Speaker Identification using Phoneme Clustering and Frame Size Selection
    • J. Pelecanos, S. Slomka, and S. Sridharan
    • in Proc., 5th International Symposium on Signal Processing and its Applications (ISSPA), 1999, pp. 633 ‑ 637
  • Chromatic Lip Tracking using a Connectivity Based Fuzzy Threshold Technique
    • S. Lucey, S. Sridharan, and V. Chandran
    • in Proc., 5th International Symposium on Signal Processing and its Applications (ISSPA), 1999, pp. 669 ‑ 673
  • Robust speaker verification via fusion of speech and lip modalities
    • T. Wark, S. Sridharan, and V. Chandran
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1999, pp. 3061 ‑ 3064
  • Robust speaker verification via asynchronous fusion of speech and lip information
    • T. Wark, S. Sridharan, and V. Chandran
    • in Proc., 2nd International Conference on Audio- and Video-Based Biometric Person Authentication Conference (AVBPA), 1999, pp. 27 ‑ 42
  • The use of Speech and Lip Modalities for Robust Speaker Verification under Adverse Conditions
    • T. Wark and S. Sridharan
    • in Proc., IEEE International Conference on Multimedia Computing and Systems (ICMCS), 1999
  • A comparison of fusion techniques for speaker identification
    • S. Slomka, S. Sridharan, and V. Chandran
    • in Proc., 5th International Conference on Spoken Language Processing (ICSLP), 1998, pp. 225 ‑ 228
  • Modelling output probability distributions to improve small vocabulary speech recognition systems
    • D. Thambiratnam and S. Sridharan
    • in Proc., 5th International Conference on Spoken Language Processing (ICSLP), 1998, pp. 373 ‑ 376
  • Improving speaker identification performance in reverberant conditions using lip information
    • T. Wark and S. Sridharan
    • in Proc., 5th International Conference on Spoken Language Processing (ICSLP), 1998, pp. 895 ‑ 898
  • Hierarchical Temporal Decomposition - A novel approach to efficient compression of spectral characteristics of speech
    • S. Ghaemmaghami, M. Deriche, and S. Sridharan
    • in Proc., 5th International Conference on Spoken Language Processing (ICSLP), 1998, pp. 2567 ‑ 2570
  • Speech enhancement using critical band spectral subtraction
    • L. Singh and S. Sridharan
    • in Proc., 5th International Conference on Spoken Language Processing (ICSLP), 1998, pp. 2827 ‑ 2830
  • On the convergence of Gaussian Mixture Models; Improvements through vector quantisation
    • J. Moody, S. Slomka, J. Pelecanos, and S. Sridharan
    • in Proc., 5th International Conference on Spoken Language Processing (ICSLP), 1998, pp. 3185 ‑ 3188
  • Two novel lossless algorithms to exploit index redundancy in VQ speech compression
    • S. Sridharan and J. Leis
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1998, pp. 57 ‑ 60
  • A syntactic approach to automatic lip feature extraction for speaker identification
    • T. Wark and S. Sridharan
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1998, pp. 3693 ‑ 3696
  • An approach to statistical lip modelling for speaker identification via chromatic feature extraction
    • T. Wark, S. Sridharan, and V. Chandran
    • in Proc., IEEE International Conference on Pattern Recognition (ICPR), 1998, pp. 123 ‑ 125
  • Hybrid audio coding using the discrete wavelet transform and vector quantised residuals
    • M. Mason and S. Sridharan
    • in Proc., IEEE International Workshop for Intelligent Signal Processing and Communication Systems, 1998, pp. 606 ‑ 610
  • Bispectrum based cepstral coefficients for robust speaker recognition
    • M. Phythian, V. Chandran, and S. Sridharan
    • in Proc., IEEE International Workshop for Intelligent Signal Processing and Communication Systems, 1998, pp. 611 ‑ 615
  • A two-stage classifier for adaptive fusion of speech and lip information for robust speaker identification
    • T. Wark, S. Sridharan, and V. Chandran
    • in Proc., IEEE International Workshop for Intelligent Signal Processing and Communication Systems, 1998, pp. 611 ‑ 615
  • Postcode segmentation and recognition using projects and bispectral features
    • L. Tieu, C. Rielly, V. Chandran, and S. Sridharan
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 47 ‑ 50
  • Effects of speech coding on text independent speaker recognition
    • M. Phythian, J. Ingram, and S. Sridharan
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 137 ‑ 1140
  • The effect of language on Speaker Identification
    • P. Barger, S. Sridharan, and M. Moody
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 141 ‑ 144
  • Automatic gender Identification for Language independence
    • S. Slomka and S. Sridharan
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 145 ‑ 148
  • Speech recognition in adverse environment using lip information
    • D. Thambiratnam, T. Wark, and S. Sridharan
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 149 ‑ 152
  • Person Authentication using Lip information
    • T. Wark, D. Thambiratnam, and S. Sridharan
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 153 ‑ 156
  • Robust speaker identification using multi-microphone systems
    • P. Barger and S. Sridharan
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 261 ‑ 264
  • Speech Enhancement for Forensic Applications using dynamic time warping and wavelet packet analysis
    • L. Singh and S. Sridharan
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 475 ‑ 478
  • Application of noise reduction techniques for alaryngeal speech enhancement
    • D. Cole, S. Sridharan, and M. Moody
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 491 ‑ 494
  • Combined coding of audio and speech signal using LPC and the discrete wavelet transform
    • M. Mason, S. Boland, S. Sridharan, and M. Deriche
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 747 ‑ 750
  • Speech enhancement using pre-processing
    • L. Singh and S. Sridharan
    • in Proc., IEEE Region 10 Conference. Speech and Image Technologies for Computing and Telecommunications (TENCON), 1997, pp. 755 ‑ 758
  • Automatic gender identification under adverse conditions
    • S. Slomka and S. Sridharan
    • in Proc., 5th European Conference on Speech Communication and Technology (EUROSPEECH), 1997, pp. 2307 ‑ 2310
  • Robust enhancement of reverberant speech using iterative noise removal
    • D. Cole, M. Moody, and S. Sridharan
    • in Proc., 5th European Conference on Speech Communication and Technology (EUROSPEECH), 1997, pp. 2603 ‑ 2066
  • Telephone based speaker recognition using multiple binary classifier and Gaussian mixture models
    • P. Castellano, S. Slomka, and S. Sridharan
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1997, pp. 1075 ‑ 1078
  • Speech compression with preservation of speaker identity
    • J. Leis, M. Phythian, and S. Sridharan
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1997, pp. 1711 ‑ 1714
  • Speech separation by simulating the cocktail party effect with a neural network controlled Wiener filter
    • Y. Cao, S. Sridharan, and M. Moody
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1997, pp. 3261 ‑ 3263
  • Text independent speaker recognition using Fisher's discriminant
    • S. Ong, Y. S. Lin, M. Moody, and S. Sridharan
    • in Proc., 16th IASTED International Conference Modelling, Identification and Control, 1997, pp. 57 ‑ 59
  • Gender gates for automatic speaker recognition
    • P. Barger, S. Slomka, P. Castellano, and S. Sridharan
    • in Proc., 6th Australian International Conference on Speech, Science and Technology (SST), 1996, pp. 19 ‑ 24
  • Alternative methods for reverberant speech enhancement
    • D. Cole, M. Moody, and S. Sridharan
    • in Proc., 6th Australian International Conference on Speech, Science and Technology (SST), 1996, pp. 539 ‑ 544
  • Gender gates in Degraded environment
    • S. Slomka, P. Barger, P. Castellano, and S. Sridharan
    • in Proc., 6th Australian International Conference on Speech, Science and Technology (SST), 1996, pp. 617 ‑ 622
  • Intelligibility measurement of processed reverberant speech
    • D. Cole, M. Moody, and S. Sridharan
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 1, 1996, pp. 89 ‑ 92
  • Comparison of three discriminant models for automatic speaker verification
    • S. Slomka, P. Castellano, and S. Sridharan
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 1, 1996, pp. 325 ‑ 328
  • Comparison of four distance measures for long time text independent speaker identification
    • S. Ong, S. Sridharan, C.-H. Yang, and M. Moody
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 1, 1996, pp. 369 ‑ 372
  • Improving the effectiveness of existing noise reduction techniques using neural networks
    • M. Jones and S. Sridharan
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 1, 1996, pp. 387 ‑ 388
  • An intelligent microphone array for speech enhancement
    • Y. Cao, S. Sridharan, and M. Moody
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 1, 1996, pp. 391 ‑ 394
  • Robust speech coding for the preservation of speaker identity
    • M. Phythian, J. Leis, and S. Sridharan
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 1, 1996, pp. 395 ‑ 398
  • Enhancing the multiple binary classifier model
    • S. Slomka, P. Castellano, and S. Sridharan
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 2, 1996, pp. 537 ‑ 540
  • Speech enhancement by simulation of the cocktail party effect with a neural network controlled iterative filter
    • Y. Cao, S. Sridharan, and M. Moody
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 2, 1996, pp. 541 ‑ 544
  • A new approach to teaching signal processing at undergraduate level
    • B. Boashash, S. Sridharan, and V. Chandran
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 2, 1996, pp. 792 ‑ 795
  • Academic strategy planning for a university research centre
    • V. Chandran, S. Sridharan, and B. Boashash
    • in Proc., 4th International Symposium on Signal Processing and its Applications (ISSPA), vol. 2, 1996, pp. 797 ‑ 800
  • Speaker recognition in reverberant enclosures
    • P. Castellano, S. Sridharan, and D. Cole
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 1996, pp. 117 ‑ 120
  • An augmented multiple binary classifier model for speaker recognition
    • S. Slomka, P. Castellano, and S. Sridharan
    • in Proc., Australian Conference on Artificial Neural Networks (ACNN), 1996, pp. 39 ‑ 44
  • A Comparison of Gaussian mixture and multiple binary classifier models for speaker verification
    • S. Slomka, P. Castellano, P. Barger, S. Sridharan, and V. Narasimhan
    • in Proc., Australian and New Zealand Conference on Intelligent Information Systems (ANZIIS), 1996, pp. 216 ‑ 319

Refereed conference papers (1990-1995)

  • Improving a two stage fuzzy decision classifier
    • P. Castellano and S. Sridharan
    • in Proc., 8th Australian Joint Conference on Artificial Intelligence (AI), 1995, pp. 35 ‑ 42
  • Speech seeking microphone array with multi stage processing
    • Y. Cao, S. Sridharan, and M. Moody
    • in Proc., 4th European Conference on Speech Communication and Technology (EUROSPEECH), 1995, pp. 1991 ‑ 1994
  • Speech enhancement by eigen decomposition with two channel observations
    • Y. Cao, S. Sridharan, and M. Moody
    • in Proc., 4th European Conference on Speech Communication and Technology (EUROSPEECH), 1995, pp. 2017 ‑ 2020
  • A real time audio enhancement system
    • A. Fisher and S. Sridharan
    • in Proc., Audio Engineering Society Convention (AES), 1995
  • Robust enhancement of reverberant speech
    • D. Cole, S. Sridharan, and M. Moody
    • in Proc., Audio Engineering Society Convention (AES), 1995
  • An overview of ISO/MPEG audio Codec
    • S. Boland, M. Deriche, and S. Sridharan
    • in Proc., Audio Engineering Society Convention (AES), 1995
  • Voiced/unvoiced/silence classification of noisy speech in real time audio processing
    • Y. Cao, S. Sridharan, and M. Moody
    • in Proc., Audio Engineering Society Convention (AES), 1995
  • Signal Processing Education at Queensland University of Technology, Australia
    • B. Boashash and S. Sridharan
    • in Proc., 28th Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 1994, pp. 1293 ‑ 1297
  • The design and development of an undergraduate signal processing teaching laboratory, Special Session on Signal Processing Education
    • S. Sridharan, V. Chandran, and M. Dawson
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 6, 1994, pp. 41 ‑ 44
  • Intelligibility of reverberant speech enhanced by inversion of room response
    • D. Cole, M. Moody, and S. Sridharan
    • in Proc., International Symposium on Speech, Image Processing and Neural Network (ISSIPNN), 1994, pp. 241 ‑ 244
  • Post Microphone array speech enhancement with adaptive filters for Forensic Applications
    • Y. Cao, S. Sridharan, and M. Moody
    • in Proc., International Symposium on Speech, Image Processing and Neural Network (ISSIPNN), 1994, pp. 253 ‑ 255
  • Confidence analysis of text independent Speaker Identification - Inspecting the effect of population size
    • S. Ong, M. Moody, and S. Sridharan
    • in Proc., International Symposium on Speech, Image Processing and Neural Network (ISSIPNN), 1994, pp. 611 ‑ 613
  • Confidence analysis of speaker Identification - The effectiveness of the various features
    • S. Ong, M. Moody, and S. Sridharan
    • in Proc., ESCA Workshop on Automatic Speaker Recognition, Identification, and Verification, 1994, pp. 91 ‑ 94
  • Text independent speaker identification with functional link neural network
    • P. Castellano and S. Sridharan
    • in Proc., ESCA Workshop on Automatic Speaker Recognition, Identification, and Verification, 1994, pp. 111 ‑ 114
  • Speech Enhancement for Forensic Applications
    • A. Fisher and S. Sridharan
    • in Proc., 5th Australian International Conference on Speech, Science and Technology (SST), vol. 1, 1994, pp. 40 ‑ 45
  • Multi channel speech signal separation by eigen decompositions and its application to co-talker interference removal
    • Y. Cao, S. Sridharan, and M. Moody
    • in Proc., 5th Australian International Conference on Speech, Science and Technology (SST), vol. 1, 1994, pp. 57 ‑ 62
  • Low bitrate speech and music coding using the wavelet transform
    • S. Boland, S. Sridharan, and M. Deriche
    • in Proc., 5th Australian International Conference on Speech, Science and Technology (SST), vol. 1, 1994, pp. 164 ‑ 169
  • Secure Speech coding for voice messaging applications
    • J. Leis, S. Sridharan, and W. Millan
    • in Proc., 5th Australian International Conference on Speech, Science and Technology (SST), vol. 1, 1994, pp. 176 ‑ 181
  • Speaker identification with project networks
    • P. Castellano and S. Sridharan
    • in Proc., 5th Australian International Conference on Speech, Science and Technology (SST), vol. 1, 1994, pp. 400 ‑ 405
  • A two stage fuzzy classifier for speaker identification
    • P. Castellano and S. Sridharan
    • in Proc., 5th Australian International Conference on Speech, Science and Technology (SST), vol. 2, 1994, pp. 456 ‑ 461
  • Measuring the intelligibility of reverberant speech enhanced by inversion of room response
    • D. Cole, M. Moody, and S. Sridharan
    • in Proc., 5th Australian International Conference on Speech, Science and Technology (SST), vol. 2, 1994, pp. 546 ‑ 550
  • Speaker Identification with artificial neural networks
    • P. Castellano and S. Sridharan
    • in Proc., SPRC Workshop on Signal Processing and its Applications (WoSPA), 1993, pp. 153 ‑ 159
  • Enhancement of single microphone recordings in small highly reverberant rooms
    • D. Cole, M. Moody, and S. Sridharan
    • in Proc., SPRC Workshop on Signal Processing and its Applications (WoSPA), 1993, pp. 279 ‑ 286
  • Multi-Microphone speech enhancement system
    • Y. Cao and S. Sridharan
    • in Proc., SPRC Workshop on Signal Processing and its Applications (WoSPA), 1993, pp. 289 ‑ 293
  • Multi-pitch estimation techniques for co-channel speech separation
    • M. Dawson and S. Sridharan
    • in Proc., SPRC Workshop on Signal Processing and its Applications (WoSPA), 1993, pp. 293 ‑ 300
  • Autoregressive time series modelling using neural nets and its application to asymmetric speech coders
    • J. Leis and S. Sridharan
    • in Proc., SPRC Workshop on Signal Processing and its Applications (WoSPA), 1993, pp. 306 ‑ 331
  • Time series modelling using neural nets and its application to asymmetric speech coders
    • J. Leis and S. Sridharan
    • in Proc., Australian and New Zealand Conference on Intelligent Information Systems (ANZIIS), 1993, pp. 192 ‑ 195
  • Design of a high speed stream cipher
    • E. Dawson, S. Sridharan, and B. Caelli
    • in Proc., Communications '92: Communications Technology, Services and Systems, Getting it Together, 1992, pp. 129 ‑ 134
  • Cryptanalysis of analog speech encryption systems using DSP techniques
    • S. Sridharan, B. Goldburg, and E. Dawson
    • in Proc., 3rd International Symposium on Signal Processing and its Applications (ISSPA), 1992, pp. 109 ‑ 112
  • Progressive image transmission
    • S. Sridharan and A. Ginige
    • in Proc., International Conference on Image Processing and its Applications, 1992, pp. 115 ‑ 118
  • Speech enhancement using time delay Neural networks
    • E. Dawson and S. Sridharan
    • in Proc., 4th Australian International Conference on Speech, Science and Technology (SST), 1992, pp. 152 ‑ 155
  • Speech Cryptology
    • S. Sridharan, B. Goldburg, and E. Dawson
    • in Proc., 4th Australian International Conference on Speech, Science and Technology (SST), 1992, pp. 306 ‑ 311
  • A Secure Analog Speech Scrambler Using the Discrete Cosine Transform
    • B. Goldburg, E. Dawson, and S. Sridharan
    • in Proc., International Conference on the Theory and Applications of Cryptology (Asiacrypt), 1991, pp. 299 ‑ 311
  • On the use of frequency domain vector quantization codebook for cryptanalysis of analog speech scramblers
    • B. Goldburg, S. Sridharan, and E. Dawson
    • in Proc., IEEE International Symposium on Circuits and Systems (ISCAS), vol. 1, 1991, pp. 328 ‑ 331
  • The Automated Cryptanalysis of Analog Speech Scramblers
    • B. Goldburg, E. Dawson, and S. Sridharan
    • in Proc., Workshop on the Theory and Application of Cryptographic Techniques (Eurocrypt), 1991, pp. 422 ‑ 430
  • Progressive image coding
    • S. Sridharan and A. Ginige
    • in Proc., International Radio and Electronics Engineering Convention (IREECON), 1991
  • On the use of discrete orthogonal transforms for speech encryption
    • S. Sridharan, E. Dawson, and B. Goldburg
    • in Proc., 3rd International Symposium on Signal Processing and its Applications (ISSPA), 1990, pp. 467 ‑ 471
  • Digital speech encryption using low bitrate vocoders
    • S. Sridharan, J. Fang, and E. Dawson
    • in Proc., 3rd International Symposium on Signal Processing and its Applications (ISSPA), 1990, pp. 810 ‑ 814
  • Speech encryption using discrete orthogonal transforms
    • S. Sridharan, E. Dawson, and B. Goldburg
    • in Proc., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, 1990, pp. 1647 ‑ 1650
  • Computer based speech training method for the hearing impaired
    • E. Dawson and S. Sridharan
    • in Proc., 3rd Australian International Conference on Speech, Science and Technology (SST), 1990, pp. 116 ‑ 120
  • Discrete Cosine Transform Speech encryption System
    • S. Sridharan, B. Goldburg, and E. Dawson
    • in Proc., 3rd Australian International Conference on Speech, Science and Technology (SST), 1990, pp. 472 ‑ 476

Databases

Our database collection is freely available to download and includes installation instructions. For more information on our databases, contact Dr David Dean or Dr Simon Denman.

SoftBio

Overview
The SAIVT-SoftBio database contains a collection of multi-camera sequences of 152 pedestrians captured from a set of 8 surveillance cameras. This database provides a challenging and realistic test-bed for person redetection tasks, and is freely available for download. Contact Dr Simon Denman for more information.
Licensing
The SAIVT-SoftBio database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
Attribution
To attribute this database, use the citation provided on our publication at eprints.
Acknowledgement in publications

In addition to citing our paper, we request the following text be included in your publications:

'We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-SoftBio database for our research'.

Installing the SAIVT-SoftBio Database

Download and unzip this archive

At this point, you should have the following data structure and the SAIVT-SoftBio database is installed:

SAIVT-SoftBio
+-- Calibration
|   +-- C1--U1-17
|   +-- C2--U18-48
|   ...
|   +-- C10--U140-152
+-- Uncontrolled
|   +-- Subject001
|   +-- Subject002
|   +-- Subject003
|   ...
|   +-- Subject152
+-- Bialkowski2012 - A database for person re-identification in multi-camera surveillance networks.pdf
+-- LICENSE.txt
+-- README.txt
+-- SAIVTSoftBioDatabase.xml

The 'Calibration' directory contains camera calibrations and background images (one image per camera) for the dataset. It is arranged into groups of subjects (e.g. C1--U1-17 contains the camera calibrations and background images valid for subjects 1 to 17). All camera calibrations were calculated using Tsai's method.

The 'Uncontrolled' directory contains the image sequences for each subject, arranged by camera view.

The 'SAIVTSoftBioDatabase.xml' file defines the database. This file specifies the number of cameras used and number of calibrations present, the regions of interest for each camera (<camera> tags), the location of the calibration information (<calibration> tags), and the subjects themselves (<uncontrolledsubject> tags). Note that for each subject, a camera calibration is specified.
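As an illustration only, the following minimal Python sketch (standard library only) walks the tags named above and prints whatever attributes each element carries; nothing beyond the tag names is assumed, since the exact attribute names are not documented here.

import xml.etree.ElementTree as ET

# Path follows the installed layout shown above.
tree = ET.parse('SAIVT-SoftBio/SAIVTSoftBioDatabase.xml')
root = tree.getroot()

# Regions of interest for each camera.
for camera in root.iter('camera'):
    print('camera:', camera.attrib)

# Location of the calibration information.
for calibration in root.iter('calibration'):
    print('calibration:', calibration.attrib)

# The subjects themselves, each with its associated camera calibration.
for subject in root.iter('uncontrolledsubject'):
    print('subject:', subject.attrib)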

The QUT-NOISE Databases and Protocols

Overview

This distribution contains the QUT-NOISE database and the code required to create the QUT-NOISE-TIMIT database from the QUT-NOISE database and a locally installed copy of the TIMIT database. It also contains code to create the QUT-NOISE-SRE protocol on top of an existing speaker recognition evaluation database (such as NIST evaluations). Further information on the QUT-NOISE and QUT-NOISE-TIMIT databases is available in our paper:
D. Dean, S. Sridharan, R. Vogt, M. Mason (2010) "The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms", in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.

This paper is also available in the file: docs/Dean2010, The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithm.pdf, distributed with this database.

Further information on the QUT-NOISE-SRE protocol is available in our paper:
D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015) "The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition". In Proceedings of Interspeech 2015, September, Dresden, Germany.

This paper is also available in the file: docs/Dean2015, The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition.pdf, distributed with this database.

Licensing

The QUT-NOISE data itself is licensed CC-BY-SA, and the code required to create the QUT-NOISE-TIMIT database and QUT-NOISE-SRE protocols is licensed under the BSD license. Please consult the appropriate LICENSE.txt files (in the code and QUT-NOISE directories) for more information. To attribute this database, please include the following citation:
D. Dean, S. Sridharan, R. Vogt, M. Mason (2010) "The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms", in Proceedings of Interspeech 2010, Makuhari Messe International Convention Complex, Makuhari, Japan.

If your work is based upon the QUT-NOISE-SRE, please also include this citation:
D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015) "The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition". In Proceedings of Interspeech 2015, September, Dresden, Germany.

Download and Installation

Download the following QUT-NOISE*.zip files:

Please unzip all QUT-NOISE*.zip files into the same directory, and you should have the following directory structure:

QUT-NOISE
+--QUT-NOISE (.wav files collected for QUT-NOISE)
| +--labels (time labels)
| +--impulses (calculated room impulse responses)
+--QUT-NOISE-TIMIT (will contain the QUT-NOISE-TIMIT database after installation)
+--code (code used to create QUT-NOISE-TIMIT)
+--docs (this file and the publications).

At this point, you have the QUT-NOISE database. If you wish to create the QUT-NOISE-TIMIT database, or to create a database based upon the QUT-NOISE-SRE protocol, please continue reading the following sections.

Creating QUT-NOISE-TIMIT
Obtaining TIMIT
  • In order to construct the QUT-NOISE-TIMIT database from the QUT-NOISE data supplied here you will need to obtain a copy of the TIMIT database from the Linguistic Data Consortium. If you just want to use the QUT-NOISE database, or you wish to combine it with different speech data, TIMIT is not required.
Creating QUT-NOISE-TIMIT
  • Once you have obtained TIMIT, download and install a copy of VOICEBOX: Speech Processing Toolbox for MATLAB and install it in your MATLABPATH.
  • Run matlab in the QUT-NOISE/code directory, and run the function: createQUTNOISETIMIT('/location/of/timit-cd/timit'). This will create the QUT-NOISE-TIMIT database in the QUT-NOISE/QUT-NOISE-TIMIT directory.
  • If you wish to verify that the QUT-NOISE-TIMIT database matches that evaluated in our original paper, please check that the md5sums (use md5sum on unix-based OSes) match those in the QUT-NOISE-TIMIT/md5sum.txt file; a short verification sketch follows below.
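For instance, assuming md5sum.txt follows the conventional md5sum layout of one '<hex digest>  <relative path>' entry per line (an assumption, since the exact layout is not described here), the check can also be scripted in Python:

import hashlib
import os

base = 'QUT-NOISE/QUT-NOISE-TIMIT'
with open(os.path.join(base, 'md5sum.txt')) as listing:
    for entry in listing:
        expected, name = entry.split(None, 1)
        name = name.strip().lstrip('*')  # '*' marks binary mode in some md5sum outputs
        with open(os.path.join(base, name), 'rb') as audio:
            actual = hashlib.md5(audio.read()).hexdigest()
        print('ok' if actual == expected else 'MISMATCH', name)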
Using the QUT-NOISE-SRE protocol
  • The code related to the QUT-NOISE-SRE protocol can be used in two ways:
    1. To create a collection of noisy audio files across the scenarios in the QUT-NOISE database at different noise levels, or,
    2. To recreate a list of file names based on the QUT-NOISE-SRE protocol produced by another researcher, having already done (1). This allows existing research to be reproduced without having to send large volumes of audio around.
  • If you are interested in creating your own noisy database from an existing SRE database (1 above), please look at the example script exampleQUTNOISESRE.sh in the QUT-NOISE/code directory. You will need to make some modifications, but it should give you the right idea.
  • If you are interested in creating our QUT-NOISE-NIST2008 database published at Interspeech 2015, you can find the list of created noisy files in the QUT-NOISE-NIST2008.train.short2.list and QUT-NOISE-NIST2008.test.short3.list files in the QUT-NOISE/code directory.
  • These files can be recreated as follows (provided you have access to the NIST2008 SRE data):

    Run matlab in the QUT-NOISE/code directory, and run the following functions:

    createQUTNOISESREfiles('NIST2008.train.short2.list', ...
        'QUT-NOISE-NIST2008.train.short2.list', ...
        '<location/of/NIST2008/SRE>', ...
        '../QUT-NOISE-NIST2008')
    createQUTNOISESREfiles('NIST2008.test.short3.list', ...
        'QUT-NOISE-NIST2008.test.short3.list', ...
        '<location/of/NIST2008/SRE>', ...
        '../QUT-NOISE-NIST2008')
  • This may take some time to execute, so if you have access to a computing cluster, it may be worth dividing the QUT-NOISE-NIST2008.* files into smaller chunks and running them in parallel. Just make sure that noisy testing or training files stay associated with the corresponding clean testing or training files (the NIST2008.* files); you don't need to split the clean file lists. A small splitting sketch follows below.
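Purely as a sketch of the splitting step (the chunk size is arbitrary and the output naming is ours, not part of the protocol), one of the noisy list files could be divided in Python as follows:

# Split a noisy-file list into chunks for parallel runs of createQUTNOISESREfiles.
# The clean lists (NIST2008.*) are left whole, as noted above.
chunk_size = 500  # arbitrary; choose to suit your cluster

with open('QUT-NOISE-NIST2008.train.short2.list') as listing:
    lines = listing.readlines()

for start in range(0, len(lines), chunk_size):
    part = start // chunk_size
    with open('QUT-NOISE-NIST2008.train.short2.part%03d.list' % part, 'w') as chunk:
        chunk.writelines(lines[start:start + chunk_size])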

SAIVT-Campus

Overview
The SAIVT-Campus Database is an abnormal event detection database captured on a university campus, where the abnormal events are caused by the onset of a storm. Contact Dr Simon Denman for more information.
Licensing
The SAIVT-Campus database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
Attribution
To attribute this database, please include the following citation:
Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. available at eprints.
Acknowledging the Database in your Publications
In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications:
We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-Campus database for our research.
Installing the SAIVT-Campus database
Download and unzip:
SAIVT_Campus.tar.gz (927M)
The archive should contain:
SAIVT-Campus
+-- LICENCE.txt
+-- README.txt
+-- test_dataset.avi
+-- training_dataset.avi
+-- Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf
Notes

The SAIVT-Campus dataset is captured at the Queensland University of Technology, Australia.

It contains two video files from real-world surveillance footage without any actors:

  1. training_dataset.avi (the training dataset)
  2. test_dataset.avi (the test dataset).

This dataset contains a mixture of crowd densities and it has been used in the following paper for abnormal event detection:

  • Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. Available at eprints.
    This paper is also included with the database (Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf). Both video files are one hour in duration.

The normal activities include pedestrians entering or exiting the building, entering or exiting a lecture theatre (yellow door), and going to the counter at the bottom right. The abnormal events are caused by heavy rain outside, and include people running in from the rain, people walking towards the door to exit and then turning back, people wearing raincoats, loitering and standing near the door, and overcrowded scenes. The rain occurs only in the later part of the test dataset.

As a result, we assume that the training dataset contains only normal activities. We have manually annotated the data as follows:

  • the training dataset does not have abnormal scenes
  • the test dataset separates into two parts: only normal activities occur from 00:00:00 to 00:47:16, and abnormalities are present from 00:47:17 to 01:00:00. We annotate 00:47:17 as the start time for the abnormal events, as from this time on we observe people stopping or turning back from walking towards the door to exit, which indicates that the rain outside the building has influenced the activities inside. Should you have any questions, please do not hesitate to contact Jingxin Xu. A small sketch for mapping these times onto frame indices follows below.
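For illustration only, the annotated split time can be mapped onto frame indices by reading the frame rate from the video itself. This snippet assumes OpenCV is installed; it is not part of the database.

import cv2

ABNORMAL_START_SECONDS = 47 * 60 + 17  # 00:47:17, per the annotation above

video = cv2.VideoCapture('SAIVT-Campus/test_dataset.avi')
fps = video.get(cv2.CAP_PROP_FPS)
abnormal_start_frame = int(round(ABNORMAL_START_SECONDS * fps))

frame_index = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    label = 'abnormal' if frame_index >= abnormal_start_frame else 'normal'
    # ... evaluate your detector on `frame` against `label` here ...
    frame_index += 1
video.release()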

SAIVT-QUT Crowd Counting

Overview
This database contains three sequences annotated for crowd counting, captured on a university campus. Sequences are sparsely annotated (every 100 or 200 frames) over the approximately 5,000 or 10,000 frame sequence duration. Annotation is local, in that the location of each individual person in the frames is provided. Annotation is also provided for the PETS 2006 and 2009 datasets (although please note that you will need to download these databases separately). Contact Dr Simon Denman for more information.
Licensing
The SAIVT-QUT Crowd Counting database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
Attribution
To attribute this database, please include the following citation: Ryan, David, Denman, Simon, Sridharan, Sridha, & Fookes, Clinton B. (2012) Scene invariant crowd counting and crowd occupancy analysis. In Video Analytics for Business Intelligence [Studies in Computational Intelligence, Volume 409]. Springer-Verlag, Germany, pp. 161-198. See the full paper on eprints.
Acknowledging the database in your publications
In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications: We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-QUT Crowd Counting database for our research. If you use the annotation data for either the PETS2006 or PETS2009 databases, you should also cite and acknowledge those databases in the manner outlined by their creators, in addition to citing and acknowledging this database as outlined above and in the LICENCE.txt file.
Installing the database
Download and unzip the following archive: SAIVT_QUTCrowdCountingDatabase.tar.gz (933 MB). Upon downloading and unzipping the database, you should have the following directory structure:

SAIVT-QUTCrowdCountingDatabase
+-- Datasets
|   +-- PETS2006
|   +-- PETS2009
|   +-- QUT
+-- Results
|   +-- PETS2006
|   +-- PETS2009
|   +-- QUT
+-- LICENSE.txt
+-- README.txt
+-- Ryan2011 - Scene invariant crowd counting.pdf
+-- Ryan2012 - Scene invariant crowd counting and crowd occupancy analysis.pdf

The database is located in the 'Datasets/QUT' subdirectory. Calibration and ground truth annotation are included within this directory, as well as a clean background image and region of interest for each of the three sequences. Ground truth annotation for the PETS2006 and PETS2009 databases is contained within the 'Datasets/PETS2006' and 'Datasets/PETS2009' subdirectories. A summary follows:

 

Sequence          | Video  | Calibration | ROI | Ground truth ('dot' annotations) | Initial background frame
PETS 2009 View 1  | No (1) | Yes         | Yes | Yes                              | Yes
PETS 2009 View 2  | No (1) | Yes         | Yes | Yes                              | Yes
PETS 2006 View 3  | No (2) | Yes         | Yes | Yes                              | No (3)
PETS 2006 View 4  | No (2) | Yes         | Yes | Yes                              | No (3)
QUT Camera A      | Yes    | Yes         | Yes | Yes                              | Yes
QUT Camera B      | Yes    | Yes         | Yes | Yes                              | Yes
QUT Camera C      | Yes    | Yes         | Yes | Yes                              | Yes

Noisy MOBIO Landmarks

Overview
Face landmarks for the MOBIO database (https://www.idiap.ch/dataset/mobio) with different levels of noise, provided to evaluate face recognition in the presence of localisation noise. Contact Dr Simon Denman for further information.
Licensing
The Noisy MOBIO Landmarks are © 2014 QUT and are licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
Attribution
To attribute this database, include the following citation:
K. Anantharajah, Z. Ge, C. McCool, S. Denman, C. Fookes, P. Corke, D. Tjondronegoro, S. Sridharan (2014) Local Inter-Session Variability Modelling for Object Classification. In IEEE Winter Conference on Applications of Computer Vision (WACV). Please note that authors should also cite and acknowledge the MOBIO database as outlined on the MOBIO website.
Downloading and using the Noisy MOBIO Landmarks database
Four sets of landmarks are provided, corresponding to added uniform noise equal to 2, 5, 10 and 20% of the average inter-eye distance:

Each file contains a list of images and new landmark points. Each line consists of (in order) the filename, right eye X coordinate, right eye Y coordinate, left eye X coordinate, and left eye Y coordinate. These landmark files should be used in place of the landmarks provided with the MOBIO database.
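For illustration, a minimal Python sketch for reading one of these landmark files; the field separator is assumed to be whitespace, and the file name used here is hypothetical.

landmarks = {}
with open('mobio_landmarks_noise02.txt') as listing:  # hypothetical file name
    for line in listing:
        fields = line.split()
        if not fields:
            continue
        filename = fields[0]
        # Order per the description above: right eye (x, y), then left eye (x, y).
        rx, ry, lx, ly = (float(value) for value in fields[1:5])
        landmarks[filename] = {'right_eye': (rx, ry), 'left_eye': (lx, ly)}

print(len(landmarks), 'images with noisy landmarks')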

SAIVT Semantic Person Search

Overview

The SAIVT Semantic Person Search Database was developed to provide a suitable platform to develop and evaluate techniques that search for a person using a semantic query (e.g. tall, red shirt, jeans). Sequences for 110 subjects are provided, each consisting of 30 initialisation frames (to, for instance, learn a background model), a number of annotated frames containing the target subject, and a description of the subject incorporating traits such as clothing type and colour, gender, height and build.

You can read our paper on eprints.

Contact Dr Simon Denman for further information.

Licensing
The SAIVT-SoftBioSearch database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
Attribution
To attribute this database, please include the following citation:
Halstead, Michael, Denman, Simon, Sridharan, Sridha, & Fookes, Clinton B. (2014) Locating People in Video from Semantic Descriptions : A New Database and Approach. In the 22nd International Conference on Pattern Recognition, 24 - 28 August 2014, Stockholm, Sweden.
Our paper is available on eprints.
Acknowledging the Database in your Publications
In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications:
We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-SoftBioSearch database for our research.
Installing the SAIVT-SoftBioSearch Database

Download and unzip the following archives:

At this point, you should have the following data structure and the SAIVT-SoftBioSearch database is installed:

SAIVT-SoftBioSearch
+-- C1-BlackBlueStripe-BlueJeans
+-- C1-BlackShirt-PinkShorts
+-- ...
+-- C6-YellowWhiteSpotDress
+-- Calibration
+-- Data
|   +-- CultureColours
|   |   +-- Black
|   |   +-- Blue
|   |   +-- ...
|   +-- Videos
|   |   +-- Cam1
|   |   +-- Cam2
|   |   +-- ...
+-- Halstead 2014 - Locating People in Video from Semantic Descriptions.pdf
+-- LICENSE.txt
+-- README.txt (this document)
+-- SAIVTSoftBioDatabase.xml

Sequences for the individual subjects are contained within the directories named C[1..6]-[TorsoDescription]-[LegDescription]. There are 110 subjects captured from six different cameras. Each directory contains an XML file with the annotation for that sequence, and the images that belong to the sequence. For each sequence, the first 30 frames are reserved for updating/learning a background model, and as such have no annotation.
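As a rough illustration (the directory naming follows the description above, while the image extension and the presence of a single annotation XML per directory are assumptions), the sequences could be enumerated in Python like this:

import glob
import os

root = 'SAIVT-SoftBioSearch'

# Sequence directories follow the C[1..6]-[TorsoDescription]-[LegDescription] convention.
for sequence in sorted(glob.glob(os.path.join(root, 'C[1-6]-*'))):
    frames = sorted(glob.glob(os.path.join(sequence, '*.png')))   # extension assumed
    background_frames = frames[:30]   # reserved for background modelling, no annotation
    annotated_frames = frames[30:]    # frames containing the annotated target subject
    annotations = glob.glob(os.path.join(sequence, '*.xml'))      # per-sequence annotation
    print(os.path.basename(sequence), len(background_frames), len(annotated_frames), annotations)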

The 'Calibration' directory contains a camera calibration (using Tsai's method) for the six cameras used in the database.

The 'Data' directory contains additional data that may be of use. In particular, it contains a collection of colour patches within 'Data/CultureColours' that can be used to train models for a specific colour. It also contains a set of patches for skin and for non-skin colours. 'Data/Videos' contains videos for each camera that can be used to learn the background. It should also be noted that for a portion of the time when the database was captured, a temporary wall was up due to construction works. This impacted the following sequences captured from cameras 1 and 6:

  • Camera 1: C1-GreenWhiteHorizontal-BlackPants, C1-RedCheck-BlackSkirt, C1-GreenCheck-BrownPants
  • Camera 6: C6-GreenFaceCover-Blue-Blue, C6-YellowWhiteSpotDress

Additional videos for these cameras are also included and are named CamX_Wall.avi. The 'SAIVTSoftBioSearchDB.xml' file defines the database. This file specifies the cameras and their calibrations/background sequences, includes definitions for the traits/soft biometrics, and lists the sequences.

This paper is also available alongside this document in the file 'Halstead 2014 - Locating People in Video from Semantic Descriptions.pdf'.

SAIVT-BNEWS

Overview

The SAIVT-BNEWS database consists of multi-modal annotation for a corpus of 55 Australian broadcast news videos. For each video, metadata, speech and speaker ground truth, face timing and identity ground truth, face locations, and an on-screen text transcription are provided. The videos are not included within the archive; however, a script to automatically download them is provided. Contact Dr David Dean or Dr Simon Denman for further information.

This distribution contains the SAIVT-BNEWS database, consisting of ground truth information and metadata for a selection of 55 Australian broadcast news videos that need to be downloaded separately. Further information on the SAIVT-BNEWS database is available in our paper.

This paper is also available alongside this document in the file 'Ghaemmaghami2013, Speaker Attribution of Australian Broadcast News Data.pdf'.

The SAIVT-BNEWS ground truth information and associated metadata is licensed CC-BY-SA, and the 55 Australian broadcast news videos (downloaded separately, instructions below) are copyright All Rights Reserved by Fairfax Media.

Attribution
To attribute this database, please include the following citation:
Ghaemmaghami, H., Dean, D., & Sridharan, S. (2013) Speaker attribution of Australian broadcast news data, In "Proceedings of the First Workshop on Speech, Language and Audio in Multimedia (SLAM)", CEUR Workshop Proceedings, Volume 1012, Sun SITE Central Europe, Marseille, France, pp 72-77, available at eprints.
Installation
  1. Download SAIVT-BNEWS.zip
  2. Unzip SAIVT-BNEWS.zip and you should have the following folder structure:

SAIVT-BNEWS
+-- The_Sydney_Morning_Herald_MRSS_Feed
|   +-- <videoid> (for each video)
|       +-- <videoid>*.txt (video metadata)
|       +-- <videoid>*.diarref.lab (speech and speaker ground truth)
|       +-- <videoid>*.faceref.lab (face timing ground truth)
|       +-- <videoid>*.facepositions (face position ground truth)
|       +-- <videoid>*.ocrref.lab (ocr ground truth)
+-- code (python script to help download videos)
+-- docs (this file and the publication)

At this point, you have the SAIVT-BNEWS ground truth information and associated metadata. To download the associated videos, the URLs can be found in the <videoid>*.txt files on the lines starting with 'media_content:'; a python script is provided in the code folder to automate this process. Simply run 'python code/downloadvids.py' to do so.

The videos will be downloaded into the appropriate SAIVT-BNEWS/The_Sydney_Morning_Herald_MRSS_Feed/<videoid> folders.

If you aren't using the python script to download the videos, please ensure that only one file is downloaded at a time, and pause briefly between videos to ensure that the media provider doesn't blacklist your IP address.

Data description

Video metadata

Contains information about the video, including the title, a short summary, a link to the video's web page (link), a link to the video itself (media_content), and the id.
This file has one line per field, with the field name and the value separated by a ':'.
-- Example (3123523_high.mp4.txt) --
updated: Wed, 14 March 2012 09:47:50
title: Carr crashes into Senate
summary: After being officially sworn into the Senate, former premier Bob Carr unleashed on the Opposition.
link: http://media.smh.com.au/news/national-news/carr-crashes-into-senate-3123523.html
media_content: http://mediadownload2.f2.com.au/flash/media/2012/03/13/3123523/3123523_high.mp4
id: 3123523
------------------------------------
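The key: value layout above is straightforward to read programmatically. The following is a minimal Python sketch (the helper name and file path are ours, purely for illustration; the provided code/downloadvids.py remains the supported way to fetch the videos):

-- Python sketch (illustrative) --
# Minimal sketch: parse a <videoid>*.txt metadata file into a dictionary.
# The file path below is illustrative; adjust it to your local copy.
def read_metadata(path):
    fields = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            if ':' not in line:
                continue
            key, value = line.split(':', 1)
            fields[key.strip()] = value.strip()
    return fields

meta = read_metadata('3123523_high.mp4.txt')
print(meta.get('title'))
print(meta.get('media_content'))
----------------------------------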

Diarisation ground truth

Contains information about the speakers appearing on the audio track of the video, as well as a transcription of their speech.
Each line has the start and end time of the speech (in seconds) followed by a database-level unique speaker identity and finally the speech transcription.
Lines beginning with '#' in the first column are comments and should be ignored; a commented header indicates the overall length of the video (in seconds).
-- Example (3123523.diarref.lab) --
#length=100.14
3.444518 10.693765 paul_bongiorno BACK INTO THE FRAY BOB CARR SWORN IN AS SENATOR SO HE CAN FULFIL A LONG TIME DREAM TO BECOME FOREIGN MINISTER
10.693765 12.922571 bob_carr I WILL BE FAITHFUL A BE A TRUE ALLEGIANCE
13.035137 17.312643 paul_bongiorno THE BIPARTISAN WELCOME WON'T DETER HIM FROM BEING A GOVERNMENT BOMB THROWER
17.312643 18.618408 bob_carr TONY ABBOTT IS LIKE THE
-- ... continues ... ---------------
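Since each line has a fixed 'start end speaker transcript' layout, these files can be read with a few lines of Python. The sketch below is illustrative only (the function name is ours and not part of the distribution):

-- Python sketch (illustrative) --
# Minimal sketch: read a *.diarref.lab file into (start, end, speaker, text) tuples.
def read_diar_ref(path):
    segments = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue  # skip blank lines, comments and the '#length=...' header
            start, end, speaker, text = line.split(None, 3)
            segments.append((float(start), float(end), speaker, text))
    return segments

for start, end, speaker, text in read_diar_ref('3123523.diarref.lab'):
    print('%9.3f %9.3f %-16s %s' % (start, end, speaker, text))
----------------------------------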

Face ground truth

Contains information about the faces appearing in the video. Only faces judged to be sufficiently prominent and frontal are labelled at this stage.
Each line has the start and end time of the face appearance (in seconds) followed by a database-level unique speaker identity. Identity labels are shared between faces and speakers if they are the same person.
Lines beginning with '#' in the first column are comments and should be ignored; a commented header indicates the overall length of the video (in seconds).
-- Example (3123523.faceref.lab) --
#length=100.14
2.96 6.64 bob_carr
10.92 14.2 bob_carr
14.2 15.36 bob_carr
-- ... continues ... ---------------
While this file indicates the timing information of the faces in the videos, it does not contain the actual locations of the faces in the video. That information is in the matching <videoid>*.facepositions file, where each line has the time, the face identity, and the top, left, height and width of the face, collected around 2.5 times per second (or every 10 frames) whenever a face is visible.
-- Example (3123523.facepositions) --
#time id top left height width
2.96 bob_carr 74 526 76 64
3.36 bob_carr 100 508 70 60
3.76 bob_carr 92 526 72 62
4.16 bob_carr 68 500 70 62
-- ... continues ... ----------------
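The facepositions format can be read in the same way; the following illustrative Python sketch (function name and file path are ours) converts each line into a bounding box record:

-- Python sketch (illustrative) --
# Minimal sketch: read a *.facepositions file into a list of bounding boxes.
# Column order follows the commented header: time id top left height width.
def read_face_positions(path):
    boxes = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            time, face_id, top, left, height, width = line.split()
            boxes.append({'time': float(time), 'id': face_id,
                          'top': int(top), 'left': int(left),
                          'height': int(height), 'width': int(width)})
    return boxes

boxes = read_face_positions('3123523.facepositions')
print(len(boxes), 'face observations')
----------------------------------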

OCR Ground truth

Contains information about the on-screen text appearing in the video. Only text appearing in the lower third of the video is considered.
Each line has the start and end time of the text appearance (in seconds) followed by a video-level unique ocr identity. The identity is designed to indicate when different lines of text appear in the same area within the video.
Lines beginning with '#' in the first column are comments and should be ignored; a commented header indicates the overall length of the video (in seconds).
At this stage the OCR reference does not indicate the location of the OCR text. This may be provided in the future, and/or QUT would be happy to incorporate this information back into the ground truth if it is produced by other researchers.
-- Example (3123523.ocrref.lab) --
#length=100.14
5.3 7.7 OCR_1 PAUL BONGIORNO
5.3 7.7 OCR_1 NATIONAL AFFAIRS EDITOR
19.1 21.6 OCR_2 BOB CARR
19.1 21.6 OCR_2 FOREIGN MINISTER
-- ... continues .. ---------------
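Because several lines can share the same OCR identity and timing (as in the example above), it can be convenient to group them into single captions. The following Python sketch is illustrative only (the function name is ours):

-- Python sketch (illustrative) --
# Minimal sketch: group *.ocrref.lab lines that share the same OCR identity and
# timing into a single multi-line caption.
from collections import OrderedDict

def read_ocr_ref(path):
    captions = OrderedDict()
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            start, end, ocr_id, text = line.split(None, 3)
            captions.setdefault((float(start), float(end), ocr_id), []).append(text)
    return captions

for (start, end, ocr_id), lines in read_ocr_ref('3123523.ocrref.lab').items():
    print(ocr_id, start, end, ' / '.join(lines))
----------------------------------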

SAIVT Thermal Feature Detection

Overview

The SAIVT-Thermal Feature Detection Database contains a number of images suitable for evaluating the performance of feature detection and matching in the thermal image domain.

The database includes conditions unique to the thermal domain, such as non-uniformity noise, as well as conditions common to other domains, such as viewpoint changes, compression and blur.

You can read our paper on eprints.

Contact Dr Simon Denman for further information.

Licensing

The SAIVT Thermal Feature Detection Database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.

Attribution

To attribute this database, please include the following citation:

Vidas, Stephen, Lakemond, Ruan, Denman, Simon, Fookes, Clinton B., Sridharan, Sridha, & Wark, Tim (2011) An exploration of feature detector performance in the thermal-infrared modality. In Bradley, Andrew, Jackway, Paul, Gal, Yaniv, & Salvado, Olivier (Eds.) Proceedings of the 2011 International Conference on Digital Image Computing: Techniques and Applications, IEEE, Sheraton Noosa Resort & Spa, Noosa, QLD, pp. 217-223. http://eprints.qut.edu.au/48161/
Acknowledging the database in your publications

In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications:

We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT Thermal Feature Detection Database for our research.

Installing the database

Download and unzip the following archive:

A copy of the publication can be found at http://eprints.qut.edu.au/48161/, and is also included in this package (Vidas 2011 - An exploration of feature detector performance in the thermal-infrared modality.pdf).

Related publications of interest may be found on the following webpages:

The database has the following structure:

  • Each of the ten environments is allocated its own directory.
  • Within most of these directories, thermal-infrared and visible-spectrum data is separated into the "thermal" and "visible" subdirectories respectively.
  • Within each of these subdirectories, a "profile" folder is present which contains a sequence of "ideal" (untransformed) images in 8-bit depth format.
  • The "thermal" subdirectories also contain a "pure" folder which contains identical images in their original 16-bit depth format (which is difficult to visualize).
  • Also within each "thermal" subdirectory there may be additional folders present.

Each of these folders contains images under a single, controlled image transformation, the acronyms for which are expanded at the end of this document. The level of transformation varies (generally increasing in severity) as the numerical label for each subfolder increases. A minimal directory-walking sketch in Python is given after the acronym list below.

ACRONYMS:

  • CMP Image Compression
  • GAU Gaussian Noise
  • NRM Histogram Normalization
  • NUC Non-Uniformities Noise
  • OFB Out-of-focus Blur
  • QNT Quantization Noise
  • ROT Image Rotation
  • SAP Salt and Pepper Noise
  • TOD Time of day variation
  • VPT Viewpoint change
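As referenced above, the following illustrative Python sketch walks one environment's "thermal" subdirectory and indexes the image folders ("profile", "pure" and the transformation folders). The environment directory name is an assumption to adapt to your local copy:

-- Python sketch (illustrative) --
# Minimal sketch: index the image folders under one environment's 'thermal'
# subdirectory. Folder names such as 'profile', 'pure' and the transformation
# folders are assumed to sit directly below it.
import os

def index_thermal_environment(env_dir):
    thermal_dir = os.path.join(env_dir, 'thermal')
    index = {}
    for name in sorted(os.listdir(thermal_dir)):
        sub = os.path.join(thermal_dir, name)
        if os.path.isdir(sub):
            index[name] = sorted(os.listdir(sub))  # image file names in this folder
    return index

# Example usage (the environment directory name is illustrative):
# print(sorted(index_thermal_environment('environment01').keys()))
----------------------------------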

SAIVT-BuildingMonitoring

Overview
The SAIVT-BuildingMonitoring database contains footage from 12 cameras capturing a single work day at a busy university campus building. A portion of the database has been annotated for crowd counting and pedestrian throughput estimation, and is freely available for download. Contact Dr Simon Denman for more information.
Licensing
The SAIVT-BuildingMonitoring database is © 2015 QUT, and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
Attribution
To attribute this database, please include the following citation (our publication is available at eprints):

S. Denman, C. Fookes, D. Ryan, & S. Sridharan (2015) Large scale monitoring of crowds and building utilisation: A new database and distributed approach. In 12th IEEE International Conference on Advanced Video and Signal Based Surveillance, 25-28 August 2015, Karlsruhe, Germany.
Acknowledgement in publications
In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications:

'We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-BuildingMonitoring database for our research'.

Installing the SAIVT-BuildingMonitoring Database
Download and unzip the following archives:

Annotated Data (8 GB)

  • Part 1 (2GB, md5sum: 50e63a6ee394751fad75dc43017710e8)
  • Part 2 (2GB, md5sum: 49859f0046f0b15d4cf0cfafceb9e88f)
  • Part 3 (2GB, md5sum: b3c7386204930bc9d8545c1f4eb0c972)
  • Part 4 (2GB, md5sum: 4606fc090f6020b771f74d565fc73f6d)
  • Part 5 (632 MB, md5sum: 116aade568ccfeaefcdd07b5110b815a)

Full sequences

  • Part 1 (2 GB, md5sum: 068ed015e057afb98b404dd95dc8fbb3)
  • Part 2 (2GB, md5sum: 763f46fc1251a2301cb63b697c881db2)
  • Part 3 (2GB, md5sum: 75e7090c6035b0962e2b05a3a8e4c59e)
  • Part 4 (2GB, md5sum: 34481b1e81e06310238d9ed3a57b25af)
  • Part 5 (2GB, md5sum: 9ef895c2def141d712a557a6a72d3bcc)
  • Part 6 (2GB, md5sum: 2a76e6b199dccae0113a8fd509bf8a04)
  • Part 7 (2GB, md5sum: 77c659ab6002767cc13794aa1279f2dd)
  • Part 8 (2GB, md5sum: 703f54f297b4c93e53c662c83e42372c)
  • Part 9 (2GB, md5sum: 65ebdab38367cf22b057a8667b76068d)
  • Part 10 (2GB, md5sum: bb5f6527f65760717cd819b826674d83)
  • Part 11 (2GB, md5sum: 01a562f7bd659fb9b81362c44838bfb1)
  • Part 12 (2GB, md5sum: 5e4a0d4bb99cde17158c1f346bbbdad8)
  • Part 13 (2GB, md5sum: 9c454d9381a1c8a4e8dc68cfaeaf4622)
  • Part 14 (2GB, md5sum: 8ff2b03b22d0c9ca528544193599dc18)
  • Part 15 (1.6GB, md5sum: 86efac1962e2bef3afd3867f8dda1437)

To rejoin the individual parts, use:

  • cat SAIVT-BuildingMonitoring-AnnotatedData.tar.gz.* > SAIVT-BuildingMonitoring-AnnotatedData.tar.gz
  • cat SAIVT-BuildingMonitoring-FullSequences.tar.gz.* > SAIVT-BuildingMonitoring-FullSequences.tar.gz
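If you wish to verify the downloaded parts against the md5sums listed above before rejoining them, a minimal Python sketch such as the following can be used (the glob pattern simply matches the part file names above):

-- Python sketch (illustrative) --
# Minimal sketch: compute the md5sum of each downloaded part so it can be
# compared against the values listed above before rejoining.
import glob
import hashlib

def md5sum(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

for part in sorted(glob.glob('SAIVT-BuildingMonitoring-*.tar.gz.*')):
    print(part, md5sum(part))
----------------------------------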

Pre-processed Motion Segmentation (47 GB)

At this point, you should have the following data structure and the SAIVT-BuildingMonitoring database is installed:

SAIVT-BuildingMonitoring
+-- AnnotatedData
|   +-- P_Lev_4_Entry_Way_ip_107
|   |   +-- Frames
|   |   |   +-- Entry_ip107_00000.png
|   |   |   +-- Entry_ip107_00001.png
|   |   |   +-- ...
|   |   +-- GroundTruth.xml
|   |   +-- P_Lev_4_Entry_Way_ip_107-20140730-090000.avi
|   |   +-- perspectivemap.xml
|   |   +-- ROI.xml
|   +-- P_Lev_4_external_419_ip_52
|   |   +-- ...
|   +-- P_Lev_4_External_Lift_foyer_ip_70
|   |   +-- Frames
|   |   |   +-- Entry_ip107_00000.png
|   |   |   +-- Entry_ip107_00001.png
|   |   |   +-- ...
|   |   +-- GroundTruth.xml
|   |   +-- P_Lev_4_External_Lift_foyer_ip_70-20140730-090000.avi
|   |   +-- perspectivemap.xml
|   |   +-- ROI.xml
|   |   +-- VG-GroundTruth.xml
|   |   +-- VG-ROI.xml
|   +-- ...
+-- Calibration
|   +-- Lev4Entry_ip107.xml
|   +-- Lev4Ext_ip51.xml
|   +-- ...
+-- FullSequences
|   +-- P_Lev_4_Entry_Way_ip_107-20140730-090000.avi
|   +-- P_Lev_4_external_419_ip_52-20140730-090000.avi
|   +-- ...
+-- MotionSegmentation
|   +-- Lev4Entry_ip107.avi
|   +-- Lev4Entry_ip107-Full.avi
|   +-- Lev4Ext_ip51.avi
|   +-- Lev4Ext_ip51-Full.avi
|   +-- ...
+-- Denman 2015 - Large scale monitoring of crowds and building utilisation.pdf
+-- LICENSE.txt
+-- README.txt

Data is organised into two sections, AnnotatedData and FullSequences. Additional data that may be of use is provided in Calibration and MotionSegmentation.

AnnotatedData contains the two hour sections that have been annotated (from 11am to 1pm), alongside the ground truth and any other data generated during the annotation process. Each camera has a directory, the contents of which depend on what the camera has been annotated for.

All cameras will have:

  • a video file, such as "P_Lev_4_Entry_Way_ip_107-20140730-090000.avi", which is the 2 hour video from 11am to 1pm
  • a "Frames" directory, that has 120 frames taken at minute intervals from the sequence. There are the frames that have been annotated for crowd counting. Even if the camera has not been annotated for crowd counting (i.e. P_Lev_4_Main_Entry_ip_54), this directory is included.

The following files exist for crowd counting cameras:

  • "GroundTruth.xml", which contains the ground truth in the following format:

    <qutcrowd-count-gt interval-scale="1800">
    <frame id="0">
    <ped x="175" y="112" />
    </frame>
    <frame id="1">
    <ped x="149" y="97" />
    <ped x="187" y="97" />
    </frame>
    ....
    </qutcrowd-count-gt>

    The file contains a list of annotated frames, and the location of the approximate centre of mass of any people within the frame. The "interval-scale" attribute indicates the distance between the annotated frames in the original video. A minimal parsing sketch in Python is given after this list.
  • "perspectivemap.xml", a file that defines the perspective map used to correct for perspective distortion. Parameters for a bilinear perspective map are included along with the original annotations that were used to generate the map.
  • "ROI.xml", which defines the region of interest as follows:

    <ROI num-points="8" image-width="768" image-height="576">
    <point x="0" y="152" /<
    <point x="239" y="117" />
    <point x="341" y="107" />
    <point x="428" y="110" />
    <point x="519" y="116" />
    <point x="763" y="159" />
    <point x="760" y="575" />
    <point x="0" y="575" />
    </ROI>

    This defines a polygon within the image that is used for crowd counting. Only people within this region are annotated.
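As noted above, a minimal Python sketch for reading GroundTruth.xml and ROI.xml with the standard xml.etree module is given below. The camera directory used in the paths is illustrative, and the files are assumed to be well-formed XML as shown above:

-- Python sketch (illustrative) --
# Minimal sketch: load crowd counting ground truth and the region of interest.
import xml.etree.ElementTree as ET

def load_count_ground_truth(path):
    root = ET.parse(path).getroot()
    frames = {}
    for frame in root.findall('frame'):
        frames[int(frame.get('id'))] = [(int(p.get('x')), int(p.get('y')))
                                        for p in frame.findall('ped')]
    return int(root.get('interval-scale')), frames

def load_roi(path):
    root = ET.parse(path).getroot()
    return [(int(p.get('x')), int(p.get('y'))) for p in root.findall('point')]

interval, frames = load_count_ground_truth('P_Lev_4_Entry_Way_ip_107/GroundTruth.xml')
roi = load_roi('P_Lev_4_Entry_Way_ip_107/ROI.xml')
print('annotated frames:', len(frames), 'ROI points:', len(roi))
----------------------------------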

For cameras that have been annotated with a virtual gate, the following additional files are present:

  • VG-GroundTruth.xml, which contains ground truth in the following format:

    <qutcrowd-flow-gt both-directions="0">
    <ROI num-points="4" image-width="856" image-height="480">
    <point x="622" y="91" />
    <point x="837" y="242" />
    <point x="837" y="282" />
    <point x="622" y="131" />
    </ROI>
    <doi>0</doi>
    <ped frame="1889" x="662" y="144" direction="1" />
    <ped frame="3615" x="667" y="137" direction="0" />
    <ped frame="4851" x="770" y="212" direction="1" />
    <ped frame="5153" x="659" y="129" direction="1" />
    <ped frame="6317" x="655" y="147" direction="1" />
    ...
    </qutcrowd-flow-gt>

    The ROI is repeated within the ground truth, and a direction of interest (the <doi> tag) is also included, which indicates the primary direction for the gate (i.e. the direction that denotes a positive count). Each pedestrian crossing is represented by a <ped> tag, which contains the approximate frame the crossing occurred in (when the centre of mass was at the centre of the gate region), the x and y location of the centre of mass of the person during the crossing, and the direction (0 being the primary direction, 1 being the secondary). A minimal counting sketch is given after this list.
  • VG-ROI.xml, which contains the region of interest for the virtual gate.
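As noted above, the following minimal Python sketch tallies the crossings recorded in VG-GroundTruth.xml per direction (the file path is illustrative):

-- Python sketch (illustrative) --
# Minimal sketch: tally virtual gate crossings in each direction from
# VG-GroundTruth.xml.
import xml.etree.ElementTree as ET

def count_crossings(path):
    root = ET.parse(path).getroot()
    counts = {'primary': 0, 'secondary': 0}
    for ped in root.findall('ped'):
        if ped.get('direction') == '0':
            counts['primary'] += 1
        else:
            counts['secondary'] += 1
    return counts

print(count_crossings('P_Lev_4_External_Lift_foyer_ip_70/VG-GroundTruth.xml'))
----------------------------------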

The Calibration directory contains camera calibration for the cameras (with the exception of ip107, which has an uneven ground plane and is thus difficult to calibrate). All calibration is done using Tsai's method.

FullSequences contains the full sequences (9am - 5pm) for each of the cameras.

MotionSegmentation contains motion segmentation videos for all clips. Segmentation videos for both the full sequences and the 2 hour annotated segments are provided. Motion segmentation is done using the ViBE algorithm. Motion videos for the entire sequence have "Full" in the file name before the extension (e.g. Lev4Entry_ip107-Full.avi).

Further information on the SAIVT-BuildingMonitoring database is available in our paper: S. Denman, C. Fookes, D. Ryan, & S. Sridharan (2015) Large scale monitoring of crowds and building utilisation: A new database and distributed approach. In 12th IEEE International Conference on Advanced Video and Signal Based Surveillance, 25-28 August 2015, Karlsruhe, Germany.

This paper is also available alongside this document in the file: 'Denman 2015 - Large scale monitoring of crowds and building utilisation.pdf'.

SAIVT-DGD Database

Overview

Further information about the SAIVT-DGD database is available in our paper:

Sivapalan, Sabesan, Chen, Daniel, Denman, Simon, Sridharan, Sridha, & Fookes, Clinton B. (2011) Gait energy volumes and frontal gait recognition using depth images. In "Proceedings of the International Joint Conference on Biometrics", Washington DC, USA, available at http://eprints.qut.edu.au/46382/

Licensing
The SAIVT-DGD database is © 2012 QUT, and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.
Attribution

To attribute this database, please include the following citation:

Sivapalan, Sabesan, Chen, Daniel, Denman, Simon, Sridharan, Sridha, & Fookes, Clinton B. (2011) Gait energy volumes and frontal gait recognition using depth images. In "Proceedings of the International Joint Conference on Biometrics", Washington DC, USA, available at http://eprints.qut.edu.au/46382/

Acknowledgement in publications

In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications:

'We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-DGD database for our research'.

Installing the SAIVT-DGD Database
Download and unzip the following archives in the same directory:

At this point, you should have the following data structure and the SAIVT-DGD database is installed:

SAIVT-DGD
+-- DGD
|   +-- depth_raw
|   +-- depth_silhouette
|   +-- volume
+-- docs

The database itself is located in the DGD subdirectory. Documentation on the database, including calibration information and a copy of our paper, is included in the docs subdirectory.

Join as a student

We are looking for PhD students to join our group. See details for more information.

Contacts

Speech, audio, image and video technology research group

  • Level S Block, Room S1012A
  • Postal address:
    SAIVT Research Labs
    School of Electrical Engineering and Computer Science
    GPO Box 2434
    Brisbane QLD 4001