Overview
The Speech, Audio, Image and Video Technologies (SAIVT) research group conducts world-class research, postgraduate training, industrial consultancy and product development in the areas of Speech, Audio, Image and Video Technologies. A major focus of the research is applying Machine Learning techniques to solve real-world problems in Computer Vision and Speech and Language Processing.
The group was established in 1989 and has graduated 40 PhD students and 10 Masters by Research students in the areas of speech, audio, image and video technologies. Currently 20 full-time PhD students are enrolled within the group.
- Grantor
- Australian Research Council (ARC)
- National Security Science and Technology (NSST)
- Various other sources (government and private sector)
- Research leader
- Research team
- QUT
- Organisational unit
- Lead unit: Information Security Institute
Details
The research is conducted within two laboratories: the Speech and Audio Research Lab and the Image and Video Research Lab.
Postgraduate Training
The market for Speech, Audio and Video technology based products and processes is expected to be worth several billion dollars worldwide at the beginning of the 21st century. There is currently a worldwide shortage of adequately trained engineers to work in these areas. One of the goals of SAIVT (Speech, Audio, Image & Video Technology) is to provide such training at postgraduate level.
The SAIVT Program offers exciting PhD (3 years) and Masters by Research (2 years) programs. Graduates of the Lab have gone on to jobs commensurate with their qualifications. Scholarships are also available.
Projects
- Speech and Audio Research Lab
- Image and Video Research Lab
Partnerships
Collaborative programs
- DSTO
- Australian Federal Police
- Queensland Police Service
- CSIRO
- Codan Pty Ltd
- Boeing Australia
- Harris Corporation, USA
- Telstra
- Australia Post
- Motorola Australian Research Centre
Publications and output
Publications
2003
- S. Ghaemmaghami, M. Deriche and S. Sridharan, "Interpolative coding of speech parameters using hierarchical temporal decomposition," Digital Signal Processing, Volume 13, Issue 3, July 2003, Pages 433-456.
- A. Nguyen, V. Chandran, S. Sridharan, and R. Prandolini, "Interpretability performance assessment of JPEG2000 and Part 1 compliant region of interest coding," IEEE Transactions on Consumer Electronics (Special Section on JPEG2000), November, 2003.
- T. Wark, S. Sridharan, and V. Chandran, "A hybrid chromatic-parametric approach to automatic unsupervised lip-tracking," Electronic Imaging. (under review).
- S. Lucey, S. Sridharan and V. Chandran, "Improved facial-feature detection for AVSP via unsupervised clustering and discriminant analysis," EURASIP Journal on Applied Signal Processing, Volume 2003, Number 3, March 2003, pp. 264-275.
- C. Fookes and M. Bennamoun, "Rigid Medical Image Registration and its Association with Mutual Information", International Journal of Pattern Recognition and Artificial Intelligence, Special Issue on Registration and Correspondence Techniques, Vol. 17, No. 7, pp. 1-40, 2003.
2002
- Iain A. McCowan, Darren C. Moore and S. Sridharan, "Near-field Adaptive Beamformer for Robust Speech Recognition", Digital Signal Processing, Volume 12, Issue 1, January 2002, Pages 87-106
- Simon Lucey, Sridha Sridharan and Vinod Chandran, "Adaptive mouth segmentation using chromatic features", Pattern Recognition Letters, Volume 23, Issue 11, September 2002, Pages 1293-1302
- V. Chandran, S. Elgar, and A. Nguyen, "Detection of mines in acoustic images using higher order spectral features," IEEE Journal of Oceanic Engineering, vol. 27, no. 3, July 2002.
- Boyle J, Maeder A, Boles W, "Image Enhancement for Electronic Visual Prostheses", Australasian Physical & Engineering Sciences in Medicine Journal 25(2), pp.81-86, 2002
- S.Ghaemmaghami, S.Sridharan and V.Chandran, "Coding speech at very low bitrates using temporal decomposition based spectral interpolation and mixed excitation in the LPC model", Applied Signal Processing (in print).
2001
- T. Wark and S. Sridharan, "Adaptive fusion of speech and lip information for robust speaker identification," Digital Signal Processing, Vol. 11, No. 3, pp. 169-186, July 2001.
- S. Lucey, S. Sridharan, and V. Chandran. "Robust Lip Tracking using Active Shape Models and Gradient Vector Flow", Australian Journal of Intelligent Information Processing Systems, 2001. (Accepted for publication).
- I. McCowan and S. Sridharan, "Multi-channel sub-band speech recognition", EURASIP Journal on Applied Signal Processing, vol. 2001, No. 1, pp. 45-52, March 2001
- I. McCowan, D. Moore and S. Sridharan, "Near-field adaptive beamformer for robust speech recognition," Digital Signal Processing, Vol. 11 No. 1, pages 1-20, October 2001.
- J. Pelecanos and S. Sridharan, "Rapid channel compensation for speaker verification in the NIST 2000 Speaker Recognition Evaluation", Acoustics Australia, Vol. 29, No. 1, pp. 17-20, April 2001.
- W. Boles, "The Learners and their Learning Environment in an Engineering Curriculum", "Towards Sustainable Development" - Shaping the Sustainable Millennium International Conference Proceedings, pp. 56-63, Brisbane, 5-7 July 2001.
- H. Pillay, W. Boles and A. R. McCrindle, "Understanding the use of domain and task knowledge in the interpretation of graphical displays", European Journal of Psychology of Education, Vol. XVI, No. 4, pp. 491-508, Dec 2001
- Boyle J.R., Maeder A.J., Boles W.W., Static Image Simulation of Electronic Visual Prostheses, Proceedings of the 7th Australian and New Zealand Intelligent Information Systems, Perth, pp.85-88, Nov 2001
- Boyle J.R., Maeder A.J., Boles W.W., Challenges in Digital Imaging for Artificial Human Vision, Proceedings SPIE San Jose, Jan 2001
2000
- D.Thambiratnam and S.Sridharan, "Improving speaker recognition accuracy for small vocabulary applications in an adverse environment", International Journal of Speech Technology, Vol. 3, Iss. 2, pp. 109-117, June 2000.
- D.Cole, S.Sridharan and M.Moody, "Frequency Offset Correction for HF Radio Speech Reception", IEEE Transactions on Industrial Electronics, Vol. 47, No. 2, pp. 438-443, April 2000.
- T.Wark, S.Sridharan and V.Chandran, "Learning object dynamics for smooth tracking of moving lip contours", IEE Electronics Letters, 36 (6), pp. 520-521, 2000.
- J.Boyle, A.Maeder and W.Boles, "Digital Imaging Challenges for Artificial Human Vision", South African Computer Journal (Special Issue: SAICSIT'00), No.26, pp. 222-227, Nov 2000.
- Q.Tieng and W.Boles, "Space Curve Representation and Recognition Based on Wavelet Transform Zero Crossings", Journal of Mathematical Imaging and Vision, 13, pp. 5-16, 2000.
1999
- S.Ghaemmaghami, S.Sridharan, "Very low rate speech coding using temporal decomposition", IEE Electronics Letters, pp. 456-457, vol. 35, No. 6, 1999.
- J.Leis and S.Sridharan, "Fast Search Methods for Spectral Quantization", Digital Signal Processing, pp. 76-88, vol. 9, No. 2, 1999.
- J.Leis and S.Sridharan, "Adaptive Vector Quantization for Speech Spectrum Coding", Digital Signal Processing, pp. 89-106, vol. 9, No. 2, 1999.
- S.Slomka, P.Castellano and S.Sridharan, "Comparing the multiple binary classifier model to other automatic speaker verification models", Applied Signal Processing, vol 6, pp. 13-24, 1999.
- W.Boles, "Classroom assessment for improved learning: a case study using e-mail and involving students in preparing assignments", Higher Education Research & Development, vol 18, no 1, pp. 145-159, 1999.
- W.Boles, "Recognising 2D object contours in 3D space using wavelet transform", Australian Computer Journal, vol 31, no 1, pp. 17-26, 1999.
- W.Boles, P.Hitendra and R.Leonard, "Matching cognitive styles to computer-based instruction: An approach for enhanced learning in Electrical Engineering", European Journal for Engineering Education, vol 24, no 4, pp. 371-383, 1999.
1998
- S.Ghaemmaghami, S.Sridharan and V.Chandran, "Speech compaction using temporal decomposition", IEE Electronics Letters, pp. 2317-2319, vol. 34, No. 24, 1998.
1997
- D.Cole, S.Sridharan and M.Moody, "Position independent enhancement of reverberant speech", Journal of Audio Engineering Society, pp. 142-147, March 1997.
- Y.Cao, S.Sridharan and M.Moody, "Multi channel speech separation by eigen decomposition and its application to co-talker interference removal", IEEE Transactions on Speech and Audio Processing, pp. 209-219, vol. 5, May 1997.
- S. Boland, M. Deriche and S.Sridharan, "Compression of high quality audio using a hybrid LPC Discrete Wavelet Transform algorithm", Applied Signal Processing, No. 4, pp.39-55, 1997.
1996
- P.Castellano and S.Sridharan, "A two stage fuzzy decision classifier for speaker identification", Speech Communication, vol. 18, pp. 139-149, Feb 1996.
- B.Boashash, S.Sridharan and V.Chandran, "The development of a new signal Processing program at the Queensland University of Technology", IEEE Trans. on Education, vol.39, No.2, pp. 186-191, May 1996.
- Y. Cao, S.Sridharan and M.Moody, "Speech enhancement using microphone array with multi-stage processing", IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E79-A, No. 3, pp. 386-394, March 1996.
- Y.Cao, S.Sridharan and M.Moody, "Simulation of cocktail party effect with neural network controlled iterative Wiener filter", IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E79-A, No. 6, pp. 944-946, June 1996.
- P.Castellano and S.Sridharan, "Effects of speech coding on speaker verification", IEE Electronics Letters, pp. 517-518, March 1996.
- Y.Cao, S.Sridharan and M.Moody, "A speech enhancement system incorporating neural network simulation of cocktail party effect", Applied Signal Processing, pp. 143-150, vol. 3, No. 3, 1996.
- Y.Cao, S.Sridharan and M.Moody, "Co-talker separation using the Cocktail Party Effect", Journal of the Audio Engineering Society, vol. 44, No. 12, pp. 1084-1096, Dec 1996.
1995
- P.Castellano and S.Sridharan, "Speaker Identification with concomitant open and closed decision boundaries", Australian Journal of Intelligent Information Processing Systems, vol 2, no.2, pp. 47-53, 1995.
1994
- P. Castellano and S. Sridharan, "Text independent speaker identification with tensor link neural networks", Applied Signal Processing, pp. 155-165, vol 1, 1994.
1993
- S. Sridharan, B. Goldburg and E. Dawson, "Cryptanalysis of frequency domain analog speech scramblers", IEE Proceedings, Part I, Communication, Speech and Vision, vol 140, pp. 235-239, Aug 1993.
- B. Goldburg, S. Sridharan, and E. Dawson, "Design and cryptanalysis of transform based analog speech scramblers", IEEE Journal on Selected Areas in Communications, vol. 11, pp. 735-743, June 1993.
1991
- S. Sridharan, E. Dawson and B. Goldburg, "A Fast Fourier Transform based speech encryption system", IEE Proc. Part 1, Communication, Speech and Vision, vol. 18 No. 3, June 1991.
- B. Goldburg, S. Sridharan and E. Dawson, "Design of a discrete cosine transform based speech scrambler", Electronics Letters, vol. 27, No. 7, pp. 613-614, March 1991.
- B. Goldburg, E. Dawson and S. Sridharan, "Automated cryptanalysis of analog speech scramblers", Advances in Cryptology, Springer-Verlag, pp. 422-430, 1991.
1990
- S. Sridharan, E. Dawson and B. Goldburg, "Speech encryption in the Transform domain", Electronics Letters, IEE, Vol. 26, No. 10, May, 1990.
1988
- S. Sridharan and G. Dickman, "Block floating point implementation of digital filters using DSP56000", Microprocessors and Microsystems, Butterworth & Co Publishing Ltd, UK, Vol. 12, No. 6, July/August, 1988.
1987
- S. Sridharan, "On improving the performance of digital filters designed using the TMS32010 signal processor", Journal of Electrical and Electronic Engineering, Vol. 7, pp. 80-82, March, 1987.
1986
- S. Sridharan and D. Williamson, "Implementation of high-order direct form digital filter structures", IEEE Trans. Circuits and Syst., Vol. CAS-33, pp. 818-822, August, 1986.
- D. Williamson and S. Sridharan, "Error feedback in a class of orthogonal polynomial digital filter structures", IEEE Trans. Acoust. Speech Signal Processing, Vol. ASSP-34, pp. 1013-1016, August, 1986.
1985
- D. Williamson, S. Sridharan and P. G. McCrea, "A new approach to block floating point arithmetic in recursive digital filters", IEEE Trans. Circuits Syst., Vol. CAS-32, No. 7, pp. 719-722, July, 1985.
- D. Williamson and S. Sridharan, "An approach to coefficient wordlength reduction in digital filters", IEEE Trans. Circuits and Syst., Vol. CAS-32, pp. 893-903, September, 1985.
1984
- S. Sridharan and D. Williamson, "Comments on suppression of limit cycles in digital filters designed with one magnitude truncation quantizer", IEEE Trans. Circuits Syst., Vol. CAS-31, pp. 235-236, February, 1984.
Output
The SAIVT Research Program has developed the world's fastest radio and television broadcast logging tool. It performs real-time monitoring of broadcast material to identify, label and record details of program material for auditing or statistical purposes.
The SAIVT Speech Research Lab took first place in the NIST 2001 worldwide speaker recognition benchmarking in two categories: Single Speaker Detection Task (Basic) and Single Speaker Detection Task (Cellular Data). The approach used by SAIVT researchers included the novel technique of feature warping. This technique, which has since been adopted widely, was first described in: J. Pelecanos and S. Sridharan, "Feature warping for robust speaker verification", 2001 - A Speaker Odyssey, Crete, Greece, pp. 213-218, 18-22 June 2001.
Professor Sridha Sridharan