Overview

Topic status: We're looking for students to study this topic.

Project Summary

Determining “who spoke when” is an important step in creating a rich, searchable transcription of video content. This task, commonly referred to as diarisation when performed within a single recording, or attribution when applied across an entire collection, is performed using techniques initially developed for the biometric recognition of speaker identities. By identifying which sections of speech across one or more recordings share similar speech characteristics, the output of traditional automatic speech recognition algorithms can be augmented with speaker identities, automatically providing a much richer source of information than speech recognition alone.
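As a rough illustration of grouping speech sections by similarity, the sketch below clusters a handful of segments by the cosine similarity of per-segment feature vectors and then labels each segment with a speaker. It is a toy example only and does not represent QUT's algorithms: the embedding values, the recording names, the `threshold` value and the helper names (`cluster_segments`, `who_spoke_when`) are all assumptions made for illustration, and a real system would derive the vectors from audio with a trained speaker model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_segments(segments, threshold=0.8):
    """Greedy agglomerative clustering of speech segments.

    segments: list of (recording_id, start_s, end_s, embedding).
    threshold: hypothetical similarity cut-off; merging stops below it.
    Returns a list of clusters, each a list of segment indices.
    """
    clusters = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        best = None  # (similarity, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            ci = mean_vector([segments[k][3] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = mean_vector([segments[k][3] for k in clusters[j]])
                sim = cosine(ci, cj)
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        if best[0] < threshold:
            break  # remaining clusters are treated as different speakers
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def who_spoke_when(segments, clusters):
    """Attach an anonymous speaker label to every segment, sorted by time."""
    labelled = []
    for speaker_id, members in enumerate(clusters):
        for k in members:
            rec, start, end, _ = segments[k]
            labelled.append((rec, start, end, f"speaker_{speaker_id}"))
    return sorted(labelled)

# Made-up embeddings: two voices appearing across two recordings.
segments = [
    ("news_01", 0.0, 4.2, [0.90, 0.10, 0.00]),
    ("news_01", 4.2, 9.0, [0.10, 0.90, 0.10]),
    ("news_01", 9.0, 12.5, [0.88, 0.15, 0.02]),
    ("news_02", 0.0, 6.0, [0.12, 0.85, 0.08]),
]

clusters = cluster_segments(segments)
timeline = who_spoke_when(segments, clusters)
for rec, start, end, speaker in timeline:
    print(f"{rec} {start:6.1f}-{end:6.1f}s {speaker}")
```

Because the segments from `news_01` and `news_02` are clustered together, the same label can follow a speaker across recordings, which is the attribution case; restricting the input to one recording gives the diarisation case.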

By taking advantage of QUT’s existing state-of-the-art speaker attribution, diarisation and recognition algorithms, this project will illustrate the importance of this task by recognising and clustering speakers across more than one thousand videos collected from web video broadcasts. In addition to performing the attribution task over a large collection of news video, the project will investigate techniques for integrating the “who spoke when” information with other sources of information, such as visible faces, optical character recognition of on-screen captions, and other metadata such as titles and descriptions associated with the videos.

Expected outcomes, applications and/or benefits

A basic understanding of automatic speech processing techniques with a focus on speaker recognition, diarisation and attribution.

A demonstration system that illustrates speaker attribution techniques in combination with other existing multimedia indexing techniques already developed at QUT.

Improvement of automatic indexing, and therefore searchability, of news video corpora.

Required student skills/experience

Strong programming experience, with particular focus on C++ and/or Python.

Study level

Vacation research experience scholarship

Supervisors

QUT

Organisational unit

Science and Engineering Faculty

Research area

Computer Science

Keywords

video, news, diarisation, recognition

Contact

Contact the supervisor for more information