Overview

Topic status: We're looking for students to study this topic.

The field of authorship analysis uses machine learning techniques to attribute pieces of text to an author after learning that person's writing style. Authorship analysis has been carried out successfully on writings as diverse as Shakespearian sonnets and email messages. SMS messages are short, so each message carries little stylistic evidence, and attribute (feature) selection for machine learning will be of vital importance.

Hypothesis/Aims

To determine whether the writer of an SMS message can be identified using only the contents of a corpus of SMS messages.

Approaches

A corpus of SMS messages is readily available for this research. A set of features will need to be defined for SMS messages; this will most likely be a subset of the features already found to be successful for email messages. A machine learning classifier will need to be selected and tested on different feature sets for a variety of authors from the SMS corpus. Through iterative refinement, the optimal parameters for classification of authorship will be determined.
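
As a rough illustration of the approach described above, the following is a minimal Python sketch of feature extraction followed by classification. Everything in it is an assumption made for illustration: the feature set (message length, average word length, character-class ratios and a handful of function-word frequencies), the names extract_features, train_centroids and predict_author, and the choice of a nearest-centroid classifier. None of these is prescribed by the topic or taken from the referenced email studies; choosing the actual features and classifier is part of the research.

```python
import math
import string
from collections import defaultdict

# Hypothetical function-word list; a real study would choose and justify its own.
FUNCTION_WORDS = ("the", "a", "to", "and", "i", "you", "u", "is", "in", "it")


def extract_features(message):
    """Map one SMS message to a small vector of stylometric features."""
    chars = max(len(message), 1)   # avoid division by zero on empty input
    tokens = message.lower().split()
    words = max(len(tokens), 1)
    features = [
        len(message) / 160.0,                                   # length relative to one SMS
        sum(len(t) for t in tokens) / words,                    # average token length
        sum(c.isupper() for c in message) / chars,              # uppercase ratio
        sum(c.isdigit() for c in message) / chars,              # digit ratio
        sum(c in string.punctuation for c in message) / chars,  # punctuation ratio
    ]
    # Relative frequency of each function word, ignoring attached punctuation.
    stripped = [t.strip(string.punctuation) for t in tokens]
    features.extend(stripped.count(w) / words for w in FUNCTION_WORDS)
    return features


def train_centroids(labelled_messages):
    """Average the feature vectors of each author's messages.

    labelled_messages is an iterable of (author, message) pairs.
    """
    sums = {}
    counts = defaultdict(int)
    for author, message in labelled_messages:
        vec = extract_features(message)
        if author not in sums:
            sums[author] = [0.0] * len(vec)
        sums[author] = [s + v for s, v in zip(sums[author], vec)]
        counts[author] += 1
    return {a: [s / counts[a] for s in total] for a, total in sums.items()}


def predict_author(centroids, message):
    """Return the author whose centroid is nearest in Euclidean distance."""
    vec = extract_features(message)
    return min(centroids, key=lambda a: math.dist(centroids[a], vec))
```

The features here are not scaled, so average token length tends to dominate the distance. Comparing different feature subsets, scalings and classifiers on held-out messages is where the iterative refinement described above would take place.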

References

  • Corney, Malcolm W., Anderson, Alison M., Mohay, George M., & de Vel, Olivier (2001) Identifying the Authors of Suspect Email.
  • de Vel, Olivier, Anderson, Alison M., Corney, Malcolm W., & Mohay, George (2001) Mining e-mail content for author identification forensics. ACM SIGMOD Record - Web Edition, 30(4).
Study level
Honours
Supervisors
QUT
Organisational unit

Science and Engineering Faculty

Research area

Computer Science

Contact
Please contact the supervisor.