Study level

  • Vacation research experience scheme

Faculty/School

Faculty of Science

School of Computer Science

Topic status

We're looking for students to study this topic.

Supervisors

Professor David Lovell

  • Position: Professor
  • Division / Faculty: Faculty of Science

Ms Dimity Miller

  • Position: Research Fellow in Machine Learning for Multimodal Spatiotemporal Streams
  • Division / Faculty: Faculty of Science

Overview

Confusion matrices characterise the performance of classification systems on training and test data, but they can be hard to make sense of, especially when there are many possible classes to which an example could be assigned.

We have developed a new method for visualising confusion matrices that separates the contribution of the classifier from the contribution of the prior abundance of the different classes.
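To illustrate what separating these two contributions can mean, here is a minimal sketch. It is not the method developed for this project, only a standard factorisation of a confusion matrix into class priors and per-class prediction rates. The function names `split_confusion` and `reweight` and the example counts are hypothetical, and the sketch assumes rows are actual classes and columns are predicted classes.

```python
def split_confusion(counts):
    """Split a confusion matrix of counts into class priors and per-class rates.

    Assumes rows are actual classes, columns are predicted classes, and
    every actual class has at least one example.
    """
    total = sum(sum(row) for row in counts)
    # Prior abundance: the share of all examples that belong to each actual class.
    priors = [sum(row) / total for row in counts]
    # Classifier behaviour: P(predicted class | actual class), one row per actual class.
    rates = [[c / sum(row) for c in row] for row in counts]
    return priors, rates


def reweight(priors, rates):
    """Rebuild the joint distribution P(actual, predicted) under the given priors."""
    return [[p * r for r in row] for p, row in zip(priors, rates)]


# Hypothetical 3-class counts with a heavily imbalanced class mix.
counts = [
    [90, 5, 5],
    [10, 30, 10],
    [2, 2, 6],
]
priors, rates = split_confusion(counts)

# The same classifier behaviour, scored under the observed class mix
# and under a uniform class mix.
observed = reweight(priors, rates)
uniform = reweight([1 / 3] * 3, rates)
accuracy_observed = sum(observed[i][i] for i in range(3))
accuracy_uniform = sum(uniform[i][i] for i in range(3))

print(f"accuracy (observed priors): {accuracy_observed:.4f}")
print(f"accuracy (uniform priors):  {accuracy_uniform:.4f}")
```

Because `rates` does not depend on how common each class is, the gap between the two accuracy figures reflects the class mix alone, not any change in the classifier.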

Hypotheses

  1. When a classifier is suspected of being biased, what insights can we gain by applying our new method?
  2. How might the confusion matrix be used as a tool to identify bias and sources of bias in a classification system?
  3. What are some applications where ‘not all mistakes are equal’ (e.g. where a false positive and a false negative have different consequences)?
  4. For these applications, how can we extend the confusion matrix to incorporate the cost of a misclassification?
  5. In machine learning, classification decisions often come with a measure of uncertainty (commonly called a “confidence” measure). How might our confusion matrix methods be extended to incorporate the confidence of a classification decision?
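Questions 4 and 5 can be made concrete with a small sketch. The two functions below are only one possible starting point, not the project's method: `expected_cost` weights each cell of a confusion matrix by a user-supplied cost, and `soft_confusion` accumulates predicted class probabilities instead of hard decisions. Both function names, the cost values and the example data are hypothetical; rows are assumed to be actual classes and columns predicted classes.

```python
def expected_cost(counts, costs):
    """Average misclassification cost per example.

    counts[i][j]: number of examples of actual class i predicted as class j.
    costs[i][j]:  cost of predicting class j when the actual class is i.
    """
    total = sum(sum(row) for row in counts)
    weighted = sum(
        n * c
        for count_row, cost_row in zip(counts, costs)
        for n, c in zip(count_row, cost_row)
    )
    return weighted / total


def soft_confusion(actual, probabilities, n_classes):
    """Confusion matrix built from predicted probabilities, not hard labels.

    actual:        list of actual class indices, one per example.
    probabilities: list of per-class probability lists, one per example.
    """
    matrix = [[0.0] * n_classes for _ in range(n_classes)]
    for label, probs in zip(actual, probabilities):
        for j, p in enumerate(probs):
            # Each example spreads one unit of mass across the predicted classes.
            matrix[label][j] += p
    return matrix


# Hypothetical binary problem: class 0 is negative, class 1 is positive.
counts = [[80, 20],
          [5, 15]]
# Assumed costs: a false negative is ten times as costly as a false positive.
costs = [[0, 1],
         [10, 0]]
cost = expected_cost(counts, costs)

# Hypothetical confidence outputs for three examples.
actual = [0, 0, 1]
probabilities = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
soft = soft_confusion(actual, probabilities, n_classes=2)

print(f"expected cost per example: {cost:.4f}")
print("soft confusion matrix:", soft)
```

Each row of the soft matrix still sums to the number of examples in that actual class, so it can be normalised and compared with an ordinary confusion matrix.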

References

[1] Luque, A., Carrasco, A., Martín, A., & de las Heras, A. (2019). The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognition, 91, 216-231.
[2] Branco, P., Torgo, L., & Ribeiro, R. P. (2017, May). Relevance-based evaluation metrics for multi-class imbalanced domains. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 698-710). Springer, Cham.
[3] Beauxis-Aussalet, E., & Hardman, L. (2014). Visualization of confusion matrix for non-expert users. In IEEE Conference on Visual Analytics Science and Technology (VAST)-Poster Proceedings.
[4] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2019). A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635.
[5] Lovell, D. (2021). confusR draft vignette. https://rpubs.com/DavidLovell/confusR-dev

Aims

We aim to demonstrate how the proposed confusion matrix method can be usefully applied, for example to compare classifiers or to assess bias in classification.

Outcomes

Ideally, this work will lead to a publication so that others can benefit from the new knowledge.

Skills and experience

You should have:

  • experience programming in R or Python (ideally both)
  • an appreciation of concepts in classification and statistics.

Contact

Contact the supervisor for more information.