Confusion matrices characterise the performance of classification systems on training and test data, but they can be difficult to interpret, especially when there are many possible classes to which an example could be assigned.
We have developed a new method for visualising confusion matrices that separates the contribution of the classifier itself from the contribution of the prior abundance of the different classes.
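The details of the visualisation are part of the project itself, but the underlying decomposition can be illustrated with a minimal sketch (the counts below are made-up example data, and this illustrates the general idea, not the proposed method): row-normalising a confusion matrix factors it into the prior abundance of each class and the classifier's per-class behaviour.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
# Raw counts mix two effects: how often each class occurs (its prior
# abundance) and how the classifier behaves on examples of that class.
cm = np.array([
    [90, 5, 5],   # class A: abundant
    [2, 6, 2],    # class B: rare
    [3, 2, 5],    # class C: rare
])

# Prior abundance of each true class (row totals / grand total).
priors = cm.sum(axis=1) / cm.sum()

# Row-normalised matrix: P(predicted | true), which depends only on the
# classifier's behaviour, not on how common each class is.
conditional = cm / cm.sum(axis=1, keepdims=True)

# The raw matrix factorises as prior x conditional, so the two
# contributions can be inspected (or visualised) separately.
reconstructed = priors[:, None] * conditional * cm.sum()
assert np.allclose(reconstructed, cm)

print("class priors:", np.round(priors, 3))
print("P(pred | true):\n", np.round(conditional, 3))
```

Questions this project will explore include: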
- In situations where a classifier is suspected of being biased, what insights can our new method provide?
- How might the confusion matrix be used as a tool to identify bias and sources of bias in a classification system?
- What are some applications where ‘not all mistakes are equal’ (e.g. where a false positive is more or less costly than a false negative)?
- For these applications, how can we extend the confusion matrix to incorporate the cost of each misclassification? (One possible approach is sketched after this list.)
- In machine learning, classification decisions often include a measure of uncertainty (a “confidence” score). How might our confusion matrix methods be extended to incorporate the confidence of each classification decision? (A second sketch below illustrates one option.)
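As a minimal sketch of the cost question above (the cost values are assumptions for illustration, not part of the project brief), each cell of the confusion matrix can be weighted by a cost matrix of the same shape, so that the sum of the weighted matrix becomes a total misclassification cost:

```python
import numpy as np

# Hypothetical binary confusion matrix:
# rows = true class (negative, positive), columns = predicted class.
cm = np.array([
    [50, 4],
    [10, 36],
])

# Assumed cost matrix: correct decisions cost 0; here a false negative
# (true positive predicted negative) is taken to be 5x worse than a
# false positive. These numbers are illustrative only.
cost = np.array([
    [0, 1],
    [5, 0],
])

# Element-wise product gives a cost-weighted confusion matrix; its sum
# is the total misclassification cost, which 'not all mistakes are
# equal' applications can optimise instead of raw accuracy.
weighted = cm * cost
print("cost-weighted matrix:\n", weighted)
print("total cost:", weighted.sum())  # 4*1 + 10*5 = 54
```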
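And one possible answer to the confidence question, again only a sketch under assumed example data: accumulate each example's predicted class probabilities into the matrix rather than a hard count, giving a ‘soft’ confusion matrix that preserves class abundances while reflecting confidence.

```python
import numpy as np

# Assumed per-example data: true labels and the classifier's class
# probabilities (confidences) for a 3-class problem.
y_true = np.array([0, 0, 1, 2, 2])
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
])

n_classes = probs.shape[1]
soft_cm = np.zeros((n_classes, n_classes))
for label, p in zip(y_true, probs):
    # Instead of adding 1 to the argmax cell, spread the example across
    # its row in proportion to the classifier's confidence.
    soft_cm[label] += p

print("soft confusion matrix:\n", np.round(soft_cm, 2))
# Row sums still equal the true class counts, so the prior-abundance
# decomposition above applies unchanged.
print("row sums:", soft_cm.sum(axis=1))
```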
We would like to demonstrate how this proposed confusion matrix method can be usefully applied (e.g., classifier comparison, assessing bias in classification).
Ideally, the outcomes from this work will lead to publication so that others can benefit from the new knowledge.
Skills and experience
You should have:
- experience programming in R or Python (ideally both)
- an appreciation of concepts in classification and statistics.
Contact the supervisor for more information.