Building trust in algorithms that we count on in our daily lives

How can you trust something you're barely aware of, yet which controls more and more of how we see and do things?

The role of algorithms can’t be overstated in the technological world we now live in. For example, they help dictate the search results and ads we get online and help us get from place to place. Plus, businesses and governments rely on them more than ever to help make critical decisions.


Professor Kate Smith-Miles

As algorithms play a bigger role in our lives, the question of trust may be less about the algorithms themselves and more about how we test them.

That's where Professor Kate Smith-Miles comes in. She's developed a new online tool that improves the way we "stress-test" algorithms.

"Trust builds when an algorithm gives the ‘right answer’ to enough convincing test cases. Not just the easy, obvious ones, but challenging ones, realistic ones, and those covering enough diversity that we are confident there is no bias," said Kate, a Professor of Mathematics at The University of Melbourne and a Chief Investigator for the ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).

Kate's story that she wrote for The Conversation

Until now, though, that hasn't really happened. Kate says most algorithms are simply tested on a set of benchmark cases that aren't scrutinised for these challenging properties, and the results merely show how an algorithm performs ‘on average’.

This is especially true in the academic peer-review process.

"If a new algorithm is superior on these benchmarks on average, it is usually publishable. If an algorithm is not competitive with existing algorithms, it is either hidden away or some new test examples are presented where the algorithm is superior," said Kate.

"It's the computer science version of medical researchers failing to publish the full results of clinical trials."

All algorithms have weaknesses, or warts, as Kate likes to call them. She says it's extremely important to show where an algorithm will work well and, more importantly, where it could be unreliable.


A Google Maps-type problem with diverse test scenarios as dots: Algorithm B (red) is best on average, but Algorithm A (green) is better in many cases. Source: MATILDA
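The figure's point can be sketched in a few lines of code. This is a hypothetical illustration with made-up numbers, not data from MATILDA: two algorithms are scored on ten routing test cases, and the one that looks best ‘on average’ is not the one that wins most instances.

```python
# Hypothetical costs (lower is better) for two algorithms, "A" and "B",
# on ten made-up routing test instances. A is usually cheaper, but fails
# badly on two hard cases, which drags its average down.
costs_a = [10, 12, 11, 9, 50, 13, 10, 12, 55, 11]
costs_b = [14, 15, 14, 13, 20, 16, 15, 14, 21, 15]

mean_a = sum(costs_a) / len(costs_a)   # 19.3
mean_b = sum(costs_b) / len(costs_b)   # 15.7

# Average-case reporting declares B the winner...
best_on_average = "A" if mean_a < mean_b else "B"

# ...but a per-instance view shows A is better on 8 of the 10 cases.
wins_a = sum(a < b for a, b in zip(costs_a, costs_b))

print(best_on_average)  # B
print(wins_a)           # 8
```

Both summaries are true at once, which is exactly why ‘on average’ reporting alone can mislead: it hides where each algorithm is strong or unreliable.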

"We need to be asking questions about the choice of test examples: where they came from, if their history creates any bias, and how representative they are of the kinds of examples we may encounter in the future," said Kate.

To help researchers do that, Kate and her team have launched a web resource. It's called MATILDA, which stands for the Melbourne Algorithm Test Instance Library with Data Analytics. 

What's in a name? Fun facts about why this project is called MATILDA

MATILDA can show the objective strengths and weaknesses of an algorithm through powerful visualisations of a comprehensive 2D test "instance space".

"These instance spaces are filled with carefully generated test examples to augment benchmark datasets, and cover every nook and cranny of the space in which the algorithm could operate: be that working out when it is safe for a plane to land, or ensuring cancer diagnosis is accurate. They reveal which algorithms should be used when, and why," said Kate.
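The idea of an instance space can be sketched as follows. This is a simplified stand-in, not MATILDA's actual method (MATILDA computes its 2D projection with a tailored optimisation): here each test instance is described by a few hypothetical features, projected to 2D with ordinary PCA, and labelled with a made-up winning algorithm so that regions of strength and weakness could be coloured on a plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features for 100 test instances, e.g. size, density, spread.
features = rng.normal(size=(100, 3))

# Project the feature vectors to 2D with PCA (a stand-in for MATILDA's
# optimised projection): centre the data, take the top-2 eigenvectors
# of the covariance matrix, and map each instance onto them.
centred = features - features.mean(axis=0)
cov = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
components = eigvecs[:, ::-1][:, :2]          # top-2 principal directions
coords = centred @ components                 # 2D "instance space" coordinates

# Made-up labels: pretend algorithm "A" wins in one half of the space
# and "B" in the other, mimicking the red/green regions in the figure.
winners = np.where(coords[:, 0] > 0, "A", "B")

print(coords.shape)  # (100, 2)
```

In the real tool, the features are measured properties of each test instance and the labels come from actually running the algorithms; colouring the 2D points by winner is what reveals "which algorithms should be used when, and why."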

Kate hopes her ‘instance space analysis’ will soon replace the standard ‘on average’ reporting, now that the tools are available to support a more insightful approach.

The MATILDA project was funded by the Australian Research Council under the Australian Laureate Fellowship scheme awarded to Kate in 2014.

ACEMS Media Contact: Tim Macuga, 07 3138 6741