# Case Study: Towards trustworthy algorithms

Algorithms pervade our lives, in online tools, social media networks, and decision-making for businesses and governments. All too often, however, their behaviour is poorly understood and it's hard to predict how they will perform in unfamiliar situations.

"We need a paradigm shift in the way we test algorithms," says Professor Kate Smith-Miles, who has developed a powerful new methodology for more rigorous stress-testing.

Consider the general challenge of finding anomalies in data.

Working with Professor Rob Hyndman at Monash and an industry partner, Kate and her team developed an algorithm for finding intrusions in security system data. The algorithm distinguishes between typical and anomalous behaviour, which means it can also be used for similar problems such as identifying signs of a disease outbreak.

While visiting Professor Kerrie Mengersen and Dr Erin Peterson at QUT, Kate discovered they were also trying to detect anomalies in data, but from water quality sensors. Mathematically, this was another example of the same task, so the QUT team joined the anomaly detection collaboration.

Even when the idea behind an algorithm can be applied in a new context, the coded algorithm may not be well suited to the nuances of the data in the new problem. The algorithm tailored for the security system, for example, will give unreliable results for water contamination without some adjustment.

One familiar example of algorithm failure can occur when you ask Google Maps how to get from A to B.

"Sometimes it gives you a solution you know is not the most effective, based on your experience," Kate says.

It might be relatively easy to see when a navigation algorithm is not giving the best answers, but in other situations it can be harder to tell.

"We will depend more and more on algorithms for decision-making in the future, with the rise of big data and machine learning and AI. We need rigorous methodologies for developing trustworthy algorithms."

At present, algorithms are generally tested on a small range of benchmark situations, especially in academic research. About a decade ago, Kate started wondering whether these benchmarks were truly exposing the strengths and weaknesses of algorithms.

"A lot of the benchmarks people use in certain fields are not fit for purpose. We can't trust the conclusions of that research because it hasn't been tested on the broadest possible set of situations."

Kate's alternative to benchmarking is stress-testing. The idea is to test an algorithm on the broadest possible set of instances to see where it works and – most importantly – where it breaks down. The key mathematical challenge is to understand the theoretical boundary of what instances are possible and how to generate them.

Over the last five years, Kate has developed objective stress-testing techniques and ways of visualising and understanding the results.

"We create a two-dimensional map called the 'instance space' containing all possible situations that an algorithm might face. Then we need to make sure we have test problems to cover the whole space, which lets us visualise the effectiveness of different algorithms everywhere. Finally, we can decide which one is best to use in a given situation."

Kate is currently preparing to launch an online platform called MATILDA, the Melbourne Algorithm Test Instance Library with Data Analytics. MATILDA will give researchers tools to generate instance spaces and conduct stress-testing, as well as hosting a growing collection of case studies.

Kate expects MATILDA will benefit not only the academic community but also industry, where deploying poorly suited algorithms can result in disaster.

While it may seem like overkill to test an algorithm across all possible situations, including those that seem far from real-world situations, Kate is focussed on the future.

Our understanding of the real world is based on the past and the present. We don't know what the real world will look like in 50 years. We need to be prepared.