What's the link between statistical analysis and a shopping spree at Walmart?
The answer to this question comes in the shape of Professor Rob Hyndman, Chief Investigator at ACEMS. As one of the world’s leading statisticians, his research is all about understanding large data sets and using them to make predictions about the future.
A particularly interesting piece of work Rob is doing at the moment involves analysing clothing sales for Walmart’s online store. As he explains, the vast array of clothing available at Walmart requires a lot of forward planning.
Daily apparel sales for 8 Walmart departments from 1 January 2013 to 4 October 2017. The large spikes correspond to Christmas sales.
“The clothes at Walmart can be classified into a ‘hierarchical structure’,” Rob says. “The clothes are classified as male or female clothing, and then further split up into different clothing types – such as trousers, shirts or dresses – and then into different sizes, and so on. Walmart needs to forecast each clothing type at each size, in order to make sure it has an appropriate amount of inventory stock to meet customer demand.”
But forecasting exactly how many items of clothing Walmart needs in each subcategory is pretty complicated. For a start, the forecasts need to match the structure of the data. Forecasts are ‘reconciled’ when they add up appropriately across the hierarchy. For instance, the forecasts for male and female clothing should add up to the forecast for all clothing. However, when each series is forecast separately, the separate forecasts tend not to add up to the forecast for the total.
“This means that there’s a ‘reconciliation’ process required to adjust the forecasts,” explains Rob. “With several collaborators, I have been working on a solution to this problem by developing an ‘optimal reconciliation approach’. Our approach works in any context where reconciled forecasts are needed, no matter how big the forecasting problem.”
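To make the adding-up problem concrete, here is a minimal sketch in Python. It is not the team's optimal reconciliation approach; it uses a plain least-squares adjustment on the smallest possible hierarchy (all clothing = male + female), and the function name and the sales figures are made up for illustration.

```python
def reconcile_ols(total, male, female):
    """Least-squares reconciliation for a three-series hierarchy
    where total = male + female (hypothetical helper, illustration only).

    The three base forecasts are moved by the smallest possible amount
    (in the sum-of-squares sense) so that they add up exactly.
    """
    # How far the separate forecasts are from adding up.
    gap = total - (male + female)
    # Projecting onto the constraint total - male - female = 0
    # spreads the gap equally across the three series.
    share = gap / 3.0
    return total - share, male + share, female + share


# Made-up base forecasts, produced separately, that do not add up: 45 + 52 != 100.
total_r, male_r, female_r = reconcile_ols(100.0, 45.0, 52.0)
print(total_r, male_r, female_r)  # 99.0 46.0 53.0
```

After the adjustment the male and female forecasts sum to the total. A real hierarchy has many more levels, and a weighted adjustment would normally be preferred to this equal split, which treats every forecast as equally reliable.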
However, reconciled forecasts are not enough for Walmart. The store needs to monitor the probability of running out of a particular item of clothing and ensure that this probability is kept low. At the same time, Walmart does not want to end up with a huge amount of stock it is unable to shift.
That’s where Rob’s statistical and mathematical expertise comes in. “There’s no point in keeping enough stock to meet average demand – the amount of stock on-hand must be sufficient to meet demand almost all of the time,” he explains. “We’re responding to this challenge by forecasting apparel sales for Walmart’s online store based on probability. We call this probabilistic hierarchical forecasting.”
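The difference between stocking for average demand and stocking for demand "almost all of the time" can be sketched in a few lines of Python. This is an illustration only: it assumes demand for one item follows a normal distribution, and the mean, standard deviation and 95% target are invented numbers, not figures from the Walmart project.

```python
from statistics import NormalDist


def stock_for_service_level(mean_demand, sd_demand, service_level):
    """Stock needed so that demand is covered with the given probability.

    Hypothetical helper: assumes normally distributed demand, which is a
    simplification chosen for this sketch.
    """
    demand = NormalDist(mu=mean_demand, sigma=sd_demand)
    # The required stock is the demand quantile at the chosen probability.
    return demand.inv_cdf(service_level)


# Invented forecast distribution: mean 100 units, standard deviation 20.
stock_mean = stock_for_service_level(100.0, 20.0, 0.50)
stock_95 = stock_for_service_level(100.0, 20.0, 0.95)
print(round(stock_mean, 1), round(stock_95, 1))
```

Stocking to the mean covers demand only about half the time, while the 95% level requires roughly a third more stock in this example. A probabilistic forecast supplies the whole distribution, so the retailer can choose the trade-off between running out and overstocking.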
To date, Rob’s team has been busy delving into huge data sets collected over successive points in time. They are developing exciting new data visualisation tools that uncover patterns in the data on clothing sales and also provide a basis for making smarter and better-informed decisions about inventory stock going forward.
As part of their work on probabilistic hierarchical forecasting, Rob and his team are creating software using the open-source R language that implements their statistical models for everyone to use. The data they have obtained from Walmart will allow them to test out the models and software on a very large and complicated forecasting problem, and to get feedback from Walmart on how the tools can be further improved.
Rob’s pioneering work is at the cutting edge of statistics, as no one currently knows how to do optimal probabilistic hierarchical forecasting. And, while it’s great that a large global company like Walmart sees the value of Rob’s amazing research, the future implications of Rob’s project are likely to be much bigger than Walmart.
“Hierarchical forecasting occurs in almost every industry, including manufacturing, retail, energy, telecommunications, finance and health,” Rob says. “Our hope is that the results of our research can be applied in millions of organisations all over the world.”