Decision tree methods for filling missing data in satellite images and identifying woodland

Free satellite images are a useful source of data for environmental monitoring. A barrier to using these data is gaps in the images caused by clouds, especially in tropical regions with persistently high cloud cover.

ACEMS researchers, led by Jacinta Holloway, have produced a fast and highly accurate approach for filling data gaps due to clouds and classifying land cover. The approach uses decision tree methods, data which can be extracted from any free satellite image, and the open source software R. The effectiveness of the methods was demonstrated in a case study in Queensland, Australia, identifying the presence of woodland and grassland.

The research showed that both decision tree methods, gradient boosted machine and particularly random forest, could classify observed and missing data with up to 0.90 accuracy for a binary classification of grassland or woodland. The results also showed that the random forest method was more accurate for missing and observed data than a well-established spatial method, inverse distance weighted interpolation. The decision tree approach accurately interpolated missing data and classified grassland and woodland, and is transferable to other images and land cover types.
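
For readers unfamiliar with the baseline, the following is a minimal sketch of inverse distance weighted interpolation applied to a binary woodland/grassland label. It is written in Python purely for illustration and is not the researchers' R implementation. The toy pixel coordinates, the labels, the distance power of 2 and the 0.5 threshold are all assumptions chosen for the example.

```python
import math

def idw_predict(known, query, power=2.0):
    """Inverse distance weighted estimate at `query`.

    known: list of ((x, y), value) pairs for observed (cloud-free) pixels.
    query: (x, y) location of a pixel hidden by cloud.
    power: distance exponent (assumed 2.0 here; a tuning choice).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known:
        distance = math.hypot(x - query[0], y - query[1])
        if distance == 0.0:
            # The query coincides with an observed pixel: return its value.
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

def classify(known, query, threshold=0.5):
    """Turn the interpolated value into a binary label (assumed threshold)."""
    return 1 if idw_predict(known, query) >= threshold else 0

# Hypothetical toy data: 1 = woodland, 0 = grassland.
observed = [
    ((0, 0), 1), ((0, 1), 1), ((1, 0), 1),
    ((3, 3), 0), ((3, 4), 0), ((4, 3), 0),
]
cloud_pixels = [(1, 1), (3, 2)]

labels = [classify(observed, pixel) for pixel in cloud_pixels]
print(labels)
```

Each missing pixel takes a weighted average of the observed labels, with nearer pixels counting for more. This uses location alone, whereas the decision tree methods described above also draw on data extracted from the image itself.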

The authors are in discussions with the United Nations Global Working Group on Big Data for Official Statistics about sharing the algorithms with a wider audience through the UN Global Platform.