Analysis of Functional Data

Lead CIs: Aurore Delaigle and Peter Hall

Over the past decade, functional data analysis has become a very popular field of statistics. Functional data are data that are naturally represented as curves; examples include growth curves of children, annual rainfall or temperature curves at Australian weather stations, ozone pollution curves, arteries in the brain, near-infrared spectra, temporal gene expression profiles and CD4 cell counts as a function of age. They arise in a variety of fields, such as biosecurity, face recognition, health studies and weather modelling. The goal of this project is to develop novel statistical techniques for the analysis of such data, focusing on non-standard data and problems.

In this project we aim to develop new tools for problems that have so far received little attention. The project will evolve over the years, but here are several concrete examples of problems we plan to start working on soon:

  1. Develop new methods for analyzing functional data that have been observed only on parts of their domain. Such data arise when, because of budget or technical constraints, the functions can only be observed over one or a few periods of time. For example, satellite data are often only partially observed because weather conditions affect the satellite's ability to acquire data. Methods exist for such data, but they either impose restrictive assumptions (parametric models, unrealistic independence assumptions, etc.) or work only in simple cases (for example, data observed as a single fragment per individual). We wish to develop new techniques that work for more realistic data types. Our goals include curve reconstruction, covariance estimation, classification, clustering and prediction for such data.
  2. Develop new methods for classifying and clustering functional data that are not well aligned. When functional data are not perfectly aligned (for example, when the curves have roughly the same shape but are shifted and/or rescaled), a common technique is to register the curves prior to analysis. However, this approach cannot be applied directly to classification and clustering, since aligning the curves risks removing precisely the features that distinguish the classes. We wish to develop more appropriate approaches to this problem.
  3. Develop techniques for estimating the correlation, over time, between different regions of the brain. This project originates from a collaboration with a researcher from the School of Engineering at the University of Melbourne. The eventual goal is to construct a brain network that evolves over time.
  4. Investigate effective ways to analyze streaming functional data: how can estimators, classifiers and other procedures be updated when the data arrive sequentially, in batches?
  5. Investigate how to analyze privatized functional data, i.e. functional data that have been modified to conceal the identity of the patients. There are several ways to privatize data. In the functional data case, a recent approach suggested in the literature is to add to each curve noise in the form of a Gaussian stochastic process. How can we analyze such transformed curves?
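As a rough illustration of the privatization mechanism mentioned in item 5, the Python sketch below adds an independent draw of a zero-mean Gaussian process to each curve observed on a common grid. The squared-exponential covariance, the names `privatize_curves` and `sq_exp_cov`, and the parameters `noise_scale` and `length_scale` are hypothetical choices made for this example. The noise level is not calibrated to any formal privacy guarantee, and this is not necessarily the exact mechanism proposed in the literature.

```python
import numpy as np


def sq_exp_cov(t, length_scale):
    """Squared-exponential covariance matrix on the grid t (an assumed kernel choice)."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def privatize_curves(curves, t, noise_scale=1.0, length_scale=0.1, seed=None):
    """Add an independent zero-mean Gaussian-process draw to each discretized curve.

    curves: array of shape (n_curves, n_points), values on the common grid t.
    noise_scale, length_scale: hypothetical tuning parameters (illustration only,
    not calibrated to any privacy guarantee).
    """
    rng = np.random.default_rng(seed)
    cov = sq_exp_cov(t, length_scale)
    # A small jitter on the diagonal keeps the covariance numerically
    # positive definite so that the Cholesky factorization succeeds.
    chol = np.linalg.cholesky(cov + 1e-8 * np.eye(len(t)))
    # Each row of z is a standard normal vector; z @ chol.T gives rows
    # with covariance chol @ chol.T, i.e. the GP covariance on the grid.
    z = rng.standard_normal((curves.shape[0], len(t)))
    gp_noise = noise_scale * (z @ chol.T)
    return curves + gp_noise
```

For example, with `t = np.linspace(0, 1, 101)` and the curves stored row-wise, `privatize_curves(curves, t, noise_scale=0.5, seed=0)` returns noisy curves of the same shape. Because the added noise is smooth and correlated across time points, it cannot simply be removed by local averaging, which is one reason such curves may be harder to analyze than curves with independent measurement error.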