Contributions to Change Point and Functional Data Analysis

Date

2025-04-29

Advisor

Rice, Gregory
Chenouri, Shojaeddin

Publisher

University of Waterloo

Abstract

The advent and progression of computers have led to the consideration of data previously considered too unwieldy. So-called high-dimensional, or big, data can be large in both the size of each observation and the number of observations. In this thesis, we consider such data, which may be infinite dimensional and are often collected over some dimension, such as time. We study methodology for detecting changes in, and exploring, such information-rich data.

Chapter 1 provides a review of concepts and notation used throughout the thesis. Topics related to time series, functional data, and change point analysis are of particular interest and form the foundation of the thesis. The chapter concludes with an overview of the main contributions contained in the thesis.

An empirical characteristic functional-based method for detecting distributional change points in functional time series is presented in Chapter 2. Although various methods exist to detect changes in functional time series, they typically require projection or are tuned to specific changes. The characteristic functional-based approach is fully functional and sensitive to general changes in the distribution of functional time series. Integrated- and supremum-type test statistics are proposed. Theoretical considerations for the test statistics are examined, including asymptotic distributions and the measure used to integrate the test statistic over the function space. Simulation, permutation, and approximation approaches to calibrate detection thresholds for the test statistics are investigated. Comparisons to existing methods are conducted via simulation experiments. The proposed methods are applied to continuous electricity prices and high-frequency asset returns.

Chapter 3 is devoted to graph-based change point detection. Graph-based approaches provide another method for detecting distributional changes in functional time series. Four test statistics and their theoretical properties are discussed. Extensive simulations provide context for graph-based tuning parameter choices and compare the approaches to other functional change point detection methods. The efficacy of graph-based change point detection is demonstrated on multi-year pedestrian counts, high-resolution stock returns, and continuous electricity prices.

Despite increased interest in functional time series, software implementations remain scarce. Practical considerations for applying functional change point detection are covered in Chapter 4. We present fChange, a functional time series package in R. The package combines and expands functional time series and change point methods into an easy-to-use format. It provides functionality to store and process data, summarize data and validate assumptions, characterize and perform inference on change points, and produce visualizations. The data are stored as discrete observations, promoting usability and accuracy. Applications to continuous electricity prices, cancer mortality, and long-term treasury rates are shown.

In Chapter 5, we propose novel methodology for analyzing tumor microenvironments (TMEs) in cancer research. TMEs contain vast amounts of information on a patient's cancer through their cellular composition and the spatial distribution of tumor cells and immune cell populations. We present an approach to explore variation in TMEs and to determine the extent to which this information can predict outcomes such as patient survival or treatment success. Our approach can identify specific interactions that are useful in such predictions. We use spatial $K$ functions to summarize interactions, and then apply a functional random forest-based model. This approach is shown in simulation experiments to be effective at identifying important spatial interactions while also controlling the false discovery rate. We use the proposed approach to interrogate two real data sets of multiplexed ion beam images of TMEs in triple-negative breast cancer and lung cancer patients. The publicly available companion R package funkycells is discussed.

The random coefficient autoregressive model of order 1, RCA(1), is well suited for volatile time series. Detection of changes between stable and explosive regimes of scalar data modeled with the RCA(1) is explored in Chapter 6. We derive a (maximally selected) likelihood ratio statistic and show that it has power against breaks occurring as close as $O(\log \log N)$ periods from the beginning or end of the sample. Moreover, the use of quasi-maximum likelihood-based estimates yields better power properties, with the added bonus of being nuisance-free. Our test statistic has the same distribution, of the Darling-Erdős type, irrespective of whether the data are stationary or not, and can therefore be applied without prior knowledge of which case holds. Our simulations show that the test has very good power and, when a suitable correction is applied to the asymptotic critical values, the correct size. We illustrate the usefulness and generality of our approach through applications to economic and epidemiological time series.

Chapter 7 provides summaries and discussions of each chapter. Directions for future work are considered. These directions, with the provided commentary, extend the scope of the models and may benefit practitioners and researchers alike.

Keywords

time series, change point, functional data, high-dimensional data
