Statistics and Actuarial Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9934
This is the collection for the University of Waterloo's Department of Statistics and Actuarial Science.
Research outputs are organized by type (e.g., Master's Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Browsing Statistics and Actuarial Science by Author "Chenouri, Shoja'eddin"
Now showing 1 - 6 of 6
Item Data Depth Inference for Difficult Data (University of Waterloo, 2022-07-18) Ramsay, Kelly; Chenouri, Shoja'eddin

We explore various ways in which a robust, nonparametric statistical tool, the data depth function, can be used to conduct inference on data which could be described as difficult. This can include data which are difficult in structure, such as multivariate, functional, or multivariate functional data. It can also include data which are difficult in the sense that published statistics must satisfy privacy constraints. We begin with multivariate data. In Chapter 2, we develop two robust, nonparametric methods for multiple change-point detection in the covariance matrix of a multivariate sequence of observations. We demonstrate that changes in ranks generated from data depth functions can be used to detect certain types of changes in the covariance matrix of a sequence of observations. In order to catch more than one change, the first algorithm uses methods similar to those of wild binary segmentation (Fryzlewicz, 2014). The second algorithm estimates change-points by maximizing a penalized version of the classical Kruskal-Wallis ANOVA test statistic. We show that this objective function can be maximized via the well-known pruned exact linear time algorithm. We show, under mild nonparametric assumptions, that both of these algorithms are consistent for the correct number of change-points and the correct location(s) of the change-point(s). We demonstrate the efficacy of these methods with a simulation study and a data analysis. We are able to estimate changes accurately when the data are heavy-tailed or skewed. We are also able to detect second-order change-points in a time series of multivariate financial returns, without first imposing a time series model on the data.
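The core idea of the Chapter 2 procedures, reducing multivariate observations to depth ranks and then scanning a rank-based ANOVA statistic, can be sketched for a single change point. This is only an illustrative sketch, not the thesis algorithm: `mahalanobis_depth` and `kw_changepoint_scan` are hypothetical names, Mahalanobis depth is just one convenient depth function, and the thesis handles multiple change points via wild binary segmentation or a penalized statistic optimized with PELT.

```python
import numpy as np
from scipy.stats import kruskal

def mahalanobis_depth(X):
    """Depth of each row of X relative to the pooled sample:
    1 / (1 + squared Mahalanobis distance to the sample mean)."""
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return 1.0 / (1.0 + d2)

def kw_changepoint_scan(X, min_seg=10):
    """Scan candidate split points; at each split, compare the depth
    values before and after via the Kruskal-Wallis rank statistic."""
    depths = mahalanobis_depth(X)
    best_tau, best_stat = None, -np.inf
    for tau in range(min_seg, len(X) - min_seg):
        stat, _ = kruskal(depths[:tau], depths[tau:])
        if stat > best_stat:
            best_stat, best_tau = stat, tau
    return best_tau, best_stat
```

A scale change in the covariance pushes post-change points toward the outskirts of the pooled sample, lowering their depths, so the rank statistic peaks near the true change point.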
In Chapter 3, we extend these methods to the setting of functional data, where we develop a group of hypothesis tests which detect differences between the covariance kernels of several samples. These tests, called functional Kruskal-Wallis for covariance tests, are based on functional data depth ranks, which are combined using the classical Kruskal-Wallis test statistic. These tests are very robust; we demonstrate, both in simulation and theoretically, that they work well when the data are very heavy-tailed. Specifically, consistency of the test does not require the fourth moment of the observations to be finite, which is a typical assumption of existing methods. These tests offer several other benefits: they have a simple distribution under the null hypothesis, they are computationally cheap, and they possess linear invariance properties. We show via simulation that these tests have higher power than their competitors in some situations, while still maintaining a reasonable size. We characterize the behavior of these tests under the null hypothesis and show consistency of several versions of the tests under general alternative hypotheses. We also provide a method for computing sample size and provide some analysis under local alternatives when the ranks are based on a new depth function. In Chapter 4, we present methods for detecting change-points in the variability of a sequence of functional data, thus combining the methods of Chapters 2 and 3. Our methods allow the user to test for one change-point, to test for an epidemic period, or to detect an unknown number of change-points in the data. Since our methodology is based on depth ranks, we have no need to estimate the covariance operator, which makes our methods computationally cheap. For example, our procedure can identify multiple change-points in $O(n \log n)$ time. Our procedure is fully nonparametric and is robust to outliers through the use of data depth ranks.
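A minimal sketch of the rank-based covariance comparison behind such tests, under assumptions: `l2_depth` and `fkwc_test` are hypothetical names, and the toy depth surrogate below stands in for the proper functional depth functions the thesis uses. The pattern is to compute depths of all curves with respect to the pooled sample, then compare the per-sample depth groups with the classical Kruskal-Wallis statistic.

```python
import numpy as np
from scipy.stats import kruskal

def l2_depth(curves):
    """Toy depth surrogate: inverse of the L2 distance from each
    discretized curve (one row) to the cross-sectional mean curve."""
    center = curves.mean(axis=0)
    dist = np.sqrt(((curves - center) ** 2).sum(axis=1))
    return 1.0 / (1.0 + dist)

def fkwc_test(samples):
    """Pool the samples, compute depths on the pooled set, and apply
    the classical Kruskal-Wallis test to the per-sample depth groups."""
    pooled = np.vstack(samples)
    depths = l2_depth(pooled)
    groups, start = [], 0
    for s in samples:
        groups.append(depths[start:start + len(s)])
        start += len(s)
    return kruskal(*groups)  # (statistic, p-value)
```

A sample with an inflated covariance kernel produces curves far from the pooled center, hence systematically low depth ranks, which the rank test detects without ever estimating a covariance operator.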
We show that when $n$ is large, our methods have simple behavior under the null hypothesis. We also show that the functional Kruskal-Wallis for covariance change-point procedures are $n^{-1/2}$-consistent. In addition to asymptotic results, we provide a finite-sample accuracy result for our at-most-one-change-point estimator. In simulation, we compare our methods against several other methods from the literature. We also present an application of our methods to intraday asset returns and fMRI scans. In Chapter 5, we investigate differentially private estimation of depth functions and their associated medians. We then present a private method for estimating depth-based medians, which is based on the exponential mechanism (McSherry, 2007). We compute the sample complexity of these private medians as a function of the dimension, prior parameters, and privacy parameter. As a by-product of our work, we present a smooth depth function, which we show has the same depth-like properties as its non-smooth counterpart. Another by-product of our work is uniform concentration for several depth functions. We also present methods and algorithms for estimating private depth values at in-sample and out-of-sample points. In addition, we extend the propose-test-release methodology of Brunel (2020) to be used with depth functions and the exponential mechanism. We show that when applying propose-test-release to projection depth values, the probability of no reply is small, and the private depth values concentrate around their population counterparts. We also give an algorithm to approximate the "test" step in propose-test-release, since it is computationally difficult. We show that this approximation maintains the low probability of no reply, as in the original propose-test-release.
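For intuition, the exponential mechanism can be illustrated on the simplest case: a univariate median over a finite candidate grid (a sketch with the hypothetical name `private_median`; the thesis treats depth-based multivariate medians). Each candidate is scored by minus its rank error, whose sensitivity to changing one record is 1, and a candidate is drawn with probability proportional to exp(eps * utility / 2).

```python
import numpy as np

def private_median(x, eps, grid):
    """Differentially private univariate median via the exponential
    mechanism. Utility of candidate m is -|#{x_i < m} - n/2|, which
    changes by at most 1 when one record changes (sensitivity 1)."""
    n = len(x)
    ranks = np.searchsorted(np.sort(x), grid)   # #{x_i < m} per candidate
    utility = -np.abs(ranks - n / 2)
    weights = np.exp(eps * utility / 2.0)
    return np.random.choice(grid, p=weights / weights.sum())
```

Candidates far from the sample median receive exponentially small weight, so for moderate eps the released value concentrates near the true median while still randomizing over nearby candidates.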
Chapter 6 presents some possible directions for future research related to network data and shape data.

Item Enhanced Backward Multiple Change-Point Detection (University of Waterloo, 2023-05-09) Pirnia, Shahab; Chenouri, Shoja'eddin

Many statistical tools are built upon a specific set of assumptions on the distribution of the data at hand. However, the distribution of the observations in a dataset may not remain constant and may change due to external events. For a sequence of observations, the points after which the distribution function has changed are commonly referred to as change points. Identifying such points can be critical in gaining insight into the distributional behaviour of random variables and in constructing statistical models. Thus, change point analysis potentially applies to almost all data-driven disciplines, such as biology, finance, and public policy. Change point analysis is categorized into online and offline analysis. Online change point analysis is designed to detect changes in the distribution of random variables as new observations are introduced. Offline analysis, on the other hand, is concerned with recovering change points within a historical dataset. In this thesis, we are concerned only with offline change point analysis; for simplicity, we refer to offline change point analysis simply as change point analysis. Change point analysis was born 70 years ago in the quality control discipline (Page, 1954). Initially, the main focus of the change point literature was on the single change point scenario, in which at most one change point exists within a sequence of random variables. However, with the advent of computers, the focus has shifted to multiple change point detection problems. This shift does not imply that single change point detection methods are irrelevant. For instance, many multiple change point detection methods recover change points by conducting a single change point test locally.
This class of change point detection methods is called local search methods. One of the primary concerns of local search methods is the application of a single change point test statistic within the largest possible segment of the sequence of random variables containing exactly one change point. Obtaining such intervals is a difficult task. For instance, wild binary segmentation (Fryzlewicz, 2014) may extract change points from intervals containing multiple change points. On the other hand, the narrowest-over-threshold method (Baranowski et al., 2019) estimates change points within the narrowest intervals in which a predefined threshold is satisfied; thus, the accuracy of the estimated locations of change points may suffer due to the shortness of these intervals. In this thesis, we propose two local search methods that attempt to infer the locations of change points within desirable intervals. The first method, enhanced backward detection (EBD), recovers change points by eliminating unlikely candidates sequentially. The second method, narrowest over threshold via interval selection with shortened exhaustive search (NOT-IS.SES), estimates the locations of change points by following a top-down approach; that is, change points are added to the active set sequentially. EBD and NOT-IS.SES are general procedures that can be applied to a wide range of change point problems by simply changing the underlying single change point test statistics.

Item Large Data-to-Text Generation (University of Waterloo, 2023-05-16) Sarangian, Varnan; Chenouri, Shoja'eddin

This thesis presents a domain-driven approach to sports game summarization, a specific instance of large data-to-text generation (DTG). We first address the data fidelity issue in the Rotowire dataset by supplementing existing input records, demonstrating larger relative improvements compared to previously proposed purification schemes.
As this method further increases the total number of input records, we alternatively formulate the problem as a multimodal one (i.e., visual data-to-text), discussing potential advantages over purely textual approaches and studying its effectiveness for future expansion. We work exclusively with pre-trained end-to-end transformers throughout, allowing us to evaluate the efficacy of sparse attention and multimodal encoder-decoders in DTG and providing appropriate benchmarks for future work. To automatically evaluate the statistical correctness of generated summaries, we also extend prior work on automatic relation extraction and build an updated pipeline that incorporates small amounts of human-annotated data, which are quickly inflated via data augmentation. By formulating this in a "text-to-text" fashion, we are able to take advantage of LLMs and achieve significantly higher precision and recall than previous methods while tracking three times the number of unique relations. Our updated models are made more consistent and reliable by incorporating human-verified data partitions into the training and evaluation process.

Item Measurement System Assessment Studies for Multivariate and Functional Data (University of Waterloo, 2024-04-15) Lashkari, Banafsheh; Chenouri, Shoja'eddin

Measurement system analysis involves understanding and quantifying the variability in measurement data attributed to the measurement system. A primary goal of such analyses is to assess the measurement system's impact on the overall variability of the data, determining its suitability for the intended purpose. While there are established methods for evaluating measurement systems for a single variable, their applicability is limited when dealing with other data types, such as multivariate and functional data. This thesis addresses a critical gap in the literature concerning the assessment of measurement systems for multivariate and functional observations.
The primary objective is to enhance the understanding of measurement system assessment studies, focusing particularly on multivariate measurements and extending to functional data measurements. Chapter 1 serves as an introduction. We review several statistical properties and parameters for assessing measurement systems. The chapter includes real-world examples of measurement system assessment problems for multivariate and functional data and elaborates on the challenges involved. We also outline the contents of the subsequent chapters. While the literature on measurement system analysis in the multivariate and functional data domains is limited, there is also a notable absence of a systematic theoretical investigation of univariate methods. In Chapter 2, we address this gap by conducting a thorough theoretical examination of measurement system assessment estimators for univariate data. The chapter explores various methods for estimating variance components and other parameters essential for measurement system analysis, and provides comprehensive scrutiny of the statistical properties of these estimators. This foundational understanding serves as the basis for subsequent exploration of the more intricate domains of multivariate and functional data. In Chapter 3, we extend the scope of measurement system assessment to multivariate data. This involves adapting the definitions of measurement system assessment parameters to multivariate settings. We employ transformations that yield scalar summary measures of variance-covariance matrices, with a specific focus on the determinant, trace, and Frobenius norm of the variance-covariance matrix components. Building upon the statistical concepts and properties discussed in Chapter 2, we conduct a targeted review of existing theory on variance-covariance component estimation.
A key emphasis is placed on the statistical properties of estimators of one of the parameters in measurement system assessment: the signal-to-noise ratio. Our investigation includes an exploration of its convergence properties and the construction of approximate confidence intervals. Additionally, we conduct a comparative analysis of the three transformations, namely the determinant, the trace, and the Frobenius norm, based upon their asymptotic properties. In Chapter 4, our exploration takes a significant step forward as we establish a framework for assessing measurement systems tailored to functional data. This involves extending the definitions of the parameters used to evaluate measurement systems for univariate data by applying bounded operators to covariance kernels. To estimate the measurement system assessment parameters, we first provide methods for estimating the covariance kernel components. Initially, we explore a classical estimation approach without smoothing. Subsequently, we leverage specialized tools of functional data analysis, within the framework of reproducing kernel Hilbert spaces (RKHS), to obtain smooth estimates of the covariance kernel components. The fifth chapter is devoted to a case study, in which we apply the developed framework to a real-world functional dataset. Specifically, we analyze the surface roughness of printed products in the context of additive manufacturing. The comprehensive analysis in Chapter 5 employs statistical methods for univariate and multivariate data types as well as techniques from functional data analysis.
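As an illustration of the three scalarizations, consider a simple additive model in which the total variance-covariance matrix splits into a part (signal) component and a measurement-error component. One plausible multivariate analogue of the univariate gauge signal-to-noise ratio sqrt(2 sigma_p^2 / sigma_e^2) applies the determinant, trace, or Frobenius norm to each component before forming the ratio; this is only a sketch of the idea, the thesis's exact definitions may differ, and `snr_summaries` is a hypothetical name.

```python
import numpy as np

def snr_summaries(sigma_p, sigma_e):
    """Scalar signal-to-noise summaries of the part (signal) covariance
    sigma_p and measurement-error covariance sigma_e, via the three
    transformations: trace, determinant (dimension-normalized), and
    Frobenius norm."""
    p = sigma_p.shape[0]
    det_ratio = (np.linalg.det(sigma_p) / np.linalg.det(sigma_e)) ** (1.0 / p)
    return {
        "trace": np.sqrt(2.0 * np.trace(sigma_p) / np.trace(sigma_e)),
        "determinant": np.sqrt(2.0 * det_ratio),
        "frobenius": np.sqrt(2.0 * np.linalg.norm(sigma_p, "fro")
                             / np.linalg.norm(sigma_e, "fro")),
    }
```

When the two components are proportional, all three summaries reduce to the univariate ratio; they differ precisely when signal and noise have different correlation structure, which is what motivates comparing their asymptotic behaviour.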
We are in the process of converting the materials in Chapters 2, 3, and 4 into three separate articles for submission.

Item Robustness in Dimensionality Reduction (University of Waterloo, 2016-04-14) Liang, Jiaxi; Small, Christopher; Chenouri, Shoja'eddin

Dimensionality reduction is widely used in many statistical applications, such as image analysis, microarray analysis, and text mining. This thesis focuses on three problems related to robustness in dimension reduction. The first topic is performance analysis in dimension reduction, that is, quantitatively assessing the performance of an algorithm on a given dataset. A criterion for success is established from a geometric point of view to address this issue. A family of goodness measures, called local rank correlation, is developed to assess the performance of dimensionality reduction methods. The potential application of the local rank correlation in selecting tuning parameters of dimension reduction algorithms is also explored. The second topic is sensitivity analysis in dimension reduction. Two types of influence functions are developed as measures of robustness, based on which we develop graphical display strategies for visualizing the robustness of a dimension reduction method and flagging potential outliers. In the third part of the thesis, a novel robust PCA framework, called Performance-Weighted Bagging PCA, is proposed from the perspective of model averaging. It obtains a robust linear subspace by taking a weighted average of a collection of subspaces produced from subsamples.
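The subspace-averaging idea can be sketched as follows, under assumptions: `bagged_pca` is a hypothetical name, and the residual-based performance weight shown is just one possible choice, not necessarily the thesis's scheme.

```python
import numpy as np

def bagged_pca(X, k, n_bags=50, frac=0.5, rng=None):
    """Robust rank-k subspace by weighted averaging of projection
    matrices fitted on subsamples; each subspace is weighted by how
    well it reconstructs the full (robustly centered) sample."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    Xc = X - np.median(X, axis=0)            # robust centering
    P_avg = np.zeros((p, p))
    total_w = 0.0
    for _ in range(n_bags):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        _, _, Vt = np.linalg.svd(Xc[idx], full_matrices=False)
        V = Vt[:k].T                         # p x k subspace basis
        P = V @ V.T                          # its projection matrix
        resid = np.linalg.norm(Xc - Xc @ P, axis=1)
        w = 1.0 / (1.0 + np.median(resid))   # median residual resists outliers
        P_avg += w * P
        total_w += w
    P_avg /= total_w
    vals, vecs = np.linalg.eigh(P_avg)
    return vecs[:, -k:]                      # top-k directions of the average
```

Averaging rank-k projection matrices rather than the basis vectors themselves avoids sign and rotation ambiguities between subsamples; the top-k eigenvectors of the averaged projector give the final subspace.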
Robustness against outliers is achieved by a proper weighting scheme, and possible choices of weighting scheme are investigated.

Item Statistical Learning and Stochastic Process for Robust Predictive Control of Vehicle Suspension Systems (University of Waterloo, 2017-09-12) Mozaffari, Ahmad; Chenouri, Shoja'eddin

Predictive controllers play an important role in today's industry because of their capability of computing optimum control signals for nonlinear systems in a real-time fashion. Due to their mathematical properties, such controllers are well suited to control problems with constraints. These controllers can also be equipped with different types of optimization and learning modules. The main goal of this thesis is to explore the potential of predictive controllers for a challenging automotive problem known as active vehicle suspension control. In this context, we explore both the modeling and optimization modules using statistical methodologies ranging from statistical learning to random process control. Among the variants of predictive controllers, the learning-based model predictive controller (LBMPC) is becoming increasingly interesting to researchers in the control community due to its structural flexibility and optimal performance. The current investigation contributes to the improvement of LBMPC by adopting different statistical learning strategies and forecasting methods to improve the efficiency and robustness of the learning performed in LBMPC. Advanced probabilistic tools such as reinforcement learning, absorbing-state stochastic processes, graphical modelling, and bootstrapping are also used to quantify the different sources of uncertainty that can affect the performance of LBMPC when it is used for vehicle suspension control. Moreover, a comparative study is conducted using gradient-based as well as deterministic and stochastic direct search optimization algorithms for calculating the optimal control commands.
By combining well-established control and statistical theories, a novel variant of LBMPC is developed which not only affords stability and robustness but also surpasses a wide range of conventional controllers on the vehicle suspension control problem. The findings of the current investigation should be of interest to researchers in the automotive industry (in particular, those interested in automotive control), as several open issues regarding the potential of statistical tools for improving the performance of vehicle suspension controllers are addressed.