Statistics and Actuarial Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9934
This is the collection for the University of Waterloo's Department of Statistics and Actuarial Science.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Browsing Statistics and Actuarial Science by Author "Cotton, Cecilia"
Now showing 1 - 5 of 5
Item Causal Inference with Covariate Balance Optimization (University of Waterloo, 2018-12-04) Xie, Yuying; Zhu, Yeying; Cotton, Cecilia
Causal inference is a popular problem in biostatistics, economics, and health science studies. The goal of this thesis is to develop new methods for the estimation of causal effects using propensity scores or inverse probability weights, where the weights are chosen so as to achieve balance in covariates across the treatment groups. In Chapter 1, we introduce the Neyman-Rubin causal framework and causal inference with propensity scores. The importance of covariate balancing in causal inference is further discussed in this chapter. In Chapter 2, we provide general definitions and notation for causal inference, along with other popular propensity score approaches and weighting techniques. In Chapter 3, we describe a new model averaging approach to propensity score estimation in which parametric and nonparametric estimates are combined to achieve covariate balance. Simulation studies are conducted across different scenarios varying in the degree of interactions and nonlinearity in the treatment model. The results show that the proposed method produces less bias and smaller standard errors than existing approaches. They also show that a model averaging approach with the objective of minimizing the average Kolmogorov-Smirnov statistic leads to the best performance. The proposed approach is applied to a real data set in evaluating the causal effect of formula or mixed feeding versus exclusive breastfeeding in the first month of life on a child's BMI Z-score at age 4. The data analysis shows that formula or mixed feeding is more likely to lead to obesity at age 4, compared to exclusive breastfeeding. In Chapter 4, we propose using kernel distance to measure balance across different treatment groups and propose a new propensity score estimator by setting the kernel distance to zero.
Compared to other balance measures, such as the absolute standardized mean difference (ASMD) and the Kolmogorov-Smirnov (KS) statistic, kernel distance is one of the best bias indicators in estimating the causal effect. That is, the balance metric based on kernel distance is shown to have the strongest correlation with the absolute bias in estimating the causal effect, compared to several commonly used balance metrics. The kernel distance constraints are solved by the generalized method of moments. Simulation studies are conducted across different scenarios varying in the degree of nonlinearity in both the propensity score model and the outcome model. The proposed approach produces smaller mean squared error in estimating causal treatment effects than many existing approaches, including the well-known covariate balancing propensity score (CBPS) approach, when the propensity score model is misspecified. An application to data from the International Tobacco Control (ITC) policy evaluation project is provided. Often interest lies in the estimation of quantities other than the average causal effect, such as quantiles or the quantile treatment effect. In Chapter 5, we propose a multiply robust method for estimating marginal quantiles of potential outcomes by achieving mean balance in (1) the propensity score, and (2) the conditional distributions of potential outcomes. An empirical likelihood or entropy measure can be utilized instead of using inverse probability weighting. Simulation studies are conducted across different scenarios of correctness in both the propensity score models and outcome models. Our estimator is consistent if any of the models are correctly specified.

Item Causal Inference with Recurrent Data via Propensity Score Methods (University of Waterloo, 2019-01-07) Liang, Haodi; Cotton, Cecilia
Propensity score methods are increasingly being used to reduce estimation bias of treatment effects for observational studies.
Previous research has shown that propensity score methods consistently estimate the marginal hazard ratio for time-to-event data. However, recurrent data frequently arise in the biomedical literature and there is a paucity of research into the use of propensity score methods when data are recurrent in nature. The objective of my thesis is to extend the existing propensity score methods to the recurrent data setting. We review current propensity score methods for estimating treatment effects when the outcome is a single time to event. Then we propose a new class of inverse probability treatment weighting (IPTW) estimators to estimate treatment effects for recurrent data. We illustrate our methods through both estimating equation theory and a series of Monte Carlo simulations. The simulation results indicate that when there is no censoring, the newly proposed IPTW estimators allow us to consistently estimate the marginal hazard ratio for each event. Under an administrative censoring regime, the stabilized IPTW estimator consistently estimates the marginal hazard ratio while the conventional IPTW estimator yields significant bias, especially when the proportion of subjects being censored is high. For variance estimation, we incorporate the robust variance estimator and the bootstrap variance estimator to deal with the within-subject correlation induced by weighting. In addition, we apply our methods to a real-life example. We note that although the Cox proportional hazards model we used for estimating the marginal hazard ratio may be subject to misspecification, the estimate still converges and has meaningful interpretations.

Item Estimation Methods with Recurrent Causal Events (University of Waterloo, 2024-08-21) Zhang, Wenling; Cotton, Cecilia; Wen, Lan
This dissertation presents a comprehensive exploration on causal effects of treatment strategies on recurrent events within complex longitudinal settings.
Utilizing a series of advanced statistical methodologies, this work focuses on addressing challenges in causal inference when faced with the complexities related to various treatment strategies, recurrent outcomes and time-varying covariates that are confounded or censored. The first chapter lays the groundwork by introducing two real-life datasets that provide a practical context for investigating recurrent causal events. In this chapter, we establish the foundation of essential concepts and terminologies. An overview of conventional causal estimands and various estimation methods in non-recurrent event settings is described, providing the necessary tools and knowledge base for effective causal analysis in more intricate longitudinal studies with recurrent event outcomes discussed in subsequent chapters. Chapter two extends the traditional time-fixed measure of marginal odds ratios (MORs) to a more complex, causal longitudinal setting. The novel Aggregated Marginal Odds Ratio (AMOR) is introduced to manage scenarios where treatment varies in time and outcome also recurs. Through Monte Carlo simulations, we demonstrate that AMOR can be estimated with low bias and stable variance, when employing appropriate stabilized weight models, for both absorbing and non-absorbing treatment settings. With the 1997 National Longitudinal Study of Youth dataset, we investigate the causal effect of youth smoking on their recurrent enrollment and dropout from school, with the proposed AMOR estimator. In the third chapter, the focus shifts to the causal effect of static treatment on recurrent event outcomes with time-varying covariates. We derive the identifying assumptions and employ a variety of estimators for the average causal effect estimation, addressing the issues of time-varying confounding and censoring. We conduct simulations to verify the robustness of these methods against potential model misspecifications. 
Among the proposed estimators, we conclude that the targeted maximum likelihood (TML) estimator is the appropriate one for complex longitudinal settings. Therefore, we apply targeted maximum likelihood estimation to the Systolic Blood Pressure Intervention Trial (SPRINT) dataset. Adopting an intention-to-treat analysis, we estimate the average causal effect of intensive versus standard blood pressure lowering therapy on acute kidney injury recurrences for participants surviving the first four years of SPRINT. Chapter four further investigates the average causal effect of time-varying treatments on the recurrence outcome of interest with censoring. Building on the methodologies in Chapter three, this chapter explores the singly and doubly robust estimators, especially the TML estimator, in the time-varying treatment context. Then simulation studies are conducted to support the theoretical derivations and validate the robustness of the estimators. The application of the proposed methods to the SPRINT data yields some insightful findings. By incorporating participants' medication adherence levels over time as part of the treatment, we are able to investigate various adherence-related questions and to shift from intention-to-treat to per-protocol analysis when estimating causal effects comparing the intensive versus standard blood pressure therapies. The dissertation concludes with a summary of the main findings and a discussion of significant and promising areas for future research in Chapter five. The studies conducted demonstrate the potential of advanced causal inference methods in handling the complexities of longitudinal data in medical and social research, offering valuable insights into how treatment strategies affect the recurrent causal outcomes over time.
This work not only contributes to the theoretical advancements in statistical methodologies but also provides practical implications for the analysis of clinical trials and observational studies involving recurrent events.

Item Event History Analysis in Longitudinal Cohort Studies with Intermittent Inspection Times (University of Waterloo, 2016-01-20) Zhu, Yayuan; Lawless, Jerald; Cotton, Cecilia
Event history studies based on disease clinic data often face several complications. Specifically, patients visit the clinic irregularly, and the intermittent inspection times depend on the history of disease-related variables; this can cause event or failure times to be dependently interval-censored. Furthermore, failure times could be truncated, treatment assignment is non-randomized and can be confounded, and there are competing risks of the failure time outcomes under study. I propose a class of inverse probability weights applied to estimating functions so that the informative inspection scheme and confounded treatment are appropriately dealt with. As a result, the distribution of failure time outcomes can be consistently estimated. I consider parametric, non- and semi-parametric estimation. Monotone smoothing techniques are employed in a two-stage estimation procedure for the non- or semi-parametric estimation. Simulations for a variety of failure time models are conducted to examine the finite-sample performance of the proposed estimators.
This research is initially motivated by the Psoriatic Arthritis (PsA) Toronto Cohort Study at the Toronto Western Hospital and the proposed methodologies are applied to this cohort study as an illustration.

Item Marginal Causal Sub-Group Analysis with Incomplete Covariate Data (University of Waterloo, 2019-01-11) Cuerden, Meaghan; Cook, Richard; Cotton, Cecilia; Diao, Liqun
Incomplete data arises frequently in health research studies designed to investigate the causal relationship between a treatment or exposure, and a response of interest. Statistical methods for conditional causal effect parameters in the setting of incomplete data have been developed, and we expand upon these methods for estimating marginal causal effect parameters. This thesis focuses on the estimation of marginal causal odds ratios, which are distinct from conditional causal odds ratios in logistic regression models; marginal causal odds ratios are frequently of interest in population studies. We introduce three methods for estimating the marginal causal odds ratio of a binary response for different levels of a subgroup variable, where the subgroup variable is incomplete. In each chapter, the subgroup variable, exposure variable and the response variable are binary and the subgroup variable is missing at random. In Chapter 2, we begin with an overview of inverse probability weighted methods for confounding in an observational setting where data are complete. We also briefly review methods to deal with incomplete data in a randomized setting. We then introduce a doubly inverse probability weighted estimating equation approach to estimate marginal causal odds ratios in an observational setting, where an important subgroup variable is incomplete. One inverse probability weight accounts for the incomplete data, and the other weight accounts for treatment selection. Only complete cases are included in the response model.
Consistency results are derived, and a method to obtain estimates of the asymptotic standard error is introduced; the extra variability introduced by estimating two weights is incorporated in the estimation of the asymptotic standard error. We give a method for hypothesis testing and calculation of confidence intervals. Simulation studies show that the doubly weighted estimating equation approach is effective in a non-ignorable missingness setting with confounding, and it is straightforward to implement. It also performs well when the missing data process is ignorable, and/or when confounding is not present. In Chapter 3, we begin with an overview of an EM algorithm approach for estimating conditional causal effect parameters in the setting of incomplete covariate data, in both randomized and observational settings. We then propose the use of a doubly weighted EM-type algorithm approach to estimate the marginal causal odds ratio in the setting of missing subgroup data. In this method, instead of using complete case analysis in the response model, all available data is used and the incomplete subgroup variable is “filled in” using a maximum likelihood approach. Two inverse probability weights are used here as well, to account for confounding and incomplete data. The weight which accounts for the incomplete data is needed, even though an EM approach is being used, because the marginal causal odds ratio is of interest. A method to obtain asymptotic standard error estimates is given where the extra variability introduced by estimating the two inverse probability weights, as well as the variability introduced by estimating the conditional expectation of the incomplete subgroup variable, is incorporated. 
Simulation studies show that this method is effective in terms of obtaining consistent estimates of the parameters of interest; however it is difficult to implement, and in certain settings there is a loss of efficiency in comparison to the methods introduced in Chapter 2. In Chapter 4, we begin by reviewing multiple imputation methods in randomized and observational settings, where estimation of the conditional causal odds ratio is of interest. We then propose the use of multiple imputation with one inverse probability weight to account for confounding in an observational setting where the subgroup variable is incomplete. We discuss methods to correctly specify the imputation model in the setting where the conditional causal odds ratio is of interest, as well as in the setting where the marginal causal odds ratio is of interest. We use standard methods for combining the estimates of the marginal log odds ratios from each imputed dataset. We propose a method for estimating the asymptotic standard error of the estimates, which incorporates both the estimation of the parameters in the weight for confounding, and the multiply imputed datasets. We give a method for hypothesis testing and calculation of confidence intervals. Simulation studies show that this method is efficient and straightforward to implement, but correct specification of the imputation model is necessary. In Chapter 5, the three methods that have been introduced are used in an application to an observational cohort study of 418 colorectal cancer patients. We compare patients who received an experimental chemotherapy with patients who received standard chemotherapy; of interest is estimation of the marginal causal odds ratio of a thrombotic event during the course of treatment or 30 days after treatment is discontinued. The important subgroups are (i) patients receiving first line of treatment, and (ii) patients receiving second line of treatment. 
In Chapter 6, we compare and contrast the three methods proposed. We also discuss extensions to different response models, models for missing response data, and weighted models in the longitudinal data setting.
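
The inverse probability weighting idea that recurs in the abstracts above can be illustrated with a minimal sketch. It is not the doubly weighted estimator from the thesis: it assumes complete data, a single binary confounder, and a propensity score estimated by simple stratum proportions. The function names and the toy data set are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def odds(p):
    """Convert a probability strictly between 0 and 1 into odds."""
    return p / (1.0 - p)


def crude_odds_ratio(records):
    """Unadjusted odds ratio, treated (a=1) versus untreated (a=0).

    records: iterable of (confounder, treated, outcome) with binary treated/outcome.
    """
    events = [0, 0]
    totals = [0, 0]
    for _, a, y in records:
        totals[a] += 1
        events[a] += y
    risk = [events[a] / totals[a] for a in (0, 1)]
    return odds(risk[1]) / odds(risk[0])


def iptw_marginal_odds_ratio(records):
    """Marginal odds ratio from inverse-probability-of-treatment-weighted risks.

    Assumption: the propensity score P(treated = 1 | confounder) is estimated
    by the observed treated proportion within each confounder stratum, and
    every stratum contains both treated and untreated subjects.
    """
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for x, a, _ in records:
        n[x] += 1
        n_treated[x] += a
    propensity = {x: n_treated[x] / n[x] for x in n}

    weight_sum = [0.0, 0.0]
    weighted_events = [0.0, 0.0]
    for x, a, y in records:
        # Probability of the treatment actually received, given the confounder.
        p_received = propensity[x] if a == 1 else 1.0 - propensity[x]
        w = 1.0 / p_received
        weight_sum[a] += w
        weighted_events[a] += w * y

    # Normalized weighted risks in the pseudo-population for each arm.
    risk = [weighted_events[a] / weight_sum[a] for a in (0, 1)]
    return odds(risk[1]) / odds(risk[0])


def example_records():
    """Hypothetical toy data: (confounder, treated, outcome)."""
    rows = []
    rows += [(0, 1, 1)] * 1 + [(0, 1, 0)] * 1   # stratum 0, treated
    rows += [(0, 0, 1)] * 2 + [(0, 0, 0)] * 6   # stratum 0, untreated
    rows += [(1, 1, 1)] * 6 + [(1, 1, 0)] * 2   # stratum 1, treated
    rows += [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1   # stratum 1, untreated
    return rows


if __name__ == "__main__":
    data = example_records()
    print("crude OR:", round(crude_odds_ratio(data), 3))
    print("IPTW marginal OR:", round(iptw_marginal_odds_ratio(data), 3))
```

In this toy data set the confounder raises both the chance of treatment and the chance of the outcome, so the crude odds ratio (49/9, about 5.44) overstates the weighted marginal odds ratio (25/9, about 2.78). Handling an incomplete subgroup variable, as the thesis does, would require a second weight or an imputation step that this sketch omits.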