Advances in similarity-based prediction modeling
| dc.contributor.author | Kim, Minzee | |
| dc.date.accessioned | 2026-04-28T13:03:01Z | |
| dc.date.available | 2026-04-28T13:03:01Z | |
| dc.date.issued | 2026-04-28 | |
| dc.date.submitted | 2026-04-21 | |
| dc.description.abstract | Personalized predictive modeling has been growing rapidly in recent years, especially with the availability of Electronic Health Records (EHRs). This approach aims to improve predictive performance by fitting a unique model to each individual. Each model is trained on a subset of the training data consisting of individuals who are similar to the individual we are predicting for, identified through a patient similarity metric. Studies have shown that a personalized model trained on such a customized subset leads to better prediction than a global model trained on all available training data. In this thesis, we discuss advancements in similarity-based prediction modeling through extensive simulation studies and data analyses. Longitudinal and time-to-event data are often analyzed together in biomarker research to study the association between longitudinal biomarker measurements and an event-time outcome, where the longitudinal information contributes to the probability of the outcome of interest. An attractive feature of fitting a joint model to this type of data is that the survival probability can be predicted dynamically as additional longitudinal information becomes available. In Chapter 2, we propose a new similarity-based method for the dynamic prediction of joint models, in which the model is trained on only a targeted subset of the data to improve outcome prediction. Through a comprehensive simulation study and an application to intensive care unit data on patients diagnosed with sepsis, we demonstrate that the predictive performance of dynamic prediction from joint models can be improved with our proposed similarity-based approach. Next, we develop a new patient similarity metric designed to improve the predictive performance of a personalized model for binary response data. Specifically, we introduce a weighted cosine similarity metric in Chapter 3 that extends the standard cosine similarity metric by assigning predictor-specific weights when computing similarity between participants. These weights are estimated using the relaxed adaptive group lasso. Results from our simulation study and an analysis of intensive care unit data involving patients with circulatory system disease show that although the proposed similarity metric leads to a slight deterioration in calibration, it produces substantial gains in discrimination. Overall predictive performance, measured by the Brier Score, improves because the increase in discrimination outweighs the loss in calibration; therefore, our proposed similarity metric more effectively identifies clinically similar patients, resulting in improved predictive accuracy. Finally, in Chapter 4, we conduct a comprehensive comparison of several similarity metrics to investigate how the choice of similarity metric influences predictive performance in personalized modeling, again in the context of binary response data. Fitting models using only the subset of training participants who are most similar to the individual of interest can improve prediction accuracy for that individual. Consequently, selecting an appropriate similarity metric that identifies the most relevant subset of data is critical. We compare a range of distance-based and cosine similarity measures alongside clustering-based approaches, an area that is not well explored in the existing literature. In addition, we perform an extensive simulation study to examine how different data-generating mechanisms and underlying dataset characteristics affect the relative effectiveness of each similarity metric. We conclude with a discussion chapter that summarizes the key contributions of the thesis and highlights key areas for future work. | |
| dc.identifier.uri | https://hdl.handle.net/10012/23072 | |
| dc.language.iso | en | |
| dc.pending | false | |
| dc.publisher | University of Waterloo | en |
| dc.subject | personalized prediction | |
| dc.subject | precision medicine | |
| dc.subject | joint modeling | |
| dc.subject | similarity metric | |
| dc.subject | cosine similarity | |
| dc.subject | clustering | |
| dc.subject | similarity-based modeling | |
| dc.title | Advances in similarity-based prediction modeling | |
| dc.type | Doctoral Thesis | |
| uws-etd.degree | Doctor of Philosophy | |
| uws-etd.degree.department | Statistics and Actuarial Science | |
| uws-etd.degree.discipline | Statistics | |
| uws-etd.degree.grantor | University of Waterloo | en |
| uws-etd.embargo.terms | 4 months | |
| uws.contributor.advisor | Dubin, Joel | |
| uws.contributor.affiliation1 | Faculty of Mathematics | |
| uws.peerReviewStatus | Unreviewed | en |
| uws.published.city | Waterloo | en |
| uws.published.country | Canada | en |
| uws.published.province | Ontario | en |
| uws.scholarLevel | Graduate | en |
| uws.typeOfResource | Text | en |