Quantifying and Mitigating Uncertainty in Crash Risk Prediction for Road Safety Analysis

Date

2025-09-17

Advisor

Fu, Liping

Publisher

University of Waterloo

Abstract

Road safety analysis is a cornerstone of traffic safety management programs like Vision Zero, which aim to eliminate fatalities and serious injuries on roadways. Central to road safety analysis is the ability to accurately predict crash risk; however, this task is challenged by significant uncertainty arising from the random nature of crashes (aleatoric uncertainty) and limitations in data and modeling (epistemic uncertainty). These uncertainties can lead to the misidentification of hazardous locations, resulting in false positives and negatives, and the inefficient allocation of limited safety resources. While numerous statistical models exist for risk prediction, most traditional crash-based approaches provide simple point estimates, failing to formally quantify the inherent uncertainty in their predictions. Proactive conflict-based analysis has emerged as a promising alternative that avoids direct reliance on sparse crash data, but its application introduces new methodological challenges. The reliability of conflict-based predictions is not well understood, and key methodological choices, such as the duration of data collection and the selection of analytical thresholds for Extreme Value Theory (EVT) models, introduce significant, often unaddressed, uncertainty into the results.
To overcome these challenges, this thesis systematically develops and evaluates a framework to quantify, investigate, and reduce critical sources of uncertainty in road safety analysis. First, to quantify the impact of uncertainty on network screening, a frequentist approach is employed to establish a joint confidence region (CR) for hotspot rankings, moving beyond simple point estimates. This is achieved by first estimating the confidence interval (CI) of risk for each location using a hierarchical Full Bayesian (FB) model that considers both crash frequency and severity. Second, this research investigates a primary source of data uncertainty in conflict-based analysis by systematically assessing the relationship between sample size and prediction reliability using a unique, year-long LiDAR dataset and a Bayesian Peak-Over-Threshold (POT) EVT model. Third, to address methodological uncertainty in EVT, an automated and objective approach for threshold selection is developed and validated, comparing a Sequential Goodness-of-Fit Selection Method (SGFSM) with an Automatic L-moment Ratio Selection Method (ALRSM) to reduce analytical subjectivity.
The analysis demonstrates that explicitly accounting for uncertainty can lead to substantially different hotspot identifications, revealing that rankings based on point estimates alone may be unreliable. The sample size analysis reveals that the common practice of using short-term conflict data is inadequate for reliable collision predictions, a finding that challenges the validity of a significant portion of the existing literature on conflict-based safety analysis. Finally, the automated threshold selection approach, particularly the L-moment-based approach, proves to be a robust and objective method that improves the accuracy of crash risk estimation. Collectively, this research provides researchers and practitioners with an evidence-based methodology to understand, quantify, and mitigate key uncertainties in road safety analysis, fostering more reliable safety assessments and a more effective allocation of resources.

Keywords

road safety, traffic safety, crash risk prediction, uncertainty quantification, aleatoric uncertainty, epistemic uncertainty, hotspot identification, network screening, conflict-based analysis, surrogate safety measures, Extreme Value Theory (EVT), Peak-over-Threshold (POT), Generalized Pareto (GP) distribution, threshold selection, automated threshold selection, L-moment ratio, sample size analysis, data collection duration, LiDAR, Bayesian modeling, Full Bayesian (FB), hierarchical Bayesian model, joint confidence region, hotspot ranking, Post-Encroachment Time (PET)