Nonparametric Methods for Road Safety Analysis

Thakali, Lalita

Files

Thakali_Lalita.pdf (2.44 MB)

Date

2016-08-16

Authors

Thakali, Lalita

Advisor

Fu, Liping
Chen, Tao

Publisher

University of Waterloo

Abstract

Crash models for predicting long-term crash risk at specific components of a road network are fundamental to road safety analyses such as network screening and countermeasure studies. These models are often calibrated using historical crash data from the sites of interest, with the aim of capturing the underlying relationship between crash risk and various risk factors. Based on how these relationships are determined, crash models can be classified into two types: parametric and nonparametric. Parametric models represent the state of the art and the state of practice in road safety analysis. While this approach provides an easy-to-implement and easy-to-interpret tool, it requires pre-selecting a model form, which, without knowledge of the true relationship between crashes and risk factors, can easily lead to misspecification and biased estimates. In contrast, a nonparametric approach does not pre-specify a model structure but instead determines the structure from data, thereby providing greater flexibility to capture complex underlying relationships. Despite this advantage of being specification-free, nonparametric models have not yet been accepted as part of the mainstream methodologies for road safety analysis. Little was known about their performance relative to parametric models or about the practical implications of applying them to common road safety analysis tasks such as network screening and countermeasure effectiveness estimation. Furthermore, crash data for road safety analysis and modeling are growing steadily in size and completeness with advances in information and sensor technologies. It is, however, unclear what implications this increased data availability has for road safety analyses in general and crash modeling in particular. Will a data-driven nonparametric technique become a more attractive alternative for addressing the complex problem of crash modeling in this era of Big Data?
In this thesis, we introduced one of the most popular nonparametric techniques, kernel regression (KR), as an alternative for crash modeling. A distinguishing feature of this method is that it takes a fully data-driven approach to determining the relationship between crash frequency and risk factors. Compared to other nonparametric methods, it does not contain any hidden structures to train. Therefore, when a new crash dataset becomes available, it can be used directly to update crash predictions without re-calibrating the underlying models. We made two methodological contributions to facilitate the application of nonparametric models in road safety analyses. We first extended the KR method, in a manner similar to the Empirical Bayesian (EB) method used with parametric models, to account for site-specific crash history in predicting risk. We then developed a bootstrap-based algorithm for identifying the important variables to be included in a nonparametric model. The research also made significant contributions to practical knowledge about applying nonparametric models to road safety analyses. First, we benchmarked the crash prediction performance of the KR model against the mainstream model, the Negative Binomial (NB) model. Using three large crash datasets, we investigated the performance of the KR and NB models as a function of the amount of training data. Through a rigorous bootstrapping validation process, we found that the two approaches exhibit strikingly different patterns, especially in terms of sensitivity to data size. While the performance of the KR method improved significantly as data size increased, the NB model showed less sensitivity. Meanwhile, the KR method outperformed the NB model in predictive performance, and that advantage grew noticeably with data size. Second, we compared the two approaches in their ability to capture the underlying complex relationships between crash frequency and predictor variables.
The KR method was shown to yield more sensible results on the effects of various risk factors in both case studies as compared to the NB model. Our other main contribution comes from investigating the practical implications of applying KR models to two critical road safety analysis tasks: network screening and countermeasure studies. Both the KR method and the NB model were employed in a case study under the two popular network screening frameworks, i.e., regression-based and EB-based. Their performance was compared in terms of site ranking and identification of crash hotspots. The two approaches were found to yield more similar rankings when applied in the EB-based framework than in the regression-based framework, irrespective of the ranking measure (i.e., crash frequency or crash rate). Similar comparative results were obtained in locating crash hotspots. Likewise, for countermeasure studies, the two popular approaches, the before-after EB study and the cross-sectional study, were considered in case studies using both KR and NB crash prediction models. As expected, the two crash modeling techniques showed significant differences in their estimates of crash modification factors (CMFs). Unlike the NB-based approach, the KR-based method was able to capture the sensitivity of CMFs to traffic levels and to combine the effects of multiple countermeasures without requiring any assumptions about the interaction between the countermeasures.
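
As an illustration of the idea that kernel regression has no hidden structure to train, the following is a minimal sketch of a Nadaraya-Watson estimator, one common form of kernel regression. It is not the thesis's implementation: the abstract does not state the kernel or bandwidth used, so the Gaussian kernel, the function name `kr_predict`, the `bandwidth` parameter, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def kr_predict(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate: kernel-weighted mean of observed crash counts.

    x_train: (n_sites, n_factors) risk factors; y_train: (n_sites,) crash counts.
    The Gaussian kernel and the bandwidth value are illustrative assumptions.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_2d(np.asarray(x_query, dtype=float))
    # Scaled difference between every query point and every training site.
    diff = (x_query[:, None, :] - x_train[None, :, :]) / bandwidth
    # Gaussian product kernel: nearby sites receive larger weights.
    weights = np.exp(-0.5 * np.sum(diff ** 2, axis=2))
    return weights @ y_train / weights.sum(axis=1)

# Hypothetical data: one risk factor (traffic volume, arbitrary units).
x = np.array([[5.0], [10.0], [15.0], [20.0]])
y = np.array([1.0, 2.0, 4.0, 5.0])
before = kr_predict(x, y, [[12.5]], bandwidth=5.0)

# "Updating" with new data means appending it; there is no model to refit.
x_new = np.vstack([x, [[12.5]]])
y_new = np.append(y, 9.0)
after = kr_predict(x_new, y_new, [[12.5]], bandwidth=5.0)
print(before, after)
```

In this sketch the prediction at a site is just a weighted average of the stored observations, which is why new crash records can be used directly; the bandwidth would still need to be chosen in practice, for example by cross-validation.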