Bayesian Nonparametric Dirichlet Process Mixture Modeling in Transportation Safety Studies
In transportation safety studies, it is often necessary to account for unobserved heterogeneity and multimodality in data. The commonly used standard generalized linear models (e.g., Poisson-gamma models) do not fully address unobserved heterogeneity because they assume unimodal exponential-family distributions. This thesis illustrates how restrictive assumptions common to most road safety studies, such as unimodality, can be relaxed by employing Bayesian nonparametric Dirichlet process mixture models. We use a truncated Dirichlet process, so that our models reduce to the form of finite mixture (latent class) models, which can be estimated with standard Markov chain Monte Carlo methods while keeping computation simple. Notably, our approach estimates the number of latent subpopulations as part of the inference rather than fixing it in advance. We use pseudo Bayes factors for model selection, showing how different assumptions can affect the predictive capability of models. In univariate settings, we extend standard generalized linear models to a Dirichlet process mixture generalized linear model in which the density of the random intercepts is modeled nonparametrically, adding flexibility to the model. We examine the performance of the proposed approach using both simulated and real data, including its ability to replicate datasets with high proportions of zero crashes. In terms of engineering insights, we provide a policy example related to the identification of high-crash locations, a critical component of the transportation safety management process. With respect to multilevel settings, this thesis introduces a flexible latent class multilevel model for analyzing hierarchical crash data. We extend the standard multilevel model by accounting for unobserved cross-group heterogeneity through multimodal intercepts (group effects).
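The reduction from a truncated Dirichlet process to a finite mixture can be illustrated generatively. Under the standard truncated stick-breaking construction, v_k ~ Beta(1, alpha) for k < K and v_K = 1, with weights w_k = v_k * prod_{j<k}(1 - v_j). Fixing the last break at 1 makes the K weights sum to one, so the prior on the random intercepts becomes a K-component finite mixture. The Python below is a minimal simulation sketch, assuming a Poisson log-link GLM with a single covariate; the function names, atom values, alpha, and K are hypothetical illustrations rather than the thesis's specification. It only generates data and does not perform the MCMC estimation.

```python
import math
import random


def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha.

    The last break is fixed at 1 so the K weights sum to one, turning the
    DP prior into a K-component finite mixture (latent class) prior.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet allocated
    for k in range(K):
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def poisson_draw(lam, rng):
    """Hypothetical helper: Knuth's Poisson sampler, fine for small means."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_dp_mixture_glm(n_sites, alpha, K, atoms, beta, seed=0):
    """Simulate site-level crash counts with DP-mixture random intercepts.

    atoms: K hypothetical intercept values (one per latent class).
    beta: slope on one hypothetical covariate x drawn uniformly on [0, 1].
    """
    if len(atoms) != K:
        raise ValueError("need exactly K atoms")
    rng = random.Random(seed)
    w = stick_breaking_weights(alpha, K, rng)
    labels, counts = [], []
    for _ in range(n_sites):
        z = rng.choices(range(K), weights=w)[0]  # latent class of the site
        x = rng.uniform(0.0, 1.0)
        mu = math.exp(atoms[z] + beta * x)  # log link
        labels.append(z)
        counts.append(poisson_draw(mu, rng))
    return w, labels, counts
```

For example, `simulate_dp_mixture_glm(200, 1.0, 10, atoms, 0.8, seed=1)` with ten hypothetical atoms returns the mixture weights, each site's latent class, and its count; `len(set(labels))` is the number of occupied classes, which can never exceed K. In a fitted model, tracking the number of occupied components across MCMC draws is one common way to summarize how many subpopulations the data support.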
The proposed method allows latent subpopulations (and consequently outliers) to be identified at the highest level of the hierarchy (e.g., geographic areas). We evaluate our method on two recent railway grade crossing crash datasets from Canada. This research confirms the need for a multilevel approach for both datasets because of spatial dependencies among crossings nested within the same region. We also provide a novel approach to benchmarking regions based on their safety performance measures. To this end, we identify latent clusters of regions that share similar, as yet unidentified, features, motivating further investigation into the reasons behind such similarities and dissimilarities. This could have important policy implications for various safety management programs. This thesis also investigates inference for multivariate crash data by introducing two flexible Bayesian multivariate models: a multivariate mixture of points and a mixture of multivariate normal densities. We use a Dirichlet process mixture to keep the dependence structure unconstrained, relaxing the usual homogeneity assumptions, and we allow for interdependence between outcomes through a Dirichlet process prior on the density of the random intercepts. The resulting models collapse into a form of latent class multivariate model, an appealing way to address unobserved heterogeneity in multivariate settings. The multivariate models derived in this thesis therefore account for correlation among crash types through a heterogeneous correlation structure, which better captures the complex structure of correlated data. To our knowledge, this is the first study to propose and apply such a model in the transportation literature.
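Why a mixture prior yields a heterogeneous dependence structure can be seen from the standard moment formula for a finite mixture: with weights w_k, component means mu_k, and component covariances Sigma_k, the overall mean is m = sum_k w_k mu_k and the overall covariance is sum_k w_k [Sigma_k + (mu_k - m)(mu_k - m)^T]. In a mixture of points each Sigma_k is zero, so all dependence comes from where the atoms sit; a mixture of multivariate normals adds class-specific covariances. The Python below is a minimal sketch of that formula only, not the thesis's estimation code; the function names and numbers are hypothetical.

```python
def mixture_moments(weights, means, covs=None):
    """Mean vector and covariance matrix of a finite mixture in d dimensions.

    covs=None treats every component as a point mass (mixture of points);
    otherwise covs[k] is the d x d covariance of component k
    (mixture of multivariate normals).
    """
    d = len(means[0])
    m = [sum(w * mu[i] for w, mu in zip(weights, means)) for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for k, (w, mu) in enumerate(zip(weights, means)):
        dev = [mu[i] - m[i] for i in range(d)]  # between-class deviation
        for i in range(d):
            for j in range(d):
                within = covs[k][i][j] if covs is not None else 0.0
                cov[i][j] += w * (within + dev[i] * dev[j])
    return m, cov


def correlation(cov, i, j):
    """Pearson correlation implied by a covariance matrix."""
    return cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5
```

As a usage example, two equally weighted classes with identical means and within-class correlations of +0.8 and -0.8 give an overall correlation of exactly zero, even though neither class is uncorrelated; a single homogeneous covariance matrix would report only that zero and miss the class-specific dependence.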
Using a highway injury-severity dataset, we illustrate how robustness to the assumption of a homogeneous correlation structure can be examined with a multivariate mixture of points model, which relaxes the homogeneity assumption with respect to the location of the dependence structure. We then use the mixture of multivariate normal densities model, which relaxes the homogeneity assumption with respect to both the location and the covariance matrix, to investigate the effects of various factors on pedestrian and cyclist safety in an urban setting, modeling both outcomes simultaneously. To our knowledge, this is the first study to conduct a joint safety analysis of active modes at the intersection level (a micro level), which is expected to provide more detailed insights. We show how spurious assumptions affect the predictive performance of the multivariate model and the interpretation of the explanatory variables through marginal effects. The results show that our flexible model specification better captures the underlying structure of pedestrian and cyclist crash data, yielding a more accurate model that contributes to a better understanding of the safety correlates of non-motorized road users. This, in turn, helps decision-makers select more appropriate countermeasures targeting vulnerable road users, promoting the mobility and safety of active modes of transportation.
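Predictive comparisons of this kind are what the pseudo Bayes factors used for model selection in this thesis summarize. A common estimator, assumed here since the abstract does not state the exact computation, uses the conditional predictive ordinate CPO_i = [ (1/S) sum_s 1 / p(y_i | theta_s) ]^(-1) over S MCMC draws, the log pseudo-marginal likelihood LPML = sum_i log CPO_i, and log PsBF(A, B) = LPML_A - LPML_B. The Python below is a minimal sketch assuming pointwise log-likelihoods have already been extracted from MCMC output; the function names are hypothetical.

```python
import math


def log_cpo(loglik_draws):
    """Log conditional predictive ordinate for one observation.

    loglik_draws: log p(y_i | theta_s) for MCMC draws s = 1..S.
    CPO_i is the harmonic mean of the likelihoods; it is computed on the
    log scale with a log-sum-exp to avoid overflow in 1 / p.
    """
    neg = [-ll for ll in loglik_draws]
    top = max(neg)
    log_mean_inv = (top + math.log(sum(math.exp(v - top) for v in neg))
                    - math.log(len(neg)))
    return -log_mean_inv


def lpml(loglik_matrix):
    """Log pseudo-marginal likelihood; loglik_matrix[i][s] = log p(y_i | theta_s)."""
    return sum(log_cpo(row) for row in loglik_matrix)


def log_pseudo_bayes_factor(loglik_a, loglik_b):
    """log PsBF of model A versus model B; positive values favor model A."""
    return lpml(loglik_a) - lpml(loglik_b)
```

Working on the log scale matters in practice: individual likelihoods for crash counts can be tiny, and averaging their reciprocals directly can overflow, whereas the log-sum-exp form stays stable.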
Shahram Heydari (2017). Bayesian Nonparametric Dirichlet Process Mixture Modeling in Transportation Safety Studies. UWSpace. http://hdl.handle.net/10012/12118