Mixture Models for Coarsened Multivariate Failure Time Data

Cook, Richard; Jiang, Shu
2018-08-13; 2018-12-12; 2018-08-13; 2018-08-08
http://hdl.handle.net/10012/13572

The aim of this thesis is to develop statistical methodology for the analysis of life history data under incomplete observation schemes and with latent features which must be accommodated to ensure models provide a reasonable representation of the processes of interest and advance scientific understanding.

Life history data frequently arise in health studies of disease processes in which individuals pass through a series of stages of disease. Multistate models offer an appealing approach to modelling processes in settings where the disease course can be meaningfully characterized by a finite number of disjoint stages, and we adopt such models for much of the research in this thesis. In many instances, because processes are only observed intermittently, the precise number, types and times of transitions between assessments are not available. For failure time processes, at most a single transition can occur between assessments and the resulting data are called interval-censored failure time data. For more general multistate processes, intermittent observation is referred to as a panel data observation scheme. We investigate problems related to interval-censored data throughout this thesis, and consider a more extreme form of incomplete data due to aggregation. The term coarsened data is used to unify these settings.

Despite careful attempts to collect and exploit available information to characterize the dynamic features of life history processes, substantial unexplained variability often exists between individuals or groups of individuals. Heterogeneity can be accommodated in various ways. Finite mixture models can be specified to accommodate distinct classes, or sub-populations, in which different disease processes govern progression in the different classes; latent class models are often used when class membership is fixed.
When there are two classes and no disease progression occurs in one class, so-called cure rate models are often used. Classical mixture models with continuous random effects are also often used to account for heterogeneity characterized by more finely graded unexplained variation. This approach is often used in frailty models for survival data or, more generally, to accommodate between-cluster variation in clustered data.

In this thesis, the focus is on methods for statistical modeling and inference for multivariate failure time and multistate processes subject to intermittent observation; the resulting data are interval-censored multivariate failure time data and panel data, respectively.

Finite mixture models offer a powerful approach for accommodating heterogeneity when there are distinct types of processes present in a population, with each latent sub-population following one such process. Methods for fitting finite mixture models and conducting score tests for genetic markers are developed in Chapter 2 for a problem involving heterogeneous multistate processes under intermittent observation.

When there are multiple marginal processes of interest, the correlation between such processes must be taken into account. In Chapter 3 we develop multivariate models for the joint analysis of marginal processes. Copula models are popular for modeling the correlation between marginal failure time processes, while odds ratios are commonly used to capture the association between binary variables. Through the use of multivariate mixture models, the dependence structure can be decomposed into one component for susceptibility and one for the failure times given joint susceptibility.

Mixed multistate processes involving aggregate data are developed in Chapters 4 and 5. The computational challenges are addressed through the use of composite likelihood.
We deal with between-cluster variation and within-cluster correlation in both chapters and propose two approaches for such data. Specifically, we propose a marginal approach in which we introduce dependence modeling via copulas, propose a composite likelihood and derive procedures for inference. A random effect model is also formulated in which a cluster-level latent variable accommodates heterogeneity between clusters. An optimal cost-effective design is also proposed which gives insights regarding the efficiency of studies involving aggregation and tracking. In Chapter 5, sample size criteria are developed to meet design objectives, and cost-effective optimal allocations of clusters to the tracking and aggregate observation schemes are developed.

en
Doctoral Thesis