Statistical Models and Methods for Dependent Life History Processes

Lee, Jooyoung

Files

Lee_Jooyoung.pdf (2.67 MB)

Date

2018-11-28

Authors

Lee, Jooyoung

Advisor

Cook, Richard

Publisher

University of Waterloo

Abstract

This thesis deals with statistical issues in the analysis of complex life history processes that exhibit heterogeneity and dependence. We are motivated by three specific types of processes: i) processes featuring recurrent episodic conditions, ii) multi-type recurrent events, and iii) clustered multistate processes as arise in family studies. In chronic diseases featuring recurrent episodic conditions, symptom onset is followed by a period during which symptoms are present until recovery. Analyses of data from such processes are often based only on the recurrent onset of disease, ignoring the duration of symptoms. This loss of information may lead to incorrect conclusions. In Chapter 2, we propose a novel model for an alternating two-state process, comprising a symptom-free state and a symptomatic state, that accounts for the duration of symptoms. This approach reflects the dynamics of an individual's disease process and helps in understanding the course of disease. Intensity-based models with multiplicative random effects are considered in which the disease onset time is governed by a conditionally Markov intensity and the time of recovery is governed by a conditionally semi-Markov intensity. A bivariate random effect with one multiplicative component for each intensity is introduced to accommodate between-individual heterogeneity, and allowing dependence between the two components offers a natural and more general framework for modeling the two-state process. A copula function is used for the joint distribution of the random effects, which retains the marginal features and gives flexible choices of dependence structure. The proposed model is semiparametric, and estimation is carried out using an expectation-maximization algorithm. This problem leads us to investigate the impact of ignoring symptom duration in a randomized trial setting.
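To make the structure of the alternating two-state model concrete, the following is a minimal simulation sketch and not the thesis's implementation. Everything specific in it is an assumption chosen for illustration: a Gaussian copula with unit-mean log-normal margins for the bivariate random effect, a constant baseline onset intensity, and a Weibull baseline for recovery measured in time since onset (the semi-Markov clock). The function and parameter names are hypothetical.

```python
import math
import random


def draw_random_effects(rng, sigma_onset, sigma_recovery, rho):
    """Draw (w_onset, w_recovery) from a Gaussian copula with log-normal margins.

    The copula family and the margins are assumptions made for this sketch.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # exp(sigma * z - sigma^2 / 2) is the log-normal quantile evaluated at Phi(z),
    # so (z1, z2) carries the copula dependence and each margin has mean 1.
    w_onset = math.exp(sigma_onset * z1 - 0.5 * sigma_onset ** 2)
    w_recovery = math.exp(sigma_recovery * z2 - 0.5 * sigma_recovery ** 2)
    return w_onset, w_recovery


def simulate_episodes(rng, tau, onset_rate, weibull_rate, weibull_shape,
                      w_onset, w_recovery):
    """Simulate one individual's (onset, recovery) times over follow-up (0, tau]."""
    episodes = []
    t = 0.0
    while True:
        # Symptom-free state: constant conditional intensity w_onset * onset_rate.
        t += rng.expovariate(w_onset * onset_rate)
        if t >= tau:
            break
        onset = t
        # Symptomatic state: conditional cumulative hazard
        # w_recovery * (weibull_rate * s) ** weibull_shape, with s = time since onset.
        e = rng.expovariate(1.0)
        duration = (e / w_recovery) ** (1.0 / weibull_shape) / weibull_rate
        t = onset + duration
        episodes.append((onset, min(t, tau)))  # recovery is censored at tau
        if t >= tau:
            break
    return episodes


rng = random.Random(2018)
w1, w2 = draw_random_effects(rng, sigma_onset=0.5, sigma_recovery=0.5, rho=0.6)
episodes = simulate_episodes(rng, tau=10.0, onset_rate=0.8, weibull_rate=2.0,
                             weibull_shape=1.5, w_onset=w1, w_recovery=w2)
```

Setting `rho` to zero gives independent random effects, and setting both `sigma` values to zero removes between-individual heterogeneity altogether.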
In Chapter 3, we define two risk sets for recurrent event analyses: one retains individuals in the risk set during their symptomatic periods, and the other excludes them during symptomatic periods. In a clinical trial, the balance between treatment groups in unmeasured confounders present at the time of randomization can be lost after randomization as the risk set changes; thus, retaining individuals in the risk set is a common approach. Here we examine the asymptotic and empirical biases of estimators from rate-based models when the two different risk sets are applied. We assume that the true underlying process is an alternating two-state process in which the true risk set excludes individuals while they are experiencing an exacerbation. We consider two scenarios for the true model. In the first, there is no between-individual variation in either process and no dependence between the two processes. In the second, the true model is the dependent alternating two-state model proposed in Chapter 2. Issues of model misspecification and causal inference are considered. When the focus is on clinical trials, the power implications of risk set misspecification are of interest.
In Chapter 4, attention is directed at multiple recurrent events where each endpoint is of interest. The use of a composite endpoint, defined as the time of the first event of any type, is a simple way to analyse such data. However, when multiple events are of comparable importance, a composite endpoint analysis may not be suitable. We propose a copula-based model for multi-type recurrent events in which each type of recurrent event process arises from a mixed-Poisson model and the random effects are linked across event types through a copula function. When more than two types of events are considered, composite likelihood is adopted to ease the computational burden, and simultaneous and two-stage estimation are explored.
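The multi-type recurrent event structure described for Chapter 4 can be illustrated with a small simulation. This is a sketch under stated assumptions, not the thesis's model specification: it assumes an exchangeable Gaussian copula, unit-mean log-normal random effects, and constant baseline rates, and all function and parameter names are hypothetical.

```python
import math
import random


def draw_linked_effects(rng, sigmas, rho):
    """One random effect per event type, linked by an exchangeable Gaussian copula.

    Assumes 0 <= rho < 1; the copula family and the log-normal margins are
    illustrative choices, not taken from the thesis.
    """
    shared = rng.gauss(0.0, 1.0)
    effects = []
    for sigma in sigmas:
        # Each z is standard normal and any two have correlation rho.
        z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        effects.append(math.exp(sigma * z - 0.5 * sigma ** 2))  # unit-mean margin
    return effects


def simulate_event_times(rng, tau, rate, effect):
    """Event times before tau from a Poisson process with rate effect * rate."""
    times = []
    t = rng.expovariate(effect * rate)
    while t < tau:
        times.append(t)
        t += rng.expovariate(effect * rate)
    return times


def simulate_individual(rng, tau, rates, sigmas, rho):
    """Return one list of event times per event type for a single individual."""
    effects = draw_linked_effects(rng, sigmas, rho)
    return [simulate_event_times(rng, tau, r, w) for r, w in zip(rates, effects)]


rng = random.Random(2018)
paths = simulate_individual(rng, tau=5.0, rates=[1.0, 0.5, 2.0],
                            sigmas=[0.4, 0.4, 0.4], rho=0.5)
```

Conditional on its random effect each event type follows a Poisson process, so marginally each type is mixed-Poisson, while the shared component induces dependence between types within an individual.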
An aim of family studies is typically to gain knowledge about factors governing the inheritance of diseases. One may be interested in examining the dependence in disease onset between family members and in identifying genetic markers associated with heritable disease. A common procedure is to collect families through probands: affected individuals are selected from a disease registry and their family members (non-probands) are then recruited for examination. This approach to sampling families motivates us to consider the disease onset process along with survival, since the proband must be diseased and alive to be recruited, and family members may need to be alive. In Chapter 5, we propose a model for a clustered illness-death process for family studies which accounts for the semi-competing risks problem for disease onset as well as biased sampling. We model within-family association in the age of disease onset via a copula function applied to the possibly latent disease onset times, and incorporate survival through a marginal illness-death model. The ascertainment condition is reflected in the likelihood or composite likelihood construction. Two study designs regarding the recruitment of family members are considered. One involves the collection of disease history from family members via the proband or medical records. The other requires family members to undergo a medical examination, in which case they must be alive at the time of the family study. Family data alone are insufficient to estimate all of the parameters of the illness-death processes. We therefore make use of auxiliary data, including population mortality data and additional registry data, to address the estimability issue. Another source of auxiliary data is a current status survey. The issue of missing genetic markers is also addressed in each study design.