Life History Analysis with Response-Dependent Observation

Zhong, Yujie

Files

Zhong_Yujie.pdf (939.52 KB)

Date

2015-05-14

Authors

Zhong, Yujie

Publisher

University of Waterloo

Abstract

This thesis deals with statistical issues in the analysis of dependent failure time data under complex observation schemes. These observation schemes may yield right-censored, interval-censored and current status data and may also involve response-dependent selection of individuals. The contexts in which these complications arise include family studies, clinical trials, and population studies.
Chapter 2 is devoted to the development and study of statistical methods for family studies, motivated by work conducted in the Centre for Prognosis Studies in the Rheumatic Diseases at the University of Toronto. Rheumatologists at this centre are interested in studying the nature of within-family dependence in the occurrence of psoriatic arthritis (PsA) to gain insight into the genetic basis for this disease. Families are sampled by selecting members from a clinical registry of PsA patients maintained at the centre and recruiting their respective consenting family members; the member of the registry leading to the sampling of the family is called the proband. Information on the disease onset time for non-probands may be collected by recall or a review of medical records, but some non-probands simply provide their disease status at the time of assessment. As a result, family members may provide a combination of observed or right-censored onset times and current status information. Gaussian copula-based models are studied as a means of flexibly characterizing the within-family association in disease onset times. Likelihood and composite likelihood procedures are also investigated; the latter, like the estimating function approach, reduce both the need to specify high-order dependencies and the computational burden. Valid analysis of this type of data must address the response-biased sampling scheme, under which at least one affected family member (the proband) has a right-truncated onset time.
This right-truncation scheme, combined with the low incidence of disease among non-probands, means there is little information about the marginal onset time distribution from the family data alone, so we exploit auxiliary data from an independent sample of independent individuals to enhance the information on the parameters in the marginal age of onset distribution. For composite likelihood approaches, we consider simultaneous and two-stage estimation procedures; the latter greatly reduce the computational burden, especially when weakly parametric, semi-parametric, or non-parametric marginal models are adopted. The proposed models and methods are examined in simulation studies and are applied to data from the PsA family study, yielding important insight regarding the parent-of-origin hypothesis.
Cluster-randomized trials are employed when it is appropriate on ethical, practical, or contextual grounds to assign groups of individuals to receive one of two or more interventions to be compared. This design also offers a way of minimizing contamination across treatment groups and enhancing compliance. Although considerable attention has been directed at the development of sample size formulae for cluster-randomized trials with continuous or discrete outcomes, relatively little work has been done for trials involving censored event times. In Chapter 3, asymptotic theory for sample size calculations for correlated failure time data arising in cluster-randomized trials is explored. When the intervention effect is specified through a semi-parametric proportional hazards model fitted under a working independence assumption, robust variance estimates are routinely used. At the design stage, however, some model specification is required for the marginal distributions, and copula models are utilized to accommodate the within-cluster dependence.
This method is appealing since the intervention effects are specified in terms of the marginal proportional hazards formulation while the within-cluster dependence is modeled by a separate association parameter. The resulting joint model enables one to evaluate the robust sandwich variance, on which the sample size criterion for right-censored event times is based. This approach is also extended to deal with interval-censored event times and within-cluster dependence in the random right censoring times. The validity of the sample size formula in finite samples is investigated via simulation for a range of cluster sizes, censoring rates and degrees of within-cluster association among event times. The power and efficiency implications of copula misspecification are studied, along with the effect of within-cluster dependence in the censoring times. The proposed sample size formula can be applied in a broad range of practical settings, and an application to a study of otitis media is given for illustration.
Chapter 4 considers dependent failure time data in a slightly different context where the events correspond to transitions in a multistate model. A central goal in oncology is the reduction of mortality due to cancer. The therapeutic advances in the treatment of many cancers and the increasing pressure to ensure experimental treatments are evaluated in a timely and cost-effective manner have made it challenging to design feasible trials with adequate power to detect clinically important effects based on the time from randomization to death. This has led to increased use of the composite endpoint of progression-free survival, defined as the time from randomization to the first of progression or death. While trials may be designed with progression or progression-free survival as the primary endpoint, regulators are interested in statements about the effect of treatment on survival following progression.
One approach to investigating this is to estimate the treatment effect on the time from progression to death, but this is not an analysis that benefits from randomization, since the only individuals who contribute to it are those who experienced progression. In addition, assessing the treatment effect on marginal features may induce dependent censoring of the survival time following progression, because other variables that affect both progression and post-progression survival time are omitted from the model. In Chapter 4 we consider a classical illness-death model which can be used to characterize the joint distribution of progression and death in this setting. Inverse probability weighting can then be used to account for the observational nature of this improper sub-group analysis and for dependent censoring. Such inverse weighted equations yield consistent estimates of the causal treatment effect by accounting for the effect of treatment and any prognostic factors that may be shared between the model for the sojourn time distribution in the progression state and the transition intensity for progression. Due to the non-collapsibility of the Cox regression model, we focus here on additive regression models.
Chapter 5 discusses prevalent cohort studies and the problem of measurement error in the reported disease onset time, along with other topics for further research.
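As a rough illustration of the copula formulation described for Chapters 2 and 3, the following Python sketch simulates right-censored event times for one cluster, with proportional hazards margins and a Gaussian copula carrying the within-cluster dependence. It is not taken from the thesis: the exponential baseline hazard, the exchangeable correlation structure, the administrative censoring time, and all function and parameter names are assumptions made for illustration only.

```python
import math
import random
from statistics import NormalDist


def simulate_cluster(size, rho, base_rate, beta, treated, admin_censor, rng):
    """Simulate (observed_time, event_indicator) pairs for one cluster.

    Illustrative assumptions (not from the thesis):
      - exchangeable Gaussian copula with latent correlation rho in [0, 1]
      - exponential baseline hazard base_rate
      - cluster-level treatment indicator acting through exp(beta * treated)
      - administrative right censoring at admin_censor
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this construction")
    std_normal = NormalDist()
    # Marginal hazard under the proportional hazards formulation.
    hazard = base_rate * math.exp(beta * treated)
    # A shared latent normal induces the exchangeable correlation.
    shared = rng.gauss(0.0, 1.0)
    cluster = []
    for _ in range(size):
        z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        # Phi(-z) is uniform on (0, 1); treat it as the survival probability.
        surv = std_normal.cdf(-z)
        # Invert the exponential survival function S(t) = exp(-hazard * t).
        event_time = -math.log(surv) / hazard
        observed = min(event_time, admin_censor)
        cluster.append((observed, event_time <= admin_censor))
    return cluster


if __name__ == "__main__":
    rng = random.Random(2015)
    for obs, event in simulate_cluster(4, 0.5, 0.1, -0.4, 1, 10.0, rng):
        print(f"time={obs:.3f} event={event}")
```

Because the margins and the association parameter are specified separately, changing `rho` alters the within-cluster dependence without changing the marginal hazard ratio `exp(beta)`, which is the property the abstract highlights as appealing for design calculations.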