Modeling and Prediction of Disease Processes Subject to Intermittent Observation
This thesis is concerned with statistical modeling and prediction of disease processes subject to intermittent observation. Times of disease progression are interval-censored when progression status is only known at a series of assessment times. This situation arises routinely in clinical trials and cohort studies when events of interest are only detectable upon imaging, based on blood tests, or upon careful clinical examination. The work that follows is motivated by the study of demographic, genetic and clinical data available from the University of Toronto Psoriasis Registry and the University of Toronto Psoriatic Arthritis Registry, each involving cohorts of several hundred patients with the respective diseases. Chapter 2 deals with the problem of selecting important prognostic biomarkers from a large set of candidate biomarkers when the status with respect to an event of interest (e.g. disease progression) is only known at irregularly spaced and individual-specific assessment times. Penalized regression techniques (e.g. LASSO, adaptive LASSO and SCAD) are adapted to deal with the interval-censored event times arising from this observation scheme. An expectation-maximization algorithm is developed and shown to perform well in extensive simulation studies involving independent and correlated continuous and binary covariates. An application to the motivating study of the development of arthritis mutilans in patients with psoriatic arthritis is given, and several important human leukocyte antigen (HLA) variables are identified for further investigation. Extensions of this algorithm are developed for settings in which data from different sources with distinct disease-related entry conditions are to be synthesized. The extended Turnbull-type expectation-maximization algorithm is based on a complete data likelihood which incorporates missing information from individuals not meeting the entry criteria of the respective registries.
Simulation studies demonstrate good empirical performance and an application to the motivating study identifies HLA markers associated with the onset of psoriatic arthritis among individuals with psoriasis. This analysis is carried out using data from a psoriasis registry in which the times to psoriatic arthritis are left-truncated, and a psoriatic arthritis registry in which the onset times are right-truncated. Chapter 3 deals with the challenge of assessing the accuracy of a predictive model when response times are interval-censored. Inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimators of predictive accuracy are developed and evaluated based on the mean prediction error and the area under the receiver operating characteristic curve. The weights are estimated from a multistate model which jointly considers the event process, the inspection process, and the right-censoring processes. We investigate the performance of the proposed methods by simulation and illustrate their application in the context of a motivating rheumatology study in which HLA markers are used for predicting disease progression in psoriatic arthritis. A two-phase model is developed in Chapter 4 for chronic diseases which feature an indolent phase followed by a phase with more active disease resulting in progression and damage. The time-scales for the intensity functions for the active phase are more naturally based on the time since the start of the active phase, corresponding to a semi-Markov formulation. In cohort studies for which the disease status is only known at a series of clinical assessment times, transition times are interval-censored, which means the time origin for phase II is also interval-censored. Weakly parametric models with piecewise constant baseline hazard and rate functions are specified and an expectation-maximization algorithm is described for model fitting.
A computationally faster two-stage estimation procedure is also developed and the asymptotic variances of the resulting estimators are derived. Simulation studies show that the proposed model performs well under both maximum likelihood and two-stage estimation. An application to data from the motivating study of disease progression in psoriatic arthritis illustrates the procedure and identifies new human leukocyte antigens associated with the duration of the indolent phase, and others associated with disease progression in the active phase. Open problems and topics for ongoing and future research are discussed in Chapter 5.
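
As a small illustration of the building blocks named above, the following Python sketch shows how a piecewise constant hazard gives the probability that an event time falls in an observed censoring interval, which is the likelihood contribution of an interval-censored observation. It is a minimal sketch under stated assumptions, not the models fitted in the thesis: covariates, penalization, truncation and the two-phase semi-Markov structure are all omitted, and the function names, cut points and rates are hypothetical.

```python
import math


def cumulative_hazard(t, cuts, rates):
    """Cumulative hazard at time t for a piecewise constant hazard.

    cuts  : increasing interior break points (hypothetical values)
    rates : hazard on each piece, so len(rates) == len(cuts) + 1
    """
    edges = [0.0] + list(cuts) + [math.inf]
    total = 0.0
    for lo, hi, rate in zip(edges[:-1], edges[1:], rates):
        if t <= lo:
            break
        # Add the hazard accumulated over the part of [lo, hi) before t.
        total += rate * (min(t, hi) - lo)
    return total


def survival(t, cuts, rates):
    """Survivor function S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(t, cuts, rates))


def interval_prob(left, right, cuts, rates):
    """P(left < T <= right), the contribution of one interval-censored time.

    Use right = math.inf when the event was not seen by the last assessment.
    """
    s_right = 0.0 if math.isinf(right) else survival(right, cuts, rates)
    return survival(left, cuts, rates) - s_right


# Illustrative values only: hazard 0.5 on (0, 1], 1.0 on (1, 2], 2.0 after 2.
cuts, rates = [1.0, 2.0], [0.5, 1.0, 2.0]
# Event known to lie between assessments at times 1.5 and 3.0.
contribution = interval_prob(1.5, 3.0, cuts, rates)
```

Summing the logarithms of such contributions over individuals gives an observed-data log-likelihood for independent interval-censored times under this simplified setup; how right-censoring and the inspection process enter depends on assumptions not stated in this abstract.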