Modeling and Prediction of Disease Processes Subject to Intermittent Observation
Date
2016-07-21
Authors
Wu, Ying
Advisor
Cook, Richard
Publisher
University of Waterloo
Abstract
This thesis is concerned with statistical modeling and prediction of disease processes subject to intermittent
observation.
Times of disease progression are interval-censored when progression status is only known at a series of
assessment times.
This situation arises routinely in clinical trials and cohort studies when events of interest are only
detectable upon imaging, based on blood tests, or upon careful clinical examination.
The work that follows is motivated by the study of demographic, genetic and clinical data available from the
University of Toronto Psoriasis
Registry and the University of Toronto Psoriatic Arthritis Registry, each involving cohorts of several hundred
patients with the respective diseases.
Chapter 2 deals with the problem of selecting important prognostic biomarkers from a large set of candidate
biomarkers when the status with respect to an event of interest (e.g. disease progression) is only known at
irregularly spaced and individual-specific assessment times.
Penalized regression techniques (e.g. LASSO, adaptive LASSO and SCAD) are adapted to deal with the
interval-censored event times arising from this observation scheme.
An expectation-maximization algorithm is developed and shown to perform well in extensive simulation studies
involving independent and correlated continuous and binary covariates.
The method is applied to the motivating study of the development of arthritis mutilans in patients with psoriatic arthritis,
and several important human leukocyte antigen (HLA) variables are identified for further
investigation.
Extensions of this algorithm are developed for settings in which data from different sources
with distinct disease-related entry conditions are to be synthesized.
The extended Turnbull-type expectation-maximization algorithm is based on a complete data likelihood which
incorporates missing information from individuals not meeting the entry criteria of the respective registries.
Simulation studies demonstrate good empirical performance and an application to the motivating study identifies
HLA markers associated with the onset of psoriatic arthritis among individuals with psoriasis.
This analysis is carried out using data from a psoriasis registry in which the times to psoriatic arthritis are left-truncated, and a psoriatic arthritis registry in which the onset times are right-truncated.
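As a rough illustration of the kind of iteration a Turnbull-type expectation-maximization algorithm involves, the sketch below implements the classical self-consistency update for interval-censored event times on a fixed grid of candidate support points. It is not the extended algorithm of the thesis: it ignores covariates, penalization and truncation, and the function name, the fixed support grid and the toy data are assumptions made purely for illustration.

```python
def self_consistency(intervals, support, n_iter=500, tol=1e-10):
    """Estimate probability masses on `support` from intervals (left, right].

    Hypothetical helper: each event time is only known to lie in (left, right];
    use right = float("inf") for a right-censored observation.
    """
    # alpha[i][j] = 1 if support point j is compatible with observation i
    alpha = [[1.0 if left < s <= right else 0.0 for s in support]
             for left, right in intervals]
    for row in alpha:
        if sum(row) == 0:
            raise ValueError("an interval contains no support point")

    n, m = len(intervals), len(support)
    p = [1.0 / m] * m  # start from a uniform distribution
    for _ in range(n_iter):
        new_p = [0.0] * m
        for row in alpha:
            # E-step: conditional probability that subject i's event is at s_j
            denom = sum(a * q for a, q in zip(row, p))
            for j in range(m):
                new_p[j] += row[j] * p[j] / denom
        # M-step: average the conditional probabilities over subjects
        new_p = [v / n for v in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Toy data: two subjects whose event is known to lie in (0, 2]
masses = self_consistency([(0, 2), (0, 2)], support=[1, 2, 3])
```

In this toy example no interval covers the support point 3, so it receives no mass. The remaining mass is split between the points 1 and 2, which the data cannot distinguish.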
Chapter 3 deals with the challenge of assessing the accuracy of a predictive model when response times are
interval-censored.
Inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimators of predictive accuracy are developed and evaluated based on the
mean prediction error and the area under the receiver operating characteristic curve.
The weights are estimated from a multistate model which jointly considers the event process, the inspection process,
and the right-censoring processes.
We investigate the performance of the proposed methods by simulation and illustrate their application in the context
of a motivating rheumatology study in which HLA markers are used for predicting disease progression in
psoriatic arthritis.
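The inverse probability weighting idea behind the Chapter 3 estimators can be sketched generically: each subject whose outcome status is known is up-weighted by the inverse of the probability that the status is known. In the sketch below those probabilities are simply passed in as inputs, whereas the thesis estimates them from a multistate model. The squared-error loss, the function name and the argument names are likewise assumptions for illustration only.

```python
def ipw_mean_prediction_error(outcomes, predictions, observed, obs_prob):
    """Inverse-probability-weighted mean squared prediction error.

    Hypothetical helper. outcomes[i] is the 0/1 event status (ignored when
    observed[i] is False), predictions[i] is the predicted event probability,
    and obs_prob[i] is the assumed probability that subject i's status is
    observed.
    """
    n = len(outcomes)
    total = 0.0
    for y, p, seen, pi in zip(outcomes, predictions, observed, obs_prob):
        if seen:
            # Subjects with known status stand in for similar unobserved ones
            total += (y - p) ** 2 / pi
    return total / n


# Toy data: the third subject's status is unknown
error = ipw_mean_prediction_error(
    outcomes=[1, 0, None],
    predictions=[0.8, 0.2, 0.5],
    observed=[True, True, False],
    obs_prob=[0.5, 1.0, 0.5],
)
```

Dividing by the full sample size rather than the number of observed subjects is what makes this a weighted estimate for the whole cohort. An augmented (AIPW) version would add a model-based term for the subjects whose status is unknown.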
A two-phase model is developed in Chapter 4 for chronic diseases which feature an indolent phase followed by a
phase with more active disease resulting in progression and damage.
The time-scales for the intensity functions for the active phase are more naturally based on the time since the
start of the active phase, corresponding to a semi-Markov formulation.
In cohort studies for which the disease status is only known at a series of clinical assessment times,
transition times are interval-censored, which means the time origin for phase II is interval-censored.
Weakly parametric models with piecewise constant baseline hazard and rate functions are specified and an
expectation-maximization algorithm is described for model fitting.
A computationally faster two-stage estimation procedure is also developed and the asymptotic variances of the
resulting estimators are derived.
Simulation studies show that the proposed model performs well under both
maximum likelihood and two-stage estimation.
An application to data from the motivating study of disease progression in psoriatic arthritis illustrates
the procedure and identifies new human leukocyte antigens associated with the duration of the indolent phase, as well as others associated with disease progression in the active phase.
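To make the weakly parametric specification in Chapter 4 concrete, the sketch below evaluates the cumulative hazard and survival function implied by a piecewise constant hazard. The cut points and rates are hypothetical values chosen for illustration, not estimates from the thesis, and the function names are assumptions.

```python
import math


def cumulative_hazard(t, cuts, rates):
    """Cumulative hazard at time t for a piecewise constant hazard.

    cuts: increasing interior break points; rates: len(cuts) + 1 hazard
    values, one per interval [0, c1), [c1, c2), ..., [ck, inf).
    """
    edges = [0.0] + list(cuts) + [math.inf]
    total = 0.0
    for lo, hi, rate in zip(edges[:-1], edges[1:], rates):
        if t <= lo:
            break
        # Add the hazard accumulated over the part of [lo, hi) before t
        total += rate * (min(t, hi) - lo)
    return total


def survival(t, cuts, rates):
    """Survival probability S(t) = exp(-cumulative hazard at t)."""
    return math.exp(-cumulative_hazard(t, cuts, rates))


# Hypothetical hazard: 0.1 per year up to year 2, then 0.5 per year
H3 = cumulative_hazard(3.0, cuts=[2.0], rates=[0.1, 0.5])  # 0.1*2 + 0.5*1
S3 = survival(3.0, cuts=[2.0], rates=[0.1, 0.5])
```

In a semi-Markov formulation such as the one described above, the argument `t` for the active phase would be the time since entry into that phase rather than the time since disease onset.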
Open problems and topics for ongoing and future research are discussed in Chapter 5.