Browsing by Author "Zeng, Leilei"
Item: Empirical Likelihood Methods for Causal Inference (University of Waterloo, 2024-08-21) Huang, Jingyue; Wu, Changbao; Zeng, Leilei

This thesis develops empirical likelihood methods for causal inference, focusing on the estimation and inference of the average treatment effect (ATE) and the causal quantile treatment effect (QTE). Causal inference has been a critical research area for decades, as it is essential for understanding the true impact of interventions, policies, or actions, thereby enabling informed decision-making and providing insights into the mechanisms shaping our world. However, directly comparing responses between treatment and control groups can yield invalid results due to potential confounders in treatment assignments. In Chapter 1, we introduce fundamental concepts in causal inference under the widely adopted potential outcome framework and discuss the challenges in observational studies. We formulate our research problems concerning the estimation and inference of the ATE and review some commonly used methods for ATE estimation. Chapter 2 provides a brief review of traditional empirical likelihood methods, followed by the pseudo-empirical likelihood (PEL) and sample empirical likelihood (SEL) approaches in survey sampling for one-sample problems. In Chapter 3, we propose two inferential procedures for the ATE using a two-sample PEL approach. The first procedure employs estimated propensity scores for the formulation of the PEL function, resulting in a maximum PEL estimator of the ATE equivalent to the inverse probability weighted estimator discussed in the literature. Our focus in this scenario is on the PEL ratio statistic and establishing its theoretical properties. The second procedure incorporates outcome regression models for PEL inference through model-calibration constraints, and the resulting maximum PEL estimator of the ATE is doubly robust.
Our main theoretical result in this case is the establishment of the asymptotic distribution of the PEL ratio statistic. We also propose a bootstrap method for constructing PEL ratio confidence intervals for the ATE to bypass the scaling constant, which appears in the asymptotic distribution of the PEL ratio statistic but is very difficult to calculate. The finite-sample performance of our proposed methods is compared with that of existing ones through simulation studies. We also present a real data analysis that applies our proposed methods to examine the ATE of maternal smoking during pregnancy on birth weight. In Chapter 4, we explore two SEL-based approaches for the estimation and inference of the ATE. Both involve a traditional two-sample empirical likelihood function with different ways of incorporating propensity scores. The first approach introduces propensity-score-calibrated constraints alongside the standard model-calibration constraints, while the second approach uses propensity scores to form weighted versions of the model-calibration constraints. Both approaches result in doubly robust estimators, and we derive the limiting distributions of the two SEL ratio statistics to facilitate the construction of confidence intervals and hypothesis tests for the ATE. Bootstrap methods for constructing SEL ratio confidence intervals are also discussed for both approaches. We investigate the finite-sample performance of the methods through simulation studies. While inference on the ATE is an important problem with many practical applications, analyzing the QTE is equally important as it reveals intervention impacts across different population segments. In Chapter 5, we extend the PEL and the two SEL approaches from Chapters 3 and 4, each augmented with model-calibration constraints, to develop doubly robust estimators for the QTE.
Two types of model-calibration constraints are proposed: one leveraging multiple imputations of potential outcomes and the other employing direct modeling of indicator functions. We calculate two types of bootstrap-calibrated confidence intervals for each of the six formulations, using point estimators and empirical likelihood ratios, respectively. We also discuss computational challenges and present simulation results. Our proposed approaches support the integration of multiple working models, which facilitates the development of multiply robust estimators and distinguishes our methods from existing approaches. Chapter 6 summarizes the contributions of this thesis and outlines some research topics for future work.

Item: Statistical Methods for Event History Data under Response Dependent Sampling and Incomplete Observation (University of Waterloo, 2020-07-17) Shi, Yidan; Thomson, Mary; Zeng, Leilei

This thesis discusses statistical problems in event history data analysis including survival analysis and multistate models. Research questions in this thesis are motivated by the Nun Study, which contains longevity data and longitudinal follow-up of cognitive function in 678 religious sisters. Our research interests lie in modeling the survival pattern and the disease process for dementia. These data are subject to a process-dependent sampling scheme, and the homogeneous Markov assumption is violated when using a multistate model to fit the panel data for cognition. In this thesis, we formulate three statistical questions according to the aforementioned issues and propose approaches to deal with these problems. Survival analysis is often subject to left-truncation when the data are collected within certain study windows. Naive methods ignoring the sampling conditions yield invalid estimates. Much work has been done to deal with the bias caused by left-truncation. However, discussion of the resulting loss in efficiency is limited.
In Chapter 2, we propose a method in which auxiliary information is borrowed to improve the efficiency in estimation. The auxiliary information includes summary-level statistics from a previous study on the same cohort and census data for a comparable population. The likelihood and score functions are developed. A Monte Carlo approximation is proposed to deal with the difficulty in obtaining tractable forms of the score and information functions. The method is illustrated by both simulation and real data application to the Nun Study. Continuous-time Markov models are widely used for analyzing longitudinal data on disease progression over time because of the convenience they offer in computing the transition probability matrices and the likelihood functions. However, in practice, the Markov assumption does not always hold. Most of the existing methods relax the Markov assumption while losing the advantage of that assumption in the calculation of transition probabilities. In Chapter 3, we consider the case where the violation of the Markov property is due to multiple underlying types of disease. We propose a mixture hidden Markov model where the underlying process is characterized by a mixture of multiple time-homogeneous Markov chains, one for each disease type, while the observation process contains states corresponding to the common symptomatic stages of these diseases. The method can be applied to modeling the disease process of Alzheimer's disease and other types of dementia. In the Nun Study, autopsies were conducted on some of the deceased participants so that one can know whether these individuals had Alzheimer's pathology in their brains. Our method can incorporate these partially observed pathology data as disease type indicators to improve the efficiency in estimation. Predictions of the overall and type-specific prevalence of dementia are calculated based on the proposed method.
The performance of the proposed methods is also evaluated via simulation studies. Many prospective cohort studies of chronic diseases select individuals whose observed process history satisfies particular conditions. For instance, studies aiming to estimate the incidence rate of dementia or the effect of genetic factors on the disease would recruit individuals on the condition of being alive and disease-free. In contrast, some other studies may aim to collect information on disease progression or mortality from the time of the disease onset. Under such settings, individuals are recruited if they are in a subset of the states at the study entry, and the methods of estimation need to account for such state-dependent selection conditions. For multistate analysis, one option is to construct the likelihood based on the prospective data given the history up to and including the time at accrual. This approach yields consistent estimates under state-dependent sampling conditions, at the price of a loss in efficiency. Alternatively, the likelihood contribution from the retrospective and current status data at the time of accrual can be incorporated, although such information is difficult to obtain. For example, subjects' initial states are often unknown, posing a challenge for the computation of the contribution from the current status data at the time of recruitment. However, auxiliary information on the initial states may be available, such as the age-specific population prevalence data related to the disease. In Chapter 4, we propose a weighted-likelihood method to incorporate auxiliary prevalence data and account for the state-dependent selection condition. The method is demonstrated by simulation and applied to the Nun Study of aging and Alzheimer's disease.
A Bayesian sensitivity test is conducted to evaluate the impact of misspecification of the auxiliary prevalence.

Item: The association between functional social support, marital status and memory in middle-aged and older adults: An analysis of the Canadian Longitudinal Study on Aging (Elsevier, 2025) Haghighi, Paniz; Zeng, Leilei; Tyas, Suzanne L; Meyer, Samantha B; Oremus, Mark

Purpose: Although several studies have reported positive associations between functional social support (FSS) and memory, few have explored how other social variables, such as marital status, may affect the magnitude and direction of this association. We examined whether marital status modifies the association between FSS and memory in a sample of community-dwelling, middle-aged and older adults.

Methods: Data at three timepoints, spanning six years, were analyzed from the Tracking Cohort of the Canadian Longitudinal Study on Aging (n = 10,318). Linear mixed models were used to regress memory onto FSS across all three timepoints, adjusting for multiple covariates. The moderating effect of marital status was assessed by adding its interaction with FSS in the model. Separate regression models were built for overall FSS and four subtypes (positive interactions, affectionate, emotional/informational, and tangible support).

Results: We found significant and positive adjusted associations for overall FSS (β: 0.07; 95 % CI: 0.01, 0.13), positive interactions (β: 0.06; 95 % CI: 0.01, 0.11), and affectionate support (β: 0.05; 95 % CI: 0.00, 0.11) with memory. However, the interaction between marital status and FSS (overall and subtypes) was not statistically significant (likelihood ratio test p-value = 0.75), indicating that FSS does not have differing effects on memory depending on marital status.

Conclusion: Our findings do not provide evidence to suggest that marital status affects the association between FSS and memory in middle-aged and older adults.
Nonetheless, policymakers and practitioners should take a comprehensive approach when exploring how various dimensions of social relationships may uniquely influence cognitive trajectories.

Item: Topics in the Design of Life History Studies (University of Waterloo, 2018-08-20) Moon, Nathalie C.; Zeng, Leilei; Cook, Richard

Substantial investments are being made in health research to support the conduct of large cohort studies with the objective of improving understanding of the relationships between diverse features (e.g. exposure to toxins, genetic biomarkers, demographic variables) and disease incidence, progression, and mortality. Longitudinal cohort studies are commonly used to study life history processes, that is, patterns of disease onset, progression, and death in a population. While primary interest often lies in estimating the effect of some factor on a simple time-to-event outcome, multistate modelling offers a convenient and powerful framework for the joint consideration of disease onset, progression, and mortality, as well as the effect of one or more covariates on these transitions. Longitudinal studies are typically very costly, and the complexity of the follow-up scheme is often not fully considered at the design stage, which may lead to inefficient allocation of study resources and/or underpowered studies. In this thesis, several aspects of study design are considered to guide the design of complex longitudinal studies, with the general aim being to obtain efficient estimates of parameters of interest subject to cost constraints. Attention is focused on a general $K$-state model where states $1, \ldots, K-1$ represent different stages of a chronic disease and state $K$ is an absorbing state representing death. In Chapter 2, we propose an approach to design efficient tracing studies to mitigate the loss of information stemming from attrition, a common feature of prospective cohort studies.
Our approach exploits observed information on state occupancy prior to loss-to-followup, covariates, and the time of loss-to-followup to inform the selection of individuals to be traced, leading to more judicious allocation of resources. Two settings are considered. In the first there are only constraints on the expected number of individuals to be traced, and in the second the constraints are imposed on the expected cost of tracing. In the latter, we account for the fact that some types of data may be more costly to obtain via tracing than others. In Chapter 3, we focus on two key aspects of longitudinal cohort studies with intermittent assessments: sample size and the frequency of assessments. We derive the Fisher information as the basis for studying the interplay between these factors and for identifying features of minimum-cost designs to achieve desired power. Extensions which accommodate the possibility of misclassification of disease status at the intermittent assessment times are developed. These are useful to assess the impact of imperfect screening or diagnostic tests in the longitudinal setting. In Chapter 4, attention is turned to state-dependent sampling designs for prevalent cohort studies. While incident cohorts involve recruiting individuals before they experience some event of interest (e.g. onset of a particular disease) and prospectively following them to observe this event, prevalent cohorts are obtained by recruiting individuals who have already experienced this event at some point in the past. Prevalent cohort sampling yields length-biased data, which have been studied extensively in the survival setting; we demonstrate the impact of this in the multistate setting. We start with observation schemes in which data are subject to left- or right-truncation in the failure-time setting. We then generalize these findings to more complex multistate models.
While the distribution of state occupancy at recruitment in a prevalent cohort sample may be driven by the prevalences in the population, we propose approaches for state-dependent sampling at the design stage to improve efficiency and/or minimize expected study cost. Finally, Chapter 5 features an overview of the key contributions of this research and outlines directions for future work.
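
The $K$-state setup described in the abstract above can be made concrete with a small numerical sketch. The code below is not taken from the thesis. It assumes a time-homogeneous continuous-time Markov model with $K = 3$ (healthy, diseased, dead), and the transition intensities, function names, and assessment gap are all hypothetical values chosen for illustration. It computes the transition probability matrix $P(t) = \exp(Qt)$, which is the quantity that would enter a likelihood when states are observed only at intermittent assessments.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=20, squarings=10):
    """Approximate P(t) = exp(q * t) by scaling and squaring.

    The matrix q * t is divided by 2**squarings, exponentiated with a
    truncated Taylor series, and the result is squared back up.
    """
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term holds a**k / k! after this update
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Hypothetical intensities (per year): healthy -> diseased, healthy -> dead,
# diseased -> dead. State 3 (dead) is absorbing, so its row is all zeros.
Q = [
    [-0.12, 0.10, 0.02],
    [0.00, -0.20, 0.20],
    [0.00, 0.00, 0.00],
]

# Transition probabilities over a hypothetical 5-year gap between assessments.
P = transition_matrix(Q, 5.0)

for row in P:
    print([round(x, 4) for x in row])

# Sanity check: the probability of remaining healthy has a closed form.
print(abs(P[0][0] - math.exp(-0.12 * 5.0)) < 1e-9)
```

Each row of `P` is a probability distribution over the state occupied at the next assessment, so a longer gap between assessments moves more mass toward the absorbing state. This is one way to see why assessment frequency and sample size trade off against each other in the designs the abstract describes.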