Selection models for efficient two-phase design of family studies

Authors: Zhong, Yujie; Cook, Richard
Date issued: 2021-01-30
Date accessioned: 2022-08-08
Date available: 2022-08-08
DOI: https://doi.org/10.1002/sim.8772
Handle: http://hdl.handle.net/10012/18491

This is the peer reviewed version of the following article: “Zhong Y and Cook RJ (2021), Selection models for efficient two-phase design of family studies, Statistics in Medicine, 40 (2): 254–270” which has been published in final form at https://doi.org/10.1002/sim.8772.

Abstract: Family studies routinely employ biased sampling schemes in which individuals are randomly chosen from a disease registry and genetic and phenotypic data are obtained from their consenting relatives. We view this as a two-phase study and propose the use of an efficient selection model for the recruitment of families to form a phase II sample subject to budgetary constraints. Simple random sampling, balanced sampling and use of an approximately optimal selection model are considered where the latter is chosen to minimize the variance of parameters of interest. We consider the setting where family members provide current status data with respect to the disease and use copula models to address within-family dependence. The efficiency gains from the use of an optimal selection model over simple random sampling and balanced sampling schemes are investigated as is the robustness of optimal sampling to model misspecification. An application to a family study on psoriatic arthritis is given for illustration.

Language: English
Keywords: Age of onset, biased sampling, clustered data, copula model, efficiency, selection model
Type: Article