Body Mass Index and Missing Data: Examining the Levels, Patterns, and Impacts of Missing Data in a Large Cohort Study of Canadian Youth
Missing data are generally unavoidable in survey-based research. Small amounts of random missingness may not pose significant problems; however, issues arise when data are missing in large proportions or when missingness follows a systematic pattern. Survey items that are tied to social desirability can be markedly impacted by non-response. Youth are a major target for survey-based research, with many cohort studies using surveys to monitor youth health and their health behaviours. Two common health-related measures that are often collected using self-reported surveys are height and weight, used to calculate body mass index (BMI). BMI is used as a proxy for body adiposity at the population level to identify individuals with overweight or obesity (OWOB). BMI is an important measure for research and population surveillance as it is a well-established predictor of future chronic disease. Among youth, OWOB trajectories tend to track into adulthood, and there is substantial literature exploring youth OWOB and associated factors. However, few existing studies have examined youth nonreporting of height and weight. Those studies which have examined nonreporting suggest that for youth, BMI tends to be missing in high proportions (exceeding missingness for other measures) and often follows a systematic pattern of missingness. There are several methods through which researchers can manage missing data. The most common approach is complete case analysis (CCA), whereby missing cases are deleted, and analyses are performed using only complete data. Due to the loss of information, CCA can introduce inefficiencies and bias into statistical results. Hence, in situations where data are missing systematically and in high proportions, CCA is not recommended. More sophisticated techniques, such as multiple imputation (MI), can yield unbiased and efficient estimates in these situations; however, they are not commonly leveraged in epidemiological studies. 
In fact, systematic reviews have suggested that information on missing data is typically not presented. This dissertation aimed to explore levels, patterns, and impacts of missing data among youth, specifically focusing on nonreporting of height, weight, and subsequently calculated BMI using data from a large youth-focused survey. This research leveraged data from 74,501 youth who participated in the 2018/19 wave of the COMPASS study. The COMPASS study is a survey-based cohort study among youth aged 12-19 years in Canada examining a variety of different aspects of health and health behaviours. Study 1 examined variables associated with missingness in BMI, height, and weight using model selection in three separate logistic regressions. Study 2 examined patterns, hierarchies, and subgroups of missing BMI, height, and weight data using classification and regression tree (CART) models. Finally, Study 3 compared the differences in findings between CCA and MI missing data approaches in the context of factors associated with youth BMI through linear mixed models. Study 1 found that nearly 1 in 3 youth in this sample were missing BMI data. Among those with missing BMI, 32% did not report their weight, 20% did not report their height, 36% reported neither weight nor height, and 12% were reduced to missing due to unrealistic values. A greater proportion of females were missing weight only, whereas a greater proportion of males were missing height only. Of all the youth-reported measures, BMI, height, and weight showed the highest degree of missingness. For both males and females, perceiving oneself as overweight was associated with a greater likelihood of BMI being missing. Indicators of poor diet and physical inactivity were also significantly associated with missing BMI.
Taken together, results of Study 1 suggest that social desirability played a significant role in nonreporting patterns, and it is likely that those who have a higher BMI are less likely to report their height or weight. Study 2 identified that certain subgroups of youth (characterized by various health behaviours and indicators) were more likely to be missing BMI. Confirming findings from Study 1, patterns of systematic missingness in BMI were identified using CART models. Examining the identified subgroups highlighted that a combination of weight perception, low physical activity, poor academic performance, and poor mental health almost certainly leads to nonreporting. Study 2 also identified a hierarchy of importance for the variables related to missingness in BMI, height, and weight, providing more context to the associations observed in Study 1 and highlighting the utility of a CART approach to examine missing data. Studies 1 and 2 illustrated that in this sample, BMI missingness was highly prevalent and non-random. Using the findings from these two studies, Study 3 illustrated the bias that can occur when missing data are not managed appropriately. MI and CCA approaches produced contrasting results across sex-stratified models examining factors associated with youth BMI. These results illustrated how bias from deleting cases may impact findings and lead to considerably different research conclusions, highlighting the importance of thorough examination and appropriate handling of missing data. This dissertation fills an important gap in the research examining patterns and impacts of missingness in youth BMI, height, and weight. In this dissertation, missingness in youth BMI was found to be highly prevalent and to follow a systematic pattern. Identified patterns indicated that nonreporting was likely influenced at least in part by social desirability, and that those with a higher BMI were less likely to report their height and/or weight.
Subgroups of youth who had poorer outcomes for physical activity, school grades, and mental health were nearly guaranteed to be missing BMI. When carried forward into an analysis examining factors related to youth BMI, deleting the missing cases introduced bias into findings. This research highlights a great need for improved missing data reporting and handling within youth OWOB research. Similar cohort studies that collect youth height and weight through self-report measures should perform thorough examinations of missing data and choose appropriate methodologies to manage missingness. This research also suggests that researchers should exercise caution when interpreting and utilizing results from studies where missing data are not well-reported.
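As a rough illustration of why complete case analysis can mislead when nonreporting depends on BMI itself, the following minimal sketch uses purely synthetic values. It does not use COMPASS data and is not the dissertation's method. The BMI distribution, the missingness rule, and all function names are assumptions made for illustration only.

```python
import random
import statistics


def simulate_bmi_missingness(n=10_000, seed=0):
    """Generate synthetic BMI values and a copy with systematic nonreporting.

    Assumption (illustrative only): the chance of not reporting rises
    with BMI, mimicking a social desirability effect.
    """
    rng = random.Random(seed)
    true_bmi = [rng.gauss(22.0, 4.0) for _ in range(n)]
    observed = []
    for bmi in true_bmi:
        # Hypothetical rule: about 30% missing at BMI 22, more above, less below.
        p_missing = min(0.9, max(0.05, 0.3 + 0.05 * (bmi - 22.0)))
        observed.append(None if rng.random() < p_missing else bmi)
    return true_bmi, observed


def complete_case_mean(values):
    """Complete case analysis: drop missing entries, then average the rest."""
    complete = [v for v in values if v is not None]
    return statistics.mean(complete)


true_bmi, observed = simulate_bmi_missingness()
full_mean = statistics.mean(true_bmi)
cca_mean = complete_case_mean(observed)
missing_share = sum(v is None for v in observed) / len(observed)

print(f"Share missing:        {missing_share:.2f}")
print(f"Mean of all values:   {full_mean:.2f}")
print(f"Complete-case mean:   {cca_mean:.2f}")
```

Because higher values are dropped more often in this synthetic setup, the complete-case mean falls below the mean of all values. Multiple imputation attempts to reduce this kind of bias by drawing on variables related to both BMI and its missingness, which this sketch does not implement.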
Cite this version of the work
Amanda Doggett (2022). Body Mass Index and Missing Data: Examining the Levels, Patterns, and Impacts of Missing Data in a Large Cohort Study of Canadian Youth. UWSpace. http://hdl.handle.net/10012/18963