Multiple Imputation in the context of the unequal sampling probabilities of case-cohort studies

Multiple imputation (MI) is commonly used to address missing data in epidemiological studies, but valid use requires compatibility between the imputation and analysis models. How to achieve compatibility is unclear when missing data occur in the context of unequal sampling probabilities, such as in case-cohort studies, where the exposure is collected only on cases and a random subcohort. The unequal sampling probabilities in this study design can be accounted for during analysis through inverse probability weighting (IPW). To ensure compatibility, these weights also need to be incorporated into the imputation model. This study assessed the performance of various approaches to accommodating weighting during MI when handling missing covariate data in a case-cohort study estimating either a risk ratio or an odds ratio. The study was motivated by a case-cohort investigation within the Barwon Infant Study (BIS). A simulation study was conducted to mimic BIS, and missingness was introduced into two covariates, varying the proportion of incomplete cases, the probability of subcohort selection, and the strength of the outcome-exposure relationships. Various methods of incorporating weighting into the imputation were applied to handle covariate missingness, and IPW was used to analyse the imputed datasets. IPW was also applied to complete cases for comparison. The MI methods were also applied to the BIS data. All MI methods performed similarly in terms of bias and efficiency, with marked improvement compared with IPW applied to the complete cases. A similar pattern of results was seen in the case study. Our results suggest that MI improves accuracy and efficiency in the analysis of case-cohort studies with missing covariate data compared with IPW applied to complete cases. How weighting is accounted for during MI makes little difference in the analysis of case-cohort studies.
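
As a rough illustration of how IPW can account for case-cohort sampling, the following is a minimal Python sketch, not the code used in this study. It assumes one common weighting scheme in which every case receives weight 1 and each non-case in the subcohort receives the inverse of the subcohort sampling probability. The function names, the toy data, and the sampling probability of 0.5 are hypothetical and chosen only for illustration.

```python
def case_cohort_weights(is_case, in_subcohort, subcohort_prob):
    """Return one inverse probability weight per cohort member.

    Assumed scheme (an illustration, not the study's implementation):
    cases are always sampled, so they get weight 1; non-cases are sampled
    only through the subcohort, so they get weight 1 / subcohort_prob;
    unsampled non-cases get weight 0 and drop out of the analysis.
    """
    weights = []
    for case, sub in zip(is_case, in_subcohort):
        if case:
            weights.append(1.0)
        elif sub:
            weights.append(1.0 / subcohort_prob)
        else:
            weights.append(0.0)
    return weights


def weighted_risk_ratio(exposed, is_case, weights):
    """Weighted risk in the exposed divided by weighted risk in the unexposed."""
    def risk(level):
        rows = [(y, w) for x, y, w in zip(exposed, is_case, weights) if x == level]
        weighted_cases = sum(w for y, w in rows if y)
        weighted_total = sum(w for _, w in rows)
        return weighted_cases / weighted_total
    return risk(1) / risk(0)


# Hypothetical toy cohort of eight participants (1 = yes, 0 = no).
exposed      = [1, 1, 1, 0, 0, 0, 0, 1]
is_case      = [1, 0, 0, 1, 0, 0, 0, 0]
in_subcohort = [0, 1, 0, 0, 1, 1, 0, 0]

weights = case_cohort_weights(is_case, in_subcohort, subcohort_prob=0.5)
rr = weighted_risk_ratio(exposed, is_case, weights)
print(weights)
print(round(rr, 3))
```

In a weighted MI approach, the same weights (or the variables that determine them) would also need to enter the imputation model; the sketch covers only the analysis step.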