Prospective prediction in the presence of missing data

Marshall, G.; Warner, B.; MaWhinney, S.; Hammermeister, K.


A variety of methods and algorithms are available for estimating the parameters of generalized linear models in the presence of missing values. However, there is little information on how an already built model can be used to predict outcomes for new observations with missing covariate data. Dropping the observations with missing values is a widespread practice with serious statistical and non-statistical implications. One solution is to fit separate regression models, or submodels, to each pattern of missing covariates. In practice, for any iterative regression method, this approach is computationally intensive. We propose a simple methodology to predict outcomes for individuals with incomplete information based on the estimated coefficients and covariance from the already built model. This method does not require revisiting the data set used to build the original model and works by generating a first-order approximation of any submodel coefficient estimates. This is achieved by using the SWEEP operator on an augmented covariance matrix obtained from the original model. We refer to this approach as the one-step sweep (OSS) method. The methodology is demonstrated using data from the Department of Veterans Affairs Continuous Improvement in Cardiac Surgery Program (CICSP). These data contain 30-day mortality, the outcome of interest, and risk information for over 14 000 patients who underwent coronary artery bypass grafting (CABG) surgery over a four-year period. Using complete data from the first 3.5 years of this study period, a logistic regression model was built. This model was then used to predict mortality for patients undergoing CABG in the most recent 6 months. To evaluate the performance of the OSS method, we randomly generated observations with missing covariates in the 6-month prediction database. We use this simulation to demonstrate that the computationally efficient OSS substantially reduces the error in risk-adjusted mortality created when cases with incomplete information are eliminated. Lastly, we derive the relationship between the OSS method and data imputation. Copyright © 2002 John Wiley & Sons, Ltd.
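
The sweep step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: the augmented matrix is taken to be the coefficient covariance matrix bordered by the coefficient vector, and the submodel coefficients are read from the last column after sweeping on the indices of the missing covariates. The function names `sweep` and `one_step_submodel` and the numbers in the usage note are hypothetical, chosen only for illustration.

```python
def sweep(matrix, k):
    """Return a copy of a square symmetric matrix swept on pivot index k."""
    n = len(matrix)
    pivot = matrix[k][k]
    if pivot == 0:
        raise ValueError("cannot sweep on a zero pivot")
    out = [row[:] for row in matrix]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = -1.0 / pivot
            elif i == k or j == k:
                out[i][j] = matrix[i][j] / pivot
            else:
                out[i][j] = matrix[i][j] - matrix[i][k] * matrix[k][j] / pivot
    return out


def one_step_submodel(beta, cov, missing):
    """Approximate the coefficients of the submodel that omits `missing`.

    beta    : coefficient estimates from the full model
    cov     : their estimated covariance matrix (list of lists)
    missing : indices of covariates unavailable for the new observation

    Assumed construction: border `cov` with `beta`, sweep on each missing
    index, and read the adjusted coefficients from the last column.
    """
    p = len(beta)
    aug = [list(cov[i]) + [beta[i]] for i in range(p)]
    aug.append(list(beta) + [0.0])
    for k in missing:
        aug = sweep(aug, k)
    # Keep only the coefficients of the covariates that were observed.
    return {i: aug[i][p] for i in range(p) if i not in missing}
```

For example, with hypothetical values `beta = [-2.0, 0.5, 0.8]` and `cov = [[0.04, 0.0, -0.01], [0.0, 0.02, 0.01], [-0.01, 0.01, 0.05]]`, calling `one_step_submodel(beta, cov, [2])` returns adjusted coefficients of approximately -1.84 for index 0 and 0.34 for index 1. Only the stored coefficients and covariance are needed, which matches the abstract's point that the original data set is not revisited.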

More information

Title according to WOS: Prospective prediction in the presence of missing data
Title according to SCOPUS: Prospective prediction in the presence of missing data
Journal title: STATISTICS IN MEDICINE
Volume: 21
Issue: 4
Publisher: Wiley
Publication date: 2002
Start page: 561
End page: 570
Language: English