Using causal inference and machine learning with exposure determinant modeling to identify important workplace controls

Shkembi A.; Virji M.A.; He J.; Nambunmee K.; Ruiz-Rudolph P.; Neitzel R.L.

Keywords: LASSO, random forests, e-waste, LMIC, boosted regression tree, counterfactual, forward selection

Abstract

Objectives: Exposure determinant modeling can help industrial hygienists understand where, when, and how to control occupational exposures for their particular work environment. Yet, in practice, the ability to evaluate exposure determinants is degraded by selection bias (where only a subset of all exposed workers is sampled) and the statistical issue of small n, large p (few samples but many exposure determinants). This study explored the application of the causal inference framework and machine learning algorithms in exposure determinant modeling using a small n, large p example of potential determinants of heavy metal concentrations among informal electronic waste recycling workers.

Methods: As a case study, we used a multivariable logistic regression model to construct inverse probability weights to account for selection bias into a video substudy of 41 of 226 possible workers monitored for exposures to heavy metals. Forty-four determinants of biomarkers (e.g., tool use, job tasks, and personal protective equipment use) were quantified through video monitoring. Concentrations of heavy metals in blood (Pb and Mn) and urine (Ni and Cu) were sampled. We identified the best-performing biomarker determinant model by comparing the leave-one-out cross-validation root-mean-squared error (LOOCV-RMSE) of 5 models: 2 traditional models (multivariate linear regression and forward selection), and 3 machine learning algorithms (LASSO, boosted regression trees, and random forests). Using the best-performing model, we estimated reductions in heavy metal concentrations through hypothetical workplace controls to identify the most important determinant of biomarker concentrations.

Results: The random forest model had the lowest LOOCV-RMSE and was used as the final biomarker determinant model. Stopping workers from bending their backs while dismantling e-waste was the most important determinant of heavy metal concentrations. Using blood Pb as an example, this translated to an estimated reduction of 0.81 µg/dL (95% confidence interval: 0.66, 0.98) in comparison with maintaining the status quo. Using a traditional regression model (forward selection without inverse probability weights), back bending was not identified as an important determinant of blood Pb.

Discussion: Our causal inference approach with machine learning algorithms overcomes the common limitations of exposure determinant modeling and produces easy-to-interpret estimates of biomarker concentration reductions from hypothetical workplace controls. This can aid industrial hygienists in choosing the most effective hazard controls that can be contextualized to their particular work setting.
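
Illustrative sketch (not the authors' code): the three steps named in the Methods, namely inverse probability weighting, LOOCV-RMSE, and a status-quo versus hypothetical-control contrast, can be shown on toy data. Everything here is an assumption made for illustration: the function names, the six hypothetical workers, and their selection probabilities, which are taken as given rather than estimated with a logistic regression. A weighted group-mean model on one binary determinant stands in for the random forest used in the study.

```python
import math

def ip_weights(p_selected):
    """Inverse probability weights: 1 / P(worker selected into the substudy)."""
    return [1.0 / p for p in p_selected]

def fit_group_means(x, y, w):
    """Stand-in model: weighted mean of y within each level of a binary determinant x."""
    means = {}
    for level in (0, 1):
        num = sum(wi * yi for xi, yi, wi in zip(x, y, w) if xi == level)
        den = sum(wi for xi, wi in zip(x, w) if xi == level)
        means[level] = num / den
    return means

def loocv_rmse(x, y, w):
    """Leave-one-out CV: refit without worker i, predict worker i, pool squared errors."""
    sq_errors = []
    for i in range(len(y)):
        keep = [j for j in range(len(y)) if j != i]
        model = fit_group_means([x[j] for j in keep],
                                [y[j] for j in keep],
                                [w[j] for j in keep])
        sq_errors.append((y[i] - model[x[i]]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def estimated_reduction(x, y, w, control_level=0):
    """Weighted mean prediction under the status quo minus the prediction
    when every worker is set to control_level (the hypothetical control)."""
    model = fit_group_means(x, y, w)
    status_quo = sum(wi * model[xi] for xi, wi in zip(x, w)) / sum(w)
    return status_quo - model[control_level]

# Hypothetical toy data: 1 = determinant present, 0 = absent.
x = [1, 1, 1, 0, 0, 0]
y = [6.0, 5.0, 7.0, 4.0, 3.0, 5.0]         # made-up biomarker values
p = [0.5, 0.5, 0.5, 0.25, 0.25, 0.25]      # assumed selection probabilities
w = ip_weights(p)

print("LOOCV-RMSE:", round(loocv_rmse(x, y, w), 3))
print("Weighted reduction:", round(estimated_reduction(x, y, w), 3))
print("Unweighted reduction:", round(estimated_reduction(x, y, [1.0] * len(y)), 3))
```

With these toy numbers the weighted reduction is about 0.67 and the unweighted one is 1.0. The gap arises because workers without the determinant were assumed less likely to be selected, so each of them counts for more once weights are applied.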

More information

Title according to WOS: Using causal inference and machine learning with exposure determinant modeling to identify important workplace controls
Publication date: 2025
Language: English
DOI: 10.1093/annweh/wxaf069

Notes: ISI