Mining the risk: early cardiovascular detection in workers

Jorquera, Ricardo; Droppelmann, Guillermo; Dollmann, Max; Blanco, Gonzalo; Ahumada, Ignacio; Lira, Alfonso; Feijoo, Felipe

Abstract

Background: Cardiovascular disease (CVD) is the leading cause of death worldwide. Although tools exist to assess individual cardiovascular risk (CVR), they often fall short in distinct populations such as miners, who work under extreme conditions. To address these limitations, this study proposes using machine learning (ML) and longitudinal data to predict risk progression from accessible clinical markers. Body mass index (BMI) and blood glucose (BG) were chosen as key CVR proxies because they are affordable, routinely measured in occupational health checks, and responsive to the metabolic stresses common in mining environments.

Methods: We conducted a retrospective longitudinal analysis of 89,045 Chilean mining workers (420,966 preemployment exams; 2021-2024). For each worker, we formed successive visit pairs to model transitions between clinically defined BMI and BG categories. For each biomarker, four binary outcomes (scenarios) were specified: (1) any upward transition; (2) an adjacent upward transition; (3) obesity to morbid obesity, or prediabetes to diabetes; and (4) any transition ending in morbid obesity or diabetes. Machine learning models were trained to predict transitions for each scenario and biomarker. We applied a stratified 70/30 train-test split, repeated 7-fold cross-validation within the training set, a random hyperparameter search optimizing AUC, and downsampling of the majority class within folds to address class imbalance. Performance on the original (imbalanced) test set was summarized by AUC, accuracy, sensitivity, and specificity, with 95% CIs obtained from the cross-validation process. Agreement between models was assessed using Pearson correlations of their predicted probabilities.

Results: BMI transitions (N = 18,035 pairs) were predicted with high accuracy across models. The best performance occurred for severe progression (Scenario 4, any transition ending in morbid obesity), where XGBoost (XGB) achieved an AUC of 0.95 and an accuracy of 0.91, with high sensitivity and strong specificity. For the broader BMI transitions in Scenarios 1-3, models remained reliable (AUC 0.84-0.87). BG transitions (N = 16,161 pairs) were harder to predict but still actionable. The strongest results were for progression to diabetes (Scenario 4), with random forest (RF) reaching an AUC of 0.83 (95% CI: 0.82-0.90) and an accuracy of 0.76; the other BG scenarios yielded AUCs of 0.71-0.77. Cross-validation performance closely matched test performance, indicating good generalization with no evidence of overfitting. Pairwise correlations of predicted probabilities were typically >0.90 for BMI and >0.80 for BG in severe scenarios, indicating strong agreement between models.

Conclusion: ML models effectively predict clinically relevant BMI and BG risk transitions from occupational health data in the extractive (mining) industry. Using longitudinal visit pairs and scenario-based evaluation improves the models' ability to achieve high AUC values and maintain accuracy and sensitivity, while preserving generalization and consistency. These findings highlight the potential of this approach to improve CVR assessment and support preventive decision-making in high-risk working populations.
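
The visit-pair construction and the four scenario outcomes described in the Methods can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the category cut-offs (BMI 25/30/40 kg/m², with underweight merged into normal; fasting glucose 100/126 mg/dL), all function names, and the reading of Scenario 4 as a move from any lower category into the most severe one are assumptions.

```python
from collections import defaultdict
from datetime import date


def bmi_category(bmi):
    """Ordinal BMI category (kg/m^2). Assumed cut-offs; underweight merged into normal."""
    if bmi < 25:
        return 0  # normal (or below)
    if bmi < 30:
        return 1  # overweight
    if bmi < 40:
        return 2  # obesity
    return 3      # morbid obesity (assumed BMI >= 40)


def bg_category(glucose_mg_dl):
    """Ordinal fasting blood glucose category (mg/dL). Assumed cut-offs."""
    if glucose_mg_dl < 100:
        return 0  # normal
    if glucose_mg_dl < 126:
        return 1  # prediabetes
    return 2      # diabetes


def visit_pairs(exams):
    """Group (worker_id, exam_date, value) records by worker and pair consecutive exams."""
    by_worker = defaultdict(list)
    for worker_id, exam_date, value in exams:
        by_worker[worker_id].append((exam_date, value))
    for worker_id, visits in by_worker.items():
        visits.sort()  # chronological order within each worker
        for (d0, v0), (d1, v1) in zip(visits, visits[1:]):
            yield worker_id, d0, d1, v0, v1


def scenario_labels(prev_cat, next_cat, top):
    """Four binary outcomes for one pair; `top` is the most severe category index."""
    return {
        "s1_any_up": int(next_cat > prev_cat),
        "s2_adjacent_up": int(next_cat == prev_cat + 1),
        "s3_top_minus_one_to_top": int(prev_cat == top - 1 and next_cat == top),
        "s4_ends_in_top": int(prev_cat < top and next_cat == top),
    }


def labeled_pairs(exams, categorize, top):
    """Build one labeled row per successive visit pair (hypothetical helper)."""
    rows = []
    for worker_id, d0, d1, v0, v1 in visit_pairs(exams):
        c0, c1 = categorize(v0), categorize(v1)
        row = {"worker": worker_id, "from_cat": c0, "to_cat": c1}
        row.update(scenario_labels(c0, c1, top))
        rows.append(row)
    return rows


# Toy BMI records (invented for illustration); w1's exams are deliberately unsorted.
exams = [
    ("w1", date(2021, 3, 1), 28.0),
    ("w1", date(2023, 3, 1), 41.0),
    ("w1", date(2022, 3, 1), 31.5),
    ("w2", date(2021, 5, 1), 23.0),
    ("w2", date(2022, 5, 1), 24.0),
]
rows = labeled_pairs(exams, bmi_category, top=3)  # top=2 when using bg_category
```

Sorting each worker's exams by date before pairing means a worker with n exams contributes n - 1 pairs, and no pair skips an intermediate visit.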

More information

WOS ID: WOS:001636295800001
Journal: FRONTIERS IN MEDICINE
Volume: 12
Publisher: FRONTIERS MEDIA SA
Publication date: 2025
DOI: 10.3389/fmed.2025.1678172

Notes: ISI