Safety-Aware Multi-Agent Deep Reinforcement Learning for Adaptive Fault-Tolerant Control in Sensor-Lean Industrial Systems: Validation in Beverage CIP
Abstract
Fault-tolerant control in safety-critical industrial systems demands adaptive responses to equipment degradation, parameter drift, and sensor failures while maintaining strict operational constraints. Traditional model-based controllers struggle under these conditions, requiring extensive retuning and dense instrumentation. Recent safe multi-agent reinforcement learning (MARL) frameworks with control barrier functions (CBFs) achieve real-time constraint satisfaction in robotics and power systems, yet they assume comprehensive state observability, an assumption incompatible with sensor-hostile industrial environments where instrumentation degradation and contamination risks dominate design constraints. This work presents a safety-aware multi-agent deep reinforcement learning framework for adaptive fault-tolerant control in sensor-lean industrial environments, achieving formal safety through learned implicit barriers under partial observability. The framework integrates four synergistic mechanisms: (1) a multi-layer safety architecture combining constrained action projection, prioritized experience replay, conservative training margins, and curriculum-embedded verification, achieving zero constraint violations; (2) multi-agent coordination via decentralized execution with learned complementary policies; (3) curriculum-driven sim-to-real transfer through progressive four-stage learning, achieving 85-92% performance retention without fine-tuning; and (4) offline extended Kalman filter validation enabling a 70% instrumentation reduction (91-96% reconstruction accuracy) for regulatory auditing without real-time estimation dependencies.
Validated through sustained deployment in commercial beverage manufacturing clean-in-place (CIP) systems, a representative safety-critical testbed with hard flow constraints (≥ 1.5 L/s), harsh chemical environments, and zero-tolerance contamination requirements, the framework demonstrates superior control precision (coefficient of variation: 2.9-5.3% versus the 10% industrial standard) across three hydraulic configurations spanning a complexity range of 2.1-8.2 out of 10. Comprehensive validation comprising 37+ controlled stress-test campaigns and hundreds of production cycles (accumulated over 6 months) confirms zero safety violations, high reproducibility (CV variation < 0.3% across replicates), predictable complexity-performance scaling (R² = 0.89), and zero-retuning cross-topology transferability. The system has operated autonomously in active production for over 6 months, establishing a reproducible methodology for safe MARL deployment in partially observable, sensor-hostile manufacturing environments where analytical CBF approaches are structurally infeasible.
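As a rough illustration of two quantities named in the abstract, the sketch below computes a coefficient of variation and applies a simple clipping-style projection against the 1.5 L/s minimum flow. It is not the paper's implementation: the function names, the safety margin, the upper flow bound, and the sample readings are all hypothetical, and the framework's actual constrained action projection may work differently.

```python
import statistics

MIN_FLOW_LPS = 1.5  # hard flow constraint quoted in the abstract (L/s)


def coefficient_of_variation(samples):
    """Return the CV in percent: sample standard deviation over the mean."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * statistics.stdev(samples) / mean


def project_flow_setpoint(proposed, margin=0.1, max_flow=3.0):
    """Clip a proposed flow setpoint into [MIN_FLOW_LPS + margin, max_flow].

    margin and max_flow are hypothetical values chosen for illustration.
    """
    lower = MIN_FLOW_LPS + margin
    return min(max(proposed, lower), max_flow)


flows = [1.9, 2.0, 2.1, 2.0, 2.0]   # made-up flow readings in L/s
cv = coefficient_of_variation(flows)
safe = project_flow_setpoint(1.2)   # an unsafe proposal below the minimum
print(f"CV = {cv:.2f}%, projected setpoint = {safe:.2f} L/s")
```

Clipping is the simplest possible projection onto a one-dimensional bound; the margin mimics the idea of a conservative training margin by keeping setpoints strictly above the hard limit.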
More information
| Title according to WOS: | ID WOS:001672039600001 Not found in local WOS DB |
| Journal title: | TECHNOLOGIES |
| Volume: | 14 |
| Issue: | 1 |
| Publisher: | MDPI |
| Publication date: | 2026 |
| DOI: | 10.3390/technologies14010044 |
| Notes: | ISI |