The richness of little voices: using artificial intelligence to understand early language development

Petrache, Mircea Alexandru; Carvallo, Andres; Silva, Valentina; Barcelo, Pablo; Peña, Marcela

Abstract

How informative are preschoolers’ speech vocalizations? Preschoolers’ speech is often imprecise, highly variable, and hard for both humans and machines to interpret; consequently, its predictive value for later developmental outcomes remains underexplored. Here, we analyzed 6,595 brief vocalizations (0.5–5 s) from 127 preschoolers aged 3–4 years, including 74 children with diagnosed language delay, recorded in naturalistic environments. Models based on the vocalizations robustly distinguished children with and without language delay (ROC-AUC: 0.90), well beyond the acoustic properties of the recordings (ROC-AUC: 0.62), and outperformed similar models analyzing metadata that the literature reports as predictive factors for early language development (ROC-AUC: < 0.69 [95% CI: 0.08 - 0.15 to 0.48 - 0.73], P < 0.001). This indicates that neural networks applied to audio vectorizations from foundation models can extract meaningful developmental markers from brief samples of immature speech and use them to classify speech status, offering a promising, scalable approach to early screening of language abilities.

More information

Journal title: bioRxiv
Publication date: 2026
URL: 10.64898/2026.01.30.702650
Notes: bioRxiv