Integrating acoustic micro-disfluencies and emotional context for robust Alzheimer’s dementia detection using transformer-based models
- MOJ Applied Bionics and Biomechanics
Karim Dabbabi, Ahlem Kehili, Adnen Cherif
Abstract
Alzheimer’s dementia is difficult to detect early because its effects on speech and emotional patterns are subtle. Existing methods often fail to capture both micro-level speech disfluencies and macro-level emotional-contextual dynamics. To address this, we propose a model that combines Temporal Acoustic Micro-Disfluency Patterns (TAMP) and Emotion-Contextual Acoustic Memory Fusion (ECAMF) features with a GRU, multi-head attention, and a Transformer encoder. Evaluated on the ADReSSo dataset, the model achieved 90.2% accuracy, 89.6% precision, 88.7% recall, and an F1-score of 89.1%, significantly outperforming traditional and pre-trained methods. The approach offers an efficient and robust solution for early Alzheimer’s dementia detection that treats acoustic and emotional features in a unified framework and shows improved sensitivity to nuanced speech and emotional patterns.
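As a quick consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is our own and is not taken from the authors' code, and it assumes the reported precision and recall refer to the same class and evaluation split.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 89.6
reported_recall = 88.7

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 1))  # 89.1
```

The recomputed value rounds to the reported 89.1%, so the three figures are mutually consistent.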
Keywords
Alzheimer’s dementia, speech analysis, acoustic features, emotional embeddings, transformer encoder