| Title: | Recognition and Analysis of Individual and Collective Human Behaviors by Combining Image Information and Speech Signal |
| Authors: | Zineddine Sarhani Kahhoul, Author; Nadjiba Terki, Author |
| Document type: | Printed monograph |
| Publisher: | Université Mohamed Kheider, 2025 |
| Languages: | English |
| Keywords: | Automatic Speech Emotion Recognition (ASER), Deep Learning, Convolutional Neural Networks (CNN), Attention Mechanisms, Ensemble Learning, Spectrograms, Affective Computing, Algerian Arabic, Speech Corpus. |
| Abstract: |
The automated recognition of human emotion is a cornerstone of modern affective computing, yet progress is often hindered by the limitations of unimodal analysis, the performance gap on real-world data, and a critical lack of resources for under-resourced languages. This thesis presents a comprehensive framework to address these challenges, with a deep focus on advancing the state of the art in Automatic Speech Emotion Recognition (ASER). The research makes three primary contributions. First, an efficient and lightweight architecture, the CBAM-DenseNet121, is proposed to resolve the trade-off between accuracy and computational complexity. By integrating an attention mechanism with a dense convolutional network, this model achieves highly competitive performance on the benchmark CREMA-D dataset while using substantially fewer parameters than comparable state-of-the-art models. Second, a novel high-accuracy framework is introduced, combining a custom DeepSpec-CNN with an architecturally diverse ensemble learning strategy. This approach reframes the classification problem using the control dimension of the Geneva Wheel of Emotions (GWE), establishing new state-of-the-art performance on CREMA-D by significantly improving upon existing methods. Finally, to address data scarcity, this thesis introduces the Open Your Heart (OYH) corpus, a new, large-scale dataset containing several hours of genuine emotional speech in the Algerian Arabic dialect. Comprehensive performance baselines were established on this challenging corpus using traditional machine learning models, providing a vital new benchmark for future research. Collectively, this thesis advances the field through the dual contribution of novel, high-performance ASER models and the creation of an essential new corpus. The findings provide a robust foundation for building more nuanced, culturally aware, and socially intelligent systems. |
| Document type: | Doctoral thesis |
Availability (1)
| Call number | Medium | Location | Status | Shelf location | |
|---|---|---|---|---|---|
| th/1431 | Book | BIB.FAC.ST. | Available for loan | | |



