Emotion Recognition from Speech Using wav2vec 2.0 Embeddings

Authors: Leonardo Pepino, Pablo Riera, Luciana Ferrer

Abstract:

Emotion recognition datasets are relatively small, making the use of deep learning techniques challenging. In this work, we propose a transfer learning method for speech emotion recognition (SER) where features extracted from pre-trained wav2vec 2.0 models are used as input to shallow neural networks to recognize emotions from speech. We propose a way to combine the output of several layers from the pre-trained model, producing richer speech representations than the model’s output alone. We evaluate the proposed approaches on two standard emotion databases, IEMOCAP and RAVDESS, and compare different feature extraction techniques using two wav2vec 2.0 models: a generic one, and one fine-tuned for speech recognition. We also experiment with different shallow architectures for our speech emotion recognition model, and report baseline results using traditional features. Finally, we show that our best performing models have better average recall than previous approaches that use deep neural networks trained on spectrograms and waveforms or shallow neural networks trained on features extracted from wav2vec 1.0.
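
The abstract says the outputs of several layers of the pre-trained model are combined, but does not give the formula. One common way to do this, assumed here and not necessarily the authors' exact formulation, is a weighted sum over layers with one trainable scalar per layer, normalized by a softmax, followed by averaging over time to get a single utterance-level vector for a shallow classifier. The function names (`combine_layers`, `mean_pool`) and the `layer_logits` parameter are hypothetical, and plain lists stand in for the wav2vec 2.0 hidden states.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_outputs: Sequence[Sequence[Vector]],
                   layer_logits: Sequence[float]) -> List[Vector]:
    """Softmax-weighted sum of per-layer features (assumed scheme).

    layer_outputs[l][t] is the feature vector of layer l at frame t.
    layer_logits holds one hypothetical trainable scalar per layer.
    """
    if len(layer_outputs) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)  # convex weights that sum to 1
    num_frames = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    combined = [[0.0] * dim for _ in range(num_frames)]
    for w, layer in zip(weights, layer_outputs):
        for t, frame in enumerate(layer):
            for d, value in enumerate(frame):
                combined[t][d] += w * value
    return combined


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average frame vectors over time into one utterance vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


if __name__ == "__main__":
    # Toy input: 2 layers x 2 frames x 2 feature dimensions.
    layers = [
        [[1.0, 0.0], [3.0, 2.0]],  # layer 0
        [[5.0, 4.0], [7.0, 6.0]],  # layer 1
    ]
    # Equal logits give each layer weight 0.5.
    print(mean_pool(combine_layers(layers, [0.0, 0.0])))  # [4.0, 3.0]
```

Because the weights pass through a softmax, training can shift emphasis toward whichever layers carry the most emotion-related information while the combined features stay on the same scale as a single layer's output.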
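
The results are reported as average recall. In speech emotion recognition this usually refers to unweighted average recall: the recall of each emotion class, averaged with equal weight per class, so frequent classes do not dominate the score. Assuming that definition (the abstract does not spell it out), a minimal sketch follows. The function name `average_recall` and the toy labels are illustrative only and are not data from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def average_recall(y_true: Sequence[Hashable],
                   y_pred: Sequence[Hashable]) -> float:
    """Unweighted (macro) average of per-class recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support: Dict[Hashable, int] = defaultdict(int)  # true count per class
    hits: Dict[Hashable, int] = defaultdict(int)     # correct count per class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy labels: per-class recalls are 2/3 (angry), 1 (happy), 1/2 (sad).
    y_true = ["angry", "angry", "angry", "happy", "sad", "sad"]
    y_pred = ["angry", "angry", "sad", "happy", "sad", "angry"]
    print(average_recall(y_true, y_pred))
```

On these toy labels the average recall is (2/3 + 1 + 1/2) / 3, about 0.72, while plain accuracy is 4/6, about 0.67; the gap comes from giving the single "happy" example the same weight as the larger classes.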

More information: https://www.isca-speech.org/archive/interspeech_2021/pepino21_interspeech.html

Andres Juarez | 6 May 2022 | Papers