Emotion Recognition from Speech Using wav2vec 2.0 Embeddings

Authors: Leonardo Pepino, Pablo Riera, Luciana Ferrer

Abstract:

Emotion recognition datasets are relatively small, making the use of deep learning techniques challenging. In this work, we propose a transfer learning method for speech emotion recognition (SER) where features extracted from pre-trained wav2vec 2.0 models are used as input to shallow neural networks to recognize emotions from speech. We propose a way to combine the output of several layers from the pre-trained model, producing richer speech representations than the model’s output alone. We evaluate the proposed approaches on two standard emotion databases, IEMOCAP and RAVDESS, and compare different feature extraction techniques using two wav2vec 2.0 models: a generic one, and one fine-tuned for speech recognition. We also experiment with different shallow architectures for our speech emotion recognition model, and report baseline results using traditional features. Finally, we show that our best performing models have better average recall than previous approaches that use deep neural networks trained on spectrograms and waveforms or shallow neural networks trained on features extracted from wav2vec 1.0.
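The abstract does not say exactly how the outputs of several layers are combined. As an assumption, the sketch below uses one common approach: a weighted average of the per-layer frame sequences, with weights obtained by a softmax over one logit per layer, followed by mean pooling over time to get a fixed-size vector that a shallow classifier could take as input. The function names `combine_layers` and `mean_pool` are hypothetical; in a real model the logits would be trained jointly with the classifier and the layer outputs would come from a wav2vec 2.0 model rather than hand-written lists.

```python
import math
from typing import List

Vector = List[float]
Layer = List[Vector]  # one layer's output: a sequence of frame vectors


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layers: List[Layer], logits: List[float]) -> Layer:
    """Hypothetical layer fusion: per-frame weighted average of all layers.

    Assumes every layer has the same number of frames and the same dimension.
    The weights are softmax(logits), so they are positive and sum to 1.
    """
    if len(layers) != len(logits):
        raise ValueError("need one weight logit per layer")
    weights = softmax(logits)
    num_frames = len(layers[0])
    dim = len(layers[0][0])
    combined: Layer = []
    for t in range(num_frames):
        frame = [0.0] * dim
        for w, layer in zip(weights, layers):
            for d in range(dim):
                frame[d] += w * layer[t][d]
        combined.append(frame)
    return combined


def mean_pool(frames: Layer) -> Vector:
    """Average over time, turning a variable-length sequence into one vector."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


# Two toy "layers", each with 2 frames of 2-dimensional features.
layers = [[[1.0, 0.0], [3.0, 0.0]], [[0.0, 2.0], [0.0, 4.0]]]
fused = combine_layers(layers, [0.0, 0.0])  # equal weights of 0.5 each
print(fused)             # [[0.5, 1.0], [1.5, 2.0]]
print(mean_pool(fused))  # [1.0, 1.5]
```

Normalizing the weights with a softmax keeps the fused features on the same scale as any single layer, and the learned weights can be inspected afterwards to see which layers contribute most.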
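The comparison above is stated in terms of average recall. Assuming this means the unweighted mean of per-class recalls (often called UAR, a common choice in speech emotion recognition, though the abstract does not define it), it can be sketched as follows; `average_recall` is a hypothetical helper, not code from the paper.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def average_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted average recall: mean over classes of correct / total per class.

    Only classes present in y_true are averaged, so each true class counts
    equally regardless of how many examples it has.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# A classifier that always predicts "angry" gets 75% accuracy here,
# but its average recall is (1.0 + 0.0) / 2 = 0.5.
y_true = ["angry", "angry", "angry", "sad"]
y_pred = ["angry", "angry", "angry", "angry"]
print(average_recall(y_true, y_pred))  # 0.5
```

Because each class is weighted equally, this metric is not inflated by always predicting the majority emotion, which matters for imbalanced datasets.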

More information: https://www.isca-speech.org/archive/interspeech_2021/pepino21_interspeech.html

Andres Juarez | 6 May 2022 | Papers
