Phone and speaker spatial organization in self-supervised speech representations

Authors: Pablo Riera, Manuela Cerdeiro, Leonardo Pepino and Luciana Ferrer.

Abstract:
Self-supervised representations of speech are currently being widely used for a large number of applications. Recently, some efforts have been made in trying to analyze the type of information present in each of these representations. Most such work uses downstream models to test whether the representations can be successfully used for a specific task. The downstream models, though, typically perform nonlinear operations on the representation extracting information that may not have been readily available in the original representation. In this work, we analyze the spatial organization of phone and speaker information in several state-of-the-art speech representations using methods that do not require a downstream model. We measure how different layers encode basic acoustic parameters such as formants and pitch using representation similarity analysis. Further, we study the extent to which each representation clusters the speech samples by phone or speaker classes using non-parametric statistical testing. Our results indicate that models represent these speech attributes differently depending on the target task used during pretraining.
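
The abstract names representation similarity analysis but does not describe how it was carried out. As a rough illustration of the general technique (not the authors' implementation), the sketch below compares the pairwise distances between representation vectors with the pairwise distances between acoustic values, using a rank correlation. Everything in it is an assumption made for illustration: the function names, the Euclidean distance, the Spearman-style correlation without tie handling, and the synthetic data standing in for layer activations and formant values.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance between every pair of rows of x (n_samples, n_dims)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def ranks(v):
    """Rank of each entry of a 1-D array (0 = smallest); ties are not averaged."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(len(v))
    return r


def rsa_score(representations, acoustic_features):
    """Rank correlation between the two sets of pairwise distances."""
    iu = np.triu_indices(len(representations), k=1)  # count each pair once
    d_rep = pairwise_distances(representations)[iu]
    d_ac = pairwise_distances(acoustic_features)[iu]
    return np.corrcoef(ranks(d_rep), ranks(d_ac))[0, 1]


# Synthetic stand-ins (hypothetical data, not taken from the paper).
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(50, 2))           # e.g. two formant values per frame
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
layer_a = acoustic @ q[:2, :]                 # embedding that preserves distances
layer_b = rng.normal(size=(50, 8))            # unrelated to the acoustic values

print("layer_a:", rsa_score(layer_a, acoustic))
print("layer_b:", rsa_score(layer_b, acoustic))
```

A layer whose geometry mirrors the acoustic parameters (here `layer_a`) should score close to 1, while an unrelated layer (`layer_b`) should score much lower. No downstream model is trained at any point, which is the property the abstract emphasizes.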

More information:
https://arxiv.org/abs/2302.14055

Andres Juarez | 27 November 2023 | Papers
