{"id":2553,"date":"2023-11-27T13:30:24","date_gmt":"2023-11-27T16:30:24","guid":{"rendered":"https:\/\/icc.fcen.uba.ar\/?p=2553"},"modified":"2023-11-27T13:30:24","modified_gmt":"2023-11-27T16:30:24","slug":"phone-and-speaker-spatial-organization-in-self-supervised-speech-representations","status":"publish","type":"post","link":"https:\/\/icc.fcen.uba.ar\/en\/phone-and-speaker-spatial-organization-in-self-supervised-speech-representations\/","title":{"rendered":"Phone and speaker spatial organization in self-supervised speech representations"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:1144px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-1\"><p>Authors: Pablo Riera, Manuela Cerdeiro, Leonardo Pepino and Luciana Ferrer.<\/p>\n<p>Abstract:<br \/>\nSelf-supervised representations of speech are currently being 
widely used for a large number of applications. Recently, some efforts have been made to analyze the type of information present in each of these representations. Most such work uses downstream models to test whether the representations can be successfully used for a specific task. The downstream models, though, typically perform nonlinear operations on the representation, extracting information that may not have been readily available in the original representation. In this work, we analyze the spatial organization of phone and speaker information in several state-of-the-art speech representations using methods that do not require a downstream model. We measure how different layers encode basic acoustic parameters such as formants and pitch using representation similarity analysis. Further, we study the extent to which each representation clusters the speech samples by phone or speaker classes using non-parametric statistical testing. Our results indicate that models represent these speech attributes differently depending on the target task used during pretraining.<\/p>\n<p>More information:<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2302.14055\" target=\"_blank\" 
rel=\"noopener\">https:\/\/arxiv.org\/abs\/2302.14055<\/a><\/p>\n<\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":9,"featured_media":2554,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[98],"tags":[],"class_list":["post-2553","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-papers"],"_links":{"self":[{"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/posts\/2553","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/users\/9"}],"replies":[{"embeddable":true,"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/comments?post=2553"}],"version-history":[{"count":1,"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/posts\/2553\/revisions"}],"predecessor-version":[{"id":2555,"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/posts\/2553\/revisions\/2555"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/media\/2554"}],"wp:attachment":[{"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/media?parent=2553"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/categories?post=2553"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/icc.fcen.uba.ar\/en\/wp-json\/wp\/v2\/tags?post=2553"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}