Authors: Mariel Estevez and Luciana Ferrer
Abstract:
Speaker verification (SV) systems are currently used for consequential tasks like granting access to bank accounts or supporting forensic decisions. Ensuring that these systems are fair and do not disfavor any particular group is crucial. In this work, we analyze the performance of two X-vector-based SV systems across groups defined by the gender and accent of the speakers when speaking English. To this end, we created a new dataset based on the VoxCeleb corpus by selecting samples from speakers with accents from different countries. We used this dataset to evaluate the performance of SV systems trained with VoxCeleb data. We show that performance, measured with a calibration-sensitive metric, is markedly degraded on groups that are underrepresented in training: females and speakers with nonnative accents in English. Finally, we show that a simple data balancing approach mitigates this undesirable bias for the minority groups without degrading performance on the majority groups.
More information: https://ieeexplore.ieee.org/document/10095150
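The data balancing idea mentioned in the abstract can be illustrated with a minimal sketch: upsample each demographic group in the training set to the size of the largest group so that no group is underrepresented. The function name `balance_by_group` and the `group_of` callback are illustrative assumptions; the paper's exact balancing scheme may differ.

```python
from collections import defaultdict
import random


def balance_by_group(samples, group_of, seed=0):
    """Upsample each group to the size of the largest group.

    samples: list of training examples.
    group_of: function mapping a sample to its group label,
        e.g., a (gender, accent) tuple.
    Illustrative sketch only; not the paper's exact method.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for s in samples:
        groups[group_of(s)].append(s)
    target = max(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(g)
        # Upsample minority groups by sampling with replacement.
        balanced.extend(rng.choices(g, k=target - len(g)))
    rng.shuffle(balanced)
    return balanced
```

After balancing, every group contributes the same number of training samples, so the model no longer sees minority groups (e.g., female speakers or nonnative accents) less often than majority ones.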