ECE Seminar Lecture Series

Location-based training for multi-channel speaker separation and diarization

DeLiang Wang, PhD, Perception and Neurodynamics Laboratory, Ohio State University

Wednesday, April 3, 2024
Noon–1 p.m.

1400 Wegmans Hall

Permutation ambiguity is a key issue in deep learning-based talker-independent speaker separation. Permutation invariant training (PIT) is widely used to address this problem. In multi-channel scenarios, permutation ambiguity may be naturally resolved by leveraging the spatial relations of different speakers. We present location-based training (LBT), a new approach to achieve talker independency in multi-channel speaker separation. Unlike PIT, which examines all possible permutations, LBT assigns speakers according to their positions in physical space. Specifically, we propose two training criteria, azimuth-based and distance-based training, which use speaker azimuths and distances relative to a microphone array. Evaluation results show that LBT significantly outperforms PIT on two-speaker and three-speaker mixtures with different array geometries and in various acoustic conditions. In addition, LBT is employed in a new speaker diarization approach for meeting environments. This approach integrates speaker separation and allows multiple non-overlapped speakers to be assigned to the same output stream. As a result, the proposed approach is capable of processing long audio recordings involving many participating speakers. The evaluation results on the LibriCSS dataset demonstrate that the new multi-channel diarization approach advances the state-of-the-art performance in speaker diarization and speaker-attributed speech recognition by a large margin.
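The difference between the two training criteria can be illustrated with a minimal sketch. It is not the speaker's implementation: the function names, the mean-squared-error loss, and the rule that output i is paired with the speaker having the i-th smallest azimuth are all assumptions made for illustration. A distance-based variant would sort by speaker distance instead.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals (illustrative loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(outputs, targets):
    """PIT: try every output-to-speaker assignment and keep the lowest loss."""
    n = len(targets)
    return min(
        sum(mse(outputs[i], targets[p[i]]) for i in range(n)) / n
        for p in permutations(range(n))
    )


def lbt_loss(outputs, targets, azimuths):
    """Azimuth-based LBT sketch: one fixed assignment, ordered by azimuth.

    Assumption: output i is paired with the speaker that has the
    i-th smallest azimuth (in degrees, relative to the array).
    """
    n = len(targets)
    order = sorted(range(n), key=lambda k: azimuths[k])
    return sum(mse(outputs[i], targets[order[i]]) for i in range(n)) / n


# Two toy "speakers"; speaker 1 sits at the smaller azimuth.
targets = [[1.0, 0.0], [0.0, 1.0]]
azimuths = [90.0, 30.0]

# Outputs ordered by azimuth (speaker 1 first) match the LBT assignment.
ordered = [[0.0, 1.0], [1.0, 0.0]]
# The same signals in the other order are penalized by LBT but not by PIT.
swapped = [[1.0, 0.0], [0.0, 1.0]]
```

In this sketch PIT evaluates all n! assignments per training example, while LBT evaluates exactly one, so the network is pushed to order its outputs by speaker location.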

DeLiang Wang received the B.S. and M.S. degrees from Peking (Beijing) University and the Ph.D. degree in 1991 from the University of Southern California, all in computer science. Since 1991, he has been with the Department of Computer Science & Engineering and the Center for Cognitive and Brain Sciences at The Ohio State University, where he is a Professor and University Distinguished Scholar. He received the U.S. Office of Naval Research Young Investigator Award in 1996, the 2008 Helmholtz Award from the International Neural Network Society, the 2007 Outstanding Paper Award of the IEEE Computational Intelligence Society, and the 2019 Best Paper Award of the IEEE Signal Processing Society. He is an IEEE Fellow and ISCA Fellow, and currently serves as the Editor-in-Chief of Neural Networks.

Refreshments will be provided.