An improved deep learning approach for speech enhancement

Malek Miled
Mohamed Anouar Ben Messaoud

Abstract

Single-channel speech enhancement is the task of improving the quality and intelligibility of a speech signal recorded in a noisy environment. Approaches to speech enhancement fall into two main categories: time-domain methods and time-frequency-domain methods. In this paper, we propose an approach based on a cross-domain framework. This framework exploits knowledge of the spectrogram and overcomes some of the limitations of time-frequency-domain methods. First, we apply the intrinsic mode functions of the empirical mode decomposition and an improved version of principal component analysis. Then, we design a cross-domain learning framework to determine the correlations along the frequency and time axes. The effectiveness of the proposed approach is demonstrated with objective and subjective measures: at a low SNR of -5 dB, it achieves average scores of -0.49, 2.47, 2.44, and 0.68 for SegSNR, PESQ, Cov, and STOI, respectively. These results highlight the success of our approach in low-SNR conditions.
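
For readers unfamiliar with the SegSNR measure quoted above, the following is a minimal sketch of one common way to compute segmental SNR with NumPy. It is not the authors' evaluation code: the function name `segmental_snr`, the frame length, the hop size, and the clipping range of -10 dB to 35 dB are all assumptions chosen for illustration, and the paper may use different settings.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=256, hop=128,
                  min_db=-10.0, max_db=35.0):
    """Average per-frame SNR in dB between a clean and an enhanced signal.

    All defaults (frame_len, hop, min_db, max_db) are illustrative
    assumptions, not values taken from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n = min(len(clean), len(enhanced))  # compare only the overlapping part
    eps = 1e-12                         # avoids division by zero and log(0)

    frame_snrs = []
    for start in range(0, n - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        err = s - enhanced[start:start + frame_len]
        snr_db = 10.0 * np.log10((np.sum(s ** 2) + eps) /
                                 (np.sum(err ** 2) + eps))
        # Clip each frame so silent or perfectly matched frames
        # do not dominate the average.
        frame_snrs.append(np.clip(snr_db, min_db, max_db))

    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return float(np.mean(frame_snrs))
```

Under this definition a negative average, such as the -0.49 dB reported here, means that in a typical frame the residual error energy is slightly larger than the clean speech energy, which is not unusual when the input SNR is as low as -5 dB.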

Article Details