Statistical Modeling of the Speech Signal


Ivan Tashev¹, Alex Acero¹
¹ Speech Technology Group, Microsoft Research, Redmond, Washington 98034, United States

The Gaussian distribution is the most commonly used statistical model of the speech signal. In this paper we propose a more general statistical model for the distributions of the real and imaginary parts of the DFT coefficients of the speech signal, and for the distribution of their magnitudes. Based on experimental measurements with the TIMIT database, we show that the Generalized Gaussian Distribution holds well across frequencies and audio frame sizes. A Weibull distribution is proposed to model the statistical behavior of the speech signal amplitude in the frequency domain. The Weibull parameters estimated from experimental measurements correspond well to the distribution of the real and imaginary parts. We propose and evaluate several statistical models of varying complexity. Overall, these statistical models fit the actual measurements with a Jensen-Shannon divergence below 0.0012 for the real and imaginary parts and below 0.003 for the magnitudes. The results presented in this paper can be used to improve speech processing algorithms based on statistical signal processing.
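To make the fit criterion concrete, here is a minimal sketch of how a Generalized Gaussian fit might be compared with a Gaussian fit using the Jensen-Shannon divergence. Everything specific in it is an assumption for illustration and not taken from the paper: synthetic Laplacian samples stand in for DFT coefficient values (no TIMIT data is used), the shape parameter is estimated by simple moment matching, the histogram uses 101 bins on [-8, 8], and the divergence is computed with natural logarithms.

```python
import math
import random


def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density with scale alpha and shape beta."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))


def fit_ggd(samples):
    """Moment-matching fit (an assumed estimator, not the paper's).

    Solves E[x^2] / E[|x|]^2 = G(1/b) G(3/b) / G(2/b)^2 for the shape b
    by bisection, then derives the scale from E[|x|].
    """
    n = len(samples)
    m1 = sum(abs(s) for s in samples) / n
    m2 = sum(s * s for s in samples) / n
    target = m2 / (m1 * m1)

    def ratio(b):
        return math.gamma(1 / b) * math.gamma(3 / b) / math.gamma(2 / b) ** 2

    lo, hi = 0.2, 5.0  # ratio(b) decreases with b on this bracket
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:
            lo = mid  # ratio too large, so the shape must be larger
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = m1 * math.gamma(1 / beta) / math.gamma(2 / beta)
    return alpha, beta


def histogram(samples, lo, hi, bins):
    """Normalized histogram; samples outside [lo, hi) are ignored."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        k = math.floor((s - lo) / width)
        if 0 <= k < bins:
            counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts]


def discretize(alpha, beta, lo, hi, bins):
    """Model probability per bin, approximated at bin centers and renormalized."""
    width = (hi - lo) / bins
    q = [ggd_pdf(lo + (k + 0.5) * width, alpha, beta) * width for k in range(bins)]
    total = sum(q)
    return [v / total for v in q]


def js_divergence(p, q):
    """Jensen-Shannon divergence in nats between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Synthetic stand-in data: the difference of two unit exponentials is Laplacian.
rng = random.Random(0)
samples = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(50000)]

LO, HI, BINS = -8.0, 8.0, 101
p = histogram(samples, LO, HI, BINS)

alpha, beta = fit_ggd(samples)
js_ggd = js_divergence(p, discretize(alpha, beta, LO, HI, BINS))

# A Gaussian is the special case beta = 2 with alpha = sqrt(2) * sigma.
m2 = sum(s * s for s in samples) / len(samples)
js_gauss = js_divergence(p, discretize(math.sqrt(2.0 * m2), 2.0, LO, HI, BINS))

print(f"estimated shape: {beta:.3f}")
print(f"JS divergence, GGD fit:      {js_ggd:.5f}")
print(f"JS divergence, Gaussian fit: {js_gauss:.5f}")
```

On this synthetic data the estimated shape should land near 1 (the Laplacian case), and the Generalized Gaussian fit should give a smaller divergence than the Gaussian fit. The absolute values are not comparable to the figures quoted in the abstract, since they depend on the data, the binning, and the logarithm base.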


