Chair: Peter C. Doerschuk, Purdue University, USA
Bryan S. Nollett, University of Illinois at Urbana-Champaign (U.S.A.)
Douglas L. Jones, University of Illinois at Urbana-Champaign (U.S.A.)
Nonlinearities in the amplifier and loudspeaker of hands-free speakerphones limit the performance of linear adaptive acoustic echo cancellers, necessitating the use of nonlinear cancellation schemes. A nonlinear acoustic echo canceller based on the Wiener-Hammerstein model structure of a cascade of linear, memoryless nonlinear, and linear elements is proposed. By modeling the true structure of the nonlinear system, the proposed canceller requires relatively few adaptive parameters, offering significantly lower storage and computational requirements than more general nonlinear adaptive filtering techniques. Experimental results on measured loudspeaker signals indicate that the proposed nonlinear echo canceller provides as much as an 8.4 dB improvement in Echo Return Loss Enhancement (ERLE) over a linear Normalized LMS canceller with little additional computation.
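To make the model structure concrete, the sketch below simulates an echo path as a Wiener-Hammerstein cascade (FIR filter, memoryless nonlinearity, FIR filter) and runs a standard linear NLMS canceller on it, reporting ERLE. The filter coefficients, the `tanh` saturation, the step size, and the function names are illustrative assumptions, not measured loudspeaker data. The code implements only the linear NLMS baseline, not the proposed adaptive Wiener-Hammerstein canceller.

```python
import numpy as np

def wiener_hammerstein(x, h1, h2, nonlin):
    """Cascade: FIR h1 -> memoryless nonlinearity -> FIR h2 (output truncated to len(x))."""
    u = np.convolve(x, h1)[: len(x)]
    return np.convolve(nonlin(u), h2)[: len(x)]

def nlms(x, d, taps, mu=0.5, eps=1e-6):
    """Linear NLMS echo canceller; returns the residual echo e = d - y."""
    w = np.zeros(taps)
    buf = np.zeros(taps)          # most recent far-end samples, newest first
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e[n] = d[n] - w @ buf
        w += mu * e[n] * buf / (eps + buf @ buf)
    return e

def erle_db(d, e):
    """Echo Return Loss Enhancement: echo power over residual power, in dB."""
    return 10 * np.log10(np.sum(d**2) / max(np.sum(e**2), np.finfo(float).tiny))

# Hypothetical echo path: short front-end filter, saturating driver, decaying room response.
rng = np.random.default_rng(0)
x = rng.standard_normal(20000)                        # far-end (loudspeaker input) signal
h1 = np.array([1.0, 0.5, 0.25])
h2 = rng.standard_normal(64) * np.exp(-np.arange(64) / 10)
taps = len(h1) + len(h2) - 1                          # length of the overall linear response

d_lin = wiener_hammerstein(x, h1, h2, lambda u: u)                # purely linear echo
d_nl = wiener_hammerstein(x, h1, h2, lambda u: np.tanh(0.8 * u))  # saturated echo

half = len(x) // 2  # measure ERLE only after the canceller has converged
erle_lin = erle_db(d_lin[half:], nlms(x, d_lin, taps)[half:])
erle_nl = erle_db(d_nl[half:], nlms(x, d_nl, taps)[half:])
print(f"ERLE linear echo: {erle_lin:.1f} dB, saturated echo: {erle_nl:.1f} dB")
```

With the linear echo, NLMS can drive the residual down toward numerical precision; with the saturated echo, the part of the echo that no linear filter can represent puts a floor under the residual and caps ERLE, which is the limitation a nonlinear canceller is meant to remove.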
Zijun Yang, University of Missouri-Columbia (U.S.A.)
Jozsef Vass, University of Missouri-Columbia (U.S.A.)
Yunxin Zhao, University of Illinois (U.S.A.)
Xinhua Zhuang, University of Missouri-Columbia (U.S.A.)
A highly efficient algorithm termed adaptive forward-backward vector quantization (AFBVQ) is developed for variable bit rate quantization of linear predictive coding (LPC) coefficients and integrated with the FS1016 Federal Standard Code Excited Linear Predictive (CELP) coder. The result is a high-performance, low bit rate speech coder called AFBVQ-CELP, which reduces the bit rate in two ways: by backward LPC indexing and by forward LPC VQ. In AFBVQ, previously decoded, temporally close speech is re-segmented into overlapping blocks. If the LPC coefficients calculated from one of these synthetic blocks are spectrally close to the current unquantized LPC coefficients, backward LPC indexing is used to encode the current speech block; otherwise, forward linear prediction is performed with split vector quantization, supported by a very efficient codebook initialization termed Mixture Gaussian Clustering (MGC) [1]. Compared with the FS1016 CELP coder, AFBVQ-CELP reduces the LPC bit rate by 18 bits per frame (bpf) at the same spectral distortion, which lowers the overall bit rate from 4.8 kbps (FS1016 CELP) to 4.2 kbps. Furthermore, the proposed AFBVQ consistently outperforms traditional forward LPC VQ by 3 bpf at the same spectral distortion. Subjective listening tests show that with AFBVQ-CELP the LPC bit rate can be further reduced to 8.4 bpf, giving an overall bit rate of 3.94 kbps without compromising decoded speech quality.
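The backward/forward switching decision can be sketched as follows, assuming the usual RMS log-spectral distortion between LPC model spectra as the closeness measure. The 1 dB threshold, the function names, and the sign convention A(z) = 1 + a1 z^-1 + ... are hypothetical choices for illustration; computing the candidate LPC vectors from the re-segmented synthetic blocks and the forward split VQ itself are not shown.

```python
import numpy as np

def lpc_log_spectrum_db(a, nfft=512):
    """LPC model spectrum 10*log10(1/|A(e^jw)|^2), A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""
    A = np.fft.rfft(np.concatenate(([1.0], a)), nfft)
    return -20.0 * np.log10(np.abs(A))

def spectral_distortion_db(a_ref, a_test, nfft=512):
    """RMS difference (dB) between two LPC log spectra over 0..pi."""
    diff = lpc_log_spectrum_db(a_ref, nfft) - lpc_log_spectrum_db(a_test, nfft)
    return float(np.sqrt(np.mean(diff ** 2)))

def choose_lpc_mode(a_current, backward_candidates, sd_threshold_db=1.0):
    """Backward/forward switch using a hypothetical distortion threshold.

    backward_candidates: LPC vectors computed from overlapping blocks of
    previously decoded speech (their computation is not shown here).
    Returns ("backward", index) when some candidate is close enough, so only
    the index needs to be sent; otherwise ("forward", None), meaning the
    current LPC vector would be coded by the forward split VQ.
    """
    if backward_candidates:
        sds = [spectral_distortion_db(a_current, c) for c in backward_candidates]
        best = int(np.argmin(sds))
        if sds[best] <= sd_threshold_db:
            return "backward", best
    return "forward", None
```

For example, `choose_lpc_mode(np.array([-0.9]), [np.array([0.5]), np.array([-0.9])])` selects the second candidate and returns `("backward", 1)`. The bit rate figures are consistent with FS1016's 30 ms frame: 18 bits per 30 ms is 600 bps, which is exactly the drop from 4.8 kbps to 4.2 kbps.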
Wan-Chieh Pai, Purdue University (U.S.A.)
Peter C. Doerschuk, Purdue University (U.S.A.)
A nonlinear statistical speech production model based on AM-FM modulation is described, together with signal processing methods for extracting its component signals. Preliminary ideas on using these signals to compute features for a Hidden Markov Model speech recognizer are presented.
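The abstract does not say how the component signals are extracted. As a generic illustration only (not the authors' method), the sketch below recovers the amplitude envelope and instantaneous frequency of a single AM-FM component from its analytic signal, built with the same FFT construction that `scipy.signal.hilbert` uses. The function names and the test signal are assumptions.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H{x} via the FFT (zero negative frequencies, double positive)."""
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def am_fm_demodulate(x, fs):
    """Return the amplitude envelope and instantaneous frequency (Hz) of one AM-FM component."""
    z = analytic_signal(np.asarray(x, dtype=float))
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)   # one sample shorter than x
    return amplitude, inst_freq

# Hypothetical component: 500 Hz carrier, 3 Hz AM, +/-10 Hz FM at 5 Hz.
fs = 8000
t = np.arange(fs) / fs
envelope = 1 + 0.3 * np.cos(2 * np.pi * 3 * t)
x = envelope * np.cos(2 * np.pi * 500 * t + 2 * np.sin(2 * np.pi * 5 * t))
amp, f_inst = am_fm_demodulate(x, fs)
```

This works cleanly only for a single narrowband component; real speech would first need to be split into components (for example by a bank of bandpass filters), which is exactly the kind of processing the model above is concerned with.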
Michael Moore, University of California Santa Barbara (U.S.A.)
Sanjit Mitra, University of California Santa Barbara (U.S.A.)
Reinhard Bernstein, University Erlangen-Nuremberg (Germany)
The 1-D Teager algorithm can perform mean-weighted highpass filtering with relatively few operations. We propose a generalization of the Teager algorithm that allows the dependence of the highpass output on the local mean to be adjusted. The derivation and interpretation of the modified algorithm are presented. Finally, the responses of several implementations to a test input are presented.
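Why the Teager operator Psi[x(n)] = x(n)^2 - x(n-1)x(n+1) acts as a mean-weighted highpass filter can be seen by writing x(n) = mu + e(n) around a mean mu: expanding gives Psi[x(n)] = mu*(2x(n) - x(n-1) - x(n+1)) + e(n)^2 - e(n-1)e(n+1), so for small fluctuations the output is the mean times a 3-tap Laplacian highpass. The sketch below checks this numerically; it covers only the standard operator, not the proposed generalization, and the function names are assumptions.

```python
import numpy as np

def teager(x):
    """1-D Teager operator x(n)^2 - x(n-1)x(n+1), evaluated at interior samples."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def mean_weighted_highpass(x, mu):
    """First-order approximation: mean mu times the highpass 2x(n) - x(n-1) - x(n+1)."""
    x = np.asarray(x, dtype=float)
    return mu * (2 * x[1:-1] - x[:-2] - x[2:])

# Small fluctuations around a mean of 10: the two outputs differ only by
# the second-order terms e(n)^2 - e(n-1)e(n+1), here at most 2e-4.
rng = np.random.default_rng(1)
x = 10 + 0.01 * rng.uniform(-1, 1, 200)
approx_error = np.max(np.abs(teager(x) - mean_weighted_highpass(x, 10.0)))
```

Each output sample costs two multiplications and one subtraction, which is the "relatively few operations" advantage; scaling by the mean is the behavior the generalized algorithm is designed to adjust.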
Ilya Shmulevich, Purdue University (U.S.A.)
Edward J. Coyle, Purdue University (U.S.A.)
The recursive median filter is used to improve the structure of the output of a key-finding algorithm that establishes the tonal contexts of musical patterns in a composition. This is subsequently incorporated into a system for recognizing musical patterns. Krumhansl's key-finding algorithm is used as the basis. The sequence of maximum correlations it outputs is smoothed with a cubic spline and used to determine weights for perceptual and absolute pitch errors. The maximum correlations are used to create the assigned key sequence, which is processed by a recursive median filter. In most cases, the recursive median filter establishes the key more accurately than the standard median filter. Additionally, because the recursive median filter is idempotent, the key-finding output is guaranteed to be a root signal.
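A minimal sketch of a recursive median filter follows: each output is the median of the k most recent *outputs* and the current and next k *inputs*, which is what makes one pass produce a root signal. Replicating the end samples k times is an assumed boundary convention, and the integer "key index" sequence is hypothetical rather than taken from the paper.

```python
from statistics import median

def recursive_median(x, k):
    """Recursive median filter with window 2k+1 and end samples replicated k times."""
    n = len(x)
    padded = [x[0]] * k + list(x) + [x[-1]] * k
    y = padded[:]                                   # left padding serves as the initial "past outputs"
    for i in range(k, k + n):
        window = y[i - k:i] + padded[i:i + k + 1]   # past outputs + current and future inputs
        y[i] = median(window)                       # odd length, so the result is an input value
    return y[k:k + n]

# Hypothetical key-index sequence with a one-frame spurious key change (5).
keys = [0, 0, 5, 0, 0, 1, 1, 1, 2, 2]
smoothed = recursive_median(keys, 1)
```

Here `smoothed` is `[0, 0, 0, 0, 0, 1, 1, 1, 2, 2]`, and filtering it again leaves it unchanged, illustrating idempotence: the output is already a root. Because the window length is odd, the median is always one of the input key indices, so no "in-between" key is ever invented.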
Kenneth E. Barner, University of Delaware (U.S.A.)
This paper develops colored Lℓ filters and evaluates their performance on a fundamental speech processing problem: estimating the glottal function for speech pitch detection. Colored Lℓ filters extend the temporal/rank-order based Lℓ filters: the rank indexes are quantized (colored) and a bias is added to each weight. Quantizing the rank indexes reduces the number of filter parameters and allows the observation window to grow beyond sizes that were previously practical. It is shown that the tradeoff between window size and rank quantization is advantageous in many applications.
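The rank-quantization idea can be sketched with a weight matrix indexed by temporal position and quantized rank bin: with N window samples and M rank bins, the filter has N*M weights instead of the N*N of a full temporal/rank-order filter. The uniform bin mapping q(r) = floor(r*M/N), stable tie-breaking by position, and the function name are assumptions; the per-weight bias mentioned in the abstract is omitted because its exact form is not given there.

```python
import numpy as np

def colored_rank_filter(x, W):
    """Sliding-window temporal/rank-order filter with quantized ranks.

    W has shape (N, M): N temporal positions, M rank bins (M <= N).
    Output y = sum_i W[i, q(r_i)] * x_i, where r_i is the rank of x_i
    within the window and q(r) = r*M // N.
    """
    x = np.asarray(x, dtype=float)
    N, M = W.shape
    out = np.empty(len(x) - N + 1)
    for n in range(len(out)):
        win = x[n:n + N]
        ranks = np.argsort(np.argsort(win, kind="stable"), kind="stable")  # 0 = smallest
        bins = ranks * M // N
        out[n] = np.sum(W[np.arange(N), bins] * win)
    return out

# Special cases: unquantized ranks with weight only on the middle rank give a
# median filter; equal weights give a moving average regardless of rank.
W_median = np.zeros((3, 3))
W_median[:, 1] = 1.0
W_mean = np.full((4, 2), 0.25)
```

The two special cases show that such filters span both linear (moving average) and order-statistic (median) behavior; coloring trades rank resolution for a smaller weight matrix at a given window size.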
Joel MacAuslan, Speech Technology & Applied Research Corp. (U.S.A.)
Karen Chenausky, Speech Technology & Applied Research Corp. (U.S.A.)
Jiahong Juda, Harvard/Smithsonian Center for Astrophysics (U.S.A.)
Ivan Manev, University of Maine (U.S.A.)
A technique for finding repetitions in a laryngeal signal, such as the volume velocity, seems to identify the moments at which the larynx changes mode. The technique is based on the "close-returns plots" used in earlier work. Plots derived from the technique may help identify segments in which the laryngeal dynamics are confined to one attractor, a necessary condition for most nonlinear-dynamics processing. The technique appears equally useful for identifying such attractor changes in any nonlinear dynamical flow.
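A basic close-returns computation can be sketched as a boolean image: entry (lag j, time i) is set when the (optionally delay-embedded) signal at time i+j comes within a tolerance eps of its value at time i. A segment with steady periodic dynamics produces a horizontal band at the period lag, and a mode change shows up where the band moves or breaks. The embedding parameters, tolerance, and function name are assumptions; this is a generic sketch, not the authors' modified technique.

```python
import numpy as np

def close_returns(x, max_lag, eps, dim=1, delay=1):
    """Boolean close-returns image R[j-1, i] = ||X(i+j) - X(i)|| < eps for lags j = 1..max_lag."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (dim - 1) * delay
    # Delay embedding: row t is (x[t], x[t+delay], ..., x[t+(dim-1)*delay]).
    X = np.stack([x[k * delay:k * delay + n_vec] for k in range(dim)], axis=1)
    n = n_vec - max_lag
    if n <= 0:
        raise ValueError("signal too short for the requested max_lag and embedding")
    R = np.zeros((max_lag, n), dtype=bool)
    for j in range(1, max_lag + 1):
        R[j - 1] = np.linalg.norm(X[j:j + n] - X[:n], axis=1) < eps
    return R

# Hypothetical periodic signal (period 20 samples): lag 20 is a close return everywhere.
x = np.sin(2 * np.pi * np.arange(200) / 20)
R = close_returns(x, max_lag=30, eps=1e-6)
```

For a measured signal such as volume velocity, eps would have to be set relative to the signal amplitude and noise level; a tolerance this tight only makes sense for the noiseless example.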