Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments

Krishnamoorthy, P.; Mahadeva Prasanna, S. R.

doi:10.1007/s12046-009-0043-8

Published: 21 December 2009

Sadhana, Volume 34, pages 729–754 (2009)

Abstract

This paper presents an experimental evaluation of combined temporal and spectral processing methods for the speaker recognition task in noisy, reverberant or multi-speaker environments. Automatic speaker recognition systems perform well in controlled environments. Speech recorded in real environments by distant microphones, however, is degraded by background noise, reverberation and interfering speakers, and this degradation severely reduces the performance of the speaker recognition system. The combined temporal and spectral processing (TSP) methods proposed in our earlier study are used as a pre-processing step to enhance the speaker-specific features and thereby improve speaker recognition performance. Three types of degradation are considered for evaluation: background noise, reverberation and interfering speakers. The evaluation is carried out for temporal processing alone, spectral processing alone and the combined TSP method. The experimental results show that the combined TSP methods give higher recognition performance than either temporal or spectral processing alone.
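The abstract names the pre-processing stages without giving their details. The following is a minimal, self-contained sketch of a generic two-stage (temporal plus spectral) front end of this kind, written under stated assumptions; it is not the authors' TSP method. The temporal stage here is a hypothetical frame-energy weighting (temporal processing in the referenced literature, e.g. Yegnanarayana et al. 1999, works on the linear prediction residual, which is not reproduced here), and the spectral stage is plain magnitude spectral subtraction in the style of Boll (1979). Non-overlapping rectangular frames, a naive DFT, a separate noise-only segment, the floor values and all function names (`tsp_sketch`, `temporal_weights` and so on) are illustrative assumptions.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (kept dependency-free)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part, since the input signal is real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def frames(signal, size):
    """Split into non-overlapping rectangular frames; a trailing partial frame is dropped."""
    return [list(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


def temporal_weights(frame_list, floor=0.1):
    """Hypothetical temporal stage: weight each frame by its energy relative to the
    most energetic frame, so low-energy (likely low-SNR) frames are attenuated."""
    energies = [sum(x * x for x in f) for f in frame_list]
    peak = max(energies, default=0.0) or 1.0
    return [max(floor, e / peak) for e in energies]


def estimate_noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only."""
    if not noise_frames:
        raise ValueError("need at least one full noise-only frame")
    mags = [[abs(c) for c in dft(f)] for f in noise_frames]
    return [sum(col) / len(col) for col in zip(*mags)]


def spectral_subtract(frame, noise_mag, beta=0.01):
    """Magnitude spectral subtraction with a spectral floor; the noisy phase is reused."""
    out = []
    for coeff, n_mag in zip(dft(frame), noise_mag):
        mag = abs(coeff)
        new_mag = max(mag - n_mag, beta * mag)
        out.append(cmath.rect(new_mag, cmath.phase(coeff)))
    return idft(out)


def tsp_sketch(signal, noise_only, size=32):
    """Spectral subtraction per frame, then scaling by that frame's temporal weight."""
    noise_mag = estimate_noise_magnitude(frames(noise_only, size))
    frame_list = frames(signal, size)
    weights = temporal_weights(frame_list)
    enhanced = []
    for frame, w in zip(frame_list, weights):
        enhanced.extend(w * y for y in spectral_subtract(frame, noise_mag))
    return enhanced
```

Applying the temporal weight after subtraction is an arbitrary ordering chosen for simplicity; because spectral subtraction is nonlinear, swapping the order would change the result. A practical front end would instead use overlapping windowed frames with overlap-add and an FFT before passing the enhanced speech to feature extraction.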

References

Allen J, Berkley D 1979 Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65: 943–950
Bimbot F, Bonastre J F, Fredouille C, Gravier G, Magrin-Chagnolleau I, Meignier S, Merlin T, Ortega-García J, Petrovska-Delacrétaz D, Reynolds D A 2004 A tutorial on text-independent speaker verification. EURASIP J. Applied Signal Process. 4: 430–451
Boll S 1979 Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust., Speech, Signal Process. ASSP-27: 113–120
Campbell J P 1997 Speaker recognition: A tutorial. Proc. IEEE 85(9): 1437–1462
Ephraim Y, Malah D 1984 Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust., Speech, Signal Process. ASSP-32: 1109–1121
Furui S 1981 Comparison of speaker recognition methods using statistical features and dynamic features. IEEE Trans. Acoust., Speech, Signal Process. 29(3): 342–350
Greenberg S, Kingsbury B E D 1997 The modulation spectrogram: in pursuit of an invariant representation of speech. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. Munich, Germany 1647–1650
Habets E A P, Gannot S, Cohen I, Sommen P C W 2008 Joint dereverberation and residual echo suppression of speech signals in noisy environments. IEEE Trans. Audio, Speech, and Language Process. 16(8): 1433–1451
Heck L P, Konig Y, Sönmez M K, Weintraub M 2000 Robustness to telephone handset distortion in speaker recognition by discriminative feature design. Speech Communication 31(2–3): 181–192
Kamath S, Loizou P 2002 A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. Orlando, USA
Krishnamoorthy P, Prasanna S R M 2007 Processing noisy speech for enhancement. IETE Technical Review, Special issue on spoken language processing 24: 349–355
Krishnamoorthy P, Prasanna S R M 2008 Temporal and spectral processing of degraded speech. In: Proc. IEEE Int. Conf. Advanced Computing and Communications 112–118
Krishnamoorthy P, Prasanna S R M 2009 Reverberant speech enhancement by temporal and spectral processing. IEEE Trans. Audio, Speech, and Language Process. 17(2): 253–266
Lebart K, Boucher J 2001 A new method based on spectral subtraction for speech dereverberation. Acta Acustica 87: 359–366
Markel J 1972 The SIFT algorithm for fundamental frequency estimation. IEEE Trans. Audio and Electroacoustics 20: 367–377
McAulay R, Quatieri T 1986 Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. Acoust., Speech, Signal Process. ASSP-34: 744–754
Ming J, Hazen T, Glass J, Reynolds D 2007 Robust speaker recognition in noisy conditions. IEEE Trans. Audio, Speech, and Language Process. 15(5): 1711–1723
Morgan D, George E, Lee L, Kay S 1997 Cochannel speaker separation by harmonic enhancement and suppression. IEEE Trans. Speech Audio Process. 5: 407–424
Murty K S R, Yegnanarayana B 2008 Epoch extraction from speech signals. IEEE Trans. Audio, Speech, and Language Process. 16(8): 1602–1613
Ortega-Garcia J, Gonzalez-Rodriguez J 1996 Overview of speech enhancement techniques for automatic speaker recognition. In: Proc. Fourth Int. Conf. Spoken Language Process. 2: 929–932
Parsons T 1976 Separation of speech from interfering speech by means of harmonic selection. J. Acoust. Soc. Am. 60: 911–918
Picone J 1993 Signal modelling techniques in speech recognition. Proc. IEEE 81(9): 1215–1247
Prakash V, Hansen J 2007 In-set/out-of-set speaker recognition under sparse enrollment. IEEE Trans. Audio, Speech, and Language Process. 15(7): 2044–2052
Prasanna S R M, Sandeep Reddy B, Krishnamoorthy P 2009 Vowel onset point detection using source, spectral peaks and modulation spectrum energies. IEEE Trans. Audio, Speech, and Language Process. 17(4): 556–565
Prasanna S R M, Subramanian A 2005 Finding pitch markers using first order Gaussian differentiator. In: Proc. IEEE Third Int. Conf. Intelligent Sensing and Information Process. 140–145
Prasanna S R M, Yegnanarayana B 2004 Extraction of pitch in adverse conditions. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. Montreal, Quebec, Canada 1: I-109–I-112
Proakis J G, Manolakis D G 1996 Digital signal processing: principles, algorithms, and applications, 3rd edition. Prentice Hall
Rao K, Prasanna S R M, Yegnanarayana B 2007 Determination of instants of significant excitation in speech using Hilbert envelope and group delay function. IEEE Signal Process. Letters 14(10): 762–765
Reynolds D 1994 Experimental evaluation of features for robust speaker identification. IEEE Trans. Speech Audio Process. 2(4): 639–643
Reynolds D A 1995 Speaker identification and verification using Gaussian mixture speaker models. Speech Communication 17: 91–108
Reynolds D A 2000 Speaker verification using adapted Gaussian mixture models. Digital Signal Process. 10(1–3): 19–41
Sankar A, Lee C-H 1996 A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans. Speech Audio Process. 4(3): 190–202
Varga A, Steeneken H J M 1993 Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication 12(3): 247–251
Wu M, Wang D 2006 A two-stage algorithm for one-microphone reverberant speech enhancement. IEEE Trans. Audio, Speech, and Language Process. 14: 774–784
Yegnanarayana B, Avendano C, Hermansky H, Satyanarayana Murthy P 1999 Speech enhancement using linear prediction residual. Speech Communication 28: 25–42
Yegnanarayana B, Prasanna S R M, Duraiswami R, Zotkin D 2005 Processing of reverberant speech for time-delay estimation. IEEE Trans. Speech Audio Process. 13: 1110–1118
Yegnanarayana B, Prasanna S R M, Mathew M 2003 Enhancement of speech in multispeaker environment. In: Proc. European Conf. Speech Communication and Technology. Geneva, Switzerland 581–584
Yegnanarayana B, Satyanarayana Murthy P 2000 Enhancement of reverberant speech using LP residual signal. IEEE Trans. Speech Audio Process. 8: 267–281
Zilovic M, Ramachandran R, Mammone R 1998 Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions. IEEE Trans. Speech Audio Process. 6(3): 260–267
Zue V, Seneff S, Glass J 1990 Speech database development at MIT: TIMIT and beyond. Speech Communication 9(4): 351–356

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, 781 039, India
P. Krishnamoorthy & S. R. Mahadeva Prasanna

Corresponding author

Correspondence to P. Krishnamoorthy.

About this article

Cite this article

Krishnamoorthy, P., Mahadeva Prasanna, S.R. Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments. Sadhana 34, 729–754 (2009). https://doi.org/10.1007/s12046-009-0043-8

Received: 10 September 2008
Revised: 10 March 2009
Published: 21 December 2009
Issue Date: October 2009
DOI: https://doi.org/10.1007/s12046-009-0043-8
