Abstract
This paper presents an experimental evaluation of the combined temporal and spectral processing methods for speaker recognition task under noise, reverberation or multi-speaker environments. Automatic speaker recognition system gives good performance in controlled environments. Speech recorded in real environments by distant microphones is degraded by factors like background noise, reverberation and interfering speakers. This degradation strongly affects the performance of the speaker recognition system. Combined temporal and spectral processing (TSP) methods proposed in our earlier study are used for pre-processing to improve the speaker-specific features and hence the speaker recognition performance. Different types of degradation like background noise, reverberation and interfering speaker are considered for evaluation. The evaluation is carried out for the individual temporal processing, spectral processing and the combined TSP method. The experimental results show that the combined TSP methods give relatively higher recognition performance compared to either temporal or spectral processing alone.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Allen J, Berkley D 1979 Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65: 943–950
Bimbot F, Bonastre J F, Fredouille C, Gravier G, Chagnolleau M I, Meignier S, Merlin T, Garcia O J, Delacretaz P, Reynolds 2004 A tutorial on text-independent speaker verification. EURASIP J. Applied Signal process. 4: 430–451
Boll S 1979 Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust., Speech, Signal process. ASSP-27 113–120
Campbell J P 1997 Speaker recognition: A tutorial. Proc. IEEE 85(9): 1437–1462
Ephraim Y, Malah D 1984 Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust., Speech, Signal process. ASSP-32 1109–1121
Furui S 1981 Comparison of speaker recognition methods using statistical features and dynamic features. IEEE Trans. Audio, Speech and Language process. 29(3): 342–350
Greenberg S, Kingsbury B E D 1997 The modulation spectrogram: in pursuit of an invariant representation of speech. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal process. Munich, Germany 1647–1650
Habets E A P, Gannot S, Cohen I, Sommen P C W 2008 Joint dereverberation and residual echo suppression of speech signals in noisy environments. IEEE Trans. Audio, Speech, and Language Process. 16(8): 1433–1451
Heck L P, Konig Y, Sönmez M K, Weintraub M 2000 Robustness to telephone handset distortion in speaker recognition by discriminative feature design. Speech Communication 31(2–3): 181–192
Kamath S, Loizou P 2002 A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal process. Orlando, USA
Krishnamoorthy P, Prasanna S R M 2007 Processing noisy speech for enhancement. J. IETE Technical Review, Special issue on spoken language processing 24: 349–355
Krishnamoorthy P, Prasanna S R M 2008 Temporal and spectral processing of degraded speech. In: IEEE Proc. Int. Conf. Advanced Computing and Communications 112–118
Krishnamoorthy P, Prasanna S R M 2009 Reverberant speech enhancement by temporal and spectral processing. IEEE Trans. Speech, Audio and Language Process. 17(2): 253–266
Lebart K, Boucher J 2001 A new method based on spectral subtraction for speech dereverberation. Acta Acoustica 87: 359–366
Markel J 1972 The SIFT algorithm for fundamental frequency estimation. IEEE Trans. Audio and Electroacoustics 20: 367–377
McAulay R, Quatieri T 1986 Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. Acoust., Speech, Signal process. ASSP-34 744–754
Ming J, Hazen T, Glass J, Reynolds D 2007 Robust speaker recognition in noisy conditions. IEEE Trans. Audio, Speech, and Language process. 15(5): 1711–1723
Morgan D, George E, Lee L, Kay S 1997 Cochannel speaker separation by harmonic enhancement and suppression. IEEE Trans. Speech Audio process. 5: 407–424
Murty K S R, Yegnanarayana B 2008 Epoch extraction from speech signals. IEEE Trans. Audio, Speech, and Language Process. 16(8): 1602–1613
Ortega-Garcia J, Gonzalez-Rodriguez J 1996 Overview of speech enhancement techniques for automatic speaker recognition. In: Proc. Fourth Int. Conf. Spoken Language. 2: 929–932
Parsons T 1976 Separation of speech from interfering speech by means of harmonic selection. J. Acoust. Soc. Am. 60: 911–918
Picone J 1993 Signal modelling techniques in speech recognition. Proc. IEEE 81(9): 1215–1247
Prakash V, Hansen J 2007 In-set/out-of-set speaker recognition under sparse enrollment. IEEE Trans. Audio, Speech, and Language process. 15(7): 2044–2052
Prasanna S R M, Sandeep Reddy B, Krishnamoorthy P 2009 Vowel onset point detection using source, spectral peaks and modulation spectrum energies. IEEE Trans. Speech, Audio and Language Process. 17(4): 556–565
Prasanna S R M, Subramanian A 2005 Finding pitch markers using first order Gaussian differentiator. In: IEEE Proc. Third Int. Conf. Intelligent Sensing Information Process. 140–145
Prasanna S R M, Yegnanarayana B 2004 Extraction of pitch in adverse conditions. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal process. Vol. 1. Montreal, Quebec, Canada I-109–I-112
Proakis J G, Manolakis D G 1996 Digital signal processing-principles, algorithms, and applications, 3rd Edition. Prentice Hall
Rao K, Prasanna S R M, Yegnanarayana B 2007 Determination of instants of significant excitation in speech using Hilbert envelope and group delay function. IEEE Signal process. Letters 14(10): 762–765
Reynolds D 1994 Experimental evaluation of features for robust speaker identification. IEEE Trans., Speech Audio process. 2(4): 639–643
Reynolds D A 1995 Speaker identification and verification using Gaussian mixture speaker models. Speech Communication 17: 91–108
Reynolds D A 2000 Speaker verification using adapted Gaussian mixture models. Digital Signal Process. 10(1–3): 19–41
Sankar A, Lee C-H 1996 A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans. Speech Audio process. 4(3): 190–202
Varga A, Steeneken H J M 1993 Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effct of additive noise on speech recognition systems. Speech Communication 12(3): 247–251
Wu M, Wang D 2006 A two-stage algorithm for one-microphone reverberant speech enhancement. IEEE Trans. Audio, Speech, Language process. 14: 774–784
Yegnanarayana B, Avendano C, Hermansky H, Satyanarayana Murthy P 1999 Speech enhancement using linear prediction residual. Speech Communication 28: 25–42
Yegnanarayana B, Prasanna S R M, Duraiswami R, Zotkin D 2005 Processing of reverberant speech for time-delay estimation. IEEE Trans. Speech Audio process. 13: 1110–1118
Yegnanarayana B, Prasanna S R M, Mathew M 2003 Enhancement of speech in multispeaker environment. In: Proc. European Conf. Speech process., Technology. Geneva, Switzerland 581–584
Yegnanarayana B, Satyanarayana Murthy P 2000 Enhancement of reverberant speech using LP residual signal. IEEE Trans. Speech Audio process. 8: 267–281
Zilovic M, Ramachandran R, Mammone R 1998 Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions. IEEE Trans. Speech Audio process. 6(3): 260–267
Zue V, Seneff S, Glass J 1990 Speech database development at MIT: TIMIT and beyond. Speech Communication 9(4): 351–356
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Krishnamoorthy, P., Mahadeva Prasanna, S.R. Application of combined temporal and spectral processing methods for speaker recognition under noisy, reverberant or multi-speaker environments. Sadhana 34, 729–754 (2009). https://doi.org/10.1007/s12046-009-0043-8
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12046-009-0043-8