Automatic Speaker Verification
Outline
Introduction
Speaker identification vs verification
Speaker verification overview
The parts of a speaker verification system
Evaluation of speaker verification performance
Applications
Future Directions
Introduction
Extracting Information from Speech
Goal: Automatically extract the information transmitted in the speech signal
Speech signal
  Words: "How are you?"
  Speaker identity: Dr. Ahmad
Speaker identification
Determine the speaker's identity
Selection among a set of known voices
The user does not claim an identity
Closed-set identification: assume that all speakers are known to the system
Open-set identification: the speaker may not be among the speakers known to the system
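The difference between closed-set and open-set identification can be sketched in a few lines. This is a minimal illustration, assuming each known speaker's model has already produced a log-likelihood score for the utterance; the score values, the threshold, and the name "Omar" are hypothetical.

```python
from typing import Dict, Optional


def identify_closed_set(scores: Dict[str, float]) -> str:
    """Closed set: always return the known speaker with the highest score."""
    return max(scores, key=scores.get)


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open set: return the best speaker, or None if even the best score is too low."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


# Hypothetical log-likelihood scores of one utterance under each known model.
scores = {"Ahmad": -41.2, "Salma": -55.7, "Omar": -60.3}

print(identify_closed_set(scores))                 # best match among known voices
print(identify_open_set(scores, threshold=-50.0))  # best score clears the threshold
print(identify_open_set(scores, threshold=-30.0))  # no known speaker is close enough
```

In the open-set case the threshold decides when the system answers "none of the known speakers"; how to choose it is application dependent.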
Speaker Verification
Synonyms: authentication, detection
The user claims an identity
System task: accept or reject the identity claim
The voice can come from outside the set of known speakers
All speakers known: closed set
Impostor: any voice other than the true identity
Is this Ahmad's voice?
Identification vs verification
[Figure: Identification compares the input against the models Speaker 1 ... Speaker N and outputs a speaker ID decision. Verification passes the extracted features to a speaker model (+) and an impostor model (-) and outputs a decision.]
Speech Modalities
The application dictates different speech modalities:
Text-dependent recognition
  The recognition system knows the text spoken by the person
  Examples: fixed phrase, prompted phrase
  Used for applications with strong control over user input
  Knowledge of the spoken text can improve system performance
  Prompting may reduce the risk of impostors using voice recordings
Text-independent recognition
  The recognition system does not know the text spoken by the person
  Examples: user-selected phrase, conversational speech
  Used for applications with less control over user input
  More flexible system, but also a more difficult problem
  Speech recognition can provide knowledge of the spoken text
Speech is easily produced
It does not require advanced input devices
Can be applied using telephones and PCs
Can be supplemented with a password phrase or personal knowledge to improve security
Speaker verification
Which features?
How to model the speaker
How to model the impostors
How to make the decision so as to minimize the probability of error
[Figure labels: Ahmad, Salma, "Accepted!"]
No feature has all these attributes
Features derived from the spectrum of speech have proven to be the most effective in automatic systems
Typically: mel-frequency cepstral coefficients (MFCCs)
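To make the spectral features concrete, here is a minimal NumPy sketch of a common MFCC recipe: pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, and DCT. All parameter values (8 kHz sampling, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients, 0.97 pre-emphasis) are assumptions chosen for illustration, not values taken from these slides.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale: (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate=8000, frame_len=200, hop=80,
         n_fft=256, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of cepstral coefficients."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Log mel-filterbank energies; the small constant avoids log(0).
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # Unnormalized DCT-II decorrelates the log energies.
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n_filters) + 0.5) / n_filters)
    return log_mel @ basis.T


# One second of a synthetic 440 Hz tone stands in for real speech.
t = np.arange(8000) / 8000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)
```

Each row of `features` is one frame's feature vector; a speaker model is trained on sequences of such vectors.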
Speaker Models
Speaker models (voiceprints) represent the voice biometric in a compact and generalizable form
Modern speaker verification systems use Hidden Markov Models (HMMs)
HMMs are statistical models of how a speaker produces sounds
HMMs represent the underlying statistical variations within a speech state (e.g., a phoneme) and the temporal changes of speech between states
Fast training algorithms (expectation-maximization, EM) exist for HMMs, with guaranteed convergence properties
Example state sequence: h-a-d
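How an HMM assigns a likelihood to an utterance can be illustrated with the forward algorithm. The sketch below is a simplification under stated assumptions: features are one-dimensional, each state emits from a single Gaussian, and all model parameters and observations are made-up values. Real systems use multi-dimensional features and richer emission densities.

```python
import math
from typing import List, Sequence


def safe_log(p: float) -> float:
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs: Sequence[float], log_init: Sequence[float],
                       log_trans: Sequence[Sequence[float]],
                       means: Sequence[float], variances: Sequence[float]) -> float:
    """Forward algorithm in the log domain: returns log p(obs | model)."""
    n = len(means)
    alpha = [log_init[j] + log_gauss(obs[0], means[j], variances[j]) for j in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


# Hypothetical two-state left-to-right model: start in state 0, never go back.
log_init = [safe_log(1.0), safe_log(0.0)]
log_trans: List[List[float]] = [[safe_log(0.5), safe_log(0.5)],
                                [safe_log(0.0), safe_log(1.0)]]
obs = [0.1, -0.2, 2.9, 3.1]

ll_a = hmm_log_likelihood(obs, log_init, log_trans, means=[0.0, 3.0], variances=[1.0, 1.0])
ll_b = hmm_log_likelihood(obs, log_init, log_trans, means=[5.0, 8.0], variances=[1.0, 1.0])
print(ll_a > ll_b)  # the model whose state means match the data scores higher
```

Training (EM) would adjust the means, variances, and transition probabilities to raise this likelihood on a speaker's enrollment data; that step is not shown here.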
Speaker Models
The form of the HMM depends on the application
Fixed phrase: word/phrase models (e.g., "Open Sesame")
Text-independent: general speech
Verification Decision
The decision is a 2-class hypothesis test
H0: the speaker is an impostor
H1: the speaker is indeed the claimed speaker
Statistic computed on test utterance S as a log-likelihood ratio:
$$\Lambda(S) = \log \frac{\text{Likelihood } S \text{ came from speaker HMM}}{\text{Likelihood } S \text{ did not come from speaker HMM}}$$
[Figure: Feature extraction feeds a speaker model (+) and an impostor model (-); their combined output drives the decision.]
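The likelihood-ratio test can be sketched as follows. As a deliberate simplification, a single 1-D Gaussian stands in for each of the speaker and impostor models, scores are averaged per frame, and the threshold of 0 is an arbitrary assumption; all numeric values are hypothetical.

```python
import math
from typing import Sequence, Tuple

Gaussian = Tuple[float, float]  # (mean, variance), a stand-in for a full model


def avg_log_likelihood(frames: Sequence[float], model: Gaussian) -> float:
    """Average per-frame log-likelihood of the frames under the model."""
    mean, var = model
    total = sum(-0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
                for x in frames)
    return total / len(frames)


def llr_score(frames: Sequence[float], speaker: Gaussian, impostor: Gaussian) -> float:
    """Log-likelihood ratio: log p(S | speaker) - log p(S | impostor), per frame."""
    return avg_log_likelihood(frames, speaker) - avg_log_likelihood(frames, impostor)


def decide(score: float, threshold: float = 0.0) -> str:
    """Accept the identity claim when the ratio favors the speaker model."""
    return "accept" if score >= threshold else "reject"


# Hypothetical models and test utterances.
speaker_model = (1.0, 0.5)
impostor_model = (0.0, 2.0)
genuine_frames = [0.9, 1.2, 1.1, 0.8]
impostor_frames = [-1.0, -0.5, 0.2, -1.4]

print(decide(llr_score(genuine_frames, speaker_model, impostor_model)))
print(decide(llr_score(impostor_frames, speaker_model, impostor_model)))
```

Raising the threshold makes false acceptances rarer at the cost of more false rejections; lowering it does the opposite.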
DET curve (detection error trade-off): miss probability versus false-alarm probability
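The points of a DET curve come from sweeping the decision threshold over a set of trial scores. The sketch below is a minimal illustration that assumes two lists of verification scores, one from true-speaker trials and one from impostor trials; the scores are made-up, and the equal error rate (EER) is approximated at the threshold where the two error rates are closest.

```python
from typing import List, Sequence, Tuple


def error_rates(target_scores: Sequence[float], impostor_scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (miss rate, false-alarm rate) when scores >= threshold are accepted."""
    miss = sum(s < threshold for s in target_scores) / len(target_scores)
    false_alarm = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return miss, false_alarm


def det_points(target_scores: Sequence[float],
               impostor_scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """One (threshold, miss rate, false-alarm rate) point per observed score."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    return [(t, *error_rates(target_scores, impostor_scores, t)) for t in thresholds]


def equal_error_rate(target_scores: Sequence[float],
                     impostor_scores: Sequence[float]) -> float:
    """Approximate EER: mean of the two rates where they are closest."""
    _, miss, false_alarm = min(det_points(target_scores, impostor_scores),
                               key=lambda p: abs(p[1] - p[2]))
    return (miss + false_alarm) / 2.0


# Hypothetical scores from true-speaker and impostor trials.
targets = [2.0, 1.5, 0.5, 3.0]
impostors = [-1.0, 0.0, 1.0, -2.0]

for threshold, miss, false_alarm in det_points(targets, impostors):
    print(f"threshold={threshold:5.1f}  miss={miss:.2f}  false alarm={false_alarm:.2f}")
print("EER:", equal_error_rate(targets, impostors))
```

A DET plot conventionally draws these points with both axes on a normal-deviate scale, which is not done here.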
Applications
Transaction authentication
  Toll fraud prevention
  Telephone credit card purchases
  Telephone brokerage (e.g., stock trading)
Access control
  Physical facilities
  Computers and data networks
Monitoring
  Remote time and attendance logging
  Home parole verification
  Prison telephone usage
Information retrieval
  Customer information for call centers
  Audio indexing (speech skimming device)
Forensics
  Voice sample matching (e.g., a recorded threat compared against a suspect's voice)
Future Directions
Research will focus on using speaker recognition for more unconstrained, uncontrolled situations
Audio search and retrieval
Increasing robustness to channel variability
Incorporating higher levels of knowledge into decisions