Automatic Speaker Verification

Zouhir Wakaf, PhD
Outline
- Introduction
- Speaker identification vs. verification
- Speaker verification overview
- The parts of a speaker verification system
- Evaluation of speaker verification performance
- Applications
- Future directions
Introduction
Extracting Information from Speech
Goal: automatically extract the information transmitted in the speech signal
Speech signal -> Speech recognition -> Words (e.g., "How are you?")
Speech signal -> Speaker recognition -> Speaker identity (e.g., "Dr. Ahmad")
Speaker identification
- Determine the speaker's identity by selecting among a set of known voices
- The user does not claim an identity
- Closed-set identification: all speakers are assumed to be known to the system
- Open-set identification: the speaker may not be among the speakers known to the system
Whose voice is this?
Speaker Verification
- Synonyms: authentication, detection
- The user claims an identity
- System task: accept or reject the identity claim
- The voice can come from outside the set of known speakers
- All speakers known: closed set
- Impostors: all voices other than the claimed (true) identity
Is this Ahmad's voice?
Identification vs verification
Figure (identification): speech -> feature extraction -> compare against Speaker 1, Speaker 2, ..., Speaker N models -> decision -> speaker ID
Figure (verification): speech -> feature extraction -> speaker model score (+) minus impostor model score (-) -> decision: accept if above threshold, reject if below
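The two decision rules in the figure can be sketched as follows. This is a minimal sketch, assuming each model has already produced a score for the test utterance (for example an average log-likelihood, where higher means a better match); the function names, the speaker "Omar", and all score and threshold values are hypothetical.

```python
def identify_closed_set(scores):
    """Closed-set identification: return the enrolled speaker with the highest score."""
    return max(scores, key=scores.get)


def identify_open_set(scores, threshold):
    """Open-set identification: return the best match, or None if no model scores at least `threshold`."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def verify(speaker_score, impostor_score, threshold):
    """Verification: accept the claim only if speaker score minus impostor score exceeds `threshold`."""
    return (speaker_score - impostor_score) > threshold


# Hypothetical scores for one test utterance
scores = {"Ahmad": -41.2, "Salma": -38.5, "Omar": -45.0}
print(identify_closed_set(scores))          # Salma
print(identify_open_set(scores, -35.0))     # None (best score -38.5 is below the threshold)
print(verify(scores["Salma"], -40.0, 0.5))  # True (difference 1.5 > 0.5)
```

Identification compares the utterance against all N models, whereas verification only compares the claimed speaker's model against the impostor model, so its cost does not grow with the number of enrolled speakers.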
Speech Modalities
The application dictates which speech modality is used:
Text-dependent recognition
- The recognition system knows the text spoken by the person
- Examples: fixed phrase, prompted phrase
- Used for applications with strong control over user input
- Knowledge of the spoken text can improve system performance
- Prompting may reduce the risk of impostors using voice recordings
Text-independent recognition
- The recognition system does not know the text spoken by the person
- Examples: user-selected phrase, conversational speech
- Used for applications with less control over user input
- A more flexible system, but also a more difficult problem
- Speech recognition can provide knowledge of the spoken text
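The anti-replay idea behind prompted phrases can be sketched as follows: the system generates an unpredictable prompt and only proceeds to speaker scoring if the spoken text matches it. This is a minimal sketch; the transcript is assumed to come from a speech recognizer that is not shown, and the digit-string prompt format and the function names are illustrative assumptions.

```python
import secrets


def make_prompt(n_digits=6):
    """Generate an unpredictable digit string (e.g. "4 0 7 7 1 9") for the user to read aloud."""
    return " ".join(str(secrets.randbelow(10)) for _ in range(n_digits))


def normalize(text):
    """Lower-case and drop whitespace so that "4 7 1" and "471" compare equal."""
    return "".join(text.lower().split())


def prompt_matches(prompt, transcript):
    """True if the (assumed) speech-recognizer transcript reproduces the prompt."""
    return normalize(prompt) == normalize(transcript)


print("Please say:", make_prompt())
# A replayed recording of an earlier session would almost never contain the new digits.
print(prompt_matches("4 7 1", "471"))    # True
print(prompt_matches("4 7 1", "4 7 2"))  # False
```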
Speech for Identification

- Speech is easily produced
- It does not require advanced input devices
- It can be captured using telephones and PCs
- It can be combined with a password phrase or personal knowledge to improve security
Speaker verification
- Which features?
- How to model the speaker?
- How to model the impostors?
- How to make the decision so as to minimize the probability of error?
Phases of a Speaker Verification System

Any speaker verification system has two distinct phases:
- Enrolment phase: enrolment speech from each speaker (e.g., Ahmad, Salma) -> model training -> a voiceprint (model) for each speaker
- Verification phase: test speech with a claimed identity (e.g., Salma) -> verification decision against that speaker's voiceprint -> e.g., "Accepted!"
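The two phases can be sketched end to end as follows. This is a minimal sketch under simplifying assumptions: each voiceprint is a single diagonal Gaussian over feature vectors (a stand-in for the HMM/GMM models used in practice), the impostor model is fitted to pooled speech from all enrolled speakers, and the class name, method names, and default threshold of 0 are hypothetical.

```python
import math


def fit_diag_gaussian(frames, var_floor=1e-3):
    """Estimate a per-dimension mean and (floored) variance from a list of feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor) for d in range(dim)]
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


class SpeakerVerifier:
    """Hypothetical two-phase verifier: enrolment stores voiceprints, verification scores a claim."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold
        self.voiceprints = {}
        self.impostor_model = None

    # Enrolment phase: train one voiceprint per speaker
    def enrol(self, speaker, frames):
        self.voiceprints[speaker] = fit_diag_gaussian(frames)

    def build_impostor_model(self, pooled_frames):
        self.impostor_model = fit_diag_gaussian(pooled_frames)

    # Verification phase: compare the claimed speaker's model with the impostor model
    def verify(self, claimed_id, frames):
        if self.impostor_model is None:
            raise ValueError("build_impostor_model() must be called first")
        score = (avg_log_likelihood(frames, self.voiceprints[claimed_id])
                 - avg_log_likelihood(frames, self.impostor_model))
        return score > self.threshold, score
```

After `enrol("Salma", frames)` for each speaker and `build_impostor_model(pooled_frames)`, a call such as `verify("Salma", test_frames)` returns the accept/reject decision together with the score, so the threshold can be tuned later without re-scoring.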
Features for Speaker Recognition

Humans use several levels of perceptual cues for speaker recognition
- There are no exclusive speaker identity cues
- Low-level acoustic cues (physical traits) are the most applicable for automatic systems
Desirable attributes of features for an automatic system:
- Practical: occur naturally and frequently in speech; easily measurable
- Robust: do not change over time or be affected by the speaker's health; are not affected by reasonable background noise nor depend on specific transmission characteristics
- Secure: are not subject to mimicry
Features for Speaker Recognition
- No feature has all these attributes
- Features derived from the spectrum of speech have proven to be the most effective in automatic systems
- Typically: mel-frequency cepstral coefficients (MFCCs)
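A common MFCC recipe is pre-emphasis, framing and windowing, power spectrum, a mel-spaced triangular filterbank, logarithm, and a DCT. The sketch below follows that recipe under assumed, typical settings (16 kHz audio, 25 ms frames with a 10 ms hop, 26 filters, 13 coefficients). It uses a naive DFT to stay dependency-free, so it is far too slow for real use, and it omits refinements such as liftering, delta features, and channel normalization.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale over FFT bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    fbank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):   # rising edge
            filt[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            filt[k] = (right - k) / max(right - center, 1)
        fbank.append(filt)
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13, preemph=0.97):
    # Pre-emphasis boosts high frequencies
    x = [signal[0]] + [signal[i] - preemph * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    n_bins = n_fft // 2 + 1
    features = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        # Power spectrum via a naive DFT of the zero-padded frame
        power = []
        for k in range(n_bins):
            re = sum(frame[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(frame_len))
            im = -sum(frame[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(frame_len))
            power.append((re * re + im * im) / n_fft)
        log_energies = [math.log(max(sum(w * p for w, p in zip(filt, power)), 1e-10))
                        for filt in fbank]
        # DCT-II decorrelates the log filterbank energies; keep the first n_ceps coefficients
        ceps = [sum(log_energies[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                    for m in range(n_filters)) for c in range(n_ceps)]
        features.append(ceps)
    return features
```

Each returned row is one 13-dimensional feature vector per 10 ms hop; a production system would replace the naive DFT with an FFT.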
Speaker Models
- Speaker models (voiceprints) represent the voice biometric in a compact and generalizable form
- Speaker verification systems commonly use Hidden Markov Models (HMMs)
- HMMs are statistical models of how a speaker produces sounds
- HMMs represent the underlying statistical variations within each speech state (e.g., a phoneme) and the temporal changes of speech between states
- Fast training algorithms (EM, i.e., Baum-Welch) exist for HMMs, with guaranteed convergence to a local maximum of the likelihood
Figure: HMM with states for the sounds h-a-d
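The likelihood an HMM assigns to an utterance, which the verification score is built from, is computed with the forward algorithm; the same forward pass is also the first half of Baum-Welch (EM) training. The sketch below works in the log domain to avoid numerical underflow on long utterances. It assumes the per-frame state log-likelihoods (`log_emis`) have already been computed, for example from Gaussian or GMM output densities; the function names are illustrative.

```python
import math


def logsumexp(values):
    """log(sum(exp(v))) computed without overflow or underflow."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(log_init, log_trans, log_emis):
    """
    log_init[i]     : log P(state i at t = 0)
    log_trans[i][j] : log P(state j at t + 1 | state i at t)
    log_emis[t][i]  : log p(frame t | state i)
    Returns log p(all frames | HMM), summed over every state path.
    """
    n_states = len(log_init)
    alpha = [log_init[i] + log_emis[0][i] for i in range(n_states)]
    for t in range(1, len(log_emis)):
        alpha = [logsumexp([alpha[i] + log_trans[i][j] for i in range(n_states)]) + log_emis[t][j]
                 for j in range(n_states)]
    return logsumexp(alpha)


# A single-state HMM reduces to the sum of per-frame log-likelihoods
print(forward_log_likelihood([0.0], [[0.0]], [[-1.0], [-2.0], [-3.0]]))  # -6.0
```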
Speaker Models
The form of the HMM depends on the application:
- Fixed phrase (e.g., "Open sesame"): word/phrase models
- Prompted phrases/passwords: phoneme models (e.g., /s/, /i/, /x/)
- Text-independent (general speech): single-state HMM
Text-independent speaker verification

- The impostor model is built using speech from all speakers: a GMM with a high number of mixture components
- The speaker model is built from it by speaker adaptation, which requires a relatively small amount of speech
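Deriving a speaker model from the impostor (background) GMM by adaptation is commonly done with MAP adaptation of the component means. The sketch below shows mean-only MAP adaptation for a diagonal-covariance GMM; the relevance factor of 16 is a commonly used value assumed here, not one given in these slides, and the function names are illustrative. Components that the speaker's few frames barely touch keep means close to the background model, which is why a relatively small amount of speech suffices.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of vector x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def component_posteriors(x, weights, means, variances):
    """P(component k | x) for each GMM component, computed stably in the log domain."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a background GMM towards one speaker's frames."""
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                      # soft frame counts n_k
    sums = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted sums of frames
    for x in frames:
        post = component_posteriors(x, weights, means, variances)
        for k in range(n_comp):
            counts[k] += post[k]
            for d in range(dim):
                sums[k][d] += post[k] * x[d]
    adapted = []
    for k in range(n_comp):
        # alpha_k tends to 1 with plenty of data for component k and to 0 with none
        alpha = counts[k] / (counts[k] + relevance)
        data_mean = [s / counts[k] for s in sums[k]] if counts[k] > 0 else means[k]
        adapted.append([alpha * e + (1 - alpha) * m for e, m in zip(data_mean, means[k])])
    return adapted
```

In this sketch the weights and variances are copied unchanged from the background model; together with the adapted means they form the speaker model, which is then scored against the background model.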
Verification Decision
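The decision is a two-class hypothesis test:
- H0: the speaker is an impostor
- H1: the speaker is indeed the claimed speaker
The statistic computed on a test utterance S is the log-likelihood ratio
$$\Lambda(S) = \log \frac{p(S \mid H_1)}{p(S \mid H_0)}$$
where the numerator is the likelihood that S came from the claimed speaker's HMM and the denominator is the likelihood that S did not come from it (evaluated with the impostor model). The claim is accepted if $\Lambda(S) > \theta$ for a decision threshold $\theta$ and rejected otherwise.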
Figure: speech -> feature extraction -> speaker model score (+) minus impostor model score (-) -> decision: accept if above threshold, reject if below
Verification Performance
Evaluating Speaker Verification Systems

There are many factors to consider in evaluating speaker verification systems
- Speech quality: channel and microphone characteristics; noise level and type; variability between enrolment and verification speech
- Speech modality: fixed, prompted, or user-selected phrases; free text
- Speech duration: duration and number of sessions of enrolment and verification speech
- Speaker population: size and composition
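DET curve
A detection error tradeoff (DET) curve plots the miss (false rejection) probability against the false-alarm (false acceptance) probability as the decision threshold varies.
The relative importance of the two error types depends on the application!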
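Points on a DET curve, and the equal error rate (EER) often quoted as a single summary, can be computed from the scores of true-speaker and impostor trials as sketched below. This is a minimal sketch assuming higher scores favour the claimed speaker and the accept rule "score > threshold"; the EER is approximated at the observed scores rather than interpolated, and the function names and scores are hypothetical.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Miss (false rejection) and false-alarm (false acceptance) rates when accepting iff score > threshold."""
    miss = sum(s <= threshold for s in target_scores) / len(target_scores)
    false_alarm = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    return miss, false_alarm


def det_points(target_scores, impostor_scores):
    """One (threshold, miss, false_alarm) point per distinct observed score."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    return [(t, *error_rates(target_scores, impostor_scores, t)) for t in thresholds]


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: the observed threshold where miss and false-alarm rates are closest."""
    t, miss, fa = min(det_points(target_scores, impostor_scores), key=lambda p: abs(p[1] - p[2]))
    return (miss + fa) / 2, t


# Hypothetical verification scores (e.g. log-likelihood ratios)
targets = [2.0, 3.0, 4.0, 5.0]    # true-speaker trials
impostors = [0.0, 1.0, 2.5, 3.5]  # impostor trials
print(equal_error_rate(targets, impostors))  # (0.25, 2.5)
```

A DET plot draws these (false alarm, miss) pairs on normal-deviate axes. Where false acceptance is costly, as in access control, the operating threshold would be raised above the EER point, trading more false rejections for fewer false acceptances.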
Applications
Transaction authentication
- Toll fraud prevention
- Telephone credit card purchases
- Telephone brokerage (e.g., stock trading)
Access control
- Physical facilities
- Computers and data networks
Monitoring
- Remote time and attendance logging
- Home parole verification
- Prison telephone usage
Information retrieval
- Customer information for call centers
- Audio indexing (speech skimming device)
Forensics
- Voice sample matching (e.g., a recorded threat vs. a suspect's voice)
Future Directions
Research will focus on using speaker recognition in more unconstrained, uncontrolled situations:
- Audio search and retrieval
- Increasing robustness to channel variability
- Incorporating higher levels of knowledge into decisions
Speaker recognition technology will become an integral part of speech interfaces:
- Personalization of services and devices
- Unobtrusive protection of transactions and information
