An Overview of Audio Event Detection Methods From Feature Extraction To Classification

Abstract
With substantial attention being given to Audio Event Detection (AED) in various
types of applications, scientists are now more motivated to perform extensive
research on AED and its support for many practical applications. Feature
extraction and classification for AED are quite challenging because of the many
features present in audio signals. Several AED works focus on only a few
highlighted acoustic events, and none of them surveys the state of the art of audio
event detection. Hence, this paper differs from previous efforts in terms of
comprehensiveness, emphasis and timeliness. This review starts with the
fundamentals of AED systems, covering preprocessing, feature extraction and
classification methods. We also extend this research with a critical comparison of
audio detection methods and algorithms in terms of accuracy and false alarm rate
on different types of data sets.



Audio Event Detection (AED) aims to detect different types of audio signals, such
as speech and non-speech, within a long and unstructured stream of audio. AED can
be considered a new area of research with the ambitious goal of replacing
traditional surveillance systems (TSS) with intelligent surveillance systems (ISS).
Traditional systems require the regions of interest (ROIs), equipped with cameras,
microphones or other kinds of sensors, to be constantly monitored by human
operators, and they record audio data to a multimedia dataset. A multimedia
dataset often consists of millions of audio clips, including environmental sounds,
speech, music and other non-speech utterances, which can be used in AED. Feature
extraction and audio classification are the basis of most AED-related research
fields and applications. They are significant tasks in many approaches developed
across numerous areas and environments, including the detection of abnormal events
(e.g., gunshots) for security [1], speech recognition [2-4], speaker recognition
[5, 6], animal vocalization [7-10], home care applications [11], medical diagnostic
problems [12], bioacoustics monitoring [8, 9], sport events [13-15], fault and
failure detection in complex industrial systems [16] and several others. The
performance of an AED system, in terms of its complexity, classification accuracy
and false alarm rate, depends heavily on the extracted audio features and on the
classifiers [17, 18].
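To make the idea of detecting events within a long, unstructured stream concrete, the following is a minimal sketch of one very simple baseline, assuming mono samples in a Python list: frames whose short-time energy exceeds a fixed threshold are marked active, and each run of active frames is reported as one event. This is an illustrative assumption, not a method from the works cited above; the function names `frame_energies` and `detect_events`, the frame length, hop size and threshold are all hypothetical, and practical AED systems replace the energy rule with the feature extraction and classification stages discussed next.

```python
import math


def frame_energies(signal, frame_len, hop):
    """Short-time energy (mean squared amplitude) of each analysis frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len]) / frame_len
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def detect_events(signal, sample_rate, frame_len=400, hop=200, threshold=0.01):
    """Return (onset_s, offset_s) pairs for runs of frames whose energy exceeds threshold.

    Hypothetical defaults: 50 ms frames with a 25 ms hop at 8 kHz and a fixed
    energy threshold; a practical system would tune or learn these.
    """
    energies = frame_energies(signal, frame_len, hop)

    def to_seconds(first, last):
        # An event spans from the start of its first frame to the end of its last frame.
        return first * hop / sample_rate, (last * hop + frame_len) / sample_rate

    events, onset = [], None
    for i, energy in enumerate(energies):
        if energy > threshold and onset is None:
            onset = i  # first active frame of a new event
        elif energy <= threshold and onset is not None:
            events.append(to_seconds(onset, i - 1))
            onset = None
    if onset is not None:  # event still active at the end of the stream
        events.append(to_seconds(onset, len(energies) - 1))
    return events
```

On a synthetic 8 kHz stream made of 1 s of silence, a 0.5 s 440 Hz tone and another 1 s of silence, `detect_events` reports a single event from 0.975 s to 1.525 s; the boundaries are slightly wider than the tone itself because frames that only partly overlap it still exceed the threshold.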
Feature extraction is one of the most significant factors in audio signal
processing [19]. An audio signal comprises many features, not all of which are
essential for audio processing. All classification systems employ a set of features
extracted from the input audio signal, where each feature represents one element
of a vector in the feature space. Accordingly, many different audio classification
methods, evaluated on the basis of system performance, have been proposed. These
approaches mostly differ from each other in the choice of classifier or in the
number of acoustic features involved. In terms of decomposition, the extracted
features are classified into temporal, spectral and perceptual features.
Audio classification is another major stage and a key issue in audio signal
processing and pattern recognition, with possible applications in audio detection,
documentation and event analysis. Audio classification is the ability to correctly
assign a selected feature vector to its corresponding class. Classification
approaches include manual classification, which is time consuming, as well as the
supervised, unsupervised and semi-supervised learning algorithms that have been
employed in its place.
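The two stages described above, building a feature vector and then assigning it to a class, can be illustrated with a small pipeline. This is a sketch under stated assumptions, not the method of any surveyed paper: each frame is reduced to a three-element vector holding two temporal features (short-time energy and zero-crossing rate) and one spectral feature (spectral centroid), and a nearest-centroid rule stands in for a supervised classifier. The feature choice, the z-score scaling, the `NearestCentroid` class and all other names are hypothetical; perceptually motivated features such as MFCCs are omitted for brevity, and the naive DFT is only practical for short frames.

```python
import cmath
import math


def short_time_energy(frame):
    """Temporal feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Temporal feature: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency in Hz (naive DFT, bins 0..N/2)."""
    n = len(frame)
    mags = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: centroid undefined, report 0 Hz
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def feature_vector(frame, sample_rate):
    """One point in a 3-D feature space: [energy, ZCR, spectral centroid]."""
    return [short_time_energy(frame), zero_crossing_rate(frame),
            spectral_centroid(frame, sample_rate)]


class NearestCentroid:
    """Toy supervised classifier: z-score the features, then pick the closest class mean."""

    def fit(self, vectors, labels):
        dims = range(len(vectors[0]))
        count = len(vectors)
        self.mean = [sum(v[d] for v in vectors) / count for d in dims]
        # Population standard deviation per dimension; fall back to 1.0 for constant features.
        self.std = [
            math.sqrt(sum((v[d] - self.mean[d]) ** 2 for v in vectors) / count) or 1.0
            for d in dims
        ]
        grouped = {}
        for v, label in zip(vectors, labels):
            grouped.setdefault(label, []).append(self._scale(v))
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def _scale(self, v):
        return [(x - m) / s for x, m, s in zip(v, self.mean, self.std)]

    def predict(self, v):
        z = self._scale(v)
        return min(self.centroids, key=lambda label: math.dist(z, self.centroids[label]))
```

Scaling matters in this design: without z-scoring, the spectral centroid (measured in hertz) would dominate the Euclidean distance, while the energy and zero-crossing rate are much smaller numbers. Swapping `NearestCentroid` for another supervised learner would leave the feature stage unchanged.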
A number of issues relating to feature extraction and classification have been
reviewed in the existing literature. Lu [20] surveyed time- and frequency-domain
features. In speaker recognition, Kinnunen reviewed features and speaker modeling
[21], and in another review [22] he covered feature types and the best-known
clustering algorithms in terms of accuracy. Prakash and Nithya [23] surveyed all
aspects of semi-supervised learning algorithms. Bhavsar and Ganatra [24] compared
machine learning classification algorithms in terms of speed, accuracy,
scalability and other issues. These reviews have in turn helped other researchers
study the existing algorithms and develop innovative algorithms for applications or
requirements that were not previously addressed.
Although hundreds of audio event detection methods have been proposed in various
fields, only a few extensive studies actually survey or compare them. Most works in
AED focus on a few highlighted acoustic events, and none of them covers the state
of the art of AED. Our work differs from previous efforts in terms of emphasis,
timeliness and comprehensiveness. The need for detailed and comprehensive studies
of the vital aspects of AED methods has led researchers to conduct reviews of AED
classification methods and algorithms. The goal of this survey is to highlight the
classification issues and challenges in AED methods and to analyze audio event
detection methods and algorithms from various perspectives. Furthermore, we present
a comparative study based on key attributes such as accuracy, false alarm rate,
precision and recall. The study covers the most recent advances in this area and
identifies future research trends, which should benefit both general and expert
readers.
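The comparison attributes named above can be computed from frame- or segment-level decisions, as in the following minimal sketch. It assumes binary labels (1 = event present, 0 = event absent) and takes the false alarm rate to be the false positive rate FP / (FP + TN); this definition is an assumption, because AED papers also use other normalizations (for example, false alarms per hour of audio), so the definition used by each reviewed work should be checked before numbers are compared. The names `score` and `DetectionScores` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionScores:
    accuracy: float
    precision: float
    recall: float
    false_alarm_rate: float


def _ratio(num, den):
    """Safe division: 0.0 when the denominator is empty (e.g. no predicted events)."""
    return num / den if den else 0.0


def score(y_true, y_pred):
    """Binary frame/segment scores; 1 = event present, 0 = event absent."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # detected events
    tn = sum(1 for t, p in pairs if not t and not p)  # correctly rejected non-events
    fp = sum(1 for t, p in pairs if not t and p)      # false alarms
    fn = sum(1 for t, p in pairs if t and not p)      # missed events
    return DetectionScores(
        accuracy=_ratio(tp + tn, len(pairs)),
        precision=_ratio(tp, tp + fp),
        recall=_ratio(tp, tp + fn),
        false_alarm_rate=_ratio(fp, fp + tn),  # assumed definition: false positive rate
    )
```

For `y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]` and `y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]` there are 3 true positives, 5 true negatives, 1 false positive and 1 false negative, giving an accuracy of 0.8, precision and recall of 0.75, and a false alarm rate of 1/6 (about 0.167).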
This review paper is organized as follows. Section 2 discusses the research
methodology. An overview of preprocessing, feature extraction and classification
methods is given in Section 3. Section 4 discusses the evaluation and performance
of classification methods in terms of accuracy rate, and compares the techniques
and their accuracy based on the reviewed articles. Finally, Section 5 presents some
final remarks about this review.

You might also like