Anuran Calls (MFCCs)
Donated on 2/23/2017
Acoustic features extracted from syllables of anuran (frog) calls, including family, genus, and species labels (multilabel).
Dataset Characteristics
Multivariate
Subject Area
Biology
Associated Tasks
Classification, Clustering
Feature Type
Real
# Instances
7195
# Features
-
Dataset Information
Additional Information
This dataset was used in several classification tasks related to the challenge of recognizing anuran species through their calls. It is a multilabel dataset with three label columns: family, genus, and species. It was created by segmenting 60 audio records belonging to 4 families, 8 genera, and 10 species. Each audio record corresponds to one specimen (an individual frog), and the record ID is included as an extra column.
We used spectral entropy and a binary clustering method to detect the audio frames belonging to each syllable. The segmentation and feature extraction were carried out in Matlab. Segmentation yielded 7195 syllables, which became the instances for training and testing the classifier.
The records were collected in situ under real noise conditions (background sound). Some species are from the campus of the Federal University of Amazonas, Manaus; others are from Mata Atlântica, Brazil; and one is from Córdoba, Argentina. The recordings were stored in WAV format with a 44.1 kHz sampling frequency and 32-bit resolution, which allows us to analyze signals up to 22 kHz.
From every extracted syllable, 22 MFCCs were calculated using 44 triangular filters. These coefficients were normalized so that -1 ≤ mfcc ≤ 1.
The number of instances per class is:
Family | Instances |
---|---|
Bufonidae | 68 |
Dendrobatidae | 542 |
Hylidae | 2165 |
Leptodactylidae | 4420 |
Genus | Instances |
---|---|
Adenomera | 4150 |
Ameerega | 542 |
Dendropsophus | 310 |
Hypsiboas | 1593 |
Leptodactylus | 270 |
Osteocephalus | 114 |
Rhinella | 68 |
Scinax | 148 |
Species | Instances |
---|---|
AdenomeraAndre | 672 |
AdenomeraHylaedact… | 3478 |
Ameeregatrivittata | 542 |
HylaMinuta | 310 |
HypsiboasCinerascens | 472 |
HypsiboasCordobae | 1121 |
LeptodactylusFuscus | 270 |
OsteocephalusOopha… | 114 |
Rhinellagranulosa | 68 |
ScinaxRuber | 148 |
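The class counts listed above are strongly imbalanced, which matters when judging classifier accuracy. The following is a minimal sketch, not part of the original dataset page, that checks each label level sums to the 7195 instances and reports the share covered by the largest class (the accuracy a trivial majority-class predictor would reach). The names `COUNTS`, `TOTAL_INSTANCES`, and `majority_baseline` are illustrative assumptions; the numbers are copied from the description, and truncated species names are kept as they appear there.

```python
# Class counts copied from the dataset description.
# Species names ending in "…" are truncated in the source and kept as-is.
COUNTS = {
    "Family": {
        "Bufonidae": 68,
        "Dendrobatidae": 542,
        "Hylidae": 2165,
        "Leptodactylidae": 4420,
    },
    "Genus": {
        "Adenomera": 4150,
        "Ameerega": 542,
        "Dendropsophus": 310,
        "Hypsiboas": 1593,
        "Leptodactylus": 270,
        "Osteocephalus": 114,
        "Rhinella": 68,
        "Scinax": 148,
    },
    "Species": {
        "AdenomeraAndre": 672,
        "AdenomeraHylaedact…": 3478,
        "Ameeregatrivittata": 542,
        "HylaMinuta": 310,
        "HypsiboasCinerascens": 472,
        "HypsiboasCordobae": 1121,
        "LeptodactylusFuscus": 270,
        "OsteocephalusOopha…": 114,
        "Rhinellagranulosa": 68,
        "ScinaxRuber": 148,
    },
}

TOTAL_INSTANCES = 7195


def majority_baseline(counts):
    """Return (largest class, share of all instances it covers)."""
    label = max(counts, key=counts.get)
    return label, counts[label] / sum(counts.values())


for level, counts in COUNTS.items():
    # Every syllable has one label per level, so each level must sum to the total.
    assert sum(counts.values()) == TOTAL_INSTANCES, level
    label, share = majority_baseline(counts)
    print(f"{level}: {len(counts)} classes, majority = {label} ({share:.1%})")
```

Any accuracy reported on this dataset is easier to interpret next to these baseline shares, since a model can score well at the family level simply by favoring the largest class.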
Has Missing Values?
No
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
Additional Variable Information
Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up a mel-frequency cepstrum (MFC). Because each syllable has a different length, every row (i) was normalized according to MFCCs_i / max(abs(MFCCs_i)).
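The row-wise normalization described here can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' Matlab code: the function name `normalize_row`, the handling of an all-zero row, and the sample values are hypothetical.

```python
def normalize_row(mfccs):
    """Scale one row of coefficients by its largest absolute value."""
    peak = max(abs(c) for c in mfccs)
    if peak == 0:
        # Assumption: an all-zero row is returned unchanged to avoid dividing by zero.
        return list(mfccs)
    return [c / peak for c in mfccs]


# Hypothetical raw coefficients for one syllable (not taken from the dataset).
raw = [12.0, -30.0, 7.5, 0.0]
scaled = normalize_row(raw)
print(scaled)  # [0.4, -1.0, 0.25, 0.0]
```

After this scaling, the coefficient with the largest magnitude in each row becomes exactly 1 or -1, which is consistent with the stated range -1 ≤ mfcc ≤ 1.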
Dataset Files
File | Size |
---|---|
Frogs_MFCCs.csv | 3 MB |
Readme.txt | 5.1 KB |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
anuran_calls_mfccs = fetch_ucirepo(id=406)

# data (as pandas dataframes)
X = anuran_calls_mfccs.data.features
y = anuran_calls_mfccs.data.targets

# metadata
print(anuran_calls_mfccs.metadata)

# variable information
print(anuran_calls_mfccs.variables)
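The description notes that each audio record corresponds to one specimen and that the record ID is included as an extra column. One possible use of that column, sketched below as an assumption rather than the authors' evaluation protocol, is to keep all syllables from the same recording on one side of a train/test split so that a classifier is not tested on an individual it has already heard. The helper `split_by_record` and the sample IDs are hypothetical; it works on a plain list of record IDs, and which dataframe column holds them after `fetch_ucirepo` is not confirmed here.

```python
import random


def split_by_record(record_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no record ID appears in both."""
    unique_ids = sorted(set(record_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    # Hold out at least one whole recording for testing.
    n_test = max(1, round(len(unique_ids) * test_fraction))
    test_ids = set(unique_ids[:n_test])
    train_idx = [i for i, r in enumerate(record_ids) if r not in test_ids]
    test_idx = [i for i, r in enumerate(record_ids) if r in test_ids]
    return train_idx, test_idx


# Hypothetical record IDs for eight syllables from four recordings.
record_ids = [1, 1, 2, 2, 2, 3, 4, 4]
train_idx, test_idx = split_by_record(record_ids)

train_records = {record_ids[i] for i in train_idx}
test_records = {record_ids[i] for i in test_idx}
print("train recordings:", sorted(train_records))
print("test recordings:", sorted(test_records))
```

The split is made over recordings rather than syllables, so the resulting test share of syllables can differ from `test_fraction` when recordings contain very different numbers of syllables.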
Colonna, J., Nakamura, E., Cristo, M., & Gordo, M. (2015). Anuran Calls (MFCCs) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5CC9H.
Creators
Juan Colonna
Eduardo Nakamura
Marco Cristo
Marcelo Gordo
DOI
10.24432/C5CC9H
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the dataset for any purpose, provided that appropriate credit is given.