Data

19 pages, 9663 KiB

Open Access Data Descriptor

Two Datasets over South Tyrol and Tyrol Areas to Understand and Characterize Water Resource Dynamics in Mountain Regions

by Ludovica De Gregorio, Giovanni Cuozzo, Riccardo Barella, Francisco Corvalán, Felix Greifeneder, Peter Grosse, Abraham Mejia-Aguilar, Georg Niedrist, Valentina Premier, Paul Schattan, Alessandro Zandonai and Claudia Notarnicola

Data 2024, 9(11), 136; https://doi.org/10.3390/data9110136 - 16 Nov 2024

Abstract

In this work, we present two datasets for specific areas located on the Alpine arc that can be exploited to monitor and understand water resource dynamics in mountain regions. The idea is to provide the reader with information about the different sources of water supply across five defined test areas in the alpine environments of South Tyrol (Italy) and Tyrol (Austria). The snow cover fraction (SCF) and soil moisture content (SMC) datasets are derived from machine learning algorithms based on remote sensing data. Both SCF and SMC products are characterized by a spatial resolution of 20 m and are provided for the period from October 2020 to May 2023 (SCF) and from October 2019 to September 2022 (SMC), respectively, covering winter seasons for SCF and spring–summer seasons for SMC. For SCF maps, the validation with very high-resolution images shows high correlation coefficients of around 0.9. The SMC products were originally produced with an algorithm validated at a global scale, but here, to obtain more insights into the specific alpine mountain environment, the values estimated from the maps are compared with ground measurements of automatic stations located at different altitudes and characterized by different aspects in the Val Mazia catchment in South Tyrol (Italy). In this case, an MAE between 0.05 and 0.08 and an unbiased RMSE between 0.05 and 0.09 m³·m⁻³ were achieved. The datasets presented can be used as input for hydrological models and to hydrologically characterize the alpine study area starting from different sources of information.

(This article belongs to the Topic Techniques and Science Exploitations for Earth Observation and Planetary Exploration)
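
For readers less familiar with the validation metrics quoted in the abstract above, the following minimal sketch computes the MAE and the unbiased RMSE (the RMSE after the mean bias has been removed) for paired map and station values. The function names and the sample values are hypothetical and are not taken from the dataset or from the authors' code.

```python
import math


def mae(pred, obs):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def unbiased_rmse(pred, obs):
    """RMSE of the errors after subtracting their mean (the bias)."""
    n = len(obs)
    errors = [p - o for p, o in zip(pred, obs)]
    bias = sum(errors) / n
    return math.sqrt(sum((e - bias) ** 2 for e in errors) / n)


if __name__ == "__main__":
    # Hypothetical soil moisture values in m^3/m^3 (illustration only).
    map_values = [0.31, 0.22, 0.40, 0.18]
    station_values = [0.25, 0.20, 0.33, 0.21]
    print("MAE:", mae(map_values, station_values))
    print("ubRMSE:", unbiased_rmse(map_values, station_values))
```

Because the bias is subtracted first, a map that is offset from the stations by a constant amount has an unbiased RMSE of zero while its MAE equals the offset.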

8 pages, 489 KiB

Open Access Data Descriptor

Dataset to Quantify Spillover Effects Among Concurrent Green Initiatives

by Rong Zhang, Qi Zhang, Conghe Song and Li An

Data 2024, 9(11), 135; https://doi.org/10.3390/data9110135 - 13 Nov 2024

Abstract

Green initiatives are popular mechanisms globally to enhance environmental and human wellbeing. However, multiple green initiatives, when overlapping geographically and targeting the same participants, may interact with each other, giving rise to what is termed “spillover effects”, where one initiative and its outcomes influence another. This study examines the spillover effects among four major concurrent initiatives in the United States (U.S.) and China using a comprehensive dataset. In the U.S., we analysed county-level data in 2018 for the Conservation Reserve Program (CRP) and the Environmental Quality Incentives Program (EQIP), both operational for over 25 years. In China, data from Fanjingshan and Tianma National Nature Reserves (2014–2015) were used to evaluate the Grain-to-Green Program (GTGP) and the Forest Ecological Benefit Compensation (FEBC) program. The dataset comprises 3106 records for the U.S. and 711 plots for China, including several socio-economic variables. The results of multivariate linear regression indicate that there exist significant spillover effects between CRP & EQIP and GTGP & FEBC, with one initiative potentially enhancing or offsetting another’s impacts by 22% to 100%. This dataset provides valuable insights for researchers and policymakers to optimize the effectiveness and resilience of concurrent green initiatives.

11 pages, 456 KiB

Open Access Data Descriptor

The Design of a Script Identification Algorithm and Its Application in Constructing a Text Language Identification Dataset

by Mamtimin Qasim, Wushour Silamu and Minghui Qiu

Data 2024, 9(11), 134; https://doi.org/10.3390/data9110134 - 11 Nov 2024

Abstract

Script identification (SI) is easier to implement than language identification, and its identification rate is very high. The fewer languages a language identification algorithm has to distinguish, the higher its identification rate is. However, no systematic study on SI involving multiple languages and determining how to construct relevant language identification datasets has been conducted. Therefore, in this paper, we discuss and design a script identification algorithm and the construction of a language identification dataset based on script groups. The data sources in this paper comprise 261 different languages’ text corpora from the Leipzig Corpora Collection, which are grouped into 23 different script groups. In the Unicode encoding scheme, different scripts are arranged into different code regions. Based on this feature, we propose a written script identification algorithm based on regular expression matching, the micro F-score of which reaches 0.9929 in sentence-level script identification experiments. To reduce noise when constructing the language identification dataset for each script, a script identification algorithm is used to filter out other-script content in each text.

(This article belongs to the Section Information Systems and Data Management)
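
The abstract above describes script identification by regular expression matching over Unicode code regions. The following minimal sketch illustrates that general idea only: the five scripts, their rough code-point ranges, the majority-count rule, and the function names are assumptions made for illustration and do not reproduce the authors' algorithm or its 23 script groups.

```python
import re

# Rough Unicode ranges for a few scripts (illustrative, not exhaustive).
SCRIPT_PATTERNS = {
    "Latin": re.compile(r"[A-Za-z\u00C0-\u024F]"),
    "Greek": re.compile(r"[\u0370-\u03FF]"),
    "Cyrillic": re.compile(r"[\u0400-\u04FF]"),
    "Arabic": re.compile(r"[\u0600-\u06FF]"),
    "Han": re.compile(r"[\u4E00-\u9FFF]"),
}


def identify_script(text):
    """Return the script with the most matching characters, or None."""
    counts = {name: len(p.findall(text)) for name, p in SCRIPT_PATTERNS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def keep_sentences_in_script(sentences, script):
    """Drop sentences whose dominant script differs from the target script."""
    return [s for s in sentences if identify_script(s) == script]


if __name__ == "__main__":
    samples = ["Hello world", "Привет мир", "مرحبا", "1234"]
    for sample in samples:
        print(repr(sample), "->", identify_script(sample))
    print(keep_sentences_in_script(samples, "Latin"))
```

The second function mirrors the noise-reduction step mentioned in the abstract: sentences written mainly in another script are filtered out before a per-script language identification dataset is assembled.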

13 pages, 3724 KiB

Open Access Data Descriptor

Additions to Space Physics Data Facility and pysatNASA: Increasing Mars Global Surveyor and Mars Atmosphere and Volatile EvolutioN Dataset Utility

by Teresa M. Esman, Alexa J. Halford, Jeff Klenzing and Angeline G. Burrell

Data 2024, 9(11), 133; https://doi.org/10.3390/data9110133 - 8 Nov 2024

Abstract

The Space Physics Data Facility (SPDF) is a digital archive of space physics data and is useful for the storage, analysis, and dissemination of data. We discuss the process used to create an amended dataset and store it on the SPDF. The operational software that generates the archival data uses the open-source Python package pysat, and an end-user module has been added to the pysatNASA module. The result is the addition of data products to the Mars Global Surveyor (MGS) magnetometer (MAG) dataset, its archival location on SPDF, and pysat compatibility. The primary and metadata format increases the convenience and efficiency for users of the MGS MAG data. The storage of planetary and heliophysics data in one location supports the use of data throughout the solar system for comparison, while pysat compatibility enables loading data in an identical format for ease of processing. We encourage the use of the outlined process for past, present, and future space science missions of all sizes and funding levels, from balloons to Flagship-class missions.

15 pages, 996 KiB

Open Access Data Descriptor

The VNF Cybersecurity Dataset for Research (VNFCYBERDATA)

by Believe Ayodele and Victor Buttigieg

Data 2024, 9(11), 132; https://doi.org/10.3390/data9110132 - 8 Nov 2024

Abstract

Virtualisation has received widespread adoption and deployment across a wide range of enterprises and industries throughout the years. Network Function Virtualisation (NFV) is a technical concept that presents a method for dynamically delivering virtualised network functions as virtualised or software components. Virtualised Network Function (VNF) has distinct advantages, but it also faces serious security challenges. Cyberattacks such as Denial of Service (DoS), malware/rootkit injection, port scanning, and so on can target VNF appliances just like any other network infrastructure. To create exceptional training exercises for machine or deep learning (ML/DL) models to combat cyberattacks in VNF, a suitable dataset (VNFCYBERDATA) exhibiting an actual reflection, or one that is reasonably close to an actual reflection, of the problem that the ML/DL model could address is required. This article describes a real VNF dataset that contains over seven million data points and twenty-five cyberattacks generated from five VNF appliances. To facilitate a realistic examination of VNF traffic, the dataset includes both benign and malicious traffic.

7 pages, 1284 KiB

Open Access Data Descriptor

Influence of Temperature Variability on the Efficacy of Negative Ions in Removing Particulate Matter and Pollutants: An Experimental Database

by Paola M. Ortiz-Grisales, Leidy Gutiérrez-León and Carlos D. Zuluaga-Ríos

Data 2024, 9(11), 131; https://doi.org/10.3390/data9110131 - 8 Nov 2024

Abstract

Cities globally must make urgent decisions to ensure a sustainable future as rising pollution, particularly PM2.5, poses severe health risks like respiratory and heart diseases. PM2.5’s harmful composition also impacts vegetation and the environment. Immediate government intervention is necessary to mitigate these effects. This study tackles the urgent problem of reducing PM2.5 levels in Medellín’s urban and indoor environments, where pollution presents serious health risks. To explore effective solutions, this research provides new data on the interaction between particulate matter from various pollutants and negative ions under different temperature conditions, offering valuable insights into air quality improvement strategies. Using a high-voltage system, ions bind to pollutants, accelerating their removal. Experiments measured temperature, humidity, formaldehyde, volatile organic compounds, negative ions, and PM2.5 in a 40 cm³ chamber across various conditions. Pollutants tested included cigarette smoke, incense, charcoal, and gasoline at two voltage levels and three temperature ranges. The data, available in CSV format, were based on 36,000 samples and repeated tests for reliability. This resource is designed to support studies investigating particulate matter control in urban and indoor environments, as well as to improve our understanding of negative ion-based air purification processes. The data are publicly available and structured in formats compatible with leading data analysis platforms.

9 pages, 3118 KiB

Open Access Data Descriptor

Non-Destructive Wood Analysis Dataset: Comparing X-Ray and Terahertz Imaging Techniques

by Caroline Marc, Bertrand Marcon, Louis Denaud and Stéphane Girardon

Data 2024, 9(11), 130; https://doi.org/10.3390/data9110130 - 5 Nov 2024

Abstract

Wood density measurement plays a crucial role in assessing wood quality and predicting its mechanical performance. This dataset was collected to compare the accuracy and reliability of two non-destructive techniques, X-rays and terahertz waves, for measuring wood density. While X-rays have been commonly used in the industry due to their effectiveness, they pose health risks due to ionizing radiation. Terahertz waves, on the other hand, are non-ionizing and offer high spatial resolution. This article presents a database of wood sample measurements obtained using both techniques on the same 110 samples, with precisely located measuring points, covering a wide range of wood species (tropical and temperate ones) and densities, from 111 kg·m⁻³ to 1086 kg·m⁻³. The database includes X-ray and terahertz scans, sample dimensions, moisture content, and color photographs.

18 pages, 4493 KiB

Open Access Article

Data Hub for Life Cycle Assessment of Climate Change Solutions—Hydrogen Case Study

by Shiva Zargar, Miyuru Kannangara, Giovanna Gonzales-Calienes, Jianjun Yang, Jalil Shadbahr, Cyrille Decès-Petit and Farid Bensebaa

Data 2024, 9(11), 129; https://doi.org/10.3390/data9110129 - 5 Nov 2024

Abstract

Life cycle assessment, which evaluates the complete life cycle of a product, is considered the standard methodological framework to evaluate the environmental performance of climate change solutions. However, significant challenges exist related to datasets used to quantify these environmental indicators. Although extensive research and commercial data on climate change technologies, pathways, and facilities exist, they are not readily available to practitioners of life cycle assessment in the right format and structure using an open platform. In this study, we propose a new open data hub platform for life cycle assessment, considering a hierarchical data flow starting with raw data collected on climate change technologies at laboratory, pilot, demonstration, or commercial scales to provide the information required for policy and decision-making. This platform makes data accessible at multiple levels for practitioners of life cycle assessment, while making data interoperable across platforms. The proposed data hub platform and workflow are explained using hydrogen production by polymer electrolyte membrane electrolysis as a case study. A climate change impact of 1.17 ± 0.03 kg CO₂ eq./kg H₂ was calculated for the case study. The current data hub platform is limited to evaluating environmental impacts; however, future additions of economic and social aspects are envisaged.

(This article belongs to the Section Information Systems and Data Management)

11 pages, 1930 KiB

Open Access Data Descriptor

Towards a Datatset of Digitalized Historical German VET and CVET Regulations

by Thomas Reiser, Jens Dörpinghaus, Petra Steiner and Michael Tiemann

Data 2024, 9(11), 128; https://doi.org/10.3390/data9110128 - 3 Nov 2024

Abstract

The digitization of historical documents has gained particular interest in recent years in the digital humanities. The goal is to digitize historical documents by extracting and structuring text from scanned images. Here, we focus on the processing of historical German VET (vocational education and training) and CVET (continuing vocational education and training) regulations to support educational research. This dataset contains data from 1908 to the present and includes 2125 documents as PDF, 983 fully converted XML documents, and additional metadata for 7090 documents from the archive. We present an overview of the historical background and the challenges of processing different historical documents from three different federal states.

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-Learning and Education—3rd Edition)

8 pages, 535 KiB

Open Access Data Descriptor

Thermal Transmittance Limits Dataset for New and Existing Buildings Across EU Regulations

by Paolo Maria Congedo, Cristina Baglivo, Delia D’Agostino and Paola Maria Albanese

Data 2024, 9(11), 127; https://doi.org/10.3390/data9110127 - 31 Oct 2024

Abstract

Building energy regulations are essential for reducing energy consumption in the European Union (EU) and achieving climate neutrality goals. This data article supplements the “Overview of EU Building Envelope Energy Requirement for Climate Neutrality” by presenting a detailed dataset on building regulations across all 27 EU member states, with a focus on building envelope efficiency. The data include thermal transmittance limits for windows, walls, floors, and roofs, offering insights into regulatory differences and potential opportunities for harmonization. Information was sourced from the Energy Performance of Buildings Directive (EPBD) database, national reports, and scientific literature to ensure comprehensive coverage. Key aspects of each country’s regulations are summarized in tables, covering both new constructions and renovations. The inclusion of Köppen–Geiger climate classifications allows for climate-specific analyses, providing valuable context for researchers, policymakers, and construction professionals. This dataset enables comparative studies, helping to identify best practices and inform policy interventions aimed at enhancing energy efficiency across Europe. It also supports the development of tailored strategies to improve building performance in different environmental conditions, ultimately contributing to the EU’s energy and climate targets.

5 pages, 507 KiB

Open Access Data Descriptor

Long-Term Outdoor Cultivation of Nannochloropsis in California, Hawaii, and New Mexico

by Alina A. Corcoran, Marcela Saracco Alvarez, Taryn Cornell, Isidora Echenique-Subiabre, Julia Gerber, Stephanie Getto, Ahlem Jebali, Heather Martinez, Jakob O. Nalley, Charles J. O’Kelly, Aidan Ryan, Jonathan B. Shurin and Shawn R. Starkenburg

Data 2024, 9(11), 126; https://doi.org/10.3390/data9110126 - 29 Oct 2024

Abstract

The project “Optimizing Selection Pressures and Pest Management to Maximize Cultivation Yield” (OSPREY, award #DE-EE08902) was undertaken to enhance the annual productivity, stability, and quality of algal production strains for biofuels and bioproducts. The foundation of this project was the year-round cultivation of a Nannochloropsis strain across three outdoor systems in California, Hawaii, and New Mexico. We aimed to leverage environmental selection pressures to drive strain improvement and use metagenomic techniques to inform pest management tools. The resulting dataset includes environmental and biological parameters from these cultivation campaigns, captured in a single CSV file. This dataset aims to serve a wide range of end users, from biologists to algal farmers, addressing the scarcity of publicly available data on algae cultivation. Further data releases will include 16S rRNA amplicon sequencing and shotgun sequencing datasets.

21 pages, 1676 KiB

Open Access Article

Enhancing Access Across Europe for Documents Published According to Freedom of Information Act: Applying Woogle Design and Technique to Estonian Public Information Act Document

by Gerda Viira and Maarten Marx

Data 2024, 9(11), 125; https://doi.org/10.3390/data9110125 - 29 Oct 2024

Abstract

In the Netherlands, the Open Government Act (Wet openbare overheid or Woo/Wob in Dutch) is in effect, with the primary objective of ensuring a more transparent government. In line with the legislation, a search engine named Woogle has been designed and developed to centralize documents published under the Open Government Act. The Estonian Public Information Act serves a similar purpose and requires all public institutions to publish information generated during official duties, fostering transparency and public oversight. Currently, Estonia’s document repositories are decentralized, and content search is not supported, which hinders people’s ability to efficiently locate information. This study aims to assess public information accessibility in Estonia and to apply Woogle’s design and techniques to Estonia’s document repositories, thereby evaluating its potential for broader European implementation. The methodology involved web scraping data and documents from 57 Estonian public institutions’ document repositories. The results indicate that Woogle’s design and techniques can be implemented in Estonia. From a technical perspective, the alignment of the fields was successful, while it was found that content-wise, the Estonian data present challenges due to inconsistencies and lack of comprehensive categorization. The findings suggest potential scalability across European countries, pointing to a broader applicability of the Woogle model for creating a corpus of Freedom of Information Act documents in Europe. The collected data are available as a dataset.

(This article belongs to the Section Information Systems and Data Management)

8 pages, 2987 KiB

Open Access Data Descriptor

Curated Polyoxometalate Formula Dataset

by Aleksandar Kondinski, Nadiia Gumerova and Annette Rompel

Data 2024, 9(11), 124; https://doi.org/10.3390/data9110124 - 29 Oct 2024

Abstract

Reticular and cluster materials often feature complex formulas, making a comprehensive overview challenging due to the need to consult various resources. While datasets have been collected for metal-organic frameworks (MOFs), covalent organic frameworks (COFs), and zeolites, among others, there remains a gap in systematically organized information for polyoxometalates. This paper introduces a carefully curated dataset of 1984 polyoxometalate (POM) and related cluster metal oxide formula instances, currently connecting over 2500 POM material instances. These POM instances incorporate 75 different chemical elements, with compositions ranging from binary to octonary element clusters. This dataset not only enhances accessibility to polyoxometalate data but also aims to facilitate further research and development in the study of these complex inorganic compounds.

(This article belongs to the Section Chemoinformatics)

7 pages, 343 KiB

Open Access Data Descriptor

Sustainable Transportation Characteristics Diary—Example of Older (50+) Cyclists

by Sreten Jevremović, Carol Kachadoorian, Filip Arnaut, Aleksandra Kolarski and Vladimir A. Srećković

Data 2024, 9(11), 123; https://doi.org/10.3390/data9110123 - 25 Oct 2024

Abstract

Cycling is a sustainable and healthy form of transportation that is gradually becoming the primary means of transportation over shorter distances in many countries. This paper describes the dataset used to determine the cycling characteristics of seniors in the USA and Canada. For this purpose, a specially created questionnaire was used in a survey conducted from August 2021 to July 2022. The questionnaire contained sections related to the general socio-demographic characteristics of the respondents, general characteristics of cycling (type of bicycle, cycle time, mileage, etc.), and specific characteristics of cycling (riding in night conditions, termination of cycling, motivating and demotivating factors for cycling, etc.). The total sample consisted of 5096 respondents (50+ years old). This database is particularly significant because it represents the first set of publicly available data related to the cycling characteristics of older adults. The database can be used by researchers dealing with this topic, as well as by decision-makers who want to design a sustainable and accessible cycling infrastructure that respects the requirements of this category of users. Finally, this dataset can serve as an adequate basis in the process of determining the specificities and understanding the needs of older cyclists in traffic.

8 pages, 2266 KiB

Open Access Data Descriptor

Towards a Taxonomy Machine: A Training Set of 5.6 Million Arthropod Images

by Dirk Steinke, Sujeevan Ratnasingham, Jireh Agda, Hamzah Ait Boutou, Isaiah C. H. Box, Mary Boyle, Dean Chan, Corey Feng, Scott C. Lowe, Jaclyn T. A. McKeown, Joschka McLeod, Alan Sanchez, Ian Smith, Spencer Walker, Catherine Y.-Y. Wei and Paul D. N. Hebert

Data 2024, 9(11), 122; https://doi.org/10.3390/data9110122 - 25 Oct 2024

Abstract

The taxonomic identification of organisms from images is an active research area within the machine learning community. Current algorithms are very effective for object recognition and discrimination, but they require extensive training datasets to generate reliable assignments. This study releases 5.6 million images with representatives from 10 arthropod classes and 26 insect orders. All images were taken using a Keyence VHX-7000 Digital Microscope system with an automatic stage to permit high-resolution (4K) microphotography. Providing phenotypic data for 324,000 species derived from 48 countries, this release represents, by far, the largest dataset of standardized arthropod images. As such, this dataset is well suited for testing the efficacy of machine learning algorithms for identifying specimens into higher taxonomic categories.

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

18 pages, 939 KiB

Open Access Article

Computing the Commonalities of Clusters in Resource Description Framework: Computational Aspects

by Simona Colucci, Francesco Maria Donini and Eugenio Di Sciascio

Data 2024, 9(10), 121; https://doi.org/10.3390/data9100121 - 20 Oct 2024

Abstract

Clustering is a very common means of analysis of the data present in large datasets, with the aims of understanding and summarizing the data and discovering similarities, among other goals. However, despite the present success of the use of subsymbolic methods for data clustering, a description of the obtained clusters cannot rely on the intricacies of the subsymbolic processing. For clusters of data expressed in the Resource Description Framework (RDF), we extend and implement an optimized, previously proposed, logic-based methodology that computes an RDF structure, called a Common Subsumer, describing the commonalities among all resources. We tested our implementation with two open, and very different, RDF datasets: one devoted to public procurement, and the other devoted to drugs in pharmacology. For both datasets, we were able to provide reasonably concise and readable descriptions of clusters with up to 1800 resources. Our analysis shows the viability of our methodology and computation, and paves the way for general cluster explanations to be provided to lay users.

(This article belongs to the Section Information Systems and Data Management)

9 pages, 3341 KiB

Open Access Data Descriptor

Rainfall Erosivity over Brazil: A Large National Database

by Mariza P. Oliveira-Roza, Roberto A. Cecílio, David B. S. Teixeira, Michel C. Moreira, André Q. Almeida, Alexandre C. Xavier and Sidney S. Zanetti

Data 2024, 9(10), 120; https://doi.org/10.3390/data9100120 - 14 Oct 2024

Abstract

Rainfall erosivity (RE) represents the potential of rainfall to cause soil erosion, and understanding its impact is essential for the adoption of soil and water conservation practices. Although several studies have estimated RE for Brazil, currently, no single reliable and easily accessible database exists for the country. To fill this gap, this work aimed to review the research and generate a rainfall erosivity database for Brazil. Data were gathered from studies that determined rainfall erosivity from observed rainfall records and synthetic rainfall series. Monthly and annual rainfall erosivity values were organized in a spreadsheet and in the shapefile format. In total, 54 studies from 1990 to 2023 were analyzed, resulting in the compilation of 5516 erosivity values for Brazil, of which 6.3% were pluviographic, and 93.7% were synthetic. By region, the availability of information was highest in the Northeast (35.6%), followed by the Southeast (30.1%), South (19.9%), Central-West (7.7%), and North (6.7%). The database, which can be accessed on the Mendeley Data platform, can aid professionals and researchers in adopting public policies and carrying out studies aimed at environmental conservation and basin management.

(This article belongs to the Section Spatial Data Science and Digital Earth)

19 pages, 8517 KiB

Open Access Article

Data Mining Approach for Evil Twin Attack Identification in Wi-Fi Networks

by Roman Banakh, Elena Nyemkova, Connie Justice, Andrian Piskozub and Yuriy Lakh

Data 2024, 9(10), 119; https://doi.org/10.3390/data9100119 - 14 Oct 2024

Abstract

Cybersecurity solutions for wireless networks with open internet access have become critically important for personal data security. The newest network security protocol, WPA3, is used to maximize this protection; however, attackers can use an Evil Twin attack to replace a legitimate access point. This article addresses the problem of intrusion detection at the physical layer of the OSI model. To this end, a hardware–software complex was developed to collect signal strength information from Wi-Fi access points using wireless sensor networks. The collected data were supplemented by a generative algorithm that considers all possible combinations of signal strength. A k-nearest neighbor model was trained on the resulting data to distinguish the signal strength of legitimate access points from that of illegitimate ones. To verify the authenticity of the data, an Evil Twin attack was physically simulated, and the machine learning model analyzed the data from the sensors. As a result, the Evil Twin attack was successfully identified based on the signal strength in the radio spectrum. The proposed model can be used in open access points as well as in large corporate and home Wi-Fi networks to detect intrusions aimed at substituting devices in the radio spectrum where IEEE 802.11 networking equipment operates.

(This article belongs to the Section Information Systems and Data Management)

9 pages, 1033 KiB

Open Access Data Descriptor

A Dataset of Two-Dimensional XBeach Model Set-Up Files for Northern California

by Andrea C. O’Neill, Kees Nederhoff, Li H. Erikson, Jennifer A. Thomas and Patrick L. Barnard

Data 2024, 9(10), 118; https://doi.org/10.3390/data9100118 - 11 Oct 2024

Abstract

Here, we describe a dataset of two-dimensional (2D) XBeach model files that were developed for the Coastal Storm Modeling System (CoSMoS) in northern California as an update to an earlier CoSMoS implementation that relied on one-dimensional (1D) modeling methods. We provide details on the data and their application, such that they might be useful to end-users for other coastal studies. Modeling methods and outputs are presented for Humboldt Bay, California, in which we compare output from a nested 1D modeling approach to 2D model results, demonstrating that the 2D method, while more computationally expensive, results in a more cohesive and directly mappable flood hazard result.
