Alessio Netti

HPC/AI Senior Research Engineer, DeepL
Verified email at deepl.com
Cited by 285

Operational data analytics in practice: experiences from design to deployment in production HPC environments

A Netti, M Ott, C Guillen, D Tafani, M Schulz - Parallel Computing, 2022 - Elsevier
As HPC systems continue to grow in scale and complexity, efficient and manageable operation
is increasingly critical. For this reason, many centers are starting to explore the use of …

DCDB Wintermute: Enabling online and holistic operational data analytics on HPC systems

A Netti, M Müller, C Guillen, M Ott, D Tafani… - Proceedings of the 29th …, 2020 - dl.acm.org
As we approach the exascale era, the size and complexity of HPC systems continues to
increase, raising concerns about their manageability and sustainability. For this reason, more …

From facility to application sensor data: modular, continuous and holistic monitoring with DCDB

A Netti, M Müller, A Auweter, C Guillen, M Ott… - Proceedings of the …, 2019 - dl.acm.org
Today's HPC installations are highly-complex systems, and their complexity will only
increase as we move to exascale and beyond. At each layer, from facilities to systems, from …

A machine learning approach to online fault classification in HPC systems

A Netti, Z Kiziltan, O Babaoglu, A Sîrbu… - Future Generation …, 2020 - Elsevier
As High-Performance Computing (HPC) systems strive towards the exascale goal, failure
rates both at the hardware and software levels will increase significantly. Thus, detecting and …

A conceptual framework for HPC operational data analytics

A Netti, W Shin, M Ott, T Wilde… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
This paper provides a broad framework for understanding trends in Operational Data Analytics
(ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to allow for …

HPC hardware design reliability benchmarking with HDFIT

P Omland, A Netti, Y Peng, A Baldovin… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Chips pack ever more, ever smaller transistors. Fault rates increase in turn and become more
concerning, particularly at the scale of High-Performance Computing (HPC) systems: on …

Mixed precision support in HPC applications: What about reliability?

A Netti, Y Peng, P Omland, M Paulitsch, J Parra… - Journal of Parallel and …, 2023 - Elsevier
In their quest for exascale and beyond, High-Performance Computing (HPC) systems
continue becoming ever larger and more complex. Application developers, on the other hand, …

FINJ: A fault injection tool for HPC systems

A Netti, Z Kiziltan, O Babaoglu, A Sîrbu… - Euro-Par 2018: Parallel …, 2019 - Springer
We present FINJ, a high-level fault injection tool for High-Performance Computing (HPC)
systems, with a focus on the management of complex experiments. FINJ provides support for …

Towards a predictive energy model for HPC runtime systems using supervised learning

…, G Poerwawinata, M Maiterth, A Netti… - Euro-Par 2019: Parallel …, 2020 - Springer
High-Performance Computing systems collect vast amounts of operational data with the
employment of monitoring frameworks, often augmented with additional information from …

AccaSim: a customizable workload management simulator for job dispatching research in HPC systems

C Galleguillos, Z Kiziltan, A Netti, R Soto - Cluster Computing, 2020 - Springer
We present AccaSim, a simulator for workload management in HPC systems. Thanks to
AccaSim’s scalability to large workload datasets, support for easy customization, and practical …