Advances in exploratory data analysis, visualisation and quality for data centric AI systems

H Patel, S Guttula, RS Mittal, N Manwani… - Proceedings of the 28th …, 2022 - dl.acm.org
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022
It is widely accepted that data preparation is one of the most time-consuming steps of the machine learning (ML) lifecycle. It is also one of the most important, as the quality of the data directly influences the quality of the model. In this tutorial, we will discuss the importance and role of exploratory data analysis (EDA) and data visualisation techniques in finding data quality issues and preparing data for ML pipelines. We will also discuss the latest advances in these fields and highlight areas that need innovation. To make the tutorial actionable for practitioners, we will cover the most popular open-source packages that one can get started with, along with their strengths and weaknesses. Finally, we will discuss the challenges posed by industry workloads and the gaps that must be addressed to make data-centric AI real in industry settings.