JMP SUMMIT EUROPE 2018 - Data Mining Under The Curse of Dimensionality (Gianpaolo Polsinelli - LFoundry Italy)
With such a high volume of data, the standard one-variable-at-a-time technique can fail because of the influence of a large number of manufacturing variables.
Data by itself isn’t useful. To be useful it must be converted into actionable information to drive yield and product quality improvement.
This is where Machine Learning (ML) comes in.
To avoid model over-fitting, the dimensionality of the sample must also be reduced. In other words, to increase the signal-to-noise ratio of the available data,
we need to reduce the number of features before applying any ML model. Once interesting patterns have been extracted from the database, they are validated by
engineering experience.
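As a minimal sketch of the feature-reduction step described above (illustrative Python only; the poster's actual workflow is written in JMP 13 JSL): drop near-constant features, which carry almost no signal, before any model is fit.

```python
# Sketch (assumption: illustrative, not the authors' JSL code): drop
# low-variance features to raise the signal-to-noise ratio before
# fitting any ML model.
from statistics import pvariance

def reduce_features(rows, min_variance=1e-6):
    """Keep only columns whose variance exceeds min_variance.

    rows: list of equal-length feature lists (one list per sample).
    Returns (reduced_rows, kept_column_indices).
    """
    n_cols = len(rows[0])
    kept = [j for j in range(n_cols)
            if pvariance([r[j] for r in rows]) > min_variance]
    reduced = [[r[j] for j in kept] for r in rows]
    return reduced, kept

rows = [[1.0, 5.0, 0.0],
        [1.0, 7.0, 0.0],
        [1.0, 6.0, 0.0]]
reduced, kept = reduce_features(rows)
# only the middle column varies, so only it survives
```

The same idea scales to the hundreds of manufacturing variables mentioned above: filtering first keeps the later ML step from chasing noise.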
Workflow steps:
• Explore and clean data
• Stratification and dimensionality reduction
• Find relationships
• Predict future observations
The entire process was implemented using JMP 13 JSL features.
Curse of Dimensionality
Gianpaolo Polsinelli, Felice Russo
The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds
or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.
The expression was coined by Richard E. Bellman when considering problems in dynamic optimization.
There are multiple phenomena referred to by this name in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining,
and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the
available data become sparse.
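The sparsity effect can be made concrete with a small numerical illustration (not from the poster): the volume of the unit ball becomes a vanishing fraction of its bounding cube as the dimension grows, so uniformly sampled points end up far apart.

```python
# Sketch (illustration only): as dimensionality d grows, the volume of
# the d-dimensional unit ball shrinks to a negligible fraction of the
# enclosing cube [-1, 1]^d -- the geometric face of the curse of
# dimensionality.
import math

def ball_to_cube_ratio(d):
    """Volume of the d-dimensional unit ball divided by the volume of
    its bounding cube [-1, 1]^d."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube = 2.0 ** d
    return ball / cube

ratios = {d: ball_to_cube_ratio(d) for d in (2, 3, 10, 20)}
# the ratio collapses toward zero as d increases
```

In 2 dimensions the ball fills about 79% of the cube; by 20 dimensions the fraction is below one in ten million, which is why fixed-size samples become sparse in high dimensions.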
Remove empty & stagnant columns
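The cleaning step above can be sketched as follows (illustrative Python; in the poster this is done with JMP 13 JSL on the data table): drop columns that are entirely missing (empty) or hold a single constant value (stagnant).

```python
# Sketch (assumption: illustrative Python; the poster's workflow performs
# this cleanup in JMP 13 JSL): remove empty (all-missing) and stagnant
# (constant) columns before analysis.

def clean_columns(table):
    """table: dict mapping column name -> list of values (None = missing).
    Returns a new dict without empty or stagnant columns."""
    cleaned = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if not present:             # empty column: every value missing
            continue
        if len(set(present)) <= 1:  # stagnant column: one constant value
            continue
        cleaned[name] = values
    return cleaned

table = {"lot":   ["A", "B", "C"],
         "temp":  [350, 355, 349],
         "empty": [None, None, None],
         "tool":  ["T1", "T1", "T1"]}
cleaned = clean_columns(table)
# only "lot" and "temp" survive
```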
Weighting Stratification
By selecting the Weight column option, a new column is created:
Weight_Column = #of_Badsample / #of_Goodsample
Selecting the new sampling %:
• % of random rows reduction
• % of rows reduction stratified by Weight Column
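A hypothetical re-creation of the stratified row-reduction option (illustrative Python, not the poster's JSL; the grouping on `Weight_Column` mirrors the new column described above): sampling a fixed percentage within each stratum keeps rare, high-weight rows from being lost.

```python
# Sketch (assumption: simplified stand-in for the "% of rows reduction
# Stratified by Weight Column" option): sample keep_fraction of the rows
# within each weight stratum, so small strata survive the reduction.
import random

def stratified_reduce(rows, weight_key, keep_fraction, seed=0):
    """rows: list of dicts; weight_key: stratum column name;
    keep_fraction in (0, 1]. Returns the reduced row list."""
    rng = random.Random(seed)
    strata = {}
    for row in rows:
        strata.setdefault(row[weight_key], []).append(row)
    sampled = []
    for group in strata.values():
        k = max(1, round(len(group) * keep_fraction))  # keep >= 1 per stratum
        sampled.extend(rng.sample(group, k))
    return sampled

rows = ([{"Weight_Column": 0.1, "x": i} for i in range(90)] +
        [{"Weight_Column": 0.9, "x": i} for i in range(10)])
subset = stratified_reduce(rows, "Weight_Column", 0.2)
# 20% of each stratum survives: 18 low-weight rows and 2 high-weight rows
```

A plain "% of random rows reduction" could instead drop every high-weight row by chance, which is exactly what the stratified option guards against.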
Moving to the next step, the user can choose among different MACHINE LEARNING models.
The X variables are the main predictors found by the algorithm.
Machine Learning and Model Validation
Three models were implemented. The example below relates to modeling with the Cluster Variables technique.
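To give a feel for the idea behind variable clustering (a deliberately simplified Python stand-in, not JMP's Cluster Variables platform): group variables whose absolute correlation exceeds a threshold, then keep one representative per cluster to shrink the X matrix.

```python
# Sketch (assumption: greedy, simplified stand-in for a variable-clustering
# step): variables highly correlated with an existing cluster's first
# member join that cluster; otherwise they start a new one.
from statistics import pstdev, mean

def corr(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    ma, mb = mean(a), mean(b)
    cov = mean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    return cov / (pstdev(a) * pstdev(b))

def cluster_variables(table, threshold=0.9):
    """table: dict name -> list of numeric values.
    Returns a list of clusters (lists of names); the first name in each
    cluster serves as its representative."""
    clusters = []
    for name in table:
        for cl in clusters:
            if abs(corr(table[name], table[cl[0]])) >= threshold:
                cl.append(name)
                break
        else:
            clusters.append([name])
    return clusters

table = {"a": [1, 2, 3, 4],
         "b": [2, 4, 6, 8],   # perfectly correlated with a
         "c": [4, 1, 3, 2]}   # weakly related to a
clusters = cluster_variables(table)
representatives = [cl[0] for cl in clusters]
```

Keeping only the representatives reduces the predictor count while preserving most of the information each cluster carries, which is the dimensionality-reduction goal stated at the top of the poster.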