ML Unit 1
MACHINE LEARNING
“Machine Learning is a subset of artificial intelligence. It focuses
mainly on designing systems that can learn from data and make
predictions based on it.”
1. Supervised Machine Learning
In supervised learning, you train your model on a
labelled dataset, meaning we have both the raw input
data and its corresponding outputs. We split the data
into a training dataset and a test dataset: the training
dataset is used to train the model, while the test
dataset acts as unseen data for predicting results and
measuring the accuracy of the model.
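The split described above can be sketched in plain Python; the toy dataset and the 75/25 split ratio are illustrative assumptions, not part of the notes:

```python
import random

def train_test_split(X, y, test_size=0.25, seed=0):
    """Shuffle the labelled examples and split them into training and test sets."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = int(len(X) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# 8 labelled examples: raw input value and its class label
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
print(len(X_train), len(X_test))  # 6 2
```

The model is trained only on `X_train`/`y_train`; `X_test` is held back so that accuracy is measured on data the model has never seen.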
Some algorithms for supervised learning:
1. Linear Regression
2. Random Forest
3. Support Vector Machines (SVM)
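As an illustration of the first algorithm in the list, a minimal one-feature linear regression (ordinary least squares) can be written from scratch; the toy data are made up for the example:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: the slope and intercept
    that minimise the squared error between predictions and labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Labelled data lying exactly on the line y = 2x + 1
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```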
Applications of Supervised Learning
• Sentiment Analysis: a natural language processing technique in
which we analyze text data and categorize the sentiment or meaning
it expresses.
2. Unsupervised Machine Learning
Unsupervised learning studies how systems can
infer a function that describes hidden structure in
unlabeled data. The system does not predict a known
output; instead, it explores the data and draws
inferences that describe the hidden structure of the
unlabeled dataset.
Some examples of models that belong to this family
are the following: PCA, K-means, DBSCAN, mixture
models etc.
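A minimal sketch of one model from this family, K-means, on one-dimensional data. Initialising the centroids from the first k points is a simplifying assumption; real implementations initialise more carefully:

```python
def kmeans_1d(points, k, iters=20):
    """A minimal k-means sketch on 1-D data: assign each point to the
    nearest centroid, then move each centroid to the mean of its points."""
    centroids = points[:k]  # naive initialisation: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups of unlabeled values: the algorithm finds them itself
data = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
centroids, clusters = kmeans_1d(data, k=2)
print(sorted(round(c, 1) for c in centroids))  # [1.0, 10.0]
```

No labels were given; the structure (two groups around 1 and 10) was inferred from the data alone, as described above.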
Wrapper Method
•The main idea behind a wrapper method is to select
the set of features that works best for a given machine
learning model.
•It follows a greedy search approach, evaluating
candidate combinations of features against an
evaluation criterion, typically the model's performance.
1. Forward Selection
Forward selection is an iterative method: starting from an empty set, in each
iteration we add the feature that best improves our model, until adding a new
feature no longer improves the performance of the model.
2. Backward Elimination
In backward elimination, we start with all the features and, at each iteration,
remove the least significant feature, i.e., the one whose removal most improves
the performance of the model. We repeat this until no improvement is observed
on removing a feature.
3. Exhaustive Feature Selection
This is the most robust feature selection method covered so far: a brute-force
evaluation of every feature subset. It tries every possible combination of the
variables and returns the best-performing subset.
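The first strategy can be sketched generically; `toy_score` is a made-up stand-in for a real evaluation criterion such as cross-validated model accuracy:

```python
def forward_selection(features, score):
    """Greedy forward selection: starting from an empty set, repeatedly add
    the feature that most improves the score, stopping when nothing helps."""
    selected, best = [], score([])
    remaining = list(features)
    while remaining:
        gains = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feat = max(gains)
        if top_score <= best:
            break  # no remaining feature improves the model any further
        selected.append(top_feat)
        remaining.remove(top_feat)
        best = top_score
    return selected, best

# Toy scoring function standing in for model performance:
# 'a' and 'b' are informative, 'c' is noise that slightly hurts the score.
def toy_score(subset):
    return 0.5 + 0.2 * ('a' in subset) + 0.1 * ('b' in subset) - 0.05 * ('c' in subset)

selected, best = forward_selection(['a', 'b', 'c'], toy_score)
print(selected)  # ['a', 'b']
```

In practice, scikit-learn's `SequentialFeatureSelector` implements forward and backward selection along these lines, with cross-validated model performance as the criterion.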
Filter Method
ANOVA (F-Test)
A univariate test: a linear model is used to test the individual
effect of each feature on the target.
•ANOVA assumes a linear relationship between the features and
the target, and also that the variables are normally distributed.
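The F statistic behind this test can be computed directly. The two groups below are hypothetical feature values split by a binary target; well-separated groups give a large F, marking the feature as informative:

```python
def f_statistic(groups):
    """One-way ANOVA F statistic: ratio of between-group variance to
    within-group variance of a feature, grouped by the target class."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand_mean = sum(all_vals) / n
    # Between-group sum of squares (degrees of freedom: k - 1)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (degrees of freedom: n - k)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Feature values grouped by two target classes: well separated -> large F
f = f_statistic([[1.0, 1.1, 0.9], [5.0, 5.2, 4.8]])
print(round(f, 1))  # 960.0
```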
Feature Selection using Mutual Information (MI)
•Mutual information measures the amount of information obtained about one
variable through observing another. In other words, it determines how much we can
know about one variable by understanding another; it is somewhat like correlation,
but mutual information is more general.
•In machine learning, mutual information measures how much information the
presence/absence of a feature contributes to making the correct prediction on Y.
•The mutual information between two random variables X and Y can be stated
formally as follows:
•I(X ; Y) = H(X) – H(X | Y)
• Where I(X ; Y) is the mutual information for X and Y,
• H(X) is the entropy for X and H(X | Y) is the conditional entropy for X
given Y.
•Mutual information is a measure of dependence or “mutual dependence” between
two random variables. As such, the measure is symmetrical, meaning that I(X ; Y) =
I(Y ; X).
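The definition I(X ; Y) = H(X) − H(X | Y) can be computed directly from a joint probability table; the table below is a made-up example of two perfectly dependent binary variables:

```python
from math import log2

def entropy(probs):
    """Shannon entropy H = -sum p * log2(p), in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), computed from a joint probability table
    joint[x][y] = P(X = x, Y = y)."""
    px = [sum(row) for row in joint]            # marginal P(X)
    py = [sum(col) for col in zip(*joint)]      # marginal P(Y)
    h_x = entropy(px)
    # H(X|Y) = sum_y P(y) * H(X | Y = y)
    h_x_given_y = sum(
        py[j] * entropy([joint[i][j] / py[j] for i in range(len(joint))])
        for j in range(len(py)) if py[j] > 0
    )
    return h_x - h_x_given_y

# X and Y perfectly dependent: knowing Y tells us X exactly -> I = H(X) = 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # 1.0
```

For independent variables (e.g. a uniform joint table `[[0.25, 0.25], [0.25, 0.25]]`) the same function returns 0: observing Y tells us nothing about X.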
Categorical Feature Selection via Chi-Square Test of
Independence
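A sketch of the chi-square statistic behind such a test; the contingency table of observed counts is a made-up example of a binary categorical feature against a binary class:

```python
def chi_square_statistic(table):
    """Chi-square statistic for a contingency table of observed counts:
    sum over cells of (observed - expected)^2 / expected, where the
    expected count assumes the feature and the target are independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Observed counts of a categorical feature (rows) against the class (columns);
# the strong diagonal pattern suggests the feature and class are dependent.
table = [[30, 10],
         [10, 30]]
print(chi_square_statistic(table))  # 20.0
```

A large statistic (compared against the chi-square distribution with the appropriate degrees of freedom) rejects independence, so the feature is worth keeping.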
Normalization
• min-max normalization
• z-score normalization
For example,
Suppose that the minimum and maximum values for the attribute income are $12,000
and $98,000, respectively, and we would like to map income to the range [0, 1].
Min-max normalization maps a value v to v' = (v - min) / (max - min).
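A quick check of this example in code; the value $73,600 is borrowed from the z-score example below:

```python
def min_max_normalize(v, v_min, v_max, new_min=0.0, new_max=1.0):
    """Map v linearly from [v_min, v_max] onto [new_min, new_max]."""
    return (v - v_min) / (v_max - v_min) * (new_max - new_min) + new_min

# income bounds from the example: min $12,000, max $98,000
print(round(min_max_normalize(73_600, 12_000, 98_000), 3))  # 0.716
```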
For example,
Suppose that the mean and standard deviation of the values for the attribute
income are $54,000 and $16,000, respectively. With z-score normalization, a value
of $73,600 for income is transformed to (73,600 - 54,000) / 16,000 = 1.225.
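The same transformation in code:

```python
def z_score_normalize(v, mean, std):
    """z-score: how many standard deviations v lies from the mean."""
    return (v - mean) / std

# mean $54,000, standard deviation $16,000, value $73,600
print(z_score_normalize(73_600, 54_000, 16_000))  # 1.225
```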
Normalization by decimal scaling
Suppose that the recorded values of A range from -986 to 917. The maximum absolute
value of A is 986. To normalize by decimal scaling, we therefore divide each value by
1,000 (i.e., j = 3) so that -986 normalizes to -0.986.
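Decimal scaling in code, including the search for j:

```python
def decimal_scale(values):
    """Normalize by decimal scaling: divide by 10**j, where j is the
    smallest integer such that every scaled value has absolute value
    below 1."""
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

# recorded values of A range from -986 to 917
scaled, j = decimal_scale([-986, 917])
print(j, scaled)  # 3 [-0.986, 0.917]
```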
Dimensionality reduction: Principal Component Analysis (PCA)
1.The input data are normalized, so that each attribute falls within the same range. This step helps
ensure that attributes with large domains will not dominate attributes with smaller domains.
2. PCA computes N orthonormal vectors which provide a basis for the normalized input data. These
are unit vectors that each point in a direction perpendicular to the others. These vectors are
referred to as the principal components. The input data are a linear combination of the principal
components.
3. The principal components are sorted in order of decreasing strength. The principal components
essentially serve as a new set of axes for the data, providing important information about variance.
That is, the sorted axes are such that the first axis shows the most variance among the data, the
second axis shows the next highest variance, and so on. This information helps identify groups or
patterns within the data.
4. Since the components are sorted according to decreasing order of significance, the size of the
data can be reduced by eliminating the weaker components, i.e., those with low variance. Using the
strongest principal components, it should be possible to reconstruct a good approximation of the
original data.
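Steps 2-3 can be illustrated for two-dimensional data by extracting the first principal component with power iteration, a simple way to find the dominant eigenvector of the covariance matrix; the data points are made up:

```python
def first_principal_component(data, iters=100):
    """Sketch of PCA steps 2-3 for 2-D data: after centering, the first
    principal component is the dominant eigenvector of the covariance
    matrix, found here by power iteration."""
    n = len(data)
    means = [sum(row[k] for row in data) / n for k in range(2)]
    centered = [[row[0] - means[0], row[1] - means[1]] for row in data]
    # 2x2 sample covariance matrix of the centered data
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(2)]
           for a in range(2)]
    v = [1.0, 1.0]
    for _ in range(iters):  # power iteration converges to the top eigenvector
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = [w[0] / norm, w[1] / norm]
    return v

# Points lying almost on the line y = x: the axis of most variance
pc1 = first_principal_component([[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.8]])
print(pc1)  # a unit vector pointing roughly along the line y = x
```

Projecting the data onto this single axis keeps most of the variance, which is exactly the reduction described in step 4.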
Linear Discriminant Analysis
Linear Discriminant Analysis, also called Normal Discriminant
Analysis or Discriminant Function Analysis, is a dimensionality
reduction technique that is commonly used for supervised
classification problems. It is used for modelling differences in
groups, i.e., separating two or more classes. It is used to project
the features in a higher-dimensional space into a lower-dimensional
space.
For example, suppose we have two classes that we need to separate
efficiently. Classes can have multiple features. Using only a
single feature to classify them may result in some overlap, as
shown in the figure below. So we keep increasing the number of
features to achieve a proper classification.
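One common formulation of LDA for two classes, Fisher's linear discriminant, can be sketched for two features; the class data below are made up for illustration:

```python
def fisher_lda_direction(class0, class1):
    """Fisher's linear discriminant for two classes with two features:
    project onto w = Sw^{-1} (m1 - m0), where Sw is the within-class
    scatter matrix. This direction separates the class means as far as
    possible relative to the spread inside each class."""
    def mean(pts):
        return [sum(p[k] for p in pts) / len(pts) for k in range(2)]

    def scatter(pts, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for p in pts:
            d = [p[0] - m[0], p[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        return s

    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    # invert the 2x2 within-class scatter matrix
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

# Two classes separated mainly along the first feature
c0 = [[1, 3], [2, 4], [1, 4]]
c1 = [[6, 3], [7, 4], [6, 4]]
w = fisher_lda_direction(c0, c1)

def project(p):
    """Projection of a 2-D point onto the 1-D discriminant axis."""
    return w[0] * p[0] + w[1] * p[1]
```

After projecting onto `w`, every point of the first class lands below every point of the second, so the two classes are cleanly separated on a single axis, which is the lower-dimensional space LDA projects into.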