ML UNIT-1 Notes PDF
Unit-1:
Introduction: Definition of learning systems, Goals and applications of machine learning, Aspects of
developing a learning system: training data, concept representation, function approximation. Inductive
Classification: The concept learning task, Concept learning as search through a hypothesis space, General-
to-specific ordering of hypotheses, Finding maximally specific hypotheses, Version spaces and the candidate
elimination algorithm, Learning conjunctive concepts, The importance of inductive bias.
FIG. Classification
Some typical classification problems include:
Image classification
Prediction of disease
Win–loss prediction of games
Prediction of natural calamity like earthquake, flood, etc.
Recognition of handwriting
Regression: In linear regression, the objective is to predict numerical features such as real estate or stock
prices, temperature, marks in an examination, sales revenue, etc. Both the predictor variables and the
target variable are continuous in nature.
In the case of simple linear regression, there is only one predictor variable, whereas in multiple linear
regression, multiple predictor variables can be included in the model.
A typical simple linear regression model can be represented in the form
y = a + bx
where ‘x’ is the predictor variable, ‘y’ is the target variable, ‘a’ is the intercept, and ‘b’ is the slope of the
regression line. A minimal code sketch is given after the application list below.
Typical applications of regression can be seen in
Demand forecasting in retails
Sales prediction for managers
Price prediction in real estate
Weather forecast
Skill demand forecast in job market.
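As a minimal sketch of fitting a simple linear regression model (the data values and the use of scikit-learn here are our own assumptions, purely for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: x = floor area in sq. ft, y = price in lakhs.
x = np.array([[500], [750], [1000], [1250], [1500]])  # predictor (2-D for sklearn)
y = np.array([25.0, 36.0, 48.0, 59.0, 71.0])          # continuous target

model = LinearRegression().fit(x, y)
print(model.intercept_, model.coef_[0])  # learned a and b in y = a + b*x
print(model.predict([[1100]]))           # price prediction for a new instance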
Unsupervised learning: In unsupervised learning, there is no labeled training data to learn from and no
prediction to be made. The objective is to take a dataset as input and try to find natural groupings or
patterns within the data elements or records. Therefore, unsupervised learning is often termed a
descriptive model, and the process of unsupervised learning is referred to as pattern discovery or
knowledge discovery.
FIG. Unsupervised learning
The two major areas of unsupervised learning are clustering and association analysis.
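As a minimal sketch of clustering, one of the two areas just mentioned (the data values and the use of scikit-learn are our own assumptions):

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled records: there is no target column, only features.
data = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.labels_)           # natural grouping discovered from the data itself
print(kmeans.cluster_centers_)  # one centre per discovered group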
Inductive Classification:
In concept learning, we learn a description of the positive class only, and label everything that does not
satisfy that description as negative.
Each concept can be viewed as describing some subset of objects or events defined over a larger set (e.g., the
subset of animals that constitute birds). Alternatively, each concept can be thought of as a boolean-valued
function defined over this larger set (e.g., a function defined over all animals, whose value is true for birds
and false for other animals).
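As a toy sketch of this second view (the attribute names and the rule are hypothetical), a concept is simply a function that returns true on members of the subset and false elsewhere:

# A concept as a boolean-valued function defined over a larger set of instances.
def is_bird(animal: dict) -> bool:
    # True exactly for the subset of animals that constitute birds (toy rule).
    return animal["has_feathers"] and animal["lays_eggs"]

print(is_bird({"has_feathers": True, "lays_eggs": True}))   # True: a positive instance
print(is_bird({"has_feathers": False, "lays_eggs": True}))  # False: a negative instance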
Concept learning. Inferring a boolean-valued function from training examples of its input and output.
TABLE: Positive and negative training examples for the target concept EnjoySport.
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
The most general hypothesis, that every day is a positive example, is represented by
<?, ?, ?, ?, ?, ?>
and the most specific possible hypothesis, that no day is a positive example, is represented by
<Φ, Φ, Φ, Φ, Φ, Φ>
The definition of the EnjoySport concept learning task in this general form is given below.
Given:
Instances X: Possible days, each described by the attributes
o Sky (with possible values Sunny, Cloudy, and Rainy),
o AirTemp (with values Warm and Cold),
o Humidity (with values Normal and High),
o Wind (with values Strong and Weak),
o Water (with values Warm and Cool), and
o Forecast (with values Same and Change).
Hypotheses H: Each hypothesis is described by a conjunction of constraints on the attributes Sky,
AirTemp, Humidity, Wind, Water, and Forecast. The constraints may be "?" (any value is
acceptable), "Φ" (no value is acceptable), or a specific value.
Target concept c: EnjoySport : X → {0, 1}
Training examples D: Positive and negative examples of the target function (see above Table).
Determine:
A hypothesis h in H such that h(x) = c(x) for all x in X.
The set of items over which the concept is defined is called the set of instances, which we denote by X.
The concept or function to be learned is called the target concept, which we denote by c. In general, c can be
any boolean-valued function defined over the instances X; that is, c : X → {0, 1}.
In the current example, the target concept corresponds to the value of the attribute EnjoySport (i.e., c(x) = 1
if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).
When learning the target concept, the learner is presented a set of training examples, each consisting of an
instance x from X, along with its target concept value c(x) (e.g., the training examples in above Table).
Instances for which c(x) = 1 are called positive examples, or members of the target concept. Instances for
which c(x) = 0 are called negative examples, or nonmembers of the target concept. We will often write the
ordered pair (x, c(x)) to describe the training example consisting of the instance x and its target concept
value c(x). We use the symbol D to denote the set of available training examples.
Given a set of training examples of the target concept c, the problem faced by the learner is to hypothesize,
or estimate, c. We use the symbol H to denote the set of all possible hypotheses that the learner may consider
regarding the identity of the target concept. Usually H is determined by the human designer's choice of
hypothesis representation.
In general, each hypothesis h in H represents a boolean-valued function defined over X; that is,
h : X → {0, 1}.
The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.
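A hypothesis in this representation can be encoded directly. The sketch below uses our own encoding ("?" for any value, None for Φ) and applies a hypothesis h to an instance x:

# h(x) = 1 iff every attribute constraint of h is satisfied by x.
def h_of_x(h, x):
    for constraint, value in zip(h, x):
        if constraint is None:                         # Φ: no value is acceptable
            return 0
        if constraint != "?" and constraint != value:  # a specific value must match
            return 0
    return 1

h = ("Sunny", "Warm", "?", "Strong", "?", "?")
x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(h_of_x(h, x))  # 1: x satisfies h, so h classifies x as positive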
Now consider two hypotheses, h1 = <Sunny, ?, ?, Strong, ?, ?> and h2 = <Sunny, ?, ?, ?, ?, ?>, and the sets
of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the
instance, it classifies more instances as positive. In fact, any instance classified positive by h1 will also be
classified positive by h2. Therefore, we say that h2 is more general than h1.
For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.
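The more-general-than-or-equal-to relation can be tested attribute by attribute; a sketch under the same encoding as above:

# h2 >=g h1 iff every instance satisfying h1 also satisfies h2. Attribute-wise,
# each constraint of h2 must be at least as permissive as that of h1.
def more_general_or_equal(h2, h1):
    if None in h1:   # h1 covers no instance, so the relation holds vacuously
        return True
    return all(c2 == "?" or c2 == c1 for c2, c1 in zip(h2, h1))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 imposes fewer constraints than h1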
Finding maximally specific hypotheses (FIND-S): consider a second concept learning task in which each
instance is described by four attributes. We first take the hypothesis to be the most specific hypothesis
possible. Hence, our initial hypothesis is:
h = {ϕ, ϕ, ϕ, ϕ}
Consider example 1: The data in example 1 is {GREEN, HARD, NO, WRINKLED}.
We see that our initial hypothesis is more specific and we have to generalize it for this example.
Hence, the hypothesis becomes: h = {GREEN, HARD, NO, WRINKLED}
Consider example 2:
Here we see that this example has a negative outcome. Hence, we ignore this example and our hypothesis
remains the same: h = {GREEN, HARD, NO, WRINKLED}
Consider example 3: This example also has a negative outcome, so again we ignore it and our hypothesis
remains the same: h = {GREEN, HARD, NO, WRINKLED}
Consider example 4: The data present in example 4 is {ORANGE, HARD, NO, WRINKLED}. We compare
every attribute of this example with the current hypothesis, and wherever a mismatch is found we replace
that particular attribute with the general constraint ("?"). After doing so, the hypothesis becomes:
h = {?, HARD, NO, WRINKLED}
Consider example 5: The data present in example 5 is {GREEN, SOFT, YES, SMOOTH}. Comparing each
attribute with the current hypothesis and replacing every mismatch with the general constraint ("?"), the
hypothesis becomes: h = {?, ?, ?, ?}
Since we have reached a point where all the attributes in our hypothesis are general, examples 6 and 7
would leave the hypothesis unchanged, with all general attributes: h = {?, ?, ?, ?}
Hence, for the given data the final hypothesis would be:
Final Hypothesis: h = { ?, ?, ?, ? }.
Note:
The FIND-S algorithm applied to the EnjoySport table above yields h = <Sunny, Warm, ?, Strong, ?, ?>.
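A compact sketch of FIND-S run on the EnjoySport examples (the implementation details are our own; the attribute order follows the table):

# FIND-S: start from the most specific hypothesis and minimally generalize it
# on every positive example; negative examples are ignored.
def find_s(examples):
    h = None  # stands for <Φ, Φ, Φ, Φ, Φ, Φ> until the first positive example
    for x, label in examples:
        if label != "Yes":
            continue                    # negatives never change h
        if h is None:
            h = list(x)                 # first positive example: copy it verbatim
        else:
            h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return h

enjoysport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(enjoysport))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']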
Definition: A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each
example (x, c(x)) in D.
Definition (Version space). A concept is complete if it covers all positive examples. A concept is consistent if
it covers none of the negative examples. The version space is the set of all complete and consistent concepts.
This set is convex and is fully defined by its least and most general elements.
FIGURE :A version space with its general and specific boundary sets. The version space includes all six
hypotheses shown here, but can be represented more simply by S and G. Arrows indicate instances of the
more-general-than relation. This is the version space for the EnjoySport concept learning problem and
training examples described in the EnjoySport table above.
The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses
from H that are consistent with an observed sequence of training examples. It begins by initializing the
version space to the set of all hypotheses in H; that is, by initializing the G boundary set to contain the most
general hypothesis in H
G0 ← {<?, ?, ?, ?, ?, ?>}
and initializing the S boundary set to contain the most specific (least general) hypothesis
S0 ← {<Φ, Φ, Φ, Φ, Φ, Φ>}
Algorithm: CANDIDATE-ELIMINATION algorithm using version spaces. Notice the duality in how
positive and negative examples influence S and G.
Steps:
Initialize G to the set of maximally general hypotheses in H, and S to the set of maximally specific
hypotheses in H. For each training example d, do:
If d is a positive example:
o Remove from G any hypothesis inconsistent with d.
o For each hypothesis s in S that is not consistent with d: remove s from S; add to S all minimal
generalizations h of s such that h is consistent with d and some member of G is more general than h;
remove from S any hypothesis that is more general than another hypothesis in S.
If d is a negative example:
o Remove from S any hypothesis inconsistent with d.
o For each hypothesis g in G that is not consistent with d: remove g from G; add to G all minimal
specializations h of g such that h is consistent with d and some member of S is more specific than h;
remove from G any hypothesis that is less general than another hypothesis in G.
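A sketch of the algorithm for this conjunctive hypothesis space follows. The encoding ("?" for any value, None for Φ) and all implementation details are our own; in this space the S boundary stays a single hypothesis, which simplifies the code.

def candidate_elimination(examples, domains):
    n = len(domains)
    s = tuple([None] * n)      # S boundary: most specific hypothesis, <Φ,...,Φ>
    G = {tuple(["?"] * n)}     # G boundary: most general hypothesis, <?,...,?>

    def match(h, x):           # does hypothesis h classify instance x as positive?
        return all(c == "?" or c == v for c, v in zip(h, x))

    def more_general(h2, h1):  # is h2 at least as general as h1?
        return all(c2 == "?" or c1 is None or c2 == c1 for c2, c1 in zip(h2, h1))

    for x, label in examples:
        if label == "Yes":
            # Positive example: prune G, minimally generalize s to cover x.
            G = {g for g in G if match(g, x)}
            s = tuple(v if c is None else (c if c == v else "?")
                      for c, v in zip(s, x))
        else:
            # Negative example: minimally specialize members of G that cover x,
            # keeping only specializations still more general than s.
            new_G = set()
            for g in G:
                if not match(g, x):
                    new_G.add(g)
                    continue
                for i in range(n):
                    if g[i] != "?":
                        continue
                    for v in domains[i]:
                        if v != x[i]:
                            h = g[:i] + (v,) + g[i + 1:]
                            if more_general(h, s):
                                new_G.add(h)
            G = new_G
    return s, G

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
s_final, G_final = candidate_elimination(examples, domains)
print(s_final)  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(G_final)  # {('Sunny','?','?','?','?','?'), ('?','Warm','?','?','?','?')}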
Training examples:
1. <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = Yes
2. <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = Yes
Training Example:
3. <Rainy, Cold, High, Strong, Warm, Change>, EnjoySport=No
FIGURE: CANDIDATE-ELIMINATION Trace 2. Training example 3 is a negative example that forces the G2
boundary to be specialized to G3. Note that several alternative maximally general hypotheses are included in G3.
Training Example:
4. <Sunny, Warm, High, Strong, Cool, Change>, EnjoySport = Yes
FIGURE: CANDIDATE-ELIMINATION Trace 3. The positive training example generalizes the S boundary, from S3
to S4. One member of G3 must also be deleted, because it is no longer more general than the S4 boundary.
The final version space for the EnjoySport concept learning problem and training examples described earlier.
Learning conjunctive concepts
An Unbiased Learner
The solution to the problem of assuring that the target concept is in the hypothesis space H is to provide
a hypothesis space capable of representing every teachable concept, that is, every possible subset of the
instances X.
The set of all subsets of a set X is called the power set of X
In the EnjoySport learning task, the size of the instance space X of days described by the six attributes is
3 · 2 · 2 · 2 · 2 · 2 = 96 instances.
Thus, there are 2^96 distinct target concepts that could be defined over this instance space, and the learner
might be called upon to learn any one of them.
The conjunctive hypothesis space is able to represent only 973 of these, a biased hypothesis space
indeed.
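These counts can be verified directly; a short sketch:

from math import prod

sizes = [3, 2, 2, 2, 2, 2]             # values per attribute: Sky has 3, the rest 2
print(prod(sizes))                     # 96 instances in X
print(2 ** prod(sizes))                # 2**96 distinct target concepts (subsets of X)
# Conjunctive hypotheses: each attribute takes a specific value or "?", and every
# hypothesis containing Φ denotes the same empty concept, hence the "+ 1".
print(1 + prod(k + 1 for k in sizes))  # 973 semantically distinct hypotheses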
Let us reformulate the EnjoySport learning task in an unbiased way by defining a new hypothesis space
H' that can represent every subset of instances
The target concept "Sky = Sunny or Sky = Cloudy" could then be described as
<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>
The importance of inductive bias: An inductive bias allows a learning algorithm to prioritize one solution
(or interpretation) over another, independent of the observed data. [...] Inductive biases can express
assumptions about either the data-generating process or the space of solutions.