
M.L. 3,5,6

Unit 3

Bias, Variance, Generalization, Underfitting, Overfitting, Linear Regression, Lasso Regression,
Ridge Regression, Gradient Descent Algorithm, Evaluation Metrics: MAE, RMSE, R²

1. Bias:
Bias refers to the error introduced by approximating a real-world problem with a simplified
model. High bias can lead to underfitting, where the model fails to capture the underlying
patterns in the data.

2. Variance:
Variance refers to the sensitivity of a model to fluctuations in the training data. High variance can
lead to overfitting, where the model becomes too complex and fits the noise in the data instead
of the underlying patterns.

3. Generalization:
Generalization refers to the ability of a model to perform well on unseen data. A model with
good generalization is able to accurately predict outcomes for new, unseen instances.

4. Underfitting:
Underfitting occurs when a model is too simple to capture the underlying patterns in the data. It
leads to high bias and poor performance on both the training and test data.

5. Overfitting:
Overfitting occurs when a model becomes too complex and fits the noise or random variations
in the training data. It leads to low bias but high variance, causing poor performance on the test
data.

6. Linear Regression:
Linear regression is a popular regression algorithm used to model the relationship between a
dependent variable and one or more independent variables. It assumes a linear relationship and
aims to find the best-fit line that minimizes the difference between the predicted and actual
values.
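
As a rough sketch (not from the notes, with made-up numbers), a linear regression can be fit with scikit-learn:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Toy data: y is roughly 2*x + 1 plus noise
    X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
    y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)  # learned slope and intercept
    print(model.predict([[6.0]]))         # prediction for an unseen input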

7. Lasso Regression:
Lasso Regression is a linear regression technique that incorporates L1 regularization. It adds a
penalty term to the loss function, forcing some of the coefficients to become exactly zero. Lasso
regression performs feature selection by shrinking less relevant features to zero.
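
A minimal Lasso sketch, again with invented data, where a sufficiently large alpha zeroes out an irrelevant feature:

    import numpy as np
    from sklearn.linear_model import Lasso

    # Toy data: the second feature is irrelevant noise (made-up values)
    X = np.array([[1, 5], [2, 3], [3, 8], [4, 1], [5, 7]], dtype=float)
    y = np.array([3.0, 5.1, 6.9, 9.2, 11.0])

    # alpha sets the L1 penalty strength; a large enough alpha shrinks
    # the irrelevant feature's coefficient exactly to zero
    lasso = Lasso(alpha=0.5).fit(X, y)
    print(lasso.coef_)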

8. Ridge Regression:
Ridge Regression is a linear regression technique that incorporates L2 regularization. It adds a
penalty term to the loss function, which discourages large weights in the model. Ridge
regression helps prevent overfitting by reducing the magnitude of the coefficients.
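
A matching Ridge sketch on the same kind of toy data; note the coefficients shrink but do not become exactly zero:

    import numpy as np
    from sklearn.linear_model import Ridge

    X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
    y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

    # alpha sets the L2 penalty strength; larger alpha shrinks the
    # coefficients toward zero (but, unlike Lasso, not exactly to zero)
    ridge = Ridge(alpha=1.0).fit(X, y)
    print(ridge.coef_, ridge.intercept_)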

9. Gradient Descent Algorithm:
Gradient Descent is an iterative optimization algorithm used to find the minimum of a loss
function. It starts with an initial set of model parameters and updates them iteratively in the
direction of steepest descent by calculating the gradients of the loss function with respect to the
parameters.
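
A from-scratch sketch of gradient descent for simple linear regression under an MSE loss; the data and learning rate are illustrative choices:

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

    w, b = 0.0, 0.0   # initial parameters
    lr = 0.01         # learning rate (step size)

    for _ in range(5000):
        y_pred = w * X + b
        error = y_pred - y
        grad_w = 2 * np.mean(error * X)  # dMSE/dw
        grad_b = 2 * np.mean(error)      # dMSE/db
        w -= lr * grad_w                 # step in the direction of steepest descent
        b -= lr * grad_b

    print(w, b)  # should approach slope ~2 and intercept ~1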

10. Evaluation Metrics:
Evaluation metrics are used to assess the performance of a model. Three common evaluation
metrics for regression problems are listed below; a short code sketch follows the list:

- Mean Absolute Error (MAE): It calculates the average absolute difference between the
predicted and actual values. MAE represents the average magnitude of the errors.
- Root Mean Squared Error (RMSE): It calculates the square root of the average squared
difference between the predicted and actual values. RMSE penalizes larger errors more heavily
than MAE and is expressed in the same units as the target variable.
- R² (R-squared): R² represents the proportion of the variance in the dependent variable that can
be explained by the independent variables. It typically ranges from 0 to 1, where a higher value
indicates a better fit; it can even be negative when the model fits worse than always predicting
the mean.
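
The three metrics can be computed with scikit-learn; the predictions below are made up for illustration:

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_pred = np.array([2.8, 5.4, 6.9, 9.5])

    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    print(mae, rmse, r2)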

Unit 5

Clustering is a fundamental task in unsupervised machine learning that involves grouping
similar data points together. Several clustering algorithms exist, each with its own characteristics
and applications. This section explains K-Means, K-medoids, hierarchical, density-based, and
spectral clustering; introduces outlier analysis with the Isolation Forest and Local Outlier Factor;
and covers evaluation metrics and scoring methods, including the elbow method and extrinsic
and intrinsic evaluation.

1. K-Means Clustering:
K-Means is a partition-based clustering algorithm. It aims to divide a dataset into K clusters,
where K is a predefined number. The algorithm iteratively assigns data points to the nearest
cluster centroid and recalculates the centroids until convergence. Each data point belongs to the
cluster with the nearest centroid.

Example: Consider a dataset of customer information, where each data point represents a
customer. K-Means can be used to cluster customers into segments based on their purchasing
patterns, such as high-value, medium-value, and low-value customers.
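
A minimal K-Means sketch with scikit-learn on invented 2-D points:

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy 2-D points forming two rough groups (illustrative values)
    X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)           # cluster index for each point
    print(kmeans.cluster_centers_)  # learned centroids
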
2. K-medoids Clustering:
K-medoids is similar to K-Means but uses medoids as cluster representatives instead of
centroids. A medoid is the most centrally located actual data point within a cluster, minimizing
the total dissimilarity to the other points. The algorithm iteratively selects data points as medoids,
reassigns the remaining points to the nearest medoid, and swaps medoids whenever a swap
reduces the total dissimilarity, until convergence.

Example: In a dataset of images, K-medoids clustering can be used to identify representative
images for various themes or categories, such as landscapes, animals, and architecture.
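
A comparable K-medoids sketch, assuming the optional scikit-learn-extra package is installed (its KMedoids estimator is not part of core scikit-learn):

    import numpy as np
    from sklearn_extra.cluster import KMedoids  # requires scikit-learn-extra

    X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

    kmedoids = KMedoids(n_clusters=2, random_state=0).fit(X)
    print(kmedoids.labels_)
    print(kmedoids.cluster_centers_)  # actual data points chosen as medoids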

3. Hierarchical Clustering:
Hierarchical clustering creates a hierarchy of clusters using either an agglomerative (bottom-up)
or divisive (top-down) approach. It starts with each data point as an individual cluster and
merges or splits clusters based on similarity metrics. The result is a tree-like structure called a
dendrogram, which can be cut at different levels to obtain different numbers of clusters.

Example: Hierarchical clustering can be used in genetics to classify patients into different
subgroups based on gene expression levels, aiding in the identification of distinct disease
profiles.
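
A short agglomerative sketch using SciPy, with the dendrogram cut into two clusters; the points are invented:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

    Z = linkage(X, method="ward")                    # agglomerative merge tree
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut dendrogram into 2 clusters
    print(labels)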

4. Density-based Clustering:
Density-based clustering, such as DBSCAN (Density-Based Spatial Clustering of Applications
with Noise), groups together data points based on their density in the feature space. It defines
clusters as regions of high density separated by regions of low density. Data points that do not
belong to any cluster are considered outliers or noise.

Example: Density-based clustering can be applied to identify traffic congestion patterns in a city
based on GPS data, where clusters represent congested areas.
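
A minimal DBSCAN sketch with scikit-learn; the lone far-away point comes out labelled -1 (noise):

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Two dense groups plus one far-away point that becomes noise
    X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]], dtype=float)

    db = DBSCAN(eps=3, min_samples=2).fit(X)
    print(db.labels_)  # -1 marks points labelled as noise/outliers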

5. Spectral Clustering:
Spectral clustering is a technique that utilizes the eigenvectors of a similarity matrix to perform
dimensionality reduction and clustering. It treats data points as nodes in a graph and performs
clustering based on the graph's Laplacian matrix. Spectral clustering can handle complex
cluster shapes and is effective for non-linearly separable data.

Example: Spectral clustering can be used to group documents into topics based on their
content, where each document is represented as a vector in a high-dimensional space.
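
A small spectral clustering sketch with scikit-learn on made-up points, using the default RBF affinity:

    import numpy as np
    from sklearn.cluster import SpectralClustering

    X = np.array([[1, 1], [2, 1], [1, 0], [4, 7], [3, 5], [3, 6]], dtype=float)

    sc = SpectralClustering(n_clusters=2, affinity="rbf", random_state=0).fit(X)
    print(sc.labels_)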

Outlier Analysis:
Outliers are data points that significantly deviate from the norm or expected patterns. Outlier
analysis helps identify and understand these unusual observations. Two commonly used
methods are:

1. Isolation Forest: The isolation forest algorithm isolates outliers by randomly selecting a
feature and then randomly selecting a split value between the minimum and maximum values of
that feature. Outliers are expected to require fewer splits to be isolated compared to normal data
points.
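
A minimal Isolation Forest sketch with scikit-learn; the obvious outlier in the toy data should be flagged:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Mostly "normal" points plus one obvious outlier
    X = np.array([[0.1], [0.2], [0.15], [0.3], [0.25], [10.0]])

    iso = IsolationForest(random_state=0).fit(X)
    print(iso.predict(X))  # +1 = inlier, -1 = outlier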

2. Local Outlier Factor (LOF): LOF measures the local deviation of a data point with respect to
its neighbors. It calculates the ratio of the average local density of the point's neighbors to the
point's own local density. Points with a significantly lower density than their neighbors are
considered outliers.
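
A matching LOF sketch with scikit-learn, again on invented points:

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [0.1, -0.1], [5.0, 5.0]])

    lof = LocalOutlierFactor(n_neighbors=2)
    print(lof.fit_predict(X))            # +1 = inlier, -1 = outlier
    print(lof.negative_outlier_factor_)  # more negative = more anomalous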

Evaluation Metrics and Scoring:
To assess the quality of clustering results, several evaluation metrics and scoring methods are
used. Here are a few examples:

1. Elbow Method: The elbow method helps determine the optimal number of clusters (K) for
algorithms like K-Means. It plots the within-cluster sum of squares (WCSS) against the number
of clusters and suggests selecting the number of clusters at the "elbow" point where the
improvement in WCSS starts to diminish significantly.
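
A short sketch of the elbow method: scikit-learn's inertia_ attribute is the WCSS, computed here for a range of K on synthetic data:

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.RandomState(0).rand(100, 2)  # synthetic data

    # inertia_ is the within-cluster sum of squares (WCSS)
    for k in range(1, 8):
        wcss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        print(k, wcss)  # plot k vs. WCSS and look for the "elbow"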

2. Extrinsic Evaluation: Extrinsic evaluation compares clustering results against externally
available ground truth labels or known class assignments. Metrics such as precision, recall,
F1-score, and Rand Index are used to measure the agreement between the clustering and
ground truth.
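
As a sketch, the adjusted Rand Index from scikit-learn, with hypothetical ground-truth and cluster labels:

    from sklearn.metrics import adjusted_rand_score

    truth = [0, 0, 1, 1, 2, 2]       # known class assignments
    clustering = [1, 1, 0, 0, 2, 2]  # labels produced by a clustering algorithm

    # 1.0 means perfect agreement up to a relabelling of the clusters
    print(adjusted_rand_score(truth, clustering))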

3. Intrinsic Evaluation: Intrinsic evaluation assesses the quality of clustering based on internal
criteria, without using external labels. Metrics like the silhouette coefficient, Calinski-Harabasz
index, and Davies-Bouldin index measure properties such as cluster compactness and
separation.
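
A silhouette-score sketch with scikit-learn on toy points:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    print(silhouette_score(X, labels))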

These evaluation metrics help quantify the performance of clustering algorithms and guide the
selection of appropriate parameters and techniques.

Unit 6

Artificial Neural Networks (ANNs) are a class of machine learning models inspired by the
structure and function of the human brain. ANNs consist of interconnected artificial neurons or
nodes that process and transmit information. This section explains several ANN architectures,
including Single Layer Neural Networks, the Multilayer Perceptron, Back Propagation Learning,
the Functional Link Artificial Neural Network, and the Radial Basis Function Network, along with
activation functions and brief introductions to Recurrent Neural Networks (RNNs) and
Convolutional Neural Networks (CNNs).

1. Single Layer Neural Network:
A Single Layer Neural Network, also known as a Single Layer Perceptron, is the simplest form
of an ANN. It consists of one layer of artificial neurons connected directly to the input features
and produces a single output. Single Layer Neural Networks can only learn linearly separable
patterns and are limited in their representation power.
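
A small sketch using scikit-learn's Perceptron on the AND gate, which is linearly separable (XOR, by contrast, would not be learnable this way):

    import numpy as np
    from sklearn.linear_model import Perceptron

    # AND gate: linearly separable, so a single-layer perceptron can learn it
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])

    clf = Perceptron(max_iter=100).fit(X, y)
    print(clf.predict(X))  # expected: [0 0 0 1]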

2. Multilayer Perceptron (MLP):
A Multilayer Perceptron is a feedforward neural network consisting of multiple layers of artificial
neurons. It has one input layer, one or more hidden layers, and one output layer. The neurons
are organized in a sequential manner, and information flows only in one direction, from the input
layer through the hidden layers to the output layer. MLPs are capable of learning complex
patterns and are widely used in various applications.
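
A minimal MLP sketch with scikit-learn on XOR, the classic problem a single-layer network cannot solve; the layer size and solver are arbitrary choices:

    from sklearn.neural_network import MLPClassifier

    # XOR is not linearly separable; a hidden layer makes it learnable
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]

    mlp = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                        max_iter=1000, random_state=0)
    mlp.fit(X, y)
    print(mlp.predict(X))  # ideally [0 1 1 0]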

3. Back Propagation Learning:
Back Propagation Learning is a training algorithm commonly used with Multilayer Perceptrons. It
aims to adjust the weights and biases of the network by minimizing the error between the
predicted output and the desired output. The algorithm iteratively computes the gradients of the
error with respect to the network parameters and updates them using gradient descent
optimization.
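
A from-scratch backpropagation sketch for a tiny 2-4-1 sigmoid network on XOR; the layer sizes, learning rate, and epoch count are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(size=(2, 4))  # hidden-layer weights
    b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1))  # output-layer weights
    b2 = np.zeros(1)
    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    lr = 0.5

    for _ in range(10000):
        h = sigmoid(X @ W1 + b1)                # forward pass: hidden layer
        out = sigmoid(h @ W2 + b2)              # forward pass: output
        d_out = (out - y) * out * (1 - out)     # error gradient at the output
        d_h = (d_out @ W2.T) * h * (1 - h)      # propagated back to the hidden layer
        W2 -= lr * (h.T @ d_out)                # gradient descent updates
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)

    print(out.round(2).ravel())  # should approach [0, 1, 1, 0]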

4. Functional Link Artificial Neural Network (FLANN):
A Functional Link Artificial Neural Network extends the capabilities of a Single Layer Neural
Network by incorporating additional nonlinear transformations of the input features. These
transformations, known as functional link units, enable FLANNs to learn and represent more
complex patterns and relationships.
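
A rough FLANN-style sketch (an illustration of the idea, not a standard library API): expand the input with fixed trigonometric functions and fit a single linear layer on the expanded features:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    x = np.linspace(0, 1, 50).reshape(-1, 1)
    y = np.sin(2 * np.pi * x).ravel()  # a nonlinear target

    # Trigonometric functional expansion of the single input feature
    X_expanded = np.hstack([x, np.sin(np.pi * x), np.cos(np.pi * x),
                            np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)])

    model = LinearRegression().fit(X_expanded, y)
    print(model.score(X_expanded, y))  # near 1.0: the expansion captures the nonlinearity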

5. Radial Basis Function Network (RBFN):
A Radial Basis Function Network is a type of feedforward neural network that uses radial basis
functions as activation functions. RBFNs are commonly used for pattern recognition and
function approximation tasks. They have a hidden layer consisting of radial basis function units
that compute their activations based on the distance between the input and the center of each
unit.
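
A rough RBFN sketch built from standard pieces (an illustration, not a library implementation): K-Means picks the centres, Gaussian activations form the hidden layer, and least squares fits the output weights:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X).ravel()

    centers = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X).cluster_centers_
    width = 1.0  # shared RBF width (a free design choice)

    # Hidden-layer activations: Gaussian of the distance to each centre
    H = np.exp(-((X - centers.T) ** 2) / (2 * width ** 2))

    out = LinearRegression().fit(H, y)  # output weights via least squares
    print(out.score(H, y))              # close to 1.0 on this smooth target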

Activation Functions:
Activation functions introduce nonlinearity into ANNs, enabling them to learn complex
relationships. Common activation functions include the following; a short code sketch follows the list:

- Sigmoid: S-shaped curve that maps inputs to a range between 0 and 1.
- Rectified Linear Unit (ReLU): Activation is 0 for negative inputs and linear for positive inputs.
- Hyperbolic Tangent (tanh): S-shaped curve that maps inputs to a range between -1 and 1.
- Softmax: Used in the output layer for multi-class classification problems, it produces a
probability distribution over the classes.
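
NumPy versions of the four functions (a small sketch):

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))  # output in (0, 1)

    def relu(z):
        return np.maximum(0, z)      # 0 for negatives, identity otherwise

    def tanh(z):
        return np.tanh(z)            # output in (-1, 1)

    def softmax(z):
        e = np.exp(z - np.max(z))    # shift for numerical stability
        return e / e.sum()           # probabilities summing to 1

    print(sigmoid(0.0), relu(-2.0), tanh(1.0), softmax(np.array([1.0, 2.0, 3.0])))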

Introduction to Recurrent Neural Networks (RNNs):
RNNs are designed to process sequential data, where information flows in a recurrent manner,
allowing the network to maintain an internal memory. RNNs have feedback connections that
allow the output of a neuron to serve as input to itself or other neurons in the network. This
recurrent structure enables RNNs to model temporal dependencies and handle tasks such as
natural language processing and time series prediction.
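
A minimal recurrent model sketch in Keras, assuming TensorFlow is installed; the sequence length and layer sizes are arbitrary:

    import tensorflow as tf

    # Sequences of 10 time steps with 1 feature each
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10, 1)),
        tf.keras.layers.SimpleRNN(32),  # hidden state carries memory across steps
        tf.keras.layers.Dense(1),       # e.g. predict the next value
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()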

Introduction to Convolutional Neural Networks (CNNs):
CNNs are specialized neural networks designed for processing grid-like data, such as images or
time series data. They consist of convolutional layers, pooling layers, and fully connected layers.
Convolutional layers use filters to detect local patterns and spatial hierarchies, while pooling
layers downsample the feature maps to extract the most relevant information. CNNs have
achieved significant success in computer
vision tasks, including image classification, object detection, and image segmentation.
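
A minimal CNN sketch in Keras (assuming TensorFlow is installed) for 28x28 grayscale images and 10 classes:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),  # local pattern detectors
        tf.keras.layers.MaxPooling2D((2, 2)),                   # downsample feature maps
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),        # class probabilities
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()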
