M.L. Units 3, 5, 6
Unit 3
1. Bias:
Bias refers to the error introduced by approximating a real-world problem with a simplified
model. High bias can lead to underfitting, where the model fails to capture the underlying
patterns in the data.
2. Variance:
Variance refers to the sensitivity of a model to fluctuations in the training data. High variance can
lead to overfitting, where the model becomes too complex and fits the noise in the data instead
of the underlying patterns.
3. Generalization:
Generalization refers to the ability of a model to perform well on unseen data. A model with
good generalization is able to accurately predict outcomes for new, unseen instances.
4. Underfitting:
Underfitting occurs when a model is too simple to capture the underlying patterns in the data. It
leads to high bias and poor performance on both the training and test data.
5. Overfitting:
Overfitting occurs when a model becomes too complex and fits the noise or random variations
in the training data. It leads to low bias but high variance, causing poor performance on the test
data.
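As a rough illustration (a sketch only, assuming scikit-learn and a synthetic noisy sine dataset, neither of which is part of these notes), fitting polynomial models of increasing degree shows underfitting at low degree and overfitting at high degree:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: a noisy sine curve (illustrative only).
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # too simple, reasonable, very flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    # Low degrees tend to underfit (both errors high); very high degrees
    # tend to overfit (train error low, test error noticeably higher).
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")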
6. Linear Regression:
Linear regression is a popular regression algorithm used to model the relationship between a
dependent variable and one or more independent variables. It assumes a linear relationship and
aims to find the best-fit line that minimizes the sum of squared differences between the
predicted and actual values (ordinary least squares).
8. Ridge Regression:
Ridge Regression is a linear regression technique that incorporates L2 regularization. It adds a
penalty term to the loss function, which discourages large weights in the model. Ridge
regression helps prevent overfitting by reducing the magnitude of the coefficients.
Regression Evaluation Metrics:
- Mean Absolute Error (MAE): It calculates the average absolute difference between the
predicted and actual values. MAE represents the average magnitude of the errors.
- Root Mean Squared Error (RMSE): It calculates the square root of the average squared
difference between the predicted and actual values. RMSE penalizes larger errors more than
MAE and provides a measure of the standard deviation of the errors.
- R2 (R-squared): R2 represents the proportion of the variance in the dependent variable that
can be explained by the independent variables. It typically ranges from 0 to 1 (and can be
negative when a model fits worse than simply predicting the mean), where a higher value
indicates a better fit of the model to the data.
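A minimal sketch tying these ideas together, assuming scikit-learn and a small synthetic dataset (not specified in the notes): it fits plain linear regression and ridge regression and reports MAE, RMSE, and R2 on held-out data.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic data with a known linear relationship plus noise.
rng = np.random.RandomState(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for name, model in [("Linear", LinearRegression()), ("Ridge", Ridge(alpha=1.0))]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    rmse = np.sqrt(mean_squared_error(y_test, pred))  # RMSE = square root of MSE
    r2 = r2_score(y_test, pred)
    print(f"{name}: MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")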
Unit 5
1. K-Means Clustering:
K-Means is a partition-based clustering algorithm. It aims to divide a dataset into K clusters,
where K is a predefined number. The algorithm iteratively assigns data points to the nearest
cluster centroid and recalculates the centroids until convergence. Each data point belongs to the
cluster with the nearest centroid.
Example: Consider a dataset of customer information, where each data point represents a
customer. K-Means can be used to cluster customers into segments based on their purchasing
patterns, such as high-value, medium-value, and low-value customers.
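A short sketch of this customer-segmentation idea, assuming scikit-learn and made-up spend/frequency features (illustrative values only):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features: [annual spend, purchase frequency].
X = np.array([[5200, 42], [4800, 39], [900, 6], [1100, 8],
              [2500, 20], [2700, 22], [300, 2], [5600, 45]])
X_scaled = StandardScaler().fit_transform(X)  # scale features before clustering

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
print(kmeans.labels_)           # cluster index for each customer
print(kmeans.cluster_centers_)  # centroids in scaled feature space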
2. K-medoids Clustering:
K-medoids is similar to K-Means but uses medoids as representatives of clusters instead of
centroids. A medoid is the most centrally located point within a cluster, minimizing the
dissimilarity to other points. The algorithm iteratively selects data points as medoids and
updates the cluster assignments until convergence.
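One way to run K-medoids in Python is the KMedoids estimator from the optional scikit-learn-extra package; this is an assumed library choice, since the notes do not name one.

import numpy as np
from sklearn_extra.cluster import KMedoids  # requires the scikit-learn-extra package

X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.1],
              [8.0, 8.0], [8.3, 7.7], [7.9, 8.2]])

# Medoids are actual data points, which makes the method more robust to
# outliers and usable with arbitrary distance metrics.
kmedoids = KMedoids(n_clusters=2, metric="euclidean", random_state=0).fit(X)
print(kmedoids.labels_)           # cluster assignment per point
print(kmedoids.cluster_centers_)  # the chosen medoids (rows of X)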
3. Hierarchical Clustering:
Hierarchical clustering creates a hierarchy of clusters using either an agglomerative (bottom-up)
or divisive (top-down) approach. It starts with each data point as an individual cluster and
merges or splits clusters based on similarity metrics. The result is a tree-like structure called a
dendrogram, which can be cut at different levels to obtain different numbers of clusters.
Example: Hierarchical clustering can be used in genetics to classify patients into different
subgroups based on gene expression levels, aiding in the identification of distinct disease
profiles.
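A brief sketch using SciPy's agglomerative linkage (an assumed library choice); each row of X stands in for one observation, e.g. a patient's expression levels in the genetics example.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D data; two well-separated groups.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.1]])

Z = linkage(X, method="ward")                    # bottom-up (agglomerative) merging
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 clusters
print(labels)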
4. Density-based Clustering:
Density-based clustering, such as DBSCAN (Density-Based Spatial Clustering of Applications
with Noise), groups together data points based on their density in the feature space. It defines
clusters as regions of high density separated by regions of low density. Data points that do not
belong to any cluster are considered outliers or noise.
Example: Density-based clustering can be applied to identify traffic congestion patterns in a city
based on GPS data, where clusters represent congested areas.
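A small DBSCAN sketch, assuming scikit-learn and toy coordinates rather than real GPS data:

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated point (treated as noise).
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
              [4.0, 4.0], [4.1, 3.9], [3.9, 4.1],
              [10.0, 10.0]])

db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(db.labels_)  # label -1 marks noise/outlier points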
5. Spectral Clustering:
Spectral clustering is a technique that utilizes the eigenvectors of a similarity matrix to perform
dimensionality reduction and clustering. It treats data points as nodes in a graph and performs
clustering based on the graph's Laplacian matrix. Spectral clustering can handle complex
cluster shapes and is effective for non-linearly separable data.
Example: Spectral clustering can be used to group documents into topics based on their
content, where each document is represented as a vector in a high-dimensional space.
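A small sketch, assuming scikit-learn and the classic two-moons toy dataset (not from the notes) to show the non-linearly separable case:

import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Two interleaving half-moons: not linearly separable, so K-Means struggles
# while spectral clustering can recover the two shapes.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0)
labels = sc.fit_predict(X)
print(np.bincount(labels))  # roughly 100 points per cluster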
Outlier Analysis:
Outliers are data points that significantly deviate from the norm or expected patterns. Outlier
analysis helps identify and understand these unusual observations. Two commonly used
methods are:
1. Isolation Forest: The isolation forest algorithm isolates outliers by randomly selecting a
feature and then randomly selecting a split value between the minimum and maximum values of
that feature. Outliers are expected to require fewer splits to be isolated compared to normal data
points.
2. Local Outlier Factor (LOF): LOF measures the local deviation of a data point with respect to
its neighbors. It calculates the ratio of the average local density of its neighbors to
its own local density. Points with a significantly lower density compared to their neighbors are
considered outliers.
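Both detectors are available in scikit-learn (an assumed library choice); a minimal sketch on synthetic data with two planted outliers:

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# A tight cluster plus two obvious outliers (synthetic, illustrative).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)),
               [[6.0, 6.0], [-7.0, 5.0]]])

iso = IsolationForest(random_state=0).fit(X)
print(iso.predict(X)[-2:])      # -1 flags anomalies, +1 normal points

lof = LocalOutlierFactor(n_neighbors=20)
print(lof.fit_predict(X)[-2:])  # same convention: -1 = outlier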
Cluster Evaluation:
1. Elbow Method: The elbow method helps determine the optimal number of clusters (K) for
algorithms like K-Means. It plots the within-cluster sum of squares (WCSS) against the number
of clusters and suggests selecting the number of clusters at the "elbow" point where the
improvement in WCSS starts to diminish significantly.
3. Intrinsic Evaluation: Intrinsic evaluation assesses the quality of clustering based on internal
criteria without using external labels. Metrics such as the silhouette coefficient, Calinski-Harabasz
index, and Davies-Bouldin index quantify how compact and well separated the clusters are.
These evaluation metrics help quantify the performance of clustering algorithms and guide the
selection of appropriate parameters and techniques.
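A compact sketch of the elbow method together with an intrinsic metric (the silhouette coefficient), assuming scikit-learn and synthetic blob data:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

# Elbow method: WCSS (KMeans "inertia_") versus K; look for the bend where
# further increases in K give only small improvements.
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sil = silhouette_score(X, km.labels_)  # intrinsic metric in [-1, 1]
    print(f"K={k}  WCSS={km.inertia_:.1f}  silhouette={sil:.3f}")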
Unit 6
Artificial Neural Networks (ANNs) are a class of machine learning models inspired by the
structure and function of the human brain. ANNs consist of interconnected artificial neurons or
nodes that process and transmit information. Key architectures include Single Layer Neural
Networks, the Multilayer Perceptron, Back Propagation Learning, the Functional Link Artificial
Neural Network, and the Radial Basis Function Network, along with activation functions,
Recurrent Neural Networks (RNNs), and Convolutional Neural Networks (CNNs).
Activation Functions:
Activation functions introduce nonlinearity into ANNs, enabling them to learn complex
relationships. Common activation functions include the sigmoid, tanh, and ReLU functions,
illustrated in the sketch below.
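A quick NumPy sketch of the three functions named above (library choice assumed):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes inputs into (0, 1)

def tanh(x):
    return np.tanh(x)                # squashes inputs into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)        # zero for negatives, identity otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))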
Convolutional Neural Networks (CNNs) apply learned convolutional filters and pooling
operations to the input to extract the most relevant information. CNNs have achieved significant
success in computer vision tasks, including image classification, object detection, and image
segmentation.
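A minimal CNN sketch for image classification, assuming TensorFlow/Keras (not named in the notes) and 28x28 grayscale inputs with 10 classes:

import tensorflow as tf
from tensorflow.keras import layers

# Tiny CNN for 28x28 grayscale images with 10 classes (illustrative sizes).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),        # downsample, keeping strong activations
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()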