ML Unit 1
1. **Deletion**: Remove rows (or columns) that contain missing values.
```python
import pandas as pd

# Example dataset with missing values
data = {'A': [1, 2, None, 4],
        'B': [5, None, 7, 8]}
df = pd.DataFrame(data)

# Drop any row that contains a missing value
df_dropped = df.dropna()
```
2. **Simple Imputation**: Fill in missing values with a summary statistic such
as the column mean, median, or mode.
```python
from sklearn.impute import SimpleImputer

# Replace each missing value with the mean of its column
imputer = SimpleImputer(strategy='mean')
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```
3. **Model-Based Imputation**: Predict each missing value from the other
features with a machine learning model (here a random forest, wired into
scikit-learn's IterativeImputer).
```python
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

# Iteratively predict missing entries from the other columns
rf_imputer = IterativeImputer(estimator=RandomForestRegressor(random_state=0))
df_model_imputed = pd.DataFrame(rf_imputer.fit_transform(df), columns=df.columns)
```
Each approach has its own advantages and limitations, and the choice depends
on the nature of the data, the amount of missing data, and the specific
requirements of the machine learning task.
1. **Data Preprocessing**:
- This stage involves preparing the dataset for model training by cleaning,
transforming, and scaling the data as necessary.
- Example: Suppose you have a dataset of house prices with features like
number of bedrooms, square footage, and location. In the preprocessing stage,
you might handle missing values, encode categorical variables, and scale
numerical features to a common range.
2. **Model Selection and Training**:
- This stage involves selecting an appropriate machine learning algorithm,
training the model on the prepared dataset, and tuning hyperparameters to
optimize performance.
- Example: After preprocessing the house price dataset, you might choose to
use a regression algorithm such as linear regression, decision tree regression,
or a more sophisticated model like random forest regression. You then train the
chosen model on the preprocessed data and adjust hyperparameters through
techniques like cross-validation to improve performance.
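As a concrete (and deliberately tiny) sketch of these two stages, the snippet
below uses scikit-learn on an invented house-price table; the column names,
values, and the choice of a random forest are illustrative assumptions, not a
prescribed setup.
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy house-price data (values invented for illustration)
houses = pd.DataFrame({
    'bedrooms': [3, 2, None, 4, 3],
    'sqft':     [1500, 900, 1200, 2000, None],
    'location': ['urban', 'rural', 'urban', 'suburban', 'rural'],
    'price':    [300000, 150000, 220000, 450000, 180000],
})
X, y = houses.drop(columns='price'), houses['price']

# Stage 1: preprocessing - impute and scale the numeric features,
# one-hot encode the categorical feature
preprocess = ColumnTransformer([
    ('num', Pipeline([('impute', SimpleImputer(strategy='median')),
                      ('scale', StandardScaler())]), ['bedrooms', 'sqft']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['location']),
])

# Stage 2: model selection and training - score with cross-validation,
# then fit the chosen pipeline on all the data
model = Pipeline([('prep', preprocess),
                  ('rf', RandomForestRegressor(random_state=0))])
scores = cross_val_score(model, X, y, cv=2, scoring='neg_mean_absolute_error')
model.fit(X, y)
```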
These stages are iterative and may involve revisiting previous steps based on
the evaluation results or changes in requirements. Additionally, it's crucial to
continually monitor and update the model as new data becomes available or as
the underlying problem domain evolves.
4. What are the differences between machine learning and deep learning?
Ans-

| S.No. | Machine Learning | Deep Learning |
|-------|------------------|---------------|
| 1 | Its models take less time to train because they are comparatively small. | Training takes a huge amount of time because of the very large number of data points. |
| 2 | The results of an ML model are easy to explain. | The results of deep learning are difficult to explain. |
1. **Spam Filters**:
- Supervised learning algorithms, such as Naive Bayes or Support Vector
Machines, are employed by email clients to differentiate between spam and
non-spam emails (see the sketch after this list).
- The algorithms are trained on a labeled dataset consisting of examples of
both spam and legitimate emails.
- During training, the algorithms learn patterns and features characteristic of
spam and use this knowledge to classify incoming emails as either spam or
non-spam.
2. **Fraud Detection**:
- Financial institutions utilize supervised learning algorithms to detect
fraudulent transactions in real-time.
- These algorithms are trained on a dataset containing labeled examples of
fraudulent and non-fraudulent transactions.
- By analyzing transaction features such as amount, location, and frequency,
the algorithms learn to identify anomalous patterns indicative of fraudulent
activity.
3. **Recommendation Systems**:
- Online platforms like Netflix and Amazon leverage supervised learning
algorithms to provide personalized recommendations to users.
- These algorithms learn from historical user interactions, such as movies
watched or products purchased, in a labeled dataset.
- Using techniques like collaborative filtering or matrix factorization, the
algorithms predict user preferences and suggest similar items that users might
be interested in.
4. **Speech Recognition**:
- Voice assistants like Siri and Alexa rely on supervised learning algorithms to
understand and respond to spoken commands.
- The algorithms are trained on a dataset containing transcribed speech
paired with the corresponding text labels.
- By analyzing the acoustic features of speech signals and their corresponding
textual representations, the algorithms learn to recognize and interpret spoken
commands accurately.
5. **Image Classification**:
- Image recognition systems, such as those employed by social media
platforms, use supervised learning algorithms to classify images based on their
content.
- These algorithms are trained on a labeled dataset comprising images
annotated with their corresponding categories (e.g., cats, dogs).
- Through techniques like convolutional neural networks (CNNs), the
algorithms learn hierarchical representations of image features and can
accurately classify new images into predefined categories.
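To make the spam-filter example concrete, here is a minimal sketch of training
a Naive Bayes classifier on a tiny labeled corpus; the example messages and
labels are invented purely for illustration.
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny labeled corpus (invented): 1 = spam, 0 = not spam
emails = ["win a free prize now", "meeting at 10am tomorrow",
          "claim your free reward", "project report attached"]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a Naive Bayes classifier
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(emails, labels)

# Classify a new message; likely [1] (spam) given the learned word patterns
print(spam_filter.predict(["free prize waiting for you"]))
```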
1. **Interpretability**:
- If interpretability is crucial, simpler models like logistic regression or
decision trees may be preferred as they provide easily interpretable results.
- More complex models like random forests or neural networks may offer
higher accuracy but are often considered black-box models, making
interpretation more challenging.
2. **Scalability**:
- Consider the scalability of the algorithm with respect to the size of the
dataset. Some algorithms, like linear models, are highly scalable and suitable
for large datasets, while others, like k-nearest neighbors, may be less efficient.
3. **Performance Metrics**:
- Consider the evaluation metrics relevant to your problem (e.g., accuracy,
precision, recall, F1-score) and choose algorithms that optimize those metrics
effectively.
- For imbalanced datasets, algorithms that handle class imbalance well, such
as those with class weights or algorithms specifically designed for imbalanced
data, may be preferred.
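As one illustration of the last point, many scikit-learn classifiers can
re-weight classes for imbalanced data; the sketch below uses a synthetic
imbalanced dataset (an assumption for the example) and logistic regression.
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced dataset: roughly 90% negatives, 10% positives
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# class_weight='balanced' up-weights the minority class during training
clf = LogisticRegression(class_weight='balanced', max_iter=1000)
clf.fit(X_train, y_train)

# F1-score is a more informative metric than accuracy here
print(f1_score(y_test, clf.predict(X_test)))
```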
1. Classification Metrics: accuracy, precision, recall, F1-score, and ROC-AUC.
2. Regression Metrics: mean absolute error (MAE), mean squared error (MSE),
root mean squared error (RMSE), and R².
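Both families of metrics are available in sklearn.metrics; a minimal sketch
with made-up labels and predictions:
```python
from sklearn.metrics import (accuracy_score, f1_score,
                             mean_squared_error, r2_score)

# Made-up classification labels and predictions
y_true_cls, y_pred_cls = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
print(accuracy_score(y_true_cls, y_pred_cls))  # fraction of correct predictions
print(f1_score(y_true_cls, y_pred_cls))        # harmonic mean of precision and recall

# Made-up regression targets and predictions
y_true_reg, y_pred_reg = [2.0, 3.5, 4.0], [2.2, 3.0, 4.1]
print(mean_squared_error(y_true_reg, y_pred_reg))  # average squared error
print(r2_score(y_true_reg, y_pred_reg))            # proportion of variance explained
```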
1. Train-Test Split:
The simplest method is to split your data into a training set and a testing
set. Train each candidate model on the training set and evaluate their
performance on the testing set. Choose the model with the best
performance on the testing set.
2. K-Fold Cross-Validation:
A more robust method is to split the data into k folds, train each
candidate model on k-1 folds, and validate on the held-out fold, rotating
so every fold serves as the validation set once; the k scores are then
averaged. With either method, the overall selection procedure is (see the
sketch after these steps):
1. Split your dataset into training and validation sets (or use cross-
validation).
2. Train each candidate model on the training set (or k-1 folds in
cross-validation).
3. Evaluate each model’s performance on the validation set (or the
kth fold in cross-validation) using appropriate evaluation
metrics.
4. Compare the models’ performance and select the best one for
your problem.
5. Train the chosen model on the entire dataset and use it to make
predictions on new data.
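A minimal sketch of this procedure, comparing two candidate models with 5-fold
cross-validation; the candidates and the iris dataset are chosen only for
illustration.
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
candidates = {
    'logreg': LogisticRegression(max_iter=1000),
    'forest': RandomForestClassifier(random_state=0),
}

# Steps 1-3: evaluate each candidate with 5-fold cross-validation
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}

# Steps 4-5: pick the best performer and refit it on all the data
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)
print(best_name, scores)
```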
**Steps**:
1. **Split Data**: Split the dataset into training (70%), validation (15%), and
test (15%) sets.
2. **Train Models**: Train each candidate model (e.g., random forest, gradient
boosting) on the training set using default hyperparameters.
3. **Select Best Model**: Choose the model with the highest F1-score on the
validation set. Let's say the GBM model performs the best.
4. **Final Evaluation**: Evaluate the selected GBM model on the test set to get
an unbiased estimate of its performance. If the F1-score on the test set is
satisfactory and the model generalizes well, proceed to deployment (a minimal
sketch of these steps follows below).
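A minimal sketch of these steps on synthetic data; the candidate models are
invented for illustration, with "GBM" standing in for scikit-learn's
GradientBoostingClassifier.
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# 70% train, 15% validation, 15% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                                random_state=0)

# Train candidates with default hyperparameters; select by validation F1
candidates = {'rf': RandomForestClassifier(random_state=0),
              'gbm': GradientBoostingClassifier(random_state=0)}
val_f1 = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_f1[name] = f1_score(y_val, model.predict(X_val))

best = candidates[max(val_f1, key=val_f1.get)]

# Unbiased final estimate on the held-out test set
print(f1_score(y_test, best.predict(X_test)))
```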
Throughout this process, it's essential to document each step, including the
evaluation metrics, hyperparameters, and model performance, to ensure
reproducibility and transparency.