Final Project Report


Topic:- Applying Machine Learning Algorithms for Analyzing and Predicting Agricultural (Crop) Performance with Different Types of Fertilizer, Temperature, Humidity, and Rainfall

Submitted By:-Saswata Banerjee

Submitted To:-Partha Koley

Course Name:-Machine Learning With Python

Euphoria GenX

----------------------------X--------------------------------
Index:-

1. Abstract
2. Acknowledgement
3. SDK (Kits)
4. Model
5. Machine Learning
6. Supervised and Unsupervised Learning
7. Python
8. Workflow Project
9. The Elbow Method
10. Distribution of Agricultural Conditions
11. Prediction of Crops
12. Confusion Matrix using Logistic Regression and K-Means
13. Classification Report for Logistic Regression
14. Source Code and Output
15. Conclusion
16. Future Scope
17. Bibliography

------------------------------------------------------------------------X-----------------------------------------------------------------------------------

1.Abstract:-

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and methods that drive it, from processing the massive volumes of data generated each day to learning from them and taking useful action. In this report we first introduce the characteristics and features of the Python programming language. Python is one of the most preferred languages for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries. This report offers insight into the field of machine learning with Python, taking a tour through the important topics and Python libraries that make developing a machine learning model an easy process. We then look at the different types of machine learning and various machine learning algorithms. Finally, we look at one of the most widely used models, Linear Regression.

Linear Regression is a machine learning algorithm based on supervised learning. It performs a regression task: it predicts the value of one variable based on the value of another. The variable you want to predict is called the dependent variable, and the variable you use to make the prediction is called the independent variable.

Hypothesis function for linear regression:-

y = mx + c

where m is the slope of the line and c is the intercept. Finally, this report walks through a linear regression model for an ice-cream selling company that predicts the sales made by the business at different temperatures.

Keywords:- Python; Machine Learning; Artificial Intelligence; Regression; Linear

Regression.

2.Acknowledgement:-

SDT (Software Development Tools):- Machine learning involves using algorithms to allow computer software programs to 'learn' different tasks from the available data. ML programs become more accurate the more relevant data they are trained on.

There are many types of software development tools. Some of those used in this project are:-

1.numpy:-

Work:-In Python we have lists that serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.

The array object in NumPy is called ndarray; it provides many supporting functions that make working with ndarray very easy.

Arrays are very frequently used in data science, where speed and resources are very important.

Languages:-Python, C, and C++.
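As a minimal illustration (assuming NumPy is installed), an ndarray supports fast vectorized operations without explicit Python loops; the values below are made up:

import numpy as np
# Create an ndarray from a Python list
yields = np.array([2.5, 3.1, 2.8, 3.6])   # e.g., tonnes per hectare
# Vectorized arithmetic: no explicit Python loop is needed
scaled = yields * 1000                    # convert to kg per hectare
print(scaled.mean(), scaled.max())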

2.pandas:-

Work:-Pandas allows us to analyze big data and draw conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

Language:-Python.
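A small illustrative sketch of cleaning a messy dataset with pandas; the values and column names here are made up:

import pandas as pd
# Hypothetical messy data with a missing value and a duplicate row
df = pd.DataFrame({
    "temperature": [25.6, None, 31.2, 31.2],
    "rainfall":    [102.4, 88.9, 61.3, 61.3],
})
df = df.drop_duplicates()   # remove duplicate rows
df = df.dropna()            # drop rows with missing values
print(df.describe())        # quick statistical summary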

3.Matplotlib:-

Work:-Matplotlib is an easy-to-use and powerful visualization library in Python. It is built on NumPy arrays, designed to work with the broader SciPy stack, and provides several plot types such as line, bar, scatter, and histogram.

Language:-Python.

4.Pyplot:-

Work:-pyplot is a collection of command-style functions that make Matplotlib work

like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a

figure, creates a plotting area in a figure, plots some lines in a plotting area,

decorates the plot with labels, etc.

Language:-Python.
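A minimal pyplot sketch in which each call modifies the current figure; the data is made up for illustration:

import matplotlib.pyplot as plt
temperature = [20, 25, 30, 35]
sales = [120, 180, 260, 310]
plt.figure()                              # create a figure
plt.plot(temperature, sales, marker="o")  # plot a line in the plotting area
plt.xlabel("Temperature (°C)")            # decorate the plot with labels
plt.ylabel("Sales")
plt.title("Sales vs Temperature")
plt.show()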

5.Seaborn:-

Work:-Seaborn is a library for making statistical graphics in Python. It builds on top

of matplotlib and integrates closely with pandas data structures. Seaborn helps you

explore and understand your data.

Language:-Python.
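A short sketch of a statistical plot with Seaborn built on a pandas DataFrame; the columns and values are hypothetical:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({
    "rainfall": [60, 80, 100, 120, 140],
    "yield":    [2.1, 2.6, 3.0, 3.3, 3.4],
})
sns.scatterplot(data=df, x="rainfall", y="yield")  # Seaborn works directly with DataFrames
plt.show()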

3.SDK:-This section gives an overview of Google Colab and its software development kit (SDK).

Google Colab is a cloud-based interactive computing environment that provides free

access to a Jupyter Notebook environment along with computational resources,

including CPU, GPU, and TPU. It allows users to write, run, and share Python code in a

collaborative and interactive manner. Colab is hosted on Google Drive, and notebooks

can be created, shared, and saved directly in Google Drive.

While Google Colab does not have an official SDK, it provides a Python library called

"google.colab" that allows developers to interact with the Colab environment

programmatically. The "google.colab" library provides functionality for tasks such as


importing and exporting files, installing Python packages, managing Colab sessions,
and connecting to external services like Google Drive and Google Sheets.

Some of the common tasks that can be performed using the "google.colab" library

include:-

Importing and exporting files:-The library allows you to upload and download files to and

from the Colab environment. For example, you can use the "files.upload()" function to

upload files from your local machine to Colab, and the "files.download()" function to

download files from Colab to your local machine.

Installing Python packages:-The library provides a way to install Python packages

directly from within the Colab environment using the "!pip install" command.

Managing Colab sessions:-The library allows you to manage the lifecycle of a Colab session. You can use functions like "drive.mount()" to mount your Google Drive, "drive.flush_and_unmount()" to flush and unmount the Google Drive, and "os.kill()" to terminate the current session.

Connecting to external services:-The library provides functionality to connect to external services like Google Drive and Google Sheets, allowing you to read and write data to these services from within a Colab notebook.

Interacting with Colab UI:- The library allows you to interact with the Colab user

interface programmatically, for example, by using the "IPython.display" module to

display images, videos, and other media in the output of a Colab cell.

Overall, while Google Colab does not have a standalone SDK, the "google.colab" library

provides a convenient way to interact with the Colab environment programmatically and

automate various tasks within Colab notebooks. You can import the "google.colab"

library in your Python code and use its functions to perform operations within the Colab

environment.
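For illustration, a hedged sketch of some of the calls mentioned above; it runs only inside a Colab notebook, and the file names are placeholders:

# Runs only inside Google Colab
from google.colab import files, drive
uploaded = files.upload()        # upload files from the local machine
drive.mount("/content/drive")    # mount Google Drive into the notebook
# ... work with the uploaded files or files on Drive ...
files.download("results.csv")    # download a (placeholder) file back to the local machine
drive.flush_and_unmount()        # flush pending writes and unmount Drive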

4.Model:-
SDLC Model:-

Waterfall Model:-

The waterfall is a universally accepted SDLC model. In this method, the whole process of

software development is divided into various phases.

The waterfall model is a sequential software development model in which development is seen as flowing steadily downwards (like a waterfall) through the phases of requirements analysis, design, implementation, testing (validation), integration, and maintenance.

Linear ordering of activities has some significant consequences. First, to identify the end of a phase and the beginning of the next, some certification technique has to be employed at the end of each phase. This usually takes the form of verification and validation, which ensure that the output of a phase is consistent with its input (the output of the previous phase) and with the overall requirements of the system.

RAD Model:-

RAD, or Rapid Application Development, is an adaptation of the waterfall model; it targets developing software in a short period. The RAD model is based on the concept that a better system can be developed in less time by using focus groups to gather system requirements. It consists of the following phases:

o Business Modeling
o Data Modeling
o Process Modeling
o Application Generation
o Testing and Turnover

Spiral Model:-

The spiral model is a risk-driven process model. This SDLC model helps the team adopt elements of one or more process models such as waterfall, incremental, or prototyping. The spiral technique is a combination of rapid prototyping and concurrency in design and development activities.

Each cycle in the spiral begins with the identification of objectives for that cycle, the

different alternatives that are possible for achieving the goals, and the constraints that exist.

This is the first quadrant of the cycle (upper-left quadrant).

The next step in the cycle is to evaluate these different alternatives based on the objectives

and constraints. The focus of evaluation in this step is based on the risk perception for the

project.

The next step is to develop strategies that resolve the uncertainties and risks. This step may

involve activities such as benchmarking, simulation, and prototyping.

V-Model:-

In this type of SDLC model, testing and development are planned in parallel: there are verification phases on one side of the V and validation phases on the other, and the two branches are joined by the coding phase.

Incremental Model:-

The incremental model is not a separate model. It is essentially a series of waterfall cycles.

The requirements are divided into groups at the start of the project. For each group, the

SDLC model is followed to develop software. The SDLC process is repeated, with each

release adding more functionality until all requirements are met. In this method, each cycle acts as the maintenance phase for the previous software release. A modification of the incremental model allows development cycles to overlap, so that a subsequent cycle may begin before the previous cycle is complete.

Agile Model:-

Agile methodology is a practice that promotes continuous interaction between development and

testing during the SDLC process of any project. In the Agile method, the entire project is

divided into small incremental builds. All of these builds are provided in iterations, and each

iteration lasts from one to three weeks.

Any agile software phase is characterized in a manner that addresses several key

assumptions about the bulk of software projects:

1. It is difficult to predict in advance which software requirements will persist and which

will change. It is equally difficult to predict how user priorities will change as the project

proceeds.

2. For many types of software, design and development are interleaved. That is, both

activities should be performed in tandem so that design models are proven as they are

created. It is difficult to determine how much design is necessary before construction is used to prove the design.

3. Analysis, design, development, and testing are not as predictable (from a planning

point of view) as we might like.

Iterative Model:-

It is a particular implementation of a software development life cycle that focuses on an

initial, simplified implementation, which then progressively gains more complexity and a

broader feature set until the final system is complete. In short, iterative development is a

way of breaking down the software development of a large application into smaller pieces.

Big bang model:-

The big bang model focuses all available resources on software development and coding, with little or no planning. The requirements are understood and implemented as they come.

This model works best for small projects with small development teams working together. It is also useful for academic software development projects. It is an ideal model when requirements are either unknown or no final release date is given.

Prototype Model:-

The prototyping model starts with the requirements gathering. The developer and the user

meet and define the purpose of the software, identify the needs, etc.

A 'quick design' is then created. This design focuses on those aspects of the software that

will be visible to the user. It then leads to the development of a prototype. The customer then

checks the prototype, and any modifications or changes that are needed are made to the

prototype.

Looping takes place in this step, and better versions of the prototype are created. These are

continuously shown to the user so that any new changes can be updated in the prototype.

This process continues until the customer is satisfied with the system. Once the user is satisfied, the prototype is converted into the actual system, with all considerations for quality and security.

Machine Learning Life Cycle:-



1. Gathering Data:-
Data Gathering is the first step of the machine learning life cycle. The goal of this step is to

identify and obtain all the data related to the problem.

In this step, we need to identify the different data sources, as data can be collected from

various sources such as files, database, internet, or mobile devices. It is one of the most

important steps of the life cycle. The quantity and quality of the collected data will

determine the efficiency of the output: the more data there is, the more accurate the prediction will be.

This step includes the below tasks:

o Identify various data sources


o Collect data
o Integrate the data obtained from different sources

By performing the above tasks, we get a coherent set of data, also called a dataset. It will

be used in further steps.

2. Data preparation
After collecting the data, we need to prepare it for further steps. Data preparation is a step

where we put our data into a suitable place and prepare it to use in our machine learning

training.

In this step, first, we put all data together, and then randomize the ordering of data.

This step can be further divided into two processes:



o Data exploration:-
It is used to understand the nature of data that we have to work with. We need to

understand the characteristics, format, and quality of data.

A better understanding of data leads to an effective outcome. In this, we find

Correlations, general trends, and outliers.

o Data pre-processing:-
Now the next step is preprocessing of data for its analysis.

3. Data Wrangling:-
Data wrangling is the process of cleaning and converting raw data into a usable format. It is

the process of cleaning the data, selecting the variable to use, and transforming the data in a

proper format to make it more suitable for analysis in the next step. It is one of the most

important steps of the complete process. Cleaning of data is required to address the quality

issues.

The data we have collected is not always directly usable, as some of it may not be useful. In real-world applications, collected data may have various issues, including:

o Missing Values
o Duplicate data
o Invalid data
o Noise

So, we use various filtering techniques to clean the data; a small example is sketched below. It is mandatory to detect and remove the above issues because they can negatively affect the quality of the outcome.
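A hedged sketch of the cleaning steps listed above using pandas; the file and column names are assumptions, not the project's actual dataset:

import pandas as pd
df = pd.read_csv("crop_data.csv")                      # hypothetical raw dataset
df = df.drop_duplicates()                              # duplicate data
df = df.dropna(subset=["temperature", "rainfall"])     # missing values
df = df[df["rainfall"] >= 0]                           # invalid (negative) rainfall entries
df["humidity"] = df["humidity"].clip(0, 100)           # clamp noisy humidity readings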

4. Data Analysis:-
Now the cleaned and prepared data is passed on to the analysis step. This step involves:

o Selection of analytical techniques
o Building models
o Review the result

The aim of this step is to build a machine learning model to analyze the data using various

analytical techniques and review the outcome. It starts with determining the type of problem, where we select machine learning techniques such as classification, regression, cluster analysis, or association; we then build the model using the prepared data and evaluate it.

Hence, in this step, we take the data and use machine learning algorithms to build the model.

5. Train Model:-
Now the next step is to train the model. In this step, we train our model to improve its performance and obtain a better outcome for the problem.

We use datasets to train the model using various machine learning algorithms. Training a model is required so that it can learn the various patterns, rules, and features.

6. Test Model:-
Once our machine learning model has been trained on a given dataset, then we test the

model. In this step, we check for the accuracy of our model by providing a test dataset to it.

Testing the model determines the percentage accuracy of the model as per the requirement

of project or problem.
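A minimal sketch of steps 5 and 6 (training and testing) with scikit-learn; the dataset and model choice here are illustrative stand-ins, not the project's data:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)   # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)         # step 5: train the model
y_pred = model.predict(X_test)      # step 6: test the model
print("Accuracy:", accuracy_score(y_test, y_pred))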

7. Deployment:-
The last step of machine learning life cycle is deployment, where we deploy the model in the

real-world system.

If the above-prepared model is producing an accurate result as per our requirement with

acceptable speed, then we deploy the model in the real system. But before deploying the

project, we will check whether it is improving its performance using available data or not.

The deployment phase is similar to making the final report for a project.

5.Machine Learning:-

ML-based deep learning can simplify the task of crop breeding. Algorithms simply collect

field data on plant behavior and use that data to develop a probabilistic model.

Crop yield prediction is another instance of machine learning in the agriculture sector.

The technology informs decisions on what crop species to grow and what activities to

perform during the growing season. Tech-wise, crop yield is used as a dependent

variable when making predictions. The major factors include temperature, soil type,

rainfall, and actual crop information. Based on these inputs, ML algorithms like neural

networks and multiple linear regression produce forecasts.

The goal of this research is to present a comparison between different

clustering and segmentation techniques, both supervised and unsupervised,

to detect plant and crop rows. Aerial images, taken by an Unmanned Aerial

Vehicle (UAV), of a corn field at various stages of growth were acquired in

RGB format through the Agronomy Department at the Kansas State

University. Several segmentation and clustering approaches were applied to

these images, namely K-Means clustering, Excessive Green (ExG) Index

algorithm, Support Vector Machines (SVM), Gaussian Mixture Models (GMM),

and a deep learning approach based on Fully Convolutional Networks (FCN),

to detect the plants present in the images. A Hough Transform (HT) approach

was used to detect the orientation of the crop rows and rotate the images so

that the rows became parallel to the x-axis. The result of applying different
segmentation methods to the images was then used in estimating the

location of crop rows in the images by using a template creation method

based on Green Pixel Accumulation (GPA) that calculates the intensity

profile of green pixels present in the images. Connected component analysis

was then applied to find the centroids of the detected plants. Each centroid

was associated with a crop row, and centroids lying outside the row

templates were discarded as being weeds. A comparison between the

various segmentation algorithms based on the Dice similarity index and

average run-times is presented at the end of the work.
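As an illustration of one of the simpler techniques named above, a hedged sketch of Excess Green (ExG) segmentation on an RGB image; the threshold value is an assumption chosen for illustration:

import numpy as np

def excess_green_mask(rgb, threshold=0.1):
    """Segment vegetation using the Excess Green index ExG = 2g - r - b."""
    rgb = rgb.astype(float)
    total = rgb.sum(axis=2) + 1e-8                      # avoid division by zero
    r, g, b = (rgb[..., i] / total for i in range(3))   # normalized chromatic coordinates
    exg = 2 * g - r - b
    return exg > threshold                              # boolean mask of likely plant pixels

# Usage: mask = excess_green_mask(image), where image is an HxWx3 RGB array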

Python is also being used for developing IoT devices. AI is assisting IoT in enabling real-time data analytics that help farmers make informed decisions. Precision agriculture, or smart agriculture, relies on emerging technologies such as AI, ML, and data analytics to revolutionize farming practices.

8. Workflow Project:-

Workflow Management for Crop Prediction (Agricultural System)


Workflow management in agricultural systems for crop prediction involves the efficient

coordination and automation of tasks and processes related to crop cultivation, monitoring,
and prediction of yields. Here's a general outline of a typical workflow management system

for crop prediction in an agricultural setting:-

Data Collection:- Data related to various factors that influence crop growth and yield, such as

weather conditions, soil characteristics, historical crop data, and satellite imagery, are

collected and integrated into the workflow management system. This data can be collected

through various sensors, drones, and other data sources.

Data Preprocessing:- The collected data is preprocessed to clean and transform it into a

format suitable for analysis. This may involve data cleaning, normalization, aggregation, and

feature extraction to reduce noise and ensure data quality.

Data Analysis:- The preprocessed data is analyzed using various statistical and machine

learning techniques to identify patterns, trends, and correlations between different variables.

For example, machine learning algorithms such as decision trees, random forests, and neural

networks can be used to predict crop yields based on historical data and environmental

factors.
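A hedged sketch of such a yield model with a random forest; the file name and feature columns are assumptions for illustration, not the project's actual data:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("field_history.csv")                  # hypothetical historical data
X = df[["temperature", "rainfall", "humidity", "soil_ph"]]
y = df["yield"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))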

Crop Prediction:- Based on the analysis results, the workflow management system can

generate crop prediction models that can forecast crop yields for different crops and regions.

These models can be continuously updated with new data to improve their accuracy over

time.

Decision Support:- The workflow management system can provide decision support to

farmers by presenting them with insights and recommendations based on the crop prediction

models. For example, it can suggest optimal planting times, irrigation schedules, and

fertilization plans based on the predicted crop yields and current weather conditions.

Task Automation:-The workflow management system can automate various tasks related to

crop cultivation, such as scheduling irrigation, applying fertilizers, and monitoring pest

control, based on the predicted crop yields and environmental conditions. This can help

farmers optimize their operations, reduce costs, and increase productivity.


Monitoring and Feedback:-The workflow management system can continuously monitor the

actual crop growth and yield data and compare it with the predicted results. This feedback

loop allows for ongoing validation and refinement of the prediction models, and helps farmers

make informed decisions about their crop management practices.

Reporting and Visualization:- The workflow management system can generate reports and

visualizations to provide farmers and other stakeholders with a clear understanding of the

crop prediction results, trends, and performance metrics. This can help farmers evaluate the

effectiveness of their crop management strategies and make data-driven decisions for future

seasons.

Integration with Crop Management Tools:- The workflow management system can be

integrated with other crop management tools, such as farm management software, precision

agriculture equipment, and agricultural drones, to enable seamless coordination and

execution of tasks based on crop prediction results.

Continuous Improvement:-The workflow management system can be continuously improved

by incorporating new data sources, updating prediction models, and refining decision support

algorithms based on feedback from farmers and other stakeholders. This iterative process

helps ensure that the system remains accurate, reliable, and relevant over time.

Overall, an effective workflow management system for crop prediction in agricultural

systems involves the integration of data collection, preprocessing, analysis, prediction,

decision support, task automation, monitoring, reporting, and continuous improvement

components to enable efficient and data-driven crop management practices.

9.Elbow Method:-

Elbow Method for Crop Prediction (Agriculture)

The Elbow Method is a commonly used technique in data science and machine learning to

determine the optimal number of clusters or groups in a dataset. It can also be applied in

agriculture for crop prediction, specifically in crop classification or clustering tasks.

For each candidate value of k, run the clustering algorithm and compute the sum of squared distances (SSE) of each data point to the centroid of its cluster. Plot the SSE values against the corresponding values of k in a line chart; the point where the curve bends like an elbow indicates a suitable number of clusters.
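A minimal sketch of this procedure with scikit-learn's KMeans; the feature matrix X is assumed to already hold the numeric agricultural-condition columns:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def plot_elbow(X, max_k=10):
    sse = []
    for k in range(1, max_k + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0)
        km.fit(X)
        sse.append(km.inertia_)    # sum of squared distances to the centroids
    plt.plot(range(1, max_k + 1), sse, marker="o")
    plt.xlabel("Number of clusters k")
    plt.ylabel("SSE (inertia)")
    plt.show()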

The Elbow Method can help in optimizing the clustering process and improving the accuracy

of crop prediction models by identifying the appropriate number of clusters or groups in the

dataset. It can also aid in making informed decisions related to crop management, resource

allocation, and agricultural planning.

10.Distribution Of Agricultural Conditions:-

The distribution of agricultural conditions can vary greatly depending on various factors such

as climate, soil type, topography, water availability, and human intervention. Here are some

general patterns of agricultural conditions distribution:-

Climate:-Climate plays a crucial role in determining agricultural conditions. Crops have

specific requirements for temperature, precipitation, and sunlight. In general, agricultural


areas tend to be concentrated in regions with favorable climates for crop growth. For

example, areas with moderate temperatures, adequate rainfall, and ample sunlight are often

conducive to agriculture. Regions with harsh climates such as deserts, extreme cold, or

excessive rainfall may have limited agricultural potential.

Soil type:- Soil type is another critical factor that influences agricultural conditions. Different

crops require different types of soils for optimal growth. For example, crops like rice and

cranberries thrive in acidic soils, while crops like wheat and corn prefer well-drained loamy

soils. Agricultural areas are often found in regions with fertile soils that provide essential

nutrients and support healthy crop growth.

Topography:-Topography, or the physical characteristics of the land, can also affect

agricultural conditions. Flat or gently sloping lands are generally more suitable for

agriculture as they allow for easier irrigation and cultivation. Steep slopes or rugged terrains

may pose challenges in terms of soil erosion, water runoff, and accessibility, which can

impact agricultural productivity.

Water availability:-Access to water is critical for agriculture. Regions with ample water

resources such as rivers, lakes, or groundwater reserves are often conducive to agriculture.

Irrigation systems are often developed in areas with limited rainfall to support crop growth.

In contrast, areas with limited water resources may face challenges in agricultural

production.

Human intervention:-Human intervention, including agricultural practices and infrastructure

development, can greatly influence agricultural conditions. Agricultural technologies, such

as irrigation systems, fertilizers, and crop management practices, can enhance agricultural

productivity and expand the potential for agriculture in regions with suboptimal conditions.

Human settlements and infrastructure, such as roads and markets, also play a role in

determining the distribution of agricultural conditions.

Overall, the distribution of agricultural conditions is influenced by a complex interplay of


factors including climate, soil type, topography, water availability, and human intervention.

Understanding these factors is crucial for planning and managing agricultural activities and

ensuring sustainable food production.

11.Predictions of Crops:-
This section outlines potential trends and factors that may impact crop production in the future. However, it's important to note that crop

predictions are subject to various factors, including weather conditions, technological

advancements, economic factors, and policy changes, which can all influence crop

production. Additionally, unforeseen events or disruptions, such as natural disasters or

disease outbreaks, can also significantly impact crop yields. With these considerations in

mind, here are some potential predictions for crops:

Climate-resilient crops:-With the increasing impacts of climate change, there may be a

growing demand for climate-resilient crops that are adapted to changing weather patterns,

such as drought-tolerant or heat-tolerant varieties. Advances in biotechnology and genetic

engineering may lead to the development of genetically modified crops that are better able

to withstand extreme weather conditions, helping to ensure stable crop production in the

face of climate challenges.

Vertical farming:- Vertical farming, which involves growing crops indoors in stacked layers
using artificial lighting, may become more widespread due to its potential for year-round

production in urban environments and reduced reliance on traditional agricultural land.

Advances in LED lighting technology, automation, and data analytics may drive increased

adoption of vertical farming, allowing for the cultivation of a wide variety of crops in

controlled environments with optimized resource use.

Organic and regenerative agriculture: There may be a growing demand for organic and

regenerative agricultural practices that prioritize soil health, biodiversity, and ecosystem

sustainability. Consumers' increasing focus on health and environmental sustainability may

drive demand for crops grown using organic or regenerative practices, which can promote

soil fertility, reduce chemical inputs, and enhance overall ecosystem resilience.

Precision agriculture:- Precision agriculture, which involves using technologies such as

drones, sensors, and data analytics to optimize crop management, may continue to gain

momentum. Advancements in remote sensing, data analytics, and artificial intelligence may

enable farmers to make data-driven decisions about planting, irrigation, nutrient

management, and pest control, resulting in improved crop yields, reduced input use, and

enhanced sustainability.

Alternative protein crops:-As global demand for protein-rich foods continues to rise, there

may be an increasing focus on alternative protein crops, such as legumes, insects, and algae.

These crops are rich in protein, require fewer resources to produce compared to traditional

animal agriculture, and may be more sustainable and environmentally friendly.

Resurgence of traditional and indigenous crops:-There may be a renewed interest in

traditional and indigenous crops that are well adapted to local climates and have genetic

diversity. These crops may be seen as more resilient to changing environmental conditions

and may offer unique nutritional and cultural benefits.

Increased adoption of genetically modified crops:-Advances in genetic engineering may lead

to increased adoption of genetically modified crops with enhanced traits, such as resistance
to pests, diseases, or environmental stress. However, the adoption of genetically modified

crops may continue to be a topic of debate, with concerns about safety, environmental

impacts, and consumer acceptance.

It's important to note that these predictions are speculative and may be subject to change as

new technologies, policies, and environmental factors emerge. The future of crop production

will likely be shaped by a complex interplay of various factors, and careful monitoring and

adaptive management will be necessary to ensure sustainable and resilient crop production

systems.


12.Confusion Matrix:-


A confusion matrix, also known as an error matrix, is a performance evaluation tool used in

machine learning and statistics to assess the accuracy of a classification model. It is a table

that displays the true positive (TP), true negative (TN), false positive (FP), and false negative

(FN) values for a set of predictions compared to the actual ground truth.

Here is an example of a confusion matrix:-


Actual \ Predicted | Positive | Negative
-------------------|----------|----------
Positive           | TP       | FN
Negative           | FP       | TN
Each cell in the confusion matrix represents the count or percentage of instances that fall

into a specific category based on the model's predictions and the actual ground truth. The

key terms used in a confusion matrix are:


True Positive (TP):-The number of instances that are actually positive and are correctly

predicted as positive by the model.

True Negative (TN):-The number of instances that are actually negative and are correctly

predicted as negative by the model.

False Positive (FP):-The number of instances that are actually negative but are incorrectly

predicted as positive by the model.

False Negative (FN):-The number of instances that are actually positive but are incorrectly

predicted as negative by the model.

The confusion matrix provides valuable insights into the performance of a classification

model, allowing for the calculation of various performance metrics such as accuracy,

precision, recall, F1 score, and specificity, which help in understanding the model's strengths

and weaknesses. It is a useful tool for evaluating and fine-tuning machine learning models to

improve their classification accuracy.

Confusion Matrix using Logistic Regression:-

A confusion matrix is a commonly used tool to evaluate the performance of a classification

model, such as logistic regression. It is a matrix that shows the number of true positives (TP),

false positives (FP), true negatives (TN), and false negatives (FN) for a given set of

predictions compared to the actual ground truth.

Here's an example of how you can create a confusion matrix using logistic regression in
Python:-

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
# Load your dataset
# X is the feature matrix, y is the target variable
X, y = load_your_dataset()
# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the logistic regression model
logreg = LogisticRegression()
# Train the model
logreg.fit(X_train, y_train)
# Make predictions on the test set
y_pred = logreg.predict(X_test)
# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Extract values from the confusion matrix
tn, fp, fn, tp = cm.ravel()
# Print the confusion matrix
print("Confusion Matrix:")
print("True Negatives (TN):", tn)
print("False Positives (FP):", fp)
print("False Negatives (FN):", fn)
print("True Positives (TP):", tp)
# You can also visualize the confusion matrix using a heatmap
import seaborn as sns
import matplotlib.pyplot as plt
# Create a heatmap of the confusion matrix
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
In the example above, we first load our dataset and split it into train and test sets using

train_test_split from Scikit-learn. Then we initialize a logistic regression model, fit it to the

training data, and make predictions on the test data. We create a confusion matrix using

confusion_matrix from Scikit-learn, and then extract the values for TN, FP, FN, and TP from

the confusion matrix. Finally, we print the values and visualize the confusion matrix using a

heatmap with Seaborn and Matplotlib.

Confusion Matrix using Kmeans:-


A confusion matrix, also known as an error matrix, is a commonly used evaluation metric in

machine learning and data mining to assess the performance of a classification model. K-

means, however, is an unsupervised clustering algorithm that does not inherently provide

labels or ground truth for classification. Therefore, using a confusion matrix directly with K-

means is not applicable.

However, if you are interested in evaluating the performance of a classification model that is

trained using K-means clustering as a feature extraction step, you can follow these steps to

generate a confusion matrix:-


Perform K-means clustering:-Use K-means algorithm to cluster your data into K groups. The

clusters obtained from K-means can be treated as pseudo-labels for your data.

Train a classifier:-Use the cluster assignments obtained from K-means as features and train

a classification model, such as logistic regression, decision tree, or support vector machine

(SVM), using a labeled dataset. The labeled dataset should have true class labels for each

data point that are used for training the classifier.

Make predictions:-Use the trained classifier to make predictions on a test dataset. The

predicted class labels can be obtained from the output of the classifier.

Create a confusion matrix:-Compare the predicted class labels with the true class labels

from the test dataset to create a confusion matrix. The confusion matrix will have rows

representing the true class labels and columns representing the predicted class labels. The

diagonal elements of the confusion matrix represent the number of correct predictions, while

the off-diagonal elements represent the misclassifications.

Calculate performance metrics:-Use the values in the confusion matrix to calculate various

performance metrics such as accuracy, precision, recall, and F1 score, which provide

insights into the classification performance of the model.

Here's an example of how you can create a confusion matrix using K-means clustering as a

feature extraction step in Python:-

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
# Step 1: Perform K-means clustering
kmeans = KMeans(n_clusters=3)
kmeans.fit(X_train)  # X_train is your training data
# Step 2: Train a classifier on the cluster-distance features
X_train_kmeans = kmeans.transform(X_train)
X_test_kmeans = kmeans.transform(X_test)  # X_test is your test data
clf = LogisticRegression()
clf.fit(X_train_kmeans, y_train)  # y_train is your true class labels for training data
# Step 3: Make predictions
y_pred = clf.predict(X_test_kmeans)
# Step 4: Create a confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)  # y_test: true labels, y_pred: predicted labels
# Step 5: Calculate performance metrics (these formulas assume a binary 2x2 matrix)
accuracy = (confusion_mat[0, 0] + confusion_mat[1, 1]) / np.sum(confusion_mat)
precision = confusion_mat[1, 1] / (confusion_mat[1, 1] + confusion_mat[0, 1])
recall = confusion_mat[1, 1] / (confusion_mat[1, 1] + confusion_mat[1, 0])
f1_score = 2 * (precision * recall) / (precision + recall)
print("Confusion Matrix:\n", confusion_mat)
print("Accuracy: {:.2f}".format(accuracy))
print("Precision: {:.2f}".format(precision))
print("Recall: {:.2f}".format(recall))
print("F1 Score: {:.2f}".format(f1_score))
13.Classification Report using Logistic Regression:-
Here's an example of how you can generate a classification report using logistic regression
in Python, utilizing the sklearn library.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Load your dataset
# Replace X and y with your own features and target variable
X, y = load_your_dataset()
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit the logistic regression model
clf = LogisticRegression()
clf.fit(X_train, y_train)
# Make predictions on the testing set
y_pred = clf.predict(X_test)
# Generate the classification report
report = classification_report(y_test, y_pred)
# Print the classification report
print(report)

The classification_report() function from sklearn.metrics generates a report that includes

metrics such as precision, recall, F1-score, and support for each class in a classification

problem. You can interpret the report to assess the performance of your logistic regression

model.

Here's an example of how you can generate a classification report for agriculture and

crop production using logistic regression. Please note that this is a hypothetical

example and the data and results are not based on actual data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
# Load the dataset (example data)
data = pd.read_csv('agriculture_dataset.csv')
# Split the data into features and target variable
X = data.drop('Crop_Type', axis=1) # Features
y = data['Crop_Type'] # Target variable
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict on the testing data
y_pred = model.predict(X_test)
# Generate the classification report
report = classification_report(y_test, y_pred)
# Print the classification report
print(report)

The classification_report function from scikit-learn is used to generate the classification

report, which provides metrics such as precision, recall, F1-score, and support for each

class in the target variable (Crop_Type in this case). The report gives an overview of the

performance of the logistic regression model in predicting the crop type based on the

features provided in the dataset.

14.Source Code and Output:-

Source Code:-

from google.colab import files


uploaded = files.upload()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import interact
data=pd.read_csv("data (1) (1).csv")
print(data)
print(data.isnull().sum())
sns.heatmap(data.isnull())
plt.show()
Output:-

print("Avg nitrogen {0:.2f}".format(data["N"].mean()))


print("Avg phosphorus {0:.2f}".format(data["P"].mean()))
print("Avg Potassium {0:.2f}".format(data["K"].mean()))
print("Avg temperature {0:.2f}".format(data["temperature"].mean()))
print("Avg humidity {0:.2f}".format(data["humidity"].mean()))
print("Avg ph {0:.2f}".format(data["ph"].mean()))
print("Avg rainfall {0:.2f}".format(data["rainfall"].mean()))
Output:-
Avg nitrogen 50.55
Avg phosphorus 53.36
Avg Potassium 48.15
Avg temperature 25.62
Avg humidity 71.48
Avg ph 6.47
Avg rainfall 103.46
@interact
def summary(crops=list(data["label"].value_counts().index)):
    x = data[data['label'] == crops]
    print(x['label'])
    print("Min nitrogen required", x["N"].min())
    print("Avg nitrogen required", x["N"].mean())
    print("Max nitrogen required", x["N"].max())
    print("Min phosphorus required", x["P"].min())
    print("Avg phosphorus required", x["P"].mean())
    print("Max phosphorus required", x["P"].max())
    print("Min Potassium required", x["K"].min())
    print("Avg Potassium required", x["K"].mean())
    print("Max Potassium required", x["K"].max())
    print("Min temperature required", x["temperature"].min())
    print("Avg temperature required", x["temperature"].mean())
    print("Max temperature required", x["temperature"].max())
    print("Min ph required", x["ph"].min())
    print("Avg ph required", x["ph"].mean())
    print("Max ph required", x["ph"].max())
    print("Min humidity required", x["humidity"].min())
    print("Avg humidity required", x["humidity"].mean())
    print("Max humidity required", x["humidity"].max())
    print("Min rainfall required", x["rainfall"].min())
    print("Avg rainfall required", x["rainfall"].mean())
    print("Max rainfall required", x["rainfall"].max())
Output:-

plt.subplot(2, 4, 1)
sns.histplot(data['N'], color="green")
plt.xlabel("Nitrogen")
plt.grid()
plt.subplot(2, 4, 2)
sns.histplot(data['P'], color="red")
plt.xlabel("P")
plt.grid()
plt.subplot(2, 4, 3)
sns.histplot(data['K'], color="yellow")
plt.xlabel("K")
plt.grid()
plt.subplot(2, 4, 4)
sns.histplot(data['ph'], color="blue")
plt.xlabel("PH")
plt.grid()
plt.subplot(2, 4, 5)
sns.histplot(data['temperature'], color="yellow")
plt.xlabel("temperature")
plt.grid()
plt.subplot(2, 4, 6)
sns.histplot(data['humidity'], color="green")
plt.xlabel("humidity")
plt.grid()
plt.subplot(2, 4, 7)
sns.histplot(data['rainfall'], color="blue")
plt.xlabel("rainfall")
plt.grid()
plt.show()
"""**Elbow method**"""
from pandas.core.common import random_state
from sklearn.cluster import KMeans
x=data.drop(['label'],axis=1)
x=x.values
wcss=[]
for i in range(1,11):
km=KMeans(n_clusters=i,init="k-means++", max_iter=2000,n_init=10,random_state=0)
km.fit(x)
wcss.append(km.inertia_)
plt.plot(range(1,11),wcss)
plt.show()
km=KMeans(n_clusters=4,init="k-means++", max_iter=2000,n_init=10,random_state=0)
y_means=km.fit_predict(x)
a=data['label']
y_means=pd.DataFrame(y_means)
z=pd.concat([y_means,a],axis=1)
z=z.rename(columns={0:'cluster'})
29
print("Cluster 1",z[z['cluster']==0]['label'].unique())
print("Cluster 2",z[z['cluster']==1]['label'].unique())
print("Cluster 3",z[z['cluster']==2]['label'].unique())
print("Cluster 4",z[z['cluster']==3]['label'].unique())
y=data['label']
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.2,random_state=0)
from sklearn.linear_model import LogisticRegression
model=LogisticRegression()
model.fit(x_train,y_train)
y_pred=model.predict(np.array([[40,40,40,40,100,7,200]]))
print(y_pred)
y_pred=model.predict(x_test)
from sklearn.metrics import classification_report
cr=classification_report(y_test,y_pred)
print(cr)
from sklearn.metrics import confusion_matrix
cm=confusion_matrix(y_test,y_pred)
sns.heatmap(cm,annot=True)
print(cm)
Output:-

from sklearn.cluster import KMeans

x = data.drop(['label'], axis=1)
x = x.values
plt.rcParams['figure.figsize'] = (10, 4)
wcss = []
for i in range(1, 11):
    km = KMeans(n_clusters=i, init='k-means++', max_iter=2000, n_init=10, random_state=0)
    km.fit(x)
    wcss.append(km.inertia_)
plt.plot(range(1, 11), wcss)
plt.xlabel("No of clusters")
plt.ylabel("wcss")
plt.show()
Output:-

15.Conclusion:-

In conclusion, machine learning has emerged as a promising tool for predicting crop yields
and improving agricultural practices. By leveraging large datasets and sophisticated

algorithms, machine learning models can analyze various factors such as weather patterns,

soil conditions, historical crop data, and management practices to make accurate

predictions about crop yields.

One key benefit of crop prediction using machine learning is its potential to optimize

agricultural practices. Farmers can use these predictions to make informed decisions about

planting schedules, irrigation, fertilization, and pest management, leading to more efficient

resource allocation and higher yields. Additionally, machine learning can help farmers

identify early warning signs of crop stress or disease outbreaks, allowing for timely

interventions and reducing crop losses.

Machine learning in crop prediction also has the potential to contribute to sustainable

agriculture by optimizing resource use. For example, by predicting crop water requirements,

farmers can implement targeted irrigation strategies, minimizing water waste and conserving

this precious resource. Similarly, by predicting crop nutrient needs, farmers can apply

fertilizers more judiciously, reducing the risk of nutrient runoff and environmental pollution.

However, it's important to note that machine learning models for crop prediction are not

without limitations. Accurate predictions depend on the availability of reliable data, and in

many regions, data may be sparse or inconsistent. Additionally, machine learning models are

not immune to biases and may suffer from limitations in generalization, especially when

applied to different regions or crop varieties. Therefore, it's crucial to continue refining and

validating these models using field data and expert knowledge.

In conclusion, machine learning has the potential to revolutionize crop prediction and

agricultural practices, leading to improved crop yields, resource optimization, and

sustainable agriculture. However, ongoing research, data collection, and model validation

are necessary to ensure their reliability and effectiveness in real-world farming scenarios.
16.Future scope:-

The future scope of machine learning in crop prediction is promising and holds significant

potential for revolutionizing agriculture and improving crop production. Here are some key

areas where machine learning can play a significant role in the future:-

Machine learning algorithms can analyze a vast amount of data,

including soil quality, weather patterns, pest and disease prevalence, and plant growth rates

to provide farmers with precise recommendations on planting, fertilization, irrigation, and

pest control. This can optimize resource usage, reduce input costs, and increase crop yields.

Machine learning can be used to analyze historical data on

crop diseases and pests and create predictive models that can help farmers anticipate

disease outbreaks and pest infestations. This can enable early intervention and prevent crop

losses, reducing the reliance on chemical pesticides and minimizing environmental impact.

As climate change continues to impact agriculture, machine

learning can help farmers adapt by providing predictive models that take into account

changing weather patterns, temperature fluctuations, and rainfall variability. This can enable

farmers to make informed decisions about crop selection, planting times, and irrigation

strategies.

Machine learning algorithms can analyze data on crop growth,

historical yield data, weather patterns, and other factors to create accurate crop yield

forecasts. This can help farmers with crop planning, marketing, and financial decision-

making.

Machine learning can aid in crop breeding programs

by analyzing genetic data and identifying optimal combinations of traits for crop

improvement. This can accelerate the development of new crop varieties with improved yield,

resistance to diseases and pests, and other desirable traits.


Machine learning can analyze remote sensing data,

including satellite imagery, to monitor crop health, detect stressors such as nutrient

deficiencies, water stress, and disease outbreaks. This can help farmers make data-driven

decisions about crop management and optimize inputs.

Machine learning can power decision support systems that

provide farmers with real-time recommendations and insights for crop management. These

systems can integrate data from various sources and provide personalized recommendations

based on the specific needs of each farm.

In conclusion, machine learning has a bright future in crop prediction and agriculture, and it

has the potential to significantly improve crop production, optimize resource usage, and

contribute to sustainable farming practices. Continued advancements in machine learning

algorithms, data collection, and analytics are expected to drive further innovation in this

field in the future.

17.Bibliography:-

1. Application of Machine Learning in Agriculture, written by Mohammad Ayoub Khan, Rijwan Khan, and Mohammad Aslam Ansari.

2. Data Visualization: Storytelling Using Data, written by Sharada Sringeswara, Purvi Tiwari, and U. Dinesh Kumar.

3. Python Data Analytics (with pandas, NumPy, and Matplotlib), written by Fabio Nelli.

4. Data Analytics with Python, written by Dr. Bhaves Devra, Dr. Dilip Kumar, Dr. Shajahan Basheer, and Dr. Proloy Ghosh.
