Python/Numpy Basics: UCI ML Repository
Python/Numpy basics
Download the Iris dataset (available from the UCI ML Repository), then write and run a Python program in Colab that does the following.
1. Plot the data points using the first two dimensions (Sepal Length and Sepal Width).
2. Use three different marker shapes (triangle, square, circle) to plot the data points of the three classes.
Choose the shape of each point from its class label (see slides 6 and 10).
3. Calculate the mean data point for each class and plot it with the same shape as its class, at a larger
size.
4. Now plot a line l in this plot, given by the line equation l = span{ [−2.75, 2.75]^T }. Therefore, the
5. Now calculate the projection of each data point onto the line l (spanned by the vector [−2.75, 2.75]^T),
and plot each projected point using the same shape as its class but at a smaller size, so that all the
smaller shapes lie on the line (see slide 10).
6. Draw the normal distribution function for the sepal length (X1). To do that, first calculate the
sample mean (µ1) and the sample variance.
7. Draw the bivariate normal distribution function for the Sepal Length (X1) and the Sepal Width
(X2), that is, the function f(X1, X2). To do that, first calculate the sample sepal length
mean (µ1), the sample sepal width mean (µ2), and the covariance matrix Σ.
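The computations behind steps 3 to 7 reduce to a few NumPy operations. The following is a minimal sketch, not a reference solution. The array `X`, the labels `y`, and the helper names `project_onto_line`, `normal_pdf`, and `bivariate_normal_pdf` are hypothetical, and the numbers in `X` are made-up placeholders rather than real Iris measurements. It also assumes that "sample variance" means the n - 1 denominator; check the slides if the course uses 1/n instead.

```python
import numpy as np

# Placeholder data (NOT real Iris values): columns are X1 (sepal length) and
# X2 (sepal width); y holds a hypothetical class label per row.
X = np.array([[5.0, 3.5], [4.8, 3.1], [6.4, 3.0],
              [6.0, 2.6], [6.8, 3.2], [7.2, 3.0]])
y = np.array([0, 0, 1, 1, 2, 2])

# Step 3: mean data point of each class.
class_means = {int(c): X[y == c].mean(axis=0) for c in np.unique(y)}

# Steps 4-5: orthogonal projection onto the line l = span{v}.
v = np.array([-2.75, 2.75])

def project_onto_line(points, direction):
    """Project each row of `points` onto the line through the origin
    spanned by `direction`."""
    coeffs = points @ direction / (direction @ direction)  # one scalar per point
    return np.outer(coeffs, direction)

P = project_onto_line(X, v)

# Step 6: univariate normal density for X1.
mu1 = X[:, 0].mean()
var1 = X[:, 0].var(ddof=1)  # assumption: n - 1 denominator

def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x (scalar or array)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Step 7: bivariate normal density for (X1, X2).
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)  # 2x2 covariance matrix, n - 1 denominator

def bivariate_normal_pdf(points, mu, Sigma):
    """Density of the 2-D normal N(mu, Sigma) at each row of `points`."""
    diff = points - mu
    inv = np.linalg.inv(Sigma)
    quad = np.einsum("ij,jk,ik->i", diff, inv, diff)  # Mahalanobis terms
    norm = 2 * np.pi * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm
```

For the plots themselves, one option is matplotlib's `plt.scatter`, which accepts `marker="^"`, `"s"`, or `"o"` for triangle, square, and circle, and an `s` argument for marker size. Plotting `P` with the same markers and a smaller `s` places the projected points on the line.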
Download the magic04.data data file from the UCI ML Repository. The dataset has 10 real-valued attributes,
followed by a last column that is simply the class label, which is categorical, and which you will ignore
for this assignment. Assume that attributes are numbered starting from 0.
8. Which pair of attributes has the largest covariance, and which pair of attributes has the smallest
covariance? Print these values.
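One way to approach this question is to compute the full covariance matrix and scan its off-diagonal entries. The sketch below makes several assumptions: the helper name `extreme_covariance_pairs` is hypothetical, "pair" is taken to mean two distinct attributes (i < j), "smallest" is read as the most negative value rather than the smallest magnitude, and the `demo` array is a made-up placeholder, not magic04 data.

```python
import numpy as np
from itertools import combinations

def extreme_covariance_pairs(data):
    """Return ((i, j), cov) for the largest and the smallest covariance
    among pairs of distinct columns of `data` (attributes numbered from 0)."""
    cov = np.cov(data, rowvar=False)  # columns are attributes
    pairs = list(combinations(range(data.shape[1]), 2))
    values = np.array([cov[i, j] for i, j in pairs])
    largest = pairs[int(values.argmax())]
    smallest = pairs[int(values.argmin())]
    return (largest, float(cov[largest])), (smallest, float(cov[smallest]))

# Placeholder data (NOT magic04 values) to exercise the helper.
demo = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 1.0],
                 [3.0, 6.0, 2.0],
                 [4.0, 8.0, 0.0]])
(max_pair, max_cov), (min_pair, min_cov) = extreme_covariance_pairs(demo)
print("largest:", max_pair, max_cov)
print("smallest:", min_pair, min_cov)

# For the real file, assuming it is comma-separated with the label last:
# data = np.loadtxt("magic04.data", delimiter=",", usecols=range(10))
```

Note that `np.cov` divides by n - 1 by default. This does not change which pairs are largest or smallest, only the printed values.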
Assignment 1 Due Date 14/20/2021