Python/Numpy Basics: UCI ML Repository

The document outlines two assignments for a Python/NumPy basics course. The first asks students to plot the Iris dataset using a different shape for each class, calculate class means, project points onto a line, and draw normal distributions. The second, on numeric data analysis, asks for statistical measures: the mean vector, the covariance matrix computed by two different methods, the correlation between attributes, attribute variances, and the largest and smallest covariance between attribute pairs. Students complete Python scripts that analyze and visualize the datasets following the given instructions.

Uploaded by

ARJU Zerin

Assignment 1 Due Date 14/20/2021

Python/Numpy basics

Download the Iris dataset from here, then write a Python program in Colab that does the following.

1. Plot the data points using the first two dimensions (Sepal Length and Sepal Width).
2. Use three different shapes (triangle, square, circle) to plot the data points of the three classes.
Take the class of each point from its class label when choosing the shape
(see slides 6 and 10).
3. Calculate the mean data point for each class and plot it with the same shape as its class but at a
larger size.

4. Now plot a line l on the same plot. The line is l = span{ (−2.75, 2.75)^T }, so the
equation of the line is x2 = −x1 (see slide 10).

5. Now calculate the projection of each data point onto the line l (spanned by the vector
(−2.75, 2.75)^T), and plot each projected point on the line using the same shape but a smaller
size, so that all the smaller shapes lie on the line (see slide 10).
6. Plot the normal density function for sepal length (X1). To do that, first calculate the
sample mean (µ1) and the sample variance.
7. Plot the bivariate normal density function f(X1, X2) for sepal length (X1) and sepal width
(X2). To do that, first calculate the sample mean of sepal length (µ1), the sample mean of
sepal width (µ2), and the covariance matrix Σ.
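
The computations behind steps 3 and 5 to 7 can be sketched in NumPy as follows. This is only an illustrative sketch, not the assignment's reference solution: the array `X`, the labels `y`, and all function names are hypothetical, the toy points are placeholders rather than real Iris rows, and the n−1 denominator for the sample variance is an assumption. Plotting is left out; the returned arrays would be passed to whatever plotting calls the course slides use.

```python
import numpy as np

# Hypothetical toy points standing in for (sepal length, sepal width).
# These are placeholders, NOT real Iris rows.
X = np.array([[5.0, 3.5],
              [6.0, 3.0],
              [7.0, 3.0],
              [6.5, 2.5]])
y = np.array([0, 0, 1, 1])  # hypothetical class labels

a = np.array([-2.75, 2.75])  # direction vector spanning the line l


def project_onto_line(points, direction):
    """Orthogonally project each row of `points` onto span{direction}."""
    # Scalar coefficient (a . x) / (a . a) for every point x.
    coeffs = points @ direction / (direction @ direction)
    # Each projected point is its coefficient times the direction vector.
    return np.outer(coeffs, direction)


def class_means(points, labels):
    """Return {label: mean point} for each distinct label."""
    return {int(c): points[labels == c].mean(axis=0) for c in np.unique(labels)}


def normal_pdf(x, mu, var):
    """Univariate normal density with mean `mu` and variance `var`."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def bivariate_normal_pdf(x, mu, cov):
    """Bivariate normal density at `x` (shape (..., 2)) for mean `mu` and 2x2 `cov`."""
    d = x - mu
    inv = np.linalg.inv(cov)
    # Quadratic form d^T cov^{-1} d, evaluated for every point in the batch.
    quad = np.einsum("...i,ij,...j->...", d, inv, d)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))


P = project_onto_line(X, a)           # projected points, all on x2 = -x1
means = class_means(X, y)             # one mean point per class
mu1 = X[:, 0].mean()                  # sample mean of the first attribute
var1 = X[:, 0].var(ddof=1)            # sample variance (n-1 denominator: an assumption)
cov = np.cov(X, rowvar=False)         # 2x2 sample covariance matrix
```

Because every projected point is a multiple of (−2.75, 2.75)^T, its two coordinates always sum to zero, which is a quick check that the projections lie on x2 = −x1.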

Numeric Data Analysis

Download the magic04.data file from the UCI ML Repository. The dataset has 10 real-valued
attributes followed by a final column holding the class label, which is categorical and which you
will ignore for this assignment. Assume that attributes are numbered starting from 0.

Write a script to answer the following questions.

1. Compute the multivariate mean vector.
2. Compute the sample covariance matrix as inner products between the columns of the centered
data matrix (see Eq. (2.38) in chapter 2).
3. Compute the sample covariance matrix as a sum of outer products of the centered data points (see
Eq. (2.39) in chapter 2).
4. Compute the correlation between Attributes 1 and 2 by computing the cosine of the angle
between the centered attribute vectors. Plot the scatter plot between these two attributes.
5. Assuming that Attribute 1 is normally distributed, plot its probability density function.
6. Which attribute has the largest variance, and which attribute has the smallest variance? Print
these values.

7. Which pair of attributes has the largest covariance, and which pair of attributes has the smallest
covariance? Print these values.
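
The questions above can be sketched in NumPy as follows. This is a hedged sketch rather than a reference solution: the random matrix `D` is a hypothetical stand-in for the numeric columns of magic04.data, all variable and function names are illustrative, and the 1/n normalization is an assumption that should be checked against Eqs. (2.38) and (2.39) in the course text.

```python
import numpy as np

# Hypothetical stand-in for the numeric columns of the data file.
# Assumption: the real file is comma-separated, so it could be read with
# something like np.loadtxt(path, delimiter=",", usecols=range(10)).
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 4))

n = D.shape[0]
mu = D.mean(axis=0)   # multivariate mean vector
Z = D - mu            # centered data matrix

# Covariance as inner products between the columns of the centered matrix.
# The 1/n factor is an assumption; use 1/(n-1) if the course text does.
cov_inner = Z.T @ Z / n

# Covariance as a sum of outer products of the centered data points.
cov_outer = sum(np.outer(z, z) for z in Z) / n


def cosine_correlation(Z, i, j):
    """Correlation of attributes i and j as the cosine of the angle
    between their centered attribute vectors."""
    zi, zj = Z[:, i], Z[:, j]
    return zi @ zj / (np.linalg.norm(zi) * np.linalg.norm(zj))


# Attribute variances sit on the diagonal of the covariance matrix.
variances = np.diag(cov_inner)
largest_var = int(np.argmax(variances))
smallest_var = int(np.argmin(variances))

# Covariances of distinct attribute pairs sit strictly above the diagonal.
iu = np.triu_indices_from(cov_inner, k=1)
pair_covs = cov_inner[iu]
k_max, k_min = np.argmax(pair_covs), np.argmin(pair_covs)
max_pair = (int(iu[0][k_max]), int(iu[1][k_max]))
min_pair = (int(iu[0][k_min]), int(iu[1][k_min]))

print("mean vector:", mu)
print("correlation(1, 2):", cosine_correlation(Z, 1, 2))
print("largest / smallest variance attribute:", largest_var, smallest_var)
print("largest / smallest covariance pair:", max_pair, min_pair)
```

The two covariance computations should agree up to floating-point error, so comparing them with `np.allclose` is a useful sanity check before moving on to the later questions.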