Assignment: Data Science using Python (BCA-DSE4.2)
Date of Submission: 11-3-2024
1. What is NumPy, and why is it used in data analysis?
2. List three mathematical functions provided by NumPy.
3. Explain the difference between NumPy array size and shape.
4. Given a dataset, apply NumPy random functions to generate sample data for analysis.
5. Use NumPy to perform basic arithmetic operations on an array of numbers.
6. What are the main differences between a Pandas Series and a DataFrame?
7. List two methods for handling missing values in Pandas.
8. Explain how the Pandas read_csv function differs from to_csv, and why both are essential for data analysis.
9. Describe the process of data cleansing using Pandas.
10. Apply the Pandas concat() function to combine multiple DataFrame objects into one.
11. Using a sample dataset, demonstrate how to handle missing values effectively.
12. Analyze the implications of using join() vs. append() in Pandas for merging data.
13. Answer the following questions based on the CSV file given below.

Market        Region             No_of_Orders  Profit      Sales
Africa        Western Africa     251           -12,901.51  78,476.06
Africa        Southern Africa    85            11,768.58   51,319.50
Africa        North Africa       182           21,643.08   86,698.89
Africa        Eastern Africa     110           8,013.04    44,182.60
Africa        Central Africa     103           15,606.30   61,689.99
Asia Pacific  Western Asia       382           -16,766.90  124,312.24
Asia Pacific  Southern Asia      469           67,998.76   351,806.60
Asia Pacific  Southeastern Asia  533           20,948.84   329,751.38
Asia Pacific  Oceania            646           54,734.02   408,002.98
Asia Pacific  Eastern Asia       414           72,805.10   315,390.77
Asia Pacific  Central Asia       37            -2,649.76   8,190.74
Europe        Western Europe     964           82,091.27   656,637.14
Europe        Southern Europe    338           18,911.49   215,703.93
Europe        Northern Europe    367           43,237.44   252,969.09
Europe        Eastern Europe     241           25,050.69   108,258.93
LATAM         South America      496           12,377.59   210,710.49
LATAM         Central America    930           74,679.54   461,670.28
LATAM         Caribbean          288           13,529.59   116,333.05
USCA          Western US         490           44,303.65   251,991.83
USCA          Southern US        255           19,991.83   148,771.91
USCA          Eastern US         443           47,462.04   264,973.98
USCA          Central US         356           33,697.43   170,416.31
USCA          Canada             49            7,246.62    26,298.81

(i) List all the columns found in the sales dataset.
(ii) What units are used to measure sales and profits in the dataset?
(iii) Explain how the profit is calculated in the sales dataset.
(iv) Describe the relationship between the number of orders and sales based on the dataset.
(v) Calculate the total profit for a specific region using the dataset.
(vi) Using the dataset, identify which region has the highest number of orders.
(vii) Analyze the dataset to find which market has the highest average sales.
(viii) Compare the profits across different regions within a single market.
(ix) Evaluate the performance of each market based on the profit margins provided in the dataset.
(x) Assess which region demonstrates the most significant disparity between the number of orders and sales volume.
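
Hint for question 13: the sketch below shows one possible way to load the sales data into Pandas before attempting parts (i) to (x). It is only a starting point. The assignment does not name the CSV file or its exact layout, so the in-memory string `csv_text` and the variable names `sales`, `top` and `profit_by_market` are assumptions made for illustration, and only five rows of the table are included.

```python
import io

import pandas as pd

# Assumption: the real file is comma-separated with quoted numeric fields.
# An in-memory string with a few rows from the table stands in for it here.
csv_text = '''Market,Region,No_of_Orders,Profit,Sales
Africa,Western Africa,251,"-12,901.51","78,476.06"
Africa,Southern Africa,85,"11,768.58","51,319.50"
Europe,Western Europe,964,"82,091.27","656,637.14"
LATAM,Central America,930,"74,679.54","461,670.28"
USCA,Canada,49,"7,246.62","26,298.81"
'''

# thousands="," makes read_csv parse values such as "78,476.06" as numbers
# instead of leaving them as strings.
sales = pd.read_csv(io.StringIO(csv_text), thousands=",")

# Column names, as asked in part (i).
print(sales.columns.tolist())

# Region with the most orders among the rows loaded here.
top = sales.loc[sales["No_of_Orders"].idxmax(), "Region"]
print(top)

# Total profit per market among the rows loaded here.
profit_by_market = sales.groupby("Market")["Profit"].sum()
print(profit_by_market)
```

With the full file, replacing `io.StringIO(csv_text)` by the file path should give the same structure; the results printed above cover only the five sample rows, not the whole dataset.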