MCD2080 Business Statistics Group Assignment-Final
Trimester 2, 2024
Group Assignment
Glassdoor is a free digital platform that gathers information and reviews from employees or former
employees about companies, salaries, and even job openings.
The dataset used for this group assignment contains a random sample of job advertisements
from Glassdoor.com. It is used to analyse the current job trends in the data science field based on
job positions, company size, software skills, etc.
Refer to the workbook labelled Job Advertisements.xlsx in the Group assignment section on
Moodle. This data can be used to understand various software skill requirements and other factors
in job advertisements for Data Analysts, Data Engineers and Data Scientists. In this assignment,
your task is to investigate and report how the expected salary is associated with various factors
such as job types and software skills requirements.
Data definition:
In the file “Job Advertisements.xlsx”, you are provided with both numeric and categorical data.
Note that this data has already been cleaned for you, and any missing records are removed. The
following table contains the data definition.
Column  Column Name       Data Definition
A       Advertisement ID  The unique identifier for the job posting
B       Job Type          A simplified job title
C       Company Name      Full name of the company the advertisement is posted for
D       Company Size      Range of the number of employees in the company
E       Ownership Type    The company's type of ownership (8 ownership types are provided)
F       Industry          The industry to which the organisation belongs
G       Min Salary        Minimum expected salary ($'000 per year) for the job
H       Expected Salary   Average expected salary ($'000 per year) for the job
I       Python            A binary indicator of whether the job requires Python knowledge/skills (1: Yes, 0: No)
J       AWS               A binary indicator of whether the job requires AWS knowledge/skills (1: Yes, 0: No)
K       Excel             A binary indicator of whether the job requires Excel knowledge/skills (1: Yes, 0: No)
Purpose:
We wish to explore the relationships between the expected salary and the other independent
variables using the following statistical tools:
2. Summary Statistics
3. Confidence Intervals
4. Hypothesis Testing
5. Regression Analysis
Assignment questions:
1. (a) Discuss and compare the average expected salary of Data Engineers and Data Analysts across
the following factors:
• Ownership
• Industry
Construct appropriate charts to support your discussion, and keep the discussion succinct.
Your answer to this part should not exceed two pages.
(b) We wish to compare the distribution of the expected salary between Data Analysts and
Data Engineers.
Generate summary statistics and histograms, and use them to compare the two distributions.
(14 marks)
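For part (b), a minimal Python sketch of the comparison, assuming the expected salaries for each job type have already been copied out of the workbook into plain lists. The helper names `summarise` and `histogram` are hypothetical, and the salaries in the usage block are made-up placeholders, not values from Job Advertisements.xlsx. Both groups use the same bin edges so that the two histograms can be compared directly.

```python
import statistics
from collections import Counter


def summarise(salaries):
    """Summary statistics for one group of expected salaries ($'000 per year)."""
    # method="inclusive" interpolates quartiles like Excel's QUARTILE.INC
    q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
    return {
        "n": len(salaries),
        "mean": statistics.mean(salaries),
        "median": statistics.median(salaries),
        "stdev": statistics.stdev(salaries),  # sample SD, n - 1 denominator
        "min": min(salaries),
        "Q1": q1,
        "Q3": q3,
        "max": max(salaries),
    }


def histogram(salaries, bin_width=20, start=0):
    """Counts per bin [lower, lower + bin_width), keyed by each bin's lower edge."""
    counts = Counter(int((s - start) // bin_width) for s in salaries)
    return {start + k * bin_width: counts[k] for k in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical placeholder salaries, NOT taken from Job Advertisements.xlsx
    groups = {
        "Data Analyst": [55, 62, 70, 74, 81, 90],
        "Data Engineer": [80, 95, 102, 110, 118, 135],
    }
    for job, data in groups.items():
        print(job, summarise(data))
        # Same bin_width and start for both groups so the histograms line up
        for lower, count in histogram(data).items():
            print(f"  {lower:>3}-{lower + 20:<3} {'#' * count}")
```

When comparing the two outputs, look at the centre (mean, median), the spread (standard deviation, interquartile range Q3 minus Q1) and the shape of each histogram, for example any skew or outliers.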
Week 7 checkpoint: complete Questions 2 and 3.
2. We will now compare the expected salary of Data Analysts and Data Engineers.
(a) Calculate the 95% confidence interval estimate of the true average expected salary for Data
Analysts and for Data Engineers. Report your results using the table below.
(b) Calculate the 95% confidence interval estimate of the true average expected salary for Data
Analyst and Data Engineer jobs that require each of the following software skills:
• Excel
• Python
• AWS
For each skill, report your results in the format shown in the examples provided.
For part (c) only, your answer should be less than one page long.
(20 marks)
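A minimal sketch of the t-based interval used in parts (a) and (b): xbar ± t(0.025, n - 1) × s / √n, where xbar is the sample mean, s the sample standard deviation and n the sample size. In Excel the margin of error is `CONFIDENCE.T(0.05, s, n)`. Python's standard library has no t quantile function, so this sketch computes the critical value numerically (Simpson's rule for the t CDF, then bisection); the helper names `t_cdf`, `t_crit` and `mean_ci` are hypothetical, and the rows in the usage block are made-up placeholders, not values from the workbook.

```python
import math
import statistics


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: 0.5 + Simpson's-rule integral of the density on [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_crit(conf, df):
    """Two-sided critical value, comparable to Excel's T.INV.2T(1 - conf, df)."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):  # bisection works because t_cdf increases with x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(sample, conf=0.95):
    """t-based interval xbar +/- t * s / sqrt(n) for a population mean (sigma unknown)."""
    n = len(sample)
    margin = t_crit(conf, n - 1) * statistics.stdev(sample) / math.sqrt(n)
    xbar = statistics.mean(sample)
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Hypothetical (job type, Excel required, expected salary) rows, NOT real data
    ads = [
        ("Data Analyst", 1, 62), ("Data Analyst", 1, 58), ("Data Analyst", 1, 66),
        ("Data Analyst", 0, 75), ("Data Analyst", 0, 71),
        ("Data Engineer", 1, 98), ("Data Engineer", 1, 104),
        ("Data Engineer", 0, 112), ("Data Engineer", 0, 120),
    ]
    for job in ("Data Analyst", "Data Engineer"):
        overall = [s for j, _, s in ads if j == job]
        excel = [s for j, x, s in ads if j == job and x == 1]
        print(job, "all:", mean_ci(overall), "Excel = 1:", mean_ci(excel))
```

For part (b), the same `mean_ci` call is simply repeated on the subset of each job type whose Python, AWS or Excel indicator equals 1.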
3. We wish to disentangle the relationship between expected salary and Excel skills in each job
type.
Use your knowledge of hypothesis testing to answer the following questions.
Hint: For each test, state the hypotheses, p-value and conclusion in the context of the question.
(6 marks)
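The brief does not specify which test to use. One reasonable choice, assumed here, is a two-sided Welch two-sample t-test within each job type, comparing the mean expected salary of ads that require Excel with those that do not (H0: the two means are equal). In Excel the corresponding p-value is `T.TEST(array1, array2, 2, 3)`, where type 3 means unequal variances. The helper names are hypothetical and the salaries in the usage block are made-up placeholders; `t_cdf` repeats the numerical t CDF so the block is self-contained.

```python
import math
import statistics


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: 0.5 + Simpson's-rule integral of the density on [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def welch_t_test(a, b):
    """Two-sided Welch t-test of H0: mu_a = mu_b against H1: mu_a != mu_b."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom (usually not a whole number)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p = 2 * (1 - t_cdf(abs(t), df))
    return t, df, min(1.0, max(0.0, p))  # clamp tiny numerical overshoot


if __name__ == "__main__":
    alpha = 0.05
    # Hypothetical expected salaries split by the Excel indicator, NOT real data
    samples = {
        "Data Analyst": ([62, 58, 66, 60], [75, 71, 69, 78]),
        "Data Engineer": ([98, 104, 101], [112, 120, 109, 115]),
    }
    for job, (excel_yes, excel_no) in samples.items():
        t, df, p = welch_t_test(excel_yes, excel_no)
        decision = "reject H0" if p < alpha else "do not reject H0"
        print(f"{job}: t = {t:.3f}, df = {df:.1f}, p = {p:.4f} -> {decision}")
```

If a question asks about a specific direction (for example, whether Excel is associated with a lower salary), a one-sided test would be used instead, and its p-value is half the two-sided value when the sample difference lies in the hypothesised direction.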
Expected salary and all other variables, namely the three software skills, the two job types (data
analysts and data engineers) and the minimum salary. You are required to produce one
multiple regression output.
This section should include an analysis of the statistical significance of each factor in the model.
Highlight the key factors that the multiple regression identifies as drivers of expected
salary.
(15 marks)
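A minimal sketch of the single multiple regression requested here, written in plain Python so each step of ordinary least squares is visible: coefficients b = (X'X)⁻¹X'y, standard errors from the diagonal of s²(X'X)⁻¹, then t statistics and two-sided p-values. It assumes job type is coded as one hypothetical 0/1 dummy (1 = Data Engineer, 0 = Data Analyst); a single dummy for two categories avoids perfect collinearity with the intercept. The helper names and the rows in the usage block are made up for illustration. Excel's Analysis ToolPak Regression tool reports the same quantities (Coefficients, Standard Error, t Stat, P-value).

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: 0.5 + Simpson's-rule integral of the density on [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (fine for small X'X matrices)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: check for collinear predictors")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols(X, y, names):
    """Least-squares fit of y on X (rows exclude the intercept); returns table and R^2."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    df = n - k  # residual degrees of freedom
    s2 = sse / df
    table = []
    for i, (name, bi) in enumerate(zip(["Intercept"] + names, b)):
        se = math.sqrt(s2 * inv[i][i])
        t = bi / se
        p = min(1.0, max(0.0, 2 * (1 - t_cdf(abs(t), df))))
        table.append((name, bi, se, t, p))
    return table, 1 - sse / sst


if __name__ == "__main__":
    # Hypothetical rows: Python, AWS, Excel, Data Engineer (1) vs Analyst (0), Min Salary
    X = [
        [1, 0, 1, 0, 50], [0, 0, 1, 0, 45], [1, 1, 0, 1, 80], [0, 1, 0, 1, 75],
        [1, 0, 0, 0, 55], [1, 1, 1, 1, 90], [0, 0, 0, 0, 40], [1, 0, 1, 1, 70],
        [0, 1, 1, 0, 60], [1, 1, 0, 0, 65],
    ]
    y = [70, 60, 105, 98, 76, 118, 55, 95, 82, 88]  # hypothetical Expected Salary
    names = ["Python", "AWS", "Excel", "Data Engineer", "Min Salary"]
    table, r2 = ols(X, y, names)
    for name, bi, se, t, p in table:
        print(f"{name:<14} b = {bi:8.3f}  SE = {se:7.3f}  t = {t:7.3f}  p = {p:.4f}")
    print(f"R^2 = {r2:.3f}")
```

A predictor whose p-value falls below the chosen significance level (commonly 0.05) is statistically significant given the other variables in the model; the sign and size of its coefficient indicate the direction and magnitude of its association with expected salary.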
5. Based on the statistical analysis and results in questions 1 to 4, draw conclusions on the
following:
(20 marks)
Assignment marks
The maximum total mark for the assignment is 175. Your total score will be composed of two
parts:
Please note that any group member who does not give feedback to the other group members will be
awarded zero marks.
You will be required to fill in the peer evaluation on Teammates to be eligible for this component.
Please note that the Unit Leader reserves the right to adjust individual report marks
based on the peer evaluation. Should the feedback indicate that an individual did not
contribute to the group assignment, the reporting mark will be adjusted to zero, implying
that the individual’s group assignment contribution to their final grade will be 0%.
Report requirements:
Presentation:
Use PowerPoint or another cloud-based app, e.g. Google Slides, Prezi or Visme.
• The submission link is set up using an Assignment tool on Moodle. Please submit the group
report/answers as a Word document or PDF.
• If a question has sub-parts, for example (a), (b)…, label each part clearly.
• DO NOT click on "submit all and finish" before you finish all questions.
• ONLY 1 attempt is allowed for the Assignment. Group members should appoint one member
to submit on behalf of the group.