MCD2080 Business Statistics Group Assignment-Final

MCD2080 Business Statistics

Trimester 2, 2024

Group Assignment

Problem background:

Glassdoor is a free digital platform that gathers information and reviews from employees or former
employees about companies, salaries, and even job openings.

The dataset used for this group assignment contains a random sample of job advertisements
from Glassdoor. It is used to analyse the current job trends in the data science field based on
job positions, company size, software skills, etc.

Refer to the workbook labelled Job Advertisements.xlsx in the Group assignment section on
Moodle. This data can be used to understand various software skill requirements and other factors
in job advertisements for Data Analysts, Data Engineers and Data Scientists. In this assignment,
your task is to investigate and report how the expected salary is associated with various factors
such as job types and software skills requirements.

Data definition:

In the file “Job Advertisements.xlsx”, you are provided with both numeric and categorical data.
Note that this data has already been cleaned for you, and any missing records are removed. The
following table contains the data definition.
Column  Column Name       Data Definition
A       Advertisement ID  The unique identifier for the job posting
B       Job Type          A simplified job title
C       Company Name      Full name of the company the advertisement is posted for
D       Company Size      Range of number of employees in the company
E       Ownership Type    Company type of ownership. 8 ownership types provided
F       Industry          The industry to which the organisation belongs
G       Min Salary        Minimum expected salary ($000 per year) for the job
H       Expected Salary   Average expected salary ($000 per year) for the job
I       Python            A binary indicator of whether the job requires Python knowledge/skills (1: Yes, 0: No)
J       AWS               A binary indicator of whether the job requires AWS knowledge/skills (1: Yes, 0: No)
K       Excel             A binary indicator of whether the job requires Excel knowledge/skills (1: Yes, 0: No)

We wish to explore the relationships between the expected salary and other independent
variables. This is done by utilising the following statistical tools:

1. Pivot Tables and Charts

2. Summary Statistics

3. Confidence Intervals

4. Hypothesis Testing

5. Regression Analysis

Assignment questions:

Answer all questions.

Week 4 Check point: Do question 1

1 a). Discuss and compare the average expected salary for Data Engineers and Data Analysts using
the following factors:

• Ownership
• Industry

Construct appropriate charts to support your discussion. Keep your discussion succinct.

Your answer to this question should not be longer than 1-2 pages.
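
As an illustration of what a pivot table of averages computes, the following is a minimal Python sketch. The rows, the ownership labels and the helper name average_by are invented for illustration and are not taken from Job Advertisements.xlsx.

```python
from collections import defaultdict

# Hypothetical rows: (job type, ownership type, expected salary in $000).
# These values are invented for illustration and are not from the workbook.
rows = [
    ("Data Analyst", "Private", 70.0),
    ("Data Analyst", "Private", 80.0),
    ("Data Analyst", "Public", 90.0),
    ("Data Engineer", "Private", 100.0),
    ("Data Engineer", "Public", 110.0),
    ("Data Engineer", "Public", 130.0),
]


def average_by(rows):
    """Return {(job_type, ownership): mean expected salary}."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count] per cell
    for job_type, ownership, salary in rows:
        cell = totals[(job_type, ownership)]
        cell[0] += salary
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


pivot = average_by(rows)
for (job_type, ownership), mean in sorted(pivot.items()):
    print(f"{job_type:<14} {ownership:<8} {mean:6.1f}")
```

The same grouping applies to Industry by swapping the second field; a clustered bar chart of these cell averages is one possible way to support the comparison.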

b). We wish to compare the distribution of the expected salary between data analysts and
data engineers.

Generate Summary statistics and histograms and use them to compare the distributions.

In your discussion, include measures of central tendency, variability and shape.

When discussing, include contextual interpretations of the measures used.

Your answer to this question should not be longer than 2 pages.

(14 marks)
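
The measures named in part (b) can be sketched in Python as follows. This is only an illustration under stated assumptions: the salaries are invented, the function name describe is hypothetical, and skewness is computed with the adjusted Fisher-Pearson sample formula, which may differ from the shape measure your group reports.

```python
import statistics

# Invented expected salaries ($000 per year); not taken from the workbook.
salaries = [60.0, 65.0, 70.0, 75.0, 130.0]


def describe(values):
    """Return mean, median, sample standard deviation and sample skewness."""
    n = len(values)
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    # Adjusted Fisher-Pearson skewness:
    # n / ((n - 1)(n - 2)) * sum(((x - mean) / sd) ** 3)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {"mean": mean, "median": median, "sd": sd, "skewness": skew}


summary = describe(salaries)
for name, value in summary.items():
    print(f"{name:<9} {value:8.2f}")
```

In this invented sample one high salary pulls the mean above the median and gives a positive skewness, which is the kind of contextual reading the question asks for.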

Week 7 Check point: Do questions 2 & 3.

2. We will now explore the relationship between the expected salary of Data Analysts and Data
Engineers.

a). Calculate the 95% Confidence Interval estimate of the true average expected salary for Data
Analysts and Engineers. Report your results using the table below.

Confidence Interval Estimate of Average Expected Salary for Job Types

Job Type         Lower Boundary / Limit   Upper Boundary / Limit
Data Analysts
Data Engineers

b). Calculate the 95% Confidence Interval estimate of the true average expected salary for Data
Analysts and Engineers who have the following software skills:

• Excel

• Python

• AWS

For each variable, report your results using the following format in the examples provided.

Confidence Interval Estimate of Average Expected Salary of Data Analysts requiring Excel Skills

Excel Skills   Lower Boundary / Limit   Upper Boundary / Limit
0 (No)
1 (Yes)

Confidence Interval Estimate of Average Expected Salary of Data Engineers requiring Excel Skills

Excel Skills   Lower Boundary / Limit   Upper Boundary / Limit
0 (No)
1 (Yes)

(Please use a similar format for Python and AWS)

c). Discuss your results obtained in (a) and (b). Remember to discuss answers for all tables.

For part (c) only, the expected length of the answer should be less than a page.

(20 marks)
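
A confidence interval for a mean has the form: sample mean plus or minus critical value times standard error. The Python sketch below illustrates that calculation with invented salaries. It is an approximation under a stated assumption: it uses the normal critical value (about 1.96 for 95%), which is reasonable only for large samples. For small samples the t critical value with n - 1 degrees of freedom gives a wider interval. The function name mean_confidence_interval is hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Invented expected salaries ($000 per year); not taken from the workbook.
values = [70.0, 80.0, 90.0, 100.0, 110.0] * 8  # n = 40


def mean_confidence_interval(values, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 when confidence = 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


lower, upper = mean_confidence_interval(values)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

To fill in each table row, the same function would be applied to the subset of advertisements for that row, for example Data Analysts with Excel = 1.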

3. We wish to disentangle the relationship between expected salary and Excel skills in each job
type. Use your knowledge in Hypothesis Testing to answer the following questions.

a). Do a majority/minority of data analyst roles require Excel skills?

b). Do a majority/minority of data analyst roles require Python skills?

c). Do a majority/minority of data engineer roles require Excel skills?

d). Do a majority/minority of data engineer roles require Python skills?

Hint: For each test, state the hypotheses, p-value and conclusion in the context of the question.

(6 marks)
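
A majority/minority question can be framed as a one-sample test of a proportion against 0.5. The Python sketch below shows one possible setup, assuming a one-tailed z-test with the standard error computed under the null hypothesis. The counts are invented and the function name proportion_z_test is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0=0.5, alternative="greater"):
    """One-sample z-test for a proportion.

    alternative="greater" tests H1: p > p0 (a majority when p0 = 0.5);
    alternative="less" tests H1: p < p0 (a minority when p0 = 0.5).
    """
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # computed under H0
    z = (p_hat - p0) / standard_error
    if alternative == "greater":
        p_value = 1 - NormalDist().cdf(z)
    elif alternative == "less":
        p_value = NormalDist().cdf(z)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return z, p_value


# Invented counts: 60 of 100 advertisements require the skill.
z, p_value = proportion_z_test(successes=60, n=100, alternative="greater")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The direction of the alternative hypothesis should follow the sample proportion: test for a majority when it is above 0.5 and for a minority when it is below 0.5, then state the conclusion in context.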

Week 11 Final presentation and report submission: Do questions 4 & 5.

4. Estimate a multiple regression model to analyse the relationship between:

Expected salary and all other variables, such as three software skills, the two job types (data
analysts and data engineers), and the minimum salary. You are required to produce one
multiple regression output.

This section includes an analysis of the statistical significance of various factors in the model.
Highlight the key factors that the multiple regression reveals as being the drivers of Expected
Salary.
Your answer to this question should be approximately 1 to 1.5 pages.

(15 marks)
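
The structure of such a model can be sketched in Python as below. This is an illustration only, with several assumptions: the rows are invented, the response is generated from made-up coefficients so that the fit can be checked, and the two job types are coded as a single 0/1 indicator (is_engineer, a hypothetical name) because two separate indicators plus an intercept would be perfectly collinear. The sketch returns coefficients only; it does not produce the standard errors, t-statistics or p-values needed to discuss statistical significance.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented predictors: (min_salary, python, aws, excel, is_engineer).
rows = [
    (50, 0, 0, 1, 0), (55, 1, 0, 1, 0), (60, 1, 0, 0, 0), (65, 0, 1, 0, 1),
    (70, 1, 1, 0, 1), (80, 1, 1, 1, 1), (90, 0, 0, 0, 1), (75, 1, 0, 1, 1),
]
# Made-up coefficients used only to generate an exactly linear response.
true_coefs = [10.0, 1.2, 5.0, 3.0, -2.0, 8.0]
y = [true_coefs[0] + sum(b * v for b, v in zip(true_coefs[1:], r)) for r in rows]

coefs = ols(rows, y)
names = ["intercept", "min_salary", "python", "aws", "excel", "is_engineer"]
for name, value in zip(names, coefs):
    print(f"{name:<12} {value:8.3f}")
```

Each skill coefficient is read as the change in expected salary ($000 per year) associated with requiring that skill, holding the other variables fixed.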

5. Based on the statistical analysis and results in questions 1 to 4, draw conclusions on the
following:

a). All factors associated with Expected Salary.

b). The importance of software skills for different job types.

c). Recommendations for job seekers to improve their ability to obtain higher-paying
jobs.

Your answer to this question should be approximately 1 to 1.5 pages.

(20 marks)

Assignment marks

The maximum total mark for the assignment is 175. Your total score will be composed of two
components:

• Final assignment report (Questions 1-5): maximum marks of 75.

• Presentation: maximum marks of 100.
(i). Week 4 check point - 20 (staff: 10 & peer to peer evaluation: 10)
(ii). Week 7 check point - 30 (staff: 15 & peer to peer evaluation: 15)
(iii). Week 11 check point - 40 (staff: 20 & peer to peer evaluation: 20)

Please note that any group member who does not give feedback to other group members will be
awarded zero marks.

You will be required to fill in the peer evaluation on Teammates to be eligible for this component.

Please note that the Unit Leader reserves the right to adjust individual report marks
based on the peer evaluation. Should the feedback indicate that an individual did not
contribute to the group assignment, that individual's report mark will be adjusted to zero, meaning
that the group assignment will contribute 0% to their final grade.

Report requirements:

● All answers should be in font size 12pt and 1.5 spacing.

● Plots and tables must be legible, with appropriate labels to aid readers.
● Statistical results need to be summarised in succinct table formats.

● You will lose marks for poor presentation.


For the presentation, use PowerPoint or a cloud-based app, e.g. Google Slides, Prezi or Visme.

Week 11 Final Assignment submission guidelines

• The submission link is set up using the Assignment tool on Moodle. Please submit the group
report/answers as a Word document or PDF.
• If the question has sub-parts, for example, (a), (b)…, please indicate the labels for each part.
• DO NOT click on "submit all and finish" before you have finished all questions.
• ONLY 1 attempt is allowed for the Assignment. Group members should appoint one member
to submit on behalf of the group.

