Chapter 008-Data Analysis Techniques-Update
❑ Knowing the different statistical analysis methods and how to use them can
help you explore data, find patterns, and discover trends in your research.
❑ Choosing the appropriate statistical tests for each research question is the
most challenging feature in statistics but also the most necessary in order to
❑ The normal distribution is a probability distribution that is symmetric about the mean, showing that data near
the mean are more frequent in occurrence than data far from the mean.
❑ BECAUSE in most cases, the distribution of many statistical tests is normal or follows
❑ It also indicates the type of analysis that can be performed on the data (selection of tests)
- Bell-shaped
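The symmetry and bell shape described above can be checked numerically. The following is a minimal sketch using Python's standard library; the mean of 100 and standard deviation of 15 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical population: mean 100, standard deviation 15 (illustrative values).
dist = NormalDist(mu=100, sigma=15)

# Symmetry: the density is the same at equal distances on either side of the mean.
density_below = dist.pdf(100 - 10)
density_above = dist.pdf(100 + 10)

# Bell shape: values near the mean are more likely than values far from it.
density_at_mean = dist.pdf(100)
density_far = dist.pdf(100 + 45)

# Share of the distribution that lies within one standard deviation of the mean.
within_one_sd = dist.cdf(115) - dist.cdf(85)

print(f"pdf(90) = {density_below:.5f}, pdf(110) = {density_above:.5f}")
print(f"pdf(100) = {density_at_mean:.5f}, pdf(145) = {density_far:.5f}")
print(f"P(85 < X < 115) = {within_one_sd:.4f}")
```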
Parametric Statistical Tests
❑ Parametric tests are hypothesis testing procedures which assume that the variables of
interest are measured on an interval or ratio scale and that the observations are drawn from
a normally distributed population.
❑ When the dependent variable is measured on a continuous scale, then a parametric test
should typically be selected.
❑ These tests include t-tests, F-tests, z-tests and ANOVA tests.
❑ The appropriate use of such tests requires one to check whether the data fulfil certain
assumptions or conditions.
❖ First, observations should be independent i.e. the occurrence of A should not affect
the probability of B.
❖ Second, the data should follow a normal distribution with a given mean and variance
(for the residuals of a model, a mean of zero).
❑ Statistical tests such as Kolmogorov-Smirnov, Shapiro-Wilk and D’Agostino-Pearson
test the null hypothesis that the sample data come from a normal distribution.
Parametric Statistical Tests Cont’d
t-test:
i. Independent t-test (Student's):
- one-sample or two-sample t-test
- independent groups (males/females, Malay/Chinese, case/control)
ii. Paired t-test:
- dependent samples (before/after, 2 methods used on the same patients)
ANOVA
- Comparing between more than 2 groups (Malay/Chinese/Indians)
Pearson correlation
- Correlation between 2 quantitative variables (weight and height; weight and age).
Example 1: t-test
A t-test is used when you have 2 conditions and would like to test whether the
difference between them is significant.
In this example, I want to see if there is a significant difference between
my 2 conditions.
❑ My data came from the same people. Condition A was recorded in the
morning while Condition B was recorded in the afternoon.
❑ array1: values from Condition A
❑ array2: values from Condition B
❑ tails: choose 1 for a one-tailed test if you have a directional hypothesis
(you can predict the direction of the effect); otherwise choose 2 for a
two-tailed test.
❑ type: 1 - if the data come from the same participants
❑ type: 2 - if the data come from different groups and the variances
of the groups are the same
❑ type: 3 - if the data come from different groups and the variances
of the groups are not the same
With this p-value, the difference between the 2 conditions is significant
at the 0.05 level.
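For the same-participants case (type 1), the test statistic behind the p-value can be sketched by hand. The measurements below are hypothetical, since the slide does not list its data, and the arguments array1, array2, tails and type are assumed here to be those of Excel's T.TEST function. The sketch computes the paired t statistic and compares it with a tabulated two-tailed critical value rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements from the same 8 participants (not taken from the slides).
condition_a = [72, 75, 68, 80, 77, 71, 69, 74]  # array1: recorded in the morning
condition_b = [78, 79, 72, 85, 80, 77, 70, 79]  # array2: recorded in the afternoon

# A paired test works on the within-participant differences.
differences = [b - a for a, b in zip(condition_a, condition_b)]
n = len(differences)

# t = mean difference / standard error of the mean difference.
t_stat = mean(differences) / (stdev(differences) / sqrt(n))
df = n - 1

# Two-tailed critical value for df = 7 at the 0.05 level, from a standard t table.
T_CRITICAL = 2.365
significant = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}, significant at 0.05: {significant}")
```

With different groups (type 2 or 3) the differences cannot be paired, and the two-sample formulas would be used instead.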
Demo Session
Example 2: f-test
We can use an F-test to test whether the variances of 2 populations are equal.
Female Male
935 978
955 782
967 905
1002 973
1000 1006
964 1017
1952 995
933
H0: σ₁² = σ₂² (The null hypothesis states that the variances of both groups are equal)
H1: σ₁² ≠ σ₂² (The alternate hypothesis states that the variances of both groups are not equal)
The F-value is greater than the critical one-tail value, therefore we reject H0 (the null hypothesis).
To carry out this test in MS Excel: click the Data tab -> Analysis group ->
Data Analysis -> select F-Test Two-Sample for Variances -> OK
-> enter the Variable 1 and Variable 2 ranges -> choose an Output Range -> OK
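The F statistic that this tool reports is the ratio of the two sample variances. A minimal sketch with the Female and Male values from the table above follows; it assumes that the unpaired value 933 belongs to the Female column, which the extracted table does not make certain.

```python
from statistics import variance

# Values from the table; 933 is ASSUMED to belong to the Female column.
female = [935, 955, 967, 1002, 1000, 964, 1952, 933]
male = [978, 782, 905, 973, 1006, 1017, 995]

# Sample variances (n - 1 in the denominator).
var_female = variance(female)
var_male = variance(male)

# F statistic: variance of Variable 1 divided by variance of Variable 2.
f_stat = var_female / var_male
df_numerator = len(female) - 1
df_denominator = len(male) - 1

print(f"Female variance = {var_female:.1f}, Male variance = {var_male:.1f}")
print(f"F = {f_stat:.2f} with ({df_numerator}, {df_denominator}) degrees of freedom")
```

The resulting F value is then compared with the critical value for these degrees of freedom, as in the Excel output.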
Demo Session
Example 3: ANOVA
We can use an ANOVA test when comparing more than 2 groups.
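As a rough illustration, the one-way ANOVA F statistic can be computed by hand as the ratio of between-group to within-group variability. The three groups below are hypothetical values, not data from the slides.

```python
from statistics import mean

# Hypothetical measurements for three independent groups (illustrative only).
groups = {
    "group_1": [23, 25, 21, 27, 24],
    "group_2": [30, 28, 33, 31, 29],
    "group_3": [22, 20, 25, 23, 21],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(
    len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
)

# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum(
    (x - mean(values)) ** 2 for values in groups.values() for x in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F = mean square between / mean square within.
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F relative to the tabulated critical value suggests that at least one group mean differs from the others.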
❖ Pie Charts.
Bivariate Analysis
❑ Identifies the relationship between 2 variables
❑ It is a form of statistical analysis, used to find out if there is a relationship between two sets of values.
❑ Another example: suppose you want to find the relationship between caloric intake and weight.
❑ Caloric intake would be your independent variable, X, and weight would be your dependent variable, Y.
❑ Let’s say you had a caloric intake of 3,000 calories per day and a weight of 300 lbs.
❑ You would write that with the x-variable followed by the y-variable: (3000, 300).
❑ (X,Y)=(100,56),(23,84),(398,63),(56,42)
❖ Correlation Coefficients
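One common correlation coefficient is Pearson's r. The sketch below computes it by hand for the four (X, Y) pairs listed above; choosing Pearson's r for these pairs is an assumption made for illustration, since the slide does not say which coefficient it has in mind.

```python
from math import sqrt
from statistics import mean

# The (X, Y) pairs listed in the slide.
pairs = [(100, 56), (23, 84), (398, 63), (56, 42)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

mean_x = mean(xs)
mean_y = mean(ys)

# Sum of products of deviations, and sums of squared deviations.
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
sum_xx = sum((x - mean_x) ** 2 for x in xs)
sum_yy = sum((y - mean_y) ** 2 for y in ys)

# Pearson's r lies between -1 (perfect negative) and +1 (perfect positive).
r = sum_xy / sqrt(sum_xx * sum_yy)

print(f"Pearson r = {r:.4f}")
```

A value of r close to zero indicates little linear relationship between the two variables.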
Multivariate Analysis
❑ Identifies the relationship between more than 2 variables
Blood group   Frequency   %
A             12          24
B             18          36
AB            5           10
O             15          30
Total         50          100
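The percentage column of a simple frequency table is each frequency divided by the total, times 100. A minimal sketch using the blood group counts from the table:

```python
# Frequencies from the blood group table.
frequencies = {"A": 12, "B": 18, "AB": 5, "O": 15}

total = sum(frequencies.values())

# Percentage of the total for each blood group.
percentages = {group: 100 * count / total for group, count in frequencies.items()}

for group, count in frequencies.items():
    print(f"{group:<6}{count:>4}{percentages[group]:>8.0f}")
print(f"{'Total':<6}{total:>4}{sum(percentages.values()):>8.0f}")
```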
Example 2
❑ Table 2: Distribution of 50 patients at the hospital in
February 2022 according to their age.
Age     Frequency   %
<20     12          24
20-39   18          36
40-49   5           10
50+     15          30
Total   50          100
Example 3: Complex Frequency Distribution Table
             Lung cancer
Smoking      Cases          Control        Total
             No.    %       No.    %       No.    %
Smoker       15     75%     8      20%     23     38.33%
Non smoker   5      25%     32     80%     37     61.67%
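In a table like this, the percentages are column percentages: each count is divided by its own column total. The following sketch reproduces the figures from the counts in the table; the variable names are illustrative.

```python
# Counts from the table: smoking status by lung cancer status.
counts = {
    "Smoker": {"Cases": 15, "Control": 8},
    "Non smoker": {"Cases": 5, "Control": 32},
}

# Column totals (all cases, all controls) and the grand total.
column_totals = {
    column: sum(row[column] for row in counts.values())
    for column in ("Cases", "Control")
}
grand_total = sum(column_totals.values())

# Column percentages, plus each row total as a percentage of the grand total.
table = {}
for status, row in counts.items():
    row_total = sum(row.values())
    table[status] = {
        "Cases %": round(100 * row["Cases"] / column_totals["Cases"], 2),
        "Control %": round(100 * row["Control"] / column_totals["Control"], 2),
        "Total": row_total,
        "Total %": round(100 * row_total / grand_total, 2),
    }

for status, values in table.items():
    print(status, values)
```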
quantities.
◼ Line graph
Nominal Data Presentation
It is all the same information
(based on the same data),
just a different method of presentation.
Summary Statistics (Mathematical Presentation)
❑Summary statistics summarize and provide information about your
sample data.
❑It tells you something about the values in your data set.
❑This includes where the mean lies and whether your data is skewed.
Summary statistics fall into three main categories:
❑ Measures of location (also called central tendency).
❖ Mean (also called the arithmetic mean or average).
❖ Geometric mean (used for interest rates and other types of growth).
❖ Trimmed mean (the mean with outliers excluded, i.e. data points that lie
an abnormal distance from the other points).
❖ Median (the middle of a dataset).
❑ Measures of spread.
❖ Range (how spread out your data is).
❖ Interquartile range (where the “middle fifty” percent of your data is).
❖ Quartiles (boundaries for the lowest, middle and upper quarters of data).
❖ Skewness (does your data have mainly low, or mainly high values?).
❖ Kurtosis (a measure of how much data is in the tails).
❑ Graphs/charts.
❖ Histogram.
❖ Frequency Distribution Table.
❖ Box plot.
❖ Bar chart.
❖ Scatter plot.
❖ Pie chart.
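The measures of location and spread listed above can be sketched with Python's standard library. The data set is hypothetical and includes one large value (40) so that the effect of an outlier is visible. The trimmed mean here drops one value from each end, and skewness and kurtosis use simple moment-based formulas; other definitions exist.

```python
from statistics import geometric_mean, mean, median, quantiles

# Hypothetical sample with one unusually large value (illustrative only).
data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 40]
n = len(data)

# Measures of location.
arithmetic_mean = mean(data)
geo_mean = geometric_mean(data)
trimmed_mean = mean(sorted(data)[1:-1])  # drop the smallest and largest value
middle = median(data)

# Measures of spread.
data_range = max(data) - min(data)
q1, q2, q3 = quantiles(data, n=4)  # quartile boundaries
iqr = q3 - q1

# Moment-based skewness and kurtosis (population formulas).
m2 = sum((x - arithmetic_mean) ** 2 for x in data) / n
m3 = sum((x - arithmetic_mean) ** 3 for x in data) / n
m4 = sum((x - arithmetic_mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2

print(f"mean={arithmetic_mean}, geometric mean={geo_mean:.2f}, "
      f"trimmed mean={trimmed_mean}, median={middle}")
print(f"range={data_range}, quartiles=({q1}, {q2}, {q3}), IQR={iqr}")
print(f"skewness={skewness:.2f}, kurtosis={kurtosis:.2f}")
```

Because of the outlier, the mean sits above the median and the skewness is positive, while the trimmed mean stays closer to the median.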
Determining the Appropriate Statistical Test
What statistical test should I use for my data?
❑ Regardless of which strategy for the statistical analysis methods
utilize.
https://forms.gle/SPizKfEhKFNGrbNh6