SPSS Intermediate Understanding Your Data
Statistical Consulting Website
http://www.flinders.edu.au/library/research/eresearch/statistics-consulting/
or go to Flinders University Website A-Z Index > S > Statistical Consultant
Introductory Level
Introduction to IBM SPSS
Introduction to Statistical Analysis
IBM SPSS - Intermediate Level
Understanding Your Data (Descriptive Statistics, Graphs and Custom Tables)
Correlation and Multiple Regression
Logistic Regression and Survival Analysis
Basic Statistical Techniques for Difference Questions
Advanced Statistical Techniques for Difference Questions
Longitudinal Data Analysis - Repeated Measures ANOVA
Categorical Data Analysis
IBM SPSS - Advanced Level
Structural Equation Modelling using Amos
Linear Mixed Models
Longitudinal Data Analysis - Mixed and Latent Variable Growth Curve Models
Scale Development
Complex Sample Survey Design / ABS and FaHCSIA Confidentialised Datasets
SPSS
Statistical Package for the Social Sciences
PASW
Predictive Analytics Software
What is Statistics?
statistics (stə-tĭs′tĭks)
n.
1. (used with a sing. verb) The mathematics of the collection, organization,
and interpretation of numerical data, especially the analysis of population
characteristics by inference from sampling.
2. (used with a pl. verb) Numerical data.
http://www.thefreedictionary.com/Statistics
Interval Data
Differences between
measurements but no
true zero
Temperature in Celsius,
Standardized exam
score
Ordinal Data
Ordered Categories
(rankings, order, or
scaling)
Nominal Data
Ratio Data
Missing attributes
Missing attribute values
Only aggregated data
Inconsistent data
Different coding
Different naming conventions
Impossible values
Out-of-range values
Noisy data
Errors
Outliers
Inaccurate values
Missing data
Missing data codes for items such as "Not applicable"
and "Refuse to answer" were not pre-coded in
the questionnaire, even though they should have
been
Need to find these cases and replace them with the
appropriate data code
8 = refuse to answer
9 = missing
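The recoding step can be sketched in Python as follows. This is only an illustration under stated assumptions: the codes 8 and 9 come from the slide, while the function name `recode_response`, the example answers and the text labels treated as refusals are hypothetical.

```python
# Codes from the slide: 8 = refuse to answer, 9 = missing.
REFUSED = 8
MISSING = 9

def recode_response(raw):
    """Map a raw questionnaire entry to a numeric data code (hypothetical helper)."""
    if raw is None:
        return MISSING
    text = str(raw).strip().lower()
    if text == "":
        return MISSING
    # Assumed labels for a refusal; a real questionnaire may use others.
    if text in {"refused", "refuse to answer"}:
        return REFUSED
    return int(text)

# Made-up raw entries: valid answers, a blank, a refusal and an empty cell.
raw_answers = ["3", "", "refused", None, "5"]
coded = [recode_response(r) for r in raw_answers]
print(coded)
```

In SPSS itself, codes such as 8 and 9 would then be declared as user-missing values so that they are excluded from calculations.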
Pawel Skuza 2013
Descriptive statistics
Summarising and presenting data
Measures of Central Tendency
Measures of Dispersion or Variability
Central Tendency
To summarise the location of a
distribution
Mode
Median
Mean
Median
The median is the central value of an ordered distribution
Order the data from smallest to largest
For an odd number of data values in the distribution
Median = middle value of the data
For an even number of data values in the distribution
Median = (sum of the middle two values)/2
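The two cases of the median rule can be sketched in Python. The helper name `median` and the example values are made up for illustration; the standard library function `statistics.median` is shown only as a cross-check.

```python
import statistics

def median(values):
    """Median by the rule above: sort, then take the middle value(s)."""
    ordered = sorted(values)          # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [7, 1, 5]        # ordered: 1, 5, 7
even = [7, 1, 5, 3]    # ordered: 1, 3, 5, 7

print(median(odd), statistics.median(odd))
print(median(even), statistics.median(even))
```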
Trimmed Mean
The trimmed mean is produced by discarding
the most extreme values
In SPSS, the trimmed mean is calculated by
discarding the top and bottom 5% of the cases
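A minimal sketch of a 5% trimmed mean in Python, assuming the simplest rule of dropping a whole number of cases from each end. The function name `trimmed_mean` and the scores are hypothetical, and SPSS may treat fractional case counts differently, so this is not guaranteed to reproduce SPSS output exactly.

```python
import math

def trimmed_mean(values, proportion=0.05):
    """Mean after discarding the lowest and highest `proportion` of cases."""
    ordered = sorted(values)
    n = len(ordered)
    k = math.floor(n * proportion)    # cases dropped from each end
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

# Made-up scores: 1..19 plus one extreme value.
scores = list(range(1, 20)) + [1000]

plain_mean = sum(scores) / len(scores)
print(plain_mean)             # pulled upward by the extreme value
print(trimmed_mean(scores))   # drops one case from each end (5% of 20)
```

The comparison shows why the trimmed mean is useful: a single extreme score shifts the ordinary mean far more than the trimmed mean.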
Variability
To summarise the spread or dispersion
of a distribution
Low variability => scores are similar
High variability => scores differ
Range
Interquartile range (IQR)
Standard deviation / Variance
Range
Two definitions used
Exclusive range
X_{max} - X_{min}
This is the most commonly used way of
calculating the range
Deviation Score
Remember
The arithmetic mean uses information about every
observation
d_i = X_i - \bar{X}
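Sample variance: s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
Sample standard deviation: s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}
Population variance: \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}
Population standard deviation: \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}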
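The role of deviation scores and of the n - 1 versus N divisor can be sketched in Python. The data values are made up, and the standard library `statistics` module is used only to cross-check the hand calculation.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
n = len(data)
mean = sum(data) / n

# Deviation scores: distance of each observation from the mean.
deviations = [x - mean for x in data]

# Sum of squared deviations, the numerator of every variance formula.
sum_sq = sum(d ** 2 for d in deviations)

sample_var = sum_sq / (n - 1)     # divisor n - 1 for a sample
sample_sd = math.sqrt(sample_var)
pop_var = sum_sq / n              # divisor N for a whole population
pop_sd = math.sqrt(pop_var)

print(sample_var, sample_sd)
print(pop_var, pop_sd)
print(statistics.variance(data), statistics.pvariance(data))
```

The deviation scores always sum to zero, which is why they are squared before being averaged.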
So the first quartile (Q1) is the 25th percentile:
25% (one quarter) of the cases fall below this score
Interquartile Range
IQR = Q3 - Q1
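Quartiles and the interquartile range can be sketched with Python's `statistics.quantiles` (Python 3.8 or later). The data are made up, and statistical packages use slightly different interpolation rules for percentiles, so results on real data may differ a little from SPSS output.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]   # hypothetical ordered scores

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1   # spread of the middle 50% of cases

print("Q1:", q1)
print("Median:", q2)
print("Q3:", q3)
print("IQR:", iqr)
```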
Outliers
Outliers are observations that deviate
significantly from the majority of observations
Outliers can lead to
Model misspecification
Biased parameter estimation
Incorrect analysis results
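One way to screen for outliers can be sketched in Python. The slides do not state a specific rule, so this example assumes the common boxplot convention of flagging values more than 1.5 x IQR beyond the quartiles. The data and the function name `iqr_outliers` are hypothetical.

```python
import statistics

def iqr_outliers(values, multiplier=1.5):
    """Return values lying beyond the quartiles by more than multiplier * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

scores = [1, 2, 3, 4, 5, 6, 7, 50]   # made-up data with one extreme value
outliers = iqr_outliers(scores)
print(outliers)
```

A flagged value is a candidate for checking (data entry error, impossible value, genuine extreme case), not automatically a value to delete.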
How to report?
Altman, D. G. (1980). Statistics and ethics in
medical research. VI - Presentation of results.
British Medical Journal, 281(6254), 1542-1544.
Lang, T. A., & Secic, M. (2006). How to report
statistics in medicine : annotated guidelines
for authors, editors, and reviewers (2nd ed.).
New York: American College of Physicians.
Thabane, L., & Akhtar-Danesh, N. (2008).
Guidelines for reporting descriptive statistics
in health research. Nurse Researcher, 15(2),
72-81.
Whitley, E., & Ball, J. (2002). Statistics review
1: Presenting and summarising data. Critical
Care, 6(1), 66-71.
Reproduced from Morgan, G. A., Leech, N. L., Gloeckner, G. W., & Barrett, K. C. (2007).
SPSS for introductory statistics : use and interpretation (3rd ed.). Mahwah, NJ: Lawrence
Erlbaum.
Summary Statistics
Categorical variables in SPSS
Analyse > Descriptive Statistics
Frequencies (Statistics: quartiles,
percentiles)
Explore (median, percentiles)
Graphs
Bar chart
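What a frequency table for a categorical variable reports can be sketched in Python. The category labels and responses are made up, and this is an illustration of the idea rather than of SPSS's Frequencies output format.

```python
from collections import Counter

# Hypothetical responses for one categorical variable.
responses = ["female", "male", "female", "female", "male"]

counts = Counter(responses)
total = len(responses)

# Frequency and percentage for each category, most frequent first.
for category, count in counts.most_common():
    percent = 100 * count / total
    print(f"{category:<8}{count:>5}{percent:>8.1f}%")

# The mode is the most frequent category.
mode = counts.most_common(1)[0][0]
print("Mode:", mode)
```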
Summary Statistics
Continuous variables in SPSS
Analyse > Descriptive Statistics
Descriptives
Explore
Graphs in Explore
Histogram
Boxplot
Exercise 2
Describing categorical data
Summarising continuous data
Data Exercise_2.sav
Simplified data from PISA 2003 Study - Australia
(Programme for International Student Assessment)
http://www.pisa.oecd.org
Exercise 3
Checking Normality
Data Exercise_3.sav --- Sample of
Chicago high schools
Research question: Are the variables Average
class size, Reading at national norm %, and
Limited English % normally distributed?
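Two numerical indicators often inspected when checking normality, skewness and excess kurtosis, can be sketched in Python. This uses simple moment formulas and made-up data; SPSS reports bias-adjusted versions, so its values will differ somewhat on small samples. The function names are hypothetical.

```python
def central_moment(values, order):
    """Average of (x - mean) raised to the given power."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n

def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # made-up symmetric data
right_skewed = [1, 1, 1, 2, 10]    # made-up data with a long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

Values far from zero suggest a departure from normality, but they are best read together with a histogram or boxplot.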
Transformations
Why?
A lot of statistical tests and methods are based
around the normal distribution assumption
Often skewness and heterogeneity of variances are
a problem
Advantages
Allows the use of standard methods
Allows the use of more powerful methods
Disadvantages
Converts measurements into a foreign unit
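The effect of a transformation on skewness can be sketched in Python. The log transformation is chosen here as one common example (the slides do not single out a particular transformation), and the data are made up so that the effect is easy to see.

```python
import math

def skewness(values):
    """Simple moment-based skewness (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

raw = [1, 10, 100, 1000, 10000]        # strongly right-skewed, made up
logged = [math.log10(x) for x in raw]  # now measured in log10 units

print(skewness(raw))      # clearly positive
print(skewness(logged))   # approximately zero
```

The example also shows the disadvantage listed above: after the transformation the values are in log units, so results must be interpreted or back-transformed with care.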
Exercise 4
Missing Values Analysis
Data Exercise_4.sav --- Sample of
passengers from the Titanic
Abraham, W. T., & Russell, D. W. (2004). Missing data: A review of current methods and
applications in epidemiological research. Current Opinion in Psychiatry, 17(4), 315-321.
Allison, P. D. (2003). Missing Data Techniques for Structural Equation Modeling. Journal of
Abnormal Psychology, 112(4), 545-557.
Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses.
Journal of School Psychology, 48(1), 5-37.
Buhi, E. R., Goodson, P., & Neilands, T. B. (2008). Out of sight, not out of mind: Strategies
for handling missing data. American Journal of Health Behavior, 32(1), 83-92.
Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.
Everitt, B. (2003). Missing values, drop-outs, compliance and intention-to-treat. In B.
Everitt (Ed.), Modern medical statistics: A practical guide (pp. 46-66). London: Arnold.
Fitzmaurice, G. (2008). Missing data: Implications for analysis. Nutrition, 24(2), 200-202. doi: 10.1016/j.nut.2007.10.014
McKnight, P. E. (2007). Missing data: A gentle introduction. New York: Guilford Press.
Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of
reporting practices and suggestions for improvement. Review of Educational Research,
74(4), 525-556.
Streiner, D. L. (2002). The case of the missing data: Methods of dealing with dropouts and
other research vagaries. Canadian Journal of Psychiatry, 47(1), 68-75.
Time permitting
Additional Exercises 5 & 6
Multiple Response Set
Data Exercise_5.sav
Merging files
Merge ITEMS_1.sav
Merge ITEMS_2.sav
Tutorial
Find illustrated, step-by-step
instructions for the basic features
Case studies
Hands-on examples of various types of
statistical procedures
Statistics coach
To help you find the procedure you
want to use
!!! Please keep in mind that online resources are usually not academically peer
reviewed. Although many of them are of high quality and very useful
from an educational point of view, they shouldn't be treated as completely reliable
and academically sound references.
Archives of the SPSSX-L list serve,
which is endorsed by IBM SPSS
http://www.listserv.uga.edu/archives/spssx-l.html
Other forums
http://groups.google.com/group/comp.soft-sys.stat.spss/topics?gvc=2
http://www.spssforum.com/
THANK YOU
Please provide us with your feedback by
completing the short survey.