SPSS
Objectives
Learn about SPSS
Open SPSS
Review the layout of SPSS
Become familiar with Menus and Icons
Exit SPSS
What is SPSS?
SPSS is a Windows based program that can be used to perform data entry and
analysis and to create tables and graphs. SPSS is capable of handling large amounts of
data and can perform all of the analyses covered in the text and much more. SPSS is
commonly used in the Social Sciences and in the business world, so familiarity with this
program should serve you well in the future. SPSS is updated often. This document was
written around an earlier version, but the differences should not cause any problems. If
you want to go further and learn much more about SPSS, I strongly recommend Andy
Field's book (Field, 2009, Discovering Statistics Using SPSS). Those of us who have used
software for years think that we know it all and don't pay a lot of attention to new
features. I learned a huge amount from Andy's book.
Opening SPSS
Depending on how the computer you are working on is structured, you can open
SPSS in one of two ways.
1. If there is an SPSS shortcut icon on the desktop, place your cursor on it and
double click the left mouse button.
2. Otherwise, select SPSS from the list of programs on the Start menu.
You will see a screen that looks like the image on the next page. The dialog box
that appears offers choices of running the tutorial, typing in data, running queries, or
opening an existing data source. The window behind this is the Data Editor window
which is used to display the data from whatever file you are using. You could select any
one of the options on the start-up dialog box and click OK, or you could simply hit
Cancel. If you hit Cancel, you can either enter new data in the blank Data Editor or you
could open an existing file using the File menu bar as explained later.
Click Cancel, and we'll get acquainted with the layout of SPSS.
Layout of SPSS
The Data Editor window has two views that can be selected from the lower left
hand side of the screen. Data View is where you see the data you are using. Variable
View is where you can specify the format of your data when you are creating a file or
where you can check the format of a pre-existing file. The data in the Data Editor is
saved in a file with the extension .sav.
Menu bar
Icons
The other most commonly used SPSS window is the SPSS Viewer window which
displays the output from any analyses that have been run and any error messages.
Information from the Output Viewer is saved in a file with the extension .spo. Let's open
an output file and look at it.
On the File menu, click Open and select Output. Select appendixoutput.spo from the
files that can be found at
http://www.uvm.edu/~dhowell/fundamentals7/SPSSManual/SPSSLongerManual/DataForSPSS/.
(At the moment, this set of web pages is the most recent version, whichever
of my books you are using.) Click Ok. The following will appear. The left hand side
is an outline of all of the output in the file. The right side is the actual output. To
shrink or enlarge either side put your cursor on the line that divides them. When the
double headed arrow appears, hold the left mouse button and move the line in either
direction. Release the button and the size will be adjusted.
Finally, there is the Syntax window, which displays the command language used to
run various operations. Typically, you will simply use the dialog boxes to set up
commands and will not see the Syntax window. The Syntax window would be
activated if you pasted the commands from a dialog box into it, or if you wrote your own
syntax, something we will not focus on here. Syntax files end in the extension .sps.
SPSS Menus and Icons
Now, let's review the menus and icons.
Review the options listed under each menu on the Menu Bar by clicking them one at a
time. Follow along with the below descriptions.
Utilities allows you to list file information, which is a list of all variables, their
labels, values, locations in the data file, and types.
Add-ons are programs that can be added to the base SPSS package. You probably
do not have access to any of those.
Window can be used to select which window you want to view (i.e., Data Editor,
Output Viewer, or Syntax). Since we have a data file and an output file open, let's try
this.
Select Window/Data Editor. Then select Window/SPSS Viewer.
Help has many useful options, including a link to the SPSS homepage, a statistics
coach, and a syntax guide. Using Topics, you can use the Index option to type in any key
word and get a list of options, or you can view the categories and subcategories available
under Contents. This is an excellent tool and can be used to troubleshoot most problems.
The Icons directly under the Menu bar provide shortcuts to many common
commands that are available in specific menus. Take a moment to review these as well.
Place your cursor over an icon for a few seconds, and a description of the underlying
command will appear. For example, one of the icons is the shortcut for Save. Review the
others yourself.
In the chapters that follow, we will review many specific functions available
through these Menus and Icons, but it is important that you take a few moments to
familiarize yourself with the layout and options before beginning.
Exiting SPSS
To close SPSS, you can either left click on the close button
located on the
upper right hand corner of the screen or select Exit from the File menu.
Choose one of these approaches.
A dialog box like the one below will appear for every open window asking you if you
want to save it before exiting. You almost always want to save data files. Output files
may be large, so you should ask yourself if you need to save them or if you simply want
to print them.
Click No for each dialog box since we do not have any new files or changed files to
save.
Exercises
1. Look up ANOVA in Help/Help Topics. What kind of information did you
find?
2. Look up compare groups for significant differences in Help/Statistics Coach.
What did you learn?
3. Open appendixd.sav. In the Data View, click Grid Lines in the View menu and
note what happens.
4. While in the Data View for appendixd.sav, click Font in the View menu and
select the font style and size of your choice.
5. Using Edit/Options/General, under Variable Lists select Display labels and
File. In the future, this means that SPSS will list the variables in the order they appear
in the file, using the variable labels rather than the variable names. As you are
analyzing data in future exercises, try to notice whether or not you like this option.
If not, change it.
2: Entering Data
Objectives
Understand the logic of data files
Create data files and enter data
Insert cases and variables
Merge data files
Read data into SPSS from other sources
The Logic of Data Files
Each row typically represents the data from one case, whether that is a person,
animal, or object. Each column represents a different variable. A cell refers to the
juncture of a specific row and column. For example, the first empty cell in the upper left
hand corner would hold the data for case 1, variable 1.
Entering Data
Open SPSS and follow along as you read this description.
To enter data, you could simply begin typing information into each cell. If you
did so, SPSS would give each column a generic label such as var00001. Clearly this is
not desirable, unless you have a superior memory, because you would have no way of
identifying what var00001 meant later on. Instead, we want to specify names for our
variables. To do this, you can double left click on any column heading; this will
automatically take you to the Variable View. Alternatively, you can simply click on
Variable View in the bottom left hand corner of your screen.
The first column of variable view is Name. In earlier versions names could only
be 8 characters long. Although that restriction no longer applies, you should keep names
short for ease of reading. For example, if I had depression data that was collected at
intake, and 1 month, 6 months, and 1 year post intervention, I would name those
variables depress0 or depresin (i.e., in for intake), depress1, depress6, and depres12.
SPSS also has preferences for variable names. For example, a number cannot begin a
variable name (e.g., 12depres would not be a valid name). Error messages will appear if
you have selected a name that is not allowed in SPSS. The rules for variable names
can be found by typing variable names in the Index option under Help/Topics and then
selecting rules from the list that appears. Briefly, a name must begin with a letter,
cannot contain spaces or end with a period, and cannot duplicate a reserved SPSS
keyword (e.g., ALL, AND, BY, NOT, OR, TO, WITH).
The next columns are for Width and Decimals. You could have set this while
specifying your variable type, or you can specify them in these columns. The default for
width is 8 characters and the default for decimals is 2. To change this, left click the cell,
and up and down arrows will appear, as illustrated below. Left click the up arrow if you
want to increase the number, click the down arrow to decrease the value. Alternatively,
you can simply type the desired value in the cell.
The next column is Label. This is a very nice feature that allows you to provide
more information about the variable than you could fit in the 8 character variable name.
For example, I could type Depression assessed at intake for the example used above.
When you hold your cursor over a variable name in the Data View, the full label will
appear. This is very useful when you need a quick reminder. An example of this feature
is below.
Since labels are so much more detailed than variable names, we can specify that SPSS
label variables this way in dialog boxes and output. Let's do this.
Click Edit/Options/Output Labels and select labels for each of the options. Then
click Ok.
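The same labeling can also be done with syntax (the commands a dialog box would paste into the Syntax window). A minimal sketch for the intake depression variable described above:

```spss
* Attach a descriptive label to the variable named depresin.
VARIABLE LABELS depresin 'Depression assessed at intake'.
```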
The next column is Values. This allows you to assign value labels. You will
typically use this option for categorical variables. For example, we may want the number
1 to represent males and the number 2 to represent females when we enter data on
gender. Let's try this.
Type gender in the first Name column.
Scroll over to the Values column and left click. Then, left click on the gray box that
appears on the right hand side of the cell. The Value Labels dialog box will appear.
Type the numeric value where it says Value, then type the Value Label or text to
explain what it means. Click Add. Do this for males and females. When you are
done, click Ok.
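If you were pasting syntax instead, value labels are assigned with the VALUE LABELS command; a sketch matching the gender coding above:

```spss
* Label the numeric codes used for gender.
VALUE LABELS gender 1 'Male' 2 'Female'.
```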
Of the remaining columns, you are most likely to use Align, which allows you to
specify how the data will appear in the cells. Your choices are left justified, right
justified, or centered. This is simply a matter of personal preference.
After you have completed specifying your variables, you can click on Data View
and begin entering your data. Put your cursor on the cell in which you want to enter data.
Type the value. If you hit Enter the cursor will move to the cell under the one you just
filled. You can also use the arrow keys to move to the next cell in any given direction.
Typically, you will either enter all of the values in one column by going down or you will
enter all of the variables in a row going from left to right.
Let's try this together.
Select depression scores.sav from the files that can be found at
http://www.uvm.edu/~dhowell/fundamentals7/SPSSManual/SPSSLongerManual/DataForSPSS/.
SPSS will ask you if you want to save the file with the gender variable.
Click No. Then, SPSS will open depression scores. As you can see, variables have
been named and labeled, but the data have not been entered.
Enter the following data in the Data View window.
Pay attention to your own preferences for data entry (i.e., using the arrows or Enter,
going across or down). Notice that there is no subject 10.
When you are done, click Save, but do not close the file. We will continue to use it as
an example.
ID   depressin   depress1   depress6   depres12
1      30.000      25.00      23.00      20.00
2      32.000      30.00      30.00      28.00
3      35.000      35.00      35.00      40.00
4      45.000      42.00      40.00      35.00
5      45.000      45.00      38.00      40.00
6      25.000      25.00      20.00      20.00
7      60.000      45.00      30.00      40.00
8      55.000      50.00      40.00      35.00
9      40.000      40.00      35.00      30.00
11     37.000      30.00      25.00      20.00
12     30.000      25.00      22.00      20.00
Inserting a Variable
After specifying the types of variables for the depression data, I realized I forgot
to include a column for ID number. Typically, I like ID to be the first variable in my data
file. I can add this in one of two ways.
1. In Variable View, highlight the first row and then click Insert Variable on the
Data menu. This will place a new variable before the selected variable.
2. In Data View, highlight the first variable column and then click the Insert
Variable icon. This will also place a new variable column at the beginning of
the file.
Use one of the approaches above to insert the new variable at the beginning of the file.
Name the variable ID, and label it as participant identification number.
Enter the ID data that appeared on the previous page.
Click Save, and leave the file open.
Inserting a Case
As you can see, the data for ID 10 are missing. I found the missing data and want
to enter them in the file. I'd like my data to be in order by ID number, so I want to insert a
case between the person with ID 9 and ID 11. To do so, I can highlight the row for the
case with ID 11, and either:
1. click on Insert Case on the Data menu, or
2. click on the Insert Case icon.
In either case, a blank row will appear before the highlighted case. Try it yourself.
Insert a case for ID 10 using one of the above approaches.
Enter the following data: 10, 38, 35, 38, 38 for ID, depresin, depress1, depress6, and
depres12, respectively.
Check the accuracy of your data entry, then click Save.
Merging Files
Adding Cases. Sometimes data that are related may be in different files that you
would like to combine or merge. For example, in a research methods class, every student
may collect and then enter data in their own data file. Then, the instructor might want to
put all of their data into one file that includes more cases for data analysis. In this case,
each file contains the same variables but different cases. To combine these files, have
one of the data files open, then left click on Merge Files on the Data menu and select
Add Cases. Then specify the file from which the new data will come and click Open. A
dialog box will appear showing you which variables will appear in the new file. View it,
and if all seems in order, click OK. The two files will be merged. This is fairly simple.
See if you can do it yourself in Exercise 4 at the end of this chapter.
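For reference, the syntax that the Add Cases dialog box pastes looks roughly like this; merge2.sav here stands in for whichever file supplies the new cases:

```spss
* Append the cases from a second file to the open (active) file.
ADD FILES /FILE=* /FILE='merge2.sav'.
EXECUTE.
```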
Adding Variables. In other cases, you might have different data on the same
cases or participants in different files. For example, I may have recorded the
demographic information from the participants in my depression study in one file and the
depression data in another file. I may want to put them together because I'd like to see if
demographic variables, like socioeconomic status or gender, are related to depression. In
this case, you need to be sure the variables on the same participants end up in the correct
row; that is, you want to match the cases. Here, we will use ID to match cases.
SPSS requires that the files you merge be in ascending order by the matching variable.
So, in both files, the cases must be sorted from the lowest ID to the highest. You can set
this up by sorting cases as discussed below. Then, make sure one of the files is open.
Since this procedure is more complicated, let's try this one together.
Open depression scores.sav from your disk (this is the data that you just entered).
Check to see if the cases are in ascending order by ID. They should be since we just
entered them that way.
Now, open depression demographics.sav. These data are not in order by ID. To fix
this, click Sort Cases under the Data menu.
In the dialog box, select participant identification number and move it into the Sort by
box by clicking the arrow. Make sure Ascending is selected for Sort Order. Then click
Ok.
While the demographic file is still open, click on Merge Files in the Data menu, and
select Add Variables.
The next dialog box will ask you to indicate which file the new variables are coming
from. Select depression scores.sav and click Ok. The following dialog box will
appear.
(I have had trouble doing things this way in the past, but succeeded by not selecting
Match cases on key variable in sorted files and just clicking OK. I don't know why,
but it worked for me.)
A dialog box will appear, reminding you that the files must be sorted. Click Ok, and
the files will be merged. You may want to do a Save As and give the merged file a
new name like depression complete.sav to help you remember what is in it.
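The equivalent syntax, assuming both files are already sorted by the ID variable, would be approximately the following (file and variable names follow the example above):

```spss
* Sort the active file, then merge in variables from a second file, matching on id.
SORT CASES BY id (A).
MATCH FILES /FILE=* /FILE='depression scores.sav' /BY id.
EXECUTE.
SAVE OUTFILE='depression complete.sav'.
```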
Reading Data In From Other Sources
SPSS can also recognize data from several other sources. For example, you can
open data from Microsoft EXCEL in SPSS or you can get SPSS to read data entered in a
text file. This is an attractive option, particularly if you do not have your own version of
SPSS. It allows you to enter data in other more common programs, save them to disk,
and simply open them when you have the opportunity to use a PC that has SPSS on it.
Let's try some examples.
Opening data from EXCEL. The complete depression data is also on the web in
a file named complete depression.xls (xls is the extension for Excel data files). Take a
moment to open this file in Excel and look it over. You will see it looks just like the file
we just created except that the variable names are different because they are longer.
Specific instructions follow.
Open complete depression.xls.
Rename the variables in Excel to include eight characters if your version of SPSS will
not accept longer names. When you are done, save your changes and close the file,
because SPSS cannot read the file if it is open in another program.
Open SPSS and select Open/Data from the File menu.
A dialog box will appear. Under Files of type, select Excel. Under Look in select the
subdirectory that holds the file you want. (I suggest saving all of the files found with
the above web link to a directory and then loading from there. That just makes life a
bit easier.) Depression complete.xls should appear. Select it and click Open. The
following dialog box will appear.
The downside is the new data file does not include variable labels or values, so
you would need to add them. You should also make sure that SPSS has identified the
variables as the correct type.
Text Data. Now, let's try an example with text data. A text data file can be
created in any word processing program, in Notepad, or in any other text editor. Just be
sure to save the file with the .txt or .dat file extension. SPSS can recognize text data in
several formats. Let's begin with the simplest example. I have collected data from 11
people and typed them in the following format (this is a sample, not the whole file).
012345
021123
031234
042345
051455
062111
071122
082334
092543
101345
111345
The first two digits are the ID number. The next digit is gender. Digits 4, 5, and 6 are
the responses to the first 3 questions on a survey. No characters or spaces separate the
variables. The data are to be found in simpletextdata.txt.
Normally I create data files with a space between the variables (or use a tab). This
makes them much easier to read.
Open SPSS.
In the SPSS File menu, click Read text data.
Select simpletextdata under Files of type Text and click Open.
In the next dialog box, click No for Does your text file have a predefined format and
click Next.
In the next dialog box, select Fixed width under How are your variables arranged,
then select No for Are variable names included in the top of your file. Then click
Next.
In the next dialog box, indicate that the data start on line 1, 1 line represents a case,
and you want to import all cases, then click Next. The following dialog box will
appear. We need to tell SPSS where to insert breaks between variables: click at each
position where one variable ends and the next begins, and a break line will appear.
The next dialog box will show you a draft of what your new data file will look like.
Notice, the variables will have generic names like v1, v2, etc. Then click Next.
At the next dialog box, you can click Finish and your new data file will appear. You
could then specify variable names, types, and labels as illustrated above.
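The column layout we just specified by clicking can also be written as a DATA LIST command. This sketch assumes simpletextdata.txt is in the current directory and uses the variable names described earlier:

```spss
* Read fixed-width text: columns 1-2 hold id, column 3 gender,
* and columns 4-6 the three survey items.
DATA LIST FILE='simpletextdata.txt' FIXED
  /id 1-2 gender 3 q1 4 q2 5 q3 6.
EXECUTE.
```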
Let's take one more example. This is based on the same data, but this time the
text file is tab delimited (a tab was inserted between each variable) and has variable
names at the top. Below is an example of the first two lines from this text file.
ID   Gender   Q1   Q2   Q3
01   2        3    4    5
In the next dialog box, check Tab as the type of delimiter and then click Next. (Many
of the files for the text are delimited by a space rather than a tab. You can simply choose
space in the dialog box.)
You will see a draft of your data file. Review it, and then click Next.
Click Finish at the next dialog box and your new data file will appear.
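The Text Import Wizard will paste a GET DATA command if you ask it to. A rough sketch for a tab-delimited file with variable names in the first line (the file name here is hypothetical):

```spss
* Read tab-delimited text, skipping the first line, which holds the variable names.
GET DATA /TYPE=TXT
  /FILE='tabtextdata.txt'
  /ARRANGEMENT=DELIMITED
  /DELIMITERS='\t'
  /FIRSTCASE=2
  /VARIABLES=id F2 gender F1 q1 F1 q2 F1 q3 F1.
EXECUTE.
```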
One difference between these two examples is that the second included the
variable names at the top of the file. This, in my opinion, is the better approach because
it reduces the chance of making mistakes later in the process.
This chapter included information about entering data and opening files of various
types. This is an important part of the process because data entry errors contribute to
inaccurate results. Further, good variable names and labels allow you to perform
subsequent analyses more efficiently. Completing the following exercises will help you
internalize these processes.
Exercises
1. The data from Appendix D are on the web in a file named RxTime.sav. Open this
file and label the variables and values as described in the Howell textbook.
Save the changes when you are done because we will use this file in
subsequent examples and exercises.
2. Read the data from Fig3-2.dat on the website. These are the data on intrusive
thoughts shown in Figure 3.2 of your text. These are raw data with variable names
in the first line.
3. Review the following data. Then, create your own data file and enter the data.
Be sure to include variable and value labels. Then open exercise2.2.sav on the
disk which includes the same data. Note the similarities and differences between
your file and the file on disk. Which do you prefer? Why?
Age   Gender   Average Hours of Sleep   Number of Classes Missed   Grade in Course
18    Male     Seven                    0                          A
18    Female   Four                     1                          C
17    Female   Six                      2                          B
19    Female   Ten                      5                          F
20    Male     Eight                    2                          B
21    Female   Seven and a half         3                          C
23    Male     Nine                     1                          B
22    Male     Eight                    2                          A
18    Male     Six                      3                          D
4. Merge the following files from the disk using the add cases option: merge1.sav
and merge2.sav.
5. Read the following text data file into SPSS: textdataexercise.txt. Be sure to open
the text file and notice the format before you proceed.
6. Read readexcelexercise.xls into SPSS. Note any problems that arise and how you
solved them.
In the main descriptives dialog box, check the box that says Save standardized
values as variables. SPSS will calculate z scores for each of the variables using the
formula you learned about and append them to the end of your data file. Click Ok.
The resulting output will look like this. Note that the variable labels are used rather
than the variable names. Remember, we specified this as the default in
Edit/Options/Output Labels.
Double click the table so you can edit it. As was the case with graphs, SPSS has many
options to edit statistics in tables as well. Let's try some of them.
Under Pivot, select Transpose Rows and Columns. Which orientation do you
prefer? I like the first since it's more conventional, so I will Transpose the Rows and
Columns again to return to the original orientation.
Now, click on Format/Table properties. Take a moment to view all of the options in
this dialog box. General allows you to specify the width of row and column labels.
Footnotes allows you to choose numeric or alphabetic labels and subscript or
superscript as the position for those labels. Cell formats allows you to change the
font style and size, color, and alignment. Borders allows you to add or remove
borders around rows, columns, and even cells. Printing allows you to select options
such as rescaling tables to fit on paper. After you've viewed the options, hit Cancel.
The resulting table is below. I could edit each individual cell by double clicking on it
and then editing the text. For example, I could alter each statistic to include 2 decimal
places if I wanted. You try it.
Now, click on Window/SPSS Statistics Data Editor and look at the standardized
values (z scores) SPSS added to your file. A brief portion of the Data Editor appears
below. You can see that SPSS named each variable with a z. SPSS also labeled the
new variables. Check this out in Variable View.
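The z scores SPSS appended were produced by the /SAVE subcommand of DESCRIPTIVES; each z score is the score minus the mean, divided by the standard deviation. A sketch in syntax, with placeholder variable names standing in for whichever variables you selected:

```spss
* Compute descriptives and save standardized (z) values as new variables
* (SPSS names them zaddsc, ziq, zgpa).
DESCRIPTIVES VARIABLES=addsc iq gpa /SAVE.
```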
Frequencies
Now, we'll use the Frequencies command to help us examine the distributions of
the same continuous variables.
Select Analyze/Descriptive Statistics/Frequencies.
Put the variables of interest in the Variable list box. Unselect Display frequency
tables, because this would produce a list of the frequency of every value. (Ignore
what looks like an error message.)
Click on Charts, select Histogram with normal curve, and click Continue.
Now, click on Statistics. This dialog box has all of the same options we selected
under Descriptives earlier. However, the Descriptives dialog box did not include the
median and mode. Select all of the statistics of interest and click Continue. Then,
click Ok. A sample of the output follows.
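Pasted as syntax, the steps above amount to roughly the following (variable names again placeholders):

```spss
* Frequencies without the value-by-value table; request histogram with normal curve.
FREQUENCIES VARIABLES=addsc iq gpa
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN MODE STDDEV VARIANCE RANGE
  /HISTOGRAM NORMAL.
```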
Take a moment to review the output. It looks like ADD is somewhat normally
distributed, though a bit negatively skewed. Looking at your own output, are the other
variables normally distributed? I also remember now that English grade is nominal too.
Grades were scored as A, B, C, D, and F, though coded as 1 - 4. As noted in the text,
we could analyze this as continuous data, but it seems that reporting the frequencies
rather than measures of central tendency and dispersion may be more appropriate for this
variable.
As before, you can edit the tables or the graphs by double clicking on them. One
difference we have seen between the Descriptives and Frequencies options is that
Descriptives only includes the mean for measures of central tendency, whereas
Frequencies includes the mean, median, and mode. Further, Descriptives does not have
any built-in graphing options, but Frequencies does.
Now let's use Frequencies to describe categorical data.
Select Analyze/Descriptive Statistics/Frequencies.
This time, put gender, level of English class, English grade, repeated a grade, social
problems, and drop out status in the variable list. Select Display frequency tables.
Since there is a finite number of values, we want to know how many people fit in
each category. Click on Statistics and unselect all of the options because we decided
that measures of central tendency and variability are not useful for these data. Then
click Continue. Next, click on Charts. Click on Bar chart and select Percentages
as the Chart Values. Click Continue and then Ok. A sample of the resulting output is
below. Take a moment to review it.
Notice that the frequency tables include a column labeled Percent and another
labeled Valid Percent. This is an important distinction when you have missing cases.
The Percent column indicates the percent of cases in each category out of the total
number of cases, even when some data are missing. Valid Percent indicates the percent
of cases in each category out of only those cases for which there are complete data on the
variable. For example, imagine a sample of 100 students. Fifty cases are women, 40 are
men, and 10 are missing the data. The percent of men would be 40%, but the valid
percent of men would be 44.4% (40/90). Which do you believe is the more accurate way
to describe the sample? I'd argue the valid percent. Now let's move on to a more
complicated type of frequency table.
Crosstabs
Sometimes we need to know the number and percent of cases that fall in multiple
categories. This is useful when we have multiple categorical variables in a data set. For
example, in the data set we have been using, I'd like to know what percent of dropout and
nondropout students had social problems. We'll use crosstabs to calculate this.
Click Analyze/Descriptive Statistics/Crosstabs.
Both the table and the graph show that of those youth with social problems, an
equal number did and did not ultimately drop out. This suggests that social problems in
ninth grade and drop out status are independent, something we can test later using chi
square.
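The corresponding syntax would look something like this; socprob and dropout are stand-ins for the social problems and drop out status variables:

```spss
* Two-way frequency table with cell counts and percentages.
CROSSTABS /TABLES=socprob BY dropout
  /CELLS=COUNT ROW COLUMN TOTAL.
```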
Compare Means
Now, let's consider a case where we want to describe a continuous variable at
different levels of a categorical variable. This is often necessary when you are comparing
group means. For example, we can compare ADD symptoms for males and females.
Let's try it together.
Select Analyze/Compare Means/Means. Notice this is the first time we haven't
selected Descriptive Statistics in this chapter.
Select ADD score for the Dependent List and Gender for the Independent List. Click
Options. Notice that mean, standard deviation, and number of cases are already
selected under Statistics. Add any other descriptives you are interested in, then click
Continue and then Ok. The output follows.
Notice that this table gives you the marginal descriptives (i.e., the descriptives for
gender independent of social problems and vice versa) under Totals and the cell
descriptives (i.e., the descriptives at each level of the variables, e.g., for boys with social
problems).
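As syntax, sketched with placeholder variable names:

```spss
* Mean, N, and SD of ADD score broken down by gender.
MEANS TABLES=addsc BY gender
  /CELLS=MEAN COUNT STDDEV.
```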
Exit SPSS. There is no need to save the data file since we haven't changed it. It is up
to you to decide whether or not you would like to save the output file for future
reference.
We've reviewed a variety of options for calculating descriptive statistics
depending on the type of data and the kinds of questions. We've also seen that many of
the graphs we reviewed in Chapter 3 are options in the subcommands under Descriptive
Statistics. In the following chapters you will discover that descriptive statistics are an
option embedded within many other analysis dialog boxes (e.g., t test, ANOVA, etc.).
Try the following exercises to be sure you understand all of the various options for
calculating descriptives and to help you identify your own preferences.
Exercises
1. Using merge1.sav calculate the mean, median, mode, range, variance, and
standard deviation for the following variables: self-esteem, anxiety, coping, and
health. Create a histogram for anxiety. Note how you did each.
2. Using the data in appendixd.sav, calculate the frequency and percent of females
and males who did and did not have social problems.
3. Using the data in appendixd.sav, calculate the mean, variance, and standard
deviation for GPA broken down by social problems and drop out status.
5. Correlation
Objectives
Calculate correlations
Calculate correlations for subgroups using split file
Create scatterplots with lines of best fit for subgroups and multiple
correlations
Correlation
The first inferential statistic we will focus on is correlation. As noted in the text,
correlation is used to test the degree of association between variables. All of the
inferential statistics commands in SPSS are accessed from the Analyze menu. Let's open
SPSS and replicate the correlation between height and weight presented in the text.
Open HeightWeight.sav. Take a moment to review the data file.
Under Analyze, select Correlate/Bivariate. Bivariate means we are examining the
simple association between two variables.
In the dialog box, select height and weight for Variables. Select Pearson for
Correlation Coefficients since the data are continuous. The default for Tests of
Significance is Two-tailed. You could change it to One-tailed if you have a
directional hypothesis. Selecting Flag significant correlations means that the
significant correlations will be noted in the output by asterisks. This is a nice feature.
Then click Options.
For example, I may run correlations between height, weight, and blood pressure. One
subject may be missing blood pressure data. If I check Exclude cases listwise, SPSS will
not include that person's data in the correlation between height and weight, even though
those data are not missing. If I check Exclude cases pairwise, SPSS will include that
person's data to calculate any correlations that do not involve blood pressure. In this
case, the person's data would still be reflected in the correlation between height and
weight. You have to decide whether or not you want to exclude cases that are missing
any data from all analyses. (Normally it is much safer to go with listwise deletion, even
though it will reduce your sample size.) In this case, it doesn't matter because there are
no missing data. Click Continue. When you return to the previous dialog box, click
Ok. The output follows.
Correlations

Descriptive Statistics
         Mean     Std. Deviation   N
HEIGHT   68.72    3.66             92
WEIGHT   145.15   23.74            92

Correlations
                               HEIGHT   WEIGHT
HEIGHT   Pearson Correlation   1.000    .785**
         Sig. (2-tailed)       .        .000
         N                     92       92
WEIGHT   Pearson Correlation   .785**   1.000
         Sig. (2-tailed)       .000     .
         N                     92       92
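If you ever want to verify a Pearson correlation by hand, the formula is simple enough to script outside SPSS. Here is a minimal Python sketch; the five height/weight pairs are made up for illustration and are not values from HeightWeight.sav:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative height/weight pairs (hypothetical values)
height = [64, 66, 68, 70, 72]
weight = [120, 135, 145, 160, 170]
print(round(pearson_r(height, weight), 3))
```

With the actual 92 cases from the data file, the same function would reproduce the .785 shown in the output above.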
Subgroup Correlations
We need to get SPSS to calculate the correlation between height and weight
separately for males and females. The easiest way to do this is to split our data file by
sex. Let's try this together.
In the Data Editor window, select Data/Split file.
Select Organize output by
groups and Groups Based on
Gender. This means that any
analyses you specify will be
run separately for males and
females. Then, click Ok.
Notice that the order of the data file has been changed. It is now sorted by Gender,
with males at the top of the file.
Now, select Analyze/Correlation/Bivariate. The same variables and options you
selected last time are still in the dialog box. Take a moment to check for
yourself. Then, click Ok. The output follows, broken down by males and females.
Correlations

SEX = Male
Descriptive Statistics
         Mean     Std. Deviation   N
HEIGHT   70.75    2.58             57
WEIGHT   158.26   18.64            57

SEX = Female
Descriptive Statistics
         Mean     Std. Deviation   N
HEIGHT   65.40    2.56             35
WEIGHT   123.80   13.37            35
As before, our results replicate those in the text. The correlation between height
and weight is stronger for males than for females. Now let's see if we can create a more
complicated scatterplot that illustrates the pattern of correlation for males and females on
one graph. First, we need to turn off split file.
Select Data/Split file from the Data Editor window. Then select Analyze all cases, do
not compare groups and click Ok. Now, we can proceed.
Scatterplots of Data by Subgroups
Select Graphs/Legacy/Scatter. Then, select Simple and click Define.
When your graph appears, you will see that the only way males and females are
distinguished from one another is by color. This distinction may not show up well, so let's
edit the graph.
Double click the graph to activate the Chart Editor. Then double click on one of the
female dots on the plot. SPSS will highlight them. (I often have trouble with this. If it
selects all the points, click again on a female one. That should do it.) Then click the
Marker menu. Select the filled circle under Marker Type and choose a Fill color. Then click
Apply. Then click on the male dots, select the open circle in Marker
Type, and click Apply. Then, close the dialog box. The resulting graph
should look just like the one in the textbook.
I would like to alter our graph to include the line of best fit for both groups.
Under Elements, select Fit Line at Subgroups. Then select Linear and click
Continue. (I had to select something else and then go back to Linear to highlight the
Apply button.) The resulting graph follows. I think it looks pretty good.
Edit the graph to suit your style as you learned in Chapter 3 (e.g., add a title,
change the axes titles and legend).
This more complex scatterplot nicely illustrates the difference in the correlation
between height and weight for males and females. Let's move on to a more complicated
example.
Overlay Scatterplots
Another kind of scatterplot that might be useful is one that displays the
association between different independent variables and the same dependent variable.
Above, we compared the same correlation for different groups. This time, we want to
compare different correlations. Let's use the course evaluation example from the text. It
looks like expected grade is more strongly related to ratings of the fairness of the exam than
ratings of instructor knowledge are. I'd like to plot both correlations. I
can reasonably plot them on the same graph since all of the questions were rated on the
same scale.
Open courseevaluation.sav. You do not need to save HeightWeight.sav since you did
not change it. So click No.
First, let's make sure the correlations reported in the text are accurate. Click
Analyze/Correlation/Bivariate and select all of the variables. Click Ok. The output
follows. Do the values agree with the text?
Correlations
(Pearson correlations with two-tailed significance in parentheses; N = 50 for every pair.
** correlation is significant at the .01 level; * at the .05 level.)

           OVERALL   TEACH     EXAM      KNOWLEDG  GRADE     ENROLL
OVERALL    1.000     .804**    .596**    .682**    .301*     -.240
           (.)       (.000)    (.000)    (.000)    (.034)    (.094)
TEACH      .804**    1.000     .720**    .526**    .469**    -.451**
           (.000)    (.)       (.000)    (.000)    (.001)    (.001)
EXAM       .596**    .720**    1.000     .451**    .610**    -.558**
           (.000)    (.000)    (.)       (.001)    (.000)    (.000)
KNOWLEDG   .682**    .526**    .451**    1.000     .224      -.128
           (.000)    (.000)    (.001)    (.)       (.118)    (.376)
GRADE      .301*     .469**    .610**    .224      1.000     -.337*
           (.034)    (.001)    (.000)    (.118)    (.)       (.017)
ENROLL     -.240     -.451**   -.558**   -.128     -.337*    1.000
           (.094)    (.001)    (.000)    (.376)    (.017)    (.)
Click on exam and grade and shift them into Y-X Pairs. Then click on exam and
knowledge and shift them into Y-X Pairs. Since exam is the commonality between
both pairs, I'd like it to be on the Y axis. If it is not listed as Y, highlight the pair
and click on the two-headed arrow. It will reverse the ordering. Exam should then
appear first for both. Then, click Ok.
As in the previous example, the dots are distinguished by color. Double click the
graph and use the Marker icon to make them more distinct as you learned above. Also
use the Elements menu to Fit line at total. It will draw a line for each set of data.
Note that the axes are not labeled. You could label the Y axis Exam.
But you could not label the X axis because it represents two different
variables (grade and knowledge). That is why the legend is necessary. (If you
figure out how to label that axis, please let me know. It should be so easy.)
As you can see, the association between expected grade and fairness of the exam
is stronger than the correlation between the instructor's knowledge and the fairness of the
exam.
Now, you should have the tools necessary to calculate Pearson correlations and to
create various scatterplots that complement those correlations. Complete the following
exercises to help you internalize these steps.
Exercises
Exercises 1 through 3 are based on appendixd.sav.
1. Calculate the correlations between ADD symptoms, IQ, GPA, and English
grade twice, once using a one-tailed test and once using a two-tailed test.
Does this make a difference? Typically, when would this make a difference?
2. Calculate the same correlations separately for those who did and did not drop
out, using a two-tailed test. Are they similar or different?
3. Create a scatterplot illustrating the correlation between IQ score and GPA for
those who did and did not drop out. Be sure to include the line of best fit for
each group.
4. Open courseevaluation.sav. Create a scatterplot for fairness of exams and
teacher skills and exam and instructor knowledge on one graph. Be sure to
include the lines of best fit. Describe your graph.
At the main dialog box, click on Plots so we can see our options.
It looks like we can create scatterplots here.
Click Help to see what the abbreviations
represent. I'd like to plot the dependent
variable against the predicted values to see
how close they are. Select Dependnt for Y
and Adjpred for X. Adjpred is the adjusted
prediction. Use Help/Topics/Index to find
out what this means for yourself. Then, click
Continue.
Descriptive Statistics
           N
SYMPTOMS   107
STRESS     107

Correlations
                                SYMPTOMS   STRESS
SYMPTOMS   Pearson Correlation  1.000      .506
           Sig. (1-tailed)      .          .000
           N                    107        107
STRESS     Pearson Correlation  .506       1.000
           Sig. (1-tailed)      .000       .
           N                    107        107

Variables Entered/Removed
Model   Variables Entered   Variables Removed   Method
1       STRESS              .                   Enter

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .506   .256       .249                17.56

ANOVA
Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   11148.382        1     11148.382     36.145   .000
Residual     32386.048        105   308.439
Total        43534.430        106

Coefficients
Model 1      B        Std. Error   Beta   t        Sig.   Zero-order   Partial   Part
(Constant)   73.890   3.271               22.587   .000
STRESS       .783     .130         .506   6.012    .000   .506         .506      .506

Charts
[Scatterplot of SYMPTOMS against the adjusted predicted values omitted here.]
How does our output compare to the output presented in the textbook? Take a moment
to identify all of the key pieces of information. Find r², find the ANOVA used to test
the significance of the model, and find the regression coefficients used to calculate the
regression equation. One difference is that the text did not include the scatterplot.
What do you think of the scatterplot? Does it help you see that predicting symptoms
based on stress is a pretty good estimate? You could add a line of best fit to the
scatterplot using what you learned in Chapter 5.
Now, click Window/Symptoms and stress.sav and look at the new data (residuals and
predicted values) in your file. A small sample is below. Note how they are named
and labeled.
Let's use what we know about the regression equation to check the accuracy of the
scores created by SPSS. We will focus on the unstandardized predicted and residual
values. This is also a great opportunity to learn how to use the Transform menus to
perform calculations based on existing data.
We know from the regression equation that:
Predicted Symptoms (Y') = 73.890 + .783 × Stress
We also know that the residual can be computed as follows:
Residual = Y − Y', or Symptoms − Predicted Symptoms.
We'll use Transform/Compute to calculate these values ourselves and then compare them
to the values SPSS saved from the regression.
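Outside SPSS, the same two computations can be sketched in a few lines of Python; the stress and symptom values below are hypothetical, not rows from the data file:

```python
# Regression equation from the output: predicted symptoms = 73.890 + .783 * stress
def predicted_symptoms(stress):
    return 73.890 + 0.783 * stress

def residual(symptoms, stress):
    # Residual = observed minus predicted (Y minus Y')
    return symptoms - predicted_symptoms(stress)

# Hypothetical case: stress score of 10, observed symptom score of 90
print(round(predicted_symptoms(10), 2))
print(round(residual(90, 10), 2))
```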
In the Data Editor window, select Transform/Compute.
Check the Data Editor to see if your new variable is there, and compare it to pre_1.
Are they the same? The only difference I see is that our variable is expressed to only 2
decimal places. Otherwise, the values agree.
Follow similar steps to calculate the residual. Click on Transform/Compute. Name
your Target Variable sympres and Label it symptoms residual. Put the formula
symptoms-sympred in the Numeric Expression box by double clicking the two preexisting variables and typing a minus sign between them. Then, click Ok.
Compare these values to res_1. Again they agree. A portion of the new data file is
below.
Now that you are confident that the predicted and residual values computed by
SPSS are exactly what you intended, you won't ever need to calculate them yourself
again. You can simply rely on the values computed by SPSS through the Save command.
Multiple Regression
Now, let's move on to multiple regression. We will predict the dependent
variable from multiple independent variables. This time we will use the course
evaluation data to predict the overall rating of lectures based on ratings of teaching skills,
the instructor's knowledge of the material, and expected grade.
Open course evaluation.sav. You may want to save symptoms and stress.sav to
include the residuals. That's up to you.
Select Analyze/Regression/Linear.
Select overall as the Dependent
variable, and teach, knowledge,
and grade as the Independents.
Since there are multiple
independent variables, we need
to think about the Method of
entry. As noted in the text,
stepwise procedures are
seductive, so we want to select
Enter, meaning all of the
predictors will be entered
simultaneously.
Click Statistics and select Descriptives and Part and partial correlations. Click
Continue.
Click Plots and select Dependnt as Y and Adjpred as X. Click Continue.
Click Save and select the Residuals and Predicted values of your choice. Click
Continue.
Click Ok at the main dialog box. The output follows.
Descriptive Statistics
           N
OVERALL    50
TEACH      50
KNOWLEDG   50
GRADE      50

Correlations (N = 50 for all pairs)
                                OVERALL   TEACH   KNOWLEDG   GRADE
Pearson Correlation   OVERALL   1.000     .804    .682       .301
                      TEACH     .804      1.000   .526       .469
                      KNOWLEDG  .682      .526    1.000      .224
                      GRADE     .301      .469    .224       1.000
Sig. (1-tailed)       OVERALL   .         .000    .000       .017
                      TEACH     .000      .       .000       .000
                      KNOWLEDG  .000      .000    .          .059
                      GRADE     .017      .000    .059       .

Variables Entered/Removed
Model   Variables Entered        Variables Removed   Method
1       GRADE, KNOWLEDG, TEACH   .                   Enter

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .863   .745       .728                .32

ANOVA
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   13.737           3    4.579         44.741   .000
Residual     4.708            46   .102
Total        18.445           49

Coefficients
Model 1      B       Std. Error   Beta    t        Sig.
(Constant)   -.927   .596                 -1.556   .127
TEACH        .759    .112         .658    6.804    .000
KNOWLEDG     .534    .132         .355    4.052    .000
GRADE        -.153   .147         -.088   -1.037   .305

Charts
[Scatterplot omitted: OVERALL (y-axis, 2.0 to 5.0) plotted against the adjusted predicted values (x-axis, 2.0 to 5.0).]
Compare this output to the results in the text. Notice the values are the same, but the
styles are different since the output in the book (earlier edition) is from Minitab, a
different data analysis program.
Exit SPSS. It's up to you to decide if you want to save the changes to the data file and
the output file.
In this chapter, you have learned to use SPSS to calculate simple and multiple
regressions. You have also learned how to use built-in menus to calculate descriptives,
residuals, and predicted values, and to create various scatterplots. As you can see, SPSS
has really simplified the process. Complete the following exercises to increase your
comfort and familiarity with all of the options.
Exercises
1. Using data in course evaluations.sav, predict overall quality from expected grade.
2. To increase your comfort with Transform, calculate the predicted overall score
based on the regression equation from the previous exercise. Then calculate the
residual. Did you encounter any problems?
3. Using data in HeightWeight.sav, predict weight from height and gender. Compare
your results to the output in Table 11.6 of the textbook.
4. Using the data in cancer patients.sav, predict distress at time 2 from distress at
time 1, blame person, and blame behavior. Compare your output to the results
presented in Table 11.7 in the textbook.
One-Sample Statistics
          N    Mean   Std. Deviation   Std. Error Mean
ELEVATE   10   1.46   .34              .11

One-Sample Test (Test Value = 1)
          t       df   Sig. (2-tailed)   Mean Difference
ELEVATE   4.298   9    .002              .46
Notice that descriptive statistics are automatically calculated in the one-sample t-test.
Does our t-value agree with the one in the textbook? Look at the Confidence
Interval. Notice that it is not the confidence interval of the mean, but the confidence
interval for the difference between the sample mean and the test value we specified, in
this case 1.
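The one-sample t statistic is just the difference between the sample mean and the test value divided by the standard error of the mean. A quick Python sketch using the rounded values from the output above (the result differs slightly from SPSS's 4.298 because SPSS works with unrounded values):

```python
import math

mean, test_value = 1.46, 1      # sample mean and hypothesized value
sd, n = 0.34, 10                # standard deviation and sample size

se = sd / math.sqrt(n)          # standard error of the mean
t = (mean - test_value) / se
print(round(se, 2), round(t, 2))
```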
Now, let's move on to related or paired samples t-tests.
Paired Samples t-tests
A paired samples t-test is used to compare two related means. It tests the null
hypothesis that the difference between two related means is 0. Let's begin with the
example of weight gain as a function of family therapy in the text. We want to see if the
difference in weight before and after a family therapy intervention is significantly
different from 0.
Open anorexia family therapy.sav. You don't need to save moon illusion.sav since we
didn't change the data file.
Select Analyze/Compare Means/Paired Samples t-test.
Select weight before and weight after
family therapy and click them into the
Paired Variables box using the arrow.
Then click Options. Notice you can select
the confidence interval you want again.
Leave it at 95%, click Continue, and then
click Ok. The output follows.
T-Test
Paired Samples Statistics
                         Mean      N    Std. Deviation   Std. Error Mean
Pair 1   weight before   83.2294   17   5.0167           1.2167
         weight after    90.4941   17   8.4751           2.0555

Paired Samples Correlations
         N    Correlation   Sig.
Pair 1   17   .538          .026

Paired Samples Test (before - after)
         Mean      Std. Deviation   Std. Error Mean   95% CI of the Difference   t        df   Sig. (2-tailed)
Pair 1   -7.2647   7.1574           1.7359            -10.9447 to -3.5847        -4.185   16   .001
Notice, the descriptives were automatically calculated again. Compare this output to
the results in the text. Are they in agreement? The mean difference is negative here
because weight after the treatment was subtracted from weight before the treatment.
So the mean difference really shows that subjects tended to weigh more after the
treatment. If you get confused by the sign of the difference, just look at the mean
values for the before and after weights. Notice that this time the confidence interval is
consistent with what we would expect. It suggests we can be 95% confident that the
actual weight gain of the population of anorexics receiving family therapy is within the
calculated limits.
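If you want to verify the t value, a paired t is just the mean difference divided by its standard error. A short Python sketch using the summary values from the output above:

```python
import math

mean_diff = -7.2647   # mean of the (before - after) weight differences
sd_diff = 7.1574      # standard deviation of the differences
n = 17                # number of pairs

se_diff = sd_diff / math.sqrt(n)   # standard error of the mean difference
t = mean_diff / se_diff
print(round(se_diff, 4), round(t, 3))
```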
If you want to see the mean difference graphically, try to make a bar graph using what
you learned in Chapter 3. [Hint: Select Graphs/Legacy/Bar, then select Simple and
Summaries of separate variables. Select weight before and weight after family
therapy for Bars Represent. Use mean as the Summary score. Click Ok. Edit your
graph to suit your style.] Mine appears below.
In this chapter, you have learned to calculate each of the 3 types of t-tests covered
in the textbook. You have learned to display mean differences graphically as well.
Complete the following exercises to help you internalize when each type of t-test should
be used.
Exercises
1. Use the data in sat.sav to compare the scores of students who did not see the
reading passage to the score you would expect if they were just guessing (20)
using a one-sample t test. Compare your results to the results in the textbook in
Section 12.9. What conclusions can you draw from this example?
2. Open moon illusion paired.sav. Use a paired samples t-test to examine the
difference in the moon illusion in the eyes elevated and the eyes level conditions.
Compare your results to the results presented in Section 13.3 in the textbook.
3. Create a bar graph to display the difference, or lack thereof, in the moon illusion
in the eyes level and eye elevated conditions, from the previous exercise.
4. Using the data in horn honking.sav, create a boxplot illustrating the group
differences in latencies for low status and high status cars. Compare your boxplot
to the one in the textbook in Figure 14.3.
5. Open anorexia weight gain.sav. In this data set, weight gain was calculated for
three groups of anorexics. One group received family therapy, another cognitive
behavioral therapy, and the final group was a control group. Use an independent
samples t-test to compare weight gain between the control group and family
therapy group. Compare your results to the data presented in the textbook in
Table 14.1.
6. In the same data set, use independent t-tests to compare the weight gain for the
cognitive behavior therapy and control group and for the two therapy groups.
Now that you have compared each of the groups, what conclusions would you
draw about which type of therapy is most effective?
7. Using the same data set, create a bar graph or box plot that illustrates weight gain
for all 3 groups.
Note that .05 is the default under Significance level. After consulting with SPSS
technical support, it is clear that this is the experiment-wise or family-wise significance
level. So any comparison flagged by SPSS as significant is based on a Bonferroni-type
correction. You do not need to adjust the significance level yourself.
Click Options. In the next dialog box, select
Descriptives under Statistics, and select Means plot
so SPSS will create a graph of the group means for us.
The default under Missing Values is Exclude cases
analysis by analysis. Let's leave this as is. Click
Continue and then Ok. The output follows.
Means Plots
[Means plot omitted: mean maternal role adaptation (y-axis, roughly 14 to 19) for the LBW Experimental, LBW Control, and Full-term groups.]
The plot that SPSS created is an effective way to illustrate the mean differences.
You may want to edit the graph using what you learned in Chapter 3 to make it more
elegant. Some people would prefer a bar chart since these are independent groups and a
line suggests they are related. You could create a bar chart of these group means
yourself.
Let's re-run the same analysis using the General Linear Model (GLM) and see
how the two sets of results are similar and different.
General Linear Model to Calculate One-Way ANOVAs
The Univariate General Linear Model is really intended to test models in which
there is one dependent variable and multiple independent variables. We can use it to run
a simple one-way ANOVA like the one above. One advantage of doing so is that we can
estimate effect size from this menu, but we could not from the One-Way ANOVA menus.
Let's try it.
Select Analyze/General Linear Model/Univariate.
As you can see from this dialog box, there
are many more options than in the One-Way
ANOVA. This is because the
GLM is a powerful technique that can
examine complex designs. We'll just
focus on what is relevant to us. As
before, select maternal role adaptation
as the Dependent Variable and group
as the Fixed Factor or independent
variable. Then, click Plots.
Profile Plots
[Profile plot omitted: estimated marginal means of maternal role adaptation (low scores better; y-axis roughly 14 to 19) for the LBW Experimental, LBW Control, and Full-term groups.]
Compare this output to the output from the One-Way ANOVA and the results in the
textbook.
One difference is the appearance of the ANOVA summary table. Now, there is a
row labeled intercept and another labeled adjusted. You can ignore these. The F value
for Group is still the same, and that is what we are interested in. Notice the eta squared
column. What does it say for group? Does this value agree with the text? Unfortunately,
SPSS does not calculate omega squared, so you would have to do this by hand.
(Unfortunately, it also does not calculate any of the more useful effect size measures,
such as d.) Did the Bonferroni and the previous LSD multiple comparisons yield the same
results?
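Omega squared is short enough to script yourself. Here is a minimal Python sketch of the hand calculation using the common one-way formula; the ANOVA values below are hypothetical placeholders, so substitute the sums of squares and mean square error from your own output:

```python
def omega_squared(ss_between, ss_total, ms_error, k):
    """Omega squared for a one-way ANOVA with k groups:
    (SS_between - (k - 1) * MS_error) / (SS_total + MS_error)."""
    return (ss_between - (k - 1) * ms_error) / (ss_total + ms_error)

# Hypothetical one-way ANOVA: 3 groups, SS_between = 100, SS_total = 300, MS_error = 5
print(round(omega_squared(100, 300, 5, 3), 3))
```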
You could edit any of the tables and graphs to look more elegant. For example,
the current title of the graph is cut off. You would probably want to name it something
else or use two lines of text. Editing the output would be ideal if you wanted to include
your output in a paper. Use what you learned in Chapters 3 and 4 of this Manual to do
so.
In this chapter, you learned three methods to calculate a one-way ANOVA. I prefer
the General Linear Model approach since it is the only one that gives us the option of
calculating both multiple comparisons and eta squared. Of course, you may feel otherwise
depending on the information you wish to calculate. Complete the following exercises.
Exercises
Each of these exercises is based on Eysenck recall.sav. This study is presented in section
16.1 in the textbook.
1. Use ANOVA to compare the means. Select a post hoc procedure of your choice.
Summarize the results.
2. Edit the ANOVA summary table so that it is suitable for inclusion in a paper.
3. Use SPSS to calculate eta squared. Note, how did you do this?
4. Create a bar chart to illustrate the differences between groups.
Between-Subjects Factors
                 Value Label   N
AGE        1     Older         50
           2     Younger       50
CONDITIO   1     Counting      20
           2     Rhyming       20
           3     Adjective     20
           4     Imagery       20
           5     Intentional   20
Descriptive Statistics (Dependent Variable: RECALL)
AGE       CONDITIO      Mean    Std. Deviation   N
Older     Counting      7.00    1.83             10
          Rhyming       6.90    2.13             10
          Adjective     11.00   2.49             10
          Imagery       13.40   4.50             10
          Intentional   12.00   3.74             10
          Total         10.06   4.01             50
Younger   Counting      6.50    1.43             10
          Rhyming       7.60    1.96             10
          Adjective     14.80   3.49             10
          Imagery       17.60   2.59             10
          Intentional   19.30   2.67             10
          Total         13.16   5.79             50
Total     Counting      6.75    1.62             20
          Rhyming       7.25    2.02             20
          Adjective     12.90   3.54             20
          Imagery       15.50   4.17             20
          Intentional   15.65   4.90             20
          Total         11.61   5.19             100
Tests of Between-Subjects Effects (Dependent Variable: RECALL)
Source            Type III Sum of Squares   df    Mean Square   F          Sig.   Eta Squared
Corrected Model   1945.490                  9     216.166       26.935     .000   .729
Intercept         13479.210                 1     13479.210     1679.536   .000   .949
AGE               240.250                   1     240.250       29.936     .000   .250
CONDITIO          1514.940                  4     378.735       47.191     .000   .677
AGE * CONDITIO    190.300                   4     47.575        5.928      .000   .209
Error             722.300                   90    8.026
Total             16147.000                 100
Corrected Total   2667.790                  99
Profile Plots
[Profile plots omitted: (1) estimated marginal means of RECALL by AGE (Older about 10, Younger about 13); (2) estimated marginal means of RECALL across the five conditions (Counting, Rhyming, Adjective, Imagery, Intentional) plotted separately for Older and Younger participants, illustrating the interaction.]
Measures of Association
                    Eta    Eta Squared
RECALL * AGE        .300   .090
RECALL * CONDITIO   .754   .568
As you can see, these values agree with those in the text for age and condition.
You would still need to calculate eta squared for the interaction between age and
condition.
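Using the interaction and corrected-total sums of squares from the GLM output (190.300 and 2667.790), the interaction's eta squared is a one-line hand calculation, sketched here in Python:

```python
# Eta squared = SS_effect / SS_corrected_total, using values from the GLM output
ss_interaction = 190.300
ss_corrected_total = 2667.790

eta_sq = ss_interaction / ss_corrected_total
print(round(eta_sq, 3))
```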
Simple Effects
Now that we know there is a significant interaction between age and condition,
we need to calculate the simple effects to help us interpret the interaction. The easiest
way to do this is to split the file using the Data/Split File menu selections. Then, we can
re-run the ANOVA testing the effect of one independent variable on the dependent variable
at each level of the other independent variable. For example, we can see the effect of
condition on recall for younger participants and older participants. Because we will most
likely wish to run our significance test using MSerror from the overall ANOVA, we will
have to perform some hand calculations. After we get the new MS values for condition
in each group, we will need to divide them by MSerror from the original analysis, as noted
in the text.
In Data Editor View, click on Data/Split file.
Now, we are going to calculate the effect of condition on recall for each age group, so
select Analyze/Compare Means/One-Way ANOVA.
Select recall as the Dependent Variable and condition as the Factor. Then click
Continue. There is no need to use Options to calculate means or create plots since we
already did that when we ran the factorial ANOVA. So, click Ok. The output follows.
AGE = Older
ANOVA: RECALL
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   351.520          4    87.880        9.085   .000
Within Groups    435.300          45   9.673
Total            786.820          49

AGE = Younger
ANOVA: RECALL
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   1353.720         4    338.430       53.064   .000
Within Groups    287.000          45   6.378
Total            1640.720         49
Compare MScondition (Between Groups) in the above tables to the values presented in the text.
As you can see, they are in agreement. Now, divide them by the MSerror from the
original ANOVA, 8.026. The calculations follow.
F(conditions at old) = 87.88 / 8.026 ≈ 10.95
F(conditions at young) = 338.43 / 8.026 ≈ 42.15
Thus, we end up with the same results. Although we had to perform some hand
calculations, having SPSS calculate the mean square for conditions for us certainly
simplifies things.
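The final division can also be scripted. A small Python sketch using MScondition from each split-file ANOVA and MSerror from the overall ANOVA (note that the second F comes out at 42.17 rather than the text's 42.15, a trivial rounding difference):

```python
ms_error_overall = 8.026     # MS_error from the full factorial ANOVA

# MS_condition from the split-file one-way ANOVAs
ms_condition_old = 87.880
ms_condition_young = 338.430

print(round(ms_condition_old / ms_error_overall, 2))
print(round(ms_condition_young / ms_error_overall, 2))
```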
In this chapter you learned to calculate Factorial ANOVAs using GLM
Univariate. In addition, you learned a shortcut to assist in calculating simple effects.
Complete the following exercises to better familiarize yourself with these commands and
options.
Exercises
1. Using Eysenck factorial.sav, calculate the simple effects for age at various
conditions and compare them to the data in Table 17.4. [Hint: Split the file by
condition now, and run the ANOVA with age as the independent variable.]
2. Use the data in adaptation factorial.sav to run a factorial ANOVA where group
and education are the independent variables and maternal role adaptation is the
dependent variable. Compare your results to Table 17.5 in the textbook.
3. Create a graph that illustrates the lack of an interactive effect between education
and group on adaptation from the previous exercise.
Descriptive Statistics
                            Mean    Std. Deviation
headache duration week 1    20.78   7.17
headache duration week 2    20.00   10.22
headache duration week 3    9.00    3.12
headache duration week 4    5.78    3.42
headache duration week 5    6.78    4.12
Profile Plots
[Profile plot omitted: estimated marginal means of headache duration (y-axis, 0 to 30) across the five weeks, dropping from about 20 during the two baseline weeks to under 10 during the three training weeks.]
As you can see, there is a lot of output, much of which we can ignore for our
purposes. Specifically, ignore Multivariate Tests, Tests of Within-Subjects Contrasts,
and Tests of Between Subjects Effects. The multivariate tests are another way to run the
analysis, and often not a good way. The contrasts give tests of linear and quadratic trends
in the data, and are not particularly of interest here. There are no between subjects
factors, so that output is not of interest. Now, let's look at the rest and compare it to the
answers in the text. First, you can compare the mean scores for each week by looking at
the Descriptive Statistics table. The next piece is Mauchly's Test of Sphericity, which
tests the assumption that each of the time periods is approximately equally correlated
with every other score. As noted in the text, when this assumption is violated, various
corrections are applied. Also as noted in the text, this is not a particularly good test, but it
is about the best we have. The next table of interest, Tests of Within-Subjects Effects,
is what we really want to see. Compare the textbook values to those listed in the rows
marked Sphericity Assumed, because they were calculated the same way. As you can
see, they are in agreement.
Now, note the values for eta squared and observed power. Can you interpret
them? Nearly 73% of the variability in headache duration is accounted for by time.
Observed power is based on the assumption that the true difference in population means
is the difference implied by the sample means. Typically, we want to calculate power
going into an experiment based on anticipated or previous effect size in other similar
studies. This is useful in making decisions about sample size. So, observed power
calculated here is not particularly useful.
The graph is a nice illustration of the mean headache duration over time. You
may want to edit it to include more meaningful labels and a title.
Now, we need to calculate multiple comparisons to help us understand the
meaning of the significant effect of time on headache duration.
Multiple Comparisons
Let's just try one of the possible multiple comparisons, the comparison between
the overall baseline mean and the overall training mean. We can use SPSS's
Transform/Compute to calculate these averages for us rather than doing it manually.
In the Data Editor window, select Transform/Compute.
Type baseline under Target
Variable. Then, under the list of
Functions, select MEAN and
arrow it into the dialog box. We
need to tell SPSS which
variables to calculate the mean
from. Select week1 and week2 to replace
the 2 question marks. Make sure
they are separated by a comma and
the question marks are gone.
Then, click Ok.
Look at the new variable in the Data Editor. Does it look right?
Click Transform/Compute again. Click Reset to remove the previous information.
Name the next Target Variable training. Select MEAN again. Specify, week3,
week4, and week5. Make sure the question marks are gone and commas separate each
variable. Then, click Ok. Check out your new variable.
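Conceptually, Transform/Compute with the MEAN function is doing this arithmetic for every row of the data file. A small Python sketch (the week values are one hypothetical participant, not a row from the real file):

```python
# One hypothetical participant's headache durations for the five weeks
week1, week2, week3, week4, week5 = 21.0, 19.0, 8.0, 6.0, 7.0

baseline = (week1 + week2) / 2            # MEAN(week1, week2)
training = (week3 + week4 + week5) / 3    # MEAN(week3, week4, week5)
print(baseline, round(training, 2))
```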
Use Analyze/Descriptives to calculate the means for baseline and training. The data
follow.
As you can see, the means are consistent with those reported in the textbook. Now, you
can apply the multiple-comparison formula using MSerror from the ANOVA. The
computation follows.
t = (20.39 - 7.19) / sqrt(22.53 × (1/18 + 1/27)) = 13.20 / sqrt(2.086) = 9.14
Although some hand calculations are required, we saved time and reduced the likelihood
of making errors by using SPSS to compute the new mean scores for baseline and
training for us.
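Putting the pieces together in Python: the comparison uses MSerror = 22.53 from the repeated-measures ANOVA, with 18 baseline observations (2 weeks x 9 participants) and 27 training observations (3 weeks x 9). The training mean of 7.19 used here follows from averaging the three weekly training means:

```python
import math

mean_baseline, mean_training = 20.39, 7.19   # means computed with Transform/Compute
ms_error = 22.53                             # MS_error from the repeated-measures ANOVA
n_baseline, n_training = 18, 27              # observations per period (weeks x participants)

se = math.sqrt(ms_error * (1 / n_baseline + 1 / n_training))
t = (mean_baseline - mean_training) / se
print(round(t, 2))
```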
In this chapter, you learned to use the General Linear Model to calculate repeated
measures ANOVAs. In addition, you learned to use SPSS to calculate new means for use
in multiple comparisons. Try the following exercises to help you become more familiar
with the process.
Exercises
The following exercises are based on Eysenck repeated.sav.
1. Use a repeated measures ANOVA to examine the effect of condition on recall.
Compare your results to those presented in the textbook in Section 18.7.
2. Use SPSS to calculate the effect size of condition.
3. Plot the mean difference in recall by conditions.
4. Use SPSS to calculate the mean of counting, rhyming, adjective, and intentional
and label it lowproc for lower processing. Then use the multiple comparisons
procedure explained in the textbook to compare the mean recall from the lower
processing conditions to the mean recall for imagery, which was the highest
processing condition. Write a brief statement explaining the results.
[SPSS output: Chi-Square Test, Frequencies table]
As you can see, the expected values were 25 each, just as we expected. Now,
compare this Chi Square to the value computed in the text. Once again, they are in
agreement.
Goodness of Fit Chi Square Categories Unequal
Now, let's try an example where the expected values are not equal across
categories. The difference is that we have to specify the expected proportions. This
example is based on Exercise 19.3 in the text, but the numbers in the data set are slightly
different. In the exercise, Howell discusses his theory that when asked to sort
one-sentence characteristics like "I eat too fast" into piles ranging from "not at all like
me" to "very much like me," the percentage of items placed in each pile will be
approximately 10%, 20%, 40%, 20%, and 10%. In our data set, the frequencies are
7, 11, 21, 7, and 4, respectively.
Open unequal categories.sav. There is no need to save RPS.sav since we did not
change the data file in any way.
Choose Data/Weight Cases and use Frequency as the weighting variable.
Select Analyze/Nonparametric Statistics/Chi Square.
Chi-Square Test
Frequencies

RATING
          Observed N   Expected N   Residual
1                  7          5.0        2.0
2                 11         10.0        1.0
3                 21         20.0        1.0
4                  7         10.0       -3.0
5                  4          5.0       -1.0
Total             50

Test Statistics
               RATING
Chi-Square      2.050
df                  4
Asymp. Sig.      .727
As you can see, SPSS calculated the expected values based on the proportions that
we indicated; check the math if you would like. In this case, the fact that the Chi Square
is not significant supports the hypothesis: the observed frequencies of ratings fit the
predicted frequencies.
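The same goodness-of-fit statistic is easy to check by hand. Here is a minimal Python sketch using only the standard library, with the observed counts and expected proportions from this example:

```python
# Observed frequencies and expected proportions from the example above.
observed = [7, 11, 21, 7, 4]
proportions = [0.10, 0.20, 0.40, 0.20, 0.10]

n = sum(observed)                        # 50 ratings in all
expected = [p * n for p in proportions]  # [5.0, 10.0, 20.0, 10.0, 5.0]

# Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))  # 2.05, with df = 5 - 1 = 4
```

This matches the 2.050 in the SPSS output above.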
Chi Square for Contingency Tables
Let's use an example illustrated in the text. We want to examine the hypothesis
that Prozac is an effective treatment for keeping anorexics from relapsing.
As you can see, the data are nicely displayed in Table 19.4 in the text.
Select File/New/Data.
In Variable View, create two variables. Name one drug and specify the Values such
that 1 = drug and 2 = placebo. Name the other variable outcome and specify the
Values such that 1 = success and 2 = relapse. Then return to the Data View.
There are four possible combinations of the two variables, as illustrated in the text.
They are Drug/Success, Drug/Relapse, Placebo/Success, and Placebo/Relapse. So,
enter 1, 1, 2, 2 under drug and 1, 2, 1, 2 under outcome in the first four rows. Then
add a column labeled freq containing the frequencies for each cell. A sample follows.
Compare the Expected Counts to the values in the text. Finally, compare the Chi Square
values. We are interested in the Pearson Chi Square because it was calculated the same
way as the one in the textbook. Once again, the results are consistent with the textbook.
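The Pearson Chi Square for a contingency table can be checked the same way: compute each expected count from the row and column totals, then sum (O - E)^2 / E over the cells. The cell frequencies below are placeholders for illustration only, not the values from Table 19.4:

```python
# Placeholder 2x2 table; substitute the counts from Table 19.4 in the text.
table = [[30, 10],   # Drug:    success, relapse
         [22, 18]]   # Placebo: success, relapse

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square: expected count = (row total x column total) / N.
chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

print(round(chi_square, 2))  # 3.52 for these placeholder counts, df = 1
```

This is exactly the computation behind the Pearson Chi Square row in the SPSS crosstabs output.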
In this chapter, you learned to use SPSS to calculate Goodness of Fit tests with and
without equal frequencies. You also learned to calculate Chi Square for contingency
tables, and learned a trick to reduce data entry by weighting cases. Complete the
following exercises to help you become familiar with these commands.
Exercises
1. Using alley chosen.sav, use a Goodness of Fit Chi Square to test the hypothesis
that rats are more likely than chance to choose Alley D.
2. Solve Exercise 19.3 from the textbook using SPSS. Create the data file yourself.
3. Create your own data file to represent the observed data presented in the textbook
in Table 19.2 using Weight Cases.
4. Using the data file you created in Exercise 3, calculate a Chi Square using
crosstabs to examine the hypothesis that the number of bystanders is related to
helping behavior.
Compare this output to the results in Section 20.1 of the textbook. Specifically,
focus on the row labeled Wilcoxon W in the Test Statistics table. As you can see, they
are the same. There is not a statistically significant difference in stressful life events for
the two groups. But if this is the Mann-Whitney test, why did I tell you to look at
Wilcoxon's W? The reason is that I cheated in the text. To avoid talking about two
Wilcoxon tests, I called this one the Mann-Whitney (which is basically true) but showed
you how to calculate the Wilcoxon statistic. It honestly doesn't make any difference.
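If you want to see where Wilcoxon's W and the z statistic come from, here is a small rank-sum sketch in Python. The scores are made up for illustration (they are not the stressful-life-events data), and they are all distinct so no tie handling is needed:

```python
# Illustrative data only; all values are distinct, so simple ranking works.
group1 = [10, 12, 15, 18, 21]
group2 = [9, 11, 13, 14, 16, 20]

# Rank all scores together, then sum the ranks in group 1 (Wilcoxon's W).
ranks = {v: r for r, v in enumerate(sorted(group1 + group2), start=1)}
w = sum(ranks[v] for v in group1)

# Normal approximation to the rank-sum distribution; z is what SPSS reports.
n1, n2 = len(group1), len(group2)
mean_w = n1 * (n1 + n2 + 1) / 2
sd_w = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
z = (w - mean_w) / sd_w
print(w, round(z, 2))  # 33 0.55
```

With real data containing ties, tied scores would receive averaged ranks, which SPSS handles automatically.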
Wilcoxon's Matched-Pairs Signed-Ranks Test
Now, let's compare paired or related data. We will use the example illustrated in
Section 20.2 of the textbook. We will compare the volume of the left hippocampus in
twin pairs, one of whom is schizophrenic and one of whom is normal.
Open Hippocampus Volume.sav.
Select Analyze/Nonparametric Tests/2 Related Samples.
The Sum of Ranks column includes the T values. Compare them to the values in
the text. Note that the test statistic in SPSS is z. Regardless, the results are the same:
there is a significant difference in hippocampal volume between normals and
schizophrenics.
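The T values SPSS reports are simply the positive and negative rank sums of the nonzero pair differences. A sketch with illustrative paired scores (not the hippocampus volumes from the text):

```python
# Illustrative pre/post pairs; the difference magnitudes are all distinct,
# so simple rank assignment works (ties would need averaged ranks).
pre  = [12, 15, 9, 20, 14, 11, 17]
post = [10, 16, 6, 15, 10, 17, 17]

# Keep the nonzero differences and rank them by absolute size.
diffs = [b - a for a, b in zip(pre, post) if b != a]
ranks = {d: r for r, d in enumerate(sorted(diffs, key=abs), start=1)}

t_pos = sum(ranks[d] for d in diffs if d > 0)  # sum of positive ranks
t_neg = sum(ranks[d] for d in diffs if d < 0)  # sum of negative ranks
print(t_pos, t_neg, min(t_pos, t_neg))  # 7 14 7
```

The test statistic T is the smaller of the two rank sums; SPSS then converts it to the z it prints.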
Kruskal-Wallis is already selected in the main dialog box, so just click Ok. The output
follows.
Kruskal-Wallis Test
Ranks

          GROUP    N   Mean Rank
PROBLEM   1        7        5.00
          2        8       14.38
          3        4       10.00
          Total   19

Test Statistics
               PROBLEM
Chi-Square      10.407
df                   2
Asymp. Sig.       .005
As you can see, these results agree with those in the text, with minor differences in
the decimal places due to rounding. Both sets of results support the conclusion
that the number of problems solved correctly varied significantly by group.
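You can recover the H statistic (which SPSS labels Chi-Square) from the Ranks table alone, since each group's rank sum is its mean rank times its n. A quick Python check using the rounded values from the table above:

```python
# Group sizes and mean ranks from the Ranks table above (rounded).
n_per_group = [7, 8, 4]
mean_ranks = [5.00, 14.38, 10.00]
big_n = sum(n_per_group)  # 19 cases in all

# Rank sum for each group = mean rank x group size.
rank_sums = [n * r for n, r in zip(n_per_group, mean_ranks)]

# Kruskal-Wallis H = 12 / (N(N+1)) * sum(R_j^2 / n_j) - 3(N+1)
h = (12 / (big_n * (big_n + 1))
     * sum(r ** 2 / n for r, n in zip(rank_sums, n_per_group))
     - 3 * (big_n + 1))
print(f"{h:.2f}")  # 10.40 (10.407 in the output; the mean ranks are rounded)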
Friedmans Rank Test for K Related Samples
Now, lets move on to an example with k related samples. Well use the data
presented in Table 20.5 of the textbook as an example. We want to see if reading time is
effected when reading pronouns that do not fit common gender stereotypes.
Open pronouns.sav.
Select Analyze/Nonparametric Tests/K Related Samples.
Ranks
HESHE
SHEHE
NEUTTHEY
Mean Rank
2.00
2.64
1.36
Test Statisticsa
N
Chi-Square
df
Asy mp. Sig.
11
8.909
2
.012
a. Friedman Test
As you can see, the Chi Square value is in agreement with the one in the text. We
can conclude that reading times are related to pronoun conditions.
In this chapter, you learned to use SPSS to calculate each of the Nonparametric
Statistics included in the textbook. Complete the following exercises to help you become
familiar with each.
Exercises
1. Using birthweight.sav, use the Mann-Whitney Test to compare the birthweight of
babies born to mothers who began prenatal care in the third trimester to those who
began prenatal classes in the first trimester. Compare your results to the results
presented in Table 20.2 of the textbook. (Note: SPSS chooses to work with the
sum of the scores in the larger group (71), and thus n1 and n2 are reversed. This
will give you the same z score, with the sign reversed. Notice that z in the output
agrees with z in the text.)
2. Using anorexia family therapy.sav (the same example used for the paired t-test in
Chapter 7 of this manual), compare the subjects weight pre and post intervention
using Wilcoxons Matched Pairs Signed Ranks Test. What can you conclude?
3. Using maternal role adaptation.sav (the same example used for one-way
ANOVA in Chapter 8 of this manual), compare maternal role adaptation for the 3
groups of mothers using the Kruskal-Wallis ANOVA. What can you conclude?
4. Using Eysenck recall repeated.sav (the same example used for Repeated
Measures ANOVA in Chapter 10 of this manual), examine the effect of
processing condition on recall using Friedmans Test. What can you conclude?