SPSS


1: Introduction to SPSS

Objectives
Learn about SPSS
Open SPSS
Review the layout of SPSS
Become familiar with Menus and Icons
Exit SPSS
What is SPSS?
SPSS is a Windows based program that can be used to perform data entry and
analysis and to create tables and graphs. SPSS is capable of handling large amounts of
data and can perform all of the analyses covered in the text and much more. SPSS is
commonly used in the Social Sciences and in the business world, so familiarity with this
program should serve you well in the future. SPSS is updated often. This document was
written around an earlier version, but the differences should not cause any problems. If
you want to go further and learn much more about SPSS, I strongly recommend Andy
Field's book (Field, 2009, Discovering statistics using SPSS). Those of us who have used
software for years think that we know it all and don't pay a lot of attention to new
features. I learned a huge amount from Andy's book.
Opening SPSS
Depending on how the computer you are working on is structured, you can open
SPSS in one of two ways.
1. If there is an SPSS shortcut on the desktop, simply put the cursor on it and double
click the left mouse button.
2. Click the left mouse button on the Start button on your screen, then put your cursor
on Programs or All Programs and left click the mouse. Select SPSS 17.0 for Windows
by clicking the left mouse button. (For a while the company called the program PASW
Statistics 17, but they seem to have given that up as a dumb idea when everyone else
calls it SPSS. The version number may change by the time you read this.)
Either approach will launch the program.
Use one of these approaches to open SPSS yourself.

You will see a screen that looks like the image on the next page. The dialog box
that appears offers choices of running the tutorial, typing in data, running queries, or
opening an existing data source. The window behind this is the Data Editor window
which is used to display the data from whatever file you are using. You could select any
one of the options on the start-up dialog box and click OK, or you could simply hit
Cancel. If you hit Cancel, you can either enter new data in the blank Data Editor or you
could open an existing file using the File menu bar as explained later.
Click Cancel, and we'll get acquainted with the layout of SPSS.
Layout of SPSS
The Data Editor window has two views that can be selected from the lower left
hand side of the screen. Data View is where you see the data you are using. Variable
View is where you can specify the format of your data when you are creating a file or
where you can check the format of a pre-existing file. The data in the Data Editor is
saved in a file with the extension .sav.
[Screenshot: the Data Editor window with the start-up dialog box in front; callouts
identify the Menu bar and the Icons.]

The other most commonly used SPSS window is the SPSS Viewer window which
displays the output from any analyses that have been run and any error messages.
Information from the Output Viewer is saved in a file with the extension .spo. Let's open
an output file and look at it.

On the File menu, click Open and select Output. Select appendixoutput.spo from the
files that can be found at
http://www.uvm.edu/~dhowell/fundamentals7/SPSSManual/SPSSLongerManual/Data
ForSPSS/. (At the moment this set of web pages is the most recent version whichever
of my books you are using.) Click Ok. The following will appear. The left hand side
is an outline of all of the output in the file. The right side is the actual output. To
shrink or enlarge either side put your cursor on the line that divides them. When the
double headed arrow appears, hold the left mouse button and move the line in either
direction. Release the button and the size will be adjusted.

Finally, there is the Syntax window which displays the command language used to
run various operations. Typically, you will simply use the dialog boxes to set up
commands, and would not see the Syntax window. The Syntax window would be
activated if you pasted the commands from the dialog box to it, or if you wrote your own
syntax--something we will not focus on here. Syntax files end in the extension .sps.
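If you are curious, here is a minimal sketch of what pasted syntax looks like (assuming
the open data file contains a variable named gender):

  FREQUENCIES VARIABLES=gender
    /ORDER=ANALYSIS.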
SPSS Menus and Icons
Now, let's review the menus and icons.
Review the options listed under each menu on the Menu Bar by clicking them one at a
time. Follow along with the descriptions below.

File includes all of the options you typically use in other programs, such as open,
save, and exit. Notice that you can open or create new files of multiple types, as
illustrated to the right.

Edit includes the typical cut, copy, and paste commands, and allows you to specify
various options for displaying data and output. Click on Options, and you will see the
dialog box to the left. You can use this to format the data, output, charts, etc. These
choices are rather overwhelming, and you can simply take the default options for now.
The author of your text (me) was too dumb to even know these options could easily be
set.
View allows you to select which toolbars you want to show, select font size, add
or remove the gridlines that separate each piece of data, and to select whether or not to
display your raw data or the data labels.
Data allows you to select several options ranging from displaying data that is
sorted by a specific variable to selecting certain cases for subsequent analyses.
Transform includes several options to change current variables. For example,
you can change continuous variables to categorical variables, change scores into rank
scores, add a constant to variables, etc.
Analyze includes all of the commands to carry out statistical analyses and to
calculate descriptive statistics. Much of this book will focus on using commands located
in this menu.
Graphs includes the commands to create various types of graphs including box
plots, histograms, line graphs, and bar charts.

Utilities allows you to list file information, which is a list of all variables, their
labels, values, locations in the data file, and types.
Add-ons are programs that can be added to the base SPSS package. You probably
do not have access to any of those.
Window can be used to select which window you want to view (i.e., Data Editor,
Output Viewer, or Syntax). Since we have a data file and an output file open, let's try
this.
Select Window/Data Editor. Then select Window/SPSS Viewer.
Help has many useful options including a link to the SPSS homepage, a statistics
coach, and a syntax guide. Using topics, you can use the index option to type in any key
word and get a list of options, or you can view the categories and subcategories available
under contents. This is an excellent tool and can be used to troubleshoot most problems.
The Icons directly under the Menu bar provide shortcuts to many common
commands that are available in specific menus. Take a moment to review these as well.
Place your cursor over an Icon for a few seconds, and a description of the underlying
command will appear. For example, one icon is the shortcut for Save. Review the
others yourself.

In the chapters that follow, we will review many specific functions available
through these Menus and Icons, but it is important that you take a few moments to
familiarize yourself with the layout and options before beginning.
Exiting SPSS
To close SPSS, you can either left click on the close button located in the upper
right hand corner of the screen or select Exit from the File menu.
Choose one of these approaches.
A dialog box like the one below will appear for every open window asking you if you
want to save it before exiting. You almost always want to save data files. Output files
may be large, so you should ask yourself if you need to save them or if you simply want
to print them.

Click No for each dialog box since we do not have any new files or changed files to
save.
Exercises
1. Look up ANOVA in Help/Help topics. What kind of information did you
find?
2. Look up compare groups for significant differences in Help/ Statistics Coach.
What did you learn?
3. Open appendixd.sav. In the Data Viewer click Grid Lines in the View menu and
note what happens.
4. While in the Data Viewer for appendixd.sav, click Font in the View menu and
select the font style and size of your choice.
5. Using Edit/Options/General, under Variable View select Display Labels and
File. From now on, this means that SPSS will list the variables in the order they appear
in the file using the variable labels rather than variable names. As you analyze
data in future exercises, try to notice whether or not you like this option.
If not, change it.

2: Entering Data
Objectives
Understand the logic of data files
Create data files and enter data
Insert cases and variables
Merge data files
Read data into SPSS from other sources
The Logic of Data Files
Each row typically represents the data from 1 case, whether that be a person,
animal, or object. Each column represents a different variable. A cell refers to the
juncture of a specific row and column. For example, the first empty cell in the upper
left hand corner would include the data for case 1, variable 1.
Entering Data
Open SPSS and follow along as you read this description.
To enter data, you could simply begin typing information into each cell. If you
did so, SPSS would give each column a generic label such as var00001. Clearly this is
not desirable, unless you have a superior memory, because you would have no way of
identifying what var00001 meant later on. Instead, we want to specify names for our
variables. To do this, you can double left click on any column head; this will
automatically take you to the Variable View. Alternatively, you can simply click on
Variable View on the bottom left hand corner of your screen.
The first column of variable view is Name. In earlier versions names could only
be 8 characters long. Although that restriction no longer applies, you should keep names
short for ease of reading. For example, if I had depression data that was collected at
intake, and 1 month, 6 months, and 1 year post intervention, I would name those
variables depress0 or depresin (i.e., in for intake), depress1, depress6, and depres12.
SPSS also has preferences for variable names. For example, a number cannot begin a
variable name (e.g., 12depres would not be a valid name). Error messages will appear if
you have selected a name that is not allowed in SPSS. The rules for variable names
appear below. They can be found by typing variable names in the Index option under
Help/Topics and then selecting rules from the list that appears.

Next, you can select the Type of variable. Left click on the empty cell, then left
click on the gray box with dots that appears. The dialog box to the right appears. The
most commonly used types of data include numeric, date, and string. For numeric data,
width and decimal places refer to the number of characters and decimal places that will
be displayed in the Data Editor window. If you entered a value with 3 decimal places,
SPSS would save that value, but would only display the value to 2 decimal places.
String variables are those that consist of text. For example, you could type Male
and Female if gender were a variable of interest. It is important to note that SPSS is case
sensitive meaning that female and Female would not be viewed as the same category.
Misspellings are also problematic with string data (e.g., femal would not be recognized
as the intended female). For these reasons, it is advantageous to use numbers to
represent common categories, and then supply names for those levels as discussed below.
Click on Date to see the variety of available formats. Then, click Cancel.

The next columns are for Width and Decimals. You could have set this while
specifying your variable type, or you can specify them in these columns. The default for
width is 8 characters and the default for decimals is 2. To change this, left click the cell,
and up and down arrows will appear, as illustrated below. Left click the up arrow if you
want to increase the number, click the down arrow to decrease the value. Alternatively,
you can simply type the desired value in the cell.

The next column is Label. This is a very nice feature that allows you to provide
more information about the variable than you could fit in the 8 character variable name.
For example, I could type Depression assessed at intake for the example used above.
When you hold your cursor over a variable name in the Data View, the full label will
appear. This is very useful when you need a quick reminder. An example of this feature
is below.


Since labels are so much more detailed than variable names, we can specify that SPSS
label variables this way in dialog boxes and output. Let's do this.
Click Edit/Options/Output Labels and select labels for each of the options. Then
click Ok.
The next column is Values. This allows you to assign value labels. You will
typically use this option for categorical variables. For example, we may want the number
1 to represent males and the number 2 to represent females when we enter data on
gender. Let's try this.
Type gender in the first Name column.
Scroll over to the Values column and left click. Then, left click on the gray box that
appears on the right hand side of the cell. The Value Labels dialog box will appear.

Type the numeric value where it says Value, then type the Value Label or text to
explain what it means. Click Add. Do this for males and females. When you are
done, click Ok.
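If you prefer syntax, the same labels can be assigned with a short command; a minimal
sketch:

  VALUE LABELS gender 1 'male' 2 'female'.
  EXECUTE.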

Of the remaining columns, you are most likely to use Align, which allows you to
specify how the data will appear in the cells. Your choices are left justified, right
justified, or centered. This is simply a matter of personal preference.
After you have completed specifying your variables, you can click on Data View
and begin entering your data. Put your cursor on the cell in which you want to enter data.
Type the value. If you hit Enter the cursor will move to the cell under the one you just
filled. You can also use the arrow keys to move to the next cell in any given direction.
Typically, you will either enter all of the values in one column by going down or you will
enter all of the variables in a row going from left to right.
Let's try this together.
Select depression scores.sav from the files that can be found at
http://www.uvm.edu/~dhowell/fundamentals7/SPSSManual/SPSSLongerManual/Data
ForSPSS/. SPSS will ask you if you want to save the file with the gender variable.
Click No. Then, SPSS will open depression scores. As you can see, variables have
been named and labeled, but the data have not been entered.
Enter the following data (below) in the Data View window.
Pay attention to your own preferences for data entry (i.e., using the arrows or enter,
going across or down). Notice that there is no subject 10.
When you are done, click Save, but do not close the file. We will continue to use it as
an example.
ID   depressin   depress1   depress6   depres12
1    30.000      25.00      23.00      20.00
2    32.000      30.00      28.00      28.00
3    35.000      35.00      35.00      40.00
4    45.000      42.00      40.00      35.00
5    45.000      45.00      38.00      40.00
6    25.000      25.00      20.00      20.00
7    60.000      45.00      30.00      40.00
8    55.000      50.00      40.00      35.00
9    40.000      40.00      35.00      30.00
11   37.000      30.00      25.00      20.00
12   30.000      25.00      22.00      20.00

Inserting a Variable
After specifying the types of variables for the depression data, I realized I forgot
to include a column for ID number. Typically, I like ID to be the first variable in my data
file. I can add this in one of two ways.
1. In Variable View, highlight the first row and then click Insert Variable on the
Data menu. This will place a new variable before the selected variable.

2. In Data View, highlight the first variable column and then click the Insert
Variable icon. This will also place a new variable column at the beginning of
the file.
Use one of the approaches above to Insert the new variable at the beginning of the file.
Name the variable ID, and label it as participant identification number.
Enter the ID data that appeared on the previous page.
Click Save, and leave the file open.
Inserting a Case
As you can see, the data for ID 10 is missing. I found the missing data and want
to enter it in the file. I'd like my data to be in order by ID number, so I want to insert a
case between the person with ID 9 and ID 11. To do so, I can highlight the row for the
case with ID 11, and either:
1. click on Insert Case on the Data menu or
2. click on the Insert Case icon. In either case, a blank row will appear
before the highlighted case. Try it yourself.
Insert a case for ID 10 using one of the above approaches.
Enter the following data: 10, 38, 35, 38, 38 for ID, depressin, depress1, depress6, and
depres12 respectively.
Check the accuracy of your data entry, then click Save.
Merging Files
Adding Cases. Sometimes data that are related may be in different files that you
would like to combine or merge. For example, in a research methods class, every student
may collect and then enter data in their own data file. Then, the instructor might want to
put all of their data into one file that includes more cases for data analysis. In this case,
each file contains the same variables but different cases. To combine these files, have
one of the data files open, then left click on Merge Files on the Data menu and select
Add Cases. Then specify the file from which the new data will come and click Open. A
dialog box will appear showing you which variables will appear in the new file. View it,
and if all seems in order, click OK. The two files will be merged. This is fairly simple.
See if you can do it yourself in Exercise 3 at the end of this chapter.
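For reference, the pasted syntax for an add-cases merge looks roughly like this
(assuming the second file is named merge2.sav and the first file is the one currently
open):

  ADD FILES /FILE=* /FILE='merge2.sav'.
  EXECUTE.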

Adding Variables. In other cases, you might have different data on the same
cases or participants in different files. For example, I may have recorded the
demographic information from the participants in my depression study in one file and the
depression data in another file. I may want to put them together because I'd like to see if
demographic variables, like socioeconomic status or gender, are related to depression. In
this case, you need to be sure the variables on the same participants end up in the correct
row, that is, you want to match the cases. In this case, we will use ID to match cases.
SPSS requires that the files you merge be in ascending order by the matching variable.
So, in both files, the cases must be sorted in ascending order by ID. You can set this up
by sorting cases as discussed below. Then, make sure one of the files is open. Since this
procedure is more complicated, let's try this one together.
Open depression scores.sav from your disk (this is the data that you just entered).
Check to see if the cases are in ascending order by ID. They should be since we just
entered them that way.
Now, open depression demographics.sav. These data are not in order by ID. To fix
this, click Sort Cases under the Data menu.
In the dialog box, select participant
identification number and move it into
the Sort by box by clicking the arrow.
Make sure Ascending is selected for
Sort Order. Then click Ok.

While the demographic file is still open, click on Merge Files in the Data menu, and
select Add Variables.
The next dialog box will ask you to indicate which file the new variables are coming
from. Select depression scores.sav and click Ok. The following dialog box will
appear.

Select Match cases on key variable in sorted files, then highlight id under Excluded
Variables, then click the arrow to move id to Key Variables. Click Ok.

(I have had trouble doing things this way in the past, but succeeded by not selecting
Match cases on key variable in sorted files and just clicking OK. I don't know why,
but it worked for me.)
A dialog box will appear, reminding you that the files must be sorted. Click Ok, and
the files will be merged. You may want to do a Save As and give the merged file a
new name like depression complete.sav to help you remember what is in it.
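For reference, a rough sketch of the equivalent syntax (both files must already be
sorted by id; the file name follows the example above):

  SORT CASES BY id (A).
  MATCH FILES /FILE=* /FILE='depression scores.sav' /BY id.
  EXECUTE.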
Reading Data In From Other Sources
SPSS can also recognize data from several other sources. For example, you can
open data from Microsoft EXCEL in SPSS or you can get SPSS to read data entered in a
text file. This is an attractive option, particularly if you do not have your own version of
SPSS. It allows you to enter data in other more common programs, save them to disk,
and simply open them when you have the opportunity to use a PC that has SPSS on it.
Let's try some examples.
Opening data from EXCEL. The complete depression data is also on the web in
a file named complete depression.xls (xls is the extension for Excel data files). Take a
moment to open this file in Excel and look it over. You will see it looks just like the file
we just created except that the variable names are different because they are longer.
Specific instructions follow.
Open complete depression.xls.
Rename the variables in Excel to include eight characters if your version of SPSS will
not accept longer names. When you are done, Save your changes and close the file
because SPSS cannot read the file if it is open in another program.
Open SPSS and select Read Text Data from the File menu.
A dialog box will appear. Under Files of type, select Excel. Under Look in select the
subdirectory that holds the file you want. (I suggest saving all of the files found with
the above web link to a directory and then loading from there. That just makes life a
bit easier.) Depression complete.xls should appear. Select it and click Open. The
following dialog box will appear.

Select Read variable names from the first row of data, because that is where the
names appear in the Excel file. Then, click Ok. Check out your new file in SPSS.
How does it look? There is no need to save this file since it contains the same
information as depression complete.sav.

The downside is that the new data file does not include variable labels or values, so
you would need to add them. You should also make sure that SPSS has identified the
variables as the correct type.
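If you prefer syntax, the Excel file can be read with a GET DATA command; a sketch,
assuming the worksheet is named Sheet1:

  GET DATA /TYPE=XLS
    /FILE='complete depression.xls'
    /SHEET=NAME 'Sheet1'
    /READNAMES=ON.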
Text Data. Now, let's try an example with text data. A text data file can be
created in any word processing program or in Notepad or any other text editor. Just be
sure to save the file with the .txt or .dat file extension. SPSS can recognize text data in
several formats. Let's begin with the simplest example. I have collected data from 11
people and typed it in the following format (this is a sample, not the whole file).
012345
021123
031234
042345
051455
062111
071122
082334
092543
101345
111345

The first two digits are the ID number. The next digit is
gender. Digits 4, 5, and 6 are the responses to the first 3
questions on a survey. No characters or spaces separate the
variables. The data are to be found in simpletextdata.txt.
Normally I create data files with a space between the variables (or
use a tab). This makes them much easier to read.

Open SPSS.
In the SPSS File menu, click Read text data.
Select simpletextdata under Files of type Text and click Open.
In the next dialog box, click No for Does your text file have a predefined format and
click Next.
In the next dialog box, select Fixed width under How are your variables arranged,
then select No for Are variable names included in the top of your file. Then click
Next.
In the next dialog box, indicate that the data starts on line 1, 1 line represents a case,
and you want to import all cases, then click Next. The following dialog box will
appear. We need to tell SPSS where to insert breaks for variables.

Hold your cursor between the numbers where a break should be inserted. An arrow
will appear. Add a break after the 2nd, 3rd, 4th, and 5th columns. When you are done,
click Next.

The next dialog box will show you a draft of what your new data file will look like.
Notice, the variables will have generic names like v1, v2, etc. Then click Next.
At the next dialog box, you can click Finish and your new data file will appear. You
could then specify variable names, types, and labels as illustrated above.
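The fixed-width wizard is equivalent to a DATA LIST command; a minimal sketch, with
the column positions taken from the description above and the variable names assumed:

  DATA LIST FILE='simpletextdata.txt' FIXED
    /id 1-2 gender 3 q1 4 q2 5 q3 6.
  LIST.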
Let's take one more example. This is based on the same data, but this time the
text file is tab delimited (a tab was inserted between each variable) and has variable
names at the top. Below is an example of the first two lines from this text file.

ID   Gender   Q1   Q2   Q3
01   2        3    4    5

On the File menu, click Read text data.


Select tabtextdata under Files of type Text, then click Open.
On the next dialog box, you will see a draft that shows a box between each variable
to represent the tabs. Are they in the right place? Select No for predefined format and
then click Next.
Select Delimited for file arrangement and Yes for variable names at the top of the file,
then click Next.
In the next dialog box, indicate that the data starts on line 2, each line represents a
case, and you want to import all cases, then click Next.

In the next dialog box, check Tab as the type of delimiter and then click Next. (Many
of the files for the text are delimited by a space rather than a tab. You can simply choose
space in the dialog box.)
You will see a draft of your data file. Review it, and then click Next.
Click Finish at the next dialog box and your new data file will appear.
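The delimited wizard likewise pastes a GET DATA command; a rough sketch for this file
(the variable formats are assumptions):

  GET DATA /TYPE=TXT
    /FILE='tabtextdata.txt'
    /DELIMITERS="\t"
    /ARRANGEMENT=DELIMITED
    /FIRSTCASE=2
    /VARIABLES=id F2.0 gender F1.0 q1 F1.0 q2 F1.0 q3 F1.0.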
One difference between these two examples is that the second included the
variable names at the top of the file. This, in my opinion, is the better approach because
it reduces the chance of making mistakes later in the process.
This chapter included information about entering data and opening files of various
types. This is an important part of the process because data entry errors contribute to
inaccurate results. Further, good variable names and labels allow you to perform
subsequent analyses more efficiently. Completing the following exercises will help you
internalize these processes.
Exercises
1. The data from Appendix D are on the web in a file named RxTime.sav. Open this
file and label the variables and values as described in the Howell textbook on
page. Save the changes when you are done because we will use this file in
subsequent examples and exercises.
2. Read the data from Fig3-2.dat on the website. These are the data on intrusive
thoughts shown in Figure 3.2 of your text. These are raw data with variable names
in the first line.
3. Review the following data. Then, create your own data file and enter the data.
Be sure to include variable and value labels. Then open exercise2.2.sav on the
disk which includes the same data. Note the similarities and differences between
your file and the file on disk. Which do you prefer? Why?

Age   Gender   Average Hours of Sleep   Number of Classes Missed   Grade in Course
18    Male     Seven                    0                          A
18    Female   Four                     1                          C
17    Female   Six                      2                          B
19    Female   Ten                      5                          F
20    Male     Eight                    2                          B
21    Female   Seven and a half         3                          C
23    Male     Nine                     1                          B
22    Male     Eight                    2                          A
18    Male     Six                      3                          D

4. Merge the following files from the disk using the add cases option: merge1.sav
and merge2.sav.
5. Read the following text data file into SPSS: textdataexercise.txt. Be sure to open
the text file and notice the format before you proceed.
6. Read readexcelexercise.xls into SPSS. Note any problems that arise and how you
solved them.

4. Descriptive Statistics: Measures of Variability and Central Tendency
Objectives
Calculate descriptives for continuous and categorical data
Edit output tables
Although measures of central tendency and variability were presented as separate
chapters in the Fundamentals text, they are presented together here because they are
options located in the same command windows in SPSS. Descriptive statistics are
calculated using the Analyze menu. Most are calculated using either the Descriptives or
Frequencies command under Descriptive Statistics. When calculating descriptives for
more complex designs including more than one independent variable, you can also use
the Compare Means/Means or the Descriptive Statistics/Crosstabs command, which
allow you to calculate descriptive statistics for subgroups.
It is always important to take a moment to think about the type of data you are
using and what descriptive statistics will be most useful given the type. For continuous
or measurement data, you typically report measures of central tendency and measures of
variability. For categorical data (i.e., nominal data) you typically report the frequency of
each value. Though you don't typically report the frequencies for continuous data, it is
often useful to observe the frequency distributions or histograms of continuous
distributions to note if they are normal or skewed.
Descriptive Statistics
Let's begin by calculating descriptive statistics for the data in Appendix D which
can be found on the web as appendixd.sav. (In some editions of these books the file is
referred to as Appendix Data Set or as Add.dat or as ADD.dat.) In this data set, I think
of ADD symptoms, IQ score, English grade, and GPA as continuous variables. We'll
calculate measures of central tendency and variability for each of these.
Open appendixd.sav.
In the Analyze menu, select Descriptive Statistics and then Descriptives.

Select each of the continuous variables, either by double clicking them, which
automatically puts them in the Variable list; by highlighting them one at a time with a
single click and then clicking the arrow to shift them into the Variable list; or by
holding the control key down while highlighting all of the variables of interest and
then shifting them into the Variable list all at once by clicking the arrow. Then click
Options.
Select each of the measures you've been learning about (Mean, Std. deviation,
Variance, Range, Minimum and Maximum). Then, select the Display Order you would
prefer. This will determine the order they appear in for the resulting table. I like them
in the order I indicated in the Variable list. Then click Continue.

In the main descriptives dialog box, check the box that says Save standardized
values as variables. SPSS will calculate z scores for each of the variables using the
formula you learned about and append them to the end of your data file. Click Ok.
The resulting output will look like this. Note that the variable labels are used rather
than the variable names. Remember, we specified this as the default in
Edit/Options/Output Labels.
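Pasted as syntax, the whole procedure would look roughly like the sketch below. The
variable names addsc, iq, engg, and gpa are my assumptions about how these variables
are named in appendixd.sav; check Variable View for the real names.

  DESCRIPTIVES VARIABLES=addsc iq engg gpa
    /SAVE
    /STATISTICS=MEAN STDDEV VARIANCE RANGE MIN MAX.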

Double click the table so you can edit it. As was the case with graphs, SPSS has many
options to edit statistics in tables as well. Let's try some of them.

Under Pivot, select Transpose Rows and Columns. Which orientation do you
prefer? I like the first since it's more conventional, so I will Transpose the Rows and
Columns again to return to the original orientation.
Now, click on Format/Table properties. Take a moment to view all of the options in
this dialog box. General allows you to specify the width of row and column labels.
Footnotes allows you to choose numeric or alphabetic labels and subscript or
superscript as the position for those labels. Cell formats allows you to change the
font style and size, color, and the alignment. Borders allows you to add or remove
borders around rows, columns, and even cells. Printing allows you to select options
such as rescaling tables to fit on paper. After you've viewed the options, hit Cancel.

Now, select Format/TableLooks. Scroll through the TableLook Files and look at the
samples. Select one you like and click Ok. I chose Academic.

The resulting table is below. I could edit each individual cell by double clicking on it
and then edit the text. For example, I could alter each statistic to include 2 decimal
places if I wanted. You try it.

Now, click on Window/SPSS Statistics Data Editor and look at the standardized
values (z scores) SPSS added to your file. A brief portion of the Data Editor appears
below. You can see that SPSS named each variable with a z. SPSS also labeled the
new variables. Check this out in Variable View.

Frequencies
Now, we'll use the Frequencies command to help us examine the distributions of
the same continuous variables.
Select Analyze/Descriptive Statistics/Frequencies.
Put the variables of interest in the Variable list box. Unselect Display frequency
tables, because this would be a list of the frequency of every value. (Ignore what
looks like an error message.)
Click on Charts, select Histogram with normal curve and click Continue.
Now, click on Statistics. This dialog box has all of the same options we selected
under Descriptives earlier. However, the Descriptives dialog box did not include the
median and mode. Select all of the statistics of interest and click Continue. Then,
click Ok. A sample of the output follows.

[Output: the Frequencies statistics table, followed by a histogram of ADD score with a
normal curve superimposed.]

Take a moment to review the output. It looks like ADD is somewhat normally
distributed, though a bit negatively skewed. Looking at your own output, are the other
variables normally distributed? I also remember now that English grade is nominal too.
Variables were scored as A, B, C, D, and F, though coded as 1 - 4. As noted in the text,

we could analyze this as continuous data, but it seems that reporting the frequencies
rather than measures of central tendency and dispersion may be more appropriate for this
variable.
As before, you can edit the tables or the graphs by double clicking on them. One
difference we have seen between the Descriptives and Frequencies options is that
descriptives only include mean for measures of central tendency whereas Frequencies
include the mean, median, and mode. Further, Descriptives does not have any built in
graphing options, but Frequencies does.
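A sketch of the equivalent syntax for the continuous variables (again assuming the
names addsc, iq, engg, and gpa):

  FREQUENCIES VARIABLES=addsc iq engg gpa
    /FORMAT=NOTABLE
    /HISTOGRAM=NORMAL
    /STATISTICS=MEAN MEDIAN MODE STDDEV VARIANCE RANGE MINIMUM MAXIMUM.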
Now let's use Frequencies to describe categorical data.
Select Analyze/Descriptive Statistics/Frequencies.
This time, put gender, level of English class, English grade, repeated a grade, social
problems, and drop out status in the variable list. Select Display frequency tables,
since there is a finite number of values and we want to know how many people fall in
each category. Click on Statistics and unselect all of the options because we decided
that measures of central tendency and variability are not useful for these data. Then
click Continue. Next, click on Charts. Click on Bar chart and select Percentages
as the Chart Values. Click Continue and then Ok. A sample of the resulting output is
below. Take a moment to review it.

Notice that the frequency tables include a column labeled Percent and another
labeled Valid percent. This is an important distinction when you have missing cases.
The percent column indicates the percent of cases in each category out of the total
number of cases, even if some data are missing. Valid percent indicates the percent of
cases in each category out of only those cases for which there are data on the variable.
For example, imagine a sample of 100 students. Fifty cases are women, 40 are men, and
10 are missing the data. The percent of men would be 40%, but the valid percent of men
would be 44.4% (40 out of the 90 cases with data). Which do you believe is the more
accurate way to describe the sample? I'd argue the valid percent. Now let's move on to
a more complicated type of frequency table.
table.
Crosstabs
Sometimes we need to know the number and percent of cases that fall in multiple
categories. This is useful when we have multiple categorical variables in a data set. For
example, in the data set we have been using, I'd like to know what percent of dropout and
nondropout students had social problems. We'll use crosstabs to calculate this.
Click Analyze/Descriptive Statistics/Crosstabs.

Select social problems for Rows and dropped out for Columns. Click on Cells and
select Observed for Counts, and select Row, Column, and Total under Percentages.
Then click Continue. Let's select Display clustered bar charts to see if we find this
option useful. Then, click Ok. The output follows. You can edit both the table and
the chart as you have learned.
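A sketch of the corresponding syntax, with socprob and dropout standing in for
whatever the social problems and drop out variables are actually named in your file:

  CROSSTABS /TABLES=socprob BY dropout
    /CELLS=COUNT ROW COLUMN TOTAL
    /BARCHART.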

Both the table and the graph show that of those youth with social problems, an
equal number did and did not ultimately drop out. This suggests that social problems in
ninth grade and drop out status are independent, something we can test later using chi
square.
Compare Means
Now, lets consider a case where we want to describe a continuous variable but at
different levels of a categorical variable. This is often necessary when you are comparing
group means. For example, we can compare ADD symptoms for males and females.
Lets try it together.
Select Analyze/Compare Means/Means. Notice this is the first time we haven't
selected Descriptive Statistics in this chapter.
Select ADD score for the Dependent List and Gender for the Independent List.
Click Options. Notice that mean, standard deviation, and number of cases are already
selected under Statistics. Add any other descriptives you are interested in, then click
Continue and then Ok. The output follows.

Do you think males and females differed in their ADD symptoms?


Let's try another more complicated example. This time, let's calculate descriptive
statistics for ADD symptoms broken down by gender and whether or not a child had
social problems.
Select Analyze/Compare Means/Means.

Just like before, select ADD score for the Dependent List, and gender for the Layer 1
Independent List. Then click Next. Select social problems as the Layer 2
Independent List. Select whatever statistics you want under Options and then click
Continue and Ok. The output is below.

Notice that this table gives you the marginal descriptives (i.e., the descriptives for
gender independent of social problems and vice versa) under totals, and the cell
descriptives (i.e., the descriptives at each level of the variables, e.g., for boys with
social problems).
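In syntax, the two Means analyses above could be sketched as follows (addsc, gender,
and socprob are assumed variable names):

  MEANS TABLES=addsc BY gender
    /CELLS=MEAN COUNT STDDEV.
  MEANS TABLES=addsc BY gender BY socprob
    /CELLS=MEAN COUNT STDDEV.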
Exit SPSS. There is no need to save the Data File since we haven't changed it. It is up
to you to decide whether or not you would like to save the output file for future
reference.
We've reviewed a variety of options for calculating descriptive statistics
depending on the type of data and the kinds of questions. We've also seen that many of
the graphs we reviewed in Chapter 3 are options in the subcommands under Descriptive
Statistics. In the following chapters you will discover that descriptive statistics are an
option embedded within many other analysis dialog boxes (e.g., t-test, ANOVA, etc.).
Try the following exercises to be sure you understand all of the various options for
calculating descriptives and to help you identify your own preferences.

Exercises
1. Using merge1.sav calculate the mean, median, mode, range, variance, and
standard deviation for the following variables: self-esteem, anxiety, coping, and
health. Create a histogram for anxiety. Note how you did each.
2. Using the data in appendixd.sav, calculate the frequency and percent of females
and males who did and did not have social problems.
3. Using the data in appendixd.sav, calculate the mean, variance, and standard
deviation for GPA broken down by social problems and drop out status.

5. Correlation
Objectives
Calculate correlations
Calculate correlations for subgroups using split file
Create scatterplots with lines of best fit for subgroups and multiple
correlations
Correlation
The first inferential statistic we will focus on is correlation. As noted in the text,
correlation is used to test the degree of association between variables. All of the
inferential statistics commands in SPSS are accessed from the Analyze menu. Let's open
SPSS and replicate the correlation between height and weight presented in the text.
Open HeightWeight.sav. Take a moment to review the data file.
Under Analyze, select Correlate/Bivariate. Bivariate means we are examining the
simple association between 2 variables.
In the dialog box, select height and weight for Variables. Select Pearson for
Correlation Coefficients since the data are continuous. The default for Tests of
Significance is Two-tailed. You could change it to One-tailed if you have a
directional hypothesis. Selecting Flag significant correlations means that the
significant correlations will be noted in the output by asterisks. This is a nice
feature. Then click Options.

Now you can see how descriptive statistics are built into other menus. Select Means
and standard deviations under Statistics.
Missing Values are important. In large data sets, pieces of data are often missing for
some variables. For example, I may run correlations between height, weight, and blood
pressure. One subject may be missing blood pressure data. If I check Exclude cases
listwise, SPSS will not include that person's data in the correlation between height and
weight, even though those data are not missing. If I check Exclude cases pairwise,
SPSS will include that person's data to calculate any correlations that do not involve
blood pressure. In this case, the person's data would still be reflected in the correlation
between height and weight. You have to decide whether or not you want to exclude
cases that are missing any data from all analyses. (Normally it is much safer to go with
listwise deletion, even though it will reduce your sample size.) In this case, it doesn't
matter because there are no missing data. Click Continue. When you return to the
previous dialog box, click Ok. The output follows.
Correlations

Descriptive Statistics
          Mean     Std. Deviation   N
HEIGHT    68.72    3.66             92
WEIGHT    145.15   23.74            92

Correlations
                               HEIGHT   WEIGHT
HEIGHT   Pearson Correlation   1.000    .785**
         Sig. (2-tailed)       .        .000
         N                     92       92
WEIGHT   Pearson Correlation   .785**   1.000
         Sig. (2-tailed)       .000     .
         N                     92       92
**. Correlation is significant at the 0.01 level (2-tailed).

Notice, the correlation coefficient is .785 and is statistically significant, just as
reported in the text. In the text, Howell made the point that heterogeneous samples
affect correlation coefficients. In this example, we included both males and females.
Let's examine the correlation separately for males and females as was done in the text.
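For reference, a sketch of the pasted syntax for this correlation:

  CORRELATIONS /VARIABLES=height weight
    /PRINT=TWOTAIL NOSIG
    /STATISTICS DESCRIPTIVES
    /MISSING=PAIRWISE.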

Subgroup Correlations
We need to get SPSS to calculate the correlation between height and weight
separately for males and females. The easiest way to do this is to split our data file by
sex. Let's try this together.
In the Data Editor window, select Data/Split file.
Select Organize output by
groups and Groups Based on
Gender. This means that any
analyses you specify will be
run separately for males and
females. Then, click Ok.

Notice that the order of the data file has been changed. It is now sorted by Gender,
with males at the top of the file.
Now, select Analyze/Correlate/Bivariate. The same variables and options you
selected last time are still in the dialog box. Take a moment to check this for
yourself. Then, click Ok. The output follows, broken down by males and females.
Correlations

SEX = Male
Descriptive Statistics(a)
          Mean     Std. Deviation   N
HEIGHT    70.75    2.58             57
WEIGHT    158.26   18.64            57
a. SEX = Male

SEX = Female
Descriptive Statistics(a)
          Mean     Std. Deviation   N
HEIGHT    65.40    2.56             35
WEIGHT    123.80   13.37            35
a. SEX = Female

As before, our results replicate those in the text. The correlation between height
and weight is stronger for males than females. Now let's see if we can create a more
complicated scatterplot that illustrates the pattern of correlation for males and females
on one graph. First, we need to turn off split file.
Select Data/Split file from the Data Editor window. Then select Analyze all cases, do
not compare groups and click Ok. Now, we can proceed.
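The whole subgroup analysis can be sketched in syntax as:

  SORT CASES BY sex.
  SPLIT FILE SEPARATE BY sex.
  CORRELATIONS /VARIABLES=height weight
    /PRINT=TWOTAIL NOSIG.
  SPLIT FILE OFF.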
Scatterplots of Data by Subgroups
Select Graphs/Legacy/Scatter. Then, select Simple and click Define.

To be consistent with the graph in the textbook, select weight as the Y Axis and
height as the X Axis. Then, select sex for Set Markers by. This means SPSS will
distinguish the males' dots from the females' dots on the graph. Then, click Ok.
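The legacy scatter dialog pastes a GRAPH command; roughly:

  GRAPH /SCATTERPLOT(BIVAR)=height WITH weight BY sex.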

When your graph appears, you will see that the only way males and females are
distinct from one another is by color. This distinction may not show up well, so let's
edit the graph.
Double click the graph to activate the Chart Editor. Then double click on one of the
female dots on the plot. SPSS will highlight them. (I often have trouble with this. If it
selects all the points, click again on a female one. That should do it.) Then click the
Marker menu.
Select the circle under Marker Type and choose a Fill color. Then click
Apply. Then click on the male dots, and select the open circle in Marker
Type and click Apply. Then, close the dialog box. The resulting graph
should look just like the one in the textbook.
I would like to alter our graph to include the line of best fit for both groups.
Under Elements, select Fit Line at Subgroups. Then select Linear and click
Continue. (I had to select something else and then go back to Linear to highlight the
Apply button.) The resulting graph follows. I think it looks pretty good.

Edit the graph to suit your style as you learned in Chapter 3 (e.g., add a title,
change the axes titles and legend).
This more complex scatterplot nicely illustrates the difference in the correlation
between height and weight for males and females. Let's move on to a more complicated
example.
Overlay Scatterplots
Another kind of scatterplot that might be useful is one that displays the
association between different independent variables and the same dependent variable.
Above, we compared the same correlation for different groups. This time, we want to
compare different correlations. Let's use the course evaluation example from the text.
It looks like expected grade is more strongly related to ratings of fairness of the exam
than ratings of instructor knowledge are. I'd like to plot both correlations. I can
reasonably plot them on the same graph since all of the questions were rated on the
same scale.
Open courseevaluation.sav. You do not need to save HeightWeight.sav since you did
not change it. So click No.

First, let's make sure the correlations reported in the text are accurate. Click
Analyze/Correlate/Bivariate and select all of the variables. Click Ok. The output
follows. Does it agree with the text?
Correlations (N = 50 for every pair)
           OVERALL   TEACH     EXAM      KNOWLEDG   GRADE    ENROLL
OVERALL    1.000     .804**    .596**    .682**     .301*    -.240
TEACH      .804**    1.000     .720**    .526**     .469**   -.451**
EXAM       .596**    .720**    1.000     .451**     .610**   -.558**
KNOWLEDG   .682**    .526**    .451**    1.000      .224     -.128
GRADE      .301*     .469**    .610**    .224       1.000    -.337*
ENROLL     -.240     -.451**   -.558**   -.128      -.337*   1.000
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Now, let's make our scatterplot.


Select Graphs/Legacy/Scatter. Then select Overlay and click Define.

Click on exam and grade and shift them into Y-X Pairs. Then click on exam and
knowledge and shift them into Y-X Pairs. Since exam is the commonality between
both pairs, I'd like it to be on the Y axis. If it is not listed as Y, highlight the pair
and click on the two-headed arrow. It will reverse the ordering. Exam should then
appear first for both. Then, click Ok.
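A rough sketch of the pasted overlay command; the (PAIR) keyword tells SPSS to treat
the two lists as Y-X pairs (lowercase variable names assumed; check the pasted version
for the exact pairing order):

  GRAPH /SCATTERPLOT(OVERLAY)=grade knowledg WITH exam exam (PAIR).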

As in the previous example, the dots are distinguished by color. Double click the
graph and use the Marker icon to make them more distinct as you learned above. Also
use the Elements menu to Fit line at total. It will draw a line for each set of data.

Note that the axes are not labeled. You could label the Y axis Exam, but you could
not label the X axis because it represents two different variables, grade and knowledge.
That is why the legend is necessary. (If you figure out how to label that axis, please
let me know. It should be so easy.)
As you can see, the association between expected grade and fairness of the exam
is stronger than the correlation between instructor's knowledge and the fairness of the
exam.
Now, you should have the tools necessary to calculate Pearson correlations and to
create various scatterplots that complement those correlations. Complete the following
exercises to help you internalize these steps.
Exercises
Exercises 1 through 3 are based on appendixd.sav.
1. Calculate the correlations between ADD symptoms, IQ, GPA, and English
grade twice, once using a one-tailed test and once using a two-tailed test.
Does this make a difference? Typically, when would this make a difference?

2. Calculate the same correlations separately for those who did and did not drop
out, using a two-tailed test. Are they similar or different?

3. Create a scatterplot illustrating the correlation between IQ score and GPA for
those who did and did not drop out. Be sure to include the line of best fit for
each group.
4. Open courseevaluation.sav. Create a scatterplot for fairness of exams and
teacher skills and exam and instructor knowledge on one graph. Be sure to
include the lines of best fit. Describe your graph.

6: Regression and Multiple Regression


Objectives
Calculate regressions with one independent variable
Calculate regressions with multiple independent variables
Scatterplot of predicted and actual values
Calculating residuals and predicted values
Regression
Regression allows you to predict variables based on another variable. In this
chapter we will focus on linear regression or relationships that are linear (a line) rather
than curvilinear (a curve) in nature. Lets begin with the example used in the text in
which mental health symptoms are predicted from stress.
Open symptoms and stress.sav.
Select Analyze/Regression/Linear.

Select symptoms as the Dependent variable and stress as the Independent variable.
Then, click on Statistics to explore our options. The following dialog box will appear.

As you can see, there are many options. We will focus only on information covered
in the textbook. Estimates and Model Fit are selected by default. Leave them that
way. Then select Descriptives and Part and partial correlations. SPSS will then
calculate the mean and standard deviation for each variable in the equation and the
correlation between the two variables. Then, click Continue.

At the main dialog box, click on Plots so we can see our options. It looks like we can
create scatterplots here. Click Help to see what the abbreviations represent. I'd like
to plot the Dependent variable against the predicted values to see how close they are.
Select Dependnt for Y and Adjpred for X. Adjpred is the adjusted prediction. Use
Help/Topics/Index to find out what this means for yourself. Then, click Continue.

In the main dialog box, click Save, and the dialog box to the left will appear. For
Predicted Values, select Unstandardized and Standardized. For Residuals, also select
Unstandardized and Standardized. Now, SPSS will save the predicted values of
symptoms based on the regression equation, and the residual or difference between the
predicted values and actual values of symptoms, in the data file. This is a nice
feature. Remember, the standardized values are based on z score transformations of the
data whereas the unstandardized values are based on the raw data. Click Continue.

Finally, click on Options. Including a constant in the equation is selected by default.
This simply means that you want both a slope and an intercept (the constant). That's
good. We will always leave this checked. Excluding cases listwise is also fine. We do
not have any missing cases in this example anyway. Click Continue, and then Ok in
the main dialog box. The output follows.
Descriptive Statistics
           Mean    Std. Deviation   N
SYMPTOMS   90.70   20.27            107
STRESS     21.47   13.10            107

Correlations
                                 SYMPTOMS   STRESS
Pearson Correlation   SYMPTOMS   1.000      .506
                      STRESS     .506       1.000
Sig. (1-tailed)       SYMPTOMS   .          .000
                      STRESS     .000       .
N                     SYMPTOMS   107        107
                      STRESS     107        107

Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       STRESS(a)           .                   Enter
a. All requested variables entered.
b. Dependent Variable: SYMPTOMS

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .506(a)   .256       .249                17.56
a. Predictors: (Constant), STRESS
b. Dependent Variable: SYMPTOMS

ANOVA(b)
Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   11148.382        1     11148.382     36.145   .000(a)
Residual     32386.048        105   308.439
Total        43534.430        106
a. Predictors: (Constant), STRESS
b. Dependent Variable: SYMPTOMS

Coefficients(a)
             B        Std. Error   Beta   t        Sig.   Zero-order   Partial   Part
(Constant)   73.890   3.271               22.587   .000
STRESS       .783     .130         .506   6.012    .000   .506         .506      .506
a. Dependent Variable: SYMPTOMS

Charts
[Scatterplot: SYMPTOMS (Y) against the adjusted predicted values (X).]

How does our output compare to the output presented in the textbook? Take a moment
to identify all of the key pieces of information. Find R Square, find the ANOVA used to
test the significance of the model, and find the regression coefficients used to calculate
the regression equation. One difference is that the text did not include the scatterplot.
What do you think of the scatterplot? Does it help you see that predicting symptoms
based on stress is a pretty good estimate? You could add a line of best fit to the
scatterplot using what you learned in Chapter 5.
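The whole analysis pastes as a single REGRESSION command; a sketch (ZPP requests the
zero-order, part, and partial correlations):

  REGRESSION
    /DESCRIPTIVES MEAN STDDEV CORR SIG N
    /STATISTICS COEFF OUTS R ANOVA ZPP
    /DEPENDENT symptoms
    /METHOD=ENTER stress
    /SAVE PRED ZPRED RESID ZRESID.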
Now, click Window/Symptoms and stress.sav and look at the new data (residuals and
predicted values) in your file. A small sample is below. Note how they are named
and labeled.

Let's use what we know about the regression equation to check the accuracy of the
scores created by SPSS. We will focus on the unstandardized predicted and residual
values. This is also a great opportunity to learn how to use the Transform menus to
perform calculations based on existing data.
We know from the regression equation that:
Symptoms Predicted, or Ŷ = 73.890 + .783 * Stress.
We also know that the residual can be computed as follows:
Residual = Y - Ŷ, or Symptoms - Symptoms Predicted.
We'll use SPSS to calculate these values and then compare them to the values computed
by SPSS.
In the Data Editor window, select Transform/Compute. Name your Target Variable
sympred and Label it symptoms predicted. Put the formula 73.890 + .783*stress in the
Numeric Expression box. Then, click Ok.
Check the Data Editor to see if your new variable is there, and compare it to pre_1.
Are they the same? The only difference I see is that our variable is only expressed to 2
decimal places. But, the values agree.
Follow similar steps to calculate the residual. Click on Transform/Compute. Name
your Target Variable sympres and Label it symptoms residual. Put the formula
symptoms-sympred in the Numeric Expression box by double clicking the two
pre-existing variables and typing a minus sign between them. Then, click Ok.
Compare these values to res_1. Again they agree. A portion of the new data file is
below.
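In syntax, these two computations are one-liners:

  COMPUTE sympred = 73.890 + 0.783*stress.
  COMPUTE sympres = symptoms - sympred.
  EXECUTE.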

Now that you are confident that the predicted and residual values computed by
SPSS are exactly what you intended, you won't ever need to calculate them yourself
again. You can simply rely on the values computed by SPSS through the Save command.
Multiple Regression
Now, lets move on to multiple regression. We will predict the dependent
variable from multiple independent variables. This time we will use the course
evaluation data to predict the overall rating of lectures based on ratings of teaching skills,
instructors knowledge of the material, and expected grade.
Open course evaluation.sav. You may want to save symptoms and stress.sav to
include the residuals. Thats up to you.
Select Analyze/Regression/Linear.
Select overall as the Dependent variable, and teach, knowledge, and grade as the
Independents. Since there are multiple independent variables, we need to think about
the Method of entry. As noted in the text, stepwise procedures are seductive, so we
want to select Enter, meaning all of the predictors will be entered simultaneously.

Click Statistics and select Descriptives and Part and partial correlations. Click
Continue.
Click Plots and select Dependnt as Y and Adjpred as X. Click Continue.
Click Save and select the Residuals and Predicted values of your choice. Click
Continue.
Click Ok at the main dialog box. The output follows.

Descriptive Statistics

            Mean   Std. Deviation    N
OVERALL     3.55   .61              50
TEACH       3.66   .53              50
KNOWLEDG    4.18   .41              50
GRADE       3.49   .35              50

Correlations

                                OVERALL   TEACH   KNOWLEDG   GRADE
Pearson Correlation  OVERALL      1.000    .804       .682    .301
                     TEACH         .804   1.000       .526    .469
                     KNOWLEDG      .682    .526      1.000    .224
                     GRADE         .301    .469       .224   1.000
Sig. (1-tailed)      OVERALL          .    .000       .000    .017
                     TEACH         .000       .       .000    .000
                     KNOWLEDG      .000    .000          .    .059
                     GRADE         .017    .000       .059       .
N                    (N = 50 for every cell)

Variables Entered/Removed(b)

Model   Variables Entered           Variables Removed   Method
1       GRADE, KNOWLEDG, TEACH(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: OVERALL

Model Summary(b)

                            Adjusted   Std. Error of
Model   R         R Square  R Square   the Estimate
1       .863(a)   .745      .728       .32

a. Predictors: (Constant), GRADE, KNOWLEDG, TEACH
b. Dependent Variable: OVERALL

ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   13.737            3   4.579         44.741   .000(a)
Residual      4.708           46    .102
Total        18.445           49

a. Predictors: (Constant), GRADE, KNOWLEDG, TEACH
b. Dependent Variable: OVERALL

Coefficients(a)

             Unstandardized Coefficients   Standardized
             B         Std. Error          Beta           t        Sig.
(Constant)   -.927     .596                               -1.556   .127
TEACH         .759     .112                 .658           6.804   .000
KNOWLEDG      .534     .132                 .355           4.052   .000
GRADE        -.153     .147                -.088          -1.037   .305

a. Dependent Variable: OVERALL

Charts

[Scatterplot: Dependent Variable OVERALL on the Y axis against the Regression
Adjusted (Press) Predicted Value on the X axis, both ranging from 2.0 to 5.0]

Compare this output to the results in the text. Notice the values are the same, but the
styles are different since the output in the book (earlier edition) is from Minitab, a
different data analysis program.
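If you would rather run the whole analysis from a syntax window, something like the
following sketch should reproduce it. The variable names are assumptions based on the
output above (note that the output truncates knowledge to KNOWLEDG, so check the
actual name in Variable View).

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /STATISTICS COEFF R ANOVA ZPP
  /DEPENDENT overall
  /METHOD=ENTER teach knowledge grade
  /SCATTERPLOT=(DEPENDNT, *ADJPRED)
  /SAVE PRED RESID.

The ZPP keyword requests the part and partial correlations, and SAVE adds the
predicted values and residuals to the data file just as the Save button did.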
Exit SPSS. It's up to you to decide if you want to save the changes to the data file and
the output file.
In this chapter, you have learned to use SPSS to calculate simple and multiple
regressions. You have also learned how to use built-in menus to calculate descriptives,
residuals, and predicted values, and to create various scatterplots. As you can see, SPSS
has really simplified the process. Complete the following exercises to increase your
comfort and familiarity with all of the options.
Exercises
1. Using data in course evaluations.sav, predict overall quality from expected grade.
2. To increase your comfort with Transform, calculate the predicted overall score
based on the regression equation from the previous exercise. Then calculate the
residual. Did you encounter any problems?
3. Using data in HeightWeight.sav, predict weight from height and gender. Compare
your results to the output in Table 11.6 of the textbook.
4. Using the data in cancer patients.sav, predict distress at time 2 from distress at
time 1, blame person, and blame behavior. Compare your output to the results
presented in Table 11.7 in the textbook.

7. Comparing Means Using t-tests.


Objectives
Calculate one sample t-tests
Calculate paired samples t-tests
Calculate independent samples t-tests
Graphically represent mean differences
In this chapter, we will learn to compare means using t-tests. We will cover
information that is presented in the Fundamentals textbook in Chapters 12, 13, and 14
and in the Methods book as Chapter 7. One important thing to note is that SPSS uses the
term paired samples t-test for what the textbook refers to as related samples t-tests.
They are the same thing.
One Sample t-tests
One sample t-tests are typically used to compare a sample mean to a known
population mean. Let's use the moon illusion example illustrated in the text. We want to
know if there was a moon illusion using the apparatus. If there was, the obtained ratio
should not equal 1. Let's try this together.
Open moon illusion.sav.
Select Analyze/Compare Means/One-Sample t-test.

Select illusion as the Test Variable. Type 1 in as the Test Value. We are testing the
null hypothesis that the sample mean = 1. Then, click Options. Notice, SPSS allows us
to specify what Confidence Interval to calculate. Leave it at 95%. Click Continue and
then Ok. The output follows.

One-Sample Statistics

          N    Mean   Std. Deviation   Std. Error Mean
ELEVATE   10   1.46   .34              .11

One-Sample Test
Test Value = 1

                                         Mean         95% CI of the Difference
          t       df   Sig. (2-tailed)   Difference   Lower   Upper
ELEVATE   4.298    9   .002              .46          .22     .71
Notice that descriptive statistics are automatically calculated in the one-sample t-test.
Does our t-value agree with the one in the textbook? Look at the Confidence
Interval. Notice that it is not the confidence interval of the mean, but the confidence
interval for the difference between the sample mean and the test value we specified, in
this case 1.
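The menu steps above paste into syntax as something close to this sketch (the test
variable is called illusion here, though the output labels it ELEVATE, so use whatever
name appears in your copy of the file):

T-TEST
  /TESTVAL=1
  /VARIABLES=illusion
  /CRITERIA=CI(.95).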
Now, let's move on to related or paired samples t-tests.
Paired Samples t-tests
A paired samples t-test is used to compare two related means. It tests the null
hypothesis that the difference between two related means is 0. Let's begin with the
example of weight gain as a function of family therapy in the text. We want to see if the
difference in weight before and after a family therapy intervention is significantly
different from 0.
Open anorexia family therapy.sav. You don't need to save moon illusion.sav since we
didn't change the data file.
Select Analyze/Compare Means/Paired Samples t-test.
Select weight before and weight after family therapy and click them into the Paired
Variables box using the arrow. Then click Options. Notice you can select the
confidence interval you want again. Leave it at 95%, click Continue, and then click Ok.
The output follows.
T-Test

Paired Samples Statistics

                                        Mean      N    Std. Deviation   Std. Error Mean
Pair 1   weight before family therapy   83.2294   17   5.0167           1.2167
         weight after family therapy    90.4941   17   8.4751           2.0555

Paired Samples Correlations

                                                       N    Correlation   Sig.
Pair 1   weight before & weight after family therapy   17   .538          .026

Paired Samples Test

Pair 1: weight before family therapy - weight after family therapy
Paired Differences: Mean = -7.2647, Std. Deviation = 7.1574, Std. Error Mean = 1.7359
95% Confidence Interval of the Difference: Lower = -10.9447, Upper = -3.5847
t = -4.185, df = 16, Sig. (2-tailed) = .001

Notice, the descriptives were automatically calculated again. Compare this output to
the results in the text. Are they in agreement? The mean difference is negative here
because weight after the treatment was subtracted from weight before the treatment.
So the mean difference really shows that subjects tended to weigh more after the
treatment. If you get confused by the sign of the difference, just look at the mean
values for the before and after weights. Notice that this time the confidence interval is
consistent with what we would expect. It suggests we can be 95% confident that the
actual weight gain of the population of anorexics receiving family therapy is within the
calculated limits.
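In syntax form, the test looks roughly like the sketch below. The names before and after
are stand-ins; substitute the actual names of the two weight variables in the file.

* Paired t-test on the pre- and post-therapy weights.
T-TEST PAIRS=before WITH after (PAIRED)
  /CRITERIA=CI(.95).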
If you want to see the mean difference graphically, try to make a bar graph using what
you learned in Chapter 3. [Hint: Select Graphs/Legacy/Bar, then select Simple and
Summaries of separate variables. Select weight before and weight after family
therapy for Bars Represent. Use mean as the Summary score. Click Ok. Edit your
graph to suit your style.] Mine appears below.

Independent Samples t-test


An independent samples t-test is used to compare means from independent
groups. Let's try one together using the homophobia example in the text. Using data
from the Adams et al. (1996) study, we will test the hypothesis that homophobic subjects
are more aroused by homosexual videos.
Open Homophobia.sav. You don't need to save anorexia family therapy.sav since we
did not change the data file.
Select Analyze/Compare Means/Independent Samples t-test.

Select latency as the Test Variable and group as the Grouping Variable. Then, click
Define Groups. Type 1 for Group 1 and 2 for Group 2, to indicate what groups are
being compared. Then, click Continue. The Options are the same as the other kinds of
t-tests. Look at them if you would like. Then, click Ok. The output follows.

As before, the descriptives were calculated automatically. Remember, with an
independent groups t-test, we are concerned with homogeneity of variance because it
determines whether or not to use the pooled variance when calculating t. Since
Levene's test for the equality of variances is significant, we know the variances are
significantly different, so they probably should not be pooled. Thus, we will use the t
reported in the row labeled Equal variances not assumed. Compare this value to the t
value reported in the text. The results support the hypothesis that men who score high
on a homophobia scale also show higher levels of arousal to a homosexual video.
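The corresponding syntax is short; this sketch assumes the variables are named latency
and group, as in the steps above.

T-TEST GROUPS=group(1 2)
  /VARIABLES=latency.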
Now, let's create a bar graph to illustrate this group difference.
Select Graphs/Legacy/Bar. Then select Simple and Summaries for groups of cases
and click Define. Then select latency for Bars Represent, and select group for
Category Axis. Click Ok. Edit the graph to suit your style. My graph follows.

In this chapter, you have learned to calculate each of the three types of t-tests covered
in the textbook. You have learned to display mean differences graphically as well.
Complete the following exercises to help you internalize when each type of t-test should
be used.
Exercises
1. Use the data in sat.sav to compare the scores of students who did not see the
reading passage to the score you would expect if they were just guessing (20)
using a one-sample t test. Compare your results to the results in the textbook in
Section 12.9. What conclusions can you draw from this example?
2. Open moon illusion paired.sav. Use a paired samples t-test to examine the
difference in the moon illusion in the eyes elevated and the eyes level conditions.
Compare your results to the results presented in Section 13.3 in the textbook.
3. Create a bar graph to display the difference, or lack thereof, in the moon illusion
in the eyes level and eye elevated conditions, from the previous exercise.
4. Using the data in horn honking.sav, create a boxplot illustrating the group
differences in latencies for low status and high status cars. Compare your boxplot
to the one in the textbook in Figure 14.3.
5. Open anorexia weight gain.sav. In this data set, weight gain was calculated for
three groups of anorexics. One group received family therapy, another cognitive

behavioral therapy, and the final group was a control group. Use an independent
samples t-test to compare weight gain between the control group and family
therapy group. Compare your results to the data presented in the textbook in
Table 14.1.
6. In the same data set, use independent t-tests to compare the weight gain for the
cognitive behavior therapy and control group and for the two therapy groups.
Now that you have compared each of the groups, what conclusions would you
draw about which type of therapy is most effective?
7. Using the same data set, create a bar graph or box plot that illustrates weight gain
for all 3 groups.

8. Comparing Means Using One-Way ANOVA
Objectives
Calculate a one-way analysis of variance
Run various multiple comparisons
Calculate measures of effect size
A One-Way ANOVA is an analysis of variance in which there is only one
independent variable. It can be used to compare mean differences in 2 or more groups.
In SPSS, you can calculate one-way ANOVAs in two different ways. One way is
through Analyze/Compare Means/One-Way ANOVA and the other is through
Analyze/General Linear Model/Univariate. We'll try both in this chapter so we can
compare them.
One-Way ANOVA
Let's begin with an example in the textbook illustrated in Table 16.6. Maternal
role adaptation was compared in a group of mothers of low birth-weight (LBW) infants
who had been in an experimental intervention, mothers of LBW infants who were in a
control group, and mothers of full-term infants. The hypothesis was that mothers of
LBW infants in the experimental intervention would adapt to their maternal role as well
as mothers of healthy full-term infants, and each of these groups would adapt better than
mothers of LBW infants in the control group.
Open maternal role adaptation.sav.
Select Analyze/Compare Means/One-Way ANOVA.
Select maternal role adaptation for the Dependent List since it is the dependent
variable. Select group as the Factor or independent variable. Then click Post Hoc to
see various options for calculating multiple comparisons. If the ANOVA is significant,
we can use the post hoc tests to determine which specific groups differ significantly
from one another.

As you can see, there are many options. Let's select LSD under Equal Variances
Assumed since it is Fisher's Least Significant Difference Test, which is calculated in the
text, except that SPSS will test the differences even if the overall F is not significant.

Note that .05 is the default under Significance level. After consulting with SPSS
technical support, it is clear that this is the experiment-wise or family-wise significance
level, so any comparison flagged by SPSS as significant is based on a Bonferroni-type
correction. You do not need to adjust the significance level yourself.
Click Options. In the next dialog box, select Descriptives under Statistics, and select
Means plot so SPSS will create a graph of the group means for us. The default under
Missing Values is Exclude cases analysis by analysis. Let's leave this as is. Click
Continue and then Ok. The output follows.

Means Plots

[Line plot of mean maternal role adaptation (roughly 14 to 19) for the LBW
Experimental, LBW Control, and Full-term groups]

Compare this output to the results presented in the text.
We can see the descriptive statistics and the F value are the same. It is harder to
compare the post hoc comparisons because SPSS does not display the t values; it simply
reports the mean difference and the significance level. The important thing to note
is that the conclusions we can draw based on each of these approaches are the same.

The plot that SPSS created is an effective way to illustrate the mean differences.
You may want to edit the graph using what you learned in Chapter 3 to make it more
elegant. Some people would prefer a bar chart since these are independent groups and a
line suggests they are related. You could create a bar chart of these group means
yourself.
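For reference, a syntax sketch of the same analysis appears below. The dependent
variable name adapt is an assumption; use whatever the maternal role adaptation
variable is actually called in the file.

ONEWAY adapt BY group
  /STATISTICS=DESCRIPTIVES
  /PLOT=MEANS
  /POSTHOC=LSD ALPHA(.05).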
Let's re-run the same analysis using the General Linear Model (GLM) and see
how the two approaches are similar and different.
General Linear Model to Calculate One-Way ANOVAs
The Univariate General Linear Model is really intended to test models in which
there is one dependent variable and multiple independent variables, but we can use it to
run a simple one-way ANOVA like the one above. One advantage of doing so is that we
can estimate effect size from this menu, which we could not do from the One-Way
ANOVA menus. Let's try it.
Select Analyze/General Linear Model/Univariate.
As you can see by this dialog box, there are many more options than the One-Way
ANOVA. This is because the GLM is a powerful technique that can examine complex
designs. We'll just focus on what is relevant to us. As before, select maternal role
adaptation as the Dependent Variable and group as the Fixed Factor or independent
variable. Then, click Plots.

Select group for the Horizontal Axis (X axis), and click Add. Since there is only one
dependent variable, SPSS knows that maternal role adaptation is on the Y axis without
us needing to specify this. Click Continue.

Since this procedure can be used with multiple independent variables, we need to
specify which ones to run post hoc comparisons for, even though there is only one in
our design. Select group for Post Hoc Tests for. This time, let's select Bonferroni to
see if it makes a difference.

Under Display, select Descriptive statistics and Estimates of effect size. Then click
Continue. In the main dialog box, click Ok. The output follows.

Post Hoc Tests

Profile Plots

[Line plot of the estimated marginal means of maternal role adaptation (low scores
better) for the LBW Experimental, LBW Control, and Full-term groups]

Compare this output to the output from the One-Way ANOVA and the results in the
textbook.
One difference is the appearance of the ANOVA summary table. Now there are
rows labeled Intercept, Corrected Model, and Corrected Total. You can ignore these.
The F value for Group is still the same, and that is what we are interested in. Notice the
eta squared column. What does it say for group? Does this value agree with the text?
Unfortunately, SPSS does not calculate omega squared, so you would have to do that by
hand. (Unfortunately, it also does not calculate any of the more useful effect size
measures, such as d.) Did the Bonferroni and the previous LSD multiple comparisons
yield the same results?
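A syntax sketch of this GLM run follows; again, adapt is an assumed name for the
maternal role adaptation variable.

UNIANOVA adapt BY group
  /POSTHOC=group(BONFERRONI)
  /PLOT=PROFILE(group)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=group.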
You could edit any of the tables and graphs to look more elegant. For example,
the current title of the graph is cut off. You would probably want to name it something
else or use two lines of text. Editing the output would be ideal if you wanted to include
your output in a paper. Use what you learned in Chapters 3 and 4 of this Manual to do
so.
In this chapter, you learned two methods to calculate a one-way ANOVA. I prefer
the General Linear Model approach since it is the only one that gives us the option of
calculating eta squared. Of course, you may feel otherwise depending on the
information you wish to calculate. Complete the following exercises.
Exercises
Each of these exercises is based on Eysenck recall.sav. This study is presented in section
16.1 in the textbook.
1. Use ANOVA to compare the means. Select a post hoc procedure of your choice.
Summarize the results.
2. Edit the ANOVA summary table so that it is suitable for inclusion in a paper.

3. Use SPSS to calculate eta squared. How did you do this?
4. Create a bar chart to illustrate the differences between groups.

9. Comparing Means Using Factorial ANOVA
Objectives
Examine main effects and interactive effects
Calculate effect size
Calculate multiple comparisons for main effects
Calculate simple effects for interactive effects
Display means graphically
Factorial ANOVA using GLM Univariate
A Factorial ANOVA is an analysis of variance that includes more than one
independent variable; it tests the main effect of each independent variable and the
interactive effects between the independent variables. To calculate Factorial ANOVAs
in SPSS we will use the General Linear Model again. Let's try an example together.
We will use the extension of the Eysenck study described in the textbook in Chapter 17.
Now there are 2 independent variables, condition and age, being considered in relation
to the dependent variable, recall.
Open Eysenck recall factorial.sav.
Select Analyze/General Linear Model/Univariate. Univariate means there is only
one dependent variable.

Select recall as the Dependent Variable. Select age and condition as the Fixed
Factors or independent variables. Then click on Plots.

Since we are testing 3 effects, 2 main and one interactive, we may want to display 3
different graphs. First, select age for the Horizontal Axis and click Add. Then, select
condition for the Horizontal Axis and click Add. Finally, select condition as the
Horizontal Axis and age for Separate Lines to illustrate any interactive effect. I want
to organize the interactive graph this way because I think it will be easier to interpret 2
lines representing the age groups than 5 separate lines representing the conditions.
Click Continue.

Click on Post Hoc. Since age only has 2 levels, there is no need to calculate multiple
comparisons; if the effect is significant it can only mean the older and younger groups
differ. Condition has 5 levels, so select it in Post Hoc Tests for. Select LSD as the
procedure. Then, click Continue.

Click Options. Select Display Means for and choose Age*Condition. Select
Descriptive Statistics and Estimates of Effect Size under Display. Then, click
Continue, and finally, Ok. The output follows.

Between-Subjects Factors

               Value Label    N
AGE        1   Older          50
           2   Younger        50
CONDITIO   1   Counting       20
           2   Rhyming        20
           3   Adjective      20
           4   Imagery        20
           5   Intentional    20

Descriptive Statistics
Dependent Variable: RECALL

AGE       CONDITIO      Mean    Std. Deviation    N
Older     Counting       7.00   1.83              10
          Rhyming        6.90   2.13              10
          Adjective     11.00   2.49              10
          Imagery       13.40   4.50              10
          Intentional   12.00   3.74              10
          Total         10.06   4.01              50
Younger   Counting       6.50   1.43              10
          Rhyming        7.60   1.96              10
          Adjective     14.80   3.49              10
          Imagery       17.60   2.59              10
          Intentional   19.30   2.67              10
          Total         13.16   5.79              50
Total     Counting       6.75   1.62              20
          Rhyming        7.25   2.02              20
          Adjective     12.90   3.54              20
          Imagery       15.50   4.17              20
          Intentional   15.65   4.90              20
          Total         11.61   5.19             100

Tests of Between-Subjects Effects
Dependent Variable: RECALL

Source            Type III Sum of Squares    df   Mean Square   F          Sig.   Eta Squared
Corrected Model   1945.490(a)                 9   216.166       26.935     .000   .729
Intercept         13479.210                   1   13479.210     1679.536   .000   .949
AGE               240.250                     1   240.250       29.936     .000   .250
CONDITIO          1514.940                    4   378.735       47.191     .000   .677
AGE * CONDITIO    190.300                     4   47.575        5.928      .000   .209
Error             722.300                    90   8.026
Total             16147.000                 100
Corrected Total   2667.790                   99

a. R Squared = .729 (Adjusted R Squared = .702)

Post Hoc Tests

Profile Plots

[Three plots of the estimated marginal means of RECALL: one by AGE (Older vs.
Younger), one by CONDITIO (Counting, Rhyming, Adjective, Imagery, Intentional),
and one by CONDITIO with separate lines for the Older and Younger age groups]

Compare this output to the results presented in the text.
As you can see, most of these results are in agreement. However, this is not the
case for effect size. The reason is that SPSS calculates partial eta squared, which is
different from the computation in the text. SPSS uses the following equation:

partial eta squared = SS_A / (SS_A + SS_error)

where A refers to an independent variable. The result will be the same as eta squared if
there is only one independent variable, because the denominator would equal SS_total,
but it will differ when there are multiple independent variables. That explains why the
eta squared calculated in the previous chapter was in agreement with the value in the
text. This leaves us with 3 options: report the partial eta squared, figure out another way
to calculate eta squared with SPSS, or calculate eta squared by hand. You can use
Compare Means/Means to calculate eta squared for the main effects. See if you can
remember how. (But you still don't have any of the d-family measures, and I don't
know any way to get them except by hand.)
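For the record, the whole factorial analysis above condenses to a few lines of syntax.
This is a sketch assuming the variables are named recall, age, and condition as in the
dialog boxes.

GLM recall BY age condition
  /PLOT=PROFILE(age condition condition*age)
  /POSTHOC=condition(LSD)
  /EMMEANS=TABLES(age*condition)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=age condition age*condition.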
Select Analyze/Compare Means/Means. Select recall for the Dependent List, and
age and condition in Layer 1 of the Independent List. Click Options and select
ANOVA table and eta. Click Continue and Ok. Just the relevant output is displayed
below.
Measures of Association

                    Eta    Eta Squared
RECALL * AGE        .300   .090
RECALL * CONDITIO   .754   .568

As you can see, these values agree with those in the text for age and condition.
You would still need to calculate eta squared for the interaction between age and
condition.
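The equivalent syntax for this step is a sketch along these lines:

MEANS TABLES=recall BY age condition
  /CELLS=MEAN COUNT STDDEV
  /STATISTICS=ANOVA.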
Simple Effects
Now that we know there is a significant interaction between age and condition,
we need to calculate the simple effects to help us interpret the interaction. The easiest
way to do this is to split the file using the Data/Split File menu selections. Then, we can
re-run the ANOVA testing the effects of one independent variable on the dependent
variable at each level of the other independent variable. For example, we can see the
effect of condition on recall for younger participants and older participants. Because we
will most likely wish to run our significance test using MSerror from the overall
ANOVA, we will have to perform some hand calculations. After we get the new MS
values for condition in each group, we will need to divide them by MSerror from the
original analysis, as noted in the text.
In Data Editor View, click on Data/Split file.

Select Organize output by groups, and select age for Groups Based on. Then, click
Ok.

Now, we are going to calculate the effect of condition on recall for each age group, so
select Analyze/Compare Means/One-Way ANOVA.
Select recall as the Dependent Variable and condition as the Factor. There is no need
to use Options to calculate means or create plots since we already did that when we ran
the factorial ANOVA. So, click Ok. The output follows.

AGE = Older

ANOVA(a): RECALL

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   351.520           4   87.880        9.085   .000
Within Groups    435.300          45   9.673
Total            786.820          49

a. AGE = Older

AGE = Younger

ANOVA(a): RECALL

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   1353.720          4   338.430       53.064   .000
Within Groups    287.000          45   6.378
Total            1640.720         49

a. AGE = Younger

Compare MScondition (between groups) in the above tables to those presented in the text.
As you can see, they are in agreement. Now, divide them by the MSerror from the
original ANOVA, 8.026. The calculations follow.
F(conditions at Older) = 87.88 / 8.026 = 10.95

F(conditions at Younger) = 338.43 / 8.026 = 42.15

Thus, we end up with the same results. Although we had to perform some hand
calculations, having SPSS calculate the mean square for conditions for us certainly
simplifies things.
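The split-file approach is also easy to script. A sketch, assuming the same variable
names as above:

* Split the file so each age group is analyzed separately.
SORT CASES BY age.
SPLIT FILE SEPARATE BY age.
ONEWAY recall BY condition.
* Turn the split off so later analyses use the whole file.
SPLIT FILE OFF.

Remembering to turn the split off afterwards saves a lot of confusion later.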
In this chapter you learned to calculate Factorial ANOVAs using GLM
Univariate. In addition, you learned a shortcut to assist in calculating simple effects.
Complete the following exercises to better familiarize yourself with these commands and
options.
Exercises
1. Using Eysenck factorial.sav, calculate the simple effects for age at various
conditions and compare them to the data in Table 17.4. [Hint: Split the file by
condition now, and run the ANOVA with age as the independent variable.]
2. Use the data in adaptation factorial.sav to run a factorial ANOVA where group
and education are the independent variables and maternal role adaptation is the
dependent variable. Compare your results to Table 17.5 in the textbook.

3. Create a graph that illustrates the lack of an interactive effect between education
and group on adaptation from the previous exercise.

10. Comparing Means Using Repeated Measures ANOVA
Objectives
Calculate repeated measures ANOVAs
Calculate effect size
Conduct multiple comparisons
Graphically illustrate mean differences
Repeated measures ANOVAs are used to examine mean differences in related
variables. Typically the independent variable is either time (e.g., depression is measured
in the same group of people at multiple points in time) or condition (e.g., each subject
receives every condition). In SPSS, we will use the General Linear Model to calculate
repeated measures ANOVAs.
Using GLM Repeated Measures to Calculate Repeated Measures ANOVAs
Let's begin with an example from the exercises in Chapter 18 in the
Fundamentals book. In this example, the duration of migraine headaches was recorded
among the same group of individuals over 5 weeks. The first 2 weeks were part of a
baseline period and the final 3 weeks were part of an intervention period in which
subjects were trained to apply relaxation techniques. In this case, the independent
variable is time and the dependent variable is headache duration.
Open migraines.sav.
Select Analyze/General Linear Model/Repeated Measures.
The default for Within-Subject Factor Name is Factor 1. Let's change it to Time by
typing in the box. Specify 5 for Number of Levels since there are 5 weeks and then
click Add. Next, click Define.

As you can see, there are 5 spots under Within-Subject Variables. We need to
indicate that each week is a variable, and we want to list them in the right order. You
can either select them one at a time and arrow them into the variable list, or you can
select them all by holding down the control button while you select each one, and
arrow them in at once. This is a simple design, so there are no Between-Subject
Factors or Covariates. Click Plots.
Select time for the Horizontal Axis. Then click Add, and Continue. This will plot the
mean duration of headaches for each week for us. In the main dialog box, click
Options.

Under Display, select Descriptive Statistics, Estimates of Effect Size, and Observed
Power. Then click Continue. And finally, Ok. The output follows.

Descriptive Statistics

                           Mean    Std. Deviation
headache duration week 1   20.78    7.17
headache duration week 2   20.00   10.22
headache duration week 3    9.00    3.12
headache duration week 4    5.78    3.42
headache duration week 5    6.78    4.12

Profile Plots

[Line plot of the estimated marginal means of headache duration (MEASURE_1)
across the 5 weeks]

As you can see, there is a lot of output, much of which we can ignore for our
purposes. Specifically, ignore Multivariate Tests, Tests of Within-Subjects Contrasts,
and Tests of Between Subjects Effects. The multivariate tests are another way to run the
analysis, and often not a good way. The contrasts give tests of linear and quadratic trends
in the data, and are not particularly of interest here. There are no between subjects
factors, so that output is not of interest. Now, let's look at the rest and compare it to the
answers in the text. First, you can compare the mean scores for each week by looking at
the Descriptive Statistics table. The next piece is Mauchly's Test of Sphericity, which
tests the assumption that each of the time periods is approximately equally correlated
with every other time period. As noted in the text, when this assumption is violated,
various corrections are applied. Also as noted in the text, this is not a particularly good
test, but it is about the best we have. The next table of interest, Tests of Within-Subjects
Effects, is what we really want to see. Compare the textbook values to those listed in the
rows marked Sphericity Assumed, because they were calculated the same way. As you
can see, they are in agreement.
Now, note the values for eta squared and observed power. Can you interpret
them? Nearly 73% of the variability in headache duration is accounted for by time.
Observed power is based on the assumption that the true difference in population means
is the difference implied by the sample means. Typically, we want to calculate power
going into an experiment based on anticipated or previous effect size in other similar
studies. This is useful in making decisions about sample size. So, observed power
calculated here is not particularly useful.
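A syntax sketch of this repeated measures run follows; the week1 to week5 names match
the variables used in the Transform steps later in this chapter.

GLM week1 week2 week3 week4 week5
  /WSFACTOR=time 5 Polynomial
  /PLOT=PROFILE(time)
  /PRINT=DESCRIPTIVE ETASQ OPOWER
  /WSDESIGN=time.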

The graph is a nice illustration of the mean headache duration over time. You
may want to edit it to include more meaningful labels and a title.
Now, we need to calculate multiple comparisons to help us understand the
meaning of the significant effect of time on headache duration.
Multiple Comparisons
Let's just try one of the possible multiple comparisons, the comparison between
the overall baseline mean and the overall training mean. We can use SPSS's
Transform/Compute to calculate these averages for us rather than doing it manually.
In the Data Editor window, select Transform/Compute.
Type baseline under Target Variable. Then, under the list of Functions, select
MEAN and arrow it into the dialog box. We need to tell SPSS from what variables to
calculate the mean. Select week1 and week2 to replace the 2 question marks. Make
sure they are separated by a comma and the question marks are gone. Then, click Ok.

Look at the new variable in the Data Editor. Does it look right?
Click Transform/Compute again. Click Reset to remove the previous information.
Name the next Target Variable training. Select MEAN again. Specify week3,
week4, and week5. Make sure the question marks are gone and commas separate each
variable. Then, click Ok. Check out your new variable.
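Both new variables, plus the descriptives, can be produced with a few lines of syntax;
this sketch assumes the weekly variables are named week1 through week5.

COMPUTE baseline = MEAN(week1, week2).
COMPUTE training = MEAN(week3, week4, week5).
EXECUTE.
DESCRIPTIVES VARIABLES=baseline training.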
Use Analyze/Descriptives to calculate the means for baseline and training. The data
follow.

As you can see, the means are consistent with those reported in the textbook. Now, you
can apply the formula from the text, using MSerror from the ANOVA. The
computations follow.

t = (20.39 - 7.19) / sqrt(22.53(1/18 + 1/27)) = 13.20 / sqrt(2.086) = 9.14

Although some hand calculations are required, we saved time and reduced the likelihood
of making errors by using SPSS to compute the new mean scores for baseline and
training for us.
In this chapter, you learned to use the General Linear Model to calculate repeated
measures ANOVAs. In addition, you learned to use SPSS to calculate new means for use
in multiple comparisons. Try the following exercises to help you become more familiar
with the process.
Exercises
The following exercises are based on Eysenck repeated.sav.
1. Use a repeated measures ANOVA to examine the effect of condition on recall.
Compare your results to those presented in the textbook in Section 18.7.
2. Use SPSS to calculate the effect size of condition.
3. Plot the mean difference in recall by conditions.
4. Use SPSS to calculate the mean of counting, rhyming, adjective, and intentional
and label it lowproc for lower processing. Then use the multiple comparisons
procedure explained in the textbook to compare the mean recall from the lower
processing conditions to the mean recall for imagery, which was the highest
processing condition. Write a brief statement explaining the results.

11. Chi Square


Objectives
Calculate goodness of fit Chi Square
Calculate Chi Square for contingency tables
Calculate effect size
Save data entry time by weighting cases
A Chi Square is used to analyze categorical data. It compares observed
frequencies to expected or predicted frequencies. We will examine simple goodness of
fit Chi Squares that involve only one variable and more complicated contingency tables
that include 2 or more variables. Each type is programmed through different menu
options. Let's start with goodness of fit.
Goodness of Fit Chi Square All Categories Equal
Let's begin by using a new example in Chapter 19 on the frequency with which a
school yard player will throw Rock, Paper, Scissors. We want to test the null
hypothesis that they are thrown equally often.
Open RPS.sav. This file contains a string variable (Choice), a numerical variable
(NumChoice = 1, 2, 3) numbering the choices, and another numerical variable named
Freq, containing the frequency of each choice. We need NumChoice because SPSS does
not allow you to specify a string variable as a test variable.
Go to Data/Weight Cases and select Freq as the weights.
Select Analyze/Nonparametric Tests/Chi Square.
Select NumChoice as the Test Variable. Under Expected Values, All categories
equal is the default. This is what we want since our null hypothesis is that each throw is
equally likely to be chosen. Click Ok. The output follows.

Chi-Square Test

[Frequencies table: observed N for each of the three choices, each with expected N = 25]
As you can see, the expected values were 25 each, just as we expected. Now,
compare this Chi Square to the value computed in the text. Once again, they are in
agreement.
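The weighting and the test together take only two commands in syntax. A sketch,
assuming the variable names given above:

* Treat each row as Freq cases rather than one case.
WEIGHT BY freq.
NPAR TESTS /CHISQUARE=numchoice
  /EXPECTED=EQUAL.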
Goodness of Fit Chi Square Categories Unequal
Now, let's try an example where the expected values are not equal across
categories. The difference is we have to specify the expected proportions. This example
is based on Exercise 19.3 in the text, but the numbers in the data set are slightly different.
In the exercise, Howell discusses his theory that when asked to sort one-sentence
characteristics like "I eat too fast" into piles ranging from "not at all like me" to "very
much like me," the percentage of items placed in each pile will be approximately 10%,
20%, 40%, 20%, and 10%. In our data set, the frequencies are 7, 11, 21, 7, and 4,
respectively.
Open unequal categories.sav. There is no need to save RPS.sav since we did not
change the data file in any way.
Choose Data/Weight Cases and use Frequency as the weighting variable.
Select Analyze/Nonparametric Statistics/Chi Square.

Select Category as the Test Variable. Under Expected Values, select Values. Now,
we have to type in the expected proportion of cases that should fit each category. Note
that even though it requests expected values, it really wants expected proportions.
These must be specified in order to match the ascending numeric order of the
categories in our data file (e.g., 1 = not at all like me to 5 = very much like me). So,
type 10, click Add. Type 20, click Add, etc. Then, click Ok. The output follows.

Chi-Square Test
Frequencies

RATING
                                Observed N   Expected N   Residual
not at all like me               7            5.0          2.0
somewhat unlike me              11           10.0          1.0
neither like me or unlike me    21           20.0          1.0
somewhat like me                 7           10.0         -3.0
very much like me                4            5.0         -1.0
Total                           50

Test Statistics
              RATING
Chi-Square    2.050(a)
df            4
Asymp. Sig.   .727

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell
frequency is 5.0.

As you can see, SPSS calculated the expected values based on the proportions that
we indicated; check the math if you would like. In this case, the fact that the Chi Square
is not significant supports the hypothesis. The observed frequencies of ratings fit with
the predicted frequencies.
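In syntax, the unequal-expectations version differs only in the EXPECTED
subcommand. A sketch, using the names from the steps above:

WEIGHT BY frequency.
NPAR TESTS /CHISQUARE=category
  /EXPECTED=10 20 40 20 10.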
Chi Square for Contingency Tables
Let's use an example illustrated in the text. We want to examine the hypothesis
that Prozac is an effective treatment to keep anorexics from relapsing.
As you can see, the data are nicely displayed in Table 19.4 in the text.
Select File/New/Data.

In Variable View, create two variables. Name one treatment and specify the Values
such that 1 = drug and 2 = placebo. Name the other variable outcome and specify the
Values such that 1 = success and 2 = relapse. Then return to the Data View.
There are four possible combinations of the two variables, as illustrated in the text.
They are Drug/Success, Drug/Relapse, Placebo/Success, and Placebo/Relapse. So,
enter 1, 1, 2, 2 under treatment and 1, 2, 1, 2 under outcome in the first four rows. Then
add a column labeled Freq containing the frequencies for each cell. A sample follows.

Select Data/Weight Cases. Select Weight cases by and select Freq as the Frequency
Variable. Click Ok. Until we turn this off, SPSS will run analyses based on the
frequencies we have specified here.

Select Analyze/Descriptive Statistics/Crosstabs.

To be consistent with the presentation in the text, select treatment for Rows and
outcome for Columns. Select Display clustered bar charts to help us visualize the
data. Click on Statistics.

Select Chi-square, and then click Continue. (One would have thought that chi-square
would be the default, but oddly enough it isn't.) Under Nominal, select Phi and
Cramer's V as well so we can get a measure of effect size. In the main dialog box, click
on Cells.

Under Count, select Observed and Expected. Under Percentages, select Row,
Column, and Total. Then click Continue. In the main dialog box, click Ok. The
output follows.

Compare the Expected Counts to the values in the text. Finally, compare the Chi Square
values. We are interested in the Pearson Chi Square because it was calculated the same
way as the one in the textbook. Once again, the results are consistent with the textbook.
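A syntax sketch of the whole contingency table analysis, using the variable names
suggested above:

WEIGHT BY freq.
CROSSTABS /TABLES=treatment BY outcome
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT EXPECTED ROW COLUMN TOTAL
  /BARCHART.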
In this chapter you learned to use SPSS to calculate Goodness of Fit tests with and
without equal frequencies. You also learned to calculate Chi Square for contingency
tables, and learned a trick to reduce data entry by weighting cases. Complete the
following exercises to help you become familiar with these commands.
Exercises
1. Using alley chosen.sav, use a Goodness of Fit Chi Square to test the hypothesis
that rats are more likely than chance to choose Alley D.
2. Solve Exercise 19.3 from the textbook using SPSS. Create the data file yourself.
3. Create your own data file to represent the observed data presented in the textbook
in Table 19.2 using Weight Cases.
4. Using the data file you created in Exercise 3, calculate a Chi Square using
crosstabs to examine the hypothesis that the number of bystanders is related to
seeking assistance. Be sure to calculate Cramer's Phi. Compare your results to
the textbook.

12. Nonparametric Statistics


Objectives
Calculate Mann-Whitney Test
Calculate Wilcoxon's Matched-Pairs Signed-Ranks Test
Calculate Kruskal-Wallis One-Way ANOVA
Calculate Friedman's Rank Test for k Correlated Samples
Nonparametric statistics or distribution-free tests are those that do not rely on
parameter estimates or precise assumptions about the distributions of variables. In this
chapter we will learn how to use SPSS Nonparametric statistics to compare 2
independent groups, 2 paired samples, k independent groups, and k related samples.
Mann-Whitney Test
Let's begin by comparing 2 independent groups using the Mann-Whitney Test.
We'll use the example presented in Table 20.1 in the textbook. We want to compare the
number of stressful life events reported by cardiac patients and orthopedic patients.
Open stressful events.sav.
Select Analyze/Nonparametric Tests/Two Independent Samples.
Select data as the Test Variable and group as the Grouping Variable. Click on
Define Groups and specify 1 for Group 1 and 2 for Group 2, then click Continue.
Under Test Type, select Mann-Whitney U. Then click on Options.

Under Statistics, select Descriptives. Then click Continue. In the main dialog box,
click Ok. The output follows.

Compare this output to the results in Section 20.1 of the textbook. Specifically,
focus on the row labeled Wilcoxon W in the Test Statistics table. As you can see, they
are the same. There is not a statistically significant difference in stressful life events for
the 2 groups. But if this is the Mann-Whitney test, why did I tell you to look at
Wilcoxon's W? The reason is that I cheated in the text. To avoid talking about two
Wilcoxon tests, I called this one the Mann-Whitney (which is basically true) but showed
you how to calculate the Wilcoxon statistic. It honestly doesn't make any difference.
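The syntax equivalent is a one-liner; this sketch assumes the variables are named data
and group as above.

NPAR TESTS /M-W=data BY group(1 2)
  /STATISTICS=DESCRIPTIVES.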
Wilcoxon's Matched-Pairs Signed-Ranks Test
Now, let's compare paired or related data. We will use the example illustrated in
Section 20.2 of the textbook. We will compare the volume of the left hippocampus in
twin pairs, one of whom is schizophrenic and one of whom is normal.
Open Hippocampus Volume.sav.
Select Analyze/Nonparametric Tests/2 Related Samples.

Select Normal and Schizophrenic for the Test Pairs List. Select Wilcoxon for Test
Type. Then, click Ok. The output follows.

The Sum of Ranks column includes the T values. Compare them to the values in
the text. Note that the test statistic in SPSS is z. Regardless, the results are the same.
There is a significant difference in hippocampal volume between normals and
schizophrenics.
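In syntax, a sketch of the same test (the variable names follow the dialog box above, so
confirm them in your copy of the file):

NPAR TESTS /WILCOXON=normal WITH schizophrenic (PAIRED).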

Kruskal-Wallis One-Way ANOVA


Now let's compare more than 2 independent groups. We'll use the example
illustrated in Table 20.4 of the text, comparing the number of problems solved correctly
in one hour by people who received a depressant, stimulant, or placebo drug.
Open problem solving.sav.
Select Analyze/Nonparametric Test/ K Independent Samples.

Select problem as the Test Variable and group as the Grouping Variable. Then,
click on Define Range.

Indicate 1 for the Minimum and 3 for the Maximum since there are 3 groups,
identified as 1, 2, and 3. Click Continue.

Kruskal-Wallis is already selected in the main dialog box, so just click Ok. The output
follows.

Kruskal-Wallis Test

Ranks
          GROUP   N    Mean Rank
PROBLEM   1        7    5.00
          2        8   14.38
          3        4   10.00
          Total   19

Test Statistics(a,b)
              PROBLEM
Chi-Square    10.407
df            2
Asymp. Sig.   .005

a. Kruskal Wallis Test
b. Grouping Variable: GROUP

As you can see these results agree with those in the text, with minor differences in
the decimal places. This is due to rounding. Both sets of results support the conclusion
that problems solved correctly varied significantly by group.
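The syntax sketch for this test names the range of group codes directly:

NPAR TESTS /K-W=problem BY group(1 3).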
Friedman's Rank Test for K Related Samples
Now, let's move on to an example with k related samples. We'll use the data
presented in Table 20.5 of the textbook as an example. We want to see if reading time is
affected when reading pronouns that do not fit common gender stereotypes.
Open pronouns.sav.
Select Analyze/Nonparametric Tests/K Related Samples.

Select She, He, and They as the Test Variables. Friedman is the default for Test
Type, so we can click Ok. The output follows.

Ranks
           Mean Rank
HESHE      2.00
SHEHE      2.64
NEUTTHEY   1.36

Test Statistics(a)
N             11
Chi-Square    8.909
df            2
Asymp. Sig.   .012

a. Friedman Test

As you can see, the Chi Square value is in agreement with the one in the text. We
can conclude that reading times are related to pronoun conditions.
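And a final syntax sketch; the three variable names here are taken from the output
labels above, so check them against your copy of the file.

NPAR TESTS /FRIEDMAN=heshe shehe neutthey.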
In this chapter, you learned to use SPSS to calculate each of the Nonparametric
Statistics included in the textbook. Complete the following exercises to help you become
familiar with each.
Exercises
1. Using birthweight.sav, use the Mann-Whitney Test to compare the birthweight of
babies born to mothers who began prenatal care in the third trimester to those who
began prenatal classes in the first trimester. Compare your results to the results
presented in Table 20.2 of the textbook. (Note: SPSS chooses to work with the
sum of the scores in the larger group (71), and thus n1 and n2 are reversed. This
will give you the same z score, with the sign reversed. Notice that z in the output
agrees with z in the text.)
2. Using anorexia family therapy.sav (the same example used for the paired t-test in
Chapter 7 of this manual), compare the subjects' weight pre and post intervention
using Wilcoxon's Matched-Pairs Signed-Ranks Test. What can you conclude?
3. Using maternal role adaptation.sav (the same example used for one-way
ANOVA in Chapter 8 of this manual), compare maternal role adaptation for the 3
groups of mothers using the Kruskal-Wallis ANOVA. What can you conclude?
4. Using Eysenck recall repeated.sav (the same example used for Repeated
Measures ANOVA in Chapter 10 of this manual), examine the effect of
processing condition on recall using Friedmans Test. What can you conclude?
