An Introduction To The Resource Pack: Roger Stern, Eleanor Allan, Carlos Barahona and Ian Dale
Contents
1. Introduction
1.1. Why this resource pack?
1.2. Who is the resource pack for?
1.3. What does the resource pack contain?
1.4. What's in this introductory guide?
2. An alternative approach to statistics training
2.1. Less theory, more practice
2.2. Good graphs and tables
2.3. Using the statistical software
2.4. Statistical games
2.5. Good-practice guides
2.6. More concepts
2.7. Training in computing for statistics
3. Using a spreadsheet with SSC-Stat
3.1. Data layout
3.2. Exploratory graphics
3.3. Data and results
3.4. Analysis
3.5. Structured data
3.6. Behind the scenes
3.7. Help
3.8. Moving on
3.9. Further use of Excel
4. Using a statistics package: Instat
4.1. Menus
4.2. Analysis
4.3. Help
4.4. Moving on
5. Training courses in statistics
5.1. In-service training
5.2. Tailored training
5.3. Undergraduate training
5.4. Postgraduate training
6. In conclusion
1. Introduction
Guide to the SSC Resources
Reading. However, the ideas are general and do not depend entirely on our resources. They can,
for instance, be adapted and used with the user's own software.
The proposed approach:
• Builds on students' and trainers' existing knowledge of, and familiarity with, spreadsheets
• Encourages the use of software to discuss concepts, rather than the mechanics of how
to generate output
• Uses a set of good-practice guidelines, prepared to help in real research, as an aid to
planning and interpreting statistical analyses
• Introduces statistical games that may be used in conjunction with the software to make
statistics courses an enjoyable experience.
Some brief consideration is also given to training in efficient statistical computing.
Section 3 outlines the key features of our Excel add-in: SSC-Stat. Section 4 outlines the use of
Instat. Section 5 outlines some training courses in which SSC-Stat, Instat and these other
resources have been used.
• There should be less emphasis on formulae and hand calculations, to allow more time for
students to fully understand computer output and key concepts.
• Courses tend to cover topics in an old-fashioned sequence of increasing complexity, so the
methods are seen as a disjointed set of techniques and key concepts are often missed.
Introducing techniques according to their analytical complexity is less important now that
computers handle the number crunching.
• Courses often use only artificially small data sets. This does not prepare students for the data
management, and the simple descriptive methods for structured data, that they will need later.
• Courses often stop too soon. Important subjects, such as regression modelling with non-normal
data, are omitted because they cannot be done by hand.
We should stress that we are not against the occasional hand analysis, but it should be done
together with the corresponding computer analysis. If the computer output is shown first, the
calculations will help students to understand the results. On the positive side, we have found that
the constructive integration of computers for demonstrations and practical exercises helps
participants to enjoy their statistics training, often for the first time!
Figure 2e shows a table of percentages1, while Fig. 2f shows a two-way table of median yields from
the data in Fig. 2d. The tables include a tooltip that is designed both to illustrate the importance of
understanding the data behind the summary values in a table, and to teach about percentages and
percentiles.
1
Many people lack confidence in their use or understanding of percentages. Calculations of different types of
percentages, including those that result from multiple response data, are described in Chapter 13 of the Instat
Introductory Guide (available from the help menu within Instat).
Fig. 2e. Two-way table of percentages
Fig. 2f. Median rice yields by village and variety
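Tables such as those in Figs 2e and 2f are built from data in list format, with one row per plot. As a rough sketch of the idea, and not of how Instat itself computes its tables, the Python code below forms a two-way table of medians and keeps the number of observations behind each cell, the kind of detail the tooltip draws attention to. The function name `two_way_medians` is hypothetical, and the records are made up; they are not the rice survey data.

```python
from collections import defaultdict
from statistics import median


def two_way_medians(records, row_key, col_key, value_key):
    """Two-way table of medians from list-format records.

    Returns {row_level: {col_level: (median, n)}}. The count n is kept so
    that the number of observations behind each summary is visible.
    Combinations with no observations are simply absent.
    """
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[row_key], rec[col_key])].append(rec[value_key])
    table = defaultdict(dict)
    for (row, col), values in cells.items():
        table[row][col] = (median(values), len(values))
    return dict(table)


# Hypothetical plot records (one dict per row of the data list).
records = [
    {"village": "A", "variety": "New", "yield": 40.0},
    {"village": "A", "variety": "New", "yield": 50.0},
    {"village": "A", "variety": "Old", "yield": 30.0},
    {"village": "B", "variety": "New", "yield": 55.0},
    {"village": "B", "variety": "Old", "yield": 25.0},
    {"village": "B", "variety": "Old", "yield": 35.0},
    {"village": "B", "variety": "Old", "yield": 45.0},
]

table = two_way_medians(records, "village", "variety", "yield")
for village in sorted(table):
    row_cells = table[village]
    print(village, {variety: row_cells[variety] for variety in sorted(row_cells)})
```

Here village A's median for the old variety rests on a single plot, which a bare table of medians would hide.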
The use of realistic examples permits training courses to spend longer on the important ideas of
descriptive statistics. In our experience, when students (and in some cases trainers) are asked to
prepare outline tables and graphs that correspond to the objectives of their study, they often do not
know where to begin. However, when descriptive statistics are calculated and appropriately
presented, they promote interesting discussions, both about statistics and about the topic being
analysed. This enhances the teaching and learning process.
In many situations a descriptive summary, i.e., the appropriate use of tables and graphs, is most of
what is needed. Probability and inferential ideas are needed later.
Example 2: Instat can also be used to support the discussion of probability ideas. As a
demonstration, Fig. 2g displays the distribution of the difference between two normal distributions.
It could be used, for example, to illustrate probability ideas or ideas about variability in data.
Fig. 2g. The difference between two normal distributions (green line)
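The property behind Fig. 2g can also be checked numerically. If X and Y are independent normal variables, X - Y is normal with mean μX - μY and variance σX² + σY². The Python sketch below assumes hypothetical parameters (means 10 and 7, standard deviations 3 and 4). It uses the standard library's `statistics.NormalDist` for the exact result and a simulation for comparison; it is an illustration, not the calculation Instat performs.

```python
import random
from statistics import NormalDist, fmean, stdev

# Hypothetical independent normal variables; the parameters are arbitrary.
x = NormalDist(mu=10.0, sigma=3.0)
y = NormalDist(mu=7.0, sigma=4.0)

# Exact result: X - Y ~ Normal(10 - 7, sqrt(3**2 + 4**2)) = Normal(3, 5).
# NormalDist implements subtraction for independent normal variables.
diff = x - y
print(diff.mean, diff.stdev)  # 3.0 5.0

# Simulation check, as a classroom demonstration might do it.
rng = random.Random(1)
sims = [rng.gauss(x.mean, x.stdev) - rng.gauss(y.mean, y.stdev)
        for _ in range(100_000)]
print(f"simulated mean {fmean(sims):.2f}, simulated sd {stdev(sims):.2f}")

# A probability question the figure can prompt: how often does X exceed Y?
p_x_exceeds_y = 1 - diff.cdf(0.0)
print(f"P(X > Y) = {p_x_exceeds_y:.4f}")
```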
Further examples illustrating probability ideas similar to that in Fig. 2g, such as the sampling
distribution of the mean and the central limit theorem, are given in Chapter 14 of the Instat
Introductory Guide.
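A simulation in the same spirit can show the sampling distribution of the mean: draw many samples of size n from a skewed population and look at the spread and skewness of their means. The sketch below assumes an exponential population with mean 1 and standard deviation 1, so the standard error of the mean is 1/√n, and uses only Python's standard library. The sample sizes, number of replicates and seed are arbitrary choices, not taken from the Instat guide.

```python
import math
import random
from statistics import fmean, pstdev


def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    m = fmean(values)
    s = pstdev(values, m)
    return fmean(((v - m) / s) ** 3 for v in values)


def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population.

    The exponential population with rate 1 has mean 1, sd 1 and skewness 2.
    """
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


rng = random.Random(2005)
results = {}
for n in (1, 5, 30):
    means = sample_means(n, 20_000, rng)
    results[n] = (fmean(means), pstdev(means), skewness(means))
    mean_of_means, sd_of_means, skew = results[n]
    print(f"n={n:2d}  mean={mean_of_means:.3f}  sd={sd_of_means:.3f} "
          f"(1/sqrt(n)={1 / math.sqrt(n):.3f})  skewness={skew:.2f}")
```

As n grows, the spread of the means tracks 1/√n and their skewness falls towards zero (in theory 2/√n for this population), which is the central limit theorem at work.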
Example 3 is illustrated in Fig. 2h, and concerns statistical inference. Many people have only a
hazy notion of what is meant by a confidence interval. The figure shows the yield of rice from a rice
production survey (Fig. 2d) together with estimates of the 20% and 80% points, and also confidence
limits for these percentage points. This is a good example of how to use software, in a way that is
meaningful to the researcher, to illustrate statistical concepts. And anyone who understands what is
meant by the 95% confidence limits for the 80% point of a set of data will certainly have mastered
the basic ideas of statistical inference!
Fig. 2h. Confidence limits are not just for the mean!
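One way to obtain confidence limits for a percentage point such as the 80% point is the percentile bootstrap: resample the data with replacement many times, recompute the 80% point each time, and take the central 95% of those values. The text does not say which method Instat uses for Fig. 2h, so treat the Python sketch below as an alternative illustration; distribution-free limits based on order statistics are another common choice. The yields are simulated placeholders, not the survey data, and the function names are hypothetical.

```python
import random
from statistics import quantiles


def point_80(data):
    """80% point: the 8th of the 9 decile cut points ('inclusive' method)."""
    return quantiles(data, n=10, method="inclusive")[7]


def bootstrap_limits(data, stat, reps=2000, level=0.95, seed=1):
    """Percentile-bootstrap confidence limits for stat(data)."""
    rng = random.Random(seed)
    boot = sorted(stat(rng.choices(data, k=len(data))) for _ in range(reps))
    tail = (1 - level) / 2
    lower = boot[round(tail * reps)]
    upper = boot[round((1 - tail) * reps) - 1]
    return lower, upper


# Simulated placeholder yields (not survey data): 36 values, mean 40, sd 10.
data_rng = random.Random(0)
yields = [round(data_rng.gauss(40, 10), 1) for _ in range(36)]

estimate = point_80(yields)
lower, upper = bootstrap_limits(yields, point_80)
print(f"80% point {estimate:.1f}, 95% limits ({lower:.1f}, {upper:.1f})")
```

With small samples the bootstrap limits for a percentage point can be rough, which is itself a useful discussion point in a course.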
These games are available both as hand exercises and on the computer2. One useful feature is
that they can provide different challenges to trainees, depending on their ability or previous
knowledge. For example, the rice survey typifies many real studies that collect data at multiple
levels: village, field, and plot. In a first course we can illustrate how such data may be summarised.
In later courses, planning aspects or more complex parts of the analysis can be discussed.
2
The games were first developed at the Department of Applied Statistics in the 1970s. Since then they have
been used widely, both in the UK and overseas. They were updated by the Statistical Services Centre of the
University of Reading in the 1990s as part of a teaching initiative in UK universities, and further updating by the School
of Applied Statistics is currently in progress.
A guide has recently been prepared (March 2005) to show how these games are being used in teaching
statistics to agriculture students in the University of Nairobi. This includes information on how to generate
the hand versions, as well as how to adapt the existing games or produce new ones. See the
Resources CD or http://www.uonbi.ac.ke/acad-depts/BUCS/ for more details.
3
The UK Government's Department for International Development (DFID) supported the production of these guides
and encouraged their wide circulation. In 2004 the set of guides was updated and republished as a book (see page
3).
Two of the guides relate to the use of Excel. Many analyses would be much easier if data were
entered into Excel in a controlled way. We describe what this means in the guide called "Disciplined
use of spreadsheet packages for data entry".
The guide entitled "The role of a database package in managing research data" describes the types
of data that might be too complicated to enter into a spreadsheet. It compares organising data in a
spreadsheet with organising them in a database, to provide guidance on the advantages and limitations
of each type of software.
The titles of the guides on data analysis and presentation are also shown in Fig. 2c. We use the
guide called "Key concepts of inferential statistics" in many of our courses. In some it provides a
summary of the main ideas from the training. On more advanced courses, we assume that these
concepts, which include standard errors, confidence limits and significance tests, are well
understood by the participants. However, they are often poorly understood, so this guide can be
used as preparatory reading, with the topics then discussed, if necessary, in a review session at
the start of the course.
Trainers should give some consideration to this issue, which is not discussed here. Elsewhere we
describe alternative ways of using Instat, or any other statistics package, so that users can identify an
appropriate strategy, both for themselves and for others.4
4
This topic is discussed at greater length in the Instat Introductory Guide: Chapter 4 outlines alternative ways of using
a statistics package, Chapter 10 considers strategies for statistical software in more detail, and Appendix 1 shows
how to write and use a macro to automate or extend an analysis.
Fig. 3b. A view of the SPSS statistics package showing the same data and equivalent menus
5
Look in Excel's Help for the entry called "Guidelines for creating a list on a worksheet".
6
Often a column will contain numbers only. In that case, do not add explanatory text to the cells themselves;
use Excel's Comments facility if you wish to add an explanation. If the column contains text that will be used
as categories, like Yes, Maybe and No, make sure the categories are spelled consistently down the whole column.
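The advice in footnote 6 can be checked mechanically before any analysis. The Python sketch below uses hypothetical helpers, not part of SSC-Stat or Excel: one flags cells in a supposedly numeric column that cannot be read as numbers, and the other finds category labels that differ only in capitalisation or stray spaces. The column contents are invented for illustration.

```python
from collections import Counter


def non_numeric_cells(column):
    """Return (row_index, value) for non-blank cells that are not numbers."""
    bad = []
    for i, value in enumerate(column):
        text = str(value).strip()
        if text == "":
            continue  # treat blank cells as missing, not as errors
        try:
            float(text)
        except ValueError:
            bad.append((i, value))
    return bad


def spelling_variants(column):
    """Find category labels that differ only in case or surrounding spaces."""
    groups = {}
    for value in column:
        key = str(value).strip().lower()
        groups.setdefault(key, Counter())[value] += 1
    return {key: dict(c) for key, c in groups.items() if len(c) > 1}


# Invented columns illustrating the two problems.
yield_column = [42.5, "38", "", "about 40", 51.0, "n/a"]
answer_column = ["Yes", "yes", "No", "Maybe", "No ", "No"]

print(non_numeric_cells(yield_column))    # [(3, 'about 40'), (5, 'n/a')]
print(spelling_variants(answer_column))   # {'yes': {'Yes': 1, 'yes': 1}, 'no': {'No': 2, 'No ': 1}}
```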
We describe how to produce this graph in the SSC-Stat tutorial. It is a standard Excel graph and could
be constructed without the SSC-Stat add-in, with a few extra steps of data manipulation. However,
SSC-Stat makes it very easy to construct this type of graph for data in list format. For example,
we could quickly try the same graph for each village instead of each variety. Unless
constructing such exploratory graphs is made easy, we find this step is often omitted from an
analysis. Graphs are an important way both to look at data and to present results in a report.
Fig. 3d. Boxplot of Yield
Fig. 3e. Boxplots of Yield by Variety
A boxplot is a commonly used exploratory tool but is not available in Excel. Figures 3d and 3e
show boxplots of the data in Fig. 3c, produced by SSC-Stat, for all the yields together and
separately for each variety.
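The statistics behind a boxplot are easy to compute. The Python sketch below uses the common convention (Tukey's) of drawing whiskers to the most extreme values within 1.5 interquartile ranges of the quartiles and listing points beyond them as outliers. The source does not say which convention SSC-Stat uses, and packages also differ in how they define quartiles (here `statistics.quantiles` with `method="inclusive"`). The yields are made-up values.

```python
from statistics import quantiles


def boxplot_stats(data, k=1.5):
    """Quartiles, whisker ends and outliers for a boxplot.

    Whiskers extend to the most extreme values within k * IQR of the
    quartiles (Tukey's convention); values beyond are listed as outliers.
    """
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


yields = [20, 25, 28, 30, 31, 33, 35, 36, 38, 70]  # made-up values
print(boxplot_stats(yields))
```

For boxplots by variety, as in Fig. 3e, the same function would be applied to each variety's yields in turn.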
From a results sheet you need to return to the sheet containing the data for the next analysis. This
is so that Excel or SSC-Stat recognizes the data as opposed to the results. In SSC-Stat, an
alternative is to define the data area. This is then remembered each time you use SSC-Stat in the
current session and saves you the trouble of returning to the data sheet.
3.4. Analysis
SSC-Stat's menu for analysis is highlighted in Fig. 3f. The first two options give descriptive
summary statistics, while the other options are for simple modelling. The dialogue for the
Descriptive Statistics option is shown in Fig. 3g.
Fig. 3h shows the dialogue option that adds some of the statistics displayed graphically in the
boxplot of Fig. 3d. The results are in Fig. 3i.
Fig. 3h. Additional statistics added
Fig. 3i. Results from using the dialogue
processing numeric columns. For text columns, non-blank rows would not be ignored, but you can
always choose to hide these rows before processing the data.
Some articles have complained that the algorithms used by Excel for statistical calculations are not
sound.7 We explain the implications of this in the SSC-Stat tutorial, but we are confident in the
accuracy of the calculations used in SSC-Stat itself.
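One issue often raised in discussions of spreadsheet accuracy, including articles of the kind cited in footnote 7, is the one-pass "calculator" formula for the variance. It subtracts two large, nearly equal sums and can lose all precision when the data have a large mean relative to their spread. The Python sketch below contrasts it with a two-pass calculation on illustrative values; it demonstrates the numerical point only and makes no claim about which formulas Excel or SSC-Stat actually use.

```python
from statistics import variance


def variance_one_pass(xs):
    """'Calculator' formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1).

    Algebraically correct, but it subtracts two large, nearly equal
    numbers, so rounding error can swamp the answer.
    """
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Compute the mean first, then sum squared deviations from it."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Values with a large mean and a small spread; the true variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(variance_two_pass(data))  # 30.0
print(variance(data))           # 30.0 (the statistics module uses exact arithmetic)
print(variance_one_pass(data))  # not 30: the spread is lost to rounding
```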
3.7. Help
SSC-Stat includes help at three different levels. There is overall Help, accessed via the General
menu. Each menu also has its own Help, so there is help on Manipulation, Visualization and
Analysis. Finally, there is Help on each individual dialogue. In addition, the good-practice guides
that relate to the use of Excel (Section 2) are available from within the Help system.
3.8. Moving on
Finally, some users may be disappointed at the lack of more powerful statistical facilities in
SSC-Stat. Where, for example, is multiple regression, or a powerful analysis of variance? Our view
is that these methods are better handled by a standard statistics package. We have made the
menus in SSC-Stat similar to those in many statistics packages. We hope this will help the learning
process when those who need more than can be done in Excel add a statistics package to their
repertoire. You may be surprised by how easy statistics packages have become to use. If you need
more convincing before obtaining one of the standard packages, you could start with Instat, which is
described in the next section.
7
See, for example, McCullough and Wilson (1999), "On the accuracy of statistical procedures in MS Excel
97", Computational Statistics and Data Analysis 31, 27-37, or http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf. A
more positive view, with which we agree, is in http://www.agresearch.cri.nz/Science/Statistics/exceluse1.htm.
8
Macros are the basis of the SSC's 1-day course "Taking Microsoft Excel further: macros for data management and statistics".
Notes and examples used on the course are supplied on the CD.
Some users are fearful of learning to use a statistics package, just as they lack confidence in
statistics. One use of Instat is as a starter package for those who require analyses that go beyond
what is sensible in a spreadsheet. We find that most newcomers are pleasantly surprised
by how easy it now is to use a statistics package.9
Once the step of using a statistics package has been taken, appropriate use of the software can
support data analysis and the teaching of statistics. It can help overcome the blocks that some
users have about the subject. A second use of Instat is to support the teaching of statistics. This was
outlined in Section 2 and is described in detail in the Instat Introductory Guide.10
4.1. Menus
Figure 4b shows the two main windows in Instat, with the same data as in Sections 2 and 3. The
principal menus in Instat, and in many other statistics packages, are:
• Manage: to organize the data for analysis
• Graphics: for exploratory and presentation graphs
• Statistics: to analyse the data
The menus File, Edit, Window, and Help will be familiar from other Windows software. The
special menu called Submit is for those who use commands or macros to automate parts of their
analyses. The Climatic menu is optional. It provides special facilities for processing climatic data,
and includes its own user guide and help facilities.
This guide and the Instat Introductory Guide are unusual in that they frequently mention other
statistics packages. Some users may find Instat to be sufficient for their needs, but others will use it
as a stepping-stone to using a more powerful statistics package.
It used to be difficult to mix statistics packages. Not only did you have to learn a new language, but
it was also tedious to transfer data between packages. Now data transfer is easy, and the packages
all work in similar ways. So, cost apart, it is feasible to use a mix of statistics packages. In Section 5 we
give examples from our training courses.
9
Occasionally we are asked the converse: competent users of a powerful statistics package wonder whether they should spend
time learning to use a spreadsheet to assist in their statistical work. We believe that there is no strong case for this.
10
When Instat is installed, the Instat Introductory Guide is available both as a Windows Help file, and in pdf format for
printing. The pdf version is also available as a separate download from http://www.ssc.rdg.ac.uk/.
4.2. Analysis
Fig. 4c. Instat's statistics menu
Fig. 4d. Sub-menu for summary statistics
The main statistics menu is shown in Fig. 4c, together with the sub-menu for summary statistics in
Fig. 4d. As an example, we show the dialogue for a grouped frequency distribution in Fig. 4e, with
some results in Fig. 4f. These show, for instance, that 9 of the 36 farmers (i.e., 25%) did not apply
any fertilizer.
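A grouped frequency distribution like the one in Figs 4e and 4f counts the values falling into each class and converts the counts to percentages of the total. In the Python sketch below the class boundaries and amounts are hypothetical, chosen only so that 9 of 36 values are zero, as in the text; the other values are not from the survey, and `grouped_frequency` is an illustrative helper rather than part of Instat.

```python
from bisect import bisect_right


def grouped_frequency(values, breaks):
    """Counts and percentages for classes [b0, b1), [b1, b2), ..., [b_last, inf).

    `breaks` must be sorted; values below breaks[0] raise ValueError.
    """
    counts = [0] * len(breaks)
    for v in values:
        k = bisect_right(breaks, v) - 1
        if k < 0:
            raise ValueError(f"{v} is below the first class boundary {breaks[0]}")
        counts[k] += 1
    labels = [f"{lo} to <{hi}" for lo, hi in zip(breaks, breaks[1:])]
    labels.append(f"{breaks[-1]} and over")
    total = len(values)
    return [(label, c, 100 * c / total) for label, c in zip(labels, counts)]


# Hypothetical fertilizer amounts for 36 farmers: 9 zeros, as in the text;
# all other values are invented.
amounts = ([0] * 9
           + [10, 15, 20, 22, 25, 30, 35, 40, 45, 48]
           + [50, 55, 60, 60, 65, 70, 75, 80, 85, 90, 95, 99]
           + [100, 110, 120, 130, 150])

for label, count, pct in grouped_frequency(amounts, [0, 1, 50, 100]):
    print(f"{label:>14}  {count:3d}  {pct:5.1f}%")
```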
One way of looking at multiple columns together is to provide tables of summary statistics. Tables
are a powerful feature of Instat, just as Pivot tables are a strength of Excel. Examples were given in
Section 2, Figs 2e and 2f, and are described in more detail in Chapter 13 of the Instat Introductory
Guide.
The main menus in Instat for statistical modelling are the second group in Fig. 4c, namely Simple
Models, Analysis of Variance and Regression. We show the sub-menu for regression in Fig. 4g;
it includes options for both simple and multiple regression. The use of these menus is
described in Chapters 15 to 17 of the Introductory Guide.
Fig. 4g. Instat's Regression sub-menu
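Regression menus such as these fit models by least squares. For a simple regression y = a + bx, the estimates are b = Sxy/Sxx and a = ȳ - b x̄, where Sxx and Sxy are the sums of squares and cross-products about the means. The Python sketch below computes them directly on hypothetical data (fertilizer amount and yield, invented for illustration), with an optional cross-check against the standard library. It does not cover multiple regression, which the Instat sub-menu also offers.

```python
import statistics
from statistics import fmean


def simple_linear_regression(x, y):
    """Least-squares estimates (a, b) for the model y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all equal; the slope is undefined")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: fertilizer amount (x) and yield (y), invented for illustration.
x = [0, 25, 50, 75, 100]
y = [30, 35, 41, 44, 50]

a, b = simple_linear_regression(x, y)
print(f"yield = {a:.2f} + {b:.4f} * fertilizer")

# Cross-check against the standard library where available (Python 3.10+).
if hasattr(statistics, "linear_regression"):
    slope, intercept = statistics.linear_regression(x, y)
    print(f"statistics.linear_regression: intercept {intercept:.2f}, slope {slope:.4f}")
```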
4.3. Help
The Help supplied with Instat is extensive. The series of about 20 good-practice guides mentioned
in Section 2 is provided with Instat, in both printable and Windows help file formats.
The Introductory Guide, part of which is shown in Fig. 4h (overleaf), is also supplied, both as a
Windows help file and in printable format. It includes chapters that examine, in more detail, the
teaching ideas mentioned in this Guide.
4.4. Moving on
Most statistics packages are designed primarily to support data analysis, but can also be used in
training courses. Instat is designed the other way round. It is intended largely to support training, but
can also be used for data analysis.
Most chapters in the Introductory Guide have an "In conclusion" section, where we also mention the
limitations of Instat, for users who require more. Just as it is easy to use Excel together with a
statistics package, so it is simple to integrate the use of multiple statistics packages, both within
training courses and for data analysis.
Within training courses it is indeed valuable that students become familiar with more than one
package for their statistical work. This provides them with the confidence to move to other
packages in the future, should the need arise.
other well-known packages. The main aim is to teach statistical ideas rather than computing.
Some participants therefore stick to the package with which they are familiar, while others use the
opportunity to explore different software, either for its own sake, or because it is the simplest for a
particular practical.
mix of relevant software including Excel (with SSC-Stat) and Instat as well as software readily
available within their own departments.
6. In conclusion
The target audience for this resource pack is the large number of professionals who need to
analyse data but lack confidence in their statistical ability. If statistics can be made more relevant
and accessible, it could help research and development projects to collect and process data more
effectively. This, in turn, will help quality assurance and evidence-based decision making.
The SSC's experience, as a provider of statistical training and consultancy, has led us to believe
that training in statistics needs to be broadened. More emphasis needs to be placed on statistical
concepts and practice, and less on theory and formulae. We also need to be more imaginative in
teaching study design and data management skills.
This resource pack is a collection of resources that we have developed and used in our own work,
and we offer them here for others to use. They include:
• An Excel add-in
• A user-friendly software package developed with teaching in mind
• Several statistical games, which simulate different scenarios to help understand key
statistical concepts
• A range of short good-practice booklets, many of which discuss statistical concepts
from planning through to presentation of results.
Improved confidence with statistics can be acquired relatively quickly and easily, either via a short
course (provided it is the right type of short course) or by self-study. We therefore hope that
the materials will be of interest and use both to trainers of statistics and to anyone attempting to carry
out their own data analysis. We encourage trainers to adapt our ideas and to incorporate some of
the materials into their own work. Using their own datasets and their own software are two obvious
ways of adapting our resource pack. The SSC would welcome feedback from trainers on their
imaginative use of the resource pack.