Statistics and Probability in High School
Manfred Borovcnik
University of Klagenfurt, Austria
Statistics and probability are fascinating fields, tightly interwoven with the context of
the problems which have to be modelled. The authors demonstrate how investigations
and experiments provide promising teaching strategies to help high-school students
acquire statistical and probabilistic literacy.
In the first chapter the authors put into practice the following educational principles,
reflecting their views of how these subjects should be taught: focusing on the most relevant
ideas and postponing extensions to later stages; illustrating the complementary/dual nature
of statistical and probabilistic reasoning; utilising the potential of technology and showing
its limits; and reflecting on the different levels of formalisation to meet the wide variety
of students’ previous knowledge, abilities, and learning types.
The remaining chapters deal with exploratory data analysis, modelling information by
probabilities, exploring and modelling association, and with sampling and inference.
Throughout the book, a modelling view of the concepts guides the presentation.
In each chapter, the development of a cluster of fundamental ideas is centred around a
statistical study or a real-world problem that leads to statistical questions requiring data
in order to be answered. The concepts developed are designed to lead to meaningful
solutions rather than remain abstract entities. For each cluster of ideas, the authors review
the relevant research on misconceptions and synthesise the results of research in order to
support teaching of statistics and probability in high school.
What makes this book unique is its rich source of worked-through tasks and its
focus on the interrelations between teaching and empirical research on understanding
statistics and probability.
ISBN 978-94-6300-622-4
Statistics and Probability in High School
Carmen Batanero
Universidad de Granada, Spain
and
Manfred Borovcnik
University of Klagenfurt, Austria
A C.I.P. record for this book is available from the Library of Congress.
ISBN: 978-94-6300-622-4 (paperback)
ISBN: 978-94-6300-623-1 (hardback)
ISBN: 978-94-6300-624-8 (e-book)
Published by: Sense Publishers,
P.O. Box 21858,
3001 AW Rotterdam,
The Netherlands
https://www.sensepublishers.com/
All chapters in this book have undergone peer review.
Printed on acid-free paper
All Rights Reserved © 2016 Sense Publishers
No part of this work may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, electronic, mechanical, photocopying, microfilming,
recording or otherwise, without written permission from the Publisher, with the
exception of any material supplied specifically for the purpose of being entered and
executed on a computer system, for exclusive use by the purchaser of the work.
TABLE OF CONTENTS
Preface
[…] Experiments
3.4.3. Distribution Models for Standard Situations
3.4.4. Central Theorems
3.5. Synthesis of Learning Goals
3.5.1. Concepts to Model and Investigate Uncertain Situations
3.5.2. Different Connotations of Probability
3.5.3. Circumstantial Evidence and Bayes’ Formula
3.5.4. Random Variables and Expectation
3.5.5. Standard Models of Distributions
3.5.6. Law of Large Numbers and Central Limit Theorem
3.6. Students’ Reasoning and Potential Difficulties
3.6.1. Misconceptions and Heuristics (Strategies) in Probability Situations
3.6.2. Independence and Conditional Probability
3.6.3. Taking into Account Students’ Reasoning to Improve Teaching
[…] Memory
3.7.2. Odds and Bayes’ Formula – Revising Weights of Evidence
3.7.3. Mediating Tools to Support Teaching
References
PREFACE
1
In some countries the term stochastics is used to highlight the mutual dependence between
probabilistic and statistical knowledge and reasoning. Throughout the book we occasionally use
stochastics for statistics and probability to express our view that these fields are tightly
interconnected and should be taught together.
CHAPTER 1
In this chapter, we describe the principles that reflect our view of how stochastics
should be taught at high-school level. Firstly, we suggest the need to focus on the most
relevant ideas for the education of students. Secondly, we analyse the complementary nature
of statistical and mathematical reasoning on the one side and of statistical and probabilistic
reasoning on the other side. We then examine the potential and limits of technology in
statistics education and reflect on the different levels of formalisation that may be helpful to
meet the wide variety of students’ previous knowledge, abilities, and learning types.
Moreover, we analyse the ideas of probabilistic and statistical literacy, reasoning and sense
making. Finally, we demonstrate how investigations and experiments provide promising
teaching strategies to help high-school students to develop these capabilities.
1.1. INTRODUCTION
Statistical methods are important not only in various disciplines in science but
also for government and business systems, e.g., health records or retirement
pension plans. It is essential to understand statistics and probability to critically
evaluate how statistics are generated and to justify decisions, be they societal or
personal (Hall, 2011). Accordingly, statistics educators, educational authorities, as
well as statistical offices and societies call for a statistically literate society and
support projects that help children and adults to acquire the competencies needed
in the era of data information.
The aim of this chapter is to clarify what is meant by statistical literacy,
statistical thinking, statistical reasoning, and sense making. These aspects of
learning statistics have been widely discussed in the Statistical Reasoning,
Thinking, and Literacy (SRTL) Research Forum, a series of conferences
(srtl.fos.auckland.ac.nz/) starting in 1999 as well as in Ben-Zvi and Garfield (2004)
and Garfield and Ben-Zvi (2008). We also suggest that students can acquire
relevant competencies through statistical projects and investigations.
Teaching statistics and probability at high-school level is often embedded within
mathematics. However, due to their peculiarities, statistics and probability require
special attention on the part of teachers and curriculum designers in relation to the
selection of content and the best way to make the statistical ideas accessible to the
students. The goal of Chapter 1 is to present our overall perspective on the teaching
of statistics and probability. This perspective is made more explicit in Chapters 2 to
5 that deal with the teaching of the main ideas of this subject at high-school level.
Given that the time available for teaching is limited, it is important to select carefully the key
concepts that should be taught. Several authors (e.g., Borovcnik &
Kapadia, 2014a; Borovcnik & Peard, 1996; Burrill & Biehler, 2011; Heitele, 1975)
have investigated the history of statistics, the different epistemological approaches
to probability and statistics, and the curricular recommendations in different
countries, as well as the educational research. Based on their studies, they have
proposed various lists of “fundamental” ideas. These ideas can be taught at
different levels of formalisation depending on students’ ages and previous
knowledge (see, e.g., Borovcnik & Kapadia, 2011).
Our suggestions to teach the statistics content (Chapters 2 to 5) are organised
around four main clusters of fundamental ideas. We consider these clusters as key
foci around which activities can be organised by teachers in order to help students
acquire the related key concepts. Each chapter analyses the essential content of one
cluster using paradigmatic problems or situations that can be put forward to the
students. When appropriate, we make connections across the content in the other
chapters. In these chapters, we also inform teachers about the learning goals
implicit in the activities, point out potential difficulties encountered by learners as
described by research, and suggest promising teaching resources and situations that
embed the ideas within instruction. A summary of the fundamental ideas included
in each of these chapters follows.
EDUCATIONAL PRINCIPLES FOR STATISTICS AND PROBABILITY
Representations of data play a major role in statistics as a variety of graphs can be
used to display and extract information that may be hidden in the raw data. The
process of changing the representation of data in order to find further information
relevant to the initial problem is called transnumeration and is considered to be an
important process in statistical reasoning (Wild & Pfannkuch, 1999).
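As a minimal sketch of transnumeration, the following Python snippet re-expresses one small data set in two ways: as a frequency table (the basis of a bar graph) and as a five-number summary (the basis of a box plot). The shoe sizes are invented for illustration and are not data from the book; the quartile convention used by `statistics.quantiles` is only one of several in common use.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical shoe sizes of 12 students (invented for illustration).
shoe_sizes = [36, 37, 37, 38, 38, 38, 39, 39, 40, 41, 42, 44]

# Representation 1: frequency table (what a bar graph displays).
frequencies = Counter(shoe_sizes)
mode_value = max(frequencies, key=frequencies.get)
data_range = max(shoe_sizes) - min(shoe_sizes)

# Representation 2: five-number summary (what a box plot displays).
q1, q2, q3 = quantiles(shoe_sizes, n=4)  # default "exclusive" method
five_numbers = (min(shoe_sizes), q1, q2, q3, max(shoe_sizes))

print("frequencies:", dict(sorted(frequencies.items())))
print("mode:", mode_value, "range:", data_range)
print("five-number summary:", five_numbers)
```

Each representation makes different features visible: the frequency table shows where values pile up, while the five-number summary shows how the middle half of the data is located relative to the extremes.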
Variation and distribution are two complementary concepts that play a vital role in
statistics. Although variables and variation also appear in many areas of
mathematics, mathematics focuses on functional (deterministic) variation while
statistics deals with random variation. Hence, a goal of statistics education is to
enable students to reason about data in context under conditions of uncertainty, and
to discriminate between statistical reasoning and mathematical reasoning. Wild and
Distribution is a term that is specific to statistics and probability; it is a collection
of properties of a data set as a whole, not of a particular value in the data set. A
distribution consists of all the different values in the data including the frequencies
(or probabilities) associated with each possible value. Variation and distribution are
linked to other fundamental statistical ideas such as centre (as modelled by mean,
median, or mode), spread (as modelled by standard deviation or variance), and
shape (for example, bi-modal, uniform, symmetric, or L-shaped). Measures of
centre summarise the information about a distribution while measures of spread
summarise the variability within the data. Each value of a variable shows some
deviation from the centre. In the context of measuring an unknown quantity
(signal), this deviation may be interpreted as an error in measurement (noise).
Metaphorically, the distribution embodies an overall “model” for potential errors or
noise, while the centre can be seen as the signal (Konold & Pollatsek, 2002).
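The signal-and-noise reading of centre and spread can be illustrated with a short computation. The sketch below assumes a set of hypothetical repeated measurements of one object (the values are invented); the mean serves as an estimate of the signal and the standard deviation as a summary of the noise.

```python
from statistics import mean, median, stdev

# Hypothetical repeated measurements (in grams) of one object.
measurements = [50.2, 49.8, 50.1, 49.9, 50.3, 49.7, 50.0, 50.0]

centre = mean(measurements)   # estimate of the signal
spread = stdev(measurements)  # typical size of the noise
deviations = [round(x - centre, 2) for x in measurements]

print(f"mean = {centre:.2f}, median = {median(measurements):.2f}")
print(f"standard deviation = {spread:.2f}")
print("deviations from the centre:", deviations)
```

The deviations from the mean sum to zero (up to rounding), which is one way of seeing why the mean can be interpreted as a balance point of the distribution.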
grasping the sophisticated interaction between the notion of relative frequency and
the frequentist conception of probability. The subjectivist view of probability,
which is widely used in applied statistics, has been developed hand in hand with
the frequentist view so that the two complement each other (Hacking, 1975). Their
interplay is relevant, especially for conditional probability. Bearing this in mind,
we suggest a combination of both approaches in the teaching of probability.
The fundamental ideas of variable and distribution still apply for probabilistic
modelling. However, probability distributions deal with the potential data of a
“random experiment”, which models how data will be generated, rather than with
actual data that have been collected. A helpful idea is to think of repeated
experiments that supply us with idealised frequencies. This metaphor helps us to
transfer many concepts from descriptive statistics to probability and delivers a
more concrete picture of what probability means.1 A variety of representations of
descriptive statistics can be transferred to probabilities. Furthermore, tree diagrams
may be used to simplify the discussion of combined random experiments and the
calculus of probabilities. All of these approaches are presented in Chapter 3.
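The metaphor of repeated experiments can be made concrete by simulation. The following sketch is our own illustration, not an activity taken from Chapter 3: it rolls a simulated fair die and reports the relative frequency of a six for increasing numbers of rolls. The function name and the seed are arbitrary choices.

```python
import random

def relative_frequency_of_six(n_rolls, rng):
    """Roll a fair die n_rolls times; return the relative frequency of a six."""
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls

rng = random.Random(2016)  # fixed seed so the run is reproducible
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(relative_frequency_of_six(n, rng), 4))
```

For small numbers of rolls the relative frequency typically fluctuates considerably; for large numbers it tends to settle near 1/6. The frequencies remain empirical, while the value 1/6 belongs to the model.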
country). In both examples we might be interested in a number that describes the
average weight of the population. In the first case the focus is on all future units
produced, whereas in the second, the focus is on all current eight-year-olds in the
country.
An important idea is that of sample representativeness. If there are no biasing
factors in how elements are selected, then the average weight of the sample data
should be a reliable estimate of the average (expected) weight of the population.
The statistical way to “guarantee” representativeness is to control the sampling
process. More precisely, random sampling is a suitable method to obtain a
representative sample, and it is possible to calculate the probability of obtaining a
biased sample. The techniques for generalising the information from samples to the
whole population are confidence intervals and tests of hypotheses.
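How random sampling supports representativeness can be explored by simulation. The sketch below assumes a hypothetical population of weights generated from a normal model (mean 26 kg, standard deviation 4 kg, both invented) and draws many simple random samples of different sizes.

```python
import random
from statistics import mean, stdev

rng = random.Random(5)

# Hypothetical population: weights (kg) of 10,000 eight-year-olds.
population = [rng.gauss(26.0, 4.0) for _ in range(10_000)]
population_mean = mean(population)

def sample_means(sample_size, repetitions):
    """Draw simple random samples without replacement; return their means."""
    return [mean(rng.sample(population, sample_size))
            for _ in range(repetitions)]

for size in (10, 40, 160):
    means = sample_means(size, 500)
    print(size, round(mean(means), 2), round(stdev(means), 2))
print("population mean:", round(population_mean, 2))
```

The sample means scatter around the population mean, and their spread shrinks as the sample size grows. This is the sense in which the risk of an unrepresentative sample can be quantified when the sampling process is random.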
Today, statistical inference has found its way into curricula all over the world
with a variety of approaches that attempt to make the methods and the inherent
notions more accessible to students. Specifically it is common to use simulation to
facilitate parts of the computation and to visualise the sampling variability. More
recently, resampling approaches have been used to simplify the probability models
implicit in inference methods by focussing entirely on the data set that has to be
analysed. All of these methods, as well as Bayes’ rule to update information from
empirical data, are presented in Chapter 5.
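To indicate what a resampling approach may look like, here is a minimal sketch of a percentile bootstrap interval for a mean. It is not the procedure of Chapter 5; the data, the number of repetitions, the 90% level, and the function name `bootstrap_interval` are all assumptions made for illustration.

```python
import random
from statistics import mean

def bootstrap_interval(data, repetitions=2_000, level=0.90, seed=1):
    """Percentile bootstrap interval for the mean of `data`."""
    rng = random.Random(seed)
    boot_means = sorted(
        mean(rng.choices(data, k=len(data))) for _ in range(repetitions)
    )
    lower_index = int((1 - level) / 2 * repetitions)
    upper_index = int((1 + level) / 2 * repetitions) - 1
    return boot_means[lower_index], boot_means[upper_index]

# Hypothetical sample: weights (kg) of 15 eight-year-olds.
sample = [24.1, 27.3, 25.0, 29.8, 22.4, 26.7, 31.2, 25.5,
          23.9, 28.4, 26.0, 24.8, 27.9, 30.1, 25.2]
low, high = bootstrap_interval(sample)
print(f"sample mean = {mean(sample):.2f}")
print(f"90% bootstrap interval: ({low:.2f}, {high:.2f})")
```

The only input is the data set itself: resampling with replacement from the observed values stands in for the probability model of the population.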
In conclusion, the most fundamental objective in probability and statistics is to
offer models to understand and interpret real observations. A model does not
completely represent reality; yet, it can be used for explorations that may lead to
relevant results or insights about real data. A fundamental goal of teaching
statistics is that students understand the hypothetical character of probability and
statistical models and the possibility of applying these models in many different
contexts.
recently the field of statistics education has grown to become a discipline in its
own right.4
The specific character of statistics and probability is also reflected in the
philosophical, ethical, procedural, and even political debates that are still ongoing
within these areas and their applications, which does not often happen in
mathematics.5 Statistics and probability are closely related to other sciences such as
demography, genetics, insurance, astronomy, and law, from which many statistical
methods were developed. Furthermore, inferential statistics has formed the basis
for a new scientific paradigm in social sciences, medicine, and other disciplines.
These disciplines share the process of a scientific argument for generalising
empirical results that leads beyond the subjectivity and the restrictions of a single
experimental situation.
Later, in Section 1.7.2, we describe the components of an empirical study, which starts
with a contextual question. This question leads to an appropriate
design with a corresponding statistical question as well as a plan on how to collect
the data needed to address the question according to the chosen design. This
careful planning is necessary in order to obtain useful information about the initial
question and to keep the information free from confounding effects. The
exploration and analysis of the data are followed by drawing conclusions from
the data (Wild & Pfannkuch, 1999). As we explain in Chapter 2, a crucial final step
is the interpretation of the results in relation to the initial question given the context
of the problem (see also the steps outlined in a statistical investigation in the
GAISE project in Franklin et al., 2007).
The main interest in applying statistics in a real-world study concerns finding,
describing, and confirming patterns in data that go beyond the information
contained in the available data. Thus, statistics is often viewed as the science of
data or as a tool for conducting quantitative investigations of phenomena, which
requires a well-planned collection of observations (Gattuso & Ottaviani, 2011;
Moore & Cobb, 2000). This feature of statistics explains why it is easy to establish
connections between statistics and other school subjects and why it has sometimes
been argued that statistics should be taught outside the mathematics classroom
(Pereira-Mendoza, 1993).
4 For example, in the US, there are graduate programmes in statistics education and doctoral degrees
have been awarded in statistics education.
5 An example is the controversy around the use of statistical tests (Batanero, 2000; Borovcnik, 1986a;
Hacking, 1965).
An example described by Usiskin is the natural extension from the straight line
that goes exactly through two points to linear regression where the interest is in
finding a line that fits more than two data points in an optimal way. Finding the
line of best fit is part of mathematical modelling; however, statistical modelling
does not stop there. Lines are compared to other functions that may also model the
data and the interpretation of the fitted function depends on the context. Typical
questions are:
Do other functions fit better? What does it mean if we describe the relation
between the two variables under scrutiny by a line? Can we predict the value of the
dependent variable outside the range of data?
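The step from a line through two points to a line of best fit can be sketched in a few lines of Python. We assume the usual least-squares criterion (minimising the sum of squared vertical distances); the data on height and arm span are invented, and the constant model is included only as one possible rival function.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for the line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def sum_squared_residuals(xs, ys, model):
    """Sum of squared vertical distances between data and model(x)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: height (cm) and arm span (cm) of six students.
heights = [150, 155, 160, 165, 170, 175]
arm_spans = [148, 156, 159, 167, 169, 177]

intercept, slope = fit_line(heights, arm_spans)
line = lambda x: intercept + slope * x
constant = lambda x: mean(arm_spans)  # rival model: ignore height

print(f"line: arm span = {intercept:.1f} + {slope:.2f} * height")
print("SSR line:", round(sum_squared_residuals(heights, arm_spans, line), 1))
print("SSR constant:", round(sum_squared_residuals(heights, arm_spans, constant), 1))
```

With exactly two points, `fit_line` returns the line through both of them, so the regression line extends the familiar case. A smaller sum of squared residuals indicates a closer fit, but it does not settle whether the model is meaningful in the context or trustworthy outside the range of the data.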
According to Usiskin, statistics education can also benefit from the strong
movement towards modelling developed since the late 1980s especially promoted
by the ICTMA, the International Community of Teachers of Mathematical
Modelling and Applications,6 and introduced only more recently in statistics
education through the Exploratory Data Analysis (EDA) movement.7 In this
modelling approach, the motivation for a particular concept emerges from the
context; the concepts are developed interactively by contextual considerations and
mathematical principles. The teacher should find an appropriate real situation in
which the new concept makes sense for the students. Throughout the book we
present such initial problems that help students understand the related concepts,
followed by more complex situations when space permits.
Statistics at high school may sometimes be a separate course (e.g., the advanced
placement course in the US) or included in other subjects and taught when it is
needed to understand the current topics. We agree with Usiskin that regardless of
the placement, all students should experience substantial school work in statistics.
In this way, they become competent to appreciate and criticise empirically-based
arguments around them and empowered to make adequate and informed decisions.
update prior probability judgements and make them “more objective”. Finally,
some probability ideas are needed to understand more informal approaches to
inference when we use simulations or re-randomisation to generate empirical
sampling distributions (see Chapter 5).
Thus, when teaching probability, one has to consider that probability is a
theoretical concept – a virtual concept according to Spiegelhalter and Gage (2014)
– and we speak about probability by using metaphors such as “propensity”, “degree
of belief”, or “limit of frequencies”, which convey only parts of this abstract
concept. Even though the relationship between probability and relative frequencies
is fundamental for the comprehension of probability and statistical methods, this
relationship is not always well understood as some students confuse frequency with
probability.8 Moreover, the different representations of measures of uncertainty (as
absolute numbers9 versus probabilities) may also involve different levels of
difficulty in understanding probability models. In Chapter 3, we discuss other
difficulties encountered by the students when interpreting small probabilities,
which usually occur in the case of risks of adverse events such as a maximum
credible accident of a nuclear power station or dying from lung cancer.
Technology has revolutionised the applications of statistics and likewise statistics
education. With software such as Fathom (Finzer, 2007) or Tinkerplots (Konold &
Miller, 2011), specially designed to support the learning of statistics and
probability, with a spreadsheet, or even with Internet applets, data analysis is no
longer the exclusive domain of statisticians (Biehler, 1997; Biehler, Ben-Zvi,
Bakker, & Makar, 2013; Pratt, Davies, & Connor, 2011). As demonstrated
throughout Chapters 2 to 5, software can facilitate computations and the production
of graphical representations of data. Thus, students can use methods such as fitting
a variety of models to a scatter plot (see Chapter 4) that were not accessible to
them a few years ago.
With modern technology, students can represent abstract interrelationships and
operations, interact with the setting, and see the changes in the representation or in
the results when varying some data or parameters.10 Technology provides the
possibility of dynamic visualisations where the impact of crucial parameters on a
graph can be traced. This technique may serve, for example, to explore the waiting
time for the first success in Bernoulli experiments (as done in Chapter 3).
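A non-graphical version of such an exploration can be sketched as follows. The code simulates the waiting time for the first success for several values of the success probability p and compares the simulated mean with the theoretical expectation 1/p of the geometric distribution. The chosen values of p, the number of repetitions, and the seed are arbitrary assumptions.

```python
import random
from statistics import mean

def waiting_time(p, rng):
    """Number of Bernoulli trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:  # failure with probability 1 - p
        trials += 1
    return trials

rng = random.Random(3)
for p in (0.5, 0.2, 0.1):
    times = [waiting_time(p, rng) for _ in range(20_000)]
    print(f"p = {p}: simulated mean {mean(times):.2f}, theory {1 / p:.2f}")
```

Varying p and observing how the simulated waiting times change mirrors, in tabular form, what a dynamic visualisation shows when a parameter is dragged.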
Due to facility and speed of computations, the size of data sets is no longer a
limitation so that it becomes easier to use real data collected by the students or
taken from the Internet (e.g., from CensusAtSchool, n.d., or from several statistical
8 The expression “empirical probability” is unfortunate in this regard as a probability is always
theoretical; only the frequencies are empirical.
9 In the sense of Gigerenzer (1994).
10 Another possibility not studied in the book is multivariate dynamic visualisation that can be
implemented with tools like those available from Gapminder (Rosling, 2009).
Various factors suggest the need to be able to teach the same topic at different
levels of formalisation. Among them, we list the diversity of curricular guidelines
around the world, the different educational requirements of similar high-school
grades (e.g., technical versus social-science strands), as well as the differing
abilities and competencies of the students. As the book is intended to serve a broad
international audience, we have tried to implement the principle of adapting to a
diversity of students throughout the text.
Let us consider, for example, statistical inference, a quite sophisticated topic,
where starting with an informal approach has been recommended to reduce the
technicalities when introducing the topic (e.g., Makar, Bakker, & Ben-Zvi, 2011;
Rossman, 2008). This informal approach could be used as an introduction for the
majority of students. However, in some countries (e.g., in New Zealand, Germany,
or Spain), a more formal approach to inference (including an exposition of
confidence intervals) is required in the last grade of high school for particular
strands.
We therefore start the exposition of basic inference methods like tests of
significance and confidence intervals in Chapter 5 with an informal approach
where the sampling distribution is estimated via simulation. We then suggest
that students with experience in probability rules or the binomial distribution may
use this previous knowledge to compute the exact sampling distribution. We add
further activities related to statistical tests as decision making or to the use of Bayes’
theorem to update prior information about a parameter for those students with more
advanced experience in probability.
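The two levels of formalisation can be placed side by side in a short sketch. It assumes a simple situation of our own choosing (the number of successes in n = 10 trials with success probability 0.5) and compares a simulated sampling distribution with the exact binomial probabilities.

```python
import random
from math import comb

def exact_binomial(n, p):
    """Exact probabilities P(X = k), k = 0..n, for a binomial count X."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def simulated_binomial(n, p, repetitions, rng):
    """Relative frequencies of the number of successes in n trials."""
    counts = [0] * (n + 1)
    for _ in range(repetitions):
        successes = sum(rng.random() < p for _ in range(n))
        counts[successes] += 1
    return [c / repetitions for c in counts]

n, p = 10, 0.5
exact = exact_binomial(n, p)
simulated = simulated_binomial(n, p, 20_000, random.Random(11))
for k in range(n + 1):
    print(k, round(simulated[k], 3), round(exact[k], 3))
```

Students working informally need only the simulation column; students who know the binomial distribution can check it against the exact column and discuss why the two differ slightly.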
Other resources considering the same curricular content at different standards of
formalisation can be found in the GAISE guidelines (Franklin et al., 2007). They
provide examples of how different levels of presentation require and reflect an
increasing sophistication in understanding and applying stochastic concepts.
Statistics is embedded in a methodology that serves to generate evidence from data.
Statistical knowledge is vital in research and in public discussion where it is used
to empower arguments pro or contra some issue. Without statistical knowledge it is
difficult to discern misuse from proper use of data. Statistical knowledge involves
thinking in models, being able to apply proper models in specific situations,
considering the impact of assumptions, deriving and checking the results, and
interpreting them in the context.
The ability to understand and critically evaluate statistical results that permeate daily life,
coupled with the ability to appreciate the contributions that statistical thinking can make
in public and private, professional and personal decisions. (Wallman, 1993, p. 1)
Some schools […] teach statistics […] as part of mathematics, […] yet not in a way that
necessarily emphasises the development of statistical literacy.
A major initiative towards statistical literacy for everybody is the International
Statistical Literacy Project promoted by the ISI and the IASE. The activities
include competitions for students, the World Statistics Day, the best cooperative
project competition, a newsletter, and an electronic repository of teaching
resources.
In the previous sections, we discussed the specificity of statistical and probabilistic
thinking. Stochastic concepts are derived by arguments with a strong mathematical
component. Nowhere else in mathematics has the overlap between scientific
and philosophical ideas been as strong, and these ideas were only separated
relatively recently. Moreover, stochastics has shaped modern research
methodology and is now an integral part of empirical research as shown in Popper
(1959).
A source of difficulty is the mixture of theoretical ideas and personal
conceptions when people apply stochastic reasoning. One example of this is the
complex relationship between randomness and causality. In a causal situation,
the outcome is completely determined by the conditions of an experiment;
however, in a stochastic situation there is no explicit way to predict the outcome
with certainty, even though the experiment is thought to be repeatable under the
same conditions as in physics. The assumption is that the repetitions of a random
experiment are “independent”. Within probability theory, independence is
reduced to the multiplicative rule. When this rule has to be applied to a situation
with data in a real context, one has to motivate why the specific experiment can
be modelled as an independent stochastic experiment. However, many textbooks
refer to independence as a lack of causal influence, which only adds to the
confusion.
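The difference between independence as a multiplicative rule and independence as "no causal influence" can be checked by exact enumeration. The example with two dice is our own and is offered only as a sketch: the sum of the two dice clearly depends on the first die, yet one of the two events about the sum turns out to be independent of "first die even" while the other does not.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

first_even = lambda o: o[0] % 2 == 0
sum_is_7 = lambda o: o[0] + o[1] == 7
sum_is_8 = lambda o: o[0] + o[1] == 8

def independent(a, b):
    """Check the multiplicative rule P(A and B) = P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

print(independent(first_even, sum_is_7))
print(independent(first_even, sum_is_8))
```

Here P(first even) = 1/2 and P(sum is 7) = 1/6, and their intersection has probability 1/12, so the rule holds. For "sum is 8" the intersection also has probability 1/12, but the product is 5/72, so the rule fails. Whether the rule holds is a property of the probabilities in the model, not of any causal story.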
When teaching these concepts, often only the mathematical definitions are
developed while the intuitive side is neglected. Probabilistic modelling (described
in Chapter 3), however, has to rely on some basic assumptions that all methods of
statistical inference have in common; consequently, without a connection to
probability, statistics can be deprived not only of its potential in applications but
also of its sense-making.
Consistent with the ideas above, many authors have pointed to the unique nature
of statistics and probability (e.g., Moore & Cobb, 2000; Scheaffer, 2006), which is why it
is wise to use a modelling approach for teaching. Modelling includes working with
real data, searching for suitable variables to answer questions from a context (see
Franklin et al., 2007), measuring objects on the scale of the variables, controlling
for variation in data by the selection of samples, and reducing the effects of
confounding factors by intelligent design of data production.
Problem. A solution to the real problem requires that the student understands the
context and realises which question needs to be answered. Often this
original question is too wide and needs to be made more precise; for example,
by restricting the possible decisions in the insurance contract. The first step in the
modelling cycle is refining the problem in such a way that it can be posed in
statistical terms; it is necessary to get familiar with the context and fix the specific
goals that should be met in solving the problem.
Planning. Once the problem is manageable, students have to think ahead about
how to develop a strategy to get a solution. If the solution to the problem
depends on some quantity that varies with the members of a specific population
(shoe size, in Chapter 2), it is necessary to plan how to measure this quantity across
the population members (boys and girls). Then there is the decision about which
data should be collected (number of students, specific groups in the sample) and
which strategy is followed to sample the population (systematic or random
sample). Finally, a plan of the analysis is required (in this example, a descriptive
analysis is appropriate).
Figure 1.1. The PPDAC cycle (adapted from Wild & Pfannkuch, 1999, p. 226)
Problem: understanding, refining
Planning: measurement, data collection, sampling, analysis
Data: collection, cleaning, organisation
Analysis: exploratory analysis, generating hypotheses, testing hypotheses, estimation/prediction
Conclusion: interpretation, solutions, new problems, communication
Data. The next step is collecting the data. Sometimes the students collect
accessible data (e.g., from the Internet) or use data from previous questionnaires.
Other times, the students design questionnaires or experiments to collect their own
data. Their statistical understanding increases when they
are involved in the design of a data collection instrument or experiment. An
example where the students are involved in organising an experiment to collect
data to test a hypothesis is given in Chapter 5. Once the data are collected, they are
placed in a spreadsheet or other software that facilitates computations and
visualisations. This sometimes requires reorganising the data, for example, by
selecting the variables of interest and deciding the units to use for the analysis.
Data should also be inspected for errors in measurement or recording.
Analysis. After the data are collected and cleaned, the analysis can start. If the
students have no idea about what can be found in the data, they might carry out an
exploratory data analysis (EDA). In this approach (see Chapters 2 and 4), the
students summarise the characteristics of the data without testing pre-set
hypotheses. Tukey (1977) was the main promoter of this approach, where multiple
data representations and visualisations are used to discover hidden patterns in the
data and to formulate new hypotheses. Other times the students perform an
inferential analysis (see Chapter 5). In this case the interest is in generalising a
conclusion from the data at hand to a wider population from which the data are a
random sample. Inference is used to make estimations or predictions about a larger
population that the sample represents or to test hypotheses set prior to collecting
the data.
Conclusion. The final step in the modelling cycle is the interpretation of the
results from the analysis and relating this interpretation to the context in a
way that produces some answers to the original problem. For example, in the
tea-tasting experiment to test a hypothesis in Chapter 5, we obtain data that are very
unlikely if the hypothesis is true. We interpret the finding as evidence against the
initial hypothesis and therefore reject it. In Chapter 2, we decide to repeat the
analysis by discarding some atypical values. The modelling cycle can be repeated
several times until a reasonable conclusion is reached.
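The reasoning "data that are very unlikely if the hypothesis is true" can be sketched numerically. We assume here the classical design attributed to Fisher (8 cups, 4 with milk poured first, and a taster who must name the 4 milk-first cups); the design used in Chapter 5 may differ. The function name `prob_correct_by_guessing` is our own.

```python
from math import comb

def prob_correct_by_guessing(cups, milk_first, correct):
    """P(exactly `correct` of the chosen cups are truly milk-first)
    when `milk_first` of `cups` cups are picked purely at random."""
    return (comb(milk_first, correct)
            * comb(cups - milk_first, milk_first - correct)
            / comb(cups, milk_first))

# Assumed design: 8 cups, 4 with milk poured first; the taster names 4.
p_all_correct = prob_correct_by_guessing(8, 4, 4)
p_at_least_3 = p_all_correct + prob_correct_by_guessing(8, 4, 3)
print(f"P(4 correct | guessing) = {p_all_correct:.4f}")
print(f"P(at least 3 correct | guessing) = {p_at_least_3:.4f}")
```

Under this assumed design, identifying all four cups by pure guessing has probability 1/70, which many would regard as evidence against the hypothesis of guessing, whereas three or more correct has probability 17/70 and would hardly count as evidence.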
1. Appreciation for the need for data. As suggested in Section 1.2, data are
essential in statistics; very few statistical investigations can be completed
without properly collected and analysed data. Even if people have their own
experience with the same type of situation, they should not base a solution solely
on their personal experience. Reliable data are essential to provide information
for a solution and to reach an adequate decision or judgement about the
situation. An important part of statistical thinking is being able to recognise the
points in the process where new data are needed.
EDUCATIONAL PRINCIPLES FOR STATISTICS AND PROBABILITY
Visualising the data in a bar graph (if applicable) shows the mode (or modes if
there is no unique peak) and the range of the variables. After changing to a box plot,
the median, the quartiles, and the extreme values become visible (see Chapter
2). The change of representations might reveal new relevant information in the
data.
3. Perception of and attention to sources of variation. Variation occurs in all areas
of mathematics; however, random variation is specific to statistics. A particular
type of statistical thinking is to differentiate statistical and non-statistical
(deterministic) variation. It is also important to recognise the various sources of
variation: natural variation in the population, error in measurement, or variation
in sampling of data.
A goal of statistics is to separate irreducible and reducible variation. Even
natural variation in the population can be reduced; for example, in Chapter 2,
the analysis of arm span should be carried out separately for boys and girls. Statistics
offers methods to control variation when the source of variability is known.
Variation is also inherent in the conclusions; for example, a p-value or a
confidence level indicates the quality of the statistical argument used, based on
samples from the population (see Chapter 5).
4. Integration of statistics and context. Contextual knowledge is vital in all steps of
the modelling cycle. To highlight its relevance, Wild and Pfannkuch include the
integration of statistics and context as a specific type of statistical thinking. The
statistical model must be selected and exploited in such a way that the essential
elements of the real situation are captured. At the same time it is necessary to be
conscious that any model is different from reality; hence, some differences
between the model and the investigated problem situation remain. The target is
to generate data that contain adequate information needed to answer the initial
problem; the summary report should be oriented to synthesise, understand and
generalise the situation, when possible. Most importantly, integration with the
context is essential in the conclusion phase where it is decided whether the
solution is reasonable and applicable in the context.
5. Using appropriate statistical models. As in other areas of applied mathematics,
modelling is essential in statistics. The opportunities for modelling are
remarkably wide in statistics (as seen from the examples in Tanur, 1989).
Moreover, statistics has its own set of models that were specifically
developed for the analysis of data. There is a wide range of statistical models,
some of which are highly sophisticated. Examples include the normal
distribution, regression models, or statistical tests, which can be generalised to
complex situations. However, it is possible to use simple versions of these
models at high-school level (see Chapters 3 to 5). A feature of statistical models
described by Wild and Pfannkuch is that they help us to think in terms of
distributions (aggregates) instead of concentrating on individuals.
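Two of the ideas in the list above, the change of representation and the separation of sources of variation, can be sketched with a small data set. The arm spans below are invented for illustration and are not the data of Chapter 2; note also that software packages use slightly different conventions for quartiles.

```python
from statistics import multimode, pstdev, quantiles

# Hypothetical arm spans (cm), invented for illustration.
girls = [152, 155, 158, 158, 160, 161, 163, 165]
boys = [166, 168, 170, 172, 172, 175, 177, 180]
everyone = girls + boys

# What a bar graph makes visible: the mode(s) and the range.
modes = multimode(everyone)
data_range = max(everyone) - min(everyone)
print("modes:", modes, "range:", data_range)

# What a box plot makes visible: extremes, quartiles and median.
q1, q2, q3 = quantiles(everyone, n=4)
five_numbers = (min(everyone), q1, q2, q3, max(everyone))
print("five-number summary:", five_numbers)

# Sources of variation: spread within each group versus the pooled spread.
sd_all = pstdev(everyone)
sd_girls = pstdev(girls)
sd_boys = pstdev(boys)
print(f"sd all = {sd_all:.2f}, sd girls = {sd_girls:.2f}, sd boys = {sd_boys:.2f}")
```

In this invented example, the spread within each group is smaller than the spread of the pooled data, which is the sense in which part of the "natural" variation is reduced once its source is known.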
12
The person should understand that a correct strategy does not always ensure success.
(2010) describe an epidemiological study where all 331 BSE¹³ positive cases
could be false positives as there is no proper estimate of BSE prevalence.
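Why the prevalence matters so much in such a situation can be sketched with Bayes' rule. The sensitivity and specificity below are hypothetical and are not taken from the study mentioned above; the function name is chosen only for this illustration.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """P(disease | positive test), computed by Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Hypothetical test characteristics (assumptions for illustration only).
sensitivity = 0.99
specificity = 0.999

# The same test applied to populations with very different prevalence.
for prevalence in (1e-2, 1e-4, 1e-6):
    ppv = positive_predictive_value(prevalence, sensitivity, specificity)
    print(f"prevalence {prevalence:.0e}: P(disease | positive) = {ppv:.4f}")
```

With a very rare condition, even a test of this assumed quality yields mostly false positives, so the meaning of a positive result cannot be judged without an estimate of the prevalence.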
Often, teachers present statistical information such as definitions of new concepts
or examples of procedures for solving statistical problems, and then give exercises
to the students to practise what they have learnt. The consequence is routine learning
where students apply the formulas without any deeper understanding of the
underlying concepts.
In order to improve the situation, statistics educators recommend refocusing the
teaching of statistics on reasoning and sense making (e.g., Shaughnessy, Chance, &
Kranendonk, 2009). According to the Common Core State Standards Initiative
(CCSSI, 2010), to make sense of the problems posed to them, the students should
first understand the goals and constraints, and conjecture a possible solution path
before starting to solve the tasks posed. They may consider similar problems that
have been solved before or solve a simpler form of the original problem (e.g.,
reduce the sample size). The teacher may provide support when needed; for
example, suggesting the use of a particular type of graph to discover patterns and
relationships in the data. Another strategy is organising the students in groups,
where more advanced students help their classmates.
Throughout the book, we attempt to make sense of the different concepts and
methods by using contexts where these ideas can be meaningful for the students.
For example, in Chapter 3, we use contexts familiar to students to introduce three
different views of probability: a) probability as a value to which the relative
frequency tends in a large number of experiments; b) probability as a ratio of
favourable to possible cases when the elementary events are equally likely; and c)
probability as a personal degree of belief. In the same chapter, conditional
probability is linked to circumstantial evidence. In Chapters 3 and 5, Bayes’ rule is
introduced as a method to learn from experience. The correlation coefficient is
related to both the error in prediction and the spread of scatter plots in Chapter 4.
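The link between the first two views of probability can be explored by simulation. The sketch below uses a fair die as an assumed example (it is not taken from Chapter 3) and compares the relative frequency of a six with the ratio of favourable to possible cases; the third view, a personal degree of belief, is not something a simulation can show.

```python
import random

rng = random.Random(2024)  # fixed seed so that the run can be repeated

# Classical view: one favourable case out of six equally likely cases.
classical = 1 / 6

# Frequentist view: relative frequency of a six in a growing number of throws.
relative_frequency = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    sixes = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
    relative_frequency[n] = sixes / n
    print(f"n = {n:>7}: relative frequency = {relative_frequency[n]:.4f} "
          f"(classical value: {classical:.4f})")
```

For small numbers of throws the relative frequency may be far from 1/6; students can change the seed and observe that the fluctuation shrinks as the number of throws grows.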
13
Bovine spongiform encephalopathy or mad cow disease, a fatal neurodegenerative disease in cattle.
14
Regression on average (aggregate) values instead of the original data can increase the
correlation considerably; by Simpson's paradox, a positive correlation in all subgroups may be
changed to a negative correlation in the whole group investigated.
statistical knowledge (and reasoning), to learn to make the best use of technology,
and to orientate their teaching to students of differing abilities.
The aim of this book is to support teacher educators and teachers as well as to
increase their interest, competencies, and knowledge in stochastic education. We
hope that education researchers will be encouraged to explore innovative ways
and tools for educating teachers and students in statistics and probability using the
ideas suggested in this book.