The Basic Concepts of Statistics


“Research and education are the two wheels of development, with research being the front wheel” – Duro Clement Dolapo

Introduction

Researchers use a wide variety of tools to gain an understanding of the phenomena they study.
Perhaps the most important of these is statistics. Statistics plays a fundamental role not only in the
analysis of collected data, but also in the planning and design of a study, the process of data collection,
and the interpretation of results. It is no wonder that statistics is taking an increasingly prominent
position in research.

What is Research?

Research is simply the process of arriving at dependable solutions to problems through the planned and
systematic collection, analysis and interpretation of data. Research is the most important tool for
advancing knowledge, for promoting progress, and for enabling man to relate more effectively to his
environment, to accomplish his purposes, and to resolve his conflicts (Osuala, 2001). It is oriented towards
finding what works and what does not work in a given situation. Without research, development is a
mirage. Any nation that desires development but is not ready to channel a great deal of its resources to
education and research is just tempting God, no matter the amount of prayer.

What is Statistics?

Statistics is the scientific method of collecting, organizing, summarizing, analyzing, interpreting and
presenting data. Comparing the definitions of research and statistics shows that both rest on one word,
“data”, which is explained further in a later section. The definition of research can therefore be
reframed as “the process of arriving at dependable solutions to problems through statistics”.
Statistics finds application in all human endeavors. The branch of statistics that deals primarily with the
biological sciences and medical/health-related disciplines is biostatistics.

When approaching the study of any organized body of knowledge, especially one as diverse and complex
as statistics, it is important to identify some conceptual frameworks from which the component
material can be viewed.

Populations and Samples

A population is a set of persons (or objects) having a common observable characteristic (Kuzma, 1998).
This is referred to as the popular population. A population can also be defined as the observable
characteristics of persons or things themselves; this is referred to as the statistical population.
Populations vary in size. In the discipline of statistics, two types of population (infinite and finite)
can be distinguished on the basis of size. Infinite populations can be thought of as large populations,
while finite populations are those that are smaller. The distinction is arbitrary, although some
researchers regard populations of 10,000 or more as large populations and those of fewer than 10,000 as
small populations.

A sample is some subset of a population. The distinction between population and sample is crucial to
an understanding of research. This is because, more often than not, the researcher is unable to observe
all the units constituting a population, for reasons of cost and logistics. He/she can still conduct
the research by observing a representative sample of the population and then extrapolating the results
obtained from the sample to the population.
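
As an illustration of drawing a sample from a population, here is a minimal Python sketch. The population of 10,000 ID numbers, the sample size of 500 and the fixed seed are all hypothetical choices made for the example, and simple random sampling is only one of several ways a researcher might try to obtain a representative sample.

```python
import random

# Hypothetical finite population: ID numbers for 10,000 people.
population = list(range(1, 10_001))

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Simple random sample without replacement: every unit has the same
# chance of being selected.
sample = rng.sample(population, k=500)

print(len(sample))                      # number of units actually observed
print(len(set(sample)))                 # same number: no unit is drawn twice
print(set(sample) <= set(population))   # every sampled unit belongs to the population
```

Only the 500 sampled units would be observed in practice; conclusions about the other 9,500 come from extrapolation.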

Data and Variable

The word data refers to the recordings of measurements made on characteristics. The singular form is
datum, but since statistics is about groups of persons or objects, the word “data” predominates. Data are
the values that a characteristic, or a variable, can assume.

When characteristics take on different values they are referred to as variables. In other words, a variable
is a characteristic that can assume different values, or a characteristic that varies from one subject
(person or object) to another. For example, sex is a variable because it differs among people. The possible
values of sex are “male” and “female”. The values of “male” or “female” recorded for a group of people are
referred to as data. The classification of variables is depicted in Figure 1. The same classification is
also used for data.

Figure 1: The different types of variables
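
The difference between a variable and its data can be sketched in Python. The three patient records, the variable names and the severity ranking below are all made up for the illustration; the four variable types are the ones named in Activity 2.1.

```python
# Hypothetical records for three patients; every value here is made up.
patients = [
    {"sex": "female", "severity": "severe", "children": 2, "weight_kg": 61.5},
    {"sex": "male", "severity": "mild", "children": 0, "weight_kg": 78.2},
    {"sex": "female", "severity": "moderate", "children": 3, "weight_kg": 55.0},
]

# Each variable (a characteristic that varies between subjects) and its type.
variable_types = {
    "sex": "nominal",
    "severity": "ordinal",
    "children": "discrete",
    "weight_kg": "continuous",
}

def data_for(variable):
    """Return the data: the values recorded for one variable across subjects."""
    return [patient[variable] for patient in patients]

# Ordinal values carry a rank order, so they can be sorted meaningfully.
severity_rank = {"mild": 1, "moderate": 2, "severe": 3}
ordered_severity = sorted(data_for("severity"), key=severity_rank.get)

print(data_for("sex"))
print(ordered_severity)
```

Nominal values such as sex have no such ranking; any sort order imposed on them (alphabetical, for instance) would carry no information about the characteristic.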

Activity 2.1: List and discuss ten examples of each of the following and their possible values:
1. Nominal variable
2. Ordinal variable
3. Discrete variable
4. Continuous Variable
Note: Nominal variables often have only two possible values, e.g. dead or alive, male or
female, cured or not cured. This type is referred to as a binary or dichotomous variable.

Parameter and Statistic


Closely related to populations and samples are the concepts of parameters and statistics. A parameter is
defined as any summarization of the elements of a population, while a like summarization of the elements
of a sample is referred to as a statistic (Blair and Taylor, 2008). Note: Do not confuse “statistic” when
used in this sense with the word “statistics” when used to refer to the discipline of study. For example,
the actual mean systolic blood pressure of the adults in Lagos (a parameter) is difficult to know because
it is practically impossible to measure the blood pressure of every adult in the city. A researcher can
instead take a sample of 2,500 adults, measure their systolic blood pressure, calculate the mean
(a statistic), and use it to draw an inference about the population. A researcher may not be able to know
a parameter, but can always calculate a statistic in order to make statements or judgments about the
unknown parameter.
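
The blood pressure example can be sketched as a small simulation. Everything numeric here except the sample size of 2,500 is an assumption made for illustration: the population size of 200,000, the normal distribution with mean 125 mmHg and standard deviation 15 mmHg, and the seed. In real research the population values are not available, which is exactly why the parameter is unknown.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: systolic blood pressure (mmHg) of 200,000 adults.
# The values are simulated only so the parameter can be computed and compared.
population = [rng.gauss(125, 15) for _ in range(200_000)]

mu = statistics.fmean(population)         # parameter: mean of the whole population

sample = rng.sample(population, k=2_500)  # the 2,500 adults actually measured
x_bar = statistics.fmean(sample)          # statistic: mean of the sample

print(f"parameter (population mean) = {mu:.1f}")
print(f"statistic (sample mean)     = {x_bar:.1f}")
```

The two printed values should be close but not identical; the statistic is an estimate of the parameter, not the parameter itself.
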
The distinction between parameters and statistics is so fundamental to statistical thinking that
different notational conventions are commonly employed to represent them. The most popular convention
uses letters from two different alphabets, as shown in Table 1.

Table 1: Symbols for representing parameters and statistics


Summary of characteristic    Parameter (Greek letters)    Statistic (Roman letters)
Mean                         μ                            x̄
Standard deviation           σ                            s
Variance                     σ²                           s²
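
For reference, the quantities in Table 1 are usually computed as follows. The formulas are standard textbook definitions rather than something stated in these notes: N is assumed to denote the population size, n the sample size, and x_i an individual observation. The n − 1 divisor in the sample variance is the usual convention and is likewise an assumption here.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2

\sigma = \sqrt{\sigma^2}
\qquad
s = \sqrt{s^2}
```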

Activity 2.2: Discuss five different parameters and their corresponding statistics

Descriptive and Inferential Statistics


The discipline of statistics consists of two component parts. The first component is referred to as
descriptive statistics while the second is termed inferential statistics. Descriptive statistics is made up
of various techniques used to summarize the information contained in a set of data. Suppose a study is
conducted to assess the packed cell volume (PCV) of 120 pregnant women attending an antenatal clinic.
One way to report the findings of this study to the head of department would be to list the
individual test results. After going through the list of all the results, the head of department would
likely have little understanding of the information provided. The unsummarized information would overwhelm
his/her ability to arrive at a meaningful conclusion. A better way of reporting the findings would be to
give the mean PCV (e.g. 33.4%). Other summarizations might include the lowest and the highest PCV and
various graphical representations of the data. Thus, descriptive statistics, as the name implies, deals
with the description of data. In contrast to descriptive statistics, inferential statistics is made up of
various techniques used to provide information about parameter values based on observations made on the
values of statistics.
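
A minimal Python sketch of the descriptive summaries mentioned above (mean, lowest and highest value). The ten PCV values are made up for the example, chosen so that their mean is 33.4%; they are not data from any study.

```python
import statistics

# Hypothetical PCV results (%) for 10 women; the values are made up.
pcv = [31.0, 35.5, 33.0, 29.5, 36.0, 34.0, 32.5, 30.0, 37.0, 35.5]

# Descriptive statistics: a few numbers that summarize the whole list.
summary = {
    "n": len(pcv),
    "mean": round(statistics.fmean(pcv), 1),
    "lowest": min(pcv),
    "highest": max(pcv),
}

print(summary)
```

Four numbers replace the full listing, which is the point of descriptive statistics: the summary can be read at a glance where the raw list cannot.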

Figure 2: The relationship between population and sample, parameter and statistic, and inferential and descriptive statistics

[Diagram labels: Descriptive statistics, Probability, Inferential statistics]

Figure 3: The relationship between descriptive and inferential statistics

Scales of measurement
We have learned that populations and samples are made up of subjects (persons or objects), and that
subjects have measurable and observable characteristics that take on different values. We also learned
that once measurements are carried out and recorded, the result is called data. But what is meant by the
word measure? Simply put, it means that we assign numbers, letters, words or some other symbols to
persons or things in order to convey information about the characteristic being measured. Thus, we may
assign the number 65 to a man to represent his weight in kilograms, or an “M” to represent his sex
or gender. Note that measurements taken on variables can yield different amounts of
information depending on the scale employed in the measurement process. Thus, measurements that
produce the numbers 1, 2, 3, 4 and 5 on one scale may convey a different amount of information about
the variable than would the same numbers obtained from a different scale. This in turn has
implications for the statistical treatment of such data. The scales of measurement were first described by
Stanley S. Stevens in his 1946 article “On the theory of scales of measurement”. According to
Stevens, the measurement process can be conceived of as existing on four different levels, which he
referred to as the nominal, ordinal, interval (or equal interval), and ratio scales, as described in Figure 4.

Some important terms used to describe variables

Independent variable: A variable thought to be the cause of some effect. This is the variable that can be
manipulated.

Dependent variable: A variable thought to be affected by changes in an independent variable. It can be
thought of as an outcome.

Predictor variable: A variable thought to predict an outcome variable. It is another term for independent
variable.

Outcome variable: A variable thought to change as a function of changes in a predictor variable. It is
synonymous with dependent variable.

Nominal: This assigns variables into categories without ranking. Examples of variables measured: sex
(male/female), treatment type (surgery/chemotherapy).

Ordinal: This assigns variables into categories with ranking, but no attribute of how much more and how
much less. Examples of variables measured: severity of disease (mild/moderate/severe).

Interval (or Equal interval): This assigns variables into categories with ranking and with the attribute of
how much more and how much less. Examples of variables measured: temperature measured in Centigrade or
Fahrenheit.

Ratio: This assigns variables into categories with ranking and with the attribute of how much more and how
much less, and with true zero origin. Examples of variables measured: weight, height, blood sugar,
temperature measured in thermodynamic or Kelvin scale.

Figure 4: Scales of measurement
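
The practical difference between the interval and ratio scales can be shown with temperature. The sketch below uses two arbitrary example temperatures (10 and 20 degrees Centigrade) and the standard conversion to Kelvin; the function name is a hypothetical helper introduced for the illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert a Centigrade (Celsius) temperature to the Kelvin scale."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences ("how much more") mean the same thing on both scales.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are only meaningful on a scale with a true zero origin.
ratio_c = warm_c / cool_c  # 2.0, yet 20 degrees is not "twice as hot" as 10 degrees
ratio_k = warm_k / cool_k  # about 1.035 on the Kelvin (ratio) scale

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The difference of 10 degrees survives the change of scale, but the ratio does not, because zero on the Centigrade scale is an arbitrary point rather than the absence of temperature.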

Bibliography

1. Adamu SO, Tinuke L. Johnson. Statistics for Beginners. Evans Brothers Nigeria Limited (2011).
2. Afolabi Bamgboye E. A Companion of Medical Statistics. FalBam Publishers, Ibadan, Nigeria, second edition (2008).
3. Andy Field. Discovering Statistics Using SPSS. Sage, Los Angeles, third edition.
4. Aviva Petrie, Caroline Sabin. Medical Statistics at a Glance. Wiley-Blackwell, third edition.
5. Barbara Fadem. High-Yield Behavioural Science. Lippincott Williams & Wilkins.
6. Beth Dawson, Robert G. Trapp. Basic & Clinical Biostatistics. Lange Medical Books/McGraw-Hill, fourth edition (2004).
7. Betty R. Kirkwood, Jonathan A.C. Sterne. Essentials of Medical Statistics. Blackwell Scientific Publications, California, USA, second edition (2003).
8. Bill Taylor, Gautam Sinha, Taposh Ghoshal. Research Methodology: A Guide for Researchers in Management & Social Sciences. Prentice-Hall of India Private Limited (2006).
9. Bonita R, Beaglehole R, Kjellström T. Basic Epidemiology. World Health Organization, second edition.
10. Clifford Blair R, Richard A. Taylor. Biostatistics for the Health Sciences. Upper Saddle River, New Jersey (2008).
11. David Bowers, Allan House, David Owens. Understanding Clinical Papers. John Wiley & Sons Limited (2012).
12. David Machin, Michael J. Campbell, Stephen J. Walters. Medical Statistics: A Textbook for the Health Sciences. John Wiley & Sons Limited, fourth edition.
13. Gail F Dawson. Easy Interpretation of Biostatistics: The Vital Link to Applying Evidence in Medical Decisions.
14. James F. Jekel, David L. Katz, Joan G. Elmore. Epidemiology, Biostatistics, and Preventive Medicine. Saunders, second edition.
15. Kenneth F. Schulz, David A. Grimes. The Lancet Handbook on Essential Concepts in Clinical Research.
16. Kothari CR. Research Methodology: Methods and Techniques. New Age International Publishers, second edition (2011).
17. Mahajan BK. Methods of Biostatistics for Medical Students and Research Workers. Jaypee, sixth edition.
18. Nigel Bruce, Daniel Pope, Debbi Stanistreet. Quantitative Methods for Health Research. John Wiley & Sons Limited (2008).
19. Osuala EC. Introduction to Research Methodology. Africana-Fep Publishers Limited, third edition (2001).
20. Stephen H. Gehlbach. Interpreting Medical Literature. McGraw-Hill, fifth edition (2006).
21. Sylvia Wassertheil-Smoller. Biostatistics and Epidemiology: A Primer for Health Professionals. Springer-Verlag, New York.

Practice Questions
1. The data from one of the following types of variables cannot be ordered
A. Nominal
B. Ordinal
C. Discrete
D. Continuous
E. None of the above

2. All of the following are synonyms except
A. Determinant
B. Predictor
C. Exposure
D. Factor
E. Outcome

3. Which of these variables differs from the rest
A. Religion
B. Educational status
C. Ethnicity
D. Marital status
E. None of the above

4. The following are examples of data except
A. “80kg”
B. “Female”
C. “Australia”
D. “Blood sugar”
E. “One Dollar”

5. “7 patients” is an example of which type of data?
A. Continuous
B. Binomial
C. Ordinal
D. Discrete
E. Nominal

6. The data for one of these are usually derived
A. Body Mass Index
B. Height
C. Blood Cholesterol
D. Waist-Hip-Ratio
E. A and D

7. All of these are synonymous except
A. Binary
B. Dichotomous
C. Ordinal
D. Binomial
E. None of the above

8. Which of the following is NOT a parameter
A. Mean weight of a sample of primary six students in a country
B. Median age of marriage among the women in a province
C. Proportion of cars that are of Toyota make in a park
D. Standard deviation of household income in a state
E. Sensitivity of a diagnostic test among patients attending a hospital

9. Which of the following scales of measurement is known for not having a true zero origin
A. Ratio
B. Ordinal
C. Interval
D. Nominal
E. None of the above

10. Which of these arrangements is NOT correct
A. Population ↔ Statistic
B. Descriptive statistics ↔ Sample
C. Inferential statistics ↔ Parameter
D. Population mean ↔ Inferential statistics
E. Parameter ↔ Population

11. All the following notations are peculiar to descriptive statistics except
A. s
B.
C. μ
D. s²
E. None of the above

12. Which of the following scales of measurement has all the properties of others
A. Interval
B. Nominal
C. Ratio
D. Ordinal
E. None of the above

13. Differentiate between:
a. Population and Sample
b. Variable and Data
c. Parameter and Statistic
d. Descriptive Statistics and Inferential Statistics
