STA 111 - Topic One - Lecture 1
1.1 Objectives
By the end of the lecture, the learner should be able to:
i) Understand the meaning, nature, importance and limitations of statistics
ii) Explain the types of variables
iii) Classify measurements and data into various types
1.2 Introduction
4. Interpretation of data: Interpretation means drawing conclusions from the data
which form the basis of decision making. Correct interpretation requires a high
degree of skill and experience and is necessary in order to draw valid conclusions.
To present data in a concise and definite form – helps in classifying and tabulating
raw data for processing and further tabulation for other users.
To make it easy to understand complex and large data - permits summarization and
presentation of large quantities of information, i.e. it condenses and summarizes
voluminous data into a few presentable, understandable and precise figures. For
example, stock market prices of individual stocks and their trends are highly
complex to comprehend, but a graph of price trends gives us the overall picture at a
glance.
To undertake and understand research in our areas of interest - it helps in
determining the functional relationship between two or more phenomena. Statistical
techniques such as correlation analysis assist in establishing the degree of
association between two or more variables. For example, the coefficient
of correlation between literacy and employment gives us the degree of association
between the two.
Used in government and other organizations to formulate new programmes and
policies as well as in administration, i.e. it helps the central management and the
government in formulating policies. For example, the recently conducted census will be
used as a source of information for planning by the government for the next 10 years
until another census is conducted in 2019.
For comparison of variables in different sets of data - Arrangement of data with
respect to different characteristics facilitates comparison and interpretation. For
example, data on age, height, gender, and family income of college students gives us
a much better picture of students when the data is categorized relative to these
characteristics.
Aids in forecasting outcomes of future events - statistical methods are highly useful
tools in analyzing past data and predicting future trends, e.g. they help
businesses in decision making by providing future estimates and expectations. For
example, the sales of a particular product for the next year can be estimated by
knowing the sales of the same product over the previous years, the current market
trends and the possible changes in the variables that affect the demand for the
product.
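The forecasting idea above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that past sales follow a roughly linear trend; the yearly figures and the helper name `fit_trend` are hypothetical and do not come from the lecture. A real forecast would also take market trends and other demand factors into account.

```python
# Hypothetical yearly sales figures (illustrative only, not from the lecture).
years = [1, 2, 3, 4, 5]
sales = [100, 112, 119, 133, 141]


def fit_trend(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_trend(years, sales)

# Extend the fitted trend line one year beyond the observed data.
forecast_year_6 = intercept + slope * 6
print(f"Estimated sales for year 6: {forecast_year_6:.1f}")
```

The estimate is only as good as the assumption that the past trend continues, which is why forecasts are treated as estimates rather than certainties.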
2. Economics. Statistics are widely used in economic study and research. The subject of
economics is mainly concerned with production and distribution of wealth as well as
savings and investments. Some of the areas of economic interest in which statistical
tools are used are as follows:
(a) Statistical methods are extensively used in measuring and forecasting Gross
National Product ( GNP ).
(b) Economic stability is primarily judged by statistical studies of business cycles.
(c) Statistical analyses of population growth, unemployment figures, rural or urban
population shifts and so on influence much of the economic policy making.
(d) Econometric models, which involve the application of statistical methods, are used for
optimum utilization of available resources.
(e) Financial statistics are necessary in the fields of money and banking including
consumer savings and credit availability.
3. Physical, Natural and Social Sciences. In the physical sciences, for example, the science
of meteorology uses statistics to analyze the data gathered by satellites when predicting
weather conditions.
4. Statistics and Research. There is hardly any advanced research going on without the
use of statistics in one form or another. Statistics are used extensively in medical,
pharmaceutical and agricultural research. The effectiveness of a new drug is
determined by statistical experimentation and evaluation.
5. Other Areas. Statistics are commonly used by insurance companies, stock brokerage
firms, banks, public utility companies and so on. Statistics are also immensely useful to
politicians since they can predict their chance of winning through the use of sampling
techniques, randomly selecting a sample of voters and studying their attitudes on issues
and policies.
one bank to the other is four feet, when there may be some points in between where
its depth is far more than four feet. On this understanding, one may enter those
points having greater depth, which may be hazardous.
8. Since statistics are collected for a particular purpose, such data may not be relevant
or useful in other situations or cases. For example, secondary data (i.e., data
originally collected by someone else) may not be useful for the other person.
9. Statistics are not 100 per cent precise as is Mathematics or Accountancy. Those who
use statistics should be aware of this limitation.
10. In statistical surveys, sampling is generally used as it is not physically possible to
cover all the units or elements comprising the universe. The results may not be
appropriate as far as the universe is concerned. Moreover, different surveys based
on the same size of sample but different sample units may yield different results.
11. At times, association or relationship between two or more variables is studied in
statistics, but such a relationship does not indicate a 'cause and effect' relationship. It
simply shows the similarity or dissimilarity in the movement of the two variables. In
such cases, it is the user who has to interpret the results carefully, pointing out the
type of relationship obtained.
12. A major limitation of statistics is that it does not reveal everything pertaining to a certain
phenomenon. There is some background information that statistics does not cover.
Similarly, there are other aspects related to the problem at hand which are
also not covered. The user of statistics has to be well informed and should interpret
statistics keeping in mind all other aspects relevant to the given problem.
1.2.5 Misuses
Sometimes people, knowingly or unknowingly, use statistical data wrongly. Such
forms of misuse include:
i) Failure to give the sources of data: this may compromise the reliability of the data
because the user of such data will not know how well the data fit his/her
situation and cannot refer to the original source.
ii) Defective data: Defective data may be used knowingly in order to defend one's position or to
prove a particular point. Apart from this, the definition used to denote a certain
phenomenon may be defective. For example, in the case of data relating to unemployed
persons, the definition may include even those who are employed, though partially.
The question here is how far it is justified to include partially employed persons
amongst unemployed ones.
iii) Unrepresentative sample: In statistics, one often has to conduct a survey,
which requires choosing a sample from the given population or universe. The
sample may turn out to be unrepresentative of the universe. One may choose a
sample just on the basis of convenience. He may collect the desired information
from either his friends or nearby respondents in his neighbourhood even though
such respondents do not constitute a representative sample.
iv) Inadequate sample: At times one may conduct a survey based on an extremely
inadequate sample. For example, a city may have 100,000
households. When conducting a household survey, we may take a sample
of merely 100 households, comprising only 0.1 per cent of the universe. A survey
based on such a small sample may not yield reliable information.
v) Unfair comparisons: For instance, one may construct an index of production
choosing a base year in which production was unusually low, and then compare
the subsequent years' production against this low base. Such a comparison
exaggerates the growth in production and gives a misleading picture.
Another source of unfair comparisons is making absolute
comparisons instead of relative ones. An absolute comparison of two figures, say,
of production or export, may show a good increase, but in relative terms it may
turn out to be negligible. Another example of unfair comparison is when the
populations of two cities are different, but a comparison of overall death rates and
deaths from a particular disease is attempted. Such a comparison is wrong. Likewise,
when data are not properly classified or when changes in the composition of the
population in the two years are not taken into consideration, comparisons of such
data would be unfair as they would lead to misleading conclusions.
vi) Unwarranted conclusions: These may result from making false assumptions. For
example, while making projections of population for the next five years, one may
assume a lower rate of growth though the past two years indicate otherwise.
Sometimes one may not be sure about the changes in the business environment in the
near future. In such a case, one may use an assumption that may turn out to be
wrong. Another source of unwarranted conclusions may be the use of the wrong
average. Suppose a series contains extreme values, one too high and the
other too low, such as 800 and 50. The use of an arithmetic average in such a case
may give a wrong idea. Instead, the harmonic mean would be proper in such a case.
vii) Confusion of correlation and causation: In statistics, one often has to
examine the relationship between two variables. A close relationship between the
two variables may not establish a cause-and-effect relationship in the sense that one
variable is the cause and the other is the effect. It should be taken as something that
measures the degree of association rather than as evidence of a causal relationship.
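To see how much the choice of average matters (item vi above), the short Python sketch below compares averages using the standard `statistics` module. The pair 800 and 50 is taken from the text; the longer series `skewed` is a hypothetical example added for illustration. Which average is appropriate depends on the data and the purpose, and the median is another common choice when a series contains extreme values.

```python
from statistics import harmonic_mean, mean, median

# The two extreme values quoted in the text.
values = [800, 50]
arithmetic = mean(values)
harmonic = harmonic_mean(values)
print(f"Arithmetic mean of {values}: {arithmetic}")
print(f"Harmonic mean of {values}: {harmonic:.1f}")

# Hypothetical series with one extreme value (illustrative only).
skewed = [50, 60, 70, 80, 800]
skewed_mean = mean(skewed)
skewed_median = median(skewed)
print(f"Arithmetic mean of {skewed}: {skewed_mean}")
print(f"Median of {skewed}: {skewed_median}")
```

In the second series a single extreme value pulls the arithmetic mean well above four of the five observations, while the median stays with the bulk of the data.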
1. Descriptive: statistics that summarize the characteristics of given data, without
trying to extrapolate or make predictions. Utilizes numerical and graphical methods
to summarize the information, look for patterns in the data set and present the
information in a convenient form (describes or summarizes things you definitely
know).
2. Inferential: statistics used to make claims or predictions about the larger
population based on a subset (sample) of that population. Utilizes sample data to
make estimates, decisions, predictions and other generalizations about a larger set of
data (compares groups, tests hypotheses, predicts or infers).
The conclusions made are called statistical inferences; they cannot be absolutely
certain, hence the need to use probability in drawing conclusions.
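The difference between the two branches can be illustrated with a small Python sketch. The sample of heights is hypothetical, and the interval uses the normal multiplier 1.96 as a simplifying assumption; for a sample this small a t-based multiplier would give a somewhat wider interval.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample: heights (cm) of 8 students drawn from a larger class.
sample = [160, 165, 158, 172, 168, 175, 162, 170]

# Descriptive statistics: summarize what was actually observed.
sample_mean = mean(sample)
sample_sd = stdev(sample)  # sample standard deviation (n - 1 divisor)
print(f"Sample mean: {sample_mean:.2f} cm, sample SD: {sample_sd:.2f} cm")

# Inferential statistics: use the sample to say something about the population.
standard_error = sample_sd / sqrt(len(sample))
low = sample_mean - 1.96 * standard_error   # assumed normal approximation
high = sample_mean + 1.96 * standard_error
print(f"Rough 95% interval for the class mean: {low:.1f} to {high:.1f} cm")
```

The first two figures describe the eight students only. The interval is a probabilistic statement about the whole class, which is why it comes with a confidence level rather than certainty.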
Remark:
In this course, you will study numerical and graphical ways to describe and display
your data. This area of statistics is what we have called "Descriptive Statistics." You
will learn how to calculate, and even more importantly, how to interpret these
measurements and graphs.
1.3 Data
Continuous Variable - Takes any value within a specified range. E.g. Height of
students.
Constant Variable - Takes only one value. E.g. Number of hours in a day.
b) Continuous Data
Continuous data can take any value, or any value within a range or an interval. Most
data measured on interval and ratio scales, other than those based on counting, are
continuous.
Example: the weight and height of students, the distance from town to campus, and the income
received by an employee are all continuous.
2. Qualitative data (categorical data, i.e. data that yield responses such as Yes or No,
for example, "Did you buy the books?")
Qualitative data cannot be measured on a natural numerical scale; they can only be
classified into groups or categories. They take on values that are names or labels. The categories
are non-overlapping and may or may not suggest an order or rank.
Examples: The political party affiliations in a sample of 50 chief executive officers, the
size of a car (subcompact, compact, mid-size, or full-size) rented by each of a sample of
30 business travelers, a coffee tester's ranking (best, worst, etc.) of four brands of coffee
for a panel of 10 testers.
These data can be classified into three types: Attribute, Nominal and Ordinal.
a) Attribute Data: Also known as dichotomous data. These data have only two
categories.
Example: yes/no, male/female.
b) Nominal Data: These data have several unordered categories.
Example: type of an insurance policy (motor, medical, fire, burglary, life insurance
policies)
c) Ordinal or Ranked Data: These data have several ordered categories.
Example: Questionnaire responses such as Strongly Agree ... Strongly Disagree to
statements like: "I am the best student in my class", "My classmates are very co-operative", "I
live in the best hostel". Other examples: muscle response (none, partial, complete), tree vigor (healthy,
sick, dead), income (< KSh 9,999; KSh 10,000 - KSh 19,999; KSh 20,000 - KSh 49,999; >
KSh 50,000).
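The practical difference between nominal and ordinal data can be sketched in Python. The responses below are hypothetical, and the three intermediate levels of the agreement scale (Disagree, Neutral, Agree) are an assumption, since the text names only the two end points.

```python
from collections import Counter

# Nominal data: unordered labels. Counting is meaningful, ranking is not.
policies = ["motor", "life", "medical", "motor", "fire", "motor"]  # hypothetical
policy_counts = Counter(policies)
most_common_policy = policy_counts.most_common(1)[0][0]
print(f"Most common policy type: {most_common_policy}")

# Ordinal data: labels with a natural order, here from lowest to highest.
# The intermediate levels are assumed for illustration.
SCALE = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
responses = ["Agree", "Strongly Agree", "Neutral", "Agree", "Disagree"]  # hypothetical

# Because the categories are ordered, responses can be ranked and a
# middle (median) category identified.
ranks = sorted(SCALE.index(r) for r in responses)
median_response = SCALE[ranks[len(ranks) // 2]]
print(f"Median response: {median_response}")
```

For the nominal list only the mode is a sensible summary. For the ordinal list the median category is also meaningful, but the gaps between categories are not assumed to be equal, so an arithmetic mean of the ranks would be harder to justify.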
Remark:
In economics, data is also often categorized by how it relates to time.
Cross-sectional data.
In cross-sectional data, all observations come from the same point in time. The
observations typically correspond to individuals or groups like states or countries. For
instance, a survey of Americans on who they support in the upcoming presidential
election is cross-sectional data. So is a data set with the homicide rate for each state in a
single year.
Longitudinal or time-series data.
In longitudinal or time-series data, each data point corresponds to a particular point in
time – usually for a single individual or group. For instance, if you recorded your
income every day for a year, that would give you a longitudinal data set. The GDP of
the U.S. from 1945 to the present is also a longitudinal data set.
Panel data.
Panel data is both cross-sectional and longitudinal. It involves getting cross-sectional
data for many time periods (or, alternatively, time-series data for many different
individuals or groups). For instance, if you recorded the income for each one of your
classmates every year for the next 20 years, that would be a panel data set.
One way to think of this is in terms of dimensions. Both cross-sectional and time-series
data are one-dimensional; panel data is two-dimensional.
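The "dimensions" view can be made concrete with a small Python sketch. The names and income figures are hypothetical; the point is only that a panel is indexed by both individual and time, and that fixing one index recovers a cross-section or a time series.

```python
# Hypothetical annual incomes in KSh (illustrative values, not from the lecture).
# Each key is (person, year): two dimensions, so this is a panel data set.
panel = {
    ("Amina", 2020): 300_000, ("Amina", 2021): 320_000, ("Amina", 2022): 350_000,
    ("Brian", 2020): 280_000, ("Brian", 2021): 290_000, ("Brian", 2022): 310_000,
    ("Chege", 2020): 400_000, ("Chege", 2021): 410_000, ("Chege", 2022): 430_000,
}

# Cross-sectional slice: fix the year, vary the individual.
cross_section_2021 = {
    person: income for (person, year), income in panel.items() if year == 2021
}

# Time-series slice: fix the individual, vary the year.
time_series_amina = {
    year: income for (person, year), income in panel.items() if person == "Amina"
}

print(cross_section_2021)
print(time_series_amina)
```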
N.B.: In experimental methods, the researcher controls the independent variables,
while in non-experimental methods there is no such control.
Sources of Data
There are two main sources of data: primary and secondary
sources. There is also a third source known as internal data.
ii) Questionnaire – commonly used in surveys to ask people questions (questioning).
A formal list of questions, either open-ended or closed-ended, to which the
respondent gives answers. May be administered by telephone, mail, in person,
electronic mail, fax, etc.
iii) Direct Observation - When data are collected by observation, the investigator asks
no questions and may or may not let the person being observed know that he/she is
being observed.
iv) Interviews – face to face with the respondent. They are slow and expensive and may take
respondents away from their working hours, but they allow in-depth and follow-up questioning.
v) Experiments – subjects are divided into treatment groups and control groups to
measure the difference between them after some kind of treatment is given to the
former group. This is very common in medical testing.
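The comparison made in an experiment (item v above) can be sketched as a difference between group means. The recovery times below are hypothetical, and the sketch only shows the descriptive comparison; deciding whether such a difference is larger than chance variation would require an inferential test, which is outside this lecture.

```python
from statistics import mean

# Hypothetical recovery times in days (illustrative only).
treatment_group = [5, 6, 4, 5, 6]   # subjects who received the treatment
control_group = [7, 8, 6, 7, 8]     # subjects who did not

treatment_mean = mean(treatment_group)
control_mean = mean(control_group)

# A negative difference means the treatment group recovered faster on average.
difference = treatment_mean - control_mean
print(f"Treatment mean: {treatment_mean:.1f} days")
print(f"Control mean:   {control_mean:.1f} days")
print(f"Difference:     {difference:.1f} days")
```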
Exercise 1.1
1. Describe the meaning of each of the following terms:
i) Statistics.
ii) Data
iii) Frequency distribution
2. Discuss four functions of statistics.
3. What are the major limitations of Statistics? Explain with suitable examples.
4. Distinguish between the following terms as used in statistics:
i) Descriptive and inferential statistics.
ii) Target population and sample.
iii) Census and sample survey.
iv) Nominal and interval measurement.
v) Quantitative Data and Qualitative Data.
5. Explain the two main sources of data.
6. Categorize these measurements according to their level:
i) Students' performance: Distinction, Pass, Fail
ii) Annual net income for Afya Insurance in 2012
iii) Names of insurance products
iv) Religious preference of tourists
v) Room temperature measured in Kelvin scale
vi) The length of time spent in a restaurant
vii) The rank of an army officer
viii) The type of a vehicle driven by the president
ix) The mass of a pig
7. State which of the following variables are discrete and which are continuous:
i) Height of a person
ii) Number of employees in ABC bank
iii) Temperature on a certain day
iv) Age of a building
v) Length of a train journey
vi) Time taken to complete a project
vii) Volume of water in a container
viii) Number of children in a family
8. Classify the following examples of data as nominal, ordinal, interval or ratio giving
reasons for each:
i) The species of trees growing in a farm
ii) The grades of students at the end of semester exams
iii) The financial stability of banks in Kenya
iv) The number of years of service of all employees in Karatina University
v) Favorite rainbow colours among a sample of 50 pupils in ABC school.
vi) The number of defective bulbs produced by XYZ factory between January and
May 2000.
9. List the various data collection methods you know of.
10. Research question:
Write down the advantages of data classification.