Bstat


A variable

• is anything that varies
• is some characteristic of a population or a sample
• is a measurable quantity which changes over space or time, e.g. time, cost of
goods sold, number of suppliers, type of specification, volume and value of stock,
return on net assets
NB:
• The values of the variable are the possible
observations of the variable.
• Data are the observed values of a variable or
in simple terms, these are raw facts.
• For example: coursework marks for 10
students.
12 21 22 10 28 1

• Incidentally, data is the plural of datum, i.e. the mark of one student is a datum.
TYPES OF VARIABLE
Variable
- Quantitative variable (continuous & discrete)
- Qualitative variable
• A variable may be categorical (qualitative) or non-categorical (quantitative).
• Quantitative variables (discrete or continuous) can be transformed into qualitative data through category creation.
• Qualitative variables cannot be meaningfully transformed into quantitative data; coding their values with numbers does not make them quantitative.
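
As a minimal sketch of category creation, the Python snippet below groups ages (a quantitative variable) into labelled bands (a qualitative variable). The function name `age_group`, the cut-off ages, the band labels and the sample values are illustrative assumptions, not part of the course material.

```python
def age_group(age: int) -> str:
    """Map a quantitative age (in years) to a qualitative category.

    The cut-offs 25 and 55 are hypothetical and chosen only for illustration.
    """
    if age < 25:
        return "young"
    if age < 55:
        return "middle-aged"
    return "older"


ages = [20, 18, 25, 68, 23]              # quantitative (discrete) values
groups = [age_group(a) for a in ages]    # qualitative labels
print(groups)

# Coding categories with numbers does not make them quantitative:
# these codes are labels only, so averaging them would be meaningless.
region_codes = {"Western": 1, "Eastern": 2, "Northern": 3, "Central": 4}
```

The reverse direction is not possible: nothing in the label "young" allows the original age to be recovered.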
Types of Data

The objective of statistics is to extract information from data. There are different
types of data. To help explain this important principle, there is a need to understand
some terms.
TYPES OF DATA
By Nature: Quantitative or Qualitative

By timeframe:
• Cross-section data - data values observed at a fixed point in time
• Time series data - ordered data values observed over time
• Panel data - data observed over time from the same units of observation

By Source: Primary or Secondary

• Primary data - data gathered for the first time by the researcher
• Secondary data - data taken by the researcher from secondary sources, internal or
external, i.e. already published records or compilations
Advantages of primary data

• The data is original.
• The information obtained is unbiased.
• It provides accurate information and is more reliable.
• It allows the researcher to capture changes occurring over the course of time.
• It is up-to-date data, relevant and specific to the required product.
Disadvantages of primary data

• It is time consuming to collect.
• It requires skilled researchers in order to be collected.
• It needs a big sample size in order to be accurate.
• It is more costly to collect.
Advantages of Secondary data

• It is economical. It saves expense and effort since it is obtainable from other sources.
• It is time saving, since it is more quickly obtainable than primary data.
• It provides a basis for comparison for data collected by the researcher.
• It helps to make the collection of primary data more specific, since with the help of
secondary data one is able to identify the gaps and deficiencies so that the additional
or missing information may be collected.
Disadvantages of Secondary data

• Accuracy of secondary data is not known.
• Data may be outdated.
• It may not fit in the framework of the research factors, for example the units used.
• Users of such data may not have as thorough an understanding of the background as
the original researcher.
DATA COLLECTION

Considerations to make before data collection

• Statement of the purpose
  - It should be clearly stated to avoid confusion
  - Only necessary information is collected
• Scope of inquiry
  - Based on space or time (geographical coverage and time period)
• Choice of statistical unit
• Data sources
• Data collection technique
  - Depends on the time available, literacy of the respondents, language,
    availability of resources, and the accuracy required
DATA COLLECTION METHODS
• Questionnaire
This can take the form of a self-administered questionnaire or one completed by an
interviewer.
• Direct interview
Many researchers believe that the best way to survey people is by means of personal
interviews, which involve an interviewer soliciting information from a respondent by
asking prepared questions.
• Observation
This is data collected through direct observation.
• Experimentation
A more expensive but better way to produce data is through experimentation. Data
produced in this manner is called experimental data, e.g. data from randomized controlled trials.

Question: Explain the advantages and disadvantages of the above methods


Sources of data
• Primary source
This refers to collecting data directly from the field, e.g. information collected by
population census enumerators, business survey enumerators, etc.
• Secondary source
This refers to collecting data from published or unpublished compilations, e.g. journals,
newspapers, magazines, sales records, production records, textbooks, etc. Examples include:
  - Trade associations (e.g. KACITA)
  - Commercial services
  - National and international institutions (e.g. URA, UBOS, etc.)
MEASUREMENT SCALES
Measurement scales for Qualitative data
• Nominal scale
• Ordinal scale
Measurement scales for Quantitative data
• Interval scale
• Ratio scale
NOMINAL SCALE

• It is used for variables that can be measured by classification only. It is
non-numerical in nature and involves only naming.
• Categories without a meaningful order identify nominal data (sex, political
affiliation, industry classification, ethnic/cultural groups, cause of defectives).
EXAMPLES OF NOMINAL SCALES

Sex of respondent    1 = Male
                     2 = Female

Region               1 = Western
                     2 = Eastern
                     3 = Northern
                     4 = Central

Position held        1 = Owner
                     2 = Manager
                     3 = Director
                     4 = Other, Specify……………
Ordinal scale

• It involves ordering (the order is what is important and significant).
• It is a measurement scale based on the ranking of ordered categories, e.g. in an
athletics competition we have the first, second, third, etc.

Example:
Response scale: 1 = SD, 2 = D, 3 = N, 4 = A, 5 = SA

Tax Registration
• Tax officers are helpful to us when it comes to registering for taxes.
• We find it easy registering for taxes.
• We do not lose so much time at registration for taxes.
INTERVAL SCALE
• Interval scales are numeric scales in which we know not only the order, but also the
exact differences between the values.
• A classic example of an interval scale is time (for example, clock or calendar time),
in which the increments are known, consistent, and measurable.
• Interval scales are useful because the realm of statistical analysis on these data sets
opens up. For example, central tendency can be measured by mode, median, or mean;
standard deviation can also be calculated.
INTERVAL SCALE-CONTINUED

NB:
The key point of an “interval scale” is easy to remember: “interval” itself means
“space in between”. Interval scales tell us not only about order, but also about the
size of the difference between each item.
Ratio scale

Ratio scales are the most informative measurement scales because they tell us about
• the order,
• the exact value between units, and
• an absolute zero, which allows a wide range of both descriptive and inferential
statistics to be applied.
• Everything above about interval data applies to ratio scales; in addition, ratio scales
have a clear definition of zero. Good examples of ratio variables include height and weight.
RATIO SCALE CONTINUED

• Ratio scales provide a wealth of possibilities when it comes to statistical analysis.
These variables can be meaningfully added, subtracted, multiplied, and divided (ratios).
• Central tendency can be measured by mode, median, or mean; measures of dispersion,
such as standard deviation and coefficient of variation, can also be calculated from
ratio scales.
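
The measures named above can be sketched in Python with the standard `statistics` module. The weights below are made-up illustrative values, and the sample (rather than population) standard deviation is an assumption of this sketch.

```python
import statistics

# Hypothetical weights in kilograms (ratio scale: 0 kg means "no weight").
weights = [60, 65, 70, 75, 80]

mean = statistics.mean(weights)        # central tendency
median = statistics.median(weights)
sd = statistics.stdev(weights)         # sample standard deviation
cv = 100 * sd / mean                   # coefficient of variation, in percent

# Ratios are meaningful on a ratio scale: how many times heavier
# the heaviest observation is than the lightest.
ratio = weights[-1] / weights[0]

print(f"mean={mean}, median={median}, sd={sd:.2f}, cv={cv:.1f}%, ratio={ratio:.2f}")
```

The coefficient of variation divides by the mean, which is only sensible when zero is a true zero; this is why it is listed for ratio scales and not for interval scales.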
SUMMARY OF DATA TYPES AND SCALE MEASURES
IN SUMMARY

• Nominal scales are used to “name”, or label, a series of values.
• Ordinal scales provide good information about the order of choices, such as in a
customer satisfaction survey.
• Interval scales give us the order of values plus the ability to quantify the
difference between each one.
• Ratio scales give us the ultimate: order, interval values, plus the ability to
calculate ratios, since a “true zero” can be defined.
DATA PRESENTATION
METHODS
• Text presentation (use of statements)
• Diagrammatic presentation (pie charts, pictograms)
• Graphical presentation (cumulative frequency curve, histogram, scatter plot, etc.)
• Tabular presentation (frequency table, relative frequency table and cumulative
frequency table)
TABULAR PRESENTATION

EX1: A sample of 30 persons showed their ages as follows

20 18 25 68 23
25 16 22 29 37
35 49 42 65 37
42 63 65 49 42
53 48 65 72 69
57 48 39 58 67
FREQUENCY DISTRIBUTION
IS THE LISTING OF ALL POSSIBLE VALUES OBTAINED, TOGETHER
WITH THE FREQUENCY WITH WHICH THESE VALUES OCCUR
Constructing a frequency distribution
• The classes should be clearly defined and each observation should be included in
only one class interval.
• The number of classes should be neither too large nor too small. Normally 6 to
15 are considered adequate. Fewer class intervals would mean greater interval
width with a consequent loss of accuracy. Too many class intervals result in
greater complexity.
• All intervals should be of the same width. This is preferred for easy
computation.
Class width = range / number of classes
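
As a rough illustration of the class-width formula, the Python sketch below applies it to the 30 ages from EX1. Choosing 6 classes and rounding the width up to the next multiple of 5 are assumptions made here for convenience; the slides only give the formula itself.

```python
import math

# The 30 ages from EX1.
ages = [
    20, 18, 25, 68, 23,
    25, 16, 22, 29, 37,
    35, 49, 42, 65, 37,
    42, 63, 65, 49, 42,
    53, 48, 65, 72, 69,
    57, 48, 39, 58, 67,
]

num_classes = 6                         # assumed, within the suggested 6 to 15
data_range = max(ages) - min(ages)      # 72 - 16 = 56
raw_width = data_range / num_classes    # about 9.33

# Assumption: round up to the next multiple of 5 so class limits are simple.
class_width = math.ceil(raw_width / 5) * 5

print(data_range, round(raw_width, 2), class_width)
```

With a width of 10, six classes starting at 15 cover every age from 16 to 72.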
CONSTRUCTING A FREQUENCY DISTRIBUTION
CONTINUED …………..
• Intervals have to be continuous throughout the distribution. Having them
grouped as 20 to less than 25, 25 to less than 30, etc. simplifies the
computations, i.e. everybody who is between 20 and a fraction less than 25 is
included in the first category, and so on.
• The lower limits of the class intervals should be simple multiples of the interval
width. For the classes 20 to less than 25, 25 to less than 30, etc., the lower limits
are 20, 25, etc., and each is a simple multiple of the class width, which is 5.
CONSTRUCTING A FREQUENCY DISTRIBUTION
CONTINUED …………..

Class intervals (C.I)    Tally    Frequency (f)
15 to less than 25                5
25 to less than 35                3
35 to less than 45                7
45 to less than 55                5
55 to less than 65                3
65 to less than 75                7
Total                             30
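
The tallying behind this table can be sketched in Python as follows. The variable names are illustrative; the starting limit of 15 and the width of 10 are read off the table above.

```python
# The 30 ages from EX1.
ages = [
    20, 18, 25, 68, 23,
    25, 16, 22, 29, 37,
    35, 49, 42, 65, 37,
    42, 63, 65, 49, 42,
    53, 48, 65, 72, 69,
    57, 48, 39, 58, 67,
]

lower_limit, width, num_classes = 15, 10, 6

# Each age falls in exactly one class: "lo to less than lo + width".
freq = [0] * num_classes
for age in ages:
    index = (age - lower_limit) // width
    freq[index] += 1

for i, f in enumerate(freq):
    lo = lower_limit + i * width
    print(f"{lo} to less than {lo + width}: {f}")
print("Total:", sum(freq))
```

Integer division by the class width assigns a boundary value such as 25 to the class that starts at 25, which is what "to less than" requires.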
• While the frequency table tells us the number of units in each class interval, it
does not tell us directly the number of units that lie below or above the
specified values of the class interval. This can be determined from a
cumulative frequency distribution.
• When the interest of the investigator focuses on the number of cases below the
specified value, then the specified value represents the upper limit of the class
interval (less than).
• When the interest lies in finding the number of cases above the specified value,
then the specified value represents the lower limit of the class interval (more than).
CUMULATIVE FREQUENCY DISTRIBUTION

Class intervals (C.I)    Frequency (f)    Cum. Freq (less than)    Cum. Freq (more than)
15 to less than 25       5                5                        30
25 to less than 35       3                8                        25
35 to less than 45       7                15                       22
45 to less than 55       5                20                       15
55 to less than 65       3                23                       10
65 to less than 75       7                30                       7
Total                    30
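
Both cumulative columns are running totals of the frequency column, taken from the top for "less than" and from the bottom for "more than". A minimal Python sketch, using `itertools.accumulate` from the standard library (variable names are illustrative):

```python
from itertools import accumulate

freq = [5, 3, 7, 5, 3, 7]   # class frequencies, lowest class first

# "Less than": cases below each class's upper limit (running total from the top).
less_than = list(accumulate(freq))

# "More than": cases at or above each class's lower limit
# (running total from the bottom, then put back in class order).
more_than = list(accumulate(reversed(freq)))[::-1]

print(less_than)
print(more_than)
```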
RELATIVE FREQUENCY DISTRIBUTION

• The frequency distribution is a summary table in which the original data is
condensed into groups and their frequencies. If a researcher would like to know
the proportion or the percentage of cases in each group, this can be done by
constructing a relative frequency distribution table.
RELATIVE FREQUENCY DISTRIBUTION
CONT ………

Class intervals (C.I)    Frequency (f)    Relative Frequency    % Freq
15 to less than 25       5                5/30                  16.7
25 to less than 35       3                3/30                  10.0
35 to less than 45       7                7/30                  23.3
45 to less than 55       5                5/30                  16.7
55 to less than 65       3                3/30                  10.0
65 to less than 75       7                7/30                  23.3
Total                    30

In your free time, try out the cumulative relative frequency distribution
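
The relative and percentage frequencies are each class frequency divided by the total. A short Python sketch (variable names are illustrative, and rounding to one decimal place follows the 16.7 shown in the first row of the table):

```python
freq = [5, 3, 7, 5, 3, 7]   # class frequencies, lowest class first
total = sum(freq)           # 30

relative = [f / total for f in freq]             # proportions, e.g. 5/30
percent = [round(100 * r, 1) for r in relative]  # percentages, 1 decimal place

for f, r, p in zip(freq, relative, percent):
    print(f"{f}/{total} = {r:.3f} = {p}%")
```

The proportions always sum to 1; the rounded percentages may differ from 100 by a small rounding error in other data sets.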
GRAPHICAL PRESENTATION
Histogram

• In this type of representation, the given data are plotted in the form of a series
of rectangles.
• Class intervals are marked on the X-axis and the frequencies along the Y-axis
according to a suitable scale.
• Unlike the bar chart, which is one dimensional, a histogram is two dimensional:
the length and the width are both important.
GIVEN THE FOLLOWING INFORMATION, CONSTRUCT A HISTOGRAM

Class intervals (C.I)    Frequency (f)
15 to less than 25       5
25 to less than 35       3
35 to less than 45       7
45 to less than 55       5
55 to less than 65       3
65 to less than 75       7
Total                    30
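
As a rough stand-in for a drawn histogram, the Python sketch below prints one row of `#` marks per class, with one mark per observation. This is only a text approximation under stated assumptions: the bars run horizontally here, whereas a proper histogram places the class intervals on the X-axis with adjacent rectangles. Because all classes have the same width (10), bar length alone reflects frequency.

```python
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
freq = [5, 3, 7, 5, 3, 7]

# One text row per class; the number of '#' marks equals the class frequency.
lines = [f"{lo} to <{hi} | {'#' * f}" for (lo, hi), f in zip(classes, freq)]
print("\n".join(lines))

# Frequency density (frequency per unit of class width) is what a bar's
# height should show if class widths were ever unequal.
density = [f / (hi - lo) for (lo, hi), f in zip(classes, freq)]
```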
GRAPHICAL PRESENTATION CONT ….
Frequency Polygon

• Refers to a line chart of a frequency distribution in which either the values of
discrete variables or the midpoints of class intervals are plotted against the
frequencies, and these plotted points are joined together by straight lines.
• To form a polygon, the first midpoint is joined to a “fictitious” preceding
midpoint whose frequency is zero, so that the beginning of the curve touches the
horizontal axis. The same must be done for the last midpoint.
GRAPHICAL PRESENTATION CONT ….

Given the available information construct a frequency polygon

Class intervals (C.I)    Frequency (f)    Mid-point
15 to less than 25       5                20
25 to less than 35       3                30
35 to less than 45       7                40
45 to less than 55       5                50
55 to less than 65       3                60
65 to less than 75       7                70
Total                    30
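
The points to plot for the frequency polygon can be listed with a short Python sketch. The variable names are illustrative; the fictitious midpoints at 10 and 80 follow from stepping one class width (10) below the first midpoint and above the last.

```python
lower_limits = [15, 25, 35, 45, 55, 65]
width = 10
freq = [5, 3, 7, 5, 3, 7]

# Midpoint of each class: lower limit plus half the class width.
midpoints = [lo + width / 2 for lo in lower_limits]

# Close the polygon with fictitious midpoints of frequency zero at both ends.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, freq))
    + [(midpoints[-1] + width, 0)]
)
print(points)
```

Joining these points in order with straight lines gives a polygon that touches the horizontal axis at both ends.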
CUMULATIVE FREQUENCY CURVE

Class intervals (C.I)    Cum. Freq (less than ogive)    Cum. Freq (more than ogive)
15 to less than 25       5                              30
25 to less than 35       8                              25
35 to less than 45       15                             22
45 to less than 55       20                             15
55 to less than 65       23                             10
65 to less than 75       30                             7
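
A sketch of the coordinates for the two curves, in Python. It assumes the usual plotting convention, which the slides do not state explicitly: the "less than" ogive is plotted against upper class limits and the "more than" ogive against lower class limits.

```python
lower_limits = [15, 25, 35, 45, 55, 65]
upper_limits = [25, 35, 45, 55, 65, 75]
less_than = [5, 8, 15, 20, 23, 30]
more_than = [30, 25, 22, 15, 10, 7]

# "Less than" ogive: starts at zero at the lowest limit and rises to the total.
less_than_points = [(lower_limits[0], 0)] + list(zip(upper_limits, less_than))

# "More than" ogive: starts at the total and falls to zero at the highest limit.
more_than_points = list(zip(lower_limits, more_than)) + [(upper_limits[-1], 0)]

print(less_than_points)
print(more_than_points)
```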
DIAGRAMMATIC PRESENTATION

Bar diagram
Suppose that the following were the gross revenues for xxx company for the years 2014,
2015 and 2016:

Year    Revenues ($)
2014    120
2015    100
2016    60

Construct a bar diagram


DIAGRAMMATIC PRESENTATION CONT ….
Construct a sub-divided bar chart for the 3 types of expenditure, in dollars, of a
family of four for the years 2014, 2015, 2016 and 2017.

Year    Food    Education    Other
2014    3000    2000         3000
2015    3500    3000         4000
2016    4000    3500         5000
2017    5000    5000         6000
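
In a sub-divided bar chart each year's bar is stacked, so the height at which each segment ends is a running total of that year's expenditures. A minimal Python sketch of those segment tops (the dictionary layout and names are illustrative):

```python
from itertools import accumulate

# Expenditure in dollars per year, in the order (Food, Education, Other).
spending = {
    2014: (3000, 2000, 3000),
    2015: (3500, 3000, 4000),
    2016: (4000, 3500, 5000),
    2017: (5000, 5000, 6000),
}

# Top of each stacked segment; the last value is the full bar height.
segment_tops = {year: list(accumulate(parts)) for year, parts in spending.items()}

for year, tops in segment_tops.items():
    print(year, tops)
```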


DIAGRAMMATIC PRESENTATION
(PIE DIAGRAM)
A PIE DIAGRAM ENABLES US TO SHOW THE PARTITIONING OF A TOTAL INTO ITS COMPONENT PARTS. THE
SIZE OF THE SLICE REPRESENTS THE PROPORTION OF THE COMPONENT OUT OF THE TOTAL.

Example:
The following figures relate to the expenditure on various construction components that are
used in building a house.

Item      Expenditure    Expenditure in degrees
Labour    3000           70
Cement    3500           81
Steel     4000           93
Timber    5000           116
Total     15500          360

Construct a pie diagram.
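
The degrees column can be reproduced by scaling each item's share of the total to the 360 degrees of a circle. A minimal Python sketch (names are illustrative; rounding to whole degrees matches the table):

```python
expenditure = {"Labour": 3000, "Cement": 3500, "Steel": 4000, "Timber": 5000}
total = sum(expenditure.values())   # 15500

# Angle of each slice = (item / total) * 360, rounded to whole degrees.
degrees = {item: round(360 * amount / total) for item, amount in expenditure.items()}

for item, angle in degrees.items():
    print(f"{item}: {angle} degrees")
print("Sum of angles:", sum(degrees.values()))
```

Here the rounded angles happen to sum to exactly 360; with other data, rounding can leave the sum a degree or two off, and one slice is then adjusted.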


Diagrammatic Presentation (Pictogram)
Pictogram means representation of data in form of pictures. It is quite a popular method used by governments
and other organizations for information exhibition.
CONCLUSION
1. Defined Statistics & statistical path
2. Described the Uses of Statistics
3. Distinguished Descriptive & Inferential Statistics
4. Defined Population, Sample, Parameter, and Statistic
5. Defined data & variable types
6. Explained data collection and presentation methods
