Topic-1 - STA 104 - 124


Course:

Elements of Statistics & Probability (STA 104)

o Introduction and Descriptive plots


Instructor:
Dr. Md. Sohel Rana
Associate Professor of Statistics,
Department of Mathematical & Physical Sciences,
East West University.
Email: [email protected]
Introduction
Statistics is a field of study concerned with
1- collection, organization, summarization and
analysis of data.
2- drawing of inferences about a body of data when
only a part of the data is observed.
Statisticians try to interpret and communicate the
results to others.

Dr. Md. Sohel Rana, App.Stat, EWU 2


Common Problem
• Decision or prediction about a large body of
measurements which cannot be totally enumerated.

Examples
• Knowing the health quality of the people in
Bangladesh.
• Testing light bulbs (enumerating the whole population would
destroy it)
• Forecasting the winner of an election (population
too big; people change their minds)
Solutions
Collect a smaller set of measurements that will
(hopefully) be representative of the larger set.
Data
• The raw material of Statistics is data.
• We may define data as figures. Figures result from
the process of counting or from taking a
measurement.
For example:
- When a hospital administrator counts the number
of patients (counting).
- When a nurse weighs a patient (measurement)



Sources of Data:
We search for suitable data to serve as the raw
material for our investigation.
Such data are available from one or more of the
following sources:
1- Routinely kept records.
For example:
- Hospital medical records contain immense
amounts of information on patients.
- Hospital accounting records contain a wealth of
data on the facility’s business activities.



2- External sources.
The data needed to answer a question may already
exist in the form of
published reports, commercially available data
banks, or the research literature, i.e. someone else
has already asked the same question.



3- Surveys:
The source may be a survey, if the data needed are
the answers to certain questions.
For example:
If the administrator of a clinic wishes to obtain
information regarding the mode of transportation
used by patients to visit the clinic, then a survey may
be conducted among patients to obtain this
information.



4- Experiments.
Frequently the data needed to answer
a question are available only as the
result of an experiment.
For example:
If a nurse wishes to know which of several
strategies is best for maximizing patient
compliance, she might conduct an experiment in
which the different strategies of motivating
compliance are tried with different patients.



Introduction to Statistical Terms
A variable is a characteristic that takes on different
values in different persons, places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.



Introduction to Statistical Terms
Data Set
o A collection of data values
Observation
o The value, at a particular period, of a particular variable
An experimental unit is the individual or object on which a
variable is measured.
A measurement results when a variable is actually measured on
an experimental unit.
A set of measurements, called data, can be either a sample or a
population.



Populations and Samples

• A Population is the set of all items or individuals of interest.
• Example: all patients who will be admitted to the hospital in
the year 2018.

• A Sample is a subset of the population.
• Example: 100 patients selected at random for interview.



Parameters & Statistics
A parameter is a numerical description of a
population characteristic.

A statistic is a numerical description of a sample
characteristic.

Parameter Population

Statistic Sample
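
The distinction can be illustrated with a small sketch. The population below is made up (1,000 simulated ages), and the variable names are only for illustration: the mean of the whole population plays the role of the parameter, and the mean of a random sample of 100 plays the role of the statistic.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 1,000 patients (made-up numbers).
population = [rng.randint(20, 80) for _ in range(1000)]

# A sample is a subset of the population, here 100 units chosen at random.
sample = rng.sample(population, 100)

parameter = statistics.mean(population)  # numerical description of the population
statistic = statistics.mean(sample)      # numerical description of the sample

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two numbers are usually close but not equal, which is why inference from a sample to the population needs care.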



How many variables have you measured?

• Univariate data: One variable is measured on a
single experimental unit.

• Bivariate data: Two variables are measured on a
single experimental unit.

• Multivariate data: More than two variables are
measured on a single experimental unit.



Types of Data

Data
  Categorical / Qualitative (defined categories or groups)
    Examples:
    • Marital Status
    • Are you registered to vote?
    • Eye Color
  Numerical / Quantitative
    Discrete (counted items)
      Examples:
      • Number of Children
      • Defects per hour
    Continuous (measured characteristics)
      Examples:
      • Weight
      • Voltage



Measurement Levels

Quantitative Data
  Ratio Data: differences between measurements, true zero exists
    o Example: height, time, weight, etc.
  Interval Data: differences between measurements but no true zero
    o Example: temperature

Qualitative Data
  Ordinal Data: ordered categories (rankings, order, or scaling)
    o Example: health quality (excellent, good, adequate, bad, terrible)
  Nominal Data: categories (no ordering or direction)
    o Example: gender (male or female), religion



Statistical Methods

Statistical methods are divided into descriptive statistics and inferential statistics.

Descriptive statistics
Collecting, summarizing, and processing data to transform data into information.

Inferential statistics
Provides the basis for predictions, forecasts, and estimates that are used to
transform information into knowledge.



Descriptive Statistics

Graphical
  Qualitative
    • Bar Chart
    • Pie Chart
  Quantitative
    • Bar/Pie Chart
    • Line Plot (Time Series)
    • Dotplot
    • Stem-and-Leaf Plot
    • Histogram
    • Ogive
    • Boxplot

Numerical
  Qualitative
    • Tables: frequency, percentage, cumulative percentage
    • Cross tabulation
  Quantitative
    • Central Tendency
    • Dispersion (Variability)

Note: Some graphs require a tabular
representation (frequency distribution).
Graphing Quantitative Variables

• Bar/Pie Chart
• Line Plot (Time Series)
• Dotplot
• Stem-and-Leaf Plot
• Histogram
• Ogive
• Boxplot
Graphing Quantitative Variables (1)

• A single quantitative variable measured for different
population segments or for different categories of
classification can be graphed using a bar or pie chart.

Example: A Big Mac hamburger costs $4.90 in Switzerland,
$2.90 in the U.S. and $1.86 in South Africa.

[Bar chart: Cost of a Big Mac ($) by Country (Switzerland, U.S., South Africa)]
Graphing Quantitative Variables (2)
• A single quantitative variable measured over time is
called a time series. It can be graphed using a line or
bar chart.
CPI: All Urban Consumers-Seasonally Adjusted
Sept Oct Nov Dec Jan Feb Mar
178.10 177.60 177.50 177.30 177.60 178.00 178.60



Graphing Quantitative Variables (3) -Dotplot
• The simplest graph for quantitative data
• Plots the measurements as points on a horizontal axis,
stacking the points that duplicate existing points.
• Example: The set 4, 5, 5, 7, 6

    •
•   •   •   •
4   5   6   7
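
A minimal text-only sketch of how such a dotplot could be built: count how often each value occurs, then stack one mark per occurrence above the axis. The function name `text_dotplot` is made up for illustration, and the alignment only works for single-digit integer values.

```python
from collections import Counter

def text_dotplot(values):
    """Return a text dotplot: one column per integer value, one '*' per occurrence."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in axis))
    return "\n".join(rows)

print(text_dotplot([4, 5, 5, 7, 6]))
```

The duplicated value 5 shows up as a stack of two marks, while 4, 6 and 7 each get one.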



Stem and Leaf Plots (4)

• A simple graph for quantitative data
• Uses the actual numerical values of each data point.

– Divide each measurement into two parts: the stem
and the leaf.
– List the stems in a column, with a vertical line to
their right.
– For each measurement, record the leaf portion in
the same row as its matching stem.
– Order the leaves from lowest to highest in each
stem.
– Provide a key to your coding.
Example : Stem-and-Leaf Plot

The prices ($) of 18 brands of walking shoes:


90 70 70 70 75 70 65 68 60
74 70 95 75 70 68 65 40 65

4 | 0
5 |
6 | 055588
7 | 000000455
8 |
9 | 05

Key: 6 | 5 means $65
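
The steps for building the plot can be sketched in a few lines of code. This is an illustrative sketch, not part of the course material: it assumes two-digit values, takes the tens digit as the stem and the units digit as the leaf, and the function name `stem_and_leaf` is made up.

```python
from collections import defaultdict

prices = [90, 70, 70, 70, 75, 70, 65, 68, 60,
          74, 70, 95, 75, 70, 68, 65, 40, 65]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)
    # Include empty stems so that gaps in the data stay visible.
    return {s: sorted(leaves[s]) for s in range(min(leaves), max(leaves) + 1)}

plot = stem_and_leaf(prices)
for stem, leaf_list in plot.items():
    print(f"{stem} | {''.join(str(d) for d in leaf_list)}")
print("Key: 6 | 5 means $65")
```

Keeping the empty stems 5 and 8 in the output matters: dropping them would hide the gap between $40 and the rest of the prices.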



Relative Frequency Histograms (5) : cont’d

• Draw the relative frequency histogram, plotting the
subintervals on the horizontal axis and the relative
frequencies on the vertical axis.
• The height of the bar represents
– the proportion of measurements falling in that class
or subinterval;
– the probability that a single measurement, drawn at
random from the set, will belong to that class or
subinterval.



Example 1
The ages of 50 patients are collected from a hospital.
34 48 70 63 52 52 35 50 37 43 53 43 52 44
42 31 36 48 43 26 58 62 49 34 48 53 39 45
34 59 34 66 40 59 36 41 35 36 62 34 38 28
43 50 30 43 32 44 58 53

Choosing the classes
• Range = 70 – 26 = 44.
• We choose to use 6 intervals.
• Minimum class width = (70 – 26)/6 = 7.33
• Convenient class width = 8
• Use 6 classes starting at 25, with upper and lower class limits 8 apart:
25–33, 34–42, 43–51, 52–60, 61–69, 70–78.



Class     Class Boundaries   Midpoint   Frequency   Relative Frequency   Percent
25 – 33   24.5 – 33.5        29         5           5/50 = .10           10%
34 – 42   33.5 – 42.5        38         16          16/50 = .32          32%
43 – 51   42.5 – 51.5        47         14          14/50 = .28          28%
52 – 60   51.5 – 60.5        56         10          10/50 = .20          20%
61 – 69   60.5 – 69.5        65         4           4/50 = .08           8%
70 – 78   69.5 – 78.5        74         1           1/50 = .02           2%
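
The frequencies in the table can be checked by tallying the raw ages. The sketch below assumes the inclusive integer classes implied by the class boundaries (24.5–33.5 holds the ages 25 to 33, and so on); the variable names are illustrative.

```python
ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
        42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
        34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
        43, 50, 30, 43, 32, 44, 58, 53]

# Inclusive class limits: (25, 33), (34, 42), ..., (70, 78).
classes = [(lo, lo + 8) for lo in range(25, 71, 9)]

n = len(ages)
table = []
for lo, hi in classes:
    freq = sum(lo <= a <= hi for a in ages)            # count ages in this class
    table.append((lo, hi, (lo + hi) / 2, freq, freq / n))

for lo, hi, mid, freq, rel in table:
    print(f"{lo}-{hi}  midpoint {mid:g}  frequency {freq:2d}  relative {rel:.2f}")
```

The relative frequencies sum to 1, which is a quick check that every observation landed in exactly one class.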
Describing the Distribution

Shape? Skewed right
Outliers? No.
What proportion of the patients are younger than 42.5?
(16 + 5)/50 = 21/50 = .42 = 42%
What is the probability that a randomly selected patient is 52 or older?
(10 + 4 + 1)/50 = 15/50 = .30
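
As a check on the arithmetic, both proportions can be counted directly from the 50 raw ages of Example 1. This is a sketch with illustrative variable names; counting gives 21 of 50 ages below 42.5 and 15 of 50 ages at 52 or above.

```python
ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
        42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
        34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
        43, 50, 30, 43, 32, 44, 58, 53]

n = len(ages)
younger_than_42_5 = sum(a < 42.5 for a in ages)   # first two classes
aged_52_or_older = sum(a >= 52 for a in ages)     # last three classes

print(younger_than_42_5, younger_than_42_5 / n)   # 21 0.42
print(aged_52_or_older, aged_52_or_older / n)     # 15 0.3
```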
How Many Class Intervals?
• Many (narrow class intervals)
  • may yield a very jagged distribution with gaps from empty classes
  • can give a poor indication of how frequency varies across classes
[Histogram: Frequency vs. Temperature with narrow classes, upper endpoints 4, 8, ..., 60, More]

• Few (wide class intervals)
  • may compress variation too much and yield a blocky distribution
  • can obscure important patterns of variation
[Histogram: Frequency vs. Temperature with wide classes, upper endpoints 0, 30, 60, More]

(X axis labels are upper class endpoints)


Frequency Distribution
for Continuous Random Variables

For large samples, we can’t use the simple frequency table
to represent the data. We need to divide the data into
groups or intervals or classes. So, we need to determine:

1- The number of intervals (k).
Too few intervals are not good because information will be
lost.
Too many intervals are not helpful to summarize the data.
A commonly followed rule is that 6 ≤ k ≤ 15,
or the following formula (Sturges' rule) may be used,
k = 1 + 3.322 (log10 n)



2- The range (R)
It is the difference between the largest and the
smallest observation in the data set.

3- The Width of the interval (w)


Class intervals generally should be of the same
width. Thus, if we want k intervals, then w is
chosen such that
w ≥ R / k.



Example:
Assume that the number of observations
equals 100, then
k = 1 + 3.322 (log10 100)
= 1 + 3.322 (2) = 7.644 ≈ 8.
Assume that the smallest value = 5 and the largest value in the
data = 61, then
R = 61 – 5 = 56 and
w = 56 / 8 = 7.
To make the summarization more comprehensible, the
class width may be 5 or 10 or a multiple of 10.
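
The calculation in this example can be sketched as follows. The function names are illustrative, and rounding k up to the next whole number is an assumption; the slides only show 7.6 being taken as 8.

```python
import math

def number_of_classes(n):
    """Number of class intervals k = 1 + 3.322 * log10(n), rounded up (assumed)."""
    return math.ceil(1 + 3.322 * math.log10(n))

def minimum_width(smallest, largest, k):
    """Smallest class width w that satisfies w >= R / k, where R is the range."""
    return (largest - smallest) / k

k = number_of_classes(100)      # 1 + 3.322 * 2 = 7.644, rounded up to 8
w = minimum_width(5, 61, k)     # R = 56, so w = 56 / 8 = 7

print(k, w)
```

In practice the computed w is a lower bound; a rounder width such as 5 or 10 is usually easier to read in the final table.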



Example 2.3.1
• We wish to know how many class intervals to have in the
frequency distribution of the data in Table 1.4.1 (pages 9-10) of
ages of 189 subjects who participated in a study on smoking
cessation.
• Solution:
• Since the number of observations
equals 189, then
• k = 1 + 3.322 (log10 189)
• = 1 + 3.322 (2.276) = 8.56 ≈ 9,
• R = 82 – 30 = 52 and
• w = 52 / 9 = 5.778

• It is better to let w = 10; the intervals
will then be in the form:



Histogram and Frequency Polygon

[Figure: histogram and frequency polygon]
Interpreting Graphs: Location and Spread

• Where is the data centered on the horizontal axis, and
how does it spread out from the center?



Interpreting Graphs: Shapes

Mound shaped and symmetric (mirror images)

Skewed right: a few unusually large measurements

Skewed left: a few unusually small measurements

Bimodal: two local peaks


Interpreting Graphs: Outliers

• Are there any strange or unusual measurements
that stand out in the data set?

[Figure: two plots, one labeled "No Outliers" and one labeled "Outlier"]



Example

• A quality control process measures the diameter of a
gear being made by a machine (cm). The technician
records 16 diameters, but inadvertently makes a typing
mistake on the second entry.

1.991 1.891 1.991 1.988 1.993 1.989 1.990 1.988
1.988 1.993 1.991 1.989 1.989 1.993 1.990 1.994
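
One way to make the mistyped entry stand out numerically is sketched below. The slides rely on a graph to spot the outlier; the 1.5 × IQR fence rule used here is swapped in as a common convention and is an assumption, as are the variable names.

```python
import statistics

diameters = [1.991, 1.891, 1.991, 1.988, 1.993, 1.989, 1.990, 1.988,
             1.988, 1.993, 1.991, 1.989, 1.989, 1.993, 1.990, 1.994]

# Quartiles (Python's default "exclusive" method) and the 1.5 * IQR fences.
q1, _, q3 = statistics.quantiles(diameters, n=4)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Any measurement outside the fences is flagged as a possible outlier.
outliers = [d for d in diameters if d < low_fence or d > high_fence]
print(outliers)   # [1.891]
```

Only the second entry is flagged, which is consistent with a single keying error (1.891 typed instead of a value near 1.991).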



Sample Surveys
In statistics, survey sampling describes the process
of selecting a sample of elements from a target
population to conduct a survey. Usually the survey is
some type of questionnaire (e.g. an in-person, phone or
internet survey).
A complete count of all units of the population for a
certain characteristic is known as complete
enumeration, also termed a census.



Why Sample Surveys are Used
Information on characteristics of populations is constantly
needed by politicians, marketing departments of companies,
public officials responsible for planning health and social
services, and others. This information is often obtained by use of
sample surveys.

Example. A health department in a large state is interested in
determining the proportion of the state's children of elementary
school age who have been immunized (e.g., against polio, diphtheria,
tetanus, etc.). For administrative reasons, this task must be
completed in only one month. Selecting a subset (a sample) from the
original set of all measurements (the population) may be the best
choice here, because a complete count would need too much time,
travel expense, and staff.
Advantages of sample surveys over census surveys

• Provide information about large populations
• Lower cost
• Less field time
• More accuracy, i.e. a better job of data collection can be done
• Usable when it’s impossible to study the whole population



Principal Steps in a Sample Survey
1. Setting the study objectives
•What are the objectives of the study?
•What information/data need to be collected?
2. Defining the study population
• Sampling frame
3. Decide sample design
4. Questionnaire design
• Appropriateness, acceptability, culturally appropriate,
understandable.
5. Fieldwork
• Training/Supervision
• Quality monitoring
• Timing: seasonality
6. Quality assurance
• At every step
• Minimizing errors/bias/cheating

7. Data entry/compilation
• Validation
• Feedback

8. Analysis
9. Dissemination
10. Plans for next survey: what did you learn, what did you miss?



Modes of survey administration

• Personal interview
• Telephone
• Mail
• Computer-assisted self-interviewing (CASI)
Variants: CAPI (personal interview); CATI (telephone
interview). The computer replaces paper forms.

• Internet Surveys: Surveys over the WWW



Classification of Sampling
The sampling procedures that are commonly used may be
classified into two categories:
1. Probability Sampling
This is the method of selecting samples according to certain
laws of probability in which each unit in the population has
some definite probability of being selected in the sample.

2. Non-probability Sampling
This is the method of selecting samples, in which the choice
of selection of sampling units depends entirely on the
judgment of the sampler.



Types of Non-Probability Sampling

1. Convenient (or Convenience) Sampling

2. Quota Sampling

3. Judgment Sampling

4. Snowball Sampling

Convenient Sampling
Selecting easily accessible participants with no randomization.

For example, one of the most common forms of convenience
sampling is using student volunteers as subjects for the research.
Another example is using subjects that are selected from a
clinic, a class or an institution that is easily accessible to the
researcher. A more concrete example is choosing five people
from a class or choosing the first five names from the list of
patients.



Quota Sampling

Quota sampling refers to selection with controls, ensuring that
specified numbers (quotas) are obtained from each specified
population subgroup (e.g. households or persons classified by
relevant characteristics), but with essentially no randomization
of unit selection within the subgroups.

For example, you include exactly 50 males and 50 females in a
sample of 100.



Judgment/Purposive Sampling
A purposive sample refers to selection of units based on personal
judgement rather than randomization: participants are
selected based on certain predetermined characteristics.
For example, you want to be sure to include African Americans,
EuroAmericans, Latinos and Asian Americans in relatively
equal numbers.
Snowball Sampling
Selecting participants by finding one or two participants and then
asking them to refer you to others.

For example, meeting a homeless person, interviewing that
person, and then asking him/her to introduce you to other
homeless people you might interview.
Types of Probability Sampling

• Simple random sampling
• Systematic sampling
• Stratified sampling
• Cluster sampling
• Multi-stage sampling



Simple Random Sampling

• Random Sampling
• Selected by using chance or random numbers
• Each individual subject (human or otherwise) has an
equal chance of being selected



Simple Random Sampling
Procedures:
o Lottery Method
o Use of Random Number Tables

Example: Suppose the population has 742 units, and we want to take an SRS of
size 30. Divide the random digits into segments of size 3 and throw out any
sequences of three digits not between 001 and 742. If a number occurs that has
already been included in the sample, ignore it. If we used this method with the first
line of the random number table, the sequence of three-digit numbers would be

614 503 024 611 206 . . .
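
The random number table procedure can be imitated in code. In this sketch a pseudo-random generator stands in for the printed table (an assumption), so the labels drawn will differ from the sequence shown above; the function name `srs_from_digits` is made up for illustration.

```python
import random

def srs_from_digits(population_size, sample_size, rng):
    """Draw unit labels 1..population_size by reading random digits in fixed-width segments."""
    width = len(str(population_size))      # 3 digits for a population of 742
    chosen = []
    while len(chosen) < sample_size:
        # One segment of the (simulated) random number table, e.g. "614".
        segment = "".join(str(rng.randrange(10)) for _ in range(width))
        label = int(segment)
        # Throw out labels outside 001..742 and labels already in the sample.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
    return chosen

sample = srs_from_digits(742, 30, random.Random(2024))
print(sample)
```

Discarding out-of-range segments, rather than folding them back into range, keeps every unit equally likely to be selected.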


Systematic Sampling

• Select a random starting point and then select every kth subject
in the population
• Simple to use, so it is used often
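
A minimal sketch of systematic sampling, assuming the random start is chosen among the first k units of an ordered list. The list of 100 patient IDs and the function name are hypothetical.

```python
import random

def systematic_sample(units, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)
    return units[start::k]

patients = list(range(1, 101))     # hypothetical list of 100 patient IDs
sample = systematic_sample(patients, 10, random.Random(1))
print(sample)
```

With 100 units and k = 10, the sample always has 10 units spaced exactly 10 apart; only the starting point is random.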



Stratified Sampling

• Divide the population into at least two different groups
with common characteristic(s), then draw SOME subjects
from each group (each group is called a stratum; the plural is strata)
• Results in a more representative sample
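
A sketch of stratified sampling under simple assumptions: the population is a made-up list of (sex, id) pairs, sex is the stratifying characteristic, and the same number of subjects is drawn at random from each stratum. The function and variable names are illustrative.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, rng):
    """Group units into strata, then draw a simple random sample from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 60 female and 40 male patients, as (sex, id) pairs.
patients = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
sample = stratified_sample(patients, lambda p: p[0], 5, random.Random(11))
print(sample)
```

Drawing the same number from each stratum is only one possible allocation; drawing in proportion to stratum size is another common choice.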


Cluster Sampling

• Divide the population into groups (called clusters),
randomly select some of the groups, and then
collect data from ALL members of the selected groups
• Used extensively by government and private
research organizations
• Example: exit polls
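
A sketch of cluster sampling, assuming the clusters are hospital wards (a made-up example) stored as a dictionary from ward name to its patients. Whole wards are selected at random, and every patient in a selected ward is included.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep ALL members of each selected cluster."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in picked}

# Hypothetical population: 6 wards with 8 patients each.
wards = {f"ward_{i}": [f"w{i}_p{j}" for j in range(8)] for i in range(1, 7)}
sample = cluster_sample(wards, 2, random.Random(3))
print(sample)
```

The contrast with stratified sampling is visible in the code: there, some members are drawn from every group; here, all members are taken from only some groups.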
Advantages of probability sample
• Provides a quantitative measure of the extent of variation due to
random effects.
• Provides acceptable data at minimum cost.
• Better control over nonsampling sources of error.
• Mathematical statistics and probability can be applied to
analyze and interpret the data.

Disadvantages of Non-probability Sampling

• Units are selected purposively, so no level of confidence can be attached to the results.
• Selection bias.
• The size of the bias is unknown.
• No mathematical basis for inference.



Questionnaires
Questionnaire: A set of common questions laid out in a standard and
logical form to record individual respondents’ attitudes and behavior.
Instructions show the interviewer or the respondent how to move
through the questions and complete the schedule. It could be printed on
paper or displayed on a computer screen.
The key steps of effective questionnaire design:
Step 1 – Decide what information is required
Step 2 – Make a rough listing of the questions
Step 3 – Refine the question phrasing
Step 4 – Develop the response format
Step 5 – Put the questions into an appropriate sequence
Step 6 – Finalize the layout of the questionnaire
Step 7 – Pretest and revise
Question Types
Different types of questions can be used, e.g. open vs. closed,
single vs. multiple responses, ranking, and rating.
Example of closed-ended questions:
Will you please do me a favor?
Example of open-ended questions:
How will you help the company if you are hired to work for us?
Example of Single Response Questions:
Gender: [] Male [] Female
Example of Multiple Response Questions:
Which of the following have you bought in the past week? Tick all
that apply.
[] Coke [] Pepsi [] Fanta [] None of these



Question Types
Example of Ranking Response Questions:
Rank the following brands according to how much you like them.
Please place a 3 next to the brand you like most, a 2 next to your next
preferred brand and a 1 next to your least preferred brand.
Coke ____ Pepsi ____ Fanta ____
Example of Rating Response Questions:
How do you rate the following?



Pilot Survey
A pilot survey, pilot study, or pilot experiment is a small scale
preliminary study conducted in order to evaluate feasibility
(conveniently done), time, cost, adverse events, and effect size
(statistical variability) in an attempt to predict an appropriate sample
size and improve upon the study design prior to performance of a
full-scale research project.

Pilot experiments are frequently carried out before large-scale
quantitative research, in an attempt to avoid time and money being
wasted on an inadequately designed project.



THE END
