Topic-1 - STA 104 - 124
Examples
• Knowing the health quality of the people in
Bangladesh.
• Testing light bulbs (enumerating the whole population would be
destructive)
• Forecasting the winner of an election (population
too big; people change their minds)
Solutions
Collect a smaller set of measurements that will
(hopefully) be representative of the larger set.
Dr. Md. Sohel Rana, App.Stat, EWU 3
Data
• The raw material of Statistics is data.
• We may define data as figures. Figures result from
the process of counting or from taking a
measurement.
For example:
- When a hospital administrator counts the number
of patients (counting).
- When a nurse weighs a patient (measurement)
• Parameter: a numerical characteristic of a population
• Statistic: a numerical characteristic of a sample
Data
• Categorical/Qualitative (defined categories or groups)
Examples: marital status; "Are you registered to vote?"; eye color
• Numerical/Quantitative
- Discrete (counted items)
Examples: number of children, defects per hour
- Continuous (measured characteristics)
Examples: weight, voltage
Qualitative Data
Descriptive statistics
Collecting, summarizing, and processing data to transform data into information
Inferential statistics
provide the bases for predictions, forecasts, and estimates that are used to
transform information into knowledge
Graphical Numerical
• Bar/Pie Chart
• Line Plot (Time Series)
• Dotplot
• Stem-and-Leaf Plot
• Histogram
• Ogive
• Boxplot
Graphing Quantitative Variables (1)
[Figure: bar chart of the cost of a Big Mac hamburger by country (Switzerland, U.S., South Africa); x-axis: Country]
Graphing Quantitative Variables (2)
• A single quantitative variable measured over time is
called a time series. It can be graphed using a line or
bar chart.
CPI: All Urban Consumers, Seasonally Adjusted
Sept    Oct     Nov     Dec     Jan     Feb     Mar
178.10  177.60  177.50  177.30  177.60  178.00  178.60
Stem | Leaf
4 | 0
5 |
6 | 0 5 5 5 8 8
7 | 0 0 0 0 0 0 4 5 5
8 |
9 | 0 5
Range
• We choose to use 6 intervals.
• Minimum class width = (70 – 26)/6 = 7.33
• Convenient class width = 8
• Use 6 classes of length 8, starting at 25.
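The class-width arithmetic above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `class_intervals` is hypothetical, the minimum (26) and maximum (70) are read off the formula on the slide, and "convenient" is assumed here to mean rounding the minimum width up to the next whole number.

```python
import math


def class_intervals(minimum, maximum, n_classes, start):
    """Return (minimum width, convenient width, list of class limits).

    Assumption: the 'convenient' width is the minimum width rounded up
    to the next whole number.
    """
    min_width = (maximum - minimum) / n_classes
    width = math.ceil(min_width)
    # Each class is (lower limit, upper limit); classes are contiguous.
    intervals = [(start + i * width, start + (i + 1) * width)
                 for i in range(n_classes)]
    return min_width, width, intervals


min_width, width, intervals = class_intervals(26, 70, 6, start=25)
print(round(min_width, 2))  # 7.33
print(width)                # 8
print(intervals)
# [(25, 33), (33, 41), (41, 49), (49, 57), (57, 65), (65, 73)]
```

Starting at 25 with width 8, the six classes run from 25 to 73 and so cover every observation between 26 and 70.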
[Figure: histogram of Temperature (x-axis) against Frequency (y-axis) using narrow classes (4, 8, ..., 60, More), with gaps from empty classes]
[Figure: histogram of Temperature (x-axis) against Frequency (y-axis) using wide classes (0, 30, 60, More)]
Polygon
Interpreting Graphs: Location and Spread
No Outliers Outlier
7. Data entry/compilation
• Validation
• Feedback
8. Analysis
9. Dissemination
10. Plans for next survey: what did you learn, what did you miss?
• Personal interview
• Telephone
• Mail
• Computer-assisted self-interviewing (CASI)
Variants: CAPI (personal interview) and CATI (telephone
interview); the computer replaces paper questionnaires
2. Non-probability Sampling
This is the method of selecting samples, in which the choice
of selection of sampling units depends entirely on the
judgment of the sampler.
2. Quota Sampling
3. Judgment Sampling
4. Snowball Sampling
Systematic sampling
Stratified sampling
Cluster sampling
Multi-stage sampling
• Random Sampling
• Selected by using chance or
random numbers
• Each individual subject
(human or otherwise) has an
equal chance of being
selected
Example: Suppose the population has 742 units, and we want to take an SRS of
size 30. Divide the random digits into segments of three digits and discard any
segment that is not between 001 and 742. If a number occurs that has
already been included in the sample, ignore it. Applying this method to the first
line of a random number table, we read off the three-digit numbers in order and
keep only those that pass both checks, until 30 units have been selected.
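The random-digit procedure in this example can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `srs_from_digits` is hypothetical, and because the slide's random number table is not reproduced here, the digit strings below are made up or generated with Python's `random` module (the seed 104 is arbitrary).

```python
import random


def srs_from_digits(digits, population_size, sample_size):
    """Read a digit string in fixed-width segments and build an SRS.

    Segments outside 1..population_size and repeated units are skipped.
    """
    width = len(str(population_size))  # 3 digits for a population of 742
    sample = []
    for i in range(0, len(digits) - width + 1, width):
        unit = int(digits[i:i + width])
        if unit < 1 or unit > population_size:
            continue  # not between 001 and 742: throw it out
        if unit in sample:
            continue  # already in the sample: ignore it
        sample.append(unit)
        if len(sample) == sample_size:
            break
    return sample


# Tiny hand-made digit string (not from any real table):
# segments are 001, 999, 001, 742, 000, 123.
print(srs_from_digits("001999001742000123", 742, 3))  # [1, 742, 123]

# Stand-in for a random number table: 600 pseudo-random digits.
rng = random.Random(104)
table = "".join(str(rng.randrange(10)) for _ in range(600))
sample = srs_from_digits(table, population_size=742, sample_size=30)
print(len(sample))
```

In practice the same result is usually obtained directly with `random.sample(range(1, 743), 30)`; the segment-by-segment version is shown only because it mirrors the table-based method described above.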
• Systematic Sampling
• Select a random starting point and then select every kth subject
in the population
• Simple to use so it is used often
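A minimal sketch of systematic sampling, assuming the population is held in a list. The function name `systematic_sample`, the population of 742 numbered units, and the interval k = 25 are illustrative choices, not taken from the slides.

```python
import random


def systematic_sample(population, k, rng=None):
    """Pick a random start among the first k units, then every kth unit."""
    if rng is None:
        rng = random.Random()
    start = rng.randrange(k)  # random starting index in 0..k-1
    return population[start::k]


population = list(range(1, 743))  # units numbered 1..742 (illustrative)
sample = systematic_sample(population, k=25, rng=random.Random(1))
print(len(sample))  # 29 or 30, depending on the random start
print(sample[:3])
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined, which is why the method is simple to carry out.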
Stratified Sampling
Divide the population into at least two different groups
with common characteristic(s), then draw SOME subjects
from each group (each group is called a stratum; plural: strata)
Results in a more representative sample
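A minimal sketch of stratified sampling, assuming an equal number of subjects is drawn from each group (other allocations, such as proportional allocation, are also common). The function name `stratified_sample` and the urban/rural population are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, rng=None):
    """Group units into strata, then draw a simple random sample from each."""
    if rng is None:
        rng = random.Random()
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Draw SOME subjects from EVERY stratum.
    return {name: rng.sample(members, min(n_per_stratum, len(members)))
            for name, members in strata.items()}


# Hypothetical population: 90 people, each labelled urban or rural.
people = [(i, "rural" if i % 3 == 0 else "urban") for i in range(1, 91)]
strat = stratified_sample(people, stratum_of=lambda p: p[1],
                          n_per_stratum=5, rng=random.Random(7))
for name, members in strat.items():
    print(name, [person_id for person_id, _ in members])
```

Every stratum is guaranteed to appear in the sample, which is what makes the result more representative than a single draw from the whole population might be.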
Cluster Sampling
Divide the population into
groups (called clusters),
randomly select some of
the groups, and then
collect data from ALL
members of the selected
groups
Used extensively by
government and private
research organizations
Examples:
Exit Polls
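A minimal sketch of cluster sampling in the spirit of the exit-poll example: polling stations play the role of clusters, some stations are chosen at random, and every voter at a chosen station is included. The function name `cluster_sample`, the station names, and the voter labels are all hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, rng=None):
    """Randomly select whole clusters and keep ALL members of each."""
    if rng is None:
        rng = random.Random()
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical polling stations, each with its own list of voters.
stations = {
    "station_A": ["a1", "a2", "a3"],
    "station_B": ["b1", "b2"],
    "station_C": ["c1", "c2", "c3", "c4"],
    "station_D": ["d1", "d2", "d3"],
}
selected = cluster_sample(stations, n_clusters=2, rng=random.Random(3))
for name, voters in selected.items():
    print(name, voters)
```

The contrast with stratified sampling is that here only some groups are used, but each selected group is taken in full.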
Advantages of probability sample
• Provides a quantitative measure of the extent of variation due to
random effects.
• Provides acceptable data at minimum cost.
• Better control over nonsampling sources of errors.
• Mathematical statistics and probability can be applied to
analyze and interpret the data.