Chapter 1
Types of data
Categorical (qualitative) variables take categories as their values, such as
“yes” or “no”, or “blue”, “brown”, or “green”.
Numerical (quantitative) variables have values that represent a counted or
measured quantity.
Discrete variables arise from a counting process.
Continuous variables arise from a measuring process.
Population: A population contains all the items or individuals of interest
that you seek to study.
Example population: all FPT students.
Sample: A sample contains only a portion of a population of interest.
Example sample: 100 FPT students.
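The population/sample distinction can be made concrete with a small Python sketch. The population size of 5000 and the ID numbers are made-up placeholders for illustration, not real FPT enrollment figures.

```python
import random

# Hypothetical population: ID numbers standing in for all FPT students.
# The size 5000 is an assumption for illustration only.
population = list(range(1, 5001))

random.seed(0)  # fixed seed so the draw is repeatable

# A simple random sample of 100 students, drawn without replacement.
sample = random.sample(population, 100)

print(len(population))  # size of the population
print(len(sample))      # size of the sample
```

Every member of the sample is also a member of the population, but the reverse is not true.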
Sources of data
When you perform the activity that collects the data, you are using a
primary data source.
When the data collection part of these activities is done by someone else,
you are using a secondary data source.
Primary Sources: The data collector is the one using the data for analysis:
Data from a political survey.
Data collected from an experiment. (A treatment is applied to part of
a population and responses are observed.)
Observed data (A researcher observes and measures characteristics
of interest of part of a population.)
Secondary Sources: The person performing data analysis is not the data
collector:
Analyzing census data.
Examining data from print journals or data published on the internet.
Z Score
Quartiles split the ranked data into 4 segments with an equal number of
values per segment.
The first quartile, Q1, is the value for which 25% of the values are
smaller and 75% are larger.
Q2 is the same as the median (50% of the values are smaller and 50%
are larger).
Only 25% of the values are greater than the third quartile.
Find a quartile by determining the value in the appropriate position in the
ranked data, where:
First quartile position: Q1 = (n + 1)/4 ranked value.
Second quartile position: Q2 = (n + 1)/2 ranked value (the median).
Third quartile position: Q3 = 3(n + 1)/4 ranked value.
where n is the number of observed values.
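The position rule above can be sketched in Python as follows. The function name quartile and the data are made up for illustration. The notes do not say what to do when a position is not a whole number, so this sketch assumes linear interpolation between the two neighbouring ranked values; other textbooks round instead.

```python
def quartile(values, k):
    """Return the k-th quartile (k = 1, 2, 3) using the k*(n + 1)/4 position rule."""
    ranked = sorted(values)
    n = len(ranked)
    position = k * (n + 1) / 4   # 1-based position in the ranked data
    lower = int(position)        # whole part of the position
    fraction = position - lower  # fractional part of the position
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    # Assumption: interpolate linearly when the position is not a whole number.
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # made-up data, n = 9

q1 = quartile(data, 1)  # position 2.5: halfway between 12 and 13
q2 = quartile(data, 2)  # position 5: the 5th ranked value (the median)
q3 = quartile(data, 3)  # position 7.5: halfway between 18 and 21
print(q1, q2, q3)
```

With n = 9 the positions are 2.5, 5 and 7.5, so Q1 = 12.5, Q2 = 16 and Q3 = 19.5.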
If events A and B are mutually exclusive:
P(A ∩ B) = 0
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = P(A) + P(B) - 0 = P(A) + P(B)
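As a quick check of the addition rule for mutually exclusive events, here is a minimal sketch using one roll of a fair die. The events A and B and the helper prob are chosen for illustration only.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {1, 2}                         # event: the roll is 1 or 2
B = {5, 6}                         # event: the roll is 5 or 6


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


p_and = prob(A & B)  # A and B share no outcomes
p_or = prob(A | B)

print(p_and)                    # 0
print(p_or, prob(A) + prob(B))  # 2/3 2/3
```

Because A and B have no outcome in common, P(A ∩ B) = 0 and P(A ∪ B) is simply P(A) + P(B).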
5. Independent events
Events A and B are said to be independent if the probability of B occurring
is unaffected by whether event A occurs.
Ex: Tossing a coin twice.
Let A be the event that the first coin toss lands on heads.
Let B be the event that the second coin toss lands on heads.
→ Clearly the result of the first coin toss does not affect the result of
the second coin toss.
→ Events A and B are independent.
P(A ∩ B) = P(A) * P(B)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = P(A) + P(B) - P(A) * P(B)
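The two-coin-toss example can be verified by listing the sample space. This is a minimal sketch that assumes a fair coin, so all four outcomes are equally likely; the helper prob is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All outcomes of tossing a coin twice: (first toss, second toss).
sample_space = set(product("HT", repeat=2))

A = {o for o in sample_space if o[0] == "H"}  # first toss lands on heads
B = {o for o in sample_space if o[1] == "H"}  # second toss lands on heads


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


print(prob(A & B), prob(A) * prob(B))                      # 1/4 1/4
print(prob(A | B), prob(A) + prob(B) - prob(A) * prob(B))  # 3/4 3/4
```

Both formulas agree with a direct count: only (H, H) lies in A ∩ B, and three of the four outcomes lie in A ∪ B.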
6. Collectively exhaustive events
One of the events must occur. The set of events covers the entire sample
space.
P(A ∪ B) = 1
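A small sketch of collectively exhaustive events, again assuming one roll of a fair die. The events "even" and "odd" are picked for illustration because together they cover every outcome.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # event: the roll is even
B = {1, 3, 5}                      # event: the roll is odd

# Collectively exhaustive: together the events cover the whole sample space.
covers_everything = (A | B) == sample_space
p_union = Fraction(len(A | B), len(sample_space))

print(covers_everything)  # True
print(p_union)            # 1
```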
7. Events associated with OR: A ∪ B
A ∪ B is the event that consists of all outcomes that are contained in
either of the two events (or in both).
8. Events associated with AND: A ∩ B
A ∩ B is the event that consists of all outcomes that are contained in
both events.
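Python sets map directly onto these two definitions: the | operator gives the union (OR) and the & operator gives the intersection (AND). The events below are made up for illustration, using one roll of a die.

```python
A = {2, 4, 6}  # event: the roll is even
B = {4, 5, 6}  # event: the roll is greater than 3

a_or_b = A | B   # outcomes in A, in B, or in both
a_and_b = A & B  # outcomes in both A and B

print(sorted(a_or_b))   # [2, 4, 5, 6]
print(sorted(a_and_b))  # [4, 6]
```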
III. Graph
1. Venn diagram
2. Contingency table
3. Decision tree
4. Normal distribution
Bell Shaped
Symmetrical
Mean, Median and Mode are Equal
Location is determined by the mean, μ.
Spread is determined by the standard deviation, σ.
The random variable has an infinite theoretical range: -∞ to +∞.
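The properties listed for the normal distribution can be checked with NormalDist from Python's standard statistics module (Python 3.8 or later). The parameters μ = 100 and σ = 15 are assumed example values, not taken from these notes.

```python
from statistics import NormalDist

# Assumed example parameters: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Mean, median and mode are equal.
print(dist.mean, dist.median, dist.mode)

# Symmetry: the density is the same 20 units below and above the mean.
left = dist.pdf(100 - 20)
right = dist.pdf(100 + 20)
print(left, right)

# Half of the probability lies below the mean.
half = dist.cdf(100)
print(half)
```

Changing mu shifts the curve left or right (location); changing sigma makes it wider or narrower (spread).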
The Standardized Normal Distribution (Also known as the “Z” distribution)
Z-score
Mean is 0.
Standard Deviation is 1. → Variance = 1^2 = 1
Symmetrical
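A Z-score is computed with the standard formula Z = (X - μ)/σ. The sketch below standardizes a small made-up data set and confirms that the resulting Z-scores have mean 0 and standard deviation 1. It assumes the data are treated as a whole population, so it uses pstdev (the population standard deviation).

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

mu = mean(data)       # mean of the data
sigma = pstdev(data)  # population standard deviation (assumption: data = population)

# Standardize each value: Z = (X - mu) / sigma
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)
print(mean(z_scores), pstdev(z_scores))
```

Here μ = 5 and σ = 2, so the value 9 has a Z-score of 2: it lies two standard deviations above the mean.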
The Uniform Distribution (also called a rectangular distribution)
Range: Any value between the smallest and largest is equally likely.
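For a continuous rectangular (uniform) distribution on the interval from a to b, "equally likely" means that the probability of landing in a sub-interval depends only on its width: P(c ≤ X ≤ d) = (d - c)/(b - a). The sketch below uses a hypothetical helper uniform_prob and assumed endpoints a = 2 and b = 6.

```python
def uniform_prob(a, b, c, d):
    """P(c <= X <= d) for X uniform on [a, b], assuming a <= c <= d <= b."""
    return (d - c) / (b - a)


a, b = 2, 6  # assumed smallest and largest possible values

p_low = uniform_prob(a, b, 2, 3)   # interval of width 1 near the bottom
p_high = uniform_prob(a, b, 5, 6)  # interval of width 1 near the top
p_all = uniform_prob(a, b, a, b)   # the whole range

print(p_low, p_high, p_all)  # 0.25 0.25 1.0
```

Two intervals of the same width have the same probability no matter where they sit inside the range.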
1 probability density function (PDF)