Biostats Lesson 3

BIOSTATISTICS AND EPIDEMIOLOGY
3rd term / Prelims / Lecture 2 – Descriptive Statistics (part 2)

DISTRIBUTION SHAPES
• The distribution of the data describes how the values are spread along each axis, or along a given number line.
• Distributions have no exact shape; we are only ball-parking or estimating. What we are after is the overall pattern of the frequency distribution.
BELL-SHAPED → the most natural shape of the frequency distribution.
UNIFORM → usually created through random distribution or purely by chance.
J-SHAPED → most of the values are at the higher part of the x-axis.
REVERSED J-SHAPED → most of the values are at the lower part of the x-axis.
RIGHT-SKEWED → the skew refers to the tail; positively skewed.
LEFT-SKEWED → negatively skewed.
BIMODAL → has 2 peaks (modes).
→ If there are more than 2 peaks, it is called MULTIMODAL.
U-SHAPED → rarely seen. (We were reportedly told not to give sir this one.)

OTHER TYPES OF GRAPHS

BAR GRAPHS
[Figure: HORIZONTAL and VERTICAL bar graphs]
→ Bar graphs are good for categorical data; the bars have spaces between them.
→ Bar graphs are different from histograms because the bars of a histogram touch each other, denoting that the data are continuous.

PARETO CHART
→ Pareto charts represent a frequency distribution for categorical variables, with the frequencies arranged from highest to lowest.
→ A Pareto chart does not denote that the data are continuous, so the bars should not touch each other; it is still not a histogram.

TIME SERIES GRAPH / COMPOUND TIME SERIES GRAPH
→ A time series graph is a line graph showing the trend over time. The x-axis is in terms of time.
→ A compound time series graph has 2 time series lines in one graph, which is common in Epidemiology.
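
The Pareto ordering (categories sorted from highest to lowest frequency) can be sketched in Python. This is only an illustration: the blood-type observations and the function name `pareto_order` are made up for the example and do not come from the lecture.

```python
from collections import Counter


def pareto_order(observations):
    """Return (category, frequency) pairs sorted from highest to lowest frequency."""
    counts = Counter(observations)
    # most_common() returns the pairs sorted by frequency, highest first
    return counts.most_common()


# Hypothetical categorical data (not from the lecture)
blood_types = ["O", "A", "O", "B", "O", "A", "AB", "O", "A", "B"]

for category, freq in pareto_order(blood_types):
    # One text bar per category; each bar sits on its own line, so bars never touch
    print(f"{category:>2} | {'#' * freq} ({freq})")
```

Because the variable is categorical, the bars are kept separate; only the ordering by frequency distinguishes this from an ordinary bar graph.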
PIE GRAPH
→ shows the share of a particular category in relation to the whole.
→ often misused as a substitute for a bar graph.
˃ Percentage / relative value = pie graph
˃ Absolute value = bar graph

STEM AND LEAF PLOTS
→ Instead of drawing bars, the numbers themselves are used as the bars of the graph.

• Be careful of misleading graphs. Some graphs are made with malicious intent or are intentionally used to mislead people.

DESCRIBING FREQUENCY DISTRIBUTION
UNIMODAL → single mode; a single bar stands out
BIMODAL → two modes; two bars stand out
MULTIMODAL → more than two modes; more than two bars stand out
NO MODE → no mode; no bar stands out

Shape adjectives
SYMMETRIC → the right side is a mirror image of the left side
SKEWED → asymmetric distribution; the tapering of the two sides (tails) is different
POSITIVELY SKEWED → right tail is longer
NEGATIVELY SKEWED → left tail is longer

SKEWNESS AND KURTOSIS
Skewness → tendency of the distribution to lean to the right, to the left, or to stay centered
Kurtosis → how peaked or how flat the distribution is, based on how the values are dispersed

Types of Kurtosis
LEPTOKURTIC → highly peaked; sharp
MESOKURTIC → bell-shaped; symmetrical tails, rounded peak
PLATYKURTIC → flat; broad peak, light (thin) tails
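
Skewness and kurtosis can also be expressed as numbers. The sketch below assumes the moment-based population formulas (third and fourth central moments divided by powers of the second); the lecture does not give a formula, so treat these definitions and the function names as assumptions for illustration.

```python
def central_moments(values):
    """Return the 2nd, 3rd, and 4th central moments (dividing by n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m2, m3, m4


def skewness(values):
    """Positive: right tail longer. Negative: left tail longer. Zero: symmetric."""
    m2, m3, _ = central_moments(values)
    return m3 / m2 ** 1.5  # undefined (division by zero) if all values are equal


def excess_kurtosis(values):
    """Above 0: leptokurtic. Near 0: mesokurtic. Below 0: platykurtic."""
    m2, _, m4 = central_moments(values)
    return m4 / m2 ** 2 - 3


# Hypothetical data sets (not from the lecture)
print(skewness([1, 1, 1, 2, 10]))        # positive: right-skewed
print(skewness([1, 9, 10, 10, 10]))      # negative: left-skewed
print(excess_kurtosis([1, 2, 3, 4, 5]))  # negative: flat (platykurtic)
```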
TRADITIONAL STATISTICS
There are three concepts:
Central Tendency → measures of how the data are centered, or their average.
Variation → measures of how dispersed the data are.
Position → measures describing the position of a data value in relation to the other values in the data set.
• "Average" and "mean" are used synonymously, except in statistics, where average is an all-encompassing term that can refer to the mean, median, or mode.

PARAMETER VS STATISTIC
Statistic → a characteristic or measure obtained by using the data values from a sample.
Parameter → a characteristic or measure obtained by using all the data values from a specific population.
▪ n = sample size
▪ N = population size

GENERAL ROUNDING RULE
• Rounding should not be done until the final answer is calculated.
• The calculated mean/SD should be rounded to 1 more decimal place than the data values.

MEASURES OF CENTRAL TENDENCY
→ describe where the distribution may be “centered”, or where the bulk of the data lies.
→ There are various concepts of center:
Mean → center of gravity
Median → value in the middle
Mode → most typical value
˃ Collectively known as the average.

(1) Mean
→ known as the arithmetic average.
→ average of values: equal to the sum of all values divided by the number of values.
→ Widely used measure of central tendency.
→ Layman’s concept of average.
→ Affected by the presence of outliers in the data.
Sample (statistic): x̄ = Σx / n
Population (parameter): µ = Σx / N
• Mean for ungrouped data:
Comes from raw data.
The normal way of computing.
• Mean for grouped data:
Comes from the frequency distribution table.
Makes use of the summation of all the (midpoint multiplied by the frequency), divided by n.
Only an approximation.
Midpoint = (lower + upper) / 2

(2) Median
→ the middlemost value.
→ Obtained by sorting the values from lowest to highest and taking the value in the middle.
˃ If the number of values is even, add the 2 middle values and divide the sum by 2.
→ Preferred over the mean as the typical value (or center) when the distribution is skewed.
˃ The median is less affected by outliers than the mean.
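
The mean (ungrouped and grouped) and the median described above can be sketched in Python. The function names, the `(lower, upper, frequency)` class layout, and the sample values are illustrative assumptions, not part of the lecture.

```python
def mean_ungrouped(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


def mean_grouped(classes):
    """Approximate mean from a frequency table.

    classes: list of (lower, upper, frequency) tuples.
    Each class contributes midpoint * frequency to the sum.
    """
    n = sum(freq for _, _, freq in classes)
    total = sum(((lower + upper) / 2) * freq for lower, upper, freq in classes)
    return total / n


def median(values):
    """Middle value of the sorted data; mean of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data with one outlier (100)
scores = [2, 4, 6, 8, 100]
print(mean_ungrouped(scores))  # pulled upward by the outlier
print(median(scores))          # stays at the middle value

# Hypothetical frequency table: classes 0-10 (f = 2) and 10-20 (f = 3)
print(mean_grouped([(0, 10, 2), (10, 20, 3)]))
```

With the outlier present, the mean moves far from most of the values while the median does not, which is why the median is preferred for skewed data.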
(3) Mode
→ Most frequently occurring value; the most typical value.
→ Most descriptive when distributions are highly peaked (leptokurtic), suggesting a large concentration on a single value.
→ Can be used with categorical data.

(4) Midrange
→ Sum of the lowest and highest values, divided by 2.
→ Very sensitive to outliers; rarely used.
→ Gives the midpoint.

(5) Weighted mean
→ a specialized version of the mean.
→ the computation of the mean when the values or the categories are not equal in weight.
→ Multiply each value by its corresponding weight, then divide the sum of the products by the sum of the weights.

Summary of Measures of Central Tendency
Measure | Definition | Symbol
Mean | Sum of values, divided by total number of values | x̄, µ
Median | Middle point in a data set that has been ordered | MD
Mode | Most frequent data value | none
Midrange | Lowest value plus highest value, divided by 2 | MR

MEASURES OF VARIATION
→ Measure the spread or variability of the values from each other.

(1) Range
→ Simplest measure of dispersion, used to get a quick idea of the spread.
→ The difference between the highest and lowest values.
→ Wastes information: the rest of the values are not used.

(2) Variance
→ Average of the squared deviations of the values from the mean:
1. Find the mean.
2. Subtract the mean from each value.
3. Square the differences.
4. Add all the squared differences.
5. Divide the sum:
a. if sample: by n − 1
b. if population: by N
→ Measured in the square of the original units (a problem for interpretation).

(3) Standard Deviation
→ an extra step after the variance: the square root of the variance.
→ measured in the same unit as the data values.
Sample (statistic): s
Population (parameter): σ
• Sample variance and SD for ungrouped data:
The normal way of computing: s² = Σ(x − x̄)² / (n − 1), and s = √s²
• Sample variance and SD for grouped data:
(The formula is tiring to memorize, so if this comes up in the exam I will not have an answer.)
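
The five variance steps and the extra square-root step for the standard deviation can be sketched in Python for ungrouped data. The function names, the `sample` flag, and the data are illustrative assumptions; the grouped-data formula is not covered here because the notes do not reproduce it.

```python
import math


def variance(values, sample=True):
    """Variance of ungrouped data, following the five steps in the notes."""
    n = len(values)
    mean = sum(values) / n                      # 1. find the mean
    deviations = [x - mean for x in values]     # 2. subtract the mean from each value
    squared = [d ** 2 for d in deviations]      # 3. square the differences
    total = sum(squared)                        # 4. add all squared differences
    divisor = n - 1 if sample else n            # 5. divide by n - 1 (sample) or N (population)
    return total / divisor


def standard_deviation(values, sample=True):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(values, sample))


# Hypothetical data (not from the lecture)
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False))            # population variance
print(standard_deviation(data, sample=False))  # population SD
print(variance(data))                          # sample variance (divides by n - 1)
```

The sample version needs at least 2 values, since it divides by n − 1.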
Summary of Measures of Variation
Measure | Definition | Symbol
Range | Distance between the lowest and highest values | R
Variance | Average of the squares of the distance that each value is from the mean | s², σ²
Standard deviation | Square root of the variance | s, σ

(4) Coefficient of Variation
→ Ratio of the standard deviation to the mean.
→ Used to compare the spread of sets of data that are measured in different units.
→ Expressed as a percentage: CVar = (standard deviation / mean) × 100%

MEASURES OF POSITION
→ describe the position or location of particular values along the cumulative distribution.
→ sometimes useful for determining cut-off points for certain categories.

(1) Standard Score (z score)
→ Number of standard deviations that a data value is above or below the mean.
→ Obtained by subtracting the mean from the value and dividing the result by the standard deviation: z = (value − mean) / standard deviation

(2) Percentile
→ Percentiles divide the data into 100 equal parts.
1. Arrange the data in order.
2. Compute, depending on what is given:
given = x (value); find = percentile
given = percentile (the place); find = x (value position)
3. Start at the lowest value and count over to c (the score position):
if c is a whole number: use the value halfway between the c-th and (c+1)-th values
if c is not a whole number: round up to the next whole number.

(3) Quartiles and Deciles
▪ Quartiles divide the data into 4 equal parts.
▪ Deciles divide the data into 10 equal parts.

(4) Interquartile Range
→ Gives information on the values of the middle 50% of the data.
→ The higher the IQR, the larger the variation in the middlemost values of the data.

(5) Outlier
→ an extremely high or extremely low data value when compared with the rest of the data values.
1. Find Q1 and Q3.
2. Find the IQR (Q3 − Q1).
3. Multiply the IQR by 1.5.
4. Subtract the product from Q1 and add it to Q3.
5. Any value below Q1 − 1.5(IQR) or above Q3 + 1.5(IQR) is an outlier.

Summary of Measures of Position
Measure | Definition | Symbol
Standard score / z score | Number of standard deviations that a data value is above or below the mean | z
Percentile | Position in hundredths that a data value holds in the distribution | Pn
Decile | Position in tenths that a data value holds in the distribution | Dn
Quartile | Position in fourths that a data value holds in the distribution | Qn
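
The z score and the two percentile computations can be sketched in Python. The notes do not reproduce the formulas for the "Compute" step, so this sketch assumes the commonly taught textbook forms: percentile = (number of values below x + 0.5) / n × 100, and c = n × p / 100. Those two formulas, the function names, and the data are assumptions, not taken from the lecture.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def percentile_rank(values, x):
    """Given a value, find its percentile (assumed formula, see lead-in)."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


def value_at_percentile(values, p):
    """Given a percentile p (0 < p < 100), find the corresponding value."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)          # 1. arrange the data in order
    c = len(ordered) * p / 100        # 2. compute the position c (assumed formula)
    if c == int(c):
        c = int(c)
        # c is whole: take the value halfway between the c-th and (c+1)-th values
        return (ordered[c - 1] + ordered[c]) / 2
    # c is not whole: round up and take that position (positions are 1-based)
    return ordered[math.ceil(c) - 1]


# Hypothetical data (not from the lecture)
scores = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]
print(z_score(65, 50, 10))
print(percentile_rank(scores, 12))
print(value_at_percentile(scores, 25))
print(value_at_percentile(scores, 60))
```

Other percentile conventions exist (statistical software often interpolates), so results for small data sets can differ from this counting rule.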
EXPLORATORY DATA ANALYSIS
→ Resistant statistics are not easily affected by outliers, unlike non-resistant statistics.

Boxplot
→ A graph that shows some of the most important statistics in the data set, specifically:
the median (central tendency);
P25 (Q1) and P75 (Q3) (location and variation); and
some extreme values.
→ A very versatile graph for showing distributions, comparisons, and associations between variables.
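
The statistics a boxplot displays, together with the 1.5 × IQR outlier rule from the Measures of Position section, can be sketched in Python. Quartile conventions vary; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd). That convention, the function names, and the data are assumptions for illustration.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Needs at least 2 values, otherwise the halves are empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


def iqr_outliers(values):
    """Return the values lying beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1                      # IQR is Q3 minus Q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical data (not from the lecture)
data = [5, 6, 12, 13, 15, 18, 22, 50]
print(quartiles(data))     # Q1, median, Q3: the box and its middle line
print(iqr_outliers(data))  # values that would be marked as extreme
```

The median and the quartiles are resistant statistics: replacing 50 with an even larger value would not change them, only the list of outliers.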
