Biostats Lesson 3


BIOSTATISTICS AND EPIDEMIOLOGY

3rd term / Prelims / Lecture 2 – Descriptive Statistics (part 2)

DISTRIBUTION SHAPES

• The distribution of the data describes how the values are spread along the axis (the number line).
• Distributions have no exact shape; we are only ballparking, or estimating. What we are after is the overall pattern of the frequency distribution.

BELL-SHAPED → the most natural shape of the frequency distribution.
UNIFORM → usually created through random distribution or purely by chance.
J-SHAPED → most of the values are at the higher part of the x-axis.
REVERSED J-SHAPED → most of the values are at the lower part of the x-axis.
RIGHT-SKEWED → positively skewed. The skew refers to the tail.
LEFT-SKEWED → negatively skewed.
BIMODAL → has 2 peaks (modes). If there are more than 2, it is called MULTIMODAL.
U-SHAPED → rarely seen. (Reportedly, the instructor does not want this given as an answer.)

OTHER TYPES OF GRAPHS

BAR GRAPHS

[Figure: horizontal and vertical bar graphs]

→ Bar graphs are good for categorical data; the bars have spaces between them.
→ Bar graphs differ from histograms because histogram bars touch each other, denoting that the data are continuous.

PARETO CHART

→ Pareto charts represent a frequency distribution for categorical variables, with the frequencies arranged from highest to lowest.
→ A Pareto chart does not denote continuous data, so the bars should not touch each other; it is still not a histogram.

TIME SERIES GRAPH AND COMPOUND TIME SERIES GRAPH

→ A time series graph is a line graph showing the trend over time. The x-axis is in terms of time.
→ A compound time series graph has 2 time series lines in one graph, which is common in epidemiology.

PIE GRAPH

→ Shows the share of a particular category in relation to the whole.
→ Usually misused, because it is used as a substitute for a bar graph.
˃ Percentage / relative value = pie graph
˃ Absolute value = bar graph

STEM AND LEAF PLOTS

→ Instead of drawing bars, you use the numbers themselves as the bars.

• Be careful of misleading graphs. Some graphs are made with malicious intent or are intentionally used to mislead people.

DESCRIBING FREQUENCY DISTRIBUTION

UNIMODAL: single mode; a single bar stands out
BIMODAL: two modes; two bars stand out
MULTIMODAL: more than two modes; more than two bars stand out
NO MODE: no mode; no bar stands out

Shape adjectives

SYMMETRIC: the right side is a mirror image of the left side
SKEWED: asymmetric distribution; the tapering of the two sides (tails) is different
POSITIVELY SKEWED: right tail is longer
NEGATIVELY SKEWED: left tail is longer

SKEWNESS AND KURTOSIS

Skewness: the tendency of the distribution to lean to the right, to the left, or to stay centered
Kurtosis: how flat or how peaked the distribution is

Types of Kurtosis

LEPTOKURTIC: highly peaked; sharp
MESOKURTIC: bell-shaped; symmetrical tails, rounded peak
PLATYKURTIC: flat; light (thin) tails
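The notes describe skewness and kurtosis only in words. As a rough sketch, both can be put into numbers with the moment-based formulas below. These formulas are an assumption, since the notes do not give any; other textbooks and software apply small-sample corrections that change the values slightly. The data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean)**k over all values."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment-based skewness: > 0 right-skewed, < 0 left-skewed, 0 symmetric."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3 (the value for a normal curve).

    > 0 suggests leptokurtic, < 0 platykurtic, near 0 mesokurtic.
    """
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    return m4 / m2 ** 2 - 3


# Hypothetical data sets, chosen only to illustrate the signs.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]          # long right tail
left_skewed = [-x for x in right_skewed]    # mirror image: long left tail

print(skewness(symmetric))        # symmetric data: no skew
print(skewness(right_skewed))     # positive
print(skewness(left_skewed))      # negative
print(excess_kurtosis(symmetric)) # negative: flatter than a bell curve
```

The sign of the skewness matches the direction of the longer tail, which is the same rule the notes give for positively and negatively skewed distributions.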
TRADITIONAL STATISTICS

There are three concepts:

Central Tendency: measures of how the data are centered, or their average.
Variation: measures of how dispersed the data are.
Position: measures describing the position of a data value in relation to the other values in the data set.

• In everyday use, average and mean are synonymous. In statistics they are not, because average is an all-encompassing term that can be the mean, median, or mode.

PARAMETER VS STATISTIC

Statistic: a characteristic or measure obtained by using the data values from a sample.
Parameter: a characteristic or measure obtained by using all the data values from a specific population.

▪ n = sample size
▪ N = population size

GENERAL ROUNDING RULE

• Rounding should not be done until the final answer is calculated.
• The calculated mean/SD should have 1 more decimal place than the data values.

MEASURES OF CENTRAL TENDENCY

→ Describe where the distribution may be "centered", or where the bulk of the data lies.
→ There are various concepts of center:

Mean: center of gravity
Median: value in the middle
Mode: most typical value

˃ Collectively known as the average.

(1) Mean

→ Known as the arithmetic average.
→ Average of the values: equal to the sum of all values divided by the number of values.
→ Widely used measure of central tendency.
→ Layman's concept of average.
→ Affected by the presence of outliers in the data.

Sample (statistic): x̄ = Σx / n
Population (parameter): µ = Σx / N

• Mean for ungrouped data:
- Comes from raw data.
- The normal way of computing.
• Mean for grouped data:
- Comes from the frequency distribution table.
- Uses the summation of all the (midpoint multiplied by the frequency), divided by n.
- Only an approximation.

Midpoint = (lower + upper) / 2

(2) Median

→ The middlemost value.
→ Obtained by sorting the values from lowest to highest and taking the value in the middle.
→ Preferred over the mean as the typical value (or center) when the distribution is skewed.
˃ The median is less affected by outliers than the mean.
˃ If the number of data values is even, add the 2 middle values and divide the sum by 2.
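The mean (ungrouped and grouped) and the median described above can be sketched in a few lines. This is only an illustration: the function names, the raw data, and the class intervals are hypothetical, and the grouped mean divides the sum of (midpoint × frequency) by the total frequency.

```python
def mean(values):
    """Ungrouped mean: sum of all values divided by the number of values."""
    return sum(values) / len(values)


def grouped_mean(classes):
    """Grouped mean from (lower, upper, frequency) rows of a frequency table.

    Each class is represented by its midpoint, so the result is an approximation.
    """
    total_frequency = sum(freq for _, _, freq in classes)
    weighted_sum = sum(((lower + upper) / 2) * freq for lower, upper, freq in classes)
    return weighted_sum / total_frequency


def median(values):
    """Middle value of the sorted data; average of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical raw data with one outlier (46).
raw = [2, 3, 3, 4, 5, 7, 46]
print(mean(raw))    # 10.0, pulled up by the outlier
print(median(raw))  # 4, barely affected by the outlier

# Hypothetical frequency table: (lower limit, upper limit, frequency).
table = [(1, 5, 3), (6, 10, 5), (11, 15, 2)]
print(grouped_mean(table))  # 7.5
```

The raw data show why the median is preferred for skewed data: the single outlier moves the mean to 10.0 while the median stays at 4.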
(3) Mode

→ Most frequently occurring value; most typical.
→ Most descriptive when distributions are highly peaked (leptokurtic), suggesting a large concentration on a single value.
→ Can be used with categorical data.

(4) Midrange

→ Sum of the lowest and highest values, divided by 2.
→ Very sensitive to outliers; rarely used.
→ Gives the midpoint.

(5) Weighted mean

→ A specialized version of the mean.
→ Used to compute the mean when the values or the categories are not equal in weight.
→ Multiply each value by its corresponding weight, then divide the sum of the products by the sum of the weights.

Summary of Measures of Central Tendency

Mean: sum of values, divided by total number of values (symbols: x̄, µ)
Median: middle point in a data set that has been ordered (symbol: MD)
Mode: most frequent data value (symbol: none)
Midrange: lowest value plus highest value, divided by 2 (symbol: MR)

MEASURES OF VARIATION

→ Measure the spread or variability of the values from each other.

(1) Range

→ Simplest measure of dispersion, used to get a quick idea of the spread.
→ The difference between the highest and lowest values.
→ Wastes information: the rest of the values are not used.

(2) Variance

→ Average of the squared deviations of the values from the mean:
1. Find the mean.
2. Subtract the mean from each value.
3. Square the differences.
4. Add all the squared differences.
5. Divide the sum:
a. if sample: by n − 1
b. if population: by N
→ Measured in the square of the original units (a problem for interpretation).

(3) Standard Deviation

→ An extra step after the variance: the square root of the variance.
→ Measured in the same unit as the data values.

Sample (statistic): s
Population (parameter): σ

• Sample variance and SD for ungrouped data:
- The normal way of computing.
• Sample variance and SD for grouped data:
- (Note-taker's aside: the formula is tiring to memorize, so if this comes up in the exam I will have no answer.)
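As a minimal sketch, the weighted mean, the range, and the five variance steps listed above translate directly into code. The function names and the data are hypothetical; only ungrouped data are covered, since the notes do not give the grouped formula.

```python
import math


def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    products = sum(v * w for v, w in zip(values, weights))
    return products / sum(weights)


def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


def variance(values, sample=True):
    """Average squared deviation from the mean.

    Divides by n - 1 for a sample and by N for a population.
    """
    n = len(values)
    mean = sum(values) / n                                 # step 1
    squared_diffs = [(x - mean) ** 2 for x in values]      # steps 2 and 3
    total = sum(squared_diffs)                             # step 4
    return total / (n - 1 if sample else n)                # step 5


def std_dev(values, sample=True):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(values, sample))


# Hypothetical data values.
data = [10, 60, 50, 30, 40, 20]
print(value_range(data))             # 50
print(variance(data))                # 350.0 (sample, divides by n - 1)
print(variance(data, sample=False))  # population, divides by N
print(std_dev(data))                 # square root of 350

# Hypothetical grades weighted by course units.
grades = [4, 2, 3, 1]
units = [3, 3, 4, 2]
print(weighted_mean(grades, units))  # 32 / 12
```

Dividing by n − 1 instead of N makes the sample variance slightly larger than the population variance computed from the same values.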
Summary of Measures of Variation

Range: distance between the lowest and highest values (symbol: R)
Variance: average of the squares of the distance that each value is from the mean (symbols: s², σ²)
Standard Deviation: square root of the variance (symbols: s, σ)

(4) Coefficient of Variation

→ Ratio of the standard deviation to the mean.
→ Used to compare the spread between sets of data that are measured in different units.
→ Expressed as a percentage (CVar = %).

MEASURES OF POSITION

→ Describe the position or location of particular values along the cumulative distribution.
→ Sometimes useful for determining cut-off points for certain categories.

(1) Standard Score (z score)

→ Number of standard deviations that a data value is above or below the mean.
→ Obtained by subtracting the mean from the value and dividing the result by the standard deviation.

(2) Percentile

→ Percentiles divide the data into 100 equal parts.

1. Arrange the data in order.
2. Compute:
- given = x (value); find = percentile
- given = percentile (the place); find = x (value position)
3. Start at the lowest value and count over to c (the score position).
- If c is a whole number: use the value halfway between the values at positions c and c+1, that is, their sum divided by 2.
- If c is not a whole number: round up to the next whole number.

(3) Quartiles and Deciles

▪ Quartiles divide the data into 4 equal parts.
▪ Deciles divide the data into 10 equal parts.

(4) Interquartile Range

→ Gives information on the values of the middle 50% of the data.
→ The higher the IQR, the larger the variation in the middlemost values of the data.

(5) Outlier

→ An extremely high or extremely low data value when compared with the rest of the data values.
1. Find Q1 and Q3.
2. Find the IQR (Q3 − Q1).
3. Multiply the IQR by 1.5.
4. Subtract the product from Q1 and add it to Q3.
5. Any value below the difference (Q1 − 1.5 × IQR) or above the sum (Q3 + 1.5 × IQR) is an outlier.

Summary of Measures of Position

Standard score / z score: number of standard deviations that a data value is above or below the mean (symbol: z)
Percentile: position in hundredths that a data value holds in the distribution (symbol: Pn)
Decile: position in tenths that a data value holds in the distribution (symbol: Dn)
Quartile: position in fourths that a data value holds in the distribution (symbol: Qn)
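The coefficient of variation, the z score, and the two percentile computations can be sketched as follows. The notes do not show the percentile formulas, so this sketch assumes a common textbook convention: percentile of x = (number of values below x + 0.5) / n × 100, and score position c = n × p / 100. All function names and data are hypothetical.

```python
import math


def coefficient_of_variation(mean, sd):
    """Standard deviation as a percentage of the mean."""
    return sd * 100 / mean


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def percentile_rank(values, x):
    """Given a value x, find its percentile (assumed textbook formula)."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


def value_at_percentile(values, p):
    """Given a percentile p (strictly between 0 and 100), find the data value."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)            # step 1: arrange the data in order
    c = len(ordered) * p / 100          # step 2: score position (assumed formula)
    if c == int(c):
        c = int(c)
        # whole number: halfway between the c-th and (c+1)-th values (1-based)
        return (ordered[c - 1] + ordered[c]) / 2
    # not a whole number: round up and take that position
    return ordered[math.ceil(c) - 1]


# Hypothetical test scores.
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))      # 65.0
print(value_at_percentile(scores, 25))  # 5
print(value_at_percentile(scores, 60))  # 11.0
print(z_score(65, 50, 10))              # 1.5
print(coefficient_of_variation(50, 5))  # 10.0
```

Statistical software often uses interpolation rules for percentiles, so results from a package may differ slightly from this hand-computation convention.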
EXPLORATORY DATA ANALYSIS

→ Resistant statistics are not easily affected by outliers, unlike non-resistant statistics.

Boxplot

→ A graph that shows some of the most important statistics in the data set, specifically:
- The median (central tendency);
- P25 (Q1) and P75 (Q3) (location and variation); and
- Some extreme values.
→ A very versatile graph for showing distributions, comparisons, and associations between variables.
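The statistics behind a boxplot, together with the 1.5 × IQR outlier check, can be sketched as below. One assumption is labeled loudly: Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data, which is one of several quartile conventions and not necessarily the one used in class. The function names and data are hypothetical.

```python
def median_of_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def boxplot_summary(values):
    """Median, quartiles, IQR, outlier fences, and outliers for a data set.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]

    q1 = median_of_sorted(lower_half)
    q3 = median_of_sorted(upper_half)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr      # values below this are outliers
    high_fence = q3 + 1.5 * iqr     # values above this are outliers

    return {
        "min": ordered[0],
        "q1": q1,
        "median": median_of_sorted(ordered),
        "q3": q3,
        "max": ordered[-1],
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }


# Hypothetical data with one suspiciously large value.
data = [5, 6, 12, 13, 15, 18, 22, 50]
summary = boxplot_summary(data)
print(summary["q1"], summary["median"], summary["q3"])  # 9.0 14.0 20.0
print(summary["fences"])                                # (-7.5, 36.5)
print(summary["outliers"])                              # [50]
```

The median and the quartiles are resistant statistics: replacing 50 with an even larger value would leave all three unchanged.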
