Statistics and Biostatistics: Mrs. Khushbu K. Patel Assistant Professor Shri Sarvajanik Pharmacy College
Biostatistics
Mrs. Khushbu K. Patel
Assistant professor
Shri Sarvajanik Pharmacy College
What is Statistics?
Different authors have defined statistics differently. The best definition of statistics is given
by Croxton and Cowden according to whom statistics may be defined as the science, which
deals with collection, presentation, analysis and interpretation of numerical data.
The science and art of dealing with variation in data through collection, classification, and
analysis in such a way as to obtain reliable results. —(John M. Last, A Dictionary of
Epidemiology )
Branch of mathematics that deals with the collection, organization, and analysis of numerical
data and with such problems as experiment design and decision making. —(Microsoft
Encarta Premium 2009)
A branch of mathematics for taking and transforming
numbers into useful information for decision makers.
Improve processes.
Statistics
Sources of data:-
Experiments
Surveys
Records
Example of raw data (blood pressure):
Systolic BP    Diastolic BP
120            80
135            90
125            85
140            95
138            86
Elements, Variables, and Observations
The elements are the entities on which data are collected.
The total number of data values in a data set is the number of elements
multiplied by the number of variables.
Data, Data Sets, Elements, Variables, and Observations
[Table: a data set in which each row is an element (a company name) and the columns are the variables Stock Exchange, Annual Sales ($M), and Earn/Share ($).]
Descriptive statistics
• Collect data
– e.g., Survey
• Present data
– e.g., Tables and graphs
• Characterize data
– e.g., Sample mean = $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
• Estimation
– e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
– e.g., Test the claim that the
population mean weight is 120
pounds
Drawing conclusions about a large group of individuals based on a subset of the large group.
Inferential statistics
Population Sample
Quantitative data (numerical) and qualitative data (categorical)
Quantitative data may be:
• Continuous: would take forever to count. Ex: time
• Discrete: countable in a finite amount of time. Ex: counting the change in your pocket
Type of variables
Categorical (qualitative) variables have values that can only be placed
into categories, such as “yes” and “no.”
Numerical
Nominal
Nominal
Notes
• Lowest level of measurement
• Discrete categories
• No natural order
• Categorical or dichotomous
• May be referred to as qualitative or categorical
Examples
• Gender: 0 = Male, 1 = Female
• Group membership: 1 = Experimental, 2 = Placebo, 3 = Routine
• Marital status, colour, religion, type of car, etc.
Nominal
Nominal sounds like name
Choosing the level of measurement:
• Ordered? No: Nominal. Yes: go to the next question.
• Equally spaced? No: Ordinal. Yes: go to the next question.
• Zero means none? No: Interval. Yes: Ratio.
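The three questions above can be sketched as a small function. This is only an illustration: the function name and its arguments are hypothetical, and the example variables are those used elsewhere in these slides.

```python
def level_of_measurement(ordered, equally_spaced=False, zero_means_none=False):
    """Ask the three questions in order and return the scale name."""
    if not ordered:
        return "Nominal"
    if not equally_spaced:
        return "Ordinal"
    if not zero_means_none:
        return "Interval"
    return "Ratio"


# Example variables taken from the slides
examples = {
    "gender": level_of_measurement(ordered=False),
    "pain scale": level_of_measurement(ordered=True),
    "temperature": level_of_measurement(True, True, False),
    "weight": level_of_measurement(True, True, True),
}

for name, scale in examples.items():
    print(f"{name}: {scale}")
```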
Scale: Nominal
  Number system: Unique definition of numbers (0, 1, 2, ..., 9)
  Example: Roll numbers of students; numbers assigned to basketball players
  Permissible statistics: Percentages, mode, binomial test, chi-square test
Scale: Ordinal
  Number system: Order of numbers (0 < 1 < 2 < ... < 9)
  Example: Student's rank
  Permissible statistics: Percentiles, median, rank-order correlation, two-way ANOVA
Scale: Interval
  Number system: Equality of differences (2 - 1 = 7 - 6)
  Example: Temperature
  Permissible statistics: Range, mean, standard deviation, product-moment correlation, t-test and F-test
Scale: Ratio
  Number system: Equality of ratios (5/10 = 3/6)
  Example: Weight, height, distance
  Permissible statistics: Geometric mean, harmonic mean, coefficient of variation
SOME STATISTICAL TESTS
                          Nominal   Ordinal   Interval   Ratio
Mode                      Yes       Yes       Yes        Yes
Median                    No        Yes       Yes        Yes
Mean                      No        No        Yes        Yes
Frequency distribution    Yes       Yes       Yes        Yes
Range                     No        Yes       Yes        Yes
Add and subtract          No        No        Yes        Yes
Multiply and divide       No        No        No         Yes
Standard deviation        No        No        Yes        Yes
NOIR
Nominal
  Remember: Named classifications; mutually exclusive categories
  Example: Gender
  Central tendency: Mode
  Notes: No order; limited in descriptive ability
Ordinal
  Remember: Ordered or relative rankings; numbers are not equidistant; zero is arbitrary
  Example: Pain scale
  Central tendency: Mode, median
  Notes: Not necessarily equal intervals
Interval
  Remember: Rank ordering; approximately equal intervals; can have negative numbers
  Example: Exam marks
  Central tendency: Mode, median, mean
  Notes: Exact difference between numbers is known; zero is arbitrary
Ratio
  Remember: Rank ordering; equal intervals; absolute zero
  Example: Length, weight
  Central tendency: Mode, median, mean
  Notes: Zero means none
Methods of presentation of data
1 Tabular presentation
2 Graphical presentation
Purpose: To display data so that they can be readily understood.
•Tables and graphs share some common features, but for any specific situation,
one is likely to be more suitable than the other.
Tabular Presentation
Types of tables:-
1. List table:- for qualitative data, count the number of observations
(frequencies) in each category.
A table consisting of two columns, the first giving an identification of the
observational unit and the second giving the value of the variable for that unit.
Example: the number of patients in each hospital department:
Department Number of patients
Medicine 100
Surgery 88
ENT 54
Ophthalmology 30
Tabular Presentation
2. Frequency distribution table:- for qualitative and quantitative
data
Simple frequency distribution table:-
Tabular Presentation
                Lung cancer positive   Lung cancer negative   Total
Smoking         No.     %              No.     %              No.    %
Smoker          15      65.2           8       34.8           23     100
Non smoker      5       13.5           32      86.5           37     100
Total           20      33.3           40      66.7           60     100
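The percentages in this table are row percentages: each count is divided by its row total. A minimal sketch that recomputes them from the counts in the table (the variable and function names are illustrative only):

```python
# Counts from the smoking / lung cancer table
counts = {
    "Smoker": {"positive": 15, "negative": 8},
    "Non smoker": {"positive": 5, "negative": 32},
}


def row_percentages(row):
    """Express each count as a percentage of its row total."""
    total = sum(row.values())
    return {key: round(100 * value / total, 1) for key, value in row.items()}


table = {group: row_percentages(row) for group, row in counts.items()}

# The "Total" row uses the column sums as its counts
column_totals = {
    key: sum(row[key] for row in counts.values())
    for key in ("positive", "negative")
}
total_row = row_percentages(column_totals)

print(table)
print(column_totals, total_row)
```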
Graphical presentation
For quantitative, continuous or measured data:
• Histogram
• Frequency polygon
• Frequency curve
• Line chart
• Scattered or dot diagram
For qualitative, discrete or counted data:
• Bar diagram
• Pie or sector diagram
• Spot map
Bar diagram
Types shown: multiple, component
[Figure: horizontal bar diagram of the number of patients (axis 0 to 120) by condition: Skin, Digestive, Headache, Gynecologic, Respiratory, Circulatory, General, Blood, Endocrine.]
Bar diagram
You must give attention to selecting the appropriate number of class groupings
for the table, determining a suitable width of a class grouping, and establishing
the boundaries of each class grouping to avoid overlapping.
The number of classes depends on the number of values in the data. With a
larger number of values, typically there are more classes. In general, a
frequency distribution should have at least 5 but no more than 15 classes.
To determine the width of a class interval, you divide the range (Highest
value–Lowest value) of the data by the number of class groupings desired.
Example: A manufacturer of insulation randomly selects 20 winter days and records
the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32,
35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 46/5 = 9.2, rounded up to 10
Determine class boundaries (limits):
Class 1: 10 to less than 20
Class 2: 20 to less than 30
Class 3: 30 to less than 40
Class 4: 40 to less than 50
Class 5: 50 to less than 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations & assign to classes
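The steps above can be sketched in a few lines of Python. The choice of 10 as the first lower boundary is an assumption made here to match the class limits listed above (the largest multiple of the width that does not exceed the smallest value).

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
num_classes = 5

data_range = max(temps) - min(temps)            # 58 - 12 = 46
width = math.ceil(data_range / num_classes)     # 46 / 5 = 9.2, rounded up to 10
start = (min(temps) // width) * width           # assumed lower boundary: 10

# Classes are "lower to less than upper"
classes = [(start + i * width, start + (i + 1) * width)
           for i in range(num_classes)]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
frequencies = [sum(lower <= t < upper for t in temps)
               for lower, upper in classes]

for (lower, upper), mid, freq in zip(classes, midpoints, frequencies):
    print(f"{lower} to less than {upper}: midpoint {mid}, frequency {freq}")
```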
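Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Midpoint   Frequency
10 to less than 20     15         3
20 to less than 30     25         6
30 to less than 40     35         5
40 to less than 50     45         4
50 to less than 60     55         2
Total                             20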
Tabulating Numerical Data: Cumulative Frequency
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53,
58
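A cumulative frequency is the running total of the class frequencies. The sketch below reuses the class boundaries from the temperature example; the cumulative percentage column is an addition for illustration.

```python
from itertools import accumulate

temps = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]

frequencies = [sum(lower <= t < upper for t in temps)
               for lower, upper in classes]
cumulative = list(accumulate(frequencies))                  # running totals
cumulative_pct = [100 * c / len(temps) for c in cumulative]

for (lower, upper), f, cf, pct in zip(classes, frequencies,
                                      cumulative, cumulative_pct):
    print(f"{lower} to less than {upper}: f = {f}, cumulative = {cf} ({pct:.0f}%)")
```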
Why Use a Frequency Distribution?
• It condenses the raw data into a more useful form
• It allows for a quick visual interpretation of the data
• It enables the determination of the major characteristics of the data set
including where the data are concentrated / clustered
Frequency Distributions: Some Tips
Different class boundaries may provide different pictures for
the same data (especially for smaller data sets)
Practice work
https://www.mathsisfun.com/data/frequency-distribution.html
Measures of central tendency
• The central tendency is the extent to which all the data values group
around a typical or central value.
The three most commonly used averages are:
• The arithmetic mean
• The Median
• The Mode
Measures of central tendency
1. Mean:-
◦ The arithmetic average of the variable x.
◦ It has good sampling stability (e.g., it varies the least from sample to
sample), implying that it is better suited for making inferences about
population parameters.
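As a minimal sketch, the arithmetic mean is the sum of the values divided by their number. The data here are the 20 daily high temperatures from the frequency distribution example; `statistics.mean` from the Python standard library is shown only as a cross-check.

```python
from statistics import mean

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

n = len(temps)
total = sum(temps)                 # sum of all observations
manual_mean = total / n            # arithmetic mean = sum / n
library_mean = mean(temps)         # same quantity from the standard library

print(n, total, manual_mean, library_mean)
```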
[Figure: two dot plots on a 0 to 10 axis, each with Median = 3.]
Note that (n + 1)/2 is not the value of the median, only the position of the median in the ordered data.
$$\text{Median} = L + \frac{(n/2) - m}{f} \times c$$
Where
L = Lower limit of the median class
n = Total number of observations = $\sum f$
m = Cumulative frequency preceding the median class
f = Frequency of the median class
c = Class interval of the median class
Median for Grouped Data Example
Find the median for the following continuous frequency distribution:
= 2.9375
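The frequency table for this example did not survive in these notes. The sketch below assumes it is the same distribution as in the "Mode for Grouped Data Example" (classes 0-1 to 5-6, total 25), because that distribution reproduces the stated answer of 2.9375. The function name is illustrative.

```python
def grouped_median(classes, freqs):
    """Median = L + ((n/2 - m) / f) * c for a continuous frequency distribution."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    m = 0  # cumulative frequency preceding the current class
    for (lower, upper), f in zip(classes, freqs):
        if m + f >= n / 2:                     # this is the median class
            return lower + ((n / 2 - m) / f) * (upper - lower)
        m += f
    raise ValueError("median class not found")


# Assumed distribution (taken from the mode example in these slides)
classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
freqs = [1, 4, 8, 7, 3, 2]

median = grouped_median(classes, freqs)
print(median)
```

Here n/2 = 12.5, the median class is 2-3 (cumulative frequency reaches 13), so L = 2, m = 5, f = 8 and c = 1.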
Measures of Central Tendency: The Mode
3. Mode:-
◦ The most frequently occurring value in the data set.
◦ May not exist or may not be uniquely defined.
◦ It is the only measure of central tendency that can be used with
nominal variables, but it is also meaningful for quantitative variables
that are inherently discrete.
[Figure: two dot plots. On the first (axis 0 to 14), Mode = 9. On the second (axis 0 to 6), there is no mode.]
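Mode for Grouped Data
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$$
Where
L = Lower limit of the modal class
d1 = f1 - f0 and d2 = f1 - f2
f1 = Frequency of the modal class
f0 = Frequency preceding the modal class
f2 = Frequency succeeding the modal class
c = Class interval of the modal class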
Mode for Grouped Data Example
Example: Find the mode for the following continuous frequency
distribution:
Class    Frequency
0-1      1
1-2      4
2-3      8
3-4      7
4-5      3
5-6      2
Total    25
The modal class is 2-3, since it has the highest frequency (8).
L = 2
d1 = f1 - f0 = 8 - 4 = 4
d2 = f1 - f2 = 8 - 7 = 1
c = 1
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c = 2 + \frac{4}{5} \times 1 = 2.8$$
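The same calculation can be sketched in Python. The function name is illustrative, and treating a missing neighbouring class as frequency 0 is an assumption made here for modal classes at either end of the table.

```python
def grouped_mode(classes, freqs):
    """Mode = L + d1 / (d1 + d2) * c, using the first class with the highest frequency."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])    # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                     # frequency preceding the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0        # frequency succeeding the modal class
    d1, d2 = f1 - f0, f1 - f2
    if d1 + d2 == 0:
        raise ValueError("mode is not defined: all neighbouring frequencies are equal")
    lower, upper = classes[i]
    return lower + d1 / (d1 + d2) * (upper - lower)


classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
freqs = [1, 4, 8, 7, 3, 2]

mode = grouped_mode(classes, freqs)
print(round(mode, 4))
```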
Measure of dispersion
Variation
Example:
[Figure: dot plot on a 0 to 14 axis, with smallest value 1 and largest value 14.]
Range = 14 - 1 = 13
Measure of dispersion
Range:-
It is the difference between the largest and smallest values.
It is the simplest measure of variation.
Disadvantage:- it is based only on two of the observations
and gives no idea of how the other observations are arranged
between these two.
Measures of Variation:
Why The Range Can Be Misleading
Ignores the way in which data are distributed
[Figure: two dot plots on a 7 to 12 axis showing differently distributed data.]
Range = 12 - 7 = 5 for both data sets
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
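A short sketch of how a single outlier changes the range, using the two data sets above (the helper name is illustrative):

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


data_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
data_b = data_a[:-1] + [120]      # same data with the last value replaced by 120

range_a = value_range(data_a)
range_b = value_range(data_b)
print(range_a, range_b)
```

Only one of the 25 observations differs, yet the range changes from 4 to 119.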
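Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Where
$\bar{X}$ = arithmetic mean
n = sample size
Xi = ith value of the variable X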
Measures of Variation: The Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Uses:-
1. It summarizes the deviations of a large distribution from mean in one figure used as
a unit of variation.
2. Indicates whether the difference of an individual observation from the mean is
due to chance (natural variation) or is real, due to some special cause.
3. It also helps in finding the suitable size of sample for valid conclusions.
https://www.mathsisfun.com/data/standard-deviation.html
Measures of Variation: Sample Standard Deviation
Example
Sample data (Xi): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$
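The worked example can be checked step by step with a short sketch. `statistics.stdev` from the Python standard library also divides by n - 1, so it is used here as a cross-check.

```python
import math
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n                                   # 128 / 8 = 16
sum_sq_dev = sum((x - mean) ** 2 for x in data)        # sum of squared deviations = 130
sample_sd = math.sqrt(sum_sq_dev / (n - 1))            # sqrt(130 / 7)

print(mean, sum_sq_dev, round(sample_sd, 4))
print(round(stdev(data), 4))                           # library cross-check
```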
Standard Deviation (Sample) for Grouped Data
Frequency Distribution of Return on Investment of Mutual Funds
Mean = $\bar{X} = \frac{\sum fX}{n}$ = 1040/60 = 17.333
Standard Deviation = $S = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}} = \sqrt{\frac{2448.33}{59}} = 6.44$
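The mutual fund frequency table is not reproduced in these notes, so the sketch below only checks the final arithmetic step from the sums given above, then applies the same formula to a different distribution (the one from the mode example), using class midpoints as X. The function name and the choice of that distribution are assumptions made for illustration.

```python
import math


def grouped_sample_sd(midpoints, freqs):
    """S = sqrt( sum f (X - mean)^2 / (n - 1) ), with X taken as the class midpoint."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return mean, math.sqrt(sum_sq_dev / (n - 1))


# Final step of the slide's example, from the sums given there
slide_sd = math.sqrt(2448.33 / 59)

# Illustration on the distribution from the mode example (classes 0-1 ... 5-6)
midpoints = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
freqs = [1, 4, 8, 7, 3, 2]
mean, sd = grouped_sample_sd(midpoints, freqs)

print(round(slide_sd, 2))
print(round(mean, 2), round(sd, 4))
```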
Assignment
Class Frequency
700-799 4
800-899 7
900 8
1000 10
1100 12
1200 17
1300 13
1400 10
1500 9
1600 7
1700 2
1800-1899 1
Measure of dispersion
Coefficient of variation:-
The coefficient of variation expresses the standard deviation as a
percentage of the sample mean.
$$CV = \frac{S}{\bar{X}} \times 100\%$$
Measures of Variation: Comparing Standard
Deviations
The coefficient of variation (CV) is a measure of relative variability. It is the ratio of
the standard deviation to the mean (average).
Data A (values plotted on an axis from 11 to 21):
Mean = 15.5
S = 3.338
CV = 21.53%
• Drug A sale:
– Average price last year = $50
– Standard deviation = $5
$$CV_A = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$$
• Drug B sale:
– Average price last year = $100
– Standard deviation = $5
$$CV_B = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$$
Both drugs have the same standard deviation, but drug B is less variable relative to its price.
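A minimal sketch of the coefficient of variation for the two drugs and for Data A, using the figures given in these slides (the function name is illustrative):

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / mean) * 100, expressed as a percentage."""
    return 100 * std_dev / mean


cv_drug_a = coefficient_of_variation(5, 50)        # S = $5, mean = $50
cv_drug_b = coefficient_of_variation(5, 100)       # S = $5, mean = $100
cv_data_a = coefficient_of_variation(3.338, 15.5)  # Data A from the earlier slide

print(cv_drug_a, cv_drug_b, round(cv_data_a, 1))
```

The two drugs share the same standard deviation, but the smaller CV for drug B shows less variation relative to its average price.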