Difference Between (Median, Mean, Mode, Range, Midrange) (Descriptive Statistics)
Categorical variable: a variable that takes one of a fixed set of values. Ex: colors (red, green, ...)
and coffee type (hot/cold).
Quantitative variable: a variable whose values are numbers rather than categories.
Statistics types:
Frequency tables:
Two-way frequency table: most commonly used to represent the (frequency) relationships between
categorical data.
Two-way relative frequency table: expresses each count as a percentage of its column total in the
frequency table, and is used to check whether there is an association between two variables.
You can visualize a frequency table by drawing dot plots.
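A small sketch of how a two-way relative frequency table can be built from a two-way frequency table. The counts and the category names (coffee type by customer group) are made up for illustration, and the helper name column_relative_frequencies is hypothetical.

```python
# Hypothetical counts: rows are coffee type, columns are customer group.
counts = {
    "hot": {"students": 30, "staff": 20},
    "cold": {"students": 10, "staff": 40},
}

def column_relative_frequencies(table):
    """Divide every cell by the total of its column."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {r: {c: table[r][c] / totals[c] for c in columns} for r in table}

relative = column_relative_frequencies(counts)
for row_name, row in relative.items():
    print(row_name, {c: f"{v:.0%}" for c, v in row.items()})
```

If the percentages in a row differ a lot from column to column, as they do with these made-up counts, that suggests an association between the two variables.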
Graphs:
Outliers: data points whose values are much larger or much smaller than the rest of the data points in the dataset.
A data point counts as an outlier if it is < Q1 – 1.5 * IQR or > Q3 + 1.5 * IQR, where IQR = Q3 – Q1.
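The 1.5 * IQR rule can be sketched as follows. The quartiles here are computed as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd); that is one common convention and an assumption of this sketch, since other tools compute quartiles slightly differently. The data list is made up.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Skip the middle value when the number of data points is odd.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return the values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [2, 3, 4, 5, 6, 7, 8, 9, 30]
print(find_outliers(data))  # [30]
```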
3- Population standard deviation (σ): the typical amount by which a single measurement differs from the mean.
It is the square root of the variance.
4- IQR: the measure of spread to use when there are outliers and you use the median instead of the mean for central tendency.
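As an illustration of the population standard deviation, here is a minimal sketch that computes the variance first and then takes its square root. The data values are made up.

```python
from math import sqrt

def population_variance(values):
    """Average squared distance of each value from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def population_std(values):
    """Square root of the population variance."""
    return sqrt(population_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(population_std(data))       # 2.0
```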
Percentile: the percentage of data points that are below (or, by another definition, at or below) the data point in question.
Cumulative relative frequency graph: plots the data values on the x-axis against their percentiles on the y-axis.
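Both percentile definitions can be sketched in a few lines. The scores are made up and the function names are hypothetical.

```python
def percentile_below(values, x):
    """Percentage of data points strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)

def percentile_at_or_below(values, x):
    """Percentage of data points at or below x."""
    return 100 * sum(v <= x for v in values) / len(values)

scores = [50, 60, 70, 80, 90]
print(percentile_below(scores, 80))        # 60.0
print(percentile_at_or_below(scores, 80))  # 80.0
```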
Z-score: measures exactly how many standard deviations above or below the mean a data point is.
1- Z-score = (data point – mean) / standard deviation
2- A positive z-score says the data point is above average / A negative z-score says the data point is below
average / A z-score close to 0 says the data point is close to average.
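A quick sketch of the z-score calculation. The mean of 70 and standard deviation of 10 are made-up numbers.

```python
def z_score(data_point, mean, standard_deviation):
    """How many standard deviations the data point is from the mean."""
    return (data_point - mean) / standard_deviation

print(z_score(85, 70, 10))  # 1.5, above average
print(z_score(60, 70, 10))  # -1.0, below average
print(z_score(70, 70, 10))  # 0.0, exactly average
```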
Z-table: used to find what proportion (percentage) of the data is below a certain value in a normal distribution.
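Instead of looking a value up in a printed z-table, the same proportion can be computed with the standard normal cumulative distribution function. This sketch assumes Python 3.8 or later, where statistics.NormalDist is available; the function name proportion_below is hypothetical.

```python
from statistics import NormalDist

def proportion_below(z):
    """Proportion of a normal distribution that lies below a given z-score."""
    return NormalDist().cdf(z)

print(round(proportion_below(0), 4))  # 0.5
print(round(proportion_below(1), 4))  # 0.8413
```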
Distribution:
1- Marginal distribution
2- Conditional distribution
1- Center
2- Spread/variability
Distribution shapes:
1- Right tailed distribution (right skewed) --> the mean is greater than the median
2- Left tailed distribution (left skewed) --> the mean is smaller than the median
3- Symmetrical (normal distribution) --> the mean is at the center and the curve takes a bell shape
1- The area within 1 standard deviation above and below the mean of a normal distribution is about
68% of the whole area.
2- The area within 2 standard deviations above and below the mean of a normal distribution is about
95% of the whole area.
3- The area within 3 standard deviations above and below the mean of a normal distribution is about
99.7% of the whole area.
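These three percentages can be checked numerically. The sketch below uses the standard normal distribution (mean 0, standard deviation 1) from Python's statistics module (3.8 or later); the function name area_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```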
Scatterplot (bivariate relationship): a plot used to analyze the relationship between two variables: whether there is a linear
relationship (correlation) between them, whether it is positive or negative, how strong it is, and whether there are
outliers
Linear regression:
When we see a relationship in a scatterplot, we can use a line to summarize the relationship in the data.
We can also use that line to make predictions in the data. This process is called linear regression.
We fit a line through all the data and use its equation to make predictions
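One common way to choose the line is least squares, which picks the slope and intercept that minimize the sum of squared vertical distances to the points. The notes do not say which fitting method they have in mind, so treat this as a sketch under that assumption. The data points are made up.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Use the fitted line to predict y for a new x."""
    return slope * x + intercept

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))    # 0.6 2.2
print(round(predict(6, slope, intercept), 2))  # 5.8
```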
Residuals:
A residual is a measure of how well a line fits an individual data point
The vertical distance is known as a residual. For data points above the line, the residual is positive, and for
data points below the line, the residual is negative
The sum of squared residuals gives a sense of whether a particular regression line fits better than another line
A squared residual is also called a squared error
The sum of squared residuals is called the prediction error, and we choose the line that minimizes that error
Residual = actual value – predicted value (from the line)
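To make the comparison of two candidate lines concrete, here is a small sketch that computes the residuals and the sum of squared residuals for each. The three data points and both lines are made up.

```python
def residuals(xs, ys, slope, intercept):
    """Actual y minus the y predicted by the line, for every point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def sum_squared_residuals(xs, ys, slope, intercept):
    """Total squared error of the line over all points."""
    return sum(r ** 2 for r in residuals(xs, ys, slope, intercept))

xs = [1, 2, 3]
ys = [2, 3, 7]

ssr_a = sum_squared_residuals(xs, ys, slope=2.5, intercept=-1.0)  # y = 2.5x - 1
ssr_b = sum_squared_residuals(xs, ys, slope=1.0, intercept=1.0)   # y = x + 1

print(residuals(xs, ys, 2.5, -1.0))  # [0.5, -1.0, 0.5]
print(ssr_a, ssr_b)                  # 1.5 9.0
```

The first line has the smaller sum of squared residuals, so it fits these points better than the second.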
Residual plot:
It plots the distribution of the residuals to check whether the line is a good fit or not
To calculate the percentage of the variation in y that is not described by the variation in x (the regression line):
1- It equals total squared error from the line / total variation in y
2- Total squared error from the line = the total variation in y not described by the regression line
3- Note: total variation in y = the sum of the squared deviations of each point from the mean of y
To calculate the percentage of the variation in y that is described by the variation in x (called the coefficient of
determination):
1- It equals 1 – the percentage of variation that is not described by the variation in x (the regression line)
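The two steps above can be sketched directly. The data points and the line y = 2.5x - 1 are made up for illustration, and the function name r_squared is hypothetical.

```python
def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination for the line y = slope * x + intercept."""
    mean_y = sum(ys) / len(ys)
    # Variation in y that the line does NOT describe.
    squared_error_line = sum(
        (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)
    )
    # Total variation in y around its mean.
    total_variation_y = sum((y - mean_y) ** 2 for y in ys)
    unexplained_share = squared_error_line / total_variation_y
    return 1 - unexplained_share

xs = [1, 2, 3]
ys = [2, 3, 7]
print(round(r_squared(xs, ys, slope=2.5, intercept=-1.0), 4))  # 0.8929
```

Here the squared error from the line is 1.5 and the total variation in y is 14, so about 89% of the variation in y is described by the line.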
A stem and leaf plot displays numerical data by splitting each data point into a "leaf" (usually the last digit)
and a "stem" (the leading digit or digits).
Population vs samples:
Experiment setup:
Each experiment consists of:
1- Explanatory variable: a variable that explains changes in another variable
2- Response variable: the variable that measures the result of a study
3- Treatment group: a group that receives the treatment whose effect we want to test
4- Control group: a group that doesn't receive the treatment, for comparison with the treatment group
Note: When we say there's potential bias, we should also be able to argue if the results will probably be an
overestimate or an underestimate.
Causality vs correlation:
Causality: for example: a causes b
Correlation: a and b are observed together: whenever I see a I see b, and the reverse
Probability:
Probability is simply how likely something is to happen (chance).
Probability of a condition = (number of ways the condition can happen) / (total number of
outcomes)
The probability of event A for example is often written as P(A)
If P(A) > P(B), then event A has a higher chance of occurring than event B
If P(A) = P(B), then events A and B are equally likely to occur.
Theoretical vs experimental probability:
Theoretical probability: what we expect to happen (the ideal form of probability)
Experimental probability: what actually happens when we try it out.
The more times the experiment is conducted, the closer the experimental probability gets to the theoretical one
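A simulation can illustrate how the experimental probability approaches the theoretical one. This sketch simulates fair coin flips with Python's random module; the seed is fixed only so that runs are repeatable, and the function name is hypothetical.

```python
import random

def experimental_probability(num_flips, seed=0):
    """Fraction of heads observed in num_flips simulated fair coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

theoretical = 0.5
for num_flips in (10, 1_000, 100_000):
    observed = experimental_probability(num_flips)
    print(num_flips, observed, abs(observed - theoretical))
```

With only a few flips the observed fraction can be far from 0.5; with many flips it typically lands very close.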
Sample space:
A set that contains all the different possible outcomes of an experiment
Sample space for compound events:
A set that contains all the different possible outcomes of an experiment made up of compound events
(when an outcome combines multiple simple events, like P(HH) or P(HHH))
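The sample space for several coin flips can be listed by enumeration. A minimal sketch using itertools.product, with H for heads and T for tails:

```python
from itertools import product

def sample_space(num_flips):
    """All possible outcome sequences for num_flips coin flips."""
    return ["".join(outcome) for outcome in product("HT", repeat=num_flips)]

print(sample_space(2))       # ['HH', 'HT', 'TH', 'TT']
print(len(sample_space(3)))  # 8
```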
Compound probability of independent events:
Compound means the event combines several simple events. For example: the probability of getting two heads in a row from
flipping a coin twice, P(HH) = ¼
Independent: means the probability of the second flip does not depend on the first flip
so for example: P(HH) = P(H1) * P(H2) = ½ * ½ = ¼
Another way to calculate it: P(n heads in a row) = P(H)^n
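The multiplication for independent flips can be checked with exact fractions. A short sketch; the function name prob_all_heads is hypothetical.

```python
from fractions import Fraction

def prob_all_heads(num_flips, p_heads=Fraction(1, 2)):
    """Probability that every one of num_flips independent flips is heads."""
    return p_heads ** num_flips

print(prob_all_heads(2))  # 1/4
print(prob_all_heads(3))  # 1/8
```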
Conditional probability (Dependent Event) :
Is when one event is dependent on the outcome of another event
Ex: if we draw from 2 name tags, giving the first name drawn the first prize and the second the second
prize, then the probability for the second draw depends on which tag was picked first.
When we calculate probabilities involving one event AND another event dependent on the first,
we multiply their probabilities: P(A and B) = P(B∣A) * P(A) = P(A∣B) * P(B)
Rearranging this gives Bayes' theorem, P(A∣B) = P(B∣A) * P(A) / P(B), which describes the probability of an
event based on prior knowledge of conditions that might be related to the event
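Here is a worked sketch of the multiplication rule for dependent events, followed by the rearrangement that yields P(A∣B). The scenario is made up: a bag with 3 red and 2 blue marbles, drawing twice without replacement, with A = "first marble is red" and B = "second marble is red".

```python
from fractions import Fraction

# Hypothetical bag: 3 red and 2 blue marbles, two draws without replacement.
p_a = Fraction(3, 5)              # P(A): first marble is red
p_b_given_a = Fraction(2, 4)      # P(B|A): second is red, given first was red
p_b_given_not_a = Fraction(3, 4)  # P(B|not A): second is red, given first was blue

# Multiplication rule for dependent events: P(A and B) = P(B|A) * P(A)
p_a_and_b = p_b_given_a * p_a

# P(B) adds up both ways the second marble can be red.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Rearranged form: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(p_a_and_b)    # 3/10
print(p_b)          # 3/5
print(p_a_given_b)  # 1/2
```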
Probabilities involving "at least one" success:
Rule: P(at least 1 success)= 1- P(all failures)
Ex: surgeries involving implants sometimes result in the patient's body rejecting the implant. A certain
surgery has a rejection rate of 11%. The rest of the patients successfully accept the implant.
Assume that the results for each patient are independent. In a random sample of 8 of these surgeries, find
the probability that at least one patient rejects the implant.
Solution: P(accept) = 0.89
P(at least one rejects) = 1 − P(all 8 accept) = 1 − (0.89)^8 ≈ 0.61
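The arithmetic for this example can be sketched as follows, using the 11% rejection rate and the sample of 8 from the example. The function name is hypothetical.

```python
def prob_at_least_one_rejection(p_accept, num_patients):
    """1 minus the probability that every independent patient accepts."""
    return 1 - p_accept ** num_patients

p_accept = 1 - 0.11
print(round(prob_at_least_one_rejection(p_accept, 8), 4))  # 0.6063
```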
Permutation:
The number of possible ways (scenarios) of arranging members of a set into a sequence
Ex: the number of ways of seating 5 people in 3 chairs
Law: n! / (n-k)! --> where n is the number of people and k is the number of chairs
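The permutation formula applied to the chairs example, as a short sketch. It assumes Python 3.8 or later, where math.perm is available, and checks the built-in against the factorial formula.

```python
from math import factorial, perm

def permutations_count(n, k):
    """Number of ordered arrangements of k items chosen from n: n! / (n-k)!"""
    return factorial(n) // factorial(n - k)

print(permutations_count(5, 3))  # 60
print(perm(5, 3))                # 60, same result from the standard library
```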
Combination:
Ex: how many distinct groups (sets) of people, ignoring seating order, there are among the possibilities of seating 5 people
in 3 chairs
Or how many ways you can pick k things from n things without caring about the order
Law: permutations / k! --> where k! is the number of ways we can arrange k things in k spaces
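The combination formula applied to the same 5 people and 3 chairs, as a short sketch. It assumes Python 3.8 or later for math.comb and math.perm.

```python
from math import comb, factorial, perm

def combinations_count(n, k):
    """Number of unordered selections: permutations divided by k!"""
    return perm(n, k) // factorial(k)

print(combinations_count(5, 3))  # 10
print(comb(5, 3))                # 10, same result from the standard library
```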
Binomial coefficient (combination):
Used to calculate the number of possible outcomes that contain a given number of successes
For example, when calculating how many ways there are of getting exactly 3 heads after flipping 5
coins
Solution: the first of the three heads has 5 places to land in, the second then has 4, and the
third has 3, so the total is 5*4*3 = 60 possibilities of having exactly 3
heads when taking the order into consideration
If we do not want to count the different orderings separately, we divide by the factorial of the number of
heads, 3! = 3*2*1 = 6, so the total number of possibilities without taking order into consideration is
60/6 = 10 possibilities
This is the same general rule as the combination
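The count of 10 can be double-checked by brute force: list every sequence of 5 flips and keep the ones with exactly 3 heads. A minimal sketch:

```python
from itertools import product
from math import comb

# Every possible sequence of 5 coin flips (32 in total).
sequences = ["".join(s) for s in product("HT", repeat=5)]

# Keep only the sequences that contain exactly 3 heads.
exactly_three_heads = [s for s in sequences if s.count("H") == 3]

print(len(exactly_three_heads))  # 10
print(comb(5, 3))                # 10, the binomial coefficient
```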
Note: when calculating probabilities for an unfair coin, for example one with a head probability of 80%,
where we need the probability of 4 heads out of 6 flips, P(4 H in 6 flips): (binomial probability)
First we calculate the probability of one particular sequence, for example: HHHTHT
Then we calculate how many sequences have 4 heads after flipping 6 times by using the binomial coefficient
Then we multiply the number of sequences by the probability of one sequence to get the overall
probability
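The three steps above, applied to the 80% coin and 4 heads out of 6 flips, can be sketched like this. It assumes Python 3.8 or later for math.comb; the function name is hypothetical.

```python
from math import comb

def binomial_probability(num_heads, num_flips, p_heads):
    """Probability of exactly num_heads heads in num_flips independent flips."""
    # Step 1: probability of one particular sequence, such as HHHTHT.
    one_sequence = p_heads ** num_heads * (1 - p_heads) ** (num_flips - num_heads)
    # Step 2: how many sequences have that many heads (binomial coefficient).
    num_sequences = comb(num_flips, num_heads)
    # Step 3: multiply the two.
    return num_sequences * one_sequence

print(comb(6, 4))                                  # 15
print(round(binomial_probability(4, 6, 0.8), 5))   # 0.24576
```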
10% Rule:
When taking a sample without replacement, for example surveying people at a mall as they head for the
exit, the observations are dependent, and you can't make them independent because you can't
stop people from leaving through the gates. However, if the sample size is less than or equal to 10% of the
population size, we can treat the observations as independent (a binomial
variable)