Statistics NOTES SEM2
Module I: Introduction
● Measurements: These are quantitative attributes expressed numerically (e.g., height, weight,
income level).
Key Point: Data serves as the raw material for statistical analysis. By analyzing data, we can:
● Make inferences about the larger group from which the data was collected.
Additionally:
● Data can be categorized into different types based on its characteristics, such as quantitative and qualitative.
● The method of data collection is crucial and can influence the analysis (e.g., surveys, experiments, observational studies).
● Effective data analysis often requires organizing the data in a structured format like tables or
spreadsheets.
In essence, data provides the foundation for statistical inquiry, and statistical methods allow us to extract meaning from it.
The nature of data in statistics can be explored through several key aspects:
1. Type of Data:
● Quantitative vs. Qualitative: Data can be numerical (quantitative), allowing for mathematical operations, or descriptive (qualitative), capturing categories and characteristics.
● Discrete vs. Continuous: Quantitative data can be further categorized. Discrete data takes on distinct, countable values (e.g., number of siblings, number of cars owned). Continuous data can theoretically take on any value within a range (e.g., weight, temperature, reaction time).
2. Scale of Measurement:
● The level of measurement determines the mathematical operations permissible on the data.
○ Nominal: Categorical data with no intrinsic order (e.g., eye color, political party affiliation).
○ Ordinal: Categorical data with a rank or order (e.g., customer satisfaction rating,
course grades).
○ Interval: Numerical data with consistent intervals between units, but no absolute zero point (e.g., temperature in Celsius, IQ scores).
○ Ratio: Numerical data with a true zero point, allowing ratio comparisons (e.g., weight,
time, income).
3. Inherent Variability:
● Data often exhibits variability, meaning individual observations may differ within the dataset.
This variability can be random or systematic and needs to be considered during analysis.
4. Context:
● The meaning and interpretation of data heavily depend on the context in which it was collected. Understanding the data collection process and any potential biases is crucial.
5. Representation:
● Data can be presented in various forms, including raw numbers, frequency tables, histograms, and scatter plots. The chosen representation can influence how we perceive the data and the patterns within it.
6. Role in Inference:
● Data is the foundation for statistical inference. We cannot directly observe entire populations, so data from samples allows us to draw conclusions about the larger group.
● The nature of data dictates the appropriate statistical methods to be used for analysis.
By understanding these aspects of data's nature, you can effectively analyze it, draw sound conclusions, and communicate your findings clearly.
Characteristics Of Data
Data in statistics can be characterized along several dimensions that influence how we analyze and interpret it:
1. Accuracy:
● Refers to the correctness and freedom from errors in the data. Inaccurate data can lead to
misleading conclusions.
2. Completeness:
● Indicates whether all relevant data points are present. Missing data can introduce bias and
hinder analysis.
3. Consistency:
● Ensures the data follows consistent formatting and measurement scales throughout the dataset.
4. Relevance:
● Addresses whether the data pertains to the question or problem at hand. Irrelevant data adds noise and can obscure meaningful patterns.
5. Timeliness:
● Refers to how up-to-date the data is. Outdated data may not reflect current trends or
conditions.
6. Granularity:
● The level of detail within the data. More granular data provides a richer picture but can be
computationally expensive to analyze. Conversely, less granular data may obscure important
details.
7. Accessibility:
● Refers to the ease with which data can be accessed, retrieved, and manipulated. Inaccessible data limits the scope of the analysis.
8. Security:
9. Data Types:
10. Biases:
● Biases can be introduced during data collection or measurement. Recognizing and mitigating potential biases is crucial for drawing valid conclusions.
Understanding these characteristics allows you to assess the quality of your data and determine its
suitability for specific statistical analyses. By carefully considering these aspects, you can ensure your
statistical endeavors are based on a solid foundation and produce reliable results.
Data Analysis
Data analysis is the systematic process of inspecting, cleansing, transforming, and modeling data with the objective of discovering useful information, informing conclusions, and supporting decision-making. It typically proceeds through the following stages:
1. Data Collection:
● This initial stage involves gathering data relevant to the research question or problem at hand. Sources can include surveys, experiments, observations, or existing databases.
2. Data Cleaning:
● Real-world data often contains errors, inconsistencies, and missing values. This phase
focuses on identifying and correcting these issues to ensure the integrity of the data.
3. Exploratory Data Analysis (EDA):
● This preliminary analysis aims to understand the data's characteristics, central tendencies, variability, and potential relationships between variables. Techniques like descriptive statistics, histograms, and boxplots are commonly employed here (a brief sketch follows).
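As a rough illustration of the collection, cleaning, and EDA stages, here is a minimal pandas sketch; the file name scores.csv and the column exam_score are purely hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

# 1. Load the collected data (hypothetical file and column names)
df = pd.read_csv("scores.csv")

# 2. Data cleaning: remove duplicate rows and rows with missing values
df = df.drop_duplicates().dropna()

# 3. Exploratory data analysis: descriptive statistics and a quick look at distributions
print(df.describe())                           # count, mean, std, min, quartiles, max
print(df["exam_score"].value_counts().head())  # most common values
df.hist(bins=20)                               # histograms of all numeric columns
plt.show()
```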
4. Data Transformation:
● In some cases, data may need to be transformed to meet the assumptions of specific
statistical tests. This might involve scaling, centering, or creating new variables.
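A minimal sketch of the centering and scaling mentioned above, using a small made-up array:

```python
import numpy as np

x = np.array([12.0, 15.0, 9.0, 21.0, 18.0])   # illustrative raw values

# Centering: subtract the mean so the transformed variable has mean 0
centered = x - x.mean()

# Scaling to z-scores: divide the centered values by the sample standard deviation
z_scores = centered / x.std(ddof=1)

print(centered)   # [-3.  0. -6.  6.  3.]
print(z_scores)
```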
5. Statistical Modeling:
● Based on the research question and data characteristics, appropriate statistical models are chosen (e.g., regression analysis, hypothesis testing). These models help us understand relationships between variables, test hypotheses, and draw inferences about the population from which the sample was drawn.
6. Interpretation and Communication:
● The final stage involves presenting the findings of the data analysis in a clear and concise manner. This often involves tables, charts, and explanations of the statistical results in the context of the original research question.
● Software Tools: Statistical software packages like R, Python (with libraries like pandas and
scikit-learn), SPSS, and SAS are widely used for data analysis tasks.
● Ethical Considerations: Responsible data analysis requires considering ethical issues like privacy, informed consent, and the potential misuse of results.
By following these steps and considering the various aspects, you can effectively analyze data and draw trustworthy conclusions.
Parametric VS Nonparametric
In statistical analysis, choosing between parametric and non-parametric methods hinges on the
assumptions you can make about your data. Here's a breakdown of both approaches:
Parametric Statistics:
● Assumptions: These tests rely on assumptions about the underlying population distribution (often normality) and the characteristics of the data, such as equal variances between groups.
● Tests: Commonly used parametric tests include t-tests (independent and paired samples), the z-test, ANOVA, and Pearson's correlation.
● Strengths:
○ Provide more detailed information about the data, such as means and standard
deviations.
● Weaknesses:
○ May not be suitable for non-normal data or data with unequal variances.
Non-parametric Statistics:
● Assumptions: These tests make few or no assumptions about the underlying population distribution or data characteristics.
● Tests: Commonly used non-parametric tests include the Mann-Whitney U test (equivalent to the independent samples t-test), the Wilcoxon signed-rank test (equivalent to the paired samples t-test), and Spearman's rank correlation coefficient.
● Strengths:
○ More robust to violations of assumptions and can be used with non-normal data or ordinal data.
○ Easier to interpret for non-statisticians as they often rely on rankings rather than raw
data values.
● Weaknesses:
○ Less powerful and statistically efficient than parametric tests when assumptions hold
true.
Here are some key factors to consider when deciding between parametric and non-parametric
statistics:
● Data Type: Is your data continuous or categorical? Parametric tests are generally suited for continuous data, while non-parametric tests handle categorical and ordinal data well.
● Normality: Can you reasonably assume your data is normally distributed? If unsure, a normality check (e.g., a histogram, Q-Q plot, or Shapiro-Wilk test) can guide the choice; a small comparison sketch follows.
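To make the contrast concrete, here is a small SciPy sketch that runs a parametric test and its non-parametric counterpart on the same made-up groups:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two independent groups
group_a = np.array([23, 25, 28, 30, 27, 26, 24])
group_b = np.array([31, 29, 35, 33, 30, 32, 34])

# Parametric: independent-samples t-test (assumes approximate normality)
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Non-parametric counterpart: Mann-Whitney U test (rank-based, fewer assumptions)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test:       t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {u_p:.4f}")
```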
Descriptive statistics meticulously describes the properties of a data set, providing a comprehensive
portrait. It focuses on summarizing and presenting key features of the data, laying the groundwork for
further analysis. Here are some prominent tools employed in descriptive statistics:
● Measures of Central Tendency: These measures pinpoint the "center" of the data, including
the mean (average), median (middle value), and mode (most frequent value). They offer
valuable insights into the typical values within the data set.
● Measures of Dispersion: These metrics quantify the data's spread or variability. Common
measures include variance, standard deviation, and range. Understanding the spread allows us to gauge how consistent or dispersed the values are.
● Data Visualization: Visualizations like histograms, boxplots, and scatter plots effectively
portray the data's distribution and potential relationships between variables. These graphical tools often reveal patterns that summary numbers alone would miss.
Inferential statistics, in contrast, ventures beyond the confines of the data set itself. It leverages
information from a sample to make inferences about a larger population from which the sample was
drawn. This allows us to generalize our findings and apply them to a broader context. Here are some key tools of inferential statistics:
● Hypothesis Testing: This process involves formulating a null hypothesis (no difference
between groups) and an alternative hypothesis (there is a difference). Statistical tests are
conducted to assess the evidence against the null hypothesis, allowing us to draw conclusions about whether the observed effect is statistically significant.
● Confidence Intervals: These intervals estimate a population parameter (e.g., mean) with a
certain level of confidence. We can say that the true population parameter is likely to fall
within this range. Confidence intervals provide a measure of precision associated with our
estimates.
● Sample Size and Statistical Power: The size of the sample and the chosen statistical test
influence the power of the analysis. A larger sample size and a well-chosen test lead to a more powerful analysis, i.e., a better chance of detecting a true effect.
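To illustrate the confidence-interval idea above, here is a minimal SciPy sketch that computes a 95% interval for a mean from a small made-up sample:

```python
import numpy as np
from scipy import stats

sample = np.array([4.8, 5.1, 5.5, 4.9, 5.3, 5.0, 5.2, 4.7])  # hypothetical measurements

mean = sample.mean()
sem = stats.sem(sample)        # standard error of the mean
n = len(sample)

# 95% confidence interval for the population mean, based on the t distribution
low, high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```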
Descriptive statistics serves as the foundation for inferential statistics. By thoroughly understanding
the data's characteristics through descriptive methods, we can select appropriate inferential
techniques and interpret their results with greater accuracy. Descriptive statistics provides the context,
while inferential statistics allows us to make generalizations and draw conclusions that extend beyond the sample itself.
In Conclusion:
● Inferential statistics allows us to make inferences about a population based on sample data.
Quantitative data refers to measurable characteristics that can be expressed numerically and
subjected to mathematical operations. It allows us to quantify the world around us. Here are some key aspects of quantitative data:
● Levels of Measurement: There are different levels of measurement for quantitative data:
○ Nominal: Categorical data with no inherent order (e.g., eye color, political party affiliation).
○ Ordinal: Categorical data with a rank or order (e.g., customer satisfaction rating,
course grades).
○ Interval: Numerical data with consistent intervals between units, but no absolute zero point (e.g., temperature in Celsius, IQ scores).
○ Ratio: Numerical data with a true zero point, allowing ratio comparisons (e.g., weight,
time, income).
● Identifying patterns and trends within the data set through statistical analysis.
Qualitative data, in contrast, focuses on descriptive characteristics that are not easily quantified. It
delves into the subjective realm of words, experiences, and perceptions. Here are some key aspects of qualitative data:
● Focus on Meanings and Experiences: Qualitative data aims to capture the richness and depth of people's experiences, meanings, and perspectives.
● Gaining deeper insights into motivations, opinions, and experiences that may not be easily
captured by numbers.
● Identifying emerging themes and patterns within a dataset through textual analysis.
While qualitative and quantitative data represent distinct approaches, their true power lies in their
potential synergy. Employing both methods within a research study can provide a more holistic
understanding of the phenomenon under investigation. Quantitative data offers the precision of numbers, while qualitative data supplies context and depth.
In Conclusion:
A clear understanding of the distinction between qualitative and quantitative data is essential for
researchers and statisticians. Selecting the appropriate data collection methods and analysis
techniques based on the data type allows us to leverage the full potential of data for robust and
insightful analysis.
Module II: Measures of Central Tendency and Variability
In statistical analysis, measures of central tendency serve as essential tools for summarizing a data
set and identifying its "center." These metrics provide a single value that represents the most
representative or typical value within the data. Three prominent measures of central tendency play a central role: the mean, the median, and the mode.
The mean, often referred to as the average, is a widely used measure of central tendency. It is
calculated by summing the values of all data points in the set and then dividing by the total number of
data points. The mean essentially balances all the values in the data set, finding the central point around which they cluster.
The median, in contrast, focuses on the middle value when the data is arranged in ascending or
descending order. If you have an odd number of data points, the median is the exact middle value.
With an even number of data points, the median is the average of the two middle values. The median
is like finding the person standing exactly in the middle of a line-up, unaffected by extreme values at
either end.
The mode is the most frequently occurring value in the data set. It is like a popularity contest, highlighting the data point that has the most "votes." The mode can be particularly useful for
categorical data, where you might be looking for the most common category. However, it's important
to note that data can have multiple modes (bimodal or multimodal), or even no mode at all (uniform
distribution).
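A quick sketch computing all three measures for a made-up data set with Python's built-in statistics module:

```python
import statistics

data = [4, 7, 7, 8, 9, 10, 12, 12, 12, 15]    # illustrative values

print("mean  :", statistics.mean(data))    # (4 + 7 + ... + 15) / 10 = 9.6
print("median:", statistics.median(data))  # average of the 5th and 6th values = 9.5
print("mode  :", statistics.mode(data))    # 12 occurs most often
```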
The standard deviation (SD) is arguably the most widely used measure of variability. It is the square root of the average squared deviation of each data point from the mean, and can be read as a typical distance of observations from the mean. Imagine the mean as the center of a seesaw, and the standard deviation reflects how far each data point teeters away from that center on average.
Quartile deviation (QD) specifically focuses on the variability within the middle 50% of the data, excluding the potential influence of outliers. It is half the interquartile range (IQR), the difference between the third quartile (Q3) and the first quartile (Q1) of the data: QD = (Q3 − Q1) / 2.
Average deviation (AD) calculates the average of the absolute deviations of each data point from the
mean. In simpler terms, it calculates how far each data point is away from the mean, in absolute
values (without considering positive or negative direction), and then averages those distances.
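A sketch computing all three measures of variability for the same kind of made-up sample with NumPy, taking QD as half the interquartile range:

```python
import numpy as np

data = np.array([4, 7, 7, 8, 9, 10, 12, 12, 12, 15])   # illustrative values

# Standard deviation (sample SD, dividing by n - 1)
sd = data.std(ddof=1)

# Quartile deviation: QD = (Q3 - Q1) / 2
q1, q3 = np.percentile(data, [25, 75])
qd = (q3 - q1) / 2

# Average (mean absolute) deviation from the mean
ad = np.abs(data - data.mean()).mean()

print(f"SD = {sd:.2f}, QD = {qd:.2f}, AD = {ad:.2f}")
```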
Module III: Hypothesis testing
Hypothesis Testing
Hypothesis testing is a formal process that allows us to assess the evidence for a claim about a population. It involves two competing hypotheses:
● Null hypothesis (H₀): This hypothesis proposes no significant difference between groups or no relationship between variables.
● Alternative hypothesis (H₁): This hypothesis states the opposite of the null hypothesis: that a significant difference or relationship does exist.
We conduct a statistical test to evaluate the evidence against the null hypothesis. If the evidence is
strong enough (p-value less than a significance level, typically 0.05), we reject the null hypothesis and
support the alternative hypothesis. However, it's important to remember that failing to reject the null
hypothesis doesn't necessarily confirm it; it simply means we don't have enough evidence to disprove
it.
The z-test is a parametric test specifically designed for continuous data that is normally distributed. It
leverages the z-statistic, which represents the number of standard deviations a sample mean falls
away from the hypothesized population mean. Here are some key points about the z-test to keep in mind: it assumes the population standard deviation is known (or the sample size is large) and that the sampling distribution of the mean is approximately normal.
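A minimal one-sample z-test sketch; all the numbers below are made up and the population standard deviation is assumed known:

```python
import math
from scipy import stats

pop_mean = 100        # hypothesized population mean under H0
pop_sd = 15           # known population standard deviation (assumed)
sample_mean = 104.5   # observed sample mean
n = 36                # sample size

# z-statistic: how many standard errors the sample mean lies from the H0 mean
z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

# Two-tailed p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.2f}, p = {p_value:.4f}")   # z = 1.80, p ≈ 0.072
```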
The chi-square test is a non-parametric test suitable for analyzing categorical data. It assesses the
difference between observed and expected frequencies in a contingency table. Imagine a table with
rows and columns representing categories, and the chi-square test helps determine if the observed
distribution of data within those categories differs significantly from what we would expect if there
were no relationship between the variables. Here are some key points about the chi-square test to keep in mind:
● Weaknesses: Limited interpretation of the effect size, can be sensitive to small sample sizes.
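A brief sketch of a chi-square test of independence on a made-up 2×2 contingency table, using SciPy:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = two groups, columns = two response categories
observed = np.array([[30, 20],
                     [25, 35]])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")
print("expected frequencies under independence:\n", expected)
```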
Module IV: Correlation and Regression
Correlation
In statistics, correlation is a captivating concept that explores the strength and direction of the linear
association between two variables. It doesn't establish causation, but rather reflects how much one
variable tends to change in tandem with the other. Here are some key points about correlation to keep in mind:
○ Positive correlation: As one variable increases, the other variable generally exhibits an increase as well.
○ Negative correlation: As one variable increases, the other variable generally decreases.
○ Zero correlation: No linear relationship exists between the variables, similar to two variables moving independently of each other.
● Correlation Coefficient: This numerical value, ranging from -1 to +1, quantifies the strength and direction of the linear relationship between the two variables.
● Pearson's product-moment correlation coefficient: This is the most widely used measure for continuous, normally distributed data. It calculates the extent to which two variables vary together linearly.
● Spearman's rank correlation coefficient: This non-parametric alternative is suited for ordinal data or data that deviates from a normal distribution. It assesses the monotonic relationship between the two variables.
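A short sketch computing both coefficients on made-up paired data with SciPy (the variables hours and scores are illustrative):

```python
import numpy as np
from scipy import stats

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])            # hypothetical study hours
scores = np.array([52, 55, 61, 60, 68, 72, 75, 80])    # hypothetical exam scores

r, r_p = stats.pearsonr(hours, scores)        # linear association
rho, rho_p = stats.spearmanr(hours, scores)   # monotonic (rank-based) association

print(f"Pearson r    = {r:.2f} (p = {r_p:.4f})")
print(f"Spearman rho = {rho:.2f} (p = {rho_p:.4f})")
```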
Regression
Regression analysis doesn't establish causation, but rather unveils the direction and strength of the
association between a dependent variable (predicted) and an independent variable (predictor). It
essentially seeks the best-fitting line that approximates the overall trend in your data.
● Prediction: Regression helps you predict the value of the dependent variable based on the
value of the independent variable.
● Modeling Relationships: It constructs a mathematical model to represent this relationship.
● Focus on Trends: Regression captures the general trend, but there will always be variability
around the model (not a perfect fit for every single data point).
● Types of Regression: Linear regression is the most common, but there are also other
regression techniques for more complex relationships.
The cornerstone of regression analysis is the linear regression equation. This equation represents the
best-fitting straight line that captures the relationship between the independent and dependent
variables. Here's the formula, along with its components:
Y = a + bX
where:
● Y = dependent variable (predicted value) - the variable you're trying to predict (e.g., exam
scores)
● X = independent variable (predictor) - the variable you believe influences the dependent
variable (e.g., study hours)
● a = y-intercept - the point where the regression line crosses the y-axis. This represents the
predicted value of Y when X is zero (it doesn't necessarily mean X can be zero in reality).
● b = slope - the gradient of the line. It indicates the direction and strength of the relationship:
○ Positive slope (b > 0): As X increases, Y tends to increase (positive relationship
between the variables).
○ Negative slope (b < 0): As X increases, Y tends to decrease (negative relationship
between the variables).
○ Steeper slope (larger absolute value of b): The stronger the influence of X on Y.
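A short sketch fitting Y = a + bX with SciPy's linregress on made-up study-hours data (the variable names and values are illustrative):

```python
import numpy as np
from scipy import stats

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])             # X: study hours (hypothetical)
scores = np.array([52, 55, 61, 60, 68, 72, 75, 80])     # Y: exam scores (hypothetical)

result = stats.linregress(hours, scores)
a, b = result.intercept, result.slope        # Y = a + bX

# Predict the exam score for a student who studies 5.5 hours
predicted = a + b * 5.5
print(f"Y = {a:.1f} + {b:.1f}X; predicted score at X = 5.5: {predicted:.1f}")
```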
Module V: Testing Significance of Difference