Lesson 1 Introduction To Statistics 1


INTRODUCTION TO STATISTICS
Prepared by: JAKE C. MAGBANUA
Data and Statistics
Statistics
-Collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting,
analyzing, interpreting, and drawing conclusions.

Data consists of information coming from observations, counts, measurements, or responses.


• In the field of education,
• In the field of business and economics,
• In the field of science and technology,
• In psychology,
• In the government, and others
Statistics is a very important tool in research. Statistical designs and experiments are used to gather
more information from a limited body of observations. Various statistical techniques are used in laboratories,
in experimental fields, or under controlled conditions. These tools are needed to obtain accurate
and reliable results.
• Thus, the study of statistics requires, first of all, an understanding of basic concepts, symbols, and
mathematical notation.
Statistics
Two main branches of Statistics
1. Descriptive Statistics
Collection, organization, summarization, and presentation of data.

2. Inferential Statistics
Generalizing from samples to populations using probabilities. Performing hypothesis testing,
determining relationships between variables, and making predictions.
3 main types of descriptive statistics

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability
of a dataset.
• Distribution refers to the frequency of each response.
• Measures of central tendency give you a typical or central value of the responses, such as the mean, median, or mode.
• Measures of variability show you the spread or dispersion of your dataset.
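
As a minimal sketch of the three kinds of descriptive statistics listed above, the following Python snippet summarizes a small set of quiz scores. The scores are made up for illustration and do not come from the lesson; the variable names are likewise assumptions.

```python
from collections import Counter
import statistics

# Hypothetical quiz scores for ten students (illustrative data only).
scores = [7, 8, 8, 9, 6, 8, 10, 7, 9, 8]

# Distribution: how many times each score occurs.
frequency = Counter(scores)

# Central tendency: mean, median, and mode.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability: range and sample standard deviation.
score_range = max(scores) - min(scores)
sample_sd = statistics.stdev(scores)

for score in sorted(frequency):
    print(f"score {score}: {frequency[score]} student(s)")
print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={score_range}, sample SD={sample_sd:.2f}")
```

The frequency table describes the distribution, the mean, median, and mode describe the center, and the range and standard deviation describe the spread.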
Inferential Statistics

-Involves drawing the right conclusions from the statistical analysis that has been performed using descriptive
statistics. In the end, it is the inferences that make studies important and this aspect is dealt with in inferential
statistics.

-Most predictions of the future and generalizations about a population by studying a smaller sample come under the
purview of inferential statistics.

-Most social science experiments study a small sample in order to determine how the population in general
behaves.

-By designing the right experiment, the researcher is able to draw conclusions relevant to the study.

-While drawing conclusions, one needs to be very careful not to draw wrong or biased conclusions. Even
though this appears to be a science, there are ways in which one can manipulate studies and results through various
means.
Variables

Variable
Characteristic or attribute that can assume different values

Random Variable
A variable whose values are determined by chance.
Data is a specific measurement of a variable – it is the value you record in your data sheet. Data is generally
divided into two categories:

• Quantitative data represents amounts


• Categorical data represents groupings

A variable that contains quantitative data is a quantitative variable; a variable that contains categorical data is
a categorical variable. Each of these types of variable can be broken down into further types.

Discrete vs Continuous Variables (Categorical and Continuous Variables)


- Categorical variables are also known as discrete or qualitative variables. Categorical variables can be further
categorized as either nominal, ordinal or dichotomous.
- Discrete variables are usually obtained by counting. There are a finite or countable number of choices
available with discrete data. You can't have 2.63 people in the room.

Example: number of deaths, births, students, accident cases, …..

-Continuous variables are usually obtained by measuring and can never be exact, no matter how carefully the
measurement is taken.
Example: length, weight, temperature, and time.

Note: Since continuous variables are real numbers, we usually round them or set an interval. This implies a
boundary that depends on the number of decimal places.
For example: 64 is really anything 63.5 ≤ x < 64.5. Likewise, if there are two decimal places, then 64.03 is really
anything 64.025 ≤ x < 64.035. Boundaries always have one more decimal place than the data and end in a 5.
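
The boundary rule in the note can be sketched in a few lines of Python. The function name `boundaries` is a hypothetical helper, and the sketch assumes the recorded value is passed as a string so that the number of decimal places is preserved.

```python
from decimal import Decimal

def boundaries(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) such that lower <= x < upper is recorded as `recorded`."""
    value = Decimal(recorded)
    # The exponent is 0 for "64" and -2 for "64.03".
    exponent = value.as_tuple().exponent
    # Half of one unit in the last recorded place: 0.5 for "64", 0.005 for "64.03".
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit

print(boundaries("64"))     # (Decimal('63.5'), Decimal('64.5'))
print(boundaries("64.03"))  # (Decimal('64.025'), Decimal('64.035'))
```

`Decimal` is used instead of `float` so that the boundaries print with exactly one more decimal place than the data.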

Dependent and Independent Variables


Variables can be grouped into dependent and independent variables with respect to their use.

The independent variable is used as the predictor when the objective is to predict the value of one variable on the basis of
the other.
- Independent variables are referred to as treatment variables. They do not change in relation to other
factors. Instead, scientific researchers explore whether or not an independent variable causes, leads to or is
associated with a change in one or more dependent variables.

The dependent variable is the variable whose value is predicted.


- Dependent variables are factors studied in terms of how they change in relation to independent variables.

Dependent and Independent Variables (continuation)


In a scientific study, the dependent variable is the variable that the researcher is testing and measuring
in relation to the independent variable. The researcher is seeking to determine whether or not
manipulating the independent variable will lead to different outcomes regarding the dependent
variable.
Socioeconomic Status and Number of Children

A social scientist explores if there is a link between socioeconomic status and the number of children someone has.
• independent variable - socioeconomic status
• dependent variable - number of children

Job Satisfaction and Pay


A human resources professional wonders if how much money a person earns can impact the extent to which an individual
experiences job satisfaction.
• independent variable - compensation (salary or wages)
• dependent variable - job satisfaction
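
To illustrate how an independent variable is used to predict a dependent one, here is a rough sketch that fits a least-squares line to made-up pay and job-satisfaction figures. The data, the units, and the helper `fit_line` are all assumptions for illustration; a least-squares line is only one common way to make such a prediction, and a fitted line by itself does not show that pay causes satisfaction.

```python
# Hypothetical data (illustrative only) for five employees.
pay = [1, 2, 3, 4, 5]            # independent variable (predictor), arbitrary units
satisfaction = [2, 4, 5, 4, 5]   # dependent variable (predicted), arbitrary score

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(pay, satisfaction)

# Predict the dependent variable for a new value of the independent variable.
predicted = intercept + slope * 6
print(f"satisfaction = {intercept:.1f} + {slope:.1f} * pay")
print(f"predicted satisfaction at pay 6: {predicted:.1f}")
```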

Qualitative and Quantitative variables


1. Quantitative Variables: Sometimes referred to as “numeric” variables, these are variables that represent a
measurable quantity. Examples include:
• Number of students in a class
• Number of square feet in a house
• Population size of a city
• Age of an individual
• Height of an individual
2. Qualitative Variables: Sometimes referred to as “categorical” variables, these are variables that take on names
or labels and can fit into categories. Examples include:
• Eye color (e.g. “blue”, “green”, “brown”)
• Gender (e.g. “male”, “female”)
• Breed of dog (e.g. “lab”, “bulldog”, “poodle”)
• Level of education (e.g. “high school”, “Associate’s degree”, “Bachelor’s degree”)
• Marital status (e.g. “married”, “single”, “divorced”)
Population vs Sample

Population
- All subjects possessing a common characteristic that is being studied.
Parameter
Characteristic or measure obtained from a population.
Sample
- Subgroup or subset of the population
Statistic (not to be confused with Statistics)
Characteristic or measure obtained from a sample.
Example: In a recent survey, 300 students of CPSU were asked if they want to be vaccinated against
COVID-19. Thirty (30) of the students said yes. Identify the population and the sample.

Population: the responses of all CPSU students.
Sample: the responses of the 300 students in the survey.
The 30 students who said yes are part of the sample; their share of the sample (30 out of 300) is a statistic.
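
The difference between a statistic and a parameter can be made concrete with the figures from the survey example. This is a small sketch; the variable names are assumptions.

```python
# Figures from the survey example above.
sample_size = 300   # students who were surveyed
said_yes = 30       # surveyed students who said yes

# A statistic: the proportion of surveyed students who said yes.
sample_proportion = said_yes / sample_size
print(f"sample proportion = {sample_proportion:.0%}")
```

The corresponding parameter, the proportion of all CPSU students who want to be vaccinated, cannot be computed from these figures; the sample proportion only estimates it.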

Population vs Sample – What is the difference?


Usually, a sample of the population is used in research, as it is easier and cost-effective to
process a smaller subset of the population rather than the entire group.

Population: The measurable characteristic of the population, like the mean or standard deviation, is known as the parameter.
Sample: The measurable characteristic of the sample is called a statistic.

Population: Population data is a whole and complete set.
Sample: The sample is a subset of the population that is derived using sampling.

Population: A survey done of an entire population is accurate and more precise, with no margin of error except human inaccuracy in responses. However, this may not always be possible.
Sample: A survey done using a sample of the population bears accurate results only after further factoring in the margin of error and confidence interval.

Population: The parameter of the population is a numerical or measurable element that defines the system of the set.
Sample: The statistic is the descriptive component of the sample, found by using the sample mean or sample proportion.
Sample rather than the Population
Reasons to choose a sample from a given population
• Practicality: In most cases, a population is too large for collecting accurate data from every member to be practical. A sample offers a representation of
the whole population if it is drawn appropriately.
• It offers timely data: When it comes to research, the amount of time available can be a defining factor for a study. A sample provides a
smaller set of the population for review and still delivers data that is useful for representing the whole population.
• Cost-effective: The cost of conducting research is often a limiting factor for the study.
• Accuracy of representation: Depending on the method of sampling, research conducted on a sample can be accurate, with less
non-response bias than a census. A sample that is selected using a probability method can be an accurate representation of
the population. The data collected can be used to gather insight into the whole community.
• Inferential statistics: Inferential statistics is a process by which representative data is used to infer insights about the entire population.
It works from data collected from a sample that represents the whole population.
• At times, a sample is more accurate than a census: A census of an entire population does not always offer accurate data due to errors
such as inconsistency in responses or non-response bias. A carefully obtained sample, however, reduces this bias and provides more
accurate data that adequately represents the population.
• Manageable: Sometimes, collecting data on an entire population is nearly impossible because some members are too hard to reach. In
this case, a sample can be used for the study because it is feasible, manageable, and accessible.
Scales of Measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These
go from lowest level to highest level. Data is classified according to the highest
level which it fits.
1. Nominal
Nominal is the lowest level. Only names are meaningful here. Nominal variables (also called categorical variables) can be placed into
categories. They don’t have a numeric value and so cannot be added, subtracted, divided, or multiplied.

• Dichotomous variables are nominal variables which have only two categories or levels. For example, if
we were looking at gender, we would most probably categorize somebody as either "male" or "female".
2. Ordinal
Ordinal variables are variables that have two or more categories, just like nominal variables, except that
the categories can also be ordered or ranked. So if you asked someone whether they liked the policies of the
Duterte Administration and they could answer "Not very much", "They are OK", or "Yes, a lot", then you have
an ordinal variable. Why? Because the categories can be arranged in order.
Thus, the responses can be ranked from the most positive (Yes, a lot), to the middle
response (They are OK), to the least positive (Not very much). However, while we can rank the levels, we
cannot place a "value" on them; we cannot say that "They are OK" is twice as positive as "Not very much",
for example.

Response   Strongly disagree   Disagree   Undecided   Agree   Strongly agree
Rating     1                   2          3           4       5
3. Interval
Numbers are assigned to the items or objects. These are used to identify and rank the objects. They
also measure the degree of difference between any two classes.

Example: temperatures (in °C or °F), IQ, grades, test scores

4. Ratio
The ratio of the numbers assigned in the measurement shows the ratio of the amounts of the property being
measured.

Example: weights, heights

In statistics, psychology, the social sciences, and education, interval and ratio measurements are
ordinarily treated in the same manner. The only difference between them is that ratio measurements have a
true zero.
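
The role of the true zero can be checked numerically. The sketch below compares temperature in degrees Celsius (an interval scale, because 0 °C is an arbitrary zero point) with weight in kilograms (a ratio scale). The chosen values and function names are assumptions for illustration.

```python
def celsius_to_kelvin(celsius):
    return celsius + 273.15

def kg_to_pounds(kg):
    return kg / 0.45359237  # one pound is exactly 0.45359237 kg

# Interval scale: differences are meaningful, ratios are not.
cool, warm = 10.0, 20.0
diff_c = warm - cool
diff_k = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)
ratio_c = warm / cool
ratio_k = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)
print(f"difference: {diff_c:.1f} in Celsius, {diff_k:.1f} in kelvin")
print(f"ratio: {ratio_c:.2f} in Celsius, {ratio_k:.3f} in kelvin")

# Ratio scale: the ratio survives a change of unit because zero means "none".
light, heavy = 30.0, 60.0
ratio_kg = heavy / light
ratio_lb = kg_to_pounds(heavy) / kg_to_pounds(light)
print(f"weight ratio: {ratio_kg:.2f} in kg, {ratio_lb:.2f} in pounds")
```

The temperature difference is the same in both units, but the "twice as hot" ratio disappears once the zero point moves, while the weight ratio stays at 2 in either unit.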

Ambiguities in classifying a type of variable


In some cases, the measurement scale for data is ordinal, but the variable
is treated as continuous. For example, a Likert scale that contains five
values - strongly agree, agree, neither agree nor disagree, disagree, and
strongly disagree - is ordinal. However, where a Likert scale contains
seven or more values - strongly agree, moderately agree, agree, neither
agree nor disagree, disagree, moderately disagree, and strongly disagree
- the underlying scale is sometimes treated as continuous (although
whether you should do this is a cause of great dispute).
Types of Sampling (Probabilistic and non-probabilistic sampling)
Probabilistic/random sampling

Random sampling means that every element in a population has an equal chance of being
chosen for the sample.
Probability sampling means that every member of the target population has a known chance of being
included in the sample.
Probability sampling methods include simple random sampling, systematic sampling,
stratified sampling, and cluster sampling.

• Random sampling is analogous to putting everyone's name into a hat and drawing out several names.
Each element in the population has an equal chance of being drawn.
• Systematic sampling is easier to do than random sampling. In systematic sampling, the list of
elements is "counted off". That is, every kth element is taken. This is similar to lining everyone up and
numbering off "1, 2, 3, 4; 1, 2, 3, 4; etc." When done numbering, all people numbered 4 would be used.
• Stratified sampling divides the population into groups called strata according to some
characteristic. For instance, the population might be separated into males and females. A
sample is then taken from each of these strata using random or systematic sampling.
• Cluster sampling is accomplished by dividing the population into groups, usually geographically. These
groups are called clusters or blocks. The clusters are randomly selected, and every element in the selected
clusters is used. What makes this different from stratified sampling is that each cluster should be
representative of the population, and entire clusters are randomly selected for the sample.
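
The four probability sampling methods can be sketched in Python as follows. The population of 40 people, the grouping by sex and area, the sample sizes, and all function names are assumptions made for illustration, not part of the lesson.

```python
import random
from collections import defaultdict

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population of 40 people: (name, sex, area).
population = [
    (f"Person {i + 1}", "female" if i % 5 < 3 else "male", f"Area {i // 10 + 1}")
    for i in range(40)
]

def group_by(units, key):
    """Split units into groups that share the same key value."""
    groups = defaultdict(list)
    for unit in units:
        groups[key(unit)].append(unit)
    return groups

def simple_random_sample(units, n):
    """Names in a hat: every unit has the same chance of being drawn."""
    return rng.sample(units, n)

def systematic_sample(units, k, start):
    """Count off 1..k repeatedly and keep everyone numbered `start`."""
    return units[start - 1::k]

def stratified_sample(units, key, per_stratum):
    """Draw a simple random sample from every stratum."""
    sample = []
    for members in group_by(units, key).values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(units, key, n_clusters):
    """Randomly pick whole clusters and keep every unit in them."""
    clusters = group_by(units, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

srs = simple_random_sample(population, 8)
# start=4 mirrors "all people numbered 4"; in practice the start is chosen at random.
every_fourth = systematic_sample(population, k=4, start=4)
by_sex = stratified_sample(population, key=lambda p: p[1], per_stratum=4)
two_areas = cluster_sample(population, key=lambda p: p[2], n_clusters=2)

print(len(srs), len(every_fourth), len(by_sex), len(two_areas))
```

Note the contrast between the last two functions: stratified sampling takes some units from every group, while cluster sampling takes every unit from some groups.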
Non-probability sampling, on the other hand, does not involve “random” processes for selecting
participants. In non-probability sampling, the members of the population do not have an equal
chance of being selected, and in many cases some members of the population have
no chance of being selected.

• Convenience sampling is very easy to do, but it's probably the worst technique to use. In convenience
sampling, readily available data is used, for example the first people the surveyor runs into.
• Quota sampling is a non-probabilistic sampling method where we divide the survey population
into mutually exclusive subgroups. These subgroups are selected with respect to certain known (and
thus non-random) features, traits, or interests. People in each subgroup are selected by the researcher or
interviewer who is conducting the survey.
• Snowball sampling is where research participants recruit other participants for a test or study. It is used
where potential participants are hard to find. It’s called snowball sampling because (in theory) once you have
the ball rolling, it picks up more “snow” along the way and becomes larger and larger. Snowball sampling is
a non-probability sampling method. It does not give every participant the same odds of being chosen, as
simple random sampling does. Rather, participants are chosen through the researchers' own judgment and the
referrals of earlier participants.

• Judgmental sampling is a non-probability sampling technique where the researcher selects units to be
sampled based on their knowledge and professional judgment.
This type of sampling technique is also known as purposive sampling and authoritative sampling.

Purposive sampling is used in cases where the expertise of an authority can select a sample that is more
representative and brings more accurate results than probability sampling techniques would. The process involves
purposely handpicking individuals from the population based on the authority's or the researcher's knowledge
and judgment.
Thank you very much!
