Lesson 1: Introduction to Statistics
Prepared by: JAKE C. MAGBANUA
Data and Statistics
Statistics
-The collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting,
analyzing, interpreting, and drawing conclusions from the data.
1. Descriptive Statistics
Collecting, organizing, summarizing, and presenting data.
2. Inferential Statistics
Generalizing from samples to populations using probabilities. Performing hypothesis testing,
determining relationships between variables, and making predictions.
3 main types of descriptive statistics
The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability
of a dataset.
Frequency distribution records how often each response (value) occurs.
Measures of central tendency (mean, median, and mode) give you the typical or central value of the dataset.
Measures of variability show you the spread or dispersion of your dataset.
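The three kinds of descriptive statistics can be computed with Python's standard library. This is a minimal sketch on a small, made-up list of survey answers (the data and variable names are hypothetical): `collections.Counter` builds the frequency distribution and the `statistics` module supplies the central tendency and variability measures.

```python
from collections import Counter
import statistics

# Hypothetical survey responses: number of siblings reported by ten students
responses = [2, 1, 3, 2, 0, 2, 4, 1, 2, 3]

# 1. Frequency distribution: how often each value occurs
frequency = Counter(responses)

# 2. Measures of central tendency
mean = statistics.mean(responses)
median = statistics.median(responses)
mode = statistics.mode(responses)

# 3. Measures of variability (spread)
value_range = max(responses) - min(responses)
sample_variance = statistics.variance(responses)  # divides by n - 1
sample_sd = statistics.stdev(responses)

print(sorted(frequency.items()))
print(mean, median, mode)
print(value_range, round(sample_variance, 2), round(sample_sd, 2))
```

`statistics.variance` and `statistics.stdev` treat the data as a sample (dividing by n - 1); `statistics.pvariance` and `statistics.pstdev` are the population versions.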
Inferential Statistics
-Involves drawing the right conclusions from the statistical analysis that has been performed using descriptive
statistics. In the end, it is the inferences that make studies important, and this aspect is dealt with in inferential
statistics.
-Most predictions about the future, and generalizations about a population made by studying a smaller sample, come
under the purview of inferential statistics.
-Most social science experiments study a small sample of the population to help determine how the
population in general behaves.
-By designing the right experiment, the researcher is able to draw conclusions relevant to their study.
-While drawing conclusions, one needs to be very careful not to draw wrong or biased conclusions. Even
though this appears to be an exact science, there are various ways in which studies and results can be
manipulated.
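As an illustration of generalizing from a sample to a population, the sketch below estimates a population mean from a sample and attaches an approximate 95% interval. The sample values are hypothetical, and the interval uses a normal approximation (z of about 1.96 from `statistics.NormalDist`); for a sample this small a t critical value would normally be used instead, which the standard library does not provide.

```python
import math
import statistics

# Hypothetical sample: hours of sleep reported by 12 randomly selected students
sample = [7.0, 6.5, 8.0, 5.5, 7.5, 6.0, 7.0, 8.5, 6.5, 7.0, 6.0, 7.5]

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# z-value for 95% confidence from the standard normal distribution (about 1.96)
z = statistics.NormalDist().inv_cdf(0.975)

# Normal-approximation interval for the unknown population mean
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"Estimated population mean: {sample_mean:.2f} hours")
print(f"Approximate 95% interval: {lower:.2f} to {upper:.2f} hours")
```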
Variables
Variable
Characteristic or attribute that can assume different values
Random Variable
A variable whose values are determined by chance.
Data is a specific measurement of a variable; it is the value you record in your data sheet. Data is generally
divided into two categories: quantitative (numerical) data and categorical (qualitative) data.
A variable that contains quantitative data is a quantitative variable; a variable that contains categorical data is
a categorical variable. Each of these types of variable can be broken down into further types.
-Continuous variables can take any value within an interval, so their measurements can never be exact no matter
how carefully we measure. They are usually obtained by measuring.
Example: Length, weight, temperature, and time are all examples of continuous variables.
Note: Since continuous variables are real numbers, we usually round them or report an interval. This implies
boundaries that depend on the number of decimal places.
For example: 64 is really anything 63.5 ≤ x < 64.5. Likewise, if there are two decimal places, then 64.03 is really
anything 64.025 ≤ x < 64.035. Boundaries always have one more decimal place than the data and end in a 5.
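The boundary rule can be checked with a small helper. This is a minimal sketch; the function name `boundaries` is hypothetical. It uses `decimal.Decimal` so the recorded number of decimal places is preserved exactly, then adds and subtracts half of one unit in the next decimal place.

```python
from decimal import Decimal

def boundaries(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) such that lower <= x < upper for a rounded measurement."""
    value = Decimal(recorded)
    # Number of decimal places in the recorded value, e.g. "64.03" -> 2
    places = -value.as_tuple().exponent
    # Half of one unit in the last recorded place: a 5 in the next decimal place
    half_unit = Decimal(5) * Decimal(10) ** (-(places + 1))
    return value - half_unit, value + half_unit

print(boundaries("64"))     # lower 63.5, upper 64.5
print(boundaries("64.03"))  # lower 64.025, upper 64.035
```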
An independent variable is used as the predictor when the objective is to predict the value of one variable on the
basis of another.
- Independent variables are sometimes referred to as treatment variables. They are not changed by the other
variables in the study. Instead, scientific researchers explore whether or not an independent variable causes, leads
to, or is associated with a change in one or more dependent variables.
A social scientist explores whether there is a link between socioeconomic status and the number of children someone has.
independent variable - socioeconomic status
dependent variable - number of children
Population
- All subjects possessing a common characteristic that is being studied.
Parameter
Characteristic or measure obtained from a population.
Sample
- Subgroup or subset of the population
Statistic (not to be confused with Statistics)
Characteristic or measure obtained from a sample.
Example: In a recent survey, 300 students of CPSU were asked if they want to be vaccinated against
COVID-19. Thirty (30) of the students said yes. Identify the population and the sample.
Population vs. Sample
- Population: population data is a whole and complete set.
  Sample: the sample is a subset of the population that is derived using sampling.
- Population: the measurable characteristic of the population, like the mean or standard deviation, is known as the parameter.
  Sample: the measurable characteristic of the sample is called a statistic.
- Population: the parameter is a numerical or measurable element that describes the whole set.
  Sample: the statistic is the descriptive component of the sample, found by using, for example, the sample mean or sample proportion.
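The difference between a parameter and a statistic can be seen by computing the same measure twice. This is a minimal sketch on a hypothetical population of 500 quiz scores (randomly generated, seeded for reproducibility): the population mean is the parameter, and the mean of a random sample of 50 is the statistic that estimates it. The last line applies the same idea to the CPSU example, where the proportion of surveyed students who said yes is a sample statistic.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: quiz scores of all 500 students in a school
population = [random.randint(50, 100) for _ in range(500)]

# Parameter: computed from every member of the population
population_mean = statistics.mean(population)

# Sample: 50 students chosen at random without replacement
sample = random.sample(population, k=50)

# Statistic: the same measure computed from the sample only
sample_mean = statistics.mean(sample)

# CPSU example: 30 of the 300 surveyed students said yes (a sample proportion)
sample_proportion = 30 / 300

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
print(f"Statistic (sample proportion): {sample_proportion:.0%}")
```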
Why Use a Sample Rather Than the Population?
Reasons to choose a sample from a given population:
Practicality: In most cases, a population is too large for accurate data to be collected from every member, which is not practical. A sample
offers a representation of the whole population if it is sampled properly.
It offers timely data: When it comes to research, the amount of time available can be a defining factor for a study. A sample provides a
smaller set of the population for review, and it delivers data that is useful to represent the whole population.
Cost-effective: The cost of conducting research is often a constraint on the study, and studying a sample costs less than a full census.
Accuracy of representation: Depending on the method of sampling, research conducted on a sample can be accurate and can have less
non-response bias than a census. A sample that is selected using a probability method can be an accurate representation of
the population, and the data collected can be used to gain insight into the whole community.
Inferential statistics: Inferential statistics is a process by which representative data is used to infer insights about the entire population.
Data collected from a sample is used to represent the whole population, so inferential statistics is built on data samples.
At times, a sample is more accurate than a census: A census of an entire population does not always offer accurate data, due to errors
such as inconsistency in responses or non-response bias. A carefully obtained sample, however, can reduce this bias and provide more
accurate data that adequately represents the population.
Manageable: Sometimes collecting data from an entire population is nearly impossible, because some populations are too difficult to reach. In
this case, a sample can be used to represent the population, as it is feasible, manageable, and accessible.
Scales of Measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These
go from lowest level to highest level. Data is classified according to the highest
level which it fits.
1) Nominal is the lowest level. Only names are meaningful here.
Nominal Scale. Nominal variables (also called categorical variables) can be placed into
categories. They don’t have a numeric value and so cannot be added, subtracted, divided or multiplied.
Dichotomous variables are nominal variables which have only two categories or levels. For example, if
we were looking at gender, we would most probably categorize somebody as either "male" or "female".
2) Ordinal
Ordinal variables are variables that have two or more categories, just like nominal variables, except that the categories
can also be ordered or ranked. Suppose you asked someone whether they liked the policies of the Duterte Administration and
they could answer "Not very much", "They are OK", or "Yes, a lot" (more options, such as "No" or "Undecided", could also be
offered). Then you have an ordinal variable. Why? Because the categories fall in a natural order.
Thus, the results can be ranked from the most positive (Yes, a lot), to the middle response (They are OK), to the least
positive (Not very much). However, while we can rank the levels, we cannot place a "value" on them; we cannot say, for
example, that "They are OK" is twice as positive as "Not very much".
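Ranking without arithmetic can be sketched in code. In this minimal example the responses are hypothetical; each category is mapped to its position in a least-to-most-positive list, and that position is used only for sorting and for finding the median category. `statistics.median_low` is used so the result is always one of the actual categories rather than a value halfway between two of them.

```python
import statistics

# Ordinal categories listed from least to most positive; the position is the rank
LEVELS = ["Not very much", "They are OK", "Yes, a lot"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Hypothetical responses to the survey question
responses = ["They are OK", "Yes, a lot", "Not very much", "They are OK",
             "Yes, a lot", "They are OK", "Not very much"]

# Ranking is meaningful: sort the responses from least to most positive
ranked = sorted(responses, key=RANK.__getitem__)

# The median category is meaningful for ordinal data ...
median_rank = statistics.median_low(RANK[r] for r in responses)
median_response = LEVELS[median_rank]

# ... but the ranks are only positions: rank 2 is not "twice as positive" as rank 1,
# so arithmetic such as averaging the ranks has no clear meaning here.
print(ranked)
print("Median response:", median_response)
```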
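3) Interval
-Values are ordered and the distances between them are equal, but there is no true zero; for example, temperature in degrees Celsius.
4) Ratio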
-the ratio of numbers assigned in the measurement shows the ratio in the amounts of property being
measured.
In statistics, psychology, the social sciences, and education, interval and ratio measurements are ordinarily treated
in the same manner. The only difference between them is that ratio measurements have a true zero, while interval
measurements do not.
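Why the true zero matters can be shown numerically. This is a minimal sketch with hypothetical values: temperature in Celsius (interval, arbitrary zero) and weight in kilograms (ratio, true zero). Converting to another unit changes the Celsius ratio but leaves the weight ratio unchanged. The conversion factor 2.20462 lb per kg is rounded.

```python
def celsius_to_kelvin(celsius: float) -> float:
    # Kelvin starts at absolute zero; the zero of the Celsius scale is arbitrary
    return celsius + 273.15

def kg_to_lb(kg: float) -> float:
    # Approximate conversion factor (rounded)
    return kg * 2.20462

# Interval scale: "20 degrees is twice 10 degrees" does not survive a change of unit
celsius_ratio = 20 / 10
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Ratio scale: "20 kg is twice 10 kg" holds in any unit of weight
kg_ratio = 20 / 10
pound_ratio = kg_to_lb(20) / kg_to_lb(10)

print(celsius_ratio, round(kelvin_ratio, 3))  # 2.0 1.035
print(kg_ratio, round(pound_ratio, 3))        # 2.0 2.0
```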
Types of Sampling
(Probabilistic and Non-probabilistic Sampling)
Probability sampling means that every member of the target population has a known chance of being
included in the sample.
Probability sampling methods include simple random sampling, systematic sampling,
stratified sampling, and cluster sampling.
Simple random sampling means that every element in a population has an equal chance of being
chosen for the sample. It is analogous to putting everyone's name into a hat and drawing out several names.
Systematic sampling is easier to do than random sampling. In systematic sampling, the list of
elements is "counted off"; that is, every kth element is taken. This is similar to lining everyone up and
numbering off "1, 2, 3, 4; 1, 2, 3, 4; etc." When done numbering, all people numbered 4 would be used.
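Simple random and systematic sampling can be sketched with Python's `random` module. The function names and the list of 20 students are hypothetical. The systematic version picks a random starting point among the first k elements, a common variant; always starting at position k - 1 reproduces the "everyone numbered 4" version described above.

```python
import random

def simple_random_sample(population, n, rng):
    # Every element has the same chance of selection ("names drawn from a hat")
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a starting point among the first k elements at random,
    # then take every kth element after it ("counting off" 1, 2, ..., k)
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(42)  # seeded for reproducibility
students = [f"Student {i}" for i in range(1, 21)]  # hypothetical list of 20 names

print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 4, rng))
```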
Stratified sampling divides the population into groups called strata. Unlike cluster sampling, the groups are formed
by some characteristic rather than geographically. For instance, the population might be separated into males and
females. A sample is then taken from each of these strata using random or systematic sampling.
Cluster sampling is accomplished by dividing the population into groups, usually geographically. These
groups are called clusters or blocks. What makes this different from stratified sampling is that each cluster
should be representative of the population. The clusters are then randomly selected, and every element in the
selected clusters is used.
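The contrast between the two methods is easiest to see side by side. In this minimal sketch (function names and data are hypothetical), stratified sampling draws a random sample from every stratum, while cluster sampling draws whole clusters at random and keeps all of their members.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, rng):
    # Group elements by a characteristic (e.g. sex), then draw a
    # simple random sample of the same size from every stratum
    strata = {}
    for element in population:
        strata.setdefault(stratum_of(element), []).append(element)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters (e.g. class sections), then keep every element in them
    chosen = rng.sample(list(clusters), n_clusters)
    return [element for name in chosen for element in clusters[name]]

rng = random.Random(7)  # seeded for reproducibility

# Hypothetical respondents: (name, sex)
people = [(f"P{i}", "F" if i % 2 else "M") for i in range(1, 21)]
print(stratified_sample(people, lambda p: p[1], 3, rng))

# Hypothetical clusters: class sections and their students
sections = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"], "C": ["c1", "c2", "c3", "c4"]}
print(cluster_sample(sections, 2, rng))
```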
Non-probability sampling, on the other hand, does not involve “random” processes for selecting
participants. In non-probability sampling, the members of the population do not have an equal
chance of being selected, and in many cases there will be members of the population who have
no chance of being selected.
Convenience sampling is very easy to do, but it's probably the worst technique to use. In convenience
sampling, readily available data is used; for example, the first people the surveyor runs into.
Quota sampling is a non-probabilistic sampling method where we divide the survey population
into mutually exclusive subgroups. These subgroups are selected with respect to certain known (and
thus non-random) features, traits, or interests. People in each subgroup are selected by the researcher or
interviewer conducting the survey until a fixed number (the quota) is reached for each subgroup.
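Quota sampling can be sketched as convenience sampling with a cap per subgroup. In this minimal example the function name, the arrival order, and the quotas are all hypothetical: people are accepted in the order the interviewer meets them, and a subgroup stops accepting people once its quota is filled. Nothing here is random, which is what makes the method non-probabilistic.

```python
def quota_sample(arrivals, group_of, quotas):
    # Non-random: take people in the order the interviewer meets them,
    # but stop accepting a subgroup once its quota is filled
    counts = {group: 0 for group in quotas}
    sample = []
    for person in arrivals:
        group = group_of(person)
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
        if counts == quotas:  # every quota filled
            break
    return sample

# Hypothetical people in the order the interviewer meets them: (name, sex)
arrivals = [("Ana", "female"), ("Ben", "male"), ("Carl", "male"),
            ("Dina", "female"), ("Ed", "male"), ("Fe", "female")]

sample = quota_sample(arrivals, lambda p: p[1], {"female": 2, "male": 2})
print([name for name, _ in sample])
```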
Snowball sampling is where research participants recruit other participants for a test or study. It is used
where potential participants are hard to find. It’s called snowball sampling because (in theory) once you have
the ball rolling, it picks up more “snow” along the way and becomes larger and larger. Snowball sampling is
a non-probability sampling method. It doesn’t involve probability the way, say, simple random
sampling does (where the odds are the same for any particular participant being chosen). Rather, who gets
included depends on the referrals of existing participants and the researchers' judgment, not on chance.
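The "rolling snowball" can be modelled as following referrals outward from a few starting participants, wave by wave (a breadth-first walk). This is a minimal sketch; the function name, the referral network, and the target size are hypothetical.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    # referrals maps each participant to the people they are willing to refer.
    # Start from a few known participants and follow their referrals wave by wave.
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:  # avoid recruiting the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral network among hard-to-reach respondents
network = {
    "R1": ["R2", "R3"],
    "R2": ["R4"],
    "R3": ["R4", "R5"],
    "R5": ["R6"],
}
print(snowball_sample(network, ["R1"], target_size=5))
```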
Purposive sampling is used in cases where the expertise of an authority can select a more representative sample, one
that can bring more accurate results than probability sampling techniques would. The process involves nothing
but purposely handpicking individuals from the population based on the authority's or the researcher's knowledge
and judgment.
Thank you very much!