Using Probability Distributions in R: Dnorm, Pnorm, Qnorm, and Rnorm
R is a great tool for working with distributions. However, one has to know which
specific function is the right one. Here, I’ll discuss which functions are
available for dealing with the normal distribution: dnorm, pnorm, qnorm, and
rnorm.
Distribution functions in R
Every distribution has four associated functions whose prefix indicates the type
of function and the suffix indicates the distribution. To exemplify the use of
these functions, I will limit myself to the normal (Gaussian) distribution. The four
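normal distribution functions are:
dnorm: density function of the normal distribution
pnorm: cumulative distribution function of the normal distribution
qnorm: quantile function of the normal distribution
rnorm: random sampling from the normal distribution
The density of the normal distribution, which dnorm evaluates, is
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$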
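The snippets in this post refer to a data frame iq.df with the columns IQ and Density. A minimal sketch of how such a table could be built with dnorm is shown here. The mean of 100, the standard deviation of 15, the range from 50 to 150, and the names sample.range, iq.mean and iq.sd are assumptions made for illustration, although they are consistent with the percentages quoted in this post.

```r
# Assumed parameters (not stated in the text): IQ scores with mean 100, sd 15
iq.mean <- 100
iq.sd <- 15
# Assumed evaluation grid: integer IQ values from 50 to 150
sample.range <- 50:150

# dnorm evaluates the normal density at every value of the grid
iq.dist <- dnorm(sample.range, mean = iq.mean, sd = iq.sd)
iq.df <- data.frame("IQ" = sample.range, "Density" = iq.dist)

# first rows of the resulting table
print(head(iq.df))
```

Because the grid has a spacing of 1, summing densities over a range of integer IQ values approximates the probability of that range.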
From these data, we can now answer the initial question as well as additional
questions:
pp <- function(x) {
    print(paste0(round(x * 100, 3), "%"))
}
# likelihood of IQ == 140?
pp(iq.df$Density[iq.df$IQ == 140])
## [1] "0.076%"
# likelihood of IQ >= 140?
pp(sum(iq.df$Density[iq.df$IQ >= 140]))
## [1] "0.384%"
# likelihood of 50 <= IQ <= 90?
pp(sum(iq.df$Density[iq.df$IQ <= 90]))
## [1] "26.284%"
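The cumulative distribution function: pnorm
The cumulative distribution function (CDF) is monotonically increasing because
it integrates the non-negative density up to x. For the normal distribution, it is
$$F(x \mid \mu, \sigma) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$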
To get an intuition of the CDF, let’s create a plot for the IQ data:
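A plot along these lines could be produced as follows. This is only a sketch: it assumes IQ scores with mean 100 and standard deviation 15 on the range 50 to 150, and it uses base graphics rather than ggplot to avoid extra dependencies. The names iq.cdf and CDF are hypothetical.

```r
# Assumed parameters (not stated in the text): mean 100, sd 15
iq.mean <- 100
iq.sd <- 15

# Evaluate the lower-tail CDF on an assumed grid of integer IQ values
iq.cdf <- data.frame(IQ = 50:150)
iq.cdf$CDF <- pnorm(iq.cdf$IQ, mean = iq.mean, sd = iq.sd)

# Line plot of P[X <= x] against x
plot(iq.cdf$IQ, iq.cdf$CDF, type = "l",
     xlab = "IQ", ylab = "P[X <= x]",
     main = "CDF of the assumed IQ distribution")
```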
As we can see, the depicted CDF shows the probability of having an IQ less than or
equal to a given value. This is because pnorm computes the lower tail by default,
i.e. P[X <= x]. Using this knowledge, we can obtain answers to some of
our previous questions in a slightly different manner:
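As a hedged illustration, the lower-tail questions can be answered by calling pnorm directly. The parameters (mean 100, standard deviation 15) and the variable names are assumptions, not values taken from the text.

```r
# Assumed parameters (not stated in the text): mean 100, sd 15
iq.mean <- 100
iq.sd <- 15

# P[X <= 90]: the lower tail is the default
p.le.90 <- pnorm(90, mean = iq.mean, sd = iq.sd)

# P[50 < X <= 90]: difference of two lower-tail probabilities
p.50.to.90 <- pnorm(90, mean = iq.mean, sd = iq.sd) -
  pnorm(50, mean = iq.mean, sd = iq.sd)

print(p.le.90)
print(p.50.to.90)
```

Subtracting two CDF values is the usual way to obtain the probability of an interval without summing densities by hand.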
Note that the results from pnorm are close to those obtained from manually
summing up the probabilities obtained via dnorm; small differences arise because
a sum over integer IQ values only approximates the integral. Moreover, by
setting lower.tail = FALSE, pnorm can be used to directly compute p-values,
which measure the likelihood of an observation that is at least as extreme
as the obtained one.
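A short sketch of the upper-tail computation, again assuming a mean of 100 and a standard deviation of 15 (both assumptions, as are the variable names):

```r
# Assumed parameters (not stated in the text): mean 100, sd 15
iq.mean <- 100
iq.sd <- 15

# P[X > 140] via the upper tail
p.upper <- pnorm(140, mean = iq.mean, sd = iq.sd, lower.tail = FALSE)

# The same quantity written as the complement of the lower tail
p.complement <- 1 - pnorm(140, mean = iq.mean, sd = iq.sd)

# TRUE if both agree up to numerical tolerance
print(all.equal(p.upper, p.complement))
```

For probabilities far out in the tail, lower.tail = FALSE is generally preferable to the complement because it avoids subtracting a number very close to 1.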
To remember that pnorm does not provide the PDF but the CDF, note that
the function carries a p in its name, which makes pnorm lexicographically close
to qnorm, the function that provides the inverse of the CDF.
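The inverse relationship between qnorm and pnorm can be checked with a small sketch. The parameters (mean 100, standard deviation 15), the chosen probabilities, and the variable names are assumptions for illustration.

```r
# Assumed parameters (not stated in the text): mean 100, sd 15
iq.mean <- 100
iq.sd <- 15

# Probabilities for which we want the corresponding IQ values
probs <- c(0.25, 0.5, 0.75)

# qnorm maps probabilities to quantiles (IQ values)
iq.quantiles <- qnorm(probs, mean = iq.mean, sd = iq.sd)

# pnorm maps the quantiles back to the original probabilities
recovered <- pnorm(iq.quantiles, mean = iq.mean, sd = iq.sd)

print(iq.quantiles)
print(all.equal(recovered, probs))
```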
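# note: we can also implement our own sampler using the densities
library(ggplot2) # provides ggplot() and geom_histogram()
# draw 100 IQ values with replacement, weighted by their densities
my.sample <- sample(iq.df$IQ, 100, prob = iq.df$Density, replace = TRUE)
my.sample.df <- data.frame("IQ" = my.sample)
ggplot(my.sample.df, aes(x = IQ)) + geom_histogram()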
Note that we called set.seed in order to ensure that the random number
generator always generates the same sequence of numbers for reproducibility.
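The effect of set.seed on rnorm can be sketched as follows. The seed value, the sample size, the parameters (mean 100, standard deviation 15) and the variable names are arbitrary assumptions.

```r
# Assumed parameters (not stated in the text): mean 100, sd 15
iq.mean <- 100
iq.sd <- 15

# Fix the state of the random number generator, then sample
set.seed(1)
first.draw <- rnorm(100, mean = iq.mean, sd = iq.sd)

# Resetting the same seed reproduces exactly the same sample
set.seed(1)
second.draw <- rnorm(100, mean = iq.mean, sd = iq.sd)

print(identical(first.draw, second.draw))
```

Without the second call to set.seed, the two samples would differ because the generator continues from its current state.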