Excel Summary Doc For STA1000 Ammaar Salasa 2023


STA1000 Microsoft Excel Summary

Contents
STA1000 Microsoft Excel Summary
STA1000 useful Microsoft Excel Formulae and Functions
General keyboard shortcuts
Jargon and Terminology
Notation and Conventions
Checking if a built-in function exists
Excel for STA1000
Section: Probability
Note on Conditional probability
Note on graphical representation of data
Section: Measures of location, central tendency and spread
Section: Probability Distributions
STA1000 useful Microsoft Excel Formulae and Functions
The goal of this document is to demystify Microsoft Excel for students taking
introductory statistics courses. It summarises some useful ways to use Excel and aims to
build enough confidence that Excel becomes a practical tool. Note that this summary is
not comprehensive: many other useful functions and formulae exist. It also does not
cover macros or VBA.

General keyboard shortcuts:


Ctrl + A = Select all
Ctrl + C = Copy
Ctrl + X = Cut
Ctrl + V = Paste
Ctrl + S = Save current file
Ctrl + Shift + S = Save file as
Ctrl + Z = Undo
Ctrl + Y = Redo
Ctrl + Shift + N = Create new folder (in Windows File Explorer)
Ctrl + Shift + Esc = Open Task Manager (Windows)
Ctrl + Shift + Enter = Apply (legacy) array formula

Jargon and Terminology:


Cell: rectangular unit in a spreadsheet, identified by its column letter and row number,
e.g. A1 or Z12
Column: vertical collection of cells, labelled with letters or combinations of letters (e.g. A,
Z or AA)
Row: horizontal collection of cells, labelled with numbers
Formula: a calculation that uses cell references or numbers to produce a value
Function: a built-in formula available in Excel, designed to carry out a specific task
Arguments: the inputs inside the brackets of a formula or function from which its output
is generated; arguments are separated by commas or semicolons, depending on your
regional settings
Output: the data produced by a formula or function from the given arguments
Array: a collection of cells in column, row, or table form. Corresponding (matching)
arrays must have the values to be compared in the same relative positions in both
arrays
Note: This summary refers to Excel Online and to Excel on PCs running Windows. Excel
for macOS and other platforms is similar, but some differences will be observed. Where
a difference cannot be worked around, consider switching to Excel Online or
troubleshooting by googling the problem first.
Example:

Observe that each individual rectangle is a cell; the cell highlighted in green is in row 4
and column B. The highlighted cell contains SUM, a built-in function used to add
numbers. We will soon see examples of manual formulae. The array in column A
corresponds to the array in column B, since items in the same row are related: A4 gives
an English description of what is shown in B4.

Notation and Conventions:


All formulae and functions in Excel begin with ‘=’. The most basic formulae perform the
basic arithmetic operations (+, -, *, /).
Referencing:
It is best practice to create and label cells that contain specific data and then reference
those cells when creating even the most basic formulae instead of inserting the data
itself into formulae. For example:

Notice that although I could have typed the numbers directly into the formula in cell B3,
by listing the data as I have and referencing the cells in which they are found, I can now
change either or both numbers and have my formula adjust to the new values. This
concept, referencing cells in formulae instead of the actual values, is crucial in Excel.
Absolute referencing:
When a formula that contains cell references is copied, Excel shifts those references so
that they keep the same position relative to the new cell as they had relative to the
original cell.
Example:
Observe that after copying the formula in cell B3 to cell C3, the referenced cells also
adjust, so that C1 and C2 are added instead of B1 and B2. To prevent this, we can make
the row reference, the column reference, or both absolute. Example:

Notice that I have used ‘$’ to create an absolute reference; it ‘locks the item into place’.
In the first reference, $B1, the column is locked: no matter where I copy the formula to,
the column B stays unchanged, although the row will still adapt. The inverse is true of
the second reference, B$2: the row is locked and the column adapts.
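To make the copying rule concrete, here is a minimal Python sketch (a model of the behaviour described above, not how Excel works internally) that shifts a single-cell reference when its formula is copied a given number of rows down and columns right. Parts preceded by ‘$’ stay put. The function names are hypothetical, and ranges such as B1:B5 or sheet names are not handled.

```python
import re

def col_to_num(col: str) -> int:
    """Column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n: int) -> str:
    """1-based column number back to letters: 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def shift_reference(ref: str, rows_down: int, cols_right: int) -> str:
    """Shift a single-cell reference as if its formula were copied.

    A '$' in front of the column letters or the row number locks that part.
    """
    m = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    if m is None:
        raise ValueError(f"not a single-cell reference: {ref}")
    col_lock, col, row_lock, row = m.groups()
    col_num = col_to_num(col) if col_lock else col_to_num(col) + cols_right
    row_num = int(row) if row_lock else int(row) + rows_down
    if col_num < 1 or row_num < 1:
        # Moved off the top or left edge of the sheet; Excel shows #REF! here.
        raise ValueError("reference moved off the sheet")
    return f"{col_lock}{num_to_col(col_num)}{row_lock}{row_num}"

# Copying from B3 to C3 is a move of 0 rows down and 1 column right.
print(shift_reference("B1", 0, 1))   # C1
print(shift_reference("$B1", 0, 1))  # $B1  (column locked)
print(shift_reference("B$2", 1, 1))  # C$2  (row locked, column still moves)
```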
Array Formulae:
Since September 2018, dynamic array formulae have been added to Microsoft 365. These
automatically ‘spill’ multiple outputs from a single formula into the neighbouring row or
column cells, and they are entered into one cell in the same way as a normal formula.
In legacy versions of Microsoft Excel, applying an array formula required selecting all
output cells first and then entering the formula with Ctrl+Shift+Enter. These legacy
array formulae are sometimes called CSE formulas.

In the above, I have used an array formula to multiply pairs of cells and add the results.
Although this is a simple example, it demonstrates how useful array formulae are for
performing multi-step data manipulation in a single formula.
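The image for this example is not reproduced here, so assume the formula had the form =SUM(A2:A4*B2:B4) (hypothetical cell addresses). The Python sketch below performs the same two steps on made-up values: multiply matching positions, then add. In Excel itself, SUMPRODUCT(A2:A4, B2:B4) gives the same result without array entry.

```python
def sum_of_products(xs, ys):
    """Multiply matching positions of two equal-length ranges, then add the products."""
    if len(xs) != len(ys):
        raise ValueError("ranges must be the same size")
    products = [x * y for x, y in zip(xs, ys)]  # step 1: one product per row
    return sum(products)                        # step 2: add the results

quantities = [2, 5, 1]      # hypothetical values, e.g. cells A2:A4
prices = [10.0, 3.5, 20.0]  # hypothetical values, e.g. cells B2:B4
print(sum_of_products(quantities, prices))  # 57.5
```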
Checking if a built-in function exists
It is not unusual to know and be able to describe what you need to do, yet not know how
to express it in mathematical language or as an Excel command. Fortunately, there are
two ways to help with this:
1. Excel’s formula finder: navigate to the Formulas tab at the top of your screen,
describe in words what you need done, then run the search:

You will always be able to use this functionality, even in tests and exams, but do not
assume that you will have lots of time to identify the correct function: your time in an
assessment is limited.

2. Google it: although you can only do this before and after tests and exams, it is a
useful tool if the Excel description is confusing or the search does not produce what
you need.
Excel for STA1000
Section: Probability
=PROB(x_range, prob_range, lower_limit, [upper_limit])
This function is especially useful with the tables of values given for the probability
mass function of a discrete random variable. Often a question will give you a list or
table of values with associated probabilities and ask for the probability that one or more
of those values occurs. This function returns the probability that values in a range lie
between two limits (inclusive) or, if no upper limit is given, equal the lower limit.
Arguments
x_range – range of discrete values
prob_range – probabilities associated with those discrete values being the result of the
random experiment (these must sum to 1)
lower_limit – the minimum tested value, or the only tested value
upper_limit (optional) – the maximum tested value; give this if you want to test a range
of values, and omit it if you only need to test one value.
Example:

Observe that for the x range and the probability range we give ranges, not individual
cells. Note that I typed a limit as a number instead of referencing a cell, which goes
against the referencing best practice described earlier.
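As a cross-check of what PROB computes, this Python sketch (hypothetical function name and made-up data) adds up the probabilities of all x values between the limits, inclusive. It also refuses probabilities that do not sum to 1, since Excel returns an error in that case.

```python
def prob(x_range, prob_range, lower_limit, upper_limit=None):
    """Sketch of PROB: P(lower_limit <= X <= upper_limit) for a discrete distribution."""
    if len(x_range) != len(prob_range):
        raise ValueError("x_range and prob_range must be the same size")
    if abs(sum(prob_range) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if upper_limit is None:
        upper_limit = lower_limit  # only P(X = lower_limit) is wanted
    return sum(p for x, p in zip(x_range, prob_range) if lower_limit <= x <= upper_limit)

x = [0, 1, 2, 3]
p = [0.2, 0.3, 0.1, 0.4]
print(prob(x, p, 1, 2))  # P(1 <= X <= 2), about 0.4
print(prob(x, p, 3))     # P(X = 3), 0.4
```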
=COMBIN(number, number chosen)
Gives the number of combinations without repetition for a given number of choices.
Arguments
Number – number of total options
Number chosen – number of items chosen
Same as the introstat formula n!/((n - x)! x!), where n is number and x is number chosen.
=COMBINA(number, number chosen)
Gives the number of combinations with repetition for a given number of options and
choices.
Arguments
Number – number of total options
Number chosen – number of items chosen
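Both counts can be checked with Python's standard library: `math.comb` (Python 3.8+) gives n!/((n - k)! k!), and the number of combinations with repetition equals C(n + k - 1, k). The wrapper names below are hypothetical and only mirror the Excel function names.

```python
import math

def combin(n: int, k: int) -> int:
    """Like COMBIN: unordered selections without repetition, n! / ((n - k)! k!)."""
    return math.comb(n, k)

def combina(n: int, k: int) -> int:
    """Like COMBINA: unordered selections with repetition, C(n + k - 1, k)."""
    return math.comb(n + k - 1, k)

print(combin(8, 2))   # 28
print(combina(4, 3))  # 20
```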

=PERMUT(number, number chosen)


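Gives the number of permutations (ordered selections without repetition) for a given
number of options and choices. Same as n!/(n - x)!, where n is number and x is number
chosen.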
Arguments
Number – number of total options
Number chosen – number of items chosen

=PERMUTATIONA(number, number chosen)


Gives the number of permutations with repetition for a given number of options and
choices.
Arguments
Number – number of total options
Number chosen – number of items chosen
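Similarly for permutations: `math.perm` (Python 3.8+) gives n!/(n - k)!, and ordered selections with repetition number n^k. Again, the function names are only illustrative.

```python
import math

def permut(n: int, k: int) -> int:
    """Like PERMUT: ordered selections without repetition, n! / (n - k)!."""
    return math.perm(n, k)

def permutationa(n: int, k: int) -> int:
    """Like PERMUTATIONA: ordered selections with repetition, n ** k."""
    return n ** k

print(permut(100, 3))      # 970200
print(permutationa(3, 2))  # 9
```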

Note on Conditional probability:


While there are no built-in formulae or functions for conditional probabilities in Excel, it
is very useful to build a one- or two-way table in Excel to help calculate them.
Alternatively, you could simply list the probabilities and use the AND and OR functions
to help with the calculations. For more guidance, search for these functions using
Excel's built-in help.
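As an illustration of the two-way table approach, this Python sketch uses made-up counts and applies P(A | B) = P(A and B) / P(B). The table layout and the function name are assumptions for the example.

```python
# Hypothetical two-way table of counts:
#               B     not B
#   A          20      30
#   not A      10      40
counts = [[20, 30],
          [10, 40]]

def p_row_given_col(counts, row, col):
    """P(row event | column event) = P(row and column) / P(column), from counts."""
    total = sum(sum(r) for r in counts)
    p_joint = counts[row][col] / total           # P(A and B)
    p_col = sum(r[col] for r in counts) / total  # P(B): column total / grand total
    return p_joint / p_col

print(round(p_row_given_col(counts, 0, 0), 4))  # P(A | B) = 0.6667
```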
Note on graphical representation of data
It can be useful, especially if you are unsure which step to take next, to enter the given
data into Excel and generate pie charts, bar graphs, scatterplots and other graphs.
Consider this approach if you are ever stuck and have a few minutes to spare, since it
can help you see the relationships between two or more variables, or between one
variable and the probability of each event occurring.

Section: Measures of location, central tendency and spread


=AVERAGE(number 1, number 2, …, number n) or (range)
This function calculates the mean of a set of data values. It has 1 type of argument,
simply the input values. These values can be given individually (not standard practice)
or in the form of a range (best practice).

=COUNT(number 1, number 2, …, number n) or (range)


This function counts the numeric values in a set of data values; text and empty cells are
ignored. It has 1 type of argument, simply the input values. These values can be given
individually (not standard practice) or in the form of a range (best practice).
=COUNTIF(search array, search query)
Counts the number of cells within a selected array or range of cells that match a
criterion, such as a specific phrase/term
Arguments
Search array – group of cells in which you want to count the frequency of the phrase
Search query – the phrase or condition to match, normally given in quotation marks
(e.g. "yes" or ">5"); text matching is not case-sensitive
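For exact-match criteria, COUNTIF behaves like the Python sketch below (hypothetical function name and data). Text is compared case-insensitively, as COUNTIF does; wildcards and operator criteria such as ">5" are deliberately not handled.

```python
def countif_equals(values, criterion):
    """Count cells equal to criterion, ignoring upper/lower case for text."""
    def norm(v):
        return v.lower() if isinstance(v, str) else v
    target = norm(criterion)
    return sum(1 for v in values if norm(v) == target)

answers = ["Yes", "no", "yes", "YES", "No"]  # hypothetical survey column
print(countif_equals(answers, "yes"))        # 3
```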

=SUM(number 1, number 2, …, number n) or (range)


This function calculates the sum of all values in a set of data values. It has 1 type of
argument, simply the input values. These values can be given individually (not standard
practice) or in the form of a range (best practice).
=MEDIAN(number 1, number 2, …, number n) or (range)
This function calculates the median of all values in a set of data values. It has 1 type of
argument, simply the input values. These values can be given individually (not standard
practice) or in the form of a range (best practice).
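The same summaries can be reproduced with Python's `statistics` module and built-ins. This sketch assumes a made-up column containing some text and a blank cell (represented as None), and drops those first because COUNT, SUM, AVERAGE and MEDIAN ignore text and empty cells in a range.

```python
import statistics

cells = [4, 8, "n/a", 15, None, 16, 23, 42]  # hypothetical column with text and a blank

# Keep only the numbers (bool is excluded because it is a subclass of int).
numbers = [v for v in cells if isinstance(v, (int, float)) and not isinstance(v, bool)]

print(len(numbers))                # like =COUNT(range)   -> 6
print(sum(numbers))                # like =SUM(range)     -> 108
print(statistics.mean(numbers))    # like =AVERAGE(range) -> 18
print(statistics.median(numbers))  # like =MEDIAN(range)  -> 15.5
```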

=STDEV.S(number 1, number 2, …, number n) or (range)


This function calculates the sample standard deviation (dividing by n - 1), assuming
that the given data is only a sample from a larger population; it is the preferred formula.
It has 1 type of argument, simply the input values. These values can be given
individually (not standard practice) or in the form of a range (best practice).

=STDEV.P(number 1, number 2, …, number n) or (range)


This function calculates the population standard deviation (dividing by n), assuming
that the given data represents every value in the population. It is not preferred and
should only be used if you strongly suspect that the data is not a sample but the entire
population. It has 1 type of argument, simply the input values. These values can be
given individually (not standard practice) or in the form of a range (best practice).
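The only difference between STDEV.S and STDEV.P is the divisor, n - 1 versus n. Python's `statistics.stdev` and `statistics.pstdev` follow the same two conventions, as this sketch with made-up data shows.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample; squared deviations sum to 32

s = statistics.stdev(data)       # like =STDEV.S: sqrt(32 / 7), divides by n - 1
sigma = statistics.pstdev(data)  # like =STDEV.P: sqrt(32 / 8), divides by n
print(round(s, 4), sigma)        # 2.1381 2.0
```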

=MODE(number 1, number 2, …, number n) or (range)


This function returns the most frequently occurring value in a set of data values. (In
newer versions of Excel, MODE is kept for compatibility and MODE.SNGL is the current
equivalent.) It has 1 type of argument, simply the input values. These values can be
given individually (not standard practice) or in the form of a range (best practice).

=MAX(number 1, number 2, …, number n) or (range)


This function calculates the maximum value of all values in a set of data values. It has 1
type of argument, simply the input values. These values can be given individually (not
standard practice) or in the form of a range (best practice).

=MIN(number 1, number 2, …, number n) or (range)


This function calculates the minimum value of all values in a set of data values. It has 1
type of argument, simply the input values. These values can be given individually (not
standard practice) or in the form of a range (best practice).
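A matching Python sketch for MODE, MAX and MIN on made-up data. One assumption to flag: Excel's MODE returns #N/A when no value repeats, whereas `statistics.mode` (Python 3.8+) simply returns the first value in that case.

```python
import statistics

data = [3, 7, 7, 2, 9, 7, 2]  # hypothetical values

print(statistics.mode(data))  # like =MODE(range) -> 7
print(max(data))              # like =MAX(range)  -> 9
print(min(data))              # like =MIN(range)  -> 2
```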
=PERCENTILE.INC(array, k)
This function calculates the kth percentile of data values using the inclusive method, in
which k can take any value from 0 to 1 inclusive (the lowest value is the 0th percentile
and the highest value the 100th). It has 2 types of arguments: firstly, the input array;
secondly, k, given as a decimal, e.g. 0.9 for the 90th percentile.
=QUARTILE.INC(array, quart)
This function calculates a quartile of data values using the inclusive method, i.e., the
lowest and highest values are included as the 0th and 4th quartiles. It has 2 types of
arguments: firstly, the input array; secondly, quart, which is 0 (minimum), 1 (first
quartile), 2 (median), 3 (third quartile) or 4 (maximum).
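A minimal Python sketch of the inclusive method, assuming linear interpolation between the two sorted values on either side of position k(n - 1) (counting from 0), which is how PERCENTILE.INC is usually described. The function names are hypothetical.

```python
import math

def percentile_inc(data, k):
    """Inclusive percentile: k in [0, 1], linear interpolation between sorted values."""
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    xs = sorted(data)
    h = (len(xs) - 1) * k  # 0-based position of the kth percentile
    lo = math.floor(h)
    if lo + 1 >= len(xs):  # k = 1 (or a single value): no neighbour to interpolate with
        return float(xs[lo])
    return xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])

def quartile_inc(data, quart):
    """QUARTILE.INC-style: quart 0..4 maps to k = 0, 0.25, 0.5, 0.75, 1."""
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3 or 4")
    return percentile_inc(data, quart / 4)

print(round(percentile_inc([1, 3, 2, 4], 0.3), 4))  # 1.9
print(quartile_inc([1, 2, 4, 7, 8, 9, 10, 12], 1))  # 3.5
```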

Section: Probability Distributions


=BINOM.DIST(number_s, trials, prob of success, cumulative)
Returns the individual term probability for a binomial distribution or, if cumulative is
True, the cumulative probability
Arguments:
Number_s: required number of successes
Trials: total number of trials of the experiment
Prob of success: the probability of the event occurring in any given trial
Cumulative: True - returns the probability of at most this number of successes, i.e. this
exact number or fewer. False - gives the probability of exactly this number of
successes occurring.
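Equivalent (with cumulative set to False) to $\binom{n}{x} p^{x} (1-p)^{n-x}$, where n = trials, x = number_s and p = prob of success.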
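A short Python sketch of the formula above (hypothetical function name): the exact probability uses the binomial formula directly, and the cumulative version adds it up from 0 to number_s.

```python
import math

def binom_dist(number_s, trials, prob_success, cumulative):
    """Sketch of BINOM.DIST: P(X = number_s), or P(X <= number_s) if cumulative."""
    def pmf(x):
        return math.comb(trials, x) * prob_success**x * (1 - prob_success)**(trials - x)
    if cumulative:
        return sum(pmf(x) for x in range(number_s + 1))
    return pmf(number_s)

print(round(binom_dist(6, 10, 0.5, False), 7))  # 0.2050781
print(round(binom_dist(6, 10, 0.5, True), 7))   # 0.828125
```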
=POISSON.DIST(x, mean, cumulative)
Returns the Poisson probability
Arguments
X – number of events that you are interested in observing
Mean: average rate of occurrence, lambda; ensure it has the correct time units.
Cumulative: True - returns the probability of at most this number of events, i.e. this
exact number or fewer. False - gives the probability of exactly this number of events.
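The Poisson probabilities can be checked with the formula e^(-λ) λ^x / x!, as in this Python sketch (hypothetical function name; cumulative mode sums from 0 to x).

```python
import math

def poisson_dist(x, mean, cumulative):
    """Sketch of POISSON.DIST: P(X = x), or P(X <= x) if cumulative."""
    def pmf(k):
        return math.exp(-mean) * mean**k / math.factorial(k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

print(round(poisson_dist(2, 5, False), 6))  # 0.084224
print(round(poisson_dist(2, 5, True), 6))   # 0.124652
```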
=EXPON.DIST(x, lambda, cumulative)
Returns the exponential distribution
Arguments
X: amount of time between events (must not be negative)
Lambda: average rate of occurrence (events per unit of time); ensure that you have the
correct units
Cumulative: True - returns the probability that at most this much time passes, P(X ≤ x).
False – evaluates the probability density function at that point
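For the exponential distribution both outputs have closed forms: P(X ≤ x) = 1 - e^(-λx) and density λe^(-λx). A Python sketch with a hypothetical function name:

```python
import math

def expon_dist(x, lambd, cumulative):
    """Sketch of EXPON.DIST: P(X <= x) if cumulative, else the density at x."""
    if x < 0:
        raise ValueError("x must not be negative")  # Excel returns #NUM! here
    if cumulative:
        return 1 - math.exp(-lambd * x)
    return lambd * math.exp(-lambd * x)

print(round(expon_dist(0.2, 10, True), 6))   # 0.864665
print(round(expon_dist(0.2, 10, False), 6))  # 1.353353
```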

=NORM.DIST(x, mean, standard_dev, cumulative)


Returns the normal distribution. With cumulative set to True it gives the left-tail
probability P(X ≤ x), i.e. the area under the normal curve to the left of x.
Arguments
X: value at which the distribution is evaluated (e.g. a sample mean or critical value)
Mean: hypothesized mean or given mean
Standard_dev: given standard deviation (when working with a sample mean, use the
standard error σ/√n)
Cumulative: True - returns the left-tail probability P(X ≤ x). False - evaluates the
probability density function at that point

When using this formula for a hypothesis test, remember that the output is a left-tail
probability. For an upper (right-tailed) test the p value is 1 – NORM.DIST(…); for a
two-sided test it is 2 × the smaller of NORM.DIST(…) and 1 – NORM.DIST(…). Because the
mean and standard deviation are supplied directly, there is NO NEED TO CONVERT TO Z
OR CALCULATE A TEST STAT first.
=NORM.S.DIST(z, cumulative)
Returns the standard normal distribution (mean 0, standard deviation 1); use it after
converting to z, i.e. after finding the test statistic
Arguments
Z: value at which the distribution is evaluated (test statistic/critical value)
Cumulative: True - returns the left-tail probability P(Z ≤ z). False – evaluates the
probability density function at z.
Same as above, but requires calculating the test statistic first using the formulae in
introstat. Like a z table, but without reading values from a bad PDF on a screen.
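Python's `statistics.NormalDist` (Python 3.8+) offers the same left-tail probability and density, which makes it easy to check that NORM.DIST on the original scale and NORM.S.DIST on the z scale agree, and to turn the left tail into one- and two-sided p values. The values 42, 40 and 1.5 are made up for illustration.

```python
from statistics import NormalDist

x, mean, sd = 42, 40, 1.5              # hypothetical values
left = NormalDist(mean, sd).cdf(x)     # like =NORM.DIST(x, mean, sd, TRUE)
density = NormalDist(mean, sd).pdf(x)  # like =NORM.DIST(x, mean, sd, FALSE)

z = (x - mean) / sd
left_z = NormalDist().cdf(z)           # like =NORM.S.DIST(z, TRUE): same area

p_right = 1 - left                     # one-sided (upper-tail) p value
p_two = 2 * min(left, 1 - left)        # two-sided p value
print(round(left, 4), round(left_z, 4), round(p_right, 4), round(p_two, 4))
```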

=NORM.INV(probability, mean, standard_dev)


Returns the inverse of the cumulative normal distribution: you have a probability and
want the value x for which P(X ≤ x) equals that probability. The probability is a left-tail
probability, so for a two-tailed test at significance level α use 1 − α/2 to find the upper
critical value.
Arguments
Probability – cumulative (left-tail) probability P(X ≤ x)
Mean: hypothesized mean or given mean
Standard_dev: given standard deviation
Useful for finding critical values when given a significance level, α.

=NORM.S.INV(probability)
Returns the inverse of the cumulative standard normal distribution: you have a left-tail
probability and want the z value (test statistic or critical value). For a two-tailed test at
α = 0.05, NORM.S.INV(0.975) ≈ 1.96 gives the upper critical value.
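The inverse functions correspond to `NormalDist.inv_cdf` in Python. This sketch finds two-tailed critical values at α = 0.05, once on the z scale and once for an assumed normal distribution with mean 40 and standard deviation 1.5 (made-up values).

```python
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)         # like =NORM.S.INV(0.975)
x_crit = NormalDist(40, 1.5).inv_cdf(1 - alpha / 2)  # like =NORM.INV(0.975, 40, 1.5)
print(round(z_crit, 4), round(x_crit, 4))  # 1.96 42.9399
```

The lower critical values are the mirror images: -z_crit on the z scale, and 40 - (x_crit - 40) on the original scale.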
=T.DIST.RT(x, deg_freedom)
Returns the right-tailed probability of Student's t-distribution, P(T > x), as used in
t-tests.
Arguments
X – value at which the distribution is evaluated (critical value/test statistic)
Deg_Freedom: degrees of freedom
Gives the t-test equivalent of a one-sided z-table probability: the right-tail area beyond x
(the light blue highlighted region in the figure).
=T.DIST.2T(x, deg_freedom)
Returns the two-tailed probability of Student's t-distribution, P(|T| > x), i.e. the
two-sided p value of a t-test. X must not be negative, so use the absolute value of the
test statistic.
Arguments
X – value at which the distribution is evaluated (critical value/test statistic)
Deg_Freedom: degrees of freedom
Gives the t-test equivalent of a two-sided normal p value (the red highlighted area in the
figure).

=T.DIST(x, deg_freedom, cumulative)


Returns the left-tailed t-distribution probability, P(T ≤ x).
Arguments
X – value at which the distribution is evaluated (critical value/test statistic)
Deg_Freedom: degrees of freedom
Cumulative: True - returns the left-tail probability P(T ≤ x). False – evaluates the
probability density function at x.
Seldom used
=T.INV(probability, deg_freedom)
Gives the left-tailed inverse of the t-distribution: you have a left-tail probability but
need the test stat or critical value. Remember that the output is negative whenever the
probability is below 0.5.
=T.INV.2T(probability, deg_freedom)
Gives the two-tailed inverse of the t-distribution, e.g. T.INV.2T(α, deg_freedom) is the
critical value for a two-sided test at significance level α. Same as above, but the output
is never negative.
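Python's standard library has no t-distribution, so this sketch swaps in a numerical approach: it integrates the t density with Simpson's rule and inverts it by bisection. It is an approximation for checking answers, not Excel's algorithm, and all function names are hypothetical (scipy's `scipy.stats.t` is the usual library alternative). With 1 degree of freedom the t-distribution has the exact tail P(T > 1) = 0.25, which makes a handy check.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def area_from_zero(x, df, steps=2000):
    """Signed area under the density from 0 to x by Simpson's rule (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def t_dist(x, df):     # like =T.DIST(x, df, TRUE): left tail P(T <= x)
    return 0.5 + area_from_zero(x, df)

def t_dist_rt(x, df):  # like =T.DIST.RT(x, df): right tail P(T > x)
    return 0.5 - area_from_zero(x, df)

def t_dist_2t(x, df):  # like =T.DIST.2T(x, df): both tails P(|T| > x)
    if x < 0:
        raise ValueError("x must not be negative")  # Excel returns #NUM! here
    return 2 * t_dist_rt(x, df)

def t_inv(p, df):      # like =T.INV(p, df): x with P(T <= x) = p, by bisection
    lo, hi = -1.0, 1.0
    while t_dist(lo, df) > p:  # widen the bracket until it contains the answer
        lo *= 2
    while t_dist(hi, df) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_dist(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_inv_2t(p, df):   # like =T.INV.2T(p, df): positive two-tailed critical value
    return t_inv(1 - p / 2, df)

print(round(t_dist_rt(1.0, 1), 6))  # 0.25
```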
=CHISQ.DIST(x, deg_freedom, cumulative)
Returns the left-tailed probability of the chi-squared distribution, P(X ≤ x): the
probability of observing a test statistic less than or equal to the observed test statistic.
Arguments
X: D², the test statistic
Deg_freedom: degrees of freedom
Cumulative: True - returns the left-tail probability P(X ≤ x). False – evaluates the
probability density function at x.
=CHISQ.DIST.RT(x, deg_freedom)
Returns the right-tailed probability of the chi-squared distribution, P(X > x): the
probability of observing a test statistic at least as extreme as the observed test statistic,
i.e. the p value of a chi-squared test.
Arguments
X: D², the test statistic
Deg_freedom: degrees of freedom
=CHISQ.INV(probability, deg_freedom)
Returns the left-tailed inverse of the chi-squared distribution: use it when you have a
left-tail probability and need D². Seldom used
=CHISQ.INV.RT(probability, deg_freedom)
Returns the right-tailed inverse of the chi-squared distribution: use it when you have a
right-tail probability and need D². More frequently used, e.g. CHISQ.INV.RT(α,
deg_freedom) gives the critical value at significance level α
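As with the t-distribution, Python's standard library has no chi-squared functions, so this sketch substitutes a textbook method: the chi-squared CDF is the regularised lower incomplete gamma function P(df/2, x/2), computed here from its power series, and the right-tailed inverse is found by bisection. Function names are hypothetical. For 2 degrees of freedom the right tail is exactly e^(-x/2), which gives an easy check.

```python
import math

def reg_lower_gamma(s, x):
    """Regularised lower incomplete gamma P(s, x), summed from its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / s
    total = term
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= x / (s + n)
        total += term
        n += 1
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))

def chisq_dist(x, df):     # like =CHISQ.DIST(x, df, TRUE): left tail P(X <= x)
    return reg_lower_gamma(df / 2, x / 2)

def chisq_dist_rt(x, df):  # like =CHISQ.DIST.RT(x, df): right tail P(X > x)
    return 1.0 - chisq_dist(x, df)

def chisq_inv_rt(p, df):   # like =CHISQ.INV.RT(p, df), found by bisection
    lo, hi = 0.0, 1.0
    while chisq_dist_rt(hi, df) > p:  # widen until the right tail drops below p
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chisq_dist_rt(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chisq_dist_rt(2.0, 2), 6))   # 0.367879, i.e. e^(-1)
print(round(chisq_inv_rt(0.05, 10), 3))  # 18.307
```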

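=CHISQ.TEST(actual_range, expected_range)

Conducts a chi-squared test for independence and returns its p value, not the test
statistic D². The p value equals CHISQ.DIST.RT(D², deg_freedom), where for a table with
r rows and c columns (both more than one) deg_freedom = (r − 1)(c − 1).
Arguments
Actual range: requires a collection of cells that show the observed values
Expected range: requires a collection of cells containing the expected values, each cell
in the same relative position as its corresponding observed value in the actual range.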
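Because CHISQ.TEST only reports the p value, D² itself has to be computed separately when a question asks for it. This Python sketch uses a made-up 2 × 2 table: it builds expected counts under independence (row total × column total / grand total), then sums (observed − expected)² / expected. Function names are hypothetical; passing D² and the degrees of freedom to CHISQ.DIST.RT should then reproduce CHISQ.TEST's p value.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def d_squared(observed, expected):
    """Test statistic D^2: sum over all cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

observed = [[10, 20],  # hypothetical 2 x 2 table of counts
            [30, 40]]
expected = expected_counts(observed)  # [[12.0, 18.0], [28.0, 42.0]]
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(d_squared(observed, expected), 4), df)  # 0.7937 1
```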
