Stata Commands PDF
General commands
1. To compute means and standard deviations of all variables:
summarize
or, using an abbreviation,
summ
2. To compute means and standard deviations of select variables:
summarize vone vtwo vthree
3. Another way to compute means and standard deviations that allows the by option:
tabstat vone vtwo, statistics(mean, sd) by(vthree)
4. To get more numerical summaries for one variable:
summ vone, detail
http://www.stat.uchicago.edu/~collins/resources/stata/stata-commands.html Page 1 of 5
Stata Commands 12/10/09 1:06 AM
5. Type help tabstat to see the numerical summaries available. For example:
tabstat vone, statistics(min, q, max, iqr, mean, sd)
6. Correlation between two variables:
correlate vone vtwo
7. To see all values (all variables and all observations, not recommended for large data
sets):
list
Press the space bar to see the next page when "--more--" appears, or type "q" to
"break" (stop/interrupt the listing).
8. To list the first 10 values for two variables:
list vone vtwo in 1/10
9. To list the last 10 values for two variables:
list vone vtwo in -10/l
(The end of this command is "minus 10" / "lowercase letter L".)
10. Tabulate categorical variable vname:
tabulate vname
or, using an abbreviation,
tab vname
11. Cross tabulate two categorical variables:
tab vone vtwo
12. Cross tabulate two variables, include one or more of the options to produce column,
row or cell percents and to suppress printing of frequencies:
tab vone vtwo, column row cell
tab vone vtwo, column row cell nofreq
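The commands above can be tried end to end on a small data set. The following is a minimal sketch that assumes Stata's bundled auto data set (loaded with sysuse) is available; the variable names price, mpg, rep78, foreign and make come from that data set, not from these notes, and stand in for vone, vtwo and vthree.

```stata
* Load the example data shipped with Stata (assumption: not part of the original notes)
sysuse auto, clear

* Means and standard deviations for selected variables
summarize price mpg

* More numerical summaries (percentiles, skewness, ...) for one variable
summarize price, detail

* Means and standard deviations by group
tabstat price mpg, statistics(mean sd) by(foreign)

* Correlation between two variables
correlate price mpg

* First and last 10 observations of two variables
list make price in 1/10
list make price in -10/l

* Cross tabulation with column percents only
tabulate rep78 foreign, column nofreq
```

Replace the variable names with your own once the output looks as expected.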
Regression
1. Compute simple regression line (vy is response, vx is explanatory variable):
regress vy vx
2. Compute predictions, create new variable yhat:
predict yhat
3. Produce scatter plot with regression line added:
graph twoway lfit vy vx || scatter vy vx
4. Compute residuals, create new variable residuals:
predict residuals, resid
5. Produce a residual plot (residuals against vx) with a horizontal line at 0:
graph twoway scatter residuals vx, yline(0)
6. Identify points with largest and smallest residuals:
sort residuals
list in 1/5
list in -5/l
(The last command is "minus 5" / "lowercase letter L".)
7. Compute multiple regression equation (vy is response; vone, vtwo, and vthree
are explanatory variables):
regress vy vone vtwo vthree
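The regression steps above can be run in sequence as a short do-file. This is only a sketch under the assumption that Stata's bundled auto data set is used, with price as the response and weight as the explanatory variable; those names are illustrative and do not appear in the original notes.

```stata
* Load the example data shipped with Stata (assumption: not part of the original notes)
sysuse auto, clear

* Simple regression: price is the response, weight the explanatory variable
regress price weight

* Fitted values and residuals from the regression just estimated
predict yhat
predict residuals, residuals

* Scatter plot with the fitted line, then a residual plot with a line at 0
graph twoway lfit price weight || scatter price weight
graph twoway scatter residuals weight, yline(0)

* Observations with the smallest and largest residuals
sort residuals
list make price weight residuals in 1/5
list make price weight residuals in -5/l

* Multiple regression with three explanatory variables
regress price weight mpg length
```

Note that predict always uses the most recently estimated model, so yhat and residuals here belong to the simple regression, not the multiple regression at the end.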
In some versions of Stata, there is a potential glitch with Stata's stem command for
stem-and-leaf plots. The stem command appears to permanently reorder the data so that
they are sorted by the variable that was plotted. The simplest way to avoid this problem
is to avoid stem-and-leaf plots altogether (use histograms instead). However, if you
really want a stem-and-leaf plot, first create a variable containing the original
observation numbers (called index, for example). A command to do so is:
generate index = _n
If you do this, then you can re-sort the data after the stem-and-leaf plot according to the
index variable:
sort index
Then, the data are back in the original order.
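Put together, the workaround might look like the following sketch. It assumes Stata's bundled auto data set and its mpg variable purely for illustration; whether stem actually reorders the data depends on the Stata version, so the final list command is included only as a check.

```stata
* Load the example data shipped with Stata (assumption: not part of the original notes)
sysuse auto, clear

* Record the original observation order before plotting
generate index = _n

* Stem-and-leaf plot of one variable
stem mpg

* Restore the original order and confirm it with the first few observations
sort index
list index make mpg in 1/5
```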