Term Project Details
For this semester, we are going to apply the concepts in Chapter 10 (Correlation
and Linear Regression) in a statistical study. Your project should follow the
steps below, and your final report should include details of how you performed
each step.
1. Find a statistical question to answer.
In this project, your question should ask whether two sets of data are correlated,
where one variable (y) depends on the other (x). For example, a statistical question
would be: is there a correlation between age (x) and, say, cholesterol level (y)?
Note that cholesterol level (y) depends on age (x), not the other way around.
(Remember that correlation DOES NOT NECESSARILY mean causation.)
Clearly state your question and explain which correlation you are going to
analyze. A project is worthless if the readers have no idea what you are
trying to find and analyze.
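For reference, the strength and direction of a linear correlation are usually measured with the linear correlation coefficient r (Pearson's r), computed from n paired values (x_i, y_i) whose sample means are written with a bar. Your textbook may present an equivalent computational form. Values of r near +1 suggest a strong positive linear correlation, values near -1 a strong negative one, and values near 0 little or no linear correlation.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```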
2. Come up with your own hypothesis.
This is where you make a guess (a tentative conclusion) about what you are likely
to find. For example, a hypothesis for age vs. cholesterol may be: the older a
person, the higher the cholesterol level. Your analysis of the data will either
confirm or refute this early conclusion, i.e., your hypothesis.
3. Collect data.
Since we are going to use the Chapter 10 material exclusively, make sure that the
data you choose are, generally speaking, linearly related. Remember that you
are going to analyze the data to determine whether they are linearly correlated and,
if so, find the linear regression line that best fits the data. The equation of that
line (y = mx + b) then serves as the model for the data, in that you can plug an x
value into the equation to predict the corresponding y. Therefore, if the data (x vs. y)
are, say, nonlinear, trying to do a linear analysis would be foolish. So, if you plot your
data in a scatterplot and the data do not look linear, find a different question to answer
for which the data are roughly linear.
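The analysis described in this step can be sketched in Python. This is a minimal illustration only: the handout does not prescribe any software, and the helper names `pearson_r`, `least_squares_line`, and `predict` are hypothetical functions written for this sketch, not part of any library. It assumes your paired data are stored as two equal-length lists and computes r, then the least-squares slope m = Sxy / Sxx and intercept b = (mean of y) - m * (mean of x) for the line y = mx + b. The numbers in the demo are made up for illustration, not real measurements. A calculator, spreadsheet, or Python 3.10+'s `statistics.correlation` and `statistics.linear_regression` should give the same values.

```python
import math


def _check(xs, ys):
    # The x and y lists must pair up one-to-one and contain at least two points.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be equal-length lists with at least 2 values")


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data (x_i, y_i)."""
    _check(xs, ys)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x values (or all y values) are equal")
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope m and intercept b of the least-squares line y = m*x + b."""
    _check(xs, ys)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    m = sxy / sxx
    b = mean_y - m * mean_x  # the line always passes through (mean_x, mean_y)
    return m, b


def predict(m, b, x):
    """Plug an x value into the model y = m*x + b."""
    return m * x + b


if __name__ == "__main__":
    # Made-up numbers for illustration only; replace them with your collected data.
    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 5, 4, 5]
    r = pearson_r(xs, ys)
    m, b = least_squares_line(xs, ys)
    print(f"r = {r:.3f}")
    print(f"y = {m:.3f}x + {b:.3f}")
    print(f"predicted y at x = 3.5: {predict(m, b, 3.5):.3f}")

    # A perfect but curved pattern: y = x**2 on x values symmetric about 0.
    # r comes out as 0 here, even though y is completely determined by x.
    curve_x = [-2, -1, 0, 1, 2]
    curve_y = [x ** 2 for x in curve_x]
    print(f"r for y = x^2: {pearson_r(curve_x, curve_y):.3f}")
```

The last lines show why the scatterplot check matters: those points lie exactly on a curve, yet r is 0, so a purely linear analysis would miss the relationship entirely.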
Make sure you have enough data. If you are going to draw a conclusion
about the whole American population and all you have is 20 data points, it would be
hard to impress or convince anyone with your analysis.
Give the source of your data.