Movie Sales Analysis
Madness
Descriptive Statistics
  Description of independent and dependent variables
  Histogram analysis
  Distribution Test
  Correlation Analysis
Regression Analysis
  Best subset regression analysis: identify relevant independent variables
  Residual analysis
Forecasting
  Forecasting gross domestic ticket sales for 2014 releases
Dataset
Data Field | Description | Source
Movie Title | |
Total Gross | |
Awareness Score | | IMDB; E-Score
Appeal Score | | IMDB; E-Score
Rotten Tomatoes Rating | | Rotten Tomatoes
Production Budget | | IMDB
Studio | | IMDB
Season | |
Genre | | Movies.com
Rating | G, PG, PG-13, R | Movies.com
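The conclusions single out individual studios (Disney and Focus), which suggests that categorical fields such as Studio, Season, Genre, and Rating entered the regression as indicator (dummy) variables. The slides do not say how this encoding was done, so the following is only a minimal sketch under that assumption; the function name `dummy_encode`, the `is_` column prefix, the choice of reference level, and the sample labels are all hypothetical.

```python
from typing import Dict, List, Sequence


def dummy_encode(values: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """Build one 0/1 indicator column per category, omitting the reference level.

    Omitting one level keeps the indicators from being perfectly collinear
    with the regression intercept (the "dummy variable trap").
    """
    levels = sorted(set(values) - {reference})
    return {f"is_{level}": [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical studio labels for five films; not taken from the dataset.
studios = ["Disney", "Warner Bros.", "Focus", "Disney", "Universal"]
columns = dummy_encode(studios, reference="Warner Bros.")
for name, indicator in columns.items():
    print(name, indicator)
```

Each coefficient on an `is_` column is then read as the shift in predicted gross relative to the reference studio, which is how a "positive impact" for Disney or a "negative impact" for Focus would show up.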
Distribution Test
Even among the top films, only a few are extremely successful (>$300M)
True blockbusters are rare: most of the top 100 earn closer to $75M
This is roughly the same as the average production budget of about $75M, indicating limited profitability even among the most successful films
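The observations above (few films above $300M, most near $75M) describe a right-skewed distribution, in which the mean sits above the median. The slides do not name the specific distribution test used, so the sketch below only computes descriptive checks of that skew with Python's standard library. The function name, the use of moment-based skewness, and the sample values are assumptions; only the $300M cutoff comes from the slide.

```python
import statistics
from typing import Dict, Sequence


def gross_summary(gross_musd: Sequence[float], cutoff_musd: float = 300.0) -> Dict[str, float]:
    """Summarize a gross distribution given in millions of dollars."""
    n = len(gross_musd)
    mean = statistics.mean(gross_musd)
    median = statistics.median(gross_musd)
    sd = statistics.pstdev(gross_musd)
    # Moment-based (population) skewness; a positive value means a long right tail.
    skewness = sum((x - mean) ** 3 for x in gross_musd) / (n * sd ** 3)
    share_above = sum(1 for x in gross_musd if x > cutoff_musd) / n
    return {
        "mean": mean,
        "median": median,
        "skewness": skewness,
        "share_above_cutoff": share_above,
    }


# Hypothetical grosses ($M) for ten films; not taken from the dataset.
sample = [410, 320, 180, 120, 95, 80, 75, 70, 65, 60]
summary = gross_summary(sample)
print(summary)
```

With this toy sample, two of ten films clear the cutoff and the median (87.5) is well below the mean (147.5), the same pattern the slide describes.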
R² = 63.4%
Adj. R² = 45.9%
Several variables with extremely high VIF scores
Multiple insignificant variables
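Two diagnostics on this slide can be made concrete. Adjusted R² is 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors, so a large drop from R² (63.4%) to adjusted R² (45.9%) generally means many predictors relative to the number of observations. The variance inflation factor of predictor j is 1/(1 − R²ⱼ), where R²ⱼ comes from regressing that predictor on all the others; a common rule of thumb flags values above 5 or 10. The sketch below is a self-contained, standard-library illustration, not the tool used in the analysis; the helper names and the toy predictors are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y: Sequence[float], xs: Sequence[Sequence[float]]) -> float:
    """R^2 of an ordinary least squares fit of y on the columns in xs plus an intercept."""
    n = len(y)
    design = [[1.0] + [float(col[i]) for col in xs] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def vif(xs: Sequence[Sequence[float]], j: int) -> float:
    """Variance inflation factor of predictor j: 1 / (1 - R^2_j),
    where R^2_j comes from regressing predictor j on all the other predictors."""
    others = [col for i, col in enumerate(xs) if i != j]
    return 1.0 / (1.0 - r_squared(xs[j], others))


# Hypothetical, nearly collinear predictors (e.g. a budget figure and a near-copy of it).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 6, 8, 10, 12.5]
print(vif([x1, x2], 0))  # far above the common rule-of-thumb threshold of 10
```

Predictors that nearly duplicate each other inflate each other's VIF, which also widens their standard errors and can make otherwise relevant variables look insignificant.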
Correlation Matrix
Production budget, rating, and Disney studio have the highest correlations with the dependent variable
Action/Adventure and budget are highly correlated, likely due to special effects costs. Viewers also benefit the most from this genre's in-theater experience
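A correlation matrix is the table of pairwise Pearson coefficients between every pair of variables, which is how both observations on this slide can be read off: each predictor's correlation with Total Gross, and correlations between predictors such as genre and budget. The sketch below builds one with the standard library and ranks predictors by the absolute value of their correlation with the target; the column names and values are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for six films; not taken from the dataset.
data = {
    "total_gross": [410, 320, 180, 120, 95, 80],
    "budget": [200, 180, 90, 60, 70, 40],
    "is_disney": [1, 1, 0, 0, 0, 0],
}
matrix = correlation_matrix(data)
# Rank candidate predictors by the strength of their correlation with the target.
ranking = sorted(
    (name for name in data if name != "total_gross"),
    key=lambda name: abs(matrix["total_gross"][name]),
    reverse=True,
)
print(ranking)
```

A strong correlation between two predictors (such as Action/Adventure and budget) is an early warning of the multicollinearity that later shows up as high VIF scores.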
R² = 53%
Adj. R² = 50%
All variables are statistically significant at roughly the 90% confidence level
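The agenda lists best subset regression as the step that identifies relevant independent variables. Best subset selection fits every non-empty combination of candidate predictors (2^k − 1 fits for k candidates) and keeps the one that scores best on a chosen criterion. The slides do not state the criterion, so this sketch assumes adjusted R²; Mallows' Cp, AIC, and BIC are common alternatives. The function names and the toy data are hypothetical, and the OLS helpers are repeated so the block runs on its own.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y: Sequence[float], xs: Sequence[Sequence[float]]) -> float:
    """R^2 of an ordinary least squares fit of y on the columns in xs plus an intercept."""
    n = len(y)
    design = [[1.0] + [float(col[i]) for col in xs] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def best_subset(
    y: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> Tuple[Tuple[str, ...], float]:
    """Fit every non-empty subset of candidates; return the one with the highest adjusted R^2."""
    n = len(y)
    names = list(candidates)
    best: Tuple[Tuple[str, ...], float] = ((), float("-inf"))
    for size in range(1, len(names) + 1):
        if n - size - 1 <= 0:
            break  # not enough observations to fit this many predictors
        for subset in combinations(names, size):
            r2 = r_squared(y, [candidates[name] for name in subset])
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - size - 1)
            if adj > best[1]:
                best = (subset, adj)
    return best


# Hypothetical data: y tracks x1, while x2 is built to be uncorrelated with both y and x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, -1, -1, 1, 0, 0]
y = [2, 5, 5, 8, 11, 11]
print(best_subset(y, {"x1": x1, "x2": x2}))
```

Because x2 adds no explanatory power here, including it only costs a degree of freedom, so adjusted R² prefers the model with x1 alone. The same penalty is what lets a smaller model (adjusted R² of 50%) beat the larger initial one (45.9%) despite a lower raw R².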
Residuals
Heteroscedasticity present
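Heteroscedasticity means the residual variance is not constant across observations; with box-office data it often grows with the predicted gross. A quick numeric check in the spirit of the Goldfeld-Quandt test is to sort observations by fitted value and compare residual spread in the top and bottom thirds. This simplified version reuses the full-model residuals and reports a plain ratio without an F-test p-value, so it is a heuristic, not the formal test; the function name and data are hypothetical.

```python
from typing import Sequence


def residual_spread_ratio(fitted: Sequence[float], residuals: Sequence[float]) -> float:
    """Mean squared residual in the top third of fitted values divided by that in the bottom third.

    A ratio near 1 is consistent with constant variance; a ratio well above 1
    suggests the residual spread grows with the fitted value.
    """
    pairs = sorted(zip(fitted, residuals), key=lambda pair: pair[0])
    k = len(pairs) // 3  # size of each outer group; the middle third is dropped
    if k == 0:
        raise ValueError("need at least three observations")
    low = [r for _, r in pairs[:k]]
    high = [r for _, r in pairs[-k:]]
    mean_sq_low = sum(r * r for r in low) / k
    mean_sq_high = sum(r * r for r in high) / k
    return mean_sq_high / mean_sq_low


# Hypothetical fitted values ($M) and residuals whose spread widens as fitted values grow.
fitted = [20, 30, 40, 60, 80, 100, 150, 200, 250]
residuals = [1, -1, 1, 2, -2, 2, 4, -4, 4]
print(residual_spread_ratio(fitted, residuals))  # 16.0
```

When the ratio is large, common responses include modeling a log-transformed gross or using heteroscedasticity-robust standard errors; the slides do not say whether either was applied.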
Using our final model, we predicted ticket sales for movies released and closed in 2014
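Forecasting with a fitted linear model amounts to plugging each new movie's predictor values into intercept + Σ coefficient × value. The slides name Rotten Tomatoes rating, production budget, and the Disney and Focus studio effects as significant, but they do not report coefficient values or how forecast accuracy was judged. Every coefficient below is therefore a hypothetical placeholder, and mean absolute percentage error (MAPE) is only one assumed way to score 2014 predictions against actual grosses.

```python
from typing import Dict, Sequence

# Hypothetical coefficients in $M; NOT the values estimated in this analysis.
COEFFS: Dict[str, float] = {
    "intercept": 10.0,
    "rt_rating": 0.5,  # per Rotten Tomatoes point
    "budget_musd": 0.9,  # per $1M of production budget
    "is_disney": 60.0,
    "is_focus": -20.0,
}


def predict_gross(movie: Dict[str, float], coeffs: Dict[str, float] = COEFFS) -> float:
    """Linear prediction: intercept + sum of coefficient * feature (missing features count as 0)."""
    return coeffs["intercept"] + sum(
        value * movie.get(name, 0.0) for name, value in coeffs.items() if name != "intercept"
    )


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Usage with a hypothetical 2014 release (a Disney film, so is_focus defaults to 0).
example = {"rt_rating": 80, "budget_musd": 100, "is_disney": 1}
print(predict_gross(example))
```

Treating a missing indicator as 0 means any studio other than Disney or Focus falls back to the baseline, matching the finding that most studios have little impact.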
Conclusions
Model indicates that Rotten Tomatoes Rating and Production Budget are significant contributors to Total Gross variability
Studio has little impact on the variability of Total Gross, with the exception of Disney (positive impact) and Focus (negative impact)
Disney: People go to see a movie because it's a Disney movie, which drives up ticket sales
Focus: This indie film studio has narrower distribution and lower awareness
QUESTIONS?
Appendix
Additional Data
In addition to the previous data, we also aggregated the following data that we did not include in our analysis:
Data Field | Description
Theaters (Opening) | Opening Theaters
Open | Date of opening
Close | Date of closing