Ciadmin, Journal Manager, 7237 Rees
ABSTRACT
Baseball, like most other sports, has a set of tenets that began early and have survived virtually
unquestioned. Modern analytics gives us an opportunity to examine some of these long-held
tenets to see if they were founded on solid evidence. This research examines some common
baseball wisdom through an initial study utilizing simulation. In particular, the profiles of several
baseball teams are constructed and various factors are examined by simulating ten baseball
seasons under various configurations with the different teams.
Contrary to conventional wisdom, a batting order where high-average hitters bat third in a lineup
and the team’s best power hitter bats cleanup (fourth), for example, does not necessarily generate
the most runs per game over the long run. Moreover, high-average hitters with less power can
generate more runs per game than power hitters with lesser averages. Finally, it appears that
hitters who perform well with runners in scoring position are more influential in helping their
team score more runs than even more powerful or higher average hitters who do not produce as
frequently in such cases. Players with lower star profiles, but who rise to the occasion with
runners in scoring position, can often be purchased by baseball clubs that have a more
constrained payroll; teams that are less well-off financially may thus purchase or trade for these
hitters and still field a team with a competitive level of high run production.
INTRODUCTION
In the mid-1970s, Bill James, a night watchman at a bean cannery, began analyzing baseball box scores,
without a computer, with the purpose of questioning baseball’s long-held presuppositions, or “baseball
wisdom” as it is sometimes called (Hanchett, 2014). James’ analytical musings have now evolved into
a whole field - sports analytics. There is not complete agreement as to what the term means now or what it includes
and excludes; one definition of sports analytics is “the utilization of technology to answer questions [about sports]
correctly that the average human mind cannot without bias or error” (Bessire, 2014). However, the impact of
analytics is clear as league luminaries of multiple sports, front-office decision-makers, and industry leaders are
employing analytics in decision-making. Moreover, there are now meetings and conferences with research papers
and panelists discussing the increasing use of analytics on the field, in the front office, and in the boardroom.
Probably the preeminent sports analytics conference is the MIT Sloan Sports Analytics Conference, where topics
discussed this year included American football, baseball, basketball, and golf (Wikipedia, 2014c). There is now a
journal exclusively devoted to sports analytics - “The Journal of Sports Analytics (JSA)” - which aims to be a forum
for the discussion of practical applications of sports analytics research, serving primarily team owners, general
managers, and coaches (JSA, 2014). Moreover, other excellent journals without a sports emphasis have been
publishing research on sports analytics, including this journal (Bartholomew and Collier, 2011). Topics are widely
diverging in these journals, ranging from playing basketball (“Every basketball player takes and makes a unique
spatial array of shots. In recent years, technology to measure the coordinates of these constellations has made
analysis of them possible” (Shortridge, Goldsberry, and Adams, 2014)) to refereeing Italian football matches (“Empirical
research allows for the verification of the incidence of systematic errors on the decision-making processes of
Copyright by author(s); CC-BY 11 The Clute Institute
Journal of Service Science – November 2015 Volume 8, Number 1
referees” (Lombardi, Trequattrini, and Battista, 2014)). Yet another term has now entered the analytics lexicon –
sabermetrics – which is the term for the empirical analysis of baseball, especially baseball statistics that measure in-
game activity. The term is derived from the acronym SABR which stands for the Society for American Baseball
Research (Wikipedia, 2014d). The research presented in this paper is based on in-game statistics, but instead of
doing statistical analyses on those data, those statistics are used to perform computer simulations of baseball games.
Major League Baseball (MLB) - America’s so-called national pastime - is one of four major professional
sports leagues in North America. There are 30 teams in MLB, with 29 from the United States and one from Canada.
MLB has the highest season attendance of any sports league in the world with more than 74 million spectators in
2013 (Wikipedia, 2014b).
The amount of financial resources, or player payroll, available to the individual teams varies dramatically
across the league, with teams such as the New York Yankees having a much larger budget than many of their
competitors (Wikipedia, 2014a). It is not surprising that those with more significant resources are often able to
attain higher levels of on-the-field achievement given their ability to outbid their competitors in an effort to procure
the most sought-after players. The question arises as to whether analytics can be used to identify and then obtain
players who are not known stars and who do not necessarily have the best offensive statistics, e.g., do not have the
most home runs or the highest batting averages. This question was the foundation of the recent film Moneyball,
where regression and econometrics were used for scouting and analyzing players, allowing the Oakland Athletics to
win twenty consecutive games - an American League record. In this research, the authors use computer simulation
for performing similar analytics.
The scope of this paper is to examine the hitting aspect of baseball in order to develop insight into 1) the
best types of players to acquire and 2) how to arrange them in batting order. This information should be useful
both to on-the-field managers, who decide which players to play on any given day and in what order they should bat,
and to general managers, who are trying to acquire (purchase, draft, and/or trade for) the best set of players
that the franchise’s budget allows.
The purpose of this paper is 1) to examine the effect of hitters who hit differently with runners in scoring
position (with runners on second and/or third base), 2) to study different types of batting lineups (orders in which
players bat), 3) to ascertain the relative advantage of players who are power hitters, as opposed to those who have
less power but hit with a higher batting average, and 4) to assess the relative benefit, in terms of runs scored per
game, of different team batting averages.
The next section provides details of the methodology, including the four key components of baseball
wisdom to be investigated, and subsequent sections present results for each of these factors, a discussion of the
results, and conclusions.
METHODOLOGY
The simulation model examines only the offensive portion of baseball; that is, only that portion of major
league baseball when a team is at bat. While it is true that certain pitchers dominate hitters, whereas others are not
as effective, pitching is ignored as a control factor because the simulation produces average statistics, as a player’s
average performance is measured against all teams and all pitchers in his or her league. Also, issues, such as double
plays, sacrifice flies, and infield fly rules, were ignored in this study but could be included in a more detailed study.
Simulation Model
Three main portions of the simulation model are shown schematically in Figure 1: 1) how an “at bat” is
modeled, 2) how outs are processed, and 3) how runners are moved around the bases (with a three-base hit used as
an illustration). The simulation is written in Arena version 14.0 (Rockwell) and the schematics shown in Figure 1
are from the Arena language. The basic concept is that batters get hits a certain percentage of the time based on
their batting average and when they do get a hit, there is a certain probability that the hit is a one-base hit (single),
two-base hit (double), three-base hit (triple), or a four-base hit (home run).
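[Figure 1c. Arena flowchart for processing a triple within the same inning: three decision blocks in sequence
(“Runner on Third and a Triple?”, “Runner on Second and a Triple?”, “Runner on First and a Triple?”), each followed
on its true branch by an Assign block (“Assign with Triple and Man on Third”, “Assign with Triple and Man on
Second”, “Assign with Triple and Man on First”), ending with the block “Assign Hitter to Third Base”.]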
Batting averages are included in the model by sampling from a simple discrete distribution. For example,
consider a player who is hitting 0.300, which means the player gets a hit thirty percent of the time. A random
number is drawn and if the number is less than or equal to 0.300, the batter got a hit. In Figure 1a, this procedure is
implemented in the diamond-shaped box labelled “Hit?”. If a number greater than the hitter’s batting average is
drawn, the batter has made an out. The manner in which outs are processed is shown in Figure 1b. If the batter gets
a hit, the batter moves on to the “Spin Power Wheel” block in Figure 1a, where the conditional percentages for this
particular batter getting a single/double/triple/homer (e.g., 75/20/2/3), given he has a hit, are looked up. Another
random number is used to determine the type of hit. The batter then branches to the section of code for the given hit.
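The two random draws described above can be sketched in a few lines of Python. This is only an illustration of the sampling logic, not the authors' Arena model; the function name `simulate_at_bat`, the outcome labels, and the use of Python's `random` module are assumptions made for this sketch.

```python
import random

# Outcome labels for the four kinds of hit, in base order.
HIT_TYPES = ("single", "double", "triple", "homer")

def simulate_at_bat(batting_average, hit_profile, rng):
    """Return "out" or one of HIT_TYPES for a single at bat.

    batting_average: probability of a hit, e.g. 0.300
    hit_profile: conditional percentages for single/double/triple/homer
                 given a hit, e.g. (75, 20, 2, 3)
    rng: a random.Random instance (hypothetical interface for this sketch)
    """
    # "Hit?" decision: a draw greater than the batting average is an out.
    if rng.random() > batting_average:
        return "out"
    # "Spin Power Wheel": a second draw selects the type of hit,
    # weighted by the batter's conditional hit percentages.
    return rng.choices(HIT_TYPES, weights=hit_profile, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(2012)
    outcomes = [simulate_at_bat(0.300, (75, 20, 2, 3), rng) for _ in range(10)]
    print(outcomes)
```

Over many at bats, the fraction of non-out results should approach the batting average, and the mix of hit types should approach the profile percentages.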
For instance, if the batter gets a three-base hit, the process shown in Figure 1c is followed. First, the model
checks to see if there already is a runner on third base. If not, the next block - an Arena ‘Assign’ block labeled
‘Assign with Triple and Man on Third’ - is skipped. If there was a man on third already, then the ‘Assign’ block is
executed which 1) records that a run has been scored and 2) removes the runner who just scored from third. Next,
there is a check to see if there was a runner on second base when the triple was hit. If so, that runner is scored and
removed from second base. Next, a check is made to see if there was a runner on first base and, if so, that runner is
scored and removed from the base paths. Finally, regardless of whether there were any runners on base before the
triple, the batter is placed on third base and the model passes control to ensure that the next batter is brought to the
plate to hit. Similar code is written for the cases where batters hit singles, doubles, and homers.
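The triple-processing logic just described can be mirrored in Python as follows. This is a minimal sketch under stated assumptions: the dictionary representation of the bases and the function name `process_triple` are hypothetical choices for illustration and are not taken from the Arena model.

```python
def process_triple(bases):
    """Apply a three-base hit to the current base state.

    bases: dict with boolean entries "first", "second", "third"
           (a hypothetical representation chosen for this sketch)
    Returns (new_bases, runs_scored); the input dict is not modified.
    """
    runs = 0
    new_bases = dict(bases)
    # Check third, then second, then first, in the order described above.
    for base in ("third", "second", "first"):
        if new_bases[base]:
            runs += 1                # record that a run has scored
            new_bases[base] = False  # remove the runner from the base paths
    # Regardless of whether anyone was on base, the batter ends up on third.
    new_bases["third"] = True
    return new_bases, runs

if __name__ == "__main__":
    loaded = {"first": True, "second": True, "third": True}
    print(process_triple(loaded))
```

With the bases loaded, the sketch scores three runs and leaves only third base occupied; with the bases empty, it scores none and places the batter on third.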
Experimental Factors
The underlying motivation of this paper is to provide an initial investigation of several common baseball
strategies. Factors are chosen which enable analysis of these strategies. Table 1 is a list of the four factors and
associated levels.
Major league baseball teams all desire players who have high batting averages and who also hit for power.
Unfortunately, such players demand large salaries and, therefore, baseball teams with relatively limited financial
resources cannot afford many of these players. The difficulty of this challenge is exacerbated by the presence of
competitors, such as the Boston Red Sox and the New York Yankees, who are not as constrained financially. Thus,
financially poorer teams are constantly looking for ways to identify and acquire players who are less expensive, yet
still allow them to field a high scoring team.
One possible strategy is for teams to purchase/utilize players who hit particularly well when runners are in
scoring position; i.e., when runners are either on second base, third base, or both. The term “scoring position” is
used in baseball parlance because, generally speaking, a hit of any kind will score the base runner.
Note that what is being investigated here differs from the notion of a “clutch hitter” – one who produces when the
pressure is on: in late innings of the game, in critical games during the pennant race or World Series, or with
runners in scoring position when there are two outs (Getz, 2013). This model looks at what a hitter does
whenever runners are in scoring position, regardless of the inning, the number of outs, or whether the media and
fans deem the game critical.
The authors are not the first to suggest that an ability to drive home runners in scoring position is critical.
An article in May 2010 made the point: “As the 2010 Major League Baseball season transitions from April showers
to May flowers, one issue is abundantly clear - Winning, by and large, is directly tied to a team's ability to drive
home runners in scoring position (RISP).
“It shouldn't be surprising that two of the top three teams in that category - Tampa Bay and San Diego -
entered the weekend in first place. The third – Arizona - also has had a productive first month and was only 1-1/2
games behind the Padres in the National League West.
“Conversely, the four teams struggling most to knock home
runners once they get into scoring position are having problems winning. Entering the weekend, Baltimore (with a
RISP of 0.193) was 4-18 and had the worst record in either league.”
In this work, the RISP factor is modeled by introducing three potential levels of batting average for a given
player if at least one runner is in scoring position. The three levels result in 1) adding 75 points to the batter’s
normal batting average, 2) leaving the batting average as it is – this hitter hits the same whether runners are in
scoring position or not, and 3) the hitter (perhaps feeling some pressure) actually decreases his percentage of getting
a hit by 75 points. The hitter’s underlying batting average for the simulation is restored to its original value after
each RISP at bat. In other words, with RISP the hitter’s batting average is adjusted for just this one at bat.
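The RISP adjustment can be expressed as a small helper that computes the batting average to use for one at bat. This is a hedged sketch, not the Arena implementation; the function name and argument names are assumptions. Because the helper returns a new value and never changes the hitter's underlying average, the "restore after each RISP at bat" step is automatic.

```python
def effective_batting_average(base_average, risp_adjustment, on_second, on_third):
    """Batting average to use for a single at bat.

    base_average: the hitter's normal batting average, e.g. 0.250
    risp_adjustment: +0.075, 0.0, or -0.075 (the three factor levels)
    on_second, on_third: whether a runner occupies each base
    """
    # A runner on second and/or third means at least one runner is in
    # scoring position, so the adjustment applies to this at bat only.
    if on_second or on_third:
        return base_average + risp_adjustment
    return base_average

if __name__ == "__main__":
    # A 0.250 hitter at the +75-point level, with a runner on second.
    print(round(effective_batting_average(0.250, 0.075, True, False), 3))
```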
Baseball lore and wisdom decree that a particular arrangement of hitters be utilized by a manager in a
season, with high-average hitters batting third and a power hitter batting cleanup (fourth), for example; but is this the
arrangement most likely to guarantee maximal run-scoring potential over the course of a season?
To address the question, four different lineup orders are examined: 1) in the first lineup, the batters hit in a
traditional order; 2) in the second lineup, one of perhaps theoretical interest only, all batters are given the same
batting average; namely, the team average; 3) in the third lineup, batters are arranged in a descending batting
average order, ostensibly because the batters with the best averages should bat most frequently in a game; and 4) in
the fourth lineup, batters are sent to bat in a totally random order in every game they play.
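The four lineup types can be sketched as a single function that reorders (or replaces) a list of batting averages. This is an illustrative sketch only: the function name `build_lineup`, the level labels, and the nine sample averages are hypothetical, and the input list is simply assumed to be in the manager's traditional order already, since the rule for building that order is not spelled out here.

```python
import random

def build_lineup(averages, kind, rng=None):
    """Return the batting averages in the order the hitters will bat.

    averages: list of batting averages, assumed to be in traditional order
    kind: "traditional", "uniform", "descending", or "random"
    rng: optional random.Random instance used for the random lineup
    """
    if kind == "traditional":
        return list(averages)
    if kind == "uniform":
        # Every batter is given the team average.
        team_average = sum(averages) / len(averages)
        return [team_average] * len(averages)
    if kind == "descending":
        # Best average bats first, worst bats last.
        return sorted(averages, reverse=True)
    if kind == "random":
        # A fresh random order; call once per game.
        if rng is None:
            rng = random.Random()
        shuffled = list(averages)
        rng.shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown lineup kind: {kind}")

if __name__ == "__main__":
    # Hypothetical nine-player team averaging 0.250.
    team = [0.280, 0.265, 0.300, 0.270, 0.255, 0.245, 0.240, 0.230, 0.165]
    for kind in ("traditional", "uniform", "descending", "random"):
        print(kind, build_lineup(team, kind, random.Random(1)))
```

All four lineups share the same team batting average by construction, which matches the requirement that only the order (or spread) of averages differs between levels.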
A similar analysis from the mid-1970s, using a programmed embodiment of the Sports Illustrated baseball
game available at that time, appeared in Operations Research (Freeze, 1974). Freeze’s analysis had two teams
playing each other and his metric was number of wins per 162-game baseball season. This analysis differs in that
the statistics are almost forty years more current but also in that the metric is runs scored per game; the possible
confounding performance of the opposing team is removed. Freeze discovered that the effect of the most successful
lineup versus the least successful was less than three wins per season.
To address the question of what type of player a baseball team should trade for (a high-average singles
hitter or a lower-average slugger), the model uses three different power profiles. The first profile has hitters that are
primarily singles hitters, having a percentage distribution of types of hits (single/double/triple/home run) of
75/15/5/5. Note that the percentages for both doubles and home runs were specified as shown because they were the
percentages of doubles and homers for the team in the majors with the fewest of these types of hits. The second
profile has a batting order with moderately powerful hitters, with a percentage profile of 66/20/2/12. These
percentages were for the team with the average number of doubles and home runs in 2012. Finally, a more powerful
profile of power hitters was used with
percentages of hits distributed 56/24/3/17. The team with the highest percentage of doubles and home runs in the
majors in 2012 had these percentages. In short, the percentages chosen as levels for this factor are based on major
league baseball 2012 statistics.
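One quick way to compare the three profiles is to compute the average number of bases gained per hit under each. This summary measure is not used in the study itself; it is offered here only as an illustration, and the dictionary keys and function name are assumptions.

```python
# Percentages of singles/doubles/triples/home runs, given a hit.
PROFILES = {
    "singles": (75, 15, 5, 5),
    "moderate": (66, 20, 2, 12),
    "power": (56, 24, 3, 17),
}

def bases_per_hit(profile):
    """Average bases gained by the batter per hit for a percentage profile."""
    # Weight 1, 2, 3, and 4 bases by the corresponding percentages.
    return sum(pct * bases for pct, bases in zip(profile, (1, 2, 3, 4))) / 100

if __name__ == "__main__":
    for name, profile in PROFILES.items():
        print(name, bases_per_hit(profile))
```

Under this measure the profiles yield 1.40, 1.60, and 1.81 bases per hit, respectively, so a hit by the power profile is worth roughly 29 percent more bases than a hit by the singles profile.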
Finally, the effect of team batting average (BA) on runs per game was examined. It is obvious that players
with higher batting averages are better run producers than those with lower averages, but how significant is batting
average when compared with other factors? Is a 25-point increase in BA, when a team is hitting 0.225, more
valuable than a similar increase to a team already hitting for a higher percentage, such as 0.250?
The major league baseball team with the best organizational batting average in 2012 batted 0.276. On the
other end of the spectrum, the team with the worst average batted 0.233 and the league-wide team batting average
was 0.253. Consequently, the three levels with respect to this factor are 1) 0.225, 2) 0.250, and 3) 0.275.
RESULTS
If a General Manager is trying to build a team with limited resources, or an on-the-field manager is trying
to maximize run production with an existing set of players, simulation results indicate that he or she should consider
a hitter’s productivity with runners in scoring position as a primary decision input. In particular, for a team with an
overall batting average of 0.250, an increase of 60 points in batting average with RISP will result in one additional
run per game. Conversely, a drop of 68 points in batting average with RISP (i.e., “choking”) will cost the team one
run per game (see Figure 2). Stated differently, if a General Manager can only afford 0.250 hitters and (s)he has a
choice between purchasing/trading for one set hitting 60 points higher (i.e., 0.310) with runners in scoring position
and a group hitting no differently (0.250) when runners are in scoring position, choosing the former set will result
in an extra run per game. The primary takeaway is that this strategy for improving run production is likely to be
considerably less costly than acquiring a set of hitters whose overall batting average is high enough to produce
the same extra run. See the discussion below on factor 4 - team batting
average.
Figure 2. Runs Per Game As A Function Of An Increase/Decrease In Batting Average With Runners In Scoring Position
[Figure: x-axis, change in batting average with RISP, from -75 to +75 points; y-axis, change in runs per game,
from -1.5 to +1.5; annotated points at +60 and -68.]
While it is certain that general managers and on-the-field managers have a general sense of how players
tend to perform in critical situations, it is much less likely that they have detailed statistics on a given player’s ability
to hit with runners in scoring position. To highlight a key point of these findings, consider a simple question. If you
were manager for a day and had the opportunity and resources to insert Reggie Jackson into your lineup, would you?
Like most managers, you would probably jump at the opportunity; after all, he is “Mr. October”. Based on the
findings in this research, this might not be a good move for your team. Baseball fans and journalists nicknamed
Reggie Jackson “Mr. October” because he performed exceptionally well in the World Series; however, a close
analysis of Reggie’s performance statistics clearly shows that he hit much worse with runners in scoring position in
the other months of the baseball season (Appling, 2006), meaning that he would likely not be your best use of
resources in an effort to achieve season-long success. From the model, it is clear that ability to hit with RISP is an
important statistic that should be collected and used. The expansion of analytics into baseball and the routine
collection and analysis of in-game data should make this an easy task.
Type of Lineup
As mentioned earlier, four different batting lineup strategies were considered in this analysis: 1) the
traditional lineup, 2) a hypothetical lineup whereby every player on the team has exactly the
same batting average (note that in all four cases, the team-average batting averages are exactly the same), 3) a
descending batting average order whereby players are ordered from the player with the highest batting average
hitting first, down to the player with the worst average (usually the pitcher) hitting ninth, and 4) a totally random
batting order whereby the order in which the players hit is drawn at random for each game.
Results for the four scenarios are shown in Figure 3 and are quite contrary to conventional baseball
wisdom. The traditional baseball lineup did worst, scoring the fewest runs, while the random order performed best
and the hypothetical all-hitters-with-the-same-average case scored second best. These results suggest that managers
making traditional lineups may actually be costing themselves a run every six or seven games (4.55 - 4.40 = 0.15 runs
per game) by not making daily lineups randomly.
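The arithmetic behind the "one run every six or seven games" statement can be written out explicitly. The season total in the last term is an added illustration that assumes the standard 162-game schedule; it is not a figure reported in the results.

```latex
\[
4.55 - 4.40 = 0.15 \ \text{runs per game}, \qquad
\frac{1}{0.15} \approx 6.7 \ \text{games per run}, \qquad
0.15 \times 162 \approx 24 \ \text{runs per season}.
\]
```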
[Figure 3: runs per game (y-axis, 4.30 to 4.60) for each lineup type (x-axis): Traditional, All have same BA,
Descending Order, Random.]
Power Profiles
Going back to the 1959 Chicago White Sox, people have debated the relative importance of the power
versus batting average profile of players and teams. Generally, power hitters are paid better than singles hitters.
In order to investigate this tradeoff, the three scenarios discussed earlier were examined. Results (see Table
2) are in the form of a tradeoff matrix, showing the effect of 25 points in batting average between adjacent columns
and the three power profiles. Note, for example, that a team with all singles hitters batting 0.250 scores almost as
many runs per game (4.05) as a team with all heavy power hitters (4.08), but batting only 0.225. Similarly, a team
with all singles hitters batting 0.275 scores slightly more runs per game (4.98) than a team with all heavy power
hitters (4.97) batting 0.250. It is quite possible that, examining these results from a general manager’s point of view,
the singles-hitting team would be much less expensive (and it is also possible that singles hitters would be
faster and better defensive players as well - factors not examined in this paper).
Looking at Figure 4, it is obvious, and not surprising, that an increase in team batting average leads to an
increase in run generation. However, two points should be noted: 1) the slope is not constant and 2) it can be more
expensive for a general manager to get an extra run per game by buying additional batting average than to get the
extra run through purchasing a hitter who hits well with runners in scoring position.
[Figure 4: runs per game (y-axis, 2.00 to 9.00) as a function of team batting average (x-axis, 0.150 to 0.400).]
With regard to the non-constant slope, note that starting with a team hitting 0.250 and increasing the team’s
overall average to 0.275 will result in an additional 0.8 runs per game, whereas starting with a team with a 0.275
average and increasing that team’s average by 25 points will lead to approximately 1.9 additional runs per nine
innings. Conversely, losing 25 points of team batting average through age, injury, or trading down for financial
reasons will impact a team differently depending on the team’s batting average before the decline.
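To make the non-constant slope concrete, the two 25-point increases quoted above can be converted to runs gained per point of batting average. This is a rough illustration that treats "runs per game" and "runs per nine innings" as equivalent.

```latex
\[
\text{From } 0.250 \text{ to } 0.275:\ \frac{0.8}{25} = 0.032 \ \text{runs per point}, \qquad
\text{from } 0.275 \text{ to } 0.300:\ \frac{1.9}{25} = 0.076 \ \text{runs per point}, \qquad
\frac{0.076}{0.032} \approx 2.4 .
\]
```

On these figures, a point of batting average is worth roughly 2.4 times as much to a 0.275 team as to a 0.250 team.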
If a general manager has an existing team with an overall batting average of ~0.230, then to get an
additional run per game through batting average, she or he will have to buy players whose batting averages are
35-40 points higher. This could be fairly costly. Alternatively, to get an extra run per game through purchasing
hitters who hit well with runners in scoring position, the GM will still be able to purchase players with the same
batting average but with a 60-point increase in batting average with runners positioned to score. Such a strategy
could be considerably cheaper, particularly if other GMs, the media, and fans are unaware of players’ performance
in this regard or are unsure of its importance.
CONCLUSIONS
This research examines several aspects of conventional baseball wisdom and shows that several appear to
be in question. It demonstrates that computer simulation analysis can provide insight, both with respect to the
particular team profile a team should attempt to achieve and to how and when those players should be played, at
least from an offensive standpoint. The study examines the effects of several components of a hitter's portfolio;
namely, the importance of his hitting with runners in scoring position, where he bats in the lineup and the
importance of his ability to hit for average versus his ability to hit with power. It appears that the answers to these
questions are not obvious, even to those in MLB, yet the answers to these questions are obtainable with computer
simulation analysis. Simulation is a valuable and inexpensive tool that should at least be investigated in depth with
detailed information for teams with interest in increasing run productivity. A Decision Support System (DSS) built
around extensive situational batting data and sound analytics modeling could provide a definite advantage in
recruiting and scheduling. The generation of extensive data for a team representing the additional detailed
information on its players that has been outlined here (such as BA with RISP) should provide further specific insight
into increased run production capabilities. Baseball teams would do well to develop an individual team DSS to
examine trading implications and lineup issues for the players and to exploit the competitiveness the team presently
possesses.
AUTHOR INFORMATION
Loren Paul Rees is Andersen Professor of Business Information Technology. He received the Ph.D. in Industrial
and Systems Engineering, B.E.E. from Georgia Tech, and M.S.E.E. from the Polytechnic Institute of Brooklyn. His
current research focuses on multi-hazard disaster analysis and managerial issues in information security. He has
published in Naval Research Logistics, IIE Transactions, Decision Sciences, Transportation Research, Journal of
American Medical Informatics Association, Journal of the Operational Research Society, Computers and
Operations Research, Communications of the ACM, Decision Support Systems, and others. Email: [email protected].
Terry R. Rakes is William and Alix Houchens Professor of Information Technology at Virginia Tech. He received
the Ph.D. in Management Science, M.B.A. and B.S.I.E. from Virginia Tech. His research interests are in analytics
and big data analysis, text and data mining, disaster planning and logistics, and information security. He has
published in Management Science, Decision Sciences, Decision Support Systems, Annals of Operations Research,
OMEGA, European Journal of OR, Operations Research Letters, Information and Management, Journal of
Information Science, and others. Email: [email protected].
Jason K. Deane is Associate Professor of Business Information Technology in the Pamplin College of Business at
Virginia Tech. He received a Ph.D. in Decision and Information Sciences from the University of Florida and an
M.B.A. and B.S. in Business Administration from Virginia Tech. His current research interests are in the areas of
artificial intelligence, computer-aided decision support systems, information system security, large scale
optimization and information retrieval. He has published in such journals as Decision Support Systems, Annals of
Operations Research, Information Technology and Management, International Journal of Physical Distribution and
Logistics Management, Operations Management Research, and others. Email: [email protected].
REFERENCES
Shortridge, Ashton, Goldsberry, Kirk, & Adams, Matthew. (2014). “Creating space to shoot: quantifying spatial
relative field goal efficiency in basketball.” Journal of Quantitative Analysis in Sports, 10(3), 303-313.
Wikipedia. (2014a). “List of highest paid Major League Baseball players.” Retrieved from
http://en.wikipedia.org/wiki/List_of_highest_paid_Major_League_Baseball_players.
Wikipedia. (2014b). “Major League Baseball.” Retrieved from
http://en.wikipedia.org/wiki/Major_League_Baseball.
Wikipedia. (2014c). “MIT Sloan Sports Analytics Conference.” Retrieved from
http://en.wikipedia.org/wiki/MIT_Sloan_Sports_Analytics_Conference.
Wikipedia. (2014d). “Sabermetrics.” Retrieved from http://en.wikipedia.org/wiki/Sabermetrics.