Racial Discrimination in The Soccer Transfer Market: Evidence From England and Russia


Racial discrimination in the soccer transfer market:

Evidence from England and Russia

University of Warwick
Department of Economics
April 2016
Supervisor: Professor Andrew Oswald
Abstract
This study uses player data from England and Russia's 2014/15 Premier Leagues to
examine whether there is any evidence of racial discrimination against non-Whites in
the soccer transfer market. Unlike other studies, this study first identifies the
determinants of team success and then uses them to inform the player productivity
measures in the transfer fees model, tackling the omitted variables issue that plagues
the literature. The results of the OLS, Tobit, and Instrumental Variables models (with
language family as an instrument for race) indicate that black players cost 4-10% more
on average compared to white players in Russia, but 22-26% less in England, ceteris
paribus. However, these differentials are insignificant. Using Propensity Score
Matching, black players were found to be 48% dearer in Russia and 22% cheaper in
England compared to matched white players. Again, none of the racial differentials are
statistically different from zero. Thus, the null hypothesis of no discrimination cannot be
rejected, in accordance with the most recent literature.
Acknowledgements: I am deeply grateful to Professor Andrew Oswald for his
invaluable support and guidance throughout this research. I am also indebted to Dr
Claire Crawford and Dr Michele Aquaro for their help with econometric techniques.
Word count: 4998

Contents
I    INTRODUCTION
II   RACIAL DISCRIMINATION IN SPORTS
III  METHODOLOGY
       Assumption 1
       Assumption 2
       Assumptions 3 & 4
IV   DATA
       Teams
       Players
V    RESULTS
       Teams
       Players
VI   EXTENSIONS & CONCLUSIONS
REFERENCES
APPENDIX
       Data sources & validity
         Teams-dataset
         Players-dataset
       Weather effect
       Diagnostic tests & intermediate results
APPENDIX REFERENCES

I

INTRODUCTION

The use of statistical analysis to achieve an advantage in sports was popularised in
Michael Lewis's book Moneyball. It tells the story of how the Oakland A's, a baseball
team, used an empirical approach to exploit many types of prejudices in evaluating
players. In a game such as soccer, where luck decides 50% of the match outcome, as
estimated by Anderson and Sally (2014), inefficiencies can be heavily punished. One
type of inefficiency that economists are particularly interested in is racial discrimination.
My focus here is on transfer market discrimination. According to Deloitte (2015), transfer
expenditure in the 2013/14 season across the 92 English league clubs exceeded £1
billion for the first time. Hence, the implications of discrimination can be significant.

This study uses player data from England and Russia's 2014/15 Premier Leagues to
examine whether there is any evidence of racial discrimination against non-Whites in
the soccer transfer market. Unlike other studies, this study first identifies the
determinants of team success and then uses them to inform the player productivity
measures in the transfer fees model, addressing Szymanski's (2000) criticism that the
researcher might forget to include relevant characteristics. This study is also
unprecedented in that it avoids sample selection bias by including all players, even
non-transferred players, in the analysis. This is facilitated by data on estimated transfer fees.
The study does not find sufficient evidence to suggest that non-Whites are discriminated
against, either in England or in Russia.

The rest of this paper is organised as follows. Section II reviews the literature on racial
discrimination in sports, with a focus on soccer. The methodology is outlined in Section
III and the datasets are described in Section IV. Section V presents and discusses the
results. Extensions and conclusions are given in Section VI.

II

RACIAL DISCRIMINATION IN SPORTS

Becker (1957) initiated the economic debate on labour market discrimination, which he
defined as unequal treatment of equally productive workers. It seems natural to test for
discrimination when there are only a few individuals involved in determining the
outcome rather than thousands. Kahn (2000) remarkably noted that there is no
research setting other than sports where we know the name, face, and life history of
every production worker and supervisor in the industry.

The economic literature on American sports is more extensive since sports data
collection began earlier in the US. Kahn's (1991) survey of discrimination in American
professional sports concluded that evidence of salary discrimination was found in
basketball (11-25% in favour of whites) but not in baseball or American football. He
argued that the lack of evidence in the latter two sports is due to the small size of the
samples used.

Turning to soccer, Carmichael and Thomas (1993) examined 214 permanent transfers
within England during the 1990/91 season and concluded that Nash bargaining theory
captured the important elements of the transfer market. However, Szymanski and Smith
(1997) argue that the Bosman ruling¹ made it a competitive market. Szymanski (2000)
cites the fact that player wages explain most of the variation in league position as
evidence (R² = 0.9).

Frick (2007) surveyed the literature on transfer fees and noted that only one study,
Feess et al. (2004), included contract length as a variable. This is a significant
determinant of transfer fees since clubs aim to cash in on a player before he can leave
for free at the end of his contract. In addition, Frick raised the issue of two sources of
sample bias: i) that transfer fees announced in the news may not be a random sample
of all transfers; and ii) that transferred players may not be a random sample of all
players. Most papers have also excluded foreign and free transfers, exacerbating the
selection bias. Carmichael, Forrest, and Simmons (1999) used the Heckman two-step
procedure to account for the second type of bias but there have been no attempts to
account for both types simultaneously. This study does so by replacing undisclosed
fees and fees for non-transferred players with estimated fees.

Using a simple Ordinary Least Squares (OLS) model, Reilly and Witt (1995) found no
evidence of racial discrimination in a dataset of 202 English league transfers from the
1991/92 season. However, they only used goals and league appearances as measures
of productivity. Medcalfe (2008) repeated this exercise for 33 transfers in the English
Premier League (EPL) during the 2001/02 season. Although black players were
estimated to cost 632,977 less than equally productive white players, the race dummy
was not statistically significant. The insignificance of most of the variables suggests this
is due to the small sample size.

¹ The 1995 EU Bosman ruling allowed players to switch clubs at the end of their contract without a
transfer fee being paid.

Szymanski (2000) was unconvinced by the possibility of fully capturing productivity in
the OLS equation. His hypothesis was that, for a given level of total wages, clubs with an
above-average proportion of black players will systematically outperform if
discrimination is present. He used data from 1978-1993 for 39 English clubs and
indeed found that teams with relatively more black players over-performed, given their
wages. This contradicts previous findings but the sample size and methodology indicate
a reliable result. Szymanski later extended the dataset to include six more seasons and
did not find evidence of discrimination in that additional period (Kuper and Szymanski,
2014: p.105).

Preston and Szymanski (2000), using the same dataset as Szymanski (2000), found
little evidence of fans being the source of racism. Wilson and Ying (2003) also failed to
find evidence of consumer (fans) or co-worker (teammates) discrimination based on
nationality. Their dataset comprised the five largest European leagues (England, Spain,
France, Italy, and Germany) and spanned the years 1997-2000. These results point
towards employer prejudice being the main source of discrimination in football.

Evidence for forms of discrimination other than wage gaps is more abundant. For
example, Chu et al. (2013) showed that, in the 2011-12 season of the EPL, non-White
players were subject to discrimination in terms of number of appearances, number of
fouls, and number of cards given by referees, but not wages. Goddard and Wilson
(2009) found that black players in the EPL were more likely to be in higher divisions and
to be retained longer, suggesting barriers to entry in terms of hiring. Finally, Gallo et al.
(2013) analysed 2006-2008 EPL data and estimated that non-White players are 15%
more likely to receive a booking than White players after controlling for team, player,
referee, and match attributes.

III

METHODOLOGY

The fundamental equation to test for evidence of a racial differential in transfer fees is² ³:

\[
\log(fee)_i = \alpha + \beta_1 Black_i + \beta_2 \, Black_i \times England_i + \beta_3 England_i + \sum_{k=1}^{j} \gamma_k x_{k,i}^{players} + \varepsilon_i \tag{1}
\]

A non-zero value of $\beta_1$ ($\beta_1 + \beta_2$) would be considered evidence of discrimination in
Russia (England). $x^{players} = (x_1^{players}, x_2^{players}, \ldots, x_j^{players})$ is a vector of j measures of
player productivity and other controls (for example, goals scored and contract length).

Equation 1 can be estimated using an OLS model. Of the assumptions required for the
OLS estimates of $\beta_1$ and $\beta_2$ to be unbiased, the ones least likely to hold are:
1. Black and the other covariates are not correlated with the error term, that is:
   $E[\varepsilon \mid Black, England, x^{players}] = E[\varepsilon] = 0$.
   Measurement error or the omission of relevant variables can violate this
   assumption and give rise to bias in the OLS estimates. It is difficult to predict the
   direction of the bias since I cannot pinpoint a particular relevant variable that is
   missing from the dataset.
2. $E[\log(fee) \mid Black, England, x^{players}]$ is linear in the covariates Black, England, $x^{players}$.
3. There is common support over the relevant characteristics of black and white
   players. Otherwise, we would be estimating $\beta_1$ and $\beta_2$ for the limited range
   where characteristics of both black and white players overlap.
4. There are similar distributions of observables for black and white players.
   Otherwise, unless the model is correctly specified, $\beta_1$ and $\beta_2$ will be biased.

² Black is more accurately interpreted as non-White, since I have coded players as either black or white.
This is because I am not interested in discrimination against a specific minority.
³ log(fee) is the natural logarithm of the transfer fee.
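To make the roles of the coefficients on Black and Black*England concrete, the following is a minimal sketch that fits equation (1) without the control vector. In that special case the OLS coefficients reduce to differences in the cell means of log(fee). The function name and the toy numbers are hypothetical and are not drawn from the players-dataset.

```python
from statistics import mean

def race_differentials(rows):
    """rows: iterable of (log_fee, black, england), with black/england in {0, 1}.

    Returns (b1, b2) of the saturated model
        log_fee = a + b1*black + b2*black*england + b3*england + e,
    whose OLS solution is given exactly by the four cell means.
    """
    cells = {}
    for log_fee, black, england in rows:
        cells.setdefault((black, england), []).append(log_fee)
    m = {key: mean(values) for key, values in cells.items()}
    b1 = m[(1, 0)] - m[(0, 0)]            # racial differential in Russia
    england_gap = m[(1, 1)] - m[(0, 1)]   # racial differential in England (b1 + b2)
    b2 = england_gap - b1                 # difference between the two differentials
    return b1, b2

# Hypothetical log fees: (log_fee, black, england)
toy = [(1.0, 0, 0), (1.2, 0, 0), (1.5, 1, 0), (1.3, 1, 0),
       (2.0, 0, 1), (2.2, 0, 1), (1.8, 1, 1), (1.6, 1, 1)]
b1, b2 = race_differentials(toy)
print(round(b1, 3), round(b1 + b2, 3))
```

With controls included, the same two quantities are read off the regression output rather than from raw cell means.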

Assumption 1
I try to identify how a player should be valued by first examining the determinants of
team success. Thus, as a first step, I estimate the following equation:

\[
goal\ difference_i = \delta_0 + \sum_{k=1}^{j} \delta_k x_{k,i}^{teams} + u_i \tag{2}
\]

$x^{teams} = (x_1^{teams}, x_2^{teams}, \ldots, x_j^{teams})$ is a vector of j team statistics and other controls (for
example, possession or year). The results from this model will inform what $x^{players}$ in
equation (1) will include. This should help avoid forgetting to include relevant variables,
which Szymanski (2000) was concerned about. Since the teams-dataset is a panel, I
estimate equation (2) using a Fixed Effects (FE) model. The fixed effects are the teams'
playing styles. For example, FC Barcelona plays a passing game, which gives it
above-average possession. This correlation between fixed effects (playing style) and
the covariates (for example, possession) has the potential to render OLS coefficients
inconsistent.
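The point about fixed effects can be illustrated with a small sketch of the within transformation for a single regressor. The data are hypothetical: two teams share a true slope of 2, but the team with the higher x has a much lower team effect, so pooled OLS gets even the sign wrong while the within estimator recovers the slope. Function names are illustrative only.

```python
from statistics import mean

def pooled_slope(x, y):
    """OLS slope that ignores team membership."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = sum((a - x_bar) ** 2 for a in x)
    return num / den

def within_slope(teams, x, y):
    """Fixed-effects slope: demean x and y by team, then run OLS."""
    groups = {}
    for t, xi, yi in zip(teams, x, y):
        groups.setdefault(t, []).append((xi, yi))
    num = den = 0.0
    for obs in groups.values():
        x_bar = mean(o[0] for o in obs)
        y_bar = mean(o[1] for o in obs)
        for xi, yi in obs:
            num += (xi - x_bar) * (yi - y_bar)
            den += (xi - x_bar) ** 2
    return num / den

teams = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 11, 12, 13]
y = [2 * xi for xi in x[:3]] + [2 * xi - 30 for xi in x[3:]]  # team B effect = -30

print(pooled_slope(x, y), within_slope(teams, x, y))
```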

The variable Black was constructed using my judgement of online images of the
players, which can be a source of measurement error. To account for this, as well as
any omitted variables bias, I run an Instrumental Variables (IV) model using the player's
language family⁴ as an instrument for his race. Intuitively, this instrument is relevant
because language and race are deeply related. In addition, the language family of a
player is based on the official language in his country of citizenship, so it is not subject
to the researcher's discretion. Finally, it is difficult to see how the instrument can affect
the value of a player directly other than through his race.
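The logic of the IV approach can be sketched with the textbook single-instrument estimator, Cov(z, y) / Cov(z, x). The example below is a generic illustration on hypothetical data, not the IV-GMM specification estimated in this study: x is contaminated by an unobserved factor u that also drives y, while the instrument z is uncorrelated with u by construction.

```python
from statistics import mean

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)

def ols_slope(x, y):
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Single-instrument IV estimator: Cov(z, y) / Cov(z, x)."""
    return cov(z, y) / cov(z, x)

z = [0, 0, 1, 1, 0, 0, 1, 1]              # instrument
u = [-1, 1, -1, 1, -1, 1, -1, 1]          # unobserved factor, uncorrelated with z
x = [zi + ui for zi, ui in zip(z, u)]     # endogenous regressor
y = [1.5 * xi + 2 * ui for xi, ui in zip(x, u)]  # true effect of x is 1.5

print(ols_slope(x, y), iv_slope(z, x, y))
```

OLS overstates the effect because x and the omitted factor move together; the IV estimate uses only the variation in x that comes from z.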

Assumption 2
The second assumption does not strictly hold as transfer fees have a positive probability
of being equal to zero in the case of free transfers. Hence, the model is more accurately
expressed as a latent variable model:

\[
\log(fee)_i = \alpha + \beta_1 Black_i + \beta_2 \, Black_i \times England_i + \beta_3 England_i + \sum_{k=1}^{j} \gamma_k x_{k,i}^{players} + \varepsilon_i \tag{1}
\]

\[
\log(fee)_i^{*} = \max\{\log(fee)_i,\ \log(0.000001)\} \tag{3}
\]

log(fee) is the latent variable and log(fee)* is its observed, censored counterpart. This
will be estimated using a Tobit model with a left limit at log(0.000001)⁵.

⁴ Language families include Afro-Asiatic, Indo-European, Niger-Congo and Sino-Tibetan.
⁵ Free transfers were entered as 0.000001 to avoid taking the natural logarithm of zero.
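The censoring rule in equation (3) and the likelihood that a Tobit model maximises can be sketched as follows. This is a minimal illustration under the standard assumption of normally distributed latent errors; the function names are hypothetical, and the fitted index `xb` and `sigma` would in practice come from an optimiser rather than being supplied by hand.

```python
import math

LOWER = math.log(0.000001)  # left limit: log of the value entered for free transfers

def observe(latent_log_fee):
    """Equation (3): the observed value is the latent one, censored from below."""
    return max(latent_log_fee, LOWER)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(y, xb, sigma):
    """Log-likelihood of a left-censored normal regression.

    y: observed log fees; xb: linear index for each player;
    sigma: standard deviation of the latent error.
    """
    total = 0.0
    for yi, mu in zip(y, xb):
        if yi <= LOWER:
            # Censored: we only know the latent value is at or below the limit.
            total += math.log(norm_cdf((LOWER - mu) / sigma))
        else:
            # Uncensored: ordinary normal density of the residual.
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total

print(round(LOWER, 2), observe(-20.0) == LOWER, observe(1.0))
```

The value of `LOWER` is about -13.82, which matches the minimum of log(fee) reported in the summary statistics.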

Assumptions 3 & 4
Finally, I use Propensity Score Matching (PSM) to eliminate the bias which would arise
if assumptions 3 and 4 did not hold. This technique is designed to make the treatment
group (here, black players) as comparable as possible to the control group (white
players) in order to estimate an unbiased treatment effect (racial differential). Propensity
scores are calculated using a Probit regression of Black on the same covariates as in
equation (1).
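The matching step can be sketched as below, taking the propensity scores as given. The text does not state which matching algorithm is used, so one-to-one nearest-neighbour matching with replacement is an assumption here, and the scores and outcomes are hypothetical.

```python
def att_nearest_neighbour(treated, controls):
    """Average effect on the treated via 1-nearest-neighbour matching.

    treated, controls: lists of (propensity_score, outcome) pairs.
    Each treated unit is matched, with replacement, to the control
    whose propensity score is closest.
    """
    gaps = []
    for score_t, outcome_t in treated:
        _, outcome_c = min(controls, key=lambda c: abs(c[0] - score_t))
        gaps.append(outcome_t - outcome_c)
    return sum(gaps) / len(gaps)

# Hypothetical (propensity score, log fee) pairs
black = [(0.30, 1.0), (0.60, 2.0)]
white = [(0.28, 0.8), (0.62, 2.3), (0.90, 4.0)]
att = att_nearest_neighbour(black, white)
print(round(att, 3))
```

The control with a score of 0.90 is never used, which is the sense in which matching restricts the comparison to the region of common support.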

IV

DATA

Teams
This study uses two datasets: a players-dataset and a teams-dataset. The latter has
data on the first-tier clubs of England, France, Spain, Italy (20 clubs each), and
Germany (18 clubs) over the seasons 2012/13-2014/15, as well as Russian Premier
League clubs (16 clubs) for the 2013/14 and 2014/15 seasons. Hence, this is an
unbalanced panel dataset with 326 observations. The data was collected from
www.squawka.com, which licenses its data from Opta. The teams-dataset is used to
estimate equation (2).
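The observation count follows directly from the league sizes and seasons listed above, as this short check shows.

```python
# (number of leagues, clubs per league, seasons covered)
blocks = {
    "England, France, Spain, Italy": (4, 20, 3),
    "Germany": (1, 18, 3),
    "Russia": (1, 16, 2),
}
n_obs = sum(leagues * clubs * seasons
            for leagues, clubs, seasons in blocks.values())
print(n_obs)  # 240 + 54 + 32
```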

Table 1 displays summary statistics for the teams-dataset. The dependent variable
measuring team success is Goal difference (Gdiff). This is simply goals scored minus
goals conceded during the season. As expected, the Attack set of variables is
positively correlated with Gdiff. Shorter, more accurate passing and maintaining
possession are also associated with a higher goal difference. Interestingly, only one
defensive statistic, percentage of aerial duels won, is positively correlated with Gdiff. It
is likely that the more the team is called into action in defence, indicated by more blocks
and interceptions, the less likely it is to have control of the match. Perhaps for the same
reason, weaker teams tend to receive more bookings.


TABLE 1
Summary statistics for teams-dataset

Variable                      Mean     Std. Dev.   Min     Max      Correlation with
                                                                    Goal difference
Dependent variable
  Goal difference                      25          -57     89
Attack
  Shots (inside the area)     259      58          147     447       0.75
  Shots (outside the area)    209      44          115     369       0.25
  Shot accuracy (%)           0.44     0.04        0.35    0.56      0.64
Passing & Possession
  Successful passes           11621    2936        5785    25003     0.71
  Key passes                  309      61          170     505       0.57
  Pass accuracy (%)           0.78     0.04        0.68    0.90      0.67
  Pass length (m)             20       1           16      22       -0.55
  Possession (%)              0.49     0.04        0.38    0.62      0.69
Defence
  Interceptions               593      93          333     895      -0.20
  Shots blocked               107      28          45      199      -0.33
  Clearances                  1083     243         467     1677     -0.32
  Aerial duels won (%)        0.49     0.04        0.39    0.62      0.45
Discipline
  Yellow cards                76       21          36      151      -0.25
  Red cards                   4        3           0       14       -0.25

n = 326 for all variables.


Players
The players-dataset contains the statistics of 786 outfield players⁶ who featured at least
once in the English and Russian Premier Leagues during the 2014/15 season. Actual
transfer fee figures were acquired from www.transfermarkt.co.uk. The real-life
performance measures of players such as goals scored and shots per game were
obtained from www.whoscored.com, another licensee of Opta. The rest of the statistics,
including estimated transfer fees, were collected from www.sofifa.com. The variable fee
equals the actual transfer fee if observed or the estimated fee otherwise. Despite the
comprehensiveness of this dataset, a panel dataset would have been preferable,
as it would facilitate controlling for player heterogeneity.

Descriptive statistics of the key variables in the players-dataset are presented in Table
2. The average transfer fee is 3.51m, with a relatively large standard deviation of
5.90m. Midfielders are the most common, followed by defenders and then forwards.
Table 3 compares white and black players by country. Black players in Russia are on
average 2 years younger, score more than twice as many goals, shoot more per game,
and have better positioning ability than white players. This might reflect higher barriers
to entry for black players in Russia. The differences are less pronounced in England;
nevertheless, white players play 244 more minutes over the season and are about
half as likely to be forwards as black players. The aforementioned differences
are significant at the 5% level.
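The difference-in-means tests behind these comparisons assume unequal variances (see the note to Table 3). A minimal sketch of that Welch statistic is given below; the function name and the two small samples are hypothetical, and computing a p-value would additionally require the Student t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic and Welch-Satterthwaite degrees of freedom
    for the difference mean(a) - mean(b), allowing unequal variances."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 3), round(dof, 2))
```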

⁶ Goalkeepers are evaluated differently and are thus excluded.


TABLE 2
Summary statistics for players-dataset

Variable                                   Mean      Std. Dev.   Min         Max
Transfer fee (m)                           3.51      5.90        0.000001    47.25
log(fee)                                   -0.36     3.45        -13.82      3.86
Black                                      0.20      0.40
Age                                        26                    16          40
Expiry year of contract                    2016.9    1.48        2015        2021
Minutes played                             1378      915                     3420
Defender                                   0.36      0.48
Midfielder                                 0.44      0.50
Forward                                    0.20      0.40
Goals                                      1.88      3.08                    26
Assists                                    1.32      2.05                    18
Shots (per game)                           0.86      0.72                    4.5
Pass accuracy (%)                          77        12                      100
Aerial duels won (per game)                1.22      1.08                    8.6
Positioning ability (out of 100)           59        16          20          90
International reputation (scale of 1-4)    1.54      0.72
Quartile rank of player's team             2.52      1.11

Log(fee) is the natural logarithm of transfer fees. An international reputation of 4 indicates the most
fame. n = 786 for all variables.


TABLE 3
Comparing black and white players

                                           Russia                                England
Variable                                   Black    White    t-statistic         Black    White    t-statistic
Transfer fee (m)                           2.96     1.56      3.02*** (0.003)    4.57     5.03     -0.62    (0.537)
Age                                        24       26       -1.98**  (0.048)    25       26       -1.02    (0.308)
Expiry year of contract                    2016.7   2016.7   -0.05    (0.963)    2016.9   2017.0   -0.58    (0.561)
Minutes played                             1279     1194      0.59    (0.554)    1341     1585     -2.43**  (0.016)
Defender                                   0.24     0.36     -1.41    (0.160)    0.31     0.40     -1.75*   (0.081)
Midfielder                                 0.47     0.44      0.36    (0.717)    0.40     0.45     -1.1     (0.272)
Forward                                    0.29     0.21      1.19    (0.236)    0.29     0.15      3.58*** (0.000)
Goals                                      3.18     1.41      4.03*** (0.000)    2.07     2.12     -0.14    (0.893)
Assists                                    1.38     1.09      0.92    (0.356)    1.19     1.59     -1.72*   (0.086)
Shots (per game)                           1.14     0.80      2.88*** (0.004)    0.91     0.86      0.70    (0.485)
Pass accuracy (%)                          78       75        1.59    (0.113)    79       78        0.76    (0.446)
Aerial duels won (per game)                1.01     1.07     -0.35    (0.724)    1.21     1.39     -1.44    (0.15)
Positioning ability (out of 100)           60       54        2.00**  (0.046)    63       61        1.38    (0.169)
International reputation (scale of 1-4)    1.32     1.22      1.21    (0.229)    1.82     1.76      0.73    (0.468)
Quartile rank of player's team             2.26     2.57     -1.51    (0.133)    2.55     2.49      0.51    (0.610)

Values reported are means. t-statistics are results of difference-in-means tests assuming unequal variances; their p-values are reported in parentheses.
*0.05<p<0.10, **0.01<p<0.05, ***p<0.01


FIGURE 1
Histogram of transfer fees, all players
[Figure: x-axis Transfer fee (m); y-axis Percent]

FIGURE 2
Histogram of transfer fees, by league
[Figure: panels England and Russia; x-axis Transfer fee (m); y-axis Percent]

Figure 1 above shows that transfer fees are heavily positively skewed, hence the
log-linear specification in equation (1). Transfer fees are more positively skewed in Russia
than in England and there are fewer outliers in Russia, as Figure 2 demonstrates.


FIGURE 3
Density plot of log(fee) by race, England
[Figure: kernel density of log(fee) for Black and White players; kernel = epanechnikov, bandwidth = 0.4829]

FIGURE 4
Density plot of log(fee) by race, Russia
[Figure: kernel density of log(fee) for Black and White players; kernel = epanechnikov, bandwidth = 0.6144]

Figures 3 and 4 compare the distribution of log(fee) between black and white players in
England and Russia respectively. Immediately one can see that there is little
difference in England. Perhaps surprisingly, black players cost almost twice as much
as white players in Russia (2.96m compared to 1.56m). This could hint at a
compensating differential that indemnifies black players for racial abuse from the fans.
The hypothesis that the difference in means is zero is rejected for Russia but not for
England.

FIGURE 5
Average value by position
[Figure: mean transfer fee (m) for Defenders, Midfielders and Forwards]

FIGURE 6
Average value by contract expiration year
[Figure: mean transfer fee (m) by contract expiry year, 2015 to 2021]

FIGURE 7
Average value by international reputation
[Figure: mean transfer fee (m) by international reputation]
International reputation is ranked on a scale of 1-4 where 4 indicates a very well-known player

FIGURE 8
Average value by team's quartile league rank
[Figure: mean transfer fee (m) by quartile rank of the player's team]
1 indicates player belongs to one of the teams that finished in the top quarter of the league

Figures 5-8 show how the average valuation of a player changes according to his
position, the expiry year of his contract, his international reputation, and the quality of
his team respectively. Forwards cost the most, followed by midfielders and then
defenders. The average fee increases with each additional year left on the contract up
until 2019 and then drops substantially. Players with the maximum international
reputation of 4, including the likes of Eden Hazard and Wayne Rooney, have an
average fee of 27.37m, whereas the least well-known players cost a mere 1.26m on
average. The difference in player values between teams in the top quartile of the league
and the rest is also substantial.

FIGURE 9
Player value by age
[Figure: log(fee) against Age, with fitted values]

FIGURE 10
Player value by number of minutes played
[Figure: log(fee) against Number of minutes played in 2014/15, with fitted values]

FIGURE 11
Player value by goals scored
[Figure: log(fee) against Goals scored in 2014/15, with fitted values]

FIGURE 12
Player value by shots per game
[Figure: log(fee) against Shots per game, with fitted values]

FIGURE 13
Player value by aerial duels won per game
[Figure: log(fee) against Aerial duels won per game, with fitted values]

FIGURE 14
Player value by pass accuracy
[Figure: log(fee) against Pass completion %]

FIGURE 15
Player value by positioning ability rating
[Figure: log(fee) against Positioning, out of 100, with fitted values]

The relationships between log(fee) and several continuous variables are shown in
Figures 9-15. A player's value seems to rise with age until it hits a peak around the age
of 25, at which point it starts decreasing. Minutes played, goals scored, shots per game,
aerial duels won per game, and positioning ability all show a positive correlation with a
player's valuation. These correlations are as expected. However, passing accuracy
does not show an apparent relationship with log(fee).


V

RESULTS⁷

Teams
Several FE models were used to estimate equation (2), starting with a simple model and
adding new variables each time. The model in Table 4 is the preferred FE model since it
contains all the regressors that were statistically significant across the different
specifications. It did not fail the normality, serial correlation, or Ramsey RESET test for
functional misspecification. The determinants of Gdiff that were not sensitive to the
inclusion of additional variables are shown in bold in Table 4.

Table 5 shows that the magnitude of the effects is considerable for most variables. The
signs on the coefficients of Pass accuracy and Key passes might seem surprising. A
possible explanation is that teams with higher pass accuracy avoid better, riskier
passes. Therefore, they make successful passes but they are not able to score. Also, a
key pass is defined as a pass that leads to a shot that is not a goal. Hence, it is a
measure of wasted chances.
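The magnitudes in Table 5 appear to be consistent with multiplying each coefficient from Table 4 by the variable's standard deviation from Table 1 and then dividing by the standard deviation of goal difference (25). That reading is an assumption, as is ignoring any squared term; the sketch below applies it to two variables using the rounded published figures, so small discrepancies from Table 5 are expected.

```python
def effect_of_one_sd(coef, sd_x, sd_gdiff=25):
    """Change in goal difference for a 1 s.d. rise in x.

    Returns (effect in goals, effect in goal-difference s.d. units).
    Linear term only: any squared term is ignored (an assumption).
    """
    goals = coef * sd_x
    return goals, goals / sd_gdiff

key_passes = effect_of_one_sd(-0.289, 61)    # Key passes
shots_inside = effect_of_one_sd(0.369, 58)   # Shots (inside the area)
print([round(v, 2) for v in key_passes], [round(v, 2) for v in shots_inside])
```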

⁷ Unless otherwise mentioned, the significance of coefficients is evaluated at the 5% level. Intermediate
results and diagnostic tests for all models are relegated to the appendix.


TABLE 4
Fixed effects model for determinants of team success
Dependent variable: Goal difference

Variable                         Coefficient estimate
Intercept and control
  Intercept                      194.4          (2337.4)
  Year                           0.366          (1.110)
Attack
  Shots (inside the area)        0.369***       (0.154)
  Shots (inside the area)^2      -6.67×10^-5    (2.56×10^-4)
  Shots (outside the area)       0.180          (0.154)
  Shots (outside the area)^2     -5.3×10^-5     (3.04×10^-4)
  Shot accuracy (%)              51.15**        (25.36)
Passing & Possession
  Successful passes              0.005          (0.004)
  Successful passes^2            -1.37×10^-7    (1.30×10^-7)
  Key passes                     -0.289***      (0.062)
  Pass accuracy (%)              -3322.4***     (770.4)
  Pass accuracy^2                2153.1***      (504.1)
  Possession (%)                 972.6**        (412.6)
  Possession^2                   -933.6**       (429.4)
Defence
  Interceptions                  -0.0034        (0.0076)
  Shots blocked                  0.017          (0.046)
  Clearances                     0.0044         (0.0058)
  % of aerial duels won          65.95**        (30.97)
Discipline
  Yellow cards                   -0.847***      (0.289)
  Yellow cards^2                 0.0043***      (0.0016)
  Red cards                      0.143          (0.756)
  Red cards^2                    -0.058         (0.065)

Overall R²                                   0.709
n                                            326
[symbol lost]                                8.16
Fraction of variance due to fixed effects    0.674

Variables in bold were judged to be stable across all the FE specifications that were estimated. Heteroscedastic-robust standard errors are reported in parentheses.
*0.05<p<0.10, **0.01<p<0.05, ***p<0.01


TABLE 5
Magnitude of effects

Variable                    Change in goal difference    Effect size in terms of    Equivalent variable in
                            following a 1 s.d.           goal difference s.d.       players-dataset
                            increase in variable
Shots (inside the area)     21                           0.84                       Shots per game
Shot accuracy (%)                                        0.08                       Shooting ability¹
Key passes                  -18                          -0.7                       Vision¹, Assists
Pass accuracy (%)           -141                         -5.6                       Pass accuracy
Possession (%)              35                           1.38                       Quartile rank of player's team²
% of aerial duels won                                    0.1                        Aerials won per game
Yellow cards                -16                          -0.64                      Yellow card¹

¹ Variable was eventually discarded from transfer fee model due to its insignificance.
² The team's league position is a good proxy for its players' ability to maintain possession: R² from OLS regression of possession on rank is 0.392.

Players
Assuming the market values players based on the determinants of team success
identified above, a model of transfer fees can be constructed. The last column of Table
5 matches team-level to player-level variables. Several OLS models of log(fee)
regressed on player characteristics were estimated. I am interested in the coefficients
on Black and Black*England, which measure the racial differential in Russia and the
difference in the differential between England and Russia, respectively.

The preferred OLS model, reported in Table 6, contains all the covariates that were
insensitive to the addition of other variables. The results indicate that in Russia, black
players command a transfer fee that is 9.56%⁸ higher on average than white players,
ceteris paribus. The corresponding figure in England is 25.56%⁹ in favour of white
players, but both estimates are insignificantly different from zero.

⁸ This is calculated using the exponential transformation $100 \times [\exp(\hat{\beta}_1) - 1]$.
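The exponential transformation can be reproduced from the OLS column of Table 6 as sketched below. Because the published coefficients are rounded to three decimals, the results (roughly 9.5% and -25.6%) differ slightly from the 9.56% and 25.56% quoted in the text. The function name is illustrative.

```python
import math

def pct_differential(*coefs):
    """Percentage effect in a log-linear model: 100 * (exp(sum of coefs) - 1)."""
    return 100.0 * (math.exp(sum(coefs)) - 1.0)

russia = pct_differential(0.091)            # coefficient on Black
england = pct_differential(0.091, -0.387)   # Black plus Black*England
print(round(russia, 2), round(england, 2))
```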

Other significant effects include, ceteris paribus: a player's value rises until the age of
25 and then starts decreasing; each additional year left on the contract increases the
player's value by approximately 66%; an increase of 100 minutes played over the
season is associated with a 6.4% increase in the player's valuation; winning one more
aerial duel per game adds 30% to the player's transfer fee; and being in a team that is
one quartile lower in the league reduces that player's value by 38%. These effects
are relatively large, considering the standard deviations of the independent variables
(summarised in Table 2). However, since it did not control for unobserved player
heterogeneity, this study does not claim these effects are causal. Interestingly, of the
seven significant determinants of team success, only two of their player-level
equivalents are significant in this model. This raises questions about the efficiency of
the transfer market. Alternatively, the player-level variables may not be capturing the
same attributes as their team-level variables.
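These magnitudes can be traced back to the OLS coefficients in Table 6. The sketch below computes the turning point of the age profile and the percentage effects implied by the exponential transformation; variable names are illustrative. Note that the coefficient of -0.484 on quartile rank corresponds to about -38% for a one-unit increase in the rank number, and that the exact transformation gives about 6.6% for 100 extra minutes, against 6.4% from the linear approximation.

```python
import math

def pct(coef, change=1.0):
    """Percentage change in the fee for a given change in the regressor."""
    return 100.0 * (math.exp(coef * change) - 1.0)

age, age_sq = 2.046, -0.0413
peak_age = -age / (2 * age_sq)        # turning point of the quadratic in age

contract_year = pct(0.507)            # one more year until the contract ends
aerial_duel = pct(0.266)              # one more aerial duel won per game
minutes_100 = pct(6.41e-4, 100)       # 100 more minutes played
quartile_step = pct(-0.484)           # quartile rank number one unit higher

print(round(peak_age, 1), round(contract_year), round(aerial_duel),
      round(minutes_100, 1), round(quartile_step))
```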

⁹ For England, the exponential transformation is applied to $\hat{\beta}_1 + \hat{\beta}_2$.


TABLE 6
Modelling transfer fees
Dependent variable: log(fee)

                                 OLS                           Tobit                         IV-GMM
Variable                         Coefficient estimate          Marginal effects              Coefficient estimate
Intercept                        -1049.1***    (169.2)         -1096.0***    (180.4)         -996.9***     (164.3)
Black                            0.091         (0.217)         0.067         (0.224)         0.037         (0.343)
England                          -0.515*       (0.286)         -0.581*       (0.304)         -0.859        (0.946)
Black * England                  -0.387        (0.432)         -0.316        (0.410)         -0.321        (0.341)
Age                              2.046***      (0.345)         2.140***      (0.380)         2.115***      (0.329)
Age^2                            -0.0413***    (0.0068)        -0.043***     (0.008)         -0.043***     (0.006)
Year contract ends               0.507***      (0.084)         0.529**       (0.089)         0.480***      (0.081)
Minutes played                   6.41×10^-4*** (1.76×10^-4)    6.64×10^-4*** (1.84×10^-4)    6.08×10^-4*** (1.73×10^-4)
Midfielder                       0.081         (0.310)         0.099         (0.323)         0.075         (0.307)
Forward                          0.260         (0.436)         0.292         (0.455)         0.307         (0.414)
Goals                            -0.067        (0.046)         -0.069        (0.048)         -0.064        (0.043)
Assists                          -0.0013       (0.0628)        -0.0033       (0.0650)        -0.012        (0.062)
Shots per game                   0.299         (0.241)         0.300         (0.250)         0.308         (0.216)
Pass accuracy (%)                0.010         (0.012)         0.011         (0.012)         0.013         (0.011)
Aerials won per game             0.266**       (0.112)         0.273**       (0.117)         0.239**       (0.108)
Positioning (out of 100)         0.015         (0.010)         0.015         (0.010)         0.016         (0.010)
International Reputation         0.430*        (0.224)         0.432*        (0.233)         0.473**       (0.212)
Quartile rank of player's team   -0.484***     (0.109)         -0.494***     (0.115)         -0.441***     (0.103)

                                 R² = 0.304                    Pseudo-R² = 0.066             R² = 0.301
                                 [symbol lost] = 29.21         [symbol lost] = 26.03         [symbol lost] = 515.35
                                                               Log pseudo-likelihood = -1955.0
                                                               # censored = 40
                                                               Sigma = 3.02 (0.200)

Variables in bold are significant at 5% across all the specifications. Endogenous variables in the IV-GMM model are Black and Black*England. Tobit marginal effects
are the average partial effects on E[log(fee)*] in equation (3), except intercept. n = 786. Heteroscedastic-robust standard errors are reported in parentheses.
*0.05<p<0.10, **0.01<p<0.05, ***p<0.01


FIGURE 16
Fitted values against residuals of OLS log(fee) model
(x-axis: fitted values; y-axis: residuals; series: non-free transfers, free transfers)

Using the Jarque-Bera normality test, the null hypothesis of normally distributed errors is
rejected for this model, except when free transfers are excluded (Table A5). Figure 16 shows that
the non-normality of errors is caused by free transfers. Hence, I estimate a Tobit model
with a lower limit at log(0.000001).
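
The diagnostic described above can be illustrated with a minimal Python sketch of the Jarque-Bera statistic, JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and K the sample kurtosis of the residuals. The residuals below are synthetic and purely hypothetical (they are not the study's data): 746 standard normal draws stand in for non-free transfers, and 40 large negative values stand in for free transfers recorded near log(0.000001). The function name jarque_bera and the value -12.0 are assumptions made for this example only.

```python
import numpy as np

def jarque_bera(residuals):
    """Return the Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4)."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    centred = e - e.mean()
    m2 = np.mean(centred**2)               # second central moment
    skew = np.mean(centred**3) / m2**1.5   # sample skewness S
    kurt = np.mean(centred**4) / m2**2     # sample kurtosis K
    return n / 6.0 * (skew**2 + (kurt - 3.0)**2 / 4.0)

# Hypothetical residuals: a normal bulk plus a cluster of extreme negatives.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=746)
free = np.full(40, -12.0)                  # assumed stand-in for free transfers
contaminated = np.concatenate([clean, free])

jb_clean = jarque_bera(clean)
jb_contaminated = jarque_bera(contaminated)
print(f"JB without the extreme cluster: {jb_clean:.2f}")
print(f"JB with the extreme cluster:    {jb_contaminated:.2f}")
```

Under the null of normality the statistic is asymptotically chi-squared with 2 degrees of freedom, so a small cluster of extreme values is enough to push it far beyond conventional critical values, which is the pattern reported for the OLS model.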

The Tobit marginal effects in Table 6 indicate that in Russia, black players command a
transfer fee that is 6.98% higher on average than white players, ceteris paribus. The
corresponding figure in England is -27.11%, but neither differential is significantly
different from zero. The results do not differ substantially from the OLS results since
only 40 out of 786 observations are censored. Carmichael, Forrest and Simmons (1999)
note that there are substantive differences between their OLS and Tobit estimates.
However, 42 out of their 240 observations are censored, a much higher proportion than
in this study.
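
As a worked check of how a coefficient in a log(fee) model maps to a percentage differential, the short Python sketch below applies the transformation 100 * (exp(b) - 1). It uses the Tobit coefficients on Black and Black*England as reported in Table A9; for England the two coefficients are summed before exponentiating. The helper name pct_differential is an assumption introduced for this example.

```python
import math

def pct_differential(*coefficients):
    """Convert summed log-point coefficients into a percentage differential."""
    return (math.exp(sum(coefficients)) - 1.0) * 100.0

# Tobit coefficients as reported in Table A9.
black = 0.0675127
black_x_england = -0.3836768

russia = pct_differential(black)                     # Black only
england = pct_differential(black, black_x_england)   # Black + Black*England
print(f"Russia:  {russia:+.2f}%")
print(f"England: {england:+.2f}%")
```

With these inputs the sketch prints +6.98% for Russia and -27.11% for England, the figures quoted in the text.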

Turning to the issue of measurement error and omitted variables bias, an IV-GMM
model is estimated using language family as an instrument for race. Ceteris paribus,
black players were estimated to cost 3.74% more on average than white players in
Russia; in England, they cost 24.75% less. Again, both coefficients are imprecisely
estimated. The instrument passed the formal relevance and exogeneity tests (Table A6), but the
null hypothesis that Black and Black*England are exogenous was not rejected, perhaps
indicating that an IV approach was not needed. This is consistent with the IV-GMM and
OLS results being fairly similar.


TABLE 7
Propensity score matching
Outcome variable: log(fee). Treatment variable: Black.

Sample                    ATT              Differential as %   On support          Pseudo-R2
                                                               Treated(Control)
England
Unmatched                 -0.43 (0.44)     -35%                134                 0.05
No replacement            0.03 (0.58)      3%                  134(134)            0.01
With replacement          -0.47 (0.66)     -37%                134(102)            0.03
Caliper (0.005)           -0.33 (0.67)     -28%                123(98)             0.03
Kernel (bandwidth 0.06)   -0.24 (0.50)     -22%                132(312)            0.00

Russia
Unmatched                 0.92*** (0.31)   152%                34                  0.14
No replacement            0.53 (0.36)      70%                 34(34)              0.06
With replacement          0.49 (0.41)      64%                 34(27)              0.12
Caliper (0.005)           0.66* (0.40)     93%                 30(26)              0.12
Kernel (bandwidth 0.06)   0.39 (0.28)      48%                 33(304)             0.01

ATT is the average treatment effect on the treated. Pseudo-R2 is from a post-matching Probit regression of Black on the same
independent variables as in the OLS model. Standard errors are reported in parentheses.

*0.05<p<0.10, **0.01<p<0.05, ***p<0.01

Finally, I use PSM to calculate the racial differential using a comparable sample of black
and white players. All matching methods improve the comparability of the black and
white player samples, since the pseudo-R2 of the Probit regression of Black on the covariates is
reduced. Kernel matching produced the lowest pseudo-R2; therefore, it is the preferred matching
method. With kernel matching, black players were found to cost 48% more in Russia but
22% less in England compared to matched white players. However, these estimates are
not statistically different from zero. This conclusion is not sensitive to the particular
matching method used.
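
The kernel-matching idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the estimation code used in this study: the data are synthetic, the propensity score comes from a logit fitted by Newton-Raphson (the study reports a Probit), the kernel is assumed to be Epanechnikov, and the function names fit_logit and kernel_att are hypothetical. Only the bandwidth of 0.06 is taken from Table 7.

```python
import numpy as np

def fit_logit(X, t, iters=25):
    """Fit a logit of t on X by Newton-Raphson; return fitted propensity scores."""
    Xc = np.column_stack([np.ones(len(t)), X])       # add an intercept
    beta = np.zeros(Xc.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xc @ beta))
        gradient = Xc.T @ (t - p)
        hessian = (Xc * (p * (1.0 - p))[:, None]).T @ Xc
        beta += np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-Xc @ beta))

def kernel_att(y, t, pscore, bandwidth=0.06):
    """Kernel-matching ATT with an Epanechnikov kernel on the propensity score."""
    treated, control = t == 1, t == 0
    u = (pscore[treated][:, None] - pscore[control][None, :]) / bandwidth
    weights = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)
    on_support = weights.sum(axis=1) > 0             # treated units with a match
    w = weights[on_support]
    counterfactual = (w @ y[control]) / w.sum(axis=1)
    att = np.mean(y[treated][on_support] - counterfactual)
    return att, int(on_support.sum())

# Synthetic, hypothetical data with a true treatment effect of 0.4 log points.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                               # a single confounder
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.5 * x))).astype(float)
y = x + 0.4 * t + rng.normal(scale=0.5, size=n)

pscore = fit_logit(x, t)
att, n_on_support = kernel_att(y, t, pscore)
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"Unmatched difference: {naive:.2f}")
print(f"Kernel-matched ATT:   {att:.2f} ({n_on_support} treated on support)")
```

Because the confounder raises both the probability of treatment and the outcome, the unmatched difference overstates the effect, while the matched estimate compares each treated unit only with controls that have a similar propensity score. Treated units with no control inside the bandwidth are dropped, which corresponds to the "On support" column in Table 7.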

In summary, this study finds no evidence of racial discrimination in the English transfer
market, in line with the results of Reilly and Witt (1995), Medcalfe (2008) and most of
the studies surveyed by Frick (2007). Szymanski (2000) remains one of the few studies
to have found evidence of racial discrimination in English soccer, but his investigation of
more recent data revealed that discrimination has disappeared. Although, at first
glance, there seems to be a compensating differential paid for black players in Russia,
the racial gap is not estimated with enough precision to confirm the existence of a
compensating differential.

VI EXTENSIONS & CONCLUSIONS

The purpose of this study was to examine whether previous results regarding
discrimination in the transfer market can be replicated using more comprehensive
measures of player performance in England and Russia. Different estimation techniques
(OLS, Tobit, IV, and PSM) were used to show the robustness of the result. To my
knowledge, no study had previously explored the Russian transfer market or used PSM
to estimate the racial gap.

The results of the OLS, Tobit, and IV models indicate that black players cost 4-10%
more on average compared to white players in Russia, but 22-26% less in England,
ceteris paribus. Using PSM, black players were found to be 48% dearer in Russia and
22% cheaper in England compared to matched white players. However, none of the
racial differentials are statistically different from zero, in accordance with the most
recent literature.

A limitation of this study is that it does not completely isolate England or Russia. An
undervaluation of a black player bought by an English club, for example, might be
facilitated by the prejudice of the owner of the Spanish club that sold him. A more
comprehensive study of racism in the transfer market must include more of Europe's
biggest leagues, preferably following the same players over time. In addition, this study
interpreted the positive but insignificant premium for black players in Russia as a
compensating differential for racism by fans. However, it could just as well be due to the
harsh Russian weather. The appendix contains a preliminary investigation that suggests
the existence of the weather effect, but it is left for future work to explore it in detail.
Future research can also attempt to determine when exactly discrimination disappeared
and to use wages instead of transfer fees as the dependent variable.


REFERENCES
Anderson, C. and Sally, D. 2014. The Numbers Game: Why Everything You Know
About Football is Wrong. London: Penguin.
Becker, G. S. 1957. The Economics of Discrimination. Chicago: University of Chicago
Press.
Carmichael, F., Forrest, D., and Simmons, R. 1999. The labour market in association
football: who gets transferred and for how much? Bulletin of Economic Research, 51(2),
pp. 125-150.
Carmichael, F. and Thomas, D. 1993. Bargaining in the transfer market: theory and
evidence. Applied Economics, 25(12), pp. 1467-1476.
Chu, J., Nadarajah, S., Afuecheta, E., Chan, S., and Xu, Y. 2014. A statistical study of
racism in English football. Quality & Quantity, 48(5), pp. 2915-2937.
Deloitte. 2015. Revolution. Annual review of football finance. Manchester: Sports
Business Group, Deloitte.
Feess, E., Frick, B., and Muehlheusser, G. 2004. Legal restrictions on outside trade
clauses theory and evidence from German soccer. Discussion paper No. 1140, Institut
Zukunft der Arbeit, Bonn.
Frick, B. 2007. The football players' labour market: empirical evidence from the major
European leagues. Scottish Journal of Political Economy, 54(3), pp. 422-446.


Gallo, E., Grund, T., and Reade, J. J. 2013. Punishing the foreigner: implicit
discrimination in the Premier League based on oppositional identity. Oxford Bulletin of
Economics & Statistics, 75(1), pp. 136-156.
Goddard, J. and Wilson, J. O. S. 2009. Racial discrimination in English professional
football: evidence from an empirical analysis of players' career progression. Cambridge
Journal of Economics, 33(2), pp. 295-316.
Kahn, L. M. 1991. Discrimination in professional sports: a survey of the literature.
Industrial & Labor Relations Review, 44(3), pp. 395-418.
Kahn, L. M. 2000. The sports business as a labor market laboratory. The Journal of
Economic Perspectives, 14(3), pp. 75-94.
Kuper, S. and Szymanski, S. 2014. Soccernomics. 3rd ed. New York: Nation Books.
Medcalfe, S. 2008. English league transfer prices: is there a racial dimension? A
re-examination with new data. Applied Economics Letters, 15(11), pp. 865-867.
Reilly, B. and Witt, R. 1995. English league transfer prices: is there a racial dimension?
Applied Economics Letters, 2(7), pp. 220-222.
Szymanski, S. 2000. A market test for discrimination in the English professional soccer
leagues. Journal of Political Economy, 108(3), pp. 590-603.
Szymanski, S. and Preston, I. 2000. Racial discrimination in English football. Scottish
Journal of Political Economy, 47(4), pp. 342-363.


Szymanski, S. and Smith, R. 1997. The English football industry: profit, performance
and industrial structure. International Review of Applied Economics, 11(1), pp. 135-153.
Wilson, D. P. and Ying, Y.-H. 2003. Nationality preferences for labour in the
international football industry. Applied Economics, 35(14), pp. 1551-1559.


APPENDIX
Data sources & validity
Teams-dataset
For each team, there are statistics that measure the performance of the team over the whole
season. They are divided into 11 categories such as goals scored, shot accuracy, possession
and passing, and defensive actions. This data was collected from www.squawka.com, which in
turn licences its data from Opta, the world's leading sports data provider.
The dependent variable, Goal difference, is highly correlated with other measures of
performance, such as league points and league ranking. It was chosen as the dependent
variable in order to have goals scored and goals conceded on the left-hand side of equation (2),
rather than regressing league points on goals, which would amount to something of a truism. By
definition, the mean goal difference should be zero since one team's goal scored is another
team's goal conceded, and this is verified in the dataset.

Players-dataset
Squad lists and player details such as age, contract length, and other statistics that rate the
player's different features (for example, a rating of positioning ability out of 100) were collected
from www.sofifa.com, a website that hosts updated databases of the soccer videogame FIFA.
The database dated 5th June 2015 was used, since it precedes summer transfer market
activity [10]. Estimated transfer fees were obtained from the same website. These were replaced by
the actual transfer fee if one was observed during the summer of 2015. Actual transfer fee figures
were obtained from www.transfermarkt.co.uk, which has been used in previous studies [11]. The
real-life performance measures of players such as goals scored and shots per game were
obtained from www.whoscored.com, another licensee of Opta data.
Initially, the dataset consisted of the 1120 players who were in the squads of the English and
Russian Premier League clubs during the 2014/15 season. This contains the whole population
of players in those two leagues. 260 marginal players who did not play at all during the season
were dropped, leaving 860 observations. In the regressions, the number of observations is 786
because the 74 goalkeepers are excluded. Goalkeepers are evaluated using distinctive
measures of performance and must be modelled separately.
Given the unconventional nature of the players-dataset, its validity might seem questionable.
www.sofifa.com obtains its data from the videogame FIFA. Created by EA Sports, it is one of
the most popular games in the world and large sums of money are dedicated to its development
each year. Its information is reliable since it is scrutinised by avid soccer fans
and hence constantly updated. One of the website's best features is that it provides databases
as snapshots in time. During the summer transfer window of 2015, the only information
available in the market was what happened prior to the end of the 2014/15 season. Since I am
focusing on this window, I chose the database dated 5th June 2015.

[10] The exact dates of the summer transfer window vary by country, but it usually opens in June and
closes in early September.
[11] Its German version was used in Frick (2007).
Simple tests can help illustrate the accuracy of the dataset. For example, 91 players in the
dataset were transferred in the summer of 2015, and the average actual fee paid was
4,246,813. The average estimated fee of these players is 4,274,231, and the difference in
means is not significant at 5%. For 91% of the transfers, the difference between the estimated
and the actual fee was not more than 1 standard deviation of actual transfer fees. Thus,
estimated fees seem to be a reliable indicator of how the market values players [12]. Regarding
the self-coded variable Black, approximately 28% of players in England are coded as black. The
dataset that Chu et al. (2014) used had 25% of EPL players coded as black or ethnic minorities.
An increase of 3 percentage points over 3 years does not sound unreasonable.

TABLE A1
Teams-dataset variable definitions

Variable                  Stata abbreviation  Definition
Goal difference           gdiff               Goals scored minus goals conceded
Year                      year                Season 2014/15 = 2015 and so on
Shots (inside the area)   iashot              A shot from inside the 18-yard box
Shots (outside the area)  oashot              A shot from outside the 18-yard box
Shot accuracy             shotacc             A calculation of Shots on target divided by all shots (excluding blocked attempts)
Successful passes         sucpass             Total completed passes during the season
Key passes                keypass             The final pass or pass-cum-shot leading to the recipient of the ball having an
                                              attempt at goal without scoring
Pass accuracy             passacc             =successful passes/total attempted passes
Possession                poss1               =Total time team X's possession strings/total time of possession strings
Interceptions             interc              This is where a player intentionally intercepts a pass by moving into the line of the
                                              intended ball.
Shots blocked             shotblock           This is where a player blocks a shot from an opposing player
Clearances                clearance           This is a defensive action where a player kicks the ball away from his own goal with
                                              no intended recipient of the ball
% of aerial duels won     aerialpc            Aerial duel 50/50 when ball is in the air
Yellow cards              yellow
Red cards                 red

All variables in the teams-dataset were obtained from www.squawka.com. Definitions are obtained from Opta (2016).

[12] Note that estimated fees date back to June 5th 2015 and actual fees are announced after the transfer
window opens in June. Hence, estimated fees could not have been artificially manipulated ex-post to
reflect actual fees paid.


TABLE A2
Players-dataset variable definitions

Variable                        Stata abbreviation     Source                     Additional description
Transfer fee                    fee (its log is lfee)  www.sofifa.com,
                                                       www.transfermarkt.co.uk
Black                           black                  Online images              =1 if non-White
England                         eng                    www.sofifa.com             =1 if player plays in England
Age                             age                    www.sofifa.com
Year contract ends              contract               www.sofifa.com             Contracts usually end in June of that year
Minutes played                  mins                   www.whoscored.com          Total minutes played in 2014/15
Playing position                gmp                    www.sofifa.com             Defender, midfielder or forward
Goals                           goals                  www.whoscored.com
Assists                         assists                www.whoscored.com          A pass that directly leads to a chance scored by a
                                                                                  teammate
Shots per game                  spg                    www.whoscored.com          A shot is an attempt to score a goal, made with any
                                                                                  (legal) part of the body, either on or off target
Pass accuracy                   pspc                   www.whoscored.com          Percentage of attempted passes that successfully
                                                                                  found a teammate
Aerials won per game            aerialspg              www.whoscored.com          Winning a header in a direct contest with an opponent
Positioning (out of 100)        posing                 www.sofifa.com             FIFA's (the videogame) rating of a player's ability to
                                                                                  position himself favourably in attack
International Reputation        ir                     www.sofifa.com             Scale of 1-4, 4 indicating a very well-known player
Quartile rank of player's team  qrank                  www.whoscored.com          =1 if in top quartile of the league table, and so on

Definitions are obtained from WhoScored.com (2016).


Weather effect
FIGURE A1
Density plot of log(fee) by regional weather, Russia
(x-axis: log(fee); y-axis: density; series: player comes from warm region, player comes from cold region;
kernel = epanechnikov, bandwidth = 0.5157)

Figure A1 plots the distribution of log(fee) separately for players coming from warm [13] and cold
regions. Only players of the Russian Premier League are included. It is very similar to Figure 4,
which compares the log(fee) distributions of black and white players in Russia. Hence, any
compensating differential for black players might actually be compensating them for the harsh
weather rather than racism.

[13] Africa and South America were considered warm. Europe is coded as cold. Other continents were
excluded (only 3 observations in Russia).


TABLE A3
Isolating the weather effect in Russia
Outcome variable: log(fee). Treatment variable: Warm.

Sample                    ATT              Differential as %   On support          Pseudo-R2
                                                               Treated(Control)
Weather effect, all players
Unmatched                 1.07*** (0.28)   191%                                    0.11
Kernel (bandwidth 0.06)   0.43* (0.23)     53%                 43(304)             0.01

Weather effect, white players
Unmatched                 1.21*** (0.41)   236%                                    0.14
Kernel (bandwidth 0.06)   0.53* (0.29)     71%                 19(304)             0.01

ATT is the average treatment effect on the treated. Pseudo-R2 is from a post-matching Probit regression of Warm on the
same independent variables as in the OLS model. Standard errors are reported in parentheses.

*0.05<p<0.10, **0.01<p<0.05, ***p<0.01

To test the weather effect hypothesis, I used Propensity Score Matching with the treatment
variable being equal to 1 if the player's home country is in a warm region. There is no evidence
of a compensating differential for players coming from warm countries at the 5% level, but the
differential is significant at 10%. The effect remains positive and significant at 10% even when it is
estimated for non-black players only (granted, there are only 19 white players who come from
warm countries to play in Russia). This last estimate attempts to distinguish between the race
and weather effects, since it does not include black players. It suggests that any
transfer fee differential attributed to racial discrimination could be confounded with a weather
effect. It is worth mentioning at this stage that adding the dummy variable Warm to the previous
OLS, Tobit, and IV log(fee) models does not meaningfully affect the significance of the ethnic
dummies, and Warm is itself insignificant.


Diagnostic tests & intermediate results


TABLE A4
Teams Fixed Effects model diagnostics

Test                                       Statistic   P-value   Decision (at 5%)
Jarque-Bera (normality)                    5.49*       0.064     Do not reject
  H0: Errors normally distributed
Wooldridge (autocorrelation)               0.62        0.434     Do not reject
  H0: No first order serial correlation
Ramsey RESET                               1.53        0.208     Do not reject
  H0: No non-linear terms omitted

Overall error component and a+xb default fitted values (excluding fixed effect) are used.
*0.05<p<0.10, **0.01<p<0.05, ***p<0.01

TABLE A5
OLS log(fee) model diagnostics

Test                                                 Statistic   P-value   Decision (at 5%)
Jarque-Bera (normality)                              3560***     0.000     Reject
  H0: Errors normally distributed
Jarque-Bera (normality) excluding free transfers     4.478       0.107     Do not reject
  H0: Errors normally distributed
Ramsey RESET                                         9.03***     0.000     Reject
  H0: No non-linear terms omitted

*0.05<p<0.10, **0.01<p<0.05, ***p<0.01

TABLE A6
IV-GMM log(fee) model diagnostics

Test                                                   Statistic              P-value   Decision (at 5%)
Over-identifying restrictions (Hansen's J) test        2.67                   0.615     Do not reject
  H0: Restrictions valid
C (difference-in-Sargan) endogeneity test              0.326                  0.988     Do not reject
  H0: Variables treated as endogenous are exogenous
Shea's adjusted partial R2 (first-stage)               Black: 0.464
                                                       Black*England: 0.349

*0.05<p<0.10, **0.01<p<0.05, ***p<0.01


TABLE A7
Teams Fixed Effects model
Dependent variable: Goal difference

Specifications (1) to (5)

Variable                    (1)            (2)            (3)            (4)            (5)
Intercept                   -80.612***     2452.066       2303.593       1635.662       405.929
                            (21.759)       (1919.874)     (1846.262)     (1719.067)     (1803.663)
Shots (inside the area)     0.139***       0.140***       0.131***       0.375***       0.375***
                            (0.028)        (0.028)        (0.028)        (0.056)        (0.055)
Shots (outside the area)    -0.038         -0.046         -0.035         0.202***       0.221***
                            (0.029)        (0.030)        (0.031)        (0.054)        (0.056)
Successful passes           0.0012         0.0015*        0.0014*        0.0016         0.0023**
                            (0.0008)       (0.0009)       (0.0008)       (0.0011)       (0.0011)
Possession                  78.498         51.203         43.822         70.432         68.697
                            (51.372)       (56.101)       (52.951)       (42.557)       (41.679)
Year                                       -1.252         -1.195         -0.863         -0.265
                                           (0.948)        (0.911)        (0.843)        (0.883)
Shot accuracy                                             86.675***      62.314**       67.186**
                                                          (27.977)       (26.272)       (26.273)
Key passes                                                               -0.335***      -0.344***
                                                                         (0.063)        (0.063)
Pass accuracy                                                            -18.195        -43.381
                                                                         (53.227)       (55.331)
Interceptions                                                                           -0.005
                                                                                        (0.007)
Shots blocked                                                                           0.051
                                                                                        (0.048)
Clearances                                                                              0.002
                                                                                        (0.006)
% of aerial duels won                                                                   62.485**
                                                                                        (28.173)
Overall R2                  0.647          0.632          0.678          0.690          0.680
n                           326            326            326            326            326

Specifications (6) to (9)

Variable                    (6)            (7)            (8)            (9)
Intercept                   136.3956       -8.009534      1079.953       194.4491
                            (1962.074)     (2007.439)     (2219.091)     (2337.381)
Shots (inside the area)     0.368***       0.426***       0.418***       0.369**
                            (0.055)        (0.156)        (0.157)        (0.154)
Shots (outside the area)    0.205***       0.099          0.157          0.180
                            (0.054)        (0.148)        (0.153)        (0.154)
Successful passes           0.0022*        0.0020*        0.0044         0.0049
                            (0.0011)       (0.0012)       (0.0042)       (0.0043)
Possession                  78.385*        89.533**       1045.826**     972.630**
                            (42.212)       (44.915)       (431.035)      (412.607)
Year                        -0.124         -0.053         -0.131         0.366
                            (0.965)        (0.987)        (1.055)        (1.110)
Shot accuracy               56.672**       55.092*        49.867*        51.145**
                            (27.373)       (27.999)       (25.420)       (25.363)
Key passes                  -0.334***      -0.336***      -0.311***      -0.289***
                            (0.061)        (0.060)        (0.064)        (0.062)
Pass accuracy               -51.884        -47.562        -3132.93***    -3322.441***
                            (56.099)       (57.483)       (796.781)      (770.370)
Interceptions               -0.005         -0.006         -0.002         -0.003
                            (0.007)        (0.007)        (0.007)        (0.008)
Shots blocked               0.043          0.047          0.022          0.017
                            (0.048)        (0.049)        (0.046)        (0.046)
Clearances                  0.003          0.003          0.002          0.004
                            (0.006)        (0.006)        (0.006)        (0.006)
% of aerial duels won       69.597**       69.734**       60.997*        65.946**
                            (28.971)       (29.283)       (31.641)       (30.968)
Yellow cards                -0.094         -0.095         -0.098         -0.847***
                            (0.082)        (0.081)        (0.084)        (0.289)
Red cards                   -0.444         -0.436         -0.419         0.143
                            (0.348)        (0.353)        (0.349)        (0.756)
Shots (inside the area)^2                  -0.0001        -0.0001        -6.67x10^-5
                                           (0.0003)       (0.0003)       (2.56x10^-4)
Shots (outside the area)^2                 0.0002         0.00005        -5.3x10^-5
                                           (0.0003)       (0.0003)       (3.04x10^-4)
Successful passes^2                                       -1.14x10^-7    -1.37x10^-7
                                                          (1.24x10^-7)   (1.30x10^-7)
Possession^2                                              -1003.814**    -933.645**
                                                          (449.577)      (429.399)
Pass accuracy^2                                           2029.435***    2153.113***
                                                          (523.551)      (504.118)
Yellow cards^2                                                           0.0043***
                                                                         (0.0016)
Red cards^2                                                              -0.058
                                                                         (0.065)
Overall R2                  0.695          0.699          0.715          0.709
n                           326            326            326            326

Variables in bold were judged to be stable across all the FE specifications that were estimated. Heteroscedastic-robust standard errors are reported in parentheses.
*0.05<p<0.10, **0.01<p<0.05, ***p<0.01

TABLE A8
OLS models for log(fee)
Dependent variable: log(fee)
Coefficient estimates

Models OLS 1 to OLS 6

Variable                        OLS 1           OLS 2           OLS 3           OLS 4           OLS 5           OLS 6
Intercept                       -0.5460242***   -33.59748***    -1462.718***    -1272.035***    -1271.562***    -1242.346***
                                (0.0993749)     (3.399484)      (188.4809)      (178.3536)      (178.0201)      (181.2309)
Black                           0.9247557***    0.3240023       0.491561**      0.4311717**     0.3654418*      0.3017475
                                (0.2543025)     (0.2522994)     (0.2292856)     (0.2164018)     (0.2137041)     (0.2086695)
English League                  0.3927896       0.3282041       0.137548        -0.1439521      -0.124949       -0.1437809
                                (0.2552253)     (0.2425465)     (0.2498663)     (0.259929)      (0.2578703)     (0.2563998)
Black * English League          -1.357576**     -0.7672104      -0.8283358*     -0.6146788      -0.6129415      -0.5286933
                                (0.5333464)     (0.4934363)     (0.4614353)     (0.4381091)     (0.4329347)     (0.4242333)
Age                                             2.730662***     2.821144***     2.246236***     2.214675***     2.206419***
                                                (0.2869453)     (0.2835609)     (0.3220432)     (0.3241151)     (0.3246577)
Age^2                                           -0.0546782***   -0.0543994***   -0.0442893***   -0.0436202***   -0.0434734***
                                                (0.0059299)     (0.0058257)     (0.0064304)     (0.0064706)     (0.0064824)
Year contract ends                                              0.7073776***    0.6162956***    0.6160598***    0.6017054***
                                                                (0.0933439)     (0.0882414)     (0.0880634)     (0.0896366)
Minutes played                                                                  0.0007789***    0.0008208***    0.0006791***
                                                                                (0.0001539)     (0.0001603)     (0.0001781)
Midfielder                                                                                      0.3876292       0.2418484
                                                                                                (0.2390639)     (0.2610656)
Forward                                                                                         0.6755153**     0.4039185
                                                                                                (0.311238)      (0.3705272)
Goals                                                                                                           0.0492061
                                                                                                                (0.0365807)
Assists                                                                                                         0.059565
                                                                                                                (0.0535137)
R2                              0.005           0.140           0.216           0.250           0.255           0.257
n                               786             786             786             786             786             786

Models OLS 7 to OLS 12

Variable                        OLS 7           OLS 8           OLS 9           OLS 10          OLS 11          OLS 12
Intercept                       -1237.914***    -1200.135***    -1199.394***    -1184.483***    -1126.002***    -1049.07***
                                (181.4004)      (179.8622)      (178.8054)      (178.0358)      (176.0031)      (169.1528)
Black                           0.2976347       0.2092435       0.2116939       0.1869803       0.1496685       0.0913128
                                (0.2131048)     (0.2171401)     (0.220217)      (0.2117008)     (0.2110771)     (0.2173581)
English League                  -0.1274481      -0.1925039      -0.2417037      -0.4372963      -0.7420614**    -0.5150809*
                                (0.2586887)     (0.2602328)     (0.2679291)     (0.280607)      (0.2918694)     (0.285986)
Black * English League          -0.5375809      -0.4974259      -0.4763813      -0.453458       -0.4843371      -0.386521
                                (0.4274838)     (0.4291308)     (0.4329963)     (0.4305837)     (0.4280017)     (0.4319093)
Age                             2.157396***     2.128471***     2.097975***     1.9603***       1.886224***     2.046171***
                                (0.3300285)     (0.3300271)     (0.333515)      (0.3449312)     (0.3412702)     (0.3451564)
Age^2                           -0.0425381***   -0.0420637***   -0.0416149***   -0.0392476***   -0.0386825***   -0.0413236***
                                (0.0065806)     (0.006581)      (0.0066458)     (0.006854)      (0.0067626)     (0.0068272)
Year contract ends              0.5997781***    0.5804225***    0.580143***     0.573176***     0.544854***     0.5066832***
                                (0.0897038)     (0.089004)      (0.0884968)     (0.0881131)     (0.0870872)     (0.0836948)
Minutes played                  0.0006662***    0.0006417***    0.0005404***    0.0005381***    0.0005706***    0.0006407***
                                (0.0001782)     (0.0001783)     (0.0001818)     (0.000182)      (0.0001802)     (0.0001762)
Midfielder                      0.1301262       0.0455546       0.2034864       -0.127355       -0.0429872      0.080856
                                (0.2837532)     (0.2836107)     (0.2886493)     (0.3137844)     (0.311675)      (0.3096853)
Forward                         0.2411875       0.3436154       0.4101613       -0.0934717      0.0592357       0.2604545
                                (0.4056347)     (0.4034268)     (0.4083816)     (0.44678)       (0.4394817)     (0.4362091)
Goals                           0.0077105       0.007531        0.0061883       -0.0076671      -0.0303144      -0.0666166
                                (0.0463427)     (0.0464897)     (0.0463474)     (0.0465488)     (0.0466724)     (0.0464446)
Assists                         0.0437881       0.048767        0.0827197       0.0611207       0.0296608       -0.00133
                                (0.0550889)     (0.0546322)     (0.0612376)     (0.0621864)     (0.064348)      (0.062779)
Shots per game                  0.0550889       0.3257533       0.2917413       0.2044739       0.1328124       0.2994122
                                (0.2526543)     (0.2537331)     (0.2567285)     (0.2553503)     (0.2464607)     (0.2406342)
Pass accuracy                                   0.0229788**     0.02434**       0.0232419**     0.017481*       0.010429
                                                (0.0107371)     (0.0108088)     (0.0103834)     (0.0104265)     (0.0115054)
Aerials won per game                                            0.2181748**     0.3091259***    0.3036484***    0.2659388**
                                                                (0.1091625)     (0.1148017)     (0.112907)      (0.1116066)
Positioning (out of 100)                                                        0.0261276***    0.0215587**     0.0150859
                                                                                (0.0097011)     (0.0097083)     (0.0099569)
International Reputation                                                                        0.7113674***    0.429707*
                                                                                                (0.218271)      (0.2239087)
Quartile rank of player's team                                                                                  -0.4836071***
                                                                                                                (0.1093855)
R2                              0.259           0.264           0.267           0.273           0.273           0.304
n                               786             786             786             786             786             786

Variables in bold are significant at least at 5% across all the OLS specifications. Heteroscedastic-robust standard errors are reported in parentheses. Model 12 is the one reported in Table 6.
*0.05<p<0.10, **0.01<p<0.05, ***p<0.01


TABLE A9
Tobit model of log(fee)
Dependent variable: log(fee)

Variable                        Coefficient estimate
Intercept                       -1096.032*** (180.4472)
Black                           0.0675127 (0.2236095)
England                         -0.5813164* (0.3044589)
Black * England                 -0.3836768 (0.4494035)
Age                             2.141238*** (0.3810514)
Age^2                           -0.0433471*** (0.0075765)
Year contract ends              0.5294322*** (0.0892279)
Minutes played                  0.0006642*** (0.0001845)
Midfielder                      0.0990258 (0.3231815)
Forward                         0.2923622 (0.4553551)
Goals                           -0.0690122 (0.0481485)
Assists                         -0.0033454 (0.0649732)
Shots per game                  0.3000869 (0.2499363)
Pass accuracy (%)               0.0106405 (0.012148)
Aerials won per game            0.2732846** (0.116914)
Positioning (out of 100)        0.0145672 (0.0104316)
International Reputation        0.4320847* (0.2333243)
Quartile rank of player's team  -0.4946296*** (0.1147298)

Pseudo-R2                       0.0658
n                               786
# censored                      40
Sigma                           3.02246 (0.2001577)

Variables in bold are significant at 5% at least. Heteroscedastic-robust standard errors are reported in parentheses.
*0.05<p<0.10, **0.01<p<0.05, ***p<0.01


APPENDIX REFERENCES
Chu, J., Nadarajah, S., Afuecheta, E., Chan, S., and Xu, Y. 2014. A statistical study of racism in
English football. Quality & Quantity, 48(5), pp. 2915-2937.
Frick, B. 2007. The football players' labour market: empirical evidence from the major European
leagues. Scottish Journal of Political Economy, 54(3), pp. 422-446.
Opta. 2016. BLOG Opta's event definitions [Online]. Available: http://optasports.com/newsarea/blog-optas-event-definitions.aspx [24 April 2016].
WhoScored.com. 2016. Glossary [Online]. Available: https://www.whoscored.com/Glossary [24
April 2016].
