SSRN Id851284
www.NETinst.org
October 2005
The Effect of P2P File Sharing on Music Markets: A Survival Analysis of Albums on
Ranking Charts
Rahul Telang
H John Heinz III School of Public Policy and Management
Carnegie Mellon University
[1] Financial support from the NET Institute (http://www.NETinst.org) is gratefully acknowledged.
Abstract
Recent technological and market forces have profoundly impacted the music industry.
Emphasizing threats from peer-to-peer (P2P) technologies, the industry continues to seek
sanctions against individuals who offer a significant number of songs for others to copy. Yet there
is little rigorous empirical analysis of the impacts of online sharing on the success of music
products. Combining data on the performance of music albums on the Billboard charts with file
sharing data from a popular network, we: 1) assess the impact of recent developments related to
the music industry on survival of music albums on the charts, and 2) evaluate the specific impact
of P2P sharing on an album’s survival on the charts. In the post-P2P era, we find significantly
reduced chart survival. The second phase of our study isolates the impact of file sharing on
album survival. We find that sharing does not seem to hurt the survival of albums.
1. Introduction
processes and systems in a number of domains. For example, as a result of Internet technologies,
the stock market is now accessible to more investors, executes orders faster, and thwarts
simple arbitrage through widespread and near-instant information availability. The entertainment
industry, in particular the music business, has been profoundly impacted by such technological
changes. Music related technologies such as audio compression technologies and applications
(MP3 players in 1998), peer-to-peer (P2P) file sharing networks like Napster (in 1999), and
online music stores (in 2000) were introduced in a relatively short span of time, and gained rapid
popularity. The consumers of music, in turn, have adapted rapidly to the new topical
surroundings. Music, musicians and related terminology have consistently been among the top
effective choice to study the impact of technological change and other market forces on such
goods.
The music industry and its legal arm, the Recording Industry Association of America
(RIAA), have repeatedly claimed that emerging technologies, especially P2P networks, have
negatively impacted their business. RIAA reports that music shipments, both in terms of units
shipped and dollar value, have suddenly and sharply declined since 2000 (www.riaa.org). They
attribute this dramatic reversal in revenues, coming on the heels of sustained long-term growth in
the music business, directly to the free sharing of music on online P2P systems. This assertion
has garnered immense attention, and has been the subject of numerous debates (Liebowitz 2004,
King, 2000a, King 2000b, Mathews and Peers 2000, Peers and Gomes 2000, Evangelista 2000).
At the heart of the debate is whether sharing of music online leads to piracy or sampling.
Proponents of the former argue that because of low marginal cost of reproduction of digital
music and its quasi-public good characteristic, there is little loss in value and probably higher
network externality in freely sharing music with others. P2P technologies lead to free-riders and
undermine market efficiencies in the music industry with users obtaining music freely in lieu of
legally purchasing it (Alexander 2002). Claiming that the impact of online music sharing on its
business has been devastating, RIAA has aggressively pursued greater copyright enforcement
and stronger regulations (Harmon 2003). Their initial legal strategy was aimed at Napster. RIAA
succeeded in shutting down Napster, in large part due to the liability related to the centralized
structure of Napster’s file search technology. The so-called ‘Sons of Napster’ quickly emerged to
fill the vacuum, and these networks escaped the legal wrath by deploying further de-centralized
structures. In response RIAA has since altered its legal strategy by seeking sanctions against
individuals “who offer significant number of songs for others to copy” (Ziedler 2003).
The opponents, on the other hand, argue that P2P systems significantly enhance the
ability of users to sample and experience songs. Digital technologies have undoubtedly made
information sharing and sampling easier [2] (Barua et al. 2001, Bakos et al. 1999, Brynjolfsson
and Smith 2000) and less costly (Cunningham et al. 2003, Gopal et al. 2004) for individuals. It
has been argued that consumers’ increased exposure to music, which is made possible by P2P
systems, can be beneficial to the music industry. An expert report to the court in the Napster case
alludes to the possibility that such online sharing technologies provide sampling mechanisms that
may subsequently lead to music album sales (Fader 2000). They further argue that the decline in
the music industry is due to factors other than P2P enabled music sharing. Concomitant with the
introduction and popularity of P2P systems, the music industry has also seen (a) a decrease in the
number of albums released, (b) increasing competition for consumer time and resources from
non-music activities such as video games, DVDs, and online chat rooms (Mathews and Peers
2000, Mathews 2000a, Boston 2000), and (c) a downturn in the macroeconomic conditions (e.g.
drop in GDP growth rates and employment figures since 2000, until 2004).
[2] Online fan clubs exist for numerous popular performers.
While both arguments hold intuitive and theoretical appeal, clearly the question is
inherently empirical. However, extant literature on rigorous empirical evaluation of the impacts
of sharing on the success of music products is sparse. Much of the existing work is anecdotal or
survey-based. Issues such as self-reporting bias, sample selection problems, and a lack of suitable data
to draw the appropriate conclusions have led to contradictory findings. A notable exception is the
recent work by Oberholzer and Strumpf (2004) which relates the downloading activity on two
P2P servers with the sales of music albums. Their data set spans the final seventeen weeks of the
year 2002. The data is obtained from OpenNap, a relatively small P2P network with a
centralized structure as in Napster. The significant finding of the study is that the effect of
Our study complements existing empirical works and adds to the growing understanding
of the impacts of file sharing on the music industry. We employ micro-data on the performance
of music albums on the Billboard Top 100 weekly charts, and the daily file sharing activity of
these albums on WinMx, one of the most popular file sharing P2P networks. The objectives of
our study are twofold: (1) assess the impact of recent market and technological developments
related to the music industry on the survival of music albums on the top 100 charts, and (2)
evaluate the specific impact of P2P sharing on the album’s subsequent survival on the chart.
Since 1913, Billboard magazine has provided chart information based on sales of music
recordings (Gopal et. al. 2004). The chart information for the weekly Top 100 albums is based
on “…a national sample of retail store sales reports collected, compiled and provided by Nielsen
staggering economic implications and has far reaching influence on awareness, perceptions and
profits (Bradlow and Fader 2001). Having an album featured in the charts is the primary goal of
most popular music artists and their record labels (Strobl and Tucker 2000). Our focus here is on
the survival of albums as measured by the number of weeks an album appears on the charts
before the final drop off. This survival on the charts captures the “popular life” of an album, and
has been the object of analysis in a number of studies related to music (Strobl and Tucker 2000;
The first phase of our study provides a comparative analysis of album survival before and
after the event window, where the event window is year 1998-99. P2P networks, MP3 players,
etc. gained immense popularity from this period onward. In total, over 200 weeks of chart
information, spanning the years 1995-2004, is utilized in this phase of the study. Important
covariates of album survival are analyzed to assess any changes in their impact between the pre
(mid 1995–mid 1998) and post (mid 2000–mid 2003) time segments (henceforth referred to as
pre-TS and post-TS respectively). The covariates utilized in the study include: debut rank of the
album, reputation of the artist (as captured by the superstar status), record label that promotes
and distributes the album, and artist descriptors (solo female/solo male/group).
Our results show strong evidence that, overall, survival on the charts is significantly
lower in the post-TS period. Interestingly, albums that debut high on the charts did not
experience a significant decline in the post-TS period while those albums that debut low on the
While the first phase of the study provides the cumulative effect of technology and other
factors on chart survival, the second phase of the study attempts to isolate the impacts of file
sharing on chart success. Data on sharing activity is collected for over 300 albums over a period
of 60 weeks on WinMx (a very popular file sharing application), and is analyzed along with the
associated chart information and other covariates. We find that since the occurrence of the
significant events outlined above (in the mid-1998 to mid-2000 time frame), the effect of debut
rank on chart success has risen while the effect of major labels has fallen. In addition, solo
female artists perform better than either solo male artists or groups across the periods.
Our aim is to analyze the impact of various factors on an album’s survival. Survival
models are quite popular in the literature (see Kiefer 1988 for details), and it is well known that if the
data are right or left censored then OLS (ordinary least squares) leads to biased estimates (hence
many survival models use a proportional hazard model). However, we face no left or right
censoring in our data, and hence a logarithmic transformation of the dependent variable (survival)
lends itself to OLS analysis. We also use an instrumental variable approach when analyzing the
sharing data, and again, the log transformation of survival lends itself to the robust and widely used two-stage
least squares analysis (see Abowd and Kang 2002 for a similar analysis in a different context).
Therefore, we use the log transformation of survival as our dependent variable and use OLS for
the analysis that follows.
The first part of our analysis aims to understand the overall trend of album survival
where i is the album-specific subscript and $X_i$ is a vector of album-specific control variables: debut
rank, superstar status, distributing label (major/minor) and debut month (from extant literature).
The effect of an artist’s gender on album survival, not tested earlier, is also explored here. Debut
post-TS is an indicator equal to 1 if the album debuted on the charts in the post period (2000-02)
and 0 otherwise. The estimate δ is of significant interest here, as it indicates how survival has
changed between the pre and post periods.
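A form of equation (1) consistent with the definitions above can be written as follows. This is a sketch by analogy with equation (2): the error symbol $\varepsilon_i$ and the variable name DebutPostTS are assumptions, since only $X_i$, β, δ and the log-survival dependent variable are named in the text.

```latex
\log(\text{Survival}_i) = X_i \beta + \delta \,\text{DebutPostTS}_i + \varepsilon_i \qquad (1)
```

Under this form, δ measures the shift in log survival for albums debuting in the post period, holding the controls in $X_i$ fixed.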
The second part of the analysis examines the specific impact of file sharing on an
individual album’s survival. We observe the number of files being shared for each album in time
segment post-TS 3 (described in Table 2 below). We use this information to understand how the
$$\log(\text{Survival}_i) = X_i \beta + \log(\text{Shares}_i)\,\lambda + \mu_i \qquad (2)$$
where, as before, $X_i$ is a vector of album-specific control variables discussed above, and Shares
denotes the number of files being shared for a given album during its debut week. We use a
logarithmic transformation for shares to account for high variance and skewness in the sharing
levels across albums. The estimate λ is of key interest here, which denotes the impact of initial
correlated with some unobservable album characteristics which also affect survival (say
popularity of a particular artist). While debut rank should control for some of this, such a
correlation will bias the estimate for λ, as Shares will be correlated with the error term $\mu_i$ (violating
OLS assumptions). One strategy then is to find an instrument which is correlated with sharing
$$\log(\text{Shares}_i) = Z_i \alpha + X_i \beta + v_i \qquad (3)$$
where $Z_i$ is a vector of instruments which are uncorrelated with $\mu_i$. A general strategy is to
estimate equation (3) first, substitute the predicted values of sharing into equation (2), and then
estimate equation (2). With a valid instrument, this yields consistent estimates of λ despite the
correlation between Shares and $\mu_i$. We use an instrument based on a natural
experiment that occurred during our data collection period. This had direct implications on
On June 25, 2003, the RIAA announced that it would start legal action against individuals
who share files on P2P networks; the announcement was extensively disseminated through various
print and broadcast media on June 26. This event had a direct impact on users sharing these files
on the network. It can be used as an instrument because it shifted the intensity of sharing but
should be uncorrelated with the error term. Thus $Z_i$ is 1 for data after June 2003 and 0 otherwise.
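The two-stage procedure in equations (2) and (3) can be sketched on simulated data. Everything below is illustrative: the data are synthetic, the coefficient values are arbitrary, debut rank stands in for the full control vector, and the function names are hypothetical. The simulation builds in an unobserved popularity term that raises both sharing and survival, so plain OLS overstates λ while the instrumented estimate does not.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients for y = X b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage_least_squares(log_survival, log_shares, controls, instrument):
    """Return the estimate of lambda in equation (2) using Z as instrument."""
    n = len(log_survival)
    const = np.ones((n, 1))
    # Stage 1 (equation 3): log shares on the instrument and the controls.
    W = np.hstack([const, instrument.reshape(-1, 1), controls])
    shares_hat = W @ ols(log_shares, W)
    # Stage 2 (equation 2): log survival on the controls and predicted log shares.
    V = np.hstack([const, controls, shares_hat.reshape(-1, 1)])
    return ols(log_survival, V)[-1]

def naive_ols(log_survival, log_shares, controls):
    """Estimate lambda by plain OLS, ignoring the endogeneity of shares."""
    n = len(log_survival)
    V = np.hstack([np.ones((n, 1)), controls, log_shares.reshape(-1, 1)])
    return ols(log_survival, V)[-1]

def simulate(n=20000, true_lambda=0.0, seed=0):
    """Synthetic albums; all numbers here are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    debut_rank = rng.uniform(1, 100, n)
    popularity = rng.normal(size=n)               # unobserved by the analyst
    after = rng.integers(0, 2, n).astype(float)   # Z: 1 after the announcement
    log_shares = (6.0 - 1.6 * after - 0.02 * debut_rank
                  + popularity + rng.normal(scale=0.5, size=n))
    log_survival = (3.0 - 0.02 * debut_rank + true_lambda * log_shares
                    + 0.5 * popularity + rng.normal(scale=0.5, size=n))
    return log_survival, log_shares, debut_rank.reshape(-1, 1), after

y, s, X, z = simulate()
lambda_iv = two_stage_least_squares(y, s, X, z)
lambda_ols = naive_ols(y, s, X)
print(f"2SLS estimate of lambda: {lambda_iv:.3f}")
print(f"OLS estimate of lambda:  {lambda_ols:.3f}")
```

The second stage here returns only the point estimate. Standard errors from a manually run second stage are not valid as they stand, so a dedicated instrumental variables routine would be needed for inference.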
To use this event as an instrument we collected sharing data from July 2003 to December
2003. We include only those albums that debuted in February-May 2003 or July-October 2003. Using
the sample described above and the event in June 2003 as the instrument, we can estimate
Data
The data set for the first analysis consists of weekly rankings of albums on the Billboard
top 100 charts. For each year, the data consists of albums that debut during 34 consecutive weeks
of observation. The exact start date for each year is shown in Table 1. Our data collection
captures both the traditional holiday sales period, when new releases and sales volume are the
[Table 1: columns Time Segment and Start Date; table contents not recoverable]
♦ Survival: number of weeks an album appears on the Billboard top 100 charts. On occasion,
an album may drop off for some weeks and reappear again on the chart. Each album is
continuously tracked till its final drop-off. Note that the drop-off may occur well beyond the
♦ Debut rank: the rank at which an album debuts on the Billboard top 100 chart. Numerically
♦ Debut post-TS: This is an indicator variable which is 0 for albums in pre time segments and 1
for post.
♦ No of albums: The number of albums released during each year of the study period. This is
used as a control variable since more albums released in a given year may signify increased
♦ Superstar: a binary variable denoting the reputation of the artist. If a given album’s artist has
previously appeared on the Billboard top 100 Chart for at least 100 weeks (on or after
January 1, 1991) prior to the current album’s debut then the variable is set to 1, otherwise 0.
♦ Minor label: a binary variable that is set to 0 if the distributing label for a given album is one
♦ Solo Male: a binary variable that denotes if an album’s artist is a solo male (e.g. Eric
Clapton).
♦ Solo Female: a binary variable that denotes if an album’s artist is a solo female (e.g. Britney
Spears).
♦ Group: a binary variable that denotes if an album’s artist is a group (male or female) (e.g.
♦ Holiday_month Debut: To control for the holiday effect (or “Christmas effect”), we include
an indicator variable for albums debuting in December, which is 1 if the album debuted in
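As an illustration of the Survival and Debut rank definitions above, the following sketch derives both from weekly chart records. The record layout (week index, album, rank) and the function name `chart_summary` are assumptions made for this example, not the format of the Billboard data.

```python
from collections import defaultdict

def chart_summary(records):
    """Summarize chart histories.

    records: iterable of (week, album, rank) tuples, where week is an
    integer index (a hypothetical layout). Returns a dict keyed by album.
    """
    ranks_by_week = defaultdict(dict)
    for week, album, rank in records:
        ranks_by_week[album][week] = rank

    summary = {}
    for album, by_week in ranks_by_week.items():
        debut_week = min(by_week)
        summary[album] = {
            # Weeks on the chart; weeks spent off the chart before a
            # reappearance are not counted.
            "survival": len(by_week),
            "debut_week": debut_week,
            "debut_rank": by_week[debut_week],
        }
    return summary

example = [(1, "A", 40), (2, "A", 55), (4, "A", 90), (2, "B", 10)]
result = chart_summary(example)
```

Album "A" drops off in week 3 and reappears in week 4, so it is tracked until its final drop-off and credited with three chart weeks.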
Table 2 presents descriptive statistics of the data. The average survival has decreased
between the two periods, from about 14 to 10 weeks, suggesting that albums do not last as long
on the charts in the post period. Conversely, debut rank has improved from 49 to less than 40 on
average, indicating that albums debut at a better position but drop more steeply in the post period,
while the number of albums released has essentially stayed the same or increased marginally.
This may indicate that album sales are concentrated upfront in this period; however, the lack of
publicly available sales data precludes us from investigating this phenomenon. There is also a
physical limit to the size of upfront sales in consecutive weeks, which is primarily constrained by
logistics, distribution and retailer shelf space. Retail distribution is the major sales channel,
accounting for more than 98% of sales. The number of superstars appearing on the chart has
decreased marginally in the post period, while solo male and solo female artists have registered a small
increase at the expense of groups. Finally, albums from minor labels show a significant jump on
[Table 2, Survival row only: 14.2 wks | 14.6 wks | 15.3 wks | 11.3 wks | 9.5 wks | 9.6 wks]
The second data set used in the analysis relates to the album-level sharing activity.
Sharing information is captured from WinMX for the 34-week period corresponding to the time
segment post-TS3. We also collected additional data from July-December 2003 so that the RIAA
announcement could be used as an instrument when examining the impact of sharing on album
survival, as discussed in §3.
Although a number of file sharing applications were available, we conducted our data gathering
on WinMX for two primary reasons: (1) during the data collection period, WinMX was the
second most widely used P2P network (Schatz 2003); and (2) KaZaA, the most popular P2P
network at the time, places a fixed limit on how many files can be shown in any given search result.
Using KaZaA results in significant understatement of the level of sharing activity, due to this
The data was collected from WinMX daily. Each day, we began with the list of albums
that appeared on the Billboard top 100 chart since October 25, 2002 until the current week. The
list of albums is sorted in random order, and a search is initiated for each album. The daily
results are averaged to produce weekly information on sharing for that album. While we have
data on the sharing activity for every week after an album makes its first appearance on the chart,
our analysis focuses on the sharing levels of an album during its debut week. Inclusion of sharing
levels in subsequent weeks did not add qualitatively to the results, as the sharing levels across the
initial few weeks are highly correlated. We find the mean number of copies in our sample to be
approximately 802, with a minimum of 1 and maximum of 6620. We introduce the final model
parameter:
♦ Shares: average number of copies of an album available on the network during the debut
week. [3]
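A minimal sketch of how the Shares measure and its logarithm could be computed from daily search counts. The input layout (days since debut, copies found) and the seven-day debut window are assumptions for illustration; the text states only that daily results are averaged into weekly figures.

```python
import math

def debut_week_shares(daily_counts, days_in_week=7):
    """Average the daily copy counts that fall in the debut week.

    daily_counts: iterable of (days_since_debut, copies_found) pairs
    (a hypothetical layout). Returns None if there are no debut-week
    observations.
    """
    debut = [copies for day, copies in daily_counts if 0 <= day < days_in_week]
    if not debut:
        return None
    return sum(debut) / len(debut)

observations = [(0, 100), (1, 200), (6, 300), (7, 1000)]
shares = debut_week_shares(observations)   # day 7 falls outside the debut week
log_shares = math.log(shares)              # regressor used in equation (2)
```

Taking logs compresses the wide range of sharing levels reported in the sample (from 1 to 6620 copies) into a range of roughly 0 to 8.8.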
Note that our focus is not on measuring the direct impact of piracy (due to downloads) on
network. The use of “availability” of a file has several advantages, and does not suffer from
potential sampling bias associated with “download” data. First, availability of a file on a user’s
computer is a greater indication of the file’s archival value to the user than his/her downloading
activity, which may result in the file being listened to and discarded.
[3] Various other formulations of shares were considered, including the proportion of tracks from an album that are
available, and the number of unique users sharing a particular album; all produced consistent results.
Second, search results for
the number of available copies of a file return information from a significant number of users on
the network, which makes the measure more accurate and reduces sampling bias. By contrast, collecting
downloading information requires monitoring “super nodes” through which control information
is routed. Oberholzer and Strumpf (2004), for example, use download data from two servers.
However, one is unlikely to measure the true number of downloads, and it is not clear whether such a
sampling methodology reduces bias, given the hybrid nature of the network [4]. Finally, a higher
availability of a file on the network increases the ease and opportunity of finding and
downloading it. Thus higher availability signifies popularity and is probably highly
correlated with actual downloads. This has also been the modus operandi of RIAA, which has
Estimation Results
Table 3 presents the estimation results for the first part of our analysis (equation 1). We
include only Solo Male and Group in our analysis, with Solo Female as the base category.
Coefficients on all variables, except albums released, were significant (at the 0.01 level). Of the
variables, superstar and Holiday_month debut enhance album survival, while the other variables
display a deleterious impact. In particular, we note that survival in the post-TS period, ceteris
paribus, is estimated to have declined by approximately 42% [5]. This significant shift in the
survival pattern is consistent with our summary data in Table 2, where the mean survival time
shows a sharp decrease.
[4] Several nodes are connected to a super node, which monitors the activity of the connected nodes. Hence it is
possible that the downloading information may be biased by the types of users connected to the monitored super
node. Availability information, as collected and used in this paper as “shares”, usually is gathered by contacting
several super nodes for the information if it is not available with the nearest super node, which reduces the bias.
[5] This result follows since the dependent variable is in logarithmic form while the explanatory variable is not.
Comparing the pre- and post-TS periods yields a difference of $1 - e^{-0.54}$, which equates to a 42% decline.
Albums that debut at higher numerical rank (hence less popular) tend to
[Table 3, fit statistics only: R² = 0.345; Adjusted R² = 0.342]
The estimation results also highlight that the music business’s reliance on an artist’s
superstar status for chart success is still viable. The estimate of 0.30 suggests that an album by a
superstar survives 35% longer on the charts, ceteris paribus. Further, albums promoted by major
labels tend to last longer than those promoted by minor labels. Those from minor labels survive
23% fewer weeks on average than albums from major labels. Turning to the gender effect, it is
interesting to note that neither solo male artists nor groups survive as long as solo female artists on
the top 100 chart [6]. In fact, groups tend to survive the shortest time. Albums that are released in
December are estimated to survive 23% more weeks than albums released at other times,
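The percentage effects quoted above follow from the log-linear form of the model: a coefficient b on an indicator variable corresponds to a change of exp(b) - 1 in survival. The short sketch below reproduces the two conversions for which the text gives the coefficient (-0.54 for the post-TS indicator and 0.30 for superstar); the function name is hypothetical.

```python
import math

def percent_change_from_log_coef(b):
    """Percent change in the outcome implied by coefficient b on a 0/1
    regressor when the dependent variable is in logs."""
    return 100.0 * (math.exp(b) - 1.0)

post_ts_effect = percent_change_from_log_coef(-0.54)    # post-TS indicator
superstar_effect = percent_change_from_log_coef(0.30)   # superstar indicator
print(f"post-TS: {post_ts_effect:.1f}%  superstar: {superstar_effect:.1f}%")
```

For small coefficients the raw value is a reasonable approximation of the percentage change, but at magnitudes such as 0.54 the exact conversion differs noticeably from reading the coefficient directly as 54%.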
The previous analysis indicated that album survival has suffered in the post period – a
period characterized by the presence of P2P sharing networks. To analyze whether this drop in
survival might be attributable to sharing, we now focus on how intensity of sharing affects
Table 4 presents the estimation results without the instrumental variable. The impact of
sharing is positive but insignificant with Shares_debut. Taken at face value, the positive sign would
suggest that sharing is beneficial, with more sharing leading to longer survival. However, as noted earlier in the model discussion,
this estimate may be spurious and we now incorporate an instrumental variable for more robust
analysis.
[6] Solo female is the reference category.
[Table 4, recoverable entries only: Minor label 0.10 (0.9); R² = 0.58; Adjusted R² = 0.57]
models (2) and (3). In the first stage we estimate (3), and in the second stage we estimate (2)
with the predicted values from (3). We estimate this model with the four-month pre- and post-event
samples around the RIAA announcement event described in Section 3 (Feb-May 2003 and July-Oct 2003).
Table 5 reports the estimation results with the instrument. In the first stage regression,
the instrument RIAA announcement indicator is highly significant and negative. The estimated
sharing decrease linked to the RIAA announcement (threat to sue file sharers) is approximately
80%. Debut rank is also highly significant and negative, indicating that less popular albums
(which debut at higher numerical rank) have significantly fewer copies available for sharing.
The first stage results also indicate that albums from superstars and those released by groups are
shared less. The R² of the first stage model is approximately 0.38. The second stage analysis
indicates that, overall, sharing does not significantly affect survival (the sign is negative, but
Table 5: Overall Impact of Sharing on Survival using Instrument
[Fit statistics only: R² = 0.38 (first stage), 0.48 (second stage)]
Conclusion
Our modeling approach, and the extensive data set at our disposal in particular, provide
a rigorous analysis of, and insight into, an extremely important topical question: the effect of free
information sharing networks on mostly copyrighted goods, and its related impact on intellectual
ii) the superstar effect appears to be alive and well;
iii) albums from minor labels are increasingly narrowing the gap with those from
major labels;
References
Abowd John and C. Kang (2002), “Simultaneous Determination of Wage Rates and Tenure”,
Adler, M. 1985. Stardom and Talent. The American Economic Review 75(1) 208-212.
Alexander, P.J., 2002. Digital distribution, free riders, and market structure: the case of the music
Bakos, Yannis, Brynjolfsson, Erik and Lichtman, Douglas, “Shared Information Goods”. Journal
Barua, A., Konana, P., Whinston A. B., Yin, F. 2001. Driving E-Business Excellence. Sloan
Boehlert, Eric. 2001. Pay For Play. Salon.com, March 14, 2001
Boston, W. 2000. Bertelsmann is betting that users, content rule with Napster deal. The Wall
Bradlow, E. T., and Fader, P. S. 2001. “A Bayesian Lifetime Model for the “Hot 100” Billboard
Songs.” Journal of the American Statistical Association, Vol. 96, No. 454, pp. 368-381, June
2001.
Brynjolfsson, Erik and Smith, Michael, “Frictionless Commerce? A Comparison of Internet and
Chung, K.H and Cox A.K. 1994. A Stochastic Model of Superstardom: An Application of the
Conner, K. R., and Rummelt, R. P. 1991. Software Piracy: An Analysis of Protection Strategies.
Crain, W. Mark; Tollison, Robert D. 2002. Consumer Choice and the Popular Music Industry: A
Dhar, Ravi and Wertenbroch, Klaus. 2000. “Consumer choice between hedonic and utilitarian
Evangelista, B. 2000. CD soars after Net release. The San Francisco Chronicle, Oct 12, 2000,
B2.
Fader, Peter, S. 2000. “Expert Report of Peter S. Fader, Ph.D.”, in Record Companies and Music
Publishers vs. Napster, July 26, 2000, United States District Court, Northern District of
California.
Givon, M., Mahajan, V., and Muller, E.. 1995. Software Piracy: Estimation of Lost Sales and the
Gopal, R. D., S. Bhattacharjee, G. L. Sanders, “Do Artists Benefit From Online Music
forthcoming, 2002.
Gopal, R.D., Sanders, G. L., Bhattacharjee, S., Agrawal, M., Wagner, S. 2002. A Behavioral
Commerce, forthcoming.
Green, H. “Kissing Off The Big Music Labels”, Businessweek, September 6, 2004.
Hamlen, W. A. Jr. 1991. Superstardom in Popular Music: Empirical Evidence. The Review of
King, B. 2000a. Napster: Music's Friend or Foe? Wired.com News, June 14, 2000.
King, B. 2000b. Napster's Good? Bad? Er, What? Wired.com News, June 15, 2000.
Krider, Robert E. and Charles B. Weinberg. 1998. “Competitive dynamics and the introduction of
new products: The motion picture timing game”, Journal of Marketing Research, Vol. 35
Leibenstein, H., “Bandwagon, Snob, and Veblen Effects in the Theory of Consumer Demand,”
Liebowitz, S. 2004. Will MP3 Downloads Annihilate the Record Industry? The Evidence So Far.
Advances in the Study of Entrepreneurship, Innovation, and Economic Growth, Vol. 15,
MacDonald, G.M. 1988. The Economics of Rising Stars. The American Economic Review 78(1)
155-166.
Mathews, A.W. 2000a. Sampling Free Music Over the Internet Often Leads to a Sale – Poll
Adds to Conflicting Data As Recording Industry Sorts Out Web’s Impact. Wall Street
Mathews, A.W. and M. Peers. 2000. Teen Music Buying Dropped Last Year, According to Data.
Mixon, F.G. and R.W. Ressler. 2000. A Note on Elasticity and Price Dispersions in the Music
Moe, Wendy W. and Peter S. Fader. 2001. “Modeling hedonic portfolio products: A joint
segmentation analysis of music compact disc sales”, Journal of Marketing Research, vol 38
Montgomery, Alan, L. and Moe, Wendy, W. 2000. Should record companies pay for radio
airplay? Investigating the relationship between album sales and radio airplay. Working paper,
Nelson, P. 1970. Information and Consumer Behavior. The Journal of Political Economy 78(2)
311-329.
Oberholzer F. and K. Strumpf. “The Effect of File Sharing on Record Sales: An Empirical
Peers, M. and L. Gomes. 2000. Music CD Sales Suffer in Stores Near `Wired' Colleges, Study
Radas, Sonja and Steven M. Shugan (1998). “Seasonal marketing and the timing of new product
Ravid, S.A. 1999. Information, Blockbusters and Stars: A Study of the Film Industry. Journal of
Rosen, S. 1981. The Economics of Superstars. The American Economic Review 71(5) 845-858.
Sawhney, Mohanbir S. and Jehoshua Eliashberg. 1996. “A parsimonious model for forecasting
gross box office revenues”, Marketing Science, vol. 15(2), pg. 113-131.
Seabrook, J., “The Money Note: Can the Record Business Survive?” The New Yorker, pp. 42-55,
July 2003.
Simon, G. 2003. Disharmony Over Music Pirates on the Internet, The Telegraph, Jan. 9, 2003, at
http://www.telegraph.co.uk.
Strobl, E.A. and Tucker C. 2000. The Dynamics of Chart Success in the U.K. Pre-Recorded
Towse, R. 1992. The Earnings of Singers: An Economic Analysis, in R. Towse and A. Khakee