Open access in context: a user study

David Nicholas (School of Library, Archive and Information Studies, University College London, London, UK)
Paul Huntington (School of Library, Archive and Information Studies, University College London, London, UK)
Hamid R. Jamali (School of Library, Archive and Information Studies, University College London, London, UK)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 23 October 2007

Abstract

Purpose

The purpose of this research is to examine the impact on usage of the journal Nucleic Acids Research (NAR) moving to an open access model. A major objective was to examine the impact of open access in the context of other initiatives that have improved accessibility to scholarly journals. The study also aims to demonstrate the potential of deep log analysis for monitoring change in usage over time.

Design/methodology/approach

Data were gathered from the logs for the period 2003‐June 2005 and analysed using deep log methods. The data were analysed to provide the following information on use: type of item viewed; usage over time; usage for individual journal issues; usage per type of article; age of article. Usage analyses were further examined with regard to the following user characteristics: subscriber/non‐subscriber; referrer link employed, organisational affiliation; geographical location.

Findings

The analysis showed that the rise in use of NAR over the survey period (140 per cent) could largely be attributed to the opening up of the site to search engines and that the move to OA had a relatively small influence on driving usage up further (less than 10 per cent).

Originality/value

The study for the first time thoroughly analyses the usage data of a significant experimental open access journal and reveals the huge impact of search engines on driving up usage.

Citation

Nicholas, D., Huntington, P. and Jamali, H.R. (2007), "Open access in context: a user study", Journal of Documentation, Vol. 63 No. 6, pp. 853-878. https://doi.org/10.1108/00220410710836394

Publisher

Emerald Group Publishing Limited

Copyright © 2007, Emerald Group Publishing Limited


1. Introduction

Proponents of open access publishing sometimes argue their case on the fairness of providing everyone with access to scholarly material and, by so doing, creating a scientific/academic level playing field. It is hypothesised that there is a world full of scholars who would use journals if they only could, and the fact that they cannot (largely because of cost) is simply not acceptable in the internet age, an age where information is ubiquitous and free. Therefore, there would be an expectation that greater usage and a wider readership would result from a journal becoming open access.

This paper, by chronicling the move of a highly prestigious journal, Nucleic Acids Research (NAR), from a subscription to an open access model during 2004/2005, tests this assertion. It is based upon a project funded by OUP, and the key outcomes from the study for publishers have appeared in an article in Learned Publishing (Nicholas et al., 2007). These outcomes were that scholarly journals like NAR have opened their contents to non‐subscribers for some time, that open access publishing is just one in a long line of access initiatives, and that its impact pales in significance beside that of search engines like Google. This paper examines the detailed findings of the study from an information science perspective and focuses on the methodology adopted.

2. Aims

The first aim of this paper is to establish, through transactional log analysis, what happens to use and users of a journal (Nucleic Acids Research), originally available on subscription, when it becomes available to all, as a result of its publishers (OUP) moving it to an open access (author pays) model. The second aim is a methodological and information science one, which is to demonstrate a robust methodology through which change in usage can be monitored over time and the agents of change identified, in the case of this paper, specifically the move to open access.

Because open access is only one of a number of mechanisms which enhance a journal's accessibility, the impact of open access is examined in the light of the performance of other access initiatives – making the journal available via consortia deals and search engines and their robots, existing free‐to‐view content resulting from the production of special issues, the ending of a (six month) time embargo and access via gateway sites, like PubMed Central. This is necessary not only because it benchmarks the open access initiative but also because the “other” access initiatives have the potential for muddying the waters as far as the impact of open access is concerned.

The transactional log methodology developed for the purpose of evaluating the behaviour of the virtual scholar, deep log analysis, was the preferred methodology by which impacts were monitored and evaluated. Logs represent the real actions of users, not their wishes, hopes or aspirations, and as such provide a good means of testing the assertion that there is a potential demand for scholarly information that is not being met and that open access provides a possible solution. Simply put, logs could determine whether open access led to more use and whether this use came from the kind of people whom it aspires to reach (those currently not benefiting from the subscription model).

The following analyses were conducted:

  1. type of page viewed (abstract, full‐text);

  2. page views over time;

  3. page views per journal issue;

  4. page views per article (free, non‐free, partly free);

  5. age of articles viewed;

  6. type of user – subscriber/non‐subscriber;

  7. type of user – by referrer link employed;

  8. type of user – by organisational affiliation; and

  9. type of user – by geographical location.

In order to determine change, a before and after analysis was adopted. With NAR moving partly to an open access model in 2004 and then to a fully open access model in January 2005, it was necessary to conduct a study over a 30 month period, January 2003‐June 2005.

3. Nucleic Acids Research

The journal is highly rated having an ISI impact factor of 7.26 in 2004 and being named by ISI as one of the ten “hottest” journals of the decade. Before it moved to an open access model much of its content was already free to use. Essentially the content that was only available to subscribers was the most recent six months of content, and although this represented, at any given time, a relatively small percentage of content it was the most prized content.

In 2004 the Database and Web Server issues moved to the “author‐pays” open access model. These issues had been free to all online users prior to 2004, so they were thought to offer a useful and safe test bed. Since January 2005 NAR has been a fully open access journal. The title went open access on the basis of evidence obtained from an author questionnaire which showed authors' support for open access publishing. In a NAR survey in 2004, 63 per cent of respondents stated that they would support a full open access model (Saxby, 2006).

A summary of what was available/not available during the period 2003 to 2005 was as follows:

  1. 2003. The Database Issue, which is the first issue published each year, was available free to view. All other articles in issues during 2003 were only available to subscribers for the first six months of their life. After six months, at the end of the embargo, they were made available to everyone. Further, the Web Server issue was free to view in 2003.

  2. 2004. Most of the journal was under subscription control for the year, except for two large special issues (on databases and web servers), which were published under open access arrangements and the Database issue. The two specials were quite substantial and made up some 20 per cent of papers published during the year.

  3. 2005. All issues and articles available in open access from January and the full text became freely available from the PMC site.

4. Literature review

Citation studies have shown that open access articles were more heavily cited than non‐open access ones. Lawrence's (2001) short paper in Nature was one of the first papers to reveal this; thus this apparent open access advantage is sometimes called the Lawrence Effect (Kurtz et al., 2005). Lawrence's research was taken up by others. A bigger study, which investigated articles from seven thousand journals from the ISI Web of Science, found an increase in research impact for open‐access articles in physics (Brody et al., 2004). The study showed open access/non‐open access citation ratios of 2.5‐5.8, which was even larger than the effects reported by Lawrence (2001), whose study concerned conference papers in computer sciences. Antelman (2004) used ISI citation data of a sample of journals in philosophy, political science, electrical and electronic engineering and mathematics and showed that freely available articles had a greater research impact. However, this advantage did not apply to every field. A study by Kurtz et al. (2005) showed that in astronomy increasing access to articles did not increase the probability that they would be cited. The explanation for this was that there was no sizeable population of astronomers who were both authors of major journal articles and who did not have “sufficient” access to the core research literature. Kurtz et al. (2005) also maintained that the results implied that increasing access above a “sufficient” level had no influence on citation frequency.

One of the most interesting studies was that by Sahu et al. (2005), who used citation data to study the effect of open access on the Journal of Postgraduate Medicine, which adopted open access in 2001. They found, too, that open access was associated with an increase in the number of citations received by the articles and, most interestingly, it also led to a decrease in the time between publication and the first citation. In a longitudinal bibliometric analysis of a cohort of open access and non‐open access articles published in the same journal (Proceedings of the National Academy of Sciences), Eysenbach (2006) showed that articles published as an immediately open access article on the journal site had higher impact than self‐archived or otherwise openly accessible open access articles. He claimed that even in a journal that is widely available in research libraries, open access articles are more immediately recognized and cited by peers than non‐open access articles published in the same journal. He thought that open access publishing was likely to benefit science by accelerating dissemination and uptake of research findings.

Although citation impact has been a major focus of open access studies, some commentators warn that authors do not cite all the articles they read (e.g. Simkin and Roychowdhury, 2003, 2005; Lamport, 2005), so the net should be widened. Harnad and Brody (2004) point out that the ratio of “reads” to “cites” varies by field. For example, Kurtz (2004) reports it as 17:1 and even 12:1 in astrophysics. Thus there is a need for utilising usage data in order to evaluate the full impact of open access on use and users, especially as it is thought that restricting access has an important effect on use. In fact a study by Kurtz (2004) showed that restrictive access policies can cut article downloads to half the free access rate.

It should be mentioned that at the same time as the current study was being carried out, Creaser (2006) undertook a parallel study on NAR and two other OUP journals. She carried out citation analysis of articles in the journals and found out that citation rates for NAR were falling prior to the start of the open access experiment and this trend appeared to be continuing. However, she conceded the fact that at the time of her analysis little citation data were available for open access articles in the studied journals. This was due to the time delay inherent in articles receiving citations.

5. Methodology

Deep log analysis, a methodology employed successfully on evaluations of a number of digital libraries including OhioLINK and ScienceDirect (Nicholas et al., 2005; Nicholas et al., 2006), was employed to profile the usage and user characteristics of NAR. The term “deep” is used because the raw server logs are more deeply mined than those logs provided by proprietary software and furnish a much wider range of use and user metrics, as this paper will demonstrate. It is also appropriate because use is related to user characteristics, in the case of this paper by:

  • referrer link employed;

  • whether the person was a subscriber or not; and

  • geographical location and organisational affiliation of the user.

Raw server transactional log files were supplied for the years 2003, 2004 and 2005 (first six months). The raw log files were loaded into SPSS (Statistical Package for the Social Sciences) and lines related to the viewing of abstracts and full text downloads were selected. The following box gives an example of a full text line from 2005; the internet protocol address has been made anonymous by substituting xxx's for numbers (see Figure 1).

The IP (Internet Protocol) number is a numeric address that is given to users connected to the internet. The date and time is the date and time stamp of when the file was sent to the user's computer. The file name gives an idea of the type of download (full for full text and abstract for abstract) and the volume, issue and page number of the page delivered. Other information includes the status of delivery (codes 200 and 304 for successful downloads), the browser details of the user and, here, also a session identification number. The session number was only available for 2004 and 2005.
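The selection of abstract and full text lines described above can be sketched roughly as follows. This is an illustration in Python, not the SPSS procedure used in the study; the function name and the example paths are hypothetical, and it assumes only what the text states, namely that the file name contains "full" or "abstract" and that status codes 200 and 304 mark successful deliveries.

```python
from typing import Optional

# Status codes treated as successful deliveries, as stated in the text.
SUCCESS_CODES = {200, 304}


def classify_request(file_name: str, status: int) -> Optional[str]:
    """Return 'full_text', 'abstract', or None for a line to be discarded.

    Assumption: the requested file name has a path segment 'full' or
    'abstract'. The real layout of the NAR log lines is not known here.
    """
    if status not in SUCCESS_CODES:
        return None  # failed or redirected delivery
    segments = file_name.lower().split("/")
    if "full" in segments:
        return "full_text"
    if "abstract" in segments:
        return "abstract"
    return None  # tables of contents, images and other files


# Hypothetical path, for illustration only.
print(classify_request("/cgi/content/full/33/1/10", 200))
```

Lines for which the function returns None would be left out of the page view counts.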

The following use/user metrics were generated:

  1. Page or item viewed. This is the main use metric and is the unique item (abstract or article) viewed in a session. Repeat views to the same page within a session were excluded as there was double counting in the logs. This double counting arose because the same article view, particularly in PDF, had repeated entries for the same user session. An additional view was recorded for users requesting a PDF item. Many users who had looked at an HTML full text version then went on to view the same article as a PDF.

  2. Session. A session was considered to have been completed if the lapse of time between requests assigned to the same IP number exceeded seven and a half minutes. Session calculations are complicated in that a user's session may well be longer than the site session recorded on NAR online. This occurs when a user searches a number of sites at the same time, jumping in and out of sites.

  3. Users. Logs provide a user “trace”, but not real user or individual identification. This trace is the Internet Protocol (IP) number, which provides the name of the institution to which the user belonged. The IP number cannot be traced back to an individual, only to a computer – sometimes to a multiple user computer, such as those found in libraries. Furthermore, the IP number might have been allocated temporarily to a computer (floating IP numbers). Similarly, the use of proxy servers means that the IP address cannot be assumed to relate to use on a specific machine.
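The session and page view definitions above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure actually run in SPSS: requests are assumed to be already reduced to (IP, timestamp, item) tuples, where the item identifies the article or abstract regardless of format, so that an HTML view followed by a PDF view of the same article counts once. All names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# A session ends when the gap between requests from one IP exceeds 7.5 minutes.
SESSION_GAP = timedelta(minutes=7, seconds=30)


def assign_sessions(requests):
    """Label each (ip, timestamp, item) request with a per-IP session number.

    Returns a list of (ip, session_no, item) tuples.
    """
    last_seen = {}
    session_no = defaultdict(int)
    labelled = []
    for ip, ts, item in sorted(requests, key=lambda r: (r[0], r[1])):
        if ip not in last_seen or ts - last_seen[ip] > SESSION_GAP:
            session_no[ip] += 1  # first request, or gap too long: new session
        last_seen[ip] = ts
        labelled.append((ip, session_no[ip], item))
    return labelled


def unique_views(labelled):
    """Count each item once per session, dropping repeat views."""
    return len(set(labelled))


# Made-up requests, not data from the study.
t0 = datetime(2005, 1, 10, 9, 0)
requests = [
    ("10.0.0.1", t0, "33/1/10"),
    ("10.0.0.1", t0 + timedelta(minutes=2), "33/1/10"),   # repeat, same session
    ("10.0.0.1", t0 + timedelta(minutes=12), "33/1/10"),  # 10 minute gap, new session
]
print(unique_views(assign_sessions(requests)))  # 2
```

Because sessions are keyed on the IP number alone, the caveats about shared machines, floating IP numbers and proxy servers apply equally to this sketch.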

Before 2005 all NAR content older than six months was accessible to everyone. To access more current articles a subscription was required and the IPs of institutional subscriptions were supplied which meant a distinction between subscribers/registered users and non‐subscribers/registered users could be made.

Robots are electronic agents used by search engines and organisations to log and index pages into a database. Robot activity is recorded as a use in the log file. Robots were identified by their visit to the “robots.txt” file located on the server and their use was excluded (unless specified) from the analysis. Two years before the move to partial open access, robot or agent activity made up just 2 per cent of article views; however, in the May 2004 transition period, robot accesses made up 20 times this figure or 42 per cent of usage, and in the following year made up 20 per cent of article use. Robots have been generally excluded from the analyses, and this is always indicated in the figures.
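One way to express this exclusion rule is sketched below in Python. It assumes the log has been reduced to (IP, requested path) pairs and that any IP that ever requested the robots file is treated as a robot throughout; whether the study applied the rule per IP or per session is not stated, so that choice is an assumption, as are the function names and sample paths.

```python
def robot_ips(requests, robots_file="robots.txt"):
    """Return the set of IPs that requested the robots exclusion file."""
    return {
        ip for ip, path in requests
        if path.lower().rsplit("/", 1)[-1] == robots_file
    }


def exclude_robots(requests, robots_file="robots.txt"):
    """Drop every request made by an IP identified as a robot."""
    robots = robot_ips(requests, robots_file)
    return [(ip, path) for ip, path in requests if ip not in robots]


# Made-up requests, not data from the study.
sample = [
    ("192.0.2.1", "/robots.txt"),
    ("192.0.2.1", "/cgi/content/full/33/1/10"),
    ("192.0.2.2", "/cgi/content/full/33/1/10"),
]
print(exclude_robots(sample))
```

A rule of this kind only catches well-behaved robots that actually fetch the file, which is one reason robot share is reported separately rather than assumed to be zero after filtering.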

Prior to January 2005, PMC users had to visit the OUP site to access NAR full‐text, but this was not necessary after that date. Thus from January 2005 PMC usage of NAR no longer showed up in OUP logs. PMC's own data could not be used because it was not compatible with deep log requirements. This meant a leakage of about 2‐3 per cent at least and, because this would have undermined “the before and after” analysis, PMC data was stripped out.

This paper focuses strongly on methods, and further details of the methods can be found in the results section.

6. Results

During the period January 2003 to June 2005 1,500,000 separate internet protocol (IP) numbers used NAR; they conducted more than 7,500,000 sessions and viewed approximately 13,500,000 unique pages. These are very large and impressive figures, especially for a single journal, albeit a rather large one in terms of the number and size of its issues.

6.1 Use

6.1.1 Article and abstract page views

For the first six months of 2003, 176,000 views were made per month and this rose very sharply from July 2003, reaching a peak of about 287,000 in November 2003 – a 60 per cent increase all told. Apart from a seasonal dip in December, use remained at this level until March 2004. Use then dipped significantly, falling to a relatively low level of 220,000 in August 2004. Use subsequently rose in the September to January period, reaching a new high in January 2005 of 424,000 views. The peak in January 2005 was over twice as much as recorded two years previously in January 2003 (Figure 2).

The main finding is that use of NAR has increased – and dramatically so. It increased by 143 per cent from January 2003 to January 2005[1]. Given that NAR is a well established, research journal with a defined, specialist readership, the fact that so much growth could occur in such a relatively small time has to be regarded as surprising. NAR went open access on 1 January 2005 and, significantly, usage increased between November 2004 and February 2005 by 19 per cent. On the face of it then, there appears to be an open access impact, especially as most of the increase (16 per cent) occurred between December 2004 and January 2005 when the journal changed to full open access.

Consider the differences users searching in December and January encountered. Firstly, those people searching in December who were not registered users would have found that articles published July to December were embargoed and they could not view the full‐text of these articles. Secondly, those searching in January might have been influenced by the announcements of the journal moving to OA – after all, this was the first journal to adopt this form of open access publishing, whereby the existing subscriber model simply changed to an OA one and the publicity would have inevitably generated interest. Hence an increase in usage would have been expected as a result of the journal changing publishing mode.

However, care needs to be taken in ascribing the 19 per cent increase wholly to open access, as growth had been occurring for a long time and major increases had already taken place before 2005. Thus usage had increased in the two year period prior to the move to open access by 143 per cent, an increase of about 4 to 5 per cent a month. The danger is that the factors driving this usage may also be driving the increase in January and February 2005 that appears to be the result of open access.
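How the monthly figure was derived is not stated. As a rough check, the sketch below converts a 143 per cent rise over 24 months into an equivalent monthly rate in two ways: a compound rate, which assumes the same percentage growth every month, and a simple average. Both are simplifying assumptions, since actual use was far from smooth, and the function name is illustrative.

```python
def compound_monthly_rate(total_increase_pct: float, months: int) -> float:
    """Constant monthly growth rate (per cent) that yields the total increase."""
    growth_factor = 1 + total_increase_pct / 100
    return (growth_factor ** (1 / months) - 1) * 100


compound = compound_monthly_rate(143, 24)
simple = 143 / 24  # total increase spread evenly, ignoring compounding

print(f"compound: {compound:.1f} per cent a month")
print(f"simple:   {simple:.1f} per cent a month")
```

The compound rate comes out just under 4 per cent a month and the simple average at about 6, so a quoted figure of 4 to 5 per cent a month is of the right order under either reading.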

An additional complicating factor was that January and February were traditionally recovery periods from the traditional December low. Hence part of the 19 per cent growth posted may be accounted for by this recovery. To get closer to the truth it was decided to compare the month on month percentage increases immediately before and after the open access launch, November 2004 to March 2005, to the same period in the previous year (Table I).

Both periods saw a percentage fall in December although the fall in 2003 was greater: −12.8 per cent compared to −9.9 per cent. The main recovery in 2003/2004 occurred in January when usage increased by 31.2 per cent although much of this was a recovery from the falls in December 2004. This increase was significantly higher than the increase in February 2005 of just 5.4 per cent. Looking at the difference between November and February, the increase over this period was 19.2 per cent in 2004/2005 and 10.4 per cent in the 2003/2004 period. It seems that the increase in 2004/2005 was higher, by nearly as much again, than the increase recorded in 2003/2004. It seems that usage really did increase as a result of OA.

It was decided to focus in on the change in article usage between the three week period before the winter semester break and the three weeks after. Figures 3 and 4, respectively, give the weekly article downloads for 2004/2005 and 2003/2004. Comparing these two periods[2], average daily use grew 6 per cent in 2004/2005 (Figure 3) but fell by 9.5 per cent (Figure 4) in the same periods[3] of the previous year. Taking the previous year as an example, it would have been expected that usage would fall, but usage in the open access period actually increased.
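The before and after comparison of average daily use can be expressed as a small helper. This is a sketch only: it assumes daily view counts are held in a dictionary keyed by date, the function name is hypothetical, and the numbers in the example are made up to show the calculation rather than taken from the study.

```python
from datetime import date
from statistics import mean


def window_change(daily_views, before_days, after_days):
    """Percentage change in mean daily views between two windows of days."""
    before = mean(daily_views[d] for d in before_days)
    after = mean(daily_views[d] for d in after_days)
    return (after - before) / before * 100


# Made-up counts, not data from the study.
daily_views = {
    date(2004, 12, 6): 10000,
    date(2004, 12, 7): 10400,
    date(2005, 1, 10): 10700,
    date(2005, 1, 11): 10900,
}
before_days = [date(2004, 12, 6), date(2004, 12, 7)]
after_days = [date(2005, 1, 10), date(2005, 1, 11)]
print(f"{window_change(daily_views, before_days, after_days):.1f} per cent")
```

Running the same function over the matching windows of the previous year gives the baseline against which the open access year is judged.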

It is apparent from Figure 3 that usage from January was more assured. The highest weekly use for 2004, of 72,935 article views, was recorded in week 50 – the week beginning 9 December. This figure was then exceeded in the second week of January 2005 by 4 per cent and again in week 5 by 6.3 per cent. In fact in the 2005 period considered, the high in December 2004 was exceeded in all weeks except the first. This only happened once during the 2003/2004 period (see Figure 4).

6.2 Session analysis

6.2.1 Type of page viewed (article, abstract)

Views to table of contents and lists of issues have been excluded for this analysis because open access is all about opening up full‐text to users who could not access it because of the subscription model and therefore the concern is only with access to content. Figure 5 gives the daily use of items (articles or abstracts) viewed within an online session. Article only sessions accounted for about two‐thirds of use, abstract only sessions for a quarter and sessions where both were viewed accounted for 10 per cent of sessions. Given that users had a choice to view an article or abstract the fact that one in four sessions just viewed abstracts is evidence of the enduring popularity of the abstract.

Table II gives the percentage month on month changes in usage for the periods before and after the introduction of open access. Although there was a lift in abstract only use in December 2004 of 27 per cent, this was in part compensation for the fall in abstract usage in November, and there is a downward trend in abstract only session use in the first three months of 2005. The real growth in usage came from article only sessions (a 19 per cent increase in January) and from mixed sessions, where articles and abstracts were viewed (a 25 per cent increase in January). Thus, as a result of moving to an open access model, the signs were of a shift from abstract only sessions to a greater use of articles. This would have been expected.

Focusing on the three week period before the winter semester break (weeks 48 to 50, 2004) and the nine weeks following the break (weeks 2 to 10, 2005), abstract only sessions decreased by 13 per cent but article only usage increased by 7 per cent; sessions viewing both increased by 10 per cent. This provides more evidence of a shift from abstracts to articles, as a result of the move to open access.

The high proportion of article only sessions is partly a function of the fact that for PubMed users, who made up about a third of users (Figure 6), abstracts were viewed on the NAR site as part of a process of sifting articles for relevance, and partly of the fact that it was just as easy to view an article as an abstract. Thus, looking at users who did not access the site via PubMed, just under three‐fifths (58 per cent) viewed articles, 27 per cent abstracts and 16 per cent both. This compares to just 1 per cent of PubMed users who viewed abstracts on the NAR site; 85 per cent of these users viewed articles and 14 per cent viewed articles and abstracts on the site.

6.2.2 Type of article viewed (free/non‐free status)

Open access is a mechanism for making articles free to view and the impact on usage of articles once not free to view is a key part of the investigation. Of course, free material was already being offered free prior to the introduction of full open access and its impact must be seen in that context. For the purpose of this analysis free status articles were:

  • articles available free after release from a six month embargo period;

  • articles from the two issues a year which were released within the (six month) embargo period – these were the first issue in each year and issue 13 – these became open access in 2004; and

  • all articles published once the journal went open access from January 2005.

The earliest available, and used, free issue on the NAR site was Vol. 2 No. 1 (January 1971). Given that NAR publishes 24 issues a year and that issues more than six months old were free to view, then as of January 2003 (Vol. 31) there would be approximately 684 issues available free. The only issues not free would be the latest 12 issues that were still under embargo. Hence about 2 per cent of the available issues on the site were not free at the point of search. Clearly in science, the more recent the articles the more attractive they would be, so these statistics do not actually reflect the importance of what was being offered as a consequence of open access. This material was actually the “diamonds in the mine”.
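The arithmetic behind these round figures can be reconstructed as follows. The reconstruction is an assumption: it counts volumes 2 to 30 as complete volumes of 24 issues each, which is the reading that reproduces the figure of 684, and it ignores any variation in the number of issues per volume over the journal's history.

```python
ISSUES_PER_YEAR = 24
EMBARGO_MONTHS = 6

# Assumption: volumes 2 to 30 were online by January 2003 (start of Vol. 31),
# each counted at 24 issues.
volumes_online = 30 - 2 + 1                          # 29 volumes
total_issues = volumes_online * ISSUES_PER_YEAR      # 696 issues
embargoed = ISSUES_PER_YEAR * EMBARGO_MONTHS // 12   # latest 12 issues
free = total_issues - embargoed                      # 684 issues
embargoed_share = embargoed / total_issues * 100

print(free, embargoed, f"{embargoed_share:.1f} per cent embargoed")
```

On these assumptions the embargoed share is a little under 2 per cent, consistent with the "about 2 per cent" quoted.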

Sessions were grouped by what was viewed: those that just viewed free items, those that just viewed non‐free items and sessions where both were viewed. Free article sessions increased notably in June 2003, the date search engines were allowed in, by about 30 per cent and thereafter increased until January 2005 when all articles became free. In terms of sessions where just non‐free items were viewed, the use of articles remained constant throughout the period.
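The free status rules and the grouping of sessions can be sketched as below. This is a simplified Python reading of the rules as listed, not the study's own code: it assumes the free special issues are issue 1 and issue 13 of each volume, that everything is free when viewed on or after 1 January 2005, and that the embargo is exactly six calendar months from the publication date. All names are hypothetical.

```python
import calendar
from datetime import date

OPEN_ACCESS_START = date(2005, 1, 1)
EMBARGO_MONTHS = 6
FREE_SPECIAL_ISSUES = {1, 13}  # assumed: Database and Web Server issues


def embargo_end(published: date) -> date:
    """Date on which an article leaves the six-month embargo."""
    month_index = published.month - 1 + EMBARGO_MONTHS
    year = published.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day for short months (e.g. 31 August -> end of February).
    day = min(published.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_free(published: date, issue: int, viewed: date) -> bool:
    """True if the article was free to view on the day it was viewed."""
    if viewed >= OPEN_ACCESS_START:
        return True
    if issue in FREE_SPECIAL_ISSUES:
        return True
    return viewed >= embargo_end(published)


def session_group(views) -> str:
    """Group one session; views is a list of (published, issue, viewed)."""
    statuses = {is_free(p, i, v) for p, i, v in views}
    if not statuses:
        raise ValueError("session has no article views")
    if statuses == {True}:
        return "free only"
    if statuses == {False}:
        return "non-free only"
    return "mixed"


# An article published 10 January 2003 in issue 2, viewed in March 2003.
print(session_group([(date(2003, 1, 10), 2, date(2003, 3, 1))]))
```

A non-free view by definition comes from a subscriber or a user linking in from an external site, so the "non-free only" group is a proxy for that kind of access.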

Free article only sessions accounted for about two thirds (63 per cent) of article views in the first two quarters of 2003. This rose to three quarters from the first quarter 2004, but did not rise above 81 per cent. Sessions in which non‐free items only were viewed accounted for about a quarter of all sessions in the first two quarters of 2003 and this fell to 14 per cent by the last quarter of 2004.

Figure 6 gives the distribution of referrer information across quarters in 2003 and 2004 for free item sessions only and helps answer the question of who used free items. At the beginning of the period free item use was predominantly accounted for by users coming in from PubMed (47 per cent) and those coming from elsewhere in the Oxford Journals site (33 per cent) or from the NAR homepage. This changed from the third quarter 2003. From that time free articles were increasingly used by people coming in via search engines and it was this group that was behind the increased take‐up of free articles. Those coming in via search engines made up a quarter of free item use in the third quarter of 2003; however, they had increased their use share of free articles to over a half (51 per cent) by the fourth quarter of 2004.

In conclusion, it can be stated that the long term growth in article usage was driven by the demand for free articles.

6.2.3 Age of article

It was decided to focus on articles six months old or more recent. Before January 2005 these would have been embargoed and only available to subscribers and users linking in from an external site such as PubMed or PMC. After January 2005 articles six months old or more recent were free to access. This analysis examines the question of whether the views to these newer articles had increased directly as a result of the journal going open access. Figure 7 gives the number of views to older and younger material, and there does appear to be quite a major open access impact. There was a rise in usage of material six months old or more recent after January 2005. The rise in usage in January 2005 holds both for sessions where just recent material was viewed and for sessions where both young and old (Mixed) material was viewed (solid black line). This is looked at in more detail below.

It was decided to investigate the daily average[4] for the months November 2004 to March 2005 (Table III). A 5 per cent trimmed mean was used to handle some of the fluctuations in the data; however, it did not fully smooth them out. Thus the percentage increase in usage of older material between December and January was about 21 per cent. However, some of this increase represented a recovery from the low December semester break figure, which showed a fall of 11 per cent. A comparable trend was recorded for the period 2003 to 2004, although the pickup from the December fall (−7 per cent) did not really happen until February (25 per cent). Generally, the percentage increase of older material between December 2004 and January 2005 was much in line with what was expected.
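A 5 per cent trimmed mean discards the lowest and highest 5 per cent of observations before averaging, which dampens the effect of unusually quiet or busy days. The sketch below is one simple way to compute it in Python; it drops a whole number of observations from each end, whereas statistical packages may handle fractional counts differently, so it is not claimed to reproduce the figures in Table III. The sample values are made up.

```python
import math


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the lowest and highest `proportion` of values."""
    if not values:
        raise ValueError("no values supplied")
    ordered = sorted(values)
    k = math.floor(len(ordered) * proportion)  # observations cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Twenty made-up daily counts with one extreme day.
daily = list(range(1, 20)) + [1000]
print(trimmed_mean(daily))     # 10.5
print(sum(daily) / len(daily))  # plain mean, pulled up by the extreme day
```

With only around 30 daily values in a month, a 5 per cent trim removes a single day from each end, which may explain why the text reports that some fluctuation remained.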

Average daily article views for sessions viewing only the most recent six months of material increased by 42 per cent on NAR becoming open access. Again, part of this increase reflects the poor figures recorded in December, which recorded a fall of 11 per cent on the previous month. Comparisons with the previous year (2003‐2004) were made difficult as a result of a 64 per cent month‐on‐month increase recorded in February. This increase was exceptional and should be considered in the light of the month on month fall in December (2003) of 27 per cent, which was followed by a fall in January (2004) of 3 per cent and in March of 17 per cent. It is believed that the real increase in views for sessions just viewing recent material in January 2005 was about 40 per cent.

With regard to those people viewing both old and recent material in a session their use of articles doubled (103 per cent) between December 2004 and January 2005. Clearly these users have taken up the open access opportunity and increased their usage; this in part reflects the number of sessions including a view to recent material in their sessions. Interestingly this group's usage declined by 11 per cent in February.

A further investigation was undertaken to examine a narrower time window: the three week period (weeks 48 to 50) prior to the winter semester break was compared to the nine weeks following the break (weeks 2 to 10). The use of material six months and older increased just 3.5 per cent, while recent material, which was previously embargoed, increased by 27 per cent and use by those sessions viewing both increased by 79 per cent.

In conclusion, there has been an increase in the use of more recent material as a result of it becoming freely available under open access. This increase is currently quite small in absolute numbers; however, this may increase as users become more aware of the full implication of what has occurred.

It might be expected that open access would benefit the type of user who likes to keep up to date, since with OA “current awareness” now comes free, and that such users would return more frequently. However, the study period was too short to test this assertion. The above analysis of recent (below six month) material suggests a positive result – a 40 per cent increase in those just looking at current material.

Clearly there is a digital visibility (Nicholas et al., 2002) issue here: how do users interested in “currency” know the journal has gone OA? The gain in use of mixed‐age material suggests that users are finding recent material as a result of exploring or linking in the site. It would be expected that, in the long term, users would become aware that recent material is now free. So the full impact of OA may well take a period (18 months) to be realised.

6.3 Issue‐level analysis

About 2 per cent of all articles in the NAR database were embargoed for a period of six months, which meant only registered users could view them. What happened to these articles after the embargo ended will now be examined. In this regard Figure 8 charts article views over time (2003 to mid 2005) for the second issue of NAR published in 2003. This issue was first made available on 10 January. Use in the first two days was high and reached over 1,000 views but fell steeply to about 200 views a day within a month. Clearly the initial high is explained by current awareness use and digital visibility. By the end of March the issue had reached the beginning of the long, but clear, tail‐off in use of approximately 50 to 75 views a day.

Figure 9 gives the monthly view of the long tail‐off in usage of this issue. The issue became free on 10 July 2003, six months after publication. Interestingly, there was no visual evidence of a lift in usage as a result of the issue becoming freely available; it seems that OUP knew what they were doing in opting for a six‐month embargo, because that was where maximum use was concentrated. However, there is visual evidence that usage did not carry on declining at the same rate: there was even growth running up to December 2003, although clearly this is a growth period in usage. In this case making the articles freely available appears to have had only a limited impact on usage. There appears to be no pent‐up demand as a result of the embargo period for this issue. In all there were 41,061 article views to this issue in 2003: 27,323 were non‐free views and 13,738 were free views.

6.4 Users

In 2003 approximately 417,000 unique IP numbers accessed the site; the number of new IP numbers visiting in 2004 was about 598,000 and a further 455,000 new IPs visited in 2005 (half year). “New” in this context means excluding those who visited the previous year.
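The definition of “new” above amounts to a set difference between one year's IP numbers and those of the year before. The sketch below illustrates the idea under that literal reading; the function name `new_ips_by_year` and the sample addresses (taken from a range reserved for documentation) are hypothetical, and the study's actual procedure may have differed.

```python
def new_ips_by_year(ips_by_year):
    """Map each year to the IPs that were not seen in the preceding year."""
    result = {}
    for year in sorted(ips_by_year):
        previous = ips_by_year.get(year - 1, set())
        result[year] = set(ips_by_year[year]) - set(previous)
    return result

# Hypothetical addresses, NOT data from the NAR logs.
logs = {
    2003: {"192.0.2.1", "192.0.2.2"},
    2004: {"192.0.2.2", "192.0.2.3"},
    2005: {"192.0.2.3", "192.0.2.4", "192.0.2.1"},
}

counts = {year: len(ips) for year, ips in new_ips_by_year(logs).items()}
print(counts)
```

Under this reading an address seen in 2003, absent in 2004 and seen again in 2005 counts as new in 2005. A stricter reading would exclude every address seen in any earlier year, which would lower the later counts.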

Figure 10 gives the daily (by date of first access) distribution of new IP numbers accessing the NAR site. The daily rate of new IP addresses accessing the site remained within a band of about 1,000 to 2,000 up until September 2004. There was an increase in the number of new IPs accessing the site during the period September to December 2004, the typical growth period for the site corresponding to the busy autumn semester. There has been a decrease in the number of new IP numbers joining from April/May 2005. There does not appear to have been an increase in new IPs as a result of open access (marked by the dotted line), which might have been expected.

6.4.1 Types of user

Registered users (subscribers) v. non‐registered users. Clearly the success of open access will be determined by whether it draws in non‐subscribers. Figure 11 examines article usage by day, broken down by whether the user was registered or not. Use by registered scholars for the first six months of 2003 was running at more than double that of non‐registered users: approximately 2,900 views a day compared to about 1,300. From July 2003 the gap narrowed as use by non‐registered users increased at a faster rate. In fact, by October 2004 non‐registered article use nearly matched registered article use (3,900 daily views). Usage by both groups reached a peak of about 5,600 daily views in January 2005. The extent of non‐registered use shows that there was a strong demand for NAR outside of the consortia/big deals.

Use by both registered and non registered users appeared to follow the academic pattern with less use recorded in the summer, when many researchers were away at conferences, and greater use recorded in the winter and spring period. In addition both groups recorded a strong increase in usage, especially in the period October 2004 to March 2005, when another driver was impacting (see following search engine section). Usage by non registered users increased at a faster rate compared to registered users particularly in 2003 – 140 per cent, compared to 35 per cent for the period May 2003 to November 2003.

A more focused analysis comparing the three‐week period (weeks 48 to 50) prior to the semester break with the nine‐week period after it (weeks 2 to 10) suggests that, counterintuitively, use by registered users actually increased in the period following open access relative to use by non‐registered users.

An examination of the distribution of referrer links by registration status showed that registered users (predominantly academic organisations) most often entered the site via PubMed (40 per cent), but increasingly did so from search engines (25 per cent) and also via OUP menus (22 per cent). For non‐registered use it was mainly via search engines (40 per cent) and PubMed (25 per cent).

In conclusion, use by non‐registered users has always been an important feature of NAR. The door was always open to such users (via PubMed Central) and as the door opened even wider (because of open access and search engine access) then this has driven usage levels even higher.

Organisational affiliation. The change in the type of people using the journal was investigated. A reverse DNS lookup on the IP numbers was completed and organisation type was extracted. DNS information was not available for 51 per cent of IP addresses. Figure 12 gives the distribution of organisation type for registered users for each quarter from 2003 to the second quarter 2005. The main organisation type, as expected, was academic and accounted for about 87 to 90 per cent of registered use.
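A reverse DNS lookup of this kind can be sketched as follows. The paper does not describe how organisation type was derived from the resolved hostname, so the suffix rules in `organisation_type` are assumptions made for illustration, as are both function names; `socket.gethostbyaddr` is the standard library call for the lookup itself.

```python
import socket

def reverse_dns(ip):
    """Return the hostname for an IP number, or None if it cannot be resolved."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None

def organisation_type(hostname):
    """Classify a hostname by its domain suffix (assumed rules, not the study's)."""
    if not hostname:
        return "unknown"
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] == "edu":
        return "academic"            # e.g. host.example.edu
    if len(labels) >= 3 and labels[-2] in {"ac", "edu"}:
        return "academic"            # e.g. host.example.ac.uk, host.example.edu.au
    return "other"

# Hypothetical hostnames; in practice these would come from reverse_dns(ip).
for name in ["library.example.ac.uk", "dsl-pool.example.net", None]:
    print(name, "->", organisation_type(name))
```

Addresses with no resolvable hostname fall into "unknown", which corresponds to the share of IP addresses for which DNS information was not available.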

Examining the month‐on‐month figures (Table IV) for academic and net provider use (organisations registered as internet access providers), January saw academic use record its highest increase, of 30 per cent; however, this was making up for the low December figure (−19 per cent). The overall pattern is muddied by the usage increase in March by net providers, which may reflect users accessing the NAR service from home during the spring break. The spring break had a negative impact on academic usage (−4.4 per cent). Taking the November 2004 to February 2005 period, there was little difference in growth of article usage between academic and net provider users: 16 per cent and 17 per cent respectively.

Geographical location. Clearly OA is potentially hugely beneficial to people in poorer countries, where users cannot afford access to what are, to them, expensive databases.

Figure 13 gives the item (articles and abstracts) use by month by DNS country type. The main growth in use has come from USA‐registered users, who viewed approximately 41,000 items in January 2003 and about 150,000 in January 2005, an increase of 200+ per cent. However, many users with a DNS address registered in the USA would in fact have been located abroad. More certainty can be had of the other geographical designations.

African usage continued to be rooted to the floor of the Figure, but there were signs that during 2005 use grew. South American use, similarly towards the floor of the Figure, also appeared to be on a slow rise, possibly boosted by open access. However, the greatest impact appears to be on East European countries (Figure 14).

Table V gives the month‐on‐month percentage changes over the period covering the introduction of open access. The biggest increase was the growth in usage in January 2005 of 88 per cent. Part of this increase was a clawing back of the traditional December loss (−15.4 per cent), a consequence of the winter semester break. The period November 2004 to February 2005 saw article usage increase by 59 per cent. However, this period was a seasonal growth period for Central and Eastern European countries and in the previous year (2003/2004) this region notched up an increase of 30 per cent.
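Month‐on‐month changes compound rather than add, which is why a large January rise partly just recovers a December fall. The sketch below, with a hypothetical helper named `compound`, applies the December and January figures quoted above in sequence to give the net change relative to November; it is an illustration of the arithmetic, not a reconstruction of how Table V was produced.

```python
def compound(changes_pct):
    """Net percentage change after successive month-on-month changes."""
    factor = 1.0
    for pct in changes_pct:
        factor *= 1.0 + pct / 100.0
    return (factor - 1.0) * 100.0

# December (-15.4 per cent) followed by January (+88 per cent), from the text.
net_nov_to_jan = compound([-15.4, 88.0])
print(f"{net_nov_to_jan:.1f} per cent above the November level")
```

On these two figures alone, January usage stands at roughly 59 per cent above the November level, noticeably less than the headline 88 per cent.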

A comparison of the three week period prior to the winter semester break with the nine week period following the end of the break for 2004/2005 and the previous year 2003/2004 showed that article usage in the open access year increased by 43 per cent compared to 26 per cent in the previous year. So there is clearly an open access effect although this effect has occurred at a time when this region seasonally increases its usage. It is estimated that the increase in article usage by East Europeans as a result of going open access is about 20 per cent.

Referrer link. The interest here lies with the last site from which NAR was accessed. It is known from previous CIBER studies (Nicholas and Huntington, 2004) that net provider users will tend to use search engines to access and search the net. This use of search engines shows up in the referrer logs. It was decided to examine changes in the last site visited for both registered and non‐registered users. Referrer information was not available for a third (32.9 per cent)[5] of IP addresses.

Figure 15 gives the monthly article use by referrer group for 2003 to the second quarter 2005. Referrer information was classified into five groups: OUP journals, PubMed, Search engine, external link and other. OUP menus referrer users were those who accessed via the NAR home page or from other OUP web site pages. With regard to external links, these users had arrived at the main OUP site via an external organisation or website, such as a link on a university site or a search engine.

PubMed is an external (gateway) website from which users could access NAR material; it is a service of the US National Library of Medicine and is administered by NCBI. PubMed includes links to full text articles and other related resources, for instance to NAR. Search engine users were classified as just those using one of the following search engines: Google, Yahoo, MSN, Altavista, Ask and Earthlink. “Other” relates to all other referrer groups (for example www.highwire.stanford.edu, www.genome.org, www.sciencedirect.com).
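A grouping of this kind can be approximated by matching the host name of each referrer URL. The following is a minimal sketch only: the search engine list comes from the text, but the PubMed host (`ncbi.nlm.nih.gov`), the OUP domain labels and the function name `referrer_group` are assumptions, and because the text does not give the rule separating “external link” from “other”, the sketch leaves those two groups combined.

```python
from urllib.parse import urlparse

SEARCH_ENGINES = {"google", "yahoo", "msn", "altavista", "ask", "earthlink"}
PUBMED_HOST = "ncbi.nlm.nih.gov"          # assumed host for PubMed
OUP_LABELS = {"oup", "oxfordjournals"}    # assumed labels for OUP pages

def referrer_group(referrer):
    """Assign a referrer URL to a broad group based on its host name."""
    if not referrer:
        return "unknown"                  # empty referrer field in the log
    host = (urlparse(referrer).hostname or "").lower()
    if not host:
        return "unknown"                  # e.g. a value without a URL scheme
    labels = set(host.split("."))         # whole labels, so "alaska" != "ask"
    if labels & OUP_LABELS:
        return "oup"
    if host == PUBMED_HOST or host.endswith("." + PUBMED_HOST):
        return "pubmed"
    if labels & SEARCH_ENGINES:
        return "search engine"
    return "other or external link"

print(referrer_group("http://www.google.com/search?q=nucleic+acids"))
print(referrer_group("http://www.sciencedirect.com/"))
```

Matching on whole domain labels rather than substrings avoids counting a host such as `www.alaska.edu` as the Ask search engine.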

There has been a substantial shift in the make‐up of users by referrer link ‐ most notably there has been an increase in the share of accesses via a search engine. The increase has come from a position where they accounted for less than 1 per cent in the second quarter of 2003 to account for about a half (49 per cent) of sessions in the second quarter 2005. This represents a dramatic and major shift in scholarly information seeking behaviour. However, other groups have remained relatively stable.

Search engine robots were allowed to index the site for the first time in June 2003. Thus from June 2003 users who favoured using a search engine to access web based scholarly information could find the NAR site. This represented a shift in the way that users could navigate to NAR and to navigate to information within NAR. With a search engine users could both navigate the net to find the site and navigate the site to find content.

Thus in June 2003 the digital “flood gates” were opened, and in the global environment that is the web there is clearly the prospect of large demands being made, although one might have thought that the highly specific and difficult nature of the content would afford some protection.

Daily movements. Figures 16‐18 give the daily movement of each referrer link category and these show real differences. Figure 16 plots article use by those finding the NAR via a search engine, Figure 17 concerns OUP menus and external links and Figure 18 looks at daily movements of the Other and PubMed categories.

The daily movement of those entering via a search engine highlights the falling of use in the summer months. Usage more than halved between May and July. This is thought to reflect the fact that search engines are most likely to be used by students, a group of users who are unlikely to use the service in the summer vacation months.

A comparison of article usage by those entering via a search engine during the three‐week period (weeks 48 to 50) before the semester break with the nine‐week period (weeks 2 to 10) after the break shows little change.

Figure 17 details users coming in via OUP menus and external links. Use by either of these two groups did not fall dramatically during the summer vacation months.

A comparison of article usage before and after the Xmas semester break showed there was a 30 per cent increase in usage from those accessing the site via OUP menus. This is thought to be explained by the fact that those users coming into the site via the home page were more likely to be alerted to the fact that all of NAR was now free.

Figure 18 gives the daily use movement for those accessing the site via PubMed and Other referrer links.

Usage by those entering the site via PubMed increased by 12 per cent in the nine‐week period (weeks 2 to 10) following open access, and this might reflect the updating of links on the PubMed site.

7. Conclusions

By drilling deeply into the logs we have been able to determine precisely what agents and initiatives are actually responsible for use and how that use is apportioned. We believe this is the first time this has been undertaken. Thanks to the methodology adopted it has also been possible to identify specific user communities to determine the relative impact of OA. Perhaps ironically, the main finding of a study which set out to investigate the impact of open access publishing proved to be the major impact that search engines were having on NAR. Search engines were the main factor in driving up usage and changing the character of the NAR user base. Search engines moved from a position where they accounted for less than 1 per cent of sessions in the second quarter of 2003 to about a half of sessions in the second quarter 2005.

By contrast, and as a consequence not supporting the findings of many citation studies, open access publishing had a relatively small immediate impact, adding in total less than 10 per cent to usage. The size of the impact could be explained in part by the fact that much content was free already. There were important regional variations: there was approximately a 30 to 40 per cent overall increase, after taking into account the seasonal growth, in article usage from Eastern Europe that may be attributed to OA. There is evidence of an increase in usage of recent articles (less than six months old) that became free as a result of OA, particularly where these articles were found in a session where users were also searching for older, historically free material. This suggests that users were making use of content available under OA as and when the need occurred, or as they found this material. However, because recent articles constituted a small percentage of the population of articles, the impact will inevitably be relatively small. Undoubtedly there has been an OA effect, but we are not comparing, in the case of NAR, an OA state with a non‐OA state. In the period before OA, NAR journal articles older than six months were available free, so the comparison here is between a halfway point and full OA. However, perhaps what was most surprising is that so much growth could occur in the case of a specialist, advanced scholarly journal, which one would have assumed to have a limited audience, just by opening the site to search engines or the so‐called “Google Generation”. That is, it might be expected that usage of a journal going from a complete non‐OA state to an OA state, which would include the site being opened up fully to search engine indexing, would increase three‐ or four‐fold.

In fact, the NAR figures suggest that this increase will not occur overnight but will develop over a period, and further that the impact is moderated by how users find out about OA and how digital services are used. In the case of NAR moving from a halfway point to full‐blown OA, it would be assumed that there is some further growth in the pipeline, predominantly from users in second and third world economies. It is forecast that this will affect overall usage by as much as 20 per cent.

The findings show how important it is not to look at open access too narrowly or simplistically, and to consider it in the light of a number of factors that have led to scholarly journals becoming more accessible and used. As it turns out, in this case, the search engine has proved a real friend of the digital scholarly user in opening up scholarly content.

Figure 2  Number of articles viewed, smoothed, by day – January 2003‐June 2005

Figure 3  Weekly downloads from 18 November 2004 to 4 March 2005

Figure 4  Weekly downloads from 19 November 2003 to week 3 March 2004

Figure 5  Use of articles and abstracts within sessions, January 2003 to June 2005

Figure 6  Distribution of referrer information across quarters 2003 and 2004 – free item sessions only

Figure 7  Monthly number of articles viewed by age of article (session groupings), January 2003‐June 2005

Figure 8  Article views over time by day (2003 to 2005) to NAR Vol. 31 No. 2

Figure 9  Article views over time by month (2003 to 2005) of NAR Vol. 31 No. 2: detailed examination of tail‐off in use

Figure 10  Daily (by date of first access) distribution of new IP numbers joining the NAR site

Figure 11  Total number of items viewed by month January 2003‐June 2005 – registered and non registered users (excluding robots)

Figure 12  Quarterly distribution of organisation type for all users (use) 2003 to 2005

Figure 13  Number of articles viewed by month January 2003‐June 2005 by DNS location of user (excluding robots)

Figure 14  Articles use* by week East European countries January 2003 to June 2005

Figure 15  Monthly movements of article use by referrer groups – 2003 to 2005

Figure 16  Daily movements of article use for those entering via a search engine – 2003 to 2005

Figure 17  Daily movements of article use for those entering via OUP menus and external link – 2003 to 2005

Figure 18  Daily movements of article use for those entering via a PubMed and other – 2003 to 2005

Table I  Daily average use figures (5 per cent trimmed mean) for period November to March – 2005 and 2004 compared

Table II  Session types by item viewed – daily average figures (5 per cent trimmed mean) for November 2004 to March 2005

Table III  Daily average figures (5 per cent trimmed mean) for November 2004 to March 2005 by age of article viewed

Table IV  Article use by academic and net‐provider user

Table V  Article use by Central and Eastern European countries

Notes

[1] Based on monthly daily average: January 2003 this was 5,687 and for January 2005 13,690.

[2] Weeks 49 to 51 in 2004 with weeks 2 to 4 in 2005.

[3] In this case weeks 48 to 50 2003 were compared with weeks 2 to 4 in 2004.

[4] Based on actual figures and 5 per cent trimmed mean used. This ignores 5 per cent highest and lowest values.

[5] Quarters recording the lowest frequency of empty referrer fields were the fourth quarter of 2004 (31 per cent) and the second quarter of 2005 (31 per cent), the quarter with the highest percentage was the fourth quarter of 2003 (41 per cent).
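The 5 per cent trimmed mean mentioned in the notes can be illustrated with a short sketch. It assumes the simplest form of trimming, in which a whole number of values (5 per cent of the count, rounded down) is dropped from each end before averaging; statistical packages may instead weight the boundary values fractionally, and the notes do not say which variant was used. The function name `trimmed_mean` and the sample data are hypothetical.

```python
def trimmed_mean(values, trim_percent=5):
    """Mean after dropping trim_percent of values from each end of the sorted data."""
    if not 0 <= trim_percent < 50:
        raise ValueError("trim_percent must be in the range 0 to 49")
    ordered = sorted(values)
    k = len(ordered) * trim_percent // 100   # values dropped at each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("no values to average")
    return sum(kept) / len(kept)

# Hypothetical daily figures with one extreme high day, NOT data from the study.
daily_views = list(range(2, 20)) + [1, 1000]

print(trimmed_mean(daily_views))                # extremes removed
print(sum(daily_views) / len(daily_views))      # ordinary mean, pulled up by the outlier
```

With 20 values, one value is removed from each end, so a single exceptional day (such as a robot‐driven spike) no longer dominates the daily average.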

Corresponding author

Hamid R. Jamali can be contacted at: [email protected]

References

Antelman, K. (2004), “Do open access articles have a greater research impact?”, College & Research Libraries, Vol. 65 No. 1, pp. 372‐82.

Brody, T., Stamerjohanns, H., Harnad, S., Gingras, Y., Vallieres, F. and Oppenheim, C. (2004), “The effect of open access on citation impact”, paper presented at National Policies on Open Access (OA) Provision for University Research Output: an International meeting, Southampton University, Southampton, 19 February 2004, available at: http://opcit.eprints.org/feb19oa/brody‐impact.pdf (accessed 29 August 2006).

Creaser, C. (2006), “Evaluation of open access journal experiment: stage 2 interim data”, Assessing the Impact of Open Access, Preliminary Findings from Oxford Journals, Oxford University Press, Oxford.

Eysenbach, G. (2006), “Citation advantage of open access articles”, PLoS Biology, Vol. 4 No. 5, e157 (accessed 1 September 2006).

Harnad, S. and Brody, T. (2004), “Comparing the impact of Open Access (OA) vs non‐OA articles in the same journals”, D‐Lib Magazine, Vol. 10 No. 6, available at: www.dlib.org/dlib/june04/harnad/06harnad.html (accessed 1 September 2006).

Kurtz, M.J. (2004), “Restrictive access policies cut readership of electronic research journal articles by a factor of two”, paper presented at National Policies on Open Access (OA) Provision for University Research Output: an International meeting, Southampton University, Southampton, 19 February 2004, available at: http://opcit.eprints.org/feb19oa/kurtz.pdf (accessed 29 August 2006).

Kurtz, M.J., Eichhorn, G., Accomazzi, A., Grant, C., Demleitner, M., Henneken, E. and Murray, S.S. (2005), “The effect of use and access on citations”, Information Processing & Management, Vol. 41 No. 6, pp. 1395‐402.

Lamport, L. (2005), “Authors please, not content”, Annals of Improbable Research, Vol. 11 No. 3, p. 2.

Lawrence, S. (2001), “Free online availability substantially increases a paper's impact”, Nature, Vol. 411 No. 6837, p. 521.

Nicholas, D. and Huntington, P. (2004), “Blackwell's digital users: their characteristics, preferences and information seeking behaviour: a deep log analysis”, University College London, London (restricted circulation report).

Nicholas, D., Huntington, P. and Jamali, H.R. (2007), “The impact of open access publishing (and other access initiatives) on use and users of digital scholarly journals”, Learned Publishing, Vol. 20 No. 1, pp. 11‐15.

Nicholas, D., Huntington, P. and Watkinson, A. (2005), “Scholarly journal usage: the results of a deep log analysis”, Journal of Documentation, Vol. 61 No. 2, pp. 248‐80.

Nicholas, D., Huntington, P., Jamali, H.R. and Watkinson, A. (2006), “The information seeking behaviour of the users of digital scholarly journals”, Information Processing & Management, Vol. 42 No. 5, pp. 1345‐65.

Nicholas, D., Huntington, P., Williams, P. and Gunter, B. (2002), “Digital visibility: menu prominence and its impact on use of the NHS Direct information channel on Kingston Interactive Television”, Aslib Proceedings, Vol. 54 No. 4, pp. 213‐21.

Sahu, D.K., Gogtay, N.J. and Bavdekar, S.B. (2005), “Effect of open access on citations in a small biomedical journal”, 5th International Congress on Peer Review and Biomedical Publication, 16‐18 September 2005, Chicago, USA, available at: http://openmed.nic.in/1174/01/PeerReview.pdf (accessed 1 September 2006).

Saxby, C. (2006), “NAR author and reader survey”, Assessing the Impact of Open Access, Preliminary Findings from Oxford Journals, Oxford University Press, Oxford.

Simkin, M.V. and Roychowdhury, V.P. (2003), “Read before you cite!”, available at: http://arxiv.org/ftp/cond‐mat/papers/0212/0212043.pdf (accessed 23 March 2007).

Simkin, M.V. and Roychowdhury, V.P. (2005), “Do copied citations create renowned papers?”, Annals of Improbable Research, Vol. 11 No. 1, pp. 24‐9.
