Improving the relevance of web menus using search logs: a BBCi case study

Paul Huntington (Centre for Information Behaviour and the Evaluation of Research (Ciber), School of Library, Archives and Information Studies, University College London, London, UK)
David Nicholas (Centre for Information Behaviour and the Evaluation of Research (Ciber), School of Library, Archives and Information Studies, University College London, London, UK)

Aslib Proceedings

ISSN: 0001-253X

Article publication date: 1 January 2006


Abstract

Purpose

The paper seeks to propose a method for selecting menu items based on an analysis of user‐entered search terms. Menu pages inform users what is coming next and what questions are going to be answered by an information communication technology service. Menus need to reflect user needs. The paper aims to argue that users reveal the scope of their information needs by the words used in their search expressions and these can be analysed to inform menu titles.

Design/methodology/approach

The paper presents an analysis and classification of user search expressions that are automatically collected by the server. It examines the diabetes‐related search expressions of about 1,000 users of the BBC site.

Findings

The search expressions were classified, analysed and compared with the diabetes menu of three health sites: NHS Direct (www.nhsdirect.nhs.uk); BBC health (www.bbc.co.uk); and Diabetes UK (www.diabetes.org.uk). Finally, a six‐point menu is derived.

Practical implications

The practical implication of this paper is the development of relevant web menus based on user information needs, as revealed in the search expressions entered by users.

Originality/value

This is the first explanation of how search logs can be used to construct menu lists. Previously menus have been designed at worst to suit producers and site designers based on the information that they have available and at best on interviews with small usability or focus groups who are not necessarily users.


Citation

Huntington, P. and Nicholas, D. (2006), "Improving the relevance of web menus using search logs: a BBCi case study", Aslib Proceedings, Vol. 58 No. 1/2, pp. 118-128. https://doi.org/10.1108/00012530610648716

Publisher: Emerald Group Publishing Limited

Copyright © 2006, Emerald Group Publishing Limited


Introduction

Making information content available on the web does not in itself create usage. Site menus are of particular importance to the users' flow through to content: they are indicators of site content and give users an idea of how likely it is that their query will be answered by the site. Menu item prominence is information immediacy; it tells the user early on that the information system can answer queries on a topic. We have found evidence in our own studies (Nicholas et al., 2004) that many users terminate their session after viewing just one screen, perhaps because they are not finding links and menu signposts for the information they are looking for. What may in fact be happening is that if users cannot see a menu signpost on the home page to what they are looking for (immediacy), they leave. That is, users have no good reason to believe that the site can meet their information need, and hence leave. There is evidence to support this. Nicholas and Huntington (2005) found that over one‐third (38 per cent) of users of health sites said they “often” left a site if they could not see what they were looking for on the page landed at, and 56 per cent “often” left if they could not see what they were looking for within the first two screens looked at.

Menu pages offer a particular type of information: they inform users what is coming next and what questions are going to be answered by the service. Virtually everything that is available on a site is out of immediate sight, so users look at menus as indicators of content. The difficulty is deciding what goes on the menu. Traditionally this has been a hit‐or‐miss affair. At worst, site designers or managers make a best guess at a menu based on available content and on what they subjectively think users would be interested in. Better‐informed sites make use of usability studies and focus groups, though these studies rarely include actual users and their remit is often just to judge the adequacy of a stated range of menu items or menu‐related tasks.

The authors argue that search logs, which are a collection of user search expressions, offer an effective way of assessing and selecting menu items. Users enter search words in their own language, without prompting, to find the information they are looking for, and these words reflect a genuine information need. Searchers are hence the best group for investigating information needs, because they are actual users participating in a real process of searching for information. The search terms entered by users reveal their information need. Furthermore, the fact that these statements of information need are collected by server computers automatically, routinely and at very little cost suggests that this information source offers a potent insight into searching and user behaviour. In fact, the difficulty posed by the information source is the sheer quantity of data. We are moving from analysing a limited number of search expressions from a usability group asked to imagine an information need to analysing thousands of search expressions from users engaged in meeting an actual information need. User information needs should be the foundation for constructing user‐relevant menus. This paper analyses user search expressions, as a way of revealing information need, to develop a user‐centred menu.

This research was funded by the Department of Health and forms part of ongoing research at UCL into the use and development of online health information.

Aims, objectives and scope of the study

This research analyses and classifies web search logs and extracts subject themes from them. The themes are compared to existing menus and used to derive a new menu. The research is limited to search expressions on diabetes.

Previous literature

Menus are recognized as key to a site. In understanding the success of a web‐based system, navigation, information design and access are identified as key usability factors (Preece, 2000). However, the design and construction of navigation menus for web sites are too often left to the web designer. The problem with developing menu labels in this way is that the process is subjective, albeit in the hands of a professional designer, and is likely to result in a loss of menu coverage. Zhang et al. (2004) investigate web‐menu layouts and use formal concept analysis (FCA), a mathematical model, to assist in the design and automatic generation of a navigation hierarchy for a set of web documents. This procedure looks for clusters of attributes: the clustering algorithm determines which collections of attributes form a coherent entity. The problem with this approach is that the subject starting point is still based on subjective decisions and on what information the site holds, not on user information needs.

A typical method for investigating information needs regarding menu layout and construction is to observe a small group of participants as they work through a specific number of set tasks. Eisenberg (2005) argues that this approach examines the site's interface and the process barriers that keep visitors from accomplishing a conversion task. Typically, usability studies look at terminology usage, the adequacy of menu structure and so on. Frick et al. (1999) investigated performance differences between web‐based navigation models with 44 college students using information‐finding tasks and found no significant differences between the models. Ebenezer's (2003) usability evaluation of the recently launched South London and Maudsley NHS Trust library web site is another example of this approach: usability tests were employed to explore how users' cognitive understanding of information presentation relates to the interfaces they interact with. Further examples are found in Hennig's (1998) examination of the Bose Corporation intranet site and in studies by Nielsen (2002). The problem with usability tests is that they are by nature small‐scale; they are generally not populated with users who have a genuine information need; they are directed by a series of pre‐defined usability or set tasks; their results depend on the test environment; and they are expensive. Nielsen (cited in Eisenberg, 2005) noted that usability test participants work harder on tasks in a test session than they would at home, and concluded that usability tests cannot measure persuasive momentum and individual motivation. Usability tests aim to simulate reality in a laboratory setting with regard to specific set tasks, and hence their results are only as good as the simulation and are bounded by the set tasks.

There has been little previous research linking search transaction log data with site menu construction, though point of site entry has been researched. Van der Geest (2002) found that point of arrival can be used to redesign site structure: if many visitors arrive on a page that is hidden three layers deep in the site, he argues, the ordering of the information should be changed and the link to that information positioned at a higher level. Further, Rozic‐Hristovski et al. (1999) studied site access, as revealed by log analysis, to evaluate a developing medical library web site in Slovenia. Both studies treated the page landed at as, to some extent, revealing the user's information need, and hence argued for greater digital prominence for that page. The study proposed here differs in that only search logs are examined. Other studies (e.g. D'Alessandro et al., 1998) have used log data to examine the origin of users (by IP address) and their information behaviour. In a study more closely concerned with search expressions, Goodrum and Spink (1999) analysed 1.2 million queries for images. They identified these queries as reflecting information needs, but related this only to the number of different search terms used. They did not go on to use this information to construct a menu, perhaps because the search expressions were too general and were not directed at a particular topic. Jansen et al. (2000), in a related study again covering a range of topics, reported that over half of the terms used were used only once. Ozmutlu et al. (2004) carried out a time‐based, country comparative study of search logs, examining fluctuations over the day and in the number of words per query. Wolfram et al. (2001) classified a random sample of 2,414 queries from 1997 and 2,539 queries from 1999 into 11 broad subject categories to assess how web searching topics changed from 1997 to 1999.

Nicholas et al. (2002) found a relationship between service usage and menu prominence. The study found that both access to and “use” of a service, as measured by users returning to the site and by use and user statistics, declined significantly over the survey period, and that these declines matched almost exactly changes in the positioning of the service on the menus. As the service became more difficult to access, its signposting ever more removed from the opening menu, new visitors as a proportion of all users declined. New users were not coming through because of the increasing difficulty of finding the service. This research underlines the importance of menus as a signpost of what is coming next: where this signposting is absent, use of a service will decline.

Methods

Logs of user search queries made on BBCi were supplied by the BBC (www.bbc.co.uk) for 12 days in March 2003, in two sequences: 1st‐6th and 9th‐15th. These logs recorded 4,048,137 search queries made by 891,129 users across 1,035,514 sessions: a testament to the sheer popularity of the BBC web site and search facility. Lines related to diabetes were extracted. In all, the search transactions of 1,004 users were included, covering 1,838 searches and 384 different search expressions. SPSS was used for all data selection, classification and analysis.
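The extraction step can be sketched as follows. The paper used SPSS and does not state its matching rule, so this Python sketch assumes each log line has already been reduced to a hypothetical `Search(user_id, expression)` record and treats any expression containing the stem "diabet" (case‐insensitive) as diabetes‐related. Both the record shape and the matching rule are assumptions, and the sample data are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Search:
    user_id: str      # hypothetical: e.g. taken from the cookie field
    expression: str   # the search expression as typed

def is_diabetes_related(expression: str) -> bool:
    # Assumed rule: case-insensitive substring match on the stem "diabet",
    # which also catches "diabetic", "diabetogenic" and similar forms.
    return "diabet" in expression.casefold()

def summarise(searches):
    """Return (users, searches, distinct expressions) for diabetes searches."""
    selected = [s for s in searches if is_diabetes_related(s.expression)]
    users = {s.user_id for s in selected}
    # Expressions are kept case-sensitive, so "diabetes" and "Diabetes"
    # count as different expressions.
    expressions = {s.expression.strip() for s in selected}
    return len(users), len(selected), len(expressions)

if __name__ == "__main__":
    log = [
        Search("u1", "diabetogenic"),
        Search("u1", "diabetogenic"),
        Search("u2", "Diabetes"),
        Search("u2", "football scores"),
        Search("u3", "diabetes diet"),
    ]
    print(summarise(log))  # (3, 4, 3)
```

A substring rule of this kind misses misspellings such as "diabtes", so the selected lines would still need checking by hand.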

The log file relates to search queries entered by users. The file provides the following fields:

  1. Field 1: machine id (cgiper13).
  2. Field 2: process number.
  3. Field 3: process number.
  4. Field 4: date.
  5. Field 5: time.
  6. Field 6: search expression entered.
  7. Field 7: number of BBCi best links returned (a best link may be an external or a BBC link).
  8. Field 8: operation time.
  9. Field 9: cookie.
  10. Field 10: scope of search: a subject category which represents a specific area within which the user initiated the search. A scope search confines the results returned to a specified number of directories (URLs).
  11. Field 11: tab of search: user‐defined, to specify a search within the BBC or the world wide web (WWW).
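A minimal parser for one log line might look like this. The field order follows the list above, but the tab delimiter, the field names and the sample line are assumptions for illustration; the file format is not described beyond the field list.

```python
from dataclasses import dataclass

# Field names follow the order of the field list; the tab delimiter and
# the names themselves are assumptions, not a documented BBC format.
FIELD_NAMES = (
    "machine_id", "process_a", "process_b", "date", "time",
    "expression", "best_links", "operation_time", "cookie", "scope", "tab",
)

@dataclass
class LogRecord:
    machine_id: str
    process_a: str
    process_b: str
    date: str
    time: str
    expression: str
    best_links: int        # number of BBCi best links returned
    operation_time: str
    cookie: str
    scope: str             # empty when the search was not scoped
    tab: str               # e.g. "allbbc" or "WWW"

def parse_line(line: str, sep: str = "\t") -> LogRecord:
    """Split one log line into the eleven listed fields."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    values = dict(zip(FIELD_NAMES, parts))
    values["best_links"] = int(values["best_links"] or 0)
    return LogRecord(**values)

if __name__ == "__main__":
    # Invented sample line; values are illustrative only.
    sample = "cgiper13\t101\t7\t12/03/2003\t23:32:10\tdiabetogenic\t0\t0.12\tabc123\t\tWWW"
    print(parse_line(sample).expression)  # diabetogenic
```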

Several of these fields are potentially of interest. BBCi best links, which are short cuts to pages the BBC has identified as the best links for a query, are returned in response to a user search expression, although not all users are returned a best link. The return of a BBCi best link may reflect the effectiveness of a user search expression. Also of interest is the “Scope of Search” field. This is a subject category representing a specific area within which the user initiated the search. A scope search confines the results returned to a specified number of subject directories, and its use suggests that a user has landed in a specific area and chosen to limit their search to it.
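If the return of a best link is read as a rough signal of search effectiveness, two simple indicators follow: the share of searches that returned at least one BBCi best link, and the share that were restricted by a scope. The sketch below assumes each search has been reduced to a (best links returned, scope) pair, with an empty scope meaning an unscoped search; both conventions are assumptions, and the example data are invented.

```python
from typing import Iterable, Tuple

def best_link_and_scope_rates(searches: Iterable[Tuple[int, str]]) -> Tuple[float, float]:
    """searches: (best_links_returned, scope) pairs, one per search.
    Returns (share with at least one best link, share restricted by a scope)."""
    rows = list(searches)
    if not rows:
        return 0.0, 0.0
    with_best_link = sum(1 for links, _ in rows if links > 0)
    scoped = sum(1 for _, scope in rows if scope.strip())
    return with_best_link / len(rows), scoped / len(rows)

# Invented example: four searches, two with a best link, one scoped.
print(best_link_and_scope_rates([(0, ""), (2, "health"), (1, ""), (0, "")]))  # (0.5, 0.25)
```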

Results

The analysis is best understood by first looking at one user's interaction with the search engine. This particular person used the search facility 13 times over a 20‐minute period between 11.32 p.m. and 11.53 p.m. on 12 March (Table I). The user's decisions to return to the search facility time and again were taken relatively quickly. Generally there was a gap of between 10 and 30 seconds between the re‐framed search queries, which suggests that the user had quite quickly decided that nothing relevant was forthcoming from the links on offer.
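The time gaps between one user's successive searches can be computed directly from the time field. This sketch assumes times in a hypothetical HH:MM:SS format within a single day (sessions crossing midnight are not handled), and the timestamps are invented rather than taken from Table I; the 10 to 30 second band is the one described in the text.

```python
from datetime import datetime
from typing import List

def query_gaps(times: List[str], fmt: str = "%H:%M:%S") -> List[float]:
    """Seconds between consecutive searches by one user within one day."""
    parsed = sorted(datetime.strptime(t, fmt) for t in times)
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]

def share_within(gaps: List[float], low: float = 10.0, high: float = 30.0) -> float:
    """Fraction of gaps falling in the inclusive range [low, high] seconds."""
    if not gaps:
        return 0.0
    return sum(low <= g <= high for g in gaps) / len(gaps)

if __name__ == "__main__":
    # Invented timestamps, ending with a 15-minute break.
    gaps = query_gaps(["23:32:00", "23:32:15", "23:32:40", "23:47:40"])
    print(gaps)                # [15.0, 25.0, 900.0]
    print(share_within(gaps))  # 0.6666666666666666
```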

The user first searched for diabetogenic and then repeated this search, presumably because they did not find anything, making four searches on this term in all while switching between the tab of search options WWW and allbbc (all BBC areas). This user then added to the search expression and searched for diabetogenic pregnancy, repeating this search three times, first for allbbc and then twice for the WWW. There followed a break of about 15 minutes. This break might reflect the user reading a link returned by the search engine; alternatively, the user might have gone elsewhere or just taken a break. The user returned to the search facility at 11.52 p.m. and made six searches on contraception diabetes mellitus, misspelling contraception in the first two. The user alternated these searches between WWW and allbbc. The user was not returned a BBCi best link at any point in the session. In addition, the user used the facility in only one session, did not return, and searched on a single topic, diabetes.

Table II gives the top 20 search expressions used, which accounted for a high 58 per cent of all search expressions used. The most common approach, however, was simply to type in the single word diabetes: 39 per cent typed in diabetes, 4 per cent Diabetes and 1 per cent DIABETES. In all, users used 384 different search expressions to find topics related to diabetes. It seems from Table II that about half of user search expressions were just looking for anything on diabetes and consisted of the word diabetes alone, while about half were looking for more specific information related to diabetes.
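The effect of case variants on frequency counts can be shown with a small sketch. Counted as typed, diabetes, Diabetes and DIABETES are three expressions, as in Table II; folding case merges them. The sample below is invented, loosely echoing the proportions above, and the function name is illustrative.

```python
from collections import Counter

def top_n_share(expressions, n=20, fold_case=False):
    """Return the n most frequent expressions and their combined share of all uses.
    With fold_case=True, case variants such as "Diabetes" and "DIABETES" are merged."""
    keys = [e.casefold() if fold_case else e for e in expressions]
    if not keys:
        return [], 0.0
    top = Counter(keys).most_common(n)
    return top, sum(count for _, count in top) / len(keys)

if __name__ == "__main__":
    # Invented sample of 50 searches.
    sample = ["diabetes"] * 39 + ["Diabetes"] * 4 + ["DIABETES"] + ["diabetes diet"] * 6
    print(top_n_share(sample, n=1))                  # ([('diabetes', 39)], 0.78)
    print(top_n_share(sample, n=1, fold_case=True))  # ([('diabetes', 44)], 0.88)
```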

The top 20 expressions (Table II) give an idea of the diabetes topics looked for. They include a surprising range of topics, from feline diabetes and social aspects to searches for a variety of types of diabetes. There were also searches on diabetes and diet, and on aspirin, and an interest in finding out about other web sites. Most of these topics each accounted for less than 1 per cent of search expression use, which in part reflects the subject range and the different word combinations users employed to search for their topic. Nevertheless, clear groupings are suggested by the above.

The 384 different search expressions were classified and grouped into 19 broad subject categories. The groupings and categories were refined through a theme analysis of the search terms, which involved selecting a theme and counting the number of search expressions that could be classified into it. The results are given in Figure 1, which shows the percentage of unique user search expressions falling into each group. By unique we mean that repeated searches by a user using the same terms were counted only once. Figure 1 covers all search expressions except those consisting of the single term diabetes. In all, 84 per cent of search expressions were allocated to subject groups, while 16 per cent could not be allocated. A total of 18 per cent of user search expressions looked for contacts and/or web sites; 16 per cent looked for different types of diabetes; 10 per cent related to diabetes and life stage (grouping search expressions on diabetes and pregnancy, children, diabetes late in life and so on). A total of 9 per cent related to diabetes and diet and types of diet; 6 per cent to different treatments for diabetes; 5 per cent included the word symptoms; 4 per cent related to insulin and sugar; 3 per cent to diagnosis; 3 per cent to complications and 3 per cent to interactions. Causes of diabetes, travel, exercise, alcohol and smoking, statistics, obesity, aspirin and prevention each made up 2 per cent or less.
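The paper's classification was a manual theme analysis. As an illustration only, the same bookkeeping (one count per user per distinct expression, excluding the bare term diabetes, then percentages per theme) can be sketched with a hypothetical keyword lookup. The keyword lists cover just four of the categories named above and are assumptions, not the authors' coding scheme; expressions are case‐folded before de‐duplication, which is also an assumption.

```python
from collections import Counter

# Hypothetical keyword stems for four of the categories named above; the
# authors' manual coding scheme is not reproduced here.
THEMES = {
    "types": ("type 1", "type 2", "insipidus"),
    "life stage": ("pregnan", "child", "elderly"),
    "diet": ("diet", "food", "recipe"),
    "symptoms": ("symptom",),
}

def classify(expression: str) -> str:
    """Return the first theme whose stems appear in the expression."""
    text = expression.casefold()
    for theme, stems in THEMES.items():
        if any(stem in text for stem in stems):
            return theme
    return "unallocated"

def theme_shares(searches):
    """searches: iterable of (user_id, expression) pairs.
    Returns the percentage of unique directed expressions in each theme."""
    # Count each user's repeated use of the same terms only once.
    unique = {(user, expr.strip().casefold()) for user, expr in searches}
    # Exclude expressions consisting of the single term "diabetes".
    directed = [expr for _, expr in unique if expr != "diabetes"]
    if not directed:
        return {}
    counts = Counter(classify(expr) for expr in directed)
    return {theme: 100 * n / len(directed) for theme, n in counts.items()}

if __name__ == "__main__":
    sample = [
        ("u1", "diabetes diet"), ("u1", "diabetes diet"), ("u2", "Diabetes"),
        ("u3", "type 2 diabetes"), ("u4", "diabetes and pregnancy"),
        ("u5", "diabetes insurance"),
    ]
    print(theme_shares(sample))
```

In the sample, the repeated search by u1 is counted once and u2's bare "Diabetes" is excluded, leaving four directed expressions at 25 per cent each.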

It was decided to compare the identified diabetes subject groupings as revealed by the search logs to the menu subject list available on three diabetes information services. The three services selected were: Diabetes on BBC health (www.bbc.co.uk), Diabetes on NHS Direct Online (www.nhsdirect.nhs.uk) and Diabetes UK (www.diabetes.org.uk). The menu items relating to diabetes on these sites (as at 25 November 2004) are given in Table III.

Diabetes as a menu item was not available from the home page of the NHS Direct web site; the diabetes page was reached by going to a “Health Encyclopaedia” link and then to diabetes. The NHS Direct diabetes home page included links to ten topics: Introduction, Symptoms, Causes, Diagnosis, Treatment, Complications, Prevention, Selected links, Info Partners and Audio Clips. The coverage of likely searches by these menu options can be estimated by matching them against the 19 broad search categories identified above. Six of the menu links (Symptoms, Causes, Diagnosis, Treatment, Complications and Prevention), together with Selected links, can be related to user search expressions. The percentage of search expressions accounted for by each menu item was: symptoms 5 per cent, causes 2 per cent, diagnosis 3 per cent, treatments 6 per cent, complications 3 per cent, prevention 1 per cent, and selected links 18 per cent. The NHS Direct Online diabetes menu therefore covers about 38 per cent of user subject search expressions, little more than a third of likely diabetes enquiries as identified in Figure 1, and gives no obvious indication of content related, for example, to types of diabetes, diabetes and life stage, or diet.
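The coverage estimate above is a simple sum of the category shares that a menu addresses. This is valid on the assumption, implied by the classification, that each directed expression falls into exactly one category, so shares can be added without double counting. The figures below are those reported in the text.

```python
# Share (%) of directed search expressions in each category matched by an
# NHS Direct diabetes menu item, as reported in the text.
NHS_DIRECT_MATCHES = {
    "Symptoms": 5,
    "Causes": 2,
    "Diagnosis": 3,
    "Treatment": 6,
    "Complications": 3,
    "Prevention": 1,
    "Selected links": 18,  # contacts and/or web sites
}

def menu_coverage(matches: dict) -> int:
    """Sum the shares of the categories a menu addresses, assuming each
    expression belongs to exactly one category."""
    return sum(matches.values())

topic_items = {k: v for k, v in NHS_DIRECT_MATCHES.items() if k != "Selected links"}
print(menu_coverage(topic_items))         # 20
print(menu_coverage(NHS_DIRECT_MATCHES))  # 38
```

The six topic items alone account for 20 per cent; the remainder of the 38 per cent comes from Selected links.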

Diabetes as a menu item did not feature on the BBC home page, nor did diabetes feature as a menu option on the health home page. A diabetes menu section was found by clicking on the A‐Z index, then scrolling down to health and then to diabetes. The diabetes home page included five menu items leading to content: About diabetes; Treatment; Living with diabetes; Prevention approach; and Links & Organisations. Compared against the 19 search subject categories, these five items, which are themselves fairly broad, cover about 38 per cent of search queries. Again there are no obvious links to content such as types of diabetes, life stage and diet.

Diabetes UK had six menu options on its home page: What is diabetes; Managing diabetes; How we help; Get involved; Diabetes research; and For healthcare professionals. Again these menu options are particularly broad and do not appear to address the specific subjects that users are likely to go to a diabetes site to find. There was no obvious link to types of diabetes, life stage, or diabetes and diet. Perhaps site designers are less inclined to state the obvious.

Figure 1 classified approximately 300 unique user search expressions into 19 subject categories, which represent the broad subject topics that users search for. These categories were then compared to the diabetes menu options actually offered to users by three health sites. In each case the menu items that direct users to content covered less than half, by expression usage, of the topics searched for by users. None of the three sites, for example, addressed users' need to know about types of diabetes, diabetes related to a life stage, and diabetes and diet, though 35 per cent of user search expressions were directed at content on just these three topics. There were additional topics, such as travel, exercise, smoking and obesity, that none of the sites directly covered in the opening diabetes menu. Users with a directed information need are likely to question whether any of these sites could meet their need in these areas. Menu items link user information needs to content that meets those needs, and a poor menu is likely to leave users confused and frustrated as to how the site could meet their need. Recent research (Nicholas and Huntington, 2005) found that over one‐third (38 per cent) of users of health sites said they “often” left a site if they could not see what they were looking for on the page landed at, and a further 47 per cent said that they sometimes did this. Clearly the cost of a poorly designed menu, in terms of site usage, is high.

Using the 19 subject categories based on user search expressions (Figure 1) as a starting point it was decided to construct a six‐point menu that attempted to cover most of these categories. The six menu items are listed in Table IV along with the likely percentage of user queries covered.
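The paper does not describe how the six items were chosen. One way to frame the choice, offered here as an assumption rather than the authors' method, is weighted maximum coverage: each candidate menu item covers some search categories, and items are picked to maximise the share of searches covered. The greedy sketch below uses category shares from Figure 1 as reported above; the candidate items and their groupings are hypothetical and are not those of Table IV.

```python
from typing import Dict, FrozenSet, List

# Category shares (%) as reported above for Figure 1 (subset).
WEIGHTS = {
    "contacts/web sites": 18, "types": 16, "life stage": 10, "diet": 9,
    "treatments": 6, "symptoms": 5, "insulin and sugar": 4,
    "diagnosis": 3, "complications": 3, "interactions": 3,
}

# Hypothetical candidate menu items and the categories each would cover.
CANDIDATES = {
    "Types of diabetes": frozenset({"types"}),
    "Diabetes through life": frozenset({"life stage"}),
    "Food and diet": frozenset({"diet"}),
    "Symptoms and diagnosis": frozenset({"symptoms", "diagnosis"}),
    "Treatment and medicines": frozenset({"treatments", "insulin and sugar", "interactions"}),
    "Complications": frozenset({"complications"}),
    "Links and contacts": frozenset({"contacts/web sites"}),
}

def greedy_menu(candidates: Dict[str, FrozenSet[str]],
                weights: Dict[str, float], k: int = 6) -> List[str]:
    """Pick up to k items, each time taking the one that adds the most
    not-yet-covered search share (greedy weighted maximum coverage)."""
    chosen: List[str] = []
    covered: set = set()
    for _ in range(k):
        best, gain = None, 0.0
        for item, cats in candidates.items():
            if item in chosen:
                continue
            g = sum(weights.get(c, 0) for c in cats - covered)
            if g > gain:
                best, gain = item, g
        if best is None:  # no remaining item adds coverage
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen

def coverage(chosen: List[str], candidates, weights):
    """Total share of the categories covered by the chosen items."""
    cats = set().union(*(candidates[i] for i in chosen))
    return sum(weights.get(c, 0) for c in cats)

menu = greedy_menu(CANDIDATES, WEIGHTS)
print(menu)
print(coverage(menu, CANDIDATES, WEIGHTS))  # 74
```

Greedy selection is a standard heuristic for maximum coverage and is known to achieve at least (1 - 1/e) of the optimum; with so few candidate items, an exhaustive search over all six‐item combinations would also be feasible.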

This menu reflects, in general terms, what users seeking diabetes information are looking for. It is based on actual search expressions entered by users and so reflects their information needs.

Conclusion

Search logs and server transaction logs provide real‐time insight into how users actually search and navigate to site content. Here we have attempted to show how search logs can be used to monitor and assess the selection of items for navigational menus. Menus are of particular importance as they are indicators of content and give users an idea of how likely it is that their query will be answered by the site. Menus are enabling: they are the bridge between user information need and supplier content. They are where demand intention meets supply intention, and getting them wrong interrupts the users' flow through to content. Previous research (Nicholas and Huntington, 2005) has shown that over one‐third (38 per cent) of users of health sites said they “often” left a site if they could not see what they were looking for on the page landed at, and 56 per cent “often” left if they could not see what they were looking for within the first two screens looked at. Providing an adequate menu partly addresses this issue and users' concern for effective indicators of content. In any event, menus and menu changes should be assessed, and the team has argued elsewhere (Huntington et al., 2004; Nicholas et al., 2002) that an effective way to monitor the impact of a change of menu, or any site change, is to analyse the actual usage statistics available from the server log. This procedure, dubbed “change and see”, monitors for small changes in actual user navigational and usage patterns resulting from small changes in menu and site design. Ultimately, menu and site changes should aim to increase traffic and direct users effectively to content, and this is best assessed by examining actual user behaviour. In future we hope to explore the issues raised in this paper further by examining whether usage does, as predicted, increase when menus are developed from an analysis of search expressions as a way of revealing information need.
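One way to check a “change and see” result, assumed here rather than taken from the authors' procedure, is to compare the proportion of sessions reaching a target page before and after a menu change with a pooled two‐proportion z‐test. The counts in the example are invented, and the normal approximation assumes reasonably large counts in each period.

```python
import math

def two_proportion_z(hits_before: int, n_before: int,
                     hits_after: int, n_after: int):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value) for the
    change in the share of sessions reaching a target page."""
    p_before = hits_before / n_before
    p_after = hits_after / n_after
    pooled = (hits_before + hits_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:
        raise ValueError("proportions are all 0 or all 1; test undefined")
    z = (p_after - p_before) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: 200 of 1,000 sessions reached the page before a menu
# change, 260 of 1,000 afterwards.
z, p = two_proportion_z(200, 1000, 260, 1000)
print(round(z, 2), round(p, 4))
```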

This study examined search logs as a way of revealing information need and proposes that they can be used as a basis for constructing a user‐centred menu. It examined 384 unique user search expressions related to diabetes. A total of 44 per cent had simply typed in the single word diabetes, and these searches were not particularly directed, while 56 per cent had typed in a directed search expression. An analysis of the directed searches showed that 84 per cent of these could be classified into one of 18 subject categories. These subject categories were compared to the menu coverage of three sites providing health information on diabetes: NHS Direct Online, BBC Health and Diabetes UK. In each case the site's diabetes menu page was found to cover only about one‐third of searches, as indicated by the subject category analysis of actual user searches. Finally, a six‐item menu reflecting the 18 subject categories was proposed.

Figure 1  The percentage of unique user search expressions falling into 19 search categories


Table I  Micro analysis of user 1


Table II  The top 20 search expressions used to find topics related to diabetes


Table III  Diabetes home pages of three health sites


Table IV  Six‐point menu structure based on search expressions entered by users


Corresponding author

Paul Huntington can be contacted at: [email protected]

References

D'Alessandro, M.P., D'Alessandro, D.M., Galvin, J.R. and Erkonen, W.E. (1998), “Evaluating overall usage of a digital health sciences library”, Bulletin of the Medical Library Association, Vol. 86 No. 4, pp. 602-9.

Ebenezer, C. (2003), “Usability evaluation of an NHS library web site”, Health Information and Libraries Journal, Vol. 20 No. 3, pp. 134-42.

Eisenberg, B. (2005), “Prioritize usability testing and web analytics”, available at: www.clickz.com/experts/crm/traffic/article.php/3483671 (accessed 11 April 2005).

Frick, T., Kisling, E., Cai, W., Min Yu, B., Giles, F. and Brown, J.P. (1999), “Impact of navigational models on task completion in web‐based information systems”, AECT 1999 presentation to the Research and Theory Division: paper no. 439, available at: http://education.indiana.edu/~frick/aect99/rtd439.html (accessed 15 January 2004).

Goodrum, A. and Spink, A. (1999), “Visual information seeking: a study of image queries on the world wide web”, Proceedings of the 62nd Annual Meeting of the American Society for Information Science, Washington, DC, October, pp. 665-74.

Hennig, N. (1998), “Going forward: usability testing the web site”, Internet Librarian, 4 November, available at: www.hennigweb.com/presentations/il98/ (accessed 5 March 2001).

Huntington, P., Nicholas, D. and Warren, D. (2004), “Digital visibility and its impact upon online usage: case study a health web site”, Libri, Vol. 54 No. 4, pp. 158-68.

Jansen, B.J., Spink, A. and Saracevic, T. (2000), “Real life, real users and real needs: a study and analysis of users' queries on the web”, Information Processing and Management, Vol. 36 No. 2, pp. 207-27.

Nielsen, J. (2002), “Web usability for senior citizens: 46 design guidelines based on usability studies with people age 65 and older”, available at: www.useit.com/alertbox/20020428.html (accessed 11 April 2005).

Nicholas, D. and Huntington, P. (2005), Digital Health Information Consumers and the BBC Website (bbc.co.uk): Users and Usage of Non‐dedicated Health Sites, UCL, London.

Nicholas, D., Huntington, P., Williams, P. and Dobrowolski, T. (2004), “Re‐appraising information seeking behaviour in a digital environment: bouncers, checkers, returnees and the like”, Journal of Documentation, Vol. 60 No. 1, pp. 24-39.

Nicholas, D., Huntington, P., Williams, P. and Gunter, B. (2002), “Digital visibility: menu prominence and its impact on use of the NHS Direct information channel on Kingston Interactive Television”, Aslib Proceedings, Vol. 54 No. 4, pp. 213-21.

Ozmutlu, S., Spink, A. and Ozmutlu, H.C. (2004), “A day in the life of web searching: an exploratory study”, Information Processing and Management: An International Journal, Vol. 40 No. 2, pp. 319-45.

Preece, J. (2000), Online Communities: Designing Usability, Supporting Sociability, Wiley & Sons, New York, NY.

Rozic‐Hristovski, A., Todorovski, L. and Hristovski, D. (1999), “Developing a medical library website at the University of Ljubljana, Slovenia”, Program, Vol. 33 No. 4, October, pp. 313-25.

Van der Geest, T. (2002), “Evaluating a web site with server data”, Document Design, Vol. 1 No. 2, pp. 1312.

Wolfram, D., Spink, A., Jansen, B.J. and Saracevic, T. (2001), “Vox populi: the public searching of the web”, available at: http://jimjansen.tripod.com/academic/pubs/jasist2001/jasist2001a.html (accessed 12 April 2005).

Zhang, G.Q., Shen, G., Staiger, J., Troy, A. and Sun, J. (2004), “FcAWN: concept analysis as a formal method for automated web‐menu design file”, available at: http://newton.eecs.cwru.edu/~gqz/papers/web-menu.pdf (accessed 24 November 2004).
