Automated Web issue analysis: A nurse prescribing case study

https://doi.org/10.1016/j.ipm.2006.03.011

Abstract

Web issue analysis, a new automated technique designed to rapidly give timely management intelligence about a topic from an automated large-scale analysis of relevant pages from the Web, is introduced and demonstrated. The technique includes hyperlink and URL analysis to identify common direct and indirect sources of Web information. In addition, text analysis through natural language processing techniques is used to identify relevant common nouns and noun phrases. A case study approach is taken, applying Web issue analysis to the topic of nurse prescribing. The results are presented in descriptive form and a qualitative analysis is used to argue that new information has been found. The nurse prescribing results demonstrate interesting new findings, such as the parochial nature of the topic in the UK, an apparent absence of similar concepts internationally, at least in the English-speaking world, and a significant concern with mental health issues. These demonstrate that automated Web issue analysis is capable of quickly delivering new insights into a problem. General limitations are that the success of Web issue analysis depends upon the particular topic chosen and on the ability to find a phrase that accurately captures the topic and is not used in other contexts, and that the technique is language-specific.

Introduction

Healthcare information and healthcare initiatives typically need to be communicated to large professional bodies such as doctors, nurses and health managers. Health-related information can be produced by a wide variety of people including academics, doctors, government spokespersons, and in some cases, non-medical people. The Web is a popular publication medium for a wide variety of health information (Zeng et al., 2004), of varying quality and accuracy (Bernstam, Shelton, Walji, & Meric-Bernstam, 2005), and is being increasingly seen as central to information provision within the health services (Murphy et al., 2004), including in the role of keeping practitioners up to date with current guidelines. The Web also seems to be a vehicle for an increased internationalisation of medical education (Hovenga, 2004). For those responsible for any aspect of healthcare information, Web publishing is a problem because of the conflicting messages it can give (Burd, Chiu, & McNaught, 2004), and hence there is a need to gain insights into what healthcare information is published for any given topic in order to decide how to respond to it. Other researchers have tackled the problem of variable quality Internet information by evaluating metrics for predicting health Web site quality (Currò et al., 2004; Hernández-Borges et al., 2003). This is useful from the perspective of deciding which sites to use or recommend, but does not help managers identify and respond to unwanted information, particularly when it comes from an unexpected source, such as a medical article in an online newspaper.

Previous researchers have developed a variety of methods designed to identify aspects of online communities or topics, although these have tended to either rely upon simple link analyses (Garrido and Halavais, 2003; Park, 2003; Tang and Thelwall, 2003) or to be very labour intensive (Foot et al., 2003; Weare and Lin, 2000). In computer science, various forms of Web mining have been developed to extract information from Web pages or log files (Chakrabarti, 2003; Kosala and Blockeel, 2000), but these have typically not been designed to be applied to wider social issues, with the closest perhaps being community identification (Flake, Lawrence, Giles, & Coetzee, 2000) and topic clustering (Chakrabarti, Joshi, Punera, & Pennock, 2002). Topic identification and tracking is also a recognised task within computer science and computational linguistics with online variants following a long tradition of offline research, primarily through the TREC conferences (e.g., Chakrabarti, VanDen Berg, & Dom, 1999; Clifton et al., 2004; Ozmutlu and Cavdur, 2005). This task is more narrowly focussed than issue analysis (as described below), however, with a typical application being the identification and categorisation of news stories. Issue tracking, the task of identifying the scope of a broad social issue and tracking it, has a pedigree from before the Web as a specific social science task, triggered by the pioneering study of Lancaster and Lee (1985), who tracked research related to acid rain over time in several databases. A more recent example is Wormell’s (2000) analysis of topics related to the Danish welfare state, a study that was able to take advantage of the availability of multiple different sources of electronic information. In bibliometrics, the mapping of papers or authors in an attempt to describe areas of science is an established practice (e.g., Leydesdorff, 1989; Small, 1973; White and Griffith, 1982).

In this paper we apply Web issue analysis (Thelwall, Vann, & Fairclough, in press) to systematically identify all issues relevant to any selected health topic, at least those issues that are reflected on the Web. In essence, the method starts with one or more topic descriptions, such as ‘nurse prescribing’, and downloads all Web pages (via Google) that allude to the topic. These Web pages are then used for a range of types of link analysis. The pages are then processed to extract their nouns and noun phrases, and a frequency table is produced giving the number of sites containing each noun or noun phrase. Nouns and noun phrases are much better indicators of the topics discussed in a document than individual words since they can be complete concept representations. Site frequencies are reasonable indicators of the popularity of topics and are better than raw frequency counts or page-based frequency counts because Web sites are often highly repetitive, duplicating content in many or all site pages (Thelwall, 2002), which is made easy by database-driven Web site technology (Dørup, Hansen, Ribe, & Larsen, 2002). In Web issue analysis, the nouns and noun phrases extracted from topic-relevant pages are the candidate topic-relevant issues. The site frequency counts of noun phrases are suggestive indicators of their topic-relevant popularity. The table of topic-relevant issues and popularities is described as the Web environment of the topic, in the belief that researchers and information managers can gain useful topic-relevant insights from it.
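
The difference between site frequencies and page-based frequencies can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names, the example URLs, and the choice to treat a hostname (with any leading "www." removed) as a "site" are all assumptions made for illustration, and the noun phrases are supplied by hand rather than produced by a part-of-speech tagger.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Return a crude site identifier for a URL.

    Assumption: a "site" is the hostname with a leading "www." removed.
    The study's own definition of a site may differ.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def site_frequencies(page_phrases):
    """Count, for each phrase, the number of distinct sites containing it.

    page_phrases is an iterable of (url, phrases) pairs, where phrases
    is the list of noun phrases already extracted from that page.
    """
    sites_by_phrase = defaultdict(set)
    for url, phrases in page_phrases:
        site = site_of(url)
        for phrase in phrases:
            sites_by_phrase[phrase.lower()].add(site)
    return {phrase: len(sites) for phrase, sites in sites_by_phrase.items()}


# Hypothetical data: two pages from one site repeat the same content.
pages = [
    ("http://www.example.org/a.html", ["nurse prescribing", "mental health"]),
    ("http://www.example.org/b.html", ["nurse prescribing", "mental health"]),
    ("http://example.ac.uk/course", ["nurse prescribing"]),
]
frequencies = site_frequencies(pages)
for phrase, count in sorted(frequencies.items()):
    print(phrase, count)
```

In this toy data 'mental health' occurs on two pages but only one site, so a page-based count would report 2 while the site frequency is 1, which is the repetition effect that site counting is intended to dampen.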

In this paper, Web issue analysis is applied to a specific case study to demonstrate its capabilities for providing management information in a national context. The medical field chosen is nurse prescribing in the UK. The objective of the case study is to investigate whether an automated Web issue analysis can produce useful information about the context of Web publishing for nurse prescribing.

Nurse prescribing background

In the UK, recent years have seen a Department of Health initiative to train a proportion of nurses to prescribe a range of medicines. Legislation was passed in 1992 to give prescriptive powers to district nurses and health visitors so that they could legally prescribe from a restricted formulary (the Nurse Prescribers’ Formulary). The government announced in May 2001 that prescriptive authority would be extended to additional nurse roles within both primary and secondary care. Nurses can

Design of the study

The study is designed to produce three different types of information about nurse prescribing from HTML Web pages.

  1. URLs of Web pages containing the phrase ‘nurse prescribing’ (henceforth: ‘nurse prescribing pages’).
  2. URLs of pages linked to by the above pages (outlinks).
  3. Noun phrases in nurse prescribing pages.
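
As an illustration of the second type of information, the following Python sketch collects the outlinks of a single downloaded HTML page using only the standard library. It is a sketch under stated assumptions, not the crawler used in the study: the class and function names and the example URLs are hypothetical, only `<a href>` links are considered, and fragment identifiers are dropped so that links to different parts of the same page are treated as one URL.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class OutlinkParser(HTMLParser):
    """Collect absolute http(s) URLs from the <a href="..."> tags of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        # Assumption: only Web links matter, so mailto:, javascript: etc. are skipped.
        if absolute.startswith(("http://", "https://")):
            self.outlinks.append(absolute)


def extract_outlinks(html, base_url):
    """Return the outlinks found in html, in document order."""
    parser = OutlinkParser(base_url)
    parser.feed(html)
    return parser.outlinks


# Hypothetical page content and URL.
sample_html = (
    '<p><a href="/about.html#top">About</a> '
    '<a href="http://other.example.com/">Partner</a> '
    '<a href="mailto:someone@example.org">Contact</a></p>'
)
for link in extract_outlinks(sample_html, "http://www.example.org/index.html"):
    print(link)
```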

The motivating belief for collecting these three types of information is that

  1. URLs may give useful information about the types and geographic locations of organisations publishing nurse

Results

The Google API searches returned 6772 URLs from 1619 domains. After downloading these URLs and excluding errors and non-HTML pages, there were a total of 1217 Web sites containing some text, although the smallest contained only a few words.
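
Counts of this kind (URLs grouped into domains) can be derived from a list of URLs along the lines of the following Python sketch. It is illustrative only: the function names and example URLs are hypothetical, a "domain" is assumed to be the full hostname, and the tally by final hostname component (such as "uk" or "com") is one simple way to approximate the geographic spread of publishers rather than the procedure used in the study.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lower-cased hostname of a URL, or "" if it has none."""
    return (urlsplit(url).hostname or "").lower()


def summarise_urls(urls):
    """Return (number of distinct domains, Counter of top-level domains).

    Each distinct domain is counted once in the top-level domain tally,
    however many URLs it contributes.
    """
    domains = {domain_of(url) for url in urls}
    domains.discard("")  # ignore URLs without a hostname
    tld_counts = Counter(domain.rsplit(".", 1)[-1] for domain in domains)
    return len(domains), tld_counts


# Hypothetical search results.
urls = [
    "http://www.example.co.uk/a",
    "http://www.example.co.uk/b",
    "http://nursing.example.ac.uk/",
    "http://example.com/x",
]
domain_count, tld_counts = summarise_urls(urls)
print(domain_count, dict(tld_counts))
```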

Discussion

As discussed in Section 3, all the data should be viewed in the knowledge of the limitations of its origins. The documents included are those that are (a) publicly available on the Web and (b) indexed in Google. Point (a) is a purpose of the study, but should not be forgotten, and the omission of invisible Web pages (Ru & Horowitz, 2005), and presumably many in NHSnet, is a serious concern. Viewing Web documents as a subset of all documents about the topic, the large number of academic pages is

Conclusions

The Web analysis was able to identify a number of interesting facts. Whilst many would probably serve to confirm stakeholders’ suspicions, others (e.g. mental health, the UK focus, the minor nhs.uk role, the disconnectedness of nurse prescribing Web sites) may present surprises. Overall, then, the results should help give managers an evidence-based map of online nurse prescribing information, as well as suggesting avenues for further exploration. It is important that the results of a Web issue

References (60)

  • J. Camm. Early intervention and mental health. Community Practitioner (2005).
  • S. Chakrabarti. Mining the Web: Analysis of hypertext and semi structured data (2003).
  • Chakrabarti, S., VanDen Berg, M., & Dom, B. (1999). Focused crawling: A new approach to topic-specific Web resource...
  • Chakrabarti, S., Joshi, M. M., Punera, K., & Pennock, D. M. (2002). The structure of broad topics on the Web. Available...
  • C. Clifton et al. Topcat: Data mining for topic identification in a text corpus. IEEE Transactions on Knowledge and Data Engineering (2004).
  • Coburn, A. (2005). Lingua:En:Tagger—part-of-speech tagger for English natural language processing. Available from...
  • V. Currò et al. A quality evaluation methodology of health Web-pages for non-professionals. Medical Informatics and the Internet in Medicine (2004).
  • J. Dørup et al. A comparison of technologies for database-driven Websites for medical education. Medical Informatics and the Internet in Medicine (2002).
  • Flake, G. W., Lawrence, S., Giles, C. L., & Coetzee, F. M. (2000). Efficient identification of Web communities. Paper...
  • K.A. Foot et al. Analyzing linking practices: Candidate sites in the 2002 US electoral Web sphere. Journal of Computer Mediated Communication (2003).
  • M. Garrido et al. Mapping networks of support for the Zapatista movement: Applying social network analysis to study contemporary social movements.
  • Google (2005). Google Web APIs (beta). Available from...
  • K. Gournay. Prescribing: The great debate. Nursing Standard (2002).
  • A. Hales et al. Nurse prescribing: Lessons from the US. Nursing New Zealand (2002).
  • G. Harries et al. Hyperlinks as a data source for science mapping. Journal of Information Science (2004).
  • A. Hernández-Borges et al. User preference as quality markers of paediatric Web sites. Medical Informatics and the Internet in Medicine (2003).
  • F. Hughes et al. Evidence and engagement in the introduction of nurse prescribing in New Zealand. Nurse Prescribing (2004).
  • R. Kosala et al. Web mining research: A survey. SIGKDD Explorations (2000).
  • F.W. Lancaster et al. Bibliometric techniques applied to issues management—a case-study. Journal of the American Society for Information Science (1985).
  • S. Latter et al. Effectiveness of nurse prescribing: A review of the literature. Journal of Clinical Nursing (2004).