Automated Web issue analysis: A nurse prescribing case study
Introduction
Healthcare information and healthcare initiatives typically need to be communicated to large professional bodies such as doctors, nurses and health managers. Health-related information can be produced by a wide variety of people, including academics, doctors, government spokespersons and, in some cases, non-medical people. The Web is a popular publication medium for a wide variety of health information (Zeng et al., 2004) of varying quality and accuracy (Bernstam, Shelton, Walji, & Meric-Bernstam, 2005), and is increasingly seen as central to information provision within the health services (Murphy et al., 2004), including in keeping practitioners up to date with current guidelines. The Web also seems to be a vehicle for the increasing internationalisation of medical education (Hovenga, 2004). For those responsible for any aspect of healthcare information, Web publishing is a problem because of the conflicting messages it can give (Burd, Chiu, & McNaught, 2004). Hence there is a need to gain insights into what healthcare information is published on any given topic in order to decide how to respond to it. Other researchers have tackled the problem of variable-quality Internet information by evaluating metrics for predicting health Web site quality (Currò et al., 2004; Hernández-Borges et al., 2003). This is useful for deciding which sites to use or recommend, but does not help managers identify and respond to unwanted information, particularly when it comes from an unexpected source, such as a medical article in an online newspaper.
Previous researchers have developed a variety of methods designed to identify aspects of online communities or topics, although these have tended either to rely upon simple link analyses (Garrido and Halavais, 2003; Park, 2003; Tang and Thelwall, 2003) or to be very labour intensive (Foot et al., 2003; Weare and Lin, 2000). In computer science, various forms of Web mining have been developed to extract information from Web pages or log files (Chakrabarti, 2003; Kosala and Blockeel, 2000), but these have typically not been designed to be applied to wider social issues, with the closest perhaps being community identification (Flake, Lawrence, Giles, & Coetzee, 2000) and topic clustering (Chakrabarti, Joshi, Punera, & Pennock, 2002). Topic identification and tracking is also a recognised task within computer science and computational linguistics, with online variants following a long tradition of offline research, primarily through the TREC conferences (e.g., Chakrabarti, van den Berg, & Dom, 1999; Clifton et al., 2004; Ozmutlu and Cavdur, 2005). This task is more narrowly focussed than issue analysis (as described below), however, with a typical application being the identification and categorisation of news stories. Issue tracking, the task of identifying the scope of a broad social issue and tracking it, has a pedigree from before the Web as a specific social science task, triggered by the pioneering study of Lancaster and Lee (1985), who tracked research related to acid rain over time in several databases. A more recent example is Wormell’s (2000) analysis of topics related to the Danish welfare state, a study that was able to take advantage of the availability of multiple sources of electronic information. In bibliometrics, the mapping of papers or authors in an attempt to describe areas of science is an established practice (e.g., Leydesdorff, 1989; Small, 1973; White and Griffith, 1982).
In this paper we apply Web issue analysis (Thelwall, Vann, & Fairclough, in press) to systematically identify all issues relevant to any selected health topic, or at least those issues that are reflected on the Web. In essence, the method starts with one or more topic descriptions, such as ‘nurse prescribing’, and downloads all Web pages (via Google) that allude to the topic. These Web pages are then used for a range of types of link analysis. The pages are also processed to extract their nouns and noun phrases, and a frequency table is produced giving the number of sites containing each noun or noun phrase. Nouns and noun phrases are much better indicators of the topics discussed in a document than individual words, since they can be complete concept representations. Site frequencies are reasonable indicators of the popularity of topics and are better than raw frequency counts or page-based frequency counts because Web sites are often highly repetitive, duplicating content in many or all site pages (Thelwall, 2002), which is made easy by database-driven Web site technology (Dørup, Hansen, Ribe, & Larsen, 2002). In Web issue analysis, the nouns and noun phrases extracted from topic-relevant pages are the candidate topic-relevant issues, and their site frequency counts are suggestive indicators of their topic-relevant popularity. The table of topic-relevant issues and popularities is described as the Web environment of the topic, in the belief that researchers and information managers can gain useful topic-relevant insights from it.
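The difference between site frequencies and page-based counts can be illustrated with a minimal sketch. It assumes that noun phrases have already been extracted for each page (the extraction step is not shown) and it approximates a site by the host name of the URL, which is a simplification. The names `site_frequencies` and `pages`, and the example URLs, are hypothetical and are not taken from the original study.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_frequencies(pages):
    """Count, for each noun phrase, the number of distinct sites using it.

    `pages` maps a page URL to the noun phrases found on that page.
    A "site" is approximated here by the URL's host name (an assumption).
    """
    sites_per_phrase = defaultdict(set)
    for url, phrases in pages.items():
        site = (urlsplit(url).hostname or "").lower()
        for phrase in phrases:
            # A set ignores repeats of the same phrase within one site.
            sites_per_phrase[phrase.lower()].add(site)
    return {phrase: len(sites) for phrase, sites in sites_per_phrase.items()}


# Hypothetical input: two pages on one site repeat the same phrase.
pages = {
    "http://example.org/a.html": ["nurse prescribing", "mental health"],
    "http://example.org/b.html": ["nurse prescribing"],
    "http://example.com/news.html": ["nurse prescribing"],
}
freqs = site_frequencies(pages)
```

In this toy input ‘nurse prescribing’ appears on three pages but only two sites, so a site-based count is not inflated by content repeated within a single site.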
In this paper, Web issue analysis is applied to a specific case study to demonstrate its capabilities for providing management information in a national context. The medical field chosen is nurse prescribing in the UK. The objective of the case study is to investigate whether an automated Web issue analysis can produce useful information about the context of Web publishing for nurse prescribing.
Nurse prescribing background
In the UK, recent years have seen a Department of Health initiative to train a proportion of nurses to prescribe a range of medicines. Legislation was passed in 1992 to give prescriptive powers to district nurses and health visitors so that they could legally prescribe from a restricted formulary (the Nurse Prescribers’ Formulary). The government announced in May 2001 that prescriptive authority would be extended to additional nurse roles within both primary and secondary care. Nurses can
Design of the study
The study is designed to produce three different types of information about nurse prescribing from HTML Web pages.
1. URLs of Web pages containing the phrase ‘nurse prescribing’ (henceforth: ‘nurse prescribing pages’).
2. URLs of pages linked to by the above pages (outlinks).
3. Noun phrases in nurse prescribing pages.
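The first two types of information can be sketched as follows, using only the Python standard library. This is an illustrative sketch under stated assumptions, not the software used in the study: the class `OutlinkParser`, the function `page_record` and the sample HTML are hypothetical, the phrase test is a plain substring match on the raw HTML, and noun phrase extraction (the third type) is omitted because it would need a linguistic parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class OutlinkParser(HTMLParser):
    """Collect absolute http(s) URLs from the href of each <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                absolute = urljoin(self.base_url, value)
                if urlsplit(absolute).scheme in ("http", "https"):
                    self.outlinks.append(absolute)


def page_record(url, html, phrase="nurse prescribing"):
    """Return the page URL and its outlinks if the phrase occurs, else None."""
    if phrase not in html.lower():
        return None
    parser = OutlinkParser(url)
    parser.feed(html)
    return {"url": url, "outlinks": parser.outlinks}


# Hypothetical page: one relative link is kept, the mailto link is dropped.
html = (
    '<p>Nurse prescribing news. <a href="/formulary.html">NPF</a> '
    '<a href="mailto:x@example.org">mail</a></p>'
)
record = page_record("http://example.org/index.html", html)
```

Resolving relative links against the page URL lets outlinks be compared across sites, and non-Web schemes such as `mailto:` are discarded because they do not point to pages.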
The motivating belief for collecting these three types of information is that
1. URLs may give useful information about the types and geographic locations of organisations publishing nurse
Results
The Google API searches returned 6772 URLs from 1619 domains. After downloading these URLs and excluding errors and non-HTML pages, there were a total of 1217 Web sites containing some text, although the smallest contained only a few words.
Discussion
As discussed in Section 3, all the data should be viewed in the knowledge of the limitations of its origins. The documents included are those that are (a) publicly available on the Web and (b) indexed in Google. Point (a) is a purpose of the study, but should not be forgotten, and the omission of invisible Web pages (Ru & Horowitz, 2005), and presumably many in NHSnet, is a serious concern. Viewing Web documents as a subset of all documents about the topic, the large number of academic pages is
Conclusions
The Web analysis was able to identify a number of interesting facts. Whilst many would probably serve to confirm stakeholders’ suspicions, others (e.g. mental health, the UK focus, the minor nhs.uk role, the disconnectedness of nurse prescribing Web sites) may present surprises. Overall, then, the results should help give managers an evidence-based map of online nurse prescribing information, as well as suggesting avenues for further exploration. It is important that the results of a Web issue
References (60)
- et al. Instruments to assess the quality of health information on the world wide Web: What can our patients actually use? International Journal of Medical Informatics (2005)
- Globalisation of health and medical informatics education—what are the issues? International Journal of Medical Informatics (2004)
- et al. Health informatics education for clinicians and managers—what’s holding up progress? International Journal of Medical Informatics (2004)
- et al. US academic departmental Web-site interlinking: Disciplinary differences. Library and Information Science Research (2003)
- et al. Search engine coverage bias: Evidence and possible causes. Information Processing & Management (2004)
- et al. Positive attitudes and failed queries: An exploration of the conundrums of consumer health information retrieval. International Journal of Medical Informatics (2004)
- Search engine ability to cope with the changing Web
- et al. Toward a basic framework for Webometrics. Journal of the American Society for Information Science and Technology (2004)
- et al. Scholarly communication and bibliometrics. Annual Review of Information Science and Technology (2002)
- et al. Screening internet Websites for educational potential in undergraduate medical education. Medical Informatics and the Internet in Medicine (2004)
- Early intervention and mental health. Community Practitioner
- Mining the Web: Analysis of hypertext and semi structured data
- Topcat: Data mining for topic identification in a text corpus. IEEE Transactions on Knowledge and Data Engineering
- A quality evaluation methodology of health Web-pages for non-professionals. Medical Informatics and the Internet in Medicine
- A comparison of technologies for database-driven Websites for medical education. Medical Informatics and the Internet in Medicine
- Analyzing linking practices: Candidate sites in the 2002 US electoral Web sphere. Journal of Computer Mediated Communication
- Mapping networks of support for the Zapatista movement: Applying social network analysis to study contemporary social movements
- Prescribing: The great debate. Nursing Standard
- Nurse prescribing: Lessons from the US. Nursing New Zealand
- Hyperlinks as a data source for science mapping. Journal of Information Science
- User preference as quality markers of paediatric Web sites. Medical Informatics and the Internet in Medicine
- Evidence and engagement in the introduction of nurse prescribing in New Zealand. Nurse Prescribing
- Web mining research: A survey. SIGKDD Explorations
- Bibliometric techniques applied to issues management—a case-study. Journal of the American Society for Information Science
- Effectiveness of nurse prescribing: A review of the literature. Journal of Clinical Nursing