Reliability Prediction of Webpages in the Medical Domain

Sondhi, Parikshit; Vydiswaran, V. G. Vinod; Zhai, ChengXiang

doi:10.1007/978-3-642-28997-2_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7224)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

In this paper, we study how to automatically predict the reliability of webpages in the medical domain. Assessing the reliability of online medical information is especially critical because it may influence vulnerable patients seeking help online. Unfortunately, no automated systems are currently available that can classify a medical webpage as reliable, and manual assessment cannot scale to the large number of medical pages on the Web. We propose a supervised learning approach to automatically predict the reliability of medical webpages. We developed a gold standard dataset using the standard reliability criteria defined by the Health On the Net Foundation and systematically experimented with different link-based and content-based feature sets. Our experiments show promising results, with prediction accuracies of over 80%. We also show that our proposed prediction method is useful in applications such as reliability-based re-ranking and automatic website accreditation.
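To make the re-ranking application concrete, the following is a minimal sketch of how a predicted reliability score could be combined with a search engine's relevance score. It is not the method from the paper: the `Result` type, the `rerank` function, the linear interpolation, the `weight` parameter, and the example URLs and scores are all assumptions introduced here for illustration. The reliability values stand in for the output of a trained classifier.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float    # score from the underlying search engine, assumed in [0, 1]
    reliability: float  # classifier's estimated probability that the page is reliable


def rerank(results, weight=0.5):
    """Order results by a weighted mix of relevance and predicted reliability.

    The linear interpolation and the `weight` parameter are illustrative
    assumptions, not the scoring used in the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")

    def combined(r):
        return (1.0 - weight) * r.relevance + weight * r.reliability

    # sorted() is stable, so tied results keep their original order.
    return sorted(results, key=combined, reverse=True)


# Hypothetical search results; URLs and scores are made up for this sketch.
results = [
    Result("http://example.org/miracle-cure", relevance=0.90, reliability=0.10),
    Result("http://example.org/clinic-guide", relevance=0.80, reliability=0.95),
    Result("http://example.org/forum-thread", relevance=0.70, reliability=0.40),
]

for r in rerank(results, weight=0.5):
    print(r.url)
```

With `weight=0.0` the original relevance ordering is preserved, and larger values push pages with low predicted reliability further down the list. How much weight reliability should receive is a design choice that would need to be tuned against user judgments.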

Author information

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Parikshit Sondhi, V. G. Vinod Vydiswaran & ChengXiang Zhai

Editor information

Editors and Affiliations

Yahoo! Research, Diagonal 177, 08018, Barcelona, Spain
Ricardo Baeza-Yates & B. Barla Cambazoglu
Centrum Wiskunde & Informatica, Science Park 123, Amsterdam, The Netherlands
Arjen P. de Vries
Websays, Nàpols 294 7-4, 08025, Barcelona, Spain
Hugo Zaragoza
Yahoo! Research, Diagonal 177, 08018, Barcelona, Spain
Vanessa Murdock
Yahoo! Labs, Tower 3, Matam Park, 31905, Haifa, Israel
Ronny Lempel
ISTI-CNR, via G. Moruzzi, 1, 56124, Pisa, Italy
Fabrizio Silvestri

About this paper

Cite this paper

Sondhi, P., Vydiswaran, V.G.V., Zhai, C. (2012). Reliability Prediction of Webpages in the Medical Domain. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_19

DOI: https://doi.org/10.1007/978-3-642-28997-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28996-5
Online ISBN: 978-3-642-28997-2
eBook Packages: Computer Science, Computer Science (R0)