Reliability Prediction of Webpages in the Medical Domain

  • Conference paper
Advances in Information Retrieval (ECIR 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7224)

Abstract

In this paper, we study how to automatically predict the reliability of webpages in the medical domain. Assessing the reliability of online medical information is especially critical because it may influence vulnerable patients seeking help online. Unfortunately, no automated systems are currently available that can classify a medical webpage as reliable, and manual assessment cannot scale to the large number of medical pages on the Web. We propose a supervised learning approach to automatically predict the reliability of medical webpages. We developed a gold standard dataset using the standard reliability criteria defined by the Health On the Net Foundation and systematically experimented with different link-based and content-based feature sets. Our experiments show promising results, with prediction accuracies of over 80%. We also show that our proposed prediction method is useful in applications such as reliability-based re-ranking and automatic website accreditation.
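The abstract names reliability-based re-ranking as one application but does not say how scores are combined. As a rough illustration only, the sketch below assumes each search result already carries a relevance score and a classifier-estimated reliability score, both in [0, 1], and mixes them linearly. The `Result` type, the `rerank` function, the `weight` parameter, the example URLs, and the linear mix itself are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float    # assumed: score from the underlying search engine, in [0, 1]
    reliability: float  # assumed: classifier's estimated reliability, in [0, 1]


def rerank(results, weight=0.5):
    """Order results by a linear mix of relevance and predicted reliability.

    weight=0 keeps the original relevance ordering; weight=1 ranks by
    reliability alone. This mixing rule is an assumption for illustration.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")

    def combined(r):
        return (1.0 - weight) * r.relevance + weight * r.reliability

    return sorted(results, key=combined, reverse=True)


# Hypothetical results; the scores are made up for the example.
results = [
    Result("https://example.org/miracle-cure", relevance=0.9, reliability=0.1),
    Result("https://example.org/clinical-guideline", relevance=0.8, reliability=0.9),
    Result("https://example.org/forum-post", relevance=0.6, reliability=0.5),
]

for r in rerank(results, weight=0.5):
    print(r.url)
```

With an equal weighting, a highly relevant but unreliable page drops below a slightly less relevant page that the classifier considers reliable. In practice the weight would need to be tuned on held-out judgments.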

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sondhi, P., Vydiswaran, V.G.V., Zhai, C. (2012). Reliability Prediction of Webpages in the Medical Domain. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_19

  • DOI: https://doi.org/10.1007/978-3-642-28997-2_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28996-5

  • Online ISBN: 978-3-642-28997-2

  • eBook Packages: Computer Science, Computer Science (R0)
