Abstract
In this paper, we study how to automatically predict the reliability of webpages in the medical domain. Assessing the reliability of online medical information is especially critical because it may influence vulnerable patients who seek help online. Unfortunately, no automated system is currently available that can classify a medical webpage as reliable, and manual assessment cannot scale to the large number of medical pages on the Web. We propose a supervised learning approach to automatically predict the reliability of medical webpages. We developed a gold standard dataset using the standard reliability criteria defined by the Health On the Net Foundation (HON) and systematically experimented with different link-based and content-based feature sets. Our experiments show promising results, with prediction accuracies of over 80%. We also show that the proposed prediction method is useful in applications such as reliability-based re-ranking and automatic website accreditation.
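The abstract does not specify the learner or the exact features, so the following is only a minimal sketch under stated assumptions. Content features are approximated by a binary bag of words, the classifier is plain logistic regression trained by batch gradient descent (a swapped-in technique, not necessarily the model the authors used), and link-based features are omitted. Reliability-based re-ranking is illustrated by linearly interpolating each result's relevance score with its predicted reliability through a hypothetical weight `lam`. The toy training pages and all function names are illustrative only.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tokenize(text):
    # Lowercased whitespace tokens: a crude stand-in for real content features.
    return text.lower().split()

def build_vocab(docs):
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def featurize(text, vocab):
    # Binary bag-of-words vector; out-of-vocabulary tokens are ignored.
    present = set(tokenize(text))
    return [1.0 if word in present else 0.0 for word in vocab]

def train_logreg(X, y, epochs=200, lr=0.5):
    # Batch gradient descent on the logistic loss (assumed learner, for illustration).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def reliability(text, model, vocab):
    # Predicted probability that a page is reliable.
    w, b = model
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def rerank(results, model, vocab, lam=0.5):
    # results: (doc_id, relevance in [0, 1], page text).
    # Hypothetical combined score: (1 - lam) * relevance + lam * reliability.
    scored = [((1 - lam) * rel + lam * reliability(text, model, vocab), doc_id)
              for doc_id, rel, text in results]
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: s[0], reverse=True)]

# Toy labelled pages (1 = reliable, 0 = unreliable); purely illustrative data.
TRAIN = [
    ("written by a licensed physician with references and disclaimer", 1),
    ("author credentials and references listed privacy policy", 1),
    ("miracle cure buy now guaranteed results", 0),
    ("secret miracle remedy buy today", 0),
]
vocab = build_vocab([text for text, _ in TRAIN])
model = train_logreg([featurize(text, vocab) for text, _ in TRAIN],
                     [label for _, label in TRAIN])

results = [
    ("spam", 0.9, "buy miracle cure now"),
    ("hon", 0.8, "physician references and disclaimer"),
]
print(rerank(results, model, vocab, lam=0.5))
```

Setting `lam=0` reproduces the original relevance ranking, while larger values let a page that looks more reliable overtake a slightly more relevant but less trustworthy one. A real system would replace the toy tokens with richer link-based and content-based features and tune `lam` on held-out queries.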
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sondhi, P., Vydiswaran, V.G.V., Zhai, C. (2012). Reliability Prediction of Webpages in the Medical Domain. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_19
DOI: https://doi.org/10.1007/978-3-642-28997-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28996-5
Online ISBN: 978-3-642-28997-2
eBook Packages: Computer Science, Computer Science (R0)