PLIDMiner: A Quality Based Approach for Researcher’s Homepage Discovery

  • Conference paper
  • Information Retrieval Technology (AIRS 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7675)

Abstract

High-quality researcher homepages are important resources in academic search because they provide comprehensive and up-to-date information about researchers. Low-quality homepages, however, are widespread. A case study shows that 57.8% of the homepages retrieved among the top 10 Google results are low quality and that 95% of top researchers have out-of-date homepages. In addition, some academic portals generate dynamic homepages that introduce researchers; these pages are not maintained by the researchers themselves and may contain incorrect information. Existing work cannot ensure the quality of discovered homepages, which reduces the efficiency of academic search. Because a high-quality homepage is difficult to define quantitatively, we instead analyze labeled high-quality homepages and propose the “informative researcher’s homepage” as an estimate of a high-quality homepage: a homepage that contains at least identifiable information (the researcher’s basic information) and a publication list (the researcher’s publications). Based on the observation that informative researchers’ homepages are organized in two ways, integrated and scattered, we propose an effective discovery model, PLIDMiner, which achieves F1 scores over 0.9 on labeled data. The model can also be applied to verify the quality of homepages. We crawl thousands of homepage resources from popular academic portals and assess their overall quality. Nearly 25% of the homepage resources in these portals turn out not to be informative, which strengthens our motivation.
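The abstract does not give PLIDMiner's features or learning method, so the following is only an illustrative sketch of the "informative homepage" criterion and of the F1 measure it reports. Everything in it is an assumption: the cue patterns (`EMAIL_RE`, `YEAR_RE`), the `Page` type, the thresholds, and the reading of "integrated" as both components on one page and "scattered" as a publication list on a linked page. It is a simple rule-based stand-in, not the authors' model.

```python
import re
from dataclasses import dataclass

# Hypothetical cue patterns; the paper's real features are not stated in the abstract.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
YEAR_RE = re.compile(r"\b(19|20)\d{2}\b")


@dataclass
class Page:
    url: str
    text: str
    linked_texts: tuple = ()  # text of pages linked from this one, assumed already fetched


def has_identifiable_info(text, name):
    """Crude proxy: the researcher's name and an e-mail address both appear."""
    return name.lower() in text.lower() and EMAIL_RE.search(text) is not None


def has_publication_list(text, min_entries=3):
    """Crude proxy: at least `min_entries` lines mention a publication year."""
    dated_lines = [line for line in text.splitlines() if YEAR_RE.search(line)]
    return len(dated_lines) >= min_entries


def classify(page, name):
    """Return 'integrated', 'scattered', or 'not informative' (assumed labels)."""
    if not has_identifiable_info(page.text, name):
        return "not informative"
    if has_publication_list(page.text):
        return "integrated"  # both components on the same page
    if any(has_publication_list(t) for t in page.linked_texts):
        return "scattered"   # publication list lives on a linked page
    return "not informative"


def f1_score(predicted, actual):
    """F1 = harmonic mean of precision and recall over boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A page whose text holds the name, an e-mail address, and three dated lines would be labeled "integrated" under these rules; the same page without the dated lines, but linking to a page that has them, would be labeled "scattered".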

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ye, J., Qian, Y., Zheng, Q. (2012). PLIDMiner: A Quality Based Approach for Researcher’s Homepage Discovery. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_17

  • DOI: https://doi.org/10.1007/978-3-642-35341-3_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35340-6

  • Online ISBN: 978-3-642-35341-3

  • eBook Packages: Computer Science, Computer Science (R0)