PLIDMiner: A Quality Based Approach for Researcher’s Homepage Discovery

Ye, Junting; Qian, Yanan; Zheng, Qinghua

doi:10.1007/978-3-642-35341-3_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7675)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Researchers’ high-quality homepages are important resources in academic search because they provide comprehensive and up-to-date information about researchers. However, low-quality homepages are widespread. A case study shows that 57.8% of all homepages retrieved among the top 10 Google results are low quality and that 95% of top researchers own out-of-date homepages. In addition, some academic portals generate dynamic homepages introducing researchers; these pages are not maintained by the researchers themselves and may contain incorrect information. Existing work cannot ensure the quality of discovered homepages, which decreases the efficiency of academic search. It is difficult to define a high-quality homepage quantitatively. Instead, based on an analysis of labeled high-quality homepages, we propose the “informative researcher’s homepage”, which consists of at least identifiable information (a researcher’s basic information) and a publication list (his/her corresponding publications), as an estimate of a high-quality homepage. Based on the observation that informative researchers’ homepages are organized in two ways, integrated and scattered, we propose an effective discovery model, PLIDMiner, which achieves F1 scores over 0.9 on labeled data. Our model can also be applied to verify homepage quality. We crawl thousands of homepage resources from popular academic portals and assess their overall quality. Nearly 25% of the homepage resources in these portals turn out not to be informative, which strengthens our motivation.
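
The definition of an informative homepage can be illustrated with a minimal sketch. This is not the PLIDMiner model itself, which the abstract does not specify. The `Page` class, its boolean fields, and `classify_homepage` are hypothetical names introduced here. The sketch also assumes one reading of the two layouts: "integrated" means the identifiable information and the publication list sit on the same page, and "scattered" means the publication list sits on a page linked from the one holding the identifiable information.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """Hypothetical, hand-labeled view of a crawled page (not from the paper)."""
    url: str
    has_identifiable_info: bool   # e.g. name, affiliation, contact details
    has_publication_list: bool    # a list of the researcher's publications
    linked_pages: List["Page"] = field(default_factory=list)


def classify_homepage(page: Page) -> str:
    """Return 'integrated', 'scattered', or 'not informative'.

    Assumed reading of the abstract: an informative homepage needs both
    identifiable information and a publication list, either on one page
    (integrated) or split across directly linked pages (scattered).
    """
    if not page.has_identifiable_info:
        return "not informative"
    if page.has_publication_list:
        return "integrated"
    if any(child.has_publication_list for child in page.linked_pages):
        return "scattered"
    return "not informative"


if __name__ == "__main__":
    pubs = Page("pubs.html", False, True)
    home = Page("index.html", True, False, [pubs])
    print(classify_homepage(home))
```

In practice the two boolean fields would have to come from classifiers or extractors over page content; here they are assumed to be given so that only the integrated/scattered distinction is shown.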

Author information

Authors and Affiliations

SPKLSTN Lab, Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, 710049, China
Junting Ye, Yanan Qian & Qinghua Zheng

Editor information

Editors and Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin, 300072, China
Yuexian Hou
DIRO, University of Montreal, CP. 6128, succursale Centre-ville, H3C 3J7, Montreal, QC, Canada
Jian-Yun Nie
Institute of Software, Storage & Information Retrieval Laboratory, Chinese Academy of Sciences, 100190, Beijing, China
Le Sun
School of Computer Science and Technology, Tianjin University, 300072, Tianjin, China
Bo Wang
School of Computing, Robert Gordon University, St Andrew Street, AB25 1HG, Aberdeen, UK
Peng Zhang

About this paper

Cite this paper

Ye, J., Qian, Y., Zheng, Q. (2012). PLIDMiner: A Quality Based Approach for Researcher’s Homepage Discovery. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_17

DOI: https://doi.org/10.1007/978-3-642-35341-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)
