Abstract
This paper addresses the problem of finding and extracting academic information from conference Web pages, organizing that information as ontologies, and generating academic linked data by matching these ontologies. The main contributions are as follows. (1) A topic-crawling method and a lightweight crawling method based on search engines are presented; crawling seeds, the filtering of relevant websites, and the crawling update strategy are discussed. (2) A new vision-based approach for extracting academic information is proposed. It first segments Web pages into text blocks and then classifies these blocks into predefined categories. The initial classification results are improved by post-processing, and academic information is then extracted from the classified text blocks. (3) A global ontology is used to describe the background domain knowledge, and the academic information extracted from each website is organized as a local ontology. Finally, academic linked data is generated by matching all local ontologies.
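The four stages of contribution (2), segmentation, classification, post-processing, and extraction, can be illustrated with a minimal sketch. It is not the authors' implementation: splitting on blank lines stands in for vision-based segmentation, the keyword cues in `CATEGORY_CUES` stand in for a trained classifier, and the category names and the neighbour-smoothing rule in `post_process` are hypothetical choices made only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical categories and keyword cues; the abstract does not list
# the paper's actual categories or classification features.
CATEGORY_CUES = {
    "date": re.compile(r"\b(deadline|due|notification)\b", re.I),
    "committee": re.compile(r"\b(chair|committee)\b", re.I),
    "topic": re.compile(r"\b(topics?|areas? of interest)\b", re.I),
}


@dataclass
class TextBlock:
    text: str
    label: str = "other"


def segment(page_text):
    """Split a page into text blocks on blank lines.

    Assumption: a stand-in for vision-based segmentation, which would
    use rendered layout cues rather than raw text.
    """
    chunks = re.split(r"\n\s*\n", page_text)
    return [TextBlock(chunk.strip()) for chunk in chunks if chunk.strip()]


def classify(blocks):
    """Assign each block the first category whose cue matches its text."""
    for block in blocks:
        for label, cue in CATEGORY_CUES.items():
            if cue.search(block.text):
                block.label = label
                break
    return blocks


def post_process(blocks):
    """Relabel an 'other' block that sits between two same-label neighbours."""
    for i in range(1, len(blocks) - 1):
        prev_label = blocks[i - 1].label
        next_label = blocks[i + 1].label
        if (blocks[i].label == "other"
                and prev_label == next_label
                and prev_label != "other"):
            blocks[i].label = prev_label
    return blocks


def extract(blocks):
    """Group the text of classified blocks by category, dropping 'other'."""
    result = {}
    for block in blocks:
        if block.label != "other":
            result.setdefault(block.label, []).append(block.text)
    return result


def run_pipeline(page_text):
    """Chain the four stages: segment, classify, post-process, extract."""
    return extract(post_process(classify(segment(page_text))))
```

In this sketch the post-processing step is what rescues a block such as a date line with no keyword cue when it appears between two blocks already labelled as dates; the paper's own post-processing may work quite differently.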
Acknowledgments
This work is supported by the NSF of China (61003156 and 61003055) and the Natural Science Foundation of Jiangsu Province (BK2009136 and BK2011335).
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Wang, P., Zhang, X. (2013). Finding, Extracting, and Building Academic Linked Data. In: Li, J., Qi, G., Zhao, D., Nejdl, W., Zheng, HT. (eds) Semantic Web and Web Science. Springer Proceedings in Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6880-6_3
DOI: https://doi.org/10.1007/978-1-4614-6880-6_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6879-0
Online ISBN: 978-1-4614-6880-6
eBook Packages: Computer Science, Computer Science (R0)