Abstract
In this paper, we propose a method for extracting laboratory front pages from university websites. There are more than 779 universities and colleges in Japan. When choosing among them, some high school students want to know which laboratories each institution has. To find out, they have to search for laboratory front pages on university websites. However, a laboratory front page can be difficult to find because it is sometimes buried deep in the hierarchy of a university website. Our method extracts laboratory front pages by using a support vector machine model and applying certain rules. We also developed a laboratory search system for retrieving the laboratory front pages extracted with our method. We evaluated our method and confirmed that it attained 85.0% precision and 65.5% recall.
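To make the reported evaluation metrics concrete, the following minimal sketch shows how precision and recall are commonly computed for a page-extraction task. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `precision_recall` and the example URLs are hypothetical, and the toy numbers are unrelated to the 85.0% and 65.5% figures above.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for extracted URLs against a gold-standard set.

    precision = correctly extracted pages / all extracted pages
    recall    = correctly extracted pages / all true laboratory front pages
    """
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: four true laboratory front pages (gold standard).
gold = {
    "https://www.example.ac.jp/lab-a/",
    "https://www.example.ac.jp/lab-b/",
    "https://www.example.ac.jp/lab-c/",
    "https://www.example.ac.jp/lab-d/",
}

# Hypothetical extractor output: two correct pages and one false positive.
predicted = {
    "https://www.example.ac.jp/lab-a/",
    "https://www.example.ac.jp/lab-b/",
    "https://www.example.ac.jp/admissions/",
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy case, two of the three extracted pages are correct and two of the four true pages are found. A method with higher precision than recall, as reported in the abstract, returns mostly correct pages but misses some of the true laboratory front pages.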
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number 15K00315.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Sakaji, H., Miyazaki, A., Sakai, H., Izumi, K. (2018). Extracting Laboratory Front Pages from University Websites. In: Barolli, L., Enokido, T., Takizawa, M. (eds) Advances in Network-Based Information Systems. NBiS 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 7. Springer, Cham. https://doi.org/10.1007/978-3-319-65521-5_103
DOI: https://doi.org/10.1007/978-3-319-65521-5_103
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65520-8
Online ISBN: 978-3-319-65521-5
eBook Packages: Engineering, Engineering (R0)