Extracting Laboratory Front Pages from University Websites

  • Conference paper
Advances in Network-Based Information Systems (NBiS 2017)

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 7)

Abstract

In this paper, we propose a method for extracting laboratory front pages from university websites. There are more than 779 universities and colleges in Japan. When choosing a university or college, some high school students want to know which laboratories it has. To learn about these laboratories, they have to search for the laboratory front pages on the university websites. However, a laboratory front page can be difficult to find because it is sometimes buried deep in the hierarchy of a university website. Our method extracts laboratory front pages by using a support vector machine model and applying certain rules. We also developed a laboratory search system that can be used to retrieve the laboratory front pages extracted with our method. We evaluated our method and confirmed that it attained 85.0% precision and 65.5% recall.

Notes

  1. http://www.mext.go.jp/b_menu/toukei/chousa01/kihon/kekka/k_detail/1365622.htm

  2. http://taku910.github.io/mecab/

  3. http://hawk.ci.seikei.ac.jp/Lilas/

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 15K00315.

Author information

Corresponding author

Correspondence to Hiroki Sakaji.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Sakaji, H., Miyazaki, A., Sakai, H., Izumi, K. (2018). Extracting Laboratory Front Pages from University Websites. In: Barolli, L., Enokido, T., Takizawa, M. (eds) Advances in Network-Based Information Systems. NBiS 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 7. Springer, Cham. https://doi.org/10.1007/978-3-319-65521-5_103

  • DOI: https://doi.org/10.1007/978-3-319-65521-5_103

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65520-8

  • Online ISBN: 978-3-319-65521-5

  • eBook Packages: Engineering, Engineering (R0)
