Data Mining, Management and Visualization in Large Scientific Corpuses

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9654)

Abstract

Organizing scientific papers helps researchers efficiently derive meaningful insights from published scientific resources, enables them to grasp rapid technological change, and hence assists new scientific discovery. In this paper, we experiment with text mining and data management of scientific publications to collect and present useful information in support of research. For efficient data management and fast information retrieval, four data stores are employed: a semantic repository, an index and search repository, a document repository, and a graph repository, taking full advantage of the features and strengths of each. The results show that the combination of these four repositories can effectively store and index the publication data with reliability and efficiency, and hence supply meaningful information to support scientific research.
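As a rough illustration of the four-repository idea described in the abstract, the sketch below routes one publication record into four in-memory structures, one per repository role. It is a minimal sketch under stated assumptions: the class name `PublicationStore`, the record fields (`id`, `title`, `authors`, `cites`), and the query methods are all hypothetical, and the plain Python containers are stand-ins for real storage systems, not the authors' implementation.

```python
from collections import defaultdict


class PublicationStore:
    """Hypothetical in-memory stand-ins for the four repositories in the abstract."""

    def __init__(self):
        self.documents = {}            # document repository: id -> full record
        self.triples = set()           # semantic repository: (subject, predicate, object)
        self.index = defaultdict(set)  # index and search repository: term -> paper ids
        self.graph = defaultdict(set)  # graph repository: paper id -> ids it cites

    def add(self, paper):
        """Store one record (assumed fields: id, title, authors, cites) in all four stores."""
        pid = paper["id"]
        self.documents[pid] = paper
        self.triples.add((pid, "title", paper["title"]))
        for author in paper["authors"]:
            self.triples.add((pid, "author", author))
        for term in paper["title"].lower().split():
            self.index[term].add(pid)
        for cited in paper.get("cites", []):
            self.graph[pid].add(cited)

    def search(self, term):
        """Keyword lookup over titles via the inverted index."""
        return sorted(self.index.get(term.lower(), set()))

    def papers_by(self, author):
        """Pattern match over the triples: all papers with the given author."""
        return sorted(s for s, p, o in self.triples if p == "author" and o == author)

    def cited_by(self, pid):
        """Reverse traversal of the citation graph: papers that cite pid."""
        return sorted(src for src, targets in self.graph.items() if pid in targets)


store = PublicationStore()
store.add({"id": "p1", "title": "Graph Mining Basics",
           "authors": ["A. Author"], "cites": []})
store.add({"id": "p2", "title": "Text Mining at Scale",
           "authors": ["A. Author", "B. Writer"], "cites": ["p1"]})

print(store.search("mining"))
print(store.papers_by("B. Writer"))
print(store.cited_by("p1"))
```

Each query is answered by the structure best suited to it (keyword lookup by the index, author lookup by the triples, citation lookup by the graph), while the full record is kept only once, in the document store. That division of labour is the point the abstract makes about combining repositories.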

Acknowledgments

This research is supported by the Dr Inventor project (European Union Seventh Framework Programme, FP7/2007-2013, grant agreement no. 611383) and the CARRE project (Seventh Framework Programme of the European Commission, ICT, grant agreement FP7-ICT-611140).

Corresponding author

Correspondence to Hui Wei.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Wei, H. et al. (2016). Data Mining, Management and Visualization in Large Scientific Corpuses. In: El Rhalibi, A., Tian, F., Pan, Z., Liu, B. (eds) E-Learning and Games. Edutainment 2016. Lecture Notes in Computer Science, vol 9654. Springer, Cham. https://doi.org/10.1007/978-3-319-40259-8_32

  • DOI: https://doi.org/10.1007/978-3-319-40259-8_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40258-1

  • Online ISBN: 978-3-319-40259-8

  • eBook Packages: Computer Science (R0)
