
Semantic Search and Analytics over Large Repository of Scientific Articles

Chapter in: Intelligent Tools for Building a Scientific Information Platform

Part of the book series: Studies in Computational Intelligence (SCI, volume 390)

Abstract

We present the architecture of a system for searching and synthesizing information within document repositories that originate from different sources, where documents are not necessarily provided in the same format or at the same level of detail. The system is expected to provide domain knowledge interfaces that enable its internally implemented algorithms to identify relationships between documents (as well as authors, institutions, etc.) and concepts (e.g., areas of science) extracted from various types of knowledge bases. The system should be scalable in terms of scientific content storage, performance of analytic processes, and speed of search. For compound computational tasks (such as producing richer semantic indexes to improve search), it should follow the paradigms of hierarchical modeling and computing, designed as an interaction between domain experts, system experts, and appropriately implemented intelligent modules.
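The abstract does not say how the semantic indexes are built, so the following is only a minimal sketch under stated assumptions, not the chapter's method. It normalizes records from two differently formatted sources (the source names `repo_a`/`repo_b` and all field names are hypothetical) into one common schema, annotates each document with concepts from a toy lexicon standing in for a knowledge base, and answers queries from both a keyword inverted index and a concept-to-document index. Concept recognition here is naive substring matching, a deliberate simplification.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """Common schema that every source-specific record is mapped onto."""
    doc_id: str
    title: str
    authors: list[str]
    text: str
    concepts: set[str] = field(default_factory=set)


def normalize(record: dict, source: str) -> Document:
    """Map a record from a hypothetical source onto the common Document schema."""
    if source == "repo_a":  # metadata plus abstract; authors already a list
        return Document(record["id"], record["title"], list(record["authors"]),
                        record.get("abstract", ""))
    if source == "repo_b":  # full text; authors as one semicolon-separated string
        authors = [a.strip() for a in record["creator"].split(";") if a.strip()]
        return Document(record["identifier"], record["name"], authors,
                        record.get("fulltext", ""))
    raise ValueError(f"unknown source: {source}")


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def concepts_in(text: str, lexicon: dict[str, str]) -> set[str]:
    """Concepts whose trigger phrase or label occurs in the text (naive substring match)."""
    low = text.lower()
    return {c for phrase, c in lexicon.items() if phrase in low or c.lower() in low}


class SemanticIndex:
    """Keyword inverted index extended with a concept-to-document index."""

    def __init__(self, lexicon: dict[str, str]):
        self.lexicon = lexicon  # hypothetical stand-in for a knowledge base
        self.terms: dict[str, set[str]] = defaultdict(set)
        self.concepts: dict[str, set[str]] = defaultdict(set)

    def add(self, doc: Document) -> None:
        content = f"{doc.title} {doc.text}"
        doc.concepts |= concepts_in(content, self.lexicon)
        for term in tokenize(content):
            self.terms[term].add(doc.doc_id)
        for concept in doc.concepts:
            self.concepts[concept].add(doc.doc_id)

    def search(self, query: str) -> list[str]:
        hits: set[str] = set()
        for term in tokenize(query):
            hits |= self.terms.get(term, set())
        # Concept expansion: also return documents annotated with concepts the query mentions.
        for concept in concepts_in(query, self.lexicon):
            hits |= self.concepts.get(concept, set())
        return sorted(hits)


# Usage with made-up records from the two hypothetical sources.
lexicon = {"rough set": "Rough Sets", "inverted index": "Information Retrieval"}
index = SemanticIndex(lexicon)
index.add(normalize({"id": "a1", "title": "Approximation spaces", "authors": ["X. Author"],
                     "abstract": "We study rough set approximations."}, "repo_a"))
index.add(normalize({"identifier": "b7", "name": "Index compression",
                     "creator": "Y. Author; Z. Author",
                     "fulltext": "An inverted index stores postings lists."}, "repo_b"))
print(index.search("information retrieval"))  # ['b7']
print(index.search("rough sets"))             # ['a1']
```

The concept index lets the query "information retrieval" reach document `b7`, which never uses that phrase. A fuller system would presumably replace the lexicon with real knowledge bases and the substring test with proper concept extraction.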




Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

Nguyen, H.S., Ślęzak, D., Skowron, A., Bazan, J.G. (2012). Semantic Search and Analytics over Large Repository of Scientific Articles. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Niezgódka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24809-2_1


  • DOI: https://doi.org/10.1007/978-3-642-24809-2_1


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24808-5

  • Online ISBN: 978-3-642-24809-2

  • eBook Packages: Engineering, Engineering (R0)
