SONCA: Scalable Semantic Processing of Rapidly Growing Document Stores

  • Conference paper
New Trends in Databases and Information Systems

Abstract

Scientific data constitutes a great asset. However, its volume is far greater than any human can comprehend, so automatic analytical, search, and indexing solutions are called for. In this paper we present the architecture and the data model of such a system. SONCA (Search based on ONtologies and Compound Analytics) is a platform for implementing and exploiting intelligent algorithms that identify relations between various types of objects (publications, inventions, scientists and institutions). The results of these algorithms can be used to build semantic search engines, but they can also be fed into further analytical algorithms in order to find even more associations. We also present an experimental evaluation of SONCA’s performance. The results are promising, and we argue that SONCA’s architecture is robust.

Author information

Corresponding author

Correspondence to Marek Grzegorowski.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grzegorowski, M., Pardel, P.W., Stawicki, S., Stencel, K. (2013). SONCA: Scalable Semantic Processing of Rapidly Growing Document Stores. In: Pechenizkiy, M., Wojciechowski, M. (eds) New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32518-2_9

  • DOI: https://doi.org/10.1007/978-3-642-32518-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32517-5

  • Online ISBN: 978-3-642-32518-2

  • eBook Packages: Engineering, Engineering (R0)
