Abstract
We present a relational database schema aimed at efficient storage and querying of parsed scientific articles, as well as entities corresponding to researchers, institutions, scientific areas, et cetera. An important requirement for the proposed model is to handle various types of entities without increasing the schema’s complexity. Another is to store detailed information about parsed articles, so that advanced analytics can be conducted in combination with domain knowledge about scientific topics, using standard SQL and RDBMS management. The overall goal is to enable offline, possibly incremental computation of semantic indexes that support end users via other modules that are optimized for fast search and not necessarily for fast analytics, as well as direct ad-hoc SQL access for the most advanced users.
Copyright information
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this chapter
Cite this chapter
Kowalski, M., Ślęzak, D., Stencel, K., Pardel, P., Grzegorowski, M., Kijowski, M. (2012). RDBMS Model for Scientific Articles Analytics. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Niezgódka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24809-2_4
DOI: https://doi.org/10.1007/978-3-642-24809-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24808-5
Online ISBN: 978-3-642-24809-2
eBook Packages: Engineering, Engineering (R0)