Rule-Based Specification and Implementation of Multimodel Data Integration

  • Conference paper
Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2017)

Abstract

An approach to rule-based specification of data integration is presented that uses the RIF-BLD logic dialect, a W3C recommendation. The approach allows entities defined in different sources and represented in different data models (relational, XML, graph-based, document-based) to be combined in the same rule. The logical semantics of RIF-BLD provides for an unambiguous interpretation of data integration rules. The paper also proposes an approach to implementing RIF-BLD rules using the IBM High-level Integration Language (HIL). Data integration rules can thus be compiled into MapReduce programs and executed over Hadoop-based distributed infrastructures.
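
To make the idea of a single rule spanning several data models concrete, the following is a minimal Python sketch, not the paper's RIF-BLD or HIL code. It assumes a hypothetical relational table `ship(name, port)` and a hypothetical JSON document source of vessel positions, and it emulates one rule of the informal form "Vessel(n, p, lat, lon) holds if ship(n, p) holds in the relational source and a position document for n exists". All names and sample values are invented for illustration.

```python
import json
import sqlite3


def relational_ships():
    """Read (name, port) tuples from a hypothetical relational source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ship (name TEXT, port TEXT)")
    conn.executemany(
        "INSERT INTO ship VALUES (?, ?)",
        [("ship-1", "port-A"), ("ship-2", "port-B")],  # made-up sample rows
    )
    rows = conn.execute("SELECT name, port FROM ship ORDER BY name").fetchall()
    conn.close()
    return rows


def document_positions():
    """Read position records from a hypothetical document (JSON) source."""
    docs = '[{"vessel": "ship-1", "lat": 69.0, "lon": 33.1}]'  # made-up data
    return json.loads(docs)


def integrate(ships, positions):
    """Emulate one integration rule whose body mixes both sources.

    The rule head (a unified vessel record) is derived only when both
    body conditions hold for the same vessel name.
    """
    by_name = {doc["vessel"]: doc for doc in positions}
    return [
        {"name": name, "port": port,
         "lat": by_name[name]["lat"], "lon": by_name[name]["lon"]}
        for name, port in ships
        if name in by_name
    ]


result = integrate(relational_ships(), document_positions())
```

Here `result` contains one unified record for `ship-1`, because `ship-2` has no matching position document. In the approach the abstract describes, such a rule would be stated declaratively and compiled for distributed execution rather than hand-coded as a join.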

Acknowledgement

The research is partially supported by the Russian Foundation for Basic Research, projects 15-29-06045 and 18-07-01434.

Author information

Corresponding author

Correspondence to Sergey Stupnikov.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Stupnikov, S. (2018). Rule-Based Specification and Implementation of Multimodel Data Integration. In: Kalinichenko, L., Manolopoulos, Y., Malkov, O., Skvortsov, N., Stupnikov, S., Sukhomlin, V. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2017. Communications in Computer and Information Science, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-319-96553-6_15

  • DOI: https://doi.org/10.1007/978-3-319-96553-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96552-9

  • Online ISBN: 978-3-319-96553-6

  • eBook Packages: Computer Science, Computer Science (R0)
