Heterogeneous data format integration and conversion (HDFIC) using machine learning and IBM-DFDL for IoT

  • Original Paper
Evolving Systems

Abstract

The future of the Internet of Things (IoT) demands the integration of synergetic applications to cater to societal needs. Examples of IoT-based confederated applications include Ambient Assisted Living with Active Healthy Ageing, CasAware with Smart Energy, and Smart Gas Distribution Networks with GIS systems. However, data heterogeneity hinders integration, as these systems follow different standards, data formats, semantic models, and representations, which in turn leads to data interoperability issues in IoT. The major concern of academia and industry in integrating heterogeneous applications smoothly is interpreting different data formats and representing them in a common schema for further analysis. Existing solutions, such as message payload translation, middleware/cloud format, and Inter-IoT, are complex, time-consuming, and ineffective. Hence, this paper proposes heterogeneous data format integration and conversion (HDFIC), a machine learning-based system that identifies data formats using a Random Forest classifier and integrates them using the Data Format Description Language (DFDL). The content-based data format identification in HDFIC is trained with the standard features defined in RFC 7111, 8259, and 8996. Subsequently, the data is integrated into a single XML Schema Definition and converted into the required data format using the IBM App Connect Enterprise tool and DFDL. Finally, the performance of HDFIC is evaluated on a synergetic dataset of patient body vitals and room ambiance in terms of accuracy, data integration time, and conversion efficiency.
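
As a rough illustration of what content-based format identification might look like, the sketch below derives a few structural features from a raw payload string. The feature names, the helper `_parses`, and the choice of features are assumptions made for illustration only; the abstract does not list the features the authors actually use. Feature dictionaries of this kind could then be passed to a classifier such as a Random Forest, which is not shown here.

```python
import json
import xml.etree.ElementTree as ET


def _parses(parser, text):
    """Return True if `parser` accepts `text` without raising."""
    try:
        parser(text)
        return True
    except Exception:
        return False


def extract_features(payload: str) -> dict:
    """Compute hypothetical structural features of a payload.

    These features are illustrative assumptions, not the paper's feature set.
    """
    text = payload.strip()
    n = max(len(text), 1)  # avoid division by zero on empty payloads
    lines = text.splitlines() or [""]
    commas_per_line = [line.count(",") for line in lines]
    return {
        # First-character hints: JSON documents usually open with { or [,
        # XML documents with <.
        "starts_with_brace": int(text[:1] in ("{", "[")),
        "starts_with_angle": int(text[:1] == "<"),
        # Density of delimiter characters relative to payload length.
        "brace_ratio": sum(text.count(c) for c in "{}[]") / n,
        "angle_ratio": sum(text.count(c) for c in "<>") / n,
        "colon_ratio": text.count(":") / n,
        "comma_ratio": text.count(",") / n,
        # CSV-like payloads tend to have the same comma count on every line.
        "uniform_commas": int(
            len(lines) > 1
            and commas_per_line[0] > 0
            and len(set(commas_per_line)) == 1
        ),
        # Whether the standard-library parsers accept the payload.
        "parses_as_json": int(_parses(json.loads, text)),
        "parses_as_xml": int(_parses(ET.fromstring, text)),
    }
```

For example, `extract_features("patient_id,temp\n1,36.6\n2,37.0")` sets `uniform_commas` to 1 while both parser flags stay 0, which is the kind of signal a classifier could use to separate CSV from JSON and XML.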

Data availability

All data generated or analyzed during this study are included in this published article. Methods to generate the dataset are mentioned in the article. A sample dataset is presented in the article.

Notes

  1. https://www.ibm.com/docs/en/app-connect/11.0.0?topic=app-connect-enterprise-software.

  2. https://www.ibm.com/docs/en/app-connect/11.0.0?topic=model-data-format-description-language-dfdl.

  3. https://www.ibm.com/docs/en/integration-bus/10.0?topic=esql-overview.

  4. https://www.igi-global.com/dictionary/ambient-assisted-living/33084.

  5. http://www.activageproject.eu/.

  6. https://json-ld.org/.

  7. https://inter-iot.eu/.

  8. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html.

  9. https://www.python.org/downloads/.

  10. https://numpy.org/doc/stable/index.html.

  11. https://pandas.pydata.org/docs/index.html.

  12. https://scikit-learn.org/stable/install.html.

  13. https://xgboost.readthedocs.io/en/stable/python/python_intro.html.

  14. https://pypi.org/project/matplotlib/.

  15. https://pypi.org/project/seaborn/.

  16. https://pypi.org/project/torch/.

  17. https://docs.python.org/3/library/difflib.html.

Funding

The authors did not receive support from any organization for the submitted work. No funding was received to assist with the preparation of this manuscript.

Author information

Contributions

The major contributions of the authors are as follows:

  • Identification of the format of IoT data originating from multiple sources using a content-based approach.
  • Integration of the identified heterogeneous-format IoT data using a common field identifier.
  • Representation of the integrated heterogeneous-format IoT data in an appropriate format using the Data Format Description Language (DFDL).
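
The idea of integrating records through a common field identifier and then representing them in one target format can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the standard library rather than DFDL or IBM App Connect Enterprise, and the function names, the `patient_id` key, and the sample fields are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def records_from_json(text):
    """Parse a JSON payload into a list of dict records."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def records_from_csv(text):
    """Parse a CSV payload (with header row) into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def integrate(sources, key):
    """Merge records from several sources that share the field `key`."""
    merged = {}
    for records in sources:
        for rec in records:
            # Normalise every value to a string so sources agree on types.
            fields = {name: str(value) for name, value in rec.items()}
            merged.setdefault(fields[key], {}).update(fields)
    return merged


def to_xml(merged, root_tag="records", item_tag="record"):
    """Serialise the merged records as a single XML document."""
    root = ET.Element(root_tag)
    for _, fields in sorted(merged.items()):
        item = ET.SubElement(root, item_tag)
        for name, value in fields.items():
            ET.SubElement(item, name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data: body vitals as JSON, room ambiance as CSV.
vitals = '[{"patient_id": 1, "heart_rate": 72}]'
ambiance = "patient_id,room_temp\n1,24.5\n"
merged = integrate(
    [records_from_json(vitals), records_from_csv(ambiance)], key="patient_id"
)
xml_text = to_xml(merged)
```

Here the two sources are joined on `patient_id`, so `merged["1"]` holds the heart rate and the room temperature together before the combined record is written out as XML.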

Corresponding author

Correspondence to Sandeep M.

Ethics declarations

Competing interests:

The authors have no relevant financial or nonfinancial interests to disclose.

Consent

The authors have followed ethical standards and have no conflict of interest to disclose.

Materials and/or Code availability

Data will be provided on request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

M, S., Chandavarkar, B.R. & Khatri, S. Heterogeneous data format integration and conversion (HDFIC) using machine learning and IBM-DFDL for IoT. Evolving Systems 15, 375–396 (2024). https://doi.org/10.1007/s12530-024-09568-7

  • DOI: https://doi.org/10.1007/s12530-024-09568-7
