Towards Increasing Open Data Adoption Through Stream Data Integration and Imputation

  • Conference paper
Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices (IEA/AIE 2021)

Abstract

Open data portals are used to make a growing number of government data resources public. However, difficulties in preprocessing and integrating multiple possibly incomplete open data resources hinder the potential of open data re-use for software development. While incomplete data sets can be imputed in an offline process, this is not the case for data streams expected to be published in an online manner.

In this work, we propose a novel data stream preprocessing method aimed at simplifying open data stream re-use through the unification of time resolution and imputation of incomplete instances. The method relies on stream mining methods to predict categorical values to be imputed. A separate online learning model is built for every incomplete feature. The method we propose allows the model to benefit from both inter-feature similarities and temporal dependencies present in data streams. We validate the proposed method with public transport data streams.
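The two preprocessing steps named in the abstract, unifying time resolution and imputing missing categorical values with one online model per incomplete feature, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the learners, so a simple count-based voting model stands in for the stream mining methods, and all names (`unify_resolution`, `OnlineCategoricalImputer`, the `line` and `delay` features) are hypothetical. Each per-feature model uses the other features of the current instance (inter-feature similarities) and the previous value of the same feature (temporal dependency) as context.

```python
from collections import Counter, defaultdict


def unify_resolution(records, step):
    """Merge (timestamp, values) records into buckets of `step` time units.

    Within a bucket, later non-missing values overwrite earlier ones.
    """
    buckets = {}
    for ts, values in records:
        bucket = buckets.setdefault(ts - ts % step, {})
        bucket.update({k: v for k, v in values.items() if v is not None})
    return sorted(buckets.items())


class OnlineCategoricalImputer:
    """Illustrative stand-in: one count-based online model per feature."""

    def __init__(self, features):
        self.features = list(features)
        # counts[target][(context_name, context_value)] -> Counter of target values
        self.counts = {f: defaultdict(Counter) for f in self.features}
        self.priors = {f: Counter() for f in self.features}
        self.last_seen = {}  # most recent value per feature (observed or imputed)

    def _context(self, target, instance):
        # Inter-feature context: other features observed in this instance.
        ctx = [(name, value) for name, value in instance.items()
               if name != target and value is not None]
        # Temporal context: previous value of the target feature itself.
        if target in self.last_seen:
            ctx.append(("prev:" + target, self.last_seen[target]))
        return ctx

    def _predict(self, target, instance):
        votes = Counter()
        for key in self._context(target, instance):
            votes.update(self.counts[target][key])
        if not votes:
            votes = self.priors[target]  # fall back to overall frequencies
        return votes.most_common(1)[0][0] if votes else None

    def process(self, instance):
        """Impute missing features, then learn from the observed ones."""
        completed = dict(instance)
        for target in self.features:
            observed = instance.get(target)
            if observed is None:
                completed[target] = self._predict(target, instance)
            else:
                for key in self._context(target, instance):
                    self.counts[target][key][observed] += 1
                self.priors[target][observed] += 1
        for name in self.features:
            if completed.get(name) is not None:
                self.last_seen[name] = completed[name]
        return completed


# Hypothetical public transport stream: the last instance lacks "delay".
imputer = OnlineCategoricalImputer(["line", "delay"])
stream = [
    {"line": "A", "delay": "late"},
    {"line": "B", "delay": "on_time"},
    {"line": "A", "delay": "late"},
    {"line": "A", "delay": None},
]
results = [imputer.process(instance) for instance in stream]
```

In this sketch every instance is processed exactly once, so imputation happens online as the stream arrives; a real deployment would likely replace the voting model with an incremental classifier from a stream mining library.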

Notes

  1. Highways Agency network journey time and traffic flow data was an example of such a data set.

Acknowledgements

The project was funded by the POB Research Centre for Artificial Intelligence and Robotics of Warsaw University of Technology within the Excellence Initiative Program - Research University (ID-UB).

Author information

Corresponding author

Correspondence to Maciej Grzenda.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kunicki, R., Grzenda, M. (2021). Towards Increasing Open Data Adoption Through Stream Data Integration and Imputation. In: Fujita, H., Selamat, A., Lin, J.CW., Ali, M. (eds) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_2

  • DOI: https://doi.org/10.1007/978-3-030-79457-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79456-9

  • Online ISBN: 978-3-030-79457-6

  • eBook Packages: Computer Science, Computer Science (R0)
