REMI: A framework of reusable elements for mining heterogeneous data with missing information

A Tale of Congestion in Two Smart Cities

Journal of Intelligent Information Systems

Abstract

Applications targeting smart cities tackle common challenges; however, solutions are seldom portable from one city to another because smart city ecosystems are heterogeneous. A major obstacle is the difference in the levels of available information. In this work, we present REMI, a mining framework that handles varying degrees of information availability by providing a meta-solution to missing data. The framework's core concept is the REMI layered stack architecture, which offers two complementary approaches to dealing with missing information: data enrichment (DARE) and graceful degradation (GRADE). DARE aims to infer the missing information levels, while GRADE attempts to mine the patterns using only the existing data. We show that REMI provides multiple forms of reusability while being fault tolerant and enabling incremental development. One may apply the architecture to different problem instantiations within the same domain, or deploy it across domains. Furthermore, we introduce the three other components of the REMI framework that support the layered stack. To support decision making in this framework, we map REMI to an optimization problem (OTP) that balances the trade-off among three costs: inaccuracies in inferring missing data (DARE), errors when using less information (GRADE), and the cost of gathering additional data. Finally, we evaluate REMI experimentally using real-world transportation data from two European smart cities, Dublin and Warsaw.
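As a rough illustration of the three-way trade-off that the OTP balances, here is a minimal Python sketch under stated assumptions. The `Plan` class, its field names, the weights and the cost values are all hypothetical, and the weighted-sum scalarization is a stand-in chosen for simplicity, not the paper's OTP formulation. Each candidate plan carries the three costs named above, and the cheapest plan under the chosen weights is selected.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Plan:
    """Hypothetical candidate plan for handling missing information."""
    name: str
    inference_error: float    # DARE: inaccuracy from inferring missing data
    degradation_error: float  # GRADE: error from mining with less information
    gathering_cost: float     # cost of collecting additional data


def total_cost(plan: Plan, w_dare: float = 1.0, w_grade: float = 1.0,
               w_gather: float = 1.0) -> float:
    """Weighted-sum scalarization of the three costs (an assumption, not the paper's OTP)."""
    return (w_dare * plan.inference_error
            + w_grade * plan.degradation_error
            + w_gather * plan.gathering_cost)


def best_plan(plans: Sequence[Plan], **weights: float) -> Plan:
    """Return the candidate plan with the lowest weighted total cost."""
    return min(plans, key=lambda p: total_cost(p, **weights))


# Illustrative numbers only; they are not results from the paper.
plans = [
    Plan("enrich (DARE)", inference_error=0.20, degradation_error=0.0, gathering_cost=0.0),
    Plan("degrade (GRADE)", inference_error=0.0, degradation_error=0.35, gathering_cost=0.0),
    Plan("gather data", inference_error=0.0, degradation_error=0.0, gathering_cost=0.50),
]

print(best_plan(plans).name)              # enrich (DARE)
print(best_plan(plans, w_dare=3.0).name)  # degrade (GRADE)
```

Raising the weight on inference error shifts the choice from enrichment toward graceful degradation. In a real instance, these costs would be estimated from data rather than fixed by hand.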

Figs. 1–14 (images not reproduced)

Notes

  1. http://www.vavel-project.eu/

  2. The objective function is indeed linear. Define an auxiliary decision variable \(w_{k^{\prime },\gamma ^{\prime }} \in \{ 0,1 \}\) that equals 1 whenever the value of \(k\) is set to \(k^{\prime }\) and the value of \(\gamma \) is set to \(\gamma ^{\prime }\). Since exactly one \(w_{k^{\prime },\gamma ^{\prime }}\) equals 1, we can rewrite \(c(1 - P[k, \mathbf {y}])\) as \(c\left(1 - {\sum }_{k^{\prime } \in \{0,\ldots ,n_{\alpha }\}} {\sum }_{\gamma ^{\prime }\in \{0,\ldots , 2^{n_{\alpha }}\}} w_{k^{\prime },\gamma ^{\prime }}P[k^{\prime },\gamma ^{\prime }]\right)\), which is linear in the variables \(w\).
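The auxiliary-variable trick can be checked numerically. The following Python sketch is a minimal illustration, assuming a small made-up table `P` indexed by a value `k` and a configuration index `gamma`. For every pair, it builds the one-hot variables \(w\) and confirms that \(c(1 - \sum_{k'}\sum_{\gamma'} w_{k',\gamma'} P[k',\gamma'])\) reproduces \(c(1 - P[k, \gamma])\). The table values, the constant `c` and the function names are hypothetical.

```python
import math


def one_hot(n_k: int, n_gamma: int, k: int, gamma: int) -> list[list[int]]:
    """Auxiliary variables w[k'][g'] in {0, 1}; only the entry (k, gamma) is 1."""
    return [[1 if (kp, gp) == (k, gamma) else 0 for gp in range(n_gamma)]
            for kp in range(n_k)]


def cost_direct(P, k, gamma, c):
    """Original form c * (1 - P[k][gamma]); the lookup depends on the decision itself."""
    return c * (1 - P[k][gamma])


def cost_linear(P, w, c):
    """Linearized form c * (1 - sum_{k', g'} w[k'][g'] * P[k'][g']), linear in w."""
    selected = sum(w[kp][gp] * P[kp][gp]
                   for kp in range(len(P)) for gp in range(len(P[0])))
    return c * (1 - selected)


# Hypothetical table: rows are k' values, columns are gamma' values.
P = [[0.90, 0.60, 0.30],
     [0.80, 0.50, 0.20]]
c = 2.0

for k in range(len(P)):
    for gamma in range(len(P[0])):
        w = one_hot(len(P), len(P[0]), k, gamma)
        assert math.isclose(cost_direct(P, k, gamma, c), cost_linear(P, w, c))
print("linearized objective matches for every (k, gamma)")
```

Because every product \(w_{k',\gamma'}P[k',\gamma']\) multiplies a decision variable by a constant from the table, a solver can treat the objective as a linear function of the binary \(w\) variables.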

Acknowledgements

This project received funding from the European Union Horizon 2020 Programme (Horizon2020/2014-2020), under grant agreement 688380.

Author information

Corresponding author

Correspondence to Avigdor Gal.

About this article

Cite this article

Gal, A., Gunopulos, D., Panagiotou, N. et al. REMI: A framework of reusable elements for mining heterogeneous data with missing information. J Intell Inf Syst 51, 367–388 (2018). https://doi.org/10.1007/s10844-018-0524-5

  • DOI: https://doi.org/10.1007/s10844-018-0524-5
