Abstract
As our ability to collect data from different sources grows, many real-world datasets are becoming multi-featured, and these features can be of different types. Examples of such multi-feature data include interactions among people through different modes (Facebook, Twitter, LinkedIn, ...) and traffic accidents associated with diverse factors (speed, light conditions, weather, ...).
Efficiently modeling and analyzing these complex datasets to obtain actionable knowledge presents several challenges. Traditional approaches, such as single-layer networks (or monoplexes), may be neither sufficient nor appropriate with respect to modeling and computational scalability. Recently, multiplexes have been proposed for handling such data elegantly.
In this position paper, we elaborate on different types of multiplexes (homogeneous, heterogeneous, and hybrid) for modeling different types of data. We highlight the benefits of this modeling in terms of ease, understanding, and usage. However, the model brings with it a new set of challenges for analysis. The bulk of the paper discusses these challenges and the advantages of using this approach. With the right tools, both computation and storage can be reduced while also accommodating scalability.
Acknowledgment
We would like to extend our gratitude to Dr. Sharma Chakravarthy, University of Texas, whose insight and expertise greatly helped shape this position paper.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Santra, A., Bhowmick, S. (2017). Holistic Analysis of Multi-source, Multi-feature Data: Modeling and Computation Challenges. In: Reddy, P., Sureka, A., Chakravarthy, S., Bhalla, S. (eds) Big Data Analytics. BDA 2017. Lecture Notes in Computer Science, vol 10721. Springer, Cham. https://doi.org/10.1007/978-3-319-72413-3_4
DOI: https://doi.org/10.1007/978-3-319-72413-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72412-6
Online ISBN: 978-3-319-72413-3
eBook Packages: Computer Science, Computer Science (R0)