Abstract
Data science applications often need to deal with data that does not fit the standard entity-attribute-value model. In this chapter we discuss three such types of data: texts, images, and graphs. The importance of social media is one of the reasons for the interest in graphs, as they are a way to represent social networks and, more generally, any type of interaction between people. We present examples of tools that can be used to extract information from, and thus analyze, these three types of data. In particular, we discuss topic modeling using a hierarchical statistical model as a way to extract relevant topics from texts, image analysis using convolutional neural networks, and measures and visual methods to summarize information from graphs.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Bae, J., Karlsson, A., Mellin, J., Ståhl, N., Torra, V. (2019). Complex Data Analysis. In: Said, A., Torra, V. (eds) Data Science in Practice. Studies in Big Data, vol 46. Springer, Cham. https://doi.org/10.1007/978-3-319-97556-6_9
DOI: https://doi.org/10.1007/978-3-319-97556-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97555-9
Online ISBN: 978-3-319-97556-6
eBook Packages: Intelligent Technologies and Robotics (R0)