Clustering of Social Networking Data Using SparkR in Big Data

Kaur, Navneet; Lal, Niranjan

doi:10.1007/978-981-13-1813-9_22

Part of the book series: Communications in Computer and Information Science (CCIS, volume 906)

Included in the following conference series:

International Conference on Advances in Computing and Data Sciences

Abstract

The amount of data grows every day and its formats keep changing, so storing and managing it is a challenging task for organizations. Not long ago, datasets contained thousands of data items. Today, various technologies can store, manage and process growing volumes of unstructured and heterogeneous data; data of this type are known as Big Data. Big Data is the term for a collection of datasets so large and complex that they are difficult to store, manage and process with existing data processing tools. Most of the data created in Big Data is unstructured. The new conditions imposed by Big Data therefore present serious challenges at multiple levels, including the problem of clustering these data. Clustering is one of the significant Big Data analysis problems, in which very large amounts of heterogeneous and unstructured data must be grouped together. Here we describe the k-means and hierarchical clustering methods. We pay particular attention to k-means because it remains one of the most widely used approaches and is also implemented in newer technologies for analyzing Big Data. This paper describes the different categories of data, the management of unstructured data in Big Data, and the cluster analysis of social network data using SparkR.
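
As a rough illustration of the k-means method named in the abstract, the following is a minimal sketch in plain Python, not the SparkR code used in the paper. The function name kmeans, the choice of the first k points as initial centroids, and the sample points are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    # Assumption: the first k points are used as initial centroids,
    # which keeps the sketch deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-dimensional sample with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0),
          (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The sketch alternates the assignment and update steps until the centroids stop moving. A distributed implementation such as the one available through SparkR follows the same idea but partitions the data across a cluster.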

Author information

Authors and Affiliations

Mody University of Science and Technology, Lakshmangarh, Sikar, Rajasthan, India
Navneet Kaur & Niranjan Lal

Corresponding author

Correspondence to Navneet Kaur.

Editor information

Editors and Affiliations

University of KwaZulu-Natal, Durban, South Africa
Mayank Singh
Jaypee University of Information Technology, Solan, India
P. K. Gupta
Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India
Vipin Tyagi
Institute of Information Theory and Automation, Prague 8, Czech Republic
Jan Flusser
University of Ottawa, Ottawa, Canada
Tuncer Ören

About this paper

Cite this paper

Kaur, N., Lal, N. (2018). Clustering of Social Networking Data Using SparkR in Big Data. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T. (eds) Advances in Computing and Data Sciences. ICACDS 2018. Communications in Computer and Information Science, vol 906. Springer, Singapore. https://doi.org/10.1007/978-981-13-1813-9_22

DOI: https://doi.org/10.1007/978-981-13-1813-9_22
Published: 26 October 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1812-2
Online ISBN: 978-981-13-1813-9
eBook Packages: Computer Science, Computer Science (R0)
