Abstract
In this paper, we describe two methods for discovering groups of users among LinkedIn public profiles. We first cluster profiles according to their professional background, combining the K-means technique with the gap statistic method and using tag clouds to examine the resulting groups. In the second phase, we classify the same profiles by relying on a knowledge base: we design a multi-label classifier based on support vector machines that takes advantage of the LinkedIn job ads taxonomy. Finally, we contrast the results of both methods and provide insights into the trending professional orientations of the workforce from an online perspective.
Notes
Elements between angle brackets are non-terminals, and \((.)^+\) denotes that the cardinality of the elements between the parentheses is greater than 1.
\([.]^{?}\) means that the cardinality of the element between the brackets is binary.
\(\overline{\lambda }\) designates the complement of \(\lambda\): the set of all labels except \(\lambda\).
Acknowledgements
This work is funded by the Spanish Ministry of Economy and Competitiveness under the National Science Program (TEC2014-54335-C4-3-R), and by the European Regional Development Fund (ERDF) and the Galician Regional Government under the agreement for funding the Atlantic Research Center for Information and Communication Technologies (AtlantTIC). This work is also partially funded by the European Commission under the Erasmus Mundus GreenIT Project (3772227-1-2012-ES-ERA MUNDUS-EMA21).
About this article
Cite this article
Dai, K., Vilas, A.F. & Redondo, R.P.D. The workforce analyzer: group discovery among LinkedIn public profiles. J Ambient Intell Human Comput 9, 2025–2034 (2018). https://doi.org/10.1007/s12652-017-0484-6
DOI: https://doi.org/10.1007/s12652-017-0484-6