The workforce analyzer: group discovery among LinkedIn public profiles

Dai, Kais; Vilas, Ana Fernández; Redondo, Rebeca P. Díaz

doi:10.1007/s12652-017-0484-6

Original Research
Published: 08 April 2017

Volume 9, pages 2025–2034, (2018)

Journal of Ambient Intelligence and Humanized Computing

Kais Dai¹,
Ana Fernández Vilas¹ &
Rebeca P. Díaz Redondo¹

Abstract

In this paper, we describe two methods for discovering groups of users among LinkedIn public profiles. We start by clustering profiles according to their professional background. To this end, we combine the K-means technique with the gap statistic method and use tag clouds to scrutinize the resulting groups. The second phase of this work consists of classifying the same profiles by relying on a knowledge base. In this context, we design a multi-label support vector machine classifier that takes advantage of the LinkedIn job ads taxonomy. Finally, we contrast the results of both methods and provide insights into the trending professional orientations of the workforce from an online perspective.
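The clustering phase mentioned in the abstract can be illustrated with a minimal sketch of K-means combined with the gap statistic of Tibshirani et al. (2001). This is not the authors' implementation: the function names (`kmeans_wss`, `gap_statistic`), the farthest-first initialization, the number of restarts, and the synthetic two-dimensional points are all assumptions made for illustration, and real profiles would first have to be encoded as numeric feature vectors.

```python
import math
import random


def kmeans_wss(points, k, rng, iters=50):
    """Lloyd's K-means; returns the within-cluster sum of squares W_k.

    Assumption: farthest-first initialization (random first center, then
    repeatedly the point farthest from all chosen centers).
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(
        math.dist(p, centers[j]) ** 2 for j, cl in enumerate(clusters) for p in cl
    )


def log_wk(points, k, rng, restarts=3):
    """log(W_k), keeping the best of a few K-means runs."""
    return math.log(min(kmeans_wss(points, k, rng) for _ in range(restarts)))


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (chosen k, [Gap(1), ..., Gap(k_max)]).

    Gap(k) = mean(log W_k on uniform reference data) - log W_k on the data.
    The chosen k is the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    """
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = []
        for _ in range(n_refs):
            # Reference set: uniform over the bounding box of the data.
            ref = [
                tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                for _ in points
            ]
            ref_logs.append(log_wk(ref, k, rng))
        mean = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean - log_wk(points, k, rng))
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps


if __name__ == "__main__":
    # Hypothetical data: three well-separated synthetic groups.
    demo_rng = random.Random(1)
    data = [
        (demo_rng.gauss(cx, 0.5), demo_rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (10, 0), (0, 10)]
        for _ in range(30)
    ]
    best_k, gap_values = gap_statistic(data, k_max=5)
    print(best_k, [round(g, 2) for g in gap_values])
```

The sketch keeps only the Python standard library so that the selection rule stays visible. In practice a library K-means would likely replace `kmeans_wss`, with the gap computation left unchanged.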

Notes

https://press.linkedin.com/about-linkedin.
Elements between angle brackets are non-terminals, and \((.)^+\) denotes that the cardinality of the elements between the parentheses is greater than 1.
\([.]^{?}\) means that the cardinality of the element between the brackets is binary.
www.linkedin.com/jobs/view-all.
\(\overline{\lambda }\) designates the complement of \(\lambda\): the set of labels except \(\lambda\)
http://www.coursera.org.
http://www.udemy.com.

Acknowledgements

This work is funded by the Spanish Ministry of Economy and Competitiveness under the National Science Program (TEC2014-54335-C4-3-R), the European Regional Development Fund (ERDF), and the Galician Regional Government under the agreement for funding the Atlantic Research Center for Information and Communication Technologies (AtlantTIC). This work is also partially funded by the European Commission under the Erasmus Mundus GreenIT Project (3772227-1-2012-ES-ERA MUNDUS-EMA21).

Author information

Authors and Affiliations

Information and Computing Laboratory, AtlantTIC Research Center, University of Vigo, Vigo, Spain
Kais Dai, Ana Fernández Vilas & Rebeca P. Díaz Redondo

Corresponding author

Correspondence to Kais Dai.

About this article

Cite this article

Dai, K., Vilas, A.F. & Redondo, R.P.D. The workforce analyzer: group discovery among LinkedIn public profiles. J Ambient Intell Human Comput 9, 2025–2034 (2018). https://doi.org/10.1007/s12652-017-0484-6

Received: 20 November 2016
Accepted: 30 March 2017
Published: 08 April 2017
Issue Date: November 2018
DOI: https://doi.org/10.1007/s12652-017-0484-6
