The workforce analyzer: group discovery among LinkedIn public profiles

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

In this paper, we describe two methods for discovering groups of users among LinkedIn public profiles. The first clusters profiles according to their professional background: we combine the K-means technique with the gap statistic method, which estimates the number of clusters, and use tag clouds to inspect the resulting groups. The second classifies the same profiles by relying on a knowledge base: we design a multi-label classifier based on support vector machines that takes advantage of the LinkedIn job ads taxonomy. Finally, we contrast the results of both methods and provide insights into the trending professional orientations of the workforce from an online perspective.
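
To make the clustering step concrete, the following is a minimal sketch of how K-means can be paired with the gap statistic to estimate the number of groups. It is not the implementation used in this work. The function names (init_centers, kmeans_wk, best_wk, gap_statistic, make_blobs), the farthest-point seeding, the number of restarts and reference data sets, and the synthetic 2-D points that stand in for vectorized profiles are all assumptions made for illustration.

```python
import math
import random


def init_centers(points, k, rng):
    """Farthest-point seeding (an assumption chosen here for stability)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans_wk(points, k, rng, iters=50):
    """Run Lloyd's K-means once; return the within-cluster sum of squares."""
    centers = init_centers(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def best_wk(points, k, rng, restarts=3):
    """Keep the lowest dispersion over a few restarts."""
    return min(kmeans_wk(points, k, rng) for _ in range(restarts))


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (selected k, list of gap values for k = 1..k_max)."""
    rng = random.Random(seed)
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_wk = math.log(max(best_wk(points, k, rng), 1e-12))
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: uniform over the bounding box of the data.
            ref = [
                tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                for _ in points
            ]
            ref_logs.append(math.log(max(best_wk(ref, k, rng), 1e-12)))
        mean = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean - log_wk)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    # Smallest k such that Gap(k) >= Gap(k + 1) - s(k + 1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps


def make_blobs(seed=1):
    """Synthetic stand-in for profile vectors: three compact 2-D groups."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
    return [
        (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in centres
        for _ in range(30)
    ]


if __name__ == "__main__":
    k, gaps = gap_statistic(make_blobs(), k_max=5)
    print("selected k:", k)
    print("gaps:", [round(g, 2) for g in gaps])
```

On data with three well-separated groups, such as the synthetic points above, the gap value would be expected to rise sharply at k = 3. Real profile data would first need to be turned into numeric vectors, a step this sketch leaves out.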

Notes

  1. https://press.linkedin.com/about-linkedin.

  2. Elements between angle brackets are non-terminals and \((.)^+\) denotes the cardinality of elements between the parentheses which is greater than 1.

  3. \([.]^{?}\) means that the element between brackets is optional, i.e. its cardinality is binary (zero or one occurrence).

  4. www.linkedin.com/jobs/view-all.

  5. \(\overline{\lambda }\) designates the complement of \(\lambda\): the set of all labels except \(\lambda\).

  6. http://www.coursera.org.

  7. http://www.udemy.com.

Acknowledgements

This work is funded by the Spanish Ministry of Economy and Competitiveness under the National Science Program (TEC2014-54335-C4-3-R), and by the European Regional Development Fund (ERDF) and the Galician Regional Government under the agreement for funding the Atlantic Research Center for Information and Communication Technologies (AtlantTIC). It is also partially funded by the European Commission under the Erasmus Mundus GreenIT Project (3772227-1-2012-ES-ERA MUNDUS-EMA21).

Author information

Corresponding author

Correspondence to Kais Dai.

About this article

Cite this article

Dai, K., Vilas, A.F. & Redondo, R.P.D. The workforce analyzer: group discovery among LinkedIn public profiles. J Ambient Intell Human Comput 9, 2025–2034 (2018). https://doi.org/10.1007/s12652-017-0484-6

  • DOI: https://doi.org/10.1007/s12652-017-0484-6
