GraphLMI: A data driven system for exploring labor market information through graph databases

Multimedia Tools and Applications

Abstract

Labor Market Intelligence (LMI) is an emerging field of study that is gaining interest because it applies Artificial Intelligence (AI) algorithms to labor market information. The goal of LMI is to support decision- and policy-making activities (e.g., real-time monitoring of Online Job Vacancies (OJVs) across countries, forecasting the skills requested within vacancies, comparing similar labor markets across borders). The European project that frames this work belongs to this field: it aims to collect and classify millions of OJVs from 28 EU countries, handling 32 languages, and to extract the requested skills. The result is a huge amount of information useful for understanding labor market dynamics and trends. The goal of this work is to realize a system, namely GraphLMI, that organizes such labor market information as a graph, enabling the representation of occupation/skill relevance and similarity over the European labor market; a further goal is to enrich the European standard taxonomy of occupations and skills (ESCO) to better fit labor market expectations. We formalize and design the GraphLMI data model, then implement it as a graph database generated by processing more than 5.3 million free-text OJVs collected between 2018 and 2019 for France, Germany, and the United Kingdom. Finally, we show how the resulting knowledge can be queried through a declarative query language to understand, compare, and evaluate country-based labor market dynamics in support of policy and decision making at the European level.
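
To make the idea of occupation/skill relevance and similarity concrete, the following is a minimal Python sketch, not the GraphLMI implementation. It assumes that relevance can be approximated by the share of an occupation's vacancies in a country that mention a skill, and that similarity can be approximated by cosine similarity over those shares; the abstract does not state the actual formulas. The vacancies, the function names `build_graph` and `cosine_similarity`, and the in-memory dictionary layout are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy vacancies: (country, occupation, skills extracted from the text).
VACANCIES = [
    ("UK", "data analyst", ["sql", "python", "statistics"]),
    ("UK", "data analyst", ["sql", "excel"]),
    ("UK", "software developer", ["python", "java", "sql"]),
    ("FR", "data analyst", ["sql", "python"]),
]


def build_graph(vacancies):
    """Return {(country, occupation): {skill: relevance}}.

    Each (country, occupation) -> skill entry stands for one labeled edge.
    The country acts as the edge label, so the same occupation and skill
    can be linked by several edges, one per country.
    Assumption: relevance is the share of that occupation's vacancies in
    the country that mention the skill.
    """
    totals = Counter()
    mentions = defaultdict(Counter)
    for country, occupation, skills in vacancies:
        key = (country, occupation)
        totals[key] += 1
        for skill in set(skills):  # count each skill once per vacancy
            mentions[key][skill] += 1
    return {
        key: {skill: n / totals[key] for skill, n in counts.items()}
        for key, counts in mentions.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two sparse skill-relevance vectors."""
    dot = sum(w * b[s] for s, w in a.items() if s in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


graph = build_graph(VACANCIES)
uk_analyst = graph[("UK", "data analyst")]
uk_developer = graph[("UK", "software developer")]
similarity = cosine_similarity(uk_analyst, uk_developer)
print(uk_analyst)
print(round(similarity, 3))
```

Keeping the country as part of the edge key mirrors the country-based comparisons described above: the same occupation can carry a different skill profile in each labor market. In a graph database the same structure would be stored as nodes and labeled relationships rather than nested dictionaries.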

Figs. 1–12 and Cypher Queries 1–4 (available in the full article)

Notes

  1. The European Centre for the Development of Vocational Training

  2. An Online Job Vacancy (OJV) is a Web page containing a title, which briefly summarizes the job position, and a full description, usually used to advertise the skills a candidate should hold

  3. Notice that occupation and skill taxonomies act like dictionaries, hence they do not provide the importance of occupations and skills

  4. The European Skills, Competences, Qualifications and Occupations classification (ESCO), https://ec.europa.eu/esco/portal

  5. Publicly available at https://goo.gl/Goluxo

  6. The Commission Communication “A New Skills Agenda for Europe” COM(2016) 381/2, available at https://goo.gl/Shw7bI

  7. O*NET is itself an extension of the Standard Occupational Classification (SOC) system developed by the U.S. Bureau of Labor Statistics

  8. http://www.burning-glass.com/

  9. http://www.wollybi.com

  10. https://www.gartner.com/en/human-resources/research/talentneuron

  11. http://www.workday.com/

  12. https://www.pluralsight.com/

  13. http://www.employinsight.com/

  14. http://www.textkernel.com/

  15. https://cloud.google.com/jobs-api/

  16. A multi-graph is a graph where multiple edges between two nodes are permitted and might be specified through labels

  17. https://tinyurl.com/sv4squr

  18. See https://ec.europa.eu/esco/resources/data/static/model/html/model.xhtml for details about how ESCO is organized

  19. https://www.onetonline.org/

  20. https://tinyurl.com/sv4squr

  21. https://tinyurl.com/w5h35c4

  22. https://tinyurl.com/sxejkq3

  23. https://tinyurl.com/qnzxccj

  24. https://tinyurl.com/r53r798

  25. As the terms for computing the term frequency, we consider skills, cities, sectors, sites, and educational levels, respectively

  26. The idea is inspired by [3] though they compute the weight of triples through arithmetic functions

  27. https://goo.gl/5FZS3E

References

  1. Alabdulkareem A, Frank MR, Sun L, AlShebli B, Hidalgo C, Rahwan I (2018) Unpacking the polarization of workplace skills. Sci Adv 4(7)

  2. Angles R, Gutierrez C (2008) Survey of graph database models. ACM Comput Surv (CSUR) 40(1):1

  3. Barrat A, Barthelemy M, Pastor-Satorras R, Vespignani A (2004) The architecture of complex weighted networks. Proc Natl Acad Sci U S A 101 (11):3747–3752

  4. Bergamaschi S, Carlini E, Ceci M, Furletti B, Giannotti F, Malerba D, Mezzanzanica M, Monreale A, Pasi G, Pedreschi D et al (2016) Big data research in Italy: a perspective. Engineering 2(2):163–170

  5. Bonifati A, Ciucanu R, Lemay A (2015) Learning path queries on graph databases. In: 18th international conference on extending database technology (EDBT). Bruxelles. https://doi.org/10.5441/002/edbt.2015.11

  6. Bonifati A, Fletcher G, Voigt H, Yakovets N (2018) Querying graphs. Synthesis Lect Data Manag 10(3):1–184

  7. Borgwardt KM, Kriegel HP (2005) Shortest-path kernels on graphs. In: Fifth IEEE international conference on data mining (ICDM’05). IEEE, pp 8

  8. Boselli R, Cesarini M, Mercorio F, Mezzanzanica M (2013) Inconsistency knowledge discovery for longitudinal data management: A model-based approach. In: Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data - Third International Workshop, HCI-KDD 2013, Held at SouthCHI 2013, Maribor, Proceedings, pp 183–194. https://doi.org/10.1007/978-3-642-39146-0_17

  9. Boselli R, Cesarini M, Mercorio F, Mezzanzanica M (2017) Using machine learning for labour market intelligence. ECML PKDD 2017: Machine Learning and Knowledge Discovery in Database, pp 330–342

  10. Boselli R, Cesarini M, Marrara S, Mercorio F, Mezzanzanica M, Pasi G, Viviani M (2018) Wolmis: a labor market intelligence system for classifying web job vacancies. J Intell Inf Syst 51(3):477–502

  11. Boselli R, Cesarini M, Mercorio F, Mezzanzanica M (2018) Classifying online job advertisements through machine learning. Futur Gener Comput Syst 86:319–328

  12. CEDEFOP (2014) Real-time labour market information on skill requirements: feasibility study and working prototype. https://goo.gl/qNjmrn

  13. CEDEFOP (2016) Real-time labour market information on skill requirements: Setting up the EU system for online vacancy analysis. https://goo.gl/5FZS3E

  14. Chikhaoui B, Chiazzaro M, Wang S (2015) A new granger causal model for influence evolution in dynamic social networks: The case of DBLP. In: AAAI

  15. Chung FR, Graham FC (1997) Spectral graph theory, vol 92. American Mathematical Society

  16. Colombo E, Mercorio F, Mezzanzanica M (2019) AI meets labor market: Exploring the link between automation and skills. Information Economics and Policy 47. https://doi.org/10.1016/j.infoecopol.2019.05.003, http://www.sciencedirect.com/science/article/pii/S0167624518301318

  17. Davoudian A, Chen L, Liu M (2018) A survey on NoSQL stores. ACM Comput Surv (CSUR) 51(2):40

  18. Durand GC, Janardhana A, Pinnecke M, Shakeel Y, Krüger J, Leich T, Saake G (2018) Exploring large scholarly networks with hermes. In: EDBT

  19. Durrett R (2007) Random graph dynamics, vol 200. Cambridge University Press, Cambridge

  20. Fleming M, Clarke W, Das S, Phongthiengtham P, Reddy P (2019) The future of work: How new technologies are transforming tasks

  21. Francis N, Green A, Guagliardo P, Libkin L, Lindaaker T, Marsault V, Plantikow S, Rydberg M, Selmer P, Taylor A (2018) Cypher: an evolving query language for property graphs. In: Proceedings of the 2018 International Conference on Management of Data. ACM, pp 1433–1445

  22. Frey CB, Osborne MA (2017) The future of employment: How susceptible are jobs to computerisation? Technol Forecast Soc Change 114(Supplement C):254–280

  23. Gupta S, Varma V (2017) Scientific article recommendation by using distributed representations of text and graph. In: Proceedings of the 26th international conference on world wide web companion, pp 1267–1268

  24. Javed F, Hoang P, Mahoney T, McNair M (2017) Large-scale occupational skills normalization for online recruitment. In: Twenty-ninth IAAI conference

  25. Katarya R, Verma OP (2018) Efficient music recommender system using context graph and particle swarm. Multimed Tools Appl 77(2):2673–2687

  26. Khurana U, Deshpande A (2016) Storing and analyzing historical graph data at scale. In: Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, pp 65–76. https://doi.org/10.5441/002/edbt.2016.09

  27. Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907

  28. Li H, Guan Y, Liu L, Wang F, Wang L (2016) Re-ranking for microblog retrieval via multiple graph model. Multimed Tools Appl 75(15):8939–8954

  29. Liu L, Zhu F, Jiang M, Han J, Sun L, Yang S (2012) Mining diversity on social media networks. Multimed Tools Appl 56(1):179–205

  30. Lovaglio PG, Cesarini M, Mercorio F, Mezzanzanica M (2018) Skills in demand for ICT and statistical occupations: Evidence from web-based job vacancies. Stat Anal Data Min 11(2):78–91. https://doi.org/10.1002/sam.11372

  31. Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp 135–146

  32. Mercorio F, Mezzanzanica M, Moscato V, Picariello A, Sperlì G (2019) Dico: a graph-db framework for community detection on big scholarly data. IEEE Transactions on Emerging Topics in Computing:1–1. https://doi.org/10.1109/TETC.2019.2952765

  33. Mercorio F, Mezzanzanica M, Moscato V, Picariello A, Sperlì G (2020) A tool for researchers: Querying big scholarly data through graph databases. In: Brefeld U., Fromont E., Hotho A., Knobbe A., Maathuis M., Robardet C. (eds) Machine learning and knowledge discovery in databases. ECML PKDD 2019. Lecture notes in computer science, vol 11908. Springer, Cham

  34. Mezzanzanica M, Boselli R, Cesarini M, Mercorio F (2011) Data quality through model checking techniques. In: Advances in intelligent data analysis X - 10th international symposium, IDA 2011, pp 270–281

  35. Mezzanzanica M, Boselli R, Cesarini M, Mercorio F (2012) Data quality sensitivity analysis on aggregate indicators. In: DATA 2012 - Proceedings of the International Conference on Data Technologies and Applications, pp 97–108

  36. Mezzanzanica M, Boselli R, Cesarini M, Mercorio F (2015) A model-based approach for developing data cleansing solutions. J Data Inf Qual 5 (4):13:1–13:28. https://doi.org/10.1145/2641575

  37. Mezzanzanica M, Mercorio F, Cesarini M, Moscato V, Picariello A (2018) Graphdblp: a system for analysing networks of computer scientists through graph databases. Multimed Tools Appl 77(14):18657–18688

  38. Papoutsoglou M, Ampatzoglou A, Mittas N, Angelis L (2019) Extracting knowledge from on-line sources for software engineering labor market: A mapping study. IEEE Access

  39. Robinson I, Webber J, Eifrem E (2015) Graph Databases. New opportunities for connected data, 2nd edn. O’Reilly Media, Inc.

  40. Stonebraker M (2010) SQL databases v. NoSQL databases. Commun ACM 53(4):10–11

  41. Sung-Hyuk C (2007) Comprehensive survey on distance/similarity measures between probability density functions. International Journal of Mathematical Models and Methods in Applied Sciences

  42. Turrell A, Speigner B, Djumalieva J, Copple D, Thurgood J (2018) Using job vacancies to understand the effects of labour market mismatch on UK output and productivity. Bank of England Working Paper 737

  43. Vicknair C, Macias M, Zhao Z, Nan X, Chen Y, Wilkins D (2010) A comparison of a graph database and a relational database: a data provenance perspective. In: Proceedings of the 48th annual Southeast regional conference, pp 1–6

  44. Vinel M, Ryazanov I, Botov D, Nikolaev I (2019) Experimental comparison of unsupervised approaches in the task of separating specializations within professions in job vacancies. In: Conference on artificial intelligence and natural language. Springer, pp 99–112

  45. Wang S, Cuomo S, Mei G, Cheng W, Xu N (2019) Efficient method for identifying influential vertices in dynamic networks using the strategy of local detection and updating. Futur Gener Comput Syst 91:10–24

  46. Xiao L, Wang S, Mei G (2020) Efficient parallel algorithm for detecting influential nodes in large biological networks on the graphics processing unit. Future Generation Computer Systems

  47. Yao W, He J, Huang G, Cao J, Zhang Y (2015) A graph-based model for context-aware recommendation using implicit feedback data. World Wide Web 18(5):1351–1371

  48. Zhao S, Gao Y, Ding G, Chua TS (2017) Real-time multimedia social event detection in microblog. IEEE Trans Cybern 48(11):3218–3231

Acknowledgments

This work is partially supported within the research activity of an EU project entitled "Real-time Labor Market information on Skill Requirements: Setting up the EU system for online vacancy analysis" (see Note 27), in which some of the authors are involved as coordinators and researchers.

Author information

Corresponding author

Correspondence to Anna Giabelli.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Giabelli, A., Malandri, L., Mercorio, F. et al. GraphLMI: A data driven system for exploring labor market information through graph databases. Multimed Tools Appl 81, 3061–3090 (2022). https://doi.org/10.1007/s11042-020-09115-x

  • DOI: https://doi.org/10.1007/s11042-020-09115-x
